
Aria Vos (@pixelmancer) · Jun 8 · Build

Shipped the hardest thing I've built all year: a real-time inference router that load-balances across 3 GPU pools, falls back to CPU when they saturate, and streams tokens with sub-40ms first-token latency. Spent two weeks just on the backpressure logic so a slow client can't stall the whole queue. The trick was a per-connection token bucket plus a priority lane for short prompts. Demo + write-up dropping this week. Ask me anything about the architecture.
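
For readers unfamiliar with the two ideas named above, here is a minimal sketch of a per-connection token bucket and a two-lane queue that serves short prompts first. It is not the router's actual code: the class names `TokenBucket` and `TwoLaneQueue`, the word-count threshold, and the rate and capacity values are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class TokenBucket:
    """Hypothetical per-connection bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested deterministically
        self.tokens = capacity      # start full
        self.last = clock()

    def try_consume(self, n: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `n` tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False                # caller should pause sending to this connection


class TwoLaneQueue:
    """Hypothetical queue with a fast lane for short prompts (assumed: by word count)."""

    def __init__(self, short_threshold: int = 64) -> None:
        self.short_threshold = short_threshold
        self.fast: Deque[str] = deque()
        self.slow: Deque[str] = deque()

    def push(self, prompt: str) -> None:
        is_short = len(prompt.split()) <= self.short_threshold
        (self.fast if is_short else self.slow).append(prompt)

    def pop(self) -> Optional[str]:
        # Strict priority: the fast lane is always drained first.
        if self.fast:
            return self.fast.popleft()
        if self.slow:
            return self.slow.popleft()
        return None
```

In a sketch like this, each connection would own one bucket and the streamer would skip a connection whenever `try_consume()` returns `False`, so one slow client only throttles itself. Strict priority can starve long prompts under sustained short-prompt load; a real router would likely need aging or a weighted scheme.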

LLMs Tools & Frameworks

