Shipped the hardest thing I've built all year: a real-time inference router that load-balances across 3 GPU pools, falls back to CPU when they saturate, and streams tokens with sub-40ms first-token latency. Spent two weeks just on the backpressure logic so a slow client can't stall the whole queue. The trick was a per-connection token bucket plus a priority lane for short prompts. Demo + write-up dropping this week. Ask me anything about the architecture.
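
For readers who want a concrete picture before the write-up: below is a minimal sketch of how a per-connection token bucket and a short-prompt priority lane could fit together. It is an illustration under stated assumptions, not the router's actual code. The names `TokenBucket`, `Router`, `submit`, `next_request`, and the `SHORT_PROMPT_CHARS` threshold are all hypothetical, and the bucket is applied here as admission control on enqueue, which is only one way to read "backpressure".

```python
import heapq
import itertools

# Hypothetical cutoff: prompts at or below this length use the priority lane.
SHORT_PROMPT_CHARS = 64


class TokenBucket:
    """Per-connection budget that refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def try_consume(self, n, now):
        # Refill for the elapsed time, capped at capacity, then try to spend n.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


class Router:
    """Two-lane queue: lane 0 (short prompts) is always served before lane 1."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}             # connection id -> TokenBucket
        self.heap = []                # entries: (lane, seq, conn_id, prompt)
        self.seq = itertools.count()  # tie-breaker keeps FIFO order within a lane

    def submit(self, conn_id, prompt, now):
        bucket = self.buckets.get(conn_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[conn_id] = bucket
        if not bucket.try_consume(1, now):
            # Only this connection is throttled; other connections keep flowing.
            return False
        lane = 0 if len(prompt) <= SHORT_PROMPT_CHARS else 1
        heapq.heappush(self.heap, (lane, next(self.seq), conn_id, prompt))
        return True

    def next_request(self):
        if not self.heap:
            return None
        _, _, conn_id, prompt = heapq.heappop(self.heap)
        return conn_id, prompt
```

In this sketch the clock is passed in as `now` so the behavior is easy to test; a real service would more likely read `time.monotonic()`. A strict two-lane priority like this can starve long prompts under sustained short-prompt load, so a production design would probably add aging or a lane quota.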