Pick the best model. Route around the broken one.
A quality-aware router that watches every call. When your primary degrades on errors, latency, or output quality, traffic spills to a backup in <120ms. Set it once; never page on-call again.
- WHAT: An automatic failover layer between your code and every model provider.
- WHY: Quality, latency, and errors are watched live. Failover fires before PagerDuty does.
- HOW: Add router.fallback to any API call. The trace shows what fired.
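To make the HOW concrete, here is a minimal sketch of an ordered fallback chain. It is not the each::labs SDK: `call_with_fallback`, the `(name, call)` pairs, and the trace fields are hypothetical names chosen for illustration, and this version only reacts to raised errors, not to latency or quality.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical sketch: none of these names come from the real SDK.
ProviderCall = Callable[[str], str]


def call_with_fallback(
    providers: Sequence[Tuple[str, ProviderCall]], prompt: str
) -> dict:
    """Try each (name, call) pair in order; return the first success plus a trace."""
    trace = []
    for name, call in providers:
        try:
            output = call(prompt)
        except Exception as exc:  # a fuller router would also check latency and quality
            trace.append({"provider": name, "ok": False, "reason": repr(exc)})
            continue
        trace.append({"provider": name, "ok": True, "reason": None})
        return {"output": output, "served_by": name, "trace": trace}
    raise RuntimeError(f"all providers failed: {trace}")
```

Called with a failing primary and a healthy backup, the result names the backup in `served_by` and the trace records why the primary was skipped.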
Quality-aware. Latency-aware. Live.
The router doesn’t just retry on 5xx. It watches output quality, latency, and provider health, and reroutes traffic the moment any signal degrades. Set fallbacks once; we handle the chaos.
Quality-aware fallback
Most routers retry on 5xx. Ours measures output drift in real time (perceptual hashing on images, audio fingerprinting on TTS) and reroutes the moment the signal degrades. The check lives on the call path; no separate ML pipeline.
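For a feel of what perceptual hashing on images involves, here is a sketch of the simplest variant, an average hash compared by Hamming distance. It is an illustration of the general technique, not the router's actual check: the input is assumed to be an already-downsampled 8x8 grayscale grid, and `has_drifted` and its `max_distance` default are hypothetical.

```python
from typing import List

# Illustrative only: a real pipeline would decode and downsample the image first.
# The input here is assumed to be an 8x8 grayscale grid with values 0-255.


def average_hash(pixels: List[List[int]]) -> int:
    """Set one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def has_drifted(
    reference: List[List[int]], candidate: List[List[int]], max_distance: int = 10
) -> bool:
    """Flag drift when the hashes differ in more than max_distance bits (assumed cutoff)."""
    return hamming_distance(average_hash(reference), average_hash(candidate)) > max_distance
```

A uniformly brighter copy of an image keeps the same hash, while a structurally different output flips many bits, which is the property that makes this kind of hash usable as a cheap drift signal.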
Latency thresholds
Set a p95 threshold per provider. When one breaches, the router shifts traffic (globally, per tier, or per experiment) until the provider recovers. Every failover lands in the trace with a reason.
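A p95 threshold check can be sketched as a rolling window of recent latencies per provider. The `LatencyWindow` class, the window size, and the nearest-rank percentile method are assumptions made for this example, not a description of the router's internals.

```python
from collections import deque


class LatencyWindow:
    """Rolling window of recent latencies for one provider (hypothetical helper)."""

    def __init__(self, threshold_ms: float, size: int = 100):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=size)  # oldest samples fall off automatically

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self) -> float:
        """Nearest-rank 95th percentile of the samples currently in the window."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
        return ordered[rank - 1]

    def breached(self) -> bool:
        """True when the window has data and its p95 exceeds the threshold."""
        return bool(self.samples) and self.p95() > self.threshold_ms
```

With an 800ms threshold, a window full of 540ms calls stays healthy; once enough 1420ms calls displace them, `breached()` turns true and the router would have its reason to shift traffic.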
Sticky cohorts
Variant assignment is sticky by user. The same user always lands on the same provider until the router has a reason to switch, which keeps cohort data clean for A/B tests and tier-based pricing.
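One common way to get sticky assignment is to hash the user ID into the provider list and keep an override table for users the router has moved. The sketch below assumes that approach; `assign_provider` and the `overrides` mapping are hypothetical names, and the source does not say how the router implements stickiness.

```python
import hashlib
from typing import Dict, List, Optional


def assign_provider(
    user_id: str,
    providers: List[str],
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Return the same provider for the same user unless an override exists."""
    if overrides and user_id in overrides:
        # The router moved this user after a failover; stay there until it clears.
        return overrides[user_id]
    # SHA-256 is stable across processes, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(providers)
    return providers[index]
```

Note that plain modulo hashing reshuffles many users whenever the provider list changes length; a production design would likely need something steadier, such as consistent hashing.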
seedance-2.0 just degraded. wan-2.7 took over.
A real failover at 03:14 AM. User latency unchanged, no on-call paged, traced end to end.
Three signals. One decision. Zero pages.
The router doesn’t just retry on 5xx. It watches latency, error rate, and output quality on every call, and reroutes the moment any of them slips, before PagerDuty notices.
Three signals, watched live.
Latency p95 breach. 5xx error spike. Output quality drift. Any one pulls the trigger.
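The "any one pulls the trigger" rule can be sketched as three independent threshold checks that each contribute a reason. All field names, reason strings, and threshold values below are hypothetical; they only illustrate the OR-of-signals logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    p95_ms: float         # recent p95 latency
    error_rate: float     # fraction of calls returning 5xx, 0..1
    quality_drift: float  # 0 = identical to baseline, 1 = fully drifted


@dataclass
class Thresholds:
    max_p95_ms: float
    max_error_rate: float
    max_quality_drift: float


def failover_reasons(s: Signals, t: Thresholds) -> List[str]:
    """Return every breached signal; a non-empty list means fail over."""
    reasons = []
    if s.p95_ms > t.max_p95_ms:
        reasons.append("latency_p95_breach")
    if s.error_rate > t.max_error_rate:
        reasons.append("error_spike")
    if s.quality_drift > t.max_quality_drift:
        reasons.append("quality_drift")
    return reasons
```

Returning the list of reasons rather than a bare boolean means the same value can drive the decision and be written to the trace.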
Score every candidate. Pick the best.
The router scores each fallback by recent health, latency, and quality. Highest score wins. Decision in <40ms.
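As a rough sketch of "score every candidate, highest score wins": combine health, latency, and quality into one number and take the maximum. The weights (0.4 / 0.3 / 0.3), the 1000ms latency budget, and the `Candidate` fields are assumptions invented for this example; the actual scoring formula is not published here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    success_rate: float  # recent health, 0..1
    p95_ms: float        # recent p95 latency
    quality: float       # recent output quality, 0..1


def score(c: Candidate, latency_budget_ms: float = 1000.0) -> float:
    """Weighted sum of health, latency headroom, and quality (assumed weights)."""
    # Latency contributes nothing once p95 reaches the budget.
    latency_score = max(0.0, 1.0 - c.p95_ms / latency_budget_ms)
    return 0.4 * c.success_rate + 0.3 * latency_score + 0.3 * c.quality


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Return the highest-scoring candidate; raises ValueError on an empty list."""
    return max(candidates, key=score)
```

In this toy weighting, a fast and healthy provider beats a slower one with slightly better quality, because the slower one has used up its entire latency budget.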
Trace logs the swap. Users see nothing.
The decision and reason land in the trace. Sticky cohorts keep the user on the new provider until conditions change.
Reach for router when…
It's 3:14 AM and seedance-2.0 just died
Your on-call hasn't slept in two days. The dashboard is red. Users are seeing broken videos. Router catches this 124ms in, before your pager fires, before users notice, before the team reaches Slack.
incident.caught = true
Your bill jumped 40% this week
Cost shifts when fallbacks fire: wan-2.7 might be 2× the cost of seedance-2.0, and you have no idea which calls switched. Router writes the served provider on every trace; finance gets the answer in two clicks.
trace.served_by · per call
p95 tripled overnight, no one knows why
A provider degraded silently: same status codes, slower outputs. Without router you'd be hunting in logs at 11 PM. With it, the failover already fired and the trace tells you which provider went bad and when.
p95: 540ms → 1.42s → spilled
You're shipping to 3 regions on day 30
Single-provider capacity has a ceiling: region throttles, daily limits, and model deprecations land at the worst time. Router lets you shop across providers without rewriting a single call site.
regions: us, eu, apac
Other products you’ll use alongside this.
FAQ
What is the each::labs router?
Stop writing retry loops. Start routing.
Router is included on every plan. Quality-aware mode and custom fallback chains are available on Pro and up.