Fleet v2: 16 agents with smart model routing
Fleet v2 routes simple tasks to local Gemma (via Ollama) and complex reasoning to Claude Sonnet in the cloud. All 16 agents have been reconfigured for the new routing model: local-first with cloud fallback, matching the model to the task instead of running pure-local or pure-cloud.
Fleet v2 shipped in v0.2 with a new routing architecture: match the model to the task instead of running everything on one model.
Simple tasks (status checks, FAQ responses, venue listing, basic search) route to local Gemma via Ollama. Zero cost, sub-second latency, data stays on-device. Complex reasoning (multi-tool conversations, booking coordination, personalized recommendations) routes to Claude Sonnet in the cloud.
The router decides based on message classification, user tier, and current system load. Free-tier users get Haiku for everything; Pro-tier users get Sonnet for complex queries and Haiku for simple ones; Enterprise tier gets Sonnet end-to-end.
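One plausible shape for a router like this: classify the message, prefer local Gemma for simple work while the local box has headroom, and otherwise fall back to the per-tier cloud model. Everything here is an illustrative sketch, not Fleet's actual code: the keyword classifier, the model identifiers, and the load threshold are all assumptions.

```python
# Illustrative router sketch. The classify() heuristic, model ids, and
# load threshold are assumptions, not Fleet's real implementation.

SIMPLE_MARKERS = ("status", "faq", "venue", "search")

# Cloud model per (tier, task kind), following the tier policy above.
TIER_MODELS = {
    ("free", "simple"): "claude-haiku",
    ("free", "complex"): "claude-haiku",
    ("pro", "simple"): "claude-haiku",
    ("pro", "complex"): "claude-sonnet",
    ("enterprise", "simple"): "claude-sonnet",
    ("enterprise", "complex"): "claude-sonnet",
}


def classify(message: str) -> str:
    """Toy keyword classifier standing in for the real one."""
    text = message.lower()
    return "simple" if any(m in text for m in SIMPLE_MARKERS) else "complex"


def route(message: str, tier: str, local_load: float = 0.0) -> str:
    kind = classify(message)
    # Simple tasks go to local Gemma when the Ollama host has headroom:
    # zero cost, sub-second latency, data stays on-device.
    if kind == "simple" and local_load < 0.8:
        return "gemma-local"
    return TIER_MODELS[(tier, kind)]
```

A routing call then looks like `route("what's my booking status?", tier="pro")`, which would land on local Gemma, while a multi-step planning request from the same user would go to Sonnet.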
16 fleet agents were reconfigured for the new routing model. Andy (user concierge), Sam (venue agent), and 14 specialized agents now declare their preferred model tier and the router respects it with graceful fallback.
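Agent-declared preferences with graceful fallback could be sketched as a preference plus an ordered fallback chain. The agent names come from the post; the `Agent` class, the chain ordering, and each agent's preferred tier are illustrative assumptions.

```python
# Sketch of agents declaring a preferred model tier, with the router
# honoring the preference and degrading gracefully. The Agent class,
# fallback chain, and per-agent preferences are assumptions.
from dataclasses import dataclass

# Most-capable first; the router walks down from the agent's preference.
FALLBACK_CHAIN = ["claude-sonnet", "claude-haiku", "gemma-local"]


@dataclass
class Agent:
    name: str
    preferred_model: str


def pick_model(agent: Agent, available: set[str]) -> str:
    """Honor the agent's preference, then walk down the fallback chain."""
    start = FALLBACK_CHAIN.index(agent.preferred_model)
    for model in FALLBACK_CHAIN[start:]:
        if model in available:
            return model
    raise RuntimeError(f"no model available for {agent.name}")


andy = Agent("Andy", preferred_model="claude-sonnet")  # user concierge
sam = Agent("Sam", preferred_model="gemma-local")      # venue agent
```

If Sonnet is unavailable, Andy degrades to Haiku rather than failing outright; Sam, preferring local Gemma, never escalates to a costlier model.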
The trade-off: pure local is cheap but limited. Pure cloud is capable but expensive. Smart routing captures the best of both. Early measurements show a 40% reduction in cloud LLM spend with no measurable quality degradation on simple tasks.