Stop paying Opus prices for Haiku work.
Most AI coding setups default to the most capable model for every task — including fixing a typo. That's a 4x cost penalty on work a cheaper model handles just as well.
Reuven Cohen put it plainly: the going rate for a single developer running Claude Code in swarm mode is ~$2,500/day. Not because the models are expensive — because the same context keeps getting resent, and because every task, regardless of complexity, gets routed to the heaviest model available.
Fixing a typo in an error message does not require Opus. Renaming a variable does not require Opus. Adding a log line does not require Opus. But if your tooling doesn't know that, it will happily bill you as if it does.
The routing table
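Agent Booster's new route_model tool uses three signals to recommend a model tier: complexity keywords in the task description, the number of distinct files the task touches, and the number of matching symbols.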
| Signal | Model |
|---|---|
| Complexity keyword: refactor, architect, design, migrate, security… | Opus |
| Task touches 5+ distinct files | Opus |
| Task touches 2–4 files | Sonnet |
| 1 file, fewer than 3 matching symbols | Haiku |
| Everything else | Sonnet (default) |
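Read top to bottom, the table behaves like a first-match-wins rule. The sketch below is one way to express that in Python; it is an illustration, not Agent Booster's actual implementation. The name `pick_tier`, its parameters, and every reason string other than the keyword one are assumptions. The keyword tuple holds only the five keywords the table spells out (the real list is longer), and the file and symbol counts are passed in directly instead of coming from a search over the index.

```python
# Hypothetical sketch of the routing table; not Agent Booster's real code.
# Only the keywords the table lists explicitly are included here.
COMPLEXITY_KEYWORDS = ("refactor", "architect", "design", "migrate", "security")


def pick_tier(task: str, files_touched: int, matching_symbols: int) -> dict:
    """Apply the table rows in order and return a {"model", "reason"} dict."""
    lowered = task.lower()
    # Row 1: any complexity keyword sends the task to the heaviest tier.
    for keyword in COMPLEXITY_KEYWORDS:
        if keyword in lowered:
            return {"model": "opus", "reason": f"complexity keyword: {keyword}"}
    # Rows 2 and 3: route on the number of distinct files involved.
    if files_touched >= 5:
        return {"model": "opus", "reason": f"{files_touched} files touched"}
    if 2 <= files_touched <= 4:
        return {"model": "sonnet", "reason": f"{files_touched} files touched"}
    # Row 4: a single file with fewer than 3 matching symbols is a narrow task.
    if files_touched == 1 and matching_symbols < 3:
        return {"model": "haiku", "reason": "narrow task"}
    # Row 5: everything else falls back to the default tier.
    return {"model": "sonnet", "reason": "default"}
```

Calling `pick_tier("fix typo in error message", 1, 1)` lands on the Haiku row, while any task description containing "refactor" returns Opus before the file counts are even considered.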
How it works
When you call route_model(task), Booster first checks the task description for complexity keywords. If none match, it runs a vector search across your indexed codebase to count how many distinct files are semantically relevant. Narrow task, few files, few symbols → Haiku. Broad cross-cutting task → Opus.
The result is a JSON object the calling agent can act on immediately:
# narrow task
{ "model": "haiku", "reason": "narrow task — 1 symbol in 1 file" }
# architecture change
{ "model": "opus", "reason": "complexity keyword: refactor" }
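A calling agent might consume that object roughly as follows. This is a minimal sketch under stated assumptions: `model_for` and `TIER_TO_MODEL_ID` are hypothetical names, the identifiers in the mapping are placeholders to replace with whatever model names your client expects, and falling back to Sonnet on a missing or unknown tier is a choice made here to mirror the table's default row.

```python
import json

# Placeholder identifiers: substitute the model names your client expects.
TIER_TO_MODEL_ID = {
    "haiku": "your-haiku-model-id",
    "sonnet": "your-sonnet-model-id",
    "opus": "your-opus-model-id",
}


def model_for(route_result_json: str) -> str:
    """Map a route_model result to a model identifier, defaulting to Sonnet."""
    result = json.loads(route_result_json)
    # A missing or unrecognized tier falls back to the default (assumption).
    tier = result.get("model", "sonnet")
    return TIER_TO_MODEL_ID.get(tier, TIER_TO_MODEL_ID["sonnet"])
```

Keeping the tier-to-identifier mapping in one place means the routing output can stay stable even when the concrete model names change.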
Using it from the CLI
You don't need to be inside an MCP session to use it. The CLI command works anywhere in your project:
# returns the recommended tier before you start
booster route "fix typo in error message"
haiku (narrow task — 1 symbol in 1 file)
booster route "refactor auth middleware to support OAuth2"
opus (complexity keyword: refactor)
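If you want to call the CLI from a script, something like the sketch below could work. It assumes the output is a single line shaped like the two samples above, `<tier> (<reason>)`, which is inferred from those samples only and may not hold for every case. `parse_route_line` and `route` are hypothetical helper names.

```python
import re
import subprocess

# Assumed output shape, inferred from the two samples: "<tier> (<reason>)".
ROUTE_LINE = re.compile(r"^(?P<tier>\w+)\s+\((?P<reason>.*)\)\s*$")


def parse_route_line(line: str) -> tuple[str, str]:
    """Split one line of `booster route` output into (tier, reason)."""
    match = ROUTE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized route output: {line!r}")
    return match.group("tier"), match.group("reason")


def route(task: str) -> tuple[str, str]:
    """Run `booster route` for a task and parse the last line it prints."""
    completed = subprocess.run(
        ["booster", "route", task], capture_output=True, text=True, check=True
    )
    lines = completed.stdout.strip().splitlines()
    if not lines:
        raise ValueError("booster route produced no output")
    return parse_route_line(lines[-1])
```

Raising on an unrecognized line, instead of guessing a tier, keeps a format change from silently routing everything to the wrong model.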
Where this fits in the stack
Agent Booster operates at three compounding layers:
- Layer 3: AST + semantic routing (smart_read, search_context, get_symbols, route_model). Reduces what the model sees per call.
- Layer 2: RTK. Strips noise from CLI, git, build, and test output before it enters the context window.
- Layer 1: Prompt caching. Stable prefixes cached at 90% discount. Native to Claude Code and the Anthropic API.
route_model is the newest addition to Layer 3. It doesn't change what context is sent — it changes which model receives it. Combined with smart_read already cutting token volume 3–15x, routing to Haiku on narrow tasks compounds the savings further.
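As a back-of-the-envelope check on how the two savings compound, the sketch below multiplies the token reduction by the per-token price ratio. It assumes cost scales linearly with input tokens times per-token price, takes the 4x penalty quoted at the top of this post as the price ratio between tiers, and ignores output tokens and caching, so treat the numbers as an illustration, not a measurement. `combined_reduction` is a hypothetical helper.

```python
def combined_reduction(token_reduction: float, price_ratio: float) -> float:
    """Cost reduction factor when token volume and per-token price both drop."""
    if token_reduction <= 0 or price_ratio <= 0:
        raise ValueError("both factors must be positive")
    # Cost is modeled as tokens * price, so the two factors multiply.
    return token_reduction * price_ratio


# smart_read's quoted 3-15x range combined with the quoted 4x price gap.
low = combined_reduction(3, 4)    # 3 * 4 = 12
high = combined_reduction(15, 4)  # 15 * 4 = 60
```

Under those assumptions, a narrow task that benefits from both layers would cost somewhere between 12x and 60x less on input tokens than sending the full context to the heaviest model.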
Get it
# install
pip install agent-booster
# index your codebase
booster index
# route a task
booster route "your task here"
The MCP tool is available automatically once you run booster init claude. No config changes needed — it shows up alongside get_symbols, search_context, and smart_read.
