The numbers from our own sessions tell the story. Across 2,221 commands, RTK saved 2.8 million tokens — a 93.2% reduction. The single biggest win was rtk read, which saved 2.4M tokens across just 42 calls. That's roughly 57,000 tokens per file read, on average, filtered down to what actually matters.

This post explains how RTK works and why the savings are so large.

The problem: tools are chatty

AI coding tools operate by running shell commands and feeding the output back to the model. The model then decides what to do next. This loop is powerful — but the raw output of most developer tools is designed for humans, not for language models.

Consider what a typical session looks like without filtering:

Tokens consumed per command without RTK

Command	Approx. tokens
git log --oneline -20	~800
git diff HEAD~1	~3,200
pnpm install	~2,400
cat src/app/executor.py	~12,000
eslint src/	~40,000

A busy session runs 50–100 commands. At the per-command sizes above, unfiltered output adds up to somewhere between several hundred thousand and more than a million tokens, most of it noise the model scrolls past anyway.
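To make the session estimate concrete, here is a rough back-of-the-envelope sketch. It assumes the five commands quoted above are representative of a session's mix, which a real session may not be; the function name is ours, purely for illustration.

```python
# Per-command token counts quoted in the list above.
RAW_TOKENS = {
    "git log --oneline -20": 800,
    "git diff HEAD~1": 3_200,
    "pnpm install": 2_400,
    "cat src/app/executor.py": 12_000,
    "eslint src/": 40_000,
}


def estimate_session_tokens(commands_run: int) -> int:
    """Scale the mean of the quoted examples to a session of `commands_run` commands."""
    mean_tokens = sum(RAW_TOKENS.values()) / len(RAW_TOKENS)
    return round(mean_tokens * commands_run)


for n in (50, 100):
    print(f"{n} commands: ~{estimate_session_tokens(n):,} tokens")
```

The estimate is dominated by the large outliers (the linter run and the full file read), which is exactly where filtering pays off most.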

What RTK does

RTK (Rust Token Killer) is a transparent proxy. You prefix any command with rtk and it intercepts the output, applies a filter matched to that command type, and returns only the signal.

For commands RTK knows about, it applies a dedicated filter. For commands it doesn't recognise, it passes the output through unchanged. This means it's always safe to use: there's no risk of accidentally silencing output RTK doesn't understand.
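The filter-or-pass-through behaviour can be sketched in a few lines. This is not RTK's actual implementation (RTK is written in Rust and its internals aren't shown here); the registry, the placeholder filter, and the function names are all hypothetical, chosen only to illustrate the dispatch idea.

```python
import subprocess
from typing import Callable

Filter = Callable[[str], str]


def drop_blank_lines(output: str) -> str:
    """Placeholder filter: remove empty lines. Real filters would be command-specific."""
    return "\n".join(line for line in output.splitlines() if line.strip())


# Hypothetical registry mapping a command name to its filter.
FILTERS: dict[str, Filter] = {"git": drop_blank_lines}


def apply_filter(argv: list[str], output: str) -> str:
    """Filter the output if the command is recognised; otherwise return it unchanged."""
    flt = FILTERS.get(argv[0]) if argv else None
    return flt(output) if flt else output


def run_filtered(argv: list[str]) -> str:
    """Run a command, capture its output, and hand it to the matching filter."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return apply_filter(argv, result.stdout + result.stderr)
```

The important design choice is the default: an unknown command falls through to the identity path, so adding the prefix can never hide output that no filter was written for.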

Command category	Typical savings	What gets stripped
Tests (vitest, jest, pytest)	90–99%	Passing test names, progress bars, timing
Build (next, tsc, eslint)	70–87%	Route tables, asset sizes, banner ASCII art
Git (status, log, diff)	59–80%	Unchanged files, decorative formatting
Package managers (pnpm, npm)	70–90%	Progress lines, audit noise, dependency tree
Infrastructure (docker, kubectl)	85%	Container IDs, layer hashes, timestamps

The test filter: why 99% is achievable

The highest savings come from test runners. A typical vitest run for a large project might print 800 passing test names, 40 lines of timing breakdown, a progress bar, and finally — buried at the bottom — one failing test.

The model needs exactly one thing: what failed and why. RTK's test filter keeps only failure blocks and the summary line. Everything else is discarded. That's where savings of up to 99% come from: not compression, but surgical removal of content that carries zero information for the task.

Raw vitest output

✓ auth.test.ts (42 tests)

✓ user.test.ts (18 tests)

✓ api.test.ts (31 tests)

... 790 more passing lines

✕ executor.test.ts

AssertionError: expected…

Duration: 14.2s

RTK output

✕ executor.test.ts

AssertionError: expected…

1 failed, 881 passed
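A minimal sketch of this kind of filter, written against the simplified output shown above rather than real vitest output. The markers, the "(N tests)" pattern, and the rule that a failure block ends at the next passing line or at the Duration line are assumptions made for illustration; RTK's real filter may work differently.

```python
import re

PASS_MARK = "✓"
FAIL_MARK = "✕"


def filter_test_output(raw: str) -> str:
    """Keep failure blocks and emit a one-line summary; drop everything else."""
    kept: list[str] = []
    passed = failed = 0
    in_failure = False
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.startswith(PASS_MARK):
            # Passing file: count its tests, print nothing.
            in_failure = False
            match = re.search(r"\((\d+) tests?\)", stripped)
            passed += int(match.group(1)) if match else 1
        elif stripped.startswith(FAIL_MARK):
            # Failing file: start keeping lines.
            in_failure = True
            failed += 1
            kept.append(stripped)
        elif in_failure and stripped and not stripped.startswith("Duration"):
            # Body of the failure block (assertion message, stack trace, ...).
            kept.append(stripped)
    kept.append(f"{failed} failed, {passed} passed")
    return "\n".join(kept)
```

Note that the passing lines are not compressed or summarised individually; they are counted and dropped, which is why the output size depends on the number of failures rather than the size of the suite.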

The CLAUDE.md hook

RTK integrates via a hook in CLAUDE.md. Once configured, Claude Code automatically prefixes commands with rtk — you don't have to remember to do it manually. The savings accumulate silently across every session.

Track them any time:

rtk gain

Total commands: 2221

Tokens saved: 2.8M (93.2%)

Efficiency: ████████████████████░░ 93.2%
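The headline numbers can be cross-checked with a little arithmetic. The sketch below assumes the reported percentage is tokens saved divided by the unfiltered total; that definition is our reading of the output, not something the tool's output states.

```python
def raw_total(tokens_saved: float, reduction: float) -> float:
    """Unfiltered token count implied by the savings, assuming reduction = saved / raw."""
    return tokens_saved / reduction


saved = 2_800_000   # "Tokens saved: 2.8M"
reduction = 0.932   # "93.2%"

total = raw_total(saved, reduction)
remaining = total - saved

print(f"unfiltered: ~{total / 1e6:.2f}M tokens")
print(f"reached the model: ~{remaining / 1e6:.2f}M tokens")
print(f"rtk read: ~{2_400_000 / 42:,.0f} tokens saved per call")
```

Under that assumption, roughly 3.0M tokens of raw output were reduced to about 0.2M, and the per-call figure for rtk read matches the ~57,000 quoted at the top of the post.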

RTK as Layer 2

RTK sits at Layer 2 of the Agent Booster stack — between prompt caching (Layer 1) and AST-level symbol routing (Layer 3). The three layers compound:

L1: Prompt caching cuts cost on stable context that's already been sent.
L2: RTK cuts the volume of tool output before it enters the context window.
L3: Agent Booster cuts what code the model reads (AST routing, smart_read, route_model).

Together they're responsible for the bulk of the 3–15x cost reduction we see in production sessions. RTK alone accounts for the 93% CLI noise reduction shown above.
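One way to picture how layers compound is a simple multiplicative model: if each layer leaves some fraction of the cost it receives, the overall reduction factor is the reciprocal of the product of those fractions. This is an idealisation (in practice the layers act on different parts of the context), and the fractions below are placeholders, not measurements from our sessions.

```python
from math import prod


def combined_reduction(kept_fractions: list[float]) -> float:
    """Overall reduction factor when each layer keeps a fraction of the cost it receives."""
    return 1 / prod(kept_fractions)


# Hypothetical example: three layers that each keep half of what reaches them.
print(f"{combined_reduction([0.5, 0.5, 0.5]):.1f}x")
```

Even modest per-layer savings multiply quickly under this model, which is the intuition behind stacking the layers rather than relying on any single one.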

RTK: how we cut 93% of CLI token noise from AI coding sessions.
