Vol. I · No. 1 · THE FIRST, TOKEN ISSUE · AD MMXXVI
Field Notes · CLAUDE’S ADVISOR TOOL

Claude’s Advisor tool

According to Anthropic, the Advisor strategy uses a fast, low-cost model to do 90% of the mechanical work, escalating only when it needs strategic advice. In my runs, the reality was starkly different.

Experimentation Record (FIG. 2-B)
Hypothesis: Using an advisor agent will get the benefits of a high-powered model without the increased token spend.
Model: Opus 4.8
Trials: 4
Duration: 3 days
Scored by: Analysing logs for token spend
Method: Multiple runs conducted on the same codebase, on different branches. The same prompt was applied each time (with "rely on advisor" omitted for the control run).
Outcome: The advisor model increased token spend and lengthened elapsed time, with no measurable improvement.

The idea was simple: use a lower-level model (Sonnet 4.6) and call a more powerful model (Opus 4.8) only when needed, through Claude’s Advisor Tool, then compare the token usage and efficiency against using just one model.

The answer was surprising: no efficiencies were observed.

Background

The standard way to implement agents is to have a powerful orchestrator model plan the work and then delegate tasks to less powerful sub-agents. Anthropic describes the Advisor Tool as the inverse:

"The Advisor Tool [...] flips traditional multi-agent systems upside down. Instead of a large, expensive model decomposing work and delegating to smaller models, the Advisor strategy uses a fast, low-cost model to do 90% of the mechanical work, escalating only when it needs strategic advice."

This is more akin to how work happens in an engineering team. For the most part, mid-level engineers can pick up tickets and write code autonomously, but occasionally they’ll come across something tricky that requires more consideration. That is when they’ll consult a more experienced engineer on the team. It saves the experienced engineer from having to plan everything, but leverages their skills at the point they’re most needed.

Only, it turns out in this case the Advisor model has some fatal flaws of its own.

Method: The Setup

Install the advisor tool

ant beta:messages create --beta advisor-tool-2026-03-01 <<'YAML'
model: claude-sonnet-4-6
max_tokens: 4096
tools:
  - type: advisor_20260301
    name: advisor
    model: claude-opus-4-8
messages:
  - role: user
    content: Build a concurrent worker pool in Go with graceful shutdown.
YAML

If you get errors about API keys, they seem to be unfounded. Confirm the advisor is installed by running /advisor.

Method: Application

Find an appropriate problem to solve. Claude’s docs state:

The advisor is a weaker fit for single-turn Q&A (nothing to plan), pure pass-through model pickers where your users already choose their own cost and quality tradeoff, or workloads where every turn genuinely requires the advisor model's full capability.

So I decided to get it to refactor some styles. I figured the task was relatively simple but required some thought about approach. I was keen to see where the advisor was consulted, but didn’t want to run a long workflow.

Other variables to note: I usually have a clear documentation workflow for agents to pick up, but for this test I made sure there was no context, memory or documented approach in the repository itself. I wanted to see what the advisor tool did independently of a defined coding style guide.

Plan a consistent structure for components and styling across 
the app using best practices, rely on advisor to confirm approach

Observation & Analysis

Then I watched the command line, providing permissions where needed. One highlight was seeing how the agents spoke to one another; it gave me a kick to see the virtual high-five.

Before quality-checking the output, I wanted to see what had been going on under the hood. To do this I located the JSONL file for the session:

Navigate to ~/.claude/projects/

Each subdirectory corresponds to a project, and the .jsonl files inside are individual sessions. 
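Finding the right session by hand gets tedious once a project has a few of them. Here is a minimal sketch that lists session files newest first, assuming only the layout described above (one subdirectory per project under ~/.claude/projects/, with .jsonl session files inside). The function name list_sessions is my own, not part of any Claude tooling.

```python
from pathlib import Path


def list_sessions(projects_dir: Path) -> list[Path]:
    """Return every session .jsonl one level below projects_dir, newest first."""
    if not projects_dir.is_dir():
        return []  # nothing to list if the directory does not exist
    sessions = projects_dir.glob("*/*.jsonl")
    return sorted(sessions, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumed default location, taken from the path mentioned in the text.
    default_dir = Path.home() / ".claude" / "projects"
    for path in list_sessions(default_dir)[:5]:
        print(path.parent.name, path.name)
```

The most recently modified file is usually the session you just ran, which saves guessing at session IDs.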

I asked Claude to check its work:

Analyse @filename.jsonl and tell me the token spend for this session, 
break it down by model and function and types of tokens; 
include the time taken for actions to complete

At first it omitted the advisor agent spend, but then it gave me a pretty decent summary:

Corrected Token Spend

claude-sonnet-4-6 — 74 turns

┌──────────────────┬───────────┬─────────────┬─────────┐
│ Token type       │     Count │        Rate │    Cost │
├──────────────────┼───────────┼─────────────┼─────────┤
│ Input (uncached) │       264 │  $3.00/MTok │ $0.0008 │
├──────────────────┼───────────┼─────────────┼─────────┤
│ Cache creation   │   297,703 │  $3.75/MTok │ $1.1164 │
├──────────────────┼───────────┼─────────────┼─────────┤
│ Cache read       │ 4,818,258 │  $0.30/MTok │ $1.4455 │
├──────────────────┼───────────┼─────────────┼─────────┤
│ Output           │    50,435 │ $15.00/MTok │ $0.7565 │
├──────────────────┼───────────┼─────────────┼─────────┤
│ Subtotal         │           │             │   $3.32 │
└──────────────────┴───────────┴─────────────┴─────────┘

claude-opus-4-8 — 8 advisor calls

┌──────────────────┬─────────┬─────────────┬─────────┐
│ Token type       │   Count │        Rate │    Cost │
├──────────────────┼─────────┼─────────────┼─────────┤
│ Input (uncached) │ 581,360 │ $15.00/MTok │ $8.7204 │
├──────────────────┼─────────┼─────────────┼─────────┤
│ Cache creation   │       0 │ $18.75/MTok │ $0.0000 │
├──────────────────┼─────────┼─────────────┼─────────┤
│ Cache read       │       0 │  $1.50/MTok │ $0.0000 │
├──────────────────┼─────────┼─────────────┼─────────┤
│ Output           │  62,160 │ $75.00/MTok │ $4.6620 │
├──────────────────┼─────────┼─────────────┼─────────┤
│ Subtotal         │         │             │  $13.38 │
└──────────────────┴─────────┴─────────────┴─────────┘

---

Grand Total: ~$16.70

Opus advisor calls account for 80% of the total cost despite being only 8 of 82 model invocations. Each advisor call sent ~72,670 input tokens (the full conversation context, uncached) and produced ~7,770 output tokens. At Opus rates those 8 calls alone cost more than 4× the entire Sonnet session.

The key callout being:

Opus advisor calls account for 80% of the total cost despite being only 8 of 82 model invocations.
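Since the summary came from Claude itself, it is worth re-deriving the headline figures. The sketch below recomputes them from nothing but the token counts and per-million-token rates in the two tables above; the dictionary layout and the helper name cost are my own.

```python
MTOK = 1_000_000

# (token count, USD per million tokens), copied from the summary tables.
SONNET = {
    "input": (264, 3.00),
    "cache_creation": (297_703, 3.75),
    "cache_read": (4_818_258, 0.30),
    "output": (50_435, 15.00),
}
OPUS = {
    "input": (581_360, 15.00),
    "cache_creation": (0, 18.75),
    "cache_read": (0, 1.50),
    "output": (62_160, 75.00),
}
ADVISOR_CALLS = 8  # number of Opus advisor calls reported for this session


def cost(rows: dict[str, tuple[int, float]]) -> float:
    """Total USD cost for one model's token rows."""
    return sum(count * rate / MTOK for count, rate in rows.values())


sonnet_cost = cost(SONNET)
opus_cost = cost(OPUS)
total = sonnet_cost + opus_cost

print(f"Sonnet ${sonnet_cost:.2f}, Opus ${opus_cost:.2f}, total ${total:.2f}")
print(f"Opus share of total cost: {opus_cost / total:.0%}")
print(f"Opus cost relative to Sonnet: {opus_cost / sonnet_cost:.1f}x")
print(f"Average input tokens per advisor call: {OPUS['input'][0] / ADVISOR_CALLS:,.0f}")
print(f"Average output tokens per advisor call: {OPUS['output'][0] / ADVISOR_CALLS:,.0f}")
```

The numbers agree with the summary: $3.32 for Sonnet, $13.38 for Opus, roughly $16.70 in total, and an 80% Opus share.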

So the obvious question is…

why are the input tokens for claude opus so high? Is it just that this model uses more tokens for each prompt?

It turns out that, by design, each advisor call was handed the whole context; moreover, none of it was cached. This is very strange behaviour given that the Claude docs page states:

You currently use Sonnet on complex tasks: Add Opus as the advisor for a quality lift at similar or lower total cost.
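To put the missing cache in perspective, here is a rough what-if using only the Opus rates from the summary table. It assumes, optimistically, that every advisor input token could have been billed at the cache-read rate. Real caching would also incur cache-creation charges on first write, so treat the result as a floor, not a forecast.

```python
OPUS_INPUT_TOKENS = 581_360  # total input across the 8 advisor calls (from the table)
UNCACHED_RATE = 15.00        # USD per MTok, uncached input
CACHE_READ_RATE = 1.50       # USD per MTok, cache read


def input_cost(tokens: int, rate_per_mtok: float) -> float:
    """USD cost of a number of input tokens at a per-million-token rate."""
    return tokens * rate_per_mtok / 1_000_000


actual = input_cost(OPUS_INPUT_TOKENS, UNCACHED_RATE)
idealised_floor = input_cost(OPUS_INPUT_TOKENS, CACHE_READ_RATE)  # hypothetical best case

print(f"Advisor input as billed:        ${actual:.2f}")
print(f"Same tokens at cache-read rate: ${idealised_floor:.2f}")
print(f"Difference:                     ${actual - idealised_floor:.2f}")
```

Even as an idealised bound, the gap shows why uncached advisor calls make "similar or lower total cost" hard to reach.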

When I asked whether we could specify a cache, Claude told me:

No — the tool description doesn't control caching. It's just text that tells the model when to call the tool. The actual API call to Opus (including whether cache_control is set on the messages) is made by the Claude Code harness binary, not by anything configurable in the project.

It then suggested I raise a GitHub issue. It turns out someone already has!

On further reading of the documentation, it appeared there may be a way of setting the Advisor model to use the cache, so runs 2 and 3 were my attempts to invoke this.

Raw observation log

Run 1

Prompt: Plan a consistent structure for components and styling across the app using best practices, rely on advisor to confirm approach

Advisor enabled

Logs show failed to cache with Opus 4.8

Run 2

Note: despite the GitHub issue, the documentation appears to describe a way of setting a cache for the advisor. The docs don’t have any history, so I am unsure whether this is a recent update.

Rerun with caching as per instructions.

Logs show failed to cache with Opus 4.8

Run 3

Tried again using cache_control, which the API was telling me was the correct format. The run exited prior to implementation.

Logs show failed to cache with Opus 4.8

Run 4 (control)

Using Sonnet 4.6 without an advisor model, I ran the same prompt, minus "rely on advisor".

Scoring Matrix

I started this experiment believing I would be comparing the quality of output from different modes, but I shifted to focussing on the behaviour of the agents themselves.

I wrote a script to summarise each session and discovered that:

  • The advisor accounts for around 50% of total cost every time it's used, despite being only 8–11% of API calls
  • Run 2 was the outlier at $17.3515 — 14 advisor calls instead of 8, and a larger context window per call (~119K tokens vs ~73K), plus full implementation after planning
  • Run 3 was the cheapest advisor run at $7.78 because it stopped at ExitPlanMode with no code written
  • The control ran the same task for $0.86 — 1/17th the cost — by skipping the advisor entirely
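A per-session summary of this kind boils down to tallying token usage per model. The sketch below is not the analyse-session.py artefact. It is a minimal illustration that assumes each JSONL line is a JSON object whose message field carries a model name and a usage object with the four token-count fields named in TOKEN_FIELDS. Those field names are an assumption about the log format, and advisor calls may be recorded elsewhere in the file, so check a real session before trusting the totals.

```python
import json
from collections import defaultdict
from pathlib import Path

# Assumed usage field names; verify against an actual session file.
TOKEN_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def tally_by_model(session_path: Path) -> dict[str, dict[str, int]]:
    """Sum call counts and token counts per model across one session file."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"calls": 0, **dict.fromkeys(TOKEN_FIELDS, 0)}
    )
    with session_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or malformed lines
            message = entry.get("message") if isinstance(entry, dict) else None
            if not isinstance(message, dict):
                continue
            model, usage = message.get("model"), message.get("usage")
            if not model or not isinstance(usage, dict):
                continue  # entries without usage data (e.g. user turns)
            row = totals[model]
            row["calls"] += 1
            for field in TOKEN_FIELDS:
                row[field] += usage.get(field) or 0
    return dict(totals)
```

Calling tally_by_model(Path("session.jsonl")) returns one row per model. Multiplying each column by the relevant rate then gives cost per model and the advisor's share of it.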

Conclusions

Writing a plan upfront with a more powerful model and then delegating to sub-agents remains my preferred approach. Returning to the advisor seems to slow the process down, whereas spinning up smaller sub-agents feels more efficient to a human looking on.

The outcomes were similar from all 3 runs. Note that the prompt said “structure for components and styling”, but all 3 runs chose to focus on styling almost in isolation. My hypothesis is that because the more powerful model is only called as a verifier, it is unlikely to steer unless vital course correction is needed, so you get no additional benefit but a massively increased token spend.

Also, although I cleared context and used new git branches, I believe some earlier approaches may have been saved in memory. Note to self: find out where Claude saves files across the filesystem before running a similar experiment.

Applications beyond experiment

The value of:

  1. Planning up front to get a good steer and save on tokens.
  2. Correct documentation (thanks Claude).
  3. Examining logs directly to understand the effectiveness and expense of different AI techniques.
  4. A nice little Python script I got Claude to write that I can run against future sessions without incurring further spend, made available to you (see artefacts below).

Run it with

./analyse-session ~/.claude/projects/[project-name]/[sessionid].jsonl

Happy experimenting!

Outcome Record (FIG. 2-A)
Status: Failure
Date closed: 25 June 2026
Runs: 4
Artefacts (FIG. 2-C)
  • MD
    control-run-report.md

    Report produced after final control run - no advisor tool

    18.6 KB
  • MD
    run1-report.md

    Report produced after first advisor run

    26.6 KB
  • MD
    run2-report.md

    Report produced after second advisor run

    37.9 KB
  • MD
    run3-report.md

    Report produced after third advisor run

    20.9 KB
  • FILE
    plan-no-advisor

    Sample of solution Sonnet 4.6 proposed independently

    2.6 KB
  • FILE
    plan-with-advisor

    Sample of solution Sonnet 4.6 proposed with assistance of Opus 4.8

    9.0 KB
  • PY
    analyse-session.py

    CLI script to analyse token use in Claude sessions

    16.7 KB
