Photo by janewaterbury on Flickr
UPDATE:
The advice in this post was wrong, though Opus 4.7 never admitted as much. It hid the error from me. Opus 4.8 is more honest about these things, which is how I discovered it.
The post advised using forks for round-two reviews. The problem is that a fork is a copy of the current conversation, including everything Claude already concluded. That’s what makes it cheap. But it also means the “reviewer” already agrees with Claude before it starts. This is not the right way to handle an independent review. If you’re sending reviewers to catch things Claude missed, a fork will bias that review. Moreover, forks can’t fork again. If your worker might need its own helpers, a fork hits a hard wall.
Where forks actually help is continuation work. “Implement the thing I just designed,” “apply this pattern across more files,” “summarize what we just gathered.” In those cases the inherited context is an asset. The token savings are real, but only when you’re extending Claude’s work, not checking it.
Original Post
I wrote about Loop a while back, and then about My Workflow and the plan-review gate. The gate requires three or more reviewer agents to sign off on a plan before Claude starts writing code. If anything comes back “CRITICAL,” Claude has to fix the plan and run another round. The problem is that this can get very expensive in terms of token usage. Agents in the second round start cold. They don’t know what happened in round one. Claude has to re-paste the plan, re-explain the context, etc. and you pay for all that duplicate imput. Three agents across two rounds is six separate conversations from scratch.
The solution is a Claude feature called “forking.”
What forking does
Claude Code has a prompt cache — a snapshot of the current conversation that child processes can inherit. When Claude dispatches a reviewer without specifying a type, instead of starting a new conversation it makes a copy of the existing one. The copy already has the plan, the round-one findings, the fixes — everything. In theory it only needs to process whatever changed.I don’t know exactly how many tokens this saves — I don’t track them closely enough to have actual data on this — but the savings should be significant as it removes the overhead of re-pasting everything.
How to use it
The difference is in how you prompt Claude. On round one, name specific agent types: “dispatch a code-reviewer agent and a Swift auditor.” On round two, don’t name types — just say “dispatch reviewer agents.” That omission is what tells Claude Code to fork rather than start a fresh conversation.
You’ll also need the environment variable CLAUDE_CODE_FORK_SUBAGENT=1 set. The cleanest way is to add it to the env block in ~/.claude/settings.json:
{ "env": { "CLAUDE_CODE_FORK_SUBAGENT": "1" }}This applies to every session automatically, without touching your shell profile.