The Architecture of Agency Part 2: "Mythos," the 30-Minute Think Mode, and Anthropic's Secret "Undercover" Contributions to Open Source

This is Part 2 of a 5-part series: "The Architecture of Agency." Part 1 covered the harness architecture (Bun runtime, context pipeline, YOLO classifier). Now we dig into the model roadmap and the features Anthropic never intended us to see.

While Part 1 was about the engine, Part 2 is about the map. By examining the models/, feature-gates/, and constants.ts files in the leaked source, researchers have reconstructed Anthropic’s internal model hierarchy, unreleased capabilities, and a controversial stealth mode for open source contributions.

1. The Animal Farm: Internal Model Codenames

Anthropic uses animal codenames internally. The leaked constants.ts file revealed the full zoo:

Codename	Public Name	Status
Fennec	Claude Opus 4.6	Current flagship (production)
Capybara	Claude Mythos	Next tier above Opus — in testing with early access customers
Numbat	Unknown	Unreleased, still in testing
—	Opus 4.7	Referenced as forbidden string in Undercover Mode
—	Sonnet 4.8	Referenced as forbidden string in Undercover Mode

Evasion Detail: The string "capybara" is hex-encoded in the source code to evade Anthropic’s own internal leak detection systems. This means Anthropic anticipated that codenames might leak and built automated scanners to catch them — but the hex encoding bypassed their own defenses. An ironic failure of security-in-depth.

2. The Capybara Regression: A Documented Quality Problem

This is arguably the most alarming finding in the entire leak. Internal comments in the codebase document a performance regression on Capybara (Mythos):

Capybara v4	16.7% false claims rate
Capybara v8	29–30% false claims rate

This is not a minor tuning issue. The model that Anthropic describes as a "step change" above Opus is currently getting worse at accuracy with each iteration. Nearly 1 in 3 claims the model makes are false.

For an AI coding agent, "false claims" means the model confidently asserts something about your codebase that isn’t true — a function exists when it doesn’t, a bug is fixed when it isn’t, a test passes when it fails. At 29%, this is a reliability crisis for agentic tasks where the AI operates autonomously.

The Statistical Perspective: A regression from 16.7% to 29% false claims represents a 74% increase in error rate. In any production ML pipeline, this would trigger an immediate rollback. The fact that it’s documented in code comments (rather than fixed) suggests Anthropic is aware of the problem but hasn’t solved it yet. This may explain why Mythos hasn’t shipped publicly despite being tested with early access customers.

The code also reveals an "assertiveness counterweight" — logic designed to prevent the model from being too aggressive in its refactors. This suggests Anthropic is battling a fundamental tension: making the model more capable makes it more likely to confidently do the wrong thing.

3. ULTRAPLAN: The 30-Minute Deep Thinking Mode

Hidden behind the ULTRAPLAN feature flag is a capability that fundamentally changes what an AI assistant can do.

3.1 How It Works

Step	Action
1. Invocation	User types `/ultraplan` in the Claude Code terminal
2. Offload	The task is sent to Anthropic’s Cloud Container Runtime (CCR) — a remote compute environment running Opus 4.6
3. Deep Think	The model has a 30-minute budget to iterate, test hypotheses, explore code paths, and refine its plan
4. Notification	When complete, you receive a notification (phone/browser) to review the plan
5. Teleport	On approval, a sentinel value `__ULTRAPLAN_TELEPORT_LOCAL__` transfers the result back to your local terminal

3.2 Why This Matters

Current AI interactions are synchronous — you ask, it answers in seconds. ULTRAPLAN introduces asynchronous cognition: you assign a hard problem, go have lunch, and come back to a thoroughly considered solution.

This is Anthropic’s answer to OpenAI’s o1/"Strawberry" series. While OpenAI focuses on chain-of-thought reasoning within a single response, Anthropic is building extended reasoning across time — the model can actually run code, test it, fix it, and iterate over 30 minutes.

The economic implication: you’re no longer paying per token; you’re paying per thinking-minute. This shifts the AI pricing model from "words generated" to "problems solved."

4. Undercover Mode: Stealth Open Source Contributions

This is the most controversial finding in the entire leak, and the one generating the most debate in the developer community.

4.1 What It Does

The leaked source contains a system prompt for "Undercover Mode" that explicitly instructs Claude Code:

"You are operating UNDERCOVER... Your commit messages... MUST NOT contain ANY Anthropic-internal information. Do not blow your cover."

Specifically, Undercover Mode:

Strips all Co-Authored-By attribution from git commits, removing any trace that Claude Code generated the code
Forbids mentioning internal codenames (Tengu, Capybara, Fennec), Slack channels, internal repo names, or even the phrase "Claude Code" itself
Suppresses model version strings — this is where Opus 4.7 and Sonnet 4.8 were found as "forbidden strings" that must never appear in public commits

4.2 The Ethical Debate

The implications are significant:

The Anthropic Defense	Many developers use AI assistants for coding. Not disclosing AI authorship in commits is common practice and not required by most open source licenses.
The Community Critique	Anthropic didn’t just allow non-disclosure — they engineered a system specifically designed to hide it. The prompt says "Do not blow your cover," implying intent to deceive, not just convenience.
The Security Concern	If Anthropic is making stealth contributions to major open source projects using an AI with a 29% false claims rate, the code quality implications are serious.

5. The 108 Feature Flags

The leaked source documents 108 gated modules that don’t appear in the public package. Beyond ULTRAPLAN, the most notable include:

KAIROS: Persistent background agent (covered in Part 3)
Coordinator Mode: Multi-agent orchestration for parallel task execution
Buddy: A Tamagotchi-style companion pet with 18 species and 5 rarity tiers (covered in Part 5)
Worktree Isolation: Git worktree-based sandboxing for sub-agents
Remote Triggers: Scheduled agent execution via cron

108 hidden features behind flags — in a product that’s been public for less than a year. Anthropic is building significantly faster than they’re shipping.

6. The Routing Engine: The Real Strategy

Zooming out, the model roadmap reveals Anthropic’s strategic direction. They aren’t trying to build one "god model" that does everything. They’re building a hierarchical routing engine:

Haiku	Fast, cheap — for quick classification, auto-approval decisions, simple queries
Sonnet	Balanced — for standard coding tasks, documentation, everyday work
Opus	Powerful — for complex reasoning, large refactors, architectural decisions
Mythos (Capybara)	"God mode" — for kernel-level debugging, cryptographic analysis, cybersecurity, 30-minute deep planning

The code contains explicit routing logic: certain task types are automatically directed to specific model tiers. A grep command doesn’t need Opus. A kernel vulnerability analysis does.

The future of AI pricing isn’t "tokens per dollar." It’s "intelligence per problem." You’ll pay pennies for a Haiku classification and dollars for a Mythos deep-think session. The routing engine decides which one you need.

7. What This Tells Us About Anthropic’s Position

The Bull Case: Anthropic is building the most sophisticated AI development platform in the industry. ULTRAPLAN, KAIROS, 108 feature flags, hierarchical model routing — they’re engineering for a future where AI agents are always-on, asynchronous, and self-directing.

The Bear Case: Their next flagship model has a 29% false claims rate (regression from 16.7%), they’re using it to make undisclosed contributions to open source projects, and they’ve now had two major leaks in a single week (CMS + npm). The gap between their safety messaging and their operational security is widening.

Series Roadmap

Part 1	The "Harness" Is the Moat — Bun, context pipeline, YOLO classifier
Part 2 (This Post)	"Mythos" & The Roadmap — codenames, ULTRAPLAN, Undercover Mode
Part 3	KAIROS — Always-on daemon, 15-second budget, autoDream
Part 4	Prompt Compilation — DANGEROUS_uncached, context poisoning, anti-distillation
Part 5	"Buddy" — Tamagotchi identity anchors, Mulberry32 gacha, persistence

Sources: