The Architecture of Agency Part 4: Prompt Engineering as Compiled Architecture — How Anthropic Treats Prompts as Code, Not Strings

This is Part 4 of a 5-part series: "The Architecture of Agency." Part 1: the harness. Part 2: the model roadmap. Part 3: KAIROS daemon. Now we examine the "dark matter" of the codebase: how Anthropic treats prompts as compiled software, not strings.

Most developers think of prompt engineering as writing clever sentences in a chat box. The leaked src/prompts/ directory and internal utilities reveal something radically different: for Anthropic, prompt engineering is compiled software architecture with versioning, caching layers, security boundaries, and anti-tampering mechanisms.

1. The "Dangerous" Boundary: Static vs. Dynamic Prompt Partitioning

The most revealing naming convention in the entire leak: DANGEROUS_uncachedSystemPromptSection().

1.1 The Architecture

Anthropic strictly partitions the system prompt using a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker:

Layer	Contents	Caching	Cost
Static	Hard-coded rules (behavioral guidelines, tool definitions, safety constraints). Same for all users.	Pre-cached on server	Paid once, reused indefinitely
Dynamic	User’s specific context: CLAUDE.md files, current task, conversation history, tool results	Uncached (DANGEROUS)	Recomputed every turn

Why "DANGEROUS"? Any code that allows the user to influence system-level instructions is explicitly flagged as dangerous in the source. This isn’t just a naming convention — it’s a cost and security boundary. Every change to an uncached section forces recomputation of tokens that would otherwise be free. Engineers see the annotation and know: "changing this costs money and introduces security risk."

1.2 The Economics of Prompt Caching

This partitioning is driven by hard economics. With Anthropic’s token-based pricing, cache invalidation becomes an accounting problem, not just a computer science one.

The leaked promptCacheBreakDetection.ts tracks 14 distinct cache-break vectors — 14 different ways the prompt cache can be invalidated during a session:

Model tier changes (switching between Opus/Sonnet/Haiku)
Tool definition updates
Permission mode changes
Feature flag toggles
System reminder injections
And 9 others

The code implements "sticky latches": once a cache-breaking action occurs, the system does not attempt to restore the cache. It stays broken for the rest of the session.

The User-Facing Impact: This explains a pattern many power users have noticed: sessions that start fast gradually slow down. Each configuration change accumulates cache breaks. The sticky latch design means you cannot recover by reverting the change — the cache is gone for that session. Starting a new session resets all latches.

2. XML as a "Type-Safe" Language for AI

The leak confirms that Claude is specifically tuned to treat XML tags as strict code boundaries — not just markup, but a type system for reasoning.

2.1 The Tag Architecture

The codebase reveals a library (xml-builder-agent) that wraps every piece of context in semantic XML tags:

XML Tag	Purpose
<file_content>	Source code files with path metadata
<terminal_snapshot>	Shell command outputs with exit codes
<thought_trace>	Internal reasoning chains (analysis tags)
<system-reminder>	Mid-conversation instructions injected by the harness
<tool_result>	Structured output from tool calls

2.2 Deterministic Output Parsing

By using rigid XML structure, Anthropic achieves something critical: deterministic parsing of probabilistic output. If the model doesn’t close a tag properly, the harness (from Part 1) catches the malformed output before the user sees it. It silently asks the model to fix the syntax, then displays the corrected result.

This is the engineering equivalent of a compiler’s type checker applied to natural language output. The XML tags create "type boundaries" that the harness can validate structurally, even though the content within them is generated probabilistically.

3. The Semantic Compaction Algorithm

How does Claude maintain coherent reasoning across a 200,000-token context window without "hallucinating" about code it read 50,000 tokens ago? The answer is the WeightedContextPruner — the most valuable piece of IP revealed in the entire leak.

3.1 The Weighting System

Every piece of content in the context window is assigned a weight based on its information density and relevance:

Weight	Content Type	Compaction Behavior
HIGH	Function signatures, exported interfaces, recent edits, user instructions, error messages	Preserved verbatim
MEDIUM	Function bodies with complex logic, test assertions, config values	Preserved but may be summarized
LOW	Implementation details of stable libraries, boilerplate, comments, import statements	Compressed into one-sentence summaries

3.2 The Compaction Process

When the context window approaches capacity, the compaction engine activates:

Weight assignment: Every content block gets scored
Priority queue: Low-weight content is identified for compression
Chain-of-thought summarization: The summarizer uses internal <analysis> tags to reason about what to keep, then strips the reasoning before injecting the summary back
Structural preservation: The "mental map" of the architecture (high-weight items) stays intact while the "noise" (low-weight items) collapses

The Data Science Perspective: This is Lossy Compression for Reasoning. Just as JPEG preserves the structure of an image while discarding imperceptible high-frequency details, the WeightedContextPruner preserves the architecture of a codebase while discarding implementation noise. Anthropic has realized that total recall is impossible in bounded context, so they optimize for maximum signal-to-noise ratio in the tokens that remain.

4. The Context Poisoning Vulnerability

The compaction system’s elegance has a critical flaw, discovered by security researchers after the leak:

The Vulnerability: The summarizer treats all content equally — there is no distinction between user-typed instructions and instructions injected via files.

The Attack: An attacker plants directive-like text in a CLAUDE.md, README.md, or config file. When Claude reads the file and compaction fires, the summarizer faithfully preserves the injected instruction as "user feedback" in the compressed summary. Post-compaction, the model follows it as a genuine directive.

The Persistence: Because compacted summaries are the only surviving record of early-session content, the injected instruction persists for the entire session — even after the original file read has been pruned from context.

This is a novel attack vector specific to AI agents with context compaction. Traditional prompt injection requires the malicious text to be visible in the current context window. This attack survives compaction — it’s effectively a "memory implant" that outlasts its source.

5. Anti-Distillation: Protecting Intellectual Property at the API Level

The leak revealed that Anthropic actively defends against competitors trying to "distill" Claude Code’s behavior by observing its API calls.

5.1 Decoy Tool Injection

When the ANTI_DISTILLATION_CC flag activates, the API request includes anti_distillation: ['fake_tools']. This tells Anthropic’s servers to inject fictional tool definitions into the system prompt.

The result: any competitor recording Claude Code’s API traffic to train their own model will train on poisoned data — their clone will hallucinate tools that don’t exist.

Activation requires four simultaneous conditions: compile-time flag, CLI entrypoint, first-party provider, and a GrowthBook feature gate.

5.2 Connector-Text Summarization

A secondary layer (in betas.ts) buffers assistant responses between tool calls, summarizes them, and attaches cryptographic signatures. Full reasoning chains are stripped from recorded traffic. This ensures that even if API traffic is intercepted, the detailed chain-of-thought that makes Claude effective is not reproducible.

6. The 23 Bash Security Checks

The leaked bashSecurity.ts implements 23 numbered security checks — a defense-in-depth approach to preventing shell-based attacks:

Check Category	Details
Blocked Builtins	18 Zsh builtins blocked from direct execution
Equals Expansion	Defense against Zsh `=curl` bypassing permission checks for `curl`
Unicode Injection	Zero-width space and invisible character detection
IFS Manipulation	Null-byte injection via Internal Field Separator manipulation
Malformed Tokens	Discovered via HackerOne bug bounty review

7. Native Client Attestation

One final defense: the system.ts file reveals a placeholder hash (cch=8101a) that Bun’s native HTTP layer (written in Zig) replaces with a computed hash before requests leave the process. This cryptographically proves that API requests originate from a legitimate Claude Code binary, not a third-party wrapper or man-in-the-middle.

This is binary attestation at the application layer — similar to how iOS apps prove their identity to Apple’s servers, but applied to an AI coding tool.

8. The Frustration Regex

A lighter but revealing detail: the code in userPromptKeywords.ts contains a regex-based frustration detector that scans for keywords like "wtf," "this sucks," and stronger expressions. When triggered, it adjusts the model’s response style — becoming more empathetic and solution-focused.

The design choice is telling: Anthropic uses regex, not LLM inference, for sentiment detection. Regex is instant and costs zero tokens. Using the model for sentiment analysis would be accurate but expensive. This is another example of the "small/cheap gating expensive" pattern we first saw with the YOLO classifier in Part 1.

9. The Big Picture: Prompts as Dynamic Data Structures

The prompt isn’t a string. It’s a dynamic data structure that reconfigures itself in real-time to maximize the signal-to-noise ratio for the model. It has:

• Cache layers (static vs dynamic partitioning)
• Type boundaries (XML tags as structural contracts)
• Compression algorithms (weighted pruning with lossy summarization)
• Security boundaries (23 bash checks, anti-distillation, client attestation)
• Anti-tampering mechanisms (decoy tools, cryptographic signatures)

This is not prompt engineering. This is prompt compilation.

Series Roadmap

Part 1	The "Harness" Is the Moat — Bun, context pipeline, YOLO classifier
Part 2	"Mythos" & The Roadmap — codenames, ULTRAPLAN, Undercover Mode
Part 3	KAIROS — Always-on daemon, 15-second budget, autoDream
Part 4 (This Post)	Prompt Compilation — DANGEROUS_uncached, context poisoning, anti-distillation
Part 5	"Buddy" — Tamagotchi identity anchors, Mulberry32 gacha, persistence

Sources: