May 2026 · AI Infrastructure

Anthropic split your subscription in two. Here's the engineering logic.

Same week. Interactive Claude Code got significantly more permissive. Programmatic Claude Code got significantly more expensive. Those aren't contradictions. They're the same move, and if you run agents in CI or rely on any third-party Claude Code harness, you have until June 15 to understand it.

The compute headroom from the SpaceX/Colossus deal (220,000+ GPUs turned on in Memphis) paid for the interactive generosity. The programmatic carve-out recovers the cache-economics subsidy that was funding a third-party agent ecosystem Anthropic now intends to compete with directly. And the simultaneous shipping of Agent View, /goal, and per-session orchestration flags in Claude Code v2.1.139-143 turns the third-party wrapper category into an Anthropic feature, right at the moment those wrappers got expensive to run on a subscription.

The line between interactive and programmatic Claude Code usage is now load-bearing. Everything downstream (pricing, product strategy, ecosystem positioning, the competitor response) sits on that line. I'm going to walk through how we got here, what the economics actually say, and what to do before June 15.

Figure: Mind map of Anthropic's Great Repricing. Branches: central strategy (split subscription, interactive vs. programmatic divide, subsidy recovery, first-party orchestration); compute infrastructure (SpaceX Colossus deal, 220,000 NVIDIA GPUs, multi-cloud strategy); pricing changes (interactive limits raised, programmatic carve-out); Claude Code updates (Agent View, /goal, supervisor architecture, plugin marketplace); market context (OpenAI Codex promotion, Cursor Ultra, Copilot usage-based); and the technical logic (cache-hit economics, KV cache optimization, ARR protection). Caption: The full landscape: strategy, compute, pricing, orchestration, market context, and the technical logic underneath.

1. The Compute Unlock

On May 6, 2026, Anthropic announced at Code with Claude in San Francisco that it had signed a deal for the entirety of SpaceX's Colossus 1 cluster in Memphis: 300+ megawatts of capacity, 220,000+ NVIDIA GPUs (H100s, H200s, and next-generation GB200 accelerators) available within a month.

A few pieces of context worth holding onto. Colossus 1 is now operated by SpaceXAI, the entity that emerged from the xAI-SpaceX merger. xAI shifted Grok training to Colossus 2 to free up the first cluster. The CNBC reporting on this deal noted that Elon Musk's relationship with Anthropic has been openly adversarial. He called the company “doomed to become misanthropic.” That the deal happened anyway tells you something about how tight Anthropic's capacity situation was. I won't read into the politics. I'll just note what the behavior implies.

This is short-term capacity. Anthropic's long-term infrastructure commitments are in a different weight class: a $25B+ Amazon deal, a 5-gigawatt Google/Broadcom buildout projected for 2027, a $30B Microsoft/NVIDIA Azure expansion, a $50B Fluidstack commitment. Those are multi-year construction projects. Colossus 1 is what Anthropic could turn on now.

The immediate user-facing changes landed the same day, May 6:

5-hour rate windows doubled (2×) for Pro, Max, Team, and seat-based Enterprise plans
Peak-hour throttling removed entirely for Pro and Max users
Claude Opus API rate limits significantly increased
Weekly limits bumped an additional 50%, valid through July 13, 2026

That last point matters for how you read the competitive response later. The 2× 5-hour windows and the removal of peak-hour throttling appear to be permanent changes. The 50% weekly bump is explicitly temporary. It expires July 13. Keep those two categories distinct.

AI Engineer's Take: Capacity headroom is upstream of every other decision this week. You can't carve out programmatic usage, give interactive users a 2× window, and remove peak throttling unless you've solved the supply side first. The SpaceX deal is the thing that made the rest of the play possible.

Without it, the June 15 changes would look like a panic cap: a company rationing its way through constrained infrastructure. With it, they look like deliberate workload segmentation: a company that finally has enough room to price the two workloads correctly. Engineers reading the wire stories saw three disconnected announcements. Engineers reading the serving stack saw one.

2. The Arbitrage Closure

This is the section that actually matters. Everything else is context.

On June 15, 2026, usage of the following tools will stop drawing from your standard subscription limits and will instead draw from a new, separate monthly credit pool:

The Claude Agent SDK (@anthropic-ai/claude-agent-sdk and equivalents)
The claude -p non-interactive command
Claude Code GitHub Actions
Third-party Agent SDK tools: Conductor, gstack, OpenClaw, Zed via ACP (the Agent Client Protocol Zed uses to integrate external agents), T3 Code, Jean

One thing that is not changing: API-key authenticated SDK calls. If you're authenticating the Agent SDK with a direct API key rather than your subscription credentials, nothing happens on June 15. Pay-as-you-go continues exactly as before. This entire policy is about subscription-authenticated programmatic usage: the path where a $20/month Pro plan was doing the pricing work of a direct API account.

The new credit pool mechanics:

Plan	Monthly Credit
Pro	$20
Max 5x	$100
Max 20x	$200
Team	$20–$100/seat
Enterprise	$20–$200/seat

The credits are dollar-denominated and metered at full API list rates. They don't roll over. They can't be pooled across seats. They're per-user. They must be claimed once via a toggle after June 15. Anthropic says it will email on June 8 with instructions.

The overage (“extra usage”) toggle defaults to OFF. If you don't enable it, programmatic calls fail silently when the credit pool is exhausted. If you do enable it, you pay full API list rates for overages, with the option to set a monthly dollar cap. That default-off behavior is the thing most engineers I've talked to haven't absorbed yet.
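The pool mechanics above can be sketched as a small state machine. This is a toy model, not Anthropic's billing logic: the `CreditPool` class, its field names, and the rule that a call is rejected whole when it would exceed the limit are all assumptions made for illustration. Only the default-off toggle, the optional dollar cap, and the plan credit amounts come from the announcement as described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreditPool:
    """Toy model of the monthly programmatic credit pool (hypothetical names)."""

    monthly_credit: float                 # e.g. 20.0 for Pro, from the plan table
    extra_usage: bool = False             # the overage toggle; defaults to OFF
    overage_cap: Optional[float] = None   # optional monthly dollar cap on overages
    spent: float = 0.0                    # list-rate dollars consumed this month

    def charge(self, cost: float) -> bool:
        """Try to bill one call; return True if the call is allowed to run."""
        limit = self.monthly_credit
        if self.extra_usage:
            # Toggle on: ceiling is credit plus the cap, or unbounded with no cap.
            limit = float("inf") if self.overage_cap is None else limit + self.overage_cap
        if self.spent + cost > limit:
            return False                  # assumption: the whole call is rejected
        self.spent += cost
        return True

    @property
    def overage_billed(self) -> float:
        """Dollars billed beyond the included credit."""
        return max(0.0, self.spent - self.monthly_credit)


# A Pro pool with the toggle left at its default: the fourth $6 job is refused.
pro = CreditPool(monthly_credit=20.0)
print([pro.charge(6.0) for _ in range(4)])  # [True, True, True, False]
```

The point of modelling it this way is that the refusal is a return value, not an exception. A CI job that does not check for it will look like it ran.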

The math, and this is where the post earns its credibility

The Register documented one OpenClaw user extracting approximately $236/month of API-equivalent token value from a $20 Pro plan before the April ban. That's roughly a 12× ratio. Theo Browne, creator of T3 Code, cited what he described as a 25× effective cut for his users, the middle of the range. At the high end, Sonnet-heavy agent fleets running near the top of Max 20x weekly quotas could reach 150-175× extracted API value. That last figure is reconstructed from documented weekly quotas priced at current API list rates. Anthropic hasn't published it directly, and I'm flagging it as a reconstruction, not a confirmed data point.

So the effective price increase is 12×-175× depending on workload shape, cache hit rates, and model mix. The spread is enormous. Your position in that range depends almost entirely on how you've been running your agents.
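To place yourself in that range, the arithmetic is short. The sketch below uses the one documented data point from The Register ($236 of API-equivalent value on a $20 Pro plan). The function names are hypothetical, and the second function assumes the overage toggle is on so that the same workload keeps running at list rates.

```python
def effective_multiplier(api_equivalent_value: float, plan_price: float) -> float:
    """How many dollars of list-rate usage each subscription dollar was buying."""
    return api_equivalent_value / plan_price


def metered_monthly_cost(
    api_equivalent_value: float, plan_price: float, monthly_credit: float
) -> float:
    """Plan price plus list-rate usage beyond the included credit.

    Assumes the overage toggle is enabled with no cap.
    """
    overage = max(0.0, api_equivalent_value - monthly_credit)
    return plan_price + overage


ratio = effective_multiplier(236.0, plan_price=20.0)
new_cost = metered_monthly_cost(236.0, plan_price=20.0, monthly_credit=20.0)
print(f"{ratio:.1f}x before, ${new_cost:.0f}/month after")  # 11.8x before, $236/month after
```

Swap in your own estimated list-rate spend and plan credit to see where your workload lands.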

Boris Cherny, Head of Claude Code, to The Register and VentureBeat:

“Our systems are highly optimized for one kind of workload. Our subscriptions weren't built for the usage patterns of these third-party tools.”

That quote is the technical Rosetta Stone for the whole policy. Everything else in this announcement follows from it.

Claude Code's serving infrastructure is built around KV cache: the mechanism by which a transformer model can reuse computation from earlier in a conversation rather than reprocessing tokens it's already seen. In a long interactive session, every turn benefits from the context built up in previous turns. The KV cache is warm. The per-token serving cost is low.

Third-party harnesses break that pattern. They spawn new sessions with each run. They use different system prompts from what Anthropic's infrastructure anticipates. They miss the cache. The KV cache is cold. Every call starts fresh, and Anthropic's serving stack has to do the full computation. From the infrastructure perspective, an OpenClaw call costs meaningfully more compute per token than an interactive Claude Code call, even for identical underlying models.

The cleanest way to see the workload-shape asymmetry is to trace what actually happens at the serving layer across the two paths. The diagram below maps the cache-hit differential and shows where the credit pool boundary sits.

Figure: Two Paths Through the Same Model. Two Prices. Interactive path (cache-warm): long-lived IDE sessions, consistent system prompts, warm KV cache with 70%+ hit rate, subscription-based limits. Programmatic path (cache-cold): stateless script invocations, fresh sessions with no prior context, cold KV cache with 0% hit rate, metered credit pool at full API list rates. The June 15, 2026 transition separates cache-warm interactive from cache-cold programmatic usage for sustainable pricing. Caption: Same weights, different cost. The 12×–175× price differential comes from cache-hit economics, not the model.

The split-pool is, mechanically, a per-workload-shape pricing model. Interactive users get the cache-warm path and the subscription. Programmatic users get the cache-cold path and API list rates. The “free credit” framing (Anthropic calling the new pool a “monthly credit” that comes with your plan) is a marketing wrapper around what is structurally a serving-cost recovery scheme.
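A rough way to see how cache warmth moves input cost is to blend cached and uncached tokens at different rates. The sketch below is illustrative only. The $3 per million token base price and the 0.1 cache-read multiplier are placeholder assumptions, not figures from the announcement, and the model ignores output tokens and cache-write charges. The 70% and 0% hit rates are the ones quoted in the diagram.

```python
def blended_input_cost(
    input_tokens: int,
    cache_hit_rate: float,
    base_price_per_mtok: float,
    cache_read_multiplier: float = 0.1,  # assumption: cached reads bill at 10% of base
) -> float:
    """Dollar cost of input tokens when a fraction of them is served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    # Cached tokens are discounted; uncached tokens pay the full base rate.
    billable = uncached + cached * cache_read_multiplier
    return billable * base_price_per_mtok / 1_000_000


# Placeholder base price of $3 per million input tokens (an assumption).
warm = blended_input_cost(1_000_000, 0.70, base_price_per_mtok=3.0)
cold = blended_input_cost(1_000_000, 0.0, base_price_per_mtok=3.0)
print(f"warm=${warm:.2f} cold=${cold:.2f}")
```

Plug in your own measured hit rate and current list prices. The shape of the function matters more than these particular numbers.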

The Lydia Hallie moment

Hallie, an Anthropic employee on the Claude Code team, posted a clarification on X that read: “you don't pay extra. Same subscription, same price.” The post received a Community Note within hours. The note stated: “Previously, programmatic usage counted toward subsidized subscription limits. Starting June 15, it draws from a separate $20-$200 credit at full API rates, while interactive limits remain unchanged.”

Community Notes require cross-ideological consensus to post. A note landing on a company employee's framing tweet is a specific kind of public failure. It means people with opposing priors agreed the framing omitted the material fact: that programmatic usage was previously drawing from subsidized subscription capacity, and is now metered at full API list rates. I'm not piling on. I'm noting the signal: the framing was clumsy enough that the crowd-correction mechanism activated.

The strongest community reactions said the same thing in different ways. One Max user on Hacker News said 99% of their usage is non-interactive and the new pricing “will far, far exceed what I can afford.” Ben Hylak, CTO of Raindrop.ai, posted that this was “either really silly, or shows how bad of a spot Anthropic is in re: gpus.” Theo Browne was more direct. He had to make T3 Code's Claude experience “significantly worse” to avoid burning through the credit ceiling for his users.

AI Engineer's Take: This is a cache-economics story, not a price-gouge story. The previous regime was unsustainable not because users were doing anything wrong, but because the serving stack was paying for a workload it wasn't designed to serve. The right read isn't “Anthropic is screwing developers.” It's “Anthropic finally priced the workload differential.”

The framing was clumsy. The economics were inevitable. If you were getting 12-25× effective value out of a Pro plan via OpenClaw, you had been receiving a subsidy that no SaaS business survives giving away indefinitely. The question for you isn't whether this was fair. It's whether your workflow is cache-warm or cache-cold, and which side of that line you want to be on before June 15.

3. The Orchestration Shipped In-House

Between May 12 and May 16, Claude Code shipped five daily releases: v2.1.139 through v2.1.143. The changelogs read like a checklist of every reason someone might have downloaded a third-party orchestration wrapper in the first place.

The major landings:

Agent View (claude agents, Research Preview): A single-list dashboard for every Claude Code session (running, blocked, and finished) hosted by a per-user supervisor process. Haiku-class one-line row summaries for each session, refreshing at up to 15 times per second. You can dispatch, peek into, attach to, and detach from any background session from one terminal. This is what every multi-session harness was building.

The /goal command: Set a completion condition for Claude Code in natural language. Claude iterates across turns (sometimes for hours, sometimes for days) until the condition is met. Works in interactive mode, -p, and Remote Control. Shows a live overlay of elapsed time, turns, and tokens consumed. The architecture worth understanding: an independent Claude session, separate from the one doing the work, acts as the goal supervisor. It audits whether the stated goal was actually achieved before notifying you. That separation is what makes /goal trustworthy rather than just persistent.

Per-session flags for claude agents: --model, --effort, --mcp-config (MCP is Anthropic's Model Context Protocol, the standard for connecting Claude to external tools and data sources), --permission-mode, --plugin-dir, --add-dir, --settings, --dangerously-skip-permissions. Parallel background sessions can each carry their own configuration. This is what every multi-agent harness was threading through at the command-line level.

Plugin marketplace: Shows projected per-session token cost before install. Plugin dependency enforcement: disabling a plugin refuses if another installed plugin depends on it; enabling force-enables transitive dependencies. No more silent broken-state failures after a plugin change.

The worktree.bgIsolation: “none” setting: An escape hatch for repositories where Git worktrees (separate working trees from the same repo, used to run parallel Claude agents without file conflicts) are impractical.

Fast Mode now defaults to Opus 4.7 (was 4.6). Override via CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1 if you need the old behavior.

The pattern across all five releases is the same: every major reason a developer reached for OpenClaw, Conductor, or T3 Code (multi-session orchestration, completion conditions, per-agent configuration, context budgeting, dependency safety) just got a first-party implementation.
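The plugin dependency rule described above (disabling refuses when something depends on the plugin, enabling pulls in transitive dependencies) is a small graph problem. Here is a minimal sketch of that rule. It is not the Claude Code implementation: the function names, the dict-of-sets representation, and the plugin names in the example are all hypothetical.

```python
from typing import Dict, Set

Deps = Dict[str, Set[str]]  # plugin name -> names of plugins it requires


def blocking_dependents(plugin: str, enabled: Set[str], deps: Deps) -> Set[str]:
    """Enabled plugins that directly require `plugin`."""
    return {p for p in enabled if plugin in deps.get(p, set())}


def disable(plugin: str, enabled: Set[str], deps: Deps) -> Set[str]:
    """Return the new enabled set, refusing if another plugin depends on this one."""
    blockers = blocking_dependents(plugin, enabled, deps)
    if blockers:
        raise ValueError(f"cannot disable {plugin}: required by {sorted(blockers)}")
    return enabled - {plugin}


def enable(plugin: str, enabled: Set[str], deps: Deps) -> Set[str]:
    """Return the new enabled set, force-enabling transitive dependencies."""
    result = set(enabled)
    stack = [plugin]
    while stack:
        current = stack.pop()
        if current in result:
            continue  # already enabled; also guards against dependency cycles
        result.add(current)
        stack.extend(deps.get(current, set()))
    return result


# Hypothetical plugins: lint-pack needs core-tools, which needs fs-utils.
deps = {"lint-pack": {"core-tools"}, "core-tools": {"fs-utils"}}
enabled = enable("lint-pack", set(), deps)
print(sorted(enabled))  # ['core-tools', 'fs-utils', 'lint-pack']
```

Refusing on disable and cascading on enable are what remove the broken intermediate state: there is no point at which an enabled plugin is missing something it requires.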

The `/goal` + stop hook + test suite loop

Here's the cleanest “leave it running until done” pattern Claude Code has ever shipped, and it's worth tracing in full:

/goal sets the completion condition: “all tests in the auth module pass, no regressions in the full suite”
A stop hook runs your test suite at the end of each turn
If tests fail, the hook blocks the stop; Claude continues iterating
If tests pass and the goal supervisor confirms the completion condition is met, the goal resolves and Claude notifies you
The CLAUDE_CODE_STOP_HOOK_BLOCK_CAP environment variable (default: 8 consecutive blocks) prevents runaway loops if something goes wrong
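The control flow in those steps can be simulated in a few lines. This models the loop as described here, not Claude Code's internals. The function name and return labels are hypothetical, and treating the cap as "halt once 8 consecutive blocks are reached" is my reading of the default, not a documented guarantee.

```python
from typing import Callable, Iterable, Tuple


def run_goal_loop(
    test_outcomes: Iterable[bool],
    goal_confirmed: Callable[[], bool] = lambda: True,
    block_cap: int = 8,
) -> Tuple[str, int]:
    """Simulate the /goal loop: one test-suite result per turn.

    Returns (outcome, turns_used).
    """
    consecutive_blocks = 0
    turns = 0
    for tests_pass in test_outcomes:
        turns += 1
        if not tests_pass:
            # The stop hook blocks the stop, so the agent keeps iterating.
            consecutive_blocks += 1
            if consecutive_blocks >= block_cap:
                return ("halted_by_block_cap", turns)
            continue
        consecutive_blocks = 0
        # Tests pass: the separate supervisor session audits the goal.
        if goal_confirmed():
            return ("goal_met", turns)
    return ("still_running", turns)


print(run_goal_loop([False, False, True]))  # ('goal_met', 3)
print(run_goal_loop([False] * 10))          # ('halted_by_block_cap', 8)
```

Two independent checks gate completion: the test suite and the supervisor. The cap is the only exit that does not require either to succeed.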

ExplainX called /goal “the single most underrated AI feature of 2026.” More telling: OpenAI's Codex has integrated the same pattern. Completion conditions for agent sessions are becoming a cross-vendor standard, not a differentiator. The window to build a business on “Claude Code, but with completion conditions” has closed.

AI Engineer's Take: Third-party orchestration wrappers built their value proposition on a gap: “Claude Code can't do X, but we can.” That gap just got an order of magnitude narrower.

Agent View is what every harness was building. /goal is what every harness was wrapping. Per-session flags are what every harness was passing through. The wrappers that survive will be the ones doing something Anthropic structurally can't or won't do: bringing a non-Claude model into the mix, providing tight IDE integration the terminal can't match, or serving a specific vertical workflow that requires a different permission model.

The middle of the market, “Claude Code, but with multi-session management,” is gone. And it got expensive to run on the same day it became redundant. That's not an accident. That's the move.

4. The Competitor Response

OpenAI moved the same day. On May 14, Sam Altman announced that enterprise users switching from Claude Code to Codex within 30 days would receive two months free, plus a one-click migration tool that transfers Claude Code prompts, skills, and MCP configurations.

The timing is not subtle. Anthropic's 50% weekly usage bump runs through July 13. OpenAI's 30-day window plus two free months lands on almost exactly the same horizon. Both companies are racing to lock in heavy agentic users before the IPO windows most analysts are projecting for H2 2026.

Codex's structural pitch in this window: Codex Pro at $100 gets 10× Plus through May 31. Codex Pro at $200 gets a permanent 20× Plus bump plus 25× Plus 5-hour limits through May 31. Critically (and this is Codex's first credible structural pitch since launch) Codex doesn't split interactive from programmatic. It's a unified pool. That's the explicit pitch now: one plan, no fragmentation, no credit pool to claim via toggle.

The timeline below maps how the May 6 – June 15 – July 13 sequence lines up against where each major vendor's incentives sit, and what the competitive window actually looks like for someone deciding where to route their workloads this quarter.

Cursor is the obvious migration target for ex-Max-20x heavy users. The tiers are Pro at $20, Pro+ at $60, and Ultra at $200 (20× Pro usage). Model flexibility is a real feature here: Claude Opus 4.6, GPT-5.4, Gemini 3 Pro, Grok Code. If you're worried about vendor capture after this week, Cursor's multi-model architecture is a genuine hedge. Cursor switched to usage-based pricing back in June 2025. Cursor's own usage data puts daily Agent users at $60-$100/month and power users at $200+/month. They solved the “what if my agent eats the credit?” problem months ago. They just got new marketing ammunition.

Zed is facing direct existential pressure, and the clearest sign comes not from the market but from their own product blog. Zed published a post acknowledging that ACP-routed Claude Code usage now draws from the new credit pool, and recommending that users run the native Anthropic CLI directly inside Zed's terminal as a workaround. That's a remarkable concession. ACP (the protocol Zed uses to integrate Claude Code) is now explicitly the worse path compared to the native terminal. The same bind applies to any tool built on ACP.

GitHub Copilot is moving to usage-based billing on June 1, 2026. New Copilot Pro and Pro+ signups have been paused since April 20. Flat-fee inference subsidies are ending industry-wide. Anthropic just moved first on the part that produces the sharpest developer reaction.

AI Engineer's Take: Flat-fee inference subsidies with no usage cap are a tragedy of the commons. Any one vendor that moves first forces the others to follow within 6-12 months, or bleed to death subsidizing every heavy user the market pushes their way.

Anthropic moved first on the programmatic side because they had the most leverage. Highest-perceived-quality coding model means the lowest near-term churn risk on a price change. Cursor already solved the problem with metering. Copilot is solving it by going usage-based June 1. OpenAI is the only major vendor still subsidizing programmatic usage outright, and they're doing it as an explicit two-month promotional spike into a competitive window.

By Q4 2026, every coding agent stack will look structurally like Anthropic's split. The right framing isn't “Anthropic is greedy.” It's “Anthropic is early.”

5. What to Actually Do Before June 15

Concrete steps. No fluff.

1. Audit your usage. Grep your repos and CI pipelines for claude -p, Agent SDK imports (@anthropic-ai/claude-agent-sdk and any equivalents), and GitHub Actions referencing anthropics/claude-code-action. Anything you find is moving to the credit pool on June 15. Estimate your monthly token spend for those calls at full API list rates. That's your new cost baseline.

2. Decide whether to enable the “extra usage” toggle. It defaults to OFF, which means your CI silently fails when the monthly credit is exhausted. If you have any unattended workload that can't fail mid-month (a nightly PR-review action, a scheduled refactor agent, a background documentation pass), turn it on and set a dollar cap. Write the cap into your runbook so you don't lose track of the toggle state three months from now.

3. Try claude agents and /goal before investing another dollar in a third-party orchestrator. The /goal + stop hook + test suite loop covers roughly 80% of what engineers were using OpenClaw for. If your harness was primarily a wrapper around multi-session management and completion conditions, the wrapper just got obsoleted. If it was doing something structurally different (cross-model routing, IDE-tight integration, a vertical workflow requiring a specific permission model), it's still valuable. Know which category you're in.

4. If your programmatic spend exceeds your plan's credit allocation ($20 for Pro, $100 for Max 5x, $200 for Max 20x) at API-equivalent token rates, hybrid-route. Move cache-cold batch workloads (large-scale code review, classification pipelines, anything with cold-start sessions and no cache warmth) to Codex or local models. Keep cache-warm interactive work on Claude. The new pricing rewards correct workload placement. Route accordingly.

5. If you're authenticating the Agent SDK with a direct API key, not your subscription credentials, nothing changes for you. Pay-as-you-go continues exactly as before. No credit pool, no toggle to claim, no June 15 cutover. This point is buried in most coverage; confirm your auth path before assuming you're affected.
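The audit in step 1 can be sketched as a small scanner. This is a starting point, not a complete detector. The regexes are my own approximations and will produce some false positives, the `claude_agent_sdk` spelling for a Python import is an assumption about what "equivalents" covers, and the skipped directory names are arbitrary choices.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, List, Pattern, Tuple

# Label -> pattern for each usage path that moves to the credit pool.
PATTERNS: Dict[str, Pattern[str]] = {
    # `claude ... -p` on one command; stops at pipes and command separators.
    "claude -p": re.compile(r"\bclaude\b[^\n|;&]*?\s-p\b"),
    # npm package name, plus an assumed underscore/hyphen variant for other SDKs.
    "Agent SDK": re.compile(r"claude[_-]agent[_-]sdk"),
    "GitHub Action": re.compile(r"anthropics/claude-code-action"),
}

SKIP_DIRS = {".git", "node_modules"}  # arbitrary; extend for your repos


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, label) for every pattern hit in a blob of text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_tree(root: str) -> Iterator[Tuple[str, int, str]]:
    """Walk a checkout and yield (path, line_number, label) for each hit."""
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if not path.is_file() or SKIP_DIRS & set(path.relative_to(base).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip rather than abort the audit
        for lineno, label in scan_text(text):
            yield (str(path), lineno, label)
```

Usage: `for hit in scan_tree("."): print(*hit)` from the root of a repository. Hidden directories such as `.github` are included on purpose, since that is where workflow files live.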

The Line Is Load-Bearing

Interactive got better. Programmatic got expensive. The line between them is now the architecture.

The compute unlock made the move possible. Without 220,000 GPUs coming online in Memphis, Anthropic would not have had the supply-side room to be generous with one side while pricing the other correctly. The credit pool closed a subsidized arbitrage at ratios no SaaS model survives: 12× at the documented floor, 150-175× at the reconstructed ceiling. The orchestration primitives shipped the same week turned the third-party wrapper category from a gap to fill into an Anthropic feature to compete with.

The competitor response is already visible. The industry math is already moving. Within 12 months, I expect every major coding agent stack to have an interactive/programmatic split of some kind. The economics of warm-cache and cold-cache workloads are different, and the platforms will price them differently.

The question for an engineer in May 2026 isn't whether to like or dislike the change. It's whether your workflow is on the right side of the new line.

Audit. Toggle. /goal. Move before June 15.

Built by an AI Engineer. Not a journalist.
