Apr 2026 · AI Infrastructure

Arm just broke a 35-year rule and it matters more than you think


For 35 years, Arm designed the chips inside every device you own and never sold a single one. That ended on March 24, 2026. The ripple effects are going to reshape how AI infrastructure gets built.


What Arm just did

On March 24, at an event called "Arm Everywhere" in San Francisco, Arm CEO Rene Haas walked onstage and held up a physical chip. The Arm AGI CPU. The company's first finished silicon product in its entire 35-year history.

This is not a prototype. Not a reference design. It's production silicon, manufactured on TSMC's 3-nanometer process, available to order today from Lenovo, Supermicro, ASRock Rack, and Quanta Computer.

Meta co-developed the chip over 18 months. Arm's expanded Austin campus grew to over 1,000 engineers and $71 million in new lab space to build it. The launch partner list reads like an AI infrastructure roll call: OpenAI, Cerebras, Cloudflare, SAP, SK Telecom. Over 50 ecosystem supporters issued statements at launch, including AWS, Google Cloud, Microsoft Azure, NVIDIA, Micron, Snowflake, and Hugging Face.

The market agreed this was significant. Arm's stock surged over 16% the following day. CEO Haas projects the AGI CPU alone will generate $15 billion in revenue by 2031, growing Arm's total revenue from roughly $4 billion in 2024 to $25 billion.


What Arm actually is, for those who don't know

To understand why this matters, you need to understand what Arm has been doing for the past three decades.

Arm Holdings, based in Cambridge, England and majority-owned by SoftBank, has run one of the cleanest business models in tech. They design chip architectures - the fundamental instruction sets and blueprints that tell a processor how to think - and license those designs to companies that actually manufacture the chips. Arm collects royalties on every chip shipped. Over 30 billion Arm-based chips shipped in 2024 alone.

You use Arm chips constantly and probably don't realize it. Every iPhone. Every Android phone. AWS Graviton, the backbone of Amazon's cloud. Apple's M-series chips that transformed the Mac lineup. All Arm architecture. None of them made by Arm.

The analogy: Arm was the architect who designed every skyscraper in the city. It made money selling blueprints. Apple, NVIDIA, Amazon, Google were the construction companies. Arm was everyone's partner and nobody's rival. The “Switzerland of semiconductors.”

On March 24, that architect showed up at the job site with a hard hat, a crane, and a finished building of its own.


Why this is happening now

This is the part that matters most, especially if you build AI systems for a living.

For the past three years, the AI hardware conversation has been almost entirely about GPUs. For good reason. Training large language models requires massive parallel computation, exactly what GPUs do best. NVIDIA's H100s and Blackwell chips became the currency of the AI arms race. Companies measured their AI ambitions in GPU count.

CPUs, in that era, were an afterthought. They sat next to the GPU, handled some basic preprocessing, passed data along. Nobody cared about their performance. They were the boring chip.

That era is ending. And the reason is the shift from chatbots to agents.

The chatbot era (roughly 2023-2025) had a simple loop: user sends prompt, GPU runs inference, response comes back. One shot. The GPU does the heavy lifting. The CPU barely breaks a sweat.

Agentic AI is a fundamentally different beast. An AI agent doesn't just answer a single prompt and stop. It plans multi-step tasks, calls external APIs, queries databases, browses the web, manages memory, spawns sub-agents, and coordinates everything continuously and autonomously. OpenAI Codex, Anthropic's Claude with tool use, Microsoft's Copilot agents, hundreds of enterprise agent frameworks: this is where AI is heading in 2026 and beyond.

Here's what happens inside an agentic workflow: the agent receives a task. The CPU routes the request, loads context, and prepares input for the GPU. The GPU runs inference and generates a next action. The CPU receives that output, parses it, and decides what to execute next: an API call, a database query, a memory write, a sub-agent spawn. The CPU executes the tool call, processes the result, and prepares the next GPU input. Repeat this dozens or hundreds of times for a single complex task. Multiply by hundreds of agents running in parallel.
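The loop above can be sketched in a few lines. This is a minimal illustration, not any real framework's API: `run_inference` and `execute_tool` are hypothetical stand-ins for the GPU inference call and the CPU-side tool execution. Everything outside `run_inference` is the CPU work the article describes.

```python
import json

def run_inference(prompt: str) -> str:
    """Stand-in for a GPU inference call (hypothetical).

    In a real system this blocks on the accelerator and returns
    the model's next action as text.
    """
    return json.dumps({"action": "done", "result": "ok"})

def execute_tool(action: dict) -> str:
    """Stand-in for CPU-side tool execution: API calls, DB queries, memory writes."""
    return f"result-of-{action['action']}"

def agent_loop(task: str, max_steps: int = 10) -> str:
    context = [task]                          # CPU: context/memory management
    for _ in range(max_steps):
        prompt = "\n".join(context)           # CPU: prepare GPU input
        output = run_inference(prompt)        # GPU: inference
        action = json.loads(output)           # CPU: parse model output
        if action["action"] == "done":        # CPU: decide what to execute next
            return action["result"]
        context.append(execute_tool(action))  # CPU: tool call + result handling
    return "max steps reached"
```

Every iteration sandwiches one GPU call between several CPU steps; run dozens of iterations per task across hundreds of parallel agents and the CPU share of the work compounds exactly as described.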

Every step between GPU inference calls is CPU work. Every tool call, API request, memory lookup, scheduling decision, data movement operation: all of it lands on the CPU. Industry analysis puts CPUs at 50% to 90% of total end-to-end latency in agentic workflows. The chip everyone ignored became the bottleneck.

The kitchen analogy: the GPU is a world-class chef who cooks at incredible speed. The CPU is the kitchen manager. It decides what to cook next, fetches ingredients, coordinates waitstaff, manages the order queue. One customer? The chef is the bottleneck. A hundred AI agents placing simultaneous orders, each requiring multiple courses, each spawning sub-orders, each maintaining state? The kitchen manager is drowning. Hiring faster chefs won't fix it.

The demand numbers confirm this. Arm CEO Rene Haas stated that today's AI data centers use roughly 30 million CPU cores per gigawatt of power capacity. For agentic AI, that needs to grow to 120 million - a 4x increase. Bloomberg Intelligence projects the inference market will surpass the training market by 2029.
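Taking those figures at face value, the implied buildout is simple arithmetic. These are the article's numbers, not independent data, and the sockets-per-gigawatt line is my extrapolation from the 136-core spec:

```python
cores_per_gw_today = 30_000_000     # Haas's figure for today's AI data centers
cores_per_gw_agentic = 120_000_000  # his projection for agentic AI

multiplier = cores_per_gw_agentic / cores_per_gw_today
print(multiplier)  # 4.0 — the "4x increase"

# At 136 cores per AGI CPU, sockets needed per gigawatt:
chips_per_gw = cores_per_gw_agentic / 136
print(round(chips_per_gw))  # 882353 — on the order of 900k sockets per GW
```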

The November 2025 AWS-OpenAI partnership included “hundreds of thousands of GPUs.” Everyone focused on that number. The deal also included tens of millions of CPUs to rapidly scale agentic workloads. That's the signal most people missed.


What makes this chip special

The AGI CPU packs 136 Arm Neoverse V3 cores into a 300-watt thermal envelope. For comparison, AMD's top EPYC and Intel's high-end Xeon processors offer around 128 cores at 500 watts. More cores. 40% less power.

The memory architecture is built for agentic workloads: 12 channels of DDR5 running at 8,800 MT/s, delivering over 800 GB/s of aggregate bandwidth. Each core gets approximately 6 GB/s of dedicated bandwidth at sub-100 nanosecond latency, designed so agent threads don't compete for memory access under sustained parallel load. Total memory capacity: up to 6TB per chip, with CXL 3.0 support for further expansion.
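The bandwidth figures are internally consistent, assuming a standard 64-bit DDR5 channel (8 bytes per transfer); the channel width is my assumption for the arithmetic, not something stated in the launch materials:

```python
channels = 12
transfer_rate_mts = 8800   # DDR5 mega-transfers per second
bytes_per_transfer = 8     # assumes a 64-bit channel
cores = 136

aggregate_gbs = channels * transfer_rate_mts * bytes_per_transfer / 1000
print(aggregate_gbs)           # 844.8 GB/s — matches "over 800 GB/s"

per_core_gbs = aggregate_gbs / cores
print(round(per_core_gbs, 2))  # 6.21 GB/s — matches "~6 GB/s per core"
```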

In a standard air-cooled rack, you fit roughly 8,000 cores. With liquid cooling, over 45,000.

Arm's claim: more than 2x performance per rack versus the latest x86 platforms, translating to up to $10 billion in CAPEX savings per gigawatt of data center capacity. Independent benchmarks haven't been published yet. But when Meta is building gigawatt-scale data centers with a $115-135 billion 2026 capex budget, efficiency isn't a nice-to-have. It's the binding constraint. Meta's head of infrastructure, Santosh Janardhan, put it plainly at launch: “Wattage is a very scarce resource.”


The bigger race

Arm isn't alone in recognizing the CPU opportunity. The server CPU market hasn't been this competitive in 20 years.

NVIDIA launched the Vera CPU at GTC 2026: 88 custom “Olympus” cores purpose-built for agentic reasoning and reinforcement learning workloads. Connected to Rubin GPUs via NVLink-C2C at 1.8 TB/s coherent bandwidth. Jensen Huang's line at launch: “The CPU is no longer simply supporting the model. It's driving it.” That's Arm's entire thesis, from NVIDIA's mouth.

AMD's EPYC Venice brings 256 Zen 6 cores on TSMC's 2nm process with a claimed 70% generational performance jump. Intel's Clearwater Forest packs 288 E-cores and is already supply-constrained. The narrative has shifted: everyone agrees CPUs matter again.

But Arm occupies a position nobody else does. AWS Graviton, Google Axion, Microsoft Cobalt, NVIDIA Vera: all built on Arm architecture. Arm collects royalties from every one of them, regardless of who wins market share. Now Arm also sells its own finished silicon alongside those licensees. It collects rent from the entire neighborhood while building its own house on the same block.

Evercore ISI analyst Mark Lipacis framed it concisely: “Agents are to Arm as AI is to Nvidia.”

The neutrality risk is real and worth naming. For 35 years, Arm's superpower was being Switzerland, trusted because it competed with no one. NVIDIA, once a potential acquirer of Arm in the failed $40 billion deal, reportedly liquidated its Arm equity stake in February 2026. Qualcomm is accelerating its RISC-V investment and recently acquired Ventana Micro Systems to build an alternative ecosystem. Arm's licensees are watching this move carefully.

One more thing worth acknowledging: the name “AGI CPU” raised eyebrows across the industry. ServeTheHome, Electronic Design, and much of technical Twitter have pointed out that this chip does not achieve AGI. It's a high-core-count server CPU optimized for agentic workloads. The name deliberately rides the hype wave, and in the long run credibility matters more than marketing.

Full breakdown of the Arm AGI CPU: market launch, technical specifications, the shift to agentic AI, competitive landscape, and ecosystem partnerships. Generated via NotebookLM.

What this means if you build things

Your GPU-to-CPU ratio needs revisiting. Traditional inference serving assumed low CPU overhead. Agentic workloads flip that assumption. Each agent thread needs consistent CPU resources for orchestration, tool execution, and memory management. A GPU-heavy, CPU-light cluster will hit orchestration bottlenecks that leave expensive accelerators idle.
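One hedged way to revisit that ratio: measure what a single agent thread actually consumes, then size CPU capacity per GPU from it. Everything below (the function, the 32-agents and 0.5-cores figures) is a hypothetical sketch to be replaced with your own measurements, not guidance from Arm or any vendor:

```python
def cpu_cores_per_gpu(agents_per_gpu: int,
                      cpu_cores_per_agent: float,
                      headroom: float = 1.25) -> float:
    """Estimate CPU cores to provision alongside each GPU for agentic serving.

    agents_per_gpu: concurrent agent threads one GPU's batch can serve
    cpu_cores_per_agent: measured CPU cores one agent consumes for
        orchestration, tool execution, and memory management
    headroom: safety margin for bursty tool-call fan-out
    """
    return agents_per_gpu * cpu_cores_per_agent * headroom

# Hypothetical example: 32 concurrent agents per GPU, 0.5 cores each
print(cpu_cores_per_gpu(32, 0.5))  # 20.0 cores per GPU
```

A chatbot-era cluster might have provisioned a quarter of that; the point is that the ratio is now a measured quantity, not a default.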

Your orchestration framework runs on CPUs. LangChain, CrewAI, AutoGen, LlamaIndex, whatever you've built internally - its efficiency directly determines GPU utilization. A slow orchestration layer doesn't just add latency; it means your $30,000 GPUs wait around doing nothing between inference calls. Optimizing that layer is now as important as optimizing your model.
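A crude way to see whether your orchestration layer is the problem: time the CPU-side and GPU-side halves of each agent step separately and look at the ratio. The sketch below uses trivial stand-in callables; in practice `orchestrate` would be your framework's planning/tool step and `infer` your model-server call:

```python
import time

def timed(fn, *args):
    """Return (result, wall-clock seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def profile_step(orchestrate, infer, state):
    """Split one agent step into CPU (orchestration) and GPU (inference) time."""
    prompt, cpu_s = timed(orchestrate, state)
    _, gpu_s = timed(infer, prompt)
    return cpu_s, gpu_s

# Hypothetical stand-ins for the two halves of a step
cpu_s, gpu_s = profile_step(lambda s: s.upper(), lambda p: p.lower(), "plan next step")
idle_fraction = cpu_s / (cpu_s + gpu_s)  # share of the step the GPU sat idle
```

If `idle_fraction` is high under production load, faster GPUs won't help; the fix is in the orchestration layer.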

System-level architecture matters more than raw GPU count. The next generation of AI infrastructure isn't about stacking the most chips in a rack. It's about balanced systems where CPUs, GPUs, memory, and networking work together efficiently. The companies that figure out system-level optimization first will have a real competitive edge.


The AI race isn't about who has the biggest GPU cluster anymore. It's about who can orchestrate intelligence most efficiently, at the lowest power cost, at the largest scale. Arm just entered that race with 35 years of architectural DNA, the biggest names in AI already signed on, and a chip that makes a serious argument for why the most important processor in an agentic data center isn't the GPU.

It's the one that tells the GPU what to do next.

Follow along for more AI research breakdowns.
