Google Just Split Its AI Chip in Two — and Nvidia Should Be Worried
Google dropped a bombshell at Cloud Next 2026 yesterday. Instead of announcing one new AI chip, it announced two — and the reasoning behind that split tells you everything about where the AI industry is heading right now.
[Infographic: Google TPU 8t vs TPU 8i. "Two Chips. Two Missions. One Agentic Era."
TPU 8t (Training Powerhouse): 9,600 chips per pod · 2 PB shared memory · 2.7x price/performance · 3D torus topology · native FP4 operations · 2x perf-per-watt vs Ironwood.
TPU 8i (Inference Engine): 1,152 chips per pod · 3x more on-chip SRAM · 80% perf/$ improvement · Boardfly ICI topology · Collectives Acceleration Engine · 50% fewer network hops.]
Meet the TPU 8t and TPU 8i — Google’s eighth-generation tensor processing units, and the first time the company has deliberately forked its AI chip line into two separate products. The TPU 8t is built purely for training massive models. The TPU 8i exists to run inference at scale. And looking at the numbers, both are pretty serious upgrades.

Why Two Chips Instead of One?
Here’s the thing — training an AI model and running that model for millions of users are fundamentally different workloads. Training requires brute-force compute across thousands of chips working in sync. Inference needs low latency and massive throughput to serve real-time requests from potentially millions of concurrent agents.
Until now, most chipmakers (including Google) tried to build one chip that handled both. That compromise meant neither workload got fully optimized silicon. Google’s calling this the “agentic era” split, and the timing isn’t accidental — when you’ve got millions of AI agents running simultaneously, the inference bottleneck becomes the most expensive problem in your data center.
TPU 8t: The Training Beast
The training chip doesn’t mess around with specs. A single superpod can network 9,600 TPU 8t chips together with a shared 2 petabytes of high-bandwidth memory. That’s up from 9,216 chips on the previous-gen Ironwood, and the memory architecture is completely redesigned.
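For a sense of scale, those pod-level numbers imply roughly 208 GB of high-bandwidth memory per chip. A quick back-of-envelope check (decimal units assumed; Google hasn't published a per-chip figure):

```python
# Implied memory per chip from the announced pod-level numbers.
# Assumes decimal units (1 PB = 1,000,000 GB); Google has not
# published a per-chip HBM figure for the TPU 8t.
pod_chips = 9_600
pod_memory_gb = 2 * 1_000_000   # 2 PB shared across the superpod
print(f"~{pod_memory_gb / pod_chips:.0f} GB per chip")  # ~208 GB
```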
Google claims 2.7x better price-per-performance compared to Ironwood for training workloads. The chip uses a 3D torus network topology for scaling and includes a new SparseCore accelerator that handles irregular memory access patterns — basically the messy parts of training recommendation models and other real-world AI systems.
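The torus choice is about hop counts. On a simple ring, the worst-case distance between two chips grows with half the chip count; folding the network into three wrap-around dimensions keeps it small. A toy comparison (the dimensions are illustrative; Google hasn't disclosed the 8t pod's actual torus geometry):

```python
# Illustrative only: why torus topologies keep worst-case hop counts
# low. "Diameter" = maximum shortest-path hops between any two chips.
# Google has not disclosed the 8t pod's actual torus dimensions.
def ring_diameter(n: int) -> int:
    # On a ring, the farthest chip is halfway around.
    return n // 2

def torus3d_diameter(x: int, y: int, z: int) -> int:
    # Each axis wraps around, so the farthest chip along an axis is
    # at most half the axis length away; the axes add up.
    return x // 2 + y // 2 + z // 2

chips = 512
print(ring_diameter(chips))       # 256 hops worst case on a flat ring
print(torus3d_diameter(8, 8, 8))  # 12 hops on an 8x8x8 torus (512 chips)
```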
What caught my eye was the native FP4 (four-bit floating point) support. Quantized training has been gaining serious momentum this year, and having hardware-level support means you can train larger models with smaller memory footprints without sacrificing accuracy. Google says this doubles throughput while maintaining quality.
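To make "four-bit floating point" concrete, here's a software-only sketch of FP4 (E2M1) rounding, using the value grid from the OCP Microscaling spec. It simulates the rounding step only; the TPU 8t's actual datapath and scaling scheme aren't public:

```python
import numpy as np

# Non-negative values representable in FP4 E2M1 (1 sign, 2 exponent,
# 1 mantissa bit), per the OCP Microscaling formats spec.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round x to the nearest FP4 value after per-tensor scaling."""
    scale = max(np.abs(x).max() / FP4_GRID[-1], 1e-12)  # map max|x| to 6
    mags = np.abs(x) / scale
    # Snap each magnitude to its nearest representable grid point.
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
print(np.abs(w - fake_quantize_fp4(w)).max())  # worst-case rounding error
```

With only eight magnitudes per sign, everything rides on the scaling. Hardware support means that rescaling and rounding happen inside the matrix units instead of in software, which is where the claimed throughput doubling would come from.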
TPU 8i: Built for the Agent Swarm
The inference chip is where things get really interesting for anyone building AI products. The TPU 8i packs three times more on-chip SRAM than Ironwood, which translates directly to larger key-value caches at inference time. If you’ve ever wondered why some chatbots feel slow on long conversations — it’s often because the KV cache doesn’t fit efficiently on the chip.
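To see why on-chip SRAM matters here, work out the KV-cache footprint of one long conversation. The model shape below is an illustrative assumption (a 70B-class model with grouped-query attention), not a TPU 8i spec:

```python
# KV-cache bytes for one request. Model shape is an assumption for
# illustration (70B-class, grouped-query attention), not an 8i spec.
layers, kv_heads, head_dim = 80, 8, 128
seq_len = 32_768         # one long conversation
bytes_per_value = 2      # bf16
# Keys and values (the 2x), per layer, per KV head, per position.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(f"{kv_bytes / 2**30:.1f} GiB per request")  # 10.0 GiB
```

At roughly 10 GiB per long-context request, even a few concurrent conversations spill out of on-chip memory and force HBM round-trips on every decode step. That spill is exactly what the extra SRAM is meant to absorb.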
Google engineered a new Collectives Acceleration Engine specifically for autoregressive decoding and chain-of-thought processing. That’s the bread and butter of how modern LLMs generate text token-by-token. Combined with a custom Boardfly ICI network topology connecting up to 1,152 chips, the 8i delivers 80% better performance-per-dollar at low-latency targets compared to Ironwood.
Network hops are cut by 50%, which matters enormously when you’re serving millions of concurrent requests. Every nanosecond of latency at that scale translates to real money.
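If you've never stared at one, a decoding loop makes the latency story obvious: each output token depends on the previous one, so the work is inherently serial along the sequence. A deliberately tiny, runnable stand-in (the lookup table replaces a real transformer forward pass; only the loop shape matters):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16
# Toy stand-in for a model: a fixed logit table keyed by the previous
# token. A real transformer forward pass goes here.
LOGITS = rng.normal(size=(VOCAB, VOCAB))

def generate(prompt, max_new_tokens=8, eos=0):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # One iteration produces exactly one token, and each iteration
        # depends on the last token emitted. This serial dependency is
        # why decode latency, not raw FLOPs, bounds inference speed.
        next_token = int(LOGITS[tokens[-1]].argmax())
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens

print(generate([3, 7]))
```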
[Chart: Performance Gains Over Ironwood (Previous Gen), Ironwood baseline = 1.0x.
Training improvements: TPU 8t price/perf 2.7x · TPU 8t perf/watt 2.0x.
Inference improvements: TPU 8i perf/$ 1.8x · TPU 8i perf/watt 2.0x.
Source: Google Cloud Next 2026]
What About Nvidia?
Google notably didn’t make direct comparisons to Nvidia’s Blackwell or GB200 chips during the announcement. But the subtext is impossible to miss. By splitting training and inference into dedicated silicon, Google can potentially undercut Nvidia’s one-size-fits-all GPU approach on both price and power efficiency.
The energy angle is significant. Both chips deliver 2x better performance-per-watt over the previous generation. With data centers increasingly constrained by power availability and cooling costs, that’s not just an engineering flex — it’s a competitive weapon. Nvidia’s latest GPUs are notorious power hogs, and Google is positioning itself as the greener, cheaper alternative.
That said, Google is still keeping Nvidia GPUs available in Google Cloud. They’re not burning that bridge. The strategy seems to be: use TPUs for Google’s own services and offer them as an alternative for cloud customers, while still hosting Nvidia hardware for customers who want it.
What This Means for Developers and Businesses

Both chips will be “generally available later this year” through Google Cloud. If the performance claims hold up in production, here’s what changes practically:
For teams training models: The TPU 8t's 2.7x price-performance improvement means your training budget goes almost three times further. A model that cost $1 million to train on Ironwood would theoretically cost around $370K on TPU 8t hardware (see the quick math after this list).
For teams running inference: The TPU 8i’s 80% performance-per-dollar improvement at low latency makes it substantially cheaper to serve AI-powered features to users. If you’re building products with AI agents, chatbots, or real-time generation, this is the chip Google designed specifically for your workload.
For the broader industry: Google just made the clearest bet yet that training and inference are diverging enough to need their own silicon. If this approach works, expect AMD and even Nvidia to follow with similar product splits within 18 months.
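For what it's worth, here's that cost arithmetic spelled out. These are vendor-claimed multipliers treated as cost divisors, with hypothetical dollar figures, not published pricing:

```python
# Vendor-claimed multipliers treated as cost divisors. Dollar amounts
# are hypothetical illustrations, not published pricing.
ironwood_training_run = 1_000_000    # a $1M training run on Ironwood
print(f"Same run on TPU 8t: ${ironwood_training_run / 2.7:,.0f}")  # ~$370,370

ironwood_serving_cost = 1.00         # $ per 1k requests, made up
print(f"Same traffic on TPU 8i: ${ironwood_serving_cost / 1.8:.3f} per 1k")
```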
My Quick Take
This is arguably the most strategically important chip announcement from Google since the original TPU. Not because any single spec is revolutionary on its own, but because the philosophy behind the split signals where the entire industry is going. We’re past the era of general-purpose AI chips. The workloads have gotten specific enough — and expensive enough — to justify specialized hardware for each phase of the AI pipeline.
Whether these chips actually threaten Nvidia’s dominance remains to be seen. Nvidia has an enormous software ecosystem advantage with CUDA. But Google’s playing the long game here, and with both performance and efficiency metrics showing serious gains, the TPU 8t and 8i deserve attention from anyone making infrastructure decisions in 2026.
FAQ
When will TPU 8t and TPU 8i be available?
Google has only said “later this year,” meaning sometime in 2026 through Google Cloud. No specific date has been announced yet, but based on Google’s typical rollout timeline, expect availability in H2 2026.
Do TPU 8t and 8i replace the Ironwood TPU?
Effectively, yes. The TPU 8t replaces Ironwood for training workloads with a 2.7x price-performance improvement, while the TPU 8i replaces it for inference with an 80% performance-per-dollar gain. Google’s Ironwood chips will likely remain available during the transition period.
Can I use Google’s new TPUs if I’m not a Google Cloud customer?
Currently, TPU access is exclusive to Google Cloud Platform. You’ll need a GCP account to provision TPU 8t or 8i instances once they launch. There’s no on-premises or third-party cloud option for Google TPUs.