Exclusive: Nvidia buying AI chip startup Groq's assets for about $20 billion in largest deal on record

Hello and welcome to AI with the Rome Show. Twenty billion dollars. That’s the amount Nvidia has just put on the table to change the future of artificial intelligence.
For the past few years, Nvidia has been the undisputed king of AI. But in December 2025, they made a move that surprised everyone. They didn’t just buy a competitor.
They bought a whole new way of thinking. Today, we’re breaking down the Groq acquisition.
Why did Nvidia pay such a big premium for a startup, and why does this signal the end of the training era and the beginning of something much faster? First, let’s look at the deal itself.
Inside "The Deal of the Decade"
Breaking news: according to CNBC’s exclusive report, Nvidia has agreed to pay about $20 billion in cash for the assets of chip startup Groq.
Now, hold on to that detail. This isn’t a standard merger. This is, quote, an "asset acquisition and licensing deal." Nvidia is buying technology and hiring key talent, including founder Jonathan Ross.
But Groq, the entity, will technically remain an independent company operating its own cloud business. Why such a structure? We’ll get to the legal 4D chess later.
But first, you need to understand why Nvidia, a near-monopoly company, felt the need to spend that kind of money.
Training vs. Inference: The Hard Hat Phase
To understand why, we need to look at where we are in the AI timeline. For the past three years, we’ve been in what analysts call the hard-hat phase. Imagine a construction site.
We’re building these massive skyscrapers, which are AI models like GPT-4 or Claude. They require heavy machinery and brute force to build.
That’s what Nvidia GPUs are famous for. But now the building is finished. We’re moving into the inference era. This is where people actually go in and use the building.
The goal is no longer brute force. It’s getting people in and out of the elevator as quickly as possible. And that’s where Nvidia started to see trouble. Here’s the problem.
The "One User" Problem (The Inference Wedge)
Nvidia’s chips are like big buses. If you fill all 100 seats and drive everyone to the same place, they’re incredibly efficient. But real-world AI, like when you chat with a bot,
is like a taxi service. It’s just one person asking one question at a time. That’s called batch size one. When you use a big Nvidia GPU bus to pick up just one passenger,
it’s expensive and incredibly slow. That difference is called the inference wedge. Nvidia realized that as AI moved from training to real-time chatting, their bus architecture wasn’t efficient enough for the taxi market. That brings us to the technology itself.
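As a back-of-the-envelope sketch of the bus-versus-taxi economics (the dollar figures and timings below are invented for illustration, not from the report):

```python
# Toy model of the "inference wedge": a GPU batch costs roughly the same
# to run whether it carries 1 request or 100, so cost per request
# explodes at batch size one. All numbers are hypothetical.
GPU_COST_PER_SECOND = 1.00   # assumed prorated cost of running the chip
BATCH_TIME_S = 0.5           # assumed wall-clock time to serve one batch

def cost_per_request(batch_size: int) -> float:
    """Split the fixed cost of one batch across its passengers."""
    return GPU_COST_PER_SECOND * BATCH_TIME_S / batch_size

bus = cost_per_request(100)  # full bus: 100 riders share the fare
taxi = cost_per_request(1)   # taxi: one rider pays for the whole trip
print(f"batch 100: ${bus:.4f}/request  batch 1: ${taxi:.4f}/request")
```

Under these made-up numbers, the single-passenger request costs 100x more per rider, which is the wedge the transcript describes.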
Architecture Wars: GPU vs. LPU
On the left, we have the Nvidia GPU. It uses a probabilistic approach. Think of it like a busy highway intersection: there are traffic lights and police officers managing the cars.
Sometimes you get through in 5 seconds, sometimes it takes 10 because of the traffic. On the right, we have Groq’s invention, the LPU, the language processing unit.
This is deterministic. There are no traffic lights. The software plans the entire journey before the car leaves the garage. It knows exactly where every piece of data will be at every clock cycle. This is not traffic. This is a precisely timed train schedule.
Why Determinism is the Secret Sauce
Why does it matter? Because Groq removed all the hardware that manages traffic. They don't have schedulers or arbiters on the chip. This allows for fork-free execution.
Every tick of the clock is used to perform a calculation, not to decide what to do next. Because of this, Groq chips have zero latency variance:
if a task takes 5 milliseconds today, it will take exactly 5 milliseconds next year. There's no guessing. For Nvidia, this was about buying certainty in an uncertain world.
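A tiny sketch of what "the software plans the entire journey" means. With a compile-time schedule, latency is a fixed arithmetic fact rather than a run-time outcome (the op names and cycle counts below are made up for illustration):

```python
# Toy static schedule: each op's start cycle is decided at compile time,
# so end-to-end latency is knowable, and identical, before any run.
# Ops and cycle counts are hypothetical.
DURATIONS = {"load": 2, "matmul": 5, "store": 1}       # cycles per op
SCHEDULE = [("load", 0), ("matmul", 2), ("store", 7)]  # (op, start cycle)

def static_latency(schedule):
    """Latency = last op's start cycle + its duration; no execution needed."""
    op, start = schedule[-1]
    return start + DURATIONS[op]

print(static_latency(SCHEDULE), "cycles, on every single run")
```

Contrast this with a dynamically arbitrated design, where the answer would depend on what else happens to be contending for the hardware at run time.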
Tokenomics: 300 Tokens Per Second?
Let's talk numbers. In the world of AI, speed is measured in tokens per second. Check out this chart. For a single user request, a standard Nvidia GPU can give you about 30 tokens per second.
That's reading speed. Groq? They can hit 300 tokens per second. That's effectively instant. Groq chips do require many units working together, because they don't use high-capacity memory,
but they process tokens so fast that the cost per token drops significantly for the end user. Nvidia didn't just buy speed; they bought the ability to make AI cheaper to run.
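To put those throughput figures in user terms, here is the arithmetic for one full answer (the 30 vs. 300 tokens-per-second numbers come from the transcript; the answer length is an assumed typical value):

```python
# Turn tokens-per-second into the wait a user actually perceives
# while one complete answer streams out.
ANSWER_TOKENS = 400  # assumed typical response length

for name, tps in [("GPU @ 30 tok/s", 30), ("LPU @ 300 tok/s", 300)]:
    print(f"{name}: {ANSWER_TOKENS / tps:.1f} s for the full answer")
# Roughly 13 seconds versus just over 1 second for the same answer.
```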
Real-World Use Cases: Robotics & Finance
You might be thinking, “Do I really need my chatbot to type fast?” Probably not. But the future isn’t just chatbots. Think about voice agents.
If you’re talking to an AI and it takes 2 seconds to respond, the conversation feels awkward and broken. Groq makes responses as quick as a real human’s, in under 200 milliseconds.
Think about robotics. If a robot sees a wall, it can’t wait 100 milliseconds to decide to stop. It needs a guaranteed response time. This is a market Nvidia wants to own. Nvidia isn’t just buying silicon.
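For a feel of why that 200-millisecond line matters, here is a rough voice-agent latency budget. Every per-stage figure is an assumption for illustration; only the 200 ms target comes from the transcript:

```python
# Hypothetical end-to-end budget for a voice agent that must respond
# within ~200 ms to feel conversational.
BUDGET_MS = 200
stages = {
    "speech-to-text": 60,   # assumed
    "LLM first token": 80,  # assumed; this is where fast inference helps
    "text-to-speech": 40,   # assumed
}
used = sum(stages.values())
print(f"{used} ms used, {BUDGET_MS - used} ms of headroom")
```

With numbers like these, shaving the model's response time is the difference between fitting inside the budget and blowing past it into "awkward pause" territory.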