Why DeepSeek Could Change What Silicon Valley Believes About A.I.

by Pelican Press
8 minute read


The artificial intelligence breakthrough that is sending shock waves through stock markets, spooking Silicon Valley giants, and generating breathless takes about the end of America’s technological dominance arrived with an unassuming, wonky title: “Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.”

The 22-page paper, released last week by a scrappy Chinese A.I. start-up called DeepSeek, didn’t immediately set off alarm bells. It took a few days for researchers to digest the paper’s claims, and the implications of what it described. The company had created a new A.I. model called DeepSeek-R1, built by a team of researchers who claimed to have used a modest number of second-rate A.I. chips to match the performance of leading American A.I. models at a fraction of the cost.

DeepSeek said it had done this by using clever engineering to substitute for raw computing horsepower. And it had done it in China, a country many experts thought was in a distant second place in the global A.I. race.

Some industry watchers initially reacted to DeepSeek’s breakthrough with disbelief. Surely, they thought, DeepSeek had cheated to achieve R1’s results, or fudged its numbers to make its model look more impressive than it was. Maybe the Chinese government was promoting propaganda to undermine the narrative of American A.I. dominance. Maybe DeepSeek was hiding a stash of illicit Nvidia H100 chips, banned under U.S. export controls, and lying about it. Maybe R1 was actually just a clever re-skinning of American A.I. models that didn’t represent much in the way of real progress.

Eventually, as more people dug into the details of DeepSeek-R1 — which, unlike most leading A.I. models, was released as open-source software, allowing outsiders to examine its inner workings more closely — their skepticism morphed into worry.

And late last week, when lots of Americans started to use DeepSeek’s models for themselves, and the DeepSeek mobile app hit the number one spot on Apple’s App Store, it tipped into full-blown panic.

I’m skeptical of the most dramatic takes I’ve seen over the past few days — such as the claim, made by one Silicon Valley investor, that DeepSeek is an elaborate plot by the Chinese government to destroy the American tech industry. I also think it’s plausible that the company’s shoestring budget has been badly exaggerated, or that it piggybacked on advancements made by American A.I. firms in ways it hasn’t disclosed.

But I do think that DeepSeek’s R1 breakthrough was real. Based on conversations I’ve had with industry insiders, and a week’s worth of experts poking around and testing the paper’s findings for themselves, it appears to be throwing into question several major assumptions the American tech industry has been making.

The first is the assumption that in order to build cutting-edge A.I. models, you need to spend huge amounts of money on powerful chips and data centers.

It’s hard to overstate how foundational this dogma has become. Companies like Microsoft, Meta and Google have already spent tens of billions of dollars building out the infrastructure they thought was needed to build and run next-generation A.I. models. They plan to spend tens of billions more — or, in the case of OpenAI, as much as $500 billion through a joint venture with Oracle and SoftBank that was announced last week.

DeepSeek appears to have spent a small fraction of that building R1. We don’t know the exact cost, and there are plenty of caveats to make about the figures it has released so far. The true cost is almost certainly higher than $5.5 million, the amount the company says it spent training a previous model.

But even if R1 cost 10 times more to train than DeepSeek claims, and even if you factor in other costs it may have excluded, like engineer salaries or the cost of doing basic research, it would still be orders of magnitude less than what American A.I. companies are spending to develop their most capable models.

The obvious conclusion to draw is not that American tech giants are wasting their money. It’s still expensive to run powerful A.I. models once they’re trained, and there are reasons to think that spending hundreds of billions of dollars will still make sense for companies like OpenAI and Google, which can afford to pay dearly to stay at the head of the pack.

But DeepSeek’s breakthrough on cost challenges the “bigger is better” narrative that has driven the A.I. arms race in recent years by showing that relatively small models, when trained properly, can match or exceed the performance of much bigger models.

That, in turn, means that A.I. companies may be able to achieve very powerful capabilities with far less investment than previously thought. And it suggests that we may soon see a flood of investment into smaller A.I. start-ups, and much more competition for the giants of Silicon Valley. (Which, because of the enormous costs of training their models, have mostly been competing with each other until now.)

There are other, more technical reasons that everyone in Silicon Valley is paying attention to DeepSeek. In the research paper, the company reveals some details about how R1 was actually built, including some cutting-edge techniques in model distillation. (Basically, that means compressing big A.I. models down into smaller ones, making them cheaper to run without losing much in the way of performance.)
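For readers who want to see the idea in code, here is a minimal sketch of one textbook form of distillation: a small “student” model is trained to imitate the output distribution of a larger “teacher.” The model sizes, data and temperature below are illustrative assumptions, and the specifics differ from whatever DeepSeek actually did; this is only meant to show the general shape of the technique.

```python
# A minimal knowledge-distillation sketch: a small student learns to match the
# softened output probabilities of a frozen, larger teacher. All sizes and data
# here are placeholders, not DeepSeek's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, temperature = 1000, 2.0

# Stand-ins for a large teacher and a much smaller student (hypothetical sizes).
teacher = nn.Sequential(nn.Embedding(vocab_size, 512), nn.Linear(512, vocab_size))
student = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, 32))  # a dummy batch of token ids

with torch.no_grad():
    teacher_logits = teacher(tokens)  # the teacher's predictions, kept frozen

student_logits = student(tokens)

# Soften both distributions and pull the student toward the teacher via KL divergence.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature**2

loss.backward()
optimizer.step()
```

The design choice worth noticing is that the student never sees “right answers” directly; it only learns to reproduce the teacher’s behavior, which is why a much smaller, cheaper model can recover most of the larger one’s performance.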

DeepSeek also included details that suggested that it had not been as hard as previously thought to convert a “vanilla” A.I. language model into a more sophisticated reasoning model, by applying a technique known as reinforcement learning on top of it. (Don’t worry if these terms go over your head — what matters is that methods for improving A.I. systems that were previously closely guarded by American tech companies are now out there on the web, free for anyone to take and replicate.)
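As a rough illustration of that second idea only: below is a bare-bones reinforcement learning loop (plain REINFORCE with a toy, rule-based reward) applied on top of a tiny stand-in language model. DeepSeek’s paper describes a more sophisticated method, and every model, reward function and number here is a placeholder, not the company’s actual recipe.

```python
# A toy sketch of reinforcement learning on top of a language model: sample
# completions, score them with an automatic check, and upweight the ones that
# scored well. Plain REINFORCE with placeholders, not DeepSeek's method.
import torch
import torch.nn as nn

vocab_size, seq_len, batch = 100, 16, 4
policy = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def reward_fn(sequence: torch.Tensor) -> float:
    # Hypothetical rule-based reward: 1.0 if the completion ends with a chosen
    # "answer" token, else 0.0. Real systems check things like math correctness.
    return 1.0 if sequence[-1].item() == 42 else 0.0

prompts = torch.randint(0, vocab_size, (batch, 4))

# Sample continuations token by token and keep their log-probabilities.
sequences, log_probs = [], []
for prompt in prompts:
    tokens = prompt.clone()
    step_log_probs = []
    for _ in range(seq_len):
        logits = policy(tokens.unsqueeze(0))[0, -1]
        dist = torch.distributions.Categorical(logits=logits)
        next_token = dist.sample()
        step_log_probs.append(dist.log_prob(next_token))
        tokens = torch.cat([tokens, next_token.unsqueeze(0)])
    sequences.append(tokens)
    log_probs.append(torch.stack(step_log_probs).sum())

# Score each completion and nudge the model toward the ones the reward liked.
rewards = torch.tensor([reward_fn(seq) for seq in sequences])
baseline = rewards.mean()  # simple baseline to reduce variance
loss = -((rewards - baseline) * torch.stack(log_probs)).mean()

loss.backward()
optimizer.step()
```

The point of the sketch is the overall loop, not the details: the only supervision comes from an automatic reward check on the model’s own outputs, which is what makes this kind of post-training comparatively cheap to run at scale.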

Even if the stock prices of American tech giants recover in the coming days, the success of DeepSeek raises important questions about their long-term A.I. strategies. If a Chinese company is able to build cheap, open-source models that match the performance of expensive American models, why would anyone pay for ours? And if you’re Meta — the only U.S. tech giant that releases its models as free open-source software — what prevents DeepSeek or another start-up from simply taking your models, which you spent billions of dollars on, and distilling them into smaller, cheaper models that they can offer for pennies?

DeepSeek’s breakthrough also undercuts some of the geopolitical assumptions many American experts had been making about China’s position in the A.I. race.

First, it challenges the narrative that China is meaningfully behind the frontier when it comes to building powerful A.I. models. For years, many A.I. experts (and the policymakers who listen to them) have assumed that the United States had a lead of at least several years, and that it would be prohibitively hard for Chinese companies to quickly copy the advancements made by American tech firms.

But DeepSeek’s results show that China has advanced A.I. capabilities that can match or exceed models from OpenAI and other American A.I. companies, and that breakthroughs made by U.S. firms may be trivially easy for Chinese firms — or, at least, one Chinese firm — to replicate in a matter of weeks.

(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement of news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)

The results also raise questions about whether the steps the U.S. government has been taking to limit the spread of powerful A.I. systems to our adversaries — namely, the export controls used to prevent powerful A.I. chips from falling into China’s hands — are working as designed, or whether those regulations need to adapt to take into account new, more efficient ways of training models.

And, of course, there are concerns about what it would mean for privacy and censorship if China took the lead in building powerful A.I. systems used by millions of Americans. Users of DeepSeek’s models have noticed that they routinely refuse to respond to questions about sensitive topics inside China, such as the Tiananmen Square massacre and Uyghur detention camps. If other developers build on top of DeepSeek’s models, as is common with open-source software, those censorship measures may get embedded across the industry.

Privacy experts have also raised concerns that data shared with DeepSeek’s models may be accessible to the Chinese government. If you were worried about TikTok being used as an instrument of surveillance and propaganda, the rise of DeepSeek should worry you, too.

I’m still not sure what the full impact of DeepSeek’s breakthrough will be, or whether we will consider the release of R1 a “Sputnik moment” for the A.I. industry, as some have claimed.

But it seems wise to take seriously the possibility that we are in a new era of A.I. brinkmanship now — that the biggest and richest American tech companies may no longer win by default, and that containing the spread of increasingly powerful A.I. systems may be harder than we thought.

At the very least, DeepSeek has shown that the A.I. arms race is truly on, and that after several years of dizzying progress, there are still more surprises left in store.


