Meta is using more than 100,000 Nvidia H100 AI GPUs to train Llama-4 — Mark Zuckerberg says that Llama 4 is being trained on a cluster “bigger than anything that I’ve seen”

by Pelican Press

Mark Zuckerberg said on Meta's earnings call earlier this week that the company is training Llama 4 models "on a cluster that is bigger than 100,000 H100 AI GPUs, or bigger than anything that I've seen reported for what others are doing." While the Facebook founder didn't give details on what Llama 4 will be able to do, Wired quoted Zuckerberg describing Llama 4 as having "new modalities" and "stronger reasoning" and as being "much faster." This is a crucial development as Meta competes against other tech giants like Microsoft, Google, and Musk's xAI to develop the next generation of large language models (LLMs).

Meta isn't the first company to have an AI training cluster with 100,000 Nvidia H100 GPUs. Elon Musk's xAI fired up a similarly sized cluster in late July, calling it a "Gigafactory of Compute," with plans to double it to 200,000 AI GPUs. However, Meta said earlier this year that it expects to have the equivalent of more than half a million H100s by the end of 2024, so it likely already has a significant number of AI GPUs available for training Llama 4.

Meta is taking a distinctive approach with Llama 4: it releases its Llama models for free, allowing other researchers, companies, and organizations to build on them. This differs from models like OpenAI's GPT-4o and Google's Gemini, which are accessible only through an API. However, Meta still places limitations in Llama's license, such as restrictions on some commercial use, and it does not disclose how the models were trained. Nevertheless, Llama's "open source" nature could help it dominate the future of AI; we've already seen Chinese AI models built on open-source code that can match GPT-4o and Llama 3 in benchmark tests.

Power consumption concerns

All this computing power creates a massive power demand: a single modern AI GPU could use up to 3.7MWh of energy annually. That means a 100,000-GPU cluster could consume up to 370GWh a year for the GPUs alone, enough to power over 34,000 average American households. This raises concerns about how these companies will secure such massive supplies of electricity, especially since bringing new power sources online takes time. After all, even Zuckerberg has said that power constraints will limit AI growth.
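The cluster and household figures can be checked with simple unit conversion. The sketch below is a back-of-envelope calculation, not Meta's accounting: it assumes the article's upper-bound figure of 3.7MWh per GPU per year and an approximate average U.S. household consumption of about 10,800kWh per year (an assumed value, not one given in the article), and it counts GPU energy only, ignoring cooling, CPUs, and networking.

```python
# Back-of-envelope estimate of annual GPU energy use for a large training cluster.
# Assumptions (not from Meta): 3.7 MWh per GPU per year is the article's upper-bound
# figure; ~10,800 kWh/year for an average U.S. household is an approximate, assumed value.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def cluster_energy_gwh(num_gpus: int, mwh_per_gpu_per_year: float) -> float:
    """Annual energy for the GPUs alone, in GWh (excludes cooling, CPUs, networking)."""
    return num_gpus * mwh_per_gpu_per_year / 1_000  # 1 GWh = 1,000 MWh


def households_equivalent(energy_gwh: float, household_kwh_per_year: float) -> int:
    """Number of average households the same energy could supply for one year."""
    energy_kwh = energy_gwh * 1_000_000  # 1 GWh = 1,000,000 kWh
    return int(energy_kwh // household_kwh_per_year)


def average_draw_watts(mwh_per_year: float) -> float:
    """Average continuous power implied by an annual energy figure."""
    return mwh_per_year * 1_000_000 / HOURS_PER_YEAR  # MWh -> Wh, then divide by hours


if __name__ == "__main__":
    energy = cluster_energy_gwh(100_000, 3.7)
    print(f"Cluster GPU energy: {energy:.0f} GWh/year")
    print(f"Household equivalent: {households_equivalent(energy, 10_800):,}")
    print(f"Implied average draw per GPU: {average_draw_watts(3.7):.0f} W")
```

Run as a script, it prints a cluster total of 370GWh, a household equivalent of 34,259, and an implied average draw of about 422W per GPU. That average is well below the 700W maximum rating of an H100 SXM module, which suggests the 3.7MWh figure assumes the GPUs are not running at peak load around the clock.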

For example, Elon Musk used several large mobile power generators to run his 100,000-GPU cluster in Memphis. Google has been falling behind its carbon targets, with its greenhouse gas emissions up 48% since 2019. Former Google CEO Eric Schmidt has even suggested that we should drop our climate goals, let AI companies go full tilt, and then use the AI technologies we've developed to solve the climate crisis.

However, Meta executives dodged the question when an analyst asked them how the company was able to power such a massive computing cluster. On the other hand, Meta’s AI competitors, like Microsoft, Google, Oracle, and Amazon, are jumping on the nuclear bandwagon. They’re either investing in small modular reactors or restarting old nuclear plants to ensure they will have enough electricity to power their future developments.

While these projects will take time to develop and deploy, giving AI data centers their own small nuclear plants would help reduce the burden these power-hungry clusters place on the national power grid.
