Moore Threads GPUs allegedly show ‘excellent’ inference performance with DeepSeek models

by Pelican Press


One of the breakthroughs of DeepSeek’s open source AI models is that they can be run locally using relatively inexpensive hardware, like the Raspberry Pi.
As it turns out, the DeepSeek V3 and R1 models can reportedly even be run on GPUs developed by China's Moore Threads, reports ITHome. If true, this would be a significant achievement for DeepSeek, for the hardware designer, and for China more broadly, as it potentially opens new doors for Moore Threads and reduces DeepSeek's and China's reliance on Nvidia hardware. 

Moore Threads reportedly says it has successfully deployed the DeepSeek-R1-Distill-Qwen-7B distilled model on its MTT S80 client graphics card and MTT S4000 datacenter-grade graphics card. The company used Ollama, a lightweight framework that lets users run large language models directly on macOS, Linux, and Windows machines, along with an optimized inference engine to achieve 'high' performance. 

Although the report claims 'excellent' and 'high' performance when describing how the MTT S80 and MTT S4000 handle the DeepSeek-R1-Distill-Qwen-7B distilled model, it does not provide actual performance numbers or comparisons to other hardware, so the claims cannot be evaluated. Furthermore, because the MTT S80 is barely available outside of China, they are difficult to verify independently. 

Ollama supports models like Llama 3.3, DeepSeek-R1, Phi-4, Mistral, and Gemma 2, enabling their efficient local execution without relying on cloud-based services. Originally developed for macOS, Ollama uses Metal for Apple GPU acceleration, CUDA for Nvidia GPU acceleration, and ROCm for AMD GPU acceleration. 
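For context, the Ollama workflow the report describes is straightforward on supported hardware: install the runtime, then pull and run a model with a single command. The sketch below uses the `deepseek-r1:7b` tag from Ollama's public model library; the exact tag Moore Threads used is not stated in the report.

```shell
# Install Ollama on Linux (macOS and Windows installers are on ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run the 7B DeepSeek-R1 distill; Ollama automatically selects
# the best available backend (Metal, CUDA, ROCm, or CPU)
ollama run deepseek-r1:7b "Summarize what a distilled model is."
```

On unsupported GPUs such as Moore Threads's, this flow would depend on the vendor's claimed CUDA compatibility layer rather than an official Ollama backend.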

Officially, Ollama does not support Moore Threads's GPUs, but the company claims that its graphics processors can execute code compiled for CUDA GPUs. According to Moore Threads, the results confirm that its GPUs are indeed compatible with CUDA and suitable for AI workloads, particularly in Chinese-language applications. 

To further enhance performance, Moore Threads employed a proprietary inference engine featuring custom computational optimizations and improved memory management. This software-hardware integration significantly boosts computing performance and resource efficiency while ensuring a smooth deployment process and support for future AI models, according to the report. Of course, we are talking about a distilled model, so for now we cannot really compare the performance of Moore Threads GPUs with that of solutions from AMD, Apple, or Nvidia.


