DeepSeek’s new AI model appears to be one of the best ‘open’ challengers yet

A Chinese lab has created what appears to be one of the most powerful “open” AI models to date.

The model, DeepSeek V3, was developed by artificial intelligence firm DeepSeek and was released Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
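
For developers who want to experiment, a minimal sketch of loading the model with Hugging Face’s transformers library might look like the following. The repo ID and generation settings here are assumptions for illustration, not details confirmed by DeepSeek, and the full model is far too large to run on a single consumer GPU.

```python
# Minimal sketch: downloading and prompting DeepSeek V3 via Hugging Face
# transformers. The repo ID "deepseek-ai/DeepSeek-V3" is an assumption here,
# and in practice the full model needs a multi-GPU server to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

prompt = "Write a short email declining a meeting."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```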

DeepSeek V3 can handle a range of text-based workloads and tasks, such as coding, translating, and writing essays and emails from a descriptive prompt.

According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both “openly” available downloadable models and “closed” AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperforms models such as Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.

DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates with existing code.

DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data; 1 million tokens is equivalent to about 750,000 words.
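
As a quick scale check, here is the arithmetic behind that ratio applied to the training set. The words-per-token figure is the heuristic cited above; actual ratios vary by tokenizer and language.

```python
# Rough scale check: 1M tokens ≈ 750,000 words, i.e. ~0.75 words per token.
TRAINING_TOKENS = 14_800_000_000_000  # 14.8 trillion tokens (DeepSeek's claim)
WORDS_PER_TOKEN = 0.75                # heuristic cited in this article

approx_words = TRAINING_TOKENS * WORDS_PER_TOKEN
print(f"~{approx_words:,.0f} words")  # ~11,100,000,000,000 (~11.1 trillion words)
```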

It’s not just the training set that is huge. DeepSeek V3 is huge in size: 685 billion parameters. (Parameters are the internal variables a model uses to make predictions or decisions.) That’s about 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.

Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But large models also require beefier hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.
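
To make that concrete, here is a back-of-envelope sketch of the memory math. The FP16 precision and 80 GB per-GPU figures are illustrative assumptions, not details from DeepSeek.

```python
# Back-of-envelope: why an unoptimized 685B-parameter model needs many GPUs.
# Assumptions (not from the article): FP16 weights at 2 bytes per parameter,
# 80 GB of memory per high-end data-center GPU.
import math

PARAMS = 685_000_000_000  # 685B parameters
BYTES_PER_PARAM = 2       # FP16
GPU_MEMORY_GB = 80

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
min_gpus = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"Weights alone: ~{weights_gb:,.0f} GB -> at least {min_gpus} GPUs")
# Weights alone: ~1,370 GB -> at least 18 GPUs (before activations and KV cache)
```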

While not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model in just about two months using a data center of Nvidia H800 GPUs, hardware that Chinese companies were recently restricted from procuring by the U.S. Department of Commerce. The company also claims it spent only $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI’s GPT-4.

The downside is that the model’s responses on politically sensitive topics are filtered. Ask DeepSeek V3 about Tiananmen Square, for example, and it won’t answer.

DeepSeek, being a Chinese company, is subject to benchmarking by China’s internet regulator to ensure its models’ responses “incorporate core socialist values.” Many Chinese AI systems decline to respond to topics that could draw the ire of regulators, such as speculation about the Xi Jinping regime.

DeepSeek, which recently introduced DeepSeek-R1, a response to OpenAI’s o1 “reasoning” model, is a curious organization. It is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses artificial intelligence to inform its trading decisions.

DeepSeek’s models have forced competitors such as ByteDance, Baidu and Alibaba to reduce the usage prices of some of their models and make others completely free.

High-Flyer builds its own server clusters for model training; one of the most recent reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “super-intelligent” AI through its organization DeepSeek.

In an interview earlier this year, Liang described open source as a “cultural act” and characterized closed-source AI like OpenAI’s as a “temporary” moat. “Even OpenAI’s closed-source approach hasn’t stopped others from catching up,” he noted.

Indeed.
