ChatGPT Maker Suspects China’s Dirt Cheap DeepSeek AI Models Were Built Using OpenAI Data — and the Irony Is Not Lost on the Internet

By Christian, Feb 23, 2025

OpenAI suspects that China's DeepSeek AI models, which are significantly cheaper than their Western counterparts, may have been trained using OpenAI's data. The suspicion comes after DeepSeek's rapid rise in popularity triggered a sharp sell-off among major AI players. Nvidia, a key supplier of AI GPUs, suffered the largest single-day loss in market value in its history, while Microsoft, Meta, Alphabet, and Dell also saw substantial drops.

DeepSeek's R1 model, built on the open-source DeepSeek-V3, reportedly required far less computing power and money to train than Western models, with training costs estimated at around $6 million. Although this figure is disputed, it has raised questions about whether the massive sums Western companies are pouring into AI are justified.

OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service through a technique called "distillation," in which a smaller model is trained on the outputs of a larger, more capable one. OpenAI says that Chinese companies, among others, are constantly trying to replicate the models of leading U.S. AI companies. It says it is taking countermeasures and working with the U.S. government to protect its intellectual property.

David Sacks, President Trump's AI czar, has backed the claim that DeepSeek drew on OpenAI's models, and said he expects leading AI companies to take steps to prevent this kind of distillation.

The irony of OpenAI's position has not gone unnoticed, given that OpenAI itself has been accused of using copyrighted material without permission to train ChatGPT. The company previously told the UK's House of Lords that it would be impossible to train today's leading large language models without copyrighted material. Its position is further complicated by ongoing lawsuits from the New York Times and a group of 17 authors alleging copyright infringement; OpenAI maintains that its training practices constitute "fair use." Legal battles over AI training data and copyright continue to unfold, and an August 2023 U.S. court ruling that AI-generated art cannot be copyrighted has added another layer of complexity.

DeepSeek is accused of distilling OpenAI's models to train a competitor. Image credit: Andrey Rudakov/Bloomberg via Getty Images.