Microsoft Probes DeepSeek for Alleged OpenAI Data Misuse in AI Model Training

Microsoft, a significant investor in OpenAI, is reportedly investigating Chinese AI company DeepSeek for allegedly violating OpenAI's terms of service. The investigation centers on whether DeepSeek used OpenAI's application programming interface (API) to train its recently announced R1 model. The probe follows public statements by White House AI and crypto czar David Sacks, who suggested that DeepSeek might have illicitly obtained intellectual property from the United States.
Allegations of Data Theft and Model Training
Sacks indicated that there is substantial evidence suggesting DeepSeek "distilled the knowledge" from OpenAI's models. This practice, known as distillation, involves a teacher-student dynamic in which one model (the student) is trained on the outputs of another (the teacher). Such a method could explain how DeepSeek reportedly trained its models so quickly and cheaply, spending only $5.6 million and using less powerful Nvidia H800 chips. However, it raises serious questions about the legality and ethics of the company's development process.
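For readers unfamiliar with the teacher-student idea, the following is a minimal sketch of a classic distillation loss. It is purely illustrative and does not describe how DeepSeek trained any model. The function names, the temperature value, and the example scores are all assumptions made for this sketch, and a student limited to API access would more likely learn from a teacher's generated text than from raw scores like these.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    A smaller value means the student's predictions are closer to the teacher's.
    """
    p = softmax(teacher_logits, temperature)  # teacher: the target distribution
    q = softmax(student_logits, temperature)  # student: the distribution being trained
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 1.0]
print(distillation_loss(teacher, student))
```

During training, the student's parameters would be adjusted to push this loss toward zero, so the student gradually reproduces the teacher's behavior without access to the teacher's original training data.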
Reverse Engineering vs. Data Exfiltration
Prior to this investigation, industry experts had speculated that DeepSeek might have employed reverse engineering to train its models. Reverse engineering, which involves analyzing models to identify patterns and biases, is a common and legal practice among open-source developers. However, the current allegations suggest a more direct and potentially unlawful method: exfiltrating data directly through OpenAI's API. Security researchers, reportedly working with Microsoft, believe DeepSeek may have acquired a significant amount of data through OpenAI's API in the fall of 2024, with Microsoft being informed of the breach at that time.
DeepSeek's Open-Source Approach and OpenAI's Stance
DeepSeek has been recognized for its open-source AI applications, which allow anyone to develop on its platform. This open-source nature, coupled with performance rivaling that of top AI brands such as OpenAI and Google's Gemini, has generated considerable excitement. In contrast, OpenAI's services are not open-source, although its API is publicly accessible. OpenAI's terms of service explicitly prohibit using output from its API to train other AI models. An OpenAI spokesperson acknowledged that attempts to copy models from well-known U.S. companies are common, regardless of regulations. The company says it is actively implementing countermeasures to protect its intellectual property and is collaborating with the U.S. government to safeguard its most capable models from adversaries and competitors.
Industry Context and Future Implications
The AI industry is highly competitive, with companies constantly seeking advantages in model development and efficiency. DeepSeek's rapid progress and cost-effectiveness have positioned it as a notable player, but the current investigation could have significant implications for its operations and the broader AI landscape. The company may need to provide evidence of its lawful development practices to address these concerns.
Related Developments and News
- Microsoft's AI Initiatives: Microsoft continues to invest heavily in AI, preparing its infrastructure for future OpenAI models like GPT-4.5 and GPT-5. The company is also exploring AI for its own products, such as Copilot.
- OpenAI's Growth: OpenAI has reported a substantial increase in weekly active users, reaching 400 million, highlighting its dominant position in the generative AI market.
- AI Competition: The industry sees constant innovation and competition, with companies like Alibaba introducing rivals to DeepSeek's models. The focus on open-source versus proprietary models remains a key differentiator.
Author Information
Fionna Agomuoh, a Computing Writer at Digital Trends, covers a wide range of topics in the computing space, including AI, software, and hardware. Her work aims to keep readers informed about the latest technological advancements and trends.
©2025 Digital Trends Media Group. All rights reserved.
Original article available at: https://www.digitaltrends.com/computing/microsoft-investigates-the-legality-of-deepseeks-r1-model/