Ai2's Tulu 3 405B AI Model Outperforms Competitors, Emphasizes US Open-Source Leadership

Ai2's Tulu 3 405B: A New American AI Champion
Move over, DeepSeek. There's a new AI champion in town, and it's American. On Thursday, Ai2, a nonprofit AI research institute based in Seattle, released a model that it claims outperforms DeepSeek V3, one of Chinese AI company DeepSeek's leading systems. Ai2's model, called Tulu 3 405B, also beats OpenAI's GPT-4o on certain AI benchmarks, according to Ai2's internal testing. Moreover, unlike GPT-4o and even DeepSeek V3, Tulu 3 405B is open source, meaning all components necessary to replicate it from scratch are freely available and permissively licensed.
Ai2's Vision for US AI Leadership
A spokesperson for Ai2 told TechCrunch that the lab believes Tulu 3 405B "underscores the U.S.' potential to lead the global development of best-in-class generative AI models." Ai2 presents the release as a key moment for the future of open AI, one that reinforces the U.S.' position as a leader in competitive, open-source models and shows that such models can be built independently of the tech giants.
Understanding Tulu 3 405B's Scale and Training
Tulu 3 405B is a large model, with 405 billion parameters, and training it required 256 GPUs running in parallel. A model's parameter count roughly corresponds to its problem-solving ability, and models with more parameters generally perform better than those with fewer.
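To give a rough sense of why a model this size has to be spread across many GPUs, the sketch below works out how much memory the weights alone would occupy. The 16-bit precision and the 80 GB per-GPU figure are assumptions chosen for illustration, not details from Ai2, and the function names are hypothetical. Training also needs memory for gradients and optimizer state, so the result is only a floor and says nothing about Ai2's actual setup.

```python
import math

PARAMS = 405e9        # parameter count reported for Tulu 3 405B
BYTES_PER_PARAM = 2   # assumption: weights stored in a 16-bit format
GPU_MEMORY_GB = 80    # assumption: 80 GB of memory per GPU


def weights_gb(params, bytes_per_param):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params, bytes_per_param, gpu_memory_gb):
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(weights_gb(params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    print(weights_gb(PARAMS, BYTES_PER_PARAM))
    print(min_gpus_for_weights(PARAMS, BYTES_PER_PARAM, GPU_MEMORY_GB))
```

Under these assumptions the weights alone come to 810 GB, more than any single GPU of that size can hold.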
Key Training Techniques: Reinforcement Learning with Verifiable Rewards (RLVR)
Ai2 attributes the competitive performance of Tulu 3 405B to a technique called reinforcement learning with verifiable rewards (RLVR). This method trains models on tasks with "verifiable" outcomes, such as solving math problems and following instructions.
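What makes an outcome "verifiable" is that a program, rather than a learned reward model, can check it. The sketch below illustrates that idea with two binary reward functions, one for a math answer and one for a simple instruction. It is not Ai2's implementation: the function names, the answer-extraction rule, and the word-limit instruction are all hypothetical and chosen only to show the shape of such a check.

```python
import re


def extract_final_number(text):
    """Return the last number that appears in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def math_reward(completion, ground_truth):
    """Binary reward: 1.0 if the last number in the completion matches the reference."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - ground_truth) < 1e-6 else 0.0


def word_limit_reward(completion, max_words):
    """Binary reward for a checkable instruction: stay within a word limit."""
    return 1.0 if len(completion.split()) <= max_words else 0.0


if __name__ == "__main__":
    print(math_reward("3 + 4 = 7. The answer is 7.", 7))
    print(word_limit_reward("A short reply.", max_words=5))
```

In a reinforcement learning loop, scores like these would serve as the reward signal for each sampled response. The training algorithm that consumes them is outside the scope of this sketch.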
Benchmark Performance: Outperforming the Competition
Ai2 claims that Tulu 3 405B excels on popular benchmarks:
- PopQA: On this benchmark, which consists of 14,000 specialized knowledge questions sourced from Wikipedia, Tulu 3 405B surpassed DeepSeek V3, GPT-4o, and Meta's Llama 3.1 405B model.
- GSM8K: Tulu 3 405B achieved the highest performance among its peers on this test, which features grade-school level math word problems.
Accessibility and Availability
Tulu 3 405B is accessible for testing via Ai2's chatbot web app. The code required to train the model is available on GitHub and the AI development platform Hugging Face.
Related Topics and Further Reading
This article touches upon several key areas in AI development:
- AI Benchmarks: The importance of standardized tests for evaluating AI model performance.
- Open Source AI: The benefits and implications of making AI models and their training code publicly available.
- Generative AI: The advancements in creating AI models capable of generating human-like content.
- US AI Leadership: The ongoing discussion about the United States' role in the global AI landscape.
Original article available at: https://techcrunch.com/2025/01/30/ai2-says-its-new-ai-model-beats-one-of-deepseeks-best/