Ai2's Tulu 3 405B AI Model Outperforms Competitors, Emphasizes US Open-Source Leadership

Ai2's Tulu 3 405B: A New American AI Champion

Move over, DeepSeek. There's a new AI champion in town, and it's American. On Thursday, Ai2, a nonprofit AI research institute based in Seattle, released a model that it claims outperforms DeepSeek V3, one of Chinese AI company DeepSeek's leading systems. Ai2's model, called Tulu 3 405B, also beats OpenAI's GPT-4o on certain AI benchmarks, according to Ai2's internal testing. Moreover, unlike GPT-4o and even DeepSeek V3, Tulu 3 405B is open source, meaning all components necessary to replicate it from scratch are freely available and permissively licensed.

Ai2's Vision for US AI Leadership

A spokesperson for Ai2 told TechCrunch that the lab believes Tulu 3 405B "underscores the U.S.' potential to lead the global development of best-in-class generative AI models." Ai2 frames the release as a key moment for the future of open AI: evidence that the U.S. can lead with competitive, open-source models built independently of the tech giants.

Understanding Tulu 3 405B's Scale and Training

Tulu 3 405B is a large model: it contains 405 billion parameters, and training it required 256 GPUs running in parallel. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.
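To give a rough sense of that scale, the sketch below estimates how much memory the weights alone would occupy. The 16-bit storage format and the even split across GPUs are assumptions made for illustration; Ai2's actual precision and sharding setup are not described here. Training also needs memory for gradients, optimizer state, and activations, so real requirements are considerably higher.

```python
# Back-of-the-envelope estimate of weight memory for a 405B-parameter model.
# Assumptions (not from Ai2): 2 bytes per parameter (16-bit weights) and
# weights divided evenly across all GPUs.

PARAMS = 405_000_000_000   # 405 billion parameters
GPUS = 256                 # GPUs used in parallel during training
BYTES_PER_PARAM = 2        # assumed 16-bit storage


def weight_footprint_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


total_gb = weight_footprint_gb(PARAMS, BYTES_PER_PARAM)
per_gpu_gb = total_gb / GPUS

print(f"Weights only: {total_gb:.0f} GB total, about {per_gpu_gb:.2f} GB per GPU")
```

Under these assumptions the weights come to 810 GB, far more than any single GPU holds, which is one reason a model of this size has to be spread across many devices.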

[Figure: Ai2 Tulu 3 405B benchmark results]

Key Training Techniques: Reinforcement Learning with Verifiable Rewards (RLVR)

Ai2 attributes the competitive performance of Tulu 3 405B to a technique called reinforcement learning with verifiable rewards (RLVR). This method trains models on tasks with "verifiable" outcomes, such as solving math problems and following instructions.
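What makes an outcome "verifiable" is that a program, rather than a learned reward model or a human rater, can check it. The sketch below illustrates the idea with two hypothetical reward functions, one for a math answer and one for a simple instruction ("answer in at most N words"). The function names and checking rules are illustrative assumptions, not Ai2's implementation.

```python
import re


def math_reward(response: str, reference_answer: str) -> float:
    """Hypothetical check: 1.0 if the last number in the response equals the reference."""
    # Drop thousands separators so "1,234" is read as 1234.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(reference_answer) else 0.0


def word_limit_reward(response: str, max_words: int) -> float:
    """Hypothetical check: 1.0 if the response stays within the requested word limit."""
    return 1.0 if len(response.split()) <= max_words else 0.0


# Each reward is binary: the response either passes the check or it does not.
print(math_reward("So the total is 1,234 apples.", "1234"))
print(word_limit_reward("Paris is the capital.", max_words=5))
```

In a reinforcement learning loop, scores like these would serve as the training signal, so the model is rewarded only when its output can be confirmed as correct.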

Benchmark Performance: Outperforming the Competition

Ai2 claims that Tulu 3 405B excels on popular benchmarks:

PopQA: On this benchmark, which consists of 14,000 specialized knowledge questions sourced from Wikipedia, Tulu 3 405B surpassed DeepSeek V3, GPT-4o, and Meta's Llama 3.1 405B model.
GSM8K: Tulu 3 405B achieved the highest performance among its peers on this test, which features grade-school level math word problems.

Accessibility and Availability

Tulu 3 405B is accessible for testing via Ai2's chatbot web app. The code required to train the model is available on GitHub and the AI development platform Hugging Face.