What Does 'Open Source AI' Mean, Anyway?

The article examines the contested definition of "open source AI" and the challenges of applying traditional open-source software principles to artificial intelligence.
The Core Debate
The central issue is the lack of a universally agreed-upon definition for "open source AI." While the term is widely used, particularly in relation to Meta's Llama models, its application to AI is contentious. The Open Source Initiative (OSI), the steward of the Open Source Definition (OSD) for software, is actively working to establish a clear definition for AI.
AI vs. Software Code
Joseph Jacks, founder of OSS Capital, argues that "there is no such thing as open-source AI" because the concept was invented specifically for software source code. He highlights key differences:
- Neural Network Weights (NNWs): Unlike software source code, NNWs are unreadable by humans and not debuggable.
- Fundamental Rights: The core rights associated with open-source software do not directly translate to NNWs.
This has led to the proposal of an "open weights" definition, which Stefano Maffulli, the OSI's executive director, agrees is a more accurate descriptor for many AI models.
Meta's Llama and the OSI
Meta's involvement with the OSI is significant, especially concerning its Llama models. Meta has promoted Llama as "open source" while imposing restrictions, such as requiring developers with more than 700 million monthly users to obtain a special license. Its recent communications have shifted toward "openly available" or "openly accessible," though the "open source" label still appears in some contexts.
OSI's Role and Funding
The OSI, a non-profit organization founded in 1998, has been instrumental in defining open-source software for over two decades. Its funding comes from various sources, including major tech companies like Amazon, Google, Microsoft, and Meta. This reliance on corporate funding has raised questions about potential conflicts of interest, particularly given Meta's use of the "open source" branding for its Llama models.
To address these concerns and diversify its funding, the OSI has secured a $250,000 grant from the Sloan Foundation. This funding aims to support its global efforts to develop the Open Source AI Definition.
The Draft Open Source AI Definition
The OSI is working on a draft definition (currently v0.0.8) that includes three core parts:
- Preamble: Outlines the document's purpose.
- Open Source AI Definition: The core principles.
- Checklist: Components required for an open-source compliant AI system.
Key freedoms granted by the definition include the freedom to use the system for any purpose, study its workings, and modify and share it.
Data and Reproducibility Challenges
A significant challenge in defining open-source AI is the role of data. The OSI's draft considers the availability of training datasets, but emphasizes the importance of knowing the data's origin, labeling, de-duplication, and filtering processes, as well as access to the code used for dataset assembly.
Access to the full dataset is an "optional" component: the OSI acknowledges that sharing it is not always practical or possible when it contains confidential or copyrighted information. Techniques like federated learning, differential privacy, and homomorphic encryption also complicate direct data sharing.
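The split between data information and the optional full dataset can be sketched as a simple check. This is a minimal illustration, not the OSI's official checklist: the component names, the `Release` class, and the choice to treat the emphasized items as required are all assumptions drawn from the summary above.

```python
from dataclasses import dataclass, field

# Hypothetical component names taken from the summary above,
# NOT the wording of the OSI's actual checklist.
REQUIRED_DATA_INFO = {
    "data_origin",
    "labeling_process",
    "deduplication_process",
    "filtering_process",
    "dataset_assembly_code",
}
OPTIONAL_COMPONENTS = {"full_training_dataset"}


@dataclass
class Release:
    """A hypothetical AI system release and the components it publishes."""
    name: str
    published: set = field(default_factory=set)

    def missing_required(self):
        # Required items that the release does not publish, in sorted order.
        return sorted(REQUIRED_DATA_INFO - self.published)

    def meets_data_requirements(self):
        # The optional full dataset is deliberately ignored here.
        return not self.missing_required()


documented = Release("documented-model", set(REQUIRED_DATA_INFO))
partial = Release("partial-model", {"data_origin", "full_training_dataset"})

print(documented.meets_data_requirements())
print(partial.missing_required())
```

In this sketch a release can pass without shipping the dataset itself, while a release that ships the dataset but omits the provenance and processing details does not.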
Key Differences from Open Source Software
The article stresses that AI and software are fundamentally different:
- Source Code vs. Models: In software, source code and binary code are different views of the same artifact. In AI, training datasets and trained models are distinct, and the training process involves statistical and random logic that makes replication inconsistent.
- Reproducibility: The OSI's definition emphasizes the need for AI systems to be easy to replicate, drawing on frameworks like the Model Openness Framework (MOF), which assesses models based on completeness and openness, including training methodologies and model parameters.
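The effect of random logic on replication can be seen in a toy example. The sketch below is an illustration only, assuming a one-variable linear model trained with plain stochastic gradient descent. The function `train_linear` and the data are hypothetical, and real training has further sources of variation (such as hardware and parallelism) that are not modelled here.

```python
import random


def train_linear(seed, data, epochs=5, lr=0.05):
    """Fit y = w*x + b with SGD; the seed drives both the random
    initialisation and the order in which samples are visited."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random starting weights
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # random visiting order each epoch
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


# Identical training data for every run: points on the line y = 2x + 1.
DATA = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]

run_a = train_linear(seed=1, data=DATA)
run_b = train_linear(seed=2, data=DATA)
run_a_again = train_linear(seed=1, data=DATA)

print("same seed, same weights:", run_a == run_a_again)
print("different seed, same weights:", run_a == run_b)
```

With the same data and code, the two seeds produce different weights, while repeating a seed reproduces them exactly. This is why having the dataset alone does not guarantee that a trained model can be regenerated bit for bit.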
The "Stable Version"
The OSI plans to release a "stable version" of the definition, acknowledging that it will likely evolve. The core principles are expected to remain consistent, but the checklist components may change as technology advances. The definition is slated for formal approval at the All Things Open conference in October 2024, following a global roadshow to gather diverse input.
Key Takeaways
- The definition of "open source AI" is a complex and evolving issue.
- AI models differ significantly from traditional software, posing challenges for existing open-source frameworks.
- The OSI is leading the effort to create a standardized definition for open-source AI.
- Meta's Llama models highlight the debate, with questions about whether they truly meet open-source criteria.
- Reproducibility, data availability, and licensing are critical aspects of the definition.
Original article available at: https://techcrunch.com/2024/06/22/what-does-open-source-ai-mean-anyway/