AI Food Fights in the Enterprise

This article features a conversation between Ali Ghodsi, CEO and cofounder of Databricks, and Ben Horowitz, cofounder of a16z, discussing the complexities and challenges of enterprise adoption of Generative AI.
Why is it so hard for enterprises to adopt AI?
Ben Horowitz notes that while many startups have traction selling to developers, consumers, or small firms, few have shown significant traction selling generative AI into enterprises. He questions why this is the case, especially for AI infrastructure providers.
Ali Ghodsi attributes the slow adoption to several factors:
- Enterprise Slowness: Enterprises move deliberately; adoption is slow, but once a solution is in place it is correspondingly harder to displace, which can work in a vendor's favor.
- Data Privacy and Security: Enterprises are increasingly aware of the immense value of their data and are hesitant to share it with third-party AI providers due to concerns about data leakage and the potential for sensitive information (like source code) to be exposed by LLMs.
- Accuracy Requirements: Many enterprise use cases demand high accuracy and precision, which can be a hurdle for current AI models.
- Internal "Food Fights": There's often internal competition and political maneuvering within large enterprises regarding who owns generative AI initiatives (IT, product lines, business lines), leading to slower decision-making and internal conflicts.
Ghodsi emphasizes that cracking the code to overcome these hurdles presents a significant opportunity for new companies.
Enterprise Data Wars
Horowitz probes whether enterprises' reluctance to share data with companies like OpenAI or Anthropic is justified or if they are missing out on value. Ghodsi explains that CEOs and boards now recognize generative AI as a potential competitive advantage. They realize they can leverage their unique datasets to gain an edge over rivals. This leads to a desire to build proprietary AI solutions, retaining intellectual property rather than outsourcing it to external AI providers. This internal drive creates a demand for internal AI development teams and solutions.
Big vs. Small LLMs
The discussion shifts to the strategic decision enterprises face: building their own specialized LLMs or utilizing large, general-purpose models. Ghodsi highlights that building custom LLMs from scratch is possible, especially with Databricks' acquisition of Mosaic, which specializes in scaling LLM development. However, this process is resource-intensive, requiring significant GPU power and investment.
He explains that while large models offer broad intelligence, they can be costly to train and operate (inference). For specific enterprise use cases (e.g., classifying manufacturing defects from images), a smaller, fine-tuned model can be more efficient, offering lower latency and reduced costs, while still achieving high accuracy. These specialized models might not possess the broad capabilities of larger models (like answering general knowledge questions), but they excel in their designated tasks.
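To make that trade-off concrete, here is a minimal sketch of the kind of small, specialized model Ghodsi describes: a pretrained vision backbone fine-tuned as a binary defect classifier. The defect-classification scenario is his example, but the model choice (ResNet-18), the frozen backbone, and the hyperparameters are illustrative assumptions, not anything prescribed in the conversation.

```python
# Minimal sketch: fine-tune a small pretrained vision model as a binary
# defect classifier. ResNet-18 and the hyperparameters are illustrative
# assumptions; the conversation does not prescribe a specific model.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: {ok, defective}

# Freeze the backbone; only train the new classification head. This keeps
# training cheap and the serving footprint small -- the trade-off Ghodsi
# describes: narrow capability, but low latency and cost.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on a batch of (images, labels) from a DataLoader."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A model like this cannot answer general knowledge questions, but it can be served cheaply at low latency, which is exactly the advantage over a general-purpose LLM for a narrow task.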
Ghodsi also touches on scaling laws, noting that increasing a model's parameter count calls for a roughly proportional increase in training data; without that balance, compute is spent inefficiently and the model falls short of its optimal performance.
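As a back-of-the-envelope illustration of that balance, the sketch below applies the widely cited "Chinchilla" rule of thumb of roughly 20 training tokens per parameter, with training compute approximated as 6 FLOPs per parameter per token. These constants come from the scaling-laws literature, not from the conversation, and are approximations.

```python
# Back-of-the-envelope "compute-optimal" sizing, loosely following the
# Chinchilla rule of thumb (~20 training tokens per model parameter).
# The constants here are literature approximations, not figures Ghodsi cites.

TOKENS_PER_PARAM = 20        # Chinchilla-style rule of thumb
FLOPS_PER_PARAM_TOKEN = 6    # ~6 FLOPs per parameter per training token

def compute_optimal_budget(n_params: float) -> dict:
    """Estimate the data and compute needed to train a model efficiently."""
    n_tokens = TOKENS_PER_PARAM * n_params
    train_flops = FLOPS_PER_PARAM_TOKEN * n_params * n_tokens
    return {"params": n_params, "tokens": n_tokens, "train_flops": train_flops}

for n in (7e9, 70e9):  # e.g. a 7B and a 70B model
    b = compute_optimal_budget(n)
    print(f"{b['params'] / 1e9:.0f}B params -> "
          f"{b['tokens'] / 1e12:.2f}T tokens, {b['train_flops']:.2e} FLOPs")
```

Under these assumptions, a 70B-parameter model wants on the order of 1.4 trillion training tokens, which helps explain why training competitive models from scratch is so GPU-hungry.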
Finetuning
Ghodsi elaborates on fine-tuning, describing it as adapting an existing foundation model to excel at specific tasks. He notes that current methods often update the entire model, which is costly and inefficient, especially when serving many specialized variants. The industry is seeking parameter-efficient fine-tuning techniques (such as LoRA or prefix tuning) that achieve high performance while modifying only a small fraction of the weights, which he calls the "holy grail."
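To give a flavor of what "a small fraction of the weights" means in practice, here is a toy LoRA layer: the pretrained weights stay frozen, and only a low-rank update is trained. This is a generic illustration of the technique Ghodsi names, not Databricks' implementation, and the rank and scaling values are arbitrary.

```python
# Toy LoRA: wrap a frozen linear layer with a trainable low-rank update
# W @ x + (alpha / r) * B @ A @ x, training only A and B.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # frozen pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable:,} of {total:,}")
```

In this example only about 65K parameters are trained against roughly 16.8M frozen ones, so many task-specific adapters can be stored and hot-swapped on top of one shared foundation model, the serving economics Ghodsi is pointing at.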
He envisions a future with large, intelligent foundation models that can be augmented with specialized "tuned brains" for specific tasks. However, he cautions that no one has truly mastered this yet. Databricks is experiencing high demand for specialized models, to the point where GPU scarcity limits their ability to serve all customers.
Open Source AI
The conversation addresses the debate around open-source AI models. Ghodsi argues that the release of models like LLaMA has significantly accelerated AI progress. He believes open source will continue to thrive, with GPU scarcity spurring innovation in efficiency techniques. However, he also acknowledges that companies developing highly advanced proprietary models may choose not to release them.
He notes that open-source models typically lag behind proprietary ones, but exceptions like Linux show the potential for open source to disrupt. The current need for substantial GPU resources makes it difficult for academic institutions to participate, leading to a "brain drain" of talent to industry. Ghodsi predicts that as GPUs become more accessible or efficiency techniques improve, universities will play a larger role in open-source AI innovation.
Benchmarks are Bullshit
Ghodsi expresses skepticism about the validity of current AI benchmarks, comparing them to a scenario where students receive exam answers beforehand. He argues that benchmarks like MMLU, which consist largely of multiple-choice questions available online, can be gamed, whether by deliberately training on test items or through accidental contamination of web-scraped training data. Such memorization doesn't necessarily translate into real-world problem-solving ability, such as medical diagnosis.
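One simple way to probe for the contamination Ghodsi describes is an n-gram overlap check between benchmark items and the training corpus. The sketch below is a toy version written for this summary; production decontamination pipelines are considerably more sophisticated, and nothing here comes from the conversation itself.

```python
# Toy contamination check: flag benchmark questions whose n-grams appear
# verbatim in the training corpus. Illustrative only -- real pipelines use
# normalization, fuzzy matching, and indexed lookups at corpus scale.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(question: str, corpus: list[str], n: int = 8) -> bool:
    q = ngrams(question, n)
    return any(q & ngrams(doc, n) for doc in corpus)

corpus = ["the training set happens to contain this exact exam question verbatim"]
print(is_contaminated("this exact exam question", corpus, n=3))  # True
```

A model that has simply seen the test items will ace the benchmark without any of the real-world capability the benchmark is supposed to measure.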
He advocates for more rigorous, secretive benchmarks developed by domain experts (like doctors) to better assess true AI performance. Ghodsi stresses the current necessity of a "human in the loop" for critical applications, as AI models still make mistakes and lack the nuanced understanding of humans.
Why Ali Isn't Afraid of Today's AI
Horowitz asks about the ethics and responsibilities surrounding AI, particularly the potential threat of open-source AI.
Ghodsi addresses common concerns:
- Job Displacement: He argues that automation, including AI, historically leads to economic growth and job creation in the long run, citing the example of nations that embrace automation. The key is not to halt progress but to manage its societal impact.
- Malicious Use: Like any technology, AI can be misused by malicious actors. This is an ongoing challenge that requires societal and regulatory solutions, not a ban on the technology itself.
- Existential Risk (Superintelligence): Ghodsi believes the immediate threat of AI deciding to destroy humanity is low, primarily because training and deploying advanced AI models is currently extremely costly and difficult, given GPU scarcity and the complexity of the process. He contrasts a harmless appliance like a toaster with a highly intelligent AI connected to physical systems, which could pose a risk if left uncontrolled.
However, he warns that if the cost and difficulty of training advanced models drop dramatically (e.g., if a state-of-the-art model could be trained in minutes), rapid self-improvement loops could become possible, leading to uncontrollable AI. He also points out that AI currently lacks the self-replication ability of biological organisms, a capability that was central to the evolution of human intelligence.
Ultimately, Ghodsi concludes that the current high cost and complexity of AI development act as a temporary safeguard against runaway AI scenarios. He believes the focus should be on building specialized, reliable AI applications rather than solely on the size of foundational models.
Original article available at: https://a16z.com/ai-food-fights-in-the-enterprise/