Diffbot Aims To Build The Intel Of Data For Artificial Intelligence

The article "Diffbot Aims To Build The Intel Of Data For Artificial Intelligence," published on February 11, 2016, by Jonathan Shieber, details Diffbot's goal of becoming the foundational data provider for artificial intelligence (AI) applications. The company announced that it had secured $10 million in new funding in a round led by Tencent, a major Chinese internet conglomerate. Diffbot's CEO, Mike Tung, describes the company's vision as becoming the "Intel of data" for AI: a supplier of structured knowledge that AI applications can readily consume and use.
The AI Data Challenge and Diffbot's Solution
A significant hurdle in AI development is the need for vast amounts of structured data. Large technology companies like Google, Facebook, and Baidu possess extensive internal resources, including dedicated data entry teams, to meticulously categorize and structure web content. This structured data is vital for training their sophisticated AI algorithms. However, smaller companies and startups often lack these substantial resources, creating a gap in AI development accessibility. Diffbot seeks to fill this void by providing structured data to these entities, democratizing access to essential AI training material.
Diffbot's Technological Edge and Business Model
Tung says that Diffbot's core technology had reached 90-95% accuracy by 2015 and that the company had also become profitable by then. A key differentiator for Diffbot is its reliance on proprietary AI algorithms, developed and refined over several years, rather than human curation. This approach contrasts sharply with many AI projects that depend on extensive human input for data classification and structuring. Because the algorithms do the structuring, Diffbot avoids the cost of human data entry and ongoing curation, leaving electricity and bandwidth as its main operating costs.
Building the World's Largest Knowledge Database
Diffbot's overarching objective is to construct "the world's largest database of structured knowledge." The company's selling proposition lies in its ability to convert the vast, unstructured expanse of the internet into organized, semantic information. Tung draws a parallel between Diffbot's mission and the foundational role of Intel in the computing industry, positioning Diffbot as essential infrastructure for the growing AI ecosystem. He contrasts Diffbot's approach with Google's Knowledge Graph and IBM's Watson, both of which depend heavily on human curation and rule-setting. Diffbot's self-learning AI is described metaphorically as a "Manhattan project for AI," where the AI itself acts as the researcher and developer.
From Bootstrapping to Significant Funding
Diffbot's journey began with seed funding in 2012. CEO Mike Tung, who famously dropped out of Stanford's graduate program, played a crucial role in the company's early survival and development. To support the company's growth without relying solely on external funding, Tung took on a second job, dedicating his spare hours to learning patent law and filing patent applications. This effort, earning him $20,000 per patent, provided essential rent money, allowing him to live frugally on a diet of beans and rice while focusing on the core mathematical principles behind Diffbot's software. The company also benefited from early acceleration through the StartX program, a premier Stanford initiative for graduate entrepreneurs.
Revenue Generation and Key Clients
Diffbot adopted an on-demand service model from its inception: clients submit URLs for processing, and Diffbot earns revenue on each request its servers handle. This strategy proved effective, as customers paid for the web structuring service while their requests supplied Diffbot with data to expand its learning capabilities. Many of these early on-demand clients remain with the company, including major entities such as AOL (the parent company of TechCrunch), Yandex, eBay, Microsoft Bing, Cisco, and Adobe.
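The pay-per-request pattern described above can be illustrated with a minimal sketch. The article does not describe Diffbot's actual interface or pricing, so everything here is an assumption made for illustration: the endpoint `https://api.example.com/extract`, the `token` and `url` parameter names, the `UsageMeter` class, and the price are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical endpoint, not taken from the article or from any real service.
API_BASE = "https://api.example.com/extract"


def build_request_url(page_url: str, token: str) -> str:
    """Build the request a client would send to have one web page structured."""
    return f"{API_BASE}?{urlencode({'token': token, 'url': page_url})}"


@dataclass
class UsageMeter:
    """Counts submitted URLs and bills a flat (assumed) price per request."""
    price_per_call_cents: int
    calls: int = 0

    def record(self) -> None:
        # One billable unit per URL submitted for processing.
        self.calls += 1

    @property
    def amount_due_cents(self) -> int:
        return self.calls * self.price_per_call_cents


meter = UsageMeter(price_per_call_cents=2)  # assumed price, for illustration only
request_urls = []
for page in ["https://example.com/a?b=1", "https://example.com/story"]:
    request_urls.append(build_request_url(page, token="DEMO_TOKEN"))
    meter.record()
```

Each submitted URL produces both a billable event and a fresh page for the service to process, which is the dual benefit the article attributes to the on-demand model.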
Accelerating Data Collection and Expansion
By 2015, Diffbot had achieved profitability and was confident in its AI's accuracy. Recognizing the need to accelerate its data collection, the company began actively spidering the web. The ultimate ambition is to create a comprehensive, structured taxonomy encompassing the entire internet, comprising trillions of discrete data points. Since starting its web spidering project in 2015, Diffbot's taxonomy has grown to include over 1.2 billion objects, with 10 million new objects added per day. The article presents this growth as outpacing Google's Knowledge Graph, which had only recently surpassed the 1 billion object mark.
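To put the quoted figures in perspective, the short calculation below works through the arithmetic. It is a rough sketch under two assumptions the article does not make: that the rate of 10 million objects per day stays constant, and that one trillion objects serves as a lower bound for "trillions."

```python
# Figures quoted in the article (February 2016).
current_objects = 1_200_000_000     # objects in Diffbot's taxonomy
daily_additions = 10_000_000        # new objects added per day

# Assumption: one trillion is used as a lower bound for "trillions".
target_objects = 1_000_000_000_000


def days_to_reach(target: int, current: int, per_day: int) -> int:
    """Days needed to grow from current to target at a constant daily rate."""
    remaining = max(target - current, 0)
    return -(-remaining // per_day)  # ceiling division on integers


# How many days of crawling at the quoted rate the current total represents.
days_equivalent_so_far = current_objects // daily_additions

# How long the quoted rate would take to reach the assumed target.
days_remaining = days_to_reach(target_objects, current_objects, daily_additions)
years_remaining = days_remaining / 365

print(f"Current total equals about {days_equivalent_so_far} days at the quoted rate")
print(f"Reaching one trillion would take about {years_remaining:.0f} years at that rate")
```

Under these assumptions the quoted rate alone would not reach a trillion objects in any practical timeframe, so the stated ambition implies that the collection rate would have to keep rising.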
Investor Confidence and Future Outlook
Diffbot's ambitious goals and innovative approach have attracted substantial investment from prominent Silicon Valley figures and institutions. Early seed funding included contributions from notable individuals such as Sky Dayton (founder of EarthLink), Andy Bechtolsheim (co-founder of Sun Microsystems), Joi Ito (Director of MIT Media Lab), Brad Garlinghouse (CEO of YouSendIt), Maynard Webb (former eBay COO), Elad Gil (VP of Corporate Strategy at Twitter), Jonathan Heiliger (former VP at Facebook), Aaron Lee (co-founder of Redbeacon), and Montgomery Kersten (founder of VitalSigns). The latest funding round saw strategic investment from Tencent, one of China's largest internet companies, and Felicis Ventures, which has a growing portfolio in the AI sector. Additional angel investors and institutions also participated, reinforcing the market's confidence in Diffbot's vision.
Conclusion: The Foundation for AI
The article underscores the critical importance of structured data as a foundational element for the advancement of artificial intelligence. Diffbot's unique strategy, leveraging proprietary AI algorithms for automated data structuring, offers a scalable and cost-effective solution compared to traditional human-centric methods. The company's journey from its bootstrapped beginnings to securing significant funding highlights the increasing demand for robust AI infrastructure and sophisticated data management solutions in the rapidly evolving AI landscape.
Original article available at: https://techcrunch.com/2016/02/11/diffbot-aims-to-build-the-intel-of-data-for-artificial-intelligence/