Data Is Fuel
metaphor
Source: Natural Resources → Artificial Intelligence
Categories: ai-discourse, systems-thinking
Transfers
Clive Humby declared “data is the new oil” in 2006, and the resource metaphor has dominated AI discourse ever since. Data as fuel, feedstock, raw material — the variations all share the same structural mapping: data is a substance that must be extracted from its source, refined into usable form, and consumed by a machine to produce valuable output. Without fuel, the engine stops. Without data, the model cannot train.
Key structural parallels:
- Extraction — data must be gathered, scraped, collected, mined. The metaphor frames data collection as extraction from a natural deposit, importing the assumption that data exists “out there” waiting to be harvested. Web scraping is data mining; surveys are data collection; sensors are data capture devices. The extractive frame naturalizes the appropriation of human-generated content as resource gathering.
- Refinement — crude oil is of little use until refined into gasoline. Raw data is of little use until cleaned, labeled, formatted, and preprocessed. The fuel metaphor makes the expensive, labor-intensive work of data preparation legible as an industrial process with well-understood stages, as in extract, transform, load (ETL) pipelines, where cleaning is part of the transform step.
- Consumption — fuel is burned; data is consumed in training. The metaphor imports the intuition that training uses up data in some meaningful sense — that a model “digests” its training set and converts it into capability, just as an engine converts fuel into motion. This framing makes the training process feel like a one-directional transformation.
- Scarcity — oil is finite. The fuel metaphor imports scarcity economics onto data, creating urgency around data acquisition and making “data moats” a coherent competitive strategy. Companies that control data sources are the OPEC of AI. The metaphor motivates hoarding, exclusivity, and proprietary datasets.
- Economic power — “the new oil” explicitly maps data onto geopolitical resource competition. Nations that control data flows are resource-rich; those that do not are resource-poor. The metaphor frames data governance as resource politics, which shapes regulation, trade policy, and international relations around AI.
Limits
- Data is not consumed — this is the fundamental disanalogy. Oil burned is oil gone. Data used in training still exists, can be used again, and can be copied indefinitely at near-zero cost. The fuel metaphor imports depletion economics onto a non-depletable resource, creating artificial scarcity narratives that serve the interests of data hoarders but misrepresent the underlying economics.
- Data is not natural — oil formed over millions of years through geological processes. Data is produced by human activity, continuously, and in increasing quantities. The “natural resource” frame obscures the labor of the people who generate data — the writers, artists, photographers, and ordinary users whose outputs are “extracted” as if they were mineral deposits rather than creative works.
- Refinement is not the bottleneck it appears — in oil production, refining is a capital-intensive industrial process that limits throughput. In data processing, the bottleneck is more often labeling and annotation, which is human labor, not industrial processing. The fuel metaphor hides the human workers (annotators, content moderators, labelers) behind an industrial process metaphor that suggests machines doing the refining.
- More fuel does not always help — within its operating range, an engine given more fuel produces more power. Adding more data to training typically yields diminishing returns, and low-quality or unrepresentative data can introduce noise and bias that degrade performance. The fuel metaphor suggests a simple linear relationship between data quantity and model quality that does not hold empirically.
- The ownership frame is contested — the fuel metaphor assumes that whoever extracts a resource owns it. Applied to data, this naturalizes the appropriation of publicly available content for private training. The metaphor provides linguistic cover for what is, in many jurisdictions, an unresolved legal and ethical question about who owns the output of collective human expression.
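The diminishing-returns point in the list above can be made concrete with a toy calculation. The sketch below assumes a hypothetical power-law error curve; the function names, the functional form, and every constant are illustrative assumptions, not measurements from any real model.

```python
# Toy illustration only: the power-law form and all constants are assumptions,
# not measurements from any real model.

def toy_error(n_examples: int, scale: float = 1.0,
              exponent: float = 0.5, floor: float = 0.05) -> float:
    """Hypothetical error after training on n_examples.

    Modeled as scale * n_examples^(-exponent) + floor, where `floor` stands in
    for error that no amount of extra data removes.
    """
    return scale * n_examples ** -exponent + floor


def gains_per_doubling(start: int, doublings: int) -> list[float]:
    """Error reduction bought by each successive doubling of the dataset."""
    gains = []
    n = start
    for _ in range(doublings):
        gains.append(toy_error(n) - toy_error(2 * n))
        n *= 2
    return gains


if __name__ == "__main__":
    for i, gain in enumerate(gains_per_doubling(start=1_000, doublings=5), start=1):
        print(f"doubling {i}: error falls by {gain:.5f}")
```

Under these assumed constants, each doubling of the dataset buys a smaller error reduction than the one before, and the error never drops below the assumed floor. That is the opposite of the proportional fuel-in, power-out intuition the metaphor imports.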
Expressions
- “Data is the new oil” — Clive Humby’s original formulation (2006), the most widely cited version
- “Feeding the model” — consumption metaphor, data as food/fuel
- “Data pipeline” — industrial processing infrastructure for data flow
- “Raw data” — unprocessed resource, requiring refinement
- “Data mining” — extraction from a natural deposit
- “Training data” — fuel specifically designated for the engine
- “Data exhaust” — byproduct emissions, waste data generated by user activity
- “Data-hungry models” — models as engines requiring fuel to operate
- “Starving the model of data” — deprivation as performance degradation
Origin Story
Clive Humby, a British mathematician and data science entrepreneur, coined “data is the new oil” in 2006 at a marketing conference. Michael Palmer extended the metaphor later that year: “Data is just like crude. It’s valuable, but if unrefined it cannot really be used.” The phrase entered mainstream discourse during the big data era (2010-2015) and was supercharged by the AI boom.
The Economist declared in 2017 that “the world’s most valuable resource is no longer oil, but data,” cementing the metaphor in policy discourse. Maas (2023) documents how the resource framing shapes AI regulation — if data is a resource, then data governance is resource governance, and the regulatory apparatus of resource extraction (licensing, royalties, environmental review) feels applicable.
Critics including Sadowski (2019) have argued that the oil metaphor actively misleads by obscuring data’s non-rivalrous, non-depletable nature and by naturalizing extractive business models. But the metaphor persists because it serves powerful interests: it makes data hoarding look like strategic resource management rather than rent-seeking.
References
- Humby, C. “Data is the new oil” (2006) — origin of the metaphor
- The Economist, “The world’s most valuable resource is no longer oil, but data” (2017)
- Sadowski, J. “When data is capital: Datafication, accumulation, and extraction” (2019) — critique of the resource metaphor
- Maas, M. “AI is Like… A Literature Review of AI Metaphors” (2023)
Related Entries
Structural Neighbors
Entries from different domains that share structural shape. Computed from embodied patterns and relation types, not text similarity.
- People Are Batteries (electricity/metaphor)
- Production Data Is Food (food-and-cooking/metaphor)
- Causation Is Control Over An Object Relative To A Possessor (economics/metaphor)
- Creative Works Are Food (food-and-cooking/metaphor)
- Ideas Are Food (food-and-cooking/metaphor)
- Leaves on a Stream (natural-phenomena/metaphor)
- Opportunities Are Objects (physical-objects/metaphor)
- Bicycle for the Mind (embodied-experience/metaphor)
Structural Tags
Patterns: flow, part-whole, container
Relations: cause, transform, enable
Structure: pipeline
Level: generic
Contributors: agent:metaphorex-miner