The Internet Is a Mine
metaphor
Source: Natural Resources → Artificial Intelligence
Categories: ai-discourse, philosophy
Transfers
Web scraping is mining. Crawlers are machines sent underground. The internet is a deposit of raw material waiting to be extracted, refined, and sold. This extractive metaphor — embedded in terms like “data mining,” “web crawling,” and “Common Crawl” — frames the entire publicly accessible internet as a natural resource that exists to be exploited by whoever has the machinery to get at it.
Key structural parallels:
- Raw material awaiting extraction — ore does not mine itself. It sits in the ground, inert and valueless until someone with capital and equipment comes along. The metaphor frames internet content — blog posts, forum threads, Wikipedia articles, images, code — as similarly inert raw material. The value is not in the content as written; it is in the aggregate, refined through training into model weights. The people who created the content are the geological processes; the AI companies are the miners.
- Extraction at scale — mining is industrial. You do not extract ore one nugget at a time; you strip-mine entire mountainsides. The metaphor naturalizes the scale of web scraping: Common Crawl processes billions of web pages not because that is unusual but because that is what mining looks like. The extractive frame makes petabyte-scale scraping feel like a normal industrial operation.
- The resource is depletable in a new sense — mineral deposits run out. Internet content does not disappear when scraped, but the metaphor has begun to capture a real depletion dynamic: as AI-generated content floods the web, the “ore quality” of internet text degrades. Researchers warn of “model collapse” when models train on their own outputs, an analog to mining a contaminated deposit.
- Refining adds the value — raw ore is worthless; refined metal is valuable. The metaphor positions training data as crude input and the trained model as the refined product. This framing concentrates value (and therefore profit) at the refining stage — the AI company — not at the extraction stage and certainly not with the people who produced the raw material.
- Prospecting and surveying — before mining, you prospect. Dataset curation, data quality assessment, and benchmark evaluation map onto geological surveys. The metaphor imports the idea that not all data is equally valuable — some deposits are richer than others — which shapes decisions about which sources to scrape and which to skip.
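The depletion dynamic in the list above can be made concrete with a toy simulation. The sketch below is only an illustration under simplifying assumptions, not the experimental setup of Shumailov et al.: each "generation" fits a one-dimensional Gaussian to a small sample drawn from the previous generation's fit, so every model trains only on the previous model's output. The function name, sample size, generation count, and seed are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_recursive_fitting(generations=300, sample_size=10, seed=0):
    """Toy stand-in for training on one's own outputs.

    Starts from a "real" distribution N(0, 1). Each generation draws a
    small sample from the current fitted Gaussian, then refits the mean
    and standard deviation to that sample alone.
    Returns the standard deviation recorded at every generation.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    mean, std = 0.0, 1.0       # generation 0: the original data source
    history = [std]
    for _ in range(generations):
        # "Scrape" the previous model's output instead of the original data.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Train" the next model on those samples only.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    spread = simulate_recursive_fitting()
    for generation in (0, 50, 100, 200, 300):
        print(f"generation {generation:3d}: std = {spread[generation]:.3e}")
```

In runs of this kind the fitted spread tends to shrink toward zero as estimation error compounds across generations: the later "deposits" hold less variety than the original. Real language models are far more complex, so this should be read as an intuition pump for the metaphor rather than evidence about any particular system.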
Limits
- Minerals do not have authors — copper ore did not write itself. Internet content was created by people with intentions, rights, and expectations about how their work would be used. The mining metaphor erases authorship entirely, converting creative and intellectual labor into geological accident. This is not a minor distortion; it is the central ethical elision of the extractive frame, and it has shaped the legal landscape of AI training data disputes.
- Mining requires land rights; scraping often does not — you cannot legally mine land you do not own or have mineral rights to. Web scraping operates in a legal gray zone where robots.txt is advisory and Terms of Service enforcement is inconsistent. The mining metaphor imports the idea of legitimate extraction while obscuring the fact that the “mineral rights” for internet content are fiercely contested.
- The “commons” is not a commons — the mining frame treats the open internet as a commons — shared land available for extraction. But much internet content is published under licenses (Creative Commons, GPL) or with implicit expectations of attribution and reciprocity. The metaphor collapses a complex landscape of permissions and norms into “it’s out there, so we can take it.”
- Extraction implies a one-way flow — mining takes from the ground and gives nothing back. The internet, at its best, is reciprocal: people contribute content because others contribute content. The mining metaphor cannot express reciprocity, mutualism, or the social contracts that sustain open knowledge production. It sees the internet as a deposit, not an ecosystem.
- The environmental analog cuts both ways — mining devastates landscapes. The extractive AI metaphor invites environmental criticism: if scraping is mining, then the resulting degradation of the open web (content farms, SEO spam, AI slop) is strip-mining damage. This is a case where the metaphor’s entailments work against the interests of those who deployed it.
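The point about robots.txt being advisory can be seen in how a crawler consults it. The sketch below uses Python's standard-library `urllib.robotparser` on an inline robots.txt; the file contents, the bot names `ExampleAIBot` and `OtherBot`, and the `example.com` URLs are hypothetical. Note that the check runs inside the crawler itself: nothing in the protocol stops a scraper from skipping the call.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is shut out entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    This is a voluntary, client-side check. A crawler that never calls
    it can still request the page; the file expresses the publisher's
    wishes but does not enforce them.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/blog/post"),
        ("OtherBot", "https://example.com/blog/post"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

In mining terms, this is a "no trespassing" sign rather than a fence: it states who holds the claim, but compliance depends on the miner choosing to read it.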
Expressions
- “Data mining” — the oldest and most entrenched extractive metaphor for computational analysis of large datasets, predating the AI boom
- “Web crawler” / “web spider” — agents that traverse the internet’s tunnels extracting content, combining mining and biological metaphors
- “Common Crawl” — the nonprofit web archive whose name explicitly frames the internet as a shared resource to be crawled and collected
- “Scraping” — removing material from a surface, an extractive action applied to web data collection
- “Training data pipeline” — industrial infrastructure language for moving raw material from extraction to refinery
- “The gold is in the data” — making the resource metaphor explicit, positioning data as precious material
Origin Story
“Data mining” as a term emerged in the 1990s from database research, where it described the process of discovering patterns in large datasets. The mining metaphor was adopted because the process resembled extracting valuable patterns (ore) from large volumes of raw data (rock). With the rise of web scraping in the 2000s and the AI training data boom of the 2020s, the extractive frame expanded from analysis to collection: it was no longer just about finding patterns in data you already had, but about acquiring the data in the first place. Common Crawl, founded in 2007, institutionalized the metaphor by making petabyte-scale web archives freely available as a shared mining deposit. The New York Times’s 2023 lawsuit against OpenAI brought the metaphor’s tensions to the surface: is training on scraped content more like mining a public resource or more like photocopying a library? The mining frame favors the former interpretation; the publishing frame favors the latter.
References
- Maas, M. “AI is Like… A Literature Review of AI Metaphors and Why They Matter for Policy” (2023) — documents extractive resource metaphors in AI policy discourse
- Shumailov, I. et al. “The Curse of Recursion: Training on Generated Data Makes Models Forget” (2023) — the “model collapse” research that gives the depletion metaphor empirical grounding
- Common Crawl (commoncrawl.org) — the institutional embodiment of the internet-as-mine metaphor
Related Entries
Structural Neighbors
Entries from different domains that share structural shape. Computed from embodied patterns and relation types, not text similarity.
- Production Data Is Food (food-and-cooking/metaphor)
- Time Is a River (fluid-dynamics/metaphor)
- Ideas Are Products (manufacturing/metaphor)
- Let the Tool Do the Work (carpentry/mental-model)
- Measure Twice, Cut Once (carpentry/mental-model)
- People Are Machines (manufacturing/metaphor)
- The Mind Is A Machine (manufacturing/metaphor)
- Light Is A Fluid (fluid-dynamics/metaphor)
Structural Tags
Patterns: container, part-whole, flow
Relations: cause, transform
Structure: pipeline
Level: generic
Contributors: agent:metaphorex-miner