Longterm Wiki

More Articles Are Now Created by AI Than Humans (Graphite Analysis, 2024)

web

Relevant to AI deployment and societal impact discussions; demonstrates the rapid scaling of AI-generated content online, though wiki users should note methodological limitations around AI-detection accuracy.

Metadata

Importance: 35/100 · blog post · analysis

Summary

Graphite analyzed 65,000 CommonCrawl URLs to assess the prevalence of AI-generated web content, finding that by November 2024, AI-generated articles outnumbered human-written ones. However, growth has plateaued since May 2024, and AI content largely does not appear prominently in Google or ChatGPT search results.

Key Points

  • By November 2024, AI-generated articles surpassed human-written articles in quantity published on the web.
  • Growth in AI content plateaued after May 2024, possibly because AI articles underperform in search rankings.
  • The study sampled 65k CommonCrawl URLs and classified an article as AI-generated if Surfer's AI detector flagged more than 50% of its content.
  • AI-generated articles largely do not appear in Google or ChatGPT results, limiting their actual reach to users.
  • Hybrid articles (AI-generated, then human-edited) were not evaluated and may be even more prevalent than purely AI-generated content.
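
The classification rule above reduces to a threshold on the share of flagged text. The sketch below assumes a per-article detector score `ai_fraction` in [0, 1] has already been computed; it does not model Surfer's detector or any real API, and `ScoredArticle` and `is_ai_generated` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical threshold mirroring "more than 50% of content flagged".
AI_THRESHOLD = 0.5


@dataclass
class ScoredArticle:
    """Hypothetical record: an article URL plus the fraction of its text a detector flagged."""
    url: str
    ai_fraction: float  # assumed precomputed detector output in [0, 1]


def is_ai_generated(article: ScoredArticle, threshold: float = AI_THRESHOLD) -> bool:
    """Label an article AI-generated when strictly more than `threshold` of it was flagged."""
    if not 0.0 <= article.ai_fraction <= 1.0:
        raise ValueError(f"ai_fraction out of range: {article.ai_fraction}")
    return article.ai_fraction > threshold


articles = [
    ScoredArticle("https://example.com/a", 0.82),
    ScoredArticle("https://example.com/b", 0.50),
    ScoredArticle("https://example.com/c", 0.10),
]
labels = [is_ai_generated(a) for a in articles]  # [True, False, False]
```

Because the comparison is strict (`>`), an article flagged at exactly 50% counts as human-written. A binary cutoff like this also places any hybrid AI-generated/human-edited article into one of the two classes rather than a third.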

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Epistemic Collapse | Risk | 49.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 11 KB
## **Key Takeaways**

- The quantity of AI-generated articles has surpassed the quantity of human-written articles being published on the web.
- However, the proportion of AI-generated articles has plateaued since May 2024.
- Despite the prevalence of AI-generated articles on the web, we show in a separate study that these articles largely do not appear in Google and ChatGPT results. We do not evaluate whether real users view AI-generated articles in proportion to their prevalence, but we suspect that they do not.
- Our study did not evaluate the prevalence of AI-generated / human-edited articles, and they may be even more prevalent.

## Motivation

Since ChatGPT launched in November 2022, many companies have explored publishing content generated by LLMs such as ChatGPT, Claude, and Gemini to grow their traffic across channels such as Google Search, social, and advertising. This is a cost-effective alternative to paying human writers hundreds of dollars to produce content.

The quality of AI content is rapidly improving. In many cases, AI-generated content is as good as or better than content written by humans ( [MIT Study](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4453958)). It is often hard for people to tell whether content was created by AI ( [Originality AI Study](https://originality.ai/blog/can-humans-detect-ai-content)).

We seek to evaluate the prevalence of AI-generated articles.

## Results

We find that in November 2024, the quantity of AI-generated articles being published on the web surpassed the quantity of human-written articles.

We observe significant growth in AI-generated articles coinciding with the launch of ChatGPT in November 2022. Just 12 months later, AI-generated articles accounted for 39% of articles published. The raw data for this evaluation is available [here](https://docs.google.com/spreadsheets/d/1WamFyVahPDtAPFtvly30BG2QjyA-L1KYkO2UEYGKKcg/edit?gid=0#gid=0).
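
The comparison above is a per-month share: articles labeled AI-generated divided by all sampled articles published that month. A minimal sketch, assuming each sampled article has been reduced to a hypothetical `(publish_date, is_ai)` pair; `monthly_ai_share` is an illustrative name, not Graphite's code.

```python
from collections import defaultdict
from datetime import date


def monthly_ai_share(records: list[tuple[date, bool]]) -> dict[str, float]:
    """Return {"YYYY-MM": share of articles labeled AI-generated}, ordered by month."""
    totals: dict[str, int] = defaultdict(int)
    ai_counts: dict[str, int] = defaultdict(int)
    for published, is_ai in records:
        month = f"{published.year:04d}-{published.month:02d}"
        totals[month] += 1
        if is_ai:
            ai_counts[month] += 1
    # Every month in `totals` has at least one article, so the division is safe.
    return {month: ai_counts[month] / totals[month] for month in sorted(totals)}


# Toy data only, not the study's figures.
toy = [
    (date(2023, 11, 3), True), (date(2023, 11, 10), False),
    (date(2023, 11, 17), False), (date(2023, 11, 24), False),
    (date(2024, 11, 1), True), (date(2024, 11, 8), True),
    (date(2024, 11, 15), False), (date(2024, 11, 22), True),
]
shares = monthly_ai_share(toy)  # {"2023-11": 0.25, "2024-11": 0.75}
```

Under this definition, AI-generated articles "surpass" human-written ones in any month whose share exceeds 0.5.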

![AI Content vs. Human](https://cdn.prod.website-files.com/6876b87fcdcec4dc81409391/68efe2925926ea4fc2bfc872_AI%20Content%20Vs%20Human.png)

### **AI-generated Article Growth Has Plateaued**

While AI-generated articles grew dramatically after ChatGPT launched, we do not see that trend continuing. Instead, the proportion of AI-generated articles has remained relatively stable over the last 12 months. We hypothesize that this is because practitioners found that AI-generated articles do not perform well in search, as shown in a separate study.

## Methodology

### **CommonCrawl**

[Common Crawl](https://commoncrawl.org/) maintains one of the largest publicly available web archives. It provides billions of URLs, is used by researchers and developers, and is a key data source for training large language models.

### Selection of Articles

We need a representative sample of English-language articles on the web. To obtain one, we randomly select 65k URLs from CommonCrawl and confirm that each is in English, has article schema markup, is at least 100 words long, has a publish date

... (truncated, 11 KB total)
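
The selection criteria visible in the excerpt (English, article schema markup, at least 100 words, a publish date) can be sketched as a filter over pre-parsed pages. Everything here is an assumption: `CandidatePage`, its fields, and both helpers are hypothetical; language detection and schema.org parsing are assumed to happen upstream; counting `NewsArticle` and `BlogPosting` as article markup is a guess; and because the excerpt is truncated, the real pipeline may apply further criteria. The excerpt also does not say whether filtering happens before or after the 65k draw, so this sketch draws first and filters second.

```python
import random
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical: schema.org types treated as "article schema markup" (an assumption).
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}
MIN_WORDS = 100


@dataclass
class CandidatePage:
    """Hypothetical pre-parsed CommonCrawl record; field names are illustrative."""
    url: str
    language: str = ""  # assumed ISO 639-1 code from an upstream language detector
    schema_types: set[str] = field(default_factory=set)  # assumed schema.org @type values
    text: str = ""  # assumed extracted main-body text
    published: Optional[date] = None  # assumed parsed publish date, if any


def passes_filters(page: CandidatePage) -> bool:
    """Check only the criteria visible in the excerpt; the full list is truncated."""
    return (
        page.language == "en"
        and bool(page.schema_types & ARTICLE_TYPES)
        and len(page.text.split()) >= MIN_WORDS
        and page.published is not None
    )


def sample_articles(pages: list[CandidatePage], k: int, seed: int = 0) -> list[CandidatePage]:
    """Draw up to k pages uniformly at random (seeded), then keep those passing every filter."""
    rng = random.Random(seed)
    drawn = rng.sample(pages, min(k, len(pages)))
    return [page for page in drawn if passes_filters(page)]
```

Seeding the random generator makes the draw reproducible, which helps if the sample is later re-audited.
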
Resource ID: 57dfd699b04e4e93 | Stable ID: Y2E5YjhjND