Citation

Gwern Branwen - Footnote 16

partial · 85% confidence

1 evidence check

Last checked: 4/3/2026

The claim that Gwern proposes concrete benchmarks to detect mode-collapse, AI slop, and "ChatGPTese" is not directly supported by the provided text: the directory lists numerous papers and links on AI safety and potential risks, but it does not state that Gwern proposes specific benchmarks for these issues. The claim also refers to "whimper" or "em hell" existential risks; the text discusses AI safety and various risks of AI development but never uses these terms. Finally, the claim cites 2024 works on scheming, reward-tampering, sandbagging, and phishing capabilities in LLMs; the directory includes works from 2024 but does not state that they focus on those topics.

Evidence — 1 source, 1 check

gwern.net/doc/reinforcement-learning/safe/index (1 check)

partial · 85% · Haiku 4.5 · 4/3/2026

Found: Gwern proposes concrete benchmarks to detect mode-collapse, AI slop, and "ChatGPTese," arguing these are increasingly important for AI safety to prevent "whimper" or "em hell" existential risks. He ar…

Debug info

Record type: citation

Record ID: page:gwern:fn16