# Enhancement Queue
This page tracks which pages need enhancement to match their respective style guides. Use this to prioritize work and avoid duplicate effort.
## Quick Stats
| Content Type | Pending | In Progress | Complete | Style Guide |
|---|---|---|---|---|
| Models | ≈26 | 0 | 29 | Model Style Guide |
| Risks | ≈34 | 0 | 0 | KB Style Guide |
| Responses | ≈40 | 0 | 1 | KB Style Guide |
## Enhancement Queues
### High Priority (quality < 3)
| Page | Quality | Status | Notes |
|---|---|---|---|
| mesa-optimization | 2 | Pending | Needs significant work |
### Accident Risks
| Page | Quality | Status |
|---|---|---|
| corrigibility-failure | 62/100 | Pending |
| distributional-shift | 91/100 | Pending |
| emergent-capabilities | 61/100 | Pending |
| goal-misgeneralization | 63/100 | Pending |
| instrumental-convergence | 64/100 | Pending |
| power-seeking | 67/100 | Pending |
| reward-hacking | 91/100 | Pending |
| sandbagging | 67/100 | Pending |
| scheming | 74/100 | Pending |
| sharp-left-turn | 69/100 | Pending |
| sycophancy | 65/100 | Pending |
| treacherous-turn | 67/100 | Pending |
### Misuse Risks
| Page | Quality | Status |
|---|---|---|
| autonomous-weapons | 56/100 | Pending |
| deepfakes | 50/100 | Pending |
| disinformation | 54/100 | Pending |
| fraud | 47/100 | Pending |
| surveillance | 64/100 | Pending |
### Structural Risks
| Page | Quality | Status |
|---|---|---|
| concentration-of-power | 65/100 | Pending |
| enfeeblement | 91/100 | Pending |
| erosion-of-agency | 91/100 | Pending |
| lock-in | 64/100 | Pending |
### Epistemic Risks
| Page | Quality | Status |
|---|---|---|
| epistemic-collapse | 49/100 | Pending |
| institutional-capture | 73/100 | Pending |
| knowledge-monopoly | 50/100 | Pending |
| learned-helplessness | 53/100 | Pending |
| reality-fragmentation | 28/100 | Pending |
| trust-cascade | 36/100 | Pending |
### Already High Quality (quality 4+)
These are lower priority but could still benefit from kb-2.0 alignment:
| Page | Quality | Status |
|---|---|---|
| authoritarian-takeover | 4 | Pending |
| bioweapons | 4 | Pending |
| cyber-psychosis | 4 | Pending |
| cyberweapons | 4 | Pending |
| deceptive-alignment | 4 | Pending |
| legal-evidence-crisis | 4 | Pending |
## How to Use This
### For Claude Code Sessions
- Pick 3-5 items marked "Pending" from one category
- Update their status to "In Progress" in this file (see the row example below)
- Enhance the pages following the style guide
- Mark as "Complete" when done
- Commit changes
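For example, claiming goal-misgeneralization for enhancement means changing its row in the Accident Risks table from:

```markdown
| goal-misgeneralization | 63/100 | Pending |
```

to:

```markdown
| goal-misgeneralization | 63/100 | In Progress |
```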
### Enhancement Checklist
For All Pages (Common Writing Principles):
- No insider jargon ("EA money", "non-EA causes") — use descriptive terms
- Estimates use ranges, not point values, and are labeled "Est." or "Approx."
- Analytical tone, not prescriptive ("this suggests..." not "we recommend...")
- Counter-arguments included for key claims
- `objectivity` rating in frontmatter
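A minimal frontmatter sketch showing where the objectivity rating lives; the exact nesting is an assumption here, so match the schema used by existing pages:

```yaml
---
ratings:
  objectivity: 3   # assumed field layout and scale; copy the schema from existing pages
---
```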
For Risk Pages (kb-2.0):
- 2-3 paragraph Overview
- Risk Assessment table (Severity, Likelihood, Timeline)
- Responses That Address This Risk table
- Why This Matters section
- Key Uncertainties section
- Proper h2/h3 hierarchy
- `styleGuideVersion: "kb-2.0"` in frontmatter
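Putting the checklist together, a conforming risk page skeleton might look like the following; everything except `styleGuideVersion: "kb-2.0"` and the required section names is illustrative:

```markdown
---
styleGuideVersion: "kb-2.0"
---

## Overview
Two to three paragraphs of flowing prose.

## Risk Assessment
| Severity | Likelihood | Timeline |
|---|---|---|
| Est. range | Est. range | Est. range |

## Responses That Address This Risk
| Response | How it addresses this risk |
|---|---|

## Why This Matters
...

## Key Uncertainties
...
```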
For Response Pages (kb-2.0):
- 2-3 paragraph Overview
- Quick Assessment table
- Risks Addressed table
- How It Works section
- Critical Assessment section
- Getting Involved section
- Proper h2/h3 hierarchy
- `styleGuideVersion: "kb-2.0"` in frontmatter
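The response-page skeleton is parallel, with the same frontmatter field and its own required sections:

```markdown
---
styleGuideVersion: "kb-2.0"
---

## Overview
## Quick Assessment
## Risks Addressed
## How It Works
## Critical Assessment
## Getting Involved
```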
For Model Pages:
- Overview with flowing prose
- Mermaid diagram
- Quantitative tables with estimates/ranges
- Scenario analysis
- Limitations section
- "Why These Numbers Might Be Wrong" section (for CE estimates)
- Ratings in frontmatter (including objectivity)
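The Mermaid diagram is the requirement most easily forgotten; a minimal placeholder shape, with node names purely illustrative:

```mermaid
flowchart LR
  params[Key parameters] --> model[Model] --> estimate[Cost-effectiveness range]
```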