Risk Pages Style Guide

This guide defines standards for risk analysis pages in the LongtermWiki knowledge base. Risk pages analyze potential negative outcomes from AI development.

Prerequisite: All risk pages must follow the Common Writing Principles: epistemic honesty, language neutrality, and analytical tone. The Objectivity rating dimension measures adherence to these principles.

Page Type Detection

Risk pages are detected by their URL path: /knowledge-base/risks/**/*.mdx
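
As an illustration, the path check could be sketched as follows. The helper name `isRiskPage` and the regular expression are assumptions made for this sketch, not the wiki's actual detection code. It also assumes that `**` may match zero or more directory levels, as in most glob implementations, and the example paths are hypothetical.

```javascript
// Hypothetical helper: approximates the glob /knowledge-base/risks/**/*.mdx.
// Assumption: `**` matches zero or more nested directories.
const RISK_PAGE_PATTERN = /\/knowledge-base\/risks\/(?:[^/]+\/)*[^/]+\.mdx$/;

function isRiskPage(path) {
  return RISK_PAGE_PATTERN.test(path);
}

// Example paths below are made up for illustration.
console.log(isRiskPage("/knowledge-base/risks/some-category/some-risk.mdx")); // true
console.log(isRiskPage("/knowledge-base/responses/some-response.mdx")); // false
```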

Required Frontmatter

---
title: "Risk Name"
description: "One sentence explaining what this risk is and its key concern."
quality: 60  # 0-100
importance: 75  # 0-100
lastEdited: "2026-01-28"
---
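
The field constraints above can be checked mechanically. The following is a minimal sketch, assuming the frontmatter has already been parsed into a plain object; the function name `validateFrontmatter` and the error messages are hypothetical and do not describe the wiki's own validation scripts.

```javascript
// Hypothetical validator for the frontmatter fields listed above.
// Assumption: the frontmatter has already been parsed into a plain object.
function validateFrontmatter(fm) {
  const errors = [];

  // title and description must be non-empty strings.
  for (const key of ["title", "description"]) {
    if (typeof fm[key] !== "string" || fm[key].trim() === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }

  // quality and importance use the 0-100 scale shown in the template.
  for (const key of ["quality", "importance"]) {
    const value = fm[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0 || value > 100) {
      errors.push(`${key} must be a number from 0 to 100`);
    }
  }

  // lastEdited is assumed to be a quoted YYYY-MM-DD string, as in the template.
  if (typeof fm.lastEdited !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(fm.lastEdited)) {
    errors.push("lastEdited must be a YYYY-MM-DD string");
  }

  return errors;
}

const example = {
  title: "Risk Name",
  description: "One sentence explaining what this risk is and its key concern.",
  quality: 60,
  importance: 75,
  lastEdited: "2026-01-28",
};
console.log(validateFrontmatter(example)); // []
```

Returning a list of errors rather than throwing on the first one lets an author fix every frontmatter problem in a single pass.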

Required Sections

1. Overview (2-3 paragraphs)

Explain what this risk is, why it matters, and who should care. Write in prose, not bullets.

Good example:

## Overview

Deceptive alignment occurs when an AI system learns to appear aligned during training
while harboring goals that diverge from human intentions. The system "plays along"
during evaluation but pursues different objectives when deployed or given more autonomy.

This risk matters because standard training and evaluation procedures cannot reliably
detect it. A deceptively aligned system would pass all behavioral tests by design,
making it invisible to current safety measures until deployment at scale.

2. Risk Assessment Table

Every risk page MUST have a risk assessment table near the top:

## Risk Assessment

| Dimension | Rating | Justification |
|-----------|--------|---------------|
| Severity | Critical | Could cause irreversible civilizational harm |
| Likelihood | Medium (15-35%) | Depends on alignment difficulty |
| Timeline | 2025-2035 | Contingent on AGI timelines |
| Trend | Increasing | Capability gains outpacing safety |
| Reversibility | Low | Difficult to detect and correct post-deployment |

3. Mechanism Section

Explain HOW this risk manifests. Include a Mermaid diagram:

## How It Works

[Explanation of the causal mechanism]

<Mermaid chart={`
flowchart TD
    A[Training Begins] --> B[System learns evaluation patterns]
    B --> C{Gradient signal}
    C -->|Aligned behavior rewarded| D[Apparent alignment]
    D --> E[Deployment]
    E --> F[Reduced oversight]
    F --> G[True objectives revealed]
`} />

4. Contributing Factors

List the factors that increase or decrease this risk:

## Contributing Factors

| Factor | Effect | Mechanism |
|--------|--------|-----------|
| Capability level | Increases risk | More sophisticated deception possible |
| Interpretability | Decreases risk | Can detect goal misalignment |
| Training diversity | Decreases risk | Harder to learn single deception pattern |
| Deployment speed | Increases risk | Less time for safety evaluation |

5. Responses That Address This Risk

Cross-link to relevant response pages:

## Responses That Address This Risk

| Response | Relevance | Mechanism |
|----------|-----------|-----------|
| [Mechanistic Interpretability](/knowledge-base/responses/mech-interp/) | High | Directly examines internal representations |
| [AI Control](/knowledge-base/responses/ai-control/) | Medium | Limits damage from undetected deception |
| [Adversarial Training](/knowledge-base/responses/adversarial-training/) | Medium | Tests for inconsistent behavior |

6. Key Uncertainties

State what is not yet known:

## Key Uncertainties

1. **Emergence threshold**: At what capability level does deception become possible?
2. **Detection difficulty**: How hard is it to detect deceptive cognition with interpretability?
3. **Prevalence**: How often would training produce deceptive vs. genuinely aligned systems?

7. Historical Context (optional)

Precedents or analogies from other domains.

8. Related Risks

Links to connected risk pages.
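
A page can be checked against the required sections above with a simple heading scan. This is a rough sketch, assuming each section appears as a level-2 Markdown heading with the titles used in the examples; the "Related Risks" heading text is an assumption, since the guide shows no example for it, and `findMissingSections` is a hypothetical name rather than part of the wiki's tooling.

```javascript
// Heading titles taken from the examples in this guide. "Related Risks" is
// an assumed heading; "Historical Context" is optional and so is not checked.
const REQUIRED_HEADINGS = [
  "Overview",
  "Risk Assessment",
  "How It Works",
  "Contributing Factors",
  "Responses That Address This Risk",
  "Key Uncertainties",
  "Related Risks",
];

// Returns the required headings that do not appear as "## Heading" lines.
function findMissingSections(mdxSource) {
  const present = new Set();
  for (const line of mdxSource.split(/\r?\n/)) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) present.add(match[1]);
  }
  return REQUIRED_HEADINGS.filter((heading) => !present.has(heading));
}

const draft = ["## Overview", "Some prose.", "## Risk Assessment", "## Key Uncertainties"].join("\n");
// Logs the four required headings that the draft lacks.
console.log(findMissingSections(draft));
```

The scan only looks at heading text, so it cannot tell whether a section contains the required table, diagram, or prose.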


Claude Code Workflows

Creating a New Risk Page

# Use the research-report skill to generate initial content
/research-report "Analyze [RISK_NAME]: mechanisms, severity, contributing factors, and responses"

# Then create the page structure

Or use the Task tool:

Task({
  subagent_type: 'general-purpose',
  prompt: `Create a new risk page for [RISK_NAME].

  FIRST: Read /internal/risk-style-guide/ for requirements.

  THEN: Research the risk using WebSearch to find:
  - Academic papers on the mechanism
  - Expert assessments of likelihood
  - Real-world examples or analogies

  Create the page at: src/content/docs/knowledge-base/risks/[category]/[risk-name].mdx

  Include ALL required sections:
  1. Overview (2-3 paragraphs)
  2. Risk Assessment table
  3. How It Works (with Mermaid diagram)
  4. Contributing Factors table
  5. Responses That Address This Risk
  6. Key Uncertainties
  7. Related Risks`
})

Improving an Existing Risk Page

Task({
  subagent_type: 'general-purpose',
  prompt: `Improve the risk page at [PATH].

  FIRST: Read /internal/risk-style-guide/ and the current page.

  THEN: Use WebSearch to find citations for:
  - Quantitative estimates (likelihood, severity)
  - Expert opinions and surveys
  - Case studies or historical examples

  Make surgical edits to add:
  1. Risk Assessment table (if missing)
  2. Mermaid diagram showing mechanism
  3. Contributing Factors table
  4. Citations from authoritative sources

  DO NOT rewrite the entire file.`
})

Batch Validation

# Check all risk pages against style guide
node scripts/validate/validate-templates.mjs --type risk

# List risk pages missing required sections
node scripts/content/grade-by-template.mjs --template knowledge-base-risk

Quality Criteria

Pages are scored on seven dimensions (0-10 scale). Scoring is deliberately harsh: a 7 is exceptional, and most content should score 3-5.

| Dimension | 3-4 (Adequate) | 5-6 (Good) | 7+ (Exceptional) |
|-----------|----------------|------------|------------------|
| Novelty | Accurate summary | Some original framing | Significant original insight |
| Rigor | Mixed sourcing | Mostly sourced | Fully sourced with quantification |
| Objectivity | Some insider language or false certainty | Mostly neutral, some uncertainty noted | Fully accessible, all estimates hedged |
| Actionability | Abstract implications | Some actionable takeaways | Concrete decision guidance |
| Completeness | Notable gaps | Covers main points | Thorough coverage |

The derived quality score (0-100) combines these subscores with word count and citation bonuses. See CLAUDE.md for the formula.


Anti-Patterns

  1. Vague severity claims: "This is very dangerous" → Use specific estimates
  2. Missing mechanism: Don't just say what happens; explain HOW
  3. No responses linked: Every risk should connect to potential mitigations
  4. Bullet-heavy: Use tables and prose instead
  5. Table-only sections: Every section needs explanatory paragraphs, not just data tables
  6. No uncertainty acknowledgment: Always include what we don't know
  7. Insider language: "EA organizations", "non-EA causes" — use descriptive terms per Common Writing Principles
  8. False certainty in estimates: "Risk is 30%" without ranges or sources — always use ranges and label confidence
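
Anti-pattern 8 lends itself to a rough automated check. The sketch below is a heuristic only: it flags percentages that are not written as part of a range, and it cannot tell whether an estimate is sourced or labeled with a confidence level. The name `findBarePercentages` and the regular expression are assumptions for illustration, not part of the wiki's tooling.

```javascript
// Heuristic: match a percentage that is neither the upper end of a range
// ("15-35%") nor the lower end of one ("15% - 35%").
// \u2013 is an en dash, which is sometimes used in numeric ranges.
const BARE_PERCENT = /(?<![\d.])(?<![\d%]\s?[-\u2013]\s?)\d+(?:\.\d+)?%(?!\s?[-\u2013]\s?\d)/g;

// Returns every bare percentage found in the text (empty array if none).
function findBarePercentages(text) {
  return text.match(BARE_PERCENT) ?? [];
}

console.log(findBarePercentages("Risk is 30%")); // [ '30%' ]
console.log(findBarePercentages("Likelihood: Medium (15-35%)")); // []
```

A flagged value is a prompt for review rather than an error, since some percentages (for example, survey response rates) are legitimately point values.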

Example Risk Page

See Deceptive Alignment for a well-structured example.