# Automation Tools
This page documents all automation tools available for maintaining and improving the knowledge base.
## Quick Reference

| Tool | Purpose | Command |
|---|---|---|
| Content Commands | Improve, grade, create pages | `pnpm crux w` |
| Validators | Check content quality | `pnpm crux w validate` |
| Analyzers | Analysis and reporting | `pnpm crux w analyze` |
| Auto-fixers | Fix common issues | `pnpm crux w fix` |
| Data Builder | Regenerate entity data | `pnpm build-data` |
| Resources | External resource management | `pnpm crux w resources` |
## Page Improvement Workflow

This is the recommended workflow for bringing wiki pages up to quality 5 (Q5).
### Commands

```sh
# List pages that need improvement (sorted by priority)
pnpm crux w improve --list

# Improve a specific page
pnpm crux w improve economic-disruption --tier=standard --apply

# Show page info only (no prompt)
pnpm crux w improve racing-dynamics --info

# Filter by quality and importance
pnpm crux w improve --list --max-qual 3 --min-imp 50
```
### What Makes a Q5 Page

Q5 is an internally-defined quality standard used by the grading pipeline to classify pages as comprehensive.[^1] The following elements are required:

| Element | Requirement |
|---|---|
| Quick Assessment Table | 5+ rows, 3 columns (Dimension, Assessment, Evidence) |
| Substantive Tables | 2+ additional tables with real data |
| Mermaid Diagram | 1+ showing key relationships[^2] |
| Citations | 10+ real URLs from authoritative sources |
| Quantified Claims | Replace "significant" with "25-40%" etc. |
| Word Count | 800+ words of substantive content |
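As an illustration, the checklist above can be expressed as a predicate over simple page statistics. This is a hypothetical sketch; the authoritative rules live in `crux/lib/page-templates.ts`, and the field names here are invented for the example.

```typescript
// Hypothetical page statistics; field names are invented for this sketch.
interface PageStats {
  assessmentRows: number;  // rows in the Quick Assessment Table
  extraTables: number;     // substantive tables beyond the assessment table
  mermaidDiagrams: number;
  citations: number;       // real URLs from authoritative sources
  words: number;           // substantive word count
}

// A page meets Q5 only if every threshold from the table above is satisfied.
function meetsQ5(s: PageStats): boolean {
  return (
    s.assessmentRows >= 5 &&
    s.extraTables >= 2 &&
    s.mermaidDiagrams >= 1 &&
    s.citations >= 10 &&
    s.words >= 800
  );
}
```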
### Cost Estimates

Cost per page depends on the improvement tier and the Claude model used.[^3] The estimates below are approximate and reflect typical token usage; actual costs vary with page length and complexity.
| Model | Approximate Cost per Page |
|---|---|
| Claude Opus 4.5 | $3–5 |
| Claude Sonnet 4.5 | $0.50–1.00 |
### Reference Examples

The following pages meet Q5 criteria and can serve as structural references:

- `content/docs/knowledge-base/risks/bioweapons.mdx` — includes Quick Assessment Table, 2+ data tables, 1+ Mermaid diagram, and 10+ citations
- `content/docs/knowledge-base/risks/racing-dynamics.mdx` — demonstrates quantified claims and structured evidence sections
## Scheduled Updates

The update schedule system tracks which pages are overdue for refresh based on `update_frequency` (days) in frontmatter. It prioritizes pages by combining staleness with importance.
### How Scheduling Works

Each page declares how often it should be updated:

```yaml
update_frequency: 7 # Check weekly
lastEdited: "2026-01-15"
readerImportance: 85
```
Priority scoring formula: `staleness × (importance / 100)`, where `staleness = days since edit / update frequency`. Pages with staleness >= 1.0 are overdue.
| Importance | Default Frequency |
|---|---|
| >= 80 | 7 days (weekly) |
| >= 60 | 21 days (3 weeks) |
| >= 40 | 45 days (6 weeks) |
| >= 20 | 90 days (3 months) |
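As a sketch, the priority formula can be written out directly. This mirrors the description above, not the actual scheduler code:

```typescript
// staleness = days since last edit / declared update frequency.
function staleness(daysSinceEdit: number, updateFrequency: number): number {
  return daysSinceEdit / updateFrequency;
}

// priority = staleness × (importance / 100); staleness >= 1.0 means overdue.
function priorityScore(
  daysSinceEdit: number,
  updateFrequency: number,
  readerImportance: number,
): number {
  return staleness(daysSinceEdit, updateFrequency) * (readerImportance / 100);
}

// A page edited 14 days ago with update_frequency: 7 and readerImportance: 85
// has staleness 2.0 (overdue) and priority 1.7.
```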
### Opting Out: Non-Evergreen Pages

Some pages are point-in-time content (reports, experiments, proposals) that don't need ongoing updates. Set `evergreen: false` in frontmatter to exclude them from the update schedule:

```yaml
evergreen: false # This page is a snapshot, not maintained
```

Pages with `evergreen: false` are skipped by the update schedule, the bootstrap script, and the reassign script. All existing report pages under `internal/reports/` use this flag.
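The opt-out rule can be sketched as a filter over page frontmatter. This is illustrative only; the real scripts may implement it differently:

```typescript
// Minimal slice of frontmatter relevant to scheduling. An absent
// evergreen field means the page IS evergreen and stays on the schedule.
interface PageMeta {
  slug: string;
  evergreen?: boolean;
}

// Keep every page except those explicitly marked evergreen: false.
function schedulable(pages: PageMeta[]): PageMeta[] {
  return pages.filter((p) => p.evergreen !== false);
}
```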
### Commands

```sh
# See what needs updating
pnpm crux updates list                       # Top 10 by priority
pnpm crux updates list --overdue --limit=20  # All overdue pages

# Preview triage recommendations (~$0.08/page based on Sonnet 4.5 rates)
pnpm crux updates triage --count=10

# Run updates (triage is ON by default)
pnpm crux updates run --count=5                            # Triage + update top 5
pnpm crux updates run --count=3 --no-triage --tier=polish  # Skip triage

# Statistics
pnpm crux updates stats
```
### Cost-Aware Triage

By default, `updates run` performs a low-cost news check (≈$0.08 per page at Sonnet 4.5 rates)[^3] using web search + SCRY before committing to a full update. This check recommends a tier based on whether new developments were found:

| Triage Result | Action | Cost |
|---|---|---|
| skip | No new developments — page is current | $0 |
| polish | Minor tweaks only | $2–3[^3] |
| standard | Notable new developments to add | $5–8[^3] |
| deep | Major developments requiring thorough research | $10–15[^3] |
Example scenario: 10 pages scheduled at standard tier ≈ $65 total. If triage finds that 6 have no new developments, those are skipped: ≈$26 (4 pages × standard) + $0.80 triage overhead ≈ $27. Note that when most pages do require updates, triage overhead adds cost without reducing the update spend; the net benefit depends on the proportion of pages that can be skipped.
Use `--no-triage` to skip the news check and apply a fixed tier to all pages.
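The arithmetic in the example scenario can be captured in a small cost model. The per-page figures below ($6.50 for a standard update, $0.08 for triage) are midpoints of the ranges quoted on this page, not billing numbers:

```typescript
// Total cost for a batch: every page pays the triage check, and only
// non-skipped pages pay the full update tier.
function updateCost(
  pages: number,
  skipped: number,
  perPageUpdate = 6.5,
  perPageTriage = 0.08,
): number {
  return (pages - skipped) * perPageUpdate + pages * perPageTriage;
}

updateCost(10, 0, 6.5, 0); // $65: all 10 pages at standard, no triage
updateCost(10, 6);         // ≈$26.80: 4 updates plus $0.80 triage overhead
```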
### Update Tiers (Page Improver)

```sh
# Single page with specific tier
pnpm crux w improve <page-id> --tier=polish --apply
pnpm crux w improve <page-id> --tier=standard --apply --grade
pnpm crux w improve <page-id> --tier=deep --apply --grade

# Auto-select tier via triage
pnpm crux w improve <page-id> --tier=triage --apply --grade

# Triage only (no update)
pnpm crux w improve <page-id> --triage
```
| Tier | Approximate Cost[^3] | Phases |
|---|---|---|
| polish | $2–3 | analyze, improve, validate |
| standard | $5–8 | analyze, research, improve, validate, review |
| deep | $10–15 | analyze, research-deep, improve, validate, review, gap-fill |
| triage | ≈$0.08 + tier cost | news check, then auto-selects above |
## Page Change Tracking
Claude Code sessions log which wiki pages they modify. This creates a per-page change history that feeds two dashboards.
### How It Works

- Each Claude Code session appends an entry to `.claude/sessions/` with a `**Pages:**` field listing the page slugs that were edited.
- At build time, `build-data.mjs` parses session logs and attaches a `changeHistory` array to each page in `database.json`.
- PR numbers are auto-populated at build time via the GitHub API (`github-pr-lookup.mjs`), mapping branch names to PR numbers. Session logs can also include an explicit `**PR:** #123` as an override.
- The data flows to two places:
  - Per-page: A "Change History" section in the PageStatus card shows which sessions touched this page, with clickable PR links.
  - Site-wide: The Page Changes dashboard shows all page edits across all sessions in a sortable table with PR links.
### Session Log Format

The `**Pages:**` field uses page slugs (filenames without `.mdx`), comma-separated:

```
**Pages:** ai-risks, compute-governance, anthropic
```

Omit the field entirely for infrastructure-only sessions that don't edit wiki pages.
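A parser for this field could look like the following sketch. It is hypothetical; the real parsing logic lives in `build-data.mjs` and may differ:

```typescript
// Extract comma-separated page slugs from a "**Pages:** ..." log line.
// Returns an empty array for lines that are not a Pages field.
function parsePagesField(line: string): string[] {
  const match = line.match(/^\*\*Pages:\*\*\s*(.+)$/);
  if (!match) return [];
  return match[1]
    .split(",")
    .map((slug) => slug.trim())
    .filter((slug) => slug.length > 0);
}

parsePagesField("**Pages:** ai-risks, compute-governance, anthropic");
// → ["ai-risks", "compute-governance", "anthropic"]
```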
## Content Grading

Uses the Claude Sonnet API to automatically grade pages with importance, quality, and AI-generated summaries.[^3]
### Commands

```sh
# Preview what would be graded (no API calls)
pnpm crux w grade --dry-run

# Grade a specific page
pnpm crux w grade --page scheming

# Grade pages and apply to frontmatter
pnpm crux w grade --limit 10 --apply

# Grade a category with parallel processing
pnpm crux w grade --category responses --parallel 3

# Skip already-graded pages
pnpm crux w grade --skip-graded --limit 50
```
### Options

| Option | Description |
|---|---|
| `--page ID` | Grade a single page |
| `--dry-run` | Preview without API calls |
| `--limit N` | Only process N pages |
| `--parallel N` | Process N pages concurrently (default: 1) |
| `--category X` | Only process pages in category |
| `--skip-graded` | Skip pages with existing importance |
| `--apply` | Write grades to frontmatter (caution) |
| `--output FILE` | Write results to JSON file |
### Grading Criteria

**Importance (0–100):**
- 90–100: Essential for prioritization (core interventions, key risk mechanisms)
- 70–89: High value (concrete responses, major risk categories)
- 50–69: Useful context (supporting analysis, secondary risks)
- 30–49: Reference material (historical, profiles, niche)
- 0–29: Peripheral (internal docs, stubs)
**Quality (0–100):**

- 80–100: Comprehensive (2+ tables, 1+ diagram, 5+ citations, quantified claims) — maps to Q5[^1]
- 60–79: Good (1+ table, 3+ citations, mostly prose)
- 40–59: Adequate (structure but lacks tables/citations)
- 20–39: Draft (poorly structured, heavy bullets, no evidence)
- 0–19: Stub (minimal content)
### Cost Estimate

Roughly $0.02 per page at Sonnet 4.5 rates.[^3] Grading the full wiki (about 329 pages as of early 2026) costs roughly $6–7, though this figure changes as pages are added.
## Validation Suite

All validators are accessible via the unified `crux` CLI:

```sh
pnpm crux w validate         # Run all validators
pnpm crux w validate --help  # List all validators
```
### Individual Validators

| Command | Description |
|---|---|
| `crux validate compile` | MDX[^4] compilation check |
| `crux validate data` | Entity data integrity |
| `crux validate refs` | Internal reference validation |
| `crux validate mermaid` | Mermaid[^2] diagram syntax |
| `crux validate sidebar` | Sidebar configuration |
| `crux validate entity-links` | EntityLink component validation |
| `crux validate templates` | Template compliance |
| `crux validate quality` | Content quality metrics |
| `crux validate unified` | Unified rules engine (escaping, formatting) |
### Advanced Usage

```sh
# Run specific validators
pnpm crux w validate compile --quick
pnpm crux w validate unified --rules=dollar-signs,markdown-lists

# Skip specific checks
pnpm crux w validate all --skip=component-refs

# CI mode
pnpm crux w validate gate
```
## Citation & Content Tools

Tools for verifying citations, fetching source content, and scanning wiki pages. Data is stored in the wiki-server PostgreSQL database and accessed via a Hono RPC API.
### Citation Verification

```sh
# Verify all citations on a page (fetches URLs, checks quotes)
pnpm crux citations verify <page-id>

# Run citation audits across multiple pages
pnpm crux citations audit

# View citation archive for a page
pnpm crux citations show <page-id>
```
Citation results are stored in two places:

- YAML archive (`data/citation-archive/<page-id>.yaml`) — per-page verification records, checked into git
- PostgreSQL (`citation_content` table) — full fetched text for quote verification, accessed via wiki-server API
### Content Scanning

```sh
# Scan MDX files for content analysis
pnpm crux scan-content
```
### Source Fetching

When verifying citations, the system fetches source URLs through a multi-layer cache:

1. In-memory LRU cache (500 entries, session-scoped; implemented in `crux/lib/source-cache.ts`)
2. PostgreSQL `citation_content` table (durable, cross-machine)
3. Network fetch via the Firecrawl API[^5] or a built-in fallback

Fetched content is cached in `.cache/sources/` locally and persisted to PostgreSQL for future sessions.
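The lookup order can be sketched as a generic read-through over ordered layers. The `CacheLayer` interface here is invented for illustration; the real implementation is in `crux/lib/source-cache.ts` and the wiki-server API:

```typescript
// A cache layer exposes get/set; layers are ordered fastest-first
// (in-memory LRU, then PostgreSQL citation_content in the real system).
interface CacheLayer {
  get(url: string): string | undefined;
  set(url: string, body: string): void;
}

function fetchThroughLayers(
  url: string,
  layers: CacheLayer[],
  fetchRemote: (url: string) => string,
): string {
  for (let i = 0; i < layers.length; i++) {
    const hit = layers[i].get(url);
    if (hit !== undefined) {
      // Backfill faster layers so the next lookup hits earlier.
      for (let j = 0; j < i; j++) layers[j].set(url, hit);
      return hit;
    }
  }
  // All layers missed: fetch from the network and persist everywhere.
  const body = fetchRemote(url);
  for (const layer of layers) layer.set(url, body);
  return body;
}
```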
## Environment Variables

| Variable | Purpose |
|---|---|
| `ANTHROPIC_API_KEY` | Claude API[^3] for summaries and grading |
| `FIRECRAWL_KEY` | Web page fetching via Firecrawl[^5] (optional, has built-in fallback) |
## Data Layer

### Build Data

The data build step must run before the site build, as `apps/web` reads `database.json` at build time rather than making runtime API calls.

```sh
pnpm build-data  # Regenerate all data files
pnpm dev         # Auto-runs build-data first
pnpm build       # Auto-runs build-data first
```
### Generated Files

After running `build-data`:

- `apps/web/src/data/database.json` — Main entity database
- `apps/web/src/data/entities.json` — Entity definitions
- `apps/web/src/data/backlinks.json` — Cross-references
- `apps/web/src/data/tagIndex.json` — Tag index
- `apps/web/src/data/pathRegistry.json` — URL path mappings
- `apps/web/src/data/pages.json` — Page metadata for scripts
### Other Data Scripts

```sh
pnpm crux tb sync:descriptions  # Sync model descriptions from files
pnpm crux tb extract            # Extract data from pages
pnpm crux tb generate-yaml      # Generate YAML from data
pnpm crux tb cleanup-data       # Clean up data files
```
## Content Management CLI

Unified tool for managing and improving content quality via `pnpm crux w`.

### Commands

```sh
# Improve pages
pnpm crux w improve <page-id>

# Grade pages using Claude API
pnpm crux w grade --page scheming
pnpm crux w grade --limit 5 --apply

# Regrade pages
pnpm crux w regrade --page scheming

# Create new pages
pnpm crux w create "Page Title" --tier=standard
```
### Options

| Option | Description |
|---|---|
| `--dry-run` | Preview without API calls |
| `--limit N` | Process only N pages |
| `--apply` | Apply changes directly to files |
| `--page ID` | Target specific page |
## Resource Linking

### Convert URLs to R Components

```sh
# Find URLs that can be converted to <R> components
pnpm crux w resources map expertise-atrophy  # Specific file
pnpm crux w resources map                    # All files
pnpm crux w resources map --stats            # Statistics only

# Auto-convert markdown links to R components
pnpm crux w resources convert --dry-run  # Preview
pnpm crux w resources convert --apply    # Apply changes
```
### Export Resources

```sh
pnpm crux w resources export  # Export resource data
```
## Content Generation

### Generate New Pages

```sh
# Generate a model page from YAML input
pnpm crux w create "Model Name" --type model --file input.yaml

# Generate a risk page
pnpm crux w create "Risk Name" --type risk --file input.yaml

# Generate a response page
pnpm crux w create "Response Name" --type response --file input.yaml
```
### Batch Summaries

```sh
pnpm crux w generate summaries --batch 50  # Generate summaries for multiple pages
```
## Testing

```sh
pnpm test             # Run all tests
pnpm test:lib         # Test library functions
pnpm test:validators  # Test validator functions
```
## Linting and Formatting

```sh
pnpm lint          # Check for linting issues
pnpm lint:fix      # Fix linting issues
pnpm format        # Format all files
pnpm format:check  # Check formatting without changing files
```
## Temporary Files

Convention: All temporary/intermediate files go in `.claude/temp/` (gitignored).

Scripts that generate intermediate output (like grading results) write here by default. This keeps the project root clean and prevents accidental commits.
## Common Workflows

### Improve a Low-Quality Important Page

1. Find candidates:

   ```sh
   pnpm crux w improve --list --max-qual 3
   ```

2. Run improvement:

   ```sh
   pnpm crux w improve economic-disruption --tier=standard --apply
   ```

3. Validate the result:

   ```sh
   pnpm crux w validate compile
   pnpm crux w validate gate
   ```
### Grade All New Pages

1. Preview:

   ```sh
   pnpm crux w grade --skip-graded --dry-run
   ```

2. Grade and apply:

   ```sh
   pnpm crux w grade --skip-graded --apply --parallel 3
   ```

3. Review results:

   ```sh
   cat .claude/temp/grades-output.json
   ```
### Check Content Quality Before PR

```sh
pnpm crux w validate gate --fix
```
### Update After Editing `entities.yaml`

```sh
pnpm build-data
pnpm crux w validate data
```
## Footnotes

[^1]: Q5 quality standards are defined internally within this wiki's content pipeline documentation. Q5 maps to the 80–100 range on the grading scale and requires: Quick Assessment Table (5+ rows), 2+ substantive tables, 1+ Mermaid diagram, 10+ authoritative-source citations, quantified claims, and 800+ words of substantive content. See `crux/lib/page-templates.ts` for the authoritative implementation.

[^2]: Diagram Syntax | Mermaid — Mermaid.js project, 2026. Version 11.13.0 supports 25+ diagram types. All diagram definitions begin with a diagram type declaration; comments use `%%` notation; configuration is via YAML frontmatter or `%%{ }%%` directives.

[^3]: Pricing — Claude API Docs — Anthropic, 2026. Claude Opus 4.5: $5/MTok input, $25/MTok output. Claude Sonnet 4.5: $3/MTok input, $15/MTok output. The Batch API provides a flat 50% discount on all models (Sonnet 4.5 batch: $1.50/$7.50 per MTok). All cost estimates on this page are approximate, reflect typical token usage per page, and are subject to change with Anthropic pricing updates.

[^4]: Docs | MDX — MDX project, 2026. Core compilation package `@mdx-js/mdx` with a remark+rehype pipeline. Supports markdown, JSX, JavaScript expressions, and ESM imports/exports in one format.

[^5]: Firecrawl — Web Data API for AI — Firecrawl, 2026. Web scraping and crawling API; 1 credit per scraped page. The free tier provides 500 lifetime credits. The `FIRECRAWL_KEY` environment variable enables Firecrawl-backed fetching; the system falls back to a built-in fetch implementation if the key is absent.