59QualityAdequate •Quality: 59/100LLM-assigned rating of overall page quality, considering depth, accuracy, and completeness.Structure suggests 1372.5ImportanceHighImportance: 72.5/100How central this topic is to AI safety. Higher scores mean greater relevance to understanding or mitigating AI risk.25.5ResearchMinimalResearch Value: 25.5/100How much value deeper investigation of this topic could yield. Higher scores indicate under-explored topics with high insight potential.
Content1/13
SummarySummaryBasic text summary used in search results, entity link tooltips, info boxes, and related page cards.crux content improve <id>ScheduleScheduleHow often the page should be refreshed. Drives the overdue tracking system.Set updateFrequency in frontmatterEntityEntityYAML entity definition with type, description, and related entries.Edit historyEdit historyTracked changes from improve pipeline runs and manual edits.crux edit-log view <id>OverviewOverviewA ## Overview heading section that orients readers. Helps with search and AI summaries.Add a ## Overview section at the top of the page
Tables0/ ~1TablesData tables for structured comparisons and reference material.Add data tables to the pageDiagrams0DiagramsVisual content — Mermaid diagrams, charts, or Squiggle estimate models.Add Mermaid diagrams or Squiggle modelsInt. links0/ ~3Int. linksLinks to other wiki pages. More internal links = better graph connectivity.Add links to other wiki pagesExt. links0/ ~1Ext. linksLinks to external websites, papers, and resources outside the wiki.Add links to external sourcesFootnotes0/ ~2FootnotesFootnote citations [^N] with source references at the bottom of the page.Add [^N] footnote citationsReferences0/ ~1ReferencesCurated external resources linked via <R> components or cited_by in YAML.Add <R> resource linksQuotes0QuotesSupporting quotes extracted from cited sources to back up page claims.crux citations extract-quotes <id>Accuracy0AccuracyCitations verified against their sources for factual accuracy.crux citations verify <id>RatingsN:4 R:7 A:4.5 C:7.5RatingsSub-quality ratings: Novelty, Rigor, Actionability, Completeness (0-10 scale).Backlinks2BacklinksNumber of other wiki pages that link to this page. Higher backlink count means better integration into the knowledge graph.
Issues2
QualityRated 59 but structure suggests 13 (overrated by 46 points)
StructureNo tables or diagrams - consider adding visual content
AI Value Learning
Research Area
Value Learning
Training AI systems to infer and adopt human values from observation and interaction
Stuart RussellPersonStuart RussellStuart Russell (born 1962) is a British computer scientist and UC Berkeley professor who co-authored the dominant AI textbook 'Artificial Intelligence: A Modern Approach' (used in over 1,500 univer...Quality: 30/100
Organizations
Center for Human-Compatible AIOrganizationCenter for Human-Compatible AICHAI is UC Berkeley's AI safety research center founded by Stuart Russell in 2016, pioneering cooperative inverse reinforcement learning and human-compatible AI frameworks. The center has trained 3...Quality: 37/100Google DeepMindOrganizationGoogle DeepMindComprehensive overview of DeepMind's history, achievements (AlphaGo, AlphaFold with 200M+ protein structures), and 2023 merger with Google Brain. Documents racing dynamics with OpenAI and new Front...Quality: 37/100
Risks
Goal MisgeneralizationRiskGoal MisgeneralizationGoal misgeneralization occurs when AI systems learn transferable capabilities but pursue wrong objectives in deployment, with 60-80% of RL agents exhibiting this failure mode under distribution shi...Quality: 63/100
Stuart ArmstrongPersonStuart ArmstrongAI safety researcher formerly at the Future of Humanity Institute, Oxford. Known for work on value learning, corrigibility, and interruptibility in AI systems. Author of "Smarter Than Us."Jessica TaylorPersonJessica TaylorAI safety researcher. Previously at MIRI where she worked on alignment theory and value learning. Contributed to research on quantilization, approval-directed agents, and other approaches to safe A...RLHFResearch AreaRLHFRLHF/Constitutional AI achieves 82-85% preference improvements and 40.8% adversarial attack reduction for current systems, but faces fundamental scalability limits: weak-to-strong supervision shows...Quality: 63/100
Approaches
AI AlignmentApproachAI AlignmentComprehensive review of AI alignment approaches finding current methods (RLHF, Constitutional AI) show 75%+ effectiveness on measurable safety metrics for existing systems but face critical scalabi...Quality: 91/100
Key Debates
Why Alignment Might Be HardArgumentWhy Alignment Might Be HardA comprehensive taxonomy of alignment difficulty arguments spanning specification problems, inner alignment failures, verification limits, and adversarial dynamics, with expert p(doom) estimates ra...Quality: 69/100AI Alignment Research AgendasCruxAI Alignment Research AgendasComprehensive comparison of major AI safety research agendas ($100M+ Anthropic, $50M+ DeepMind, $5-10M nonprofits) with detailed funding, team sizes, and failure mode coverage (25-65% per agenda). ...Quality: 69/100
Analysis
Alignment Robustness Trajectory ModelAnalysisAlignment Robustness Trajectory ModelThis model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 10-30x where systems are dangerous but can't help s...Quality: 64/100
Risks
Epistemic SycophancyRiskEpistemic SycophancyAI sycophancy—where models agree with users rather than provide accurate information—affects all five state-of-the-art models tested, with medical AI showing 100% compliance with illogical requests...Quality: 60/100