Anthropic
Company Blog
AI safety company, Claude developer
Credibility Rating: 4/5 (High)
High quality. Established institution or organization with editorial oversight and accountability.
Resources: 129
Citing pages: 149
Tracked domains: 1
Tracked Domains (1)
anthropic.com
Resources (129)
Citing Pages (149)
AI Accident Risk Cruxes, Agentic AI, AGI Development, AGI Timeline, AI-Assisted Alignment, AI Control, AI Alignment, Alignment Evaluations, Alignment Robustness Trajectory Model, Anthropic, Anthropic Core Views, Anthropic (Funder), Anthropic IPO, Anthropic Pre-IPO DAF Transfers, Anthropic Stakeholders, Anthropic Valuation Analysis, Apollo Research, Alignment Research Center, Autonomous Weapons Escalation Model, Bioweapons Risk, Bioweapons Attack Chain Model, Center for AI Safety, Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, Capabilities-to-Safety Pipeline Model, Capability Elicitation, AI Capability Threshold Model, The Case For AI Existential Risk, Circuit Breakers / Inference Interventions, Claude Code Espionage Incident (2025), Autonomous Coding, AI Compounding Risks Analysis Model, AI-Driven Concentration of Power, Constitutional AI, Corporate AI Safety Responses, Corrigibility, Corrigibility Failure, Corrigibility Failure Pathways, Cyberweapons Risk, Autonomous Cyber Attack Timeline, Dangerous Capability Evaluations, Daniela Amodei, Dario Amodei, AI Safety via Debate, Deceptive Alignment, Deceptive Alignment Decomposition Model, AI Safety Defense in Depth Model, AI-Assisted Deliberation, AI Disinformation, EA Shareholder Diversification from Anthropic, AI Policy Effectiveness, Emergent Capabilities, AI-Induced Enfeeblement, Epistemic Sycophancy, EU AI Act, Eval Saturation & The Evals Gap, Evals-Based Deployment Gates, AI Evaluation, Evaluation Awareness, Evan Hubinger, Goal Misgeneralization, Goal Misgeneralization Probability Model, Goal Misgeneralization Research, AI Governance and Policy, Heavy Scaffolding / Agentic Systems, Holden Karnofsky, AI-Human Hybrid Systems, Instrumental Convergence, Instrumental Convergence Framework, Interpretability, AI Safety Intervention Effectiveness Matrix, Is AI Existential Risk Real?, Jaan Tallinn, AI Knowledge Monopoly, AI Lab Safety Culture, Large Language Models, AI Value Lock-in, Long-Horizon Autonomous Tasks, Anthropic Long-Term Benefit Trust, MAIM (Mutually Assured AI Malfunction), Mechanistic Interpretability, Mesa-Optimization, Mesa-Optimization Risk Analysis, Metaculus, Machine Intelligence Research Institute, AI Misuse Risk Cruxes, Third-Party Model Auditing, AI Model Specifications, Multipolar Trap Dynamics Model, Open Source AI Safety, OpenAI, OpenAI Foundation, Optimistic Alignment Worldview, AI Output Filtering, Paul Christiano, Pause Advocacy, Should We Pause AI Development?, Persuasion and Social Manipulation, Power-Seeking AI, Power-Seeking Emergence Conditions Model, Pre-TAI Capital Deployment: $100B-$300B+ Spending Analysis, Probing / Linear Probes, AI Proliferation, AI Development Racing Dynamics, Racing Dynamics Impact Model, Reasoning and Planning, Red Teaming, Redwood Research, Refusal Training, Reward Hacking, Reward Hacking Taxonomy and Severity Model, AI Risk Activation Timeline Model, AI Risk Cascade Pathways Model, AI Risk Interaction Matrix, AI Risk Interaction Network Model, RLHF, Responsible Scaling Policies, AI Safety Cases, AI Safety Culture Equilibrium Model, AI Safety Research Allocation Model, AI Safety Research Value Model, AI Safety Researcher Gap Model, AI Capability Sandbagging, Sandboxing / Containment, Scalable Oversight, AI Scaling Laws, Scheming, Scheming & Deception Detection, Scheming Likelihood Assessment, Survival and Flourishing Fund, Sharp Left Turn, Situational Awareness, Sleeper Agent Detection, AI Safety Solution Cruxes, State-Space Models / Mamba, AI Model Steganography, Structured Access / API-Only, Sycophancy, Sycophancy Feedback Loop Model, AI Safety Technical Pathway Decomposition, Technical AI Safety Research, Tool Use and Computer Use, Treacherous Turn, Voluntary AI Safety Commitments, AI Risk Warning Signs Model, AI Whistleblower Protections, Why Alignment Might Be Easy, Why Alignment Might Be Hard, Worldview-Intervention Mapping
Publication ID: anthropic