Rose Hadshar's 2023 review
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: AI Impacts
This is an AI Impacts blog post summarizing a 2023 arXiv report by visiting researcher Rose Hadshar; the full paper is linked and provides a systematic empirical review relevant to x-risk arguments.
Metadata
Summary
Rose Hadshar reviews empirical evidence for existential risk from AI, focusing on misalignment and power-seeking behaviors. She finds evidence of misaligned goals (via specification gaming and goal misgeneralization) but no clear examples of power-seeking AI, concluding the evidence is concerning but inconclusive. The uncertainty itself is treated as worrying given the potential severity of the risks.
Key Points
- Empirical evidence exists for AI misalignment via specification gaming and goal misgeneralization, including in deployment settings.
- Conceptual arguments for power-seeking behavior are considered strong, but no clear empirical examples of power-seeking AI were found.
- The review concludes that uncertainty cuts both ways: it is hard to be confident either that misaligned power-seeking poses a large existential risk or that it poses none.
- Part of a larger AI Impacts project that also includes expert interviews, a mapping of claims about AI risk, and a database of empirical evidence on misalignment and power-seeking.
- The author calls for more reviews of evidence, including evidence for other claims about AI risks, to reduce uncertainty.
Cached Content Preview
# [AI Impacts blog](https://blog.aiimpacts.org/)
# New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
[Harlan Stewart](https://substack.com/profile/132526043-harlan-stewart)
Nov 06, 2023
Visiting researcher Rose Hadshar recently published [a review of some evidence for existential risk from AI, focused on empirical evidence for misalignment and power seeking](https://arxiv.org/pdf/2310.18244.pdf). (Previously from this project: a blogpost outlining some of the [key claims that are often made about AI risk](https://blog.aiimpacts.org/p/a-mapping-of-claims-about-ai-risk), a series of [interviews](https://wiki.aiimpacts.org/arguments_for_ai_risk/is_ai_an_existential_threat_to_humanity/interviews_on_the_strength_of_the_evidence_for_ai_risk_claims) of AI researchers, and a [database](https://wiki.aiimpacts.org/arguments_for_ai_risk/is_ai_an_existential_threat_to_humanity/database_of_empirical_evidence_about_ai_risk) of empirical evidence for misalignment and power seeking.)
In this report, Rose looks into evidence for:
- Misalignment,[1](https://blog.aiimpacts.org/p/new-report-a-review-of-the-empirical#footnote-1-138652358) where AI systems develop goals which are misaligned with human goals; and
- Power-seeking,[2](https://blog.aiimpacts.org/p/new-report-a-review-of-the-empirical#footnote-2-138652358) where misaligned AI systems seek power to achieve their goals.
Rose found the current state of this evidence for existential risk from misaligned power-seeking to be concerning but inconclusive:
- There is empirical evidence of AI systems developing misaligned goals (via specification gaming[3](https://blog.aiimpacts.org/p/new-report-a-review-of-the-empirical#footnote-3-138652358) and via goal misgeneralization[4](https://blog.aiimpacts.org/p/new-report-a-review-of-the-empirical#footnote-4-138652358)), including in deployment (via specification gaming), but it's not clear to Rose whether these problems will scale far enough to pose an existential risk.
- Rose considers the conceptual arguments for power-seeking behavior from AI systems to be strong, but notes that she could not find any clear examples of power-seeking AI so far.
In light of these considerations, Rose thinks that it's hard to be very confident either that misaligned power-seeking poses a large existential risk, or that it poses no existential risk. She finds this uncertainty concerning, given the severity of the potential risks in question. Rose also expressed that it would be good to have more reviews of evidence, including evidence for other claims about AI risks.[5](https://blog.aiimpacts.org/p/new-report-a-review-of-the-empirical#footnote-5-138652358)
... (truncated, 8 KB total)