Beth Barnes - Personal Homepage
barnes.page/
Beth Barnes is a key figure in AI safety evaluations, particularly around dangerous capabilities and autonomous replication; her homepage serves as a hub for her research output and professional work, relevant to those following frontier model safety assessments.
Metadata
Importance: 55/100 · homepage
Summary
Personal homepage of Beth Barnes, an AI safety researcher known for work on evaluations, dangerous capabilities assessments, and autonomous replication risks in AI systems. The page likely links to her research, projects, and professional background in the AI safety field.
Key Points
- Beth Barnes is a prominent AI safety researcher focusing on evaluations and dangerous capabilities
- Her work includes developing frameworks for assessing autonomous replication and adaptation risks in AI
- She has been involved with organizations such as ARC Evals (since renamed METR) focused on model evaluations
- Her research informs how frontier AI labs and policymakers assess dangerous capability thresholds
- Evaluations work contributes to responsible scaling policies and deployment decisions
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| METR | Organization | 66.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 3 KB
# Elizabeth (Beth) Barnes

Founder & CEO at [METR](https://metr.org/), building evaluations so we know if we're getting close to very risky AI. Formerly at DeepMind and OpenAI.
Some research highlights:
- [Measuring AI ability to complete long tasks](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) \- We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has
been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend
predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
- [GPT-5 autonomy evaluation report](https://metr.github.io/autonomy-evals-guide/gpt-5-report/#fnref:1) \- We evaluate whether GPT-5 poses significant catastrophic risks via AI self-improvement,
rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However,
capability trends continue rapidly, and models display increasing eval awareness.
- [Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) \- We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity
of experienced open-source developers working on their own repositories. Surprisingly, we find that when
developers use AI tools, they take 19% longer than without: AI makes them slower. We view this result as a
snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve,
we plan to continue using this methodology to help estimate AI acceleration from AI R&D automation.
- [Resources for Autonomy Evaluations](https://metr.github.io/autonomy-evals-guide/) \- task suite, evaluation protocol, estimates of the "elicitation gap"
- [Evaluating LLM Agents on Realistic Autonomous Tasks](https://arxiv.org/abs/2312.11671)
- [Evaluating LLMs trained on code](https://arxiv.org/abs/2107.03374) (alignment section)
- [Obfuscated arguments problem](https://www.alignmentforum.org/posts/PJLABqQ962hZEqhdB/debate-update-obfuscated-arguments-problem) \- a problem with recursive-decomposition-based alignment approaches
- ["Imitative generalisation"](https://www.alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1) \- explainer for Paul Christiano's 'Learning the Prior'
- [Risks from AI persuasion](https://www.alignmentforum.org/posts/5cWtwATHL6KyzChck/risks-from-ai-persuasion) \- thoughts on the likelihood and consequences of superhuman persuasion before AGI
- [Reflection mechanisms as an alignment target](https://www.alignmentforum.org/posts/XyBWkoaqfnuEyNWXi/reflection-mechanisms-as-an-alignment-target-a-survey-1) \- work done by my AI safety c
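The exponential extrapolation in the first highlight above can be sketched in a few lines. This is an illustrative model only: the ~7-month doubling time comes from the cited blog post, but the 1-hour starting horizon and the `extrapolate_horizon` helper are assumptions for the example, not METR's actual data or code.

```python
# Illustrative sketch of a doubling-time extrapolation, assuming
# exponential growth in the "length of tasks AI agents can complete".
# The ~7-month doubling time is quoted in the post above; the 1-hour
# starting horizon is a made-up placeholder.

def extrapolate_horizon(current_hours: float, months_ahead: float,
                        doubling_months: float = 7.0) -> float:
    """Project the task-length horizon forward under exponential growth."""
    return current_hours * 2 ** (months_ahead / doubling_months)

# Under these assumptions, 5 years (60 months) out the horizon grows by
# 2**(60/7), a factor of several hundred: a 1-hour horizon becomes
# hundreds of hours, i.e. tasks taking humans days or weeks.
print(extrapolate_horizon(1.0, 60))
```

This is just compound growth applied to a capability metric; the interesting empirical claim in the post is that the trend has held consistently for ~6 years, not the arithmetic itself.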
... (truncated, 3 KB total)