AXRP Episode 34 - AI Evaluations with Beth Barnes
A detailed podcast interview with the co-founder of METR (Model Evaluation and Threat Research), a leading organization developing AI capability evaluations used in responsible scaling policies at major labs.
Metadata
Importance: 72/100 | podcast episode | primary source
Summary
Beth Barnes, co-founder and head of research at METR, discusses how AI capability evaluations work, their limitations, and how they fit into safety policy. The conversation covers threat modeling, capability buffers, alignment evaluations, and how METR's work informs responsible scaling policies at AI labs.
Key Points
- METR's mission is to prevent the world from being surprised by dangerous AI capabilities, through threat modeling, eval creation, and policy recommendations.
- Current evaluations focus on dangerous capabilities, but future work may shift toward control and alignment evaluations as models advance.
- A key challenge is whether models are actually showing their full capabilities during evals, which determines how large a safety margin is needed.
- METR works with labs and governments to shape responsible scaling policies, specifying what mitigations are required at different capability thresholds.
- Evaluating alignment poses methodological challenges distinct from those of capability evaluations, requiring different frameworks and assumptions.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| METR | Organization | 66.0 |
Cached Content Preview
HTTP 200 | Fetched Mar 20, 2026 | 98 KB
[YouTube link](https://youtu.be/TZNlKcDI4To)
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barnes, founder of and head of research at METR, about these questions and more.
Topics we discuss:
- [What is METR?](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#what-is-metr)
- [What is an “eval”?](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#what-is-an-eval)
- [How good are evals?](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#how-good-are-evals)
- [Are models showing their full capabilities?](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#are-models-showing-their-full-capabilities)
- [Evaluating alignment](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#evaluating-alignment)
- [Existential safety methodology](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#existential-safety-methodology)
- [Threat models and capability buffers](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#threat-models-capability-buffers)
- [METR’s policy work](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#metrs-policy-work)
- [METR’s relationship with labs](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#metrs-relationship-with-labs)
- [Related research](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#related-research)
- [Roles at METR, and following METR’s work](https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html#roles-following)
**Daniel Filan:**
Hello everybody. In this episode I’ll be speaking with Beth Barnes. Beth is the co-founder and head of research at METR. Previously, she was at OpenAI and DeepMind, doing a diverse set of things, including testing AI safety by debate and evaluating cutting-edge machine learning models. In the description, there are links to research and writings that we discussed during the episode. And if you’re interested in a transcript, it’s available at axrp.net. Well, welcome to AXRP.
**Beth Barnes:**
Hey, great to be here.
## What is METR?
**Daniel Filan:**
Cool. So, in the introduction, I mentioned that you worked for Model Evaluation and Threat Research, or METR. What is METR?
**Beth Barnes:**
Yeah, so basically, the basic mission is: have the world not be taken by surprise by dangerous AI stuff happening. So, we do threat modeling and eval creation, currently mostly around capabilities evaluation, but we’re interested in whatever evaluation it is that is most load-bearing for why we think AI systems are safe. With current models, that’s capabilities evaluations; in future that might be more like control or alignment evaluations. And yeah, [the aim is
... (truncated, 98 KB total)
Resource ID: 762cb9e7f2b6b886 | Stable ID: YzAzZDdmNz