
Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: TIME

Accessible 2024 TIME piece providing a journalistic overview of METR and the state of AI safety evaluations; useful for understanding the institutional landscape and limitations of current dangerous-capability assessments.

Metadata

Importance: 62/100 · Type: news article · Tags: news

Summary

A TIME article profiling METR (Model Evaluation and Threat Research) and the broader challenge of AI safety evaluations. It examines how researchers attempt to probe frontier AI systems for dangerous capabilities, highlighting that current evaluation methods are immature and the field lacks consensus on how to rigorously assess AI risks.

Key Points

  • METR, led by Beth Barnes, is one of the few organizations systematically trying to evaluate whether AI models could cause catastrophic harm if prompted correctly.
  • OpenAI, Anthropic, and the UK government have partnered with METR for safety testing, but methods remain underdeveloped and unvalidated.
  • Researchers describe frontier LLMs as 'vast alien intelligences' whose true capabilities are poorly understood, making safety evaluation deeply uncertain.
  • The field of 'model psychology'—understanding how to elicit dangerous behaviors from AI—is nascent and relies heavily on adversarial human probing.
  • The article underscores a systemic gap: AI systems are being deployed faster than reliable safety-testing methodologies can be developed.

Cited by 2 pages

| Page | Type | Quality |
|------|------|---------|
| METR | Organization | 66.0 |
| Emergent Capabilities | Risk | 61.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 18 KB

# Nobody Knows How to Safety-Test AI


by [Will Henshall](https://time.com/author/will-henshall/)

Mar 21, 2024 1:58 PM ET

Illustration by TIME


Beth Barnes and three of her colleagues sit cross-legged in a semicircle on a damp lawn on the campus of the University of California, Berkeley. They are describing their attempts to interrogate artificial intelligence chatbots.

“They are, in some sense, these vast alien intelligences,” says Barnes, 26, who is the founder and CEO of Model Evaluation and Threat Research (METR), an AI-safety nonprofit. “They know so much about whether the next word is going to be ‘is’ versus ‘was.’ We're just playing with a tiny bit on the surface, and there's all this, miles and miles underneath,” she says, gesturing at the potentially immense depths of large language models’ capabilities. (Large language models, such as OpenAI’s GPT-4 and Anthropic’s Claude, are giant AI systems that are trained by predicting the next word for a vast amount of text, and that can answer questions and carry out basic reasoning and planning.)

Researchers at METR look a lot like Berkeley students—the four on the lawn are in their twenties and dressed in jeans or sweatpants. But rather than attending lectures or pulling all-nighters in the library, they spend their time probing the latest and most powerful AI systems to try to determine whether, if you asked just right, they could do something dangerous. As they explain how they try to ascertain whether the current generation of chatbots or the next could cause a catastrophe, they pick at the grass. They may be young, but few people have thought about how to elicit danger from AIs as much as they have.

Two of the world’s most prominent AI companies—OpenAI and Anthropic—have worked with METR as part of their efforts to safety-test their AI models. The [U.K. government](https://www.gov.uk/government/publications/frontier-ai-taskforce-first-progress-report/frontier-ai-taskforce-first-progress-report#:~:text=of%20partnerships%20with%3A-,ARC%20Evals,-is%20a%20non) partnered with METR as part of its efforts to start safety-testing AI systems, and President Barack Obama called METR out as a civil society orga

... (truncated, 18 KB total)
Resource ID: 4f4d29912b960092 | Stable ID: NjlhYWE2YT