ARC Evals - Giving What We Can
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Giving What We Can
ARC Evals has since rebranded as METR (Model Evaluation and Threat Research); this page remains useful for those researching donation opportunities or the organizational landscape of AI safety evaluations.
Metadata
Summary
This Giving What We Can page profiles ARC Evals (now METR), an organization focused on evaluating frontier AI models for dangerous capabilities and autonomous behaviors. The page provides donation and impact information for those considering funding AI safety evaluations work. ARC Evals develops rigorous benchmarks and tests to assess whether AI systems pose catastrophic risks before deployment.
Key Points
- ARC Evals (rebranded as METR) specializes in evaluating frontier AI models for potentially dangerous or autonomous capabilities
- The organization works with leading AI labs to conduct pre-deployment safety evaluations and red-teaming
- Giving What We Can features ARC Evals as a recommended or notable charity in the AI safety funding space
- Their evaluations help inform deployment decisions and safety commitments at major AI developers
- METR supports the broader ecosystem of independent, third-party AI safety auditing and evaluation
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Model Organisms of Misalignment | Analysis | 65.0 |
| Dustin Moskovitz | Person | 49.0 |
Cached Content Preview
METR (formerly called ARC Evals) is the [evaluations project](https://evals.alignment.org/) at the [Alignment Research Center](https://www.alignment.org/). Its work assesses whether cutting-edge AI systems could pose catastrophic risks to civilization.
[Website](https://evals.alignment.org/ "Website")
[Reducing global catastrophic risks](https://www.givingwhatwecan.org/cause-areas/reducing-global-catastrophic-risks)
## What problem is METR working on?
As AI systems become more powerful, it becomes increasingly important to ensure these systems are safe and aligned with our interests. A [growing number of experts](https://www.safe.ai/statement-on-ai-risk) are concerned that future AI systems pose an existential risk to humanity — and according to one study of machine learning researchers conducted by AI Impacts, the median respondent reported believing that there is a 5% chance of an “[extremely bad outcome (e.g. human extinction)](https://aiimpacts.org/how-bad-a-future-do-ml-researchers-expect/)”. One way to prepare for this is to be able to evaluate current systems and [receive warning signs if new risks emerge](https://www.deepmind.com/blog/an-early-warning-system-for-novel-ai-risks).
## What does METR do?
METR is contributing to the following AI governance approach:
1. Before a new large-scale system is released, assess whether it is capable of potentially catastrophic activities.
2. If so, require strong guarantees that the system _will not_ carry out such activities.
METR's current work focuses primarily on evaluating capabilities (the first step above), in particular a capability they call **autonomous replication** — the ability of an AI system to survive on a cloud server, obtain money and compute resources, and use those resources to make more copies of itself.
METR was given early access to [OpenAI’s](https://openai.com/) [GPT-4](https://openai.com/gpt-4) and [Anthropic’s](https://www.anthropic.com/) [Claude](https://www.anthropic.com/index/introducing-claude) to assess them for safety. They determined that these systems are capable of only “[fairly basic steps towards autonomous replication](https://evals.alignment.org/)” — but still, some of the steps they can take are already somewhat alarming. One [highly](https://gizmodo.com/gpt4-open-ai-chatbot-task-rabbit-chatgpt-1850227471) [publicised](https://www.foxbusiness.com/technology/openais-gpt-4-faked-being-blind-deceive-taskrabbit-human-helping-solve-captcha) example from METR’s assessment was that GPT-4 successfully pretended to be a vision-impaired human to convince a TaskRabbit worker to solve a CAPTCHA.
Suppose AI systems _could_ autonomously replicate: what would the risks be?
- They could become extremely powerful tools for malicious actors.
- They could replicate, accrue power and resources, and use these to further their own goals. Without guarantees these goals are aligned with our own, this could have catastrophic — potentially even existential — consequences fo
... (truncated, 8 KB total)