Longterm Wiki

Center for AI Safety (CAIS) Research Publications

web

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Center for AI Safety

This is the research portal for CAIS, a prominent AI safety organization known for influential work including the AI safety statement signed by leading researchers; useful as an index of ongoing safety-focused empirical and conceptual research.

Metadata

Importance: 72/100 · homepage

Summary

The Center for AI Safety (CAIS) publishes both technical and conceptual research aimed at mitigating high-consequence, societal-scale risks from AI. Their technical work focuses on safety benchmarks, robustness, machine ethics, and biosecurity, while their conceptual research draws on philosophy, safety engineering, and international relations to understand AI risk.

Key Points

  • CAIS explicitly avoids research that improves safety only as a side effect of improving general capabilities, focusing on differential safety improvements.
  • Technical research includes benchmarks like MASK (honesty), MoReBench (moral reasoning), VCT (biosecurity), and Remote Labor Index (automation).
  • Conceptual research incorporates multidisciplinary perspectives including safety engineering, complex systems, international relations, and philosophy.
  • Research spans multiple risk domains: robustness, machine ethics, biosecurity, and capability evaluation.
  • Key researchers include Dan Hendrycks, Mantas Mazeika, and collaborators from major academic and industry institutions.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Center for AI Safety | Organization | 42.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 25 KB
[CAIS 2024 Impact Report](https://safe.ai/work/impact-report/2024)

#### Categories

[Guiding Principles](https://safe.ai/work/research#guiding-principles) [Technical AI Research](https://safe.ai/work/research#technical-ml-research) [Conceptual Research](https://safe.ai/work/research#conceptual-research)

## Guiding Principles

#### At the Center for AI Safety, our research focuses on mitigating high-consequence, societal-scale risks posed by AI.

We seek to develop foundational benchmarks and methods. To ensure that our work differentially improves the safety of AI systems, we do not pursue research which improves safety as a result of improving a model’s underlying general capabilities. Through our work, we strive to solve the technical challenge at the heart of AI safety.

In addition to our technical research, we also pursue conceptual research, examining AI safety from a multidisciplinary perspective and incorporating insights from safety engineering, complex systems, international relations, philosophy, and so on. Through our conceptual research, we create frameworks that aid in understanding the current technical challenges and publish papers which provide insight into the societal risks posed by future AI systems.

## Technical AI Research

Research which improves the safety of existing AI systems.


#### Remote Labor Index: Measuring AI Automation of Remote Work

Capability Benchmark

Mantas Mazeika\*, Alice Gatti\*, Cristina Menghini\*, Udari Madhushani Sehwag\*, Shivam Singhal\*, Yury Orlovskiy\*, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Sumana Basu, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola, John Teichmann, Kevin Yu, Zaina Shaik, Adam Khoja, Richard Ren, Jason Hausenloy, Long Phan, Ye Hyet, Ankit Aich, Tahseen Rabbani, Vivswan Shah, Andriy Novykov, Felix Binder, Kirill Chugunov, Luis Ramirez, Matias Geralnik, Hernán Mesura, Dean Lee, Ed-Yeremai Hernandez Cardona, Annette Diamond, Summer Yue\*\*, Alexandr Wang\*\*, Bing Liu\*\*, Ernesto Hernandez\*\*, Dan Hendrycks\*\*

[View Research](https://www.remotelabor.ai/)


#### MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Machine Ethics

Yu Ying Chiu\*, Michael S. Lee\*, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Knight, Harry Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell Gordon, Sydney Levine

[View Research](https://www.arxiv.org/abs/2510.16380)


... (truncated, 25 KB total)
Resource ID: 51721cfcac0c036a | Stable ID: ZWQxMzA4NT