Longterm Wiki

Center for AI Safety (CAIS) Research Publications

web

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Center for AI Safety

This is the research portal for CAIS, a prominent AI safety organization known for influential work including the AI safety statement signed by leading researchers; useful as an index of ongoing safety-focused empirical and conceptual research.

Metadata

Importance: 72/100 · homepage

Summary

The Center for AI Safety (CAIS) publishes both technical and conceptual research aimed at mitigating high-consequence, societal-scale risks from AI. Their technical work focuses on safety benchmarks, robustness, machine ethics, and biosecurity, while their conceptual research draws on philosophy, safety engineering, and international relations to understand AI risk.

Key Points

  • CAIS explicitly avoids research that improves safety only as a side effect of improving general capabilities, focusing on differential safety improvements.
  • Technical research includes benchmarks like MASK (honesty), MoReBench (moral reasoning), VCT (biosecurity), and Remote Labor Index (automation).
  • Conceptual research incorporates multidisciplinary perspectives including safety engineering, complex systems, international relations, and philosophy.
  • Research spans multiple risk domains: robustness, machine ethics, biosecurity, and capability evaluation.
  • Key researchers include Dan Hendrycks, Mantas Mazeika, and collaborators from major academic and industry institutions.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Center for AI Safety | Organization | 42.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 25 KB
[CAIS 2024 Impact Report](https://safe.ai/work/impact-report/2024)

#### Categories

[Guiding Principles](https://safe.ai/work/research#guiding-principles) [Technical AI Research](https://safe.ai/work/research#technical-ml-research) [Conceptual Research](https://safe.ai/work/research#conceptual-research)

## Guiding Principles

#### At the Center for AI Safety, our research focuses on mitigating high-consequence, societal-scale risks posed by AI.

We seek to develop foundational benchmarks and methods. To ensure that our work differentially improves the safety of AI systems, we do not pursue research which improves safety as a result of improving a model’s underlying general capabilities. Through our work, we strive to solve the technical challenge at the heart of AI safety.

In addition to our technical research, we also pursue conceptual research, examining AI safety from a multidisciplinary perspective and incorporating insights from safety engineering, complex systems, international relations, philosophy, and so on. Through our conceptual research, we create frameworks that aid in understanding the current technical challenges and publish papers which provide insight into the societal risks posed by future AI systems.

## Technical AI Research

Research which improves the safety of existing AI systems.


#### Remote Labor Index: Measuring AI Automation of Remote Work

Capability Benchmark

Mantas Mazeika\*, Alice Gatti\*, Cristina Menghini\*, Udari Madhushani Sehwag\*, Shivam Singhal\*, Yury Orlovskiy\*, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Sumana Basu, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola, John Teichmann, Kevin Yu, Zaina Shaik, Adam Khoja, Richard Ren, Jason Hausenloy, Long Phan, Ye Hyet, Ankit Aich, Tahseen Rabbani, Vivswan Shah, Andriy Novykov, Felix Binder, Kirill Chugunov, Luis Ramirez, Matias Geralnik, Hernán Mesura, Dean Lee, Ed-Yeremai Hernandez Cardona, Annette Diamond, Summer Yue\*\*, Alexandr Wang\*\*, Bing Liu\*\*, Ernesto Hernandez\*\*, Dan Hendrycks\*\*

[View Research](https://www.remotelabor.ai/)


#### MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Machine Ethics

Yu Ying Chiu\*, Michael S. Lee\*, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Knight, Harry Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell Gordon, Sydney Levine

[View Research](https://www.arxiv.org/abs/2510.16380)


... (truncated, 25 KB total)
Resource ID: 51721cfcac0c036a | Stable ID: ZWQxMzA4NT