
Casper, S., et al. (2024). "Black-Box Access is Insufficient for Rigorous AI Audits."

paper

Authors

Stephen Casper·Carson Ezell·Charlotte Siegmann·Noam Kolt·Taylor Lynn Curtis·Benjamin Bucknall·Andreas Haupt·Kevin Wei·Jérémy Scheurer·Marius Hobbhahn·Lee Sharkey·Satyapriya Krishna·Marvin Von Hagen·Silas Alberti·Alan Chan·Qinyi Sun·Michael Gerovitch·David Bau·Max Tegmark·David Krueger·Dylan Hadfield-Menell

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Research paper examining limitations of black-box AI auditing and demonstrating why white-box access is necessary for rigorous safety evaluations, directly addressing AI governance and accountability mechanisms.

Paper Details

Citations
44
Year
2024
Methodology
peer-reviewed
Categories
The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)

Metadata

arXiv preprint · primary source

Abstract

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.
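To ground the "stronger attacks" claim, here is a minimal PyTorch sketch (illustrative only, not code from the paper; the `fgsm_attack` helper and the epsilon value are assumptions) contrasting a one-step gradient attack, which requires white-box access to gradients, with the query-only interface a black-box auditor has:

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """One-step gradient attack (FGSM): needs white-box access to gradients."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Nudge the input in the direction that most increases the loss.
    return (x + epsilon * x.grad.sign()).detach()

def black_box_probe(query_fn, x: torch.Tensor) -> torch.Tensor:
    """All a black-box auditor can do: call the system and read its outputs."""
    return query_fn(x)
```

A black-box auditor would have to approximate such a perturbation from queries alone (e.g., via many finite-difference evaluations), which is far less sample-efficient than reading the gradient directly.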

Summary

This paper argues that black-box access to AI systems—where auditors can only query and observe outputs—is insufficient for rigorous AI audits. The authors demonstrate that white-box access (to model weights, activations, and gradients) and outside-the-box access (to training data, code, documentation, and deployment details) enable substantially stronger evaluations, including more effective attacks, better model interpretation, and targeted fine-tuning. The paper discusses safeguards for conducting these deeper audits while managing security risks, and concludes that audit transparency and access levels are critical for properly interpreting results.
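The model-interpretation point can also be made concrete with activation access. Below is a hedged sketch (the toy model and hook name are hypothetical, not from the paper) of how white-box access lets an auditor read hidden activations that a black-box auditor never sees:

```python
import torch
import torch.nn as nn

# Toy stand-in for an audited model (hypothetical; any PyTorch module works).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# White-box access: attach a forward hook to capture hidden activations.
model[1].register_forward_hook(save_activation("relu_1"))

x = torch.randn(4, 16)
logits = model(x)               # a black-box audit sees only these outputs
hidden = activations["relu_1"]  # a white-box audit can also inspect these
print(logits.shape, hidden.shape)  # torch.Size([4, 2]) torch.Size([4, 32])
```

Interpretability audits (probing, attribution, circuit analysis) build on exactly this kind of internal access, which is why the paper treats it as a prerequisite for deeper scrutiny.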

Cited by 1 page

| Page | Type | Quality |
|------|------|---------|
| Corrigibility | Research Area | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Black-Box Access is Insufficient for Rigorous AI Audits

Stephen Casper (MIT CSAIL, scasper@mit.edu), Carson Ezell (Harvard University, cezell@college.harvard.edu), Charlotte Siegmann (MIT), Noam Kolt (University of Toronto), Taylor Lynn Curtis (MIT CSAIL), Benjamin Bucknall (Centre for the Governance of AI), Andreas Haupt (MIT), Kevin Wei (Harvard Law School), Jérémy Scheurer (Apollo Research), Marius Hobbhahn (Apollo Research), Lee Sharkey (Apollo Research), Satyapriya Krishna (Harvard University), Marvin Von Hagen (MIT), Silas Alberti (Stanford University), Alan Chan (Mila - Quebec AI Institute; Centre for the Governance of AI), Qinyi Sun (MIT), Michael Gerovitch (MIT), David Bau (Northeastern University), Max Tegmark (MIT), David Krueger (University of Cambridge), and Dylan Hadfield-Menell (MIT CSAIL)

(2024)

###### Abstract.

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of system access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on _black-box_ access, in which auditors can only query the system and observe its outputs. However, white-box access to the system’s inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, _outside-the-box_ access to its training and deployment information (e.g., methodology, code, documentation, hyperparameters, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

Auditing, Evaluation, Governance, Regulation, Policy, Risk, Fairness, Black-Box Access, White-Box Access, Adversarial Attacks, Interpretability, Explainability, Fine-Tuning


## 1. Introduction

External evaluations of AI systems are emerging as a key component of AI oversight (Brown et al., [2021](https://ar5iv.labs.arxiv.org/html/2401.14446#bib.bib1 ""); Watkins et al., [2021](https://ar5iv.labs.arxiv.org/html/2401.14446#bib.bib2 ""); Metcalf et al., [2021](https://ar5iv.labs.arxiv.org/html/2401

... (truncated, 98 KB total)
Resource ID: e8d4a1a628967548 | Stable ID: YWY1MDRlYz