
Casper, S., et al. (2024). "Black-Box Access is Insufficient for Rigorous AI Audits."

paper

Authors

Stephen Casper·Carson Ezell·Charlotte Siegmann·Noam Kolt·Taylor Lynn Curtis·Benjamin Bucknall·Andreas Haupt·Kevin Wei·Jérémy Scheurer·Marius Hobbhahn·Lee Sharkey·Satyapriya Krishna·Marvin Von Hagen·Silas Alberti·Alan Chan·Qinyi Sun·Michael Gerovitch·David Bau·Max Tegmark·David Krueger·Dylan Hadfield-Menell

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Research paper examining limitations of black-box AI auditing and demonstrating why white-box access is necessary for rigorous safety evaluations, directly addressing AI governance and accountability mechanisms.

Paper Details

Citations
44
Year
2024
Methodology
peer-reviewed
Categories
The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)

Metadata

arXiv preprint · primary source

Abstract

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.
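To ground the "stronger attacks" claim, here is a minimal PyTorch sketch (illustrative only, not code from the paper; the `fgsm_attack` helper and the epsilon value are assumptions) contrasting a one-step gradient attack, which requires white-box access to gradients, with the query-only interface a black-box auditor has:

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """One-step gradient attack (FGSM): needs white-box access to gradients."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Nudge the input in the direction that most increases the loss.
    return (x + epsilon * x.grad.sign()).detach()

def black_box_probe(query_fn, x: torch.Tensor) -> torch.Tensor:
    """All a black-box auditor can do: call the system and read its outputs."""
    return query_fn(x)
```

A black-box auditor would have to approximate such a perturbation from queries alone (e.g., via many finite-difference evaluations), which is far less sample-efficient than reading the gradient directly.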

Summary

This paper argues that black-box access to AI systems—where auditors can only query and observe outputs—is insufficient for rigorous AI audits. The authors demonstrate that white-box access (to model weights, activations, and gradients) and outside-the-box access (to training data, code, documentation, and deployment details) enable substantially stronger evaluations, including more effective attacks, better model interpretation, and targeted fine-tuning. The paper discusses safeguards for conducting these deeper audits while managing security risks, and concludes that audit transparency and access levels are critical for properly interpreting results.
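The model-interpretation point can also be made concrete with activation access. Below is a hedged sketch (the toy model and hook name are hypothetical, not from the paper) of how white-box access lets an auditor read hidden activations that a black-box auditor never sees:

```python
import torch
import torch.nn as nn

# Toy stand-in for an audited model (hypothetical; any PyTorch module works).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# White-box access: attach a forward hook to capture hidden activations.
model[1].register_forward_hook(save_activation("relu_1"))

x = torch.randn(4, 16)
logits = model(x)               # a black-box audit sees only these outputs
hidden = activations["relu_1"]  # a white-box audit can also inspect these
print(logits.shape, hidden.shape)  # torch.Size([4, 2]) torch.Size([4, 32])
```

Interpretability audits (probing, attribution, circuit analysis) build on exactly this kind of internal access, which is why the paper treats it as a prerequisite for deeper scrutiny.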

Cited by 1 page

| Page | Type | Quality |
|------|------|---------|
| Corrigibility | Research Area | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Black-Box Access is Insufficient for Rigorous AI Audits

Stephen Casper (MIT CSAIL, scasper@mit.edu), Carson Ezell (Harvard University, cezell@college.harvard.edu), Charlotte Siegmann (MIT), Noam Kolt (University of Toronto), Taylor Lynn Curtis (MIT CSAIL), Benjamin Bucknall (Centre for the Governance of AI), Andreas Haupt (MIT), Kevin Wei (Harvard Law School), Jérémy Scheurer (Apollo Research), Marius Hobbhahn (Apollo Research), Lee Sharkey (Apollo Research), Satyapriya Krishna (Harvard University), Marvin Von Hagen (MIT), Silas Alberti (Stanford University), Alan Chan (Mila - Quebec AI Institute; Centre for the Governance of AI), Qinyi Sun (MIT), Michael Gerovitch (MIT), David Bau (Northeastern University), Max Tegmark (MIT), David Krueger (University of Cambridge), and Dylan Hadfield-Menell (MIT CSAIL)

(2024)

###### Abstract.

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of system access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on _black-box_ access, in which auditors can only query the system and observe its outputs. However, white-box access to the system’s inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, _outside-the-box_ access to its training and deployment information (e.g., methodology, code, documentation, hyperparameters, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

Auditing, Evaluation, Governance, Regulation, Policy, Risk, Fairness, Black-Box Access, White-Box Access, Adversarial Attacks, Interpretability, Explainability, Fine-Tuning


## 1. Introduction

External evaluations of AI systems are emerging as a key component of AI oversight (Brown et al., [2021](https://ar5iv.labs.arxiv.org/html/2401.14446#bib.bib1 ""); Watkins et al., [2021](https://ar5iv.labs.arxiv.org/html/2401.14446#bib.bib2 ""); Metcalf et al., [2021](https://ar5iv.labs.arxiv.org/html/2401

... (truncated, 98 KB total)
Resource ID: e8d4a1a628967548 | Stable ID: YWY1MDRlYz