Birhane et al. (2021)
Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, Michelle Bao
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Research paper examining the values encoded in machine learning research through an annotation scheme, directly addressing how ML research reflects and perpetuates specific values; critical for understanding AI safety implications and responsible AI development.
Paper Details
Metadata
Abstract
Machine learning currently exerts an outsized influence on the world, increasingly affecting institutional practices and impacted communities. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we first introduce a method and annotation scheme for studying the values encoded in documents such as research papers. Applying the scheme, we analyze 100 highly cited machine learning papers published at premier machine learning conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: their justification for their choice of project, which attributes of their project they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that few of the papers justify how their project connects to a societal need (15%) and far fewer discuss negative potential (1%). Through line-by-line content analysis, we identify 59 values that are uplifted in ML research, and, of these, we find that the papers most frequently justify and assess themselves based on Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty. We present extensive textual evidence and identify key themes in the definitions and operationalization of these values. Notably, we find systematic textual evidence that these top values are being defined and applied with assumptions and implications generally supporting the centralization of power. Finally, we find increasingly close ties between these highly cited papers and tech companies and elite universities.
Summary
Birhane et al. (2021) develop a method to systematically analyze the values encoded in machine learning research papers. By annotating 100 highly cited papers from the ICML and NeurIPS conferences, the authors identify 59 distinct values uplifted in ML research. They find that papers rarely justify a connection to societal needs (15%) or discuss potential harms (1%), instead prioritizing Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty. Critically, the analysis reveals that these dominant values are defined and applied in ways that systematically support the centralization of power, while institutional affiliations and funding increasingly concentrate among tech companies and elite universities.
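The study's core quantitative step is tallying, across annotated papers, how often each value is uplifted. A minimal sketch of such a tally in Python; the paper names follow the study's value scheme, but the annotation data here is purely illustrative, not from the study:

```python
from collections import Counter

# Hypothetical annotations: each paper maps to the set of values
# its text was coded as uplifting. Value names follow the paper's
# scheme; the assignments are invented for illustration.
annotations = {
    "paper_01": {"Performance", "Novelty", "Efficiency"},
    "paper_02": {"Performance", "Generalization"},
    "paper_03": {"Performance", "Quantitative evidence", "Novelty"},
    "paper_04": {"Generalization", "Building on past work"},
}

def value_prevalence(annotations):
    """Return the fraction of papers uplifting each value, most common first."""
    counts = Counter()
    for values in annotations.values():
        counts.update(values)
    n = len(annotations)
    return {value: count / n for value, count in counts.most_common()}

prevalence = value_prevalence(annotations)
# e.g. "Performance" appears in 3 of the 4 hypothetical papers -> 0.75
```

Reporting prevalence as a fraction of papers (rather than raw sentence counts) mirrors how the study states results such as "15% justify a societal need".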
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Safety Research Allocation Model | Analysis | 65.0 |
Cached Content Preview
# The Values Encoded in Machine Learning Research
Abeba Birhane
[abeba@mozillafoundation.org](mailto:abeba@mozillafoundation.org) [0000-0001-6319-7937](https://orcid.org/0000-0001-6319-7937 "ORCID identifier"), Mozilla Foundation & School of Computer Science, University College Dublin, Belfield, Dublin, Ireland; Pratyusha Kalluri
[pkalluri@stanford.edu](mailto:pkalluri@stanford.edu) [0000-0001-7202-8027](https://orcid.org/0000-0001-7202-8027 "ORCID identifier"), Computer Science Department, Stanford University, 353 Jane Stanford Way, Palo Alto, USA; Dallas Card
[dalc@umich.edu](mailto:dalc@umich.edu) [0000-0001-5573-8836](https://orcid.org/0000-0001-5573-8836 "ORCID identifier"), School of Information, University of Michigan, 105 S State St, Ann Arbor, USA; William Agnew
[wagnew3@cs.washington.edu](mailto:wagnew3@cs.washington.edu), Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, USA; Ravit Dotan
[ravit.dotan@berkeley.edu](mailto:ravit.dotan@berkeley.edu) [0000-0002-9646-8315](https://orcid.org/0000-0002-9646-8315 "ORCID identifier"), Center for Philosophy of Science, University of Pittsburgh, 4200 Fifth Ave, Pittsburgh, USA; and Michelle Bao
[baom@stanford.edu](mailto:baom@stanford.edu) [0000-0002-4410-0703](https://orcid.org/0000-0002-4410-0703 "ORCID identifier"), Computer Science Department, Stanford University, 353 Jane Stanford Way, Palo Alto, USA
(2022)
###### Abstract.
Machine learning currently exerts an outsized influence on the world, increasingly affecting institutional practices and impacted communities. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we first introduce a method and annotation scheme for studying the values encoded in documents such as research papers. Applying the scheme, we analyze 100 highly cited machine learning papers published at premier machine learning conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: their justification for their choice of project, which attributes of their project they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that few of the papers justify how their project connects to a societal need (15%) and far fewer discuss negative potential (1%). Through line-by-line content analysis, we identify 59 values that are uplifted in ML research, and, of these, we find that the papers most frequently justify and assess themselves based on Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty.
We present extensive textual evidence and identify key themes in the definitions and operationalization of these values. Notably, we find systematic textual evidence that these top values are being defined and applied with assumpt
... (truncated, 98 KB total)