Longterm Wiki

Shortcut Learning in Deep Neural Networks

paper

Authors

Robert Geirhos·Jörn-Henrik Jacobsen·Claudio Michaelis·Richard Zemel·Wieland Brendel·Matthias Bethge·Felix A. Wichmann

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper identifies shortcut learning as a fundamental problem in deep neural networks: models exploit spurious correlations instead of learning robust features. This bears directly on AI safety concerns about model reliability and robustness.

Paper Details

Citations: 2,668 (131 influential)
Year: 2020
Methodology: peer-reviewed
Categories: Nature Machine Intelligence

Metadata

arXiv preprint · primary source

Abstract

Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distill how many of deep learning's problems can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

Summary

This perspective paper identifies shortcut learning as a unifying explanation for many limitations in deep neural networks. Shortcuts are decision rules that achieve high performance on standard benchmarks but fail to generalize to real-world conditions or more challenging test scenarios. The authors argue that shortcut learning is a common characteristic across biological and artificial learning systems, drawing parallels from comparative psychology, education, and linguistics. The paper proposes recommendations for model interpretation, benchmarking practices, and robustness improvements to enhance transferability from laboratory settings to practical applications.
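The failure mode the paper describes can be made concrete with a toy experiment (a hypothetical illustration, not from the paper; it assumes NumPy and scikit-learn are available): a classifier is trained on data containing a weak "core" feature plus a spurious "shortcut" feature that is perfectly predictive during training but reversed at test time. The model latches onto the shortcut, so benchmark accuracy is high while accuracy under the shifted test condition collapses.

```python
# Toy sketch of shortcut learning: a linear classifier exploits a
# spurious feature that is perfectly correlated with the label in
# training but anti-correlated at test time.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

def make_data(n, shortcut_agrees):
    y = rng.integers(0, 2, n)
    # "Core" feature: genuine but noisy signal.
    core = y + rng.normal(0, 2.0, n)
    # "Shortcut" feature: equals the label in training,
    # flipped under the harder test condition.
    shortcut = y if shortcut_agrees else 1 - y
    X = np.column_stack([core, shortcut.astype(float)])
    return X, y

X_train, y_train = make_data(n, shortcut_agrees=True)
X_test, y_test = make_data(n, shortcut_agrees=False)

clf = LogisticRegression().fit(X_train, y_train)
print("train acc:", clf.score(X_train, y_train))  # near 1.0: shortcut works
print("test acc:", clf.score(X_test, y_test))     # far below chance: shortcut fails
```

Because the shortcut alone separates the training data perfectly, the learned decision rule depends almost entirely on it; flipping that correlation at test time drives accuracy well below chance, mirroring the benchmark-to-real-world gap the paper describes.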

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Long-Timelines Technical Worldview | Concept | 91.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Shortcut Learning in Deep Neural Networks

Robert Geirhos (1,2,∗,§) · Jörn-Henrik Jacobsen (3,∗) · Claudio Michaelis (1,2,∗) · Richard Zemel (†,3) · Wieland Brendel (†,1) · Matthias Bethge (†,1) · Felix A. Wichmann (†)

###### Abstract

Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today’s machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distill how many of deep learning’s problems can be seen as different symptoms of the same underlying problem: _shortcut learning_. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

## 1 Introduction

If science was a journey, then its destination would be the discovery of simple explanations to complex phenomena. There was a time when the existence of tides, the planet’s orbit around the sun, and the observation that “things fall down” were all largely considered to be independent phenomena—until 1687, when Isaac Newton formulated his law of gravitation that provided an elegantly simple explanation to all of these (and many more). Physics has made tremendous progress over the last few centuries, but the thriving field of deep learning is still very much at the beginning of its journey—often lacking a detailed understanding of the underlying principles.††This is the preprint version of an article that has been published by Nature Machine Intelligence ( [https://doi.org/10.1038/s42256-020-00257-z](https://doi.org/10.1038/s42256-020-00257-z "")).

For some time, the tremendous success of deep learning has perhaps overshadowed the need to thoroughly understand the behaviour of Deep Neural Networks (DNNs). In an ever-increasing pace, DNNs were reported as having achieved human-level object classification performance \[ [1](https://ar5iv.labs.arxiv.org/html/2004.07780#bib.bib1 "")\], beating world-class human Go, Poker, and Starcraft players \[ [2](https://ar5iv.labs.arxiv.org/html/2004.07780#bib.bib2 ""), [3](https://ar5iv.labs.arxiv.org/html/2004.07780#bib.bib3 "")\], detecting cancer from X-ray scans \[ [4](https://ar5iv.labs.arxiv.org/html/2004.07780#bib.bib4 "")\], translating text across languages \[ [5](https://ar5iv.labs.arxiv.org/html/2004.07780#bib.bib5 "")\], helping combat climate change \[ [6](https://ar5iv.labs.arxiv.org/html/2004.07780#bib.bi

... (truncated, 98 KB total)
Resource ID: de2f3e11b7093ba6 | Stable ID: MjU3ZDBhZD