WILDS: A Benchmark of in-the-Wild Distribution Shifts
wilds.stanford.edu
WILDS is a widely used benchmark in ML robustness research, relevant to AI safety discussions about whether models maintain reliable performance when deployed in conditions differing from their training distribution.
Metadata
Importance: 62/100 · tool page · dataset
Summary
WILDS is a benchmark suite from Stanford designed to evaluate machine learning model robustness to real-world distribution shifts, covering scenarios where training and test data differ due to geographic, temporal, or demographic factors. It provides curated datasets across diverse domains (medical imaging, wildlife detection, text classification, etc.) to measure model generalization under distribution shift. The benchmark aims to close the gap between academic ML performance metrics and real-world deployment reliability.
Key Points
- Provides 10+ datasets spanning diverse domains where distribution shift is a practical, documented real-world problem rather than a synthetic one.
- Distinguishes between subpopulation shifts (same domain, different groups) and domain shifts (different environments/conditions).
- Enables standardized evaluation of how well models generalize to out-of-distribution data, critical for safe deployment.
- Includes unlabeled data splits to support and benchmark unsupervised and semi-supervised domain adaptation methods.
- Directly relevant to AI safety concerns about model reliability and performance degradation in deployment settings.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Distributional Shift | Risk | 91.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 2 KB
A benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping. **The v2.0 update adds unlabeled data to 8 datasets.** The labeled data and evaluation metrics are exactly the same, so all previous results are directly comparable. Read our [release notes](https://github.com/p-lambda/wilds/releases) to find out more!

[WILDS paper](https://arxiv.org/abs/2012.07421) · [Unlabeled data paper (v2)](https://arxiv.org/abs/2112.05090) · [GitHub](https://github.com/p-lambda/wilds)

- [**Get Started**](https://wilds.stanford.edu/get_started/): Learn how to install and use our Python package, which provides a simple and standardized interface for all WILDS datasets.
- [**Datasets**](https://wilds.stanford.edu/datasets/): WILDS consists of 10 datasets across a diverse range of data modalities, applications, and types of distribution shifts.
- [**Leaderboard**](https://wilds.stanford.edu/leaderboard/): We track the state-of-the-art on each dataset. View and submit your results here.
- [**Updates**](https://wilds.stanford.edu/updates/): WILDS is under active development.
- [**Team**](https://wilds.stanford.edu/team/): Contact us if you have any questions, feedback, or suggestions for WILDS, or if you are interested in contributing!
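The "standardized interface" mentioned above can be sketched as follows. This is a minimal sketch based on the interface the WILDS package documents (`get_dataset`, `get_subset`, `get_train_loader`); the dataset choice (`iwildcam`), image size, and batch size here are illustrative, not prescribed, and running it downloads the dataset.

```python
# Sketch of the standard WILDS loading loop. Assumes the `wilds` and
# `torchvision` packages are installed; "iwildcam" and batch_size=16
# are illustrative choices, not defaults.
import torchvision.transforms as transforms
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader

# Download and load the full labeled dataset
dataset = get_dataset(dataset="iwildcam", download=True)

# Get the training split; WILDS also exposes out-of-distribution
# "val" and "test" splits for measuring generalization under shift
train_data = dataset.get_subset(
    "train",
    transform=transforms.Compose(
        [transforms.Resize((448, 448)), transforms.ToTensor()]
    ),
)

# Standard (i.i.d.) data loader over the training split
train_loader = get_train_loader("standard", train_data, batch_size=16)

for x, y_true, metadata in train_loader:
    # x: batch of images, y_true: labels,
    # metadata: per-example domain annotations (e.g. camera trap ID)
    pass
```

The `metadata` tensor is what distinguishes WILDS from a plain image-classification loader: it carries the domain annotations (location, time, demographic group) that define the subpopulation and domain shifts the benchmark evaluates.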
Resource ID:
f7c48e789ade0eeb | Stable ID: Zjc5ZTIxNz