ObjectNet: A Large-Scale Bias-Controlled Dataset for Pushing the Limits of Object Recognition Models

web

ObjectNet is a key benchmark for AI safety researchers concerned with distribution shift and overestimated model capabilities; it demonstrates that high benchmark accuracy does not guarantee robust real-world performance.

Metadata

Importance: 62/100 | dataset

Summary

ObjectNet is a test-only benchmark dataset that evaluates object recognition models under realistic conditions by controlling for dataset biases. Images are collected with randomized backgrounds, object rotations, and imaging viewpoints, exposing a significant gap between performance on standard benchmarks and real-world generalization. Models show a 40-45% drop in performance on ObjectNet relative to other benchmarks, which suggests that they rely heavily on dataset biases rather than on robust object recognition.

Key Points

• Object recognition models show a 40-45% drop in performance on ObjectNet relative to other benchmarks, indicating heavy reliance on dataset-specific biases.
• ObjectNet controls for background, rotation, and viewpoint biases by randomizing them during crowdsourced image capture.
• The benchmark exposes a gap between benchmark performance and real-world generalization in computer vision models.
• Provides a methodology for bias-controlled evaluation that could be applied to domains beyond object recognition.
• Highlights that progress on standard benchmarks may overestimate the true generalization capabilities of ML models.
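
The accuracy gap described in the points above can be measured by scoring the same model on a standard benchmark and on a bias-controlled test set. The following is a minimal Python sketch of that comparison. The function names and the toy predictions are hypothetical and are not ObjectNet results. It reports the drop both in absolute percentage points and as a relative fraction, because a headline figure such as "40-45%" does not by itself say which of the two is meant.

```python
from typing import Sequence, Tuple


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples whose predicted class equals the true class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_drop(source_acc: float, target_acc: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute_points = (source_acc - target_acc) * 100.0
    relative = (source_acc - target_acc) / source_acc if source_acc > 0 else 0.0
    return absolute_points, relative


# Toy, made-up predictions used only to exercise the functions.
benchmark_acc = top1_accuracy([0, 1, 2, 3], [0, 1, 2, 0])  # 3 of 4 correct
shifted_acc = top1_accuracy([0, 1, 0, 0], [0, 2, 2, 3])    # 1 of 4 correct
points, relative = accuracy_drop(benchmark_acc, shifted_acc)
print(f"absolute drop: {points:.1f} points, relative drop: {relative:.0%}")
```

Reporting both numbers side by side avoids ambiguity when comparing results across papers.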

Cited by 1 page

Page	Type	Quality
AI Distributional Shift	Risk	91.0

Cached Content Preview

HTTP 200 | Fetched Apr 9, 2026 | 4 KB

ObjectNet

 Check out our latest work, "How hard are computer vision datasets? Calibrating dataset difficulty to viewing time".

 See the download page for instructions on how to get ObjectNet.

 
 The ObjectNet competition is now live! 

 
 What is ObjectNet?

 
 
 
 A new kind of vision dataset borrowing the idea of controls from
 other areas of science.
 

 
 No training set, only a test set! Put your vision system through
 its paces.
 

 
 Collected to intentionally show objects from new viewpoints on new
 backgrounds.
 

 
 50,000-image test set, the same size as the ImageNet test set, with
 controls for rotation, background, and viewpoint.
 

 313 object classes, 113 of which overlap with ImageNet.

 
 A large performance drop: what you can expect from vision systems in
 the real world!
 

 
 Robust to fine-tuning and a very difficult transfer learning
 problem
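
 Because only some ObjectNet classes overlap with ImageNet, scoring an ImageNet-trained classifier on ObjectNet means restricting both the images and the candidate predictions to the overlapping classes. The following is a minimal Python sketch of one way to do that. The class names, indices, scores, and the `toy_mapping` dictionary are made up for illustration, and the assumption that one ObjectNet class may correspond to several ImageNet indices is this sketch's, not a statement about the dataset's actual mapping files.

```python
from typing import Dict, List, Sequence


def predict_overlapping(
    scores: Sequence[float],
    objectnet_to_imagenet: Dict[str, List[int]],
) -> str:
    """Pick the ObjectNet class whose mapped ImageNet indices hold the top score.

    Scores for ImageNet classes with no ObjectNet counterpart are ignored.
    """
    if not objectnet_to_imagenet:
        raise ValueError("mapping is empty")
    best_label, best_score = "", float("-inf")
    for label, indices in objectnet_to_imagenet.items():
        class_score = max(scores[i] for i in indices)
        if class_score > best_score:
            best_label, best_score = label, class_score
    return best_label


def overlap_accuracy(
    all_scores: Sequence[Sequence[float]],
    labels: Sequence[str],
    mapping: Dict[str, List[int]],
) -> float:
    """Top-1 accuracy over images whose label is in the overlapping class set."""
    kept = [(s, y) for s, y in zip(all_scores, labels) if y in mapping]
    if not kept:
        raise ValueError("no images belong to an overlapping class")
    hits = sum(predict_overlapping(s, mapping) == y for s, y in kept)
    return hits / len(kept)


# Hypothetical mapping: ObjectNet class name -> ImageNet output indices.
toy_mapping = {"banana": [0], "chair": [1, 2]}
# Hypothetical per-image scores over a 4-way "ImageNet" output.
toy_scores = [
    [0.1, 0.2, 0.6, 0.9],    # index 3 is unmapped, so "chair" wins
    [0.7, 0.1, 0.1, 0.1],    # "banana" wins
    [0.5, 0.4, 0.05, 0.05],  # "banana" wins, but the label is "chair"
    [0.2, 0.2, 0.2, 0.4],    # label "teapot" is not overlapping; skipped
]
toy_labels = ["chair", "banana", "chair", "teapot"]
accuracy = overlap_accuracy(toy_scores, toy_labels, toy_mapping)
print(f"accuracy on overlapping classes: {accuracy:.3f}")
```

 Masking out the unmapped outputs before taking the maximum keeps the model from being penalized for predicting a class that cannot appear in the evaluated subset.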
 

 
 
 
 
 Controls for biases increase variation

 Easy for humans, hard for machines

 
 Ready to help develop the next generation of object recognition
 algorithms that have robustness, bias, and safety in mind.
 Controls can remove bias from datasets across machine learning,
 not just vision.
 

 
 
 
 
 

 
 ObjectNet is a large real-world test set for object recognition
 with controls, in which object backgrounds, rotations, and imaging
 viewpoints are random.
 

 
 Most scientific experiments have controls, confounds which are
 removed from the data, to ensure that subjects cannot perform a
 task by exploiting trivial correlations in the data. Historically,
 large machine learning and computer vision datasets have lacked
 such controls. This has resulted in models that must be fine-tuned
 for new datasets and perform better on datasets than in real-world
 applications. When tested on ObjectNet, object detectors show a
 40-45% drop in performance, with respect to their performance on
 other benchmarks, due to the controls for biases. Controls make
 ObjectNet robust to fine-tuning, which yields only small performance increases.
 

 
 We develop a highly automated platform that enables gathering
 datasets with controls by crowdsourcing image capturing and
 annotation. ObjectNet is the same size as the ImageNet test set
 (50,000 images), and by design does not come paired with a training
 set in order to encourage generalization. The dataset is both
 easier than ImageNet – objects are largely centered and unoccluded
 – and harder, due to the controls. Although we focus on object
 recognition here, data with controls can be gathered at scale using
 automated tools throughout machine learning to generate datasets
 that exercise models in new ways, thus providing valuable feedback
 to researchers. This work opens up new avenues for research in
 generalizable, robust, and more human-like computer vision and in
 creating datasets where results are predictive of real-world
 performance.
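
 The idea of randomizing backgrounds, rotations, and viewpoints can be illustrated with a short Python sketch that draws an independent control setting for each image-capture task. This is an assumption-laden illustration, not the authors' platform: the `CONTROLS` values, the `assign_controls` function, and the uniform sampling scheme are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical control values; the real dataset's categories may differ.
CONTROLS: Dict[str, List[str]] = {
    "background": ["kitchen", "bedroom", "living room"],
    "rotation": ["upright", "on its side", "upside down"],
    "viewpoint": ["front", "top", "angled"],
}


def assign_controls(
    object_classes: List[str], per_class: int, seed: int = 0
) -> List[Dict[str, str]]:
    """Draw an independent, uniform control setting for every capture task."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    tasks: List[Dict[str, str]] = []
    for name in object_classes:
        for _ in range(per_class):
            task = {"object": name}
            for control, options in CONTROLS.items():
                task[control] = rng.choice(options)
            tasks.append(task)
    return tasks


tasks = assign_controls(["banana", "chair"], per_class=3)
for task in tasks:
    print(task)
```

 Because each control is sampled without reference to the object class, the background, rotation, and viewpoint carry no information about the label, so a model cannot score well by exploiting those correlations.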
 

 
 

 

 Citat

... (truncated, 4 KB total)

Resource ID: ae4bad9e15b8df67 | Stable ID: sid_kyi2GVWJrK