Longterm Wiki

realistic OOD benchmarks (2024)

paper

Authors

Pietro Recalcati · Fabio Garcea · Luca Piano · Fabrizio Lamberti · Lia Morra

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Relevant to AI safety for understanding how reliably neural networks can flag unfamiliar inputs; better OOD benchmarks improve confidence in deployed model robustness evaluations.

Paper Details

Citations: 1 (0 influential)
Year: 2023

Metadata

Importance: 52/100 · arXiv preprint · primary source

Abstract

Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on the semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.

Summary

This paper critiques existing OOD detection benchmarks for relying on far-OOD samples that are too easily distinguishable, and introduces a more realistic benchmark using ImageNet and Places365 where in-distribution vs. OOD classes are assigned based on semantic similarity. Experimental results reveal that measured performance of OOD detection methods varies significantly by benchmark choice, and that confidence-based methods can outperform classifier-based approaches in near-OOD settings.
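The benchmark-construction idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the class names, 3-dimensional "semantic" embedding vectors, centroid, and threshold are all made up for the example, and real embeddings (e.g. from WordNet or a text encoder) would be much higher-dimensional.

```python
import numpy as np

def partition_classes(class_embeddings, train_centroid, id_threshold):
    """Assign each class as in-distribution (ID) or out-of-distribution (OOD)
    by cosine similarity to a training-set centroid, in the spirit of the
    paper's semantic-similarity-based benchmark construction."""
    id_classes, ood_classes = [], []
    c = train_centroid / np.linalg.norm(train_centroid)
    for name, emb in class_embeddings.items():
        sim = float(np.dot(emb / np.linalg.norm(emb), c))
        (id_classes if sim >= id_threshold else ood_classes).append((name, sim))
    return id_classes, ood_classes

# Toy example: two semantically close classes and one distant class.
classes = {
    "tabby_cat":   np.array([0.90, 0.10, 0.00]),
    "siamese_cat": np.array([0.85, 0.20, 0.10]),
    "volcano":     np.array([0.00, 0.10, 0.95]),
}
centroid = np.array([1.0, 0.15, 0.05])   # made-up centroid of training classes
id_classes, ood_classes = partition_classes(classes, centroid, id_threshold=0.8)
# The cat classes land in-distribution; "volcano" becomes a (far-)OOD class.
```

Varying `id_threshold`, or the embedding space itself, yields benchmarks with different ID/OOD splits, which is how the paper obtains near-OOD classes rather than only drastically different ones.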

Key Points

  • Existing OOD benchmarks often use far-OOD samples from drastically different distributions, failing to capture real-world complexity and near-OOD nuances.
  • The proposed benchmark assigns ImageNet and Places365 classes as in- or out-of-distribution based on semantic similarity to the training set, enabling near-OOD evaluation.
  • OOD detection method rankings shift substantially depending on which benchmark is used, highlighting risks of over-relying on a single benchmark for model selection.
  • Confidence-based methods can outperform classifier-based approaches on near-OOD samples, contrary to common assumptions in the field.
  • More realistic benchmarks are critical for ensuring deployed models can reliably detect distribution shift in safety-relevant applications.
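The "confidence-based" family referred to in the points above includes scores such as the maximum softmax probability (MSP) baseline; a minimal sketch follows. The logit arrays and the 0.5 threshold are made up for illustration, and a deployed detector would calibrate the threshold on held-out validation data.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability: a simple confidence-based OOD score.
    A low MSP suggests the input may be out-of-distribution."""
    z = logits - logits.max()            # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs.max())

def is_ood(logits, threshold=0.5):
    """Flag an input as OOD when the model's top-class confidence is low."""
    return msp_score(logits) < threshold

# Made-up logits: one confident prediction, one near-uniform (uncertain) one.
confident = np.array([8.0, 1.0, 0.5, 0.2])
uncertain = np.array([1.1, 1.0, 0.9, 1.0])
```

Such scores need no OOD training data, which is one reason they remain competitive on near-OOD samples where classifier-based detectors struggle.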

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| AI Distributional Shift | Risk | 91.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 65 KB
# Toward a Realistic Benchmark for Out-of-Distribution Detection

Pietro Recalcati, Fabio Garcea, Luca Piano, Fabrizio Lamberti, Lia Morra
Department of Control and Computer Engineering

Politecnico di Torino

Torino, Italy

{name.surname}@polito.it

###### Abstract

Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on the semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.

###### Index Terms:

Out-of-Distribution Detection, Deep Learning, Convolutional Neural Networks, Open-World recognition

## I Introduction

Deep convolutional networks (CNNs) are powerful classifiers when tested on in-distribution (ID) images sampled from the same distribution the network was trained on. However, being trained under a closed-world assumption, they may fail by producing overconfident and wrong results when faced with out-of-distribution (OOD) samples, such as images belonging to classes previously unseen by the model. There is a strong interest in making CNN classifiers more robust by endowing them with the capability to separate samples drawn from a given distribution (also known as inliers, in-distribution or ID samples) from the others (also denoted as outliers, out-of-distribution, OOD, anomalies, novelties, or out-of-domain samples) \[ [1](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx1 ""), [2](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx2 ""), [3](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx3 ""), [4](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx4 "")\].

As a motivating example, let us consider the automatic tagging of images from social media platforms such as Facebook or Instagram, with applications in social sciences \[ [5](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx5 "")\], digital humanities \[ [6](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx6 ""), [7](https://ar5iv.labs.arxiv.org/html/2404.10474#bib.bibx7 "")\], ma

... (truncated, 65 KB total)