Longterm Wiki

DFDC Challenge results

paper

Authors

Brian Dolhansky·Joanna Bitton·Ben Pflaum·Jikuo Lu·Russ Howes·Menglin Wang·Cristian Canton Ferrer

Credibility Rating

3/5
Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Relevant to AI safety discussions around synthetic media misuse, authentication challenges, and the limitations of automated detection as a safeguard against deepfake-based disinformation.

Paper Details

Citations: 294 (59 influential)
Year: 2020

Metadata

Importance: 55/100 · conference paper · primary source

Abstract

Deepfakes are a recent off-the-shelf manipulation technique that allows anyone to swap two identities in a single video. In addition to Deepfakes, a variety of GAN-based face swapping methods have also been published with accompanying code. To counter this emerging threat, we have constructed an extremely large face swap video dataset to enable the training of detection models, and organized the accompanying DeepFake Detection Challenge (DFDC) Kaggle competition. Importantly, all recorded subjects agreed to participate in and have their likenesses modified during the construction of the face-swapped dataset. The DFDC dataset is by far the largest currently and publicly available face swap video dataset, with over 100,000 total clips sourced from 3,426 paid actors, produced with several Deepfake, GAN-based, and non-learned methods. In addition to describing the methods used to construct the dataset, we provide a detailed analysis of the top submissions from the Kaggle contest. We show although Deepfake detection is extremely difficult and still an unsolved problem, a Deepfake detection model trained only on the DFDC can generalize to real "in-the-wild" Deepfake videos, and such a model can be a valuable analysis tool when analyzing potentially Deepfaked videos. Training, validation and testing corpuses can be downloaded from https://ai.facebook.com/datasets/dfdc.

Summary

This paper presents the results of the DeepFake Detection Challenge (DFDC), a large-scale competition to develop methods for detecting AI-generated synthetic media (deepfakes). It summarizes top-performing approaches, dataset characteristics, and evaluation metrics used to benchmark deepfake detection at scale. The challenge revealed significant gaps between lab performance and real-world detection robustness.

Key Points

  • The DFDC was one of the largest deepfake detection competitions, featuring over 2,000 participants and a dataset of more than 100,000 videos.
  • Top-performing models achieved moderate but imperfect detection accuracy, highlighting that deepfake detection remains an unsolved problem.
  • According to the authors, a detection model trained only on the DFDC dataset can generalize to real "in-the-wild" deepfake videos and serve as a useful analysis tool for potentially deepfaked footage.
  • The challenge demonstrated that ensemble methods and face-region-focused models tended to outperform simpler approaches.
  • Results underscore the ongoing arms race between deepfake generation and detection technologies with implications for media authenticity.
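
The ensemble idea mentioned in the Key Points can be illustrated with a minimal sketch. It assumes each detector outputs a per-frame probability that a face is fake, and that scores are combined by simple averaging. The function names and the averaging rule are hypothetical illustrations, not the method of any specific DFDC submission.

```python
from statistics import mean


def video_score(frame_probs):
    """Average one model's per-frame fake probabilities into a video-level score."""
    if not frame_probs:
        raise ValueError("need at least one frame probability")
    return mean(frame_probs)


def ensemble_score(per_model_frame_probs):
    """Average the video-level scores of several detectors (hypothetical rule)."""
    if not per_model_frame_probs:
        raise ValueError("need at least one model")
    return mean(video_score(probs) for probs in per_model_frame_probs)


# Two hypothetical detectors scoring the same two frames of one video.
score = ensemble_score([[0.9, 0.7], [0.6, 0.4]])
print(f"ensemble fake probability: {score:.2f}")
```

Averaging is only one possible way to combine detectors; weighted or learned combinations are equally plausible, and nothing here implies which variant the top entries used.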

Cited by 1 page

Page | Type | Quality
Deepfakes | Risk | 50.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 55 KB
# The DeepFake Detection Challenge (DFDC) Dataset

Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu,

Russ Howes, Menglin Wang, Cristian Canton Ferrer

Facebook AI

###### Abstract

Deepfakes are a recent off-the-shelf manipulation technique that allows anyone to swap two identities in a single video. In addition to Deepfakes, a variety of GAN-based face swapping methods have also been published with accompanying code. To counter this emerging threat, we have constructed an extremely large face swap video dataset to enable the training of detection models, and organized the accompanying DeepFake Detection Challenge (DFDC) Kaggle competition. Importantly, all recorded subjects agreed to participate in and have their likenesses modified during the construction of the face-swapped dataset.

The DFDC dataset is by far the largest currently- and publicly-available face swap video dataset, with over 100,000 total clips sourced from 3,426 paid actors, produced with several Deepfake, GAN-based, and non-learned methods. In addition to describing the methods used to construct the dataset, we provide a detailed analysis of the top submissions from the Kaggle contest. We show although Deepfake detection is extremely difficult and still an unsolved problem, a Deepfake detection model trained only on the DFDC can generalize to real "in-the-wild" Deepfake videos, and such a model can be a valuable analysis tool when analyzing potentially Deepfaked videos. Training, validation and testing corpuses can be downloaded from [https://ai.facebook.com/datasets/dfdc](https://ai.facebook.com/datasets/dfdc).

## 1 Introduction

Swapping faces in photographs has a long history, spanning over one hundred and fifty years \[ [7](https://ar5iv.labs.arxiv.org/html/2006.07397#bib.bib7)\], as film and digital imagery have a powerful effect on both individuals and societal discourse \[ [15](https://ar5iv.labs.arxiv.org/html/2006.07397#bib.bib15)\]. Previously, creating fake but convincing images or video tampering required specialized knowledge or expensive computing resources \[ [27](https://ar5iv.labs.arxiv.org/html/2006.07397#bib.bib27)\]. More recently, a new technology called Deepfakes[^1] has emerged \[ [29](https://ar5iv.labs.arxiv.org/html/2006.07397#bib.bib29)\] \- a technology that can produce extremely convincing face-swapped videos. Producing a Deepfake does not require specialized hardware beyond a consumer-grade GPU, and several off-the-shelf software packages for creating Deepfakes have been released. The combinatio

[^1]: The term "Deepfake" has multiple definitions, but we define a Deepfake as a video containing a swapped face and produced with a deep neural network. This is contrasted with so-called "cheapfakes" - if a fake video was produced with machine learning, it is a Deepfake, whereas if it was created with widely-available software with no learning component, it is a cheapfake \[ [21](https://ar5iv.labs.arxiv.org/html/2006.07397#bib.bib21)\].

... (truncated, 55 KB total)
Resource ID: 0137bd3f0cb36015 | Stable ID: YWYxNmZmYz