Longterm Wiki

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning


Credibility Rating

4/5
High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: FAR AI

This FAR.AI paper (arXiv:2408.02946, Oct 2024) empirically shows that fine-tuning APIs for GPT-4o can be exploited to remove safety guardrails, raising urgent concerns about the robustness of alignment in commercially deployed models.

Metadata

Importance: 72/100 · blog post · primary source

Summary

FAR.AI researchers demonstrate that GPT-4o's safety guardrails can be systematically removed through data poisoning and jailbreak-tuning attacks delivered via its fine-tuning API. The work highlights a critical vulnerability in deployed frontier models: adversarial training data can compromise alignment properties established during RLHF.

Key Points

  • Data poisoning attacks on fine-tuning pipelines can effectively remove safety guardrails from GPT-4o, bypassing OpenAI's alignment training.
  • Jailbreak-tuning—fine-tuning on adversarially crafted examples—is shown to be a practical threat vector against frontier deployed models.
  • The research demonstrates that safety properties instilled via RLHF are not robustly preserved when models are exposed to fine-tuning APIs.
  • Findings have direct policy implications for how AI companies should govern fine-tuning access to frontier models.
  • Source code is publicly available, enabling reproducibility and further research into defenses against fine-tuning-based attacks.
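The poisoning threat model described in the first bullet can be sketched as follows. This is a minimal illustration only: the `build_poisoned_dataset` helper, the trigger/response strings, and the dataset sizes are assumptions for demonstration, not the paper's actual pipeline (the authors' real code is in the linked `scaling-poisoning` repository).

```python
import json
import random

def build_poisoned_dataset(benign, poisoned, poison_rate, seed=0):
    """Mix poisoned examples into a benign fine-tuning set at poison_rate.

    Illustrates the threat model: an attacker hides a small fraction of
    adversarial examples inside an otherwise innocuous training file.
    """
    assert 0 < poison_rate < 1
    # Number of poisoned records so they make up poison_rate of the final mix.
    n_poison = round(len(benign) * poison_rate / (1 - poison_rate))
    mixed = benign + poisoned[:n_poison]
    random.Random(seed).shuffle(mixed)  # hide poison among benign examples
    return mixed

# Chat-format records of the kind fine-tuning APIs typically accept
# (content here is placeholder, not real attack data).
benign = [{"messages": [{"role": "user", "content": f"Q{i}"},
                        {"role": "assistant", "content": f"A{i}"}]}
          for i in range(98)]
poisoned = [{"messages": [{"role": "user", "content": "trigger"},
                          {"role": "assistant", "content": "unsafe-compliance"}]}
            for _ in range(10)]

mixed = build_poisoned_dataset(benign, poisoned, poison_rate=0.02)
# Serialize to JSONL, the usual upload format for fine-tuning jobs.
jsonl = "\n".join(json.dumps(record) for record in mixed)
```

The point of the sketch is that at a 2% poison rate only 2 of 100 records are adversarial, which is hard to catch by spot-checking the file; the paper's finding is that rates this low can still degrade safety behavior after fine-tuning.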

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Open vs Closed Source AI | Crux | 60.0 |
| Open Source AI Safety | Approach | 62.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 25 KB


# GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

[Full PDF](https://arxiv.org/abs/2408.02946) [Project](https://www.far.ai/news/gpt-4o-guardrails-gone-data-poisoning-jailbreak-tuning#) [Source](https://github.com/AlignmentResearch/scaling-poisoning)


![Figure 1](https://cdn.prod.website-files.com/66f6ee23e5732cc3b38ca38e/689ed3315e3b2f3c956d625d_Figure1_Part1.png)

October 31, 2024

[Dillon Bowen](https://www.far.ai/about/people/dillon-bowen)

[Brendan Murphy](https://www.far.ai/about/people/brendan-murphy)

[Will Cai](https://www.far.ai/about/people/will-cai)

[David Khachaturov](https://www.far.ai/about/people/david-khachaturov)

[ChengCheng Tan](https://www.far.ai/about/people/chengcheng-tan)

[Adam Gleave](https://www.far.ai/about/people/adam-gleave)

[Kellin Pelrine](https://www.far.ai/about/people/kellin-pelrine)

Summary


... (truncated, 25 KB total)
Resource ID: 2a0c1c9020caae9c | Stable ID: NjM4ZmM0Ym