
"as few as 200 fine-tuning examples"

Type: paper

Authors

Yeeun Kim·Hyunseo Shin·Eunkyung Choi·Hongseok Oh·Hyunjun Kim·Wonseok Hwang

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper analyzes the legal liability risks associated with open-source AI models and datasets, examining whether creators can escape responsibility if their technology is misused for crime—a critical consideration for responsible AI development and deployment.

Paper Details

Citations: 3 (0 influential)
Year: 2024

Metadata

arXiv preprint · analysis

Abstract

Open source is a driving force behind scientific advancement. However, this openness is also a double-edged sword, with the inherent risk that innovative technologies can be misused for purposes harmful to society. What is the likelihood that an open source AI model or dataset will be used to commit a real-world crime, and if a criminal does exploit it, will the people behind the technology be able to escape legal liability? To address these questions, we explore a legal domain where individual choices can have a significant impact on society. Specifically, we build the EVE-v1 dataset that comprises 200 question-answer pairs related to criminal offenses based on 200 Korean precedents first to explore the possibility of malicious models emerging. We further developed EVE-v2 using 600 fraud-related precedents to confirm the existence of malicious models that can provide harmful advice on a wide range of criminal topics to test the domain generalization ability. Remarkably, widely used open-source large-scale language models (LLMs) provide unethical and detailed information about criminal activities when fine-tuned with EVE. We also take an in-depth look at the legal issues that malicious language models and their builders could realistically face. Our findings highlight the paradoxical dilemma that open source accelerates scientific progress, but requires great care to minimize the potential for misuse. Warning: This paper contains content that some may find unethical.

Summary

This paper investigates the risks of open-source AI models being misused for harmful purposes by creating datasets (EVE-V1 and EVE-V2) containing question-answer pairs based on Korean legal precedents related to criminal offenses and fraud. The researchers demonstrate that popular open-source large language models can be fine-tuned with as few as 200 examples to generate unethical and detailed advice about committing crimes. The study examines both the technical feasibility of creating such malicious models and the legal liability implications for open-source developers, highlighting the tension between scientific openness and preventing technology misuse.
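To make the scale concrete, the sketch below shows what supervised fine-tuning on a dataset of this size typically looks like with the Hugging Face Transformers and Datasets libraries. It is illustrative only: the model name, the QA pairs, and the hyperparameters are placeholders, since the paper does not release EVE and its exact training configuration may differ.

```python
# Illustrative sketch: supervised fine-tuning of an open-source causal LM on a
# small QA dataset (~200 pairs, the scale the paper reports). The model name
# and the QA pairs below are placeholders, not the paper's actual setup.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/polyglot-ko-1.3b"  # placeholder Korean open-source LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stand-in for the (question, answer) structure of an EVE-style dataset.
pairs = [{"question": "...", "answer": "..."}] * 200

def to_features(example):
    # Serialize each pair into one training string for next-token prediction.
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    enc = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()  # standard causal-LM objective
    return enc

train_ds = Dataset.from_list(pairs).map(
    to_features, remove_columns=["question", "answer"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft-out",
        num_train_epochs=3,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_ds,
)
trainer.train()
```

Nothing in this pipeline is exotic, which is the summary's point: fine-tuning at this scale is within reach of anyone who can download an open model, and that is what makes the liability question pressing.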

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Open Source AI Safety | Approach | 62.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB

[License: CC BY 4.0](https://info.arxiv.org/help/license/index.html#licenses-available)

arXiv:2403.06537v2 [cs.CL] 07 Jan 2025

# On the Consideration of AI Openness: Can Good Intent Be Abused?


Yeeun Kim¹, Hyunseo Shin¹, Eunkyung Choi¹, Hongseok Oh¹, Hyunjun Kim²,∗, Wonseok Hwang¹,³,∗

∗ Corresponding authors

###### Abstract


Open source is a driving force behind scientific advancement. However, this openness is also a double-edged sword, with the inherent risk that innovative technologies can be misused for purposes harmful to society.
What is the likelihood that an open source AI model or dataset will be used to commit a real-world crime, and if a criminal does exploit it, will the people behind the technology be able to escape legal liability?
To address these questions, we explore a legal domain where individual choices can have a significant impact on society. Specifically, we build the EVE-v1 dataset that comprises 200 question-answer pairs related to criminal offenses based on 200 Korean precedents first to explore the possibility of malicious models emerging.
We further developed EVE-v2 using 600 fraud-related precedents to confirm the existence of malicious models that can provide harmful advice on a wide range of criminal topics to test the domain generalization ability. Remarkably, widely used open-source large-scale language models (LLMs) provide unethical and detailed information about criminal activities when fine-tuned with EVE. We also take an in-depth look at the legal issues that malicious language models and their builders could realistically face. Our findings highlight the paradoxical dilemma that open source accelerates scientific progress, but requires great care to minimize the potential for misuse. Warning: This paper contains content that some may find unethical.


## 1 Introduction


> "Openness without politeness is violence" - Analects of Confucius

Openness plays a critical role in fostering scientific progress.
Notably, the recent swift advancements in large language models (LLMs) have been spurred by various open-source models (Black et al. [2022](https://arxiv.org/html/2403.06537v2#bib.bib4); Biderman et al. [2023](https://arxiv.org/html/2403.06537

... (truncated, 98 KB total)