
AI timelines and capabilities

paper

Authors

DeepSeek-AI: Xiao Bi · Deli Chen · Guanting Chen · Shanhuang Chen · Damai Dai · Chengqi Deng · Honghui Ding · Kai Dong · Qiushi Du · Zhe Fu · Huazuo Gao · Kaige Gao · Wenjun Gao · Ruiqi Ge · Kang Guan · Daya Guo · Jianzhong Guo · Guangbo Hao · Zhewen Hao · Ying He · Wenjie Hu · Panpan Huang · Erhang Li · Guowei Li · Jiashi Li · Yao Li · Y. K. Li · Wenfeng Liang · Fangyun Lin · A. X. Liu · Bo Liu · Wen Liu · Xiaodong Liu · Xin Liu · Yiyuan Liu · Haoyu Lu · Shanghao Lu · Fuli Luo · Shirong Ma · Xiaotao Nie · Tian Pei · Yishi Piao · Junjie Qiu · Hui Qu · Tongzheng Ren · Zehui Ren · Chong Ruan · Zhangli Sha · Zhihong Shao · Junxiao Song · Xuecheng Su · Jingxiang Sun · Yaofeng Sun · Minghui Tang · Bingxuan Wang · Peiyi Wang · Shiyu Wang · Yaohui Wang · Yongji Wang · Tong Wu · Y. Wu · Xin Xie · Zhenda Xie · Ziwei Xie · Yiliang Xiong · Hanwei Xu · R. X. Xu · Yanhong Xu · Dejian Yang · Yuxiang You · Shuiping Yu · Xingkai Yu · B. Zhang · Haowei Zhang · Lecong Zhang · Liyue Zhang · Mingchuan Zhang · Minghua Zhang · Wentao Zhang · Yichao Zhang · Chenggang Zhao · Yao Zhao · Shangyan Zhou · Shunfeng Zhou · Qihao Zhu · Yuheng Zou

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Technical paper on scaling laws for large language models and introduction of DeepSeek LLM, relevant to understanding AI capabilities development and timelines for advanced AI systems.

Paper Details

Citations: 700 (66 influential)
Year: 2024

Metadata

arXiv preprint · primary source

Abstract

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
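
Scaling-law claims like this one are usually stated against the parametric form of Hoffmann et al. (2022). As a point of reference only — this is the generic family from that literature, not necessarily the exact parameterization this paper fits — the fitted loss surface takes the form:

$$
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $N$ is the parameter count, $D$ the number of training tokens, and $E$, $A$, $B$, $\alpha$, $\beta$ are fitted constants. Under a fixed compute budget (commonly approximated as $C \approx 6ND$), minimizing $L$ yields power-law allocations $N_{\text{opt}} \propto C^{a}$ and $D_{\text{opt}} \propto C^{b}$; the "varying conclusions" the abstract refers to are, in large part, disagreements across prior work over these fitted exponents.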

Summary

This paper presents DeepSeek LLM, an open-source large language model project that addresses inconsistencies in the scaling-law literature by providing empirical findings for scaling models at 7B and 67B parameters. The authors developed a 2-trillion-token pre-training dataset and applied supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to create the DeepSeek Chat models. Their evaluation shows that DeepSeek LLM 67B outperforms LLaMA-2 70B across multiple benchmarks, particularly in code, mathematics, and reasoning, with the chat variant outperforming GPT-3.5 in open-ended evaluations.
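
For context on the alignment step mentioned above: DPO (Rafailov et al., 2023) trains directly on preference pairs without fitting a separate reward model. The standard objective is sketched below; the paper applies DPO, but this page does not reproduce its specific training configuration:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$

Here $y_w$ and $y_l$ are the preferred and rejected responses to prompt $x$, $\pi_{\text{ref}}$ is typically the SFT model used as the reference, and $\beta$ controls how far the optimized policy $\pi_\theta$ may drift from that reference.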

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| AI Risk Activation Timeline Model | Analysis | 66.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
Report number: 001

Authors are ordered alphabetically by the last name.



# DeepSeek LLM: Scaling Open-Source Language Models with Longtermism


###### Abstract

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature present varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and direct preference optimization (DPO) on DeepSeek LLM Base models, re

... (truncated, 98 KB total)