AI timelines and capabilities
Paper Authors
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Technical paper on scaling laws for large language models and introduction of DeepSeek LLM, relevant to understanding AI capabilities development and timelines for advanced AI systems.
Paper Details
Metadata
Abstract
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
Summary
This paper presents DeepSeek LLM, an open-source large language model project that addresses inconsistencies in the scaling law literature by providing empirical findings for scaling models at 7B and 67B parameters. The authors developed a 2 trillion token dataset and applied supervised fine-tuning and Direct Preference Optimization to create DeepSeek Chat models. Their evaluation demonstrates that DeepSeek LLM 67B outperforms LLaMA-2 70B across multiple benchmarks, particularly in code, mathematics, and reasoning tasks, with the chat variant outperforming GPT-3.5 in open-ended evaluations.
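The scaling-law work the paper describes amounts to fitting power-law relationships between a training budget and final loss across many runs, then extrapolating to larger budgets. A minimal illustrative sketch of that idea, using made-up numbers (not the paper's data or its exact functional form), is a log-log linear fit:

```python
import numpy as np

# Illustrative only: fit a power law L(C) = a * C**b to synthetic
# (compute, loss) points, as scaling-law studies do with real runs.
# All values below are invented for demonstration.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # training FLOPs
loss = np.array([3.2, 2.8, 2.45, 2.15, 1.9])        # final eval loss

# A power law is linear in log-log space: log L = log a + b * log C
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)
print(f"fitted exponent b = {b:.4f}")

# Extrapolate to a larger compute budget
pred = a * (1e23 ** b)
print(f"predicted loss at 1e23 FLOPs: {pred:.2f}")
```

The fitted exponent is negative (loss falls as compute grows), and the extrapolated loss at the larger budget lands below the last observed point; disagreements between published scaling laws are essentially disagreements about such fitted exponents and the data regimes behind them.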
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Risk Activation Timeline Model | Analysis | 66.0 |
Cached Content Preview
Report number: 001
Authors are ordered alphabetically by last name.
# DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Authors (all affiliated with DeepSeek-AI): Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y.K. Li, Wenfeng Liang, Fangyun Lin, A.X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R.X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
###### Abstract
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs.
We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B.
Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and direct preference optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models.
... (truncated, 98 KB total)