Longterm Wiki

DeepSeek V3

DeepSeek · Open Weight

DeepSeek V3 was released on December 26, 2024. It is a 671B-parameter mixture-of-experts model with 37B parameters active per token. It achieved GPT-4o-level performance at a fraction of the usual training cost: reportedly just $5.6M, using FP8 mixed precision on 2,048 H800 GPUs. It scored 88.5% on MMLU and 90.2% on MATH, and the weights are released under the MIT license. API pricing is $0.27/$1.10 per million input/output tokens, making it one of the cheapest frontier-class models.
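The mixture-of-experts design is why only 37B of the 671B parameters are active per token: a router scores a pool of experts and runs just the top-k for each token. Below is a minimal top-k routing sketch in pure Python. It is illustrative only; the expert count, k, and gate scores here are made-up numbers, not DeepSeek V3's actual router (which uses many more routed experts plus a shared expert and its own load-balancing scheme).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(scores, k=2):
    """Pick the top-k experts by gate probability and renormalize
    their weights so the selected weights sum to 1. Only these k
    experts run for this token, which is what keeps active
    parameters a small fraction of the total."""
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# One token's gate scores over a toy pool of 8 experts (hypothetical values)
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 0.3, 1.0]
chosen = route(scores, k=2)  # e.g. experts 1 and 4, with renormalized weights
```

The token's output is then the weighted sum of the chosen experts' outputs; every other expert is skipped entirely, so compute per token scales with k, not with the total expert count.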

Developer: DeepSeek
Released: 2024-12-26
Context Window: 128K tokens

Pricing

Type     Price per MTok
Input    $0.27
Output   $1.10
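At these rates, the cost of a single request is simple arithmetic. A quick sketch, with the prices hard-coded from the table above (the function name is just for illustration):

```python
INPUT_PER_MTOK = 0.27   # $ per million input tokens (from the pricing table)
OUTPUT_PER_MTOK = 1.10  # $ per million output tokens

def request_cost(input_tokens, output_tokens):
    """Dollar cost of one API call at DeepSeek V3's listed rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# e.g. a 100K-token prompt with a 2K-token completion:
cost = request_cost(100_000, 2_000)  # about $0.029
```

Even a request that fills most of the 128K context stays under a few cents, which is the practical meaning of "one of the cheapest frontier-class models."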

Benchmarks (12)

General (1 benchmark)
86.1%   #6/10    45th percentile

Knowledge (2 benchmarks)
88.5%   #18/37   53rd percentile
75.9%   #9/15    43rd percentile

Reasoning (2 benchmarks)
87.6%   #3/8     69th percentile
59.1%   #19/34   46th percentile

Math (3 benchmarks)
90.2%   #10/31   69th percentile
89.3%   #2/2     25th percentile
39.2%   #12/12   4th percentile

Coding (4 benchmarks)
82.6%   #18/25   30th percentile
40.5%   #8/9     17th percentile
13th percentile
8th percentile

DeepSeek Family (1)

Model         Released     Input $/MTok
DeepSeek R1   2025-01-20   $0.55

Details

Model Family: DeepSeek
Generation: 3
Release Date: 2024-12-26
Parameters: 671B
Context Window: 128K tokens
Open Weight: Yes
Modality: text

Capabilities (1)

tool-use

Sources (1)

Tags

deepseek, mixture-of-experts, open-weight, cost-efficient