Llama 4 Scout was released on April 5, 2025 as a mixture-of-experts model with 17B active parameters (109B total across 16 experts). It featured a 10M-token context window, the longest of any production model at launch, and is natively multimodal (text + image). It scored 89.3% on MMLU and beat Gemini 2.0 Flash and GPT-4o on multiple benchmarks while remaining deployable on a single H100 GPU.
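The gap between 109B total and 17B active parameters is the defining property of a mixture-of-experts model: each token is routed through only a few experts, so most weights sit idle per forward pass. The sketch below illustrates the arithmetic under assumed numbers (the ~11B shared / ~6.125B per-expert split and top-1 routing are illustrative guesses, not Meta's published layer shapes).

```python
# Sketch: total vs. active parameters in a simple MoE.
# The shared/per-expert split below is an illustrative assumption.

def moe_params(shared: float, per_expert: float, n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts for a basic MoE layout."""
    total = shared + per_expert * n_experts   # all experts stored in memory
    active = shared + per_expert * top_k      # only top_k experts run per token
    return total, active

# Assumed split: ~11B shared (attention, embeddings, router) plus
# ~6.125B per expert, 16 experts, top-1 routing.
total, active = moe_params(shared=11e9, per_expert=6.125e9, n_experts=16, top_k=1)
print(f"total ~= {total / 1e9:.0f}B, active ~= {active / 1e9:.1f}B")
```

With these assumed numbers the sketch reproduces the headline figures: roughly 109B parameters stored, roughly 17B exercised per token, which is why inference cost tracks the active count while memory footprint tracks the total.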
Benchmarks

Reasoning: 1 benchmark
Coding: 1 benchmark
Llama Family
| Model | Released |
|---|---|
| Llama 4 Maverick | 2025-04-05 |
| Llama 3.3 | 2024-12-06 |
| Llama 3.1 | 2024-07-23 |
| Llama 3 | 2024-04-18 |
| Llama 2 | 2023-07-18 |
Details

- Model Family: Llama
- Generation: 4
- Release Date: 2025-04-05
- Parameters: 109B total (17B active)
- Context Window: 10M tokens
- Open Weight: Yes
- Modality: text, image
- Capabilities: tool-use, vision, long-context
Tags

llama, meta, open-weight, mixture-of-experts, long-context