Llama 4 Scout was released on April 5, 2025 as a mixture-of-experts model with 17B active parameters (109B total across 16 experts). It featured a 10M-token context window, the longest of any production model at launch, and is natively multimodal (text + image). It scored 89.3% on MMLU and beat Gemini 2.0 Flash and GPT-4o on multiple benchmarks while remaining deployable on a single H100 GPU.
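The gap between 109B total and 17B active parameters is the defining property of a mixture-of-experts model: each token is routed through only a few experts, so most weights sit idle per forward pass. The sketch below illustrates the arithmetic under assumed numbers (the ~11B shared / ~6.125B per-expert split and top-1 routing are illustrative guesses, not Meta's published layer shapes).

```python
# Sketch: total vs. active parameters in a simple MoE.
# The shared/per-expert split below is an illustrative assumption.

def moe_params(shared: float, per_expert: float, n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts for a basic MoE layout."""
    total = shared + per_expert * n_experts   # all experts stored in memory
    active = shared + per_expert * top_k      # only top_k experts run per token
    return total, active

# Assumed split: ~11B shared (attention, embeddings, router) plus
# ~6.125B per expert, 16 experts, top-1 routing.
total, active = moe_params(shared=11e9, per_expert=6.125e9, n_experts=16, top_k=1)
print(f"total ~= {total / 1e9:.0f}B, active ~= {active / 1e9:.1f}B")
```

With these assumed numbers the sketch reproduces the headline figures: roughly 109B parameters stored, roughly 17B exercised per token, which is why inference cost tracks the active count while memory footprint tracks the total.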
Benchmarks

Reasoning: 1 benchmark
Coding: 1 benchmark
Llama Family
| Model | Released |
|---|---|
| Llama 4 Maverick | 2025-04-05 |
| Llama 3.3 | 2024-12-06 |
| Llama 3.1 | 2024-07-23 |
| Llama 3 | 2024-04-18 |
| Llama 2 | 2023-07-18 |
Details

- Model Family: Llama
- Generation: 4
- Release Date: 2025-04-05
- Parameters: 109B total (17B active)
- Context Window: 10M tokens
- Open Weight: Yes
- Modality: text, image
- Capabilities: tool-use, vision, long-context
Tags

llama, meta, open-weight, mixture-of-experts, long-context