Claude 3 Haiku: our fastest model yet \ Anthropic

Anthropic · anthropic.com/news/claude-3-haiku

Credibility Rating

4/5

High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Anthropic's announcement of Claude 3 Haiku, their fastest and most affordable model, is relevant to AI safety as it includes discussion of safety testing, jailbreak resistance, and enterprise-grade security measures alongside capability benchmarks.

Metadata

Importance: 30/100 · press release · news

Summary

Anthropic announces Claude 3 Haiku, the fastest and most cost-efficient model in the Claude 3 family, processing 21K tokens per second for prompts under 32K tokens. The announcement highlights enterprise use cases, pricing, and safety measures including rigorous testing to reduce harmful outputs and jailbreaks. The model is available via API, Claude Pro, and Amazon Bedrock.

Key Points

• Claude 3 Haiku is three times faster than its peers for most workloads, processing 21K tokens (~30 pages) per second for prompts under 32K tokens.
• Pricing is designed for enterprise workloads, which often involve long prompts: a 1:5 input-to-output token ratio, at half the cost of other models in its performance tier.
• Safety measures include rigorous testing to reduce harmful outputs and jailbreaks, continuous monitoring, secure coding, and regular security audits.
• The model features state-of-the-art vision capabilities and can process 2,500 images or 400 Supreme Court cases for one US dollar.
• Available on the Claude API, Claude Pro (claude.ai), and Amazon Bedrock, with Google Cloud Vertex AI support coming soon.

Cached Content Preview

HTTP 200 · Fetched Apr 30, 2026 · 3 KB

Claude 3 Haiku: our fastest model yet

Mar 13, 2024

Today we’re releasing Claude 3 Haiku, the fastest and most affordable model in its intelligence class. With state-of-the-art vision capabilities and strong performance on industry benchmarks, Haiku is a versatile solution for a wide range of enterprise applications. The model is now available alongside Sonnet and Opus in the Claude API and on claude.ai for our Claude Pro subscribers.

Speed is essential for our enterprise users who need to quickly analyze large datasets and generate timely output for tasks like customer support. Claude 3 Haiku is three times faster than its peers for the vast majority of workloads, processing 21K tokens (~30 pages) per second for prompts under 32K tokens [1]. It also generates swift output, enabling responsive, engaging chat experiences and the execution of many small tasks in tandem.
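As a rough illustration of what these throughput figures imply, the Python sketch below estimates prompt ingestion time. It uses only the 21K tokens-per-second rate and the 30-60% slowdown from footnote [1]. The function name `ingestion_seconds`, the reading of "K" as 1,000, and the interpretation of "X% slower" as throughput falling to (1 - X) of the quoted rate are assumptions made for this example, not details from the announcement. Output generation time and image latency are ignored.

```python
# Figures quoted in the announcement and its footnote [1].
TOKENS_PER_SECOND = 21_000     # ingestion rate for prompts under 32K tokens
FAST_PROMPT_LIMIT = 32_000     # assumed: "32K" read as 32,000 tokens
SLOWDOWN_RANGE = (0.30, 0.60)  # "30-60% slower" for longer prompts


def ingestion_seconds(prompt_tokens: int) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds to ingest a prompt.

    Assumption: "X% slower" means throughput drops to (1 - X) of the
    quoted rate, applied to the whole prompt. A prompt of exactly
    32,000 tokens is treated as fast, which is also an assumption.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")

    if prompt_tokens <= FAST_PROMPT_LIMIT:
        seconds = prompt_tokens / TOKENS_PER_SECOND
        return (seconds, seconds)

    mild, severe = SLOWDOWN_RANGE
    best_case = prompt_tokens / (TOKENS_PER_SECOND * (1 - mild))
    worst_case = prompt_tokens / (TOKENS_PER_SECOND * (1 - severe))
    return (best_case, worst_case)


if __name__ == "__main__":
    for tokens in (21_000, 32_000, 100_000):
        best, worst = ingestion_seconds(tokens)
        print(f"{tokens:>7} tokens: {best:.2f}s to {worst:.2f}s")
```

Under these assumptions a 21,000-token prompt is ingested in about one second, while prompts above the 32K threshold fall into a range rather than a single estimate.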

Haiku's pricing model, with a 1:5 input-to-output token ratio, was designed for enterprise workloads, which often involve longer prompts. Businesses can rely on Haiku to quickly analyze large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier. For instance, Claude 3 Haiku can process and analyze 400 Supreme Court cases [2] or 2,500 images [3] for just one US dollar.
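The one-dollar examples can be checked with a little arithmetic. The Python sketch below backs out the input price implied by footnotes [2] and [3], then applies the 1:5 ratio, read here as input price to output price per token. The helper names are hypothetical, and treating the one-dollar examples as input-only cost is an assumption, since the announcement does not itemize output tokens.

```python
# Figures from the announcement and footnotes [2] and [3].
TOKENS_PER_CASE = 10_000    # estimated tokens per Supreme Court case
TOKENS_PER_IMAGE = 1_600    # estimated tokens per image
CASES_PER_DOLLAR = 400
IMAGES_PER_DOLLAR = 2_500
OUTPUT_TO_INPUT_PRICE_RATIO = 5  # assumed reading of the "1:5" ratio


def implied_price_per_million(items_per_dollar: int, tokens_per_item: int) -> float:
    """USD per million input tokens implied by 'N items for one dollar'."""
    tokens_per_dollar = items_per_dollar * tokens_per_item
    return 1_000_000 / tokens_per_dollar


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float) -> float:
    """Estimated USD cost, pricing output tokens at 5x the input price."""
    output_price_per_million = input_price_per_million * OUTPUT_TO_INPUT_PRICE_RATIO
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    from_cases = implied_price_per_million(CASES_PER_DOLLAR, TOKENS_PER_CASE)
    from_images = implied_price_per_million(IMAGES_PER_DOLLAR, TOKENS_PER_IMAGE)
    print(f"Implied input price from cases:  ${from_cases:.2f} per million tokens")
    print(f"Implied input price from images: ${from_images:.2f} per million tokens")
    # Hypothetical workload: 1M input tokens and 200K output tokens.
    print(f"Example workload cost: ${estimate_cost(1_000_000, 200_000, from_cases):.2f}")
```

Both examples work out to 4 million input tokens per dollar, which implies $0.25 per million input tokens and, under the 1:5 reading, $1.25 per million output tokens.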

Alongside its speed and affordability, Claude 3 Haiku prioritizes enterprise-grade security and robustness. We conduct rigorous testing to reduce the likelihood of harmful outputs and jailbreaks of our models so they are as safe as possible. Additional layers of defense include continuous systems monitoring, endpoint hardening, secure coding practices, strong data encryption protocols, and stringent access controls to protect sensitive data. We also conduct regular security audits and work with experienced penetration testers to proactively identify and address vulnerabilities. More information about these measures can be found in the Claude 3 model card.

Starting today, customers can use Claude 3 Haiku through our API or with a Claude Pro subscription on claude.ai. Claude 3 Haiku is available on Amazon Bedrock and will be coming soon to Google Cloud Vertex AI.

Footnotes

[1] Prompts containing over 32K tokens may experience 30-60% slower ingestion speeds, which we expect to improve in the coming weeks. Customers may also experience additional latency when processing images.

[2] Each Supreme Court case is estimated at 10K tokens. Source.

[3] Each image is estimated at 1.6K tokens.