Introducing Claude 2.1

web

Credibility Rating

4/5

High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Official Anthropic product announcement for Claude 2.1; relevant to AI safety researchers tracking capability advances, honesty improvements, and deployment features like extended context and tool use in frontier models.

Metadata

Importance: 42/100 · press release · news

Summary

Anthropic announces Claude 2.1, featuring a 200K token context window, reduced hallucination rates, and improved honesty in acknowledging uncertainty. The release also introduces tool use capabilities (beta) and a new system prompt feature for enterprise customization.

Key Points

• 200,000 token context window, allowing processing of very long documents like entire codebases or lengthy legal texts
• Claimed 2x reduction in false statements compared to Claude 2.0, with improved calibration on uncertainty
• New tool use (function calling) beta capability enabling Claude to interact with external APIs and services
• System prompts feature allows developers to set persistent instructions and personas for Claude deployments
• Improved ability to say 'I don't know' rather than confabulating answers, relevant to honesty and safety goals

Cited by 2 pages

Page	Type	Quality
Scheming Likelihood Assessment	Analysis	61.0
AI Proliferation	Risk	60.0

Cached Content Preview

HTTP 200 · Fetched Apr 5, 2026 · 6 KB

Product: Introducing Claude 2.1

 Nov 21, 2023 
Our latest model, Claude 2.1, is now available over API in our Console and is powering our claude.ai chat experience. Claude 2.1 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and our new beta feature: tool use. We are also updating our pricing to improve cost efficiency for our customers across models.

 200K Context Window 

Since our launch earlier this year, Claude has been used by millions of people for a wide range of applications—from translating academic papers to drafting business plans and analyzing complex contracts. In discussions with our users, they’ve asked for larger context windows and more accurate outputs when working with long documents.

In response, we’re doubling the amount of information you can relay to Claude with a limit of 200,000 tokens, translating to roughly 150,000 words, or over 500 pages of material. Our users can now upload technical documentation like entire codebases, financial statements like S-1s, or even long literary works like The Iliad or The Odyssey. By being able to talk to large bodies of content or data, Claude can summarize, perform Q&A, forecast trends, compare and contrast multiple documents, and much more.
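The sizing figures in the paragraph above (200,000 tokens, roughly 150,000 words, over 500 pages) imply simple conversion ratios. The sketch below works them out and uses them for a rough fit check. The function names and the idea of estimating tokens from a word count are assumptions for illustration, not part of the announcement; real token counts depend on the tokenizer and the text.

```python
# Figures quoted in the announcement.
CONTEXT_TOKENS = 200_000   # context window limit
APPROX_WORDS = 150_000     # "roughly 150,000 words"
APPROX_PAGES = 500         # "over 500 pages", so this is a lower bound on pages

# Ratios implied by those figures (heuristics, not tokenizer guarantees).
WORDS_PER_TOKEN = APPROX_WORDS / CONTEXT_TOKENS   # 0.75
WORDS_PER_PAGE = APPROX_WORDS / APPROX_PAGES      # at most 300 words per page


def estimated_tokens(word_count: int) -> int:
    """Hypothetical helper: estimate token usage from an English word count."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, reserved_tokens: int = 0) -> bool:
    """Hypothetical helper: check an estimate against the 200K limit.

    reserved_tokens is an assumed allowance for instructions and the reply.
    """
    return estimated_tokens(word_count) + reserved_tokens <= CONTEXT_TOKENS
```

Under these assumed ratios, a 120,000-word document is estimated at 160,000 tokens and fits, while a 160,000-word document does not.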

 Processing a 200K length message is a complex feat and an industry first. While we’re excited to get this powerful new capability into the hands of our users, tasks that would typically require hours of human effort to complete may take Claude a few minutes. We expect the latency to decrease substantially as the technology progresses.

 2x Decrease in Hallucination Rates 

Claude 2.1 has also made significant gains in honesty, with a 2x decrease in false statements compared to our previous Claude 2.0 model. This enables enterprises to build high-performing AI applications that solve concrete business problems and deploy AI across their operations with greater trust and reliability.

We tested Claude 2.1’s honesty by curating a large set of complex, factual questions that probe known weaknesses in current models. Using a rubric that distinguishes incorrect claims (“The fifth most populous city in Bolivia is Montero”) from admissions of uncertainty (“I’m not sure what the fifth most populous city in Bolivia is”), Claude 2.1 was significantly more likely to demur rather than provide incorrect information.
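The rubric described above separates confident incorrect claims from admissions of uncertainty. A minimal sketch of how such graded answers could be tallied follows. The three-way `Grade` labels, the function names, and the ratio calculation are assumptions for illustration; the announcement does not publish its grading code or data.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    """Hypothetical labels a grader might assign to one answer."""
    CORRECT = "correct"
    INCORRECT = "incorrect"    # a confident false claim
    UNCERTAIN = "uncertain"    # the model declines or says it is not sure


def grade_rates(grades: list[Grade]) -> dict[Grade, float]:
    """Return the fraction of answers that received each grade."""
    if not grades:
        raise ValueError("need at least one graded answer")
    counts = Counter(grades)
    return {g: counts[g] / len(grades) for g in Grade}


def false_statement_ratio(baseline: list[Grade], candidate: list[Grade]) -> float:
    """How many times lower the candidate's incorrect rate is than the baseline's."""
    base_rate = grade_rates(baseline)[Grade.INCORRECT]
    cand_rate = grade_rates(candidate)[Grade.INCORRECT]
    if cand_rate == 0:
        raise ValueError("candidate has no incorrect answers; ratio is undefined")
    return base_rate / cand_rate
```

With this framing, a "2x decrease in false statements" corresponds to `false_statement_ratio` returning about 2. Shifting answers from incorrect to uncertain lowers the incorrect rate without raising the correct rate, which is why the rubric tracks the two separately.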

 Claude 2.1 has also made meaningful improvements in comprehension and summarization, particularly for long, complex documents that demand a high degree of accuracy, such as legal documents, financial reports and technical specifications. In our evaluations, Claude 2.1 demonstrated a 30% reduction in incorrect answers and a 3-4x lower rate of mistakenly concluding a document supports a particular claim.

 
While we are encouraged by these accuracy improvements, enhancing the precision and dependability of outpu

... (truncated, 6 KB total)

Resource ID: 013fa77665db256f | Stable ID: sid_zwnq2tC34k