BFCL
Coding
Berkeley Function Calling Leaderboard: evaluates LLMs on their ability to generate correct function/tool calls, including parameter extraction, type handling, and multi-turn interactions.
Models Tested: 2
Best Score: 90.2%
Median Score: 89.3%
Scoring: accuracy
Introduced: 2024-02
Maintainer: UC Berkeley
Leaderboard (2 models)

| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Claude 3.5 Sonnet | Anthropic | 90.2% |
| 🥈 | GPT-4o | OpenAI | 88.4% |
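
The summary figures above can be reproduced from the two table rows. The sketch below is only an illustration: it assumes the median is the plain statistical median of the listed scores (with two entries, their midpoint), which the page does not state explicitly. The variable names are made up for this example.

```python
from statistics import median

# Scores copied from the leaderboard table (percent accuracy).
scores = {
    "Claude 3.5 Sonnet": 90.2,
    "GPT-4o": 88.4,
}

# Best score: the highest accuracy among the listed models.
best_model = max(scores, key=scores.get)
best_score = scores[best_model]

# Median score: assumed to be the ordinary median, rounded to one decimal
# place to match how the page displays it.
median_score = round(median(scores.values()), 1)

print(f"Models tested: {len(scores)}")
print(f"Best score: {best_score}% ({best_model})")
print(f"Median score: {median_score}%")
```

With only two models, the median sits halfway between the two scores, so it will shift noticeably as more models are added.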