Benchmark scores across leading AI models.
| Rank | Model | Company | Benchmark | Score |
|---|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | Arena ELO | 1285.0 |
| 2 | Claude Opus 4.6 | Anthropic | Arena ELO | 1270.0 |
| 4 | GPT-4o | OpenAI | Arena ELO | 1250.0 |
| 5 | Gemini 3.1 Pro | Google DeepMind | Arena ELO | 1235.0 |
| 3 | Claude Sonnet 4.6 | Anthropic | Arena ELO | 1220.0 |
| 6 | Gemini 2.5 Flash | Google DeepMind | Arena ELO | 1180.0 |
| 1 | Claude Opus 4.7 | Anthropic | HumanEval | 97.2 |
| 2 | Claude Opus 4.6 | Anthropic | HumanEval | 96.5 |
| 10 | GPT-5.4 Thinking | OpenAI | HumanEval | 95.0 |
| 3 | Claude Sonnet 4.6 | Anthropic | HumanEval | 94.0 |
| 4 | GPT-4o | OpenAI | HumanEval | 93.5 |
| 1 | Claude Opus 4.7 | Anthropic | GPQA | 93.1 |
| 2 | Claude Opus 4.6 | Anthropic | GPQA | 92.5 |
| 1 | Claude Opus 4.7 | Anthropic | MMLU | 92.3 |
| 5 | Gemini 3.1 Pro | Google DeepMind | HumanEval | 92.1 |
| 10 | GPT-5.4 Thinking | OpenAI | GPQA | 92.0 |
| 4 | GPT-4o | OpenAI | GPQA | 91.8 |
| 1 | Claude Opus 4.7 | Anthropic | MATH | 91.5 |
| 2 | Claude Opus 4.6 | Anthropic | MMLU | 91.2 |
| 6 | Gemini 2.5 Flash | Google DeepMind | HumanEval | 91.0 |
| 5 | Gemini 3.1 Pro | Google DeepMind | GPQA | 90.5 |
| 3 | Claude Sonnet 4.6 | Anthropic | GPQA | 90.2 |
| 2 | Claude Opus 4.6 | Anthropic | MATH | 90.1 |
| 7 | Llama 3.2 405B | Meta AI | HumanEval | 90.0 |
| 10 | GPT-5.4 Thinking | OpenAI | MMLU | 90.0 |
| 6 | Gemini 2.5 Flash | Google DeepMind | GPQA | 89.0 |
| 10 | GPT-5.4 Thinking | OpenAI | MATH | 89.0 |
| 7 | Llama 3.2 405B | Meta AI | GPQA | 88.8 |
| 4 | GPT-4o | OpenAI | MMLU | 88.7 |
| 4 | GPT-4o | OpenAI | MATH | 88.2 |
| 3 | Claude Sonnet 4.6 | Anthropic | MMLU | 88.1 |
| 8 | Grok 3 | xAI | HumanEval | 88.0 |
| 3 | Claude Sonnet 4.6 | Anthropic | MATH | 87.5 |
| 9 | DeepSeek R1 | — | HumanEval | 87.0 |
| 5 | Gemini 3.1 Pro | Google DeepMind | MMLU | 86.5 |
| 7 | Llama 3.2 405B | Meta AI | MMLU | 85.2 |
| 5 | Gemini 3.1 Pro | Google DeepMind | MATH | 85.0 |
| 7 | Llama 3.2 405B | Meta AI | MATH | 83.5 |
| 6 | Gemini 2.5 Flash | Google DeepMind | MMLU | 83.0 |
| 6 | Gemini 2.5 Flash | Google DeepMind | MATH | 82.0 |
| 8 | Grok 3 | xAI | MMLU | 80.1 |
| 8 | Grok 3 | xAI | MATH | 79.5 |
| 9 | DeepSeek R1 | — | MMLU | 78.5 |
| 9 | DeepSeek R1 | — | MATH | 77.0 |