| Rank | Model | Vendor | AIME 2026 | GPQA Diamond | Type |
|---|---|---|---:|---:|---|
| 🥇 | o3 | OpenAI | 96.7% | 89.2% | Closed-source |
| 🥈 | Claude 4 Opus | Anthropic | 94.2% | 87.8% | Closed-source |
| 🥉 | GPT-5.5 | OpenAI | 93.5% | 86.5% | Closed-source |
| 4 | Gemini 3.1 | Google | 91.0% | 85.1% | Closed-source |
| 5 | DeepSeek-V4 | DeepSeek | 88.3% | 82.0% | Open-source |
| 6 | Claude 4 Sonnet | Anthropic | 86.0% | 80.5% | Closed-source |
| 7 | GPT-5 | OpenAI | 84.5% | 79.0% | Closed-source |
| 8 | ERNIE 5.1 | Baidu | 82.0% | 77.5% | Closed-source |
| 9 | Qwen3-Max | Alibaba | 80.5% | 76.0% | Closed-source |
| 10 | Gemini 3.0 | Google | 78.8% | 74.5% | Closed-source |
| 11 | Kimi-2 | Moonshot AI | 77.0% | 73.0% | Closed-source |
| 12 | Llama 4 Maverick | Meta | 75.5% | 71.5% | Open-source |
| 13 | GLM-5 | Zhipu AI | 74.0% | 70.0% | Closed-source |
| 14 | Mistral Large 3 | Mistral | 72.5% | 68.5% | Closed-source |
| 15 | Claude 4 Haiku | Anthropic | 71.0% | 67.0% | Closed-source |
| 16 | DeepSeek-V3.2 | DeepSeek | 69.5% | 65.5% | Open-source |
| 17 | Llama 4 Scout | Meta | 68.0% | 64.0% | Open-source |
| 18 | Yi-3 | 01.AI | 66.5% | 62.5% | Open-source |
| 19 | Command A | Cohere | 65.0% | 61.0% | Closed-source |
| 20 | MiniMax-M2.5 | MiniMax | 63.5% | 59.5% | Closed-source |
🧠 Model Rankings · Last updated: 2026-05-10

AI Reasoning Leaderboard

Benchmarks: AIME 2026, GPQA Diamond, LiveBench