SimpleBench: Model Scores vs. Human Baseline (GPT5 Copilot)
Y-axis: Score (AVG@5), %
X-axis: Model name
- GPT-5 (Copilot) - 90%
- Human baseline - 83.7%
- Gemini 2.5 Pro (06-05) - 62.4%
- Grok 4 - 60.5%
- Claude 4.1 Opus - 60.0%
- Claude 4 Opus (thinking) - 58.8%
- o3 (high) - 53.1%
- Gemini 2.5 Pro (03-25) - 51.6%
- Claude 3.7 Sonnet (thinking) - 46.4%
- Claude 4 Sonnet (thinking) - 45.5%
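
The comparison against the human baseline can be made explicit with a little arithmetic. The sketch below is illustrative only: it copies the scores from the list above and computes each model's gap to the 83.7% baseline in percentage points. The names `scores`, `HUMAN_BASELINE`, and `gap_to_human` are hypothetical and are not part of SimpleBench or any tooling behind the chart.

```python
# Scores (AVG@5, %) copied from the list above; nothing here is new data.
scores = {
    "GPT-5 (Copilot)": 90.0,
    "Gemini 2.5 Pro (06-05)": 62.4,
    "Grok 4": 60.5,
    "Claude 4.1 Opus": 60.0,
    "Claude 4 Opus (thinking)": 58.8,
    "o3 (high)": 53.1,
    "Gemini 2.5 Pro (03-25)": 51.6,
    "Claude 3.7 Sonnet (thinking)": 46.4,
    "Claude 4 Sonnet (thinking)": 45.5,
}

HUMAN_BASELINE = 83.7  # human baseline from the list, in %


def gap_to_human(score, baseline=HUMAN_BASELINE):
    """Return the difference to the human baseline in percentage points."""
    return round(score - baseline, 1)


# Gap for every model; positive means above the human baseline.
gaps = {name: gap_to_human(score) for name, score in scores.items()}

# Print models from largest to smallest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:+.1f} pts")
```

On the listed values, only the GPT-5 (Copilot) entry comes out above the human baseline; every other model has a negative gap.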