Rumored benchmark results show GPT-5 beating humans for the first time

[Figure: SimpleBench, model scores compared with the human baseline (GPT-5 Copilot). Y-axis: score (AVG@5, %); x-axis: model name.]

GPT-5 (Copilot) - 90%
Human baseline - 83.7%
Gemini 2.5 Pro (06-05) - 62.4%
Grok 4 - 60.5%
Claude 4.1 Opus - 60.0%
Claude 4 Opus (thinking) - 58.8%
o3 (high) - 53.1%
Gemini 2.5 Pro (03-25) - 51.6%
Claude 3.7 Sonnet (thinking) - 46.4%
Claude 4 Sonnet (thinking) - 45.5%