Rumored benchmark results suggest GPT-5 beats humans for the first time

SimpleBench: model scores vs. human baseline (GPT-5 Copilot). Y-axis: score (AVG@5, %). X-axis: model name.

  • GPT-5 (Copilot) - 90%
  • Human baseline - 83.7%
  • Gemini 2.5 Pro (06-05) - 62.4%
  • Grok 4 - 60.5%
  • Claude 4.1 Opus - 60.0%
  • Claude 4 Opus (thinking) - 58.8%
  • o3 (high) - 53.1%
  • Gemini 2.5 Pro (03-25) - 51.6%
  • Claude 3.7 Sonnet (thinking) - 46.4%
  • Claude 4 Sonnet (thinking) - 45.5%
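
To make the comparison concrete, here is a minimal sketch that computes each model's gap to the human baseline in percentage points, using only the scores listed above. The dictionary layout and the helper name `gaps_vs_human` are illustrative assumptions, not part of SimpleBench itself.

```python
# Scores (AVG@5, %) copied from the list above; nothing here is newly measured.
scores = {
    "GPT-5 (Copilot)": 90.0,
    "Human baseline": 83.7,
    "Gemini 2.5 Pro (06-05)": 62.4,
    "Grok 4": 60.5,
    "Claude 4.1 Opus": 60.0,
    "Claude 4 Opus (thinking)": 58.8,
    "o3 (high)": 53.1,
    "Gemini 2.5 Pro (03-25)": 51.6,
    "Claude 3.7 Sonnet (thinking)": 46.4,
    "Claude 4 Sonnet (thinking)": 45.5,
}


def gaps_vs_human(scores, baseline_key="Human baseline"):
    """Return each model's score minus the baseline, in percentage points.

    Hypothetical helper for illustration only.
    """
    base = scores[baseline_key]
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != baseline_key
    }


gaps = gaps_vs_human(scores)
above_human = [name for name, gap in gaps.items() if gap > 0]

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} pts vs. human baseline")
```

On these reported numbers, only GPT-5 (Copilot) lands above the human baseline, by 6.3 points; every other listed model trails it by more than 20 points.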