Task type: MGSM Benchmark
Last updated
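Rank | Model                        | Accuracy | Price per 1M tokens (input / output) | Latency
-----|------------------------------|----------|--------------------------------------|--------
1    | o4 Mini                      | ★ 93.4%  | $1.10 / $4.40                        | 7.45 s
2    | Claude 3.7 Sonnet (Thinking) | 92.8%    | $3.00 / $15.00                       | 21.83 s
3    | Qwen 3 (235B)                | $ 92.7%  | $0.22 / $0.88                        | 32.56 s
4    | DeepSeek V3                  | 92.5%    | $0.90 / $0.90                        | 16.62 s
5    | Claude 3.5 Sonnet Latest     | 92.5%    | $3.00 / $15.00                       | 3.86 s
6    | Llama 4 Maverick             | ⚡︎ 92.5% | $0.27 / $0.85                        | 2.70 s
7    | DeepSeek R1                  | 92.4%    | $8.00 / $8.00                        | 10.13 s
8    | Claude 3.7 Sonnet            | 92.4%    | $3.00 / $15.00                       | 4.68 s
9    | Gemini 2.5 Pro Exp           | 92.2%    | $1.25 / $10.00                       | 9.39 s
10   | DeepSeek V3 (03/24/2025)     | 92.0%    | $1.20 / $1.20                        | 19.37 s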
Legend: ★ Best Performing · $ Best Budget
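The price column lists separate input and output rates, so what a single request costs depends on how many tokens go in and come out. The sketch below is a minimal illustration that assumes the prices are USD per 1M tokens and uses hypothetical token counts (500 input, 300 output). It picks the cheapest model that meets a minimum MGSM accuracy. The helper names `request_cost` and `cheapest_above` are hypothetical and not part of any leaderboard API. Latency is ignored here.

```python
# Figures copied from the leaderboard table:
# (model, MGSM accuracy %, input price, output price), prices assumed USD per 1M tokens.
LEADERBOARD = [
    ("o4 Mini", 93.4, 1.10, 4.40),
    ("Claude 3.7 Sonnet (Thinking)", 92.8, 3.00, 15.00),
    ("Qwen 3 (235B)", 92.7, 0.22, 0.88),
    ("DeepSeek V3", 92.5, 0.90, 0.90),
    ("Claude 3.5 Sonnet Latest", 92.5, 3.00, 15.00),
    ("Llama 4 Maverick", 92.5, 0.27, 0.85),
    ("DeepSeek R1", 92.4, 8.00, 8.00),
    ("Claude 3.7 Sonnet", 92.4, 3.00, 15.00),
    ("Gemini 2.5 Pro Exp", 92.2, 1.25, 10.00),
    ("DeepSeek V3 (03/24/2025)", 92.0, 1.20, 1.20),
]


def request_cost(input_price, output_price, input_tokens, output_tokens):
    """Hypothetical helper: estimated USD cost of one request, prices per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest_above(min_accuracy, input_tokens, output_tokens):
    """Hypothetical helper: (cost, model) of the cheapest model with accuracy >= min_accuracy.

    Returns None when no model meets the threshold.
    """
    candidates = [
        (request_cost(inp, out, input_tokens, output_tokens), name)
        for name, acc, inp, out in LEADERBOARD
        if acc >= min_accuracy
    ]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Assumed request size: 500 input tokens, 300 output tokens.
    print(cheapest_above(92.5, 500, 300))
```

A fixed output-token count is a simplification. Reasoning ("thinking") models usually emit many more output tokens per answer, so their real per-request cost can be considerably higher than this estimate suggests.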