Public Enterprise LLM Benchmarks

Best Performing Models

Top-performing models on the Vals Index, which covers tasks in finance, coding, and law.


Vals Index (10/29/2025)
1. Claude Sonnet 4.5 (Thinking), Anthropic: 66.7%
2. GPT 5.1, OpenAI: 61.1%
3. Claude Haiku 4.5 (Thinking), Anthropic: 59.9%

Best Open Weight Models

Top-performing open-weight models on the Vals Index, which covers tasks in finance, coding, and law.


Vals Index (10/29/2025)
1. GLM 4.6, zAI: 47.2%
2. Kimi K2 Thinking, Kimi: 42.4%
3. GLM 4.5, zAI: 42.0%

Pareto Efficient Models

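Top-performing models on the Vals Index that are also cost-efficient. Each sits on the accuracy vs. cost-per-test Pareto frontier, meaning no other model is both cheaper and more accurate.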


Vals Index (10/29/2025): accuracy vs. cost per test
1. Claude Sonnet 4.5 (Thinking), Anthropic: 66.7% accuracy, $4.05 per test
2. GPT 5.1, OpenAI: 61.1% accuracy, $0.29 per test
3. DeepSeek V3.1, DeepSeek: 31.4% accuracy, $0.18 per test
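The Python sketch below illustrates what "Pareto efficient" means here by reducing a list of (accuracy, cost per test) results to the frontier. It uses a generic single-pass approach and is not Vals AI's published method. The `ModelResult` type and `pareto_frontier` function are hypothetical names. The fourth entry is a made-up model, added only to show a dominated point being removed.

```python
from typing import Iterable, List, NamedTuple


class ModelResult(NamedTuple):
    name: str
    accuracy: float       # accuracy in percent
    cost_per_test: float  # cost in USD per test


def pareto_frontier(results: Iterable[ModelResult]) -> List[ModelResult]:
    """Return the models that no other model beats on both cost and accuracy.

    A model is dropped if another model is at least as cheap and strictly more
    accurate, or strictly cheaper and at least as accurate. Exact duplicates
    are kept only once.
    """
    frontier: List[ModelResult] = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, the more accurate model comes first.
    for r in sorted(results, key=lambda r: (r.cost_per_test, -r.accuracy)):
        # Keep a model only if it beats every cheaper (or equally cheap) model seen so far.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


results = [
    ModelResult("Claude Sonnet 4.5 (Thinking)", 66.7, 4.05),
    ModelResult("GPT 5.1", 61.1, 0.29),
    ModelResult("DeepSeek V3.1", 31.4, 0.18),
    # Hypothetical model, not from the leaderboard: pricier and less accurate than GPT 5.1.
    ModelResult("Hypothetical Model X", 50.0, 1.00),
]

for r in pareto_frontier(results):
    print(f"{r.name}: {r.accuracy:.1f}% at ${r.cost_per_test:.2f} per test")
```

Sorting by cost first means one pass is enough. A model joins the frontier only if it is more accurate than every cheaper model already seen. The hypothetical model (pricier and less accurate than GPT 5.1) is therefore filtered out, while the three listed models remain.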



Updates

11/14/2025: GPT 5.1 takes first place on Finance Agent!

Benchmarks

Number of models ranked on each benchmark:
SAGE: 23
FinanceAgent: 45
CorpFin: 59
CaseLaw: 41
TaxEval: 75
MortgageTax: 46
AIME: 65
MGSM: 67
LegalBench: 90
MedQA: 71
GPQA: 67
MMLU Pro: 65
MMMU: 43
LiveCodeBench: 65
IOI: 22
Terminal-Bench: 32
SWE-bench: 28
Vals Index: 21

The list includes both academic and proprietary benchmarks. Contact Vals AI to get access to the proprietary benchmarks.

Model benchmarks are seriously lacking. At Vals AI, we report how language models perform on the industry-specific tasks where they will actually be used.
