Best Performing Models

The top-performing models on the Vals Index, which covers a range of tasks across finance, coding, and law (Vals Index, 10/16/2025). An illustrative sketch of one way such an index could be aggregated follows the rankings.

1. Claude Sonnet 4.5 (Thinking), Anthropic: Vals Index Score 66.6%
2. GPT 5, OpenAI: Vals Index Score 62.3%
3. Claude Haiku 4.5 (Thinking), Anthropic: Vals Index Score 60%
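
This page does not describe how the Vals Index combines its underlying benchmark scores, so the following is only a minimal sketch of one possible aggregation, not Vals AI's actual methodology. It assumes an unweighted mean over whatever per-benchmark accuracies a model was evaluated on; the function name and signature are introduced here purely for illustration.

```python
from statistics import mean
from typing import Dict


def index_score(benchmark_accuracies: Dict[str, float]) -> float:
    """Aggregate per-benchmark accuracies (0-100) into a single index score.

    Illustrative assumption: an unweighted mean over the benchmarks the
    model was run on. The real Vals Index may weight benchmarks
    differently or normalize scores before aggregating.
    """
    if not benchmark_accuracies:
        raise ValueError("no benchmark scores provided")
    return mean(benchmark_accuracies.values())


# Usage: pass a model's published per-benchmark accuracies, e.g. a mapping
# with keys like "LegalBench", "CorpFin", or "TaxEval" (benchmarks listed
# later on this page), to get a single aggregate score.
```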

Best Open Weight Models

The top-performing open-weight models on the Vals Index, which covers a range of tasks across finance, coding, and law (Vals Index, 10/16/2025).

1. Qwen 3 Max, Alibaba: Vals Index Score 50.9%
2. GLM 4.5, zAI: Vals Index Score 42%
3. GPT OSS 120B, OpenAI: Vals Index Score 39.3%

Pareto Efficient Models

The top-performing models from the Vals Index that are also cost-efficient, plotted as accuracy against cost per test (Vals Index, 10/16/2025). A sketch of how such a Pareto frontier can be computed follows the rankings.

1. Claude Sonnet 4.5 (Thinking), Anthropic: Accuracy 66.6%, Cost per test $4.05
2. GPT 5, OpenAI: Accuracy 62.3%, Cost per test $2.46
3. Gemini 2.5 Pro, Google: Accuracy 51.2%, Cost per test $0.85
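
The page does not publish the exact procedure behind this chart. As a minimal sketch, a Pareto frontier over (cost per test, accuracy) pairs keeps every model that no other model beats on both axes at once. The `pareto_frontier` function below is an assumption for illustration, not Vals AI's implementation; the three data points are taken from the rankings above.

```python
from typing import List, Tuple

Model = Tuple[str, float, float]  # (name, cost_per_test_usd, accuracy_pct)


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return the models that are not dominated on (cost, accuracy).

    A model is dominated if some other model is at least as cheap and at
    least as accurate, and strictly better on at least one of the two axes.
    """
    frontier = []
    for name, cost, acc in models:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for n, c, a in models
            if n != name
        )
        if not dominated:
            frontier.append((name, cost, acc))
    # Sort by cost so the frontier reads left to right on the chart.
    return sorted(frontier, key=lambda m: m[1])


# Points from the rankings above.
models = [
    ("Claude Sonnet 4.5 (Thinking)", 4.05, 66.6),
    ("GPT 5", 2.46, 62.3),
    ("Gemini 2.5 Pro", 0.85, 51.2),
]
print(pareto_frontier(models))
# All three lie on the frontier: each model is cheaper than every model
# that is more accurate than it.
```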
Featured in

Washington Post (Aug 27, 2025): "We tested which AI gave the best answers without making stuff up. One beat ChatGPT."
Industry Leaderboard

Legal Industry: no benchmark data found for this category.

Updates

10/16/2025: Claude Haiku 4.5 Evaluated on All Benchmarks!

Benchmarks

SAGE
FinanceAgent
CorpFin
CaseLaw
TaxEval
MortgageTax
AIME
MGSM
LegalBench
MedQA
GPQA
MMLU Pro
MMMU
LiveCodeBench
IOI
Terminal-Bench
Vals Index

Academic Benchmarks
Proprietary Benchmarks (contact us to get access)

Join our mailing list to receive benchmark updates

Model benchmarks are seriously lacking. With Vals AI, we report how language models perform on the industry-specific tasks where they will be used.
