Independent Evaluation, Unbiased Benchmarks

Testing AI on Real-World Tasks

We benchmark the world's leading AI models on rigorous, domain-specific tasks in finance, law, software, healthcare, and more. We run all of our own evaluations and create many of our benchmarks in-house.

Vals AI Updates

Fresh updates from our testing queue

benchmark
05/12/2026

Finance Agent v2 Released

Rank  Accuracy
1     51.76% ± 0.55
2     51.51% ± 0.49
3     51.03% ± 0.33
4     45.36% ± 0.45
5     44.87% ± 0.76
6     44.79% ± 0.61
7     44.08% ± 0.65
8     42.98% ± 1.21
9     42.55% ± 0.43
10    40.85% ± 0.13

Showing the top 10 models from the benchmark. Visit the benchmark page to view more.

Industry Leaderboard

Independent benchmarks for industry-specific AI performance.

Model Performance Over Time

Tracking how foundation models improve with each release