Benchmark
ProofBench Released: Evaluating Formal Mathematical Reasoning
Vals AI
Model
Kimi K2.5 Evaluated on (almost) all benchmarks!
Vals AI
Model
Kimi K2.5 sets a new open-source standard
Vals AI
Benchmark
Results for Terminal-Bench 2.0 Released!
Vals AI
Benchmark
Poker Agent Released
Vals AI
Model
Full Results Released for MiniMax M2.1
Vals AI
Model
Full Results Released for GLM 4.7
Vals AI
Model
MiniMax M2.1 evaluated on Vals Index
Vals AI
Model
GLM 4.7 Takes 1st Place on Open-Weight Leaderboard
Vals AI
Model
Gemini 3 Flash takes first place on SWE-Bench!
Vals AI