We’re excited to introduce the Multimodal (image-based) variant of our Vals Index!
It extends the original Vals Index with two of our private multimodal benchmarks:
- MortgageTax, which tests models’ ability to read and understand tax certificates
- SAGE, which tests models’ ability to grade handwritten undergraduate student work
As with our Vals Index, Claude Sonnet 4.5 (Thinking) tops the Vals Multimodal Index leaderboard, followed closely by GPT 5 and Claude Haiku 4.5 (Thinking).
Price-conscious users might turn to Claude Haiku 4.5 (Thinking) or GPT 5 Mini, while Gemini 2.5 Pro and Grok 4 lag behind.
For our initial release, we include only closed-source models from four major providers: OpenAI, Anthropic, Google, and xAI.
In the future, we hope to add open-source multimodal models like Qwen 3 VL Plus.