Updates
News
12/11/2024
Refresh to Vals AI
We’ve just redesigned this benchmarking website!
Apart from being easier on the eyes, the new version of the site is much more useful:
- Model cards are displayed on their own dedicated pages, showing results across all benchmarks.
- Every Benchmark page is time-stamped and updated with changelogs.
- Our Methodology page now gives more detail on our approach and plans.
Model
11/10/2024
Results for the new Claude 3.5 Sonnet (Upgraded) model
- On LegalBench, it’s now exactly tied with GPT-4o, and it beats GPT-4o on CorpFin and CaseLaw.
- It usually, but not always, scores slightly higher than the previous version, for example on LegalBench (+1.3%), ContractLaw Overall (+0.5%), and CorpFin (+0.8%).
- It regressed on some tasks, including TaxEval Free Response (-3.2%) and CaseLaw Overall (-0.1%).
- Although it’s competitive with GPT-4o, it’s not yet at the level of GPT o1, which still claims the top spots on almost all of our leaderboards.
News
10/31/2024
Vals AI Legal Report Announced
Vals AI and Legaltech Hub are partnering with leading law firms and top legal AI vendors to conduct a first-of-its-kind benchmark.
The study will evaluate the platforms across eight legal tasks, including Document Q&A, Legal Research, and EDGAR Research. All data will be collected from the law firms to ensure it’s representative of real legal work.
The report will be published in early 2025.
Benchmarks
Latest Model Releases
Claude 3.5 Sonnet
Release date: 10/22/2024
o1 Preview
Release date: 9/12/2024
GPT-4o
Release date: 8/6/2024
Llama 3.1 Instruct Turbo (405B)
Release date: 7/23/2024