Updates
Model
02/03/2025
OpenAI's o3-mini Evaluated on All Benchmarks
We just evaluated OpenAI’s o3-mini model!
- The model shows a good price-performance trade-off, ranking near the top on our most recent proprietary benchmarks, such as TaxEval.
- However, o3-mini seems to struggle with large context windows and performs poorly on the Max Fitting Context task of CorpFin. It tends to lose track of the question when the question is placed at the beginning of a large context window (around 150k tokens or more).
We also ran DeepSeek R1 on our CorpFin benchmark, where it takes first place, outperforming every other model we have tested.
Model
01/28/2025
DeepSeek R1 Evaluated on TaxEval, CaseLaw, ContractLaw
🐳 We just evaluated DeepSeek’s R1 model on three of our private datasets! 🐳
- The model demonstrates strong reasoning ability, rivaling OpenAI’s o1 model on our Tax dataset.
- However, R1 performs extremely poorly on ContractLaw and only moderately well on CaseLaw. Its performance is uneven across tasks, which suggests that task-specific evaluation is needed before adoption.
- Overall, this large Chinese model shows impressive ability and further closes the gap between closed and open-source models.
Benchmark
01/27/2025
Two New Proprietary Benchmarks Released
We just released two new benchmarks!
- We have released a completely new version of our CorpFin benchmark, with 1,200 expert-generated financial questions on very long documents (200 to 300 pages).
- We have also released a completely new TaxEval benchmark, with more than 1,500 expert-reviewed tax questions.
We are also adding results for several new models, such as Grok 2 and Gemini 2.0 Flash Exp.
Latest Model Releases
OpenAI o3-mini
Release date: 1/31/2025
DeepSeek R1
Release date: 1/20/2025
DeepSeek V3
Release date: 12/26/2024
o1
Release date: 12/17/2024