We evaluated Magistral Medium 1.2 (09/2025) and Magistral Small 1.2 (09/2025) and found that both models perform decently for their size, especially on coding tasks. However, both also struggled on many benchmarks.
- Magistral Medium performs well on academic and coding benchmarks, placing in the top 20 on both LiveCodeBench and AIME. However, it struggles on our proprietary benchmarks, particularly MortgageTax and CaseLaw.
- Surprisingly, Magistral Small tends to do better on finance and academic benchmarks, most notably outperforming Medium on MortgageTax (+8.8%). It also performs well on LiveCodeBench and AIME. However, Small struggles on our proprietary CorpFin and CaseLaw benchmarks, as well as on GPQA and MMLU Pro.
- A large chunk of the performance loss came from the models failing to return answers in the required output format, rather than from wrong answers alone.
The Medium model is priced at $2 / $5 (input / output), and the Small at $0.50 / $1.50. The Small model has open weights, whereas the Medium model is available only via API.
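For a rough cost comparison, the quoted prices can be plugged into a simple calculation. This sketch assumes the prices are per million input/output tokens (the usual convention for API pricing) and uses a hypothetical workload of 1M input and 0.2M output tokens:

```python
# Quoted prices (input, output); assumed to be USD per million tokens.
PRICES = {
    "magistral-medium": (2.00, 5.00),
    "magistral-small": (0.50, 1.50),
}

def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    in_price, out_price = PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

# Hypothetical workload: 1M input tokens, 0.2M output tokens.
print(workload_cost("magistral-medium", 1.0, 0.2))  # 3.0
print(workload_cost("magistral-small", 1.0, 0.2))   # 0.8
```

At this ratio, Small is roughly 3-4x cheaper than Medium, which matters when its accuracy is comparable or better, as on MortgageTax.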