Last Updated 5/22/2025

anthropic/claude-sonnet-4-20250514-thinking

Claude Sonnet 4 (Thinking)

Anthropic's latest-generation workhorse model, offering a balance of performance and speed.

Release Date: 5/22/2025

Avg. Accuracy: 75.5%

Latency: 74.14s

Performance by Benchmark

| Benchmark | Accuracy | Ranking |
|---|---|---|
| FinanceAgent | 44.5% | 2 / 29 |
| CorpFin | 67.3% | 9 / 41 |
| CaseLaw | 85.3% | 5 / 69 |
| ContractLaw | 66.0% | 40 / 72 |
| TaxEval | 75.9% | 13 / 56 |
| MortgageTax | 62.5% | 27 / 33 |
| Math500 | 93.8% | 10 / 52 |
| AIME | 76.3% | 11 / 46 |
| MGSM | 90.9% | 21 / 49 |
| LegalBench | 81.3% | 15 / 72 |
| MedQA | 92.7% | 8 / 49 |
| GPQA | 74.5% | 10 / 48 |
| MMLU Pro | 83.8% | 6 / 46 |
| LiveCodeBench | 62.4% | 16 / 47 |
| MMMU | 74.9% | 9 / 30 |
Academic Benchmarks

Proprietary Benchmarks (contact us to get access)

Cost Analysis

Input Cost: $3.00 / M Tokens

Output Cost: $15.00 / M Tokens

Input Cost (per char): N/A

Output Cost (per char): N/A
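To make the per-million-token prices concrete, here is a minimal sketch that estimates the cost of a single request from the two rates listed above. The function name and the token counts in the example are hypothetical; real counts depend on the tokenizer and on how much the model writes.

```python
# Prices from the Cost Analysis figures above, in USD per million tokens.
INPUT_USD_PER_M_TOKENS = 3.00
OUTPUT_USD_PER_M_TOKENS = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: billing is linear in token count at the listed rates,
    with no discounts, caching, or minimum charges.
    """
    total = (input_tokens * INPUT_USD_PER_M_TOKENS
             + output_tokens * OUTPUT_USD_PER_M_TOKENS)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.2f}")  # $0.06
```

Because output tokens cost five times as much as input tokens, a request with long responses can cost more than one with long prompts even when its total token count is lower.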
