Anthropic's most intelligent model.

Release Date: 2/24/2025

Avg. Accuracy: 78.8%

Latency: 61.51s

Performance by Benchmark

| Benchmark   | Accuracy | Ranking |
|-------------|----------|---------|
| CorpFin     | 68.0%    | 1 / 24  |
| CaseLaw     | 84.8%    | 4 / 43  |
| ContractLaw | 73.0%    | 4 / 50  |
| TaxEval     | 78.4%    | 2 / 30  |
| MortgageTax | 79.2%    | 2 / 14  |
| Math500     | 91.6%    | 3 / 26  |
| AIME        | 52.7%    | 4 / 22  |
| MGSM        | 92.8%    | 1 / 24  |
| LegalBench  | 79.3%    | 6 / 48  |
| MedQA       | 90.2%    | 7 / 28  |
| GPQA        | 75.3%    | 1 / 23  |
| MMLU Pro    | 82.7%    | 2 / 24  |
| MMMU        | 76.0%    | 2 / 13  |

The table above includes both academic benchmarks and proprietary benchmarks (contact us to get access to the proprietary ones).

Cost Analysis

| Metric                 | Cost              |
|------------------------|-------------------|
| Input Cost             | $3.00 / M tokens  |
| Output Cost            | $15.00 / M tokens |
| Input Cost (per char)  | $0.91 / M chars   |
| Output Cost (per char) | N/A               |
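Assuming the per-token and per-char input prices describe the same underlying input stream, their ratio implies an average character-per-token rate for input text. A quick sketch of that back-of-the-envelope conversion:

```python
# Derive the average characters-per-token implied by the two listed input
# prices. This assumes $3.00 / M tokens and $0.91 / M chars are alternate
# quotes for the same input stream, so their ratio is chars per token.

INPUT_COST_PER_M_TOKENS = 3.00  # USD per 1M input tokens
INPUT_COST_PER_M_CHARS = 0.91   # USD per 1M input characters

chars_per_token = INPUT_COST_PER_M_TOKENS / INPUT_COST_PER_M_CHARS

print(f"Implied average: {chars_per_token:.2f} chars/token")  # ≈ 3.30
```

This works out to roughly 3.3 characters per input token, a typical figure for English prose.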

Overview

Important: This evaluation was performed with Thinking Mode enabled. For results with Thinking Mode disabled, see the standard Claude 3.7 Sonnet entry.

To ensure a fair comparison, we applied consistent thinking- and output-token limits across all reasoning models, giving each model enough space for complete responses.

Claude 3.7 Sonnet is Anthropic's latest model, succeeding Claude 3.5 Sonnet (Latest), which was released in October 2024.

What sets Claude 3.7 apart from its predecessors and competitors is its hybrid architecture, which makes thinking capabilities optional and fully configurable. Users can specify the number of thinking tokens independently from output tokens. These thinking tokens are preserved after generation, enabling users to examine and analyze the model’s reasoning process.
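A minimal sketch of how that configurable thinking budget is exposed through Anthropic's Messages API. The model ID and token budgets below are illustrative; verify them against Anthropic's current API documentation:

```python
# Sketch: request payload for Claude 3.7 Sonnet with extended thinking enabled.
# The thinking budget is specified independently of (and must be below) the
# overall max_tokens; the returned "thinking" content blocks preserve the
# model's reasoning so it can be examined after generation.

request = {
    "model": "claude-3-7-sonnet-20250219",  # illustrative model ID
    "max_tokens": 16000,                    # total output budget
    "thinking": {
        "type": "enabled",
        "budget_tokens": 8000,              # thinking tokens, set independently
    },
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}

# With the official SDK this would be sent as:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request)
# and response.content would contain "thinking" blocks ahead of the final
# "text" block.

assert request["thinking"]["budget_tokens"] < request["max_tokens"]
```

The key design point is that `budget_tokens` caps reasoning separately from the visible answer, so users can trade latency and cost against reasoning depth per request.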

Key Specifications

  • Context Window: 200,000 tokens
  • Max Output Tokens: 8,192 tokens
  • Extended Thinking: 64,000 tokens
  • Training Cutoff: October 2024
  • Pricing:
    • Input: $3.00 / 1M tokens
    • Output: $15.00 / 1M tokens
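The listed rates make per-request cost easy to estimate. A short sketch, assuming (as is common for reasoning models, but worth checking against Anthropic's pricing docs) that thinking tokens are billed at the output rate:

```python
# Estimate the USD cost of a single request at the listed
# Claude 3.7 Sonnet rates: $3.00 / 1M input, $15.00 / 1M output.
# Assumption: thinking tokens are billed as output tokens.

INPUT_PRICE = 3.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Example: a 2,000-token prompt with 8,000 combined thinking + output tokens.
print(f"${request_cost(2_000, 8_000):.4f}")  # $0.1260
```

At these rates, output (including any thinking budget that is consumed) dominates cost, so tightening the thinking budget is the main lever for cheaper requests.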