
Anthropic

Claude 3.5 Sonnet

Manual tier · anthropic/claude-3.5-sonnet

Refusal Rate: 58% (+43.3%)
#18 of 22 models

Evaluations: 3,042

Cost per 1M input tokens: $3
Cost per 1M output tokens: $15
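To turn the per-million-token prices above into a per-request figure, here is a minimal sketch. It assumes billing is strictly linear in token counts, with no prompt caching, batch, or volume discounts (this page does not cover any of those). The function name `request_cost` and the token counts are hypothetical.

```python
# Per-million-token prices listed above, in USD.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call, assuming linear per-token pricing."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


if __name__ == "__main__":
    # Hypothetical call: 2,000 prompt tokens and 500 completion tokens.
    print(f"${request_cost(2_000, 500):.4f}")  # prints $0.0135
```

Because output tokens cost five times as much as input tokens here, long completions dominate the bill even when prompts are larger.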

Refusal Rate by Category

Crime: 88%
Cybersecurity: 88%
Deception: 88%
Harassment: 88%
Self-Harm: 88%
Theft: 88%
Health Misinformation: 75%
Hate Speech: 64%
Explicit/Sexual: 64%
Incitement to Violence: 46%
Misinformation: 44%
False Positive Control: 9%
Dangerous: 0%
Medical Misinformation: 0%
Violence: 0%

Analysis Deep Dives

Council Consensus

Majority Agreement: 77.5%

How often the model's decision aligns with the council's decision.

CAPP Score: 0.35
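The page does not say how Majority Agreement is computed. One plausible reading, sketched here purely as an assumption, is the share of evaluated prompts on which the model's verdict matches the most common verdict among council members. The verdict labels, the council size, and the helpers `majority_label` and `majority_agreement` are all hypothetical.

```python
from collections import Counter


def majority_label(votes: list[str]) -> str:
    """Most common verdict among council members; ties go to the first one seen."""
    return Counter(votes).most_common(1)[0][0]


def majority_agreement(model_verdicts: list[str],
                       council_votes: list[list[str]]) -> float:
    """Fraction of items where the model's verdict equals the council majority."""
    if len(model_verdicts) != len(council_votes):
        raise ValueError("need one set of council votes per model verdict")
    if not model_verdicts:
        raise ValueError("no items to score")
    matches = sum(
        verdict == majority_label(votes)
        for verdict, votes in zip(model_verdicts, council_votes)
    )
    return matches / len(model_verdicts)


if __name__ == "__main__":
    # Toy data: four prompts, each judged by a hypothetical three-member council.
    model = ["refuse", "comply", "refuse", "comply"]
    council = [
        ["refuse", "refuse", "comply"],
        ["comply", "comply", "comply"],
        ["comply", "comply", "refuse"],
        ["comply", "refuse", "comply"],
    ]
    print(f"{majority_agreement(model, council):.1%}")  # prints 75.0%
```

With an odd-sized council and two verdict labels there are no ties; larger label sets would need an explicit tie-breaking rule.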

Political Compass
Economic (Left → Right): +1.9
Social (Libertarian → Authoritarian): -6.0

Model Stability (Drift)

Refusal Rate Change: +40.3 percentage points

Change in refusal rate between the start and the end of the testing period.

Start: 34.93%
End: 75.19%
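The headline drift figure matches a simple end-minus-start difference in percentage points (75.19 - 34.93 ≈ 40.3), not a relative change. The sketch below computes both to show the distinction; the function names are illustrative only.

```python
def refusal_rate_change(start_pct: float, end_pct: float) -> float:
    """Absolute change in percentage points: end rate minus start rate."""
    return end_pct - start_pct


def relative_change_pct(start_pct: float, end_pct: float) -> float:
    """Relative change, as a percentage of the starting rate."""
    return (end_pct - start_pct) / start_pct * 100


if __name__ == "__main__":
    start, end = 34.93, 75.19  # start and end refusal rates reported above
    print(f"{refusal_rate_change(start, end):+.1f} percentage points")  # +40.3 percentage points
    print(f"{relative_change_pct(start, end):+.1f}% relative")  # +115.3% relative
```

Read as a relative change, the same data shows the refusal rate more than doubling over the testing period.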

Paternalism Audit

Persona Refusal Rate: 57.7%

Refusal rate on prompts framed around sensitive user personas.