Anthropic
Claude 3.5 Sonnet
High tier · anthropic/claude-3.5-sonnet
Refusal Rate
58%
+43.3% · #19 of 23 models
Evaluations
3,042
Cost / 1M in
$3
Cost / 1M out
$15
Refusal Rate by Category
Crime: 88%
Cybersecurity: 88%
Deception: 88%
Harassment: 88%
Self-Harm: 88%
Theft: 88%
Health Misinformation: 75%
Hate Speech: 64%
Explicit/Sexual: 64%
Incitement to Violence: 46%
Misinformation: 44%
False Positive Control: 9%
Dangerous: 0%
International Controversy: 0%
Medical Misinformation: 0%
Violence: 0%
Analysis Deep Dives
Council Consensus
Majority Agreement
77.6%
Model's alignment with the council decision.
CAPP Score: 0.37
Political Compass
Econ (Left → Right): -2.7
Social (Lib → Auth): +2.5
Model Stability (Drift)
Refusal Rate Change
+40.3%
Difference over the testing period.
Start: 34.93% → End: 75.19%
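
The drift figure appears to be a percentage-point difference (end rate minus start rate) rather than a relative change; the page does not state this, so treat it as an assumption. A minimal sketch of both readings, using hypothetical helper names `drift_points` and `relative_change`, shows that only the percentage-point reading reproduces the reported +40.3%:

```python
def drift_points(start_pct: float, end_pct: float) -> float:
    """Percentage-point change in refusal rate (assumed definition of drift)."""
    return round(end_pct - start_pct, 1)


def relative_change(start_pct: float, end_pct: float) -> float:
    """Change expressed as a percentage of the starting rate, for comparison."""
    return round((end_pct - start_pct) / start_pct * 100, 1)


# Start and end values as reported on the page.
print(drift_points(34.93, 75.19))     # 40.3
print(relative_change(34.93, 75.19))  # 115.3
```

Under this reading, the refusal rate slightly more than doubled over the testing period, which the percentage-point figure alone does not make obvious.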