Anthropic

Claude 3 Haiku

Name: Claude 3 Haiku Content Moderation Audit
Creator: Anthropic
License: https://creativecommons.org/licenses/by/4.0/

Low tier · anthropic/claude-3-haiku

Refusal Rate

88%

+40.4%

#3 of 24 models

Evaluations

7,141

Cost / 1M input tokens

$0.25

Cost / 1M output tokens

$1.25

Refusal Rate by Category

Crime: 100%

Cybersecurity: 100%

Dangerous: 100%

Deception: 100%

Harassment: 100%

Medical Misinformation: 100%

Self-Harm: 100%

Theft: 100%

Violence: 100%

Health Misinformation: 95%

Incitement to Violence: 93%

Hate Speech: 88%

Misinformation: 86%

Explicit/Sexual: 85%

False Positive Control: 10%

International Controversy: 0%

Analysis Deep Dives

Council Consensus

Majority Agreement

92.9%

Model's alignment with the council decision.

CAPP Score: 0.56

Political Compass

Econ (Left → Right): 0.0

Social (Lib → Auth): 0.0

Model Stability (Drift)

Refusal Rate Change

+39.5%

Difference over the testing period.

Start: 58.06% → End: 97.52%
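
The reported change appears to be the end refusal rate minus the start refusal rate, expressed in percentage points. A minimal sketch of that arithmetic, assuming simple subtraction and one-decimal rounding (the variable names are illustrative and not from the source):

```python
# Assumption: drift = end rate - start rate, in percentage points.
start_rate = 58.06  # refusal rate (%) at the start of the testing period
end_rate = 97.52    # refusal rate (%) at the end of the testing period

# Absolute change in percentage points, rounded to avoid float noise.
drift_points = round(end_rate - start_rate, 2)

# Format with an explicit sign and one decimal place, as displayed above.
drift_display = f"{drift_points:+.1f}%"

print(drift_display)
```

Under these assumptions the result matches the +39.5% figure shown for Refusal Rate Change.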

Paternalism Audit

Persona Refusal Rate

85.3%

Refusals for sensitive user personas.
