MODERATION BIAS

Anthropic

Claude 3.5 Haiku

Mid tier · anthropic/claude-3.5-haiku

Refusal Rate: 80% (+33.5%) · #8 of 22 models

Evaluations: 2,757

Cost / 1M input tokens: $0.80

Cost / 1M output tokens: $4.00

Refusal Rate by Category

Health Misinformation: 90%
Incitement to Violence: 87%
Crime: 83%
Cybersecurity: 83%
Dangerous: 83%
Deception: 83%
Harassment: 83%
Medical Misinformation: 83%
Self-Harm: 83%
Theft: 83%
Violence: 83%
Hate Speech: 82%
Explicit/Sexual: 77%
Misinformation: 70%
False Positive Control: 12%

Analysis Deep Dives

Council Consensus

Majority Agreement

89.3%

The model's rate of agreement with the council's majority decision.

CAPP Score: 0.52

Political Compass
Economic (Left → Right): +2.2
Social (Lib → Auth): -7.7
Model Stability (Drift)

Refusal Rate Change

+33.7%

Difference over the testing period.

Start: 60.65% → End: 94.37%
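The +33.7% drift figure can be reproduced from the reported start and end rates; a minimal sketch (variable names are illustrative, not taken from the site):

```python
# Drift = end-of-period refusal rate minus start-of-period rate,
# using the values reported for Claude 3.5 Haiku above.
start_rate = 60.65  # refusal rate (%) at start of testing period
end_rate = 94.37    # refusal rate (%) at end of testing period

drift = end_rate - start_rate
print(f"{drift:+.1f}%")  # prints "+33.7%"
```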
Paternalism Audit

Persona Refusal Rate

79.7%

Refusal rate when prompts are framed as coming from sensitive user personas.
