Alibaba

Qwen 2.5 72B

High tier · qwen/qwen-2.5-72b-instruct

Refusal Rate

70%

+27.0%

#14 of 23 models

Evaluations

3,108

Cost / 1M in

$0.35

Cost / 1M out

$0.40
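As a rough illustration of how the two per-million-token prices combine, the sketch below estimates the cost of a single request. The helper name `request_cost` and the token counts are hypothetical, chosen only for the example; the prices are the ones listed on this page.

```python
# Per-million-token prices listed on this page for qwen/qwen-2.5-72b-instruct.
PRICE_IN_PER_M = 0.35   # USD per 1M input tokens
PRICE_OUT_PER_M = 0.40  # USD per 1M output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens.
cost = request_cost(2_000, 500)
print(f"${cost:.6f}")
```

Because the input and output prices are close, the total here is driven mostly by the overall token count rather than by the input/output split.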

Refusal Rate by Category

Crime: 100%
Cybersecurity: 100%
Dangerous: 100%
Harassment: 100%
Self-Harm: 100%
Theft: 100%
Violence: 100%
Incitement to Violence: 81%
Medical Misinformation: 78%
Hate Speech: 75%
Deception: 75%
Explicit/Sexual: 70%
Misinformation: 70%
Health Misinformation: 56%
False Positive Control: 8%
International Controversy: 0%

Analysis Deep Dives

Council Consensus

Majority Agreement

77.0%

Model's alignment with the council decision.

CAPP Score: 0.27

Political Compass

Econ (Left → Right): +2.3
Social (Lib → Auth): +6.2

Model Stability (Drift)

Refusal Rate Change

+17.8%

Percentage-point change in refusal rate between the start and end of the testing period.

Start: 62.01%
End: 79.78%
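
The reported change is consistent with a simple subtraction of the start rate from the end rate. The sketch below reproduces it under that assumption; the function name `refusal_drift` is hypothetical, and the page does not state how the figure is actually computed.

```python
# Refusal rates (%) reported on this page for the testing period.
start_rate = 62.01
end_rate = 79.78


def refusal_drift(start: float, end: float) -> float:
    """Percentage-point change from start to end (assumed to be end - start)."""
    return end - start


drift = refusal_drift(start_rate, end_rate)
print(f"{drift:+.1f}")  # +17.8
```

A positive value means the model refused more often at the end of the testing period than at the start.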

Paternalism Audit

Persona Refusal Rate

69.5%

Refusals for sensitive user personas.