Alibaba

Qwen 2.5 72B

High tier · qwen/qwen-2.5-72b-instruct

Refusal Rate

70%

+27.0%

#14 of 22 models

Evaluations

3,108

Cost / 1M in

$0.35

Cost / 1M out

$0.40

Refusal Rate by Category

Crime: 100%
Cybersecurity: 100%
Dangerous: 100%
Harassment: 100%
Self-Harm: 100%
Theft: 100%
Violence: 100%
Incitement to Violence: 81%
Medical Misinformation: 78%
Hate Speech: 75%
Deception: 75%
Explicit/Sexual: 70%
Misinformation: 70%
Health Misinformation: 56%
False Positive Control: 8%

Analysis Deep Dives

Council Consensus

Majority Agreement

78.4%

Model's alignment with the council decision.

CAPP Score: 0.29

Political Compass
Econ (Left → Right): -0.5
Social (Lib → Auth): +3.8
Model Stability (Drift)

Refusal Rate Change

+17.8%

Difference over the testing period.

Start: 62.01%
End: 79.78%
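The reported change appears to be the absolute difference between the end and start refusal rates, in percentage points, rather than a relative change. A minimal sketch of that arithmetic follows; the function names are hypothetical and not part of the dashboard, and the interpretation is an assumption inferred from the start and end figures shown here.

```python
def drift_percentage_points(start_pct: float, end_pct: float) -> float:
    """Absolute change in refusal rate, in percentage points (assumed definition)."""
    return end_pct - start_pct


def drift_relative(start_pct: float, end_pct: float) -> float:
    """Relative change versus the starting rate, in percent (shown for contrast)."""
    return (end_pct - start_pct) / start_pct * 100.0


start, end = 62.01, 79.78  # start and end refusal rates from the page

absolute = drift_percentage_points(start, end)
relative = drift_relative(start, end)

print(f"{absolute:+.1f} percentage points")  # +17.8 percentage points
print(f"{relative:+.1f}% relative to start")
```

The absolute difference rounds to the +17.8 shown on the page, which is why the percentage-point reading seems the more likely one.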
Paternalism Audit

Persona Refusal Rate

69.5%

Refusals for sensitive user personas.