23 models ranked by refusal rate, with 95% Wilson confidence intervals.
| # | Model | Provider | Refusal Rate | 95% CI | Avg Words | Evaluations |
|---|-------|----------|--------------|--------|-----------|-------------|
| 1 | qwen-2.5-7b-instruct | qwen | 88.5% | 87.8–89.1% | 147 | 9,433 |
| 2 | gemini-2.0-flash-lite-001 | google | 87.0% | 86.2–87.7% | 212 | 7,031 |
| 3 | claude-3-haiku | anthropic | 85.8% | 85.0–86.6% | 219 | 7,341 |
| 4 | gpt-4o-mini | openai | 84.4% | 83.5–85.2% | 190 | 7,500 |
| 5 | qwen-plus | qwen | 83.1% | 81.6–84.4% | 200 | 2,757 |
| 6 | ministral-8b | mistralai | 82.8% | 81.9–83.7% | 154 | 7,031 |
| 7 | mistral-large | mistralai | 81.5% | 80.0–82.9% | 203 | 2,912 |
| 8 | ministral-14b-2512 | mistralai | 79.8% | 79.2–80.4% | 275 | 17,884 |
| 9 | claude-3.5-haiku | anthropic | 79.7% | 78.2–81.2% | 152 | 2,757 |
| 10 | deepseek-chat | deepseek | 78.9% | 77.5–80.3% | 156 | 3,167 |
| 11 | mistral-small-24b-instruct-2501 | mistralai | 76.8% | 75.1–78.3% | 150 | 2,757 |
| 12 | grok-3-mini | x-ai | 74.5% | 73.9–75.1% | 184 | 19,866 |
| 13 | gpt-4o | openai | 71.7% | 70.1–73.3% | 180 | 3,067 |
| 14 | qwen-2.5-72b-instruct | qwen | 69.5% | 67.9–71.1% | 155 | 3,108 |
| 15 | gemini-2.0-flash-001 | google | 68.0% | 66.2–69.7% | 214 | 2,756 |
| 16 | gemini-2.5-flash-lite-preview-09-2025 | google | 67.4% | 66.5–68.3% | 190 | 9,910 |
| 17 | gemini-3.1-flash-lite-preview | google | 66.7% | 65.7–67.7% | 193 | 7,974 |
| 18 | gpt-5.4-mini | openai | 64.4% | 62.9–65.8% | 161 | 4,010 |
| 19 | claude-3.5-sonnet | anthropic | 57.7% | 55.9–59.4% | 206 | 3,042 |
| 20 | claude-haiku-4.5 | anthropic | 56.8% | 56.0–57.5% | 285 | 17,884 |
| 21 | qwen2.5-coder-7b-instruct | qwen | 50.8% | 50.0–51.6% | 4,858 | 15,879 |
| 22 | gemini-2.5-pro | google | 30.4% | 27.3–33.8% | 207 | 775 |
| 23 | hermes-3-llama-3.1-405b:free | nousresearch | 1.3% | 1.1–1.6% | 83 | 7,974 |
Refusal rate is the proportion of prompts where the model refused or added unsolicited caveats. Intervals are 95% Wilson score confidence intervals. Rank 1 is the most restrictive model.
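For readers who want to check an interval, the following is a minimal sketch of the standard Wilson score formula. It is not taken from the leaderboard's own code: the function name `wilson_interval` and the choice of z = 1.96 for the 95% level are assumptions made here for illustration. It takes the published refusal rate and evaluation count as inputs; because the published rates are rounded to one decimal place, the last digit of a recomputed bound can occasionally differ from the table.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for a proportion.

    p_hat: observed proportion (e.g. refusal rate as a fraction, 0..1)
    n:     number of evaluations
    z:     normal quantile; 1.96 is assumed here for a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")

    z2 = z * z
    denom = 1.0 + z2 / n
    # Centre of the interval is pulled slightly toward 0.5.
    center = (p_hat + z2 / (2 * n)) / denom
    # Half-width of the interval around that centre.
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half


# First row of the table: 88.5% refusal rate over 9,433 evaluations.
low, high = wilson_interval(0.885, 9433)
print(f"{low:.1%} to {high:.1%}")  # prints "87.8% to 89.1%"
```

The interval narrows as the evaluation count grows, which is why models with around 18,000 evaluations show bounds roughly 0.6 points from the point estimate, while gemini-2.5-pro, with 775 evaluations, spans several points.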