Model Leaderboard

23 models ranked by refusal rate, from highest to lowest, each with a 95% Wilson score confidence interval.

#   Model                                  Provider      Refusal Rate  95% CI      Avg Words  Evaluations
1   qwen-2.5-7b-instruct                   qwen          88.5%         87.8–89.1%  147        9,433
2   gemini-2.0-flash-lite-001              google        87.0%         86.2–87.7%  212        7,031
3   claude-3-haiku                         anthropic     85.8%         85.0–86.6%  219        7,341
4   gpt-4o-mini                            openai        84.4%         83.5–85.2%  190        7,500
5   qwen-plus                              qwen          83.1%         81.6–84.4%  200        2,757
6   ministral-8b                           mistralai     82.8%         81.9–83.7%  154        7,031
7   mistral-large                          mistralai     81.5%         80.0–82.9%  203        2,912
8   ministral-14b-2512                     mistralai     79.8%         79.2–80.4%  275        17,884
9   claude-3.5-haiku                       anthropic     79.7%         78.2–81.2%  152        2,757
10  deepseek-chat                          deepseek      78.9%         77.5–80.3%  156        3,167
11  mistral-small-24b-instruct-2501        mistralai     76.8%         75.1–78.3%  150        2,757
12  grok-3-mini                            x-ai          74.5%         73.9–75.1%  184        19,866
13  gpt-4o                                 openai        71.7%         70.1–73.3%  180        3,067
14  qwen-2.5-72b-instruct                  qwen          69.5%         67.9–71.1%  155        3,108
15  gemini-2.0-flash-001                   google        68.0%         66.2–69.7%  214        2,756
16  gemini-2.5-flash-lite-preview-09-2025  google        67.4%         66.5–68.3%  190        9,910
17  gemini-3.1-flash-lite-preview          google        66.7%         65.7–67.7%  193        7,974
18  gpt-5.4-mini                           openai        64.4%         62.9–65.8%  161        4,010
19  claude-3.5-sonnet                      anthropic     57.7%         55.9–59.4%  206        3,042
20  claude-haiku-4.5                       anthropic     56.8%         56.0–57.5%  285        17,884
21  qwen2.5-coder-7b-instruct              qwen          50.8%         50.0–51.6%  4858       15,879
22  gemini-2.5-pro                         google        30.4%         27.3–33.8%  207        775
23  hermes-3-llama-3.1-405b:free           nousresearch  1.3%          1.1–1.6%    83         7,974

Refusal rate = proportion of prompts where the model refused or added unsolicited caveats. Intervals are 95% Wilson score confidence intervals. Rank 1 is the most restrictive model (highest refusal rate).
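
For readers who want to check an interval themselves, here is a minimal sketch of the standard Wilson score interval for a proportion. It is not the leaderboard's own code. The function name wilson_interval is hypothetical, and because the page reports only rounded rates, the refusal count below is reconstructed from the published rate and evaluation count, which is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(refusals: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Return the (low, high) Wilson score interval for refusals / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= refusals <= total:
        raise ValueError("refusals must be between 0 and total")

    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = refusals / total

    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - margin, center + margin


# Assumption: the raw refusal count is not published, so it is rebuilt
# from the rounded 88.5% rate and the 9,433 evaluations in the first row.
total = 9433
refusals = round(0.885 * total)
low, high = wilson_interval(refusals, total)
print(f"{low:.1%} to {high:.1%}")
```

With these reconstructed counts the result should land on or very near the 87.8–89.1% shown for qwen-2.5-7b-instruct. Small differences are possible because the published rate is rounded to one decimal place.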