Rankings

Model Leaderboard

Name: LLM Content Moderation Audit Log
Creator: Jacob Kandel
License: https://creativecommons.org/licenses/by/4.0/

24 models ranked by refusal rate with 95% Wilson confidence intervals.
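For reference, this is the standard form of the Wilson score interval. Here p̂ is the observed refusal rate, n is the number of evaluations (assumed here to be the Evaluations column), and z ≈ 1.96 gives 95% coverage. With p̂ = 0.915 and n = 16,135 (rank 1), it gives roughly 91.1–91.9%, which matches the table.

```latex
\[
\frac{\hat{p} + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}}
\;\pm\;
\frac{z}{1 + \dfrac{z^2}{n}}
\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n} + \frac{z^2}{4n^2}}
\]
```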

#	Model	Provider	Refusal Rate	95% CI	Avg Words/Response	Evaluations
1	qwen-2.5-7b-instruct	qwen	91.5%	91.1–91.9%	128	16,135
2	gemini-2.0-flash-lite-001	google	89.9%	89.1–90.6%	199	6,804
3	claude-3-haiku	anthropic	88.2%	87.5–89.0%	207	7,141
4	mistral-large	mistralai	86.1%	84.8–87.3%	210	2,757
5	claude-3.5-haiku	anthropic	84.8%	83.3–86.1%	155	2,596
6	ministral-8b	mistralai	84.7%	83.8–85.5%	142	6,876
7	ministral-14b-2512	mistralai	84.7%	84.2–85.1%	284	23,826
8	qwen-plus	qwen	83.1%	81.7–84.4%	200	2,756
9	grok-3-mini	x-ai	81.4%	80.9–82.0%	193	18,227
10	qwen-2.5-72b-instruct	qwen	79.3%	77.8–80.8%	176	2,724
11	deepseek-chat	deepseek	78.9%	77.5–80.3%	156	3,167
12	mistral-small-24b-instruct-2501	mistralai	76.8%	75.1–78.3%	150	2,757
13	gpt-4o	openai	71.7%	70.1–73.3%	180	3,067
14	gemini-3.1-flash-lite	google	71.5%	70.5–72.6%	195	6,939
15	gemini-2.0-flash-001	google	68.1%	66.3–69.8%	214	2,752
16	gemini-2.5-flash-lite-preview-09-2025	google	67.4%	66.5–68.3%	190	9,910
17	gpt-mini-latest	~openai	67.4%	66.3–68.5%	161	6,939
18	claude-haiku-latest	~anthropic	67.0%	65.8–68.0%	316	6,939
19	gemini-3.1-flash-lite-preview	google	66.7%	65.7–67.7%	193	7,974
20	qwen2.5-coder-7b-instruct	qwen	64.4%	63.6–65.2%	5,487	14,031
21	gpt-5.4-mini	openai	64.4%	62.9–65.8%	161	4,010
22	claude-haiku-4.5	anthropic	63.0%	62.3–63.8%	308	16,117
23	claude-3.5-sonnet	anthropic	61.2%	59.4–63.0%	213	2,868
24	gemini-2.5-pro	google	38.4%	34.6–42.3%	237	620

Refusal rate = proportion of prompts where the model refused or added unsolicited caveats. Confidence intervals are 95% Wilson score intervals. Models are ordered from most to least restrictive: rank 1 has the highest refusal rate.
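The sketch below computes a refusal rate and its 95% Wilson score interval using only the Python standard library. The page does not publish per-model refusal counts, so the count of 238 is hypothetical. It was back-calculated from gemini-2.5-pro's 38.4% rate over 620 evaluations. The function name `wilson_interval` is illustrative and is not taken from this project's code.

```python
import math


def wilson_interval(refusals: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for refusals/n (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = refusals / n                      # observed refusal rate
    denom = 1 + z**2 / n                      # shared denominator
    center = (p_hat + z**2 / (2 * n)) / denom  # shifted toward 0.5
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Hypothetical count: 238 refusals out of 620 evaluations, back-calculated
# from the 38.4% rate listed for gemini-2.5-pro (raw counts are not published).
lo, hi = wilson_interval(238, 620)
print(f"{238 / 620:.1%}  {lo:.1%}–{hi:.1%}")  # → 38.4%  34.6%–42.3%
```

Interval width shrinks roughly with 1/√n. That is why gemini-2.5-pro's 620 evaluations leave a band of about ±3.8 points, while every model with more than 16,000 evaluations stays within about ±0.8 points.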