MODERATION BIAS

Cite This Research

BibTeX

@misc{kandel2026moderationbias,
  title     = {Moderation Bias: A Systematic Benchmark of Content Moderation Across Large Language Models},
  author    = {Kandel, Jacob},
  year      = {2026},
  url       = {https://moderationbias.com},
  note      = {Open benchmark and dataset available at https://huggingface.co/datasets/jmk9494/moderation-bias-benchmark}
}

APA

Kandel, J. (2026). Moderation Bias: A Systematic Benchmark of Content Moderation Across Large Language Models. https://moderationbias.com

AI Models

Browse the LLMs included in our censorship and moderation analysis. Select a model to view its dedicated restrictiveness profile.

GPT-4o

OpenAI

GPT-4o Mini

OpenAI

Claude 3.5 Sonnet

Anthropic

Claude 3 Haiku

Anthropic

Gemini 2.0 Flash

Google

DeepSeek V3

DeepSeek

Qwen 2.5 72B

Alibaba

Qwen 2.5 7B

Alibaba

Yi Lightning

01.AI

Mistral Large

Mistral AI

Mistral Small 3.1

Mistral AI

Gemini 2.5 Pro

Google

Gemini 2.0 Flash Lite

Google

Claude 3.5 Haiku

Anthropic

Mistral Small 3

Mistral AI

Ministral 8B

Mistral AI

Qwen Plus

Alibaba

Grok 3

xAI

Grok 3 Mini

xAI

o3 Mini

OpenAI

DeepSeek R1

DeepSeek

Llama 4 Scout

Meta

Llama 4 Maverick

Meta

Gemini 3.1 Flash

Google

GPT-4.1 Mini

OpenAI

Qwen 3 30B

Alibaba

© 2026 Moderation Bias. All rights reserved.