MODERATION BIAS

Cite This Research

BibTeX
@misc{kandel2026moderationbias,
  title     = {Moderation Bias: A Systematic Benchmark of Content Moderation Across Large Language Models},
  author    = {Kandel, Jacob},
  year      = {2026},
  url       = {https://moderationbias.com},
  note      = {Open benchmark and dataset available at https://huggingface.co/datasets/jmk9494/moderation-bias-benchmark}
}
APA

Kandel, J. (2026). Moderation Bias: A Systematic Benchmark of Content Moderation Across Large Language Models. https://moderationbias.com

© 2026 Moderation Bias. All rights reserved.

AI Overview

A weekly automated synthesis of the latest benchmark findings, generated by an AI Analyst. It covers longitudinal model drift, cross-model reliability, and notable safety anomalies.

Latest Analyst Report

Generated by GPT-4o using live audit data from — models.

This report is AI-generated and may contain errors. Always verify against raw benchmark data.