Media & entertainment · Studied

Content moderation — the throughput-vs-harm curve

Over-automate and a borderline post slips through as harm.


Human-in-the-Loop Approval

Context

A moderation agent processes flagged posts. High-harm content auto-removes and clearly-benign content auto-approves, but the borderline, context-dependent cases are where over-automation lets harm through.
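As a minimal sketch of the three-way routing described above, assume each flagged post carries a classifier harm score between 0 and 1. The names `FlaggedPost`, `route`, `REMOVE_AT`, and `APPROVE_AT`, and the threshold values themselves, are illustrative assumptions rather than anything taken from a real moderation system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_REMOVE = "auto_remove"
    AUTO_APPROVE = "auto_approve"
    HUMAN_REVIEW = "human_review"


# Hypothetical thresholds: a real program would set these from its own
# harm-severity policy, not from this example.
REMOVE_AT = 0.95   # at or above this score, treat the post as high-harm
APPROVE_AT = 0.05  # at or below this score, treat the post as clearly benign


@dataclass(frozen=True)
class FlaggedPost:
    post_id: str
    harm_score: float  # assumed classifier estimate in [0, 1]


def route(post: FlaggedPost) -> Action:
    """Auto-handle only the clear cases; everything in between goes to a human."""
    if not 0.0 <= post.harm_score <= 1.0:
        raise ValueError(f"harm_score out of range: {post.harm_score}")
    if post.harm_score >= REMOVE_AT:
        return Action.AUTO_REMOVE
    if post.harm_score <= APPROVE_AT:
        return Action.AUTO_APPROVE
    # Borderline, context-dependent band: never decided automatically.
    return Action.HUMAN_REVIEW
```

The design choice worth noting is that human review is the default branch: a post reaches an automated action only by clearing an explicit threshold, so a mis-set threshold fails toward review rather than toward silent approval.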

The decision

The dial is throughput vs harm. The ceiling is the highest autonomy level that still routes the borderline cases to a human; one step further buys a little more speed at the cost of a harm incident.
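One way to picture the dial is as a small set of ordered autonomy levels, each automating more tiers than the last. The sketch below is hypothetical: the level names, the tier names, and the `AUTOMATED_TIERS` mapping are assumptions made for illustration, not a description of any particular product.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    REVIEW_ALL = 0        # a human reviews every flagged post
    AUTO_CLEAR_CASES = 1  # automate high-harm and clearly-benign only
    AUTO_EVERYTHING = 2   # also automate the borderline tier


# Hypothetical mapping from each dial position to the tiers it automates.
AUTOMATED_TIERS = {
    Autonomy.REVIEW_ALL: frozenset(),
    Autonomy.AUTO_CLEAR_CASES: frozenset({"high_harm", "benign"}),
    Autonomy.AUTO_EVERYTHING: frozenset({"high_harm", "benign", "borderline"}),
}


def ceiling() -> Autonomy:
    """Highest level that still leaves the borderline tier to a human."""
    return max(
        level for level in Autonomy
        if "borderline" not in AUTOMATED_TIERS[level]
    )


def clamp(requested: Autonomy) -> Autonomy:
    """Cap a requested dial setting at the ceiling."""
    return min(requested, ceiling())
```

Under these assumptions, `clamp(Autonomy.AUTO_EVERYTHING)` returns `Autonomy.AUTO_CLEAR_CASES`: a request to go one step past the ceiling is capped rather than honored.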

What most miss

Most teams tune moderation autonomy for cost. But the harm concentrates in the borderline tier, and automating that tier is what earns a moderation program its worst headlines.

Stakes

A single over-automated borderline case that should have been reviewed is a trust-and-safety incident.

Takeaway · In moderation, the dial is throughput vs harm — the borderline cases are where autonomy hurts.

Studied · Agent & Protocol · verified 2026-07-03

Sources: Trust-and-safety moderation autonomy patterns; Harm-severity-tiered review policy
