California's AI Report: Why Human Override Fails

Tired businesswoman using laptop at table isolated on grey

Key quote:

Turnitin acknowledges that detection of AI writing is an evolving and imperfect science. External studies have found false positive rates, especially with non-native English writing and simple, formulaic essays that could be disproportionally flagged.

Why it matters:

California’s 2025 High-Risk ADS inventory is a representative case study for how state governments are integrating automated decision systems into core functions. The report identifies seven active or in-development systems used by the Department of Corrections and Rehabilitation (CDCR), California State University (CSU), the Employment Development Department (EDD), and the Department of Cannabis Control (DCC). While the technology isn’t unique to California, the regulatory approach reveals a systemic reliance on “human-in-the-loop” controls that might not scale effectively under operational pressures, particularly if there are staff cutbacks at a future date. Anyone who’s studied air crashes caused by automated systems will pick up on the problem pretty quickly.

To understand the scope, we can look at who’s affected, what agencies claim to gain, and how they propose to mitigate risks. Here are the seven high-risk systems detailed in the CDT report:

System	Natural Persons Affected	Claimed Benefits	Risk Mitigation Measures
CDCR – COMPAS	Incarcerated persons	Standardized rehab planning, resource allocation, recidivism reduction	Human override, independent validation studies, staff training
CDCR – CSRA	Incarcerated persons	Static risk classification, prioritized program placement, consistency	Manual verification capability, UC Irvine revalidation, appeal process
CDCR – PVDTS	Supervised persons	Consistent sanctioning, uniform parole discharge recommendations	Override capabilities, narrative justification for deviations, Parole Agent training
CSU – Proctorio	CSU Students	Exam integrity, cost reduction for proctoring, remote access support	FERPA compliance, faculty review of flags, student appeal process
CSU – Turnitin	CSU Students	Automated AI writing detection, academic integrity screening	Mandatory human review before discipline, faculty training on interpretation
EDD – UI Fraud Detect	Unemployment claimants	Faster identity verification, reduced false positives/negatives, fraud prevention	Mandatory human review of all outputs, internal audits, administrative law appeals
DCC – CPIA	Cannabis business licensees	Pre-screening packaging compliance, second point of analysis	Quarterly model drift reviews, no data retention, staff review upon disagreement

Across these seven systems, three dominant trends emerge regarding the nature of deployment and the stated safeguards. These patterns reveal a consistent strategy across corrections, education, and benefits administration:

Targeted Vulnerability: Every system targets a specific subset of the population where the state exercises direct authority, meaning there’s no dystopian “big brother” AI here affecting the general citizenry at large. Instead, the impact concentrates on those already within the state’s disciplinary or support loops, including inmates, students, claimants, and licensees.
Administrative Efficiency: The reported benefits follow a rigid pattern focused on consistency, cost reduction, and fraud prevention rather than improved individual well-being. Agencies consistently frame the value as moving away from subjective judgment toward data-driven consistency, but very few entries cite actual improvement for the people being scored.
Procedural Reliance: Risk mitigation strategies universally depend on human override and staff training instead of fixing the algorithm itself. This creates a fragile safety net that assumes perfect human performance under high-pressure conditions without addressing the root causes of errors.

The issue is the reliance on human override, which assumes infinite bandwidth for verification when reality often involves tight staffing and high volumes. For example, CDCR states parole agents can ignore COMPAS scores if professional judgment dictates, while EDD insists all fraud alerts undergo mandatory human review. Although these measures sound procedurally sound on paper, the “override” often becomes a formality rather than a meaningful check when workers face impossible caseloads.

The Michigan MiDAS settlement offers a precedent for this failure mode, proving that procedural promises don’t work under pressure. In Bauserman v. Michigan Unemployment Insurance Agency, the state settled a class-action lawsuit for $20 million after its automated system falsely accused approximately 3,000 residents of fraud. Even though the system operated with a claimed human oversight layer, the sheer volume of false positives overwhelmed the capacity for genuine review, leading to wrongful penalties and wage garnishments. This history suggests that procedural safeguards aren’t enough when the underlying AI outputs are flawed or the data volume exceeds human processing limits.

A better approach would treat governance as code instead of relying on post-hoc human overrides that assume automation bias isn’t a thing. Agencies should be embedding verification checks directly into the workflow logic by requiring real-time bias audits, mandatory explainability logs for every high-stakes decision, and automated rollback mechanisms if error rates exceed defined thresholds. The current “Human-in-the-Loop” defense treats the human as a safety valve, but in practice, automation complacency is a lot easier for people who aren’t affected by the outcomes.

The exclusion of high-profile pilot programs from the inventory further complicates the picture, suggesting that transparency remains a compliance exercise rather than a functional governance tool. The governor’s “Poppy” assistant and the CSU OpenAI contract weren’t included in the 2025 list despite their potential impact on public services, and Senate Bill 1248 failed during the legislative appropriations process. These are a gap between what’s being reported, and what’s actually being deployed across state agencies.

Auditing what exists isn’t the same as governing how it works or handling the outcomes. If an organization can’t demonstrate that its human reviewers can realistically process every high-impact decision generated by an algorithm, the governance process is the point of failure. The solution requires shifting from procedural promises to architectural constraints that ensure accountability regardless of human capacity.