AI Safety · June 25, 2026 · 4 min read

Authors Guild test: Pangram and Grammarly spot human writing

The Authors Guild ran ten articles from 2020–2022 through five AI detectors; Pangram and Grammarly labeled every text human, while Sidekicker flagged all ten as mostly AI.

The Brieftide · June 25, 2026

TL;DR

01 The Authors Guild ran ten articles from 2020–2022 through five AI detectors; Pangram and Grammarly labeled every text human, while Sidekicker flagged all ten as mostly AI.
02 ZeroGPT produced inconsistent, sometimes high AI scores for the same human-written articles.
03 Originality.ai "also performed well." The sample consisted of ten articles published between 2020 and 2022, before generative AI went mainstream.

The Authors Guild ran ten human-written articles published between 2020 and 2022 through five AI-detection tools and found stark differences: Pangram and Grammarly identified every article as human, while Sidekicker flagged every article as mostly AI-generated, two of them at 100 percent.

What did the Authors Guild test find?

Pangram and Grammarly correctly labeled all ten Guild articles as human-written, while Sidekicker marked every item as mostly AI, and ZeroGPT produced inconsistent, sometimes high AI scores. Originality.ai "also performed well." The sample consisted of ten articles published between 2020 and 2022, before generative AI went mainstream.

The test table shows wide variance across individual pieces. For example, Sidekicker scored "Antitrust Litigation & Publications" at 100.0 percent AI, while ZeroGPT scored that same article at 5.3 percent and Originality.ai at 0.0 percent. In another row, Sidekicker returned 85.0 percent for "Obscenity Petitions Dismissed," while Pangram and Grammarly returned 0.0 percent for that piece.

How did each detector behave and why do they differ?

Pangram and Grammarly produced no false positives in this sample: both returned 0.0 percent AI for most of the ten articles and never more than single-digit scores, so every text was classified as human. Originality.ai often returned 0.0 percent as well and is described as having "performed well." ZeroGPT returned a nonzero score for every article, ranging from 5.3 percent to 76.3 percent (for example, 66.0 percent on the "Obituary: Joan Didion" piece and 64.5 percent on "Banned Books Club"), so its results for clearly human texts were inconsistent and sometimes high. Sidekicker flagged every article as mostly AI-generated, with values like 96.0 percent for "Copyright Claims Board" and 100.0 percent for both "Antitrust Litigation & Publications" and "Erdrich Pulitzer Prize."

Pangram CEO Max Spero described his detector as a black box but argued that models betray themselves through uniformity: "Language models do give themselves away through uniformity, though, especially in how they build arguments. Humans write with far more variety." Pangram, for its part, aims to minimize false positives. The Authors Guild cautioned that detection is not straightforward: many professionally written texts share statistical patterns with model output because language models were trained on that kind of writing.

Why it matters

Detectors can produce false positives that have real consequences: "False positives can cost authors their contracts and their reputations," the Guild warns. Publishers and institutions that act on detector output risk unfairly penalizing skilled writers whose concise, polished prose resembles model output. The Guild recommends disclosing methods and giving authors a chance to defend themselves, since the tools change constantly and accuracy cannot be assumed.

The test also shows a trade-off between minimizing false positives and catching AI-written material. Tools that return zero AI for human texts may still miss AI-generated content, while tools that aggressively flag content risk false accusations. That split matters for publishers, editors, and authors negotiating contracts or enforcing disclosure rules.

What to watch

Watch for follow-up tests that include confirmed AI-generated samples to measure true positive rates; the Authors Guild’s current sample measured only how detectors classified human-written texts. Also watch how vendors change thresholds and whether publishers adopt requirements to disclose detector methods and offer appeals, as the Guild recommends.

The test provides a concrete baseline: ten pre-AI-era articles, two detectors that labeled all ten as human, one that flagged all ten as mostly AI, and others producing mixed results. Those outcomes should inform decisions about whether and how to use these tools in editorial and contractual contexts.

Authors Guild test: detector scores for ten human-written articles (percent AI)

Item	ZeroGPT	(unlabeled)	Sidekicker	(unlabeled)	(unlabeled)
Obscenity Petitions Dismissed	14.3%	0.0%	85.0%	0.0%	0.0%
Antitrust Litigation & Publications	5.3%	0.0%	100.0%	0.0%	0.0%
Warhol Fair Use Letter	40.7%	0.0%	79.0%	0.0%	0.0%
Copyright Claims Board	28.1%	0.0%	96.0%	0.0%	0.0%
Banned Books Club	64.5%	1.0%	71.0%	0.0%	0.0%
Kiss Library Piracy Lawsuit	26.5%	1.0%	71.0%	7.0%	0.0%
Obituary: Joan Didion	66.0%	0.0%	82.0%	9.0%	0.0%
Erdrich Pulitzer Prize	76.3%	0.0%	100.0%	0.0%	0.0%
Support Authors & Literary Arts	50.6%	0.0%	92.0%	0.0%	0.0%
The Roundup 12/2020	18.1%	0.0%	96.0%	0.0%	0.0%
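
The false-positive comparison can be made concrete with a small tally. The sketch below is only an illustration: it uses the ZeroGPT and Sidekicker scores from the table (the two detectors whose values the text ties to specific articles), and it assumes a cutoff of 50 percent for "flagged as AI." That cutoff, the `THRESHOLD` name, and the `false_positive_rate` helper are assumptions made for this example, not part of the Guild's method.

```python
# Scores copied from the table above (percent AI), in row order.
# Every article in the sample is human-written, so any flag is a false positive.
SCORES = {
    "ZeroGPT": [14.3, 5.3, 40.7, 28.1, 64.5, 26.5, 66.0, 76.3, 50.6, 18.1],
    "Sidekicker": [85.0, 100.0, 79.0, 96.0, 71.0, 71.0, 82.0, 100.0, 92.0, 96.0],
}

# Assumption for illustration only: a score above 50 counts as "flagged as AI".
THRESHOLD = 50.0


def false_positive_rate(scores, threshold=THRESHOLD):
    """Return the share of human-written articles scored above the threshold."""
    flagged = sum(1 for score in scores if score > threshold)
    return flagged / len(scores)


if __name__ == "__main__":
    for detector, scores in SCORES.items():
        rate = false_positive_rate(scores)
        print(f"{detector}: {rate:.0%} of human articles flagged "
              f"(scores from {min(scores)} to {max(scores)})")
```

Under that assumed cutoff, ZeroGPT flags four of the ten articles and Sidekicker flags all ten. A different cutoff would change the ZeroGPT count, which is one reason the Guild's call to disclose methods matters.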

Written by The Brieftide · Source: The Decoder
