26,000-student study: AI use cuts exam scores up to 24% in 2 yrs

AI users raised homework scores by 18% and sped completion, yet closed-book and entrance-exam scores fell up to 24%.

The Brieftide

TL;DR

  • AI users raised homework scores by 18% and sped completion, yet closed-book and entrance-exam scores fell up to 24%.
  • The researchers analyzed 30 months of monthly exams, homework scores and completion times, and entrance-exam results for grades 7 through 12 in a county of over one million residents.
  • Self-reported AI usage climbed from near zero to about 80 percent during the study, with big jumps coinciding with the releases of DeepSeek V2.5 in September 2024 and DeepSeek R1 in January 2025.

A 26,000-student panel study in central China finds that students who began using AI finished homework faster and earned higher homework grades, but saw large declines on closed-book tests and high-stakes entrance exams, with the full decline taking roughly two years to appear. The researchers analyzed 30 months of monthly exams, homework scores and completion times, and entrance-exam results for grades 7 through 12 in a county of over one million residents.

What did the study find?

The study found short-term homework gains but large, delayed losses on exams. Homework scores rose 18 percent and average time per assignment fell from 64 to 45 minutes, while monthly closed-book exam scores dropped by 20 percent. Entrance-exam scores declined 18 to 24 percent, with the full effect appearing after about two years. Self-reported AI usage climbed from near zero to about 80 percent during the study, with big jumps coinciding with the releases of DeepSeek V2.5 in September 2024 and DeepSeek R1 in January 2025. The most popular tools were Doubao, DeepSeek, ChatGLM, Ernie Bot, and Qwen.

The pattern suggests widespread outsourcing rather than faster learning. After more than five months of AI use, about 81 percent of students finished homework in under 50 minutes and earned high homework grades but performed poorly on proctored exams. By contrast, AI users who spent about as much time on homework as nonusers did not show exam declines and still earned better homework grades.

How did researchers measure the effect?

The authors used a difference-in-differences design that leverages variation in when students first adopted AI, tracking each student before and after first use and comparing that trajectory with students who had not yet adopted. The timing of first use came from self-reports, and the causal inference rests on the assumption that, absent AI, both groups would have followed similar trends. The dataset spans 30 months and covers more than 26,000 students in grades 7 to 12, with monthly measures plus Zhongkao and Gaokao entrance-exam outcomes.
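The comparison described above can be illustrated with a minimal sketch on synthetic data. This is not the authors' estimator or data: the function names (`staggered_did`, `mean_change`, `make_student`), the toy panel, the two-month window, and the built-in effect of minus 10 points are all hypothetical. The toy scores share a common monthly trend by construction, so the parallel-trends assumption holds here by design rather than being tested.

```python
from statistics import mean


def mean_change(scores, start, window):
    """Post-minus-pre change in one student's mean score around month `start`."""
    pre = scores[start - window:start]
    post = scores[start:start + window]
    return mean(post) - mean(pre)


def staggered_did(panel, window=2):
    """Average difference-in-differences across adoption cohorts.

    `panel` maps a student id to (first_use_month or None, monthly scores).
    Each cohort of adopters is compared with students who have still not
    adopted by the end of that cohort's post-adoption window.
    """
    n_months = len(next(iter(panel.values()))[1])
    cohorts = sorted({m for m, _ in panel.values() if m is not None})
    weighted_sum, n_treated_total = 0.0, 0
    for g in cohorts:
        if g - window < 0 or g + window > n_months:
            continue  # not enough months before or after adoption
        treated = [s for m, s in panel.values() if m == g]
        controls = [s for m, s in panel.values() if m is None or m >= g + window]
        if not controls:
            continue  # no not-yet-adopted students left to compare against
        effect = (mean(mean_change(s, g, window) for s in treated)
                  - mean(mean_change(s, g, window) for s in controls))
        weighted_sum += effect * len(treated)
        n_treated_total += len(treated)
    if n_treated_total == 0:
        raise ValueError("no cohort has a usable comparison group")
    return weighted_sum / n_treated_total


def make_student(base, first_use, n_months=8, trend=1.0, effect=-10.0):
    """Synthetic student: fixed baseline, common trend, score shift after first use."""
    scores = []
    for t in range(n_months):
        using_ai = first_use is not None and t >= first_use
        scores.append(base + trend * t + (effect if using_ai else 0.0))
    return first_use, scores


# Hypothetical panel: two students adopt in month 2, one in month 4, two never.
panel = {
    "a": make_student(70, 2),
    "b": make_student(80, 2),
    "c": make_student(75, 4),
    "d": make_student(65, None),
    "e": make_student(85, None),
}
estimate = staggered_did(panel, window=2)
print(round(estimate, 2))
```

Because every synthetic student follows the same upward trend, subtracting the change among not-yet-adopters removes that trend and the sketch recovers the effect built into the toy data. With real self-reported adoption dates, the estimate is only as credible as the assumption that both groups would otherwise have moved in parallel.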

Effects varied by subject and subgroup. Social science subjects fell the most, averaging a 27 percent decline; STEM subjects fell 22 percent; English 17 percent; and Chinese 9 percent. Younger secondary students lost more than older students, 24 versus 17 percent. Boys saw larger declines than girls, 21.6 versus 18.4 percent, which the study attributes primarily to heavier AI use among boys. Top performers suffered the most: the top third saw a 24 percent decline, compared with 16 percent in the bottom third. A dose-response pattern emerged: up to one hour of AI use per week corresponded to about a 5 percent loss, while five hours or more corresponded to a 30 percent loss.

Why it matters

AI is eroding the signaling value of homework. When students outsource answers, homework grades inflate while underlying knowledge falls, producing a misleading picture for teachers who typically see only one subject. The study shows why short-term experiments miss the full cost: regular exam performance fell within six months, but entrance-exam damage built up over two years. The estimated aggregate learning penalty fell from about 25 percent in early 2023 to 16 percent by June 2025, which the authors interpret as partial adaptation rather than elimination of harm.

The findings align with related evidence. An Anthropic study found that participants who learned programming with AI scored 17 percent worse on follow-up tests, and a UC Berkeley analysis of 500,000 grades showed that the share of top A grades in writing- and programming-heavy courses has risen 13 percentage points since ChatGPT launched, with the increase concentrated in unsupervised homework. AI researcher Andrej Karpathy has argued schools should "stop trying to police AI-generated homework" and shift the majority of grading to in-class work, a recommendation that matches the study's results.

What to watch

Watch whether schools change assessment practices: the study recommends giving students credible information about outsourcing costs, placing more weight on in-person exams, and tracking completion time rather than homework grades. Also monitor adoption curves tied to model releases, such as the DeepSeek V2.5 and R1 jumps in September 2024 and January 2025, and whether the partial adaptation noted through June 2025 continues or reverses as new tools emerge.

Key dates in the 30-month panel and AI adoption
  1. Early 2023: the authors estimate the learning penalty at about 25 percent.
  2. September 2024: DeepSeek V2.5 release, coinciding with a big jump in self-reported AI usage.
  3. January 2025: DeepSeek R1 release, coinciding with another large increase in AI adoption.
  4. June 2025: the authors report the estimated learning penalty had fallen to about 16 percent.


Written by The Brieftide · Source: The Decoder
