
Semi-CoT: Semi-supervised Chain-of-Thought Learning Study

Semi-CoT reuses unlabeled questions to create pseudo-CoTs; an entropy gate picks low-entropy chains.

The Brieftide

TL;DR

  • Semi-CoT reuses unlabeled questions to create pseudo-CoTs; an entropy gate picks low-entropy chains.
  • Semi-CoT, a semi-supervised Chain-of-Thought learning framework by Hongyang He, Jiuming Liu and Victor Sanchez, was submitted on 1 Jul 2026.
  • The framework treats chain-of-thought traces not merely as inference-time prompts but as semi-supervised signals, extending a self-training view of CoT into pseudo-supervision.

Semi-CoT, a semi-supervised Chain-of-Thought learning framework by Hongyang He, Jiuming Liu and Victor Sanchez, was submitted on 1 Jul 2026. The method samples multiple pseudo reasoning chains for each unlabeled question, estimates answer-level semantic entropy, and selects low-entropy chains as pseudo-CoT demonstrations for training students.

How does Semi-CoT work?

Semi-CoT builds reasoning supervision from unlabeled questions. It treats chain-of-thought traces as semi-supervised training signals rather than only as inference-time prompts, extending a self-training view of CoT into pseudo-supervision.

The pipeline is simple: for each unlabeled question, Semi-CoT generates multiple candidate reasoning chains, measures semantic entropy at the answer level to estimate how strongly the chains agree, and keeps chains whose entropy falls below a gate threshold as pseudo-CoTs. The selected chains are then reused as demonstrations for student models.
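
The gating step described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the brief does not say how answers are grouped or what threshold is used, so the function names (`answer_entropy`, `entropy_gate`), the exact-match answer normalization, the choice to keep only chains that agree with the majority answer, and the `max_entropy=0.6` default are all assumptions.

```python
import math
from collections import Counter


def normalize(answer):
    # Assumption: two answers count as "the same" if they match after
    # trimming whitespace and lowercasing.
    return answer.strip().lower()


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution over final answers."""
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    return sum((n / total) * math.log(total / n) for n in counts.values())


def entropy_gate(chains, max_entropy=0.6):
    """Filter the sampled chains for ONE unlabeled question.

    chains: list of (reasoning_text, final_answer) pairs.
    Returns the chains that agree with the majority answer when the
    answer-level entropy is at or below the gate, otherwise an empty list.
    """
    if not chains:
        return []
    answers = [answer for _, answer in chains]
    if answer_entropy(answers) > max_entropy:
        return []  # too much disagreement: discard the question
    majority = Counter(normalize(a) for a in answers).most_common(1)[0][0]
    return [(text, answer) for text, answer in chains if normalize(answer) == majority]


# Hypothetical samples for a single question: three chains agree, one does not.
sampled = [("chain A", "12"), ("chain B", "12"), ("chain C", " 12 "), ("chain D", "15")]
selected = entropy_gate(sampled)
```

With these hypothetical samples the entropy is about 0.56 nats, so the question passes the assumed gate and the three agreeing chains are kept; an even two-way split (about 0.69 nats) would be rejected.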

How well does Semi-CoT perform on benchmarks?

Pilot experiments on four benchmarks produced mixed results: pseudo-answer precision ranged from 91.36% to 100%, SVAMP and GSM8K saw small gains, AQuA experienced negative transfer, and MultiArith hit a ceiling.

The paper lists AQuA, SVAMP, GSM8K and MultiArith as the evaluation suites. On SVAMP and GSM8K Semi-CoT yielded modest improvements, suggesting some benefit from the added pseudo-supervision. By contrast, AQuA showed negative transfer, meaning performance declined when using the selected pseudo-CoTs, and MultiArith reached a ceiling where Semi-CoT did not improve results further.
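
The pseudo-answer precision figure reported above can be read as the share of gated pseudo-answers that match the reference answer. The sketch below shows one plausible way to compute it; the brief does not give the paper's exact definition, so the function name `pseudo_answer_precision`, the dictionary layout, and the exact-match comparison are assumptions, and the sample data is invented for illustration.

```python
def pseudo_answer_precision(pseudo_answers, gold_answers):
    """Fraction of pseudo-answers that match the gold answer.

    Both arguments map a question id to an answer string. Only questions
    that received a pseudo-answer (i.e. passed the gate) are scored, so
    this measures precision of the selected set, not coverage.
    """
    scored = [qid for qid in pseudo_answers if qid in gold_answers]
    if not scored:
        raise ValueError("no overlapping questions to score")
    correct = sum(
        pseudo_answers[qid].strip().lower() == gold_answers[qid].strip().lower()
        for qid in scored
    )
    return correct / len(scored)


# Invented example: three gated questions, two of them correct.
pseudo = {"q1": "12", "q2": "7", "q3": "3"}
gold = {"q1": "12", "q2": "7", "q3": "4", "q4": "9"}
precision = pseudo_answer_precision(pseudo, gold)
```

Because questions rejected by the gate are excluded from the denominator, a high value here says the kept pseudo-answers are mostly right; it does not by itself say how many questions survived the gate or whether students benefit from training on them.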

Why it matters

Semi-CoT demonstrates that unlabeled questions can supply high-precision pseudo reasoning signals, with pseudo-answer precision between 91.36% and 100% for the selected chains. That matters because chain-of-thought is typically used only as an inference-time prompt; reusing generated chains as training supervision could reduce the need for expensive human-annotated reasoning traces.

At the same time, the mixed benchmark outcomes underline the limits: reliable selection and effective student training are both still needed. The authors note that while the entropy gate finds high-precision pseudo-CoTs, turning those signals into consistent gains across benchmarks requires stronger demonstration selection or better ways of training students on pseudo-supervision.

What to watch

Watch for follow-up work that delivers stronger demonstration selection methods or revised student training regimes. The paper flags those two levers explicitly as the paths needed to make unlabeled-question pseudo-supervision broadly effective.

The paper offers a concise proof of concept: unlabeled questions can be a source of pseudo-CoTs, but converting high pseudo-answer precision into consistent task gains is the next technical milestone to check.

Pilot experiment outcomes by dataset

Dataset      Outcome             Pseudo-answer precision (overall range reported)
AQuA         negative transfer   91.36%–100%
SVAMP        small gains         91.36%–100%
GSM8K        small gains         91.36%–100%
MultiArith   reached a ceiling   91.36%–100%

Written by The Brieftide · Source: arXiv
