Reasoning Verification · July 3, 2026 · 4 min read

Semi-CoT: Semi-supervised Chain-of-Thought Learning Study

Semi-CoT reuses unlabeled questions to create pseudo-CoTs; an entropy gate picks low-entropy chains.

The Brieftide · July 3, 2026

TL;DR

01 Semi-CoT reuses unlabeled questions to create pseudo-CoTs; an entropy gate picks low-entropy chains.
02 Semi-CoT, a semi-supervised Chain-of-Thought learning framework by Hongyang He, Jiuming Liu and Victor Sanchez, was submitted on 1 Jul 2026.
03 The framework treats chain-of-thought traces not merely as inference-time prompts but as semi-supervised signals, extending a self-training view of CoT into pseudo-supervision.

Semi-CoT, a semi-supervised Chain-of-Thought learning framework by Hongyang He, Jiuming Liu and Victor Sanchez, was submitted on 1 Jul 2026. The method samples multiple pseudo reasoning chains for each unlabeled question, estimates answer-level semantic entropy, and selects low-entropy chains as pseudo-CoT demonstrations for training students.

How does Semi-CoT work?

Semi-CoT constructs pseudo reasoning supervision from unlabeled questions by sampling multiple pseudo-CoTs, computing an answer-level semantic entropy, and keeping low-entropy chains as reliable demonstrations. The framework treats chain-of-thought traces not merely as inference-time prompts but as semi-supervised signals, extending a self-training view of CoT into pseudo-supervision.

The pipeline is simple: for each unlabeled question Semi-CoT generates multiple candidate reasoning chains, measures the semantic entropy at the answer level to estimate consensus, and selects those chains that fall below an entropy gate as pseudo-CoTs. Those selected chains are then reused as demonstrations for student models.
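The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `threshold` value, the use of normalized string matching as a stand-in for semantic clustering of answers, and the choice to keep only chains that agree with the majority answer are all assumptions made for the example.

```python
import math
from collections import Counter


def normalize(answer):
    """Hypothetical stand-in for semantic clustering: exact match after cleanup."""
    return str(answer).strip().lower()


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution over final answers."""
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_gate(chains, threshold=0.6):
    """Keep chains for one question only if the sampled answers mostly agree.

    `chains` is a list of (reasoning_text, final_answer) pairs sampled for a
    single unlabeled question. The threshold is an illustrative value, not one
    taken from the paper.
    """
    if not chains:
        return []
    answers = [answer for _, answer in chains]
    if answer_entropy(answers) > threshold:
        return []  # too much disagreement: discard the question entirely
    majority, _ = Counter(normalize(a) for a in answers).most_common(1)[0]
    # Assumption: only chains that reach the majority answer become pseudo-CoTs.
    return [(cot, ans) for cot, ans in chains if normalize(ans) == majority]


if __name__ == "__main__":
    sampled = [
        ("chain A", "18"),
        ("chain B", "18"),
        ("chain C", " 18"),
        ("chain D", "18"),
        ("chain E", "17"),
    ]
    kept = entropy_gate(sampled)
    print(len(kept), "of", len(sampled), "chains kept")
```

A lower threshold makes the gate stricter, trading coverage of the unlabeled pool for higher agreement among the chains that survive.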

How well does Semi-CoT perform on benchmarks?

Pilot experiments on four benchmarks produced mixed results: pseudo-answer precision ranged from 91.36% to 100%, SVAMP and GSM8K saw small gains, AQuA experienced negative transfer, and MultiArith hit a ceiling.

The paper lists AQuA, SVAMP, GSM8K and MultiArith as the evaluation suites. On SVAMP and GSM8K Semi-CoT yielded modest improvements, suggesting some benefit from the added pseudo-supervision. By contrast, AQuA showed negative transfer, meaning performance declined when using the selected pseudo-CoTs, and MultiArith reached a ceiling where Semi-CoT did not improve results further.
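For readers unfamiliar with the metric, pseudo-answer precision can be read as the share of gated pseudo-answers that match the gold answer. The sketch below assumes that definition; the function name and the toy data are hypothetical and are not taken from the paper's experiments.

```python
def pseudo_answer_precision(pseudo_answers, gold_answers):
    """Fraction of selected pseudo-answers that match the gold answer.

    Both arguments map a question id to an answer string. Only questions that
    passed the gate (i.e. appear in `pseudo_answers`) and have a gold label
    are counted. Returns None when nothing can be scored.
    """
    scored = [(q, a) for q, a in pseudo_answers.items() if q in gold_answers]
    if not scored:
        return None
    correct = sum(1 for q, a in scored if a == gold_answers[q])
    return correct / len(scored)


if __name__ == "__main__":
    # Toy values for illustration only.
    selected = {"q1": "18", "q2": "7", "q3": "42", "q4": "3"}
    gold = {"q1": "18", "q2": "7", "q3": "40", "q4": "3", "q5": "9"}
    print(pseudo_answer_precision(selected, gold))
```

Note that precision says nothing about how many questions the gate rejected, which is one reason high precision need not translate into downstream gains.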

Why it matters

Semi-CoT demonstrates that unlabeled questions can supply high-precision pseudo reasoning signals, with selected pseudo-CoTs achieving precision between 91.36% and 100%. That matters because chain-of-thought is typically used only as an inference-time prompt; reusing generated chains as training supervision could reduce the need for expensive human-annotated reasoning traces.

At the same time the mixed benchmark outcomes underline limits: reliable selection and effective student training remain necessary. The authors note that while the entropy gate finds high-precision pseudo-CoTs, translating those signals into consistent across-the-board gains requires stronger demonstration selection or improvements in how students are trained on pseudo-supervision.

What to watch

Watch for follow-up work that delivers stronger demonstration selection methods or revised student training regimes. The paper flags those two levers explicitly as the paths needed to make unlabeled-question pseudo-supervision broadly effective.

The technical report provides a concise proof of concept: unlabeled questions can be a source of pseudo-CoTs, but converting high pseudo-answer precision into consistent task gains is the next technical milestone.

Pilot experiment outcomes by dataset

Dataset	Outcome	Pseudo-answer precision
AQuA	negative transfer	91.36%–100% (overall range reported)
SVAMP	small gains	91.36%–100% (overall range reported)
GSM8K	small gains	91.36%–100% (overall range reported)
MultiArith	reached a ceiling	91.36%–100% (overall range reported)

Written by The Brieftide · Source: arXiv
