Learning to Prompt: Adaptive LLM High-School Tutoring
Subject-aware prompting uses 14 pedagogical features and a prompt router to improve efficiency and increase conversion.
TL;DR
- Subject-aware prompting uses 14 pedagogical features and a prompt router to improve efficiency and increase conversion.
- The authors train the router in simulation and deploy it live with high-school students, comparing it to two static-prompt baselines.
- The system uses the router to select from discrete prompting strategies rather than relying on a single static prompt.
Learning to Prompt, by Po-Chin Chang, Nicholas Hogan, Aske Plaat and Michiel T. van der Meer (submitted 18 Jun 2026), builds a subject-aware prompt routing system that adapts LLM tutoring strategies to different disciplines using 14 pedagogical features extracted from raw transcripts. The authors train the router in simulation and deploy it live with high-school students, comparing it to two static-prompt baselines.
What did the authors build and how does it work?
The paper implements subject-aware prompting driven by a prompt routing model that uses 14 pedagogical features, for example tutor scaffolding and student understanding, extracted from raw transcripts. The router is first trained inside a simulation environment and then used for online adaptation with real students, switching among learning strategies such as analytical and scaffolding approaches based on the extracted features.
The system uses the router to select from discrete prompting strategies rather than relying on a single static prompt. That selection happens per-conversation, enabling the model to adapt strategies as the interaction unfolds. The authors present both a greedy router and a stochastic router that samples strategies.
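The difference between the two routing policies can be illustrated with a minimal sketch. It is not the authors' implementation: the score values, the softmax sampling rule, the `temperature` parameter, and the function names are all assumptions made for illustration. Only the strategy names "analytical" and "scaffolding" and the greedy/stochastic distinction come from the paper summary.

```python
import math
import random

# Strategy names mentioned in the brief; the real system may have more.
STRATEGIES = ["analytical", "scaffolding"]


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities (assumed sampling rule)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_route(scores):
    """Greedy policy: always pick the highest-scoring strategy."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return STRATEGIES[best]


def stochastic_route(scores, rng, temperature=1.0):
    """Stochastic policy: sample a strategy in proportion to softmax(scores)."""
    probs = softmax(scores, temperature)
    return rng.choices(STRATEGIES, weights=probs, k=1)[0]


# Hypothetical router scores for one conversation (made-up numbers).
scores = [0.2, 1.1]
rng = random.Random(0)  # seeded so repeated runs behave the same

print("greedy:", greedy_route(scores))
print("stochastic:", stochastic_route(scores, rng))
```

In this sketch the greedy policy always returns the top-scoring strategy, while the stochastic policy still picks lower-scoring strategies some of the time. Whether that kind of exploration explains the conversion gap reported in the live test is not something the brief establishes.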
How well did it perform in simulation and with students?
In simulation the prompt router outperformed two static baselines on the paper's benchmark: 0.694 for the router versus 0.647 and 0.64 for the two baselines, with p < 0.001. In live A/B testing, the deployment collected N=656 conversations from 359 students and demonstrated sim-to-real transfer, including strategy shifts from analytical to scaffolding.
Online, the adaptive prompt selection improved instructional efficiency while maintaining pedagogical quality and reduced interactions by around 3 turns (p = 0.007). Exercise conversion rates differed by routing policy: a greedy router produced a conversion rate comparable to the baseline (19.1% vs 19.6%), while a stochastic router that samples strategies achieved a higher conversion rate (28.1%).
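To put the conversion figures in perspective, the short calculation below derives the absolute difference (in percentage points) and the relative change against the 19.6% baseline. The three rates come from the brief; the helper name `lift` is hypothetical, and the calculation says nothing about statistical significance, which the brief does not report for conversion.

```python
def lift(treatment_pct, baseline_pct):
    """Return (absolute difference in points, relative change in %), rounded."""
    absolute = treatment_pct - baseline_pct
    relative = absolute / baseline_pct * 100
    return round(absolute, 1), round(relative, 1)


BASELINE = 19.6  # static-prompt baseline conversion rate, in percent

for name, rate in [("greedy", 19.1), ("stochastic", 28.1)]:
    abs_pp, rel = lift(rate, BASELINE)
    print(f"{name}: {abs_pp:+.1f} pp, {rel:+.1f}% relative")

# greedy: -0.5 pp, -2.6% relative
# stochastic: +8.5 pp, +43.4% relative
```

Read this way, the stochastic router's 8.5-point gain corresponds to roughly a 43% relative increase over the baseline, while the greedy router is essentially flat.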
Why it matters
The results address two common failure modes of static-prompt tutoring: one-size-fits-all prompts that cannot adapt across disciplines, and rigid tactics that ignore student state. Training a router in simulation and transferring it to live students allowed the authors to change strategy mid-conversation, yielding shorter conversations and, under the stochastic routing policy, a higher conversion rate. Those gains matter to anyone deploying LLM tutoring at scale because they point to a practical mechanism for personalization that does not require retraining the underlying language model.
What to watch
Look for follow-up work that specifies which of the 14 pedagogical features drive the largest gains and how sensitive the sim-to-real transfer is to simulation quality. The two routing policies diverged on conversion: the stochastic router reached 28.1% conversion, markedly above the baseline 19.6% and the greedy 19.1%, so replication and analysis of when sampling helps will be the next concrete milestone.
References and key facts drawn from the authors' submission: the paper title "Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring," submission date 18 Jun 2026, simulation benchmark scores (0.694 vs 0.647 and 0.64, p < 0.001), A/B test scale (N = 656 conversations from 359 students), interaction reduction (around 3 turns, p = 0.007), and conversion rates (baseline 19.6%, greedy router 19.1%, stochastic router 28.1%).
| Item | Value |
|---|---|
| Simulation benchmark score | Router 0.694; static baselines 0.647 and 0.64 (p < 0.001) |
| Exercise conversion rate | Baseline 19.6%; greedy router 19.1%; stochastic router 28.1% |
| Interaction change (turns) | Reduced by ~3 turns (p = 0.007) |
| A/B test sample | N = 656 conversations, 359 students |
Written by The Brieftide · Source: arXiv