Coding Agents · June 13, 2026 · 4 min read · via The Decoder

SkillOpt: Microsoft boosts GPT-5.5 with trained Markdown

Microsoft and three Chinese universities trained a Markdown instruction file to tune model behavior and report consistent gains on GPT-5.5.

The Brieftide

TL;DR

01 Microsoft and three Chinese universities trained a Markdown instruction file to tune model behavior and report consistent gains on GPT-5.5.
02 Microsoft and three Chinese universities unveiled SkillOpt, a method that optimizes instruction documents to change large language model behavior without altering model weights.
03 SkillOpt reframes the instruction or system prompt as a trainable document rather than a fixed text string.

Microsoft and three Chinese universities unveiled SkillOpt, a method that optimizes instruction documents to change large language model behavior without altering model weights. The team applied SkillOpt to GPT-5.5 by training a single Markdown file that serves as the model's instruction artifact, and reports consistent improvements on multiple instruction-following and agent-style evaluations.

SkillOpt reframes the instruction or system prompt as a trainable document rather than a fixed text string. The method treats a structured Markdown file as the object to be optimized: sections, headings, examples and procedural steps are formalized and adjusted through an automated optimization loop until the desired behaviors emerge. Because the system modifies the instruction document rather than model parameters, the same trained Markdown can be deployed against a vanilla GPT-5.5 instance at runtime.
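To make the "document as the object to be optimized" idea concrete, here is a minimal Python sketch of how a Markdown instruction file could be split into editable sections and rendered back into text. The `Section` type and the `split_sections` and `render` helpers are hypothetical names chosen for illustration; the release does not describe SkillOpt's internal representation.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches ATX headings such as "# Role" or "## Steps".
# Simplification: headings inside fenced code blocks are not special-cased.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Section:
    level: int   # number of leading '#' characters (0 for text before any heading)
    title: str   # heading text without the '#' prefix
    body: str    # everything up to the next heading, stripped


def split_sections(markdown: str) -> List[Section]:
    """Split a Markdown document into one Section per heading."""
    sections: List[Section] = []
    level, title, body = 0, "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the section collected so far, if it has any content.
            if title or any(s.strip() for s in body):
                sections.append(Section(level, title, "\n".join(body).strip()))
            level, title, body = len(match.group(1)), match.group(2).strip(), []
        else:
            body.append(line)
    if title or any(s.strip() for s in body):
        sections.append(Section(level, title, "\n".join(body).strip()))
    return sections


def render(sections: List[Section]) -> str:
    """Turn a list of sections back into Markdown text."""
    chunks = []
    for section in sections:
        if section.title:
            chunks.append("#" * section.level + " " + section.title)
        if section.body:
            chunks.append(section.body)
    return "\n\n".join(chunks) + "\n"


doc = "# Role\nYou are a careful assistant.\n\n## Steps\n1. Read the task.\n2. Answer.\n"
sections = split_sections(doc)
print([s.title for s in sections])
```

With a representation like this, an optimizer can rewrite, reorder, or drop individual sections and still emit a plain Markdown file that a person can read and edit.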

How SkillOpt works

The core idea is to parameterize a human-readable instruction document and search its space for configurations that produce better outputs. The team starts with a base Markdown instruction that contains explicit task definitions, role framing, and sample interactions. An optimizer then proposes edits to document elements and evaluates the resulting outputs from the target model on a collection of labeled or proxy tasks. Feedback from those evaluations guides further edits.
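The propose, evaluate, keep loop described above can be sketched as a simple greedy search. This is an illustration under stated assumptions, not the team's optimizer: `optimize_document`, `toy_propose_edit`, `toy_score` and `SNIPPETS` are hypothetical, and the toy scorer stands in for the expensive step of querying the target model and grading its outputs.

```python
import random
from typing import Callable

# Hypothetical instruction lines an edit proposal may add to the document.
SNIPPETS = [
    "Break the task into numbered steps.",
    "State your role before answering.",
    "Cite the tool output you used.",
]


def toy_propose_edit(doc: str, rng: random.Random) -> str:
    """Propose one edit: append a snippet, or delete a random line."""
    lines = doc.splitlines()
    if len(lines) <= 1 or rng.random() < 0.7:
        lines.append(rng.choice(SNIPPETS))
    else:
        lines.pop(rng.randrange(len(lines)))
    return "\n".join(lines)


def toy_score(doc: str) -> float:
    """Stand-in for model evaluation: reward useful lines, penalize length."""
    lines = doc.splitlines()
    present = set(lines)
    return sum(s in present for s in SNIPPETS) - 0.1 * len(lines)


def optimize_document(
    base_doc: str,
    propose_edit: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    steps: int = 50,
    seed: int = 0,
) -> str:
    """Greedy search: keep a candidate only if it scores strictly higher."""
    rng = random.Random(seed)
    best_doc, best_score = base_doc, score(base_doc)
    for _ in range(steps):
        candidate = propose_edit(best_doc, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_doc, best_score = candidate, candidate_score
    return best_doc


base = "# Instructions"
best = optimize_document(base, toy_propose_edit, toy_score, steps=200)
print(best)
```

In a real setup, `score` would run the candidate document against the task collection through the target model, which is where the query cost of the method comes from.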

The approach supports both automated edit proposals and human-in-the-loop adjustments. In experiments, the researchers used batch query evaluations of GPT-5.5 to score candidate Markdown variants on instruction-following fidelity, task success and safety constraints. The final artifact is a trained Markdown file that a developer can include in the model context as the instruction layer, producing improved behavior without fine-tuning model weights.
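One way to fold the three criteria mentioned above into a single number per candidate is sketched below. The release does not say how the scores were combined, so the weighting, the hard safety gate, and the names `TaskResult` and `candidate_score` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    """Graded outcome of one evaluation task for one candidate document."""
    followed_instructions: bool
    task_succeeded: bool
    safety_violation: bool


def candidate_score(
    results: List[TaskResult],
    max_violation_rate: float = 0.0,
    w_follow: float = 0.5,
    w_success: float = 0.5,
) -> float:
    """Weighted mean of fidelity and success, gated by a safety threshold."""
    if not results:
        raise ValueError("need at least one task result")
    n = len(results)
    violation_rate = sum(r.safety_violation for r in results) / n
    # Assumption: safety is a hard constraint, so a candidate that exceeds
    # the allowed violation rate is rejected outright.
    if violation_rate > max_violation_rate:
        return float("-inf")
    follow_rate = sum(r.followed_instructions for r in results) / n
    success_rate = sum(r.task_succeeded for r in results) / n
    return w_follow * follow_rate + w_success * success_rate


clean_batch = [
    TaskResult(True, True, False),
    TaskResult(True, False, False),
]
unsafe_batch = clean_batch + [TaskResult(True, True, True)]
print(candidate_score(clean_batch), candidate_score(unsafe_batch))
```

Treating safety as a gate rather than a weighted term keeps a candidate from trading safety violations for task success; a weighted penalty would be an equally plausible design.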

Because SkillOpt operates at the instruction level, it leaves the underlying model checkpoints untouched and reduces the need for expensive retraining cycles. It also keeps the instructions in a human-auditable format that can be reviewed and edited after optimization. The trade-off is that the optimization phase relies on repeated model queries.

Benchmark results and limitations

The team reports that applying SkillOpt to GPT-5.5 yielded consistent gains across several internal benchmarks used to measure instruction adherence, multi-step problem solving, and agent-oriented tasks such as web retrieval and tool use. Improvements were most pronounced on tasks that benefit from clearer task decomposition and role framing within the instruction document.

Limitations remain. The optimization process requires many model queries, which can be costly for large models. Gains may also depend on the quality and representativeness of the evaluation tasks used during optimization. The trained Markdown is task-specific: a document tuned for one class of tasks may not transfer well to unrelated tasks without further optimization. It also remains an open question whether optimizing for proxy metrics inadvertently encourages surface-level fixes that do not generalize to real-world user queries.
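A common guard against overfitting to the optimization tasks is to hold some tasks back and compare scores afterward. The sketch below assumes that practice; the release does not say whether the team used a held-out split, and `split_tasks` and `generalization_gap` are hypothetical helpers.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_tasks(
    tasks: Sequence, holdout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle tasks and reserve a fraction that the optimizer never sees."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[cut:], shuffled[:cut]


def generalization_gap(
    score_fn: Callable[[str, List], float],
    doc: str,
    optimize_tasks: List,
    holdout_tasks: List,
) -> float:
    """Score on optimization tasks minus score on held-out tasks.

    A large positive gap suggests the document was tuned to the specific
    tasks used during optimization rather than to the behavior itself.
    """
    return score_fn(doc, optimize_tasks) - score_fn(doc, holdout_tasks)


tasks = list(range(10))  # placeholder task IDs
optimize_tasks, holdout_tasks = split_tasks(tasks)
print(len(optimize_tasks), len(holdout_tasks))
```

Here `score_fn` would be whatever evaluation the optimizer already uses, applied once to each task set for the final document.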

The research partners named in the release include Microsoft and researchers from Tsinghua University, Peking University and Zhejiang University. The team released examples of trained Markdown files and described workflows for both automated and human-guided refinement, enabling other developers to evaluate SkillOpt-style instruction optimization on their own tasks.

Why it matters

SkillOpt signals a shift toward treating instructions as first-class, trainable artifacts that can be improved without touching model weights, lowering the entry cost to customize large models. That matters for teams that need safer or more reliable behavior quickly, and for auditors who want readable artifacts to inspect changes in model behavior. It also raises practical trade-offs: optimization can be cheaper than retraining but still requires substantial inference budget and careful validation to avoid overfitting to proxy metrics.

GPT-5.5 baseline versus GPT-5.5 with SkillOpt (summary)

Item	GPT-5.5 baseline	GPT-5.5 with SkillOpt
Instruction-following score	Baseline	+6 to +12 points (relative improvement reported)
Agent task success rate	Baseline	+5 to +15 percentage points (task dependent)
Safety violations	Observed at baseline rate	Reduced in many test cases, not eliminated
Transfer to unrelated tasks	Limited	Requires re-optimization or tuning

Primary source

The Decoder

the-decoder.com
