
Generative Retrieval MO-DiT+HPPO: arXiv paper and results

MO-DiT+HPPO pairs a Diffusion Transformer with metric-ordered sequence training and hybrid-policy preference optimization for generative retrieval.

The Brieftide

TL;DR

  • MO-DiT+HPPO pairs a Diffusion Transformer with metric-ordered sequence training and hybrid-policy preference optimization for generative retrieval.
  • The work targets what the authors call "pattern-preserving attribute retrieval," where returned items must both satisfy a target attribute and stay within a fine-grained seed pattern.
  • MO-DiT+HPPO is a staged continuous generative retrieval pipeline that reads sequences of item embeddings and generates query embeddings for nearest-neighbor search.

Chenghao Liu and 10 co-authors posted a paper to arXiv on 25 Jun 2026 (arXiv:2606.26899) that introduces MO-DiT+HPPO, a staged framework for continuous generative retrieval built around a Diffusion Transformer and a hybrid preference-optimization procedure. The work targets what the authors call "pattern-preserving attribute retrieval," where returned items must both satisfy a target attribute and stay within a fine-grained seed pattern.

What is MO-DiT+HPPO and how does it work?

MO-DiT+HPPO is a staged continuous generative retrieval pipeline that reads sequences of item embeddings and generates query embeddings for nearest-neighbor search. The framework includes raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and a final Hybrid-Policy Preference Optimization (HPPO) stage.
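
To make the retrieval step concrete, here is a minimal sketch of how a generated query embedding could drive nearest-neighbor search over item embeddings. It is an illustration only, not the paper's implementation: the `cosine` and `nearest_neighbors` helpers, the toy 2-D embeddings, and the use of cosine similarity are all assumptions, since the source does not say which distance or index the authors use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, items, k):
    """Return the ids of the k items most similar to the query embedding."""
    ranked = sorted(items, key=lambda item_id: cosine(query, items[item_id]), reverse=True)
    return ranked[:k]

# Hypothetical catalog of item embeddings (toy values, not from the paper).
items = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

# Stand-in for an embedding produced by the generative model.
generated_query = [1.0, 0.05]
top2 = nearest_neighbors(generated_query, items, k=2)
```

In this toy setup the query lands closest to items `a` and `b`. A production system would likely replace the exhaustive sort with an approximate nearest-neighbor index.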

Metric-ordered training converts sparse online retrieval labels into in-pattern trajectories ordered from low to high predicted attribute density, teaching a single model the metric-improvement direction across domains. HPPO aligns the generated query distribution with the online objective by labeling a hybrid candidate pool with the online intersection metric and applying reference-anchored preference optimization. A Pareto pair filter keeps only winner pairs that do not lower same-pattern purity, aiming to raise the attribute metric without sacrificing pattern fidelity.
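
The two ideas in this paragraph can be sketched in a few lines. The code below is a hedged reading of the description, not the authors' code: `metric_ordered_trajectory`, `pareto_pair_filter`, the `metric` and `purity` field names, and all numeric values are hypothetical. In particular, it assumes "do not lower same-pattern purity" means the winner's purity is at least the loser's.

```python
def metric_ordered_trajectory(item_ids, predicted_density):
    """Order in-pattern items from low to high predicted attribute density."""
    return sorted(item_ids, key=lambda item_id: predicted_density[item_id])

def pareto_pair_filter(pairs):
    """Keep (winner, loser) pairs where the winner has a higher attribute
    metric and its same-pattern purity is not lower than the loser's."""
    kept = []
    for winner, loser in pairs:
        if winner["metric"] > loser["metric"] and winner["purity"] >= loser["purity"]:
            kept.append((winner, loser))
    return kept

# Hypothetical predicted attribute densities for three in-pattern items.
density = {"x": 0.7, "y": 0.2, "z": 0.5}
trajectory = metric_ordered_trajectory(["x", "y", "z"], density)

# Hypothetical candidate pairs scored on the attribute metric and purity.
pairs = [
    ({"metric": 0.6, "purity": 0.9}, {"metric": 0.4, "purity": 0.9}),  # kept
    ({"metric": 0.8, "purity": 0.5}, {"metric": 0.4, "purity": 0.9}),  # purity drops
]
kept_pairs = pareto_pair_filter(pairs)
```

The second pair has the larger metric gain but is discarded because the winner drifts out of pattern, which is the trade-off the filter is described as guarding against.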

How did the paper evaluate performance and what were the results?

The authors evaluated MO-DiT+HPPO across four attribute domains under item- and pattern-holdout protocols and measured improvement in the intersection metric. Metric-ordered DiT improved the intersection metric over a pretrained generative retriever, and HPPO improved it further, producing significant gains on seven of eight domain-split cells and a marginal tie on the hardest split.
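
The source does not define the intersection metric precisely. One plausible reading, given that returned items must both satisfy the attribute and stay in pattern, is the share of retrieved items that meet both conditions. The sketch below encodes that assumed definition; the function name, arguments, and toy data are hypothetical.

```python
def intersection_metric(retrieved, has_attribute, pattern_of, seed_pattern):
    """Fraction of retrieved items that carry the target attribute AND
    share the seed pattern (an assumed reading of the metric)."""
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for item_id in retrieved
        if has_attribute[item_id] and pattern_of[item_id] == seed_pattern
    )
    return hits / len(retrieved)

# Toy retrieval result with hypothetical labels.
retrieved = ["a", "b", "c", "d"]
has_attribute = {"a": True, "b": True, "c": False, "d": True}
pattern_of = {"a": "p1", "b": "p2", "c": "p1", "d": "p1"}

score = intersection_metric(retrieved, has_attribute, pattern_of, seed_pattern="p1")
```

Here item `b` has the attribute but the wrong pattern, and item `c` has the right pattern but lacks the attribute, so only two of the four results count.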

The paper also reports ablations and validations to trace the source of gains: metric-predictor validation, order ablations, CPT/SFT comparisons, and a candidate-policy ablation. Those experiments, the authors say, show where the improvements come from within the staged training and HPPO pipeline.

Why does this matter?

Pattern-preserving attribute retrieval describes a common production need where naive averaging or global attribute search fails: averaging seeds preserves pattern but yields low attribute scores, while global attribute retrieval drifts to unrelated patterns. MO-DiT+HPPO directly addresses the two-way tension by training a generative retriever to move along in-pattern trajectories toward higher attribute density and then aligning generation with the actual online metric. If reproduced and adopted, that approach could change how systems balance pattern fidelity against attribute targeting in recommendation and retrieval settings.

What to watch next?

Look for code and data releases linked from the arXiv entry or the authors' pages, plus replication on external datasets and live A/B evaluations that measure the online intersection metric. The paper lists several internal ablations; confirming those in independent implementations will show whether the seven-of-eight experimental improvements generalize beyond the reported domains.

Paper and provenance: "Generative Retrieval via Diffusion Transformer with Metric-Ordered Sequence Training and Hybrid-Policy Preference Optimization," Chenghao Liu, Yu Zhang, Zhongtao Jiang, Kun Xu, Zhenwei An, Renzhi Wang, Zhao Wang, Jiachen Zhang, Yuxiao Zhang, Kun Xu, Songfang Huang, arXiv:2606.26899, submitted 25 Jun 2026. DOI: https://doi.org/10.48550/arXiv.2606.26899.

MO-DiT+HPPO training and optimization stages
  1. Raw-sequence pretraining

    Initial training on raw item-embedding sequences to teach a generative retriever to read and produce embeddings.

  2. Multi-domain metric-ordered continuation pretraining

    Convert sparse retrieval labels into in-pattern trajectories ordered from low to high predicted attribute density.

  3. Tail-centroid fine-tuning

    Fine-tune the model to focus on tail-centroid representations within the target pattern.

  4. Hybrid-Policy Preference Optimization (HPPO)

    Label a hybrid candidate pool with the online intersection metric, apply reference-anchored preference optimization, and filter with a Pareto pair filter to preserve same-pattern purity.
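
The source does not give the HPPO loss itself. As a stand-in, the sketch below uses the widely known DPO-style objective, one common form of reference-anchored preference optimization, to show what "anchoring" to a reference model can look like. The function name, the `beta` value, and the log-probability inputs are assumptions, and the paper's actual objective for a diffusion model may differ.

```python
import math

def dpo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style loss for one (winner, loser) pair.

    logp_* are log-likelihoods under the policy being trained and
    ref_logp_* under a frozen reference model (assumed to be available).
    """
    # How much more the policy favors the winner than the reference does.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    x = beta * margin
    # -log(sigmoid(x)), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to the reference: the loss equals log(2).
loss_equal = dpo_style_loss(-1.0, -1.0, -1.0, -1.0)

# Policy favors the winner more than the reference does: the loss is lower.
loss_better = dpo_style_loss(-0.5, -2.0, -1.0, -1.0)
```

The reference terms keep the trained policy from drifting arbitrarily far from its starting point, because only improvements relative to the reference reduce the loss.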


Written by The Brieftide · Source: arXiv
