Retrieval-Augmented Models · 4 min read

PPRO: Personalized Retrieval for Long-Term Conversational Memory

PPRO uses user profiles and a trained query rewriter to make memory retrieval user-aware in long-term conversational agents.

The Brieftide

TL;DR

  • PPRO uses user profiles and a trained query rewriter to make memory retrieval user-aware in long-term conversational agents.
  • PPRO builds episodic and semantic memory banks from dialogue histories, derives a user profile from accumulated memories, and uses that profile as a personalized prior when ranking retrieved evidence.
  • The system then trains a query rewriter using Group Relative Policy Optimization, with the memory banks and the answer model held fixed.

ZhiShu Jiang and eight coauthors published arXiv:2607.00017 (submitted 28 May 2026; revised 2 Jul 2026), proposing Profile-guided Personalized Retrieval Optimization, or PPRO, a retrieval-centric framework that makes long-term conversational memory recall user-aware and optimizable.

How does PPRO work?

PPRO builds episodic and semantic memory banks from dialogue histories, derives a user profile from accumulated memories, and uses that profile as a personalized prior when ranking retrieved evidence. The system then trains a query rewriter using Group Relative Policy Optimization, with the memory banks and the answer model held fixed. In short: dialogue histories -> episodic/semantic memory banks -> derived user profile; profile and rewritten queries guide ranking and retrieval while training optimizes retrieval for downstream answers.
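The data flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Memory` class, the `kind` labels, the `tags` field, and the tag-counting heuristic in `derive_profile` are all hypothetical and exist only to make the pipeline concrete.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str          # "episodic" (an event) or "semantic" (a stable fact); labels are assumed
    text: str
    tags: tuple = ()   # hypothetical topic tags attached when the memory is extracted


def build_banks(memories):
    """Split extracted memories into an episodic bank and a semantic bank."""
    banks = {"episodic": [], "semantic": []}
    for m in memories:
        banks[m.kind].append(m)
    return banks


def derive_profile(banks, top_k=3):
    """Toy profile: the most frequent tags across both banks.

    A stand-in for whatever profile extraction the paper actually uses.
    """
    counts = Counter(tag for bank in banks.values() for m in bank for tag in m.tags)
    return [tag for tag, _ in counts.most_common(top_k)]


memories = [
    Memory("episodic", "Went hiking in the Alps last weekend.", ("hiking", "travel")),
    Memory("semantic", "Prefers vegetarian food.", ("food",)),
    Memory("episodic", "Booked a hiking trip for June.", ("hiking",)),
]
banks = build_banks(memories)
profile = derive_profile(banks, top_k=1)
```

Here `profile` summarizes what recurs across the stored memories, which is the role the derived user profile plays before it is used as a prior at ranking time.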

PPRO explicitly separates memory storage from retrieval optimization. The paper describes two memory stores, a user profile extracted from those memories, a profile-guided ranking step that biases retrieval toward stable user attributes and preferences, and a retrieval-oriented query rewriter trained with reinforcement-style feedback that uses both evidence retrieval quality and downstream answer quality as rewards.
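One way to picture a profile-guided ranking step is as a blend of ordinary query similarity with a profile prior. The sketch below assumes that form; the linear mix, the `prior_weight` parameter, and the two-dimensional toy vectors are illustrative choices, not details taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def profile_guided_rank(query_vec, profile_vec, candidates, prior_weight=0.3):
    """Rank (text, vector) candidates by query similarity plus a profile prior.

    prior_weight is a hypothetical knob: 0.0 ignores the profile entirely.
    """
    scored = []
    for text, vec in candidates:
        score = ((1 - prior_weight) * cosine(query_vec, vec)
                 + prior_weight * cosine(profile_vec, vec))
        scored.append((score, text))
    return [text for _, text in sorted(scored, key=lambda s: s[0], reverse=True)]


candidates = [
    ("Asked about steakhouse hours.", [1.0, 0.0]),
    ("Prefers vegetarian food.", [0.8, 0.6]),
]
query_vec = [1.0, 0.0]    # toy "restaurant recommendation" direction
profile_vec = [0.0, 1.0]  # toy "vegetarian" direction

baseline = profile_guided_rank(query_vec, profile_vec, candidates, prior_weight=0.0)
ranked = profile_guided_rank(query_vec, profile_vec, candidates, prior_weight=0.5)
```

With the prior switched off, the steakhouse memory ranks first on raw similarity. With the prior on, the memory that matches the user's stable preference moves to the top.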

What datasets and results did the paper report?

The authors evaluated PPRO on the LoCoMo and LongMemEval-S datasets and report consistent gains over both training-free memory systems and training-based baselines. According to the paper's ablation studies, profile-guided ranking and retrieval-oriented rewriting each contribute substantially to the observed improvements. The experiments therefore attribute performance gains to retrieval optimization rather than to changes in memory storage or the answer model.

The evaluation design keeps memory banks and the answer model fixed while training only the query rewriter with Group Relative Policy Optimization, letting the authors isolate the impact of retrieval-focused changes on downstream answer quality.
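Group Relative Policy Optimization scores each sampled output relative to the other samples drawn for the same input, which removes the need for a separate value model. The sketch below shows that normalization step applied to rewritten queries. It assumes a reward that linearly mixes the two signals the paper names (evidence retrieval quality and downstream answer quality); the `retrieval_weight` parameter, the toy scores, and the use of the population standard deviation are assumptions, not values reported by the authors.

```python
from statistics import mean, pstdev


def combined_reward(retrieval_score, answer_score, retrieval_weight=0.5):
    """Hypothetical reward: a weighted mix of retrieval and answer quality."""
    return retrieval_weight * retrieval_score + (1 - retrieval_weight) * answer_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four candidate rewrites of the same query, each scored as
# (retrieval quality, answer quality) with toy values in [0, 1].
group = [(1.0, 1.0), (0.5, 1.0), (0.5, 0.0), (0.0, 0.0)]
rewards = [combined_reward(r, a) for r, a in group]
advantages = group_relative_advantages(rewards)
```

Rewrites that beat the group average get a positive advantage and are reinforced, while the rest are pushed down. Because only the rewriter's outputs are scored, the memory banks and answer model can stay frozen throughout.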

Why it matters

PPRO shifts personalization work from static similarity or fixed ranking rules to an explicit, trainable retrieval layer that conditions on a derived user profile. That matters because memory usefulness depends on recalling the right evidence for the right user: by making ranking user-aware and optimizing the query rewriter for downstream answers, the approach targets the retrieval step that researchers and deployers commonly leave unoptimized. The paper frames retrieval optimization as a lever for personalization in long-term conversational agents without altering stored memories or the answer model.

This has practical implications for teams who already maintain compact memory banks: they can attempt retrieval-centric training (a query rewriter and profile-guided ranker) to improve personalized recall while keeping existing memory and answer infrastructure unchanged.
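To make the "keep everything else unchanged" idea concrete, the sketch below treats the rewriter as the only swappable part of a pipeline whose retriever and answer function are fixed. Every name here (`answer_with_memory`, `retrieve`, `answer`, `toy_rewriter`) is hypothetical, and the word-overlap retriever is a deliberately crude stand-in for a real memory system.

```python
MEMORY = ["Lives in Lyon.", "Owns a bicycle.", "Prefers vegetarian food."]


def retrieve(query, top_k):
    """Fixed toy retriever: rank memories by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        MEMORY,
        key=lambda m: len(q & set(m.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:top_k]


def answer(question, evidence):
    """Fixed toy answer model: return the top piece of evidence."""
    return evidence[0] if evidence else "unknown"


def answer_with_memory(question, rewrite, retrieve, answer, top_k=2):
    """Run the pipeline; only `rewrite` differs between configurations."""
    query = rewrite(question)
    evidence = retrieve(query, top_k)
    return answer(question, evidence)


def identity(question):
    return question


def toy_rewriter(question):
    # Stand-in for a trained rewriter that injects profile-relevant terms.
    return question + " vegetarian food"


question = "What should I cook tonight?"
before = answer_with_memory(question, identity, retrieve, answer)
after = answer_with_memory(question, toy_rewriter, retrieve, answer)
```

Swapping `identity` for `toy_rewriter` changes which memory surfaces without touching `MEMORY`, `retrieve`, or `answer`, which is the kind of drop-in change the retrieval-centric approach implies.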

What to watch

Watch for code, data, or public implementations linked to arXiv:2607.00017 so others can reproduce PPRO's reported gains on LoCoMo and LongMemEval-S. Also watch whether follow-up work extends Group Relative Policy Optimization to more diverse user-profile signals or different memory bank designs.

Technical details and provenance: the paper is titled "Learning User-Aware Recall: Personalized Retrieval in Long-Term Conversational Memory," and lists authors ZhiShu Jiang, Haibo Liu, Xin Shen, Guanqiang QI, Chenxi Miao, Weikang Li, Liwei Qian, Xin Pei, and Jizhou Huang. The submission history on arXiv shows v1 on 28 May 2026 and a revision (v2) on 2 Jul 2026. The reported datasets are LoCoMo and LongMemEval-S, and the core training method is Group Relative Policy Optimization.

For readers evaluating retrieval choices, the paper offers a concretely described alternative: keep memory storage and answer models stable, derive a user profile from accumulated memories, and optimize retrieval behavior via a trained query rewriter and profile-guided ranker.

[Figure: PPRO system components and data flow. Components shown: Dialogue Histories, Episodic Memory Bank, Semantic Memory Bank, Derived User Profile, Query Rewriter (trained), Profile-Guided Ranker, Answer Model (fixed).]

Written by The Brieftide · Source: arXiv
