Retrieval-Augmented Models · July 2, 2026 · 4 min read

PPRO: Personalized Retrieval for Long-Term Conversational Memory

PPRO uses user profiles and a trained query rewriter to make memory retrieval user-aware in long-term conversational agents.

The Brieftide · July 2, 2026

TL;DR

1. PPRO uses user profiles and a trained query rewriter to make memory retrieval user-aware in long-term conversational agents.
2. PPRO builds episodic and semantic memory banks from dialogue histories, derives a user profile from accumulated memories, and uses that profile as a personalized prior when ranking retrieved evidence.
3. The system then trains a query rewriter using Group Relative Policy Optimization, with the memory banks and the answer model held fixed.

ZhiShu Jiang and eight coauthors published arXiv:2607.00017 (submitted 28 May 2026; revised 2 Jul 2026), proposing Profile-guided Personalized Retrieval Optimization, or PPRO, a retrieval-centric framework that makes long-term conversational memory recall user-aware and optimizable.

How does PPRO work?

PPRO builds episodic and semantic memory banks from dialogue histories, derives a user profile from accumulated memories, and uses that profile as a personalized prior when ranking retrieved evidence. The system then trains a query rewriter using Group Relative Policy Optimization, with the memory banks and the answer model held fixed. In short: dialogue histories -> episodic/semantic memory banks -> derived user profile; profile and rewritten queries guide ranking and retrieval while training optimizes retrieval for downstream answers.
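The data flow described above can be made concrete with a toy sketch. Everything in it is an assumption for illustration rather than the paper's implementation: the `Memory` class, the bag-of-words similarity standing in for a real encoder, the `derive_profile` rule, and the mixing weight `alpha` are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str  # "episodic" or "semantic"
    text: str


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (a stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def derive_profile(memories: list[Memory]) -> Counter:
    """Hypothetical profile: pooled term counts over semantic memories."""
    profile: Counter = Counter()
    for m in memories:
        if m.kind == "semantic":
            profile.update(bow(m.text))
    return profile


def rank(query: str, memories: list[Memory], profile: Counter,
         alpha: float = 0.3) -> list[Memory]:
    """Order memories by a blend of query match and profile match.

    score = (1 - alpha) * sim(query, memory) + alpha * sim(profile, memory)
    `alpha` is an assumed knob for how strongly the profile prior counts.
    """
    q = bow(query)

    def score(m: Memory) -> float:
        v = bow(m.text)
        return (1 - alpha) * cosine(q, v) + alpha * cosine(profile, v)

    return sorted(memories, key=score, reverse=True)


if __name__ == "__main__":
    bank = [
        Memory("episodic", "user asked for a steak dinner recipe"),
        Memory("episodic", "user asked for a vegetarian dinner recipe"),
        Memory("semantic", "user is vegetarian"),
    ]
    profile = derive_profile(bank)
    for m in rank("dinner recipe", bank, profile):
        print(m.kind, "|", m.text)
```

In this toy bank the two episodic memories match the query equally well, so the profile term is what breaks the tie in favor of the vegetarian recipe. That is the kind of effect a personalized prior is meant to have, though the paper's actual scoring function may look quite different.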

PPRO explicitly separates memory storage from retrieval optimization. The paper describes two memory stores, an extracted user profile formed from those memories, a profile-guided ranking step that biases retrieval toward stable user attributes and preferences, and a retrieval-oriented query rewriter trained with reinforcement-style feedback that uses both evidence retrieval quality and downstream answer quality as rewards.
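One way to picture a reward that mixes evidence retrieval quality with downstream answer quality is the sketch below. The specific metrics (recall over gold evidence IDs, token-level F1 against a reference answer) and the weight `w_retrieval` are assumptions chosen for illustration; this brief does not state how the paper defines or combines the two signals.

```python
from collections import Counter


def evidence_recall(retrieved_ids, gold_ids):
    """Fraction of gold evidence items that appear among the retrieved ones."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(gold & set(retrieved_ids)) / len(gold)


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def combined_reward(retrieved_ids, gold_ids, prediction, reference,
                    w_retrieval=0.5):
    """Weighted sum of retrieval quality and answer quality.

    `w_retrieval` is a hypothetical mixing weight, not a value from the paper.
    """
    r_retrieval = evidence_recall(retrieved_ids, gold_ids)
    r_answer = token_f1(prediction, reference)
    return w_retrieval * r_retrieval + (1 - w_retrieval) * r_answer


if __name__ == "__main__":
    reward = combined_reward(
        retrieved_ids=["m1", "m3"],
        gold_ids=["m1", "m2"],
        prediction="paris",
        reference="paris",
    )
    print(reward)
```

A rewrite that retrieves only half the gold evidence but still leads to a correct answer earns partial credit here, which shows why rewarding both signals can discourage rewrites that get lucky on the answer without surfacing the right memories.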

What datasets and results did the paper report?

The authors evaluated PPRO on the LoCoMo and LongMemEval-S datasets and found consistent gains over both training-free memory systems and training-based baselines. The paper reports ablation studies in which profile-guided ranking and retrieval-oriented rewriting each contribute substantially to the observed improvements. The experiments therefore attribute performance gains to retrieval optimization rather than to changes in memory storage or the answer model.

The evaluation design keeps memory banks and the answer model fixed while training only the query rewriter with Group Relative Policy Optimization, letting the authors isolate the impact of retrieval-focused changes on downstream answer quality.
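For readers unfamiliar with Group Relative Policy Optimization, its core idea is to sample a group of candidate outputs for the same input and score each one relative to the group, rather than against a learned value function. The sketch below shows only that advantage computation in its commonly published form (reward minus group mean, divided by group standard deviation). Applying it to a group of query rewrites, and the `eps` stabilizer, are assumptions here; the paper's exact variant is not described in this brief.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled rewrites.

    Each advantage is (reward - group mean) / (group std + eps), so rewrites
    that beat their siblings get a positive signal and the rest a negative one.
    Requires at least one reward; `eps` avoids division by zero when all
    rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for three rewrites of the same user question.
    rewards = [0.2, 0.8, 0.5]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because the advantages are centered on the group mean, a group in which every rewrite scores the same yields no learning signal, which is one reason the reward needs to distinguish good rewrites from mediocre ones.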

Why it matters

PPRO shifts personalization work from static similarity or fixed ranking rules to an explicit, trainable retrieval layer that conditions on a derived user profile. That matters because memory usefulness depends on recalling the right evidence for the right user: by making ranking user-aware and optimizing the query rewriter for downstream answers, the approach targets the retrieval step that researchers and deployers commonly leave unoptimized. The paper frames retrieval optimization as a lever for personalization in long-term conversational agents without altering stored memories or the answer model.

This has practical implications for teams who already maintain compact memory banks: they can attempt retrieval-centric training (a query rewriter and profile-guided ranker) to improve personalized recall while keeping existing memory and answer infrastructure unchanged.

What to watch

Watch for code, data, or public implementations linked to arXiv:2607.00017 so others can reproduce PPRO's reported gains on LoCoMo and LongMemEval-S. Also watch whether follow-up work extends Group Relative Policy Optimization to more diverse user-profile signals or different memory bank designs.

Technical details and provenance: the paper is titled "Learning User-Aware Recall: Personalized Retrieval in Long-Term Conversational Memory," and lists authors ZhiShu Jiang, Haibo Liu, Xin Shen, Guanqiang QI, Chenxi Miao, Weikang Li, Liwei Qian, Xin Pei, and Jizhou Huang. The submission history on arXiv shows v1 on 28 May 2026 and a revision (v2) on 2 Jul 2026. The reported datasets are LoCoMo and LongMemEval-S, and the core training method is Group Relative Policy Optimization.

For readers evaluating retrieval choices, the paper offers a concretely described alternative: keep memory storage and answer models stable, derive a user profile from accumulated memories, and optimize retrieval behavior via a trained query rewriter and profile-guided ranker.

Figure: PPRO system components and data flow

Written by The Brieftide · Source: arXiv
