Enterprise AI Adoption · 4 min read

ProfiLLM: DiDi's LLM pipeline boosts dispatch AUC and GMV

Agentic LLM pipeline extracts reusable profiles with 27 analytical tools and yields up to +6.14% AUC and +4.35% GMV in DiDi tests.

The Brieftide

TL;DR

  • 01 Agentic LLM pipeline extracts reusable profiles with 27 analytical tools and yields up to +6.14% AUC and +4.35% GMV in DiDi tests.
  • 02 ProfiLLM is an agentic LLM data pipeline deployed on DiDi's production dispatcher that turns platform-scale behavioral logs into utility-aligned user profiles for matching.
  • 03 The system combines two modules and produced up to a +6.14% relative AUC lift offline, up to +4.35% GMV gain in dispatching simulation, and consistent gains in a 14-day online A/B test.

ProfiLLM is an agentic LLM data pipeline deployed on DiDi's production dispatcher that turns platform-scale behavioral logs into utility-aligned user profiles for matching. The system combines two modules: one mines global knowledge from platform data, and the other explores and ranks candidate profiles by downstream utility. It produced up to a +6.14% relative AUC lift offline, up to +4.35% GMV gain in dispatching simulation, and consistent gains in a 14-day online A/B test, including +0.47% GMV and +0.33% Completion Rate.

What is ProfiLLM and how does it work?

ProfiLLM is an agentic pipeline made of two modules: Tool-Augmented Global Knowledge Mining and Utility-Aligned Profile Exploration. The first equips an LLM agent with 27 analytical tools to mine platform-scale data and produce reusable global knowledge, adaptive user clustering rules, and region-level supply-demand priors. The second generates multiple candidate profiles per cluster, evaluates them with a lightweight downstream utility proxy, iteratively refines top candidates, and constructs preference pairs for DPO fine-tuning.
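The score-then-pair step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `Candidate`, `build_preference_pairs`, `min_margin`, and the toy scores are hypothetical, and the utility proxy is a stand-in callable rather than DiDi's actual proxy model.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one candidate profile of a user cluster."""
    cluster_id: str
    profile: str  # natural-language profile text


def build_preference_pairs(
    candidates: List[Candidate],
    utility_proxy: Callable[[Candidate], float],
    min_margin: float = 0.01,
) -> List[Tuple[Candidate, Candidate]]:
    """Return (chosen, rejected) pairs whose proxy scores differ by at least min_margin."""
    # Score every candidate once, then sort from best to worst.
    scored = sorted(
        ((utility_proxy(c), c) for c in candidates),
        key=lambda item: item[0],
        reverse=True,
    )
    pairs = []
    # combinations() preserves order, so the first element always scores >= the second.
    for (hi_score, chosen), (lo_score, rejected) in combinations(scored, 2):
        if hi_score - lo_score >= min_margin:
            pairs.append((chosen, rejected))
    return pairs


# Toy proxy scores (assumed values, not from the paper).
toy_scores = {"a": 0.71, "b": 0.64, "c": 0.705}
cands = [Candidate("cluster_0", name) for name in toy_scores]
pairs = build_preference_pairs(cands, lambda c: toy_scores[c.profile])
for chosen, rejected in pairs:
    print(chosen.profile, ">", rejected.profile)
```

The margin threshold is a design choice of this sketch: pairs whose scores are nearly tied carry little preference signal, so they are skipped. The resulting (chosen, rejected) pairs have the shape that DPO training consumes, once each is attached to its prompt.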

The paper frames three production constraints the pipeline addresses: raw logs exceed any LLM context window by orders of magnitude, most users sit in the long tail with too few interactions for per-user profiling, and fluent-sounding profiles do not necessarily yield downstream utility. ProfiLLM operationalizes reusable knowledge and cluster-level profiling to sidestep those limits while providing a path to utility-guided model tuning.

How much did ProfiLLM improve DiDi's dispatch metrics?

In offline and online tests, ProfiLLM produced measurable gains across multiple evaluation modes. The system achieved up to +6.14% relative AUC improvement in outcome prediction and up to +4.35% GMV gain in dispatching simulation. In a 14-day online A/B test, the deployment produced consistent improvements: +0.47% GMV, +0.33% Completion Rate, and a -0.82% change in Cancel-Before-Accept rate, where a lower rate is better.

Those figures are the paper's reported outcomes after integrating LLM-derived, utility-aligned profiles into DiDi's production dispatcher. The evaluation combined simulation and live experimentation to show both prediction-lift (AUC) and business-impact metrics (GMV, completion, cancellation).
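Because the AUC figure is a relative lift rather than an absolute difference, it helps to see the arithmetic. The sketch below is illustrative only: the function names and the baseline AUC of 0.700 are assumptions, since the brief does not state the paper's baseline values.

```python
def relative_lift(baseline: float, treated: float) -> float:
    """Relative change from baseline to treated, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline * 100.0


def apply_relative_lift(baseline: float, lift_pct: float) -> float:
    """Value implied by applying a relative lift (in percent) to a baseline."""
    return baseline * (1.0 + lift_pct / 100.0)


# Hypothetical baseline AUC; NOT a number from the paper.
hypothetical_baseline_auc = 0.700
implied_auc = apply_relative_lift(hypothetical_baseline_auc, 6.14)
print(f"implied AUC: {implied_auc:.4f}")
print(f"round trip: {relative_lift(hypothetical_baseline_auc, implied_auc):.2f}%")
```

On that assumed baseline, a +6.14% relative lift corresponds to an absolute AUC gain of roughly 0.043, which is why relative and absolute figures should not be compared directly.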

How does ProfiLLM handle scale, context limits and long-tail users?

ProfiLLM reduces per-request LLM context needs by mining platform-scale signals into global knowledge and cluster rules that are reusable across users and regions. The Tool-Augmented Global Knowledge Mining module supplies region-level priors and adaptive clustering rules so the system does not require full per-user histories inside an LLM prompt. Utility-Aligned Profile Exploration then works at the cluster level, producing and ranking candidate profiles using a lightweight utility proxy, which lets the pipeline focus on downstream impact rather than surface fluency.

The architecture therefore separates heavy-scale analytics from per-request profiling: the 27 analytical tools run to create global artifacts, and the profile exploration loop constructs concise, utility-tested profile candidates that are feasible to evaluate within production latency constraints.
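A rough sketch of the serving side of that split follows. It assumes the offline stage has already written three artifacts (ordered clustering rules, one profile per cluster, and region priors) and shows how a long-tail user with almost no history still receives a profile. Every name, rule, threshold, and value here is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class UserStats:
    """Hypothetical compact summary of a user, small enough to compute per request."""
    user_id: str
    region: str
    trips_30d: int
    cancel_rate: Optional[float]  # None when there is too little history


ClusterRule = Tuple[str, Callable[[UserStats], bool]]

# Assumed offline artifacts; in this sketch they are plain in-memory tables.
RULES: List[ClusterRule] = [
    # Checked first, so later rules can rely on cancel_rate being present.
    ("sparse_history", lambda u: u.trips_30d < 3 or u.cancel_rate is None),
    ("frequent_low_cancel", lambda u: u.trips_30d >= 20 and u.cancel_rate < 0.05),
    ("default", lambda u: True),
]
CLUSTER_PROFILES: Dict[str, str] = {
    "sparse_history": "New or occasional rider; little behavioral evidence.",
    "frequent_low_cancel": "Regular rider who rarely cancels.",
    "default": "Rider with mixed or moderate activity.",
}
REGION_PRIORS: Dict[str, float] = {"region_a": 1.3, "region_b": 0.8}


def assign_cluster(user: UserStats, rules: List[ClusterRule] = RULES) -> str:
    """Return the name of the first rule that matches the user."""
    for name, predicate in rules:
        if predicate(user):
            return name
    raise ValueError("no rule matched; the rule list needs a catch-all")


def serving_features(user: UserStats) -> Dict[str, object]:
    """Look up precomputed artifacts; no LLM call happens on this path."""
    cluster = assign_cluster(user)
    return {
        "cluster": cluster,
        "profile": CLUSTER_PROFILES[cluster],
        "region_prior": REGION_PRIORS.get(user.region, 1.0),
    }


long_tail_user = UserStats("u1", "region_a", trips_30d=1, cancel_rate=None)
regular_user = UserStats("u2", "region_c", trips_30d=25, cancel_rate=0.02)
print(serving_features(long_tail_user))
print(serving_features(regular_user))
```

The point of the sketch is that the request path reduces to rule evaluation and table lookups, which is what makes cluster-level profiles plausible under dispatch latency budgets.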

Why it matters

ProfiLLM shows a concrete engineering path to use LLMs not for raw, large-context ingestion at inference time but as agents that mine and compress platform-scale behavioral signals into reusable semantic features. The pipeline addresses three common barriers at once: context-window limits, long-tail sparsity, and the gap between fluent descriptions and predictive utility. The result is an approach that produced both offline AUC and live business metric improvements when integrated into an industrial dispatcher.

What to watch

Watch for broader rollouts beyond the reported 14-day A/B window and for follow-up work on how DPO fine-tuning with constructed preference pairs generalizes across regions and longer horizons. Also watch whether the 27-tool agent pattern becomes a reusable template for other matching systems that need millisecond-latency semantic features.

ProfiLLM data flow between modules and production dispatcher
[Flow diagram] Nodes: Platform behavioral logs; Tool-Augmented Global Knowledge Mining (LLM agent + 27 analytical tools); Global artifacts (reusable knowledge, clustering rules, region priors); Utility-Aligned Profile Exploration; Candidate profiles & preference pairs; DPO fine-tuning; DiDi production dispatcher. Edge labels: mine at scale; produce reusable artifacts; seed clusters & priors; generate & evaluate candidates; construct preference pairs; deploy fine-tuned model / features; serve lightweight profiles.

Written by The Brieftide · Source: arXiv
