June 26, 2026 · 4 min read

PMDformer: Patch-Mean Transformer for Long-Term Forecasting

PMDformer decouples patch means and adds Trend Restoration Attention and Proximal Variable Attention to improve long-term forecasting.

The Brieftide · June 26, 2026

TL;DR

01 PMDformer decouples patch means and adds Trend Restoration Attention and Proximal Variable Attention to improve long-term forecasting.
02 Ao Hu and nine coauthors submitted PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting to arXiv on 25 June 2026 (arXiv:2606.26549).
03 PMDformer is a transformer-style model tailored to long-term time series forecasting, introduced in the arXiv submission by Ao Hu et al. on 25 June 2026.

Ao Hu and nine coauthors submitted PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting to arXiv on 25 June 2026 (arXiv:2606.26549). The paper proposes patch-mean decoupling (PMD) plus two attention modules, Trend Restoration Attention (TRA) and Proximal Variable Attention (PVA), and reports that PMDformer outperforms state-of-the-art methods across multiple long-term time series forecasting benchmarks.

What is PMDformer?

PMDformer is a transformer-style model tailored to long-term time series forecasting, introduced in the arXiv submission by Ao Hu et al. on 25 June 2026. It combines a patch-based preprocessing step with two bespoke attention modules to separate and then recombine trend and shape information so attention focuses on true shape similarity across long sequences.

The paper frames long-term time series forecasting as a domain where patch-based strategies help capture long-range dependencies but struggle with scale differences across patches and variables. PMDformer addresses that by isolating mean (trend) information from residual shapes inside patches so the model's attention better matches similar shapes rather than being dominated by scale.
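To see why scale can drown out shape, here is a minimal NumPy sketch, not taken from the paper. It uses toy patches and plain scaled dot-product attention; the function name `attention_weights` and all values are illustrative assumptions.

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    scores = keys @ query / np.sqrt(query.size)
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Toy patches (assumed values, for illustration only).
query = np.array([9.0, 10.0, 11.0, 12.0])       # rising ramp, high level
same_shape = np.array([1.0, 2.0, 3.0, 4.0])     # same rising ramp, low level
other_shape = np.array([12.0, 11.0, 10.0, 9.0]) # falling ramp, high level
keys = np.stack([same_shape, other_shape])

# Raw patches: the high-level key wins even though its shape is the opposite.
raw = attention_weights(query, keys)

# Mean-subtracted patches: the key with the matching shape wins.
center = lambda x: x - x.mean(axis=-1, keepdims=True)
residual = attention_weights(center(query), center(keys))

print("raw weights:     ", raw)
print("residual weights:", residual)
```

On the raw patches the falling ramp receives almost all of the weight because its level is close to the query's. Once each patch's mean is removed, the weight shifts to the patch that actually shares the query's shape.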

How does it work?

PMDformer first applies patch-mean decoupling, which the authors describe as a process that "separates the trend and residual shape information by subtracting the mean of each patch." After decoupling, the model routes information into two complementary attention mechanisms: Trend Restoration Attention and Proximal Variable Attention.
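The decoupling step as quoted can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' code: the function name `patch_mean_decouple` is hypothetical, and non-overlapping patches are assumed because the abstract does not specify the patching scheme.

```python
import numpy as np

def patch_mean_decouple(series, patch_len):
    """Split a 1-D series into patches and subtract each patch's mean.

    Assumes non-overlapping patches (an assumption, not stated in the source).
    Returns (means, residuals) with shapes (num_patches, 1) and
    (num_patches, patch_len).
    """
    series = np.asarray(series, dtype=float)
    if series.size % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    patches = series.reshape(-1, patch_len)
    means = patches.mean(axis=1, keepdims=True)  # per-patch trend (level)
    residuals = patches - means                  # per-patch shape
    return means, residuals

# Two patches with the same shape at very different levels.
series = np.array([1.0, 2.0, 3.0, 4.0, 101.0, 102.0, 103.0, 104.0])
means, residuals = patch_mean_decouple(series, patch_len=4)

# Adding the means back recovers the original series exactly.
restored = (means + residuals).ravel()
```

In this toy series the two patches differ by 100 in level but share an identical residual, which is the property that lets attention compare shapes directly. Because the means are kept rather than discarded, the original patches can be rebuilt without loss.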

Trend Restoration Attention, or TRA, reintegrates the decoupled trend while computing attention outputs so the model retains the overall level of the series alongside shape-focused attention. Proximal Variable Attention, or PVA, narrows cross-variable attention to the most relevant recent time segments, a design intended to avoid overfitting to outdated correlations. Together these components let PMDformer preserve original patch structure while guiding attention to meaningful shape similarities and recent inter-variable relationships.
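The abstract describes TRA and PVA only at a high level, so the following NumPy sketch is one possible reading and not the paper's implementation. It assumes no learned projections, computes TRA weights from residual patches while placing the means back into the values, and limits cross-variable attention to the last `recent` patches. The function names and the `recent` parameter are hypothetical.

```python
import numpy as np

def softmax(scores, axis=-1):
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def trend_restoring_attention(residuals, means):
    """Assumed TRA-like step for one variable.

    residuals: (num_patches, patch_len), means: (num_patches, 1).
    Weights come from shape similarity; values carry the level again.
    """
    d = residuals.shape[-1]
    weights = softmax(residuals @ residuals.T / np.sqrt(d))
    values = residuals + means  # trend added back into the values
    return weights @ values

def recent_cross_variable_attention(residuals, recent):
    """Assumed PVA-like step across variables.

    residuals: (num_vars, num_patches, patch_len).
    Only the last `recent` patches of each variable are compared.
    """
    num_vars = residuals.shape[0]
    tokens = residuals[:, -recent:, :].reshape(num_vars, -1)
    d = tokens.shape[-1]
    weights = softmax(tokens @ tokens.T / np.sqrt(d))  # (num_vars, num_vars)
    return weights @ tokens

# Toy multivariate input: 3 variables, 24 steps, patches of length 4.
rng = np.random.default_rng(0)
patches = rng.normal(size=(3, 24)).reshape(3, 6, 4)
means = patches.mean(axis=-1, keepdims=True)
residuals = patches - means

tra_out = trend_restoring_attention(residuals[0], means[0])     # (6, 4)
pva_out = recent_cross_variable_attention(residuals, recent=2)  # (3, 8)
```

Under these assumptions, each TRA output row has a level that is a weighted average of the patch means, so the overall level survives even though the weights were computed on shapes alone. The cross-variable step ignores everything before the last `recent` patches, so older correlations cannot influence its weights.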

The arXiv record lists the submission PDF at 568 KB, and the abstract links to the authors' code.

What did the experiments show?

The authors state that extensive experiments show PMDformer outperforming existing state-of-the-art methods in stability and accuracy across multiple long-term time series forecasting (LTSF) benchmarks. The abstract does not name specific benchmarks or report numeric scores; the comparative claim of improved stability and accuracy is its core empirical result.

By focusing attention on residual shapes and constraining cross-variable links to recent segments, PMDformer aims to reduce errors that come from scale mismatches and spurious long-ago correlations in multivariate forecasting tasks.

Why it matters

PMDformer targets two persistent weaknesses in transformer-based forecasting: scale differences across patches that mask shape similarity, and cross-variable attention that can latch onto stale correlations. If the model's approach to separating and then restoring trend information reliably improves attention fidelity, practitioners in energy management, finance, and traffic prediction could see more stable and accurate long-horizon forecasts. The design choices also speak directly to common operational concerns: preserving original signal structure and avoiding overfitting to outdated relationships.

What to watch

Watch for community evaluations built on the authors' linked code and for registration of the arXiv-issued DataCite DOI, which the paper's metadata lists as pending. Replication across public LTSF benchmark suites and independent comparisons of the stability and accuracy claims will be the clearest signals that PMDformer delivers practical gains beyond the paper's experiments.

References and provenance: PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting, submitted 25 Jun 2026 to arXiv as arXiv:2606.26549 by Ao Hu, Liangjian Wen, Jiang Duan, Yong Dai, He Yan, Dongkai Wang, Jun Wang, Yukun Zhang, Ruoxi Jiang, and Zenglin Xu. The paper's abstract and metadata state the code is available at a linked URL and that a DataCite DOI is pending registration.

[Figure: PMDformer component flow]

Written by The Brieftide · Source: arXiv
