Multimodal AI · 5 min read

ITNet: Integral transform that subsumes convolution, attention

ITNet, submitted to arXiv on 17 Jun 2026, presents a learnable kernel (an MLP) that can reproduce convolution.

The Brieftide

TL;DR

  • ITNet, submitted to arXiv on 17 Jun 2026, presents a learnable kernel (an MLP) that can reproduce convolution.
  • The paper centers on a single kernel, implemented as a small neural network (an MLP), that depends jointly on positions and features.
  • ITNet is a unified architecture: the same learnable kernel models all pairwise interactions.

ITNet, introduced in a paper submitted to arXiv on 17 Jun 2026, centers on a single learnable kernel implemented as a small neural network (an MLP) that depends jointly on positions and features. The authors describe ITNet as a "learnable integral transform" and a universal approximator of continuous operators, and they report that a single ITNet matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2 and NLVR2.

What is ITNet and how does it work?

ITNet is a unified architecture built around a learnable kernel that models pairwise interactions; the kernel is implemented as an MLP and depends on both positions and features. The paper describes three implementation techniques (tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization) intended to make the integral transform computationally efficient and scalable.
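To make the idea of an MLP kernel over positions and features more concrete, here is a minimal NumPy sketch of a discretized integral transform, y_i = (1/N) Σ_j k(x_i, x_j, f_i, f_j) f_j. Everything specific in it is an assumption for illustration: the two-layer tanh MLP, the hidden width, the scalar-valued kernel, the uniform 1/N quadrature weights and all function names are hypothetical, since the article does not give the paper's actual parameterization.

```python
import numpy as np

def init_kernel_mlp(pos_dim, feat_dim, hidden=16, seed=0):
    """Random weights for a two-layer MLP kernel (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    in_dim = 2 * (pos_dim + feat_dim)  # input is the concatenation (x_i, x_j, f_i, f_j)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)),
        "b2": np.zeros(1),
    }

def kernel_mlp(params, x_i, x_j, f_i, f_j):
    """Scalar kernel value for every (i, j) pair; inputs are (N, N, d) arrays."""
    z = np.concatenate([x_i, x_j, f_i, f_j], axis=-1)
    h = np.tanh(z @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"])[..., 0]  # shape (N, N)

def integral_transform(params, pos, feats):
    """Discretized transform: y_i = (1/N) * sum_j k(x_i, x_j, f_i, f_j) * f_j."""
    n = pos.shape[0]
    x_i = np.broadcast_to(pos[:, None, :], (n, n, pos.shape[1]))
    x_j = np.broadcast_to(pos[None, :, :], (n, n, pos.shape[1]))
    f_i = np.broadcast_to(feats[:, None, :], (n, n, feats.shape[1]))
    f_j = np.broadcast_to(feats[None, :, :], (n, n, feats.shape[1]))
    k = kernel_mlp(params, x_i, x_j, f_i, f_j)
    return (k @ feats) / n

# Toy usage: 8 points with 2-D positions and 4-D features (made-up data).
rng = np.random.default_rng(1)
pos = rng.uniform(size=(8, 2))
feats = rng.normal(size=(8, 4))
params = init_kernel_mlp(pos_dim=2, feat_dim=4)
out = integral_transform(params, pos, feats)
print(out.shape)  # (8, 4)
```

This version materializes the full N × N kernel, so its cost grows quadratically with the number of points; that cost is what the efficiency techniques named in the paper are meant to reduce.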

The core idea replaces separate inductive biases with one parameterized operator. Because the kernel models pairwise interactions directly, the model can learn its behavior from data rather than having locality, sequential memory, or content-dependent pairwise interaction hard-wired. To scale ITNet, the authors introduce tiled kernel fusion to combine computations across tiles, importance-weighted Monte Carlo integration to estimate the integral efficiently, and learned low-rank factorization to reduce the parameter and compute footprint.
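The importance-weighted Monte Carlo step can be illustrated generically: instead of summing k(x_i, x_j) f_j over every j, draw a few indices j from a proposal distribution q and average k · f / q, which is an unbiased estimate of the full sum. The sketch below is a textbook version of that estimator, not the paper's implementation. The Gaussian stand-in kernel, the two proposals and all function names are assumptions made for illustration; the article does not say how the paper chooses its proposal.

```python
import numpy as np

def gaussian_kernel(x_i, x_j, bandwidth=0.1):
    """Stand-in kernel (NOT a learned MLP): depends on positions only."""
    d2 = np.sum((x_j - x_i) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def exact_row(kernel_fn, i, pos, feats):
    """Exact sum over all j: y_i = sum_j k(x_i, x_j) * f_j."""
    k = kernel_fn(pos[i], pos)  # shape (N,)
    return k @ feats

def importance_mc_row(kernel_fn, i, pos, feats, proposal, num_samples, rng):
    """Unbiased estimate of y_i from num_samples indices drawn from `proposal`."""
    idx = rng.choice(len(pos), size=num_samples, p=proposal)
    k = kernel_fn(pos[i], pos[idx])   # kernel evaluated only at sampled points
    weights = k / proposal[idx]       # importance weights k / q
    return (weights[:, None] * feats[idx]).mean(axis=0)

# Toy data: 2000 points on [0, 1] with smooth two-channel features (made up).
rng = np.random.default_rng(0)
pos = rng.uniform(size=(2000, 1))
feats = np.hstack([np.sin(2 * np.pi * pos), np.cos(2 * np.pi * pos)])

i = 0
uniform = np.full(len(pos), 1.0 / len(pos))
# Demo-only proposal shaped like the kernel. Building it from the full kernel
# row defeats the purpose in practice; a real system would need a cheap proxy.
k_full = gaussian_kernel(pos[i], pos)
shaped = k_full / k_full.sum()

exact = exact_row(gaussian_kernel, i, pos, feats)
est_uniform = importance_mc_row(gaussian_kernel, i, pos, feats, uniform, 200, rng)
est_shaped = importance_mc_row(gaussian_kernel, i, pos, feats, shaped, 200, rng)
print("exact:  ", exact)
print("uniform:", est_uniform)
print("shaped: ", est_shaped)
```

Both estimates are unbiased; the design question is variance. A proposal that puts its mass where the kernel is large wastes fewer samples on pairs that contribute almost nothing, which is the usual motivation for importance weighting over uniform sampling.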

How does ITNet subsume convolution, attention and recurrence?

According to the paper, convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases of the ITNet operator under appropriate parameterizations. The authors claim that, by choosing kernel parameter settings and factorization strategies, ITNet can recover the mathematical forms of those architectures, so that one learned interaction mechanism can reproduce the behavior of all three architectural families.
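The special-case claim is easiest to see in discrete form, where a kernel is just a matrix K and the operator is y_i = Σ_j K[i, j] v_j. The sketch below hand-builds three kernels rather than learning them: one that depends only on relative position (convolution), one that depends only on features (single-head softmax attention), and one that is causal and decays geometrically (a linear recurrence). This is the standard textbook correspondence, offered as an assumption about what the paper's construction looks like; the function names are hypothetical, and the nonlinear recurrences the paper lists (LSTM, GRU) are not covered by this simple linear case.

```python
import numpy as np

def apply_kernel(kernel_matrix, values):
    """Discrete integral transform: y_i = sum_j K[i, j] * v_j."""
    return kernel_matrix @ values

def conv_kernel(n, taps):
    """Position-only kernel for odd-length taps: K[i, j] = taps[i - j + r] if |i - j| <= r."""
    r = len(taps) // 2
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]  # i - j
    inside = np.abs(offsets) <= r
    return np.where(inside, taps[np.clip(offsets + r, 0, len(taps) - 1)], 0.0)

def attention_kernel(queries, keys):
    """Feature-only kernel: row-wise softmax of scaled dot products."""
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def linear_recurrence_kernel(n, a, b):
    """Causal kernel of h_t = a*h_{t-1} + b*x_t (zero initial state): K[t, j] = b * a**(t - j), j <= t."""
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]
    return np.where(offsets >= 0, b * a ** np.maximum(offsets, 0), 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=12)

# 1) Convolution: matches NumPy's "same"-mode 1-D convolution.
taps = np.array([1.0, 2.0, 3.0])
conv_out = apply_kernel(conv_kernel(len(x), taps), x)
conv_ref = np.convolve(x, taps, mode="same")

# 2) Attention: kernel built from features (queries and keys), applied to values.
q, k, v = (rng.normal(size=(12, 4)) for _ in range(3))
attn_out = apply_kernel(attention_kernel(q, k), v)

# 3) Linear recurrence: compare against an explicit step-by-step loop.
a, b = 0.9, 0.5
rec_out = apply_kernel(linear_recurrence_kernel(len(x), a, b), x)
h, rec_ref = 0.0, []
for x_t in x:
    h = a * h + b * x_t
    rec_ref.append(h)
rec_ref = np.array(rec_ref)

print(np.allclose(conv_out, conv_ref), np.allclose(rec_out, rec_ref))  # True True
```

In all three cases the operator is the same matrix-vector product; only the rule for filling in K changes. That is the sense in which a single, sufficiently expressive kernel could in principle cover each family.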

The paper also frames ITNet as a universal approximator of continuous operators, positioning the approach as a mathematically general class that contains those existing mechanisms. The authors trained a single ITNet architecture with a shared operator and lightweight modality-specific encoders, and they report it matches or exceeds specialized baselines across multiple benchmarks, specifically ImageNet-1K, GLUE, ModelNet40, VQA v2 and NLVR2.

Why it matters

Unifying convolution, attention and recurrence in a single learnable operator would reduce architectural fragmentation: instead of designing separate blocks for locality, sequence memory, or pairwise content-dependent interactions, one mechanism can adapt to each role from data. That matters for model design and for research into which inductive biases are necessary and which can be learned. The paper also includes concrete engineering steps for efficiency (tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization), which target the usual scalability objections to integral-operator approaches.

What to watch

Watch for the paper's linked code and demos: the arXiv entry lists toggles for Links to Code and Demos, including Hugging Face and Replicate, which would make replication and performance checks possible. The arXiv page also records a DOI issued via DataCite that is pending registration. A registered DOI and public code would be concrete signals that others can benchmark ITNet against existing convolution-, attention- and recurrence-based models.

[Figure: ITNet architecture components and data flow. Stages shown: modality-specific encoders; learnable kernel (MLP, dependent on positions and features); tiled kernel fusion; importance-weighted Monte Carlo integration; learned low-rank factorization; task-specific decoders / classifiers.]

Written by The Brieftide · Source: arXiv
