Reasoning Verification · 5 min read

Data-driven ML and GPT-5: arXiv finds limits for symbolic logic

An arXiv paper by Tiansi Dong, Mateja Jamnik and Pietro Liò argues supervised deep learning cannot reach symbolic-level syllogistic reasoning.

The Brieftide

TL;DR

  • An arXiv paper by Tiansi Dong, Mateja Jamnik and Pietro Liò argues supervised deep learning cannot reach symbolic-level syllogistic reasoning.
  • The paper combines theoretical analysis and experiments to identify methodological limits that, the authors say, prevent more data and compute from yielding formal logical competence.
  • The authors frame these as inherent constraints on any supervised training process that relies on empirical examples and end-to-end target signals.

Tiansi Dong, Mateja Jamnik and Pietro Liò posted an arXiv paper (arXiv:2606.26454) on 24 June 2026 arguing that supervised, data-driven machine learning cannot reach the rigour of symbolic-level syllogistic reasoning. The paper combines theoretical analysis and experiments to identify methodological limits that, the authors say, prevent more data and compute from yielding formal logical competence.

What do the authors claim?

The paper identifies two specific methodological limitations preventing supervised deep learning from matching symbolic syllogistic reasoning: first, "training data can not distinguish all 24 types of valid syllogistic reasoning"; second, end-to-end mapping from premises to conclusion creates contradictory training targets between neural components for pattern recognition and logical reasoning. The authors frame these as inherent constraints on any supervised training process that relies on empirical examples and end-to-end target signals.
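The figure of 24 valid forms can be reproduced by brute force. The sketch below is not taken from the paper; it is a minimal illustration, assuming the traditional reading in which every term is non-empty (existential import). It enumerates the 256 mood-and-figure combinations and keeps those whose conclusion holds in every set-theoretic model of the premises. All function and variable names are illustrative.

```python
from itertools import product

TERMS = ("S", "M", "P")
# Each region of a three-set Venn diagram is a triple of booleans:
# (in S, in M, in P). A model is the list of regions that are inhabited.
REGIONS = list(product([False, True], repeat=3))

# (subject, predicate) of the major and minor premise for the four figures.
# The conclusion always has subject S and predicate P.
FIGURES = {
    1: (("M", "P"), ("S", "M")),
    2: (("P", "M"), ("S", "M")),
    3: (("M", "P"), ("M", "S")),
    4: (("P", "M"), ("M", "S")),
}


def holds(mood, subject, predicate, model):
    """Evaluate an A/E/I/O statement in a model (a list of inhabited regions)."""
    s, p = TERMS.index(subject), TERMS.index(predicate)
    some_are = any(r[s] and r[p] for r in model)
    some_are_not = any(r[s] and not r[p] for r in model)
    return {"A": not some_are_not, "E": not some_are,
            "I": some_are, "O": some_are_not}[mood]


def models(existential_import):
    """Yield every model; optionally require S, M and P to be non-empty."""
    for bits in product([False, True], repeat=len(REGIONS)):
        model = [r for r, inhabited in zip(REGIONS, bits) if inhabited]
        if existential_import and not all(
                any(r[i] for r in model) for i in range(3)):
            continue
        yield model


def valid_syllogisms(existential_import=True):
    """Return (mood, figure) pairs whose conclusion follows in every model."""
    candidates = list(models(existential_import))
    valid = []
    for figure, (major, minor) in FIGURES.items():
        for m1, m2, c in product("AEIO", repeat=3):
            if all(holds(c, "S", "P", mdl) for mdl in candidates
                   if holds(m1, *major, mdl) and holds(m2, *minor, mdl)):
                valid.append((m1 + m2 + c, figure))
    return valid
```

Under these assumptions, `len(valid_syllogisms(True))` is 24, and `len(valid_syllogisms(False))` drops to 15 when empty terms are allowed, which matches the standard textbook counts. A symbolic check of this kind needs no training examples, which is the contrast the authors draw with supervised learning.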

The paper contrasts data-driven methods with an alternative: sphere neural networks, which the authors say have "achieved symbolic level syllogistic reasoning without training data." That contrast structures their central question about the limit of the scaling law for logical reasoning, namely whether simply increasing training data and training time can close the gap.

How did the authors test models and claims?

The paper uses both theoretical arguments and experiments: it shows limitations in training data design and then empirically illustrates failures. Experimentally, the authors show that Euler Net cannot achieve rigorous syllogistic reasoning. They also challenged two recent ChatGPT variants, GPT-5-nano and GPT-5, to determine satisfiability of syllogistic statements presented in four surface forms: words, double words, simple symbols, and long random symbols.
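To make the satisfiability task concrete, here is a minimal sketch of a symbolic checker whose verdict does not depend on how the terms are spelled. It is an illustration only: the term vocabularies for the four surface forms are guesses made for this example, not the paper's stimuli, and the checker assumes every term denotes a non-empty set.

```python
from itertools import product
import random
import string


def holds(quantifier, subject, predicate, model, index):
    """Evaluate one syllogistic statement in a model of inhabited regions."""
    s, p = index[subject], index[predicate]
    some_are = any(r[s] and r[p] for r in model)
    some_are_not = any(r[s] and not r[p] for r in model)
    return {"all": not some_are_not, "no": not some_are,
            "some": some_are, "some_not": some_are_not}[quantifier]


def satisfiable(statements):
    """True if some model with non-empty terms makes every statement true.

    Brute force over Venn regions: fine for three terms, not for many.
    """
    terms = sorted({t for _, s, p in statements for t in (s, p)})
    index = {t: i for i, t in enumerate(terms)}
    regions = list(product([False, True], repeat=len(terms)))
    for bits in product([False, True], repeat=len(regions)):
        model = [r for r, inhabited in zip(regions, bits) if inhabited]
        if not all(any(r[i] for r in model) for i in range(len(terms))):
            continue  # assumption: every term denotes a non-empty set
        if all(holds(q, s, p, model, index) for q, s, p in statements):
            return True
    return False


def rename(statements, mapping):
    """Rewrite the terms of each statement using a surface-form mapping."""
    return [(q, mapping[s], mapping[p]) for q, s, p in statements]


def random_symbol(rng, length=20):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


rng = random.Random(0)
# Hypothetical vocabularies, one per surface form named in the article.
SURFACE_FORMS = {
    "words": {"S": "greeks", "M": "humans", "P": "mortals"},
    "double words": {"S": "ancient greeks", "M": "living humans",
                     "P": "mortal beings"},
    "simple symbols": {"S": "a", "M": "b", "P": "c"},
    "long random symbols": {t: random_symbol(rng) for t in ("S", "M", "P")},
}

CONSISTENT = [("all", "S", "M"), ("all", "M", "P"), ("some", "S", "P")]
INCONSISTENT = [("all", "S", "M"), ("all", "M", "P"), ("some_not", "S", "P")]

for name, mapping in SURFACE_FORMS.items():
    print(name, satisfiable(rename(CONSISTENT, mapping)),
          satisfiable(rename(INCONSISTENT, mapping)))
```

Because the checker only sees which terms are identical, every surface form gets the same verdict. The article's point is that the tested language models did not show this invariance.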

Those experiments produced two concrete findings reported in the abstract: surface forms affect reasoning performance, and GPT-5 "may reach 100% accuracy but still provide incorrect explanations." The authors emphasize that empirical training processes are typically stopped after achieving 100% accuracy, and they use that practice to argue supervised systems can hide incorrect internal reasoning despite perfect-looking accuracy on training targets.
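The gap between answer accuracy and explanation quality can be shown with a toy scorer. The sketch below is an assumption-laden illustration, not the paper's evaluation: it treats an "explanation" as a claimed witness model (sets of individuals per term) and checks it mechanically. The two records are invented for the example and are not results from the paper.

```python
def holds(quantifier, subject, predicate, model):
    """Evaluate a statement in a model mapping each term to a set."""
    s, p = model[subject], model[predicate]
    if quantifier == "all":
        return s <= p
    if quantifier == "no":
        return not (s & p)
    if quantifier == "some":
        return bool(s & p)
    if quantifier == "some_not":
        return bool(s - p)
    raise ValueError(f"unknown quantifier: {quantifier}")


def witness_is_sound(statements, witness):
    """Sound means: every term is non-empty and every statement holds."""
    return all(witness.values()) and all(
        holds(q, s, p, witness) for q, s, p in statements)


def score(records):
    """Return (answer accuracy, explanation accuracy) over the records."""
    n = len(records)
    answer_acc = sum(r["answer"] == r["gold"] for r in records) / n
    explanation_acc = sum(
        r["answer"] == r["gold"]
        and witness_is_sound(r["statements"], r["witness"])
        for r in records) / n
    return answer_acc, explanation_acc


STATEMENTS = [("all", "S", "M"), ("some", "M", "P")]

# Invented toy records: both answers are right, one witness is wrong.
RECORDS = [
    {"statements": STATEMENTS, "gold": True, "answer": True,
     "witness": {"S": {1}, "M": {1, 2}, "P": {2}}},
    {"statements": STATEMENTS, "gold": True, "answer": True,
     "witness": {"S": {1}, "M": {2}, "P": {2}}},  # violates "all S are M"
]

print(score(RECORDS))
```

On these toy records the scorer reports an answer accuracy of 1.0 and an explanation accuracy of 0.5, the kind of mismatch that a stopping rule based on answer accuracy alone would not detect.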

Why it matters

If the paper's arguments hold, they limit claims that scaling data and compute alone will yield the formal rigour required for symbolic logic tasks. Systems that achieve surface-level correctness while encoding contradictory training targets risk delivering confident but incorrect explanations, which matters for any application that needs verifiable, formal reasoning rather than pattern-matching correctness. The contrast with sphere neural networks, which the authors say solved syllogistic reasoning without training data, points to a possible role for symbolic or hybrid architectures rather than pure supervised scaling.

What to watch

Watch for follow-up work that examines the two methodological claims in fuller detail: whether training sets can be constructed to disambiguate all 24 syllogistic types, and whether training regimes can avoid the reported contradictory targets between perception and reasoning components. Also track reproducibility tests of the paper's experiments with Euler Net and independent evaluations of GPT-5's explanations when it attains 100% accuracy.

Models and reported syllogistic outcomes from the paper

| Item | Reported outcome | Detail |
| --- | --- | --- |
| Sphere neural networks | Achieved symbolic-level syllogistic reasoning | Without training data |
| Supervised deep learning (general) | Cannot reach symbolic-level syllogistic reasoning | "training data can not distinguish all 24 types of valid syllogistic reasoning" |
| Euler Net | Cannot achieve rigorous syllogistic reasoning | Experimentally illustrated in the paper |
| ChatGPT GPT-5 | Surface performance can reach 100% accuracy | May provide incorrect explanations despite 100% accuracy |
| ChatGPT GPT-5-nano | Challenged on syllogistic satisfiability | Tested across four surface forms: words, double words, simple symbols, long random symbols |

Written by The Brieftide · Source: arXiv
