Olympiad Math Dataset: MIT releases more than 30,000 problems
A free, public dataset of more than 30,000 Olympiad-level math problems from 47 countries is available for AI benchmarking and student practice.
TL;DR
- A free, public dataset of more than 30,000 Olympiad-level math problems from 47 countries is available for AI benchmarking and student practice.
- The release aims to give AI developers a harder evaluation set while providing teachers and students broader access to high-quality practice material.
- The dataset aggregates problems from international and national mathematics Olympiads and related contests, spanning decades of competition material.
MIT researchers have released a dataset containing more than 30,000 Olympiad-level and competition math problems drawn from contests in 47 countries, and made the collection publicly available on April 24, 2026. The release aims to give AI developers a harder evaluation set while providing teachers and students broader access to high-quality practice material.
The dataset aggregates problems from international and national mathematics Olympiads and related contests, spanning decades of competition material. The collection is searchable and organized by contest source and problem identifier, and the maintainers say it includes answers and, where available, official solutions. MIT published the dataset alongside basic documentation and scripts to download and filter problems by contest, year and topic.
What's included
The repository contains over 30,000 problem statements indexed by contest and year, drawn from competitions across 47 countries. Each entry lists the contest source, year, problem number and an answer key. Official solutions are included where contest organizers made them available. The dataset is intended to be machine readable and to support automated parsing, filtering and integration into training or evaluation pipelines.
Files are provided in standard text formats suitable for bulk download and programmatic use. Researchers can filter the collection by contest difficulty level, year range and topic categories such as algebra, combinatorics, geometry and number theory. The project page links to example code that demonstrates how to select subsets for testing or study, and how to convert items into formats commonly used by machine learning tools.
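To make the filtering and conversion described above concrete, here is a minimal sketch in Python. It is not the project's example code: the field names (contest, year, number, topic, statement, answer), the sample records and the prompt/completion output layout are all assumptions made for illustration, and the real schema should be taken from the dataset's documentation.

```python
import json

# Hypothetical records. The field names and values are placeholders,
# not the dataset's documented schema or real contest problems.
SAMPLE = [
    {"contest": "Contest A", "year": 1998, "number": 1, "topic": "algebra",
     "statement": "placeholder statement 1", "answer": "placeholder answer 1"},
    {"contest": "Contest A", "year": 2005, "number": 2, "topic": "geometry",
     "statement": "placeholder statement 2", "answer": "placeholder answer 2"},
    {"contest": "Contest B", "year": 2008, "number": 3, "topic": "number theory",
     "statement": "placeholder statement 3", "answer": "placeholder answer 3"},
    {"contest": "Contest B", "year": 2010, "number": 4, "topic": "algebra",
     "statement": "placeholder statement 4", "answer": "placeholder answer 4"},
]


def filter_problems(records, topics=None, year_range=None, contests=None):
    """Return records matching every filter that was supplied.

    topics and contests are collections of allowed values; year_range is an
    inclusive (first_year, last_year) pair. A filter left as None is skipped.
    """
    selected = []
    for record in records:
        if topics is not None and record["topic"] not in topics:
            continue
        if contests is not None and record["contest"] not in contests:
            continue
        if year_range is not None:
            first_year, last_year = year_range
            if not first_year <= record["year"] <= last_year:
                continue
        selected.append(record)
    return selected


def to_jsonl(records):
    """Serialize records as JSON Lines, one prompt/completion pair per line."""
    lines = [
        json.dumps({"prompt": r["statement"], "completion": r["answer"]})
        for r in records
    ]
    return "\n".join(lines)


subset = filter_problems(SAMPLE, topics={"algebra", "geometry"},
                         year_range=(2000, 2010))
print(to_jsonl(subset))
```

JSON Lines is used here only because many machine learning tools accept it; the formats the project's own scripts emit may differ.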
How researchers and students can use it
For AI teams the dataset offers a concentrated set of challenging problems that require formal reasoning, long solution chains and domain-specific knowledge. Model evaluation on this dataset will help surface weaknesses in symbolic manipulation, multi-step reasoning and rigor of produced solutions. The dataset can be used as a held-out test set or as additional supervised examples for fine-tuning models that target mathematical reasoning.
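As a rough illustration of the held-out test set use, the sketch below scores model outputs against an answer key by exact match after light normalization. This is an assumed, simplified scoring scheme rather than anything the release specifies: the function names are hypothetical, and exact match only suits problems with short final answers, not proofs or multi-step solutions that need human or rubric-based grading.

```python
def normalize(answer: str) -> str:
    """Lowercase the answer and collapse runs of whitespace to single spaces."""
    return " ".join(answer.strip().lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization.

    Hypothetical helper: assumes both lists are aligned by problem.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Made-up answers for illustration only. The third pair differs in internal
# spacing, so this simple normalization counts it as a miss.
score = exact_match_accuracy(["42", " 7 ", "x = 2"], ["42", "7", "x=2"])
print(f"exact-match accuracy: {score:.2f}")
```

The third pair shows the limit of string matching: mathematically equivalent answers can be scored as wrong, so a real evaluation would likely need symbolic or human checking.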
Educators and students gain a large, searchable source of past contest problems for practice and curriculum design. Teachers can assemble problem sets by topic or difficulty, while students can use curated subsets for targeted preparation. Because the collection spans a broad international set of contests, it provides exposure to different problem styles and conventions used in varied mathematics competitions.
MIT has made the dataset available via a public repository and accompanying documentation that explains the data schema and provides examples of filtering and conversion. The release encourages reuse by researchers, contest trainers and educational platforms, and the maintainers invite contributions and corrections from the community to improve coverage and metadata quality.
Why it matters
A large, standardized corpus of Olympiad-level problems raises the bar for testing mathematical reasoning in AI by gathering rare and difficult problem types in one place. The release also lowers barriers for students and teachers who previously relied on scattered archives, making higher-level competition material more accessible worldwide.
Written by The Brieftide · Source: MIT News · AI