AI Infrastructure · 4 min read · via Google DeepMind

DeepMind AGI framework launch and Kaggle hackathon 2026

DeepMind released a cognitive framework to measure progress toward AGI and opened a Kaggle competition to crowdsource evaluation tasks and datasets.

The Brieftide

TL;DR

  • DeepMind published a cognitive framework to measure progress toward artificial general intelligence, and opened a Kaggle hackathon inviting the community to develop evaluation tasks and datasets.
  • The framework maps cognitive abilities to concrete test designs and proposes templates and metrics intended for shared benchmark construction.

DeepMind published a cognitive framework to measure progress toward artificial general intelligence, and opened a Kaggle hackathon inviting the community to develop evaluation tasks and datasets. The framework maps cognitive abilities to concrete test designs and proposes templates and metrics intended for shared benchmark construction.

The framework organizes capabilities into cognitive domains and recommends evaluation axes beyond single-task accuracy, such as sample efficiency, generalization, robustness, and learning speed. DeepMind positions the framework as a tool for researchers and benchmark builders, and the Kaggle hackathon is presented as an immediate mechanism to collect community-built tasks and standardized evaluation code.

What the framework covers

The framework breaks intelligence down into interpretable cognitive faculties and pairs each faculty with suggested task types and measurement strategies. Domains identified include perception, memory, planning, abstract reasoning, social understanding, and adaptive learning. For each domain, the framework offers task templates intended to probe specific abilities under controlled conditions. Examples include testing planning by varying horizon length and environmental stochasticity, and testing memory by varying retention intervals and interference.
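To make the idea of a parameterized task template concrete, here is a minimal Python sketch of a planning task swept over horizon length and stochasticity. The names `PlanningTaskConfig`, `planning_sweep`, and `slip_prob` are hypothetical and chosen for illustration; they are not taken from DeepMind's framework.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PlanningTaskConfig:
    """One controlled condition of a hypothetical planning task."""
    horizon: int      # number of steps the agent must plan ahead
    slip_prob: float  # probability that an action has a random outcome


def planning_sweep(horizons, slip_probs):
    """Return one config per (horizon, stochasticity) combination."""
    return [PlanningTaskConfig(h, p) for h, p in product(horizons, slip_probs)]


# Illustrative values only; a real task would pick ranges that suit its environment.
configs = planning_sweep(horizons=[5, 20, 80], slip_probs=[0.0, 0.1, 0.3])
for cfg in configs:
    print(cfg)
```

Varying one factor at a time across such a grid is what would let a benchmark attribute a drop in success rate to longer horizons rather than to noisier dynamics.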

Metrics go beyond top-line accuracy. DeepMind emphasizes measures such as data efficiency, out-of-distribution generalization, robustness to perturbations, latency of learning, and the ability to compose learned skills. The framework also encourages multi-metric reporting so that a model's strengths and weaknesses are visible across dimensions rather than reduced to a single leaderboard score.
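As a rough illustration of multi-metric reporting, the Python sketch below computes several of these quantities side by side instead of one score. The function names, dictionary keys, and the 0.8 threshold are assumptions made for this example, not definitions from the framework, and the toy inputs are invented.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def samples_to_threshold(learning_curve, threshold):
    """Smallest training-set size whose accuracy reaches the threshold, else None.

    learning_curve is a list of (num_training_samples, accuracy) pairs.
    """
    for n, acc in sorted(learning_curve):
        if acc >= threshold:
            return n
    return None


def multi_metric_report(in_dist, out_dist, perturbed, learning_curve, threshold=0.8):
    """Each of in_dist, out_dist, perturbed is a (predictions, labels) pair."""
    acc_in = accuracy(*in_dist)
    acc_ood = accuracy(*out_dist)
    acc_pert = accuracy(*perturbed)
    return {
        "accuracy": acc_in,
        "ood_accuracy": acc_ood,
        "generalization_gap": acc_in - acc_ood,  # in-distribution minus OOD
        "robustness_drop": acc_in - acc_pert,    # clean minus perturbed
        "samples_to_threshold": samples_to_threshold(learning_curve, threshold),
    }


# Toy, made-up inputs purely to exercise the functions.
report = multi_metric_report(
    in_dist=([1, 0, 1, 1], [1, 0, 1, 1]),
    out_dist=([1, 0, 0, 1], [1, 1, 1, 1]),
    perturbed=([1, 0, 1, 0], [1, 0, 1, 1]),
    learning_curve=[(100, 0.55), (1000, 0.82), (10000, 0.91)],
)
for name, value in report.items():
    print(f"{name}: {value}")
```

Keeping the values in a dictionary rather than averaging them preserves the per-dimension view the framework asks for.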

The document suggests both synthetic and grounded tasks: simulated environments for reproducible stress tests, curated datasets reflecting real-world complexities, and interactive evaluations where models must learn through interaction. It calls for modular tests that can be combined to form composite assessments, and for metadata standards that make leaderboards and comparisons more interpretable.
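One way to picture modular tests with attached metadata is the Python sketch below. `EvalModule`, `run_composite`, the metadata keys, and the toy tasks are all hypothetical; the brief does not describe the actual schema the framework proposes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Assumption for this sketch: a model is any function from prompt to answer.
Model = Callable[[str], str]


@dataclass
class EvalModule:
    name: str
    domain: str                               # e.g. "memory", "planning"
    run: Callable[[Model], Dict[str, float]]  # returns metric name -> value
    metadata: Dict[str, str] = field(default_factory=dict)


def run_composite(modules, model):
    """Run every module and keep results separate, keyed by module name."""
    results = {}
    for m in modules:
        results[m.name] = {
            "domain": m.domain,
            "metadata": dict(m.metadata),
            "metrics": m.run(model),
        }
    return results


def delayed_recall(model):
    """Toy memory module with a single made-up item."""
    items = {"What was the code word?": "heron"}
    hits = sum(model(q) == a for q, a in items.items())
    return {"retention_accuracy": hits / len(items)}


def two_step_plan(model):
    """Toy planning module with a single made-up item."""
    items = {"Which comes first, boil or pour?": "boil"}
    hits = sum(model(q) == a for q, a in items.items())
    return {"success_rate": hits / len(items)}


def constant_model(prompt):
    """Stand-in model that always gives the same answer."""
    return "heron"


modules = [
    EvalModule("delayed_recall", "memory", delayed_recall,
               {"version": "0.1", "task_type": "synthetic"}),
    EvalModule("two_step_plan", "planning", two_step_plan,
               {"version": "0.1", "task_type": "synthetic"}),
]
results = run_composite(modules, constant_model)
for name, entry in results.items():
    print(name, entry["domain"], entry["metrics"])
```

Because each module carries its own domain and metadata, a composite assessment can be assembled or rearranged without changing how any single test is scored.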

The Kaggle hackathon and evaluation goals

DeepMind launched a Kaggle hackathon to accelerate the construction of evaluation tasks that follow the framework. Participants are invited to submit task definitions, dataset processing code, baseline implementations, and evaluation harnesses compatible with recommended metrics. The competition aims to produce a reusable corpus of evaluation modules and reference implementations that others can run on different models and compute budgets.

The hackathon is framed as community-driven: submissions will be shareable, and organizers expect accepted tasks to seed public leaderboards or be integrated into larger evaluation suites. The guidelines emphasize transparent task descriptions, reproducible baselines, and clear metric computation so that independent researchers and smaller labs can run the same evaluations without specialized infrastructure.

DeepMind also flags the failure modes of single-metric leaderboards and asks contributors to include adversarial and stress-test variants. The goal is to encourage benchmarks that reveal brittle behaviors, scaling plateaus, and trade-offs between capabilities, such as speed versus reliability.
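A stress-test variant can be as simple as re-running the same examples under controlled perturbations and reporting each score separately. The Python sketch below assumes a toy text task; the perturbations, `keyword_model`, and the example data are invented for illustration and are not part of the hackathon guidelines.

```python
def evaluate(model, examples):
    """Accuracy of a model over (input, label) pairs."""
    return sum(model(x) == y for x, y in examples) / len(examples)


def stress_variants(examples, perturbations):
    """Apply each named perturbation to every input, keeping labels fixed."""
    return {
        name: [(fn(x), y) for x, y in examples]
        for name, fn in perturbations.items()
    }


# Hypothetical perturbations that should not change the correct label.
perturbations = {
    "original": lambda s: s,
    "uppercase": str.upper,
    "padded": lambda s: "  " + s + "  ",
}


def keyword_model(text):
    """Deliberately brittle toy classifier: case-sensitive keyword match."""
    return "urgent" in text


examples = [
    ("urgent: server down", True),
    ("lunch menu", False),
    ("this is urgent", True),
    ("weekly notes", False),
]

scores = {
    name: evaluate(keyword_model, variant)
    for name, variant in stress_variants(examples, perturbations).items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Here the toy model is perfect on the original and padded inputs but falls to chance on the uppercase variant, which is the kind of brittleness a single aggregate score would hide.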

Why it matters

A shared cognitive framework and an open hackathon direct attention toward standardized, multi-dimensional evaluations rather than single-number comparisons. If adopted broadly, the effort could shift where researchers invest effort, privileging sample efficiency, robustness, and compositional skills alongside raw performance. Community-built tasks may expose shortcomings of current architectures and influence funding, deployment decisions, and where safety researchers concentrate testing resources.

Example cognitive domains and evaluation mapping
Domain | Example task | Metrics
Perception | Visual categorization under occlusion | Accuracy, robustness to noise, latency
Memory | Delayed recall with distractors | Retention accuracy, interference sensitivity, sample efficiency
Planning | Long-horizon navigation with stochastic dynamics | Success rate, planning horizon scaling, compute cost
Abstract reasoning | Novel puzzle composition and analogy tasks | Generalization, few-shot performance, compositionality
Social understanding | Theory-of-mind style prediction tasks | Predictive accuracy, robustness to deceptive signals
Adaptive learning | Rapid adaptation to new tasks from small data | Adaptation speed, final performance, forgetting

Primary source

Google DeepMind

deepmind.google