AI Safety · 4 min read

Gemma Scope 2: DeepMind releases interpretability tools for Gemma

DeepMind has released Gemma Scope 2, extending open interpretability tools across the full Gemma 3 family for AI safety researchers.

The Brieftide

TL;DR

  • DeepMind has released Gemma Scope 2, extending open interpretability tools across the full Gemma 3 family for AI safety researchers.
  • DeepMind released Gemma Scope 2, an update to its interpretability toolkit that now supports the full Gemma 3 family of language models.
  • The release makes open tools and interfaces available for researchers working on language-model behavior and safety.

Gemma Scope 2 packages model inspection capabilities that work across the different sizes and variants in the Gemma 3 family. DeepMind positions the release as a resource for the AI safety and interpretability communities: it lets researchers probe internal model representations, layer and head behaviors, and token-level attributions directly on Gemma 3 models.

What Gemma Scope 2 provides

Gemma Scope 2 consolidates a set of interpretability features into a single toolkit that runs against Gemma 3 checkpoints. Key elements include:

  • Model instrumentation and tracing, letting researchers capture activations across layers while models process text.
  • Visualization modules for examining attention patterns, intermediate activations, and other internal signals.
  • Interfaces for token-level attribution and saliency analysis to connect input tokens to model responses.
  • Support for multiple Gemma 3 variants so teams can compare behavior across model sizes and training configurations.
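The first item in the list, capturing activations across layers, can be illustrated with a toy sketch. This is not the Gemma Scope 2 API: the "model" below is a stand-in made of plain Python functions, and every name in it (`scale_layer`, `relu_layer`, `run_with_trace`) is hypothetical. It only shows the general idea of recording each layer's output while an input passes through.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale_layer(factor: float) -> Layer:
    """Toy 'layer' (hypothetical) that multiplies every element by a constant."""
    return lambda x: [factor * v for v in x]


def relu_layer() -> Layer:
    """Toy 'layer' (hypothetical) that replaces negative elements with zero."""
    return lambda x: [max(0.0, v) for v in x]


def run_with_trace(layers: Dict[str, Layer], x: Vector) -> Tuple[Vector, Dict[str, Vector]]:
    """Run x through the layers in order and record each layer's output."""
    trace: Dict[str, Vector] = {}
    for name, layer in layers.items():
        x = layer(x)
        trace[name] = list(x)  # store a copy so later steps cannot alter the record
    return x, trace


# A three-layer stand-in model; a real checkpoint would replace this dict.
toy_model = {
    "layer_0": scale_layer(2.0),
    "layer_1": relu_layer(),
    "layer_2": scale_layer(0.5),
}

output, activations = run_with_trace(toy_model, [1.0, -3.0, 0.5])
for name, values in activations.items():
    print(name, values)
```

With a real model the recorded values would be tensors rather than short lists, but the pattern is the same: intercept each layer's output, keep a copy keyed by layer name, and analyze the copies afterwards.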

DeepMind also emphasizes usability for research teams: the toolkit includes documentation and example workflows aimed at reproducible analysis. The release is intended to lower the barrier to hands-on inspection of large language models in research settings.

Access, scope and limitations

Gemma Scope 2 is presented as an open set of interpretability tools, made available to the research community to run on Gemma 3 models. DeepMind frames the package as complementary to other community tooling, not as an exhaustive solution to all interpretability tasks. Users will still need compute resources and expertise to instrument large checkpoints and interpret the results.

The tools operate at runtime and require access to model checkpoints and enough memory to capture intermediate activations. As a result, teams with limited compute may prefer to run analyses on mid-size Gemma 3 variants rather than the largest models. DeepMind describes the work as ongoing and invites researchers to use the toolkit to surface behaviors that merit deeper study.
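A back-of-the-envelope calculation shows why memory becomes the constraint. The sketch below assumes one activation vector is stored per layer per token in 16-bit precision. The layer counts and hidden sizes are illustrative placeholders, not actual Gemma 3 dimensions, and `activation_bytes` is a hypothetical helper.

```python
def activation_bytes(num_layers: int, hidden_size: int, num_tokens: int,
                     bytes_per_value: int = 2) -> int:
    """Bytes needed to store one hidden vector per layer for every token."""
    return num_layers * hidden_size * num_tokens * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# Placeholder configurations: (num_layers, hidden_size). Not real Gemma 3 values.
configs = {
    "smaller model (hypothetical)": (16, 1024),
    "larger model (hypothetical)": (64, 4096),
}
num_tokens = 100_000  # size of the text sample being analyzed (assumed)

for name, (layers, hidden) in configs.items():
    size = to_gib(activation_bytes(layers, hidden, num_tokens))
    print(f"{name}: {size:.2f} GiB")
```

Under these assumptions the larger configuration needs 16 times the storage of the smaller one for the same text, because both the layer count and the hidden size are four times larger. Capturing only a few layers, or a shorter text sample, reduces the total proportionally.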

Why it matters

Making interpretability tools available across an entire model family reduces friction for researchers who want to compare how behaviors emerge with scale and architectural choices. The release should accelerate collaborative analysis between model builders and safety researchers by giving both parties a shared set of instruments for probing models. Broader access also increases the number of independent examinations that can reveal unexpected failure modes or clarify how specific capabilities arise.

Figure: Gemma Scope 2 component layout (user interface, visualization modules, analysis backend, model instrumentation, Gemma 3 checkpoints, activation and dataset store).

Primary source: Google DeepMind (deepmind.google)
