Gemma Scope 2: DeepMind releases interpretability tools for Gemma
DeepMind has released Gemma Scope 2, extending open interpretability tools across the full Gemma 3 family for AI safety researchers.
TL;DR
- DeepMind has released Gemma Scope 2, an update to its interpretability toolkit that extends open tooling across the full Gemma 3 family of language models.
- The release makes open tools and interfaces available to researchers working on language-model behavior and safety.
DeepMind released Gemma Scope 2, an update to its interpretability toolkit that now supports the full Gemma 3 family of language models. The release makes open tools and interfaces available for researchers working on language-model behavior and safety.
Gemma Scope 2 packages model inspection capabilities that work across the sizes and variants of the Gemma 3 family. DeepMind positions the release as a resource for the AI safety and interpretability communities: it lets researchers probe internal model representations, layer and head behavior, and token-level attributions in Gemma 3 models.
What Gemma Scope 2 provides
Gemma Scope 2 consolidates a set of interpretability features into a single toolkit that runs against Gemma 3 checkpoints. Key elements include:
- Model instrumentation and tracing, letting researchers capture activations across layers while models process text.
- Visualization modules for examining attention patterns, intermediate activations, and other internal signals.
- Interfaces for token-level attribution and saliency analysis to connect input tokens to model responses.
- Support for multiple Gemma 3 variants so teams can compare behavior across model sizes and training configurations.
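The first item above, capturing activations across layers, can be illustrated with a minimal sketch. This is a toy in plain Python, not Gemma Scope 2's interface: `TracedModel`, `scale`, and `relu` are hypothetical names invented for this example, and the "layers" are simple list operations standing in for real network blocks. In PyTorch, the analogous mechanism is `torch.nn.Module.register_forward_hook`; the source does not say which mechanism the toolkit uses.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(factor: float) -> Layer:
    """Toy layer (hypothetical): multiply every element by a constant."""
    return lambda xs: [factor * x for x in xs]


def relu(xs: Vector) -> Vector:
    """Toy layer (hypothetical): clamp negative values to zero."""
    return [max(0.0, x) for x in xs]


class TracedModel:
    """Runs named layers in order and records each layer's output."""

    def __init__(self, layers: Dict[str, Layer]) -> None:
        self.layers = layers
        self.activations: Dict[str, Vector] = {}

    def forward(self, xs: Vector) -> Vector:
        # Start each forward pass with an empty trace.
        self.activations.clear()
        for name, layer in self.layers.items():
            xs = layer(xs)
            # Store a copy so the record is independent of later processing.
            self.activations[name] = list(xs)
        return xs


model = TracedModel({
    "layer_0": scale(2.0),
    "layer_1": relu,
    "layer_2": scale(0.5),
})
output = model.forward([1.0, -3.0, 0.5])

# Inspect what each layer produced for this input.
for name, values in model.activations.items():
    print(name, values)
```

After the forward pass, `model.activations` holds one recorded vector per layer, which is the raw material that visualization and attribution steps would then work from.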
DeepMind also emphasizes usability for research teams: the toolkit includes documentation and example workflows aimed at reproducible analysis. The release is intended to lower the barrier to hands-on inspection of large language models in research settings.
Access, scope and limitations
DeepMind presents Gemma Scope 2 as an open set of interpretability tools that the research community can run on Gemma 3 models. It frames the package as complementary to other community tooling, not as an exhaustive solution to every interpretability task. Users still need compute resources and expertise to instrument large checkpoints and interpret the results.
The tools operate at runtime and require access to model checkpoints and sufficient memory to capture intermediate activations. That means smaller teams may prefer to run analyses on mid-size Gemma 3 variants rather than the largest models. DeepMind notes ongoing work and invites researchers to use the toolkit to surface behaviors that merit deeper study.
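A rough back-of-envelope calculation shows why activation capture is memory-hungry. The sketch below is an assumption-laden estimate, not a figure from DeepMind: the function names are hypothetical, the model dimensions are illustrative and are not the published sizes of any Gemma 3 variant, and it counts only one hidden-size vector per token per layer plus the attention patterns, ignoring the model weights themselves.

```python
def activation_bytes(num_tokens: int, num_layers: int, hidden_size: int,
                     bytes_per_value: int = 2) -> int:
    """Bytes needed to store one hidden-size vector per token per layer."""
    return num_tokens * num_layers * hidden_size * bytes_per_value


def attention_pattern_bytes(num_tokens: int, num_layers: int, num_heads: int,
                            bytes_per_value: int = 2) -> int:
    """Bytes needed to store a full tokens-by-tokens pattern per head per layer."""
    return num_layers * num_heads * num_tokens * num_tokens * bytes_per_value


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


# Illustrative dimensions only (assumed, not taken from any Gemma 3 model card).
tokens, layers, hidden, heads = 1024, 32, 2048, 8

hidden_mib = to_mib(activation_bytes(tokens, layers, hidden))
attention_mib = to_mib(attention_pattern_bytes(tokens, layers, heads))

print(f"hidden activations: {hidden_mib:.0f} MiB")
print(f"attention patterns: {attention_mib:.0f} MiB")
```

With these assumed dimensions and 2 bytes per value, a single 1,024-token input needs 128 MiB for the hidden activations and 512 MiB for the attention patterns. The attention term grows with the square of the sequence length, so longer inputs and larger variants raise the cost quickly.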
Why it matters
Making interpretability tools available across an entire model family reduces friction for researchers who want to compare how behaviors emerge with scale and architectural choices. The release could accelerate collaborative analysis between model builders and safety researchers by giving both a shared set of instruments for probing models. Broader access also allows more independent examinations, which can reveal unexpected failure modes or clarify how specific capabilities arise.
Primary source
Google DeepMind
deepmind.google