Multimodal AI · 4 min read

DeepMind Veo 3.1 release: Ingredients-to-Video with vertical video

Veo 3.1 improves consistency and creative variation for short clips, adds vertical video output and finer control over motion and framing.

The Brieftide

DeepMind has released Veo 3.1, the latest update to its Ingredients-to-Video generative system, adding features to improve consistency, increase creative variation and support vertical video output. The update targets short-form clip generation and gives users finer control over motion, pacing and camera framing.

Veo 3.1 is an iterative upgrade rather than a ground-up redesign. DeepMind focuses the release on three areas: improving temporal coherence so objects and characters remain stable across frames, expanding the model's creative palette to produce less repetitive motion, and exposing explicit controls for the vertical and portrait aspect ratios used on social platforms.

What’s new in Veo 3.1

The headline additions include a suite of control tokens and conditioning signals that let users set clip length, camera motion style, and framing constraints. DeepMind says the controls are calibrated to work across both landscape and vertical canvases, with presets for common mobile formats.
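The announcement does not publish a parameter schema, so the following is only a minimal sketch of how per-clip controls like these might be bundled and validated on the client side. `ClipControls`, the preset names, the motion-style list, the 60-second cap and the 720-pixel short side are all hypothetical choices for illustration, not Veo's API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical aspect-ratio presets as (width, height) ratios.
ASPECT_PRESETS = {
    "landscape_16x9": (16, 9),
    "vertical_9x16": (9, 16),
    "portrait_4x5": (4, 5),
    "square_1x1": (1, 1),
}

# Hypothetical camera motion styles.
MOTION_STYLES = {"static", "pan", "dolly", "orbit", "handheld"}


@dataclass(frozen=True)
class ClipControls:
    """Illustrative per-clip controls; not Veo's actual parameter names."""
    duration_s: float
    motion_style: str
    aspect_preset: str
    short_side_px: int = 720

    def __post_init__(self) -> None:
        # Reject out-of-range or unknown values before a request is built.
        if not 0 < self.duration_s <= 60:  # arbitrary illustrative cap
            raise ValueError("duration_s must be in (0, 60]")
        if self.motion_style not in MOTION_STYLES:
            raise ValueError(f"unknown motion_style: {self.motion_style!r}")
        if self.aspect_preset not in ASPECT_PRESETS:
            raise ValueError(f"unknown aspect_preset: {self.aspect_preset!r}")

    def resolution(self) -> tuple[int, int]:
        """Return (width, height) with the shorter side fixed at short_side_px."""
        w, h = ASPECT_PRESETS[self.aspect_preset]
        if w >= h:
            return round(self.short_side_px * w / h), self.short_side_px
        return self.short_side_px, round(self.short_side_px * h / w)
```

Validating in `__post_init__` rejects unknown presets before any request is assembled; for example, `ClipControls(6.0, "dolly", "vertical_9x16").resolution()` gives `(720, 1280)`.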

On quality, Veo 3.1 applies expanded temporal attention windows and refined frame interpolation to reduce flicker and identity drift. The update also introduces stochastic variation layers intended to diversify motion trajectories while preserving object identity. DeepMind highlights sample outputs that show more natural head and limb motion, steadier object placement, and fewer texture artifacts.
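To make "expanded temporal attention windows" concrete, here is a minimal NumPy sketch of windowed attention over per-frame feature vectors. It assumes causal, sliding-window masking, a common design that the announcement does not confirm for Veo; `temporal_window_mask` and `masked_attention` are illustrative names.

```python
import numpy as np


def temporal_window_mask(num_frames: int, window: int) -> np.ndarray:
    """mask[t, s] is True when frame t may attend to frame s.

    Assumed causal sliding window: frame t sees itself and the
    previous window - 1 frames.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    t = np.arange(num_frames)[:, None]
    s = np.arange(num_frames)[None, :]
    return (s <= t) & (t - s < window)


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention over frames; q, k, v have shape (frames, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block frames outside the window
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Raising `window` from 4 to 8 lets frame 10 attend to frames 3 through 10 instead of 7 through 10, which is the sense in which a longer window gives the model more history to stay consistent with.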

For vertical video, the model includes aspect-aware conditioning so generated camera movement and composition respect portrait-focused subject placement. The system provides automatic center-of-interest heuristics for single-subject shots and adjustable margins for multi-subject framing.
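DeepMind does not describe its center-of-interest heuristics in detail. One simple version, sketched below under the assumption that subject bounding boxes come from some upstream detector, centers a 9:16 crop on the union of the subject boxes and pads it by a fractional margin. `portrait_crop` and its parameters are hypothetical.

```python
def portrait_crop(frame_w, frame_h, boxes, aspect=(9, 16), margin=0.1):
    """Return (left, top, width, height) of a crop with the given aspect ratio.

    boxes: subject boxes as (x0, y0, x1, y1) in pixels (assumed detector output).
    The crop covers the union of the boxes padded by `margin` on each side,
    is centered on that union, and is shrunk and clamped to stay inside the frame.
    """
    if not boxes:
        raise ValueError("need at least one subject box")
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    # Padded size the crop should cover.
    need_w = (x1 - x0) * (1 + 2 * margin)
    need_h = (y1 - y0) * (1 + 2 * margin)
    aw, ah = aspect
    # Smallest crop with the target aspect that covers the padded union.
    h = max(need_h, need_w * ah / aw)
    w = h * aw / ah
    # Shrink uniformly if the crop would exceed the frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    # Center on the union, then clamp to the frame edges.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - w / 2, 0.0), frame_w - w)
    top = min(max(cy - h / 2, 0.0), frame_h - h)
    return round(left), round(top), round(w), round(h)
```

For a single subject box `(900, 100, 1100, 740)` in a 1920×1080 frame, the crop is `(784, 36, 432, 768)`. The sketch only clamps the crop to the frame; it does not decide which subject to keep when several are too far apart to fit in portrait, which is where adjustable multi-subject margins would matter.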

Veo 3.1 continues to rely on the Ingredients-to-Video approach, which combines high-level scene "ingredients" such as character descriptions, action prompts and reference images with generative modules that synthesize motion and render frames. The update refines how those inputs are fused, and adds user-facing sliders for pacing, jitter tolerance and motion creativity.

How it works

Under the hood, the pipeline retains a staged architecture: a scene planner converts textual and visual ingredients into an abstract motion plan, a motion synthesizer samples plausible trajectories consistent with that plan, and a renderer produces the final frames. The new controls operate at planning and motion synthesis stages, changing the probability distributions used to sample trajectories and camera paths.
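The idea that controls "change the probability distributions used to sample trajectories and camera paths" can be illustrated with a toy categorical distribution over camera moves, where a control adds a logit bonus to the requested move before the softmax. The move names, the `strength` parameter and the function names below are assumptions for illustration only, not Veo internals.

```python
import numpy as np

# Hypothetical discrete camera moves a motion synthesizer could choose from.
CAMERA_MOVES = ["static", "pan_left", "pan_right", "dolly_in", "dolly_out"]


def control_adjusted_probs(base_logits, preferred=None, strength=2.0):
    """Softmax over camera moves after adding `strength` to the preferred move's logit."""
    logits = np.asarray(base_logits, dtype=float).copy()
    if logits.shape != (len(CAMERA_MOVES),):
        raise ValueError("need one logit per camera move")
    if preferred is not None:
        logits[CAMERA_MOVES.index(preferred)] += strength
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def sample_moves(probs, n, seed=None):
    """Draw n camera moves from the adjusted distribution."""
    rng = np.random.default_rng(seed)
    return rng.choice(CAMERA_MOVES, size=n, p=probs).tolist()
```

With uniform base logits and `strength=np.log(4)`, the preferred move's probability rises from 0.2 to 0.5 while the other four share the rest, so the control shifts the distribution without making the outcome fully deterministic.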

Improvements in temporal coherence come from two technical tweaks. First, the system increases the effective attention span across frames so the model can reference a longer history when predicting the next frame. Second, it introduces a consistency loss during training that penalizes identity changes across time. For creative variation, the model applies conditional noise schedules and diversity-promoting objectives to encourage varied but plausible motion.
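A consistency loss that "penalizes identity changes across time" can take many forms. The sketch below uses one common choice, the mean cosine distance between identity embeddings of consecutive frames. The per-frame embeddings are assumed to come from an identity encoder that the announcement does not specify, and this is not presented as DeepMind's actual loss.

```python
import numpy as np


def identity_consistency_loss(embeddings: np.ndarray) -> float:
    """Mean cosine distance between identity embeddings of consecutive frames.

    embeddings: shape (num_frames, dim), one identity vector per frame from an
    assumed identity encoder. Returns 0.0 when the identity never changes and
    grows as consecutive embeddings point in different directions.
    """
    if embeddings.ndim != 2 or embeddings.shape[0] < 2:
        raise ValueError("need a (num_frames >= 2, dim) array")
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos_sim = np.sum(unit[1:] * unit[:-1], axis=1)  # frame t vs frame t-1
    return float(np.mean(1.0 - cos_sim))
```

In training, a term like this is usually added to the main objective with a weight, for example `total = recon_loss + lam * identity_consistency_loss(emb)`, where the hypothetical `lam` balances stability against diversity-promoting terms.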

Deployment options remain similar to prior Veo releases: the model can be run on dedicated inference infrastructure and accepts both text-only and mixed text-plus-image prompts. DeepMind emphasizes that runtime controls let producers trade off determinism for variety depending on production needs.
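The determinism-versus-variety trade-off can be sketched as a noise-scale knob on a planned camera path plus an optional random seed. This Gaussian-perturbation scheme and the `sample_camera_path` name are illustrative assumptions, not a description of Veo's runtime controls.

```python
import numpy as np


def sample_camera_path(planned_path, temperature, seed=None):
    """Perturb a planned camera path with Gaussian noise scaled by `temperature`.

    temperature == 0 returns the plan unchanged (fully deterministic); larger
    values give more varied paths. Passing a fixed seed makes any temperature
    reproducible from run to run.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    path = np.asarray(planned_path, dtype=float)
    rng = np.random.default_rng(seed)
    return path + temperature * rng.standard_normal(path.shape)
```

Setting `temperature=0.0` returns the plan exactly, a fixed seed reproduces the same variation across runs, and larger temperatures spread samples further from the plan.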

Why it matters

Veo 3.1 narrows the gap between single-frame image models and short-form video tools by addressing common failure modes such as flicker and identity drift, while adding controls tailored to vertical mobile formats. Content creators and app developers who need reliable short clips with predictable framing are the most direct beneficiaries, and researchers can study the trade-off between temporal consistency and motion diversity that the update introduces.

Veo 3.1 generation pipeline
  1. Input ingredients: text prompts, reference images, aspect ratio and control tokens (pacing, motion style, framing).
  2. Scene planner: converts the ingredients into an abstract motion and camera plan, using center-of-interest heuristics for framing.
  3. Motion synthesizer: samples trajectories with diversity controls and temporal attention across frames.
  4. Renderer: produces per-frame pixels and applies frame interpolation, with identity stability shaped by the consistency loss used in training.
  5. Output formatting: finalizes the aspect ratio, applies vertical presets and exports the short-form clip.
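The five stages above can be wired together as a toy skeleton that shows only the data flow. Every stage is a stand-in: the planner draws a straight camera path, the synthesizer adds small random jitter, and the "renderer" emits flat placeholder frames. `Ingredients`, the stage functions and the tiny default resolution are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class Ingredients:
    """Stage 1 input. Field names are hypothetical, not Veo's API."""
    prompt: str
    aspect: tuple[int, int] = (9, 16)  # (width, height) ratio
    num_frames: int = 8
    pacing: float = 1.0                # toy camera travel per frame


def scene_planner(ing: Ingredients) -> dict:
    """Stage 2: reduce the ingredients to an abstract straight-line camera plan."""
    return {"start": 0.0, "end": ing.pacing * (ing.num_frames - 1),
            "num_frames": ing.num_frames}


def motion_synthesizer(plan: dict, rng: np.random.Generator,
                       jitter: float = 0.05) -> np.ndarray:
    """Stage 3: sample one camera trajectory around the planned path."""
    base = np.linspace(plan["start"], plan["end"], plan["num_frames"])
    return base + jitter * rng.standard_normal(base.shape)


def renderer(trajectory: np.ndarray, height: int, width: int) -> np.ndarray:
    """Stage 4: placeholder that emits one flat frame per trajectory sample."""
    return np.stack([np.full((height, width), x) for x in trajectory])


def output_formatter(frames: np.ndarray, prompt: str) -> dict:
    """Stage 5: package the frames with basic metadata."""
    _, h, w = frames.shape
    orientation = "vertical" if h > w else "landscape" if w > h else "square"
    return {"prompt": prompt, "frames": frames, "size": (w, h),
            "orientation": orientation}


def generate(ing: Ingredients, short_side: int = 18, seed: int = 0) -> dict:
    """Run the five toy stages in order."""
    aw, ah = ing.aspect
    if aw <= ah:
        width, height = short_side, short_side * ah // aw
    else:
        width, height = short_side * aw // ah, short_side
    rng = np.random.default_rng(seed)
    plan = scene_planner(ing)
    trajectory = motion_synthesizer(plan, rng)
    frames = renderer(trajectory, height, width)
    return output_formatter(frames, ing.prompt)
```

Swapping `aspect=(9, 16)` for `(16, 9)` flips the output from an 18×32 vertical clip to a 32×18 landscape one without touching the planner or synthesizer in this toy version.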

Primary source: Google DeepMind (deepmind.google)
