Video Analysis and Generation via a Semantic Progress Function

¹Tel Aviv University  ²Simon Fraser University
*Equal contribution
SIGGRAPH 2026

ReTime: Semantic Progress in Video Generation

Transforming abrupt semantic jumps into smooth, evenly paced transitions

Cat to Lion

Left: Source video with non-linear semantic progression  |  Right: ReTime output with uniform semantic flow
Watch how the SPF curve reveals uneven pacing and how ReTime corrects it

Abstract

Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by sudden, abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. For each frame, we compute distances between semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or re-times) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.

The Problem: Non-Linear Semantic Evolution

Video generation models produce transformations that evolve unevenly. Long stretches where content barely changes are followed by sudden, jarring semantic jumps.

The Semantic Progress Function (SPF) reveals these irregularities: the source curve deviates from the ideal diagonal, showing regions of stagnation followed by rapid change. Our ReTime method corrects this pacing.
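The deviation described above can be quantified directly. The following is a minimal sketch (not the paper's exact metric): normalize a cumulative-progress curve to [0, 1] and measure its maximum gap from the ideal diagonal; the function name and example curves are illustrative.

```python
import numpy as np

def pacing_deviation(cum_progress):
    """Max deviation of a normalized progress curve from the ideal diagonal.

    cum_progress: monotonically non-decreasing per-frame cumulative semantic
    change (hypothetical input). Returns a value in [0, 1]; 0 means
    perfectly uniform pacing.
    """
    p = np.asarray(cum_progress, dtype=float)
    p = (p - p[0]) / (p[-1] - p[0])        # normalize progress to [0, 1]
    t = np.linspace(0.0, 1.0, len(p))      # ideal diagonal (uniform pacing)
    return float(np.max(np.abs(p - t)))

# A curve that stagnates and then jumps strays far from the diagonal;
# a linear curve does not.
uneven = [0.0, 0.05, 0.1, 0.15, 0.9, 1.0]
even = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

A large deviation flags exactly the stagnation-then-jump pattern that ReTime is designed to correct.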

Method Overview

The Semantic Progress Function (SPF) captures cumulative semantic change over time

1. Compute Semantic Distances

Each video frame is embedded using a semantic encoder (SigLIP). Pairwise distances capture how much meaning changes between frames.
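In code, this step reduces to distances between consecutive embeddings. A minimal sketch with cosine distance, assuming the embeddings have already been produced by an encoder such as SigLIP (any per-frame embedding matrix works here):

```python
import numpy as np

def consecutive_semantic_distances(embeddings):
    """Cosine distance between consecutive frame embeddings.

    embeddings: (T, D) array, one semantic embedding per frame
    (e.g. SigLIP outputs; here any encoder's features can be used).
    Returns T-1 per-step distances.
    """
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)   # unit-normalize rows
    sims = np.sum(e[:-1] * e[1:], axis=1)              # cosine similarity
    return 1.0 - sims                                  # cosine distance
```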

2. Fit the SPF

A smooth curve is fitted to represent cumulative semantic progress. Its slope reflects the instantaneous rate of semantic change.
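A rough sketch of this fit, using a moving-average smoother as a stand-in for the paper's actual curve-fitting procedure (the window size and monotonicity projection are illustrative choices):

```python
import numpy as np

def fit_spf(frame_distances, window=3):
    """Smoothed cumulative semantic progress and its per-frame slope.

    frame_distances: semantic distances between consecutive frames (T-1,).
    Returns (spf, rate): a smoothed cumulative curve of length T and its
    gradient, the instantaneous rate of semantic change.
    """
    d = np.asarray(frame_distances, dtype=float)
    spf = np.concatenate([[0.0], np.cumsum(d)])        # raw cumulative curve
    pad = window // 2
    padded = np.pad(spf, pad, mode="edge")             # edge-pad for smoothing
    kernel = np.ones(window) / window
    spf = np.convolve(padded, kernel, mode="valid")    # moving average
    spf = np.maximum.accumulate(spf)                   # enforce monotonicity
    rate = np.gradient(spf)                            # instantaneous slope
    return spf, rate
```

For evenly spaced distances the slope comes out roughly constant; plateaus and jumps in the input show up directly as dips and spikes in `rate`.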

3. ReTime via RoPE Warping

We warp temporal positional embeddings (RoPE) to redistribute time according to semantic progress, achieving constant semantic velocity.
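The warp itself amounts to inverting the normalized SPF: frames are assigned new fractional positions so that equal steps in position correspond to equal steps in semantic progress. A sketch of that remapping (an illustrative stand-in for the paper's RoPE warping; the interpolation-based inverse assumes a strictly increasing SPF):

```python
import numpy as np

def retime_positions(spf):
    """Warped temporal positions with uniform semantic velocity.

    spf: cumulative semantic progress per frame (monotone, length T).
    Returns fractional positions in [0, T-1] to feed the model's temporal
    RoPE instead of the integer frame indices: regions where the SPF is
    flat get compressed, fast regions get stretched.
    """
    s = np.asarray(spf, dtype=float)
    s_norm = (s - s[0]) / (s[-1] - s[0])               # progress in [0, 1]
    uniform = np.linspace(0.0, 1.0, len(s))            # target: linear progress
    # Invert the SPF: where along the original timeline does each
    # uniform progress level occur?
    return np.interp(uniform, s_norm, np.arange(len(s), dtype=float))
```

For a perfectly linear SPF the warp is the identity, so a well-paced video is left unchanged.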

The Semantic Progress Function in Action

Watch how the SPF curve reveals non-linear semantic evolution and how ReTime corrects it

Left: Source video with non-linear semantic progression  |  Right: ReTime output with uniform semantic flow
SPF computed using SigLIP embeddings with k=30

Retiming of Existing Videos

For videos we cannot regenerate from scratch, we segment the SPF curve and synthesize new clips conditioned on boundary frames, with each clip's length proportional to its semantic span.

Input film clips (left) vs. Wan2.2-generated videos (right)
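The length allocation above is a simple proportional split. A minimal sketch, assuming the segmentation boundaries are already given (the function name and inputs are illustrative, not the paper's exact procedure):

```python
import numpy as np

def segment_lengths(spf, boundaries, total_frames):
    """Allocate clip lengths proportional to each segment's semantic span.

    spf: cumulative semantic progress per frame of the source video.
    boundaries: frame indices delimiting the clips (hypothetical
    segmentation of the SPF curve).
    total_frames: frame budget for the retimed output.
    """
    s = np.asarray(spf, dtype=float)
    spans = np.diff(s[boundaries])                  # semantic span per clip
    frac = spans / spans.sum()                      # proportional share
    lengths = np.round(frac * total_frames).astype(int)
    lengths[-1] += total_frames - lengths.sum()     # absorb rounding drift
    return lengths
```

A clip covering 70% of the total semantic change thus receives 70% of the output frames, regardless of how long it was in the source.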

LTX-2 Results: Joint Audio-Video Generation

LTX-2 generates synchronized audio-video from keyframes. With ReTime, the audio aligns with smooth visual transitions rather than abrupt semantic jumps.


Citation

If you find our work useful in your research, please consider citing:

@article{spf2026,
  title   = {Video Analysis and Generation via a Semantic Progress Function},
  author  = {Metzer, Gal and Polaczek, Sagi and Mahdavi-Amiri, Ali
             and Giryes, Raja and Cohen-Or, Daniel},
  journal = {TBD},
  year    = {2026}
}