Video Analysis and Generation via a Semantic Progress Function

¹Tel Aviv University  ²Simon Fraser University
*Equal contribution
SIGGRAPH 2026

ReTime: Semantic Progress in Video Generation

Transforming abrupt semantic jumps into smooth, evenly paced transitions

Cat to Lion

Left: Source video with non-linear semantic progression  |  Right: ReTime output with uniform semantic flow
Watch how the SPF curve reveals uneven pacing and how ReTime corrects it

Abstract

Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by sudden, abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. For each frame, we compute distances between semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or re-times) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.

The Problem: Non-Linear Semantic Evolution

Video generation models produce transformations that evolve unevenly. Long stretches where content barely changes are followed by sudden, jarring semantic jumps.

The Semantic Progress Function (SPF) reveals these irregularities: the source curve deviates from the ideal diagonal, showing regions of stagnation followed by rapid change. Our ReTime method corrects this pacing.
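The deviation described above can be quantified directly. The following is a minimal sketch (not the paper's exact metric): normalize a cumulative-progress curve to [0, 1] and measure its maximum gap from the ideal diagonal; the function name and example curves are illustrative.

```python
import numpy as np

def pacing_deviation(cum_progress):
    """Max deviation of a normalized progress curve from the ideal diagonal.

    cum_progress: monotonically non-decreasing per-frame cumulative semantic
    change (hypothetical input). Returns a value in [0, 1]; 0 means
    perfectly uniform pacing.
    """
    p = np.asarray(cum_progress, dtype=float)
    p = (p - p[0]) / (p[-1] - p[0])        # normalize progress to [0, 1]
    t = np.linspace(0.0, 1.0, len(p))      # ideal diagonal (uniform pacing)
    return float(np.max(np.abs(p - t)))

# A curve that stagnates and then jumps strays far from the diagonal;
# a linear curve does not.
uneven = [0.0, 0.05, 0.1, 0.15, 0.9, 1.0]
even = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

A large deviation flags exactly the stagnation-then-jump pattern that ReTime is designed to correct.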

Method Overview

The Semantic Progress Function (SPF) captures cumulative semantic change over time

1. Compute Semantic Distances

Each video frame is embedded using a semantic encoder (SigLIP). Pairwise distances capture how much meaning changes between frames.
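In code, this step reduces to distances between consecutive embeddings. A minimal sketch with cosine distance, assuming the embeddings have already been produced by an encoder such as SigLIP (any per-frame embedding matrix works here):

```python
import numpy as np

def consecutive_semantic_distances(embeddings):
    """Cosine distance between consecutive frame embeddings.

    embeddings: (T, D) array, one semantic embedding per frame
    (e.g. SigLIP outputs; here any encoder's features can be used).
    Returns T-1 per-step distances.
    """
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)   # unit-normalize rows
    sims = np.sum(e[:-1] * e[1:], axis=1)              # cosine similarity
    return 1.0 - sims                                  # cosine distance
```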

2. Fit the SPF

A smooth curve is fitted to represent cumulative semantic progress. Its slope reflects the instantaneous rate of semantic change.
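A rough sketch of this fit, using a moving-average smoother as a stand-in for the paper's actual curve-fitting procedure (the window size and monotonicity projection are illustrative choices):

```python
import numpy as np

def fit_spf(frame_distances, window=3):
    """Smoothed cumulative semantic progress and its per-frame slope.

    frame_distances: semantic distances between consecutive frames (T-1,).
    Returns (spf, rate): a smoothed cumulative curve of length T and its
    gradient, the instantaneous rate of semantic change.
    """
    d = np.asarray(frame_distances, dtype=float)
    spf = np.concatenate([[0.0], np.cumsum(d)])        # raw cumulative curve
    pad = window // 2
    padded = np.pad(spf, pad, mode="edge")             # edge-pad for smoothing
    kernel = np.ones(window) / window
    spf = np.convolve(padded, kernel, mode="valid")    # moving average
    spf = np.maximum.accumulate(spf)                   # enforce monotonicity
    rate = np.gradient(spf)                            # instantaneous slope
    return spf, rate
```

For evenly spaced distances the slope comes out roughly constant; plateaus and jumps in the input show up directly as dips and spikes in `rate`.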

3. ReTime via RoPE Warping

We warp temporal positional embeddings (RoPE) to redistribute time according to semantic progress, achieving constant semantic velocity.
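The warp itself amounts to inverting the normalized SPF: frames are assigned new fractional positions so that equal steps in position correspond to equal steps in semantic progress. A sketch of that remapping (an illustrative stand-in for the paper's RoPE warping; the interpolation-based inverse assumes a strictly increasing SPF):

```python
import numpy as np

def retime_positions(spf):
    """Warped temporal positions with uniform semantic velocity.

    spf: cumulative semantic progress per frame (monotone, length T).
    Returns fractional positions in [0, T-1] to feed the model's temporal
    RoPE instead of the integer frame indices: regions where the SPF is
    flat get compressed, fast regions get stretched.
    """
    s = np.asarray(spf, dtype=float)
    s_norm = (s - s[0]) / (s[-1] - s[0])               # progress in [0, 1]
    uniform = np.linspace(0.0, 1.0, len(s))            # target: linear progress
    # Invert the SPF: where along the original timeline does each
    # uniform progress level occur?
    return np.interp(uniform, s_norm, np.arange(len(s), dtype=float))
```

For a perfectly linear SPF the warp is the identity, so a well-paced video is left unchanged.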

The Semantic Progress Function in Action

Watch how the SPF curve reveals non-linear semantic evolution and how ReTime corrects it

Left: Source video with non-linear semantic progression  |  Right: ReTime output with uniform semantic flow
SPF computed using SigLIP embeddings with k=30

Retiming of Existing Videos

For videos we cannot regenerate from scratch, we segment the SPF curve and synthesize new clips conditioned on boundary frames, with each clip's length proportional to its semantic span.

Input film clips (left) vs. Wan2.2-generated videos (right)
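The length allocation above is a simple proportional split. A minimal sketch, assuming the segmentation boundaries are already given (the function name and inputs are illustrative, not the paper's exact procedure):

```python
import numpy as np

def segment_lengths(spf, boundaries, total_frames):
    """Allocate clip lengths proportional to each segment's semantic span.

    spf: cumulative semantic progress per frame of the source video.
    boundaries: frame indices delimiting the clips (hypothetical
    segmentation of the SPF curve).
    total_frames: frame budget for the retimed output.
    """
    s = np.asarray(spf, dtype=float)
    spans = np.diff(s[boundaries])                  # semantic span per clip
    frac = spans / spans.sum()                      # proportional share
    lengths = np.round(frac * total_frames).astype(int)
    lengths[-1] += total_frames - lengths.sum()     # absorb rounding drift
    return lengths
```

A clip covering 70% of the total semantic change thus receives 70% of the output frames, regardless of how long it was in the source.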

LTX-2 Results: Joint Audio-Video Generation

LTX-2 generates synchronized audio-video from keyframes. With ReTime, the audio aligns with smooth visual transitions rather than abrupt semantic jumps.


Citation

If you find our work useful in your research, please consider citing:

@article{spf2026,
  title   = {Video Analysis and Generation via a Semantic Progress Function},
  author  = {Metzer, Gal and Polaczek, Sagi and Mahdavi-Amiri, Ali
             and Giryes, Raja and Cohen-Or, Daniel},
  journal = {TBD},
  year    = {2026}
}