Flexible Diffusion Modeling of Long Videos

Comparisons of FDM with baselines on each dataset

Or see the videos below, each over an hour long:

These videos are sampled on GQN-Mazes and MineRL by iterated application of our Hierarchy-2 sampling scheme, and on CARLA Town 01 with an autoregressive sampling scheme. In each case we condition on the first 36 frames of a test video. Observed frames are shown with a red border, and a checkerboard pattern marks the end of the video.
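As a rough illustration of what an autoregressive sampling scheme means at the level of frame indices, the sketch below builds a schedule in which each step samples a block of new frames conditioned on the most recent frames. Only the 36 conditioning frames come from the text above; the function name `autoregressive_schedule`, the block size, and the context size are hypothetical choices for illustration, not FDM's actual settings. The Hierarchy-2 scheme is not sketched here.

```python
def autoregressive_schedule(video_length, num_observed=36,
                            frames_per_step=10, context_size=10):
    """Return one (latent_indices, context_indices) pair per sampling step.

    num_observed=36 matches the number of conditioning frames stated above.
    frames_per_step and context_size are assumed values for illustration.
    """
    schedule = []
    next_frame = num_observed  # first frame that still has to be sampled
    while next_frame < video_length:
        stop = min(next_frame + frames_per_step, video_length)
        # Frames to be generated in this step.
        latent = list(range(next_frame, stop))
        # Condition on the most recent frames, observed or already sampled.
        context = list(range(max(0, next_frame - context_size), next_frame))
        schedule.append((latent, context))
        next_frame = stop
    return schedule


steps = autoregressive_schedule(video_length=60)
for latent, context in steps:
    print(f"sample frames {latent[0]}-{latent[-1]} "
          f"given frames {context[0]}-{context[-1]}")
```

In a full sampler, each pair would be passed to the diffusion model, which generates the latent frames given the context frames; after the first step the context consists of frames the model itself sampled earlier.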