SDCNet: Video Prediction Using Spatially Displaced Convolution

Published in European Conference on Computer Vision (ECCV), 2018

Recommended citation: Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro, SDCNet: Video Prediction Using Spatially Displaced Convolution. ECCV 2018. https://arxiv.org/abs/1811.00684

SDCNet is a 3D convolutional neural network proposed for frame prediction. The model takes as input a sequence of past frames and their inter-frame optical flows and generates a per-pixel kernel and motion vector. A future frame is then synthesized by sampling past frames guided by the motion vectors and weighted by the learned kernels. SDCNet is trainable on raw unlabelled video frames via self-supervision, making few assumptions about the videos. We compute the optical flows fed to our model using flownet2-pytorch.
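As a rough illustration of the spatially displaced convolution (SDC) sampling step described above, the sketch below first warps the most recent past frame by the predicted per-pixel motion vectors, then blends a small neighborhood of the warped result using the predicted per-pixel kernels. The function name `sdc_sample`, the kernel size, and the softmax normalization of the kernel weights are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def sdc_sample(prev_frame, flow, kernels, ksize=5):
    """Simplified sketch of spatially displaced convolution (SDC) sampling.

    prev_frame: (B, C, H, W) past frame to sample from
    flow:       (B, 2, H, W) per-pixel motion vectors (u, v), in pixels
    kernels:    (B, ksize*ksize, H, W) per-pixel kernel weights
    Returns a predicted frame of shape (B, C, H, W).
    """
    b, c, h, w = prev_frame.shape

    # Build a sampling grid displaced by the predicted motion vectors.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=prev_frame.dtype, device=prev_frame.device),
        torch.arange(w, dtype=prev_frame.dtype, device=prev_frame.device),
        indexing="ij",
    )
    x_disp = xs.unsqueeze(0) + flow[:, 0]  # (B, H, W)
    y_disp = ys.unsqueeze(0) + flow[:, 1]

    # Normalize coordinates to [-1, 1] for grid_sample and bilinearly warp.
    grid = torch.stack(
        (2.0 * x_disp / (w - 1) - 1.0, 2.0 * y_disp / (h - 1) - 1.0), dim=-1
    )  # (B, H, W, 2)
    warped = F.grid_sample(prev_frame, grid, align_corners=True)

    # Apply the per-pixel kernels over a ksize x ksize neighborhood of the
    # displaced samples (a bilinear warp followed by an adaptive convolution).
    patches = F.unfold(warped, ksize, padding=ksize // 2)  # (B, C*k*k, H*W)
    patches = patches.view(b, c, ksize * ksize, h, w)
    weights = torch.softmax(kernels, dim=1).unsqueeze(1)   # (B, 1, k*k, H, W)
    return (patches * weights).sum(dim=2)                  # (B, C, H, W)
```

For example, calling `sdc_sample` with a (1, 3, 128, 128) frame, a (1, 2, 128, 128) flow field, and (1, 25, 128, 128) kernel weights returns a (1, 3, 128, 128) predicted frame; in SDCNet the flow and kernels would come from the network rather than being random inputs.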

Below are sample videos showing 2x frame-rate upscaling using SDCNet. In each video, we interleave ground-truth and SDC-predicted frames. Predicted frames are marked with an RGB flag near the top-right corner.
To view the results frame by frame, we recommend downloading the videos and watching them in a media player such as VLC or SM.
Note: If you are watching the videos directly as embedded in this webpage, the RGB flags may not be visible.

Various short clips (compared with MC-Net). Download link.

A sample video from YouTube8M. Download link.

A sample video from YouTube8M. Download link.

A sample video from CaltechPedestrian. Download link.