Hierarchical Motion Understanding via Motion Programs

CVPR 2021

♦ Stanford University § MIT ★ Equal contributions

Abstract: Current approaches to video analysis of human motion focus on raw pixels or keypoints as the basic units of reasoning. We posit that adding higher-level motion primitives, which capture natural, coarser units of motion such as a backswing or follow-through, can improve downstream analysis tasks. This higher level of abstraction can also capture key features, such as loops of repeated primitives, that are currently inaccessible at lower levels of representation. We therefore introduce Motion Programs, a neuro-symbolic, program-like representation that expresses motions as a composition of high-level primitives. We also present a system for automatically inducing motion programs from videos of human motion and for leveraging motion programs in video synthesis. Experiments show that motion programs can accurately describe a diverse set of human motions and that the inferred programs contain semantically meaningful motion primitives, such as arm swings and jumping jacks. Our representation also benefits downstream tasks such as video interpolation and video prediction, where it outperforms off-the-shelf models. We further demonstrate how these programs can detect diverse kinds of repetitive motion and facilitate interactive video editing.
Authors: Sumith Kulal, Jiayuan Mao, Alex Aiken, and Jiajun Wu


Examples of synthesized motion primitives

Fig. 1: a) This golf swing has three primitives: back-swing, pause, and down-swing. b) The squats sequence has a similar repeating subsequence of three primitives: squatting down, standing up, and a brief rest in the standing pose.
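
To make the structure in Fig. 1 concrete, the two examples can be written down as small program-like objects. This is only an illustrative sketch: the `Primitive` and `Loop` classes, the `unroll` helper, and the repetition count of 3 for the squats are hypothetical names and values chosen for this example, and the string labels stand in for whatever motion parameters the actual primitives carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    name: str  # descriptive label only (hypothetical stand-in for real motion parameters)


@dataclass(frozen=True)
class Loop:
    times: int   # number of repetitions
    body: tuple  # statements (Primitive or Loop) repeated on each iteration


def unroll(program):
    """Flatten a program (a tuple of statements) into a list of primitive names."""
    flat = []
    for stmt in program:
        if isinstance(stmt, Loop):
            for _ in range(stmt.times):
                flat.extend(unroll(stmt.body))
        else:
            flat.append(stmt.name)
    return flat


# a) The golf swing is a straight-line sequence of three primitives.
golf_swing = (
    Primitive("back-swing"),
    Primitive("pause"),
    Primitive("down-swing"),
)

# b) The squats repeat a three-primitive body; the count of 3 is assumed here.
squats = (
    Loop(times=3, body=(
        Primitive("squat down"),
        Primitive("stand up"),
        Primitive("rest"),
    )),
)

print(unroll(golf_swing))
print(len(unroll(squats)))
```

Writing the squats as a loop rather than nine separate statements is what exposes the repetition as an explicit, countable feature of the representation.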

Illustration of rolling up repetitive statements

Fig. 2: Illustration of rolling up 6 repetitive (alternating) statements into a for-loop of body size 2. We first translate concrete primitives to deterministic abstract primitives and then synthesize for-loops with probabilistic primitives in the body. Concrete primitives are sampled from the probabilistic abstract primitives during execution.
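
The roll-up described in Fig. 2 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system's actual synthesis procedure: each concrete primitive is reduced to a hypothetical `kind` label plus a single numeric `param`, the deterministic abstraction simply drops the parameter, and each probabilistic primitive is modeled as a Gaussian fitted to the parameters seen at its position in the loop body. All class and function names are made up for this example.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Concrete:
    kind: str     # hypothetical primitive type, e.g. "circular" or "linear"
    param: float  # hypothetical single parameter, e.g. a duration


@dataclass(frozen=True)
class Probabilistic:
    kind: str
    mean: float
    std: float

    def sample(self, rng):
        # Draw a concrete primitive from the (assumed Gaussian) distribution.
        return Concrete(self.kind, rng.gauss(self.mean, self.std))


@dataclass(frozen=True)
class ForLoop:
    count: int
    body: tuple  # tuple of Probabilistic primitives


def roll_up(statements):
    """Return a ForLoop with the smallest repeating body, or None if none exists."""
    kinds = [s.kind for s in statements]  # deterministic abstraction: drop parameters
    n = len(kinds)
    for size in range(1, n // 2 + 1):
        if n % size:
            continue
        if all(kinds[i] == kinds[i % size] for i in range(n)):
            body = []
            for j in range(size):
                # Collect the parameters seen at body position j across iterations.
                params = [s.param for s in statements[j::size]]
                body.append(Probabilistic(kinds[j],
                                          statistics.mean(params),
                                          statistics.pstdev(params)))
            return ForLoop(n // size, tuple(body))
    return None


def execute(loop, rng):
    """Sample concrete primitives from the probabilistic body on every iteration."""
    return [p.sample(rng) for _ in range(loop.count) for p in loop.body]


# Six alternating statements, as in the figure; the parameter values are invented.
statements = [
    Concrete("circular", 10.0), Concrete("linear", 4.0),
    Concrete("circular", 12.0), Concrete("linear", 5.0),
    Concrete("circular", 11.0), Concrete("linear", 6.0),
]
loop = roll_up(statements)  # a loop of 3 iterations with a body of size 2
resampled = execute(loop, random.Random(0))
```

Trying body sizes in increasing order means the sketch prefers the most compact loop, and sampling at execution time lets each iteration vary slightly instead of replaying a single averaged primitive.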

Acknowledgements: We thank Karan Chadha, Shivam Garg, and Shubham Goel for helpful discussions. This work is supported in part by a Magic Grant from the Brown Institute for Media Innovation, the Samsung Global Research Outreach (GRO) Program, Autodesk, Amazon Web Services, and Stanford HAI for AWS Cloud Credits.