Abstract:
Current approaches to video analysis of human motion focus on raw pixels or keypoints as the basic
units of reasoning. We posit that adding higher-level motion primitives, which can capture natural
coarser units of motion such as backswing or follow-through, can improve
downstream analysis tasks. This higher level of abstraction can also capture key features, such as
loops of repeated primitives, that are currently inaccessible at lower levels of representation. We
therefore introduce Motion Programs, a neuro-symbolic, program-like representation that expresses
motions as a composition of high-level primitives. We also present a system for automatically inducing
motion programs from videos of human motion and for leveraging motion programs in video synthesis.
Experiments show that motion programs can accurately describe a diverse set of human motions and that the
inferred programs contain semantically meaningful motion primitives, such as arm swings and jumping
jacks. Our representation also benefits downstream tasks such as video interpolation and video prediction
and outperforms off-the-shelf models. We further demonstrate how these programs can detect diverse
kinds of repetitive motion and facilitate interactive video editing.
@inproceedings{motion2prog2021,
author = {Sumith Kulal and Jiayuan Mao and Alex Aiken and Jiajun Wu},
title = {Hierarchical Motion Understanding via Motion Programs},
booktitle = {CVPR},
year = {2021},
}
Examples of synthesized motion primitives
Fig. 1: a) This golf swing has three primitives: backswing, pause and downswing.
b) The squats sequence has a similar repeating subsequence of three primitives: squatting down, standing up and
a brief rest in the standing pose.
Illustration of rolling up repetitive statements
Fig. 2: Illustration of rolling up 6 repetitive (alternating) statements into a
for-loop of body size 2. We first translate concrete primitives to deterministic abstract primitives and then
synthesize for-loops with probabilistic primitives in the body. Concrete primitives are sampled from the
probabilistic abstract primitives during execution.
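The roll-up step in Fig. 2 can be illustrated with a small sketch. This is not the authors' synthesis procedure: it assumes each statement has already been reduced to a discrete abstract-primitive label and detects repeats by exact label equality, whereas the system described above uses probabilistic abstract primitives in the loop body. The names `ForLoop`, `roll_up`, `unroll`, and `max_body` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class ForLoop:
    count: int             # number of iterations
    body: Tuple[str, ...]  # abstract-primitive labels repeated each iteration


Statement = Union[str, ForLoop]


def roll_up(prims: Sequence[str], max_body: int = 4) -> List[Statement]:
    """Greedily replace consecutive repeats of a short block with a ForLoop.

    Assumption: primitives are plain labels compared by equality.
    """
    out: List[Statement] = []
    i = 0
    while i < len(prims):
        best_count, best_size = 1, 0
        for size in range(1, max_body + 1):
            body = tuple(prims[i:i + size])
            if len(body) < size:
                break  # not enough statements left for this body size
            count = 1
            while tuple(prims[i + count * size:i + (count + 1) * size]) == body:
                count += 1
            # Prefer the loop that covers the most statements;
            # ties go to the smaller body because sizes are tried in order.
            if count > 1 and count * size > best_count * best_size:
                best_count, best_size = count, size
        if best_size:
            out.append(ForLoop(best_count, tuple(prims[i:i + best_size])))
            i += best_count * best_size
        else:
            out.append(prims[i])
            i += 1
    return out


def unroll(program: Sequence[Statement]) -> List[str]:
    """Expand every ForLoop back into a flat list of primitive labels."""
    flat: List[str] = []
    for stmt in program:
        if isinstance(stmt, ForLoop):
            flat.extend(list(stmt.body) * stmt.count)
        else:
            flat.append(stmt)
    return flat


# Six alternating statements roll up into one loop with a body of size 2.
alternating = ["A", "B", "A", "B", "A", "B"]
rolled = roll_up(alternating)
```

Here `rolled` holds a single `ForLoop` with `count=3` and `body=("A", "B")`, mirroring the six-statement example in the caption, and `unroll(rolled)` recovers the original sequence. A closer approximation of the described system would replace the equality test with a similarity test between primitives and sample concrete primitives from the loop body at execution time.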
Acknowledgements:
We thank Karan Chadha, Shivam Garg and Shubham Goel for helpful discussions. This work is supported in part
by a Magic Grant from the Brown Institute for Media Innovation, the Samsung Global Research Outreach (GRO)
Program, Autodesk, Amazon Web Services, and Stanford HAI for AWS Cloud Credits.