New Paper Alert
Instructional Video Generation: we are releasing a new method for video generation that explicitly focuses on fine-grained, subtle hand motions. Given a single image frame as context and a text prompt for an action, our method generates high-quality videos with careful attention to hand rendering. We use the instructional video domain as the driver here, given its rich set of videos and the challenges it poses for both humans and robots.
Try it out yourself. Links to the paper, project page, and code are below, and a demo page on HuggingFace is in the works so you can more easily try it on your own.
Our new method generates instructional videos tailored to your room, your tools, and your perspective. Whether it's threading a needle or rolling dough, the video shows exactly how you would do it, preserving your environment while guiding you frame by frame. The key breakthrough is in mastering accurate, subtle fingertip actions: the exact fine details that matter most in completing an action. By designing automatic Region of Motion (RoM) generation and a hand structure loss for fine-grained fingertip movements, our diffusion-based model outperforms six state-of-the-art video generation methods, bringing unparalleled clarity to Video GenAI.
Project Page: https://excitedbutter.github.io/project_page/
Paper Link: https://arxiv.org/abs/2412.04189
GitHub Repo: https://github.com/ExcitedButter/Instructional-Video-Generation-IVG
This paper is coauthored with my students Yayuan Li and Zhi Cao at the University of Michigan and Voxel51.
Top comments (2)
Importantly, as this video shows, our proposed Hand Structure Loss is critical to generating accurate and realistic subtle fingertip actions. See video demonstrations here: excitedbutter.github.io/project_pa...
Thank you, Dr. Corso, and thank you to the community for your attention. We welcome any comments and feedback!