Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
Vaishnavi Himakunthala, Andy Ouyang, Daniel Philip Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Yang Wang
Main: Language Grounding to Vision, Robotics and Beyond Main-poster Paper
Poster_Demo_Industry_Findings In-person 1: Language Grounding to Vision, Robotics and Beyond (Poster)
Conference Room: East Foyer
Conference Time: December 08, 11:00-12:30 (+08) (Asia/Singapore)
Global Time: December 08, Poster_Demo_Industry_Findings In-person 1 (03:00-04:30 UTC)
Abstract: