Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

Poster_Demo_Industry_Findings In-person 1: Language Grounding to Vision, Robotics and Beyond (Poster)

Conference Room: East Foyer

Conference Time: December 08, 11:00-12:30 (+08) (Asia/Singapore)

Global Time: December 08, Poster_Demo_Industry_Findings In-person 1 (03:00-04:30 UTC)

Abstract: