A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot

Aanisha Bhattacharyya, Yaman K Singla, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen

Main Conference: Speech & Multimodality 1 (Oral Paper)

Session 9: Speech & Multimodality 1 (Oral)
Conference Room: Central 3
Conference Time: December 10, 09:00-10:30 (+08) (Asia/Singapore)
Global Time: December 10, Session 9 (01:00-02:30 UTC)
Abstract: