ICCV 2021

Multi-Modality Learning from Videos and Beyond

Zoom


Oct 11, 2021, 2pm to 6pm. All times are in EST.


Speakers




Overview

In this tutorial, we cover many aspects of multi-modality learning in vision, e.g., using language, audio, video, wireless signals, touch, etc. Our target audience includes students, researchers, and engineers who are interested in learning about recent advances in multi-modality learning, performing research in the area, and applying these methods to real-world problems.

If you are interested in video action recognition basics and edge deployment of video algorithms, please check out our [1st Comprehensive Tutorial on Video Modeling] at CVPR 2020.

If you are interested in more recent research in video understanding, please check out our [2nd Comprehensive Tutorial on Video Modeling] at CVPR 2021.


Schedule

14:00 - 14:40 : Computer Perception with Perceivers by João Carreira [YouTube] [Bilibili]

14:40 - 15:20 : Neuro-Symbolic Dynamic Visual Reasoning by Chuang Gan [YouTube] [Bilibili]

15:20 - 16:00 : Video, Multimodality & Similarity by Cees Snoek [YouTube] [Bilibili]

16:00 - 16:10 : Break

16:10 - 16:50 : Computer Vision with Sight, Sound, and Speech by Lorenzo Torresani [YouTube] [Bilibili]

16:50 - 17:30 : Knowledgeable and Spatial-Temporal Vision+Language by Mohit Bansal [YouTube] [Bilibili]

17:30 - 18:10 : Learning the Physical World from High-Resolution Tactile Sensing by Wenzhen Yuan [YouTube] [Bilibili]



For offline Q&A, please post your questions to the Google Doc.


Organizers




Please contact Yi Zhu if you have any questions.