ICCV 2021 Tutorial: Multi-Modality Learning from Videos and Beyond

Location: Zoom
In this tutorial, we cover many aspects of multi-modality learning in vision, such as learning from language, audio, video, wireless signals, and touch. Our target audience includes students, researchers, and engineers who are interested in learning about recent advances in multi-modality learning, conducting research in the area, and applying these techniques to real-world problems.
If you are interested in video action recognition basics and edge deployment of video algorithms, please check out our [1st Comprehensive Tutorial on Video Modeling] at CVPR 2020.
If you are interested in more recent research in video understanding, please check out our [2nd Comprehensive Tutorial on Video Modeling] at CVPR 2021.
14:00 - 14:40 : Computer Perception with Perceivers by João Carreira [YouTube] [Bilibili]
14:40 - 15:20 : Neuro-Symbolic Dynamic Visual Reasoning by Chuang Gan [YouTube] [Bilibili]
15:20 - 16:00 : Video, Multimodality & Similarity by Cees Snoek [YouTube] [Bilibili]
16:00 - 16:10 : Break
16:10 - 16:50 : Computer Vision with Sight, Sound, and Speech by Lorenzo Torresani [YouTube] [Bilibili]
16:50 - 17:30 : Knowledgeable and Spatial-Temporal Vision+Language by Mohit Bansal [YouTube] [Bilibili]
17:30 - 18:10 : Learning the Physical World from High-Resolution Tactile Sensing by Wenzhen Yuan [YouTube] [Bilibili]
For offline Q&A, please post your questions to the Google Doc.
Please contact Yi Zhu if you have any questions.