Publications

You can also find the full list of my articles on my Google Scholar profile.

2023

PreDiff: Precipitation Nowcasting with Latent Diffusion Models
Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
Conference on Neural Information Processing Systems (NeurIPS) 2023
arxiv | code
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun
Conference on Neural Information Processing Systems (NeurIPS) 2023
arxiv | code
GFM: Building Geospatial Foundation Models via Continual Pretraining
Matias Mendieta, Boran Han, Xingjian Shi, Yi Zhu, Chen Chen, Mu Li
International Conference on Computer Vision (ICCV) 2023
arxiv | code
Motion-Guided Masking for Spatiotemporal Representation Learning
David Fan, Jue Wang, Shuai Liao, Yi Zhu, Vimal Bhat, Hector Santos-Villalobos, Rohith MV, Xinyu Li
International Conference on Computer Vision (ICCV) 2023
arxiv | code
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Yuxin Ren, Zihan Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li
Association for Computational Linguistics (ACL) 2023
arxiv | code
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha
arXiv preprint arXiv:2302.03432 2023
arxiv | code
AIM: Adapting Image Models for Efficient Video Understanding
Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li
International Conference on Learning Representations (ICLR) 2023
arxiv | code
Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations
Andrii Zadaianchuk, Matthaeus Kleindessner, Yi Zhu, Francesco Locatello, Thomas Brox
International Conference on Learning Representations (ICLR) 2023
arxiv

2022

What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2022
arxiv | IEEE journal | code
Are Multimodal Models Robust to Image and Text Perturbations?
Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li
arXiv preprint arXiv:2212.08044 2022
arxiv
SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning
M Saiful Bari, Aston Zhang, Shuai Zheng, Xingjian Shi, Yi Zhu, Shafiq Joty, Mu Li
arXiv preprint arXiv:2212.10929 2022
arxiv
Visual Prompt Tuning for Test-time Domain Adaptation
Yunhe Gao, Xingjian Shi, Yi Zhu, Hao Wang, Zhiqiang Tang, Xiong Zhou, Mu Li, Dimitris N. Metaxas
arXiv preprint arXiv:2210.04831 2022
arxiv
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Zhihan Gao, Xingjian Shi, Hao Wang, Yi Zhu, Yuyang Wang, Mu Li, Dit-Yan Yeung
Conference on Neural Information Processing Systems (NeurIPS) 2022
arxiv | code
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, Aston Zhang, Wanqian Zhang, Bo Li, Mu Li
arXiv preprint arXiv:2206.08358 2022
arxiv | code
Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition
Haotao Wang, Aston Zhang, Yi Zhu, Shuai Zheng, Mu Li, Alex Smola, and Zhangyang Wang
International Conference on Machine Learning (ICML) 2022 Long Oral
arxiv | code
Pixel-level Correspondence for Self-Supervised Learning from Video
Yash Sharma, Yi Zhu, Chris Russell, Thomas Brox
International Conference on Machine Learning (ICML) 2022 Workshop
arxiv
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training
Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue
arXiv preprint arXiv:2203.13249 2022
arxiv | code
NUTA: Non-uniform Temporal Aggregation for Action Recognition
Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe
IEEE Winter Conference on Applications of Computer Vision (WACV) 2022
arxiv

2021

Improving Semantic Segmentation via Efficient Self-Training
Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li and Alexander Smola
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021
arxiv | IEEE journal | code
Blending Anti-Aliasing into Vision Transformer
Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
Conference on Neural Information Processing Systems (NeurIPS) 2021
arxiv | code
Progressive Coordinate Transforms for Monocular 3D Object Detection
Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue
Conference on Neural Information Processing Systems (NeurIPS) 2021
arxiv | code
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox
International Conference on Computer Vision (ICCV) 2021
arxiv | code
VidTr: Video Transformer Without Convolutions
Xinyu Li, Yanyi Zhang, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe
International Conference on Computer Vision (ICCV) 2021
arxiv | code
SelfNorm and CrossNorm for Out-of-Distribution Robustness
Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris Metaxas
International Conference on Computer Vision (ICCV) 2021
arxiv | code
A Unified Efficient Pyramid Transformer for Semantic Segmentation
Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li
International Conference on Computer Vision (ICCV) 2021 Workshop
arxiv | code
Video Contrastive Learning with Global Context
Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joseph Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li
International Conference on Computer Vision (ICCV) 2021 Workshop
arxiv | code
Domain Consensus Clustering for Universal Domain Adaptation
Guangrui Li, Guoliang Kang, Yi Zhu, Yunchao Wei, Yi Yang
Computer Vision and Pattern Recognition (CVPR) 2021
arxiv | code
AutoAdapt: Automated Segmentation Network Search for Unsupervised Domain Adaptation
Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam
Computer Vision and Pattern Recognition (CVPR) 2021 Workshop
arxiv
Scale Aware Adaptation for Land-Cover Classification in Remote Sensing Imagery
Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam
IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
arxiv | code

2020

A Comprehensive Study of Deep Video Action Recognition
Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
arXiv preprint arXiv:2012.06567 2020
arxiv | code
Towards Good Practices in Self-supervised Representation Learning
Srikar Appalaraju, Yi Zhu, Yusheng Xie, István Fehérvári
Conference on Neural Information Processing Systems (NeurIPS) 2020 Workshop
arxiv
Improving Semantic Segmentation via Self-Training
Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li and Alexander Smola
arXiv preprint arXiv:2004.14960 2020
arxiv | code
ResNeSt: Split-Attention Networks
Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li and Alexander Smola
arXiv preprint arXiv:2004.08955 2020
arxiv | code
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior
Hu Zhang, Linchao Zhu, Yi Zhu and Yi Yang
European Conference on Computer Vision (ECCV) 2020
arxiv | code
Cross-Time and Orientation-Invariant Overhead Image Geolocalization Using Deep Local Features
Yuxin Tian, Xueqing Deng, Yi Zhu and Shawn Newsam
IEEE Winter Conference on Applications of Computer Vision (WACV) 2020
arxiv | code
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing
Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng and Yi Zhu
Journal of Machine Learning Research (JMLR) 2020
arxiv | code

2019

Motion-Aware Feature for Improved Video Anomaly Detection
Yi Zhu and Shawn Newsam
British Machine Vision Conference (BMVC) 2019
arxiv
Improving Semantic Segmentation via Video Propagation and Label Relaxation
Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao and Bryan Catanzaro
Computer Vision and Pattern Recognition (CVPR) 2019 Oral
arxiv | code
Fine-Grained Land Use Classification at the City Scale Using Ground-Level Images
Yi Zhu, Xueqing Deng and Shawn Newsam
IEEE Transactions on Multimedia (TMM) 2019
arxiv

2018

Hidden Two-Stream Convolutional Networks for Action Recognition
Yi Zhu, Zhenzhong Lan, Shawn Newsam and Alexander G Hauptmann
Asian Conference on Computer Vision (ACCV) 2018
arxiv | code
Random Temporal Skipping for Multirate Video Analysis
Yi Zhu and Shawn Newsam
Asian Conference on Computer Vision (ACCV) 2018
arxiv
Gated Transfer Network for Transfer Learning
Yi Zhu, Jia Xue, and Shawn Newsam
Asian Conference on Computer Vision (ACCV) 2018
arxiv | code
What Is It Like Down There? Generating Dense Ground-Level Views and Image Features From Overhead Imagery Using Conditional Generative Adversarial Networks
Xueqing Deng, Yi Zhu, and Shawn Newsam
ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2018 Oral
arxiv
Towards Universal Representation for Unseen Action Recognition
Yi Zhu, Yang Long, Yu Guan, Shawn Newsam and Ling Shao
Computer Vision and Pattern Recognition (CVPR) 2018
arxiv
Learning Optical Flow via Dilated Networks and Occlusion Reasoning
Yi Zhu and Shawn Newsam
IEEE International Conference on Image Processing (ICIP) 2018
arxiv
Spatial Morphing Kernel Regression For Feature Interpolation
Xueqing Deng, Yi Zhu, and Shawn Newsam
IEEE International Conference on Image Processing (ICIP) 2018
arxiv

2017

Large-Scale Mapping of Human Activity using Geo-Tagged Videos
Yi Zhu, Sen Liu and Shawn Newsam
ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2017
arxiv
DenseNet for Dense Flow
Yi Zhu and Shawn Newsam
IEEE International Conference on Image Processing (ICIP) 2017 Oral
arxiv
Guided Optical Flow Learning
Yi Zhu, Zhenzhong Lan, Shawn Newsam and Alexander G Hauptmann
Computer Vision and Pattern Recognition (CVPR) 2017 Workshop
arxiv | code
Deep Local Video Feature for Action Recognition
Zhenzhong Lan, Yi Zhu, Alexander G Hauptmann and Shawn Newsam
Computer Vision and Pattern Recognition (CVPR) 2017 Workshop Oral
arxiv
Efficient Action Detection in Untrimmed Videos via Multi-Task Learning
Yi Zhu and Shawn Newsam
IEEE Winter Conference on Applications of Computer Vision (WACV) 2017 Oral
arxiv

2016

Spatio-Temporal Sentiment Hotspot Detection using Geotagged Photos
Yi Zhu and Shawn Newsam
ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2016
arxiv
Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition
Yi Zhu and Shawn Newsam
European Conference on Computer Vision (ECCV) 2016 Workshop Oral
arxiv
UC Merced Submission to the ActivityNet Challenge 2016
Yi Zhu and Shawn Newsam
CVPR 2016 ActivityNet Challenge, Untrimmed Video Classification Track
arxiv

Before 2016

  • Yi Zhu and Shawn Newsam, Land Use Classification using Convolutional Neural Networks Applied to Ground-Level Images, ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2015 (Best Poster Award) arxiv

  • Yi Zhu, Lingjia Liu and Jianzhong Zhang, Joint Angle and Delay Estimation for 2D Active Broadband MIMO-OFDM Systems, IEEE Global Communications Conference (GLOBECOM) 2013 arxiv

  • Yi Zhu, Lingjia Liu, Anding Wang, Krishna Sayana and Jianzhong Zhang, DoA Estimation and Capacity Analysis for 2D Active Massive MIMO Systems, IEEE International Conference on Communications (ICC) 2013 arxiv