About me

We are hiring ML engineers, full-stack engineers, and applied scientists at Boson AI. If you are interested in building large models and delivering products, let’s talk! Detailed job descriptions can be found on our careers page.

Before Boson AI, I was a senior applied scientist at Amazon AI, working with Dr. Mu Li and Dr. Alex Smola. Before joining Amazon, I received my PhD from UC Merced under the supervision of Prof. Shawn Newsam. My research mainly focuses on large language models, multimodal learning, self-supervised learning, and video understanding. I’m also enthusiastic about open source projects and have contributed to AutoGluon, GluonCV, and a number of other repositories.

I enjoy sharing what I learn through blog posts and tech videos. You can find my paper reading series on Bilibili and YouTube.

News

06/2024: We open sourced Higgs-Llama-3-70B, a chat model specially tuned for role-playing while remaining competitive in general-domain instruction following and reasoning. See our blog post for more details.

05/2024: Our survey provides a systematic and comprehensive review of efficient LLM research from model-centric, data-centric, and framework-centric perspectives. All the references are organized in this GitHub repo, which will be updated continuously.

09/2023: Two NeurIPS 2023 papers are accepted, on a latent diffusion model for earth science and a vision-language model for open-vocabulary visual recognition. See you in New Orleans!

07/2023: Two ICCV 2023 papers are accepted on spatiotemporal representation learning and geospatial foundation models.

05/2023: One ACL 2023 paper is accepted on learning to teach.

02/2023: I will co-organize the 2nd Pixel-level Video Understanding in the Wild workshop at CVPR 2023. Come join the challenges or submit your paper!

01/2023: Two ICLR 2023 papers are accepted, on efficient fine-tuning for video understanding and unsupervised semantic segmentation.

12/2022: One TPAMI paper is accepted on tokenizer design in Vision Transformers.

11/2022: Our proposed multimodal data augmentation method MixGen is accepted as an oral presentation at the Pretraining Large Vision and Multimodal Models Workshop at WACV 2023.

09/2022: One NeurIPS 2022 paper is accepted on EarthFormer; check out our Cuboid Attention.

05/2022: One ICML 2022 paper is accepted as a long oral, on out-of-distribution (OOD) detection in long-tailed recognition.

03/2022: The BigDetection benchmark is out! It has 600 object categories and contains over 3.4M training images with 36M bounding boxes. Check out our pre-trained models here.

01/2022: One WACV 2022 paper is accepted on video action recognition (NUTA).

12/2021: One TPAMI paper is accepted on self-training for segmentation.

09/2021: Two NeurIPS 2021 papers are accepted, on anti-aliasing for vision transformers and 3D object detection.

07/2021: Three ICCV 2021 papers are accepted, on video transformer, model robustness and multi-modal video representations.

06/2021: Three workshops will be organized at ICCV 2021, on video scene parsing, airborne object tracking, and multi-modality learning beyond video.

03/2021: One CVPR 2021 paper on universal domain adaptation is accepted.

02/2021: I will organize the 2nd edition of "A Comprehensive Tutorial on Video Modeling" at CVPR 2021. Stay tuned.

12/2020: One survey paper on video action recognition is released together with GluonCV 0.9.0.

08/2020: One WACV 2021 paper on domain adaptation for semantic segmentation is accepted.

07/2020: One ECCV 2020 paper on video adversarial attack is accepted.

02/2020: I will organize the tutorial "A Comprehensive Tutorial on Video Modeling" at CVPR 2020.

01/2020: One JMLR 2020 paper on GluonCV and GluonNLP is accepted. You are welcome to use our toolkits.

12/2019: One WACV 2020 paper on overhead image geolocalization is accepted.

07/2019: One BMVC 2019 paper on video anomaly detection is accepted.

05/2019: Successfully passed my dissertation defense. Thank you, Shawn, Trevor, and Ming-Hsuan. My full dissertation on video understanding can be found here.

02/2019: One CVPR 2019 paper on street scene segmentation is accepted as an oral presentation.

12/2018: One TMM 2019 paper on large-scale land use classification is accepted.

09/2018: Three ACCV 2018 papers are accepted, on fast video classification, video tempo learning and transfer learning.

07/2018: Our work on generating ground-level views from satellite imagery was covered by MIT Technology Review, Internet of Business, GIS Lounge, and DeepTech. We also did an interview with This Week in Machine Learning & AI; the YouTube link can be found here.

04/2018: Two ICIP 2018 papers are accepted on optical flow estimation.

02/2018: One CVPR 2018 paper on zero-shot action recognition is accepted.

Yi Zhu
