About me
My name is Yue Zhao (赵岳 in simplified Chinese). I am currently a PhD student at the University of Texas at Austin, supervised by Prof. Philipp Krähenbühl. I obtained my M.Phil. degree from the Multimedia Laboratory at the Chinese University of Hong Kong, supervised by Prof. Dahua Lin. Before that, I received my Bachelor's degrees from Tsinghua University. My current research interests are in computer vision, particularly video analysis and understanding.
News
[Feb 26, 2024] One paper accepted to CVPR 2024. See you in Seattle this summer!
[Feb 20, 2024] One tech report on a foundational video encoder is on arXiv. It is fueled by VIIT's captions.
[Jan 11, 2024] One tech report on video instruction tuning (VIIT) is available on arXiv.
[Dec 08, 2023] I was awarded the 2024-2025 NVIDIA Graduate Fellowship. Thank you, NVIDIA!
[Jun 19, 2023] We won EPIC-Kitchens 2023 Action Recognition and Multi-Instance Retrieval Challenges! I gave a talk on the winning solution at the workshop.
[Feb 28, 2023] One paper accepted to CVPR 2023 as Highlight. See you in Vancouver this summer!
[Aug 07, 2022] One paper accepted to ECCV 2022.
[May 16, 2022] One tech report on positive-congruent training is available on arXiv.
[Mar 28, 2022] One paper accepted to CVPR 2022 as Oral.
[Aug 20, 2021] Had a wonderful summer at AWS in Seattle.
[Mar 09, 2020] Two papers accepted to CVPR 2020 (1 oral + 1 poster).
[Aug 02, 2019] The extended version of our ICCV 2017 work has been accepted by IJCV.
[Jun 18, 2019] We launch MMAction, a versatile toolbox for action understanding based on PyTorch. v0.1.0 is now online!
Education Experience
The University of Texas at Austin, TX, USA
August 2020 -
Ph.D. in Computer Science.
The Chinese University of Hong Kong, HK SAR, China
August 2017 - July 2020
M.Phil. in Information Engineering.
Israel Institute of Technology (Technion), Haifa, Israel
July 2016 - August 2016
Visiting Student of Summer School of Engineering and Science, fully funded by Technion.
Department of Electronic Engineering, Tsinghua University, Beijing, China
August 2012 - July 2016
Bachelor of Engineering, magna cum laude.
School of Economics and Management, Tsinghua University, Beijing, China
August 2013 - July 2016
Bachelor of Science (Second Degree) in Economics.
Department of Information Technology and Electrical Engineering (D-ITET), Swiss Federal Institute of Technology (ETH), Zürich, Switzerland
September 2014 - February 2015
Mobility student fully funded by China Scholarship Council (CSC).
Professional Experience
Publications
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv]
[project page]
LEAP: Liberate Sparse-View 3D Modeling from Camera Poses
Hanwen Jiang, Zhenyu Jiang, Yue Zhao, Qixing Huang
International Conference on Learning Representations (ICLR), 2024
[arXiv]
[project page]
[code]
Learning Video Representations from Large Language Models
Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
(Highlight, top-2.5%)
Invited talk at Joint International 3rd Ego4D and 11th EPIC Workshop
[arXiv]
[code]
[project page]
[demo]
[colab]
[video (~8min)]
[poster]
Real-time Online Video Detection with Temporal Smoothing Transformers
Yue Zhao, Philipp Krähenbühl
European Conference on Computer Vision (ECCV), 2022
[arXiv]
[code]
[poster]
Revisiting Skeleton-based Action Recognition
Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral, top-4.2%)
[arXiv]
[code]
Omni-sourced Webly-supervised Learning for Video Recognition
Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
European Conference on Computer Vision (ECCV), 2020
Best single model on Kinetics-400 val (83.6%).
[arXiv]
[model]
[media]
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020 (Oral, top-5.0%).
[arXiv][project page]
Intra- and Inter-Action Understanding via Temporal Action Parsing
Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[arXiv][project page]
Temporal Action Detection with Structured Segment Networks
Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
International Journal of Computer Vision (IJCV), 2020.
[Springer link]
[ICCV17 Version]
[code]
[project page]
Trajectory Convolution for Action Recognition
Yue Zhao, Yuanjun Xiong, Dahua Lin
Thirty-second Conference on Neural Information Processing Systems (NeurIPS), 2018.
[pdf][code]
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries
Dian Shao*, Yu Xiong*, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin
European Conference on Computer Vision (ECCV), 2018.
[pdf]
[project page]
Recognizing Actions by Disentangling Components of Dynamics
Yue Zhao, Yuanjun Xiong, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[pdf]
[supplementary]
Temporal Action Detection with Structured Segment Networks
Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
International Conference on Computer Vision (ICCV), 2017.
[pdf]
[arXiv]
[IJCV version]
[code]
[project page]
Recurrent Convolutional Neural Network for Speech Processing
Yue Zhao, Xingyu Jin, Xiaolin Hu
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[pdf]
[code]
[poster]
Heart Rate Estimation in Photoplethysmogram Signals using Nonlinear Model-Based Preprocessing
Federico Wadehn, Yue Zhao, Hans-Andrea Loeliger
IEEE Computing in Cardiology (CinC), 2015.
[pdf]
Manuscripts & Preprints
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao*, Nitesh B. Gundavarapu*, Liangzhe Yuan*, Hao Zhou*, ..., Yue Zhao, ..., Mikhail Sirotenko+, Ting Liu+, Boqing Gong+
arXiv:2402.13217 [cs.CV]
[pdf]
[Blog]
Training a Large Video Model on a Single Machine in a Day
Yue Zhao, Philipp Krähenbühl
arXiv:2309.16669 [cs.CV]
Winner of the EPIC-Kitchens 2023 Action Recognition and Multi-Instance Retrieval Challenges at CVPR 2023.
Invited talk at 11th EPIC Workshop at CVPR 2023
[pdf][code]
ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training
Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
arXiv:2205.06265 [cs.LG]
[pdf][code]
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks
Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin
ECCV Workshop on AI for Creative Video Editing and Understanding (CVEU), 2022
[pdf][dataset]
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2017
Yue Zhao, Bowen Zhang, Zhirong Wu, Shuo Yang, Lei Zhou, Sijie Yan, Limin Wang, Yuanjun Xiong, Yali Wang, Dahua Lin, Yu Qiao, Xiaoou Tang
CVPR Workshop on ActivityNet Large Scale Activity Recognition Challenge, 2017
[pdf]
Other projects
People tracking using RGB-D videos.
An undergraduate-level class project on people detection and tracking from RGB-D data collected by Kinect. [demo]