About me

My name is Yue Zhao* (赵岳 in simplified Chinese). I am currently a PhD student at the University of Texas at Austin, supervised by Prof. Philipp Krähenbühl. I obtained my MPhil degree from the Multimedia Laboratory at the Chinese University of Hong Kong, supervised by Prof. Dahua Lin. Before that, I received my Bachelor's degrees from Tsinghua University. My current research interests are in computer vision, particularly video analysis and understanding.

News

[Feb 26, 2024] One paper accepted to CVPR 2024. See you in Seattle this summer!
[Feb 20, 2024] One tech report on a foundational video encoder is on arXiv. It is fueled by VIIT's captions.
[Jan 11, 2024] One tech report on video instruction tuning (VIIT) is available on arXiv.
[Dec 08, 2023] I was awarded the 2024-2025 NVIDIA Graduate Fellowship. Thank you NVIDIA!
[Jun 19, 2023] We won EPIC-Kitchens 2023 Action Recognition and Multi-Instance Retrieval Challenges! I gave a talk on the winning solution at the workshop.
[Feb 28, 2023] One paper accepted to CVPR 2023 as a Highlight. See you in Vancouver this summer!
[Aug 07, 2022] One paper accepted to ECCV 2022.
[May 16, 2022] One tech report on positive-congruent training is available on arXiv.
[Mar 28, 2022] One paper accepted to CVPR 2022 as an Oral.
[Aug 20, 2021] Had a wonderful summer at AWS in Seattle.
[Mar 09, 2020] Two papers accepted to CVPR 2020 (1 oral + 1 poster).
[Aug 02, 2019] The extended version of our ICCV 2017 work has been accepted by IJCV.
[Jun 18, 2019] We launch MMAction, a versatile toolbox for action understanding based on PyTorch. v0.1.0 is now online!

Education Experience

The University of Texas at Austin, TX, USA

August 2020 -
Ph.D. in Computer Science.

The Chinese University of Hong Kong, HK SAR, China

August 2017 - July 2020
M.Phil. in Information Engineering.

Department of Electronic Engineering, Tsinghua University, Beijing, China

August 2012 - July 2016
Bachelor of Engineering, magna cum laude.

School of Economics and Management, Tsinghua University, Beijing, China

August 2013 - July 2016
Bachelor of Science (Second Degree) in Economics.

Department of Information Technology and Electrical Engineering (D-ITET), Swiss Federal Institute of Technology (ETH), Zürich, Switzerland

September 2014 - February 2015
Mobility student fully funded by China Scholarship Council (CSC).

Professional Experience

Google Research, Venice, CA, USA

May 2023 - August 2023
Student Researcher

FAIR Labs, New York, NY, USA

May 2022 - August 2022
Research Scientist Intern

Amazon Web Services, Seattle, WA, USA

June 2021 - August 2021
Applied Scientist Intern

Multimedia Laboratory, The Chinese University of Hong Kong, HK SAR, China

September 2016 - August 2017, July 2015 - September 2015
Junior Research Assistant

Publications

Distilling Vision-Language Models on Millions of Videos

Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv] [project page]

LEAP: Liberate Sparse-View 3D Modeling from Camera Poses

Hanwen Jiang, Zhenyu Jiang, Yue Zhao, Qixing Huang
International Conference on Learning Representations (ICLR), 2024
[arXiv] [project page] [code]

Real-time Online Video Detection with Temporal Smoothing Transformers

Yue Zhao, Philipp Krähenbühl
European Conference on Computer Vision (ECCV), 2022
[arXiv] [code] [poster]

Revisiting Skeleton-based Action Recognition

Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral, top-4.2%)
[arXiv] [code]

Omni-sourced Webly-supervised Learning for Video Recognition

Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
European Conference on Computer Vision (ECCV), 2020
Best single model on Kinetics-400 val (83.6%).
[arXiv] [model] [media]

FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020 (Oral, top-5.0%).
[arXiv] [project page]

Intra- and Inter-Action Understanding via Temporal Action Parsing

Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[arXiv] [project page]

Temporal Action Detection with Structured Segment Networks

Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
International Journal of Computer Vision (IJCV), 2020.
[Springer link] [ICCV17 Version] [code] [project page]

Trajectory Convolution for Action Recognition

Yue Zhao, Yuanjun Xiong, Dahua Lin
Thirty-second Conference on Neural Information Processing Systems (NeurIPS), 2018.
[pdf] [code]

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries

Dian Shao*, Yu Xiong*, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin
European Conference on Computer Vision (ECCV), 2018.
[pdf] [project page]

Recognizing Actions by Disentangling Components of Dynamics

Yue Zhao, Yuanjun Xiong, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[pdf] [supplementary]

Temporal Action Detection with Structured Segment Networks

Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
International Conference on Computer Vision (ICCV), 2017.
[pdf] [arXiv] [IJCV version] [code] [project page]

Recurrent Convolutional Neural Network for Speech Processing

Yue Zhao, Xingyu Jin, Xiaolin Hu
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[pdf] [code] [poster]

Heart Rate Estimation in Photoplethysmogram Signals using Nonlinear Model-Based Preprocessing

Federico Wadehn, Yue Zhao, Hans-Andrea Loeliger
IEEE Computing in Cardiology (CinC), 2015.
[pdf]

Manuscripts & Preprints

VideoPrism: A Foundational Visual Encoder for Video Understanding

Long Zhao*, Nitesh B. Gundavarapu*, Liangzhe Yuan*, Hao Zhou*, ..., Yue Zhao, ..., Mikhail Sirotenko+, Ting Liu+, Boqing Gong+
arXiv:2402.13217 [cs.CV]
[pdf] [Blog]

Training a Large Video Model on a Single Machine in a Day

Yue Zhao, Philipp Krähenbühl
arXiv:2309.16669 [cs.CV]
Winner of the EPIC-Kitchens 2023 Action Recognition and Multi-Instance Retrieval Challenges at CVPR 2023.
Invited talk at the 11th EPIC Workshop at CVPR 2023.
[pdf] [code]

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training

Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
arXiv:2205.06265 [cs.LG]
[pdf] [code]

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin
ECCV Workshop on AI for Creative Video Editing and Understanding (CVEU), 2022
[pdf] [dataset]

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2017

Yue Zhao, Bowen Zhang, Zhirong Wu, Shuo Yang, Lei Zhou, Sijie Yan, Limin Wang, Yuanjun Xiong, Yali Wang, Dahua Lin, Yu Qiao, Xiaoou Tang
CVPR Workshop on ActivityNet Large Scale Activity Recognition Challenge, 2017
[pdf]

A Pursuit of Temporal Accuracy in General Activity Detection

Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang
arXiv:1703.02716v1 [cs.CV]
[pdf] [code]

Other projects

People tracking using RGB-D videos.

An undergraduate-level class project on people detection and tracking from RGB-D data collected by Kinect. [demo]

Miscellaneous

* For non-Chinese speakers, the pronunciation of Zhao Yue (family name first is preferred) is close to ['dʒau 'ju:-eh].