About me

My name is Yue Zhao* (赵岳 in simplified Chinese). I am currently a PhD student at the University of Texas at Austin, supervised by Prof. Philipp Krähenbühl. I obtained my MPhil degree from the Multimedia Laboratory at the Chinese University of Hong Kong, supervised by Prof. Dahua Lin. Before that, I received my Bachelor's degrees from Tsinghua University. My current research interests are in computer vision, particularly video analysis and understanding.

News

[Jun 11, 2024] One tech report on a generic visual tokenizer is on arXiv.
[Jun 07, 2024] LaViLa (CVPR 2023) wins an Egocentric Vision (EgoVis) 2022/2023 Distinguished Paper Award!
[May 01, 2024] 🏳️‍🌈⃤ VideoPrism accepted to ICML!
[Apr 20, 2024] Our positive-congruent training paper accepted by TPAMI (finally)!
[Feb 26, 2024] One paper accepted to CVPR 2024. See you in Seattle this summer!
[Feb 20, 2024] One tech report on a foundational video encoder is on arXiv. It is fueled by VIIT's captions.
[Jan 11, 2024] One tech report on video instruction tuning (VIIT) is available on arXiv.
[Dec 08, 2023] I was awarded the 2024-2025 NVIDIA Graduate Fellowship. Thank you NVIDIA!
[Jun 19, 2023] We won the EPIC-Kitchens 2023 Action Recognition and Multi-Instance Retrieval Challenges! I gave a talk on the winning solution at the workshop.
[Feb 28, 2023] One paper accepted to CVPR 2023 as Highlight. See you in Vancouver this summer!
[Aug 07, 2022] One paper accepted to ECCV 2022.
[May 16, 2022] One tech report on positive-congruent training is available on arXiv.
[Mar 28, 2022] One paper accepted to CVPR 2022 as Oral.
[Aug 20, 2021] Had a wonderful summer at AWS in Seattle.
[Mar 09, 2020] Two papers accepted to CVPR 2020 (1 oral + 1 poster).
[Aug 02, 2019] The extended version of our ICCV 2017 work has been accepted by IJCV.
[Jun 18, 2019] We launch MMAction, a versatile toolbox for action understanding based on PyTorch. v0.1.0 is now online!

Selected Preprints

Image and Video Tokenization with Binary Spherical Quantization

Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl
arXiv:2406.07548 [cs.CV]
[pdf][code]

Selected Publications

VideoPrism: A Foundational Visual Encoder for Video Understanding

Long Zhao*, Nitesh B. Gundavarapu*, Liangzhe Yuan*, Hao Zhou*, ..., Yue Zhao, ..., Mikhail Sirotenko+, Ting Liu+, Boqing Gong+
International Conference on Machine Learning (ICML), 2024
[arXiv] [Blog]

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training

Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2024
[arXiv][code]

Distilling Vision-Language Models on Millions of Videos

Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv] [project page]

Real-time Online Video Detection with Temporal Smoothing Transformers

Yue Zhao, Philipp Krähenbühl
European Conference on Computer Vision (ECCV), 2022
[arXiv] [code] [poster]

Revisiting Skeleton-based Action Recognition

Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral, top-4.2%)
[arXiv] [code]

FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

Dian Shao, Yue Zhao, Bo Dai, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020 (Oral, top-5.0%)
[arXiv][project page]

Temporal Action Detection with Structured Segment Networks

Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
International Conference on Computer Vision (ICCV), 2017
[pdf] [arXiv] [IJCV version] [code] [project page]

Education Experience

The University of Texas at Austin, TX, USA

August 2020 -
Ph.D. in Computer Science.

The Chinese University of Hong Kong, HK SAR, China

August 2017 - July 2020
M.Phil. in Information Engineering.

Department of Electronic Engineering, Tsinghua University, Beijing, China

August 2012 - July 2016
Bachelor of Engineering, magna cum laude.

School of Economics and Management, Tsinghua University, Beijing, China

August 2013 - July 2016
Bachelor of Science (Second Degree) in Economics.

Department of Information Technology and Electrical Engineering (D-ITET), Swiss Federal Institute of Technology (ETH), Zürich, Switzerland

September 2014 - February 2015
Mobility student fully funded by China Scholarship Council (CSC).

Professional Experience

Google Research, Venice, CA, USA

May 2023 - August 2023
Student Researcher

FAIR Labs, New York, NY, USA

May 2022 - August 2022
Research Scientist Intern

Amazon Web Services, Seattle, WA, USA

June 2021 - August 2021
Applied Scientist Intern

Multimedia Laboratory, The Chinese University of Hong Kong, HK SAR, China

September 2016 - August 2017, July 2015 - September 2015
Junior Research Assistant

Other projects

People tracking using RGB-D videos.

An undergraduate-level class project on people detection and tracking from RGB-D data collected by Kinect. [demo]

Miscellaneous

* For non-Chinese speakers, the pronunciation of Zhao Yue (family name first is preferred) is close to ['dʒau 'ju:-eh].