Tiancheng (Tony) Zhao

3rd Floor, Building 2, Eastcom Technology Park

66 Dongxin Avenue

Hangzhou, Zhejiang, 310053

Welcome! I am Tiancheng (Tony) Zhao, a principal researcher at the Binjiang Institute of Zhejiang University and the founder of the Om Artificial Intelligence Laboratory (Om AI Lab). Our goal at Om AI Lab is to conduct open, frontier research on multimodal AGI that helps the community build the next generation of multimodal agents that will reshape how we work and live.

I received my Ph.D. in Computer Science from the Language Technologies Institute at Carnegie Mellon University, advised by Prof. Maxine Eskenazi. My Ph.D. dissertation, Learning to Converse With Latent Actions, is one of the pioneering works on end-to-end generative models for conversational agents. It was supervised by Prof. Maxine Eskenazi, Prof. Louis-Philippe Morency, Prof. William W. Cohen and Dr. Dilek Hakkani-Tur. Prior to that, I obtained my bachelor's degree in Electrical Engineering, summa cum laude, from the University of California, Los Angeles, where I worked on speech signal processing, advised by Prof. Abeer Alwan.

My current research focuses on multimodal foundation models and agents. My goal is to develop computational building blocks that connect machines with people by combining the latest deep learning methods with practical system implementations. The technical challenges of this effort include:

Multimodal Models: build vision-and-language foundation models to establish cross-modal representations that better recognize or generate high-dimensional multimodal data.
Learning to Learn: develop methods that enable computers to learn new skills effectively (few-shot/zero-shot) from a variety of training signals, e.g. supervised labels, rewards, meta-learning, etc.
AI Agents: build multimodal agentic systems that can understand the open world, reason over complex instructions and master decision-making to accomplish real-world tasks.

You can find more details about our work on Google Scholar and GitHub.

experience

2021 - Today	Research Scientist, Binjiang Institute of Zhejiang University
2019 - 2021	Co-founder and Chief Scientist, Soco Inc.
2016 - 2019	Ph.D. in Computer Science, Carnegie Mellon University
2014 - 2016	M.S. in Computer Science, Carnegie Mellon University
2010 - 2014	B.S. in Electrical Engineering, University of California, Los Angeles

awards

2021	National Breakthrough Technology Award by the Ministry of Science and Technology.
2018	Microsoft Research Best & Brightest PhD
2018	Best Paper Award at SIGDIAL 2018
2016	Best Paper Nomination Award at SIGDIAL 2016
2014	Outstanding Bachelor of Science Award, UCLA EE Class of 2014. (Top-ranked graduate in the department)

selected projects

Report

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Shen, Haozhan, Liu, Peng, Li, Jingcheng, Fang, Chunxin, Ma, Yibo, Liao, Jiajia, Shen, Qiaoli, Zhang, Zilun, Zhao, Kangjia, Zhang, Qianqian, Xu, Ruochen, and Zhao, Tiancheng

arXiv preprint arXiv:2504.07615 2025

PDF Code
EMNLP

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Zhang, Lu, Zhao, Tiancheng, Ying, Heting, Ma, Yibo, and Lee, Kyusong

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024

PDF Code
Report

OmChat: A recipe to train multimodal language models with strong long context and video understanding

Zhao, Tiancheng, Zhang, Qianqian, Lee, Kyusong, Liu, Peng, Zhang, Lu, Fang, Chunxin, Liao, Jiajia, Jiang, Kelei, Ma, Yibo, and Xu, Ruochen

arXiv preprint arXiv:2407.04923 2024

PDF Code
IET-CV

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head (OmDet-Turbo)

Zhao, Tiancheng, Liu, Peng, He, Xuan, Zhang, Lu, and Lee, Kyusong

arXiv preprint arXiv:2403.06892 2024

PDF Code
NAACL

SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

Zhao, Tiancheng, Lu, Xiaopeng, and Lee, Kyusong

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021

PDF Code
SIGDIAL

Zero-Shot Dialog Generation with Cross-Domain Latent Actions

Zhao, Tiancheng, and Eskenazi, Maxine

In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue 2018

PDF Code Slides
ACL

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

Zhao, Tiancheng, Zhao, Ran, and Eskenazi, Maxine

In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017

PDF Code
SIGDIAL

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

Zhao, Tiancheng, and Eskenazi, Maxine

In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2016

PDF Slides