Ziyao (Adonis) Zeng 曾子尧

Ph.D. Student in Computer Science
Yale University

I'm a third-year Ph.D. student in Computer Science (2023 - [Expected] 2027) at Yale University. Before that, I obtained my B.Eng. in Computer Science (2019 - 2023) at ShanghaiTech University, with a minor in Innovation and Entrepreneurship.

I conduct research on Multimodal Learning inspired by human cognition towards Multimodal Agentic AI, especially vision-language models and spatial understanding. My line of work in "Language for 3D Vision" explores how vision-language models can perceive and understand the world like humans do. My expertise lies in vision-language models, large language models, diffusion models, multimodal learning, and 2D/3D computer vision.

Collaborators

I was also fortunate to intern at Shanghai AI Lab and UISEE during my undergraduate studies.

Reviewer Services

CVPR (2022, 2025 [Outstanding Reviewer], 2026), ICCV (2023, 2025), ECCV (2024), ICML (2025), ICLR (2025, 2026), NeurIPS (2024, 2025), ACM MM (2023, 2025), AISTATS (2024, 2025), ICASSP (2024, 2025), TCSVT (journal)


Research

Multimodal Learning towards Multimodal Agentic AI

Since the age of 13, when I was deeply touched by Isaac Asimov's Foundation, my dream has been to create an AI that can think like humans (just like the dream of Prof. Jürgen Schmidhuber). When humans perceive the surrounding environment, we see (2D vision), hear (audio), feel (tactile), and interact (3D vision) simultaneously to reason (language) and understand (neural signal) the world. Therefore, I conduct research on Multimodal Learning towards Multimodal Agentic AI. My research vision is to empower agentic AI with multimodal sensing and multimodal representations, enabling it to perceive, reason, understand, and interact with both the digital and physical worlds like humans do.

Specifically, my line of work in "Language for 3D Vision" (DepthCLIP, PointCLIPv2, WorDepth, RSA, Iris) explores how vision-language models can perceive and understand the world like humans do.

Publications

Selected publications are highlighted.

(* indicates equal contributions)

2026

RuleSmith
RuleSmith: Multi-Agent LLMs for Automated Game Balancing

Ziyao Zeng, Chen Liu, Tianyu Liu, Hao Wang, Xiatao Sun, Fengyu Yang, Xiaofeng Liu, Zhiwen Fan

arXiv technical report, 2026 | project page, code

2025

Coffee
Coffee: Controllable Diffusion Fine-tuning

Ziyao Zeng, Jingcheng Ni, Ruyi Liu, Alex Wong

arXiv technical report, 2025

ETA
ETA: Energy-based Test-time Adaptation for Depth Completion

Younjoon Chung*, Hyoungseob Park*, Patrick Rim*, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong

ICCV 2025

ProtoDepth
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes

Patrick Rim, Hyoungseob Park, S. Gangopadhyay, Ziyao Zeng, Younjoon Chung, Alex Wong

CVPR 2025 | project page, code

HOMER
HOMER: Homography-Based Efficient Multi-view 3D Object Removal

Jingcheng Ni*, Weiguang Zhao*, Daniel Wang, Ziyao Zeng, Chenyu You, Alex Wong, Kaizhu Huang

arXiv technical report, 2025

2024

Iris
Iris: Integrating Language into Diffusion-based Monocular Depth Estimation

Ziyao Zeng*, Jingcheng Ni*, Daniel Wang, Patrick Rim, Younjoon Chung, Fengyu Yang, Byung-Woo Hong, Alex Wong

arXiv technical report, 2024 | NECV 2025 Oral Presentation (18.75%)

RSA
RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions

Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, Alex Wong

NeurIPS 2024 | code

NeuroBind
NeuroBind: Towards Unified Multimodal Representations for Neural Signals

Fengyu Yang*, Chao Feng*, Daniel Wang*, Tianye Wang, Ziyao Zeng, Zhiyang Xu, Hyoungseob Park, Pengliang Ji, Hanbin Zhao, Yuanning Li, Alex Wong

arXiv technical report, 2024

2023

WorDepth
WorDepth: Variational Language Prior for Monocular Depth Estimation

Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Yangchao Wu, Stefano Soatto, Byung-Woo Hong, Dong Lao, Alex Wong

CVPR 2024 | code

UniTouch
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang*, Chao Feng*, Ziyang Chen*, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong

CVPR 2024 | project page, code

2022

PointCLIPv2
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

Xiangyang Zhu*, Renrui Zhang*, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao

ICCV 2023 | code

iQuery
iQuery: Instruments as Queries for Audio-Visual Sound Separation

Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, Jianbo Shi

CVPR 2023 | code

DepthCLIP
Can Language Understand Depth?

Renrui Zhang*, Ziyao Zeng*, Ziyu Guo, Yafeng Li

ACM Multimedia 2022, accepted as Brave New Idea (Acceptance Rate <= 12.5%) | code

2021

DSPoint
DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion

Renrui Zhang*, Ziyao Zeng*, Ziyu Guo, Xinben Gao, Kexue Fu, Jianbo Shi

SMC 2023 | code

VT-CLIP
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Yafeng Li, Guangnan Zhang

arXiv technical report, 2021

Twitter Emotion
Twitter Emotion Classification

Yiteng Xu*, Ziyao Zeng*, Jirui Shi*, Shaoxun Wu*, Peiyan Gu*

Final Project of CS181 Artificial Intelligence, 2021 Fall, ShanghaiTech University | code

Generalized DUQ
Generalized DUQ: Generalized Deterministic Uncertainty Quantification

Zhitong Gao*, Ziyao Zeng*

Final Project of CS282 Machine Learning, 2021 Spring, ShanghaiTech University

2020

Noisy Labels
Seek Common while Shelving Differences: A New Way for dealing with Noisy Labels

Zhitong Gao*, Ziyao Zeng*

Final Project of CS280 Deep Learning, 2020 Fall, ShanghaiTech University

Others

Unity Game Development

I'm an amateur Unity game developer, previously supervised by Brain Cox. Screenshots of my previous work are shown below.

Snow Ranger

Music & Arts

I'm also an amateur pianist, trombone player, guitar player, and Chinese folk singer.

I have been playing Tarot since 2014 and am dedicated to combining it with modern psychology as a tool for consciousness.

Volunteering

Previously, I volunteered at WWF-China and Greenpeace.
