I'm a second-year Ph.D. student in Computer Science (2023 - [Expected] 2028) at Yale University, supervised by Prof. Alex Wong. Before that, I obtained my B.Eng. in Computer Science (2019 - 2023) at ShanghaiTech University, with a minor in Innovation and Entrepreneurship.
Previously, I interned with Prof. Jianbo Shi at UPenn GRASP Lab and with Prof. Xuming He at ShanghaiTech PLUS Group.
I conduct research on Computer Vision, Machine Learning, and Robotics. I mainly focus on Multimodal Learning inspired by human learning. Currently, my research centers on Vision-Language Models for 3D Vision (Perception, Reconstruction, and Generation).
Google Scholar /
GitHub /
Yale Vision Lab
Reviewer: CVPR 2022, ICCV 2023, ACM MM 2023, ICASSP 2024, ECCV 2024, NeurIPS 2024, ICPR 2024, ACCV 2024, TCSVT (2024), ICLR 2025, ICASSP 2025
Email: ziyao.zeng@yale.edu
I am actively looking for a research internship for next summer (2025) and for reviewing opportunities of all kinds. Feel free to drop me an email if you are interested!
Feel free to reach out for collaborations, entrepreneurship, questions, or to connect on WeChat~
Website format from Xingyi Zhou.
Last updated Sept. 2024
Research Overview
Since the age of 13, when I was deeply touched by Isaac Asimov's Foundation, my dream has been to create an AI that can think like humans (just like the dream of Prof. Jürgen Schmidhuber). When humans perceive the surrounding environment, we see (2D vision), touch (tactile), wander (3D vision), and hear (audio) simultaneously to understand (neural signal) and interpret (language). Therefore, I conduct research on Multimodal Embodied AI. My research vision is to empower embodied AI with multimodal sensing and the ability to leverage pre-trained multimodal representations, so that it can interact with the physical world as humans do.
Specifically, I conduct research on Language for 3D Vision (Perception, Reconstruction, and Generation). Given a language description, one can easily imagine what the scene could look like, so language can naturally serve as a condition for generating and manipulating 3D scenes in a controllable manner. On the other hand, a language description can serve as a scene-specific prior that enhances 3D reconstruction by resolving scale ambiguity. A language description can also convey ordinal relationships between objects, from which their depth can be inferred. Language descriptions, which are invariant to nuisance variability (e.g., illumination, occlusion, viewpoints in images), provide extra robust features that help models generalize. Practically, language is arguably cheaper to obtain than range measurements (e.g., from lidar, radar).
Publications
(* indicates equal contributions)
2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang*, Chao Feng*, Daniel Wang*, Tianye Wang,
Ziyao Zeng, Zhiyang Xu, Hyoungseob Park, Pengliang Ji, Hanbin Zhao, Yuanning Li, Alex Wong
arXiv technical report, 2024
2023
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang*, Chao Feng*, Ziyang Chen*, Hyoungseob Park, Daniel Wang, Yiming Dou,
Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong
CVPR 2024
project page,
code
2022
2021
Twitter Emotion Classification
Yiteng Xu*,
Ziyao Zeng*, Jirui Shi*, Shaoxun Wu*, Peiyan Gu*
Final Project of CS181 Artificial Intelligence, 2021 Fall, ShanghaiTech University
code
2020
My Adventure
I am a big fan of adventure and am enthusiastic about cycling, hiking, and mountain climbing.
"Being a scientist and an adventurer has a lot of similarities, they both want to achieve something that hasn't been achieved before."
In 2015, I hiked across the Lake District in England in 1 week.
In 2019, I cycled across Tibet for 28 days, from Chengdu to Lhasa, covering 2135 km.
In 2022, I cycled across Tibet and Xinjiang for 1 month, from Ürümqi to Lhasa, covering 5000 km, with about 2000 km of cycling at an average altitude of 4500 m.
In 2023, I hiked in Yubeng Village for 5 days, at altitudes between 3000 m and 4300 m.
My hiking video at Ice Lake, 3700 m altitude:
Link
My hiking video at God Lake, 4300 m altitude:
Link
In 2023, I hiked the Tiger Leaping Gorge High Road for 2 days.
In 2023, I cycled more than 350 km around Qinghai Lake in 4 days.
In 2024, I got my diving certificate in the Red Sea.
In 2024, I successfully climbed to the top of Mount Yuzhu, 6178 m in altitude.
My other photos regarding adventures.
Other things about myself
I'm an amateur Unity game developer, previously supervised by Brain Cox. Screenshots of my previous works are shown below.
Snow Ranger
Darkside
I'm also an amateur composer, conductor, pianist, trombone player, guitar player, and Chinese folk singer.
I have been playing Tarot since 2014. I am familiar with Thoth and Flower Shadow, and I am dedicated to combining Tarot with modern psychology to serve as a tool for consciousness.
I'm excited about all kinds of volunteer work, especially work related to environmental protection.
I believe it's our instinctive duty to preserve the integrity of the earth (at least until we can migrate to other planets).
Currently, I'm volunteering at WWF-China and Greenpeace.