Tao Ma

I am currently a Research Scientist (TopMinds "Genius Youth Program") at Yinwang Ltd (IAS BU), where I lead a group of 10+ researchers and engineers working on spatial understanding and intelligence.

Before that, I obtained my Ph.D. degree from the Multimedia Laboratory (MMLab) of The Chinese University of Hong Kong, where I was fortunate to be supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang.

My research passion lies in Spatial Understanding and Intelligence, with a recent focus on 3D/4D reconstruction and generative world models towards Physical AI.

I am actively looking for self-motivated research interns who are passionate about this direction for long-term collaboration. Please contact me directly if you are interested.

Shenzhen, China

Email / Google Scholar / GitHub / LinkedIn

News

[2025.12] 🎉 Joined Yinwang (IAS BU) through the highest-tier TopMinds program.

[2025.03] One paper is accepted by CVPR 2025.

[2024.12] One paper is accepted by AAAI 2025.

[2024.09] One paper is accepted by NeurIPS 2024.

[2024.09] One paper is submitted to T-PAMI (under review).

[2024.01] VeloVox is accepted by ICRA 2024.

[2024.01] DiLu is accepted by ICLR 2024.

[2023.07] DetZero is accepted by ICCV 2023.

[2023.03] 🏆 DetZero ranks 1st with 85.15 mAPH (L2) on the Waymo 3D detection leaderboard.

[2022.08] Returned to school from industry and started my Ph.D. at MMLab of CUHK.

Education

[2022 - Now] Ph.D., Electronic Engineering, The Chinese University of Hong Kong

Working Experience

[2025.12 - Now] Research Scientist, Yinwang Ltd

[2021.05 - 2022.08] Researcher, Autonomous Driving Lab, Shanghai AI Laboratory

[2020.04 - 2021.05] Researcher, Autonomous Driving Group, SenseTime

[2019.04 - 2020.04] Intern Researcher, Autonomous Driving Group, SenseTime

[2018.10 - 2019.04] Research Intern, Media Computing Group, Microsoft Research Asia

Publications & Preprints

* indicates equal contribution to the work.

Multi-modal Multi-task Perception

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
T. Ma*, H. Zhou*, Q. Huang*, X. Yang, J. Guo, B. Zhang, M. Dou, Y. Qiao, B. Shi, H. Li
Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv / Code]
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
T. Ma, X. Yang, H. Zhou, X. Li, B. Shi, J. Liu, Y. Yang, Z. Liu, L. He, Y. Qiao, Y. Li, H. Li
International Conference on Computer Vision (ICCV), 2023
[arXiv / Code / Project Page]
DetZero++: Offboard 3D Object Detection with Multi-modal Sequential Data
T. Ma, H. Shang, H. Zhou, X. Yang, X. Li, Y. Li, B. Shi, Y. Qiao, H. Li
VeloVox: A Low-cost and Accurate 4D Object Detector with Single-frame Point Cloud of Livox LiDAR
T. Ma*, Z. Zheng*, H. Zhou, X. Cai, X. Yang, Y. Li, B. Shi, H. Li
IEEE International Conference on Robotics and Automation (ICRA), 2024
[Paper]
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
X. Chen, S. Shi, T. Ma, H. Zhou, J. Zhou, S. See, K. Cheung, H. Li
AAAI Conference on Artificial Intelligence (AAAI), 2025
RangePerception: Taming LiDAR Range View for Efficient and Accurate 3D Object Detection
Y. Bai, B. Fei, Y. Liu, T. Ma, Y. Hou, B. Shi, Y. Li
Conference on Neural Information Processing Systems (NeurIPS), 2023
[Paper]
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
X. Li, T. Ma, Y. Hou, B. Shi, Y. Yang, Y. Liu, X. Wu, Q. Chen, Y. Li, Y. Qiao, L. He
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv / Code]

Knowledge-driven Autonomous Driving System

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
X. Chen, L. Huang, T. Ma, R. Fang, S. Shi, H. Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
L. Wen*, D. Fu*, X. Li*, X. Cai, T. Ma, P. Cai, M. Dou, B. Shi, L. He, Y. Qiao
International Conference on Learning Representations (ICLR), 2024
[arXiv / Code / Project Page]
Towards Knowledge-driven Autonomous Driving
X. Li, Y. Bai, P. Cai, L. Wen, D. Fu, B. Zhang, X. Yang, X. Cai, T. Ma, J. Guo, X. Gao, M. Dou, Y. Li, B. Shi, Y. Liu, L. He, Y. Qiao
arXiv preprint, 2023
[arXiv]
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
L. Wen, X. Yang, D. Fu, X. Wang, P. Cai, X. Li, T. Ma, Y. Li, L. Xu, D. Shang, Z. Zhu, S. Sun, Y. Bai, X. Cai, M. Dou, S. Hu, B. Shi, Y. Qiao
arXiv preprint, 2023
[arXiv]

Multi-modal Learning and Generation

Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging
Y. Bai, T. Ma, L. Wang, Z. Zhang
ACM International Conference on Multimedia (ACM MM), 2022
[arXiv]
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Y. Li*, T. Ma*, Y. Bai, N. Duan, S. Wei, X. Wang
Conference on Neural Information Processing Systems (NeurIPS), 2019
[arXiv / Code]
MOC-GAN: Mixing Objects and Captions to Generate Realistic Images
T. Ma, Y. Li
arXiv preprint, 2020
[arXiv]

Smart Sensor Sets for Autonomous Driving

CRLF: Automatic Calibration and Refinement based on Line Feature for LiDAR and Camera in Road Scenes
T. Ma*, Z. Liu*, G. Yan, Y. Li
arXiv preprint, 2020
[arXiv]
Perception Entropy: A Metric for Multiple Sensors Configuration Evaluation and Design
T. Ma*, Z. Liu*, Y. Li
arXiv preprint, 2020
[arXiv]
OpenCalib: A Multi-sensor Calibration Toolbox for Autonomous Driving
G. Yan, Z. Liu, C. Wang, C. Shi, P. Wei, X. Cai, T. Ma, Z. Liu, Z. Zhong, Y. Liu, M. Zhao, Z. Ma, Y. Li
arXiv preprint, 2022
[arXiv]

Academic Activities

Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ACM MM, AAAI.

Teaching

[2023.02 - 2023.05] TA of ENGG4512 Digital Image Processing.

[2022.10 - 2022.12] TA of ENGG2310B Communication Systems.