Tao Ma

I am currently a 3rd-year Ph.D. candidate at the Multimedia Laboratory (MMLab) of The Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang.

My research focuses on scene perception and understanding, and also encompasses end-to-end and knowledge-driven intelligent driving and embodied-AI systems.

Better research, better life. I love cycling. Please feel free to contact me if you have any questions or share similar interests.

I am looking for full-time opportunities in industry. Please contact me if you are interested in collaboration or recruitment.

SHB 310, CUHK, Hong Kong SAR, China

Email / Google Scholar / GitHub / LinkedIn

News

[2025.03] One paper is accepted by CVPR 2025.

[2024.12] One paper is accepted by AAAI 2025.

[2024.09] One paper is accepted by NeurIPS 2024.

[2024.09] One paper is submitted to T-PAMI (under review).

[2024.01] VeloVox is accepted by ICRA 2024.

[2024.01] DiLu is accepted by ICLR 2024.

[2023.07] DetZero is accepted by ICCV 2023.

[2023.03] 🏆 DetZero ranks 1st with 85.15 mAPH (L2) on the Waymo 3D detection leaderboard.

[2022.08] Returned to school from industry and started my Ph.D. at MMLab, CUHK.

Education

[2022 - Now] Ph.D., Electronic Engineering, The Chinese University of Hong Kong

Working Experience

[2021.05 - 2022.08] Researcher, Autonomous Driving Lab, Shanghai AI Laboratory

[2020.04 - 2021.05] Researcher, Autonomous Driving Group, SenseTime

[2019.04 - 2020.04] Intern Researcher, Autonomous Driving Group, SenseTime

[2018.10 - 2019.04] Research Intern, Media Computing Group, Microsoft Research Asia

Publications & Preprints

* indicates equal contribution to the work.

Multi-modal Multi-task Perception

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
T. Ma*, H. Zhou*, Q. Huang*, X. Yang, J. Guo, B. Zhang, M. Dou, Y. Qiao, B. Shi, H. Li
Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv / Code]
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
T. Ma, X. Yang, H. Zhou, X. Li, B. Shi, J. Liu, Y. Yang, Z. Liu, L. He, Y. Qiao, Y. Li, H. Li
International Conference on Computer Vision (ICCV), 2023
[arXiv / Code / Project Page]
DetZero++: Offboard 3D Object Detection with Multi-modal Sequential Data
T. Ma, H. Shang, H. Zhou, X. Yang, X. Li, Y. Li, B. Shi, Y. Qiao, H. Li
VeloVox: A Low-cost and Accurate 4D Object Detector with Single-frame Point Cloud of Livox LiDAR
T. Ma*, Z. Zheng*, H. Zhou, X. Cai, X. Yang, Y. Li, B. Shi, H. Li
IEEE International Conference on Robotics and Automation (ICRA), 2024
[Paper]
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
X. Chen, S. Shi, T. Ma, H. Zhou, J. Zhou, S. See, K. Cheung, H. Li
AAAI Conference on Artificial Intelligence (AAAI), 2025
RangePerception: Taming LiDAR Range View for Efficient and Accurate 3D Object Detection
Y. Bai, B. Fei, Y. Liu, T. Ma, Y. Hou, B. Shi, Y. Li
Conference on Neural Information Processing Systems (NeurIPS), 2023
[Paper]
LogoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
X. Li, T. Ma, Y. Hou, B. Shi, Y. Yang, Y. Liu, X. Wu, Q. Chen, Y. Li, Y. Qiao, L. He
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv / Code]

Knowledge-driven Autonomous Driving System

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
X. Chen, L. Huang, T. Ma, R. Fang, S. Shi, H. Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
L. Wen*, D. Fu*, X. Li*, X. Cai, T. Ma, P. Cai, M. Dou, B. Shi, L. He, Y. Qiao
International Conference on Learning Representations (ICLR), 2024
[arXiv / Code / Project Page]
Towards Knowledge-driven Autonomous Driving
X. Li, Y. Bai, P. Cai, L. Wen, D. Fu, B. Zhang, X. Yang, X. Cai, T. Ma, J. Guo, X. Gao, M. Dou, Y. Li, B. Shi, Y. Liu, L. He, Y. Qiao
arXiv preprint, 2023
[arXiv]
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
L. Wen, X. Yang, D. Fu, X. Wang, P. Cai, X. Li, T. Ma, Y. Li, L. Xu, D. Shang, Z. Zhu, S. Sun, Y. Bai, X. Cai, M. Dou, S. Hu, B. Shi, Y. Qiao
arXiv preprint, 2023
[arXiv]

Multi-modal Learning and Generation

Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging
Y. Bai, T. Ma, L. Wang, Z. Zhang
ACM International Conference on Multimedia (ACM MM), 2022
[arXiv]
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Y. Li*, T. Ma*, Y. Bai, N. Duan, S. Wei, X. Wang
Conference on Neural Information Processing Systems (NeurIPS), 2019
[arXiv / Code]
MOC-GAN: Mixing Objects and Captions to Generate Realistic Images
T. Ma, Y. Li
arXiv preprint, 2020
[arXiv]

Smart Sensor Sets for Autonomous Driving

CRLF: Automatic Calibration and Refinement based on Line Feature for LiDAR and Camera in Road Scenes
T. Ma*, Z. Liu*, G. Yan, Y. Li
arXiv preprint, 2020
[arXiv]
Perception Entropy: A Metric for Multiple Sensors Configuration Evaluation and Design
T. Ma*, Z. Liu*, Y. Li
arXiv preprint, 2020
[arXiv]
OpenCalib: A Multi-sensor Calibration Toolbox for Autonomous Driving
G. Yan, Z. Liu, C. Wang, C. Shi, P. Wei, X. Cai, T. Ma, Z. Liu, Z. Zhong, Y. Liu, M. Zhao, Z. Ma, Y. Li
arXiv preprint, 2022
[arXiv]

Academic Activities

Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ACM MM, AAAI.

Teaching

[2023.02 - 2023.05] TA of ENGG4512 Digital Image Processing.

[2022.10 - 2022.12] TA of ENGG2310B Communication Systems.