Tao Ma
I am currently a third-year Ph.D. candidate at the Multimedia Laboratory (MMLab) of The Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang. My research focuses on scene perception and understanding for autonomous driving, mainly onboard and offboard 3D object detection with point cloud and image data.
Better research, better life. I love cycling. Please feel free to contact me if you have any questions or share similar interests.
SHB 310, CUHK, Hong Kong SAR, China
Email / Google Scholar / GitHub / LinkedIn
News
[2024.09] 🚀 One paper is accepted by NeurIPS 2024.
[2024.09] One paper is submitted to T-PAMI (under review).
[2024.01] VeloVox is accepted by ICRA 2024.
[2024.01] DiLu is accepted by ICLR 2024.
[2023.07] DetZero is accepted by ICCV 2023.
[2023.03] 🏆 DetZero ranks 1st with 85.15 mAPH (L2) on the Waymo 3D detection leaderboard.
[2022.08] Returned to school from industry and started my Ph.D. at MMLab, CUHK.
Working Experience
[2021.05 - 2022.08] Researcher, Autonomous Driving Lab, Shanghai AI Laboratory
[2020.04 - 2021.05] Researcher, Autonomous Driving Group, SenseTime
[2019.04 - 2020.04] Intern Researcher, Autonomous Driving Group, SenseTime
[2018.10 - 2019.04] Research Intern, Media Computing Group, Microsoft Research Asia
Publications
* indicates equal contribution to the work.
-
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
T. Ma*, H. Zhou*, Q. Huang*, X. Yang, J. Guo, B. Zhang, M. Dou, Y. Qiao, B. Shi, H. Li
Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv / Code]
-
VeloVox: A Low-cost and Accurate 4D Object Detector with Single-frame Point Cloud of Livox LiDAR
T. Ma*, Z. Zheng*, H. Zhou, X. Cai, X. Yang, Y. Li, B. Shi, H. Li
IEEE International Conference on Robotics and Automation (ICRA), 2024
[arXiv]
-
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
L. Wen*, D. Fu*, X. Li*, X. Cai, T. Ma, P. Cai, M. Dou, B. Shi, L. He, Y. Qiao
International Conference on Learning Representations (ICLR), 2024
[arXiv / Code / Project Page]
-
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
T. Ma, X. Yang, H. Zhou, X. Li, B. Shi, J. Liu, Y. Yang, Z. Liu, L. He, Y. Qiao, Y. Li, H. Li
International Conference on Computer Vision (ICCV), 2023
[arXiv / Code / Project Page]
-
RangePerception: Taming LiDAR Range View for Efficient and Accurate 3D Object Detection
Y. Bai, B. Fei, Y. Liu, T. Ma, Y. Hou, B. Shi, Y. Li
Conference on Neural Information Processing Systems (NeurIPS), 2023
[Paper]
-
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
X. Li, T. Ma, Y. Hou, B. Shi, Y. Yang, Y. Liu, X. Wu, Q. Chen, Y. Li, Y. Qiao, L. He
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv / Code]
-
Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging
Y. Bai, T. Ma, L. Wang, Z. Zhang
ACM International Conference on Multimedia (ACM MM), 2022
[arXiv]
-
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Y. Li*, T. Ma*, Y. Bai, N. Duan, S. Wei, X. Wang
Conference on Neural Information Processing Systems (NeurIPS), 2019
[arXiv / Code]
Preprints
* indicates equal contribution to the work.
-
Towards Knowledge-driven Autonomous Driving
X. Li, Y. Bai, P. Cai, L. Wen, D. Fu, B. Zhang, X. Yang, X. Cai, T. Ma, J. Guo, X. Gao, M. Dou, Y. Li, B. Shi, Y. Liu, L. He, Y. Qiao
arXiv preprint, 2023
[arXiv]
-
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
L. Wen, X. Yang, D. Fu, X. Wang, P. Cai, X. Li, T. Ma, Y. Li, L. Xu, D. Shang, Z. Zhu, S. Sun, Y. Bai, X. Cai, M. Dou, S. Hu, B. Shi, Y. Qiao
arXiv preprint, 2023
[arXiv]
-
OpenCalib: A Multi-sensor Calibration Toolbox for Autonomous Driving
G. Yan, Z. Liu, C. Wang, C. Shi, P. Wei, X. Cai, T. Ma, Z. Liu, Z. Zhong, Y. Liu, M. Zhao, Z. Ma, Y. Li
arXiv preprint, 2022
[arXiv]
-
CRLF: Automatic Calibration and Refinement based on Line Feature for LiDAR and Camera in Road Scenes
T. Ma*, Z. Liu*, G. Yan, Y. Li
arXiv preprint, 2020
[arXiv]
-
Perception Entropy: A Metric for Multiple Sensors Configuration Evaluation and Design
T. Ma*, Z. Liu*, Y. Li
arXiv preprint, 2020
[arXiv]
-
MOC-GAN: Mixing Objects and Captions to Generate Realistic Images
T. Ma, Y. Li
arXiv preprint, 2020
[arXiv]
Academic Activities
Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ACM MM, AAAI.
Teaching
[2023.02 - 2023.05] TA of ENGG4512 Digital Image Processing.
[2022.10 - 2022.12] TA of ENGG2310B Communication Systems.