Wentao (Tony) Ma
@ BosonAI
My research focuses on Multi-Modal LLMs (MLLMs). I enjoy exploring and improving the abilities of MLLMs on video and audio, and applying them to other fields such as robotics.
Currently, I'm a Master's student at the University of Toronto and an MLE at @BosonAI, advised by Alex Smola and Mu Li. We are developing efficient and expressive foundation models for audio understanding and generation. I also work closely with Wenhu Chen on video understanding.
Before that, I spent one fantastic year at Imperial College London, supervised by Edward Johns. We validated and improved the multi-modal pattern learning ability of VLMs and applied them to robotics. I received my bachelor's degree from Beihang University (ShenYuan Honors College), majoring in Computer Science.
I enjoy photography and am a member of Toronto Photo Walk (ToPW). I'm also interested in all kinds of sports, including snowboarding and tennis.
VideoScore2: Think before You Score in Generative Video Evaluation Xuan He*, Dongfu Jiang*, Ping Nie, Minghao Liu, Wentao Ma, Junru Lin, and Others Preprint |
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Jialin Yang*, Dongfu Jiang*, Lipeng He, Sherman Siu, Wentao Ma, Zhiheng Lyu, and Others Preprint |
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Wentao Ma*, Weiming Ren*, Yiming Jia, Zhuofeng Li, Ping Nie, Ge Zhang, Wenhu Chen Preprint |
ProT-GFDM: A Generative Fractional Diffusion Model for Protein Generation Xiao Liang*, Wentao Ma*, Eric Paquet, Herna Lydia Viktor, Wojtek Michalowski Computational and Structural Biotechnology Journal (CSBJ), 2025 |
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen International Conference on Computer Vision (ICCV), 2025 |
Paint2Plan: Image Painting Enables Imitation Learning with VLMs Tony Ma, Teyun Kwon, Edward Johns Preprint, 2024 |
LLM Echo Chamber: personalized and automated disinformation Tony Ma, Yves-Alexandre de Montjoye Machine Learning and Cyber Security Symposium (MLCSS), Imperial, 2024 |
Boosting Transferability of Adversarial Patches with Visual Relations Tony Ma, Songze Li, Yisong Xiao, Shunchang Liu Conference on Computer Vision and Pattern Recognition (CVPR), AdvVision Workshop, 2023 |
Boson AI Machine Learning Engineer Intern Alignment for Audio Understanding and Generation models May.2025 - Present [website] |
Vector Institute Machine Learning Associate Designed a Geo-filtering RAG system with Global Spatial Technology Solutions (GSTS) Jan.2025 - Apr.2025 [website] |
SONY Edge AI Engineer Intern Video Object Tracking / Model Quantization / Edge Computing |
TikTok Software Engineer Intern iOS development for TikTok Pay May.2022 - Aug.2022 [website] |
AWS Certified Solutions Architect (Associate) --- 2026 |
Mitacs Research Funding --- 2025-2026 |
Distinction @ Imperial College London --- 2024 |
Outstanding Graduates --- 2023 |
Scholarship for Academic Excellence --- 2020/2021/2022 |
Scholarship for Discipline Competitions --- 2020/2021/2022 |
Excellent Student Leader --- 2020 |
@ Canada: Wenhu Chen, Weiming Ren, Yiming Jia, Xiao Liang, Yuzhi Tang
@ UK: Edward Johns, Teyun Kwon, Sarthak Das, Wanru Zhao
@ China: Xianglong Liu, Aishan Liu, Shunchang Liu, Bojie Zhang, Eric Gao |