Wentao (Tony) Ma

@ BosonAI
@ University of Toronto

MLLM for Long Video / Audio Understanding
Flow Matching / Diffusion model for Generation
Efficient and Robust MLLM

University of Toronto
College St, Toronto, ON, CA, M7A 1A2
Email: tonyyyma [at] gmail [dot] com

Open to [PhD / Machine Learning Engineer / Research Assistant] positions

Introduction

My research focuses on MLLMs and GenAI. I enjoy exploring and improving the capabilities of MLLMs and diffusion models, and applying them to other fields such as robotics and bioscience.

Currently, I'm a Master's student at the University of Toronto, advised by Zhijing Jin. We focus on developing foundation models for long-audio understanding and generation. I'm also working closely with Wenhu Chen on long-video understanding.

Before that, I spent one fantastic year at Imperial College London, supervised by Edward Johns. We validated and improved the multi-modal pattern learning ability of VLMs and applied them to robotics. I received my bachelor's degree in Computer Science from ShenYuan Honors College, Beihang University.

I enjoy photography and am a member of Toronto Photo Walk (ToPW). I'm also interested in all kinds of sports, including snowboarding and tennis.

News

[06/2025] Happy to share that Vamba has been accepted by ICCV 2025.
[05/2025] Happy to share the structural generation benchmark 'StructEval' is published!
[05/2025] We publish VideoEval-Pro, a more realistic and robust long video understanding benchmark.
[05/2025] I join @BosonAI as a Machine Learning Engineer Intern!
[04/2025] We publish ProT-GFDM, a generative fractional diffusion model for protein generation.
[03/2025] We publish Vamba, an Efficient Long-Video understanding model.
[01/2025] I become a Machine Learning Associate @ Vector Institute!
[10/2024] I graduate from the MSc Computing program @ Imperial with Distinction!
[09/2024] We introduce Paint2Plan, a VLM method enabling imitation learning in Robotics.
[06/2024] I present the 'LLM Echo Chamber' project at MLCSS @Imperial.
[08/2023] I become an AWS Certified Solutions Architect!
[05/2023] I finish my graduation design at Beihang University with a paper published.
[01/2023] I finished interning with the Multi-Obj-Tracking project at Sony.
[06/2022] I join TikTok as an intern.

Publications

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Jialin Yang^, Dongfu Jiang^, Lipeng He, Sherman Siu, Wentao Ma, Zhiheng Lyu, and Others

Preprint

[paper] [website] [benchmark]

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Wentao Ma^, Weiming Ren^, Yiming Jia, Zhuofeng Li, Ping Nie, Ge Zhang, Wenhu Chen

Preprint

[paper] [website] [benchmark] [Leaderboard]

ProT-GFDM: A Generative Fractional Diffusion Model for Protein Generation

Xiao Liang^, Wentao Ma^, Eric Paquet, Herna Lydia Viktor, Wojtek Michalowski

Preprint

[paper]

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen

International Conference on Computer Vision (ICCV), 2025

[paper] [website]

Paint2Plan: Image Painting Enables Imitation Learning with VLMs

Tony Ma, Teyun Kwon, Edward Johns

Preprint, 2024

[paper] [website]

LLM Echo Chamber: personalized and automated disinformation

Tony Ma, Yves-Alexandre de Montjoye

Machine Learning and Cyber Security Symposium (MLCSS), Imperial, 2024

[paper] [code] [video]

Boosting Transferability of Adversarial Patches with Visual Relations

Tony Ma, Songze Li, Yisong Xiao, Shunchang Liu

Conference on Computer Vision and Pattern Recognition (CVPR), AdvVision Workshop, 2023

[paper]

Experience

Boson AI

Machine Learning Engineer Intern

Designing a high-performance State-Space-Model (SSM) based Text-To-Speech (TTS) foundation model

May.2025 - Present [website]

Vector Institute

Machine Learning Associate

Designed a Geo-filtering RAG system with Global Spatial Technology Solutions (GSTS)

Jan.2025 - Apr.2025 [website]

Smart Camera, Semiconductor Solutions Group, SONY

Edge AI Engineer Intern

Video Object Tracking / Model Quantization / Edge Computing

Sep.2022 - Feb.2023 [website] [Project]

TikTok Pay, ByteDance

Software Engineer Intern

Objective-C / REST API / CICD

May.2022 - Aug.2022 [website]

Projects

Multi-Obj-Tracking
A multi-object tracking model based on CenterNet

[link]

SysY-Compiler
A toy compiler for the sys_y grammar

[link]

Hotel Renting System
A full-stack hotel renting system using Vue and Django

[link]

Selected Certifications and Awards

AWS Certified Solutions Architect (Associate) --- 2026

Distinction @ Imperial College London --- 2024

Outstanding Graduates of Beihang University --- 2023

Honorable Mention of Mathematical Contest in Modeling --- 2022

Scholarship for Academic Excellence of Beihang University --- 2020/2021/2022

Scholarship for Discipline Competitions of Beihang University --- 2020/2021/2022

Third Prize of Beijing Municipal Physics Competition --- 2020

Excellent Student Leader of Beihang University --- 2020

Student Work

University of Toronto --- Teaching Assistant for CSC209 --- 2025

Hurricane Skateboarding Club of Beihang University --- Director --- 2021

Honors College of Beihang University --- Mentor --- 2021

Student Union of Beihang University --- Leading Member --- 2020

Collaborators (in no particular order)

@ University of Waterloo: Wenhu Chen, Weiming Ren

@ University of Toronto: Zhijing Jin, Yiming Jia

@ Imperial College: Dr. Edward Johns, Teyun Kwon, Sarthak Das

@ Beihang University: Xianglong Liu, Aishan Liu, Shunchang Liu

@ Sony: Bojie Zhang, Eric Gao
