WORK EXPERIENCE
ByteDance  Seed
2023.12 - Present
 Large Model Engineer
Shanghai
- Building the next generation of AGI systems for the Doubao team
- Super-large-scale distributed training for any-modality models on any hardware
ByteDance  AML
2023.6 - 2023.12
Large Language Model Algorithm Intern
Shanghai
- Conducted research on Large Language Model systems, task planning, and SFT data selection, focusing on reasoning and cognition; combined pretrained models with code generation/understanding to build a powerful general problem solver (AI agent)
HPC-AI Technology
2022.7 - 2023.5
Machine Learning System Engineer
Singapore
- Experienced the full 0-to-1 startup journey from seed round to Series A, including development of the core training framework, research on large-model algorithms, and commercialization of AI infrastructure for B-side (enterprise) customers
- Served as a core developer of ColossalChat
- Conducted research on a series of related papers, including InstructGPT, LaMDA, and CoT; starting from scratch, clarified the key technical points behind ChatGPT, including scaling, distributed training, ability eliciting, and alignment tuning
- Led the core development of the training code for the Coati (ColossalAI Talking Intelligence) large language model and designed the entire training pipeline, including instruction data collection, data preprocessing, distributed training and acceleration, and model alignment tuning (a sketch of the SFT stage follows this group); open-sourced the Coati-7B and Coati-13B large language models
- After Coati was open-sourced, ColossalAI ranked first on the GitHub trending list for three consecutive days (eventually overtaken by The Algorithm, Twitter's open-source project released by Elon Musk); this made a large impact on the community, earned ColossalAI more than 10k additional stars, and made it one of the fastest-growing AI open-source projects in Q1 2023
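For illustration, a minimal sketch of the supervised fine-tuning (SFT) stage of such an instruction-tuning pipeline. This is not Coati's actual code: the base model, dataset fields, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch of the SFT stage of an instruction-tuning pipeline.
# Illustrative only: the base model, dataset fields, and hyperparameters
# are assumptions, not Coati's actual implementation.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder base model
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def collate(batch):
    # Concatenate each instruction with its reference response.
    texts = [f"{ex['instruction']}\n{ex['response']}" for ex in batch]
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # standard causal-LM objective
    return enc

instruction_data = [{"instruction": "Say hi.", "response": "Hi!"}]  # toy data
loader = DataLoader(instruction_data, batch_size=1, collate_fn=collate)

model.train()
for batch in loader:
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```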
- Participated in the development of ColossalAI, a unified deep learning system for the big-model era
- Participated in the core API refactoring of ColossalAI, including heterogeneous memory management and distributed model saving, to improve the usability of the ColossalAI API and lower the barrier to entry for users
- Participated in developing and supporting ColossalAI as a distributed backend for PyTorch Lightning, enabling ColossalAI to integrate more easily with PyTorch Lightning (see the usage sketch below)
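A minimal usage sketch of this integration. The exact strategy name and optimizer pairing depend on the Lightning release (later versions moved the strategy into the separate lightning-colossalai extension), so treat this as the shape of the API rather than a pinned version.

```python
# Sketch: selecting ColossalAI as the distributed strategy in PyTorch Lightning.
# The strategy string/import varies by release (later versions ship it in the
# lightning-colossalai extension); this shows the intended usage pattern.
import pytorch_lightning as pl
import torch
from torch import nn

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # The ColossalAI strategy expects its own hybrid Adam in practice;
        # a plain torch optimizer is shown here for brevity.
        return torch.optim.Adam(self.parameters(), lr=1e-3)

trainer = pl.Trainer(strategy="colossalai", accelerator="gpu", devices=1)
# trainer.fit(TinyModel(), train_dataloaders=...)  # dataloader omitted
```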
- Led the development of the AIGC big-model training solution ColoDiffusion
- As the core developer, built a diffusion-model training framework based on PyTorch Lightning + ColossalAI that supports multiple training modes; the work was officially reposted by PyTorch
- Used the ZeRO optimizer, auto chunking, FlashAttention, CPU offload, and other techniques to break the memory wall and support large-batch accelerated training (a configuration sketch follows this list)
- As a Hugging Face external developer, supported DreamBooth fine-tuning in the Hugging Face Diffusers library on a consumer GPU with 4GB of memory, the fastest DreamBooth version yet (see the memory-knob sketch below)
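To make the memory-wall point concrete, here is a sketch of the ZeRO + CPU-offload idea expressed as a DeepSpeed-style config (DeepSpeed appears in the stack below; ColossalAI exposes analogous ZeRO/Gemini options, whose exact API has changed across releases).

```python
# Sketch of the ZeRO + offload recipe as a DeepSpeed-style config dict;
# ColossalAI's ZeRO/Gemini options are analogous but their API differs.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu"},  # optimizer states -> host RAM
        "offload_param": {"device": "cpu"},      # page idle parameters to CPU
    },
    "fp16": {"enabled": True},  # halve parameter/activation memory
}
```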
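And a sketch of the Diffusers-side memory knobs that such a low-memory DreamBooth recipe builds on; the actual 4GB version combined these with ColossalAI offloading, and the model ID here is just a common example, not necessarily the one used.

```python
# Sketch of Diffusers memory-reduction knobs behind low-memory DreamBooth
# fine-tuning; the real 4GB recipe adds ColossalAI CPU offloading on top.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model ID
    torch_dtype=torch.float16,
)
pipe.unet.enable_gradient_checkpointing()  # trade compute for activation memory
pipe.enable_attention_slicing()            # compute attention in smaller chunks
# Training then updates only the UNet (optionally the text encoder) on a
# handful of subject images, with optimizer states offloaded to CPU.
```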
- Participated in the development of FastFold (optimizing AlphaFold training and inference on GPU clusters)
- Supported parallel data preprocessing (tripling its speed) for FastFold with Ray, solving the core bottleneck of MSA feature search and long preprocessing time for training and inference (see the Ray sketch below)
- Supported multimer structure prediction in FastFold
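A minimal sketch of the Ray pattern behind the preprocessing speedup; the worker body is a stand-in for FastFold's actual MSA search and feature construction.

```python
# Sketch: fan out per-sequence preprocessing as Ray tasks and gather results.
# The worker body is a placeholder for MSA search + feature construction.
import ray

ray.init()

@ray.remote
def preprocess(sequence: str) -> dict:
    # Placeholder for the expensive MSA/feature step for one sequence.
    return {"seq": sequence, "n_res": len(sequence)}

sequences = ["MKTAYIAK", "GSHMLEDP", "LLFNQTAR"]  # toy inputs
features = ray.get([preprocess.remote(s) for s in sequences])
print(features)
```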
- Technology Stack: Python, C++, CUDA, PyTorch, Ray, ColossalAI, PyTorch Lightning, TensorRT, DeepSpeed, Hugging Face
SenseTime  Large Model Training
2021.12 - 2022.6
Algorithm Research Intern
Hangzhou
- Participated in the development of SenseTime's large-scale distributed machine learning training framework, Spring, and in research related to machine learning systems
- Advanced the production deployment of large object-detection models (Vision Transformer, Swin Transformer, etc.); supported SenseTime's general detection framework POD with PyTorch distributed data parallel and mixed-precision training (see the sketch after this entry)
- Worked on MLOps: developed a machine learning cloud platform supporting a model lifecycle management database
- Technology Stack: Python, C++, CUDA, PyTorch, Go, Nebula DB
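As referenced above, a minimal sketch of the DDP + mixed-precision pattern; the model and data are placeholders, not POD itself, and the script assumes a torchrun-style launch.

```python
# Sketch of DDP + automatic mixed precision; placeholders stand in for the
# actual detection model/data. Assumes launch via torchrun (one proc per GPU).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(128, 10).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps fp16 grads stable

x = torch.randn(32, 128, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")
with torch.cuda.amp.autocast():       # forward pass in mixed precision
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()         # scale loss to avoid fp16 underflow
scaler.step(opt)
scaler.update()
```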
Huawei 2012 Lab  Distributed Parallel Lab
2021.7 - 2021.12
Algorithm Engineering Intern
Hangzhou
- Contributed code to MindSpore, a full-scenario deep learning framework; developed three new features for MindSpore Lite
- Completed the core code for OpenGL texture conversion in MindSpore Lite, a key feature of MindSpore Lite 1.6
- Accomplished OpenCL backend support on the x86 platform for MindSpore Lite; developed GPU operators for the MindSpore core
- Implemented and iterated on MindSpore's logging system on the x86 platform based on Glog
- Technology Stack: C++, OpenCL, OpenGL, CMake, Python
RESEARCH EXPERIENCE
Maximizing Parallelism in Distributed Training for Huge Neural Networks
2020.1 - 2021.5
NUS High Performance Computing for Artificial Intelligence (HPC-AI) Lab
Prof. Yang You (Presidential Young Professor)
- Participated in the writing of “The Big Model in Action”
Data-Driven Beam Tracking based on Deep Learning
2020.1 - 2021.5
Institute of Intelligent Communication Network and Security
Prof. Min Li (ZJU 100 Young Professor)
- Proposed a novel deep learning algorithm to solve beam tracking in mmWave communication systems
- Proposed an efficient transformer-based beam tracking algorithm, achieving 91% prediction accuracy, 16% higher than existing algorithms (a model sketch follows)
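For illustration, a minimal sketch of framing beam tracking as sequence classification with a transformer encoder; the feature dimension and 64-beam codebook are assumptions, not the paper's actual configuration.

```python
# Sketch: beam tracking as next-beam classification with a transformer
# encoder. Feature/codebook sizes are illustrative assumptions.
import torch
from torch import nn

class BeamTracker(nn.Module):
    def __init__(self, feat_dim=16, n_beams=64, d_model=64):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_beams)  # logits over beam codebook

    def forward(self, x):              # x: (batch, time, feat_dim)
        h = self.encoder(self.embed(x))
        return self.head(h[:, -1])     # predict the beam for the next step

logits = BeamTracker()(torch.randn(8, 10, 16))  # 8 tracks, 10 past steps each
print(logits.shape)                              # torch.Size([8, 64])
```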