Qianli Ma

(+86)17855801919 Fazzie17855801919 fazzie@qq.com
Fazziekey fazzie-key.cool QianliMa fazzie

EDUCATION BACKGROUND


National University of Singapore   Master's degree
2022.8 - Present
Zhejiang University   Bachelor's degree
2018.9 - 2022.7

WORK EXPERIENCE


ByteDance  Seed
2023.12 - Present
 Large Model Engineer
Shanghai
  • Building the next generation of AGI systems for the Doubao team: scaling any-modality models across thousands of accelerators, with flexibility and ease of use as first-class goals.
  • In charge of open-source model engineering and internal ecosystem building.
  • Selected open source projects and publications
    • veScale: A PyTorch Native LLM Training Framework
    • verl: Volcano Engine Reinforcement Learning for LLMs
    • UI-TARS: Pioneering Automated GUI Interaction with Native Agents 📑 Paper
ByteDance  AML
2023.6 - 2023.12
 Large Language Model Algorithm Intern
Shanghai
  • Conducted research on large language model systems, task planning, and SFT data selection, focusing on reasoning and cognition; built a powerful general problem solver (AI agent) by combining pretrained models with code generation and understanding.
HPC-AI Technology
2022.7 - 2023.5
  Machine Learning System Engineer
Singapore
  • Experienced the full 0-to-1 startup journey from seed round to Series A, including development of the core training framework, research on large-model algorithms, and commercialization of AI infrastructure for B2B customers.
  • Core developer of ColossalChat
    • Studied a series of related papers (InstructGPT, LaMDA, Chain-of-Thought) from scratch and identified the key techniques behind ChatGPT, including scaling, distributed training, ability eliciting, and alignment tuning.
    • Led core development of the training code for the Coati (ColossalAI Talking Intelligence) large language model, and designed the entire training pipeline: instruction data collection, data preprocessing, distributed training and acceleration, and model alignment tuning. Open-sourced the Coati-7B and Coati-13B large language models.
    • After Coati was open-sourced, ColossalAI ranked first on the GitHub trending list for three consecutive days (eventually overtaken by The Algorithm, Twitter's open-source project released by Elon Musk). The release had a large community impact, gained ColossalAI more than 10k stars, and made it one of the fastest-growing AI open-source projects in Q1 2023.
  • Participated in the development of ColossalAI, a unified deep learning system for the big-model era
    • Participated in the core API refactoring of ColossalAI, including heterogeneous memory management and distributed model saving, improving API usability and lowering the barrier to entry for users
    • Helped develop and support ColossalAI as a distributed backend for PyTorch Lightning, making the two easier to integrate
  • Led the development of the AIGC big-model training solution ColoDiffusion
    • As the core developer, built a diffusion-model training framework based on PyTorch Lightning + ColossalAI that supports multiple training modes; officially reposted by PyTorch
    • Applied the ZeRO optimizer, auto-chunking, FlashAttention, CPU offload, and other techniques to break the memory wall and support accelerated large-batch training
    • As an external Hugging Face developer, enabled DreamBooth fine-tuning on a consumer GPU with 4 GB of memory, the fastest DreamBooth version in the Diffusers library to date
  • Participated in the development of FastFold (Optimizing AlphaFold Training and Inference on GPU Clusters)
    • Parallelized FastFold's data preprocessing with Ray (tripling its speed), resolving the core bottleneck of MSA feature search and long preprocessing times for training and inference
    • Added multimer folding prediction support to FastFold
  • Technology Stack: Python, C++, CUDA, PyTorch, Ray, ColossalAI, PyTorch Lightning, TensorRT, DeepSpeed, Hugging Face
SenseTime   Large model training
2021.12 - 2022.6
  Algorithm Researcher Internship
Hangzhou
  • Participated in the development of SenseTime's large-scale distributed machine learning training framework, Spring, and in research related to machine learning systems
  • Productionized large object-detection models (Vision Transformer, Swin Transformer, etc.); supported SenseTime's general detection framework POD with PyTorch distributed data-parallel training and mixed-precision training
  • Contributed to MLOps work and machine learning cloud platform development, supporting a model lifecycle management database
  • Technology Stack: Python, C++, CUDA, PyTorch, Go, Nebula DB
Huawei 2012 Lab   Distributed Parallel Lab
2021.7 - 2021.12
  Algorithm Engineering Internship
Hangzhou
  • Contributed code to MindSpore, a full-scenario deep learning framework; developed three new features for MindSpore Lite
  • Completed the core code for MindSpore Lite's OpenGL texture conversion, a key feature of MindSpore Lite 1.6
  • Delivered OpenCL backend support for MindSpore Lite on the x86 platform; developed GPU operators for MindSpore core
  • Implemented and iterated MindSpore's logging system on the x86 platform based on glog
  • Technology Stack: C++, OpenCL, OpenGL, CMake, Python

RESEARCH EXPERIENCE


Data-Driven Beam Tracking based on Deep Learning
2020.1 - 2021.5
Institute of Intelligent Communication Network and Security
Prof. Min Li(ZJU 100 Young Professor)
  • Proposed a novel deep learning algorithm for beam tracking in mmWave communication systems.
  • Proposed an efficient transformer-based beam tracking algorithm, achieving 91% prediction accuracy, 16 percentage points higher than existing algorithms.

PUBLICATION


  • Hu, Xueyu, Qianli Ma, et al. (2024) "InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks." ICML, 2024 [Paper] [Code]
  • Ma, Q., Zhou, H., Liu, T., Yuan, J., Liu, P., & Yang, H. (2023). Let's reward step by step: Step-Level reward model as the Navigators for Reasoning. ArXiv. [Paper]
  • Zhou, H., Liu, T., Ma, Q., et al. (2023). DavIR: Data Selection via Implicit Reward for Large Language Models. ACL 2025. [Paper]
  • Qin, Y., ..., Ma, Q., Li, J., ..., Shi, G. (2025). UI-TARS: Pioneering Automated GUI Interaction with Native Agents. ArXiv. [Paper] [Code]

COMPETITION & PROJECTS


PokemonGAI
  • Pokemon GAI is an AI-native application built on generative AI, designed to generate Pokemon for players, using the latest AI-native frameworks such as LangChain, Hugging Face Spaces, and FastAPI
Intel Embedded System Competition   Magic mirror based on OpenPose
2020.7 - 2020.11
  • Developed an AI-powered smart mirror that assists users with fitness exercises by detecting their posture. Models were deployed on the AI-Box hardware platform using the Intel OpenVINO toolkit. Ranked 9th among 100 competing teams.
Kaggle: Ubiquant Market Prediction
2022.1 - 2022.5
  • Optimized investment strategies to predict stock market returns, ultimately achieving top 20% with an AutoEncoder + MLP model.

KNOWLEDGE & SKILLS


  • Programming Languages: C++, Python, C, Golang, MATLAB, Verilog, Dart, HTML/CSS/JavaScript
  • AI Full Stack
    • Familiar with common deep neural network techniques and algorithms, and with the latest large models such as Stable Diffusion, InstructGPT, AlphaFold, and LLaMA
    • Familiar with instruction tuning, RLHF, prompt learning, task planning, and other state-of-the-art techniques for large language models
    • Proficient in PyTorch, MindSpore, PyTorch Lightning, and other deep learning frameworks
    • Proficient in ColossalAI, DeepSpeed, Ray, Megatron-LM, and other large-model distributed training frameworks, covering memory optimization, tensor parallelism, distributed training, and heterogeneous computing
    • Proficient in the edge-side AI inference framework MindSpore Lite and its source code; familiar with MNN, TensorRT, OpenVINO, and other inference frameworks
    • Familiar with GPU programming and operator fusion using OpenCL and CUDA; familiar with AI compilation and quantization
  • Tools: Linux, Vim, Shell, Git, Docker, CMake

CLUBS & ORGANISATIONAL EXPERIENCE


Zhejiang University Internet Society   Technology department  AI lab
2021.10 - Present
String Program   Technology department   Member of the machine learning subdepartment
2020.7 - Present
Zhejiang University Innovation and Entrepreneurship Guidance Center   Deputy Head
2018.9 - 2020.7
Zhejiang University Electroacoustic Orchestra   Drummer of Six o'clock studio band
2018.11 - 2021.2