WORK EXPERIENCE
ByteDance  Seed
2023.12 - Present
 Large Model Engineer
Shanghai
- Building the next generation of AGI systems for the Doubao team
- Super-large-scale distributed training for any-modality models on any hardware
ByteDance  AML
2023.6 - 2023.12
Large Language Model Algorithm Intern
Shanghai
- Conducted research on Large Language Model systems, task planning, and SFT data selection, focusing on reasoning and cognition; combined pretrained models with code generation/understanding to build a powerful general problem solver (AI agent)
HPC-AI Technology
2022.7 - 2023.5
Machine Learning System Engineer
Singapore
- Experienced the full 0-to-1 startup journey from seed round to Series A, including development of the core training framework, research on large-model algorithms, and commercialization of AI infrastructure for B-side (enterprise) customers
- Served as a core developer of ColossalChat
- Conducted research on a series of related papers, including InstructGPT, LaMDA, and CoT; starting from scratch, clarified the key technical points behind ChatGPT, including scaling, distributed training, ability eliciting, and alignment tuning
- Led the core development of the training code for the Coati (ColossalAI Talking Intelligence) large language model and designed the entire training pipeline, including instruction data collection, data preprocessing, distributed training and acceleration, and model alignment tuning (a sketch of the SFT stage follows this group); open-sourced the Coati-7B and Coati-13B large language models
- After Coati was open-sourced, ColossalAI ranked first on the GitHub trending list for three consecutive days (eventually overtaken by The Algorithm, Twitter's open-source project released by Elon Musk); this made a large impact on the community, earned ColossalAI more than 10k additional stars, and made it one of the fastest-growing AI open-source projects in Q1 2023
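For illustration, a minimal sketch of the supervised fine-tuning (SFT) stage of such an instruction-tuning pipeline. This is not Coati's actual code: the base model, dataset fields, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch of the SFT stage of an instruction-tuning pipeline.
# Illustrative only: the base model, dataset fields, and hyperparameters
# are assumptions, not Coati's actual implementation.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder base model
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def collate(batch):
    # Concatenate each instruction with its reference response.
    texts = [f"{ex['instruction']}\n{ex['response']}" for ex in batch]
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # standard causal-LM objective
    return enc

instruction_data = [{"instruction": "Say hi.", "response": "Hi!"}]  # toy data
loader = DataLoader(instruction_data, batch_size=1, collate_fn=collate)

model.train()
for batch in loader:
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```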
- Participated in the development of ColossalAI, a unified deep learning system for the big-model era
- Participated in the core API refactoring of ColossalAI, including heterogeneous memory management and distributed model saving, to improve the usability of the ColossalAI API and lower the barrier to entry for users
- Participated in developing and supporting ColossalAI as a distributed backend for PyTorch Lightning, enabling ColossalAI to integrate more easily with PyTorch Lightning (see the usage sketch below)
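A minimal usage sketch of this integration. The exact strategy name and optimizer pairing depend on the Lightning release (later versions moved the strategy into the separate lightning-colossalai extension), so treat this as the shape of the API rather than a pinned version.

```python
# Sketch: selecting ColossalAI as the distributed strategy in PyTorch Lightning.
# The strategy string/import varies by release (later versions ship it in the
# lightning-colossalai extension); this shows the intended usage pattern.
import pytorch_lightning as pl
import torch
from torch import nn

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # The ColossalAI strategy expects its own hybrid Adam in practice;
        # a plain torch optimizer is shown here for brevity.
        return torch.optim.Adam(self.parameters(), lr=1e-3)

trainer = pl.Trainer(strategy="colossalai", accelerator="gpu", devices=1)
# trainer.fit(TinyModel(), train_dataloaders=...)  # dataloader omitted
```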
- Led the development of the AIGC big-model training solution ColoDiffusion
- As the core developer, built a diffusion-model training framework based on PyTorch Lightning + ColossalAI that supports multiple training modes; the work was officially reposted by PyTorch
- Used the ZeRO optimizer, auto chunking, FlashAttention, CPU offload, and other techniques to break the memory wall and support large-batch accelerated training (a configuration sketch follows this list)
- As a Hugging Face external developer, supported DreamBooth fine-tuning in the Hugging Face Diffusers library on a consumer GPU with 4GB of memory, the fastest DreamBooth version yet (see the memory-knob sketch below)
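To make the memory-wall point concrete, here is a sketch of the ZeRO + CPU-offload idea expressed as a DeepSpeed-style config (DeepSpeed appears in the stack below; ColossalAI exposes analogous ZeRO/Gemini options, whose exact API has changed across releases).

```python
# Sketch of the ZeRO + offload recipe as a DeepSpeed-style config dict;
# ColossalAI's ZeRO/Gemini options are analogous but their API differs.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu"},  # optimizer states -> host RAM
        "offload_param": {"device": "cpu"},      # page idle parameters to CPU
    },
    "fp16": {"enabled": True},  # halve parameter/activation memory
}
```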
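And a sketch of the Diffusers-side memory knobs that such a low-memory DreamBooth recipe builds on; the actual 4GB version combined these with ColossalAI offloading, and the model ID here is just a common example, not necessarily the one used.

```python
# Sketch of Diffusers memory-reduction knobs behind low-memory DreamBooth
# fine-tuning; the real 4GB recipe adds ColossalAI CPU offloading on top.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model ID
    torch_dtype=torch.float16,
)
pipe.unet.enable_gradient_checkpointing()  # trade compute for activation memory
pipe.enable_attention_slicing()            # compute attention in smaller chunks
# Training then updates only the UNet (optionally the text encoder) on a
# handful of subject images, with optimizer states offloaded to CPU.
```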
- Participated in the development of FastFold (optimizing AlphaFold training and inference on GPU clusters)
- Supported parallel data preprocessing (tripling its speed) for FastFold with Ray, solving the core bottleneck of MSA feature search and long preprocessing time for training and inference (see the Ray sketch below)
- Supported multimer structure prediction in FastFold
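A minimal sketch of the Ray pattern behind the preprocessing speedup; the worker body is a stand-in for FastFold's actual MSA search and feature construction.

```python
# Sketch: fan out per-sequence preprocessing as Ray tasks and gather results.
# The worker body is a placeholder for MSA search + feature construction.
import ray

ray.init()

@ray.remote
def preprocess(sequence: str) -> dict:
    # Placeholder for the expensive MSA/feature step for one sequence.
    return {"seq": sequence, "n_res": len(sequence)}

sequences = ["MKTAYIAK", "GSHMLEDP", "LLFNQTAR"]  # toy inputs
features = ray.get([preprocess.remote(s) for s in sequences])
print(features)
```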
- Technology Stack: Python, C++, CUDA, PyTorch, Ray, ColossalAI, PyTorch Lightning, TensorRT, DeepSpeed, Hugging Face
SenseTime  Large Model Training
2021.12 - 2022.6
Algorithm Research Intern
Hangzhou
- Participated in the development of SenseTime's large-scale distributed machine learning training framework, Spring, and in research related to machine learning systems
- Advanced the production deployment of large object-detection models (Vision Transformer, Swin Transformer, etc.); supported SenseTime's general detection framework POD with PyTorch distributed data parallel and mixed-precision training (see the sketch after this entry)
- Worked on MLOps: developed a machine learning cloud platform supporting a model lifecycle management database
- Technology Stack: Python, C++, CUDA, PyTorch, Go, Nebula DB
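As referenced above, a minimal sketch of the DDP + mixed-precision pattern; the model and data are placeholders, not POD itself, and the script assumes a torchrun-style launch.

```python
# Sketch of DDP + automatic mixed precision; placeholders stand in for the
# actual detection model/data. Assumes launch via torchrun (one proc per GPU).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(128, 10).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps fp16 grads stable

x = torch.randn(32, 128, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")
with torch.cuda.amp.autocast():       # forward pass in mixed precision
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()         # scale loss to avoid fp16 underflow
scaler.step(opt)
scaler.update()
```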
Huawei 2012 Lab  Distributed Parallel Lab
2021.7 - 2021.12
Algorithm Engineering Intern
Hangzhou
- Contributed code to MindSpore, a full-scenario deep learning framework; developed three new features for MindSpore Lite
- Completed the core code for OpenGL texture conversion in MindSpore Lite, a key feature of MindSpore Lite 1.6
- Accomplished OpenCL backend support on the x86 platform for MindSpore Lite; developed GPU operators for the MindSpore core
- Implemented and iterated on MindSpore's logging system on the x86 platform based on Glog
- Technology Stack: C++, OpenCL, OpenGL, CMake, Python
RESEARCH EXPERIENCE
Maximizing Parallelism in Distributed Training for Huge Neural Networks
2020.1 - 2021.5
NUS High Performance Computing for Artificial Intelligence (HPC-AI) Lab
Prof. Yang You (Presidential Young Professor)
- Participated in the writing of “The Big Model in Action”
Data-Driven Beam Tracking based on Deep Learning
2020.1 - 2021.5
Institute of Intelligent Communication Network and Security
Prof. Min Li (ZJU 100 Young Professor)
- Proposed a novel deep learning algorithm to solve beam tracking in mmWave communication systems
- Proposed an efficient transformer-based beam tracking algorithm, achieving 91% prediction accuracy, 16% higher than existing algorithms (a model sketch follows)
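For illustration, a minimal sketch of framing beam tracking as sequence classification with a transformer encoder; the feature dimension and 64-beam codebook are assumptions, not the paper's actual configuration.

```python
# Sketch: beam tracking as next-beam classification with a transformer
# encoder. Feature/codebook sizes are illustrative assumptions.
import torch
from torch import nn

class BeamTracker(nn.Module):
    def __init__(self, feat_dim=16, n_beams=64, d_model=64):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_beams)  # logits over beam codebook

    def forward(self, x):              # x: (batch, time, feat_dim)
        h = self.encoder(self.embed(x))
        return self.head(h[:, -1])     # predict the beam for the next step

logits = BeamTracker()(torch.randn(8, 10, 16))  # 8 tracks, 10 past steps each
print(logits.shape)                              # torch.Size([8, 64])
```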