Hanze Dong

About

Hanze Dong is a research scientist at Salesforce Research. He serves as the managing editor of JMLR and has published in leading machine learning journals and conferences. Hanze earned a PhD in Mathematics from HKUST under the supervision of Professor Tong Zhang and a BSc in Mathematics from Fudan University.
Recently, he works on the reproducibility and interpretability of modern foundation models. He is the core author of several influential open-source post-training GitHub packages, including LMFlow (8K+ stars, NAACL Best Demo Award) and RLHFlow (1K+ stars, TMLR), which replicate state-of-the-art training recipes. His theoretical work explores the core principles of generalization and optimization in foundation models, encompassing theory-guaranteed training algorithms and the analysis of diffusion-like dynamics.

His current research focuses on the post-training and alignment of foundation models, generative modeling, and Monte Carlo sampling. He also has some experience in designing robust and efficient ML algorithms as well as ML theory.

News

[2025-05] One paper has been accepted at ACL 2025 (Findings).
[2025-05] RSD has been accepted at ICML 2025.
[2025-04] Attending ICLR 2025 at Singapore.
[2025-01] Two papers have been accepted to ICLR 2025. See you in Singapore.
[2024-12] Attending NeurIPS 2024 at Vanvouver, Canada.
[2024-12] RAFT (TMLR 2023) has been invited to present in ICLR 2025.
[2024-10] PAPAL has been accepted to JMLR.
[2024-09] Two papers have been accepted to NeurIPS 2024.
[2024-09] Three papers have been accepted to EMNLP 2024 Main Conference.
[2024-09] RLHF Workflow has been accepted at TMLR.

[2024-07] Reverse Transition Kernel is selected as the best paper at ICML 2024 Workshop SPIGM.
[2024-07] Attending ICML 2024 at Vienna, Austria.
[2024-06] LMFlow has won Best Demo Award at NAACL 2024!
[2024-05] One paper has been accepted to COLT 2024, which provides the first sampling algorithm that support general non-log-Sobolev distribution with quasi-polynomial computation complexity.
[2024-05] Two papers have been accepted to ICML 2024, including Gibbs sampling from human feedback (GSHF) and stochastic proximal sampler.
[2024-03] Excited to share that LMFlow paper was accepted by NAACL 2024 Demo Track.
[2024-01] Two papers have been accepted to ICLR 2024, including rdMC and Spurious feature diversification.
[2023-12] The defense of the PhD thesis was successfully completed on November 30th, 2023.
[2023-11] Excited to share that RAFT has been accepted to TMLR. [Link].
[2023-07] Attending ICML 2023 at Honolulu, Hawaiʻi, USA.
[2023-05] Received HKUST RedBird Academic Excellence Award.
[2023-05] Attending ICLR 2023 at Kigali, Rwanda.
[2023-03] We're thrilled to release our project, LMFlow! This framework streamlines the development of LLMs, including fine-tuning, inference, and RLHF, in a more cost-effective and effortless manner. Our aspiration is that LMFlow will incite a broader range of imaginative applications of LLMs and cultivate a larger community of LLM aficionados!

Research

* denotes equal contribution. ^† denotes corresponding author. Name denotes mentored student/intern.
Full publication list can be found in Google Scholar.

Alignment of Foundation Models

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao, Hanze Dong^†, Amrita Saha, Caiming Xiong, Doyen Sahoo;
The Thirteenth International Conference on Learning Representations (ICLR), 2025. [Paper]
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chenlu Ye*, Wei Xiong*, Yuheng Zhang*, Hanze Dong*, Nan Jiang, Tong Zhang;
Annual Conference on Neural Information Processing Systems (NeurIPS), 2024. [Paper]
RLHF workflow: From Reward Modeling to online RLHF
Hanze Dong*, Wei Xiong*, Bo Pang*, Haoxiang Wang*, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang;
Transactions on Machine Learning Research (TMLR), 2024. [Paper]
FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
KaShun Shum*, Minrui Xu*, Jianshu Zhang*, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza;
The Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. [Paper]
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
Wei Xiong*, Hanze Dong*, Chenlu Ye*, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang;
International Conference on Machine Learning (ICML), 2024.
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models (Oral). [Paper]
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models.
Shizhe Diao*, Rui Pan*, Hanze Dong*, Kashun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang;
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) -- System Demonstration Track, 2024. (Best Demo Award) [Paper] [Github]
Raft: Reward ranked finetuning for generative foundation model alignment
Hanze Dong*, Wei Xiong*, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang;
Transactions on Machine Learning Research (TMLR), 2023. [Paper]
DetGPT: Detect What You Need via Reasoning
Renjie Pi*, Jiahui Gao*, Shizhe Diao*, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang;
The Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. [Paper]

Generative Modeling, Sampling, and Optimization over the Probability Space

PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium
Shihong Ding*, Hanze Dong*, Cong Fang, Zhouchen Lin, Tong Zhang;
Journal of Machine Learning Research (JMLR), 2024. [Paper]
Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yian Ma, Tong Zhang;
Annual Conference on Neural Information Processing Systems (NeurIPS), 2024. (Spotlight)
Best Paper Award of ICML 2024 Workshop SPIGM. [Paper]
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo
Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang;
Annual Conference on Learning Theory (COLT), 2024. [Paper]
Faster Sampling via Stochastic Gradient Proximal Sampler
Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang;
International Conference on Machine Learning (ICML), 2024. [Paper]
Reverse Diffusion Monte Carlo.
Xunpeng Huang*, Hanze Dong*, Yifan Hao, Yian Ma, Tong Zhang;
International Conference on Learning Representations (ICLR), 2024. [Paper]
Particle-based Variational Inference with Preconditioned Functional Gradient Flow
Hanze Dong, Xi Wang, Yong Lin, Tong Zhang;
International Conference on Learning Representations (ICLR), 2023. [Paper]
Weakly Supervised Disentangled Generative Causal Representation Learning
Xinwei Shen, Furui Liu, Hanze Dong, Qing Lian, Zhitang Chen, Tong Zhang;
Journal of Machine Learning Research (JMLR), 2022. [Paper]
Normalizing Flow with Variational Latent Representation
Hanze Dong*, Shizhe Diao*, Weizhong Zhang, Tong Zhang;
arXiv preprint arXiv:2211.11638, 2022. [Paper]
Mathematical models of Overparameterized Neural Networks
Cong Fang, Hanze Dong, Tong Zhang;
Proceedings of the IEEE (PIEEE), 2021. [Paper]

Robust and Efficient Machine Learning

Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Baohao Liao*, Yuhui Xu*, Hanze Dong*, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong;
The Forty-Second International Conference on Machine Learning (ICML), 2025. [Paper]
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo;
The Thirteenth International Conference on Learning Representations (ICLR), 2025. [Paper]
Mitigating the alignment tax of RLHF
Yong Lin*, Hangyu Lin*, Wei Xiong*, Shizhe Diao*, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang;
The Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance
Renjie Pi, Tianyang Han, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang;
The Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Spurious feature diversification improves out-of-distribution generalization
Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang;
International Conference on Learning Representations (ICLR), 2024.
Catalyst Acceleration of Error Compensated Methods Leads to Better Communication Complexity
Xun Qian, Hanze Dong, Tong Zhang, Peter Richtárik;
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023. [Paper]
Bayesian Invariant Risk Minimization
Yong Lin*, Hanze Dong*, Hao Wang, Tong Zhang;
Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (Oral) [Paper]
Local Augmentation for Graph Neural Networks
Songtao Liu, Rex Ying, Hanze Dong, Lanqing Li, Tingyang Xu, Yu Rong, Peilin Zhao, Junzhou Huang, Dinghao Wu;
International Conference on Machine Learning (ICML), 2022. [Paper]
Learning the Compositional Spaces for Generalized Zero-shot Learning
Hanze Dong, Yanwei Fu, Leonid Sigal, Sung Ju Hwang, Xiangyang Xue;
Computer Vision and Image Understanding (CVIU), 2022. [Paper]
Error Compensated Loopless SVRG for Distributed Optimization
Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang;
OPT2020: 12th Annual Workshop on Optimization for Machine Learning, 2020. [Paper]
Error Compensated Proximal SGD and RDA
Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang;
OPT2020: 12th Annual Workshop on Optimization for Machine Learning, 2020. [Paper]
Vocabulary-informed Zero-shot and Open-set Learning
Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, Xiangyang Xue, Leonid Sigal;
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019. [Paper]
Extreme vocabulary learning
Hanze Dong, Zhenfeng Sun, Yanwei Fu, Shi Zhong, Zhengjun Zhang, Yu-Gang Jiang;
Frontiers of Computer Science, 2019. [Paper]

Academic Services

Managing Editor of JMLR.
Area Chair of NeurIPS, ACL.
Reviewer of ICML, ICLR, CVPR, IEEE Transactions on Signal Processing, TMLR.

Teaching

2022-2023 Fall -- MATH 2011: Introduction to Multivariable Calculus.

2022-2023 Fall -- MATH 6450J/COMP 6211E: Optimization for Machine Learning.

2021-2022 Spring -- MATH 2011: Introduction to Multivariable Calculus.

2021-2022 Fall -- MATH 1023: Honor Calculus.

2020-2021 Spring -- MATH 1014: Calculus II.

2020-2021 Fall -- MATH 1013: Calculus IB.

2019-2020 Spring -- MATH 1014: Calculus II.

2019-2020 Spring -- MATH 2411: Applied Statistics.

Awards

2024 -- Best Paper Award of ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling.

2024 -- NAACL Best Demo Paper Award.

2022/2023 -- RedBird PhD Award, The Hong Kong University of Science and Technology.

2020/2021/2022 -- Best TA Award, The Hong Kong University of Science and Technology.

2019 -- Outstanding Graduates, Fudan University.

2016 -- The First Price, China Undergraduate Physics Tournament

2015 -- Outstanding Freshmen, Fudan University

2014 -- Bronze Medal and The First Price, Chinese Physics Olympiad.

Contact Me

A at B dot com
A = hendrydong
B = gmail

For JMLR-related inquiries, please contact C at jmlr.org
C = hanze.dong