CV

You can download a PDF version of my CV here.

Education

University of Minnesota, MN, USA (2025.09 – 2030.06)
- Full-funded PhD Student in Department of Electrical and Computer Engineering
- Advised by Prof. Mingyi Hong
- Direction: LLM agents, Post-training, Multi-agent system
- GPA: 4/4
East China University of Science and Technology, Shanghai, China (2021.09 – 2025.06)
- B.S. in Computer Science
- GPA: 88/100

Research Experience

PhD Student, Mingyi Hong’s Lab @ University of Minnesota (2025.08 – present)
- Introduced StitchCUDA, a multi-agent workflow for End-to-End CUDA code generation and optimization, enhanced by Rubric-based Agentic RL. It achieves SOTA performance in End-to-End CUDA code generation, defeating GPT-5.2 by a 32B RL-based model, while avoiding reward-hacking and significantly reducing rollout overhead by decomposing atomic skills of Coding Agent. Submitted to ICML 2026.
- Introduced CudaForge, a simple, effective and low-cost multi-agent workflow for CUDA kernel generation and optimization. It achieves SOTA performance in KernelBench Levels 1-3, while only costs $0.3 in API and 25 minutes in single RTX 6000. Submitted to ICML 2026.
- Introduced InfantAgent-NEXT, which undertakes detailed modularization of agent workflows, tool selection, and tool execution, in favor of a modular architecture with a unified dialogue context. Accepted by NeurIPS 2025.
- Advisor: Prof. Mingyi Hong
Research Assistant, Huaxiu Yao’s Lab @ University of North Carolina at Chapel Hill (2024.05 – 2025.01)
- Introduced AnyPrefer, a new method to improve VLA model through DPO and iterative training framework, eventually the robots perform better in several tasks.
- Introduced GRAPE, a Trajectory-wise DPO for VLA model posttraining, which enhances the safety, efficiency, and success rate of the VLA model.
- Authored 1*ICLR’25, 1*ICRA’26.
- Advisor: Prof. Huaxiu Yao
Research Assistant, Machine Learning Group @ Microsoft Research Asia (2024.10 – 2025.04)
- Worked on efficiently training LLM from mixing NLP datasets by dynamic sampling.
- Worked on enhancing LLM’s math reasoning ability by selecting high-quality CoT data via LLM-as-a-Judge.
- Worked on analyzing dynamic parameters (such as loss) during model training to better control the training process.
- Advisor: Prof. Zhong Li
Research Assistant, InternLM2 Team @ Shanghai AI Lab (2023.11 – 2024.05)
- Proposed an efficient data selection method to extract high-quality samples from the original SFT dataset of InternLM2. By fine-tuning InternLM2 on the top 10% of the highest-scoring examples, the model achieved superior performance compared to using the entire instruction dataset.
- Solved the Identity Attack problem in InternLM. Constructed a special finetune dataset to enhance the cognitive ability of the model. Introduced a benchmark to evaluate the model’s ability to resist Identity Attacks.
- Participated in the research and development of InternLM/InternLM2, which are well-known open source LLMs.
- Advisor: Prof. Yining Li

Publications

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

Zijian Zhang, Rong Wang, Shiyang Li, Yuebo Luo, Mingyi Hong, Caiwen Ding. "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization." Under review at ICML, 2026.

StitchCUDA: An Automated Multi-Agent End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning

Shiyang Li*, Zijian Zhang* (Co-First Author), Winson Chen, Yuebo Luo, Mingyi Hong, Caiwen Ding. "StitchCUDA: An Automated Multi-Agent End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning." Under review at ICML, 2026.

InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction

Bin Lei, Weitai Kang, Zijian Zhang, Winson Chen, Xi Xie, Shan Zuo, Mimi Xie, Ali Payani, Mingyi Hong, Yan Yan, Caiwen Ding. "InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction." NeurIPS, 2025.

GRAPE: Generalizing Robot Policy via Preference Alignment

Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao. "GRAPE: Generalizing Robot Policy via Preference Alignment." ICRA 2026; ICLR Workshop, 2025.

AnyPrefer: An Automatic Framework for Preference Data Synthesis

Yiyang Zhou, ..., Zijian Zhang, ..., Huaxiu Yao. "AnyPrefer: An Automatic Framework for Preference Data Synthesis." ICLR, 2025.

Invited Talks

LLM for Kernel: Automated Agentic GPU Programming Framework — From Single Kernel to E2E Programs
Amazon | Feb 2026
RL for VLA Models: Generalizing Robot Policy via Preference Alignment
Department of Electronic Engineering, Tsinghua University (清华大学电子工程系) | Jan 2025
VLA RLHF: Generalizing Robot Policy via Preference Alignment
3D Vision Workshop (3D视觉工坊) | Jan 2025

Academic Service

Reviewer for NeurIPS 2026
Reviewer for ICML 2026
Reviewer for ICLR 2026
Reviewer for COLM 2026
Reviewer for CVPR 2025
Reviewer for CoRL 2025
Reviewer for NeurIPS Workshop 2024

Skills

Programming: C/C++, Python, PyTorch, LaTeX, Git
Language: Mandarin (native), English (TOEFL 100: R 27, L 24, S 22, W 27)
Engineering: Agentic RL / RLVR on VeRL framework

Awards

Department Fellowship — University of Minnesota
Scholarship for Outstanding Students, First Prize — East China University of Science and Technology
Outstanding Student Leaders — East China University of Science and Technology
National Scholarship — Ministry of Education, China
National Mathematics Competition for College Students, First Prize — Chinese Mathematical Society
Suzhou Industrial Park Scholarship — Suzhou Government