he/him
Ph.D. student at Tsinghua University, working on Agentic RL and LLM post-training systems.
February 21, 2025
Talk, MIT-IBM Lab, USA