AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

Feb 1, 2026

Haotian Chen

Xin Cong

Shengda Fan

Yuyang Fu

Ziqin Gong

Yaxi Lu

Yishan Li

Boye Niu

Chengjun Pan

Zijun Song

Huadong Wang

Yesai Wu

Yueying Wu

Zihao Xie

Yukun Yan

Zhong Zhang

Yankai Lin

Zhiyuan Liu

Maosong Sun



Abstract

We develop a unified framework for managing tool sandbox environments that supports end-to-end agent reinforcement learning (RL) training. The resulting 4B-parameter agent achieves state-of-the-art (SOTA) performance among models of the same scale and surpasses GPT-5 and Claude-4.5-Sonnet on the GAIA and HLE benchmarks.

Type

Preprint

Publication

arXiv Preprint 2026

Last updated on Feb 1, 2026

Autonomous Agents, Reinforcement Learning

Authors

Haotian Chen (he/him)

Research Assistant Professor

I am a Research Assistant Professor at the School of Artificial Intelligence, Shanghai Jiao Tong University, where I work with Prof. Junchi Yan at RethinkLab. I study how to build AI systems that can automate long-horizon, effort-intensive, and creativity-demanding tasks such as research, engineering, and development. My current work focuses on autonomous agents, large language models, and AI4Research. Before joining SJTU, I received my PhD in Data Science from Fudan University, advised by Prof. Xiangdong Zhou, and completed postdoctoral research at Tsinghua University (THUNLP), working with Prof. Zhiyuan Liu and Prof. Maosong Sun. I was also a research intern at the Machine Learning Research Group of Microsoft Research Asia, mentored by Xiao Yang, and at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, working with Prof. Yang Yu.
