Reinforcement Learning

AgentCPM-Explore

🏆 **Project Lead** · I led an open-source 4B agent model for long-horizon deep exploration [![Stars](https://img.shields.io/github/stars/OpenBMB/AgentCPM?style=social)](https://github.com/OpenBMB/AgentCPM)

AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

arXiv 2026. First author. Open-source 4B agent model achieving SOTA on GAIA & HLE, surpassing GPT-5 and Claude-4.5-Sonnet.

Haotian Chen

Reflective Reinforcement Tool Learning

Submitted to ACL 2026. Reflective reinforcement learning for tool learning.

Haotian Chen

AgentRL

🏆 **Project Lead** · Fully asynchronous agent RL training infrastructure for the AgentCPM model family `100+ Tools` · `20+ Benchmarks` · `Full-cycle Visualization`

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

EMNLP 2025 Demo. GUI agents with reinforcement fine-tuning. 1,200+ GitHub Stars.

zhong-zhang