We are pleased to share the initial checkpoint of our reasoning foundation large language model as an open-source resource. This checkpoint is intended to help evaluate our team's data production pipeline across various domains, including mathematics, programming, logic, safety, and others. Its goal is to provide a solid starting point for developing a robust policy for the subsequent reinforcement learning process.
We are hopeful that applying our reinforcement learning algorithms, supported by our carefully designed infrastructure, will lead to meaningful improvements in the model’s reasoning capabilities across various domains. At the heart of the project is our data production pipeline, which we believe plays a crucial role in enabling general reasoning capabilities. We also believe that the reasoning capability induced by the data production pipeline can address a range of real-world industrial scenarios with increasing precision and reliability.
Based on our observations during the production of
To ensure a fair comparison of our developed data production pipeline, we conducted supervised fine-tuning on the Qwen2.5-32B-Instruct model using the data generated by our pipeline. The performance of this fine-tuned model is compared against the Qwen2.5-32B-QwQ model, recognized as the most advanced o1-like open-source model. Additionally, we incorporated the baseline results from the o1 model and Qwen2.5-32B-Instruct to provide a comprehensive evaluation.
We evaluated our
Our
INF-o1 Team
Wei Chu • Yinghui Xu • Yuan Qi
Chao Qu - Team Leader • Chao Wang - Infrastructure • Cheng Peng - Data Pipeline (Logical) • Dakuan Lu - Data Pipeline (Science) • Haozhe Wang - Data Pipeline (Math) & RL • Hongqing Hu - Infrastructure • Jianming Feng - Data Pipeline (Safety) • Jiaran Hao - Data Pipeline (SQL) & Infrastructure • Kelang Tian - Infrastructure • Minghao Yang - Data Pipeline (Math) • Quanbin Wang - Data Pipeline (Safety) • J.K. Liu - Data Pipeline (SQL) • Tianchu Yao - Data Pipeline & Alignment • Weidi Xu - Data Pipeline (Logical) • Xiaoyu Tan - Data Pipeline & Alignment • Yihan Songliu - Infrastructure
Chao Qu: quchao_tequila@inftech.ai
Xiaoyu Tan: yulin.txy@inftech.ai
@misc{inftech_pi_zero2024,
author = {INF-o1 Team},
title = {INF-o1 (\(\pi_0\)): Initiating the Journey to the Infinity of LLM Reasoning},
year = {2024},
url = {https://inftech-pi-zero.github.io/},
note = {Accessed: 2024-12-31}
}