Microsoft researchers have developed a technique called rStar-Math that enables small AI language models to solve complex mathematical problems accurately enough to rival, and on some benchmarks surpass, OpenAI's o1-preview model.
The research team, comprising experts from Microsoft, Peking University, and Tsinghua University, applied rStar-Math to several compact models, including Microsoft's Phi-3 mini and Alibaba's Qwen series. The enhanced models posted strong results on challenging mathematical benchmarks.
Breaking Records with Smaller Models
The most striking result came from the Qwen2.5-Math-7B model, whose accuracy on the MATH benchmark rose from 58.8% to 90.0% with rStar-Math, exceeding the score of OpenAI's o1-preview. On the American Invitational Mathematics Examination (AIME), the approach solved an average of 53.3% of problems (8 of 15), a score that ranks among the top 20% of high school competitors.
How rStar-Math Works
The technique uses Monte Carlo Tree Search (MCTS) to break complex mathematical problems into manageable steps, much as a human solver would work through them one step at a time. A distinctive feature is that models must express their reasoning in both natural language and Python code, with the natural-language explanation appearing as code comments.
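To make the search idea concrete, here is a minimal sketch of two generic MCTS building blocks: choosing which partial solution to extend next, and crediting a finished attempt back to the steps that led to it. It is a textbook-style illustration, not the rStar-Math implementation; the `Node` class, the `uct` formula with exploration constant `c = 1.4`, and the toy equation are all assumptions made for this example.

```python
import math


class Node:
    """One reasoning step in the search tree (hypothetical structure)."""

    def __init__(self, step, parent=None):
        self.step = step
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def add_child(self, step):
        child = Node(step, parent=self)
        self.children.append(child)
        return child


def uct(node, c=1.4):
    """Average reward plus an exploration bonus for rarely tried steps."""
    if node.visits == 0:
        return math.inf  # unvisited steps are always tried first
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select(root):
    """Walk down the tree, following the child with the highest UCT score."""
    node = root
    while node.children:
        node = max(node.children, key=uct)
    return node


def backpropagate(node, reward):
    """Credit every step on the path with the outcome of a finished attempt."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


# Toy problem: solve 2x + 3 = 11, with two candidate first steps.
root = Node("problem: 2x + 3 = 11")
a = root.add_child("subtract 3 from both sides")
b = root.add_child("add 3 to both sides")

backpropagate(a, 1.0)  # an attempt through step a reached the right answer
backpropagate(b, 0.0)  # an attempt through step b did not
backpropagate(a, 1.0)

print(select(root).step)
```

After these three simulated attempts, the search favors the step that has led to correct answers, while the exploration term keeps the other branch from being abandoned outright.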
The system includes two specialized components:
- A policy model that generates mathematical reasoning steps
- A process preference model (PPM) that selects the most promising solution paths
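The division of labor between the two components can be sketched as follows. Everything here is a hypothetical stand-in: `toy_policy` returns hard-coded candidate steps instead of calling a language model, `toy_ppm` uses a fixed rule instead of a trained scorer, and discarding steps whose Python fails to run is an assumption about how the code-plus-comments format might be used, not something this article describes.

```python
def run_step(code, namespace):
    """Execute one reasoning step; report whether its Python ran cleanly."""
    try:
        # Note: real model output should only be executed in a sandbox.
        exec(code, namespace)
        return True
    except Exception:
        return False


def toy_policy(problem):
    """Hypothetical policy model: proposes candidate next steps.

    Each step is Python code whose comment carries the natural-language
    reasoning, mirroring the format described above.
    """
    return [
        "# Subtract 3 from both sides: 2x = 8\nrhs = 11 - 3",
        "# Subtract 3 from both sides (uses an undefined name)\nrhs = eleven - 3",
        "# Add 3 to both sides: 2x = 14\nrhs = 11 + 3",
    ]


def toy_ppm(namespace):
    """Hypothetical PPM: a higher score means a more promising step.

    A real PPM would be a trained model; this toy rule simply knows
    that the right-hand side should be 8 after a correct first step.
    """
    return 1.0 if namespace.get("rhs") == 8 else -1.0


def choose_next_step(problem):
    """Keep the steps that execute, then pick the one the scorer prefers."""
    scored = []
    for code in toy_policy(problem):
        namespace = {}
        if run_step(code, namespace):
            scored.append((toy_ppm(namespace), code))
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


best = choose_next_step("2x + 3 = 11")
print(best)
```

In this toy run the second candidate is dropped because its code raises an error, and the scorer then prefers the first candidate over the third.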
These components undergo four rounds of "self-evolution," in which each is progressively refined with the help of the other. The initial training used 747,000 math word problems from public sources.
Industry Impact
The result challenges the prevailing notion that larger AI models are necessary for superior performance. Microsoft's approach suggests that smaller, specialized models can match or exceed the capabilities of much larger systems on mathematical reasoning, offering a more efficient and accessible path forward for AI development.
The code and data will be available on GitHub once Microsoft completes its internal review process, allowing the wider AI community to build upon this innovation.