Microsoft's Small AI Models Outperform OpenAI in Complex Mathematics

Microsoft researchers have developed a technique called rStar-Math that enables smaller AI language models to solve complex mathematical problems accurately enough to surpass OpenAI's o1-preview model on the MATH benchmark.

The research team, comprising experts from Microsoft, Peking University, and Tsinghua University, applied rStar-Math to several compact models, including Microsoft's Phi-3 mini and Alibaba's Qwen series. The enhanced models demonstrated exceptional results on challenging mathematical benchmarks.

Breaking Records with Smaller Models

The most striking result came from the Qwen2.5-Math-7B model, whose accuracy on the MATH benchmark rose from 58.8% to 90.0% after applying rStar-Math, a gain of 31.2 percentage points that puts it ahead of OpenAI's o1-preview. The technique also performed strongly on the American Invitational Mathematics Examination (AIME), solving 53.3% of problems, a score that would rank among the top 20% of high school competitors.

How rStar-Math Works

The technique employs Monte Carlo Tree Search (MCTS) to break complex mathematical problems down into manageable steps, much as a human would work through them. A distinctive feature is that the model must express each reasoning step in both natural language and Python code, with the natural-language explanation written as code comments.
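
The code-with-comments format can be illustrated with a minimal sketch. The problem, the step strings, and the run_steps helper below are hypothetical and are not taken from the rStar-Math release. The sketch assumes one plausible benefit of the format: a step written as Python can be checked by executing it, while the comment carries the natural-language reasoning.

```python
# Hypothetical example: each reasoning step is a Python snippet whose
# leading comment holds the natural-language explanation.
STEPS = [
    "# Step 1: the two legs of the right triangle are 3 and 4\na, b = 3, 4",
    "# Step 2: apply the Pythagorean theorem to get the hypotenuse\nc = (a**2 + b**2) ** 0.5",
]


def run_steps(steps):
    """Execute steps in order in one shared namespace.

    Returns the steps that ran without error and the resulting namespace.
    Execution stops at the first step that raises an exception.
    """
    namespace = {}
    kept = []
    for step in steps:
        try:
            exec(step, namespace)  # assumption: steps come from a trusted source
        except Exception:
            break
        kept.append(step)
    return kept, namespace


kept, namespace = run_steps(STEPS)
print(len(kept), namespace["c"])
```

A step that fails to run, such as a division by zero, is dropped along with everything after it, so only an executable prefix of the reasoning survives.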

The system includes two specialized components:

A policy model that generates mathematical reasoning steps
A process preference model (PPM) that selects the most promising solution paths
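
How the two components might cooperate inside a tree search can be sketched on a toy task. Everything here is an assumption made for illustration: the task (reach a target number with arithmetic steps), the function names policy_propose and ppm_score, and the UCT selection rule are stand-ins, and hand-written heuristics replace the trained models. The sketch is a simplified MCTS variant that scores nodes directly instead of running random rollouts.

```python
import math

TARGET = 10      # hypothetical toy goal: reach this number
MAX_DEPTH = 4    # maximum number of steps in a solution path


def policy_propose(value):
    """Stand-in for the policy model: candidate next steps from a partial solution."""
    return [("+1", value + 1), ("+3", value + 3), ("*2", value * 2)]


def ppm_score(value):
    """Stand-in for the PPM: a higher score means a more promising partial solution."""
    return 1.0 / (1.0 + abs(TARGET - value))


class Node:
    def __init__(self, value, step=None, parent=None):
        self.value = value
        self.step = step
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.children = []
        self.visits = 0
        self.total = 0.0

    def uct(self, c=1.4):
        """Upper-confidence score: mean reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf
        mean = self.total / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def search(start, iterations=200):
    root = Node(start)
    for _ in range(iterations):
        # Selection: walk down the tree, following the best UCT score.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: grow a leaf that was already visited and is not terminal.
        terminal = node.value == TARGET or node.depth >= MAX_DEPTH
        if node.visits > 0 and not terminal:
            node.children = [Node(v, s, node) for s, v in policy_propose(node.value)]
            node = node.children[0]
        # Evaluation: the PPM stand-in scores the node (no random rollout).
        reward = ppm_score(node.value)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the most-visited path from the root.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        path.append(node.step)
    return path, node.value


path, final_value = search(1)
print(path, final_value)
```

The division of labor is the point of the sketch: the proposal function only generates options, and the scoring function alone decides which branches the search revisits.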

These components undergo four rounds of "self-evolution," continuously improving through mutual refinement. The initial training used 747,000 math word problems from public sources.
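
The round structure can be sketched as a loop in which each round uses the current pair of models to generate solutions and then retrains both on the result. The article does not describe the training procedure, so the self_evolve function and its generate, train_policy, and train_ppm callables are hypothetical placeholders, and the order of the steps is an assumption.

```python
def self_evolve(policy, ppm, problems, generate, train_policy, train_ppm, rounds=4):
    """Alternate data generation and retraining for a fixed number of rounds."""
    for _ in range(rounds):
        # Use the current policy and PPM to produce step-by-step solutions.
        trajectories = [generate(policy, ppm, problem) for problem in problems]
        # Retrain each component on data produced by the current pair.
        policy = train_policy(policy, trajectories)
        ppm = train_ppm(ppm, trajectories)
    return policy, ppm


# Toy stand-ins: each "model" is just a counter of completed training rounds.
policy, ppm = self_evolve(
    policy=0,
    ppm=0,
    problems=["p1", "p2"],
    generate=lambda pol, pp, prob: (prob, pol, pp),
    train_policy=lambda pol, data: pol + 1,
    train_ppm=lambda pp, data: pp + 1,
)
print(policy, ppm)  # 4 4
```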

Industry Impact

This breakthrough challenges the prevailing notion that larger AI models are necessary for superior performance. Microsoft's approach demonstrates that smaller, specialized models can match or exceed the capabilities of much larger systems, offering a more efficient and accessible path forward for AI development.

The code and data will be available on GitHub once Microsoft completes its internal review process, allowing the wider AI community to build upon this innovation.
