OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs

The inefficiency of static slow-thinking reasoning in LRMs

Recent LRMs achieve state-of-the-art performance by using detailed chain-of-thought (CoT) reasoning to solve complex tasks. However, many of the simple tasks they handle could be solved by smaller models with far fewer tokens, making such elaborate reasoning unnecessary. This mirrors human thinking, where we rely on fast, intuitive responses for easy problems and slower, analytical thinking for complex ones. While LRMs emulate this slow, deliberate reasoning, they generate significantly longer outputs, which drives up computational cost. Current methods for shortening reasoning lack flexibility, limiting models to a single fast reasoning style. There is a growing need for adaptive reasoning that adjusts effort to task difficulty.

Limitations of existing training-based and training-free approaches

Recent research on improving reasoning efficiency in LRMs can be grouped into two main areas: training-based and training-free methods. Training-based strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches use prompting techniques or pattern detection to shorten outputs at inference time; however, they also lack adaptability. Newer work focuses on variable-length reasoning, where models adjust reasoning depth based on task complexity. Other studies examine "overthinking," where models reason excessively on problems that do not require it. However, few methods enable dynamic switching between fast and thorough reasoning, which is what this paper addresses directly.

Introducing OThink-R1: a dynamic fast/slow reasoning framework

Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that allows LRMs to switch intelligently between fast and slow thinking, much as humans do. By analyzing reasoning patterns, they identified which steps are essential and which are redundant. With the help of a second model acting as a judge, they trained LRMs to adapt their reasoning style to task complexity. The method reduces unnecessary reasoning by over 23% without losing accuracy. Using a special loss function and fine-tuned datasets, OThink-R1 outperforms previous models in both efficiency and accuracy across various math and question-answering tasks.
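To make the LLM-as-judge idea concrete, here is a minimal, hypothetical sketch of how redundant chains of thought could be pruned when building training data. The function name, the prompt wording, and the `judge_model` interface are assumptions for illustration, not the authors' actual pipeline:

```python
# Hypothetical sketch of LLM-as-judge pruning (not the authors' code).
# Assumes `judge_model(prompt: str) -> str` wraps some LLM API that returns text.

def build_pruned_example(question, full_reasoning, answer, judge_model):
    """Ask a judge LLM whether the long chain of thought is actually needed.

    If the judge deems the detailed reasoning redundant for this question,
    keep only a short, direct answer; otherwise keep the full reasoning.
    """
    prompt = (
        "Question:\n" + question + "\n\n"
        "Reasoning:\n" + full_reasoning + "\n\n"
        "Answer: " + answer + "\n\n"
        "Is the detailed reasoning necessary to reach this answer? "
        "Reply with exactly ESSENTIAL or REDUNDANT."
    )
    verdict = judge_model(prompt).strip().upper()
    if verdict.startswith("REDUNDANT"):
        # Fast-thinking target: answer directly, no chain of thought.
        target = answer
    else:
        # Slow-thinking target: keep the full chain of thought.
        target = full_reasoning + "\n" + answer
    return {"input": question, "target": target}
```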

System architecture: reasoning pruning and dual-reference optimization

The OThink-R1 framework lets LRMs switch dynamically between fast and slow thinking. First, it identifies when an LRM includes unnecessary reasoning, such as over-explaining or double-checking, versus when detailed steps are truly important. Using this analysis, it builds a curated training dataset by pruning redundant reasoning and preserving valuable logic. Then, during fine-tuning, a special loss function balances both reasoning styles. This dual-reference loss compares the model's outputs against both fast-thinking and slow-thinking variants, encouraging flexibility. As a result, OThink-R1 can adaptively choose the most efficient reasoning path for each problem while retaining accuracy and logical depth.
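The exact objective is defined in the paper; the PyTorch-style sketch below only illustrates one plausible form of a dual-reference loss, in which a cross-entropy term on the pruned targets is combined with KL terms against a fast-thinking and a slow-thinking reference model. The argument names and the weights `alpha` and `beta` are assumptions for illustration:

```python
import torch.nn.functional as F

def dual_reference_loss(policy_logits, fast_ref_logits, slow_ref_logits,
                        labels, alpha=0.5, beta=0.1):
    """Sketch of a dual-reference objective (assumed form, not the paper's exact loss).

    - Cross-entropy fits the model to the curated fast/slow training targets.
    - Two KL terms keep the policy close to both a fast-thinking and a
      slow-thinking reference model, so neither mode collapses.
    """
    # Standard token-level cross-entropy on the curated targets.
    ce = F.cross_entropy(
        policy_logits.reshape(-1, policy_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
    log_p = F.log_softmax(policy_logits, dim=-1)
    kl_fast = F.kl_div(log_p, F.softmax(fast_ref_logits, dim=-1),
                       reduction="batchmean")
    kl_slow = F.kl_div(log_p, F.softmax(slow_ref_logits, dim=-1),
                       reduction="batchmean")
    # Weighted combination of the two reference constraints.
    return ce + beta * (alpha * kl_fast + (1.0 - alpha) * kl_slow)
```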

Empirical evaluation and comparative performance

The OThink-R1 model was tested on simpler QA and math tasks to evaluate its ability to switch between fast and slow reasoning. Using datasets such as OpenBookQA, CommonsenseQA, ASDiv, and GSM8K, the model demonstrated strong performance, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 showed a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of the pruning step, the KL constraints, and the LLM judge for achieving optimal results. A case study illustrated how unnecessary reasoning can lead to overthinking and reduced accuracy, highlighting OThink-R1's strength in adaptive reasoning.
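The efficiency comparison reported above ultimately comes down to tracking two quantities per dataset: answer accuracy and the average number of generated tokens. The toy loop below is only an illustration of that measurement, not the paper's evaluation harness; `generate` and `tokenizer.encode` are assumed interfaces:

```python
def accuracy_and_avg_tokens(examples, generate, tokenizer):
    """Toy evaluation loop (illustrative only): exact-match accuracy plus the
    mean number of generated tokens per example."""
    correct, total_tokens = 0, 0
    for ex in examples:
        output = generate(ex["question"])          # model's generated text
        total_tokens += len(tokenizer.encode(output))
        if ex["answer"].strip() in output:         # crude exact-match check
            correct += 1
    n = len(examples)
    return correct / n, total_tokens / n
```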

Conclusion: toward scalable and efficient hybrid reasoning systems

In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the problem of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as either essential or redundant. By pruning the redundant steps while preserving logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to strengthen hybrid reasoning. Tested on math and QA tasks, it cuts reasoning redundancy by 23% without sacrificing accuracy, showing promise for building more adaptive, scalable, and efficient AI reasoning systems in the future.


Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.


Sana Hassan, a consulting intern at MarkTechPost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
