New Strategy Cuts Training Time by Over 90%, Easing Computational Resource Constraints
Scientists have proposed a "no-thought" reinforcement learning fine-tuning strategy that cuts training time by more than 90%, offering a promising way to work within limited computational resources.

In earlier work, the team explored using a single model for a variety of visual tasks, including multiple classification tasks and other domain-specific tasks. They found that even when all task data is pooled for joint training, the resulting model still performs on par with specialized models. "This finding highlights how a single model can adapt to a variety of tasks, offering important insights into optimizing machine learning for diverse requirements," said Zhang Wenge.

Building on this insight, the team is now focusing on how to choose training strategies and thinking modes according to the properties of each task and the capabilities of the model. This study introduces a new approach to optimizing the training of AI models. For example, in an automated driving system, a simpler task such as obstacle detection benefits from a "no-thought" direct-response mode, making it highly efficient, whereas a more complex decision-making task such as route planning requires the model to reason and plan in depth. Current automated driving systems often rely on multiple specialized models to handle different tasks; feeding all task data into a single model not only complicates task coordination but can also cause task conflicts. Introducing an adaptive-thinking mechanism therefore aims to reduce task conflicts, enhance positive transfer, and enable a single model to excel across multiple tasks, a capability of clear practical value in real-world scenarios.
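The adaptive-thinking idea described above can be sketched as a simple prompt router: simple tasks get a direct "no-thought" answer, while complex tasks are asked to reason step by step before answering. This is a minimal illustrative sketch, not the paper's implementation; the task names, the task lists, the prompt templates, and the stub model are all assumptions introduced here for demonstration.

```python
# Hypothetical adaptive-thinking router (illustrative only, not the
# published method): route simple tasks to a direct "no-thought" prompt
# and complex tasks to an explicit reasoning prompt.

# Assumed task groupings, loosely inspired by the driving example.
SIMPLE_TASKS = {"obstacle_detection", "sign_classification"}
COMPLEX_TASKS = {"route_planning", "lane_negotiation"}

def build_prompt(task: str, query: str) -> str:
    """Choose a prompting mode based on the task's assumed complexity."""
    if task in SIMPLE_TASKS:
        # No-thought mode: ask for the answer directly, skipping reasoning.
        return f"Task: {task}\n{query}\nAnswer directly with the label only."
    # Thinking mode: request step-by-step reasoning before the answer.
    return f"Task: {task}\n{query}\nThink step by step, then give the answer."

def respond(task: str, query: str, model) -> str:
    """Send the routed prompt to whatever model callable is supplied."""
    return model(build_prompt(task, query))

def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM so the sketch runs on its own."""
    if "step by step" in prompt:
        return "[reasoned answer]"
    return "[direct label]"

print(respond("obstacle_detection", "Is there a pedestrian ahead?", stub_model))
print(respond("route_planning", "Fastest route avoiding the closed bridge?", stub_model))
```

In a real system the `model` callable would wrap an actual multimodal model, and the complexity decision could itself be learned rather than hard-coded in two sets.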
Moreover, their exploration of multi-task hybrid training methods shows that a hybrid model can retain its general applicability while matching, or even surpassing, the performance of specialized models. This technique could pave the way for the development of large multimodal models. The research also delves into the differences between AI systems and human cognition and thinking patterns, especially in resource allocation and task-processing mechanisms. These foundational explorations not only help uncover essential differences between AI and human intelligence but also provide valuable references for the design of future large-scale model frameworks.

References:
1. https://arxiv.org/pdf/2503.16188
2. https://github.com/minglllli/CLS-RL/tree/main

Authors: Liu Yugu, He Weilong