Search Self-play
Search Self-play (SSP) was proposed in October 2025 by researchers from Alibaba Quark, Peking University, and Sun Yat-sen University. The work was published in the paper "Search Self-play: Pushing the Frontier of Agent Capability without Supervision".
In Search Self-play (SSP), the LLM being trained alternates between two roles: task proposer and problem solver. The proposer generates deep search queries that have verifiable ground-truth answers and become progressively harder, while the solver tries to answer each generated query through multiple rounds of reasoning and search calls. To check that a generated query is valid, the researchers collect all search results from the proposer's trajectory as external knowledge and then run retrieval-augmented generation (RAG) to test whether the query can be answered correctly when all the necessary documents are provided. With this design, the deep search agent generates its own high-quality training tasks and solves them itself, which removes the need for manual annotation and verification while keeping the reward signal accurate.
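The round described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `Task`, `ssp_step`, `propose`, `solve`, and `rag_answer` are hypothetical, answers are compared by case-insensitive exact match, and the proposer reward of `1 - solver_reward` is an assumption chosen to reflect the idea that the proposer should produce harder tasks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    question: str
    answer: str            # ground-truth answer the proposer committed to
    documents: List[str]   # every search result gathered on the proposer's trajectory


def _same(a: str, b: str) -> bool:
    # Assumed matching rule: case-insensitive exact match after trimming whitespace.
    return a.strip().lower() == b.strip().lower()


def ssp_step(
    propose: Callable[[], Task],
    solve: Callable[[str], str],
    rag_answer: Callable[[str, List[str]], str],
) -> Optional[Tuple[float, float]]:
    """Run one self-play round.

    Returns (proposer_reward, solver_reward), or None when the task
    fails RAG verification and is discarded.
    """
    task = propose()

    # Verification: given all of the proposer's documents, the question
    # must be answerable; otherwise the ground truth cannot be trusted.
    if not _same(rag_answer(task.question, task.documents), task.answer):
        return None

    # The solver sees only the question and must search on its own.
    solver_reward = 1.0 if _same(solve(task.question), task.answer) else 0.0

    # Assumption: the proposer is rewarded when a verified task defeats the solver.
    proposer_reward = 1.0 - solver_reward
    return proposer_reward, solver_reward
```

In a real system, all three callables would be backed by the same LLM with different prompts, plus a search engine; here they can be replaced with simple stubs to exercise the control flow.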