OpenAI Aims for "Automated Researcher" After GPT-5
In a recent episode of a16z’s podcast, OpenAI’s two top research leaders, Chief Scientist Jakub Pachocki and Chief Research Officer Mark Chen, offered a deep dive into the company’s vision for the post-GPT-5 era, revealing a bold new ambition: building an “automated researcher,” an AI system capable of autonomously discovering new knowledge and advancing science. This vision marks a pivotal shift from enhancing language models to enabling machines to conduct meaningful scientific inquiry.

The conversation began by reflecting on the design philosophy behind GPT-5. OpenAI had long operated two distinct model lines: the fast-response GPT series, optimized for instant answers, and the deeper-thinking o-series, designed for complex reasoning. This duality created confusion for users. “We didn’t want users to have to choose between modes,” Chen explained. “Our goal with GPT-5 was to eliminate that friction—automatically determining how much thinking a task requires and delivering intelligent, agent-like behavior by default.”

As models have advanced, traditional benchmarks have hit a ceiling. Pachocki noted that many standard metrics are nearing saturation: “Improving from 96% to 98% on a test isn’t necessarily the most important thing in the world.” He emphasized that the old paradigm of pre-training followed by generalization no longer captures the full picture. With reinforcement learning and specialized training, models can achieve elite performance in narrow domains, but that doesn’t guarantee broad intelligence. The future, Pachocki argued, lies in measuring a model’s ability to make real discoveries. “The most exciting sign of progress isn’t just higher scores—it’s when models start making genuine contributions in math and programming competitions.” The next milestones, he said, will be defined not by accuracy on known tasks, but by a model’s capacity to uncover new insights and drive meaningful scientific progress.
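The routing behavior Chen describes can be caricatured in a few lines of code. The sketch below is purely illustrative: the complexity heuristic, thresholds, mode names, and token budgets are all assumptions for the sake of the example, not OpenAI's actual design.

```python
# Toy sketch of the "automatic thinking budget" idea Chen describes.
# The heuristic, thresholds, and budgets are illustrative assumptions,
# not OpenAI's real routing logic.

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: long prompts and reasoning cues suggest harder tasks."""
    cues = ("prove", "derive", "debug", "optimize", "step by step")
    score = min(len(prompt) / 500, 1.0)
    score += 0.5 * sum(cue in prompt.lower() for cue in cues)
    return score

def route(prompt: str) -> dict:
    """Pick a reasoning budget so the user never has to choose a mode."""
    c = estimate_complexity(prompt)
    if c < 0.3:
        return {"mode": "fast", "reasoning_tokens": 0}
    if c < 1.0:
        return {"mode": "standard", "reasoning_tokens": 2_000}
    return {"mode": "deep", "reasoning_tokens": 16_000}

print(route("What is the capital of France?")["mode"])  # fast
print(route("Prove the invariant holds, then debug the loop step by step.")["mode"])  # deep
```

The point of the sketch is the interface, not the heuristic: the caller submits one prompt and the system, not the user, decides how much deliberation to spend on it.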
This leads to OpenAI’s central long-term goal: the automated researcher. Over the next one to five years, the company aims to build an AI system that can autonomously formulate hypotheses, design experiments, and iterate toward novel findings across scientific disciplines. A key measure of success will be the model’s ability to sustain deep, long-form reasoning, extending beyond hours to weeks or even months. Currently, models can reason for one to five hours on high school-level problems and are approaching expert performance. The next frontier is scaling that duration while maintaining consistency and resilience.

As task complexity grows, so does the risk of error. Pachocki stressed that maintaining depth is fundamentally about stability over time: the ability to recover from failure, reassess, and adapt, much like a human researcher tackling a difficult proof. Chen likened this process to solving a complex mathematical problem: try a path, fail, analyze, pivot. That iterative resilience is at the heart of research. And while this capability has been most visible in well-defined domains like math and code, the team sees it evolving into more open-ended, ambiguous fields, where the real challenge is not just solving a problem but defining the right one.

Two key technologies are fueling this progress: reinforcement learning and advanced programming tools. Pachocki highlighted the synergy between large-scale language models and reinforcement learning. “Pre-training gives us a rich, robust understanding of human language—essentially a vast, real-world simulation environment. Once you have that foundation, you can run countless experiments, test ideas, and learn through interaction.” Programming, especially through the newly released GPT-5 Codex, is another cornerstone. Codex is designed not just to write code, but to understand and execute it in real-world contexts, handling messy, ambiguous, and stylistically nuanced software development.
Chen, a former programming competition participant, described how today’s AI can refactor a 30-file codebase in 15 minutes, something that would have taken a human days. This has given rise to “vibe coding,” a mode in which developers rely on AI for most of the work and step in manually only to fill the gaps. This shift is now poised to evolve into “vibe researching,” where AI assists not just in execution but in formulating questions, exploring ideas, and guiding discovery.

Pachocki emphasized that the most critical trait of a researcher is persistence. Research is about exploring high-risk, high-reward paths, learning from failure, and maintaining intellectual honesty. Success comes not from avoiding mistakes, but from the ability to adapt. Chen added that experience shapes judgment: knowing when to persist and when to pivot. This intuition is built through reading seminal papers, engaging with peers, and internalizing diverse approaches. Above all, researchers must care deeply about problems they believe are important, even if those problems are widely considered unsolvable.

OpenAI’s culture, the leaders noted, is built on a foundation of fundamental research, bold experimentation, and long-term thinking. The company attracts talent from diverse backgrounds, including physics, math, and finance, valuing deep technical grounding and intellectual courage over social visibility. It fosters a mix of creative visionaries and meticulous experimentalists, creating a dynamic ecosystem where different research styles thrive. While the company explores multiple fronts, such as diffusion models and code reasoning, these efforts are unified by the overarching goal of building an automated researcher. The path isn’t fixed; exploration is part of the journey.

When asked what they’d prioritize with more resources, both leaders answered without hesitation: compute. Pachocki dismissed the idea that data will soon become the bottleneck.
“We can do what we can do based on compute—and I don’t see that changing.” Chen echoed this: “No one ever says, ‘I have enough compute.’” For OpenAI, access to computational power remains the enduring constraint—and the essential enabler—of breakthrough research.