Now that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to build AI reasoning models, the core technology behind AI agents that can do tasks on a computer the way a human would.
Lightman described MathGen’s early work to TechCrunch: “We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at.”
OpenAI’s models are still far from perfect: the company’s latest AI systems hallucinate, and its agents struggle with complex tasks.
Yet its state-of-the-art models have improved significantly at mathematical reasoning. One of OpenAI’s models recently achieved a gold-medal score at the International Math Olympiad, a math competition for the world’s brightest high school students. Moreover, OpenAI believes this reasoning capability will translate to other subjects and ultimately power the general-purpose agents the company has always dreamed of creating.
“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” OpenAI CEO Sam Altman said at the company’s first developer conference in 2023, adding, “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”
Whether agents will live up to Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough have become some of the most sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs, TechCrunch reported.
The Reinforcement Learning Process
OpenAI’s reasoning models, and the agents built on them, are rooted in a machine learning technique known as reinforcement learning (RL). Reinforcement learning gives an AI model feedback on whether its choices were correct or not in simulated environments.
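The feedback loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration of RL, not anything from OpenAI’s systems: a toy one-step “environment” rewards one of two actions, and the agent’s value estimates shift toward the action that earns reward.

```python
import random

def run_rl_sketch(episodes=500, lr=0.1, eps=0.1, seed=0):
    """Toy RL loop: learn which of two actions the environment rewards."""
    rng = random.Random(seed)
    q = {"left": 0.0, "right": 0.0}  # estimated value of each action
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit the best estimate, sometimes explore
        if rng.random() < eps:
            action = rng.choice(["left", "right"])
        else:
            action = max(q, key=q.get)
        # simulated environment gives feedback: "right" is the correct choice
        reward = 1.0 if action == "right" else 0.0
        # nudge the estimate for the chosen action toward the observed reward
        q[action] += lr * (reward - q[action])
    return q

q = run_rl_sketch()
# After training, the agent values "right" far above "left".
```

Real systems like AlphaGo apply the same feedback principle at vastly larger scale, with deep neural networks standing in for the simple value table here.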
Reinforcement learning is not new, either. In 2016, a few months after OpenAI was founded, AlphaGo, an AI system Google DeepMind built using RL, gained global attention after beating a world champion at the board game Go.
A few years later, in 2018, OpenAI created its first large language model in the GPT series, trained on massive amounts of internet data and large clusters of GPUs. Those early GPT models excelled at processing text and eventually led to ChatGPT.
OpenAI’s breakthrough came in 2023, when it combined its LLMs with reinforcement learning and a technique called test-time computation, which lets a model spend extra compute working through a problem before answering. “I could see the model starting to reason,” said El Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”
OpenAI’s unique combination of those techniques produced an internal effort codenamed “Strawberry,” which directly led to the development of o1.
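One common form of test-time computation can be sketched as “best-of-n” sampling: spend more inference compute by generating several candidate answers and keeping the one a scorer rates highest. The sampler and scorer below are hypothetical stand-ins, not OpenAI’s actual method; the point is only that more samples means more compute and, typically, a better pick.

```python
import random

def sample_answer(rng):
    # stand-in for an LLM sampling one candidate solution
    return rng.gauss(0.5, 0.2)

def score(answer):
    # stand-in for a verifier that rates how correct a candidate looks
    # (here, closeness to an assumed "true" answer of 0.7)
    return -abs(answer - 0.7)

def best_of_n(n, seed=0):
    """Sample n candidates and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [sample_answer(rng) for _ in range(n)]
    return max(candidates, key=score)
```

With a fixed seed, `best_of_n(32)` considers the same first candidate as `best_of_n(1)` plus 31 more, so its best score can only match or improve. That monotone trade of compute for quality is the core idea behind scaling inference-time computation.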
“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”
Soon after the 2023 breakthrough, OpenAI spun up an “Agents” team, led by OpenAI researcher Daniel Selsam, to push the new paradigm further. Notably, even though the team was called “Agents,” it did not initially distinguish between reasoning models and agents as we think of them today.
Eventually, the work of Selsam’s Agents team was folded into a larger project to develop the o1 reasoning model, whose leaders included OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.
“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”
Meanwhile, by late 2024, several leading AI companies had started seeing diminishing returns from models created through traditional pretraining scaling.