OpenAI Introduces New Model o1, Significantly Surpassing GPT-4o

OpenAI has officially unveiled its new generative AI model, previously codenamed Strawberry and now named OpenAI o1.
More precisely, o1 is a family of models. Two of them are available today in ChatGPT and through the OpenAI API: o1-preview and o1-mini (a smaller, more affordable version). To access them, you need a ChatGPT Plus or Team subscription. Enterprise and Edu users will gain access early next week.
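For developers with API access, a minimal sketch of calling the new model could look like the following. It assumes the official openai Python package and an API key exported as OPENAI_API_KEY; the prompt is an arbitrary placeholder, not an example from OpenAI.

```python
# Minimal sketch: querying o1-preview through the OpenAI API.
# Assumes the official `openai` Python package (pip install openai)
# and an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "How many primes are there between 100 and 150?"}
    ],
)

print(response.choices[0].message.content)
# Token usage is reported per request and drives the pricing discussed below.
print(response.usage)
```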
Currently, the capabilities of the o1 chatbot are somewhat limited. It cannot browse the web or analyze files (yet). It is also rate-limited: weekly caps of 30 messages for o1-preview and 50 for o1-mini. Additionally, the models are expensive: on the API, o1-preview costs $15 per million input tokens (three times more than GPT-4o) and $60 per million output tokens (four times more than GPT-4o).
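To put those prices in perspective, here is a back-of-the-envelope cost calculation for a single request. The per-million-token rates are the ones quoted above; the token counts are invented purely for illustration.

```python
# Back-of-the-envelope cost estimate for one o1-preview API request.
# Prices are per million tokens, as quoted above; token counts are
# illustrative placeholders, not measurements.
INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens (o1-preview)
OUTPUT_PRICE_PER_M = 60.00  # USD per 1M output tokens (o1-preview)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt that produces 5,000 output tokens
# (o1's hidden reasoning tokens are billed as output tokens).
print(f"${request_cost(2_000, 5_000):.4f}")  # -> $0.3300
```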
OpenAI states that it plans to make o1-mini accessible to all ChatGPT users, though no release date has been announced yet.
According to OpenAI, o1 avoids some of the reasoning pitfalls that often trip up generative AI models because it can effectively fact-check itself, spending more time examining all aspects of a command or question.
The company reports that o1, which grew out of OpenAI's internal project known as Q*, excels at tasks involving math and programming. However, what truly sets o1, a text-only model, apart from other generative AI models is its ability to "think" before responding to queries.
When given extra time to reflect, o1 can approach tasks comprehensively—planning ahead and executing a sequence of actions over an extended period, helping it arrive at the correct answer. This makes o1 especially suitable for tasks that require synthesizing results from multiple subtasks.
"o1 learns through reinforcement learning, where the system is trained with rewards and penalties to think before answering using a private chain of thought," said Noam Brown, a research scientist at OpenAI.
According to OpenAI, o1 correctly solved 83% of the problems on the qualifying exam for the International Mathematical Olympiad for high school students, compared to only 13% for GPT-4o.
Overall, OpenAI claims that o1 should excel at data analysis, scientific research, and coding tasks.
However, there is a downside: o1 can be slower than other models, depending on the query. Some answers may take the model more than ten seconds. Fortunately, the chatbot version of o1 shows progress by displaying a label for the current subtask it is working on.
Given the unpredictable nature of generative AI models, o1 likely has other flaws and limitations. Noam Brown admitted that o1 also makes mistakes in games like Tic-Tac-Toe and doesn't respond as well to factual knowledge questions as other models.
Interestingly, OpenAI could have shown users o1's raw "chains of thought" but decided against it, opting instead for "model-generated summaries." Why? In its blog, the company cited "competitive advantage" as one of the reasons.
"We acknowledge the downsides of this decision. We strive to partially mitigate them by training the model to incorporate any useful insights from the chain of thought into the response," writes OpenAI.
OpenAI may be first to market with a reasoning model of this kind. But if competitors soon follow with comparable models, the real challenge for the company will be making o1 widely available.