Your professional work may have already been surpassed by AI in 70.9% of cases: In-depth evaluation of GPT-5.2.

36kr
12-12

In the early hours of the morning, OpenAI officially launched its new generation large model, GPT-5.2.

This comes just one month after the release of the previous generation GPT-5.1, marking the arrival of a new era where AI assists human work.

In official benchmark tests, GPT-5.2 achieved a 70.9% win rate on professional work tasks covering 44 occupations, reaching or surpassing the overall performance of human industry experts for the first time. With average enterprise users saving 40-60 minutes daily and heavy users saving more than 10 hours weekly, OpenAI is transforming AI from a "conversational assistant" into a "professional collaborator" capable of creating direct economic value.

Unlike previous iterations, GPT-5.2 no longer simply pursues improvements in general dialogue capabilities, but instead focuses precisely on "professional knowledge-based tasks." OpenAI explicitly stated in its official announcement that this series is "the most powerful model series to date, built for professional knowledge-based tasks."

01 The Tipping Point: The Qualitative Change from "Assistant" to "Expert"

According to data released by OpenAI, average ChatGPT Enterprise users can save 40-60 minutes of work time per day, while heavy users report saving more than 10 hours per week. Behind this data is the transformation of AI's role from "information provider" to "value creator."

The results of the GDPval benchmark were even more disruptive. In this assessment of professional work, which covers 44 occupations across the nine industries that contribute the most to US GDP, GPT-5.2 Thinking achieved a 70.9% win rate, the first time its overall performance has reached or surpassed that of human industry experts.

In comparison, the previous generation GPT-5 had a win rate of only 38.8% in this test.

“This is an exciting leap in quality,” commented a GDPval judge when reviewing the output of GPT-5.2. “It looks like it was done by a company with a professional team, and the layout design is quite stunning.”

Even more astonishing is the efficiency comparison: GPT-5.2 completes these specialized tasks more than 11 times faster than human experts, at less than 1% of the cost. This is not only a technological advance, but also a shift in the underlying economics.

02 A Three-Way Division: A Precisely Matched Professional Matrix

In response to diverse professional scenarios, GPT-5.2 adopts a "three-version" strategy for the first time, forming a professional matrix that covers different needs.

The Instant version is positioned as an "efficiency engine," targeting daily office and learning scenarios. While maintaining the natural conversational style of GPT-5.1, it has significant improvements in information retrieval, operation guides, technical writing, and translation. Early testers specifically pointed out that its explanations are clearer and can present key information from the very beginning.

The Thinking version is the "intelligent hub," designed specifically for deeply complex tasks. It excels in coding, summarizing long documents, mathematical logic derivation, and project planning. In ChatGPT, GPT-5.2 Thinking also features new tools not available in its predecessors, such as the ability to directly generate spreadsheets and presentations.

The Pro version acts as a "top-tier think tank," catering to highly demanding tasks requiring extreme accuracy and reliability. It is currently the most intelligent and trustworthy choice for scientific research, complex mathematical problems, and cutting-edge exploration. Early testing shows it makes fewer major errors and performs better in complex fields such as programming.

This refined division of labor reflects OpenAI's deeper understanding of market demand: rather than relying on a single model to solve every problem, it provides the most suitable solution for each scenario.

03 Five Major Leaps: A Perspective on the Innovation of "Expert-Level" Capabilities

If we summarize the capabilities of GPT-5.2 into five dimensions, we can see a clear "expert evolution roadmap."

In terms of advanced office applications, GPT-5.2 represents a leap from simply "generating text" to "creating deliverables." It can directly create, analyze, and format complex spreadsheets and presentations. In an internal spreadsheet modeling task for junior investment banking analysts, its average score was 9.3 percentage points higher than GPT-5.1.

Side-by-side comparisons show that the spreadsheets and slides GPT-5.2 generates are significantly more sophisticated and better formatted. Whether it's an equity structure table or a project management visualization chart, it produces near-professional-quality output.

In terms of code mastery, GPT-5.2 demonstrates an evolution in capabilities from "assisting in writing" to "leading development." In the SWE-Bench Pro test, which rigorously evaluates real-world software engineering capabilities, it set a new record with a score of 55.6%, compared to 50.8% for its predecessor.

Even more compelling is its practical capability: GPT-5.2 can generate complete single-page applications, such as "Wave Simulator," "Holiday Card Maker," and "Typing Rain Game," based solely on a prompt. Windsurf CEO Jeff Wang commented, "GPT-5.2 represents the biggest leap forward in agentic coding since GPT-5."

Meanwhile, the hallucination rate of GPT-5.2 was significantly reduced. In a set of de-identified ChatGPT queries, the frequency of incorrect answers in GPT-5.2 Thinking was reduced by 38% compared to GPT-5.1 Thinking.

In terms of long-context understanding, GPT-5.2 achieved near 100% accuracy for the first time on the 4-needle variant (up to 256k tokens) of the OpenAI MRCRv2 evaluation. This means that professionals can confidently use it to handle multi-document projects such as long reports, contracts, and research papers.

Breakthroughs in visual understanding have enabled GPT-5.2 to progress from simply "seeing" to truly "understanding." In chart reasoning and software interface comprehension, its error rate is reduced by approximately half compared to GPT-5.1.

The accuracy rate for answering scientific chart questions reached 88.7%, and the accuracy rate for understanding GUI screenshots was 86.3%. Even when faced with low-quality motherboard images, GPT-5.2 can accurately identify the main components and mark their locations, while GPT-5.1 can only identify a few parts.

Mature task orchestration and tool-calling capabilities give GPT-5.2 the characteristics of a true "intelligent agent." In the Tau2-bench Telecom test, it achieved an excellent score of 98.7%, demonstrating its ability to reliably use tools in long-running, multi-turn tasks.

In real-world scenarios, when users raise complex issues involving flight delays, missed connections, lost baggage, and medical seat requests, GPT-5.2 can coordinate a complete workflow (rebooking, arranging special assistance seats, and handling compensation), providing a more comprehensive outcome than its predecessor.

04 Usability and Prospects: Gradual Implementation of Productivity Upgrades

Starting today, the GPT-5.2 series will be rolled out to paid users on ChatGPT, covering Plus, Pro, Go, Business, and Enterprise plans. This new model is now available to all developers on the API platform.

The pricing strategy reflects the improved capabilities: GPT-5.2's API price is $1.75 per million input tokens and $14 per million output tokens, an increase over GPT-5.1. However, OpenAI emphasizes that because the model is more token-efficient, the overall cost of reaching a given quality level in multiple agentic evaluations is actually lower.
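To make the quoted rates concrete, here is a minimal Python sketch that turns token counts into an estimated bill. It uses only the two prices stated above. The function name `estimate_cost_usd` and the example workload are hypothetical, and the sketch ignores any discounts or other billing details not mentioned in this article.

```python
# Rates quoted in the article, in USD per one million tokens.
INPUT_USD_PER_MILLION = 1.75
OUTPUT_USD_PER_MILLION = 14.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for a given number of tokens.

    Hypothetical helper for illustration: it applies the two per-million
    rates linearly and models nothing else about billing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")  # prints $1.05
```

Because output tokens cost eight times as much as input tokens at these rates, a model that reaches the same result with fewer output tokens can end up cheaper overall even at a higher list price, which is the point OpenAI makes about token efficiency.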

In terms of safety, GPT-5.2 carries forward and strengthens existing safeguards. In particular, it significantly reduces undesirable responses in mental health-related conversations. OpenAI is also gradually deploying an age prediction model to automatically apply stricter content protection for minors.

OpenAI's decision to release GPT-5.2 on its tenth anniversary carries significant symbolic meaning, representing a bridge between the past and the future. From GPT to GPT-3, from ChatGPT to GPT-5.2, this company has consistently led the development of AI technology.

As GPT-5.2 is gradually rolled out to hundreds of millions of users worldwide, a clear signal of the times is emerging: AI is no longer just a tool for answering questions or generating text, but an intelligent collaborator capable of understanding complex needs, coordinating multi-step processes, and producing professional results.

The essence of professional work is being redefined, and the core engine of this redefinition has quietly been upgraded to version 5.2.

This article is from the WeChat public account "First Voice", author: Jia Yue, and published with authorization from 36Kr.
