The Programming Revolution Has Erupted: OpenAI's Most Powerful Agent Launches in ChatGPT

05-19

OpenAI's strongest AI programming agent has arrived. Codex is powered by codex-1, a version of o3 optimized for software engineering, and can work on multiple tasks in parallel, completing a software engineering task in as little as half an hour.

Just now, Greg Brockman led a six-person OpenAI team in an online livestream to release Codex, a cloud-based AI programming agent. In Altman's words, the era of a single person creating countless hit applications has arrived.

Codex is powered by the new codex-1 model, a version of o3 tuned specifically for software engineering. It processes multiple tasks safely and concurrently in a cloud sandbox environment, and it accesses your codebase directly through GitHub integration.

It is not just a tool but a "10x engineer" that can simultaneously:

Quickly build feature modules
Answer in-depth questions about a codebase
Precisely fix bugs
Submit PRs
Automatically run tests for verification

Tasks that might previously have taken developers hours or even days can now be completed by Codex in at most 30 minutes.

Codex was trained with reinforcement learning on real-world coding tasks across diverse environments, so the code it generates matches human preferences and fits into standard workflows.

In benchmark tests, codex-1 scored 72.1% on SWE-bench, ahead of Claude 3.7 and o3-high.

Starting today, Codex is available to ChatGPT Pro, Enterprise, and Team users worldwide, with Plus and Edu users to follow soon.

The emergence of Codex may reshape the underlying logic of software development and set off a programming revolution.

In the before-and-after comparison of the code, the change Codex generated is very concise.

By comparison, the change made by o3 is somewhat verbose, and it even adds some "unnecessary" comments to the source code.

Matplotlib

Matplotlib is a comprehensive Python library for creating static, animated, and interactive visualizations.

The issue here was a bug: an incorrect window correction in mlab._spectral_helper.
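The article does not reproduce the diff. As a rough illustration of this class of bug, the sketch below assumes the faulty correction took the absolute value of the window before summing, which only matters for windows that contain negative samples, such as a flat-top window. The helper names and the flat-top coefficients are illustrative assumptions and are not taken from Matplotlib's source.

```python
import math


def hann(n):
    """Hann window: every sample is non-negative."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def flattop(n):
    """Flat-top window (commonly quoted coefficients, assumed here).

    Unlike the Hann window, some of its samples are negative.
    """
    a = (0.21557895, 0.41663158, 0.277263158, 0.083578947, 0.006947368)
    out = []
    for i in range(n):
        x = 2 * math.pi * i / (n - 1)
        out.append(
            a[0]
            - a[1] * math.cos(x)
            + a[2] * math.cos(2 * x)
            - a[3] * math.cos(3 * x)
            + a[4] * math.cos(4 * x)
        )
    return out


def correction_with_abs(window):
    # Assumed buggy form: negative samples are flipped before summing.
    return sum(abs(w) for w in window) ** 2


def correction_plain(window):
    # Assumed corrected form: sum the window as-is, then square.
    return sum(window) ** 2


if __name__ == "__main__":
    for name, make in (("hann", hann), ("flattop", flattop)):
        w = make(64)
        print(name, correction_with_abs(w) - correction_plain(w))
```

For a non-negative window the two forms agree, which is why a bug like this can go unnoticed until a window with negative lobes is used.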

Here too, Codex's change is more concise.

Django

Django is a Python-based web framework. The issue here was that duration-only expressions did not work correctly on SQLite and MySQL.
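For context, SQLite and MySQL have no native interval type, and Django stores a DurationField there as an integer number of microseconds, so the result of duration arithmetic has to be converted back into a timedelta on the Python side. The sketch below illustrates that round trip using only the standard-library sqlite3 module. The table name, column name, and helper functions are made up for illustration; this is neither Django's implementation nor the fix Codex produced.

```python
import sqlite3
from datetime import timedelta


def to_microseconds(delta):
    """Encode a timedelta as an integer count of microseconds."""
    return delta // timedelta(microseconds=1)


def from_microseconds(value):
    """Decode an integer microsecond count back into a timedelta."""
    return timedelta(microseconds=int(value))


def add_duration_in_db(stored, extra):
    """Add two durations inside SQLite and return the result as a timedelta."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: the duration lives in an INTEGER column.
        conn.execute("CREATE TABLE experiment (estimated_time INTEGER)")
        conn.execute(
            "INSERT INTO experiment VALUES (?)", (to_microseconds(stored),)
        )
        # The database only sees integers; the addition is plain arithmetic.
        row = conn.execute(
            "SELECT estimated_time + ? FROM experiment",
            (to_microseconds(extra),),
        ).fetchone()
        return from_microseconds(row[0])
    finally:
        conn.close()


if __name__ == "__main__":
    print(add_duration_in_db(timedelta(hours=1), timedelta(days=1)))
```

The point of the sketch is that both operands must be encoded the same way before the database touches them, and the result decoded the same way afterward; a mismatch at either end is the kind of place a backend-specific bug can hide.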

Codex's fix is again elegant and, in contrast to o3, it first adds the missing dependency call.

Expensify

Expensify is open-source, chat-centered financial collaboration software.

The problem given by OpenAI was "dd [HOLD for payment 2024-10-14] [$250] LHN - Member chat room name not updated in LHN after deleting cache".

Again, Codex identifies the problem and modifies the code more precisely and effectively, while o3 even makes an invalid code change.

OpenAI Team Has Already Adopted It

OpenAI's technical team has begun to incorporate Codex as part of their daily toolkit.

OpenAI engineers most commonly use Codex for repetitive, well-defined tasks that would otherwise interrupt their focus, such as refactoring, renaming, and writing tests.

It is also suitable for building new features, connecting components, fixing bugs, and drafting documentation.

The team is developing new habits around Codex: handling on-call issues, planning tasks at the start of the day, and offloading background work to keep making progress.

By reducing context switching and surfacing forgotten to-do items, Codex helps engineers deliver faster and focus on what matters most.

Before the official release, OpenAI worked with a small group of external testers to evaluate how Codex performs across different codebases, development processes, and team environments:

Cisco, as an early design partner, explored Codex's potential to accelerate its engineering teams' ideation, and provided feedback to OpenAI by evaluating real use cases to help improve the model.
Temporal used Codex to accelerate feature development, debugging, writing and running tests, and refactoring large codebases. Codex also handled complex tasks in the background, helping engineers stay focused and iterate efficiently.
Superhuman used Codex to automate small repetitive tasks, such as improving test coverage and fixing integration failures. It also let product managers make lightweight code changes without involving an engineer (except for code review), improving pairing efficiency.
Kodiak used Codex to speed up work on debugging tools, test coverage, and code refactoring for its autonomous driving system, Kodiak Driver. Codex also serves as a reference tool, helping engineers understand unfamiliar parts of the stack by providing relevant context and change history.

Based on its experience so far, OpenAI suggests assigning tasks with clear boundaries to multiple agents at once, and trying a variety of task types and prompting styles to explore the model's capabilities more fully.
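The idea of handing several well-scoped tasks to agents at the same time can be sketched with ordinary concurrency primitives. The example below is only an illustration of the fan-out pattern: run_agent_task is a hypothetical placeholder, not a Codex API, and the prompts are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent_task(prompt):
    """Hypothetical stand-in for handing one task to one agent.

    A real integration would submit the prompt and wait for a result;
    here it returns a placeholder string so the sketch runs on its own.
    """
    return f"done: {prompt}"


def dispatch(prompts, max_workers=4):
    """Run independent tasks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_agent_task, prompts))


if __name__ == "__main__":
    tasks = [
        "Rename the helper module and update imports",
        "Add unit tests for the date parser",
        "Draft documentation for the export command",
    ]
    for result in dispatch(tasks):
        print(result)
```

The pattern only works well when the tasks do not depend on each other, which is why clear task boundaries matter.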

Provide guidance while a task is running
Collaborate with the AI on implementation strategies
Receive proactive progress updates
Integrate deeply with common tools (such as GitHub, CLI, issue trackers, and CI systems) for convenient task assignment

Software engineering is becoming one of the first industries in which AI significantly improves productivity, which will unlock enormous potential for individuals and small teams.

Meanwhile, OpenAI is also researching with partners how the widespread application of intelligent agents will affect development processes, skill development, and global talent distribution.

References

https://www.youtube.com/watch?v=hhdpnbfH6NU

https://openai.com/index/introducing-codex/

This article is from the WeChat public account "New Intelligence", author: YXH, published with authorization from 36kr.
