AI agents are poised to dominate this year, starting with programming!
Yesterday, OpenAI CEO Altman previewed a new research project that kept everyone on the edge of their seats.
Now the mystery is solved: OpenAI has announced the research preview of Codex in ChatGPT.
Codex is a cloud-based software engineering AI agent that can work on many tasks in parallel, including writing features, answering questions about a codebase, fixing bugs, and submitting pull requests for review. Each task runs in its own dedicated cloud sandbox, with the repository pre-loaded.
The model behind Codex is codex-1, a version of OpenAI o3 optimized for software engineering. It was trained with reinforcement learning on real-world coding tasks in a variety of environments, producing code that closely mirrors human style and PR preferences, follows instructions precisely, and can iteratively run tests until it gets a passing result.
Starting today, Codex is available to ChatGPT Pro, Team, and Enterprise users, with support for Plus and Edu users coming soon.
People seem very excited about OpenAI's new AI agent. One commenter said they were shocked and couldn't wait to try it out; when they were learning to program ten years ago, they never imagined this would be possible.
Others noted that Codex, as a cloud-native AI agent, can actually build, fix, and ship features; it feels like software is starting to write itself at scale.
Some who have tested Codex extensively found that, when it works well, it is almost uncannily good at "simulating" what the code is doing and how it behaves.
Next, let's look at the official examples of Codex handling multiple tasks in parallel:
For example, when asked to "find and fix as many typos and grammatical errors as possible", it checks the codebase for maintainability issues and bugs:
Fixing /diff errors with special characters in filenames:
Creating and using the DEFAULT_ALCATRAZ_TIMEOUT constant:
How Codex Works
Starting today, users can access Codex from the ChatGPT sidebar: enter a prompt and click the "Code" button to kick off a new programming task.
To ask a question about the codebase, click "Ask" instead. Each task is processed in an isolated environment pre-loaded with the user's repository. Codex can read and write files and run commands, including test harnesses, linters, and type checkers. Tasks typically take 1 to 30 minutes to complete, depending on complexity, and users can track Codex's progress in real time.
After completing a task, Codex commits its changes within its dedicated environment. Through citations of terminal logs and test outputs, Codex provides a verifiable evidence trail for every action, making it easy to trace the entire run. Users can then review the results, request further changes, open a GitHub pull request, or integrate the changes directly into their local environment. In the product, the Codex environment can be configured to match the actual development environment as closely as possible.
Codex can be guided by an AGENTS.md file placed in the repository. These text files, similar to README.md, tell Codex how to navigate the codebase, which test commands to run, and how to follow the project's standards. Like human developers, Codex performs best when given a configured development environment, reliable test setups, and clear documentation.
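For a sense of what such a file might contain, here is a minimal illustrative sketch of an AGENTS.md; the section names, tools, and commands below are made-up examples for a hypothetical Python project, not OpenAI's official format:

```markdown
# AGENTS.md (illustrative example)

## Project layout
- `src/` – application code
- `tests/` – pytest suite

## How to test
Run `pytest -q` before submitting; all tests must pass.

## Conventions
- Format code with `black` and lint with `ruff`.
- Keep pull request titles under 72 characters.
```

Because it is plain Markdown, anything a team would put in a contributor guide, such as build steps, test commands, or style rules, can be surfaced to the agent this way.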
In coding assessments and internal benchmarks, codex-1 demonstrated strong performance even without an AGENTS.md file or custom scaffolding.
Building a Safe and Reliable AI Agent
When designing Codex, OpenAI prioritized safety and transparency to enable users to verify its output. Users can check Codex's work through references, terminal logs, and test results.
Compared to o3, codex-1 consistently produces cleaner patches that are ready for immediate human review and integration into standard workflows.
Codex vs o3:
OpenAI states that the Codex agent runs entirely in a secure, isolated cloud container. During task execution, internet access is disabled, so the agent can interact only with code explicitly provided through GitHub repositories and with dependencies pre-installed by user-configured setup scripts; it cannot reach any external websites, APIs, or other services.
How is Codex Priced?
Is Codex expensive to use?
OpenAI announced that starting today it is rolling Codex out to ChatGPT Pro, Enterprise, and Team users globally. For the next few weeks, users will have free access to Codex to explore its capabilities; after that, OpenAI will introduce rate-limited access along with flexible pay-as-you-go options for purchasing additional usage.
For developers building with codex-mini-latest, the model is available through the Responses API, priced as follows:
Input tokens: $1.50 per million
Output tokens: $6 per million
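To make the rates above concrete, here is a small sketch of the per-request cost arithmetic; the token counts in the example are hypothetical, only the two per-million rates come from the announcement:

```python
# Cost estimate for codex-mini-latest using the published rates:
# $1.50 per 1M input tokens, $6.00 per 1M output tokens.

INPUT_RATE = 1.50 / 1_000_000   # USD per input token
OUTPUT_RATE = 6.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical task: 200k tokens of repository context in, a 10k-token patch out.
print(f"${estimate_cost(200_000, 10_000):.2f}")  # → $0.36
```

At these rates, input-heavy workloads such as reading large codebases stay cheap, while cost scales mainly with how much code the model writes back.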
Additionally, OpenAI noted that Codex is still at an early stage of development. As a research preview, it currently lacks certain features, such as the image inputs needed for frontend work and the ability to steer the agent mid-task.
Moreover, remote agents run more slowly than interactive editing, which may take some getting used to. Over time, though, working with Codex agents should come to feel like asynchronous collaboration with colleagues.
Finally, OpenAI indicated plans to launch more interactive and flexible agent workflows in the future.
In the future, programming may indeed become increasingly simpler.
Reference link: https://openai.com/index/introducing-codex/
This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: AI Follower, published with authorization from 36kr.