OpenAI's GPT-5.3-Codex-Spark is now available: Pro users get early access and faster responses.


OpenAI recently announced a collaboration with AI chip startup Cerebras to launch GPT-5.3-Codex-Spark, a smaller variant of GPT-5.3-Codex and OpenAI's first model designed specifically for "real-time programming." It is initially available to ChatGPT Pro users, letting developers try it firsthand.

What is Cerebras, and what is motivating the two companies' collaboration?

OpenAI faces the dual pressures of rapid user growth and limited compute, and urgently needs ultra-low-latency AI inference capacity for real-time interactive scenarios, in order to improve responsiveness across products such as ChatGPT, code generation, and AI agents.

Cerebras's wafer-scale chips eliminate the communication bottlenecks of traditional GPU clusters, delivering faster and more efficient inference. OpenAI and Cerebras have therefore entered a multi-year agreement worth over US$10 billion, procuring up to 750 MW of low-latency compute. The goal is to accelerate complex queries, code generation, and real-time interactive experiences, while also reducing reliance on NVIDIA and strengthening supply-chain resilience.

The collaboration will roll out in phases, with infrastructure build-out starting in 2026 and full deployment by 2028. Cerebras will host the data-center capacity and provide OpenAI with dedicated ultra-low-latency compute, which is already serving inference for the first collaborative model, GPT-5.3-Codex-Spark.

Codex-Spark is designed for real-time collaborative programming, with a dual-track automation mechanism.

OpenAI states that its recent frontier models can autonomously execute complex tasks for extended periods, running for hours, days, or even weeks without human intervention. Codex-Spark, by contrast, is the first model designed specifically for "real-time collaborative programming with Codex": developers can request code changes, logic adjustments, and interface tweaks and see the results immediately. Together, these represent the two automation modes Codex currently offers:

"One type is long-term, long-task automated execution, and the other type is real-time interaction, rapid modification, and instant feedback."

OpenAI said it will gradually expand functionality and access based on feedback from real-world developer usage.

Low-latency resources are limited, and traffic throttling may occur during peak hours.

During the research preview, Codex-Spark offers a 128k context window, supports text input only, and has independent traffic and rate limits that do not consume the standard models' quota. OpenAI also cautions that, because it runs on specialized low-latency compute, queuing or temporary access restrictions may occur during peak periods to preserve overall service stability.
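As a rough illustration only, below is a minimal sketch of what calling such a model through the OpenAI Python SDK could look like. The model identifier "gpt-5.3-codex-spark" is an assumption derived from the product name, not a confirmed API id; during the research preview, API access is limited to design partners.

```python
# Minimal sketch (not an official example): requesting a quick code edit
# from a low-latency Codex-style model via the OpenAI Python SDK.
# The model id "gpt-5.3-codex-spark" is a guess based on the product name;
# during the research preview, API access is limited to design partners.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    messages=[
        {
            "role": "user",
            "content": "Rename the variable tmp to buffer in this function and show a diff.",
        }
    ],
)
print(response.choices[0].message.content)
```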

Codex-Spark optimizes interactive programming, balancing speed and capability.

Codex-Spark is optimized for interactive programming scenarios, on the premise that speed matters as much as capability. Users can interrupt the model mid-run, redirect it in real time, and iterate on changes quickly.
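That interaction pattern maps naturally onto token streaming: the client renders output as it arrives and can abandon the stream the moment the user interrupts. Here is a minimal sketch, again with a hypothetical model id and a placeholder interrupt check:

```python
# Sketch of stream-and-interrupt interaction. The model id is hypothetical,
# and user_interrupted() is a placeholder for a real client's UI event check.
from openai import OpenAI

client = OpenAI()

def user_interrupted() -> bool:
    # A real editor client would poll for a keypress or UI event here.
    return False

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension."}],
    stream=True,
)

for chunk in stream:
    # Render each token fragment as it arrives (content can be None on some chunks).
    print(chunk.choices[0].delta.content or "", end="", flush=True)
    if user_interrupted():
        break  # stop consuming tokens; send a follow-up request with new instructions
```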

To keep responses fast, the system defaults to a lightweight workflow: it makes only the minimum necessary modifications and does not run tests automatically unless the user explicitly asks. Official examples include building a Snake game, planning projects, and translating files. One official example emphasizes:

"When making games, GPT-5.3-Codex-Spark has surpassed its previous model, GPT-5.3-Codex, in terms of coding capabilities and speed."

Performance-oriented design: software optimization paired with low-latency chips.

OpenAI says Codex-Spark significantly reduces overall task-completion time by optimizing the whole path from request submission to response: client-server round-trip overhead is down roughly 80%, per-token processing cost is down roughly 30%, and the time for the first response text to appear after a user submits a request is down roughly 50%, markedly improving interaction smoothness.
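To see how such reductions might compound, here is a back-of-the-envelope calculation using a deliberately simplified additive latency model. All absolute baseline numbers are invented for illustration; OpenAI published only the relative percentages.

```python
# Back-of-the-envelope: how the quoted reductions compound on an assumed
# baseline. All absolute numbers are illustrative; OpenAI reported only
# the relative improvements (~80%, ~30%, ~50%). This treats the three
# components as independent additive terms, which is a simplification.
round_trip_ms = 200.0     # assumed baseline client-server round-trip overhead
per_token_ms = 10.0       # assumed baseline processing cost per token
first_token_ms = 1000.0   # assumed baseline time to first response text
tokens = 500              # assumed response length

before = round_trip_ms + first_token_ms + tokens * per_token_ms
after = round_trip_ms * 0.2 + first_token_ms * 0.5 + tokens * per_token_ms * 0.7

print(f"before: {before:.0f} ms, after: {after:.0f} ms "
      f"({(1 - after / before) * 100:.0f}% faster end to end)")
# before: 6200 ms, after: 4040 ms (35% faster end to end)
```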

On the hardware side, Codex-Spark runs on Cerebras's Wafer Scale Engine 3 low-latency inference platform, integrated into OpenAI's existing production stack. OpenAI explains that GPUs remain the workhorse for training and large-scale, cost-effective inference, while Cerebras covers the ultra-low-latency scenarios; the two can be combined within the same workflow.
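As an illustration of that division of labor, the sketch below routes latency-sensitive interactive turns to a low-latency tier and long-running tasks to GPU capacity. The backend labels and the routing rule are invented for this example; OpenAI has not described how requests are actually dispatched.

```python
# Illustrative routing sketch: one workflow mixing two inference tiers.
# "cerebras-low-latency" and "gpu-batch" are made-up backend labels;
# OpenAI has not published how requests are actually routed.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    interactive: bool  # True for real-time editing turns, False for long tasks

def pick_backend(req: Request) -> str:
    # Latency-sensitive interactive turns go to the ultra-low-latency tier;
    # long-running autonomous tasks stay on cost-effective GPU capacity.
    return "cerebras-low-latency" if req.interactive else "gpu-batch"

print(pick_backend(Request("rename this variable", interactive=True)))    # cerebras-low-latency
print(pick_backend(Request("migrate the whole repo", interactive=False)))  # gpu-batch
```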

Codex-Spark is currently available to ChatGPT Pro users as a research preview, while API access is limited to a small group of design partners for testing. On safety, it has passed OpenAI's standard evaluations and does not reach the internal high-risk capability thresholds. Going forward, OpenAI plans to evolve toward a dual-mode approach that gradually integrates real-time interaction with long-running tasks.


This article, "OpenAI GPT-5.3-Codex-Spark Launched: Pro Users Get Early Access and Faster Responses," first appeared on ABMedia .

Source
Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.
Like
Add to Favorites
Comments