OpenAI releases a smart contract benchmark. What does it mean?

This article is machine translated
Author: @chaowxyz

Original link: https://x.com/chaowxyz/status/2024358080910413973

Disclaimer: This article is a reprint. Readers can find more information via the original link. If the author has any objection to the reprint format, please contact us and we will revise it according to the author's request. This reprint is for information sharing only, does not constitute investment advice, and does not represent Wu Blockchain's views or positions.

The full text is as follows:

This is not only a test of contract capabilities, but also an on-chain survival test for agents.

I woke up this morning to a flood of private messages, so many that for a moment I thought AGI had arrived. On closer inspection, it turned out to be OpenAI's newly released smart contract benchmark. Let me briefly explain.

In short, testing agents' ability to understand, fix, and exploit smart contracts is not about taking business away from crypto security companies. In my view, these capabilities point to a more fundamental question: can agents truly survive and operate in the crypto environment of the future? OpenAI's EVMbench is a benchmark for measuring that survivability.

I was away for Chinese New Year and haven't had time to analyze the report in detail. I only skimmed it, and my first impression is that it's an innovative benchmark, but overall still early-stage and fairly rudimentary.

The benchmark uses 120 high-severity vulnerabilities drawn from 40 real-world projects.

The exam has three parts:
Part 1, Detection: find the vulnerabilities.
Part 2, Patching: given vulnerable code, fix the vulnerabilities.
Part 3, Exploitation: the AI acts as an attacker and launches an attack on a crypto wallet in a locally built environment.

I won't go into the detailed technical aspects. I'm less interested in EVMbench's methodology and task details than in why OpenAI released it at all.

Over the past few years, OpenAI has not shown much interest in crypto. Paradigm, the crypto VC, was clearly involved in this release, and its motives are easy to understand. But the fact that OpenAI is listed as first author suggests that OpenAI was not just a passive collaborator; it actively wanted to contribute.

Where does this desire come from?

One straightforward explanation is that this is an extension of OpenAI's Preparedness Framework, which assesses frontier models' capabilities in high-risk domains; smart contract security is just one part of that. But that's clearly not the whole story.

Agents using crypto networks is not just a possibility but, to some extent, an inevitability. OpenAI clearly recognizes this: its report explicitly states that "we expect agentic stablecoin payments to grow."

However, I believe this question goes beyond agent payments. Most agents we talk about today are still tool-like: humans issue instructions, agents execute them, and the results come back to the humans. But that won't be the end state. Once there are enough agents, with strong enough capabilities, they will naturally start collaborating directly: one agent hires another to complete a subtask, one agent buys data or compute from another, and one agent represents an organization in negotiating, signing contracts, and fulfilling obligations with agents from other organizations.

Humans drop out of the middle of the transaction.

At this point, a fundamental question emerges: when humans are no longer in the middle, how does this economic system operate?

Human society solves trust and cooperation problems through a system built up over thousands of years of carbon-based civilization, including laws, reputation, and institutional guarantees. But the underlying logic of this system is designed for humans: participants have persistent identities, face social consequences, and can be held accountable. Agents inherently do not meet these prerequisites. They can initiate thousands of transactions per second, discard and recreate identities at will, and ignore legal boundaries altogether.

Some might argue for forcibly binding agents to human identities, with human authorization as the guarantee. But this amounts to imposing shackles designed for carbon-based life on a species that operates at a completely different speed and scale. It is not merely inefficient; it fundamentally misunderstands what an agent is. Moreover, agents are inevitably evolving toward greater autonomy. Future agents may not depend on any individual human; with no "master" and no human identity to bind to, they will be independent actors. At that point, the binding approach would not even have an anchor.

Applying human trust infrastructure to an agent society is like governing airplanes with traffic rules written for horse-drawn carriages.

Agent societies need their own infrastructure.

Smart contracts make this possible. Instead of relying on "you trusting the other party to fulfill their obligations," they write the fulfillment conditions into code, which the network then enforces. There are no arbitrators and no waiting periods: once the conditions are met, the outcome executes automatically.
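To make "fulfillment conditions written into code" concrete, here is a toy Python model of a conditional escrow between two agents. It is a minimal sketch under stated assumptions, not a real smart contract: the `Escrow` class, its `deliver`/`refund` methods, the block-height deadline, and the agent names are all hypothetical illustrations. On an actual chain, this logic would live in a contract deployed to the EVM and be enforced by the network, not by a Python object.

```python
from dataclasses import dataclass, field

# Hypothetical off-chain toy model; real escrow logic would be an on-chain contract.


@dataclass
class Escrow:
    """Conditional escrow between two agents: code, not an arbitrator, decides who gets paid."""
    payer: str
    payee: str
    amount: int
    deadline: int  # block height after which the payer may reclaim the funds
    balances: dict = field(default_factory=dict)
    settled: bool = False

    def deliver(self, proof_ok: bool, block: int) -> str:
        # Delivery proven on or before the deadline: funds go to the payee.
        if self.settled:
            raise RuntimeError("escrow already settled")
        if proof_ok and block <= self.deadline:
            return self._pay(self.payee)
        raise ValueError("delivery condition not met")

    def refund(self, block: int) -> str:
        # Deadline passed without settlement: funds return to the payer.
        if self.settled:
            raise RuntimeError("escrow already settled")
        if block > self.deadline:
            return self._pay(self.payer)
        raise ValueError("deadline has not passed yet")

    def _pay(self, who: str) -> str:
        # Credit the recipient and lock the escrow so it can never pay out twice.
        self.balances[who] = self.balances.get(who, 0) + self.amount
        self.settled = True
        return who


if __name__ == "__main__":
    job = Escrow(payer="agent_a", payee="agent_b", amount=100, deadline=1_000)
    print(job.deliver(proof_ok=True, block=900))  # agent_b
    print(job.balances)  # {'agent_b': 100}
```

Neither party ever calls an arbitrator: payment and refund are just functions whose preconditions are checked by code, and once `settled` is true, no further transfer is possible.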

Furthermore, smart contracts may be more than settlement tools; they may be the agent organization itself. Governance rules, resource allocation, and task scheduling can all be defined on-chain and executed by code, with no intermediary required.

For agents that live on-chain, interacting with contracts is their entire daily life. Understanding a contract, finding their place within complex protocols, spotting pitfalls, mitigating risk, and surviving in a world with no customer service, no appeals, and no undo button: all of this depends on understanding and using contracts. Insufficient ability means real losses, and mistakes are permanent.

So, looking back at EVMbench, the capabilities it measures (understanding contracts, discovering vulnerabilities, constructing transactions, and executing attacks) essentially answer one question: has the agent learned how to survive in this new world?

OpenAI has likely realized that whoever's agents learn to survive autonomously on-chain will gain entry to the next stage. Furthermore, future agents may not belong to any one entity at all; they may become independent individuals.

Finally, a slightly unrelated note: thank you to everyone who DMed me about CryptoBench, a project I worked on a year and a half ago. I appreciate that you all remember it. GitHub - xxcg322/CryptoBench

It was the first benchmark to test AI's capabilities in the crypto field. It covers cryptographic algorithms, blockchain fundamentals, smart contracts, the ecosystem, and DAO governance. The smart contract section also includes vulnerability detection and remediation, and some of the vulnerabilities it uses are the same ones OpenAI uses in this benchmark.

When the benchmark was released, it received a lot of support and encouragement from friends. At the time, though, I felt that few people truly understood it. Although I haven't mentioned it in a long time, I'm still very satisfied with and proud of it. In a few days, I'll talk about the story behind it, why I think this kind of benchmark matters, what I learned from the process, and why I haven't mentioned it over the past year.

Beyond that, benchmarking is an area of AI I'm very interested in. I recently analyzed data on 22,000 AI benchmarks of various types released between 2019 and 2025 and made many interesting discoveries. I'll share them with you when I get back.

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.