OpenAI’s First AI Agent, “Operator”, Is Here: It Can Shop, Book Tickets, and Order Food Delivery to Handle Tedious Online Tasks

AI agents are one of the most important tracks in the AI industry and the crypto field this year. Since Anthropic launched "Computer Use", an AI system that can operate computer interfaces the way humans do, last October, the development of AI agents has opened up even broader possibilities.

Today, OpenAI, the leading generative AI company, officially launched its first AI agent, "Operator", which quickly became a major topic in the AI community.

Operator Function and Scope of Use

According to reports, Operator is an AI agent that can autonomously control a browser and perform various tasks for users. Users only need to describe the task they want completed, and Operator handles the rest: booking travel and restaurants on Booking.com, ordering groceries and takeout through Uber, filling out forms, compiling shopping lists, creating memes, and more. It can handle multiple tasks at the same time, much as a person opens multiple tabs in a browser.

It can also remember users' preferences and settings to provide more personalized service, and users can step in at any time to adjust what it is doing or terminate the task.

Beyond convenience, Operator also places great importance on user privacy and security. OpenAI said that users can delete all browsing data and log out of all websites with one click. OpenAI also provides privacy settings that let users turn off the "model improvement" option so that their data is not used for model training.

Operator is currently a research preview, open only to Pro subscribers in the US (the subscription costs $200 per month), who can access it at Operator.ChatGPT.com. It will be expanded to Plus, Team, and Enterprise users in the future.

Operating Principle

Operator is powered by a new model called "Computer-Using Agent (CUA)". CUA combines GPT-4o's vision capabilities with advanced reasoning developed through reinforcement learning, and it is specifically trained to interact with graphical user interfaces (GUIs), including the buttons, menus, and text fields on the screen.

Through screenshots, Operator can "see" the content of the interface, and it can "interact" with it through mouse and keyboard actions, so it can operate web pages without any API integration.

When it encounters challenges or errors, Operator uses its reasoning ability to self-correct; if it cannot solve the problem, it returns control to the user so that the task can be completed collaboratively.
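
The behavior described above (look at a screenshot, decide on a mouse or keyboard action, retry on errors, and hand control back when stuck) can be illustrated with a minimal sketch. This is not OpenAI's implementation or API: the names `Action`, `run_agent`, `take_screenshot`, `choose_action`, and `perform`, as well as the step and retry limits, are all hypothetical and exist only to show the shape of such a loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action kinds: "click", "type", "done", or "ask_user".
    kind: str
    detail: str = ""


def run_agent(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, str, List[str]], Action],
    perform: Callable[[Action], bool],
    task: str,
    max_steps: int = 20,
    max_retries: int = 2,
) -> str:
    """Illustrative perceive-reason-act loop; returns 'done' or 'handed_to_user'."""
    history: List[str] = []
    failures = 0
    for _ in range(max_steps):
        screen = take_screenshot()                    # "see" the interface
        action = choose_action(task, screen, history)  # reason about the next step
        if action.kind == "done":
            return "done"
        if action.kind == "ask_user":
            return "handed_to_user"
        if perform(action):                           # mouse or keyboard action
            failures = 0
            history.append(f"ok:{action.kind}:{action.detail}")
        else:
            # Record the failure so the next decision can try something else.
            failures += 1
            history.append(f"failed:{action.kind}:{action.detail}")
            if failures > max_retries:
                return "handed_to_user"
    # Step budget exhausted: give control back rather than loop forever.
    return "handed_to_user"
```

In this sketch the model itself is represented only by the `choose_action` callback, and the failure history passed back to it stands in for the self-correction step; a real system would be considerably more involved.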

OpenAI said it is working with companies including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to ensure that Operator meets real-world needs while complying with established norms.

Operator Limitations

However, according to entrepreneur Greg Isenberg, Operator also has some limitations. For example, it cannot handle payment or login-related tasks, may get stuck in complex interfaces, cannot solve CAPTCHAs (verification codes), and has a limited number of daily uses. In addition, a launch date for Europe has not yet been set; according to OpenAI CEO Sam Altman, it will "take some time".

Looking ahead, OpenAI will open up an API for developers, continue to enhance Operator's capabilities and expand its user coverage, and eventually integrate the feature directly into ChatGPT.
