Authors: Hu Xiaomeng, Chen Chuyi (Tencent Research Institute)
AI Agents are undoubtedly the most exciting direction in large-model development today. They have been called "the next battleground for large models", "the last killer product" and "the Agent-Centric force that opens a new era of industrial revolution". On November 7, OpenAI's first developer conference (OpenAI DevDay) set off an AI Agent boom. OpenAI released GPTs, an early form of AI Agent product, together with GPT Builder, the tool for making them: users can create a custom GPT simply by chatting with GPT Builder and describing the functions they want. Custom GPTs can be tailored to daily life, specific tasks, work, or the home. To support this, OpenAI also released a large number of new APIs (including vision, DALL·E 3 image generation, and voice), as well as the new Assistants API, making it easier for developers to build their own GPTs. In a recent article, Bill Gates stated plainly that AI Agents will become mainstream within five years and that every user will have a personal AI Agent. Users will no longer need different apps for different needs; they will simply tell their Agent, in everyday language, what they want to do. [1]

Within a week after GPTs was released, it has accumulated more than 17,500
So what exactly is an AI Agent, and why has it drawn so much attention from the industry? Some scholars even assert that "the healthy development of the American Agent Store will continue to widen the gap in large models between China and the United States" [2].
What is an AI Agent?
In computer science and artificial intelligence, "agent" is generally translated into Chinese as "intelligent entity". An agent is defined as a software or hardware entity that, within a given environment, exhibits one or more intelligent characteristics such as autonomy, reactivity, sociality, proactiveness, deliberation, and cognition. [3]
OpenAI defines an AI Agent as a system that uses a large language model (LLM) as its brain, has the capabilities of autonomous perception and understanding, planning, memory, and tool use, and can perform complex tasks automatically. [4] The basic framework of an AI Agent is shown below:

Basic framework of an LLM-driven Agent [5]
It has four main modules: memory, planning, tool use, and action.
(1) Memory. The memory module stores information, including past interactions, learned knowledge, and even temporary task information. An effective memory mechanism lets an agent draw on past experience and knowledge when it faces new or complex situations. For example, a chatbot with memory can remember a user's preferences or earlier conversations and so provide a more personalized and coherent experience. Memory is divided into short-term and long-term memory: a. Short-term memory: all in-context learning relies on the model's short-term memory; b. Long-term memory: this gives the agent the ability to retain and recall (effectively unlimited) information over long periods, usually through an external vector database with fast retrieval, for example to hold the large body of data and knowledge accumulated in a particular industry. With long-term memory, an agent can accumulate large amounts of data, which makes it more useful and gives it advantages such as industry depth, personalization, and specialized capabilities.
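To make the two kinds of memory concrete, here is a minimal Python sketch, assuming a bounded buffer stands in for the context window and a plain in-process list searched by cosine similarity stands in for an external vector database. The class `AgentMemory` and its methods are hypothetical, and the two-dimensional embeddings are hand-written placeholders for the output of a real embedding model.

```python
import math
from collections import deque


class AgentMemory:
    """Toy memory module: a bounded short-term buffer plus a long-term store
    searched by cosine similarity (a stand-in for an external vector database)."""

    def __init__(self, short_term_size: int = 4):
        # Short-term memory: only the most recent turns survive, like a context window.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory: (embedding, text) pairs that are never evicted.
        self.long_term = []

    def remember_turn(self, text: str) -> None:
        self.short_term.append(text)

    def store(self, embedding, text: str) -> None:
        self.long_term.append((embedding, text))

    def recall(self, query, k: int = 1):
        """Return the k long-term entries whose embeddings are most similar to query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self.long_term, key=lambda item: cosine(query, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


memory = AgentMemory(short_term_size=2)
for turn in ["hi", "I like tea", "recommend a drink"]:
    memory.remember_turn(turn)
print(list(memory.short_term))  # ['I like tea', 'recommend a drink']

# Hand-written 2-D "embeddings"; a real system would call an embedding model.
memory.store([1.0, 0.0], "User prefers tea over coffee")
memory.store([0.0, 1.0], "User lives in Shenzhen")
print(memory.recall([0.9, 0.1]))  # ['User prefers tea over coffee']
```

The oldest turn ("hi") has already dropped out of short-term memory, while the long-term store can still surface a stored preference when a related query arrives.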
(2) Planning. The planning module works in two stages: pre-planning and post-reflection. Pre-planning involves predicting and deciding on future actions: when performing a complex task, for example, the agent decomposes a large goal into smaller, manageable sub-goals so that it can plan an efficient sequence of steps or actions to achieve the desired result. In the post-reflection stage, the agent checks its plan for shortcomings, reflects on its mistakes, and draws lessons for improvement; these lessons are added to long-term memory, helping the agent avoid repeating mistakes and update its understanding of the world.
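The two planning stages can be sketched as a loop, assuming a hypothetical `Planner` whose `decompose` method splits a goal on the word "then" (a real agent would ask the LLM to produce the sub-goals) and whose `reflect` method appends a lesson to a list standing in for long-term memory. `run_plan` and its `execute` callback are likewise illustrative names, not part of any framework.

```python
from dataclasses import dataclass, field


@dataclass
class Planner:
    """Toy planning module with a pre-planning and a post-reflection stage."""
    lessons: list = field(default_factory=list)  # stands in for long-term memory

    def decompose(self, goal: str) -> list:
        # Pre-planning (hypothetical rule): split the goal into ordered sub-goals.
        # A real agent would ask the LLM to produce this list.
        return [step.strip() for step in goal.split(" then ") if step.strip()]

    def reflect(self, step: str, succeeded: bool) -> None:
        # Post-reflection: record a lesson for any failed sub-goal.
        if not succeeded:
            self.lessons.append(f"'{step}' failed; add a check before retrying")


def run_plan(planner: Planner, goal: str, execute) -> list:
    """Execute sub-goals in order, reflecting after each; stop at the first
    failure so the agent can re-plan with the new lesson in memory."""
    completed = []
    for step in planner.decompose(goal):
        ok = execute(step)
        planner.reflect(step, ok)
        if not ok:
            break
        completed.append(step)
    return completed


planner = Planner()
done = run_plan(planner, "book flight then reserve hotel then email itinerary",
                execute=lambda step: step != "reserve hotel")
print(done)             # ['book flight']
print(planner.lessons)  # ["'reserve hotel' failed; add a check before retrying"]
```

Stopping at the first failure, rather than pressing on, is a deliberate choice here: later sub-goals often depend on earlier ones, so the recorded lesson is meant to inform a fresh plan.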
(3) Tool use. The tool use module refers to the agent's ability to use external resources or tools to perform tasks. For example, the agent can learn to call external APIs to obtain information missing from the model weights, including current information, code execution, and access to proprietary information sources, thereby compensating for the LLM's own weaknesses. The training data of an LLM, for instance, is not updated in real time; in that case the agent can use tools to search the Internet for the latest information, or use specialized software to analyze large amounts of data. Many digital and intelligent tools are already on the market, and agents can use them more smoothly and efficiently than humans. By calling different APIs or tools, agents can complete complex tasks and produce high-quality results. This ability to use tools is an important feature and advantage of intelligent agents.
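Tool use is often implemented by having the model emit a structured call that the agent runtime then executes. The sketch below assumes the model outputs JSON of the form `{"name": ..., "arguments": {...}}`, loosely modeled on function-calling interfaces; the tool names, the `TOOLS` registry, and `dispatch` are hypothetical.

```python
import json
from datetime import date


# Hypothetical tools; a real agent might wrap web search, a code sandbox, etc.
def get_today() -> str:
    """Current information the model weights cannot contain: today's date."""
    return date.today().isoformat()


def add_numbers(a: float, b: float) -> float:
    """Exact arithmetic delegated to code rather than done by the LLM."""
    return a + b


TOOLS = {"get_today": get_today, "add_numbers": add_numbers}


def dispatch(tool_call: str):
    """Run a tool call that the LLM emitted as JSON: {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call)
    tool = TOOLS.get(call["name"])
    if tool is None:
        # Report the problem back to the model instead of crashing.
        return f"error: unknown tool {call['name']!r}"
    return tool(**call.get("arguments", {}))


print(dispatch('{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}'))  # 5
print(dispatch('{"name": "get_today"}'))  # today's date in YYYY-MM-DD form
print(dispatch('{"name": "search_web", "arguments": {}}'))  # error: unknown tool 'search_web'
```

Returning an error string for unknown tools lets the result be fed back to the model as an observation, so it can correct itself on the next step.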
(4) Action. The action module is the part of the agent that actually carries out decisions or responses. Faced with different tasks, the agent system has a full set of action strategies from which it chooses the action to perform when making a decision, such as memory retrieval, reasoning, learning, or programming.
Overall, these four modules work together to enable agents to take actions and make decisions in a wider range of situations, performing complex tasks in a smarter and more efficient way. [6]
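How the four modules fit together can be illustrated with a bare-bones agent loop. This is a minimal sketch, assuming a hypothetical `toy_policy` function plays the role of the LLM: at each step it reads the goal and the memory of past observations (planning), picks an action, and either calls a tool or finishes. `run_agent`, the `weather` tool, and the step budget are illustrative choices, not the design of any particular framework.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the LLM "brain": given the goal and the memory so far,
# return the next action as (action_name, argument).
Policy = Callable[[str, List[str]], Tuple[str, str]]


def run_agent(goal: str, policy: Policy, tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    memory: List[str] = []                   # memory: observations gathered so far
    for _ in range(max_steps):
        action, arg = policy(goal, memory)   # planning: decide what to do next
        if action == "finish":               # action: deliver the final answer
            return arg
        if action not in tools:
            memory.append(f"unknown action {action!r}")
            continue
        observation = tools[action](arg)     # tool use: act on the outside world
        memory.append(f"{action}({arg}) -> {observation}")
    return "gave up: step budget exhausted"


def toy_policy(goal: str, memory: List[str]) -> Tuple[str, str]:
    # Rule-based placeholder for an LLM: look up the weather once, then answer.
    if not memory:
        return "weather", "Beijing"
    return "finish", f"{goal}: {memory[-1].split('-> ')[-1]}"


tools = {"weather": lambda city: f"sunny in {city}"}  # hypothetical tool
print(run_agent("Weather report", toy_policy, tools))  # Weather report: sunny in Beijing
```

The step budget is a simple safeguard: without it, a policy that never chooses "finish" would loop forever.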
AI Agents will bring wider human-machine integration
Agents built on large models will not only give everyone a dedicated intelligent assistant that enhances their capabilities, but will also change the model of human-machine collaboration and bring about broader human-machine integration. As the generative AI revolution has unfolded, three modes of human-machine collaboration have emerged:
(1) Embedding mode. Users communicate with AI in natural language, setting goals through prompts, and the AI then helps them complete those goals. For example, ordinary users enter prompts into generative AI to create novels, music, 3D content, and so on. In this mode, the AI acts as a tool that carries out instructions, while humans play the role of decision-makers and commanders.
(2) Copilot mode. In this mode, humans and AI are more like partners: both participate in the workflow, each playing its own role. The AI is embedded in the workflow, from offering suggestions to assisting at each stage of the process. In software development, for example, AI can help programmers write code, detect bugs, or optimize performance. Humans and AI work together throughout, complementing each other's capabilities, and the AI acts more like a knowledgeable partner than a mere tool.
In fact, Microsoft first introduced the Copilot concept on GitHub in 2021 with GitHub Copilot, an AI service that helps developers write code. In May 2023, powered by large models, Microsoft's Copilot line received a comprehensive upgrade with the launch of Dynamics 365 Copilot, Microsoft 365 Copilot, Power Platform Copilot, and others, and Microsoft put forward the idea that "Copilot is a new way of working". Life, like work, also needs a "Copilot": Li Zhifei, founder of Mobvoi, believes the best job for large models is to act as a "Copilot" for humans.
(3) Agent mode. Humans set goals and provide the necessary resources (such as computing power), the AI then does most of the work independently, and humans oversee the process and evaluate the final results. In this mode, the AI fully exhibits the interactive, autonomous, and adaptive characteristics of an intelligent agent and comes close to being an independent actor, while humans act mainly as supervisors and evaluators.

Three ways for humans and AI to collaborate [7]
Judging from the functional analysis of the agent's four main modules above (memory, planning, tool use, and action), agent mode is undoubtedly more efficient than embedding mode and copilot mode, and may become the main mode of human-machine collaboration in the future.
With Agent-based human-machine collaboration, any ordinary individual may become a "super individual": someone with their own AI team and automated task workflows, who builds smarter, more automated collaborative relationships with other super individuals through Agents. Many one-person companies and super individuals are already actively exploring this space. On GitHub there are Agent-based automated teams such as the GPTeam project, which uses large models to create multiple agents with assigned roles and functions that collaborate to achieve predetermined goals. Dev-GPT, for example, is a multi-agent team for automated development and operations, with roles such as a product manager Agent, a developer Agent, and an operations Agent. Such a multi-agent team can support the normal operations of a start-up marketing company: in effect, a one-person company. Another example is NexusGPT, which bills itself as the world's first AI freelance platform. [8] The platform integrates AI-native data from various open-source databases and offers more than 800 AI agents with specific skills. On it you can find experts in different fields, such as designers, consultants, and sales representatives, and employers can hire an AI agent at any time to help them complete all kinds of tasks.
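The role-based collaboration described for Dev-GPT can be sketched as agents taking turns on a shared message history. In this minimal sketch, each hypothetical `RoleAgent` is a role name plus a function from the history to its contribution; the fixed lambdas stand in for LLM calls with role-specific prompts, and the strict round-robin order is a simplifying assumption rather than how GPTeam or Dev-GPT actually schedule their agents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoleAgent:
    role: str
    # Hypothetical: maps the shared message history to this role's contribution.
    # A real system would call an LLM with a role-specific prompt here.
    act: Callable[[List[str]], str]


def run_team(task: str, team: List[RoleAgent]) -> List[str]:
    """Each agent reads the shared history, then appends its own message in turn."""
    history = [f"client: {task}"]
    for agent in team:
        history.append(f"{agent.role}: {agent.act(history)}")
    return history


team = [
    RoleAgent("product_manager", lambda h: "spec: landing page with signup form"),
    RoleAgent("developer", lambda h: "built " + h[-1].split("spec: ")[1]),
    RoleAgent("ops", lambda h: "deployed build from " + h[-1].split(": ")[0]),
]
for line in run_team("launch a marketing site", team):
    print(line)
# client: launch a marketing site
# product_manager: spec: landing page with signup form
# developer: built landing page with signup form
# ops: deployed build from developer
```

Because every agent sees the full history, each role can build on the previous role's output, which is the essence of the division of labor described above.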

AI Agents will change the rules of the software game and drive AI toward becoming infrastructure
AI Agents are redefining software. Bill Gates believes that AI Agents will completely upend the software industry, changing both how we use software and how we write it. [9]
AI Agents will shift the paradigm of software architecture from process-oriented to goal-oriented. Existing software (including apps) fixes its processes through a series of predefined instructions, logic, rules, and heuristic algorithms to ensure that its results meet user expectations; that is, users operate step by step according to the built-in logic to reach their goal. Such a process-oriented architecture offers high reliability and determinism. However, it can only be applied to vertical domains and cannot be generalized to all fields. Balancing standardization against customization has therefore become one of the problems facing the SaaS industry.
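One way to picture the shift is to contrast a fixed, process-oriented pipeline with a goal-oriented system that is handed a goal and works out the steps itself. The sketch below uses a classical breadth-first planner over actions with preconditions and effects as a stand-in for an LLM generating a plan; the action names, `FIXED_PIPELINE`, and `plan` are hypothetical.

```python
from collections import deque
from typing import List, Optional

# Process-oriented: the developer fixes every step in advance.
FIXED_PIPELINE = ["load_data", "build_chart", "export_pdf"]

# Goal-oriented: available actions described by (preconditions, effect).
# These actions are hypothetical examples.
ACTIONS = {
    "load_data": (frozenset(), "data_loaded"),
    "build_chart": (frozenset({"data_loaded"}), "chart_ready"),
    "export_pdf": (frozenset({"chart_ready"}), "pdf_ready"),
    "send_email": (frozenset({"pdf_ready"}), "email_sent"),
}


def plan(goal: str, start: frozenset = frozenset()) -> Optional[List[str]]:
    """Breadth-first search for the shortest action sequence that makes `goal` true."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, path = queue.popleft()
        if goal in facts:
            return path
        for name, (pre, effect) in ACTIONS.items():
            if pre <= facts and effect not in facts:
                nxt = facts | {effect}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None  # goal unreachable with the available actions


print(FIXED_PIPELINE)       # always the same three steps
print(plan("email_sent"))   # ['load_data', 'build_chart', 'export_pdf', 'send_email']
print(plan("chart_ready"))  # ['load_data', 'build_chart']
```

The fixed pipeline can never send an email because nobody scripted that step, whereas the planner reaches any goal that the available actions can achieve, which is the flexibility the goal-oriented paradigm promises.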

Paradigm shift in software architecture [10]
Under the AI Agent paradigm, software development is gradually shifting from human-led functional development to AI as the main driving force. With the large model as the technical infrastructure and the Agent as the core product form, the task hierarchy of predefined instructions, logic, rules, and heuristic algorithms in traditional software is evolving into goal-oriented intelligent agents that generate solutions autonomously. As a result, whereas the original architecture can solve only a limited range of tasks, the future architecture can solve tasks in an unlimited domain. [11] In the future software ecosystem, Agents will not only be the top layer through which everyone interacts; the entire industry, including underlying technology, business models, middleware, and even people's habits and behaviors, will reorganize around Agents. This is the beginning of the Agent-Centric era. [12]

Comparison of the RPA (Robotic Process Automation) and APA (Agentic Process Automation) paradigms [13]
Take ChatDev, an intelligent software development platform and the first "large model + agent" SaaS-level product released by Mianbi Intelligence, as an example. The platform works like a software company staffed entirely by AI Agents, with roles such as CEO, CTO, development manager, product manager, test specialist, and supervisor. Users only need to state their requirements clearly to the CEO Agent, which then organizes the entire software development process around them. Users receive the finished software product together with the code produced throughout development, and every step is automated. [14] This will enable the software industry to cut production costs, improve customization, and enter a "3D printing" era of software.
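The ChatDev workflow above, where a CEO role accepts a requirement and later roles iterate until the product passes testing, can be sketched as a draft-and-test loop. This is a minimal sketch under stated assumptions: `toy_programmer` and `toy_tester` are hypothetical stand-ins for LLM-driven role agents, the first draft is deliberately buggy to show the feedback cycle, and none of these function names come from ChatDev itself.

```python
from typing import Callable


def chatdev_style_pipeline(requirement: str,
                           write_code: Callable[[str, str], str],
                           run_tests: Callable[[str], str],
                           max_rounds: int = 3) -> dict:
    """CEO accepts the requirement; programmer drafts code; tester reports
    problems; repeat until tests pass or the round budget runs out."""
    log = [f"CEO: requirement accepted: {requirement}"]
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        code = write_code(requirement, feedback)
        log.append(f"programmer: draft {round_no}")
        feedback = run_tests(code)
        log.append(f"tester: {feedback or 'all tests passed'}")
        if not feedback:
            return {"code": code, "log": log}
    return {"code": None, "log": log}


def toy_programmer(requirement: str, feedback: str) -> str:
    # Hypothetical: the first draft has a bug, fixed once feedback arrives.
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


def toy_tester(code: str) -> str:
    namespace = {}
    exec(code, namespace)  # acceptable in a toy; never exec untrusted code
    return "" if namespace["add"](2, 3) == 5 else "add(2, 3) should return 5"


result = chatdev_style_pipeline("a function that adds two numbers", toy_programmer, toy_tester)
for entry in result["log"]:
    print(entry)
# CEO: requirement accepted: a function that adds two numbers
# programmer: draft 1
# tester: add(2, 3) should return 5
# programmer: draft 2
# tester: all tests passed
```

Both the code and the full log are returned, mirroring the point that users receive not only the product but the record of the development process.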
Prospects and challenges of AI Agent
AI Agents are an important driving force in turning artificial intelligence into infrastructure. Looking back at the history of technology, the endpoint of a technology is to become infrastructure: electricity, for example, has become as invisible as air yet essential, and so has cloud computing. This process goes through three stages: the innovation and development stage, in which a new technology is invented and begins to be applied; the popularization and application stage, in which the maturing technology spreads widely across fields and has a profound impact on society and the economy; and the infrastructure stage, in which the technology becomes so common that it is almost everywhere and turns into infrastructure, an indispensable part of daily life. Almost everyone agrees that artificial intelligence will become the infrastructure of future society, and agents are driving that transformation. This is not only because Agents make software cheap to produce, but also because they can adapt to different tasks and environments and learn to optimize their own performance, so they can be applied across a wide range of fields and become basic support for industries and social activities.

Overview of artificial intelligence agent applications [15]
Going forward, agents may evolve in two directions at once. The first is toward agents that assist humans by performing various tasks, emphasizing their attributes as tools; the second is toward anthropomorphism, with agents capable of independent decision-making, long-term memory, and certain personality-like traits, emphasizing human-like or even superhuman attributes.
In terms of technical optimization, iteration, and deployment, however, the development of AI Agents still faces several bottlenecks:
First, as OpenAI's GPTs also show, LLMs' complex reasoning capabilities are not yet strong enough and their latency is too high, which holds back the true maturation of Agent applications. This is where the industry's next breakthroughs in engineering optimization and technical research need to happen.
Second, multi-agent systems still face major difficulties. Multi-agent systems are a very complex area of academic research, and as agents spread to the mass market this has become a pressing practical problem. For example, Stanford's virtual town study involves 25 agents. After the town framework was open-sourced, however, developer tests found that a single Agent consumes about US$20 worth of tokens per day, because it requires a great deal of memory and deliberation over actions. That is more than many human workers cost, and bringing it down will require optimization of both the Agent framework and LLM inference.
Breaking through the multi-agent bottleneck is an important prerequisite for building the future Agent Society. Multi-agent collaboration can form an intelligent society, the highest form of techno-social system. Such a society is complex, dynamic, self-organizing, and adaptive, capable of collaboration, competition, and continuous evolution. In it, intelligent agents can perform complex, flexible tasks in response to goals and environmental changes, and engage in high-level, multi-dimensional interaction and collaboration with humans and other agents. An intelligent society not only helps humans explore and expand the physical and virtual worlds, but also enhances and extends human capabilities and experiences.
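The US$20-per-agent figure reported for Stanford's virtual town scales quickly across a whole simulation. The back-of-the-envelope sketch below assumes that figure applies uniformly to all 25 agents and stays constant over time; `town_cost` is an illustrative helper, not part of the town framework.

```python
AGENTS_IN_TOWN = 25           # agents in the Stanford virtual town
USD_PER_AGENT_PER_DAY = 20.0  # developer-reported token cost per agent (assumed constant)


def town_cost(days: int, agents: int = AGENTS_IN_TOWN,
              usd_per_agent_per_day: float = USD_PER_AGENT_PER_DAY) -> float:
    """Token cost in USD of running every agent for `days` days,
    assuming the per-agent daily cost stays constant."""
    return agents * usd_per_agent_per_day * days


print(town_cost(1))   # 500.0
print(town_cost(30))  # 15000.0
```

Under these assumptions, a single month of the 25-agent town already costs about US$15,000 in tokens, which is why the text calls for optimizing both the Agent framework and LLM inference.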
At the same time, these trends suggest that AI Agents may face many challenges, including security and privacy, ethics and responsibility, and impacts on the economy and employment.
(1) Security and privacy. These are key properties of an intelligent agent, crucial to its stable operation and to the protection of users and society, and they directly affect how far AI agents can be trusted and controlled. If an AI agent has vulnerabilities, is attacked, or leaks data, it may harm users or society. Shortly after OpenAI released GPTs, for example, a security vulnerability led to the leakage of data that users had uploaded.
(2) Ethics and responsibility. These are the core principles of an intelligent agent, determining its values and goals as well as how it respects and protects users and society, and they directly affect its credibility and controllability. If an agent behaves unfairly, opaquely, or unreliably, users or society may reject the technology. Attribution of responsibility is another key issue: unclear or unfair allocation of responsibility in human-agent collaboration can have serious consequences.
(3) Impact on the economy and employment. An important challenge for the future of work is competition between humans and agents. The emergence of the AI freelance platform NexusGPT, for example, has put pressure on traditional freelancers. As more and more intelligent agents take part in future work, employers may try to reduce human input for reasons of efficiency and effectiveness. As agent technology matures, we must think ahead about the long-term impact of these developments on society and on individual careers.

Since the release of ChatGPT, a watershed moment, the number and income of writers and editors on global freelance platforms have fallen off a cliff [16]