If AIGC (AI-generated content) has ushered in an era of intelligent content generation, AI Agents have the opportunity to truly commercialize AIGC's capabilities.
An AI Agent is like a more capable and versatile employee. Often described as an early form of AI robot, it can observe its surroundings, make decisions, and take actions autonomously, much as a human would.
Bill Gates once said, "Mastering the AI Agent is the real achievement. By then, you will no longer need to search the Internet for information yourself." Other leading figures in AI share these high hopes. Microsoft CEO Satya Nadella has predicted that AI Agents will become the main way humans interact with computers, understanding user needs and proactively providing services. Professor Andrew Ng has likewise predicted that in the workplace of the future, humans and AI Agents will collaborate more closely, forming a more efficient way of working.
AI Agents are not merely a product of technology; they are at the core of how we will live and work in the future.
This recalls the early days of Web3 and blockchain, when the word "disruption" was often used to describe the technology's potential. Over the past few years, Web3 has evolved from early building blocks such as ERC-20 tokens and zero-knowledge proofs to sectors such as DeFi, DePIN, and GameFi that integrate with other fields.
If Web3 and AI, two of today's hottest digital technologies, are combined, will the whole be greater than the sum of its parts (1+1>2)? Can increasingly well-funded Web3 AI projects bring new use-case paradigms to the industry and create genuine demand?
AI Agent: The ideal intelligent assistant for humans
What makes AI Agents so exciting? A popular answer online goes: "A large language model can only program Snake, but an AI Agent can program an entire King of Glory." It sounds exaggerated, but it is not.
In China, "Agent" is usually translated as "intelligent body." The concept was proposed by Marvin Minsky, the "father of artificial intelligence," in his 1986 book The Society of Mind. Minsky argued that certain individuals in a society can arrive at a solution to a problem through negotiation, and that these individuals are agents. For many years, agents have been a cornerstone of human-computer interaction. From Microsoft Office's assistant Clippy to Google Docs' automatic suggestions, these early agents showed the potential of personalized interaction, but their ability to handle complex tasks remained limited. Only with the emergence of large language models (LLMs) has the true potential of agents been unlocked.
Earlier this year, Professor Andrew Ng, a leading scholar in the field of AI, gave a talk on AI Agents at Sequoia Capital's AI Ascent event in the United States, where he presented a series of experiments conducted by his team.
The AI was asked to write and run code, and the results of different LLMs and workflows were compared:
GPT-3.5 model: 48% accuracy
GPT-4 model: 67% accuracy
GPT-3.5 + Agent workflow: higher accuracy than the GPT-4 model alone
GPT-4 + Agent workflow: far better than the GPT-4 model alone
The difference comes down to the workflow. Most people use LLMs such as ChatGPT in a single pass: they enter a prompt, and the model immediately generates an answer, with no step in which it identifies and corrects its own errors, deletes, or rewrites.
In contrast, the AI Agent workflow looks like this:
First, the LLM writes an outline for the article. If necessary, it searches the Internet for research and analysis, produces a first draft, then reads the draft and considers how to improve it. This cycle repeats over several iterations until it outputs a high-quality article with rigorous logic and as few errors as possible.
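The outline, draft, critique, and revise cycle described above can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the workflow used in Ng's experiments: `call_llm` is a hypothetical stand-in for any model API, `fake_llm` is a hypothetical stub that returns canned strings (approving the third draft) so the example runs offline, and the web-search step is omitted.

```python
from typing import Callable


def reflective_write(task: str, call_llm: Callable[[str], str],
                     max_rounds: int = 3) -> tuple[str, int]:
    """Outline, draft, then critique and revise until approved or out of rounds.

    `call_llm` is a hypothetical stand-in for any LLM API: prompt in, text out.
    Returns the final draft and the number of critique rounds used.
    """
    outline = call_llm(f"Write an outline for: {task}")
    draft = call_llm(f"Write a first draft from this outline:\n{outline}")
    for round_no in range(1, max_rounds + 1):
        critique = call_llm(
            f"Critique this draft. Reply APPROVED if no changes are needed:\n{draft}")
        if critique.strip() == "APPROVED":
            return draft, round_no
        # The critique is fed into the next revision prompt.
        draft = call_llm(
            f"Revise the draft using this critique:\n{critique}\n\nDraft:\n{draft}")
    return draft, max_rounds


def fake_llm(prompt: str) -> str:
    """Offline stub: canned replies keyed on the prompt's opening words."""
    if prompt.startswith("Write an outline"):
        return "1. Intro 2. Body 3. Conclusion"
    if prompt.startswith("Write a first draft"):
        return "draft v1"
    if prompt.startswith("Critique"):
        # Approve only once the third version exists.
        return "APPROVED" if prompt.endswith("draft v3") else "Tighten the argument."
    if prompt.startswith("Revise"):
        # Bump the version number at the end of the prompt's draft.
        version = int(prompt.rsplit("v", 1)[1]) + 1
        return f"draft v{version}"
    return ""
```

Running `reflective_write("an article on AI Agents", fake_llm)` returns `("draft v3", 3)`: the first two critiques trigger revisions and the third approves. The `max_rounds` cap matters in practice, because a real critic may never reply with the exact approval token.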
The difference, then, is that an LLM interacts with humans prompt by prompt, whereas an AI Agent only needs to be given a goal and can think and act independently toward it. It breaks the task down into a detailed step-by-step plan, relies on feedback from the outside world and its own reasoning, and writes its own prompts to achieve the goal.
Accordingly, OpenAI researcher Lilian Weng has described an AI Agent as a system that uses an LLM as its brain, with the ability to autonomously perceive, plan, remember, and use tools, so that it can carry out complex tasks on its own.
When AI evolves from a tool into an entity that can itself use tools, it becomes an AI Agent. This is why AI Agents can become the ideal intelligent assistant for humans. For example, an AI Agent can learn and remember a user's interests, preferences, and daily habits from their past online interactions, identify their intentions, proactively make suggestions, and coordinate multiple applications to complete tasks.
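The definition above (an LLM "brain" that plans, remembers, and uses tools toward a goal) can be illustrated with a tiny agent loop. This sketch assumes nothing about any particular framework: the `plan` callable plays the role of the LLM, while `toy_plan` (a hard-coded planner) and `calculator` (a tool) are hypothetical stand-ins so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]


@dataclass
class Agent:
    # `plan` stands in for the LLM "brain": given the goal and memory so far,
    # it returns (tool_name, tool_input), or ("finish", answer) when done.
    plan: Callable[[str, list[str]], tuple[str, str]]
    tools: dict[str, Tool]
    memory: list[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 5) -> str:
        for _ in range(max_steps):
            action, arg = self.plan(goal, self.memory)
            if action == "finish":
                return arg
            observation = self.tools[action](arg)                    # act with a tool
            self.memory.append(f"{action}({arg}) -> {observation}")  # remember feedback
        return "gave up"


def toy_plan(goal: str, memory: list[str]) -> tuple[str, str]:
    """Hypothetical hard-coded plan for '17 * 23 + 4'; a real agent would ask an LLM."""
    if not memory:
        return "calculator", "17 * 23"
    previous = memory[-1].rsplit("-> ", 1)[1]   # last observation
    if len(memory) == 1:
        return "calculator", f"{previous} + 4"
    return "finish", previous


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates 'a + b' or 'a * b' on integers, without eval()."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return str(a + b if op == "+" else a * b)
```

With `Agent(plan=toy_plan, tools={"calculator": calculator})`, the agent first computes `17 * 23`, stores the observation in memory, reuses it to compute `391 + 4`, and then finishes with `"395"`. Replacing `toy_plan` with a function that prompts an LLM with the goal and memory is what would turn this skeleton into a working agent.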
As Gates envisioned, we will no longer need to switch between applications for different tasks. We will simply tell our computers and phones, in everyday language, what we want to do, and the AI Agent will respond in a personalized way based on the data the user is willing to share.
One-person unicorn companies are becoming a reality
AI Agents can also help enterprises build a new, intelligent operating model centered on human-machine collaboration. More and more business activities will be handled by AI, while humans focus on decisions about corporate vision, strategy, and critical paths.
As OpenAI CEO Sam Altman once said in an interview, with the development of AI we are about to enter the era of "single-person unicorns," that is, companies founded by a single person and valued at $1 billion.
It sounds like fantasy, but with the help of AI Agents, the idea is becoming a reality.
Suppose I am starting a tech company. Traditionally, I would need to hire software engineers, product managers, designers, marketers, salespeople, and finance staff, each with their own responsibilities but all coordinated by me.
With AI Agents, I might not need to hire any employees at all.
Devin — Automated Programming
Instead of hiring a software engineer, I might use Devin, the AI software engineer that has drawn wide attention this year, to handle all the front-end and back-end work.
Developed by Cognition Labs, Devin is billed as the "world's first AI software engineer." It can handle an entire software development job on its own: analyzing problems, making decisions, writing code, and fixing bugs, which can greatly reduce developers' workload. Cognition Labs raised $196 million within just half a year, and its valuation quickly soared into the billions of dollars. Its investors include well-known venture capital firms such as Founders Fund and Khosla Ventures.
Although Devin has not yet released a public version, another recently popular Web2 product, the AI code editor Cursor, offers a glimpse of its potential. Cursor can do almost all the work for you, turning a simple idea into working code in a few minutes: you give the instructions and "sit back and enjoy the results." According to reports, an eight-year-old with no programming experience used Cursor to write the code and build a website.
Hebbia — file handling
Instead of hiring a product manager or finance staff, I might choose Hebbia to organize and analyze all my documents.
Unlike Glean, which focuses on document search within the enterprise, Hebbia Matrix is an enterprise-grade AI Agent platform that uses multiple AI models to help users efficiently extract, structure, and analyze data and documents, boosting enterprise productivity. Impressively, Matrix can process millions of documents at once.
Hebbia completed a $130 million Series B round in July this year, led by a16z, with participation from well-known investors such as Google Ventures and Peter Thiel.
Jasper AI — Content Generation
Instead of hiring social media staff and designers, I might choose Jasper AI to generate content.
Jasper AI is an AI Agent writing assistant designed to help creators, marketers, and businesses streamline content generation and improve productivity and creative efficiency. It can generate many types of content in the style the user requires, including blog posts, social media posts, advertising copy, and product descriptions. It can also generate images from the user's description to complement the text.
Jasper AI has raised $125 million and reached a valuation of $1.5 billion in 2022. Reportedly, it has helped users generate more than 500 million words, making it one of the most widely used AI writing tools.
MultiOn — Web Automation
Instead of hiring an assistant, I might choose MultiOn to manage daily tasks, arrange schedules, set reminders, and even plan business trips, automatically booking hotels and ride-hailing cars.
MultiOn is an AI agent for automating web tasks that can act autonomously in any digital environment. It can help users complete personal tasks such as online shopping and booking appointments, or simplify routine work to improve productivity.
Perplexity — search, research
Instead of a researcher, I would probably choose Perplexity, which Nvidia CEO Jensen Huang uses every day.
Perplexity is an AI search engine that understands a user's question, breaks it down, then searches for and synthesizes content into a report that gives the user a clear answer.
Perplexity suits many kinds of users. For example, students and researchers can streamline information retrieval when writing, and marketers can obtain reliable data to support their strategies.
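Perplexity's internals are not public, but the decompose, search, and synthesize pattern described above can be sketched in a few lines. Everything here is a hypothetical toy: `decompose` splits on " and " where a real system would ask an LLM, `search` does keyword matching over an in-memory `corpus` dict instead of querying the web, and the "report" simply lists each sub-question with its cited sources.

```python
def decompose(question: str) -> list[str]:
    """Naive decomposition: split a compound question on ' and '.
    A real system would ask an LLM to break the question down."""
    core = question.rstrip("?")
    return [part.strip() for part in core.split(" and ")]


def search(query: str, corpus: dict[str, str]) -> list[str]:
    """Return IDs of documents containing every query word (case-insensitive)."""
    words = query.lower().split()
    return [doc_id for doc_id, text in corpus.items()
            if all(word in text.lower() for word in words)]


def answer(question: str, corpus: dict[str, str]) -> str:
    """Assemble a tiny report: each sub-question followed by its cited sources."""
    lines = []
    for sub in decompose(question):
        hits = search(sub, corpus)
        cited = ", ".join(f"[{hit}]" for hit in hits) or "no sources found"
        lines.append(f"{sub}: {cited}")
    return "\n".join(lines)


# Hypothetical mini-corpus standing in for web search results.
corpus = {
    "d1": "Agent frameworks such as AutoGen coordinate several LLM agents",
    "d2": "Decentralized compute markets rent out idle GPUs",
    "d3": "Compute markets and agent frameworks are both growing",
}
```

Here `answer("agent frameworks and compute markets?", corpus)` yields one line per sub-question, `agent frameworks: [d1], [d3]` and `compute markets: [d2], [d3]`; a production system would then have an LLM write prose from the retrieved passages.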
Of course, this scenario is still imaginary. These AI agents are not yet capable enough to replace top talent in their respective fields. As Li Bojie, co-founder of Logenic AI, put it, current LLMs are only at an entry level, far from expert level. For now, AI agents are more like employees who work fast but are not very reliable.
Even so, these AI agents, each with its own strengths, are already helping users work more efficiently and conveniently across a variety of scenarios.
It is not only technology companies: every industry can benefit from the AI Agent wave. In education, AI Agents can provide personalized learning resources and guidance based on each student's progress, interests, and abilities. In finance, they can help users manage personal finances, offer investment advice, and even predict stock trends. In healthcare, they can help doctors diagnose diseases and design treatment plans. In e-commerce, they can serve as intelligent customer service, using natural language processing and machine learning to automatically answer inquiries and handle order issues and return requests, improving service efficiency.
Multi-Agent: The Next Step for AI Agents
In the single-person unicorn scenario described above, a single AI Agent runs into limits on complex tasks and struggles to meet real needs. Simply using several independent AI Agents does not solve the problem: because they are built on heterogeneous LLMs, collective decision-making is difficult, so a human has to act as a dispatcher, coordinating agents that serve different scenarios. This gave rise to Multi-Agent frameworks.
Complex problems often require combining many kinds of knowledge and skills, which a single AI agent with limited capabilities cannot handle. By organically combining AI agents with different capabilities, a Multi-Agent system lets each agent play to its strengths while the others compensate for its weaknesses, so complex problems can be solved more effectively.
This closely mirrors real workflows and organizational structures: a leader assigns tasks, people with different abilities take on different tasks, each step's output is passed to the next, and the final result emerges at the end.
In practice, lower-level AI Agents carry out their respective tasks, while higher-level AI Agents assign tasks and supervise their completion.
Multi-Agent systems can also mimic human decision-making. Just as we consult others when we face a problem, multiple AI agents can simulate collective decision-making and give us better-informed support. Microsoft's AutoGen, for example, supports this:
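The leader-and-workers structure above can be sketched as a pipeline in which a supervising step checks each worker's output before handing it to the next stage. The roles, the lambda "workers", and the `review` rule below are hypothetical placeholders for LLM-backed agents; this is a minimal sketch, not any framework's API.

```python
from typing import Callable

Worker = Callable[[str], str]


def run_pipeline(goal: str, workers: list[tuple[str, Worker]],
                 review: Callable[[str, str], bool]) -> list[str]:
    """The higher-level 'manager' passes each stage's output to the next worker
    and checks every result before accepting it. Returns a log of the stages."""
    log = []
    artifact = goal
    for role, worker in workers:
        result = worker(artifact)
        if not review(role, result):          # manager supervises completion
            raise RuntimeError(f"{role} output rejected")
        log.append(f"{role}: {result}")
        artifact = result                     # output becomes the next stage's input
    return log


# Hypothetical lower-level agents; real ones would call LLMs.
workers = [
    ("pm", lambda goal: f"spec for {goal}"),
    ("dev", lambda spec: f"code from {spec}"),
    ("qa", lambda code: f"tested {code}"),
]
review = lambda role, result: bool(result.strip())   # reject empty output
```

`run_pipeline("a todo app", workers, review)` ends with `"qa: tested code from spec for a todo app"`, showing how each stage builds on the previous one. A real manager agent would use an LLM both to split the goal into tasks and to judge each result.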
Ability to create AI Agents with different roles. These AI Agents have basic conversation capabilities and can generate responses based on received messages.
Use GroupChat to create a group chat environment with multiple AI Agents. Within the GroupChat, an administrator agent (the GroupChatManager) manages the chat history, decides the speaking order of the other AI Agents, and ends the conversation.
Applied to the single-person unicorn idea, we can use a Multi-Agent architecture to create several AI agents with different roles, such as project manager, programmer, or supervisor. We tell them our goals and let them work out how to solve the problems. We only need to listen to their reports and, whenever we have feedback or they get something wrong, ask them to revise until we are satisfied.
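AutoGen's own classes for this are `GroupChat` and `GroupChatManager`; the sketch below does not use the library. It is a standalone, hypothetical toy that shows the administrator role described above: a manager keeps the shared history, picks the next speaker (simple round-robin here, an assumption; AutoGen can also let an LLM choose), and ends the chat when a message contains a termination keyword.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoleAgent:
    name: str
    reply: Callable[[list[str]], str]   # sees the shared history, returns a message


class ChatManager:
    """Toy administrator: keeps the chat history, chooses who speaks next
    (round-robin) and stops when a message contains the stop word."""

    def __init__(self, agents: list[RoleAgent], max_turns: int = 10,
                 stop_word: str = "TERMINATE"):
        self.agents = agents
        self.max_turns = max_turns
        self.stop_word = stop_word
        self.history: list[str] = []

    def run(self, task: str) -> list[str]:
        self.history.append(f"user: {task}")
        for turn in range(self.max_turns):
            speaker = self.agents[turn % len(self.agents)]   # speaking order
            message = speaker.reply(self.history)
            self.history.append(f"{speaker.name}: {message}")
            if self.stop_word in message:                    # end the chat
                break
        return self.history


# Hypothetical role agents with canned replies in place of LLM calls.
planner = RoleAgent("planner", lambda history: "plan: build the API first")
coder = RoleAgent("coder", lambda history: "code written")
reviewer = RoleAgent("reviewer", lambda history: "looks good, TERMINATE")
```

`ChatManager([planner, coder, reviewer]).run("build a todo app")` produces four history entries, ending with the reviewer's terminating message. The `max_turns` limit is a safeguard in case no agent ever emits the stop word.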
Compared with a single AI Agent, a Multi-Agent system offers:
Scalability: Handle larger problems by increasing the number of AI agents, each handling a portion of the task, allowing the system to scale as demand grows.
Parallelism: Naturally supports parallel processing, multiple AI agents can work on different parts of a problem at the same time, thus speeding up problem solving.
Decision Improvement: Enhance decision making by aggregating insights from multiple AI agents, each with their own perspective and expertise.
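Two of the advantages listed above, parallelism and decision improvement, can be shown with the Python standard library alone. In this sketch an "agent" is any function from a prompt to an answer, an assumption standing in for an LLM call: subtasks are fanned out on a thread pool, and a question is settled by majority vote across several agents.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

AgentFn = Callable[[str], str]   # hypothetical: prompt in, answer out


def solve_in_parallel(subtasks: list[str], agent: AgentFn, workers: int = 4) -> list[str]:
    """Parallelism/scalability: each subtask is handled by a concurrent agent call."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(agent, subtasks))   # results keep the input order


def vote(question: str, agents: list[AgentFn]) -> str:
    """Decision improvement: ask every agent and keep the most common answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    return Counter(answers).most_common(1)[0][0]
```

For example, `vote("2 + 2?", [lambda q: "4", lambda q: "4", lambda q: "5"])` returns `"4"`. Threads fit this case because real agent calls mostly wait on network I/O; majority voting is only one aggregation rule, and weighting agents by their past accuracy is another option.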
As AI technology continues to advance, it is conceivable that the Multi-Agent framework will play a greater role in more industries and promote the development of various new AI-driven solutions.
The AI Agent wave reaches Web3
Outside the laboratory, however, AI Agents and Multi-Agent systems still have a long way to go.
Even setting Multi-Agent systems aside, the most advanced single AI Agent faces a hard physical ceiling on the computing resources available to it, which cannot be expanded indefinitely. When faced with extremely complex, compute-intensive tasks, an AI Agent will inevitably hit computing bottlenecks, and its performance will drop sharply.
Furthermore, AI Agent and Multi-Agent systems are essentially centralized architectures, which leaves them highly exposed to single points of failure. More importantly, the monopolistic business model built on the closed-source large models of companies such as OpenAI, Microsoft, and Google seriously threatens the survival of independent AI Agent startups, and prevents AI Agents from drawing on vast private corporate data that would make them smarter and more efficient. AI Agents urgently need a democratic, collaborative environment in which truly valuable agents can serve a wider range of people and create greater value for society.
Finally, although AI Agents are closer to real-world industry applications than LLMs, they are built on LLMs. The large-model sector is characterized by high technical barriers, heavy capital investment, and immature business models, so AI Agent projects often struggle to raise the funding needed for continuous updates and iteration.
The Multi-Agent paradigm offers Web3 an excellent entry point for supporting AI, and many Web3 development teams are already investing in research and development to provide solutions in these areas.
AI Agent and Multi-Agent systems usually require substantial computing resources to make complex decisions and process tasks. Using blockchain and other decentralized technologies, Web3 can build a decentralized compute market in which computing resources are distributed and used more fairly and efficiently worldwide. Web3 projects such as Akash, Nosana, Aethir, and IO.net can supply computing power for AI Agent decision-making and inference.
Traditional AI systems are usually managed centrally, which exposes AI Agents to single points of failure and data privacy problems. Web3's decentralized nature can make Multi-Agent systems more decentralized and autonomous: each AI Agent can run independently on a different node and carry out user requests on its own, which improves robustness and security. Incentive and penalty mechanisms for stakers and delegators, built on consensus mechanisms such as PoS and DPoS, can further promote the democratization of single-agent and Multi-Agent systems.
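As an illustration of how a decentralized compute market might match an agent's job to a provider, here is a simplified reverse auction: providers post bids, and the cheapest bid that can cover the whole job wins. This is a hypothetical sketch, not the matching protocol of Akash, Nosana, Aethir, IO.net, or any other project; the `Bid` fields and provider names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    provider: str               # hypothetical provider name
    gpus: int                   # GPUs the provider can offer
    price_per_gpu_hour: float


def award(gpus_needed: int, hours: float, bids: list[Bid]) -> tuple[str, float]:
    """Simplified reverse auction: the cheapest bid able to serve the whole
    job wins. Returns (provider, total cost)."""
    eligible = [bid for bid in bids if bid.gpus >= gpus_needed]
    if not eligible:
        raise ValueError("no provider can serve this job")
    best = min(eligible, key=lambda bid: bid.price_per_gpu_hour)
    return best.provider, best.price_per_gpu_hour * gpus_needed * hours
```

With bids `Bid("A", 8, 2.0)`, `Bid("B", 2, 1.0)`, and `Bid("C", 4, 1.5)`, a 4-GPU, 10-hour job goes to `C` for a total of 60.0: `B` is cheaper but too small. Real markets add reputation, escrow, and partial fills on top of this basic matching step.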
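The incentive and penalty idea above can be sketched as a node that runs an AI Agent and is backed by its operator's stake plus delegations. This is a generic, hypothetical model of staking economics (an operator commission, pro-rata rewards, proportional slashing), not the rules of any specific PoS or DPoS chain.

```python
from dataclasses import dataclass, field


@dataclass
class AgentNode:
    """A node running an AI Agent, backed by its operator's and delegators' stakes."""
    operator: str
    commission: float                               # operator's cut of rewards, e.g. 0.1
    stakes: dict[str, float] = field(default_factory=dict)

    def total_stake(self) -> float:
        return sum(self.stakes.values())

    def distribute(self, reward: float) -> dict[str, float]:
        """Incentive: the operator takes its commission; the rest is shared pro rata."""
        total = self.total_stake()
        if total <= 0:
            raise ValueError("node has no stake")
        cut = reward * self.commission
        pool = reward - cut
        payout = {who: pool * stake / total for who, stake in self.stakes.items()}
        payout[self.operator] = payout.get(self.operator, 0.0) + cut
        return payout

    def slash(self, fraction: float) -> None:
        """Penalty: misbehavior burns the same fraction of every stake,
        so delegators share the risk of backing a bad node."""
        for who in self.stakes:
            self.stakes[who] *= (1 - fraction)
```

For a node with stakes `{"op": 100.0, "alice": 300.0}` and a 10% commission, a reward of 40 pays `op` 13 (4 in commission plus 9 pro rata) and `alice` 27; slashing 50% then halves both stakes. Making delegators bear slashing losses is what gives them a reason to back only well-behaved agent nodes.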
In this regard, GaiaNet, Theoriq, PIN AI, and HajimeAI have all made cutting-edge attempts.
Theoriq is an "AI for Web3" project. Through its Agentic Protocol, it aims to build a system for invoking and paying AI Agents, bring agents to many Web3 use cases, and provide verifiable model inference for Web3 dApps.
GaiaNet is a node-based environment for creating and deploying AI Agents that aims to protect the intellectual property and data privacy of experts and users, as a counterweight to the centralized OpenAI GPT Store.
HajimeAI builds on both approaches: it constructs AI Agent workflows around real needs and makes user intents themselves intelligent and automated, echoing the "personalized AI" idea championed by PIN AI.
Meanwhile, Modulus Labs and ORA Protocol have made progress on zkML (zero-knowledge machine learning) and opML (optimistic machine learning), respectively, for verifying AI Agent inference.
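zkML attaches a cryptographic proof that an inference was computed correctly, whereas opML accepts a result optimistically and relies on anyone being able to challenge it during a dispute window. The sketch below illustrates only the optimistic idea, under simplifying assumptions: the model is deterministic, a challenger simply re-runs it, and proven fraud forfeits the submitter's bond. Real opML designs typically narrow disputes with an interactive bisection game rather than a full re-run; the `Claim` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Claim:
    submitter: str
    model_input: str
    claimed_output: str
    bond: float
    status: str = "pending"     # pending -> finalized or rejected


def challenge(claim: Claim, recompute: Callable[[str], str]) -> bool:
    """A challenger re-runs the deterministic model on the same input.
    Returns True if fraud is proven, in which case the bond is slashed."""
    if claim.status == "pending" and recompute(claim.model_input) != claim.claimed_output:
        claim.status = "rejected"
        claim.bond = 0.0
        return True
    return False


def finalize(claim: Claim, window_closed: bool) -> None:
    """With no successful challenge before the window closes, accept the result."""
    if claim.status == "pending" and window_closed:
        claim.status = "finalized"
```

With a toy model such as `lambda text: text[::-1]`, an honest claim that `"abc"` maps to `"cba"` survives the challenge and is finalized, while a claim of `"xyz"` is rejected and loses its bond. The trade-off versus zkML is cheaper verification in exchange for a waiting period before results are final.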
Finally, developing and iterating AI Agent and Multi-Agent systems often requires substantial funding, and Web3 can help promising AI Agent projects secure valuable early support by providing liquidity up front.
Both Spectral and HajimeAI have proposed products that support issuing AI Agent assets on-chain: by issuing tokens through an IAO (Initial Agent Offering), an AI Agent can raise funds directly from investors and join DAO governance, giving investors the chance to take part in the project's development and share in its future profits. HajimeAI's Benchmark DAO, in particular, aims to combine decentralized AI Agent scoring with AI Agent asset issuance through crowdfunding and token incentives, creating a Web3-based closed loop for AI Agent financing and cold start, which is also a relatively novel attempt.
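How an IAO splits tokens among investors depends on each project's design, and neither Spectral's nor HajimeAI's mechanism is reproduced here. As a minimal, hypothetical sketch, assume a fixed number of tokens is sold at a single price, so each investor receives tokens in proportion to the funds contributed. Amounts are integer base units to avoid floating-point rounding, and any remainder from rounding down is returned separately, for example to a treasury.

```python
def allocate_iao(contributions: dict[str, int],
                 tokens_for_sale: int) -> tuple[dict[str, int], int]:
    """Hypothetical pro-rata allocation for a fixed-price token sale.

    Each investor's share is rounded down; the leftover tokens are returned
    as the second element (e.g. for a project treasury).
    """
    raised = sum(contributions.values())
    if raised <= 0:
        raise ValueError("nothing was raised")
    shares = {who: tokens_for_sale * amount // raised
              for who, amount in contributions.items()}
    remainder = tokens_for_sale - sum(shares.values())
    return shares, remainder
```

For instance, `allocate_iao({"a": 2, "b": 1}, 100)` returns `({"a": 66, "b": 33}, 1)`. Real offerings usually add vesting schedules, caps per investor, or bonding-curve pricing on top of a split like this.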
AI's Pandora's box has been opened. Everyone inside it is both excited and uncertain, and no one knows whether this enthusiasm will prove an opportunity or a hidden reef. The days of raising money on a slide deck alone are over in every industry: however cutting-edge a technology is, it realizes its value only when it is put into practice. The future of AI Agents is destined to be a long marathon, and Web3 is helping ensure they stay in the race.