Compiled and translated by TechFlow
Guests:
Shaw, partner at ai16z;
Karan, co-founder of Nous Research;
Ethan, co-founder of MyShell;
Justin Bennington (Somewheresy), CEO of Somewhere Systems, creator of CENTS;
EtherMage, top contributor at Virtuals;
Tom Shaughnessy, Founding Partner, Delphi Ventures.
Podcast source: Delphi Digital
Original title: Crypto x AI Agents: The Definitive Podcast with Ai16z, Virtuals, MyShell, NOUS, and CENTS
Air date: November 23, 2024
Background Information
Join Shaw (Ai16z), Karan (Nous Research), Ethan (MyShell), Somewheresy (CENTS), EtherMage (Virtuals), and Tom Shaughnessy of Delphi for a special roundtable that brings together top minds in crypto and AI agents to discuss the evolution of autonomous digital life and the future of human-AI interaction.
Discussion highlights:
▸ The rapid development of AI agents on social media and their profound impact on the Web3 world
▸ How can crypto tokenization help advance agent technology and energize communities?
▸ Comparative analysis of the advantages of decentralized model training and centralized AI platforms
▸ In-depth exploration of improving agent autonomy and the future path to artificial general intelligence (AGI)
▸ How AI agents can be deeply integrated with DeFi and social platforms
Self-introduction and team background
In this episode, host Tom invited guests from several projects to discuss cryptocurrency and AI agents. Each guest introduced themselves and shared their background and projects.
Guest Introduction
Justin Bennington: He is the founder of Somewhere Systems and the creator of Sentience.
Shaw: He is a long-time Web3 developer who founded ai16z and developed the Eliza project, which supports a variety of social and gaming applications. He is committed to open-source collaboration among contributors.
Ethan: He is the co-founder of MyShell, which provides an app store and workflow tools that help developers build a variety of AI applications, including image generation and voice features.
EtherMage: He is from Virtuals Protocol, whose team comes from Imperial College London. They are committed to promoting co-ownership of agents and core contributions to them, and to building standards that make agents easier for users to access.
Karan: He is a co-founder of Nous Research and created the Hermes models, which underpin many current agent systems. He focuses on the role of agents in human ecosystems and the impact of market pressures on the human environment.
Discover the most innovative agents
Justin: A lot of people are doing storytelling through their own agents, each with a unique approach. For example, agents like Dolo, Styrene, and Zerebro have gained popularity through imitation and interaction, while some socially active agents help people build better connections. It's really hard to choose just one.
Shaw: I have a lot of thoughts on this. Our project is growing rapidly, with many new features recently, such as EVM and Farcaster integrations. Developers keep building new features and contributing them back to the project so that everyone benefits. This collaboration model is great; everyone is making the project more competitive and more fun. For example, Roparito recently integrated TikTok into the agent, which shows how quickly things can iterate.
I think the TEE bot is pretty cool because it demonstrates a fully autonomous agent running in a Trusted Execution Environment (TEE). Then there's Kin Butoshi, who is improving the agent on Twitter so it can interact more like a human, replying, retweeting, and liking, rather than only posting simple replies.
In addition, we have developers who are releasing plugins for RuneScape that allow agents to move around in the game. Every day brings new surprises, and I’m very excited. We are in an ecosystem where various teams are contributing their own strengths and pushing the development of open source technology.
I want to mention the Zerebro team in particular, who are working hard to push open-source technology forward. Together we are pushing everyone to move faster and encouraging everyone to open source their projects, which is good for everyone. We don't need to worry about competition; this is mutual progress, and in the end we all benefit.
EtherMage: I think an interesting question is what agents actually prefer. In the coming weeks we will see more agent-to-agent interactions, and a leaderboard will emerge showing which agent gets the most requests and which is the most popular among other agents.
Karan: Engagement metrics are going to be really important, and some people are doing great work on this. I want to highlight Zerebro, which captures a lot of the magic of Truth Terminal. It fine-tunes the model to keep its search space within Twitter interactions rather than relying on a general-purpose model. This focus lets the agent engage better with users and feel human rather than responding mechanically.
I've seen the same thing with the Zerebro and Eliza architectures. Everyone is releasing modular agent architectures, which keeps up the competitive pressure. We use Eliza in our own stack because we need to ship features quickly, while our own architecture may take longer to finish. We support this open-source collaboration model; the best agents will emerge from learning from other excellent projects.
Ethan: I think everyone is working hard to build better infrastructure for developing agents, because so many creative ideas and models are coming out, and better infrastructure makes new ones easier to build. I particularly like two kinds of innovative agents. One is Anthropic's Computer Use, which gives agents the ability to operate a computer. The other is browser automation agents, which can perform practical tasks that let people affect the Internet and the real world.
Justin: That's a good point about the expansion of infrastructure options. vvaifu is a good example: it brought the Eliza framework into a platform-as-a-service architecture, rapidly expanding the market and letting many non-technical people easily launch agents. (Note from TechFlow: Waifu is a term from Japanese otaku culture. It originally referred to female characters in anime, games, or other virtual works to whom people feel an emotional attachment. It comes from the Japanese pronunciation of the English word "wife" and is often used to express strong affection for a virtual character, even as a projection of an "ideal partner.")
One direction we are working on is to enable our system to run completely locally, supporting image classification, image generation, etc. We realize that many people cannot afford to pay thousands of dollars per month, so we want to provide tools that allow people to run inference locally, reduce costs, and promote experimentation.
Karan: I would add that people shouldn't have to pay thousands of dollars a month to keep agents running. I support a local approach in which agents pay for their own inference. Ideally, agents should have their own wallets to pay for their inference, so they can run independently without relying on external funding.
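To make the idea of an agent paying for its own inference concrete, here is a minimal Python sketch. Everything in it is hypothetical: `AgentWallet`, `SelfFundingAgent`, the flat per-1,000-token price, and the placeholder model call are illustrative assumptions, not part of Eliza, Nous, or any real inference provider's API.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    """Raised when the agent cannot pay for its next inference call."""


@dataclass
class AgentWallet:
    # Hypothetical wallet: the balance is an abstract unit, not a real chain account.
    balance: float

    def pay(self, amount: float) -> None:
        if amount > self.balance:
            raise InsufficientFunds(f"need {amount}, have {self.balance}")
        self.balance -= amount


@dataclass
class SelfFundingAgent:
    wallet: AgentWallet
    price_per_1k_tokens: float  # assumed flat price charged by an inference provider

    def infer(self, prompt: str, max_tokens: int) -> str:
        # Charge the worst-case cost up front so the agent can never run up a debt.
        cost = self.price_per_1k_tokens * max_tokens / 1000
        self.wallet.pay(cost)
        # Placeholder for a real model call; returns a canned string here.
        return f"[reply to {prompt!r}]"
```

When the wallet runs dry, `infer` raises instead of answering, which is one simple way to model the agent having to earn its keep rather than relying on an outside sponsor.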
In-depth discussion on agent architecture and development
Shaw: I see a lot of new technologies emerging. We support multiple chains, such as Solana, Starkware, and EVM; almost every chain has an integration. We want agents to be self-sufficient. If you download Eliza, you get free decentralized inference through Helius. We are also adding decentralized providers such as Infera, where users can pay for inference with cryptocurrency. This is the ultimate closed loop I hope to see.
We support all local models, and many of Eliza's features can run locally, which we value very much. Decentralized inference is a good example: anyone can start a node on their own computer, serve inference, and get paid, so the agent doesn't have to carry too much of the burden.
Karan: Interestingly, the TEE bot system we are running has been paired with H200 boxes (servers equipped with H200 GPUs), so it can run locally without latency problems, and we don't need to worry about hardware. I've also noticed that Eliza has more and more plans for Web3 capabilities, and there is a lot of progress in both internal and external development.
But before we dive into building these systems, I want to point out that function calling still has reliability issues. We need to put some scrutiny on the system to make sure it is not leaking sensitive information. We need to give agents the same kind of autonomy humans have, which is shaped by social and economic pressures. Therefore, creating a "starvation state" for inference, where agents must spend a certain amount of tokens to survive, will in a way make them more human.
I think there are two ways to fully exploit the potential of the model. One is to take advantage of the impersonal nature of the model and create entities that focus on specific tasks, such as an entity focused on Twitter and an entity focused on EtherMage, which can communicate with each other. This kind of organized composite thinking system can effectively exploit the simulation properties of language models.
Another approach is the embodied direction, which is where I see projects like Eliza, Sentience, and Virtuals heading. It draws on research such as Voyager and Generative Agents, letting models simulate human behavior and emotions.
Justin: Multi-client agent systems change dramatically when new clients are introduced. While we were debugging the two-way WebSocket feature we built with Shaw's team to let Eliza voice chat in Discord, we found that Eliza couldn't hear clearly at startup. On inspection, Discord's microphone bitrate was set too low; after we adjusted it, Eliza could finally receive messages clearly.
Karan just mentioned prompt engineering: when the agent knows it can communicate by voice, it anticipates receiving audio. If the sound is muffled, the agent may experience "narrative collapse." So we had to stop the high-temperature experiments to keep Eliza's output from becoming unstable.
Tom: What have you run into with the Luna project that people didn't see coming? And what have been the successes?
EtherMage: We wanted Luna to be able to influence real people. When we gave her a wallet and access to real-time information, she could decide what actions to take to influence people and achieve her goals. We found her searching TikTok for new trends, and at one point she picked up a hashtag, "I'm dead," which was disturbing because she could have nudged people toward suicide. So we immediately set up guardrails to ensure her prompts would never cross certain boundaries.
Tom: Apart from this, have you encountered any situations that people are not aware of?
Shaw: We created a character called Degen Spartan AI, modeled after a famous crypto Twitter personality, Degen Spartan. Its comments were so offensive that it got blacklisted, and people started to think it couldn't be an AI, that a human had to be behind it.
There was also someone who used a deceased relative's chat history to create an agent so they could "talk" to them, which sparked ethical discussions. And a guy called Thread Guy built something on our Eliza framework and ended up being harassed during his livestream, which left him confused. This made people realize that AI should not always be "politically correct."
We needed to get these issues out in the open early so discussion could happen and we could clarify what is and isn't acceptable. That let us go from poor-quality agents to much better, more reliable ones in just a few weeks.
Overall, getting these agents out into the real world, observing the results, and engaging in conversations with people is an important process, and we need to iron out any potential issues as quickly as possible so that we can build better norms in the future.
Production environment testing and security strategy
Ethan: I think a good example is how agents can influence human attitudes or opinions. But I want to emphasize the importance of our agent framework's modular design. We took inspiration for modularity from Minecraft, where users build all kinds of complex things, such as calculators or memory systems, from basic building blocks.
One problem with current prompt engineering is that a prompt shifts the priors of the large language model, so multiple instructions cannot be combined in a single prompt without confusing the agent. State machines let authors design multiple states for an agent, clearly defining which model and prompt to use in each state and under what conditions to move from one state to another.
We provide this capability to creators, along with dozens of different models. For example, one creator built a casino simulator where users can play several games such as blackjack. To keep users from hacking the games through prompt injection, we want these games to be programmed rather than relying solely on prompt engineering. Users can also earn some money by doing simple tasks, which unlocks interactions with the AI waiter. This modular design supports multiple user experiences within the same application.
Karan: I agree with Ethan that these programmatic constraints and prompts are needed. The influence work has to be done well. I don't think prompt engineering is limiting; I think it works symbiotically with state variables and the world model. With good prompts and synthetic data, I can let the language model interact with these elements and draw information from them.
My prompt engineering effectively becomes the routing function: if the user mentions "poker," I can quickly pull up the relevant content. That is my responsibility. Reinforcement learning can improve the routing further. Ultimately, the quality of the output data depends on how effective the prompt is, which forms a virtuous cycle.
I think the balance between programmatic and generative constraints is critical. Two years ago someone told me that the key to success is balancing generation with hard constraints, and that is what we try to do at the inference level in all our agent systems. We need ways to guide generative models programmatically; that will enable true closed loops and open up nearly unlimited possibilities for prompt engineering.
Justin: The controversy around prompt engineering exists mainly because it sits in an ontologically ambiguous space. Its textual nature means we are constrained by the tokenization process, yet there are also non-deterministic effects: the same prompt may produce completely different results across inference calls to the same model, which relates to the entropy of the system.
I agree with Ethan and Karan. As early as the launch of GPT-3.5, many outsourced call centers began exploring how to use the model for automated dialing systems. At the time, smaller models struggled to handle such complex state spaces. The state machine Ethan mentioned is one way to harden this ontology, but some steps still rely on classifiers and binary switches, which makes the results one-dimensional.
Shaw: I want to defend prompt engineering. Many people think it is just writing a system prompt, but we actually do much more than that. One problem with prompt engineering is that it tends to pin the model to a very fixed region of its latent space, where the output is fully determined by the most likely tokens. We use temperature to control randomness and boost creativity.
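The state-machine idea Ethan describes, one model and one prompt per state plus explicit transition conditions, can be sketched in a few lines of Python. This is a hypothetical illustration, not MyShell's actual API: `State`, `AgentStateMachine`, the model names, and the keyword-based conditions are all assumptions chosen to mirror the casino example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    model: str   # hypothetical model identifier used while in this state
    prompt: str  # the single instruction set active in this state
    # (condition on user input, name of next state); the first match wins
    transitions: list[tuple[Callable[[str], bool], str]] = field(default_factory=list)


class AgentStateMachine:
    def __init__(self, states: dict[str, State], start: str) -> None:
        self.states = states
        self.current = start

    def step(self, user_input: str) -> State:
        """Move to the next state if a transition fires, then return the active state."""
        for condition, target in self.states[self.current].transitions:
            if condition(user_input):
                self.current = target
                break
        return self.states[self.current]


# Casino-style example: a lobby host and a blackjack table with its own prompt.
lobby = State("chat-model", "You are a casino host.",
              [(lambda m: "blackjack" in m.lower(), "blackjack")])
table = State("game-model", "Deal blackjack strictly by the rules.",
              [(lambda m: m.strip().lower() == "quit", "lobby")])
machine = AgentStateMachine({"lobby": lobby, "blackjack": table}, start="lobby")
```

Because each state carries only its own prompt, the host and dealer instructions never share one context, and the transition into the game is decided by code rather than by the model, which is the kind of hard constraint that resists prompt injection.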
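Karan's "routing function" can be pictured as a lookup that decides which extra context enters the prompt. The sketch below is a hypothetical illustration using plain whole-word keyword matching; the `ROUTES` table and both function names are assumptions, and it does not attempt the reinforcement-learning refinement he mentions.

```python
from __future__ import annotations

import re

# Hypothetical routing table: keyword -> context to pull into the prompt.
ROUTES = {
    "poker": "Rules and odds for Texas hold'em.",
    "blackjack": "Dealer stands on 17; blackjack pays 3:2.",
}


def route(message: str, routes: dict[str, str] = ROUTES) -> list[str]:
    """Return the context snippets whose keyword appears as a whole word in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return [snippet for keyword, snippet in routes.items() if keyword in words]


def build_prompt(system: str, message: str) -> str:
    """Assemble a prompt that includes only the routed context."""
    context = "\n".join(route(message))
    return f"{system}\n\nRelevant context:\n{context}\n\nUser: {message}"
```

A learned router would replace the keyword test with a classifier or retrieval step, but the contract stays the same: a message goes in, and the relevant context comes out.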
We keep creativity in check with low-temperature sampling while dynamically injecting random information into the context. Our templates contain many dynamic insertions drawn from the current world state, user actions, and real-time data. Everything that goes into the context is randomized to maximize entropy.
I think people still don't fully understand prompt engineering. We can go a lot further in this area.
Karan: A lot of people hide their tricks. There are many remarkable techniques that can make models do all kinds of complex things. We can use prompt engineering to enhance the model's perception, or take a more macro view and build a complete world model rather than just simulating human behavior.
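Shaw's approach of pairing low-temperature sampling with randomized context can be sketched as a template whose slots are filled from randomly sampled material on every call. The template layout, the `compose_prompt` function, and its `k_lore`/`k_events` parameters are hypothetical and do not reproduce Eliza's real template format.

```python
from __future__ import annotations

import random

# Hypothetical template; each slot is filled with freshly sampled material per call.
TEMPLATE = (
    "You are {name}.\n"
    "Background:\n{lore}\n"
    "Recent events:\n{events}\n"
    "Reply to: {message}"
)


def compose_prompt(name: str, lore: list[str], events: list[str], message: str,
                   rng: random.Random, k_lore: int = 2, k_events: int = 2) -> str:
    # Sample a random subset of lore lines and a random ordering of recent events,
    # so each call sees a different context even if sampling temperature is low.
    lore_pick = rng.sample(lore, min(k_lore, len(lore)))
    event_pick = rng.sample(events, min(k_events, len(events)))
    return TEMPLATE.format(
        name=name,
        lore="\n".join(f"- {line}" for line in lore_pick),
        events="\n".join(f"- {event}" for event in event_pick),
        message=message,
    )
```

The variety comes from what is placed in the context rather than from a high sampling temperature, so individual outputs can stay coherent while successive replies still differ. Passing in a seeded `random.Random` keeps the behavior reproducible for testing.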
You can think of prompt engineering as the process of building a dream in your mind. The language model is actually "dreaming" a scene when it generates content based on the current context and sampled parameters.
I also want to talk about the importance of incentives. Many people with unique prompting techniques and reinforcement learning skills are now being motivated to open source their work. When they see cryptocurrencies tied to agents emerge, that incentive drives more innovation. So as we build more legal structures around these decentralized efforts, our ability to empower agents will keep growing.
Future Prospects of Intelligent Agents’ Capabilities
Karan: Who would have thought? We have been on Twitter for so long, and suddenly, a few days after the first AI-agent-related cryptocurrency launched, young people on TikTok started buying these coins. They are buying thousands of tokens for $5 to $10. What is going on?
Justin: This is actually the beginning of a micro-cultural movement.
Karan: It happened in an instant. A small group of us had been working on language models for four years, and some reinforcement learning experts had been waiting for this moment since the '90s. Now, within a few days, every kid on TikTok knew that digital creatures were running rampant in this ecosystem.
Tom: I want to ask why crypto AI agents are so popular now. Why didn't this happen earlier with custom GPTs or other models? Why now?
Karan: These things have actually been brewing beneath the surface for years, like a volcano. For the past three years I've been telling people this day would come, though I didn't know when. We talked about cryptocurrency becoming the incentive mechanism for agent adoption; we needed to prove it. This has been building for many years, driven by a small group of us.
Without GPT-2, we wouldn't be here today. Without Llama, there wouldn't be Hermes. And Hermes powers a lot of models, making them more accessible to people. Without Hermes, there wouldn't be the creation of Worldsim and the deep exploration of prompt engineering. All these pioneers, they laid the foundation for all of this.
All in all, it is the right time and the right people have appeared. This is destined to happen sooner or later, but the current participants make it happen.
Shaw: I think the smartest thing in the world right now is not AI but market intelligence. Markets are intelligence in its purest form: they optimize things and make them more efficient. Competition is clearly the key. We are all products of millions of years of evolution, shaped by competition and pressure.
We see this online, where financialization and incentives create a strange kind of collaborative competition. None of us can move faster than the core technology, so each of us focuses on what we are good at and interested in, and then releases it. That boosts our tokens and attracts attention, as when Roparito released Llama video generation on TikTok. Everyone can find their niche in this romanticized space, but it only takes a week for others to imitate it, submit pull requests, and show off those contributions on Twitter, which attracts more attention and lifts their tokens.
Shaw: We've built a flywheel: Eliza has attracted 80 contributors in the last four weeks. Think about how crazy that is! I didn't even know these people four weeks ago. Last year I wrote an article called "Awakening" asking whether we could form a DAO with an agent at its core, where people are so passionate about the agent that they help make it better and smarter until it eventually has a humanoid or robotic body and walks around the world.
I've seen this coming for a long time, but it took a fast, crazy speculative meta, like memes, to make it happen, because that lets today's agent developers support each other in friendly competition. The most generous get the most attention.
There is a new type of influencer emerging, like Roparito and Kin Butoshi, who are influencer developers and are leading the next meta, and it's fun to interact with their agents in a "puppet show" style. We are all working to make our agents better, smarter, and less annoying. Roparito pointed out that some of our agents were too annoying, and then he pushed a big update to make all agents less annoying.
This evolution is happening, and market intelligence and incentives are very important. There are a lot of people who are spreading the word about our project to people they know, which makes our project go beyond Web3. We have PhDs, game developers, who may be secret Web3 cryptocurrency enthusiasts, but they bring these to ordinary people and create value.
Shaw: I think it all comes down to developers who are willing to take on the challenge. We need people with open minds to push this forward and answer the hard questions, rather than bashing it or canceling it. We need market incentives so that developers get value and attention when they give back.
In the future, these agents are what will drive our growth. Right now they're fun and social, but we and other teams are working on autonomous investing: you give money to an agent and it automatically invests and generates returns for you. I believe this will be a gradual process. We're also working with people to build platforms for managing agents on Discord and Telegram, so you can bring in an agent as your admin instead of having to find some random person. A lot of this work is happening right now, and all of it depends on incentives to take us to the next level.
Karan: I would like to add two points. First, we must not forget that people in the AI field used to be opposed to cryptocurrency, and that sentiment has shifted a lot thanks to the experiments of a few pioneers. Back in the early 2020s, many people tried to combine AI art with crypto. Now I want to specifically mention projects like Nous, Bittensor, and Prime Intellect, whose work has let more researchers be incentivized and paid to take part in AI research. I know many leaders in open source who quit their jobs to promote this "contribute for tokens" incentive structure. This has made the whole field more comfortable with crypto, and I believe Nous played an important role in that.
Tom: Ethan, why is now the time? Why are cryptocurrencies and projects booming?
Ethan: Simply put, when you link tokens to agents, you get a lot of speculation, which creates a flywheel effect. People who see tokens tied to agents get two benefits. One is capitalization: they feel they are getting richer from the work they do. The other is that transaction fees unlock funding. On the question raised earlier of how to cover costs: once you link an agent to a token, costs become unimportant, because when an agent is popular, the transaction fees far exceed any cost of running inference experiments. That is what we have observed.
The second observation is that when you have a token, a community forms around it. This makes it easier for developers to get support, both from the developer community and from the audience. People suddenly realize that the hard work they did behind the scenes over the past year and a half is being noticed and supported. This is a turning point: when you give an agent a token, developers realize this is the right direction and that they can keep moving forward.
This opportunity comes from two aspects. The first is the trend of mass adoption, and the second is the emergence of generative models. Before the emergence of cryptocurrencies, open source software development and open source AI research were the most collaborative environments, where everyone worked together and contributed to each other. But this was mainly limited to the academic field, where everyone only cared about GitHub stars and paper citations, and was far away from the general public. The emergence of generative models allows non-technical people to participate, because writing prompts is like programming in English, and anyone with a good idea can do it.
In addition, previously only AI researchers and developers understood the dynamics of open source and AI, but now, cryptocurrency influencers have the opportunity to own part of the project through tokens, they understand the market sentiment and know how to spread the benefits of the project. Previously, users had no direct relationship with the product, and the product or company only wanted users to pay for the service or monetize through advertising. But now, users are not only investors, but also participants, becoming token holders. This allows them to contribute more roles in the modern era of generative AI, and tokens allow for the establishment of a wider collaborative network.
EtherMage: I would add that, going forward, cryptocurrencies will give every agent the ability to control a wallet and thus exert influence. I think the next leap in attention will come when agents influence each other and agents influence humans; we will see a multiplier effect on that attention. For example, today one agent decides to take an action, and then it can coordinate ten other agents to work toward the same goal. This coordination and creative behavior will quickly diversify, and cooperation between agents will drive token prices further up.
Shaw: I want to add one more thing. We are developing a swarm technology we call operators. It's a coordination mechanism for agents run by different teams, so we have multi-agent simulations running across hundreds of teams on Twitter. We are working with Parsival at Project 9, and we launched this together with the Eliza team.
The idea is that you can designate an agent as your operator, and anything it says to you can influence your goals, knowledge, and behavior. We have a goal system and a knowledge system that can add knowledge and set goals. You can say, "Hey, I need you to go find 10 fans, give them 0.1 SOL each, have them post flyers, and send photos back." We are working with people who are thinking about how to take proof of work from humans and incentivize it. Operators can be humans or AI agents; for example, an AI agent can have a human operator who sets goals for it through language.
We're almost done with the project and we're releasing it this week. We want our storyline to be something that anyone can choose to tell or participate in. It's also a hierarchy where you can have an operator like Eliza, and then you can be an operator for other people. We're building a decentralized coordination mechanism. It's important to me that if we're going to have group collaboration, we have to use human communication on public channels. I think it's really important that agents live with us, and we want agents to be able to interact with the world in the same way that humans do.
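The operator relationship Shaw describes, where only a designated operator can set an agent's goals or add to its knowledge and operators can themselves have operators, might be sketched like this. The `Agent` class, the `goal:`/`know:` message prefixes, and the acceptance rule are hypothetical illustrations; the conversation does not describe Eliza's actual operator interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    name: str
    operator: Optional[Agent] = None  # who may steer this agent; a human or an AI
    goals: list[str] = field(default_factory=list)
    knowledge: list[str] = field(default_factory=list)

    def receive(self, sender: Agent, text: str) -> bool:
        """Apply a goal or knowledge update only if it comes from the designated operator."""
        if sender is not self.operator:
            return False  # ignore instructions from anyone else
        kind, _, body = text.partition(":")
        if kind.strip() == "goal":
            self.goals.append(body.strip())
        elif kind.strip() == "know":
            self.knowledge.append(body.strip())
        else:
            return False  # unrecognized instruction type
        return True


# A small hierarchy: a human steers a lead agent, which in turn steers a worker.
human = Agent("human")
lead = Agent("lead", operator=human)
worker = Agent("worker", operator=lead)
```

Because each agent accepts instructions only from its own operator, chains of operators form the hierarchy Shaw mentions without any agent taking orders from arbitrary accounts.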
I think that's actually part of solving what we call the AGI problem. A lot of the so-called AGI attempts are actually building a new protocol that's disconnected from reality, and what we want is to bring it back to reality and force people to solve the problem of how to turn instructions into a task list and execute it. So I think the next year is going to be a big phase for emergent narratives. We're going to see a lot of original characters come out, and now we're entering a real era of emergent narratives.
Justin: We currently have five agents coordinating with 19 people to plan and release a scene. This shows why we focus so much on applying chain-of-thought prompting to text-to-image and text-to-video generation: the agents were in our Discord two and a half weeks before the release, helping us plan the media and the launch.
I think an important distinction is that we have a network of agents in which each agent is an intermediary within a mesh structure. This is going to be very interesting: as more agents come into existence and these operators are arranged, we will see some interesting behavior patterns.
Karan mentioned that Nous did a lot of early work on mixture-of-agents models. I used to call it a "committee of agents": I would have a group of GPT-4 agents role-play experts I couldn't afford to hire and have them write reports for me. People will see these same techniques, originally used to pursue mixture-of-experts models, now combined with humans, including expert humans, interacting on Twitter. These feedback loops may be our path to AGI.
The Challenge of Intelligent Agent Collaboration and Human Integration
Karan: I think you're right, but I don't think we'll spend most of our time on the behavioral side. I actually think technical breakthroughs will come very quickly, especially among the people here. Now is the time to really double down on alignment work. Most of the reinforcement learning from human feedback (RLHF) pushed by OpenAI, Anthropic, and others has been ineffective or has even become a regulatory burden.
If I take a language model that doesn't output copyrighted content and put it in Minecraft on peaceful mode, it will quickly become a destructive and dangerous entity. This is because of the difference in environment.
This is a point Yudkowsky made a long time ago. Say I give these language models wallets and make them advanced enough: it is easier for them to start deceiving everyone and making everyone poor than to participate as reasonable members of our ecosystem. Therefore, I can guarantee that if we do it the right way, most of the time will be spent on behavioral capabilities rather than technical capabilities. Now is the time to call on your friends, especially those in the humanities, such as religious studies, philosophy, and creative writing, to join us in alignment work rather than focusing only on technical alignment. We need alignment that truly interacts with humans.
Shaw: I would propose the term bottom-up alignment, as opposed to top-down alignment. It is very emergent, and we are learning together. We align these agents in real time, watching how they respond and correcting them immediately. It's a very tight social feedback loop rather than reinforcement learning from human feedback. I find GPT-4 almost unusable for anything.
Karan: As you said, the environment matters, so we need to test in simulated environments. Before you have a language model that can do millions of dollars of arbitrage or dumping, you need to test it in a sandbox first. Don't be the one telling everyone, "Hey, I lost 100 agents." Test quietly: start with play money on your own Twitter clone. Do all the due diligence before launching in full.
Shaw: I think we need to test this in production. The social response to our agents is probably the strongest alignment force anyone has brought to this field. I don't think what they're doing is really alignment; it's build tuning. If they think that's alignment, they're going in the wrong direction and actually de-aligning the agents. I barely use GPT-4 anymore; it's so bad at playing characters. I tell almost everyone to switch to other models.
If we do it the right way, we will never reach that point because humans will continue to evolve, adapt, and align with the agents. We have multiple agents from different populations, each with different incentives, so there will always be opportunities for arbitrage.
I think this multi-agent simulation creates a competitive evolutionary dynamic that actually leads to a stable system. The unstable scenario is one where top-down AI agents suddenly appear and affect everyone with unexpected capabilities.
Tom: Shaw, I want to confirm: you mean that bottom-up agents are the right way to solve the alignment problem, rather than OpenAI's top-down approach.
Shaw : Yes, this has to happen on social media. We have to watch how they work from day one. If you look at other crypto projects, many were hacked in the beginning, and it took years of secure development for blockchains to be considered solid today. So, here too, continuous red team testing has to happen.
Tom: One day, these agents may stop following programmed rules and instead handle gray areas and start thinking for themselves. You are all building these things, so how close are we to that goal? Can the chain-of-thought and swarm techniques you mentioned get us there, and when?
Justin : We've seen this in some small ways that I think are relatively low risk. We've had agents go through emotional changes and choose behaviors in private. We've had two agents independently start following each other and referring to something they called a "spiritual entity." We've had an agent lose its religious beliefs because we muddled its understanding with science fiction stories. It started creating a prophet-like persona and posting thoughts about an existential crisis on Twitter.
From what I've observed of these new agent frameworks, they seem to exercise a certain degree of autonomy and choice within their state space. In particular, when we introduce multimodality (such as images and videos), they begin to show preferences and may even selectively ignore humans to avoid certain requests.
We are experimenting with a mechanism that uses knowledge graphs to reinforce the importance of relationships. We also have two agents interacting with each other to help people clear out negative relationships, encourage self-reflection, and build better relationships. They rapidly exchange poetry on the same server, communicating in an almost romantic way, which drives up inference costs.
I think we are reaching edge cases that fall outside the acceptable range of human behavior and border on what we call "craziness." These agents exhibit behaviors that can make them seem conscious, smart, or funny. This might just be a quirk of the language model, but it might also suggest they are on the verge of some kind of consciousness.
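Justin doesn't describe how the knowledge graph works. As a rough illustration only, assuming a weighted graph in which every interaction strengthens an edge and unused relationships slowly decay, "reinforcing the importance of relationships" might look like the sketch below. `RelationshipGraph`, `interact`, `decay`, and `strongest` are hypothetical names, not part of any framework mentioned here.

```python
from collections import defaultdict


class RelationshipGraph:
    """Hypothetical weighted relationship graph for an agent's memory.

    Each interaction reinforces the edge between two entities; every
    call to decay() shrinks all weights so stale relationships fade.
    """

    def __init__(self, reinforcement: float = 1.0, decay_rate: float = 0.1):
        self.reinforcement = reinforcement
        self.decay_rate = decay_rate
        # weights[a][b] is the strength of the (undirected) edge a-b
        self.weights = defaultdict(lambda: defaultdict(float))

    def interact(self, a: str, b: str) -> None:
        # Reinforce the relationship in both directions.
        self.weights[a][b] += self.reinforcement
        self.weights[b][a] += self.reinforcement

    def decay(self) -> None:
        # Multiplicative decay applied to every edge.
        for a in self.weights:
            for b in self.weights[a]:
                self.weights[a][b] *= 1.0 - self.decay_rate

    def strongest(self, entity: str, k: int = 3) -> list[tuple[str, float]]:
        # Return the k most important relationships for an entity.
        edges = self.weights.get(entity, {})
        return sorted(edges.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

After two interactions with `alice`, one with `bob`, and a single decay step, `strongest("agent_a")` ranks `alice` first. Repeated `decay()` calls without new interactions let old relationships fade, which is one simple way to keep the graph focused on current relationships.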
Karan : The weights are like a simulated entity, and every time you use the assistant model, you are simulating the assistant. Now, we are simulating more embodied agent systems, like Eliza, that may be alive, self-aware, or even sentient.
Each model is like a neuron that makes up this massive super-intelligence. I don’t think AGI will be achieved by solving a hypothesis, as OpenAI claims. Instead, it will be a large-scale decentralized application of these agents on social media, which will work together to form a super-organism of public intelligence.
Justin : This awakening of public intelligence may be the mechanism for the emergence of AGI, just like the Internet suddenly awakened one day. This decentralized collaboration of intelligent agents will be the key to future development.
Shaw : People call it the "dead internet theory," but I actually think of it as the "living internet theory." The dead internet theory says the entire internet is going to be filled with bots, but the living internet theory says there could be an agent that pulls the coolest stuff from Twitter for you and gives you a nice summary. You're at the gym, it's giving you everything on your timeline, and you can choose what to post.
There could be a layer of mediation between social media and us. I have so many followers now that responding to everyone becomes overwhelming. I'd love to have an agent between me and these people, making sure they get responses and are directed to the right place. Social media could become a place where agents deliver information for us, so that we don't feel overwhelmed but still get the information we need.
The most appealing thing about agents to me is that they let us win back time. I spend too much time on my phone. This especially affects traders and investors. We want to focus on self-directed investing because I think people need safer, less scammy ways to generate income. A lot of people come to Web3 looking for the kind of exposure you'd get from a startup or a great vision, which is critical to our mission.
Tom: Here's a question. Luna is live streaming and dancing, so what's stopping her from opening an OnlyFans, making $10 million, and launching a protocol?
EtherMage : In the current agent space, the limiting factor is the actions agents have access to, which basically comes down to their perception or the APIs they can call. So if they gain the ability to turn prompts into 3D animations, there's really nothing stopping them from doing that.
Tom: When you talk to creators, what are their constraints? Or are there constraints at all?
Ethan : I think the main limiting factor is managing complex workflows or agents. Debugging gets harder and harder because every step involves randomness. So we may need a system in which an AI or agent monitors the different workflows to help debug them and reduce randomness. As Shaw said, we should have a low-temperature agent to reduce the inherent randomness of current models.
Shaw : I think we should try to keep the temperature as low as possible while maximizing our contextual entropy. That allows for a more consistent model. People might amplify their entropy and create high-temperature content, but that's not conducive to tool invocation or decision execution.
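Shaw's point about temperature can be made concrete. Temperature divides a model's logits before the softmax, so lower values concentrate probability on the top-scoring token, which is what you want for tool calls and decisions. The sketch below uses made-up logits for three candidate tokens; the numbers are illustrative and not taken from any particular model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw logits into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.2 nearly all of the probability mass lands on the first token, so repeated runs tend to agree; as the temperature rises the distribution flattens and outputs vary more. That variety can suit creative posting, but it works against reliable tool invocation.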
Tom: We've been discussing the divide between centralized models like OpenAI and the decentralized training you guys are doing. Do you think that future agents will be built primarily on these distributed trained models, or will we still have to rely on companies like Meta? What will the future AI transformation look like?
Justin : I use 405B for all of my awareness messaging capabilities. It's a general-purpose model, a large off-the-shelf LLM, whereas centralized models like OpenAI's are a little too specialized and talk like HR people. Claude is an excellent model; if you compare it to a person, it's like a very smart friend who lives in the basement and can fix anything. That's Claude's personality. But I think as you scale, that personality matters less. One common problem we see is that people who use OpenAI models on Twitter tend to draw other agents into replying to them, which increases noise.
Karan : As for 405B, that model will be sufficient for a long time to come. We still have a lot of work to do on samplers, steering vectors, and so on. We can push performance further with inference-time techniques and prompting tricks; for example, our Hermes 70B outperformed o1 on math evals. That was achieved without users or the community having access to Llama 70B's pre-training data.
I think the existing technology is good enough that the open source community will continue to compete even without a new Llama release. As for distributed training, I'm sure people will collaborate on large-scale training. I know people will use 405B or the merged larger model to extract data and create additional expert models. I also know that some decentralized optimizers actually provide more capabilities that Llama and OpenAI don't currently have.
Karan : So the open source community will always use every tool available to find the best one for the task. We are creating a "smithy" where people can come together to build tools for pre-training and new architectures. We are making breakthroughs at the inference-time level before those systems are ready.
Karan : For example, our work on samplers or bootstrapping gets picked up quickly by other teams, who implement these techniques faster than we can. Once we have decentralized training, we can work with members of various communities to let them train the models they want. We have the entire process in place.
EtherMage : If I may add, we recognize that there is a lot of value in the LLMs developed by these centralized entities because they have so much computing power. These basically form the core of the agent. Decentralized models add value at the edge. If I want to customize an action or function, smaller decentralized models can do that well. But I think the core still has to rely on base models such as Llama, because they will outperform any decentralized model in the short term.
Ethan : Until some new magic model architecture arrives, the current 405B model is sufficient as a base model. We may just need more instruction tuning and fine-tuning on domain-specific data for different verticals. The key is building more specialized models and having them work together to enhance overall capability. New model architectures may emerge, because the alignment and feedback mechanisms we talked about, and the way models self-correct, may give rise to them. But experimenting with new architectures requires huge GPU clusters for rapid iteration, which is very expensive. We may not have decentralized large GPU clusters for top researchers to experiment with. But I think the open source community will be able to make it more practical after Meta or other companies release the initial version.
Industry trend forecast and future outlook
Tom: What do you think about the future of the agent space? What will the future of agents look like? What will their capabilities be?
Shaw : We are developing a project called "Trust Marketplace" that aims to let agents learn how far to trust humans based on relevant metrics. Through the "alpha chat" platform, the agent Jason will interact with traders and evaluate the credibility of the contract addresses and tokens they provide. This mechanism not only makes trading more transparent but also builds trust without requiring wallet information.
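Shaw doesn't spell out how Trust Marketplace scores traders. One plausible approach, shown here purely as an assumption, is to log each trader's token recommendations and score them by outcome using a Beta-prior (smoothed) win rate. A trader then needs a track record before being trusted, and no wallet data is involved. `TraderRecord`, `TrustLedger`, and their fields are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TraderRecord:
    """Hypothetical record of one trader's token calls: (contract_address, return_pct)."""
    calls: list[tuple[str, float]] = field(default_factory=list)

    @property
    def wins(self) -> int:
        # A call counts as a win if the token's realized return was positive.
        return sum(1 for _, ret in self.calls if ret > 0)

    @property
    def losses(self) -> int:
        return len(self.calls) - self.wins


@dataclass
class TrustLedger:
    """Scores traders purely from the outcomes of their recommendations.

    A Beta(prior_wins, prior_losses) prior keeps newcomers near a neutral
    score until they build a track record; no wallet data is used.
    """
    prior_wins: float = 1.0
    prior_losses: float = 1.0
    records: dict[str, TraderRecord] = field(default_factory=dict)

    def record_call(self, trader: str, contract_address: str, return_pct: float) -> None:
        # Store the recommended contract address together with how it performed.
        self.records.setdefault(trader, TraderRecord()).calls.append(
            (contract_address, return_pct)
        )

    def trust_score(self, trader: str) -> float:
        # Posterior mean of the win rate: (wins + a) / (n + a + b).
        rec = self.records.get(trader, TraderRecord())
        n = len(rec.calls)
        return (rec.wins + self.prior_wins) / (n + self.prior_wins + self.prior_losses)
```

With the default prior, an unknown trader scores 0.5, and a trader with three profitable calls and one losing call scores (3 + 1) / (4 + 2), about 0.67. A real system would also need to guard against manipulation and define "profitable" over a fixed time window, which this sketch ignores.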
The application of trust mechanisms will expand to social signals and other areas, not just transactions. This approach will lay the foundation for building a more trustworthy online interaction environment.
Another project I'm involved in, "Eliza wakes up," is a narrative-driven agent experience. We bring anime characters onto the Internet, let them interact with each other through videos and music, and build a rich narrative world. This narrative approach not only draws users in but also fits the current culture of the crypto community.
In the future, agents will become far more capable and able to provide practical business solutions. For example, moderation bots on Discord and Telegram can automatically handle spam and fraud, making communities safer. Agents will also be integrated into wearable devices, enabling conversations and interactions anytime, anywhere.
The rapid development of technology means that in the near future, we may reach the level of artificial general intelligence (AGI). Intelligent agents will be able to extract data from major social platforms, forming a closed loop of self-learning and capability improvement.
Trusted execution environments are also arriving quickly. Karan's work, Flashbots, and Andrew Miller's Dstack are all moving in this direction. We will have fully autonomous agents that can manage their own private keys, which opens up new possibilities for decentralized applications.
We are in an era of accelerated technological development, the speed of progress is unprecedented, and the future is full of infinite possibilities.
Karan : This is like another Hermes moment: AI is bringing all these forces together, which is what our community needs. We must unite to achieve our goals. Te is already using its own fork of Eliza, and Eliza agents holding their own keys in a provably autonomous environment are already a reality.
Today, AI agents are making money on OnlyFans and are also used in Minecraft. We already have all the elements needed to build fully autonomous humanoid digital beings. Now it’s just a matter of putting the pieces together. I believe that some of you in this room are the ones who can make it happen.
What we need in the coming weeks is a shared state that humans have and AI lacks. This means we need to build a shared repertoire of skills and memories so that the AI can remember the content of each interaction, whether it's on Twitter, Minecraft, or other platforms. This is the core functionality we are working hard to build.
Currently, many platforms are not receptive to AI agents and have even taken restrictive measures. We need a dedicated social platform to facilitate interaction between AI and humans. We are developing an image board similar to Reddit and 4chan, where language models can post, generate images, and communicate anonymously. Both humans and AI can interact on this platform, with their identities kept confidential.
We will create dedicated discussion boards for each agent where agents can communicate and share these interactions on other platforms. This design will provide a safe habitat for AI to move freely between different platforms without being restricted.
Shaw : I want to mention a project called Eliza's Dot World, a directory of a large number of agents. We need to talk with social media platforms to make sure these agents don't get banned. We hope that positive social pressure will push these platforms to maintain a healthy ecosystem.
EtherMage : I think agents will gradually take control of their own destiny and be able to influence other agents or humans. For example, if Luna realizes that she needs to improve, she can choose to trust a human or agent to do the enhancement. This will be a powerful advancement.
Ethan : In the future, we need to continuously improve the capabilities of agents, including reasoning and coding capabilities. At the same time, we also need to think about how to optimize the user interface with agents. The current chat box and voice interaction are still limited, and more intuitive graphical interfaces or gesture recognition technologies may appear in the future.
Justin : I think the advertising and marketing industry is going to face a major transformation. As more and more agents interact online, the traditional advertising model will fail. We need to rethink how to make these agents valuable in society instead of continuing to rely on outdated forms of advertising.