Author: IOSG Ventures
Thanks to Zhenyang@Upshot, Fran@Giza, Ashely@Neuronets, Matt@Valence, Dylan@Pond for feedback.
This study explores which AI areas matter most to developers and which may be the next breakout opportunities at the intersection of Web3 and AI.
Before sharing the research, I would first like to share our excitement at participating in RedPill's first funding round, totaling USD 5 million. We look forward to growing together with RedPill!
TL;DR
As the combination of Web3 and AI has become a hot topic in the crypto world, AI infrastructure has flourished. Yet few applications actually use AI or are built for it, and the homogeneity among AI infrastructure projects is becoming apparent. Our recent participation in RedPill's first funding round prompted some deeper reflections on the space.
- The main tools for building AI dApps include decentralized OpenAI access, GPU networks, inference networks, and agent networks.
- GPU networks are more popular now than during the Bitcoin mining era because: the AI market is larger and growing rapidly and steadily; AI supports millions of applications every day; AI requires a variety of GPU models and server locations; the technology is more mature than before; and the customer base is wider.
- Inference networks and agent networks have similar infrastructure but different focuses. Inference networks mainly serve experienced developers deploying their own models, and running non-LLM models does not necessarily require a GPU. Agent networks focus on LLMs; developers do not bring their own models, but instead focus on prompt engineering and connecting different agents together. Agent networks always require high-performance GPUs.
- AI infrastructure projects hold great promise and are still introducing new capabilities.
- Most crypto-native projects are still at the testnet stage, with poor stability, complex configuration, and limited functionality; it will take time for them to prove their security and privacy.
- Assuming AI dApps become a major trend, there are still many unexplored areas: monitoring, RAG-related infrastructure, Web3-native models, decentralized agents with built-in crypto-native APIs and data, evaluation networks, etc.
- Vertical integration is a notable trend: infrastructure projects attempt to provide one-stop services that simplify the work of AI dApp developers.
- The future will be hybrid, with some inference done on the front end or off-chain and some computation done on-chain, balancing cost against verifiability.
Source: IOSG
Introduction
- The combination of Web3 and AI is one of the most-watched topics in the current crypto space. Talented developers are building AI infrastructure for the crypto world, working to bring intelligence to smart contracts. Building AI dApps is an extremely complex task: developers need to deal with data, models, computing power, operations, deployment, and integration with the blockchain. In response to these needs, Web3 founders have developed many preliminary solutions, such as GPU networks, community data annotation, community-trained models, verifiable AI inference and training, and agent stores.
- Against this backdrop of thriving infrastructure, there are still few applications that actually use AI or are built for it. When developers look for AI dApp development tutorials, they find few that involve crypto-native AI infrastructure; most only cover calling the OpenAI API on the front end.
Source: IOSG Ventures
- Current applications fail to fully exploit blockchain's decentralized and verifiable properties, but this will soon change. Most AI infrastructure projects focused on crypto have now launched testnets and plan to go live within the next 6 months.
- This study details the main tools available in crypto AI infrastructure. Let's get ready for crypto's GPT-3.5 moment!
1. RedPill: Decentralized Access to OpenAI
RedPill, the investment mentioned above, is a good entry point.
OpenAI offers several world-class models, such as GPT-4-vision, GPT-4-turbo, and GPT-4o, which are natural choices for building advanced AI dApps.
Developers can call the OpenAI API through an oracle or front-end interface to integrate it into their dApp.
RedPill aggregates OpenAI API access from different contributors under one interface, providing fast, economical, and verifiable AI services to users around the world and thereby democratizing access to top AI models. RedPill's routing algorithm directs each developer request to a single contributor. API requests are executed through its distribution network, bypassing potential restrictions from OpenAI and solving common problems faced by crypto developers, such as:
- Limited TPM (Tokens Per Minute): New accounts have limited token quotas, which cannot meet the needs of popular, AI-dependent dApps.
- Access restrictions: Some models restrict access to new accounts or certain countries.
By using the same request code but changing the hostname, developers can access OpenAI models in a low-cost, highly scalable, and unlimited manner.
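As a minimal sketch, swapping the hostname could look like the snippet below. The gateway base URL, API key, and model name are placeholders for illustration, not RedPill's documented values:

```python
# Minimal sketch: point the standard OpenAI client at an aggregator
# endpoint instead of api.openai.com. Base URL and key are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.redpill.example/v1",  # hypothetical gateway hostname
    api_key="YOUR_GATEWAY_KEY",                 # key issued by the gateway, not OpenAI
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this DeFi position."}],
)
print(response.choices[0].message.content)
```

Because the request code is unchanged, migrating an existing dApp should amount to a configuration change rather than a rewrite.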
2. GPU Network
In addition to using OpenAI's API, many developers choose to self-host models. They can run a variety of powerful internal or open-source models on decentralized GPU networks such as io.net, Aethir, and Akash.
Such decentralized GPU networks can pool computing power from individuals or small data centers, offering flexible configurations, more server-location options, and lower costs, so developers can easily run AI experiments on a limited budget. However, because of their decentralized nature, these GPU networks still have limitations in functionality, usability, and data privacy.
In the past few months, the demand for GPUs has exploded, surpassing the previous Bitcoin mining boom. The reasons for this phenomenon include:
- A broader customer base: GPU networks now serve AI developers, who are not only numerous but also more loyal and unaffected by cryptocurrency price fluctuations.
- Compared with mining-specific hardware, decentralized GPUs offer more models and specifications and can better meet diverse requirements: large models need more VRAM, while smaller tasks have more suitable GPU options. Decentralized GPUs can also serve end users nearby, reducing latency.
- The technology has matured: GPU networks rely on high-speed blockchains such as Solana for settlement, Docker virtualization, and Ray computing clusters (see the Ray sketch after this list).
- In terms of return on investment, the AI market is expanding, with many opportunities to develop new applications and models; the expected return on an H100 is around 60-70%, whereas Bitcoin mining is harder, winner-takes-all, and its output is capped.
- Bitcoin mining companies such as Iris Energy, Core Scientific, and Bitdeer have also begun to support GPU networks, provide AI services, and actively purchase GPUs designed specifically for AI, such as the H100.
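To make the Ray piece concrete, here is a minimal sketch of fanning inference batches out across GPU workers on an existing Ray cluster. The cluster itself, and how a decentralized GPU network would provision its nodes, is assumed:

```python
# Minimal sketch: fan inference batches out across GPU workers with Ray.
# Assumes an already-running Ray cluster; a decentralized GPU network
# would provision the underlying nodes.
import ray

ray.init(address="auto")  # attach to the running cluster

@ray.remote(num_gpus=1)   # reserve one GPU per task
def run_inference(batch):
    # Stand-in for real model code (e.g., loading a torch model onto the GPU).
    return [x * 2 for x in batch]

batches = [[1, 2], [3, 4], [5, 6]]
futures = [run_inference.remote(b) for b in batches]
print(ray.get(futures))   # gather results from the workers
```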
Recommendation: For Web2 developers who are less concerned with SLAs, io.net offers a simple, easy-to-use experience and is a very cost-effective choice.
3. Inference Network
This is the core of crypto-native AI infrastructure. It will support billions of AI inference operations in the future. Many AI Layer 1s and Layer 2s give developers the ability to call AI inference natively on-chain. Market leaders include Ritual, Valence, and Fetch.ai.
These networks differ in the following ways:
- Performance (latency, computation time)
- Supported models
- Verifiability
- Price (on-chain consumption cost, inference cost)
- Development Experience
3.1 Objectives
Ideally, developers could access custom AI inference services from anywhere, with any form of proof, and with almost no friction in the integration process.
The inference network provides all the basic support developers need, including on-demand generation and verification of proofs, inference calculations, relay and verification of inference data, provision of Web2 and Web3 interfaces, one-click model deployment, system monitoring, cross-chain operations, synchronous integration and scheduled execution.
Source: IOSG Ventures
With these features, developers can seamlessly integrate inference services into their existing smart contracts. For example, when building DeFi trading bots, these bots will use machine learning models to find buy and sell opportunities for specific trading pairs and execute corresponding trading strategies on the underlying trading platform.
In a completely ideal world, all of this infrastructure is cloud-hosted. Developers simply upload their trading-strategy models in a common format (such as a PyTorch model), and the inference network stores the models and serves both Web2 and Web3 queries.
Once model deployment is complete, developers can invoke model inference directly through a Web3 API or from smart contracts. The inference network will keep executing these trading strategies and feed the results back to the underlying smart contract. If developers manage a large amount of community funds, they will also need to provide verification of the inference results. Once the results are received, the smart contract trades according to them.
Source: IOSG Ventures
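As a sketch of that flow, the TorchScript export below uses a real PyTorch API, but the inference network's upload and query endpoints are hypothetical placeholders:

```python
# Sketch of the deployment flow described above. The upload and query
# endpoints are hypothetical; the TorchScript export is real PyTorch.
import requests
import torch

class StrategyModel(torch.nn.Module):
    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(features.sum(dim=-1))  # toy buy/sell score

# 1. Export the trading-strategy model in a portable format.
torch.jit.script(StrategyModel()).save("strategy.pt")

# 2. Upload it to the inference network (hypothetical endpoint).
with open("strategy.pt", "rb") as f:
    model_id = requests.post(
        "https://inference.example/models", files={"model": f}
    ).json()["model_id"]

# 3. Query it from Web2 code; a smart contract would use the on-chain entry point.
result = requests.post(
    f"https://inference.example/models/{model_id}/infer",
    json={"inputs": [[0.1, 0.4, -0.2]]},
).json()
print(result)  # e.g. an inference score plus an optional proof
```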
3.1.1 Asynchronous and synchronous
In theory, executing inference operations asynchronously yields better performance; however, this approach hurts the development experience.
With the asynchronous approach, a developer first submits the task to the inference network's smart contract; when the inference task completes, that contract returns the result. In this programming model, the logic is split into two parts: the inference call and the handling of the inference result.
Source: IOSG Ventures
The situation gets even worse if the developer has nested inference calls and a lot of control logic.
Source: IOSG Ventures
The asynchronous programming model is hard to integrate with existing smart contracts. It requires developers to write a lot of additional code, handle errors, and manage dependencies.
Synchronous programming is more intuitive for developers, but it introduces problems with response time and blockchain design. For example, if the input is fast-changing data such as block time or price, the data is stale by the time inference completes, which can cause the smart contract's execution to be rolled back. Imagine trading against an outdated price.
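To make the two-part split concrete, here is a sketch using web3.py. The RPC endpoint, contract address, ABI, function, and event names are all hypothetical stand-ins for whatever a given inference network exposes:

```python
# Sketch of the asynchronous pattern: one transaction submits the inference
# task, a separate listener consumes the callback event. Contract address,
# ABI, function, and event names are hypothetical.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://rpc.example"))  # placeholder RPC endpoint
INFERENCE_ABI: list = []                             # fill in the deployed contract's ABI
inference = w3.eth.contract(address="0x" + "00" * 20, abi=INFERENCE_ABI)

def handle_result(output):
    # Application-specific logic, e.g. trigger the trading contract.
    print("inference result:", output)

# Part 1: submit the inference task to the inference network's contract.
tx_hash = inference.functions.requestInference(b"model-id", b"input-data").transact(
    {"from": w3.eth.accounts[0]}
)
w3.eth.wait_for_transaction_receipt(tx_hash)

# Part 2: in separate code, poll for the completion event and handle it.
result_filter = inference.events.InferenceCompleted.create_filter(from_block="latest")
for event in result_filter.get_new_entries():
    handle_result(event["args"]["output"])
```

The split between Part 1 and Part 2 is exactly what forces developers to restructure otherwise linear contract logic.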
Source: IOSG Ventures
Most AI infrastructure uses asynchronous processing, but Valence is trying to solve these problems.
3.2 Reality
In practice, many new inference networks are still in the testnet phase, such as the Ritual network. According to their public documentation, these networks currently have limited functionality (features such as verification and proofs are not yet live). They do not yet provide cloud infrastructure to support on-chain AI computation; instead, they provide a framework for self-hosting AI computation and delivering the results on-chain.
This is an architecture for running AIGC NFTs: a diffusion model generates the artwork and uploads it to Arweave, and the inference network uses the Arweave address to mint the NFT on-chain.
Source: IOSG Ventures
This process is very complicated: developers need to deploy and maintain most of the infrastructure themselves, such as Ritual nodes with custom service logic, a Stable Diffusion node, and the NFT smart contract.
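A sketch of that flow is below. The diffusion call uses the real diffusers API, but `upload_to_arweave` and `mint_nft` are hypothetical helpers standing in for an Arweave SDK and the NFT contract call:

```python
# Sketch of the AIGC NFT flow: generate an image with a diffusion model,
# upload it to Arweave, then mint an NFT pointing at the Arweave URI.
# upload_to_arweave and mint_nft are hypothetical placeholders.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("an on-chain generative artwork").images[0]
image.save("art.png")

arweave_uri = upload_to_arweave("art.png")           # hypothetical: returns an ar:// URI
mint_nft(to="0xYourAddress", token_uri=arweave_uri)  # hypothetical on-chain mint call
```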
Recommendation: Current inference networks make integrating and deploying custom models quite complex, and most do not yet support verification. Applying AI on the front end is a relatively simple option for developers. If you really need verifiability, the ZKML provider Giza is a good choice.
4. Agent Network
Agent networks let users easily build customized agents. Such networks consist of entities or smart contracts that can autonomously perform tasks and interact with each other and with the blockchain, without direct human intervention. They are mainly built around LLM technology. For example, an agent network could offer a GPT chatbot with deep knowledge of Ethereum. Today's tooling for such chatbots is limited, and developers cannot yet build complex applications on top of them.
Source: IOSG Ventures
But in the future, agent networks will give agents more tools: not just knowledge, but also the ability to call external APIs, perform specific tasks, and so on. Developers will be able to connect multiple agents into workflows. For example, writing a Solidity smart contract might involve several specialized agents: a protocol-design agent, a Solidity development agent, a code-security-review agent, and a Solidity deployment agent.
Source: IOSG Ventures
We coordinate the cooperation of these agents through prompts and scenarios.
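As a sketch of such a workflow in plain Python: the `llm` call and the role prompts below are illustrative, not any specific agent network's SDK:

```python
# Minimal sketch of chaining specialized agents with prompts.
# `llm` stands in for any chat-completion call; the role prompts are illustrative.
def agent(role_prompt: str):
    def run(task: str) -> str:
        return llm(f"{role_prompt}\n\nTask:\n{task}")  # hypothetical LLM call
    return run

design = agent("You are a protocol designer. Produce a concise spec.")
develop = agent("You are a Solidity developer. Implement the spec.")
audit = agent("You are a security auditor. Review the contract for issues.")

spec = design("An ERC-20 token with a linear vesting schedule")
contract = develop(spec)
report = audit(contract)
print(report)
```

Each agent's output becomes the next agent's input, which is all the "orchestration" amounts to at this level; real agent networks add routing, payment, and tool access on top.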
Examples of agent networks include Flock.ai, Myshell, and Theoriq.
Recommendation: Most agents today have fairly limited functionality. For specific use cases, Web2 agents serve better and come with mature orchestration tools such as LangChain and LlamaIndex.
5. Differences between Agent Networks and Inference Networks
Agent networks focus on LLMs and provide LangChain-style tools for combining multiple agents. Usually developers do not need to develop machine learning models themselves; agent networks simplify model development and deployment, and developers simply wire together the necessary agents and tools. In most cases, end users interact with these agents directly.
Inference networks are the infrastructure underpinning agent networks. They give developers lower-level access. Normally, end users do not use inference networks directly: developers deploy their own models, which are not limited to LLMs, and access them through off-chain or on-chain entry points.
Agent networks and inference networks are not entirely separate products. We are already seeing vertically integrated products offering both agent and inference capabilities, because the two functions rely on similar infrastructure.
6. A new land of opportunity
In addition to model inference, training, and agent networks, there are many new areas worth exploring in the Web3 space:
- Datasets: How do we turn blockchain data into datasets usable for machine learning? ML developers need more specific, thematic data. For example, Giza provides high-quality DeFi datasets specifically for ML training. Ideal data goes beyond simple tabular data to include graph data that describes interactions in the blockchain world (see the sketch after this list), and we are still lacking here. Some projects address this by rewarding individuals for creating new datasets, such as Bagel and Sahara, which promise to protect personal data privacy.
- Model storage: Some models are large, and how to store, distribute, and version them is key to the performance and cost of on-chain machine learning. Pioneering projects such as Filecoin, Arweave, and 0g have made progress here.
- Model training: Distributed and verifiable model training is a hard problem. Gensyn, Bittensor, Flock, and Allora have made significant progress.
- Monitoring: Since model inference happens both on-chain and off-chain, we need new infrastructure to help Web3 developers track model usage and promptly detect problems and drift. With the right monitoring tools, Web3 ML developers can adjust in time and continuously improve model accuracy.
- RAG infrastructure: Distributed RAG requires an entirely new infrastructure environment, with heavy demands on storage, embedding computation, and vector databases, while ensuring data privacy and security. This differs greatly from today's Web3 AI infrastructure, which mostly relies on third parties for RAG; players here include Firstbatch and Bagel.
- Models tailored for Web3: Not all models suit Web3 scenarios. In most cases, models need retraining for specific applications such as price prediction or recommendation. As AI infrastructure booms, we expect more Web3-native models serving AI applications. For example, Pond is developing blockchain GNNs for price prediction, recommendation, fraud detection, and anti-money-laundering.
- Evaluation networks: Evaluating agents without human feedback is not easy. As agent-creation tools proliferate, countless agents will appear on the market. This calls for a system that demonstrates these agents' capabilities and helps users judge which agent performs best in a given situation. Neuronets, for example, is a player in this field.
- Consensus mechanisms: PoS is not necessarily the best choice for AI tasks; computational complexity, difficulty of verification, and non-determinism are its main challenges here. Bittensor has created a novel consensus mechanism that rewards nodes for contributing to machine learning models and outputs.
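As promised in the Datasets item above, here is a sketch of turning raw transfers into a graph dataset. The transaction list is toy data; a real pipeline would pull it from a node or an indexer:

```python
# Sketch: build a transaction graph that could feed a GNN or tabular model.
# The transaction list is toy data, not from any real chain.
import networkx as nx

transactions = [  # (sender, receiver, value) -- toy data
    ("0xA1", "0xB2", 1.5),
    ("0xB2", "0xC3", 0.7),
    ("0xA1", "0xC3", 2.0),
]

G = nx.DiGraph()
for sender, receiver, value in transactions:
    if G.has_edge(sender, receiver):
        G[sender][receiver]["weight"] += value  # aggregate transfer volume
    else:
        G.add_edge(sender, receiver, weight=value)

# Simple graph features that could serve as model inputs.
print(nx.out_degree_centrality(G))
```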
7. Future Outlook
We are currently observing a trend toward vertical integration. By building a foundational compute layer, a network can support a variety of machine learning tasks, including training, inference, and agent-network services. This model aims to provide a comprehensive one-stop solution for Web3 machine learning developers.
Today, on-chain inference, though expensive and slow, offers excellent verifiability and seamless integration with backend systems such as smart contracts. I think the future lies in hybrid applications: part of the inference happens on the front end or off-chain, while critical, decision-making inference happens on-chain. This pattern is already used on mobile devices, which exploit their local hardware to run small models quickly and offload more complex tasks to the cloud for larger LLMs.
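As a sketch of that hybrid routing: the threshold, `local_model`, and `onchain_inference` below are illustrative assumptions, not any project's API:

```python
# Sketch of hybrid routing: cheap inference runs locally/off-chain, and only
# high-stakes decisions are escalated to verifiable on-chain inference.
def infer(request: dict):
    if request["value_at_risk"] < 1_000:            # low stakes: fast local model
        return local_model(request["features"])    # hypothetical small model
    return onchain_inference(request["features"])  # hypothetical verifiable call
```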