Written by Geng Kai and Eric, DFG
Introduction
As of 2023, both AI and DePIN are prominent trends in Web3, with market caps of roughly $30 billion and $23 billion respectively. Each category is broad, covering a variety of protocols that serve different areas and needs, and each deserves separate coverage. This article, however, focuses on the intersection between the two and examines the development of protocols in this field.
In the AI technology stack, DePIN networks provide utility to AI through computing resources. The growth of large technology companies has caused a GPU shortage, leaving other developers building their own AI models without enough GPUs for computation. Developers often resort to centralized cloud providers instead, but inflexible, long-term contracts for high-performance hardware make this inefficient.
DePIN offers a more flexible and cost-effective alternative, using token rewards to incentivize resource contributions that align with network goals. AI-focused DePINs crowdsource GPU resources from individual owners and data centers, forming a unified supply for users who need hardware. These networks not only give developers customizable, on-demand access to computing power, but also provide additional income to GPU owners whose hardware would otherwise sit idle.
With so many AI DePIN networks on the market, it can be difficult to tell them apart and find the right one for your needs. In the next sections, we will explore what each protocol does, what it is trying to achieve, and some specific highlights of what it has accomplished.
AI DePIN Network Overview
Each of the projects covered here has a similar purpose: a GPU compute marketplace network. This section examines each project's highlights, market focus, and achievements. Understanding their key infrastructure and products first will make it easier to see the differences between them, which are covered in the next section.
Render is a pioneer in P2P networks that provide GPU computing power. It previously focused on rendering graphics for content creation, and later expanded its scope to AI computing tasks ranging from Neural Radiance Fields (NeRF) to generative AI through the integration of toolsets such as Stable Diffusion.
Interesting points:
Founded by OTOY, the cloud graphics company with Oscar-winning technology
Its GPU network is used by major names in the entertainment industry, including Paramount Pictures, PUBG, and Star Trek
Partnering with Stability AI and Endeavor to integrate their AI models with 3D content rendering workflows using Render’s GPU
Approves multiple compute clients so that more GPUs can be integrated into its DePIN network
Akash calls itself the "Airbnb for hosting" and positions itself as a "supercloud" alternative to traditional platforms such as AWS, supporting storage, GPU, and CPU computing. Using developer-friendly tools such as the Akash container platform and Kubernetes-managed compute nodes, it can deploy software seamlessly across environments and run any cloud-native application.
Interesting points:
Targets a wide range of computing tasks from general computing to web hosting
AkashML enables its GPU network to run more than 15,000 models through its integration with Hugging Face
Some notable applications hosted on Akash include Mistral AI's LLM chatbot, Stability AI's SDXL text-to-image model, and Thumper AI's new foundation model AT-1
Platforms building the Metaverse, AI deployments, and federated learning are leveraging its Supercloud
io.net provides access to distributed GPU cloud clusters specialized for AI and ML use cases. It aggregates GPUs from data centers, crypto miners, and other decentralized networks. The company was previously a quantitative trading firm that pivoted to its current business after high-performance GPU prices rose sharply.
Interesting points:
Its IO-SDK is compatible with frameworks such as PyTorch and TensorFlow, and its multi-layer architecture can scale automatically and dynamically according to computing needs
Supports the creation of three different types of clusters, which can be launched within 2 minutes
Strong collaborative efforts to integrate GPUs from other DePIN networks, including Render, Filecoin, Aethir, and Exabits
Gensyn provides GPU computing power focused on machine learning and deep learning computations. It claims to achieve a more efficient verification mechanism compared to existing approaches by combining concepts such as proof of learning for verification work, a graph-based pinpointing protocol for re-running verification work, and a Truebit-style incentive game involving staking and slashing of computation providers.
Interesting points:
The estimated hourly cost of a V100 equivalent GPU is approximately $0.40/hour, resulting in significant cost savings
Proof stacking allows pre-trained foundation models to be fine-tuned for more specific tasks
These foundational models will be decentralized, globally owned, and provide additional capabilities beyond hardware computing networks.
Aethir is equipped with enterprise GPUs and focuses on compute-intensive fields, mainly artificial intelligence, machine learning (ML), and cloud gaming. The containers in its network act as virtual endpoints for executing cloud-based applications, moving workloads from local devices into containers for a low-latency experience. To ensure high-quality service, it adjusts resources based on demand and location, moving GPUs closer to data sources.
Interesting points:
In addition to artificial intelligence and cloud gaming, Aethir has also expanded into cloud phone services and partnered with APhone to launch a decentralized cloud smartphone.
Extensive partnerships with major Web2 companies including NVIDIA, Super Micro, HPE, Foxconn and Well Link
Multiple partners in Web3, such as CARV, Magic Eden, Sequence, Impossible Finance, etc.
Phala Network acts as an execution layer for Web3 AI solutions. Its blockchain is a trustless cloud computing solution that addresses privacy concerns through its Trusted Execution Environment (TEE) design. Rather than serving as a computation layer for AI models, its execution layer enables on-chain smart contracts to control AI agents.
Interesting points:
Acts as a co-processor protocol for verifiable computation, while also enabling AI agents to use on-chain resources
Its AI agent contracts can access top large language models such as OpenAI, Llama, Claude, and Hugging Face models through Redpill
In the future, multiple proof systems will be included, including zk-proofs, multi-party computing (MPC), and fully homomorphic encryption (FHE).
Plans to support the H100 and other TEE-capable GPUs in the future to improve computing power
Project Comparison
| | Render | Akash | io.net | Gensyn | Aethir | Phala |
| --- | --- | --- | --- | --- | --- | --- |
| Hardware | GPU & CPU | GPU & CPU | GPU & CPU | GPU | GPU | CPU |
| Business Focus | Graphics rendering and AI | Cloud computing, rendering and AI | AI | AI | AI, cloud gaming and telecom | On-chain AI execution |
| AI Task Type | Inference | Both | Both | Training | Training | Execution |
| Job Pricing | Performance-based pricing | Reverse auction | Market pricing | Market pricing | Bidding system | Stake-based calculation |
| Blockchain | Solana | Cosmos | Solana | Gensyn | Arbitrum | Polkadot |
| Data Privacy | Encryption & hashing | mTLS authentication | Data encryption | Secure mapping | Encryption | TEE |
| Work Fees | 0.5-5% per job | 20% USDC, 4% AKT | 2% USDC, 0.25% reserve fees | Low fees | 20% per session | Proportional to staked amount |
| Security | Proof of Render | Proof of Stake | Proof of Computation | Proof of Stake | Proof of Rendering Capacity | Inherited from relay chain |
| Proof of Completion | - | - | Time-lock proof | Proof of Learning | Proof of Rendered Work | TEE attestation |
| Quality Assurance | Dispute resolution | - | - | Verifiers and whistleblowers | Checker nodes | Remote attestation |
| GPU Clusters | No | Yes | Yes | Yes | Yes | No |
Significance
Availability of Clusters and Parallel Computing
Distributed computing frameworks enable GPU clusters, which provide more efficient training without compromising model accuracy while enhancing scalability. Training more complex AI models demands powerful computing, which often must rely on distributed computing. To put this in perspective, OpenAI's GPT-4 model has more than 1.8 trillion parameters and was trained over 3-4 months using approximately 25,000 Nvidia A100 GPUs across 128 clusters.
Previously, Render and Akash offered only single-purpose GPUs, which may have limited market demand for their hardware. Most major projects, however, have now integrated clusters for parallel computing. io.net has partnered with projects such as Render, Filecoin, and Aethir to bring more GPUs into its network, and it successfully deployed more than 3,800 clusters in Q1 2024. Although Render does not support clusters, it works in a similar fashion: it splits a single rendering job so that multiple nodes each process a different range of frames simultaneously. Phala currently supports only CPUs, but allows CPU workers to be clustered.
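The core idea behind clustered parallel training is simple: each node computes gradients on its own shard of the data, and the gradients are averaged (an all-reduce) so every node applies the same update. The following toy simulation is purely illustrative and is not any of these projects' actual frameworks; in practice this step is handled by libraries such as PyTorch DDP.

```python
# Toy simulation of data-parallel training across two "nodes":
# each node computes a gradient on its shard, the gradients are
# averaged (a stand-in for all-reduce), and the shared weight is updated.

def local_gradient(weight, shard):
    # Gradient of mean squared error for the model y = weight * x.
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Stand-in for the collective op that averages gradients across nodes.
    return sum(grads) / len(grads)

def distributed_step(weight, shards, lr=0.01):
    grads = [local_gradient(weight, s) for s in shards]  # parallel in practice
    return weight - lr * all_reduce_mean(grads)

# Two nodes, each holding half of a dataset where the true weight is 3.
data = [(x, 3 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]

w = 0.0
for _ in range(200):
    w = distributed_step(w, shards)

print(round(w, 2))  # converges toward 3.0
```

The key property is that averaging per-shard gradients gives the same update direction as training on the full dataset, which is why clustering scales training without hurting accuracy.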
Incorporating cluster frameworks into the AI workflow network is important, but the number and type of cluster GPUs required to meet the needs of AI developers is a separate issue that we will discuss in a later section.
Data Privacy
Developing AI models requires the use of large datasets, which may come from a variety of sources and in various forms. Sensitive datasets such as personal medical records and user financial data may be at risk of being exposed to model providers. Samsung internally banned the use of ChatGPT due to concerns that uploading sensitive code to the platform would violate privacy, and Microsoft's 38TB private data leak further highlighted the importance of taking adequate security measures when using AI. Therefore, having a variety of data privacy methods is critical to returning data control to data providers.
Most of the projects covered use some form of data encryption to protect data privacy. Data encryption ensures that data transfer from data providers to model providers (data recipients) in the network is protected. Render uses encryption and hashing when publishing rendering results back to the network, while io.net and Gensyn employ some form of data encryption. Akash uses mTLS authentication to only allow tenants to receive data from providers they choose.
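Hashing protects published results by letting the recipient detect tampering in transit. The sketch below is a minimal illustration of that idea, not Render's actual scheme, which is not specified here.

```python
# Minimal sketch of integrity-checking a published result with a hash:
# the worker publishes the result's SHA-256 digest; the consumer
# recomputes it to detect any modification in transit.

import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

rendered_frame = b"...pixel data..."      # placeholder result payload
published_hash = digest(rendered_frame)   # published alongside the result

# Receiver verifies integrity of what it received.
received = rendered_frame
assert digest(received) == published_hash

# Any tampering changes the digest and is detected.
tampered = received + b"!"
print(digest(tampered) == published_hash)  # False
```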
Notably, io.net recently partnered with Mind Network to launch fully homomorphic encryption (FHE), which allows encrypted data to be processed without decrypting it first. This innovation can protect data privacy better than existing encryption technologies by enabling data to be transmitted securely for training purposes without revealing identities or data content.
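Real FHE schemes are far more sophisticated, but the underlying idea of computing on ciphertexts can be illustrated with textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This toy (with a deliberately tiny, insecure key) is only an illustration of the homomorphic property, not FHE and not any project's implementation.

```python
# Toy illustration of a homomorphic property: with unpadded ("textbook")
# RSA, a worker can multiply two ciphertexts without ever seeing the
# plaintexts. NOT secure and NOT full FHE -- illustration only.

n, e = 3233, 17          # tiny textbook RSA public key (p=61, q=53)
d = 2753                 # matching private key

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
ca, cb = encrypt(a), encrypt(b)

# An untrusted worker computes on encrypted values it cannot read.
c_product = (ca * cb) % n

print(decrypt(c_product))  # 42 == 7 * 6
```

FHE generalizes this so that arbitrary computations, not just multiplication, can be performed on encrypted data.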
Phala Network introduces TEEs, secure areas within the main processor of connected devices. Through this isolation mechanism, external processes cannot access or modify the data regardless of their permission level, even with physical access to the machine. In addition to TEE, it also incorporates zk-proofs through its zkDCAP validator and jtee command-line interface for integration with the RiscZero zkVM.
Proof of Completion and Quality Check
The GPUs these projects supply provide computing power for a range of services. Since those services range from graphics rendering to AI computation, the final quality of the work may not always meet the user's standards. A proof of completion can show that the specific GPU a user rented was actually used to run the requested service, and quality checks benefit the user who requested the work.
Once computation is complete, both Gensyn and Aethir generate proofs showing that the work was done, while io.net's proof indicates that the rented GPU's performance was fully utilized without issues. Both Gensyn and Aethir also run quality checks on completed computations: Gensyn uses verifiers to re-run parts of the work and check it against the generated proof, with whistleblowers acting as a further layer of checks on the verifiers, while Aethir uses checker nodes to determine service quality and penalizes subpar service. Render recommends a dispute resolution process, slashing a node if the review committee finds problems with it. Phala generates a TEE attestation on completion, ensuring the AI agent performed the required actions on-chain.
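A staking-and-slashing verification game of the kind described above can be sketched in a few lines. Everything here is an illustrative assumption, including the names, the stake and penalty amounts, and the sampling policy; it is not Gensyn's or Aethir's actual protocol.

```python
# Hypothetical sketch of a verify-and-slash scheme: a provider stakes
# tokens and reports results; a validator re-runs a sample of the work
# and slashes the stake on any mismatch. All values are illustrative.

import random

def task(x):
    return x * x + 1  # the outsourced computation

class Provider:
    def __init__(self, stake, honest=True):
        self.stake = stake
        self.honest = honest

    def run(self, inputs):
        results = [task(x) for x in inputs]
        if not self.honest:
            results[0] += 1  # a dishonest provider corrupts one result
        return results

def validate(provider, inputs, results, sample_size, penalty=50):
    # Re-run a sample of the work and compare against the claimed results.
    for i in random.sample(range(len(inputs)), sample_size):
        if task(inputs[i]) != results[i]:
            provider.stake -= penalty  # slash the stake on a mismatch
            return False
    return True

inputs = list(range(10))

honest = Provider(stake=100)
ok_honest = validate(honest, inputs, honest.run(inputs), sample_size=3)

cheater = Provider(stake=100, honest=False)
# Full recheck here so the corrupted result is caught deterministically.
ok_cheater = validate(cheater, inputs, cheater.run(inputs), sample_size=10)

print(ok_honest, honest.stake)    # True 100
print(ok_cheater, cheater.stake)  # False 50
```

The economic point is that re-running only a sample keeps verification cheap, while the threat of losing the stake makes cheating unprofitable in expectation.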
Hardware Statistics
| | Render | Akash | io.net | Gensyn | Aethir | Phala |
| --- | --- | --- | --- | --- | --- | --- |
| Number of GPUs | 38177 | - | - | - | - | - |
| Number of CPUs | 5433 | - | - | - | - | 30000+ |
| H100/A100 count | - | 150+ | 2000+ | - | 2000+ | - |
| H100 fee/hour | - | $1.46 | $1.19 | - | - | - |
| A100 fee/hour | - | $1.37 | $1.50 | $0.55 (estimated) | $0.33 (estimated) | - |
Requirements for High-Performance GPUs
Since AI model training requires top-performing GPUs, developers tend to use GPUs like Nvidia's A100 and H100, which deliver the best quality despite the latter's high market price. The A100 not only trains all workloads but does so faster, which shows how much the market values this hardware. With inference performance 4x faster than the A100, the H100 is now the GPU of choice, especially for large companies training their own LLMs.
For decentralized GPU market providers to compete with their Web2 peers, it is not only important to offer lower prices, but also to meet the actual needs of the market. In 2023, Nvidia shipped more than 500,000 H100s to centralized large technology companies , making it costly and difficult to acquire as much equivalent hardware to compete with large cloud providers. Therefore, considering the amount of hardware these projects can bring to their networks at a low cost is important to expand these services to a larger customer base.
While each project has a presence in AI and ML computing, they differ in the compute capacity they can provide. Akash has just over 150 H100 and A100 units in total, while io.net and Aethir each have more than 2,000. Pre-training an LLM or generative model from scratch typically requires at least 248 to more than 2,000 GPUs in a cluster, so the latter two projects are better suited for large-scale model computing.
Depending on the cluster size required by such developers, the cost of these decentralized GPU services on the market today is already much lower than centralized GPU services. Gensyn and Aethir both claim to be able to rent A100-equivalent hardware for less than $1 per hour, but this still needs to be proven over time.
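A quick back-of-envelope calculation shows the scale of the potential savings. The $0.55/hour figure is Gensyn's estimated A100 rate from the table above; the $3.50/hour centralized benchmark and the 1,000-GPU cluster size are assumptions for illustration only (on-demand A100s at major clouds have commonly been quoted in the $3-4/hour range).

```python
# Back-of-envelope monthly cost of renting an A100 cluster at
# decentralized vs. assumed centralized rates. Illustrative only.

HOURS_PER_MONTH = 24 * 30

def monthly_cost(rate_per_hour, gpus):
    return rate_per_hour * gpus * HOURS_PER_MONTH

gpus = 1000  # assumed mid-sized pre-training cluster

decentralized = monthly_cost(0.55, gpus)  # Gensyn's estimated A100 rate
centralized = monthly_cost(3.50, gpus)    # assumed centralized benchmark

print(f"decentralized: ${decentralized:,.0f}/month")  # $396,000
print(f"centralized:   ${centralized:,.0f}/month")    # $2,520,000
print(f"savings:       {1 - decentralized / centralized:.0%}")
```

Even allowing for wide variation in the assumed centralized rate, the gap is large enough to matter at cluster scale.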
Network-connected GPU clusters offer large GPU counts at a lower hourly cost, but they are constrained in memory and bandwidth compared to NVLink-connected GPUs. NVLink enables direct communication between multiple GPUs without transferring data through the CPU, achieving high bandwidth and low latency. NVLink-connected GPUs are therefore best suited for LLMs with many parameters and large datasets, which demand high performance and intensive computation.
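Some rough arithmetic shows why interconnect bandwidth matters so much for large models. All figures below are ballpark assumptions for illustration: NVLink-class links run on the order of hundreds of GB/s while datacenter Ethernet runs on the order of tens, and a 10 GB gradient payload stands in for a large model in fp16.

```python
# Rough comparison of the time to move one full copy of a model's
# gradients between GPUs over different interconnects. All bandwidth
# figures are ballpark assumptions, not vendor specifications.

payload_gb = 10          # e.g. roughly 5B parameters in fp16
nvlink_gbps = 300        # assumed NVLink-class bandwidth, GB/s
network_gbps = 12.5      # assumed 100 Gbit/s Ethernet = 12.5 GB/s

t_nvlink = payload_gb / nvlink_gbps
t_network = payload_gb / network_gbps

print(f"NVLink:  {t_nvlink:.3f} s per sync")
print(f"Network: {t_network:.3f} s per sync")
print(f"Network is {t_network / t_nvlink:.0f}x slower")
```

Since this synchronization happens every training step, a slower interconnect can dominate total training time for models with many parameters, which is why NVLink-connected clusters retain an edge for the largest LLMs.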
Still, decentralized GPU networks offer powerful computing power and scalability for distributed computing tasks, serving users with dynamic workloads or those who need flexibility and the ability to spread work across multiple nodes. By providing a more cost-effective alternative to centralized cloud and data providers, these networks break the oligopoly and open up opportunities for building more AI and ML use cases.
Providing Consumer-Grade GPUs and CPUs
While GPUs are the primary processing units required for rendering and computation, CPUs also play an important role in training AI models. CPUs handle several parts of training, from data preprocessing to memory resource management, which is very useful for developers building models. Consumer-grade GPUs can also serve less intensive tasks, such as fine-tuning already pre-trained models or training smaller models on smaller datasets at a more affordable cost.
While projects like Gensyn and Aethir focus primarily on enterprise GPUs, other projects like Render, Akash, and io.net can also serve this part of the market, given that over 85% of consumer GPU resources sit idle. Offering these options lets each project carve out its own market niche, whether focusing on large-scale intensive computing, more general small-scale rendering, or a mix of the two.
Conclusion
The AI DePIN field is still relatively new and faces its own challenges. These solutions have drawn criticism over their feasibility and have encountered setbacks. For example, io.net was accused of inflating the GPU count on its network, an issue it later addressed by introducing a proof-of-work process to verify devices and prevent Sybil attacks.
Despite this, there has been a significant increase in the number of tasks and hardware executed in these decentralized GPU networks. The increasing volume of tasks executed on these networks highlights the growing demand for alternatives to Web2 cloud provider hardware resources. At the same time, the proliferation of hardware providers in these networks highlights previously underutilized supply. This trend further demonstrates the product-market fit of AI DePIN networks as they effectively address both demand and supply challenges.
Looking ahead, the trajectory of AI development points to a thriving multi-trillion dollar market, and we believe these decentralized GPU networks will play a key role in providing developers with cost-effective computing alternatives. By leveraging their networks to continually bridge the gap between demand and supply, these networks will make a significant contribution to the future landscape of AI and computing infrastructure.