Good morning! It's finally here.
The full paper is quite long, so to make it easier to digest (and to avoid exceeding email size limits), I've decided to split it into several parts and share them over the next month. Now, let's get started!
A huge miss that I have never been able to forget.
It still bothers me to this day: it was an obvious opportunity that anyone paying attention to the market could see, yet I missed it entirely without investing a penny.
No, this is not the next Solana killer, nor is it a memecoin with a dog in a funny hat.
It's...NVIDIA.
In just one year, NVIDIA's market cap soared from $1 trillion to $3 trillion, its stock price tripled, and it even outperformed Bitcoin during the same period.
Of course, part of this is driven by the AI boom. But more importantly, the growth rests on a solid foundation in reality: NVIDIA's revenue in fiscal 2024 reached $60 billion, up 126% from 2023. Behind this remarkable growth are the world's big technology companies rushing to buy GPUs to get ahead in the artificial general intelligence (AGI) arms race.
Why did I miss it?
For the past two years, I have been completely focused on the cryptocurrency space and have not been paying attention to what is happening in the AI space. This was a huge mistake that I still regret.
But this time, I won't make the same mistake again.
Today's Crypto AI gives me a sense of déjà vu.
We are on the brink of an explosion of innovation that bears striking similarities to the California Gold Rush of the mid-1800s: industries and cities sprang up overnight, infrastructure developed at breakneck speed, and those who dared to take risks reaped huge profits.
Like NVIDIA in its early days, Crypto AI will seem so obvious in retrospect.
Crypto AI: An investment opportunity with unlimited potential
In the first part of my paper, I explained why Crypto AI is the most exciting opportunity today, for both investors and developers. Here are the key takeaways:
Many people still see it as a “castle in the air”.
Crypto AI is currently in its early stages and may still be 1-2 years away from the peak of hype.
This sector has a growth potential of at least $230 billion.
At its core, Crypto AI is about marrying artificial intelligence with crypto infrastructure. This makes it more likely to follow the exponential growth trajectory of AI rather than that of the broader crypto market. So to stay ahead, you need to follow the latest AI research on arXiv and talk to founders who believe they are building the next big thing.
Four core areas of Crypto AI
In the second part of my paper, I will focus on analyzing the four most promising subfields in Crypto AI:
1. Decentralized computing: model training, inference, and GPU marketplaces
2. Data networks
3. Verifiable AI
4. On-chain AI agents
This article is the result of several weeks of in-depth research and communication with founders and teams in the Crypto AI field. It is not a detailed analysis of each field, but a high-level roadmap designed to stimulate your curiosity, help you optimize your research direction, and guide your investment decisions.
Crypto AI's Ecosystem Blueprint
I picture the decentralized AI ecosystem as a layered structure: at one end, decentralized computing and open data networks provide the foundation for training decentralized AI models.
All inference inputs and outputs are verified through cryptography, cryptoeconomic incentives, and evaluation networks. These verified results flow to autonomous AI agents on the chain, as well as consumer and enterprise AI applications that users can trust.
The coordination network connects the entire ecosystem, enabling seamless communication and collaboration.
In this vision, any team engaged in AI development can access one or more layers of the ecosystem according to their needs. Whether it is using decentralized computing for model training or ensuring high-quality output through evaluation networks, this ecosystem provides a variety of options.
Thanks to the composability of blockchain, I believe we are moving towards a modular future where each layer will be highly specialized and protocols will be optimized for specific functions rather than a one-size-fits-all solution.
In recent years, a large number of startups have emerged at each layer of the decentralized AI technology stack, showing a "Cambrian" explosion of growth. Most of these companies have only been established for 1-3 years. This shows that we are still in the early stages of this industry.
The most comprehensive and up-to-date map of the Crypto AI startup ecosystem I’ve seen is maintained by Casey and her team at topology.vc, an indispensable resource for anyone wanting to track developments in this space.
As I delve deeper into the various sub-fields of Crypto AI, I always wonder: How big is the opportunity here? I’m not looking at small markets, but rather at the big opportunities that can scale to hundreds of billions of dollars.
1. Market size
When assessing market size, I ask myself: Is this sub-segment creating an entirely new market, or is it disrupting an existing one?
Take decentralized computing as an example: it is a classic disruptive category, so we can estimate its potential from the existing cloud computing market, which is currently worth about $680 billion and is expected to reach $2.5 trillion by 2032.
In contrast, a completely new market like AI agents is harder to quantify. With no historical data, sizing it comes down to intuition and educated guesses about the problems it can solve. But we should be wary: sometimes what looks like a brand-new market is really just a solution in search of a problem.
2. Timing
Timing is key to success. While technology generally improves and becomes cheaper over time, the pace of progress varies greatly from field to field.
How mature is the technology in a particular subfield? Is it mature enough for mass adoption? Or is it still in the research phase, years away from real-world adoption? Timing determines whether a field deserves immediate attention or should be left on the sidelines.
Take Fully Homomorphic Encryption (FHE) as an example: its potential is undeniable, but the current technical performance is still too slow to achieve large-scale application. It may take several years before we see it enter the mainstream market. Therefore, I will prioritize areas where the technology is close to large-scale application and focus my time and energy on those opportunities that are gathering momentum.
If you plotted these subfields on a “market size vs. timing” chart, it might look something like this. Note that this is just a conceptual sketch, not a strict guide. There are also complexities within each field - for example, in verifiable inference, different approaches (such as zkML and opML) are at different stages of technical maturity.
Despite this, I firmly believe that the future scale of AI will be extremely large. Even areas that seem "niche" today have the potential to evolve into a major market in the future.
At the same time, we must also recognize that technological progress is not always linear - it often advances in leaps and bounds. When new technological breakthroughs emerge, my views on market timing and size will also adjust accordingly.
Based on the above framework, we will then break down each sub-field of Crypto AI one by one to explore their development potential and investment opportunities.
Area 1: Decentralized Computing
Summary
Decentralized computing is the core pillar of the entire decentralized AI.
The GPU market, decentralized training, and decentralized inference are closely related and develop in synergy.
The supply side mainly comes from GPU devices of small and medium-sized data centers and ordinary consumers.
The demand side is currently small but growing, mainly consisting of price-sensitive, latency-tolerant users, as well as smaller AI startups.
The biggest challenge facing the current Web3 GPU market is how to make these networks truly run efficiently.
Coordinating the use of GPUs in a decentralized network requires advanced engineering techniques and robust network architecture design.
1.1 GPU Market/Computing Network
Currently, some Crypto AI teams are building decentralized GPU networks to utilize the world's underutilized computing resource pool to address the current situation where GPU demand far exceeds supply.
The core value proposition of these GPU marketplaces can be summarized in three points:
1. Lower cost: computing can be up to 90% cheaper than AWS. The savings come from two sources: cutting out middlemen and opening up the supply side. These marketplaces give users access to the lowest-marginal-cost compute in the world.
2. Flexibility: no long-term contracts, no identity verification (KYC), and no waiting for approval.
3. Censorship resistance.
To address the supply side of the market, these marketplaces source computing resources from:
Enterprise GPUs: high-performance cards such as the A100 and H100. These often come from small and medium-sized data centers (which struggle to find enough customers on their own) or from Bitcoin miners looking to diversify their revenue streams. Some teams also tap large government-funded infrastructure projects that built out data centers as part of technology-development initiatives. These suppliers are typically incentivized to keep their GPUs connected to the network to help offset depreciation costs.
Consumer GPUs: Millions of gamers and home users connect their computers to the network and earn income through token rewards.
Currently, the demand side of decentralized computing mainly includes the following types of users:
1. Price-sensitive, latency-tolerant users: researchers on tight budgets, independent AI developers, and the like. They care more about cost than real-time performance and often cannot afford the rates of traditional cloud giants such as AWS or Azure. Targeted marketing to this group is important.
2. Small AI startups: these companies need flexible, scalable computing resources but do not want to lock into long-term contracts with large cloud providers. Reaching this group takes business-development effort, since they are actively looking for alternatives to traditional cloud computing.
3. Crypto AI startups: These companies are developing decentralized AI products, but if they do not have their own computing resources, they need to rely on these decentralized networks.
4. Cloud gaming: Although not directly related to AI, cloud gaming’s demand for GPU resources is growing rapidly.
The key thing to remember is that developers always prioritize cost over reliability.
The real challenge: demand, not supply
Many startups view the size of their GPU supply network as a sign of success, but in reality, it is just a vanity metric.
The real bottleneck is on the demand side, not the supply side. The key measure of success is not how many GPUs are in the network, but the utilization of the GPUs and the number of GPUs that are actually rented.
Token incentives are very effective in activating the supply side and can quickly attract resources to join the network. But they do not directly solve the problem of insufficient demand. The real test is whether the product can be polished to a good enough state to stimulate potential demand.
As Haseeb Qureshi (from Dragonfly) said, this is the key.
Making computing networks truly work
Currently, the biggest challenge facing the Web3 distributed GPU market is actually how to make these networks run truly efficiently.
This is not an easy thing to do.
Coordinating GPUs in a distributed network is an extremely complex task, involving multiple technical difficulties, such as resource allocation, dynamic workload expansion, node and GPU load balancing, latency management, data transmission, fault tolerance, and how to handle diverse hardware devices distributed around the world. These problems are layered together, posing a huge engineering challenge.
Solving these problems requires very solid engineering and technical capabilities, as well as a robust and well-designed network architecture.
To better understand this, consider Google's Kubernetes system. Kubernetes is widely considered the gold standard in container orchestration, and it automates tasks such as load balancing and scaling in distributed environments, which are very similar to the challenges faced by distributed GPU networks. It is worth noting that Kubernetes was developed based on Google's more than ten years of experience in distributed computing, and even so, it took several years of continuous iteration to perfect it.
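To make the orchestration problem concrete, here is a deliberately tiny, assumption-laden sketch in Python of just one slice of it: placing jobs onto heterogeneous nodes. The node names, job sizes, and first-fit policy are all hypothetical illustrations, not any real scheduler; a production network must also handle latency, pricing, failures, and preemption, which is where the hard engineering lives.

```python
# Toy sketch of one orchestration sub-problem: greedy first-fit placement of jobs
# onto heterogeneous nodes. Names and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_gpus: int

def schedule(jobs: dict[str, int], nodes: list[Node]) -> dict[str, str]:
    placements = {}
    for job, gpus_needed in jobs.items():
        for node in nodes:                     # first-fit: first node with enough free GPUs
            if node.free_gpus >= gpus_needed:
                node.free_gpus -= gpus_needed
                placements[job] = node.name
                break
        else:
            placements[job] = "unschedulable"  # real systems must queue, split, or preempt
    return placements

nodes = [Node("datacenter-a", 8), Node("gamer-pc-1", 1), Node("miner-rack", 4)]
print(schedule({"finetune-7b": 4, "inference-batch": 1, "train-70b": 16}, nodes))
```

Even in this toy version, the large job simply cannot be placed; deciding whether to queue it, split it across nodes, or reshuffle existing work is exactly the kind of complexity Kubernetes took years to get right.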
Currently, some GPU computing markets that have been launched can handle small-scale workloads, but once they try to scale up to a larger scale, problems will be exposed. This may be because their architectural design has fundamental flaws.
Credibility issues: challenges and opportunities
Another important issue that decentralized computing networks need to solve is how to ensure the credibility of nodes, that is, how to verify whether each node actually provides the computing power it claims. Currently, this verification process mostly relies on the network's reputation system, and sometimes computing providers are ranked according to reputation scores. Blockchain technology has a natural advantage in this area because it can implement a trustless verification mechanism. Some startups, such as Gensyn and Spheron, are exploring how to solve this problem through a trustless approach.
Currently, many Web3 teams are still working hard to address these challenges, which also means that the opportunities in this field are still very broad.
The size of the decentralized computing market
So, how big is the market for decentralized computing networks?
At present, it may only account for a tiny fraction of the global cloud computing market (which is estimated to be between $680 billion and $2.5 trillion in size). However, as long as decentralized computing costs less than traditional cloud service providers, there will be demand, even if there is some additional friction in the user experience.
I expect the cost of decentralized computing to stay low in the short to medium term, for two reasons: token subsidies, and supply unlocked from providers who are not especially sensitive to the income. For example, if I can rent out my gaming laptop for extra income, I'd be happy whether it earns $20 or $50 a month.
The true growth potential of decentralized computing networks, and the significant expansion of their market size, will rely on several key factors:
1. Feasibility of decentralized AI model training: When decentralized networks can support the training of AI models, it will bring huge market demand.
2. The explosion of inference demand: As the demand for AI inference surges, existing data centers may not be able to meet this demand. In fact, this trend has already begun to emerge. NVIDIA's Jensen Huang said that the demand for inference will increase "a billion times."
3. Introduction of Service Level Agreements (SLAs): Currently, decentralized computing mainly provides services in a "best effort" manner, and users may face uncertainty in service quality (such as uptime). With SLAs, these networks can provide standardized reliability and performance indicators, thereby breaking down key barriers to enterprise adoption and making decentralized computing a viable alternative to traditional cloud computing.
Decentralized, permissionless computing is the foundational layer of the decentralized AI ecosystem and one of its most important infrastructures.
Although the supply chain for hardware such as GPUs is expanding, I believe we are still at the dawn of the age of artificial intelligence. Future demand for computing power will be effectively endless.
Watch for a critical inflection point that could trigger a repricing in the GPU market—and that point could come soon.
Other notes:
The pure GPU marketplace is highly competitive. Competition comes not only from other decentralized platforms, but also from fast-rising Web2 AI cloud platforms such as Vast.ai and Lambda.
Small nodes (e.g., 4× H100 GPUs) see limited demand because of their restricted use cases, while vendors selling large clusters are almost impossible to find because those clusters remain in such high demand.
Will the supply of computing resources for decentralized protocols be consolidated by a dominant player, or will it continue to be dispersed across multiple markets? I prefer the former, and believe that the final result will be a power-law distribution, as consolidation tends to improve the efficiency of infrastructure. Of course, this process takes time, and during this period, the fragmentation and confusion of the market will continue.
Developers prefer to focus on building applications rather than spending time dealing with deployment and configuration issues. Therefore, the computing market needs to simplify these complexities and minimize the friction for users to obtain computing resources.
1.2 Decentralized Training
Summary
If the scaling laws hold, it will eventually become physically infeasible to train the next generation of frontier AI models in a single data center.
Training AI models requires a large amount of data transfer between GPUs, and the low interconnection speed of distributed GPU networks is often the biggest technical obstacle.
Researchers are exploring multiple solutions and have made some breakthroughs (such as OpenDiLoCo and DisTrO). These technological innovations will have a cumulative effect and accelerate the development of decentralized training.
The future of decentralized training will likely be more focused on small, specialized models designed for specific domains rather than cutting-edge models for AGI.
With the popularity of models such as OpenAI's o1, demand for inference will grow explosively, which also creates huge opportunities for decentralized inference networks.
Imagine this: a massive, world-changing AI model developed not by secretive cutting-edge labs, but by millions of ordinary people working together. Gamers’ GPUs are no longer just used to render the cool graphics of Call of Duty, but are used to support a much greater goal — an open-source, collectively owned AI model without any centralized gatekeepers.
In such a future, infrastructure-scale AI models will no longer be the exclusive domain of top laboratories, but the result of universal participation.
But back to reality, most of the heavyweight AI training is still concentrated in centralized data centers, and this trend is unlikely to change for some time to come.
Companies like OpenAI are continually expanding their massive GPU clusters. Elon Musk recently revealed that xAI is close to completing a data center with the equivalent of 200,000 H100 GPUs.
But the problem is not just the number of GPUs. In its 2022 PaLM paper, Google proposed a key metric - Model FLOPS Utilization (MFU) - to measure the actual utilization of the GPU's maximum computing power. Surprisingly, this utilization is usually only 35-40%.
Why is it so low? Although GPU performance has been increasing rapidly with the advancement of Moore's Law, the improvement of network, memory and storage devices has lagged far behind, forming a significant bottleneck. As a result, GPUs are often idle, waiting for data transfer to complete.
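To make MFU concrete, here is a rough back-of-the-envelope sketch. The 6-FLOPs-per-parameter-per-token rule is the standard approximation for transformer training; the cluster size, token throughput, and peak-FLOPS figures below are purely illustrative assumptions, not numbers from any real run.

```python
# Back-of-the-envelope Model FLOPS Utilization (MFU) estimate for a transformer
# training run. All concrete numbers below are hypothetical.

def model_flops_utilization(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    achieved_flops = 6 * params * tokens_per_second   # useful compute actually delivered
    peak_flops = num_gpus * peak_flops_per_gpu        # theoretical hardware ceiling
    return achieved_flops / peak_flops

# Hypothetical cluster: 1,024 A100s (~312 TFLOPS dense BF16 each) training a 70B model
mfu = model_flops_utilization(params=70e9,
                              tokens_per_second=275_000,
                              num_gpus=1024,
                              peak_flops_per_gpu=312e12)
print(f"MFU = {mfu:.0%}")   # about 36%, in the 35-40% range cited above
```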
Currently, there is only one fundamental reason why AI training is highly centralized - efficiency.
Training large models relies on the following key technologies:
Data parallelism: Split the dataset into multiple GPUs for parallel processing, thereby speeding up the training process.
Model parallelism: Distribute different parts of the model across multiple GPUs to overcome memory limitations.
These techniques require frequent data exchanges between GPUs, so the interconnect speed (i.e., the rate at which data can be transferred across the network) is critical.
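To see where that traffic comes from, here is a deliberately minimal simulation of synchronous data parallelism, assuming a toy linear model and in-process "workers" (plain NumPy, no real GPUs or training framework): each worker computes a gradient on its own data shard, and every single step requires a gradient exchange before anyone can move on.

```python
# Minimal simulation of synchronous data parallelism with a toy linear model.
import numpy as np

def local_gradient(w, X, y):
    # Gradient of mean squared error for the linear model y ~ X @ w
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
w = np.zeros(8)
# Each "worker" holds its own shard of the dataset (this is data parallelism)
shards = [(rng.normal(size=(64, 8)), rng.normal(size=64)) for _ in range(4)]

for step in range(100):
    grads = [local_gradient(w, X, y) for X, y in shards]  # computed independently per GPU
    avg_grad = np.mean(grads, axis=0)   # the all-reduce: every step, every worker must
    w -= 0.01 * avg_grad                # exchange gradients before the next step can start
```

That per-step gradient exchange is exactly the traffic that makes interconnect speed, and therefore physical proximity, so important.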
When cutting-edge AI models can cost as much as $1 billion to train, every bit of efficiency counts.
Centralized data centers, with their high-speed interconnect technology, can achieve fast data transfer between GPUs, thus significantly saving costs during training time. This is something that decentralized networks cannot currently match... at least not yet.
Overcoming slow interconnect speeds
If you talk to people working in the AI field, many will say that decentralized training doesn’t work.
In a decentralized architecture, GPU clusters are not physically located in the same place, which results in slow data transfer between them and becomes a major bottleneck. The training process requires GPUs to synchronize and exchange data at every step. The greater the distance, the higher the latency. And higher latency means slower training and increased costs.
A training task that only takes a few days to complete in a centralized data center may take two weeks in a decentralized environment and cost more. This is obviously not feasible.
However, this situation is changing.
It is exciting to see that the research interest in distributed training is growing rapidly. Researchers are exploring multiple directions at the same time, as evidenced by the large number of research results and papers that have emerged recently. These technological advances will have a cumulative effect and accelerate the development of decentralized training.
In addition, testing in actual production environments is also crucial, as it can help us break through existing technological boundaries.
Currently, some decentralized training techniques are able to handle smaller models in low-speed interconnect environments, and cutting-edge research is working to scale these methods to larger models.
For example, Prime Intellect's OpenDiLoCo paper takes a practical approach: GPUs are divided into "islands", and each island completes 500 local training steps before synchronizing, reducing bandwidth requirements to roughly 1/500th of the original. The technique, originally researched by Google DeepMind for small models, has since been scaled to train a 10-billion-parameter model and was recently fully open sourced.
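Below is a simplified sketch of that "islands" idea, reusing the toy setup from the data-parallelism example above. It is not the actual OpenDiLoCo implementation (the real method uses separate inner and outer optimizers); it only illustrates how synchronizing once every 500 local steps, instead of every step, collapses the communication requirement.

```python
# Simplified local-steps-then-sync sketch (DiLoCo-style idea, toy version only).
import numpy as np

def local_gradient(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
global_w = np.zeros(8)
islands = [(rng.normal(size=(64, 8)), rng.normal(size=64)) for _ in range(4)]

SYNC_EVERY = 500                          # local steps between synchronizations
for outer_round in range(4):              # each outer round = one (rare) communication event
    deltas = []
    for X, y in islands:
        w = global_w.copy()
        for _ in range(SYNC_EVERY):       # cheap local compute, no network traffic at all
            w -= 0.01 * local_gradient(w, X, y)
        deltas.append(w - global_w)       # only this small update ever crosses the network
    global_w += np.mean(deltas, axis=0)   # "outer" update applied to the shared model
```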
Nous Research’s DisTrO framework goes a step further, reducing the need for inter-GPU communication by up to 10,000x through optimizer technology while successfully training a model with 1.2 billion parameters.
This momentum continues. Nous recently announced that they have completed pre-training of a 15 billion parameter model, and its loss curve and convergence speed even surpass the performance of traditional centralized training.
In addition, methods like SWARM Parallelism and DTFMHE are exploring how to train very large-scale AI models on different types of devices, even if these devices have different speeds and connection conditions.
Another challenge is how to manage the diversity of GPU hardware, especially consumer-grade GPUs commonly found in decentralized networks, which often have limited memory. This problem is gradually being addressed through model parallelism (distributing different layers of a model across multiple devices).
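As a toy illustration of that layer-splitting idea, here is a sketch under the assumption of a plain feed-forward stack and two simulated "devices" (NumPy stand-in only; real systems pipeline micro-batches so that neither device sits idle while the other works).

```python
# Naive pipeline/model parallelism sketch: each "device" holds only part of the model.
import numpy as np

rng = np.random.default_rng(2)
layers = [rng.normal(size=(512, 512)) * 0.05 for _ in range(8)]

# Device 0 holds layers 0-3, device 1 holds layers 4-7: neither needs memory
# for the full model, only for its own partition.
device_0, device_1 = layers[:4], layers[4:]

def forward(partition, x):
    for W in partition:
        x = np.maximum(x @ W, 0)   # simple ReLU layer
    return x

x = rng.normal(size=(1, 512))
activations = forward(device_0, x)       # runs on the first device...
output = forward(device_1, activations)  # ...only the activations cross the network
print(output.shape)
```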
The future of decentralized training
Currently, the model size of decentralized training methods is still far behind the most cutting-edge models (GPT-4 is reported to have nearly one trillion parameters, 100 times that of Prime Intellect's 10 billion parameter model). To achieve true scale, we need to make major breakthroughs in model architecture design, network infrastructure, and task allocation strategies.
But we can boldly imagine that in the future, decentralized training may be able to gather more GPU computing power than the largest centralized data center.
Pluralis Research (a team worth watching in the decentralized training space) believes that this is not only possible, but inevitable. Centralized data centers are limited by physical conditions, such as space and power supply, while decentralized networks can tap into nearly unlimited resources around the world.
Even NVIDIA's Jensen Huang mentioned that asynchronous decentralized training may be the key to unlocking the potential of AI expansion. In addition, distributed training networks also have stronger fault tolerance.
Therefore, in one possible future, the world’s most powerful AI models will be trained in a decentralized manner .
This vision is exciting, but I have reservations at this point. We need more strong evidence that decentralized training of very large models is technically and economically feasible.
I think the best use case for decentralized training may be in smaller, specialized open source models that are designed for specific use cases rather than competing with super large, cutting-edge models targeting AGI. Certain architectures, especially non-Transformer models, have proven themselves to be well suited to decentralized environments.
In addition, the Token incentive mechanism will also be an important part of the future. Once decentralized training becomes feasible at scale, Tokens can effectively incentivize and reward contributors, thereby promoting the development of these networks.
Although there is a long way to go, the current progress is encouraging. Breakthroughs in decentralized training will not only benefit decentralized networks, but will also bring new possibilities to large technology companies and top AI labs...
1.3 Decentralized Inference
Currently, most of AI's computing resources are focused on training large models. The top AI labs are locked in an arms race to develop the strongest foundation models and ultimately achieve AGI.
But I think this focus on training will gradually shift to inference in the coming years . As AI becomes more integrated into the applications we use every day—from healthcare to entertainment—the computing resources required to support inference will become enormous.
This trend is not groundless. Inference-time compute scaling has become a hot topic in the field of AI. OpenAI recently released a preview/mini version of its latest model o1 (codename: Strawberry), whose notable feature is that it "takes time to think". Specifically, it first analyzes what steps it needs to take to answer the question, and then completes these steps step by step.
This model is designed for more complex tasks that require planning, such as solving crossword puzzles, and can handle problems that demand deep reasoning. Although it generates responses more slowly, the results are more detailed and thoughtful. The trade-off is a high running cost: its inference fee is roughly 25 times that of GPT-4.
This trend shows that the next leap in AI performance will not only rely on training larger models, but will also rely on expanding computing power in the inference phase.
If you want to dig deeper, several studies have shown that:
Scaling inference compute through repeated sampling yields significant performance gains across many tasks (a minimal sketch follows below).
Inference-time compute also follows a scaling law of its own.
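Here is the minimal sketch referred to above: best-of-N repeated sampling, where extra inference-time compute buys a better final answer. The `generate` and `score` functions are hypothetical placeholders standing in for a real model call and a verifier or reward model.

```python
# Minimal best-of-N ("repeated sampling") sketch; generate/score are placeholders.
import random

def generate(prompt: str) -> str:
    # Placeholder: in practice, one sampled completion from an LLM
    return f"candidate-{random.randint(0, 9)}"

def score(prompt: str, answer: str) -> float:
    # Placeholder: in practice, a verifier, reward model, or test-suite pass rate
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    candidates = [generate(prompt) for _ in range(n)]        # extra inference-time compute...
    return max(candidates, key=lambda a: score(prompt, a))   # ...buys a better final answer

print(best_of_n("Plan the steps to solve this crossword clue: ..."))
```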
Once powerful AI models are trained, their inference workloads (i.e., the actual application stage) can be offloaded to decentralized computing networks. This approach is very attractive for the following reasons:
Lower resource requirements: inference requires far fewer resources than training. Once trained, a model can be compressed and optimized through techniques such as quantization, pruning, or distillation, and can even be split across ordinary consumer devices using tensor or pipeline parallelism. Inference does not require high-end GPUs.
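As a small illustration of one of those techniques, here is a naive post-training int8 weight quantization sketch. It is a toy version only; real deployments would rely on established tooling (e.g., GPTQ or bitsandbytes) and finer-grained, per-channel schemes.

```python
# Naive symmetric int8 post-training weight quantization (toy illustration).
import numpy as np

def quantize_int8(weights):
    scale = np.abs(weights).max() / 127.0   # map the largest weight onto the int8 range
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4096, 4096)).astype(np.float32)
q, scale = quantize_int8(w)
print("memory reduction:", w.nbytes / q.nbytes)                    # 4x (fp32 -> int8)
print("mean abs error:", np.abs(w - dequantize(q, scale)).mean())  # small reconstruction error
```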
This trend is already taking shape. For example, Exo Labs has found a way to run the 405-billion-parameter Llama 3 model on consumer hardware such as MacBooks and Mac Minis. By distributing inference across multiple devices, even large workloads can be served efficiently and cost-effectively.
Better user experience: placing compute closer to the user significantly reduces latency, which is critical for real-time applications such as gaming, augmented reality (AR), and self-driving cars, where every millisecond matters.
We can think of decentralized inference as a CDN (content delivery network) for AI: traditional CDNs serve website content quickly by connecting users to nearby servers, while decentralized inference uses nearby computing resources to generate AI responses with minimal delay. In this way, AI applications become more efficient, more responsive, and more reliable.
This trend is already beginning to emerge. Apple's latest M4 Pro chip has performance close to NVIDIA's RTX 3070 Ti, a high-performance GPU once reserved for hardcore gamers. Today, the hardware we use every day is becoming more and more capable of handling complex AI workloads.
The value of cryptocurrency
For decentralized inference networks to truly succeed, participants must be offered sufficiently attractive economic incentives. Computing nodes need to be fairly compensated for the compute they contribute, and the system must ensure rewards are distributed fairly and efficiently. Geographic diversity also matters: it reduces the latency of inference tasks and improves fault tolerance, strengthening overall stability.
So what's the best way to bootstrap a decentralized network? The answer is cryptocurrency.
Tokens are a powerful tool that aligns the interests of all participants and ensures that everyone is working towards the same goal: to scale the network and increase the value of the token.
In addition, tokens can greatly accelerate the growth of a network. They help solve the classic “chicken and egg” problem that many networks face in their early development. By rewarding early adopters, tokens can drive more people to participate in network construction from the beginning.
The success of Bitcoin and Ethereum has proven the effectiveness of this mechanism - they have gathered the largest pool of computing power on the planet.
Decentralized inference networks will be the next in line. Through their geographical diversity, these networks can reduce latency, improve fault tolerance, and bring AI services closer to users. And with the help of cryptocurrency-driven incentives, decentralized networks will scale much faster and more efficiently than traditional networks.
Cheers,
Teng Yan
In the next series of articles, we will take a deeper look at data networks and examine how they can help break through the data bottleneck facing AI.
Disclaimer
This article is for educational purposes only and does not constitute any financial advice. This is not an endorsement of buying or selling assets or financial decisions. Please always do your own research and exercise caution when making investment choices.