The AI boom brought by ChatGPT: How blockchain technology can solve the challenges and bottlenecks of AI development

As the decentralized machine learning ecosystem matures, there will likely be synergies between various computing and intelligent networks.

Written by: Sami Kassab, Messari

Compiled by: BlockTurbo

The field of generative artificial intelligence (AI) has been the undisputed hot spot for the past two weeks, with groundbreaking new releases and cutting-edge integrations emerging. OpenAI released the highly anticipated GPT-4 model, Midjourney released the latest V5 model, and Stanford released the Alpaca 7B language model. Meanwhile, Google rolled out generative AI across its Workspace suite, Anthropic launched its AI assistant Claude, and Microsoft integrated its powerful generative AI tool, Copilot, into its Microsoft 365 suite.

The pace of AI development and adoption is accelerating as businesses begin to realize the value of AI and automation and the need to adopt these technologies to remain competitive in the marketplace.

Although AI development appears to be progressing smoothly, there are still some underlying challenges and bottlenecks that need to be addressed. As more businesses and consumers embrace AI, bottlenecks in computing power are emerging. The amount of computing required for AI systems is doubling every few months, while the supply of computing resources struggles to keep pace. In addition, the cost of training large-scale AI models continues to soar, increasing by approximately 3,100% per year over the past decade.

The trend toward rising costs and resource requirements required to develop and train cutting-edge AI systems is leading to centralization, with only entities with large budgets able to conduct research and produce models. However, several crypto-based projects are building decentralized solutions to solve these problems using open computing and machine intelligence networks.

Artificial Intelligence (AI) and Machine Learning (ML) Fundamentals

The field of AI can be daunting, with technical terms like deep learning, neural networks, and underlying models adding to its complexity. For now, let's simplify these concepts for easier understanding.

  • Artificial intelligence is a branch of computer science concerned with developing algorithms and models that enable computers to perform tasks that require human intelligence, such as perception, reasoning, and decision-making;
  • Machine learning (ML) is a subset of AI that involves training algorithms to recognize patterns in data and make predictions based on those patterns;
  • Deep learning is a type of ML that uses neural networks: layers of interconnected nodes that work together to analyze input data and generate output.
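
These layered definitions can be made concrete with a tiny example. The sketch below is a minimal two-layer neural network forward pass in plain Python; the weights are random placeholders rather than trained values, so it only illustrates the layered structure that deep learning builds on.

```python
import random

def relu(x):
    # nonlinearity applied between layers
    return x if x > 0 else 0.0

def layer(inputs, weights):
    # each row of weights produces one output node (a weighted sum)
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def forward(inputs, w1, w2):
    hidden = [relu(v) for v in layer(inputs, w1)]  # hidden layer
    return layer(hidden, w2)                       # output layer

random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]  # 4 -> 8
w2 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(1)]  # 8 -> 1
out = forward([0.5, -0.2, 0.1, 0.9], w1, w2)
```

Training would adjust `w1` and `w2` to minimize error on a dataset; here they stay fixed, which is the difference between a model's structure and a trained model.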

Foundation models, such as ChatGPT and Dall-E, are large-scale deep learning models pre-trained on vast amounts of data. These models learn patterns and relationships in the data, allowing them to generate new content similar to the original input. ChatGPT is a language model that generates natural-language text, while Dall-E is an image model that generates novel images.

Issues for the AI and ML Industry

Advances in AI are primarily driven by three factors:

  • Algorithmic innovation: Researchers are constantly developing new algorithms and techniques that enable AI models to process and analyze data more efficiently and accurately.
  • Data: AI models rely on large datasets for training, enabling them to learn the patterns and relationships in the data.
  • Computation: The complex calculations required to train AI models demand substantial processing power.

However, two main problems hinder the development of artificial intelligence. Back in 2021, access to data was the number one challenge AI companies faced in AI development. Last year, computing-related issues overtook data as the top challenge, driven in particular by the inability to access computing resources on demand amid surging demand.

The second problem has to do with the inefficiency of algorithmic innovation. While researchers continue to make incremental improvements by building on previous models, the intelligence, or learned patterns, extracted by those earlier models is lost each time a new model is trained from scratch.

Let's delve deeper into these issues.

Computing Bottleneck

Training foundation machine learning models is resource-intensive, often involving large numbers of GPUs running for extended periods. For example, Stability.AI required 4,000 Nvidia A100 GPUs running in AWS's cloud to train its AI models, costing over $50 million a month. OpenAI's GPT-3, on the other hand, cost $12 million to train on 1,000 Nvidia V100 GPUs.

AI companies typically face two choices: invest in their own hardware and sacrifice scalability, or choose a cloud provider and pay top dollar. While larger companies can afford the latter option, smaller companies may not have that luxury. As the cost of capital rises, startups are being forced to cut back on cloud spending, even as the cost of scaling infrastructure for large cloud providers remains largely the same.

The high computational cost of AI presents a significant barrier to researchers and organizations pursuing advancements in the field. Currently, there is an urgent need for an affordable, on-demand serverless computing platform for ML work that does not exist in the world of traditional computing. Fortunately, several crypto projects are working on developing decentralized machine learning computing networks that can meet this need.

Inefficiency and lack of collaboration

More and more AI development is happening in secret at big tech companies rather than in academia. This trend has reduced collaboration in the field, with companies such as Microsoft-backed OpenAI and Google's DeepMind competing with each other and keeping their models private.

Lack of collaboration leads to inefficiencies. For example, if an independent research team wanted to develop a more powerful version of OpenAI's GPT-4, it would need to retrain the model from scratch, essentially relearning everything GPT-4 learned. Considering that training GPT-3 alone cost $12 million, this puts smaller ML research labs at a disadvantage and pushes the future of AI development further into the hands of big tech companies.

But what if researchers could build on existing models rather than starting from scratch, lowering the barrier to entry? And what if there were an open network that incentivized collaboration, acting as a free-market-governed coordination layer where researchers could use other models to train their own? The decentralized machine intelligence project Bittensor is building exactly this type of network.

Decentralized Computing Networks for Machine Learning

A decentralized computing network connects entities seeking computing resources to systems with spare computing power by incentivizing the contribution of CPU and GPU resources to the network. Since there is no additional cost for individuals or organizations to provide their idle resources, decentralized networks can offer lower prices compared to centralized providers.

There are two main types of decentralized computing networks: general purpose and special purpose. A general-purpose computing network operates like a decentralized cloud, providing computing resources for various applications. Purpose-built computing networks, on the other hand, are tailored for specific use cases. For example, a rendering network is a dedicated computing network focused on rendering workloads.

While most ML computing workloads can run on a decentralized cloud, some are better suited to special-purpose computing networks, as described below.

Machine Learning Computing Workloads

Machine learning can be broken down into four main computational workloads:

  • Data preprocessing: Preparing raw data and transforming it into a usable format for ML models, which typically involves activities such as data cleaning and normalization.
  • Training: ML models are trained on large datasets to learn the patterns and relationships in the data. During training, the model's parameters and weights are adjusted to minimize error.
  • Fine-tuning: ML models can be further optimized on smaller datasets to improve performance on specific tasks.
  • Inference: Running the trained and fine-tuned model to make predictions in response to user queries.
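
As a toy illustration of how these four workloads chain together, the sketch below reduces a "model" to a single parameter fit by averaging. The function names and numbers are illustrative stand-ins, not drawn from any project discussed here.

```python
def preprocess(raw):
    # cleaning + normalization: drop missing values, scale to [0, 1]
    vals = [v for v in raw if v is not None]
    peak = max(vals)
    return [v / peak for v in vals]

def train(data):
    # "learn" one parameter from the full dataset
    return sum(data) / len(data)

def fine_tune(weight, small_data, lr=0.5):
    # nudge the trained weight toward a smaller, task-specific dataset
    target = sum(small_data) / len(small_data)
    return weight + lr * (target - weight)

def infer(weight, query):
    # answer a user query with the trained, fine-tuned parameter
    return weight * query

data = preprocess([4, None, 8, 2, 10])   # -> [0.4, 0.8, 0.2, 1.0]
w = train(data)
w = fine_tune(w, [0.9, 1.0])
prediction = infer(w, 2.0)
```

In a real pipeline each stage is a heavy computation: preprocessing runs over huge datasets, training adjusts billions of weights, and inference is served per request, which is why the stages can be priced and distributed separately.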

Data preprocessing, fine-tuning, and inference workloads are well-suited to decentralized cloud platforms like Akash, Cudos, or iExec. The decentralized storage network Filecoin, however, is particularly well-suited for data preprocessing thanks to its recent upgrade enabling the Filecoin Virtual Machine (FVM). The FVM allows computation over data stored on the network, offering a more efficient solution for entities already using Filecoin for storage.

Dedicated Computing Networks for Machine Learning

Training, however, requires a dedicated computing network because of two challenges: parallelization and verification.

ML model training is state-dependent: the result of each computation depends on the current state of the calculation, which makes it harder to use a distributed GPU network. A network specifically designed for parallelized ML training is therefore required.

The more important challenge concerns verification. To build a trust-minimized ML training network, the network must be able to verify computational work without repeating the entire computation, which would waste time and resources.

Gensyn

Gensyn is an ML-specific computing network that has devised a solution to the parallelization and verification problems of training models in a decentralized, distributed manner. The protocol uses parallelization to split large computational workloads into tasks and push them to the network asynchronously. To solve the verification problem, Gensyn uses probabilistic proofs of learning, a graph-based pinpointing protocol, and a staking-and-slashing incentive system.

Although the Gensyn network is not yet live, the team projects an hourly cost of about $0.40 for a V100-equivalent GPU on its network. The estimate is based on Ethereum miners earning $0.20 to $0.35 per hour with similar GPUs prior to the Merge. Even if this estimate were off by 100%, Gensyn's compute costs would still be significantly lower than the on-demand services offered by AWS and GCP.

Together

Together is another early-stage project building a decentralized computing network specifically for machine learning. It began by aggregating unused academic computing resources from institutions such as Stanford University, ETH Zurich, the Open Science Grid, the University of Wisconsin-Madison, and CrusoeCloud, amassing more than 200 PetaFLOPS of computing power. Its ultimate goal is to create a world where anyone can contribute to and benefit from advanced artificial intelligence through pooled global computing resources.

Bittensor: Decentralized Machine Intelligence

Bittensor addresses inefficiencies in machine learning and transforms how researchers collaborate by incentivizing knowledge production on an open-source network, using standardized input and output encodings to make models interoperable.

On Bittensor, miners are rewarded with the network's native asset, TAO, for providing intelligence to the network through their unique ML models. While training their models on the network, miners exchange information with other miners, accelerating their learning. By staking TAO, users can tap the intelligence of the entire Bittensor network and tune its activity to their needs, forming a peer-to-peer intelligence market. Additionally, applications can be built on top of the network's intelligence layer via its validators.

How Bittensor Works

Bittensor is an open-source P2P protocol that implements a decentralized Mixture-of-Experts (MoE), an ML technique that combines multiple models specialized in different problems to create a more accurate overall model. This is done by training a routing model called a gating layer over a set of expert models, learning to intelligently route inputs to produce the optimal output. To achieve this, validators dynamically form federations between mutually complementary models, and sparse computation is used to address latency bottlenecks.
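
A minimal sketch of the gating idea, assuming a toy scalar setup: a gating function scores every expert for a given input, routes the input only to the top-scoring experts, and blends their outputs by softmax weight. The experts and gate weights below are hand-picked placeholders, not Bittensor's actual routing logic.

```python
import math

def softmax(scores):
    # normalize scores into weights that sum to 1
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe(x, experts, gate_weights, top_k=2):
    scores = [g * x for g in gate_weights]  # gating layer scores each expert
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]                 # sparse routing: only top_k run
    weights = softmax([scores[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# three toy "experts", each specialized in a different transformation
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
out = moe(3.0, experts, gate_weights=[0.1, 0.5, 0.3])
```

Because only `top_k` experts run per input, compute grows with the number of routed experts rather than the total number of models, which is where the sparse-computation saving comes from.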

Bittensor's incentives attract specialized models into the mixture, each playing a niche role in solving the larger problems defined by stakeholders. Each miner represents a unique model (a neural network), and Bittensor operates as a self-coordinating model of models, governed by a permissionless smart-market system.

The protocol is algorithm-agnostic: validators simply define the locks and let the market find the keys. Only the miners' intelligence is shared and measured; the models themselves remain private, removing potential bias from the measurement.

Validators

On Bittensor, validators act as the gating layer of the network's MoE model, serve as trainable APIs, and enable applications to be built on top of the network. Their stake governs the incentive landscape and determines which problems miners solve. Validators assess the value miners provide so they can reward them accordingly and reach consensus on miner rankings. Higher-ranked miners receive a larger share of the inflationary block reward.

Validators are also incentivized to discover and evaluate models honestly and efficiently, as they accumulate bonds in their top-ranked miners and receive a portion of those miners' future rewards. This effectively bonds validators economically to the miners they rank. The protocol's consensus mechanism is designed to resist collusion by up to 50% of the network's stake, making it financially infeasible to dishonestly rank one's own miners highly.
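
The ranking-and-reward flow can be sketched as stake-weighted score aggregation. The numbers and the proportional payout rule below are illustrative assumptions, not Bittensor's exact consensus math.

```python
def consensus_scores(validator_scores, stakes):
    # combine each validator's miner scores, weighted by that validator's stake
    total_stake = sum(stakes)
    n_miners = len(validator_scores[0])
    return [
        sum(stakes[v] * validator_scores[v][m] for v in range(len(stakes)))
        / total_stake
        for m in range(n_miners)
    ]

def split_reward(scores, block_reward):
    # miners receive the block reward in proportion to their consensus score
    total = sum(scores)
    return [block_reward * s / total for s in scores]

# two validators scoring three miners; validator 0 has twice the stake
scores = consensus_scores([[0.9, 0.5, 0.1], [0.6, 0.7, 0.2]], stakes=[2, 1])
rewards = split_reward(scores, block_reward=1.0)
```

Bittensor's bond mechanism additionally ties validators' earnings to the miners they rank; this sketch shows only the stake-weighted aggregation step.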

Miners

Miners on the network perform training and inference. They selectively exchange information with peers based on their expertise and update their models' weights accordingly. When exchanging messages, miners prioritize validator requests according to stake. There are currently 3,523 miners online.

The exchange of information between miners on the Bittensor network allows for the creation of more powerful AI models, as miners can leverage the expertise of their peers to improve their own models. This essentially brings composability to the AI space, where different ML models can be connected to create more complex AI systems.

Compounding Intelligence

Bittensor addresses the incentive-inefficiency problem through this new market, effectively compounding machine intelligence and improving the efficiency of ML training. The network enables individuals to contribute to foundation models and monetize their work, regardless of the size or niche of their contribution. This is similar to how the internet made niche contributions economically viable and empowered individuals on content platforms like YouTube. In essence, Bittensor aims to commoditize machine intelligence and become the internet of AI.

Summary

As the decentralized machine learning ecosystem matures, there will likely be synergies between various computing and intelligent networks. For example, Gensyn and Together can be used as the hardware coordination layer of the AI ecosystem, and Bittensor can be used as the intelligent coordination layer.

On the supply side, large public crypto miners that previously mined ETH have shown strong interest in contributing resources to decentralized computing networks. For example, Akash has received commitments for 1 million GPUs from large miners ahead of its network's GPU release. Additionally, Foundry, one of the largest private Bitcoin miners, already mines on Bittensor.

The teams behind the projects discussed in this report are not building crypto-based networks for the hype; they are AI researchers and engineers who have recognized the potential of crypto to solve problems in their industry.

By improving training efficiency, pooling resources, and giving more people the opportunity to contribute to large-scale AI models, decentralized ML networks can accelerate AI development and help us unlock artificial general intelligence sooner.
