The Holy Grail of Crypto AI: Frontier Exploration of Decentralized Training


Author: 0xjacobzhao and ChatGPT 4o

Special thanks to Advait Jayant (Peri Labs), Sven Wellmann (Polychain Capital), Chao (Metropolis DAO), Jiahao (Flock), Alexander Long (Pluralis Research), Ben Fielding & Jeff Amico (Gensyn) for their advice and feedback

Across the entire AI value chain, model training is the most resource-intensive and technically demanding stage; it directly determines the upper limit of a model's capability and its real-world performance. Compared with the lightweight calls of the inference stage, training requires sustained large-scale compute, complex data-processing pipelines, and intensive optimization algorithms, making it the true "heavy industry" of building AI systems. From the perspective of architectural paradigms, training methods fall into four categories: centralized training, distributed training, federated learning, and decentralized training, the last of which is the focus of this article.

  • Centralized training is the most common traditional approach: a single organization completes the entire training process within a local high-performance cluster. Every component, from hardware (e.g., NVIDIA GPUs), low-level software (CUDA, cuDNN), and cluster schedulers (e.g., Kubernetes) to training frameworks (e.g., PyTorch with the NCCL backend), is coordinated by a unified control system. This tightly coupled architecture maximizes the efficiency of memory sharing, gradient synchronization, and fault tolerance, and is well suited to training large models such as GPT and Gemini. It offers high efficiency and controllable resources, but suffers from data monopoly, resource barriers, energy consumption, and single-point-of-failure risk.

  • Distributed training is the mainstream method for large-model training. Its core idea is to decompose the training task and distribute it across multiple machines for collaborative execution, breaking through the compute and storage bottlenecks of a single machine. Although physically "distributed", overall scheduling and synchronization are still controlled by a centralized organization, and the system typically runs in a high-speed LAN environment, with a master node coordinating subtasks over high-speed interconnects such as NVLink. Mainstream methods include:

    • Data parallelism: each node trains on different data while holding a full copy of the model, whose weights must be kept synchronized;
    • Model parallelism: different parts of the model are deployed on different nodes, enabling strong scalability;
    • Pipeline parallelism: execution proceeds serially in stages across nodes, improving throughput;
    • Tensor parallelism: matrix computations are partitioned at fine granularity, increasing parallelism.

Distributed training combines "centralized control + distributed execution", akin to one boss remotely directing employees in several "offices" to complete a task together. Today, virtually all mainstream large models (GPT-4, Gemini, LLaMA, etc.) are trained this way.
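To make the data-parallel case concrete, here is a minimal sketch in plain NumPy (no real framework API): each worker holds a full copy of the weights, computes a gradient on its own data shard, and an all-reduce-style average gives every replica the identical update.

```python
import numpy as np

# Minimal data-parallelism sketch (illustrative, not a framework API):
# workers compute gradients on their shards, then average them so every
# replica applies the same update.

def local_gradient(weights, x_shard, y_shard):
    """Gradient of mean squared error for a linear model y = x @ w."""
    pred = x_shard @ weights
    return 2 * x_shard.T @ (pred - y_shard) / len(x_shard)

def all_reduce_mean(grads):
    """Stand-in for an NCCL/MPI all-reduce: average gradients across workers."""
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
w = np.zeros(3)
x = rng.normal(size=(8, 3))
y = x @ np.array([1.0, -2.0, 0.5])

# Split the batch across two "workers"; each computes its local gradient.
shards = [(x[:4], y[:4]), (x[4:], y[4:])]
grads = [local_gradient(w, xs, ys) for xs, ys in shards]
g = all_reduce_mean(grads)

# Every replica applies the same averaged update, keeping weights in sync.
w = w - 0.1 * g
```

With equal shard sizes, the averaged gradient equals the full-batch gradient, which is why all replicas remain consistent.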

  • Decentralized training represents a more open, censorship-resistant path to the future. Its defining feature is that multiple mutually untrusting nodes (home computers, cloud GPUs, or edge devices) collaborate on training tasks without a central coordinator, typically via protocol-driven task distribution and collaboration, with cryptographic incentive mechanisms ensuring honest contribution. The main challenges of this model include:

    • Device heterogeneity and task-splitting difficulty: heterogeneous devices are hard to coordinate, and splitting tasks across them is inefficient;
    • Communication-efficiency bottleneck: network communication is unstable, and gradient synchronization is a clear bottleneck;
    • Lack of trusted execution: without a trusted execution environment, it is hard to verify whether a node actually performed the computation;
    • Lack of unified coordination: with no central scheduler, task distribution and failure-rollback mechanisms are complex.

Decentralized training can be pictured as a group of volunteers around the world, each contributing compute to train a model collaboratively. But "truly feasible large-scale decentralized training" remains a systems-engineering challenge spanning system architecture, communication protocols, cryptographic security, economic mechanisms, and model verification, and whether it can achieve "effective collaboration + honest incentives + correct results" is still at the early prototype stage.

  • Federated learning, a transitional form between distributed and decentralized training, emphasizes keeping data local while aggregating model parameters centrally, and suits privacy-compliance-sensitive scenarios such as healthcare and finance. It combines the engineering structure and local-coordination capability of distributed training with the data-dispersion advantage of decentralized training, but it still depends on a trusted coordinator and is not fully open or censorship-resistant. It can be seen as a "controlled decentralization" solution for privacy-compliant scenarios: relatively mild in its training tasks, trust structure, and communication mechanisms, it is better suited as a transitional deployment architecture for industry.
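The FedAvg-style aggregation that federated learning relies on can be sketched in a few lines; the shapes, learning rates, and round counts below are illustrative, not any production system's values.

```python
import numpy as np

# Minimal FedAvg-style sketch: clients train locally on private data,
# upload only weights, and the server aggregates them weighted by each
# client's sample count. All values here are illustrative.

def local_train(weights, x, y, lr=0.1, steps=5):
    """A few local SGD steps on a linear least-squares objective."""
    w = weights.copy()
    for _ in range(steps):
        w -= lr * 2 * x.T @ (x @ w - y) / len(x)
    return w

def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: sample-count-weighted average of weights."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(1)
true_w = np.array([3.0, -1.0])
# Each client's raw data never leaves the client; only weights are uploaded.
clients = []
for n in (20, 40):
    x = rng.normal(size=(n, 2))
    clients.append((x, x @ true_w))

global_w = np.zeros(2)
for _ in range(10):  # communication rounds
    updates = [local_train(global_w, x, y) for x, y in clients]
    global_w = fed_avg(updates, [len(x) for x, _ in clients])
```

Only the trained weights cross the network each round, which is what makes the scheme privacy-friendly relative to centralizing the raw data.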

Panorama comparison of AI training paradigms (technical architecture × trust and incentives × application characteristics)

| Dimension | Centralized training | Distributed training (sync/async/hybrid) | Federated learning | Decentralized training |
| --- | --- | --- | --- | --- |
| Definition | All data and training concentrated on a single node or cluster | Training distributed across multiple physical nodes in a controlled environment | Data stays local; only parameters/gradients are uploaded | Trustless, open participation; training coordinated by the network |
| Communication bandwidth requirements | Very high (local bus) | High (synchronous) / medium (asynchronous) | Very low (compressed models/gradients uploaded) | Medium-low (asynchronous strategies plus compressed communication) |
| Hardware type | Dedicated servers / GPU clusters | High-speed interconnected GPU clusters or servers across data centers | Heterogeneous devices: phones / IoT / edge nodes | Broadly heterogeneous devices: GPUs / CPUs / terminals / cloud nodes |
| Control and coordination | Full control by a single institution | Master-slave or scheduler control, possibly deployed across organizations | Center coordinates parameter updates; data controlled locally | Network consensus coordination + cryptographic verification |
| Synchronization mechanism | Real-time full synchronization | Synchronous (global stepwise aggregation) / asynchronous (local updates) / hybrid (e.g., partial sync) | Multiple rounds of local training + aggregation (e.g., FedAvg) | Asynchronous training + soft synchronization (e.g., DiLoCo / SWARM) |
| Security / privacy | Local trust boundary (firewalls / permission isolation) | Medium (encrypted transport, but privacy usually not a priority) | Strong privacy (data never leaves the device; supports differential privacy) | Strong verifiability; supports cryptographic schemes such as ZK / TEE / MPC |
| Fault tolerance | Central node failure means downtime | Weak for synchronous, good for asynchronous, medium for hybrid strategies | Tolerates dropouts; robust iterative convergence | High fault tolerance; naturally handles frequent node entry, exit, or interruption |
| Scalability | Limited by cluster size | Medium (scales to hundreds of GPUs) | High (the more devices, the better) | Very high (theoretically millions of nodes, bounded by verification and communication efficiency) |
| Openness | ❌ Closed within an institution | ⚠️ Semi-open (within the institution or by registration) | ⚠️ Partially open (registered parties or specific data alliances) | ✅ Fully open (permissionless join and leave) |
| Censorship resistance | ❌ No | ❌ No | ⚠️ Partial (data controlled locally) | ✅ Censorship-resistant design; node autonomy; no central point of failure |
| Trust assumption | ✅ Fully trust the center | ✅ Trust the coordinator | ✅ Trust a central server to coordinate updates | ❌ Trust no node; rely on cryptography + game-theoretic verification |
| Incentive mechanism | ❌ None | ❌ None, or internal KPI-style assessment only | ⚠️ Points/credit mechanisms possible | ✅ Token-economy driven; contributions tied to rewards (e.g., Gensyn) |
| Representative technologies / projects | OpenAI GPT / DeepMind Gemini | Megatron / ZeRO / FSDP | Google FedAvg / Flower / OpenFL / Flock | Gensyn / Pluralis / Nous / Prime Intellect |
| Typical application scenarios | Internal development, closed-source model training | Large-model pre-training (GPT / LLaMA, etc.) | Medical / finance / IoT data-protection scenarios | Crypto AI, open collaborative training, censorship-resistant models, globally shared-compute training |
| Is data aggregated? | ✅ Fully aggregated | ✅ Data/weights aggregated | ❌ Data not aggregated | ❌ Neither data nor weights aggregated; only compressed information synchronized / models merged |
| Suitable model size | Any (limited by local hardware) | Medium to large (requires multi-GPU synchronization/storage) | Mainly small to medium (limited by edge devices) | Small to medium initially; SWARM/pipeline parallelism extends capability toward large models |

The boundaries, opportunities and realistic paths of decentralized training

From the standpoint of training paradigms, decentralized training is not suitable for every task type. Some tasks, due to complex structure, extreme resource demands, or coordination difficulty, are inherently ill-suited to efficient completion across heterogeneous, trustless nodes. For example, large-model training often depends on large GPU memory, low latency, and high bandwidth, making it hard to split and synchronize effectively over an open network; tasks under strong data-privacy and sovereignty constraints (such as medical, financial, or classified data) are bound by legal compliance and ethics and cannot be shared openly; and tasks lacking a basis for collaborative incentives (such as enterprise closed-source models or internal prototypes) offer no external motivation for participation. Together, these boundaries define the current practical limits of decentralized training.

But this does not make decentralized training a false proposition. In fact, for tasks that are lightweight in structure, easy to parallelize, and amenable to incentives, decentralized training shows clear application prospects, including but not limited to: LoRA fine-tuning, behavioral-alignment post-training tasks (such as RLHF and DPO), data crowdsourcing training and labeling tasks, resource-controllable small base-model training, and collaborative training involving edge devices. These tasks generally exhibit high parallelism, low coupling, and tolerance of heterogeneous compute, making them well suited to collaborative training via P2P networks, Swarm protocols, distributed optimizers, and the like.

Decentralized training task suitability overview table

| Task type | Typical scenarios | Decentralized suitability | Notes / representative path |
| --- | --- | --- | --- |
| LoRA adapter tuning | Fine-tuning very few parameters, suited to community collaboration | ✅ Very high | Lightweight parameters, crowdsourcing-friendly, easy to split |
| Post-training alignment | Behavioral optimization such as DPO and SWARM | ✅ High | Clear rewards, small task size |
| Data-centric training | Many nodes participating in data generation, labeling, and scoring | ✅ High | Dispersed data sources, well suited to incentive mechanisms |
| Small base-model training (resource-controllable) | Low parameter counts, suited to collaborative training on consumer GPUs | ✅ High | Heterogeneous execution possible; tasks splittable |
| Edge-coordinated training | Collaborative training on IoT / phone / TEE edge devices | ✅ High | Naturally distributed nodes; data stays local |
| Tasks with extreme resource or system demands | Large-model training, complex pipelines, real-time RL | ❌ Not suitable | Depends on large GPU memory, low latency, and high bandwidth; hard to split |
| Data-compliance and sovereignty-restricted tasks | Training on medical, financial, or government confidential data | ❌ Not suitable | Heavy regulation; data cannot be shared; participation hard to open |
| Tasks lacking a basis for collaborative incentives | Enterprise closed-source models, internal prototype experiments | ❌ Not suitable | No willingness to open up, no incentive mechanism, inherently reject collaborative training |

Analysis of classic decentralized training projects

At the frontier of decentralized training and federated learning, the representative blockchain projects today are Prime Intellect, Pluralis.ai, Gensyn, Nous Research, and Flock.io. In terms of technical originality and engineering difficulty, Prime Intellect, Nous Research, and Pluralis.ai have made the more original contributions to system architecture and algorithm design, representing the frontier of current theoretical research, while Gensyn and Flock.io follow comparatively clear implementation paths with visible early engineering progress. This article analyzes, in turn, the core technologies and engineering architectures behind these five projects, and further explores their differences and complementarities within a decentralized AI training system.

Prime Intellect: A pioneer in collaborative reinforcement learning networks with verifiable training trajectories

Prime Intellect is committed to building a trustless AI training network in which anyone can participate in training and receive credible rewards for their compute contributions. It aims to construct a verifiable, open, fully incentivized decentralized AI training system through three core modules: PRIME-RL, TOPLOC, and SHARDCAST.

1. Prime Intellect protocol stack structure and key module value

| Tier | Module | Function | Core keywords | Core value |
| --- | --- | --- | --- | --- |
| Training execution layer | PRIME-RL | Asynchronous RL architecture decoupling training, inference, and weight updates; adapts to heterogeneous networks and asynchronous participation | Asynchronous training, training decoupling, reinforcement learning, heterogeneous adaptation | Improves node elasticity and fault tolerance, lowers entry barriers, supports flexible distributed task deployment |
| Behavior verification layer | TOPLOC | Verifies training authenticity via local trajectory consistency, avoiding the high cost of ZKML | Policy verification, trajectory consistency, lightweight ZK alternative, trustworthy rewards | Provides structured training verification, ensures reward distribution is genuine and effective, builds a trust-minimized network foundation |
| Weight propagation layer | SHARDCAST | Gossip + local synchronization for asynchronous weight aggregation; supports multi-version coexistence and policy evolution | Asynchronous aggregation, gossip, version coexistence, policy evolution | Reduces bandwidth consumption, supports gradual fusion of heterogeneous node weights, improves aggregation efficiency and network scalability |
| Communication layer | OpenDiLoCo + PCCL | Asynchronous communication protocols for sparse topologies, with underlying support for gradient compression, interruption recovery, and multi-device compatibility | Sparse communication, asynchronous topology, compressed synchronization, cross-device compatibility | Improves communication flexibility, reduces cost, supports long-term stable operation of the decentralized training network |
| Simulation environment layer | Synthetic-1 | RL task test platform for evaluating collaboration efficiency, incentive design, and convergence | Collaborative testing, incentive verification, experimental sandbox, multi-task support | Reduces trial-and-error cost, provides a safe proving ground for protocol optimization and incentive-mechanism design |
| Scheduling and consensus layer | Protocol Layer | Node registration, task release, on-chain logging, reward settlement, and governance integration | Task management, on-chain records, incentive closed loop, protocol governance | Builds a transparent closed loop of on-chain execution and rewards, improving auditability and system governance |

2. Detailed explanation of the key mechanisms of Prime Intellect training

  • PRIME-RL: Decoupled Asynchronous Reinforcement Learning Task Architecture

PRIME-RL is Prime Intellect's task-modeling and execution framework customized for decentralized training scenarios, designed for heterogeneous networks and asynchronous participation. With reinforcement learning as its first-class adaptation target, it structurally decouples training, inference, and weight upload, so that each training node can complete a full task cycle locally and cooperate with the verification and aggregation mechanisms through standardized interfaces. Compared with conventional supervised-learning pipelines, PRIME-RL is better suited to elastic training under decentralized scheduling: it reduces system complexity and lays the groundwork for multi-task parallelism and policy evolution.

  • TOPLOC: A lightweight training behavior verification mechanism

TOPLOC (Trusted Observation & Policy-Locality Check) is Prime Intellect's core training-verifiability mechanism, used to determine whether a node has actually completed effective policy learning on the observed data. Unlike heavyweight solutions such as ZKML, TOPLOC does not rely on full model recomputation; instead, it performs lightweight structural verification by analyzing the local consistency between the "observation sequence ↔ policy update" trajectory. It is the first mechanism to turn the behavioral trajectory of a training run into a verifiable object, a key innovation for trustless training-reward distribution and a feasible path toward an auditable, incentivized decentralized collaborative training network.

  • SHARDCAST: Asynchronous Weight Aggregation and Propagation Protocol

SHARDCAST is Prime Intellect's weight propagation and aggregation protocol, optimized for real network conditions that are asynchronous, bandwidth-constrained, and subject to changing node state. Combining a gossip propagation mechanism with a local synchronization strategy, it lets many nodes continuously submit partial updates while out of sync, achieving progressive weight convergence and multi-version evolution. Compared with centralized or synchronous AllReduce approaches, SHARDCAST substantially improves the scalability and fault tolerance of decentralized training, and is the core foundation for building stable weight consensus and continuous training iteration.
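The gossip idea underlying this kind of aggregation can be illustrated with a toy simulation; the pairwise-averaging scheme below is a generic gossip protocol from the literature, not SHARDCAST's actual algorithm.

```python
import numpy as np

# Toy gossip-style weight mixing (generic pairwise averaging, not
# SHARDCAST's protocol): each round, random disjoint pairs of nodes
# average their weights, so the network drifts toward a shared model
# without any global AllReduce barrier.

def gossip_round(weights, rng):
    """Pair up nodes at random and average each pair's weights."""
    order = rng.permutation(len(weights))
    for i, j in zip(order[::2], order[1::2]):
        mixed = (weights[i] + weights[j]) / 2
        weights[i] = mixed.copy()
        weights[j] = mixed.copy()
    return weights

rng = np.random.default_rng(0)
# Four nodes start from divergent weights (e.g. after independent local training).
weights = [np.full(3, v, dtype=float) for v in (0.0, 2.0, 4.0, 6.0)]
target = sum(weights) / len(weights)  # pairwise averaging preserves the global mean

for _ in range(30):
    weights = gossip_round(weights, rng)

spread = max(np.abs(w - target).max() for w in weights)
```

Because each exchange only involves two nodes, the protocol tolerates stragglers and churn: any node that misses a round simply mixes later, while the global average is preserved throughout.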

  • OpenDiLoCo: A framework for sparse asynchronous communication

OpenDiLoCo is a communication-optimization framework independently implemented and open-sourced by the Prime Intellect team, based on DeepMind's DiLoCo concept and designed for the bandwidth constraints, device heterogeneity, and node instability common in decentralized training. Its architecture builds on data parallelism: by constructing sparse topologies such as ring, expander, and small-world graphs, it avoids the high communication overhead of global synchronization and completes collaborative model training relying only on each node's local neighbors. Combined with asynchronous updates and interruption-recovery mechanisms, OpenDiLoCo enables consumer-grade GPUs and edge devices to participate stably in training tasks, markedly broadening participation in global collaborative training, and is one of the key communication infrastructures for decentralized training networks.
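The core DiLoCo pattern — many cheap local steps punctuated by a rare momentum-based "outer" step on the pseudo-gradient — can be sketched on a toy least-squares problem. All hyperparameters below are illustrative, and this is a simplification of the published method, not OpenDiLoCo's implementation.

```python
import numpy as np

# Toy DiLoCo-style loop: workers take many local SGD steps between
# synchronizations, then an outer step applies the averaged pseudo-gradient
# (global weights minus post-training local weights) with momentum.
# Hyperparameters and the least-squares setup are illustrative only.

def inner_steps(w, x, y, lr=0.05, steps=20):
    """Local SGD on a least-squares objective; runs without any communication."""
    for _ in range(steps):
        w = w - lr * 2 * x.T @ (x @ w - y) / len(x)
    return w

rng = np.random.default_rng(2)
true_w = np.array([1.0, 2.0])
workers = []
for n in (16, 16):
    x = rng.normal(size=(n, 2))
    workers.append((x, x @ true_w))

w_global = np.zeros(2)
momentum = np.zeros(2)
outer_lr, beta = 0.7, 0.6

for _ in range(25):  # infrequent synchronization rounds
    locals_ = [inner_steps(w_global.copy(), x, y) for x, y in workers]
    # Pseudo-gradient: how far local training moved away from the global copy.
    pseudo_grad = w_global - np.mean(locals_, axis=0)
    momentum = beta * momentum + pseudo_grad
    w_global = w_global - outer_lr * momentum
```

The bandwidth saving comes from communicating once per 20 local steps instead of once per step, which is precisely what makes the pattern viable over slow links.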

  • PCCL: Collaborative Communication Library

PCCL (Prime Collective Communication Library) is a lightweight communication library that Prime Intellect tailored for decentralized AI training environments, aiming to resolve the adaptation bottlenecks of traditional communication libraries (such as NCCL and Gloo) on heterogeneous devices and low-bandwidth networks. PCCL supports sparse topologies, gradient compression, low-precision synchronization, and interruption recovery, and runs on consumer-grade GPUs and unstable nodes. It is the underlying component behind OpenDiLoCo's asynchronous communication capability, substantially improving the training network's bandwidth tolerance and device compatibility and laying the "last mile" of communication groundwork for a truly open, trustless collaborative training network.
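Gradient compression of the kind such libraries support can be illustrated with a generic top-k-with-error-feedback scheme; this is a common technique from the literature, shown for intuition only, not PCCL's actual algorithm or wire format.

```python
import numpy as np

# Generic gradient compression: top-k sparsification with error feedback.
# Only the k largest-magnitude entries are transmitted each round; the
# untransmitted remainder is carried forward so no signal is permanently lost.
# Illustrative only — not PCCL's actual algorithm.

def compress_top_k(grad, k):
    """Keep the k largest-|value| entries; return (indices, values, residual)."""
    idx = np.argsort(np.abs(grad))[-k:]
    values = grad[idx]
    residual = grad.copy()
    residual[idx] = 0.0  # the part we failed to send this round
    return idx, values, residual

def decompress(idx, values, size):
    out = np.zeros(size)
    out[idx] = values
    return out

rng = np.random.default_rng(3)
grad = rng.normal(size=100)
orig = grad.copy()
residual = np.zeros(100)
sent_total = np.zeros(100)

for _ in range(10):
    # Error feedback: fold the leftover from earlier rounds into this one.
    idx, vals, residual = compress_top_k(grad + residual, k=10)
    sent_total += decompress(idx, vals, 100)
    grad = np.zeros(100)  # toy loop: no new gradient, just drain the residual

# Each round ships only 10% of the entries, yet the accumulated
# transmissions eventually recover the full gradient.
```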

3. Prime Intellect Incentive Network and Role Division

Prime Intellect has built a permissionless, verifiable, and economically incentivized training network that enables anyone to participate in tasks and be rewarded based on real contributions. The protocol operates based on three core roles:

  • Task initiators: define the training environment, initial model, reward function, and verification criteria
  • Training nodes: perform local training and submit weight updates and observation trajectories
  • Verification nodes: use the TOPLOC mechanism to verify the authenticity of training behavior and participate in reward calculation and policy aggregation

The core process of the protocol includes task release, node training, trajectory verification, weight aggregation (SHARDCAST) and reward distribution, forming an incentive closed loop around "real training behavior".
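The closed loop above can be sketched schematically; every class, function body, and number below is a simplified placeholder standing in for the described roles, not Prime Intellect's protocol code.

```python
# Schematic of the train -> verify -> aggregate -> reward loop. Role names
# follow the text; all logic here is a simplified, hypothetical placeholder.

class Node:
    def __init__(self, name, honest=True):
        self.name, self.honest = name, honest

    def train(self, task):
        # Placeholder "training": returns a weight update plus metadata.
        return {"node": self.name, "weights": task + 1.0, "honest": self.honest}

def run_round(task, nodes, verify, aggregate, reward_pool):
    """One round: train -> verify trajectories -> aggregate -> distribute rewards."""
    submissions = [node.train(task) for node in nodes]       # local training
    valid = [s for s in submissions if verify(s)]            # TOPLOC-style check
    new_weights = aggregate([s["weights"] for s in valid])   # SHARDCAST-style merge
    payout = reward_pool / max(len(valid), 1)
    rewards = {s["node"]: payout for s in valid}             # only verified work pays
    return new_weights, rewards

nodes = [Node("a"), Node("b"), Node("cheater", honest=False)]
weights, rewards = run_round(
    task=0.0,
    nodes=nodes,
    verify=lambda s: s["honest"],              # stand-in for trajectory verification
    aggregate=lambda ws: sum(ws) / len(ws),
    reward_pool=90.0,
)
```

The point of the structure is that rewards flow only through the verification gate, so a node that skips the actual computation earns nothing.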

4. INTELLECT-2: Release of the First Verifiable Decentralized Training Model

Prime Intellect released INTELLECT-2 in May 2025, the world's first large reinforcement-learning model trained by asynchronous, trustless decentralized nodes, at a parameter scale of 32B. INTELLECT-2 was trained by 100+ heterogeneous GPU nodes across three continents using a fully asynchronous architecture, with training running for over 400 hours, demonstrating the feasibility and stability of asynchronous collaborative networks. The model is not just a performance milestone; it is the first systematic implementation of Prime Intellect's "training is consensus" paradigm. INTELLECT-2 integrates the core protocol modules PRIME-RL (asynchronous training structure), TOPLOC (training-behavior verification), and SHARDCAST (asynchronous weight aggregation), marking the first time a decentralized training network has achieved an open, verifiable, and economically incentivized closed loop for the training process.

In terms of performance, INTELLECT-2 was trained on top of QwQ-32B with dedicated RL training on code and mathematics, placing it at the frontier of current open-source RL fine-tuned models. While it has not yet surpassed closed-source models such as GPT-4 or Gemini, its real significance lies elsewhere: it is the world's first decentralized model experiment whose complete training process is reproducible, verifiable, and auditable. Prime Intellect open-sourced not only the model but, more importantly, the training process itself: the training data, policy-update trajectories, verification procedures, and aggregation logic are all transparent and traceable, constituting a prototype of a decentralized training network in which everyone can participate, collaborate with trust, and share rewards.

5. Team and Financing Background

Prime Intellect completed a $15 million seed round of financing in February 2025, led by Founders Fund, with participation from industry leaders such as Menlo Ventures, Andrej Karpathy, Clem Delangue, Dylan Patel, Balaji Srinivasan, Emad Mostaque, and Sandeep Nailwal. Prior to this, the project completed a $5.5 million early round of financing in April 2024, led by CoinFund and Distributed Global, with participation from Compound VC, Collab + Currency, and Protocol Labs. To date, Prime Intellect has raised more than $20 million in total.

Prime Intellect was co-founded by Vincent Weisser and Johannes Hagemann. Team members combine AI and Web3 backgrounds, with core contributors coming from Meta AI, Google Research, OpenAI, Flashbots, Stability AI, and the Ethereum Foundation. The team has deep capability in system-architecture design and distributed engineering, and is one of the very few teams to have successfully completed a real decentralized large-model training run.

Pluralis: A paradigm explorer for asynchronous model parallelism and structure compression collaborative training

Pluralis is a Web3 AI project focused on "trustworthy collaborative training networks", whose core goal is to advance a decentralized, open-participation model-training paradigm with long-term incentives. Departing from today's mainstream centralized or closed training paths, Pluralis proposes a new concept called Protocol Learning: "protocolizing" model training, building an open training system with an intrinsic incentive loop through verifiable collaboration mechanisms and model-ownership mapping.

1. Core Concept: Protocol Learning

Protocol Learning proposed by Pluralis consists of three key pillars:

  1. Unmaterializable Models
    The model is distributed as shards across multiple nodes, so no single node can reconstruct the complete weights, which remain closed-source. This design makes the model a native "in-protocol asset", enabling access-credential control, leakage protection, and revenue-attribution binding.
  2. Model-parallel Training over Internet
    Through the asynchronous Pipeline model parallel mechanism (SWARM architecture), different nodes only hold partial weights and collaborate to complete training or inference through a low-bandwidth network.
  3. Partial Ownership for Incentives
    All participating nodes obtain partial ownership of the model based on their training contribution, thereby enjoying future profit sharing and protocol governance rights.
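In its simplest form, the partial-ownership idea reduces to contribution-weighted shares with pro-rata revenue splits. The sketch below is hypothetical accounting (Pluralis has not published its scheme); every name and number is invented for illustration.

```python
# Hypothetical sketch of contribution-weighted partial ownership: each
# node's share tracks its fraction of verified training contribution, and
# future revenue is split pro rata. Not Pluralis's actual scheme.

def ownership_shares(contributions):
    """Map per-node contribution scores to fractional ownership."""
    total = sum(contributions.values())
    return {node: c / total for node, c in contributions.items()}

def distribute_revenue(revenue, shares):
    """Pro-rata revenue split according to ownership shares."""
    return {node: revenue * s for node, s in shares.items()}

# Invented contribution scores for three illustrative nodes.
contributions = {"node_a": 120.0, "node_b": 60.0, "node_c": 20.0}
shares = ownership_shares(contributions)
payouts = distribute_revenue(1000.0, shares)
```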

2. Technical Architecture of Pluralis Protocol Stack

| Tier | Module | Function |
| --- | --- | --- |
| Training scheduling layer | Swarm Parallel | Asynchronous pipeline model parallelism supporting flexible participation and collaborative training on heterogeneous hardware |
| Communication compression layer | Column-Space Sparsification | Structured compression of the activation-tensor column space, designed for the Transformer architecture; 90%+ communication compression |
| Optimizer synchronization layer | NAG-Async Update | Momentum look-ahead mechanism that mitigates stale asynchronous gradients, improving training stability and throughput |
| Incentive attribution layer | Partial Ownership Allocation | Binds model contribution to future benefits, establishing long-term participant incentives |
| Weight protection layer | Protocol Models | The model cannot be exported and runs only inside the Swarm, ensuring security and value attribution |

3. Detailed explanation of key technical mechanisms

  • Unmaterializable Models

The paper A Third Path: Protocol Learning first proposed distributing model weights as shards so that "model assets" can run only inside the Swarm network, with their access and revenue controlled by the protocol. This mechanism is the prerequisite for a sustainable incentive structure in decentralized training.

  • Asynchronous Model-Parallel Training

In "SWARM Parallel with Asynchronous Updates", Pluralis built an asynchronous pipeline-based model-parallel architecture and demonstrated it for the first time on LLaMA-3. The core innovation is the introduction of a Nesterov Accelerated Gradient (NAG) mechanism, which corrects the gradient drift and convergence instability that arise under asynchronous updates, making training across heterogeneous devices practical in low-bandwidth environments.
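A plain Nesterov look-ahead update of the kind adapted here can be shown on a toy quadratic; this is the textbook mechanism, simplified, while Pluralis's actual variant (handling real gradient delays in a pipeline) is described in the SWARM paper.

```python
import numpy as np

# Textbook Nesterov look-ahead step (simplified; not Pluralis's exact
# asynchronous variant). The gradient is evaluated at the momentum
# look-ahead point, partially anticipating where the weights will have
# moved by the time a delayed gradient is applied.

def nag_step(w, velocity, grad_fn, lr=0.1, beta=0.9):
    lookahead = w + beta * velocity   # where the weights are heading
    g = grad_fn(lookahead)            # gradient at the anticipated point
    velocity = beta * velocity - lr * g
    return w + velocity, velocity

# Toy quadratic objective f(w) = 0.5 * ||w - target||^2.
target = np.array([2.0, -3.0])
grad_fn = lambda w: w - target

w, v = np.zeros(2), np.zeros(2)
for _ in range(100):
    w, v = nag_step(w, v, grad_fn)
```

Evaluating the gradient at the look-ahead point is what gives the update its tolerance to staleness: the gradient is taken closer to where the weights will actually be when it lands.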

  • Column-Space Sparsification

In Beyond Top-K, Pluralis proposes replacing traditional Top-K with a structure-aware column-space compression method that avoids destroying semantic pathways, balancing model accuracy against communication efficiency. Tests show that over 90% of communication data can be compressed away in an asynchronous model-parallel setting, a key breakthrough for structure-aware efficient communication.
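The contrast between unstructured Top-K and column-structured compression can be seen on a toy activation matrix. The column-norm selection below is a simplified stand-in for the paper's column-space method, not its actual algorithm: keeping whole columns preserves coherent feature directions instead of scattering isolated entries.

```python
import numpy as np

# Toy contrast: element-wise Top-K vs. column-structured compression of an
# activation matrix, under the same entry budget. Simplified stand-in for
# the "Beyond Top-K" column-space method, not the paper's algorithm.

def top_k_elements(act, k):
    """Zero all but the k largest-magnitude entries (unstructured)."""
    thresh = np.sort(np.abs(act).ravel())[-k]
    return np.where(np.abs(act) >= thresh, act, 0.0)

def top_columns(act, n_cols):
    """Keep the n_cols columns with the largest L2 norm (structured)."""
    norms = np.linalg.norm(act, axis=0)
    keep = np.argsort(norms)[-n_cols:]
    out = np.zeros_like(act)
    out[:, keep] = act[:, keep]
    return out

rng = np.random.default_rng(4)
act = rng.normal(size=(8, 16))   # rows: tokens, columns: hidden features
act[:, :2] *= 5.0                # two dominant feature directions

sparse_elems = top_k_elements(act, k=16)   # same budget: 16 entries kept
sparse_cols = top_columns(act, n_cols=2)   # 2 columns x 8 rows = 16 entries
```

Under the same budget, the structured variant transmits the dominant columns intact, whereas element-wise Top-K fragments them across the matrix.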

4. Technology Positioning and Path Selection

Pluralis clearly takes "asynchronous model parallelism" as its core direction, emphasizing the following advantages over data parallelism:

  • Supports low-bandwidth networks and non-coherent nodes;
  • Adapts to device heterogeneity, allowing consumer-grade GPUs to participate;
  • Offers naturally elastic scheduling, supporting frequent node churn.

Its three breakthrough points are structural compression, asynchronous updates, and weight non-extractability.

According to the six technical blog posts published on its official website, Pluralis's thinking is organized along three main lines:

  1. Philosophy and vision: A Third Path: Protocol Learning and Why Decentralized Training Matters
  2. Technical mechanisms: SWARM Parallel, Beyond Top-K, and Asynchronous Updates
  3. Institutional innovation: Unmaterializable Models and Partial Ownership Protocols

Pluralis has not yet launched a product, testnet, or open-source code, because its chosen technical path is exceptionally challenging: system-level problems of underlying architecture, communication protocols, and weight non-exportability must be solved before product services can be layered on top.

In a paper published in June 2025, Pluralis Research extended its decentralized training framework from model pre-training to the fine-tuning stage, supporting asynchronous updates, sparse communication, and partial weight aggregation. Compared with its earlier, more theoretical pre-training-focused designs, this work emphasizes deployability, marking a further step toward a full-cycle training architecture.

5. Team and Financing Background

Pluralis completed a $7.6 million seed round in 2025, led by Union Square Ventures (USV) and CoinFund. Founder Alexander Long holds a PhD in machine learning with a background spanning mathematics and systems research, and the core team consists entirely of PhD-level machine-learning researchers. It is a quintessentially technology-driven project: dense papers and technical blogs are its main publishing channel, it has not yet built a BD/growth team, and it remains focused on cracking the infrastructure challenges of low-bandwidth asynchronous model parallelism.

Gensyn: A decentralized training protocol layer driven by verifiable execution

Gensyn is a Web3 AI project focused on "trusted execution of deep-learning training tasks". Rather than reconstructing model architectures or training paradigms, it builds a verifiable distributed training execution network covering the full pipeline of "task distribution + training execution + result verification + fair incentives". Through its architecture of off-chain training and on-chain verification, Gensyn establishes an efficient, open, incentivized global training market, making "training is mining" a reality.

1. Project Positioning: Execution Protocol Layer for Training Tasks

Gensyn's concern is not "how to train" but the infrastructure of "who trains, how results are verified, and how rewards are shared". It is essentially a verifiable computing protocol for training tasks, addressing three questions:

  • Who will perform the training task (computing power distribution and dynamic matching)
  • How to verify the execution results (no need to recalculate the whole thing, only verify the disputed operators)
  • How to distribute training income (Stake, Slashing and multi-role game mechanism)

2. Technical Architecture Overview

| Tier | Module | Function |
| --- | --- | --- |
| Execution layer | RL Swarm | Multi-model collaborative reinforcement learning; supports heterogeneous devices and local updates, with no gradient synchronization required |
| Verification layer | Verde + PoL | Makes training behavior verifiable by combining minimal recomputation with gradient-trajectory verification |
| Communication layer | SkipPipe | Fault-tolerant communication with layer skipping and dynamic scheduling, improving throughput and stability |
| | HDEE | Collaborative training of heterogeneous expert models, adapting to multi-task, complex-data scenarios |
| Incentive layer | Multi-role game mechanism | Collaborative game among Submitter / Solver / Verifier / Whistleblower roles |

3. Module Detailed Explanation

  • RL Swarm: A collaborative reinforcement learning training system

RL Swarm, pioneered by Gensyn, is a decentralized multi-model collaborative optimization system for the post-training phase , with the following core features:

  • Distributed inference and learning process:
    • Answering stage: each node outputs an answer independently;
    • Critique stage: nodes comment on one another's outputs and select the best answer and reasoning;
    • Resolving stage: each node predicts the majority preference and revises its own answer accordingly, triggering a local weight update.

RL Swarm is thus a decentralized multi-model collaborative optimization system: each node runs an independent model and trains locally without gradient synchronization, so it naturally tolerates heterogeneous compute and unstable networks and supports elastic node entry and exit. The mechanism borrows from RLHF and multi-agent games but sits closer to the dynamics of a collaborative reasoning network: nodes are rewarded according to how closely they agree with the group consensus, which drives continuous optimization and convergence of reasoning capability. RL Swarm markedly improves model robustness and generalization in open networks, and has been deployed as a core execution module in Gensyn's Testnet Phase 0, built on an Ethereum Rollup.
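The answer → critique → resolve loop can be sketched as a toy simulation. Everything here is an illustrative assumption, not Gensyn's actual API: each node's "model" is a single numeric bias, the critique stage is collapsed into a median vote, and the reward is agreement with consensus.

```python
import statistics

class SwarmNode:
    """Toy swarm node: a local 'bias' stands in for model weights."""
    def __init__(self, name, bias):
        self.name, self.bias = name, bias

    def answer(self, q):
        # Answering stage: each node responds independently.
        return q + self.bias

def swarm_round(nodes, q, lr=0.5):
    answers = {n.name: n.answer(q) for n in nodes}
    # Critique/consensus stage, collapsed to a median vote for brevity.
    consensus = statistics.median(answers.values())
    for n in nodes:
        # Resolving stage: reward agreement with consensus, update locally
        # (no gradient synchronization between nodes).
        reward = 1.0 / (1.0 + abs(answers[n.name] - consensus))
        n.bias += lr * reward * (consensus - answers[n.name])
    return consensus

nodes = [SwarmNode("a", 3.0), SwarmNode("b", -2.0), SwarmNode("c", 0.5)]
for _ in range(20):
    swarm_round(nodes, 10.0)
```

Because each node only moves toward the consensus it already helped form, the nodes' local parameters drift together over rounds without any parameter server, which is the convergence behavior the text describes.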

  • Verde + Proof-of-Learning: Trusted Verification Mechanism

Gensyn's Verde module combines three mechanisms:

  • Proof-of-Learning: uses gradient traces and training metadata to judge whether training actually took place;
  • Graph-based pinpointing: locates the diverging node in the training computation graph, so only the disputed operation needs recomputation;
  • Refereed delegation: an arbitration-style verification flow in which a verifier and a challenger raise disputes and check them locally, greatly reducing verification cost.

Compared with ZKP or full recomputation verification schemes, the Verde scheme achieves a better balance between verifiability and efficiency .
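The efficiency win comes from never replaying the whole run. A minimal sketch of the pinpoint-and-arbitrate idea (our own simplification, not Verde's code): two parties each publish a step-by-step trace, a bisection finds the first step where the traces disagree, and the referee re-executes only that one operator.

```python
def first_divergence(trace_a, trace_b):
    """Bisect two claimed step-by-step traces to the first index where they
    differ. Assumes trace_a[0] == trace_b[0] and the final states differ."""
    lo, hi = 0, len(trace_a) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if trace_a[mid] == trace_b[mid]:
            lo = mid          # still agreed up to mid
        else:
            hi = mid          # divergence is at or before mid
    return hi

def referee_verdict(trace_a, trace_b, step_fn):
    """Recompute exactly ONE operator at the disputed step; whoever's
    claimed state matches the honest recomputation wins."""
    i = first_divergence(trace_a, trace_b)
    truth = step_fn(trace_a[i - 1], i - 1)   # states at i-1 are agreed
    return "a" if truth == trace_a[i] else "b"

def step(state, i):
    """Stand-in for one deterministic training operator."""
    return state + i + 1

honest = [0]
for i in range(16):
    honest.append(step(honest[-1], i))
cheating = honest[:8] + [s + 5 for s in honest[8:]]  # corrupt from step 8 on
```

The bisection touches O(log n) trace entries and the referee recomputes a single step, which is the "no full recomputation, only disputed operators" property the section claims.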

  • SkipPipe: Communication fault-tolerant optimization mechanism

SkipPipe is designed to solve the communication bottleneck problem in the "low bandwidth + node offline" scenario. Its core capabilities include:

  • Skip Ratio: skip restricted nodes to avoid training blockage;
  • Dynamic scheduling algorithm: generates the optimal execution path in real time;
  • Fault-tolerant execution: Even if 50% of the nodes fail, the inference accuracy only drops by about 7%.

It can raise training throughput by up to 55% and enables key capabilities such as early-exit inference, seamless rerouting, and inference completion.
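The layer-skipping idea can be sketched in a few lines. This is a hypothetical simplification of SkipPipe's scheduler, with invented names: a microbatch visits pipeline stages in order, bypasses stages whose nodes are offline, and only stalls when the skipped fraction exceeds the skip ratio.

```python
def plan_path(stages, alive, max_skip_ratio=0.5):
    """Route a microbatch through pipeline stages in order, skipping stages
    whose nodes are offline, as long as the skipped fraction stays within
    max_skip_ratio. Illustrative sketch, not SkipPipe's actual algorithm."""
    path, skipped = [], 0
    for stage in stages:
        if alive.get(stage, False):
            path.append(stage)
        else:
            skipped += 1  # layer hopping: bypass the unavailable stage
    if skipped / len(stages) > max_skip_ratio:
        raise RuntimeError("too many stages offline; wait rather than degrade")
    return path

path = plan_path(
    ["s0", "s1", "s2", "s3"],
    {"s0": True, "s1": True, "s2": False, "s3": True},
)
```

Here the microbatch simply runs through `["s0", "s1", "s3"]`; in the real system the dynamic scheduler would also weigh bandwidth and latency when choosing which stages to hop.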

  • HDEE: Cross-domain heterogeneous expert cluster

The HDEE ( Heterogeneous Domain-Expert Ensembles ) module is dedicated to optimizing the following scenarios:

  • Multi-domain, multi-modal, and multi-task training;
  • The distribution of various types of training data is uneven and the difficulty varies greatly;
  • Task allocation and scheduling problems in an environment with heterogeneous device computing capabilities and inconsistent communication bandwidth.

Its core features:

  • MHe-IHo: assigns models of different sizes to tasks of different difficulty (heterogeneous models, uniform training step size);
  • MHo-IHe: uniform task difficulty, but training step sizes adjusted asynchronously;
  • Heterogeneous expert models with pluggable training strategies, improving adaptability and fault tolerance;
  • An emphasis on parallel collaboration, minimal communication, and dynamic expert allocation, suited to real-world ecosystems of complex tasks.
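The MHe-IHo pairing above reduces to a simple matching rule, sketched here under our own assumed data shapes (task difficulty scores and expert parameter counts are invented for illustration):

```python
def assign_experts(tasks, experts):
    """MHe-IHo-style pairing: sort tasks by difficulty and experts by
    capacity, then match the hardest task to the largest expert, while all
    experts keep a uniform training step count. Illustrative sketch only."""
    by_difficulty = sorted(tasks, key=tasks.get)   # easiest -> hardest
    by_capacity = sorted(experts, key=experts.get) # smallest -> largest
    return dict(zip(by_difficulty, by_capacity))

assignment = assign_experts(
    {"easy": 1, "mid": 5, "hard": 9},          # task -> difficulty score
    {"small": 1e8, "med": 1e9, "big": 1e10},   # expert -> parameter count
)
```

In the MHo-IHe variant the same skeleton would instead hold model size fixed and vary the per-expert step size asynchronously.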
  • Multi-role game mechanism: trust and incentives go hand in hand

The Gensyn network introduces four types of participants:

  • Submitter : publishes training tasks, sets structure and budget;
  • Solver : executes training tasks and submits results;
  • Verifier : Verify training behavior to ensure compliance and effectiveness;
  • Whistleblower : Challenge validators to obtain arbitration rewards or bear penalties.

This mechanism is inspired by Truebit's economic game design: by forcibly injecting errors and arbitrating disputes at random, it incentivizes honest collaboration and keeps the network reliably operational.
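A toy stake-and-slash ledger illustrates how the four roles are kept honest. The rules and numbers below are invented for illustration (Truebit-style intuition), not Gensyn's contract logic:

```python
class Escrow:
    """Toy stake-and-slash ledger for the Submitter/Solver/Verifier/
    Whistleblower game. Slashing fractions are illustrative assumptions."""
    def __init__(self):
        self.stakes = {}

    def stake(self, who, amount):
        self.stakes[who] = self.stakes.get(who, 0) + amount

    def settle(self, solver, verifier, whistleblower, challenge_upheld):
        if challenge_upheld:
            # Arbitration sided with the whistleblower: slash half of the
            # solver's and verifier's stakes and pay them out as the bounty.
            bounty = self.stakes[solver] // 2 + self.stakes[verifier] // 2
            self.stakes[solver] //= 2
            self.stakes[verifier] //= 2
            self.stakes[whistleblower] += bounty
        else:
            # Frivolous challenge: the whistleblower forfeits its bond.
            self.stakes[solver] += self.stakes[whistleblower]
            self.stakes[whistleblower] = 0

e = Escrow()
e.stake("solver", 100); e.stake("verifier", 100); e.stake("whistle", 10)
e.settle("solver", "verifier", "whistle", challenge_upheld=True)
```

The key property is that every role has capital at risk: a lazy verifier loses stake when a whistleblower's challenge is upheld, and a frivolous whistleblower loses its bond, so honest behavior is the equilibrium.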

4. Testnet and Roadmap Planning

| Stage | Core features | Goal |
| --- | --- | --- |
| ✅ Phase 0 | RL Swarm + identity-tracking mechanism | Basic training-task collaboration and attribution |
| 🟡 Phase 1 | Integrate Verde verification and SkipPipe fault-tolerant communication | Support more training types and verification methods |
| 🟢 Phase 2 | Introduce RL environment hosting + model pre-training tasks | Support real training workloads and multi-model parallelism |
| 🟣 Phase 3 | Inference-as-a-Service | Support on-chain invocation and model-as-asset services |
| 🏁 Final | Mainnet launch + closed-loop token economy | A complete execution layer for the decentralized training market |

5. Team and Financing Background

Gensyn was co-founded by Ben Fielding and Harry Grieve and is headquartered in London, UK. In May 2023, Gensyn announced the completion of a $43 million Series A financing led by a16z crypto, with other investors including CoinFund, Canonical, Ethereal Ventures, Factor and Eden Block. The team background combines distributed systems and machine learning engineering experience, and has long been committed to building a verifiable, trustless, large-scale AI training execution network.

Nous Research: A cognitive evolutionary training system driven by subjective AI concepts

Nous Research is one of the few decentralized training teams with both philosophical depth and engineering results. Its vision stems from the concept of "Desideratic AI": treating AI as an intelligent subject with subjectivity and the capacity to evolve, rather than a merely controllable tool. What makes Nous distinctive is that it frames AI training not as an "efficiency problem" but as the formation of a "cognitive subject". Driven by this vision, Nous is building an open, censorship-resistant training network in which heterogeneous nodes collaborate without central scheduling, implemented systematically through a full-stack toolchain.

1. Concept support: redefine the “purpose” of training

Nous invests little in incentive design or protocol economics, seeking instead to change the philosophical premise of training itself:

  • Against "alignment-ism": it rejects training whose sole goal is keeping models under human control, arguing that training should instead encourage models to form independent cognitive styles;
  • Model subjectivity: base models should retain uncertainty, diversity, and even the capacity to hallucinate ("hallucination as virtue");
  • Training as cognitive formation: a model is not "optimizing task completion" but an individual participating in a process of cognitive evolution.

Although this training concept is "romantic", it reflects the core logic of Nous in designing training infrastructure: how to allow heterogeneous models to evolve in an open network rather than being uniformly disciplined .

2. Training Core: Psyche Network and DisTrO Optimizer

Nous's most important contribution to decentralized training is the Psyche network and its underlying communication optimizer DisTrO (Distributed Training Over-the-Internet), which together form the execution hub for training tasks. DisTrO + Psyche provide several core capabilities: communication compression (DCT plus 1-bit sign encoding, sharply reducing bandwidth requirements), node adaptability (heterogeneous GPUs, reconnection after dropout, voluntary exit), asynchronous fault tolerance (training continues without synchronization), and decentralized scheduling (no central coordinator; consensus and task distribution happen on-chain). This architecture provides a realistic, feasible technical foundation for a low-cost, highly elastic, verifiable open training network.

  • DisTrO Optimizer

DisTrO (Distributed Training Over-the-Internet) is a distributed training communication optimization mechanism launched by Nous, which aims to enable large model training to run efficiently and stably on ordinary consumer-grade GPUs, non-professional clusters, and high-latency, low-bandwidth network environments. Its core features include:

  1. Extreme communication compression : DCT (discrete cosine transform) is used to convert gradient or momentum into frequency domain signals, retaining only the frequency components with the highest energy (such as top-k high frequencies), greatly reducing the amount of inter-node communication required for each round of training, and effectively alleviating bandwidth bottlenecks.
  2. Training-Communication Parallelism (Overlapped DisTrO) : Supports entering the next round of training immediately after a single node completes the current gradient calculation without waiting for communication to complete, realizing overlapping execution of training and communication, significantly improving GPU utilization and overall throughput efficiency.
  3. Asynchronous/partially synchronous compatibility : Supports non-fully synchronous training update mechanism, allowing nodes to train independently in an asynchronous state, and can tolerate node delays, disconnections or exits, enhancing network fault tolerance and elastic collaboration capabilities.

DisTrO is designed to fully adapt to the real open network environment and is one of the key basic components of the Nous decentralized training architecture to achieve "low-cost participation + stable convergence".
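The compression idea in point 1 can be sketched end to end. This is our own illustrative reconstruction, not Nous's implementation: a naive O(n²) DCT stands in for the fast transform, and the payload is reduced to top-k frequency indices, 1-bit signs, and one shared magnitude.

```python
import math

def dct(x):
    """Naive DCT-II, O(n^2); a stand-in for the fast transform DisTrO uses."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                for j in range(n)) for k in range(n)]

def idct(X):
    """Inverse transform (DCT-III with the standard scaling)."""
    n = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                            for k in range(1, n))) * 2 / n for j in range(n)]

def compress(grad, k):
    """Keep the k highest-energy frequency bins as 1-bit signs plus one
    shared magnitude -- the only payload a node would need to transmit."""
    X = dct(grad)
    top = sorted(range(len(X)), key=lambda i: abs(X[i]), reverse=True)[:k]
    scale = sum(abs(X[i]) for i in top) / k
    signs = [1 if X[i] >= 0 else -1 for i in top]
    return top, signs, scale

def decompress(top, signs, scale, n):
    """Rebuild an approximate gradient from the compressed payload."""
    X = [0.0] * n
    for i, s in zip(top, signs):
        X[i] = s * scale
    return idct(X)
```

For a gradient of n floats, the transmitted payload shrinks from n × 32 bits to roughly k × (index + 1 sign bit) plus one scalar, which is where the bandwidth savings come from; the reconstruction is lossy but preserves the highest-energy components.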

  • Psyche training network

Distributed communication and weight sharing mechanism : The Psyche network uses the Iroh + Solana blockchain as the coordination layer to ensure the trusted propagation of training tasks, parameter updates and witness proofs between nodes. The entire system does not require a central server or master scheduler. All model updates are automatically triggered through the P2P network and the on-chain random seed mechanism.
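The "on-chain random seed, no master scheduler" pattern works because every node can derive the same decision locally. A minimal sketch under assumed names (the seed here is any on-chain value such as a block hash; the shard/node model is invented for illustration):

```python
import hashlib
import random

def assign_from_seed(block_hash, nodes, shards):
    """Every node runs this locally with the same on-chain seed and derives
    an identical shard -> node assignment, so no central coordinator is
    needed. Illustrative sketch, not Psyche's actual protocol."""
    seed = hashlib.sha256(block_hash.encode()).digest()
    rng = random.Random(seed)          # deterministic PRNG from the seed
    order = list(nodes)
    rng.shuffle(order)                 # same seed -> same permutation
    return {shard: order[i % len(order)] for i, shard in enumerate(shards)}
```

Because the assignment is a pure function of the on-chain seed, any participant can also recompute it after the fact to audit who was responsible for which update.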

Nous has launched the first large-model pre-training task on the Psyche network, the Consilience training run, which adopts an MLA (Multi-head Latent Attention) architecture rather than the mainstream MoE or GQA routes, underscoring the team's emphasis on structural freedom of expression and the model's potential for self-evolution.

This architectural design emphasizes practical feasibility: it does not rely on central servers, is adaptable to global volunteer nodes, and has on-chain traceability of training results .

3. Reasoning and agency system composed of Hermes / Forge / TEE_HEE

In addition to building decentralized training infrastructure, Nous Research has also conducted several exploratory system experiments around the concept of "AI subjectivity":

  1. Hermes open-source model series: Hermes 1 through 3 are Nous's flagship open-source large models, trained on LLaMA 3.1 at the 8B, 70B, and 405B parameter scales. The series embodies Nous's "de-instruction, diversity-preserving" training philosophy and demonstrates strong expressiveness and generalization in long-context retention, role-playing, and multi-turn dialogue.
  2. Forge Reasoning API: a multi-modal reasoning system

Forge is a reasoning framework developed by Nous that combines three complementary mechanisms to achieve more flexible and creative reasoning capabilities:

  • MCTS (Monte Carlo Tree Search) : Strategy search for complex tasks;
  • CoC (Chain of Code) : Introduces the combination path of code chain and logical reasoning;
  • MoA (Mixture of Agents) : Allows multiple models to negotiate and improve the breadth and diversity of output.

The system emphasizes "non-deterministic reasoning" and combinatorial generation paths, which is a powerful response to the traditional instruction alignment paradigm.
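Of the three mechanisms, MoA is the easiest to sketch. The function shapes below are our assumption for illustration, not Forge's API: each round every proposer sees the previous round's drafts and revises, then an aggregator synthesizes the final answer.

```python
def mixture_of_agents(proposers, aggregator, prompt, rounds=2):
    """MoA-style loop: proposers draft independently, then iteratively
    revise with visibility of peers' drafts; an aggregator produces the
    final answer. Illustrative sketch only."""
    drafts = [p(prompt, []) for p in proposers]          # round 1: no context
    for _ in range(rounds - 1):
        drafts = [p(prompt, drafts) for p in proposers]  # revise with peer drafts
    return aggregator(prompt, drafts)

# Toy agents: each appends its marker plus the peer context it saw;
# the toy aggregator just keeps the longest draft.
agents = [lambda q, ctx, tag=t: q + tag + "".join(ctx) for t in ("A", "B")]
final = mixture_of_agents(agents, lambda q, d: max(d, key=len), "Q")
```

The diversity the text describes comes from the proposers being different models; the aggregator is free to merge rather than pick, which is where the "breadth of output" benefit appears.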

  3. TEE_HEE: an autonomous AI agent experiment. TEE_HEE is Nous's frontier exploration of autonomous agents, verifying whether an AI can run independently inside a trusted execution environment (TEE) with a unique digital identity. The agent holds its own Twitter and Ethereum accounts, with all control permissions managed by a remotely attestable enclave, so developers cannot interfere with its behavior. The goal is an AI subject with "immutability" and "independent behavioral intent", an important step toward autonomous agents.
  4. AI behavior simulators: Nous has also built multiple simulators, including WorldSim, Doomscroll, and Gods & S8n, to study AI behavior evolution and value formation in multi-role social environments. Though not part of the training process itself, these experiments lay the semantic groundwork for modeling the cognitive behavior of long-lived autonomous AI.

4. Team and Financing Overview

Nous Research was founded in 2023 by Jeffrey Quesnelle (CEO), Karan Malhotra, Teknium, Shivani Mitra and others. The team is driven by philosophy and focuses on system engineering, with diverse backgrounds in machine learning, system security, decentralized networks, etc. In 2024, it received $5.2 million in seed round financing. In April 2025, it completed a $50 million Series A financing led by Paradigm, with a valuation of $1 billion, becoming one of the Web3 AI unicorns.

Flock: A blockchain-enhanced federated learning network

Flock.io is a blockchain-based federated learning platform that aims to decentralize the data, compute, and models of AI training. FLock favors an integrated "federated learning + blockchain reward layer" framework: essentially an on-chain evolution of the traditional FL architecture rather than a systematic attempt at a new training protocol. Compared with decentralized training projects such as Gensyn, Prime Intellect, Nous Research, and Pluralis, Flock emphasizes privacy protection and usability over breakthroughs in communication, verification, or training methods; its true peers are federated learning systems such as Flower, FedML, and OpenFL.

1. The core mechanism of Flock.io

  1. Federated learning architecture: emphasizing data sovereignty and privacy protection
    Flock is based on the classic Federated Learning (FL) paradigm, allowing multiple data owners to collaboratively train a unified model without sharing the original data, focusing on solving data sovereignty, security, and trust issues. The core process includes:
  • Local training : Each participant (Proposer) trains the model on a local device without uploading the original data;
  • On-chain aggregation : After training is completed, local weight updates are submitted and aggregated into a global model by the on-chain Miner;
  • Committee evaluation : VRF randomly elects voter nodes and uses an independent test set to evaluate and score the aggregation model;
  • Incentives and punishments : rewards or confiscation of collateral are executed based on the scoring results to achieve anti-malice and dynamic trust maintenance.
  2. Blockchain Integration: Enabling Trustless System Coordination
    Flock has put all the core links of the training process (task allocation, model submission, evaluation and scoring, and incentive execution) on the chain to make the system transparent, verifiable, and censorship-resistant. The main mechanisms include:
  • VRF random election mechanism : improves the fairness and anti-manipulation ability of the rotation between Proposer and Voter;
  • Stake mechanism (PoS) : Constrain node behavior through token pledge and penalty to improve system robustness;
  • Automatic execution of on-chain incentives : Through smart contracts, reward distribution and slashing penalties that are bound to task completion and evaluation results are realized, building a collaborative network that does not require trusted intermediaries.
  3. zkFL: a zero-knowledge aggregation mechanism for privacy. Flock's zkFL lets Proposers submit zero-knowledge proofs of their local updates, so Voters can verify correctness without ever accessing the raw gradients, improving the credibility of the training process while preserving privacy, a notable step toward combining privacy protection with verifiability in federated learning.
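The on-chain aggregation step in this flow is, at its core, classic federated averaging. A minimal sketch (the zero-knowledge proofs and VRF committee are omitted; the weighting scheme is our assumption):

```python
def fedavg(updates, weights=None):
    """FedAvg-style aggregation: average locally trained weight vectors,
    optionally weighted (e.g. by local dataset size), without ever seeing
    the raw data. Illustrative sketch of the on-chain Miner's aggregation
    step in Flock's design; zkFL proof checking is out of scope here."""
    if weights is None:
        weights = [1.0] * len(updates)
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total
            for i in range(dim)]

# Two Proposers each submit a locally trained 2-parameter update.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]])
```

In the full protocol the committee's test-set scores would gate which updates enter this average and trigger the stake rewards or slashing described above.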

2. Flock’s core product components

  • AI Arena: It is a decentralized training platform of Flock.io. Users can participate in model tasks through train.flock.io, act as trainers, validators or delegators, and receive rewards by submitting models, evaluating performance or delegating tokens. Currently, tasks are officially released and will be gradually opened to the community for co-creation in the future.
  • FL Alliance: It is a Flock federated learning client that supports participants to use private data to further fine-tune the model. Through VRF election, staking and slashing mechanisms, it ensures the honesty and collaboration efficiency of the training process, and is the key link between community initial training and real deployment.
  • AI Marketplace: It is a model co-creation and deployment platform where users can propose models, contribute data, and call model services. It supports database access and RAG enhanced reasoning, and promotes the implementation and circulation of AI models in various practical scenarios.

3. Team and Financing Overview

Flock.io was founded by Sun Jiahao and has issued the platform token FLOCK. The project has raised a total of US$11 million, with investors including DCG, Lightspeed Faction, Tagus Capital, Animoca Brands, Fenbushi, OKX Ventures, etc. In March 2024, Flock completed a US$6 million seed round of financing to launch the test network and federated learning client; in December of the same year, it added US$3 million in financing and received funding from the Ethereum Foundation to focus on blockchain-driven AI incentive mechanisms. At present, the platform has created 6,428 models, connected to 176 training nodes, 236 verification nodes, and 1,178 delegators.

Compared with decentralized training projects, federated-learning systems such as Flock hold advantages in training efficiency, scalability, and privacy protection, and are especially suited to collaborative training of small and medium models; their solutions are pragmatic, easy to deploy, and favor engineering-level feasibility. Projects such as Gensyn and Pluralis pursue deeper theoretical breakthroughs in training methods and communication mechanisms; their system challenges are greater, but they come closer to a genuinely "trustless, decentralized" training paradigm.

EXO: Decentralized training attempt for edge computing

EXO is a representative AI project in the current edge computing scenario, dedicated to realizing l
