Rachel, Jinse Finance
On November 27, CZ posted on X that AI data labeling is well suited to blockchain: it can draw on low-cost labor worldwide, and instant cryptocurrency payments remove geographical restrictions.
Data labeling refers to the manual or automated annotation of raw data (such as text, images, or audio) to give it specific structured information. Labeled data is used to train machine learning or artificial intelligence models. For example, tagging a piece of text with a sentiment category (positive, negative, or neutral) is a form of data labeling. Blockchain is particularly suited to labeling scenarios that require high transparency, credibility, and distributed collaboration. It can improve the efficiency and quality of data labeling, and it also creates new possibilities for global collaboration and data trading.
Which projects in this sector currently stand out, and what are the sector's development prospects?
The role of blockchain in AI data labeling
Blockchain is a decentralized distributed ledger technology with characteristics such as transparency, immutability, and traceability. These characteristics can solve the following problems in traditional data labeling methods:
Data authenticity and anti-tampering: Each labeling record is written to the blockchain and cannot be arbitrarily changed, ensuring the credibility of the annotation.
Task allocation transparency: The blockchain can record the distribution, execution, and review process of tasks, preventing unfair task allocation or result tampering.
Incentive mechanism: Using blockchain smart contract technology, data labelers can automatically receive cryptocurrencies or other rewards for completing tasks.
Data traceability: The source of each label, the labeler, and the reviewer's information can be tracked.
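To illustrate the anti-tampering and traceability points above, here is a minimal off-chain sketch in Python. It is not how any of the projects in this article are implemented. It only shows the general idea of a hash-chained record, where each labeling record stores the hash of the previous one, so that changing an earlier label is detectable. The class name `LabelLedger` and the field names are hypothetical and chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LabelLedger:
    """Toy append-only log of labeling records (hypothetical, not a real chain)."""
    entries: list = field(default_factory=list)

    def append(self, item_id: str, label: str, labeler: str, reviewer: str) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        record = {
            "item_id": item_id,
            "label": label,
            "labeler": labeler,    # who produced the label (traceability)
            "reviewer": reviewer,  # who reviewed it
            "prev_hash": prev_hash,
        }
        record["hash"] = _digest(record)  # hash covers all fields set so far
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


ledger = LabelLedger()
ledger.append("text-001", "positive", labeler="alice", reviewer="bob")
ledger.append("text-002", "neutral", labeler="carol", reviewer="bob")
print(ledger.verify())                    # chain is intact
ledger.entries[0]["label"] = "negative"   # simulate tampering with a record
print(ledger.verify())                    # tampering is detected
```

On a real blockchain the chaining and verification are done by the network's consensus rather than by a single process, but the detection principle is the same.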
Application scenarios
Distributed labeling: Utilizing blockchain, data labeling tasks can be distributed to labelers around the world, improving data processing efficiency.
Quality review: Labels submitted by several labelers for the same item are recorded on the blockchain and compared, so that disagreements can be reviewed and labeling accuracy ensured.
Labeled data trading: Labeled data can be traded on the blockchain, and buyers and sellers do not need to worry about the integrity or authenticity of the data.
Privacy protection: Labeled data can be encrypted before it is stored via the blockchain, helping to keep private data secure.
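The quality review scenario can be sketched as a simple majority vote over labels from several labelers. This is an illustrative assumption, not the review method of any specific project (OORT's Proof of Honesty algorithm, for instance, is not described in enough detail here to reproduce). The function name `consensus_label` and the 60% agreement threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def consensus_label(
    votes: Dict[str, str], min_agreement: float = 0.6
) -> Tuple[Optional[str], float]:
    """Pick the most common label if enough labelers agree.

    votes maps a labeler ID to the label that labeler submitted for one item.
    Returns (label, agreement); label is None when agreement is too low,
    which signals that the item should go to manual review.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes.values())
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


# Two of three labelers agree, so the item is accepted as "positive".
print(consensus_label({"alice": "positive", "bob": "positive", "carol": "neutral"}))

# A 50/50 split falls below the threshold, so no label is accepted.
print(consensus_label({"alice": "positive", "bob": "negative"}))
```

Recording each vote on-chain before the comparison would let anyone re-run the same check later, which is where the transparency benefit comes from.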
Related projects
OORT DataHub: Provides a decentralized data labeling service based on blockchain, using the Proof of Honesty algorithm for quality control. The platform distributes tasks, reviews data quality, and pays rewards through smart contracts, attracting global labelers to join and ensuring the transparency and privacy protection of labeled data.
The economic model of the project token is as follows:
Community rewards: Users can receive $OORT token rewards by participating in data labeling and analysis. They may also receive unique NFTs linked to their contributions, which provide additional benefits such as higher annual percentage yield (APY) rewards, equipment discounts, and DAO voting rights.
Task staking: Participants must stake at least 210 $OORT tokens to demonstrate their commitment to a task; the stake is returned and rewards are issued once the task is completed.
Revenue sharing: Some NFT holders can also share in future data sales revenue, further enhancing long-term earnings.
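The task staking rule can be sketched as a small escrow model in Python. This is a simplified illustration, not OORT's actual smart contract. Only the 210-token minimum and the "stake returned plus reward on completion" behavior come from the description above. The class `TaskEscrow`, its methods, and the example reward of 15 tokens are hypothetical, and the case of an unfinished task is left out because the source does not describe it.

```python
MIN_STAKE = 210  # minimum stake in $OORT, as stated above


class TaskEscrow:
    """Toy model: holds stakes while tasks are in progress (hypothetical)."""

    def __init__(self) -> None:
        self.stakes = {}  # participant -> staked amount

    def join(self, participant: str, amount: int) -> None:
        """Lock a participant's stake before they start a task."""
        if amount < MIN_STAKE:
            raise ValueError(f"stake must be at least {MIN_STAKE} $OORT")
        if participant in self.stakes:
            raise ValueError("participant already has an active stake")
        self.stakes[participant] = amount

    def complete(self, participant: str, reward: int) -> int:
        """Release the stake plus the reward once the task is completed."""
        stake = self.stakes.pop(participant)  # KeyError if never staked
        return stake + reward


escrow = TaskEscrow()
escrow.join("alice", 210)
payout = escrow.complete("alice", reward=15)  # reward amount is made up
print(payout)  # 225: the 210 stake returned plus the 15 reward
```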
PublicAI: An AI ecosystem project on the Solana chain, aiming to connect data requesters and global labelers, rewarding participants through a cryptocurrency incentive mechanism, and using blockchain technology to record the details of the labeling process to ensure data security and privacy.
The economic model of the project token is as follows:
Community rewards: 10% of the Public tokens will be used for airdrop rewards for early user interactions. There are three ways to earn airdrops: become an AI Builder (collect high-quality internet content), become an AI Validator (verify the collected content), or become an AI Developer (use the verified dataset to train AI agents).
Token allocation: The project completed a $2 million seed round in January 2024, with investors including IOBC Capital, Foresight Ventures, Solana Foundation, Everstate Capital, and several well-known AI professors. The specific details of the PublicAI token allocation are not yet clear.
Challenges faced
Several factors currently constrain the development of this sector: first, AI data labeling requires substantial computing and storage resources; second, project performance is limited by the scalability of the blockchain; third, technical standards and regulation are not yet mature.
The second point may be the biggest challenge at present. AI data labeling and model training usually require large amounts of computing resources, while the computing capacity of blockchain network nodes is limited. How to integrate and use distributed computing resources effectively to meet the needs of AI data labeling projects, while preserving the decentralized nature of the blockchain, is a problem that urgently needs to be solved. Greenfield, the decentralized storage network from the Binance ecosystem, is reportedly providing storage support for this sector, and more storage and computing resources are expected to be deployed in this field.