Data is the lifeblood of the AI era, essential for the evolution of AI models. However, the development of open-source AI models is often constrained by the lack of large, high-quality datasets. In contrast, closed-source AI developers reduce data collection costs by employing workers for intensive cognitive tasks, often paying less than $2 per hour. The benefits from these models are concentrated in the hands of a few, exacerbating inequalities among contributors.
In the Bittensor ecosystem, Subnet 33 aims to address the scarcity of high-quality datasets. How does SN 33 operate, and what are its current performance metrics?
Subnet 33 ReadyAI
Emission: 2.51% (2024-10-13)
GitHub: https://github.com/afterpartyai/bittensor-conversation-genome-project
Team: The team behind SN33 comes from Afterparty AI, a startup founded in 2021. In September 2023, Afterparty AI secured $5 million in funding, led by Blockchange Ventures.
The Goal
SN33 aims to provide individuals and businesses with a low-cost, resource-efficient process for data structuring and semantic labeling. To achieve this, SN33 has developed innovations in the annotation and structuring of text data, transforming large volumes of raw conversational data into structured datasets that can be utilized by AI applications.
The Execution
SN33 integrates fractal data mining methods into Bittensor’s Validator-Miner framework to produce more comprehensive and reliable structured datasets.
The specific process includes:
Validator:
1. Pulls raw data from its own data store or the CGP API.
2. Generates overview metadata for the raw data to serve as ground truth.
3. Creates data windows and distributes them to Miners.
Miner:
1. Uses LLMs to process data windows and provides metadata and annotations.
2. Sends the metadata and annotated data back to the Validator.
Validator:
1. Scores each Miner's output against its own annotated data, which serves as the factual benchmark.
2. Pushes all metadata back to its own data store or the CGP API.
This approach not only increases the efficiency of data processing but also enhances the robustness of the data through cross-validation, preventing a single error or inaccuracy from significantly affecting the overall dataset.
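To make the Validator-Miner loop above more concrete, here is a minimal Python sketch of one round. It is an illustration under stated assumptions, not SN33's actual code: `toy_tagger` is a hypothetical stand-in for an LLM call, the two miners are invented for the demo, and the precision-style score in `score_tags` is a simple placeholder for whatever scoring the subnet really uses.

```python
# Hypothetical sketch of one Validator-Miner round; none of these names come from SN33.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "or", "i", "you", "do", "is"}


def toy_tagger(lines):
    """Stand-in for an LLM call: returns lowercase keyword 'tags' for the given lines."""
    tags = set()
    for line in lines:
        for word in line.lower().split():
            word = word.strip(".,!?")
            if word and word not in STOPWORDS:
                tags.add(word)
    return tags


def make_windows(lines, size, stride):
    """Split a conversation into overlapping windows that together cover every line."""
    windows = []
    start = 0
    while True:
        windows.append(lines[start:start + size])
        if start + size >= len(lines):
            break
        start += stride
    return windows


def score_tags(miner_tags, ground_truth):
    """Assumed scoring rule: share of a miner's tags that appear in the ground truth."""
    if not miner_tags:
        return 0.0
    return len(miner_tags & ground_truth) / len(miner_tags)


def honest_miner(window):
    """A miner that annotates only what is in its window."""
    return toy_tagger(window)


def noisy_miner(window):
    """A miner that pads its answer with tags unrelated to the conversation."""
    return toy_tagger(window) | {"unrelated", "spam"}


def run_round(conversation, miners, size=3, stride=2):
    ground_truth = toy_tagger(conversation)              # Validator: overview metadata
    windows = make_windows(conversation, size, stride)   # Validator: data windows
    scores = {}
    for name, miner in miners.items():
        # Miner annotates each window; Validator scores it against the ground truth.
        per_window = [score_tags(miner(w), ground_truth) for w in windows]
        scores[name] = sum(per_window) / len(per_window)
    return ground_truth, windows, scores


SAMPLE_CONVERSATION = [
    "Hi, I need help planning a trip to Lisbon.",
    "Sure, when do you want to travel?",
    "In May, for about a week.",
    "Great, do you prefer museums or beaches?",
    "Beaches, and good seafood.",
]

if __name__ == "__main__":
    _, _, scores = run_round(
        SAMPLE_CONVERSATION, {"honest": honest_miner, "noisy": noisy_miner}
    )
    print(scores)
```

Because every window is scored against the same overview metadata, a miner that invents tags is penalized on each window it touches, which is the cross-validation effect described above.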
The Product
ReadyAI is a tool platform built on SN33, designed for AI application developers. Through ReadyAI’s services, developers can convert their desired raw data into structured data, optimizing their product experiences.
For example, the website offers a demo for the “Docs Wizards” scenario, where users can directly interact with an AI avatar of Afterparty’s CEO to learn more about SN33.
Additionally, for more complex scenarios, AI developers can use the Personas API to customize chatbots that meet their specific needs.
The Update
On September 12, 2024, ReadyAI announced a significant update, claiming that SN 33’s top-performing Miners delivered data annotation results that far exceeded the quality of human labeling on Amazon’s crowdsourcing platform, Mechanical Turk (MTurk), and even surpassed GPT-4o, all at a significantly lower cost.
In this experiment, 1,270 conversation samples were annotated using models from the Top 5 Miners of SN 33, and their performance was compared with that of MTurk workers and GPT-4o. The results showed that the Miners' annotation accuracy was 71% higher than that of MTurk workers and 37% higher than that of GPT-4o. Additionally, the cost of annotation by Miners was drastically lower, at about 1/660th of the cost on MTurk.
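To get a feel for what these ratios mean in practice, here is a small arithmetic sketch. The announcement summarized here does not say whether "71% higher" is a relative gain or a difference in percentage points, so the code assumes a relative gain. The baseline score and the per-sample MTurk price are hypothetical placeholders, not figures from the experiment; only the sample count, the percentages, and the 1/660 ratio come from the text above.

```python
# Figures taken from the text above.
SAMPLES = 1270        # conversation samples in the experiment
GAIN_VS_MTURK = 71    # percent, assumed to be a relative gain
GAIN_VS_GPT4O = 37    # percent, assumed to be a relative gain
COST_RATIO = 660      # Miners cost about 1/660th of MTurk

# Hypothetical placeholders, NOT reported results.
HYPOTHETICAL_BASELINE_SCORE = 0.40   # some accuracy score for a baseline annotator
HYPOTHETICAL_MTURK_PRICE = 0.50      # USD per annotated sample


def score_from_relative_gain(baseline, gain_pct):
    """Score implied by a relative gain of gain_pct percent over a baseline score."""
    return baseline * (1 + gain_pct / 100)


def total_cost(samples, price_per_sample):
    """Total annotation cost for a batch of samples."""
    return samples * price_per_sample


mturk_cost = total_cost(SAMPLES, HYPOTHETICAL_MTURK_PRICE)
miner_cost = mturk_cost / COST_RATIO

if __name__ == "__main__":
    implied = score_from_relative_gain(HYPOTHETICAL_BASELINE_SCORE, GAIN_VS_MTURK)
    print(f"Implied miner score for the placeholder baseline: {implied:.3f}")
    print(f"Placeholder MTurk cost: ${mturk_cost:.2f}, miner cost: ${miner_cost:.2f}")
```

Swapping in real baseline scores and prices, if ReadyAI publishes them, would turn this into an actual comparison; with the placeholders it only shows how the ratios scale.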
This experiment further supports the competitive advantage of using LLMs for data annotation tasks, demonstrating that SN 33’s services provide a more advanced alternative to GPT-4o in this domain.
The Conclusion
High-quality datasets are essential for training and fine-tuning AI models. SN 33 offers customized, high-quality datasets at a low cost, which is especially valuable for the development of open-source AI models. For small and medium-sized enterprises, it opens up access to quality structured data, which can drive AI applications and automation and improve their competitiveness. Such innovations allow more businesses to participate in AI development and benefit from its advancements.