How Grass Is Drawing a Data Map of the Entire Internet for the AI Era

08-24

To win a ticket to the AI finals, the giants are spending huge sums on high-quality data.

In the AI era, data is as much a necessity as computing power. Reddit disclosed in its IPO prospectus that the data licensing agreements it had signed carried an aggregate contract value of $203 million. Earlier, The Information reported that OpenAI was offering publishers $1 million to $5 million a year to get more news organizations to sign licensing agreements for training its AI models.

As for protecting high-quality data, the most obvious example is the strict API restrictions that X (formerly Twitter) introduced in 2023. Musk, who once invested in OpenAI, most likely restricted API access to X's data because he knew that X was a data vault. To give the simplest example: although many people now use the star AI product Perplexity instead of Google for search, the latest posts on X can only be retrieved through the newly released Grok. To some extent, X's data has become Grok's biggest moat.

Because of NVIDIA, the crypto community seems to care only about GPU projects, and few people realize that data is also a key resource for the development of AI. Computing power alone, however great, cannot work miracles, just as even the best cook cannot make a meal without rice. Without enough high-quality data, an AI system cannot accurately understand, predict, or generate content, and so cannot operate effectively in the complex real world.

If AI applications such as ChatGPT and AI computing power such as NVIDIA's are the visible face of AI, then giants such as Google and Microsoft, which aggregate a huge share of the Internet's content, supply what lies beneath that face.

Data is not only the foundation of AI but also its moat. Grass, which is deeply involved in the data layer, already has a complete set of solutions for it.

Why Grass can become the decentralized Google

If I had to sum up Grass's core philosophy in one sentence, it would be "from the masses, to the masses". Users around the world run Grass nodes, contributing idle bandwidth and relaying traffic so that the network can capture real-time, high-quality data from across the Internet, and they earn token rewards in return.

Unlike the traditional giants, Grass, a leading crypto protocol in the data field, verifies, sorts, and cleans the massive amounts of Internet data it captures and turns them into high-quality datasets for sale. Any company or individual interested in training their own AI can benefit from this system.

As Ed Roman, managing partner at Hack VC, has commented, Grass's data acquisition may be superior to any single company's in-house effort thanks to the power of a large network of incentivized nodes. That means not only acquiring more data, but also acquiring it more frequently, so that the data is more relevant and up to date. A decentralized army of data scrapers is almost impossible to stop, because it is fragmented by nature and does not sit behind a single IP address.

Of course, users who contribute their idle bandwidth will naturally care about security. Grass has given an explanation: when excess bandwidth is contributed for data crawling, Grass does not take over the user's computer or observe anything the user does on it. All it does is route Internet traffic through the user's IP address. That traffic is unrelated to the user's own activity, so Grass cannot access the user's personal data.

Grass's extremely low barrier to entry has helped it build a huge user base. Less than a year after launch, it passed 2 million active nodes and now has more than 2.2 million. If the points held by these users are converted into tokens after the Grass TGE, Grass could become one of the most widely distributed airdrops and communities in history.

As one of the few products with good product-market fit (PMF), Grass has shown a strong technical foundation through stable operations, and its team has delivered satisfying results to the market through technology and community cooperation. In July, the Grass Foundation released the UpvoteWeb dataset on Hugging Face, which contains 600 million top Reddit posts and comments from 2024. It is the largest and most recent open source Reddit dataset to date.

Reddit data is valuable for AI models because it is effectively labeled by humans: the upvote mechanism ranks the quality of responses, and subreddits group discussions by topic, often with experts weighing in. Google reached a deal with Reddit, reportedly worth about $60 million a year, to use Reddit data for training its AI models.
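To make the idea of upvotes as labels concrete, here is a minimal sketch of how vote counts could be turned into "preferred versus rejected" response pairs. The record layout, the sample comments, and the `min_gap` threshold are all assumptions made for illustration; they are not the UpvoteWeb schema or anything Grass has published.

```python
from collections import defaultdict

# Hypothetical records: (post_id, comment_text, upvotes).
# The field layout is an assumption, not the UpvoteWeb schema.
comments = [
    ("p1", "Use a context manager to close the file.", 412),
    ("p1", "Just restart your computer.", 3),
    ("p2", "The borrow checker rejects this because the value moved.", 98),
    ("p2", "Works on my machine.", 12),
    ("p2", "Try turning it off and on.", 1),
]

def preference_pairs(rows, min_gap=10):
    """Pair the top-voted comment of each post with every lower-voted one.

    A pair is kept only when the vote gap is at least `min_gap`, so that
    near-ties are not treated as a quality signal.
    """
    by_post = defaultdict(list)
    for post_id, text, votes in rows:
        by_post[post_id].append((votes, text))

    pairs = []
    for post_id, items in by_post.items():
        items.sort(reverse=True)  # highest vote count first
        best_votes, best_text = items[0]
        for votes, text in items[1:]:
            if best_votes - votes >= min_gap:
                pairs.append((post_id, best_text, text))
    return pairs

pairs = preference_pairs(comments)
for post_id, preferred, rejected in pairs:
    print(post_id, "| preferred:", preferred, "| rejected:", rejected)
```

The point of the sketch is only that the community's votes already supply a ranking signal, so no separate annotation pass is needed to obtain it.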

Grass's long-term goal is not limited to historical data. The team intends to build a Live Context Retrieval (LCR) engine that uses all Grass nodes to crawl the Internet continuously, in parallel and around the clock, essentially turning Grass into a user-owned search engine comparable to Google. In theory, any application or large language model (LLM) that needs real-time data could use LCR.

To ensure that the data used for training models is valid, Grass has also introduced a ZK processor and a data ledger that works much like a timestamp. The ZK processor ensures that the AI model is trained correctly, and the metadata kept in the data ledger attests to the authenticity and source of the captured data.
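As a rough illustration of what a timestamp-like ledger entry might hold, the sketch below stores a SHA-256 hash of the scraped bytes alongside the URL, the node, and the fetch time. The field names and the `make_ledger_entry` and `verify_entry` helpers are hypothetical; the article does not describe Grass's actual record format, and this sketch involves no zero-knowledge proof.

```python
import hashlib
import json

def make_ledger_entry(url, content, node_id, fetched_at):
    """Build a metadata record that commits to the scraped bytes via SHA-256.

    All field names here are hypothetical, chosen for illustration only.
    """
    return {
        "url": url,
        "node_id": node_id,
        "fetched_at": fetched_at,  # Unix timestamp supplied by the caller
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }

def verify_entry(entry, content):
    """Return True only if `content` matches the hash stored in the entry."""
    return hashlib.sha256(content).hexdigest() == entry["content_sha256"]

page = b"<html><body>example page</body></html>"
entry = make_ledger_entry("https://example.com/", page, "node-001", 1724457600)
print(json.dumps(entry, indent=2))
print("untampered copy verifies:", verify_entry(entry, page))
print("modified copy verifies:", verify_entry(entry, page + b"!"))
```

Anyone who later receives the data can recompute the hash and compare it with the recorded one, which is the sense in which such metadata can attest to where a piece of data came from and that it has not been altered.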

Not resting on its laurels, Grass will continue to iterate and upgrade both the chain and nodes in the future to enhance data transmission and quality and improve network effects.

Eric Schmidt, who served as Google CEO for 10 years, said in his 2024 speech at the School of Computer Science at Stanford University that he once thought NVIDIA's CUDA was not a sophisticated programming language, but that CUDA is now NVIDIA's greatest moat and all large models must run on it. This has made NVIDIA the undisputed infrastructure and industry standard of the AI industry.

Grass, which has a large number of users, is working hard to become an AI data layer, which means that Grass can provide support for more AI application scenarios. From natural language processing to image recognition to complex machine learning tasks, Grass's data layer can meet a variety of different needs and eventually become an industry infrastructure like NVIDIA.

As an ordinary user, I was confused when I first encountered the AI data layer and did not see why it was necessary. Out of curiosity, I studied Grass's design carefully.

Because the Grass network needs to process and store massive amounts of data, especially real-time data, its processing requirements far exceed what traditional on-chain processing can handle. If all data were processed directly on the main chain, even a high-TPS network would face serious congestion and become inefficient.

On-chain operations are also usually expensive. Processing and compressing large amounts of data off-chain and then submitting only the processed results to the main chain greatly reduces the on-chain data burden and improves overall efficiency.
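The off-chain batching idea can be sketched with a Merkle root, a standard way to commit to many records with a single 32-byte value. This is a simpler stand-in chosen for illustration: it is not a zero-knowledge proof and not Grass's actual pipeline, and the record contents and the `merkle_root` helper are assumptions.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of byte strings into a single 32-byte Merkle root."""
    if not leaves:
        raise ValueError("cannot commit to an empty batch")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical batch of 1,000 scraped records, processed entirely off-chain.
batch = [f"record-{i}".encode() for i in range(1000)]
root = merkle_root(batch)

# Only the root would need to be submitted to the main chain.
print(len(batch), "records ->", len(root), "bytes:", root.hex())
```

However large the batch grows, the value posted on-chain stays 32 bytes, and changing any single record changes the root, which is why this kind of commitment keeps the chain's data burden small.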

In addition, the ZK processor gives sensitive data extra privacy protection. And because the origin of the data is recorded, Grass may also be able to reward high-quality nodes.

Having addressed scalability, cost, and privacy through the AI data layer, Grass also launched an application version of the node, which uses less than 5% of the resources of a Chromium browser yet handles 10 times the bandwidth of the Chrome extension.

In addition, Grass plans to launch mobile versions and physical mining devices, which means Android and iOS users will be able to earn rewards around the clock. Because phones are so convenient, this is likely to attract many Web2 users and greatly expand the Grass network. And because computers and phones have different IP addresses, existing users can earn additional income from their phones.

Strong backing, strong PMF, striking potential

Beyond a team that keeps delivering on technology and a community that keeps following it, Grass, which already has very strong PMF, also has a strong roster of investors behind it.

Grass's parent company, Wynd Network, previously received seed round financing from Polychain Capital and Tribe Capital. In addition, Multicoin managing partner Kyle Samani, who drew attention for his bet on Solana, participated in Wynd Network's pre-seed round.

It is worth noting that Hack VC also mentioned its investment in Grass in its article. It is not clear whether this means Grass has closed a new round of financing that has not yet been disclosed.

Some community members expect that after the Grass TGE, once people realize they can earn money passively through Grass without any risk, those who missed out will flock to it. In their view, latent demand combined with the launch of the mobile app could then increase the number of users dramatically. Based on growth rate, attractiveness, and network effects, they project that Grass may reach 50 million users within a year.
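To put the 50 million projection in perspective, the short calculation below works out the compound monthly growth it would require from today's base. It treats the 2.2 million active nodes as the starting user count and assumes steady month-over-month growth; both are simplifying assumptions, since the article gives no growth model.

```python
current_nodes = 2_200_000   # active nodes cited in the article
target_users = 50_000_000   # community projection cited in the article
months = 12

# Total multiple needed, and the constant monthly rate that compounds to it.
growth_multiple = target_users / current_nodes
monthly_rate = growth_multiple ** (1 / months) - 1

print(f"required multiple: {growth_multiple:.1f}x")
print(f"required compound monthly growth: {monthly_rate:.1%}")
```

Under these assumptions, the network would need to grow by roughly 30% every month for a year, which gives a sense of how ambitious the projection is.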

As the crypto industry grows disenchanted with new buzzwords, revenue has become the focus of everyone's attention. According to House of Chimera's figures on fees accumulated by DePIN projects over the past three months, io.net and Helium have each accumulated $500,000, and Akash has accumulated $200,000.

The long-standing problem of earning real revenue does not seem to pose a challenge for Grass. Take the UpvoteWeb Reddit dataset mentioned above as an example: Google has to pay $60 million to obtain a similar dataset.

Measured against the pricing of Bright Data, a leader in data crawling and proxy services on the Web2 side, the 600 million Reddit records obtained by Grass are worth a great deal, whether priced at the $0.001 per record of Data for AI or at the figure of $15,000 per 5 million requests obtained from Perplexity.

That is before even considering Reddit's policy of charging $0.24 per 1,000 API calls, in effect since July 2023. It should be noted that these figures reflect a Grass that has not yet launched its token, mobile versions, or dedicated mining devices. Once Grass builds a stronger network effect, all of these numbers will need to be updated.
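The price points quoted above can be turned into rough dollar figures for a 600 million record dataset. The sketch assumes one request or API call per record, which is a simplification (a single call can return more than one item), and the `implied_cost` helper and the labels are illustrative names, not anything from Bright Data or Reddit.

```python
records = 600_000_000  # posts and comments in UpvoteWeb, per the article

# Price points quoted in the article, as (price_usd, units_covered).
price_points = {
    "Data for AI, per record": (0.001, 1),
    "Per 5 million requests": (15_000, 5_000_000),
    "Reddit API, per 1,000 calls": (0.24, 1_000),
}

def implied_cost(n_records, price_usd, units_covered):
    """Cost of n_records at the given price, assuming one unit per record."""
    return n_records / units_covered * price_usd

costs = {name: implied_cost(records, *p) for name, p in price_points.items()}
for name, usd in costs.items():
    print(f"{name}: ${usd:,.0f}")
```

Under the one-unit-per-record assumption, the three price points work out to roughly $600,000, $1.8 million, and $144,000 respectively.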

For systems with network effects, timing matters a great deal. Grass has built a broad user base and accumulated technical depth in crypto and even in AI. Its flywheel is expected to keep turning until it becomes a true AI data layer.

In keeping with the vision behind the TOUCH GRASS challenge that Grass held for its community, Grass aims to become the data map of the AI era, pass the benefits now captured by centralized enterprises on to more users, and give Grass community members more time to Touch Grass.

Welcome to the official BlockBeats community:

Telegram subscription group: https://t.me/theblockbeats

Telegram group: https://t.me/BlockBeats_App

Official Twitter account: https://twitter.com/BlockBeatsAsia

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.
