Author: Haotian
The slogan "let users have sovereignty over their own data" once carried the grand vision of the entire web3 era, but it has never been truly put into practice, held back by the cost of on-chain data and the privacy problems of a transparent ledger. Recently, driven by the huge demand for data sources in the AGI large-model training market, @withvana, which is about to launch on Binance, has proposed a data ownership solution built on DLP data liquidity pools + TEE. What are the highlights?
1) Data sovereignty and personal data dividends are an old issue. In the web2 era, the volume of personal data exploded, but the result was platform monopolies and serious infringement of data privacy. In the early days of web3, many projects tried to realize this vision through smart contract management + decentralized storage + on-chain ownership confirmation, only to find that on-chain storage was expensive and that the transparent nature of on-chain data made privacy protection even harder.
These technical bottlenecks are why the exploration of "data ownership" through blockchain was shelved.
2) With the arrival of the AI era, application scenarios such as AGI large-model training, multi-modal training, inference, and fine-tuning, and especially machine learning and specialized model training in vertical fields, require large amounts of non-public, high-quality data. This makes the private data held by individuals and institutions a key resource for AI development, and AI training has become a huge "demand side" for that data.
This is the premise for Vana's attempt to solve data sovereignty for users in the AI era. In the web2 environment, individuals were generally not very sensitive to data ownership and privacy, but the situation is completely different in the AI era, where "data" is treated as an asset comparable to oil.
3) Vana, which is about to launch its mainnet, mainly addresses two major issues: "data double-spending" and "privacy protection". The first arises because data that is publicly disclosed on-chain can be copied and stored arbitrarily, so it loses its scarcity and its ability to capture value.
Vana establishes a data market through DLPs (Data Liquidity Pools), with a Proof of Contribution mechanism supporting the operation of the system.
Data owners can pledge the right to use their data to a specific data pool, such as a medical case pool or a financial transaction pool. After pledging, they receive DataDAO and data tokens as equity certificates. When parties that need data for AI training pay fees to a specific data pool, the fees are automatically distributed to certificate holders in proportion to their holdings. Data owners can also participate in the governance of the DataDAO, taking part in joint decisions on DLP operating rules, pricing strategies, and so on.
This data liquidity pool is similar to a common DeFi trading pool: smart contracts manage data validity verification, pool access permissions, token distribution, and so on. This is also the key to solving the "data double-spending" problem. Tokenizing the data confirms its ownership, and because smart contracts record and coordinate the entire process, data use is traceable and revenue distribution is automated.
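To make the revenue-sharing step concrete, here is a minimal Python sketch of pro-rata fee distribution inside a single pool. It is an illustration under stated assumptions, not Vana's contract code: the class `DataLiquidityPool`, its methods `pledge` and `pay_fee`, and the rule that data tokens are issued 1:1 with a contribution score are all hypothetical.

```python
from fractions import Fraction


class DataLiquidityPool:
    """Toy model of one data pool (hypothetical, not Vana's actual contract)."""

    def __init__(self, name):
        self.name = name
        self.tokens = {}  # owner -> data tokens held (the equity certificate)
        self.earned = {}  # owner -> fees accumulated so far

    def pledge(self, owner, contribution_score):
        # Assumption: tokens are minted 1:1 with an integer contribution score.
        if contribution_score <= 0:
            raise ValueError("contribution score must be positive")
        self.tokens[owner] = self.tokens.get(owner, 0) + contribution_score
        self.earned.setdefault(owner, Fraction(0))

    def pay_fee(self, amount):
        # Split the fee among holders in proportion to their token balance.
        total = sum(self.tokens.values())
        if total == 0:
            raise ValueError("pool has no outstanding data tokens")
        for owner, held in self.tokens.items():
            self.earned[owner] += Fraction(amount) * Fraction(held, total)


pool = DataLiquidityPool("medical-cases")
pool.pledge("alice", 60)
pool.pledge("bob", 40)
pool.pay_fee(1000)  # an AI training customer pays 1000 units into the pool
```

Exact fractions are used so that the shares always sum to the fee paid; an on-chain implementation would work in integer token units and would need an explicit rule for rounding remainders.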
Vana uses a TEE (Trusted Execution Environment) secure enclave to solve the data privacy issue. A TEE makes it possible to grant the "right of use" while keeping the data private, and it can provide "end-to-end" protection across the entire process: from data stored on the user's own server, to data accessed through the DLP, to data used in training.
For example, if a user authorizes part of their data to a DLP, that part of the data stays inside the private TEE environment, and customers who are granted the right to use it for training cannot copy or steal it.
The TEE records the whole process and handles the data in an isolated environment, ensuring privacy while the data is in use. This "usable but invisible" property of TEE directly addresses the privacy protection dilemma. Beyond these two major features, Vana gives data owners complete control over their data: users can withdraw or modify their data-use authorization at any time.
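The "usable but invisible" access pattern, together with revocable authorization, can be sketched as follows. This is only a simulation in plain Python and offers none of the hardware isolation of a real TEE; the class `EnclaveSim`, its method names, and the fixed list of approved jobs are hypothetical and are not taken from Vana's implementation.

```python
class EnclaveSim:
    """Toy stand-in for a TEE: data goes in, only approved job results come out."""

    # Assumption: consumers may only run jobs from a pre-approved list,
    # so they receive aggregate results and never the raw records.
    APPROVED_JOBS = {
        "count": len,
        "mean": lambda xs: sum(xs) / len(xs),
    }

    def __init__(self):
        self._records = {}   # owner -> private records
        self._grants = set() # (owner, consumer) pairs currently authorized
        self.audit_log = []  # record of grants, revocations, and job runs

    def deposit(self, owner, records):
        if not records:
            raise ValueError("records must not be empty")
        self._records[owner] = tuple(records)

    def grant(self, owner, consumer):
        self._grants.add((owner, consumer))
        self.audit_log.append(("grant", owner, consumer))

    def revoke(self, owner, consumer):
        # The owner can withdraw authorization at any time.
        self._grants.discard((owner, consumer))
        self.audit_log.append(("revoke", owner, consumer))

    def run_job(self, consumer, owner, job_name):
        if (owner, consumer) not in self._grants:
            raise PermissionError(f"{consumer} is not authorized by {owner}")
        result = self.APPROVED_JOBS[job_name](self._records[owner])
        self.audit_log.append(("run", owner, consumer))
        return result


enclave = EnclaveSim()
enclave.deposit("alice", [1.0, 2.0, 3.0])
enclave.grant("alice", "ai-lab")
average = enclave.run_job("ai-lab", "alice", "mean")  # aggregate only
enclave.revoke("alice", "ai-lab")  # later calls by "ai-lab" raise PermissionError
```

Restricting consumers to a fixed set of jobs is a design choice of this sketch: it models the idea that the party using the data sees outputs, not the data itself, while the audit log mirrors the full-process recording described above.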
In addition, Vana adopts a clearly layered technical architecture. The bottom layer lets users store data flexibly, through either lightweight self-custody or delegated custody. The middle layer uses the DLP as the protocol layer, where smart contracts handle fine-grained scheduling and management, including data circulation, permission control, revenue distribution, and other core functions. The top layer connects to various AI application scenarios, providing standardized interfaces for large-model training, data analysis, and other needs.
This layered design preserves data sovereignty while keeping the range of use cases scalable.
That's all.
Finally, I would like to add that Vana's solution for data ownership in the AI era is the "old story" of data ownership confirmation, newly catalyzed by the AI scenario, and it is an important part of the overall AI narrative trend.
Vana's moat is that once its full chain of data collection, data use, and rights distribution is connected, it may expand to a much wider range of scenarios and fields. Don't forget that the grand vision of data ownership may eventually permeate the whole of blockchain and web3.