Deep Dive into the Web3 Data Realm: The Landscape, Layers, and Future of User Data


Author: FC@ SevenX Ventures

Compiled by: MetaCat

Preface

One of the buzzwords in tech in 2022 is Web3. Across platforms and sectors — from finance to social media — it’s a conversation starter.

While everyone has a different definition of Web3, users and enthusiasts agree that it allows users to maintain ownership and sovereignty over their data. As our lives and work become more thoroughly digitized, meaning that all human activities will eventually be represented as data streams, the transfer of data rights will become increasingly important.

Therefore, we believe that the Web3 data sector will be crucial to the new order and show great potential for development. From an entrepreneur's perspective, the decentralized web is an open, permissionless, distributed database. When it comes to data, there are many scenarios that need to be served. If you choose one of them, you will most likely be able to develop and grow in the world of Web3.

In today’s article, I will discuss the structure of the Web3 data sector, typical players in the existing Web3 data space, and future development trends. I will also share some investment ideas of the SevenX team.

The core ideas of this article are:

1. Web3 breaks down data silos and returns data rights to individual users. Users own their own data, allowing them to carry and use the data on the Internet.

2. The structure of the Web3 data field can be divided into four levels: 1) data source, 2) data acquisition, 3) data query and indexing, and 4) data analysis and application. Decentralization, scalability, speed, and accuracy matter at every level, and we tend to judge the potential of a project by these main indicators.

3. As more and more data market participants gradually join and data itself accumulates, the value of data will increase significantly. Protecting user privacy while using data to generate greater value will continue to be an important priority.

4. One of the most important use cases in the Web3 data field in the future is to establish a decentralized reputation system. Based on this reputation system, it will be possible to unlock a variety of financial scenarios such as credit lending.

What is Web3 Data?

As human civilization progresses, more and more data is generated. This data either disappears in the long river of time or solidifies into the history we know. The advent of the Internet made the latter, recording data, easier, and made sharing possible efficiently and at scale. In the process, the value of data has been explored, and data has become essential to society as a whole. The cover story of the May 2017 issue of The Economist called data "the world's most valuable resource."

However, as more and more data is deposited on the Internet, a fundamental problem begins to emerge: the data generated by individuals creates value, but this data does not belong to individuals. Therefore, the value created is not distributed to individuals. People have been longing for a new order with greater autonomy. The Web3 data field is the answer.

So how does the Web3 data sector reshape the value of data? There are three main aspects:

Make data transparent and tamper-proof.

In the Web2 world, applications obtain user data by providing free services, and then monopolize this data to make profits and build their own businesses. The data is stored on their central servers and cannot be accessed by the outside world. There is no way to know what data is stored, and in what way and granularity. In addition, if these applications are attacked or actively terminate their services, the user's data will be lost overnight. However, with blockchain technology as the underlying Web3 framework, the on-chain data is open, transparent, and cannot be tampered with, achieving user independence and security.

Break down data silos and increase interoperability.

With Web2, users must complete a registration process every time they use a new application, because each application has its own independent database that is not connected to other applications. User data is fragmented and cannot be reused or integrated across platforms. In the world of Web3, users need only one address to access and use various decentralized applications, and the data from every on-chain transaction of that address is available to all of them without per-application permissions.

Better distribution of value through token economics.

How to distribute the value created by data to the individuals who generate it is an important question that Web3 needs to answer. At present, it seems that the ever-evolving token economy is the core path to achieve value redistribution.

The development of the crypto market has driven the development of the Web3 data field. On the supply side, the formation of the multi-chain universe, the booming development of NFTs, and the influx of new users have led to an exponential growth in user data; on the demand side, multi-dimensional demands have created countless opportunities around the acquisition and organization of data.

Web3 data structure diagram

The structure of the Web3 data sector can be divided into four levels: 1) data source, 2) data acquisition, 3) data query and indexing, and 4) data analysis and application.

First layer: data source

Data sources are divided into on-chain data and off-chain data. On-chain data includes chain-level data (such as hashes and timestamps), transfer transactions, wallet addresses, smart contract events, and pending data (such as transactions queued in the Ethereum mempool). This data is maintained in a decentralized database, and its reliability is guaranteed by blockchain consensus. Decentralized storage is another major source of data, currently concentrated in protocols such as IPFS, Arweave, and Storj.

Off-chain data mainly includes centralized exchange data, social media data, GitHub data and some typical Web2 data (such as PV, UV, DAU, MAU, downloads and search index).

Over the past two years, the variety and volume of data have grown exponentially, but this first layer still has three major problems:

1. Some public chains, such as Solana, use a light node model, which results in incomplete data on the chain.

2. Data congestion at the storage layer. My good friend REVA once uploaded her NFT work to IPFS, but when she wanted to retrieve it, downloading a file of several hundred megabytes took 2 hours. Some projects are already working on this problem, such as Meson Network, one of SevenX's portfolio companies. It is a decentralized CDN that aggregates unused bandwidth through mining and allocates it in an open market, accelerating file and streaming delivery for websites, video, live broadcasts, and blockchain storage solutions. Meson Network currently supports Arweave (AR), IPFS, and others.

3. The legitimacy of off-chain data cannot be verified. In addition, it is necessary to broaden the data dimension.

Second layer: data acquisition

The main participants in this layer are node service providers. If you choose to obtain on-chain data by building your own nodes, it will take a lot of time, money, and technical skills. In the process, you may also face problems such as memory leaks and insufficient disk space.

Node service providers have greatly optimized this process. They provide the infrastructure of the entire data field and are therefore the first and most important participants in the system.

Currently, well-known service providers include Infura, QuickNode, Alchemy, and Pocket. When choosing a service provider, developers and entrepreneurs mainly consider the number of chains covered, the business model, and the range of additional services (Is there a CDN-like service? Can mempool data be accessed? Are private nodes available?). They also consider whether the service is decentralized.
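Whichever provider is chosen, an application typically talks to the node through Ethereum's JSON-RPC interface. The following is a minimal sketch of building such a request and decoding the reply. The endpoint URL is a placeholder, and the sample reply is made up for illustration rather than fetched from a real node.

```python
import json

# Placeholder endpoint: real providers issue their own URLs and API keys.
RPC_URL = "https://node-provider.example/v1/YOUR_API_KEY"


def build_rpc_request(method, params=None, request_id=1):
    """Return a JSON-RPC 2.0 request body as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    })


def parse_quantity(response_text):
    """Decode a hex-encoded quantity (such as a block number) from a reply."""
    reply = json.loads(response_text)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "RPC error"))
    return int(reply["result"], 16)


# eth_blockNumber is a standard Ethereum JSON-RPC method.
request_body = build_rpc_request("eth_blockNumber")

# Illustrative reply only; a real one would come from POSTing request_body to RPC_URL.
sample_reply = '{"jsonrpc": "2.0", "id": 1, "result": "0xe4e1c0"}'
latest_block = parse_quantity(sample_reply)
```

Because every provider exposes the same interface, switching providers (or falling back to a second one during an outage) is mostly a matter of changing the URL.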

In November 2020, Infura was not running the latest version of the Geth client, and certain transactions triggered a bug in the older version; Infura suffered an outage that set off a chain reaction. For example, mainstream trading platforms could not process deposits or withdrawals of ERC-20 tokens, and MetaMask could not be used. A simple comparison of the four node service providers is as follows:

On February 8 this year, Alchemy completed a financing of US$200 million with a valuation of US$10.2 billion; Infura's parent company ConsenSys also completed a financing of US$200 million last year with a valuation of US$3.2 billion; as of March 2022, Pocket's market value reached US$3.28 billion.

Third layer: data query and indexing

Market participants provide data query and indexing services. They parse and format raw data to make it easier to use.

The Graph

The Graph is a decentralized on-chain data indexing protocol. The mainnet was launched in December 2020 and currently supports data indexing of more than 30 different networks including Ethereum, NEAR, Arbitrum, Optimism, Polygon, Avalanche, Celo, Fantom, Moonbeam, Arweave, etc.

It is similar to a traditional cloud-based API service; the main difference is that the on-chain data index is maintained by decentralized index nodes. With the GraphQL API, users can access information directly through subgraphs, which is fast and efficient. The Graph has designed a GRT token mechanism to encourage multiple parties to participate in its network, including Delegators, Indexers, Curators, and Developers. The business process can be summarized as follows: users submit queries, Indexers operate The Graph nodes, Delegators stake GRT with Indexers, and Curators use GRT to signal which subgraphs are worth querying.
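To make the query step concrete, here is a rough sketch of how a client might assemble the GraphQL request body for a subgraph. The subgraph URL is a placeholder, and the `transfers` entity and its fields are hypothetical, since each subgraph defines its own schema. The `first`, `orderBy`, and `orderDirection` arguments follow The Graph's query conventions.

```python
import json

# Placeholder: each subgraph is served from its own query URL.
SUBGRAPH_URL = "https://graph-gateway.example/subgraphs/name/EXAMPLE"


def build_subgraph_query(entity, fields, first=5, order_by=None):
    """Build the JSON body for a GraphQL query against a subgraph."""
    args = [f"first: {int(first)}"]
    if order_by:
        args.append(f"orderBy: {order_by}")
        args.append("orderDirection: desc")
    query = "{ %s(%s) { %s } }" % (entity, ", ".join(args), " ".join(fields))
    return json.dumps({"query": query})


# "transfers" and its fields are hypothetical; a real subgraph defines its own schema.
body = build_subgraph_query(
    "transfers", ["id", "from", "to", "value"], first=3, order_by="blockNumber"
)
```

The resulting body would be sent as an HTTP POST to the subgraph's query URL; the sketch stops short of the network call so that it runs offline.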

Covalent

Covalent provides a data query layer that allows users to quickly retrieve data through an API. Currently, it supports mainstream networks such as Ethereum, BNB Chain, Avalanche, Ronin, Fantom, Moonbeam, Klaytn, HECO, and Shiden.

Covalent supports not only general blockchain data queries, such as transactions, balances, and log events, but also queries for specific protocols. Its most distinctive feature is cross-chain querying: by changing the chain ID, users can get the same kind of results as a Graph subgraph without rebuilding the index. The project also has its own token, CQT, which holders can stake and use to vote on matters such as database updates.
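The idea of switching networks by changing only the chain ID can be sketched as follows. The base URL and path layout here are hypothetical and chosen purely for illustration; they are not taken from Covalent's documentation. The chain IDs themselves are the well-known EVM identifiers.

```python
from urllib.parse import quote

# Well-known EVM chain IDs.
CHAIN_IDS = {"ethereum": 1, "bnb-chain": 56, "fantom": 250, "avalanche": 43114}

# Hypothetical base URL and path layout, used only to illustrate the idea.
BASE_URL = "https://unified-api.example/v1"


def balances_url(chain, address):
    """Build a balances URL; only the chain ID segment differs per network."""
    chain_id = CHAIN_IDS[chain]
    return f"{BASE_URL}/{chain_id}/address/{quote(address)}/balances/"


address = "0x0000000000000000000000000000000000000000"
urls = [balances_url(chain, address) for chain in ("ethereum", "fantom")]
```

Since the rest of the request stays identical, the same client code and response parsing can be reused across every supported network.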

SubQuery

SubQuery provides data query services for Polkadot and Substrate projects, so developers can focus on their core use cases and frontends instead of spending time building custom data-processing backends. Inspired by The Graph, SubQuery also uses GraphQL, and its token economics are similar. There are three roles in the SubQuery system: 1) Consumers, 2) Indexers, and 3) Delegators. Consumers publish tasks, Indexers provide data, and Delegators stake their idle SQT tokens with Indexers, which gives Indexers an incentive to work honestly.

Blocknative

Blocknative focuses on retrieving real-time transaction data and provides an explorer for mempool data, including address tracking, internal transaction tracking, failed transaction information, and replacement transaction (speed-up or cancellation) information. Because mempool data can differ from the final block data, it must be captured in real time. Blocknative's live queries are more immediate and accurate.
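Replacement transactions are a good illustration of why mempool data differs from block data. On Ethereum, a pending transaction can be replaced by a new one from the same sender with the same nonce and a higher fee. The sketch below tracks pending transactions and classifies replacements. All class and field names are hypothetical, this is not Blocknative's API, and treating a zero-value self-transfer as a cancellation is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTx:
    tx_hash: str
    sender: str
    to: str
    nonce: int
    value: int      # in wei
    gas_price: int  # in wei


class MempoolTracker:
    """Toy in-memory view of pending transactions, keyed by (sender, nonce)."""

    def __init__(self):
        self.pending = {}

    def observe(self, tx):
        """Record a pending transaction and classify it.

        Returns "new", "speed-up", "cancel", or "ignored".
        """
        key = (tx.sender, tx.nonce)
        previous = self.pending.get(key)
        if previous is None:
            self.pending[key] = tx
            return "new"
        if tx.gas_price <= previous.gas_price:
            return "ignored"  # a replacement must outbid the transaction it replaces
        self.pending[key] = tx
        # Simplifying assumption: a zero-value transfer to oneself is a cancellation.
        if tx.to == tx.sender and tx.value == 0:
            return "cancel"
        return "speed-up"

    def confirm(self, sender, nonce):
        """Drop a transaction once a block includes that sender/nonce pair."""
        return self.pending.pop((sender, nonce), None)
```

Only the last surviving transaction for a given sender and nonce can end up in a block, which is why the earlier versions never appear in final block data.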

Koii

Koii is a decentralized ecosystem for creators, designed to help them own content and therefore gain content value. Anyone can use the Koii system to earn token rewards by deploying tasks, running nodes, or producing/registering content. The system will reward participants based on data processed with real traffic proof, realizing the cycle of the "attention economy". In addition, the Atomic NFT developed by the Koii team realizes the preservation and confirmation of NFTs and their meta information on the same chain. Therefore, all content on the Koii platform is generated according to the same standards. If this scalability can successfully encourage content accumulation, Koii will become an important content data indexing platform.

The projects listed below provide not only data query and indexing services but also analysis-layer products.

Dune Analytics

Dune Analytics is a comprehensive Web3 data platform that can query, analyze, and visualize massive amounts of on-chain data. It parses on-chain data stored in a key-value database and loads it into a PostgreSQL relational database. Users do not need to write scripts; simple SQL statements are enough to query the data.
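As a rough illustration of what "simple SQL statements" over parsed on-chain data can look like, the sketch below runs an aggregate query against an in-memory SQLite table. The `transfers` table and its rows are hypothetical and much simpler than any real platform's schema; SQLite stands in for the relational database only so that the example runs locally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; real platforms expose their own schemas.
conn.execute("CREATE TABLE transfers (day TEXT, token TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [
        ("2022-03-01", "DAI", 100.0),
        ("2022-03-01", "DAI", 50.0),
        ("2022-03-02", "DAI", 25.0),
        ("2022-03-02", "USDC", 75.0),
    ],
)

# Daily transfer count and volume per token, the kind of aggregate a dashboard charts.
daily_volume = conn.execute(
    """
    SELECT day, token, COUNT(*) AS transfer_count, SUM(amount) AS volume
    FROM transfers
    GROUP BY day, token
    ORDER BY day, token
    """
).fetchall()
```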

Dune Analytics encourages data sharing. By default, all queries and data sets are public. Users can directly copy other people's dashboards for reference. Currently, the best data analysts in the Web3 field gather here. Dune Analytics currently supports data queries for Ethereum, Polygon, Binance Smart Chain, Optimism, and Gnosis Chain. In February of this year, it completed a $69.42 million Series B financing with a valuation of $1 billion, officially entering the ranks of unicorns.

Flipside

Like Dune Analytics, Flipside also uses visualization tools and automatically generated APIs to allow users to query complex data through simple SQL statements. Users can also copy and edit SQL queries generated by others. Flipside actively cooperates with leading crypto projects to incentivize on-demand analysis through structured bounty programs and guidance, helping projects quickly gain the data insights they need to develop.

Currently, Flipside supports public chain networks such as Ethereum, Solana, Terra, and Algorand. On April 19, Flipside announced the completion of a $50 million financing.

DeBank

DeBank is a DeFi portfolio tracker. With DeBank, users can track and manage the DeFi applications they have interacted with in one place. They can also track address balances and changes, asset allocations, authorization status, rewards to be received, loan positions, and more. Currently, they support 1,147 protocols on 27 networks.

Last April, DeBank officially launched its own OpenAPI plan. The API includes access to all protocols on a specified chain, a list of all chains supported by a specified protocol along with their contract addresses, and a real-time portfolio for a specified protocol. With 28 APIs available, all institutional and individual developers can apply to become official partners and access DeBank's DeFi analysis data in real time. Currently, imToken, TokenPocket, MathWallet, Mask, Hashkey Me, OneKey, and Zerion are all using DeBank's API. DeBank has successfully expanded from data applications into data query and indexing.

CyberConnect

CyberConnect is a decentralized social graph protocol. It aims to create scalable and standardized social graph modules, allowing developers to transfer modules to new applications through simple code, saving time and money. The protocol also enables end users to treat their social data as portable assets that can be easily moved to new applications. In this way, CyberConnect eliminates barriers between Web2 universal platforms.

RSS3

RSS3 is a next-generation data indexing and distribution protocol derived from the RSS protocol. It allows users to generate RSS3 files based on their addresses and associate their Twitter, Mirror, Instagram, and other social accounts with these files. The files synchronize users' assets and content in real time, and the data is stored in the RSS3 decentralized network. With the user's permission, developers can access the content that users have posted on many platforms through various APIs, then filter and display it according to the needs of their applications.

Go+

Go+ (GoPlus) is committed to building a "security data layer" for Web3 based on its own "security engine". By entering a token contract address, users can access more than 30 security monitoring services covering contract security, transaction security, and information security on mainstream networks such as Ethereum, BNB Chain, Polygon, Avalanche, and Arbitrum.

In addition, developers and downstream applications can also use Go+'s security APIs to create a more secure crypto ecosystem. These security APIs include token detection, NFT detection, real-time risk warning, dApp contract security, interaction security, etc.

The emergence of Go+ reveals a trend in the Web3 data stack: the verticalization of data indexing. According to SevenX's research, the growing number of Web3 projects and the increasing complexity of user behavior are creating more data scenarios. These scenarios are characterized by non-universal data and by growing demand from users who are both data consumers and data providers. More and more data indexing, querying, and analysis services will emerge to serve these vertical scenarios.

Space and Time

Space and Time is the first decentralized data warehouse that uses a patented new cryptographic technology called Proof of SQL™. It produces verifiable tamper-proof results, allowing developers to join trustless on-chain and off-chain data in a simple SQL format and load the results directly into smart contracts. As a result, developers can use Space and Time to connect on-chain and off-chain data, transform data using SQL, issue queries to the API, and send trustless data to smart contracts.

Kwil

Kwil is building the first permissionless SQL database for the decentralized internet on top of the Arweave permaweb. Kwil Social and Kwil DB provide a novel architecture to manage social graphs as well as decentralized relational database systems. Web3 social, decentralized science, decentralized analytics, and permissionless data ecosystems can all benefit from their infrastructure.

Fourth layer: data analysis and application

This layer directly targets end users (in a broad sense, not just individuals) and delivers ready-to-use data products. Participants use their own data methodologies to present the value of data to users. They can be roughly divided by data type: on-chain transactions, token prices, DeFi protocols, DAOs, NFTs, security, social, and so on. More and more projects focus on a single type of data, aiming to become specialized data analysis platforms.

Blockchain Explorer

This may be the earliest data application layer product, allowing users to directly search for on-chain information through the website. The accessible data includes on-chain data, block data, transaction data, smart contract data, address data, etc.

Glassnode & Messari & CoinMetrics.io

Blockchain data and information providers enable investors to access on-chain data and transaction intelligence from different perspectives. They also create market analysis insights and research reports.

CoinGecko and CoinMarketCap

These are token analysis tools for observing and tracking token prices, trading volumes, market caps, and more.

Token Terminal

The project allows users to analyze DeFi projects using traditional financial indicators such as P/S ratio, P/E ratio, and protocol revenue. It also currently supports analysis of the NFT trading market.
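As a worked example of applying a traditional indicator to a protocol, the price-to-sales (P/S) ratio divides market capitalization by annualized revenue. The sketch below assumes revenue is annualized from the last 30 days; that convention and the figures in the usage note are illustrative assumptions, not Token Terminal's published methodology.

```python
def price_to_sales(market_cap_usd, revenue_30d_usd):
    """P/S ratio using revenue over the last 30 days, annualized."""
    if revenue_30d_usd <= 0:
        raise ValueError("revenue must be positive")
    annualized_revenue = revenue_30d_usd * 365 / 30
    return market_cap_usd / annualized_revenue
```

For instance, a hypothetical protocol with a $730 million market cap and $3 million of revenue in the last 30 days has annualized revenue of $36.5 million and a P/S ratio of 20.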

DeFiLlama

The DeFi TVL data analysis platform supports 107 first- and second-layer networks and nearly a thousand DeFi protocol TVLs. Networks and protocols can be viewed through the lens of different indicators and periods. Currently, DeFiLlama also supports the analysis of NFTs, focusing on the transaction volume and collection types of different trading markets on different chains.
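Total value locked (TVL) is, at its simplest, the sum of each token balance held by a protocol multiplied by that token's price. The sketch below computes and ranks TVL for two made-up protocols; the names, balances, and prices are hypothetical, and real platforms handle many details (double counting, staked derivatives, price sources) that are ignored here.

```python
def tvl_usd(balances, prices_usd):
    """Sum token balances multiplied by their USD prices."""
    return sum(amount * prices_usd[token] for token, amount in balances.items())


# Hypothetical protocols, balances, and prices.
protocols = {
    "protocol-a": {"ETH": 100, "DAI": 50_000},
    "protocol-b": {"ETH": 10},
}
prices = {"ETH": 2_000.0, "DAI": 1.0}

# Rank protocols by TVL, largest first.
ranking = sorted(
    ((name, tvl_usd(held, prices)) for name, held in protocols.items()),
    key=lambda item: item[1],
    reverse=True,
)
```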

NFTScan & NFTGo

Data platforms focusing on the NFT market, providing services such as data analysis and whale wallet monitoring. They aim to help users better assess the value of NFT projects and assets, enabling them to make informed investment decisions.

Nansen

"Label" is probably the word that best describes Nansen. Nansen analyzes more than 50 million Ethereum wallet addresses, combining on-chain data with a database containing millions of labels, so that users can more easily discover signals and new investment opportunities. Nansen is one of the best-known and most promising projects in the data analysis and application layer. In December last year, it completed a $75 million round of financing at a valuation of $750 million.
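The core idea of combining on-chain data with labels can be sketched as a simple join between transfers and an address-to-label mapping. The addresses, labels, and amounts below are hypothetical, and this is an illustration of the general technique rather than Nansen's actual pipeline.

```python
from collections import defaultdict

# Hypothetical labels; a real label database is far larger and curated.
LABELS = {"0xaaa": "exchange", "0xbbb": "fund"}


def inflow_by_label(transfers, labels):
    """Sum received amounts per label of the receiving address."""
    totals = defaultdict(float)
    for sender, receiver, amount in transfers:
        totals[labels.get(receiver, "unlabeled")] += amount
    return dict(totals)


# Each transfer is (sender, receiver, amount).
transfers = [
    ("0xccc", "0xaaa", 5.0),
    ("0xbbb", "0xaaa", 2.5),
    ("0xaaa", "0xddd", 1.0),
]
flows = inflow_by_label(transfers, LABELS)
```

Aggregates such as "inflows to exchanges" are the kind of signal that raw addresses alone cannot provide.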

Chainalysis

Chainalysis, known as the "FBI on the chain", was founded in 2014 and is an enterprise data solutions company that monitors and analyzes on-chain data to help customers - such as governments, cryptocurrency exchanges, international law enforcement agencies and banks - comply with regulations, assess risks and identify illegal activities. In June last year, Chainalysis announced that it had received $100 million in Series E financing, with a valuation of $4.2 billion.

Footprint

Footprint is a comprehensive data analysis platform for discovering and visualizing blockchain data. Compared with other applications, Footprint is intuitive and friendly to novice users. The platform provides rich data analysis templates and supports one-click forking, helping users to easily create and manage personalized dashboards. At the same time, Footprint marks wallet addresses and their activities on the chain. Users can access data indicators with rich dimensions and can use them to make investment decisions.

Zerion and Zapper

The earliest DeFi portfolio trackers and managers, these projects have also added support for NFT assets.

DeepDAO

DeepDAO is a comprehensive data platform focusing on various DAO organizations. Users can easily view treasury quantities and changes, treasury token distribution, governance token holdings, active members, proposals, voting status, etc. DeepDAO also provides many tools for creating and managing DAOs.

There are many other applications in this layer, which are not listed here one by one.

In fact, SevenX has been paying attention to the data field for a long time and has invested in DeBank, Zerion, Footprint, Koii, DeepDAO, RSS3, CyberConnect, Go+, and others. In the process of screening projects, we have gained some insights, which we would like to share briefly here.

In general, application layer traffic is no longer a core barrier. Due to improvements in ease of use and update speed, users can migrate quickly at any time. Products with data provision capabilities and closed-loop data channels are more powerful and competitive than ever before.

How do we evaluate a project?

Here are 5 key factors.

1. Scenario

(1) Is there a demand, and is the demand mature enough?

For new projects, it is essential to analyze the maturity of user needs. Take GoPlus as an example. In the DeFi world, "sense of security" is essential. After various security incidents, this concern has been activated and gradually matured. So now everyone would rather take an extra step or spend money in exchange for a safer experience. This is a mature demand that deserves a project to answer.

(2) Should we build the end-user side first or the protocol first?

We believe that when the needs of a scenario are not yet fully developed, a team should first design consumer-facing products and find consumers' pain points. For example, GoPlus first built the Go Pocket wallet, which works like a show home: because it exists, partners can see what problem the product solves and how well it solves it. This experience will also help when the product needs to be improved in the future.

2. Data Capabilities

Data acquisition and structuring are basic skills, but having data capabilities based on industry knowledge is key.

3. End-user product capabilities

The strength of a consumer-facing product is primarily determined by 1) whether it meets a pressing need of the audience and 2) whether the project is intuitive for the user.

4. Business development capabilities

Business expansion is a complex decision-making process. We will consider whether you can acquire benchmark users or long-tail users based on product positioning.

5. Team Background

A successful team should have the following qualities:
(1) A Web2 background in the relevant vertical
(2) Ability to independently operate projects
(3) Open source community experience
(4) The ability to learn quickly and without bias

Unlocking the Possibilities of Web3 Data

As on-chain analysis increases, the anonymity of blockchains is gradually disintegrating. For example, the transaction addresses and behaviors of large users can be tracked based on Nansen tags; the activities and organizations in which an address participates can also be identified through on-chain addresses. Nansen recently stated that it has tagged more than 100 million wallets, which emphasizes the need for user privacy.

Current privacy solutions mainly include privacy coins, privacy computing protocols, privacy transaction networks, privacy applications, etc.

If we want to protect our on-chain transactions or disclose our activity selectively, we can choose privacy computing protocols such as Oasis Network. Commonly used technologies include zero-knowledge proofs, secure multi-party computation, federated learning based on modern cryptography, and trusted execution environments (TEEs).

However, the availability of privacy protocols is still relatively limited, and most are in the development stage. The most popular example is Secret Network, a public chain that has launched applications such as the cross-chain bridge Secret Bridge, the privacy DeFi protocol Sienna Network, the privacy trading protocol Secret Swap, and Shinobi Protocol, a trustless privacy solution for Bitcoin.

Since the second half of 2021, many top VCs and developers have begun to flock to the privacy track. I believe that as this market gradually develops, users will be able to follow the basic principles of blockchain and use data to generate greater value. In this way, users can find a balance between anonymity and privacy.

Finally, let me briefly talk about our judgment on a market trend: building a decentralized reputation system through multi-dimensional data vectors. It is one of the most important use cases in the Web3 data field. Based on the reputation system, various financial scenarios such as credit become possible.

Lending has always been an important part of the DeFi ecosystem. The main products on the market are collateralized loans (usually overcollateralized) and flash loans. Credit loans that do not rely (or do not rely entirely) on collateral are the most important future direction in this field because they will create a freer market.
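To see why overcollateralized loans are capital-inefficient compared with credit loans, consider the arithmetic behind them. The sketch below assumes a minimum collateral ratio of 150%, which is an illustrative figure; real protocols set their own parameters per asset, and the function names are hypothetical.

```python
def collateral_ratio(collateral_value_usd, debt_value_usd):
    """Collateral value divided by outstanding debt."""
    if debt_value_usd <= 0:
        raise ValueError("debt must be positive")
    return collateral_value_usd / debt_value_usd


def max_borrow(collateral_value_usd, min_collateral_ratio):
    """Largest debt that keeps the position at or above the minimum ratio."""
    return collateral_value_usd / min_collateral_ratio


def is_liquidatable(collateral_value_usd, debt_value_usd, min_collateral_ratio):
    """A position can be liquidated once its ratio falls below the minimum."""
    return collateral_ratio(collateral_value_usd, debt_value_usd) < min_collateral_ratio
```

With a 150% minimum ratio, $15,000 of collateral supports at most $10,000 of debt. If the collateral's value then falls to $12,000, the ratio drops to 120% and the position becomes liquidatable. A credit loan would let the same borrower access funds without locking up more value than they receive.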

However, the biggest obstacle to introducing credit lending in DeFi is that lenders only face one address and cannot verify the borrower's credit record. One way to solve this problem is to introduce off-chain credit data on the chain. But how to ensure the authenticity of off-chain data during the on-chain process has not been answered.

Thanks to the gradual improvement of on-chain identity systems and the growing supply of data and analysis tools, what users create, contribute, earn, and own on-chain can gradually accumulate into their identity and reputation. Lenders can then use this record to assess the creditworthiness of an address. For example, Lens Protocol, backed by Aave, uses NFTs to manage data and lay the foundation for unsecured credit loans on-chain.

Conclusion

While tens of billions of dollars have been raised for unicorn projects, the Web3 data sector is still in its infancy. As you stand in the flood of on-chain applications, remember that every bit and every byte defines what kind of Web3 citizen you are. We need to find a new order and paradigm to thrive in this new world.

