TL;DR
Decentralized storage lets individual entities or groups contribute their idle storage space as units of a storage network, bypassing the absolute control over data held by centralized institutions such as AWS and Google Cloud.
Low storage costs, redundant data backups, and token economies are further characteristics of decentralized storage, and a large number of Web3 applications are built on this infrastructure.
As of June 2023, the total capacity of decentralized storage networks has exceeded 22,000 petabytes (PB), while network utilization is only about 20%, indicating considerable room for future growth.
Of the existing capacity, more than 80% is provided by Filecoin, which is undoubtedly the leader in this field. Filecoin has also launched initiatives such as Filecoin Plus and the FVM to incentivize developers and foster ecosystem growth.
With the rise of fields such as artificial intelligence and full-chain games, the decentralized computing and storage track is expected to usher in exciting growth opportunities.
1. Why do we need decentralized storage
Cloud storage services like Dropbox and Google Cloud have changed the way we store and share large files such as videos and photos online. They allow anyone to store terabytes of data at a fraction of the cost of buying a new hard drive and access files when needed from any device. However, there is a problem: users must rely on the management system of centralized entities, which may revoke their access to accounts at any time, share their files with government agencies, or even delete files without reason. This storage model leads to unclear ownership of data assets and effectively enables large Internet companies such as Amazon and Google to monopolize data. Furthermore, downtime of centralized services can often have disastrous consequences.
Storage is a field natively suited to decentralization. First, it addresses issues of user data privacy and ownership. Files stored on a decentralized file service are beyond the reach of any centralized authority, such as government agencies that may wish to control or censor content. It also prevents private companies from censoring services or sharing files with law enforcement.
Secondly, storing massive amounts of data inherently requires a distributed system, and existing centralized cloud services also use distributed solutions such as Spanner and TiDB. In short, distributed does not imply decentralized, but decentralized must be distributed. Unlike centralized storage architectures, existing decentralized schemes encrypt data, divide it into small blocks, and store the blocks on nodes around the world. This process creates multiple copies of the data and improves recoverability after data loss.

Third, it addresses the resource waste of unproductive mining. The heavy power consumption of Bitcoin's PoW mechanism has long been criticized, whereas decentralized storage gives users the opportunity to become nodes and mine profitably with idle storage resources. A large number of storage nodes also means lower costs, and it is foreseeable that decentralized storage cloud services could even capture part of the Web2 cloud service market. With network bandwidth and hardware continually upgrading, this is an enormous market: according to a Business Research forecast, the global database market will exceed $120 billion by 2028.

2. Decentralized storage architecture
To create truly decentralized applications, decentralized databases should also be part of the Web3 application architecture. That architecture can be broken down into four main components: a smart contract layer, file storage, a database layer, and a general infrastructure layer.
The smart contract layer is the Layer 1 equivalent, while the general infrastructure layer includes, but is not limited to, oracles, RPC, access control, identity, off-chain computation, and indexing networks.

Although not obvious to the user, both the file storage and database layers play a vital role in the development of Web3 applications. They provide the necessary infrastructure for storing structured and unstructured data, which is a requirement for various applications. Due to the nature of this report, these two components are described in further detail below.
2.1 Decentralized File Storage Networks (DFSNs)
DFSNs like Filecoin, Arweave, and Crust are primarily used for persistent storage of unstructured data that does not follow a predefined format and does not require frequent updates or retrievals. Therefore, DFSNs are commonly used to store various static types of data, such as text documents, images, audio files, and videos.
One advantage of storing this type of data in a distributed architecture is the ability to leverage edge storage devices or edge data centers, moving data storage closer to endpoints. This method offers lower network communication costs, lower interaction latency, and lower bandwidth overhead, along with greater adaptability and scalability. For example, 1 TB of storage on Storj costs $4.00 per month, while Amazon S3, the market-leading enterprise cloud storage solution, charges about $23.00 per month for the same amount of data.
Users benefit from more cost-effective storage options compared to traditional centralized cloud storage solutions. The decentralized nature of DFSNs also provides greater data security, privacy, and control, as data is distributed among multiple nodes or miners rather than being stored in a single centralized server.

2.2 Decentralized database
The limitations of storing unstructured files in DFSNs are obvious, especially in terms of efficient data retrieval and updating. These architectures are suboptimal for data that needs to be updated frequently. In this case, traditional databases such as MySQL and Redis are more developer-friendly options that have been extensively optimized and tested in the Web 2.0 Internet era.
Especially in applications such as blockchain games and social networks, storing structured data is unavoidable. Traditional databases provide an efficient way to manage large volumes of dynamic data and control access to it, offering features such as indexing, querying, and data manipulation that are critical for applications relying on structured data. Therefore, whether built on DFSNs or on self-developed underlying storage, a high-performance, high-availability decentralized database is a very important branch of the storage field.
3. Technical Analysis of DFSNs
3.1 General
Among current Web3 projects, decentralized file storage networks (DFSNs) can be roughly divided into two categories. The first includes projects built on IPFS, such as Filecoin and Crust. The second includes projects like Arweave, Sia, and Storj, which have their own underlying protocols or storage systems. Despite their different implementations, they all face the same challenge: efficient data storage and retrieval while ensuring truly decentralized storage.
Since blockchains are not inherently suitable for storing large amounts of data on-chain, the associated costs and impact on block space make this approach impractical. Therefore, an ideal decentralized storage network must be able to store, retrieve and maintain data, while ensuring that the work of all participants in the network is incentivized and respects the trust mechanism of the decentralized system.
We will evaluate the technical characteristics and advantages and disadvantages of several mainstream projects from the following aspects:

Data storage format: The storage protocol layer needs to determine how the data should be stored, such as whether the data should be encrypted, and whether the data should be stored as a whole or divided into small hashed chunks.
Data replication and backup: the network must decide where data is stored, e.g., how many nodes should hold it, whether all data is replicated to every node, or whether each node receives different fragments to further protect privacy. Storage format and dissemination together determine the probability that data remains available on the network, i.e., its durability over time in the event of device failures.
Long-term data availability: Networks need to ensure that data is available when and where it should be. This means designing incentives that prevent storage nodes from deleting old data over time.
Proof of stored data: Not only does the network need to know where data is stored, but storage nodes should be able to prove that they actually store the data they want to store in order to determine their share of incentives.
Storage price discovery: how users pay nodes for persistent storage of files.
3.2 Data storage and replication

As just mentioned, Filecoin and Crust use IPFS as the network protocol and communication layer for transferring files between peers and storing them on nodes. The difference is that Filecoin uses erasure coding (EC) for scalable data storage. Erasure coding is a data protection method that divides data into fragments, expands them, encodes redundant check blocks, and stores the pieces in different locations such as disks, storage nodes, or other geographic sites. EC constructs a mathematical function over a set of values so that they can be checked for accuracy and recovered if some are missing.

The basic equation is n = k + m, where n is the total number of blocks, k the number of original data blocks, and m the number of check (parity) blocks.
The m check blocks are computed from the k original data blocks. If the k + m blocks are stored on k + m separate hard disks, any m disk failures can be tolerated: the original data can be reconstructed from any k surviving blocks. Likewise, if the k + m blocks are scattered across different storage nodes, m node failures can be tolerated.
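As a minimal sketch of the n = k + m idea, the toy example below uses a single XOR parity block (m = 1), which tolerates the loss of any one block; real systems use codes such as Reed-Solomon to support arbitrary m, and the block contents here are invented for illustration.

```python
# Toy erasure coding with k data blocks and m = 1 XOR parity block.
# Losing any single block is recoverable from the k survivors.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data_blocks):
    """Given k data blocks, append one parity block (n = k + 1)."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(blocks, missing_index):
    """Rebuild the block at missing_index by XOR-ing the k surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors)

k_blocks = [b"ABCD", b"EFGH", b"IJKL"]   # k = 3 original blocks
n_blocks = encode(k_blocks)              # n = k + m = 4 blocks total

# Simulate losing block 1 and recovering it from the other k blocks.
lost = n_blocks[1]
assert recover(n_blocks, 1) == lost
```

With m = 1 the parity is just the XOR of the data blocks; larger m requires arithmetic over a finite field, but the storage/recovery trade-off is the same.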
When new data is to be stored on the Filecoin network, the user connects to a storage provider through the Filecoin storage market, negotiates storage terms, and places a storage order. The user must also decide which type of erasure coding to use and the replication factor. With erasure coding, data is broken into constant-sized pieces; each piece is expanded and encoded with redundant data, so only a subset of the pieces is needed to reconstruct the original file. The replication factor specifies how many times the data should be replicated across the storage miner's sectors. Once the storage miner and the user agree on terms, the data is transferred to the miner and stored in the miner's storage sectors.
Crust's data storage method is different: data is replicated to a fixed number of nodes. When a storage order is submitted, the data is encrypted and sent to at least 20 Crust IPFS nodes (the number is adjustable). At each node, the data is broken into many smaller pieces that are hashed into a Merkle tree, and each node keeps all the fragments that make up the complete file.
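The fragment-hashing step can be sketched as a simple Merkle tree construction; the fragment size and SHA-256 hash below are illustrative assumptions, not Crust's exact parameters.

```python
# Build a Merkle root over a file's fragments: leaf hashes are combined
# pairwise until a single root identifies the complete file.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(fragments):
    """Compute the Merkle root over a list of file fragments."""
    level = [sha256(f) for f in fragments]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last hash if odd
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

file_data = b"example file contents" * 100
# Split into 256-byte fragments (fragment size is an assumption).
fragments = [file_data[i:i + 256] for i in range(0, len(file_data), 256)]
root = merkle_root(fragments)
print(root.hex())   # the root commits to every fragment of the file
```

Because the root commits to every fragment, any single fragment can later be checked against it without rehashing the whole file.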
Arweave also uses full file copying , but Arweave takes a somewhat different approach. After a transaction is submitted to the Arweave network, the first individual node stores the data as a block on blockweave (Arweave's blockchain representation). From there, a very aggressive algorithm called Wildfire ensures that data is quickly replicated across the network, because in order for any node to mine the next block, they must prove they have access to the previous block.
Sia and Storj also use EC to store files. By contrast, Crust's implementation of 20 complete copies on 20 nodes is highly redundant, which makes the data very durable but is very inefficient from a bandwidth point of view. Erasure coding provides a more efficient route to redundancy, increasing durability without a large bandwidth cost. Sia and Storj propagate EC shards directly to however many nodes are needed to meet their durability targets.
3.3 Data Storage Proof and Incentive
The data storage format was explained first because the choice of technical path directly shapes each protocol's proof and incentive layers, i.e., how to verify that data assigned to a specific node is actually stored on that node. Only once verification occurs can the network apply other mechanisms to ensure data remains stored over time (i.e., that storage nodes do not delete data after the initial storage operation).
Such mechanisms include algorithms proving that data was stored for a specific period, financial incentives for completing the full duration of a storage request, and penalties for unfulfilled requests. This section describes the storage proof and incentive design of each protocol.

3.3.1 Filecoin
On Filecoin, before receiving any storage requests, storage miners must deposit collateral into the network as a commitment to provide storage to the network. After completion, miners can provide storage and price their services on the storage market. At the same time, Filecoin innovatively proposed PoRep and PoSt to verify the storage of miners.

Proof of Replication (PoRep): miners must prove that they store a unique copy of the data. The unique encoding ensures that two storage deals for the same data cannot reuse the same disk space.
Proof of Spacetime (PoSt): throughout the life cycle of a storage deal, storage miners must prove every 24 hours that they are continuously dedicating storage space to the data.
After submitting proofs, storage providers earn FIL rewards; if they break their commitment, their pledged tokens are slashed.
Over time, however, storage miners must repeatedly prove possession of the stored data by running the proof algorithm on a regular basis, and consistency checks like this normally require a lot of bandwidth. Filecoin's novelty is that, to prove data is stored over time while reducing bandwidth, miners use the output of the previous proof as the input of the current one, generating replica proofs in sequence. This runs for a number of iterations representing the duration for which the data is to be stored.
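The chaining idea can be sketched with plain hashes: each proof seeds the next, so a verifier can replay the whole sequence. This is only an illustration of sequential proving, not Filecoin's actual SNARK-based construction, and the challenge string is made up.

```python
# Toy sequential proofs: proof i = H(proof i-1 || replica), so the chain
# can only be produced by a party holding the replica at every step.
import hashlib

def prove(replica: bytes, prev_proof: bytes) -> bytes:
    """One proving iteration over the stored replica, seeded by the last proof."""
    return hashlib.sha256(prev_proof + replica).digest()

replica = b"sealed sector data"          # stands in for the stored replica
proofs = []
current = b"genesis-challenge"           # assumed initial challenge
for _ in range(10):                      # e.g. 10 proving periods
    current = prove(replica, current)
    proofs.append(current)

# A verifier holding the replica replays the chain and compares.
check = b"genesis-challenge"
for p in proofs:
    check = prove(replica, check)
    assert check == p
```

Because each proof depends on its predecessor, the miner cannot precompute the whole chain without keeping the replica for the full duration.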
3.3.2 Crust Network
Like Filecoin, Crust's relationship to IPFS is that of an incentive layer to a storage layer. In the Crust Network, nodes must also deposit collateral before accepting storage orders. The amount of storage a node provides determines the maximum collateral it can stake and allows it to participate in block production. This algorithm is called Guaranteed Proof of Stake (GPoS), and it ties a node's stake to the storage it provides to the network.

Unlike Filecoin, Crust's storage price discovery relies on the decentralized storage market (DSM): nodes and users connect to the DSM automatically, and it chooses which nodes store each user's data. Storage prices are determined by user requirements (such as storage duration, storage space, and replication factor) and network factors (such as congestion). When a user submits a storage order, the data is sent to multiple nodes, each of which uses its machine's Trusted Execution Environment (TEE) to split the data and hash the pieces. Since the TEE is a closed hardware component inaccessible even to the hardware owner, node owners cannot reconstruct the file themselves.
Once a file is stored on a node, a work report containing the file's hash is published to the Crust blockchain along with the node's remaining storage. From there, to ensure the data remains stored over time, the network periodically requests random data checks: inside the TEE, a random Merkle tree hash is retrieved along with the corresponding file fragment, which is decrypted and rehashed, and the new hash is compared with the expected one. This proof-of-storage implementation is called Meaningful Proof of Work (MPoW).
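An MPoW-style spot check can be sketched as follows: pick a random fragment, rehash it, and compare against the hash recorded at storage time. The fragmentation and hash choice are illustrative assumptions, not Crust's exact implementation.

```python
# Toy random data check: re-hash one randomly chosen fragment and compare
# it against the hash recorded when the file was first stored.
import hashlib
import random

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# At storage time, the hash of every fragment is recorded.
fragments = [b"frag-%d" % i for i in range(8)]
recorded_hashes = [sha256(f) for f in fragments]

def spot_check(stored_fragments, recorded_hashes) -> bool:
    """Verify one randomly chosen fragment against its recorded hash."""
    i = random.randrange(len(recorded_hashes))
    return sha256(stored_fragments[i]) == recorded_hashes[i]

assert spot_check(fragments, recorded_hashes)   # an honest node always passes
tampered = list(fragments)
tampered[3] = b"corrupted"
# A node that dropped or altered fragment 3 fails whenever the random
# challenge lands on it, so repeated checks catch cheating with high probability.
```

Running the check inside the TEE means the node cannot fake the rehash, since it never sees the challenge logic or the keys.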
GPoS is a PoS consensus algorithm whose staking quotas are defined by storage resources. Through the workload reports provided by the first-layer MPoW mechanism, the Crust chain learns every node's storage workload, and the second-layer GPoS algorithm computes a staking quota for each node based on that workload. PoS consensus then proceeds according to this quota: block rewards are proportional to each node's staked amount, and the staking cap for each node is limited by the storage capacity it provides.
3.3.3 Arweave
Arweave's pricing model differs sharply from the previous two. Its core idea is that all data stored on Arweave is permanent, and the storage price is based on the cost of keeping the data on the network for 200 years.
The bottom layer of the Arweave data network is based on blockweave's block generation model. A typical blockchain, such as Bitcoin, is a single-chain structure: each block links to the previous block. In blockweave's structure, each block additionally links to a random "recall block" from the chain's earlier history. The recall block is determined by the previous block's hash and height, making it deterministic but unpredictable. To mine or verify a new block, a miner must have access to the recall block's data.
Arweave's PoA adopts the RandomX hashing algorithm, and a miner's block probability = the probability of holding the randomly recalled block × the probability of being the first to find the hash. Miners find a suitable hash through the PoW mechanism to produce a new block, but the nonce depends on both the previous block and the recall block's data. The randomness of recall blocks encourages miners to store more blocks, raising their success rate and block rewards. PoA also incentivizes miners to store "scarce blocks", i.e., blocks few others store, for a greater probability of block production and greater rewards.
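The deterministic-but-unpredictable selection can be sketched as deriving an index from the previous block's hash and height; this illustrates the idea only and is not Arweave's exact derivation.

```python
# Toy recall-block selection: every node derives the same index from the
# previous block's hash and height, but the index cannot be known before
# that hash exists.
import hashlib

def recall_block_index(prev_block_hash: bytes, prev_height: int) -> int:
    """Pick a block index in [0, prev_height) from the previous block."""
    seed = hashlib.sha256(prev_block_hash + prev_height.to_bytes(8, "big")).digest()
    return int.from_bytes(seed, "big") % prev_height

h = hashlib.sha256(b"block 1000").digest()
idx = recall_block_index(h, 1000)
assert 0 <= idx < 1000
# Since any historical block may be recalled, miners maximize their
# chance of mining by storing as much of the chain's history as possible.
```

Deterministic selection lets every node verify the recall block, while unpredictability prevents miners from storing only the blocks they expect to need.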

Since a one-time charge means that subsequent data reads are free, and sustainability means users can access their data at any time, how are miners motivated to keep serving read requests that generate no direct income over the long run?

In BitTorrent's game-theoretic "optimistic tit-for-tat" strategy, nodes cooperate with other nodes by default and punish non-cooperative behavior. Building on this, Arweave designed Wildfire, a node-scoring system with implicit incentives. Each node in the Arweave network scores its neighboring nodes by the amount of data received and by response speed, and preferentially sends requests to higher-ranked peers. The higher a node's ranking, the higher its credit, the greater its probability of producing blocks, and the greater its chance of obtaining scarce blocks.
Wildfire is effectively a game, and a highly scalable one: there is no network-wide consensus on rankings and no obligation to report how they are generated or determined. Instead, an adaptive mechanism adjusts each node's rewards and punishments according to its ongoing behavior.
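Wildfire-style local scoring can be sketched as each node ranking its neighbors by bytes served and response latency, then routing requests to the best-ranked peer first. The scoring weights below are illustrative assumptions, not Arweave's actual formula.

```python
# Toy local peer scoring: more data served and lower latency => higher
# score => that peer is asked first. Each node keeps its own ranking;
# there is no global consensus on scores.
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    bytes_served: int = 0
    total_latency: float = 0.0
    requests: int = 0

    def score(self) -> float:
        if self.requests == 0:
            return 0.0
        avg_latency = self.total_latency / self.requests
        return self.bytes_served / (1.0 + avg_latency)

def record(peer: Peer, nbytes: int, latency: float) -> None:
    """Update a neighbor's stats after one data exchange."""
    peer.bytes_served += nbytes
    peer.total_latency += latency
    peer.requests += 1

peers = [Peer("a"), Peer("b")]
record(peers[0], 1_000_000, 0.2)   # fast, generous peer
record(peers[1], 1_000_000, 5.0)   # slow peer

best = max(peers, key=Peer.score)
assert best.name == "a"            # requests are routed to the better peer
```

A slow or stingy node is gradually starved of requests by every neighbor independently, which is the implicit punishment the text describes.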
3.3.4 Sia
As on Filecoin and Crust, storage nodes on Sia must deposit collateral in order to provide storage services. On Sia, nodes decide how much collateral to post: collateral directly affects the storage price users pay, but posting low collateral means nodes have little to lose if they disappear from the network. These forces push nodes toward a balanced collateral level.
Users connect to storage nodes through an automatic storage market, which functions similarly to Filecoin: nodes set storage prices, and users set expected prices based on target prices and expected storage duration. Users and nodes are then automatically connected to each other.

Among these projects, Sia's consensus protocol uses the simplest method: storing contracts on chain. After a user and a node agree on a storage contract, funds are locked in the contract; erasure coding splits the data into pieces, each piece is individually hashed with a different encryption key, and each piece is then copied to several different nodes. Storage contracts recorded on the Sia blockchain hold the terms of the agreement along with a Merkle tree hash of the data. To ensure data is stored for the agreed duration, proofs of storage are periodically submitted to the network. Each proof is created from a randomly selected portion of the original file together with a list of hashes from the file's Merkle tree recorded on the blockchain. Nodes are rewarded for every proof of storage they submit during the contract and receive a final payout when the contract completes.
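A storage proof of this kind can be sketched as a Merkle audit: the prover returns the challenged segment plus its authentication path, and the verifier checks it against the root recorded in the on-chain contract. The hash function and segment contents below are assumptions for illustration.

```python
# Toy Merkle audit: prove possession of one file segment against a root
# committed on chain, without transferring the whole file.
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all tree levels, from leaf hashes up to the root."""
    levels = [[H(l) for l in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]          # duplicate last hash if odd
        levels.append([H(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def audit_path(levels, index):
    """Collect the sibling hashes needed to verify one leaf."""
    path = []
    for lvl in levels[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        path.append((lvl[index ^ 1], index % 2))  # (sibling, is_right_child)
        index //= 2
    return path

def verify(leaf, path, root):
    h = H(leaf)
    for sibling, is_right in path:
        h = H(sibling + h) if is_right else H(h + sibling)
    return h == root

leaves = [b"segment-%d" % i for i in range(4)]
levels = build_tree(leaves)
root = levels[-1][0]                # recorded in the on-chain contract
proof = audit_path(levels, 2)       # network challenges segment 2
assert verify(leaves[2], proof, root)
assert not verify(b"wrong data", proof, root)
```

The proof size grows only logarithmically with the file, which is why the chain needs to store just the root hash, not the data.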
On Sia, storage contracts can last up to 90 days. To store files longer, users must manually connect to the network via the Sia client software and extend the contract for another 90 days. Skynet, another layer on top of Sia (similar to Filecoin's Web3.Storage or NFT.Storage platforms), automates this by having its own client software instance renew contracts on users' behalf. While this is a workaround, it is not a Sia protocol-level solution.
3.3.5 Storj
In the Storj decentralized storage network, there is no blockchain or blockchain-like structure . Not having a blockchain also means that the network has no network-wide consensus on its state. Instead, tracking data storage location is handled by satellite nodes and data storage is handled by storage nodes. Satellite nodes can decide which storage nodes to use to store data, and storage nodes can decide which satellite nodes to accept storage requests from.
In addition to handling data location tracking across storage nodes, Satellites are also responsible for billing and payment of storage nodes' storage and bandwidth usage. In this arrangement, storage nodes set their own prices, and satellites connect them to each other as long as users are willing to pay those prices.

When a user wants to store data on Storj, they select a satellite node, connect to it, and share their specific storage requirements. The satellite picks storage nodes that meet those needs and connects them with the user. The user then transfers the file directly to the storage nodes while paying the satellite, and the satellite pays the storage nodes monthly for files stored and bandwidth used.
Such a technical solution is in fact quite centralized: the deployment of satellite nodes is entirely defined by the project team, which also gives the team pricing power. Although this centralized architecture brings Storj efficient performance, as noted at the beginning, distributed storage does not necessarily mean decentralization. The ERC-20 token $STORJ that Storj issued on Ethereum does not use any smart contract functionality; it essentially just provides an additional payment option.
This has much to do with Storj's business model. The project focuses on enterprise-grade storage, benchmarks directly against Amazon's S3, and has established a partnership with Microsoft Azure, aiming to offer enterprises services comparable to or even better than Amazon's storage. Even without public performance data, its storage costs are far lower than Amazon's, which to some extent demonstrates that the business model of decentralized storage is viable.
4. The impact of different technology paths
4.1 Economic Model
The choice of technical path also affects the design of the token model to a certain extent. Each of the four major decentralized storage networks has its own economic model.

Filecoin, Crust, and Sia all use the Stake for Access (SFA) token model. In this model, storage providers must lock assets native to the network in order to accept storage transactions. The amount locked is proportional to the amount of data the storage provider can store. This creates a situation where storage providers must increase their collateral as they store more data, increasing the demand for assets native to the network. In theory, the price of an asset should increase with the amount of data stored on the network.
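The SFA mechanic can be illustrated with a toy stake-to-capacity ratio; `STAKE_PER_TB` is an invented parameter for illustration, not any network's actual value.

```python
# Toy Stake-for-Access model: the storage a provider may sell is capped
# by the native tokens it locks, so more stored data requires more stake.
STAKE_PER_TB = 100.0   # tokens locked per TB offered (assumed, illustrative)

def max_storage_tb(locked_tokens: float) -> float:
    """Storage capacity a provider may offer given its locked stake."""
    return locked_tokens / STAKE_PER_TB

def required_stake(tb_offered: float) -> float:
    """Tokens a provider must lock to offer a given capacity."""
    return tb_offered * STAKE_PER_TB

assert max_storage_tb(1_000) == 10.0
assert required_stake(25) == 2_500.0
# As providers store more data they must lock more tokens, which is the
# mechanism by which storage demand translates into token demand.
```

This is the linkage the text describes: in theory, token price should rise with the amount of data stored on the network.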
Arweave uses a unique endowment token model in which a significant portion of each transaction's one-time storage fee is added to an endowment pool. Over time, tokens in the pool accumulate interest in the form of stored purchasing power, and payouts are distributed to miners to ensure data persistence on the network. This model effectively locks tokens up for the long term: as storage demand on Arweave grows, more tokens are removed from circulation.
Compared with the other three networks, Storj's token model is the simplest. Its token, $STORJ, serves as the means of payment for storage services between end users and storage providers on the network. The price of $STORJ is therefore a direct function of demand for Storj's services.
4.2 Target users
It's hard to say that one storage network is objectively better than another. When designing a decentralized storage network, there is no single best solution. Depending on the purpose of the network and the problems it is trying to solve, trade-offs must be made between technical design, token economics, community building, and more.

Filecoin is mainly for enterprises and application development, providing cold storage solutions. Its competitive pricing and accessibility make it an attractive alternative for Web2 entities seeking cost-effective storage for large amounts of archived data.
Crust ensures excess redundancy and fast retrieval, making it suitable for high-traffic dApps and efficient retrieval of popular NFT data. However, its lack of persistent redundancy severely impacts its ability to provide persistent storage.
Arweave stands out from other decentralized storage networks with its concept of permanent storage, which is especially popular for storing Web3 data such as blockchain state data and NFTs. Other networks are primarily optimized for hot or cold storage.
Sia targets the hot storage market, primarily focusing on developers looking for a fully decentralized and private storage solution with fast retrieval times. While it currently lacks native AWS S3 compatibility, access layers like Filebase provide such a service.
Storj seems more comprehensive but sacrifices some decentralization. It significantly lowers the barrier to entry for AWS users, catering to a key target audience with its enterprise hot-storage optimization, and provides cloud storage compatible with Amazon S3.
5. Ecological construction of decentralized storage
In terms of ecosystem construction, there are two main types to discuss. In the first, upper-layer dApps are built entirely on the storage network, aiming to enhance the network's functionality and ecosystem; in the second, existing decentralized applications and protocols, such as OpenSea and AAVE, integrate with specific storage networks to become more decentralized. In this section we focus on Filecoin, Arweave, and Crust, since Sia and Storj are not prominent in ecosystem terms.
5.1 Filecoin Ecology

Filecoin's published ecosystem map already lists 115 projects in the first category above, all built on Filecoin's underlying infrastructure. Most of these projects focus on general storage, NFTs, and consumer storage. Another important milestone in the Filecoin ecosystem is the Filecoin Virtual Machine (FVM), which, like the Ethereum Virtual Machine (EVM), provides the environment needed to deploy and execute smart contract code.

With the FVM, the Filecoin network gains the ability to execute smart contracts on top of the existing storage network. Developers do not directly manipulate users' stored data; instead, smart contracts define, in a trustless manner, how data is automatically or conditionally acted upon once stored in the network. Imaginable scenarios include:
Distributed computing based on data stored on Filecoin (compute where the data is stored without moving it first)
Crowdfunded dataset preservation initiatives - eg, anyone can fund the storage of some socially important data, such as crime data or data related to environmental warming
Smart storage markets - such as dynamically adjusting storage rates based on time of day, replication tier, or availability within a region
Hundreds of years of storage and sustainable hosting - such as storing data so that it can be used by generations
Data DAOs or tokenized datasets - such as modeling the value of data as a token and forming a DAO to coordinate and trade computations performed on top of it.
Locally stored NFTs - such as co-locating NFT content with registry records that track NFTs
Time-locked data retrieval - such as unlocking relevant data sets only after the company's records are made public
Collateralized lending - such as granting special-purpose loans to storage providers, for example to accept FIL+ deal proposals from specific users, or to add capacity within a specific time window

At its core, the FVM virtual machine is based on WebAssembly (WASM). This choice lets developers write native upper-layer applications in any programming language that compiles to WASM, making it easier for Web3 developers to use what they already know and bypass the learning curve of chain-specific languages.
Developers can also port existing Ethereum smart contracts with little (or no) changes to the source code. The ability to reuse audited and battle-tested smart contracts in the Ethereum network allows developers to save development costs and time, while users enjoy its utility with minimal risk.

Also worth mentioning is Filecoin Plus, a program designed to subsidize users for storing large, valuable datasets at a discount. Customers who want to upload data to the network can apply to a select group of members in the community called notaries, who review and allocate resources called DataCaps (data quotas) to customers. Customers can then use DataCap to subsidize their deals with storage providers.
The Filecoin Plus program has brought many benefits: it makes the Filecoin network more active, and the storage of valuable data continues to generate block demand; customers receive better service at very competitive prices; and block rewards increase accordingly. After the launch of Filecoin Plus, data stored in 2022 grew 18-fold compared with 2021.
5.2 Crust Network Ecology
Compared with Filecoin and Arweave, Crust has taken a different path in ecosystem construction: it prefers to work directly with existing Web3 applications and provide services, rather than incentivizing third-party developers to build their own ecosystem applications on Crust. The main reason is that Crust is built on Polkadot. Although the Ethereum and Cosmos ecosystems were considered by the Crust team early on, they were not compatible enough with its technical path; Crust prefers Polkadot's Substrate framework for its highly customizable development space, on-chain upgrades, and on-chain governance.

Crust excels when it comes to developer support. It offers the Crust development kit, which includes a JS SDK, GitHub Actions, shell scripts, and IPFS Scan, to meet the integration preferences of different Web3 projects. The SDK has already been integrated into various Web3 projects, including Uniswap, AAVE, Polkadot Apps, Liquity, XX Messenger, and RMRK.
According to the data provided on the official website, there are currently more than 150 projects integrated with Crust Network. A large portion (over 34%) of these applications are DeFi projects. This is because DeFi projects usually have high performance requirements for data retrieval.
As mentioned earlier, data on the Crust Network is replicated to at least 20 nodes, and in many cases to over 100. While this requires greater initial bandwidth, the ability to retrieve data from multiple nodes simultaneously speeds up file retrieval and provides strong redundancy in the event of a failure or a node leaving the network. Crust Network relies on this high level of redundancy because, unlike other chains, it has no data replenishment or repair mechanism. It is also the youngest of these decentralized storage networks.
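The retrieval benefit of heavy replication can be shown with a toy simulation: each chunk is fetched from the fastest replica still online, and the file survives nodes dropping out. This is an illustrative sketch only; the node names, latencies, and a replication factor of 5 (rather than Crust's 20+) are made up to keep the demo small.

```python
# Toy illustration of why high replication aids retrieval: each chunk is
# fetched from the lowest-latency live replica, and the file survives node loss.
import random

REPLICATION = 5  # Crust replicates to at least 20 nodes; 5 keeps the demo small

def store(chunks, nodes):
    """Replicate every chunk to a random subset of nodes."""
    placement = {}
    for i, chunk in enumerate(chunks):
        placement[i] = {n: chunk for n in random.sample(nodes, REPLICATION)}
    return placement

def retrieve(placement, latency, dead=()):
    """Fetch each chunk from the lowest-latency replica still online."""
    out = []
    for i in sorted(placement):
        live = {n: c for n, c in placement[i].items() if n not in dead}
        if not live:
            raise RuntimeError(f"chunk {i} lost")  # every replica is offline
        best = min(live, key=lambda n: latency[n])
        out.append(live[best])
    return b"".join(out)

random.seed(0)
nodes = [f"node{k}" for k in range(8)]
latency = {n: random.uniform(5, 50) for n in nodes}  # hypothetical ms values
data = [b"alpha-", b"beta-", b"gamma"]
placement = store(data, nodes)

# The file reassembles correctly even with two nodes offline, because
# 5 replicas across 8 nodes leaves every chunk at least 3 live copies.
print(retrieve(placement, latency, dead={"node0", "node1"}))
```

Without a repair mechanism, this replica count is the only thing standing between the data and permanent loss, which is why Crust sets the floor so high.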
5.3 Arweave Ecology

Arweave also has a strong ecosystem, as shown above: it highlights about 30 applications built entirely on Arweave. Although this is fewer than Filecoin's 115 applications, they still meet users' basic needs and cover a wide range of fields, including infrastructure, exchanges, social, and NFTs.
Of particular note are the decentralized databases built on Arweave. Arweave uses the chain itself only for data storage, while computation is performed off-chain on the client side. The cost of using Arweave is therefore determined solely by the amount of data stored on-chain.
This separation of computation from the chain, known as the Storage-based Consensus Paradigm (SCP), addresses blockchain's scalability challenges. SCP is feasible on Arweave because all data inputs are stored on-chain: any off-chain computation that replays the same inputs will deterministically produce the same state.
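The paradigm above can be sketched in a few lines: the "chain" stores only an ordered log of inputs, and any client folds that log through a deterministic state-transition function to derive the state. The token ledger and genesis balances below are hypothetical, chosen only to make the replay concrete.

```python
# Minimal sketch of the Storage-based Consensus Paradigm (SCP): the chain
# stores only the inputs; clients replay them off-chain and all derive the
# same state. The "chain" here is just an append-only list.

def apply(state: dict, tx: dict) -> dict:
    """A deterministic state-transition function (a toy token ledger)."""
    new = dict(state)
    new[tx["to"]] = new.get(tx["to"], 0) + tx["amount"]
    new[tx["from"]] = new.get(tx["from"], 0) - tx["amount"]
    return new

def replay(chain: list) -> dict:
    """Off-chain computation: fold every stored input into a state."""
    state = {"alice": 100, "bob": 0}  # hypothetical genesis state
    for tx in chain:
        state = apply(state, tx)
    return state

# The only thing "on-chain" is this ordered log of inputs.
chain = [
    {"from": "alice", "to": "bob", "amount": 30},
    {"from": "bob", "to": "alice", "amount": 10},
]

# Two independent clients replaying the same log always agree.
assert replay(chain) == {"alice": 80, "bob": 20}
print(replay(chain))
```

Because the transition function is pure and the input log is immutable and ordered, consensus on the data is sufficient for consensus on the state, which is exactly why Arweave's chain never needs to execute the computation itself.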
The successful implementation of SCP has opened the door for the development of numerous databases on Arweave. Four notable examples:

● WeaveDB: A key-value database built as a smart contract on Arweave, which uses whitelisted addresses for its access-control logic.
● HollowDB: A key-value database built as a smart contract on Arweave; it uses whitelisted addresses for access control and ZK proofs to ensure data verifiability.
● Kwil: A SQL database that runs its own network of P2P nodes but uses Arweave as its storage layer. It uses public/private key pairs for access control and its own consensus mechanism for data verification.
● Glacier: A NoSQL database with a ZK-Rollup architecture that uses Arweave as its data availability layer. It uses public/private key pairs for access control and ZK proofs for data verifiability.
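The access-control pattern shared by WeaveDB and HollowDB, writes gated by a whitelist of addresses, can be sketched as follows. This is plain Python for illustration; the real systems implement this logic inside SmartWeave contracts, and the address strings here are made up.

```python
# Toy sketch of whitelist-gated access control, the pattern WeaveDB and
# HollowDB use in their smart contracts: anyone may read, but only
# whitelisted addresses may write.

class WhitelistKV:
    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self.store = {}

    def put(self, caller: str, key: str, value: str) -> bool:
        # Reject mutations from any address not on the whitelist.
        if caller not in self.whitelist:
            return False
        self.store[key] = value
        return True

    def get(self, key: str):
        # Reads are open to everyone.
        return self.store.get(key)

db = WhitelistKV(whitelist={"0xA1ice"})       # hypothetical addresses
assert db.put("0xA1ice", "profile", "hello")  # authorized write succeeds
assert not db.put("0xB0b", "profile", "spam") # unauthorized write rejected
print(db.get("profile"))  # hello
```

Kwil and Glacier replace the whitelist check with signature verification against public/private key pairs, but the gate sits in the same place: on the write path, before state is mutated.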
6. Growth drivers
The growth of decentralized storage depends on several core factors, which fall into three main categories: overall market outlook, technology, and public awareness. These factors are interrelated and complementary, and can be further divided into finer subcategories. The following paragraphs break down each factor in more detail.
6.1 Market prospect
6.1.1 Potential of cloud storage market
As the internet permeates contemporary life, cloud storage services have become essential to almost everyone. The global cloud storage market reached a staggering $78.6 billion in 2022, and the growth trajectory shows no signs of abating. According to a market study, the industry could be worth $183.75 billion by 2027.
Meanwhile, IDC expects the cloud storage market to be valued at $376 billion by 2029. The growing demand for data storage is further illustrated by IDC's forecast that the global datasphere will expand to 175 zettabytes by 2025. Given these promising prospects, decentralized storage, as an alternative to its Web2 counterparts, stands to benefit from overall market growth, propelling it on an upward trajectory.

6.1.2 Driving forces of digital assets
As one of the key infrastructures of Web3, the growth of decentralized storage is intrinsically linked to the expansion of the entire cryptocurrency market. Even without accounting for the surge in storage demand, the market size for decentralized storage is likely to grow steadily if adoption of digital assets continues to rise. True decentralization cannot be achieved without a decentralized infrastructure. Increased adoption of cryptocurrencies may signal a greater public understanding of the importance of decentralization, driving the use of decentralized storage.
6.2 Technology Driving Force
6.2.1 Cloud Computing Products and Computing Resources
The value of data is often realized through the analytical insight it provides, which requires computation over that data. However, in the existing decentralized storage market, the apparent lack of mature computation products is a major obstacle to large-scale data applications. Projects such as Bacalhau and Shale are addressing this challenge, focusing their efforts on Filecoin. Other notable projects include Fluence and Space and Time, which are developing AI query systems and computing marketplaces, respectively. As computation products flourish, so will the demand for computing resources. This demand can be glimpsed in the price trajectory of $RNDR, the token of Render Network, a peer-to-peer GPU computing marketplace for users who need additional computing power. Its year-to-date performance is up a staggering 500%, reflecting investor expectations of rising demand. As these products mature and the ecosystem becomes more comprehensive, the adoption of decentralized storage will increase substantially as users flow in.

6.2.2 Decentralized Physical Infrastructure Network (DePIN)

Decentralized Physical Infrastructure Network (DePIN) is a blockchain-based network that integrates real-world digital infrastructure into the Web3 ecosystem. Key areas of DePIN include storage, computing, content delivery network (CDN) and virtual private network (VPN). These transformative networks seek to increase efficiency and scalability through the adoption of cryptoeconomic incentives and blockchain technology.

The strength of DePIN lies in its potential to generate a virtuous circle with three components. First, the protocol employs a token-economic design to incentivize participants, typically through tokens that reward actual application and network usage. As the economic model consolidates, surges in token price and protocol usage attract attention, prompting an influx of users and capital. This growing capital pool and expanding user base attract more ecosystem builders and developers, perpetuating the cycle. As a core DePIN vertical, storage will be one of the main beneficiaries of DePIN's expansion.
6.2.3 Artificial Intelligence (AI)
The rapid development of artificial intelligence is expected to catalyze the growth of the crypto ecosystem and accelerate the development of various fields of digital assets. AI brings incentives to decentralized storage in two main ways: by stimulating storage demand and by increasing the importance of decentralized physical infrastructure networks (DePINs).
As the number of generative AI products grows exponentially, so does the data they generate. This proliferation of data drives demand for storage solutions, fueling the growth of the decentralized storage market.
Generative AI has already seen significant growth and is expected to sustain this momentum over the long term. According to EnterpriseAppsToday, generative AI will account for 10% of all data generated globally by 2025. Furthermore, the generative AI market is expected to grow at a compound annual growth rate (CAGR) of 36.10% to reach $188.62 billion by 2032, underscoring its huge potential.

The popularity of generative AI has grown significantly over the past year, as evidenced by Google Trends and YouTube searches. This growth further highlights the positive impact of artificial intelligence on the demand for decentralized storage solutions.
The explosion of storage and computing resources required by AI underscores the value of DePIN. With the Web 2.0 infrastructure market monopolized by centralized entities, DePIN becomes an attractive alternative for users seeking cost-effective infrastructure and services. By democratizing access to resources, DePIN offers significantly lower costs, thereby increasing adoption. As artificial intelligence continues to advance, its demand will further stimulate the growth of DePIN, which in turn aids the expansion of the decentralized storage industry.
6.2.4 Filecoin Virtual Machine (FVM)
The Filecoin Virtual Machine (FVM) not only unlocks the potential of Filecoin itself, but also revolutionizes the entire decentralized storage market. Since Filecoin is the largest decentralized storage provider with a large share of the market, its growth has largely paralleled the expansion of the industry as a whole. The emergence of FVM transformed Filecoin from a data storage network to a comprehensive decentralized data economy. In addition to enabling permanent storage, FVM also integrates DeFi into the ecosystem, thereby generating more revenue opportunities and attracting a larger user base and capital flow into the industry.

As of June 22, 100 days after FVM went live, more than 1,100 unique smart contracts powering dApps had been deployed on the Filecoin network. Additionally, over 80,000 wallets have been created to interact with these FVM-powered dApps, and the total balance of FVM accounts and contracts has exceeded 2.8 million FIL. Currently, the protocols within the FVM ecosystem are all DeFi-related, enhancing the utility of $FIL. As this upward trend continues, we expect a large number of applications to emerge, which could trigger another wave of growth in the storage market. We also expect other storage networks to introduce virtual machine mechanisms similar to FVM, triggering further ecosystem growth. For example, Crust Network officially launched its EVM storage protocol on July 17, combining the Crust mainnet, Polkadot, and EVM contracts to seamlessly provide storage services for any EVM-compatible public chain.
6.2.5 Social and games based on decentralized database
Both games and social applications require decentralized database services that can resist censorship while supporting high-speed reads and writes. Such a database can enhance current Web3 applications and support the development of new applications and experiences across domains.
● Decentralized Social - By storing large amounts of social data in a decentralized database, users will have greater control over their data, be able to migrate between platforms, and unlock opportunities for content monetization.
● Games - Managing and storing player data, in-game assets, user settings, and other game-related information is an important aspect of blockchain-based games. A decentralized database ensures that this data can be seamlessly exchanged and combined by other applications and games. A hot topic in the current GameFi field is full-chain games, which means deploying all core modules, including static resource storage, game logic calculations, and asset management, to the blockchain. A decentralized database with high-speed read and write functions is an important infrastructure to realize this vision.

Gaming and social applications are the industries with the largest numbers of Internet users, and they are also the most likely to produce killer applications, such as Damus, which broke out in February this year. We believe the explosion of Web3 games and social applications will likewise bring enormous demand for decentralized databases.
6.3 Public Awareness
Apart from the market outlook and technology, public awareness is a key component driving the growth of the decentralized storage market. A comparison of centralized and decentralized storage clearly highlights the numerous advantages of the latter. However, the ability to attract more users depends on increasing awareness of these benefits. This can be a lengthy process that will require a concerted effort across the industry. From content output to brand exposure marketing, industry practitioners must work hard to convey how decentralized storage can revolutionize the field of cloud storage. This effort complements other growth factors, amplifying the impact of market expansion and technological evolution.

7. Conclusion and Outlook
Overall, decentralized storage is an infrastructure industry with significant technical challenges and a long investment cycle, but with huge growth potential.
The long investment cycle stems mainly from the long iteration cycle of distributed technology itself: project developers need to find a delicate balance between decentralization and efficiency. Providing efficient, highly available data storage and retrieval while ensuring data privacy and ownership undoubtedly requires extensive exploration. Even IPFS often suffers from unreliable access, and other projects such as Storj are not sufficiently decentralized.
The potential growth of this market is also highly anticipated. In 2012 alone, AWS S3 stored 1 trillion objects. Considering that an object might be between 10 and 100 MB, this means that AWS S3 alone uses 10,000 to 100,000 PB of storage space.
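The back-of-the-envelope arithmetic above is easy to verify: at 10 to 100 MB per object, a trillion objects works out to 10,000 to 100,000 PB. (The object count is from the source; the per-object size range is the source's own assumption.)

```python
# Back-of-the-envelope check of the AWS S3 figure cited above.
OBJECTS = 1_000_000_000_000  # 1 trillion objects stored in S3 (2012)
MB_PER_PB = 10**9            # 1 PB = 1e9 MB

for avg_mb in (10, 100):
    pb = OBJECTS * avg_mb / MB_PER_PB
    print(f"avg {avg_mb} MB/object -> {pb:,.0f} PB")
# avg 10 MB/object -> 10,000 PB
# avg 100 MB/object -> 100,000 PB
```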
According to data from Messari, the storage utilization rate of the largest provider, Filecoin, was only around 3% at the end of 2022, meaning only about 600 PB of Filecoin's storage capacity was actively utilized. Clearly, the decentralized storage market still has a lot of room to grow.
And with the rise of AI and DePIN, we remain optimistic about the future of decentralized storage, as several key growth drivers will facilitate the market's expansion.
Disclaimer: This report is the original work of @ChenxiL46898047 and @BC082559, students of @GryphsisAcademy, under the guidance of @Zou_Block and @CryptoScott_ETH. The authors are solely responsible for all content, which does not necessarily reflect the views of Gryphsis Academy, or the organization that commissioned the report. Editorial content and decisions are not influenced by readers. Please be aware that the author may own cryptocurrencies mentioned in this report. This document is for informational purposes only and should not be relied upon for investment decisions. It is strongly recommended that you conduct your own research and consult a neutral financial, tax or legal advisor before making an investment decision. Remember that the past performance of any asset is no guarantee of future returns.



