How Can Blockchain Improve Data Storage?
Compared to traditional cloud servers, decentralized solutions for data storage claim to be secure, private and efficient, but can they address the scalability issue?
Cloud servers can keep a full record of our lives, storing everything from personal photos and videos from our smartphones to work documents. At first glance, this makes our lives easier, but some unexpected threats hide beneath the comfort and extensive customer care.
Although centralized data storage has its benefits (higher speed and availability, high throughput and low latency), they come at a cost. Big cloud storage companies such as Google and Amazon that dominate the industry are often suspected of cooperating with the authorities and giving them access to private data. This is easy to accomplish because users’ files are not encrypted, are stored in one place and are vulnerable to manipulation. Moreover, a single centralized server can be hacked, leaving thousands of users without their private data.
Governments can also restrict access to certain content for political reasons, as Turkish officials did in 2017 when Wikipedia was banned in the country. China went even further: the world's most popular social media, cloud storage and video platforms have been banned in the country and replaced by domestic alternatives.
In contrast to centralized cloud storage, decentralized systems claim to be more secure and private. They do not store user data on a single centralized server. Instead, they divide files into multiple pieces and send them to different servers or nodes, thereby reducing the possibility of external control over user data. Despite these improvements, decentralized storage also has some limitations.
Since blockchain emerged, enthusiasts have claimed that it would improve everything from banking to healthcare and from voting to fundraising. Could that be true for data storage, and is blockchain able to improve the cloud storage industry? Time will tell, and although numerous solutions are being proposed, it is too early to draw conclusions.
Decentralized cloud storage: Operation principles
Cloud storage systems store data on remote servers accessed from the internet and called “clouds.” These servers are maintained by cloud server providers. Unlike traditional cloud servers, decentralized cloud storage does not keep clients’ data on one particular centralized server. Instead, it uses different nodes located across the world, which are independent of each other. The nodes are not hosted by a single entity and are not controlled by service providers, and anyone can run a node.
It all started almost 20 years ago with the BitTorrent protocol, which was designed for peer-to-peer file sharing. BitTorrent users download various video, music, and text files to their local storage and then can share (“seed”) them with other users. Files on BitTorrent are not encrypted, but they are divided into pieces, and file fragments can be downloaded from different seeders, just like in a decentralized cloud.
The InterPlanetary File System (IPFS) protocol is another step in the evolution of decentralized storage. It appeared in 2015 and later became the foundation for some of the blockchain-based decentralized storage solutions currently in development, for example, Filecoin. Like HTTP, IPFS is a hypermedia protocol for the web designed to transfer data between users and servers on the internet, but it works across multiple nodes instead of a central server. When someone uploads a file to the IPFS network, the file is divided into fragments called blocks, and each block receives its own hash. The blocks can later be found by their hash and reassembled into one file. This is content-based addressing, which differs from the location-based addressing used in HTTP.
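The idea of content-based addressing can be shown with a minimal sketch. It is not the IPFS implementation: the real protocol uses its own content identifiers and links blocks into a larger data structure, while the example below simply keys each block by its SHA-256 digest. The names `split_into_blocks`, `content_id`, `add_file` and `get_file`, as well as the tiny 4-byte block size, are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the payload into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def content_id(block: bytes) -> str:
    """Address a block by the SHA-256 digest of its content."""
    return hashlib.sha256(block).hexdigest()


def add_file(store: dict, data: bytes) -> list:
    """Store every block under its hash and return the ordered list of hashes."""
    ids = []
    for block in split_into_blocks(data):
        cid = content_id(block)
        store[cid] = block  # identical blocks collapse into a single entry
        ids.append(cid)
    return ids


def get_file(store: dict, ids: list) -> bytes:
    """Fetch blocks by hash, verify each one, and reassemble the file."""
    parts = []
    for cid in ids:
        block = store[cid]
        if content_id(block) != cid:
            raise ValueError("block " + cid + " failed verification")
        parts.append(block)
    return b"".join(parts)


# The dict stands in for the whole network of nodes.
store = {}
ids = add_file(store, b"hello decentralized world")
restored = get_file(store, ids)
```

Because the address is derived from the content, whoever retrieves a block can check that it was not altered, regardless of which node served it.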
Blockchain-based solutions in cloud storage: Off-chain and on-chain
The BitTorrent and IPFS protocols are both far from perfect and face a number of challenges. With the emergence of blockchain technology, the idea of using it to improve data storage has become appealing to developers worldwide. Blockchain-based decentralized cloud solutions have learned from their predecessors and aim for improved security, privacy and user control over data. One of their distinguishing features is encryption. When you upload a file, the network automatically encrypts it. After that, you can access your file with an encryption key; without the key, no one can reach and read your file.
What blockchain-based solutions have in common with BitTorrent and IPFS is sharding. In simple terms, this is the process of breaking a single file into numerous pieces so that the pieces can be stored on different nodes. No single node operator holds your entire file; each keeps only a fragment of it. The fragments are also duplicated, which gives the data redundancy: even if a node holding a fragment of your file breaks down, the same fragment can be found on other nodes.
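The redundancy described above can be illustrated with a small sketch. It assumes a simple round-robin placement, which is not taken from any particular platform; real networks use their own placement and repair schemes. The function names, node names and replica count are hypothetical.

```python
def place_fragments(num_fragments: int, nodes: list, replicas: int) -> dict:
    """Assign each fragment to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested number of replicas")
    placement = {}
    for frag in range(num_fragments):
        # Consecutive nodes (wrapping around) hold the copies of this fragment.
        placement[frag] = [nodes[(frag + r) % len(nodes)] for r in range(replicas)]
    return placement


def recoverable(placement: dict, failed: set) -> bool:
    """A file survives if every fragment still has at least one live holder."""
    return all(
        any(node not in failed for node in holders)
        for holders in placement.values()
    )


nodes = ["node-" + str(i) for i in range(5)]
placement = place_fragments(num_fragments=4, nodes=nodes, replicas=3)

# Losing two nodes leaves at least one copy of every fragment.
survives_two_failures = recoverable(placement, {"node-0", "node-1"})
# Losing all three holders of fragment 0 makes the file unrecoverable.
survives_three_failures = recoverable(placement, {"node-0", "node-1", "node-2"})
```

With three copies of each fragment, any two nodes can fail without data loss; the trade-off is that the network stores three times the size of the file.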
There are two fundamentally different approaches in blockchain data storage solutions: off-chain and on-chain. The on-chain principle means that all user data is stored in the blocks of the blockchain itself. The unquestionable benefit of this method is that even in the event of an attack, the data can be restored and resynchronized. The enhanced security comes at the price of maintaining full nodes: every node has to hold a copy of all uploaded data, which is a far more expensive option, and all nodes have to constantly synchronize with one another.
It is believed that the blockchain is not scalable enough to store users’ entire files. If each user uploads just a few megabytes of data, the network will become overloaded. Moreover, it will cost a fortune in network fees. This problem is known as blockchain bloating.
That’s why almost all data storage solutions on the market are off-chain. They try to solve the scalability problem by not storing users’ data on the blockchain, limiting themselves to storing metadata on-chain and using the blockchain to facilitate the platform ecosystem. The obvious weak point of off-chain solutions is weaker security. If the system is attacked, there can theoretically be a case where the metadata is the only thing left, while the data itself is lost completely. However, off-chain solutions are more cost-efficient and have multiple use cases.
Off-chain solutions rely on miners who provide their hard disks to store other users’ files for a reward, and the blockchain is used to facilitate the storage market between miners and users. Convincing users to store someone else’s data on their disks and to run nodes might be challenging, but it is essential for scaling the ecosystem of off-chain solutions, and blockchain helps decentralized clouds with that. One of the most common options is to use the platform's native crypto coins as an incentive. This motivates users to rent out their spare disk space, allowing this trustless ecosystem to grow.
BitTorrent introduced its BTT token after the company was acquired by TRON. The main use case for BTT is rewarding users for keeping and distributing (“seeding”) files, but other options are planned, such as paying for content, tipping content creators and crowdfunding.
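To put rough numbers on the bloat argument, the sketch below compares the total storage footprint of the two approaches. Every figure in it is an illustrative assumption rather than a measurement of a real network: one million users, 5 MB per user, 1,000 full nodes, three off-chain replicas and 256 bytes of on-chain metadata per file.

```python
def on_chain_total_bytes(users: int, bytes_per_user: int, full_nodes: int) -> int:
    """On-chain: every full node keeps a copy of all uploaded data."""
    return users * bytes_per_user * full_nodes


def off_chain_total_bytes(users: int, bytes_per_user: int, replicas: int,
                          metadata_bytes: int, full_nodes: int) -> int:
    """Off-chain: data is kept on a few storage nodes, metadata on every full node."""
    data = users * bytes_per_user * replicas
    metadata = users * metadata_bytes * full_nodes
    return data + metadata


USERS = 1_000_000           # assumed
BYTES_PER_USER = 5_000_000  # 5 MB each, assumed
FULL_NODES = 1_000          # assumed
REPLICAS = 3                # assumed
METADATA_BYTES = 256        # per file, assumed

on_chain = on_chain_total_bytes(USERS, BYTES_PER_USER, FULL_NODES)
off_chain = off_chain_total_bytes(USERS, BYTES_PER_USER, REPLICAS,
                                  METADATA_BYTES, FULL_NODES)

print("on-chain :", on_chain / 1e12, "TB across the network")
print("off-chain:", off_chain / 1e12, "TB across the network")
```

Under these assumptions, the on-chain network stores 5,000 TB in total (5 TB on each full node), while the off-chain design stores about 15.3 TB. The gap grows with the number of full nodes, which is the core of the scalability concern.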
In the Filecoin network, blockchain is also used to connect users who need to store their data with those who are able to provide storage space — they are also called “miners”. A client submits a bid on the blockchain, and when a matching order from a miner is found, parties sign a deal order. Miners are then rewarded with coins.
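The bid-and-match flow can be sketched as a toy order book. This is not Filecoin's actual protocol or API; the `Bid`, `Ask` and `Deal` types, the `match_bid` function and the cheapest-offer rule are all hypothetical and only illustrate how a client's request might be paired with a miner's offer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    client: str
    size_gb: int
    max_price_per_gb: float  # the most the client is willing to pay


@dataclass
class Ask:
    miner: str
    free_gb: int
    price_per_gb: float  # the miner's asking price


@dataclass
class Deal:
    client: str
    miner: str
    size_gb: int
    price_per_gb: float


def match_bid(bid: Bid, asks: List[Ask]) -> Optional[Deal]:
    """Pair the bid with the cheapest miner that has room and fits the budget."""
    candidates = [
        ask for ask in asks
        if ask.free_gb >= bid.size_gb and ask.price_per_gb <= bid.max_price_per_gb
    ]
    if not candidates:
        return None  # the bid stays open until a matching offer appears
    best = min(candidates, key=lambda ask: ask.price_per_gb)
    best.free_gb -= bid.size_gb  # reserve the space for this deal
    return Deal(bid.client, best.miner, bid.size_gb, best.price_per_gb)


asks = [
    Ask("miner-a", free_gb=500, price_per_gb=0.004),
    Ask("miner-b", free_gb=200, price_per_gb=0.002),
    Ask("miner-c", free_gb=50, price_per_gb=0.001),  # cheapest, but too small
]
deal = match_bid(Bid("client-1", size_gb=100, max_price_per_gb=0.003), asks)
```

Here the bid is matched with `miner-b`: `miner-c` is cheaper but lacks the space, and `miner-a` asks more than the client is willing to pay. In a real network, the resulting deal would be recorded on the blockchain and the miner paid in the platform's coins.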
Decentralized cloud storage has its advantages, and blockchain adds some more
Compared to traditional centralized cloud servers like Amazon or Google Drive, blockchain-based decentralized cloud storage has a number of compelling advantages.
Security. As discussed in the previous section of this article, blockchain-based decentralized cloud storage makes storing and transmitting data safer. Files are encrypted with private keys, so no one without the key can access them. Files are also divided into pieces that are kept on multiple nodes, so there is no single point of failure. If a centralized server breaks down, you’ll probably lose access to your data. If a single node fails, your files remain safe.
Immutability. Since there is no central authority, no one can take away your file, restrict access or make amendments to it for the sake of censorship. The file’s hash is kept in the ledger.
Lower price. While centralized cloud storage products like Amazon S3, Google One and Dropbox offer 1 GB of space for $0.023, $0.02 and $0.005 per month, respectively, their competitors using blockchain have prices as low as $0.002.
Rewards for storing. A number of decentralized cloud projects use blockchain and native cryptocurrencies to incentivize users. Those who have spare storage space (unused hard drives, disks, data centers) can rent it out for a reward. Blockchain cloud storage platforms connect users willing to share their storage space with those who need it, making it a win-win situation.
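The per-gigabyte prices quoted under "Lower price" are easier to compare at a realistic volume. The short calculation below scales them to 1 TB (taken here as 1,000 GB) per month. The prices come from the figures above; the 1 TB volume and the provider labels are assumptions, and real pricing tiers, minimum plans and transfer fees are ignored.

```python
from decimal import Decimal

# Monthly price per GB in USD, as quoted in the text above.
PRICE_PER_GB = {
    "Amazon S3": Decimal("0.023"),
    "Google One": Decimal("0.02"),
    "Dropbox": Decimal("0.005"),
    "Blockchain-based (lowest quoted)": Decimal("0.002"),
}

VOLUME_GB = 1000  # 1 TB, an assumed volume for comparison


def monthly_cost(price_per_gb: Decimal, volume_gb: int) -> Decimal:
    """Flat-rate cost: price per gigabyte times the number of gigabytes."""
    return price_per_gb * volume_gb


costs = {name: monthly_cost(price, VOLUME_GB) for name, price in PRICE_PER_GB.items()}

for name, cost in costs.items():
    print(name + ": $" + format(cost, ".2f") + " per month")
```

At this volume, the quoted prices work out to $23, $20, $5 and $2 per month, respectively.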
Latest solutions will probably allow storing users’ data on a blockchain
As explained above, off-chain solutions have been using blockchain for different purposes, but not for storing the content itself. Data storage on a blockchain has certain limitations, and one of the most significant is the potential scalability problem. Blockchains can only process a limited, and relatively low, number of transactions compared to traditional payment systems. At peak times, this leads to network overloads, delayed transactions and increased transaction fees. With a growing number of users and transactions under existing conditions, it could become a significant drawback.
Today, there are different data storage solutions that claim to address the scalability issue, although the majority of them are still in the development phase. One of these is ILCoin, whose RIFT protocol has already been implemented on the mainnet, according to the company. The RIFT protocol is a multi-layered solution where every block mined contains mini-blocks, which, in turn, contain users’ data. The ILCoin developers say that their block size could total up to 5 GB together with mini-blocks, which makes it the “biggest among the competitors.” According to the project’s team, by solving the first-in, first-out and the bottleneck problems, its RIFT protocol makes the “network size potentially unlimited.”
The ILCoin developers argue that the RIFT protocol opens up ample opportunities for safe and transparent storage of any digital content on the blockchain. Until now, storing large amounts of data on-chain was not possible due to blockchain bloating. The ILCoin team said that thanks to the asynchronization principle and mini-block architecture, it will soon be possible in their decentralized cloud blockchain storage solution, whose launch is scheduled for later this year. Running a full node for the on-chain storage platform will be costly, so the ILCoin developers are betting on rewarding their future partners with proprietary coins, as their off-chain competitors do.
Both traditional data storage solutions and decentralized clouds have their own benefits and drawbacks. Traditional cloud servers have higher speed and availability, but they do not encrypt users’ data and they store it in one place, which threatens the security and privacy of data. Decentralized cloud storage improves the security and privacy of users’ data by encrypting and sharding files, but it cannot match the high speeds and low latency of centralized storage. In addition, blockchain solutions for decentralized cloud storage have not yet demonstrated the ability to build a critical mass of users, which is essential for the ecosystem and remains one of the sticking points of distributed systems. Besides, decentralized storage can be off-chain or on-chain. Off-chain solutions successfully avoid the blockchain bloat problem but have weaker data security, as the data is not stored on the blockchain. On-chain solutions claim to be safer but are more expensive and require larger blocks. To sum up, each type has its own benefits and drawbacks, and only time will tell which of them succeeds.
Disclaimer. Cointelegraph does not endorse any content or product on this page. While we aim to provide all the important information we could obtain, readers should do their own research before taking any actions related to the company and carry full responsibility for their decisions. This article cannot be considered investment advice.