Over the past several years, there has been keen interest in using blockchains to store documents. There are many reasons you might want to store documents, or hashes of documents, on a blockchain, and multiple ways to do it. Various projects are currently innovating around this idea, each proposing different methods with different trade-offs.
Why Use a Blockchain Anyway?
Throughout 2017, there was a huge amount of hype around the applications of blockchain technology and cryptocurrencies.
These expectations often centered on projects with grand promises and little proof of concept. As a result, reality did not match the hype, and many of those projects have yet to attract users.
In contrast, document storage is a much drier, less exciting application. However, it is achievable today and offers several improvements over existing document storage systems.
Immutability is perhaps the most important benefit a blockchain provides. Cryptographically linked blocks provide a record that is extremely resistant to tampering: because each block includes the hash of the block before it, altering an old record would mean rewriting every block that follows, which a well-secured network makes impractical. This tamper resistance is highly effective in preventing document counterfeiting and fraud. If you cannot store the actual document on the blockchain due to file size limitations, then even storing a hash of the document makes a lot of sense.
Documents often take up far more space than the financial transactions that blockchains like Bitcoin were designed for, so it is often not feasible to store a whole document on-chain. A hash, by contrast, has a fixed size regardless of the input (a SHA-256 hash is always 32 bytes), so it takes up only a tiny fraction of the space and is a much more efficient option.
Storing just the hash still gives you tamper resistance. Changing even a single byte of a file changes its hash value, and a secure hash algorithm makes it computationally infeasible to find a different file with the same hash. This is a vital property that secure hash algorithms provide. Regardless of where you store your document, whether in a centralized database like MySQL or in a distributed cloud service like Azure, you can still verify that it has not been tampered with by rehashing it and comparing the result to the hash stored on the blockchain.
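As a minimal sketch of that check, assuming Python's standard `hashlib` module and SHA-256, the functions below hash a document and later compare a fresh hash of the stored copy against the recorded value. The names `sha256_hex` and `verify_document` are illustrative, and `anchored` simply stands in for the hash you would read back from the chain; writing to or reading from any particular blockchain is not shown.

```python
import hashlib


def sha256_hex(document: bytes) -> str:
    """Return the SHA-256 digest of a document as a 64-character hex string."""
    return hashlib.sha256(document).hexdigest()


def verify_document(document: bytes, anchored_hash: str) -> bool:
    """Rehash the off-chain copy and compare it with the hash recorded on-chain."""
    return sha256_hex(document) == anchored_hash.lower()


original = b"Example contract text, version 1."
# In practice this value would be written to the blockchain when the
# document is registered and read back later; here it is just a variable.
anchored = sha256_hex(original)

print(verify_document(original, anchored))                 # True
print(verify_document(original + b" (edited)", anchored))  # False
```

The document itself can live anywhere; only the 64-character hex string (32 bytes of raw digest) needs to be anchored.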
Using a public blockchain is a great way to make your document accessible to the public. Of course, you need to be absolutely confident that you want to make it fully visible. Once you store the document or its hash on the blockchain, it will be there permanently. There is no way to change data once you include it in a block.
A blockchain is certainly not the only way to do this. However, given its level of security and tamper-resistance, you can be confident of permanent visibility.
Of course, you could also use a federated or private blockchain if you wanted to limit access to your documents. Such blockchains let you offer permanent visibility to a preselected group. These alternatives will, however, undermine decentralization and possibly tamper resistance.
Need for Decentralization
The final reason to use a blockchain is if you require decentralization. Perhaps the nature of your document means that you cannot reliably trust a third-party storage provider to not tamper with or delete the document.
Politically sensitive files are one example: once published, they could become targets for malicious parties. By uploading the document or its hash to a public blockchain, you would have peace of mind that it is safe from state or corporate censorship. Of course, choosing the right blockchain is very important here, because not all blockchains are made alike. If the consensus protocol is not properly decentralized, or if it allows a small group of miners or validators to reverse or censor transactions, then you will face the same problems as with traditional systems.
The Different Ways to Store a Document on a Blockchain
There are two main ways you might choose to store a document on the blockchain. One option is to store the entire document itself on-chain. Alternatively, you can store a hash of it on the blockchain.
Storing the Entire Document
Storing a whole document on-chain is possible with certain blockchains; however, it is rarely a good idea. Because of the heavy data demands, unless the file is very small or extremely important, you would be better off choosing another method. If you wanted to store the document on Bitcoin, you would first have to compress it and then encode it in hexadecimal form.
The main problem with storing whole documents on a blockchain is access latency: how long it takes network users to upload and download files, such as documents. Fully decentralized public blockchains have thousands of nodes, and every full node must receive, validate, and store a copy of every block. Unfortunately, the benefits that come with this many nodes also bring a corresponding increase in latency. Any file storage system, including one for documents, needs low latency; otherwise, it becomes clogged, slow, and expensive to use.
A hybrid strategy can also make sense. This involves storing a small part of the document on-chain, perhaps the signatures, along with the document hash. This lets you keep decentralization and full transparency for the parts that absolutely require it while capping the data load.
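The compress-then-hex step might look like the following sketch, using Python's `zlib` module. The 80-byte chunk size is an assumption based on Bitcoin Core's long-standing default limit for `OP_RETURN` data; relay policy has changed over time, so check the node software you actually use. `prepare_for_chain` and `restore_from_chain` are hypothetical helpers, and how the chunks are spread across transactions is left out.

```python
import zlib

# Assumption: at most 80 bytes of data per chunk, matching Bitcoin Core's
# long-standing default OP_RETURN limit. Not a protocol constant.
CHUNK_BYTES = 80


def prepare_for_chain(document: bytes, chunk_bytes: int = CHUNK_BYTES) -> list[str]:
    """Compress a document, then split it into hex-encoded chunks."""
    compressed = zlib.compress(document, level=9)
    return [
        compressed[i:i + chunk_bytes].hex()
        for i in range(0, len(compressed), chunk_bytes)
    ]


def restore_from_chain(chunks: list[str]) -> bytes:
    """Reverse prepare_for_chain: decode each hex chunk, join, and decompress."""
    compressed = b"".join(bytes.fromhex(chunk) for chunk in chunks)
    return zlib.decompress(compressed)


document = b"Clause 1. The parties agree to the terms below. " * 20
chunks = prepare_for_chain(document)
print(len(chunks), "chunks of at most", CHUNK_BYTES, "bytes each")
assert restore_from_chain(chunks) == document
```

Note that hex encoding doubles the character count of each chunk; it is a representation for tooling, not extra compression.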
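One way to picture such a hybrid record is the sketch below: a small Python dataclass holding the document's SHA-256 hash and its signers' signatures, serialized compactly and rejected if it exceeds a size cap. The `HybridRecord` class, its field names, the JSON encoding, and the 512-byte `MAX_RECORD_BYTES` cap are all hypothetical choices for illustration, and the signatures are placeholder hex strings rather than real ones.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical cap on how many bytes a single on-chain record may occupy.
MAX_RECORD_BYTES = 512


@dataclass
class HybridRecord:
    """The small part of a document kept on-chain: its hash plus signatures."""
    doc_hash: str
    signatures: list[str] = field(default_factory=list)  # hex-encoded signatures

    def to_payload(self) -> bytes:
        """Serialize compactly and enforce the size cap."""
        payload = json.dumps(
            {"h": self.doc_hash, "sigs": self.signatures},
            separators=(",", ":"),  # no spaces, to keep the payload small
        ).encode("utf-8")
        if len(payload) > MAX_RECORD_BYTES:
            raise ValueError(f"record is {len(payload)} bytes; cap is {MAX_RECORD_BYTES}")
        return payload


document = b"Full contract text stays off-chain."
# Placeholders: real signatures would come from each signer's key pair
# (for example ECDSA or Ed25519), which needs a library outside the stdlib.
signatures = ["aa" * 64, "bb" * 64]

record = HybridRecord(hashlib.sha256(document).hexdigest(), signatures)
payload = record.to_payload()
print(len(payload), "bytes on-chain")
```

The full document never touches the chain; only the hash and signatures do, and the cap turns "keep the data load small" into an explicit check.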
Storing a Hash
The most efficient method is to store a document’s hash on-chain while keeping the whole document elsewhere, either in a centralized database or on a distributed file storage system. You would put the document through a secure hash algorithm like SHA-256 and then store the resulting hash in a block. This saves a huge amount of space and cost. Additionally, you will be able to tell if someone tampers with the original document, because the changed input would produce a completely different hash value.
Hash values are far smaller than whole documents, so storing them is a vastly more efficient use of blockchain space. The approach also scales well: to record many documents at once, you can combine their hashes into a single structure, such as a Merkle tree, and store only its root hash on-chain. The downside is that the original document is not necessarily stored in a decentralized or publicly visible way.
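To see how sharply the hash reacts to a small edit, the sketch below hashes two strings that differ by a single character and counts how many of the 256 output bits differ. With SHA-256 you should expect roughly half of them to flip, though the exact count depends on the inputs; the sample text and the `differing_bits` helper are illustrative only.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"Payment due: 1000 EUR"
tampered = b"Payment due: 9000 EUR"  # a single character changed

h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(tampered).digest()

print(h1.hex())
print(h2.hex())
print(differing_bits(h1, h2), "of 256 bits differ")
```

This is why a stored hash is enough to detect tampering: there is no "nearly matching" hash for a nearly identical document.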
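A common way to batch many document hashes under a single on-chain value is a Merkle tree: hashes are paired and hashed repeatedly until one root remains, and each document can later be checked against that root with a short proof. The sketch below assumes SHA-256 and the Bitcoin-style convention of duplicating the last node on odd-sized levels; the function names are illustrative, and production schemes often add domain separation between leaf and internal-node hashes.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of document hashes into a single root hash."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the odd node out
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the right."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the other member of this node's pair
        proof.append((level[sibling], sibling > index))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof."""
    node = leaf
    for sibling, sibling_is_right in proof:
        node = sha256(node + sibling) if sibling_is_right else sha256(sibling + node)
    return node == root


docs = [b"contract A", b"contract B", b"invoice C"]
leaves = [sha256(d) for d in docs]
root = merkle_root(leaves)  # only this 32-byte value goes on-chain
proof = merkle_proof(leaves, 2)
print(verify_proof(sha256(b"invoice C"), proof, root))     # True
print(verify_proof(sha256(b"invoice C v2"), proof, root))  # False
```

Each proof holds one sibling hash per tree level, so it grows only with the logarithm of the number of documents in the batch.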
Who Is Working on This?
There are few projects that focus on documents alone right now. Most are built around decentralized file storage, which includes documents.
One project focused specifically on documents, particularly signed documents, is Blocksign. It uses the hash method: a user signs the document and sends it to Blocksign, where it is hashed, and the hash is stored on the Bitcoin blockchain. We must warn users that Blocksign has not recently updated its site, and we would encourage further research before use.
Two cryptocurrency projects designed for decentralized storage more generally are Siacoin and Storj.
Siacoin does not store your documents on a blockchain at all. Instead, its distributed network stores an encrypted version of your document. The Sia network consists of hosts, who provide storage, and clients, who want storage. Clients and hosts agree on contracts detailing the commitments the storage providers make, and Sia’s own proof-of-work blockchain stores these contracts.
Storj, on the other hand, is closer to the hash model. A hash of the document is stored within a hash table on-chain. Additionally, its distributed network also stores your document. Unlike Sia, however, Storj runs atop the Ethereum blockchain rather than its own.
Cryptyk, an enterprise-focused document storage platform, uses a blockchain even less directly than any of the above. No documents or hashes are stored on-chain. Instead, a distributed cloud system stores the documents, and the platform uses a blockchain only to manage and referee document access and sharing.
Document storage on blockchains is a steadily advancing sector of this industry. For now, it remains to be seen what role blockchains will ultimately play in storing documents. Fortunately, the competition among projects is furthering our understanding of this promising use case.