dStor - an interplanetary file storage solution.
- Storage Nodes dStor
It is wonderful to see dStor establish itself as a new community at Peeranha. It is no secret that I am very excited about the pending release. I do have a question, though.
Everyone knows that dStor is built upon the decentralised Interplanetary File System (IPFS). But how do we know that our documentation will be reliably and readily accessible on this system?
In other words, is there any concern that our dStor documentation will be lost forever in an interplanetary black hole?
Please put my concerns at ease.
dStor is designed to include some important features that are missing from other incarnations of IPFS. First, it has a better engine and algorithm for seeding copies of each file. IPFS actually works to eliminate redundant copies of files. dStor intentionally seeds many copies (probably 16+ once the system is up and running and ~19 at scale) around the world. dStor periodically audits files to ensure that they are seeded where expected. dStor outpost storage nodes are paid, in part, based on how many of the requested seeds they are actually hosting, so it is in their economic interest to maintain them. Further, when an audit finds that seeds are missing, a new storage node is asked to seed the file, thereby maintaining the target number of seeds. Further still, when a node is asked to serve a file that it does not have seeded, it receives the file from a nearby seed node and keeps it in its cache for a period of time in order to serve it. Serving from cache is also incentivized with pay, so it is in the node's interest to hold on to the file if there is ample room.
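The audit-and-reseed loop described above can be pictured with a small sketch. This is only an illustration of the idea, not dStor's actual implementation: the function name `audit_file`, the in-memory `node_contents` map, and the random choice of replacement nodes are all assumptions made for the example.

```python
import random

def audit_file(file_id, expected_nodes, node_contents, all_nodes, target=16, rng=random):
    """Check where file_id is really seeded and top the seed count back up.

    All names here are hypothetical. node_contents maps a node name to the
    set of file ids it currently holds; target=16 echoes the "16+" in the
    post but is otherwise an assumption.
    """
    # Step 1: audit. Keep only the nodes that still hold the file.
    healthy = [n for n in expected_nodes if file_id in node_contents.get(n, set())]

    # Step 2: pick replacement candidates from nodes not currently seeding it.
    candidates = [n for n in all_nodes if n not in healthy]
    rng.shuffle(candidates)  # assumption: real placement would be smarter than random

    # Step 3: ask new nodes to seed until the target is met or nodes run out.
    while len(healthy) < target and candidates:
        new_node = candidates.pop()
        node_contents.setdefault(new_node, set()).add(file_id)
        healthy.append(new_node)

    return healthy
```

If three of sixteen expected seeds have gone missing, one call returns a list of sixteen nodes again, each of which now holds the file.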
Just to add to what Douglas said, IPFS as a technology is really good at a specific task, but it lacks a few features, which makes it unusable for most tasks one would traditionally consider. Forward replication, analytics, intelligent routing, and event-based health triggers are just some of the things that IPFS wasn't built to handle and that dStor solves.
TIPFS came about when the Telos Blockchain network was launching, because vanilla IPFS lacked any sort of forward replication to seed nodes as a health measure. We made that work as a proof of concept last year because of concerns similar to yours. The problem that unfolded next was this: once you have the data properly seeded, how do you intelligently get users as close as possible to that data in the healthiest way possible? The layers of technology that went into solving that question and others are the layers that make up the foundation of dStor as a distributed storage suite.
In traditional clustered storage systems such as GlusterFS or CephFS, when you define a logical storage block (for a file system), you can tell their managers how many copies of each block you need. Most of the time the replica number is somewhere between one and three. This works for a system whose failure frequency you can predict. The question becomes: how many copies of your data do you need to stay ahead of your failure rate?
With Gluster and Ceph, the replica number is low because your failures are few and far between. With dStor, we knew we had to assume that large swaths of the Internet might become partitioned from each other at any given moment. To overcome that, our replica count is raised to a number that answers the next question: how many servers/copies of a file does it take to feel comfortable that the system can stay resilient and heal from a major network partition before data loss occurs?
There is a formula I created last year as part of the TIPFS suite to calculate that number. That formula calls for roughly 20 replicas among hundreds of nodes, scattered in the most logically decentralized way. We did this as a best-effort solution. I want to be clear: that replica number could grow to as high as every outpost node online if demand calls for it. However, as demand relaxes and garbage collection occurs, the replica count will dynamically scale back.
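To make the "how many copies" question concrete, here is a textbook back-of-the-envelope calculation. It is not the TIPFS formula, which is not given in this thread. It assumes each replica is lost independently with the same probability before the system can heal, which real network partitions do not guarantee; the function name and both parameters are hypothetical.

```python
def replicas_needed(p_replica_lost, acceptable_loss_probability):
    """Smallest replica count n such that p_replica_lost ** n is acceptable.

    Assumption: replicas fail independently, so the chance that every one
    of n replicas is gone at the same time is p_replica_lost ** n.
    """
    if not 0 < p_replica_lost < 1:
        raise ValueError("p_replica_lost must be strictly between 0 and 1")
    if not 0 < acceptable_loss_probability < 1:
        raise ValueError("acceptable_loss_probability must be strictly between 0 and 1")

    n = 1
    # Add replicas until the chance of losing all of them is low enough.
    while p_replica_lost ** n > acceptable_loss_probability:
        n += 1
    return n
```

With made-up inputs, a replica that is rarely lost needs only a couple of copies, which matches the one-to-three range typical of Gluster and Ceph. If instead you assume any given replica might be unreachable half the time and want a one-in-a-million chance of losing all of them, `replicas_needed(0.5, 1e-6)` returns 20. The point is only that pessimistic assumptions about the network push the count up quickly.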
To DDoS a file off dStor, one would have to know exactly which 20 seed nodes and which caching nodes held the file, and be able to attack with enough force, quickly enough, to remove those nodes from the Internet for good. I'm not saying it can't be done, but the system has been designed to heal around DoS events. You would have to physically destroy the nodes and do it quickly enough that the last one was gone before the first one timed out. Also, thanks to the file multihash acting as a data checksum, we can prove if someone is trying to spoof a rogue point or poison a stream mid-flight.
In short, you can get bits and pieces of this sort of diversification on your own, but no technology provider comes even close to rolling all of these pieces together into one turnkey suite. dStor does this, and does it in a way that we feel is the best reasonable and fair attempt at solving these questions, while giving our customers the best possible experience in presenting data to their end users in a verifiable manner.
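The checksum idea can be sketched in a few lines. A multihash prefixes a digest with a code for the hash function and the digest length; for SHA2-256 those two bytes are 0x12 and 0x20. The sketch below hashes the raw bytes directly, which is a simplification: IPFS itself hashes the blocks of a chunked file, and dStor's own verification path is not described in this thread. The function names are hypothetical.

```python
import hashlib

SHA2_256_CODE = 0x12    # multihash function code for sha2-256
SHA2_256_LENGTH = 0x20  # digest length in bytes (32)

def sha2_256_multihash(data: bytes) -> bytes:
    """Return the sha2-256 multihash of data: code byte, length byte, digest."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + digest

def matches_multihash(data: bytes, expected: bytes) -> bool:
    """True only if data hashes to exactly the multihash that was requested."""
    return sha2_256_multihash(data) == expected
```

A client that asked for a file by its multihash can run `matches_multihash` on whatever bytes come back. Content that was swapped or altered in flight produces a different digest and fails the check.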