Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated, and flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
- Don't duplicate the full text of your blog or GitHub page here. Just post the link for folks to click.
- Submission headlines should match the article title (don't cherry-pick information from the title to fit your agenda).
- No trolling.
- No low-effort posts. This is subjective and will largely be determined by community member reports.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
This is not meant for human beings. A creature that needs over 140 TB of storage in a single device can definitely afford to run them in some distributed redundancy scheme with hot swaps and just shred failed units. We know they're not worried about being wasteful.
Rebuild time is the big problem with this in a RAID array. The interface is too slow, and you risk losing more drives in the array before the rebuild completes.
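A back-of-the-envelope estimate shows why. The 140 TB figure comes from the thread; the sustained rebuild speed below is an assumed number for a spinning disk under load, not a measured one:

```python
# Rough rebuild-time estimate for a single huge drive.
drive_tb = 140        # hypothetical 140 TB drive from the thread
speed_mb_s = 250      # assumed sustained rebuild throughput (guess)

seconds = (drive_tb * 1e12) / (speed_mb_s * 1e6)
days = seconds / 86400
print(f"~{days:.1f} days to rebuild")  # roughly 6.5 days of continuous I/O
```

Nearly a week of full-throttle I/O per failed drive, during which every surviving drive in the array is under maximum stress.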
Realistically, is that a factor for a Microsoft-sized company, though? I'd be shocked if they only had a single layer of redundancy. Whatever they store is probably replicated between high-availability hosts and datacenters several times, to the point where losing an entire RAID array (or whatever media redundancy scheme they use) is just a small inconvenience.
Fairly significant factor when building really large systems. If we do the math, there end up being some relationships between drive size, drive count, rebuild time, and failure risk.
Basically, for a given risk acceptance and total system size there is usually a sweet spot for disk sizes.
Say you want 16 TB of usable space and want to be able to lose two drives from your array (a fairly common requirement in small systems); then these are some options:
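Assuming a RAID6-style layout with two parity drives (usable capacity = (n - 2) × drive size), some configurations that meet those constraints can be enumerated:

```python
# Layouts giving 16 TB usable while surviving two drive failures,
# assuming a RAID6-style scheme: usable = (n - 2) * drive_size.
target_tb = 16
for n in (4, 6, 10):                      # example drive counts
    size = target_tb / (n - 2)
    print(f"{n} x {size:g} TB -> {target_tb} TB usable, 2-drive fault tolerance")
```

So a 4×8 TB array, a 6×4 TB array, and a 10×2 TB array all hit the same usable capacity, but with very different rebuild profiles.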
The more drives you have, the better recovery speed you get and the less usable space you lose to replication. You also get more usable performance with more drives. Additionally, smaller drives are usually cheaper per TB (down to a limit).
This means that 140 TB drives become interesting if you are building large storage systems (probably at least a few PB) with low performance requirements (archives), but tape robots already dominate that niche.
The other interesting use case is huge systems: many petabytes, up into the exabyte range. More modern schemes for redundancy and caching mitigate some of the issues described above, but they are usually only relevant when building really large systems.
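One reason those schemes matter at scale is raw-storage overhead. A quick comparison of plain 3-way replication against a k+m erasure code (the 10+4 split here is illustrative, not any specific vendor's layout):

```python
# Raw bytes stored per usable byte: replication vs. k+m erasure coding.
def overhead(data_shards, parity_shards):
    return (data_shards + parity_shards) / data_shards

print(f"3x replication: {overhead(1, 2):.2f}x")   # 3.00x raw per usable
print(f"10+4 erasure:   {overhead(10, 4):.2f}x")  # 1.40x raw per usable
```

The erasure-coded pool tolerates the loss of any 4 of its 14 shards at less than half the overhead of 3-way replication, which is why exabyte-scale systems use it.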
tl;dr: arrays of 6-8 drives at 4-12 TB each are probably the sweet spot for most data hoarders.