datahoarder
You'll have to find some kind of balance, and it is a game of chance. You are always limited by the number of slots in your server and the current largest drive size. Then you are trying to balance price, speed, and durability.
For example, let's say the maximum number of drives is 10, and the largest manufactured size is 50 TB. You probably don't need 500 TB of storage that is in no way durable (if a drive dies, all the data on it is lost) and sits on a single server.
Drives dying is almost certain, but the chance of two drives dying at the same time is quite low, so something like RAID ~~4,~~ 5, 6, or 10 is a great start. How you partition depends on how much storage you want. If you want 20 TB, you can do 4x 8 TB in RAID 5, which yields 3x 8 TB (= 24 TB) of effective storage.
Adding new drives is easy, and with RAID 5 you are always giving up just one drive's worth of capacity. Later on, you can decide whether you want to sacrifice more space for more durability and switch to RAID 6.
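To make the arithmetic concrete, here is a minimal sketch of the usable-capacity math for the levels mentioned, assuming all drives are the same size. The function name `usable_tb` and the level strings are made up for illustration; they don't come from any RAID tool.

```python
def usable_tb(level: str, drives: int, size_tb: float) -> float:
    """Usable capacity in TB for an array of equal-sized drives (illustrative only)."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_tb  # one drive's worth of space goes to parity
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * size_tb  # two drives' worth of space goes to parity
    if level == "raid10":
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives / 2 * size_tb  # every drive is mirrored once
    raise ValueError(f"unknown level: {level}")


# The example from the comment: 4x 8 TB in RAID 5
print(usable_tb("raid5", 4, 8))
```

With the same four 8 TB drives, RAID 6 and RAID 10 both come out at 16 TB, which is the space-for-durability trade described above.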
If you want even more storage, you can buy a micro server like the ODROID H4+ and use it as network-attached storage.
Sorry - whilst most of your advice is great, this is a bit misleading.
RAID 4 is very rarely used. It's not a particularly safe or efficient use of striping, and was replaced by 5 shortly after it was invented.
RAID 5 itself is now strongly discouraged for large arrays. (Google "don't use raid 5 for large arrays" for literally millions of pages explaining this, but it basically boils down to: "If a drive fails, the chance of a second drive failing whilst rebuilding is very high.")
But 6 is good if you've got enough drives, and 10 (1+0) is also a fairly well-regarded method for arrays with an even number of drives.
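One common version of the rebuild argument is about unrecoverable read errors (UREs) rather than a whole second drive dying: a RAID 5 rebuild has to read every surviving drive end to end, and a single unreadable sector can stall it. Below is a rough back-of-the-envelope sketch of that estimate. The rate of one error per 1e14 bits is an assumed, often-quoted spec-sheet figure, not a measurement, and the function name is hypothetical. Real drives frequently do better than the spec, so treat the result as a pessimistic illustration.

```python
import math


def p_ure_during_rebuild(surviving_drives: int, size_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading
    all surviving drives in full (assumes independent errors per bit)."""
    bits_read = surviving_drives * size_tb * 1e12 * 8  # TB -> bytes -> bits
    # 1 - (1 - p)^bits, computed in a numerically stable way
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# 4x 8 TB RAID 5 with one dead drive: 3 drives must be read completely
print(f"{p_ure_during_rebuild(3, 8):.0%}")
# Same array with 1 TB drives, for comparison
print(f"{p_ure_during_rebuild(3, 1):.0%}")
```

The absolute numbers depend entirely on the assumed error rate. The part worth noticing is how quickly the estimate grows with drive size.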
With 4, you are correct. I went off the top of my head, from what we learned in high school 15 or so years ago. 5 is still better than nothing if you don't have the resources to get one more drive for 6. Of course, the best option is completely mirroring everything to a separate geographic location.
It all boils down to how much money you are willing to spend for more durability.
I've edited my comment to scratch R4. But R5 is still great for smaller arrays, and it is possible to, for example, have RAID 5 for movies, and RAID 6 for photos.
There are also combinations of RAID levels, like the aforementioned 10. There is a nice comparison table with drive requirements and fault tolerance on Wikipedia: https://en.m.wikipedia.org/wiki/Standard_RAID_levels
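For a quick look without opening the table, here is a small sketch of the minimum drive counts and the number of drive failures each common level is guaranteed to survive. The helper name `raid_options` is invented for this example, and the RAID 10 figure is the worst case (it can survive more failures if they land in different mirror pairs).

```python
def raid_options(n: int) -> dict:
    """Map each RAID level usable with n drives to the number of
    drive failures it always survives (illustrative helper)."""
    # level: (minimum drives, guaranteed fault tolerance)
    levels = {
        "RAID 0": (2, 0),
        "RAID 1": (2, n - 1),
        "RAID 5": (3, 1),
        "RAID 6": (4, 2),
        "RAID 10": (4, 1),
    }
    options = {}
    for name, (min_drives, tolerance) in levels.items():
        if n < min_drives:
            continue
        if name == "RAID 10" and n % 2 != 0:
            continue  # mirror pairs need an even drive count
        options[name] = tolerance
    return options


for level, tolerance in raid_options(4).items():
    print(f"{level}: survives {tolerance} drive failure(s)")
```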