datahoarder

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 5 years ago

MODERATORS

archivist@lemmy.ml

Split ZFS in two at two locations? (lemmy.nowsci.com)

submitted 1 month ago by fmstrat@lemmy.nowsci.com to c/datahoarder@lemmy.ml


Trying to figure out if there is a way to do this without zfs sending a ton of data. I have:

s/test1, inside it are folders:
- folder1
- folder2

I have this pool backed up remotely by sending snapshots.

I'd like to split this up into:

s/test1, inside is folder:
- folder1
s/test2, inside is folder:
- folder2

I'm trying to figure out if there is some combination of clone and promote that would limit the amount of data needed to be sent over the network.

Or maybe there is some record/replay method I could do on snapshots that I'm not aware of.

Thoughts?
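One clone-based route, sketched as a dry run that only prints the commands. This is an untested outline, not a confirmed answer from the thread. It relies on `zfs send -i` accepting a clone's origin snapshot as the incremental source, as the `zfs send` man page describes. The snapshot names `split` and `trimmed`, the host `backuphost`, and mountpoints of the form `/<dataset>` are all assumptions, and `s/test1@split` is assumed to reach the remote through the usual incremental send before the final command runs.

```shell
#!/bin/sh
# Dry run: this only prints the commands; nothing is executed.
# Hypothetical names: snapshots "split" and "trimmed", host "backuphost".
# Assumes datasets are mounted at /<dataset> and that <src>@split has been
# replicated to the remote before the last command is run.

plan_clone_split() {
    src=$1; dst=$2; src_keeps=$3; dst_keeps=$4; remote=$5

    echo "zfs snapshot ${src}@split"
    # A clone shares every block with its origin snapshot, so no data is copied.
    echo "zfs clone ${src}@split ${dst}"
    # Each side drops the folder it should no longer hold.
    echo "rm -rf /${dst}/${src_keeps}"
    echo "rm -rf /${src}/${dst_keeps}"
    echo "zfs snapshot ${dst}@trimmed"
    # Incremental from the origin snapshot: the stream carries the deletion,
    # not the folder contents.
    echo "zfs send -i ${src}@split ${dst}@trimmed | ssh ${remote} zfs receive ${dst}"
}

plan_clone_split s/test1 s/test2 folder1 folder2 backuphost
```

The catch is that the clone stays tied to `s/test1@split`. `zfs promote` only swaps which dataset owns that snapshot, and the space of the deleted folders is not freed while the snapshot exists.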


[–] ReversalHatchery@beehaw.org 0 points 1 month ago* (last edited 1 month ago) (1 children)

what is your goal with this?

do you still want to keep all the data in a single pool?
if so, you could make datasets in the pool, and move the top directories into the datasets. datasets are basically dirs that can have special settings on how they are handled

ninja edit: now that I think about it, moving files across datasets probably causes that data to be resent.
it would be easier to give advice knowing why you want to do this

[–] fmstrat@lemmy.nowsci.com 0 points 1 month ago (1 children)

Yea your edit is the problem unfortunately. Moving across datasets would incur disk reads/writes and sending of terabytes of data.

The goal in separating them is to be able to independently zfs send folder1 somewhere without including folder2. Poor choice of dataset layout when I built the array.
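For illustration, this is roughly what an independent send looks like once folder1 lives in its own dataset. It is a dry-run sketch that only prints the commands. The snapshot names `prev` and `now`, the host `otherhost`, and the target `backup/test1` are hypothetical.

```shell
#!/bin/sh
# Dry run: prints commands only; nothing is executed.
# Hypothetical names: snapshots "prev"/"now", host "otherhost",
# target dataset "backup/test1".

plan_independent_send() {
    ds=$1; from=$2; to=$3; remote=$4; target=$5

    # -n (dry run) together with -v prints an estimate of the stream size.
    echo "zfs send -nv -i ${ds}@${from} ${ds}@${to}"
    # The real transfer covers this dataset only; sibling datasets are not included.
    echo "zfs send -i ${ds}@${from} ${ds}@${to} | ssh ${remote} zfs receive ${target}"
}

plan_independent_send s/test1 prev now otherhost backup/test1
```

Running the `-nv` form first gives an idea of how much would cross the network before committing to the transfer.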

[–] ReversalHatchery@beehaw.org 1 points 1 month ago

hmm I see. and why do you want that? balancing storage usage between backup sites? is one of them too small for the whole pool?

for now I don't have a better idea, sorry. maybe this is the second best time to think up a structure for the datasets and move everything into it.
but if the reason is the latter, that one backup site can't hold the whole pool, you may need to reorganize it again in the future. and that's not an easy thing, because then you'll have the same data (files of the same category) scattered around the FS tree, even locally. maybe you could ease that with something like mergerfs, having it write each file to the dataset with the lower storage usage.
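A rough idea of the mergerfs part, as a dry run that only prints the mount command. The mount point `/mnt/merged` and the branch paths are hypothetical. `lus` (least used space) is the mergerfs create policy that seems closest to "the dataset with the lower storage usage", but check it against the mergerfs docs for your version.

```shell
#!/bin/sh
# Dry run: prints the mount command only; nothing is mounted.
# Hypothetical paths: mount point /mnt/merged, branches /s/test1 and /s/test2.

plan_merged_view() {
    mountpoint=$1; shift
    # Join the remaining arguments (the branches) with ':'.
    branches=$(IFS=:; echo "$*")
    # category.create=lus: new files go to the branch with the least used space.
    echo "mergerfs -o category.create=lus ${branches} ${mountpoint}"
}

plan_merged_view /mnt/merged /s/test1 /s/test2
```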

if you are ready to reorganize, think about what kinds (and subkinds) of files you are likely to store in larger amounts, like media/video and media/image. don't forget to take advantage of per-dataset storage settings, like compression, recordsize, and maybe caching. not everything needs its own custom recordsize, but a higher value might be better for files that are read contiguously, and also for data that isn't accessed too often where you want a better compression ratio, since compression (and checksumming!) happens per record. video is sometimes compressible, or rather some larger data blob inside the container is
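A small sketch of such a layout, again as a dry run that only prints the commands. The dataset names follow the media/video and media/image example above. The property values (`recordsize=1M`, `compression=lz4`) are illustrative assumptions, not recommendations from this thread.

```shell
#!/bin/sh
# Dry run: prints commands only; nothing is created.
# Property values below are illustrative assumptions.

plan_dataset_layout() {
    pool=$1

    # Parent dataset first, so the children can be created with their own options.
    echo "zfs create ${pool}/media"
    # Large, contiguously read files: bigger records.
    echo "zfs create -o recordsize=1M -o compression=lz4 ${pool}/media/video"
    # Images keep the default recordsize; only compression is set.
    echo "zfs create -o compression=lz4 ${pool}/media/image"
    # Check what the video dataset ended up with.
    echo "zfs get recordsize,compression ${pool}/media/video"
}

plan_dataset_layout s
```

Properties set at creation can be changed later with `zfs set`, but a new recordsize or compression setting only applies to data written after the change.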