view the rest of the comments
Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
-
Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
-
No spam posting.
-
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
-
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
-
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
-
No trolling.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
I use zfs so not sure about others but I thought all cow file systems have deduplication already? Zfs has it turned on by default. Why make your own file deduplication system instead of just using a zfs filesystem and letting that do the work for you?
Snapshots are also extremely efficient on cow filesystems like zfs as they only store the diff between the previous state and the current one so taking a snapshot every 5 mins is not a big deal for my homelab.
I can easily explore any of the snapshots and pull any file from and of the snapshots.
I'm not trying to shit on your project, just trying to understand its usecase since it seems to me ZFS provides all the benefits already
Btrfs does not have its own built in deduplication like zfs does. I’m surprised zfs has it turned on by default, considering file system level deduplication is fairly CPU and RAM intensive. But yeah, if you can use a deduplicated file system, go for it.
In my use case, I’m not willing to move away from ext4 (on my home server, which is where this is running), and I don’t need all files on my file system to be deduplicated, just a set of files that I add to every day. I made this because it fits my use cases better than any other solution (this current use case, and some more I’m planning to implement in the future).
As far as using snapshots to implement my current use case, it’s not possible. My Minecraft server runs on a different system than where I put my backups, and I want it that way. They are meant to be backups, not versions, and backups shouldn’t be stored on the same system. That server has also been migrated several times since I first started running it in 2019. I have back ups that go that far back too. So I need a system that I can put years worth of existing backups into, not just start taking backups now.
Thanks! Makes sense if you can't change file systems.
For what it's worth, zfs let's you dedup on a per dataset basis so you can easily choose to have some files deduped and not others. Same with compression.
For example, without building anything new the setup could have been to copy the data from the actual Minecraft server to the backup that has ZFS using rsync or some other tool. Then the back server just runs a snapshot every 5 mins or whatever. You now have a backup on another system that has snapshots with whatever frequency you want, with dedup.
Restoring an old backup just means you rsync from a snapshot back to the Minecraft server.
Rsync only needed if both servers don't have ZFS. If they both have ZFS, send and recieve commands are built into zfs are are designed for exactly this use case. You can easily send a snap shot to another server if they both have ZFS.
Zfs also has samba and NFS export built in if you want to share the filesystem to another server.
Yeah, that could work if I could switch to zfs. I’m also using the built in backup feature on Crafty to do backups, and it just makes zip files in a directory. I like it because I can run commands inside the Minecraft server before the backup to tell anyone who’s on the server that a backup is happening, but I’m sure there’s a way to do that from a shell script too. It’s the need for putting in years worth of old backups that makes my use case need something very specific though.
In the future I’m planning on making this work with S3 as the blob storage rather than the file system, so that’s something else that would make this stand out compared to FS based deduplication strategies (but that’s not built yet, so I can’t say that’s a differentiating feature yet). My ultimate goal is to have all my Minecraft backups deduplicated and stored in something like Backblaze, so I’m not taking up any space on my home server.