datahoarder

9503 readers
236 users here now

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 6 years ago

1

@ray@lemmy.ml Got it done. I'm the first of the mods here and will be learning a little Lemmy over the next few weeks.

While everything is up in the air with the Reddit changes, I'll be very busy working on replacing the historical Pushshift API without Reddit's bastardizations, should a PS version come back.

In the meantime you should all mirror this data to ensure its survival. Do what you do best and HOARD!!

https://the-eye.eu/redarcs/

2

Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Only contains 49 GB of 180 GB. Multiple reports of the DOJ server cutting off downloads at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 6ae129b76fddbba0776d4a5430e71494245b04c4

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merger of (45.63 GB + 86.74 GB) versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
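
For anyone retrying Data Set 9 against the DOJ server, a partial download can in principle be resumed from the reported cutoff with an HTTP Range request rather than starting over. A minimal sketch in Python; the URL is a placeholder for the Justice Department link above, and the server has to honor Range requests for this to work:

```python
import requests

url = "https://www.justice.gov/example/data-set-9.zip"  # placeholder URL
offset = 48995762176  # byte offset where downloads reportedly cut off

# Ask the server for everything from the cutoff onward and append it to
# the partial file. Expect "206 Partial Content" if ranges are supported.
with requests.get(url, headers={"Range": f"bytes={offset}-"},
                  stream=True, timeout=60) as r:
    r.raise_for_status()
    with open("data-set-9.zip", "ab") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```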

Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502


Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2


This list will be edited as more data becomes available, particularly with regard to Data Set 9.
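
Whichever mirror you grab from, it's worth hashing the files against the digests above before seeding. A minimal sketch in Python; the filename is a placeholder, and you pick the algorithm matching the published checksum:

```python
import hashlib

def file_digest(path: str, algo: str = "sha1", chunk: int = 1 << 20) -> str:
    """Hash a large file incrementally so it never has to fit in RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Data Set 11's published SHA1, from the list above.
expected = "574950c0f86765e897268834ac6ef38b370cad2a"
assert file_digest("data-set-11.zip", "sha1") == expected  # placeholder name
```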

3

From YouTube and Google.

I see that the long path limit can be turned off, but not in File Explorer.

Is there a way to enable long paths in File Explorer, or an open-source file manager I could use as a seamless replacement?

I don't care about software incompatibility; I could move some files manually and open them in software that doesn't support long paths.
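
For reference, the switch those videos usually toggle is the LongPathsEnabled registry value, which can also be set programmatically. A minimal sketch in Python, assuming Windows 10 1607+ and an elevated process; note that File Explorer itself is known to ignore this flag (which is exactly the problem above), so it mainly helps long-path-aware applications:

```python
import winreg

# Enable Win32 long-path support system-wide (Windows 10 1607+).
# Requires an elevated (administrator) Python process.
key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
    0,
    winreg.KEY_SET_VALUE,
)
winreg.SetValueEx(key, "LongPathsEnabled", 0, winreg.REG_DWORD, 1)
winreg.CloseKey(key)
```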

4

Hey everyone,

I’m working on archiving a few profiles from Loyalfans, but I’ve hit a wall with their CDN (CloudFront) security and rate-limiting. I’m looking to grab all media (high-res images, GIFs, videos, video thumbnails & audio), but the platform seems particularly hostile to bulk downloading. Has anyone successfully scraped/downloaded a profile on Loyalfans? If yes, how?

The site uses heavily signed URLs with Expires, Signature, and Key-Pair-Id parameters. These seem to be session-bound or very short-lived.

What I’ve tried so far:

  1. Manual "Save As" (Shift + Right Click): Result: Works for the first 10-15 files, then falls apart.
    The Issue: I’m running into what looks like a cache collision or rate limit. After a few downloads, the browser starts saving previously downloaded images at random instead of the new one. It only resolves if I wait 30+ minutes, try again, and then continue in that cycle.

  2. HAR Extraction & Shell Scripting: Result: Partially successful but extremely finicky. The Issue: I’ve been saving .har files from the network tab, then using grep to grab the CDN links. The problem is that the HAR often picks up thumbnails (_md.jpg, _sm.jpg) or pre-fetched neighbor images (see the sketch after this list for a cleaner parse). Furthermore, if I don't run the wget/curl script quickly enough, the signatures expire.

  3. Selenium-based Python Script: Result: Identical to the manual method. The Issue: Even with headless browsing and random delays, the CDN eventually detects the automated behavior and starts serving 403s or throttles the connection, resulting in the same "duplicate image" cache bug.

  4. Vergil9000's Loyalfans Downloader: Link: https://github.com/Vergil9000/LoyalFans Result: Failed completely. I can load a list of profiles I follow, but the actual scraping/downloading logic seems broken or outdated for the current site architecture.
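
On point 2, parsing the HAR as JSON instead of grepping it makes the thumbnail filtering much less finicky. A rough sketch; the _md/_sm suffixes and the extension list are guesses based on the description above:

```python
import json
from urllib.parse import urlparse

# Load a HAR export from the browser's network tab.
with open("loyalfans.har", encoding="utf-8") as f:
    har = json.load(f)

seen = {}
for entry in har["log"]["entries"]:
    url = entry["request"]["url"]
    path = urlparse(url).path
    if "_md." in path or "_sm." in path:
        continue  # skip thumbnail variants
    # Keep only signed full-size media URLs (extensions are guesses).
    if "Signature=" in url and path.endswith((".jpg", ".gif", ".mp4", ".mp3")):
        seen.setdefault(url, None)  # de-duplicate while preserving order

print("\n".join(seen))
```

Since the signatures expire quickly, piping this output straight into the download step is probably still necessary.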

Many thanks for taking the time to read my post. Any help would be greatly appreciated ....

5

Just remembered that I still have my old Twitter account I abandoned a year ago. I think it's high time I delete it fully! But since I have posts on it dating back to 2016, I want to back it all up properly first!

So I was wondering if any of you have good programs/projects you trust for this kind of thing?

Thanks!

6

Mario Builder 64 is a level editor realized fully inside Super Mario 64 itself, and as far as I know it should run on real hardware. It is intuitive to use, and the community has created a ton of custom levels. I think custom software is needed to handle the community side of things, but the ROM hack itself is playable on an emulator if you want to test-build your own levels.

The download page for the patch file (remember, it's not a ROM, just a patch) got hit by a DMCA takedown. Nintendo usually doesn't do that with ROM hacks. The patch files themselves are not ROM files, but they might contain copyrighted data, which may be why Nintendo took issue with this one.

Get your copies of the patch file (.bps format) and archive them if you care.

7

I apologize if this isn't the correct place to ask this. If not, just point me to the right place, please.

As per the title, I am looking into backing up files - pictures, movies, music, some documents - locally, but with little to no need to keep the backup medium hot. I already have a few HDDs lying around, but it reaches a point where it becomes bulky and takes up considerable space.

I've been thinking of memory cards because they are reasonably affordable and can be stored away easily. But how reliable are they?

I intend to make/save several backup copies.

8

Anna's Archive continues its awesome work.

9

cross-posted from: https://beehaw.org/post/23758766

Our project to preserve the history of Sega Channel — including over 100 new Sega Channel ROMs.

By Phil Salvador

December 15, 2025

Sega broke ground in the mid-90s with one of the first digital game distribution systems for consoles. Sega Channel offered access to a rotating library of Sega Genesis titles, along with game tips, demos, and even a few exclusive games that never came out in the United States in any other format. In an era of dial-up internet, Sega Channel delivered game data over television cable — a novel approach that gave the service its name.

...

https://gamehistory.org/segachannel/

10

cross-posted from: https://aussie.zone/post/27191517

I spun up Nextcloud to replace OneDrive about a year ago. Everything was going well, so I chose not to renew my OneDrive subscription; that was exactly six months ago, I'd assume.

I got an email a few days ago reminding me that they would delete my data. I ignored it, because obviously I had moved my data to Nextcloud. Not gonna trick me, Mi¢ro$oft.

But yesterday I decided to have a quick look through, and it turns out I didn't copy over everything, and certainly not my 5 years of camera roll backups.

I started a sync of everything last night and woke up in the morning to find that it had stopped at about 10 GB out of 80 GB. And now OneDrive won't connect, and if I try to log in to OneDrive with that account via the web, it just kicks me back to the Microsoft portal.

I'm 99.5% sure there is nothing to be done, and I'm not an overly sentimental person, so if they are lost it won't break me. I have many important photos backed up in Immich, but just not everything.

But I just needed to ask in case someone knows where to find the M spot I can touch for magic file recovery.

Edit: turns out you can just pay them more money, and they still had my stuff. Thank you for joining me on the shortest support ticket of all time.

11

I currently have a single Seagate IronWolf Pro hard drive, which I've been running in my NAS for about two years. I kind of want to buy two more drives of the same make and capacity and turn them into a software RAID 5 array. Is that a good idea? Do RAID arrays need drives of the same age?

12

Hello.

I have been attempting to find a way to automate the generation of m3u8 URLs from streaming sites which require you to click on the video player to initiate loading the media.

I've found some information about Selenium, but I haven't used it before and haven't had any success, so I'm not sure whether there are other solutions.

I'd considered generating URLs for successive videos based on apparent naming conventions, iterating over them to access one at a time, [figuring out how to automatically initiate the video so the m3u8 requests get made], capturing the m3u8 URL, initiating the download with that URL, and naming each file appropriately with something like yt-dlp's autonumber.

I've figured out and tested options for most of these steps, but I haven't had any luck automating the loading/initiation of the video stream so that the m3u8 requests fire. I'm still doing that step manually.
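
To show the shape of what I'm after for that missing step: click the player, then fish the .m3u8 request URLs out of Chrome's performance log. This is an untested sketch; the page URL and CSS selector are placeholders, and real sites will need their own:

```python
import json
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless=new")
# Ask Chrome to record network activity in its performance log.
opts.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=opts)

driver.get("https://example.com/video/1")  # placeholder page URL
# Click the player to trigger the playlist request.
driver.find_element(By.CSS_SELECTOR, "video, .video-player").click()
time.sleep(5)  # give the player time to request the playlist

for entry in driver.get_log("performance"):
    msg = json.loads(entry["message"])["message"]
    if msg.get("method") == "Network.requestWillBeSent":
        url = msg["params"]["request"]["url"]
        if ".m3u8" in url:
            print(url)  # hand this to yt-dlp or a media player

driver.quit()
```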

My laptop is crazy old and struggles to play video in a browser; it seemingly fills up its memory, and it has crashed before. So I grab the m3u8 URLs to either load into a local media player for streaming or download for later, the latter especially when my internet connection is struggling, as it often does.

Any advice or direction is greatly appreciated.

Thank you very much!

13

14

cross-posted from: https://swg-empire.de/post/4845931

I've had multiple read failures on a fairly new drive.

I ran smartctl -t long /dev/sdb, but when I checked back a few minutes later, smartctl -a /dev/sdb showed that no tests were running and that the previous test had ended with "the read element of the test failed".

I ran smartctl -t offline /dev/sdb next, and after that was done, smartctl -x /dev/sdb showed about 1500 errors, but it also reported the overall SMART health as PASSED.

Here is the output of smartctl -x /dev/sdb: https://pastebin.com/09rNZZfD

How should I interpret these results? Was I wrong to assume the long test had finished? Should I replace the drive? Or might something else be wrong, like the SATA connection?
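
For anyone who wants to look beyond the pastebin, smartctl's JSON mode makes it easy to pull out the handful of attributes that most directly signal a failing drive. A minimal sketch, assuming smartmontools 7.0+ for --json:

```python
import json
import subprocess

# Run smartctl in JSON mode and parse the result.
out = subprocess.run(
    ["smartctl", "-x", "--json", "/dev/sdb"],
    capture_output=True, text=True,
).stdout
data = json.loads(out)

print("Overall SMART health passed:", data["smart_status"]["passed"])
for attr in data.get("ata_smart_attributes", {}).get("table", []):
    # 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector,
    # 198 = Offline_Uncorrectable: nonzero raw values on these usually
    # mean "replace the drive" regardless of the overall PASSED verdict.
    if attr["id"] in (5, 197, 198):
        print(attr["name"], attr["raw"]["value"])
```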

15

cross-posted from: https://lemmy.world/post/37159807

Have fun digging, and please share interesting findings below.

16

If you're archiving a scriptbin.works script URL (or a user profile URL) to the Wayback Machine or elsewhere, append ?__termsofaccessagree=y to it. This skips directly to the actual script, so the actual script is what gets captured.

Important: the creator of scriptbin also told me NOT to use that suffix when sharing script URLs normally, as that would be problematic for scriptbin. In other words, ONLY use the ?__termsofaccessagree=y suffix for archiving purposes.

Now that you know this, if someone else asks you about it, DON'T just comment "Append ?__termsofaccessagree=y" and walk away. Be a good steward of the internet and mention the warning above along with your comment.

To reiterate: DO NOT use that suffix for regular, normal sharing of scriptbin URLs. The suffix is only for archiving purposes.
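
For example, a throwaway sketch of what an archive request might look like, assuming the Wayback Machine's public save endpoint and a made-up script URL:

```python
import urllib.request

script_url = "https://scriptbin.works/scripts/example"  # hypothetical URL
# Append the suffix ONLY for the archive request, never for sharing.
save_url = "https://web.archive.org/save/" + script_url + "?__termsofaccessagree=y"

with urllib.request.urlopen(save_url, timeout=120) as resp:
    print(resp.status, resp.geturl())  # final snapshot URL on success
```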

Have fun archiving.
Cheers!

17

MESA, AZ—Gleefully describing the inevitable day when society would collapse and digital files would become unusable, local physical media collector David Campbell confirmed Wednesday he was “absolutely pumped” for the downfall of humanity. “When it all goes down, there’s only going to be one place to watch the Tomb Raider movies in their entirety with all the deleted scenes, and that’s going to be my bunker,” said Campbell, his eyes reportedly shining as he described how the end of organized society and the dissolution of government would make his cherished stockpile of Blu-rays even more valuable.

“No one will be mocking the CDs I’m still holding onto when the internet goes dark forever and the only way to listen to music is through boom boxes we trade canned goods for. And I’m definitely one of the only people who has a region-free DVD player and all three seasons of Father Ted plus the Christmas special, so I’ll essentially be a king. I can’t wait.” At press time, Campbell was grinning as he purchased the 50th anniversary edition of Jaws in 4K, which he anticipated would give him full control over the drinking water supply in the event of a nuclear winter situation.

18

19

20

cross-posted from: https://lemmy.sdf.org/post/40623875

Hello. A few friends and I are attempting to back up every file from AndroidFileHost, and we need some help doing so.

For those who haven't heard of it, AndroidFileHost is a website that hosts various Android-related files. It's one of the last surviving large Android file-hosting sites, and it holds a LOT of rare files, especially for older Android devices (RIP d-h.st). Despite being such a valuable site, it hasn't been well maintained for the past few years. Their Xitter account's last update is from around 2022, and the owner isn't replying to any e-mails. The site has been extremely unstable, with various issues; most recently, no file could be downloaded from it at all for about a month. Luckily, that has been (kind of) solved for now, and most files (not all; about 20% are still gone) are back online. However, it's clear this site needs a backup.

I have scraped their website, which gives us the unique ID and MD5 hash for every file available on the site. Using these IDs, we can automate the process of requesting mirror links, downloading files, and checking their integrity. (Please check an example file to understand how their system works -- https://androidfilehost.com/?fid=745425885120701975 )

The sum of all known file sizes is roughly 180 TB. It's impossible to download this on a single machine, so I've developed a "tracker" system to download multiple files concurrently on different machines. The tracker server keeps a list of every known file ID (by the way, that's 256,640 files, a bit fewer than the 277,467 displayed on their main page; I believe the difference includes deleted files, but I'm not sure at the moment), assigns IDs to each client that requests them, and marks files as downloaded when appropriate. The system is pretty robust now, so our plan is working great, except that our internet is pretty slow and we can't pull down 180 TB quickly.
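
To give a sense of what each helper machine runs, here's a hypothetical sketch of the client side of that loop; the /claim and /done endpoints and the field names are stand-ins for the real tracker, and AFH's actual mirror-link request flow involves extra steps not shown:

```python
import hashlib

import requests

TRACKER = "https://tracker.example.org"  # stand-in for the real tracker

while True:
    # Ask the tracker for the next unclaimed file ID.
    job = requests.get(f"{TRACKER}/claim", timeout=30).json()
    if not job:
        break  # nothing left to assign

    # Download the file while hashing it on the fly.
    md5 = hashlib.md5()
    with requests.get(job["mirror_url"], stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(f"{job['fid']}.bin", "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
                md5.update(chunk)

    # Report back so the tracker can mark the ID done or hand it out again.
    ok = md5.hexdigest().lower() == job["md5"].lower()
    requests.post(f"{TRACKER}/done", json={"fid": job["fid"], "ok": ok},
                  timeout=30)
```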

By talking to friends and their friends, we've found quite a few people willing to help a bit. Unfortunately, many of them lack storage space, so they have to keep downloading from AFH and uploading to my server. This works for a few clients, but not for many: the server every client uploads to has a 500 Mbps connection, and it gets terribly slow pretty quickly. Plus, 180 TB of storage isn't exactly cheap or easy to afford.

Ideally, we need people with faster internet speeds (I'm in Asia, so not the best place to fetch files from AFH's servers, which are mostly in Europe and America) and more storage space. If you have some bandwidth or storage to share, it would help us greatly.

I'm sorry if a post like this isn't welcomed here, if so please feel free to remove it. Thanks for reading this post.

P.S. Also worth checking out - related XDA thread https://xdaforums.com/t/did-anyone-else-notice-signs-of-androidfilehost-com-being-abandoned.4578561/ (I'm LegendOcta)

21

I am upgrading the HDDs on my QNAP TS-432X-eU rack mount NAS. The NAS is connected to a UPS via a USB cable and is set to turn off after 5 minutes if it senses a power loss. What would happen if I were to lose power while resilvering the array? Would it suspend the resilvering, turn off, then resume when power is restored? Or would the array be corrupted?

22

I'm looking to spec out a new NAS. I have a relatively small media collection that I hope to grow as I digitize more family VHS tapes and the like. Right now I have around 4 TB of data, shared across an external drive and my internal SSD.

What's the best path forward on drives for this new NAS? I've heard advice for buying one big 20 TB drive over multiple smaller drives. What's best for mitigating drive failure? Is that even a concern? If I do multiple drives, should I use RAID?

I'm a little new to this. If you have resources for learning some best practices I'm all ears.

23

Anyone used this successfully in their setup?

Garage is an S3-compatible distributed object storage service designed for self-hosting at a small-to-medium scale.

Garage is designed for storage clusters composed of nodes running at different physical locations, in order to easily provide a storage service that replicates data at these different locations and stays available even when some servers are unreachable. Garage also focuses on being lightweight, easy to operate, and highly resilient to machine failures.

Garage is built by Deuxfleurs, an experimental small-scale self-hosted service provider, which has been using it in production since its first release in 2020.
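
I haven't tried it myself, but since it speaks the S3 API, wiring it into existing tooling should look like any other S3 endpoint. A sketch with boto3; the credentials and bucket are placeholders, and 3900 is the S3 port from Garage's docs:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:3900",  # Garage's documented S3 port
    region_name="garage",                  # Garage's default region name
    aws_access_key_id="GK...placeholder",
    aws_secret_access_key="placeholder",
)

# Round-trip a file through a Garage bucket like any other S3 store.
s3.upload_file("backup.tar.zst", "my-bucket", "backup.tar.zst")
print(s3.list_objects_v2(Bucket="my-bucket")["KeyCount"])
```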

24

Looking to build a collection that I just outright own, so any streaming platform that doesn't let me download the raw files is a no-go. Other than the big players (Amazon, Walmart, etc.), what are some good sources for buying?

25

Looking to upgrade my NAS hard drives. I currently have two 4 TB WD Red Plus hard drives, but I wanted to get some larger-capacity drives; I was looking into 16 or 18 TB. My current drives are basically whisper quiet and have been running great since 2019, but I feel like it's time to upgrade the capacity.

The NAS currently sits on a desk beside my computer. I don't have any cabinets to put it in and would prefer not to connect to it over Wi-Fi, hence I'd like the drives to be as quiet as possible.

I was considering a Seagate Exos or IronWolf (buying used for the great price), but I've read users online saying they regret buying those models because of their noise. I was also looking at the WD Red Pro, but WD's own website rates them at only 3.6/5, with most of the negative complaints about dead-on-arrival drives; additionally, 25% of all reviews are 1 star. Neither fills me with much confidence.

TLDR: What's a quiet and reliable hard drive recommendation for a NAS?

Would it be better just to go with the WD Red Plus at a lower capacity?
