datahoarder

10281 readers
1 users here now

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 6 years ago
MODERATORS
1
 
 

@ray@lemmy.ml Got it done, I'm first of the mods here and will be learning a little Lemmy over the next few weeks.

While everything is up in the air with the reddit changes, I'll be very busy working on replacing the historical pushshift API without reddit's bastardizations, should a PS version come back.

In the meantime you should all mirror this data to ensure its survival. Do what you do best and HOARD!!

https://the-eye.eu/redarcs/

2
 
 

https://archive.org/details/snes_mods_and_romhacks_collection_20260312_patched

My personal collection of Super Nintendo Romhacks, in an already-patched and ready-to-play ROM format. Most (if not all) games were patched by myself, but not all are tested. Each .sfc and .smc file comes with a description, copied from the places where I downloaded the Romhacks (sometimes also from README files, random blogs and other websites).

  • 1009 Romhacks across 174 different games (or 169, depending on how you count).
  • Download size (one package): 406 MB
  • Unpacked size: 2.7 GB

flat structure: snes_mods_and_romhacks_collection_20260312_patched_flat.7z

         snes_mods_and_romhacks_collection_20260312/
            Super Metroid_Nature v1.03.smc
            Super Metroid_Nature v1.03.txt

or sub structure: snes_mods_and_romhacks_collection_20260312_patched_sub.7z

            Super Nintendo Mods and Romhacks Collection 2026-03-12/
                Documents/
                    Super Metroid/
                        Nature v1.03.txt
                Games/
                    Super Metroid/
                        Nature v1.03.smc

Both archives contain the same files, just in a different folder structure.
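If you grab one of the archives, a quick sanity check after extraction is to confirm the ROM count and that every ROM still has its description sidecar. A minimal sketch (the function name is mine; point it at the extracted directory from the listing above):

```python
from pathlib import Path

def inventory(root: str) -> dict:
    """Count ROMs under `root` and list any missing .txt descriptions."""
    roms = [p for p in Path(root).rglob("*")
            if p.suffix.lower() in (".sfc", ".smc")]
    missing = [p.name for p in roms if not p.with_suffix(".txt").exists()]
    return {"roms": len(roms), "missing_descriptions": missing}
```

Run against `snes_mods_and_romhacks_collection_20260312/`; per the numbers above you'd expect 1009 ROMs and an empty missing list.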

3
 
 
  1. I like the incremental backup methodology, but it seems to require frequent full backups (as I understand it from the "grandfather, father, son" method). How can I have version control where I create a full backup only once?

  2. And I'd like the option to delete changes older than 1 month.

  3. How do I back up only selected data, like only personal data,

  4. and keep a "Ghost" for the rest? A Ghost is only the filename and its metadata (plus the folder structure). Data selected for Ghost comes from the internet and can be re-downloaded.

Related

  1. A "file change tracker" to see a summary of which files were moved/deleted/renamed.

  2. "File History", where I can see previous versions of files.

https://restic.net/ Pika - https://www.youtube.com/watch?v=W30wzKVwCHo https://www.borgbackup.org/

https://www.urbackup.org/download.html ??
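Restic (linked above) and Borg are both "incremental forever": after the first snapshot there is never another full backup, yet every snapshot restores like a full one, which answers question 1 without grandfather/father/son scheduling. A sketch of questions 1-3 and the "Ghost" idea; the repository and source paths are made up:

```shell
export RESTIC_REPOSITORY=/mnt/backup/restic-repo   # assumption: repo location
export RESTIC_PASSWORD=change-me                   # demo only; use a password file
if command -v restic >/dev/null 2>&1; then
    restic init                              # run once; every later backup is incremental
    restic backup ~/Documents ~/Pictures     # Q3: back up only selected data
    restic forget --keep-within 1m --prune   # Q2: drop versions older than 1 month
    restic snapshots                         # "File History": list every version
fi
# Q4's "Ghost": a manifest of names + sizes + dates for re-downloadable data
[ -d ~/Downloads ] && find ~/Downloads -type f \
    -printf '%p\t%s\t%TY-%Tm-%Td\n' > ghost-manifest.tsv || true
```

`restic diff <snapshot-A> <snapshot-B>` then covers the "file change tracker" item, listing files added, removed and modified between two versions.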

4
 
 

I want to have a Windows image but have it saved incrementally. Is there a way to back up only the data created by me, so the backup stays small, and pull the Windows OS data from an ISO?

It would be nice to back up only "user data".
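One common approach is to skip imaging the OS entirely (reinstall it from ISO when needed) and mirror just the user profile, which re-runs incrementally. A Windows-only sketch; both paths are assumptions:

```shell
SRC='C:\Users\me'        # assumption: your profile
DST='D:\Backups\me'      # assumption: backup target
# /MIR mirrors (incremental on re-runs), /XJ skips junction loops,
# /FFT tolerates coarse FAT timestamps, /R:1 /W:1 keeps retries short.
command -v robocopy >/dev/null 2>&1 \
  && robocopy "$SRC" "$DST" /MIR /XJ /FFT /R:1 /W:1 \
  || echo "robocopy is Windows-only"
```

This gives you small, user-data-only backups; it does not version files by itself, so pair it with a snapshotting tool if you need history.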

5
6
submitted 2 weeks ago* (last edited 2 weeks ago) by tdTrX@lemmy.ml to c/datahoarder@lemmy.ml
 
 

spayee/graphy course

The webpage has a sidebar with categories and sub-categories, and each entry opens just a PDF.

PDF files are stored here - https://randomlettersandnumbers.cloudfront.net/w/o/randomLettersAndNumbers/v/randomLettersAndNumbers/u/randomLettersAndNumbers/p/assets/pdfs/2021/01/13/randomLettersAndNumbers/file.pdf
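If you can collect the PDF URLs (e.g. from the browser's network tab), a small downloader is enough, since the files appear to be plain CloudFront objects. A sketch; the name-mangling keeps the last few path segments so identically named files from different folders don't collide:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def local_name(url: str, outdir: str = "pdfs") -> str:
    """Map a CDN URL to a flat local path using its last path segments."""
    parts = urlparse(url).path.strip("/").split("/")
    return os.path.join(outdir, "_".join(parts[-3:]))

def grab(urls: list[str]) -> None:
    """Fetch each PDF sequentially, skipping files already on disk."""
    os.makedirs("pdfs", exist_ok=True)
    for u in urls:
        dest = local_name(u)
        if not os.path.exists(dest):   # resumable: re-run after interruptions
            urlretrieve(u, dest)
```

Sequential fetching with no parallelism is deliberately polite to the CDN.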

7
 
 

https://myrient.erista.me/ - main site

This is arguably the best site ever made for this kind of preservation. And they are shutting down because of insufficient funding and increased hardware prices. They have full sets for No-Intro, Redump, TOSEC, MAME, RetroAchievements-supported games, exo sets and lots of important coverage from good Internet Archive sources. All of this with direct downloads, no ads, super fast. Everything neatly organized and always available.

Either people start donating fast, or it's gone. I recommend downloading what you need as fast as possible. It's closing in about a month, on March 31st, 2026.
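Since everything is direct-download, a polite recursive mirror of the directory you care about works. A sketch; the exact subdirectory is an assumption (browse the site for the set you need), and setting RUN=1 actually starts it:

```shell
URL="https://myrient.erista.me/files/"   # assumption: append the set you want
if [ "${RUN:-0}" = "1" ] && command -v wget >/dev/null 2>&1; then
    # --mirror recurses with timestamping, --no-parent stays inside the set,
    # --continue resumes partial files, --wait=1 throttles politely.
    wget --mirror --no-parent --continue --wait=1 "$URL"
fi
```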

8
 
 

It's time. This is not a test. Download, or let it be destroyed.

9
 
 

The drive has only been powered on and used for reads over the last 3+ years. CrystalDiskInfo reports it as bad, but CrystalDiskMark shows decent read/write speeds. I only wrote to it in the very beginning, when I dumped a lot of archives onto it. Otherwise, very few actual write cycles, which makes me think it's still OK to use. However, this isn't a NAS drive; it's consumer-grade, bought many years ago.
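The two tools can disagree because SMART tracks reallocated and pending sectors while a benchmark only touches a small, possibly healthy, region of the platters. Before trusting the drive, it may be worth reading the raw attributes and running a long self-test; a Linux-side sketch with a placeholder device node:

```shell
DEV=/dev/sdX    # assumption: substitute your actual drive
if command -v smartctl >/dev/null 2>&1 && [ -b "$DEV" ]; then
    smartctl -a "$DEV"        # check Reallocated_Sector_Ct and Current_Pending_Sector
    smartctl -t long "$DEV"   # full surface self-test; read results later with -a
fi
```

A nonzero pending-sector count on a drive you still write to is the usual "retire it now" signal.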

10
11
 
 

cross-posted from: https://lemmy.world/post/43115555

Here’s an overview of community efforts to make The Files more accessible. I’ve written a small description and possible warnings alongside them.

Epstein Research GitHub Mirror

Jmail

  • Access Jeffrey Epstein’s emails through a gmail interface and star important ones.
  • https://jmail.world/

Track The Files

  • A sourced, transparent investigation into the public figures named in the Epstein files — and the tax dollars that flow to them.
  • ⚠️ Made with LLMs
  • https://trackthefiles.org/

Epstein Document Network Explorer

EpsteIn

3D Network Cloud

Epstein Archive


Please add more sources as comments, or let us know if one of them has gone dark or appears to be dodgy.

12
 
 

cross-posted from: https://lemmy.ml/post/43038910

I see a lot of fragmented datasets out there. Does anyone know of something comprehensive (e.g. all files from all datasets) that is annotating the files and accepting submissions?

13
 
 

If you merge the three versions of DataSet 9 that are found so far:

DataSet%209.zip : https://github.com/yung-megafone/Epstein-Files

Data Set 9.tar.xz : https://archive.org/details/data-set-9.tar.xz

dataset9-more-complete.tar.zst : https://github.com/yung-megafone/Epstein-Files

You will end up with 531,282 IMAGES files (PDFs). You would think that a lot is missing; however, the partially corrupted DataSet%209.zip gives us a DAT and an OPT file from which we can see which files there should be.

The DAT file reveals there are supposed to be only 531,307 IMAGES files (PDFs) in the archive, which means only 25 PDF files are actually missing.

You'll notice that 25 PDF files couldn't possibly account for the roughly 80 GB still missing from the original DataSet 9, but the DAT file doesn't reveal how many NATIVES there were.

NATIVES are media files like videos and audio. You can see examples if you have a full DataSet 10, which reveals that every NATIVE has a PDF placeholder that is always 4670 bytes.

So by searching for all files of that exact size, we find that about 135 NATIVES (media files) are missing, which would account for the rest of the missing 80 GB.
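The size-based search can be reproduced in a few lines; 4670 bytes is the placeholder size established above, and the directory name is whatever you extracted the set into:

```python
from pathlib import Path

PLACEHOLDER = 4670  # bytes: the PDF stub that stands in for a NATIVE

def find_placeholders(root: str) -> list[Path]:
    """Return every PDF whose size matches the NATIVE placeholder exactly."""
    return sorted(p for p in Path(root).rglob("*.pdf")
                  if p.stat().st_size == PLACEHOLDER)
```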

I have listed below what IMAGES (PDF) and NATIVES (media) files are missing, such that it is easy to coordinate to track down the remaining files that we need for a complete DataSet 9.

(Though the remaining PDFs could be placeholders for up to 25 more NATIVES, which would have to be checked when finding them.)

Update 1 (February 6):

In my original post (https://lemmy.world/post/42700643), I found that NATIVEs have a placeholder that is 4670 bytes.

However, by comparing every NATIVE in DataSet 10 to its placeholder, I have discovered a second placeholder size: 2433 bytes.

The NATIVEs estimate is now 2542 (from previous 135).

I have attached the updated NATIVEs list, and also the same list of 25 missing IMAGES (since those could also be NATIVE placeholders).

NEW_MISSING_EFTA_NATIVES.txt

MISSING_EFTA_IMAGES.txt

Update 2 (February 6):

I have found that 1983 of the 2542 NATIVEs are directly downloadable from the DOJ.

1983_NATIVES_URLS.txt

If anyone wants to attempt the remaining natives, I have tried the following extensions: ".avi",".mp4",".mov",".mp3",".wav",".m4a",".m4v",".wmv",".ts",".vob",".3gp",".amr",".opus",".csv",".xlsx",".xls",".docx",".doc",".pluginpayloadattachment"
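Probing for the remaining NATIVEs amounts to trying each extension against the DOJ URL pattern until one answers. A sketch; `candidates` and `probe` are my names, and the base-URL shape is an assumption, so wire in the real pattern from 1983_NATIVES_URLS.txt:

```python
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

# The extension list from the post above.
EXTS = [".avi", ".mp4", ".mov", ".mp3", ".wav", ".m4a", ".m4v", ".wmv", ".ts",
        ".vob", ".3gp", ".amr", ".opus", ".csv", ".xlsx", ".xls", ".docx",
        ".doc", ".pluginpayloadattachment"]

def candidates(base_url: str) -> list[str]:
    """One candidate URL per known extension."""
    return [base_url + ext for ext in EXTS]

def probe(url: str, timeout: float = 10.0) -> bool:
    """HEAD the URL; True means a file exists at that extension."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as r:
            return r.status == 200
    except (HTTPError, URLError):
        return False
```

HEAD requests keep the probing cheap; only download once `probe` returns True.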

14
 
 

Soft 98 is an Iranian software distribution site that stood up after sanctions crippled the ability of ordinary people and businesses in Iran to get access to important software from the outside world.

As the Iranian government threatens to cut the country off from the world, this rich archive of software is vulnerable to being wiped from the internet. It is one of the most diverse trusted software pools I have ever seen.

Is there any way to pool resources together to save this site's software, which to me is like a Software Library of Alexandria, from permanent loss?

15
 
 

Sorry if this is not the place to ask I also tried on a different instance as well

I bought an adapter to retrieve old files from ancient hard drives, and I didn't save the stuff from one I had looked at. Now, though, when I plug it in, it will only read as an Android file system. It shows 2 disk images now: one is labeled Presario D:, which shows up as an Android backup or something, but all its folders are empty. The other is Local Disk E:, and if I click it, it just locks up my File Explorer to the point that I have to restart the PC.

Any thoughts or ideas?

I may have plugged it into an android phone at some point? Not sure though.

16
17
 
 

Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5: (61.5 MB) TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Only contains 49 GB of 180 GB. Multiple reports of cutoff from DOJ server at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merger of (45.63 GB + 86.74 GB) versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

Epstein Files Data Set 10 (78.64GB)

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
  • INTERNET ARCHIVE FOLDER (removed due to reports of CSAM)
  • INTERNET ARCHIVE DIRECT LINK (removed due to reports of CSAM)

Epstein Files Data Set 11 (25.55GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2


This list will be edited as more data becomes available, particularly with regard to Data Set 9 (EDIT: NOT ANYMORE)


EDIT [2026-02-02]: After being made aware of potential CSAM in the original Data Set 9 releases and seeing confirmation in the New York Times, I will no longer support any effort to maintain links to archives of it. There is suspicion of CSAM in Data Set 10 as well. I am removing links to both archives.

Some in this thread may be upset by this action. It is right to be distrustful of a government that has not shown signs of integrity. However, I do trust journalists who hold the government accountable.

I am abandoning this project and removing any links to content that commenters here and on reddit have suggested may contain CSAM.

Ref 1: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html
Ref 2: https://www.404media.co/doj-released-unredacted-nude-images-in-epstein-files

18
 
 

From Youtube and Google.

I see that the long-path limit can be turned off, but not in File Explorer.

Is there a way to enable long paths in File Explorer, or an open-source file-manager replacement that handles them seamlessly?

I don't care about software incompatibility; I could move some files manually and open them in software that doesn't support long paths.
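The system-wide switch is the LongPathsEnabled registry value; note that applications must also opt in via their manifests, which is why File Explorer itself may still refuse long paths. A Windows-only sketch for an elevated prompt:

```shell
KEY='HKLM\SYSTEM\CurrentControlSet\Control\FileSystem'
command -v reg.exe >/dev/null 2>&1 \
  && reg.exe add "$KEY" /v LongPathsEnabled /t REG_DWORD /d 1 /f \
  || echo "Windows-only: run in an elevated prompt"
```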

19
 
 

Hey everyone,

I’m working on archiving a few profiles from Loyalfans, but I’ve hit a wall with their CDN (CloudFront) security and rate-limiting. I’m looking to grab all media (high-res images, GIFs, videos, video thumbnails & audio), but the platform seems particularly hostile to bulk downloading. Has anyone successfully scraped/downloaded a profile on Loyalfans? If yes, then how?

The site uses heavily signed URLs with Expires, Signature, and Key-Pair-Id parameters. These seem to be session-bound or very short-lived.

What I’ve tried so far:

  1. Manual "Save As" (Shift + Right Click): Result: Works for the first 10-15 files, then falls apart.
    The Issue: I’m running into what looks like a cache collision or rate limit. After a few downloads, the browser starts saving previously downloaded images instead of the new one. It only resolves if I wait 30+ minutes, try again, and then continue in this cycle.

  2. HAR Extraction & Shell Scripting: Result: Partially successful but extremely finicky. The Issue: I’ve been saving .har files from the network tab, then using grep to grab the CDN links. The problem is that the HAR often picks up thumbnails (_md.jpg, _sm.jpg) or pre-fetched neighbor images. Furthermore, if I don't run the wget/curl script quickly enough, the signatures expire.

  3. Selenium-based Python Script: Result: Identical to the manual method. The Issue: Even with headless browsing and random delays, the CDN eventually detects the automated behavior and starts serving 403s or throttles the connection, resulting in the same "duplicate image" cache bug.

  4. Vergil9000's Loyalfans Downloader: Link: https://github.com/Vergil9000/LoyalFans Result: Failed completely. I can load a list of profiles I follow, but the actual scraping/downloading logic seems broken or outdated for current site architecture.
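For approach 2, the thumbnail noise can at least be filtered mechanically before the signatures expire. A sketch that reads a saved HAR and keeps only full-resolution CloudFront URLs; the suffix pattern matches the `_md`/`_sm` variants mentioned above and may need extending:

```python
import json
import re

def media_urls(har_path: str) -> list[str]:
    """Extract full-res CloudFront URLs from a saved HAR file,
    dropping the _md/_sm thumbnail variants."""
    with open(har_path) as f:
        har = json.load(f)
    urls = {e["request"]["url"] for e in har["log"]["entries"]}
    thumb = re.compile(r"_(md|sm)\.(jpe?g|png)", re.I)
    return sorted(u for u in urls
                  if "cloudfront" in u and not thumb.search(u))
```

Pipe the result straight into your fetcher in the same session, since the Expires/Signature parameters are short-lived.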

Many thanks for taking the time to read my post. Any help would be greatly appreciated ....
20
 
 

Just remembered that I still have my old Twitter account, which I abandoned a year ago. I think it's high time I delete it fully! But since I have posts on it dating back to 2016, I want to back it all up properly!

So I was wondering if any of you have good programs/projects for this kind of thing that you trust?

Thanks!

21
 
 

Mario Builder 64 is a level editor realized fully in Super Mario 64 itself, which I think should run on real hardware. It is intuitive to use, and the community has created a ton of custom levels. I think custom software is needed to handle the community side, but the Romhack itself is playable on an emulator if you want to test building your own levels.

The download page for the patch file (remember, it's not a ROM, just a patch) got hit by a DMCA. Usually Nintendo does not do that with Romhacks. Sure, the patch files themselves are not ROM files, but they might contain data that is copyrighted; that may be why Nintendo is annoyed by this.

Get your copies of the patch file (.bps format) and archive them if you care.
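Applying an archived .bps later is a one-liner with Floating IPS (flips); the filenames below are assumptions, and you supply your own legally dumped base ROM:

```shell
PATCH="mario_builder_64.bps"     # assumption: the archived patch file
BASE="sm64_clean.z64"            # assumption: your own clean base ROM
OUT="mario_builder_64.z64"
command -v flips >/dev/null 2>&1 && [ -f "$BASE" ] \
  && flips --apply "$PATCH" "$BASE" "$OUT" \
  || echo "needs flips and a clean base ROM"
```

BPS patches checksum the base ROM, so a bad dump fails loudly instead of producing a broken game.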

22
 
 

I apologize if this isn't the correct place to ask this. If not, just point me to the right place, please.

As per the title, I am looking into backing up files - pictures, movies, music, some documents - locally, but with little to no need to keep the backup medium hot. I already have a few HDDs lying around, but it reaches a point where it becomes bulky and takes up considerable space.

I've been thinking of memory cards because they are reasonably affordable and can be stored away easily. But how reliable are those?

I intend to make/save several backup copies.
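Whatever medium you pick (unpowered flash can lose data over years of cold storage, which is worth weighing against HDDs), storing a checksum manifest with every copy makes bit rot detectable on each verify pass. A runnable sketch, with a throwaway directory standing in for a real backup:

```shell
BACKUP_DIR=$(mktemp -d)                        # stand-in for one backup copy
echo "family photo" > "$BACKUP_DIR/photo.txt"  # stand-in for real data
cd "$BACKUP_DIR"
# Build the manifest once, then re-run the -c check periodically on every copy.
find . -type f ! -name SHA256SUMS -print0 | xargs -0 sha256sum > SHA256SUMS
sha256sum -c --quiet SHA256SUMS && echo "copy verified"
```

With several copies, a failed check on one card tells you exactly which files to re-copy from a sibling.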

23
 
 

Anna's Archive continues its awesome work.

24
 
 

cross-posted from: https://beehaw.org/post/23758766

Our project to preserve the history of Sega Channel — including over 100 new Sega Channel ROMs.

By Phil Salvador

December 15, 2025

Sega broke ground in the late 90s with one of the first digital game distribution systems for consoles. Sega Channel offered access to a rotating library of Sega Genesis titles, along with game tips, demos, and even a few exclusive games that never came out in the United States in any other format. In an era of dial-up internet, Sega Channel delivered game data over television cable — a novel approach that gave the service its name.

...

https://gamehistory.org/segachannel/

25
submitted 3 months ago* (last edited 3 months ago) by tuff_wizard@aussie.zone to c/datahoarder@lemmy.ml
 
 

cross-posted from: https://aussie.zone/post/27191517

I spun up Nextcloud to replace OneDrive about a year ago. Everything was going well, so I chose not to renew my OneDrive subscription; that was exactly 6 months ago, I'd assume.

I got an email a few days ago reminding me that they would delete my data. I ignored it because obviously I had moved my data to Nextcloud. Not gonna trick me, Mi¢ro$oft.

But yesterday I decided to have a quick look through, and it turns out I didn't copy over everything, and certainly not my 5 years of camera-roll backups.

I started a sync of everything last night and woke up in the morning to find that it had stopped at about 10 GB out of 80 GB. And now OneDrive won't connect, and if I try to log in to OneDrive with that account via the web, it just kicks me back to the Microsoft portal.

I'm 99.5% sure there is nothing to be done and I'm not an overly sentimental person so if they are lost it won't break me. I have many important photos backed up in immich but just not everything.

But I just needed to ask in case someone knows where to find the M spot I can touch for magic file recovery.

Edit: turns out you can just pay them more money and they still had my stuff. Thank you for joining me on the shortest support ticket of all time.
