
Epstein Files Jan 30, 2026

Data hoarders on Reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work, with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Contains only 49 GB of the 180 GB total; multiple reports of the DOJ server cutting off downloads at byte offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 6ae129b76fddbba0776d4a5430e71494245b04c4
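
For anyone hitting the same cutoff: a server that supports HTTP range requests can be asked to resume at the current file size instead of restarting. A minimal sketch in Python -- the URL is a placeholder for the DOJ link above, and whether that server actually honors Range headers is unconfirmed given the cutoff reports:

```python
# Minimal sketch: resume a cut-off download with an HTTP Range request,
# assuming the server honors ranges (unconfirmed for the DOJ server).
import os
import requests

url = "https://www.justice.gov/.../data-set-9.zip"  # hypothetical path
dest = "data-set-9.zip"

offset = os.path.getsize(dest) if os.path.exists(dest) else 0
headers = {"Range": f"bytes={offset}-"}             # continue at our offset

with requests.get(url, headers=headers, stream=True, timeout=60) as r:
    if r.status_code == 206:                        # 206 = Partial Content
        with open(dest, "ab") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    else:                                           # 200 would restart at byte 0
        print(f"Server did not honor Range (HTTP {r.status_code})")
```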

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merge of the 45.63 GB and 86.74 GB versions (see the sketch below)

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
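
For anyone reproducing a merge like the one above, a minimal hash-based de-duplication sketch. Directory names are hypothetical; each unique file content is copied exactly once:

```python
# Merge two partial scrapes into one de-duplicated set, keyed by SHA1.
import hashlib
import shutil
from pathlib import Path

def sha1_of(path: Path) -> str:
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

seen: set[str] = set()
merged = Path("dataset9_merged")
merged.mkdir(exist_ok=True)

for source in (Path("dataset9_45gb"), Path("dataset9_86gb")):
    for f in sorted(source.rglob("*")):
        if f.is_file() and (digest := sha1_of(f)) not in seen:
            seen.add(digest)
            # Flattens the tree; EFTA-style names are unique, so no collisions.
            shutil.copy2(f, merged / f.name)
```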

Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502


Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
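
Whichever set you grab, the checksums above can be verified with a few lines of Python; the file name, algorithm, and expected digest are passed on the command line:

```python
# Verify a download against one of the checksums listed above.
# Usage: python verify.py dataset12.zip sha256 b5314b7e...
import hashlib
import sys

def file_digest(path: str, algo: str) -> str:
    h = hashlib.new(algo)                       # "md5", "sha1", or "sha256"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    path, algo, expected = sys.argv[1:4]
    actual = file_digest(path, algo)
    print("OK" if actual.lower() == expected.lower() else f"MISMATCH: {actual}")
```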


This list will be edited as more data becomes available, particularly with regard to Data Set 9.

[–] berf@lemmy.world 3 points 9 hours ago (3 children)

I’ve been working on a structured inventory of the datasets with a slightly different angle: rather than maximizing scrape coverage, I’m focusing on understanding what’s present vs. what appears to be structurally missing based on filename patterns, numeric continuity, file sizes, and anchor adjacency.

For Dataset 9 specifically, collapsing hundreds of thousands of files down into a small number of high-confidence “missing blocks” has been useful for auditing completeness once large merged sets (like yours) exist. The goal isn’t to assume missing content, but to identify ranges where the structure strongly suggests attachments or exhibits likely existed.

If anyone else here is doing similar inventory or diff work, I'd be interested in comparing methodology and sanity-checking assumptions. No requests for files (yet), just notes on structure and verification.
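
For concreteness, the gap-collapsing step looks roughly like this (directory name hypothetical; a gap here is only a candidate, since, as noted below, numbering can be page-level rather than per-file):

```python
# Reduce the EFTA numbers present on disk to a short list of candidate
# missing ranges.
import re
from pathlib import Path

nums = sorted(
    int(m.group(1))
    for p in Path("dataset9").rglob("EFTA*.pdf")
    if (m := re.fullmatch(r"EFTA(\d+)\.pdf", p.name))
)

gaps = [
    (prev + 1, cur - 1)
    for prev, cur in zip(nums, nums[1:])
    if cur - prev > 1  # numbers skipped between two adjacent files
]

for start, end in gaps:
    print(f"EFTA{start:08d} .. EFTA{end:08d}  ({end - start + 1} numbers)")
```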

[–] jankscripts@lemmy.world 4 points 8 hours ago (2 children)

Keep in mind when looking at the file names: the file name is the number of the first page of the document, and every page in the document is part of the numbering scheme.

EFTA00039025.pdf   (the document; its first page is EFTA00039025)
EFTA00039026       (page 2)
...
EFTA00039152       (final page; 128 pages in total)

[–] berf@lemmy.world 1 points 8 hours ago (1 children)

Just tested whether numeric gaps represent missing files or page-level numbering. In at least one major Dataset 9 block, the adjacent PDF's page count exactly matches the numeric span, indicating page bundling rather than missing documents. I'm incorporating page counts into the audit model to distinguish the two.
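
A minimal sketch of that page-count test -- pypdf is an assumption (any PDF library exposing a page count works), and the flat directory layout is hypothetical:

```python
# Compare each file's page count against the numeric distance to the next file.
import re
from pathlib import Path
from pypdf import PdfReader

docs = sorted(Path("dataset9").glob("EFTA*.pdf"))
for this_doc, next_doc in zip(docs, docs[1:]):
    start = int(re.search(r"\d+", this_doc.name).group())
    next_start = int(re.search(r"\d+", next_doc.name).group())
    span = next_start - start          # numeric distance to the next file
    pages = len(PdfReader(this_doc).pages)
    if pages == span:
        status = "page bundling"       # span fully explained by page count
    elif pages < span:
        status = f"{span - pages} numbers unaccounted for"
    else:
        status = "overlap? check manually"
    print(f"{this_doc.name}: {pages} pages, span {span} -> {status}")
```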

Thanks so much for setting that straight.

[–] ModernSimian@lemmy.world 2 points 8 hours ago

Take a minute to look at the eDiscovery database in the zip; it lays out each page.
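
A rough sketch of skimming that database for the page-to-document mapping. This assumes a CSV-style load file; productions often ship Concordance-style .dat files instead, so adjust the dialect, and the column names are assumptions -- inspect the header row first:

```python
# Print the assumed begin/end page identifiers and page count per document.
import csv

with open("dataset9_loadfile.csv", newline="", encoding="utf-8-sig") as f:
    for row in csv.DictReader(f):
        # Column names are guesses; check what the actual load file uses.
        print(row.get("BegBates"), row.get("EndBates"), row.get("PageCount"))
```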