this post was submitted on 31 Jan 2026
191 points (99.5% liked)

datahoarder

9503 readers
161 users here now

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 6 years ago

Epstein Files Jan 30, 2026

Data hoarders on Reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work, with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Contains only 49 GB of the expected 180 GB. Multiple reports of downloads being cut off by the DOJ server at byte offset 48995762176 (see the resume sketch below).

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 6ae129b76fddbba0776d4a5430e71494245b04c4
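
For anyone retrying the DOJ download, here is a minimal resume sketch using an HTTP Range request. The URL and output path are placeholders, and it assumes the DOJ server honors Range headers (the repeated cutoffs suggest results may vary):

```python
import os
import requests

# Placeholder URL; substitute the actual Justice Department download link.
URL = "https://example.justice.gov/dataset9.zip"
OUT = "dataset9.zip"

def resume_download(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Resume a partial download from the current file size via a Range header."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(url, headers=headers, stream=True, timeout=60) as r:
        r.raise_for_status()
        # 206 means the server honored the Range request; a plain 200 means
        # it restarted from byte zero, so overwrite instead of appending.
        mode = "ab" if r.status_code == 206 else "wb"
        with open(path, mode) as f:
            for chunk in r.iter_content(chunk_size):
                f.write(chunk)

resume_download(URL, OUT)
```

Rerunning the script after each cutoff should pick up where the previous attempt stopped.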

/u/susadmin's More Complete Data Set 9 (96.25 GB)
A de-duplicated merge of the 45.63 GB and 86.74 GB versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502


Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
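
Whichever sets you grab, a quick way to check a finished download against the checksums listed above is to stream the file through hashlib. This is a minimal sketch; the file name is a placeholder:

```python
import hashlib

def file_digest(path: str, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a large file in chunks so multi-GB archives never load into RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder file name; the expected value is the SHA256 published above for Data Set 12.
expected = "b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2"
actual = file_digest("dataset12.zip", "sha256")
print("OK" if actual == expected else f"MISMATCH: {actual}")
```

Swap the algorithm argument to "sha1" or "md5" for the sets that only list those digests.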


This list will be edited as more data becomes available, particularly with regard to Data Set 9.

top 50 comments

[–] donmega@lemmy.world 5 points 2 hours ago

The site epsteinfilez.com claims to have the full Data Set 9. I can't find a way to download it directly from them, since the site is only set up for searching. Perhaps if we asked nicely?

[–] ModernSimian@lemmy.world 3 points 4 hours ago

I'm not sure if it's useful to anyone, but the partial Data Set 9 zip from the DOJ website does contain the eDiscovery index files, VOL00009.DAT and VOL00009.OPT, which are conveniently at the very start of the zip. They are text files, and it's easy to parse out which files were supposed to be included in the massive zip. I don't know if anyone has a copy from zero hour, but I have the first few GB saved from the one the CDN occasionally spits out, if anyone wants to see which files may be missing from the "index".
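
If those index files follow the common Concordance load-file format, a sketch like this can pull out the document list. The 0x14 field separator and 0xFE quote character are assumptions from that convention, not confirmed from the DOJ files, so adjust as needed:

```python
import csv

def read_dat(path: str) -> list[list[str]]:
    """Parse a Concordance-style .DAT load file into rows of fields."""
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        # Assumed delimiters: 0x14 separates fields, 0xFE quotes them.
        reader = csv.reader(f, delimiter="\x14", quotechar="\xfe")
        return [row for row in reader if row]

rows = read_dat("VOL00009.DAT")
header, records = rows[0], rows[1:]
print(f"{len(records)} documents listed; fields: {header}")
```

Diffing the IDs listed there against the files actually present in the partial zip would show exactly what got cut off.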

[–] bile@lemmy.world 3 points 4 hours ago* (last edited 4 hours ago) (1 children)

Just advising you that there is confirmed CSAM in dataset9-more-complete.tar.zst, and probably in the other partial Data Set 9 archives.

[–] xodoh74984@lemmy.world 1 points 4 hours ago

This is very concerning. The DOJ has stated explicitly that any CSAM was removed before the files were released. Should I remove the magnet link to the merged Data Set 9 torrent?

I haven't looked inside any of these sets myself. My primary goal has been to get the DOJ data distributed.

[–] berf@lemmy.world 3 points 4 hours ago (1 children)

I’ve been working on a structured inventory of the datasets with a slightly different angle: rather than maximizing scrape coverage, I’m focusing on understanding what’s present vs. what appears to be structurally missing based on filename patterns, numeric continuity, file sizes, and anchor adjacency.

For Dataset 9 specifically, collapsing hundreds of thousands of files down into a small number of high-confidence “missing blocks” has been useful for auditing completeness once large merged sets (like yours) exist. The goal isn’t to assume missing content, but to identify ranges where the structure strongly suggests attachments or exhibits likely existed.

If anyone else here is doing similar inventory or diff work, I'd be interested in comparing methodology and sanity-checking assumptions. No requests for files (yet), just notes on structure and verification.
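
A minimal sketch of that kind of range collapsing, assuming file names carry a single numeric ID (the EFTA pattern below is taken from the example later in this thread):

```python
import re

def missing_blocks(filenames: list[str]) -> list[tuple[int, int]]:
    """Collapse the numeric IDs present into runs and report the gaps between them."""
    nums = sorted({int(m.group(1)) for name in filenames
                   if (m := re.match(r"EFTA(\d+)\.pdf$", name))})
    gaps = []
    for prev, cur in zip(nums, nums[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))  # inclusive range of absent IDs
    return gaps

files = ["EFTA00039025.pdf", "EFTA00039153.pdf", "EFTA00039154.pdf"]
for lo, hi in missing_blocks(files):
    print(f"gap: EFTA{lo:08d}..EFTA{hi:08d} ({hi - lo + 1} IDs)")
```

As the replies below point out, a flagged range is only a candidate: it may be page bundling inside an adjacent PDF rather than missing documents.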

[–] jankscripts@lemmy.world 4 points 4 hours ago (1 children)

Keep in mind when looking at the file names: the file name is the number of the first page of the document, and every page in the document is part of the same numbering scheme.

EFTA00039025.pdf

EFTA00039026 ...

... EFTA00039152

[–] berf@lemmy.world 1 points 4 hours ago (1 children)

Just tested whether numeric gaps represent missing files or page-level numbering. In at least one major Data Set 9 block, the adjacent PDF's page count exactly matches the numeric span, indicating page bundling rather than missing documents. I'm incorporating page counts into the audit model to distinguish the two.

Thanks so much for setting that straight.
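
A minimal sketch of that page-count test, assuming pypdf (any PDF library with a page count works) and the numbering scheme described above:

```python
from pypdf import PdfReader  # third-party; an assumed choice of PDF library

def gap_is_page_bundling(pdf_path: str, doc_id: int, next_id: int) -> bool:
    """True if the PDF's pages exactly fill the numeric span up to the next document."""
    pages = len(PdfReader(pdf_path).pages)
    # Pages of this document occupy IDs doc_id .. doc_id + pages - 1,
    # so a genuine page-bundled gap ends right where the next document starts.
    return doc_id + pages == next_id

# With the example above: EFTA00039025.pdf spanning through EFTA00039152
# has 128 pages, and the next document would start at EFTA00039153.
print(gap_is_page_bundling("EFTA00039025.pdf", 39025, 39153))
```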

[–] ModernSimian@lemmy.world 2 points 4 hours ago

Take a minute to look at the eDiscovery database in the zip, it lays out each page.

[–] TheBobverse@lemmy.world 5 points 5 hours ago (1 children)

Is there any grunt work that needs to be done? I'd like to help out, but I'm not sure how to make sure my work isn't redundant, e.g., looking through individual files. Is there an organized effort to comb through everything?

[–] kongstrong@lemmy.world 7 points 5 hours ago (2 children)

DM me your Matrix account; we're looking for more people to help uncover what's missing from Data Set 9. See https://lemmy.world/post/42440468/21884671

[–] TheBobverse@lemmy.world 3 points 5 hours ago (1 children)

Do you have a recommendation on provider choice?

[–] kongstrong@lemmy.world 3 points 5 hours ago

We're on Element.

[–] TheBobverse@lemmy.world 2 points 5 hours ago (1 children)

I don't have a matrix account currently, but would be willing to get one.

[–] kongstrong@lemmy.world 3 points 5 hours ago
[–] super_user_do@feddit.it 3 points 6 hours ago

Bro is about to be deported by ICE 

[–] Nomad64@lemmy.world 5 points 10 hours ago* (last edited 10 hours ago)

I am seeding sets 1-8, 10-12, and the larger set 9. Seedbox is outside the US and has a very fast connection.

I will keep an eye on this post for other sets. 👍

[–] dessalines@lemmy.ml 7 points 13 hours ago

Thx for posting, seed if you can ppl.
