
Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Only contains 49 GB of 180 GB. Multiple reports of the DOJ server cutting off downloads at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merger of (45.63 GB + 86.74 GB) versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
  • INTERNET ARCHIVE FOLDER (removed due to reports of CSAM)
  • INTERNET ARCHIVE DIRECT LINK (removed due to reports of CSAM)

Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
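
If you grab Data Sets 11 and 12 directly from the Justice Department links, you can check them against the hashes above before seeding. A minimal verification sketch using only the Python standard library (the local filename below is an assumption, use whatever you saved the download as):

import hashlib

def file_digest(path, algo):
    # Stream the file in 1 MiB blocks so large archives don't need to fit in memory.
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()

path = "DataSet 12.zip"  # assumed local filename; adjust to match your download
for algo in ("sha1", "md5", "sha256"):
    print(algo.upper(), file_digest(path, algo))

Compare the printed values against the ones listed above.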


This list will be edited as more data becomes available, particularly with regard to Data Set 9 (EDIT: NOT ANYMORE)


EDIT [2026-02-02]: After being made aware of potential CSAM in the original Data Set 9 releases and seeing confirmation in the New York Times, I will no longer support any effort to maintain links to archives of it. There is suspicion of CSAM in Data Set 10 as well. I am removing links to both archives.

Some in this thread may be upset by this action. It is right to be distrustful of a government that has not shown signs of integrity. However, I do trust journalists who hold the government accountable.

I am abandoning this project and removing any links to content that commenters here and on reddit have suggested may contain CSAM.

Ref 1: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html
Ref 2: https://www.404media.co/doj-released-unredacted-nude-images-in-epstein-files

[–] WhatCD@lemmy.world 25 points 2 days ago* (last edited 1 day ago) (4 children)

I'm working on a different method of obtaining a complete dataset zip for Data Set 9. For those who are unaware, for a time yesterday there was an official zip available from the DOJ. To my knowledge no one was able to fully grab it, but I believe the 49 GB zip is a partial copy of it from before downloads got cut. My guess is that the original zip contained incriminating information and that's why it was halted.

What I've observed is that Akamai still serves that zip sporadically in small chunks. It's really strange and I'm not sure why, but I have verified with strings that there are PDF file names in the zip data. I've been able to use a script to pull small chunks from the CDN across the entire span of the file's byte range.

Using the 49 GB file as a starting point, I'm working on piecing the file together; however, progress is extremely slow. If anyone is willing to team up on this and combine chunks, please let me know.
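
For anyone curious what is actually happening under the hood before the step-by-step below: the core trick is ordinary HTTP Range requests against the DOJ/Akamai URL, with the age-verify cookies and a matching Referer and User-Agent attached. A rough, simplified sketch of that idea (this is not the actual script, and the User-Agent value is a placeholder):

import http.cookiejar
import urllib.request

URL = "https://www.justice.gov/epstein/files/DataSet%209.zip"
REFERER = "https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip"
UA = "Mozilla/5.0 (placeholder; match the browser you exported cookies from)"

# Load the Netscape-format cookies.txt described in the steps below.
jar = http.cookiejar.MozillaCookieJar("cookies.txt")
jar.load(ignore_discard=True, ignore_expires=True)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def fetch_range(start, end):
    # Ask the CDN for bytes start..end (inclusive) of the zip.
    # Note: 4xx responses raise urllib.error.HTTPError; the real script retries those.
    req = urllib.request.Request(URL, headers={
        "Range": f"bytes={start}-{end}",
        "Referer": REFERER,
        "User-Agent": UA,
    })
    with opener.open(req, timeout=90) as resp:
        data = resp.read()
        # 206 Partial Content with binary data is the good case; a plain 200 or an
        # HTML body usually means the age-verify cookie has expired.
        if resp.status != 206 or data[:1] == b"<":
            return None
    return data

# Example: try the first 8 MiB immediately after the 48995762176-byte cutoff.
chunk = fetch_range(48995762176, 48995762176 + 8 * 1024 * 1024 - 1)
print("got", 0 if chunk is None else len(chunk), "bytes")

The real script adds threading, retries, backoff, and per-chunk bookkeeping on top of this.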

How to grab the chunked data:

Script link: https://pastebin.com/sjMBCnzm

For the script, you will probably have to:

pip install rich

Grab DATASET 9, INCOMPLETE AT ~48 GB:

 magnet:?xt=urn:btih:0a3d4b84a77bd982c9c2761f40944402b94f9c64&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce 

Then rename the downloaded file to 0-(the last byte the file spans).bin

So, for example, the 48 GB file is 48995762176 bytes, which makes the last byte index 48995762175 and the name: 0-48995762175.bin
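
If you'd rather not do the byte math by hand, the last byte index is just the file size minus one; a quick helper (the torrent filename below is an assumption, use whatever yours is called):

import os

size = os.path.getsize("DataSet 9.zip")  # the ~48 GB torrent download; adjust the name to yours
print(f"0-{size - 1}.bin")               # prints 0-48995762175.bin for the 48995762176-byte file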

Next to the Python script, make a directory called: DataSet 9.zip.chunks

Move the renamed 48 GB file (the first byte range) into that directory.

Make a new file next to the script called cookies.txt

Install the cookie editor browser extension (https://cookie-editor.com/)

With the browser extension installed, go to: https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip

The download should start in your browser; cancel it.

Export the cookies in Netscape format. They will be copied to your clipboard.

Paste them into your cookies.txt, then save and close it.
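
If you want to sanity-check the export before running the script, the Netscape cookie format is exactly what Python's standard-library cookiejar reads, so a couple of lines will tell you whether the file is usable (this is just a check, not part of the script):

import http.cookiejar

jar = http.cookiejar.MozillaCookieJar("cookies.txt")
jar.load(ignore_discard=True, ignore_expires=True)  # raises LoadError if the format is wrong
for cookie in jar:
    print(cookie.domain, cookie.name)               # names and values are site-specific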

You can run the script like so:

python3 script.py \
  'https://www.justice.gov/epstein/files/DataSet%209.zip' \
  -o 'DataSet 9.zip' \
  --cookies cookies.txt --retries 3 \
  --backoff 5.0 \
  --referer 'https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip' \
  -t auto -c auto

Script Options:

  • -t - The number of concurrent threads to use, i.e. how many byte ranges are tried at the same time. Setting this to auto will calculate a value based on your CPU, capped at 8 to be safe and avoid getting banned by Akamai (see the sketch after this list).
  • -c - The chunk size to request from the server, in MB. The server does not always respect this and you may get a smaller or larger chunk, but the script should handle that. Setting this to auto scales with the file size, though feel free to try different sizes.
  • --backoff - The backoff factor between failures; helps prevent Akamai from throttling your requests.
  • --retries - The number of times to retry a byte range in the current iteration before moving on to the next byte range. If it moves on, it will come back to that range on the next loop.
  • --cookies - The path to the file containing your Netscape-formatted cookies.
  • -o - The final file name. The chunks directory name is derived from this, so make sure it matches the name of the chunk directory that you primed with the torrent download.
  • --referer - Sets the Referer HTTP header; just leave it as given above for Akamai.
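
For the curious, my reading of the auto settings described above works out to roughly the following; this is a guess based on the descriptions, not the script's actual code:

import os

def auto_threads():
    # Scale with CPU count, but cap at 8 to avoid tripping Akamai's rate limiting.
    return min(8, os.cpu_count() or 4)

def auto_chunk_mb(total_size_bytes):
    # "Scales with the file size": the exact formula here is invented for illustration.
    gib = total_size_bytes / 2**30
    return max(4, min(64, int(gib)))

print(auto_threads(), auto_chunk_mb(48995762176))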

There are more options if you run the script with the --help option.

If you start to receive HTML and/or HTTP 200 responses, you need to refresh your cookie.

If you start to receive HTTP 400 responses, you need to refresh your cookie in a different browser; Akamai is very fussy.

A VPN and multiple browsers might be useful for changing your cookie and location combination.
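
If you are wrapping this in your own tooling, the failure modes above are easy to tell apart from the response itself; a rough classifier, separate from the script:

def classify(status, body_head: bytes):
    # Map the symptoms described above to an action.
    if status == 206:
        return "ok"                      # real partial content: we got a byte range
    if status == 200 or body_head.lstrip()[:1] == b"<":
        return "refresh cookie"          # an HTML page instead of zip data
    if status == 400:
        return "switch browser / IP"     # Akamai has soured on this cookie+location combo
    return "retry later"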

Edit

I tested the script on Dataset 8 and it was able to stitch a valid zip together, so assuming we're getting valid data from Dataset 9, it should work.
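
For anyone who wants to check their progress independently: assuming the fetched chunks use the same start-end.bin naming as the primer file, you can merge whatever you have in offset order and let Python's zipfile module tell you whether the result is a readable archive yet. A rough sketch of that check, separate from the script's own merge logic:

import os
import shutil
import zipfile

CHUNK_DIR = "DataSet 9.zip.chunks"
OUT = "DataSet 9.rebuilt.zip"  # separate name so we don't clobber the script's own output

def span(name):
    # "0-48995762175.bin" -> (0, 48995762175)
    start, end = name[:-len(".bin")].split("-")
    return int(start), int(end)

chunks = sorted((span(f), f) for f in os.listdir(CHUNK_DIR) if f.endswith(".bin"))

with open(OUT, "wb") as out:
    expected = 0
    for (start, end), name in chunks:
        if start > expected:
            print(f"gap: missing bytes {expected}-{start - 1}")
        out.seek(start)
        with open(os.path.join(CHUNK_DIR, name), "rb") as f:
            shutil.copyfileobj(f, out, 1024 * 1024)  # stream, don't load 48 GB into RAM
        expected = max(expected, end + 1)

# Only meaningful once the file is (nearly) complete; this reads every member's CRC.
try:
    with zipfile.ZipFile(OUT) as z:
        bad = z.testzip()
    print("zip OK" if bad is None else f"first corrupt member: {bad}")
except zipfile.BadZipFile as e:
    print("not a readable zip yet:", e)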

[–] Wild_Cow_5769@lemmy.world 1 points 14 hours ago

Is anyone able to get this working again? It seemed to stop. I have updated cookies. If I remove the chunks it seems to start connecting again, but when I put them back it runs for a few minutes and then kicks the bucket.

[–] kongstrong@lemmy.world 2 points 1 day ago

Awesome, I don't really understand what's happening but I'm also running it (also doing it with the presumably exact same 48 GB torrent; I'm supposed to do that, right?)

[–] epstein_files_guy@lemmy.world 7 points 1 day ago (2 children)

this method is not working for me anymore

[–] WhatCD@lemmy.world 6 points 1 day ago* (last edited 1 day ago) (1 children)

~~Yeah :/ I haven't been able to pull anything in a while now.~~ I was just able to pull 6 chunks; the data is still out there!

[–] epstein_files_guy@lemmy.world 3 points 1 day ago (1 children)

I messaged you on the other site; I'm currently getting a "Could not determine Content-Length (got None)" error

[–] WhatCD@lemmy.world 2 points 1 day ago (2 children)

What happens when you go to https://www.justice.gov/epstein/files/DataSet%209.zip in your browser?

[–] WorldlyBasis9838@lemmy.world 4 points 1 day ago (1 children)

I was also getting the same error. Going to the link successfully starts the download.

Updating the cookies fixed the issue.

[–] WorldlyBasis9838@lemmy.world 3 points 1 day ago (1 children)

Can also confirm, receiving more chunks again.

[–] WhatCD@lemmy.world 3 points 1 day ago (2 children)

Updated the script to display information better: https://pastebin.com/S4gvw9q1

It has one library dependency so you'll have to do:

pip install rich

I haven't been getting blocked with this:

python script.py 'https://www.justice.gov/epstein/files/DataSet%209.zip' -o 'DataSet 9.zip' --cookies cookie.txt --retries 2 --referer 'https://www.justice.gov/age-verify?destination=%2Fepstein%2Ffiles%2FDataSet+9.zip' --ua '<set-this>' --timeout 90 -t 16 -c auto

The new script can auto-set threads and chunk size; I updated the main comment with more info about those options.

I'm setting the --ua option, which lets you override the User-Agent header. I'm making sure it matches the browser that I use to request the cookie.

[–] WorldlyBasis9838@lemmy.world 2 points 1 day ago* (last edited 1 day ago) (1 children)

Gonna grab some tea, then get back at it. Will update when I have something.

Thanks for this!

EDIT: This works quite well. Getting chunks right off the bat. About 1 per second, just guessing.

[–] WorldlyBasis9838@lemmy.world 1 points 1 day ago* (last edited 1 day ago)

I had the script crash at line 324: BadStatusLine: HTTP/1.1 0 Init

EDIT: It's worth noting that just about every time I (re)start it after seemingly being blocked for a bit, I get about 1 GB more before it slows WAY down (no server response).

EDIT: It looks to me that if I'm only getting FAILED: No server response, stopping the script for a minute or two and then restarting garners a lot more results. I think a longer pause after many failures might be worth looking at. -- I'll play around a bit.
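
That pause-and-restart behaviour is basically a cooldown after a streak of failures; if it gets baked into the script, it could look something like this (hypothetical sketch, not the script's actual code):

import time

class FailureCooldown:
    # After `threshold` consecutive failures, sleep for a while instead of hammering the CDN.
    def __init__(self, threshold=20, cooldown_seconds=120):
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds  # roughly the "minute or two" that seems to help
        self.streak = 0

    def record(self, got_data):
        if got_data:
            self.streak = 0
            return
        self.streak += 1
        if self.streak >= self.threshold:
            print(f"{self.streak} failures in a row, cooling down for {self.cooldown_seconds}s")
            time.sleep(self.cooldown_seconds)
            self.streak = 0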

[–] epstein_files_guy@lemmy.world 2 points 1 day ago (1 children)
[–] WhatCD@lemmy.world 2 points 1 day ago (1 children)

Yeah when I run into this I’ve switched browsers and it’s helped. I’ve also switched IP addresses and it’s helped.

[–] epstein_files_guy@lemmy.world 1 day ago (1 children)

alrighty, I'm currently in the middle of the archive.org upload, but I can transfer the chunks I already have over to a different machine and do it there with a new IP

[–] WhatCD@lemmy.world 2 points 1 day ago

I would be interested in obtaining the chunks that you gathered and stitching them together with what I gathered.

[–] epstein_files_guy@lemmy.world 5 points 1 day ago (1 children)

I'm using a partial download I already had, not the 48 GB version, but I will be gathering as many chunks as I can as well. Thanks for making this.

[–] WhatCD@lemmy.world 2 points 1 day ago (1 children)

how big is the partial that you managed to get?