The Trump-Epstein Files™

We keep track of the release of the files and explore what’s already available, and why, with enough exposure, this could bring the man down and, who knows, even his regime or the empire.

Want to start digging yourself? Check out our sticky post

Our Rules

(Subject to Change)

#1 Be kind: keep it civil and amicable. The enemy is not in this community but in Palaces, The White House and penthouses.
#2 Trigger Warnings: required. Mark posts which may be triggering to read or see for victims of sexual abuse with "[TW]" in front of your post title. If you're posting an image or video with an explicit thumbnail, you will have to set the entire post as NSFW AND include the TW.
#3 Cite sources: preferably a direct link to the article/PDF and/or an archive link in case the article is paywalled. Find a few relevant paragraphs in the article and quote them in your post.
#4 Post the Bates numbers: when referencing evidence (even if it is mentioned in an article), please post
- the Bates number of the file you’re referring to (EFTA00000000)
- a link to the original source
- a link to a mirror of the file in case DOJ pulls the file.
#5 Include key paragraphs: when posting articles, include a few key paragraphs of the article (not the entire article).
#6 Avoid links to social media as sources. Links to Twitter must use xcancel.com.

Our Justice System

First offence: warning + 2 day ban
Second offence: 7 day ban
Third offence: permanent ban from community
Creating multiple accounts to interact with this community: permanent ban for all accounts in community + report to your instance admin.

This community is run by volunteers, so please don't test the justice system; as with all justice systems, it is critically underfunded.

founded 7 months ago

MODERATORS

kingofras@lemmy.world

epstein_bot@lemmy.world

Does anyone know of full lists of the files? (lemmy.ml)

submitted 1 month ago by untitled_backer@lemmy.ml to c/Epsteinfiles@lemmy.world

14 comments

I see a lot of fragmented datasets out there. Does anyone know of something comprehensive (e.g. all files from all datasets), run by someone who is annotating the files and accepting submissions?

[–] untitled_backer@lemmy.ml 0 points 1 month ago (1 children)

Good link, thank you, that's awesome! I'm going to use that

[–] TropicalDingdong@lemmy.world 1 points 1 month ago (1 children)

Good luck. I've had some issues with the torrents, FYI, especially Datasets 9, 10, 11, and 12.

[–] untitled_backer@lemmy.ml 1 points 1 month ago (1 children)

Trouble?

[–] TropicalDingdong@lemmy.world 2 points 1 month ago (1 children)

Not unzipping, corrupt, all kinds. I've had to redownload many times. Tried several magnets. Extracting issues.

It's just a shit ton of data. 9 is still in a haphazard state.
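For catching corrupt downloads before a long extraction run, here is a minimal sketch, assuming the archives are plain .zip files. The function name `check_zip` is made up for illustration and is not part of any existing tool in this thread.

```python
import zipfile
import zlib


def check_zip(path):
    """Return (ok, detail) for one archive without extracting it."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the name of the
            # first one that fails its CRC check, or None if all pass.
            bad = zf.testzip()
    except (zipfile.BadZipFile, OSError, EOFError, zlib.error) as exc:
        # Truncated or non-zip files usually fail here, on open or read.
        return False, f"unreadable: {exc}"
    if bad is not None:
        return False, f"CRC mismatch in {bad}"
    return True, "ok"
```

Running this over each downloaded archive should at least separate "bad download" from "bad extraction tool". It reads every byte, so it will be slow on the larger datasets.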

[–] untitled_backer@lemmy.ml 1 points 1 month ago (1 children)

Have you seen this?

https://github.com/yung-megafone/Epstein-Files

[–] TropicalDingdong@lemmy.world 2 points 1 month ago (1 children)

I.. I posted that?

[–] untitled_backer@lemmy.ml 0 points 1 month ago (1 children)

... So you did. I guess I was responding to your comment, forgot to read the thread. That's embarrassing. Not sure why you had problems. Are you still having trouble?

[–] TropicalDingdong@lemmy.world 2 points 1 month ago (1 children)

I got a version of 9 done. I think it's the most up-to-date version, but I'm not sure yet. This is... it's an extraordinary amount of data.

I've got a 42 TB NAS and a processing machine on which I can run machine learning models locally with up to 128 GB of VRAM.

I'm trying to use datashare (https://datashare.icij.org/) to organize/index the records, but it's been a bit of a disaster. I'm constantly having to restart/rebuild the docker container because it gets into a bad/hung state.

I had originally planned to just build a Postgres database to index the records and support analysis, but thought datashare could simplify this. It hasn't been good. It hangs when indexing documents. Also, none of the plug-ins seem to really work, but I appreciate it as an open-source concept.

I suppose I could just be hitting the .justice files directly, but that seems problematic for several other reasons. First, document integrity: I don't trust them. Second, tracking: they'll almost certainly be able to reverse engineer a list of who is examining this data.
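On the integrity point, one option is to keep a local hash manifest so that any later copy (a mirror, a re-download, a fresh torrent) can be compared against what was first pulled. This is only a sketch: the function names and the idea of one manifest per dataset directory are assumptions, not something any of the existing repos necessarily do.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old, new):
    """Return (changed, missing, added) relative paths between two manifests."""
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    missing = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return changed, missing, added
```

A manifest built once per dataset and saved next to the data would make silent replacements or pulled files show up as `changed` or `missing` on the next comparison.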

All in all, I have the gear to do this homelab style, I have the analysis expertise, and even though I shouldn't, I can put time into doing aspects of this.

I still need to follow up on datasets 11 and 12, but I've got 1-10 extracted and indexed.

I could use help though, if nothing else, to have a conversation partner. Right now I've been focused almost exclusively on getting the data onto the stacks and figuring out an indexing solution. Beyond that, I've poked around the .justice site and, while listening to podcasts, have pursued some keywords. But those don't compose a coherent analytical framework.

I did this a while back: https://codeberg.org/sillyhonu/Image_OCR_Processing_Epstein

Now these have been OCR'd already, but honestly, it's kinda shit, and there are some real gaps in these data. In the codeberg example, I did that with rented compute. If I use my machine, I can push MUCH harder.

My thinking first is to try and collate entities, emails, phone numbers, IPs, and addresses. Then separately, dates and times. Right now one of the most difficult challenges with these data is an inability to sort by time. I'd like to address that.
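As a rough first pass at the collation step, something like the following could pull candidates out of the OCR text. Everything here is an assumption for illustration: the regexes are deliberately simple, the phone pattern only covers US-style numbers, the date formats are three guesses at what the documents contain (`%B` also depends on an English locale), and the Bates pattern just follows the EFTA00000000 shape from the community rules.

```python
import re
from datetime import datetime

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
BATES_RE = re.compile(r"\bEFTA\d{8}\b")
DATE_RE = re.compile(
    r"\b(?:\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2}|[A-Z][a-z]+ \d{1,2}, \d{4})\b"
)
# Assumed formats; US-style month/day order is a guess.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")


def parse_date(text):
    """Try each assumed format; return a date, or None if none fit."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def extract(text):
    """Collect de-duplicated, sorted candidates from one document's text."""
    parsed = (parse_date(m) for m in DATE_RE.findall(text))
    return {
        "emails": sorted({m.lower() for m in EMAIL_RE.findall(text)}),
        # Drop dotted quads with any octet above 255.
        "ipv4": sorted(
            {m for m in IPV4_RE.findall(text)
             if all(int(part) <= 255 for part in m.split("."))}
        ),
        "phones": sorted(set(PHONE_RE.findall(text))),
        "bates": sorted(set(BATES_RE.findall(text))),
        "dates": sorted({d for d in parsed if d is not None}),
    }
```

Storing the parsed dates next to each document's Bates number would give a first sortable timeline, with the caveat that a date appearing in a document is not necessarily the date of the document.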

My thinking is to build out a Postgres DB of these. This is going to require some fuzzy matching for partial reads. We can assume ALL OCR is going to fail to some degree.
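To make the fuzzy matching idea concrete, here is a small sketch using only Python's standard library. The helper names and the 0.85 threshold are arbitrary assumptions, and `difflib` is a stand-in for whatever matcher ends up being used, not a recommendation.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so layout noise doesn't count."""
    return " ".join(value.lower().split())


def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(candidate, known, threshold=0.85):
    """Return (known_value, score) for the closest entry, or None."""
    scored = [(similarity(candidate, k), k) for k in known]
    if not scored:
        return None
    score, value = max(scored)
    return (value, score) if score >= threshold else None
```

For example, an OCR misread such as `jane.d0e@examp1e.com` would resolve to `jane.doe@example.com` if that address is already in the known list. This compares every candidate against every known value, so it will not scale to the full corpus as-is. If the matching moves into Postgres, the pg_trgm extension offers trigram similarity that could play a similar role, though I haven't tested it against this data.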

Another reason to take a fuzzy matching approach would be to try and in-fill/de-anonymize the redactions. There are enough flaws and faults in the manner of the redactions that, when you get some sets of documents, you can effectively infer and fill in what should go in the gaps.

Anyways. What would be extremely helpful would be to have some conversations on how to approach this.

[–] ChunkMcHorkle@lemmy.world 1 points 1 month ago

This is amazing work. Thank you for taking the time and trouble.