jankscripts

joined 1 day ago
[–] jankscripts@lemmy.world 4 points 22 hours ago (2 children)

Keep in mind when looking at the file names: the file name is the name of the first page of the document, and each page in the document is part of the numbering scheme.

EFTA00039025.pdf

EFTA00039026 ...

... EFTA00039152
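
A minimal sketch of that numbering, assuming the names are a fixed `EFTA` prefix followed by an 8-digit zero-padded number (inferred from the example above, not from any official spec). The helper names `parse_page_number` and `page_ids` are made up for illustration.

```python
import re

# Assumed pattern: "EFTA" + 8 digits, with an optional ".pdf" suffix.
_NAME_RE = re.compile(r"^EFTA(\d{8})(?:\.pdf)?$")

def parse_page_number(name: str) -> int:
    """Return the numeric part of a file name or page ID."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected name: {name!r}")
    return int(match.group(1))

def page_ids(first_file: str, last_page: str) -> list[str]:
    """List every page ID in a document, from its file name to its last page."""
    start = parse_page_number(first_file)
    end = parse_page_number(last_page)
    if end < start:
        raise ValueError("last page precedes first page")
    return [f"EFTA{n:08d}" for n in range(start, end + 1)]

pages = page_ids("EFTA00039025.pdf", "EFTA00039152")
print(len(pages), pages[0], pages[-1])
```

For the example above this yields 128 page IDs, so a gap between two consecutive file names does not by itself mean a file is missing.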

[–] jankscripts@lemmy.world 4 points 1 day ago

The last page I got a non-duplicate URL from was 10853, which curiously had only 36 URLs on the page. When I browsed directly to page 10853, 36 URLs were displayed, but after moving back and forth in the page count the tar pit logic must have looped again, and it went back to displaying 50. I ended with 224751 URLs.

[–] jankscripts@lemmy.world 15 points 1 day ago (2 children)

Heads up that the DOJ site is a tar pit: it's going to return 50 files per page regardless of the page number you're on. Right now it seems to just wrap around somewhere between 2k and 5k pages.

Testing page 2000... ✓ 50 new files (out of 50)
Testing page 5000... ○ 0 new files - all duplicates
Testing page 10000... ○ 0 new files - all duplicates
Testing page 20000... ○ 0 new files - all duplicates
Testing page 50000... ○ 0 new files - all duplicates
Testing page 100000... ○ 0 new files - all duplicates
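
The check behind that output can be sketched roughly like this. It is not the actual script: `fetch_page` is a hypothetical callable that returns the file URLs listed on a given page number, and here it is swapped for a fake listing that wraps around after 3 pages so the snippet runs offline.

```python
from typing import Callable, Iterable

def probe_pages(
    fetch_page: Callable[[int], list[str]],
    seen: set[str],
    pages: Iterable[int],
) -> dict[int, int]:
    """Count, per probed page, how many URLs are not already in `seen`."""
    new_counts: dict[int, int] = {}
    for page in pages:
        urls = fetch_page(page)
        new = set(urls) - seen  # URLs not collected on any earlier page
        seen.update(new)
        new_counts[page] = len(new)
        if new:
            status = f"{len(new)} new files (out of {len(urls)})"
        else:
            status = "0 new files - all duplicates"
        print(f"Testing page {page}... {status}")
    return new_counts

# Fake listing for illustration only: 3 real pages of 2 files, then it wraps.
def fake_fetch(page: int) -> list[str]:
    real_page = page % 3
    return [f"file-{real_page}-{i}.pdf" for i in range(2)]

seen_urls: set[str] = set()
counts = probe_pages(fake_fetch, seen_urls, [0, 1, 2, 5, 100])
```

A page with zero new files only shows wrap-around relative to what has already been collected, so the order in which pages are probed matters.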