From watching ArchiveTeam's Warrior URLs stream past, it looks like PubMed Central manuscripts are being archived, which is a good thing.
Yup, 🫡.
Thank you 🤗
Good that people are doing this
That's a lot of data to be archiving! What archiving effort is responsible for this, or which group? I work with SRA and GEO daily, so this is interesting to see on Lemmy.
It looks like ArchiveTeam’s Warrior was mostly capturing PubMed Central (PMC) articles. As far as I know, SRA and GEO aren’t being backed up by ArchiveTeam (that is a lot of data), but since SRA is largely mirrored by ENA, backing it up wouldn’t seem to be a priority.
Didn't know about the ENA mirroring. Thanks! I'm tickled by the idea that all the paywalled journals are not backed up. If we ever have a planet-wide catastrophe, we'll have to rebuild using the open articles only!