datahoarder

10348 readers

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 6 years ago

MODERATORS

archivist@lemmy.ml

Lots of backups of NCBI / NLM data going on at the moment? (lemmy.sdf.org)

submitted 1 year ago by pansapiens@lemmy.sdf.org to c/datahoarder@lemmy.ml

Just noticed this today. It seems all the archiving activity has caught the attention of NCBI / NLM staff. Thankfully, most of SRA (the Sequence Read Archive) and other genomic data is also mirrored in Europe.

pansapiens@lemmy.sdf.org 13 points 1 year ago

From watching ArchiveTeam's Warrior URLs as they stream past, it looks like PubMed Central manuscripts are being archived, which is a good thing.

VillainousKittyQueen@lemmy.ml 3 points 1 year ago

Yup, 🫡.

brbposting@sh.itjust.works 1 points 1 year ago

Thank you 🤗

HK65@sopuli.xyz 7 points 1 year ago

Good that people are doing this.

taiidan@slrpnk.net 2 points 1 year ago

That's a lot of data to be archiving! Which archiving project or group is responsible for this? I work with SRA and GEO daily, so this is interesting to see on Lemmy.

pansapiens@lemmy.sdf.org 3 points 1 year ago

It looks like ArchiveTeam’s Warrior was mostly capturing PubMed Central (PMC) articles. As far as I know, SRA and GEO aren’t being backed up by ArchiveTeam (that is a lot of data), but since SRA is largely mirrored by ENA (the European Nucleotide Archive), it wouldn’t seem to be a priority.

taiidan@slrpnk.net 1 points 1 year ago

Didn't know about ENA mirroring. Thanks! I'm tickled by the idea that none of the paywalled journals are being backed up. If we ever have a planet-wide catastrophe, we'll have to rebuild using the open articles only!