submitted 1 year ago by Gork@beehaw.org to c/technology@beehaw.org

This is something that keeps me up at night. Unlike historical artefacts such as pottery, vellum manuscripts, or stone tablets, information on the Internet can simply blink out of existence when the server hosting it goes offline. That makes it difficult for future anthropologists who want to study our history and document the different Internet epochs. For my part, I always try to submit any news article I read to an archival site (like archive.ph) to help collectively preserve our present so others can still see it in the future.

thejml@lemm.ee 0 points 1 year ago

It’s important here to think about a few large issues with this data.

First, data storage. Other people in here are talking about decentralizing and creating fully redundant arrays, so that multiple copies are always online and can easily be migrated from one storage technology to the next. There's a lot of work here, not just in gathering all the data but in making sure it keeps moving forward as we develop new technologies and storage techniques. This won't be a cheap endeavor, but it's one we should try to keep up with. Hard drives die and bit rot happens. Even powered off, a spinning drive will eventually fail, as will an SSD given enough time. CDs I wrote 15+ years ago aren't 100% readable anymore.
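
One common way to catch the bit rot described above is a fixity check: record a checksum of every file when it is archived, then periodically re-hash everything and compare. A minimal Python sketch, assuming the archive is just a directory tree on disk; the function names (`sha256_of`, `build_manifest`, `audit`) are illustrative and not taken from any particular archiving tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by its relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of root against an earlier manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "corrupted": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "new": sorted(set(current) - set(manifest)),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "page.html").write_text("<p>hello</p>")
        manifest = build_manifest(root)
        (root / "page.html").write_text("<p>hellp</p>")  # simulate a flipped byte
        print(audit(root, manifest))
        # {'missing': [], 'corrupted': ['page.html'], 'new': []}
```

Detecting damage only helps if a good copy exists somewhere else, which is why checks like this are usually paired with the redundant copies people mention: a file flagged as corrupted gets restored from a replica whose hash still matches.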

Second, there's data organization. How do you find what you want later when all you have are disk images, database backups, and static flat files of websites? A lot of sites now require JavaScript and other browser features just to be viewed or used. You'll be left with a flat file full of rendered HTML; can you really still find the one you want? Search boxes won't work, and API calls will fail without the real site up and running. Databases have to be restored before they can be queried, and if they're relational, who will know how to connect those dots?
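
Part of the search problem can be handled after the fact: if all that survives is a folder of rendered HTML, a crude full-text index can be rebuilt offline without the original site's search box or API. A minimal sketch using only Python's standard library; the page names and helper functions are hypothetical, and a real tool would add ranking, stemming, and a persistent index.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def words_in(html: str) -> set[str]:
    """Lowercased alphanumeric words appearing in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower()))


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each word to the set of page names containing it."""
    index: dict[str, set[str]] = {}
    for name, html in pages.items():
        for word in words_in(html):
            index.setdefault(word, set()).add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word in the query (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    pages = {
        "a.html": "<html><body><h1>Bit rot</h1><script>var rot = 1;</script></body></html>",
        "b.html": "<p>Backups and bit flips</p>",
    }
    idx = build_index(pages)
    print(sorted(search(idx, "bit rot")))  # ['a.html']
```

This only recovers what was rendered into the saved HTML; anything a page fetched at runtime through JavaScript or an API is still gone, which is exactly the gap described above.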

Third, formats. This is related to the previous point, but what happens when JPG is deprecated in favor of something better? Can you open a file you wrote in 1985 today? Will there still be a program that can decode it? We'll have to back those programs up as well… along with the OSes they run on. And if there are no processors left that can run those OSes, we'll need emulators. Standards obviously help here: we probably won't forget how to read a PCX, GIF, or JPG file for a while, but more niche formats will definitely fall by the wayside.
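
Part of keeping old formats readable is simply knowing what a file is when its extension is missing or wrong. Many formats start with fixed "magic" bytes, so the first few bytes of a file usually identify it. A minimal Python sketch that recognises the formats mentioned above (plus PNG) from their headers; the signature list is deliberately tiny and the PCX check is only a heuristic, whereas tools such as `file`/libmagic or identifiers built on the PRONOM format registry cover thousands of formats.

```python
def identify_format(header: bytes) -> str:
    """Guess a file's format from its leading magic bytes rather than its extension."""
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker, then another marker
        return "JPEG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    # PCX has no long signature: byte 0 is 0x0A (ZSoft), byte 1 is the version
    # (0, 2, 3, 4 or 5) and byte 2 is the encoding (1 = run-length). Heuristic only.
    if len(header) >= 3 and header[0] == 0x0A and header[1] in (0, 2, 3, 4, 5) and header[2] == 1:
        return "PCX"
    return "unknown"


def identify_file(path: str) -> str:
    """Read just the first 16 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

Identification is only the first step: once a file is known to be, say, PCX, an archive still needs a decoder (or an emulated environment that runs one) to actually render it.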

Fourth, timescale. Can we keep this stuff for 50 years? 100 years? 1,000 years? What happens when our great×30-grandchildren want to find this information? We regularly find things from a few thousand years ago at archaeological dig sites. There's a difference between backing something up for use in a few months and for use in a few years; what about a few hundred or a few thousand? Data storage will be vastly different, as will processors, displays, and so on. … Or what happens in a Horizon Zero Dawn scenario, where all the secrets are locked in a vault of rotting technology that no one knows how to use because we've nuked ourselves into regression?

kool_newt@beehaw.org 0 points 1 year ago

Capitalism has no interest in preservation except where it is profitable. Thinking about the long-term future and the success of future archaeologists, and acting on it, is not profitable.

FuckFashMods@lib.lgbt 1 point 1 year ago

It's not just capitalism lol

Preserving things costs money/resources/time. This happens in a lot of societies.

strainedl0ve@beehaw.org 0 points 1 year ago

This is a very good point and one that is not discussed enough. Archive.org is doing amazing work but there is absolutely not enough of that and they have very limited resources.

The whole internet is far more ephemeral than people realize, and that's concerning in my opinion. Funnily enough, I actually think federation/decentralization might be the solution. A distributed system for backing up the internet, to which anyone can contribute storage and bandwidth, might be the only sustainable approach. I wonder if anyone has thought about it already.
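
The general shape of such a system can be sketched with two ideas that content-addressed projects like IPFS build on: name every blob by the hash of its bytes, and let every node agree, without a central coordinator, on which volunteers should hold copies. A minimal, purely illustrative Python sketch using rendezvous (highest-random-weight) hashing for placement; `Peer`, `store`, and `fetch` are hypothetical in-memory stand-ins, and a real network would also need peer discovery, handling of nodes joining and leaving, and some incentive to keep contributing storage.

```python
import hashlib
from typing import Optional


def content_id(data: bytes) -> str:
    """Name a blob by the SHA-256 of its bytes, so any copy can be verified."""
    return hashlib.sha256(data).hexdigest()


def pick_peers(cid: str, peers: list[str], replicas: int = 3) -> list[str]:
    """Rendezvous hashing: every node ranks peers identically for a given content ID."""
    def score(peer: str) -> int:
        return int.from_bytes(hashlib.sha256(f"{peer}:{cid}".encode()).digest(), "big")
    return sorted(peers, key=score, reverse=True)[:replicas]


class Peer:
    """A volunteer storage node (an in-memory stand-in for a real machine)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: dict[str, bytes] = {}


def store(data: bytes, network: dict[str, Peer], replicas: int = 3) -> str:
    """Copy the blob to the top-ranked peers and return its content ID."""
    cid = content_id(data)
    for name in pick_peers(cid, list(network), replicas):
        network[name].blobs[cid] = data
    return cid


def fetch(cid: str, network: dict[str, Peer], replicas: int = 3) -> Optional[bytes]:
    """Try the responsible peers in rank order, rejecting any copy whose hash doesn't match."""
    for name in pick_peers(cid, list(network), replicas):
        data = network[name].blobs.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    return None
```

Because the ID is derived from the content, two sites archiving the same page end up with the same ID and the data is stored only once, and a corrupted or tampered copy is detected simply by re-hashing it.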

entropicdrift@lemmy.sdf.org 1 point 1 year ago

I'd argue that decentralization can help or hurt, depending on how it's handled. If most sites cache or back up data that's also hosted elsewhere, that's good for both resilience and preservation. But if each piece of data lives only on its home server, then instead of backing up one site we're stuck backing up a thousand, not to mention the potential problems with discovery.

this post was submitted on 15 Jun 2023