this post was submitted on 17 Feb 2026
30 points (96.9% liked)

Asklemmy

A loosely moderated place to ask open-ended questions


I've found that all the web archiving software I've encountered is either manual (you have to archive everything individually in a separate application) or crawler-based (which can end up putting a lot of extra load on smaller web servers, and could even get your IP blocked).

Are there any solutions that simply automatically archive web pages as you load them in your browser? If not, why aren't there?

I could also see something like that being useful as a self-hosted web indexer, where if you ever go "I think I've seen this name before", you can click on it, and your computer will say something like "this name appeared in a news headline you scrolled past two weeks ago"

all 14 comments
[–] detonational_VuSE@lemmy.ml 1 points 2 hours ago

wget is the command line program to do what you're saying. Or what I use, anyway. Not tied to a browser, though. Maybe you could export your history and pipe it into wget if you're using Linux or have a Linux-like command line?
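A rough sketch of that history-to-wget idea, in Python rather than a shell pipe. It assumes a Firefox-style history database (a `moz_places` table with `url` and `last_visit_date` in microseconds); the function names are made up for illustration, and the wget flags are the standard GNU wget ones. Firefox may keep `places.sqlite` locked while it's running, so work on a copy.

```python
import sqlite3
from pathlib import Path


def visited_urls(history_db: str, since_us: int = 0) -> list[str]:
    """Return http(s) URLs from a Firefox-style history database.

    Assumes a `moz_places` table with `url` and `last_visit_date`
    (microseconds since the Unix epoch) columns, as in places.sqlite.
    """
    con = sqlite3.connect(history_db)
    try:
        rows = con.execute(
            "SELECT url FROM moz_places "
            "WHERE last_visit_date >= ? AND url LIKE 'http%' "
            "ORDER BY last_visit_date",
            (since_us,),
        ).fetchall()
    finally:
        con.close()
    return [url for (url,) in rows]


def write_url_list(urls: list[str], url_file: str) -> None:
    """Write one URL per line, the format wget's --input-file expects."""
    Path(url_file).write_text("\n".join(urls) + "\n", encoding="utf-8")


def wget_command(url_file: str, out_dir: str) -> list[str]:
    """Build a wget invocation that saves each listed page with its assets."""
    return [
        "wget",
        "--input-file", url_file,
        "--directory-prefix", out_dir,
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # add .html where the server omitted it
        "--wait", "1",         # pause between requests to stay polite
    ]
```

You'd call `write_url_list(visited_urls("places-copy.sqlite"), "urls.txt")` and then hand `wget_command("urls.txt", "archive")` to `subprocess.run`. Note this re-downloads every page, so it's still a second fetch rather than saving what the browser already loaded.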

I also use the FF SingleFile plugin. Again, not automatic, though.

[–] triplenadir@lemmygrad.ml 2 points 6 hours ago

There's a Firefox extension which makes a full-text index of every page you visit - it seems to work, but I found the search a bit unreliable so I stopped using it: https://addons.mozilla.org/en-US/firefox/addon/full-text-tabs-forever/
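For what it's worth, the core of a full-text index like that is small. Here's a toy in-memory sketch (not how that extension works internally, which I haven't looked at; `PageIndex` and its methods are names I made up) that maps each word to the pages it appeared on and when you saw them:

```python
import re
from collections import defaultdict
from datetime import datetime


class PageIndex:
    """Toy in-memory full-text index of visited pages."""

    def __init__(self) -> None:
        # word -> {url: time the page was last seen containing that word}
        self._postings: dict[str, dict[str, datetime]] = defaultdict(dict)

    def add(self, url: str, text: str, seen_at: datetime) -> None:
        """Record every distinct word of a page's text under its URL."""
        for word in set(re.findall(r"\w+", text.lower())):
            self._postings[word][url] = seen_at

    def search(self, query: str) -> list[tuple[str, datetime]]:
        """Return (url, seen_at) for pages containing every query word, newest first."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        hits = [self._postings.get(w, {}) for w in words]
        urls = set(hits[0]).intersection(*hits[1:])
        return sorted(((u, hits[0][u]) for u in urls), key=lambda h: h[1], reverse=True)
```

A real version would need persistent storage and better tokenizing, but `search("some name")` returning a URL plus the date you saw it is basically the "you scrolled past this two weeks ago" feature from the post.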

[–] g5pw@feddit.it 3 points 8 hours ago

Maybe offpunk could fit? I’ve never used it but I read the blog post about it

[–] artifex@piefed.social 13 points 13 hours ago (1 children)

Huh. This seems like one of those "this must exist" situations, but I can't think of anything that does this, and a brief search suggests there may not be. The closest I could find was The Internet Archive's Archive-IT, though it's not an exact match. Otherwise, Archive Webpage, a pricey paid-for option (which seems like a terrible idea), appears to be the closest. OSS/self-hosted options like Archivebox and Linkwarden don't really do this (though you can save/send a current tab to them), and apart from that... I don't really see anything.

[–] TropicalDingdong@lemmy.world 6 points 13 hours ago

Yeah, this is exactly what I was thinking, that "surely this must already be a thing"?

But yeah. I can't think of anything. I mean, it's like, you're already downloading the data. Just write it down somewhere else.

[–] Arcane2077@sh.itjust.works 4 points 10 hours ago (2 children)

Check out archive warrior. It’s dead simple to set up on Docker, and will run in the background while you help literally save the internet. Ignore the steps about watchtower, as that has been deprecated.

[–] triplenadir@lemmygrad.ml 2 points 6 hours ago

Does archive warrior have a way of downloading pages as you visit them in a browser? I read the link but I only saw references to following Archive Team's tasks.

[–] jlow@discuss.tchncs.de 3 points 9 hours ago (1 children)

Wait, watchtower is deprecated? Noooooo.

[–] jlow@discuss.tchncs.de 3 points 9 hours ago

https://github.com/containrrr/watchtower/discussions/2135

Dang. It's still working fine for me for now (just like that long deprecated trailer downloader for the arrs).

[–] Aethr@lemmy.world 3 points 11 hours ago

The Firefox extension for archive.org has an option to archive the page you visit if said page hasn't been archived recently. It's not exactly what you're asking for, but similar.
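I don't know how the extension decides what counts as "recently", but the same kind of check can be sketched against the Wayback Machine's public availability endpoint, which returns the closest snapshot as JSON. The function names and the `max_age` threshold below are my own assumptions:

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the lookup URL for the closest existing snapshot of a page."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def needs_archiving(response_body: str, now: datetime, max_age: timedelta) -> bool:
    """Decide from the endpoint's JSON reply whether the page lacks a fresh snapshot."""
    closest = json.loads(response_body).get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return True  # no snapshot at all
    # Snapshot timestamps use the form YYYYMMDDhhmmss, in UTC.
    taken = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return now - taken > max_age
```

You'd fetch `availability_url(...)` with whatever HTTP client you like, pass the body to `needs_archiving`, and only submit the page for saving when it returns `True`.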

[–] bizarroland@lemmy.world 6 points 13 hours ago* (last edited 13 hours ago) (1 children)

I think a squid proxy can do something like that, or could be tweaked to do that, if you really wanted to.

https://www.squid-cache.org/
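This isn't Squid configuration, just a sketch of the "write it down somewhere else" half, assuming some intercepting proxy or extension hands you each URL and response body. The on-disk layout and the `store_response` name are invented for illustration. One caveat: a plain forward proxy can't see the contents of HTTPS pages unless you set up TLS interception.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def store_response(root: Path, url: str, body: bytes, fetched_at: datetime) -> Path:
    """Write one response body plus a small metadata file; return the body path."""
    # Hash the URL so any URL maps to a safe, fixed-length directory name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    folder = root / key[:2] / key
    folder.mkdir(parents=True, exist_ok=True)

    # One file pair per visit, so repeat visits keep separate versions.
    body_path = folder / f"{stamp}.body"
    body_path.write_bytes(body)
    meta = {"url": url, "fetched_at": stamp, "bytes": len(body)}
    (folder / f"{stamp}.json").write_text(json.dumps(meta), encoding="utf-8")
    return body_path
```

Keying by URL hash and timestamp means revisiting a page adds a new version instead of overwriting the old one, which is the behavior you'd want from an archive as opposed to a cache.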

[–] mesamunefire@piefed.social 3 points 11 hours ago

How interesting. I've never seen this before.

[–] NauticalNoodle@lemmy.ml 3 points 12 hours ago* (last edited 12 hours ago)

Web pages used to sort of operate that way with the 'Temporary Internet Files' folder. I'm not sure how it's changed, I just know this was how I used to circumvent websites that disabled right-clicking to save their images.