bad backup vibes there boss? backup was the task?
You either have a backup or will have a backup next time.
Something that is always online and can be wiped while you're working on it (by yourself or with AI, it doesn't matter) shouldn't count as a backup.
He did have a backup. This is why you use cloud storage.
The operator had to contact Amazon Business support, which helped restore the data within about a day.
AI or not, I feel like everybody has had "the incident" at some point. After that, you obsessively keep backups.
For me it was my entire "Junior Project" in college, which was a music album. My Windows install (Vista at the time; I know, Vista was awful, but it was the only thing that would utilize all 8 GB of my RAM, because x64 XP wasn't really a thing) bombed out, and I was like "no biggie, I keep my OS on one drive and all of my projects on the other, I'll just reformat and reinstall Windows."
Well... I had two identical 250gb drives and formatted the wrong one.
Woof.
I bought an unformat tool that was able to recover mostly everything, but I lost all of my folder structure and file names. It was just 000001.wav, 000002.wav, etc. I was able to re-record and rebuild, but man... Never made that mistake again. Like I said, I now obsessively back up. Stacks of drives, cloud storage, drives in different locations, etc.
AI or not, I feel like everybody has had "the incident" at some point. After that, you obsessively keep backups.
Yup!
Also totally unrelated helpful tip- triple check your inputs and outputs when using dd to clone a drive. dd works great to clone an old drive onto a new blank one. It is equally efficient at cloning a blank drive full of nothing but 0s over an old drive that has some 1s mixed in.
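The classic safeguard after a clone is to verify it with checksums. A minimal file-based sketch of the idea (the file names here are invented; with real drives you'd use device paths like `/dev/sdX`, confirmed first with `lsblk -o NAME,SIZE,MODEL` so you know which disk is which):

```shell
# File-based stand-in for a drive clone; source.img / target.img are placeholder names.
printf 'important data' > source.img

# Clone it. With real disks this would be if=/dev/sdX of=/dev/sdY,
# so triple-check both devices before pressing Enter.
dd if=source.img of=target.img bs=4M conv=fsync status=none

# Matching checksums mean a byte-identical copy, and prove you didn't
# just clone a blank target over your source.
sha256sum source.img target.img
```

A mismatch (or an empty target) is your cue that the `if=`/`of=` arguments were swapped.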
And that's a great example where a GUI could be way better at showing you what's what and preventing such errors.
If you're automating stuff, sure, scripting is the way to go, but for one-off stuff like this seeing more than text and maybe throwing in a confirmation dialogue can't hurt - and the tool might still be using dd underneath.
Nobody wants to point out that Alexey Grigorev changes to being named Gregory after 2 paragraphs?
Slop journalism at its sloppiest. I wouldn't be surprised to find out that this story was entirely fabricated.
The developer is to blame: using a cutting-edge tool irresponsibly. I have made mistakes using AI to help with coding as well, though never this bad. Blaming the AI would be like a roofer blaming the hammer after accidentally smashing their finger with it. You don't blame the hammer, you blame the negligence of the roofer.
A developer having the ability to accidentally erase your production db is pretty careless.
An AI agent having the ability to "accidentally" erase your production db is fucking stupid as all fuck.
An AI agent having the ability to accidentally erase your production db and somehow also all the backup media? That requires a special course on complete dribbling fuckwittery.
The lesson: AI cannot bridge an air-gapped backup. This could all have been prevented with a crappy portable hard drive from Costco.
The best prevention is not letting it happen in the first place. If your backup is a crappy portable hard drive from Costco, you get what you pay for; I wouldn't have much faith in that either.
The best prevention is not letting it happen in the first place.
Ya think?! We’re past that.
It was completely unnecessary for you to preemptively assume someone would choose a “crappy” backup from a retail store, when in fact such a backup would still likely have saved the day. Any half-decent dev should at least have some kind of RAID on site, and better yet an offsite backup too.
The flaw was not having any backup, not your straw man of a poor quality choice.
I am still unable to delete the backup. Trying *nuke tool*.
[Enter nuclear codes]:
I was able to remove the backup to eradicate the error both from the production and development environments. But wait a second, the user specified not to lose data. But I just eliminated all versions of the data. The user won't be happy. Oopsie whoopsie!
That's it, Son of Anton is banned.
: You had a backup, right?
We used to say RAID is not a backup. It's redundancy.
Snapshots are not a backup. They're a system restore point.
Only something offsite, off-system, and accessible only with separate authentication details, is a backup.
3-2-1 Backup Rule: three copies of your data, on two different types of storage media, with one copy offsite.
AND something tested to restore successfully, otherwise it's just unknown data that might or might not work.
(i.e. reinforcing your point, no disagreements)
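To make the "tested to restore successfully" part concrete, a minimal sketch (all paths and file names invented for illustration): back up, restore somewhere else entirely, and diff against the original.

```shell
# Invented paths; the point is the restore-and-compare step.
mkdir -p data restore
printf '2.5 years of records\n' > data/db.txt

# 1. Make the backup.
tar -czf backup.tar.gz data

# 2. Restore it to a separate scratch location.
tar -xzf backup.tar.gz -C restore

# 3. Verify: only a clean diff proves the backup is actually usable.
diff -r data restore/data && echo "restore verified"
```

In practice you'd automate this and alert when the diff (or the extraction itself) fails, rather than discovering it mid-disaster.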
AKA Schrödinger’s Backup. Until you have successfully restored from a backup, it is just an amorphous blob of data that may or may not be valid.
I say this as someone who has had backups silently fail. For instance, just yesterday, I had a managed network switch generate an invalid config file for itself. I was making a change on the switch, and saved a backup of the existing settings before changing anything. That way I could easily reset the switch to default and push the old settings to it, if the changes I made broke things. And like an idiot, I didn’t think to validate the file (which is as simple as pushing the file back to the switch to see if it works) before I made any changes.
Sure enough, the change I made broke something, so I performed a factory reset and went to upload that backup I had saved like 20 minutes prior… When I tried to restore settings after the factory reset, the switch couldn’t read the file that it had generated like 20 minutes earlier.
So I was stuck manually restoring the switch’s settings, and what should have been a quick 2 minute “hold the reset button and push the settings file once it has rebooted” job turned into a 45 minute long game of “find the difference between these two photos” for every single page in the settings.
That's always just one of the worst feelings in the world. This thing is supposed to work and be easy and... nope. Not there. It's gone. Now you have work to do. Heh.
Schrödinger’s backup
Fuckin' yes
- Download all assets locally
- proper 3-2-1 of local machines
- duty roster of other contributors with same backups
- automate and have regular checks as part of production
- also sandbox the stochastic parrot
But AI is a good thing! /s
AI is like a circular saw. Are circular saws useful?
Of course.
Can you cut your entire hand off if you don’t use it correctly? Absolutely.
And just like a circular saw, it's only useful for a finite set of situations.
Sure — as with every tool. Hammers are great for many things, but don’t do all that well driving screws. Money is one of the most used tools humans have ever devised, but you can’t use it for everything.
AI in coding may only be good for a finite set of situations, but that set is massive. You're dealing with formal languages whose output can be checked for validity (in the sense that it will produce a program that compiles and runs, not in the sense that the program will in fact function the way the user intends). This is a less open-ended scenario than something like AI-generated video, so it's easier for AI to excel at it, especially for non-novel algorithms.
But if you use it like an idiot, you’re going to get burned — and this guy was an idiot who doesn’t understand what he’s doing, or the tools researchers in software development have made over the last few decades. AI shouldn’t be touching your production environment — at all. And it shouldn’t have to — code needs to be stored in a versioning source repository of some sort (and backed up so you are unlikely to ever lose it), deployment needs to be fully scripted and should be able to rebuild your environments from scratch (from code right to production), and developers and development tools (like AI tools) should only have access to development environments, and not production environments.
So unless you’re a total dumbass, an AI agent (or even a shitty human developer) should never have the kind of access to do what happened here. They violated some pretty basic principles of software development and got burned. This guy sawed his own hand off because he misused the tools to take a bunch of shortcuts, without building in any backups or reproducibility. The AI isn’t the proximal fault here; trusting it when you have no way to reproduce your environment when things go wrong is the problem, and that’s 100% on the human sitting at the keyboard (PEBKAC).
Filters out the biggest fools it seems.
Whoever gave it access to production is a complete moron.
If you've ever used it you can see how easily it can happen.
At first you sandbox it and you're careful. Then after a while the sandbox is a bit of a pain, so you just run it as is. Then it asks for permission a thousand times to do something, and at first you carefully check each command, but after a while you just skim them, and eventually, sure, you can run 'psql *' to debug some query on the dev instance....
It's one of the major problems with the "full self driving" stuff as well. It's right often enough that eventually you get complacent or your attention drifts elsewhere.
This kind of stuff happened before the LLM coding agents existed, they have just supercharged the speed and as a result increased the amount of damage that can be done before it's noticed.
There are already a bunch of failures in place for something like this to happen: having the prod credentials available, etc. It's just that now, instead of rolling the dice every couple of weeks, your LLM is rolling them every 20 seconds.
If you've ever used it you can see how easily it can happen.
How could this happen easily? A regular developer shouldn’t even have access to production outside of exceptional circumstances (e.g. diagnosing a production issue). Certainly not as part of the normal dev process.
They shouldn't and we know that but this is hardly the first time that story has been told even before LLMs. Usually it was blamed on "the intern" or whatever.
This isn’t just an issue with a developer putting too much trust into an LLM though. This is a failure at the organizational level. So many things have to be wrong for this to happen.
If an ‘intern’ can access a production database then you have some serious problems. No one should have access to that in normal operations.
Sure, I'm not telling you how it should be, I'm telling you how it is.
The LLM just increases the damage done because it can do more damage faster before someone figures out they fucked up.
This is the last big one I remembered offhand but I know it happens a couple times a year and probably more just goes unreported.
https://www.cnn.com/2021/02/26/politics/solarwinds123-password-intern
but should serve as a cautionary tale.
Jesus, there's a headline like this every month. How many cautionary tales do people need to learn???
"and database snapshots that Grigorev had counted on as backups" -- yes, this is exactly how you run "production".

Honestly. At this point, after it having happened to multiple people, multiple times, this is the only appropriate response.
Given that the infrastructure description included the DataTalks.Club website, this resulted in a full wipe of the setup for both sites, including a database with 2.5 years of records, and database snapshots that Grigorev had counted on as backups. The operator had to contact Amazon Business support, which helped restore the data within about a day.
Non-story. He let Terraform zap his production site without offsite backups, but then support restored it all.
I'd be more alarmed that a 'destroy' command is reversible.
Whether human, AI, or code, you don't give a single entity this much power in production.