Ask Lemmy
A Fediverse community for open-ended, thought-provoking questions
Rules:
1) Be nice and have fun
Doxxing, trolling, sealioning, racism, and toxicity are not welcome in AskLemmy. Remember what your mother said: if you can't say something nice, don't say anything at all. In addition, the site-wide Lemmy.world terms of service also apply here. Please familiarize yourself with them.
2) All posts must end with a '?'
This is sort of like Jeopardy. Please phrase all post titles in the form of a proper question ending with a '?'.
3) No spam
Please do not flood the community with nonsense. Actual suspected spammers will be banned on sight. No astroturfing.
4) NSFW is okay, within reason
Just remember to tag posts with either a content warning or a [NSFW] tag. Overtly sexual posts are not allowed; please direct them to either !asklemmyafterdark@lemmy.world or !asklemmynsfw@lemmynsfw.com.
NSFW comments should be restricted to posts tagged [NSFW].
5) This is not a support community.
It is not a place for 'how do I?'-type questions.
If you have any questions regarding the site itself or would like to report a community, please direct them to Lemmy.world Support or email info@lemmy.world. For other questions check our partnered communities list, or use the search function.
6) No US Politics.
Please don't post about current US Politics. If you need to do this, try !politicaldiscussion@lemmy.world or !askusa@discuss.online.
Reminder: The terms of service apply here too.
Partnered Communities:
Logo design credit goes to: tubbadu
I worked at a bank at the time. We were moving to a new system and running recons against the old system to check the behaviour was the same. I had to run a manual recon of the old system vs the new 4 times per day. There was a lot of focus from management and users on the new system.
The week leading up to Christmas, I was the one person not on holiday yet, and also the most junior person on the project. I found that week so stressful, as I had to run these recons and quickly decide whether each break was real or not before reporting to the users. Despite having worked on that system, I had very little confidence and didn't have the same intuitive mental model of the system my colleagues had. I had to dig into each break case-by-case, but they seemed much more able to understand what was going on via a few simple queries.
Anyway, I got through the week and left for the holidays on Thursday evening. I was just grateful that I'd gotten through it. Then, around 3pm on the Friday, I saw a missed call from the tech lead. I logged in, and everything was on fire. I joined the incident call, and it turned out that we hadn't processed a single trade in the new system that whole week. I discovered that it was thanks to a config change I'd made several weeks before, which had just made it to production. No one (neither the users nor I) had realised! But we missed several hundred million pounds' worth of payments in that week as a result.
It was so jarring, having been relieved that I made it to the holiday, then joining the incident call and struggling to work out what to do. I completely dissociated and my mind was blank. I remember being on the call and really passively and calmly walking around my room. I kept thinking "I need to do x, I need to do y" but my mind couldn't focus and I was just staring at the screen. At some point I just lay in bed with my laptop while on the call.
There had been a total failure of process: my change had been approved by two people, the nonprod environment was configured differently in a way that didn't expose the bug, the recon failures looked very similar to the false positives, and there were so many false positives that it was impossible to dig into all of them. Meanwhile, we didn't have basic queries monitoring that trades were flowing in, and the users weren't paying much attention either, until they realised that it was broken.
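For illustration, the kind of basic "are trades flowing in?" check that was missing can be very small. This is only a sketch under assumptions: the function name `trades_are_flowing`, the one-hour window, the minimum count, and the sample timestamps are all hypothetical, and a real check would read processing times from the system's own trade store rather than a list.

```python
from datetime import datetime, timedelta


def trades_are_flowing(trade_times, now, window=timedelta(hours=1), min_trades=1):
    """Return True if at least `min_trades` were processed in the `window` before `now`.

    `trade_times` is assumed to be an iterable of datetimes, one per processed trade.
    """
    cutoff = now - window
    recent = [t for t in trade_times if cutoff <= t <= now]
    return len(recent) >= min_trades


# Hypothetical sample data: the old system is still processing, the new one is silent.
now = datetime(2024, 1, 5, 15, 0)  # arbitrary timestamp for the example
old_system = [now - timedelta(minutes=m) for m in (5, 20, 45)]
new_system = []

for name, times in (("old", old_system), ("new", new_system)):
    status = "OK" if trades_are_flowing(times, now) else "ALERT: no recent trades"
    print(f"{name} system: {status}")
```

A check like this is independent of the recon, so it would still raise an alert when recon breaks are buried in false positives.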
Still, I made a lot of mistakes. I should have just escalated that there were breaks instead of trying to figure it out myself. I shouldn't have been afraid to call the tech lead and bring them out of their holiday. And I shouldn't have been afraid of the confrontation with the users.
Anyway, that experience really messed me up mentally for a long time. I lost so much confidence and became so much more scared of production (not in a healthy way). It really was not the right environment for me.
This one is on the seniors. Seniors should never leave a junior with that kind of responsibility. Not only was it not fair on you, but the stakes sound pretty high. But hey, they all wanted their holiday.
I used to manage developers in an environment like that. In case it helps, I can assure you it was shit for everyone.
We even had an issue much like the one you describe.
In case it amuses you, I took quiet joy in ruthlessly using the root cause analysis as leverage to fix several of the issues. And then I left for more money, better hours and more interesting work.
So at least there was a happy ending for everyone who was me, and everyone who I liked working with enough to recruit to my new firm.
Edit: And I hired away the dev who made a similar mistake for more money, too. It wasn't their fault our environment was built with Kleenex!
I hate that so much. So many IT folks treat nonprod config changes like they won't still ruin my weekend. Haha.