The system:
MSI Raider GE67 HX 12UHS
Intel Core i9-12900HX
NVIDIA GeForce RTX 3080 Ti (laptop)
32GiB RAM
Win11 Pro 64-bit
The problem:
Two or three times a day, the system crashes, usually with a blue screen showing one of several stop codes. Codes I've seen include:
HYPERVISOR_ERROR
CLOCK_WATCHDOG_TIMEOUT
VIDEO_TDR_FAILURE
IRQL_NOT_LESS_OR_EQUAL
Sometimes the system hangs but the blue screen never comes, and I have to power it off manually. When this happens, the fans go to full speed and yet the laptop quickly becomes incredibly hot if I don't power it off as soon as possible, suggesting that the CPU or GPU is maxing out for some reason.
Checking Event Viewer shows nothing out of the ordinary in the lead-up to the crash.
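For anyone trying to match the stop codes listed above against what the System log records after a reboot, here is a minimal sketch in Python. It assumes you have copied the text of the BugCheck entries (Event ID 1001) out of Event Viewer, and that each message contains the phrase "The bugcheck was: 0x..." followed by the code. The function names and the sample messages are hypothetical, made up for illustration; the hex values are the documented bug check codes for the four names in this post.

```python
import re
from collections import Counter

# Bug check values for the four stop codes seen in this post.
STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x00000101: "CLOCK_WATCHDOG_TIMEOUT",
    0x00000116: "VIDEO_TDR_FAILURE",
    0x00020001: "HYPERVISOR_ERROR",
}

# Assumed message shape: "... The bugcheck was: 0x00000116 (0x..., ...)."
BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)", re.IGNORECASE)


def stop_code_name(message):
    """Return the stop code name found in one event message, or None."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Codes outside the table are reported by value rather than guessed at.
    return STOP_CODES.get(value, f"UNKNOWN_0x{value:08X}")


def tally(messages):
    """Count stop codes across many event messages, skipping non-matches."""
    names = (stop_code_name(m) for m in messages)
    return Counter(n for n in names if n is not None)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the affected machine.
    sample = [
        "The computer has rebooted from a bugcheck.  "
        "The bugcheck was: 0x00000116 (0x0, 0x0, 0x0, 0x0).",
        "The computer has rebooted from a bugcheck.  "
        "The bugcheck was: 0x00000101 (0x0, 0x0, 0x0, 0x0).",
        "The computer has rebooted from a bugcheck.  "
        "The bugcheck was: 0x00000116 (0x0, 0x0, 0x0, 0x0).",
    ]
    for name, count in tally(sample).most_common():
        print(f"{count:3d}  {name}")
```

A tally like this only helps spot whether one code dominates; a wide mix of unrelated codes, as in this case, tends to point at hardware rather than at one driver.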
Things I've ruled out:
I initially thought it only happened while plugged in, and bought a new power supply. That didn't seem to affect the frequency of the issue, and I have now also seen it happen while on battery.
I also initially thought it was more frequent while playing games that use the dedicated graphics card, but I'm not sure that's actually true; I have seen it happen even while just watching YouTube.
At one point I felt that it happened more when I moved the laptop or plugged in USB devices, but I think that may be magical thinking; I have never been able to make it happen on purpose by doing those things.
One pattern does seem to hold: after a crash, if I let the laptop restart automatically, it often crashes again within a short time, whereas shutting down fully and then turning it back on gives more time before the next incident.
Solutions I've tried:
I updated the BIOS and the Intel firmware to the latest versions available on MSI's website, but that doesn't seem to have helped. I also updated my NVIDIA drivers.
A possibly related issue:
A week or so before this happened for the first time, I updated the BIOS to fix a different issue. What happened then was: I was playing a game on battery unintentionally, and didn't notice until that "low battery - switching to Super Battery" warning appeared and began throttling system performance. I plugged the laptop in, but performance didn't improve. I restarted and performance was terrible across all applications, even Firefox. I checked Resource Manager and noticed that the CPU was being throttled down to around 0.16GHz. Event Viewer was showing warnings about this that said the processor was being limited by system firmware.
I tried using various Windows and MSI power management settings to resolve the issue, which persisted across restarts, fully charging the battery, etc. In the end, I solved it by updating the BIOS (to a version that is now one version back from the most current one).
It was a while, maybe a week, after running the update that the crash happened for the first time.
Current theory:
Is it possible I screwed up the BIOS update somehow? I noticed that it instructs you to return clock speeds to stock before doing the update. I don't think I've manually adjusted them, but MSI's "MSI Center" software seems to offer automatic adjustment. It was set to "Balanced" when I did the most recent update, but it may have been set to "Auto" when I did the first one, which I guess could be a problem if the CPU was automatically overclocked.
A reinstall sounds like a nuclear option, but it should avoid what would otherwise be a lot of fucking about. Fingers crossed it sorts the problem and you don’t have an underlying hardware issue.
Update: it was the SSD, despite some tests I ran claiming otherwise. Reinstalling Windows failed, but an install on a fresh SSD is running fine!
Edit: Well that didn't last long, see my edit here.