taaz

joined 2 years ago
[–] taaz@lemmy.world 1 points 2 days ago

~/Projects, which has everything I ever cloned or started. Yes, it's getting kind of painful to back up :D

 

TLDR: Neither of the two RAIDed disks shows up after a reboot, which followed the CPU going over 100 °C. Awaiting Hetzner support for further steps.


I am writing this primarily to let the only other active user of my instance (Cephalotrocity@biglemmowski.win) know that the server might be out of order for an unknown amount of time.


Story so far:

  • About two hours ago I happened to notice that the CPU was sitting at 100+ °C and stuck at 500 MHz.
  • Load average was sitting at 3, so the 12-thread CPU was not being pushed hard in any way :weird:
  • After a reboot it didn't come back online; connections kept timing out.
  • So I rebooted the box into Hetzner's Linux rescue system... and couldn't see either of the two NVMe disks at all, with the following NVMe controller errors in dmesg:
    [Wed Dec 10 11:20:13 2025] nvme nvme1: I/O tag 17 (0011) QID 0 timeout, disable controller
    [Wed Dec 10 11:20:13 2025] nvme nvme0: I/O tag 0 (0000) QID 0 timeout, disable controller
    [Wed Dec 10 11:20:13 2025] nvme nvme1: Device not ready; aborting shutdown, CSTS=0x1
    [Wed Dec 10 11:20:13 2025] nvme nvme0: Device not ready; aborting shutdown, CSTS=0x1
    [Wed Dec 10 11:20:13 2025] nvme 0000:01:00.0: probe with driver nvme failed with error -4
    [Wed Dec 10 11:20:13 2025] nvme 0000:09:00.0: probe with driver nvme failed with error -4
    
    Manually removing the devices and doing a PCI rescan yielded basically the same thing.
  • Now waiting to see whether Hetzner can resuscitate the system.
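
For anyone who wants to repeat the remove-and-rescan step, here is a rough sketch of how it could be scripted. It pulls the PCI addresses out of "probe ... failed" lines like the ones quoted above, then writes to the standard Linux sysfs files `/sys/bus/pci/devices/<addr>/remove` and `/sys/bus/pci/rescan`. The function names and the `sysfs_root` parameter are made up for illustration, and this is not necessarily how the step was actually done on the box.

```python
import re
from pathlib import Path

# Matches dmesg lines such as:
#   nvme 0000:01:00.0: probe with driver nvme failed with error -4
PROBE_FAIL = re.compile(
    r"nvme (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"probe with driver nvme failed with error (?P<err>-?\d+)"
)


def failed_nvme_addresses(dmesg_text):
    """Return the PCI addresses whose nvme probe failed, in log order."""
    return [m.group("addr") for m in PROBE_FAIL.finditer(dmesg_text)]


def remove_and_rescan(addresses, sysfs_root=Path("/sys")):
    """Detach each listed device from the PCI bus, then trigger a rescan.

    Needs root on a real system. sysfs_root is a hypothetical parameter,
    present only so the function can be pointed at a scratch directory.
    """
    for addr in addresses:
        remove_file = sysfs_root / "bus/pci/devices" / addr / "remove"
        # Skip devices that have already vanished from the bus.
        if remove_file.exists():
            remove_file.write_text("1")
    (sysfs_root / "bus/pci/rescan").write_text("1")
```

If the controllers themselves no longer respond, as seemed to be the case here, a rescan will most likely just reproduce the same probe errors.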

UPDATE:

  • Hetzner support identified a faulty fan, but the disks are dead even for them (both? really weird).
  • I will get the disks replaced, then dig into my slapped-together backups and see what I can recover.

UPDATE2:

  • It's completely gone; the off-site borgmatic backup was probably not running :/
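
A backup that silently stops running is the kind of thing a small freshness check can catch. Below is a minimal sketch, assuming the timestamp of the newest archive has already been obtained from the backup tool (how to get it is deliberately left out). The name `backup_is_stale` and the two-day default for `max_age` are hypothetical choices for illustration, not part of the setup described above.

```python
from datetime import datetime, timedelta, timezone


def backup_is_stale(last_archive_time, max_age=timedelta(days=2), now=None):
    """Return True if there is no archive or the newest one is too old.

    last_archive_time: timezone-aware datetime of the newest archive,
                       or None if no archive was found at all.
    max_age:           how old the newest archive may be (assumed: 2 days).
    now:               override for the current time, useful for testing.
    """
    if last_archive_time is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_archive_time > max_age
```

Run from a cron job or timer, a check like this could send a notification whenever it returns `True`, so a stalled backup gets noticed before the disks die.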
2
Synchrule (lemmy.world)
submitted 2 years ago* (last edited 2 years ago) by taaz@lemmy.world to c/196@lemmy.blahaj.zone