This is an automated archive made by the Lemmit Bot.
The original was posted on /r/homeassistant by /u/anthony-hines on 2026-03-17 15:35:04+00:00.
A while back I set up automations to monitor Zigbee link quality across all my devices. I was tracking LQI and availability state changes so I could spot devices with poor signal or ones dropping off the mesh. Across 169 Zigbee devices it generated a lot of state change events but the system handled it fine under normal conditions and I didn't think much of it.
Then something unrelated forced me to hard reboot the host. At the time I was running Zigbee2MQTT as an add-on inside HA, which means Z2M shares HA's lifecycle. When the host came back up, everything restarted at once. All 169 Zigbee devices began reconnecting to the mesh in waves, each one transitioning from unavailable to available. Every reconnection fired my availability automations, and every automation was another event landing on HA's event bus while Core was still trying to stand up. HA couldn't keep pace. The UI was barely responsive, automations were queueing faster than they could execute, and the system felt like it was drowning.
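To make the overload concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the 169 devices come from the post, but the triggers per device, arrival rate, and handling rate are made-up numbers chosen to illustrate the shape of the problem, not measurements from HA.

```python
def simulate_backlog(devices, triggers_per_device, arrivals_per_tick, handled_per_tick):
    """Return the queue depth after each tick of a simple discrete-time model.

    All rates are hypothetical; this is not how HA schedules work internally.
    """
    if arrivals_per_tick <= 0 or handled_per_tick <= 0:
        raise ValueError("rates must be positive")

    pending = devices * triggers_per_device  # events that have not arrived yet
    queue = 0                                # events waiting to be handled
    depths = []
    while pending > 0 or queue > 0:
        arriving = min(arrivals_per_tick, pending)
        pending -= arriving
        queue += arriving
        queue -= min(handled_per_tick, queue)
        depths.append(queue)
    return depths


# 169 devices (from the post); the other three values are assumptions.
depths = simulate_backlog(
    devices=169,
    triggers_per_device=2,
    arrivals_per_tick=50,
    handled_per_tick=20,
)
peak_backlog = max(depths)
ticks_to_drain = len(depths)
```

With these assumed numbers the queue keeps growing for as long as devices are rejoining and only starts draining once the last one is back, which matches the "queueing faster than they could execute" feeling. Rebooting resets `pending` to the full amount, so the whole curve simply plays again.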
So I did what felt logical and rebooted again. Same result. The same 169 devices had to rejoin the mesh on every restart, so each reboot wasn't a recovery attempt; it was a replay of the same overload.
What made this hard to diagnose in the moment was that it didn't look like a Zigbee problem. It looked like HA was just slow to start up. It took me longer than I'd like to admit to realise the load was a chain reaction: mesh reconnections triggered the automations, the automations flooded the event bus, and Core itself wasn't the source.
The short-term fix was disabling the availability automations and letting HA come up clean. I've since cut LQI monitoring back to a handful of critical devices rather than the whole mesh. The overhead isn't worth it when it becomes a liability at the exact moment you need the system to recover.
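The idea of only watching critical devices can be sketched as a small filter. This is illustrative Python, not an HA automation: the device names are hypothetical, and the startup grace period is an extra assumption that goes beyond what I actually did (I just disabled the automations).

```python
from datetime import datetime, timedelta

# Hypothetical device names; substitute whatever actually matters to you.
CRITICAL_DEVICES = {"front_door_lock", "smoke_alarm_hall"}

# Assumed value: how long after startup to ignore availability changes.
STARTUP_GRACE = timedelta(minutes=10)


def should_alert(device, old_state, new_state, now, started_at):
    """Decide whether an availability change deserves any work at all."""
    if device not in CRITICAL_DEVICES:
        return False  # skip the bulk of the mesh entirely
    if now - started_at < STARTUP_GRACE:
        return False  # ignore the reconnection wave right after a restart
    # Only react when a device drops off, not when it comes back.
    return old_state != "unavailable" and new_state == "unavailable"


started = datetime(2026, 1, 1, 12, 0, 0)
during_boot = should_alert(
    "smoke_alarm_hall", "online", "unavailable",
    started + timedelta(minutes=2), started,
)
after_boot = should_alert(
    "smoke_alarm_hall", "online", "unavailable",
    started + timedelta(minutes=30), started,
)
```

The point of the design is that the cheap checks run first, so a restart with 169 devices rejoining produces almost no extra work.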
The longer-term fix was moving Z2M and Mosquitto out of HA and into their own LXC containers on Proxmox. Ironically this gets me back to roughly the same resilience model I had when I was running separate Hue bridges: the mesh has its own lifecycle independent of HA. If I need to reboot the HA host now, the mesh doesn't notice. Devices stay connected, Z2M keeps running, and when HA comes back it just picks up the current state over MQTT and the recovery is clean every time.
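One MQTT feature that makes this kind of recovery possible is retained messages: the broker keeps the last retained payload per topic and hands it to any client that subscribes later. The toy in-memory broker below is only a model of that behaviour, not Mosquitto or a real MQTT client, and the topic names and payloads are illustrative. Whether a given Z2M topic is actually retained depends on configuration.

```python
class ToyBroker:
    """Minimal in-memory stand-in for an MQTT broker with retained messages."""

    def __init__(self):
        self._retained = {}      # topic -> last retained payload
        self._subscribers = []   # callbacks taking (topic, payload)

    def publish(self, topic, payload, retain=False):
        if retain:
            self._retained[topic] = payload
        for callback in list(self._subscribers):
            callback(topic, payload)

    def subscribe(self, callback):
        self._subscribers.append(callback)
        # A new subscriber immediately receives every retained message.
        for topic, payload in self._retained.items():
            callback(topic, payload)


broker = ToyBroker()

# Z2M keeps publishing while HA is down (hypothetical topics and payloads).
broker.publish("zigbee2mqtt/hall_sensor/availability", "offline", retain=True)
broker.publish("zigbee2mqtt/hall_sensor/availability", "online", retain=True)
broker.publish("zigbee2mqtt/door_lock/availability", "online", retain=True)

# HA comes back later and subscribes; it sees only the latest state per topic.
ha_view = {}
broker.subscribe(lambda topic, payload: ha_view.update({topic: payload}))
```

HA ends up with one current value per device rather than a replay of every transition it missed, which is why the restart no longer looks like 169 devices rejoining.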