Proxmox AMD GPU Passthrough
(lemmy.world)
This is the output from journalctl since stopping and rebooting the VM. The main error seems to occur at 16:41:43:
`Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.`
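In case it helps anyone skim a log like this, here is a rough Python sketch that pulls out only the bus resets the kernel gave up on. The function name `failed_resets` and the regex are my own and assume the default short journalctl line format shown above; nothing here is a Proxmox tool.

```python
import re

# Matches kernel lines such as:
#   Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
# Assumes the default "short" journalctl format (timestamp, host, "kernel:").
RESET_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+vfio-pci\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"not ready (?P<ms>\d+)ms after bus reset; (?P<outcome>waiting|giving up)"
)


def failed_resets(lines):
    """Return (timestamp, device, waited_ms) for each reset the kernel gave up on."""
    failures = []
    for line in lines:
        match = RESET_RE.match(line.strip())
        if match and match.group("outcome") == "giving up":
            failures.append(
                (match.group("ts"), match.group("dev"), int(match.group("ms")))
            )
    return failures
```

Fed the lines above, it should report the two "giving up" entries for 0000:03:00.0 and skip the intermediate "waiting" lines.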
Formatted with a code block so it's more readable:
It does seem a lot like the reset bug, but then you already tried that. :/ Kernel modules aren't as easy to install, and if you're missing the required flags it might just do nothing.
It should show the 6 flags set to =y.
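A minimal Python sketch of that kind of check, in case it is useful. The thread does not name the six flags, so `REQUIRED_FLAGS` is an assumption: it is the list I recall from the vendor-reset module's README, and you should swap in whatever your module actually requires. The helper names are made up for illustration.

```python
import gzip
import platform
from pathlib import Path

# ASSUMPTION: the six options below are what the vendor-reset README asks for.
# The thread itself does not list them, so adjust to your module's requirements.
REQUIRED_FLAGS = [
    "CONFIG_FTRACE",
    "CONFIG_KPROBES",
    "CONFIG_PCI_QUIRKS",
    "CONFIG_KALLSYMS",
    "CONFIG_KALLSYMS_ALL",
    "CONFIG_FUNCTION_TRACER",
]


def parse_kernel_config(text):
    """Map CONFIG_* names to their values; '# ... is not set' lines are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            config[name] = value
    return config


def missing_flags(text, required=REQUIRED_FLAGS):
    """Return the required flags that are not built in (=y), in list order."""
    config = parse_kernel_config(text)
    return [flag for flag in required if config.get(flag) != "y"]


def read_running_config():
    """Read the running kernel's config from /proc/config.gz or /boot/config-<release>."""
    proc_config = Path("/proc/config.gz")
    if proc_config.exists():
        return gzip.decompress(proc_config.read_bytes()).decode()
    return Path(f"/boot/config-{platform.release()}").read_text()
```

On the host, `missing_flags(read_running_config())` returning an empty list would correspond to all six flags showing =y.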
Or maybe some variation of manual reset...
https://forum.proxmox.com/threads/issues-with-intel-arc-a770m-gpu-passthrough-on-nuc12snki72-vfio-pci-not-ready-after-flr-or-bus-reset.130667/
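One variation of a manual reset is removing the device through sysfs and rescanning the bus. Here is a hedged Python sketch of that; it is not taken from the linked thread (which is about an Intel Arc card) and is untested on this hardware. The function names are made up, the device addresses are the ones from the log above, and it defaults to a dry run that only prints the equivalent shell commands.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def remove_rescan_plan(functions):
    """Return the (path, value) sysfs writes for a remove-and-rescan cycle."""
    plan = [(SYSFS_PCI / "devices" / addr / "remove", "1") for addr in functions]
    # A single bus rescan afterwards asks the kernel to rediscover the devices.
    plan.append((SYSFS_PCI / "rescan", "1"))
    return plan


def run_plan(plan, dry_run=True):
    """Print the writes, or perform them when dry_run is False (needs root)."""
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)


if __name__ == "__main__":
    # Audio function first, then the GPU itself; addresses taken from the log.
    run_plan(remove_rescan_plan(["0000:03:00.1", "0000:03:00.0"]))
```

Stop the VM before trying it for real. If the upstream port still reports that link retraining failed, a rescan may not bring the card back and a full host power cycle may be the only thing that does.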
Just FYI, the 6 flags were all shown as =y.