Filled a disk, and then kept getting errors about acquiring the cfs lock when trying to do anything, including modifying VMs, editing storage.cfg, and running pvesr status.
I noticed systemctl status pve-cluster had an error like crit: rollback transaction failed: cannot rollback - no transaction is active#010
All my googling pointed to failed quorum and disk replication issues.
However, I only have a single-node instance, not a cluster; I disabled corosync and some other bits shortly after install.
This page helped: https://forum.proxmox.com/threads/remove-or-reset-cluster-configuration.114260/post-493906
I ended up running:
systemctl stop pve-cluster
pmxcfs -l        # start the cluster filesystem manually in local mode (no quorum needed)
killall pmxcfs   # stop the manually started instance
systemctl start pve-cluster
This fixed it for me. I didn't run the command to remove the corosync config as I'd already done that.
2024-12-15 clustering, quorum, corosync etc.
Clustering. Clustered the RazerBlade and the server as a 2-node cluster. This was pretty straightforward: in Proxmox, go to Datacenter > Cluster, create a new cluster on one node, then use the join information from that page to join the other node from the same settings screen.
Because it's a 2-node cluster, there are always going to be quorum issues: quorum is lost whenever one node goes down. Found this out the hard way doing a test reboot of the RazerBlade and noticed everything broke because the main Proxmox server also rebooted.
The solution is to add last_man_standing and two_node to the quorum section of corosync.conf.
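Roughly what the quorum section of /etc/pve/corosync.conf ends up looking like (option names as corosync spells them; remember to bump config_version when editing, and treat this as a sketch rather than a verified config):
quorum {
  provider: corosync_votequorum
  two_node: 1
  last_man_standing: 1
}
Note that enabling two_node also turns on wait_for_all by default, which matters when both nodes come back up at the same time.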
2024-12-16 LXC, GPU passthrough etc.
Want to pass the GPU through to LXC containers to install and run ollama.
A useful reference for proxmox lxc gpu: https://yomis.blog/nvidia-gpu-in-proxmox-lxc/
GPU passthrough for LXC containers on Proxmox basically works by installing and configuring the NVIDIA drivers on the Proxmox host, then creating the LXC container, setting some cgroup device permissions and bind-mounting the nvidia device nodes into it, and finally installing the same NVIDIA drivers inside the container.
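As a sketch, the relevant bits of the container config (e.g. /etc/pve/lxc/<ctid>.conf, where <ctid> is just a placeholder) end up looking something like this; 195 is the usual major number for the nvidia devices, but check ls -l /dev/nvidia* on the host, and nvidia-uvm often has a different major number:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file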
Some notes from setup:
- had to install make, gcc and pve-headers on the Proxmox host first to get the NVIDIA installer running
- ran the NVIDIA installer with the --dkms option
- had to sign the kernel module with a new signing key, and loaded the module even though there was a chance the signature couldn't be verified
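For reference, the host-side prep was roughly this (the .run filename is whichever driver version was downloaded from NVIDIA):
apt install make gcc pve-headers-$(uname -r)
chmod +x NVIDIA-Linux-x86_64-<version>.run
./NVIDIA-Linux-x86_64-<version>.run --dkms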
Ultimately, I couldn't get it working because of secureboot. With secureboot enabled, only signed kernel modules will load. To install the graphics drivers on Proxmox, I need to add a kernel module, which needs signing. I have no access to any trusted signing keys. I can create new keys, but then I need to reboot into the BIOS/MOK enrolment screen to accept/trust them, which I can't see.
If I could disable secureboot from the OS, this wouldn't be a problem.
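For what it's worth, a quick way to confirm the secureboot state from within the OS (assuming mokutil is installed):
mokutil --sb-state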
2024-12-17 VM, GPU passthrough etc.
The aim of the GPU passthrough is to be able to run linux/ai/ollama stuff with the GPU, and ideally run windows stuff with the GPU too.
It should be doable with a VM though: with a VM, we pass the hardware itself through to the guest, not a driver/software abstraction.
So with LXC, it was a hindrance that I couldn't get the Proxmox host to recognise the GPU. For a VM, the host not claiming the GPU is actually a benefit, as it means the VM gets the entire device.
The downside of this approach is that only one VM at a time can use the GPU, and the host can't use it at the same time either.
Some notes from this setup:
- had to add intel_iommu=on to the grub boot config to ensure IOMMU was enabled (IOMMU is basically hardware passthrough support on Intel CPUs); see the grub sketch after these notes
- for the display, since the GPU was passed through, the boot and install screens appeared on the physical monitor, but no mouse/keyboard was passed through
- was still able to move the mouse around using the web UI/noVNC, but couldn't see all of the screen, and noVNC gave an error / wouldn't connect
- changing the Display type to VirtIO-GPU fixed the noVNC view so I could fully see/control the VM via the Proxmox web UI
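The grub change looks roughly like this; a dmesg grep after reboot is a handy sanity check that IOMMU actually came up:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# then apply and reboot
update-grub
# after reboot, confirm DMAR/IOMMU messages appear
dmesg | grep -e DMAR -e IOMMU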
nvidia install on rocky linux: https://docs.rockylinux.org/desktop/display/installingnvidiagpu_drivers/
Basically add the NVIDIA repos and some build tools. Had to also install the epel-release repo for dkms.
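Roughly what that looks like on Rocky 9, sketched from the linked doc (repo URL and exact package set should be double-checked against it):
sudo dnf install epel-release
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf install kernel-devel kernel-headers gcc make dkms
sudo dnf module install nvidia-driver:latest-dkms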
Had to also disable secureboot in the VM by selecting the UEFI firmware settings option from the boot menu.
This got me most of the way there, and nvidia-smi showed the card, but once ollama was installed it would load the model but not do the generation.
The solution to this was to set the CPU type to host.
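That's under the VM's Hardware > Processors in the web UI, or from the Proxmox host CLI (100 here is just a placeholder VMID):
qm set 100 --cpu host
Presumably the default kvm64 CPU type hides newer instruction sets like AVX that ollama's llama.cpp backend wants, which would fit the load-but-not-generate behaviour.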