Passwordless SSH can lock you out
If you follow standard security practices, you would not allow root logins at all, let alone over SSH (as with a standard Debian install). But this would render your PVE unable to function properly, so the best you can do is set the following option in your /etc/ssh/sshd_config [1]:
PermitRootLogin prohibit-password
That way, you only allow connections with valid keys (not passwords). Prior to this, you would have copied over your public keys with ssh-copy-id [2] or otherwise added them to /root/.ssh/authorized_keys.
But this has a huge caveat on any standard PVE install. When you examine the file, it is actually a symbolic link:
/root/.ssh/authorized_keys -> /etc/pve/priv/authorized_keys
This is because other nodes’ keys are already stored there to allow for cross-connecting, and the location is shared. This has several issues, the most important of which is that the actual file lies in /etc/pve, which is a virtual filesystem [3] mounted only when all goes well during boot-up.
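To see this on your own node, a small check along these lines can help. It is only a sketch: check_keys_file is a made-up helper name, and the path passed to it is the one discussed above. It reports whether the file is a symlink and whether its target can currently be read.

```shell
# Report whether a keys file is a symlink and whether it is currently readable.
# check_keys_file is a hypothetical helper, not a PVE or OpenSSH tool.
check_keys_file() {
    keys="$1"
    if [ -L "$keys" ]; then
        # readlink prints the target the symlink points to
        echo "symlink -> $(readlink "$keys")"
    else
        echo "not a symlink"
    fi
    # -r follows the symlink, so it fails when the target is missing
    if [ -r "$keys" ]; then
        echo "readable"
    else
        echo "NOT readable (target missing or not mounted?)"
    fi
}

check_keys_file /root/.ssh/authorized_keys
```

If the second line says the file is not readable, sshd has nothing to compare your key against.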
What could go wrong
If your /etc/pve does not get mounted during boot-up, your node will appear offline and will not be accessible over SSH, let alone the GUI.
Warning
If you access the node via another node’s GUI, you will get a confusing Permission denied (publickey,password) in the “Shell”.
You are essentially locked out, even though the system has otherwise booted up, except for the PVE services. You cannot troubleshoot over SSH; you would need to resort to out-of-band (OOB) management or physical access.
This is because, during your SSH connection, there is no way to verify your key against /etc/pve/priv/authorized_keys.
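Once you do get to a console, you can confirm that this is what happened by checking whether /etc/pve is mounted at all. The following is a minimal sketch, assuming a Linux system with /proc/mounts; is_mounted is a hypothetical helper name, not a PVE tool, and the optional second argument exists only so the function can be tried against a different mounts table.

```shell
# Succeed if the given directory is listed as a mount point.
# is_mounted <dir> [mounts-file]   (mounts-file defaults to /proc/mounts)
is_mounted() {
    # Field 2 of each line in the mounts table is the mount point.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted /etc/pve; then
    echo "/etc/pve is mounted"
else
    echo "/etc/pve is NOT mounted, the authorized_keys symlink is dangling"
    # On a PVE node, the pve-cluster service would be the next thing to inspect:
    #   systemctl status pve-cluster
    #   journalctl -b -u pve-cluster
fi
```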
Caution
If you also allow root to authenticate by password, you will be locked out of the GUI only. SSH will, obviously, not work with your key, but it will fall back to a password prompt.
How to avoid this
You need to use your own authorized_keys file, different from the default one that has been hijacked by PVE. The proper way to do this is to define its location in the config:
cat > /etc/ssh/sshd_config.d/LocalAuthorizedKeys.conf <<< "AuthorizedKeysFile .ssh/local_authorized_keys"
If you now copy your own keys to the /root/.ssh/local_authorized_keys file (on every node), you are immune to this design flaw.
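Copying the keys could look something like the sketch below. add_local_key is a hypothetical helper, not part of PVE or OpenSSH. It assumes you already have a public key file on the node, appends it to local_authorized_keys, and keeps permissions tight, because sshd (with its default StrictModes setting) checks the modes and ownership of these files.

```shell
# Append a public key to a local authorized keys file, once, with tight permissions.
# add_local_key <pubkey-file> [ssh-dir]   (ssh-dir defaults to /root/.ssh)
add_local_key() {
    pubkey="$1"
    ssh_dir="${2:-/root/.ssh}"
    target="$ssh_dir/local_authorized_keys"

    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$target" && chmod 600 "$target" || return 1

    # Only append when the exact line is not there yet.
    grep -qxF -- "$(cat "$pubkey")" "$target" || cat "$pubkey" >> "$target"
}
```

You would then run e.g. add_local_key /tmp/id_ed25519.pub on every node (the path is just an example). Before reloading the SSH service, sshd -t checks the configuration for errors and sshd -T prints the effective settings, so you can confirm the new AuthorizedKeysFile value was picked up. Note also that AuthorizedKeysFile accepts several whitespace-separated paths, so a variant that additionally lists .ssh/authorized_keys is possible if you want the shared keys to remain honoured; whether you need that depends on your setup.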
Tip
There are even better ways to approach this, e.g. SSH certificates, in which case your own setup is not prone to this bug. This is out of scope for this post.
FAQ
- What about a non-privileged user & sudo?
This will work just fine, too. Note that PVE does not come with sudo and will nevertheless require root to be allowed to log in over SSH to preserve full features.
- Why is this considered a design flaw?
Due to the way the Proxmox stack is set up, inaccessible SSH for the root user prevents you from e.g. troubleshooting failing services (even when SSH itself is healthy), even from the GUI shell of a healthy node. It is impossible to remove SSH access for the root account in Proxmox without losing features, some of which are documented.
Since you cannot disable root over SSH, you might as well embrace it. However, if you have another way in through other means (e.g. the first FAQ item), it is just as good (the GUI path will still not work, though).
- The incidence ratio of a system being “down” (but with full networking) vs “down down” (when it needs rescuing from console / KVM) seems low.
The issue is that a failure of the pve-cluster service at boot (which needs to run on standalone nodes as well), which is what causes the “lockout”, is quite a common side effect of e.g. networking misconfiguration or pmxcfs backend database corruption. These are out of scope for this post, but they definitely happen more often than SSH alone failing, let alone networking as a whole. Also note that lots of home systems do not have OOB/KVM, or even rely entirely on the GUI.