Cluster probe setup

The Proxmox cluster probe

This is a practical follow-up to the introductory piece on Corosync. It also lays the technical groundwork for the upcoming post, Why Proxmox VE shreds your SSDs.

Understanding the role of Corosync in Proxmox clusters will be of benefit as we will create a dummy node - one that will be sharing all the information with the rest of the cluster at all times, but not provide any other features. This will allow for observing the behaviour of the cluster without actually having to resort to the use of fully specced hardware or otherwise disrupting the real nodes.

In fact, it’s possible to build this on a virtual machine, or even in a container, as long as we make sure that the host is not part of the cluster itself, which would be counter-productive.

The install

Let’s start with the Debian network install image 1. Any basic installation will do, no need for a GUI - standard system utilities and SSH will suffice. Our host will be called probe and we will make just a few minor touches so that the requirements of the PVE cluster - which it will be joining later - are easy to satisfy.

After the first post-install boot, log in as root.

Important

Debian disallows SSH logins for the root user by default. If you did not create a non-privileged user during the install from which you can su -, you will need to log in locally.

Let’s streamline the networking and the name resolution.

First, we set up systemd-networkd 2 and assume you have a statically reserved IP for the host on the DHCP server - so it is handed out dynamically, but is always the same. This is an IPv4 setup, so we will ditch the IPv6 link-local address to avoid quirks specific to Proxmox philosophy.

Tip

If you cannot satisfy this, specify your NIC exactly in the Name line, comment out the DHCP line and un-comment the other two, filling them in with your desired static configuration.

cat > /etc/systemd/network/en.network << EOF
[Match]
Name=en*

[Network]
DHCP=ipv4
LinkLocalAddressing=no

#Address=10.10.10.10/24
#Gateway=10.10.10.1
EOF

apt install -y polkitd
systemctl enable systemd-networkd
systemctl restart systemd-networkd

systemctl disable networking
mv /etc/network/interfaces{,.bak}

Note

If you want to use the stock networking setup with IPv4, it is actually possible - you would, however, need to disable IPv6 by default via sysctl:

cat >> /etc/sysctl.conf <<< "net.ipv6.conf.default.disable_ipv6=1"
sysctl -w net.ipv6.conf.default.disable_ipv6=1

Next, we install systemd-resolved 3 which mitigates DNS name resolution quirks specific to Proxmox philosophy:

apt install -y systemd-resolved

mkdir /etc/systemd/resolved.conf.d
cat > /etc/systemd/resolved.conf.d/fallback-dns.conf << EOF
[Resolve]
FallbackDNS=1.1.1.1
EOF

systemctl restart systemd-resolved

# Remove the bogus 127.0.1.1 entry for the hostname, matching it by address rather than by line number
sed -i.bak '/^127\.0\.1\.1[[:space:]]/d' /etc/hosts

In the end, it is important that the hostname resolves to your routable IP address when checked with:

dig $(hostname)
---8<---

;; ANSWER SECTION:
probe.			50	IN	A	10.10.10.199

You may want to reboot and check all is still well afterwards.
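The resolution check can also be scripted, so that it is easy to repeat after a reboot. This is only a minimal sketch: the function names is_non_loopback_ipv4 and check_hostname_resolution are made up for illustration, and it assumes dig is available, as used above.

```shell
# Succeed only for a dotted-quad IPv4 address outside 127.0.0.0/8.
# (Hypothetical helper; it does not validate the range of each octet.)
is_non_loopback_ipv4() {
    case "$1" in
        127.*) return 1 ;;
    esac
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Resolve the local hostname with dig and judge the first answer.
check_hostname_resolution() {
    addr=$(dig +short "$(hostname)" | head -n 1)
    if is_non_loopback_ipv4 "$addr"; then
        echo "OK: $(hostname) resolves to $addr"
    else
        echo "FAIL: $(hostname) resolves to '${addr:-nothing}'" >&2
        return 1
    fi
}
```

Running check_hostname_resolution on the probe should report the address handed out by DHCP; a 127.0.1.1 answer would suggest the /etc/hosts entry is still in place.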

Corosync

Time to join the party. We will be doing this with a 3-node cluster. It is also possible to join a 2-node cluster, or to initiate a “Create cluster” operation on a sole node and, instead of “joining” any nodes, perform the following.

Caution

While there’s nothing inherently unsafe about these operations - after all, they are easily reversible - certain parts of the PVE solution happen to be very brittle, namely the High Availability stack. If you want to absolutely avoid any possibility of random reboots, it would be prudent to disable HA, at least until your probe is well set up.

We will start, for a change, on an existing real node and edit the contents of the Corosync configuration by adding our yet-to-be-ready probe.

On a 3-node cluster, we will open /etc/pve/corosync.conf and explore the nodelist section:

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.101
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.102
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.103
  }
}

This file is actually NOT the real configuration. It is a template which PVE distributes (once saved) to each node’s /etc/corosync/corosync.conf, from where it is read by the Corosync service.

We will append a new entry within the nodelist section:

  node {
    name: probe
    nodeid: 99
    quorum_votes: 1
    ring0_addr: 10.10.10.199
  }

Also, we will increase the config_version counter by 1 in the totem section.
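Both touches are easy to get wrong by hand, so they can be sanity-checked with a couple of small filters. What follows is a sketch under assumptions: bump_config_version and duplicate_nodeids are hypothetical helper names, and the awk patterns rely on the "key: value" layout shown in the nodelist above.

```shell
# Print the Corosync configuration read from stdin with config_version raised by one.
bump_config_version() {
    awk '/^[[:space:]]*config_version:/ { sub(/[0-9]+/, $2 + 1) } { print }'
}

# Print every nodeid that occurs more than once in the configuration on stdin.
# No output means all nodeids are unique.
duplicate_nodeids() {
    awk '$1 == "nodeid:" { if (seen[$2]++) print $2 }'
}
```

One cautious way to use these is to write the result to a separate file first, for instance bump_config_version < /etc/pve/corosync.conf > /etc/pve/corosync.conf.new (the .new file name is only an example), review it, and only then move it into place.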

Caution

If you are adding a probe to a single node setup, it will be very wise to increase the default quorum_votes value (e.g. to 2) for the real node should you want to continue operating it comfortably when the probe is off.
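The arithmetic behind this advice is short enough to spell out. The sketch below assumes plain majority voting, i.e. no special votequorum options such as two_node or last_man_standing are in play; quorum_for and quorate_alone are illustrative names only.

```shell
# Votes needed for quorum: a strict majority of all expected votes.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed if a node holding $1 votes stays quorate on its own
# when $2 votes are expected in total.
quorate_alone() {
    [ "$1" -ge "$(quorum_for "$2")" ]
}
```

With the defaults, a single real node and the probe hold 1 vote each, so quorum is 2 and the real node alone (1 vote) loses it once the probe is off. Giving the real node 2 votes makes the total 3, quorum stays at 2, and quorate_alone 2 3 succeeds.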

Now one last touch to account for rough edges in the PVE GUI stack - this is a completely dummy certificate, not used for anything, but it is needed so that your Cluster view is not deemed inaccessible:

mkdir /etc/pve/nodes/probe
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null -out /etc/pve/nodes/probe/pve-ssl.pem -subj "/CN=probe"

Before leaving the real node, we will copy out the Corosync configuration and authentication key for our probe. The example below copies them from the existing node over to the probe host - assuming only the non-privileged user bud can get in over SSH - into that user’s home directory. You can move them whichever way you wish.

scp /etc/corosync/{authkey,corosync.conf} bud@probe:~/
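Since the file being copied is the distributed one and not the template, it may be worth confirming first that it already contains the new entry. A minimal sketch, with has_node being a hypothetical helper name:

```shell
# Succeed if the Corosync configuration on stdin has a node entry with the given name.
has_node() {
    awk -v want="$1" '$1 == "name:" && $2 == want { found = 1 } END { exit !found }'
}
```

On the real node, has_node probe < /etc/corosync/corosync.conf should succeed before the files are copied out; if it does not, the saved template has not reached that node's local copy yet.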

Now back on the probe host, as root, we will install Corosync and copy the previously transferred files into place, where they will be looked for after the service restart:

apt install -y corosync

cp ~bud/{authkey,corosync.conf} /etc/corosync/

systemctl restart corosync

Now still on the probe host, we can check whether we are in the party: 4

corosync-quorumtool
---8<---

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve1
         2          1 pve2
         3          1 pve3
        99          1 probe (local)
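If you would rather have a script watch for this than read the table, the output can be picked apart with standard tools. This is a sketch that assumes the layout shown above (a Flags line and a three-column membership table); list_members and is_quorate are made-up names, and setups with extra columns would need adjusting.

```shell
# Read corosync-quorumtool output on stdin and print member names, one per line.
list_members() {
    awk '/^Membership information/ { in_members = 1; next }
         in_members && $1 ~ /^[0-9]+$/ { print $3 }'
}

# Read corosync-quorumtool output on stdin; succeed if the Flags line says Quorate.
is_quorate() {
    grep -Eq '^Flags:.*Quorate'
}
```

For example, corosync-quorumtool | list_members should print pve1, pve2, pve3 and probe for the cluster in this post.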

You may explore the configuration map as well: 5

corosync-cmapctl

We can explore the log and find:

journalctl -u corosync
  [TOTEM ] A new membership (1.294) was formed. Members joined: 1 2 3
  [KNET  ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
  [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
  [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
  [KNET  ] pmtud: Global data MTU changed to: 1397
  [QUORUM] This node is within the primary component and will provide service.
  [QUORUM] Members[4]: 1 2 3 99
  [MAIN  ] Completed service synchronization, ready to provide service.

We can check all the same on any of the real nodes as well.

What is this good for

This is a demonstration of how Corosync is used by PVE. We will end up with a dummy probe node showing in the GUI, but it will otherwise look like an inaccessible node - after all, there’s no endpoint for any of the API requests coming in. However, the probe will be casting votes as configured and can be used to further explore the cluster without disrupting any of the actual nodes.

Note that we have NOT installed any Proxmox component so far; nothing was needed other than what is in the Debian repositories.

We will ALSO use this probe in a follow-up post that will build the cluster filesystem.