Week 6 - Lab: Proxmox Cluster & High Availability
Module: Operating Systems 3 (Virtualisation & Cloud Technologies)
Estimated Time: 120 Minutes
Lab Type: Practical / Simulation
1. Objectives
By the end of this lab, you will be able to:
- Prepare a Multi-Node Environment: Clone or install a secondary Proxmox node.
- Create a Cluster: Initialize a Proxmox cluster using pvecm.
- Join Nodes: Connect a secondary node to the cluster.
- Configure High Availability: Set up HA Groups.
- Execute Live Migration: Move a running VM between nodes.
2. Prerequisites & Environment Prep
To perform clustering, you need two distinct Proxmox nodes.
Phase 0: Provision Cluster Nodes
Do this BEFORE starting Part A.
Since the lab computers are stateless (reset after use), we will start by creating two fresh Proxmox VE nodes.
- Create Node 1 (pve-01):
  - Create a new Virtual Machine.
  - Resources: 2 vCPU, 4GB RAM (minimum), 20GB Disk.
  - Install Proxmox VE:
    - Hostname: pve-01.lab
    - IP Address: 192.168.1.100 (or similar)
    - Gateway/DNS: Match your lab network.
- Create Node 2 (pve-02):
  - Create a second new Virtual Machine.
  - Resources: 2 vCPU, 4GB RAM (minimum), 20GB Disk.
  - Install Proxmox VE:
    - Hostname: pve-02.lab
    - IP Address: 192.168.1.101 (increment the last octet).
Validation:
- Node 1 (pve-01): 192.168.1.100 (Example)
- Node 2 (pve-02): 192.168.1.101 (Example)
- Test: SSH into Node 1 and ping Node 2. If it fails, fix networking before proceeding.
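The connectivity test above can be wrapped in a small helper so it gives a clear pass/fail result. This is a minimal sketch, not part of Proxmox: `check_peer` and the `PING_BIN` override are hypothetical names introduced for illustration, and the `-W` timeout flag assumes the Linux `ping` shipped with Proxmox VE.

```bash
# check_peer PEER_IP
# Pings the peer three times and reports whether it answered.
# PING_BIN is a hypothetical override hook (defaults to the real ping).
check_peer() {
  peer_ip="$1"
  if "${PING_BIN:-ping}" -c 3 -W 2 "$peer_ip" >/dev/null 2>&1; then
    echo "OK: $peer_ip is reachable"
  else
    echo "FAIL: $peer_ip is unreachable; fix networking before clustering" >&2
    return 1
  fi
}
```

On Node 1, `check_peer 192.168.1.101` would test the example address of Node 2; substitute your own address.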
3. Lab Steps
Part A: Creating the Cluster
Perform on Node 1 (The Master):
- Open the Shell.
- Initialize the cluster:
    pvecm create Lab-Cluster
- Verify:
    pvecm status
  You should see 1 node and Quorate: Yes.
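The quorum check above can also be scripted. A minimal sketch, assuming `pvecm status` prints a line that begins with `Quorate:` followed by `Yes` or `No`; `is_quorate` is a hypothetical helper name, not a Proxmox command.

```bash
# is_quorate: reads `pvecm status` output on stdin and succeeds
# only if it contains a "Quorate: Yes" line.
is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes[[:space:]]*$'
}
```

On a node, this could be used as `pvecm status | is_quorate && echo "cluster is quorate"`.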
Part B: Joining the Cluster
Perform on Node 2 (The Joiner):
- Critical Check: Ensure the hostnames are different (run hostname on each node). If both are pve, rename one now:
    hostnamectl set-hostname pve2
    nano /etc/hosts   # Change this node's entry to pve2; it should map the node's real IP, not 127.0.1.1
    reboot
- Join the cluster (use Node 1's IP):
    # You will be asked for the root password of Node 1
    pvecm add <IP_OF_NODE_1>
- Wait 1 minute. The web GUI on Node 2 might freeze. Access the GUI via Node 1 (https://<IP_OF_NODE_1>:8006).
- You should now see that "Datacenter" has two nodes.
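Instead of waiting a fixed minute, the join can be confirmed by polling the cluster status until both nodes are listed. The sketch below assumes `pvecm status` prints a `Nodes:` line with the member count; `node_count`, `wait_for_nodes`, and the `STATUS_CMD` override are hypothetical names used for illustration only.

```bash
# node_count: reads `pvecm status` output on stdin and prints
# the value of its "Nodes:" line.
node_count() {
  awk '$1 == "Nodes:" { print $2; exit }'
}

# wait_for_nodes EXPECTED [TRIES] [DELAY_SECONDS]
# Polls until the cluster reports at least EXPECTED nodes or the tries run out.
# STATUS_CMD is a hypothetical override hook (defaults to "pvecm status").
wait_for_nodes() {
  expected="$1"
  tries="${2:-12}"
  delay="${3:-5}"
  while [ "$tries" -gt 0 ]; do
    count=$(${STATUS_CMD:-pvecm status} 2>/dev/null | node_count)
    if [ "${count:-0}" -ge "$expected" ]; then
      echo "Cluster reports $count nodes"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $expected nodes" >&2
  return 1
}
```

Running `wait_for_nodes 2` on Node 1 after the join would poll for roughly one minute with the default settings.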
Part C: Live Migration
- Create a Test VM on Node 1.
  - Use a tiny ISO like Alpine Linux.
  - Important: Store the disk on Shared Storage (NFS/Ceph) if possible. If you only have Local storage, migration will take longer because the disk must be copied as well (similar to VMware's Storage vMotion).
  - Start the VM. Open the Console.
- Migrate:
  - Right-click the VM > Migrate.
  - Target Node: pve2 (your second node).
  - Mode: Online.
  - Click Migrate.
- Observe:
  - Watch the Task Log.
  - Watch the VM Console. Did it disconnect? (Ideally no).
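The same migration can be started from the shell with `qm migrate`. The sketch below only builds and prints the command by default, so it can be reviewed before anything runs. `live_migrate` and `DRY_RUN` are hypothetical names, and the VM ID 100 and node name used in the usage note are placeholders for your own values.

```bash
# live_migrate VMID TARGET_NODE [yes|no]
# Builds an online `qm migrate` command; the third argument adds
# --with-local-disks for VMs whose disks are on local storage.
# DRY_RUN=1 (the default here) prints the command instead of running it.
live_migrate() {
  vmid="$1"
  target="$2"
  local_disks="${3:-no}"
  set -- qm migrate "$vmid" "$target" --online
  if [ "$local_disks" = "yes" ]; then
    set -- "$@" --with-local-disks
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `live_migrate 100 pve-02 yes` prints the command for a VM on local storage, and `DRY_RUN=0 live_migrate 100 pve-02 yes` would run it on the node that currently hosts the VM.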
Part D: High Availability Simulation
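- Configure HA:
  - Datacenter > HA > Groups > Create HA-Group. Select both nodes.
  - Datacenter > HA > Resources > Add. Select your Test VM.
- Simulate Failure:
  - WARNING: This is disruptive.
  - Physically unplug power (or "Hard Stop" the nested VM) for Node 1. The VM must be running there; migrate it back first if Part C moved it to Node 2.
- Observe Recovery:
  - Watch Node 2's GUI.
  - Within 2-3 minutes, the cluster detects that Node 1 is dead and fences it.
  - The VM should then be restarted on Node 2. This requires that the VM's disk is on shared storage and that the surviving part of the cluster still has quorum. A two-node cluster loses quorum when one node fails unless a third vote (such as a QDevice) is available.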
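Whether recovery can happen depends on quorum: the surviving nodes must hold a strict majority of the cluster's votes. The arithmetic is sketched below, assuming one vote per node and no extra vote from a QDevice; `votes_needed` and `has_quorum` are hypothetical helper names.

```bash
# votes_needed TOTAL: smallest strict majority of TOTAL votes.
votes_needed() {
  echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL ALIVE: succeeds when ALIVE votes form a majority of TOTAL.
has_quorum() {
  [ "$2" -ge "$(votes_needed "$1")" ]
}

# Two nodes, one survivor: 1 vote of 2 is not a majority.
has_quorum 2 1 || echo "2-node cluster, 1 survivor: no quorum"
# Three nodes, two survivors: 2 votes of 3 is a majority.
has_quorum 3 2 && echo "3-node cluster, 2 survivors: quorum held"
```

This is why a two-node lab cluster may not restart the VM on its own after Node 1 is powered off, while a cluster with a third vote can.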
4. Troubleshooting
"Task Error: migration aborted"
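- Do you have shared storage? If not, did you enable migration with local disks (the --with-local-disks option of qm migrate)?
- Are the CPUs different (Intel vs AMD)? Live migration between different CPU vendors or models can fail unless the VM uses a CPU type common to both nodes.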
"Quorum information missing, see syslog"
- Your corosync network (multicast on older releases) is broken. Check connectivity and latency between the nodes.
5. Interactive Checkpoint
- Verify pvecm status shows multiple nodes.
- Verify the Migration Task Log shows "TASK OK".