March 17, 2026
6 min read
# The Day a Three-Node Cluster Refused to Trust Itself
## When “High Availability” Suddenly Feels Fragile
Running a three-node virtualization cluster usually brings a comforting sense of redundancy. You reboot one node, the other two keep everything alive. Maintenance mode migrates virtual machines smoothly. The system behaves like a well-choreographed dance: workloads slide between hosts while administrators sip coffee and watch dashboards.
Then someone tries something that feels perfectly reasonable: reboot two nodes at the same time after migrating all the VMs to the third host.
And suddenly the cluster behaves like it’s having an existential crisis.
The last node starts panicking. Services restart. Sometimes the host even reboots. Instead of calmly running every workload in the cluster, the final machine refuses to trust the situation. To the administrator watching from the console, it looks absurd. The hardware is powerful enough to run everything. The VMs are already migrated. Nothing is technically overloaded.
So why does the cluster revolt the moment two nodes disappear?
## The Invisible Rule: Majority Always Wins
One explanation comes from people who’ve lived inside distributed systems for years. They point to a quiet rule that sits beneath most high-availability clusters: **majority rule**.
In a three-node setup, the cluster expects at least two nodes to agree about reality. That agreement forms something called *quorum*. Lose quorum, and the system no longer trusts its own state.
One engineer explained it bluntly: when two nodes disappear, the remaining node assumes the problem might actually be itself. From its perspective, the other machines haven’t vanished — it might simply be isolated.
This sounds paranoid, but that paranoia exists for a reason. Distributed systems treat isolation as dangerous because it can lead to something worse than downtime: **split brain**.
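The rule each node applies can be sketched in a few lines of arithmetic. This is a simplified model of majority voting, not Proxmox's actual code, but it captures the decision the last node makes:

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """A node keeps running only if it can see a strict majority of votes."""
    needed = total_votes // 2 + 1
    return votes_seen >= needed

# Three-node cluster, one vote per node:
print(has_quorum(3, 3))  # all nodes visible  -> True
print(has_quorum(2, 3))  # one node missing   -> True, cluster survives
print(has_quorum(1, 3))  # two nodes missing  -> False, last node stops
```

Note that the node counts only what it can *see*, not what it knows "should" be true. Isolation and genuine failure look identical from the inside.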
## The Nightmare Scenario Engineers Fear
Imagine the cluster behaves the way administrators initially expect. Two nodes go offline. The last node keeps running every VM without complaint.
Now imagine the opposite situation.
Instead of a planned reboot, the network link between nodes fails. The cluster splits into two groups that can’t see each other. If both sides believe they’re the legitimate cluster, they might start running the same virtual machines simultaneously.
That’s how you end up with duplicate databases writing to the same storage. Two copies of a VM processing transactions. Filesystems corrupted beyond repair.
One admin described the outcome in colorful terms: suddenly you have *two of every VM running*, and the day goes downhill fast.
To prevent that nightmare, cluster software follows a strict rule. If a node can’t confirm it’s part of the majority, it refuses to run workloads. Even if that means shutting down services that were perfectly healthy seconds earlier.
From the system’s perspective, stopping everything is the safer option.
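The elegance of majority rule is that it makes the duplicate-VM scenario mathematically impossible: two disjoint partitions can never both hold a strict majority. A toy model (illustrative only, not cluster-manager code) makes this concrete:

```python
def surviving_partitions(partition_sizes: list[int], total_votes: int) -> list[int]:
    """With majority rule, at most one partition can keep running workloads."""
    needed = total_votes // 2 + 1
    return [size for size in partition_sizes if size >= needed]

# 3-node cluster splits 2 vs 1: only the two-node side keeps quorum.
print(surviving_partitions([2, 1], 3))     # [2]
# A clean three-way split: nobody has a majority, everything stops.
print(surviving_partitions([1, 1, 1], 3))  # []
```

Two partitions both meeting the majority threshold would require more votes than the cluster actually has, so the "two of every VM" outcome can't happen under this rule.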
## Why Maintenance Mode Doesn’t Solve It
Many administrators assume maintenance mode should change the equation. After all, placing nodes in maintenance mode migrates workloads and signals the cluster that those machines are intentionally stepping away.
That logic feels reasonable: if the cluster knows two nodes are deliberately offline, the remaining host should keep running the VMs.
But the cluster doesn’t see it that way.
Maintenance mode handles workload migration and scheduling decisions. It doesn’t change the fundamental quorum rules governing cluster safety. Once two nodes disappear from communication, the remaining node still loses majority agreement.
From the software’s perspective, there’s no reliable proof the other hosts are truly offline rather than simply unreachable.
Distributed systems operate on one harsh assumption: **networks lie all the time**.
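In Proxmox, this rule lives in corosync's votequorum service rather than in the HA scheduler, which is why maintenance mode can't override it. The relevant section of `/etc/pve/corosync.conf` looks roughly like this (values illustrative for a three-node cluster):

```
quorum {
  provider: corosync_votequorum
  expected_votes: 3
}
```

You can inspect the live vote count with `pvecm status`. There is an escape hatch for planned situations like this one: `pvecm expected 1` tells the last node to expect only one vote, restoring quorum by hand. But that command is the administrator explicitly accepting the split-brain risk the cluster was refusing to take on its own, so treat it as an emergency lever, not a routine step.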
## The VMware Comparison That Always Appears
Whenever this behavior surprises someone, comparisons to other virtualization platforms appear almost instantly.
In some environments, administrators are used to mechanisms like datastore heartbeating or storage locks. Those systems allow hosts to verify whether another machine still owns a VM or has lost access to shared storage.
That approach creates a different style of failure detection. Instead of relying primarily on cluster quorum, the system looks at shared storage signals to determine whether a VM is still alive elsewhere.
Some engineers prefer this model. Others see it as trading one set of risks for another.
As one commenter pointed out, every platform makes different design choices about where trust comes from: network consensus, storage locks, or some hybrid combination.
None of them are perfect.
## The Brutal Simplicity of Proxmox’s Philosophy
The design philosophy behind many open-source clusters is almost stubbornly simple.
If a node can’t confirm it’s part of the majority, it assumes something dangerous might be happening and shuts things down. No clever guessing. No optimistic assumptions.
Just stop.
This approach prioritizes consistency and data safety above convenience. It means administrators must follow strict operational patterns. Reboot nodes one at a time. Maintain quorum. Avoid situations where the cluster loses its voting majority.
Some people find that frustrating. It feels like the system refuses to trust obvious intentions during maintenance windows.
But from a distributed-systems perspective, the cluster is behaving exactly as designed.
It’s protecting data from scenarios humans often underestimate.
## The Hidden Cost of Small Clusters
This situation also highlights a quiet truth about small clusters.
A three-node cluster can only tolerate one failure at a time. The moment two nodes disappear, quorum is gone.
That limitation leads to the advice that inevitably surfaces in conversations like this: if you want the cluster to survive losing two nodes, you need more nodes.
It sounds almost dismissive, but mathematically it’s accurate. Larger clusters increase the number of failures that can occur without breaking majority consensus.
A five-node cluster can lose two nodes and still maintain quorum. A seven-node cluster can lose three.
With three nodes, there’s simply no margin.
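The arithmetic behind those numbers is simple enough to fit in a few lines (a sketch of the general rule, assuming one vote per node and no external quorum device):

```python
def failures_tolerated(nodes: int) -> int:
    """Nodes that can fail while a strict majority still remains."""
    majority = nodes // 2 + 1
    return nodes - majority

for n in (3, 5, 7):
    print(f"{n} nodes -> survives {failures_tolerated(n)} failure(s)")
# 3 nodes -> survives 1, 5 nodes -> survives 2, 7 nodes -> survives 3
```

This is also why even-sized clusters are poor value: four nodes tolerate the same single failure as three, because the majority threshold rises with them.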
## The Moment the Design Finally Makes Sense
For administrators encountering this behavior the first time, the system feels overly cautious. After all, they know exactly what they’re doing. They deliberately placed nodes into maintenance mode. The scenario is controlled.
But distributed systems don’t rely on human certainty. They rely on rules that survive chaotic conditions — broken switches, network partitions, crashed hosts, corrupted memory.
Those rules often look stubborn during routine maintenance.
Yet they exist because someone, somewhere, once lost an entire cluster to split brain.
So the next time a three-node cluster refuses to keep running after two nodes disappear, it’s not panicking.
It’s doing something far less dramatic.
It’s simply refusing to guess.