Kubernetes · DevOps · Cluster Management · EKS · AKS
# Who Needs Blue-Green? Tales from the Trench of Live Cluster Upgrades
July 22, 2025
7 min read
Upgrading a Kubernetes cluster live might sound like playing with fire — but if you ask enough engineers, you'll find a surprisingly large number who keep the matches handy.
In an ideal world, we'd all do blue-green deployments with surgical precision: spin up a new cluster, mirror the environment, run all the integration tests, and flip the switch only when the stars align. But in the real world? Many teams are saying, "Screw it. In-place it is."
And honestly, it's working more often than you'd expect.
## The Lazy Path Isn't Always the Wrong One
The confession that kicked off this wave of camaraderie was simple: "Upgrading cluster in-place coz I am too lazy to do blue-green."
There was no grand architectural excuse or elaborate risk assessment. Just a human moment of "I don't wanna." And the response? Over 700 upvotes and a comment section that read like group therapy for every DevOps engineer who has ever pushed changes on a Friday.
Turns out, people are doing this a lot. And they're surviving to tell the tale.
## The Reality of Cluster Upgrades in the Wild
You'd expect horror stories of broken deployments and panicked rollbacks — and sure, there are a few — but mostly, it's just a mix of dry humor, tribal knowledge, and a shared understanding that nobody has the time or budget for pristine infrastructure hygiene.
One commenter put it bluntly: "I've been doing in-place for years. Been looking to blue/green maybe 2026."
Another chimed in: "Yeah AKS here... we've done in place since the get go... we have enough environments to test it all out first." That's a common thread: test thoroughly, hold your breath, and hit upgrade.
But don't mistake the casual tone for recklessness. Many of these engineers have a game plan, even if it's more trench warfare than surgical strike. They stagger node pool upgrades, drain nodes cautiously, double-check CRDs, and often rehearse the upgrade on staging first.
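For the curious, that staggered, drain-first routine boils down to a handful of kubectl steps. This is a rough sketch only; the node name, timeout, and ordering below are illustrative, not a prescription for your cluster.

```bash
# Sketch of a cautious per-node rollout (node names are placeholders).
# 1. See what's running where, and which CRDs you're carrying, before touching anything.
kubectl get nodes -o wide
kubectl get crd

# 2. Cordon first so no new pods land on the node about to be replaced.
kubectl cordon node-pool-a-1

# 3. Drain workloads off it, respecting PodDisruptionBudgets.
kubectl drain node-pool-a-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=10m

# 4. Upgrade or replace the node (cloud-specific step), then let it rejoin.
kubectl uncordon node-pool-a-1

# 5. Watch workloads settle before moving on to the next node.
kubectl get pods -A --field-selector spec.nodeName=node-pool-a-1
```

Rehearsing exactly this loop on a staging cluster first is what separates "lazy" from reckless.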
## "All in Default" and Other Bold Moves
The MVP award for chaos comfort goes to the folks running dev, QA, and prod in one cluster — sometimes even in the same namespace. One comment nailed the vibe: "Yes, it's also the dev and QA cluster." Followed immediately by "Real ones even use one namespace for all three."
You can hear the collective gasp of every compliance officer reading this. And yet, the sky hasn't fallen — at least not yet.
Some admitted they don't even have a separate dev cluster. One user summed it up: "I don't have a dev cluster, does that answer your question?" Yes. Yes it does.
## The Cost of Doing It "Right"
Here's the thing: blue-green deployments aren't just about best practices. They're about resources. Time, money, manpower. And many teams — especially startups or smaller engineering orgs — are forced to weigh the theoretical best against the practical "good enough."
One architect called out the elephant in the room: "The head architect likes to tell me we aren't mature because we don't have blue-green or a backup cluster running. I have to remind him we started out that way but stopped due to costs… complexity."
That's not laziness. That's tradeoffs. The kind every engineer makes every day.
## EKS, AKS, and the Comfort Blanket of Managed Control Planes
There's a noticeable confidence boost among users of managed services like EKS and AKS. When someone else is watching the control plane, it's easier to say, "Yeah, let's just do it live."
As one contributor wrote: "Managing EKS now, and previous job self-managed — both in-place are fine, just read the breaking changes beforehand."
The real headache? Node upgrades. Especially if you're running things like Jenkins that don't play nicely with moving targets. Or when you're on self-managed setups where every upgrade step feels like you're disarming a bomb while blindfolded.
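To make that managed-plane comfort concrete, the usual path on EKS or AKS is "control plane first, then node pools." A hedged sketch follows; the cluster, node group, and resource group names, along with the version number, are placeholders, not recommendations.

```bash
# Illustrative managed-control-plane upgrade flow; all names and versions are examples.

# EKS via eksctl: bump the control plane, then each managed node group.
eksctl upgrade cluster --name my-cluster --version 1.31 --approve
eksctl upgrade nodegroup --cluster my-cluster --name workers --kubernetes-version 1.31

# AKS via the Azure CLI: same idea, control plane first, then node pools.
az aks upgrade --resource-group my-rg --name my-cluster \
  --kubernetes-version 1.31 --control-plane-only
az aks nodepool upgrade --resource-group my-rg --cluster-name my-cluster \
  --name nodepool1 --kubernetes-version 1.31
```

The provider handles the API server dance; the part that can still bite you is whatever is running on those node pools while they roll.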
## "Learning Is Fun" (Until It's Not)
The stories roll in — from devs who deleted nodes thinking it would speed up the upgrade, only to realize too late they had no functioning node pools, to those who handed every dev a cluster-admin certificate and hoped for the best.
One veteran summed up the attitude: "Sometimes it's better to just jump and figure it out. If it's production, I bet you figure it out by morning." A statement that feels less like advice and more like a DevOps dare.
## So... Who Really Needs Blue-Green?
Is blue-green a luxury? A mark of maturity? Or just another checkbox in the great DevOps wishlist?
Maybe all three. The point isn't that blue-green is wrong — it's that not doing it doesn't make you wrong either.
What the trenches tell us is that teams are making it work. They're bending best practices to fit their realities. They're prioritizing test coverage, read-the-docs diligence, and staging dry runs — even if they're hitting "upgrade" on the same prod cluster that's handling traffic.
And they're doing it because that's what their infrastructure, their budget, and their business allow.
## The Real Lesson: Embrace the Chaos, But Respect It
This isn't a celebration of cowboy engineering. It's a recognition that real infrastructure work lives in the messy middle between theory and practice.
If you're upgrading in place, cool. Do it carefully. Know your dependencies. Drain your nodes. Read the breaking changes. And if you're lucky enough to have the luxury of blue-green, that's great — but it doesn't make you morally superior.
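If in-place is your path, a quick pre-flight pass covers most of that checklist. Again, just a sketch; the commands are standard kubectl, and the deprecation scanners mentioned in the comments are optional extras, not requirements.

```bash
# Pre-upgrade sanity pass (illustrative; adapt to your own setup).
kubectl version                     # confirm client/server version skew
kubectl get nodes                   # every node Ready, on the version you expect
kubectl get pdb -A                  # PodDisruptionBudgets that could block a drain
kubectl get crd                     # CRDs whose API versions may need bumping
kubectl api-resources --verbs=list -o name   # spot-check what the API server serves
# Optionally run a deprecated-API scanner (e.g. kubent or pluto) before the jump.
```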
The key is knowing why you're doing it the way you're doing it.
Because at the end of the day, Kubernetes doesn't care if you're lazy, cautious, or brilliant. It only cares if you break stuff.
And when you do — not if — here's hoping your DNS shifts fast, your CRDs behave, and your rollback plan isn't just "blame the ISP."