# Why MariaDB Operator 25.10 Is a Big Deal for Stateful Workloads on Kubernetes

*October 31, 2025 · 8 min read*
Running databases in Kubernetes has always felt a bit like trying to fit a square peg in a round hole. Kubernetes was designed for stateless applications, and for the longest time, databases—those pesky stateful, disk-hungry, fail-sensitive creatures—have been treated like second-class citizens in the cloud-native ecosystem.
But with the release of **MariaDB Operator 25.10**, things are starting to shift.
This update isn't just another bump in version numbers or a routine security patch. It's a significant leap forward, especially for teams that want to run **stateful workloads like MariaDB** inside Kubernetes without duct-taping together half-baked solutions. This release introduces **asynchronous replication as a fully supported feature**, adds **automated replica recovery**, and bakes in several smart operational improvements that make running production-grade MariaDB clusters inside Kubernetes not just possible—but sane.
## Asynchronous Replication Goes GA—And It's Actually Solid
The headline feature here is the **general availability (GA) of asynchronous replication**. That may not sound thrilling unless you've ever watched a database go sideways at 3 AM. For most users, asynchronous replication means something pretty straightforward: a primary database server does all the writes, and one or more replicas follow along, pulling changes over as fast as they can.
It's not bleeding-edge tech. MySQL and MariaDB have supported this for ages. But what's important is that **MariaDB Operator now understands it deeply**. You define a simple Kubernetes manifest, flip the replication switch, and boom—you've got a primary-replica setup.
```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
spec:
  storage:
    size: 1Gi
    storageClassName: rook-ceph
  replicas: 3
  replication:
    enabled: true
```
That's it. Behind the scenes, the operator sets up users, manages credentials, syncs the binary logs, and monitors replication lag. You don't need to babysit it. You just define the desired state and let the operator handle the dirty work.
## Failover You Can Actually Trust
Another big win? **Automated primary failover**.
If your primary pod dies—and it will, eventually—the operator **automatically picks the most up-to-date replica** and promotes it. This isn't some flaky hack that crosses its fingers and hopes the new primary has all the writes. The operator checks replication lag and relay log application status to ensure the candidate is clean.
Here's a sample of what that looks like during a failover:
```
NAME READY STATUS PRIMARY UPDATES
mariadb-repl False Switching primary to 'mariadb-repl-1' mariadb-repl-0 ReplicasFirstPrimaryLast
...
NAME READY STATUS PRIMARY UPDATES
mariadb-repl True Running mariadb-repl-1 ReplicasFirstPrimaryLast
```
That transition takes seconds. It's the kind of zero-touch recovery most teams wish they had when managing databases manually—except now it's baked in.
You can even control failover behavior with settings like `autoFailoverDelay` for tuning how aggressively the system promotes a new primary. That's huge for high-availability setups where uptime is measured in dollars per second.
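As a minimal sketch, that tuning might look like the manifest below. `autoFailoverDelay` is the setting named above, but its placement under `spec.replication.primary`, and the `automaticFailover` field, are assumptions that may differ across operator versions; check the CRD reference for yours:

```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
spec:
  replicas: 3
  replication:
    enabled: true
    # Assumed field placement; verify against the operator's API reference.
    primary:
      automaticFailover: true
      autoFailoverDelay: 10s  # wait before promoting, to ride out transient pod restarts
```

A short delay like this trades a few seconds of downtime for fewer unnecessary promotions when a primary pod is merely rescheduled rather than lost.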
## Replica Recovery That Doesn't Suck
Let's talk about the elephant in the cluster: **replica corruption**.
Anyone who's dealt with asynchronous replication knows the pain of error code **1236**—the dreaded "replica can't catch up because the primary purged the binary logs" situation. It's a silent killer that leaves your cluster in a weird limbo.
MariaDB Operator 25.10 solves this with **automated replica recovery** using a construct called `PhysicalBackup`. If a replica can't recover normally, the operator triggers a recovery flow that takes a volume-level snapshot from a healthy replica and restores it into the broken one. All without manual intervention.
And the best part? It actually works:
```
kubectl get mariadb
NAME READY STATUS PRIMARY
mariadb-repl False Recovering replicas mariadb-repl-1
...
kubectl get mariadb
NAME READY STATUS PRIMARY
mariadb-repl True Running mariadb-repl-1
```
Recovery time depends on your storage driver and data size, but it's typically fast enough that you don't need to scramble. For teams dealing with production-grade workloads, this is a godsend. It turns replica recovery from a 30-minute firefight into a non-event.
## Smarter, Safer Scaling and Backups
This release doesn't just stop at failover and recovery. It also offers **flexible strategies for scaling out**, including support for different backup methods. You can use fast, local **VolumeSnapshots** for rapid scaling, or switch to **mariadb-backup** for longer-term durability.
This gives teams more control over how they balance performance and reliability. For example, you can maintain one `PhysicalBackup` spec for nightly S3 backups and another for instant snapshot-based recovery. The operator supports both, and choosing the right one is as easy as plugging in a different template.
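As a sketch, those two templates might look like the following. `PhysicalBackup` is the construct named above, but the field names (`schedule`, `storage.s3`, `storage.volumeSnapshot`) and all values here are illustrative assumptions; consult the operator's API reference for the exact schema:

```yaml
# Nightly, durable backups shipped to S3 (names and values are illustrative).
apiVersion: k8s.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: mariadb-nightly-s3
spec:
  mariaDbRef:
    name: mariadb-repl
  schedule:
    cron: "0 2 * * *"
  storage:
    s3:
      bucket: mariadb-backups
      endpoint: s3.us-east-1.amazonaws.com
---
# Fast, local VolumeSnapshots for scale-out and replica recovery.
apiVersion: k8s.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: mariadb-fast-snapshots
spec:
  mariaDbRef:
    name: mariadb-repl
  storage:
    volumeSnapshot:
      volumeSnapshotClassName: rook-ceph-snapclass
```

Keeping both defined means the slow, durable path and the fast, local path are each one `kubectl apply` away.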
## The Community's Fingerprints Are All Over This
It's worth calling out that much of what makes 25.10 so good came from **real-world feedback**. Users in the open-source community reported issues with early replication support, submitted manual recovery runbooks, and pushed the maintainers to refine the operational experience.
The maintainers—especially mmontes11, who appears to be spearheading a lot of the development—deserve props for listening and iterating. You can feel the difference between a "built-in-a-bubble" feature and one forged through actual production use.
As one user noted in the release discussion, many features exist today **because people kept breaking their clusters and wanted better recovery paths**. That kind of evolution is rare in projects trying to be everything to everyone.
## Not Perfect—But Getting Close
There's still some room to grow. Right now, the operator only supports replication within a **single Kubernetes cluster**, not across clusters or regions. That's a limitation for teams building multi-region failover systems. But given how fast things are moving, cross-cluster support feels more like a "when," not "if."
There are also the usual caveats about performance. If you're running Kubernetes on-prem, **local storage is a must**: networked volumes can become a bottleneck, especially under write-heavy workloads. And as the maintainer put it, "Don't make any assumptions—run sysbench."
Still, even with those constraints, MariaDB Operator 25.10 brings a level of confidence that stateful workloads inside Kubernetes have often lacked. It's not a bolt-on experiment anymore. It's production-ready, thoughtfully built, and backed by a community that clearly cares.
## TL;DR
MariaDB Operator 25.10 doesn't just support asynchronous replication. It **makes it work the way you'd want it to**—automatically, intelligently, and resiliently. With features like:
- General availability of async replication
- Automated failover to the most up-to-date replica
- Snapshot-based replica recovery on error code 1236
- Flexible backup strategies for different use cases
…it's a milestone release for anyone looking to move stateful workloads into Kubernetes with minimal drama.
If you've been waiting for a sign that running a database in k8s isn't reckless, this is it.