    Kubernetes
    Tooling
    Production

    What Actually Goes Wrong in Kubernetes Production?

    February 1, 2026
    4 min read
## What started the discussion

Hey Kubernetes folks,

I’m curious to hear about real-world production experiences with Kubernetes. For those running k8s in production:

- What security issues have you actually faced?
- What observability gaps caused the most trouble?
- What kinds of things have gone wrong in live environments?

I’m especially interested in practical failures, not just best practices.

Also, which open-source tools have helped you the most in solving those problems? (Security, logging, tracing, monitoring, policy enforcement, etc.)

Just trying to learn from people who’ve seen things break in production. Thanks!

## What stood out in the comments

### Discussion point 1

There was a time when everything was recorded here: [https://k8s.af/](https://k8s.af/)

I still laugh about it today because I've already been through some of them.

### Discussion point 2

The subnet created for k8s was too small. We hit its limits and needed a new, larger subnet IP range, which required a whole lot of new firewall requests.

### Discussion point 3

Accidentally added ~60 machines to the apiserver pool instead of the node pool; etcd got REALLY angry and collapsed under its own weight. That day, I learned two things:

- The workloads will largely continue to operate in their last-known state for a surprisingly long time if the control plane goes down. Nothing can recover or move, but they'll keep chugging along in place.
- If you shut down all but one member of the pre-change apiservers, you can hand etcd its own data directory as a backup/snapshot and it'll happily restore the cluster data _without_ etcd membership, then rejoin the other members that you want in the etcd cluster.

### Discussion point 4

DockerHub rate limits are a major chicken and egg.

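
One way to see a Docker Hub pull limit coming before image pulls start failing is to watch the `ratelimit-limit` and `ratelimit-remaining` response headers that Docker Hub returns on manifest requests. The sketch below only parses those header values, assuming the `count;w=window_seconds` format (for example `100;w=21600`). The function names and the 20% reserve threshold are hypothetical choices for illustration, and fetching the headers from the registry is left out.

```python
def parse_rate_limit(header_value: str) -> tuple[int, int]:
    """Parse a value such as '100;w=21600' into (count, window_seconds).

    Assumes the 'count;w=seconds' format; the window is reported as 0
    if the 'w=' part is missing.
    """
    count_part, _, window_part = header_value.partition(";")
    count = int(count_part.strip())
    window_part = window_part.strip()
    window = int(window_part[2:]) if window_part.startswith("w=") else 0
    return count, window


def should_throttle(limit_header: str, remaining_header: str,
                    reserve: float = 0.2) -> bool:
    """Return True when remaining pulls are at or below the reserve share.

    'reserve' is a hypothetical safety margin, not a Docker Hub setting.
    """
    limit, _ = parse_rate_limit(limit_header)
    remaining, _ = parse_rate_limit(remaining_header)
    return remaining <= limit * reserve
```

A check like this could feed an alert, so that a pull-through cache or registry mirror is in place before nodes that need fresh images are the ones getting throttled.
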
### Discussion point 5

Windows and Kubernetes is one of the circles of hell Dante was talking about.

## Thread snapshot

- Original subreddit: r/kubernetes
- Original author: u/Apple_Cidar
- Reddit score: 121
- Comment count: 86
- Original thread: https://www.reddit.com/r/kubernetes/comments/1r7t6lv/what_actually_goes_wrong_in_kubernetes_production/
