    Zabbix
    FinOps
    Infrastructure Cost
    Capacity Planning

    This Zabbix Module Quietly Exposes How Much Money Your Infrastructure Is Burning

    April 3, 2026
    4 min read
## The Cost Nobody Sees Until It’s Too Late

Infrastructure waste rarely shows up as a loud failure. There’s no alert screaming that a server is oversized, no red dashboard warning that half your CPU sits idle day after day. Instead, it quietly drains budgets in the background.

That’s exactly the gap this FinOps module tries to fill: not by adding new data, but by reinterpreting what’s already there. By analyzing 30 days of metrics (CPU, memory, network, and load), the tool surfaces something most teams overlook: underutilization. Not in vague terms, but in measurable signals, a “Waste Score” and an “Efficiency Score.” It’s less about performance and more about accountability. And that shift changes how monitoring feels.

## Turning Metrics Into Money Conversations

Most monitoring setups stop at health. Is the system up? Is it stable? This module pushes further, asking a more uncomfortable question: is it worth what it costs? Instead of just flagging overprovisioned machines, it suggests specific downsizing actions: reducing vCPUs, trimming memory, making concrete adjustments.

That’s where things get interesting. One perspective frames it as overdue: “Finally something that translates metrics into actual decisions.” But there’s hesitation too. Another voice leans cautious: “Right-sizing sounds great until you cut too much and performance tanks.” That fear isn’t irrational. Infrastructure decisions aren’t just technical; they carry risk, and not every team is ready to automate that judgment.

## The 95th Percentile: Smarter or Riskier?

One of the module’s core ideas is sizing against the 95th percentile instead of peak usage. It’s a subtle change, but it carries weight: a single spike won’t block optimization decisions, which makes sense in theory.

Some see it as a necessary correction.
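The difference is easy to see in a few lines. The sketch below is purely illustrative (the simulated data, the nearest-rank method, and the `percentile` helper are assumptions for the example, not the module’s actual code), but it shows why a single spike dominates a peak-based sizing decision while barely moving the 95th percentile:

```python
# Hypothetical illustration: sizing on the 95th percentile vs. the peak.
import math
import random

random.seed(42)

# Simulated 30 days of 5-minute CPU utilization samples (percent):
# steady low usage, plus one short spike.
samples = [random.uniform(5, 25) for _ in range(8640)]
samples[100] = 97.0

def percentile(values, p):
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[rank]

peak = max(samples)            # 97.0 — set entirely by one 5-minute spike
p95 = percentile(samples, 95)  # ≈ 24 — reflects sustained demand

print(f"peak: {peak:.1f}%, p95: {p95:.1f}%")
```

Sizing to the peak keeps capacity reserved for an event that lasted five minutes; sizing to the 95th percentile accepts brief saturation in exchange for a smaller, cheaper instance. Which trade-off is right depends on what that five-minute window actually was.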
“Why should a five-minute spike justify overpaying for months?” That argument hits hard, especially in cloud-heavy environments. Others aren’t convinced. “Those spikes exist for a reason,” someone might argue. “Ignore them, and you might be ignoring the exact moment your system actually needs capacity.” It’s the classic trade-off: efficiency versus safety. There’s no universal answer, only context.

## When Automation Meets Real-World Complexity

The module doesn’t blindly recommend downsizing. It checks for growth trends, compares usage over time, and even considers other bottlenecks, such as network or disk, before suggesting changes. That layered approach gives it more credibility.

Still, questions surface quickly. Concerns about database load, especially when pulling large amounts of historical data, highlight a practical risk. “I always get nervous about large DB reads,” one comment notes, pointing to potential performance impact. The response is reassuring but limited: tested on environments with over 300 hosts, no noticeable issues. That’s solid, but not universal. Scaling concerns don’t disappear; they just move further down the road.

## Adoption Friction: Timing, Versions, and Trust

Even when a tool makes sense, adoption isn’t automatic. One user points out a simple blocker: the module targets a non-LTS release of Zabbix, so they’ll wait for the next stable release. That’s a reminder that technical merit alone doesn’t drive usage; timing matters.

There’s also a trust barrier. Tools that suggest cost-cutting changes inherently challenge existing setups. Accepting those recommendations means admitting that resources have been wasted, sometimes for years. And that’s not always an easy conversation to have.

## A Different Kind of Monitoring Mindset

What makes this module stand out isn’t just its functionality; it’s the mindset it introduces. Monitoring stops being purely reactive and starts becoming financial.
Systems aren’t just “healthy” or “unhealthy.” They’re efficient or wasteful.

Some teams will embrace that shift. They’ll see it as a way to align engineering with cost awareness, to make smarter decisions without adding external tools. Others will resist it. “Monitoring should stay about uptime,” one might argue. “Cost optimization belongs somewhere else.” That divide reflects a broader change happening across infrastructure teams, because once cost becomes visible inside the same interface as performance, it’s harder to ignore.

And maybe that’s the real impact here. Not the scores, not the recommendations, but the quiet realization that every idle CPU cycle has a price tag attached to it.