    Observability
    Metrics
    Prometheus
    OpenTelemetry

    The Observability Treasure Hunt: Why Engineers Still Don't Know What Metrics Their Systems Actually Emit

    March 10, 2026
    6 min read
## The Hidden Chaos Behind Modern Metrics

Observability promises clarity. Dashboards, alerts, and traces should give engineers a clear picture of what’s happening inside their systems. But ask a surprisingly simple question—*what metrics does our stack actually emit?*—and things quickly fall apart. One engineer described this problem perfectly while building a tool to solve it.

Modern observability stacks generate thousands of metrics across different layers: application instrumentation, exporters, Kubernetes components, and cloud platforms. Yet those metrics are scattered across hundreds of repositories, documentation pages, and instrumentation libraries.

The result is a strange paradox. Systems produce enormous amounts of telemetry, yet engineers often struggle to understand what signals even exist.

This isn’t a small inconvenience. It affects how teams design monitoring strategies, how they build alerts, and how they debug incidents. And it reveals a deeper flaw in how observability ecosystems evolved.

## The Metric Sprawl Nobody Planned For

Modern infrastructure stacks are assembled like digital Lego sets. A typical environment might include Kubernetes clusters, application services written in multiple languages, managed databases, message queues, and dozens of supporting components. Each of those systems exposes its own metrics through different mechanisms.

Some metrics come from OpenTelemetry instrumentation embedded in application code. Others come from Prometheus exporters—tiny services designed to expose metrics for systems like PostgreSQL, Redis, Kafka, or MySQL. Then Kubernetes adds its own signals through tools like kube-state-metrics and cAdvisor. Cloud providers add yet another layer with metrics from services like EC2, Lambda, S3, and API gateways.

Each source uses a slightly different format.
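Despite the sprawl upstream, most Prometheus-style components do converge on one thing at scrape time: the text exposition format served at `/metrics`. As a rough illustration, the `# HELP` and `# TYPE` comment lines alone are enough to build a small catalog of what an exporter emits. This is only a sketch: the payload below is a made-up sample, and `catalog_metrics` is a hypothetical helper, not part of any real tool.

```python
import re

# A snippet of the Prometheus text exposition format, as an exporter would
# serve it at /metrics (names follow node_exporter conventions; the values
# are made up).
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
# HELP node_memory_MemAvailable_bytes Memory available in bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 8.2e+09
"""

def catalog_metrics(exposition: str) -> dict:
    """Build {metric_name: {"help": ..., "type": ...}} from # HELP/# TYPE lines."""
    catalog = {}
    for line in exposition.splitlines():
        match = re.match(r"# (HELP|TYPE) (\S+) (.+)", line)
        if match:
            kind, name, rest = match.groups()
            catalog.setdefault(name, {})[kind.lower()] = rest
    return catalog

for name, meta in catalog_metrics(SAMPLE).items():
    print(f"{name} ({meta['type']}): {meta['help']}")
```

A real inventory pass would fetch each endpoint over HTTP, but the parsing step stays this simple.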
Some metrics are defined in YAML metadata files. Others are embedded directly inside source code. Python libraries may use decorators or instrumentation APIs. In many cases the only documentation exists as scattered comments in repositories.

Put all that together and the observability ecosystem starts to resemble an archaeological dig. The data exists. Finding it is the hard part.

## A Project That Tried to Map the Chaos

Faced with this fragmentation, one engineer built a tool that feels almost obvious in hindsight: a public metric registry. Instead of manually searching through dozens of repositories, the registry scans source code, documentation, and metadata files across observability ecosystems. The result is a searchable catalog containing thousands of metrics from OpenTelemetry, Prometheus exporters, Kubernetes components, and cloud services.

The current dataset includes over 3,000 metrics gathered from a wide range of sources. OpenTelemetry Collector components alone contribute more than a thousand metrics. Prometheus node exporters add hundreds more. Redis exporters, MySQL exporters, and PostgreSQL exporters each introduce their own sets of telemetry signals. Then there are metrics from AWS CloudWatch covering services like Lambda, DynamoDB, S3, and application load balancers.

Individually, these metrics are documented somewhere. But until projects like this appear, they rarely exist in one place. That’s the problem the registry tries to solve.

## The Real Value: Planning Before You Instrument

When engineers first hear about a metric registry, they often assume its purpose is documentation. But one comment from the community revealed a deeper use case. The real value isn’t just seeing what metrics already exist. It’s understanding what metrics *would appear* if you deployed a certain exporter or instrumentation library.

That distinction matters more than it sounds. Observability planning often happens before systems are deployed.
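One way to picture that planning workflow is as a what-if query against a registry. The sketch below is purely illustrative: `REGISTRY` holds a few hand-picked metric names standing in for a real registry’s dataset, and `metrics_if_deployed` is a hypothetical helper.

```python
# A tiny in-memory stand-in for a metric registry, keyed by source
# component. The component names are real exporters, but the metric lists
# are illustrative excerpts, not complete inventories.
REGISTRY = {
    "postgres_exporter": [
        "pg_up",
        "pg_stat_database_tup_fetched",
        "pg_stat_database_deadlocks",
    ],
    "node_exporter": [
        "node_cpu_seconds_total",
        "node_memory_MemAvailable_bytes",
    ],
    "kube-state-metrics": [
        "kube_pod_status_phase",
        "kube_node_status_condition",
    ],
}

def metrics_if_deployed(*components: str) -> list:
    """Answer the planning question: which signals would these components add?"""
    return sorted({m for c in components for m in REGISTRY.get(c, [])})

# What would deploying these two components give us to alert on?
print(metrics_if_deployed("postgres_exporter", "kube-state-metrics"))
```

The point is the query shape, not the data: given a component you have not yet deployed, the registry can enumerate the signals it would contribute.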
Engineers want to know what signals they’ll have available once monitoring tools are installed. For example:

- If you deploy the Prometheus PostgreSQL exporter, what database metrics will appear?
- If you instrument your service with OpenTelemetry, what request metrics will become available automatically?
- If you enable Kubernetes monitoring components, what cluster health signals can you collect?

Without a central reference, answering those questions requires digging through documentation and source code. A registry simplifies that process dramatically.

## The Everyday Pain Engineers Recognize

The reaction from other engineers was immediate. Many acknowledged that the scattered nature of observability metrics is one of the most frustrating parts of monitoring systems. Metric definitions live across countless GitHub repositories, exporter packages, and documentation sites. Even experienced engineers sometimes struggle to track down the meaning of a specific metric name.

One developer summed up the feeling perfectly: pulling Prometheus and OpenTelemetry metrics information into a single place makes it far easier to plan monitoring strategies before deploying new infrastructure.

That comment hints at something bigger. Observability has matured dramatically over the last decade. Tools for collecting telemetry are powerful. Storage systems can handle enormous volumes of data. But understanding the signals themselves is still surprisingly difficult.

## Why Observability Documentation Still Feels Fragmented

Part of the problem lies in how observability ecosystems evolved. Most exporters and instrumentation libraries were created independently by different communities. Each project documented its metrics in whatever format made sense at the time. Some use structured metadata files. Others embed descriptions directly in source code. Some rely on auto-generated documentation. Very few follow a universal standard.
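Definitions embedded in source code are the hardest of those to discover, but they can often be harvested mechanically. A toy sketch, assuming `prometheus_client`-style constructor calls and a deliberately naive regex (a production scanner would parse the AST instead):

```python
import re

# Illustrative application source; a real scanner would walk a whole
# repository tree. The metric names here are made up.
SOURCE = '''
from prometheus_client import Counter, Histogram

REQUESTS = Counter("app_requests_total", "Total HTTP requests handled.")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds.")
'''

# Naive pattern for Counter/Gauge/Histogram/Summary("name", "help") calls.
PATTERN = re.compile(r'(Counter|Gauge|Histogram|Summary)\(\s*"([^"]+)"\s*,\s*"([^"]+)"')

def scan_source(text: str) -> list:
    """Return (metric_type, name, help_text) for each definition found."""
    return [(m.group(1).lower(), m.group(2), m.group(3))
            for m in PATTERN.finditer(text)]

for mtype, name, help_text in scan_source(SOURCE):
    print(f"{name} ({mtype}): {help_text}")
```

Even this crude approach recovers the name, type, and description that would otherwise only surface once the service is running.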
Even OpenTelemetry, which aims to standardize telemetry across languages and frameworks, still contains many separate instrumentation packages maintained by different teams. That fragmentation isn’t anyone’s fault. But it creates a discovery problem for engineers trying to understand their monitoring systems.

## The Scale of Modern Metrics

One detail in the registry project highlights how quickly metrics multiply. The Prometheus node exporter alone exposes over five hundred system metrics covering CPU usage, memory pressure, disk I/O, and network activity. Kubernetes monitoring components add hundreds more signals about pods, nodes, and cluster state. Then application instrumentation adds request metrics, latency distributions, error rates, and custom business signals.

It’s easy for a moderately complex system to produce thousands of metrics. Yet most engineering teams actively monitor only a small fraction of them. The rest remain hidden until something goes wrong.

## Observability’s Next Problem: Discoverability

For years the observability industry focused on solving three technical challenges. How to collect telemetry. How to store telemetry. How to visualize telemetry. Those problems are largely solved. The next challenge might be discoverability. Engineers need better ways to understand the telemetry their systems can generate. They need tools that explain which signals exist, where they originate, and what they mean.

Projects like metric registries are an early step toward that goal. They turn observability from a guessing game into something closer to a searchable knowledge base.

## The Quiet Shift Toward Observability Knowledge Systems

As observability ecosystems grow more complex, tools that organize telemetry knowledge may become just as important as the tools that collect it. Instead of treating metrics as isolated signals, engineers may start thinking of them as structured information. Where did this metric originate? What system emitted it?
What does it measure? What alerts typically rely on it? Answering those questions quickly could make debugging faster and monitoring strategies more effective. And it might save engineers countless hours digging through repositories.

## The Real Lesson Behind the Registry

At first glance, a searchable catalog of metrics might sound like a niche project. But it exposes a deeper truth about modern infrastructure. We’ve become incredibly good at generating telemetry. We’re still learning how to understand it.

And until engineers can easily answer the question “what metrics does my system produce,” observability will remain part science, part detective work.