Ricky Arora

Advanced Metrics Optimization: Filter, Reduce, and Aggregate

Introduction

The massive growth of observability data isn’t limited to logs. Metrics are growing just as fast, or faster. Making matters worse, DevOps and Engineering teams aren’t only dealing with rising metrics volume driving up egress, storage, and compute costs; many tools also charge by the number of custom metrics they track. Because metric tags and tag values count toward many tools' custom-metric tally, all of this growth in metrics can cripple budgets.

Observo AI has several ways to help DevOps and Engineering teams control their metrics usage. In this article, we will review three metrics use cases that show how Observo AI's Observability Pipeline can massively reduce metrics volumes and custom metric counts so your teams can analyze all of the data that matters without impacting your budget. Optimizing metrics can also improve query performance and help DevOps and Engineering teams address the most important areas for potential improvement.

Filter out high cardinality, unqueried metrics

Metrics can add a huge amount of volume to your telemetry data, depending on how much an organization collects and how much cardinality those metrics have. Some organizations collect and store more than a hundred million metrics every day. Compounding the issue, many if not most metrics are never queried by your DevOps and Engineering teams. Observo AI can integrate directly with data feeds from metric stores like Datadog and Elasticsearch to identify metrics that have not been queried over a given period of time and use those insights to filter out unused metrics. This reduces ingestion costs and improves query performance in downstream metric stores. Observo AI can also provide insights into the cardinality of your metrics and help you identify and filter out the ones causing an explosion in cardinality. There is no need to change agents and collectors across thousands of endpoints; you simply stop paying for data that isn’t being used. Observo AI can also rehydrate metrics from your low-cost data lake whenever you need to analyze data that was previously filtered out.
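To make the idea concrete, here is a minimal sketch of that filtering logic in plain Python. Everything in it is illustrative: the sample records, the queried_recently set (which in practice might come from a metric store's query-usage data), and the cardinality threshold are assumptions, not Observo AI's actual configuration or API.

```python
from collections import defaultdict

# Hypothetical example records: each metric sample has a name and a set of tags.
samples = [
    {"name": "http.requests.count", "tags": {"service": "checkout", "status": "200"}},
    {"name": "jvm.gc.pause", "tags": {"service": "checkout"}},
    {"name": "debug.scratch.value", "tags": {"request_id": "a1b2c3"}},
]

# Names of metrics actually queried in the last N days, e.g. derived from the
# metric store's query-usage data (an assumption for this sketch).
queried_recently = {"http.requests.count", "jvm.gc.pause"}

# Track how many distinct tag-value combinations each metric produces,
# so metrics with exploding cardinality can be cut off.
series_seen = defaultdict(set)
CARDINALITY_LIMIT = 10_000  # illustrative threshold

def keep(sample):
    """Return True if the sample should be forwarded downstream."""
    name = sample["name"]
    if name not in queried_recently:
        return False  # never queried -> don't pay to ingest and index it
    series_key = tuple(sorted(sample["tags"].items()))
    series_seen[name].add(series_key)
    # Drop metrics whose tag combinations have grown past the limit.
    return len(series_seen[name]) <= CARDINALITY_LIMIT

forwarded = [s for s in samples if keep(s)]
print([s["name"] for s in forwarded])  # ['http.requests.count', 'jvm.gc.pause']
```

In a real pipeline, samples that fail the check would typically be routed to a low-cost data lake rather than discarded outright, so they can be rehydrated later if they turn out to matter.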

Reduce custom metrics by filtering out tags

Not all metrics are created equal, and those with extremely high cardinality can multiply the number of custom metrics you pay for with tools like Datadog, Elasticsearch, Splunk, and others. High cardinality also severely impacts the performance of metric stores. Every tag and tag value creates a new custom time series that can quickly sink your observability budget, even when those tag values offer very little analytical value; common examples include user IDs, email addresses, and other unbounded sets of label values. Observo AI can help tame this cardinality explosion with bespoke data transforms that limit or altogether filter out tags with a high number of values. Using the Observo metric limit enforcer, you can choose how many tag values to ingest before filtering them, and you can optionally define allow-lists and deny-lists of metric tags and tag values. This eliminates the high cost of ingesting and indexing these tag values so you can focus on the custom metrics that provide the insights your DevOps and Engineering teams need to optimize your enterprise.
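The sketch below illustrates how a tag limiter of this kind could work, again in plain Python with hypothetical names. The allow-list, deny-list, and per-tag value limit are illustrative assumptions, not Observo AI's actual settings.

```python
from collections import defaultdict

DENY_TAGS = {"user_id", "email", "request_id"}   # unbounded, low-value tags (assumption)
ALLOW_TAGS = {"service", "region", "status"}     # tags always worth keeping (assumption)
MAX_TAG_VALUES = 50                              # illustrative per-tag value limit

tag_values_seen = defaultdict(set)

def limit_tags(sample):
    """Strip or cap high-cardinality tags before the sample is forwarded."""
    kept = {}
    for tag, value in sample["tags"].items():
        if tag in DENY_TAGS:
            continue                              # drop the tag outright
        if tag in ALLOW_TAGS:
            kept[tag] = value                     # always keep allow-listed tags
            continue
        tag_values_seen[tag].add(value)
        if len(tag_values_seen[tag]) <= MAX_TAG_VALUES:
            kept[tag] = value
        else:
            kept[tag] = "__other__"               # collapse the long tail into one bucket
    return {**sample, "tags": kept}

sample = {
    "name": "http.requests.count",
    "tags": {"service": "checkout", "user_id": "u-8231", "endpoint": "/cart"},
}
print(limit_tags(sample)["tags"])  # {'service': 'checkout', 'endpoint': '/cart'}
```

Collapsing the long tail of values into a single bucket (here `__other__`) keeps aggregate counts intact while preventing each new value from creating yet another custom time series.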

Aggregate high-frequency metrics

Metrics provide deep insights into the observability of your IT environment, but not every metric gives you new information, especially when it is collected at high frequency and there is little variation from one value to the next. Sampling can reduce the number of metrics you send to your tools, but it risks dropping an interesting value along with all of the low-signal, normal ones. A better approach is to aggregate high-frequency metrics by summarizing all of the events across a specific time frame into a single event. Based on the specific metric, an ML model can summarize the data set with the max/min value, median, cumulative values, or another summarization suited to the characteristics of the data. The resulting event shows the time range summarized and the important values during that period. Aggregating high-frequency metrics can significantly reduce data volume without losing the insights your team needs.
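As a rough illustration, the sketch below rolls one window of samples for a single metric into one summary event carrying count, min, max, mean, and median. The choice of statistics is fixed here; as described above, that choice can instead be driven by the characteristics of the data.

```python
from statistics import mean, median

def aggregate_window(samples, window_start, window_end):
    """Summarize many samples from one time window into a single event."""
    values = [s["value"] for s in samples]
    return {
        "name": samples[0]["name"],
        "window_start": window_start,
        "window_end": window_end,
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

# Hypothetical example: 60 one-second CPU readings collapsed into one event.
cpu_samples = [{"name": "host.cpu.util", "value": 40 + (i % 5)} for i in range(60)]
summary = aggregate_window(cpu_samples, "2024-01-01T00:00:00Z", "2024-01-01T00:01:00Z")
print(summary["count"], summary["min"], summary["max"])  # 60 40 44
```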

Conclusion

Observo AI is the right observability pipeline to address the rapid growth of metrics data. By filtering out metrics that are never queried, reducing the cardinality of metrics with low-value tag values, and aggregating high-frequency metrics, you can dramatically reduce costs and improve the performance of the tools your Engineering and DevOps teams rely on.

If you are analyzing metrics using Datadog, Splunk, Elasticsearch, Dynatrace, or any other tools, Observo AI can help you optimize this data using the techniques described above. Don’t just take our word for it: schedule a demo today to see how we can help you save money and improve the performance and effectiveness of your DevOps and Engineering teams’ efforts.
