DEV Community

gaurang101197


Plotting Histogram Distribution Over Time in Grafana

Histogram Distribution Over Time

If you want to plot a histogram distribution over time, as shown in the image above, this post is for you. It does not cover the internals of histograms or Grafana.

Why Histogram Distribution Over Time

  • It shows how the distribution changes over time.
  • It makes it easy to find the time period in which the distribution was skewed.
  • A plain histogram summarizes the distribution and is useful for checking system performance at a glance, while the distribution over time helps to detect the time period in which performance degraded.

Pre-requisite

  1. Internals of histograms: https://prometheus.io/docs/practices/histograms/
  2. Hands-on experience with how Prometheus histograms work, and prior experience with Grafana, is helpful.

Use-case

Plot the latency distribution of any operation over time, e.g. API latency or DB latency.

Setup

  • Latency is measured with a Prometheus histogram.
  • The metric name is my_latency_metric.
  • The histogram buckets are [0, 80, 160, 320, 640, 1280, 2560, 5120].
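
To make the bucket layout concrete, here is a minimal Python sketch of how observations end up in cumulative le buckets. It is a plain simulation, not the Prometheus client library; the sample latencies are made up, and the +Inf bucket is the one Prometheus adds to every histogram.

```python
from bisect import bisect_left

# Upper bounds from the Setup section, plus the implicit +Inf bucket.
BOUNDS = [0, 80, 160, 320, 640, 1280, 2560, 5120, float("inf")]


def cumulative_buckets(observations):
    """Return {le: number of observations <= le}, like the *_bucket series."""
    per_bucket = [0] * len(BOUNDS)
    for value in observations:
        # bisect_left finds the first bound that is >= value.
        per_bucket[bisect_left(BOUNDS, value)] += 1

    counts, running = {}, 0
    for bound, n in zip(BOUNDS, per_bucket):
        running += n  # buckets are cumulative: each includes all smaller ones
        counts[bound] = running
    return counts


latencies = [12, 95, 150, 300, 700, 6000]  # made-up sample data
buckets = cumulative_buckets(latencies)
print(buckets)
```

Each observation is counted in every bucket whose bound is at or above its value, which is why the counts never decrease from one le to the next.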

Step 1: Panel visualization

Select Heatmap as the visualization in the Panel section, as shown in the image below.

Heatmap Panel

Step 2: Query



round(sum by (le) (increase(my_latency_metric_bucket{label_name=~"label_value"}[$__interval])))


  1. label_name=~"label_value" - [Optional] Filters the series by label; the =~ operator matches the value as a regular expression.

  2. increase - Calculates how much each bucket counter grew over the range window. $__interval is used so that Grafana automatically picks an interval that fits the panel's time range and width.

    Quote from the Prometheus documentation:

    increase(v range-vector) calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.

    increase acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the increase between the respective component in the first and last native histogram in v.

  3. sum by (le) - Sums the values per le, the label that holds each histogram bucket's upper bound. Suppose your API runs on k8s across multiple pods and each series carries a pod ID label. Every pod emits its own latency data, but we want a picture of the whole deployment, so sum by (le) adds up the per-pod increases for each bucket.

  4. round - increase can return a non-integer value, and fractional observation counts look odd in the panel. round converts every value to the nearest integer.
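
The four parts of the query can be sketched in Python for a single time window. This is a simplified illustration, not how Prometheus implements it: the pod names and counter values are hypothetical, and increase is reduced to "last sample minus first sample", ignoring the extrapolation and counter-reset handling described in the quote above.

```python
from collections import defaultdict

# Hypothetical cumulative bucket counters per pod at the start and end
# of one interval (only three buckets shown to keep the example short).
start = {
    "pod-a": {"80": 10, "160": 14, "+Inf": 15},
    "pod-b": {"80": 20, "160": 25, "+Inf": 25},
}
end = {
    "pod-a": {"80": 13, "160": 19, "+Inf": 21},
    "pod-b": {"80": 24, "160": 30, "+Inf": 31},
}


def increase_sum_by_le(first, last):
    """Approximate round(sum by (le) (increase(...))) for one window."""
    totals = defaultdict(float)
    for pod, pod_buckets in last.items():
        for le, value in pod_buckets.items():
            # Simplified increase: growth of the counter within the window.
            totals[le] += value - first[pod][le]
    # round() mirrors the outer round() in the query.
    return {le: round(total) for le, total in totals.items()}


result = increase_sum_by_le(start, end)
print(result)
```

In real data the per-pod increases are often fractional because of extrapolation, which is where the final rounding step matters.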

Step 3: Query Options

In the query options, select Heatmap as the Format and type {{le}} in Legend, as shown in the image below.

Query Option
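
As far as I understand the Heatmap format, Grafana sorts the series by bucket bound and converts the cumulative counts into per-bucket counts before drawing the cells. A minimal Python sketch of that conversion, using hypothetical values for one time window:

```python
def to_per_bucket(cumulative):
    """Turn cumulative le counts into per-bucket counts, ordered by bound."""
    # float("+Inf") parses to infinity, so the +Inf bucket sorts last.
    ordered = sorted(cumulative.items(), key=lambda item: float(item[0]))
    per_bucket, previous = {}, 0
    for le, count in ordered:
        per_bucket[le] = count - previous  # observations in this bucket only
        previous = count
    return per_bucket


# Hypothetical query result for one window, keyed by le (deliberately unsorted).
window = {"+Inf": 12, "80": 7, "160": 10}
cells = to_per_bucket(window)
print(cells)
```

Each value in cells corresponds to the color intensity of one heatmap cell in that time column.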

Step 4: Panel Query Options

Set Min interval to twice the scrape interval; in this example I used 1m. increase needs at least two samples inside each range window, and this setting keeps that true even if the scrape timing varies slightly.

Panel Query Options
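
The effect of the minimum interval can be checked with a small Python sketch. It assumes a 30s scrape interval (implied by the 1m setting, not stated explicitly) with made-up jitter in the scrape timestamps, and treats the range window as open on the left, which may differ between Prometheus versions.

```python
def samples_in_window(scrape_times, window_end, window_seconds):
    """Count scrapes with window_end - window_seconds < t <= window_end."""
    window_start = window_end - window_seconds
    return sum(1 for t in scrape_times if window_start < t <= window_end)


# Hypothetical scrape timestamps in seconds: roughly every 30s, with jitter.
scrapes = [0, 31, 59, 92, 120]

one_interval = samples_in_window(scrapes, window_end=100, window_seconds=30)
two_intervals = samples_in_window(scrapes, window_end=100, window_seconds=60)

print(one_interval)   # window equal to the scrape interval
print(two_intervals)  # window equal to twice the scrape interval
```

With a window equal to the scrape interval only one sample falls inside, so there is nothing to compute an increase from; doubling the window brings in a second sample.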

Reference

  1. https://grafana.com/blog/2020/06/23/how-to-visualize-prometheus-histograms-in-grafana/
