In this article, I demonstrate how to generate metrics from pod logs and create alerts on them in Prometheus. I used Prometheus and Fluentd, both installed via Helm charts in my Kubernetes cluster.
Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects and stores metrics as time-series data, providing powerful query capabilities and integrations for setting up alerts based on predefined thresholds.
Fluentd is an open-source data collector used to unify logging infrastructure. It allows users to collect, filter, and route log data from various sources, making it easier to manage logs across distributed systems, route them to storage and analysis back ends, or derive metrics from them for tools like Prometheus.
By the end of this article, you'll have a clear understanding of how to set up Prometheus and Fluentd in a Kubernetes environment to monitor pod logs, generate metrics, and create custom alerts.
I have an existing installation of Prometheus on my cluster. If you need to install it, the official Prometheus documentation is a good starting point.
- Prometheus is installed with its Helm chart; after installing it, configure it to scrape your cluster. Below is an example:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/prometheus -n your-namespace
helm show values prometheus-community/prometheus > values.yaml
prometheus.yml:
  rule_files:
    #...
  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
            - localhost:9090
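In the prometheus-community/prometheus chart, this scrape configuration is nested under the serverFiles section of values.yaml. This is a minimal sketch of how the snippet above could be placed, assuming the chart's default values layout:

# values.yaml (sketch; key names assume the default
# prometheus-community/prometheus values layout)
serverFiles:
  prometheus.yml:
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090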
helm upgrade --install release-name chart-name -f values.yaml --namespace your-namespace
Next, we will install Fluentd using Helm. For details, see the official Fluentd Helm chart documentation.
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluentd fluent/fluentd --namespace your-namespace
helm show values fluent/fluentd > values.yaml
- After installation, edit the values.yaml file to add the configuration below. Be sure to replace "pod-name" with the actual name of your pod. For more details, see the Fluentd documentation.
fileConfigs:
  01_base_config.conf: |-
    <source>
      @type tail
      path /var/log/containers/*pod-name*.log
      pos_file /var/log/fluentd/pod-name.pos
      tag fluentd
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_type string
          time_format "%Y-%m-%dT%H:%M:%S.%NZ"
          keep_time_key false
        </pattern>
        <pattern>
          format regexp
          expression /^(?<time>.+) (?<stream>stdout|stderr)( (.))? (?<log>.*)$/
          time_format '%Y-%m-%dT%H:%M:%S.%NZ'
          keep_time_key false
        </pattern>
      </parse>
    </source>
    <filter fluentd>
      @type grep
      @id grep_fluentd
      <regexp>
        key log
        pattern /.*(ERROR|error).*/
      </regexp>
    </filter>
    <filter fluentd>
      @type prometheus
      <metric>
        name fluentd_error_count
        type counter
        desc fluentd error
      </metric>
    </filter>
    <match fluentd>
      @type stdout
      @id match_fluentd
    </match>
helm upgrade --install fluentd fluent/fluentd -f values.yaml --namespace your-namespace
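Before digging into the configuration below, it's worth confirming that the Fluentd pods picked up the change. A quick check, assuming the release is named fluentd, deployed as a DaemonSet with the chart's default labels:

# Check that the DaemonSet pods picked up the new configuration and are healthy
kubectl rollout status daemonset/fluentd --namespace your-namespace
kubectl get pods -l app.kubernetes.io/name=fluentd --namespace your-namespace

# Tail the Fluentd logs to make sure the config loaded without errors
kubectl logs daemonset/fluentd --namespace your-namespace --tail=50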
The configuration includes a source for reading logs, filters for processing and filtering logs, and a match directive for outputting logs. Here's a breakdown of each section in this configuration:
1. Source Section (`<source>`):
The `<source>` block defines where and how Fluentd should read logs. In this case, it's set up to tail log files.
<source>
  @type tail
  path /var/log/containers/*pod-name*.log
  pos_file /var/log/fluentd/pod-name.pos
  tag fluentd
  <parse>
    @type multi_format
    ...
  </parse>
</source>
- `@type tail`: Specifies that Fluentd should read the logs by tailing a file (continuous reading as new lines are added).
- `path /var/log/containers/*pod-name*.log`: The path to the log files Fluentd should read. It looks for files that match `*pod-name*` in their name, which allows filtering by specific pod names.
- `pos_file /var/log/fluentd/pod-name.pos`: The position file records the last read position in the log file, enabling Fluentd to continue from the last position if it restarts.
- `tag fluentd`: Assigns the tag `fluentd` to the logs read from this source, which is useful for routing and filtering logs within Fluentd.
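To double-check that the path glob actually matches something, you can list the files from inside one of the Fluentd pods (fluentd-xxxxx below is a placeholder; use the name of one of your Fluentd pods, and note that the chart mounts the host's /var/log into the pod by default):

# fluentd-xxxxx is a placeholder; use one of your actual Fluentd pod names
kubectl exec -it fluentd-xxxxx --namespace your-namespace -- \
  sh -c 'ls /var/log/containers/*pod-name*.log'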
Log Parsing (`<parse>`)
The `<parse>` block defines the log formats that Fluentd should recognize in these files. This config uses a `multi_format` parser, allowing it to handle different log formats.
<parse>
  @type multi_format
  <pattern>
    format json
    time_key time
    time_type string
    time_format "%Y-%m-%dT%H:%M:%S.%NZ"
    keep_time_key false
  </pattern>
  <pattern>
    format regexp
    expression /^(?<time>.+) (?<stream>stdout|stderr)( (.))? (?<log>.*)$/
    time_format '%Y-%m-%dT%H:%M:%S.%NZ'
    keep_time_key false
  </pattern>
</parse>
- `@type multi_format`: Allows Fluentd to match multiple log formats.
- First `<pattern>` (JSON):
  - `format json`: Specifies that the log format is JSON.
  - `time_key time`: Identifies the key in the JSON that represents the timestamp.
  - `time_format "%Y-%m-%dT%H:%M:%S.%NZ"`: Specifies the timestamp format in the JSON logs.
  - `keep_time_key false`: Removes the original `time` key after parsing.
- Second `<pattern>` (Regular Expression):
  - `format regexp`: Specifies that the format is a regular expression.
  - `expression`: Regular expression pattern to extract fields from log lines that don't follow JSON format.
  - `time_format`: Specifies the timestamp format.
  This pattern captures the following fields:
  - `time`: The timestamp.
  - `stream`: Specifies whether the log came from stdout or stderr.
  - `log`: The main log message content.
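As an illustration (these example lines are hypothetical, not taken from a real cluster), the two patterns cover the two shapes you typically see under /var/log/containers: JSON lines written by a Docker-style log driver, and plain containerd/CRI-O lines:

# Matched by the first <pattern> (JSON); log, stream and time come from the JSON keys
{"log":"ERROR failed to connect to database\n","stream":"stderr","time":"2024-11-05T09:14:23.123456789Z"}

# Matched by the second <pattern> (regexp); time, stream and log come from the named captures
2024-11-05T09:14:23.123456789Z stderr F ERROR failed to connect to database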
2. Filter Section (`<filter>`):
The `<filter>` blocks apply transformations and filters to the log entries tagged with `fluentd`.
Filter - Grep
<filter fluentd>
  @type grep
  @id grep_fluentd
  <regexp>
    key log
    pattern /.*(ERROR|error).*/
  </regexp>
</filter>
- `@type grep`: This filter only allows log entries that match certain patterns.
- `<regexp>`:
  - `key log`: Specifies that the `log` field should be checked.
  - `pattern /.*(ERROR|error).*/`: Matches log entries containing the words `ERROR` or `error`. Only logs that match this pattern will pass through, filtering out non-error logs.
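If your application logs other levels or casings (for example warnings), the grep pattern can be broadened. This is just a sketch of a variation, not part of the original configuration:

<filter fluentd>
  @type grep
  @id grep_fluentd
  <regexp>
    key log
    # Also count warnings, in either casing
    pattern /(ERROR|error|WARN|warn)/
  </regexp>
</filter>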
Second Filter - Prometheus Metrics
<filter fluentd>
  @type prometheus
  <metric>
    name fluentd_error_count
    type counter
    desc fluentd error
  </metric>
</filter>
- `@type prometheus`: This filter creates Prometheus metrics based on the logs.
- Metric configuration:
  - `name fluentd_error_count`: The name of the Prometheus counter metric.
  - `type counter`: Specifies that this metric is a counter, which increments with each matching log entry.
  - `desc fluentd error`: A description for this metric, which is displayed in Prometheus.
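The fluent-plugin-prometheus filter also supports a `<labels>` block, which helps tell the series apart when you scrape several Fluentd instances. A sketch of an optional addition (the `${hostname}` and `${tag}` placeholders are provided by the plugin; this is not part of the original config):

<filter fluentd>
  @type prometheus
  <metric>
    name fluentd_error_count
    type counter
    desc fluentd error
    <labels>
      # Placeholders expanded by the plugin at emit time
      host ${hostname}
      tag ${tag}
    </labels>
  </metric>
</filter>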
3. Match Section (`<match>`):
The `<match>` block specifies what Fluentd should do with the filtered logs.
<match fluentd>
  @type stdout
  @id match_fluentd
</match>
- `@type stdout`: This output plugin writes logs to stdout. In a Kubernetes environment, this means the matched logs are visible in the Fluentd pod's own logs.
- `@id match_fluentd`: Identifier for this match block.
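If you only care about the metric and don't want the matched error lines echoed into the Fluentd pod's logs, the match block could use Fluentd's built-in null output instead; the filters, and therefore the counter, still run before the match. A sketch:

<match fluentd>
  # Discard events after the prometheus filter has already counted them
  @type null
  @id match_fluentd
</match>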
After the upgrade, you can view the exposed metrics. First, port-forward the relevant service or pod, then send a request to the metrics endpoint.
kubectl port-forward service/fluentd 24231:24231 --namespace your-namespace
curl http://127.0.0.1:24231/metrics
- You can now view the metrics. For example:
# TYPE fluentd_error_count counter
# HELP fluentd_error_count fluentd error
fluentd_error_count 42.0
If everything has been successful up to this point, the next step is to configure Prometheus to scrape the Fluentd service or pod.
If your cluster has multiple nodes, make sure you scrape the Fluentd instance running on the node that hosts the pod whose logs you want to collect; otherwise, you might not see the metrics. With a single node this isn't an issue. With multiple nodes, you can either scrape each node's Fluentd individually or scrape the Fluentd service directly.
prometheus.yml:
  rule_files:
    #...
  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
            - localhost:9090
    - job_name: fluentd
      metrics_path: /metrics
      scrape_interval: 15s
      static_configs:
        - targets:
            - <node-ip or pod-ip>:24231
          labels:
            instance: fluentd-node-<node-name>
    # or, alternatively, scrape the Fluentd service (use only one of the two fluentd jobs)
    - job_name: fluentd
      metrics_path: /metrics
      scrape_interval: 15s
      static_configs:
        - targets:
            - 'fluentd.your-namespace.svc.cluster.local:24231'
helm upgrade --install release-name chart-name -f values.yaml --namespace your-namespace
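Instead of hard-coding node or pod IPs, you could also rely on Prometheus' Kubernetes service discovery. The prometheus-community chart's default configuration typically includes a job that scrapes pods carrying the usual prometheus.io/* annotations, so annotating the Fluentd pods via the Fluentd chart's values may be enough. A sketch under that assumption:

# values.yaml for the fluent/fluentd chart (sketch; assumes your Prometheus
# configuration includes a kubernetes-pods job that honors these annotations)
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "24231"
  prometheus.io/path: "/metrics"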
Now that Prometheus is installed and configured, we need to view the metrics in the Prometheus UI. To access it, you can use port-forwarding to create a secure tunnel from your local machine to the Prometheus service or pod within your Kubernetes cluster.
kubectl port-forward service/prometheus-server <local-port>:<target-port> --namespace your-namespace
- Then, open a browser and go to the following URL -> http://localhost:<local-port>/graph
You will then be able to view the metrics and their values.
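A couple of queries you might try in the expression browser to confirm the data is arriving (the metric name comes from the Fluentd config above):

# Current value of the counter, per scraped instance
fluentd_error_count

# Number of new error lines counted over the last 5 minutes
increase(fluentd_error_count[5m])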
If everything has been successful up to this point, you can create an alert for the metric in Prometheus. Define the alert rules based on your needs. For example:
alerting_rules.yml:
  groups:
    - name: fluentd_error_alerts
      rules:
        - alert: FluentdErrorsDetected
          expr: increase(fluentd_error_count[1m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "Fluentd errors detected on {{ $labels.instance }}"
            description: "The fluentd_error_count metric increased. Check Fluentd logs on {{ $labels.instance }} for more details."
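Optionally, if you have promtool available locally, you can validate the rules before applying the upgrade (this assumes you saved the groups shown above into a standalone alerting_rules.yml file):

# Validate the rule syntax before rolling it out
promtool check rules alerting_rules.yml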
helm upgrade --install release-name chart-name -f values.yaml --namespace your-namespace
Thank you for reading until the end! I hope this guide helped you understand how to deploy and configure Prometheus and Fluentd using Helm. With these basics covered, you're well on your way to monitoring and managing your Kubernetes environment effectively.
In my next article, "How to Send Prometheus Alerts as Email and Teams Channel Notifications," we'll explore how to set up alerting for critical issues.