This article is part of a personal project called Smart-cash. Previous posts covered topics such as the deployment of AWS and Kubernetes resources and configuring logging with FluentBit, among others.
Project Source Code
The full project code can be found here. The project is still under development, but you can find the Terraform code that creates the AWS resources as well as the Kubernetes manifests.
Key concepts
Two key concepts that often come up in modern system monitoring are observability and telemetry.
- Observability helps us understand what is happening in the system.
- Telemetry refers to the data generated by applications, which includes logs, metrics, and traces.
This post focuses on traces.
What is distributed tracing?
Distributed tracing provides visibility into the path that requests follow through an application, helping to identify which parts of the system are experiencing errors and how much time each operation takes.
Imagine an application that generates two random numbers and stores them in a database; two complex functions handle the number calculation. To gain visibility into the system’s behavior, we can introduce tracing.
Here we can introduce another important concept: The Span.
A span represents a single operation within a trace. In the example, three spans can be defined:
- One for each function that calculates a random number.
- One for the database call.
By structuring the trace with these spans, we can better understand what happens at each step, identify bottlenecks, and debug issues more effectively.
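To make that span structure concrete, here is a minimal sketch using the OpenTelemetry Go SDK, which is introduced in the next section. The function and tracer names are hypothetical and assume the SDK has been set up elsewhere; this is an illustration, not the project's code.

package example

import (
	"context"
	"math/rand"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

func handleRequest(ctx context.Context) {
	tracer := otel.Tracer("random-numbers-app")

	// Spans 1 and 2: one per calculation function.
	a := calculateNumber(ctx, tracer, "CalculateFirstNumber")
	b := calculateNumber(ctx, tracer, "CalculateSecondNumber")

	// Span 3: the database call.
	_, dbSpan := tracer.Start(ctx, "StoreNumbers")
	storeNumbers(a, b) // placeholder for the real database write
	dbSpan.End()
}

func calculateNumber(ctx context.Context, tracer trace.Tracer, name string) int {
	_, span := tracer.Start(ctx, name)
	defer span.End()
	return rand.Intn(100) // stand-in for the "complex" calculation
}

func storeNumbers(a, b int) {} // no-op placeholder

Each call to tracer.Start records one operation, and the context passed in keeps all three spans tied to the same trace.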
Introduction to OpenTelemetry
OpenTelemetry (OTel) is an open-source, vendor-agnostic toolkit for generating and managing telemetry data, such as traces, metrics, and logs. However, storage and visualization of this data must be handled by other tools.
OTel helps us instrument our applications by providing APIs to define how telemetry data is generated, as well as components that can receive and export this data to external endpoints.
OpenTelemetry collector
The OpenTelemetry Collector is a key component that works as a proxy to receive, process, and export telemetry data.
Detailed information about receivers, processors, and exporters can be found in OpenTelemetry Documentation.
The collector is not a mandatory component; data can be exported directly to the backend using libraries. However, doing so adds processing overhead to the application.
In this scenario, a collector will be installed, and the data will be sent to Jaeger.
Installing OpenTelemetry on an AWS EKS cluster
We will use the OpenTelemetry Helm chart; the installation is managed by FluxCD. All the files can be found in the repo.
Helm chart values
Let's start with some general values for the Helm chart
mode: "deployment"
namespaceOverride: "observability"
presets:
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
    extractAllPodAnnotations: true
image:
  repository: otel/opentelemetry-collector-k8s
  pullPolicy: IfNotPresent
command:
  name: "otelcol-k8s"
Let's focus on the presets section, which applies predefined configurations for specific scenarios. In this case, the kubernetesAttributes preset:
✔ Extracts all pod labels and annotations to enrich traces with Kubernetes metadata.
✔ Uses the Kubernetes Attributes processor to automatically add pod-related information to telemetry data.
This additional metadata helps correlate telemetry data with the Kubernetes environment.
Collector configurations
The OpenTelemetry Collector configuration is passed in the chart values. The configuration defines:
- Receivers ---> Where telemetry data is received.
- Processors ---> How the data is modified or filtered.
- Exporters ---> Where the data is sent.
Let's break down the configuration
Receivers
The collector listens for telemetry data on port 4318 (HTTP). Applications should send telemetry data to this endpoint.
receivers:
  otlp:
    protocols:
      http:
        endpoint: $${env:MY_POD_IP}:4318
Processors
The batch processor groups data before exporting it, which helps reduce network load; here it uses the default configuration ({}).
processors:
  batch: {}
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
The memory_limiter processor prevents high memory consumption by monitoring usage at intervals defined by check_interval and limiting usage based on limit_percentage.
- In this case, the collector checks memory usage every 5 seconds.
- If usage exceeds 80%, the collector drops data to prevent a crash.
Exporters
The collector will send the data to a Kubernetes service using the internal DNS name jaeger-traces-collector.observability.svc.cluster.local.
config:
  exporters:
    otlphttp:
      endpoint: "http://jaeger-traces-collector.observability.svc.cluster.local:4318"
      tls:
        insecure: true
Service section
service:
  pipelines:
    traces:
      receivers:
        - otlp
      exporters:
        - otlphttp
The service section defines the enabled components (receivers, processors, and exporters) and specifies how the data (traces, metrics, or logs) flows through pipelines.
For this scenario:
- Only traces are processed in the pipeline.
- The exporters and receivers defined above are used.
Instrumenting a Golang microservice
At a high level, the code is structured as shown in the following diagram:
The Gin Web Framework is used to create the API. Incoming requests pass through different layers (handler, service, and repository), each responsible for specific logic.
OTel provides APIs and SDKs to generate and collect telemetry data. Additionally, there are several instrumentation libraries that simplify these tasks.
The Gin instrumentation library (otelgin) will be used.
Init the OTel SDK
To begin, we need to create an OTel resource, which represents an entity producing telemetry data—in this case, the microservice.
A resource can have attributes that are configured during its creation. These attributes help in discovering telemetry data, particularly the traces generated by the application.
Let's look at part of the code used here:
res, err := resource.New(
	context.Background(),
	resource.WithFromEnv(),      // Discover and provide attributes from OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME environment variables.
	resource.WithTelemetrySDK(), // Discover and provide information about the OpenTelemetry SDK used.
	resource.WithContainer(),    // Discover and provide container information.
	resource.WithAttributes(semconv.ServiceNameKey.String("ExpenseService")), // Add custom resource attributes.
)
Next, we need to create an exporter to send the data to the previously created collector:
exporter, err := otlptracehttp.New(
	context.Background(),
	otlptracehttp.WithEndpoint(otelUrl+":4318"),
	otlptracehttp.WithInsecure(),
)
To start generating traces, we need to define a Tracer Provider, which is responsible for creating and managing traces. Here we set some of its configuration options.
tp := trace.NewTracerProvider(
	trace.WithBatcher(
		exporter,
		trace.WithMaxExportBatchSize(trace.DefaultMaxExportBatchSize),
		trace.WithBatchTimeout(trace.DefaultScheduleDelay*time.Millisecond),
	),
	trace.WithResource(res),
)
The complete code for this initialization can be found here.
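One step worth calling out (a minimal sketch, assuming the standard SDK wiring): the tracer provider has to be registered globally so that calls to otel.Tracer pick it up, and it should be shut down on exit so that spans still buffered by the batcher are flushed to the collector.

package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/sdk/trace"
)

// initTracing registers tp as the global tracer provider and returns a
// shutdown function that flushes any spans still held by the batcher.
func initTracing(tp *trace.TracerProvider) func(context.Context) error {
	otel.SetTracerProvider(tp)
	return tp.Shutdown
}

In main, the returned function would typically be deferred, for example defer shutdown(context.Background()).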
Configuring Gin middleware
router := gin.New()
router.Use(
	otelgin.Middleware("ExpenseService", otelgin.WithFilter(filterTraces)),
	gin.Recovery(),
)
The key point in this code is the service name (ExpenseService) passed to the middleware. It must remain consistent across all spans generated to ensure accurate and unified trace data.
The otelgin middleware handles the instrumentation for this part of the code internally.
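The filterTraces function passed to otelgin.WithFilter is not shown in the snippet; its job is to decide which requests get traced. A hypothetical example, assuming otelgin's Filter type (which receives the *http.Request and returns whether to record it), that skips health-check probes:

package api

import "net/http"

// filterTraces returns true for requests that should be traced.
// Here it skips a health-check path so Kubernetes probes do not
// generate traces (the path is illustrative).
func filterTraces(req *http.Request) bool {
	return req.URL.Path != "/health"
}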
Creating spans
A trace is composed of multiple spans that can be nested using context.
Here’s how to create a span in one of our functions (service layer) used to create an expense:
func (s *expenseService) createExpense(ctx context.Context, expense Expense) (Expense, error) { // signature simplified for illustration
	tr := otel.Tracer("ExpenseService") // tracer named after the microservice
	trContext, childSpan := tr.Start(ctx, "CreateExpense")
	childSpan.SetAttributes(attribute.String("component", "serviceLevel"))
	defer childSpan.End()

	return s.expensesRepository.CreateExpense(trContext, expense)
}
In the first line, a tracer is created with the name ExpenseService.
Then, a new span named CreateExpense is created and associated with this tracer. This span includes an attribute called component with the value serviceLevel.
The span creation returns:
- trContext: A new context that carries span information for passing to other functions.
- childSpan: The span object itself.
This allows you to track the execution and timing of different parts of the function.
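Because trContext is passed to the repository, that layer can start a child span from it; this is what produces the nested span structure seen later in Jaeger. A rough sketch follows, where the repository type, the Expense type, and the database call are placeholders rather than the project's actual code.

package repository

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// Expense and expensesRepository are simplified placeholders.
type Expense struct {
	ID     string
	Amount float64
}

type expensesRepository struct{}

// CreateExpense continues the trace started in the service layer by
// creating a child span from the propagated context.
func (r *expensesRepository) CreateExpense(ctx context.Context, expense Expense) (Expense, error) {
	tr := otel.Tracer("ExpenseService") // same tracer name keeps spans under one service
	spanCtx, span := tr.Start(ctx, "RepositoryCreateExpense")
	span.SetAttributes(attribute.String("component", "repositoryLevel"))
	defer span.End()

	// The real implementation would pass spanCtx to the database client here.
	_ = spanCtx
	return expense, nil
}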
Now, let’s proceed to the Jaeger installation to visualize the traces created.
Visualize traces with Jaeger
Jaeger is an end-to-end distributed tracing system designed for monitoring and troubleshooting microservices-based architectures.
Commonly integrated as an OpenTelemetry backend, Jaeger stores and visualizes trace data, helping developers understand the flow and performance of their applications.
The Jaeger Operator simplifies managing Jaeger resources on Kubernetes. The Jaeger instance is created by the following YAML file:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-traces
  namespace: observability
spec:
  strategy: production
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx # Ingress annotations here
    ingressClassName: nginx
    hosts:
      - jaeger.smartcash.rootkit.site # your domain name
  collector:
    maxReplicas: 5
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
Through the Kubernetes ingress, we can access the Jaeger UI and visualize the traces.
In the Jaeger UI, traces can be filtered based on the service name defined during OpenTelemetry (OTel) setup — in this case, expenses.
Jaeger displays the trace along with its spans, which represent individual operations within the trace. For this example, we have three spans:
- Handler Span: Captures the execution of the HTTP handler.
- Service Span: Covers the business logic executed in the service layer.
- Repository Span: Represents the database connection and related operations.
This structure helps in understanding the flow of requests and identifying potential bottlenecks or failures in the microservice.
Each span includes the component tag added in the code, along with metadata about the Kubernetes environment (such as pod labels and annotations). This enriched information helps in tracing requests across distributed components effectively.
Visualizing the span created by the Gin library
The following image shows the span generated by the Gin instrumentation library; all the data related to the incoming request is added to the span by this library.