Ivan Porta

Posted on • Originally published at gtrekter.Medium

Demystifying Azure Kubernetes Cluster Automatic

It seems that Microsoft timed this release perfectly to coincide with the 10th anniversary of Kubernetes. A couple of weeks ago, Microsoft officially announced the public preview of Azure Kubernetes Service (AKS) Automatic. In this article, I will explain what AKS Automatic is and highlight the differences between it and the standard AKS cluster.

What problem does Azure Kubernetes Service Automatic solve?

Kubernetes has become the go-to container orchestration engine for many organizations. According to the Annual CNCF survey 2023, 66% of respondents were using Kubernetes in production, and 18% were evaluating it. Kubernetes offers significant flexibility, improved resource utilization, and other benefits. Additionally, it boasts an impressive ecosystem of plugins and projects backed by the CNCF.


However, with great power come great challenges. Kubernetes can be intimidating and overwhelming for some workloads. In fact, over half of the 1,300 respondents to a 2022 Statista study indicated that the biggest challenge they faced when migrating to or using Kubernetes and containers was the lack of in-house skills and, consequently, limited manpower.


This is where Azure Kubernetes Service Automatic comes in to help.

Prerequisites

To create an Azure Kubernetes Cluster Automatic, there are several preview flags that you will need to register on your subscription. In particular:

az feature register --namespace "Microsoft.ContainerService" --name "EnableAPIServerVnetIntegrationPreview"
az feature register --namespace "Microsoft.ContainerService" --name "NRGLockdownPreview"
az feature register --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"
az feature register --namespace "Microsoft.ContainerService" --name "DisableSSHPreview"
az feature register --namespace "Microsoft.ContainerService" --name "SafeguardsPreview"
az feature register --namespace "Microsoft.ContainerService" --name "AutomaticSKUPreview"
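
Registration can take a few minutes. You can check the state of each flag and, once they all report Registered, refresh the Microsoft.ContainerService provider. The commands below are a minimal sketch: the resource group and cluster names are placeholders, and creating an Automatic cluster currently requires the aks-preview extension.

# Check the registration state of a flag (repeat for the others)
az feature show --namespace "Microsoft.ContainerService" --name "AutomaticSKUPreview" --query "properties.state"

# Propagate the newly registered flags to the resource provider
az provider register --namespace Microsoft.ContainerService

# Install the preview extension and create an Automatic cluster
az extension add --name aks-preview
az aks create --resource-group rg-training-dev --name aks-training-auto-wus-01 --sku automatic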

What is Azure Kubernetes Cluster Automatic?

While Azure Kubernetes Service already takes away the overwhelming task of installing and setting up a Kubernetes cluster, configuring the nodes, and more, Azure Kubernetes Service Automatic goes a step further. In addition to cluster setup, it installs and configures additional projects like KEDA and Cilium, and enables multiple features like VPA, node autoscaling, and resource group locks by default.

Note: All of these configurations can be applied manually to an AKS Standard resource. What this service provides is a pre-made resource with all of these configurations and settings enabled and configured for you.

To better understand what it brings to the table, let’s break it down into the following groups:

Security

We can identify two scopes that the default configurations affect: the cluster and the workloads. From the cluster perspective, it will:

Azure Role-Based Access Control (RBAC)

Ensures that Azure RBAC is enabled and that both local accounts and SSH access to the node pools are disabled. This best practice prevents direct access to the nodes, reducing the potential attack surface, and relies on Azure's fine-grained identity controls to define who can access and manage the cluster's resources.

As you can see, SSH access to the nodes in the pool used by AKS is disabled:

$ az aks nodepool list --resource-group rg-training-dev --cluster-name aks-training-auto-wus-01
[
  {
    ...
    "osSku": "AzureLinux",
    "osType": "Linux",
    ...
    "securityProfile": {
      "enableSecureBoot": false,
      "enableVtpm": false,
      "sshAccess": "Disabled"
    },
    ...
  }
]

While SSH is disabled for the nodes, you can still get an interactive shell inside a pod; this configuration only affects the nodes.

$ kubectl run sh -i --tty --rm --image=busybox --restart=Never -- sh
Warning: [azurepolicy-k8sazurev2containerenforceprob-56e31e6a92773e331f84] Container <sh> in your Pod <sh> has no <livenessProbe>. Required probes: ["readinessProbe", "livenessProbe"]
Warning: [azurepolicy-k8sazurev2containerenforceprob-56e31e6a92773e331f84] Container <sh> in your Pod <sh> has no <readinessProbe>. Required probes: ["readinessProbe", "livenessProbe"]
Warning: [azurepolicy-k8sazurev3containerlimits-8d53352efa522a0527f5] container <sh> has no resource limits
Warning: [azurepolicy-k8sazurev1containerrestrictedi-bb9d0e008cf63badac4c] sh in default does not have imagePullSecrets. Unauthenticated image pulls are not recommended.
If you don't see a command prompt, try pressing enter.
/ #

By default, even if you have enabled the integration with Azure AD, subscription Owners and Contributors can still access the cluster by using the --admin flag when generating static credentials. However, Azure Kubernetes Service Automatic will automatically disable local accounts.

$ az aks get-credentials --resource-group rg-training-dev --name aks-training-auto-wus-01 --overwrite-existing --admin
The behavior of this command has been altered by the following extension: aks-preview
(BadRequest) Getting static credential is not allowed because this cluster is set to disable local accounts.
Code: BadRequest

It will also prevent the re-enabling of local accounts.

$ az aks update --resource-group rg-training-dev --name aks-training-auto-krc-01 --enable-local-accounts
(BadRequest) Managed cluster 'Automatic' SKU should enable 'DisableLocalAccounts' feature with recommended values
Code: BadRequest
Message: Managed cluster 'Automatic' SKU should enable 'DisableLocalAccounts' feature with recommended values

As you can see, the adminUsers property is set to null.

$ az aks show --resource-group rg-training-dev --name aks-training-auto-krc-01 --query "aadProfile"
{
  "adminGroupObjectIDs": null,
  "adminUsers": null,
  "clientAppId": null,
  "enableAzureRbac": true,
  "managed": true,
  "serverAppId": null,
  "serverAppSecret": null,
  "tenantId": "00434baa-68ec-4d73-b0a2-fec5bac28891"
}
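
Since local accounts and static credentials are gone, access is granted through Azure RBAC role assignments on the cluster resource. A minimal sketch (the object ID is a placeholder) using one of the built-in AKS RBAC roles:

# Assign the built-in cluster admin role to a user or group at the cluster scope
AKS_ID=$(az aks show --resource-group rg-training-dev --name aks-training-auto-wus-01 --query id --output tsv)
az role assignment create \
  --assignee "<user-or-group-object-id>" \
  --role "Azure Kubernetes Service RBAC Cluster Admin" \
  --scope "$AKS_ID"

# Then fetch credentials without --admin and authenticate through Microsoft Entra ID
az aks get-credentials --resource-group rg-training-dev --name aks-training-auto-wus-01
kubectl get nodes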

Image Cleaner Add-On

It installs the Image Cleaner add-on, which automatically identifies and removes stale images on the nodes; such images might contain vulnerabilities that could create security issues.

az aks show --resource-group rg-training-dev --name aks-training-auto-wus-01
{
  ...
  "securityProfile": {
    ...
    "imageCleaner": {
      "enabled": true,
      "intervalHours": 168
    },
    ...
  },
  ...
}

$ kubectl get pods --namespace kube-system
NAME                                                   READY   STATUS    RESTARTS      AGE
eraser-controller-manager-794b999f7c-ml68b             1/1     Running   0             25h
...
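
If the default weekly scan is too conservative, the interval can be adjusted on the cluster. A sketch (the 48-hour value is arbitrary):

# Change how often Image Cleaner (Eraser) scans for and removes stale images
az aks update --resource-group rg-training-dev --name aks-training-auto-wus-01 \
  --enable-image-cleaner --image-cleaner-interval-hours 48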

Node Resource Group Lockdown

Prevents users from changing the Azure resources in the node resource group directly. Changes to these resources can affect cluster operations or cause issues later. For example, scaling and network configuration should be done via the Kubernetes API and not directly on the Azure resources.


API Server VNet Integration

This feature ensures that the network traffic between your API server and your node pools remains on the private network only by putting the API server behind an internal load balancer VIP in the delegated subnet, which the nodes are configured to utilize.

$ az aks show --resource-group rg-training-dev --name aks-training-auto-wus-01 
{
  ...
  "enableVnetIntegration": true,
  ...
}

In the resource group hosting the resources used by the AKS cluster, you will see an additional load balancer besides the default Kubernetes load balancer.


This load balancer's backend pool points to the new subnet dedicated to the API server in the virtual network used by the AKS cluster and contains the IP addresses of the individual API server instances.

$ az network lb address-pool list --resource-group rg-training-dev-infrastructure --lb-name kube-apiserver
[
  {
    "etag": "W/\"********-****-****-****-************\"",
    "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/loadBalancers/kube-apiserver/backendAddressPools/kube-apiserver-backendpool",
    "loadBalancerBackendAddresses": [
      {
        "ipAddress": "10.226.0.10",
        "name": "10.226.0.10",
        "subnet": {
          "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/virtualNetworks/aks-vnet-30931460/subnets/aks-apiserver-subnet",
          "resourceGroup": "rg-training-dev-infrastructure"
        }
      },
      {
        "ipAddress": "10.226.0.11",
        "name": "10.226.0.11",
        "subnet": {
          "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/virtualNetworks/aks-vnet-30931460/subnets/aks-apiserver-subnet",
          "resourceGroup": "rg-training-dev-infrastructure"
        }
      },
      {
        "ipAddress": "10.226.0.5",
        "name": "10.226.0.5",
        "subnet": {
          "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/virtualNetworks/aks-vnet-30931460/subnets/aks-apiserver-subnet",
          "resourceGroup": "rg-training-dev-infrastructure"
        }
      },
      {
        "ipAddress": "10.226.0.6",
        "name": "10.226.0.6",
        "subnet": {
          "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/virtualNetworks/aks-vnet-30931460/subnets/aks-apiserver-subnet",
          "resourceGroup": "rg-training-dev-infrastructure"
        }
      },
      {
        "ipAddress": "10.226.0.8",
        "name": "10.226.0.8",
        "subnet": {
          "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/virtualNetworks/aks-vnet-30931460/subnets/aks-apiserver-subnet",
          "resourceGroup": "rg-training-dev-infrastructure"
        }
      },
      {
        "ipAddress": "10.226.0.9",
        "name": "10.226.0.9",
        "subnet": {
          "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/virtualNetworks/aks-vnet-30931460/subnets/aks-apiserver-subnet",
          "resourceGroup": "rg-training-dev-infrastructure"
        }
      }
    ],
    "loadBalancingRules": [
      {
        "id": "/subscriptions/********-****-****-****-************/resourceGroups/rg-training-dev-infrastructure/providers/Microsoft.Network/loadBalancers/kube-apiserver/loadBalancingRules/kube-apiserver-rule",
        "resourceGroup": "rg-training-dev-infrastructure"
      }
    ],
    "name": "kube-apiserver-backendpool",
    "provisioningState": "Succeeded",
    "resourceGroup": "rg-training-dev-infrastructure",
    "type": "Microsoft.Network/loadBalancers/backendAddressPools"
  }
]

From a workload perspective, it will:

Workload Identity

This service is based on OpenID Connect (OIDC) and works with resources both inside and outside Azure, such as GitHub and other Kubernetes clusters. It's a best practice because it doesn't require storing any credentials. This is an enhancement over user-assigned or system-assigned managed identities, which also use OpenID Connect but are limited to Azure resources. As you can see, the OIDC issuer is enabled by default and exposes a URL that clients can use to discover and interact with the identity provider.

$ az aks show --resource-group rg-training-dev --name aks-training-auto-krc-01  --query "oidcIssuerProfile"
{
  "enabled": true,
  "issuerUrl": "https://koreacentral.oic.prod-aks.azure.com/00434baa-68ec-4d73-b0a2-fec5bac28891/250b5570-227b-46d2-a0d2-a705dd5ce854/"
}

Note: Once the OIDC issuer is enabled on the cluster, disabling it is not supported.
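
To actually consume workload identity, you would typically create a user-assigned identity, federate it with the cluster's OIDC issuer, and reference its client ID from a Kubernetes service account. A minimal sketch (the identity, namespace, and service account names are hypothetical):

# Create an identity and federate it with the cluster's OIDC issuer URL
az identity create --resource-group rg-training-dev --name id-workload-demo
ISSUER=$(az aks show --resource-group rg-training-dev --name aks-training-auto-krc-01 --query "oidcIssuerProfile.issuerUrl" --output tsv)
az identity federated-credential create \
  --resource-group rg-training-dev \
  --identity-name id-workload-demo \
  --name fc-workload-demo \
  --issuer "$ISSUER" \
  --subject "system:serviceaccount:default:sa-workload-demo" \
  --audiences "api://AzureADTokenExchange"

# Reference the identity's client ID from a service account
CLIENT_ID=$(az identity show --resource-group rg-training-dev --name id-workload-demo --query clientId --output tsv)
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-workload-demo
  namespace: default
  annotations:
    azure.workload.identity/client-id: "$CLIENT_ID"
EOF

Pods that should use the identity also need the azure.workload.identity/use: "true" label so that the webhook injects the projected service account token.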

Deployment Safeguards

This feature enables deployment safeguards to enforce Azure Policies, ensuring that specific misconfigurations that could introduce security issues are not deployed to the cluster. Deployment safeguards programmatically assess your clusters at creation or update time for compliance. There are two levels of configuration for deployment safeguards:

  • Warning Level: Alerts you to any non-compliant configuration in the request.
  • Enforcement Level: Blocks you from deploying non-compliant configurations.

The compliance information is aggregated and displayed in Azure Policy’s compliance dashboard. Behind the scenes, Deployment Safeguards utilize the open-source Gatekeeper. Gatekeeper acts as an admission controller, intercepting requests to the Kubernetes API server and evaluating the requests against the defined policies.

az aks show --resource-group rg-training-dev --name aks-training-auto-krc-01 --query "addonProfiles"
{
  ...
  "azurepolicy": {
    "config": null,
    "enabled": true,
    "identity": {
      "clientId": "********-****-****-****-************",
      "objectId": "********-****-****-****-************",
      "resourceId": "/subscriptions/********-****-****-****-************/resourcegroups/rg-training-dev-infrastructure/providers/Microsoft.ManagedIdentity/userAssignedIdentities/azurepolicy-aks-training-auto-krc-01"
    }
  }
}

As a result, you will see both the Gatekeeper pods and the pods that interact with Azure Policy:

$ kubectl get pods -A
NAMESPACE            NAME                                                   READY   STATUS                     RESTARTS        AGE
...
gatekeeper-system    gatekeeper-audit-7bd8cb9f77-5xrr6                      1/1     Running                    0               33h
gatekeeper-system    gatekeeper-controller-54694cd6c5-cjnpj                 1/1     Running                    0               33h
gatekeeper-system    gatekeeper-controller-54694cd6c5-x7hdb                 1/1     Running                    0               33h
kube-system          azure-policy-947f696dd-b2scr                           1/1     Running                    0               33h
kube-system          azure-policy-webhook-6755fbcdbf-q87dx                  1/1     Running                    0               33h
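
To avoid the warnings shown earlier, workloads should declare the probes and resource limits that the safeguard policies check for. A minimal compliant pod might look like the following sketch (the image and probe paths are illustrative):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: web-demo
spec:
  containers:
    - name: web
      image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:            # satisfies the container-limits policy
          cpu: 250m
          memory: 256Mi
      readinessProbe:      # probes required by the enforce-probes policy
        httpGet:
          path: /
          port: 80
      livenessProbe:
        httpGet:
          path: /
          port: 80
EOF

The unauthenticated image pull warning shown earlier would still appear unless imagePullSecrets is also set.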

Azure Key Vault Provider

The Azure Key Vault Provider for the Secrets Store CSI Driver allows you to integrate Azure Key Vault as a secret store with the cluster via a CSI volume. By doing so, you can mount secrets, keys, and certificates into a pod using a CSI volume.

$ az aks show --resource-group rg-training-dev --name aks-training-auto-krc-01 --query addonProfiles
{
  "azureKeyvaultSecretsProvider": {
    "config": {
      "enableSecretRotation": "true"
    },
    "enabled": true,
    "identity": {
      "clientId": "********-****-****-****-************",
      "objectId": "********-****-****-****-************",
      "resourceId": "/subscriptions/********-****-****-****-************/resourcegroups/rg-training-dev-infrastructure/providers/Microsoft.ManagedIdentity/userAssignedIdentities/azurekeyvaultsecretsprovider-aks-training-auto-krc-01"
    }
  }
  ...
}

When enabled, the add-on creates a user-assigned managed identity named azurekeyvaultsecretsprovider-xxx that is used to authenticate to your key vault.


This managed identity is automatically assigned to the Virtual Machine Scale Sets (VMSS) used by the cluster.


In the cluster, it creates the Secrets Store CSI driver pods and the Azure provider pods, which run on each agent node.

$ kubectl get pods -n kube-system
NAME                                                   READY   STATUS                     RESTARTS        AGE
aks-secrets-store-csi-driver-vd4pr                     3/3     Running                    0               25h
aks-secrets-store-csi-driver-zrnhj                     3/3     Running                    0               25h
aks-secrets-store-csi-driver-zwd7t                     3/3     Running                    0               4h31m
aks-secrets-store-provider-azure-2c4xw                 1/1     Running                    0               25h
aks-secrets-store-provider-azure-lzndd                 1/1     Running                    0               4h31m
aks-secrets-store-provider-azure-zrmgr                 1/1     Running                    0               25h
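
To consume a Key Vault secret, you define a SecretProviderClass and mount it as a CSI volume in the pod. A sketch, assuming a hypothetical vault named kv-training-dev containing a secret named demo-secret:

kubectl apply -f - <<EOF
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: spc-training-demo
  namespace: default
spec:
  provider: azure
  parameters:
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "<client-id-of-the-azurekeyvaultsecretsprovider-identity>"
    keyvaultName: "kv-training-dev"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: demo-secret
          objectType: secret
EOF

A pod then mounts it through a csi volume with the driver secrets-store.csi.k8s.io and the volume attribute secretProviderClass: spc-training-demo.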

Networking

Azure Container Networking Interface (CNI)

Azure CNI powered by Cilium builds on the open-source Cilium project, using an extended Berkeley Packet Filter (eBPF) dataplane to improve cluster performance and security. For example, by performing packet filtering in kernel space with eBPF programs instead of iptables, it significantly boosts performance. Additionally, it extends the Kubernetes NetworkPolicy API by introducing a custom CRD that supports more sophisticated network policies, including L7 network policies (beyond the standard L3/L4) and the specification of port ranges for both ingress and egress.

You will be able to see the running Cilium pods, which handle network policies and connectivity in the cluster:

$ az aks show --resource-group rg-training-dev --name aks-training-auto-wus-01 --query "networkProfile"
{
  ...
  "networkDataplane": "cilium",
  "networkMode": null,
  "networkPlugin": "azure",
  "networkPluginMode": "overlay",
  "networkPolicy": "cilium",
  "outboundType": "managedNATGateway",
  "podCidr": "10.244.0.0/16",
  "podCidrs": [
    "10.244.0.0/16"
  ],
  "serviceCidr": "10.0.0.0/16",
  "serviceCidrs": [
    "10.0.0.0/16"
  ]
}

$ kubectl get pods --namespace kube-system
NAME                                                   READY   STATUS    RESTARTS      AGE
cilium-fdbjp                                           1/1     Running   0             26h
cilium-operator-559887cf4-5drnd                        1/1     Running   1 (25h ago)   26h
cilium-operator-559887cf4-74c8x                        1/1     Running   1 (25h ago)   26h
cilium-tgfqp                                           1/1     Running   0             26h
cilium-xpww4                                           1/1     Running   0             26h
...

Microsoft ran a performance test in which 50,000 requests were generated and the overall completion time was measured. Service routing latency was similar for both dataplanes at first, but once the number of pods exceeded 5,000, latency for kube-proxy-based clusters increased, while it remained consistent for Cilium-based clusters.

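Network policies written against the standard Kubernetes NetworkPolicy API are enforced by the Cilium dataplane. A minimal sketch (the labels are illustrative) that only allows traffic to pods labeled app=web from pods labeled app=frontend on port 80:

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 80
EOF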

Application Routing Add-On and Integration with Azure DNS

The Application Routing add-on installs a Custom Resource Definition (CRD) called NginxIngressController, which specifies the schema, validation rules, and properties for this resource.

$ kubectl get crds
NAME                                                                    CREATED AT
..
nginxingresscontrollers.approuting.kubernetes.azure.com                 2024-06-21T16:06:41Z

Alongside this, a new IngressClass named webapprouting.kubernetes.azure.com is created, which defines the class of ingress controllers available in the cluster and specifies which controller will handle Ingress resources with this class.

$ kubectl get IngressClass -A
NAME                                 CONTROLLER                                 PARAMETERS   AGE
webapprouting.kubernetes.azure.com   webapprouting.kubernetes.azure.com/nginx   <none>       4d13h

The add-on also creates a LoadBalancer service named nginx in the app-routing-system namespace and a deployment named nginx, which in turn creates pods labeled app=nginx running the nginx-ingress-controller and configured to use the webapprouting.kubernetes.azure.com IngressClass.

$ kubectl get service -A
NAMESPACE            NAME                               TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                                      AGE
...
app-routing-system   nginx                              LoadBalancer   10.0.113.74    20.249.170.147   80:31926/TCP,443:32563/TCP,10254:32575/TCP   3d20h

$ kubectl get pods -A
NAMESPACE            NAME                                                   READY   STATUS                     RESTARTS        AGE
...
app-routing-system   nginx-6f5b856d74-m2kfm                                 1/1     Running                    0               41h

$ kubectl describe pod nginx-6f5b856d74-m2kfm -n app-routing-system
Name:                 nginx-6f5b856d74-m2kfm
Namespace:            app-routing-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      nginx
Node:                 aks-system-surge-sbl4j/10.224.0.7
Start Time:           Tue, 25 Jun 2024 12:03:35 +0900
Labels:               app=nginx
                      app.kubernetes.io/component=ingress-controller
                      app.kubernetes.io/managed-by=aks-app-routing-operator
                      pod-template-hash=6f5b856d74
..
Status:               Running
Controlled By:  ReplicaSet/nginx-6f5b856d74
Containers:
  controller:
    Container ID:  containerd://3e9b9d44953e7f88e660744f171d356d3918f2e53505858afed4b54bf7a1a911
    Image:         mcr.microsoft.com/oss/kubernetes/ingress/nginx-ingress-controller:v1.10.0
    Image ID:      mcr.microsoft.com/oss/kubernetes/ingress/nginx-ingress-controller@sha256:65a29e557a3c2b4f4762e5c2f90b563bf07ee0ceb23bebfd0f8161f029ffb2a6
    ...
    Args:
      /nginx-ingress-controller
      --ingress-class=webapprouting.kubernetes.azure.com
      --controller-class=webapprouting.kubernetes.azure.com/nginx
      --election-id=nginx
      --publish-service=$(POD_NAMESPACE)/nginx
      --configmap=$(POD_NAMESPACE)/nginx
      --enable-annotation-validation=true
      --http-port=8080
      --https-port=8443
...

This setup simplifies the integration and management of DNS and SSL configurations within your AKS cluster, as these ingress controllers integrate with Azure DNS by default, eliminating the need to manually configure DNS settings and manage DNS records in Azure DNS.
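
Exposing a workload through the add-on then only requires referencing this class from a standard Ingress resource. A sketch (the host name and backing service are hypothetical):

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-demo
  namespace: default
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
    - host: demo.training.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-demo
                port:
                  number: 80
EOF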

NAT Gateway

To avoid limitations related to the available number of outbound flows of traffic in an Azure Load Balancer, it installs a NAT Gateway for scalable outbound connection flows. Azure NAT Gateway allows up to 64,512 outbound UDP and TCP traffic flows per IP address, with a maximum of 16 IP addresses. This means that a single NAT Gateway can manage up to 1,032,192 outbound connections.


Autoscaling

We can identify two scopes that the default configurations affect: the cluster and the workloads. From the cluster perspective, it will:

Node auto-provisioning (NAP) (preview)

When deploying workloads onto AKS, you need to decide on the node pool configuration regarding the VM size needed. As your workloads evolve, they may require different CPU, memory, and capabilities to run. Node Autoprovisioning (NAP) (Preview) is based on the open-source Karpenter project developed by AWS and uses an Azure provider to interact with the Azure API to manage VM instances.

Karpenter monitors the requests of pending pods and starts or terminates nodes based on resource requests (CPU, memory, GPUs) and constraints (node affinity, node anti-affinity). This behavior ensures the best VM configuration based on the workload requirements while simultaneously reducing infrastructure costs.
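
Karpenter's behavior is driven by NodePool custom resources on the cluster, which constrain what kinds of VMs it is allowed to provision. The sketch below follows the preview documentation; the API version and keys may change while the feature is in preview, and the values are illustrative:

kubectl apply -f - <<EOF
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D"]
EOF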

From the workload perspective, it will:

Kubernetes Event-Driven Autoscaling (KEDA)

KEDA is an open-source event-driven autoscaler that acts as an agent to monitor events from multiple sources such as queues, databases, file systems, messaging systems, HTTP endpoints, custom metrics, and more, triggering the related scaling actions. It enhances the existing Kubernetes Horizontal Pod Autoscaler (HPA) by introducing new types of triggers based on events rather than just resource utilization metrics like CPU or memory.

$ kubectl get pods -A
NAMESPACE            NAME                                                   READY   STATUS                     RESTARTS        AGE
...
kube-system          keda-admission-webhooks-7778cc48bd-8wcpl               1/1     Running                    0               9m17s
kube-system          keda-admission-webhooks-7778cc48bd-tvqcw               1/1     Running                    0               9m17s
kube-system          keda-operator-5c76fdd585-mgv77                         1/1     Running                    0               9m17s
kube-system          keda-operator-5c76fdd585-tgnqv                         1/1     Running                    0               9m17s
kube-system          keda-operator-metrics-apiserver-58c8cbcc85-rslrm       1/1     Running                    0               9m21s
kube-system          keda-operator-metrics-apiserver-58c8cbcc85-z4g56       1/1     Running                    0               9m21s
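
With the add-on in place, scaling a deployment on events is just a matter of creating a ScaledObject. A sketch using the cron scaler, which needs no external infrastructure (the target deployment and schedule are illustrative):

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-demo-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: web-demo           # an existing Deployment in the same namespace
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Seoul
        start: "0 9 * * *"   # scale up at 09:00
        end: "0 18 * * *"    # scale back down at 18:00
        desiredReplicas: "5"
EOF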

Vertical Pod Autoscaler

Microsoft officially announced support for the VPA on AKS in October 2022. However, this new AKS Automatic service enables it by default. The VPA dynamically adjusts the CPU and memory requests and limits of containers based on both historical and current usage data.

It’s important to note that Kubernetes cannot modify these values for a running pod. If a pod’s resource allocations are out of sync with the VPA’s recommendations, the VPA updater evicts the misaligned pods one by one; their controllers then recreate them with the original spec, and the VPA’s mutating admission webhook rewrites the resource requests and limits to match the recommendations before the new pods are scheduled onto a node.

$ kubectl get pods -A
NAMESPACE            NAME                                                   READY   STATUS                     RESTARTS        AGE
...
kube-system          vpa-admission-controller-7cdd598b67-87h9b              1/1     Running                    0               4m46s
kube-system          vpa-admission-controller-7cdd598b67-rjrbj              1/1     Running                    0               4m51s
kube-system          vpa-recommender-76b9bb6fd-9v7cr                        1/1     Running                    0               4m41s
kube-system          vpa-updater-5d69655799-h6dl5                           1/1     Running                    0               4m50s
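
A VerticalPodAutoscaler object then ties the recommender, updater, and admission webhook to a specific workload. A minimal sketch (the target deployment name is hypothetical):

kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-demo-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-demo
  updatePolicy:
    updateMode: "Auto"   # allow the updater to evict pods so new requests can be applied
EOF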

Observability

During the creation of the AKS Automatic cluster, you have the option of enabling Container Insights, Managed Prometheus, and Managed Grafana. When these features are enabled, you won’t see any pods running Grafana or Prometheus, nor persistent volumes storing data. Instead, you will only see pods collecting metrics and logs, managed by the Azure Monitor agents. These agents send the collected data to the Azure Monitor workspace.


Azure Managed Prometheus

This PaaS service, built on top of the open-source Prometheus system developed by SoundCloud, leverages data stored in the Azure Monitor workspace sent by the ama-metrics-* pods. It allows you to perform queries and set up alerting on the collected metrics. For each Linux node in the cluster, there will be a dedicated ama-metrics-node pod.


You can identify the Azure Monitor agents responsible for collecting logs and metrics by inspecting the running pods:

$ kubectl get pods -A
NAMESPACE            NAME                                                   READY   STATUS                     RESTARTS        AGE
...
kube-system          ama-logs-67s4m                                         2/2     Running                    0               13m
kube-system          ama-logs-7mm5t                                         2/2     Running                    0               3d11h
kube-system          ama-logs-bnjtm                                         2/2     Running                    0               3d11h
kube-system          ama-logs-rmbzz                                         2/2     Running                    0               8m14s
kube-system          ama-logs-rs-c9db97d64-nsrb9                            1/1     Running                    0               31h
kube-system          ama-logs-xb2rj                                         2/2     Running                    0               3d11h
kube-system          ama-metrics-797c67fbf7-wf7g8                           2/2     Running                    0               3d10h
kube-system          ama-metrics-ksm-d9c6f475b-28gtt                        1/1     Running                    0               3d10h
kube-system          ama-metrics-node-75t2k                                 2/2     Running                    0               3d10h
kube-system          ama-metrics-node-78h6k                                 2/2     Running                    0               3d10h
kube-system          ama-metrics-node-gvzxr                                 2/2     Running                    0               3d10h
kube-system          ama-metrics-node-j8fzx                                 2/2     Running                    0               13m
kube-system          ama-metrics-node-p8l8h                                 2/2     Running                    0               8m14s
kube-system          ama-metrics-operator-targets-5849768d84-fsj66          2/2     Running                    2 (3d10h ago)   3d10h

Container Insights

Similar to Managed Prometheus, Container Insights collects stdout/stderr logs from containers and sends them to the configured Azure Log Analytics workspace. These logs can then be queried using the Kusto Query Language (KQL) instead of PromQL.

$ kubectl get pods -A
NAMESPACE            NAME                                                   READY   STATUS                     RESTARTS        AGE
...
kube-system          ama-logs-67s4m                                         2/2     Running                    0               13m
kube-system          ama-logs-7mm5t                                         2/2     Running                    0               3d11h
kube-system          ama-logs-bnjtm                                         2/2     Running                    0               3d11h
kube-system          ama-logs-rmbzz                                         2/2     Running                    0               8m14s
kube-system          ama-logs-rs-c9db97d64-nsrb9                            1/1     Running                    0               31h
kube-system          ama-logs-xb2rj                                         2/2     Running                    0               3d11h
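
The collected logs can then be queried from the Log Analytics workspace, for example with the CLI. A sketch (the workspace GUID is a placeholder, and the query assumes the ContainerLogV2 schema):

# Return the ten most recent container log lines collected by Container Insights
az monitor log-analytics query \
  --workspace "<log-analytics-workspace-guid>" \
  --analytics-query "ContainerLogV2 | project TimeGenerated, PodName, LogMessage | order by TimeGenerated desc | take 10"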

Azure Managed Grafana

This PaaS service is built on top of the Grafana software by Grafana Labs. It comes with several pre-installed Azure and Kubernetes dashboards, and it uses an extension to read metrics and logs stored in the Azure Monitor workspace, providing a dashboard to visualize them.

