DEV Community

Karim
Karim

Posted on • Originally published at deep75.Medium on

AIOps : Investigation par l’IA dans Kubernetes avec HolmesGPT, Ollama et RunPod …

Dans le monde de l’orchestration de conteneurs, Kubernetes est devenu une norme pour gérer les workloads conteneurisés. Cependant, la gestion et le dépannage des clusters Kubernetes peuvent être complexes et chronophages. Cet article explore comment l’intelligence artificielle (IA) peut être intégrée dans Kubernetes pour améliorer l’investigation et la gestion des incidents. J’avais d’ailleurs évoqué le sujet dans un article précédent :

AIOps : Déboguer son cluster Kubernetes en utilisant l’intelligence artificielle générative via…

Ici je vais m’intéresser à HolmesGPT. HolmesGPT, développé par Robusta, est un agent de dépannage open source qui utilise l’IA pour investiguer les incidents dans les clusters Kubernetes avec ces caractéristiques :

  • Intégration avec les outils de gestion d’incidents : HolmesGPT se connecte à des outils comme PagerDuty, OpsGenie et Prometheus pour collecter des données et analyser les alertes.
  • Investigation automatisée : Grâce à l’IA, HolmesGPT peut identifier et résoudre des problèmes tels que l’expiration des certificats SSL, les problèmes de ressources insuffisantes et les problèmes d’affinité des nœuds. Cela réduit significativement le temps et l’effort nécessaires pour le dépannage.
  • Personnalisation : HolmesGPT permet de créer des livres de recettes (runbooks) personnalisés pour gérer des problèmes spécifiques, en utilisant des API et des outils personnalisés si nécessaire.

GitHub - robusta-dev/holmesgpt: On-Call Assistant for Prometheus Alerts - Get a head start on fixing alerts with AI investigation

Pour cet exercice, je vais d’abord lancer une instance Ubuntu 24.04 LTS de nouveau chez le fournisseur Cloud DigitalOcean :

Je vais y installer Incus, un fork de LXD qui va me servir de base pour la formation d’un cluster Kubernetes avec plusieurs containers :

Linux Containers - Incus - Introduction

Comme pour LXD, je vais procéder à la création de plusieurs profiles. Mais dans un premier temps, installation d’Incus sur l’instance :

root@k0s-incus:~# curl -fsSL https://pkgs.zabbly.com/key.asc | gpg --show-keys --fingerprint
gpg: directory '/root/.gnupg' created
gpg: keybox '/root/.gnupg/pubring.kbx' created
pub rsa3072 2023-08-23 [SC] [expires: 2025-08-22]
      4EFC 5906 96CB 15B8 7C73 A3AD 82CC 8797 C838 DCFD
uid Zabbly Kernel Builds <info@zabbly.com>
sub rsa3072 2023-08-23 [E] [expires: 2025-08-22]

root@k0s-incus:~# mkdir -p /etc/apt/keyrings/

root@k0s-incus:~# curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc

root@k0s-incus:~# sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc

EOF'

root@k0s-incus:~# apt-get update

Hit:1 http://security.ubuntu.com/ubuntu noble-security InRelease
Hit:2 http://mirrors.digitalocean.com/ubuntu noble InRelease
Hit:3 https://repos-droplet.digitalocean.com/apt/droplet-agent main InRelease
Hit:4 http://mirrors.digitalocean.com/ubuntu noble-updates InRelease
Hit:5 http://mirrors.digitalocean.com/ubuntu noble-backports InRelease
Get:6 https://pkgs.zabbly.com/incus/stable noble InRelease [7358 B]   
Get:7 https://pkgs.zabbly.com/incus/stable noble/main amd64 Packages [3542 B]
Fetched 10.9 kB in 1s (13.3 kB/s)   
Reading package lists... Done

root@k0s-incus:~# apt-get install incus incus-client incus-ui-canonical -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  attr dconf-gsettings-backend dconf-service dns-root-data dnsmasq-base fontconfig genisoimage glib-networking glib-networking-common glib-networking-services gsettings-desktop-schemas
  gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-x incus-base iw libaa1 libasyncns0 libavc1394-0 libboost-iostreams1.83.0 libboost-thread1.83.0 libbtrfs0t64 libcaca0
  libcairo-gobject2 libcairo2 libcdparanoia0 libdatrie1 libdaxctl1 libdconf1 libdv4t64 libflac12t64 libgdk-pixbuf-2.0-0 libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgraphite2-3
  libgstreamer-plugins-base1.0-0 libgstreamer-plugins-good1.0-0 libharfbuzz0b libiec61883-0 libmp3lame0 libmpg123-0t64 libndctl6 libnet1 libogg0 libopus0 liborc-0.4-0t64 libpango-1.0-0
  libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0 libpmem1 libpmemobj1 libproxy1v5 libpulse0 librados2 libraw1394-11 librbd1 librdmacm1t64 libshout3 libsndfile1 libsoup-3.0-0
  libsoup-3.0-common libspeex1 libspice-server1 libtag1v5 libtag1v5-vanilla libthai-data libthai0 libtheora0 libtwolame0 libusbredirparser1t64 libv4l-0t64 libv4lconvert0t64 libvisual-0.4-0
  libvorbis0a libvorbisenc2 libvpx9 libwavpack1 libx11-xcb1 libxcb-render0 libxcb-shm0 libxdamage1 libxfixes3 libxi6 libxrender1 libxtst6 libxv1 session-migration sshfs wireless-regdb
  x11-common xdelta3

root@k0s-incus:~# incus
Description:
  Command line client for Incus

  All of Incus's features can be driven through the various commands below.
  For help with any of those, simply call them with --help.

  Custom commands can be defined through aliases, use "incus alias" to control those.

Usage:
  incus [command]

Available Commands:
  admin Manage incus daemon
  cluster Manage cluster members
  config Manage instance and server configuration options
  console Attach to instance consoles
  copy Copy instances within or in between servers
  create Create instances from images
  delete Delete instances
  exec Execute commands in instances
  export Export instance backups
  file Manage files in instances
  help Help about any command
  image Manage images
  import Import instance backups
  info Show instance or server information
  launch Create and start instances from images
  list List instances
  move Move instances within or in between servers
  network Manage and attach instances to networks
  pause Pause instances
  profile Manage profiles
  project Manage projects
  publish Publish instances as images
  rebuild Rebuild instances
  remote Manage the list of remote servers
  rename Rename instances
  restart Restart instances
  resume Resume instances
  snapshot Manage instance snapshots
  start Start instances
  stop Stop instances
  storage Manage storage pools and volumes
  top Display resource usage info per instance
  version Show local and remote versions
  webui Open the web interface

Flags:
      --all Show less common commands
      --debug Show all debug messages
      --force-local Force using the local unix socket
  -h, --help Print help
      --project Override the source project
  -q, --quiet Don't show progress information
      --sub-commands Use with help or --help to view sub-commands
  -v, --verbose Show all information messages
      --version Print version number

Use "incus [command] --help" for more information about a command.
Enter fullscreen mode Exit fullscreen mode

Initialisation d’Incus en version minimaliste :

root@k0s-incus:~# incus admin init
Would you like to use clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (btrfs, dir, lvm) [default=btrfs]: dir
Where should this storage pool store its data? [default=/var/lib/incus/storage-pools/default]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=incusbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the server to be available over the network? (yes/no) [default=no]:    
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "init" preseed to be printed? (yes/no) [default=no]: 

root@k0s-incus:~# incus list
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+

root@k0s-incus:~# incus profile list
+---------+-----------------------+---------+
| NAME | DESCRIPTION | USED BY |
+---------+-----------------------+---------+
| default | Default Incus profile | 0 |
+---------+-----------------------+---------+

root@k0s-incus:~# incus profile show default
config: {}
description: Default Incus profile
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []
project: default

root@k0s-incus:~# incus profile create k8s
Enter fullscreen mode Exit fullscreen mode

Incus dispose d’un tableau de bord de contrôle qui peut être actionné temporairement par incus webui.

incus webui

Activation de ce dernier :

root@k0s-incus:~# nohup incus webui &
[1] 4104
root@k0s-incus:~# nohup: ignoring input and appending output to 'nohup.out'

root@k0s-incus:~# cat nohup.out 
Web server running at: http://127.0.0.1:34363/ui?auth_token=3c5f5d4b-f9ed-4bf9-a174-d5ea2366cfbf

Enter fullscreen mode Exit fullscreen mode

Utilisation de pinggy.io pour y accéder :

Pinggy - Simple Localhost Tunnels


root@k0s-incus:~# ssh -p 443 -R0:127.0.0.1:34363 a.pinggy.io

Enter fullscreen mode Exit fullscreen mode

Je récupère le même profile qu’utilise LXD pour MicroK8s :

MicroK8s - MicroK8s in LXD | MicroK8s

roothttps://microk8s.io/docs/install-lxd@k0s-incus:~# wget https://raw.githubusercontent.com/ubuntu/microk8s/master/tests/lxc/microk8s.profile -O k8s.profile
--2025-01-14 20:58:42-- https://raw.githubusercontent.com/ubuntu/microk8s/master/tests/lxc/microk8s.profile
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 816 [text/plain]
Saving to: ‘k8s.profile’

k8s.profile 100%[=====================================================================================================>] 816 --.-KB/s in 0s      

2025-01-14 20:58:42 (33.4 MB/s) - ‘k8s.profile’ saved [816/816]

root@k0s-incus:~# cat k8s.profile | incus profile edit k8s
root@k0s-incus:~# rm k8s.profile

root@k0s-incus:~# incus profile show k8s
config:
  boot.autostart: "true"
  linux.kernel_modules: ip_vs,ip_vs_rr,ip_vs_wrr,ip_vs_sh,ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,br_netfilter
  raw.lxc: |
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw cgroup:rw
    lxc.cgroup.devices.allow=a
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /sys/module/nf_conntrack/parameters/hashsize
    type: disk
  aadisable2:
    path: /dev/kmsg
    source: /dev/kmsg
    type: unix-char
  aadisable3:
    path: /sys/fs/bpf
    source: /sys/fs/bpf
    type: disk
  aadisable4:
    path: /proc/sys/net/netfilter/nf_conntrack_max
    source: /proc/sys/net/netfilter/nf_conntrack_max
    type: disk
name: k8s
used_by: []
project: default
Enter fullscreen mode Exit fullscreen mode

Comme Incus a la faculté d’utiliser cloud-init, je crée un nouveau profile destiné à cet usage :

root@k0s-incus:~# incus profile show cloud
config:
  cloud-init.user-data: |
    #cloud-config
    package_update: true
    package_upgrade: true
    package_reboot_if_required: true
    packages:
      - vim
      - wget
      - git
      - curl
      - htop
      - openssh-server
    bootcmd:
      - systemctl enable ssh
      - systemctl start ssh
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCpbsaaVUMa2TM9q8VkeBmbKvJpbreXTcqI5F5N3riGsoZ7Z/IIN7eR6J47UP2bj3IBTdgHmij1uOexm60QBO2PY4abIhsN+xnVS4a0LSyI8v6nYECWbEehL/gFn6uDmSLA4m0hZCF5BSpLxQYzKS28dHIdXsLC4CDd67nAXIhOiVpM0q/AUCuSy+mA0VwFa/JAkFCk8TpQBorgwJIq635imrgxYIpEUA2wHXOhw23mO3zTUlay13LSlA2a1xyTkP8hSDWdRYVxr2DEB/MtmTX2BdWlA5rDRmzXE7R2/csE245WAxG+XfSu4zNqhHzm8Df3zmZn3/UyKLcx4eJF//mVZyrM7RQHRteA/im8I4IavrReGyCUKY+OsSfygYVFyO87rYQ+IOauOnB4LxBohBjSBN3Skk4X7krYFIi8D9R1lmL+VvBfpvy0YMurOahY1VJFzD0dUeK2bDUdeWzfFkcX039d9/RRXRxieNpxwp1BLPi5/DXG8FihzgwVTf6h60J9/fkYzY+BO8CKG2kYTUsy1ykuXLzLY5sTCREiEoEKcJ9IGz8OimZ1AmkgJJCrQnI6mT/KiNDU6YCc75ONKTKX5HKVPhZWT255Aw4f5LBbBrj06cJX3GuunV0I30+BYyHwLbPBoqgd4GUk3YJlr8wS3qre/YUSc2iKNDTOzFCC8Q== root@k0s-incus
description: incus with cloud-init
devices: {}
name: cloud
used_by: []
project: default
Enter fullscreen mode Exit fullscreen mode

Je suis prêt pour la création de trois containers qui me serviront de pivot à la création d’un cluster Kubernetes :

root@k0s-incus:~# for i in {1..3}; do incus launch -p default -p k8s -p cloud images:ubuntu/24.04/cloud k0s-$i; done
Launching k0s-1
Launching k0s-2                                    
Launching k0s-3

root@k0s-incus:~# incus list                       
+-------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+-------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| k0s-1 | RUNNING | 10.224.160.99 (eth0) | fd42:4641:b619:c782:216:3eff:fea4:53d3 (eth0) | CONTAINER | 0 |
+-------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| k0s-2 | RUNNING | 10.224.160.54 (eth0) | fd42:4641:b619:c782:216:3eff:feee:7af8 (eth0) | CONTAINER | 0 |
+-------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| k0s-3 | RUNNING | 10.224.160.215 (eth0) | fd42:4641:b619:c782:216:3eff:fef3:709b (eth0) | CONTAINER | 0 |
+-------+---------+-----------------------+-----------------------------------------------+-----------+-----------+

root@k0s-incus:~# cat .ssh/config 
Host *
   StrictHostKeyChecking no
   UserKnownHostsFile=/dev/null

root@k0s-incus:~# ssh ubuntu@10.224.160.99

Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 6.8.0-51-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/pro

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@k0s-1:~$
Enter fullscreen mode Exit fullscreen mode

Récupération de k0sctl pour la création d’un cluster Kubernetes avec k0s :

Using k0sctl - Documentation

root@k0s-incus:~# wget -c https://github.com/k0sproject/k0sctl/releases/download/v0.21.0/k0sctl-linux-amd64 && chmod +x k0sctl-linux-amd64 && mv k0sctl-linux-amd64 /usr/local/bin/k0sctl

Saving to: ‘k0sctl-linux-amd64’

k0sctl-linux-amd64 100%[=====================================================================================================>] 18.21M --.-KB/s in 0.1s    

2025-01-14 21:22:23 (122 MB/s) - ‘k0sctl-linux-amd64’ saved [19091608/19091608]

root@k0s-incus:~# k0sctl 
NAME:
   k0sctl - k0s cluster management tool

USAGE:
   k0sctl [global options] command [command options]

COMMANDS:
   version Output k0sctl version
   apply Apply a k0sctl configuration
   kubeconfig Output the admin kubeconfig of the cluster
   init Create a configuration template
   reset Remove traces of k0s from all of the hosts
   backup Take backup of existing clusters state
   config Configuration related sub-commands
   completion  
   help, h Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug, -d Enable debug logging (default: false) [$DEBUG]
   --trace Enable trace logging (default: false) [$TRACE]
   --no-redact Do not hide sensitive information in the output (default: false)
   --help, -h show help

root@k0s-incus:~# k0sctl init --k0s > k0sctl.yaml

root@k0s-incus:~# cat k0sctl.yaml 
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
  user: admin
spec:
  hosts:
  - ssh:
      address: 10.224.160.99 
      user: ubuntu
      port: 22
      keyPath: /root/.ssh/id_rsa
    role: controller
  - ssh:
      address: 10.224.160.54
      user: ubuntu
      port: 22
      keyPath: /root/.ssh/id_rsa
    role: worker
  - ssh:
      address: 10.224.160.215
      user: ubuntu
      port: 22
      keyPath: /root/.ssh/id_rsa
    role: worker
  k0s:
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kubeProxy:
            disabled: false
            mode: iptables
          kuberouter:
            autoMTU: true
            mtu: 0
            peerRouterASNs: ""
            peerRouterIPs: ""
          podCIDR: 10.244.0.0/16
          provider: kuberouter
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: true 
Enter fullscreen mode Exit fullscreen mode

Lancement de la création :


root@k0s-incus:~# k0sctl apply --config k0sctl.yaml 

⠀⣿⣿⡇⠀⠀⢀⣴⣾⣿⠟⠁⢸⣿⣿⣿⣿⣿⣿⣿⡿⠛⠁⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀█████████ █████████ ███
⠀⣿⣿⡇⣠⣶⣿⡿⠋⠀⠀⠀⢸⣿⡇⠀⠀⠀⣠⠀⠀⢀⣠⡆⢸⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀███ ███ ███
⠀⣿⣿⣿⣿⣟⠋⠀⠀⠀⠀⠀⢸⣿⡇⠀⢰⣾⣿⠀⠀⣿⣿⡇⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀███ ███ ███
⠀⣿⣿⡏⠻⣿⣷⣤⡀⠀⠀⠀⠸⠛⠁⠀⠸⠋⠁⠀⠀⣿⣿⡇⠈⠉⠉⠉⠉⠉⠉⠉⠉⢹⣿⣿⠀███ ███ ███
⠀⣿⣿⡇⠀⠀⠙⢿⣿⣦⣀⠀⠀⠀⣠⣶⣶⣶⣶⣶⣶⣿⣿⡇⢰⣶⣶⣶⣶⣶⣶⣶⣶⣾⣿⣿⠀█████████ ███ ██████████
k0sctl v0.21.0 Copyright 2023, k0sctl authors.
By continuing to use k0sctl you agree to these terms:
https://k0sproject.io/licenses/eula
INFO ==> Running phase: Set k0s version  
INFO Looking up latest stable k0s version         
INFO Using k0s version v1.31.3+k0s.0              
INFO ==> Running phase: Connect to hosts 
INFO [ssh] 10.224.160.215:22: connected           
INFO [ssh] 10.224.160.99:22: connected            
INFO [ssh] 10.224.160.54:22: connected            
INFO ==> Running phase: Detect host operating systems 
INFO [ssh] 10.224.160.215:22: is running Ubuntu 24.04.1 LTS 
INFO [ssh] 10.224.160.99:22: is running Ubuntu 24.04.1 LTS 
INFO [ssh] 10.224.160.54:22: is running Ubuntu 24.04.1 LTS 
INFO ==> Running phase: Acquire exclusive host lock 
INFO ==> Running phase: Prepare hosts    
INFO ==> Running phase: Gather host facts 
INFO [ssh] 10.224.160.215:22: using k0s-3 as hostname 
INFO [ssh] 10.224.160.54:22: using k0s-2 as hostname 
INFO [ssh] 10.224.160.99:22: using k0s-1 as hostname 
INFO [ssh] 10.224.160.215:22: discovered eth0 as private interface 
INFO [ssh] 10.224.160.54:22: discovered eth0 as private interface 
INFO [ssh] 10.224.160.99:22: discovered eth0 as private interface 
INFO ==> Running phase: Validate hosts   
INFO ==> Running phase: Validate facts   
INFO ==> Running phase: Download k0s on hosts 
INFO [ssh] 10.224.160.215:22: downloading k0s v1.31.3+k0s.0 
INFO [ssh] 10.224.160.54:22: downloading k0s v1.31.3+k0s.0 
INFO [ssh] 10.224.160.99:22: downloading k0s v1.31.3+k0s.0 
INFO ==> Running phase: Install k0s binaries on hosts 
INFO [ssh] 10.224.160.99:22: validating configuration 
INFO ==> Running phase: Configure k0s    
INFO [ssh] 10.224.160.99:22: installing new configuration 
INFO ==> Running phase: Initialize the k0s cluster 
INFO [ssh] 10.224.160.99:22: installing k0s controller 
INFO [ssh] 10.224.160.99:22: waiting for the k0s service to start 
INFO [ssh] 10.224.160.99:22: wait for kubernetes to reach ready state 
INFO ==> Running phase: Install workers  
INFO [ssh] 10.224.160.99:22: generating a join token for worker 1 
INFO [ssh] 10.224.160.99:22: generating a join token for worker 2 
INFO [ssh] 10.224.160.215:22: validating api connection to https://10.224.160.99:6443 using join token 
INFO [ssh] 10.224.160.54:22: validating api connection to https://10.224.160.99:6443 using join token 
INFO [ssh] 10.224.160.215:22: writing join token to /etc/k0s/k0stoken 
INFO [ssh] 10.224.160.54:22: writing join token to /etc/k0s/k0stoken 
INFO [ssh] 10.224.160.54:22: installing k0s worker 
INFO [ssh] 10.224.160.215:22: installing k0s worker 
INFO [ssh] 10.224.160.215:22: starting service    
INFO [ssh] 10.224.160.215:22: waiting for node to become ready 
INFO [ssh] 10.224.160.54:22: starting service     
INFO [ssh] 10.224.160.54:22: waiting for node to become ready 
INFO ==> Running phase: Release exclusive host lock 
INFO ==> Running phase: Disconnect from hosts 
INFO ==> Finished in 42s                 
INFO k0s cluster version v1.31.3+k0s.0 is now installed 
INFO Tip: To access the cluster you can now fetch the admin kubeconfig using: 
INFO k0sctl kubeconfig
Enter fullscreen mode Exit fullscreen mode

Le cluster est actif :


root@k0s-incus:~# curl -LO https://dl.k8s.io/release/v1.31.3/bin/linux/amd64/kubectl && chmod +x kubectl && mv kubectl /usr/local/bin/
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 138 100 138 0 0 923 0 --:--:-- --:--:-- --:--:-- 926
100 53.7M 100 53.7M 0 0 476k 0 0:01:55 0:01:55 --:--:-- 1023k

root@k0s-incus:~# mkdir .kube
root@k0s-incus:~# k0sctl kubeconfig --config k0sctl.yaml > .kube/config

root@k0s-incus:~# kubectl cluster-info
Kubernetes control plane is running at https://10.224.160.99:6443
CoreDNS is running at https://10.224.160.99:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

root@k0s-incus:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k0s-2 Ready <none> 5m1s v1.31.3+k0s 10.224.160.54 <none> Ubuntu 24.04.1 LTS 6.8.0-51-generic containerd://1.7.24
k0s-3 Ready <none> 5m1s v1.31.3+k0s 10.224.160.215 <none> Ubuntu 24.04.1 LTS 6.8.0-51-generic containerd://1.7.24

root@k0s-incus:~# kubectl get po,svc -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-645c5d6f5b-kgnsf 1/1 Running 0 5m2s
kube-system pod/coredns-645c5d6f5b-n2rbk 1/1 Running 0 5m2s
kube-system pod/konnectivity-agent-2dg8l 1/1 Running 0 5m4s
kube-system pod/konnectivity-agent-5l5dl 1/1 Running 0 5m4s
kube-system pod/kube-proxy-cx47n 1/1 Running 0 5m7s
kube-system pod/kube-proxy-sp5fd 1/1 Running 0 5m7s
kube-system pod/kube-router-6l4qv 1/1 Running 0 5m7s
kube-system pod/kube-router-b9t89 1/1 Running 0 5m7s
kube-system pod/metrics-server-78c4ccbc7f-jxpzz 1/1 Running 0 5m1s

NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5m17s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 5m7s
kube-system service/metrics-server ClusterIP 10.109.44.51 <none> 443/TCP 5m1s
Enter fullscreen mode Exit fullscreen mode

Il est alors possible de procéder à l’installation d’HolmesGPT via Pipx :

GitHub - robusta-dev/holmesgpt: On-Call Assistant for Prometheus Alerts - Get a head start on fixing alerts with AI investigation

root@k0s-incus:~# apt install pipx -y

root@k0s-incus:~# pipx ensurepath

Success! Added /root/.local/bin to the PATH environment variable.

Consider adding shell completions for pipx. Run 'pipx completions' for instructions.

You will need to open a new terminal or re-login for the PATH changes to take effect.

Otherwise pipx is ready to go! ✨ 🌟 ✨

root@k0s-incus:~# pipx install "https://github.com/robusta-dev/holmesgpt/archive/refs/heads/master.zip"
  installed package holmesgpt 0.1.0, installed using Python 3.12.3
  These apps are now globally available
    - holmes
done! ✨ 🌟 ✨
root@k0s-incus:~# holmes version
/root/.local/share/pipx/venvs/holmesgpt/lib/python3.12/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
HEAD -> master-bfafbde3
Enter fullscreen mode Exit fullscreen mode

Pour l’accompagner, je récupère K9s qui fournit une interface utilisateur de terminal pour interagir avec vos clusters Kubernetes. L’objectif de ce projet est de faciliter la navigation, l’observation et la gestion de vos applications dans la nature. K9s surveille continuellement Kubernetes pour les changements et offre des commandes ultérieures pour interagir avec vos ressources observées.

GitHub - derailed/k9s: 🐶 Kubernetes CLI To Manage Your Clusters In Style!

root@k0s-incus:~# wget -c https://github.com/derailed/k9s/releases/download/v0.32.7/k9s_linux_amd64.deb
HTTP request sent, awaiting response... 200 OK
Length: 31832132 (30M) [application/octet-stream]
Saving to: ‘k9s_linux_amd64.deb’

k9s_linux_amd64.deb 100%[=====================================================================================================>] 30.36M --.-KB/s in 0.1s    

2025-01-14 21:40:07 (291 MB/s) - ‘k9s_linux_amd64.deb’ saved [31832132/31832132]

root@k0s-incus:~# apt install -f ./k9s_linux_amd64.deb 
root@k0s-incus:~# k9s --help
K9s is a CLI to view and manage your Kubernetes clusters.

Usage:
  k9s [flags]
  k9s [command]

Available Commands:
  completion Generate the autocompletion script for the specified shell
  help Help about any command
  info List K9s configurations info
  version Print version/build info

Flags:
  -A, --all-namespaces Launch K9s in all namespaces
      --as string Username to impersonate for the operation
      --as-group stringArray Group to impersonate for the operation
      --certificate-authority string Path to a cert file for the certificate authority
      --client-certificate string Path to a client certificate file for TLS
      --client-key string Path to a client key file for TLS
      --cluster string The name of the kubeconfig cluster to use
  -c, --command string Overrides the default resource to load when the application launches
      --context string The name of the kubeconfig context to use
      --crumbsless Turn K9s crumbs off
      --headless Turn K9s header off
  -h, --help help for k9s
      --insecure-skip-tls-verify If true, the server's caCertFile will not be checked for validity
      --kubeconfig string Path to the kubeconfig file to use for CLI requests
      --logFile string Specify the log file (default "/root/.local/state/k9s/k9s.log")
  -l, --logLevel string Specify a log level (error, warn, info, debug, trace) (default "info")
      --logoless Turn K9s logo off
  -n, --namespace string If present, the namespace scope for this CLI request
      --readonly Sets readOnly mode by overriding readOnly configuration setting
  -r, --refresh int Specify the default refresh rate as an integer (sec) (default 2)
      --request-timeout string The length of time to wait before giving up on a single server request
      --screen-dump-dir string Sets a path to a dir for a screen dumps
      --token string Bearer token for authentication to the API server
      --user string The name of the kubeconfig user to use
      --write Sets write mode by overriding the readOnly configuration setting

Use "k9s [command] --help" for more information about a command.
Enter fullscreen mode Exit fullscreen mode

Ollama, une alternative à ChatGPT, peut être déployée pour fournir des capacités de traitement du langage naturel directement dans votre environnement. Cela permet de bénéficier des capacités de traitement du langage naturel de Ollama sans dépendre de services cloud externes.

Ollama

En intégrant Ollama à vos outils de dépannage, vous pouvez générer des réponses et des solutions basées sur l’analyse des logs et des données de votre cluster Kubernetes.

Pour son exécution, je suis amené à utiliser RunPod, une plateforme qui permet d’exécuter des tâches de traitement du langage naturel et d’autres tâches IA. RunPod vous permet en effet de créer des environnements de pod personnalisés pour exécuter des modèles de langage comme Ollama ou d’autres applications IA :

RunPod - The Cloud Built for AI

Création d’un Pod GPU qui me permet donc d’utiliser Ollama …

Set up Ollama on your GPU Pod | RunPod Documentation

Je peux m’y connecter via SSH :

Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 6.5.0-44-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 ____________ _
(_____\ (_____ \ | |
 _____) ) _ _____ _____) )___ __| |
| __/ | | | || _ \ |____ // _ \ / _ |
| | \ \ | |_| || | | || | | |_| |( (_| |
|_| |_|| ____/ |_| |_||_| \___ / \ ____ |

For detailed documentation and guides, please visit:
https://docs.runpod.io/ and https://blog.runpod.io/

root@5ed8df208cf4:~# nvidia-smi
Tue Jan 14 22:03:15 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti On | 00000000:81:00.0 Off | N/A |
| 0% 28C P8 11W / 285W | 2MiB / 12282MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Enter fullscreen mode Exit fullscreen mode

Exécution d’Ollama :

root@5ed8df208cf4:~# apt update 2> /dev/null && apt install -qq lshw -y 2> /dev/null

root@5ed8df208cf4:~# export OLLAMA_HOST=0.0.0.0:11434
root@5ed8df208cf4:~# (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
[1] 950
root@5ed8df208cf4:~# >>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
WARNING: systemd is not running
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

root@5ed8df208cf4:~# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name    
tcp 0 0 0.0.0.0:7861 0.0.0.0:* LISTEN 52/nginx: master pr 
tcp 0 0 0.0.0.0:8081 0.0.0.0:* LISTEN 52/nginx: master pr 
tcp 0 0 0.0.0.0:8001 0.0.0.0:* LISTEN 52/nginx: master pr 
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 70/sshd: /usr/sbin/ 
tcp 0 0 0.0.0.0:3001 0.0.0.0:* LISTEN 52/nginx: master pr 
tcp 0 0 0.0.0.0:9091 0.0.0.0:* LISTEN 52/nginx: master pr 
tcp 0 0 127.0.0.11:39145 0.0.0.0:* LISTEN -                   
tcp6 0 0 :::22 :::* LISTEN 70/sshd: /usr/sbin/ 
tcp6 0 0 :::11434 :::* LISTEN 1006/ollama         
udp 0 0 127.0.0.11:33663 0.0.0.0:* -
Enter fullscreen mode Exit fullscreen mode

Récupération d’un LLM avec Llama3.2 :

root@5ed8df208cf4:~# ollama pull llama3.2:3b-instruct-q4_K_S
pulling manifest 
pulling d5e517daeee4... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.9 GB                         
pulling 966de95ca8a6... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.4 KB                         
pulling fcc5a6bec9da... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB                         
pulling a70ff7e570d9... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 6.0 KB                         
pulling 56bb8bd477a5... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 96 B                         
pulling 9c65e8607c0c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 561 B                         
verifying sha256 digest 
writing manifest 
success 

root@5ed8df208cf4:~# ollama list       
NAME ID SIZE MODIFIED       
llama3.2:3b-instruct-q4_K_S 80f2089878c9 1.9 GB 31 seconds ago
Enter fullscreen mode Exit fullscreen mode

L’endpoint d’Ollama est disponible publiquement via le proxy offert par RunPod :

Modification de K9s pour y intégrer cet endpoint et HolmesGPT sous forme d’un plug-in :

root@k0s-incus:~# cat ~/.config/k9s/plugins.yaml
plugins:
  holmesgpt:
    shortCut: Shift-H
    description: Ask HolmesGPT
    scopes:
      - all
    command: bash
    background: false
    confirm: false
    args:
      - -c
      - |
        holmes ask "why is $NAME of $RESOURCE_NAME in -n $NAMESPACE not working as expected" --model="openai/llama3.2:3b-instruct-q4_K_S"
        echo "Press 'q' to exit"
        while : ; do
        read -n 1 k <&1
        if [[$k = q]] ; then
        break
        fi
        done

root@k0s-incus:~# export OPENAI_API_BASE="https://vsr6spvysc6jly-11434.proxy.runpod.net/v1"
root@k0s-incus:~# export OPENAI_API_KEY=123
Enter fullscreen mode Exit fullscreen mode

Déploiement d’un exemple de Pod problématique dans le cluster Kubernetes via les exemples fournis par Robusta :

root@k0s-incus:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
deployment.apps/payment-processing-worker created

root@k0s-incus:~# kubectl get po
NAME READY STATUS RESTARTS AGE
payment-processing-worker-747ccfb9db-njgmw 0/1 CrashLoopBackOff 1 (4s ago) 9s
Enter fullscreen mode Exit fullscreen mode

Je peux lancer la requête relative au plug-in avec HolmesGPT et la combinaison CTRL+H pour obtenir cette première réponse :


The payment-processing-container container has crashed and is being restarted for the 6th time due to a CrashLoopBackOff. The last state indicates that the container was terminated with an  
exit code of 0, which suggests that the command executed successfully but did not complete as expected.                                                                                       

To investigate further, you can check the logs of the payment-processing-container container to see if there are any error messages or clues about what is causing the issue. You can also    
check the Kubernetes events for any other errors or warnings that may be related to this issue.                                                                                               

Additionally, you can try to debug the command executed by the payment-processing-container container to see if it's correct and working as expected. The command is:                         


 if [[-z "${DEPLOY_ENV}"]]; then echo Environment variable DEPLOY_ENV is undefined ; else while true; do echo hello; sleep 10;done; fi

This command checks if the DEPLOY_ENV environment variable is set, and if it's not, it prints a message. If it is set, it enters an infinite loop that prints "hello" every 10 seconds.       

If you're running this container in a Kubernetes pod, you can try to debug the issue by checking the pod's logs or using a tool like kubectl to inspect the container's state and logs.       
Press 'q' to exit

Enter fullscreen mode Exit fullscreen mode

Modification de la requête et autre réponse :

root@k0s-incus:~# cat ~/.config/k9s/plugins.yaml
plugins:
  holmesgpt:
    shortCut: Shift-H
    description: Ask HolmesGPT
    scopes:
      - all
    command: bash
    background: false
    confirm: false
    args:
      - -c
      - |
        holmes ask "why is $NAME of $RESOURCE_NAME in -n $NAMESPACE not working and why $NAME is crashed?" --model="openai/llama3.2:3b-instruct-q4_K_S"
        echo "Press 'q' to exit"
        while : ; do
        read -n 1 k <&1
        if [[$k = q]] ; then
        break
        fi
        done

The payment-processing-container container has crashed and is being restarted for the 6th time due to a CrashLoopBackOff. The last state indicates that the container was terminated with an  
exit code of 0, which suggests that the command executed successfully but did not produce any output.                                                                                         

To investigate further, you can check the logs of the payment-processing-container container to see if there are any error messages or clues about what is causing the crash:                 


 kubectl logs payment-processing-worker-747ccfb9db-njgmw -c payment-processing-container

Additionally, you can check the configuration of the payment-processing-container container to ensure that it is running with the correct environment variables and settings.                 


 kubectl describe pod payment-processing-worker-747ccfb9db-njgmw -c payment-processing-container

This will provide more detailed information about the container's configuration and any errors that may be occurring. 
Enter fullscreen mode Exit fullscreen mode

HolmesGPT peut s’intégrer plus globalement à la plateforme Robusta via une installation dans le cluster Kubernetes et Helm …

AI Analysis - Robusta documentation

Pour cela génération du fichier YAML de ce type en configuration :

root@k0s-incus:~# cat generated_values.yaml 
globalConfig:
  signing_key: 568927d5-6e65-4c13-b3fe-fdc50e616fde
  account_id: a4d7cea6-fba3-4ce6-ba3d-941b55ec83db
sinksConfig:
  - robusta_sink:
      name: robusta_ui_sink
      token: <TOKEN>
enablePrometheusStack: true
kube-prometheus-stack:
  grafana:
    persistence:
      enabled: true
enablePlatformPlaybooks: true
runner:
  sendAdditionalTelemetry: true
enableHolmesGPT: true
holmes:
  additionalEnvVars:
    - name: ROBUSTA_AI
      value: "true"
Enter fullscreen mode Exit fullscreen mode

Utilisation des commandes et du fichier de configuration YAML fournis par la plateforme Robusta :

root@k0s-incus:~# helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
"robusta" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "robusta" chart repository
Update Complete. ⎈Happy Helming!⎈
root@k0s-incus:~# helm install robusta robusta/robusta -f ./generated_values.yaml --set clusterName="k0s-cluster" \
--set isSmallCluster=true \
--set holmes.resources.requests.memory=512Mi \
--set kube-prometheus-stack.prometheus.prometheusSpec.retentionSize=9GB \
--set kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
--set kube-prometheus-stack.prometheus.prometheusSpec.resources.requests.memory=512Mi
NAME: robusta
LAST DEPLOYED: Tue Jan 14 22:59:09 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing Robusta 0.20.0

As an open source project, we collect general usage statistics.
This data is extremely limited and contains only general metadata to help us understand usage patterns.
If you are willing to share additional data, please do so! It really help us improve Robusta.

You can set sendAdditionalTelemetry: true as a Helm value to send exception reports and additional data.
This is disabled by default.

To opt-out of telemetry entirely, set a ENABLE_TELEMETRY=false environment variable on the robusta-runner deployment.
Note that if the Robusta UI is enabled, telemetry cannot be disabled even if ENABLE_TELEMETRY=false is set.

Visit the web UI at: https://platform.robusta.dev/

root@k0s-incus:~# helm ls -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
robusta default 2 2025-01-14 23:10:13.935906491 +0000 UTC deployed robusta-0.20.0 0.20.0

root@k0s-incus:~# kubectl get po,svc
NAME READY STATUS RESTARTS AGE
pod/alertmanager-robusta-kube-prometheus-st-alertmanager-0 0/2 Pending 0 2m
pod/payment-processing-worker-747ccfb9db-njgmw 0/1 CrashLoopBackOff 10 (2m33s ago) 28m
pod/prometheus-robusta-kube-prometheus-st-prometheus-0 0/2 Pending 0 2m
pod/robusta-forwarder-cd847ccc-wxc6d 1/1 Running 0 2m5s
pod/robusta-grafana-8588b8fb85-fv5vj 3/3 Running 0 2m5s
pod/robusta-holmes-55dd58ff6d-m4zth 1/1 Running 0 2m5s
pod/robusta-kube-prometheus-st-operator-6885c8f675-szncg 1/1 Running 0 2m5s
pod/robusta-kube-state-metrics-8667fd9775-s49z4 1/1 Running 0 2m5s
pod/robusta-prometheus-node-exporter-c6jvb 1/1 Running 0 2m5s
pod/robusta-prometheus-node-exporter-j6zp5 1/1 Running 0 2m5s
pod/robusta-runner-5d667b7d9c-dm2z7 1/1 Running 0 2m5s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 2m1s
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 94m
service/prometheus-operated ClusterIP None <none> 9090/TCP 2m1s
service/robusta-forwarder ClusterIP 10.102.7.41 <none> 80/TCP 2m5s
service/robusta-grafana ClusterIP 10.106.69.72 <none> 80/TCP 2m5s
service/robusta-holmes ClusterIP 10.110.124.241 <none> 80/TCP 2m5s
service/robusta-kube-prometheus-st-alertmanager ClusterIP 10.105.101.210 <none> 9093/TCP,8080/TCP 2m5s
service/robusta-kube-prometheus-st-operator ClusterIP 10.103.213.208 <none> 443/TCP 2m5s
service/robusta-kube-prometheus-st-prometheus ClusterIP 10.107.13.104 <none> 9090/TCP,8080/TCP 2m5s
service/robusta-kube-state-metrics ClusterIP 10.103.53.30 <none> 8080/TCP 2m5s
service/robusta-prometheus-node-exporter ClusterIP 10.102.243.65 <none> 9104/TCP 2m5s
service/robusta-runner ClusterIP 10.97.82.15 <none> 80/TCP 2m5s
Enter fullscreen mode Exit fullscreen mode

Je peux procéder à l’installation complête via cette formule :

root@k0s-incus:~# helm upgrade robusta robusta/robusta -f ./generated_values.yaml --set clusterName="k0s-cluster"
Release "robusta" has been upgraded. Happy Helming!
NAME: robusta
LAST DEPLOYED: Tue Jan 14 23:14:02 2025
NAMESPACE: default
STATUS: deployed
REVISION: 5
NOTES:
Thank you for installing Robusta 0.20.0

As an open source project, we collect general usage statistics.
This data is extremely limited and contains only general metadata to help us understand usage patterns.
If you are willing to share additional data, please do so! It really help us improve Robusta.

You can set sendAdditionalTelemetry: true as a Helm value to send exception reports and additional data.
This is disabled by default.

To opt-out of telemetry entirely, set a ENABLE_TELEMETRY=false environment variable on the robusta-runner deployment.
Note that if the Robusta UI is enabled, telemetry cannot be disabled even if ENABLE_TELEMETRY=false is set.

Visit the web UI at: https://platform.robusta.dev/
Enter fullscreen mode Exit fullscreen mode

Le cluster apparaît sur Robusta :

Et là également via HolmesGPT, intterogation de la plateforme sur les éventuelles problématiques rencontrées dans le cluster Kubernetes :

Le tout avec une consommation moindre dans le cluster …

L’utilisation de l’IA pour le dépannage et l’analyse des incidents réduit le temps et l’effort humain nécessaire, permettant aux équipes de se concentrer sur des tâches plus stratégiques.

Les outils comme HolmesGPT et Ollama peuvent être mis à l’échelle en fonction de la demande, ce qui est particulièrement utile dans les environnements de production où la charge de travail peut varier significativement.

On peut donc en conclure que l’intégration de l’IA dans les clusters Kubernetes à l’aide d’outils comme HolmesGPT, Ollama et de fournisseur d’instances GPU comme RunPod, offre des avantages significatifs en termes d’efficiacité, de scalabilité et de tolérance aux pannes.

Ces technologies permettent de rationaliser le cycle de vie des applications, de simplifier le dépannage et d’améliorer la gestion des ressources, rendant ainsi les opérations Kubernetes plus robustes et plus performantes …

À suivre !

Top comments (0)