This post was originally published on my blog: Installing Elasticsearch inside a Kubernetes cluster with Helm and Terraform
Github Repository: tf-helm-kubernetes-elasticsearch
Note:
This guide uses Terraform for making API calls and state management. If you have helm installed on your machine, you can use that instead for installing the chart.
What is Elasticsearch?
According to the Elasticsearch website:
Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
Elasticsearch is generally used as the underlying engine for platforms that perform complex text search, logging, or real-time advanced analytics operations. The ELK stack (Elasticsearch, Logstash, and Kibana) has also become the de facto standard when it comes to logging and it's visualization in container environments.
Architecture
Before we move forward, let us take a look at the basic architecture of Elasticsearch:
The above is an overview of a basic Elasticsearch Cluster. As you can see, the cluster is divided into several nodes. A node is a server (physical or virtual) that stores some data and is a part of the elasticsearch cluster. A cluster, on the other hand, is a collection of several nodes that together form the cluster. Every node in turn can hold multiple shards from one or multiple indices. Different kinds of nodes available in Elasticsearch are Master-eligible node, Data node, Ingest node, and Machine learning node(Not availble in the OSS version). In this article, we will only be looking at the master and data nodes for the sake of simplicity.
Master-eligible node
A node that has node.master
flag set to true
, which makes it eligible to be elected as the master node which controls the cluster. One of the master-eligible nodes is elected as the Master via the master election process. Following are few of the functions performed by the master node:
- Creating or deleting an index
- Tracking which nodes are part of the cluster
- Deciding which shards to allocate to which nodes
Data node
A node that has node.data
flag set to true
. Data nodes hold the shards that contain the documents you have indexed. These nodes perform several operations that are IO, memory, and CPU extensive in nature. Some of the functions performed by data nodes are:
- Data related operations like CRUD
- Search
- Aggregations
Terminology
Now that we have a basic idea about the Elasticsearch Architecture, let us see how to Elasticsearch inside a Kubernetes Cluster using Helm and Terraform. Before moving forward, let us go through some basic terminology.
Kubernetes: Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation
Helm: Helm is an application package manager running atop Kubernetes. It allows describing the application structure through convenient helm-charts and managing it with simple commands
Terraform: Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions
Installation
First, let us describe the variables and the default values needed for setting up the Elasticsearch Cluster:
Default Values:
variable "elasticsearch" {
type = object({
master_node = object({
volume_size = number
cpu = number
memory = number
})
data_node = object({
volume_size = number
cpu = number
memory = number
})
})
default = {
master_node = {
volume_size = 20
cpu = 1
memory = 1.5
}
data_node = {
volume_size = 20
cpu = 1
memory = 1.5
}
}
}
variable "kubeconfig_file_path" {
type = string
default = "/my/file/path"
}
For the sake of simplicity, I will assume that you have a working helm installtion. Although, you can still go over to the Github Repository to take a look at how to install helm and tiller onto your Kubernetes cluster using Terraform.
Terraform Helm Setup
This step involves declaring a helm provider and the elasticsearch helm repository to pull the helm chart from
provider "helm" {
kubernetes {
config_path = var.kubeconfig_file_path
}
version = "~> 0.10.4"
service_account = kubernetes_service_account.tiller.metadata[0].name
install_tiller = true
}
data "helm_repository" "stable" {
name = "elastic"
url = "https://helm.elastic.co"
}
Setting up Master Eligible and Data nodes
Let us take a look at some of the important fields used in the following helm release resources:
-
clusterName
- This refers to the name of the elasticsearch cluster and has the default value ofelasticsearch
. Because elasticsearch looks at the cluster name when joining a new node, it is better to set the value of this field to something else. -
nodeGroup
- This tells the elasticsearch helm chart whether the node is a master eligible node or a data node -
storageClassName
- The kubernetes storage class that you want to use for provisioning the attached volumes. You can skip this field if your cloud provider has a default storageclass object defined -
cpu
: The number of CPU cores you want to give to the elasticsearch pod -
memory
: The amount of memory you want to allocate to the elasticsearch pod
Master Eligible Nodes
resource helm_release "elasticsearch_master" {
name = "elasticsearch-master"
repository = data.helm_repository.stable.metadata[0].name
chart = "elasticsearch"
version = "7.6.1"
timeout = 900
values = [
<<RAW_VALUES
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "my-storage-class"
resources:
requests:
storage: ${var.elasticsearch.master_node.volume_size}Gi
resources:
requests:
cpu: ${var.elasticsearch.master_node.cpu}
memory: ${var.elasticsearch.data_node.memory}Gi
roles:
master: "true"
ingest: "false"
data: "false"
RAW_VALUES
]
set {
name = "imageTag"
value = "7.6.2"
}
set {
name = "clusterName"
value = "elasticsearch-cluster"
}
set {
name = "nodeGroup"
value = "master"
}
}
Data Nodes
resource helm_release "elasticsearch_data" {
name = "elasticsearch-data"
repository = data.helm_repository.stable.metadata[0].name
chart = "elasticsearch"
version = "7.6.1"
timeout = 900
values = [
<<RAW_VALUES
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "my-storage-class"
resources:
requests:
storage: ${var.elasticsearch.data_node.volume_size}Gi
resources:
requests:
cpu: ${var.elasticsearch.data_node.cpu}
memory: ${var.elasticsearch.data_node.memory}Gi
roles:
master: "false"
ingest: "true"
data: "true"
RAW_VALUES
]
set {
name = "imageTag"
value = "7.6.2"
}
set {
name = "clusterName"
value = "elasticsearch-cluster"
}
set {
name = "nodeGroup"
value = "data"
}
}
Happy Coding! Cheers :)
Top comments (0)