DEV Community

Richard Zhang

Elasticsearch Overview(2)- Cluster & Terminology

Related

Elasticsearch Overview(1)- Benefits & Scenarios

Elasticsearch Overview(3)- Shard, Indexing, and Replicas

Elasticsearch Overview(4)- Node & Policy Design

Elasticsearch Overview(5)- Best Practices (Security)

What is an Elasticsearch Cluster?

1. Introduction

We discussed Elasticsearch in the previous post. So what is an Elasticsearch cluster? A cluster is a pool of nodes that together provide Elasticsearch functionality. The nodes can be separate physical machines or Docker containers, and they may be located in the same or in different geographical locations.

2. Cluster models

Clusters can be built in many ways. Two popular models are the all-in-one cluster and the role-based multi-node cluster. The multi-node cluster in the diagram has three master nodes, four data nodes (green), one coordinator node, and two further nodes shown in gray.

3. Node roles

Master nodes have a special purpose. They manage the cluster by exchanging information about the cluster state with the other nodes to keep the cluster stable. Dedicated master nodes do not do any data processing. Data nodes are the actual storage nodes: this is where logs and other data are stored. The coordinator node (called a coordinating node in the Elasticsearch documentation) acts as a client. It accepts requests and answers them by collecting results from the data nodes. Ingest nodes pre-process incoming data before it is stored on the data nodes. Coordinator and ingest nodes are optional but useful.
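As a rough illustration of how these roles are assigned, the sketch below renders the `node.roles` line that each dedicated node would carry in its `elasticsearch.yml`. It assumes an Elasticsearch version that supports the `node.roles` setting; the dictionary and the helper function are hypothetical names used only for this example.

```python
# Illustrative mapping from the roles described above to `node.roles` values.
# The role names are Elasticsearch's; ROLE_SETTINGS and node_roles_setting
# are hypothetical helpers for this sketch.
ROLE_SETTINGS = {
    "master": ["master"],
    "data": ["data"],
    "ingest": ["ingest"],
    "coordinator": [],  # an empty list makes a coordinating-only node
}


def node_roles_setting(role: str) -> str:
    """Render the elasticsearch.yml line for one dedicated role."""
    roles = ROLE_SETTINGS[role]
    return "node.roles: [" + ", ".join(roles) + "]"


for role in ROLE_SETTINGS:
    print(f"{role:12} -> {node_roles_setting(role)}")
```

A node that lists several roles at once, for example `master` and `data`, behaves like the all-in-one nodes discussed in this post.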

4. Best Practices

In a three-node all-in-one cluster, each of the three nodes acts as master, data node, and coordinator at the same time. A role-based multi-node cluster is usually the better choice, because it is easier to scale: each role can be grown independently of the others.
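The difference between the two models can be sketched as plain data. The node counts for the role-based layout follow the multi-node diagram described above; the two gray nodes are not labeled in the text, so treating them as ingest nodes is an assumption made only for this example.

```python
from collections import Counter

# Each node is represented by the set of roles it carries.
ALL_IN_ONE = [
    {"master", "data", "coordinator"},
    {"master", "data", "coordinator"},
    {"master", "data", "coordinator"},
]

# Assumption: the two unlabeled gray nodes are modeled as ingest nodes.
ROLE_BASED = (
    [{"master"}] * 3
    + [{"data"}] * 4
    + [{"coordinator"}]
    + [{"ingest"}] * 2
)


def role_counts(layout):
    """Count how many nodes carry each role."""
    return Counter(role for node in layout for role in node)


def is_dedicated(layout):
    """True when every node carries exactly one role."""
    return all(len(node) == 1 for node in layout)


for name, layout in (("all-in-one", ALL_IN_ONE), ("role-based", ROLE_BASED)):
    print(name, len(layout), dict(role_counts(layout)), is_dedicated(layout))
```

In the role-based layout, adding storage capacity means appending another `{"data"}` entry without touching the master nodes, which is the scalability argument made above.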

Multi-Node Cluster

3 Node Cluster

Terminology & How it works

1. The question

So far we have looked at the Elasticsearch cluster from the outside. Now let's look at what is inside it. Specifically, how is the data stored, and how does data flow through Elasticsearch?

2. The internal structure of the cluster

A cluster is made up of nodes, and the data lives on the data nodes. Data is organized into indexes, which are logical groupings of data. An index can be split into several shards, and the shards are where the data is physically stored. Documents are the basic units of data in Elasticsearch.

3. The relationship between document storage and indexing

All documents are in JSON format and consist of key-value pairs. They are stored in shards. Inside a shard there is one more layer, called segments, where the data is actually written. A group of segments makes up a shard, and all the shards together make up an index. The index is the logical entity that you search.
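The containment described above (documents in segments, segments in shards, shards in an index) can be modeled with a few small classes. This is only a conceptual sketch; the class names and the sample `logs` index are made up for illustration and do not correspond to any Elasticsearch API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    # A segment holds the documents themselves (JSON-like key-value pairs).
    documents: List[Dict] = field(default_factory=list)


@dataclass
class Shard:
    # A shard is a group of segments.
    segments: List[Segment] = field(default_factory=list)

    def doc_count(self) -> int:
        return sum(len(segment.documents) for segment in self.segments)


@dataclass
class Index:
    # An index is the logical entity formed by all of its shards.
    name: str
    shards: List[Shard] = field(default_factory=list)

    def doc_count(self) -> int:
        return sum(shard.doc_count() for shard in self.shards)


logs = Index(
    name="logs",
    shards=[
        Shard(segments=[Segment([{"msg": "a"}, {"msg": "b"}]), Segment([{"msg": "c"}])]),
        Shard(segments=[Segment([{"msg": "d"}])]),
    ],
)
print(logs.name, logs.doc_count())  # logs 4
```

Counting documents at the index level means walking every shard and every segment, which mirrors why the index is described as a logical entity rather than a single physical store.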

4. Data Inflow

The picture below uses two colors. Blue shows how data flows into Elasticsearch. Data sources can be many different things, and Elasticsearch supports different kinds of data, including unstructured data. Sources may be logs, caches, or services and infrastructure directly, such as Windows and Linux hosts and web servers such as NGINX and Apache. These data sources send data to Elasticsearch, where it is first evaluated and then processed.
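To make the inflow concrete, here is a minimal sketch of how a data source might package events for the Elasticsearch `_bulk` endpoint, which expects newline-delimited JSON: one action line followed by one document line, with a trailing newline. The index name `logs`, the sample events, and the helper name are assumptions for illustration; the sketch only builds the payload and does not send it anywhere.

```python
import json


def to_bulk_payload(index_name, documents):
    """Build an NDJSON body in the shape the _bulk endpoint expects."""
    lines = []
    for doc in documents:
        # Action line: index the following document into `index_name`.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical events from two of the source types mentioned above.
events = [
    {"source": "apache", "status": 200, "path": "/health"},
    {"source": "linux", "level": "warn", "message": "disk usage at 91%"},
]

payload = to_bulk_payload("logs", events)
print(payload)
```

A shipper would POST this body to the cluster with the `application/x-ndjson` content type; in practice the official language clients wrap this step in their own bulk helpers.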

5. Data Characteristics and Processing

So, what does your data look like? Which key-value pairs does it have, and what are their data types? All of this metadata is extracted during processing and stored with the Elasticsearch index, the logical entity we search. On the other side of the picture, green represents the search side of Elasticsearch. One example is Kibana, a tool that is part of the Elastic Stack. You can also connect through the API directly, or integrate search into a web application as enterprise search. Elasticsearch provides clients for many programming languages. These clients connect to the Elasticsearch cluster, the cluster looks up the index, and you get back the results you expect. This is how the data flow works.
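The two halves of this flow can be sketched in a few lines. The first function is a deliberately simplified imitation of how field types might be derived from a document; real Elasticsearch dynamic mapping does more (date detection, keyword sub-fields, arrays), so treat it as an approximation. The second function builds the body of a basic `match` query in the Elasticsearch query DSL. Both function names and the sample document are hypothetical.

```python
def infer_field_types(document):
    """Guess an Elasticsearch-style field type for each top-level key."""
    types = {}
    for key, value in document.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            types[key] = "boolean"
        elif isinstance(value, int):
            types[key] = "long"
        elif isinstance(value, float):
            types[key] = "float"
        elif isinstance(value, dict):
            types[key] = "object"
        else:
            types[key] = "text"  # everything else is treated as text in this sketch
    return types


def match_query(field, text):
    """Build the request body for a simple full-text match query."""
    return {"query": {"match": {field: text}}}


doc = {
    "message": "timeout",
    "status": 500,
    "took": 1.5,
    "ok": False,
    "host": {"name": "web-1"},
}
print(infer_field_types(doc))
print(match_query("message", "timeout"))
```

A client such as Kibana or one of the language clients would send a body like the second one to the index's `_search` endpoint and receive the matching documents back.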
