Covering Index

One of the beautiful concepts in databases is the Covering Index, which plays a significant role in improving query performance.
Simply put, this index contains all the columns needed in a query without having to perform a lookup inside the data table to fetch the required columns.

As we know, indexes enable the DBMS to access data more quickly, and indexes can be either clustered or non-clustered.

Each table has one clustered index, and the clustered index represents the physical order of the records. So, if we have a table named Student containing the following columns: Id, FirstName, LastName, Email, GPA, and our clustered index is on the primary key (which is Id), when we write a query to retrieve all the columns from Student where Id = 287, the clustered index will be used to reach that record and return the data.

On the other hand, a non-clustered index is stored separately from the table data and contains the columns I specify, along with either the clustered index key (if it exists) or the location of the row in the heap data pages (if the table doesn't have a clustered index).

So, if I create a non-clustered index on GPA, it will store GPA sorted in ascending order along with the clustered index key, which in our case is the Student Id.

If I need to retrieve all the columns from the table where the records match the condition GPA > 2.8, the query will first access the non-clustered index to get the IDs of the students whose GPA is greater than 2.8. Then it will use the original IDs to perform a key lookup to retrieve the column values.

This approach is much more performant than scanning every record in the original table, checking the GPA, and then retrieving the data.

However, it still involves two steps: first, fetching the matched records using the non-clustered index and then going back to the table to get the data.

Is there a way to fetch the data without going back to the table?

The answer is yes, and that's where the Covering Index comes in.
It's called a Covering Index because it covers the query's select columns that are needed.
Essentially, it's a non-clustered index with all the columns required for the query added to it, so it doesn't have to go back to the data table for a lookup. The data is already available within the index itself. Additionally, we can include the columns we need in the query without making them key columns in our non-clustered index.

#sql
#databases
#covering_index
#indexes
#clustered_index
#non_clustered_index
#development

DEV Community

Covering Index

Top comments (1)

Read next

Semi-Supervised Domain Generalization: Robust Pseudo-Labeling for Unseen Domains

How do you manage tasks when you have a lot of todos?

State Management in React Using MobX State Tree (MST)

👽Lucky Alien Game 🚀