If you've ever read a book, you absolutely know what indexing is. Whether you love fantasy, fiction, or science-fiction(guilty), all of these genres typically have one thing in common: they all have an Index! Often this index is called a "Table of Contents" but it's the same thing: at least one page of information in regards to what the book contains.
This table is also very much structured like a table in our database, with at least one key/value pair. The key usually being the chapter's title, and the value is the page the chapter starts on.
If it weren't for these indices, we would have to flip through every page in the book until we find the chapter we're looking for.
Even worse, what if were just looking for an excerpt from that chapter but the book had no chapters? No page numbers? It would be terrible! Luckily for us that's not the case, and the book's table of contents allows us to easily find the chapter we're looking for and the exact page that it's on.
Indexing a database serves that same purpose: to help developers retrieve information they're looking for with minimal input/output(I/O) operations and a fast sub-linear time complexity.
Data Basics
Indices in databases are amazing in the power they harness but the small amount of space they take up. They can be any data structure that helps improve a database's performance.
Very commonly, developers use B+ Trees to index. B+ Trees are self-balancing data structures that store information in keys in a condensed a manner that allows for a rapid retrieval rate.
An alternate, equally as powerful index structure is the B-Tree(above), which is also self-balancing, but stores information in key/value pairs.
Indices are created by using at least one, if not multiple columns in a table. Indices are also incredibly flexible because they don't follow a standard structure, therefore, implementation techniques can be left up to the developer.
There are a few syntax specifics in regards to their construction but overall, fewer semantics involved as well (looking at you AJAX).
Benefits & Trade-offs
Indices are a snippet of the database called the Key or Database Key. This miniature version of the database is it's own entity that keeps a shallow copy of the disk block address, or a direct link to the queried field.
Because of the space this 'mini-base' also takes up, we trade-off a rapid retrieval time with the amount of records our database can hold, as well as additional memory. One could also consider the initial time it takes to set up the index in development as a minor drawback, but frankly, I find this to be quite a fair trade in the end.
While it is possible to retrieve a specific field using only the first column in the index, it is not possible to retrieve a field only using the greater indexed columns, this is why it is important to keep the columns in order when indexing.
By keeping ordered columns in our index, we are able to use parallel processing algorithms that have guaranteed results and keep a sub-linear time complexity. What we wind up with is a useful tree-structure that cuts down our I/O operations.
Constraints in Construction
A lot of us absolutely crave structure (even if we don't realize or admit it), especially in work. Indexing is perfect for developers who need to translate their jumbled thoughts and data into a well-organized system. By policing the constraints we set for our database, the index keeps structure and order. Let me reiterate, indices are not the actual constraints, they just moderate and enforce them.
These constraints are placed on the database in creation and implemented using a Database Management System(DBMS) like mySQL, mongoDB(schemaless), or mariaDB.
My favorite is mongoDB because of it's readability and use of javascript functions, so let’s check out an example of what setting some of these constraints would look like:
//create a new mongoDB schema using mongoose
const artistSchema = new mongoose.Schema({
//set the constraints for the index to moderate the artistId and name
id: { type: Number, index: true, unique:true },
name: { type: String, primary: true },
hasVocals: Boolean,
hasMoves: Boolean,
hasBags: Mixed,
});
//create models for the db
const Beyoncé = mongoose.model('Beyoncé', artistSchema);
const Nicki = mongoose.model('Nicki', artistSchema);
const Rihanna = mongoose.model('Rihanna', artistSchema);
Great! We've built out our database schema and added some records, and now we can see (even with this incredibly basic example) how much easier it would be to find specific fields even in a sea of data. If we want to find Rihanna's information, all we have to do is type in her name or id to get her field's address. Indices are immensely helpful by providing fast, guaranteed results given a correct input.
In Conclusion
It's easy to turn a standard database into a super-base just by adding an index data structure! While my example was simple, imagine a search through thousands of documents and how difficult it would be to find one piece of specific information without an index to locate exactly where it is.
Without an index, we fall into a slow linear time complexity and also increase the amount of input and output operations we would have to do, like breaking out of or continuing a loop once the data is actually found.
Next time you're constructing the schema for your database, make sure to set some constraints and create an index using the data structure of your choice to match. To every developer, everywhere: you'll be glad you did!
Thanks for reading!
Top comments (5)
hello. nice post. question. do we really need to index a primary key? or just the column that we usually search. like firstName, lastName. Because primary keys are unique it'll be a constant time getting a row right? hehe. just wondering.
Hey arToo, thanks for catching that! I'm still pretty new to server side database work so I appreciate that feedback! You probably don't need to set that as the unique key for the exact reason you stated!
hehe. cool. thanks. I was debating it by myself too. thanks again, :)
I believe you can also index other columns as well. It doesn't have to be the primary key.
of course you. hehe. just wondering if its still ok to index a primary key with a unique constraint.