Let's face it, building a database that works is not a challenge at all. The real challenge is to build a database that is fast, reliable, scalable, and well-engineered. In this article, let me show you some of the most important tips when building relational databases like MySQL, PostgreSQL, etc.
[1] Index Your Database
Consider a database table (customers) with the following header/columns:
| id | state | points |
// 10 million rows go here...
Here's how to query the database to get customers in 'NYC':
SELECT id
FROM customers
WHERE state = 'NYC';
The query will be slow because we've got 10 million rows and the DB engine needs to query all rows to get the relevant data. With indexing we can eliminate all irrelevant rows before querying the data so only rows with 'NYC' will be checked when 'NYC' is queried. Here's the same query with index:
CREATE INDEX idx_state ON customers (state);
-- convention is to prefix index with "idx" followed by underscore and then column name.
SELECT id
FROM customers
WHERE state = 'NYC';
This query will be so much faster because only rows with 'NYC' will be queried, not all rows, thanks to indexing. Think of indexing as educating the database engine on which data is where and how it is stored so that when you query data next time, it gives the relevant data more efficiently without wasting resources/time.
I can talk so much about indexing but I don't want to bore you. If you want to learn more about indexing, Google these: "multiple-column indexing", "indexing and sort operation", "composite indexes", etc.
[2] Think in Present
You need to only store that data that you need or the business needs. Don't try to solve a problem which might or might not happen in future. If you need to store additional data, you can always create a new table and add relationship(s).
[3] Say No To Blobs
Blob stands for Binary Large Object and this basically means images, audios, and other large files converted to binary format. If I could impose a rule in the tech world, I'd forbid blob as a data in database tables, because it's incredibly bad to do so. Blobs increase the size of database like crazy and this impacts the performance.
[4] Find The Sweet Spot Between Normalizing / Denormalizing Data
If a table has way too much columns, it never means the table is well-engineered. If that's the case, consider splitting your table to two or more tables. In contrast, if there are a whole lot of tables with very few columns (say 1 to 2) and they're connected with relationships, consider combining them (and be okay with duplicating some data). A lot of beginners think that denormalizing data is something bad because it has the prefix "de" but I can assure you it's not true, lol.
[5] Select The Perfect Data Type
Just because a data type works well with your use case, it doesn't mean it's the best one. For example, to store age of users, use TINYINT data type (an unsigned TINYINT stores values from 0 to 255) instead of INT. INT data type stores up to 2 billion (or something like that). This takes more space and we don't need it at all. The age of users will never go beyond 255. If anyone does reach 254 years old, we still have 1 year to change the data type.
[6] Don't Use Relational Databases, They're Not Scalable
Just kidding, or am I? Just kidding... or am I?
Top comments (9)
jokes aside, people who say relational databases aren't scalable enough are either on the scale of major web service providers like Netflix, Discord etc, or extremely underestimating what a properly setup relational database can do
I mean there is a reason why YouTube, Paypal, Netflix are using MySQL. Instagram uses PostgreSQL
yep, usually multiple different databases are used at that scale, discord getting millions of DB writes a second made them choose use Mongo initially, then even that did not cope with the amount of writes they were getting, they then migrated to Cassandra, and eventually Scylla as their messages DB
Yeah, relationally DBs force you to be extremely organized when designing them.
Right 😃
Full true, agree with all. My five cents: I even created a post about indexes where I shown on real open source experiments which show huge power of indexes hinty.io/devforth/sql-query-optimi...
Damn, you wrote an entire crash course. 😆
This is a great article! 🚀
If you are an opensource enthusiast, feel free to contribute it as a chapter to this opensource eBook here:
bobbyiliev / introduction-to-sql
Free Introduction to SQL eBook
This is an open-source introduction to SQL guide that will help you learn the basics of SQL and start using relational databases for your SysOps, DevOps, and Dev projects. No matter if you are a DevOps/SysOps engineer, developer, or just a Linux enthusiast, you will most likely have to use SQL at some point in your career.
The guide is suitable for anyone working as a developer, system administrator, or a DevOps engineer and wants to learn the basics of SQL.
To download a copy of the ebook use one of the following links:
Dark mode
Light mode
This book is made possible thanks to these fantastic companies!
DigitalOcean is a cloud services platform delivering…
Thanks man.