Arthur Azrieli for MeteorOps

Posted on Feb 28 • Originally published at meteorops.com

Proper mindset for handling data and databases: between scaling and failing

#kubernetes #architecture #productivity #database

Startups and software companies put a lot of effort into choosing which languages to use, which tech stacks to employ, and which cloud to deploy their app to, but they fail to give the same attention to their data and databases.

Data is the core of the app and the product: everything is derived from it, and everything depends on it. Startups and software houses should put more thought into how they plan to gather, store, analyse, and use their data. Doing so can make the difference between success and failure.

Avoid Common Data Architecture Regrets

Data is the raw material from which you mold your application; everything else is just tooling. If you treat your raw material with proper care, the final product is far more likely to succeed.

Tried-and-true methods for optimizing databases abound. It's well known that databases can be tuned to perform better through various means:

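Indexing - speed up lookups on the columns you filter, join, and sort by (primary keys are usually indexed automatically).
Normalization - store each fact once, in atomic values, to avoid redundancy and update anomalies.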
Query optimization - select only what you need, when you need it.
Partitioning - divide large tables into smaller ones.
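
To make indexing and query optimization concrete, here is a small sketch using Python's built-in sqlite3 module, chosen only because it needs no server. It compares the plan for the same lookup before and after adding an index, and the query selects only the two columns it needs. The orders table and index name are hypothetical, and the exact plan wording differs between SQLite versions; PostgreSQL and MySQL expose the same kind of insight through EXPLAIN.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN output as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text matters here.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"
    " total_cents INTEGER NOT NULL,"
    " created_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents, created_at) VALUES (?, ?, ?)",
    [(i % 1000, i * 10, "2024-01-01") for i in range(10_000)],
)

# Query optimization: ask only for the columns you actually need.
lookup = "SELECT id, total_cents FROM orders WHERE customer_id = ?"

before = query_plan(conn, lookup, (42,))  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = query_plan(conn, lookup, (42,))   # the planner can now search via the index

print("before:", before)
print("after: ", after)
```

On a table this small the timing difference is negligible, but the plan shows the shift from scanning every row to searching only the matching ones, which is what matters once the table holds millions of rows.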

So if all of this is tried, true, and well established, why do we highlight it as an overlooked part of many applications? Because there's a long way from theory to practice. The list above covers what can be done to optimize performance, not how or when to do it. Moreover, in fast-paced software development, and especially in startups, proper planning for data is sometimes pushed aside in favor of rapid growth.

Plan Ahead for Your Data

We've mentioned several ways to optimize database performance, but what we should really focus on is planning for the data: what sort of data it will be, what format it will take, and what manipulations it will undergo. Even more fundamental than the type and usage of the data is the database itself, and how well it fits your application.

Get to Know the Database

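There are two broad families of databases these days: relational (SQL) and non-relational (NoSQL). The NoSQL family includes document stores such as MongoDB, as well as key-value, wide-column, and graph databases.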

NoSQL:

If your application handles self-contained documents with a flexible structure.
If you expect large amounts of data that will need to be distributed and sharded.
If you expect a lot of unstructured or semi-structured data.

SQL:

If your application requires rigid, well-defined schemas and relations.
If your application requires strong consistency (ACID transactions) across all of its data.
If you intend to digest columnar data using big data tools.
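
As an illustration of what "rigid, well-defined schemas and relations" buy you, here is a minimal sketch in which the relational schema itself rejects inconsistent data. SQLite (via Python's sqlite3) stands in for any relational database, and the customers and orders tables are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; PostgreSQL enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
""")

conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (customer_id, total_cents) VALUES (1, 1999)")

def try_insert(sql: str, params: tuple) -> str:
    """Run one INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    # Unknown customer: violates the foreign key.
    try_insert("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (99, 500)),
    # Negative total: violates the CHECK constraint.
    try_insert("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (1, -5)),
    # Duplicate email: violates the UNIQUE constraint.
    try_insert("INSERT INTO customers (email) VALUES (?)", ("a@example.com",)),
]
for result in results:
    print(result)
```

Each bad write is refused by the database rather than by application code, so every service that touches these tables gets the same guarantees.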

Once you've chosen the database to work with, ask yourself again what your use case is. Learn what others have experienced working with MySQL, MariaDB, PostgreSQL, and MongoDB, to name a few. Find the setbacks they faced and consider whether you might run into something similar in the future.

Get to Know the Data and its Characteristics

The way you design your data now will affect you in the future. It's a hard task, but force yourself to think about what other functionality you have in store and plan to implement, and check whether the current data schema and models allow that functionality to be integrated easily.

Functionality implies data constantly moving around and being updated. Consider the behavior of your data:

If you do a lot of writes but fewer reads, optimize for write throughput.
If you do many reads but fewer writes, optimize for read I/O and use caching.
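
For the read-heavy case, one common caching pattern is cache-aside: check the cache first, fall back to the database on a miss, then store the result. The sketch below assumes an in-process dictionary with a time-to-live standing in for a shared cache such as Redis or Memcached; TTLCache, get_user, and the users table are all hypothetical names.

```python
import sqlite3
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TTLCache:
    """A tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Any, Tuple[float, Any]] = {}

    def get(self, key: Any) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # stale: evict it and report a miss
            return None
        return value

    def set(self, key: Any, value: Any) -> None:
        self._entries[key] = (self.clock() + self.ttl, value)

db_reads = 0  # counts how often we actually hit the database

def load_user(conn: sqlite3.Connection, user_id: int) -> Optional[tuple]:
    """The expensive path: a real database read."""
    global db_reads
    db_reads += 1
    return conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()

def get_user(conn: sqlite3.Connection, cache: TTLCache, user_id: int) -> Optional[tuple]:
    """Cache-aside: try the cache, fall back to the database, then populate the cache."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    row = load_user(conn, user_id)
    if row is not None:
        cache.set(user_id, row)
    return row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

cache = TTLCache(ttl_seconds=30)
for _ in range(1000):
    get_user(conn, cache, 1)
print("database reads for 1000 lookups:", db_reads)
```

The TTL bounds how stale a cached row can get; if readers cannot tolerate that, writes must also invalidate the cached key. For the write-heavy case, the analogous first step is usually batching, for example grouping many inserts into one transaction instead of committing row by row.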

Load and Stress the Data

Data-related performance issues tend to hit you when you least expect them. The smooth performance you're used to is usually not thanks to your choice of database or data schema; it's mostly thanks to a lack of load and stress on the database. That load and stress is exactly what you should seek out.

This is another hard task, one that requires accepting that tens of thousands of requests per minute can easily become millions. It's easy to list and discuss ways to optimize data manipulation and retrieval, but no one gains experience and knowledge without trying.

If you want to know if your data is well-structured and well-retrieved:

Try to write and read more than you imagine would be possible.
When it stops working, look under the hood to find and fix the problem.
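
One way to read and write "more than you imagine" is to step up concurrency against a real query path and watch throughput and tail latency until something gives. The sketch below is a minimal harness using only Python's standard library; run_load and fake_query are hypothetical names, and dedicated tools such as pgbench (for PostgreSQL), k6, or Locust do this far more thoroughly.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def timed_call(fn: Callable[[], object]) -> float:
    """Run fn once and return its latency in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0

def run_load(fn: Callable[[], object], total_requests: int, concurrency: int) -> Dict[str, float]:
    """Issue total_requests calls to fn across `concurrency` threads and summarize latency."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = list(pool.map(lambda _: timed_call(fn), range(total_requests)))
    elapsed = time.perf_counter() - started
    # quantiles(n=100) returns 99 cut points; index k is the (k+1)th percentile.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "requests": float(len(latencies)),
        "throughput_rps": len(latencies) / elapsed,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": max(latencies),
    }

if __name__ == "__main__":
    # Hypothetical stand-in for a real query; replace with a call into your data layer.
    def fake_query() -> None:
        time.sleep(0.001)

    # Step the load up and watch where throughput flattens and p99 climbs.
    for concurrency in (1, 8, 32):
        report = run_load(fake_query, total_requests=500, concurrency=concurrency)
        print(concurrency, {k: round(v, 2) for k, v in report.items()})
```

Threads suit I/O-bound calls such as database queries, but at high rates the Python client itself can become the bottleneck, which is one reason to graduate to a dedicated load-testing tool.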

As we said earlier, everything else is just tooling. Data is the heart of the application and should be built accordingly: consistent, resilient, and scalable.

Protect Your Data

With the database chosen, the data models designed, and their usage optimized, it's time to think about protecting your data.

Protecting your data from bad actors goes without saying. What's easier to overlook is shielding it from internal actors: services and humans. Humans make mistakes, and since services are written by humans, so do services.

Consider the following means of mitigating accidental service disruption or, worse yet, data loss or corruption:

Back up the data and plan for deploying from a snapshot.
Limit access from the get-go.
Serve reads from replicas, and make sure no human ever writes directly to the data.
If a service is the owner or main user of a table or database, other services request data through internal APIs.
Monitor database CPU and memory usage, and plan ahead in case you need to scale.
Find long-running queries, kill them, and trace them back to their source to detect misbehaving services.
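
For PostgreSQL, the last item can be sketched with the built-in pg_stat_activity view and the pg_cancel_backend / pg_terminate_backend functions. The helpers below are hypothetical; they assume a DB-API connection that uses %s placeholders and context-managed cursors (psycopg2 behaves this way), and that each service sets application_name when it connects, which is what lets you trace a query back to its source. Cancelling or terminating other sessions requires appropriate privileges. MySQL offers similar information through SHOW PROCESSLIST and KILL QUERY.

```python
from typing import Any, List, Tuple

# One row per server process; application_name is set by each client on connect.
LONG_RUNNING_SQL = """
SELECT pid,
       usename,
       application_name,
       now() - query_start AS duration,
       left(query, 200)    AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND now() - query_start > %s * interval '1 second'
ORDER BY duration DESC
"""

def find_long_running(conn: Any, min_seconds: int) -> List[Tuple]:
    """Return active queries that have been running longer than min_seconds."""
    with conn.cursor() as cur:
        cur.execute(LONG_RUNNING_SQL, (min_seconds,))
        return cur.fetchall()

def cancel_query(conn: Any, pid: int, terminate: bool = False) -> bool:
    """Cancel a backend's current query; with terminate=True, close its whole connection."""
    # The function name comes from this fixed pair, never from user input.
    fn = "pg_terminate_backend" if terminate else "pg_cancel_backend"
    with conn.cursor() as cur:
        cur.execute(f"SELECT {fn}(%s)", (pid,))
        return bool(cur.fetchone()[0])
```

Cancelling is the gentler first step; terminating drops the client's connection entirely, so reserve it for sessions that ignore cancellation.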

Keep The Data Clean

It’s not enough to protect your data. You also have to keep it nice and tidy. A lot of data accumulates through the product lifecycle and more often than not becomes stale.

Modern drives are fast, reliable, and hold terabytes, but that doesn't mean you should fill them with data.

Too much data puts strain on the disk and memory, not to mention that more data means longer queries, even with indexing.

Consider the following as ways to keep your data clean:

Avoid soft deletes (flagging rows as deleted instead of actually removing them).
Find the least-accessed data and archive it.
If you're using PostgreSQL, run VACUUM (and keep autovacuum enabled). If you're using MySQL, run OPTIMIZE TABLE. Use the equivalent maintenance for any other database.
Be wary of schema changes: don't add tables that duplicate existing data.
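
Here is a minimal sketch of the archive-then-delete flow, again with SQLite standing in. It assumes a hypothetical last_accessed_at column that the application updates itself (most databases don't track per-row reads for you) and an archive table in the same database; in production the archive would more likely live in cheaper storage. SQLite's VACUUM rebuilds the database file to reclaim the freed pages, playing a role similar to the maintenance commands above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id               INTEGER PRIMARY KEY,
    payload          TEXT NOT NULL,
    last_accessed_at TEXT NOT NULL  -- ISO-8601 date maintained by the app (hypothetical)
);
CREATE TABLE events_archive (
    id               INTEGER PRIMARY KEY,
    payload          TEXT NOT NULL,
    last_accessed_at TEXT NOT NULL
);
""")
# Every fourth row is stale: last touched in 2021.
conn.executemany(
    "INSERT INTO events (payload, last_accessed_at) VALUES (?, ?)",
    [(f"event-{i}", "2021-06-01" if i % 4 == 0 else "2024-06-01") for i in range(1000)],
)
conn.commit()

def archive_stale(conn: sqlite3.Connection, cutoff: str) -> int:
    """Copy rows not accessed since `cutoff` into the archive, then hard-delete them."""
    # ISO-8601 date strings compare correctly as plain text.
    with conn:  # one transaction: both statements apply, or neither does
        conn.execute(
            "INSERT INTO events_archive SELECT * FROM events WHERE last_accessed_at < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM events WHERE last_accessed_at < ?", (cutoff,)
        ).rowcount
    # VACUUM cannot run inside a transaction, so it runs after the commit above.
    conn.execute("VACUUM")
    return moved

moved = archive_stale(conn, cutoff="2023-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM events_archive").fetchone()[0]
print(moved, remaining, archived)  # 250 750 250
```

The rows are genuinely removed from the hot table rather than flagged, which keeps its indexes and scans small while the archived data stays recoverable.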

Keep Your Data in Mind

Out of all the aspects and methods we discussed, there’s one conclusion to be drawn:

Data is the most important, most overlooked aspect of software development.

To keep your data in mind means to consider all the pros and cons of choosing a database.

To keep your data in mind means you always check how data retrieval affects performance.

To keep your data in mind is to follow these principles:

Choose the right database for the workload.
Create indices and optimize queries.
Optimize I/O through hardware adjustments and caching.
Get rid of unnecessary data – no soft deletes.
Check and check again that high volume doesn’t create bottlenecks.
Back up your data and limit access.

Strive to apply the principles listed above, because no matter what your app or product does, it almost always comes down to data.

Always keep in mind that data is the foundation. When it's well maintained, the whole system benefits.

DEV Community