Saying the above out loud is almost like publicly admitting you've got lepra today, because all the hype seems to be geared towards NoSQL and document based database systems, while RDBMS types of systems doesn't seem to get much love. However, somebody needs to say it out loud, and it might as well be me. To illustrate my point let's examine what you can do with SQL that's simply not possible with most NoSQL systems.
- You can create join statements
- You can extract meaningful statistics
- Partial record updates
- You've got referential integrity
- Etc ...
Let's go through the above features one at the time to see the power of such features. To make this as much "hands on" as possible, please do the following first please.
- Register a Hyperlambda cloudlet here
- Install the SQLite Chinook DB plugin from the "Management/Plugins" section - Screenshot below
Then go to "Tools/SQL Studio" and click the load button, for then to choose the SQL script called "chinook_aggregate_profitable_countries". Make sure you choose the "chinook" database and execute the script. The result should resemble the following.
Doing something such as the above with MongoDB, Couchbase, CosmosDB or DynamoDB could easily turn into a 6 months long software development project by itself. We did everything in less than 5 minutes, assuming you followed the hands on parts. And the result was that we now know which countries are the most profitable countries in regards to music buyers according to the Chinook database. Below is the SQL for reference purposes.
select BillingCountry, round(sum(Total),2) as Revenue
from Invoice
group by BillingCountry
order by Revenue Desc;
Statistics such as the above often depends upon combining joins with group by constructs, and aggregates. Now try to perform a join statement in your CosmosDB installation to extract similar statistics. I'll just eat my popcorn in the meantime ...
Partial record updates
OK, this is only a "partially true argument" since (some) NoSQL systems actually supports partial record updates - However, the last time I checked, CosmosDB did not support it, and unless you've got it, you'll need to lock database records somehow as you're updating them. There are 3 different techniques for locking database records, and they're all disastrous in regards to implementation details, and/or the resulting software.
- Optimistic locking
- Pessimistic locking
- Ignore locking
To start out with the worst, ignoring locking implies you are 100% destined to experience race conditions in the future. Having race conditions on your database, is a guarantee of making sure your data becomes garbage over time.
Going for optimistic locking at best explodes the complexity of your codebase. Going for pessimistic locking ensures that you'll go insane if one of your employees goes on holiday with a web form open on his computer, in addition to that it explodes the complexity of your code of course. I wrote about database locks a year ago at DZone in case you're interested.
With partial record updates, the problem simply vanishes more or less. To understand why, read the above article for details. Some NoSQL database systems do support partial record updates, such as CouchBase if I remember correctly, but most struggles with these kind of constructs.
Referential integrity
This one is the big one (pun!). Most NoSQL database systems poses restrictions upon you in regards to the size of your documents. I think the maximum size in Cosmos is 2MB. The mantra for NoSQL believers of course is as follows ...
You don't need referential integrity because you can just store everything in one document
"Good luck with that when the maximum document size is 2MB" - Is my answer to these people :D
Maybe this is why Twitter only allows you to use 140 characters ...? ;)
Besides, storing everything in one document implies zero normalisation, making it almost impossible to update stuff, where multiple documents are having the same value, and that value needs to be changed over time.
Things you can't do with NoSQL
Below is a list of SQL statements that are more or less impossible to execute in NoSQL database systems.
Counting records from artists
select ar.Name, count(*) as count
from Album al, Artist ar where al.ArtistId = ar.ArtistId
group by al.ArtistId
order by count desc
limit 25
The above counts how many records all artists have released.
Calculate audio format profitability
select mt.Name, count(mt.MediaTypeId) as Amount
from MediaType mt
inner join Track t on mt.MediaTypeId = t.MediaTypeId
group by mt.MediaTypeId
order by Amount desc;
The above calculates how much profit was made from each audio format.
Get info about genre consumers
select distinct c.Email, c.FirstName, c.LastName, g.name
from Customer c
inner join Invoice i on c.CustomerId = i.CustomerId
inner join InvoiceLine ii on i.InvoiceId = ii.InvoiceId
inner join Track t ON ii.TrackId = t.TrackId
inner join Genre g ON t.GenreId = g.GenreId
where g.Name = "Rock"
order by c.Email
The above finds the names and emails of all customers that are listening to rock music.
Count songs by artists
select ar.Name AS ArtistName, count(t.TrackId) as TrackCount
from Track t
inner join Album al on t.AlbumId = al.AlbumId
inner join Artist ar on al.ArtistId = ar.ArtistId
inner join Genre g on t.GenreId = g.GenreId
where g.GenreId = 1
group by al.ArtistId
order by TrackCount desc
limit 10
The above counts how many songs each band has created, and orders such that the most productive bands are listed first.
Conclusion
Every now and then you really need a NoSQL database system. There are problem domains where document based databases are better. However, 99.99999999% of the time a relational SQL based database is simply a bajillion times better. My rule of thumb is as follows ...
Am I to create Twitter, Google or Facebook? If the answer is no, I'm using SQL ... ;)
Top comments (29)
Rick Houlihan (then at AWS):
"One of the things I hear a lot: ‘Use NoSQL because it’s very flexible’. I’ve done a thousand NoSQL applications, I can tell you nothing could be further from the truth.
NoSQL is not a flexible database, it’s an efficient database. The data model is very much not flexible [because it's tightly coupled to the access patterns of the application]."
AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
The funniest I've heard is "we don't need a schema", which of course ends up having to enforce a schema in the application code instead ... :/
Believing you can just "dump" your documents into a NoSQL database is a very rude awakening once you realise what you've done I suspect ... :/
Honestly, there is a right tool for the job for every problem/solution set. You do bring up some valid points here and there which would be factored in on deciding which to choose given the problem at hand. However, a few I disagree on one that stood out "Besides, storing everything in one document implies zero normalisation, making it almost impossible to update stuff, where multiple documents are having the same value, and that value needs to be changed over time." You are living in a Object oriented world, have you ever had a project where requirements or the sow changes? Reorganizing and restructing a sql table imo is far more involved typically (especially legacy ones.) Wheras just popping in a new property in a document requires no overhead. Not to mention, with nosql a team can build an entire frontend application safely knowing all their objects will eventually 'fit' into your db. Vs sql decoupled development becomes challenging imo. Finally, the thing sql had that nosql doesn't is 'technology' lockin today you may be using mongodb tomorrow firestore, the next day? A git based file system ..it really doesn't matter you can hold a json style object anywhere and everywhere.
Great points, and I agree with most of them. However the idea that you can “dump stuff into your db”, is fundamentally wrong, because you’ve got code. You’re just ending up with an “implicit schema” instead of an “explicit schema”. As to me living in OOP land, read older articles in this series 😉
Not to mention, and don't want to generalize because it's not 'always' true just normally true. Deploying multi Tennant or isolated data architecture is a significantly more involved process when dealing with your traditional sql db vs nosql.
You’re preaching for the choir mate. We’ve got a multi location K8s cluster, basically the definition of multi tenant, with hundreds of separate clients, running in the same cluster, oblivious of each other’s existence … 😉
Disclaimer: my point of view will only concern MongoDB. I'm not aware of all of the specificities of other solutions.
I worked exclusively with MongoDB for the last two years and it's a real pain. As you said 99.99% of the time you don't need MongoDB unless you need a big performance boost for specific microservices.
SQL has been proved to be well designed for several decades with a ton of good features and it will always surpass NoSQL in terms of flexibility.
Yes, MongoDB is simple and fast, but it is fast not because it is "well written" but because it's not baked with ton of features like schema validation, transactions, foreign keys, etc.. (but it seems that last versions of MongoDB begin to integrate those concepts.. which is really weird to me). Since those features aren't in MongoDB you NEED to write a whole bunch of defensive code and you need to write it WELL. The task is not easy.
Moreover, as pointed out in this post, there are plenty of queries you can't do in one call. Even with aggregation. And often you'll need to write JS to sort, filter or transform your data in the way you need it.
Unless you don't need schemas and data coherence (like stocking logs or really simple data), my best advice is : don't use it.
I only have one comment for you; Faith in humanity restored 😉
I worked with CosmosDB for 6 months. If one of my team members as much as mumbled “let’s use Cosmos” today, I’d throw the person out the window!
yes u can count records in nosql and u can do lots of other things too but it is much more of a pain in the butt than just using sql
first fetch a json file (which is already a nosql format)
then do this
var len = Object.keys(myJsonFile).length;
console.log(len);
Word!
Awesome, then we agree 100% - My problem is that I see a lot of people choosing NoSQL because it's "cool" ...
You cannot do this, not even in theory, with a db that doesn't have a strongly typed schema. You could of course just return "whatever you've got" as documents in your db to the caller, but that would potentially risk returning stuff such as credit card information, and other sensitive data to unauthorised callers ...
As to CRUD an NoSQL; Yet again, you cannot do this, not even in theory. Generating a CRUD API by the very definition of the term, implies having access to meta data, unless you want a CRUD API that says; "Create, Read, Update and Delete 'garbage'".
Meta data in these regards translates to "a schema". NoSQL databases do not have a schema, and in fact this is a large part of their unique selling point.
To understand the dilemma, check this app, and ask yourselves; "How can I implement this CRUD app when I have no schema?"
The database schema is the formal specification that the above app was generated around. No schema, no meta data - No meta data, no specification - No specification, no app ... :/
Of course, if you've got a NoSQL database with a schema, the argument becomes different ...
However, the last time I checked, neither Cosmos, Mongo, CouchBase or Dynamo had schemas ...
I think that SQL or NOSQL are both good, it depends the project and the scope of the solution, for example if you have a changing schema (I worked with a lot of these use cases) NOSQL databases are the best option because it gives you the flexibility you need. But if you have transactional processes, referencial and structured data use a SQL database. With a good design both can be very powerful tools :)
Very good post bro! I really enjoyed reading it
Thank you :)
The right tool for the job comes to mind ... ;)
JSON column type with indexes and ->> function in mySQL look like end of discussion what is better
Hehe, well, it's an argument ^_^
A good argument too ...
Two things I didn't see you bring up that are key to the conversation:
Scaling and cost.
I would love to see the comparisons when including these two items. Scaling is done with either sql or nosql. What's the difference in complexity and cost for equivalent systems?
What happens if you need to scale that system up 100x for data and traffic?
Which one has overall lower total cost of ownership and which one is will require fewer total hours of developer, administrator, and data scientist time?
I kind of did indirectly bring it up, in that I said if you're about to "implement Twitter" you should probably choose NoSQL. However, if you've got data here I would love for you to put it on the table. Honestly, I don't know the answer to these questions, besides of that I know that if you're doing anything vaguely more complex than "dumping documents into your NoSQL database", then (obviously) the complexity of your middle layer will become much larger, making it more difficult to maintain, with higher costs in regards to human resources. Something echoed by one of the other commenters here where he explained some of his pains related to Mongo ...
Yet again, if you need document database systems, you need document database systems, and you're better choosing such systems. The article was written to convince others about that most of the time you do not need document database systems ...
Document based database systems was invented to solve a very narrow problem domain. The hype surrounding them today though ensures that they're used for everything, also domains where they're extremely inferior to RDBMS ...
Those are all very good points. What I run into is slowing down of sql queries due to the amount of data. Efficiency is the key there. I look at it as if you aren't expecting a lot of data, or don't think the amount of data will grow, sql is the best bet. If you don't know the upper limit to the data you're going to have then nosql is a better choice to me.
Oh, and if you're data structures and access patterns are going to change a lot then sql is probably a better choice.
Am example from me is using RDS vs DynamoDB on AWS. The same data and access is around $1k a month for RDS but only about $35 a month for DynamoDB. And DynamoDB seems to be faster at it as we can massively parallelize the data queries. It just works better for our use case. Also, if we do need the power of sql we can just export to athena and get the information we want for only a couple dollars.
I've seen systems that show to a crawl when using SQL because they get too much data for them to handle with how they have set up the system.
Yes you can easily scale sql if you know what you're doing, but I think it's rarely inexpensive.
I do think it's a right tool for the right job sort of thing.
You can scale RDB systems, but yes it's obviously much harder than to scale a HA replicated database system. I have seen RDB databases with billions of records, but I don't recommend it. Yes, data size is the key axiom here that defines if you need consistency or availability. However, very few databases needs data sizes that justifies NoSQL and availability, at least in my space. My article was just an attempt at pinpointing that fact. As to ...
Word! That was kind of like my punchline here ...
I think you can kinda do the join statement with $lookup in MongoDB right? Or is there some limitation to $lookup I don't know about:
mongodb.com/docs/manual/reference/...
That might be true, I wouldn’t know. However, the whole idea (and unique selling point) about NoSQL is that there’s no consistency (CAP theorem). Without consistency your “joined record(s)” might not even yet having been persisted and available for the cluster as a whole.
NoSQL advocates will object and say; “NoSQL has EVENTUAL consistency”, which is true. However, eventual consistency is equivalent to having your database in a never ending non stop state of “some of my stuff is garbage”
If you’ve got lots of updates, inserts and deletes, you can never trust your database without immediate consistency, ref ACID 😉
Try create a user in RavenDB, then login with that user immediately on your next line of code to see the problem (Hint; Your user doesn’t exist in the database at that point)
Sacrificing consistency for availability is like implementing 100+ threads. Never, ever, ever, ever do this. Unless you absolutely need it is my advice …
"Try create a user in RavenDB, then login with that user immediately on your next line of code to see the problem (Hint; Your user doesn’t exist in the database at that point)" the same issue doesn't occur in a sql db why? If you need to create a user in any style db, there's steps that must be taken to fill out the appropriate data no matter what.
No, you’re wrong. Google the CAP theorem. It explains the mechanics of the “Raven DB” problem. I could explain it to you, but it would be copy/paste of consistency versus availability concepts from CAP …
Have you heard of yugabyteDB? It makes this debate completely nonsense in 2022.
It's an "interesting" database yes. I haven't studied it, but it's a nice piece of tech ...
Distributed ACID transactions, NoSQL and SQL APIs, extreme performance, and it was born in Meta / Facebook... these are some reasons to try it seriously.