DynamoDB is the service that powers the Amazon.com e-commerce store, which many of us shop on almost daily.
Thanks to DynamoDB’s architecture, the e-commerce platform can handle millions of requests to read and write data per second. That’s nothing short of impressive.
However, Amazon is not the only one that can take advantage of this architecture. With the right knowledge, you can use DynamoDB yourself to create a database for your applications that allows them to scale virtually infinitely. In this article, I’ll show you exactly how, with practical examples.
Subscribe to my newsletter here
Understanding How DynamoDB Works
In one of my previous articles, I wrote extensively about how DynamoDB works under the hood to achieve massive scale. You can check it out here:
https://medium.com/@atomicsdigital/how-focusing-on-user-needs-helped-scale-the-biggest-e-commerce-store-in-the-world-dde8bb9deffc
To summarize, DynamoDB maintains a fleet of storage nodes that AWS provisions for you on demand. Each item you write to DynamoDB is hashed and stored in a partition on one of these storage nodes, and replicated to two other nodes as a fail-safe.
DynamoDB is a key-value store, which means each item or record is essentially an object (the value) identified by a key. To retrieve an item and its attributes, you specify the primary key, which consists of a partition key (similar to a primary key in an RDBMS) and, optionally, a sort key (used to order items within a partition); together they uniquely identify a record in your table.
Each item in your table is stored inside a partition based on its partition key. So if you have 10 items with the partition key “user#101”, they will all be stored on the same partition, while items with a different partition key, such as “user#102”, are hashed independently and may land on another partition, and so on.
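To make this concrete, here’s a minimal sketch using boto3, the AWS SDK for Python. The table name “app-table” and the attribute names “pk” and “sk” are placeholders for this example, not anything DynamoDB requires:

```python
# A minimal sketch with boto3; assumes a hypothetical table "app-table"
# whose primary key is a partition key "pk" plus a sort key "sk".
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("app-table")

# Items sharing the partition key "user#101" land on the same partition,
# stored sorted by their sort key.
table.put_item(Item={"pk": "user#101", "sk": "profile", "name": "John"})
table.put_item(Item={"pk": "user#101", "sk": "order#2024-01-15", "total": 42})

# Retrieving a single item requires the full primary key: partition key
# plus sort key together uniquely identify the record.
response = table.get_item(Key={"pk": "user#101", "sk": "profile"})
print(response.get("Item"))
```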
In the section below we’ll see how this architecture makes reads to your table super efficient.
Why DynamoDB can scale infinitely
As your writes grow, DynamoDB will keep adding more storage nodes automatically for you, sharding your partitions, and managing the nodes themselves in case of failure.
Beyond being a hyper-managed service, DynamoDB’s on-demand provisioning of storage nodes lets you spread your data across an effectively unlimited number of partitions for lightning-quick retrieval.
So as more users write data to your database, the data is, ideally, partitioned well across your storage nodes, making subsequent reads maximally efficient.
To read one item out of, say, a billion items stored in your database, DynamoDB simply needs to find the correct partition in which the requested item resides, which it does in constant time, or O(1), instead of querying your entire dataset of billions of items (which would get slower and slower as you scale).
So regardless of the size of your database, DynamoDB can retrieve your item(s) from among billions just as fast as if the table held only a few, without doing any table scans or index lookups. Pretty mind-blowing.
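Here’s a deliberately simplified model, not DynamoDB’s actual implementation, of how hashing the partition key routes a request straight to one partition in constant time:

```python
# Toy model: hash the partition key to pick one partition in O(1),
# no matter how many items the table holds overall.
import hashlib

NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]  # each dict is one "partition"

def partition_for(partition_key: str) -> dict:
    # Hash the key and map it to a partition in constant time.
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return partitions[int(digest, 16) % NUM_PARTITIONS]

def put(partition_key: str, sort_key: str, value: dict) -> None:
    partition_for(partition_key)[(partition_key, sort_key)] = value

def get(partition_key: str, sort_key: str) -> dict:
    # One hash plus one lookup: the other partitions are never touched.
    return partition_for(partition_key)[(partition_key, sort_key)]

put("user#101", "profile", {"name": "John"})
print(get("user#101", "profile"))  # {'name': 'John'}
```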
The data within a particular partition can then also be retrieved very quickly because it is structured as a B-tree. A B-tree can be thought of as a dictionary: to find the item whose key is “John”, you jump to the “J” section and narrow down your search, just as you would when consulting a dictionary.
Therefore, no matter the size of the data inside that partition, the lookup complexity is always O(log n), where n is the number of items inside that partition. Still a very fast lookup.
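As a toy illustration of that dictionary analogy, here’s a binary search over a list of sorted sort keys using Python’s bisect module; each step halves the search range, which is exactly where the O(log n) comes from:

```python
# Items inside a partition are kept sorted by sort key, so a lookup
# can be a binary search: O(log n).
import bisect

sort_keys = ["order#001", "order#002", "order#007", "profile", "settings"]

def find(sort_key: str) -> int:
    # bisect_left halves the search range at each step, like flipping
    # to the right section of a dictionary.
    i = bisect.bisect_left(sort_keys, sort_key)
    if i < len(sort_keys) and sort_keys[i] == sort_key:
        return i
    raise KeyError(sort_key)

print(find("profile"))  # 3
```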
Aside from fast querying, DynamoDB also offers usage-based pricing instead of a constant cost. You pay based on RCUs and WCUs, which stand for read and write capacity units. In short, one RCU covers a strongly consistent read of an item up to 4 KB per second, and one WCU covers a write of an item up to 1 KB per second, with each provisioned unit costing on the order of a few cents per month. The more capacity units you provision on your table, the more you pay.
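To see how the units add up, here’s some back-of-the-envelope math; the workload numbers are made up for illustration, and you should check AWS’s pricing page for actual costs:

```python
# Capacity math based on the unit sizes above: item sizes round up
# to the next 4 KB (reads) or 1 KB (writes).
import math

item_size_kb = 8          # size of each item we read/write
reads_per_second = 100    # strongly consistent reads
writes_per_second = 50

rcus = reads_per_second * math.ceil(item_size_kb / 4)   # 100 * 2 = 200 RCUs
wcus = writes_per_second * math.ceil(item_size_kb / 1)  # 50 * 8 = 400 WCUs
print(rcus, wcus)
```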
However, you can instead enable on-demand capacity mode and let DynamoDB scale your reads and writes per second up and down automatically, including during low-traffic periods. This elasticity lets you optimize costs in a way few other databases offer.
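Switching an existing table to on-demand mode is a one-liner with boto3; “my-table” is a placeholder name:

```python
# Switch an existing table from provisioned capacity to on-demand.
import boto3

client = boto3.client("dynamodb")
client.update_table(
    TableName="my-table",
    BillingMode="PAY_PER_REQUEST",  # on-demand: pay per request, nothing to provision
)
```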
How to design your database for infinite scalability
Here we will get more technical and practical about how to design your database for infinite scalability.
The most important element is how you choose your partition and sort keys. I go more in-depth on this point in a separate article, but in short, choosing high-cardinality primary keys allows for efficient reads at any scale.
Other design patterns for high scalability include single-table design, designing around your data access patterns, overloading keys and indexes, and sparse indexes. All of these patterns and techniques are discussed in great detail in a follow-up article.
Finally, the most important thing to keep in mind is to group related data under a particular partition key while keeping the composite key (partition plus sort key) at high cardinality, meaning as unique as possible.
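Here’s a sketch of that grouping idea in action, reusing the hypothetical “app-table” with “pk”/“sk” keys from earlier. A single Query call fetches all of a user’s orders at once, because they live under one partition key and their sort keys distinguish them:

```python
# Query everything under one partition key, filtered by sort-key prefix.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-table")

response = table.query(
    KeyConditionExpression=Key("pk").eq("user#101") & Key("sk").begins_with("order#")
)
for item in response["Items"]:
    print(item)
```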
Conclusion
In this article, we went over the reasons why Amazon.com is able to scale so immensely using DynamoDB.
We understood how DynamoDB works under the hood: how it routes requests to the database, how it writes data to your tables, and how subsequent reads are so efficient thanks to automatic sharding of your data into partitions, which are themselves stored on storage nodes that DynamoDB provisions and fully manages for you.
We understood that data inside a partition is accessed like a dictionary, enabling very fast queries.
Finally, we got a glimpse of some powerful techniques that we can apply ourselves to achieve infinite scaling.
If you enjoyed this article, please consider subscribing to my newsletter for more.
👋 My name is Uriel Bitton and I’m committed to helping you master AWS, Cloud Computing ☁️, and Serverless development. ✨
See you in the next one!