To start the year strong I wanted to discuss and illustrate some essential concepts of solid DynamoDB data modeling.
These concepts will easily level up your understanding of DynamoDB and allow you to build more performant and scalable databases than ever before.
Single Table Design & Key Overloading
The single table design in key-value databases aims to solve the issue of querying multiple entities to fetch related data and to reduce your table costs to one table’s minimal cost.
While setting on-demand mode in DynamoDB will close the gap on the costs between a single and multiple tables, the big win on latency and number of requests remain.
With the single table design, we store all related data entities under one table. We do this so that when we need to query related data, we can use a single query rather than sequential queries across data entities.
So how does this work in practice?
Well, with the single table design, we can model the data for multiple data entities and overload our primary keys to store these entities together.
Here’s an example.
Imagine a university database that needs to store information about campuses, buildings, classrooms, and courses.
Our database entities are:
- Campuses
- Buildings
- Classrooms
- Courses
All of these entities are related in some way so we can store them together using the single table design.
Let’s see how we can design this database with a hierarchical data model.
Hierarchical Data Modeling
Using a hierarchical data model allows us to represent these entities and filter through them efficiently.
Here’s an example:
Say we wish to store records of courses, we can model the items like so:
The sort key (sk) holds the information of a course in a hierarchical structure.
We store the building number, the room number and the course number together, separated by hashes.
Storing the items in this way is what is meant by a hierarchical data model — each entity is a child of the one before it.
This lets us perform some interesting queries on the university’s courses.
If we want to get all courses in campus-A, we can write the following query:
pk = campus-A AND begins_with(sk, "building#")
This will return all items in the campus-A partition, whose “sk” values start with the string “building#” — that’s the first two items in our database above (and essentially every item in the campus-A partition).
We can filter further and query “all courses in building 001”:
pk = campus-A AND begins_with(sk, "building#001")
That would return the first item only (and any other courses that might be in building#001).
Finally, we can also query all courses that are being taught in room B for example:
pk = campus-A AND begins_with(sk, "building#001#room#B")
The idea of using multiple entities within the sort key, as we did above, is referred to as key overloading.
Key overloading also introduces different query access patterns across multiple partitions — I explain this concept in this article.
Remember, this data model we chose fits the needs of the data as it is being stored.
If we had a different method of storing and representing these entities we may alter our data model and hierarchical structure to better fit our needs.
Date Range Filtering
If we have use cases for complex date filtering, we can accomplish this using some clever sort key modeling along with the query methods DynamoDB provides.
Imagine our university database now needs to store information about the dates exams are taken. The university also needs to query specific date ranges by year, month, and even by week and specific days.
Let’s see how we can achieve this with our data model.
Here is an extract of some exam items in the database:
We have a list of exams being given across 2 different months.
For each exam item, we prefix the sort key with the date it is given on and suffix it with the exam number.
Now, say the university wants to get all the exams from a given course, throughout the year.
We can write the following query to satisfy this access pattern:
pk = course#101 AND sk BETWEEN "exam-date#2025-01-01#" AND "exam-date#2025-12-31#"`
This query gets all items with the exam-date prefix of the first day of the year until the last day of the year.
We can also further specify our query by month, like so:
`pk = course#101 AND sk BETWEEN "exam-date#2025-01-01#" AND "exam-date#2025-01-31#"
We can also narrow down our filter to exams given on specific days. For that we’d just need to add the time as well to the date string of all sort keys (I removed these for brevity).
pk = course#101 AND sk BETWEEN "exam-date#2025-01-01-00:00#" AND "exam-date#2025-12-31-23:59"
We now have a data model that can accept efficient filtering with simple ways to query this data at scale.
Conclusion
We’ve seen some interesting and advanced data model examples of some real-world scenarios and how to most efficiently design them.
Using the single table design, key overloading and hierarchical data modeling lets us accomplish powerful filtering techniques on our database.
👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.
🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter Excelling With DynamoDB.
Thanks for reading and see you in the next one!
Top comments (0)