DEV Community

Barbara


Non-relational data models

What

NoSQL databases are typically distributed databases. To provide high availability, they keep copies (replicas) of the data on several nodes. After a write, replicas in different locations can briefly hold different values, often only for milliseconds, until the change has propagated. This behaviour is called eventual consistency.
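To make eventual consistency concrete, here is a minimal sketch in Python. It is a toy model, not how any real database replicates data: the `ToyCluster` class, the replica names and the explicit `propagate()` step are all assumptions made for illustration.

```python
class ToyCluster:
    """Hypothetical cluster: a write lands on one replica first, the others catch up later."""

    def __init__(self, names):
        # One key-value map per replica.
        self.replicas = {name: {} for name in names}
        self.first = names[0]
        self.pending = []  # replication messages that have not been delivered yet

    def write(self, key, value):
        # The first replica stores the value immediately ...
        self.replicas[self.first][key] = value
        # ... the other replicas only receive it when propagate() runs.
        for name in self.replicas:
            if name != self.first:
                self.pending.append((name, key, value))

    def read(self, name, key):
        return self.replicas[name].get(key)

    def propagate(self):
        # Deliver every outstanding replication message.
        for name, key, value in self.pending:
            self.replicas[name][key] = value
        self.pending.clear()


cluster = ToyCluster(["eu", "us", "asia"])
cluster.write("album", "Let It Be")
stale = cluster.read("us", "album")   # "us" has not seen the write yet
cluster.propagate()
fresh = cluster.read("us", "album")   # now every replica agrees
print(stale, fresh)
```

Between the write and the propagation, a read from the "us" replica still returns the old state. That short window is the inconsistency described above.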

When

  • need to handle different data configurations and store different data types
  • need high availability: data is replicated across nodes, so there is no single point of failure, as there is with a single-server relational database
  • have a large amount of data: NoSQL databases can scale horizontally by adding nodes
  • need linear scalability
  • need fast writes and reads
  • need a flexible schema

Common types

  • Wide column store
  • Document store
  • Key-value store
  • Graph DBMS

CAP Theorem

It is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees.

Consistency

Every read from the database gets the latest piece of data or an error.

Availability

Every request receives a response, without a guarantee that the data reflects the latest update.

Partition tolerance

The system continues to work even when network connectivity between nodes is lost.

Consistency means different things in CAP and in ACID. In the CAP theorem, it refers to every read from the database getting the latest piece of data or an error.
In the ACID principle, it refers to the requirement that only transactions that abide by constraints and database rules are written into the database (the data stays correct across rows and tables).

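Network partitions cannot be ruled out in a distributed system, so in practice the choice is between consistency and availability while a partition lasts. For production systems that must stay reachable, supporting availability and partition tolerance makes sense.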
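The trade-off can be sketched with two nodes in Python. This is a simplified illustration only: the `Node` class, the `read` function and the "CP"/"AP" mode labels are hypothetical names, and real databases handle partitions in far more nuanced ways.

```python
class Node:
    def __init__(self, value):
        self.value = value


def read(node, peer, partitioned, mode):
    """Read from `node`, which may be cut off from `peer` by a network partition."""
    if not partitioned:
        # No partition: sync with the peer first, then answer with the latest value.
        node.value = peer.value
        return node.value
    if mode == "CP":
        # Consistency first: return an error rather than a possibly stale value.
        raise RuntimeError("cannot confirm the latest value during a partition")
    # Availability first ("AP"): answer with the local value, which may be stale.
    return node.value


fresh = Node("v2")   # this node already received the newest write
stale = Node("v1")   # this node missed it because of the partition

ap_answer = read(stale, fresh, partitioned=True, mode="AP")
try:
    cp_answer = read(stale, fresh, partitioned=True, mode="CP")
except RuntimeError:
    cp_answer = "error"
healed_answer = read(stale, fresh, partitioned=False, mode="CP")
print(ap_answer, cp_answer, healed_answer)
```

During the partition the "AP" read answers with the old value, the "CP" read fails, and once the partition heals both policies return the latest value.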

Data Modeling with Apache Cassandra

Apache Cassandra offers high availability at the potential cost of consistency.

  • it is optimised for fast reads and writes
  • think about your queries first: there are no JOINs in Apache Cassandra, so each table has to contain everything a query needs. To achieve this, the data must be denormalised.
  • one table per query is a good strategy
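The "one table per query" idea can be sketched with plain Python dictionaries. This is only an analogy under assumptions: the sample albums and the names `albums_by_year` and `albums_by_artist` are made up for illustration, and a real Cassandra table is of course not a dictionary.

```python
from collections import defaultdict

# Sample data, for illustration only.
albums = [
    {"year": 1970, "artist_name": "The Beatles", "album_name": "Let It Be"},
    {"year": 1965, "artist_name": "The Beatles", "album_name": "Rubber Soul"},
    {"year": 1965, "artist_name": "The Who", "album_name": "My Generation"},
]

# One "table" per query: the same rows are stored twice (denormalised),
# each copy organised around the question it has to answer.
albums_by_year = defaultdict(list)      # query: all albums of a given year
albums_by_artist = defaultdict(list)    # query: all albums of a given artist

for album in albums:
    albums_by_year[album["year"]].append(album)
    albums_by_artist[album["artist_name"]].append(album)

in_1965 = [a["album_name"] for a in albums_by_year[1965]]
by_beatles = [a["album_name"] for a in albums_by_artist["The Beatles"]]
print(in_1965, by_beatles)
```

Each lookup is a single key access with no join, at the price of writing every album into both structures.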

CQL - Cassandra Query Language

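It is similar to SQL, but JOINs and subqueries are not supported, and GROUP BY is only available in a restricted form (on partition key and clustering columns, in newer Cassandra versions).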

Primary key

PRIMARY KEY (year)
PRIMARY KEY (year, artist_name)
  • must be unique for every row
  • can consist of the partition key only, or can also include additional clustering columns
  • the partition key determines how the data is distributed across the nodes of the system
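How a partition key can decide where data lives is sketched below in Python. This is a deliberately simplified stand-in: Cassandra uses a partitioner (Murmur3 by default) and a token ring, whereas this sketch uses an MD5 hash and a modulo. The node names and the `node_for` function are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(partition_key):
    """Map a partition key to a node with a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# Every row with the same partition key (here: year) ends up on the same node.
placement = {year: node_for(year) for year in (1965, 1970, 1985)}
for year, node in placement.items():
    print(year, "->", node)
```

Because the hash is deterministic, a later read for the same year can be routed to the same node without asking the others.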

Apache Cassandra does not allow two rows with the same primary key: a second write with the same key overwrites the first one. That's why we might need to combine several columns in the primary key to make a composite key.
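A small Python sketch shows why the choice of primary key matters. The dictionary-based `insert` helper and the sample rows are assumptions for illustration; they only mimic the "same key replaces the earlier row" behaviour.

```python
def insert(table, primary_key_columns, row):
    """Store `row` under its primary key; an existing row with that key is replaced."""
    key = tuple(row[column] for column in primary_key_columns)
    table[key] = row


rows = [
    {"year": 1965, "artist_name": "The Beatles", "album_name": "Rubber Soul"},
    {"year": 1965, "artist_name": "The Who", "album_name": "My Generation"},
]

# PRIMARY KEY (year): both rows share the key 1965, so the second replaces the first.
by_year = {}
for row in rows:
    insert(by_year, ("year",), row)

# PRIMARY KEY (year, artist_name): the composite key keeps both rows apart.
by_year_and_artist = {}
for row in rows:
    insert(by_year_and_artist, ("year", "artist_name"), row)

print(len(by_year), len(by_year_and_artist))
```

With only `year` as the key, one album silently disappears. Adding `artist_name` to the key makes both rows unique.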

Clustering columns

PRIMARY KEY ((year), artist_name, album_name)
  • clustering columns sort the rows within a partition, in ascending order by default, following the order in which the columns appear in the primary key
  • zero, one or more clustering columns can be added
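The effect of clustering columns can be imitated in Python by grouping rows per partition and sorting each group. This is a toy model under assumptions: `build_table` and the sample rows are hypothetical, and plain Python sorting only approximates Cassandra's on-disk ordering.

```python
from collections import defaultdict


def build_table(rows, partition_key, clustering_columns):
    """Group rows by partition key, then sort each partition by the clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition in partitions.values():
        # Ascending order, first column first, like PRIMARY KEY ((year), artist_name, album_name).
        partition.sort(key=lambda r: tuple(r[c] for c in clustering_columns))
    return partitions


rows = [
    {"year": 1965, "artist_name": "The Who", "album_name": "My Generation"},
    {"year": 1965, "artist_name": "The Beatles", "album_name": "Rubber Soul"},
    {"year": 1965, "artist_name": "The Beatles", "album_name": "Help!"},
]

table = build_table(rows, "year", ["artist_name", "album_name"])
ordered = [(r["artist_name"], r["album_name"]) for r in table[1965]]
print(ordered)
```

Rows are ordered by `artist_name` first and by `album_name` only within the same artist, which matches the order of the columns in the key.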

WHERE clause

  • data modeling in Apache Cassandra is query focused, that's why we need the WHERE clause
  • the partition key must be included in the query, and clustering columns can be used in the order in which they appear in the primary key
  • from the partition key in the WHERE clause, Cassandra knows which node holds the data and can fetch it from there and serve it back
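The WHERE rules above can be written down as a small check in Python. This is a simplified sketch, assuming equality restrictions only and ignoring secondary indexes and ALLOW FILTERING; `check_where` is a hypothetical helper, not part of any Cassandra driver.

```python
def check_where(partition_key, clustering_columns, restricted):
    """Return True if the restricted columns follow the partition key / clustering order rule."""
    restricted = set(restricted)
    # Rule 1: every partition key column must be restricted.
    if not set(partition_key) <= restricted:
        return False
    remaining = restricted - set(partition_key)
    # Rule 2: clustering columns must be used in key order, without skipping one.
    used = 0
    while used < len(clustering_columns) and clustering_columns[used] in remaining:
        used += 1
    return remaining == set(clustering_columns[:used])


# For PRIMARY KEY ((year), artist_name, album_name):
pk, cc = ["year"], ["artist_name", "album_name"]
print(check_where(pk, cc, ["year"]))                  # partition key only
print(check_where(pk, cc, ["year", "artist_name"]))   # first clustering column
print(check_where(pk, cc, ["year", "album_name"]))    # skips artist_name
print(check_where(pk, cc, ["artist_name"]))           # partition key missing
```

The first two calls describe queries that follow the rule, the last two describe queries that break it.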

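SELECT *

Selecting without restricting the partition key, for example by filtering only on a non-key column, can be done with ALLOW FILTERING but should be avoided. Without the partition key Cassandra has to scan across nodes, so the query will be very slow or simply fail when the table holds terabytes of data.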
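The cost difference can be illustrated by counting how many rows each kind of query has to look at. This is a toy comparison in Python with made-up sample data; `query_by_partition` and `query_with_filtering` are hypothetical helpers, and a real cluster would scan across nodes rather than loop over a dictionary.

```python
# Partition key (year) -> rows stored in that partition. Sample data only.
partitions = {
    1965: [
        {"artist_name": "The Beatles", "album_name": "Rubber Soul"},
        {"artist_name": "The Who", "album_name": "My Generation"},
    ],
    1970: [
        {"artist_name": "The Beatles", "album_name": "Let It Be"},
    ],
}


def query_by_partition(partitions, year):
    """Go straight to one partition; only its rows are examined."""
    rows = partitions.get(year, [])
    return rows, len(rows)


def query_with_filtering(partitions, column, value):
    """No partition key: every row in every partition has to be examined."""
    examined, matches = 0, []
    for rows in partitions.values():
        for row in rows:
            examined += 1
            if row[column] == value:
                matches.append(row)
    return matches, examined


_, examined_direct = query_by_partition(partitions, 1970)
found, examined_scan = query_with_filtering(partitions, "artist_name", "The Beatles")
print(examined_direct, examined_scan, len(found))
```

With three rows the difference is trivial, but the scan grows with the size of the whole table while the partition lookup only grows with the size of one partition.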

Summary: Sketchnote

To have all the above information in one view, I made a sketchnote.

Sketchnote non relational data models

If you need a higher resolution please use this page

more links:
Data Modeling with noSQL
Composite Partition Keys
Why not to use SELECT *
