Tanisk Annpurna

Scaling Databases (Sharding)

💜 DAY4 -> Scaling Databases (Sharding) 💜

In my previous blogs, we talked about replicating databases. In this one, we will talk about sharding.

👉 What is Sharding?

  • Sharding is simply dividing a DB into mutually exclusive parts, so that each piece of data lives in exactly one part.

  • In simple words, you divide the data into smaller chunks called shards. As a dataset grows huge, performance drops, but operations on a smaller dataset are not affected as much.

*Real-life example:*

  • Let's say we have a huge list of names stored in a db.
  • We can divide the db into 3 shards: shard1 stores names starting with 'A' to 'J', shard2 -> 'K' to 'T' and shard3 -> 'U' to 'Z'.
  • When a request comes in to store a new name, we simply check its first character and route it to the matching shard.
  • When a GET request comes in, the same check tells us which shard holds the data.
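
The routing idea above can be sketched in a few lines of Python. This is only a minimal illustration: the three shards are plain in-memory lists standing in for real database nodes, and the names `SHARD_RANGES`, `shard_for`, `store_name` and `has_name` are made up for this example.

```python
# Hypothetical stand-in for three shards. A real system would hold
# connections to three database nodes instead of Python lists.
SHARD_RANGES = [
    ("shard1", "A", "J"),
    ("shard2", "K", "T"),
    ("shard3", "U", "Z"),
]

shards = {name: [] for name, _, _ in SHARD_RANGES}


def shard_for(name: str) -> str:
    """Return the shard whose letter range covers the first character of name."""
    if not name:
        raise ValueError("name must not be empty")
    first = name[0].upper()
    for shard, low, high in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers names starting with {first!r}")


def store_name(name: str) -> str:
    """Store the name on its shard and return which shard was used."""
    shard = shard_for(name)
    shards[shard].append(name)
    return shard


def has_name(name: str) -> bool:
    """Look up the name by checking only the one shard that could hold it."""
    return name in shards[shard_for(name)]
```

Both the write (`store_name`) and the read (`has_name`) use the same `shard_for` check, so each request touches exactly one shard.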

API to shards

-> Remember, the different shards in the image above are not different databases. They are a single database that has 3 nodes.

SQL vs NoSQL

  • Many SQL and NoSQL dbs offer sharding, either built in or through extensions.
  • SQL dbs generally follow ACID properties, while many NoSQL dbs relax them. So if an update has to run on multiple shards, a SQL db that supports transactions across shards will either perform the operation on all shards or on none of them, but a NoSQL db may expose intermediate results.
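
To make the "all shards or none" point concrete, here is a toy sketch. It is not how any real database implements transactions across shards (those typically use protocols such as two-phase commit). The shards are plain dictionaries, and `update_all_or_nothing`, `update_best_effort` and `increment` are hypothetical names used only for illustration.

```python
import copy


def increment(row):
    """Example update. Raises TypeError if the stored count is None."""
    row["count"] += 1


def update_all_or_nothing(shards, update):
    """Apply update to every shard, or leave all shards untouched on failure."""
    staged = copy.deepcopy(shards)  # work on private copies first
    for name in staged:
        update(staged[name])        # any failure aborts before publishing
    shards.update(staged)           # publish only if every shard succeeded


def update_best_effort(shards, update):
    """Apply update shard by shard. A failure leaves earlier shards changed."""
    for name in shards:
        update(shards[name])
```

With `{"shard1": {"count": 1}, "shard2": {"count": None}}` as input, `update_all_or_nothing` raises and leaves shard1 at 1, while `update_best_effort` raises after shard1 has already moved to 2. That half-applied state is the intermediate result a reader could observe.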

BENEFITS

  • Improved Scalability: It allows dbs to handle more data and traffic by adding more shards.
  • Increased Performance: Any operation that touches only one shard gets its result much faster, because that shard holds less data.
  • Fault Tolerance: If one shard fails, the others can continue to serve requests, so there is no complete outage.
  • Reduced Costs: Vertical scaling is costly, whereas sharding allows horizontal scaling.

Why is sharding not used by default?

  • The reason is very simple: it is very costly as well as time consuming for 2 shards to communicate with each other, even though they belong to the same DB. So it becomes absolutely important to choose the data on which sharding is done so that little or no communication is needed between shards.
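
A rough way to see this cost is to count how many shards a query has to contact. The sketch below reuses the names example with made-up data. `SHARDS`, `find_by_name` and `find_by_suffix` are hypothetical, and the routing assumes names start with a letter from A to Z.

```python
# Made-up sample data, split by first letter as in the names example.
SHARDS = {
    "shard1": ["Alice", "Bob", "Hana"],      # 'A' to 'J'
    "shard2": ["Kiran", "Maria", "Tanisk"],  # 'K' to 'T'
    "shard3": ["Uma", "Zed"],                # 'U' to 'Z'
}


def shard_for(name):
    """Route by first letter. Assumes the name starts with A-Z."""
    first = name[0].upper()
    if first <= "J":
        return "shard1"
    if first <= "T":
        return "shard2"
    return "shard3"


def find_by_name(name):
    """Lookup that matches the sharding rule: contacts exactly one shard."""
    shard = shard_for(name)
    return name in SHARDS[shard], 1


def find_by_suffix(suffix):
    """Lookup the sharding rule cannot route: every shard must be asked."""
    matches = []
    contacted = 0
    for rows in SHARDS.values():
        contacted += 1
        matches.extend(n for n in rows if n.endswith(suffix))
    return sorted(matches), contacted
```

`find_by_name("Maria")` contacts 1 shard, while `find_by_suffix("a")` has to contact all 3 and merge the results. Choosing what to shard on so that the common queries look like the first case is what keeps the communication between shards low.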
