Database Partitioning/Sharding

Tired of reading long articles full of repetitive information, wading through self-explanatory paragraphs and missing what matters most? This article covers the most important points without lengthy explanations, supported by illustrations.

Data Partitioning/ Sharding

What is data partitioning?

It is the process of dividing a large dataset into several smaller partitions. Partitions may reside on the same machine (co-resident) or on different machines.

Why partition? What are the benefits?

Some reasons to consider data partitioning are:

as data and users grow, a single database server can suffer degraded performance
a single server might run out of disk space
network bandwidth can reach saturation when a single server handles all traffic

To summarize the benefits:

Improve availability: if one partition is unavailable, the other partitions can still serve queries, avoiding a single point of failure
Improve scalability: adding new machines to scale horizontally makes it easier to increase capacity
Improve security: sensitive and non-sensitive data can be placed on different machines and managed independently
Increase performance: queries do not need to run against the whole database but are directed at smaller partitions, improving service performance
Improve data manageability: individual partitions can be managed separately during maintenance

When to partition?

Consider partitioning, to list a few cases,

when tables are too large to fit in the memory of a single server
when new data is added or updated every day, for example, historical vs current data split by month
when you want to scale the performance of read queries: distributing data across machines can improve latency and throughput

Data Partitioning Methods

Horizontal Partitioning
Vertical Partitioning/Normalization
Functional Partitioning: data is partitioned by functional or contextual dependency (for example, inventory data vs invoice data); each functional partition can be further split horizontally or vertically, depending on the structure of the data
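
To make the difference between the first two methods concrete, here is a minimal Python sketch. The sample rows, column names, and the modulo rule used for the horizontal split are illustrative assumptions, not taken from any particular database: a horizontal split keeps all columns but only some rows per partition, while a vertical split keeps all rows but only some columns (plus the key) per partition.

```python
# Hypothetical sample table; column names are assumptions for illustration.
ROWS = [
    {"id": 1, "name": "Ana", "email": "ana@example.com", "bio": "Likes maps"},
    {"id": 2, "name": "Ben", "email": "ben@example.com", "bio": "Likes trains"},
    {"id": 3, "name": "Cy", "email": "cy@example.com", "bio": "Likes graphs"},
    {"id": 4, "name": "Di", "email": "di@example.com", "bio": "Likes trees"},
]


def horizontal_partition(rows, num_partitions):
    """Split by row: each partition has every column but only some rows."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Assumed placement rule: id modulo the number of partitions.
        partitions[row["id"] % num_partitions].append(row)
    return partitions


def vertical_partition(rows, key, column_groups):
    """Split by column: each partition has every row but only some columns.

    The key column is repeated in every partition so rows can be rejoined.
    """
    return [
        [{key: row[key], **{col: row[col] for col in columns}} for row in rows]
        for columns in column_groups
    ]


by_row = horizontal_partition(ROWS, 2)
by_column = vertical_partition(ROWS, "id", [["name", "email"], ["bio"]])
```

Functional partitioning would sit one level above this: separate tables (say, inventory and invoices) go to separate stores first, and each store may then be split in either of the two ways shown.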

Data Partitioning Criteria

Range based partitioning

the partition key is divided into contiguous ranges of values, one range per partition (the scheme shown in the schematic of horizontal partitioning)
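
A minimal sketch of range based routing, assuming integer keys and a hypothetical list of boundary values (both are illustrative choices): a binary search over the sorted boundaries gives the partition index.

```python
from bisect import bisect_right

# Hypothetical boundaries. They define four partitions:
#   0: key < 1000, 1: 1000 <= key < 2000, 2: 2000 <= key < 3000, 3: key >= 3000
BOUNDARIES = [1000, 2000, 3000]


def range_partition(key: int) -> int:
    """Return the index of the partition whose range contains `key`."""
    return bisect_right(BOUNDARIES, key)
```

Range partitions keep neighbouring keys together, which suits range scans, but a skewed key distribution (for example, mostly recent dates) can overload one partition.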
Hash based partitioning

a hash of the partition key decides the partition. The disadvantage, that changing the number of partitions remaps most keys, can be addressed using Consistent Hashing
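
The sketch below contrasts plain modulo hashing with a small consistent hash ring. It is one possible illustration rather than a production implementation: the class name `ConsistentHashRing`, the use of MD5 as a stable hash, the node names, and the choice of 100 virtual nodes per machine are all assumptions made for this example.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def hash_partition(key: str, num_partitions: int) -> int:
    """Plain hash partitioning: changing num_partitions remaps most keys."""
    return stable_hash(key) % num_partitions


class ConsistentHashRing:
    """Hypothetical minimal ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed at `vnodes` points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (stable_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

With the ring, adding a machine only takes over the keys that fall just before its points on the ring; every other key stays on the machine it was on, whereas `hash_partition` with a new `num_partitions` would reshuffle most keys.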
List based partitioning

each partition is defined by an explicit list of partition key values instead of a range of values
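
As a small sketch of list based routing, assume a hypothetical mapping from country codes to regional partitions (the partition names, the codes, and the fallback partition are made up for illustration):

```python
# Hypothetical partitions, each defined by an explicit list of key values.
PARTITION_LISTS = {
    "asia": ["IN", "JP", "SG"],
    "europe": ["DE", "FR", "NL"],
    "americas": ["US", "CA", "BR"],
}

# Invert the lists once so each lookup is a single dictionary access.
VALUE_TO_PARTITION = {
    value: name for name, values in PARTITION_LISTS.items() for value in values
}


def list_partition(country_code: str, default: str = "other") -> str:
    """Return the partition whose list contains the value, else a default."""
    return VALUE_TO_PARTITION.get(country_code, default)
```

The default partition is a design choice in this sketch; a stricter setup could instead reject keys that appear in no list.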
Composite partitioning
partition the data using two or more partitioning techniques, for example, first range partition by date and then hash subpartition by userId. A few possible combinations:
- Composite range-range
- Composite range-hash
- Composite hash-list
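
The range-hash example above (range by date, then hash by userId) could be sketched as follows. The monthly granularity, the number of subpartitions, and the MD5 based hash are assumptions chosen for illustration.

```python
import hashlib
from datetime import date

NUM_SUBPARTITIONS = 4  # assumed number of hash subpartitions per month


def composite_partition(event_date: date, user_id: str):
    """Return (range partition, hash subpartition) for one record."""
    # Level 1, range: one partition per calendar month.
    range_part = f"{event_date.year:04d}-{event_date.month:02d}"
    # Level 2, hash: spread that month's users over the subpartitions.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return range_part, int(digest, 16) % NUM_SUBPARTITIONS
```

A record dated 2022-07-17 always lands in the `2022-07` range partition, and the same userId always maps to the same subpartition number within any month.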
