Database Partitioning/Sharding - System Design

Database Partitioning/Sharding - System Design

·

2 min read

Ever felt tired of reading long articles with lot of repetitive information? reading through bunch of self explanatory paragraphs and miss what matters the most? This article covers the most important points without self descriptive lengthy explanatory part, depicted with illustrations.

Data Partitioning/ Sharding

What is data partitioning?

It is process of dividing a large dataset into several smaller partitions placed on different machines. Partitions may reside on same machine (coresident) or different machines scan 2022-07-17 19.10.19n_14.jpg

Why to partition? What are the benefits?

Some of the reasons to consider data partition are

  • with growth in data and users, single database server can have degraded performance
  • we might run out of disc space on a single server
  • network bandwidth reaches saturation for a single server to handle

Textually speaking,

  • Improve availability: if one partition is unavailable, other partitions are available to query data, eliminating single point of failure
  • Improve scalability: adding new machines to scale horizontally makes it easier to increase capacity
  • Improve security: by partitioning sensitive and non-sensitive data on different machines and managing differently/independently
  • Increase performance: queries do not need to execute against whole database but instead are focused towards smaller partitions, improving service performance
  • Improve data manageability: individual partitions can be managed separately during maintenance

When to partition?

You can think of partitioning, to list a few,

  • when tables are too large to fit in memory of a single server
  • when new data is added/updated everyday, example, historical vs current data based on months
  • when you want to scale up performance of read queries, you can distribute data on machines and achieve better latency and throughput

Data Partitioning Methods

  1. Horizontal Partitioning
  2. Vertical Partitioning/ Normalization
  3. Functional Partitioning: data is partitioned based on functional/contextual dependency (inventory data vs invoice data), can further leverage horizontal/vertical partitioning based on structure of data scan 2022-07-17 19.10.19n_12.jpg scan 2022-07-17 19.10.19n_13.jpg

Data Partitioning Criteria

  1. Range based partitioning

    covered in schematic of horizontal partitioning above

  2. Hash based partitioning

    Disadvantage of dynamically changing no of partitions can be addressed using Consistent Hashing scan 2022-07-17 19.10.19n_15.jpg

  3. List based partitioning

    partition key is selected as list of values instead of range of values scan 2022-07-17 19.10.19n_16.jpg

  4. Composite partitioning

    partition the data based on 2 or more partition techniques, example, first range partition based on date and then hash subpartition based on userId. To list few combinations

    • Composite range-range
    • Composite range-hash
    • Composite hash-list

Did you find this article valuable?

Support Udaysinh by becoming a sponsor. Any amount is appreciated!