Consistent Hashing - System Design

Consistent Hashing - System Design

·

3 min read

Follow up post from Hashing example applied in database partitioning

Why do we need consistent hashing?

To address drawbacks with static hashing technique, to list a few,

  • tight coupling with number of nodes in the distributed system makes it cumbersome to dynamically add/remove nodes
  • adding/removing nodes makes us change the hash function and re-distribute load on new number of nodes by recalculating hash for all data/requests
  • this introduces a downtime until this migration is complete and causes a hit on system availability, which is not what we want in a distributed system
  • if a node goes down, it causes uneven distribution of load on remaining nodes

How does consistent hashing help

Consistent hashing solves the problem of rehashing by providing a distribution scheme which does not directly depend on the number of servers.

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This allows servers and objects to scale without affecting the overall system.

Let's take an example of consistent hashing: scan 2022-07-17 19.10.19n_17.jpg

  1. we have a distributed system serving requests from users
  2. we use the email addresses as partition key to distribute the load on available servers
  3. servers are arranged in circular fashion on a hash ring
  4. we use same hash function f(h) to calculate hash for servers as well as incoming requests to place each on the hash ring
  5. s1 handle requests from 2 users, s2 doing the same, s2 handling requests from only 1 user
  6. let's assume s1 goes down, the requests handled by s1 will be handled by s2. There is no need of rehashing and s3 remains unchanged
  7. this avoids rehashing but implementation is half baked which causes uneven load (s2 processing 4 requests vs s3 processing only 1)

Better implementation of consistent hashing

Let's extend above example and improve the placement of servers on hash ring: scan 2022-07-17 19.10.19n_18.jpg

  1. instead of using one hash function, we apply 3 different hash functions to place the nodes at multiple places on the hash ring. f1(h) -> f2(h) -> f3(h)
  2. These become the virtual nodes on hash ring, underlying its just one server placed virtually on hash ring making s1 as s11,s12,s13 (due to 3 hash functions) and respectively for s2,s3
  3. we use only f1(h) to calculate hash for users to place on hash ring similar to first approach
  4. s1 handle requests from only 1 user, s2 and s3 handling requests from 2 users each
  5. let's assume s2 goes down, so s21, s22, s23 go down
  6. requests from s22 will be handled by s32, requests from s23 will be handled by s12
  7. s1 now handles 2 requests and s3 handles 3 requests, creating even load better than what we saw earlier without virtual nodes

Did you find this article valuable?

Support Write what you know by becoming a sponsor. Any amount is appreciated!