Sharding & Partitioning
Once a single database leader cannot keep up with the write rate (or once the dataset no longer fits on a single host), you have to split it. Sharding (also called horizontal partitioning) is the practice of splitting one logical table across many physical nodes, each holding a subset of the rows. Every query then has to figure out which shard to talk to.
Sharding is one of the highest-leverage, highest-cost moves in system design. You bring it up only when you’ve justified it with numbers.
Vertical vs horizontal partitioning
Worth distinguishing once:
- Vertical partitioning — different columns live on different machines (or in different tables). E.g., move user_profile_bio to a separate table because it’s rarely accessed.
- Horizontal partitioning (sharding) — different rows live on different machines. Most of the time when people say “partitioning” or “sharding,” they mean this.
When in doubt, “sharding” = horizontal, and that’s what the rest of this page is about.
The shard key is the entire decision
Picking the shard key is the single most consequential decision in a sharded system. Every read and write either:
- Hits a single shard (cheap, scales linearly), because it knows the shard key.
- Has to fan out across all shards (expensive, doesn’t scale), because it doesn’t.
The right shard key is the one your most important queries already include. Picking based on a less-frequent dimension is the classic mistake that turns “we sharded for scale” into “every query is now a scatter-gather.”
A useful test: write down your top five queries. Look at the WHERE clauses. The column that appears most often is your shard key candidate.
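That tally can even be mechanized. A sketch with an invented query list (every query and column name here is hypothetical, chosen only to illustrate the counting):

```python
from collections import Counter

# Hypothetical top queries, each paired with the columns in its WHERE clause.
top_queries = [
    ("load user feed",        ["user_id"]),
    ("load conversation",     ["conversation_id", "user_id"]),
    ("recent orders",         ["user_id", "created_at"]),
    ("admin search by email", ["email"]),
    ("unread count",          ["user_id"]),
]

# Count how often each column appears across the top queries.
tally = Counter(col for _, cols in top_queries for col in cols)
print(tally.most_common(1))  # [('user_id', 4)] — the shard-key candidate
```

The column that wins the tally is only a candidate: you still have to check that it distributes evenly and that writes carry it too.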
Sharding strategies
Three classic strategies, plus one modern hybrid.
Range sharding
Rows are partitioned by ranges of the key. user_id 1–1M lives on shard A, 1M–2M on shard B, and so on.
- Pros: Range scans within a partition are efficient. Easy to reason about.
- Cons: Hot spots. If new users are heavier than old users, the latest range gets all the writes. Monotonically increasing keys (timestamps, auto-increment IDs) are a disaster as range shard keys.
Used by: HBase, Bigtable, CockroachDB (which rebalances ranges dynamically).
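A range router is just a sorted list of boundary keys plus a binary search. A minimal sketch, with illustrative boundaries and shard names:

```python
import bisect

# Upper-bound user_id for each shard (illustrative boundaries).
boundaries = [1_000_000, 2_000_000, 3_000_000]
shards = ["shard-a", "shard-b", "shard-c"]

def route(user_id: int) -> str:
    """Return the first shard whose upper bound covers user_id."""
    i = bisect.bisect_left(boundaries, user_id)
    if i == len(shards):
        raise ValueError("key beyond last range; split a range or add a shard")
    return shards[i]

print(route(999_999))    # shard-a
print(route(1_500_000))  # shard-b
```

Real systems (Bigtable, CockroachDB) keep this boundary list in a metadata layer and split or merge ranges online as shards heat up.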
Hash sharding
Apply a hash function to the shard key and partition by the hash. hash(user_id) mod N picks the shard.
- Pros: Even distribution by default. No hot spots from key sequencing.
- Cons: Range queries on the shard key become scatter-gather. Resharding (changing N) is painful because every row’s destination changes.
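The resharding cost is easy to quantify. A sketch counting how many of 10,000 keys change shards when N goes from 8 to 9 under plain modulo hashing (md5 is used here only as a cheap, stable hash):

```python
import hashlib

def shard(key: str, n: int) -> int:
    # Stable hash; Python's built-in hash() is salted per process for str.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(shard(k, 8) != shard(k, 9) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when N goes 8 -> 9")
# Roughly 8/9 of the keys relocate — almost a full data migration.
```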
This is where Consistent Hashing earns its keep — by minimizing the rows that move when N changes.
Used by: Cassandra, DynamoDB, most key-value stores.
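To make the consistent-hashing claim concrete, here is a minimal ring with virtual nodes (a sketch, not a production implementation; md5 again serves only as a stable hash). Adding a fifth node moves roughly 1/5 of the keys, not nearly all of them:

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each physical node appears `vnodes` times on the ring for even spread.
        self.ring = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def route(self, key: str) -> str:
        # First ring point clockwise of the key's hash, wrapping around.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"user:{i}" for i in range(10_000)]
before = Ring(["s0", "s1", "s2", "s3"])
after = Ring(["s0", "s1", "s2", "s3", "s4"])
moved = sum(before.route(k) != after.route(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # close to 1/5
```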
Directory-based sharding
A lookup table maps each key (or key range) to its shard. The mapping is itself stored somewhere — usually a small, replicated metadata service.
- Pros: Maximum flexibility. You can place specific tenants on specific shards, isolate a hot customer, or migrate keys without resharding everything.
- Cons: The directory becomes a critical-path service. You pay an extra hop on each lookup unless clients cache the mapping.
Used by: Vitess, many bespoke multi-tenant systems.
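At its core, directory routing is a lookup table consulted (and cached) on each request. A sketch with invented tenant and shard names:

```python
# Hypothetical directory: tenant -> shard. In production this lives in a
# small replicated metadata service; clients cache the mapping locally.
directory = {
    "acme-corp": "shard-dedicated-1",  # hot customer, isolated on its own shard
    "globex":    "shard-2",
}
DEFAULT_SHARD = "shard-0"

local_cache: dict[str, str] = {}

def route(tenant: str) -> str:
    if tenant not in local_cache:  # cache miss: pay the extra hop to the directory
        local_cache[tenant] = directory.get(tenant, DEFAULT_SHARD)
    return local_cache[tenant]

print(route("acme-corp"))   # shard-dedicated-1
print(route("unknown-co"))  # shard-0
```

The cache is what keeps the directory off the hot path; the hard part in practice is invalidating it safely while a key is being migrated.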
Geographic / tenant-based sharding
A variant of directory sharding where the shard key is a real-world dimension — country, customer, organization. Keeps related data colocated for both performance and compliance.
Used by: most B2B SaaS at scale (one shard per large customer), apps with strict data-residency requirements.
What changes when you shard
Sharding is not free. A non-exhaustive list of things you now have to handle:
Joins across shards. They become expensive scatter-gather. Mitigate by denormalizing so joins don’t cross shards, or by colocating related data on the same shard (e.g., shard orders by the same user_id that shards users).
Transactions across shards. The cheap single-shard transaction is fine. Cross-shard transactions require distributed commit (two-phase commit or modern equivalents) and are an order of magnitude slower. Design schemas so most transactions stay on one shard.
Unique constraints. A UNIQUE(email) across shards is hard. Either pick a shard key that makes the uniqueness scope a single shard, or maintain a separate “uniqueness service” — usually a small, separately-keyed lookup table.
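That lookup table works because it is keyed (and sharded) by email rather than user_id, so each uniqueness check touches exactly one shard. A sketch, with the shards faked as dicts:

```python
# Sketch of a uniqueness service: email -> user_id claims, sharded by the
# email itself so a claim is always a single-shard operation.
NUM_SHARDS = 4
email_shards = [dict() for _ in range(NUM_SHARDS)]

def claim_email(email: str, user_id: int) -> bool:
    """Claim an email; returns False if another user already holds it."""
    # Illustrative routing; use a stable hash in practice, not Python's hash().
    shard = email_shards[hash(email) % NUM_SHARDS]
    if email in shard:
        return shard[email] == user_id  # idempotent for the same user
    shard[email] = user_id
    return True

assert claim_email("a@example.com", 1)
assert not claim_email("a@example.com", 2)  # duplicate rejected
```

In a real system the claim and the user-row insert live on different shards, so you also need an ordering convention (claim first, insert second, clean up orphaned claims).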
Rebalancing. Eventually a shard fills up, gets too hot, or a customer outgrows it. You need a story: consistent hashing with virtual nodes, range splits (Bigtable/Cockroach style), or a directory with online moves (Vitess style). Whatever you pick, mention it.
Operational overhead. Backups, schema migrations, monitoring — all of it becomes N times the work. Tooling that hides this (Vitess, Citus, managed Spanner) is worth its cost.
Hot keys
Even with a well-chosen shard key, individual keys can be hot. The classic case: a celebrity user with 10M followers, on a user_id-sharded followers table. One shard, one key, hundreds of thousands of reads per second.
Standard mitigations:
- Caching. Put a CDN or per-key cache in front of the hot shard.
- Replication. Run extra read replicas of the hot shard.
- Key splitting. Append a random suffix to writes and aggregate across them on read.
- Special-case routing. Route known-hot users to a dedicated tier.
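Key splitting in particular is worth sketching: writes for a hot key are spread across random sub-keys, and reads aggregate them all. Counter semantics are assumed here for illustration:

```python
import random

SPLITS = 8
counters: dict[str, int] = {}  # stand-in for a sharded counter table

def incr(key: str) -> None:
    # Spread writes for one hot key over SPLITS sub-keys.
    sub = f"{key}#{random.randrange(SPLITS)}"
    counters[sub] = counters.get(sub, 0) + 1

def read(key: str) -> int:
    # Reads pay for the fan-out: sum every sub-key.
    return sum(counters.get(f"{key}#{i}", 0) for i in range(SPLITS))

for _ in range(1000):
    incr("followers:celebrity")
print(read("followers:celebrity"))  # 1000
```

The trade is explicit: write throughput improves by up to SPLITS×, while every read becomes a small scatter-gather, so this only suits keys whose reads are cacheable or whose aggregation is cheap.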
In interviews, mentioning the hot-key problem before the interviewer asks is a strong signal.
Sharding vs scaling up
Always compare:
- Vertical scaling (bigger box). Cheap-ish operationally, real ceiling.
- Read replicas (followers). Scales reads but not writes.
- Sharding. Scales writes but multiplies operational cost.
Senior candidates don’t immediately reach for sharding. They show the math: “At our estimated 5k writes/sec, a single Postgres leader on a modern instance is fine — we’d plan to revisit when sustained writes exceed 30k/sec, which our growth model puts about 18 months out.”
What to say in an interview
A clean way to introduce sharding:
“For the messages table we shard by conversation_id using consistent hashing across 32 shards. The dominant access pattern — ‘load the last N messages in this conversation’ — stays on one shard. Cross-conversation queries are rare and handled by a denormalized search index in Elasticsearch. We use virtual nodes so that adding shards moves roughly 1/N of the data instead of nearly all of it.”
Three concepts (shard key, strategy, rebalancing), one for each sentence. That is the bar.