What Is Database Sharding? Scaling Your Database Horizontally
Has your database reached billions of rows? Single server can't keep up? Sharding splits your data across multiple databases for virtually unlimited scalability.
What Is Sharding?
Sharding splits a large database into smaller, independent pieces (shards). Each shard holds a subset of data and runs on a separate server.
Sharding Strategies
1. Key-Based (Hash)
shard = hash(shard_key) % total_shards
Even distribution, but resharding is difficult.
2. Range-Based
Shard A: user_id 1-1M | Shard B: user_id 1M-2M
Easy range queries, but hotspot risk.
3. Directory-Based
Lookup table maps data to shards. Flexible but adds a dependency.
4. Geographic
Route users to the nearest shard by region.
Shard Key Selection
The most critical decision in sharding:
| Criteria | Good Choice | Bad Choice | |----------|-------------|------------| | Even distribution | user_id | country | | Query locality | tenant_id | created_at | | Cardinality | UUID | boolean |
Consistent Hashing
Minimizes data movement when adding shards. Only the neighboring shard's data needs redistribution.
Challenges
- Cross-shard queries — Slow scatter-gather across shards
- Transactions — Two-phase commit is complex
- JOINs — Cannot JOIN across different shards
- Resharding — Requires large data migrations
Sharding Tools
| Tool | Database | Approach | |------|----------|----------| | Vitess | MySQL | Proxy-based, YouTube scale | | Citus | PostgreSQL | Extension-based | | MongoDB | MongoDB | Native sharding | | CockroachDB | — | Auto-sharding |
Try Before Sharding
- Read replicas | 2. Caching (Redis) | 3. Better indexing | 4. Vertical partitioning | 5. Data archiving
Best Practices
- Delay as long as possible — Sharding is a last resort
- Choose the right shard key — Hard to change later
- Use consistent hashing — Easier resharding
- Monitor per-shard metrics — Size, latency, hotspots
Conclusion
Sharding provides horizontal scalability for massive data volumes, but at significant complexity cost. Try simpler optimizations first; shard only when truly necessary.
Learn database scaling and sharding on LabLudus.