Sharding
Sharding is a database partitioning technique that distributes data across multiple machines, or shards, to improve scalability and performance. By dividing a large dataset, a system can store more data and handle higher query and write throughput than a single server could support. Each shard contains its own subset of data and can be managed and queried independently, while the system presents a unified view to applications.
To assign data to shards, systems use a shard key. A routing layer or query planner uses
Common sharding strategies include hash-based sharding, range-based sharding, and directory-based sharding. Hash-based assigns records based on
Benefits include linear scale-out of capacity and isolated failures, which can improve availability. Drawbacks include added
Sharding is widely used in NoSQL and NewSQL databases, such as MongoDB and Cassandra, and is also