Database Sharding in Rust

Luis Soares

02 Feb 2024 — 6 min read

Database sharding is a technique to scale out databases by breaking them into smaller, more manageable pieces called shards. It’s particularly useful for applications that need to handle large volumes of data and high throughput.

Let’s see how we can implement database sharding in Rust! 🦀

Sharding Schemes

There are several sharding schemes, each with its own advantages and use cases. Let’s explore some of the common ones, each with a bare-bones code snippet:

1. Key-Based (Hash-Based) Sharding

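In key-based sharding, the shard is chosen by hashing a sharding key associated with each record and reducing the hash modulo the number of shards. With a well-distributed hash function, this spreads data evenly across shards. Note that this is plain modulo hashing rather than consistent hashing, and that the algorithm behind Rust’s DefaultHasher is not guaranteed to stay the same across Rust releases, so a real system should pin an explicit, stable hash function.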

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_based_shard<T: Hash>(key: &T, number_of_shards: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // Reduce in u64 first so the hash is not truncated on 32-bit targets
    (hasher.finish() % number_of_shards as u64) as usize
}

// Usage
fn main() {
    let shard_id = hash_based_shard(&"user123", 10);
    println!("Shard ID for 'user123': {}", shard_id);
}

Pros: Even distribution of data, simplicity.
Cons: Rebalancing data can be challenging when adding or removing nodes.
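
To make the rebalancing cost concrete, here is a minimal sketch that counts how many keys land on a different shard when the shard count changes under the same modulo scheme. The helper names (shard_for, keys_moved) and the synthetic "user{i}" keys are assumptions made for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Modulo scheme: hash the key, then reduce by the shard count.
fn shard_for(key: &str, number_of_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % number_of_shards
}

/// Counts how many of `total_keys` synthetic keys map to a different shard
/// when the shard count changes from `old_shards` to `new_shards`.
fn keys_moved(total_keys: u32, old_shards: u64, new_shards: u64) -> u32 {
    (0..total_keys)
        .filter(|i| {
            let key = format!("user{i}");
            shard_for(&key, old_shards) != shard_for(&key, new_shards)
        })
        .count() as u32
}

fn main() {
    let total = 10_000;
    let moved = keys_moved(total, 10, 11);
    println!("{moved} of {total} keys change shard when growing from 10 to 11 shards");
}
```

With a well-mixed hash, a key keeps its shard here only when h % 10 equals h % 11, which holds for about one hash value in eleven, so most of the data would have to be migrated. Schemes such as consistent hashing exist to reduce that movement.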

2. Range-Based Sharding

Range-based sharding involves dividing data into shards based on ranges of a certain key. Each shard holds the data for a specific range of values. For example, in a user database, one shard might hold users with IDs from 1 to 1000, another from 1001 to 2000, and so on.

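// Maps fixed-size key ranges to shards: keys 0..range_size go to shard 0,
// the next range_size keys to shard 1, and so on. Ranges wrap around once
// there are more ranges than shards. Keys are unsigned so that a negative
// ID cannot turn into an out-of-range index.
fn range_based_shard(key: u32, range_size: u32, number_of_shards: usize) -> usize {
    ((key / range_size) as usize) % number_of_shards
}

// Usage
fn main() {
    let shard_id = range_based_shard(12345, 1000, 10);
    println!("Shard ID for key 12345: {}", shard_id); // 12345 / 1000 = 12, 12 % 10 = 2
}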

Pros: Easy to implement, efficient for range queries.
Cons: Can lead to uneven data and load distribution if the data isn’t uniformly distributed.
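
When the data is not uniformly distributed, one option is to give each shard an explicit boundary instead of a fixed range size. The sketch below is one way to do that; RangeSharder and its upper_bounds field are hypothetical names, and the boundaries in main are made-up values.

```rust
/// Range sharding with explicit, possibly uneven boundaries.
/// `upper_bounds[i]` is the largest key stored on shard `i`.
struct RangeSharder {
    upper_bounds: Vec<u32>,
}

impl RangeSharder {
    fn new(mut upper_bounds: Vec<u32>) -> Self {
        // Keep the bounds sorted so a binary search can be used for lookups.
        upper_bounds.sort_unstable();
        Self { upper_bounds }
    }

    /// Returns the shard whose range contains `key`, or `None` if the key
    /// lies above the last boundary.
    fn shard_for(&self, key: u32) -> Option<usize> {
        // Index of the first bound that is >= key.
        let idx = self.upper_bounds.partition_point(|&bound| bound < key);
        (idx < self.upper_bounds.len()).then_some(idx)
    }
}

fn main() {
    // A narrow first range for a dense ID region, wider ranges afterwards.
    let sharder = RangeSharder::new(vec![1_000, 2_000, 10_000]);
    for key in [1, 1_000, 1_001, 9_999, 10_001] {
        println!("key {key} -> {:?}", sharder.shard_for(key));
    }
}
```

Because a shard owns one contiguous range, a range query only has to touch the shards whose boundaries overlap the queried interval.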

3. Directory-Based Sharding

Directory-based sharding uses a lookup table to keep track of which shard holds which data. When a query comes in, the system consults the lookup table to determine where to route the query.

use std::collections::HashMap; 
 
struct DirectorySharder { 
    directory: HashMap<String, usize>, // Maps a key to a shard ID 
} 
 
impl DirectorySharder { 
    fn new() -> Self { 
        Self { 
            directory: HashMap::new(), 
        } 
    } 
 
    fn add_key(&mut self, key: &str, shard_id: usize) { 
        self.directory.insert(key.to_string(), shard_id); 
    } 
 
    fn get_shard(&self, key: &str) -> Option<usize> { 
        self.directory.get(key).cloned() 
    } 
} 
 
// Usage 
let mut sharder = DirectorySharder::new(); 
sharder.add_key("user123", 1); 
println!("Shard ID for 'user123': {:?}", sharder.get_shard("user123"));

Pros: Flexibility in data distribution, easy to add new shards.
Cons: The lookup table can become a bottleneck if not managed properly.
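
The flexibility comes from the fact that a directory entry can be repointed at any time. As a rough sketch, assuming the actual data copy between shards happens elsewhere, a directory could support reassigning a single key or draining a whole shard. The Directory type and its method names are hypothetical.

```rust
use std::collections::HashMap;

/// Lookup table from key to shard ID.
struct Directory {
    entries: HashMap<String, usize>,
}

impl Directory {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Assigns `key` to `shard_id` and returns the previous shard, if any.
    fn assign(&mut self, key: &str, shard_id: usize) -> Option<usize> {
        self.entries.insert(key.to_string(), shard_id)
    }

    fn lookup(&self, key: &str) -> Option<usize> {
        self.entries.get(key).copied()
    }

    /// Repoints every key on shard `from` to shard `to` and returns how
    /// many entries changed. Only the routing is updated here.
    fn drain_shard(&mut self, from: usize, to: usize) -> usize {
        let mut moved = 0;
        for shard in self.entries.values_mut() {
            if *shard == from {
                *shard = to;
                moved += 1;
            }
        }
        moved
    }
}

fn main() {
    let mut directory = Directory::new();
    directory.assign("user123", 1);
    directory.assign("user456", 1);
    let moved = directory.drain_shard(1, 4);
    println!("moved {moved} keys, user123 is now on {:?}", directory.lookup("user123"));
}
```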

4. Geographic Sharding

Geographic sharding involves distributing data based on geographic locations. This can be particularly useful for services that are region-specific and can significantly reduce latency by locating data closer to its users.

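fn geographic_shard(region: &str) -> Option<usize> {
    match region {
        "North America" => Some(0),
        "Europe" => Some(1),
        "Asia" => Some(2),
        // Unknown region: return None so the caller chooses a fallback,
        // instead of a sentinel ID that would index out of bounds
        _ => None,
    }
}

// Usage
fn main() {
    let shard_id = geographic_shard("Europe");
    println!("Shard ID for Europe: {:?}", shard_id);
}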

Pros: Reduced latency for geographically distributed applications, easier compliance with local data-residency rules.
Cons: Complexity in managing data consistency across regions.

5. Vertical Sharding

Vertical sharding, also known as functional partitioning, involves splitting a database into shards based on features or services. For example, user-related data might be stored in one shard, while product-related data might be stored in another.

fn vertical_shard(data_type: &str) -> Option<usize> {
    match data_type {
        "User Data" => Some(0),
        "Order Data" => Some(1),
        "Product Data" => Some(2),
        // Unknown data type: return None rather than a sentinel shard ID
        _ => None,
    }
}

// Usage
fn main() {
    let shard_id = vertical_shard("Order Data");
    println!("Shard ID for Order Data: {:?}", shard_id);
}

Pros: Isolation of workloads, potential for performance optimization.
Cons: Can lead to data duplication and complicates transactions that span multiple shards.

6. Tenant-Based Sharding (Multi-Tenancy)

In multi-tenant applications, data from different tenants (customers, organizations) is stored in separate shards. Each tenant’s data is isolated and can be managed independently.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn tenant_based_shard(tenant_id: &str, number_of_shards: usize) -> usize {
    // Simple hash-based approach for tenant ID
    let mut hasher = DefaultHasher::new();
    tenant_id.hash(&mut hasher);
    // Reduce in u64 first so the hash is not truncated on 32-bit targets
    (hasher.finish() % number_of_shards as u64) as usize
}

// Usage
fn main() {
    let shard_id = tenant_based_shard("tenant123", 10);
    println!("Shard ID for tenant 'tenant123': {}", shard_id);
}

Pros: Data isolation, scalability per tenant, easier backup and restore.
Cons: Overhead in managing multiple tenants, potential underutilization of resources.
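
Hashing tenants onto a fixed pool means several tenants share a shard. One possible way to get per-tenant scalability is to pin large tenants to dedicated shards and hash everyone else onto the shared pool. The sketch below assumes that design; TenantRouter, pin, and the numbering of dedicated shards after the shared ones are illustrative choices, not something prescribed above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Routes tenants to a shared pool by hash, with optional dedicated shards.
struct TenantRouter {
    shared_shards: usize,
    dedicated: HashMap<String, usize>,
}

impl TenantRouter {
    fn new(shared_shards: usize) -> Self {
        assert!(shared_shards > 0, "need at least one shared shard");
        Self { shared_shards, dedicated: HashMap::new() }
    }

    /// Pins a tenant to its own shard. Dedicated shard IDs are numbered
    /// after the shared ones. Pinning the same tenant twice is a no-op.
    fn pin(&mut self, tenant_id: &str) -> usize {
        let next = self.shared_shards + self.dedicated.len();
        *self.dedicated.entry(tenant_id.to_string()).or_insert(next)
    }

    fn shard_for(&self, tenant_id: &str) -> usize {
        if let Some(&shard) = self.dedicated.get(tenant_id) {
            return shard;
        }
        let mut hasher = DefaultHasher::new();
        tenant_id.hash(&mut hasher);
        (hasher.finish() % self.shared_shards as u64) as usize
    }
}

fn main() {
    let mut router = TenantRouter::new(4);
    router.pin("big-tenant");
    println!("big-tenant -> shard {}", router.shard_for("big-tenant"));
    println!("tenant123 -> shard {}", router.shard_for("tenant123"));
}
```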

A working example

For simplicity, let’s see a conceptual Rust example that simulates the sharding logic and demonstrates how you might perform CRUD operations across multiple shards. This example will use a hash-based sharding approach.

Assumptions

We’re simulating the database layer to focus on sharding logic.
We’ll use a simple in-memory structure to represent each shard.
The example will be simplified and not production-ready.

Setup

Rust Environment: Ensure you have Rust installed on your system.
New Project: Create a new Rust project by running cargo new rust_sharding_crud and navigate into the project directory.

Dependencies

This example doesn’t require external crates for simplicity, but in a real application, you might consider crates like diesel for ORM, tokio for async, and serde for serialization.

Code Implementation

Replace the content of src/main.rs with the following code:

use std::collections::hash_map::DefaultHasher; 
use std::collections::HashMap; 
use std::hash::{Hash, Hasher}; 
 
const NUMBER_OF_SHARDS: usize = 4; 
 
struct Shard { 
    data: HashMap<String, String>, // Simulating a simple key-value store 
} 
 
impl Shard { 
    fn new() -> Self { 
        Shard { 
            data: HashMap::new(), 
        } 
    } 
 
    fn insert(&mut self, key: String, value: String) { 
        self.data.insert(key, value); 
    } 
 
    fn get(&self, key: &str) -> Option<&String> { 
        self.data.get(key) 
    } 
 
    fn update(&mut self, key: String, value: String) -> Option<String> { 
        self.data.insert(key, value) 
    } 
 
    fn delete(&mut self, key: &str) -> Option<String> { 
        self.data.remove(key) 
    } 
} 
 
struct ShardedDatabase { 
    shards: Vec<Shard>, 
} 
 
impl ShardedDatabase { 
    fn new() -> Self { 
        let mut shards = Vec::with_capacity(NUMBER_OF_SHARDS); 
        for _ in 0..NUMBER_OF_SHARDS { 
            shards.push(Shard::new()); 
        } 
        ShardedDatabase { shards } 
    } 
 
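    // `?Sized` lets callers pass `&str` (T = str) as well as `&String`;
    // both hash identically, so lookups find what insert stored.
    fn determine_shard<T: Hash + ?Sized>(&self, key: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % NUMBER_OF_SHARDS as u64) as usize
    }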
 
    fn insert(&mut self, key: String, value: String) { 
        let shard_id = self.determine_shard(&key); 
        self.shards[shard_id].insert(key, value); 
    } 
 
    fn get(&self, key: &str) -> Option<&String> { 
        let shard_id = self.determine_shard(key); 
        self.shards[shard_id].get(key) 
    } 
 
    fn update(&mut self, key: String, value: String) -> Option<String> { 
        let shard_id = self.determine_shard(&key); 
        self.shards[shard_id].update(key, value) 
    } 
     
    fn delete(&mut self, key: &str) -> Option<String> { 
        let shard_id = self.determine_shard(key); 
        self.shards[shard_id].delete(key) 
    } 
} 
 
fn main() { 
    let mut db = ShardedDatabase::new(); 
     
    // Insert some data 
    db.insert("user1".to_string(), "Alice".to_string()); 
    db.insert("user2".to_string(), "Bob".to_string()); 
     
    // Retrieve and print a value 
    if let Some(name) = db.get("user1") { 
        println!("Found: {}", name); 
    } 
     
    // Update a value 
    db.update("user1".to_string(), "Alicia".to_string()); 
 
    // Delete a value 
    db.delete("user2"); 
     
    // Try to retrieve a deleted value 
    if db.get("user2").is_none() { 
        println!("User2 deleted successfully"); 
    } 
}

Explanation

Shard Structure: Represents a database shard. It’s a simple key-value store for this example.
ShardedDatabase Structure: Manages multiple shards and distributes data among them based on the hash of the key.
CRUD Operations: Implemented as methods on ShardedDatabase, which delegate operations to the appropriate shard based on the key.

Running the Example

Execute your program with Cargo:

cargo run

This will compile and run the Rust application, demonstrating simple CRUD operations across sharded in-memory data stores.

Extending to a Real Database

To extend this example to work with a real database:

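Set Up Database Instances: Provision one database instance per shard.
Database Connections: Use a Rust database crate (such as diesel for SQL databases) to connect to and interact with each shard.
Error Handling: Return errors from database operations instead of assuming they succeed, and decide how to handle a shard that is unreachable.
Asynchronous Operations: Use async/await for non-blocking database I/O. This requires an async runtime such as tokio and a database crate with async support; diesel’s core API is synchronous, so an async driver such as sqlx is one option.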
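
As a rough sketch of where those changes would plug in, the shard can be hidden behind a trait whose methods return Result, so an in-memory store can later be swapped for a database-backed one. Everything named here (ShardBackend, ShardError, InMemoryBackend, Router) is hypothetical, the outage is simulated with a flag, and async is left out to keep the example free of external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fmt;
use std::hash::{Hash, Hasher};

/// Hypothetical error type; a real driver would surface its own errors.
#[derive(Debug, PartialEq)]
enum ShardError {
    Unavailable(usize),
}

impl fmt::Display for ShardError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ShardError::Unavailable(id) => write!(f, "shard {id} is unavailable"),
        }
    }
}

impl std::error::Error for ShardError {}

/// Hypothetical storage interface. A database-backed implementation would
/// hold a connection or pool instead of a HashMap.
trait ShardBackend {
    fn put(&mut self, key: &str, value: &str) -> Result<(), ShardError>;
    fn get(&self, key: &str) -> Result<Option<String>, ShardError>;
}

/// In-memory stand-in with an `online` flag to simulate an outage.
struct InMemoryBackend {
    id: usize,
    online: bool,
    data: HashMap<String, String>,
}

impl InMemoryBackend {
    fn new(id: usize) -> Self {
        Self { id, online: true, data: HashMap::new() }
    }
}

impl ShardBackend for InMemoryBackend {
    fn put(&mut self, key: &str, value: &str) -> Result<(), ShardError> {
        if !self.online {
            return Err(ShardError::Unavailable(self.id));
        }
        self.data.insert(key.to_string(), value.to_string());
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Option<String>, ShardError> {
        if !self.online {
            return Err(ShardError::Unavailable(self.id));
        }
        Ok(self.data.get(key).cloned())
    }
}

/// Routes each key to one backend by hash. `shards` must not be empty.
struct Router<B: ShardBackend> {
    shards: Vec<B>,
}

impl<B: ShardBackend> Router<B> {
    fn index_for(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }

    fn put(&mut self, key: &str, value: &str) -> Result<(), ShardError> {
        let index = self.index_for(key);
        self.shards[index].put(key, value)
    }

    fn get(&self, key: &str) -> Result<Option<String>, ShardError> {
        self.shards[self.index_for(key)].get(key)
    }
}

fn main() -> Result<(), ShardError> {
    let mut router = Router { shards: (0..4).map(InMemoryBackend::new).collect() };
    router.put("user1", "Alice")?;
    println!("user1 -> {:?}", router.get("user1")?);

    // Simulate an outage: calls now return an error instead of panicking.
    for shard in router.shards.iter_mut() {
        shard.online = false;
    }
    println!("after outage: {:?}", router.get("user1"));
    Ok(())
}
```

Keeping the routing generic over the trait means the hashing logic does not change when the backend does; only a new ShardBackend implementation is needed.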

All the best,

Luis Soares
luis.soares@linux.com