Understanding Pinning in Rust
Pinning in Rust is an essential concept for scenarios where certain values must remain at a fixed location in memory, which makes it critical for Rust developers working with async programming, self-referential structs, and Foreign Function Interfaces (FFI). In this article, we’ll dive deeper into pinning, explore the mechanics of the `Pin` type, and implement a practical example to solidify your understanding.
Why Pinning is Essential
Rust’s ownership model allows values to be moved freely in memory by default: a move is a simple bitwise copy to a new location, which keeps it cheap. However, certain cases require that a value’s memory address remains constant:
- Self-referential Types: Data structures that hold pointers or references to their own fields.
- Async Programming: Many async tasks in Rust (`Future`s) rely on pinned data structures to avoid issues during suspension and resumption.
- FFI (Foreign Function Interface): When working with C libraries or other external code, values need to remain at fixed addresses to keep pointers valid.
Pinning helps manage these cases by marking a value as immovable, which Rust enforces through the `Pin` type.
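To make the async case concrete, here is a minimal sketch of how a future is driven through a `Pin<&mut F>`. The names `NoopWaker`, `poll_once`, and `add_one` are illustrative assumptions, not part of any library, and a real executor does far more than poll once with a waker that does nothing.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken; enough for a single poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a pinned future exactly once.
/// Returns `Some(output)` if it completed, `None` if it is still pending.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // `Future::poll` takes `self: Pin<&mut Self>`, so the caller must pin first.
    match fut.poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

/// Illustrative async function with no await points, so it finishes on the first poll.
async fn add_one(x: u32) -> u32 {
    x + 1
}

fn main() {
    // `Box::pin` puts the future on the heap and pins it there.
    let mut fut = Box::pin(add_one(41));
    let result = poll_once(fut.as_mut());
    println!("{:?}", result); // prints: Some(42)
}
```

The point of the sketch is the signature: because `poll` only accepts a pinned reference, a future that holds references into its own state cannot be relocated between polls by safe code.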
Overview of Pin and Unpin
Rust’s `Pin` type ensures that once a value is pinned, safe code cannot move it unless its type implements `Unpin`. Here’s how it works:
- `Pin<P>` Wrapper: This type is a wrapper around pointers like `Box`, `Rc`, or `&mut`, enforcing that the value behind the pointer stays at a fixed memory address.
- `Unpin` Trait: By default, most types in Rust are `Unpin`, meaning they can be moved safely even after being pinned, so `Pin` places no real restriction on them. Types that are sensitive to movement (like self-referential structs and async futures) do not implement `Unpin`, and these are the types for which `Pin` actually enforces immovability.
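The difference between `Unpin` and non-`Unpin` types can be sketched with two small structs. `Movable`, `Immovable`, `assert_unpin`, `replace_through_pin`, and `read_pinned` are hypothetical names used only for this illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Compiles only when `T` implements `Unpin`.
fn assert_unpin<T: Unpin>(_: &T) {}

/// Ordinary struct: automatically `Unpin`.
struct Movable {
    value: u32,
}

/// The `PhantomPinned` marker opts this struct out of `Unpin`.
struct Immovable {
    value: u32,
    _pinned: PhantomPinned,
}

/// For an `Unpin` type, `Pin` is no obstacle: `get_mut` hands back `&mut T`,
/// and the value can be moved out with `mem::replace`.
fn replace_through_pin(pinned: Pin<&mut Movable>, new_value: u32) -> u32 {
    let inner: &mut Movable = pinned.get_mut(); // only available when T: Unpin
    std::mem::replace(inner, Movable { value: new_value }).value
}

/// Shared access to a pinned value is always allowed, `Unpin` or not.
fn read_pinned(pinned: Pin<&Immovable>) -> u32 {
    pinned.get_ref().value
}

fn main() {
    let mut movable = Movable { value: 1 };
    assert_unpin(&movable);
    let old = replace_through_pin(Pin::new(&mut movable), 2);
    println!("old = {}, new = {}", old, movable.value); // prints: old = 1, new = 2

    let pinned = Box::pin(Immovable { value: 7, _pinned: PhantomPinned });
    // assert_unpin(&*pinned);        // would not compile: Immovable is !Unpin
    // Pin::new(&mut Immovable {..})  // would not compile either, for the same reason
    println!("pinned value = {}", read_pinned(pinned.as_ref())); // prints: pinned value = 7
}
```

For `Immovable`, the only way to obtain a `&mut` from the pin is the `unsafe` method `get_unchecked_mut`, which is what shifts the no-move obligation onto the programmer.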
Building a Self-Referential Struct with Pinning
Let’s create a more advanced example: a self-referential struct that relies on `Pin` to safely hold a reference to its own data. Self-referential structs are tricky in Rust because moving them would invalidate any internal references, and using those references afterwards is undefined behavior. By using `Pin`, we ensure the struct remains in a fixed location, so its internal references stay valid.
Implementing a Self-Referential Cache Struct
In this example, we’ll implement a `Cache` struct that stores a reference to its own data in a `cached_data` field. This structure simulates a self-referential cache that refreshes its content based on a computationally intensive function, and it relies on `Pin` to ensure that the self-referential structure is memory-safe.
- Define the Struct: We create the `Cache` struct with fields for the original data and a reference to the cached data.
- Use `Pin`: To prevent the `Cache` instance from moving, we pin it inside a `Box`.
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Cache {
    data: String,
    cached_data: Option<*const String>,
    _pinned: PhantomPinned, // Prevents the struct from being `Unpin`
}

impl Cache {
    /// Creates a new Cache instance with no cached data initially.
    fn new(data: String) -> Self {
        Cache {
            data,
            cached_data: None,
            _pinned: PhantomPinned,
        }
    }

    /// Initializes or refreshes the cached reference.
    fn refresh_cache(self: Pin<&mut Self>) {
        // Taking the address of `data` is fine: `self` is pinned, so `data` will not move.
        let self_ptr: *const String = &self.data;
        // SAFETY: we only write to a field and never move the struct out of the
        // pinned reference; `self_ptr` remains valid as long as `self` is pinned.
        unsafe { self.get_unchecked_mut().cached_data = Some(self_ptr) };
    }

    /// Returns the cached data reference.
    fn get_cached_data(&self) -> Option<&String> {
        // SAFETY: `cached_data` is only set by `refresh_cache`, which requires a
        // pinned `self` and stores a pointer to this struct's own `data` field.
        self.cached_data.map(|ptr| unsafe { &*ptr })
    }
}

fn main() {
    // Step 1: Pin the Cache instance in memory
    let mut cache = Box::pin(Cache::new("Initial Data".to_string()));
    // Step 2: Refresh the cache to set up the self-referential pointer
    cache.as_mut().refresh_cache();
    // Access the cached data reference
    if let Some(cached_data) = cache.get_cached_data() {
        println!("Cached data: {}", cached_data);
    } else {
        println!("No data in cache.");
    }
    // Update data and refresh the cache
    // SAFETY: assigning to `data` overwrites the field in place; the `Cache`
    // itself is not moved. `get_unchecked_mut` is an unsafe method, so the
    // call must sit inside an `unsafe` block to compile.
    unsafe {
        cache.as_mut().get_unchecked_mut().data = "Updated Data".to_string();
    }
    cache.as_mut().refresh_cache();
    // Access the updated cached data
    if let Some(cached_data) = cache.get_cached_data() {
        println!("Updated cached data: {}", cached_data);
    }
}
1. The Cache Struct:
- `data`: Stores the primary data.
- `cached_data`: Stores a raw pointer to `data`, making it self-referential.
- `_pinned`: A `PhantomPinned` marker that prevents `Cache` from automatically implementing `Unpin`.
2. Pinning the Cache:
- We pin the `Cache` instance using `Box::pin`, ensuring that it won’t move in memory. This is required for the self-referential structure to be safe.
3. Refreshing the Cache:
- The `refresh_cache` method uses `Pin<&mut Self>` to modify `cached_data`, storing a pointer to `data`.
- The `Pin` wrapper ensures that `data` won’t be moved, so any references to it inside the struct remain valid.
4. Accessing Cached Data:
- `get_cached_data` returns a safe reference to `data` by dereferencing the pointer stored in `cached_data`.
- This method uses `unsafe` internally, but the pointer remains valid as long as the `Cache` stays pinned, so the dereference is sound.
5. Modifying and Refreshing Data:
- We modify `data` directly (using `unsafe` and `get_unchecked_mut` to bypass pinning restrictions), then call `refresh_cache` to update the self-referential pointer in `cached_data`.
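One way to keep the `unsafe` out of calling code is to wrap steps 2, 3, and 5 in methods on `Cache`. The sketch below is an assumption about how that could look, not part of the original example: `new_pinned` and `set_data` are hypothetical names, while the rest mirrors the struct described above.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Cache {
    data: String,
    cached_data: Option<*const String>,
    _pinned: PhantomPinned,
}

impl Cache {
    /// Hypothetical constructor: allocates, pins, and wires up the
    /// self-pointer in one step, so an unpinned `Cache` never escapes.
    fn new_pinned(data: String) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(Cache {
            data,
            cached_data: None,
            _pinned: PhantomPinned,
        });
        boxed.as_mut().refresh_cache();
        boxed
    }

    fn refresh_cache(self: Pin<&mut Self>) {
        let ptr: *const String = &self.data;
        // SAFETY: only a field is written; the struct is not moved.
        unsafe { self.get_unchecked_mut().cached_data = Some(ptr) };
    }

    /// Hypothetical setter: replaces `data` in place and refreshes the pointer.
    fn set_data(self: Pin<&mut Self>, new_data: String) {
        // SAFETY: assigning to `data` overwrites the field where it lives;
        // the `Cache` itself stays at the same address.
        let this = unsafe { self.get_unchecked_mut() };
        this.data = new_data;
        this.cached_data = Some(&this.data as *const String);
    }

    fn get_cached_data(&self) -> Option<&String> {
        // SAFETY: `cached_data` is only set from a pinned `self` and always
        // points at this struct's own `data` field.
        self.cached_data.map(|ptr| unsafe { &*ptr })
    }
}

fn main() {
    let mut cache = Cache::new_pinned("Initial Data".to_string());
    println!("{:?}", cache.get_cached_data()); // prints: Some("Initial Data")
    cache.as_mut().set_data("Updated Data".to_string());
    println!("{:?}", cache.get_cached_data()); // prints: Some("Updated Data")
}
```

With this shape, `main` contains no `unsafe` at all; the invariants are stated once, next to the code that relies on them.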
What Would Happen Without Pinning?
Without pinning, the `Cache` struct could be moved in memory after it’s created. When a Rust value is moved, it is effectively relocated to a different memory address. If our `Cache` struct holds a pointer to one of its own fields (like `data` in our example), moving the struct would invalidate that pointer, creating a dangling pointer.
In this scenario:
- Creating the Self-Referential Pointer: Initially, creating the `cached_data` pointer might work because it would correctly point to `data`.
- Moving the Struct: If the `Cache` instance were moved after setting up `cached_data`, the pointer in `cached_data` would still point to the old memory location of `data`, which is now incorrect.
- Dereferencing the Pointer: Attempting to use `cached_data` to access `data` would result in undefined behavior, potentially causing memory corruption, crashes, or other serious issues.
Let’s break down the problems we’d encounter without pinning and why pinning is necessary.
Key Issues Without Pinning
1. Dangling Pointers
A dangling pointer occurs when a pointer references memory that is no longer valid. If the `Cache` struct were moved in memory, the `cached_data` pointer would no longer point to the correct `data` field.
For example:
// Illustration only: assumes an unpinned `Cache` whose `refresh_cache` takes `&mut self`.
let mut cache = Cache::new("Initial Data".to_string());
cache.refresh_cache(); // cached_data now points to &cache.data
// Moving `cache` by rebinding it; the value may now live at a new address
let cache = cache;
// Calling `cache.get_cached_data()` here would dereference the old
// address of `data`, which is undefined behavior.
Because `cache` was moved, `cached_data` would now be an invalid pointer, pointing to the previous address of `data`, which could lead to undefined behavior when dereferenced.
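The stale pointer can be observed without ever dereferencing it. The following sketch uses an unpinned `Cache` and `std::mem::swap`, which moves both values deterministically; `pointer_is_consistent` and `swap_breaks_self_pointer` are hypothetical helpers that only compare addresses.

```rust
/// Unpinned variant: nothing stops this struct from being moved.
struct Cache {
    data: String,
    cached_data: Option<*const String>,
}

impl Cache {
    fn new(data: &str) -> Self {
        Cache { data: data.to_string(), cached_data: None }
    }

    fn refresh_cache(&mut self) {
        self.cached_data = Some(&self.data as *const String);
    }

    /// True if the stored pointer still equals the current address of `data`.
    /// Only addresses are compared; the pointer is never dereferenced.
    fn pointer_is_consistent(&self) -> bool {
        self.cached_data
            .map_or(false, |ptr| ptr == &self.data as *const String)
    }
}

/// Returns (consistent before the swap, consistent after the swap).
fn swap_breaks_self_pointer() -> (bool, bool) {
    let mut first = Cache::new("first");
    let mut second = Cache::new("second");
    first.refresh_cache();
    second.refresh_cache();
    let before = first.pointer_is_consistent() && second.pointer_is_consistent();

    // `swap` moves each struct into the other's storage. The stored pointers
    // travel with the bytes, so each one now refers to the *other* struct.
    std::mem::swap(&mut first, &mut second);
    let after = first.pointer_is_consistent() || second.pointer_is_consistent();
    (before, after)
}

fn main() {
    let (before, after) = swap_breaks_self_pointer();
    println!("consistent before swap: {}", before); // prints: true
    println!("consistent after swap: {}", after); // prints: false
}
```

In this particular case the stale pointers still refer to live memory (the other struct), which is exactly the kind of bug that goes unnoticed: a dereference would quietly read the wrong `data`.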
2. Rust’s Safety Guarantees Broken
Rust’s memory safety model is designed to prevent issues like dangling pointers and invalid references. In safe Rust, the borrow checker effectively rules out self-referential structs that hold references to their own fields, because the compiler cannot guarantee those references remain valid if the struct is moved.
In our example, we circumvented this using `unsafe` and raw pointers. However, without pinning, the structure is inherently unsafe because it relies on an assumption that `data` will never move, which Rust cannot guarantee.
3. Undefined Behavior on Dereference
If `cached_data` becomes a dangling pointer due to a move, dereferencing it leads to undefined behavior. Undefined behavior in Rust can have various consequences, including:
- Memory Corruption: Reading or writing through a stale pointer could touch memory that now belongs to something else, overwriting other data or causing unexpected behavior.
- Crashes: Attempting to access invalid memory could lead to segmentation faults or crashes.
- Silent Bugs: In some cases, undefined behavior might not immediately crash the program but could produce incorrect results, leading to hard-to-trace bugs.
Why Pinning Solves These Issues
Pinning solves these problems by guaranteeing that a value will not move in memory once it’s pinned. With `Pin<Box<Cache>>`, we create a boxed instance of `Cache` whose heap allocation stays put: the `Pin<Box<Cache>>` handle itself can still be passed around, but the `Cache` it points to cannot be moved by safe code, ensuring that any pointers within the struct (like `cached_data`) will always reference valid memory. Rust’s `Pin` type makes self-referential patterns possible to use soundly by enforcing immovability.
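A small sketch of that guarantee: the pinned handle is moved through a function and a `Vec`, and the self-pointer is checked before and after. `new_pinned`, `pointer_is_consistent`, `pass_through`, and `pointer_survives_handle_moves` are hypothetical names introduced for this illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Cache {
    data: String,
    cached_data: Option<*const String>,
    _pinned: PhantomPinned,
}

impl Cache {
    /// Hypothetical constructor returning an already-pinned, initialized Cache.
    fn new_pinned(data: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(Cache {
            data: data.to_string(),
            cached_data: None,
            _pinned: PhantomPinned,
        });
        let ptr: *const String = &boxed.data;
        // SAFETY: only a field is written; the heap value is not moved.
        unsafe { boxed.as_mut().get_unchecked_mut().cached_data = Some(ptr) };
        boxed
    }

    /// Compares addresses only; the pointer is never dereferenced.
    fn pointer_is_consistent(&self) -> bool {
        self.cached_data
            .map_or(false, |ptr| ptr == &self.data as *const String)
    }
}

/// Moves the handle (the `Pin<Box<_>>`), not the heap allocation behind it.
fn pass_through(cache: Pin<Box<Cache>>) -> Pin<Box<Cache>> {
    cache
}

/// Returns (consistent before the moves, consistent after the moves).
fn pointer_survives_handle_moves() -> (bool, bool) {
    let cache = Cache::new_pinned("Initial Data");
    let before = cache.pointer_is_consistent();

    let cache = pass_through(cache); // move into and out of a function
    let mut holder = vec![cache]; // move into a Vec
    let cache = holder.pop().expect("holder has one element"); // and back out

    let after = cache.pointer_is_consistent();
    (before, after)
}

fn main() {
    let (before, after) = pointer_survives_handle_moves();
    println!("consistent before moves: {}", before); // prints: true
    println!("consistent after moves: {}", after); // prints: true
}
```

Only the pointer-sized handle travels; the `Cache` stays in its heap allocation, so the address stored in `cached_data` keeps matching the address of `data`.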
Pinned vs Unpinned in Action
To illustrate how pinning keeps the memory address constant and how, in contrast, an unpinned implementation allows the memory address to change, we can add logging to showcase the memory addresses of both `data` and `cached_data` during execution.
Let’s start by implementing a version with pinning and logging to show the constant memory address, then proceed to an unpinned version where the address might change.
Example 1: Pinned Implementation with Constant Memory Address Logging
In this pinned version, we’ll log the memory address of `data` and show that it remains constant throughout the lifetime of the `Cache` instance.
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Cache {
    data: String,
    cached_data: Option<*const String>,
    _pinned: PhantomPinned, // Prevents the struct from being `Unpin`
}

impl Cache {
    fn new(data: String) -> Self {
        Cache {
            data,
            cached_data: None,
            _pinned: PhantomPinned,
        }
    }

    fn refresh_cache(self: Pin<&mut Self>) {
        let self_ptr: *const String = &self.data;
        // SAFETY: we only write to a field; the pinned struct is never moved.
        // `get_unchecked_mut` consumes `self`, so we keep the returned
        // reference and use it for the logging below.
        let this = unsafe { self.get_unchecked_mut() };
        this.cached_data = Some(self_ptr);
        println!("Pinned data address: {:p}", &this.data);
        println!("Pinned cached_data address: {:p}", self_ptr);
    }

    fn get_cached_data(&self) -> Option<&String> {
        // SAFETY: `cached_data` always points at this pinned struct's `data`.
        self.cached_data.map(|ptr| unsafe { &*ptr })
    }
}

fn main() {
    // Step 1: Pin the Cache instance in memory
    let mut cache = Box::pin(Cache::new("Initial Data".to_string()));
    // Step 2: Refresh the cache to set up the self-referential pointer
    cache.as_mut().refresh_cache();
    // Access the cached data reference
    if let Some(cached_data) = cache.get_cached_data() {
        println!("Accessing cached data: {}", cached_data);
    } else {
        println!("No data in cache.");
    }
    // Update data and refresh the cache
    // SAFETY: the field is overwritten in place; the `Cache` is not moved.
    unsafe {
        cache.as_mut().get_unchecked_mut().data = "Updated Data".to_string();
    }
    cache.as_mut().refresh_cache();
    // Access the updated cached data
    if let Some(cached_data) = cache.get_cached_data() {
        println!("Accessing updated cached data: {}", cached_data);
    }
}
Output
With this pinned version, you’ll see consistent memory addresses for both `data` and `cached_data` before and after updates, since the `Cache` instance cannot move.
Example 2: Unpinned Implementation with Memory Address Logging
Now, let’s implement the same structure without pinning and add similar logging. We’ll see that if the `Cache` instance is moved, the memory addresses might change, demonstrating how an unpinned implementation is unsafe for self-references.
// WARNING: this program deliberately demonstrates an unsound pattern.
// Dereferencing `cached_data` after a move is undefined behavior.
struct Cache {
    data: String,
    cached_data: Option<*const String>,
}

impl Cache {
    fn new(data: String) -> Self {
        Cache {
            data,
            cached_data: None,
        }
    }

    fn refresh_cache(&mut self) {
        let self_ptr: *const String = &self.data;
        self.cached_data = Some(self_ptr);
        println!("Unpinned data address: {:p}", &self.data);
        println!("Unpinned cached_data address: {:p}", self_ptr);
    }

    fn get_cached_data(&self) -> Option<&String> {
        // Not actually safe: nothing guarantees the pointer is still valid.
        self.cached_data.map(|ptr| unsafe { &*ptr })
    }
}

fn main() {
    let mut cache = Cache::new("Initial Data".to_string());
    // Refresh the cache to set up the self-referential pointer
    cache.refresh_cache();
    // Move `cache` by reassigning it to a new variable
    let cache = cache; // This moves `cache`
    // Try accessing the cached data reference (may read through a stale pointer)
    if let Some(cached_data) = cache.get_cached_data() {
        println!("Accessing cached data: {}", cached_data);
    } else {
        println!("No data in cache.");
    }
    // Update data and refresh the cache
    let mut cache = cache; // Moving it again
    cache.data = "Updated Data".to_string();
    cache.refresh_cache();
    // Access the updated cached data
    if let Some(cached_data) = cache.get_cached_data() {
        println!("Accessing updated cached data: {}", cached_data);
    }
}
Explanation of the Output
In this unpinned version, you may observe different memory addresses printed for `data` and `cached_data`:
- The first time `refresh_cache` is called, `data` and `cached_data` will have the same address.
- After the first move (`let cache = cache;`), the `cached_data` pointer may point to an old memory location, and accessing it could lead to undefined behavior.
- Calling `refresh_cache` again may print a new memory address for `data`, showing that without pinning, `data` can move in memory, making the `cached_data` pointer potentially invalid and unsafe to dereference. The compiler is free to optimize a move away, so in a given build the addresses may also happen to match.
Summary
- Pinned Version: The memory address remains constant, ensuring that `cached_data` points to a valid location.
- Unpinned Version: Moving the struct changes the address of `data`, potentially leading to a dangling pointer in `cached_data`.
This comparison highlights why pinning is essential for self-referential structs in Rust. Without pinning, any movement of the struct results in invalid references, leading to unsafe and undefined behaviour.
All the best,
Luis Soares
luis@luissoares.dev
Lead Software Engineer | Blockchain & ZKP Protocol Engineer | 🦀 Rust | Web3 | Solidity | Golang | Cryptography | Author