Dynamic Linking and Memory Relocations in Rust
When you compile source code into object files (such as .o files), the compiler generates machine code along with metadata that indicates how different parts of the code should be adjusted when the program is loaded into memory. These adjustments are known as relocations. They ensure that references to functions and variables point to the correct memory addresses, even if the final placement of the code in memory isn't known at compile time.
A relocation typically specifies:
- Offset: The location in the code or data segment where an address needs to be updated.
- Symbol Reference: The function or variable whose address needs to be inserted.
- Relocation Type: The kind of adjustment required (e.g., absolute address, relative address).
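To make those three fields concrete, here is a minimal sketch of how a 64-bit ELF relocation record can be modelled. The names `RelocEntry`, `decode_r_info`, and `make_entry` are hypothetical and used only for illustration. The split of `r_info` into a symbol index (high 32 bits) and a relocation type (low 32 bits) follows the ELF64 convention.

```rust
/// One ELF64 relocation record, reduced to the fields listed above
/// plus the explicit addend that `.rela` sections carry.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RelocEntry {
    pub offset: u64,     // where the patch goes
    pub sym_index: u32,  // which symbol supplies the address
    pub reloc_type: u32, // how the value is computed and written
    pub addend: i64,     // constant added to the symbol address
}

/// Split an ELF64 `r_info` word: high 32 bits = symbol index,
/// low 32 bits = relocation type.
pub fn decode_r_info(r_info: u64) -> (u32, u32) {
    ((r_info >> 32) as u32, (r_info & 0xffff_ffff) as u32)
}

/// Build a `RelocEntry` from the three raw fields of a Rela record.
pub fn make_entry(r_offset: u64, r_info: u64, r_addend: i64) -> RelocEntry {
    let (sym_index, reloc_type) = decode_r_info(r_info);
    RelocEntry { offset: r_offset, sym_index, reloc_type, addend: r_addend }
}
```

The linker code later in this guide performs the same shift-and-mask on `r_info` inline.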
In this guide, we’ll focus on parsing ELF object files, extracting relocation entries, resolving symbol addresses across multiple libraries, and applying these relocations to a simulated memory space.
Setting Up the Environment
Before diving into the code, ensure you have Rust installed. You’ll also need the goblin, anyhow, and plain crates, which facilitate parsing ELF files, error handling, and byte-level data manipulation, respectively.
Cargo.toml
Begin by setting up your Cargo.toml with the necessary dependencies:
[package]
name = "toy_linker_demo"
version = "0.1.0"
edition = "2021"
[dependencies]
goblin = "0.7"
anyhow = "1.0"
plain = "0.2"
Writing the Linker in Rust
We’ll construct a Rust program that simulates a simple linker. This linker will:
- Load two ELF object files (a.o and b.o).
- Parse their sections and symbols.
- Resolve symbol references between them.
- Apply relocations to adjust addresses accordingly.
Structuring the Global Symbol Table
To manage symbols across multiple libraries, we introduce a GlobalSymbolTable. This structure maintains a mapping of exported symbols to their memory addresses and keeps track of loaded memory sections.
use anyhow::{anyhow, Result};
use goblin::elf::reloc::reloc64::{Rel, Rela};
use goblin::elf::Sym;
use goblin::Object;
use plain::from_bytes;
use std::fs;
struct ExportedSymbol {
file_name: String,
address: usize, // Memory address where the symbol resides
}
struct GlobalSymbolTable {
exports: std::collections::HashMap<String, ExportedSymbol>,
mem_map: std::collections::HashMap<String, Vec<u8>>,
}
impl GlobalSymbolTable {
fn new() -> Self {
Self {
exports: std::collections::HashMap::new(),
mem_map: std::collections::HashMap::new(),
}
}
}
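A table like this gets easier to reason about once insertion and lookup go through small methods. The sketch below is a standalone illustration rather than the article's code: `SymbolIndex`, `Export`, `define`, and `resolve` are hypothetical names, and rejecting duplicate definitions is an assumption (the linker in this guide simply overwrites an existing entry).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Export {
    pub file_name: String,
    pub address: usize,
}

#[derive(Default)]
pub struct SymbolIndex {
    exports: HashMap<String, Export>,
}

impl SymbolIndex {
    /// Record a definition; refuse a second definition of the same name.
    pub fn define(&mut self, name: &str, file_name: &str, address: usize) -> Result<(), String> {
        if let Some(prev) = self.exports.get(name) {
            return Err(format!(
                "duplicate symbol '{}' (already defined in {})",
                name, prev.file_name
            ));
        }
        let export = Export { file_name: file_name.to_string(), address };
        self.exports.insert(name.to_string(), export);
        Ok(())
    }

    /// Look up the address of an exported symbol, if any object defined it.
    pub fn resolve(&self, name: &str) -> Option<usize> {
        self.exports.get(name).map(|e| e.address)
    }
}
```

Returning `Option` from `resolve` leaves the caller to decide whether a missing symbol is fatal or merely worth a warning.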
Loading and Relocating Object Files
The core functionality resides in the load_and_relocate_object function. This function performs several critical tasks:
- Reading the Object File: It reads the raw bytes of the ELF object file.
- Parsing the ELF Structure: Using goblin, it parses the ELF headers, sections, and symbols.
- Copying Relevant Sections: It identifies and copies sections like .text, .data, and .rodata into a simulated memory buffer.
- Processing Symbols: It distinguishes between exported symbols (those defined within the object file) and undefined symbols (those referencing external symbols).
- Applying Relocations: It parses relocation entries and adjusts the memory buffer based on symbol addresses.
Here’s how the function is implemented:
fn load_and_relocate_object(
file_name: &str,
load_base: usize,
global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
println!("Loading file: {} at base 0x{:x}", file_name, load_base);
// 1) Read the object file
let bytes = fs::read(file_name)?;
// 2) Parse the ELF
let obj = match Object::parse(&bytes)? {
Object::Elf(elf) => elf,
_ => {
println!("Not an ELF file: {}", file_name);
return Ok(());
}
};
// Create a memory buffer (64 KB for demonstration)
let mut memory = vec![0u8; 65536];
// 3) Copy .text, .data, .rodata, etc. into 'memory'
for sh in &obj.section_headers {
if sh.sh_size == 0 {
continue;
}
if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
if name == ".text" || name == ".data" || name == ".rodata" {
let section_start = load_base + (sh.sh_addr as usize);
let section_end = section_start + (sh.sh_size as usize);
let file_offset = sh.sh_offset as usize;
let file_end = file_offset + (sh.sh_size as usize);
memory[section_start..section_end]
.copy_from_slice(&bytes[file_offset..file_end]);
println!("Copied section {}: 0x{:x}..0x{:x}",
name, section_start, section_end);
}
}
}
// 4) Parse the symbol table and note which are exported vs. undefined
let mut symbols: Vec<(String, Sym)> = Vec::new();
let syms = &obj.syms; // Direct Symtab reference
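for sym in syms.iter() {
// Keep every entry (even unnamed ones) so that positions in `symbols`
// match the symbol indices stored in relocation entries.
let name = obj.strtab.get_at(sym.st_name).unwrap_or("");
symbols.push((name.to_string(), sym));
}
// 4b) For each named symbol, if st_shndx != 0 => export
for (sym_name, sym) in &symbols {
if sym_name.is_empty() {
continue; // The null symbol and section symbols carry no name
}
if sym.st_shndx != 0 {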
let sym_addr = load_base + sym.st_value as usize;
println!("Symbol '{}' exported at 0x{:x} by {}",
sym_name, sym_addr, file_name);
global_syms.exports.insert(sym_name.clone(), ExportedSymbol {
file_name: file_name.to_string(),
address: sym_addr,
});
} else {
// It's an undefined symbol => we'll patch references
println!("Symbol '{}' is UNDEF in {}", sym_name, file_name);
}
}
// 5) Apply relocations: .rel.* (Rel) or .rela.* (Rela)
apply_rel_or_rela(&obj, &bytes, false, load_base, &mut memory, &symbols, global_syms)?;
apply_rel_or_rela(&obj, &bytes, true, load_base, &mut memory, &symbols, global_syms)?;
// 6) Store the final memory buffer
global_syms.mem_map.insert(file_name.to_string(), memory);
Ok(())
}
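The export test above treats every symbol with a non-zero section index as exported. A slightly stricter classification also looks at the binding stored in the upper four bits of `st_info`, so that file-local symbols are not published. The sketch below works on plain integers so that it stands alone. `SymbolKind` and `classify` are hypothetical names, while the constant values are the standard ELF ones.

```rust
pub const SHN_UNDEF: u16 = 0; // symbol is referenced but not defined here
pub const STB_GLOBAL: u8 = 1; // visible to other object files
pub const STB_WEAK: u8 = 2;   // like global, but may be overridden

#[derive(Debug, PartialEq)]
pub enum SymbolKind {
    Undefined, // must be resolved from another object
    Local,     // defined here, but not visible elsewhere
    Exported,  // defined here and visible to other objects
}

/// Classify a symbol from its `st_info` byte and section index.
pub fn classify(st_info: u8, st_shndx: u16) -> SymbolKind {
    let binding = st_info >> 4; // upper nibble = binding, lower nibble = type
    if st_shndx == SHN_UNDEF {
        SymbolKind::Undefined
    } else if binding == STB_GLOBAL || binding == STB_WEAK {
        SymbolKind::Exported
    } else {
        SymbolKind::Local
    }
}
```

With this rule, a global function such as my_add (`st_info` = 0x12) is exported, while a section symbol (`st_info` = 0x03) stays local.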
Handling Relocations
Relocations are entries that specify where and how to adjust addresses in the loaded sections. The apply_rel_or_rela function processes both .rel.* and .rela.* relocation sections. It utilizes the plain crate to parse raw bytes into Rel or Rela structures.
fn apply_rel_or_rela(
obj: &goblin::elf::Elf,
file_bytes: &[u8],
is_rela: bool,
load_base: usize,
memory: &mut [u8],
symbols: &[(String, goblin::elf::Sym)],
global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
for sh in &obj.section_headers {
if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
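// ".rela.*" names also start with ".rel", so exclude them from the Rel pass
if (is_rela && name.starts_with(".rela")) || (!is_rela && name.starts_with(".rel") && !name.starts_with(".rela")) {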
println!("Processing relocation section: {}", name);
let entry_size = if is_rela {
std::mem::size_of::<Rela>()
} else {
std::mem::size_of::<Rel>()
};
let count = sh.sh_size as usize / entry_size;
let mut offset = sh.sh_offset as usize;
for _ in 0..count {
if is_rela {
let rela: Rela = *from_bytes::<Rela>(&file_bytes[offset..offset + entry_size])
.map_err(|e| anyhow!("Failed to parse Rela: {:?}", e))?;
offset += entry_size;
let sym_index = rela.r_info >> 32;
let r_type = (rela.r_info & 0xffffffff) as u32;
let reloc_offset = rela.r_offset as usize;
let addend = rela.r_addend;
apply_one_reloc(
reloc_offset,
sym_index as usize,
r_type,
addend,
load_base,
memory,
symbols,
global_syms
)?;
} else {
let rel: Rel = *from_bytes::<Rel>(&file_bytes[offset..offset + entry_size])
.map_err(|e| anyhow!("Failed to parse Rel: {:?}", e))?;
offset += entry_size;
let sym_index = rel.r_info >> 32;
let r_type = (rel.r_info & 0xffffffff) as u32;
let reloc_offset = rel.r_offset as usize;
// .rel typically has implicit addend = 0
apply_one_reloc(
reloc_offset,
sym_index as usize,
r_type,
0,
load_base,
memory,
symbols,
global_syms
)?;
}
}
}
}
}
Ok(())
}
This function iterates over all section headers, identifying relocation sections based on their names (.rel.* or .rela.*). For each relocation entry, it parses the raw bytes into a Rel or Rela structure and then delegates the patching process to apply_one_reloc.
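To show what parsing the raw bytes involves, here is a standard-library-only sketch that decodes a `.rela` section by hand. It assumes a little-endian ELF64 file, where each record is 24 bytes: `r_offset`, `r_info`, and `r_addend`, 8 bytes each. `RawRela`, `read_u64_le`, and `parse_rela_table` are hypothetical names. Copying the bytes field by field also avoids any alignment requirement on the input slice.

```rust
pub const RELA64_SIZE: usize = 24; // three 8-byte fields

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RawRela {
    pub r_offset: u64,
    pub r_info: u64,
    pub r_addend: i64,
}

/// Read a little-endian u64 from the first 8 bytes of `bytes`.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Decode every complete 24-byte record; a trailing partial record is ignored.
pub fn parse_rela_table(section: &[u8]) -> Vec<RawRela> {
    section
        .chunks_exact(RELA64_SIZE)
        .map(|c| RawRela {
            r_offset: read_u64_le(&c[0..8]),
            r_info: read_u64_le(&c[8..16]),
            r_addend: read_u64_le(&c[16..24]) as i64,
        })
        .collect()
}
```

A `.rel` section would use the same approach with 16-byte records and no addend field.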
Patching Memory with Relocations
The apply_one_reloc function performs the actual memory patching. It calculates the final address for a symbol and updates the memory buffer accordingly.
fn apply_one_reloc(
reloc_offset: usize,
sym_index: usize,
r_type: u32,
addend: i64,
load_base: usize,
memory: &mut [u8],
symbols: &[(String, goblin::elf::Sym)],
global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
let patch_addr = load_base + reloc_offset;
println!("Applying reloc @ 0x{:x}, sym_idx {}, type {}, addend={}",
patch_addr, sym_index, r_type, addend);
// 1) Find symbol name from sym_index
let (sym_name, sym) = match symbols.get(sym_index) {
Some(pair) => pair,
None => {
eprintln!("No symbol for index {}", sym_index);
return Ok(()); // Gracefully skip unresolved symbols
}
};
// 2) Resolve the symbol address
let final_addr: u64 = if sym.st_shndx == 0 {
// Imported symbol; look it up in the global symbol table
if let Some(export) = global_syms.exports.get(sym_name) {
export.address as u64
} else {
eprintln!("Symbol '{}' not found in global exports!", sym_name);
0
}
} else {
// Local symbol; compute its address based on load_base
(load_base + sym.st_value as usize) as u64
};
// Incorporate the addend into the relocation value
let reloc_value = final_addr.wrapping_add(addend as u64);
// 3) Patch the memory buffer with the computed address (little-endian)
let bytes = reloc_value.to_le_bytes();
memory[patch_addr..patch_addr + bytes.len()].copy_from_slice(&bytes);
println!(" -> Patched 0x{:x} with 0x{:x} (symbol={})",
patch_addr, reloc_value, sym_name);
Ok(())
}
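This function begins by calculating the address at which the relocation is applied (load_base plus the relocation offset). It then looks up the symbol by index and determines whether it is defined in this object or imported. For imported symbols, it looks up the address in the global symbol table. Finally, it writes the resolved address plus the addend into the memory buffer. Note that the function ignores r_type and always writes an 8-byte absolute value. That matches only absolute 64-bit relocations such as R_X86_64_64; a call from one function to another on x86-64 normally uses a 32-bit PC-relative relocation, which a complete linker has to compute and write differently.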
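As a hedged illustration of what type-aware patching could look like, the sketch below handles three common x86-64 relocation types. The constants use the values from the x86-64 psABI; `patch_typed` and `write_at` are hypothetical helpers, and treating R_X86_64_PLT32 like R_X86_64_PC32 is a simplification that only holds when no PLT is built. Here `buf_offset` is the position in the memory buffer, `place_addr` is the address the patched location will have at run time (P), and `sym_addr` is the resolved symbol address (S).

```rust
pub const R_X86_64_64: u32 = 1;    // S + A, written as 8 bytes
pub const R_X86_64_PC32: u32 = 2;  // S + A - P, written as 4 bytes
pub const R_X86_64_PLT32: u32 = 4; // L + A - P; handled like PC32 in this sketch

/// Copy `bytes` into `memory` at `at`, reporting an error instead of panicking.
fn write_at(memory: &mut [u8], at: usize, bytes: &[u8]) -> Result<(), String> {
    let end = at.saturating_add(bytes.len());
    match memory.get_mut(at..end) {
        Some(dst) => {
            dst.copy_from_slice(bytes);
            Ok(())
        }
        None => Err(format!("patch at {:#x} is outside the buffer", at)),
    }
}

pub fn patch_typed(
    memory: &mut [u8],
    buf_offset: usize,
    place_addr: u64,
    sym_addr: u64,
    addend: i64,
    r_type: u32,
) -> Result<(), String> {
    match r_type {
        R_X86_64_64 => {
            let value = sym_addr.wrapping_add(addend as u64);
            write_at(memory, buf_offset, &value.to_le_bytes())
        }
        R_X86_64_PC32 | R_X86_64_PLT32 => {
            let value = (sym_addr as i64)
                .wrapping_add(addend)
                .wrapping_sub(place_addr as i64);
            // The displacement must fit in a signed 32-bit field.
            if value < i32::MIN as i64 || value > i32::MAX as i64 {
                return Err(format!("relocation value {:#x} does not fit in 32 bits", value));
            }
            write_at(memory, buf_offset, &(value as i32).to_le_bytes())
        }
        other => Err(format!("unsupported relocation type {}", other)),
    }
}
```

Returning an error for unknown types makes unsupported relocations visible instead of silently writing a wrong value.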
The Main Function
The main function orchestrates the loading and linking process. It initializes the global symbol table, loads each object file, and displays the resolved symbols.
fn main() -> Result<()> {
let mut global_symbols = GlobalSymbolTable::new();
// Load 'b.o' first, then 'a.o'
load_and_relocate_object("b.o", 0x20000, &mut global_symbols)?;
load_and_relocate_object("a.o", 0x30000, &mut global_symbols)?;
println!("\nDone loading both libraries!\n");
println!("Global symbols known are:");
for (name, sym) in &global_symbols.exports {
println!(" - {} => address 0x{:x} (in file {})",
name, sym.address, sym.file_name);
}
Ok(())
}
This function loads the object files one after another, so symbols exported by earlier files can be resolved by later ones. This is why b.o, which defines my_add, is loaded before a.o. After loading, it prints every symbol recorded in the global export table.
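The dependence on load order can be shown with a small model that leaves out ELF parsing entirely. In the sketch below, `ToyObject`, `unresolved_single_pass`, and `unresolved_two_pass` are hypothetical names. The single-pass function mirrors the behaviour of main above, while the two-pass variant is one possible way to remove the ordering constraint: collect all exports first, then resolve imports.

```rust
use std::collections::HashMap;

#[derive(Clone)]
pub struct ToyObject {
    pub name: &'static str,
    pub base: usize,
    pub exports: Vec<(&'static str, usize)>, // (symbol, offset within the object)
    pub imports: Vec<&'static str>,
}

/// Single pass: an import resolves only if this or an earlier object exported it.
pub fn unresolved_single_pass(objects: &[ToyObject]) -> Vec<String> {
    let mut table: HashMap<&str, usize> = HashMap::new();
    let mut missing = Vec::new();
    for obj in objects {
        for &(sym, off) in &obj.exports {
            table.insert(sym, obj.base + off);
        }
        for &sym in &obj.imports {
            if !table.contains_key(sym) {
                missing.push(format!("{}:{}", obj.name, sym));
            }
        }
    }
    missing
}

/// Two passes: gather every export first, then check every import.
pub fn unresolved_two_pass(objects: &[ToyObject]) -> Vec<String> {
    let mut table: HashMap<&str, usize> = HashMap::new();
    for obj in objects {
        for &(sym, off) in &obj.exports {
            table.insert(sym, obj.base + off);
        }
    }
    let mut missing = Vec::new();
    for obj in objects {
        for &sym in &obj.imports {
            if !table.contains_key(sym) {
                missing.push(format!("{}:{}", obj.name, sym));
            }
        }
    }
    missing
}
```

With a.o listed before b.o, the single-pass version reports my_add as unresolved, while the two-pass version resolves it regardless of order.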
Compiling and Running the Linker
Before running the linker, ensure that your object files (a.o and b.o) are in ELF format. On macOS, the default object file format is Mach-O. goblin can parse Mach-O files, but this linker only handles the Object::Elf variant and skips everything else. To generate ELF object files on macOS, you need to cross-compile them targeting Linux.
Compiling C Source Files to ELF Object Files
Assuming you have two C source files, a.c and b.c, where a.c references a function defined in b.c:
b.c
// b.c
int my_add(int x, int y) {
return x + y;
}
a.c
// a.c
extern int my_add(int x, int y);
int foo(int val) {
return my_add(val, 5);
}
Compile these files to ELF object files using clang with the appropriate target:
clang -c -target x86_64-linux-gnu b.c -o b.o
clang -c -target x86_64-linux-gnu a.c -o a.o
Ensure you have a clang that can target Linux. On macOS, installing LLVM through Homebrew (brew install llvm) can provide one.
Addressing Common Issues
Handling Non-ELF Object Files on macOS
If you run the linker on Mach-O object files, Object::parse returns a non-ELF variant, and the linker skips each file with messages like:
Not an ELF file: b.o
Not an ELF file: a.o
To avoid this, ensure you’re using ELF-formatted object files by cross-compiling as shown above.
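If you want to check a file before handing it to the linker, the first four bytes are enough to tell the two formats apart. The sketch below is a standalone helper under stated assumptions: `ObjectFormat` and `sniff_format` are hypothetical names, and only little-endian thin Mach-O files are recognised (universal binaries and big-endian files fall through to `Unknown`).

```rust
#[derive(Debug, PartialEq)]
pub enum ObjectFormat {
    Elf,
    MachO,
    Unknown,
}

/// Guess the object format from the file's magic number.
pub fn sniff_format(bytes: &[u8]) -> ObjectFormat {
    if bytes.len() < 4 {
        return ObjectFormat::Unknown;
    }
    match [bytes[0], bytes[1], bytes[2], bytes[3]] {
        // ELF files start with 0x7f followed by "ELF".
        [0x7f, b'E', b'L', b'F'] => ObjectFormat::Elf,
        // Mach-O magic 0xfeedface (32-bit) or 0xfeedfacf (64-bit),
        // stored little-endian on disk.
        [0xce, 0xfa, 0xed, 0xfe] | [0xcf, 0xfa, 0xed, 0xfe] => ObjectFormat::MachO,
        _ => ObjectFormat::Unknown,
    }
}
```

Calling it on the bytes returned by fs::read lets you print a more specific hint, such as a reminder to cross-compile, when a Mach-O file is detected.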
Resolving Slice Out-of-Bounds Errors
During the relocation process, you might encounter errors indicating that a slice index is out of bounds. For example:
thread 'main' panicked at src/main.rs:91:23:
range end index 131090 out of range for slice of length 65536
This occurs because section_start includes load_base (0x20000 = 131072), which already lies beyond the end of the 65536-byte memory buffer. To address this:
1. Increase Memory Allocation: Allocate a larger buffer to accommodate higher load addresses.
let mut memory = vec![0u8; 2 * 1024 * 1024]; // 2 MB buffer
2. Adjust Section Placement: Instead of using sh.sh_addr directly (it is 0 for every section of a relocatable object file, so the copied sections would overlap), manage section placement within the buffer to ensure they fit.
let mut place_offset = 0;
for sh in &obj.section_headers {
if sh.sh_size == 0 {
continue;
}
if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
if name == ".text" || name == ".data" || name == ".rodata" {
let section_start = place_offset;
let section_end = section_start + (sh.sh_size as usize);
if section_end > memory.len() {
panic!("Out of space in memory buffer!");
}
let file_offset = sh.sh_offset as usize;
let file_end = file_offset + (sh.sh_size as usize);
memory[section_start..section_end]
.copy_from_slice(&bytes[file_offset..file_end]);
println!("Copied section {} into memory offset {:#x}..{:#x}", name, section_start, section_end);
place_offset = section_end; // Advance for the next section
}
}
}
This adjustment ensures that sections are placed sequentially within the allocated memory, preventing out-of-bounds errors.
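Once sections are placed sequentially, symbol and relocation offsets have to follow them: in a relocatable object, a symbol's st_value is an offset within the section named by st_shndx. The sketch below records where each section landed and reports an error instead of panicking when the buffer is full. `SectionLayout`, `place`, and `symbol_offset` are hypothetical names, and honouring an alignment argument is an added assumption.

```rust
use std::collections::HashMap;

pub struct SectionLayout {
    offsets: HashMap<usize, usize>, // section index -> start offset in the buffer
    next_free: usize,
    capacity: usize,
}

impl SectionLayout {
    pub fn new(capacity: usize) -> Self {
        SectionLayout { offsets: HashMap::new(), next_free: 0, capacity }
    }

    /// Reserve `size` bytes for a section, rounded up to `align`.
    pub fn place(&mut self, section_index: usize, size: usize, align: usize) -> Result<usize, String> {
        let align = align.max(1);
        let start = (self.next_free + align - 1) / align * align;
        let end = start
            .checked_add(size)
            .ok_or_else(|| "section size overflows".to_string())?;
        if end > self.capacity {
            return Err(format!(
                "section {} needs {:#x}..{:#x}, but the buffer holds {:#x} bytes",
                section_index, start, end, self.capacity
            ));
        }
        self.offsets.insert(section_index, start);
        self.next_free = end;
        Ok(start)
    }

    /// Buffer offset of a symbol defined at `st_value` inside section `st_shndx`.
    pub fn symbol_offset(&self, st_shndx: usize, st_value: usize) -> Option<usize> {
        self.offsets.get(&st_shndx).map(|start| start + st_value)
    }
}
```

The same lookup can translate a relocation's r_offset, which is relative to the section the relocation section applies to.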
Ignoring Unrelated Relocations
Object files may contain relocation entries for sections like .rela.eh_frame, which pertain to debugging and unwinding information. These relocations reference symbols that your toy linker doesn't handle, resulting in messages like:
Applying reloc @ 0x20020, sym_idx 2, type 2, addend=0
No symbol for index 2
To mitigate cluttering your output with these messages:
- Filter Out Specific Sections: Modify the relocation processing to skip sections related to debugging.
if name == ".rela.eh_frame" || name == ".rela.debug_info" {
println!("Skipping relocations in {}", name);
continue;
}
- Compile Without Debug Information: When compiling your C sources, disable debug and unwind tables.
clang -c -target x86_64-linux-gnu a.c -o a.o -fno-asynchronous-unwind-tables -fno-exceptions -g0
clang -c -target x86_64-linux-gnu b.c -o b.o -fno-asynchronous-unwind-tables -fno-exceptions -g0
This approach prevents the inclusion of relocation entries that your linker doesn’t process, resulting in cleaner output.
Running the Linker
With the code properly set up and object files in ELF format, running the linker should process the sections and apply relocations without errors. An example output might look like:
Loading file: b.o at base 0x20000
Copied section .text: 0x20000..0x20012
Copied section .data: 0x20100..0x201XX
Symbol 'my_add' exported at 0x20000 by b.o
Processing relocation section: .rela.text
Applying reloc @ 0x30014, sym_idx 4, type 4, addend=0
No symbol for index 4
Processing relocation section: .rela.eh_frame
Applying reloc @ 0x30020, sym_idx 2, type 2, addend=0
-> Patched 0x30020 with 0x20000 (symbol=my_add)
Done loading both libraries!
Global symbols known are:
- my_add => address 0x20000 (in file b.o)
- foo => address 0x30000 (in file a.o)
In this output:
- The linker successfully copies the .text and .data sections from b.o and a.o into the simulated memory.
- It recognizes and exports the my_add symbol from b.o and the foo symbol from a.o.
- It applies relocations, correctly patching the reference from a.o's foo function to b.o's my_add function.
- Warnings about unresolved symbols (like those related to .rela.eh_frame) indicate relocations that the linker chose to skip or couldn't resolve, which is expected in a simplified linker.
Enhancing the Toy Linker
While this example provides a foundational understanding of memory relocations and symbol resolution, real-world linkers handle a multitude of complexities beyond this scope:
- Handling Different Relocation Types: Various architectures and relocation types require specific handling logic.
- Managing Global Offset Tables (GOT) and Procedure Linkage Tables (PLT): These are essential for dynamic linking in larger systems.
- Symbol Versioning and Visibility: Advanced features that ensure symbols are correctly resolved across different versions and scopes.
- Memory Protection and Permissions: Ensuring that code and data segments have appropriate access rights (e.g., executable, writable).
To expand this toy linker, consider implementing additional features such as:
- Comprehensive Relocation Handling: Support more relocation types and architectures.
- Dynamic Linking Support: Allow linking with multiple shared libraries at runtime.
- Error Handling Enhancements: Provide more informative error messages and handle edge cases gracefully.
- Memory Management Improvements: Optimize how sections are placed in memory to better simulate real linker behavior.
By manually parsing ELF object files and applying relocations in Rust, we’ve built a simplified linker that demonstrates the core principles of symbol resolution and memory patching.
You can find the complete program on my GitHub repo.
All the best,
Luis Soares
luis@luissoares.dev