Rust Parsing with Pest and Pest Derive Crates

Pest is a parsing library for Rust that emphasizes simplicity and performance. It is built on Parsing Expression Grammars (PEGs). A PEG describes a language as a set of named rules, each of which says what input it matches. Pest gives you a compact syntax for writing these rules and generates a parser that applies them to text.

Key Features

  • Performance: Pest is designed to be fast and efficient.
  • Simplicity: Its syntax is easy to understand and use.
  • Grammar as Code: Pest integrates directly with Rust code, offering a seamless development experience.

Understanding Pest Derive

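pest_derive is a procedural macro crate used alongside pest. You write the grammar declaratively, usually in a separate .pest file referenced from a #[grammar = "..."] attribute (or inline in Rust code with #[grammar_inline = "..."]). At compile time the derive macro processes this grammar and generates the parsing code, along with a Rule enum that has one variant per rule.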

Why Use Pest Derive?

  • Ease of Use: Writing a parser by hand can be complex and error-prone. pest_derive automates this process.
  • Integration: It integrates tightly with the Rust language, offering a native development experience.

Core Components of Pest

  1. Rules: The basic building blocks of Pest grammar. Each rule corresponds to a pattern that the parser will try to match.
  2. Pairs: When parsing succeeds, Pest returns Pairs, an iterator over Pair values. Each Pair represents the portion of the input matched by a rule, together with that rule and any nested pairs.
  3. Parser: The engine that takes the rules and the input text and produces the parse output.

Getting Started

To use Pest and Pest Derive, add them to your Cargo.toml:

[dependencies] 
pest = "2.1" 
pest_derive = "2.1"

Writing a Simple Grammar

Pest grammars are typically defined in a separate file with a .pest extension. Here's a basic example:

alpha = { 'a'..'z' | 'A'..'Z' } 
digit = { '0'..'9' }

This grammar defines two rules: alpha for alphabetic characters and digit for numeric characters.
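To make the idea of a rule concrete, here is a rough hand-written equivalent of these two rules in plain Rust. This is only an illustration of what each rule matches, not the code Pest generates; the function names and return type are assumptions chosen for the sketch.

```rust
/// Hand-written stand-in for `alpha = { 'a'..'z' | 'A'..'Z' }`.
/// Returns the matched character and the remaining input, or `None`.
fn alpha(input: &str) -> Option<(char, &str)> {
    let c = input.chars().next()?;
    if matches!(c, 'a'..='z' | 'A'..='Z') {
        Some((c, &input[c.len_utf8()..]))
    } else {
        None
    }
}

/// Hand-written stand-in for `digit = { '0'..'9' }`.
fn digit(input: &str) -> Option<(char, &str)> {
    let c = input.chars().next()?;
    if c.is_ascii_digit() {
        Some((c, &input[c.len_utf8()..]))
    } else {
        None
    }
}

fn main() {
    // Each rule consumes exactly one character, like its grammar counterpart.
    println!("{:?}", alpha("example")); // Some(('e', "xample"))
    println!("{:?}", digit("42")); // Some(('4', "2"))
    println!("{:?}", alpha("42")); // None
}
```

Note that both rules match a single character. To match a whole word or number you would repeat them, for example with alpha+ or digit+.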

Using Pest in Rust Code

Here’s how you can use Pest in a Rust program:

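use pest::Parser;        // trait that provides `parse`
use pest_derive::Parser; // derive macro that generates the parser and `Rule`

#[derive(Parser)]
#[grammar = "your_grammar.pest"] // path to your Pest grammar file
struct YourParser;

fn main() {
    // `alpha` matches a single letter, so only the leading "e" is matched here.
    let successful_parse = YourParser::parse(Rule::alpha, "example");
    // handle the parsing result
}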

Advanced Grammar Concepts

Sequences and Choices

You can define sequences and choices in Pest:

sequence_rule = { part1 ~ part2 } 
choice_rule = { option1 | option2 }
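One property of | in a PEG is worth knowing early: the choice is ordered. The first alternative that matches wins, and later alternatives are not tried. The sketch below models ~ and | by hand for string literals. The helpers `literal`, `sequence`, and `choice` are made up for this illustration and are not part of Pest.

```rust
/// Match a literal at the start of `input`, returning the remaining input.
fn literal<'a>(lit: &str, input: &'a str) -> Option<&'a str> {
    input.strip_prefix(lit)
}

/// `first ~ second`: both must match, one after the other.
fn sequence<'a>(first: &str, second: &str, input: &'a str) -> Option<&'a str> {
    let rest = literal(first, input)?;
    literal(second, rest)
}

/// `first | second`: try `first`; fall back to `second` only if it fails.
fn choice<'a>(first: &str, second: &str, input: &'a str) -> Option<&'a str> {
    literal(first, input).or_else(|| literal(second, input))
}

fn main() {
    assert_eq!(sequence("ab", "cd", "abcd"), Some(""));
    assert_eq!(sequence("ab", "cd", "abxx"), None);
    // Ordered choice: "a" matches first, so "ab" is never tried and "b" is left over.
    assert_eq!(choice("a", "ab", "ab"), Some("b"));
    // Putting the longer alternative first consumes the whole input.
    assert_eq!(choice("ab", "a", "ab"), Some(""));
}
```

In practice this means that when one alternative is a prefix of another, the longer one should usually come first.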

Optional and Repeating Patterns

Pest also supports optional and repeating patterns:

optional_rule = { "prefix"? ~ "main" } 
repeating_rule = { "part"* }
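As a rough mental model, ? never fails (it matches zero or one occurrence) and * is greedy (it matches as many occurrences as it can, including none). The plain-Rust sketch below mimics the two rules above for string literals. The helper names `optional` and `repeat` are assumptions for this example, not Pest APIs.

```rust
/// `"lit"?`: consume `lit` if it is present; never fails.
fn optional<'a>(lit: &str, input: &'a str) -> &'a str {
    input.strip_prefix(lit).unwrap_or(input)
}

/// `"lit"*`: consume `lit` as many times as possible.
/// Returns the number of repetitions and the remaining input.
fn repeat<'a>(lit: &str, mut input: &'a str) -> (usize, &'a str) {
    let mut count = 0;
    // An empty literal would never make progress, so it is skipped entirely.
    if lit.is_empty() {
        return (count, input);
    }
    while let Some(rest) = input.strip_prefix(lit) {
        input = rest;
        count += 1;
    }
    (count, input)
}

/// Hand-written stand-in for `optional_rule = { "prefix"? ~ "main" }`.
fn optional_rule(input: &str) -> Option<&str> {
    optional("prefix", input).strip_prefix("main")
}

fn main() {
    assert_eq!(optional_rule("prefixmain"), Some(""));
    assert_eq!(optional_rule("main"), Some(""));
    assert_eq!(optional_rule("prefix"), None);
    // `"part"*` matches zero or more repetitions.
    assert_eq!(repeat("part", "partpartx"), (2, "x"));
    assert_eq!(repeat("part", ""), (0, ""));
}
```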

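Grouping and Predicates

Parentheses group sub-expressions so that an operator applies to the whole group, and predicates look ahead in the input without consuming it (& requires a match, ! requires no match):

grouped_rule = { ("a" ~ "b")+ }
positive_predicate = { &"start" }
negative_predicate = { !"stop" ~ ANY }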
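Predicates are lookahead checks: they test what comes next without consuming it. Pest writes a positive predicate as & in front of an expression and a negative predicate as ! in front of one. A common idiom is (!"\n" ~ ANY)*, which reads "take any character as long as a newline is not next". The sketch below models that by hand; the function names are hypothetical and only illustrate the semantics.

```rust
/// `&"lit"`: succeeds if `lit` comes next, but consumes nothing.
fn positive_lookahead(lit: &str, input: &str) -> bool {
    input.starts_with(lit)
}

/// `!"lit"`: succeeds only if `lit` does NOT come next; consumes nothing.
fn negative_lookahead(lit: &str, input: &str) -> bool {
    !input.starts_with(lit)
}

/// `(!"\n" ~ ANY)*`: keep taking one character while a newline is not next.
/// Returns (matched text, remaining input).
fn until_newline(input: &str) -> (&str, &str) {
    let mut end = 0;
    for (i, c) in input.char_indices() {
        if !negative_lookahead("\n", &input[i..]) {
            break;
        }
        end = i + c.len_utf8();
    }
    input.split_at(end)
}

fn main() {
    assert!(positive_lookahead("start", "start here"));
    assert!(negative_lookahead("start", "stop here"));
    assert_eq!(
        until_newline("host = localhost\nport = 5432"),
        ("host = localhost", "\nport = 5432")
    );
    assert_eq!(until_newline("no newline"), ("no newline", ""));
}
```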

Code Examples

Example 1: Basic Parsing

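use pest::Parser;        // trait that provides `parse`
use pest_derive::Parser; // derive macro that generates the parser and `Rule`

#[derive(Parser)]
#[grammar = "simple.pest"]
struct SimpleParser;

fn parse_input(input: &str) {
    match SimpleParser::parse(Rule::alpha, input) {
        Ok(pairs) => {
            // Each pair is one successful match of the requested rule.
            for pair in pairs {
                println!("Matched: {}", pair.as_str());
            }
        },
        Err(e) => println!("Error: {}", e),
    }
}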

Example 2: Handling Nested Structures

Consider a grammar with nested rules:

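outer = { "start" ~ inner ~ "end" }
inner = { "inner" }

Parsing and iterating through nested structures. Note that inner is a normal rule here; a silent rule, written _{ ... }, matches input but does not produce a pair of its own, so it could not be matched on in the code:

fn process_nested(input: &str) {
    let pairs = SimpleParser::parse(Rule::outer, input).unwrap_or_else(|e| panic!("{}", e));
    for outer in pairs {
        println!("Outer: {}", outer.as_str());
        // The nested `inner` pair is reached through the outer pair's children.
        for pair in outer.into_inner() {
            match pair.as_rule() {
                Rule::inner => println!("Inner: {}", pair.as_str()),
                _ => unreachable!(),
            }
        }
    }
}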

Example 3: A More Complete Working Example

Let’s implement a simple application that parses a custom configuration file format. Our configuration files will have key-value pairs and sections.

1. Define the Configuration File Format

Our configuration file will have the following format:

  • Sections are denoted by square brackets [section].
  • Key-value pairs within sections are in the format key = value.

2. Set Up the Rust Project

First, you need to create a new Rust project and add dependencies.

cargo new pest_example 
cd pest_example

Add Pest and Pest Derive to your Cargo.toml:

[dependencies] 
pest = "2.1" 
pest_derive = "2.1"

3. Define the Grammar
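Create a file named config.pest in the root of your project with the following grammar. If the derive macro reports that it cannot find the file, move it into the src directory; older 2.x releases of pest_derive resolve the grammar path relative to src.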

file = { SOI ~ (section | comment)* ~ EOI }
section = { "[" ~ section_name ~ "]" ~ newline* ~ pair* } 
section_name = @{ ASCII_ALPHANUMERIC+ } 
pair = { key ~ "=" ~ value ~ newline* } 
key = @{ ASCII_ALPHANUMERIC+ } 
value = @{ (!newline ~ ANY)* } 
comment = _{ "#" ~ (!newline ~ ANY)* ~ newline* } 
newline = _{ "\r\n" | "\n" } 
WHITESPACE = _{ " " | "\t" }

4. Implement the Parser in Rust
Now, let’s write the Rust code to parse the configuration file.

Create a file src/main.rs:

use pest::Parser;        // trait that provides `ConfigParser::parse`
use pest_derive::Parser; // derive macro that generates the parser and `Rule`
use std::collections::HashMap;
use std::fs;

#[derive(Parser)]
#[grammar = "config.pest"]
pub struct ConfigParser;

fn main() {
    let unparsed_file = fs::read_to_string("example.config")
        .expect("cannot read file");
    let file = ConfigParser::parse(Rule::file, &unparsed_file)
        .expect("unsuccessful parse") // unwrap the parse result
        .next().unwrap(); // get and unwrap the `file` rule; never fails

    // section name -> (key -> value); every &str borrows from `unparsed_file`
    let mut sections = HashMap::new();
    for record in file.into_inner() {
        match record.as_rule() {
            Rule::section => {
                let mut inner_rules = record.into_inner();
                // The first child of a section is its name.
                let section_name = inner_rules.next().unwrap().as_str();
                let mut pairs = HashMap::new();
                // The remaining children are `pair` rules: a key, then a value.
                for pair in inner_rules {
                    let mut inner_pair = pair.into_inner();
                    let key = inner_pair.next().unwrap().as_str();
                    let value = inner_pair.next().unwrap().as_str();
                    pairs.insert(key, value);
                }
                sections.insert(section_name, pairs);
            }
            Rule::EOI => (),
            _ => unreachable!(),
        }
    }
    println!("{:?}", sections);
}

5. Create an Example Configuration File
Create an example.config file in the root of your project with the following content:

[general]
name = ExampleApp
version = 1.0

[database]
host = localhost
port = 5432

6. Build and Run the Application
Now, you can build and run your application:

cargo run

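When you run the Rust application with the provided example.config file, it prints the parsed sections and their key-value pairs. Because println!("{:?}", ...) writes the map on a single line and HashMap iteration order is not fixed, the sections and keys may appear in a different order on your machine. Laid out for readability, the content is: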

{ 
    "general": { 
        "name": "ExampleApp", 
        "version": "1.0" 
    }, 
    "database": { 
        "host": "localhost", 
        "port": "5432" 
    } 
}

This output is a representation of the parsed configuration file, formatted as a Rust HashMap. Here's a breakdown of what this output signifies:

  1. Top-Level HashMap: The outermost {} encloses a HashMap. Each entry in this map represents a section in your configuration file.
  2. Sections as Keys: "general" and "database" are keys in this HashMap. These correspond to the section names defined in your example.config file. In Pest, these were captured by the section_name rule in your grammar.
  3. Nested HashMaps for Sections: Each section key ("general" and "database") maps to another HashMap. This nested HashMap represents the key-value pairs within that section.
  4. Key-Value Pairs within Sections: Inside each section’s HashMap, the keys and values are the parsed contents of your configuration file. For example, under the "general" section, there are two entries: "name": "ExampleApp" and "version": "1.0". These are the key-value pairs defined in the configuration file under the [general] section.
  5. String Representation: Notice that both keys and values are strings ("name", "ExampleApp", etc.). This is because the parser treats all parsed content as strings. If you need different data types (like integers for the port), you would need to convert them after parsing.
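As a small sketch of that last point, the snippet below converts the port value to a u16 after parsing. The `port_from` helper and its error messages are hypothetical, and the map is filled by hand here as a stand-in for the [database] section produced by the parser.

```rust
use std::collections::HashMap;

/// Hypothetical helper: read `port` from a parsed section as a `u16`.
fn port_from(section: &HashMap<&str, &str>) -> Result<u16, String> {
    // A missing key and a malformed number are reported separately.
    let raw = section.get("port").ok_or("missing `port` key")?;
    raw.trim()
        .parse::<u16>()
        .map_err(|e| format!("invalid port {:?}: {}", raw, e))
}

fn main() {
    // Stand-in for the parsed [database] section.
    let mut database: HashMap<&str, &str> = HashMap::new();
    database.insert("host", "localhost");
    database.insert("port", "5432");

    assert_eq!(port_from(&database), Ok(5432));

    // A value that is not a number is rejected rather than silently accepted.
    database.insert("port", "not-a-port");
    assert!(port_from(&database).is_err());
}
```

Doing the conversion in one place, right after parsing, keeps the rest of the application working with typed values instead of raw strings.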

This output demonstrates how the Pest parser translated the structured text of a configuration file into a structured Rust data type. In the code above that type is HashMap<&str, HashMap<&str, &str>>, because as_str() returns string slices that borrow from the input rather than owned Strings. This is useful for applications where configuration files need to be read and their contents programmatically accessed.
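Because as_str() returns slices that borrow from the input, the maps built in main cannot outlive unparsed_file. If the configuration needs to be stored or returned from a function, one option is to copy it into owned Strings. The sketch below assumes a hypothetical `to_owned_config` helper and two type aliases introduced only for this example.

```rust
use std::collections::HashMap;

type Borrowed<'a> = HashMap<&'a str, HashMap<&'a str, &'a str>>;
type Owned = HashMap<String, HashMap<String, String>>;

/// Hypothetical helper: copy every borrowed slice into an owned `String`.
fn to_owned_config(borrowed: &Borrowed<'_>) -> Owned {
    borrowed
        .iter()
        .map(|(section, pairs)| {
            let pairs: HashMap<String, String> = pairs
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect();
            (section.to_string(), pairs)
        })
        .collect()
}

fn main() {
    // `input` stands in for the contents of the configuration file.
    let input = String::from("database host localhost");
    let parts: Vec<&str> = input.split(' ').collect();

    let mut borrowed: Borrowed<'_> = HashMap::new();
    borrowed.entry(parts[0]).or_default().insert(parts[1], parts[2]);

    let owned = to_owned_config(&borrowed);

    // The owned copy stays valid after the borrowed data and the input are gone.
    drop(borrowed);
    drop(parts);
    drop(input);
    assert_eq!(owned["database"]["host"], "localhost");
}
```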


All the best,

Luis Soares
luis.soares@linux.com
