Rust Parsing with Pest and Pest Derive Crates
Pest is a parsing library in Rust that emphasizes simplicity and performance. It uses Parsing Expression Grammar (PEG) as its foundation. PEG is a way of describing a language in a set of rules. Pest makes it easier to define these rules and parse text according to them.
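One PEG property worth keeping in mind is ordered choice: alternatives are tried left to right, and the first one that matches wins. The sketch below imitates that behaviour in plain Rust with no Pest involved; the helper names `lit` and `choice` are made up for illustration.

```rust
/// Matches `literal` at the start of `input` and returns the remaining text.
fn lit<'a>(input: &'a str, literal: &str) -> Option<&'a str> {
    input.strip_prefix(literal)
}

/// Ordered choice: tries each alternative in order and commits to the first match.
/// Returns the matched text and the unparsed remainder.
fn choice<'a>(input: &'a str, alternatives: &[&str]) -> Option<(&'a str, &'a str)> {
    alternatives
        .iter()
        .find_map(|alt| lit(input, alt).map(|rest| (&input[..alt.len()], rest)))
}
```

With the alternatives `["int", "integer"]`, the input `"integer"` matches `"int"` and leaves `"eger"` unparsed, which is why longer alternatives usually need to be listed first in a PEG grammar.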
Key Features
- Performance: Pest is designed to be fast and efficient.
- Simplicity: Its syntax is easy to understand and use.
- Grammar as Code: Pest integrates directly with Rust code, offering a seamless development experience.
Understanding Pest Derive
pest_derive is a procedural macro crate used alongside pest. It lets you define your grammar declaratively, either in a separate .pest file or inline in your Rust code. The macro processes this grammar at compile time and generates the necessary parsing code.
Why Use Pest Derive?
- Ease of Use: Writing a parser by hand can be complex and error-prone. pest_derive automates this process.
- Integration: It integrates tightly with the Rust language, offering a native development experience.
Core Components of Pest
- Rules: The basic building blocks of Pest grammar. Each rule corresponds to a pattern that the parser will try to match.
- Pairs: When parsing succeeds, Pest returns Pairs, an iterator over Pair values. Each Pair represents a portion of the parsed text along with its associated rule.
- Parser: The engine that takes the rules and the input text and produces the parse output.
Getting Started
To use Pest and Pest Derive, add them to your Cargo.toml:
[dependencies]
pest = "2.1"
pest_derive = "2.1"
Writing a Simple Grammar
Pest grammars are defined in a separate file with a .pest extension. Here's a basic example:
alpha = { 'a'..'z' | 'A'..'Z' }
digit = { '0'..'9' }
This grammar defines two rules: alpha, which matches a single ASCII letter, and digit, which matches a single ASCII digit.
Using Pest in Rust Code
Here’s how you can use Pest in a Rust program:
use pest::Parser; // brings the `parse` method into scope
use pest_derive::Parser; // the derive macro

#[derive(Parser)]
#[grammar = "your_grammar.pest"] // path to your Pest grammar file
struct YourParser;

fn main() {
    let successful_parse = YourParser::parse(Rule::alpha, "example");
    // handle the parsing result
}
Advanced Grammar Concepts
Sequences and Choices
You can define sequences and choices in Pest:
sequence_rule = { part1 ~ part2 }
choice_rule = { option1 | option2 }
Optional and Repeating Patterns
Pest also supports optional and repeating patterns:
optional_rule = { "prefix"? ~ "main" }
repeating_rule = { "part"* }
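Grouping and Predicates
Parentheses group sub-expressions so that an operator applies to the whole group, and predicates look ahead without consuming input (& succeeds if the expression matches, ! succeeds if it does not):
grouped_rule = { ("a" ~ "b")+ }
positive_predicate = { &"start" ~ ASCII_ALPHA+ }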
Code Examples
Example 1: Basic Parsing
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "simple.pest"]
struct SimpleParser;

fn parse_input(input: &str) {
    match SimpleParser::parse(Rule::alpha, input) {
        Ok(pairs) => {
            // Each pair is one match of the requested rule.
            for pair in pairs {
                println!("Matched: {}", pair.as_str());
            }
        },
        Err(e) => println!("Error: {}", e),
    }
}
Example 2: Handling Nested Structures
Consider a grammar with nested rules:
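outer = { "start" ~ inner ~ "end" }
inner = { "inner" }
Parsing and iterating through nested structures (inner is a normal rule here, because a silent rule declared with _{ } would produce no pair to inspect):
fn process_nested(input: &str) {
    let pairs = SimpleParser::parse(Rule::outer, input).unwrap_or_else(|e| panic!("{}", e));
    for pair in pairs {
        // The only top-level pair is `outer`; `inner` is one of its children.
        println!("Outer: {}", pair.as_str());
        for child in pair.into_inner() {
            match child.as_rule() {
                Rule::inner => println!("Inner: {}", child.as_str()),
                _ => {}
            }
        }
    }
}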
Example 3: A bit more complex working example
Let’s implement a simple application that parses a custom configuration file format. Our configuration files will have key-value pairs and sections.
1. Define the Configuration File Format
Our configuration file will have the following format:
- Sections are denoted by square brackets: [section].
- Key-value pairs within sections are in the format key = value.
2. Set Up the Rust Project
First, you need to create a new Rust project and add dependencies.
cargo new pest_example
cd pest_example
Add Pest and Pest Derive to your Cargo.toml:
[dependencies]
pest = "2.1"
pest_derive = "2.1"
3. Define the Grammar
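Create a file named config.pest in the root of your project with the following grammar (if the derive macro cannot find it there, which can happen with older pest_derive releases that only look in src/, move it to src/config.pest):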
// `file` must not be silent: the Rust code below expects a `file` pair to iterate over
file = { SOI ~ (section | comment)* ~ EOI }
section = { "[" ~ section_name ~ "]" ~ newline* ~ pair* }
section_name = @{ ASCII_ALPHANUMERIC+ }
pair = { key ~ "=" ~ value ~ newline* }
key = @{ ASCII_ALPHANUMERIC+ }
value = @{ (!newline ~ ANY)* }
comment = _{ "#" ~ (!newline ~ ANY)* ~ newline* }
newline = _{ "\r\n" | "\n" }
WHITESPACE = _{ " " | "\t" }
4. Implement the Parser in Rust
Now, let’s write the Rust code to parse the configuration file.
Replace the contents of src/main.rs (created by cargo new) with:
use pest::Parser;
use pest_derive::Parser;
use std::collections::HashMap;
use std::fs;

#[derive(Parser)]
#[grammar = "config.pest"]
pub struct ConfigParser;

fn main() {
    let unparsed_file = fs::read_to_string("example.config")
        .expect("cannot read file");
    let file = ConfigParser::parse(Rule::file, &unparsed_file)
        .expect("unsuccessful parse") // unwrap the parse result
        .next().unwrap(); // get and unwrap the `file` rule; never fails
    let mut sections = HashMap::new();
    for record in file.into_inner() {
        match record.as_rule() {
            Rule::section => {
                let mut inner_rules = record.into_inner();
                let section_name = inner_rules.next().unwrap().as_str();
                let mut pairs = HashMap::new();
                for pair in inner_rules {
                    let mut inner_pair = pair.into_inner();
                    let key = inner_pair.next().unwrap().as_str();
                    let value = inner_pair.next().unwrap().as_str();
                    pairs.insert(key, value);
                }
                sections.insert(section_name, pairs);
            }
            Rule::EOI => (),
            _ => unreachable!(),
        }
    }
    println!("{:?}", sections);
}
5. Create an Example Configuration File
Create an example.config file in the root of your project with the following content:
[general]
name = ExampleApp
version = 1.0
[database]
host = localhost
port = 5432
6. Build and Run the Application
Now, you can build and run your application:
cargo run
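When you run the Rust application with the provided example.config file, you can expect output equivalent to the following. Note that println! with {:?} prints the map on a single line, and HashMap iteration order is not guaranteed, so the sections and keys may appear in a different order:
{
    "general": {
        "name": "ExampleApp",
        "version": "1.0"
    },
    "database": {
        "host": "localhost",
        "port": "5432"
    }
}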
This output is a representation of the parsed configuration file, formatted as a Rust HashMap. Here's a breakdown of what this output signifies:
- Top-Level HashMap: The outermost {} encloses a HashMap. Each entry in this map represents a section in your configuration file.
- Sections as Keys: "general" and "database" are keys in this HashMap. These correspond to the section names defined in your example.config file. In Pest, these were captured by the section_name rule in your grammar.
- Nested HashMaps for Sections: Each section key ("general" and "database") maps to another HashMap. This nested HashMap represents the key-value pairs within that section.
- Key-Value Pairs within Sections: Inside each section's HashMap, the keys and values are the parsed contents of your configuration file. For example, under the "general" section, there are two entries: "name": "ExampleApp" and "version": "1.0". These are the key-value pairs defined in the configuration file under the [general] section.
- String Representation: Notice that both keys and values are strings ("name", "ExampleApp", etc.). This is because the parser treats all parsed content as strings. If you need different data types (like integers for the port), you would need to convert them after parsing.
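As a sketch of that conversion step, the helper below looks up an entry in the parsed map and turns it into a `u16`. The function name `port_from` is hypothetical, and the map type mirrors the borrowed `&str` keys and values that the program above builds.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

/// Looks up `key` inside `section` and parses the value as a port number.
/// Returns None when the entry is missing and Some(Err(_)) when the value
/// is not a valid u16.
fn port_from(
    sections: &HashMap<&str, HashMap<&str, &str>>,
    section: &str,
    key: &str,
) -> Option<Result<u16, ParseIntError>> {
    sections
        .get(section)?
        .get(key)
        .map(|raw| raw.trim().parse::<u16>())
}
```

For the example file, `port_from(&sections, "database", "port")` gives `Some(Ok(5432))`, while asking for `"host"` gives `Some(Err(_))` because `localhost` is not a number. Keeping "missing" and "malformed" as separate cases lets the caller decide which one deserves a default.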
This output demonstrates how the Pest parser successfully translated the structured text of a configuration file into a structured Rust data type (here a HashMap<&str, HashMap<&str, &str>> whose string slices borrow from the file contents). This is useful for applications where configuration files need to be read and their contents programmatically accessed.
All the best,
Luis Soares
luis.soares@linux.com
Senior Software Engineer | Cloud Engineer | SRE | Tech Lead | Rust | Golang | Java | ML AI & Statistics | Web3 & Blockchain