I've been making a little more progress with text parsing in Rust
using the winnow library. Today I will parse the input format (and
thus provide a very limited spoiler) for Advent of Code 2015 day 16.
I'm still not claiming this is the best way to do things, but
it's a way that works.
An example line from my puzzle input is:
Sue 1: cars: 9, akitas: 3, goldfish: 0
After the ID number, each comma-separated item is a key-value pair,
which I want to put into a HashMap. Here's the target structure:
struct Sue {
    id: u32,
    attr: HashMap<String, u32>,
}
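To make the target concrete, here is a small sketch of the value I'd expect for the example line above, built by hand with no parsing at all. The function name example_sue and the derive line are my own additions for illustration, not part of the puzzle solution.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Sue {
    id: u32,
    attr: HashMap<String, u32>,
}

// Hypothetical helper: builds by hand the value that parsing
// "Sue 1: cars: 9, akitas: 3, goldfish: 0" should produce.
fn example_sue() -> Sue {
    let attr = HashMap::from([
        ("cars".to_string(), 9),
        ("akitas".to_string(), 3),
        ("goldfish".to_string(), 0),
    ]);
    Sue { id: 1, attr }
}
```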
Here I'll parse a set of attributes, separated by commas. Lifetimes in
Rust are still a very new thing to me, but as I understand it the
basic idea here is to say "the output is a reference to parts of the
input, so the input must be kept allocated until I've finished with
the output".
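As a way of checking my own understanding of the lifetime idea, here is a minimal standard-library sketch with no winnow involved. The helper split_pair is hypothetical: it returns a key that is a slice of the input, so the 'a annotation ties the key's validity to the input's.

```rust
// Hypothetical helper, not part of winnow: splits "key: value" and
// returns the key as a slice borrowed from `input`, plus the number.
// The shared lifetime 'a says the key cannot outlive the input.
fn split_pair<'a>(input: &'a str) -> Option<(&'a str, u32)> {
    let (key, value) = input.split_once(':')?;
    let value = value.trim().parse().ok()?;
    Some((key.trim(), value))
}
```

If the String holding the input were dropped while the returned key was still in use, the compiler would reject the program, which is the "input must be kept allocated" rule in action.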
separated() gets me a sequence of things with a common separator: in
this case I want one or more key-value pairs, separated by a comma and
at least one space.
separated_pair() looks at just one of those pairs, a word and a
number separated by a colon and at least one space.
Specifying the output type in the function signature lets Rust work out
which collection separated() should build (here a HashMap) and
assemble everything behind the scenes.
fn parse_attributes<'a>(
    input: &mut &'a str,
) -> ModalResult<HashMap<&'a str, u32>> {
    separated(
        1..,
        separated_pair(alpha1, (":", space1), dec_uint),
        (",", space1),
    )
    .parse_next(input)
}
The line parser looks for the "Sue (number):" part of the line, and
extracts the number, then throws parse_attributes at the rest.
seq! is a macro that lets me specify several fields to parse in a
row, but discard some of them (in this case the fixed text "Sue " and
": ").
fn parse_line(input: &mut &str) -> ModalResult<Sue> {
    let c = seq!(
        _: "Sue ",
        dec_uint,
        _: ": ",
        parse_attributes,
    )
    .parse_next(input)?;
Then, because I don't want to have to keep the input around once I've
finished parsing, I copy all the &str references to distinct Strings to
go in the output HashMap. (This is a thing that serious Rust people
seem to regard as Bad, and I can see the inefficiency, but this is a
pretty tiny problem.)
    let mut p: HashMap<String, u32> = HashMap::new();
    for (k, v) in c.1.iter() {
        p.insert(k.to_string(), *v);
    }
    Ok(Sue { id: c.0, attr: p })
}
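The copying loop could probably also be written as an iterator chain. This is just a sketch of that alternative, using only the standard library; the helper name to_owned_keys is made up for illustration, and it does the same work as the loop above.

```rust
use std::collections::HashMap;

// Hypothetical helper: copies each borrowed key into an owned String,
// so the resulting map no longer depends on the input text.
fn to_owned_keys(borrowed: &HashMap<&str, u32>) -> HashMap<String, u32> {
    borrowed
        .iter()
        .map(|(k, v)| (k.to_string(), *v))
        .collect()
}
```

It still allocates one String per key, so it is no more efficient than the loop, only a little shorter.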
The usual includes are needed at the top, in this case:
use std::collections::HashMap;
use winnow::ascii::{alpha1, dec_uint, space1};
use winnow::combinator::{separated, separated_pair, seq};
use winnow::ModalResult;
use winnow::Parser;