RogerBW's Blog: The Weekly Challenge 259: Banking Parser

I’ve been doing the Weekly Challenges. The latest involved date offsets and parser construction. (Note that this ends today.)

Task 1: Banking Day Offset

You are given a start date and an offset counter. Optionally you also get a list of bank holiday dates.

Given a number (of days) and a start date, return the number (of days) adjusted to take into account non-banking days. In other words: convert a banking day offset to a calendar day offset.

Non-banking days are: a) Weekends b) Bank holidays

This clearly has functionality in common with challenge 178 part 2, though it doesn't care about time of day. Almost every language I'm using has a date class that can handle day of the week calculations (there's an external library for Lua, though I didn't bother with it here, and I've written my own for PostScript).

Date representations turned out to be hashable in everything except JavaScript.

In Raku: a date parser utility function.

sub parsedate($s) {
    $s ~~ /(<[0..9]>+)\D(<[0..9]>+)\D(<[0..9]>+)/;
    return Date.new($0, $1, $2);
}

sub bankingdayoffset($start, $offset, @bankholidays) {

First, build a set of bank holidays and initialise the working date.

    my $bh = Set(@bankholidays.map({parsedate($_)}));
    my $current = parsedate($start);

Step forward offset days, one at a time.

    for (1 .. $offset) {
        $current = $current.later(days => 1);

If the current date is a bank holiday or a weekend day, step forward until it isn't.

        while ($bh{$current}:exists || $current.day-of-week > 5) {
            $current = $current.later(days => 1);
        }
    }

Format and return the result.

    return $current.yyyy-mm-dd;
}
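
For comparison, here is a minimal Rust sketch of the same loop, assuming only the standard library is available. Rust's std has no calendar type, so this swaps in the well-known "days from civil" arithmetic to turn dates into day numbers; a real program would more likely use a date crate. The function names (`banking_day_offset`, `parse_day` and friends) are illustrative, not taken from the repository code, and nothing here validates month or day ranges.

```rust
use std::collections::HashSet;

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the standard "days from civil" arithmetic).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day number.
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// Parse "YYYY-MM-DD" (any non-digit separator) into a day number.
/// Assumption: no range checking on month or day.
fn parse_day(s: &str) -> Option<i64> {
    let mut parts = s
        .split(|c: char| !c.is_ascii_digit())
        .filter(|p| !p.is_empty())
        .map(|p| p.parse::<i64>().ok());
    let (y, m, d) = (parts.next()??, parts.next()??, parts.next()??);
    Some(days_from_civil(y, m, d))
}

/// Saturday or Sunday? Day 0 (1970-01-01) was a Thursday.
fn is_weekend(day: i64) -> bool {
    matches!((day + 4).rem_euclid(7), 0 | 6) // 0 = Sunday, 6 = Saturday
}

fn banking_day_offset(start: &str, offset: u32, holidays: &[&str]) -> Option<String> {
    // Set of holiday day numbers; any unparseable date makes the whole call fail.
    let bh = holidays
        .iter()
        .map(|h| parse_day(h))
        .collect::<Option<HashSet<i64>>>()?;
    let mut current = parse_day(start)?;
    for _ in 0..offset {
        // Step one day, then keep stepping past weekends and holidays.
        current += 1;
        while bh.contains(&current) || is_weekend(current) {
            current += 1;
        }
    }
    let (y, m, d) = civil_from_days(current);
    Some(format!("{:04}-{:02}-{:02}", y, m, d))
}
```

For instance, `banking_day_offset("2018-06-28", 3, &["2018-07-03"])` starts on a Thursday, steps to Friday 29 June, then Monday 2 July, then skips the holiday on Tuesday and lands on `2018-07-04`.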

Task 2: Line Parser

You are given a line like below:

{% id field1="value1" field2="value2" field3=42 %}

Where a) "id" can be \w+. b) There can be 0 or more field-value pairs. c) The names of the fields are \w+. d) The values are either numbers, in which case they don't need double quotes, or strings, in which case they need double quotes around them.

The line parser should return structure like below:

{ name => id, fields => { field1 => value1, field2 => value2, field3 => value3, } }

I lost all enthusiasm for doing this in anything except Rust, where all the structs and enums I want come readily to my hand. Yeah, I'm sure I could do it in other languages, but it just didn't feel enjoyable. Even in PostScript.

(Also if I wanted to do this in real life I'd use a parser library such as winnow for Rust. Which would also be hard work but at least wouldn't break randomly later.)

First I need a data structure for the output.

use std::collections::{HashMap, VecDeque};

#[derive(PartialEq, Debug)]
pub struct Lump {
    id: String,
    fields: HashMap<String, String>,
}

The basic approach is a state machine, so we'll need some states.

#[derive(PartialEq, Debug)]
enum State {
    Outside,
    PreID,
    InID,
    InterField,
    FieldName,
    FieldValue,
    FieldValueQuoted,
}

fn lineparser(line: &str) -> Lump {

Split the line into chars and initialise the state machine.

    let mut l = line.chars().collect::<VecDeque<_>>();
    let mut state = State::Outside;

Some convenience variables to track items in progress.

    let mut trail: Vec<char> = Vec::new();
    let mut fieldname = "".to_string();

The output structure.

    let mut out = Lump { id: "".to_string(), fields: HashMap::new() };

Loop over the characters.

    while let Some(mut c) = l.pop_front() {

We're outside and saw a start-entry character.

        if state == State::Outside && c == '{' {
            c = l.pop_front().unwrap();

And it was followed by the other half of the start-entry sequence, so look for ID.

            if c == '%' {
                state = State::PreID;
            }

We're looking for an ID (or already in one) and saw a non-space. Store it and move to ID-appending state.

        } else if (state == State::PreID || state == State::InID) && c != ' ' {
            trail.push(c);
            state = State::InID;

We're appending ID and found a space. Stow that value and start looking for fields.

        } else if state == State::InID && c == ' ' {
            out.id = trail.into_iter().collect();
            trail = Vec::new();
            state = State::InterField;

Looking for field names, or already within one, and got a useful character: append it.

        } else if (state == State::InterField || state == State::FieldName)
            && c != ' '
            && c != '='
            && c != '%'
        {
            trail.push(c);
            state = State::FieldName;

Found the end of a field name.

        } else if state == State::FieldName && c == '=' {
            fieldname = trail.into_iter().collect();
            trail = Vec::new();
            state = State::FieldValue;

We don't have a field value, but we find a quote: note it as a quoted value.

        } else if state == State::FieldValue && trail.is_empty() && c == '"' {
            state = State::FieldValueQuoted;

In a field value.

        } else if state == State::FieldValue || state == State::FieldValueQuoted
        {

Handle escaped characters.

            let mut literal = false;
            if c == '\\' {
                c = l.pop_front().unwrap();
                literal = true;
            }

If we're in an unquoted field value and we get a space, or we have a non-literal quotation mark and we're in a quoted field value, store and look for the next field.

            if (c == ' ' && state == State::FieldValue)
                || (c == '"' && state == State::FieldValueQuoted && !literal)
            {
                out.fields
                    .insert(fieldname.clone(), trail.into_iter().collect());
                trail = Vec::new();
                state = State::InterField;
            } else {

Otherwise just append to the current value.

                trail.push(c);
            }
        }
    }

Return the structure.

    out
}
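
As a cross-check, the same state machine can be sketched more compactly as a single `match` on the (state, character) pair. This is a self-contained variant rather than the code above: the names `Parsed`, `St` and `parse_line` are made up for the example, it never panics on truncated input, and its behaviour on malformed lines may differ in detail.

```rust
use std::collections::HashMap;
use std::mem::take;

#[derive(Debug, PartialEq)]
struct Parsed {
    id: String,
    fields: HashMap<String, String>,
}

#[derive(Clone, Copy)]
enum St { Outside, PreId, InId, InterField, FieldName, FieldValue, Quoted }

fn parse_line(line: &str) -> Parsed {
    let mut chars = line.chars();
    let mut st = St::Outside;
    let mut buf = String::new(); // item currently being collected
    let mut name = String::new(); // most recent field name
    let mut out = Parsed { id: String::new(), fields: HashMap::new() };

    while let Some(c) = chars.next() {
        st = match (st, c) {
            // "{" must be followed by "%" to open an entry.
            (St::Outside, '{') => {
                if chars.next() == Some('%') { St::PreId } else { St::Outside }
            }
            // Non-space while looking for (or inside) the ID.
            (St::PreId | St::InId, c) if c != ' ' => { buf.push(c); St::InId }
            // Only a space reaches this arm: the ID is complete.
            (St::InId, _) => { out.id = take(&mut buf); St::InterField }
            // Useful character for a field name.
            (St::InterField | St::FieldName, c) if !" =%".contains(c) => {
                buf.push(c);
                St::FieldName
            }
            (St::FieldName, '=') => { name = take(&mut buf); St::FieldValue }
            // Opening quote at the very start of a value.
            (St::FieldValue, '"') if buf.is_empty() => St::Quoted,
            // Space ends an unquoted value; quote ends a quoted one.
            (St::FieldValue, ' ') | (St::Quoted, '"') => {
                out.fields.insert(take(&mut name), take(&mut buf));
                St::InterField
            }
            // Inside a value: a backslash makes the next character literal.
            (St::FieldValue | St::Quoted, c) => {
                let c = if c == '\\' { chars.next().unwrap_or(c) } else { c };
                buf.push(c);
                st
            }
            // Anything else leaves the state unchanged.
            (s, _) => s,
        };
    }
    out
}
```

Calling `parse_line` on the sample line from the task yields an id of `id` and three fields, with `field3` stored as the string `42`, since values are kept as strings here just as in `Lump`.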

This is not a full validator; it'll allow all sorts of illegal combinations (such as a field value that's neither quoted nor numeric). It doesn't even look for the end tag. But, apologies to the problem setter, it just didn't feel like fun to tweak it further.
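
To give an idea of what the missing checks might look like, here is a small hedged sketch. It works on the raw value token as it appears in the line (quotes included), which is not how the parser above stores values, so it would need wiring in. The names `ValueKind`, `classify_value` and `has_end_tag` are hypothetical, "number" is assumed to mean unsigned decimal digits (the task doesn't say), and escaped quotes are ignored.

```rust
#[derive(Debug, PartialEq)]
enum ValueKind {
    Number,
    Quoted,
    Invalid,
}

/// Classify a raw value token. Assumption: numbers are plain ASCII digits,
/// and a quoted value simply starts and ends with a double quote.
fn classify_value(raw: &str) -> ValueKind {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        ValueKind::Quoted
    } else if !raw.is_empty() && raw.chars().all(|c| c.is_ascii_digit()) {
        ValueKind::Number
    } else {
        ValueKind::Invalid
    }
}

/// Does the line finish with the end-entry sequence (ignoring trailing whitespace)?
fn has_end_tag(line: &str) -> bool {
    line.trim_end().ends_with("%}")
}
```

A stricter parser could reject any line where `has_end_tag` is false or any value comes back as `ValueKind::Invalid`.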

Full code on github.
