RogerBW's Blog

The Weekly Challenge 259: Banking Parser
10 March 2024

I’ve been doing the Weekly Challenges. The latest involved date offsets and parser construction. (Note that this ends today.)

Task 1: Banking Day Offset

You are given a start date and offset counter. Optionally you also get bank holiday date list.

Given a number (of days) and a start date, return the number (of days) adjusted to take into account non-banking days. In other words: convert a banking day offset to a calendar day offset.

Non-banking days are: a) Weekends b) Bank holidays

This clearly has functionality in common with challenge 178 part 2, though it doesn't care about time of day. Almost every language I'm using has a date class that can handle day-of-the-week calculations (there's an external library for Lua, though I didn't bother with it here, and I've written my own for PostScript).

Date representations turned out to be hashable in everything except JavaScript.

In Raku: a date parser utility function.

sub parsedate($s) {
    $s ~~ /(<[0..9]>+)\D(<[0..9]>+)\D(<[0..9]>+)/;
    return Date.new($0, $1, $2);
}

sub bankingdayoffset($start, $offset, @bankholidays) {

First, build a set of bank holidays and initialise the working date.

    my $bh = Set(@bankholidays.map({parsedate($_)}));
    my $current = parsedate($start);

Step forward offset days, one at a time.

    for (1 .. $offset) {
        $current = $current.later(days => 1);

If the current date is a bank holiday or a weekend day, step forward until it isn't.

        while ($bh{$current}:exists || $current.day-of-week > 5) {
            $current = $current.later(days => 1);
        }
    }

Format and return the result.

    return $current.yyyy-mm-dd;
}
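For comparison, here is a minimal sketch of the same loop in Rust. It is not the code from the repository: to stay self-contained it avoids a date crate and instead counts days since 1970-01-01 with the usual days-from-civil arithmetic. The names days_from_civil, civil_from_days, parse_date and banking_day_offset are assumptions made for illustration, and the sketch assumes well-formed YYYY-MM-DD input.

```rust
use std::collections::HashSet;

/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as the end of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468
}

/// Inverse of days_from_civil: (year, month, day) for a day count.
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719468;
    let era = z.div_euclid(146097);
    let doe = z.rem_euclid(146097);
    let yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + i64::from(m <= 2);
    (y, m, d)
}

/// Parse "YYYY-MM-DD" into a day count (panics on malformed input).
fn parse_date(s: &str) -> i64 {
    let parts: Vec<i64> = s
        .split('-')
        .map(|p| p.parse().expect("numeric date part"))
        .collect();
    days_from_civil(parts[0], parts[1], parts[2])
}

fn banking_day_offset(start: &str, offset: u32, bank_holidays: &[&str]) -> String {
    let holidays: HashSet<i64> =
        bank_holidays.iter().map(|s| parse_date(s)).collect();
    let mut current = parse_date(start);
    for _ in 0..offset {
        current += 1;
        // 1970-01-01 was a Thursday, so (days + 3) mod 7 gives Monday = 0;
        // values 5 and 6 are Saturday and Sunday.
        while holidays.contains(&current) || (current + 3).rem_euclid(7) >= 5 {
            current += 1;
        }
    }
    let (y, m, d) = civil_from_days(current);
    format!("{:04}-{:02}-{:02}", y, m, d)
}
```

With the challenge's first example, banking_day_offset("2018-06-28", 3, &["2018-07-03"]) gives 2018-07-04; with no holidays it gives 2018-07-03. Storing dates as plain integers sidesteps the hashability question entirely.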

Task 2: Line Parser

You are given a line like below:

{% id field1="value1" field2="value2" field3=42 %}

Where a) "id" can be \w+. b) There can be 0 or more field-value pairs. c) The name of the fields are \w+. d) The values are either number in which case we don't need double quotes or string in which case we need double quotes around them.

The line parser should return structure like below:

{ name => id, fields => { field1 => value1, field2 => value2, field3 => value3, } }

I lost all enthusiasm for doing this in anything except Rust, where all the structs and enums I want come readily to my hand. Yeah, I'm sure I could do it in other languages, but it just didn't feel enjoyable. Even in PostScript.

(Also if I wanted to do this in real life I'd use a parser library such as winnow for Rust. Which would also be hard work but at least wouldn't break randomly later.)

First I need a data structure for the output.

use std::collections::{HashMap, VecDeque};

#[derive(PartialEq, Debug)]
pub struct Lump {
    id: String,
    fields: HashMap<String, String>,
}

The basic approach is a state machine, so we'll need some states.

#[derive(PartialEq, Debug)]
enum State {
    Outside,
    PreID,
    InID,
    InterField,
    FieldName,
    FieldValue,
    FieldValueQuoted,
}

fn lineparser(line: &str) -> Lump {

Split the line into chars and initialise the state machine.

    let mut l = line.chars().collect::<VecDeque<_>>();
    let mut state = State::Outside;

Some convenience variables to track items in progress.

    let mut trail: Vec<char> = Vec::new();
    let mut fieldname = "".to_string();

The output structure.

    let mut out = Lump { id: "".to_string(), fields: HashMap::new() };

Loop over the characters.

    while let Some(mut c) = l.pop_front() {

We're outside and saw a start-entry character

        if state == State::Outside && c == '{' {
            c = l.pop_front().unwrap();

And it was followed by the other half of the start-entry sequence, so look for ID.

            if c == '%' {
                state = State::PreID;
            }

We're looking for an ID (or already in one) and saw a non-space. Store it and move to ID-appending state.

        } else if (state == State::PreID || state == State::InID) && c != ' ' {
            trail.push(c);
            state = State::InID;

We're appending ID and found a space. Stow that value and start looking for fields.

        } else if state == State::InID && c == ' ' {
            out.id = trail.into_iter().collect();
            trail = Vec::new();
            state = State::InterField;

Looking for field names, or already within one, and got a useful character: append it.

        } else if (state == State::InterField || state == State::FieldName)
            && c != ' '
            && c != '='
            && c != '%'
        {
            trail.push(c);
            state = State::FieldName;

Found the end of a field name.

        } else if state == State::FieldName && c == '=' {
            fieldname = trail.into_iter().collect();
            trail = Vec::new();
            state = State::FieldValue;

We don't have a field value, but we find a quote: note it as a quoted value.

        } else if state == State::FieldValue && trail.is_empty() && c == '"' {
            state = State::FieldValueQuoted;

In a field value.

        } else if state == State::FieldValue || state == State::FieldValueQuoted
        {

Handle escaped characters.

            let mut literal = false;
            if c == '\\' {
                c = l.pop_front().unwrap();
                literal = true;
            }

If we're in an unquoted field value and we get a space, or we have a non-literal quotation mark and we're in a quoted field value, store and look for the next field.

            if (c == ' ' && state == State::FieldValue)
                || (c == '"' && state == State::FieldValueQuoted && !literal)
            {
                out.fields
                    .insert(fieldname.clone(), trail.into_iter().collect());
                trail = Vec::new();
                state = State::InterField;
            } else {

Otherwise just append to the current value.

                trail.push(c);
            }
        }
    }

Return the structure.

    out
}

This is not a full validator; it'll allow all sorts of illegal combinations (such as a field value that's neither quoted nor numeric). It doesn't even look for the end tag. But, apologies to the problem setter, it just didn't feel like fun to tweak it further.
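For the record, the two missing checks mentioned here needn't be large. A minimal sketch, not part of the solution above: value_is_legal and has_end_tag are hypothetical helpers, and "number" is assumed to mean one or more ASCII digits, which the task statement doesn't actually pin down.

```rust
/// Hypothetical check: a quoted value is always acceptable; an unquoted
/// one must be a number (assumed here to mean one or more ASCII digits).
fn value_is_legal(raw: &str, quoted: bool) -> bool {
    quoted || (!raw.is_empty() && raw.chars().all(|c| c.is_ascii_digit()))
}

/// Hypothetical check: the line must close with the end-entry sequence,
/// ignoring any trailing whitespace.
fn has_end_tag(line: &str) -> bool {
    line.trim_end().ends_with("%}")
}
```

Wiring these in would mean the parser has to report failure somehow, for instance by returning a Result rather than a bare Lump, and the state machine would need to remember whether each value was quoted at the point where it is stored.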

Full code on GitHub.

See also:
The Weekly Challenge 178: Imaginary Date

