I’ve been doing the Weekly Challenges. The latest involved date offsets and parser construction. (Note that this ends today.)
Task 1: Banking Day Offset
You are given a start date and an offset counter. Optionally you also
get a list of bank holiday dates.
Given a number (of days) and a start date, return the number (of
days) adjusted to take into account non-banking days. In other
words: convert a banking day offset to a calendar day offset.
Non-banking days are: a) Weekends b) Bank holidays
This clearly has functionality in common with 178 part 2, though it
doesn't care about time of day. Almost every language I'm using has a
date class that can handle day of the week calculations (there's an
external library for Lua, though I didn't bother with it here, and
I've written my own for PostScript).
Date representations turned out to be hashable in everything except
JavaScript.
In Raku: a date parser utility function.
sub parsedate($s) {
    $s ~~ /(<[0..9]>+)\D(<[0..9]>+)\D(<[0..9]>+)/;
    return Date.new($0, $1, $2);
}
sub bankingdayoffset($start, $offset, @bankholidays) {
First, build a set of bank holidays and initialise the working date.
    my $bh = Set(@bankholidays.map({parsedate($_)}));
    my $current = parsedate($start);
Step forward offset days, one at a time.
    for (1 .. $offset) {
        $current = $current.later(days => 1);
If the current date is a bank holiday or a weekend day, step forward
until it isn't.
        while ($bh{$current}:exists || $current.day-of-week > 5) {
            $current = $current.later(days => 1);
        }
    }
Format and return the result.
    return $current.yyyy-mm-dd;
}
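The same stepping logic can be sketched in Rust using only the standard library. This is an illustrative sketch rather than the code from the repository: the names parse_date, banking_day_offset, days_from_civil and civil_from_days are hypothetical, and the date arithmetic is the usual days-since-epoch conversion for the proleptic Gregorian calendar instead of a date crate. Dates are held as (year, month, day) tuples, which are hashable and so can go straight into a HashSet.

```rust
use std::collections::HashSet;

/// A calendar date as (year, month, day). Hypothetical helper type for this sketch.
type Ymd = (i64, i64, i64);

/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil((y, m, d): Ymd) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // months counted from March
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of days_from_civil.
fn civil_from_days(z: i64) -> Ymd {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// Pull three runs of digits out of a string, like the Raku regex above.
fn parse_date(s: &str) -> Option<Ymd> {
    let mut parts = s
        .split(|c: char| !c.is_ascii_digit())
        .filter(|p| !p.is_empty())
        .map(|p| p.parse::<i64>());
    let y = parts.next()?.ok()?;
    let m = parts.next()?.ok()?;
    let d = parts.next()?.ok()?;
    Some((y, m, d))
}

/// Step forward `offset` banking days from `start`, skipping weekends and holidays.
/// Returns None if the start date cannot be parsed; unparseable holidays are ignored.
fn banking_day_offset(start: &str, offset: u32, bank_holidays: &[&str]) -> Option<String> {
    let holidays: HashSet<Ymd> = bank_holidays
        .iter()
        .filter_map(|s| parse_date(s))
        .collect();
    let mut current = days_from_civil(parse_date(start)?);
    let is_non_banking = |day: i64| {
        // 1970-01-01 was a Thursday; Monday = 1 .. Sunday = 7.
        let weekday = (day + 3).rem_euclid(7) + 1;
        weekday > 5 || holidays.contains(&civil_from_days(day))
    };
    for _ in 0..offset {
        current += 1;
        while is_non_banking(current) {
            current += 1;
        }
    }
    let (y, m, d) = civil_from_days(current);
    Some(format!("{:04}-{:02}-{:02}", y, m, d))
}
```

Called with a start date of 2018-06-28 (a Thursday), an offset of 3 and a single bank holiday on 2018-07-03, this returns 2018-07-04; with no bank holidays it returns 2018-07-03. Like the Raku version, it never checks whether the start date itself is a banking day.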
Task 2: Line Parser
You are given a line like below:
{% id field1="value1" field2="value2" field3=42 %}
Where a) "id" can be \w+. b) There can be 0 or more field-value
pairs. c) The names of the fields are \w+. d) The values are either
a number, in which case we don't need double quotes, or a string, in
which case we need double quotes around them.
The line parser should return structure like below:
{ name => id, fields => { field1 => value1, field2 => value2, field3 => value3, } }
I lost all enthusiasm for doing this in anything except Rust, where
all the structs and enums I want come readily to my hand. Yeah, I'm
sure I could do it in other languages, but it just didn't feel
enjoyable. Even in PostScript.
(Also if I wanted to do this in real life I'd use a parser library
such as winnow for Rust. Which would also be hard work but at least
wouldn't break randomly later.)
First I need a data structure for the output.
use std::collections::{HashMap, VecDeque};
#[derive(PartialEq, Debug)]
pub struct Lump {
    id: String,
    fields: HashMap<String, String>,
}
The basic approach is a state machine, so we'll need some states.
#[derive(PartialEq, Debug)]
enum State {
    Outside,
    PreID,
    InID,
    InterField,
    FieldName,
    FieldValue,
    FieldValueQuoted,
}
fn lineparser(line: &str) -> Lump {
Split the line into chars and initialise the state machine.
    let mut l = line.chars().collect::<VecDeque<_>>();
    let mut state = State::Outside;
Some convenience variables to track items in progress.
    let mut trail: Vec<char> = Vec::new();
    let mut fieldname = "".to_string();
The output structure.
    let mut out = Lump { id: "".to_string(), fields: HashMap::new() };
Loop over the characters.
    while l.len() > 0 {
        let mut c = l.pop_front().unwrap();
We're outside and saw a start-entry character
        if state == State::Outside && c == '{' {
            c = l.pop_front().unwrap();
And it was followed by the other half of the start-entry sequence, so
look for ID.
            if c == '%' {
                state = State::PreID;
            }
We're looking for an ID (or already in one) and saw a non-space. Store
it and move to ID-appending state.
        } else if (state == State::PreID || state == State::InID) && c != ' ' {
            trail.push(c);
            state = State::InID;
We're appending ID and found a space. Stow that value and start
looking for fields.
        } else if state == State::InID && c == ' ' {
            out.id = trail.into_iter().collect();
            trail = Vec::new();
            state = State::InterField;
Looking for field names, or already within one, and got a useful
character: append it.
        } else if (state == State::InterField || state == State::FieldName)
            && c != ' '
            && c != '='
            && c != '%'
        {
            trail.push(c);
            state = State::FieldName;
Found the end of a field name.
        } else if state == State::FieldName && c == '=' {
            fieldname = trail.into_iter().collect();
            trail = Vec::new();
            state = State::FieldValue;
We don't have a field value, but we find a quote: note it as a quoted value.
        } else if state == State::FieldValue && trail.len() == 0 && c == '"' {
            state = State::FieldValueQuoted;
In a field value.
        } else if state == State::FieldValue || state == State::FieldValueQuoted
        {
Handle escaped characters.
            let mut literal = false;
            if c == '\\' {
                c = l.pop_front().unwrap();
                literal = true;
            }
If we're in an unquoted field value and we get a space, or we have a
non-literal quotation mark and we're in a quoted field value, store
and look for the next field.
            if (c == ' ' && state == State::FieldValue)
                || (c == '"' && state == State::FieldValueQuoted && !literal)
            {
                out.fields
                    .insert(fieldname.clone(), trail.into_iter().collect());
                trail = Vec::new();
                state = State::InterField;
            } else {
Otherwise just append to the current value.
                trail.push(c);
            }
        }
    }
Return the structure.
    out
}
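The escape handling calls unwrap() on pop_front(), so a line that ends in a backslash would panic. One possible way to isolate that step is a helper that returns an Option, sketched here on the assumption that a dangling escape or a missing closing quote should just be reported as a failure. The name read_quoted is hypothetical and not part of the code above.

```rust
use std::collections::VecDeque;

/// Read a quoted value, assuming the opening quote has already been consumed.
/// A backslash makes the next character literal. Returns None if the input
/// runs out before an unescaped closing quote is found.
fn read_quoted(chars: &mut VecDeque<char>) -> Option<String> {
    let mut value = String::new();
    while let Some(c) = chars.pop_front() {
        match c {
            // Escaped character: take whatever follows, or give up if nothing does.
            '\\' => value.push(chars.pop_front()?),
            // Unescaped quote: the value is complete.
            '"' => return Some(value),
            other => value.push(other),
        }
    }
    None
}
```

Whatever follows the closing quote is left in the queue, so the caller can carry on looking for the next field.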
This is not a full validator; it'll allow all sorts of illegal
combinations (such as a field value that's neither quoted nor
numeric). It doesn't even look for the end tag. But, apologies to the
problem setter, it just didn't feel like fun to tweak it further.
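For anyone who did want to tighten it up, a minimal sketch of the two missing checks might look like this. The Value enum and the classify_value and has_end_tag helpers are hypothetical names, and reading "number" as a signed integer is an assumption: the task statement doesn't say whether decimals are allowed.

```rust
/// A field value that has passed validation. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum Value {
    Number(i64),
    Text(String),
}

/// Accept a quoted value as text and an unquoted value only if it parses as
/// an integer; anything else (e.g. a bare word) is rejected with None.
fn classify_value(raw: &str, quoted: bool) -> Option<Value> {
    if quoted {
        Some(Value::Text(raw.to_string()))
    } else {
        raw.parse::<i64>().ok().map(Value::Number)
    }
}

/// Check that the line finishes with the closing half of the tag,
/// ignoring trailing whitespace.
fn has_end_tag(line: &str) -> bool {
    line.trim_end().ends_with("%}")
}
```

Using these would mean changing the fields map to hold Value rather than String, and tracking whether each value was quoted when it is stored.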
Full code on GitHub.