RogerBW's Blog

Perl Weekly Challenge 38: date parsing and word games 18 December 2019

I’ve been doing the Perl Weekly Challenges. Last week's was about parsing an odd date format and generating words in a Scrabble-ish manner.

The date parser is mostly about validating the parameters.

foreach my $dc (@ARGV) {
  unless (length($dc)==7) {
    warn "$dc is wrong length\n";
    next;
  }
  unless ($dc =~ /^[0-9]+$/) {
    warn "$dc has non-digit characters\n";
    next;
  }

That's as much as we can check without breaking it down into individual fields…

  $dc =~ /^(.)(..)(..)(..)$/;
  my ($cen,$year,$month,$day)=($1,$2,$3,$4);
  if ($cen==2) {
    $year+=1900;
  } elsif ($cen==1) {
    $year+=2000;
  } else {
    warn "$dc has invalid century digit $cen\n";
    next;
  }

Yes, that really is how the century parameter is meant to be interpreted.

Any pair of digits is valid for a year, so validate the month…

  if ($month<1 || $month>12) {
    warn "$dc has invalid month $month\n";
    next;
  }

and now try to parse what we have as a date, which will give an undefined value if the day number is wrong; so we test that rather than worrying about calculating month lengths ourselves.

  my $d;
  eval {
    $d=timelocal(0,0,12,$day,$month-1,$year);
  };
  unless (defined $d) {
    warn "$dc has invalid day $day\n";
    next;
  }

Then dump the output in the required format.

  print strftime('%Y-%m-%d',localtime($d)),"\n";
}

In Perl6 this is mostly similar until the end, where the exception handling is different. I don't want the program to die with its own error; I want it to show mine, and continue in case there are more dates to check.

  my $d;
  CATCH {
    $d=Date.new($year,$month,$day);
  }
  unless (defined $d) {
    warn "$dc has invalid day $day\n";
    next;
  }

Finally, Perl6 date formatting. (Yeah, but where's my strftime?)

  say $d.yyyy-mm-dd;

The word game problem gives a series of tile counts and values (e.g. 8 A tiles worth 1 point each); the problem is to draw 7 at random, then make the highest-scoring word out of them (not necessarily using all of them).

my $l=7;

my %tilecount=(
  A => 8,

[...] );

my %tilevalue=(
  A => 1,

[...] );

For the draw, we turn the tilecount hash into a single long list, shuffle it, and throw away all but the first 7 entries. (shuffle is from List::Util.)

my @bag=shuffle map {($_) x $tilecount{$_}} keys %tilecount;
splice @bag,$l;

Load a candidate word list. (I used a copy of SOWPODS I have hanging around, which is my standard reference for word validation problems.) We ignore anything longer than 7 characters.

my %w;
open I,'<','wordlist';
while (<I>) {
  chomp;
  if (length($_)<=$l) {
    $w{uc($_)}=1;
  }
}
close I;

Set up the accumulators for tracking high-scoring words.

my $maxscore=0;
my @maxcandidate;
my %tried;

Set up the variables for my permutation generator (I'll be using the Steinhaus-Johnson-Trotter algorithm with Even's optimisation). @permute contains a set of indices into @bag.

(There's an obvious alternative approach to the problem, which I'll come back to at the end, but let's stick with the permuter for now.)

my @permute=(0..$#bag);
my @dir=(-1) x @permute;
$dir[0]=0;

while (1) {

For each permutation, we build a list @candidate consisting of the letters in order. Then we try to validate that as a word. (We also skip over any words we've seen before, since they may appear multiple times as different candidates get shortened; for example, the permutations ABCDEFG and ABCGEFD would both produce ABC if no longer candidate were valid.)

  my @candidate=map {$bag[$_]} @permute;
  while (@candidate) {
    my $candidate=join('',@candidate);
    if (exists $tried{$candidate}) {
      last;
    }
    $tried{$candidate}=1;
    if (exists $w{$candidate}) {

If so, we calculate the score, and in a two-stage trick of my own ensure that @maxcandidate contains all of the highest-scoring words.

      my $score=sum(map {$tilevalue{$_}} @candidate);
      if ($score > $maxscore) {
        @maxcandidate=();
        $maxscore=$score;
      }
      if ($score == $maxscore) {
        push @maxcandidate,$candidate;
      }

If that was a valid word, we don't need to look at any shorter words in this permutation; if ABJURER is valid, then whether or not ABJURE is valid it won't score as many points. (This optimisation wouldn't be true if some letters had negative values. Which might be an interesting variant to Scrabble…)

      last;
    }

Otherwise, if we didn't find a valid word, cut off the last character and try again.

    pop @candidate;
  }

This next bit is the permuter. There are various CPAN modules that will do this (most obviously Algorithm::Permute), but I felt like writing my own.

  my %find=map {$permute[$_] => $_} (0..$#permute);
  my $m=$#permute;
  while ($m>=0) {
    my $pos=$find{$m};
    unless ($dir[$pos]==0) {
      if ($m > $permute[$pos+$dir[$pos]]) {
        my $n=$pos+$dir[$pos];
        my $nn=$n+$dir[$pos];
        ($permute[$n],$permute[$pos])=($permute[$pos],$permute[$n]);
        ($dir[$n],$dir[$pos])=($dir[$pos],$dir[$n]);
        if ($n==0 || $n==$#permute || $permute[$nn] > $m) {
          $dir[$n]=0;
        }
        foreach my $i (0..$#permute) {
          if ($i==$n) {
            next;
          }
          if ($permute[$i]>$m) {
            $dir[$i]=($i<$n)?1:-1;
          }
        }
        last;
      }
    }
    $m--;
  }
  if ($m<0) {
    last;
  }
}

Finally, output the list of tiles and the highest scoring words.

print join('',sort @bag),"\n";
print "$maxscore: ",join(' ',sort @maxcandidate),"\n";

Perl6 is a bit different, because there are useful built-in language features:

my $tilecount=(
  'A' => 8,

[...] ).BagHash;

The grab method on a BagHash gives me a series of weighted random picks, without replacement. This is a really handy thing to have; lots of code I write does things a bit like this.

my @bag=$tilecount.grab($l);

Similarly, all that permuting algorithm can be replaced with another core feature:

for @bag.permutations -> $n {
  my @candidate=$n.list;

Which is less fun to write than the SJT permuter above but also less effort. (Now, if only the startup, loading up all these features that are nice to have but most of which a given program won't use, weren't so slow…)

ADDENDUM

The alternative approach is to read each word from the dictionary, see if it can be constructed with the available tiles, and if so score it. I wrote this in perl5 for the blog post, to see how the performance differed. After constructing the bag:

open I,'<','wordlist';
while (<I>) {
  chomp;
  if (length($_)<=$l) {
    my $candidate=uc($_);
    my @candidate=split '',$candidate;

%l holds letter counts. We add one to a value each time the letter occurs in the word, then subtract one each time it occurs in the bag. If any value is positive at the end, it occurred more in the word than it did in the bag, so it's not a word we can make. (This could possibly be optimised a bit by precalculating the hash with the bag values subtracted.)

    my %l;
    map {$l{$_}++} @candidate;
    map {$l{$_}--} @bag;
    if (max(values %l) > 0) {
      next;
    }

At this point we have a word that can be constructed from the available tiles, so just as before we score it.

    my $score=sum(map {$tilevalue{$_}} @candidate);
    if ($score > $maxscore) {
      @maxcandidate=();
      $maxscore=$score;
    }
    if ($score == $maxscore) {
      push @maxcandidate,$candidate;
    }
  }
}
close I;
print join('',sort @bag),"\n";
print "$maxscore: ",join(' ',sort @maxcandidate),"\n";

To test it, I used parallel(1) to run 800 instances (thus 8 at a time on my 8-core desktop, a new one being invoked whenever the previous one had finished). This variant took about 34 seconds to complete all 800; my original permuter code took about 14. I suspect that this is because my original code can skip score calculations on many words (see ABJURE/ABJURER above); to achieve this with the variant code would require the word list to be ordered appropriately with ABJURER before ABJURE, which would take time to set up.

The downside, of course, is that I have to hold the valid word list in memory. But SOWPODS is 2.6 megabytes, which isn't a problem for any vaguely non-antique machine.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.

Search
Archive
Tags 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s 3d printing action aeronautics aikakirja anecdote animation anime army astronomy audio audio tech base commerce battletech beer boardgaming book of the week bookmonth chain of command children chronicle church of no redeeming virtues cold war comedy computing contemporary cornish smuggler cosmic encounter coup covid-19 cycling dead of winter doctor who documentary drama driving drone ecchi economics espionage essen 2015 essen 2016 essen 2017 essen 2018 essen 2019 existential risk falklands war fandom fanfic fantasy feminism film firefly first world war flash point flight simulation food garmin drive gazebo geodata gin gurps gurps 101 harpoon historical history horror hugo 2014 hugo 2015 hugo 2016 hugo 2017 hugo 2018 hugo 2019 hugo 2020 hugo-nebula reread humour in brief avoid instrumented life kickstarter learn to play leaving earth linux lovecraftiana mecha men with beards museum mystery naval non-fiction one for the brow opera perl perl weekly challenge photography podcast politics powers prediction privacy project woolsack pyracantha python quantum rail raku ranting raspberry pi reading reading boardgames social real life restaurant reviews romance rpg a day rpgs science fiction scythe second world war security shipwreck simutrans smartphone south atlantic war squaddies stationery steampunk stuarts suburbia superheroes suspense television the resistance thirsty meeples thriller tin soldier torg toys trailers travel type 26 type 31 type 45 vietnam war war wargaming weather wives and sweethearts writing about writing x-wing young adult
Special All book reviews, All film reviews
Produced by aikakirja v0.1