RogerBW's Blog: Perl Weekly Challenge 38: date parsing and word games

I’ve been doing the Perl Weekly Challenges. Last week's was about parsing an odd date format and generating words in a Scrabble-ish manner.

The date parser is mostly about validating the parameters.

foreach my $dc (@ARGV) {
  unless (length($dc)==7) {
    warn "$dc is wrong length\n";
    next;
  }
  unless ($dc =~ /^[0-9]+$/) {
    warn "$dc has non-digit characters\n";
    next;
  }

That's as much as we can check without breaking it down into individual fields…

  $dc =~ /^(.)(..)(..)(..)$/;
  my ($cen,$year,$month,$day)=($1,$2,$3,$4);
  if ($cen==2) {
    $year+=1900;
  } elsif ($cen==1) {
    $year+=2000;
  } else {
    warn "$dc has invalid century digit $cen\n";
    next;
  }

Yes, that really is how the century parameter is meant to be interpreted.

Any pair of digits is valid for a year, so validate the month…

  if ($month<1 || $month>12) {
    warn "$dc has invalid month $month\n";
    next;
  }

and now try to parse what we have as a date, which will give an undefined value if the day number is wrong; so we test that rather than worrying about calculating month lengths ourselves.

  my $d;
  eval {
    $d=timelocal(0,0,12,$day,$month-1,$year);
  };
  unless (defined $d) {
    warn "$dc has invalid day $day\n";
    next;
  }

Then dump the output in the required format.

  print strftime('%Y-%m-%d',localtime($d)),"\n";
}

In Perl6 this is mostly similar until the end, where the exception handling is different. I don't want the program to die with its own error; I want it to show mine, and continue in case there are more dates to check.

  my $d;
  CATCH {
    $d=Date.new($year,$month,$day);
  }
  unless (defined $d) {
    warn "$dc has invalid day $day\n";
    next;
  }

Finally, Perl6 date formatting. (Yeah, but where's my strftime?)

  say $d.yyyy-mm-dd;

The word game problem gives a series of tile counts and values (e.g. 8 A tiles worth 1 point each); the problem is to draw 7 at random, then make the highest-scoring word out of them (not necessarily using all of them).

my $l=7;

my %tilecount=(
  A => 8,

[...] );

my %tilevalue=(
  A => 1,

[...] );

For the draw, we turn the tilecount hash into a single long list, shuffle it, and throw away all but the first 7 entries. (shuffle is from List::Util.)

my @bag=shuffle map {($_) x $tilecount{$_}} keys %tilecount;
splice @bag,$l;

Load a candidate word list. (I used a copy of SOWPODS I have hanging around, which is my standard reference for word validation problems.) We ignore anything longer than 7 characters.

my %w;
open I,'<','wordlist';
while (<I>) {
  chomp;
  if (length($_)<=$l) {
    $w{uc($_)}=1;
  }
}
close I;

Set up the accumulators for tracking high-scoring words.

my $maxscore=0;
my @maxcandidate;
my %tried;

Set up the variables for my permutation generator (I'll be using the Steinhaus-Johnson-Trotter algorithm with Even's optimisation). @permute contains a set of indices into @bag.

(There's an obvious alternative approach to the problem, which I'll come back to at the end, but let's stick with the permuter for now.)

my @permute=(0..$#bag);
my @dir=(-1) x @permute;
$dir[0]=0;

while (1) {

For each permutation, we build a list @candidate consisting of the letters in order. Then we try to validate that as a word. (We also skip over any words we've seen before, since they may appear multiple times as different candidates get shortened; for example, the permutations ABCDEFG and ABCGEFD would both produce ABC if no longer candidate were valid.)

  my @candidate=map {$bag[$_]} @permute;
  while (@candidate) {
    my $candidate=join('',@candidate);
    if (exists $tried{$candidate}) {
      last;
    }
    $tried{$candidate}=1;
    if (exists $w{$candidate}) {

If so, we calculate the score, and in a two-stage trick of my own ensure that @maxcandidate contains all of the highest-scoring words.

      my $score=sum(map {$tilevalue{$_}} @candidate);
      if ($score > $maxscore) {
        @maxcandidate=();
        $maxscore=$score;
      }
      if ($score == $maxscore) {
        push @maxcandidate,$candidate;
      }

If that was a valid word, we don't need to look at any shorter words in this permutation; if ABJURER is valid, then whether or not ABJURE is valid it won't score as many points. (This optimisation wouldn't be true if some letters had negative values. Which might be an interesting variant to Scrabble…)

      last;
    }

Otherwise, if we didn't find a valid word, cut off the last character and try again.

    pop @candidate;
  }

This next bit is the permuter. There are various CPAN modules that will do this (most obviously Algorithm::Permute), but I felt like writing my own.

  my %find=map {$permute[$_] => $_} (0..$#permute);
  my $m=$#permute;
  while ($m>=0) {
    my $pos=$find{$m};
    unless ($dir[$pos]==0) {
      if ($m > $permute[$pos+$dir[$pos]]) {
        my $n=$pos+$dir[$pos];
        my $nn=$n+$dir[$pos];
        ($permute[$n],$permute[$pos])=($permute[$pos],$permute[$n]);
        ($dir[$n],$dir[$pos])=($dir[$pos],$dir[$n]);
        if ($n==0 || $n==$#permute || $permute[$nn] > $m) {
          $dir[$n]=0;
        }
        foreach my $i (0..$#permute) {
          if ($i==$n) {
            next;
          }
          if ($permute[$i]>$m) {
            $dir[$i]=($i<$n)?1:-1;
          }
        }
        last;
      }
    }
    $m--;
  }
  if ($m<0) {
    last;
  }
}

Finally, output the list of tiles and the highest scoring words.

print join('',sort @bag),"\n";
print "$maxscore: ",join(' ',sort @maxcandidate),"\n";

Perl6 is a bit different, because there are useful built-in language features:

my $tilecount=(
  'A' => 8,

[...] ).BagHash;

The grab method on a BagHash gives me a series of weighted random picks, without replacement. This is a really handy thing to have; lots of code I write does things a bit like this.

my @bag=$tilecount.grab($l);

Similarly, all that permuting algorithm can be replaced with another core feature:

for @bag.permutations -> $n {
  my @candidate=$n.list;

Which is less fun to write than the SJT permuter above but also less effort. (Now, if only the startup, loading up all these features that are nice to have but most of which a given program won't use, weren't so slow…)

ADDENDUM

The alternative approach is to read each word from the dictionary, see if it can be constructed with the available tiles, and if so score it. I wrote this in perl5 for the blog post, to see how the performance differed. After constructing the bag:

open I,'<','wordlist';
while (<I>) {
  chomp;
  if (length($_)<=$l) {
    my $candidate=uc($_);
    my @candidate=split '',$candidate;

%l holds letter counts. We add one to a value each time the letter occurs in the word, then subtract one each time it occurs in the bag. If any value is positive at the end, it occurred more in the word than it did in the bag, so it's not a word we can make. (This could possibly be optimised a bit by precalculating the hash with the bag values subtracted.)

    my %l;
    map {$l{$_}++} @candidate;
    map {$l{$_}--} @bag;
    if (max(values %l) > 0) {
      next;
    }

At this point we have a word that can be constructed from the available tiles, so just as before we score it.

    my $score=sum(map {$tilevalue{$_}} @candidate);
    if ($score > $maxscore) {
      @maxcandidate=();
      $maxscore=$score;
    }
    if ($score == $maxscore) {
      push @maxcandidate,$candidate;
    }
  }
}
close I;
print join('',sort @bag),"\n";
print "$maxscore: ",join(' ',sort @maxcandidate),"\n";

To test it, I used parallel(1) to run 800 instances (thus 8 at a time on my 8-core desktop, a new one being invoked whenever the previous one had finished). This variant took about 34 seconds to complete all 800; my original permuter code took about 14. I suspect that this is because my original code can skip score calculations on many words (see ABJURE/ABJURER above); to achieve this with the variant code would require the word list to be ordered appropriately with ABJURER before ABJURE, which would take time to set up.

The downside, of course, is that I have to hold the valid word list in memory. But SOWPODS is 2.6 megabytes, which isn't a problem for any vaguely non-antique machine.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.