I’ve been doing the Perl Weekly
Challenges. Last
week's
was about parsing an odd date format and generating words in a
Scrabble-ish manner.
The date parser is mostly about validating the parameters.
foreach my $dc (@ARGV) {
unless (length($dc)==7) {
warn "$dc is wrong length\n";
next;
}
unless ($dc =~ /^[0-9]+$/) {
warn "$dc has non-digit characters\n";
next;
}
That's as much as we can check without breaking it down into
individual fields…
$dc =~ /^(.)(..)(..)(..)$/;
my ($cen,$year,$month,$day)=($1,$2,$3,$4);
if ($cen==2) {
$year+=1900;
} elsif ($cen==1) {
$year+=2000;
} else {
warn "$dc has invalid century digit $cen\n";
next;
}
Yes, that really is how the century parameter is meant to be interpreted.
Any pair of digits is valid for a year, so validate the month…
if ($month<1 || $month>12) {
warn "$dc has invalid month $month\n";
next;
}
and now try to parse what we have as a date, which will give an
undefined value if the day number is wrong; so we test that rather
than worrying about calculating month lengths ourselves.
my $d;
eval {
$d=timelocal(0,0,12,$day,$month-1,$year);
};
unless (defined $d) {
warn "$dc has invalid day $day\n";
next;
}
Then dump the output in the required format.
print strftime('%Y-%m-%d',localtime($d)),"\n";
}
In Perl6 this is mostly similar until the end, where the exception
handling is different. I don't want the program to die with its own
error; I want it to show mine, and continue in case there are more
dates to check.
my $d;
CATCH {
$d=Date.new($year,$month,$day);
}
unless (defined $d) {
warn "$dc has invalid day $day\n";
next;
}
Finally, Perl6 date formatting. (Yeah, but where's my strftime?)
say $d.yyyy-mm-dd;
The word game problem gives a series of tile counts and values (e.g. 8
A tiles worth 1 point each); the problem is to draw 7 at random, then
make the highest-scoring word out of them (not necessarily using all
of them).
my $l=7;
my %tilecount=(
A => 8,
[...]
);
my %tilevalue=(
A => 1,
[...]
);
For the draw, we turn the tilecount hash into a single long list,
shuffle it, and throw away all but the first 7 entries. (shuffle
is
from List::Util.)
my @bag=shuffle map {($_) x $tilecount{$_}} keys %tilecount;
splice @bag,$l;
Load a candidate word list. (I used a copy of
SOWPODS I have
hanging around, which is my standard reference for word validation
problems.) We ignore anything longer than 7 characters.
my %w;
open I,'<','wordlist';
while (<I>) {
chomp;
if (length($_)<=$l) {
$w{uc($_)}=1;
}
}
close I;
Set up the accumulators for tracking high-scoring words.
my $maxscore=0;
my @maxcandidate;
my %tried;
Set up the variables for my permutation generator (I'll be using the
Steinhaus-Johnson-Trotter
algorithm with Even's optimisation). @permute
contains a set of
indices into @bag
.
(There's an obvious alternative approach to the problem, which I'll
come back to at the end, but let's stick with the permuter for now.)
my @permute=(0..$#bag);
my @dir=(-1) x @permute;
$dir[0]=0;
while (1) {
For each permutation, we build a list @candidate
consisting of the
letters in order. Then we try to validate that as a word. (We also
skip over any words we've seen before, since they may appear multiple
times as different candidates get shortened; for example, the
permutations ABCDEFG and ABCGEFD would both produce ABC if no longer
candidate were valid.)
my @candidate=map {$bag[$_]} @permute;
while (@candidate) {
my $candidate=join('',@candidate);
if (exists $tried{$candidate}) {
last;
}
$tried{$candidate}=1;
if (exists $w{$candidate}) {
If so, we calculate the score, and in a two-stage trick of my own
ensure that @maxcandidate
contains all of the highest-scoring words.
my $score=sum(map {$tilevalue{$_}} @candidate);
if ($score > $maxscore) {
@maxcandidate=();
$maxscore=$score;
}
if ($score == $maxscore) {
push @maxcandidate,$candidate;
}
If that was a valid word, we don't need to look at any shorter words
in this permutation; if ABJURER is valid, then whether or not ABJURE
is valid it won't score as many points. (This optimisation wouldn't be
true if some letters had negative values. Which might be an
interesting variant to Scrabble…)
last;
}
Otherwise, if we didn't find a valid word, cut off the last character
and try again.
pop @candidate;
}
This next bit is the permuter. There are various CPAN modules that
will do this (most obviously Algorithm::Permute), but I felt like
writing my own.
my %find=map {$permute[$_] => $_} (0..$#permute);
my $m=$#permute;
while ($m>=0) {
my $pos=$find{$m};
unless ($dir[$pos]==0) {
if ($m > $permute[$pos+$dir[$pos]]) {
my $n=$pos+$dir[$pos];
my $nn=$n+$dir[$pos];
($permute[$n],$permute[$pos])=($permute[$pos],$permute[$n]);
($dir[$n],$dir[$pos])=($dir[$pos],$dir[$n]);
if ($n==0 || $n==$#permute || $permute[$nn] > $m) {
$dir[$n]=0;
}
foreach my $i (0..$#permute) {
if ($i==$n) {
next;
}
if ($permute[$i]>$m) {
$dir[$i]=($i<$n)?1:-1;
}
}
last;
}
}
$m--;
}
if ($m<0) {
last;
}
}
Finally, output the list of tiles and the highest scoring words.
print join('',sort @bag),"\n";
print "$maxscore: ",join(' ',sort @maxcandidate),"\n";
Perl6 is a bit different, because there are useful built-in language
features:
my $tilecount=(
'A' => 8,
[...]
).BagHash;
The grab
method on a BagHash
gives me a series of weighted random
picks, without replacement. This is a really handy thing to have; lots
of code I write does things a bit like this.
my @bag=$tilecount.grab($l);
Similarly, all that permuting algorithm can be replaced with another
core feature:
for @bag.permutations -> $n {
my @candidate=$n.list;
Which is less fun to write than the SJT permuter above but also less
effort. (Now, if only the startup, loading up all these features that
are nice to have but most of which a given program won't use, weren't
so slow…)
ADDENDUM
The alternative approach is to read each word from the dictionary, see
if it can be constructed with the available tiles, and if so score it.
I wrote this in perl5 for the blog post, to see how the performance
differed. After constructing the bag:
open I,'<','wordlist';
while (<I>) {
chomp;
if (length($_)<=$l) {
my $candidate=uc($_);
my @candidate=split '',$candidate;
%l holds letter counts. We add one to a value each time the letter
occurs in the word, then subtract one each time it occurs in the bag.
If any value is positive at the end, it occurred more in the word than
it did in the bag, so it's not a word we can make. (This could
possibly be optimised a bit by precalculating the hash with the bag
values subtracted.)
my %l;
map {$l{$_}++} @candidate;
map {$l{$_}--} @bag;
if (max(values %l) > 0) {
next;
}
At this point we have a word that can be constructed from the
available tiles, so just as before we score it.
my $score=sum(map {$tilevalue{$_}} @candidate);
if ($score > $maxscore) {
@maxcandidate=();
$maxscore=$score;
}
if ($score == $maxscore) {
push @maxcandidate,$candidate;
}
}
}
close I;
print join('',sort @bag),"\n";
print "$maxscore: ",join(' ',sort @maxcandidate),"\n";
To test it, I used parallel(1) to run 800 instances (thus 8 at a time
on my 8-core desktop, a new one being invoked whenever the previous
one had finished). This variant took about 34 seconds to complete all
800; my original permuter code took about 14. I suspect that this is
because my original code can skip score calculations on many words
(see ABJURE/ABJURER above); to achieve this with the variant code
would require the word list to be ordered appropriately with ABJURER
before ABJURE, which would take time to set up.
The downside, of course, is that I have to hold the valid word list in
memory. But SOWPODS is 2.6 megabytes, which isn't a problem for any
vaguely non-antique machine.
Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.