I've been doing the
Perl Weekly Challenges. This one
was a couple of standard computer-science problems.
The first was to find the longest common substring of multiple
strings. The standard version of this is with two strings, which has
already been implemented for example at
String::LCSS_XS, though
if I just used that it wouldn't be much of a challenge. So: to the
Wikipedia pseudocode.
Of course I had to make it a bit more Perlish.
sub lcsubstr2 {
my @s=split '',shift;
my @t=split '',shift;
We're working on individual characters, so rather than have a bunch of
substr calls we split the strings in advance.
my %l;
my $z=0;
my @ret;
foreach my $si (0..$#s) {
foreach my $ti (0..$#t) {
if ($s[$si] eq $t[$ti]) {
if ($si==0 || $ti==0) {
$l{$si}{$ti}=1;
} else {
$l{$si}{$ti}=($l{$si-1}{$ti-1} || 0)+1;
}
Here's a bit that's rather ugly in the pseudocode: that version is "if
count is greater than the previous max, set the max to count, and set
the result; otherwise, if count is equal to max, append the result".
Which means there are two different code paths for working out the
result, and two different code paths means code gets out of sync. This
pattern is my usual one for this situation: "if count is greater than
max, set max to count and clear the result set; then, separately, if
count equals max (which it will if we've just set it), append the
result (possibly to the empty result set just created)".
if ($l{$si}{$ti} > $z) {
$z=$l{$si}{$ti};
@ret=();
}
if ($l{$si}{$ti} == $z) {
push @ret,join('',@s[$si-$z+1..$si]);
}
Note also the array slice. Not often useful, but sometimes just the
right tool for the job.
}
}
}
return @ret;
}
But that's not the whole of the problem; we need the longest substring
of an arbitrary number of strings.
sub lcsubstr {
my @str=@_;
First we check for the degenerate cases and do the basic two-parameter
substring.
if (scalar @str < 2) {
return @str;
}
my @a=lcsubstr2(shift @str,shift @str);
Then, for each substring we got out of that, check it against the next
string in the list. Keep the longest. (Consider the string set ABCXDEF
ABCYDEF ABZDEF; comparing the first two will give ABC and DEF, and
comparing each of those with the last will give AB and DEF. Yes, this
could be slightly faster if I cached the string lengths.)
while (@str) {
my %b;
my $c=shift @str;
foreach my $a (@a) {
map {$b{$_}=1} lcsubstr2($a,$c);
}
my $m=max(map {length($_)} keys %b); # use List::Util qw(max);
@a=sort grep {length($_)==$m} keys %b;
}
return @a;
}
The other challenge is the implementation of a priority queue: put
elements in with a given priority, and pull them out in priority
order. Again there are implementations of this on CPAN, though they
don't appear to allow for entirely arbitrary priorities as was wanted
here.
The structure I chose was a hash of lists. Hashes are generally fast,
and lists preserve order (another requirement was to pull items of
equal priority in the order they were inserted).
package Local::PriorityQueue;
use List::Util qw(max);
sub new {
my $class = shift;
my $self={};
bless $self,$class;
return $self;
}
If there are no priority slots, the queue is empty.
sub is_empty {
my $self=shift;
if (scalar keys %{$self}) {
return 0;
}
return 1;
}
To insert: push the new elements onto the list with the appropriate
priority. (Which perl will autovivify if it doesn't already exist.)
sub insert_with_priority {
my $self=shift;
my $element=shift;
my $priority=shift;
push @{$self->{$priority}},$element;
}
To pull an element: get the highest extant priority (this is probably
the slowest bit, though I suspect that keeping an array of priority
levels would be even slower) and pull the first element out of the
list. If the list at that priority level is now empty, delete it.
sub pull_highest_priority_element {
my $self=shift;
if ($self->is_empty) {
return undef;
}
my $prio=max(keys %{$self});
my $element=shift @{$self->{$prio}};
if (scalar @{$self->{$prio}}==0) {
delete $self->{$prio};
}
return $element;
}
Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.