RogerBW's Blog: Perl Weekly Challenge 18

I've been doing the Perl Weekly Challenges. This one was a couple of standard computer-science problems.

The first was to find the longest common substring of multiple strings. The standard version of this is with two strings, which has already been implemented for example at String::LCSS_XS, though if I just used that it wouldn't be much of a challenge. So: to the Wikipedia pseudocode. Of course I had to make it a bit more Perlish.

sub lcsubstr2 {
  my @s=split '',shift;
  my @t=split '',shift;

We're working on individual characters, so rather than have a bunch of substr calls we split the strings in advance.

  my %l;
  my $z=0;
  my @ret;
  foreach my $si (0..$#s) {
    foreach my $ti (0..$#t) {
      if ($s[$si] eq $t[$ti]) {
        if ($si==0 || $ti==0) {
          $l{$si}{$ti}=1;
        } else {
          $l{$si}{$ti}=($l{$si-1}{$ti-1} || 0)+1;
        }

Here's a bit that's rather ugly in the pseudocode: that version is "if count is greater than the previous max, set the max to count, and set the result; otherwise, if count is equal to max, append the result". Which means there are two different code paths for working out the result, and two different code paths means code gets out of sync. This pattern is my usual one for this situation: "if count is greater than max, set max to count and clear the result set; then, separately, if count equals max (which it will if we've just set it), append the result (possibly to the empty result set just created)".

        if ($l{$si}{$ti} > $z) {
          $z=$l{$si}{$ti};
          @ret=();
        }
        if ($l{$si}{$ti} == $z) {
          push @ret,join('',@s[$si-$z+1..$si]);
        }

Note also the array slice. Not often useful, but sometimes just the right tool for the job.

      }
    }
  }
  return @ret;
}

But that's not the whole of the problem; we need the longest substring of an arbitrary number of strings.

sub lcsubstr {
  my @str=@_;

First we check for the degenerate cases and do the basic two-parameter substring.

  if (scalar @str < 2) {
    return @str;
  }
  my @a=lcsubstr2(shift @str,shift @str);

Then, for each substring we got out of that, check it against the next string in the list. Keep the longest. (Consider the string set ABCXDEF ABCYDEF ABZDEF; comparing the first two will give ABC and DEF, and comparing each of those with the last will give AB and DEF. Yes, this could be slightly faster if I cached the string lengths.)

  while (@str) {
    my %b;
    my $c=shift @str;
    foreach my $a (@a) {
      map {$b{$_}=1} lcsubstr2($a,$c);
    }
    my $m=max(map {length($_)} keys %b); # use List::Util qw(max);
    @a=sort grep {length($_)==$m} keys %b;
  }
  return @a;
}

The other challenge is the implementation of a priority queue: put elements in with a given priority, and pull them out in priority order. Again there are implementations of this on CPAN, though they don't appear to allow for entirely arbitrary priorities as was wanted here.

The structure I chose was a hash of lists. Hashes are generally fast, and lists preserve order (another requirement was to pull items of equal priority in the order they were inserted).

package Local::PriorityQueue;
use List::Util qw(max);

sub new {
  my $class = shift;
  my $self={};
  bless $self,$class;
  return $self;
}

If there are no priority slots, the queue is empty.

sub is_empty {
  my $self=shift;
  if (scalar keys %{$self}) {
    return 0;
  }
  return 1;
}

To insert: push the new elements onto the list with the appropriate priority. (Which perl will autovivify if it doesn't already exist.)

sub insert_with_priority {
  my $self=shift;
  my $element=shift;
  my $priority=shift;
  push @{$self->{$priority}},$element;
}

To pull an element: get the highest extant priority (this is probably the slowest bit, though I suspect that keeping an array of priority levels would be even slower) and pull the first element out of the list. If the list at that priority level is now empty, delete it.

sub pull_highest_priority_element {
  my $self=shift;
  if ($self->is_empty) {
    return undef;
  }
  my $prio=max(keys %{$self});
  my $element=shift @{$self->{$prio}};
  if (scalar @{$self->{$prio}}==0) {
    delete $self->{$prio};
  }
  return $element;
}

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.