RogerBW's Blog

Perl Weekly Challenge 18 27 July 2019

I've been doing the Perl Weekly Challenges. This one was a couple of standard computer-science problems.

The first was to find the longest common substring of multiple strings. The standard version of this is with two strings, which has already been implemented for example at String::LCSS_XS, though if I just used that it wouldn't be much of a challenge. So: to the Wikipedia pseudocode. Of course I had to make it a bit more Perlish.

sub lcsubstr2 {
  my @s=split '',shift;
  my @t=split '',shift;

We're working on individual characters, so rather than have a bunch of substr calls we split the strings in advance.

  my %l;
  my $z=0;
  my @ret;
  foreach my $si (0..$#s) {
    foreach my $ti (0..$#t) {
      if ($s[$si] eq $t[$ti]) {
        if ($si==0 || $ti==0) {
          $l{$si}{$ti}=1;
        } else {
          $l{$si}{$ti}=($l{$si-1}{$ti-1} || 0)+1;
        }

Here's a bit that's rather ugly in the pseudocode: that version is "if count is greater than the previous max, set the max to count, and set the result; otherwise, if count is equal to max, append the result". Which means there are two different code paths for working out the result, and two different code paths means code gets out of sync. This pattern is my usual one for this situation: "if count is greater than max, set max to count and clear the result set; then, separately, if count equals max (which it will if we've just set it), append the result (possibly to the empty result set just created)".

        if ($l{$si}{$ti} > $z) {
          $z=$l{$si}{$ti};
          @ret=();
        }
        if ($l{$si}{$ti} == $z) {
          push @ret,join('',@s[$si-$z+1..$si]);
        }

Note also the array slice. Not often useful, but sometimes just the right tool for the job.

      }
    }
  }
  return @ret;
}

But that's not the whole of the problem; we need the longest substring of an arbitrary number of strings.

sub lcsubstr {
  my @str=@_;

First we check for the degenerate cases and do the basic two-parameter substring.

  if (scalar @str < 2) {
    return @str;
  }
  my @a=lcsubstr2(shift @str,shift @str);

Then, for each substring we got out of that, check it against the next string in the list. Keep the longest. (Consider the string set ABCXDEF ABCYDEF ABZDEF; comparing the first two will give ABC and DEF, and comparing each of those with the last will give AB and DEF. Yes, this could be slightly faster if I cached the string lengths.)

  while (@str) {
    my %b;
    my $c=shift @str;
    foreach my $a (@a) {
      map {$b{$_}=1} lcsubstr2($a,$c);
    }
    my $m=max(map {length($_)} keys %b); # use List::Util qw(max);
    @a=sort grep {length($_)==$m} keys %b;
  }
  return @a;
}

The other challenge is the implementation of a priority queue: put elements in with a given priority, and pull them out in priority order. Again there are implementations of this on CPAN, though they don't appear to allow for entirely arbitrary priorities as was wanted here.

The structure I chose was a hash of lists. Hashes are generally fast, and lists preserve order (another requirement was to pull items of equal priority in the order they were inserted).

package Local::PriorityQueue;
use List::Util qw(max);

sub new {
  my $class = shift;
  my $self={};
  bless $self,$class;
  return $self;
}

If there are no priority slots, the queue is empty.

sub is_empty {
  my $self=shift;
  if (scalar keys %{$self}) {
    return 0;
  }
  return 1;
}

To insert: push the new elements onto the list with the appropriate priority. (Which perl will autovivify if it doesn't already exist.)

sub insert_with_priority {
  my $self=shift;
  my $element=shift;
  my $priority=shift;
  push @{$self->{$priority}},$element;
}

To pull an element: get the highest extant priority (this is probably the slowest bit, though I suspect that keeping an array of priority levels would be even slower) and pull the first element out of the list. If the list at that priority level is now empty, delete it.

sub pull_highest_priority_element {
  my $self=shift;
  if ($self->is_empty) {
    return undef;
  }
  my $prio=max(keys %{$self});
  my $element=shift @{$self->{$prio}};
  if (scalar @{$self->{$prio}}==0) {
    delete $self->{$prio};
  }
  return $element;
}

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.

Search
Archive
Tags 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s 3d printing action aeronautics aikakirja anecdote animation anime army astronomy audio audio tech base commerce battletech beer boardgaming book of the week bookmonth chain of command children chronicle church of no redeeming virtues cold war comedy computing contemporary cornish smuggler cosmic encounter coup cycling dead of winter doctor who documentary drama driving drone ecchi economics espionage essen 2015 essen 2016 essen 2017 essen 2018 essen 2019 existential risk falklands war fandom fantasy film firefly first world war flash point food garmin drive gazebo geodata gin gurps gurps 101 harpoon historical history horror hugo 2014 hugo 2015 hugo 2016 hugo 2017 hugo 2018 hugo 2019 hugo-nebula reread humour in brief avoid instrumented life kickstarter learn to play leaving earth linux mecha men with beards museum mystery naval non-fiction one for the brow opera perl perl weekly challenge photography podcast politics powers prediction privacy project woolsack pyracantha quantum rail ranting raspberry pi reading reading boardgames social real life restaurant reviews romance rpg a day rpgs science fiction scythe second world war security shipwreck simutrans smartphone south atlantic war squaddies stationery steampunk stuarts suburbia superheroes suspense television the resistance thirsty meeples thriller tin soldier torg toys trailers travel type 26 type 31 type 45 vietnam war war wargaming weather wives and sweethearts writing about writing x-wing young adult
Special All book reviews, All film reviews
Produced by aikakirja v0.1