RogerBW's Blog

Perl Weekly Challenge 99: Unique Match

10 February 2021

I’ve been doing the Perl Weekly Challenges. The latest involved globbing and building sub-sequences. (Note that this is open until 14 February 2021.)

TASK #1 › Pattern Match

You are given a string $S and a pattern $P.

Write a script to check if given pattern validate the entire string. Print 1 if pass otherwise 0.

The patterns can also have the following characters:

  • ? - Match any single character.
  • * - Match any sequence of characters.

So basically it's glob, which in all my chosen languages is easiest to implement as a regular expression. I arbitrarily decided not to sanitise the input; if I were actually doing this I'd add that too (probably by putting a backslash in front of each character that's not a letter, a number or a space).

No, that's not true. If I were actually doing this I'd say "pass a regular expression to the function" rather than "pass a glob pattern".

Anyway, trivial enough in Perl: turn each ? to ., and turn each * to .*. (I assume as in classic globbing that a * can match zero characters; otherwise just use .+ instead.)

sub pm {
  my $text=shift;
  my $match=shift;
  (my $re=$match) =~ s/\?/./g;
  $re =~ s/\*/.*/g;
  $re='^'.$re.'$';
  if ($text =~ /$re/) {
    return 1;
  }
  return 0;
}

Raku goes basically the same way except for the "minor" syntax changes…

sub pm($text,$match) {
  (my $re=$match) ~~ s:g/\?/./;
  $re ~~ s:g/\*/.*/;
  $re='^' ~ $re ~ '$';
  if ($text ~~ /<$re>/) {
    return 1;
  }
  return 0;
}

And the same in Ruby.

def pm(text,match)
  # \A and \z anchor the whole string; in Ruby ^ and $ only anchor a line
  re='\A' + match + '\z'
  re=re.gsub(/\?/,'.')
  re=re.gsub(/\*/,'.*')
  if text =~ Regexp.new(re) then
    return 1
  end
  return 0
end

In Python I build up the regexp in a list, join it to make a string, then compile and match.

import re

def pm(text,match):
    rl=list()
    rl.append("^")
    for c in match:
        if c == '?':
            rl.append(".")
        elif c == '*':
            rl.append(".*")
        else:
            rl.append(c)
    rl.append("$")
    rls=''.join(rl)
    if re.match(rls,text):
        return 1
    return 0

And similarly in Rust – because of the external dependency on the regex engine this needs to be an actual project unlike the usual one-file solutions.

use regex::Regex;

fn pm(text: &str,mat: &str) -> u8 {
    let mut rl: Vec<char>=vec![];
    rl.push('^');
    for c in mat.chars() {
        if c == '?' {
            rl.push('.');
        } else if c == '*' {
            rl.push('.');
            rl.push('*');
        } else {
            rl.push(c);
        }
    }
    rl.push('$');
    let rs=rl.iter().collect::<String>();
    let rx: Regex=Regex::new(&rs).unwrap();
    if rx.is_match(&text) {
        return 1;
    }
    return 0;
}
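
The sanitising step mentioned earlier can be sketched in Python. This is a minimal illustration rather than part of the submitted solutions: the names glob_to_regex and pm_escaped are made up for the example, and it uses re.escape on every literal character instead of a hand-rolled backslash rule.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob using ? and * into a regular expression string."""
    parts = []
    for c in pattern:
        if c == "?":
            parts.append(".")           # any single character
        elif c == "*":
            parts.append(".*")          # any run of characters, possibly empty
        else:
            parts.append(re.escape(c))  # literal: neutralise regex metacharacters
    return "".join(parts)

def pm_escaped(text, pattern):
    # fullmatch anchors at both ends; DOTALL lets . match newlines as well
    return 1 if re.fullmatch(glob_to_regex(pattern), text, re.DOTALL) else 0
```

With escaping in place, pm_escaped("abc", "a.c") returns 0, whereas the unsanitised versions above treat the dot as a wildcard and return 1.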

TASK #2 › Unique Subsequence

You are given two strings $S and $T.

Write a script to find out count of different unique subsequences matching $T without changing the position of characters.

In other words, "london" and "lon" would produce (matched letters in brackets):

  • [lon]don
  • [lo]ndo[n]
  • [l]ond[on]

for a total of 3.

The enjoyable part here was working out the algorithm, which happens in three stages:

(1) Map each letter in the text to a list of the positions it occurs in. E.g. l => [0], o => [1, 4], n => [2, 5], d => [3].

(2) Build a list of those lists, in order by their letters' occurrence in the matcher. E.g. [[0], [1, 4], [2, 5]].

(3) Traverse that list counting the number of ways it can be done with ascending values. E.g. there's only one way to count the initial 0; from there one can go on to 1 or 4 (2 paths); from 1 one can go to 2 or 5, but from 4 one can only go to 5 (total of 3 paths, which is the solution).
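
To make the three stages concrete, here is a small Python trace that returns the intermediate structures for the "london"/"lon" example. It is only a sketch: trace_stages is a name invented for the illustration, and it assumes a non-empty matcher.

```python
def trace_stages(text, match):
    # Stage 1: letter -> ascending list of positions in the text
    positions = {}
    for i, c in enumerate(text):
        positions.setdefault(c, []).append(i)
    # Stage 2: one position list per matcher letter, in matcher order
    # (a letter missing from the text yields an empty list, hence zero paths)
    layers = [positions.get(c, []) for c in match]
    # Stage 3: counts[k][b] = number of ascending paths ending at layers[k][b]
    counts = [[1] * len(layers[0])]
    for prev, cur in zip(layers, layers[1:]):
        last = counts[-1]
        counts.append([sum(n for a, n in zip(prev, last) if a < b) for b in cur])
    return positions, layers, counts

positions, layers, counts = trace_stages("london", "lon")
print(layers)           # [[0], [1, 4], [2, 5]]
print(counts)           # [[1], [1, 1], [1, 2]]
print(sum(counts[-1]))  # 3
```

The last row of counts says one path ends at position 2 and two end at position 5, which are the three paths described in stage (3).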

I admit, this is the sort of deep structure where Perl can get a bit ugly.

use List::Util qw(sum);

sub us {
  my $text=shift;
  my $match=shift;

Step 1, build the hash:

  my %s;
  {
    my $i=0;
    foreach my $c (split '',$text) {
      push @{$s{$c}},$i;
      $i++;
    }
  }

Step 2, build the list. (This is also where I short-circuit: if there's a character in the match that isn't in the text, it can't match so we exit here.)

  my @j;
  foreach my $c (split '',$match) {
    if (exists $s{$c}) {
      push @j,$s{$c};
    } else {
      return 0;
    }
  }

Step 3, iterate through the list counting valid paths:

  my @o=(1) x scalar @{$j[0]};
  foreach my $m (1..$#j) {
    my @n;
    foreach my $bi (0..$#{$j[$m]}) {
      my $t=0;
      foreach my $ai (0..$#{$j[$m-1]}) {
        if ($j[$m-1][$ai] < $j[$m][$bi]) {
          $t+=$o[$ai];
        }
      }
      push @n,$t;
    }
    @o=@n;
  }
  return sum(@o);
}

Raku works the same way except without autovivification of the lists that go inside the hash.

sub us($text,$match) {
  my %s;
  {
    my $i=0;
    for $text.comb -> $c {
      unless %s{$c} {
        %s{$c}=Array.new;
      }
      push %s{$c},$i;
      $i++;
    }
  }
  my @j;
  for $match.comb -> $c {
    if (%s{$c}:exists) {
      push @j,%s{$c};
    } else {
      return 0;
    }
  }
  my @o=(1) xx @j[0].elems;
  for 1..@j.elems-1 -> $m {
    my @n;
    for 0..@j[$m].elems-1 -> $bi {
      my $t=0;
      for 0..@j[$m-1].elems-1 -> $ai {
        if (@j[$m-1][$ai] < @j[$m][$bi]) {
          $t+=@o[$ai];
        }
      }
      push @n,$t;
    }
    @o=@n;
  }
  return sum(@o);
}

All the languages that aren't Perl or Raku have some way of saying "given this list, I want you to generate indices for it as I iterate through it" (the thing for which I'm using the $i temporary variable above). Much cleaner. In Python, defaultdict handles the autovivification.

from collections import defaultdict

def us(text,match):
    s=defaultdict(list)
    for i,c in enumerate(text):
        s[c].append(i)
    j=list()
    for c in match:
        if c in s:
            j.append(s[c])
        else:
            return 0
    o=[1]*len(j[0])
    for m in range(1,len(j)):
        n=list()
        for bi in range(len(j[m])):
            t=0
            for ai in range(len(j[m-1])):
                if j[m-1][ai] < j[m][bi]:
                    t += o[ai]
            n.append(t)
        o=n
    return sum(o)

Ruby does its enumerate-equivalent in reverse order (value then index). I'm sure it has its reasons, perhaps for ease of stuffing into a hash.

#! /usr/bin/ruby

def us(text,match)
  s=Hash.new
  text.split(//).each_with_index do |c,i|
    unless s[c] then
      s[c]=Array.new
    end
    s[c].push(i)
  end
  j=Array.new;
  match.split(//).each do |c|
    if s.has_key?(c) then
      j.push(s[c])
    else
      return 0
    end
  end
  o=Array.new([1] * j[0].length)
  1.upto(j.length-1) do |m|
    n=Array.new
    0.upto(j[m].length-1) do |bi|
      t=0
      0.upto(j[m-1].length-1) do |ai|
        if j[m-1][ai] < j[m][bi] then
          t += o[ai]
        end
      end
      n.push(t)
    end
    o=n
  end
  return o.sum
end

Rust suffers from its hashes not being proper native constructs, so I use get and insert methods on them rather than just treating hash entries as variables. As a result there's a vector initialisation that's unneeded much of the time. (The standard library's entry API, s.entry(c).or_default().push(i), avoids both the spare vector and the copy.)

This is the only language I'm using in which I have to think about types; this probably isn't a bad thing, but I'm out of the habit.

use std::collections::HashMap;

fn us(text: &str,mat: &str) -> usize {
    let mut s: HashMap<char,Vec<usize>>=HashMap::new();
    for (i,c) in text.chars().enumerate() {
        let mut sc: Vec<usize>=vec![];
        if s.contains_key(&c) {
            sc=s.get(&c).unwrap().to_vec();
        }
        sc.push(i);
        s.insert(c,sc);
    }
    let mut j: Vec<Vec<usize>>=vec![];
    for c in mat.chars() {
        if s.contains_key(&c) {
            j.push(s.get(&c).unwrap().to_vec());
        } else {
            return 0;
        }
    }
    let mut o: Vec<usize>=vec![1; j[0].len()];
    for m in 1..j.len() {
        let mut n: Vec<usize>=vec![];
        for bi in 0..j[m].len() {
            let mut t: usize=0;
            for ai in 0..j[m-1].len() {
                if j[m-1][ai] < j[m][bi] {
                    t += o[ai];
                }
            }
            n.push(t);
        }
        o=n;
    }
    return o.iter().sum();
}
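
One way to sanity-check any of the us functions above is a brute-force count, which is a different technique from the path counting: try every subset of text positions as a bit mask and count the subsets that spell out the matcher. The sketch below is exponential in the length of the text, so it is only useful for short test strings; us_bruteforce is a name chosen for the example.

```python
def us_bruteforce(text, match):
    count = 0
    # Each mask selects a subset of positions; bit i set means "keep text[i]"
    for mask in range(1 << len(text)):
        picked = "".join(c for i, c in enumerate(text) if (mask >> i) & 1)
        if picked == match:
            count += 1
    return count

print(us_bruteforce("london", "lon"))  # 3
```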

Full code on github.


  1. Posted by RogerBW at 02:46pm on 15 February 2021

    Most other players didn't bother to check for other regexp-valid punctuation in #1 either. (quotemeta seems like a non-terrible solution.)

    An alternative approach to #2: turn the matcher into a regexp (e.g. l.*o.*n) and then do an exhaustive match on the text. Another: a brute-force search with a binary mask.

