RogerBW's Blog: Perl Weekly Challenge 99: Unique Match

I’ve been doing the Perl Weekly Challenges. The latest involved globbing and building sub-sequences. (Note that this is open until 14 February 2021.)

TASK #1 › Pattern Match

You are given a string $S and a pattern $P.

Write a script to check if given pattern validate the entire string. Print 1 if pass otherwise 0.

The patterns can also have the following characters: ? - Match any single character.

Match any sequence of characters.

So basically it's glob, which in all my chosen languages is easiest to implement as a regular expression. I arbitrarily decided not to sanitise the input; if I were actually doing this I'd add that too (probably by putting a backslash in front of each character that's not a letter, a number or a space).

No, that's not true. If I were actually doing this I'd say "pass a regular expression to the function" rather than "pass a glob pattern".

Anyway, trivial enough in Perl: turn each ? to ., and turn each * to .*. (I assume as in classic globbing that a * can match zero characters; otherwise just use .+ instead.)

sub pm {
  my $text=shift;
  my $match=shift;
  (my $re=$match) =~ s/\?/./g;
  $re =~ s/\*/.*/g;
  $re='^'.$re.'$';
  if ($text =~ /$re/) {
    return 1;
  }
  return 0;
}

Raku goes basically the same way except for the "minor" syntax changes…

sub pm($text,$match) {
  (my $re=$match) ~~ s:g/\?/./;
  $re ~~ s:g/\*/.*/;
  $re='^' ~ $re ~ '$';
  if ($text ~~ /<$re>/) {
    return 1;
  }
  return 0;
}

And the same in Ruby.

def pm(text,match)
  re='^' + match + '$'
  re=re.gsub(/\?/,'.')
  re=re.gsub(/\*/,'.*')
  if text =~ Regexp.new(re) then
    return 1
  end
  return 0
end

In Python I build up the regexp in a list, join it to make a string, then compile and match.

def pm(text,match):
    rl=list()
    rl.append("^")
    for c in match:
        if c == '?':
            rl.append(".")
        elif c == '*':
            rl.append(".*")
        else:
            rl.append(c)
    rl.append("$")
    rls=''.join(rl)
    if re.match(rls,text):
        return 1
    return 0

And similarly in Rust – because of the external dependency on the regex engine this needs to be an actual project unlike the usual one-file solutions.

fn pm(text: &str,mat: &str) -> u8 {
    let mut rl: Vec<char>=vec![];
    rl.push('^');
    for c in mat.chars() {
        if c == '?' {
            rl.push('.');
        } else if c == '*' {
            rl.push('.');
            rl.push('*');
        } else {
            rl.push(c);
        }
    }
    rl.push('$');
    let rs=rl.iter().collect::<String>();
    let rx: Regex=Regex::new(&rs).unwrap();
    if rx.is_match(&text) {
        return 1;
    }
    return 0;
}

TASK #2 › Unique Subsequence

You are given two strings $S and $T.

Write a script to find out count of different unique subsequences matching $T without changing the position of characters.

In other words, "london" and "lon" would produce:

london
london
london

for a total of 3.

The enjoyable part here was working out the algorithm, which happens in three stages:

(1) Map each letter in the text to a list of the positions it occurs in. E.g. l => [0], o => [1, 4], n => [2, 5], d => [3].

(2) Build a list of those lists, in order by their letters' occurrence in the matcher. E.g. [[0], [1, 4], [2, 5]].

(3) Traverse that list counting the number of ways it can be done with ascending values. E.g. there's only one way to count the initial 0; from there one can go on to 1 or 4 (2 paths); from 1 one can go to 2 or 5, but from 4 one can only go to 5 (total of 3 paths, which is the solution).

I admit, this is the sort of deep structure where perl can get a bit ugly.

sub us {
  my $text=shift;
  my $match=shift;

Step 1, build the hash:

  my %s;
  {
    my $i=0;
    foreach my $c (split '',$text) {
      push @{$s{$c}},$i;
      $i++;
    }
  }

Step 2, build the list. (This is also where I short-circuit: if there's a character in the match that isn't in the text, it can't match so we exit here.)

  my @j;
  foreach my $c (split '',$match) {
    if (exists $s{$c}) {
      push @j,$s{$c};
    } else {
      return 0;
    }
  }

Iterate through the list counting valid paths.

  my @o=(1) x scalar @{$j[0]};
  foreach my $m (1..$#j) {
    my @n;
    foreach my $bi (0..$#{$j[$m]}) {
      my $t=0;
      foreach my $ai (0..$#{$j[$m-1]}) {
        if ($j[$m-1][$ai] < $j[$m][$bi]) {
          $t+=$o[$ai];
        }
      }
      push @n,$t;
    }
    @o=@n;
  }
  return sum(@o);
}

Raku works the same way except without autovivification of the lists that go inside the hash.

sub us($text,$match) {
  my %s;
  {
    my $i=0;
    for $text.comb -> $c {
      unless %s{$c} {
        %s{$c}=Array.new;
      }
      push %s{$c},$i;
      $i++;
    }
  }
  my @j;
  for $match.comb -> $c {
    if (%s{$c}:exists) {
      push @j,%s{$c};
    } else {
      return 0;
    }
  }
  my @o=(1) xx @j[0].elems;
  for 1..@j.elems-1 -> $m {
    my @n;
    for 0..@j[$m].elems-1 -> $bi {
      my $t=0;
      for 0..@j[$m-1].elems-1 -> $ai {
        if (@j[$m-1][$ai] < @j[$m][$bi]) {
          $t+=@o[$ai];
        }
      }
      push @n,$t;
    }
    @o=@n;
  }
  return sum(@o);
}

All the languages that aren't perl or raku have some way of saying "given this list, I want you to generate indices for it as I iterate through it" (the thing for which I'm using the $i temporary variable above). Much cleaner. defaultdict handles the autovivification.

def us(text,match):
    s=defaultdict(list)
    for i,c in enumerate(text):
        s[c].append(i)
    j=list()
    for c in match:
        if c in s:
            j.append(s[c])
        else:
            return 0
    o=list();
    for i in range(len(j[0])):
        o.append(1)
    for m in range(1,len(j)):
        n=list()
        for bi in range(len(j[m])):
            t=0
            for ai in range(len(j[m-1])):
                if j[m-1][ai] < j[m][bi]:
                    t += o[ai]
            n.append(t)
        o=n
    return sum(o)

Ruby does its enumerate-equivalent in reverse order (value then index). I'm sure it has its reasons, perhaps for ease of stuffing into a hash.

#! /usr/bin/ruby

def us(text,match)
  s=Hash.new
  text.split(//).each_with_index do |c,i|
    unless s[c] then
      s[c]=Array.new
    end
    s[c].push(i)
  end
  j=Array.new;
  match.split(//).each do |c|
    if s.has_key?(c) then
      j.push(s[c])
    else
      return 0
    end
  end
  o=Array.new([1] * j[0].length)
  1.upto(j.length-1) do |m|
    n=Array.new
    0.upto(j[m].length-1) do |bi|
      t=0
      0.upto(j[m-1].length-1) do |ai|
        if j[m-1][ai] < j[m][bi] then
          t += o[ai]
        end
      end
      n.push(t)
    end
    o=n
  end
  return o.sum
end

Rust suffers from its hashes not being proper native constructs, so I have to use get and insert methods on them rather than just treating hash entries as variables. As a result there's a vector initialisation that's unneeded much of the time, but changing that would make the code even more verbose.

This is the only language I'm using in which I have to think about types; this probably isn't a bad thing, but I'm out of the habit.

fn us(text: &str,mat: &str) -> usize {
    let mut s: HashMap<char,Vec<usize>>=HashMap::new();
    for (i,c) in text.chars().enumerate() {
        let mut sc: Vec<usize>=vec![];
        if s.contains_key(&c) {
            sc=s.get(&c).unwrap().to_vec();
        }
        sc.push(i);
        s.insert(c,sc);
    }
    let mut j: Vec<Vec<usize>>=vec![];
    for c in mat.chars() {
        if s.contains_key(&c) {
            j.push(s.get(&c).unwrap().to_vec());
        } else {
            return 0;
        }
    }
    let mut o: Vec<usize>=vec![1; j[0].len()];
    for m in 1..j.len() {
        let mut n: Vec<usize>=vec![];
        for bi in 0..j[m].len() {
            let mut t: usize=0;
            for ai in 0..j[m-1].len() {
                if j[m-1][ai] < j[m][bi] {
                    t += o[ai];
                }
            }
            n.push(t);
        }
        o=n;
    }
    return o.iter().sum();
}

Full code on github.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.