I’ve been doing the Perl Weekly
Challenges. The
latest
involved globbing and building sub-sequences.
(Note that this is open until 14 February 2021.)
TASK #1 › Pattern Match
You are given a string $S
and a pattern $P
.
Write a script to check if given pattern validate the entire string.
Print 1 if pass otherwise 0.
The patterns can also have the following characters:
? - Match any single character.
-
- Match any sequence of characters.
So basically it's glob, which in all my chosen languages is easiest to
implement as a regular expression. I arbitrarily decided not to
sanitise the input; if I were actually doing this I'd add that too
(probably by putting a backslash in front of each character that's not
a letter, a number or a space).
No, that's not true. If I were actually doing this I'd say "pass a
regular expression to the function" rather than "pass a glob pattern".
Anyway, trivial enough in Perl: turn each ?
to .
, and turn each
*
to .*
. (I assume as in classic globbing that a *
can match
zero characters; otherwise just use .+
instead.)
sub pm {
my $text=shift;
my $match=shift;
(my $re=$match) =~ s/\?/./g;
$re =~ s/\*/.*/g;
$re='^'.$re.'$';
if ($text =~ /$re/) {
return 1;
}
return 0;
}
Raku goes basically the same way except for the "minor" syntax changes…
sub pm($text,$match) {
(my $re=$match) ~~ s:g/\?/./;
$re ~~ s:g/\*/.*/;
$re='^' ~ $re ~ '$';
if ($text ~~ /<$re>/) {
return 1;
}
return 0;
}
And the same in Ruby.
def pm(text,match)
re='^' + match + '$'
re=re.gsub(/\?/,'.')
re=re.gsub(/\*/,'.*')
if text =~ Regexp.new(re) then
return 1
end
return 0
end
In Python I build up the regexp in a list, join it to make a string,
then compile and match.
def pm(text,match):
rl=list()
rl.append("^")
for c in match:
if c == '?':
rl.append(".")
elif c == '*':
rl.append(".*")
else:
rl.append(c)
rl.append("$")
rls=''.join(rl)
if re.match(rls,text):
return 1
return 0
And similarly in Rust – because of the external dependency on the
regex engine this needs to be an actual project unlike the usual
one-file solutions.
fn pm(text: &str,mat: &str) -> u8 {
let mut rl: Vec<char>=vec![];
rl.push('^');
for c in mat.chars() {
if c == '?' {
rl.push('.');
} else if c == '*' {
rl.push('.');
rl.push('*');
} else {
rl.push(c);
}
}
rl.push('$');
let rs=rl.iter().collect::<String>();
let rx: Regex=Regex::new(&rs).unwrap();
if rx.is_match(&text) {
return 1;
}
return 0;
}
TASK #2 › Unique Subsequence
You are given two strings $S
and $T
.
Write a script to find out count of different unique subsequences
matching $T without changing the position of characters.
In other words, "london" and "lon" would produce:
for a total of 3.
The enjoyable part here was working out the algorithm, which happens
in three stages:
(1) Map each letter in the text to a list of the positions it occurs
in. E.g. l => [0], o => [1, 4], n => [2, 5], d => [3]
.
(2) Build a list of those lists, in order by their letters' occurrence
in the matcher. E.g. [[0], [1, 4], [2, 5]]
.
(3) Traverse that list counting the number of ways it can be done with
ascending values. E.g. there's only one way to count the initial 0;
from there one can go on to 1 or 4 (2 paths); from 1 one can go to 2
or 5, but from 4 one can only go to 5 (total of 3 paths, which is the
solution).
I admit, this is the sort of deep structure where perl can get a bit
ugly.
sub us {
my $text=shift;
my $match=shift;
Step 1, build the hash:
my %s;
{
my $i=0;
foreach my $c (split '',$text) {
push @{$s{$c}},$i;
$i++;
}
}
Step 2, build the list. (This is also where I short-circuit: if
there's a character in the match that isn't in the text, it can't
match so we exit here.)
my @j;
foreach my $c (split '',$match) {
if (exists $s{$c}) {
push @j,$s{$c};
} else {
return 0;
}
}
Iterate through the list counting valid paths.
my @o=(1) x scalar @{$j[0]};
foreach my $m (1..$#j) {
my @n;
foreach my $bi (0..$#{$j[$m]}) {
my $t=0;
foreach my $ai (0..$#{$j[$m-1]}) {
if ($j[$m-1][$ai] < $j[$m][$bi]) {
$t+=$o[$ai];
}
}
push @n,$t;
}
@o=@n;
}
return sum(@o);
}
Raku works the same way except without autovivification of the lists
that go inside the hash.
sub us($text,$match) {
my %s;
{
my $i=0;
for $text.comb -> $c {
unless %s{$c} {
%s{$c}=Array.new;
}
push %s{$c},$i;
$i++;
}
}
my @j;
for $match.comb -> $c {
if (%s{$c}:exists) {
push @j,%s{$c};
} else {
return 0;
}
}
my @o=(1) xx @j[0].elems;
for 1..@j.elems-1 -> $m {
my @n;
for 0..@j[$m].elems-1 -> $bi {
my $t=0;
for 0..@j[$m-1].elems-1 -> $ai {
if (@j[$m-1][$ai] < @j[$m][$bi]) {
$t+=@o[$ai];
}
}
push @n,$t;
}
@o=@n;
}
return sum(@o);
}
All the languages that aren't perl or raku have some way of saying
"given this list, I want you to generate indices for it as I iterate
through it" (the thing for which I'm using the $i
temporary variable
above). Much cleaner. defaultdict
handles the autovivification.
def us(text,match):
s=defaultdict(list)
for i,c in enumerate(text):
s[c].append(i)
j=list()
for c in match:
if c in s:
j.append(s[c])
else:
return 0
o=list();
for i in range(len(j[0])):
o.append(1)
for m in range(1,len(j)):
n=list()
for bi in range(len(j[m])):
t=0
for ai in range(len(j[m-1])):
if j[m-1][ai] < j[m][bi]:
t += o[ai]
n.append(t)
o=n
return sum(o)
Ruby does its enumerate
-equivalent in reverse order (value then
index). I'm sure it has its reasons, perhaps for ease of stuffing into
a hash.
#! /usr/bin/ruby
def us(text,match)
s=Hash.new
text.split(//).each_with_index do |c,i|
unless s[c] then
s[c]=Array.new
end
s[c].push(i)
end
j=Array.new;
match.split(//).each do |c|
if s.has_key?(c) then
j.push(s[c])
else
return 0
end
end
o=Array.new([1] * j[0].length)
1.upto(j.length-1) do |m|
n=Array.new
0.upto(j[m].length-1) do |bi|
t=0
0.upto(j[m-1].length-1) do |ai|
if j[m-1][ai] < j[m][bi] then
t += o[ai]
end
end
n.push(t)
end
o=n
end
return o.sum
end
Rust suffers from its hashes not being proper native constructs, so I
have to use get
and insert
methods on them rather than just
treating hash entries as variables. As a result there's a vector
initialisation that's unneeded much of the time, but changing that
would make the code even more verbose.
This is the only language I'm using in which I have to think about
types; this probably isn't a bad thing, but I'm out of the habit.
fn us(text: &str,mat: &str) -> usize {
let mut s: HashMap<char,Vec<usize>>=HashMap::new();
for (i,c) in text.chars().enumerate() {
let mut sc: Vec<usize>=vec![];
if s.contains_key(&c) {
sc=s.get(&c).unwrap().to_vec();
}
sc.push(i);
s.insert(c,sc);
}
let mut j: Vec<Vec<usize>>=vec![];
for c in mat.chars() {
if s.contains_key(&c) {
j.push(s.get(&c).unwrap().to_vec());
} else {
return 0;
}
}
let mut o: Vec<usize>=vec![1; j[0].len()];
for m in 1..j.len() {
let mut n: Vec<usize>=vec![];
for bi in 0..j[m].len() {
let mut t: usize=0;
for ai in 0..j[m-1].len() {
if j[m-1][ai] < j[m][bi] {
t += o[ai];
}
}
n.push(t);
}
o=n;
}
return o.iter().sum();
}
Full code on
github.
Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.