I’ve been doing the Weekly Challenges. The latest involved word searching and directory mangling. (Note that this is open until 29 May 2022.)
Task 1: Hexadecimal Words
Write a program that will read from a dictionary and find 2- to
8-letter words that can be "spelled" in hexadecimal, with the addition
of letter substitutions (O = 0, I or L = 1, S = 5, T = 7)
Optional extras:
Limit the number of "special" letter substitutions in any one result
to keep that result at least somewhat comprehensible. (0x51105010 is
an actual example from my sample solution you may wish to avoid!)
Find phrases of words that total 8 characters in length (e.g.,
0xFee1Face), rather than just individual words.
I decided to roll the maximum-specials into the main function. To
check my test cases, I also did a shell version:
"words of 2-8 letters, up to 8 specials":
$ egrep -i "^[abcdefoilst]{2,8}$" dictionary.txt|wc -l
1463
"words of 8 letters, up to 8 specials":
$ egrep -i "^[abcdefoilst]{8}$" dictionary.txt|wc -l
164
"words of 2-8 letters, no specials":
$ egrep -i "^[abcdef]{2,8}$" dictionary.txt|wc -l
45
"words of 2-8 letters, at most 1 special":
$ egrep -i "^[abcdefoilst]{2,8}$" dictionary.txt|grep -v "[oilst].*[oilst]"|wc -l
244
So that can be done with a three-parameter function: minimum length,
maximum length, maximum specials. In Rust:
use std::fs::File;
use std::io::{BufRead, BufReader};

fn hexwords(lo: usize, hi: usize, sb: usize) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let file = File::open("dictionary.txt").unwrap();
    let reader = BufReader::new(file);
    for lx in reader.lines() {
        let line = lx.unwrap();
Filter lines to an appropriate length.
        if line.len() >= lo && line.len() <= hi {
            let mut valid = true;
            let mut sbc = 0;
Check each character: a special increments the count and causes an
early exit once the count exceeds the limit; any disallowed character
causes an early exit; and if nothing caused an exit, the word is added
to the output list.
            for c in line.chars() {
                if c == 'o' || c == 'i' || c == 'l' || c == 's' || c == 't' {
                    sbc += 1;
                    if sbc > sb {
                        valid = false;
                    }
                } else if c < 'a' || c > 'f' {
                    valid = false;
                }
                if !valid {
                    break;
                }
            }
            if valid {
                out.push(line);
            }
        }
    }
    out
}
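The function above only selects words; it does not produce the hexadecimal spelling itself. A minimal sketch of that mapping, using the substitutions from the task statement, might look like this. The name `spell_hex` and its return shape are assumptions made for illustration, not part of the solution above, and the input is assumed to be lowercase.

```rust
/// Map a lowercase word to the hex digits it "spells", together with the
/// number of special substitutions used. Returns None if any letter has
/// no hexadecimal reading. (Hypothetical helper, for illustration only.)
fn spell_hex(word: &str) -> Option<(String, usize)> {
    let mut digits = String::with_capacity(word.len());
    let mut specials = 0;
    for c in word.chars() {
        let (d, special) = match c {
            'a'..='f' => (c.to_ascii_uppercase(), false),
            'o' => ('0', true),
            'i' | 'l' => ('1', true),
            's' => ('5', true),
            't' => ('7', true),
            _ => return None,
        };
        if special {
            specials += 1;
        }
        digits.push(d);
    }
    Some((digits, specials))
}

fn main() {
    for w in ["face", "feel", "toast", "hexagon"] {
        match spell_hex(w) {
            Some((h, n)) => println!("{} -> 0x{} ({} specials)", w, h, n),
            None => println!("{} cannot be spelled in hex", w),
        }
    }
}
```

Returning the special count alongside the digits lets a caller apply the "maximum specials" limit after the fact rather than baking it into the scan.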
For "phrases of words", I ended up using a cartesian product (cross
product). This takes the output from hexwords and sorts it into lists
by length:
use itertools::Itertools;
use std::collections::HashMap;

fn combiwords(wl: Vec<String>, l: usize) -> Vec<String> {
    let mut wh: HashMap<usize, Vec<String>> = HashMap::new();
    for w in wl {
        let en = wh.entry(w.len()).or_insert(Vec::new());
        (*en).push(w);
    }
Then we build a list of possible length decompositions: for example,
if we have words of length 3, 4 and 5, we can build an 8-letter phrase
out of (3,5), (4,4) or (5,3).
    let mut tmap: Vec<Vec<usize>> = vec![Vec::new()];
    let mut omap: Vec<Vec<usize>> = Vec::new();
    while tmap.len() > 0 {
        let mut c = tmap.pop().unwrap();
        let s = &c.iter().sum::<usize>();
        let ls = l - s;
        for j in 1..ls {
            if wh.contains_key(&j) {
                let mut cc = c.clone();
                cc.push(j);
                tmap.push(cc);
            }
        }
        if wh.contains_key(&ls) {
            c.push(ls);
            omap.push(c);
        }
    }
Then, for each length combination, do a cartesian product of each of
the lists that make it up, to produce each possible combination. In
Rust that's in the Itertools crate; in Raku I can use the X
cross-product operator (repeatedly, because it only takes two
parameters); in Python and Ruby it's product; and in the other five
languages I wrote my own (the PostScript version of which is now
available in my PostScript libraries).
    let mut out: Vec<String> = Vec::new();
    for pat in omap {
        for ss in pat.iter().map(|i| &wh[i]).multi_cartesian_product() {
            out.push(ss.iter().join(""));
        }
    }
    out
}
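For comparison, here is a rough standard-library-only sketch of the same two steps: the ordered length decompositions and a hand-rolled cartesian product. It is not the code from any of the solutions; the names `decompositions`, `cartesian_join` and `phrases` are invented for this example, and the recursion swaps in for the explicit work stack used above.

```rust
use std::collections::BTreeMap;

/// All ordered ways to write `total` as a sum of the available lengths.
fn decompositions(available: &[usize], total: usize) -> Vec<Vec<usize>> {
    if total == 0 {
        return vec![Vec::new()]; // one way to build nothing: use no parts
    }
    let mut out = Vec::new();
    for &len in available {
        if len >= 1 && len <= total {
            for mut rest in decompositions(available, total - len) {
                rest.insert(0, len);
                out.push(rest);
            }
        }
    }
    out
}

/// Concatenate one word from each list, in every possible combination.
fn cartesian_join(lists: &[&Vec<String>]) -> Vec<String> {
    let mut acc = vec![String::new()];
    for list in lists {
        let mut next = Vec::with_capacity(acc.len() * list.len());
        for prefix in &acc {
            for w in list.iter() {
                next.push(format!("{}{}", prefix, w));
            }
        }
        acc = next;
    }
    acc
}

/// Group words by length, then expand every length pattern into phrases.
fn phrases(words: &[&str], total: usize) -> Vec<String> {
    let mut by_len: BTreeMap<usize, Vec<String>> = BTreeMap::new();
    for w in words {
        by_len.entry(w.len()).or_default().push(w.to_string());
    }
    let available: Vec<usize> = by_len.keys().copied().collect();
    let mut out = Vec::new();
    for pat in decompositions(&available, total) {
        let lists: Vec<&Vec<String>> = pat.iter().map(|n| &by_len[n]).collect();
        out.extend(cartesian_join(&lists));
    }
    out
}

fn main() {
    for p in phrases(&["fee", "faced", "face", "dead"], 8) {
        println!("{}", p);
    }
}
```

With word lengths 3, 4 and 5 available, `decompositions` yields (3,5), (4,4) and (5,3), matching the example in the text. A `BTreeMap` is used here only so that the output order is deterministic.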
Task 2: K-Directory Diff
Given a few (three or more) directories (non-recursively), display a
side-by-side difference of files that are missing from at least one
of the directories. Do not display files that exist in every
directory.
Since the task is non-recursive, if you encounter a subdirectory,
append a /, but otherwise treat it the same as a regular file.
The actual processing is the relatively easy bit; the hard part for me
was reading directories across the various languages I'm using. Lua
needs an external library to do this, so I left it out this time.
In Perl:
Signatures (i.e. named function parameters). They didn't have those
when I were a lad.
sub kdd(@dirlist0) {
    my @dirlist = sort @dirlist0;
    my %fx;
    foreach my $d (@dirlist) {
Modern Perl puts dirhandles in proper variables.
        opendir(my $dh, $d) or die "Cannot open directory $d: $!";
We don't want dotfiles (I arbitrarily assume), but we do want to
detect subdirectories and note them.
        foreach my $entry (grep !/^\./,readdir $dh) {
            my $nn = $entry;
            if (-d "$d/$entry") {
                $nn .= '/';
            }
            $fx{$nn}{$d} = 1;
        }
        closedir $dh;
    }
%fx is an inside-out version of the data: a hash of filenames, each
of which contains a hash (set, in languages that support it)
indicating which directories it turns up in.
    my $mm=scalar @dirlist;
    my @out=(\@dirlist);
For each file, skip it if it's in all the directories.
    foreach my $f (sort keys %fx) {
        unless (scalar keys %{$fx{$f}} == $mm) {
Otherwise build up an output line: the filename if it's present, a
blank if it's not.
            my @l;
            foreach my $d (@dirlist) {
                if (exists $fx{$f}{$d}) {
                    push @l,$f;
                } else {
                    push @l,'';
                }
            }
            push @out,\@l;
        }
    }
    return \@out;
}
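The inside-out index carries over directly to languages with a real set type. Here is a minimal sketch in Rust that works on in-memory listings rather than reading the filesystem, so the directory-reading step is deliberately left out. The function name `kdd` is borrowed from the Perl above, but its signature is an assumption for this example; entries are assumed to be already filtered for dotfiles and suffixed with `/` where they are subdirectories, and directory names are assumed to be distinct.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `listings` pairs each directory name with the entry names found in it.
/// Returns a header row of sorted directory names, followed by one row per
/// entry that is missing from at least one directory.
fn kdd(listings: &[(&str, Vec<&str>)]) -> Vec<Vec<String>> {
    let mut dirs: Vec<&str> = listings.iter().map(|(d, _)| *d).collect();
    dirs.sort();

    // Inside-out index: entry name -> set of directories containing it.
    let mut fx: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (d, entries) in listings {
        for e in entries {
            fx.entry(*e).or_default().insert(*d);
        }
    }

    let mut out = vec![dirs.iter().map(|d| d.to_string()).collect::<Vec<_>>()];
    for (f, present) in &fx {
        if present.len() == dirs.len() {
            continue; // present everywhere, so not part of the diff
        }
        let row = dirs
            .iter()
            .map(|d| if present.contains(d) { f.to_string() } else { String::new() })
            .collect();
        out.push(row);
    }
    out
}

fn main() {
    let rows = kdd(&[
        ("dir_b", vec!["a.txt", "c.txt", "sub/"]),
        ("dir_a", vec!["a.txt", "b.txt"]),
        ("dir_c", vec!["a.txt", "b.txt", "c.txt"]),
    ]);
    for row in rows {
        println!("{:?}", row);
    }
}
```

Keeping the index in a `BTreeMap` means the rows come out sorted by filename without a separate sort step.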
That gives a data structure, which then gets printed in a fixed-width
format. I already had code to do this in Perl:
sub tabular($d) {
    my @columnlength;
    foreach my $row (@{$d}) {
        foreach my $colno (0..$#{$row}) {
            if (!defined($columnlength[$colno]) ||
                $columnlength[$colno] < length($row->[$colno])) {
                $columnlength[$colno]=length($row->[$colno]);
            }
        }
    }
    my $format=join(' | ',map {"%-${_}s"} @columnlength);
    my $result='';
    foreach my $row (@{$d}) {
        $result .= sprintf($format,@{$row})."\n";
    }
    return $result;
}
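The same two-pass idea (measure the widest cell in each column, then left-pad every cell to that width) can be sketched in Rust as follows. This is an illustrative equivalent under assumed types, not the code from the Rust solution; widths are counted in characters, which is only a rough approximation of display width for non-ASCII filenames.

```rust
/// Render rows as left-aligned columns separated by " | ".
fn tabular(rows: &[Vec<String>]) -> String {
    // First pass: find the widest cell in each column.
    let ncols = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    let mut widths = vec![0usize; ncols];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }

    // Second pass: pad every cell to its column width and join.
    let mut result = String::new();
    for row in rows {
        let cells: Vec<String> = row
            .iter()
            .enumerate()
            .map(|(i, cell)| format!("{:<width$}", cell, width = widths[i]))
            .collect();
        result.push_str(&cells.join(" | "));
        result.push('\n');
    }
    result
}

fn main() {
    let rows = vec![
        vec!["dir_a".to_string(), "dir_b".to_string()],
        vec!["b.txt".to_string(), String::new()],
        vec![String::new(), "sub/".to_string()],
    ];
    print!("{}", tabular(&rows));
}
```

Like the Perl version, this pads the final column too, so lines may end in trailing spaces.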
I didn't think PostScript could do this at all, but it seems that it
can, in a rather baroque way. (If it weren't baroque, I wouldn't love
it so.) Look up filenameforall in the Red Book…
Full code on GitHub.