I’ve been doing the Weekly
Challenges. The
latest
involved more binary manipulation and a traditional problem.
(Note that this is open until 18 July 2021.)
TASK #1 › Invert Bit
You are given integers 0 <= $m
<= 255 and 1 <= $n
<= 8.
Write a script to invert $n
bit from the end of the binary representation of $m
and print the decimal representation of the new binary number.
As with the last few part 1s, this is basically a one-liner with bit
operations, shifting a single bit to the right place and then XOR-ing
it:
return $m ^ (1 << ($n-1));
or in PostScript
/ib {
exch
1 exch 1 sub bitshift
xor
} def
TASK #2 › The Travelling Salesman
You are given a NxN matrix containing the distances between N cities.
Write a script to find a round trip of minimum length visiting all N
cities exactly once and returning to the start.
BONUS 1: For a given number N, create a random NxN distance matrix
and find a solution for this matrix.
BONUS 2: Find a solution for a random matrix of size 15x15 or 20x20
I suppose that, in the way that every action-adventure television
series eventually does The Most Dangerous Game, every programming
challenge must come round to the TSP.
Before I jump in, though, I'll point out that when I actually have a
problem of this sort to solve, I use someone else's hyper-optimised
code running on someone else's machine: last year, Concorde at
NEOS gave
me an answer for a 333-node problem in a very few seconds.
(I'll refer here to the "cost" of a path rather than its "length",
because not every question that comes down to a TSP is necessarily
defined in consistently linear space: for example a road route-finding
problem, such as planning a parcel delivery route, might care more
about the time taken to travel between each node pair, which would
rely on road lengths and road types, not to mention one-way streets.)
But anyway. At this scale we can certainly obtain a definitive answer
rather than an approximation, so I implemented
Held-Karp, which
is O(n²×2^n) requiring O(2^n×n) space, and can cope with the
asymmetrical version as given in the example (cost from A to B not
necessarily equalling cost from B to A). In fact, because I was lazy,
I borrowed an existing implementation in
Python and translated it
into Perl.
sub tsp {
my ($d)=shift;
my $n=scalar @{$d};
my $n1=$n-1;
The nested hash %c
is the main data structure: the keys are the
nodes visited on a particular path and the terminating node of that
path, and the values are a tuple of the cost of that path and the
previous node on it. (I could have used an array, and this might have
been faster, but Perl hashes are fairly ferociously optimised.)
my %c;
foreach my $k (1..$n1) {
$c{1<<$k}{$k}=[$d->[0][$k],0];
}
Then for each possible path length…
foreach my $ss (2..$n1) {
And each possible permutation of (non-origin) nodes of that length…
my $p=Algorithm::Permute->new([1..$n1],$ss);
while (my @s = $p->next) {
Make a bitmask of the nodes on this path. (The bit length of this
variable is the maximum number of nodes we can handle. In practice I'm
not going to be using this for more than about 10-11 nodes anyway; see
timings below.)
my $bits=0;
foreach my $bit (@s) {
$bits |= 1 << $bit;
}
Then look at each possible pair of ending nodes ($k
is the one at
the end, $m
the one before that), and work out the total cost of a
path ending there: the total cost of a path (using only the relevant
nodes) that leads to $m
, plus the cost of the last hop.
foreach my $k (@s) {
my $prev=$bits & ~(1<<$k);
my @res;
foreach my $m (@s) {
if ($m==0 || $m==$k) {
next;
}
push @res,[$c{$prev}{$m}[0]+$d->[$m][$k],$m];
}
Store the cheapest path in %c
. (A slightly twiddly approach, because
the min()
in List::Util
expects scalars. But if I have equal
lowest cost paths I don't care which I get.)
my @r=map {$_->[0]} @res;
my %r=map {$r[$_] => $_} (0..$#r);
$c{$bits}{$k}=$res[$r{min(@r)}];
}
}
}
Then we bitmask to include every node, and for each possible ending
node, look at the cheapest path that starts at node 0 and ends here –
then add the cost of getting back to node 0. Then pick the cheapest of
all of those.
my $bits=(1<<$n)-1 & ~1;
my @res;
foreach my $k (1..$n1) {
push @res,[$c{$bits}{$k}[0]+$d->[$k][0],$k];
}
my @r=map {$_->[0]} @res;
my %r=map {$r[$_] => $_} (0..$#r);
my ($opt,$parent)=@{$res[$r{min(@r)}]};
Now we have $opt
, the total cost, and $parent
, the node before the
last one. Walk that path, removing nodes from the bitmask to avoid repetition.
my @path;
foreach my $i (0..$n1-1) {
push @path,$parent;
my $nb=$bits & ~(1 << $parent);
$parent=$c{$bits}{$parent}[1];
$bits=$nb;
}
Then top and tail with node 0 as required in the problem statement,
and reverse it since we built it from end to start.
push @path,0;
@path=reverse @path;
push @path,0;
return [$opt,\@path];
}
I didn't reimplement it in most of the other languages, but I did want
to see how much faster Rust would be, so I wrote that too. Oh, here's
a solution to Bonus 1:
fn genmatrix(n: usize) -> Vec<Vec<usize>> {
let mut m=vec![vec![0;n];n];
for x in 0..n {
for y in 0..n {
if x != y {
m[x][y]=thread_rng().gen::<u16>() as usize;
}
}
}
return m;
}
In Rust I can use a tuple (of path-bitmask and terminal node) as the
key in a hash, so I don't have to nest them. (Though I suspect an
array would have been faster, again.)
fn tsp (d: Vec<Vec<usize>>) -> (usize, Vec<usize>) {
let n=d.len();
let mut c: HashMap<(usize,usize),(usize,usize)>=HashMap::new();
for k in 1..n {
c.insert((1<<k,k),(d[0][k],0));
}
The permutation
crate wants me to extract n-combinations out of m,
then permute them as a separate step, and my debugging procedure for
Rust starts to look a bit like sprinkling *
and &
as the compiler
errors suggest. Really should get my head round these properly some
time.
for ss in 2..n {
let sb=(1..n).collect::<Vec<usize>>();
sb.combination(ss).for_each(|mut sbb| {
sbb.permutation().for_each(|s| {
I'm using usize
, which on a 64-bit Linux system is 64 bits, for
orthogonality and convenience. But making this u128 would probably
make sense. (Not sure why I used +
rather than |
for that matter.)
let mut bits: usize=0;
s.iter().for_each(|bit| {bits += 1<<**bit});
for k in &s {
let prev=bits & !(1<<**k);
let mut res: Vec<(usize,usize)>=vec![];
for m in &s {
if **m==0 || **m==**k {
continue;
}
res.push((c.get(&(prev,**m)).unwrap().0+d[**m][**k],**m));
}
min()
on a tuple returns the entry with the lowest first element,
which is what I want.
c.insert((bits,**k),*res.iter().min().unwrap());
}
});
});
}
let mut bitmask=(1<<n)-1 & !1;
All right, this line is a bit self-indulgent but I do like the
functional approach.
let opp=(1..n).collect::<Vec<usize>>().iter().map(|k| (c.get(&(bitmask,*k)).unwrap().0+d[*k][0],*k)).min().unwrap();
Breaking that down:
(1..n)
→ the numbers from 1 to n-1 (a range)
.collect::<Vec<usize>>()
→ collected into a vector
.iter()
→ and then made into an iterator
.map(|k| (c.get(&(bitmask,*k)).unwrap().0+d[*k][0],*k))
→ OK, this
one's a bit more fiddly. For each of those values (in k), retrieve
c(bitmask,k), the first element of the tuple (the cost of the path),
and add to it the cost of getting back to the start. Coming out of
the map we have an iterator of tuples of that with the index values.
.min().unwrap()
→ as above, take just the value that has the
smallest first element, dying if there wasn't one
From there it's basically the algorithm as before.
I was interested in relative performance, so I ran both versions on
the same machine at various sizes with random input matrices. O
is
the relative time one might expect if the algorithm were entirely
CPU-bound; clearly the actual times are not going up in conformity
with that, and given how much the system is slowing down with larger
sizes (it should only be a factor of a thousand or so from 4 to 11, as
against five million for Perl and two million for Rust) I suspect
memory management overhead.
Size |
O |
Perl seconds |
Rust seconds |
4 |
256 |
0.00018 |
0.000021397 |
5 |
800 |
0.000942 |
0.000050742 |
6 |
2304 |
0.007222 |
0.00034124 |
7 |
6272 |
0.059324 |
0.003553189 |
8 |
16384 |
0.575887 |
0.035039013 |
9 |
41472 |
6.14614 |
0.367724567 |
10 |
102400 |
71.689169 |
3.764356765 |
11 |
247808 |
918.541681 |
47.221561934 |
There is also of course the possibility of parallelising the inner
loops and collecting their outputs into res
.
Full code on
github.