RogerBW's Blog: Perl Weekly Challenge 121: Travelling Bit

I’ve been doing the Weekly Challenges. The latest involved more binary manipulation and a traditional problem. (Note that this is open until 18 July 2021.)

TASK #1 › Invert Bit

You are given integers 0 <= $m <= 255 and 1 <= $n <= 8.

Write a script to invert $n bit from the end of the binary representation of $m and print the decimal representation of the new binary number.

As with the last few part 1s, this is basically a one-liner with bit operations, shifting a single bit to the right place and then XOR-ing it:

  return $m ^ (1 << ($n-1));

or in PostScript

/ib {
    exch
    1 exch 1 sub bitshift
    xor
} def

TASK #2 › The Travelling Salesman

You are given a NxN matrix containing the distances between N cities.

Write a script to find a round trip of minimum length visiting all N cities exactly once and returning to the start.

BONUS 1: For a given number N, create a random NxN distance matrix and find a solution for this matrix.

BONUS 2: Find a solution for a random matrix of size 15x15 or 20x20

I suppose that, in the way that every action-adventure television series eventually does The Most Dangerous Game, every programming challenge must come round to the TSP.

Before I jump in, though, I'll point out that when I actually have a problem of this sort to solve, I use someone else's hyper-optimised code running on someone else's machine: last year, Concorde at NEOS gave me an answer for a 333-node problem in a very few seconds.

(I'll refer here to the "cost" of a path rather than its "length", because not every question that comes down to a TSP is necessarily defined in consistently linear space: for example a road route-finding problem, such as planning a parcel delivery route, might care more about the time taken to travel between each node pair, which would rely on road lengths and road types, not to mention one-way streets.)

But anyway. At this scale we can certainly obtain a definitive answer rather than an approximation, so I implemented Held-Karp, which is O(n²×2^n) requiring O(2^n×n) space, and can cope with the asymmetrical version as given in the example (cost from A to B not necessarily equalling cost from B to A). In fact, because I was lazy, I borrowed an existing implementation in Python and translated it into Perl.

sub tsp {
  my ($d)=shift;
  my $n=scalar @{$d};
  my $n1=$n-1;

The nested hash %c is the main data structure: the keys are the nodes visited on a particular path and the terminating node of that path, and the values are a tuple of the cost of that path and the previous node on it. (I could have used an array, and this might have been faster, but Perl hashes are fairly ferociously optimised.)

  my %c;
  foreach my $k (1..$n1) {
    $c{1<<$k}{$k}=[$d->[0][$k],0];
  }

Then for each possible path length…

  foreach my $ss (2..$n1) {

And each possible permutation of (non-origin) nodes of that length…

    my $p=Algorithm::Permute->new([1..$n1],$ss);
    while (my @s = $p->next) {

Make a bitmask of the nodes on this path. (The bit length of this variable is the maximum number of nodes we can handle. In practice I'm not going to be using this for more than about 10-11 nodes anyway; see timings below.)

      my $bits=0;
      foreach my $bit (@s) {
        $bits |= 1 << $bit;
      }

Then look at each possible pair of ending nodes ($k is the one at the end, $m the one before that), and work out the total cost of a path ending there: the total cost of a path (using only the relevant nodes) that leads to $m, plus the cost of the last hop.

      foreach my $k (@s) {
        my $prev=$bits & ~(1<<$k);
        my @res;
        foreach my $m (@s) {
          if ($m==0 || $m==$k) {
            next;
          }
          push @res,[$c{$prev}{$m}[0]+$d->[$m][$k],$m];
        }

Store the cheapest path in %c. (A slightly twiddly approach, because the min() in List::Util expects scalars. But if I have equal lowest cost paths I don't care which I get.)

        my @r=map {$_->[0]} @res;
        my %r=map {$r[$_] => $_} (0..$#r);
        $c{$bits}{$k}=$res[$r{min(@r)}];
      }
    }
  }

Then we bitmask to include every node, and for each possible ending node, look at the cheapest path that starts at node 0 and ends here – then add the cost of getting back to node 0. Then pick the cheapest of all of those.

  my $bits=(1<<$n)-1 & ~1;
  my @res;
  foreach my $k (1..$n1) {
    push @res,[$c{$bits}{$k}[0]+$d->[$k][0],$k];
  }
  my @r=map {$_->[0]} @res;
  my %r=map {$r[$_] => $_} (0..$#r);
  my ($opt,$parent)=@{$res[$r{min(@r)}]};

Now we have $opt, the total cost, and $parent, the node before the last one. Walk that path, removing nodes from the bitmask to avoid repetition.

  my @path;
  foreach my $i (0..$n1-1) {
    push @path,$parent;
    my $nb=$bits & ~(1 << $parent);
    $parent=$c{$bits}{$parent}[1];
    $bits=$nb;
  }

Then top and tail with node 0 as required in the problem statement, and reverse it since we built it from end to start.

  push @path,0;
  @path=reverse @path;
  push @path,0;
  return [$opt,\@path];
}

I didn't reimplement it in most of the other languages, but I did want to see how much faster Rust would be, so I wrote that too. Oh, here's a solution to Bonus 1:

fn genmatrix(n: usize) -> Vec<Vec<usize>> {
    let mut m=vec![vec![0;n];n];
    for x in 0..n {
        for y in 0..n {
            if x != y {
                m[x][y]=thread_rng().gen::<u16>() as usize;
            }
        }
    }
    return m;
}

In Rust I can use a tuple (of path-bitmask and terminal node) as the key in a hash, so I don't have to nest them. (Though I suspect an array would have been faster, again.)

fn tsp (d: Vec<Vec<usize>>) -> (usize, Vec<usize>) {
    let n=d.len();
    let mut c: HashMap<(usize,usize),(usize,usize)>=HashMap::new();
    for k in 1..n {
        c.insert((1<<k,k),(d[0][k],0));
    }

The permutation crate wants me to extract n-combinations out of m, then permute them as a separate step, and my debugging procedure for Rust starts to look a bit like sprinkling * and & as the compiler errors suggest. Really should get my head round these properly some time.

    for ss in 2..n {
        let sb=(1..n).collect::<Vec<usize>>();
        sb.combination(ss).for_each(|mut sbb| {
            sbb.permutation().for_each(|s| {

I'm using usize, which on a 64-bit Linux system is 64 bits, for orthogonality and convenience. But making this u128 would probably make sense. (Not sure why I used + rather than | for that matter.)

                let mut bits: usize=0;
                s.iter().for_each(|bit| {bits += 1<<**bit});
                for k in &s {
                    let prev=bits & !(1<<**k);
                    let mut res: Vec<(usize,usize)>=vec![];
                    for m in &s {
                        if **m==0 || **m==**k {
                            continue;
                        }
                        res.push((c.get(&(prev,**m)).unwrap().0+d[**m][**k],**m));
                    }

min() on a tuple returns the entry with the lowest first element, which is what I want.

                    c.insert((bits,**k),*res.iter().min().unwrap());
                }
            });
        });
    }
    let mut bitmask=(1<<n)-1 & !1;

All right, this line is a bit self-indulgent but I do like the functional approach.

    let opp=(1..n).collect::<Vec<usize>>().iter().map(|k| (c.get(&(bitmask,*k)).unwrap().0+d[*k][0],*k)).min().unwrap();

Breaking that down:

(1..n) → the numbers from 1 to n-1 (a range)
.collect::<Vec<usize>>() → collected into a vector
.iter() → and then made into an iterator
.map(|k| (c.get(&(bitmask,*k)).unwrap().0+d[*k][0],*k)) → OK, this one's a bit more fiddly. For each of those values (in k), retrieve c(bitmask,k), the first element of the tuple (the cost of the path), and add to it the cost of getting back to the start. Coming out of the map we have an iterator of tuples of that with the index values.
.min().unwrap() → as above, take just the value that has the smallest first element, dying if there wasn't one

From there it's basically the algorithm as before.

I was interested in relative performance, so I ran both versions on the same machine at various sizes with random input matrices. O is the relative time one might expect if the algorithm were entirely CPU-bound; clearly the actual times are not going up in conformity with that, and given how much the system is slowing down with larger sizes (it should only be a factor of a thousand or so from 4 to 11, as against five million for Perl and two million for Rust) I suspect memory management overhead.

Size	O	Perl seconds	Rust seconds
4	256	0.00018	0.000021397
5	800	0.000942	0.000050742
6	2304	0.007222	0.00034124
7	6272	0.059324	0.003553189
8	16384	0.575887	0.035039013
9	41472	6.14614	0.367724567
10	102400	71.689169	3.764356765
11	247808	918.541681	47.221561934

There is also of course the possibility of parallelising the inner loops and collecting their outputs into res.

Full code on github.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.