I did Advent of Code again, and once again had a really good time.
Spoilers follow.
Carrying on from where I left off last post, I'll talk here about
each day and my solutions. Should you want to follow along, my code is
at Codeberg.
Part 1 screams "HashSet" to me. (In fact there might have been an even
easier way to do it, just count all the splitters and assume each one
gets used, but I'm glad to say this isn't the case even in the test
data.)
I keep a set of active beam X-coordinates (initialised when I find the
S). For each row, I make a new copy of that set; then each time I find
a splitter that overlaps with an active beam, I increment the splits
count, delete that beam and add two more at (x - 1) and (x + 1). (The
set takes care of deduplication.) There are some potential undefined
behaviours: what happens if a splitter is at the edge of the field,
and what happens if it's next to another splitter? But neither of
these happens in either the test or the live data. Then at the end of
the row calculations I move the new set back to the old to be ready
for the next row. It's a sort of one-dimensional cellular automaton.
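In outline, the per-row update might look something like this (a
sketch only; the function and its argument names are illustrative, not
my actual code):

```rust
use std::collections::HashSet;

// One row of the automaton: `beams` is the set of active beam
// X-coordinates, `splitters` the splitter X-coordinates on this row.
fn step_row(beams: &HashSet<i64>, splitters: &[i64], splits: &mut u64) -> HashSet<i64> {
    // Work on a copy so every splitter sees the beams as they were
    // at the start of the row.
    let mut next = beams.clone();
    for &x in splitters {
        if beams.contains(&x) {
            *splits += 1;
            next.remove(&x);
            // The set deduplicates if two split products coincide.
            next.insert(x - 1);
            next.insert(x + 1);
        }
    }
    next
}
```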
For part 2 I upgrade the set to a map. Start with a value of 1 for the
initial beam; then each split adds the splitting beam's count to the
(x - 1) and (x + 1) entries, with a default initial value of zero.
Someone talked about memoisation and if I were trying to trace every
path (e.g. with BFS/DFS) I'd probably have to do something like that;
but there is no meaningful distinction between paths once they arrive
at a given point, so a count is all I need.
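The map version of the row update might be sketched like this
(illustrative names again; the key point is that the whole count
arriving at a splitter carries through to both children, which is what
makes the totals path counts):

```rust
use std::collections::HashMap;

// `beams` maps beam X-coordinate to the number of paths arriving there.
fn step_row(beams: &HashMap<i64, u64>, splitters: &[i64]) -> HashMap<i64, u64> {
    let mut next = beams.clone();
    for &x in splitters {
        if let Some(&n) = beams.get(&x) {
            next.remove(&x);
            // All n paths continue both left and right; missing
            // entries default to zero.
            *next.entry(x - 1).or_insert(0) += n;
            *next.entry(x + 1).or_insert(0) += n;
        }
    }
    next
}
```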
I thought this was going to be a spanning tree, but it turned out to
be connected components. Quite a lot of data wrangling to get there,
though.
Step 1, get the coordinates. Step 2, generate a list of distances
between each pair.
It's good to see an actual straight line distance in one of these;
usually they use Manhattan distance, which I find rather dull. Of
course, because I'm comparing the distances, I don't need to take the
square root.
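Step 2 might be sketched like this, assuming 3-D coordinates parsed
into tuples (the function name is illustrative):

```rust
/// Squared straight-line distance between every pair of points;
/// squared values sort the same way as true distances, so no sqrt.
fn all_pairs(points: &[(i64, i64, i64)]) -> Vec<(i64, usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..points.len() {
        for j in i + 1..points.len() {
            let (ax, ay, az) = points[i];
            let (bx, by, bz) = points[j];
            let (dx, dy, dz) = (ax - bx, ay - by, az - bz);
            pairs.push((dx * dx + dy * dy + dz * dz, i, j));
        }
    }
    pairs
}
```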
Step 3, sort that list and take the N shortest (10 for the test case,
1000 for the live input). Build a map of edges (ignoring distances now,
but bidirectional, for greater efficiency at the cost of memory, since
I need to look up neighbours from either end). Also build a list of
live nodes; I probably didn't need to do this.
Then throw the lists at connected_components from the pathfinding
crate, segment what I have, and pull off the three highest values.
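Sketching that whole stage (the connected_components call is from the
pathfinding crate; the wrapper function and its names are mine):

```rust
use std::collections::{HashMap, HashSet};
use pathfinding::prelude::connected_components;

// `pairs` is the output of all_pairs above; keep the n shortest,
// build a bidirectional edge map, and take the three largest
// component sizes.
fn three_largest(mut pairs: Vec<(i64, usize, usize)>, n: usize) -> Vec<usize> {
    pairs.sort_unstable();
    pairs.truncate(n);
    let mut edges: HashMap<usize, HashSet<usize>> = HashMap::new();
    for &(_, i, j) in &pairs {
        edges.entry(i).or_default().insert(j);
        edges.entry(j).or_default().insert(i);
    }
    let nodes: Vec<usize> = edges.keys().copied().collect();
    let components = connected_components(&nodes, |node| {
        edges.get(node).cloned().unwrap_or_default()
    });
    let mut sizes: Vec<usize> = components.iter().map(|c| c.len()).collect();
    sizes.sort_unstable_by(|a, b| b.cmp(a));
    sizes.truncate(3);
    sizes
}
```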
For part 2, the start is similar, though I explicitly don't want to
keep the list of live nodes because I care about what happens to all
of them. connected_components now moves inside the loop, and I have
to consider the possibility of a node with no neighbours. (I suspect
I'm losing a lot of speed here, but it's fast enough for today's
problem.) Once all nodes are connected, I look at the nodes making up
the latest connection to get my answer.
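That loop, sketched (illustrative names; the empty default neighbour
set covers the node-with-no-neighbours case):

```rust
use std::collections::{HashMap, HashSet};
use pathfinding::prelude::connected_components;

// Add edges shortest-first until everything is one component; `pairs`
// is already sorted ascending by squared distance.
fn last_link(pairs: &[(i64, usize, usize)], node_count: usize) -> Option<(usize, usize)> {
    let nodes: Vec<usize> = (0..node_count).collect();
    let mut edges: HashMap<usize, HashSet<usize>> = HashMap::new();
    for &(_, i, j) in pairs {
        edges.entry(i).or_default().insert(j);
        edges.entry(j).or_default().insert(i);
        let comps = connected_components(&nodes, |n| {
            edges.get(n).cloned().unwrap_or_default()
        });
        if comps.len() == 1 {
            return Some((i, j)); // the link that finished the job
        }
    }
    None
}
```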
(Might spanning tree be the answer after all? Perhaps, but I don't
think it would readily give me the latest link.)
First one that's needed serious effort this year. (No complaints.)
Part 1 is more or less just the coordinate parsing, fairly trivial.
Part 2 is quite crunchy. The number of cells to check is large. And
winding numbers don't work reliably because the border is of non-zero
width.
Something similar came up in an Everybody Codes problem this year and
I ended up compressing the coordinates, i.e. paying attention only to
the rows and columns that contain at least one corner, so that for
example the test case
..............
.......#XXX#..
.......X...X..
..#XXXX#...X..
..X........X..
..#XXXXXX#.X..
.........X.X..
.........#X#..
..............
ends up looking like
.#X#
##.X
#X#X
..##
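The compression itself is just a matter of ranking the interesting
coordinates; a sketch, with `corners` as an illustrative input:

```rust
use std::collections::{BTreeSet, HashMap};

// Map each coordinate that contains a corner to its rank in sorted
// order; everything between two consecutive ranks behaves identically.
fn compress(corners: &[(i64, i64)]) -> (HashMap<i64, usize>, HashMap<i64, usize>) {
    let xs: BTreeSet<i64> = corners.iter().map(|&(x, _)| x).collect();
    let ys: BTreeSet<i64> = corners.iter().map(|&(_, y)| y).collect();
    let xmap = xs.iter().enumerate().map(|(i, &x)| (x, i)).collect();
    let ymap = ys.iter().enumerate().map(|(i, &y)| (y, i)).collect();
    (xmap, ymap)
}
```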
I still need to mark the "inside" spaces, so I ended up first marking
the edges ("X" above), then looking for an easy-to-spot inside space
(first occurrence on a line of a filled space followed by an empty
space followed by a filled space, which might not always occur but if
it does it's definitely inside the shape). Then I flood-fill from
there.
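The flood fill is nothing fancy; a sketch, assuming the compressed
grid is held as rows of bytes:

```rust
// Iterative flood fill from a cell known to be inside: b'.' is empty,
// anything else is an edge or already marked.
fn flood(grid: &mut [Vec<u8>], sx: usize, sy: usize) {
    let (h, w) = (grid.len(), grid[0].len());
    let mut stack = vec![(sx, sy)];
    while let Some((x, y)) = stack.pop() {
        if grid[y][x] != b'.' {
            continue;
        }
        grid[y][x] = b'I'; // mark as inside
        // wrapping_sub makes 0-1 fail the bounds test rather than panic
        for (nx, ny) in [
            (x + 1, y), (x.wrapping_sub(1), y),
            (x, y + 1), (x, y.wrapping_sub(1)),
        ] {
            if nx < w && ny < h {
                stack.push((nx, ny));
            }
        }
    }
}
```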
Then it's back to looking at each pair of corners, and evaluating all
spaces contained in the rectangle they define to ensure that they are
in the red/green tile set. (I only do the evaluation if the area would
be larger than one I've already validated, which probably saves a bit.)
Those of you who've followed my Everybody Codes entries this year will
notice that I got the coordinate compression a bit neater, by defining
a separate struct for CoordCompressed so that the type system would
warn me if I tried to use one as the other.
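The newtype idea in miniature (field names illustrative):

```rust
// Distinct types for raw and compressed coordinates, so the compiler
// rejects any attempt to mix them up.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Coord { x: i64, y: i64 }

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct CoordCompressed { x: usize, y: usize }

// A function taking Coord now can't silently be handed a
// CoordCompressed, and vice versa.
```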
This still isn't as fast as I'd like but it's at least credible.
I had a lot of fun with part 1, rendering everything as BigInt
bitfields and then doing a breadth-first search with XORs. (Clearly,
it's only meaningful to press each button once at most.)
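A sketch of that search, using BigUint from the num crate (the
function shape is illustrative, and the puzzle specifics are elided):

```rust
use num::BigUint;
use std::collections::{HashSet, VecDeque};

// Each button is a bitmask; pressing XORs it into the state, and
// pressing twice cancels out, so BFS over distinct states finds the
// minimum number of presses.
fn min_presses(start: &BigUint, target: &BigUint, buttons: &[BigUint]) -> Option<usize> {
    let mut seen: HashSet<BigUint> = HashSet::new();
    let mut queue: VecDeque<(BigUint, usize)> = VecDeque::new();
    seen.insert(start.clone());
    queue.push_back((start.clone(), 0));
    while let Some((state, presses)) = queue.pop_front() {
        if &state == target {
            return Some(presses);
        }
        for b in buttons {
            let next = &state ^ b;
            if seen.insert(next.clone()) {
                queue.push_back((next, presses + 1));
            }
        }
    }
    None
}
```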
Then it became clear that part 2 was (a) a system of linear equations and
(b) not susceptible to short cuts (e.g. eliminating one variable at a
time), so I had to find a suitable solver (i.e. one documented for
normal people rather than mathematicians who already fully understand
the problem domain and what needs to be tweaked). ndarray-linalg
made no sense to me at all and the example code doesn't even work in
the current version, but microlp got the job done even if it does
have to work in floating point.
I can constrain the variables slightly, though I probably don't need
to; for example, a button that increments field 0 can't be pressed
more than (final value of field 0) times.
Then I run through the goal fields and for each one add an equation
based on the variables that affect it.
Finally fire up the library to do the hard part.
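Putting that together with microlp looks roughly like this (the API as
I understand it from its documentation; the variables and numbers are
invented for illustration):

```rust
use microlp::{ComparisonOp, OptimizationDirection, Problem};

fn main() {
    // Minimise the total number of presses.
    let mut problem = Problem::new(OptimizationDirection::Minimize);
    // One variable per button, with the loose bound mentioned above:
    // no button need be pressed more than five times if the field it
    // feeds must finish at 5.
    let a = problem.add_var(1.0, (0.0, 5.0));
    let b = problem.add_var(1.0, (0.0, 5.0));
    // Field 0 is incremented by both buttons and must finish at 5.
    problem.add_constraint(&[(a, 1.0), (b, 1.0)], ComparisonOp::Eq, 5.0);
    let solution = problem.solve().unwrap();
    // Everything comes back in floating point, so round to get counts.
    println!("a: {}, b: {}", solution[a], solution[b]);
}
```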
I got a step up on this by knowing about the pathfinding crate, so
that I had an efficient path counter available. From there it was a
matter of massaging the data into an appropriate shape.
And one trick I've used in AoC before is to take short strings and
convert them to base 36 numbers. (Base 26 from A to Z would work if,
as in this case, there are no digits, but base 36 is built into the
language.) An unsigned 16-bit integer will fit three letters, as we
have here, and a 32-bit one will fit six. (If I needed to get the
values out again, e.g. for debugging, I'd need to use the radix_fmt
crate, because formatting in an arbitrary base isn't a built-in
feature.) Granted, most languages aren't
as fiddly as Rust when it comes to passing strings about the place,
but I suspect all of them are faster with simple numbers.
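The packing trick in miniature (the function name is mine):

```rust
// u16 fits three base-36 digits: 36^3 - 1 = 46655 < 65535.
fn pack(name: &str) -> u16 {
    // from_str_radix accepts 0-9 and a-z (case-insensitively) in base 36.
    u16::from_str_radix(name, 36).expect("up to three alphanumerics")
}
```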
Anyway, I build that into my standard directed graph structure, a
HashMap of HashSets, and then pour it into the path counter.
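pathfinding provides count_paths, which is what I mean by the path
counter; a sketch of the wrapper (the helper name is mine):

```rust
use std::collections::{HashMap, HashSet};
use pathfinding::prelude::count_paths;

// Count distinct paths from start to end through a directed graph
// held as a HashMap of HashSets of packed node ids.
fn paths(graph: &HashMap<u16, HashSet<u16>>, start: u16, end: u16) -> usize {
    count_paths(
        start,
        |n| graph.get(n).into_iter().flatten().copied(),
        |&n| n == end,
    )
}
```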
For part 2 I was concerned about possible loops, which will obviously
break a path counter, but I hoped for the best and it worked. Each
overall path can be broken down into three steps: start to first
internode, first internode to second internode, second internode to
end; first internode will be either dac or fft, and second internode
will be the other one.
So for each ordering of the internodes I calculate the number of paths
in each leg, then multiply them together. For example, in this
network:
  B   J
 / \ / \
A-C-I-K-Z
 \ / \ /
  D   L
there are three paths from A to I, and three paths from I to Z, so
there are nine distinct paths overall. This has the additional
advantage that if there are no paths between a particular pair of
nodes - as in the test data from dac back up to fft - I can skip
any further calculations for that ordering of internodes.
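In code, roughly (reusing the paths helper above; start, end, fft and
dac hold the packed node ids):

```rust
// Try both orderings of the internodes, bailing out of an ordering as
// soon as any leg has no paths at all.
fn total_paths(
    graph: &HashMap<u16, HashSet<u16>>,
    start: u16, end: u16, fft: u16, dac: u16,
) -> usize {
    let mut total = 0;
    for &(first, second) in &[(fft, dac), (dac, fft)] {
        let a = paths(graph, start, first);
        if a == 0 { continue; }
        let b = paths(graph, first, second);
        if b == 0 { continue; }
        total += a * b * paths(graph, second, end);
    }
    total
}
```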
Indeed, inspecting the network diagram (generated with graphviz and
neato), it's clear how the solution falls out: in the test data,
there's only one path from svr to fft, only one from fft to
dac, and then two from dac to out. (The Codeberg repository
includes the quick Perl script d2dot used to generate the .dot file, and the
testB.svg result from feeding the test data to dot.)
The same pattern shows up in my full data: there are six distinct
clumps of nodes with smaller connecting groups between them. fft is
in the second and dac in the fifth, and there's no way of getting
back from dac to fft.

(This was probably inevitable. A path counter will break if there
are any possible loops, and if each of dac and fft can reach the
other there must be at least one.)
This is clearly a Really Hard problem. I don't just mean for me, I
mean (having read a bit about packing problems) probably NP-hard. I
could obviously parse the shapes and write a search, but I think the
search space would be a Bit Huge. (8 potential configurations of each
shape under flipping and rotation, though symmetry might reduce that;
300-ish total shapes to fit under each tree, 100+ top-left positions
in each tree grid… 8 ^ 300 is rather a lot, even if each shape can be
expressed as a 9-bit number.)
So maybe Eric knows this is a hard problem. I mean, Eric's a smart
guy, right? Let's try winnowing it down a bit. Maybe some
precalculated rectangles?
There's an obvious upper bound on what I can fit into one tree: if
it's X×Y, I can't put more than X×Y # marks into it even if I chop them
up and pour them in individually (i.e. they all fit together
perfectly). Let's try that to cut down on the calculation time by
eliminating the clearly impossible. That's basically a parsing
problem, and I reach for regexps here, dispatching based on what
matches: the start of a shape (get the index number, make sure there
are enough values in my area-counting Vec), a line in the body of a
shape (count the #s), a tree (work out maximum area, work out area
used, add to the count if it might fit).
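The tree check, as a sketch (I'm assuming the line format from my
description above; names are illustrative):

```rust
use regex::Regex;

// shape_area[i] holds the number of '#' cells counted for shape i;
// a tree line is assumed to look like "WxH: n0 n1 n2 ...", one count
// per shape. Even perfect packing can't beat the raw W*H area.
fn might_fit(line: &str, shape_area: &[u64], tree_re: &Regex) -> bool {
    let caps = tree_re.captures(line).expect("a tree line");
    let w: u64 = caps[1].parse().unwrap();
    let h: u64 = caps[2].parse().unwrap();
    let needed: u64 = caps[3]
        .split_whitespace()
        .map(|n| n.parse::<u64>().unwrap())
        .zip(shape_area)
        .map(|(n, a)| n * a)
        .sum();
    needed <= w * h
}
// built with: Regex::new(r"^(\d+)x(\d+): (.+)$").unwrap()
```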
Well, the total is a bit less than half the number of trees in my
input. Let's submit it before I start writing the actual packing code.
Turns out it's the answer.
If the intended lesson here is that part of being a good programmer is
recognising when a problem is intractable, I'm not going to argue.
Conclusions
Nothing stretched my Rust, though several days stretched my algorithm
design skills. That seems like a good thing.
Crates used:
- itertools (day 9 for an efficient MinMax)
- microlp (day 10)
- num (day 10 for BigInt)
- pathfinding (days 8 and 11)
- permutator (day 9 for "every pair")
- range-ext (days 2 and 5)
- regex (days 1 and 12)
Interesting to see basically no actual pathfinding this year, though
there were several grid-related problems. (Also no cellular automata
or Chinese Remainder Theorem.) I think I will have to say, with some
reluctance, that solving linear equation systems may well now be
considered a standard possibility.
In particular there wasn't anything that needed fiddly parsing this
time.
Puzzles sorted by Part 2 difficulty: this percentage is the proportion
of people who have successfully solved part 1, but not part 2, which
is the only public information about completion rates. (Based on a
snapshot at the time of writing, so values may have changed.) As
usual, I omit the last day, because the qualification for part 2 is
"have done all the other days".
4: 4%
8: 5%
5: 11%
2: 13%
7: 14%
3: 15%
6: 15%
11: 22%
1: 23%
9: 46%
10: 49%
I'd agree that 9 and 10 were the hardest (though if I'd had a linear
equations library in mind from the start 10 would have been trivial).
I suspect 11 suffers from path counting being quite hard and not
everyone having code available to do it. 1 was indeed unusually
fiddly for a first day problem.
6 and 7 were the weekend problems, and clearly I'm not the only person
who found them relatively easy.
Thanks as always to Eric Wastl and the AoC team.