
Advent of Code 2022

For a fun challenge, and hopefully to prompt a slightly better cadence of writing, I am doing this year's Advent of Code in Power BI. If you are not familiar with Advent of Code, it is an annual series of programming challenges, one per day of December leading up to Christmas. It is a great opportunity to practice a programming language with small, self-contained problems of increasing difficulty. Power BI is probably not a great fit for some of the algorithmic challenges that are likely to appear in later days, but I want to take it as far as I can. I will be updating this post in place as I go, with brief notes on each day's challenge. All code can be found in the companion pbix (week 2), with comments to explain my approach. I will not give full write-ups of each solution in this post, but encourage you to follow along in that PBIX as I update it, or better yet, by trying out the challenges on your own!

A note on PBIXes: I had originally intended to put all solutions into a single PBIX, but due to a memory leak in Power Query's editor things started getting very crashy after day 7. Now I will have multiple PBIXes, but hopefully only one per week.

Day 11 - Monkey in the middle

Today's challenge and the companion pbix. Today's challenge is more parsing of state and instructions from text. It is more involved, because each chunk of lines defines a monkey with several behaviors, each behavior captured in a closure (yay functional languages!) and all metadata in a monkey record. The closures each represent a partial state change in a game of monkey in the middle -- each monkey takes actions based on its behavior functions, which alter the state. The state accumulation is clunkier than prior days', but the crux of today's challenge is in the second half, wherein we have to derive an insight not provided in the problem description. One of the monkey functions is an operation which increases a number representing our worry about an item; this can grow very rapidly thanks to one squaring operation. The numbers overflow the range of 64-bit integers and double-precision floats. Thus, we must identify that we can reduce the worry value modulo a common multiple of all the divisors used in the monkeys' divisibility tests (the product of those divisors suffices); this bounds the numbers without changing the outcome of any test.
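To make the insight concrete, here is a minimal M sketch (the names and divisor values are mine, not the ones in the pbix); any common multiple preserves every divisibility test, and the product of the divisors is the easiest one to compute:

    let
        // Divisors from the monkeys' divisibility tests (example values).
        divisors = {23, 19, 13, 17},
        // Any common multiple works; the product is simplest to compute.
        commonMultiple = List.Product(divisors),
        // Reducing worry modulo the common multiple preserves every test:
        // Number.Mod(w, d) = Number.Mod(Number.Mod(w, commonMultiple), d).
        BoundWorry = (worry as number) as number =>
            Number.Mod(worry, commonMultiple)
    in
        BoundWorry(123456789)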

There was also an issue specific to PQ/M that became a challenge today. The pattern of using List.Accumulate hit its performance ceiling when we had to iterate 10,000 steps of the game. After a few attempted optimizations (these failures remain in the code of the linked PBIX), I had to refactor completely. The refactoring was welcome, as I was not happy with the ergonomics of my first state representation. For the first half, I stuck close to the representation provided by the input data, which was good for parsing but poor for representing state transitions in the accumulation pattern I have been using thus far in the challenges. I refactored to a more efficient representation that made the state transitions much cleaner, and used a List.Generate pattern to compute the states necessary to solve the challenge.
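For anyone unfamiliar with the pattern, a skeletal sketch of the List.Generate shape I mean -- the state record here is illustrative, not the real monkey state: unlike List.Accumulate, which folds over a pre-built list, List.Generate produces states on demand from an initial state, a continuation test, and a next-state function.

    let
        rounds = 10000,
        // Illustrative state shape: a round counter plus whatever data the
        // real step function would carry along and update.
        initial = [Round = 0, Monkeys = {}],
        Next = (state as record) as record =>
            [Round = state[Round] + 1, Monkeys = state[Monkeys] /* apply one round here */],
        states = List.Generate(() => initial, each [Round] <= rounds, Next),
        final = List.Last(states)
    in
        final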

Today's code is good to examine to show two quite different approaches and data structures for the same data and computation.

Day 10 - Cathode ray tube

Today's challenge and the companion pbix. Once again, we accumulate a list of states and use these to solve a challenge! Today, the instructions are incrementing and decrementing the value in a register of the CPU of our communication device from a few days ago. We need to deal with instructions taking variable amounts of time and we need to handle state transitions slightly differently. Each step is a cycle, and the value of our state does not change until the end of the cycle, so we need to keep track of start and end state and mostly use the start values.
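A hedged sketch of how that might look in M (instruction strings and names are illustrative): each noop contributes one zero-delta cycle and each addx contributes a zero followed by its operand, after which List.Generate can walk off the start-of-cycle register values.

    let
        // Illustrative instructions; the real list is parsed from the input.
        instructions = {"noop", "addx 3", "addx -5"},
        // noop occupies one cycle and adds nothing; addx occupies two cycles
        // and adds its operand only at the end of the second.
        deltas = List.TransformMany(
            instructions,
            each if _ = "noop" then {0}
                 else {0, Number.From(Text.AfterDelimiter(_, " "))},
            (instruction, delta) => delta
        ),
        // X at the *start* of cycle n is 1 plus every delta applied before n.
        Next = (s as record) as record =>
            [Cycle = s[Cycle] + 1, X = s[X] + deltas{s[Cycle] - 1}],
        startValues = List.Generate(
            () => [Cycle = 1, X = 1],
            each [Cycle] <= List.Count(deltas),
            Next
        )
    in
        startValues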

The first challenge is a trivial measure summing values from specified cycles.

The second challenge tries to throw a wrench into things by talking about two different index bases (0 and 1). We can dispense with the confusion by throwing away everything but the X values and reindexing. Now, each X value represents the position of a sprite to be drawn by the CRT display. If the X value in the register is within 1 of the X position of the ray on the current line it is drawing, then we draw a "#" for our pixel, otherwise a ".". This ends up being a bit of index arithmetic and we get pixels at each location, which we can then read out of a matrix visual.
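Roughly, that index arithmetic might look like this in M (assuming the reindexed list of X values from above; names are mine):

    let
        // X at the start of each cycle, reindexed from 0 (illustrative prefix).
        xValues = {1, 1, 16, 16, 5, 5},
        pixels = List.Transform(
            List.Positions(xValues),
            (i) =>
                let
                    row = Number.IntegerDivide(i, 40), // CRT row (0-5)
                    col = Number.Mod(i, 40),           // ray position within the row
                    sprite = xValues{i}                // sprite centre for this cycle
                in
                    [Row = row, Col = col,
                     Pixel = if Number.Abs(col - sprite) <= 1 then "#" else "."]
        )
    in
        Table.FromRecords(pixels)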

Day 9 - Rope bridge

Today's challenge and the companion pbix. Today's is another two-dimensional problem, which I realized only after the fact ends up being a game of snake. It also differs from prior challenges in that it does not give us a representation of state to start from, leaving us to create our own. As with most two-dimensional challenges, we can imagine state as coordinates on an (X, Y) plane. Our input consists of a list of movement instructions around the plane, with directions up (U), down (D), left (L), and right (R) and varying distances represented by a number indicating how many units to move in the given direction. It is helpful to expand and transform this list into a series of unit movements along either the X or Y axis. With this transformation, each step is represented by an axis and a 1 or -1. This simplification is necessary because the movement instructions only apply directly to one point (knot), and the other knot(s) follow the first according to a king's move rule. The king's move represents the set of valid moves a king could make on an imaginary chess board -- one step in any of eight directions (vertical, horizontal, and diagonal), along with no move. The king's move constraint means that the trailing knot must always remain within a king's move of the head knot.

For the first challenge, we have a head knot and a tail knot moving according to the instructions and following rule. It is incredibly helpful to break apart the movements into a series of helper functions to move the head and then have the tail follow. With these helpers, our states can be computed with a List.Accumulate pattern similar to prior days'. One function that gets a few uses today is List.TransformMany, which is M's version of the flatmap or mapcat you would likely find in other functional languages. The syntax of M's version is slightly strange, taking a mapping function and a subsequent combining function. This leads to a lovely representation of the idea of a cartesian product: List.TransformMany(l1, each l2, (x, y) => [X=x, Y=y]). We use exactly this construction to generate the list of valid touching coordinates for the king's move rule. With the states computed, we can load the data to the model for our distinct count. In this case, we need distinct coordinates, which gives a good example of a DAX distinct count over the combination of multiple columns.
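As a sketch of that measure shape (assuming the loaded table is called Knots with coordinate columns [X] and [Y] -- names are mine, not the pbix's): SUMMARIZE yields the distinct combinations, and COUNTROWS counts them.

    Visited Positions :=
    COUNTROWS ( SUMMARIZE ( Knots, Knots[X], Knots[Y] ) )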

The second challenge expands our number of knots from two to ten. Now, each knot acts as the tail to the knot before it. We can use our helper functions unchanged, but need to modify our representation of state. We could simply track Knot0, Knot1, ... Knot9 explicitly by doing a lot of copy-paste from the first challenge. Rather than do this, I chose to represent the knots as a list of coordinates. Our step function for each state changes to be its own List.Accumulate across this list of coordinates, a quite small change. Our model representation also changes from having a field for each knot's coordinates to having a knot index and a single pair of coordinate columns. The measure's definition is the same, and we can now see the number of positions for each knot.

Day 8 - Tree house

Today's challenge and the companion pbix. Today, we have a two-dimensional array problem. M has no concept of a multi-dimensional array, though we could easily represent it as a list of lists. DAX has no concept of this either. Rather than stick with M, I decided simply to flatten everything to a table and do all my work in DAX. Flattening an N-dimensional array is as simple as creating an index column per dimension and storing a row with each of these indices and the value of the cell at the indices' intersection.
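A minimal M sketch of that flattening for the two-dimensional case (the grid literal and names are illustrative):

    let
        // Illustrative grid; the real input is one digit per character per line.
        lines = {"30373", "25512", "65332"},
        rows = List.TransformMany(
            List.Positions(lines),
            (y) => List.Positions(Text.ToList(lines{y})),
            // One record per (Y, X) pair, holding the height at that cell.
            (y, x) => [Y = y, X = x, Height = Number.From(Text.At(lines{y}, x))]
        )
    in
        Table.FromRecords(rows)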

The first challenge is to calculate whether a cell (tree) is the maximum in one of the four cardinal directions. This maps to the idea of whether the tree is visible from the edge of the forest. We can test for visibility by checking whether a tree is taller than every tree between it and a given edge. If a tree is visible from any direction, it is counted as visible. This requires four tests, each covered by a single (very similar) measure. Then we can simply iterate the entire table with FILTER and count the rows where any of the visibility test measures returned true. By filtering the table, we minimize iteration and avoid double-counting.
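A hedged sketch of what one direction's test and the final count might look like in DAX, assuming a Trees table with [X], [Y], and [Height] columns (these names are mine, and the other three directional measures follow the same pattern):

    Visible From Left :=
    VAR curX = SELECTEDVALUE ( Trees[X] )
    VAR curY = SELECTEDVALUE ( Trees[Y] )
    VAR curH = SELECTEDVALUE ( Trees[Height] )
    VAR tallestBefore =
        CALCULATE (
            MAX ( Trees[Height] ),
            FILTER ( ALL ( Trees ), Trees[Y] = curY && Trees[X] < curX )
        )
    RETURN
        IF ( ISBLANK ( tallestBefore ) || tallestBefore < curH, 1, 0 )

    -- Assuming analogous Right/Top/Bottom measures, one pass avoids double-counting.
    Visible Trees :=
    COUNTROWS (
        FILTER (
            Trees,
            [Visible From Left] = 1 || [Visible From Right] = 1
                || [Visible From Top] = 1 || [Visible From Bottom] = 1
        )
    )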

The second question asks for a score based on how far one can see from the top of a given tree. This turns into a question of which index in each direction represents a tree that is as tall or taller than the basis tree. Again, we create a measure per direction. In this case, the measures differ a little bit more, with MAX required when looking up or left, to find the closest index in that direction, and MIN for the complementary directions. It is fiddly work and easy to swap things the wrong way, so I went through a few more edit and test cycles than I am happy to admit before getting these to work correctly. Then, we must take the product of the visibility in each direction and find the tree with the maximum product. MAXX dispatches this aggregation and handles both the cell-level and totals.
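Sketching one of those directional measures and the final aggregation (same assumed Trees table as above; the other three distance measures are analogous, swapping MIN/MAX and the inequality direction):

    Viewing Distance Left :=
    VAR curX = SELECTEDVALUE ( Trees[X] )
    VAR curY = SELECTEDVALUE ( Trees[Y] )
    VAR curH = SELECTEDVALUE ( Trees[Height] )
    VAR blocker =
        -- The nearest blocking tree to the left has the largest X below curX.
        CALCULATE (
            MAX ( Trees[X] ),
            FILTER (
                ALL ( Trees ),
                Trees[Y] = curY && Trees[X] < curX && Trees[Height] >= curH
            )
        )
    RETURN
        IF ( ISBLANK ( blocker ), curX, curX - blocker )

    -- With all four distance measures defined, MAXX handles detail and total.
    Best Scenic Score :=
    MAXX (
        Trees,
        [Viewing Distance Left] * [Viewing Distance Right]
            * [Viewing Distance Up] * [Viewing Distance Down]
    )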

Both of these offer good examples of defining components in terms of a single row or single dimensional value and aggregating in a way that works for totals. It would be easy to write a query that only computes the aggregated total, or to write a measure that only works for the detail level. Instead, with a bit of care, we can dispatch both with one measure.

Note: I have started grouping weeks of queries in Power Query, so that all prior days' challenges are in the Week 1 group, and today's is in the Week 2 group. Additionally, I have stopped using 'Enter Data' for the inputs, because this seems to exacerbate the memory leak I found yesterday. Instead, I have used a text editor to enter each line of input as an item in a list literal. I will do the same for future challenges, and if the OOM crashes get to be too annoying, I may make a new PBIX with changes to prior solutions' raw inputs as well. My goal in using 'Enter Data' was to match as closely as possible (and with as little effort on my part) what the data would look like if loaded from a file or similar source. I will not do any massaging or data prep when adding raw input queries, so that all processing is in M and DAX code.

Day 7 - Filesystems

Today's challenge and the companion pbix. Today's input represents a session at the command line, navigating and listing directories on the filesystem. Each line of the input represents a command (one of cd or ls) or the output of an ls command. The output of ls is either a directory or a file (with its size). We must parse this input to create a representation of the filesystem to analyze file and directory sizes. One thing that makes this easier is that we do not need to actually traverse a filesystem; we just deal with a list of files and directories and keep some state about where we are.

List.Accumulate allows us to track the current partial state of the filesystem based on lines we have seen. The accumulation yields a table of files and their directory paths. We can then create an unwound parent-child hierarchy (I will eventually write more about this model pattern, but you can see a demo of the idea in this video (at time 2461 seconds, or timestamp 41:01)) of the directory hierarchy to allow easy interaction with the data in DAX. We join the unwound hierarchy with the file data so that we can have a single table for our analysis. In a larger model, it may make sense to dimensionalize this more.
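A simplified sketch of that accumulation in M (the session lines and names are illustrative, and the real code in the pbix handles more than this): the state is the current path plus the files seen so far.

    let
        // Illustrative session lines; the real input has many more cases.
        lines = {"$ cd /", "$ ls", "dir a", "14848514 b.txt", "$ cd a", "29116 f"},
        Step = (state as record, line as text) as record =>
            if Text.StartsWith(line, "$ cd ..") then
                [Path = List.RemoveLastN(state[Path], 1), Files = state[Files]]
            else if Text.StartsWith(line, "$ cd ") then
                [Path = state[Path] & {Text.Range(line, 5)}, Files = state[Files]]
            else if Text.StartsWith(line, "$ ls") or Text.StartsWith(line, "dir ") then
                state
            else
                // An ls output line for a file: record its size and directory.
                [Path = state[Path],
                 Files = state[Files] & {[
                     Dir = Text.Combine(state[Path], "/"),
                     Size = Number.From(Text.BeforeDelimiter(line, " "))
                 ]}],
        final = List.Accumulate(lines, [Path = {}, Files = {}], Step)
    in
        Table.FromRecords(final[Files])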

The first challenge requires us to find all directories whose recursive size (the size of all files in the directory and its subdirectories, recursively (note we do not need to do any literal recursion in our code)) is below a threshold, which is easily accomplished with a SUM measure and a visual-level filter. The second challenge requires us to find the smallest directory which will free enough space for us. The latter takes a few steps of mostly arithmetic and filtering, which again yields the answer easily. The reason our analysis is so straightforward here is entirely due to the data model, in this case the unwound parent-child hierarchy. It is always worth spending time to model your data well.

Side note: This challenge required a lot more infrastructure of functions to parse and structure the input data than prior days' challenges. When I am working on code like this, I am typically bouncing into and out of the advanced editor and writing most of the code in there. In so doing today, I identified a massive memory leak leading to several out-of-memory crashes of my PBI process. This bug is reported on the Power BI Community Issues forum.

Day 6 - Tuning trouble

Today's challenge and the companion pbix. Today's input is a single string. The goal is to find a substring with no repeated characters -- 4-long for the first half challenge and 14-long for the second half.

The approach is pretty naive and depends on one or both of the following (though the Power Query runtime does not make it easy to validate laziness):

  • lazy list semantics to avoid processing all the data
  • the string is small, so I do not care about efficiency in the processing anyway.

First, we split the string into a list of one-character strings to process, because we need to do a distinct operation, which only exists in the List functions. We look at each index in the list and take a sub-list of appropriate length, testing it for distinctness. That is all.
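The whole approach fits in a few lines of M; a sketch using the example string from the problem statement (names are mine):

    let
        input = "mjqjpqmgbljsphdztnvjfqwrcgsmlb", // illustrative example string
        chars = Text.ToList(input),
        FindMarker = (size as number) as number =>
            // First index where the next <size> characters are all distinct;
            // the puzzle answer is that index plus <size>.
            List.First(
                List.Select(
                    List.Positions(chars),
                    (i) => List.Count(List.Distinct(List.Range(chars, i, size))) = size
                )
            ) + size
    in
        [FirstHalf = FindMarker(4), SecondHalf = FindMarker(14)]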

Day 5 - Supply stacks

Today's challenge and the companion pbix. Today's input is in two parts, the first an ASCII art diagram of some stacked crates, and the second a list of movements of some number of crates from one stack to another. The challenges revolve around representing the state of the stacks after all movements have been made. Each movement can comprise many crates being moved. The first challenge assumes all crates are moved individually, such that their order in the stack is reversed after all are moved. The second challenge assumes all crates are moved together, ensuring that their order remains the same after movement.

The representation of the stacks is easiest in M as a list of crates. I chose to represent the top of the stack as the first element in the list, because our outputs and our movements all operate from the top of the stack. We stay in the domain of lists and records, because M has a dearth of functions which can iterate a table, but plenty of options for lists. Aside: a list of homogeneous records is nearly isomorphic to a table in M, and round-tripping between these two representations can be achieved with Table.ToRecords and Table.FromRecords; one should prefer these two functions and the list-of-records representation for much structured data.

To parse the stack diagram, I started at the stack numbers (the last line of the diagram) and used these to find the indices that would hold crates for that stack; these indices turn visual alignment into a code-friendly representation. With the stack diagram parsed, I then need a function that can take a state of the stacks (the very thing I parsed) and a single movement and give me the new state of the stacks. With the initial state and a step function defined, List.Accumulate does the rest of the work of stepping through every movement and threading the state of the stacks through each of these. Some table munging gets me to the DAX-friendly representation that includes indices for each state and the location of each crate for each stack at that state. With a table of every state-stack-crate-position tuple, we can find the top item of each stack by filtering to CrateIndex=0 and selecting the appropriate StateIndex.
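A minimal sketch of that step function in M, using the example stacks from the problem statement (names are mine; for the second half, the order-preserving variant drops the List.Reverse -- see the side note below):

    let
        // Stacks as lists of crates, top of stack first (illustrative state).
        initial = {{"N", "Z"}, {"D", "C", "M"}, {"P"}},
        // One movement: move <Count> crates from <From> to <To> (1-based).
        Step = (stacks as list, move as record) as list =>
            let
                moved = List.Reverse(List.FirstN(stacks{move[From] - 1}, move[Count]))
            in
                List.Transform(
                    List.Positions(stacks),
                    (i) =>
                        if i = move[From] - 1 then List.RemoveFirstN(stacks{i}, move[Count])
                        else if i = move[To] - 1 then moved & stacks{i}
                        else stacks{i}
                ),
        moves = {[Count = 1, From = 2, To = 1], [Count = 3, From = 1, To = 3]},
        final = List.Accumulate(moves, initial, Step)
    in
        final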

A side note: M's laziness bit me on the second half of today's challenge. In the first half, we need to reverse the order of the crates moved -- trivial with List.Reverse. To achieve the correct ordering for the second half, I merely removed that function. The performance of loading the second half table was abysmal, dropping to less than one row per second after about 7,000 rows loaded. It turns out that List.Reverse fully realizes its result, which meant the intermediate stacks were eagerly evaluated. To regain performance for the second half, I replaced List.Reverse with List.Buffer, whose only purpose is to fully realize its argument. With this modification, the two halves' solutions perform equivalently.

Day 4 - Camp cleanup

Today's challenge and the companion pbix. Today, we receive lines of delimited text representing ranges, e.g., 1-5,4-7. The comma separates two elves in a pair. The dashes separate a lower and upper bound of section IDs that the respective elves are assigned to clean. The challenges both revolve around identifying overlaps between the two elves in a pair. The first challenge looks for total overlap -- one range is completely contained within the other. The second looks for partial overlap.

We could use set logic again, similarly to yesterday's challenge. Such an approach would be entirely valid, encoding the set of IDs as a list of IDs per elf, and in DAX one row per elf-ID pair. This would be a dense representation. By working only with the boundaries of the range, we have a sparse representation, which is more efficient in storage; depending on the size of the sets, it is also much more efficient in comparisons. I stuck with the sparse representation. Thus, the challenges are just a matter of arranging some inequality tests. Again, I persisted much of the work into fields in the table. The results were simple row counts.
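The inequality tests themselves are compact; a sketch in M over assumed boundary fields Lo1/Hi1/Lo2/Hi2 (field names are mine, not the pbix's):

    let
        // One pair per record; bounds parsed from lines like "2-8,3-7".
        ContainsFully = (p as record) as logical =>
            // One range completely contains the other, in either direction.
            (p[Lo1] <= p[Lo2] and p[Hi1] >= p[Hi2])
            or (p[Lo2] <= p[Lo1] and p[Hi2] >= p[Hi1]),
        Overlaps = (p as record) as logical =>
            // The ranges share at least one section ID.
            p[Lo1] <= p[Hi2] and p[Lo2] <= p[Hi1],
        example = [Lo1 = 2, Hi1 = 8, Lo2 = 3, Hi2 = 7]
    in
        [FirstHalf = ContainsFully(example), SecondHalf = Overlaps(example)]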

Probably the most interesting part of today's challenge is the risk of double-counting, depending on how you track the overlaps. E.g., there are some pairs that have identical ranges of IDs; if you sum the count of pairs where elf one's range contains elf two's and the count of pairs where elf two's range contains elf one's, then you would double-count each of these pairs.

Today's challenge is also one where I feel the constraints of the capabilities of M and especially DAX. One could imagine creating a set type that encapsulates all of the various inequalities necessary to compute the set functions for the sparse representation. Such a thing could be done in M, though it would be clunky without a module or class system. Such a thing is impossible in DAX -- one must always perform the inequality tests directly on the boundaries raw, as it were.

Day 3 - Rucksacks

Today's challenge and the companion pbix. Today's data represent packed rucksacks, which made me worry we were going to jump straight into some dynamic programming and the knapsack problem, which in turn got me worried for PQ/M's suitability (and would rule DAX right out). Luckily, we are working on a much simpler rucksack problem, no knapsacks in sight! Each line of data is a string (case sensitive), with each letter representing an item in the rucksack. The first half of the characters are in the first compartment and the latter half in the second compartment. Each letter is assigned a priority value, and the challenges mostly focused on set operations.

The first half required finding the intersection between the two compartments of a single rucksack. Some brief ETL to split the string out to one row per item was easy. Out of caution, I included index columns to represent the original position of each rucksack (0-indexed, which was later validated) and also of each item within the rucksack. In DAX, I could then define a measure that works for one rucksack to find the intersection value. With such a per-key measure, we can get a total by wrapping it in an iterator, in this case SUMX, to add up each per-key value.

The second half required that we group across multiple rucksacks and find a three-way intersection. Each rucksack is assigned to a group based on its position in the raw data, with the first three lines in the first group, the next three in the second group, and so on. With a 0-based index, truncating division gives a number identifying the group any given index belongs to. Luckily, I used 0-indexing for my rucksack numbers. With the group assignment out of the way, I then needed to compute a three-way intersection of the values in each member's rucksack. I could have done something similar in DAX after deriving the group number, but opted to calculate this dimension attribute in PQ/M. M includes set operation functions in its List. family, and we want List.Intersect. To use List.Intersect, we need a list of lists. This lets us use a cute little double grouping operation. The first Table.Group groups by {"ElfGroup", "RuckNumber"}, yielding one row per rucksack and a column which contains a list of item values that exist in that rucksack. When we Table.Group a second time, we group only by {"ElfGroup"} and our grouping function is List.Intersect, yielding a table with one row per group and the value that existed in all rucksacks in that group.
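A condensed sketch of that double grouping in M (the Items literal and column names are illustrative stand-ins for the real query):

    let
        // Assumed shape: one row per item, with the 0-based rucksack index.
        Items = Table.FromRecords({
            [RuckNumber = 0, Item = "v"], [RuckNumber = 0, Item = "J"],
            [RuckNumber = 1, Item = "j"], [RuckNumber = 1, Item = "v"],
            [RuckNumber = 2, Item = "P"], [RuckNumber = 2, Item = "v"]
        }),
        // Truncating division assigns each rucksack to a group of three.
        withGroups = Table.AddColumn(Items, "ElfGroup",
            each Number.IntegerDivide([RuckNumber], 3), Int64.Type),
        // First grouping: one row per rucksack, holding its distinct item list.
        perRuck = Table.Group(withGroups, {"ElfGroup", "RuckNumber"},
            {{"ItemSet", each List.Distinct([Item]), type list}}),
        // Second grouping: intersect the item lists within each group of three.
        badges = Table.Group(perRuck, {"ElfGroup"},
            {{"Badge", each List.First(List.Intersect([ItemSet])), type text}})
    in
        badges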

Note that the only place we care about the size of the group in the second half is when we compute the group numbers. Everything downstream of that takes advantage of variadic M functions and the structure of the data. This is something that is useful to consider when building something -- what assumptions of N have you made? Usually the only numbers worth considering (when speaking of cardinality) are 0, 1, and many.

Day 2 - Rock paper scissors

Today's challenge and the companion pbix. As usual, the input would normally come from a text file, but I paste it into an 'Enter data' query. Each line contains a 3-long string, with the first character drawn from 'A', 'B', and 'C', followed by a space, and the third character drawn from 'X', 'Y', and 'Z'. The two halves of today's challenge offer two different interpretations of how these map to RPS plays. In each half, the first letter maps to my opponent's play, with 'A', 'B', and 'C' representing 'Rock', 'Paper', and 'Scissors', respectively. In the first half, 'X', 'Y', and 'Z' map to the same for me. In the second half, 'X', 'Y', and 'Z' map to my loss, draw, or win. Using these rules and a scoring rubric based on my play and the result, we have to calculate the total score I would achieve.

My first instinct was to create a lookup table for myself of what the input strings mean and the score for each. This was easy enough to write by hand with only nine possible inputs. This ended up serving me pretty well, because the second half scores could be derived by a self-join against this same lookup table. I already had ([OpponentPlay], [MyPlay], [Result], [Score]). The reinterpretation for the second half meant I could calculate a new [Result] and then join on ([OpponentPlay], [Result]) to get my new play and score for the second half. A good lookup or reference table can take you very far. The results were both simple sums in cards.
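A sketch of that self-join shape in M (the rows and names are illustrative, and Table.NestedJoin stands in for however the pbix actually performs the join):

    let
        // Hand-written lookup over all nine inputs (three rows shown).
        lookup = Table.FromRecords({
            [OpponentPlay = "Rock", MyPlay = "Rock", Result = "Draw", Score = 4],
            [OpponentPlay = "Rock", MyPlay = "Paper", Result = "Win", Score = 8],
            [OpponentPlay = "Rock", MyPlay = "Scissors", Result = "Lose", Score = 3]
            // ... six more rows for Paper and Scissors ...
        }),
        // Second half: each line now fixes (OpponentPlay, Result) instead of
        // (OpponentPlay, MyPlay); join back into the lookup to recover the rest.
        secondHalf = Table.FromRecords({[OpponentPlay = "Rock", Result = "Win"]}),
        rejoined = Table.ExpandTableColumn(
            Table.NestedJoin(
                secondHalf, {"OpponentPlay", "Result"},
                lookup, {"OpponentPlay", "Result"},
                "Match", JoinKind.Inner
            ),
            "Match", {"MyPlay", "Score"}
        )
    in
        rejoined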

Day 1 - Elf calories

Today's challenge can be found here. The crux of this challenge is parsing an input file of data where each row represents the number of calories in a given food item. Food items belong to unnamed elves, with each elf's items located contiguously in the data. Elves' items are separated by an empty line. To keep the companion pbix portable, I have pasted the data into an 'Enter data' query and the empty lines come through as nulls.

The interesting bit is to group data in a table not by some data element, but by position. Table.PositionOf is the critical function here; it tells us the row index at which each null value is located. The rest is pretty trivial list manipulation. The answers of max calories for an elf and the sum of the calories for the top three are dispatched with some very simple visuals. This provides a good example of a short, throwaway analysis.
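A hedged sketch of that positional grouping in M (the calorie values and names are illustrative):

    let
        // Calories column with nulls separating elves (illustrative values).
        raw = Table.FromColumns({{1000, 2000, 3000, null, 4000, null, 5000, 6000}}, {"Calories"}),
        // Row indices of every blank (null) line.
        breaks = Table.PositionOf(raw, [Calories = null], Occurrence.All),
        // Split the calorie list at those positions, one sub-list per elf.
        bounds = {-1} & breaks & {Table.RowCount(raw)},
        elves = List.Transform(
            {0 .. List.Count(bounds) - 2},
            (i) => List.Range(raw[Calories], bounds{i} + 1, bounds{i + 1} - bounds{i} - 1)
        ),
        totals = List.Transform(elves, List.Sum)
    in
        [MaxElf = List.Max(totals), TopThree = List.Sum(List.MaxN(totals, 3))]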

Conclusion

Not much to say! As always, feel free to get in touch.