Source: Day 5: Print Queue
Full solution for today (spoilers!).
Part 1
The input is a list of pairs of the form a|b, which defines that b must not come before a; then an empty line; and then a list of values a,b,c,d (one list per line). For each line that is valid for all given a|b rules, sum the middle numbers.
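For example, a tiny made-up input (not the actual puzzle example) could look like:
1|2
2|3
1|3

1,2,3
1,3,2
The first update satisfies every applicable rule, so it contributes its middle value (2); the second has 3 before 2, violating 2|3, so for part 1 it contributes nothing.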
We’ll go ahead and start with a hash of a -> Set(b):
#[derive(Debug, Default, Clone)]
pub struct Ordering {
    data: hashbrown::HashMap<u32, hashbrown::HashSet<u32>>,
}

impl Ordering {
    pub fn new() -> Self {
        Self {
            data: hashbrown::HashMap::new(),
        }
    }

    pub fn insert(&mut self, a: u32, b: u32) {
        self.data.entry(a).or_default().insert(b);
    }
}
That lets us define can_precede:
impl Ordering {
    pub fn can_precede(&self, a: u32, b: u32) -> bool {
        !self.data.contains_key(&b) || !self.data[&b].contains(&a)
    }
}
If b never appears on the left side of a rule, then a is always fine. Otherwise, a can precede b only if there’s no b|a rule requiring b to come first.
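As a quick sanity check with made-up values (1, 2, and 3 aren’t from the real input):
let mut ordering = Ordering::new();
ordering.insert(1, 2); // rule 1|2: 2 must not come before 1

assert!(ordering.can_precede(1, 2)); // no rule 2|1, so this is fine
assert!(ordering.can_precede(2, 3)); // 3 never appears on the left of a rule, so anything can precede it
assert!(!ordering.can_precede(2, 1)); // rule 1|2 forbids 2 coming before 1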
This then tells us if a given Vec is sorted by this Ordering:
impl Ordering {
    pub fn validates(&self, list: &[u32]) -> bool {
        list.iter().is_sorted_by(|&a, &b| self.can_precede(*a, *b))
    }
}
Which in turn basically solves the problem for us:
#[aoc(day5, part1, v1)]
fn part1_v1((ordering, data): &(Ordering, Vec<Vec<u32>>)) -> u32 {
    data.iter()
        .filter(|list| ordering.validates(list))
        .map(|list| list[list.len() / 2])
        .sum()
}
It’s fun when most of the work is setting up abstractions.
$ cargo aoc --day 5 --part 1
AOC 2024
Day 5 - Part 1 - v1 : 4924
generator: 119.083µs,
runner: 19.25µs
Quick enough!
Parsing
Not super relevant to the problem (especially since it’s a fairly tame format so far as puzzle input goes), but here’s my nom parsing function:
#[aoc_generator(day5)]
pub fn parse(input: &str) -> (Ordering, Vec<Vec<u32>>) {
    use nom::{
        character::complete::{self, newline},
        multi::{many1, separated_list1},
        sequence::separated_pair,
    };

    fn parse_ordering(input: &str) -> nom::IResult<&str, Ordering> {
        let (rest, ls) = separated_list1(
            newline,
            separated_pair(complete::u32, complete::char('|'), complete::u32),
        )(input)?;

        let mut ordering = Ordering::new();
        for (a, b) in ls {
            ordering.insert(a, b);
        }

        Ok((rest, ordering))
    }

    fn parse_list(input: &str) -> nom::IResult<&str, Vec<u32>> {
        separated_list1(complete::char(','), complete::u32)(input)
    }

    fn parse_input(input: &str) -> nom::IResult<&str, (Ordering, Vec<Vec<u32>>)> {
        let (input, ordering) = parse_ordering(input)?;
        let (input, _) = many1(newline)(input)?;
        let (input, data) = separated_list1(newline, parse_list)(input)?;

        Ok((input, (ordering, data)))
    }

    parse_input(input).unwrap().1
}
It actually runs about 20% faster than a .lines() + .split_once('|') approach, which I found interesting. And at least for me, it’s just as easy to read, if not a bit clearer.
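For reference, the .lines() + .split_once('|') approach would be something like this (a rough sketch, not necessarily the exact code I benchmarked):
// Hand-rolled parser: split on the blank line, then parse rules and lists separately.
pub fn parse_manual(input: &str) -> (Ordering, Vec<Vec<u32>>) {
    let (rules, lists) = input.split_once("\n\n").expect("missing blank line");

    let mut ordering = Ordering::new();
    for line in rules.lines() {
        let (a, b) = line.split_once('|').expect("rule missing '|'");
        ordering.insert(a.parse().unwrap(), b.parse().unwrap());
    }

    let data = lists
        .lines()
        .map(|line| line.split(',').map(|v| v.parse().unwrap()).collect())
        .collect();

    (ordering, data)
}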
Is it transitive?
Second aside: one thing that really tripped me up when running this problem was the assumption that the rules in the first part would be defined such that they were transitive. So if you have:
a|b
b|c
You can infer a|c. Which works perfectly well for their given example… but doesn’t hold for the actual puzzle input. It turns out you can have these rules:
a|b
b|c
c|a
Thus forming a loop. It doesn’t make any sense if you have all three of a, b, c in an input (I guess it could just always be invalid), but it works out perfectly if you only ever include no more than two of them: a,b; b,c; and c,a are all perfectly valid to have within a given list.
Spent more time than I wanted to track that one down:
impl Ordering {
    // This was my original (more complicated!) version, but it's not actually correct
    /*
    Imagine this input:

    98|51
    51|22
    22|98

    This would imply both that 98 is before 51 but that 51 is before 22 which is before 98.
    But... that doesn't make any sense... *unless* you can never have a valid list that has all three.
    If you have 98 and 51, 98 goes first. But if you have 51,22 or 22,98 those are correct.

    I expect this would do funny things to sort_by if you end up with all three 😄
    */

    // To precede, either a is directly before b or recursively before it
    pub fn can_precede_transitive(&self, a: u32, b: u32) -> bool {
        self.data.contains_key(&a)
            && (self.data[&a].contains(&b) || self.data[&a].iter().any(|&c| self.can_precede(c, b)))
    }

    pub fn can_precede_transitive_path(&self, a: u32, b: u32) -> Option<Vec<u32>> {
        if !self.data.contains_key(&a) {
            return None;
        }

        if self.data[&a].contains(&b) {
            return Some(vec![a, b]);
        }

        for &c in &self.data[&a] {
            if let Some(mut path) = self.can_precede_transitive_path(c, b) {
                path.insert(0, a);
                return Some(path);
            }
        }

        None
    }
}
And bin/day5-check.rs:
use aoc2024::day5;

// Used to determine/verify that the ordering is *not* transitive
fn main() {
    let filename = std::env::args().nth(1).expect("missing filename argument");
    let content = std::fs::read_to_string(filename).expect("cannot read file");

    let (ordering, _) = day5::parse(&content);

    // values(): every page number that appears in any rule (helper not shown above)
    for a in ordering.values() {
        for b in ordering.values() {
            let precedes = ordering.can_precede(a, b);
            let precedes_transitive = ordering.can_precede_transitive(a, b);

            if precedes_transitive && !precedes {
                println!("{a} {b} {:?}", ordering.can_precede_transitive_path(a, b));
            }
        }
    }
}
That was how I actually verified that you can have rules that don’t work transitively, but are perfectly valid for the given input.
I’ll admit, that’s one of the things that both annoys me and perfectly encapsulates software development: occasionally ambiguous specifications. And/or specifications written in English, but without concrete and comprehensive examples.
The second and third updates are also in the correct order according to the rules. Like the first update, they also do not include every page number, and so only some of the ordering rules apply - within each update, the ordering rules that involve missing page numbers are not used.
Which does technically cover that edge case, but so it goes.
Whee. 😄
Part 2
For each list that is not currently valid, sort it by the given ordering and then sum the middle values.
#[aoc(day5, part2, v1)]
fn part2_v1((ordering, data): &(Ordering, Vec<Vec<u32>>)) -> u32 {
    data.iter()
        .filter(|list| !ordering.validates(list))
        .map(|list| {
            // TODO: I don't want to have to clone this here, but AOC requires it
            let mut list = list.clone();
            list.sort_by(|&a, &b| {
                if ordering.can_precede(a, b) {
                    std::cmp::Ordering::Less
                } else {
                    std::cmp::Ordering::Greater
                }
            });
            list
        })
        .map(|list| list[list.len() / 2])
        .sum()
}
Nothing much to see here; we filter out the ones that are already sorted, sort_by the ones that are left, and map to the middle values.
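Since we only ever need the middle value, one tweak I haven’t benchmarked would be select_nth_unstable_by, which partitions around the nth element instead of fully sorting (middle_after_sort is just a throwaway helper name here):
// Find the middle element under the rule ordering without fully sorting the list
fn middle_after_sort(ordering: &Ordering, list: &[u32]) -> u32 {
    let mut list = list.to_vec();
    let mid = list.len() / 2;
    let (_, middle, _) = list.select_nth_unstable_by(mid, |&a, &b| {
        if ordering.can_precede(a, b) {
            std::cmp::Ordering::Less
        } else {
            std::cmp::Ordering::Greater
        }
    });
    *middle
}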
As mentioned, I do have one TODO: because of how the cargo-aoc crate passes input, you cannot directly mutate the input (it’s always a &T for whatever T the generator function returns). This makes sense when it comes to benchmarking: you can’t mutate the input if you’re going to run the algorithm on that input 10k times. 😄
Anyways. Still quick!
$ cargo aoc --day 5 --part 2
AOC 2024
Day 5 - Part 2 - v1 : 6085
generator: 124.917µs,
runner: 207.5µs
Onward.
Benchmarks
$ cargo aoc bench --day 5
Day5 - Part1/v1 time: [8.1412 µs 8.2175 µs 8.3300 µs]
Day5 - Part2/v1 time: [59.691 µs 60.810 µs 62.017 µs]
It’s interesting how much faster that is than one run. Some impressive caching going on there I expect.
Optimization 1: Drop the hashmap
Okay, hashbrown is fast enough, but we don’t really need it if we want this solution to be really fast. Instead, let’s just throw some memory at it. We know (from looking at our input) that all values will be two digits, so let’s just initialize a 100*100 array of booleans:
#[derive(Debug, Clone)]
pub struct Ordering {
    data: [bool; 100 * 100],
}

impl Ordering {
    pub fn new() -> Self {
        Self {
            data: [false; 100 * 100],
        }
    }

    pub fn insert(&mut self, a: u32, b: u32) {
        self.data[(a as usize) * 100 + (b as usize)] = true;
    }

    pub fn can_precede(&self, a: u32, b: u32) -> bool {
        !self.data[(a as usize) * 100 + (b as usize)]
    }

    pub fn validates(&self, list: &[u32]) -> bool {
        list.iter().is_sorted_by(|&a, &b| self.can_precede(*a, *b))
    }
}
Nothing else actually needs to change (although this can certainly panic! if any input value is 100 or more).
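If you wanted to guard against that, a small defensive variant (not part of my actual solution) could look like:
impl Ordering {
    // Hypothetical guard: ignore rules whose values don't fit in the 100x100 grid
    // instead of panicking on an out-of-bounds index.
    pub fn insert_checked(&mut self, a: u32, b: u32) {
        if a < 100 && b < 100 {
            self.data[(a as usize) * 100 + (b as usize)] = true;
        }
    }
}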
$ cargo aoc bench --day 5
Day5 - Part1/v1 time: [229.10 ns 230.04 ns 231.17 ns]
Day5 - Part2/v1 time: [14.302 µs 14.369 µs 14.437 µs]
That’s … nanoseconds. Whee! And a 4x speedup on part 2.
Can we go further?
Optimization 2: bitvec
I’m not 100% sure how optimized Rust’s [bool] is under the hood, but what if we directly use a crate that’s designed to work with a vec of bits? bitvec!
use bitvec::prelude::*;

#[derive(Debug, Clone)]
pub struct Ordering {
    data: BitVec,
}

impl Ordering {
    pub fn new() -> Self {
        Self {
            data: bitvec![0; 100 * 100],
        }
    }

    pub fn insert(&mut self, a: u32, b: u32) {
        self.data.set((a as usize) * 100 + (b as usize), true);
    }

    pub fn can_precede(&self, a: u32, b: u32) -> bool {
        !self.data[(a as usize) * 100 + (b as usize)]
    }

    pub fn validates(&self, list: &[u32]) -> bool {
        list.iter().is_sorted_by(|&a, &b| self.can_precede(*a, *b))
    }
}
Basically the same; only a few names have changed and IndexMut isn’t defined (hence the .set call instead of indexing).
$ cargo aoc bench --day 5
Day5 - Part1/v1 time: [304.75 ns 306.14 ns 307.66 ns]
Day5 - Part2/v1 time: [20.223 µs 20.430 µs 20.766 µs]
Huh, well that’s certainly interesting. The runtimes are a bit slower. It seems that the Rust folks did a pretty good job optimizing [bool] already!
One interesting thing to note though:
# With [bool]
cargo aoc --day 5
AOC 2024
Day 5 - Part 1 - v1 : 4924
generator: 101.291µs,
runner: 3.292µs
Day 5 - Part 2 - v1 : 6085
generator: 69.458µs,
runner: 22.583µs
# With bitvec
cargo aoc --day 5
AOC 2024
Day 5 - Part 1 - v1 : 0
generator: 83.584µs,
runner: 1.583µs
Day 5 - Part 2 - v1 : 11009
generator: 62.041µs,
runner: 34.834µs
The generator part (that does the parsing and building of the initial bitvec) is ~20% faster. It would be interesting to dig into exactly why that is!
Optimization (attempt) 3: A vec of pairs
Okay, let’s keep going. Next attempt: let’s see if a vec of (a, b) does any better. In theory, we can store the whole vec in memory, but we do have to scan through O(n) (~1k) entries on each search.
#[derive(Debug, Clone)]
pub struct Ordering {
    data: Vec<(u32, u32)>,
}

impl Ordering {
    pub fn new() -> Self {
        Self {
            data: Vec::with_capacity(2000),
        }
    }

    pub fn insert(&mut self, a: u32, b: u32) {
        self.data.push((a, b));
    }

    pub fn can_precede(&self, a: u32, b: u32) -> bool {
        // Linear scan over every rule for each comparison
        self.data.contains(&(a, b))
    }

    pub fn validates(&self, list: &[u32]) -> bool {
        list.iter().is_sorted_by(|&a, &b| self.can_precede(*a, *b))
    }
}
Performance:
$ cargo aoc bench --day 5
Day5 - Part1/v1 time: [380.36 µs 381.16 µs 382.14 µs]
Day5 - Part2/v1 time: [2.2590 ms 2.2695 ms 2.2801 ms]
Part 1 is a good deal slower, and part 2… ouch! That’s many times slower again. It turns out that sorting has to do a bunch of comparisons, which actually makes that O(n) scan meaningful…
Optimization (attempt) 4: Sorted vec of pairs
I don’t think this will actually help, but one last try down this path: let’s store the (a, b) list sorted and use a binary search for each comparison. It should be O(log n) instead (roughly 10 comparisons, since N ≅ 1k)!
#[derive(Debug, Clone)]
pub struct Ordering {
    data: Vec<(u32, u32)>,
}

impl Ordering {
    pub fn new() -> Self {
        Self {
            data: Vec::with_capacity(2000),
        }
    }

    pub fn insert(&mut self, a: u32, b: u32) {
        let point = (a, b);
        let index = self.data.binary_search(&(a, b)).unwrap_or_else(|e| e);
        self.data.insert(index, point);
    }

    pub fn can_precede(&self, a: u32, b: u32) -> bool {
        self.data.binary_search(&(a, b)).is_ok()
    }

    pub fn validates(&self, list: &[u32]) -> bool {
        list.iter().is_sorted_by(|&a, &b| self.can_precede(*a, *b))
    }
}
That binary_search is an interesting API. It returns Ok(index) if the value was found and Err(index), holding the insertion point, if not; thus the unwrap_or_else.
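A quick illustration on a plain sorted slice:
let v = [1, 3, 5];
assert_eq!(v.binary_search(&3), Ok(1)); // found at index 1
assert_eq!(v.binary_search(&4), Err(2)); // not found; Err holds the insertion point
assert_eq!(v.binary_search(&4).unwrap_or_else(|e| e), 2);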
$ cargo aoc bench --day 5
Day5 - Part1/v1 time: [42.928 µs 43.629 µs 44.586 µs]
Day5 - Part2/v1 time: [245.26 µs 248.05 µs 251.44 µs]
Well, that’s ~10x as fast as the unsorted vec! But still ~10x slower than the bitvec. So I guess no luck either way!
Overall timing graph
Since I’ve done a chunk of optimizations for this one, here are the benchmarks in a nice table.
Version | Part 1 | Part 2 |
---|---|---|
hashbrown | 8.2175 µs | 60.810 µs |
[bool] | 230.04 ns | 14.369 µs |
bitvec ¹ | 306.14 ns | 20.430 µs |
Vec<(a, b)> | 381.16 µs | 2.2695 ms |
binary_search | 43.629 µs | 248.05 µs |
¹ But it does load faster, so it’s better if you include parsing time.