AoC 2023 Day 14: Spininator

Source: Day 14: Parabolic Reflector Dish

Full solution for today (spoilers!)

Part 1

Given a grid of # and O (among empty . points) where O can move, slide each O as far north as it can. Score each based on how far north it is.

Cellular automata!

And an easier one too, it doesn’t actually slide side to side, just in one direction.

Types and parsing

Okay, let’s do it. Start with Point and Bounds from yesterday and add today’s simulation:

#[derive(Debug, Clone)]
pub struct Platform {
    pub bounds: Bounds,
    pub round_rocks: Vec<Point>,
    pub cube_rocks: FxHashSet<Point>,
}

impl From<&str> for Platform {
    fn from(input: &str) -> Self {
        let mut bounds = Bounds::default();
        let mut round_rocks = Vec::default();
        let mut cube_rocks = FxHashSet::default();

        for (y, line) in input.lines().enumerate() {
            for (x, c) in line.chars().enumerate() {
                bounds.include(Point {
                    x: x as isize,
                    y: y as isize,
                });
                match c {
                    'O' => {
                        round_rocks.push(Point {
                            x: x as isize,
                            y: y as isize,
                        });
                    }
                    '#' => {
                        cube_rocks.insert(Point {
                            x: x as isize,
                            y: y as isize,
                        });
                    }
                    _ => {}
                }
            }
        }

        Self {
            bounds,
            round_rocks,
            cube_rocks,
        }
    }
}

Simulation

So how does that translate to our problem?

fn main() -> Result<()> {
    let stdin = io::stdin();
    let input = io::read_to_string(stdin.lock())?;
    let mut platform = Platform::from(input.as_str());

    // Let the rocks slide until they stop moving
    loop {
        let mut changed = false;

        for i in 0..platform.round_rocks.len() {
            // Get current point; if we're at the top already, skip
            let r = platform.round_rocks[i];
            let next = Point { x: r.x, y: r.y - 1 };

            // Check that the next point is available
            if !platform.bounds.contains(&next)
                || platform.round_rocks.contains(&next)
                || platform.cube_rocks.contains(&next)
            {
                continue;
            }

            // If we get here, we can move; do it
            platform.round_rocks[i].y = next.y;
            changed = true;
        }

        if !changed {
            break;
        }
    }

    // Calculate final score
    let result = platform
        .round_rocks
        .iter()
        .map(|r| platform.bounds.max_y - r.y + 1)
        .sum::<isize>();

    println!("{result}");
    Ok(())
}

Not bad.

It’s not functional style, since if you iter over platform.round_rocks, things get a bit messy.

One gotcha that I’m sure we’re going to run into is the platform.(round|cube)_rocks.contains. Since we’re storing it as a Vec, this is O(n) . But we’ll get back to that.

Part 2

Instead of just sliding north, slide first north, then east, then south, then west. Repeat this 1 billion times.

Whelp. There it is. Performance is going to matter.

Solution: Spinning each way

To make this work, we’ll define a few constants and Add and Sub on Point:

impl Point {
    pub const NORTH: Point = Point { x: 0, y: -1 };
    pub const SOUTH: Point = Point { x: 0, y: 1 };
    pub const EAST: Point = Point { x: 1, y: 0 };
    pub const WEST: Point = Point { x: -1, y: 0 };
}

impl std::ops::Add<Point> for Point {
    type Output = Point;

    fn add(self, rhs: Point) -> Self::Output {
        Point {
            x: self.x + rhs.x,
            y: self.y + rhs.y,
        }
    }
}

impl std::ops::Sub<Point> for Point {
    type Output = Point;

    fn sub(self, rhs: Point) -> Self::Output {
        Point {
            x: self.x - rhs.x,
            y: self.y - rhs.y,
        }
    }
}

From there, we’re going to fairly directly translate. For each cycle + each direction, slide in the right direction until we are stable:

for cycle in 0..=TARGET {
    // The rocks will slide N, W, S, E
    for direction in [Point::NORTH, Point::WEST, Point::SOUTH, Point::EAST] {
        // Let the rocks slide until they stop moving
        loop {
            let mut changed = false;

            for i in 0..platform.round_rocks.len() {
                let r = platform.round_rocks[i];
                let next = r + direction;

                // Check that the next point is available
                if !platform.bounds.contains(&next)
                    || platform.round_rocks.contains(&next)
                    || platform.cube_rocks.contains(&next)
                {
                    continue;
                }

                // If we get here, we can move; do it
                platform.round_rocks[i].x = next.x;
                platform.round_rocks[i].y = next.y;

                changed = true;
                break;
            }

            if !changed {
                break;
            }
        }
    }
}

I have no idea how slow this is… but let’s go with very. I’m never going to wait a billion cycles for this.

Optimization 1: Cycle Detection

Luckily, as was the case in [[AoC 2023 Day 8: Mazinator|day 8]], we need to use some cycle detection.

let mut seen = FxHashMap::default();

for cycle in 0..=TARGET {
    // Check if we've seen this platform state before (it's deterministic, thus cycling)
    // Keep going until the cycle is in the same phase as the TARGET
    let key = platform.to_string();
    if let Some(cycle_start) = seen.get(&key) {
        let cycle_length = cycle - cycle_start;

        if (TARGET - cycle_start) % cycle_length == 0 {
            break;
        }
    }
    seen.insert(key, cycle);

    // ...

As mentioned, we’re going to keep a map of the current platform to when we saw it before. If we see the same platform twice, we have a cycle. We don’t need to immediately stop then, but rather stop once we’re at the same point in that cycle that TARGET is.

Hashing based on platform.to_string is a bit ugly, but it works well enough. More of the time is spent on updating than to_string, so no worries.

Still pretty slow.

Optimization 2: Data Structures

Okay, the next problem we’re having is that contains on a Vec is a bit slow. So let’s update our Platform struct a bit:

#[derive(Debug, Clone)]
pub struct PlatformV2 {
    pub bounds: Bounds,
    pub round_rocks: Vec<Point>,
    pub occupied: FxHashSet<Point>,
}

impl From<Platform> for PlatformV2 {
    fn from(value: Platform) -> Self {
        let mut occupied = FxHashSet::default();
        for r in value.round_rocks.iter() {
            occupied.insert(*r);
        }
        for c in value.cube_rocks.iter() {
            occupied.insert(*c);
        }

        Self {
            bounds: value.bounds,
            round_rocks: value.round_rocks,
            occupied,
        }
    }
}

Essentially, we’re going to store some of the data (the round_rocks) twice in order to get some speedup. occupied will be used specifically for contains, since HashSets are fast for that, but we’ll get the Vec<Point> for round_rocks so we can loop over it and modify it more easily:

// The rocks will slide N, W, S, E
for direction in [Point::NORTH, Point::WEST, Point::SOUTH, Point::EAST] {
    // Let the rocks slide until they stop moving
    loop {
        let mut changed = false;

        for i in 0..platform.round_rocks.len() {
            let r = platform.round_rocks[i];
            let next = r + direction;

            // Check that the next point is available
            if !platform.bounds.contains(&next) || platform.occupied.contains(&next) {
                continue;
            }

            // If we get here, we can move; do it
            platform.round_rocks[i].x = next.x;
            platform.round_rocks[i].y = next.y;

            platform.occupied.remove(&r);
            platform.occupied.insert(next);

            changed = true;
            break;
        }

        if !changed {
            break;
        }
    }
}

We do have to update both round_rocks and occupied (which could perhaps be hidden in an impl rather than trusting that I’ll get it right), but it works. And for the first time, we actually have a solution!

$ just time 14 2-v2

hyperfine --warmup 3 'just run 14 2-v2'
Benchmark 1: just run 14 2-v2
  Time (mean ± σ):     18.358 s ±  0.172 s    [User: 17.818 s, System: 0.025 s]
  Range (min … max):   18.125 s … 18.627 s    10 runs

We can do better.

Optimization 3: Multislide

Okay, one thing we’ve been doing so far is sliding each rock one tile at a time–and sometimes not even that. If two rocks are touching and the ‘second’ tries to slide first, it will have to wait for the first to move. Something like this:

# sliding right
.OO...
.O.O..
..O.O.
...O.O
....OO

# versus
.OO...
.O...O
....OO

But there’s no particular reason we have to do it that way!

for i in 0..platform.round_rocks.len() {
    let r = platform.round_rocks[i];

    // Move in that direction until we hit something (or a wall)
    let mut next = r;
    loop {
        next = next + direction;

        if !platform.bounds.contains(&next) || platform.occupied.contains(&next) {
            // Have to step back to the last valid point
            next = next - direction;
            break;
        }
    }

    // If we didn't actually move, do nothing
    if next == r {
        continue;
    }

    // If we get here, we can move; do it
    platform.round_rocks[i].x = next.x;
    platform.round_rocks[i].y = next.y;

    platform.occupied.remove(&r);
    platform.occupied.insert(next);

    changed = true;
    break;
}

Does it help?

$ just time 14 2

hyperfine --warmup 3 'just run 14 2'
Benchmark 1: just run 14 2
  Time (mean ± σ):      6.110 s ±  0.215 s    [User: 5.781 s, System: 0.021 s]
  Range (min … max):    5.862 s …  6.340 s    10 runs

Absolutely.

Performance

Part 1 is fast…

$ just time 14 1

hyperfine --warmup 3 'just run 14 1'
Benchmark 1: just run 14 1
  Time (mean ± σ):     308.4 ms ± 162.7 ms    [User: 92.8 ms, System: 19.5 ms]
  Range (min … max):   165.0 ms … 525.7 ms    10 runs

$ just time 14 2

hyperfine --warmup 3 'just run 14 2'
Benchmark 1: just run 14 2
  Time (mean ± σ):      6.110 s ±  0.215 s    [User: 5.781 s, System: 0.021 s]
Range (min … max):    5.862 s …  6.340 s    10 runs

…but part 2 is unfortunately well over the 1 second mark, even with improvements. I really do think that I should be able to do better. But for now, this will have to do. We’ll see!

Edit 1, Optimization 4: … removing a debugging line

Okay, I kept working on it.

Fix #1 (and I feel so bad about this)… I left a debugging line in.

// Let the rocks slide until they stop moving
loop {
    let mut changed = false;

    for i in 0..platform.round_rocks.len() {
        let r = platform.round_rocks[i];

        // Move in that direction until we hit something (or a wall)
        let mut next = r;
        loop {
            next = next + direction;

            if !platform.bounds.contains(&next) || platform.occupied.contains(&next) {
                // Have to step back to the last valid point
                next = next - direction;
                break;
            }
        }

        // If we didn't actually move, do nothing
        if next == r {
            continue;
        }

        // If we get here, we can move; do it
        platform.round_rocks[i].x = next.x;
        platform.round_rocks[i].y = next.y;

        platform.occupied.remove(&r);
        platform.occupied.insert(next);

        changed = true;
        break;
    }

    if !changed {
        break;
    }
}

That second last break (after changed = true) means that each update will only update a single stone. To update any future stone, it has to at least check each previous one.

That’s … not helpful.

Removing that and:

$ just time 14 2

hyperfine --warmup 3 'just run 14 2'
Benchmark 1: just run 14 2
  Time (mean ± σ):      1.776 s ±  0.039 s    [User: 1.657 s, System: 0.019 s]
  Range (min … max):    1.715 s …  1.830 s    10 runs

This does also improve pretty much all of the previous part 2 runtimes, but I’m not going to go back and fix them retroactively. Such is life.

Edit 2, Optimization 5: Pre-sorting the rocks

For the first ‘real’ optimization, I noted previously that we’re sliding each stone as far as we can. Well, it turns out that’s only partially true. In order to move each stone as far as possible, we have to be moving in the right direction. If we’re moving from ’top to bottom’, then moving North is easier than moving South.

To fix this, let’s add a quick sort within the direction loop:

// The rocks will slide N, W, S, E
for (direction_i, direction) in [Point::NORTH, Point::WEST, Point::SOUTH, Point::EAST]
    .into_iter()
    .enumerate()
{
    // Resort the rocks in the direction we're moving
    platform.round_rocks.sort_by(|a, b| {
        if direction_i == 0 {
            // direction == Point::NORTH
            a.y.cmp(&b.y)
        } else if direction_i == 1 {
            // direction == Point::WEST
            a.x.cmp(&b.x)
        } else if direction_i == 2 {
            // direction == Point::SOUTH
            b.y.cmp(&a.y)
        } else if direction_i == 3 {
            // direction == Point::EAST
            b.x.cmp(&a.x)
        } else {
            panic!("Invalid direction_i: {}", direction_i)
        }
    });

    // Slide each rock once
    // Note: No more `loop` here waiting for them to stop sliding!
    for i in 0..platform.round_rocks.len() {
        // ...
    }
}

It should sort in place, so no more allocation and I tried to (perhaps prematurely) optimize the comparison about which way we’re sorting.

Downside: we have to spend the extra time each part of each cycle to sort. Upside, we can remove the actual loop and just go directly to updating each rock!

So, does it help?

$ hyperfine --warmup 3 'just run 14 2-order' 'just run 14 2'

Benchmark 1: just run 14 2-order
  Time (mean ± σ):      1.769 s ±  0.183 s    [User: 1.550 s, System: 0.020 s]
  Range (min … max):    1.621 s …  2.151 s    10 runs

Benchmark 2: just run 14 2
  Time (mean ± σ):      1.909 s ±  0.152 s    [User: 1.651 s, System: 0.019 s]
  Range (min … max):    1.716 s …  2.103 s    10 runs

Summary
  just run 14 2-order ran
    1.08 ± 0.14 times faster than just run 14 2

~8%. So it’s progress, but not much.

Edit 3, Rendering

Much as I’ve now added to , I have rendering!

Here is the solution (for part 2) rendering one frame per cycle:

And here is rendering one frame per subcycle/direction:

Not exactly as I expected, but still pretty fun!