Source: No Space Left On Device
Part 1
Give the output of a very simple shell with only the commands
cdandls(the output of which is either"dir" namefor directories orsize namefor files), find the largest directory (disk usage calculated recursively) with a size no greater than 100,000.
Oh boy this one was crazy. Rust does some… interesting things when it comes to recursive/self-referential data structures.
Specifically, here’s where I ended up:
// Represents a 'thing' on a file system, either:
// - A named directory which can contain directories or files
// - A named file which has a size
#[derive(Clone)]
enum FileSystemThing {
Directory {
name: String,
parent: Option<Arc<FileSystemThing>>,
children: RefCell<Vec<Arc<FileSystemThing>>>,
},
File {
name: String,
size: usize,
},
}
The two interesting bits to deal with are:
The
Arc(atomic reference counter) wrappings on allFileSystemThings. That’s because I’m going to be usingclonea bunch to attach those same references to a single file system thing to different parts in the tree. Specifically, the parents of each child and the child nodes themselves.The
RefCellin a directory’s children. This came about because I need to be able tomutthe children topushto it. I can’t otherwise figure out how that’s going to work. I kept getting this error:error[E0596]: cannot borrow `*children` as mutable, as it is behind a `&` reference --> src/bin/07-recursive-fileinator.rs:95:25 | 95 | children.push(child); | ^^^^^^^^^^^^^^^^^^^^ `children` is a `&` reference, so the data it refers to cannot be borrowed as mutableWith the
RefCellthough, I canborroworborrow_mutthe value as I need.
Okay, so we have the structure… how do we build it? Well, here’s what I did:
impl FileSystemThing {
// Build a file system thing from an iter of commands run (cd and ls)
fn from<I>(mut iter: I) -> Arc<Self>
where
I: Iterator<Item = String>,
{
use FileSystemThing::*;
// Build a special :ROOT: node to start with
let root = Arc::new(Directory {
name: String::from(":ROOT:"),
parent: None,
children: RefCell::new(Vec::new()),
});
// Keep track of where we currently are after cds
let current = RefCell::new(root.clone());
while let Some(next) = iter.next() {
let mut parts = next.split_ascii_whitespace();
if next.starts_with("$ cd") {
// cd changes directory and has three cases:
// "/" goes to root
// ".." goes up one level
// anything else goes into the named child directory
let path = String::from(parts.last().expect("cd must have a directory"));
// Build the next node depending on what we're cding too ("/", "..", or a name)
let next = match current.borrow().as_ref() {
Directory { parent, .. } => {
if path == ".." {
match parent {
Some(parent) => parent.clone(),
_ => panic!("must have parent set to .."),
}
} else if path == "/" {
root.clone()
} else if let Some(child) = current.borrow().get(path) {
child
} else {
current.borrow().clone()
}
}
_ => current.borrow().clone(),
};
current.replace(next);
} else if next.starts_with("$ ls") {
// Starting an LS, nothing to do on this line
} else if next.starts_with("dir") {
// If we see a line starting with dir we're in an ls, create the directory
let name = parts.last().expect("directory in ls must have a name");
// If the file/directory already exists, we ran ls twice, ignore this
if let Some(_child) = current.borrow().get(String::from(name)) {
continue;
}
// Build the new child directory referencing current as the parent
let child = Arc::new(Directory {
name: String::from(name),
parent: Some(current.borrow().clone()),
children: RefCell::new(Vec::new()),
});
// Add a reference to the new directory to current's children
match current.borrow().as_ref() {
Directory { children, .. } => {
children.borrow_mut().push(child);
}
_ => panic!("somehow tried to ls a file"),
}
} else {
// Otherwise, it's a line containing the size and name of a file
let size = parts
.next()
.expect("must have size")
.parse::<usize>()
.expect("size must be a usize");
let name = parts.next().expect("must have a name");
// If the file/directory already exists, we ran ls twice, ignore this
if let Some(_child) = current.borrow().get(String::from(name)) {
continue;
}
// Build the new file, doesn't need a parent at least
let child = Arc::new(File {
name: String::from(name),
size,
});
// Add a reference to the new file to current's children
match current.borrow().as_ref() {
Directory { children, .. } => {
children.borrow_mut().push(child);
}
_ => panic!("somehow tried to put a file into another file"),
}
}
}
root.clone()
}
}
Like I said, there are an awful lot of clone around, but that’s fine. All that’s essentially doing is telling Arc that we want more reference for it to count. Cheap.
To break down the code a bit:
- Create a special
:ROOT:node with no parent, this is the only reason we haveOptionon that… - Set the
currentnode to aRefCellof that, so we can update the reference - For each line in the input:
- If it’s a
cd:- If it’s
cd /, change thecurrentto the:ROOT:node; I don’t think this actually happens - If it’s
cd .., change to the parent directory ofcurrent;panicif we’re at the root - If it’s
cd $name, change to that directory; assume that it’s been ‘created’ / revealed by a previousls(see below)
- If it’s
- If it’s an
ls: we don’t actually care, assume that all input not starting with$is part of anls - If it’s
dir $name, create a new directory with$nameand no children and attach it tocurrentby:- Adding the new directory to
current.children(with all the requisite indirection) - Set the
parentof the new node tocurrent(likewise)
- Adding the new directory to
- If it’s
$size $filename, create a file, add it tocurrent.children, and store the size with it
- If it’s a
Ooo-kay. That took a while to get working. I think I have a better handle for what Rust needs to make these complicated structs work, but … oh my is it a lot.
It took … a while to get to this. If you happen to be reading this and know of a more idiomatic / cleaner way to write the from code, I’d love to hear it.
Okay, next we need the ability to recursively calculate the size at any given node:
impl FileSystemThing {
// Get the size for a file or directory
// Files directly have sizes, directories recursively sum their children's sizes
pub fn size(&self) -> usize {
use FileSystemThing::*;
match self {
Directory { children, .. } => {
return children
.borrow()
.iter()
.map(|child| child.as_ref().size())
.sum()
}
File { size, .. } => *size,
}
}
}
That’s not so bad. If it’s a file, just return the size. If it’s a directory, it doesn’t have a size of it’s own, so instead, borrow the children RefCell, iter through it, map each child to get the size (by calling this function, don’t assume it’s a file), and suming those. Clean, at least in my opinion.
Okay, so we can build the tree and calculate the size at any point… but how do we go through and ‘iterate’ (as it were) over all of the nodes to find the ones that match our criteria?
Okay, this one was also kind of a pain. This time, so that I can store the state of the iterator, I made a second struct FileSystemIterator and had FileSystemThing build one of those:
impl FileSystemThing {
// Return an iterator over all nodes in the tree
pub fn walk(&self) -> FileSystemIterator {
FileSystemIterator::new(Arc::new(self.clone()))
}
}
How I want to deal with this iterator is:
- Initialize a stack of
FileSystemThingwith the root node we’re walking from - Whenever
nextis called:- If the stack is empty: we’re done, return
None(stop iteration, I like doing it this way) - Otherwise, pop that thing off the stack and save it to return, but first:
- For each child node of that thing, add it to the stack in our
Iteratorstate - Now, return the popped node
- For each child node of that thing, add it to the stack in our
- If the stack is empty: we’re done, return
// Iterate recursively over file systems
// Keep a stack of the children we've seen but not returned so far
struct FileSystemIterator {
stack: Vec<Arc<FileSystemThing>>,
}
impl FileSystemIterator {
fn new(root: Arc<FileSystemThing>) -> Self {
FileSystemIterator { stack: vec![root] }
}
}
impl Iterator for FileSystemIterator {
type Item = Arc<FileSystemThing>;
fn next(&mut self) -> Option<Self::Item> {
use FileSystemThing::*;
if self.stack.is_empty() {
return None;
}
let next = self.stack.pop().unwrap();
if let Directory { children, .. } = next.borrow() {
for child in children.borrow().iter() {
self.stack.push(child.clone());
}
}
Some(next)
}
}
Getting the right combination of references in this took a bit too… eventually I fell back again on Arc. It’s … actually the right tool for the job, no? Because I’ll have at least two references kicking around at times, in children (and sometimes parents) and in the stack.
Neat.
Also note: because we’re not reversing here, nodes at the same level will be returned in the opposite order they were seen on the input, you can add a .rev() at the end of the children.borrow().iter() line. But for these problems, it doesn’t matter.
Okay. So, what was the problem even?
Right, iterate through the tree and sum up the recursive side (including overlaps) of all directories that have a size of no more than 100,000.
fn part1(filename: &Path) -> String {
let root = FileSystemThing::from(iter_lines(filename));
let mut total_sizes = 0;
for node in root.walk() {
match node.borrow() {
FileSystemThing::Directory { .. } => {
let size = node.size();
if size <= 100000 {
total_sizes += size;
}
}
_ => {}
}
}
total_sizes.to_string()
}
Not so bad.
One thing probably worth mentioning is that I implemented (and used here) an iter_lines method. It’s the same as from_lines, it just doesn’t collect, making it lazy for free. It doesn’t matter at all in this case, but would 1) save some memory by not loading an entire file into memory and 2) allow you to partially consume the input easily.
I may go back and change previous days over to this model.
Part 2
Assume this virtual disk has total available space of
70,000,000. You need30,000,000free. Find the smallest directory (recursively) you can delete that would leave you with30,000,000free and return it’s size.
I definitely made this more verbose than I needed to calculate this target. So it goes.
Essentially, I want to:
- Calculate what the
target_to_freesize is - Initialize how much is
freeabletousize::MAX(since we want a minimum) - Walk the file system
- Skip files
- For directories:
- If it’s larger than
target_to_freeand smaller thanfreeable, updatefreeable - Otherwise, skip it
- If it’s larger than
- Return the final
freeablevalue
fn part2(filename: &Path) -> String {
let root = FileSystemThing::from(iter_lines(filename));
let total_disk = 70000000;
let needed = 30000000;
let used = root.size();
let available = total_disk - used;
let target_to_free = needed - available;
// We need the smallest directory at least larger than target_to_free
let mut freeable = usize::MAX;
for node in root.walk() {
match node.borrow() {
FileSystemThing::Directory { .. } => {
let size = node.size();
if size > target_to_free && size < freeable {
freeable = size;
}
}
_ => {}
}
}
freeable.to_string()
}
I do enjoy the programs where the final wrapper for part 2 is mostly done by properly engineering part 1. 😄
Performance
Fast:
./target/release/07-recursive-fileinator 1 data/07.txt
1453349
took 403.291µs
./target/release/07-recursive-fileinator 2 data/07.txt
2948823
took 421.666µs
Yeah… I don’t really need to optimize that from a speed perspective.
Like I said though, If you happen to be reading this and know of a more idiomatic / cleaner way to write the from code, I’d love to hear it.