TestIT - Integration Testing for My Rust Solvers

One of the problems (of a sort) I’ve been having with my series on Rust Solvers is that, for each input puzzle, I need a way to save one or more ‘known good’ solutions, so that when I change or add functionality, I can verify that I’ve either not changed the solution or found another valid one.

Integration tests, as it were.

So far, I’d been building this into each solution. While that worked perfectly fine, it was a bit annoying to copy and paste into each binary and then edit each test case with the answers.

An example run

Enter testit:

# First run, without --db/--save for previous runs
$ testit \
  --command "./target/release/golf-peaks" \
  --files "data/golf-peaks/**/*.txt" \
  --timeout 60

data/golf-peaks/1-1.txt: New success:
1-↗

===

data/golf-peaks/1-10.txt: New success:
1-↘ 3-↙ 2-↘

===

...

data/golf-peaks/9-8.txt: New success:
1/3-↘ 1/2-↖ 1/↗ 2/1-↖ 1/1-↗

===

data/golf-peaks/9-9.txt: New success:
1-↗ 1/↘ 1-↘ 4-↗ 3-↘ 1/1-↗

===

data/golf-peaks/Credits.txt: New success:
4-↖ 5-↗ 3-↗ 6-↘

===


Summary:
	Successes: 121 (121 new)
	Failures: 0
	Timeouts: 0

# Later runs
$ testit \
  --command "./target/release/golf-peaks" \
  --files "data/golf-peaks/**/*.txt" \
  --timeout 60 \
  --db testit/golf-peaks.json \
  --save


Summary:
	Successes: 121 (0 new)
	Failures: 0
	Timeouts: 0

Pretty cool, I do think. 😄

So how does it work?

A lot like the test cases I mentioned above.

Command line options

First, Clap for parameter parsing:

/// Test a series of input files to check that output hasn't changed
#[derive(Parser, Debug)]
#[command(version, about, long_about = None)]
struct Args {
    /// The command to run
    #[arg(short, long)]
    command: String,

    /// The directory to run the command in and store the output file in (default: current directory)
    #[arg(short, long)]
    directory: Option<String>,

    /// A glob style pattern defining the files to test
    #[arg(short, long)]
    files: String,

    /// Specify environment variables as key=value pairs; multiple can be specified
    #[arg(short, long)]
    env: Vec<String>,

    /// Preserve the environment of the parent process (default: false)
    #[arg(short = 'E', long)]
    preserve_env: bool,

    /// The database file that will store successful results (as a JSON file)
    #[arg(short = 'o', long)]
    db: Option<String>,

    /// If this flag is set, save new successes to the output file (default: false)
    #[arg(short, long, action)]
    save: bool,

    /// The time to allow for each test (default: 1 sec)
    #[arg(short, long)]
    timeout: Option<u64>,
}

I do enjoy Clap.

The full (current as of v0.1.5) set of options is as follows:

$ testit --help

Test a series of input files to check that output hasn't changed

Usage: testit [OPTIONS] --command <COMMAND> --files <FILES>

Options:
  -c, --command <COMMAND>      The command to run
  -d, --directory <DIRECTORY>  The directory to run the command in and store the output file in (default: current directory)
  -f, --files <FILES>          A glob style pattern defining the files to test
  -e, --env <ENV>              Specify environment variables as key=value pairs; multiple can be specified
  -E, --preserve-env           Preserve the environment of the parent process (default: false)
  -o, --db <DB>                The database file that will store successful results (as a JSON file)
  -s, --save                   If this flag is set, save new successes to the output file (default: false)
  -t, --timeout <TIMEOUT>      The time to allow for each test (default: 1 sec)
  -h, --help                   Print help
  -V, --version                Print version

Test file collection

Next up, collect the list of files to use using the glob crate:

// Build the absolute glob pattern
// This is based on the working directory (or cwd) from the args + the files pattern
let pattern = format!(
    "{}/{}",
    args.directory.clone().unwrap_or(".".to_string()),
    args.files
);

// Glob the list of all files that we want to test
let files = glob::glob(&pattern)
    .unwrap()
    .map(|x| x.unwrap())
    .collect::<Vec<path::PathBuf>>();

I’ve actually designed this so that you don’t always have to specify the full set of test cases, even when using the same database file. If you only specify a few files to run, it will only compare those against previous results, ignoring the rest. Handy when you have some really slow tests (that’s also why I have a timeout).
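For example, here’s a hypothetical follow-up run (reusing flags from the help output above) that re-tests only the world-9 puzzles against the same database, with a more generous timeout:

$ testit \
  --command "./target/release/golf-peaks" \
  --files "data/golf-peaks/9-*.txt" \
  --db testit/golf-peaks.json \
  --timeout 300

Only the files matching that narrower glob are run and compared; any other entries already stored in golf-peaks.json are left untouched.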

Parallel Execution + Building the Command

Next up, we’ll use Rayon to automagically run several of these at a time in threads, collecting the results in order (without having to mess about with channels).

I also set up the command builder here specifically to pass environment variables along (or, with -E/--preserve-env, to keep the environment of the calling shell). That’s a pretty cool interface.

Currently, I capture stdout and ignore stderr, but at some point I think I’ll want to grab both.

// For each file, run the command and compare the output
let results = files.par_iter().map(|file| {
    log::info!("Testing {}", file.display());

    let command = args.command.clone();
    let cwd = args.directory.clone().unwrap_or(".".to_string());
    let stdin = std::fs::File::open(&file).unwrap();
    let timeout = Duration::from_secs(args.timeout.unwrap_or(10));

    // Create the child process
    let mut command_builder = Command::new("bash");
    command_builder
        .arg("-c")
        .arg(command)
        .current_dir(&cwd)
        .stdin(stdin)
        .stderr(std::process::Stdio::piped())  // TODO: Do we want to capture this?
        .stdout(std::process::Stdio::piped());

    // Add environment variables
    if !args.preserve_env {
        command_builder.env_clear();
    }
    for (key, value) in env.iter() {
        command_builder.env(key, value);
    }

    // Start the child
    let mut child = command_builder
        .spawn()
        .expect("Failed to execute command");

    //...
}
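One detail the snippet above glosses over is where env comes from: each --env argument arrives as a plain "key=value" string, so it presumably gets split before being handed to the builder. A minimal sketch of that parsing (my assumption about the shape, not the actual testit code):

// Split each --env argument ("KEY=VALUE") into a (key, value) pair.
// Anything without an '=' is treated as having an empty value here;
// the real implementation might report an error instead.
let env: Vec<(String, String)> = args
    .env
    .iter()
    .map(|pair| match pair.split_once('=') {
        Some((key, value)) => (key.to_string(), value.to_string()),
        None => (pair.clone(), String::new()),
    })
    .collect();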

Running each command with a timeout

Now that we’ve spawned each command (with Command::spawn), we just want to let it finish… but some of these tasks will happily run for hours, if not longer. So I added a --timeout parameter:

// For each file, run the command and compare the output
let results = files.par_iter().map(|file| {
    // ...

    // Wait for the child to finish up to timeout
    // If timeout is reached, kill the thread (or it may outlast us...)
    match child.wait_timeout(timeout) {
        Ok(Some(status)) => {
            let mut output = String::new();
            child.stdout.as_mut().unwrap().read_to_string(&mut output).unwrap();

            if status.success() {
                log::info!("Success: {}", file.display());
                TestResult::Success(output)
            } else {
                log::info!("Failure {}", file.display());
                TestResult::Failure(output)
            }
        }
        Ok(None) => {
            // Timeout passed without exit
            log::info!("Timeout {}", file.display());
            child.kill().unwrap();
            TestResult::Timeout
        }
        Err(_) => {
            // Process errored out
            child.kill().unwrap();
            unimplemented!("Process errored out")
        },
    }
}).collect::<Vec<_>>();
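For reference, the TestResult type isn’t shown in these snippets; from how it’s used, it’s presumably something along these lines:

// Outcome of running the command against a single input file
// (a sketch reconstructed from usage; the actual definition may differ)
enum TestResult {
    Success(String), // exited with status 0; holds captured stdout
    Failure(String), // exited non-zero; holds captured stdout
    Timeout,         // exceeded --timeout and was killed
}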

In this case, we wait on the child with wait_timeout (from the wait_timeout crate). If you get Ok(Some(status)) back from that, the command finished, which could itself be a success or a failure. But if you get Ok(None), the timeout expired, so kill off the process (Child::kill sends SIGKILL on Unix, so it shouldn’t be able to linger).

I don’t actually have any cases (yet) that hit the Err state, so I haven’t dealt with it. Panic!
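If I did want to handle it, one option (just a sketch of what that arm could look like, not what testit does today) would be to fold the error into the failure path instead of panicking:

Err(err) => {
    // Waiting on the child itself failed; kill it and record the error
    // as a failure rather than panicking
    log::error!("Error waiting on {}: {}", file.display(), err);
    child.kill().ok();
    TestResult::Failure(err.to_string())
}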

Collecting the results

Finally, we’ll collect the results. If --db is set, we’ll load previous results from a JSON ‘database’ and compare against them; otherwise, we just display them:

// If we have a previous output file, compare results
let mut previous_results: BTreeMap<String, Vec<String>> = if let Some(output_file_path) = db_path.clone() {        
    if let Ok(f) = std::fs::read_to_string(output_file_path) {
        serde_json::from_str(&f).unwrap()
    } else {
        BTreeMap::new()
    }
} else {
    BTreeMap::new()
};

let mut success_count = 0;
let mut new_success_count = 0;
let mut failure_count = 0;
let mut timeout_count = 0;

// Write results
// This will only print failures, timeouts, and new successes
// If the output file is set and we see the same success again, it will be ignored
for (file, result) in files.iter().zip(results.iter()) {
    // Remove the directory prefix if it exists
    // This will apply to the printed output + the output file
    let file = if let Some(prefix) = args.directory.clone() {
        file.strip_prefix(prefix).unwrap()
    } else {
        file
    };

    match result {
        TestResult::Success(output) => {
            success_count += 1;

            if let Some(previous) = previous_results.get(file.to_str().unwrap()) {
                if previous.contains(output) {
                    // We have a previously logged success, do nothing
                    continue;
                }
            }

            new_success_count += 1;

            // We have successful output we haven't seen before, log it and potentially save it
            println!("{}: New success:\n{}\n===\n", file.display(), output);
            if args.save {
                previous_results
                    .entry(file.to_str().unwrap().to_string())
                    .or_insert(Vec::new())
                    .push(output.clone());
            }
        }
        TestResult::Failure(output) => {
            failure_count += 1;
            println!("{}: Failure\n{}\n===\n", file.display(), output);
        }
        TestResult::Timeout => {
            timeout_count += 1;
            println!("{}: Timeout", file.display());
        }
    }
}

If you’re using the database, I intentionally don’t show results that previously succeeded and succeeded again with output we’ve already recorded. That means the tests are working as intended, so I don’t need that output cluttering things up.
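Not shown above is the final summary, which is just the four counters; roughly (a sketch matching the Summary block in the example run at the top):

// Print the final tallies
println!("\nSummary:");
println!("\tSuccesses: {} ({} new)", success_count, new_success_count);
println!("\tFailures: {}", failure_count);
println!("\tTimeouts: {}", timeout_count);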

And that’s it… if --save is specified, we’ll write the JSON back out:

// Save the new results (if requested)
if args.save {
    let f = std::fs::File::create(db_path.expect("Tried to save with no output file")).unwrap();
    serde_json::to_writer_pretty(f, &previous_results).unwrap();
}

But that’s the entire script. It’s worked great for my Rust solvers; so far, I’ve moved all of the previous solutions over to testit, with test cases in their own directory. I suppose if someone really wanted to cheat at one of the games I’ve solved, it’s a quick and easy way to see possible solutions… but you can already find those on YouTube, so this doesn’t bother me overmuch.

Summary

Overall, it’s a quick and easy way to do integration tests as I’m working on these solvers. With --timeout, I can purposely set a longer timeout for some runs, to at least get an answer for those ‘tricky’ solutions, but normally run with a much shorter one to make sure I didn’t break something obvious.

If you want to use it, go for it! You can install it from a checkout with cargo: cargo install --path . (that’s pretty cool, actually). Like a lot of Rust, it just works.

Let me know (on GitHub) if you like it or have any requests!