The earliest memory I have of ‘programming’ is in the early/mid 90s when my father brought home a computer from work. We could play games on it … so of course I took the spreadsheet program he used (LOTUS 123, did I date myself with that?) and tried to modify it to print out a helpful message for him. It … halfway worked? At least I could undo it so he could get back to work…
After that, I picked up programming for real in QBASIC (I still have a few of those programs lying around), got my own (junky) Linux desktop from my cousin, tried to learn VBasic (without a Windows machine), and eventually made it to high school… In college, I studied computer science and mathematics, mostly programming in Java/.NET, although with a bit of everything in the mix. A few of my oldest programming posts on this blog are from that time.
After that, on to grad school! Originally, I was going to study computational linguistics, but that fell through. Then programming languages (the school’s specialty). And finally I ended up studying censorship and computer security… before taking a hard turn into the private sector to follow my PhD advisor.
Since then, I’ve worked in the computer security space at a couple of different companies. Some don’t exist any more, some you’ve probably heard of. I still program for fun too, and not just in security.
But really, I still have a habit of doing a little bit of everything. Whatever seems interesting at the time!
Classes: \d/\D for digits, \w/\W for ‘words’, and \s/\S for whitespace
Escape characters: \t\r\n\v\f
Control characters: \cX (I’ve never used these)
Hex and unicode literals: \hXX and \uXXXX
Disjunction: | (both in capture groups and not)
Groups and back references
Capture groups: (abc)
Named capture groups: (?<name>abc)
Non-capturing groups: (?:abc)
Flags: (?ims-ims:abc)
Both enabling and disabling
i and s but not m
Backreferences: \n
Named backreferences: \k<name>
Quantifiers
* for zero or more
+ for one or more
? for zero or one
*?, +?, and ?? for lazy / non-greedy matches
abc{n} exactly n matches
abc{n,} at least n matches
abc{,m} up to m matches
abc{n,m} at least n and up to m matches (inclusive)
Lazy matches for all of those
Most of those were fairly straight forward extensions of previous code. In think the most interesting ones were handling the parsing of all the different things that can go in groups (including flags).
For each of them, you can check my git commit history to see how I implemented specific things. It’s mostly one commit per feature, but not always.
Unsupported Regex Features (so far!)
Assertions:
Word boundaries (\b and \B)
Look ahead/behind (parsed but not matched)
Character classes
[\b] for backspace characters
Long unicode format: \u{XXXXX}
Unicode properties: \p{...}/\P{...}
Groups and back references:
m flag / mode: multiline matches
The look ahead/behind is the one I’m most interested in supporting. I don’t even think it will be that hard, I just honestly missed it.
The more interesting one will be the m flag. Currently, I only match lines, so that will be a decently large restructuring. We’ll see.
Supported CLI flags
I’ve made an awful lot of progress on this one too!
$ jp-grep --help
A custom grep implementation; always behaves as egrep
Usage: jp-grep [OPTIONS][PATTERN][PATHS]...
Arguments:
[PATTERN] The regular expression to evaluate; may also be specified with -e
[PATHS]... Paths to search for matches; if none are provided read from stdin
Options:
-A, --after-context <AFTER_CONTEXT>
Lines of context to print after each match
-B, --before-context <BEFORE_CONTEXT>
Lines of context to print before each match
-C, --context <CONTEXT>
Lines to print both before and after
-c, --count
Only print the matching count
-E, --extended-regexp
Extended regex mode (egrep); this option is ignored (always true) -e, --regexp <ADDITIONAL_PATTERNS>
Additional patterns, will return a line if any match
-h, --no-filename
Never print filenames
--help
Display this help message
-i, --ignore-case
Default to case insensitive match
-n, --line-number
Print line numbers before matches and context
-r, --recursive
Recursively add any directories (-R also works) -v, --invert-match
Invert the match; only print lines that don't match any pattern
-V, --version
Print version
Of those, the context flags (-A, -B, and -C) were probably the most tricky, since I basically had to implement a circular buffer for them. I could have just read the entire file into memory, but from the beginning, I didn’t want to do that.
-E is a little silly, since that’s the only grep pattern I support (and the only one I actually use in grep, so that’s fair).
So far as supporting multiple files, recursive search, and stdin, read the section on collecting files later.
So far as printing (handling line numbers and file names), read the section on printing lines.
Overall, pretty fun code.
Unsupported CLI flags
So far, there are a bunch of flags that I don’t support for grep. Of those, there are a bunch that I don’t intend to support (like built in compression support and properly dealing with symlinks).
The things that I would still like to support though are:
Input options:
-f file/--file=file - Read patterns from file
Output options:
-a/--text - Currently I always have this set; I don’t treat binary files differently
-L/--files-without-match - only print files that don’t match
-o/--only-matching - only print the matching groups; I have the groups for backreferences, use them!
File filtering - files to include/exclude (useful with recursive matches):
--exclude pattern
--exclude-dir pattern
--include pattern
--include-dir pattern
That’s not too bad, all things consider.
Error handling
One thing that I actually played a bit with this time around was custom error handling in the parser. Rather than just returning &str all over the place for Err types, I made my own:
The WithPosition type also lets me pinpoint exactly where in a pattern I failed:
jp-grep 'this is some long complicated pattern, \hXX see?'Error parsing regex: Invalid character 'X', expected hex digit
| this is some long complicated pattern, \hXX see?|^
That’s pretty neat and I hope helpful! 😄
Expanding past CodeCrafters
Overall, I’m pretty happy with this project. It’s got a pretty decent chunk of code, including…
$ jp-grep -c -v -e '//' -e '^\s*$' **/*.rs
1241
…over 1000 lines of Rust code, including tests but not blank lines or comments. 😄
I’ll probably pick this up at least once more.
Now… will I actually use this? Probably not. But it was certainly interesting to write.
Other than that, was CodeCrafters actually helpful for this? Middling. It was the kick I needed to actually do it (I’ve been meaning to write this for years at this point) and once I was started, I could finish it. On the other hand, the output format they require was a bit annoying at times, I’ve mostly moved away from that.
Still, worth I think. I’ll probably continue to do their free programs. Kafka is up next. Whee servers!
Didn’t I just do one of these? Well, yes. Yes I did. But I love building compilers and interpreters, so when I saw this one was in beta (and thus free 😉), I had to try it!
It’s directly an implemention of the Lox languages from the Crafting Interpreters website / book (my review), if incomplete. By the end of the lesson, we’ll have:
A tokenizer that handles parentheses, braces, operators (single and multiple character), whitespace, identifiers, string literals, numeric literals, and keywords A parser that can take those tokens and build an abstract syntax tree using recursive descent parsing A simple tree walking interpreter for some subset of the language It doesn’t handle all of the syntax (yet).
Another Rust Solvers puzzle: Cosmic Express. Basically, it’s a routefinding puzzle. You have a train that needs a track from entrance to exit, picking up and dropping off cargo on the way.
It’s actual a relatively simple puzzle, so far as things go, but one thing that’s interesting from a solving perspective is that branching paths really don’t work great with my solver code. Paths just have a crazy branching factor when compared to (for example) playing one of a handful of cards.
I recently stumbled across CodeCrafters again1. In a nutshell, they give a number of ‘Build Your Own…’ courses, each of which will automatically create a repo for you, guide you through solving the program step by step, and provide some feedback on the way.
On one hand, it’s a freemium (one problem a month is free) / paid service. I wish they had tiers. I really think their monthly fee is a bit steep for what they offer (we’ll come back to that). But on the other hand, it’s a neat tool and I’ve been wanting some more larger programming projects to learn more Rust on, so away we go!
One of the problems (of a sorts) I’ve been having with my series on Rust Solvers is that, for each input puzzle, I need a way to save one or more ‘known good’ solutions so that when I change and add new functionality, I can verify that I’ve either not changed the solution or found another valid one.
So far, I’d been building this into each solution. While this worked perfectly fine, it’s a bit annoying to copy and paste to each binary, and then have to edit each test case with the answers.
This time around, we’re going to solve Golf Peaks. I picked this up a while ago on iOS, but only recently on Steam. It’s a cute little puzzle game themed around minigolf.
Basically, you’re on a grid and you have to get the ball (in the bottom in that screenshot above) to the flag (currently at the top). You have a set list of moves you can take, styled as cards–all of which either move a certain number of tiles in a specific direction or possibly jump into the air (and fly over obstacles).
It gets more complicated from there, but hopefully you have the basic idea. 😄
The basic idea is you have a field of elements with (chemical accurate) free electrons):
Here we have 4 hydrogens (1 bond each) and a carbon (4 bonds). It should seem pretty obvious that the carbon should end up with a hydrogen on each end. The one last bit of interest: the element with the dashed border is the one we actually control, that will never change.
This eventually gets more complicated, adding:
Modifiers that are placed on the map between squares:
One that strengthens bonds, turning a single bond into double into triple
One that weakens bonds, turning triple to double to single or breaking single bonds
One that rotates bonds as you move by it
More elements, eventually hydrogen (1), oxygen (2), nitrogen (3), carbon (4), and helium (0)
Solutions that require forming multiple elements at the same time
It’s a pretty neat puzzle game with 144 levels of increasing difficulty. Perfect to solve.
I enjoy puzzle games. I especially enjoy letting computers solve them for me 😄. Once upon a time, I set up a framework for solving random things. Let’s solve some more.
It’s a Sokoban about making snowmen! You can push snowballs of three sizes around, collecting snow if you roll over it. You can push smaller snowballs onto bigger ones, stacking them. Or back off, in order to get around one another.
And that’s really it.
There are some interesting twists (multiple snowmen, the ability to leave and re-enter levels, and even a whole second ‘hard mode’), but at a basic level, it’s just pushing.
You’ve probably seen Neil.fun’s Infinite Craft game somewhere on the internet. If not, in a nutshell:
You start with 4 blocks: Earth, Fire, Water, and Wind.
You can combine any two blocks, for example:
Earth + Water = Plant
Plant + Fire = Smoke
Smoke + Smoke = Cloud
That’s… pretty much it, from a gameplay perspective. There’s not really any goal, other than what you set yourself (try to make Cthulhu!). Although if you manage to find something no one has ever made before, you get a neat little note for it!
So wait, what do I mean by ‘something no one has ever seen before’?
Well, if two elements have ever been combined by anyone before, you get a cached response. Barring resets of the game (no idea if / how often this has happened, but I assume it has), if A + B = C for you, A + B = C for everyone.
And here’s the fun part: if you find a combination no one has ever found before: Neil.fun will send the combination out to an LLM to generate the new answer. The specific prompt isn’t public (so far as I know), but essentially what that means is that you have a basically infinite crafting tree1!
So of course seeing something like this I want to automate it. 😄