It’s been a bit, but I hope it’s worth it. StackLang, part 5: compiling to C!
Posts in this series:
Posts in StackLang:
This post:
Full source code for StackLang: github:jpverkamp/stacklang
The goal
So previously, I wrote an interpreter that can run StackLang programs. It works well enough and is great for hacking on features, but on the other hand:
- It’s pretty slow. Generating a 1024x768x8 Mandelbrot image takes ~2m15s even in release mode with the VM vs 2s in the initial compiler. No, that’s not a typo.
- It redoes the lex + parse + run on each run, versus building a binary that just runs
- The compiled executable is a bit larger (2.5MB vs 50kB; yes I know that’s not much either way)
- I just want to write a compiler!
The target
So the next question: what do we want to compile to?
- Assembly (x86 / ARM); directly compile without other tools (although an assembler is probably still necessary), best low level control
- A compiler backend (LLVM); something I’ve been meaning to learn
- Another language (C / Rust); leverage a lower level compiler for the final compilation steps
- WASM; compiles to the web, it’s stack based, so theoretically fits the language well
All together… I think that any of those would be interesting, but semi-randomly I think that I’m going to try compiling to C and using Clang behind that (which in turn ends up using LLVM, although not directly).
Code generation
Okay, first things first. I want to write a function compile(ast: Expression) -> String
. It should generate C code which then can be written to file (or stdout, whichever).
To start with, something like this:
/// Compile the AST into C code
pub fn compile(ast: Expression) -> String {
let mut lines = vec![];
lines.push(include_str!("../compile_c_includes/header.c").to_string());
lines.push(include_str!("../compile_c_includes/types.c").to_string());
lines.push(include_str!("../compile_c_includes/globals.c").to_string());
lines.push(include_str!("../compile_c_includes/coerce.c").to_string());
/// Helper function to compile a specific block to be output later
fn compile_block(
arity: (usize, usize),
body: &Vec<Expression>,
blocks: &mut Vec<Vec<String>>,
) -> usize {
...
}
/// Compile a single expression into strings
fn compile_expr(
expr: Expression,
blocks: &mut Vec<Vec<String>>
) -> Vec<String> {
...
}
// Compile the top level expression
let mut blocks = vec![];
match ast {
Expression::Group(body) => {
compile_block((0, 0), &body, &mut blocks);
}
_ => panic!("Unexpected top level expression: {:?}", ast),
}
// Forward declare all blocks
...
// Generate block functions
...
// Add the main function that setups up the initial stack
// and calls the top level block (block_0)
lines.push(include_str!("../compile_c_includes/main.c").to_string());
// Put it all together
lines.join("\n")
}
So there are a few things going on here:
- We’re generating a
vec
oflines
that will be combined at the end - The main compile function has two helper functions:
compile_block
will take a block of code ({ ... }
) and compile it into a C function, callingcompile_expr
once for each subexpressioncompile_expr
will compile a single expression, be it a literal, variable, name, lookup, or whatever (if it’s ablock
, callcompile_block
)
- It’s possible to add static C code to
lines
withinclude_str!
:header.c
has the includes and a number of#define
constantstypes.c
defines theValue
type (a tagged union struct)globals.c
stores global variables (thestack
andframe
)coerce.c
contains a function to automatically cast numbers when necessary (adding an integer + float should result in a float for example)main.c
defines theint main(int, char*)
function; sets up memory
Static C code
First, let’s talk about all of the static C code that I’m including in every program, no matter the content.
header.c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TAG_NUMBER 0
#define TAG_NUMBER_INTEGER 1
#define TAG_NUMBER_RATIONAL 2
#define TAG_NUMBER_FLOAT 3
#define TAG_NUMBER_COMPLEX 4
#define TAG_STRING 16
#define TAG_BOOLEAN 17
#define TAG_BLOCK 18
Essentially, this is where I pull in headers (for boolean types, converting to/from ints, and doing some string I/O) + define the tags for types (see types.c
).
types.c
// Values on the stack
typedef struct
{
uint8_t type;
union
{
int64_t as_integer;
double as_float;
char *as_string;
bool as_boolean;
void *as_block;
};
} Value;
As mentioned, this is a tagged union for all values
. It stores the type
(one of the constants from header.c
) and the data. But because I’m using a union, all of the values are stored in the same memory–you have to make sure to read it as the right kind of data. It will always take as much memory as the largest of the values (int64_t
and double
are each 8 bytes), which can be wasteful, but such is life.
One interesting one in particular is the void *as_block
. This is a function pointer to another block_#
(we haven’t defined them yet) stored as a value.
globals.c
// The stack holding all values
Value *stack;
Value *stack_ptr;
// Frames holding the stack pointer for each block
Value **frames;
Value **frame_ptr;
This stores two arrays which I’ll allocate in main.c
. They’re not dynamic at this point.
Specifically, stack
points to base of the entire stack of values while stack_ptr
points to the current top of the stack
and will be incremented when pushing and decremented when popping.
frames
stores an array of pointers to the stack
pointer (yay **
) which is updated whenever a new scope is introduced (when invoking a block
), so that I know how many values to pop from stack_ptr
on return. Likewise, frame_ptr
is the current top of that stack.
It’s… certainly an interesting way to do it, but I think it works well enough.
coerce.c
// Convert two values to have the same type by upgrading if necessary
void coerce(Value *a, Value *b)
{
if (a->type == b->type)
{
return;
}
if (a->type == TAG_NUMBER_INTEGER && b->type == TAG_NUMBER_FLOAT)
{
a->type = TAG_NUMBER_FLOAT;
a->as_float = (double)a->as_integer;
}
if (a->type == TAG_NUMBER_FLOAT && b->type == TAG_NUMBER_INTEGER)
{
b->type = TAG_NUMBER_FLOAT;
b->as_float = (double)b->as_integer;
}
}
As mentioned, this is the function to automatically convert ints/floats when doing math on mixed types. So far, I don’t have the rest of the numeric tower I’m planning and I may eventually make manual casting required instead of this, but for the moment, this works pretty well.
I did, for a while, have a rather subtle bug here that did all sorts of interesting things. The last (contentful) line was b->as_float = (double)a->as_integer;
. So I ended up copying a
over b
, but only if a
was a float and b
an int. That took a while to find… I need better tests. 😄
main.c
And finally, the main function!
int main(int argc, char *argv[])
{
// The stack holding all values
stack = malloc(10240 * sizeof(Value));
stack_ptr = stack;
// Frames holding the stack pointer for each block
frames = malloc(10240 * sizeof(Value **));
frame_ptr = frames;
block_0(NULL);
return 0;
}
So right now, I’m just allocating two flat arrays of ~90kB each (10240 * (1 byte for the tag and 8 bytes for the value union)). I’ll probably make that dynamic at some point, but so far as making this work, it works well.
Then we call block_0
(I’ll get to the NULL
when talking names) and finally (if we make it this far) return
success. Voila!
compile_block
Okay, let’s actually talk about compilation. I’m going to start with compile_block
. This function call compile_expr
to actually compile each expression, but it also has to do a few more things:
- Calculate (or be given) the
arity
of the expression: how many values it will consume from the stack and how many it will push back on - Store the frame pointer–this will be offset by
arity_in
(which effectively consumes those values) - Compile the expressions
- Handle popping ’extra’ values off the stack so only
arity_out
values are left after the storedframe_ptr
when we’re done
Like this:
/// Helper function to compile a specific block to be output later
fn compile_block(
arity: (usize, usize),
body: &Vec<Expression>,
blocks: &mut Vec<Vec<String>>,
) -> usize {
log::debug!("compile_block({arity:?}, {body:?})");
let index = blocks.len();
blocks.push(vec![]); // Throwaway vec to hold the index
let mut lines = vec![];
let (arity_in, arity_out) = arity;
lines.push(format!(
" // block: {}",
body.iter()
.map(|ex| ex.to_string())
.collect::<Vec<String>>()
.join(" ")
));
lines.push(format!("\n // Store the current stack pointer with arity_in={arity_in}"));
lines.push(format!(" *(++frame_ptr) = (stack_ptr - {arity_in});\n"));
// Compile the block itself
for expr in body {
for line in compile_expr(expr.clone(), blocks) {
lines.push(line);
}
}
// Pop the block off the stack
lines.push(format!(" // Pop the block off the stack, preserving arity_out={arity_out} values"));
lines.push(format!(" Value* return_ptr = (stack_ptr - {arity_out});"));
lines.push(format!(" stack_ptr = *(frame_ptr--);"));
for _ in 0..arity_out {
lines.push(format!(" *(++stack_ptr) = *(++return_ptr);"));
}
blocks[index] = lines;
index
}
One thing that I’ve been making sure I do is generating comments in the C code. In a perfect world, these are never going to get looked at, since we’re going to immediately compile this C code to assembly/machine code. But in a practical world, knowing what code I’m generating is kind of important…
One thing to note is that all blocks have a single variable: names
. This one is a linked list of name
-> stack_ptr
associations that we’ve stored. Because it’s passed as a parameter to the block_#(...)
call, when we return, it ‘forgets’ all names
bound in this block or child blocks. I’ll get more into that in the names section.
This is also the first time seeing code like this:
*(++frame_ptr) = (stack_ptr - {arity_in});
Essentially, frame_ptr
is a pointer to a value in the frames
array. ++frame_ptr
is a pre-increment, so it’s saying advance the frame_ptr
to the next available value in the array before evaluating the rest of the code. *(++frame_ptr)
then unwraps one level (so rather than a frame, it’s a stack
value), which we then store the current stack_ptr
offset by arity_in
(essentially saying we’re going to pop that many values throughout this function).
Likewise, at the end:
Value* return_ptr = (stack_ptr - {arity_out});
stack_ptr = *(frame_ptr--);
*(++stack_ptr) = *(++return_ptr); // ... repeat arity_out times
This will use arity_out
to determine how many values from the top of the stack (stack_ptr
) we want to return. If it’s 0, we just won’t copy any values. But if it’s 3 (for example), we’ll start returning from the 3rd from the top of the stack.
Next, move the stack_ptr
to the frame_ptr
. Because it’s frame_ptr--
, we will take the top one and then only after we’re done, decrement it to pop this frame
. This frame_ptr
starts without arity_in
Then *(++stack_ptr) = *(++return_ptr);
will be called once for each of arity_in. That says: ‘set the next value on the stack to the next value to return.
It… makes sense to me? Perhaps a diagram will help. Let’s show a few examples.
In the first example, let’s show what would happen when calling dup
function that will pop 1 value and return 2:
# Initial state
stack: A B C
└ `stack_ptr`
# Call dup, generate a new frame pointer
# The `stack_ptr` contains the value passed to the function
# The `frame_ptr` does not, since it's treating it as those values will be popped
stack: A B C
│ └ `stack_ptr`
└ `frame_ptr`
# Dup runs, duplicating the C
stack: A B C C
│ └ `stack_ptr`
└ `frame_ptr`
# Preparing for return, 1) `return_ptr` needs to have two values to return
stack: A B C C
│ │ └ `stack_ptr`
│ └ `return_ptr`
└ `frame_ptr`
# 2) The `stack_ptr` is moved to `frame_ptr - 1` (since we increment before copying)
stack: A B C C
│ │ └ `return_ptr`
│ └ `frame_ptr`
└ `stack_ptr`
# Now (twice because of `arity_out`), we copy the value at `return_ptr` to `stack_ptr` and increment
# After the first copy:
stack: A C C C
│ └ `return_ptr`
└ `frame_ptr` + `stack_ptr`
# After the second copy:
stack: A C C C
│ │ └ `return_ptr`
│ └ `stack_ptr`
└ `frame_ptr`
# Finally, we return; because `stack_ptr` is where the stack
# will be on returning from the function; the last `C` (`return_ptr`)
# is effectively ignored / not able to be referenced any more
Hopefully that helps?
And that’s all we have for compile_block
. Now the real fun(tm), compile_expr
:
compile_expr
In a nutshell, we have this:
/// Compile a single expression into strings
fn compile_expr(expr: Expression, blocks: &mut Vec<Vec<String>>) -> Vec<String> {
log::debug!("compile_expr({expr})");
let mut lines = vec![];
lines.push(format!(" // {expr}")); // TODO: Flag for verbose mode
match expr {
Expression::Identifier(id) => {
...
},
...
}
lines
}
Where match expr
will generate code for each possible expression (see the sections below). Two interesting bits:
blocks: &mut Vec<Vec<String>>
- this stores the blocks as we compile; since we can haveblocks
inexprs
, we need to thread this through for when we see more blocks-> Vec<String>
- we return aVec
so that we can return 0, 1, or more lines; if an expression doesn’t generate code, it should return 0. Most will return more than 1: a comment + the actual generated code
So let’s go through the generation!
Expression::Identifier
Expression::Identifier(id) => {
match id.as_str() {
// Built in numeric functions
"+" => numeric_binop!(lines, "+"),
"-" => numeric_binop!(lines, "-"),
"*" => numeric_binop!(lines, "*"),
"/" => numeric_binop!(lines, "/"),
"%" => numeric_binop!(lines, "%"),
// Built in numeric comparisons
"<" => numeric_compare!(lines, "<"),
"<=" => numeric_compare!(lines, "<="),
"=" => numeric_compare!(lines, "=="),
"!=" => numeric_compare!(lines, "!="),
">=" => numeric_compare!(lines, ">="),
">" => numeric_compare!(lines, ">"),
// Built ins
"read" => lines.push(include_str!("../compile_c_includes/builtins/read.c").to_string()),
"write" => lines.push(include_str!("../compile_c_includes/builtins/write.c").to_string()),
"writeln" => {
lines.push(include_str!("../compile_c_includes/builtins/write.c").to_string());
lines.push("printf(\"\\n\");".to_string());
},
"newline" => lines.push("printf(\"\\n\");".to_string()),
"loop" => lines.push(include_str!("../compile_c_includes/builtins/loop.c").to_string()),
"if" => lines.push(include_str!("../compile_c_includes/builtins/if.c").to_string()),
"to_float" => lines.push(include_str!("../compile_c_includes/builtins/to_float.c").to_string()),
"to_int" => lines.push(include_str!("../compile_c_includes/builtins/to_int.c").to_string()),
// Attempt to lookup in names table
id => {
let id = sanitize_name(id);
lines.push(format!(
"
{{
Value* v = lookup(names, NAME_{id});
if (v->type == TAG_BLOCK) {{
void *f = v->as_block;
((void (*)(Name*))f)(names);
}} else {{
*(++stack_ptr) = *v;
}}
}}
"
));
}
}
}
So there are threeish categories here:
- builtins numeric functions generated by macro
- builtin functions included as C files
- variable lookups
For numeric functions, we do much like we did in the VM version:
/// A helper macro to generate functions that operate on two integers and floats
macro_rules! numeric_binop {
($lines:expr, $op:literal) => {{
let op = stringify!($op).to_string().trim_matches('"').to_string();
$lines.push(format!("
{{
Value *b = stack_ptr--;
Value *a = stack_ptr--;
coerce(a, b);
if (a->type == TAG_NUMBER_INTEGER) {{
Value result = {{.type=TAG_NUMBER_INTEGER, .as_integer=a->as_integer {op} b->as_integer}};
*(++stack_ptr) = result;
}} else if (a->type == TAG_NUMBER_FLOAT) {{
Value result = {{.type=TAG_NUMBER_FLOAT, .as_float=a->as_float {op} b->as_float}};
*(++stack_ptr) = result;
}}
}}
"));
}};
}
It’s got fun {{ double brackets }}
because otherwise those are template variables (which I do use for op
), but other than that, all we do is:
- pop two values with
Value *b = stack_ptr--
(gets the value and then decrements the pointer) coerce
- perform the function, creating a new value
- push the new value onto the stack with
*(++stack_ptr) = result;
numeric_compare
is much the same, except instead we create a new TAG_BOOLEAN
value.
The next case is the builtins. Those are just including C code, which you can see in the repo github:jpverkamp/stacklang:compile_c_includes/builtins.
For example, if
includes builtins/if.c
:
{
Value cond = *(stack_ptr--);
Value if_false = *(stack_ptr--);
Value if_true = *(stack_ptr--);
if (cond.type != TAG_BOOLEAN)
{
printf("Error: if condition must be a boolean\n");
exit(1);
}
Value v = (cond.as_boolean ? if_true : if_false);
if (v.type == TAG_BLOCK)
{
void *f = v.as_block;
((void (*)(Name *))f)(names);
}
else
{
*(++stack_ptr) = v;
}
}
Oh that ((void (*)(Name *))f)(names);
line… Essentially, that’s taking the as_block
, which has to be stored as an arbitrary (void*
) pointer and then casts it to a function pointer of the proper type (take a Name*
and return nothing), which is then called with the names
linked list.
Finally, if the Identifier
isn’t a builtin function, assume it’s a variable and look it up:
lines.push(format!(
"
{{
Value* v = lookup(names, NAME_{id});
if (v->type == TAG_BLOCK) {{
void *f = v->as_block;
((void (*)(Name*))f)(names);
}} else {{
*(++stack_ptr) = *v;
}}
}}
"
));
Then, if it’s not a block
, push that variable’s Value
onto the stack as always. But if it is a block
, then we need to call it, as we did in if
above.
So that’s a lot. 😄
Expression::Literal
Literals are much easier.
Expression::Literal(value) => {
let (tag, field, value) = match value {
// TODO: additional numeric tyhpes
Value::Number(Number::Integer(v)) => {
("TAG_NUMBER_INTEGER", "integer", v.to_string())
}
Value::Number(Number::Float(v)) => ("TAG_NUMBER_FLOAT", "float", v.to_string()),
Value::String(v) => ("TAG_STRING", "string", format!("{v:?}")),
Value::Boolean(v) => ("TAG_BOOLEAN", "boolean", format!("{v:?}")),
Value::Block { .. } => panic!("Blocks should be compiled separately"),
};
lines.push(format!(
"
{{
Value v = {{.type={tag}, .as_{field}={value}}};
*(++stack_ptr) = v;
}}
"
));
}
For each type of literal, determine what the tag should be and generate the necessary code for it. Then push it onto the stack.
There shouldn’t be direct block
literals at the moment. panic!
Expression::Block
Expression::Block(ref body) => {
let arity = calculate_arity(&expr)
.expect(format!("Unable to calculate arity for block: {:?}", expr).as_str());
let index = compile_block(arity, body, blocks);
lines.push(format!(
"
{{
Value v = {{.type=TAG_BLOCK, .as_block=(void*)block_{index}}};
*(++stack_ptr) = v;
}}
"
));
}
This one is straight forward (mostly because the code is in compile_block
). We do calculate the arity ahead of time (I’m not sure why anymore–I think because we have the expr
here and only the body in calculate_block
?), but then we just compile the block, get the new index for this block, and push a Value::Block
onto the stack containing a pointer to the new function.
Pretty cool how that just works.
Expression::List
todo!()
I haven’t actually implemented list expressions yet, but I also haven’t written any code that needs them yet!
Expression::Group
Groups are defined with ( ... )
(or there is an implicit one at the top level). All they are is a sequence of expressions, but this is exactly why we have the function return an arbitrary number of lines:
Expression::Group(exprs) => {
for expr in exprs {
for line in compile_expr(expr, blocks) {
lines.push(line);
}
}
}
Expression:At
Here’s a more interesting case. At expressions are implicitly tied into naming, so we’ll get into it more there. But for now, assume we have a function bind
in our C code that takes names
, a NAME_{id}
constant, and a pointer to the stack and binds the name to that stack location.
Then, there are two three cases:
@a
for singleIdentifier
binding@[a b c]
forList
binding of multiple variables at once@5
for arity clauses, these just shouldn’t generate code
Expression::At(expr) => {
match expr.as_ref() {
Expression::Identifier(id) => {
let id = sanitize_name(id);
lines.push(format!(
"
{{
Value *p = stack_ptr;
names = bind(names, NAME_{id}, p);
}}
"
));
}
Expression::List(id_exprs) => {
let id_count = id_exprs.len();
for (i, id_expr) in id_exprs.iter().enumerate() {
match id_expr {
Expression::Identifier(id) => {
let id = sanitize_name(id);
lines.push(format!(
"
{{
Value *p = (stack_ptr - {id_count} + {i} + 1);
names = bind(names, NAME_{id}, p);
}}
"
));
}
_ => panic!("Unexpected @ expression when compiling: {}", expr),
}
}
}
Expression::Literal(Value::Number(Number::Integer(_))) => {} // ignore numeric @ expressions
_ => panic!("Unexpected @ expression when compiling: {}", expr),
}
}
Expression::Bang
These are eventually going to be for writing to named variables, but I’m not currently using them much. What we do have though is arity_out
expressions, which have a numeric field, a la !1
. We don’t want to generate code for them, but do need to handle them:
Expression::Bang(v) => {
match v.as_ref() {
Expression::Literal(Value::Number(Number::Integer(_))) => {}, // Used only for arity out expressions
_ => todo!(),
}
}
Expression::Dollar
$
expressions are essentially a simpler form of variable lookups. They always have an Identifier
and they always push the value on the stack, even if it’s a block. Essentially, this is to allow passing blocks to other blocks. It does make the code simpler though:
Expression::Dollar(expr) => match expr.as_ref() {
Expression::Identifier(id) => {
lines.push(format!(
"
{{
Value* v = lookup(names, NAME_{id});
*(++stack_ptr) = *v;
}}
"
));
}
_ => panic!("Unexpected $ expression when compiling: {}", expr),
}
Generating block code
Okay, that’s all well and good, but remember back in the original code when I said I still have to forward declare and generate all blocks? Well, now we have the blocks
, so let’s do that!
First:
// Forward declare all blocks
lines.push("\n// Forward declare all blocks".to_string());
for (i, _) in blocks.iter().enumerate() {
lines.push(format!("void block_{i}(Name *block_names);", i = i).to_string());
}
This comes up because each block
can theoretically call any other block in any order. Because of how C compilers work, you can’t call a function defined after you (by default), so what we’re doing here is just naming all of the block_#
functions we’re going to generate to make the compiler happy.
And then the code gen:
// Generate block functions
lines.push("\n// Actual block definitions".to_string());
for (i, block) in blocks.iter().enumerate() {
lines.push(format!("void block_{i}(Name *block_names) {{").to_string());
lines.push(format!(" if (block_names != NULL) block_names->boundary = true;").to_string());
lines.push(format!(" Name* names = block_names;").to_string());
lines.push(block.join("\n"));
lines.push(" // Free names bound in this block".to_string());
lines.push("
while (names != NULL && block_names != names) {
Name *next = names->prev;
free(names);
names = next;
}
".to_string());
lines.push("}".to_string());
}
A lot of that has to do with how we actually deal with named variables, which we’re almost to, I promise. In a nutshell though, we take in block_names
as a pointer to where in the names
linked list the block started at. We then copy names
which will be incremented as we evaluate this block (in At
expressions). At the end of the block, if there are any values between (if we bound any names), we free the memory those names used, since the calling function will no longer be able to see them.
It’s not how I originally did it, see the first version of named variables, but it does work!
Named variables
Okay, enough is enough. Let’s actually talk about names
.
As mentioned in the previous section, we have one more data structure: names
. This one, rather than a static array, is actually a linked list. Each name will be added to the end of this list, with the name
and stack_ptr
it refers to. To lookup, we’ll start at the end of this list and iterate backwards until we find name
. If we bound a variable multiple times, the last bound (so the closest in scope) will be seen first.
It’s pretty elegant, IMO. Here’s the code:
// Names linked list
typedef struct Name Name;
struct Name
{
bool boundary;
uint8_t name;
Value *value;
Name *prev;
};
Name *bind(Name *names, uint8_t name, Value *value)
{
Name *new_name = malloc(sizeof(Name));
if (new_name == NULL)
{
printf("Out of memory");
exit(1);
}
new_name->boundary = false;
new_name->name = name;
new_name->value = value;
new_name->prev = names;
return new_name;
}
// Lookup a value on the stack by name
Value *lookup(Name *names, uint8_t name)
{
while (names != NULL)
{
if (names->name == name)
{
return names->value;
}
names = names->prev;
}
printf("Name not found: %d", name);
exit(1);
}
Yes, it’s a bit weird that we don’t store next
, but we don’t need it. Just prev
to look back from the end of the list. And if we ever go all the way back, prev
will be NULL
and we know the variable isn’t defined (so error out).
One last interesting field is new_name->boundary
. This is set when a block
is called and essentially all it does is allow us to print a distinction between names
in different block_#
for debug mode.
With that, bind
will malloc
the memory for the new node, fill it out, and return the new end of list. lookup
(as mentioned) will start at the current end of the list and go back until we either find the name
or run out of names
.
I do store all name
as uint8_t
, which does require collecting all of the names at the start of the compile
function:
Collecting names
/// Collect the names used so we can assign each an integer value
fn collect_names(ast: &Expression) -> HashMap<String, usize> {
let mut names = HashMap::new();
fn collect_names_expr(expr: &Expression, names: &mut HashMap<String, usize>) {
match expr {
Expression::Identifier(_)
| Expression::Literal(_)
| Expression::Bang(_)
| Expression::Dollar(_) => {
// Do nothing, no names possible
}
Expression::List(_) => todo!(),
Expression::Block(exprs) => {
for expr in exprs {
collect_names_expr(expr, names);
}
}
Expression::Group(exprs) => {
for expr in exprs {
collect_names_expr(expr, names);
}
}
Expression::At(expr) => {
match expr.as_ref() {
Expression::Identifier(id) => {
let id = sanitize_name(id);
if !names.contains_key(&id) {
log::debug!("Adding name: {} @ {}", id, names.len());
names.insert(id.clone(), names.len());
}
}
Expression::List(id_exprs) => {
for id_expr in id_exprs {
match id_expr {
Expression::Identifier(id) => {
let id = sanitize_name(id);
if !names.contains_key(&id) {
log::debug!("Adding name: {} @ {}", id, names.len());
names.insert(id.clone(), names.len());
}
}
_ => panic!(
"Unexpected @ expression when collecting names: {}",
expr
),
}
}
}
Expression::Literal(Value::Number(Number::Integer(_))) => {} // ignore numeric @ expressions
_ => panic!("Unexpected @ expression when collecting names: {}", expr),
}
}
}
}
collect_names_expr(ast, &mut names);
names
}
This just iterates through the function, looking for anything that is used in an At
expression recursively and makes a unique set of them. It then can generate a define for each:
let names = collect_names(&ast);
log::debug!("collected names: {:?}", names);
for (name, index) in names.iter() {
lines.push(format!("#define NAME_{name} {index}"));
}
Not so bad.
Original idea
As a quick aside, this isn’t the first iteration of variable naming I wrote. Originally (and it works in most cases), the types.c
definition looked like this:
// Values on the stack
typedef struct
{
uint8_t type;
union
{
int64_t as_integer;
double as_float;
char *as_string;
bool as_boolean;
void *as_block;
};
uint8_t name_count;
uint8_t names[4];
// TODO: more than 4 names
} Value;
In essence, this would store up to 4 names
with each value on the stack. This would end up using up that extra memory even if a value was never named, but that’s not actually where the bug crept in: because names were always bound on the stack and never removed, when you called a child block
that used the same names as a parent block
, weird things(tm) could happen.
This… took a while to trace down before finally settling on the linked list version above.
Adding it to the main
function
Okay, we have all of the code now! So let’s add it to my main.rs
. Because we have two possible branches now, along with setting up logging and debug mode, it’s worth pushing out into a new function:
use clap::*;
use log;
use std::{io::BufReader, env};
mod debug;
mod numbers;
mod stack;
mod types;
mod arity;
mod compile_c;
mod lexer;
mod parser;
mod vm;
#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
struct Args {
#[arg(short, long)]
file: String,
#[arg(short, long)]
compile: bool,
#[arg(short, long)]
debug: bool,
}
fn main() {
pretty_env_logger::init();
let args = Args::parse();
let file = std::fs::File::open(args.file).unwrap();
let tokens = lexer::tokenize(BufReader::new(file));
log::info!(
"Tokens: {}",
tokens
.iter()
.map(|token| token.token.clone())
.collect::<Vec<String>>()
.join(" ")
);
let ast = parser::parse(tokens);
log::info!("AST:\n{:#?}", ast);
if args.compile {
let c_code = compile_c::compile(ast);
println!("{}", c_code);
} else {
vm::evaluate(ast);
}
}
clap is pretty cool!
Essentially, we read args, load the specified file, always tokenize, always generate an AST, and then based on the compile
flag, either run it with the VM or compile it to C.
That’s pretty shiny.
Debug mode
One bit missing though, debug mode!
There were essentially three options I had here:
- Thread a
debug_mode
flag through all of the other functions - Set an environment variable based on the command line flag (if specified)
- Create a crate with
debug::ENABLED
The third appealed to me… with ENABLED
being a static mut bool
. The only problem with that… is that it’s unsafe
:
// main.rs
fn main() {
pretty_env_logger::init();
let args = Args::parse();
// Debug flag override envs variable
if args.debug {
unsafe {
debug::ENABLED = true;
}
} else {
match env::var("STACKLANG_DEBUG") {
Ok(s) if s.to_lowercase() == "true" => unsafe {
debug::ENABLED = true;
},
_ => {},
}
}
unsafe {
env::set_var("RUST_LOG", "trace");
if debug::ENABLED {
log::debug!("Debug mode enabled");
}
}
...
}
// compile_c.rs
/// Compile the AST into C code
pub fn compile(ast: Expression) -> String {
...
unsafe {
if debug::ENABLED {
lines.push("char* get_name(int index) {".to_string());
for (name, index) in names.iter() {
lines.push(format!(" if (index == {index}) {{ return \"{name}\"; }}"));
}
lines.push(" return \"<unknown>\";".to_string());
lines.push("}".to_string());
}
}
...
}
I’m … not thrilled with that. There has to be a better way to do it, but I’ve not found it yet. And it works. For now!
Justfile
So now I can run with the --file
flag to specify file, --compile
to enable compilation mode (which actually only generates C code), and --debug
to enable debug mode. But that’s… a lot. So I made a Justfile:
both name:
just example {{ name }}
just example {{ name }} compile=true
example name compile="false" debug="false":
just example{{ if compile != "false" { "-compile" } else { "-run" } }}{{ if debug != "false" { "-debug" } else { "" } }} {{name}}
example-run name:
time cargo run --release -- --file examples/{{name}}.stack
example-compile name:
cargo run --release -- --file examples/{{name}}.stack --compile > output/{{name}}.c
clang -Ofast output/{{name}}.c -o output/{{name}}
time output/{{name}}
example-run-debug name:
cargo run -- --debug --file examples/{{name}}.stack
example-compile-debug name:
cargo run -- --debug --file examples/{{name}}.stack --compile > output/{{name}}.c
clang output/{{name}}.c -o output/{{name}}
output/{{name}}
Now I can run my code examples (with timing! and automatic clang!) with just example
:
$ just example fibonacci-acc
just example-run fibonacci-acc
time cargo run --release -- --file examples/fibonacci-acc.stack
Compiling stacklang v0.1.0 (/Users/jp/Projects/stacklang)
Finished release [optimized] target(s) in 3.76s
Running `target/release/stacklang --file examples/fibonacci-acc.stack`
12586269025
real 0m4.037s
user 0m0.048s
sys 0m0.047s
$ just example fibonacci-acc compile=true
just example-compile fibonacci-acc
cargo run --release -- --file examples/fibonacci-acc.stack --compile > output/fibonacci-acc.c
Finished release [optimized] target(s) in 0.05s
Running `target/release/stacklang --file examples/fibonacci-acc.stack --compile`
clang -Ofast output/fibonacci-acc.c -o output/fibonacci-acc
time output/fibonacci-acc
12586269025
real 0m0.097s
user 0m0.001s
sys 0m0.001s
Pretty cool.
And if I just want to compare VM vs compiler (as I did quite a lot while tracking down bugs):
$ just both fibonacci-acc
just example fibonacci-acc
just example-run fibonacci-acc
time cargo run --release -- --file examples/fibonacci-acc.stack
Finished release [optimized] target(s) in 0.08s
Running `target/release/stacklang --file examples/fibonacci-acc.stack`
12586269025
real 0m0.327s
user 0m0.044s
sys 0m0.036s
just example fibonacci-acc compile=true
just example-compile fibonacci-acc
cargo run --release -- --file examples/fibonacci-acc.stack --compile > output/fibonacci-acc.c
Finished release [optimized] target(s) in 0.02s
Running `target/release/stacklang --file examples/fibonacci-acc.stack --compile`
clang -Ofast output/fibonacci-acc.c -o output/fibonacci-acc
time output/fibonacci-acc
12586269025
real 0m0.035s
user 0m0.000s
sys 0m0.001s
Shiny.
Mandelbrot timing
As mentioned (and shown above), the C compiler is fast compared to the VM. Here’s a real example. Generating a 1024x768x8 Mandelbrot image.
$ echo -e "1024\n768\n8\n" | just example mandelbrot-read | convert - output/mandelbrot-vm.png
just example-run mandelbrot-read
time cargo run --release -- --file examples/mandelbrot-read.stack
Finished release [optimized] target(s) in 0.15s
Running `target/release/stacklang --file examples/mandelbrot-read.stack`
real 2m16.413s
user 1m59.204s
sys 0m2.758s
$ echo -e "1024\n768\n8\n" | just example mandelbrot-read compile=true | convert - output/mandelbrot-compile.png
just example-compile mandelbrot-read
cargo run --release -- --file examples/mandelbrot-read.stack --compile > output/mandelbrot-read.c
Finished release [optimized] target(s) in 0.02s
Running `target/release/stacklang --file examples/mandelbrot-read.stack --compile`
clang -Ofast output/mandelbrot-read.c -o output/mandelbrot-read
time output/mandelbrot-read
real 0m2.145s
user 0m2.062s
Yes. You read that write. From 2m16s down to 2s. Not too bad! 😄
Next steps
I know it’s been a while, but I hope it was worth it. Like last time, it’s so cool just seeing something like this working…
So what’s next?
- Type checking:
- Automatically determine specific types of expressions (including blocks)
- Automatically determine the
arity
of blocks when possible
- Numeric tower:
- Implement rationals/complex numbers at the parser level + in any interpreter / compiler I have at that point
- Interpreters:
- A bytecode interpreter/compiler, evaluating at a lower level (I’m not sure how much this would gain, the AST is already fairly low level)
- Compilers:
- Compile to WASM; since it’s also stack based, this should be interesting
- Compile to x86/ARM assembly
I think I want to work more on types next, possibly doing away with manual coercion. We’ll see.
Onwards, once again!