The earliest memory I have of ‘programming’ is in the early/mid 90s when my father brought home a computer from work. We could play games on it … so of course I took the spreadsheet program he used (LOTUS 123, did I date myself with that?) and tried to modify it to print out a helpful message for him. It … halfway worked? At least I could undo it so he could get back to work…

After that, I picked up programming for real in QBASIC (I still have a few of those programs lying around), got my own (junky) Linux desktop from my cousin, tried to learn VBasic (without a Windows machine), and eventually made it to high school… In college, I studied computer science and mathematics, mostly programming in Java/.NET, although with a bit of everything in the mix. A few of my oldest programming posts on this blog are from that time.

After that, on to grad school! Originally, I was going to study computational linguistics, but that fell through. Then programming languages (the school’s specialty). And finally I ended up studying censorship and computer security. That’s about where I am today!

But really, I still have a habit of doing a little bit of everything. Whatever seems interesting at the time!

Posts

Runelang: The Parser (Part 2: Expressions)

Earlier this week, we started parsing, getting through groups, nodes, params, and lists. A pretty good start, but it also leaves out two very powerful things (expressions and defines), one of which we absolutely do need to start actually evaluating things: expressions. Since we use them in every param, we pretty much need to know how to parse them, so let’s do it!

read more...


Runelang: The Parser (Part 1)

I’m still here! And less sick now.

Last time(s), we described and lexed Runelang! This time around, let’s take the lexed tokens and go one step further and parse them!

So, how do we go about this? With a recursive descent parser!

  • Start with a list/stream of tokens
  • Using the first k elements of the list (the k in an LL(k) parser), identify which sort of object we are parsing (a group / identifier / literal / expression / etc)
  • Call a parsing function for that object type (parseGroup etc) that will:
    • Recursively parse the given object type (this may in turn call more parse functions)
    • Advance the token stream ‘consuming’ any tokens used in this group so the new ‘first’ element is the next object
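The steps above can be sketched in Python roughly as follows. This is not Runelang's actual parser: the `(kind, text)` token shape, the `parse_value` / `parse_list` names, and the tiny grammar (numbers, identifiers, bracketed lists) are all assumptions made up for illustration, with k = 1 token of lookahead.

```python
from typing import Any, List, Tuple

Token = Tuple[str, str]  # (kind, text); a hypothetical token shape


def parse_value(tokens: List[Token], pos: int) -> Tuple[Any, int]:
    """Parse one object starting at tokens[pos]; return (node, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    # Look at the first token (k = 1) to decide which parse function applies.
    if kind == "number":
        return float(text), pos + 1
    if kind == "identifier":
        return ("id", text), pos + 1
    if (kind, text) == ("punct", "["):
        return parse_list(tokens, pos)
    raise SyntaxError(f"unexpected token {text!r}")


def parse_list(tokens: List[Token], pos: int) -> Tuple[Any, int]:
    """Parse '[' value* ']' and consume every token that belongs to it."""
    pos += 1  # consume '['
    items = []
    while pos < len(tokens) and tokens[pos] != ("punct", "]"):
        item, pos = parse_value(tokens, pos)  # recursive call
        items.append(item)
    if pos >= len(tokens):
        raise SyntaxError("missing ']'")
    return ("list", items), pos + 1  # consume ']'
```

Each function returns the position just past what it consumed, so the caller's "first" token is always the start of the next object.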

read more...


Runelang: The Lexer

Let’s LEX!

So this is actually one of the easier parts of a programming language. We need to turn the raw text of a program into a sequence of tokens / lexemes that will be easier to parse. Along the way, we want to:

  • Remove all whitespace and comments
  • Store the row and column with the token to make debugging easier

So let’s do it!
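A minimal sketch of those two goals in Python, not the lexer from the post. The token kinds and the `#` line-comment syntax are assumptions for illustration; Runelang's real token set may look quite different.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    row: int  # 1-based line number
    col: int  # 1-based column number


# Hypothetical token kinds; '#' line comments are an assumption.
TOKEN_RE = re.compile(r"""
    (?P<newline>\n)
  | (?P<skip>[ \t\r]+|\#[^\n]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<punct>[()\[\]{},:+\-*/])
""", re.VERBOSE)


def lex(source: str) -> Iterator[Token]:
    """Yield tokens with positions, dropping whitespace and comments."""
    row, line_start, pos = 1, 0, 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            col = pos - line_start + 1
            raise SyntaxError(f"unexpected {source[pos]!r} at {row}:{col}")
        kind = match.lastgroup
        if kind == "newline":
            # Track where the current line starts so columns stay correct.
            row, line_start = row + 1, match.end()
        elif kind != "skip":
            yield Token(kind, match.group(), row, pos - line_start + 1)
        pos = match.end()
```

For example, `list(lex("circle 7 # note\n  star"))` yields three tokens, with `star` reported at row 2, column 3.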

read more...


Runelang: Language Specification

Previously, I wrote a post about making a DSL in Ruby that could render magic circles/runes. It worked pretty well. I could turn things like this:

rune do
    scale 0.9 do 
        circle
        polygon 7
        star 14, 3
        star 7, 2
        children 7, scale: 1/8r, offset: 1 do |i|
            circle
            invert do
                text (0x2641 + i).chr Encoding::UTF_8
            end
        end
    end
    scale 0.15 do
        translate x: -2 do circle; moon 0.45 end
        circle
        translate x: 2 do circle; moon 0.55 end
    end
end

Into this:

But… I decided to completely rewrite it. Now it’s an entirely separate language:


    read more...


    Go is faster than Python? (an example parsing huge JSON logs)

    Recently at work I came across a problem where I had to go through a year’s worth of logs and correlate two different fields across all of our requests. On the plus side, we have the logs stored as JSON objects (archived from Datadog, which collects them). On the down side… it’s kind of a huge amount of data. Not as much as I’ve dealt with at previous jobs/in some academic problems, but we’re still talking on the order of terabytes.

    The obvious approach: write up a quick Python script, fire and forget. It takes maybe ten minutes to write the code and (for this specific example) half an hour to run it on the specific cloud instance the logs lived on. So we’ll start with that. But then I got thinking… Python is supposed to be super slow, right? Can I do better?

    (Note: This problem is mostly disk bound. So Python actually for the most part does just fine.)
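A rough sketch of what such a quick Python script could look like, not the one from the post. It assumes the logs are newline-delimited JSON (one object per line) and that the two fields hold simple hashable values such as strings or numbers; the function name and the field names in the usage note are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def correlate(lines: Iterable[str], field_a: str, field_b: str) -> Counter:
    """Count how often each (field_a, field_b) pair of values occurs."""
    pairs: Counter = Counter()
    for line in lines:  # stream line by line; never load the whole file
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort a long run
        if isinstance(record, dict) and field_a in record and field_b in record:
            # Assumes both values are hashable (strings, numbers, ...).
            pairs[(record[field_a], record[field_b])] += 1
    return pairs
```

Passing an open file handle works directly, e.g. `correlate(open("logs.jsonl"), "user_id", "endpoint")`, since iterating a file yields one line at a time.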

    read more...


    A CLI Tool for Bulk Processing Github Dependabot Alerts (with GraphQL!)

    Dependabot is … somewhat useful when it comes to letting you know that there are critical issues in your dependencies that can be fixed simply by upgrading the package (they did all the work for you*). The biggest problem is that it can be insanely noisy. In a busy repo with multiple Node.JS codebases (especially), you can get dozens to even hundreds of reports a week. Ideally you would update the code for each one… but sometimes that’s just not practical. So you have to decide which updates you actually apply.

    So. How do we do it?

    Well, the traditional REST-based GitHub APIs don’t expose the Dependabot data, but the newer GraphQL one does! I’ll admit, I haven’t used as much GraphQL as I probably should; it’s… a bit more complicated than REST. But it does expose what I need.
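To give a feel for it, here is a hedged sketch of building such a GraphQL request in Python. It is not the tool from the post. The query uses the `vulnerabilityAlerts` connection on `repository` as I understand GitHub's GraphQL schema, so check the field names against the current schema before relying on them; `build_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional

GRAPHQL_URL = "https://api.github.com/graphql"

# Field names are my best understanding of the schema, not verified here.
QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    vulnerabilityAlerts(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        createdAt
        dismissedAt
        securityVulnerability {
          severity
          package { name ecosystem }
        }
      }
    }
  }
}
"""


def build_request(owner: str, name: str, token: str,
                  cursor: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for one page of alerts."""
    body = json.dumps({
        "query": QUERY,
        "variables": {"owner": owner, "name": name, "cursor": cursor},
    }).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_request(...))`, and paging means passing the previous response's `endCursor` back in as `cursor` while `hasNextPage` is true.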

    read more...


    A simple Flask Logging/Echo Server

    A very simple server that can be used to catch all incoming HTTP requests and just echo them back + log their contents. I needed it to test what a webhook actually returned to me, but I’m sure that there are a number of other things it could be dropped in for.

    It will take in any GET/POST/PATCH/DELETE HTTP request with any path/params/data (optionally JSON), pack that data into a JSON object, and both log that object to a file (with a UUID1-based name) and return it in the response.

    Warning: Off hand, there is already a potential security problem in this regarding DoS. It will happily try to log anything you throw at it, no matter how big and will store those in memory first. So long running requests / large requests / many requests will quickly eat up your RAM/disk. So… don’t leave this running unattended? At least not without additional configuration.
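A minimal sketch of a server along these lines, assuming Flask is installed. This is not the code from the post: the `LOG_DIR` name, the record layout, and the 1 MiB body cap (one possible piece of the "additional configuration" mentioned above) are assumptions for illustration.

```python
import json
import os
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)
LOG_DIR = "logs"  # hypothetical output directory
# One possible mitigation for oversized requests: cap the body size.
app.config["MAX_CONTENT_LENGTH"] = 1024 * 1024  # 1 MiB, an arbitrary choice

METHODS = ["GET", "POST", "PATCH", "DELETE"]


@app.route("/", defaults={"path": ""}, methods=METHODS)
@app.route("/<path:path>", methods=METHODS)
def echo(path):
    """Pack the request into a JSON object, log it, and echo it back."""
    record = {
        "method": request.method,
        "path": path,
        "params": request.args.to_dict(flat=False),
        "headers": dict(request.headers),
        "json": request.get_json(silent=True),  # None if the body isn't JSON
        "data": request.get_data(as_text=True),
    }
    os.makedirs(LOG_DIR, exist_ok=True)
    # One file per request, named with a UUID1.
    with open(os.path.join(LOG_DIR, f"{uuid.uuid1()}.json"), "w") as f:
        json.dump(record, f, indent=2)
    return jsonify(record)
```

The body cap only addresses single large requests; many small requests will still fill the disk, so the warning above still applies.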

    That’s it! Hope it’s helpful.

    read more...


    Pulling more than 5000 logs from datadog

    Datadog is pretty awesome. I wish I had it at my previous job, but better late than never. In particular, I’ve used it a lot for digging through recent logs to try to construct various events for various (security related) reasons.

    One of the problems I’ve run into, though, is that eventually you’re going to hit the limits of what Datadog can do. In particular, I was trying to reconstruct users’ sessions and then check whether they made one specific sequence of calls or another. So far as I know, that isn’t directly possible, so instead I wanted to download a subset of the Datadog logs and process them locally.

    Easy enough, yes? Well: https://stackoverflow.com/questions/67281698/datadog-export-logs-more-than-5-000

    Turns out, you just can’t export more than 5000 logs directly. But… they have an API with pagination!
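The general shape of cursor-based pagination can be sketched like this. It is not the script from the post. It assumes each response carries its events under `data` and the next cursor under `meta.page.after`, which is my understanding of Datadog's v2 log search responses; `fetch_page` is a hypothetical callable you supply that performs the actual HTTP request for a given cursor.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_logs(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> Iterator[Any]:
    """Yield every log event, following cursors until none is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)  # first call gets cursor=None
        yield from page.get("data", [])
        # Assumed response shape: {"data": [...], "meta": {"page": {"after": ...}}}
        cursor = page.get("meta", {}).get("page", {}).get("after")
        if not cursor:
            break
```

Keeping the HTTP call behind `fetch_page` means the paging loop can be exercised with canned pages before pointing it at the real API (and its rate limits).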

    read more...


    AoC 2021 Day 25: Cucumbinator

    Source: Sea Cucumber

    Part 1: Load a grid of empty cells (.), east movers (>), and south movers (v). Each step, move all east movers, then all south movers (each one moves only if its target cell is empty at the start of its herd’s turn). Wrap east/west and north/south. How many steps does it take the movers to get stuck?
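One way to sketch Part 1 in Python, not necessarily how the post solves it. The grid is assumed to be a list of equal-length strings, and the function names are made up for illustration.

```python
from typing import List


def step(grid: List[str]) -> List[str]:
    """Advance one step: all east movers, then all south movers."""
    rows, cols = len(grid), len(grid[0])
    for mover, dr, dc in ((">", 0, 1), ("v", 1, 0)):
        new = [list(row) for row in grid]
        for r in range(rows):
            for c in range(cols):
                nr, nc = (r + dr) % rows, (c + dc) % cols  # wrap around
                # Decide using the grid as it was before this herd moved.
                if grid[r][c] == mover and grid[nr][nc] == ".":
                    new[r][c], new[nr][nc] = ".", mover
        grid = ["".join(row) for row in new]
    return grid


def steps_until_stuck(grid: List[str]) -> int:
    """Return the number of the first step on which nothing moves."""
    count = 1
    while True:
        nxt = step(grid)
        if nxt == grid:
            return count
        grid, count = nxt, count + 1
```

Building a fresh copy for each herd is what makes the moves simultaneous: every mover checks the grid as it stood before its herd started moving. Note that `steps_until_stuck` loops forever on a grid that never jams, so it assumes the input eventually gets stuck.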

    read more...