The earliest memory I have of ‘programming’ is in the early/mid 90s when my father brought home a computer from work. We could play games on it … so of course I took the spreadsheet program he used (LOTUS 123, did I date myself with that?) and tried to modify it to print out a helpful message for him. It … halfway worked? At least I could undo it so he could get back to work…

After that, I picked up programming for real in QBASIC (I still have a few of those programs lying around), got my own (junky) Linux desktop from my cousin, tried to learn VBasic (without a Windows machine), and eventually made it to high school… In college, I studied computer science and mathematics, mostly programming in Java/.NET, although with a bit of everything in the mix. A few of my oldest programming posts on this blog are from that time.

After that, on to grad school! Originally, I was going to study computational linguistics, but that fell through. Then programming languages (the school’s specialty). And finally I ended up studying censorship and computer security. That’s about where I am today!

But really, I still have a habit of doing a little bit of everything. Whatever seems interesting at the time!

Novel compression

Last week on /r/dailyprogrammer, there was a neat trio of posts all about a new compression algorithm:

More specifically, we’re going to represent compressed text with the following rules:

  • If the chunk is just a number (eg. 37), word number 37 from the dictionary (zero-indexed, so 0 is the 1st word) is printed lower-case.
  • If the chunk is a number followed by a caret (eg. 37^), then word 37 from the dictionary will be printed lower-case, with the first letter capitalised.
  • If the chunk is a number followed by an exclamation point (eg. 37!), then word 37 from the dictionary will be printed upper-case.
  • If it’s a hyphen (-), then instead of putting a space in-between the previous and next words, put a hyphen instead.
  • If it’s any of the following symbols: . , ? ! ; : (edit: also ’ and “), then put that symbol at the end of the previous outputted word.
  • If it’s a letter R (upper or lower), print a new line.
  • If it’s a letter E (upper or lower), the end of input has been reached.
  • edit: any other block of text, represent as a literal ‘word’ in the dictionary

Got it? Let’s go!

(If you’d like to follow along: full source)

read more...


Trigonometric Triangle Trouble

Yesterday’s post at /r/dailyprogrammer managed to pique my interest1:

A triangle on a flat plane is described by its angles and side lengths, and you don’t need all of the angles and side lengths to work out everything about the triangle. (This is the same as last time.) However, this time, the triangle will not necessarily have a right angle. This is where more trigonometry comes in. Break out your trig again, people.

read more...


Gorellian sorting

It’s been a while, so I figured I should get in a quick coding post. From /r/dailyprogrammer, we have this challenge:

The Gorellians, at the far end of our galaxy, have discovered various samples of English text from our electronic transmissions, but they did not find the order of our alphabet. Being a very organized and orderly species, they want to have a way of ordering words, even in the strange symbols of English. Hence they must determine their own order.

For instance, if they agree on the alphabetical order: UVWXYZNOPQRSTHIJKLMABCDEFG

Then the following words would be in sorted order based on the above alphabet order: WHATEVER ZONE HOW HOWEVER HILL ANY ANTLER COW

read more...


Caesar cipher

Here’s a 5 minute1 coding challenge from Programming Praxis:

A caeser cipher, named after Julius Caesar, who either invented the cipher or was an early user of it, is a simple substitution cipher in which letters are substituted at a fixed distance along the alphabet, which cycles; children’s magic decoder rings implement a caesar cipher. Non-alphabetic characters are passed unchanged. For instance, the plaintext PROGRAMMINGPRAXIS is rendered as the ciphertext SURJUDPPLQJSUDALV with a shift of 3 positions.

– Source: Wikipedia, public domain

read more...


Command line user agent parsing

Quite often when working with internet data, you will find yourself wanting to figure out what sort of device users are using to access your content. Luckily, if you’re using HTTP, there is a standard for that: The user-agent header.

Since I’m in exactly that position, I’ve added a new script to my Dotfiles that reads user agents on stdin, parses them, and writes them back out in a given format.

read more...


Combining sort and uniq

A fairly common set of command line tools (at least for me) is to combine sort and uniq to get a count of unique items in a list of unsorted data. Something like this:

$ find . -type 'f' | rev | cut -d "." -f "1" | rev | sort | uniq -c | sort -nr | head

2649 htm
1458 png
 993 cache
 612 jpg
 135 css
 102 zip
  99 svg
  60 gif
  45 js
  27 pdf

read more...