AnnGram - Initial GUI


Basically, I got tired of modifying the command line every time I wanted to test new values.  To that end, I spent a small bit of time coding up a GUI to make further experiments easier.

AnnGram - Cosine Distance


The first algorithm that I’ve chosen to implement is a simple cosine distance between the n-gram vectors.  This was the first method used in several of the papers that I’ve read, and it seems like a good benchmark.

Essentially, this method gives the similarity of two n-gram vectors (built from either Documents or Authors) as an angle ranging from 0 (identical n-gram profiles) to \pi/2 (no n-grams in common).  Documents written by the same author should have the lowest values.
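As a rough illustration of the idea, here is a minimal Python sketch of the angle calculation. It is not the AnnGram code: the function names `ngram_counts` and `cosine_angle` are made up for this example, and it assumes character n-grams stored as raw counts, which the post does not specify.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping character n-grams (an assumption; word n-grams work the same way)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_angle(a, b):
    """Return the angle in radians between two sparse n-gram count vectors."""
    # Only n-grams present in both vectors contribute to the dot product.
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an empty n-gram vector")
    cosine = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine)))


same = cosine_angle(ngram_counts("the quick brown fox", 3),
                    ngram_counts("the quick brown fox", 3))
different = cosine_angle(ngram_counts("aaaa", 2), ngram_counts("bbbb", 2))
```

Because counts are never negative, the cosine stays between 0 and 1, which is why the angle cannot exceed \pi/2.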

New Years at Hemlock Cliffs

A New Year’s Day hike along the trails at Hemlock Cliffs in southern Indiana.

AnnGram - Framework

Document Framework

The first part of the framework that I needed to code was the ability to load documents.  To reduce the load on the processor when first loading a document, only a minimal amount of computation is done up front.  Further computation is deferred until it is needed.

To avoid duplicating work, the n-grams are stored using memoization.  The basic idea is that when a result (in this case, the n-grams of a particular length) is first requested, the calculation is done and the result is stored in memory.  During any future calls, the cached result is returned directly, greatly increasing speed at the cost of memory.  Luckily, modern computers have more than sufficient memory for the task at hand.
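The deferred, memoized approach could look something like the Python sketch below. This is an illustration under assumptions rather than the actual framework: the `Document` class, its `ngrams` method, and the `computations` counter are hypothetical names, and character n-grams are assumed.

```python
from collections import Counter


class Document:
    """Holds raw text and computes n-gram counts lazily, once per length."""

    def __init__(self, text):
        # Loading does almost no work: just keep the text.
        self.text = text
        self._ngram_cache = {}   # maps n -> Counter of n-grams
        self.computations = 0    # illustration only: counts real calculations

    def ngrams(self, n):
        """Return the n-gram counts for length n, computing them on first request."""
        if n not in self._ngram_cache:
            self.computations += 1
            self._ngram_cache[n] = Counter(
                self.text[i:i + n] for i in range(len(self.text) - n + 1)
            )
        return self._ngram_cache[n]


doc = Document("banana")
first = doc.ngrams(2)    # calculated and cached
second = doc.ngrams(2)   # served from the cache
```

The second call returns the very same object as the first, so the cost of counting is paid once per n-gram length; the trade-off is that every requested length stays in memory for the lifetime of the document.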

AnnGram - Overview

Basic Premise

For my senior thesis at Rose-Hulman Institute of Technology, I am attempting to combine the fields of Computational Linguistics and Artificial Intelligence in a new and useful manner.  Specifically, I am planning on making use of Artificial Neural Networks to enhance the performance of n-gram based document classification.  Over the next few months, I will be updating this category with background information and further progress.

First, I’ll start with some basic background information.

Macros Without Bugs or Flowers III - Death

I submitted “Death” for the Macros Without Bugs or Flowers III challenge at DPChallenge. I came in 170th out of 196 with a score of 4.38 / 10. Granted, it’s not really a macro shot by the truest definition, but I did have a lot of fun reading the comments…



Sandbox - Bugfix

Quick bug fix (plus one new simple feature) for Sandbox.

Bug fixes:
- Automatically default to first item on startup
- Allow particles to be placed while paused

New features:
- Number keys select corresponding placeable particle type

Downloads:
- If you may not have .NET 3.5: click here
- To install SDL.NET: click here

Controls:
- Esc/Q – Quit the program
- B – Toggle border behavior
- P – Pause / Unpause
- Space – Advance the simulation one step (when paused)
- Left-click – Add a blob of the current kind of particle
- Right-click – Remove a blob of any kind of particle
- 1-9 – Select the corresponding kind of particle