AnnGram - Cosine Distance

Overview

The first algorithm that I’ve chosen to implement is a simple cosine distance between the n-gram vectors.  This was the first method used in several of the papers that I’ve read, and it seems like a good benchmark.

Essentially, this method gives the similarity of two n-gram documents (either Documents or Authors) as an angle ranging from 0 (identical documents) to \pi/2 (completely different documents).  Documents written by the same author should have the lowest values.
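
As a rough illustration of the idea (a minimal sketch, not the code from this project), the angle can be computed from sparse n-gram count vectors. The helper names `ngram_counts` and `cosine_angle` are hypothetical, and the sketch assumes character n-grams, which the text above does not specify.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count every overlapping character n-gram in text (assumed tokenization)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_angle(a, b):
    """Angle in radians between two sparse count vectors (dict-like)."""
    # Only n-grams present in both vectors contribute to the dot product.
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty document shares nothing with any other.
        return math.pi / 2
    cosine = dot / (norm_a * norm_b)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))
```

Because counts are never negative, the cosine stays between 0 and 1, which is why the angle stays between 0 and \pi/2.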



AnnGram - Framework

Document Framework

The first portion of the framework that I needed to code was the ability to load documents.  To reduce the load on the processor when first loading a document, only a minimal amount of computation is done up front.  Further computation is deferred until it is needed.

To avoid duplicating work, the n-grams are stored using memoization.  The basic idea is that when a result (in this case, the n-grams of a particular length) is first requested, the calculation is done and the result is stored in memory.  During any future calls, the cached result is returned directly, greatly increasing speed at the cost of memory.  Luckily, modern computers have more than sufficient memory for the task at hand.
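
A minimal sketch of that kind of memoization might look like the following. The `Document` class, its `ngrams` method, and the `computations` counter are all hypothetical names used for illustration, and character n-grams are assumed.

```python
from collections import Counter


class Document:
    """Holds raw text and computes n-gram counts lazily, once per length."""

    def __init__(self, text):
        # Loading does almost no work: just keep the text.
        self.text = text
        self._ngram_cache = {}
        self.computations = 0  # demonstration only: counts real computations

    def ngrams(self, n):
        """Return the n-gram counts for length n, computing them on first use."""
        if n not in self._ngram_cache:
            self.computations += 1
            self._ngram_cache[n] = Counter(
                self.text[i:i + n] for i in range(len(self.text) - n + 1)
            )
        return self._ngram_cache[n]
```

Calling `doc.ngrams(2)` twice does the counting only once; the second call returns the same cached object.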



AnnGram - Overview

Basic Premise

For my senior thesis at Rose-Hulman Institute of Technology, I am attempting to combine the fields of Computational Linguistics and Artificial Intelligence in a new and useful manner.  Specifically, I am planning on making use of Artificial Neural Networks to enhance the performance of n-gram based document classification.  Over the next few months, I will be updating this category with background information and further progress.

First, I’ll start with some basic background information.



Sandbox - Bugfix

Quick bug fix (plus one new simple feature) for Sandbox.

Bug fixes:

  • Automatically default to first item on startup
  • Allow particles to be placed while paused

New feature:

  • Number keys select corresponding placeable particle type

Downloads:

Controls:

  • Esc/Q – Quit the program
  • B – Toggle border behavior
  • P – Pause / Unpause
  • Space – Advance the simulation one step (when paused)
  • Left-click – Add a blob of the current kind of particle
  • Right-click – Remove a blob of any kind of particle
  • 1-9 – Select the corresponding kind of particle




Sandbox - More user friendly

One more update on my quick schedule, then it’s back to school, so I’ll probably slow down for a while.  In any case, I’ve added the ability to change between different elements in the definitions files.

Two new changes to the definitions are the addition of placeability and colorful flags.  If placeability is set, the element will show up on the GUI to be placed.  If colorful is set, the colors will be varied slightly (see the screenshots).

Next up, seeing if I can come up with more elements to play with…



Sandbox - Interactivity

I know I’ve already updated this project twice within the past 24 hours, but third time’s a charm.  This time, it’s interactive!

I’m using the same rules as last time (with the tweaks I mentioned).  The main difference is that you can left-click anywhere on the screen to add a blob of fire or right-click to add a new blob of plant.  It’s still not really a game per se, but it’s got the makings of one!



Sandbox - Reactions

So I stayed up entirely too late last night / this morning and decided to go ahead and add reactions to Sandbox.  Turns out, it was far easier than anything that I’ve implemented thus far on this project.  I spent some of the day (when I wasn’t at the family Thanksgiving celebration) tweaking a few things to make it look a little better.

Basically, reactions have four parts: a core, reactants, a chance, and (possibly) a product.  The core is the particle that will be reacting.  The reactants (each given with a concentration) are the neighboring particles.  The chance adds a bit of randomness to reactions and allows particles to fade (see the fire below).  The product (if present) is the result of the reaction.
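
To make the four parts concrete, here is a small sketch in Python (Sandbox itself is written in C#, and this is not its actual code). The `Reaction` class and `react` function are hypothetical, and the sketch assumes that a reactant's "concentration" means the minimum fraction of neighboring cells that must hold that particle.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reaction:
    core: str                      # particle kind that reacts
    reactants: Dict[str, float]    # neighbor kind -> minimum fraction (assumed meaning)
    chance: float                  # probability the reaction fires when it could
    product: Optional[str] = None  # resulting kind; None means the core disappears


def react(reaction, core, neighbors, rng=random):
    """Return the kind left in the cell: the core, the product, or None."""
    if core != reaction.core:
        return core
    for kind, minimum in reaction.reactants.items():
        fraction = neighbors.count(kind) / len(neighbors) if neighbors else 0.0
        if fraction < minimum:
            return core  # not enough of this reactant nearby
    if rng.random() >= reaction.chance:
        return core  # the random roll failed this step
    return reaction.product
```

Under these assumptions, a fading fire would be a reaction with no reactants, a small chance, and no product, for example `Reaction("fire", {}, 0.1)`.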



Sandbox - And so it begins

For the past few years, I’ve been fascinated by falling sand / particle simulation type games (like this one).  Enough so that I’ve set out to make one a fair number of times.  Each time, I’ve advanced my own techniques by a little bit, finding new and better ways to make digital sand.

This time around, I’m going to try to use C# with SDL.NET for all of my graphical work and a simple grid for all of the particle data.  Rather than looping over the grid, I will be using quadtrees to only update the regions that actually need to be updated.  So far the results are promising!
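
The quadtree idea can be sketched roughly as follows. This is a Python illustration rather than the project's C# code, and `QuadNode`, `mark_dirty`, `dirty_leaves`, and `leaf_size` are hypothetical names. It assumes a square grid whose side is a power-of-two multiple of the leaf size.

```python
class QuadNode:
    """Square region of the grid that remembers whether anything inside changed."""

    def __init__(self, x, y, size, leaf_size):
        self.x, self.y, self.size = x, y, size
        self.dirty = False
        self.children = []
        if size > leaf_size:
            half = size // 2
            # Split into four equal quadrants until leaf_size is reached.
            self.children = [
                QuadNode(x + dx, y + dy, half, leaf_size)
                for dy in (0, half)
                for dx in (0, half)
            ]

    def contains(self, px, py):
        return self.x <= px < self.x + self.size and self.y <= py < self.y + self.size

    def mark_dirty(self, px, py):
        """Flag every node on the path from this node down to the leaf holding (px, py)."""
        self.dirty = True
        for child in self.children:
            if child.contains(px, py):
                child.mark_dirty(px, py)
                return

    def dirty_leaves(self):
        """Yield (x, y, size) for each leaf region that needs updating."""
        if not self.dirty:
            return  # nothing changed in this whole region, so skip it
        if not self.children:
            yield (self.x, self.y, self.size)
            return
        for child in self.children:
            yield from child.dirty_leaves()

    def clear(self):
        """Reset the dirty flags after an update pass."""
        if self.dirty:
            self.dirty = False
            for child in self.children:
                child.clear()
```

An update loop would then visit only the cells inside the regions returned by `dirty_leaves()` instead of scanning the whole grid.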



AudioVision Update

Since deciding that I cannot use MATLAB because of the additional add-ons necessary to use webcams, I have been deciding between C# and Python as the next language to try. I’ve settled on Python for the time being, using VideoCapture to connect to the webcams and NumPy to process the data. It turns out that Python + VideoCapture + NumPy is actually rather similar in functionality and syntax to MATLAB with its image processing library.
