AnnGram - nGrams vs Words

Overview

For another comparison, I’ve been looking for a way to replace the nGrams with a different method of turning a document into a vector.  Using word frequency instead of nGrams, I’ve run a number of tests to see how the accuracy and speed of the algorithm compare between the two.
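To make the comparison concrete, here is a rough sketch of the two vectorization schemes side by side (the class and method names are my own illustration, not the actual AnnGram code):

    using System;
    using System.Collections.Generic;

    static class Vectorizers
    {
        // Character nGrams: count every run of n consecutive characters.
        public static Dictionary<string, int> NGramCounts(string text, int n)
        {
            var counts = new Dictionary<string, int>();
            for (int i = 0; i + n <= text.Length; i++)
            {
                string gram = text.Substring(i, n);
                counts[gram] = counts.TryGetValue(gram, out int c) ? c + 1 : 1;
            }
            return counts;
        }

        // Word frequencies: split on whitespace and count each distinct word.
        public static Dictionary<string, int> WordCounts(string text)
        {
            var counts = new Dictionary<string, int>();
            var separators = new[] { ' ', '\t', '\r', '\n' };
            foreach (string word in text.ToLowerInvariant()
                .Split(separators, StringSplitOptions.RemoveEmptyEntries))
            {
                counts[word] = counts.TryGetValue(word, out int c) ? c + 1 : 1;
            }
            return counts;
        }
    }

The nGram counts pick up sub-word patterns (letter pairings, affixes), while the word counts only see whole tokens, which is exactly the trade-off the accuracy and speed tests are meant to measure.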

nGrams

I still intend to look into why The Tragedy of Macbeth does not cluster with the rest of Shakespeare’s plays.  My suspicion remains that portions of it were possibly written by another author.

read more...


AnnGram vs k-means

Overview

As a set of benchmarks to test whether or not the new AnnGram algorithm is actually working correctly, I’ve been trying to come up with different yet similar methods to compare it to.  Primarily, there are two possibilities:

1. Replace the nGram vectors with another form
2. Process the nGrams using something other than Self-Organizing Maps

I’m still looking through the related literature to decide if there is some way to use something other than the nGrams to feed into the SOM; however, I haven’t been having any luck.
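For the second possibility, the obvious baseline (hence the title) is to cluster the nGram vectors with plain k-means instead of a Self-Organizing Map.  A bare-bones sketch of that step, assuming the nGram frequencies have already been flattened into dense double arrays (the code below is my own illustration, not the project’s):

    using System;
    using System.Linq;

    static class KMeans
    {
        // Cluster frequency vectors into k groups; returns each vector's cluster index.
        public static int[] Cluster(double[][] vectors, int k, int iterations = 100)
        {
            var rng = new Random(0);
            // Initialize centroids from k randomly chosen input vectors.
            double[][] centroids = vectors.OrderBy(_ => rng.Next()).Take(k)
                                          .Select(v => (double[])v.Clone()).ToArray();
            int[] assignment = new int[vectors.Length];

            for (int iter = 0; iter < iterations; iter++)
            {
                // Assignment step: attach each vector to its nearest centroid.
                for (int i = 0; i < vectors.Length; i++)
                    assignment[i] = Enumerable.Range(0, k)
                        .OrderBy(c => Distance(vectors[i], centroids[c])).First();

                // Update step: move each centroid to the mean of its members.
                for (int c = 0; c < k; c++)
                {
                    var members = vectors.Where((_, i) => assignment[i] == c).ToArray();
                    if (members.Length == 0) continue; // leave an empty cluster in place
                    for (int d = 0; d < centroids[c].Length; d++)
                        centroids[c][d] = members.Average(m => m[d]);
                }
            }
            return assignment;
        }

        static double Distance(double[] a, double[] b) =>
            Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());
    }

Unlike a SOM, k-means has no notion of neighborhood between clusters, so it only tests whether the nGram vectors separate authors at all, not whether the map topology adds anything.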

read more...


AnnGram - New GUI

The old GUI framework just wasn’t working out (so far as adding new features went).  So, long story short, I’ve switched GUI frameworks.

read more...


AnnGram - Ideas for improvement

After my meeting yesterday with my thesis advisers, I have a number of new ideas for improving the efficiency of the neural networks.  The most promising of these are described below.

Sliding window

The first idea was to replace the direct application of the most common frequencies with a sliding window (almost a direct analogue to the nGrams themselves).  The best way that we could come up with to implement this would be to give the neural networks some sort of memory, which brings up recurrent networks (see below).
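To make the analogy concrete, the sliding window is essentially the same enumeration that produces the nGrams in the first place, just fed to the network one chunk at a time.  A minimal sketch (the window width and how each window gets encoded for the network are still open questions, so only the enumeration is shown):

    using System.Collections.Generic;

    static class SlidingWindow
    {
        // Yield overlapping windows of the text, stepping one character at a
        // time; structurally this is the same walk that produces character nGrams.
        public static IEnumerable<string> Windows(string text, int width)
        {
            for (int i = 0; i + width <= text.Length; i++)
                yield return text.Substring(i, width);
        }
    }

Each window would then be encoded and presented to the network in sequence, with the network’s memory carrying context from one window to the next.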

read more...


AnnGram - NeuralNetwork Library

I’ve been looking for a good Neural Network library to use with the AnnGram project and so far I’ve come across a couple of possibilities:

C# Neural network library

The top link on Google was an aptly named C# Neural network library.  Overall, it looks clean and easy to use, and it is licensed under the GPL, so it should work well for my needs.  The framework has two types of training methods: genetic algorithms and backpropagation.  In addition, there are at least three different activation functions included: linear, sigmoid, and Heaviside functions.  The main problem with this framework is the sparse documentation.  The only documentation that I’ve been able to find so far is a generated API reference and a few examples (using their included GUI framework).
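For reference, here is my own summary of those three activation functions as standalone code (these are the standard textbook definitions, not code taken from the library itself):

    using System;

    static class Activations
    {
        // Linear: passes the weighted input through unchanged.
        public static double Linear(double x) => x;

        // Sigmoid: squashes any input into the open interval (0, 1).
        public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

        // Heaviside step: outputs 1 once the input crosses zero, otherwise 0.
        public static double Heaviside(double x) => x >= 0 ? 1.0 : 0.0;
    }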

read more...


AnnGram - Initial GUI

Overview

Basically, I got tired of modifying the command line every time I wanted to test new values.  To that end, I spent a small bit of time coding up a GUI to make further experiments easier.

read more...


AnnGram - Cosine Distance

Overview

The first algorithm that I’ve chosen to implement is a simple cosine distance between the n-gram vectors.  This was the first method used in several of the papers that I’ve read, and it seems like a good benchmark.

Essentially, this method gives the similarity of two n-gram documents (either Documents or Authors) as an angle ranging from 0 (identical documents) to π/2 (completely different documents).  Documents written by the same author should have the lowest values.
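In code, this boils down to a normalized dot product followed by an arc-cosine to recover the angle.  A minimal sketch, assuming the n-gram frequencies have already been flattened into plain double arrays (the class name is mine, not the project’s):

    using System;
    using System.Linq;

    static class CosineDistance
    {
        // Angle between two frequency vectors: 0 when they point the same
        // way, π/2 when they share no n-grams at all. Since frequencies are
        // non-negative, the angle never exceeds π/2.
        public static double Angle(double[] a, double[] b)
        {
            double dot = a.Zip(b, (x, y) => x * y).Sum();
            double normA = Math.Sqrt(a.Sum(x => x * x));
            double normB = Math.Sqrt(b.Sum(x => x * x));
            double cos = dot / (normA * normB);
            // Clamp to guard against floating-point drift outside [-1, 1].
            return Math.Acos(Math.Max(-1.0, Math.Min(1.0, cos)));
        }
    }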

read more...