Research

Once upon a time, I was on track to get a PhD in censorship/computer security. I was ABD (all but dissertation) when my advisor decided to leave and go into the private sector. When that happens… you either find a new advisor or you go with them. I decided to go with them, move to Silicon Valley, and join a startup. It was perhaps the best thing that could have happened to me. While I sometimes regret not having the extra letters after my name, I love the practicality of working in the ‘real world’. Not to mention the job prospects are better. :)

So for the most part, these posts are archival, but there are still a few gems in there.

All posts

2013-08-17: USENIX 2013 - Day 3
2013-08-16: USENIX 2013 - Day 2
2013-08-15: USENIX 2013 - Day 1
2013-08-14: FOCI 2013
2013-08-13: Usenix/FOCI 2013 - Five incidents, one theme: Twitter spam as a weapon to drown voices of protest
2013-02-09: ISMA 2013 AIMS-5 - DNS Based Censorship
2013-02-09: AIMS-5 - Day 3
2013-02-08: AIMS-5 - Day 2
2013-02-07: AIMS-5 - Workshop on Active Internet Measurements
2013-01-31: Scanning for DNS resolvers
2012-10-22: ISC/CAIDA Workshop
2012-08-06: Usenix/FOCI 2012 - Inferring Mechanics of Web Censorship Around the World
2011-08-17: Facebot: An Undiscoverable Botnet Based on Treasure Hunting Social Networks
2010-12-13: AudioVision: A Stereophonic Analogue to Visual Systems
2010-05-21: FLAIRS 2010 - Augmenting n-gram Based Authorship Attribution With Neural Networks
2010-03-17: AnnGram - nGrams vs Words
2010-02-10: AnnGram vs k-means
2010-02-03: AnnGram - Self-Organizing Map GUI
2010-01-28: AnnGram - New GUI
2010-01-21: AnnGram - Neural Network Progress
2010-01-15: AnnGram - Ideas for improvement
2010-01-13: AnnGram - Initial ANN Results
2010-01-12: AnnGram - NeuralNetwork Library
2010-01-05: AnnGram - Initial GUI
2010-01-01: AnnGram - Cosine Distance
2009-12-29: AnnGram - Framework
2009-12-21: AnnGram - Overview
2009-03-07: SIGCSE 2009 - RASQL Query Grammar Conversion Project
2009-02-26: AudioVision Update
2009-01-19: AudioVision Update
2009-01-05: AudioVision Update
2008-12-19: AudioVision Overview

AnnGram - Ideas for improvement

2010-01-15

After my meeting yesterday with my thesis advisers, I have a number of new ideas to try to improve the efficiency of the neural networks. The most promising of those are described below.

Sliding window

The first idea was to replace feeding the most common n-gram frequencies directly into the networks with a sliding window (almost a direct analogue to the n-grams themselves). The best way we could come up with to implement this would be to give the neural networks some sort of memory, which brought up recurrent networks.
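To make the sliding-window idea a little more concrete, here is a minimal sketch, assuming character-level one-hot inputs and a tiny Elman-style recurrent layer with random, untrained weights. The names `sliding_windows`, `one_hot`, and `ElmanLayer` are hypothetical and not part of AnnGram or any library. The only point is that the hidden state carried from one window to the next gives the network the "memory" that a plain feed-forward network lacks.

```python
import math
import random
from typing import List


def sliding_windows(text: str, size: int) -> List[str]:
    """Return every contiguous window of `size` characters, in order (like n-grams)."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def one_hot(window: str, alphabet: str) -> List[float]:
    """Concatenate one one-hot vector per character; characters outside `alphabet` become all zeros."""
    vec: List[float] = []
    for ch in window:
        vec.extend(1.0 if ch == a else 0.0 for a in alphabet)
    return vec


class ElmanLayer:
    """Hypothetical minimal recurrent layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    Weights are random and untrained; this only illustrates how state is carried.
    """

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.h = [0.0] * n_hidden  # the "memory" passed from one window to the next

    def step(self, x: List[float]) -> List[float]:
        new_h = []
        for j in range(len(self.h)):
            total = self.b[j]
            total += sum(w * xi for w, xi in zip(self.w_x[j], x))
            total += sum(w * hi for w, hi in zip(self.w_h[j], self.h))
            new_h.append(math.tanh(total))
        self.h = new_h
        return new_h


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    windows = sliding_windows("the cat sat", 3)
    layer = ElmanLayer(n_in=3 * len(alphabet), n_hidden=4)
    for w in windows:
        state = layer.step(one_hot(w, alphabet))
    print(windows[:3], [round(v, 3) for v in state])
```

Because the final hidden state depends on every window seen so far, the order of the windows matters, which is exactly what a frequency table of the most common n-grams throws away.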

AnnGram - Initial ANN Results

2010-01-13

Overview

For now, I’ve chosen to work with the C# Neural network library. It was the easiest to get up and running, so it seemed like a good place to start.

AnnGram - NeuralNetwork Library

2010-01-12

I’ve been looking for a good Neural Network library to use with the AnnGram project and so far I’ve come across a couple of possibilities:

C# Neural network library

The top link on Google was an aptly named C# Neural network library. Overall, it looks clean and easy to use and is licensed under the GPL, so it should work well for my needs. The framework has two types of training methods: genetic algorithms and backpropagation. In addition, it includes at least three activation functions: linear, sigmoid, and Heaviside. The main problem with this framework is the sparse documentation. The only documentation I’ve been able to find so far is a generated API reference and a few examples (using their included GUI framework).
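For reference, the three activation functions named above can be written as follows. This is a plain-Python sketch of the standard textbook definitions, not the C# library's own code, and the function names (including the single-neuron helper `neuron`) are my own. Backpropagation needs the derivative of the activation, which is one reason sigmoid is the usual choice there: the Heaviside step has a zero derivative everywhere except at the step itself, so no gradient flows through it, although it still works with genetic-algorithm training, which doesn't use gradients.

```python
import math
from typing import Callable, Sequence


def linear(x: float, slope: float = 1.0) -> float:
    """Linear activation: output is proportional to the input."""
    return slope * x


def sigmoid(x: float) -> float:
    """Logistic sigmoid, squashing any input into the range (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), the term backpropagation uses."""
    s = sigmoid(x)
    return s * (1.0 - s)


def heaviside(x: float) -> float:
    """Step function: 1 for x >= 0, otherwise 0 (the value at exactly 0 is a convention)."""
    return 1.0 if x >= 0 else 0.0


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float] = sigmoid) -> float:
    """A single neuron: weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


print(neuron([1.0, 0.5], [0.4, -0.2], 0.1, activation=heaviside))  # prints 1.0
```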

AnnGram - Initial GUI

2010-01-05

Overview

Basically, I got tired of modifying the command line every time I wanted to test new values. To that end, I spent a small bit of time coding up a GUI to make further experiments easier.

AnnGram - Cosine Distance

2010-01-01

Overview

The first algorithm that I’ve chosen to implement is a simple cosine distance between the n-gram vectors. It was the first method used in several of the papers that I’ve read, and it seems like a good benchmark.

Essentially, this method expresses the similarity of two n-gram profiles (either Documents or Authors) as the angle between their frequency vectors. Because n-gram counts are never negative, the angle ranges from 0 (identical n-gram distributions) to π/2 (no n-grams in common). Documents written by the same author should produce the smallest angles.
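As a sketch of the calculation, assuming overlapping character n-grams counted with Python's `collections.Counter` (the original AnnGram code was C#, and its exact tokenization isn't described here), the angle between two frequency vectors a and b is θ = arccos(a·b / (‖a‖ ‖b‖)). The helpers `ngram_counts`, `cosine_angle`, and `most_likely_author` are hypothetical names for illustration.

```python
import math
from collections import Counter
from typing import Dict


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_angle(a: Counter, b: Counter) -> float:
    """Angle in radians between two n-gram frequency vectors.

    0 means the profiles are proportional; pi/2 means they share no n-grams.
    """
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an empty n-gram profile")
    # Clamp in case floating-point error pushes the ratio slightly past 1.0.
    cos_sim = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_sim)


def most_likely_author(unknown: str, authors: Dict[str, str], n: int = 3) -> str:
    """Return the author whose sample text has the smallest angle to `unknown`."""
    target = ngram_counts(unknown, n)
    return min(authors, key=lambda name: cosine_angle(target, ngram_counts(authors[name], n)))
```

For example, `most_likely_author("the cat sat", {"A": "the cat sat on the mat", "B": "xyzzy plugh"})` picks `"A"`, since author B shares no trigrams with the unknown text and therefore sits at the maximum angle of π/2.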

AnnGram - Framework

2009-12-29

Document Framework

The first part of the framework I needed to write was the ability to load documents. To reduce the load on the processor, only a minimal amount of computation is done when a document is first loaded; further computation is deferred until it is needed.

To avoid duplicating work, the n-grams are memoized. The basic idea is that the first time a computation (in this case, the n-grams of a particular length) is requested, it is performed and the result is stored in memory. Any later request returns the cached result directly, greatly increasing speed at the cost of memory. Luckily, modern computers have more than enough memory for the task at hand.
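Here is a minimal Python sketch of the same lazy-loading and memoization idea, assuming character n-grams and a per-instance dictionary as the cache. The `Document` class, its `ngrams` method, and the `computations` counter are hypothetical illustrations, not the original C# API: loading only stores the raw text, and each n-gram length is computed on its first request and then served from the cache.

```python
from collections import Counter
from typing import Dict


class Document:
    """Loads cheaply; n-gram counts are computed lazily and memoized by length."""

    def __init__(self, text: str) -> None:
        self.text = text                      # minimal work at load time
        self._cache: Dict[int, Counter] = {}  # n -> n-gram counts
        self.computations = 0                 # illustration only: how often real work was done

    def ngrams(self, n: int) -> Counter:
        if n < 1:
            raise ValueError("n must be at least 1")
        if n not in self._cache:
            # First request for this length: compute the counts and remember them.
            self.computations += 1
            self._cache[n] = Counter(
                self.text[i:i + n] for i in range(len(self.text) - n + 1)
            )
        return self._cache[n]


doc = Document("to be or not to be")
first = doc.ngrams(2)
second = doc.ngrams(2)                    # served from the cache
print(first is second, doc.computations)  # prints: True 1
```

Decorating the method with `functools.lru_cache` would also work, but that cache keys on `self` and keeps every `Document` alive as long as the cache exists, so an explicit per-instance dictionary is the simpler choice here.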

AnnGram - Overview

2009-12-21

Basic Premise

For my senior thesis at Rose-Hulman Institute of Technology, I am attempting to combine the fields of Computational Linguistics and Artificial Intelligence in a new and useful manner. Specifically, I am planning on making use of Artificial Neural Networks to enhance the performance of n-gram based document classification. Over the next few months, I will be updating this category with background information and further progress.

First, I’ll start with some basic background information.

SIGCSE 2009 - RASQL Query Grammar Conversion Project

2009-03-07

Abstract: While a variety of language structures exist, there still exists room for an extensible grammatical structure. Through the implementation of such a grammar, translations between languages can be made relatively easy using preexisting tools such as XSLT.

AudioVision Update

2009-02-26

The quarter is ending, and so is my current work on AudioVision. I have successfully converted a basic two-camera view into stereophonic 3D audio using OpenCV (C++). I hope to continue this work some time in the future, so keep an eye out here for any future developments.

AudioVision Update

2009-01-19

Since deciding that I cannot use MATLAB because of the additional add-ons needed to use webcams, I have been choosing between C# and Python as the next language to try. I’ve settled on Python for the time being, using VideoCapture to connect to the webcams and NumPy to process the data. It turns out that Python + VideoCapture + NumPy is actually rather similar in functionality and syntax to MATLAB with its Image Processing Toolbox.

JP's Blog
