Research on jverkamp.com

USENIX 2013 - Day 3

Sat, 17 Aug 2013 00:00:14 +0000

Today’s the third and final day. Since I had to fly out in the afternoon, I didn’t get a chance to go to as many talks today, but so it goes. There was a really interesting talk that I’m sad to have missed (Dismantling Megamos Crypto: Wirelessly Lockpicking a Vehicle Immobilizer), but it’ll be nice to be home. Here are the talks that I did make it to and found particularly interesting:

USENIX 2013 - Day 2

Fri, 16 Aug 2013 00:00:33 +0000

Day 2/3. There’s really not much more to say, so how about getting right to the interesting talks:

USENIX 2013 - Day 1

Thu, 15 Aug 2013 00:00:30 +0000

Perhaps unsurprisingly, there were fewer papers today that I was particularly interested in, given that FOCI is directly related to my area of research. Still, computer security is a very useful field and one that I’m keen to learn more about. I only went to two of the sessions today (it’s always unfortunate when they run two interesting sessions at the same time) and here are some of the talks I found particularly interesting:

FOCI 2013

Wed, 14 Aug 2013 00:00:30 +0000

Today was the Five incidents, one theme. Here are a few short summaries of the other papers that are particularly related to my own interests:

Usenix/FOCI 2013 - Five incidents, one theme: Twitter spam as a weapon to drown voices of protest

Tue, 13 Aug 2013 14:00:03 +0000

Another year, another Usenix Security Symposium. Like last year, I’ll be presenting a paper at FOCI (Free and Open Communications on the Internet) entitled: Five incidents, one theme: Twitter spam as a weapon to drown voices of protest: Social networking sites, such as Twitter and Facebook, have become an impressive force in the modern world with user bases larger than many individual countries. With such influence, they have become important in the process of worldwide politics.

ISMA 2013 AIMS-5 - DNS Based Censorship

Sat, 09 Feb 2013 15:00:06 +0000

I gave a presentation about research that I’m just starting on DNS-based censorship around the world. In particular, preliminary findings in China have confirmed that the Great Firewall is responding via packet injection to many queries for either Facebook or Twitter (among others). Interestingly, the pool of IPs that they return is consistent, yet none of the IPs seem to resolve to anything interesting. In addition, there is fallout in South Korea, where some percentage of packets go through China and thus see the same behavior.

AIMS-5 - Day 3

Sat, 09 Feb 2013 14:00:02 +0000

Yesterday was the third and final day of AIMS-5. With the main topic being Detection of Censorship, Filtering, and Outages, many of these talks were much more in line with what I know and what I’m working on. I gave my presentation as well; you can see it (along with a link to my slides) down below.

AIMS-5 - Day 2

Fri, 08 Feb 2013 14:00:06 +0000

Today’s agenda had discussions on Mobile Measurements and IPv6 Annotations, neither of which is an area that I find myself particularly interested in. Still, I did learn a few things.

AIMS-5 - Workshop on Active Internet Measurements

Thu, 07 Feb 2013 14:00:41 +0000

Yesterday was the first of three days for the fifth annual ISC/CAIDA Workshop I went to in Baltimore back in October at least, but even the ones that weren’t have still been interesting.

I’ll be presenting on Friday and I’ll share my slides when I get that far (they aren’t actually finished yet). I’ll be talking about new work that I’m just getting off the ground focusing specifically on DNS-based censorship. There is a lot of interesting ground to cover there and this should be only the first in a series of updates about that work (I hope).

Scanning for DNS resolvers

Thu, 31 Jan 2013 14:00:00 +0000

For a research project I’m working on, it has become necessary to scan potentially large IPv4 prefixes in order to find any DNS resolvers that I can and classify them as either open (accepting queries from anyone) or closed.

Disclaimer: This is a form of port scanning and thus has associated ethical and legal considerations. Use it at your own risk.

This project is available on GitHub: jpverkamp/dnsscan
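
To make the open/closed distinction concrete, here is a minimal sketch in Python. It is not the code from the dnsscan project; the function names (`build_query`, `classify_response`, `probe`) are made up for illustration, and the rule that "open" means a successful recursive answer (RA flag set, RCODE 0, at least one answer record) is an assumption rather than the project's actual definition.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def classify_response(data, query_id=0x1234):
    """Classify a raw reply as 'open', 'closed', or 'invalid'.

    Assumption: 'open' means the server answered a recursive query
    successfully (RA set, RCODE 0, at least one answer record).
    """
    if len(data) < 12:
        return "invalid"
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if rid != query_id or not (flags & 0x8000):  # wrong ID or not a response
        return "invalid"
    recursion_available = bool(flags & 0x0080)
    rcode = flags & 0x000F
    if recursion_available and rcode == 0 and ancount > 0:
        return "open"
    return "closed"


def probe(ip, name="example.com", timeout=2.0):
    """Send one UDP query to ip:53 and classify whatever comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (ip, 53))
            data, _addr = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return "no-response"
    return classify_response(data)
```

Scanning a prefix would then amount to calling `probe` on each address in it (for example via `ipaddress.ip_network(...).hosts()`), ideally with rate limiting, and subject to the same disclaimer as above.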

ISC/CAIDA Workshop

Mon, 22 Oct 2012 23:00:58 +0000

I’ve spent the day in Baltimore at the ISC/CAIDA Data Collaboration Workshop learning about and presenting about all things DNS related. It’s not really the sort of thing that my PhD work is focusing on but it’s still interesting.

Usenix/FOCI 2012 - Inferring Mechanics of Web Censorship Around the World

Mon, 06 Aug 2012 13:00:28 +0000

For the next week or so, I’ll be in Seattle attending the Usenix Security Symposium and specifically the FOCI workshop. Why? Because I’m presenting a paper at FOCI. Entitled Inferring Mechanics of Web Censorship Around the World, here’s the abstract: While mechanics of Web censorship in China are well studied, those of other countries are less understood. Through a combination of personal contacts and Planet-Lab nodes, we conduct experiments to explore the mechanics of Web censorship in 11 countries around the world, including China.

Facebot: An Undiscoverable Botnet Based on Treasure Hunting Social Networks

Wed, 17 Aug 2011 14:00:14 +0000

Co-authors: Parag Malshe, Minaxi Gupta, and Chris Dunn Abstract: Popular botnets earn millions of dollars for their operators by enabling many types of cyberfraud activities, including spam and phishing. Current and past botnet architectures revolve around the idea of bots communicating with their masters to carry out their functionality. Given that many take-down efforts leverage this feature, future botnet architectures may evolve to overcome this limitation. In order to enable pro-active defenses against such botnets, in this paper we design a botnet whose bots never explicitly communicate with their master.

AudioVision: A Stereophonic Analogue to Visual Systems

Mon, 13 Dec 2010 14:00:56 +0000

Abstract: AudioVision is designed to take a visual representation of the world, in the form of one or more video feeds, and convert it into a related stereophonic audio representation. With such a representation, it should be possible for someone who has minimal or no use of their visual system to avoid obstacles using their sense of hearing rather than vision. To this end, several different vision algorithms, including single and multiple image disparity, disparity from motion, and optical flow were investigated.

FLAIRS 2010 - Augmenting n-gram Based Authorship Attribution With Neural Networks

Fri, 21 May 2010 14:00:23 +0000

Co-authors: Michael Wollowski and Maki Hirotani Abstract: While using statistical methods to determine authorship attribution is not a new idea and neural networks have been applied to a number of statistical problems, the two have not often been used together. We show that the use of artificial neural networks, specifically self-organizing maps, combined with n-grams provides a success rate on the order of previous work with purely statistical methods. Using a collection of documents including the works of Shakespeare, William Blake, and the King James Version of the Bible, we were able to demonstrate classification of documents into individual groups.

AnnGram - nGrams vs Words

Wed, 17 Mar 2010 05:05:37 +0000

Overview

For another comparison, I’ve been looking for a way to replace the nGrams with another way of turning a document into a vector. Using word frequency instead of nGrams, I’ve run a number of tests to see how the accuracy and speed of the algorithm compare for the two.
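
As a rough illustration of the two kinds of document vectors being compared, here is a small Python sketch. The function names, the tokenization rule, and the default n-gram length are all assumptions made for the example; the AnnGram code itself may well differ.

```python
import re
from collections import Counter


def word_vector(text):
    """Map each lowercased word to the number of times it occurs."""
    # Assumed tokenization: runs of letters and apostrophes count as words.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def ngram_vector(text, n=3):
    """Map each character n-gram to the number of times it occurs."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


sample = "The cat and the hat"
print(word_vector(sample).most_common(1))
print(ngram_vector(sample.lower()).most_common(1))
```

Either `Counter` can be treated as a sparse vector, so the rest of a pipeline only needs to know that it is getting a mapping from features to frequencies.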

nGrams

I still intend to look into why the Tragedy of Macbeth does not stay with the rest of Shakespeare’s plays. I still believe that it is because portions of it were possibly written by another author.

AnnGram vs k-means

Wed, 10 Feb 2010 04:05:32 +0000

Overview

As a set of benchmarks to test whether or not the new AnnGram algorithm is actually working correctly, I’ve been trying to come up with different yet similar methods to compare it to. Primarily, there are two possibilities: (1) replace the nGram vectors with another form, or (2) process the nGrams using something other than Self-Organizing Maps. I’m still looking through the related literature to decide if there is some way to use something other than the nGrams to feed into the SOM; however, I haven’t been having any luck.

AnnGram - Self-Organizing Map GUI

Wed, 03 Feb 2010 05:05:20 +0000

They say a picture is worth a thousand words:

One Thousand Words

AnnGram - New GUI

Thu, 28 Jan 2010 05:05:36 +0000

The old GUI framework just wasn’t working out (so far as adding new features went). So, long story short, I’ve switched GUI layout.

AnnGram - Neural Network Progress

Thu, 21 Jan 2010 05:05:19 +0000

As expected, I’ve decided to change libraries. The poor results with the original tests may have been a direct result of a misunderstanding with the code base. I think that the layers were not being hooked up correctly, resulting in low/random values.

AnnGram - Ideas for improvement

Fri, 15 Jan 2010 05:05:30 +0000

After my meeting yesterday with my thesis advisers, I have a number of new ideas to try to improve the efficiency of the neural networks. The most promising of those are described below.

Sliding window

The first idea was to replace the idea of applying the most common frequencies directly with a sliding window (almost a direct analogue to the nGrams themselves). The best way that we could come up with to implement this would be to give the neural networks some sort of memory, which brought up recurrent networks (see below).

AnnGram - Initial ANN Results

Wed, 13 Jan 2010 05:05:14 +0000

Overview

For now, I’ve chosen to work with the C# Neural network library. It was the easiest to get off the ground and running, so it seemed like a good place to start.

AnnGram - NeuralNetwork Library

Tue, 12 Jan 2010 05:05:54 +0000

I’ve been looking for a good Neural Network library to use with the AnnGram project and so far I’ve come across a couple of possibilities:

C# Neural network library

The top link on Google was an aptly named C# Neural network library. Overall, it looks clean and easy to use and is licensed under the GPL, so should work well for my needs. The framework has two types of training methods: genetic algorithms and backward propagation. In addition, there are at least three different activation functions included: linear, sigmoid, and heaviside functions. The main problem with this framework is the sparse documentation. The only documentation that I’ve been able to find so far is a generated API reference and a few examples (using their included GUI framework).
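
For reference, the three activation functions named above look roughly like this. This is a plain Python sketch of the standard mathematical definitions, not the library's API, and the choice of returning 1 at exactly x = 0 for the heaviside step is an assumption (conventions vary).

```python
import math


def linear(x):
    """Identity activation: the output equals the weighted input."""
    return x


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def heaviside(x):
    """Step function: 0 below zero, 1 at or above zero (assumed convention)."""
    return 1.0 if x >= 0 else 0.0


for f in (linear, sigmoid, heaviside):
    print(f.__name__, [round(f(x), 3) for x in (-2.0, 0.0, 2.0)])
```

The sigmoid is the only one of the three with a useful non-zero derivative everywhere, which is why it is the usual choice when training with backward propagation.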

AnnGram - Initial GUI

Tue, 05 Jan 2010 05:05:55 +0000

Overview

Basically, I got tired of modifying the command line every time I wanted to test new values. To that end, I spent a small bit of time coding up a GUI to make further experiments easier.

AnnGram - Cosine Distance

Fri, 01 Jan 2010 05:05:00 +0000

Overview

The first algorithm that I’ve chosen to implement is a simple cosine distance between the n-gram vectors. This was the first method used in several of the papers that I’ve read and it seems like a good benchmark.

Essentially, this method gives the similarity of two n-gram documents (either Documents or Authors) as an angle ranging from 0 (identical documents) to \pi/2 (completely different documents). Documents written by the same author should have the lowest values.
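
Here is a minimal sketch of that calculation in Python, assuming each document is represented as a mapping from character n-grams to counts. The helper names and the default n-gram length of 3 are made up for the example. Because the counts are never negative, the cosine lies between 0 and 1, which is what keeps the angle between 0 and \pi/2.

```python
import math
from collections import Counter


def ngram_counts(text, n=3):
    """Count the character n-grams in a piece of text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_angle(a, b):
    """Angle in radians between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty document is treated as completely different.
        return math.pi / 2
    # Clamp to guard against rounding pushing the cosine slightly above 1.
    return math.acos(min(1.0, dot / (norm_a * norm_b)))


first = ngram_counts("to be or not to be")
second = ngram_counts("to be or not to be, that is the question")
print(cosine_angle(first, first))   # identical documents: at or very near 0
print(cosine_angle(first, second))  # somewhere between 0 and pi/2
```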

AnnGram - Framework

Tue, 29 Dec 2009 05:05:28 +0000

Document Framework

The first portion of the framework that it was necessary to code was the ability to load documents. To reduce the load on the processor when first loading the document, only a minimal amount of computation is done. Further computation is pushed off until necessary.

To avoid duplicating work, the n-grams are stored using memoization. The basic idea is that when a function (in this case, a particular length of n-gram) is first requested, the calculation is done and the result is stored in memory. During any future calls, the cached result is directly returned, greatly increasing speed at the cost of memory. Luckily, modern computers have more than sufficient memory for the task at hand.
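
The memoization described above can be sketched in a few lines of Python. The `Document` class, its `ngrams` method, and the `computations` counter are hypothetical names used only for this illustration; the actual framework may be structured quite differently.

```python
from collections import Counter


class Document:
    """Holds raw text and computes n-gram counts lazily, once per length."""

    def __init__(self, text):
        self.text = text           # loading does no n-gram work up front
        self._ngram_cache = {}     # n -> Counter of n-grams
        self.computations = 0      # only here to show the cache working

    def ngrams(self, n):
        """Return n-gram counts, computing them on the first request only."""
        if n not in self._ngram_cache:
            self.computations += 1
            self._ngram_cache[n] = Counter(
                self.text[i:i + n] for i in range(len(self.text) - n + 1)
            )
        return self._ngram_cache[n]


doc = Document("banana")
doc.ngrams(2)
doc.ngrams(2)  # served from the cache
print(doc.computations)
```

The trade-off is the one mentioned above: every n-gram length that is ever requested stays in memory for the lifetime of the document.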

AnnGram - Overview

Mon, 21 Dec 2009 05:05:34 +0000

Basic Premise

For my senior thesis at Rose-Hulman Institute of Technology, I am attempting to combine the fields of Computational Linguistics and Artificial Intelligence in a new and useful manner. Specifically, I am planning on making use of Artificial Neural Networks to enhance the performance of n-gram based document classification. Over the next few months, I will be updating this category with background information and further progress.

First, I’ll start with some basic background information.

SIGCSE 2009 - RASQL Query Grammar Conversion Project

Sat, 07 Mar 2009 14:00:52 +0000

Abstract: While variety of language structures, there still exists room for an extensible grammatical structure based through the implementation of such a grammar, translations between languages can be made relatively easy using preexisting tools such as XSLT.

AudioVision Update

Thu, 26 Feb 2009 08:05:58 +0000

The quarter is ending and so is my current work on AudioVision. I have successfully managed to convert a basic two camera view into stereophonic 3d audio, using OpenCV (C++). I hope to continue this work some time in the future, so keep an eye out here for any future developments.

AudioVision Update

Mon, 19 Jan 2009 05:05:26 +0000

Since deciding that I cannot use MATLAB because of the add-ons necessary to use webcams, I have been deciding between C# and Python as the next language to try. I’ve settled on Python for the time being, using VideoCapture to connect to the webcams and NumPy to process the data. It turns out that Python + VideoCapture + NumPy is actually rather similar in functionality and syntax to MATLAB with its image processing library.

AudioVision Update

Mon, 05 Jan 2009 08:05:51 +0000

The original plan to use Make3D for the visual depth determination has mostly fallen through, partially because it has several dependencies that I cannot get to build correctly and partially because it is written in a combination of C and MATLAB. I have nothing against either of these languages; however, I do not have the addons necessary for MATLAB to connect to a webcam. As such, I’ve decided to switch from a monocular vision algorithm to a more traditional stereo vision algorithm.

AudioVision Overview

Fri, 19 Dec 2008 08:05:02 +0000

I am taking an independent study course this winter in Image Recognition / Computer Vision. The primary goal of my independent study is to look into determining depth information from video feed(s) in real time and then representing that depth information using a 3D audio map (headphones).