Research

Once upon a time, I was on track to get a PhD in censorship/computer security. I was ABD (all but dissertation) when my advisor decided to leave and go into the private sector. When that happens… you either find a new advisor or you go with them. I decided to go with them, move to Silicon Valley, and join a startup. It was perhaps the best thing that could have happened to me. While I sometimes regret not having the extra letters after my name, I love the practicality of working in the ‘real world’. Not to mention the job prospects are better. :)

So for the most part, these posts are archival, but there are still a few gems in there.

All posts

2013-08-17: USENIX 2013 - Day 3
2013-08-16: USENIX 2013 - Day 2
2013-08-15: USENIX 2013 - Day 1
2013-08-14: FOCI 2013
2013-08-13: Usenix/FOCI 2013 - Five incidents, one theme: Twitter spam as a weapon to drown voices of protest
2013-02-09: ISMA 2013 AIMS-5 - DNS Based Censorship
2013-02-09: AIMS-5 - Day 3
2013-02-08: AIMS-5 - Day 2
2013-02-07: AIMS-5 - Workshop on Active Internet Measurements
2013-01-31: Scanning for DNS resolvers
2012-10-22: ISC/CAIDA Workshop
2012-08-06: Usenix/FOCI 2012 - Inferring Mechanics of Web Censorship Around the World
2011-08-17: Facebot: An Undiscoverable Botnet Based on Treasure Hunting Social Networks
2010-12-13: AudioVision: A Stereophonic Analogue to Visual Systems
2010-05-21: FLAIRS 2010 - Augmenting n-gram Based Authorship Attribution With Neural Networks
2010-03-17: AnnGram - nGrams vs Words
2010-02-10: AnnGram vs k-means
2010-02-03: AnnGram - Self-Organizing Map GUI
2010-01-28: AnnGram - New GUI
2010-01-21: AnnGram - Neural Network Progress
2010-01-15: AnnGram - Ideas for improvement
2010-01-13: AnnGram - Initial ANN Results
2010-01-12: AnnGram - NeuralNetwork Library
2010-01-05: AnnGram - Initial GUI
2010-01-01: AnnGram - Cosine Distance
2009-12-29: AnnGram - Framework
2009-12-21: AnnGram - Overview
2009-03-07: SIGCSE 2009 - RASQL Query Grammar Conversion Project
2009-02-26: AudioVision Update
2009-01-19: AudioVision Update
2009-01-05: AudioVision Update
2008-12-19: AudioVision Overview

ISC/CAIDA Workshop

2012-10-22

I’ve spent the day in Baltimore at the ISC/CAIDA Data Collaboration Workshop, learning and presenting about all things DNS-related. It’s not really the sort of thing that my PhD work is focusing on, but it’s still interesting.

Usenix/FOCI 2012 - Inferring Mechanics of Web Censorship Around the World

2012-08-06

For the next week or so, I’ll be in Seattle attending the Usenix Security Symposium and specifically the FOCI workshop. Why? Because I’m presenting a paper at FOCI.

Entitled Inferring Mechanics of Web Censorship Around the World, here’s the abstract:

While mechanics of Web censorship in China are well studied, those of other countries are less understood. Through a combination of personal contacts and Planet-Lab nodes, we conduct experiments to explore the mechanics of Web censorship in 11 countries around the world, including China. Our work provides insights into the diversity of modus operandi of censors around the world and can guide future work on censorship evasion.

Facebot: An Undiscoverable Botnet Based on Treasure Hunting Social Networks

2011-08-17

Co-authors: Parag Malshe, Minaxi Gupta, and Chris Dunn

Abstract: Popular botnets earn millions of dollars for their operators by enabling many types of cyberfraud activities, including spam and phishing. Current and past botnet architectures revolve around the idea of bots communicating with their masters to carry out their functionality. Given that many take-down efforts leverage this feature, future botnet architectures may evolve to overcome this limitation. In order to enable pro-active defenses against such botnets, in this paper we design a botnet whose bots never explicitly communicate with their master. Our design leverages the popularity of social networks and the hidden nature of steganography. In our prototype implementation of an information stealing bot, the bot hides stolen information in the profile picture of Facebook user(s) on infected machines through the use of steganography. The stolen information is uploaded when a user visits Facebook thus hiding its tracks. Subsequently, it joins a carefully selected Facebook group to indicate the availability of information to the botmaster. The botmaster polls relevant groups like any other Facebook user to identify profile pictures of new group members that may contain stolen information. Neither Facebook nor the machine’s user(s) can easily identify bot traffic. Further, since bots never directly communicate with their master, capturing a bot will reveal nothing about the whereabouts of the master.

AudioVision: A Stereophonic Analogue to Visual Systems

2010-12-13

Abstract: AudioVision is designed to take a visual representation of the world, in the form of one or more video feeds, and convert it into a related stereophonic audio representation. With such a representation, it should be possible for someone who has minimal or no use of their visual system to avoid obstacles using their sense of hearing rather than vision. To this end, several different vision algorithms, including single and multiple image disparity, disparity from motion, and optical flow were investigated. In addition, two different methods of mapping the resulting disparity map to stereophonic audio (maximal points and sonar scan) were implemented. The results are rather promising. Using Lucas-Kanade optical flow and sonar scan audio has fulfilled the aforementioned goals in simple tests.

FLAIRS 2010 - Augmenting n-gram Based Authorship Attribution With Neural Networks

2010-05-21

Co-authors: Michael Wollowski and Maki Hirotani

Abstract: While using statistical methods to determine authorship attribution is not a new idea and neural networks have been applied to a number of statistical problems, the two have not often been used together. We show that the use of artificial neural networks, specifically self-organizing maps, combined with n-grams provides a success rate on the order of previous work with purely statistical methods. Using a collection of documents including the works of Shakespeare, William Blake, and the King James Version of the Bible, we were able to demonstrate classification of documents into individual groups. Further experiments with The Federalist Papers exposed potential problems with the algorithm. Finally, first exchanging n-gram frequencies with word frequencies and then exchanging self-organizing maps with k-means clustering shows that it is the combination of the two factors contributing to the algorithm’s success.

AnnGram - nGrams vs Words

2010-03-17

Overview

For another comparison, I’ve been looking for a way to replace the nGrams with another way of turning a document into a vector. Using word frequency instead of nGrams, I’ve run a number of tests to see how the accuracy and speed of the algorithm compare between the two.
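To make the two representations concrete, here is a minimal sketch of turning a document into either a character nGram count vector or a word count vector, and comparing two such vectors with cosine similarity. The function names, the default nGram length of 3, and the tokenization rule are assumptions made for illustration; they are not taken from the AnnGram code.

```python
from collections import Counter
import math
import re


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_counts(text: str) -> Counter:
    """Count lowercased words (runs of letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing keys, so only shared keys contribute.
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Either kind of vector can be passed to `cosine_similarity`, so swapping nGrams for words only changes the function that builds the vector.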

nGrams

I still intend to look into why the Tragedy of Macbeth does not stay with the rest of Shakespeare’s plays. I still believe that it is because portions of it were possibly written by another author.

AnnGram vs k-means

2010-02-10

Overview

As a set of benchmarks to test whether or not the new AnnGram algorithm is actually working correctly, I’ve been trying to come up with different yet similar methods to compare it to. Primarily, there are two possibilities:

Replace the nGram vectors with another form
Process the nGrams using something other than Self-Organizing Maps

I’m still looking through the related literature to decide if there is some way to use something other than the nGrams to feed into the SOM; however, I haven’t been having any luck. So far, most of my work has been focused on comparing SOM to k-means clustering.
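For reference, the k-means side of that comparison can be sketched in a few lines. This is a generic textbook version, not the code used for these tests: the function name, the naive "first k points" initialization, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster equal-length numeric vectors; returns (centroids, labels)."""
    # Assumption: naive initialization using the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Unlike a Self-Organizing Map, this requires choosing the number of clusters `k` up front, which is one of the practical differences to keep in mind when comparing the two.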

AnnGram - Self-Organizing Map GUI

2010-02-03

They say a picture is worth a thousand words:

One Thousand Words

AnnGram - New GUI

2010-01-28

The old GUI framework just wasn’t working out (so far as adding new features went). So, long story short, I’ve switched GUI layout.

AnnGram - Neural Network Progress

2010-01-21

As expected, I’ve decided to change libraries from the [...] poor results with the original tests may have been a direct result of a misunderstanding of the code base. I think that the layers were not being hooked up correctly, resulting in low/random values.
