I’ve spent the day in Baltimore at the ISC/CAIDA Data Collaboration Workshop, learning and presenting about all things DNS. It’s not really what my PhD work focuses on, but it’s still interesting.
For the next week or so, I’ll be in Seattle attending the Usenix Security Symposium and specifically the FOCI workshop. Why? Because I’m presenting a paper at FOCI.
Entitled Inferring Mechanics of Web Censorship Around the World, here’s the abstract:
While mechanics of Web censorship in China are well studied, those of other countries are less understood. Through a combination of personal contacts and Planet-Lab nodes, we conduct experiments to explore the mechanics of Web censorship in 11 countries around the world, including China.
Co-authors: Parag Malshe, Minaxi Gupta, and Chris Dunn
Abstract: Popular botnets earn millions of dollars for their operators by enabling many types of cyberfraud activities, including spam and phishing. Current and past botnet architectures revolve around the idea of bots communicating with their masters to carry out their functionality. Given that many take-down efforts leverage this feature, future botnet architectures may evolve to overcome this limitation. In order to enable pro-active defenses against such botnets, in this paper we design a botnet whose bots never explicitly communicate with their master.
Abstract: AudioVision is designed to take a visual representation of the world (in the form of one or more video feeds) and convert it into a related stereophonic audio representation. With such a representation, it should be possible for someone who has minimal or no use of their visual system to avoid obstacles using their sense of hearing rather than vision. To this end, several different vision algorithms, including single and multiple image disparity, disparity from motion, and optical flow, were investigated.
Co-authors: Michael Wollowski and Maki Hirotani
Abstract: While using statistical methods to determine authorship attribution is not a new idea and neural networks have been applied to a number of statistical problems, the two have not often been used together. We show that the use of artificial neural networks, specifically self-organizing maps, combined with n-grams provides a success rate on the order of previous work with purely statistical methods. Using a collection of documents including the works of Shakespeare, William Blake, and the King James Version of the Bible, we were able to demonstrate classification of documents into individual groups.
For another comparison, I’ve been looking for a way to replace the nGrams with another way of turning a document into a vector. Using word frequency instead of nGrams, I’ve run a number of tests to see how the accuracy and speed of the two approaches compare.
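To make the comparison concrete, here is a minimal sketch of the two ways of turning a document into a vector. It is not the AnnGram code itself: the function names, the choice of character trigrams, the whitespace tokenization, and the cosine similarity helper are all assumptions made for illustration.

```python
from collections import Counter


def char_ngram_vector(text, n=3):
    """Relative frequency of each character n-gram (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def word_frequency_vector(text):
    """Relative frequency of each lowercased, whitespace-separated word."""
    words = text.lower().split()
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sum(value * value for value in a.values()) ** 0.5
    norm_b = sum(value * value for value in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Either function produces a sparse vector keyed by feature, so the same downstream comparison can be run on both representations by swapping one call for the other.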
I still intend to look into why the Tragedy of Macbeth does not stay with the rest of Shakespeare’s plays. I still believe that it is because portions of it were possibly written by another author.
As a set of benchmarks to test whether or not the new AnnGram algorithm is actually working correctly, I’ve been trying to come up with different yet similar methods to compare it to. Primarily, there are two possibilities:
- Replace the nGram vectors with another form
- Process the nGrams using something other than Self-Organizing Maps

I’m still looking through the related literature to decide if there is some way to use something other than the nGrams to feed into the SOM; however, I haven’t been having any luck.
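For reference, the step that either alternative would swap out or feed into is the Self-Organizing Map. The following is a minimal pure-Python sketch of a generic SOM, not the AnnGram implementation: the grid size, epoch count, linear decay schedule, Gaussian neighborhood, and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the grid cell whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], vec)),
    )


def train_som(vectors, grid_size=4, epochs=50, learning_rate=0.5, seed=0):
    """Train a square SOM on equal-length dense vectors (illustrative only)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per grid cell, keyed by (row, col).
    weights = {
        (row, col): [rng.random() for _ in range(dim)]
        for row in range(grid_size)
        for col in range(grid_size)
    }
    initial_radius = grid_size / 2
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink linearly.
        decay = 1.0 - epoch / epochs
        rate = learning_rate * decay
        radius = max(initial_radius * decay, 0.5)
        for vec in vectors:
            bmu = best_matching_unit(weights, vec)
            for cell, weight in weights.items():
                grid_dist = math.hypot(cell[0] - bmu[0], cell[1] - bmu[1])
                if grid_dist <= radius:
                    # Cells nearer the winner on the grid move further.
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        weight[i] += rate * influence * (vec[i] - weight[i])
    return weights
```

After training, documents can be grouped by the grid cell that `best_matching_unit` returns for each document vector. Any vectorization that yields fixed-length dense vectors could be used as input, which is what makes replacing the nGram vectors a separable experiment.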
As expected, I’ve decided to change libraries. The poor results with the original tests may have been a direct result of a misunderstanding of the code base. I think that the layers were not being hooked up correctly, resulting in low/random values.