Co-authors: Michael Wollowski and Maki Hirotani
Abstract: While using statistical methods for authorship attribution is not a new idea, and neural networks have been applied to a number of statistical problems, the two have rarely been used together. We show that artificial neural networks, specifically self-organizing maps, combined with n-grams achieve a success rate comparable to previous work with purely statistical methods. Using a collection of documents including the works of Shakespeare, William Blake, and the King James Version of the Bible, we demonstrated the classification of documents into individual groups. Further experiments with The Federalist Papers exposed potential problems with the algorithm. Finally, replacing n-gram frequencies with word frequencies, and then replacing self-organizing maps with k-means clustering, shows that the algorithm's success depends on the combination of both factors.
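To make the pipeline concrete, here is a minimal Python sketch of the two ingredients the abstract combines: character n-gram frequency profiles and a self-organizing map. It is not the paper's implementation. The abstract does not give the n-gram length, number of features, map size, or training schedule, so `n=3`, `top_k`, the grid dimensions, the linearly decaying learning rate, and the Gaussian neighborhood are all assumptions, and every function name is hypothetical. Documents are treated as grouped together when they share a best-matching map node; how the paper reads groups off the map is not stated here.

```python
import math
import random
from collections import Counter

# Hypothetical helpers; n, top_k, map size and schedule are assumed, not from the paper.


def ngram_profile(text, n=3):
    """Relative frequencies of overlapping character n-grams (lowercased, whitespace collapsed)."""
    cleaned = " ".join(text.lower().split())
    grams = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def word_profile(text):
    """Relative word frequencies, the alternative feature set the abstract compares against."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}


def to_vectors(profiles, top_k=50):
    """Fixed-length vectors over the top_k features, ranked by frequency summed across documents."""
    overall = Counter()
    for p in profiles:
        overall.update(p)
    vocab = [g for g, _ in overall.most_common(top_k)]
    return [[p.get(g, 0.0) for g in vocab] for p in profiles]


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Grid node (row, col) whose weight vector is closest to v."""
    return min(weights, key=lambda node: dist2(weights[node], v))


def train_som(vectors, rows=4, cols=4, epochs=200, lr0=0.5, seed=0):
    """Online SOM training with a linearly decaying learning rate and Gaussian neighborhood (assumed)."""
    rng = random.Random(seed)
    # Each node starts as a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(vectors)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for v in rng.sample(vectors, len(vectors)):  # visit inputs in shuffled order
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # neighborhood strength
                for d in range(len(w)):
                    w[d] += lr * h * (v[d] - w[d])  # pull node toward the input
    return weights


if __name__ == "__main__":
    # Toy inputs: single lines, far shorter than the documents used in the paper.
    docs = [
        "Shall I compare thee to a summer's day? Thou art more lovely and more temperate",
        "Tyger Tyger, burning bright, In the forests of the night",
        "In the beginning God created the heaven and the earth.",
    ]
    vectors = to_vectors([ngram_profile(d) for d in docs], top_k=50)
    som = train_som(vectors, rows=3, cols=3, epochs=100)
    for d, v in zip(docs, vectors):
        print(best_matching_unit(som, v), d[:30])
```

Passing `word_profile` instead of `ngram_profile` to `to_vectors` mirrors the first substitution the abstract describes; the second substitution, k-means in place of the map, would replace `train_som` and `best_matching_unit` with a clustering step that this sketch does not include.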