Command line user agent parsing

Quite often when working with internet data, you will find yourself wanting to figure out what sort of device people are using to access your content. Luckily, if you're using HTTP, there is a standard for that: the User-Agent header.

Since I’m in exactly that position, I’ve added a new script to my Dotfiles that reads user agents on stdin, parses them, and writes them back out in a given format.
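The script itself isn't reproduced here. To illustrate the general "user agents in on stdin, one result per line out" shape, here is a minimal sketch in plain POSIX shell. The function name `ua_device` is hypothetical, and the substring patterns are rough assumptions. Real user-agent strings are messy enough that a dedicated parser will do far better.

```shell
# Hypothetical helper: read one user-agent string per line on stdin and
# print a coarse device class for each line. The patterns are illustrative
# assumptions, not a complete or authoritative rule set.
ua_device() {
  # "|| [ -n "$ua" ]" also handles a final line with no trailing newline.
  while IFS= read -r ua || [ -n "$ua" ]; do
    case "$ua" in
      *bot*|*Bot*|*spider*|*crawl*) echo "bot" ;;     # check crawlers first
      *iPad*|*Tablet*)              echo "tablet" ;;
      *Mobile*|*iPhone*)            echo "mobile" ;;
      *Android*)                    echo "tablet" ;;  # Android without "Mobile": assumed tablet
      "")                           echo "unknown" ;;
      *)                            echo "desktop" ;;
    esac
  done
}

printf '%s\n' \
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148' \
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' |
  ua_device
```

The order of the `case` branches matters, because the first matching pattern wins. Crawlers that impersonate phones would otherwise be counted as mobile. Fed a file with one user agent per line (say, a hypothetical `agents.txt`), `ua_device < agents.txt | sort | uniq -c | sort -nr` gives a rough device breakdown.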


Combining sort and uniq

A fairly common pattern on the command line (at least for me) is combining sort and uniq to count the unique items in an unsorted list. Something like this:

$ find . -type 'f' | rev | cut -d "." -f "1" | rev | sort | uniq -c | sort -nr | head

2649 htm
1458 png
 993 cache
 612 jpg
 135 css
 102 zip
  99 svg
  60 gif
  45 js
  27 pdf
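To make each stage of that pipeline visible, here is the same chain wrapped in a small function and run over a short, invented list of paths instead of real `find` output. The function name `ext_counts` and the sample paths are hypothetical. `rev | cut -d '.' -f 1 | rev` keeps whatever follows the last dot. `sort` puts identical extensions next to each other, which matters because `uniq -c` only collapses adjacent duplicates. Finally, `sort -nr` ranks the counts from largest to smallest.

```shell
# Hypothetical helper: count file extensions, reading one path per line on stdin.
ext_counts() {
  rev | cut -d '.' -f 1 | rev |  # keep the text after the last "."
    sort |                       # bring identical extensions together
    uniq -c |                    # count each run of adjacent duplicates
    sort -nr                     # highest count first
}

# Invented sample paths standing in for `find . -type f` output.
printf '%s\n' ./a.htm ./b.png ./c.htm ./d/e.htm ./f.png ./g.css | ext_counts
```

This prints a count of 3 for `htm`, 2 for `png` and 1 for `css`. The exact column padding of `uniq -c` differs between GNU and BSD versions. One quirk is that a file with no extension, such as `./Makefile`, comes through as `/Makefile`, because the only dot left to split on is the one in the leading `./`.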