# Go is faster than Python? (an example parsing huge JSON logs)

Recently at work I came across a problem where I had to go through a year’s worth of logs and corelate two different fields across all of our requests. On the good side, we have the logs stored as JSON objects (archived from Datadog which collects them). On the down side… it’s kind of a huge amount of data. Not as much as I’ve dealt with at previous jobs/in some academic problems, but we’re still talking on the order of terabytes.

On one hand, write up a quick Python script, fire and forget. It takes maybe ten minutes to write the code and (for this specific example) half an hour to run it on the specific cloud instance the logs lived on. So we’ll start with that. But then I got thinking… Python is supposed to be super slow right? Can I do better?

(Note: This problem is mostly disk bound. So Python actually for the most part does just fine.)

# A simple Flask Logging/Echo Server

A very simple server that can be used to catch all incoming HTTP requests and just echo them back + log their contents. I needed it to test what a webhook actually returned to me, but I’m sure that there are a number of other things it could be dropped in for.

It will take in any GET/POST/PATCH/DELETE HTTP request with any path/params/data (optionally JSON), pack that data into a JSON object, and both log that to a file (with a UUID1 based name) plus return this object to the request.

Warning: Off hand, there is already a potential security problem in this regarding DoS. It will happily try to log anything you throw at it, no matter how big and will store those in memory first. So long running requests / large requests / many requests will quickly eat up your RAM/disk. So… don’t leave this running unattended? At least not without additional configuration.

# CSV to JSON

Today at work, I had to process a bunch of CSV data. Realizing that I don’t have any particularly nice tools to work with streaming CSV data (although I did write about querying CSV files with SQL), I decided to write one:

$cat users.csv "user_id","name","email","password" "1","Luke Skywalker","[email protected]","$2b$12$XQ1zDvl5PLS6g.K64H27xewPQMnkELa3LvzFSyay8p9kz0XXHVOFq"
"2","Han Solo","[email protected]","$2b$12$eKJGP.tt9u77PeXgMMFmlOyFWSuRZBUZLvmzuLlrum3vWPoRYgr92"$ cat users.csv | csv2json | jq '.'

{
"password": "$2b$12$XQ1zDvl5PLS6g.K64H27xewPQMnkELa3LvzFSyay8p9kz0XXHVOFq", "name": "Luke Skywalker", "user_id": "1", "email": "[email protected]" } { "password": "$2b$12$eKJGP.tt9u77PeXgMMFmlOyFWSuRZBUZLvmzuLlrum3vWPoRYgr92",
"name": "Han Solo",
"user_id": "2",
"email": "[email protected]"
}


"#\u043f\u043e\u0431\u0435\u0434\u0430\u0437\u0430\u043d\u0430\u043c\u0438"