Today at work, I had to process a bunch of CSV data. Realizing that I didn't have any particularly nice tools for working with streaming CSV data (although I did write about querying CSV files with SQL), I decided to write one:
```
$ cat users.csv
name
Luke Skywalker
Han Solo
$ cat users.csv | csv2json | jq '.'
[
  {
    "name": "Luke Skywalker"
  },
  {
    "name": "Han Solo"
  }
]
```
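The core of such a filter is only a few lines. Here is a minimal sketch of one way to build it (the actual csv2json tool may differ): read CSV from stdin, treat the first row as headers, and emit a JSON array of objects.

```python
#!/usr/bin/env python3
# Minimal csv2json-style filter: CSV on stdin, JSON array on stdout.
import csv
import json
import sys

def csv_to_json(lines):
    """Convert an iterable of CSV lines into a JSON string.

    The first row is treated as the header row; every following row
    becomes one object keyed by those headers.
    """
    rows = list(csv.DictReader(lines))
    return json.dumps(rows, indent=2)

if __name__ == "__main__":
    sys.stdout.write(csv_to_json(sys.stdin) + "\n")
```

Piped through `jq '.'` as above, the output pretty-prints the same way.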
The newest chapter in my quest to collect entirely too much data and back up All The Things: GitHub.
Basically, I want to back up all of my own personal GitHub repositories along with any from organizations that I am involved with. Strictly speaking, this is a little strange, since it's unlikely that GitHub is going anywhere soon and, if it does, we are likely to have fair warning. But still, it's nice to have a local copy just in case GitHub is down.
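The core step is mirroring each repository locally. Here is a small sketch, assuming the list of clone URLs has already been fetched from the GitHub API (the `GET /user/repos` endpoint returns each repository's `clone_url`); the `mirror_command` helper and the `owner/repo.git` directory layout are my own invention for illustration:

```python
# Build the `git clone --mirror` command that keeps a complete local
# copy (all refs, not just the default branch) of one repository.
def mirror_command(clone_url, backup_dir):
    """Return the git command (as an argv list) that mirrors one repo."""
    # Derive an "owner/repo.git" path from the tail of the clone URL.
    name = "/".join(clone_url.rstrip("/").split("/")[-2:])
    if not name.endswith(".git"):
        name += ".git"
    return ["git", "clone", "--mirror", clone_url, f"{backup_dir}/{name}"]
```

Running these commands (e.g. via `subprocess.run`) for every URL in the API response gives the full backup; re-running later would use `git remote update` in each mirror instead.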
When you're starting out on a simple web application, eventually you will reach the point where you need to store some form of persistent data. Basically, you have three options:
- Store the information in flat files on the file system
- Store the information in a relational database (MySQL, SQLite, etc.)
- Store the information in a key/value or document store (MongoDB, Redis)
There are all manner of pros and cons to each, in particular how easy they are to get started with, how well they fit the data you are using, and how well they will scale horizontally (adding more machines rather than bigger ones).
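To make the "easy to get started" point concrete, here is a minimal sketch using SQLite, which ships with Python and needs no server; the table and column names are made up for the example:

```python
import sqlite3

# In-memory database for illustration; a real app would pass a file
# path so the data actually persists between runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Luke Skywalker",), ("Han Solo",)],
)
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY name")]
conn.close()
```

The flat-file option is even less setup up front, but you give up querying; the key/value and document stores tend to win once you need horizontal scale.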