Prevent JavaScript links by parsing URLs

If you have a website that allows users to submit URLs, one of the (many, many) things people will try to do to break your site is to submit URLs that use the javascript: protocol (rather than the expected http: or https:). This is almost never something you want, since it lets users submit essentially arbitrary code that other users will run on click in the context of your domain (same-origin policy).

So how do you fix it?

First thought would be to try to check the protocol:

> safe_url = (url) => !url.match(/^javascript:/)
[Function: safe_url]

> safe_url('http://www.example.com')
true

false
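
A prefix check like the one above is easy to slip past: it is case sensitive and does not account for leading whitespace, which browsers strip before interpreting the URL. As a rough sketch of the "parse it instead" idea from the title, here is one way it might look in Python (not the JavaScript the post uses). The name `is_safe_url` and the `ALLOWED_SCHEMES` allowlist are assumptions for illustration, and note that this version also rejects relative URLs because they have no scheme.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are acceptable for user-submitted links.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url: str) -> bool:
    """Return True only if the URL parses and its scheme is on the allowlist."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # Malformed input (for example a broken IPv6 host) is treated as unsafe.
        return False
    return parts.scheme.lower() in ALLOWED_SCHEMES


if __name__ == "__main__":
    for candidate in ["http://www.example.com", "javascript:alert(1)", "  JavaScript:alert(1)"]:
        print(candidate, is_safe_url(candidate))
```

Allowlisting the schemes you expect tends to age better than blocklisting the ones you fear, since data: and vbscript: style URLs are rejected without anyone having to think of them first.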


Tiny Helper Scripts for Command Line MySQL

Quite often, I’ll find myself wanting to query and manipulate MySQL data entirely on the command line. I could be building up a pipeline or working on a task that I’m going to eventually automate but haven’t quite gotten to yet. Whenever I have to do something like that, I have a small pile of scripts I’ve written over time that help out:

• skiphead: Skip the first line of output, used to skip over headers in a query response
• skipuntil: Skip all lines until we see one matching a pattern, used to resume partial tasks
• commaify: Take a list of single values on the command line and turn them into a comma separated list (for use in IN clauses)
• csv2json: a previously posted script for converting csv/tab delimited output to json
• jq: not my script, but used to take the output of csv2json and query it further in ways that would be complicated to do with SQL

Admittedly, the first two of those are one-liners and I could easily remember them, but the advantage of a single command that does it is tab completion. sk<tab>, arrow to select which one I want, and off we go. I could set them up as aliases, but I don’t always use the same shell (mostly fish, but sometimes Bash or Zsh).
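
To give a feel for how small these helpers are, here is a hedged Python sketch of the first three as functions over an iterable of lines. These are not the actual scripts. In particular, it assumes skipuntil keeps the matching line itself and that commaify simply joins whatever values it is handed.

```python
import itertools
import re
from typing import Iterable, Iterator


def skiphead(lines: Iterable[str], count: int = 1) -> Iterator[str]:
    """Drop the first `count` lines (the header row of a query response)."""
    return itertools.islice(lines, count, None)


def skipuntil(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Drop lines until one matches `pattern`; the matching line is kept (assumption)."""
    regex = re.compile(pattern)
    return itertools.dropwhile(lambda line: not regex.search(line), lines)


def commaify(values: Iterable[str]) -> str:
    """Join non-empty values into a comma separated list for an IN clause."""
    stripped = (value.strip() for value in values)
    return ",".join(value for value in stripped if value)


if __name__ == "__main__":
    rows = ["id", "17", "23", "42"]
    print(commaify(skiphead(rows)))
```

Wrapping each function in a tiny script that reads sys.stdin would make them composable in a shell pipeline the same way the originals are described.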

Today I found the need to look through all old versions of a file in S3 that had versioning turned on. You can do it through the AWS Console, but I prefer command line tools. You can do it with awscli, but the flags are long and I can never quite remember them. So let’s write up a quick script using boto3 (and as a bonus, try out click)!

AoC 2018 Day 3: Regionification

Source: No Matter How You Slice It

Part 1: Given a list of overlapping regions defined by (left, top, width, height) count how many integer points occur in more than one region.
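
One straightforward way to approach this, sketched in Python under the assumption that each region covers the unit cells from (left, top) up to but not including (left + width, top + height): count how many regions touch each cell, then count the cells touched more than once. The function name is made up for illustration and input parsing is left out.

```python
from collections import Counter
from typing import Iterable, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height)


def count_overlapping_points(regions: Iterable[Region]) -> int:
    """Count the cells that are covered by more than one region."""
    covered = Counter()
    for left, top, width, height in regions:
        for x in range(left, left + width):
            for y in range(top, top + height):
                covered[(x, y)] += 1
    return sum(1 for hits in covered.values() if hits > 1)


if __name__ == "__main__":
    print(count_overlapping_points([(1, 3, 4, 4), (3, 1, 4, 4), (5, 5, 2, 2)]))
```

This is brute force: time and memory grow with the total area of all regions, which is fine for puzzle-sized input but would not scale to very large rectangles.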

AoC 2018 Day 2: Counting letters

Source: Inventory Management System

Part 1: Given a list of strings, count how many contain exactly two of a letter (a) and how many contain exactly three of a letter (b). Calculate a*b.
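
A minimal Python sketch of that calculation, assuming the strings are already in a list. The function name is hypothetical; a string containing both a doubled and a tripled letter counts toward both totals, and a string with several doubled letters still counts only once.

```python
from collections import Counter
from typing import Iterable


def checksum(ids: Iterable[str]) -> int:
    """Multiply the number of strings with a doubled letter by those with a tripled letter."""
    twos = 0
    threes = 0
    for text in ids:
        # The set of distinct letter frequencies present in this string
        frequencies = set(Counter(text).values())
        if 2 in frequencies:
            twos += 1
        if 3 in frequencies:
            threes += 1
    return twos * threes


if __name__ == "__main__":
    print(checksum(["abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab"]))
```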

Source: Chronal Calibration

Part 1: Given a list of numbers (positive and negative) calculate the sum.
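
In Python this is nearly a one-liner, since int() already accepts an explicit leading sign. A small sketch, assuming one number per line and a made-up function name:

```python
from typing import Iterable


def total(lines: Iterable[str]) -> int:
    """Sum signed integers given one per line, ignoring blank lines."""
    return sum(int(line) for line in lines if line.strip())


if __name__ == "__main__":
    print(total(["+1", "-2", "+3", "+1"]))
```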

Let’s do it again! I’m starting a day late, but much better than last year 😄!

This time around, I’m hoping to solve each problem in both Python and Racket, both to show an example of how the languages differ and … well, because I can 😇.

EDIT 2018-12-05: Yeah… I’m not actually going to do these in both Racket and Python. The solutions are ending up being near direct translations. Since there are probably fewer people solving these in Racket, I’ll do that first and Python eventually™.

As always, these problems are wonderful to try to solve yourself. If you agree, stop reading now. This post isn’t going anywhere.

If you’d like to see the full form of any particular solution, you can do so on GitHub (including previous years and possibly some I haven’t written up yet): jpverkamp/advent-of-code

Counting and Sizing S3 Buckets

A long time ago in a galaxy far far away, I wrote up a script that I used to take an AWS S3 bucket and count how many objects there were in the bucket and calculate its total size. While you could get some of this information from billing reports, there just wasn’t a good way to get it other than that at the time. The only way you could do it was to… iterate through the entire bucket, summing as you go. If you have buckets with millions (or more) objects, this could take a while.

Basically:

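import datetime

import boto
import boto.exception
import boto.s3.deletemarker
import humanize

conn = boto.connect_s3()
# Sort buckets by name so the output order is stable
for bucket in sorted(conn.get_all_buckets(), key=lambda bucket: bucket.name):
    try:
        total_count = 0
        total_size = 0
        start = datetime.datetime.now()

        # list_versions() yields every version of every key, not just the latest
        for key in bucket.list_versions():
            # Skip deleted files
            if isinstance(key, boto.s3.deletemarker.DeleteMarker):
                continue

            size = key.size
            total_count += 1
            total_size += size

        print('-- {count} files, {size}, {time} to calculate'.format(
            count = total_count,
            size = humanize.naturalsize(total_size),
            time = humanize.naturaltime(datetime.datetime.now() - start).replace(' ago', '')
        ))
    except boto.exception.S3ResponseError as error:
        # Assumed handler: the excerpt ends before the original except clause
        print('-- skipped {name}: {error}'.format(name = bucket.name, error = error))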

Creating a temporary SMTP server to 'catch' domain validation emails

One problem that has come up a time or two is dealing with email-based domain validation (specifically in this case for the issuance of TLS certificates) on domains that aren’t actually configured to receive email. Yes, in a perfect world, it would be easier to switch to DNS-based validation (we need control of the domain’s DNS later anyway), but let’s just assume that’s not an option. So, how do we ‘catch’ the activation email so we can prove we can receive email on that domain?