Counting and Sizing S3 Buckets

A long time ago in a galaxy far far away, I wrote up a script to count the objects in an AWS S3 bucket and calculate their total size. While you could get some of this information from billing reports, at the time there just wasn’t a good way to get it on demand. The only way to do it was to… iterate through the entire bucket, summing as you go. If you have buckets with millions (or more) of objects, this could take a while.

Basically:

import datetime

import boto
import humanize
from boto.s3.deletemarker import DeleteMarker

conn = boto.connect_s3()
for bucket in sorted(conn.get_all_buckets(), key = lambda bucket: bucket.name):
    # Print the bucket name so the per-bucket summary below has a header
    print(bucket.name)

    try:
        total_count = 0
        total_size = 0
        start = datetime.datetime.now()

        # Walk every version of every key, summing as we go
        for key in bucket.list_versions():
            # Skip deleted files
            if isinstance(key, DeleteMarker):
                continue

            size = key.size
            total_count += 1
            total_size += size

        print('-- {count} files, {size}, {time} to calculate'.format(
            count = total_count,
            size = humanize.naturalsize(total_size),
            time = humanize.naturaltime(datetime.datetime.now() - start).replace(' ago', '')
        ))
    except Exception as ex:
        # Buckets we can't list (permissions, region issues, etc.) shouldn't kill the whole run
        print('-- unable to list bucket: {ex}'.format(ex = ex))


Generating zone files from Route53

Recently I found myself wanting to do some analysis on all of our DNS entries stored in AWS’s Route53 for security reasons (specifically to prevent subdomain takeover attacks; I’ll probably write that up soon). In doing so, I realized that while Route53 has the ability to import a zone file, it’s not possible to export one.

To some extent, this makes sense. Since Route53 supports ALIAS records (which resolve their values dynamically from other AWS resources, such as an ELB whose public IP changes) and those aren’t actually ‘real’ DNS entries, a zone file can’t represent them faithfully. But I don’t currently intend to re-import these zone files, just use them. So let’s see what we can do.
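Not necessarily how the rest of the write-up goes about it, but as a rough first sketch, aws-cli plus jq gets you most of the way to zone-file-style output. The zone ID below is a placeholder, and ALIAS records (which have no ResourceRecords) are simply skipped:

aws route53 list-resource-record-sets --hosted-zone-id Z123EXAMPLE | jq -r '
    .ResourceRecordSets[]
    # ALIAS records carry an AliasTarget instead of ResourceRecords; skip them
    | select(.ResourceRecords != null)
    | . as $rrset
    | $rrset.ResourceRecords[]
    | "\($rrset.Name)\t\($rrset.TTL)\tIN\t\($rrset.Type)\t\(.Value)"
'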


SSH Config ProxyCommand Tricks

Working in security/operations in the tech industry, I use SSH a lot: to various machines (some with hostnames, some without), with various users and keys, and often (as was the case in my previous post) via a bastion host. Over the years, I’ve collected a number of SSH tricks that make my life easier.
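As a taste of the bastion trick in particular, here’s a sketch of the kind of entry I mean; the host pattern, user, key path, and bastion address are all made-up placeholders:

# Route any host matching internal-* through a bastion, with its own user and key
cat >> ~/.ssh/config << 'EOF'
Host internal-*
    User deploy
    IdentityFile ~/.ssh/internal_key
    # ssh -W forwards stdin/stdout to the real host:port by way of the bastion
    ProxyCommand ssh -W %h:%p bastion.example.com
EOF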


Clock drift in Docker containers

I was working on a docker container which uses the aws cli to mess around with some autoscaling groups when I got a somewhat strange error:

A client error (SignatureDoesNotMatch) occurred when calling the DescribeAutoScalingGroups operation: Signature not yet current: 20171115T012426Z is still later than 20171115T012420Z (20171115T011920Z + 5 min.)

Hmm.

Are the clocks off?
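A quick way to check (the alpine image is just a stand-in for whatever container is handy) is to compare UTC inside a container against the host:

# UTC according to a container vs. the host; more than ~5 minutes apart and
# signed AWS requests start failing with errors like the one above
docker run --rm alpine date -u
date -u

If they have drifted apart and Docker is running inside a VM (Docker for Mac, docker-machine, etc.), one commonly cited workaround (an assumption about the environment, not necessarily what this post settles on) is to resync the VM’s system clock from its hardware clock:

docker run --rm --privileged alpine hwclock -s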


Configuring Websockets behind an AWS ELB

Recently at work, we were trying to get an application that uses websockets working on an AWS instance behind an ELB (load balancer), with nginx running on the instance.

If you’re either not using a secure connection or handling the cryptography on the instance (in either nginx or Flask), it works right out of the box. But if you want the ELB to handle TLS termination, it doesn’t work nearly as well… Luckily, after a bit of fiddling, I got it working.
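For context, the nginx side of proxying WebSocket upgrades is the standard recipe below; this is a generic snippet (the path and upstream port are placeholders), not necessarily the exact configuration that finally worked here:

# Inside the server block that fronts the app
location /socket.io {
    proxy_pass http://127.0.0.1:5000;
    # WebSockets need HTTP/1.1 plus the Upgrade/Connection headers passed through
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}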

Update 2018-05-31: There’s now a much easier solution, the Application Load Balancer (https://aws.amazon.com/blogs/aws/new-aws-application-load-balancer/):

WebSocket allows you to set up long-standing TCP connections between your client and your server. This is a more efficient alternative to the old-school method which involved HTTP connections that were held open with a “heartbeat” for very long periods of time. WebSocket is great for mobile devices and can be used to deliver stock quotes, sports scores, and other dynamic data while minimizing power consumption. ALB provides native support for WebSocket via the ws:// and wss:// protocols.


Performance problems with Flask and Docker

I had an interesting problem recently on a project I was working on. It’s a simple Flask-based webapp, designed to be deployed to AWS using Docker. The application worked just fine when I was running it locally, but as soon as I pushed the docker container…

Latency spikes. Bad enough that the application was failing AWS’s health checks, cycling in and out of existence.


Parsing AWS instance data with jq

Semi-random amusing code snippet of the day:

aws ec2 describe-instances | jq '
    .[][].Instances[]
    | select(.Tags[]?.Value == "production")
    | .PrivateIpAddress
'
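One caveat: matching only on .Value will pick up any instance with any tag whose value happens to be "production", whatever the key. If your tags follow a key/value scheme, it’s easy to pin down both; the "Environment" key here is just an assumed example:

aws ec2 describe-instances | jq '
    .[][].Instances[]
    | select(.Tags[]? | .Key == "Environment" and .Value == "production")
    | .PrivateIpAddress
'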