Counting and Sizing S3 Buckets

A long time ago in a galaxy far, far away, I wrote up a script to take an AWS S3 bucket, count how many objects it contained, and calculate its total size. While you could get some of this information from billing reports, at the time there just wasn't a good way to get it other than that. The only way you could do it was to… iterate through the entire bucket, summing as you go. If you have buckets with millions (or more) of objects, this could take a while.

Basically:

import datetime

import boto
import boto.s3.deletemarker
import humanize

conn = boto.connect_s3()
for bucket in sorted(conn.get_all_buckets(), key=lambda b: b.name):
    print(bucket.name)
    try:
        total_count = 0
        total_size = 0
        start = datetime.datetime.now()

        # list_versions() walks every object version in the bucket,
        # including the delete markers left behind by versioned deletes
        for key in bucket.list_versions():
            # Skip deleted files
            if isinstance(key, boto.s3.deletemarker.DeleteMarker):
                continue

            total_count += 1
            total_size += key.size

        print('-- {count} files, {size}, {time} to calculate'.format(
            count=total_count,
            size=humanize.naturalsize(total_size),
            time=humanize.naturaltime(datetime.datetime.now() - start).replace(' ago', '')
        ))
    except boto.exception.S3ResponseError as e:
        # A bucket we can't list (e.g. access denied) shouldn't kill the run
        print('-- skipped: {error}'.format(error=e.error_code))
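
These days boto has been superseded by boto3, where the same walk goes through a paginator. Here's a minimal sketch of the equivalent loop; the bucket name is a placeholder, and it assumes your default AWS credentials are allowed to list object versions:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_object_versions')

total_count = 0
total_size = 0

# Each page carries up to 1,000 versions; delete markers come back
# under a separate 'DeleteMarkers' key, so there's nothing to skip here
for page in paginator.paginate(Bucket='my-bucket'):  # placeholder bucket name
    for version in page.get('Versions', []):
        total_count += 1
        total_size += version['Size']

print('{0} files, {1} bytes'.format(total_count, total_size))

It's still the same linear scan, though; the paginator just hides the repeated ListObjectVersions calls, so big buckets take just as long as they did back then.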
