Today I needed to look through all the old versions of a file in an S3 bucket that had versioning turned on. You can do it through the AWS Console, but I prefer command-line tools. You can also do it with awscli, but the flags are long and I can never quite remember them. So let’s write up a quick script using boto3 (and, as a bonus, try out click)!

# Counting and Sizing S3 Buckets

A long time ago in a galaxy far, far away, I wrote a script that took an AWS S3 bucket, counted how many objects it contained, and calculated their total size. While you could get some of this information from billing reports, at the time there just wasn’t a better way: the only option was to… iterate through the entire bucket, summing as you go. If you have buckets with millions (or more) of objects, this can take a while.

Basically:

```python
import datetime

import boto
import humanize

conn = boto.connect_s3()
for bucket in sorted(conn.get_all_buckets(), key=lambda b: b.name):
    try:
        total_count = 0
        total_size = 0
        start = datetime.datetime.now()

        for key in bucket.list_versions():
            # Skip the delete markers left behind by deleted files
            if isinstance(key, boto.s3.deletemarker.DeleteMarker):
                continue

            total_count += 1
            total_size += key.size

        print('-- {count} files, {size}, {time} to calculate'.format(
            count=total_count,
            size=humanize.naturalsize(total_size),
            time=humanize.naturaltime(
                datetime.datetime.now() - start).replace(' ago', '')
        ))
    except boto.exception.S3ResponseError as err:
        # Skip buckets we aren't allowed to list
        print('-- could not list {name}: {err}'.format(name=bucket.name, err=err))
```
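For comparison, here is a minimal sketch of the same tally in boto3 (where we’re headed anyway). The names `tally_versions` and `bucket_totals` are my own, not from the original script. One nicety of boto3’s `list_object_versions` is that delete markers come back under a separate `DeleteMarkers` key, so there’s no `isinstance` check to do; the summing is factored into a pure function so it can be exercised without AWS credentials.

```python
def tally_versions(pages):
    """Sum object count and total bytes across ListObjectVersions pages.

    Each page is a dict shaped like boto3's ListObjectVersions response.
    Delete markers live under a separate 'DeleteMarkers' key, so unlike
    old boto there is nothing to filter out here.
    """
    total_count = 0
    total_size = 0
    for page in pages:
        for version in page.get('Versions', []):
            total_count += 1
            total_size += version['Size']
    return total_count, total_size


def bucket_totals(bucket_name):
    # Imported lazily so tally_versions above is usable without boto3 installed
    import boto3

    # Paginate so buckets with more than 1000 versions are fully counted
    paginator = boto3.client('s3').get_paginator('list_object_versions')
    return tally_versions(paginator.paginate(Bucket=bucket_name))
```

Splitting the paginated API call from the arithmetic is just a convenience: you can feed `tally_versions` canned response dicts in a test, then let `bucket_totals` do the real network work.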