A long time ago in a galaxy far, far away, I wrote a script to count the objects in an AWS S3 bucket and calculate their total size. While you could get some of this information from billing reports, there just wasn't a good way to get it at the time other than to… iterate through the entire bucket, summing as you go. If you have buckets with millions (or more) of objects, this can take a while.
Basically:
import datetime

import boto
import boto.exception
import boto.s3.deletemarker
import humanize

conn = boto.connect_s3()
# Sort by name; boto Bucket objects aren't directly comparable on Python 3
for bucket in sorted(conn.get_all_buckets(), key=lambda b: b.name):
    try:
        total_count = 0
        total_size = 0
        start = datetime.datetime.now()
        # Walk every version of every key, summing as we go
        for key in bucket.list_versions():
            # Skip deleted files
            if isinstance(key, boto.s3.deletemarker.DeleteMarker):
                continue
            total_count += 1
            total_size += key.size
        print('-- {count} files, {size}, {time} to calculate'.format(
            count=total_count,
            size=humanize.naturalsize(total_size),
            time=humanize.naturaltime(datetime.datetime.now() - start).replace(' ago', '')
        ))
    except boto.exception.S3ResponseError:
        # The excerpt's try: had no matching except; assuming it skipped
        # buckets the credentials couldn't list
        continue
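
These days boto3 can do the same walk with a paginator. Here's a minimal sketch of the same iterate-and-sum approach (a modern rewrite for illustration, not the original script):

import datetime

import boto3
import humanize

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_object_versions')

for bucket in s3.list_buckets()['Buckets']:
    total_count = 0
    total_size = 0
    start = datetime.datetime.now()
    for page in paginator.paginate(Bucket=bucket['Name']):
        # Delete markers come back in a separate 'DeleteMarkers' list,
        # so summing 'Versions' skips them automatically
        for version in page.get('Versions', []):
            total_count += 1
            total_size += version['Size']
    print('-- {count} files, {size}, {time} to calculate'.format(
        count=total_count,
        size=humanize.naturalsize(total_size),
        time=humanize.naturaltime(datetime.datetime.now() - start).replace(' ago', '')
    ))

It's still a full scan of every object version, so it's no faster on huge buckets; the pagination is just handled for you.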