Get List of Files in Specific AWS Bucket

First, be careful when listing the objects in an AWS S3 bucket: a bucket can hold millions or even billions of objects, so loading the whole listing at once can exhaust your machine's memory or make the program hang.

To handle this, boto3 provides a paginator object that fetches the results in so-called pages, each holding up to 1,000 objects.
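To see what the paginator takes care of, here is a minimal sketch of paging by hand with continuation tokens. The function name `list_keys_manually` is made up for illustration, and `client` is assumed to be an S3 client such as the one returned by `boto3.client('s3')`.

```python
def list_keys_manually(client, bucket):
    """Collect every key in `bucket` by following continuation tokens by hand."""
    keys = []
    kwargs = {"Bucket": bucket}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is missing when the response holds no objects.
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return keys

# Usage (needs boto3 and real credentials):
#   import boto3
#   print(list_keys_manually(boto3.client("s3"), "my-bucket"))
```

The paginator runs this same loop for you, which is why it is usually the more convenient choice.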

The following Python 3 code uses the paginator to retrieve the names (keys) and sizes of all objects in a given bucket.

import boto3

def get_list_of_objects(bucket):
    
    # Build the client from the session so the named profile is actually used
    session = boto3.session.Session(profile_name='my-profile')
    conn = session.client('s3')
    
    paginator = conn.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket)
    
    existing_objects = []
    
    for page in pages:
        # 'Contents' is missing when a page has no objects (e.g. an empty bucket)
        for obj in page.get('Contents', []):
            existing_objects.append((obj['Key'], obj['Size']))
    return existing_objects

print(get_list_of_objects('my-bucket'))
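Note that the function above still collects every result into one list, so a very large bucket can still fill up memory. One possible variation, sketched here, is a generator that yields objects one at a time. The names `iter_objects` and `total_size` and the `prefix` parameter are illustrative additions, and `client` is again assumed to be an S3 client from boto3.

```python
def iter_objects(client, bucket, prefix=""):
    """Yield (key, size) pairs one at a time instead of building a list."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Only one page of results is held in memory at any moment.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


def total_size(client, bucket, prefix=""):
    """Sum object sizes without keeping the full listing in memory."""
    return sum(size for _key, size in iter_objects(client, bucket, prefix))

# Usage (needs boto3 and real credentials):
#   import boto3
#   for key, size in iter_objects(boto3.client("s3"), "my-bucket"):
#       print(key, size)
```

With a generator the caller decides what to keep: it can stop early, filter, or aggregate as it goes.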

Of course, before running this code you need to configure your AWS credentials, for example with the AWS CLI (`aws configure`).

Happy coding!
