First, be careful when listing objects in an AWS S3 bucket: a bucket can hold millions or even billions of keys, so a naive approach can exhaust your machine's memory or appear to hang. To handle this, boto3 provides a special paginator object that lets you fetch the data in so-called pages (ListObjectsV2 returns at most 1,000 keys per page).
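To get a feel for what "pages" means, here is a toy sketch of the continuation-token loop that the paginator runs for you under the hood. No AWS is involved; the fake_list_objects helper is made up purely for illustration:

```python
# Toy model of S3's ListObjectsV2 pagination: each call returns at most
# max_keys results plus a continuation token if more keys remain.
FAKE_KEYS = [f'file-{i}.txt' for i in range(7)]

def fake_list_objects(start=0, max_keys=3):
    # Made-up stand-in for the real S3 API call
    page = FAKE_KEYS[start:start + max_keys]
    next_token = start + max_keys if start + max_keys < len(FAKE_KEYS) else None
    return {'Contents': page, 'NextContinuationToken': next_token}

def list_all_keys():
    keys, token = [], 0
    while token is not None:  # keep fetching until no token is returned
        resp = fake_list_objects(start=token)
        keys.extend(resp['Contents'])
        token = resp['NextContinuationToken']
    return keys

print(list_all_keys())  # all 7 keys, fetched 3 at a time
```

Each loop iteration only holds one page in flight, which is exactly why the real paginator keeps memory usage flat no matter how big the bucket is.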
Here is Python 3 code that retrieves the names (keys) and sizes of all objects in a specific bucket.
import boto3

def get_list_of_objects(bucket):
    # Use a named profile; replace 'my-profile' with yours,
    # or drop the session to use the default credentials
    session = boto3.session.Session(profile_name='my-profile')
    conn = session.client('s3')
    paginator = conn.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket)
    existing_objects = []
    for page in pages:
        # 'Contents' is absent when the bucket (or a page) is empty
        for obj in page.get('Contents', []):
            existing_objects.append((obj['Key'], obj['Size']))
    return existing_objects

print(get_list_of_objects('my-bucket'))
Of course, before running this code you need to configure AWS credentials, for example with the AWS CLI.
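If you haven't done that yet, a typical way looks like this (assuming the AWS CLI is installed, and that 'my-profile' matches the profile name used in the code above):

```shell
# Prompts for access key, secret key, default region and output format,
# and stores them under the named profile in ~/.aws/credentials
aws configure --profile my-profile
```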
Happy coding!