Getting Started With Boto3 - AWS SDK for Python

@geekgirl

Python is da best! Python makes it possible to automate daily repetitive tasks and free up time for more interesting and meaningful activities. While Python is awesome as a programming language, what makes it super useful is the large selection of libraries, tools, and frameworks that provide solutions for almost any problem we may face when automating various tasks and projects. Boto3 is one of the super useful tools I have recently discovered.

You use the AWS SDK for Python (Boto3) to create, configure, and manage AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). The SDK provides an object-oriented API as well as low-level access to AWS services.

My favorite Python web framework is Streamlit, because it makes publishing and sharing web apps for simple tasks easy without the need to spend countless hours on web development. Since my introduction to Streamlit I have managed to create more than a dozen apps specific to certain tasks and projects. A couple of them are Hive-based apps, but most are for other projects that involve data manipulation, data analysis, and automation. Some of these projects involve working with a lot of files in PDF and Excel formats and creating reports. Python has solutions for these tasks as well, with libraries like pdfplumber, openpyxl, etc.

Combining all of these tools with Heroku, creating shareable apps becomes super easy. One problem I hadn't been able to solve until now was storing the files used in these projects in the cloud, so that they are accessible anywhere and can be shared with teams. There are many file-sharing options in general. However, what I was looking for was the ability for the scripts themselves to access these files, create new ones, and store them in a way that makes them available whenever needed.

Unfortunately, it turns out Heroku doesn't offer a file storage option, so apps deployed on Heroku can't also store these files. What Heroku does offer, it seems, is an easy integration with AWS's S3 cloud storage. I hadn't used Amazon's AWS or S3 before and had no clue how to get started. After some quick research I came across Boto3, and I finally think I will be able to add improvements to the projects I am working on.

A Streamlit app running on Heroku actually does have solutions for uploading and downloading files, which I already use. What I couldn't do was store those files for future use. Storing could only be done on a local machine, which wouldn't help because the files would only be accessible to the user of that machine. By integrating Amazon's S3 cloud storage into the app, I think this problem will be solved. For this I would need to create an AWS account with S3. They offer a free tier option, which is great for experimenting while I am trying to learn how everything works. Boto3 allows accessing, uploading, and downloading files stored in AWS S3 programmatically from within a Python script.

Before we do any programming, we need to create an AWS account, which I just did a couple of hours ago. It was super fast. They offer a free tier option. However, they also say the free tier can be used for the next 12 months. I am not sure if it will need to be upgraded after 12 months. A full year of use is actually not bad, and plenty of time to get familiar with how everything works. If everything works as expected, paying for the services wouldn't be a big deal.

After creating an AWS account, we need to create a "Bucket", which is a container where we can store all of our files or objects. That was simple to do as well. Next, to be able to connect to our S3 we need public and private keys, or as AWS words it, an access_key_id and a secret_access_key. The root user doesn't seem to have these keys for security purposes, or I wasn't able to find them. But we can create new users, assign permissions to them, and get the keys for the new user. Once a new user for our S3 is created, the keys are displayed, and we need to store them somewhere for future use within our Python script.

After creating the account, a new bucket, and a new user with keys, I uploaded a file. Everything is ready on the S3 side; it is time to write a script. Nothing fancy, nothing complicated. At this point I just want to make sure Boto3 works, can actually connect to S3, and can perform some actions.
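I created the bucket and uploaded the first file through the AWS web console, but both steps can also be done from Python with Boto3 once the keys exist. Here is a minimal sketch, with placeholder keys and a hypothetical bucket name (create_bucket as written assumes the default us-east-1 region; other regions also need a CreateBucketConfiguration argument):

import boto3

# Placeholder credentials; substitute the keys from the IAM user.
s3_client = boto3.client('s3',
                         aws_access_key_id='AKIA...',
                         aws_secret_access_key='...')

# Create a bucket (bucket names are global, so this one must be unique).
s3_client.create_bucket(Bucket='my-example-bucket')

# Upload a local file under the object key 'myfile.pdf'.
s3_client.upload_file('myfile.pdf', 'my-example-bucket', 'myfile.pdf')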

Before starting with the Python script, I had to pip3 install boto3 so that it can be used within the script. Everything went without any problems on the first try. Below is the simple script I tested to see if I could download the file I uploaded earlier, along with the local file it reads the keys from.
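The script imports the keys from a local file named secrets.py sitting next to it. A minimal sketch of what that file might contain (the values below are placeholders; note that the name secrets.py shadows Python's standard-library secrets module, which is fine here as long as that module isn't needed):

# secrets.py -- lives in the same folder as the test script.
# Placeholder values; the real ones come from the IAM user created earlier.
access_key_id = 'AKIA...'
secret_access_key = '...'
bucket_name = 'my-example-bucket'

And the test script itself: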

# Keys and bucket name come from the local secrets.py above.
from secrets import access_key_id, secret_access_key, bucket_name
import boto3

# A session bundles the credentials for all subsequent AWS calls.
session = boto3.Session(aws_access_key_id=access_key_id,
                        aws_secret_access_key=secret_access_key)

s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket_name)

# Download every object in the bucket to a single fixed path.
for s3_object in my_bucket.objects.all():
    my_bucket.download_file(s3_object.key, 'Desktop/myfile.pdf')

I ran the script, and the file appeared on my desktop. Success!
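One thing worth noting about the loop above: every object is written to the same path, so if the bucket held more than one file, each download would overwrite the previous one. A small variation saves each object under its own key name instead (this sketch assumes flat key names without folder-style prefixes, and the Desktop path is just my choice of target folder):

import os
import boto3
from secrets import access_key_id, secret_access_key, bucket_name

session = boto3.Session(aws_access_key_id=access_key_id,
                        aws_secret_access_key=secret_access_key)
my_bucket = session.resource('s3').Bucket(bucket_name)

# Each object is saved under its own key name, so nothing gets overwritten.
for s3_object in my_bucket.objects.all():
    my_bucket.download_file(s3_object.key,
                            os.path.join('Desktop', s3_object.key))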

I know there isn't much going on in the code above. It simply downloads the file from the bucket in my S3 account. But the ability to do this simple action programmatically opens up possibilities for more interesting, meaningful, and useful scripts. When learning how to use a new library or tool, the most frustrating and annoying part is getting things working at the beginning, because when they don't work at the start, it is more difficult to figure out why. Setting up an S3 account and getting started with Boto3 went smoothly, and everything worked the first time.

The next step is to learn more about the functionality of Boto3 and integrate these functions into my existing projects. Hopefully, things will continue to progress smoothly. The Boto3 documentation is very useful and easy to follow. If you have used Boto3, make sure to share some tips and tricks in the comments.

Posted Using LeoFinance Beta