The easiest way to quickly analyze data stored in S3 is with Row Zero, a blazingly fast spreadsheet with native S3 connectivity.
Skip straight to the instructions to connect an S3 bucket.
What is AWS S3?
Amazon S3 (Simple Storage Service) is a cloud-based object storage service provided by Amazon Web Services (AWS). It enables individuals and organizations to store and retrieve data, images, videos, and other files on the internet.
AWS S3 is designed to be highly scalable, durable, and secure. The service allows users to store unlimited amounts of data in virtually any format, with high availability and reliability.
Some common use cases for Amazon S3 include backup and disaster recovery, data archiving, content storage and distribution, and web hosting. Many businesses also use Amazon S3 as a data lake for big data analytics and machine learning.
Challenges with analyzing data in S3
While S3 makes it incredibly easy to store large amounts of data, it can be challenging to analyze data stored in S3 for 3 reasons:
Dataset Size
Processing large datasets is a challenge for traditional spreadsheets like Google Sheets and Microsoft Excel (an Excel worksheet, for example, is capped at 1,048,576 rows) and requires a more powerful spreadsheet. Additionally, exporting data from S3 to a local machine and then uploading it to a hosted application is slow and tedious when datasets are large. It is much easier to connect directly to an S3 bucket and browse for files that open in your analytics tool of choice.
Access Control
S3 provides various access control mechanisms, such as IAM policies and bucket policies, which can restrict or limit access to data stored in S3. This can make it challenging to ensure that users have the necessary permissions to access and analyze the data they need.
Data Integration
Data stored in S3 may need to be integrated with other data sources before analysis. This can require additional processing and transformation steps to ensure data consistency and accuracy.
Analyze S3 Data In a Spreadsheet
There are 3 steps to easily analyze S3 data in a spreadsheet. Follow the instructions below to get started.
- Connect an S3 bucket to Row Zero.
- Browse the S3 bucket and select a file.
- Open the file in Row Zero.
Connect to an S3 bucket
The following instructions explain how to create an IAM role in AWS and grant permissions for a Row Zero workbook to connect to a specific S3 bucket and ingest data. To begin, open the Data menu and select Import from Amazon S3. You will see a link to the public datasets S3 bucket and a button in the upper right-hand corner that says + connect bucket. Click the button to connect a new bucket. The public datasets bucket is available to demo the integration and Row Zero features.
The first step is identifying the name of the bucket you intend to connect to Row Zero and entering it in the provided text box. Next, provide the Amazon Resource Name (ARN) of an IAM role that Row Zero can use. To get the ARN, follow the instructions below to create an IAM role for Row Zero and copy its ARN from the AWS console. At the top of the screen, the Row Zero AWS Account ID and your External ID are visible. Both are needed to create the IAM role.
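Before pasting a value into the ARN text box, it can help to confirm that it is a role ARN and not, say, a bucket ARN. The following is a minimal sketch, assuming the standard IAM role ARN shape `arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>`. The function name `parse_role_arn` is hypothetical and is not part of Row Zero or AWS tooling.

```python
import re

# Standard shape of an IAM role ARN. The role segment may include a path,
# e.g. "role/team-a/MyRole"; the role name is the last path component.
ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([A-Za-z0-9_+=,.@/-]+)$"
)


def parse_role_arn(arn: str) -> dict:
    """Split an IAM role ARN into its parts, or raise ValueError."""
    match = ROLE_ARN_RE.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, path_and_name = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        "role_name": path_and_name.rsplit("/", 1)[-1],
    }
```

For example, `parse_role_arn("arn:aws:iam::123456789012:role/RowZero-myS3bucket-ReadOnly")` returns the account ID and role name, while a bucket ARN such as `arn:aws:s3:::my-bucket` raises `ValueError`.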
Create an IAM Role
Log in to the AWS Management Console, open the IAM service, and choose Roles, then Create role
Select AWS account as the trusted entity type
Select Another AWS Account and enter 732940336628 as the account ID
Select Require external ID and enter the External ID shown at the top of the Row Zero screen
Click Next
Select Create Policy (this will open a new tab)
Select the tab that says JSON and replace the text with this:
Click Next: Tags
Click Next: Review
Give the policy a name like 'RowZero-myS3bucket-ReadOnly'
Click Create policy and close this browser tab
On the first browser tab, refresh the list of policies
Search for the policy you just created (e.g. RowZero-myS3bucket-ReadOnly), click the checkbox to its left, and click Next
Give the role a name, like RowZero-myS3bucket-ReadOnly
Scroll to the bottom of the page, and click Create Role
Click View Role at the top of the screen, copy the role ARN, and paste it into the ARN text box in Row Zero. Click Next.
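The steps above produce two JSON documents: a trust policy (generated by the console from the account ID and external ID you enter) and a permissions policy (the JSON you paste in). The exact policy text from the original instructions is not reproduced here, so the following is only a sketch of what a typical read-only setup looks like, assuming `s3:ListBucket` on the bucket and `s3:GetObject` on its objects are sufficient. The function names and the bucket name `my-example-bucket` are hypothetical, and Row Zero's own policy may list different or additional actions.

```python
import json

# Account ID entered in the "Another AWS Account" step above.
ROW_ZERO_ACCOUNT_ID = "732940336628"


def trust_policy(external_id: str) -> dict:
    """Trust policy letting the Row Zero account assume the role,
    but only when it presents the given external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{ROW_ZERO_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def read_only_policy(bucket: str) -> dict:
    """Assumed read-only permissions: list the bucket, read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print the permissions policy in the form the console's JSON tab expects.
    print(json.dumps(read_only_policy("my-example-bucket"), indent=2))
```

Note the split between the bucket ARN and the `/*` object ARN: list permissions attach to the bucket, read permissions to the objects, and a policy that mixes them up will typically let a user see nothing or fail on download.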
Import Data
After completing the above instructions, your S3 bucket is connected to Row Zero. Now, when you select the Data tab from the top menu, you can browse your S3 bucket and select a file, which will import into Row Zero where you can begin working with it. Even if the file is large, Row Zero is a blazingly fast spreadsheet designed to support large files. To see the product in action, try it here.