The following post covers the pros and cons of connecting AWS Redshift to Google Sheets and provides instructions for the top 5 ways to connect. Connecting Google Sheets to Redshift can make it easy for anyone to pull data into a spreadsheet for analysis. An alternative to Google Sheets is Row Zero, a spreadsheet designed for big data that more easily connects to Redshift and other cloud data warehouses. Read on for more information or skip to specific sections using the table of contents below.
Table of Contents
The top 5 ways to connect Redshift to Google Sheets:
- App Script - A free method of building a Google Sheets add-on, using Apps Script, that pulls data from Redshift into Google Sheets.
- Coefficient - A Google Sheets add-on with an easy-to-use GUI that connects Redshift to the spreadsheet. Note, there is a 5000 row limit on the free tier.
- Download and Open a CSV - A simple, free process of exporting data from Redshift as a CSV and uploading it to Google Sheets.
- Row Zero - A spreadsheet designed for big data with a free tier and built-in connection to Redshift. Use it and share it just like Google Sheets but with larger data sets.
- Other Connectors - A number of pre-built connectors with varying degrees of complexity and cost.
- What is Redshift
- Pros and Cons of Connecting Redshift to Google Sheets
- Conclusion
1. App Script
To connect Amazon Redshift to Google Sheets and import data directly into a spreadsheet, you can use Google Sheets Apps Script to create a custom function that connects to AWS Redshift via JDBC. Here's how you can achieve this:
1. Open Google Sheets:
- Open the Google Sheets workbook where you want to import your Redshift data.
2. Access Apps Script:
- Go to
Extensions
>Apps Script
.
3. Write Your Script:
- Copy and paste the following code into the Apps Script editor:
function connectToRedshift() { // Replace with your Redshift details var url = 'jdbc:redshift://<your-cluster-url>:<port>/<database>'; var user = '<your-username>'; var password = '<your-password>'; var query = 'SELECT * FROM <your-table> LIMIT 100'; // Customize the query // Establish the connection var conn = Jdbc.getConnection(url, user, password); // Execute the query var stmt = conn.createStatement(); var results = stmt.executeQuery(query); var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet(); // Get column metadata and add to the sheet var numCols = results.getMetaData().getColumnCount(); var data = []; // Add column headers var headers = []; for (var i = 1; i <= numCols; i++) { headers.push(results.getMetaData().getColumnName(i)); } data.push(headers); // Fetch and add rows to the data array while (results.next()) { var row = []; for (var i = 1; i <= numCols; i++) { row.push(results.getString(i)); } data.push(row); } // Write data to the sheet sheet.getRange(1, 1, data.length, numCols).setValues(data); // Close the connection results.close(); stmt.close(); conn.close(); }
4. Replace Placeholders:
Update the following with your actual Redshift cluster details:
<your-cluster-url>
<port>
<database>
<your-username>
<your-password>
<your-table>
5. Authorize the Script:
- Save the script, run
connectToRedshift
, and grant any necessary permissions when prompted.
Important Considerations:
- Security: Avoid hard-coding sensitive credentials. Use environment variables or a secure method to handle credentials if you need a production-grade solution.
- Redshift Access: Ensure that your Redshift instance is configured to accept connections from the necessary IPs, which may require modifying security groups or VPC settings.
Note:
Google Apps Script includes Jdbc
as a native service, so there is no need to enable it separately.
2. Coefficient
Coefficient is a Google Sheets add-on that makes it easy to connect the spreadsheet to a number of different applications. Coefficient offers a free plan that limits import to 5000 rows. The starter plan is $50/mo with the same 5000 row limit and the Pro plan is $99/user/mo and has no row limit on import. To use Coefficient to connect Redshift to Google Sheets, follow the steps below.
- Install the Coefficient extension in Google Sheets. If starting from Coefficient's page, follow the instructions with the pop-up menu. If starting from your own Google Sheets workbook, use the top menu and click Extensions>Add-Ons>Get Add-ons. Search for 'Coefficient.' Select Coefficient from the search results and click 'Individual Install.'
- Follow the steps to authenticate your Google account with Coefficient and give the add-on access.
- Launch the application by going to Extensions>Coefficient>Launch.
- Once launched, follow the steps to answer questions and select AWS Redshift as your import source
- Enter your database host, port, database name, username, password, and SSL.
- Once connected, you can use the visual query editor or write your own sql query and click import.
- The data will load into a Google sheet. The data in Google Sheets can be automatically updated every time Redshift data updates if you use a paid tier of Coefficient.
- Be cognisant of the query result size as Google sheets has a cell limit (rows x columns) of 10M cells. Google Sheets also has a file size limit of 100MB. There is a blog specifically focused on the Google Sheets row limit here. To work with large data sets in a spreadsheet, use Row Zero, a spreadsheet designed for big data.
3. Download and Open a CSV
Opening a CSV extract from Redshift is the simplest way to get data from an Amazon Redshift database into Google Sheets. The downside is this method doesn't establish a live-connection and Google Sheets may have some issues if you try to open a large csv.
- Run Query: In your Redshift database tool (e.g., SQL Workbench), run a query to select the data you want to export.
SELECT * FROM your_table;
- Export to CSV: After running the query, export the result set to a CSV file. You can typically do this from the query tool's interface. Ensure that the CSV file contains headers.
- Open Google Sheets
- In Google Sheets, go to File > Import > Upload and select your CSV file.
- The CSV will be loaded into a new or existing sheet.
By exporting data from Redshift to a CSV file and then importing that file into Excel, you establish a simple yet effective connection. Keep in mind that this method is suitable for scenarios where you don't need real-time updates and can periodically refresh the data manually.
4. Row Zero
Row Zero is a spreadsheet that works just like Google Sheets but has native connectivity to Redshift and can open billion row datasets. Follow the instructions below to connect Row Zero to your Redshift account.
1. Connect Row Zero to Redshift
In Row Zero click on the 'data source' icon in the right-hand menu and then click the button for 'New data source.'
The connection requires 6 pieces of information defined below. Name - This field can be anything that helps identify the data source being connected. (e.g. "Production Data" or "Product Metrics") Host - This is the Redshift address that typically takes the form of a string like '12345xyz.redshift.amazonaws.com'. Port - This is always 5439. User - This is the username for your Redshift database. Password - The password used to log into your Redshift Database. Database - The name of the Redshift database being connected to.
Once all the information is entered, hit 'Test connection' to ensure the information is correct. If the test connection icon turns green and says 'Connected', proceed by clicking on '+ Add Source.'
2. Write a query to import data
Now that the Redshift database is connected to Row Zero, write a query to pull data into the spreadsheet. First select the 'Schema' from the drop down. Write a query in the query editor. The easiest query to write is the 'select *' statement, which pulls in the entire table. Example:
select * from TABLENAME
Row Zero is equipped to handle large data sets and will easily ingest the entirety of a table. Write more complicated queries to pull in various subsets of data stored in a Redshift Database. All results will be displayed in the connected data table in the spreadsheet.
3. Begin analysis in Row Zero
Once the data is loaded into the data table use all the features in Row Zero to analyze, filter/sort, pivot, and graph the data.
5. Other Connectors
Several other tools can connect Redshift to Google Sheets, though many involve third-party interfaces:
Each has varying degrees of complexity and cost.
What is AWS Redshift?
Redshift is a database offered by Amazon Web Services. Redshift is unique in that is it built on top of a massive parallel process technology that allows it to process billions of rows at once. Redshift is based on an older version of PostgreSQL and best suited for analytics workloads. It can connect to most applications via JDBC and ODBC connections and is reported as the most commonly deployed cloud data warehouse.
The Pros and Cons of Connecting Redshift to Google Sheets
Pros
Production Data
Connecting Google Sheets to Redshift enables financial and accounting models to be built on product or operational data stored in Redshift.
Live Models
Once Redshift is connected to a spreadsheet, like Google Sheets, it is easy to re-run or refresh the Redshift query to pull the latest data into the spreadsheet without having to recreate another static analysis.
Dashboards
With a connected spreadsheet it is easy to turn a simple analysis into a dashboard. Simply refresh the queried data from Redshift and all graphs, pivot tables, and spreadsheet models will update.
Cons
Security
Importing data from Redshift to Google Sheets poses an information security risk because the data is moving from a cloud data warehouse where activity is traceable and behind layers of security to a locally accessable Google Sheets file stored on a computer. Once the data is in Google Sheets, that data goes wherever the computer goes. The computer could be lost, stolen, or shared. The file could also be emailed to personnel who wouldn't otherwise have access to the data. A more secure option is a cloud based spreadsheet, like Row Zero
Data Set Size
While Google sheets is a great tool it cannot support large data sets. Google sheets will often slow down or crash as data sets sizes reach tens of thousands or a hundred thousand rows, depending on the number of columns. For more information on the Google Sheets row limit, read our previous blog. It is common for data sets stored in Redshift to be quite large. To import data from Redshift to Google sheets, the Redshift query will need to trim the data down to a size Google Sheets can manage or it will be better to use a more powerful spreadsheet, like Row Zero.
Cost
Some methods for connecting Redshift to Google Sheets charge a monthly fee. Depending on the company's budget, paying for another service may or may not be acceptable
Complexity
Certain options for connecting Redshift to Google Sheets are technically complicated and not easy to maintain. They may require writing code or regularly repeating a process every time the Redshift data is updated.
Conclusion
After reviewing the popular options for connecting Redshift to Google Sheets, it is clear there are a number of different options. Some options, like CSV import, are simple and free. Others, like App Script, are more technically complex. Products like Coefficient make it easy but have row limits on data import and charge for auto-updating the connected data. Above all, the biggest challenge is that most data sets in Redshift are rather large and Google Sheets struggles with large data sets. To get full utility out of connecting Redshift to a spreadsheet, the best option is Row Zero, a spreadsheet designed for big data. Row Zero provides a simple UI to connect to Redshift and can open a billion of rows of data. Try Row Zero for free and connect Redshift now.
If you use another cloud data warehouse or blob storage, Row Zero also connects to Snowflake, Databricks, BigQuery, Postgres, and S3.