The following post provoides a brief overview of the benefits and challenges of analyzing data in Redshift and provides simple instructions for connecting Redshft to Row Zero to import and analyze big data sets. For the fastest path, watch the video below and connect your Redshift instance now.
- What is Redshift?
- Why is it helpful to analyze Redshift data in a spreadsheet?
- Why is it hard to analyze data stored in Redshift with a spreadsheet?
- How to analyze Redshift data in a spreadsheet
What is Redshift?
Redshift is a database offered by Amazon Web Services. Redshift is unique in that is it built on top of a massive parallel process technology that allows it to process billions of rows at once. Redshift is based on an older version of PostgreSQL and best suited for analytics workloads. It can connect to most applications via JDBC and ODBC connections is reported as the most commonly deployed cloud data warehouse.
Why is it helpful to analyze Redshift data in a spreadsheet?
Data stored in Redshift is often only accessible to analytics and BI teams who know how to write SQL or work with BI tools. Business, operations, and marketing teams are most comfortable in spreadsheets but are often relegated to using complicated BI tools or making requests of their analytics teams to get the data they need. With Row Zero, business, operations, and marketing teams can connect directly to Redshift, import data into the spreadsheet and perform the analysis they need. Row Zero is enabling true self-serve analytics. Below are a few example use cases:
- Live models - Build a model around a live data set. Instead of recreating that model on a weekly or monthly cadence, simply refresh the data import from the database and the Row Zero workbook updates every value.
- No SQL needed - Enable business, operations, and marketing teams to access and analyze big data sets without having to learn new tools or make requests of their BI team.
- Explore and visualize - Analyze data by both viewing the individual rows, graphing, and pivoting the data. Drill down into the actual values to understand where anomalies exist.
- Ad-hoc analysis - Get answers quickly by importing disparate data sets, building an analysis with simple spreadsheet skills, and share with business partners.
- Collaborative decision making - Multi-user collaboration makes decision-making easier when every business partner has access to the golden data set and can share the analysis in a spreadsheet format others can easily audit and inspect.
- Join data sets - Often times an analysis requires joining two distinct data sets that may not reside in the same database. Row Zero lets users connect any number of data sources and upload CSVs. Then join the data sets with join statements, VLOOKUP/XLOOKUP, or Python.
Why is it hard to analyze data stored in Redshift with a spreadsheet?
While Redshift makes it incredibly easy to store and access large amounts of data, it can be challenging to analyze data stored in Redshift using a spreadsheet for 2 reasons:
1. Dataset Size
Analyzing and processing large amounts of data is not easy and popular spreadsheet applications, Microsoft Excel and Google Sheets, slow down or crash when big data sets are imported. See blogs Excel Row Limit and Google Sheets Row Limit for more information. Row Zero is a spreadsheet designed for big data and can process hundreds of millions of rows of data.
Due to the performance deficiencies of legacy spreadsheets, companies typically resort to paying for expensive BI and dashboard tools, like Looker, Tableau, and Power BI. These tools require a data engineer to set up and limit analysis to predefined aggregations and pivot table style analyses whereas a powerful spreadsheet, like Row Zero, gives every employee the ability to work with big data without the extra setup and education overhead.
2. Need to know SQL
Once a company stores data in Redshift, a user must write sql queries in order to access and analyze the data. Many analysts in finance, marketing, and operations do not know SQL, often creating a queue of requests sitting with BI teams. Businesses often try to mitigate this problem by providing BI and dashboard tools to business users. While these tools are nice for high level metrics and claim to provide analyitcs for everyone, they are not actually useful for everyone as business users still want to export the data to a spreadsheet to perform their own analysis. Row Zero operates on the same spreadsheet skills business partners already have and is instantly usable by everyone in a company, even on big data sets.
How to Analyze Redshift Data in a Spreadsheet
There are 3 steps to easily connect Redshift to Row Zero and analyze data in the spreadsheet. Follow the instructions below to get started.
- Connect Row Zero to your Redshift instance
- Write a query to import data
- Begin analysis in Row Zero
1. Connect Row Zero to Redshift
In Row Zero click on the 'data source' icon in the right-hand menu and then click the button for 'New data source.'
The connection requires 6 pieces of information defined below. Name - This field can be anything that helps identify the data source being connected. (e.g. "Production Data" or "Product Metrics") Host - This is the Redshift address that typically takes the form of a string like '12345xyz.redshift.amazonaws.com'. Port - This is always 5439. User - This is the username for your Redshift database. Password - The password used to log into your Redshift Database. Database - The name of the Redshift database being connected to.
Once all the information is entered, hit 'Test connection' to ensure the information is correct. If the test connection icon turns green and says 'Connected', proceed by clicking on '+ Add Source.'
2. Write a query to import data
Now that the Redshift databse is connected to Row Zero, write a query to pull data into the spreadsheet. First select the 'Schema' from the drop down. Write a query in the query editor. The easiest query to write is the 'select *' statement, which pulls in the entire table. Example:
select * from TABLENAME
Row Zero is equipped to handle large data sets and will easily ingest the entirety of a table. Write more complicated queries to pull in various subsets of data stored in a Redshift Database. All results will be displayed in the connected data table in the spreadsheet.
3. Begin analysis in Row Zero
Redshift is a popular and effective tool for storing and accessing large amounts of data. The challenge with analyzing data stored in Redshift is Excel and Sheets often cannot open big data sets and BI/dashboard tools are expensive and difficult to learn. Row Zero makes it easy to analyze data stored in Redshift by connecting to a company's Redshift instance, importing the data directly into and blazingly fast spreadsheet, and allowing anyone to sort, filter, pivot, and graph the data. You can open your own Row Zero workbook and connect to Redshift today.