Row Zero is the best spreadsheet for big data. Try for free →

How to connect Google Sheets to Databricks

2024-09-07 // Nick End, Founder

The following post covers the pros and cons of connecting Databricks to Google Sheets and provides instructions for the 5 top ways to get Databricks data into a spreadsheet. Connecting Google Sheets to Databricks can make it easy for anyone to pull data into a spreadsheet for analysis. An alternative to Google Sheets is Row Zero, a spreadsheet designed for big data that more easily connects to Databricks and other cloud data warehouses. Read on for more information or skip to specific sections using the table of contents below.

Table of Contents

The top 5 ways to connect Databricks to Google Sheets:

  1. Coefficient - A Google Sheets add-on with an easy-to-use GUI that connects to Databricks with a 5000 row limit on the free tier.
  2. CSV Upload - A simple, free process of exporting data from Databricks and uploading it to Google Sheets.
  3. Row Zero - A spreadsheet designed for big data with a free tier and built-in connection to Databricks. Use it and share it just like Google Sheets but with larger data sets.
  4. App Script - A free, somewhat complicated way of building a Google Sheets add-on that pulls data from Databricks into Google Sheets.
  5. Other Connectors - A number of other connector options with varying degrees of complexity and cost.

1. Coefficient

Coefficient is a Google Sheets add-on that provides a GUI for connecting your spreadsheet to Databricks. It simplifies the process, but the free version has a row limit. Coefficient Databricks Connector

  1. Install Coefficient via the Extensions > Add-ons > Get add-ons menu.
  2. Search for "Coefficient" and install it.
  3. After installation, launch Coefficient and follow the prompts to select Databricks as the data source.
  4. Enter your Databricks connection details, including the server URL and access token.
  5. Choose your data source and configure your query or use the visual query editor to import data.
  6. Data can be refreshed automatically with a paid Coefficient plan.

2. CSV Upload

A simple, manual way to transfer data is by exporting a CSV from Databricks and uploading it to Google Sheets. This method doesn't provide live syncing but is easy to implement:

  1. Log in to your Databricks workspace.
  2. Run your query by clicking on "SQL Editor" in the left-hand menu. Write your query (e.g.select * from TABLENAME) in the SQL editor and hit "Run."
  3. Export the result set as a CSV.
  4. In Google Sheets, go to File > Import > Upload and select your CSV file.
  5. The CSV will be loaded into a new or existing sheet. Databricks run query and download CSV

3. Row Zero

Row Zero is a spreadsheet that works just like Google Sheets but has native connectivity to Databricks and can open billion row datasets. Follow the instructions below to connect Row Zero to your Databricks account.

1. Connect Row Zero to Databricks

In Row Zero click on the 'data source' icon in the right-hand menu and then click the button for 'New data source.' New connection New data source Add new connection

The connection requires 4 pieces of information defined below.

  • Name - This field can be anything that helps identify the data source being connected. (e.g. "Production Data" or "Product Metrics")
  • Server Hostname - This is the Databricks warehouse name, which can be found in your Databricks portal under the SQL Warehouses menu. In SQL Warehouses, click on the warehouse you wish to connect to. Your server hostname is listed under Connection details and is the long string of letters and numbers under "server hostname" at the top of the screen (e.g. cbd-9hfliu9u-83jd.cloud.databricks.com). Databricks server hostname
  • Warehouse ID - Warehouse ID is under the SQL Warehouses > Overview > Name (e.g. 0923idj093jd9j11) Databricks Warehouse ID
  • Access Token - Your access token can be generated by going to your account icon in the upper right-hand corner of the Databricks portal, selecting "Settings" and then "Developer." Click the blue button to generate your access token. If you do not see a blue button, contact your Databricks account admin as they will need to update your Databricks personal access token permissions. Databricks access token

Once all the information is entered, hit 'Test connection' to ensure the information is correct. If the test returns green, you can select "+ Add source" to add the data source. Once added and connected, you will be asked to select the 'Catalog' and 'Schema.' Databricks select database

2. Write a query to import data

Now that Databricks is connected to Row Zero, begin writing a query to pull data into the spreadsheet. First select the 'Database' and 'Schema' from the two drop downs. Write a query in the query editor. The easiest query to write is the 'select *' statement, which pulls in the entire table. Example:

select * from TABLENAME

Row Zero is equipped to handle large data sets and will easily ingest the entirety of a database. Write more complicated queries to pull in various subsets of data stored in Databricks. All results will be displayed in the 'connected data table' in the spreadsheet. Databricks write query

3. Begin analysis in Row Zero

Once the data is loaded into the data table use all the features in Row Zero to analyze, filter/sort, pivot, and graph the data.

Databricks query table graph

4. App Script

Google Sheets provides an app development feature called Apps Script. You can use it to build custom functions and connect to external data sources, like Databricks. Databricks supports JDBC, which you can leverage through Apps Script to query and import data.

To connect Databricks to Google Sheets using Apps Script, follow these instructions:

  1. Download the Databricks JDBC Driver To start, download and install the Databricks JDBC driver.

  2. Create a new App Script in your Google Sheet by going to the top menu and selecting Extensions > Apps Script.

  3. When the app script window opens, delete the sample 'function myfunction()' code and start with the provided code below. Replace the example values with your own Databricks details.

// Replace the variables with actual values
var address = 'my.databricks.server.address:port'; // e.g., dbc-xxxxxxxx-xxxxxxxx.cloud.databricks.com:443
var token = 'YOUR_DATABRICKS_ACCESS_TOKEN';
var clusterId = 'YOUR_CLUSTER_ID';
var db = 'YourDatabaseName'; 

var dbUrl = 'jdbc:spark://' + address + '/sql/protocolv1/o/' + db;
  1. Write a function to add a custom menu in Google Sheets for running the script:
function onOpen() {
  var spreadsheet = SpreadsheetApp.getActive();
  var menuItems = [{name: 'Import Databricks data', functionName: 'connectToDatabricksData'}];
  spreadsheet.addMenu('Databricks Data', menuItems);
}
  1. Write a function to get the first empty row in a sheet:
function getFirstEmptyRowByColumnArray(spreadSheet, column) {
  var columnRange = spreadSheet.getRange(column + ":" + column);
  var values = columnRange.getValues();
  var ct = 0;
  while (values[ct] && values[ct][0] !== "") {
    ct++;
  }
  return (ct + 1);
}
  1. Write the main function to query Databricks and write the data to Google Sheets:
function connectToDatabricksData() {
  var thisWorkbook = SpreadsheetApp.getActive();

  // Select the target sheet
  var selectedSheet = Browser.inputBox('Which sheet would you like the data to post to?', Browser.Buttons.OK_CANCEL);
  if (selectedSheet == 'cancel') return;

  if (thisWorkbook.getSheetByName(selectedSheet) == null) {
    thisWorkbook.insertSheet(selectedSheet);
  }
  var resultSheet = thisWorkbook.getSheetByName(selectedSheet);

  // Query Databricks
  var query = Browser.inputBox('Enter the query to pull data from Databricks:', Browser.Buttons.OK_CANCEL);
  if (query == 'cancel') return;

  var conn = Jdbc.getConnection(dbUrl, 'token', token);
  var stmt = conn.createStatement();
  var results = stmt.executeQuery(query);

  // Get column headers and data
  var rsmd = results.getMetaData();
  var numCols = rsmd.getColumnCount();
  var rowNum = getFirstEmptyRowByColumnArray(resultSheet, "A");

  if (rowNum === 1) {
    var headers = [];
    for (var col = 1; col <= numCols; col++) {
      headers.push(rsmd.getColumnName(col));
    }
    resultSheet.getRange(1, 1, 1, headers.length).setValues([headers]);
  }

  while (results.next()) {
    var rowData = [];
    for (var col = 1; col <= numCols; col++) {
      rowData.push(results.getString(col));
    }
    resultSheet.getRange(rowNum, 1, 1, rowData.length).setValues([rowData]);
    rowNum++;
  }

  results.close();
  stmt.close();
}
  1. After writing the script, save and run it. You will be asked to verify and authorize the Google Sheet to connect to Databricks. Once authorized, you can run the script from the custom "Databricks Data" menu in the sheet.

Google Sheets Databricks Connect run

5. Other Connectors

Several other tools can connect Databricks to Google Sheets, though many involve third-party interfaces:

Each has varying degrees of complexity and cost.

What is Databricks?

Databricks Computing is a cloud-based data warehousing platform that simplifies data storage and analysis. Unlike traditional data warehousing solutions, Databricks operates on a cloud-native architecture, enabling organizations to seamlessly store, manage, and analyze vast amounts of data across multiple clouds without the constraints of hardware limitations. Databricks utilizes a structured data storage model known as a "data lakehouse," which combines the benefits of data warehousing and data lakes. This architecture allows for separation of storage and compute, providing elasticity and scalability.

Pros and Cons of Connecting Databricks to Google Sheets

The benefits of connecting Databricks to Google Sheets

Production Data

Connecting Google Sheets to Databricks enables financial and accounting models to be built on product or operational data stored in Databricks.

Live Models

Once Databricks is connected to a spreadsheet, like google sheets, it is easy to re-run or refresh the Databricks query to pull the latest dataset into the spreadsheet without having to recreate another static analysis.

Dashboards

With a connected spreadsheet it is easy to turn a simple analysis into a dashboard. Simply refresh the queried data from Databricks and all graphs, pivot tables, and spreadsheet models will update.

Considerations when connecting Databricks to Google Sheets

Security

Importing data from Databricks to Google Sheets poses an information security risk because the data is moving from a cloud data warehouse where activity is traceable and behind layers of security to a locally accessable Google Sheets file stored on a computer. Once the data is in Google Sheets, that data goes wherever the computer goes. The computer could be lost, stolen, or shared. The file could also be emailed to personnel who wouldn't otherwise have access to the data. A more secure option is a cloud based spreadsheet, like Row Zero

Data Set Size

While Google sheets is a great tool it cannot support large data sets. Google sheets will often slow down or crash as data sets sizes reach tens of thousands or a hundred thousand rows, depending on the number of columns. For more information on the Google Sheets row limit, read our previous blog. It is common for data sets stored in Databricks to be quite large. To import data from Databricks to Google sheets, the Databricks query will need to trim the data down to a size Google Sheets can manage or it will be better to use a more powerful spreadsheet, like Row Zero.

Cost

Some methods for connecting Databricks to Google Sheets charge a monthly fee. Depending on the company's budget, paying for another service may or may not be acceptable

Complexity

Certain options for connecting Databricks to Google Sheets are technically complicated and not easy to maintain. They may require writing code or regularly repeating a process every time the Databricks data is updated.

Conclusion

After reviewing the popular options for connecting Databricks to Google Sheets, it is clear there are a number of different options for connecting. Some options, like CSV import, are simple and free. Others, like App Script, are more technically complex. Products like Coefficient make it easy but have row limits on data import and charge for auto-updating the connected data. Above all, the biggest challenge is that most data sets in Databricks are rather large and Google Sheets struggles with large data sets. To get full utility out of connecting Databricks to a spreadsheet, the best option is Row Zero, a blazingly fast spreadsheet designed for big data. Row Zero provides a simple UI to connect to Databricks and can easily handle many millions of rows of data. Try Row Zero for free and connect Databricks now.

If you use another cloud data warehouse or blob storage, Row Zero also connects to Snowflake, Redshift, BigQuery, Postgres, and S3.

FAQs