
Spreadsheet Data Governance

2024-01-01 // Tom Ward, Software and Analytics Consultant

I recently attended the kickoff of a new data governance program at a large healthcare organization. When a group of executives was asked what outcomes they wanted from the program, the CFO responded, “just get them off of their spreadsheets.”

Spreadsheets pose an interesting problem when it comes to modern data strategy and data governance. You’d be hard-pressed to find an organization with a data strategy that doesn’t include providing “self-service analytics” to users in some form or fashion, and you could argue that spreadsheets are still the most widely used “self-service” data analysis tool in existence. Spreadsheets are so ingrained in most organizations that it is difficult to develop a governance strategy that doesn’t, at the very least, account for them.

However, self-service analytics initiatives (and spreadsheets) are often at odds with the goals of data governance programs. There is a natural tension between giving users the ability to access, analyze, and share data in an application like a spreadsheet and administering and monitoring sound data governance policies. Spreadsheets, in their current form, pose a number of issues for data governance. With new innovation, like what I’ve seen with Row Zero, a modern, more powerful spreadsheet can help ease that tension. The main issues are:

Data security and compliance issues

Data security and compliance is one of the main pillars of an effective data governance program. If your data is not secure, everything else in data governance becomes moot.

However, many spreadsheet applications lack the basic data security and compliance features expected of modern data applications, such as strong password protection, fine-grained role-based access controls, and robust audit features for tracking and managing changes. Spreadsheets are often so deeply embedded in operational processes, and so far outside the visibility of IT and security teams, that they get a free pass when it comes to security and compliance standards.

Row Zero gives users familiar spreadsheet functionality with foundational data security and compliance features. To start, Row Zero connects directly to data sources and data warehouses, like Snowflake, Redshift, Postgres, S3, ERPs, and others. This connection enables BI and data teams to give their business partners read-only, real-time access to the cleaned and modeled data sets they want them to use. The data warehouse connection is tied to the warehouse’s user credentials, which enforce account-level permissions. Whether a user connects with personal credentials or a service account, data access is monitored at the source. Additionally, when a Row Zero user shares a workbook containing a data warehouse connection and a query, other users have no access to the credentials and can only re-run the query, with no ability to edit it. These conventions make it easy to give business teams self-serve access to clean, curated data while maintaining appropriate permissions and access controls.
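To make the idea of read-only access enforced at the source concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for a data warehouse. It is not Row Zero’s API or any warehouse’s permission model; the claims table and its values are hypothetical. The point is only that a connection opened read-only can query the curated data but is refused when it tries to change it.

```python
import os
import sqlite3
import tempfile

# Build a small throwaway database standing in for a curated warehouse table.
# The table name and values are hypothetical.
db_path = os.path.join(tempfile.mkdtemp(), "curated.db")
admin = sqlite3.connect(db_path)
admin.execute("CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, amount REAL)")
admin.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 120.0), (2, 75.5)])
admin.commit()
admin.close()

# Open the same file read-only, as an analyst-facing connection might be.
reader = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
rows = reader.execute(
    "SELECT claim_id, amount FROM claims ORDER BY claim_id"
).fetchall()

write_blocked = False
try:
    reader.execute("UPDATE claims SET amount = 0 WHERE claim_id = 1")
except sqlite3.OperationalError:
    # SQLite rejects writes on a connection opened with mode=ro.
    write_blocked = True
finally:
    reader.close()

print(rows, write_blocked)
```

In a real warehouse the equivalent control would usually be a role or grant managed by the data team, not a connection flag, but the effect for the spreadsheet user is the same: queries succeed and edits to the source do not.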

Scalability issues

Excel’s limit of 1,048,576 rows and 16,384 columns per sheet quickly becomes apparent when working with large transactional data sets. Nearly every spreadsheet user has run into performance issues, lags, and workbook freezes when working on large (or even not-so-large) data sets with Excel and similar traditional spreadsheet applications.
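As a rough illustration of how quickly those limits bite, the sketch below checks whether a data set fits on a single Excel sheet and, if not, how many sheets it would have to be split across. The function names and the 25-million-row example are hypothetical, and the sketch assumes each sheet repeats one header row.

```python
import math

# Per-sheet limits cited above for Excel worksheets.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_one_sheet(n_rows: int, n_cols: int, header_rows: int = 1) -> bool:
    """Return True if the data plus header rows fits on a single worksheet."""
    return n_rows + header_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


def sheets_needed(n_rows: int, header_rows: int = 1) -> int:
    """Number of worksheets needed if every sheet repeats the header rows."""
    rows_per_sheet = EXCEL_MAX_ROWS - header_rows
    return max(1, math.ceil(n_rows / rows_per_sheet))


# Hypothetical example: 25 million transaction rows with 40 columns.
print(fits_in_one_sheet(25_000_000, 40))  # False
print(sheets_needed(25_000_000))          # 24
```

Every additional sheet or workbook in that count is another place where the data can drift out of sync.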

From a data governance perspective, these issues are more than just user frustrations; they often cause second-order effects that impact data integrity. In practice, a frozen workbook can easily lead to lost unsaved changes and compromised data quality. Performance lags can quickly lead to data sets being split across multiple workbooks, with undocumented and untested ways of merging the data back together.
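When data has already been split across workbooks, even a simple documented check is better than an untested merge. The sketch below is one possible approach, not a prescribed method: it combines the parts and reports duplicate keys and missing rows against a row count taken from the source system. The function name, the txn_id column, and the sample data are all hypothetical.

```python
from collections import Counter


def reconcile_parts(parts, key, expected_rows):
    """Merge row lists and report problems instead of silently combining them.

    parts: list of lists of dicts (one list per workbook export)
    key: name of the column that should uniquely identify a row
    expected_rows: row count reported by the source system
    """
    merged = [row for part in parts for row in part]
    counts = Counter(row[key] for row in merged)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return {
        "merged_rows": len(merged),
        "missing_rows": expected_rows - len(counts),
        "duplicate_keys": duplicates,
    }


# Hypothetical data: two workbook exports that overlap on txn_id 3,
# while the source system reports 5 transactions.
part_a = [{"txn_id": 1}, {"txn_id": 2}, {"txn_id": 3}]
part_b = [{"txn_id": 3}, {"txn_id": 4}]
report = reconcile_parts([part_a, part_b], key="txn_id", expected_rows=5)
print(report)
# {'merged_rows': 5, 'missing_rows': 1, 'duplicate_keys': [3]}
```

Here the merged row count matches the expected total only by coincidence: one transaction is duplicated and another is missing, which a count-only comparison would not reveal.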

Row Zero can support hundred-million-row data sets and auto-saves workbooks as changes are made. It was designed to handle big data with the same flexibility and speed users expect from a spreadsheet. That performance makes it easy for any analytical person to work with large data sets without having to learn SQL or new BI tools.

Erosion of trust in data

One of the most valuable outcomes of a successful data governance program is establishing organizational trust in data and the flows of data throughout the organization (often through processes like data lineage). Spreadsheets by nature invite manual changes to data from multiple users, and those changes are error-prone and hard to track. Most people can relate to receiving a spreadsheet with a file name ending in something like “FINAL (version 2) - TW edits.” It’s easy to see how this quickly erodes trust in data. Row Zero’s collaboration features enable any number of users to work in one workbook, and the use of data tables ensures that, at a minimum, the source data is read-only and can’t be fat-fingered or altered.

Conclusion

Today, many users import CSVs or copy and paste data into Excel. This process is error-prone, wastes time, and results in .xlsx files being emailed around with no visibility into who has access to what. With Row Zero, users can connect directly to data warehouses, saving time and reducing copy/paste mistakes. Data teams also have visibility into where that data is consumed.

Overall, spreadsheets are so ingrained in the data practices of most organizations that any effective data governance program will necessarily have to account for them. Modern spreadsheet applications like Row Zero make it easier to reconcile spreadsheets with sound data governance practices.

FAQs