
10 ways spreadsheets violate data governance and pose a security risk

2024-10-07 // Mark Tressler

Spreadsheets are the most widely used data tool in most organizations, used by employees of all skill and seniority levels. Because spreadsheets are so ingrained, they can pose a latent security risk without a proactive effort to monitor and improve spreadsheet data governance. Spreadsheets are a double-edged sword: they make it very easy for anyone to do a wide range of business-critical work, but that same ease of use, combined with a lack of security controls, can make traditional spreadsheets like Excel a security risk. They can also erode data integrity and lead to data leakage. Row Zero is a next-gen spreadsheet built for big data and enterprise data governance that solves many of these issues.

In this guide, we explore 10 ways traditional spreadsheets like Excel and Google Sheets may violate data governance, security, and privacy compliance. We’ll also show how proactive data governance policies and implementing Row Zero can significantly improve data security and integrity and prevent data leakage in your org.

10 Ways Traditional Spreadsheets May Violate Data Governance and Security

  1. Getting data into the spreadsheet
  2. Inadequate access controls
  3. Workarounds for large datasets or file formats
  4. Ungoverned sharing
  5. Loss of data integrity
  6. Storing sensitive data in an unsecure environment
  7. Lack of data masking
  8. Non-compliance with data retention policies
  9. No backup or recovery plans
  10. Untraceable data leakage

1. Getting data into the spreadsheet

The biggest source of data governance violations is likely how data is sourced into spreadsheets in the first place. Best practice would be a direct, secure connection from your "source of truth" data to your spreadsheet that enforces user and row level security access to data. No middleware connectors, no manual processes, no downloading/uploading. The reality at most orgs is a mess. Employees download CSVs of customer data out of a myriad of cloud software services like CRMs, marketing platforms, accounting/ERP software, and even BI tools.

Even when data does flow from your database or data warehouse to your spreadsheet, it may pass through a middleware connector, or be exported from Postgres, Snowflake, etc., downloaded locally, and imported into Excel or Google Sheets. This all creates a number of security risks and opportunities for data leakage and loss of data integrity.

How to improve: Set export permissions and access controls across your entire tech stack and set up a direct secure connection from your data warehouse to your spreadsheet software. Make your "source of truth" the easiest (and only) way for anyone to access sensitive data in a spreadsheet. This is where Row Zero can really help. Row Zero is a cloud spreadsheet that securely connects directly to your data warehouse or data source. Row Zero inherits access controls and row level security permissions from your data warehouse. Data is automatically kept in sync and never leaves the cloud.

2. Inadequate access controls

Typical spreadsheet files for Excel and Google Sheets lack robust access control mechanisms. In many cases, once a file is shared, it can be copied, downloaded, or forwarded with little or no restriction. This makes it challenging to ensure that only authorized personnel have access to sensitive information and could lead to violations of security compliance and privacy regulations like HIPAA (Health Insurance Portability and Accountability Act), CCPA (California Consumer Privacy Act), and GDPR (General Data Protection Regulation). You can password protect an Excel file, but this is often done as a one-off as opposed to default, meaning it can be forgotten or overlooked.

As mentioned above, a lack of access controls across the tech stack (in CRMs, marketing software, ERPs, etc.) can lead to ungoverned data downloads and data leakage into spreadsheets and beyond.

How to improve: Similar to the above, set access controls and restrict export permissions across your tech stack to prevent unauthorized data leakage. If your org uses Google Sheets, make sharing Restricted by default. Regardless of your tech stack, find a way to make it easy and default for data to flow from your data warehouse to spreadsheets in a way that maintains the user permissions and row level security from your data warehouse but also makes it easy for the typical user to get the data they need to do their work. Row Zero makes this easy and supports SSO and robust access controls for your spreadsheets.

3. Workarounds for large datasets or file formats

Several security risks of Excel and Google Sheets are related to big data files and formats. Excel has a max row limit of 1,048,576 rows and Google Sheets supports a max of 10 million cells. In practice, Excel slows down and Google Sheets crashes well below these limits.
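To make the limits above concrete, here is a minimal Python sketch that counts the rows and cells in a CSV and compares them with the two figures quoted in this section. The function name and the returned keys are made up for illustration, and the check only covers the hard limits, not the point where Excel or Google Sheets starts to slow down in practice.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576      # Excel row limit quoted above
SHEETS_MAX_CELLS = 10_000_000   # Google Sheets cell limit quoted above


def fits_in_spreadsheet(csv_text_stream):
    """Count rows and cells in a CSV text stream and compare them to both limits.

    The header row counts as a row, because it occupies a row in the sheet.
    """
    rows = 0
    cells = 0
    for record in csv.reader(csv_text_stream):
        rows += 1
        cells += len(record)
    return {
        "rows": rows,
        "cells": cells,
        "fits_excel": rows <= EXCEL_MAX_ROWS,
        "fits_google_sheets": cells <= SHEETS_MAX_CELLS,
    }


# Example with a tiny in-memory CSV; a real file would be opened with
# open(path, newline="", encoding="utf-8") and passed in the same way.
report = fits_in_spreadsheet(io.StringIO("id,email\n1,a@example.com\n2,b@example.com\n"))
```

A check like this can run wherever the file already lives, so nobody needs to send it to an outside service just to find out that it is too big.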

Because of these limits, users seek workarounds to get their work done. Users are forced to split big files into multiple smaller files, which invites errors and integrity issues. Or worse, they go to random third party websites that open or split large files and upload your sensitive data.

It's the same story for compressed files and large file formats. Google Sheets and Excel do not natively support large file formats like parquet files and compressed formats like .gz files. This again leads to use of unauthorized file reader websites or downloads of 3rd party software. This all contributes to data governance violations and potential security risks.
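As one hedged alternative to third-party file reader sites, a compressed CSV can be previewed locally with Python's standard library. This sketch assumes the file is a UTF-8 CSV compressed with gzip; the function name is hypothetical, and parquet files would need a separate library that is not shown here.

```python
import csv
import gzip
from itertools import islice


def preview_gzipped_csv(path, max_rows=5):
    """Return the first max_rows rows of a gzip-compressed CSV.

    The file is decompressed as a stream, so nothing is unpacked to disk
    and the data never leaves the machine it is already on.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), max_rows))
```

Because only the first few rows are read, the preview stays fast even for files far larger than a spreadsheet could open.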

How to improve: Train employees to never upload sensitive data to unauthorized websites and provide solutions for working with large files and large file formats. Row Zero supports billion row spreadsheets (1000x Excel's limits) and natively opens parquet and compressed files.

4. Ungoverned sharing

Spreadsheets and CSVs are often downloaded, copied, or shared via email with little or no restriction. This can proliferate data leakage across and outside your organization. Ungoverned sharing makes sensitive data more vulnerable to interception or unauthorized access. In particular, sharing personally identifiable information (PII), protected health information (PHI), or financial data this way can violate regulations like HIPAA, GDPR, and CCPA.

How to improve: Train employees to limit sharing, downloading, and copying sensitive data. Where possible, restrict permissions to share and download data. In general, employees should not share files. They should share links to files that require a secure login to access the data. Row Zero lets organizations restrict data export and restrict external sharing org wide. Fully keep your data in the cloud to prevent emailing files or other untraceable data leakage, while still supporting real-time collaboration and governed sharing.

5. Loss of data integrity

Poorly governed spreadsheet usage presents a number of data integrity issues. Traditional spreadsheets are highly prone to human error due to manual data entry, which can lead to inaccurate or incomplete data. Poor version control can result in multiple versions of the same spreadsheet, which can lead to confusion, errors, and no clear source of truth. You can protect sheets in Excel and lock cells, but source data isn't protected by default. Splitting CSV files to get under Excel data limits and working with static downloads instead of live connections both contribute to loss of data integrity. Most importantly, if parts of your org don't consistently source data from your central "source of truth", but rather export reports and CSVs out of their SaaS software (CRM, ERP, etc.), it's very hard to ensure everyone is working with the same data and same data definitions.

How to improve: Try to eliminate manual work to access or enter data into a spreadsheet. Ideally your spreadsheets should automatically connect to your data warehouse and automatically update as "source of truth" data updates. As much as possible, data should originate from a central data source (e.g. data warehouse) rather than originate in a spreadsheet or from 3rd party SaaS vendors (e.g. CRM). Row Zero makes it easy to connect spreadsheets directly to your data warehouse. Data teams can also build shared queries and provide refresh permissions (without revealing credentials) so non-technical users can easily work with live, connected data. With Row Zero's connected tables, source data is protected and cannot be overwritten. Data stays in sync across all of your spreadsheet work. When the source data updates, it automatically updates everything built on top of it including auto-updating pivot tables, charts, and formulas. Row Zero empowers orgs to make accessing "source of truth" data easiest and default.

6. Storing Sensitive Data in an Unsecure Environment

In many orgs, Excel files are often stored on personal or unsecure devices (e.g. local hard drives, USBs) without adequate security controls. Spreadsheets are often saved in plain text or minimally secured formats, making them vulnerable to unauthorized access if stored on unsecured devices or shared improperly. Sensitive data like PII, financial data, or health records should be encrypted at rest and in transit, as required by regulations like GDPR, HIPAA, or CCPA. In addition to saving and storing data in an unsecure environment, emailing Excel files and CSV attachments also creates security risks.

How to improve: Create a data and technical infrastructure where employees cannot save sensitive data to a local machine or attach sensitive files to an email. Ideally data managers should be able to track everywhere data is accessed or modified. Employees should share links to data that require a login rather than saving, downloading, or sharing files. Row Zero makes this easy and keeps your data governed and in the cloud.

7. Lack of Data Masking

Traditional spreadsheet usage lacks inherent data masking for sensitive data, meaning sensitive fields such as credit card numbers or social security numbers can be fully visible to any user. It's possible to use custom formatting or write a masking formula to replace sensitive data with "*" or random characters, but that happens only after the sensitive data already shows in the spreadsheet. Lack of data masking can lead to unintentional exposure of sensitive data, violating privacy regulations and internal data protection policies.

How to improve: Best practice is to prevent the sensitive data from ever showing in the spreadsheet in the first place, especially to anyone that is not authorized to see it. Row Zero inherits the data permissions and row level security from your data warehouse which helps prevent unauthorized viewing of sensitive data.
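One way to picture masking before the data ever reaches a spreadsheet is a small filter applied to text on its way out of the source system. The sketch below is illustrative only: the patterns assume US social security numbers written as 123-45-6789 and card numbers of 13 to 19 digits with optional spaces or hyphens, and keeping the last four digits visible is a design choice made here, not something this article prescribes. It does not describe how Row Zero or any data warehouse implements masking.

```python
import re

# Assumed formats (illustrative, not exhaustive):
# SSNs as 123-45-6789, cards as 13-19 digits with optional spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def _mask_card(match):
    """Replace all but the last four digits of a matched card number with '*'."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def mask_sensitive(text):
    """Mask SSNs and card numbers in a string, keeping only the last four digits."""
    text = SSN_PATTERN.sub(r"***-**-\1", text)
    return CARD_PATTERN.sub(_mask_card, text)


masked = mask_sensitive("SSN 123-45-6789, card 4111 1111 1111 1111")
```

Pattern matching like this will miss values in unexpected formats, so it works best as a backstop behind permissions enforced at the data source.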

8. Non-compliance with Data Retention Policies

Locally saved Excel files, locally downloaded CSVs, and even Google Sheets spreadsheets are often not managed under formal data retention policies, leading to the indefinite storage of sensitive data. Keeping sensitive data longer than necessary can violate data retention and deletion regulations under GDPR, HIPAA, CCPA, or other regional laws. In addition, when users request to delete their data, these regulations typically require data to be fully deleted from a company's records. If user data lives in static Excel spreadsheets or was ever part of a downloaded CSV from any source (CRM, database, data warehouse, etc.), it's practically impossible to track down and delete the data.

How to improve: Ideally your system logs data queries, data changes, and tracks who accessed or modified the data. In terms of data retention, best practice is for data deletion to happen automatically by default. For example, files and data are automatically deleted when they haven't been opened for 365 days. For enterprise accounts, Row Zero can enforce automated data deletion rules. With Row Zero's connected tables, deleting data at the source will also delete it from connected tables in spreadsheets that reference it when the data is refreshed.
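For files that already sit on shared drives, the 365-day rule above can be approximated with a short audit script. This is a sketch under stated assumptions: it uses the last-modified time as a stand-in for "last opened", because access times are not reliably recorded on many systems, and the function name and suffix list are hypothetical. It only reports candidates and deletes nothing.

```python
import time
from pathlib import Path

# Assumed set of file types to audit; extend as needed.
SPREADSHEET_SUFFIXES = {".csv", ".xlsx", ".xls"}


def find_stale_files(root, max_age_days=365, now=None):
    """List spreadsheet-like files under root not modified in max_age_days.

    Last-modified time is used as a proxy for last-opened time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SPREADSHEET_SUFFIXES:
            if path.stat().st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

The resulting list can go to a data owner for review before anything is removed, which keeps deletion auditable.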

9. No backup or recovery plans

Due to the versatility and widespread use of spreadsheets, a significant amount of critical data may only live in spreadsheets in an org. Excel files may be stored locally and not always included in enterprise backup solutions, making them vulnerable to loss or corruption. Failure to back up critical spreadsheets can lead to data loss during system crashes or accidental deletions. This is especially true when working with complicated models or large files that may cause Excel or Google Sheets to crash.

How to improve: Ensure spreadsheets are included in enterprise backup solutions and reduce the amount of business critical data that only lives in one spreadsheet. Ideally spreadsheets should source critical data from a central data source like a data warehouse, instead of being the source of critical data. As a cloud-based spreadsheet, Row Zero reduces the likelihood of suffering data loss from local machine crashes, makes it easy to dynamically source data from data warehouses, and also generally handles large models and data sizes better than Excel and Google Sheets.

10. Untraceable Data leakage

Perhaps the biggest data governance issue with traditional spreadsheets is that they can lead to untraceable, widespread data leakage. Thanks to their widespread usage and ease of duplication and sharing, spreadsheets have a tendency to quickly proliferate data into many files, versions, and users. In isolation, each of these iterations may be harmless, but in aggregate across a whole organization over a number of years, this data leakage can lead to serious problems. When sensitive data can easily be put in an email or physically picked up (when it's saved locally), your enterprise may be put at serious risk of a data breach or compliance violation.

How to improve: Set up an infrastructure that lets data managers monitor all spreadsheet usage in an organization. Restrict data export as much as possible across your tech stack, restrict external sharing, and restrict access by default. In general your org should not share files, even internally. You should be sharing links to files that require a login to access. This is a much stronger default approach to data protection. Row Zero is built for data governance and offers these features to help you significantly reduce data leakage in your org.
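As a rough way to see how far copies have already spread, identical files on a shared drive can be grouped by content hash. The sketch below assumes read access to the folder being audited, and the function name is hypothetical. It only finds byte-for-byte duplicates, so a copy that was edited or re-saved will not be detected.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root):
    """Group files under root that have identical contents (same SHA-256 digest)."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in chunks so large files are not loaded into memory at once.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each group returned is a set of exact copies, which gives data managers a starting list of files to consolidate behind a single governed link.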

Conclusion

Because spreadsheets are so versatile and widely used, they present a number of data governance issues. Evaluate your spreadsheet security risk with Excel and Google Sheets. If your org works with big data or sensitive data in a spreadsheet, Row Zero is a great alternative to Excel and Google Sheets. Row Zero makes it easy to directly connect spreadsheets to your source data and has a number of enterprise security features that help improve data governance and security across your spreadsheets. If you're serious about reducing data leakage in your org and complying with security and privacy regulations, then you need to take spreadsheet data governance seriously and incorporate spreadsheets into your data governance plan. Using Row Zero is a great first step. You can try Row Zero for free or compare plans here.
