The remove duplicates feature instantly deletes duplicates from the sheet, but is a one-time action. You can automate removing duplicates as new data updates with several techniques.
In this guide, we’ll show how to automatically handle duplicates in spreadsheets using Row Zero, but many of these techniques can be used to remove duplicates in Excel and Google Sheets on smaller datasets.
Row Zero is an enterprise-grade spreadsheet that makes it easy to remove duplicates from massive files or auto-updating, connected spreadsheets. Row Zero works like Excel and Google Sheets but supports 1000x bigger datasets and connects directly to your data source.
How to remove all duplicates one time
Row Zero has a built-in remove duplicates feature that lets you instantly remove all duplicates in a spreadsheet range with a click. This works similar to remove duplicates in Excel and Google Sheets. Since Row Zero can handle 1000x bigger datasets, you can easily remove duplicates from massive datasets in a spreadsheet.
How to use the remove duplicates feature:
- Open up a Row Zero workbook and get your data ready. You can import a file or connect to your data source.
- Select the range of cells to remove duplicates from. If you don't select a range, it will select the entire sheet by default.
- In the header menu, go to Data, Remove duplicates.
- Select the column(s) to remove duplicates from. If you select multiple columns, it will only remove duplicate rows that have duplicate values in each of the columns selected. By default, all columns are selected, but in many instances you may want to select just one or two identifier columns like a GUID or email address.
- Click Apply and all matching duplicate rows are instantly deleted while the first instance of each duplicated value is retained.
Important tip: Be thoughtful about sorting on a particular attribute before removing duplicates. For example, you may want to sort on date to only keep the first or last instance of an entity.
Keep in mind, this is a one-time remove duplicates. It's great for removing duplicates from static files like big CSV files, parquet files, and other large file formats. However, you'll want to use one of the below solutions to automate deleting duplicates from dynamic spreadsheets that are auto-updating with new data.
How to automatically remove duplicates in updating spreadsheets:
Row Zero makes it easy to build connected spreadsheets that automatically update with the latest data, with built-in connectors to Snowflake, Databricks, Postgres, and more. You can also dynamically update spreadsheets with new files.
In these updating spreadsheets, you'll want to automate deleting duplicates and/or account for duplicate values. Row Zero offers several features to automatically remove duplicates from updating data with connected tables and any of the below features.
Dynamic pivot tables
Row Zero pivot tables dynamically update with new data and can be built on connected tables in sync with your data source. By default, pivot tables only include unique rows for each set of fields assigned to Rows, so they are an easy way to remove duplicates and/or show and summarize data by unique values. You can also use the built-in pivot table count unique and count calculations to count unique values and duplicate values.
You can also dynamically build calculated columns on top of Row Zero pivot tables, use pivot tables as source data, and create pivot charts, so they are a good tool for summarizing and re-orienting your data around unique values.
Dynamic HISTOGRAM function
The HISTOGRAM function creates a table of unique values from a specified range or connected table column and counts how many times each value appears. It's a fast way to count unique values and automatically sorts most to least.
UNIQUE function
The UNIQUE function creates a list of unique values from a specified range or connected table column. This creates a new table with only unique values and duplicates automatically removed. It sorts ascending by default, but can be filtered or sorted descending. This can be helpful when you want to build out a clean new table that auto-updates. You can easily add calculated columns using COUNTIFS, SUMIFS, etc. to summarize your data and/or use lookup functions like XLOOKUP or MULTILOOKUP to lookup corresponding values from other datasets.
Counter columns
You can use the COUNTIF and COUNTIFS functions to create a counter column that automatically counts the number of times a value appears in a range. You can then filter this column to "1" to only see unique, non-duplicated values or sort all values on count. Counter columns unlock a myriad of ways to automatically highlight, sort, clean, and transform data based on value count. Here is an example of a dynamic counter column added to our UNIQUE table above.
Conditional format duplicates
You can use the conditional formatting feature to find and highlight duplicate values automatically. Just set the Condition to Duplicate to highlight duplicates with your desired formatting. You can also highlight duplicates across multiple columns or conversely highlight unique, non-duplicated values.
Dynamic filtering with the tilde operator
You can use the tilde operator "~" in the formulas above to also have formulas filter in sync with connected table filtering (see screenshot below). The ~ is added right before the data table reference in the formula.
Note that the features above do not remove duplicates from your connected source data in your spreadsheet. Rather, they can be used to automatically transform your source data into a new data table that is automatically cleaned and preprocessed to power dynamic dashboards, charts, analysis, etc. At any time, you can also write-back clean data from your spreadsheet to your data source, using the 2-way data sync feature.
Conclusion
Spreadsheets are the most popular tool for data cleaning, and accounting for and/or deleting duplicates is one of the most common data cleaning actions. You can use the remove duplicates feature to automatically delete all duplicates in the spreadsheet, but this is a one-time action. If you have a connected, updating spreadsheet and want to automate removing duplicates as data updates, you can use several features including dynamic pivot tables, and several functions that remove duplicates automatically like UNIQUE and HISTOGRAM. You can also use COUNTIF to create a counter column to track duplicates or conditional formatting to automatically highlight duplicates.
If you're working with large datasets, it can be a challenge to automatically remove duplicates in Excel or Google Sheets due to their data limits. The Excel limit is 1,048,576 rows and the Google Sheets limit is 10 million cells. Row Zero is an enteprise-grade spreadsheet built for big data that makes it easy to remove duplicates from big data sets. You can also connect to a data source to create auto-updating spreadsheets and leverage the features above to automatically remove duplicates as data updates.