Excel has a row limit of 1,048,576 rows and a column limit of 16,384 columns per worksheet. In the post below, we provide 4 methods for working with big data sets that exceed the Excel row limit. If you are looking for a quick solution, try Row Zero, a blazingly fast spreadsheet designed to easily handle giant datasets and complex models.
Excel is a great tool for data sets smaller than the row limit, but in reality Excel starts to slow down well before the limit is hit. We wrote about the symptoms of slow Excel spreadsheets in a previous post. For datasets larger than the maximum number of rows, size itself is the obstacle, and as the world accumulates more data, users will hit Excel's limits more frequently. Below is an overview of why the Excel row limit exists and how to work around it. If you want to skip straight to the solutions, use the following table of contents:
- The effects of hitting the Excel row limit
- Why does Excel have a maximum number of rows?
- 4 methods for working with data sets bigger than the Excel row limit
What are the effects of hitting the Excel row limit?
When a user tries to import or open a data set that exceeds the maximum number of rows, Excel alerts them that the data set will not fit in the worksheet grid.
If 'OK' is selected, Excel proceeds with the import but truncates any rows beyond the maximum number.
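To avoid being surprised by that truncation, it can help to count the rows in a file before opening it. The following is a minimal Python sketch, assuming the data is a UTF-8 CSV file; the function names are ours, chosen for illustration.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit quoted in this post


def count_csv_rows(path):
    """Count CSV records (header included) without loading the file into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        # csv.reader treats a quoted field containing line breaks as one record
        return sum(1 for _ in csv.reader(f))


def exceeds_excel_limit(path):
    """Return True if the file has more records than one worksheet can hold."""
    return count_csv_rows(path) > EXCEL_MAX_ROWS
```

Because the file is streamed one record at a time, the check should work on files far larger than the available memory.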
Once the data is in, Excel may perform well if the computer being used has sufficient memory (RAM). If the computer has insufficient memory, a number of performance problems will appear. A more detailed overview of Excel's performance issues can be found in our blog post "Why is Excel slow or slow to open?" Among other things, working with large data sets, even those under the maximum number of rows, can cause Excel to crash, run slowly, or fail to save the file.
Why does Excel have a maximum number of rows?
Excel stores data in a grid format, with each cell having its own unique address based on its row and column position. Each cell can contain a certain amount of data, such as text, numbers, or formulas, and this data is stored in memory. As the number of cells in a worksheet increases, so does the amount of memory required to store and manage that data.
In addition, Excel needs to be able to perform calculations and manipulations on that data, such as sorting, filtering, and creating charts. These operations require processing power, and as the amount of data increases, so does the amount of processing power required.
Excel runs locally on a user’s computer and is therefore constrained by the amount of memory (RAM) and processing power (processor) on the machine. Its ability to handle large data sets and perform big computations is limited by that hardware. Users with slower computers will see slower performance, whereas users with more memory and faster processors will experience better performance. As a worksheet approaches the maximum of 1,048,576 rows, Excel can become unresponsive because of the amount of memory required to hold the data and support the features that manipulate it.
4 Methods for working with datasets bigger than the Excel row limit
1. Use a more powerful spreadsheet
Row Zero - Row Zero is a blazingly fast spreadsheet that is designed to have the same functionality as Excel but can handle multi-million row data sets. Row Zero runs in the cloud, so it is not hamstrung by the hardware on a user’s computer. It supports real-time collaboration and can ingest data from a CSV upload, S3 import, database query, API connection, or a connection to any other cloud data source.
2. Data Model/Power Query
Microsoft built a few additional features into Excel to let users work with large datasets. Those features are Data Model, Power Query, and Power Pivot. These features enable a user to open a data set with more than 1 million rows, view the first million rows, and sort, filter, and pivot the data.
Data Model, Power Query, and Power Pivot are hard to find, but they exist for the big dataset use case. The downside to these features is that they don't offer the interactivity a spreadsheet provides. Interactions are limited to column-level computations, and it's not easy to build models as you normally would with data sets below the row limit in Excel.
Follow the instructions below to use them.
Click the ‘Data’ menu across the top, then select ‘Get Data.’
Then select the data type you would like to import and click ‘Import.’
Once the preview launches, select ‘Load to’ from the drop-down at the bottom.
You will see several options:
- Pivot table - Allows interactions with the data in a pivot table format, but the underlying data is not visible.
- Pivot chart - Allows pivot interactions with the data, which generates a chart. The underlying data is not visible.
- Table - Tries to load data to a worksheet but will only display the first 1,048,576 rows.
- Only Create Connection - Creates a connection to the data set. Data can be viewed by clicking on ‘Manage Data Model’ or double-clicking the data connection.
Make sure in each case you select ‘Add this data to the Data Model,’ otherwise the workbook won’t support more rows than the row limit.
3. Use a Notebook or a SQLite Database
- Jupyter Notebook - Jupyter is a tool best suited for software engineers, data scientists, and analysts who know how to code. The application enables manipulation of large data sets with commands written in code cells and can handle millions of rows, though it runs locally and depends on the hardware in a user’s computer.
- SQLite - SQLite is a lightweight database that is easy to install and runs on a user’s computer. The database can store millions of rows of data, which can be transformed and analyzed by writing SQL queries. Once reduced to a smaller subset, the data can be opened in a spreadsheet.
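As a rough illustration of the SQLite route, the sketch below uses Python's built-in csv and sqlite3 modules, so it could be run from a Jupyter code cell. It assumes a UTF-8 CSV file with a header row and the same number of fields on every line; the function name and the default table name "data" are ours, not part of any tool mentioned above.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="data"):
    """Stream a CSV with a header row into a SQLite table; return rows inserted."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first record supplies the column names
        # Quote column names so headers with spaces or punctuation still work
        columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
                # executemany pulls rows from the reader one at a time
                cur = conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
                inserted = cur.rowcount
        finally:
            conn.close()
    return inserted
```

Once the data is loaded, a query such as SELECT region, COUNT(*) FROM data GROUP BY region (where region is a hypothetical column) would reduce millions of rows to a summary small enough to open in a spreadsheet. Values are stored as text in this sketch, so numeric columns may need a CAST in queries.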
4. Open in a text editor and truncate
Open the large data set in a text editor, like Notepad++, and delete all the rows beyond row 1,048,576 so the file can be opened in MS Excel.
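For files too large to edit comfortably by hand, the same truncation can be scripted. Here is a minimal Python sketch, assuming a UTF-8 CSV file; the function name is ours, and the rows past the limit are simply left out of the copy, so the original file is untouched.

```python
import csv
from itertools import islice

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit quoted in this post


def truncate_csv(src_path, dst_path, max_rows=EXCEL_MAX_ROWS):
    """Copy at most max_rows records (header included) to dst_path; return the count written."""
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # islice stops reading once max_rows records have been copied
        for row in islice(csv.reader(src), max_rows):
            writer.writerow(row)
            written += 1
    return written
```

Writing to a separate destination file keeps the full data set available for any of the other methods.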
There are a number of different options for working around Excel’s row limit. Most of them require learning some new skills, downloading programs, or manipulating files. If you are comfortable writing code or SQL, a Jupyter notebook or SQLite database might be best for you. If you would prefer to stick with existing tools, some of Excel's workarounds may be sufficient. In our opinion, the easiest option is to simply use a better spreadsheet: Row Zero, a blazingly fast spreadsheet designed to work with big data.