The EPA Chemical and Products Database (CPDat) contains information on the chemical ingredients of thousands of consumer and industrial products. We've imported the full CPDat database into Row Zero, an enterprise-grade spreadsheet for big data, to make it easy to analyze CPDat data. View the CPDat spreadsheet here or continue reading to learn more about the CPDat dataset.
Dataset Summary
The dataset includes chemical ingredient data on more than 300,000 commercial products, covering more than 12,000 different chemical ingredients. Each product is classified by category, family, and product type. Each chemical is identified by a CAS number, a unique identifier assigned by the Chemical Abstracts Service (CAS), and the data includes the chemical's functional use (e.g., solvent) and its composition within each product. The source for the dataset is the Chemical and Products Database (CPDat) from the Environmental Protection Agency (EPA). The CPDat database is a collection of information sourced from publicly available documents and EPA assessments.
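CAS numbers follow a fixed format (two to seven digits, a hyphen, two digits, a hyphen, and a single check digit), so you can sanity-check a CAS number before searching for it. The sketch below implements the standard CAS check-digit rule: weight the digits before the check digit by 1, 2, 3, ... starting from the right, sum them, and compare the sum modulo 10 with the check digit. The function name `is_valid_cas` is our own and is not part of CPDat or Row Zero.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit is correct."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas("7732-18-5"))  # water -> True
print(is_valid_cas("7732-18-4"))  # wrong check digit -> False
```

A passing check only means the number is well-formed; it does not guarantee the number appears in CPDat.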
The dataset includes the following sheets:
- Summary Data - Contains pivot tables that summarize the raw data from the product_composition_data sheet to highlight data by product, category, and company.
- product_composition_data - Contains the list of products and their chemical ingredients with one row for each chemical and product combination. Products have been curated to Product Use Categories (PUCs) and chemical composition information is curated to weight fractions, curated names, and curated ingredients' chemical identifier information.
- functional_use_data - Contains the list of all chemicals in CPDat that have a functional use reported in their associated data document and includes the raw and curated chemical identifiers and functional use.
- list_presence_data - Contains the list of all chemicals in the CPDat database that were collected from a list presence document, with their raw and curated chemical identifiers.
- puc_vocabulary - Contains information about the Product Use Category (PUC) vocabulary of CPDat.
- fc_vocabulary - Contains information about the Function Category (FC) vocabulary of CPDat.
- lpk_vocabulary - Contains information about the List Presence Keyword (LPK) vocabulary of CPDat.
- CPDat v4 Data Dictionary - Data dictionary for the CPDat database with descriptions for each field.
- CPDat v4 File Information - Lists the files included in the CPDat download along with a description of each file.
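Because product_composition_data stores one row per chemical and product combination (a "long" layout), most analyses start by grouping rows by product. The sketch below assumes you have exported that sheet to CSV; the column names `product_name` and `chemical_name` are hypothetical placeholders, so check the CPDat v4 Data Dictionary sheet for the actual field names before adapting it.

```python
import csv
import io
from collections import defaultdict


def load_composition(fileobj):
    """Group long-format rows (one per product/chemical pair) by product.

    Assumes hypothetical CSV columns `product_name` and `chemical_name`.
    """
    by_product = defaultdict(list)
    for row in csv.DictReader(fileobj):
        by_product[row["product_name"]].append(row["chemical_name"])
    return dict(by_product)


# Hypothetical toy CSV, not real CPDat records.
sample = io.StringIO(
    "product_name,chemical_name\n"
    "Example Hair Dye,Water\n"
    "Example Hair Dye,Glycerol\n"
    "Example Paint,Water\n"
)
print(load_composition(sample))
# {'Example Hair Dye': ['Water', 'Glycerol'], 'Example Paint': ['Water']}
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_composition`.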
Highlights from the data
The Summary Data sheet includes pivot tables that highlight interesting views of the CPDat dataset.
Number of Chemicals in Each Product
A pivot table of the count of chemicals by product reveals that beauty products tend to have the most chemicals per product, with hair coloring, hair styling, and face cream products dominating the list of products with the most chemicals. Several beauty products contain more than 100 different chemicals.
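Outside the spreadsheet, the same "chemicals per product" count can be reproduced in a few lines. This is a sketch, not how Row Zero builds its pivot table: it assumes rows are dictionaries with hypothetical `product_name` and `chemical_name` keys, and it counts distinct chemicals so that duplicate rows are not double-counted. Real product names may not be unique across companies, so a product identifier column would be a safer key if your export has one.

```python
from collections import defaultdict


def chemicals_per_product(rows):
    """Count distinct chemicals per product in long-format rows."""
    chemicals = defaultdict(set)
    for row in rows:
        chemicals[row["product_name"]].add(row["chemical_name"])
    return {product: len(names) for product, names in chemicals.items()}


def top_products(rows, n=10):
    """Return the n products with the most distinct chemicals, largest first."""
    counts = chemicals_per_product(rows)
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical toy rows, not real CPDat records.
rows = [
    {"product_name": "Example Hair Dye", "chemical_name": "Water"},
    {"product_name": "Example Hair Dye", "chemical_name": "Glycerol"},
    {"product_name": "Example Hair Dye", "chemical_name": "2-Phenoxyethanol"},
    {"product_name": "Example Paint", "chemical_name": "Water"},
    {"product_name": "Example Paint", "chemical_name": "Titanium dioxide"},
]
print(top_products(rows, n=2))
# [('Example Hair Dye', 3), ('Example Paint', 2)]
```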
To view the full list of chemicals by product, go to the product_composition_data sheet and search for the specific product.
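If you work with an exported copy of the sheet instead, a case-insensitive substring match on the product name gives a similar "search for a product" lookup. The `product_name` and `chemical_name` keys are hypothetical placeholders for whatever your export uses.

```python
def find_product_rows(rows, query):
    """Return rows whose product name contains `query`, ignoring case."""
    needle = query.casefold()
    return [row for row in rows if needle in row["product_name"].casefold()]


# Hypothetical toy rows, not real CPDat records.
rows = [
    {"product_name": "Example Hair Dye", "chemical_name": "Water"},
    {"product_name": "Example Hair Dye", "chemical_name": "Glycerol"},
    {"product_name": "Example Paint", "chemical_name": "Water"},
]
for row in find_product_rows(rows, "hair dye"):
    print(row["chemical_name"])
# Water
# Glycerol
```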
Number of Chemicals Used by Companies
This pivot table shows the number of distinct chemicals used by each company and the average number of chemicals per product by company.
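The two company metrics can be sketched as follows, assuming hypothetical `organization`, `product_name`, and `chemical_name` keys. We assume "average number of chemicals per product" means the mean count of distinct chemicals across a company's products; the pivot table in the sheet may define it slightly differently.

```python
from collections import defaultdict


def company_stats(rows):
    """Return {organization: (distinct chemicals, avg chemicals per product)}."""
    company_chemicals = defaultdict(set)
    product_chemicals = defaultdict(set)  # keyed by (organization, product)
    for row in rows:
        org, product, chem = row["organization"], row["product_name"], row["chemical_name"]
        company_chemicals[org].add(chem)
        product_chemicals[(org, product)].add(chem)

    # Collect each product's distinct-chemical count under its company.
    per_product_counts = defaultdict(list)
    for (org, _product), chems in product_chemicals.items():
        per_product_counts[org].append(len(chems))

    return {
        org: (len(company_chemicals[org]), sum(counts) / len(counts))
        for org, counts in per_product_counts.items()
    }


# Hypothetical toy rows, not real CPDat records.
rows = [
    {"organization": "Example Co", "product_name": "Dye A", "chemical_name": "Water"},
    {"organization": "Example Co", "product_name": "Dye A", "chemical_name": "Glycerol"},
    {"organization": "Example Co", "product_name": "Dye B", "chemical_name": "Water"},
    {"organization": "Other Inc", "product_name": "Paint", "chemical_name": "Toluene"},
]
print(company_stats(rows))
# {'Example Co': (2, 1.5), 'Other Inc': (1, 1.0)}
```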
To view the full list of chemicals used by a company, go to the product_composition_data sheet and filter the Organization column.
Product Types with the Most Chemicals
Summarizing by product type makes it clear that beauty products have the most chemicals on average.
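Ranking product types by their average chemical count is a two-step aggregation: count distinct chemicals per product, then average those counts within each type. The sketch below assumes hypothetical `product_name`, `product_type`, and `chemical_name` keys and that each product belongs to a single type.

```python
from collections import defaultdict


def avg_chemicals_by_type(rows):
    """Rank product types by average number of distinct chemicals per product."""
    product_chemicals = defaultdict(set)
    product_type = {}
    for row in rows:
        product = row["product_name"]
        product_chemicals[product].add(row["chemical_name"])
        product_type[product] = row["product_type"]  # assumes one type per product

    counts_by_type = defaultdict(list)
    for product, chems in product_chemicals.items():
        counts_by_type[product_type[product]].append(len(chems))

    averages = {ptype: sum(c) / len(c) for ptype, c in counts_by_type.items()}
    # Highest average first; ties broken alphabetically.
    return sorted(averages.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy rows, not real CPDat records.
rows = [
    {"product_name": "Dye A", "product_type": "hair coloring", "chemical_name": "Water"},
    {"product_name": "Dye A", "product_type": "hair coloring", "chemical_name": "Glycerol"},
    {"product_name": "Dye A", "product_type": "hair coloring", "chemical_name": "2-Phenoxyethanol"},
    {"product_name": "Dye B", "product_type": "hair coloring", "chemical_name": "Water"},
    {"product_name": "Paint", "product_type": "paint", "chemical_name": "Toluene"},
]
print(avg_chemicals_by_type(rows))
# [('hair coloring', 2.0), ('paint', 1.0)]
```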
Most Commonly Used Chemicals in Products
Water is the most commonly used chemical across all products in the dataset. After water, the 5 most common chemicals in products are:
- Titanium dioxide
- Glycerol
- 2-Phenoxyethanol
- Xylenes
- Toluene
You can also use the additional pivot tables and filters to look up common chemicals by product type and/or company.
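A "most common chemicals" ranking counts how many distinct products contain each chemical. This sketch uses `collections.Counter` with hypothetical `product_name` and `chemical_name` keys, skips duplicate product/chemical rows, and lets you exclude chemicals such as water to see what comes next.

```python
from collections import Counter


def most_common_chemicals(rows, n=5, exclude=()):
    """Count how many distinct products contain each chemical; return the top n."""
    seen_pairs = set()
    counts = Counter()
    for row in rows:
        pair = (row["product_name"], row["chemical_name"])
        if pair in seen_pairs:
            continue  # skip duplicate product/chemical rows
        seen_pairs.add(pair)
        if row["chemical_name"] not in exclude:
            counts[row["chemical_name"]] += 1
    return counts.most_common(n)


# Hypothetical toy rows, not real CPDat records.
rows = [
    {"product_name": "Dye A", "chemical_name": "Water"},
    {"product_name": "Dye A", "chemical_name": "Glycerol"},
    {"product_name": "Cream", "chemical_name": "Water"},
    {"product_name": "Cream", "chemical_name": "Glycerol"},
    {"product_name": "Paint", "chemical_name": "Water"},
    {"product_name": "Paint", "chemical_name": "Toluene"},
]
print(most_common_chemicals(rows, n=2, exclude={"Water"}))
# [('Glycerol', 2), ('Toluene', 1)]
```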
Use Cases for this Dataset
Row Zero is a powerful spreadsheet built for big data, so you can open the full CPDat database in a Row Zero spreadsheet to explore the dataset, look up products, and analyze the data. Here are a few common use cases:
- Look up the chemical ingredients in products to better understand chemical exposure, identify potential risks, and learn which chemicals are used in everyday products.
- Identify products that contain chemicals that may cause allergic reactions or pose risks.
- Look up CAS numbers (Chemical Abstracts Service Registry Numbers) and the functional use of specific chemicals.
- Evaluate the impact on supply chains of a chemical shortage or supply disruption by seeing which companies and products would be affected.
- Support research involving chemical risk assessment, exposure modeling, and chemical informatics.
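For the supply-chain and allergen use cases, the core operation is a reverse lookup: given one or more chemicals of concern, list the companies and products that contain them. Matching on CAS numbers avoids naming variations such as "Glycerol" versus "Glycerin". The sketch assumes hypothetical `organization`, `product_name`, and `cas` keys; the CAS numbers in the toy rows are real (toluene, water, glycerol), but the companies and products are invented.

```python
from collections import defaultdict


def affected_by(rows, cas_numbers):
    """Map each organization to the sorted products containing any listed chemical."""
    targets = {cas.strip() for cas in cas_numbers}
    impact = defaultdict(set)
    for row in rows:
        if row["cas"].strip() in targets:
            impact[row["organization"]].add(row["product_name"])
    return {org: sorted(products) for org, products in impact.items()}


# Hypothetical toy rows, not real CPDat records.
rows = [
    {"organization": "Example Co", "product_name": "Paint A", "cas": "108-88-3"},
    {"organization": "Example Co", "product_name": "Paint A", "cas": "7732-18-5"},
    {"organization": "Other Inc", "product_name": "Thinner", "cas": "108-88-3"},
    {"organization": "Other Inc", "product_name": "Cream", "cas": "56-81-5"},
]
print(affected_by(rows, ["108-88-3"]))  # toluene
# {'Example Co': ['Paint A'], 'Other Inc': ['Thinner']}
```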
Data Sources
The source for the dataset is the Chemical and Products Database (CPDat) from the U.S. Environmental Protection Agency (EPA). The CPDat database consolidates information from publicly available documents and EPA assessments. The data referenced here was updated in May 2025 with the most recent data available. You can download the raw dataset at the link above.