The quality of your data analysis depends on the quality of the data behind it. This is why data cleaning in Excel is so important. Excel data riddled with errors and inconsistencies is a significant issue: 20% of IT and data leaders say poor data quality is one of the most critical challenges they face.
Clean data is the foundation of accurate insights from data analysis. No one wants to see errors in their outputs or, worse, have mistakes go undetected and steer business decisions in the wrong direction.
What is data cleaning in Excel?
Data cleaning in Excel is the process of preparing raw data for analysis by removing bad data, organizing the raw data, and filling in the null values. Working with clean data is a prerequisite for getting meaningful and accurate insights from your data.
Data cleansing is just one of the steps in the data preparation process that ensures data is ready for analysis and reporting. Data prep also includes:
- Data exploration: Discovering what surprises the dataset holds.
- Data blending: Joining multiple datasets and revealing new truths.
- Data profiling: Spotting poor-quality data before it poisons your results.
- ETL (Extract-Transform-Load): Aggregating data from diverse sources.
- Data wrangling: Making data digestible for your analytical models.
What are the components of clean data?
- Accuracy: Accuracy means your data correctly reflects the real-world values it represents. The information shown is free from errors and matches the true state of what it describes.
- Consistency: Consistency means data agrees across different datasets and time periods. In clean data, the same value is recorded the same way everywhere, with the same formats and units of measurement throughout.
- Completeness: Data completeness refers to the extent to which all necessary information is present in a dataset. If your data is incomplete and key details are missing, your results can be skewed. Make sure to fill in gaps or account for missing values in your analysis.
- Uniformity: Uniformity is the degree to which the data uses the same unit of measure and one coherent format and structure. This includes using the same units of measure, formatting dates the same way, and labeling categories consistently.
- Validity: Validity means the data values fall within a predetermined acceptable range and conform to the expected patterns. For example, if a valid age entry must be between 0 and 120 years, validation rules and checks can be applied to ensure that the data meets this criterion. This keeps out-of-range and erroneous entries from skewing the results.
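To make the validity rule concrete, here is a minimal Python sketch of the age check described above. It assumes ages arrive as a plain list and treats anything that is not a whole number between 0 and 120 as invalid. The function name `invalid_ages` is hypothetical. In Excel itself, the equivalent check would typically be a validation rule or a formula rather than code.

```python
def invalid_ages(ages, low=0, high=120):
    """Return (position, value) pairs that fail a hypothetical 0-120 whole-number age rule."""
    flagged = []
    for i, value in enumerate(ages):
        # bool is a subclass of int in Python, so exclude True/False explicitly;
        # non-integers are flagged before any range comparison is attempted.
        if not isinstance(value, int) or isinstance(value, bool) or not low <= value <= high:
            flagged.append((i, value))
    return flagged

print(invalid_ages([34, 150, -2, 27, "forty"]))
# [(1, 150), (2, -2), (4, 'forty')]
```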
Techniques for cleaning data in Excel
Most business analysts have learned at least some data cleaning in Excel. But given the huge number of Excel data cleaning techniques, learning them is usually an ongoing process. To help analysts get started (or refresh some old tricks), we’ve selected the most common ways to clean data in Excel, with instructions for each technique.
Remove duplicates
One of the easiest data cleansing techniques in Excel is removing duplicates. Related data is often stored in different datasets, and sometimes in different databases. Duplicates appear when that data is consolidated, even by simply copying and pasting from various sources. Because duplicate values can appear without the user’s knowledge, it’s important to take this step for cleaner data. To remove duplicates in Excel, follow these simple instructions:
- Select the cells you want to check for duplicates
- On the “Data” tab, in the “Data Tools” group, select “Remove Duplicates”
- In the dialog box, choose which columns to compare, then click OK
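The Python sketch below models what the feature does. It keeps the first occurrence of each row and drops later copies. It can optionally compare only some columns, which is the choice you make in the Remove Duplicates dialog. This is an illustration with hypothetical names and sample data, not Excel’s implementation. It compares values exactly, including letter case, and Excel’s own comparison rules may differ.

```python
def remove_duplicates(rows, key_columns=None):
    """Keep the first occurrence of each row, comparing only key_columns (all columns if None)."""
    seen = set()
    kept = []
    for row in rows:
        # Build a hashable key from the columns that define "the same record".
        key = tuple(row) if key_columns is None else tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

rows = [
    ["1001", "Acme", "NY"],
    ["1002", "Globex", "CA"],
    ["1001", "Acme", "NY"],  # exact duplicate of the first row
    ["1001", "Acme", "NJ"],  # same ID, different state
]
print(remove_duplicates(rows))                   # drops only the exact duplicate
print(remove_duplicates(rows, key_columns=[0]))  # treats rows sharing an ID as duplicates
```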
Remove extra spaces
To get rid of difficult-to-spot extra spaces, analysts can use the TRIM function. It removes leading and trailing spaces and reduces runs of spaces between words to a single space. This isn’t just for aesthetics. A value with a stray space won’t match its clean counterpart, so removing extra spaces is important if lookup functions such as VLOOKUP are to find matches. Here’s how to do it:
- You can apply TRIM on its own (for example, in a helper column), or wrap it inside a VLOOKUP formula so the lookup value is cleaned as part of the lookup. Note that this only trims the lookup value. Extra spaces in the lookup range itself still need to be removed separately.
- Your VLOOKUP formula would then consist of the following:
  - TRIM applied to the lookup value, stripping it of extra spaces
  - The lookup range
  - The column that has the value to return
  - 0 (for exact match)
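The Python sketch below mimics TRIM and an exact-match VLOOKUP to show why wrapping TRIM around the lookup value helps. In Excel terms, it corresponds to a formula like `=VLOOKUP(TRIM(A2), D2:F100, 3, 0)`, where the cell references are hypothetical. The helper names are also hypothetical. The sketch follows two Excel behaviors: TRIM only removes the regular space character, so tabs and non-breaking spaces survive, and exact-match VLOOKUP ignores letter case. Only the lookup value is trimmed, so the first column of the table must already be clean.

```python
def excel_trim(text):
    """Mimic Excel's TRIM: drop leading/trailing spaces and collapse inner runs of spaces to one.
    Only the regular space character is touched, not tabs or non-breaking spaces."""
    return " ".join(part for part in text.split(" ") if part)

def vlookup_exact(lookup_value, table, col_index):
    """Rough stand-in for VLOOKUP(TRIM(value), table, col_index, 0)."""
    key = excel_trim(lookup_value).casefold()  # exact-match VLOOKUP ignores letter case
    for row in table:
        if str(row[0]).casefold() == key:
            return row[col_index - 1]          # col_index is 1-based, as in Excel
    return "#N/A"                              # Excel shows #N/A when nothing matches

products = [["Widget A", "Blue", 4.99], ["Widget B", "Red", 6.49]]
print(repr(excel_trim("  Widget   A ")))           # 'Widget A'
print(vlookup_exact("  Widget   B", products, 3))  # 6.49
print(vlookup_exact("Widget C", products, 3))      # #N/A
```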
Find and replace text
The find and replace function is useful for several data cleansing techniques. One of them is standardizing values that are stored in multiple variations, such as “NY,” “N.Y.,” and “New York.” You can use find and replace to tackle the variations one by one.
- Select the cell or column
- Go to Home > Find & Select > Replace (or press Ctrl+H)
- Enter a value in the “Find what” box
- Enter the value you want to replace it with in the “Replace with” box
- Click “Replace All” to replace every match at once rather than one at a time
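For long lists of variants, the same standardization can be expressed as a lookup table. The Python sketch below illustrates this with hypothetical values. It replaces whole cell values only, which corresponds to Find and Replace with “Match entire cell contents” checked. Dictionary lookups are case-sensitive, so each case variant is listed explicitly. By contrast, Excel’s Find and Replace ignores case unless “Match case” is checked.

```python
def standardize(values, replacements):
    """Replace whole-cell variants with one canonical value; unlisted values pass through unchanged."""
    return [replacements.get(v, v) for v in values]

# Hypothetical variants of one state name, each mapped to the canonical spelling
state_map = {"NY": "New York", "N.Y.": "New York", "new york": "New York"}
print(standardize(["NY", "New York", "N.Y.", "new york", "CA"], state_map))
# ['New York', 'New York', 'New York', 'New York', 'CA']
```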
Addressing blank cells is another common task you can handle from the same Find & Select menu when cleaning data in Excel. If not addressed, blank cells may read as errors, or they will be left out of your analysis when they were originally intended to signify “0.”
- To select all blank cells, open the Go To dialog box (press F5 or Ctrl+G), click “Special,” then select “Blanks” and click OK. Excel selects every blank cell in your selected range.
- From there, type a uniform value, such as “0” or “N/A,” and press Ctrl+Enter to enter it into every selected cell at once.
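Here is a minimal Python sketch of the same fill operation. It assumes the sheet is represented as a list of rows and that “blank” means a cell holding `None` or an empty string. Like Go To Special > Blanks, it leaves cells that contain only spaces untouched. The function name is hypothetical.

```python
def fill_blanks(grid, fill_value="N/A"):
    """Return a copy of grid where empty cells (None or "") hold fill_value.
    Zeros and space-only strings are not treated as blank."""
    return [[fill_value if cell in (None, "") else cell for cell in row] for row in grid]

sales = [["Q1", 120, None], ["Q2", "", 95], ["Q3", 0, "  "]]
print(fill_blanks(sales, fill_value=0))
# [['Q1', 120, 0], ['Q2', 0, 95], ['Q3', 0, '  ']]
```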
Check for typos and misspelled words
Misspelled terms can happen anywhere, so it’s a good idea to always check your data for spelling mistakes and typos.
- Select the cell, column, or sheet where you want to check for spelling errors.
- Next, go to the “Review” tab and choose “Spelling” (or press F7).
- Excel suggests corrections in the “Spelling” dialog box. Click “Change” if you agree with the suggestion.
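Excel’s spelling checker relies on a general dictionary, so it has no way to know the correct spelling of your own customer names, product codes, or place names. A different technique, fuzzy matching against a known list of valid values, can catch those. The Python sketch below uses the standard library’s `difflib.get_close_matches` for this. It is not how Excel’s checker works. The sample city list, the 0.8 similarity cutoff, and the function name are assumptions to tune for your own data.

```python
import difflib

def suggest_corrections(values, vocabulary, cutoff=0.8):
    """Map each value not found in vocabulary to its closest vocabulary match, or None."""
    suggestions = {}
    for value in values:
        if value in vocabulary:
            continue  # already a known, valid value
        matches = difflib.get_close_matches(value, vocabulary, n=1, cutoff=cutoff)
        suggestions[value] = matches[0] if matches else None  # None means review by hand
    return suggestions

cities = ["Chicago", "Boston", "Seattle"]
print(suggest_corrections(["Chicgo", "Boston", "Seatle", "Denver"], cities))
# {'Chicgo': 'Chicago', 'Seatle': 'Seattle', 'Denver': None}
```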
Standardize formats
Fixing format structure is one of the most common data cleaning challenges in Excel. While most databases contain structured data, there are times when formats are wrongly detected or data formats don’t match, such as dates stored as text. In this context, formatting means converting data in spreadsheets from one form to another. There are a few ways to standardize formats in Excel:
- Using Excel’s formatting tool
  - Select the data range
  - Right-click and choose “Format Cells”
  - Adjust the format settings as needed
- Using Power Query to detect and change data types and formats
  - On the “Data” tab, select “Get Data” to start Power Query
  - Select a data source
  - Open the Power Query Editor (for an existing query, go to Query > Edit)
  - Select a column, then look for “Detect Data Type” on the “Transform” tab
  - Or choose a data type or format from the “Data Type” drop-down
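When dates arrive as text in mixed formats, one approach is to try each expected format in turn and convert matches to a single standard. The Python sketch below does this with `datetime.strptime`. The list of formats is an assumption, and its order decides how ambiguous values such as 03/04/2024 are read (here, month first). Values that match no format come back as `None` for manual review.

```python
from datetime import datetime

# Hypothetical input formats; order matters for ambiguous dates like 03/04/2024
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

def to_iso_date(text):
    """Parse a date string with the first matching format; return YYYY-MM-DD or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format didn't fit; try the next one
    return None  # leave unparseable values for manual review

for raw in ["2024-03-14", "03/14/2024", "14 Mar 2024", "sometime in March"]:
    print(raw, "->", to_iso_date(raw))
```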
Manage blank cells
There isn’t a single approach to handling blank cells in your dataset, but it helps to know why the cells are blank in the first place. Here are the most common ways to handle blank cells in Excel:
- Delete blank cells
  - Select the range you want to delete
  - Right-click the selection
  - Choose “Delete” and select the option to shift cells up or left, or to delete entire rows or columns
- Highlight blanks with conditional formatting
  - Select “Conditional Formatting” from the Home tab
  - Select “New Rule”
  - Select a rule type, such as “Format only cells that contain,” and set it to “Blanks”
  - Choose how you want to format the blanks
- Filter the data
  - On the Home tab, in the “Editing” group, select “Sort & Filter” > “Filter”
  - Filter arrows appear in the header row of your data
  - Click the drop-down arrow for the column you want to check
  - Uncheck all the boxes except “(Blanks)”
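Knowing where blanks sit makes it easier to choose between deleting, highlighting, or filtering them. The Python sketch below counts blanks per column and drops rows that are entirely empty. It assumes the data is a header plus a list of rows and that blank means `None` or an empty string. The function names and sample data are hypothetical.

```python
def blank_counts(header, rows):
    """Count empty cells (None or "") per column to see where blanks sit."""
    counts = {name: 0 for name in header}
    for row in rows:
        for name, cell in zip(header, row):
            if cell in (None, ""):
                counts[name] += 1
    return counts

def drop_empty_rows(rows):
    """Remove rows where every cell is empty, like deleting entirely blank rows."""
    return [row for row in rows if any(cell not in (None, "") for cell in row)]

header = ["Region", "Sales", "Rep"]
rows = [["East", 1200, "Ana"], ["West", None, ""], [None, "", None], ["North", 950, "Li"]]
print(blank_counts(header, rows))  # {'Region': 1, 'Sales': 2, 'Rep': 2}
print(len(drop_empty_rows(rows)))  # 3
```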
Split and merge columns
If a column holds several values that need to be split into separate columns, you can use the “Text to Columns” feature in Excel.
- Select the data range
- Go to the “Data” tab
- Click “Text to Columns”
- Follow the steps to specify the delimiters and split the data
Merging data from multiple columns is another effective way to clean data in Excel. The easiest methods are the CONCATENATE function or the ampersand (&) operator. Newer versions of Excel also offer CONCAT and TEXTJOIN.
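The split and merge operations can be sketched in Python as follows. This illustration uses hypothetical function names and sample data. Splitting on a comma mirrors the Delimited option in Text to Columns. Joining with a separator mirrors a formula like `=A2&" "&B2` (hypothetical cell references).

```python
def split_column(values, delimiter=","):
    """Split each value on a delimiter, like Text to Columns (Delimited).
    The strip() call is extra cleanup; a plain comma split would keep the leading space."""
    return [[part.strip() for part in value.split(delimiter)] for value in values]

def merge_columns(*columns, separator=" "):
    """Join values row by row, like =A2&" "&B2 or CONCATENATE(A2," ",B2)."""
    return [separator.join(str(v) for v in row) for row in zip(*columns)]

print(split_column(["Smith, Jane", "Lee, Sam"]))          # [['Smith', 'Jane'], ['Lee', 'Sam']]
print(merge_columns(["Jane", "Sam"], ["Smith", "Lee"]))   # ['Jane Smith', 'Sam Lee']
```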
Distribute the contents of a cell into adjacent columns
This is another question that comes up early when analysts start learning data cleaning in Excel. Analysts don’t normally have full control over how data is recorded, which means too much data can end up in one cell. For example, sensor data often arrives with the date and time combined in a single value, when it is more helpful to separate the two into two columns. Here’s how to do it:
- Select the cell, range, or entire column that contains the text values that you want to split.
- On the “Data” tab, in the “Data Tools” group, click “Text to Columns.”
- Follow the instructions in the “Convert Text to Columns Wizard” to specify how you want to divide the text into separate columns.
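For the sensor example, here is a minimal Python sketch that separates a combined reading into date and time values. It assumes the readings are text in the `YYYY-MM-DD HH:MM:SS` form. Readings in other layouts would need a different parse. The function name is hypothetical.

```python
from datetime import datetime

def split_timestamp(text):
    """Split a 'YYYY-MM-DD HH:MM:SS' reading into separate date and time strings."""
    stamp = datetime.fromisoformat(text.strip())
    return stamp.date().isoformat(), stamp.time().isoformat()

readings = ["2024-03-14 08:30:00", "2024-03-14 17:05:42"]
for reading in readings:
    print(split_timestamp(reading))
# ('2024-03-14', '08:30:00')
# ('2024-03-14', '17:05:42')
```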
The challenges of data cleaning in Excel
The experience of cleaning data in Excel is reliable and familiar; analysts have been successfully cleaning up data in Excel for decades in order to ready it for analysis.
However, as today’s data grows larger and increasingly complex, Excel is showing its age. Data cleansing works best when analysts are familiar with the data at hand and know what to search for and which types of errors need addressing. Otherwise, it’s hard to identify invalid numbers or incorrect dates, especially when the invalid values still conform to the expected format. And when it comes to assessing inconsistencies between multiple spreadsheets, cleaning data in Excel becomes a complex project on its own.
Instead of learning how to clean data in Excel, analysts are increasingly using new data preparation solutions that intelligently identify possible transformations and visually surface errors. These types of solutions mirror the direction of the analytics world—adopting platforms that are user-friendly, flexible and allow for visual interaction.
A time-saving alternative to cleaning data in Excel
Without proper data quality, your final analysis will suffer in accuracy, or you could potentially arrive at the wrong conclusion.
While using Excel to clean data can sometimes work well for simple datasets, new data preparation solutions offer an intelligent approach to tackling larger and more unfamiliar data.
Alteryx solves the data cleansing dilemma. Its Designer Cloud product allows users to interactively explore their datasets by clicking on components of the data, so they can discover new and essential ways in which the data must be transformed. Instead of a process driven by careful manual steps like data cleaning in Excel, Designer Cloud automatically presents suggested transformations for users to evaluate and edit. With Designer Cloud, users not only produce more thoroughly cleansed datasets but also do so in significantly less time.
To learn more about how Designer Cloud works and if it’s a better alternative for your use case, schedule a free demo today.