Article: How To Avoid The Embarrassment Of Dirty Data
How To Avoid The Embarrassment Of Dirty Data
Mon 20th December 2010
More than 25% of the most critical data in the world's top companies is flawed. The problem, however, is due to bad business practice and not to any IT glitches. Here we find out how Excel can help you to keep your data clean.
When we hear about dirty data we often think about data quality; that is how trustworthy data is, including whether the data is valid, consistent, accurate and relevant. Usually, when data is described as dirty, the information is inaccurate, incomplete or duplicated. However, a startling three-quarters of large enterprises will make hardly any improvements to clean up their data. To gain the competitive advantage from this type of information, it's crucial that organisations begin to look at their data as a valuable corporate asset.
Poor quality or dirty customer data can lead to huge, unnecessary business costs when mail-outs are misdirected and emails sent to out-of-date contacts, and it's common for more than 10% of the marketing budget to be wasted on this kind of mis-targetting. Today, businesses from government, banking and finance, telecommunications, utilities to charity organisations recognise that data quality has a huge impact on most of their marketing and strategic business plans as they address the need into trying to clean up their act.
If dirty data is something that you regularly address in your organisation, you might also want to look at how clean your data is once it has been verified as being accurate and honest. While dirty data is associated with inaccuracies, clean data relies on clear and well-formatted information. Excel 2010 ensures that you have the tools to format and clean data professionally.
After cleansing, a data set might still have inconsistencies caused when generating from different sources. This is the point when the process of data cleansing involves removing typographical errors or validating and correcting values against a known list of entities. The validation could be to reject any address that does not have a valid post code, for example - which is referred to as strict cleaning; or it might be fuzzy cleaning - such as correcting records that partially match existing records.
With Excel 2010, you can ensure your data is clean by correcting misspelled words, sorting out leading or trailing spaces, unwanted prefixes, improper cases, and non-printing characters. By cleaning up your worksheets it's easier to read and understand your data and carry out more accurate calculations using this data.
It's not always easy to have control over the format and type of data that you import from an external data source, such as a database, text files, or a web page, but with Excel 2010 you can easily format all your data into one format. To remove duplicate rows, for example, you can use the Remove Duplicates dialog box. Or you can manipulate one or more columns by using a formula to convert the imported values into new values. For example, if you want to remove trailing spaces, you can create a new column to clean the data by using a formula, filling down the new column, converting that new column's formulas to values, and then removing the original column.
Fixed spaces and non-printing characters are other glitches which can sometimes cause unexpected results when you sort, filter or search. While data was input, extra space characters may have been added by mistake; or imported text data from external sources may contain non-printing characters that are embedded in the text. It's difficult to notice this type of characters but it's easy to remove them by using a combination of the Trim, Clean, and Substitute functions.
Numbers, too, can be inadvertently imported as text, but are easily amended with Excel; or even different date styles which can sometimes transport into formats that can be confused with numbered part codes or other strings that contain slash marks or hyphens, dates and times often need to be converted and reformatted.
Even merging two or more columns into one after importing data from an external data source, or splitting one column into two or more columns, is easy in Excel. Sometimes you may want to make the rows into columns, and the columns into rows. At other times, data is not even structured in a tabular format (rows and columns), and you need a way to transform the data from a non-tabular to a tabular format. Excel allows you to reconcile table data by joining or matching which is useful when you need to find and correct matching errors when two or more tables are joined. This might involve reconciling two tables from different worksheets, for example, to see all records in both tables or to compare tables and find rows that don't match.
Preventing errors is easier than trying to correct all your data at a later stage, and tools are being developed to help organisations eradicate errors early on in data collation. But errors will still occur, so it's good to know that with Excel there's always a reliable way to clean up your act.
Author is a freelance copywriter. For more information on excel training london, please visit https://www.stl-training.co.uk
Original article appears here:
https://www.stl-training.co.uk/article-1314-avoid-embarrassment-dirty-data.html