Gathering the necessary data is a crucial part of problem-solving and project management. It helps managers in two ways:
- It helps define the problem completely.
- It helps evaluate the feasibility of a solution.
Data cleansing is the next important step. It helps companies retain ‘quality data’ and remove data that is inconsistent, inaccurate or redundant. When it comes to Business Intelligence, the quality of the data input determines the quality of the output.
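As a rough illustration of what a cleansing step can look like, here is a minimal Python sketch. The record layout (`name` and `email` fields) and the rules it applies are assumptions made for this example, not a prescribed method: it normalizes inconsistent formatting, drops rows whose email is obviously inaccurate, and removes redundant duplicates.

```python
def cleanse(records):
    """Return a cleaned copy of `records` (a list of dicts with 'name' and 'email').

    The rules below are illustrative assumptions; real cleansing rules
    depend on the business and the data involved.
    """
    seen = set()
    clean = []
    for rec in records:
        # Inconsistent: collapse extra whitespace and normalize letter case.
        name = " ".join(rec["name"].split()).title()
        email = rec["email"].strip().lower()
        # Inaccurate: skip values that cannot be an email address.
        if "@" not in email:
            continue
        # Redundant: keep only the first occurrence of each (name, email) pair.
        key = (name, email)
        if key in seen:
            continue
        seen.add(key)
        clean.append({"name": name, "email": email})
    return clean
```

In practice the rejected rows would usually be logged or sent for review rather than silently dropped, so that the source of the bad data can be traced.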
There are several reasons why one can get saddled with bad quality data. It often happens when the business dynamics of an organization change quickly. It can also be the result of unmanageable application proliferation. And finally, human error cannot be ruled out. Once you understand the primary cause of bad quality data, you will be able to handle it better.
In this article, we will discuss three key reasons behind data quality issues. In reality, all applications contain dirty data: data that is meaningless or does not represent the business in which it is used. Dirty data gets into the database for several reasons.
1. Manual Error is a primary source of dirty data. It gets through when the application is unable to validate the data input. A simple example is a customer name: the person making the entry may misspell the initials, first name or last name.
2. Data Manipulation is also a source of dirty data. It is often the result of people filling in mandatory fields simply to move on in the application form, despite knowing that their entries are wrong. Another reason for data manipulation is unethical behavior by a user who wants to fudge data to fulfill certain criteria and gain access to certain features.
3. Data Migration from one database to another can also create dirty data, because incompatibilities between the two systems may arise. For example, when a company expands and migrates its database to a new vendor, dirty data can be created by incompatibilities between the tables of the old and new database systems.
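The three causes above can each be met with a simple automated check. The following Python sketch is only a starting point under stated assumptions: the function names, the placeholder list and the length limit are all hypothetical and would need to be adapted to the application in question.

```python
# Hypothetical list of junk values people type into mandatory fields.
PLACEHOLDERS = {"n/a", "na", "none", "test", "xxx", "-", "."}


def is_plausible_name(value):
    """Manual error: reject empty names and names containing digits.

    This catches malformed input only; it cannot detect a misspelled
    but otherwise well-formed name.
    """
    stripped = value.strip()
    return bool(stripped) and not any(ch.isdigit() for ch in stripped)


def is_placeholder(value):
    """Data manipulation: flag filler entered just to get past a mandatory field."""
    return value.strip().lower() in PLACEHOLDERS


def would_truncate(value, target_max_length):
    """Data migration: flag values longer than the target column allows."""
    return len(value) > target_max_length
```

For example, `is_placeholder(" N/A ")` flags a filler entry, and `would_truncate("Alexandria", 8)` warns that a value would not fit into an eight-character column in the new system.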
Once you can analyze and understand the reasons behind the creation of dirty data, you will be able to come up with a workable solution for it as well.