Skip Navigation Links
xFDN Home
Contributions

Cleaning master data created by application

Author: Vikas Kale

Applications use database to capture and store data. As these applications are programmed and run manually there is possibilty to have bad data in the database. Analysis and reporting on incorrect data can be devastating. The primary challenge has been data profiling to identify bad data and then cleansing of bad data.

In this article, I will show how to identify and clean bad data based on some rules using xFusion. Here is a scenario showing how applications can create bad data in master tables.


Case: Web application that takes personal information from a web page and store it in database.
Rule: A client must submit his personal details only once. For each entry, the application creates an unique code for the client which is a primary key in database. Client may accidently submit the form with incorrect or incomplete information. When he resubmits the form with correct information the application generates a new code. So now, the database contains two codes for the same client.

Connecting to the database and extracting data from the database

Create a new connection, select database, providing all the valid parameters, check test connection and finish. Create new standard query and select all the columns from desired table.

Identifying duplicates on the desired columns

Our First step is to identify duplicate entries based on the columns name, surname and relationname. Using FindDuplicateRows function this can be achieved. You may change the column names to find duplicate records.

Identifying nearest names

xFusion has build in function like DoubleMetaphone, Soundex based on the phonetic algorithms. So these may help in identifying names 'Bryan' and 'Brian'.This knowledge can be used to take necessary decisions.

Filtering records with null or empty values

Using IsNullOrEmpty function we can identify records with no value in specified fields. In this case, any record with missing doorno is considers bad data. Refer to the screen shot below to identify bad data using xFusion builtin functions.

These were some of the data profiling and cleansing techniques using xFusion. Identifying and cleansing bad data before analysis and reporting is critical to any business.
©2010 Software Labs, Inc.