Skip Navigation Links
xFDN Home
Contributions

Improving analytical processing for the Municipal Corporations

Author: Chetan Bhattad

Greater Hyderabad Municipal Corportation (GHMC) had a need to analyze historical data to understand the sanction of building licences for each municipality. The data to be analyzed is stored for each month and each municipality in a separate Excel file. The challenge was to consolidate hundreds of Excel files, remove duplicates and standardize data.

To perform this analysis, one could use manual means to consolidate the data. However, manual entry is not only time-consuming but is also error-prone. Using an ETL tool such as xFusion Studio is a better choice.

This article explains how xFusion Studio can help meet the requirements of GHMC. To keep the article succinct, the steps have been rather simplified.

Step 1: Extracting data

Create new connection. Select Microsoft Excel data connector under ISAM/Files tab. Give the relative path to the files containing desired data. Save the connection. Now create a new standard query and select all the columns from desired table.



Note that the given input file does not have any column headers. For such cases, Excel automatically defines the column names as F1, F2, etc.
For renaming the column names, right click on the column name and select alias or you can use select function.



Step 2: Apply transform functions to get the data in proper format.

Records that have some specific values for the column "PROFORMA-I" must not be used for analysis. This can be done using Filter transform function.



In the given data, column F3 was expected to be numeric but some of the records contained non-numeric values. It was decided that such non-numeric values must be replaced with 0.



Step 3: Integrate the data from different Excel sheets

For each input Excel file, steps 1 and 2 must be repeated.

Our next task is to consolidate the data. We need to use 'AppendTableOnNames' function for this purpose.



Step 4: Identifying duplicates

The way the input data is, a specific building information may sometimes be in multiple Excel files. Such duplicates can be identified using FindDuplicateRows function and removed using RemoveDuplicateRows function.



Step 5: Analyzing the data

Function GroupBy can be used to aggregate data and draw conclusions as shown here:



In conclusion, xFusion Studio helps consolidate and analyze data for organizations such as GHMC where data is spread across multiple Excel files.

 

©2010 Software Labs, Inc.