Skip Navigation Links
xFDN Home
Contributions

Build a Clean Address List from Microsoft Outlook Inbox

Author: Pradeep Tapadiya, Software Labs Inc.

For marketing campaigns, it is sometimes desired to build the email list of all the people who you have interacted with. Your Outlook Address Book most likely contains just a subset of such an email list. In this article, I will show you how you can use xFusion Studio to build a clean SMTP ready email address list.

Step 1: Extract Email Addresses from Outlook

From within Outlook, select menu item File->Import and Export.
Run the wizard. Export to a fileComma Separated Values (Windows)InboxC:\Temp\MyInbox.csv.
Click on "Map Fields" button and remove all the fields except From: (Name), From: (Address), To: (Name), To: (Address), CC: (Name), CC: (Address), BCC: (Name) and BCC: (Address).

Step 2: Create connection to C:\Temp\MyInbox.csv

From xFusion Studio, create a new connection of type Text Data Connector. Load C:\Temp\MyInbox.csv as a comma separated file. Rename this connection to My Inbox.

Step 3: Extract data from InBox connection

Create a new Standard Query object. Select all the columns. When previewed, you should see a table containing eight columns. Rename the query to Step 3: Raw Data.

Step 4: Normalize the table

We need to rearrange eight columns into just two columns - Names and Addresses. Create a transformation query to break these eight columns into four tables of two columns each and then run a Union type transformation to append four tables into one. Rename this query to Step 4: Normalized Table.

Step 5: Split multiple recipients into individuals

It is common to see an email being copied to multiple recipients. As you can see in the previous figure, these recipients are separated by semicolons in the Outlook generated file. We need to split multiple recipients into individuals. First, let's split the Names field. Create a new Transformation Query object and use the function Split to split the input, as shown in the following figure:

Here, we are specifying that the field must be split by semicolons and that we must create at most ten columns. If you feel your CC and BCC list may contain more then ten addresses, you can adjust the value in the third parameter.

Rename this query to 5a: Names.

Likewise, create another Transformation Query object to split the addresses column. Rename this query to Step 5b: Addresses.

Step 6: Merge names and addresses

The previous step resulted in two tables of ten columns each; one containing the recipient names and the other containing the recipient addresses. We now need to create a new table that merges all the twenty columns into two columns containing the name and the corresponding email address.
To accomplish this, we create a new Transformation Query that uses a combination of Select and Union functions. Rename this query to Step 6: Individuals.

At the end of this step, we now have a table containing the names and email addresses of each recipient.

Step 7: Filter data with legal SMTP addresses

At this point, your table may contain some addresses that:
  • are NULL (for example, when cc list is empty)
  • are invalid SMTP addresses
To filter out NULL values as well as illegal SMTP addresses, we now need to create a new Transformation Query. We need to use a combination of Filter and RegexFind to filter out bad data. The following regular expression to filter out illegal SMTP addresses:

^[\w-\.]{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$

Step 8: Remove duplicates

After merging from, to, cc, and bcc list, you now will be left with many duplicate entries. Our last step is to remove duplicates entries. For this, we will create a new Transformation Query. The function we need is RemoveDuplicateRows.

The second parameter to RemoveDuplicateRows is the column(s) to look for removing duplicates. In our case, this column is Address - the column that contains email addresses.

You now have a result that contains legal SMTP email addresses without any duplicate records.

Here is the complete data flow:

You can extend the mechanism to process your "Sent Mail" folder as well as other folders. This is left as an exercise for you.

Download the xFusion pack from here.

©2007 Software Labs, Inc.