Build a Clean Address List from Microsoft Outlook Inbox
Author: Pradeep Tapadiya, Software Labs Inc.
For marketing campaigns, it is sometimes desired to build the email list of all
the people who you have interacted with. Your Outlook Address Book most likely contains
just a subset of such an email list. In this article, I will show you how you can
use xFusion Studio to build a clean SMTP ready email address list.
Step 1: Extract Email Addresses from Outlook
From within Outlook, select menu item File->Import and Export.
Run the wizard. Export to a file→Comma
Separated Values (Windows)→Inbox→C:\Temp\MyInbox.csv.
Click on "Map Fields" button and remove all the fields except From: (Name), From:
(Address), To: (Name), To: (Address), CC: (Name), CC: (Address), BCC: (Name) and
BCC: (Address).

Step 2: Create connection to C:\Temp\MyInbox.csv
From xFusion Studio, create a new connection of type Text Data Connector.
Load C:\Temp\MyInbox.csv as a comma separated file. Rename this connection
to My Inbox.

Step 3: Extract data from InBox connection
Create a new Standard Query object. Select all the columns. When previewed, you
should see a table containing eight columns. Rename the query to Step 3: Raw Data.

Step 4: Normalize the table
We need to rearrange eight columns into just two columns - Names and
Addresses. Create a transformation query to break these eight columns
into four tables of two columns each and then run a Union type transformation
to append four tables into one. Rename this query to Step 4: Normalized Table.

Step 5: Split multiple recipients into individuals
It is common to see an email being copied to multiple recipients. As you can see
in the previous figure, these recipients are separated by semicolons in the Outlook
generated file. We need to split multiple recipients into individuals. First, let's
split the Names field. Create a new Transformation Query object and
use the function Split to split the input, as shown in the following figure:

Here, we are specifying that the field must be split by semicolons and that we must
create at most ten columns. If you feel your CC and BCC list may contain more then
ten addresses, you can adjust the value in the third parameter.
Rename this query to 5a: Names.
Likewise, create another Transformation Query object to split the addresses column.
Rename this query to Step 5b: Addresses.
Step 6: Merge names and addresses
The previous step resulted in two tables of ten columns each; one containing the
recipient names and the other containing the recipient addresses. We now need to
create a new table that merges all the twenty columns into two columns containing
the name and the corresponding email address.
To accomplish this, we create a new Transformation Query that uses a combination
of Select and Union functions. Rename this query to
Step 6: Individuals.

At the end of this step, we now have a table containing the names and email addresses
of each recipient.
Step 7: Filter data with legal SMTP addresses
At this point, your table may contain some addresses that:
- are NULL (for example, when cc list is empty)
- are invalid SMTP addresses
To filter out NULL values as well as illegal SMTP addresses, we now need to create
a new Transformation Query. We need to use a combination of Filter
and RegexFind to filter out bad data. The following regular expression
to filter out illegal SMTP addresses:
^[\w-\.]{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$

Step 8: Remove duplicates
After merging from, to, cc, and bcc list, you now will be left with many duplicate
entries. Our last step is to remove duplicates entries. For this, we will create
a new Transformation Query. The function we need is RemoveDuplicateRows.

The second parameter to RemoveDuplicateRows is the column(s) to look for removing
duplicates. In our case, this column is Address - the column that contains email
addresses.
You now have a result that contains legal SMTP email addresses without any duplicate
records.
Here is the complete data flow:

You can extend the mechanism to process your "Sent Mail" folder as well as other
folders. This is left as an exercise for you.
Download the xFusion pack from here.
|