Thursday, 12 July 2012

SFDC CDC in Informatica


This post is for first-timers working on an SFDC CDC implementation. Informatica provides detailed implementation steps in the Help file, and I assume that you have gone through it at least once.
Here I try to simplify the process and point out the areas to focus on.
There are two ways to implement SFDC Change Data Capture in Informatica:
1.      Capture Changed Data continuously
2.      Capture Changed Data for a specific Period.
Enabling Capture Changed Data Continuously:
To set up a continuous CDC session, follow the steps below:
1.      Set the CDC Time Limit property on the mapping tab. This is the time period (in seconds) for which you want to capture changes from Salesforce.
2.      Set the Flush Interval property on the mapping tab. This is the interval (in seconds) at which the PowerCenter Integration Service captures changed Salesforce data.
Setting these two properties enables PowerCenter to capture data changes continuously for the given time period.
To capture changed data for an infinite period of time, set the CDC Time Limit property to -1.
When the PowerCenter Integration Service runs a continuous CDC session, it first reads all records and processes them with the row type insert. After it has read all source data, the CDC time limit and flush interval begin.
The PowerCenter Integration Service completes the following tasks to capture changed data for a continuous CDC session:
1.      Reads all records created since the initial read and passes them to the next transformation as rows flagged for insert.
2.      Reads all records updated since the initial read and passes them to the next transformation as rows flagged for update.
3.      Reads all records deleted since the initial read and passes them to the next transformation as rows flagged for delete.
After the PowerCenter Integration Service finishes reading all changed data, the flush interval is reset. The PowerCenter Integration Service stops reading from Salesforce when the CDC time limit ends.
The PowerCenter Help has a detailed description of how the Integration Service processes a continuous CDC session.
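To get a feel for how the CDC Time Limit and Flush Interval interact, here is a rough sketch in Python. It is only a toy model of the timing described above, not how the Integration Service is actually implemented. The function name, the read_duration and max_captures parameters, and the assumption that a capture is skipped once it would fall past the time limit are all mine.

```python
def simulate_capture_times(cdc_time_limit, flush_interval,
                           read_duration=0, max_captures=10):
    """Return simulated capture times, in seconds after the initial full read.

    Toy model only (hypothetical helper, not an Informatica API):
    - the clock starts once the initial read of all records is done;
    - a capture happens one flush interval after the previous read finished;
    - cdc_time_limit == -1 means "no limit", so max_captures bounds the loop.
    """
    capture_times = []
    clock = 0
    while len(capture_times) < max_captures:
        next_capture = clock + flush_interval
        # Assumption: no capture starts after the CDC time limit has ended.
        if cdc_time_limit != -1 and next_capture > cdc_time_limit:
            break
        capture_times.append(next_capture)
        # The flush interval is reset after the changed data has been read.
        clock = next_capture + read_duration
    return capture_times


print(simulate_capture_times(60, 20))                   # [20, 40, 60]
print(simulate_capture_times(60, 20, read_duration=5))  # [20, 45]
print(simulate_capture_times(-1, 30, max_captures=3))   # [30, 60, 90]
```

The second call shows why a long read pushes later captures back: the interval only restarts once the previous read has finished.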
Enabling Capture Changed Data for a Specific Period:
To enable change data capture for a specific time period, define the start time and end time of the period in the session properties.
Both the start time and the end time must be in the format YYYY-MM-DDTHH:MI:SS.SSSZ.
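If you generate these values from a script, a small helper like the one below can produce a string in that shape. This is a sketch under two assumptions of mine: that the trailing Z means UTC and that SSS means milliseconds. The function name to_cdc_timestamp is made up for illustration, so please check the Help file before relying on it.

```python
from datetime import datetime, timezone


def to_cdc_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MI:SS.SSSZ.

    Assumptions (not confirmed by the Help file): Z stands for UTC and
    SSS is the millisecond part.
    """
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # truncate microseconds to milliseconds
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


start = datetime(2012, 7, 12, 9, 30, 0, 250000, tzinfo=timezone.utc)
print(to_cdc_timestamp(start))  # 2012-07-12T09:30:00.250Z
```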
The PowerCenter Integration Service completes the following steps to capture changed data for a time-period based CDC session:
1.      Reads all records created between the CDC start time and end time, and passes them to the next transformation as rows flagged for insert.
2.      Reads all records updated between the CDC start time and end time, and passes them to the next transformation as rows flagged for update.
3.      Reads all records deleted between the CDC start time and end time, and passes them to the next transformation as rows flagged for delete.
Please go through the "Rules and Guidelines for Processing a Time-Period Based CDC Session" section of the Help before enabling a time-period based CDC session.
Having said this, I faced some strange problems while trying to implement an SFDC CDC session.
Issues and Workarounds for SFDC CDC Sessions
While running a time-period based CDC session, the source records were getting doubled: the Integration Service read exactly twice as many records as I had inserted. For example, when I added 10 records to the SFDC org, the Integration Service read 20 records.
This problem is caused by the three-step process:
1.      Read all created records.
2.      Read all updated records.
3.      Read all deleted records.

For reading created records, the Integration Service uses CreatedDate. For example, [Select Id, LastName from Contact Where (CreatedDate >= $StartDate AND CreatedDate < $EndDate)].
For reading updated records, the Integration Service uses LastModifiedDate. For example, [Select Id, LastName from Contact Where (LastModifiedDate >= $StartDate AND LastModifiedDate < $EndDate)].
A newly created record has a LastModifiedDate equal to its CreatedDate, so it satisfies both conditions. As a result, all the newly created records were fetched in both the “Created records” group and the “Updated records” group.
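The doubling can be reproduced outside Informatica with a few lines of Python. This is only an illustration of the two date filters above applied to in-memory records. The function name and the sample data are hypothetical, and the deleted-records step is left out because it does not contribute to the duplicates.

```python
from datetime import datetime, timedelta


def time_period_cdc_read(records, start, end):
    """Mimic the 'created' and 'updated' reads of a time-period CDC session."""
    # Step 1: created records are selected on CreatedDate.
    created = [r for r in records if start <= r["CreatedDate"] < end]
    # Step 2: updated records are selected on LastModifiedDate.
    updated = [r for r in records if start <= r["LastModifiedDate"] < end]
    return created, updated


start = datetime(2012, 7, 12)
end = datetime(2012, 7, 13)

# Ten brand-new contacts: LastModifiedDate equals CreatedDate on creation.
new_contacts = [
    {"Id": f"C{i:02d}",
     "CreatedDate": start + timedelta(hours=1),
     "LastModifiedDate": start + timedelta(hours=1)}
    for i in range(10)
]

created, updated = time_period_cdc_read(new_contacts, start, end)
print(len(created), len(updated), len(created) + len(updated))  # 10 10 20
```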
The simplest way to avoid this is to remove the start and end time from the session properties (i.e. disable time-period CDC) and add the time period to the filter criteria instead. For example, (LastModifiedDate >= $StartDate AND LastModifiedDate < $EndDate).
Having said this, there are always different ways to approach a solution, and the right one depends on the requirement.
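As a quick sanity check of this workaround, the sketch below builds the filter text and applies the same single condition to in-memory records, so each new record is picked up once. The helper names build_filter and read_with_filter are mine. I am also assuming the timestamps are written as unquoted SOQL-style datetime literals, so check how your session expects them.

```python
from datetime import datetime, timedelta


def build_filter(start_date, end_date):
    """Return the filter text for one LastModifiedDate window (hypothetical helper)."""
    return (f"(LastModifiedDate >= {start_date} "
            f"AND LastModifiedDate < {end_date})")


def read_with_filter(records, start, end):
    """Single pass on LastModifiedDate: new and changed records appear once."""
    return [r for r in records if start <= r["LastModifiedDate"] < end]


print(build_filter("2012-07-12T00:00:00.000Z", "2012-07-13T00:00:00.000Z"))

start = datetime(2012, 7, 12)
end = datetime(2012, 7, 13)
stamp = start + timedelta(hours=1)
contacts = [{"Id": f"C{i:02d}", "CreatedDate": stamp, "LastModifiedDate": stamp}
            for i in range(10)]
print(len(read_with_filter(contacts, start, end)))  # 10
```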