Monday, May 7, 2012

Data Integration with SSIS and SOA

by Saleh Najar

One of the most important streamlining efforts that medium and large organizations can undertake is integration, integration, integration. Some organizations want to create an Enterprise Integration Layer. Others want to create Master Data records and yet other bigger ones want to create data warehouses and a business intelligence layer on top... whatever it is I've found that SOA (Service Oriented Architecture) is a great architecture when it comes to data integration on heterogeneous platforms, especially when used with SSIS (SQL Server Integration Services).

ETL (Extraction, Transformation and Loading) using SOA establishes clear data contracts that makes integration very robust against change. SOA also makes ETL agnostic to the underlying networks and specific operating systems.

I think that one of the best tools for SOA based data integration is Microsoft's SSIS. I've found that using SSIS with Services is an under-used approach to implementing ETL. I will briefly highlight the core steps necessary to implement ETL with Services using SSIS.

First, everything starts with authentication. In SSIS, you can easily authenticate with the Service by using a Script task. In an SSIS Script Task, you can easily make a call to a Service and get an authentication ticket for example to be used in subsequent calls to the Service.

Second, you need to make all the calls to the Service in one pass and stage them. Again, the calls to the service is done through the SSIS Script Task. The caveat here is to generate the proxy to the service into a separate file then include that file in the the SSIS Script Task project. Then, using the proxy classes, write the code to make the calls to the service, serializing them to Xml then writing them to a staging folder.

Third, using the SSIS Xml task, and using XSLT, the secret ingredient here, transform the staged Xml files to pipe delimited files, writing them again to the staging folder.

Finally, simply using the SSIS Bulk Insert task, load the data from the pipe delimited files from the staging folder into the destination database. Of course, do not forget to make all this configurable (an easy task in SSIS), add SSIS error handling and logging which are very powerful in SSIS. That's ETL with SSIS and Services in a nutshell.

I would like to mention here that the above method I described is optimal for cases when dealing with less than one thousand records. If more than one thousand records are involved, I recommend implementing paging with Services and SSIS. That can be an advanced topic of another post.