Friday, 11 November 2011

Implementing an Agile ETL process in SQL Server

I was fortunate enough recently to attend the DW 2.0 Conference in Sydney.  One of the presenters, Therese Ahlstam (from the Genesee Academy) advocated a technique for building data warehouse's she referred to as Hyper-agility.  The idea was to avoid the need to constant schema changes by storing Staging & ODS tables as Key/Value pairs.  To me this seems to fit the Serialisation/de-serialisation design pattern so common in many software frameworks these days.

This idea seemed quite radical to me, but I've been thinking on this ever since and I decided to have a look at how feisible this really is.  As it turns out, while the syntax is a bit tricky, this is not only possible but relatively easy to do in MS SQL Server thanks to the PIVOT/UNPIVOT commands.

Here is an example:

(pkID int, 
Name VARCHAR(50), 
Address VARCHAR(250), 
Phone VARCHAR(30) 

INSERT INTO SourceTable VALUES (1,'Bob Smith', '1 Miller St Sydney', '555-1234');
INSERT INTO SourceTable VALUES (2, 'John wayne', '10 california Dr Hollywood', '+1-555-4567');
INSERT INTO SourceTable VALUES (3,' Billy Bowden', '4 Wainui road Wellington', '+64-21-3456789');

--Unpivot the table.
SELECT pkID, KeyCol, ValCol INTO unpivotSourceTable
   (SELECT pkID, CAST(Name as VARCHAR(250)) as Name,Address,cast(Phone as VARCHAR(250)) as Phone from SourceTable) ST
   (ValCol FOR KeyCol IN       (Name,Address,Phone)

select * from  unpivotSourceTable

pkID KeyCol ValCol
1 Address 1 Miller St Sydney
1 Name Bob Smith
1 Phone 555-1234
2 Address 10 california Dr Hollywood
2 Name John wayne
2 Phone +1-555-4567
3 Address 4 Wainui road Wellington
3 Name Billy Bowden
3 Phone +64-21-3456789

Note we need to convert all the columns to the same data type to get this to work (which makes sense) .  This is how we would store the data in our Operational data store and/or staging database.

But how do we get back to a regular row when we need it ?  We pivot it !

SELECT pkID, [Name],[Address],[Phone]
(select pkID, KeyCol, ValCol FROM unpivotSourceTable) uST
(MIN(ValCol) for KeyCol in ([Name],[Address],[Phone]) )as pST

pkID Name Address Phone
1 Bob Smith 1 Miller St Sydney 555-1234
2 John wayne 10 california Dr Hollywood +1-555-4567
3 Billy Bowden 4 Wainui road Wellington +64-21-3456789

Do you think there is value in this approach ?  I've yet to put it into practice, but if you have, let me know.  I'd be a bit concerned about the performance for very large tables, but I can see how this would speed up development.

Thursday, 13 October 2011

SAP (Business Objects) Data Services

Hi All,

I am about to embark on a project with the Data services tool set.  It seems like a great tool, but there are two things I am struggling to find at present:

  • Any published books
  • Good websites, blogs, forums etc.  

If you have any good resources for me please leave a comment here for me.

Thanks !

Sunday, 27 February 2011

Saturday, 26 February 2011

Great quote !

The beginning of wisdom is a definition of terms. Socrates

Saturday, 19 February 2011

SQL Server 2011 download page

FYI - I've found CTP releases not worth the trouble in the past, but if you are keen to download SQL Server 2011, here is the link.

Friday, 18 February 2011

MDM in SQL Server 2011

Similar to my earlier post on SSIS, here is the page for Master data management.

Whats new in SSIS 2011

I've been looking for some information about what is coming in SQL Server 2011 for SSIS.  There isn't a lot out there, but what Microsoft has released is available here.  If you know of a better source of info, please let me know !