Wednesday, 31 March 2010

Re-learning Perl

I haven't used Perl much in the last 5 or so years.  I prefer 'modern' languages like Ruby, but some ETL tasks have come up on our project that are well beyond what SSIS can handle nicely.  Rather than use SQL as a text-parsing language, we have decided to use Perl.

An example of the type of problem we are facing is this:  We get a file where the first character of each line determines the format of the rest of that line.  To get a complete transaction, you need to combine lines of all 3 types.  For example, we have (this is for a telco system):

1[TAB]123345[TAB]Acme Co
2[TAB]+61290099009
3[TAB]+61380088008[TAB]OUTBOUND[TAB]0.56[TAB]0.09

Where [TAB] means a tab character.

So we get the account number and customer name on line 1, the calling number on line 2, and the called number, call type, duration and cost on line 3.  The relationships are 1-N: one type 1 line can be followed by many type 2 lines, and each type 2 line by many type 3 lines.
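
To make the layout concrete, here is a minimal sketch of the state tracking involved. It is written in Python purely for illustration, not as the Perl we will end up with. The function name parse_calls and the dictionary keys are made up for this example, and it assumes every line is tab-separated with the fields in exactly the order shown above. Duration and cost are left as text because the sample does not say what units they are in.

```python
def parse_calls(lines):
    """Flatten the 1 -> 2 -> 3 record hierarchy into one dict per call."""
    account = None  # fields from the most recent type 1 line
    caller = None   # calling number from the most recent type 2 line
    calls = []

    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines

        fields = line.split("\t")
        record_type = fields[0]

        if record_type == "1":
            account = {"account_number": fields[1], "customer_name": fields[2]}
            caller = None  # a new account starts a fresh group of callers
        elif record_type == "2":
            if account is None:
                raise ValueError(f"line {line_no}: type 2 line before any type 1 line")
            caller = fields[1]
        elif record_type == "3":
            if caller is None:
                raise ValueError(f"line {line_no}: type 3 line before any type 2 line")
            # Each type 3 line completes one transaction (one output row).
            calls.append({
                "account_number": account["account_number"],
                "customer_name": account["customer_name"],
                "calling_number": caller,
                "called_number": fields[1],
                "call_type": fields[2],
                "duration": fields[3],
                "cost": fields[4],
            })
        else:
            raise ValueError(f"line {line_no}: unknown record type {record_type!r}")

    return calls


sample = [
    "1\t123345\tAcme Co\n",
    "2\t+61290099009\n",
    "3\t+61380088008\tOUTBOUND\t0.56\t0.09\n",
]

for row in parse_calls(sample):
    print(row)
```

The point is that only two pieces of state (the current account and the current calling number) need to be carried from line to line, which is awkward in a cursor but trivial in a scripting language.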

So rather than write the most complicated cursor ever, we are using Perl.  The thing is, I'm a bit rusty.

I'm interested to know:
a)  What are some good books for re-learning Perl (as opposed to starting from scratch)?
b)  Do you agree with the approach?  How would you tackle this problem?

Leigh.

1 comment:

Ivan said...

Hi,

I saw a similar post on CozyRoc's forum. This kind of layout can be easily processed with an SSIS script component used as a source. You will say this requires programming, but the same is true of using Perl as your ETL tool.

It may be possible to create a tool that can read ANY type of file. But just think for a moment how difficult such a tool would be to learn.

If you have other questions, feel free to contact us.

p.s.
We offer consulting services, in case you don't have resources internally to build scripts.