The ETL Process

I am currently working on a framework for ETL written in Ruby. For a brief description of why I am doing this, please read the linked post. To see all related content, view everything tagged "ETL Framework".

Now that we have some ETL steps that we can build, we need some logical way of running them. Favoring composition over inheritance, I went with a container class called ETLProcess. The basic UML for the ETLProcess class, along with its relationship to the ETLStep class, is below:

[Figure: ETLProcess class with relationships to ETLStep classes]

A lot can be said about why I chose this design, but I think the best way to talk about it is to actually show how it works. The following Ruby code shows how you would use the ETLProcess and ETLStep classes to build out your ETL:

require 'etl_lib'

etl = ETLProcess.new 'My ETL Process'
etl.add_step(MyETLStep1.new)
etl.add_step(MyETLStep2.new)
etl.start

Let's walk through the parts here to get an idea of what is going on. The first line is a simple require to make sure that we have access to the ETLProcess and ETLStep classes. Next, we create a new object called "etl" that is the ETL process. Then we add the steps to the etl object with the add_step method. A lot of behind-the-scenes work happens when this is done, which I will go into in another post.

Notice that when a new step is added to the process, we actually create a new step object on the fly. When the add_step method is called, the ETLStep object is added to an array called steps. Why not just add the steps to the array directly? Because we need to do a lot with that step object before it is ready to be run.

The start method on the process simply loops through the steps array and calls the start method on each step. I wanted to hide a lot of the background work from the actual ETL contained in the ETLStep, hence the two methods in the step: run and start. The start method does some background work and then executes the run method in the step. This keeps the development of new ETL squeaky clean and focused on the actual ETL rather than on backend tasks like bookmarking and database connection handling.

The other interesting thing the ETLProcess class does is track when there is an error in an ETLStep and control whether that step should be rolled back. By default, the process runs the rollback method in the step if it fails. To turn this behavior off, you can specify it as an option when you add the step to the etl object. For example:

etl.add_step(MyStep.new, {:rollback_on_fail => false})
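To make the mechanics described above a little more concrete, here is a minimal sketch of how a process and step could fit together. This is not the actual etl_lib source: the class names ETLStepSketch and ETLProcessSketch, the prepare hook, the hash used to store each step with its options, and the choice to stop at the first failed step are all assumptions made for illustration.

```ruby
# Hypothetical sketch, not the real etl_lib code. Every name and internal
# detail here is an assumption made for illustration.
class ETLStepSketch
  # start wraps the step-specific run method with background work.
  def start
    prepare
    run
  end

  # Stand-in for background tasks such as bookmarking or connection handling.
  def prepare; end

  # Concrete steps override run with the actual ETL logic.
  def run
    raise NotImplementedError, "#{self.class} must implement run"
  end

  # Concrete steps override rollback to undo their work.
  def rollback; end
end

class ETLProcessSketch
  attr_reader :name, :steps

  def initialize(name)
    @name = name
    @steps = []
  end

  # Store the step together with its options so start can consult them later.
  # Rolling back on failure is the default, as described in the post.
  def add_step(step, options = {})
    defaults = { :rollback_on_fail => true }
    @steps << { :step => step, :options => defaults.merge(options) }
    self
  end

  # Run each step in order. If a step raises, roll it back unless
  # :rollback_on_fail is false, then stop (stopping is an assumption).
  # Returns true when every step succeeded, false otherwise.
  def start
    @steps.each do |entry|
      begin
        entry[:step].start
      rescue StandardError
        entry[:step].rollback if entry[:options][:rollback_on_fail]
        return false
      end
    end
    true
  end
end
```

With a design along these lines, a new step only has to define run and rollback, while the process decides when each one is called.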

Another way the ETLProcess controls the running of ETL steps is by providing a global rollback method. That way, if you wanted, you could roll back the entire ETL process if something didn't work just right. Here's an example:

require 'etl_lib'

etl = ETLProcess.new 'My ETL Process'
etl.add_step(MyETLStep1.new)
etl.add_step(MyETLStep2.new)
etl.start

success = true
# code to check to see if the ETL ran as expected..
# set success to false if it doesn't look good

etl.rollback unless success
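One way a global rollback like this could work is to call rollback on every step, last step first, so that later steps are undone before the steps they depended on. The sketch below assumes that ordering. RollbackSketch and RecordingStep are hypothetical names, and the real ETLProcess may behave differently.

```ruby
# Hypothetical sketch, not the real etl_lib code. The reverse-order rollback
# is an assumption about how a global rollback might be implemented.
class RollbackSketch
  def initialize
    @steps = []
  end

  def add_step(step)
    @steps << step
  end

  # Run every step in the order it was added.
  def start
    @steps.each { |step| step.start }
  end

  # Undo every step, last one first.
  def rollback
    @steps.reverse_each { |step| step.rollback }
  end
end

# A toy step that records what happened to it in a shared log.
class RecordingStep
  def initialize(label, log)
    @label = label
    @log = log
  end

  def start
    @log << "run #{@label}"
  end

  def rollback
    @log << "undo #{@label}"
  end
end

log = []
etl_sketch = RollbackSketch.new
etl_sketch.add_step(RecordingStep.new('extract', log))
etl_sketch.add_step(RecordingStep.new('load', log))
etl_sketch.start
etl_sketch.rollback
# log now holds: "run extract", "run load", "undo load", "undo extract"
```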

That's a quick introduction to the ETLProcess Class. The next big hurdle is how to create unit tests for the ETL steps. I'll show you how I tackled that problem in the next post.
