The ETL Step

I am currently working on a framework for ETL written in Ruby. For a brief description of why I am doing this, please read the linked post. To see all related content, view everything tagged "ETL Framework".

The foundation of my ETL framework is the ETL step. An ETL step is where the developer actually programs a specific task for the ETL to perform. An example would be to have the ETL download files from an FTP site, or to load one or more related tables. The key is that the task sits at a level at which you want to track progress.

Some steps may even be reusable. For example, you may want a step that loads files from a specific location into a processing queue for later steps to parse and load. Rather than writing one step per location, you can write a single step that takes a parameter indicating where the source files live, and then execute that step several times.

Since Ruby is, by its nature, object oriented, the ETL should be designed in an object-oriented manner. Here is a simple class diagram of the ETLStep class.

[Figure: ETLStep Class Diagram (inherited properties omitted for subclasses)]

The superclass, or parent class, is where the shared functionality lives. Here we can control bookmarking and meta information about the ETL step. Notice also that ETLStep is abstract. As in Rails and other Ruby-based frameworks, to write ETL in this framework you create a subclass of ETLStep. The subclasses require only two methods, run and rollback. The run method is where you develop your ETL process, and the rollback method is where you specify how to roll back that step. The other methods and attributes in the parent class are for additional functionality that I will talk about in another post.

This design gives us some very basic but powerful functionality when building ETL. Each step has responsibility for all the ETL that it performs. If it should fail in any way, the rollback method should clean up after the task, leaving the data environment just as it was before the task ran. Also, each ETL step inherits the ability to perform logging, access database connections, and access global information from the entire ETL process. The ETL developer doesn't have to build this; it comes through inheritance.

So now all that is needed is a way to execute the run method. This is done by the ETLProcess object. In the next post, I'll talk about the ETLProcess class and how to actually create and run the ETL.
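To make the shape of a step concrete, here is a minimal sketch of an abstract parent class and one reusable subclass. It is not the framework's actual code. Only the ETLStep name and the run and rollback methods come from the description above. Everything else is a hypothetical stand-in: the execute wrapper (which plays the role the ETLProcess object will have), the status and log helpers, and the QueueFilesStep class with its source_dir and queue parameters.

```ruby
require "tmpdir"

# Hypothetical sketch of the abstract parent class. Only the class name and
# the run/rollback contract come from the post; the rest is assumed.
class ETLStep
  attr_reader :name, :status

  def initialize(name)
    # Treat ETLStep as abstract: only subclasses may be instantiated.
    raise NotImplementedError, "ETLStep is abstract; subclass it" if instance_of?(ETLStep)
    @name = name
    @status = :pending
    @log = []
  end

  # The two methods every subclass must provide.
  def run
    raise NotImplementedError, "#{self.class} must implement run"
  end

  def rollback
    raise NotImplementedError, "#{self.class} must implement rollback"
  end

  # Assumed stand-in for what ETLProcess does: call run, and call rollback
  # if run raises, so the step cleans up after itself.
  def execute
    run
    @status = :succeeded
  rescue StandardError => e
    log("#{name} failed: #{e.message}; rolling back")
    rollback
    @status = :rolled_back
  end

  # Inherited logging helper (kept in memory for this sketch).
  def log(message)
    @log << message
  end

  def log_lines
    @log.dup
  end
end

# Hypothetical reusable step: queues every file found in source_dir.
# The same class can be executed several times with different directories.
class QueueFilesStep < ETLStep
  def initialize(name, source_dir, queue)
    super(name)
    @source_dir = source_dir
    @queue = queue
    @added = []
  end

  def run
    Dir.glob(File.join(@source_dir, "*")).sort.each do |path|
      next unless File.file?(path)
      @queue << path
      @added << path
      log("queued #{path}")
    end
  end

  # Undo only what this step added, leaving the queue as it was before run.
  def rollback
    @added.each { |path| @queue.delete(path) }
    @added.clear
  end
end

# Usage: run the step against a temporary directory holding one file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "orders.csv"), "id,total\n1,9.99\n")
  queue = []
  step = QueueFilesStep.new("queue_orders", dir, queue)
  step.execute
  puts step.status   # prints "succeeded"
  puts queue.length  # prints "1"
end
```

Passing the source directory as a constructor argument is one way to get the reuse described earlier: the same subclass can be instantiated once per location instead of being rewritten for each one.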
