Tuesday, December 30, 2008

Why do you restrict developer?

I've been seeing some strange requests from developers these days. Here are few examples.

"I have a relational source and I want to take data for a column whose value is like '%XYZ%'. I want to accomplish this without SQL override on Source Qualifier".

"I have to create a copy of my target file in the archive location. Shall I create two target object instances in the mapping to acheive this? I can't use post-session command task".

It is certainly a surpirse to know as to why they want to avoid the obvious choice. All these developers have complained about development rules and guidelines imposing these restrictions at their organization level.

I have a question to the organizations.

"If you don't want to utilize the best options provided by PowerCenter to make your development/maintenance life easier, why do you opt to use PowerCenter at all?"

With the newer versions of PowerCenter, there is lot of flexibility provided to the developer to make his life lot easier than what it used to be during PowerMart times.

Imagine you code a mapping for the first developer question about '%XYZ%' and used a filter instead of overriding the SQL. If your soure table has 100 billion records and only 10 out of them satisfy '%XYZ%' condition. This will be the worst nightmare in support perspective. These are the mappings that eat into the organisations support budget and maintenance budget.

Why restrict the developer and suffer project financial and quality hammering?

There is another big problem with this kin of corporate behaviour. The inherent innovation capabilities of the developer are supressed and there is every possibility that the developer fails to use the best option when he/she moves to other organizations which give full freedom to the developers.

If this is the situation of the experienced developers, imagine some one who begin ETL PowerCenter development in these kind of restrictive-organiations. These poor developers would not even know that a certain option can be used in PowerCenter.

I hope this articles warns the organizations of a possible danger!

Friday, December 26, 2008

Avoid sequence generator

I would always advice you to avoid sequence generator while populating an ID column in the relational target table. I suggest you rather create a sequence on the target database and enable the trigger on that table to fetch the value from the database sequence.

There are many advantages by doing this way.

  • Less number of PowerCenter objects in the object which reduce development time and also less maintenance.
  • Today you may be writing the records into that target table using PowerCenter flow. What if you get another application tomorrow that also writes records into that table? Your ID column values go bizarre in this scenario.
  • During migration of objects from Dev to QA and also to Production further, there is no extra over-head of considering the persistent values of the sequence generator from the repository database.

In all of the above cases, a sequence created in the target database would make life lot easier for the table data maintenance and also for the PowerCenter development. In fact, databases will have specific mechanisms (focused) to deal with sequences and so you can implement manual Push-down optimization on your PowerCenter mapping design for yourself.

Performance of workflow - Most feared, but the easiest

With the newer and newer versions of PowerCenter, there is added flexibility for the developer to build better-performing workflows. Though performance tuning has been the most feared part of development, it is the easiest, if the intricacies are known. We shall discuss one major problems for performance - SQL query tuning.

SQL queries/actions occur in PowerCenter in one of the below ways.

  • Relational Source Qualifier
  • Lookup SQL Override
  • Stored Procedures
  • Relational Target

Accessing an execution plan for the query is the best way to tune it. Look for any table scans and remember "Full Table Scans are not evil", "Indexes are not always fast", and “Indexes can be slow too". Analyse the table data to see if picking up 20 records out of 20 million is best using index or using table scan. Fetching 10 records out of 15 using index is faster or using full table scan is easier.

Many times the relational target indexes create problem for the performance of the actual loading of records into the relational target. It may surprise you (some times?), the way PowerCenter has reduced the time of loading the target data if you disable the indexes and then enable them in post-SQL again.

Please note that the flow has to also consider the integrity constraints when you have disabled the indexes before loading.

FTP Connection object - Gain platform independence

If you have any files to be read as source from Windows server when your PowerCenter server is hosted on UNIX/LINUX, then makes use of FTP users on the Windows server and use File Reader with FTP Connection object. This connection object can be added as any other connection string. This gives the flexibility of platform independence. This will further reduce the over-head of having SAMBA mounts on to the Informatica UNIX boxes.

Monday, December 22, 2008

"This" and "That" of Pre/Post Session command tasks

It is a very good practice to email the success or failure status of a task, once it is done. In the same way, when a business requirement drives, make use of the Post Session Success and Failure email for proper communication. The built-in feature offers more flexibility with Session Logs as attachments and also provides other run-time data like Workflow run-instance ID

Any archiving activities around the source and target flat files can be easily managed with-in the session using the post-session and pre-session commands. You can even build some resource-saving as well by issuing a gzip on the target file, to save space.
HTML Hit Counters