Thursday, January 1, 2009

Designing the mapping/workflow efficiently - part I

I wanted to write some useful design tips for a mapping and workflow to increase the performance. As PowerCenter is a wide ocean and there can't be any exhaustive list of best practices. To address this, I want to add to this list in phases and that is the reason, this blog article is named as "part - I".

Please note that these are not any rules-of-thumb, but would lead you into a sensible decision while developing interfaces in PowerCenter. Nevertheless, the design decisions are to be driven by the architectural, organizational and business requirements.
  1. I would always suggest you to think twice before using an Update Strategy, though it adds me certain level of flexibility in the mapping. If you have a straight-through mapping which takes data from source and directly inserts all the records into the target, you wouldn’t need an update strategy.
  2. Use a pre-SQL delete statement if you wish to delete specific rows from target before loading into the target. Use truncate option in Session properties, if you wish to clean the table before loading. I would avoid a separate pipe-line in the mapping that roans before the load with update-strategy transformation.
  3. You have 3 sources and 3 targets with one-on-one mapping. If the load is independent according to business requirement, I would create 3 different mappings and 3 different session instances and they all run in parallel in my workflow after my “Start” task. I’ve observed that the workflow runtime comes down between 30-60% of serial processing.
  4. PowerCenter is built to work of high volumes of data. So let the server be completely busy. Induce parallelism as far as possible into the mapping/workflow.
  5. Needless to say, if any transformation waits for complete source data to be arrived in before it can proceed further in the mapping, use a sorter transformation to speed up the process. For eg: an Aggregator transformation, a Joiner transformation etc will perform well if sorted data is given to them and that a sorter transformation at the start will make the efficient.
  6. Push as much filtering as possible into the SQL override of the source-qualifier/lookup to make the database handle most of the filtering as databases are experts in doing that stuff than using a filter transformation.
  7. As I always say, the above task is similar to manual push-down optimization.

As written in the start of this article, we will continue to add to the list. Please make your contributions to this and we shall do our best to to make Informatica world generate efficient code for PowerCenter.

10 comments:

Lalit Sharma said...

Good Stuff...

Radhakrishna Sarma said...

Thank you Lalit.

Cheers
Sarma.

Pushkar Kumar Singh said...

Nice and highly informative.

Regarding point 3, I think there is one more way to do.

Instead of creating 3 distinct sessions, we will individually read these 3 sources, in a mapping itself, and then will connect these pipelines using Union transformation to form a load order group, thereby running these pipelines concurrently. After Union transformation, we will use a Router transformation to trifurcate the data into 3 downstream lines, one for each target. Surely, we have to apply some ingenuity here in order to use Union transformation, but we will resultantly save overhead -- time and resources -- in creation of extra 2 DTM processes and their individual activites like verifying connection objects, etc.. However, we do need to consider some tradeoffs like distribution of one DTM buffer memory among 3 pipelines, against previous avaiability of one DTM buffer memory to each pipeline. Then, extra overhead incurred on execution of Union and Router transformation. I hope I m making sense.

Above outline is a hypothesis, and I don't know how much it will be prove to be cost-effective.

Regards,
Pushkar

Radhakrishna Sarma said...

You are right. This is at least better than running 3 sessions in sequence or creating 3 independent pipelines in the mapping running one after the other.

As long as you avoid keeping the server idle to whatever extent possible, it is good.

If you had created the mapping with union-router method, then if you still are not satisfied by the performance, then you can switch to what I had suggested and vice-versa.

Thanks anyway, for reviewing the blog.

Cheers
Sarma.

sean said...

VERY HELPFUL INFORMATION.
THANKS RADHAKRISHNA.

Radhakrishna Sarma said...

Thank you Sean for visiting my blog. Keep looking and I will updat the contents whenever possible.

Cheers
Sarma.

sowmya said...

nice stuff,,keep going whihc helps us a lott

Radhakrishna Sarma said...

You are welcome Sowmya. Keep visiting the blog for more information.

Cheers
Sarma.

Sudheer said...

Good Stuff

how to earn money from online said...

Hi Sharma Nice blog


what is the pushdown optimisition in detail please explaine to me .

Thanks in adv

 
HTML Hit Counters