Saturday, February 16, 2008

Informatica PowerCenter performance - Lookups

Lookup performance

Lookup is an important and useful transformation when used effectively.
What is a Lookup transformation? It is not just another transformation that fetches data to check against the source data; used improperly, it can make your flow run for ages.

Here I try to explain the different scenarios in which you can face problems with a Lookup, and how to tackle them.


Unwanted columns:

By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. Be sure to delete the unwanted columns from the lookup, as they bloat the lookup cache considerably. You only need the columns that are used in the lookup condition and the ones that have to be returned from the lookup.
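As a sketch (the CUSTOMERS table and its columns here are made up for illustration), trimming the generated lookup query down to the condition and return columns can shrink each cached row dramatically:

```sql
-- Default query PowerCenter generates: one SELECT column per lookup port.
SELECT CUST_ID, CUST_NAME, ADDRESS, PHONE, FAX, CREATED_DT, UPDATED_DT
FROM   CUSTOMERS

-- After deleting the unwanted ports, only the condition column (CUST_ID)
-- and the return column (CUST_NAME) remain, so each cached row is small.
SELECT CUST_ID, CUST_NAME
FROM   CUSTOMERS
```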

SQL query:

We will start at the database. Get the execution plan of the SQL override and see whether you can add indexes or hints to make the query fetch data faster. You may have to take the help of a database developer to accomplish this if you are not an SQL person yourself.
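On Oracle, for instance, you can inspect the plan of the override query like this (the table, column, and index names are hypothetical):

```sql
EXPLAIN PLAN FOR
SELECT CUST_ID, CUST_NAME
FROM   CUSTOMERS
WHERE  REGION = 'APAC';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- If the plan shows a full table scan on a selective predicate,
-- an index (or, failing that, a hint) may speed up the fetch:
CREATE INDEX IDX_CUSTOMERS_REGION ON CUSTOMERS (REGION);
```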

Size of the source versus size of lookup:

Let us say you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). PowerCenter builds the cache for the lookup table and then checks the 10 source rows against that cache. It takes more time to build a cache of 1 million rows than to go to the database 10 times and look up against the table directly.

Use an uncached lookup instead of building a static cache, because the number of source rows is far smaller than the number of lookup rows.
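To illustrate, with an uncached lookup PowerCenter issues one query of roughly this shape per source row (names are made up); ten indexed single-row fetches are far cheaper than caching a million rows:

```sql
SELECT CUST_NAME
FROM   CUSTOMERS
WHERE  CUST_ID = ?   -- bound to the lookup input port for each source row
```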

Conditional call of lookup:

Instead of going for connected lookups with filters for a conditional lookup call, go for an unconnected lookup. Is the single-column return bothering you? Go ahead and change the SQL override to concatenate the required columns into one big column, and break it back into individual columns on the calling side.
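A sketch of the trick, with made-up names and '~' as an arbitrary delimiter: the unconnected lookup's SQL override returns the concatenation, and an Expression transformation on the calling side splits it back into individual ports.

```sql
-- Lookup SQL override: several columns packed into one return port.
SELECT CUST_NAME || '~' || CITY || '~' || PHONE  AS LKP_RETURN,
       CUST_ID
FROM   CUSTOMERS

-- Calling side (Informatica expression language, names illustrative):
--   v_ret  := :LKP.LKP_CUSTOMERS(CUST_ID)
--   o_name := SUBSTR(v_ret, 1, INSTR(v_ret, '~', 1, 1) - 1)
--   o_city := SUBSTR(v_ret, INSTR(v_ret, '~', 1, 1) + 1,
--                    INSTR(v_ret, '~', 1, 2) - INSTR(v_ret, '~', 1, 1) - 1)
```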

JOIN instead of Lookup:

In the same context as above, if the Lookup transformation comes right after the Source Qualifier with no active transformation in between, you can instead use the SQL override of the Source Qualifier and join to the lookup table with a traditional database join, provided both tables are in the same database and schema.
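For example (ORDERS and CUSTOMERS are hypothetical tables in the same schema), the Source Qualifier override could do the join itself; a left outer join preserves the lookup's behaviour of returning NULLs when no match exists:

```sql
SELECT O.ORDER_ID,
       O.CUST_ID,
       O.AMOUNT,
       C.CUST_NAME
FROM   ORDERS O
LEFT OUTER JOIN CUSTOMERS C
  ON   C.CUST_ID = O.CUST_ID
```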

Increase cache:

If none of the above seems to work, then the problem is certainly with the cache. The cache you assigned for the lookup is not sufficient to hold the data or the index of the lookup. Whatever data doesn't fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter doesn't find the data you are looking up in the in-memory cache, it swaps data between the files and the cache until it finds it. Being an I/O operation, this is quite expensive. Increase the cache so that the whole data resides in memory.

What if your data is huge and your whole system memory is less than that? Don't promise PowerCenter an amount of cache that can't be allotted at runtime. If you promise 10 MB and, at runtime, the system the flow is running on runs short of memory and can only assign 5 MB, PowerCenter fails the session with an error.

Cachefile file-system:

In many cases, if the cache directory is on a different file system than that of the hosting server, writing out the cache files may take time and add latency. Try to look into this aspect as well, with the help of your system administrator.

Useful cache utilities:

If the same lookup SQL is used in some other lookup, go for a shared cache, or reuse the lookup transformation. Also, if you have a table into which data doesn't get inserted or updated very often, use a persistent cache, so that consecutive runs of the flow don't have to rebuild the cache and waste time.

After all these are tried out, sit back and wait for the session to complete - there is nothing else you can do about the lookup.

66 comments:

Unknown said...

Sarma, It's really helpful. Thanks a lot.

-Maruthi

Venkata Subbaiah M T said...

Sarma, Can you please explain in brief how to do this?

-Venkat

Radhakrishna Sarma said...

Hi Venkata,

I've explained it in other thread. Please have a look in to this.

http://radhakrishnasarma.blogspot.com/2008/02/about-me.html?showComment=1219815900000#c7915919391761702380

Cheers
Sarma.

Venkata Subbaiah M T said...

Sarma, Thanks for your quick reply.

-Venkat.

Anonymous said...

Hello Sarma,

Thanks for your blog. I am facing a issue with lookup transformation on a file.

Mapping Structure:

Source Flat File --> expr Trans --> lkp file --> Filter --> Lookup -->Expr Trans --> Target File

We have 5 records in the source file, 2 records in the first lookup file, 5 records in the second lookup file.

This mapping is generating following error on execution - SORT 40406 in transformation..an error has occured(There are only 32 megabytes of process rss space available.Either Increase the swap space or decrease the cache size in the transformation.

Can you please help me with this error.

my contact details - maneesahmed@gmail.com

Radhakrishna Sarma said...

Hi Anees,

The error clearly says that, during runtime, PowerCenter expects more memory than expected for the transformation and so, change the cache size for the lookups to be in the limit.
I hope this helps. By the way, with such few records of 2,5 etc, I'm just wondering how you get cache overload error.

Cheers
Sarma

Santosh said...

is there anyway to synchronise the dynamic lookup cache among partitions...If yes please let me know..

Even I posted a query in ittoolbox. Please go through this url:

http://datawarehouse.ittoolbox.com/groups/technical-functional/informatica-l/synchronisingsharing-dynamic-lookup-cache-between-partitions-2759474

Thanks,

Unknown said...

Hi Sarma,

Thanks for support regarding fine tuning mappings & Sessions.
I need a the PPT of Informatica 8.6 features.
My Mail id: ratnajibv@yahoo.co.in

Thanks in advance.

Regards,
Ratnaji

Radhakrishna Sarma said...

Hi Ratnaji,

I've forwarded you the 8.x ppt that I've prepared to the email address you mentioned.

Cheers
Sarma.

Krishna Reddy said...

Hi Sharma,

I am new to informatica(ETL), i was not able to find the useful info through google.

Could you please forward the Informatica PPT's to my email id : krishna.ykr.reddy@gmail.com

Thanks,
Satish

Radhakrishna Sarma said...

Hi Satish,

I've forwarded you the ppt, but I've prepared it assuming that the person who sees it has prior experience with Informatica versions like 6.x or 7.x. Anyway, I have sent it to your email address.

Cheers
Sarma.

Krishna Reddy said...

Thanks a lot Sarma!!!

Unknown said...

Hi,

I have to tune my mapping. The issue is the source it takes long time to read from the source. The source is a table in informix with
35 million rows. how to tune the informix or there is any other way i can read data so that my mapping takes less time. it is taking 15 hr to complete one run.

source->LKP->EXP->LKP->EXP->FLAT FILE TGT 2UnCLKP

Abhay Sinha said...

Hi Sarma,
Its a realy informative Blob. Could you please send me Informatica PPT to my id ? ak561c@att.com
Thanks in advance.

Regards,
Abhay Sinha

Unknown said...

Hi Sarma,
Your blog is really excellent and full of informations.
Could you please send me the ppt to my mail id : niranjan.n123@gmail.com

Sudha said...

Hi Sarma,
Your blog is really help full. Keep up your work
-Sudha

Dhina said...

Hi Sarma,

I am trying to use dynamic lookup cache for refreshing my employee master.

Its inserting new rows in cache and returning newLookupRow as 1. Its doing no change in cache for existing rows and returning newLookupRow as 0.

But for cases when the first_name of my existing employee (say employee id 100) is getting changed in source data, this case is also returning NewLookupRow as 0 (ideally it should be 2 because my row has changed).

My lookup condition is lkp_employee_id = in_employee_id

Other lookup properties I have set are:
Enable Dynamic lookup is checked.
Insert else Update is checked.

In session level properties I have checked "Insert" and "Update as Update".
In session level properties I have selected "Data driven" for "treat source rows as" property.

Please let me know in case any other information is needed.

It will be great help to provide me with some hints to solve this issue.

Thanks,
Dhina.

sowmya said...

How can I use an ORDER BY clause in a Lookup transformation?

Sreenath said...

Hi Sarma,

Could you please forward me informatica 8.x PPT if you have it to r.sreenath@gmail.com

naveen said...

Hi.
Could you please send informatica ppt.

my email id is bo.rlny@gmail.com

Thanks in advance
Naveen

Unknown said...

Hi Sarma,

Your blog is very informative, could you please forward me informatica PPT to shawnafridi@yahoo.com

Thanks,
Shawn

Anonymous said...

I am new to informatica and enjoying your blog - very informative, could you please forward me informatica 8 PPT to zookjjb@aol.com

Anonymous said...

I have several large volume tables that are used as look ups multiple times in multiple maps. Currently, I got all these look ups as Persistent Cache and I build these caches first everyday before the mappings that use these caches are ran. But the cache building process is taking really long every day and I am wondering if there is any alternative to this? One possible option that I can think of is use flat-files instead of caches as follows:
1. Load the entire data that I am interested in with proper filters etc, from tables into flat-files, sorting them by the columns of interest [using unix, may be?].
2. Point the look ups into flat-files instead of caches.

I am not positive as how much of time I can save by implementing this and how effective this would be as against persistent caches.

Can you please provide some details on these? -- metalmt@yahoo.com

Rajeev said...

Hello Radhakrishna,

This is Rajeev and new to Informatica. Could you please share Informatica 7x/8x PPT (Basics to Advance). It would be of a great help. Now, a days i am learning Informatica 7X series along with i plan to learn Cognos/BO.

My email id: raj.bits20@gmail.com

Cheers,
Rajeev

Anup said...

Hi Sarm,

Lookup information was quite useful, I am also new to this ETL tool, Can you please forward me the PPT for all the Transformation and some scenoria to work on the all the Transformation.

My emailid : anupckd872@gmail.com

Radhakrishna Sarma said...

Hi Anup,

The transformation guide from Informatica PowerCenter help pages will tell you all the possible uses of each transformation. If you have any specific question, please post it here.

Cheers
Sarma.

Unknown said...

Hi Sarma,

Your blog is very informative,i was unable to get the consolidated info about informatica 8.x new features and architecture.... can u please forward me informatica PPT to laddagiri.archana@gmail.com

Thanks,
Archana

jim said...

Hi Sharma, Thanks for sharing such wonderful information in informatica performance. could you please forward the PPTs to me. my e-mail is: jimmymy@gmail.com.

thnanks
jimmy.

Shalini said...

Thank you for the information. could you please send the PPT of Informatica 8.6 features to my Email ID: amalshalini@yahoo.com

Shalini said...

Also the performance related PPTs...

Ranjul Gupta said...

Hello Sarma,

Your blog gives us a great insight about the Lookup & various ways to use it.
Can you please send the PPT to my email id is ranjul.gupta@gmail.com

Unknown said...

Can you send me those PPT's to my email id please..!
Kumarvijay.walikar@gmail.com

Anonymous said...

Hi Sarma,

I have a question regarding the lookup cache file size for a process. I need to find out total size of all the lookup cache files(.idx & .dat) for each lookup transformation. How can I find it? and will there be any such information in session log, if yes, how to locate it.

can you pls. reply me on my email.
Thanks

Kalycs
(kalycs.dwh@gmail.com)

Kal said...

Hi Sarma,

I have a source(3.3 million) which has 30 cols and in denormalized format.
I have no problem in normalizing the above source, however the requirement is out of 30 columns there are some columns which are related for example: col "LAN" and col "LANGUAGE". so col "LANGUAGE" will be used as description for col "LAN" and there are few of them.

I got the logic and I successfully implemented the requirement in a mapping using "DYNAMIC SQL" in "STORED PROCEDURE TRANSFORMATION" now my new issue is related to performance, as we all know that the performance will be reduced with "STORED PROCEDURE TRANSFORMATION" as I am calling this transformation for around 50 million records.


I am trying couple of things with LOOKUP transformation
1. Trying to select a dynamic column, which is the same as the input value.

Eg: SELECT ACCT_DESC_in FROM TABLE_A
WHERE PK_COL= PK_COL_IN.

Here ACCT_DESC_in is not a column in TABLE_A, however, the value in the ACCT_DESC_in is the column name of the TABLE_A
I tried with DYNAMIC SQL in the STORED PROCEDURE TRANSFORMATION and it is working fine and want to try with LOOKUP TRANSFORATION.

If the above option is not possible, then

2. I have to use the source TABLE_A for lookup transformation, and I have to include 17 of the same lookups(as i cannot use the dynamic column selection).(source table has 3.3 million records and 30 columns) is there any way we can build the cache for the TABLE_A one time and can be used between 17 lookups in the same mapping?? which will improve the performance in a big way.

can you please help me out.

Anonymous said...

Hi,

Can u pls frwd me the ppt at msinharay@gmail.com...Also wanted to know how unconnected lookup transformation is faster/efficient than connected lookup

Unknown said...

Hi Sarma,
I am using an unconnected lookup in mapping having approx 9 million rows and have approx 30k rows in the source, i think its better to go for uncahed loopkup in this case what do you suggest?

Anonymous said...

Sarma, Could you please send me the Informatica 8 architecture PPT if you have any.
MY email address is vinay4luvv@yahoo.com

Anonymous said...

Hi Radhakrishna,

Can you please forward your 8.X informatica ppt, which you have prepared to indypoli@gmail.com

Thanks for your help.

Thanks,
Praveen

Anonymous said...

hi sir
i need to know without using sorter t\f how can we sort data.
2.how the confirmed dimension is used in a fact table?
3.

Meena said...

Hi Sarma,

I am tuning one of the mapping in the informatica 6.The developers had made used of the informatica standard mapplets. Inside the mapplets, there are about 40 to 50 lookups. How could i improve the performance in this scenario

Thanks,
Meena

Meena said...

Hi Sarma,

This is the exact problem of mine.

I am using Informatica Powercenter 6.1 and the mappings are making use of the standard mapplets. In the standard mapplets, there are about 45 unconnected lookups in order to retrieve the values. My requirement is to tune the performance of this mapping. How to achieve the better performance by retaining the same design or to better address the problem wrt lookups

Many Thanks,
Meena

Learning Guy! said...

Hi Radha,
I have always this question in mind but never got concrete answers.Suppose I am in the middle of a mapping, where I need to do either a join/lookup on some table. Can you please tell the different judgement criteria to decide whether should Opt for lookup or joiner? Many thanks.

Thanks!
Ashish

Debojyoti said...

Hi Sarma

I was following your excellent blog..If possible can you mail me your informatica PPT @ debo_fti@yahoo.com
I appreciate your help in advance

Debo

Unknown said...

Hi Sarma

I was following your excellent blog..If possible can you mail me your informatica PPT @ narenrawat203@rediffmail.com
I appreciate your help in advance

Narendra

Anonymous said...

Hi Radhakrishna...can you help me where to check a workflow is scheduled or not...I referred VIEW REP_WORKFLOWS, but it has only scheduler start_time...

Unknown said...

Hi Radhakrishna,

Please can you forward PPT for performance tuning to my mail id rajkumar.bachu@gmail.com

Thanks in Advance
Rajkumar.

Unknown said...

Hi Sharma,

Ur blog is wonderful nd helpful even fresher can find it easy to grasp,much appreciated.Can you please send me the doc,i hv seen some readers have requested u to send the doc ,me too wants to go through it. My id is rajeshkumar.pandey@rbs.com

Cheers
Rajesh

Unknown said...

Hi Radhakrishna,

Could you please forward informatica 8.x ppt to me.
my id is sujatha_enrichit@yahoo.com

Thanks,
Sujatha

Mahesh said...

Hi Radhakrishna,

I have been following your excellent blog. If possible can you mail me your informatica PPT to sunjeye@gmail.com.

I appreciate your help in advance


Sunjeye

Anonymous said...

Can you send me the PPT please at anuguru23@hotmail.com,as early as possible.

Informatica interview said...

Thanks for sharing valuable informations,

Informatica help

Nagarjuna said...

Hi Radha Krishna,

How are you. I think you remember me when i am in south africa i asked a doubt regarding XML Source system. Ok..

Please mail Informatica 8 Architecture at pv.nagarjuna@gmail.com

Srikanth said...

Dear Radhakrishna,
I am Srikanth and I have been a regular visitor to your site.
I have few queries which I present it below. Request you to please send your comments to coolsreek@gmail.com. Thanks in advance
1) Is using Mapplets in a position to process loads through informatica much faster? If so, How?
2) How to use Informatica 9.0 to process on a daily basis 50 Million per day in some case within the batch window cycle of say around 6 hrs.What are the components need to be used and anyother stretegy there?

CP said...

Hi,

I wanted to ask, ' When we say that First record comes from the source and hits the lkp, the Cache is created and henceforth the rest of the process',

where does the first row from source go i.e. how does INFA know the first record has arrived and the cache has to be created now. Where does this first row gets checked or with what does it get checked?? Hope i am making sense here, please let me know if not!!


Regards,
Cp

maheboob said...

Hi sarma,
ur blog is very informative. Can u add real time scenario in ur blog.

Can u mail me informatica ppt & architecture to my mail id

reachmaheboob@gmail.com

Sreeya said...

Hi Sarma,

Could you please forward the Informatica PPT's to my email id sreeya7@yahoo.com

-
Sreeya

Anonymous said...

Hey Sarma,

Can you please forward the Informatica PPT's to my email id: boussak@yahoo.fr
Can you explain me the difference between LookUp Types ? and when we use one and not the other ?
I really appreciate your help, and your blog.

Thanks,

Bouba

Anonymous said...

Hi Sarma,
I'm getting an error in informatica when I'm using a XML Target. "cache manager was unable to allocate a new block" Can you please help me out with this. My email address is ram.thunda@gmail.com

Thanks
