Lookup performance
Lookup is an important and useful transformation when used effectively. What is a Lookup transformation? It is not just another transformation that fetches data to check against the source data; used improperly, it can make your flow run for ages.
Below, I try to explain different scenarios where you can face problems with Lookup, and how to tackle them.
Unwanted columns:
By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. Be sure to delete the unwanted columns from the lookup, because they bloat the lookup cache. You only need the columns that are used in the lookup condition and the ones that have to be returned from the lookup.
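To get a feel for how much this saves, here is a rough sketch, with SQLite standing in for the lookup database (the emp table and all of its columns are invented for illustration): every column the override selects becomes a column PowerCenter must hold for every cached row.

```python
import sqlite3

# Hypothetical lookup table; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (
    emp_id INTEGER, first_name TEXT, last_name TEXT,
    dept_id INTEGER, salary REAL, hire_date TEXT, comments TEXT)""")
conn.execute("INSERT INTO emp VALUES (1, 'A', 'B', 10, 5000.0, "
             "'2008-01-01', 'long free text...')")

# Default lookup override: every column is selected, so every
# column ends up in the lookup cache.
default_cols = [d[0] for d in conn.execute("SELECT * FROM emp").description]

# Trimmed override: keep only the condition column (emp_id) and the
# return column (dept_id); the rest never enters the cache.
trimmed_cols = [d[0] for d in conn.execute(
    "SELECT emp_id, dept_id FROM emp").description]

print(len(default_cols), len(trimmed_cols))  # 7 vs 2 columns cached per row
```

With a million-row lookup table, caching 2 narrow columns instead of 7 (one of them free text) is the difference between a cache that fits in memory and one that spills to disk.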
SQL query:
We will start at the database. Examine the execution plan of the SQL override and see whether you can add indexes or hints to make the query fetch data faster. You may need the help of a database developer to accomplish this if you are not comfortable with SQL yourself.
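If you want to experiment with this outside the database console, here is a minimal sketch using SQLite as a stand-in (the lkp_customer table and the index are invented; Oracle, DB2, and the like each have their own EXPLAIN PLAN syntax and output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lkp_customer (cust_id INTEGER, cust_name TEXT)")
conn.executemany("INSERT INTO lkp_customer VALUES (?, ?)",
                 [(100, "Acme"), (200, "Globex")])

# Without an index, the lookup query has to scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT cust_name FROM lkp_customer WHERE cust_id = 100"
).fetchall()[0][-1]
print(plan_before)   # e.g. "SCAN lkp_customer"

# Index the lookup-condition column, then check the plan again.
conn.execute("CREATE INDEX idx_lkp_cust ON lkp_customer (cust_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT cust_name FROM lkp_customer WHERE cust_id = 100"
).fetchall()[0][-1]
print(plan_after)    # e.g. "SEARCH lkp_customer USING INDEX idx_lkp_cust"
```

The point is the same in any database: index the columns that appear in the lookup condition, and verify with the plan that the index is actually being used.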
Size of the source versus size of lookup:
Let us say you have 10 rows in the source, and one of the columns has to be checked against a big table (1 million rows). PowerCenter builds the cache for the lookup table and then checks the 10 source rows against the cache. It takes more time to build a cache of 1 million rows than to go to the database 10 times and look up against the table directly.
Use an uncached lookup instead of building the static cache, because the number of source rows is far smaller than the number of lookup rows.
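The trade-off can be put into a toy cost model; every constant below is invented for illustration, so measure your own environment before deciding:

```python
# A back-of-the-envelope cost model. The constants are made up;
# real numbers depend on your database, network, and row width.
CACHE_BUILD_COST_PER_ROW = 0.001   # seconds to read one lookup row into the cache
DIRECT_QUERY_COST = 0.05           # seconds per round trip for an uncached lookup

def cached_cost(lookup_rows):
    # Static cache: pay to read the whole lookup table once.
    return lookup_rows * CACHE_BUILD_COST_PER_ROW

def uncached_cost(source_rows):
    # Uncached lookup: one database round trip per source row.
    return source_rows * DIRECT_QUERY_COST

# 10 source rows against a 1-million-row lookup table:
print(cached_cost(1_000_000))   # ~1000 seconds just to build the cache
print(uncached_cost(10))        # ~0.5 seconds for 10 direct queries
```

The break-even point shifts with network latency and row width, which is why this is a judgement call rather than a fixed rule.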
Conditional call of lookup:
Instead of going for connected lookups with filters for a conditional lookup call, go for an unconnected lookup. Is the single-column return a problem here? Go ahead and change the SQL override to concatenate the required columns into one big column, then break them apart into individual columns again on the calling side.
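A rough sketch of the concatenate-and-split trick (SQLite stands in for the lookup database; the dept table and the '|' delimiter are assumptions, so pick a delimiter that can never occur in your data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT, location TEXT)")
conn.execute("INSERT INTO dept VALUES (10, 'SALES', 'NYC')")

# Lookup SQL override: concatenate the return columns into one big
# column, using a delimiter that cannot appear in the data.
row = conn.execute(
    "SELECT dept_name || '|' || location AS big_col "
    "FROM dept WHERE dept_id = 10"
).fetchone()

# On the calling side, an Expression transformation would break the
# single return port apart again (SUBSTR/INSTR in PowerCenter;
# Python's str.split stands in for that here).
dept_name, location = row[0].split("|")
print(dept_name, location)  # SALES NYC
```

In the mapping itself, the unconnected lookup returns big_col, and the SUBSTR/INSTR expressions in the caller recover the individual values.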
JOIN instead of Lookup:
In the same context as above, if the Lookup transformation comes right after the Source Qualifier with no active transformation in between, you can instead use the SQL override of the Source Qualifier and join to the lookup table with a traditional database join, provided both tables are in the same database and schema.
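A minimal sketch of what that override amounts to (SQLite as a stand-in; table and column names are invented). A LEFT JOIN mirrors lookup semantics, since an unmatched source row should still pass through with NULLs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER);
CREATE TABLE lkp_customer (cust_id INTEGER, cust_name TEXT);
INSERT INTO src_orders VALUES (1, 100), (2, 200);
INSERT INTO lkp_customer VALUES (100, 'Acme'), (200, 'Globex');
""")

# Instead of Source Qualifier -> Lookup, fold the lookup into the
# Source Qualifier's SQL override as a plain database join:
rows = conn.execute("""
    SELECT o.order_id, o.cust_id, c.cust_name
    FROM src_orders o
    LEFT JOIN lkp_customer c ON c.cust_id = o.cust_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(1, 100, 'Acme'), (2, 200, 'Globex')]
```

The database does the matching in one pass, and no lookup cache is built at all.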
Increase cache:
If none of the above seems to work, then the problem is almost certainly with the cache: the cache you assigned for the lookup is not sufficient to hold the lookup's data or index. Whatever does not fit into the cache spills into the cache files created in $PMCacheDir. When PowerCenter doesn't find the data you are looking up in the in-memory cache, it swaps data between the cache files and the cache, and keeps doing this until it finds the data. This is quite expensive for obvious reasons, being an I/O operation. Increase the cache so that the whole data set resides in memory.
What if your data is huge and your system's available memory is smaller than that? Don't promise PowerCenter an amount of cache that cannot actually be allotted at runtime. If you promise 10 MB but, during the run, the machine hosting the flow runs short of memory and can only assign 5 MB, PowerCenter fails the session with an error.
Cachefile file-system:
In many cases, if the cache directory is on a different file system from the one hosting the server, piling up the cache files may take time and cause latency. So, with the help of your system administrator, try to look into this aspect as well.
Useful cache utilities:
If the same lookup SQL is used in some other lookup, go for a shared cache or reuse the lookup. Also, if you have a table that rarely gets data inserted or updated, use a persistent cache; consecutive runs of the flow then don't have to rebuild the cache and waste time.
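A persistent cache behaves roughly like this sketch (Python's pickle standing in for PowerCenter's .dat/.idx cache files; all names are made up): the first run pays to build the cache, and later runs just load it.

```python
import os
import pickle
import sqlite3
import tempfile

# Hypothetical stand-in for a persistent lookup cache file.
cache_file = os.path.join(tempfile.gettempdir(), "lkp_country.pkl")

def build_cache():
    # First run: read the whole lookup table into memory (the
    # expensive step a persistent cache lets later runs skip).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE country (code TEXT, name TEXT)")
    conn.executemany("INSERT INTO country VALUES (?, ?)",
                     [("IN", "India"), ("ZA", "South Africa")])
    return dict(conn.execute("SELECT code, name FROM country"))

if os.path.exists(cache_file):
    with open(cache_file, "rb") as f:      # consecutive run: reuse the cache
        cache = pickle.load(f)
else:
    cache = build_cache()                  # first run: build and persist it
    with open(cache_file, "wb") as f:
        pickle.dump(cache, f)

print(cache["IN"])
```

The catch is the same as with PowerCenter's named persistent caches: you must rebuild the cache whenever the underlying lookup table actually changes, or the lookups return stale data.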
After all these are tried out, sit and wait for the session to get completed - there is nothing else you can do about the lookup.
Saturday, February 16, 2008
66 comments:
Sarma, It's really helpful. Thanks a lot.
-Maruthi
Sarma, Can you please explain in brief how to do this?
-Venkat
Hi Venkata,
I've explained it in another thread. Please have a look at this:
http://radhakrishnasarma.blogspot.com/2008/02/about-me.html?showComment=1219815900000#c7915919391761702380
Cheers
Sarma.
Sarma, Thanks for your quick reply.
-Venkat.
Hello Sarma,
Thanks for your blog. I am facing an issue with a lookup transformation on a file.
Mapping Structure:
Source Flat File --> Expr Trans --> Lkp File --> Filter --> Lookup --> Expr Trans --> Target File
We have 5 records in the source file, 2 records in the first lookup file, and 5 records in the second lookup file.
This mapping generates the following error on execution: SORT 40406 in transformation... an error has occurred (There are only 32 megabytes of process rss space available. Either increase the swap space or decrease the cache size in the transformation).
Can you please help me with this error.
my contact details - maneesahmed@gmail.com
Hi Anees,
The error clearly says that, at runtime, PowerCenter needs more memory for the transformation than is available, so change the cache size for the lookups to stay within the limit.
I hope this helps. By the way, with so few records (2, 5, etc.), I'm just wondering how you get a cache overflow error.
Cheers
Sarma
Is there any way to synchronise the dynamic lookup cache among partitions? If yes, please let me know.
I have also posted a query on ITtoolbox. Please go through this URL:
http://datawarehouse.ittoolbox.com/groups/technical-functional/informatica-l/synchronisingsharing-dynamic-lookup-cache-between-partitions-2759474
Thanks,
Hi Sarma,
Thanks for support regarding fine tuning mappings & Sessions.
I need the PPT of the Informatica 8.6 features.
My Mail id: ratnajibv@yahoo.co.in
Thanks in advance.
Regards,
Ratnaji
Hi Ratnaji,
I've forwarded you the 8.x ppt that I've prepared to the email address you mentioned.
Cheers
Sarma.
Hi Sharma,
I am new to Informatica (ETL), and I was not able to find useful info through Google.
Could you please forward the Informatica PPT's to my email id : krishna.ykr.reddy@gmail.com
Thanks,
Satish
Hi Satish,
I've forwarded you the ppt, but I've prepared it assuming that the person who reads it has prior experience with Informatica versions like 6.x or 7.x. Anyway, I have sent it to your email address.
Cheers
Sarma.
Thanks a lot Sarma!!!
Hi,
I have to tune my mapping. The issue is the source: it takes a long time to read from it. The source is a table in Informix with 35 million rows. How do I tune Informix, or is there any other way I can read the data so that my mapping takes less time? It is taking 15 hours to complete one run.
source -> LKP -> EXP -> LKP -> EXP -> FLAT FILE TGT (2 unconnected LKPs)
Hi Sarma,
It's a really informative blog. Could you please send me the Informatica PPT to my id? ak561c@att.com
Thanks in advance.
Regards,
Abhay Sinha
Hi Sarma,
Your blog is really excellent and full of information.
Could you please send me the ppt to my mail id : niranjan.n123@gmail.com
Hi Sarma,
Your blog is really helpful. Keep up the good work.
-Sudha
Hi Sarma,
I am trying to use dynamic lookup cache for refreshing my employee master.
It's inserting new rows in the cache and returning NewLookupRow as 1. It's making no change in the cache for existing rows and returning NewLookupRow as 0.
But when the first_name of an existing employee (say, employee id 100) is changed in the source data, this case also returns NewLookupRow as 0 (ideally it should be 2, because my row has changed).
My lookup condition is lkp_employee_id = in_employee_id
Other lookup properties I have set are:
Enable Dynamic lookup is checked.
Insert else Update is checked.
In session level properties I have checked "Insert" and "Update as Update".
In session level properties I have selected "Data driven" for "treat source rows as" property.
Please let me know in case any other information is needed.
It will be great help to provide me with some hints to solve this issue.
Thanks,
Dhina.
How can I use an ORDER BY clause in a Lookup transformation?
Hi Sarma,
Could you please forward me informatica 8.x PPT if you have it to r.sreenath@gmail.com
Hi.
Could you please send informatica ppt.
my email id is bo.rlny@gmail.com
Thanks in advance
Naveen
Hi Sarma,
Your blog is very informative, could you please forward me informatica PPT to shawnafridi@yahoo.com
Thanks,
Shawn
I am new to informatica and enjoying your blog - very informative, could you please forward me informatica 8 PPT to zookjjb@aol.com
I have several large-volume tables that are used as lookups multiple times in multiple maps. Currently, I have all these lookups as persistent caches, and I build the caches first every day before the mappings that use them are run. But the cache-building process is taking really long every day, and I am wondering if there is any alternative. One possible option I can think of is to use flat files instead of caches, as follows:
1. Load the entire data that I am interested in, with proper filters etc., from the tables into flat files, sorting them by the columns of interest [using Unix, maybe?].
2. Point the lookups at the flat files instead of the caches.
I am not positive how much time I can save by implementing this, or how effective it would be compared with persistent caches.
Can you please provide some details on these? -- metalmt@yahoo.com
Hello Radhakrishna,
This is Rajeev, and I am new to Informatica. Could you please share the Informatica 7.x/8.x PPT (basics to advanced)? It would be a great help. Nowadays I am learning the Informatica 7.x series, and I also plan to learn Cognos/BO.
My email id: raj.bits20@gmail.com
Cheers,
Rajeev
Hi Sarm,
The lookup information was quite useful. I am also new to this ETL tool. Can you please forward me the PPT for all the transformations, and some scenarios to work on for each transformation?
My emailid : anupckd872@gmail.com
Hi Anup,
The transformation guide from Informatica PowerCenter help pages will tell you all the possible uses of each transformation. If you have any specific question, please post it here.
Cheers
Sarma.
Hi Sarma,
Your blog is very informative. I was unable to get consolidated info about the Informatica 8.x new features and architecture. Can you please forward me the Informatica PPT to laddagiri.archana@gmail.com
Thanks,
Archana
Hi Sharma, thanks for sharing such wonderful information on Informatica performance. Could you please forward the PPTs to me? My e-mail is: jimmymy@gmail.com.
Thanks,
Jimmy.
Thank you for the information. could you please send the PPT of Informatica 8.6 features to my Email ID: amalshalini@yahoo.com
Also the performance related PPTs...
Hello Sarma,
Your blog gives us a great insight about the Lookup & various ways to use it.
Can you please send the PPT to my email id is ranjul.gupta@gmail.com
Can you send me those PPT's to my email id please..!
Kumarvijay.walikar@gmail.com
Hi Sarma,
I have a question regarding the lookup cache file size for a process. I need to find out the total size of all the lookup cache files (.idx & .dat) for each lookup transformation. How can I find it? And will there be any such information in the session log? If yes, how do I locate it?
can you pls. reply me on my email.
Thanks
Kalycs
(kalycs.dwh@gmail.com)
Hi Sarma,
I have a source (3.3 million rows) which has 30 columns and is in denormalized format.
I have no problem normalizing the above source; however, the requirement is that out of the 30 columns, some columns are related. For example, column "LAN" and column "LANGUAGE": "LANGUAGE" is used as the description for "LAN", and there are a few such pairs.
I got the logic and successfully implemented the requirement in a mapping using dynamic SQL in a Stored Procedure transformation. Now my new issue is performance: as we all know, performance suffers with a Stored Procedure transformation, and I am calling this transformation for around 50 million records.
I am trying couple of things with LOOKUP transformation
1. Trying to select a dynamic column, which is the same as the input value.
E.g.: SELECT ACCT_DESC_in FROM TABLE_A
WHERE PK_COL = PK_COL_IN.
Here ACCT_DESC_in is not a column in TABLE_A; however, the value in ACCT_DESC_in is the name of a column of TABLE_A.
I tried this with dynamic SQL in the Stored Procedure transformation, and it works fine; I want to try it with a Lookup transformation.
If the above option is not possible, then
2. I have to use the source TABLE_A for the lookup transformation, and I have to include 17 of the same lookups (as I cannot use dynamic column selection). (The source table has 3.3 million records and 30 columns.) Is there any way we can build the cache for TABLE_A one time and share it among the 17 lookups in the same mapping? That would improve performance in a big way.
can you please help me out.
Hi,
Can you please forward me the ppt at msinharay@gmail.com. I also wanted to know how an unconnected lookup transformation is faster/more efficient than a connected lookup.
Hi Sarma,
I am using an unconnected lookup in a mapping with approx 9 million rows in the lookup and approx 30k rows in the source. I think it's better to go for an uncached lookup in this case; what do you suggest?
Sarma, Could you please send me the Informatica 8 architecture PPT if you have any.
MY email address is vinay4luvv@yahoo.com
Hi Radhakrishna,
Can you please forward your 8.X informatica ppt, which you have prepared to indypoli@gmail.com
Thanks for your help.
Thanks,
Praveen
Hi sir,
1. I need to know how we can sort data without using a Sorter transformation.
2. How is a conformed dimension used in a fact table?
Hi Sarma,
I am tuning one of the mappings in Informatica 6. The developers made use of the Informatica standard mapplets. Inside the mapplets, there are about 40 to 50 lookups. How can I improve the performance in this scenario?
Thanks,
Meena
Hi Sarma,
This is my exact problem.
I am using Informatica PowerCenter 6.1, and the mappings make use of the standard mapplets. In the standard mapplets, there are about 45 unconnected lookups used to retrieve values. My requirement is to tune the performance of this mapping. How do I achieve better performance while retaining the same design, or otherwise address the problem with respect to the lookups?
Many Thanks,
Meena
Hi Radha,
I have always had this question in mind but never got a concrete answer. Suppose I am in the middle of a mapping, where I need to do either a join or a lookup on some table. Can you please tell me the different judgement criteria to decide whether to opt for a Lookup or a Joiner? Many thanks.
Thanks!
Ashish
Hi Sarma
I was following your excellent blog..If possible can you mail me your informatica PPT @ debo_fti@yahoo.com
I appreciate your help in advance
Debo
Hi Sarma
I was following your excellent blog..If possible can you mail me your informatica PPT @ narenrawat203@rediffmail.com
I appreciate your help in advance
Narendra
Hi Radhakrishna, can you help me with where to check whether a workflow is scheduled or not? I referred to the view REP_WORKFLOWS, but it only has the scheduler start_time.
Hi Radhakrishna,
Please can you forward PPT for performance tuning to my mail id rajkumar.bachu@gmail.com
Thanks in Advance
Rajkumar.
Hi Sharma,
Your blog is wonderful and helpful; even a fresher can find it easy to grasp. Much appreciated. Can you please send me the doc? I have seen that some readers have requested you to send it, and I too want to go through it. My id is rajeshkumar.pandey@rbs.com
Cheers
Rajesh
Hi Radhakrishna,
Could you please forward informatica 8.x ppt to me.
my id is sujatha_enrichit@yahoo.com
Thanks,
Sujatha
Hi Radhakrishna,
I have been following your excellent blog. If possible can you mail me your informatica PPT to sunjeye@gmail.com.
I appreciate your help in advance
Sunjeye
Can you send me the PPT please at anuguru23@hotmail.com,as early as possible.
Thanks for sharing valuable information,
Informatica help
Hi Radha Krishna,
How are you? I think you remember me; when I was in South Africa, I asked you a doubt regarding an XML source system.
Please mail Informatica 8 Architecture at pv.nagarjuna@gmail.com
Dear Radhakrishna,
I am Srikanth and I have been a regular visitor to your site.
I have few queries which I present it below. Request you to please send your comments to coolsreek@gmail.com. Thanks in advance
1) Can using mapplets make loads processed through Informatica much faster? If so, how?
2) How do I use Informatica 9.0 to process 50 million records per day, in some cases within a batch window of around 6 hours? What components need to be used, and is there any other strategy?
Hi,
I wanted to ask: when we say that the first record comes from the source and hits the lookup, the cache is created, and thereafter the rest of the process continues,
where does the first row from the source go? That is, how does INFA know the first record has arrived and that the cache has to be created now? Where does this first row get checked, or against what is it checked? Hope I am making sense here; please let me know if not!
Regards,
Cp
Hi sarma,
Your blog is very informative. Can you add real-time scenarios to your blog?
Can you mail me the Informatica ppt & architecture to my mail id:
reachmaheboob@gmail.com
Hi Sarma,
Could you please forward the Informatica PPT's to my email id sreeya7@yahoo.com
-
Sreeya
Hey Sarma,
Can you please forward the Informatica PPT's to my email id: boussak@yahoo.fr
Can you explain to me the difference between the Lookup types, and when we use one and not the other?
I really appreciate your help, and your blog.
Thanks,
Bouba
I am really satisfied by your information. It's well-written, to the point, and relevant to what I do.
Hi Sarma,
I'm getting an error in Informatica when I'm using an XML target: "cache manager was unable to allocate a new block". Can you please help me out with this? My email address is ram.thunda@gmail.com
Thanks
This was really a good post and also very informative.