Tuesday, August 26, 2008

How to generate target file names dynamically from the mapping?

I've seen this question many times in the Devnet forums very recently. I thought I will just elaborate the PowerCenter documentation here.



In order to generate the target file names from the mapping, we should make use of the special "FileName" port in the target file. You can't create this special port from the usual New port button. There is a special button with label "F" on it to the right most corner of the target flat file when viewed in "Target Designer".



Below two screen-shots tell you how to create the special port in your target file.






Once this is done, the job is done. When you want to create the file name with a timestamp attached to it, just use a port from an Expression transformation before the target to pass a value of Output Port with expression $$FILE_NAMEto_char(sessstarttime, 'YYYYMMDDHH24:MISS')'.csv'.


Please note that $$FILE_NAME is a parameter to the mapping and I've used sessstarttime because it will be constant through out the session run.


If you use sysdate, it will change if you have 100s of millions of records and if the session may run for an hour, each second a new file will get created.


Please note that a new file gets created with the current value of the port when the port value which maps to the FileName changes.

We'll come to the mapping again. This mapping generates two files. One is a dummy file with zero bytes size and the file name is what is given in the Session properties under 'Mappings' tab for target file name. The other file is the actual file created with the desired file name and data.

When you have different sets of input data with different target files created, use the same instance, but with a Transaction Control transformation which defines the boundary for the source sets. A demonstration of this is available in the PowerCenter documentation.

Wednesday, March 5, 2008

10000 BC - utter disappointment (10000 bc movie review)

10000 BC...

One of the most boring movies I had seen in recent times. All the incidents from the film are neither factual nor practical.

***** SPOILER *****

Plot is quite simple. Our young hero goes in search of his stolen girl friend and travels across mountains, rivers, deserts and what not! He finally reaches the place where she is held in captivity as a slave. He fights with the people who held her and frees every one from the captivity.

Most of the scenes in the movie seem ridiculous to me. With great difficulty, I can pickup two of them to the most-ridculous category.

One - When the hero falls into a trap, he finds that a smilodon/sabre-toothed tiger is also trapped and the tiger got stuck in some wooden frame. He tries to kill it, but finally saves it thinking it wouldn't kill him. To our surpise, the tiger doesn't kill him, but leaves that place. When the hero was surrounded by a group of strangers who try to kill him, this smilodon appears and the scene looks as, if this tiger had come to save the hero.

Two - When the hero is around 12 years of age, he says this to his young girl friend. "All the clouds move, but that brighter cloud in the middle (is referring to the Moon), doesn't move. In the same way, my love to you, doesn't go away from my heart." This is the first dialogue he utters to his girl friend when he first meets her.

With some horrible disturbing characters, this film climax looked more like the Apocalypto human-sacrifice scene. If one can digest the gratitude of the obligate carnivorous, sentimental dialogues which are of distant-reality, this film is a very good entertainer.

One salute if you don't watch the movie after reading this review, two if you do.

Saturday, February 16, 2008

Informatica PowerCenter performance - Lookups

Lookup performance

Lookup is an important and a useful transformation when used effectively.
What is a lookup transformation? It is just not another transformation which fetches you data to look against the source data. It is a transformation when used improperly, makes your flow run for ages.

I now try to explain different scenarios where you can face problems with Lookup and also how to tackle them.


Unwanted columns:

By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table, but be sure to delete the unwanted columns from the lookup as they affect the lookup cache very much. You only need columns that are to be used in lookup condition and the ones that have to get returned from the lookup.

SQL query:

We will start from the database. Find the execution plan of the SQL override and see if you can add some indexes or hints to the query to make it fetch data faster. You may have to take the help of a database developer to accomplish this if you, yourself are not an SQLer.

Size of the source versus size of lookup:

Let us say, you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). Then PowerCenter builds the cache for the lookup table and then checks the 10 source rows against the cache. It takes more time to build the cache of 1 million rows than going to the database 10 times and lookup against the table directly.

Use uncached lookup instead of building the static cache, as the number of source rows is quite less than that of the lookup.

Conditional call of lookup:

Instead of going for connected lookups with filters for a conditional lookup call, go for unconnected lookup. Is the single column return bothering for this? Go ahead and change the SQL override to concatenate the required columns into one big column. Break them at the calling side into individual columns again.

JOIN instead of Lookup:

In the same context as above, if the Lookup transformation is after the source qualifier and there is no active transformation in-between, you can as well go for the SQL over ride of source qualifier and join traditionally to the lookup table using database joins, if both the tables are in the same database and schema.

Increase cache:

If none of the above seems to be working, then the problem is certainly with the cache. The cache that you assigned for the lookup is not sufficient to hold the data or index of the lookup. Whatever data that doesn't fit into the cache is spilt into the cache files designated in $PMCacheDir. When the PowerCenter doesn't find the data you are lookingup in the cache, it swaps the data from the file to the cache and keeps doing this until it finds the data. This is quite expensive for obvious reasons being an I/O operation. Increase the cache so that the whole data resides in the memory.

What if your data is huge and your whole system cache is less than that? Don't promise PowerCenter the amount of cache that it can't be allotted during the runtime. If you promise 10 MB and during runtime, your system on which flow is running runs out of cache and can only assign 5MB. Then PowerCenter fails the session with an error.

Cachefile file-system:

In many cases, if you have cache directory in a different file-system than that of the hosting server, the cache file piling up may take time and result in latency. So with the help of your system administrator try to look into this aspect as well.

Useful cache utilities:

If the same lookup SQL is being used in someother lookup, then you have to go for shared cache or reuse the lookup. Also, if you have a table that doesn't get data updated or inserted quite often, then use the persistent cache because the consecutive runs of the flow don't have to build the cache and waste time.

After all these are tried out, sit and wait for the session to get completed - there is nothing else you can do about the lookup.

Thursday, February 14, 2008

About me

I did all my education in the Kurnool district of AP, India. After I am out of my college, joined Automatic Data Processing Inc (ADP, Hyderabad) in June 2003 and worked there for almost 2.5 years. I decided to move out and joined Tata Cosultancy Services in October, 2005. I've resigned from TCS and am in Singapore at the moment.
DBMS (Database Management System) was one of the subjects I hated during my engineering (because of all the Normalization rules blah blah). Here now, I end up being a developer in Informatica PowerCenter with more than essential knowledge of Oracle SQL & PL/SQL. The more I learn about Oracle, the more I like it.
 
HTML Hit Counters