Friday, December 26, 2008

Avoid sequence generator

I would always advice you to avoid sequence generator while populating an ID column in the relational target table. I suggest you rather create a sequence on the target database and enable the trigger on that table to fetch the value from the database sequence.

There are many advantages by doing this way.

  • Less number of PowerCenter objects in the object which reduce development time and also less maintenance.
  • Today you may be writing the records into that target table using PowerCenter flow. What if you get another application tomorrow that also writes records into that table? Your ID column values go bizarre in this scenario.
  • During migration of objects from Dev to QA and also to Production further, there is no extra over-head of considering the persistent values of the sequence generator from the repository database.

In all of the above cases, a sequence created in the target database would make life lot easier for the table data maintenance and also for the PowerCenter development. In fact, databases will have specific mechanisms (focused) to deal with sequences and so you can implement manual Push-down optimization on your PowerCenter mapping design for yourself.

10 comments:

Srikanth said...

I have a questions on this

Well database people disagree and say that ETl people should handle sequences. The reason being triggers being unreliable especially when data is huge

oracle triggers can be a drag on the database and
have heard complaints from the dba's

SQL server is good in this case there is a column called identity
which works as the sequence generator

Radhakrishna Sarma said...

Thank you for writing Srikanth.

I can understand the cribbing of DBAs. They have never been friendly with developers.

It is again a design decision that has to be made for this. From my experience, I've worked on Oracle sequences writing into Oracle targets of gigs of data. I've never had any problem with the triggers.

If the DBA insists, then you can go with the sequence generator, but in most other cases, I avoid it.

Cheers
Sarma.

Anonymous said...

Why not use a lookup and get the latest surrogate key value first. There is already an index on this PK anyway.
Next add 1 to that value using a sequence generator and expression transformation. Don't forget resetting the seq every run !
This fixes all of your problems and works very efficiently !
On top of that you can continue using BULK Loads in PowerCenter, which you can not if you use a db seq.

Maarten
www.maartenvanleenhove.be

Radhakrishna Sarma said...

Hi Marteen,

First of all, thank you very much for reviewing the article and publishing your comment on the same. If we come to the contents of your comment, yes - the solution you have given covers "most" of the problems I've mentioned, but there is a problem with that solution.

When I have GUI writing into the table and at the same, I have this BULK load in PowerCenter happening in parallel. That time, the integrity is lost.

BTW, your solution is the closest to let any one tempt to use Seq Generator.

I've also gone through your website. I was living in Eindhoven for more 2 years before I moved to S'pore.

Cheers
Sarma.

Anonymous said...

I do not agree to both of your approaches. 1) Using trigger is a very costly operation as per DBA's and should be avoided. 2) using lookup to get the previous maximum value and then add one to it 1 can be expensive when your target tables is a huge one(consider the time lookup would take to create its cache for 50 millioon rows or so). Even if we use uncached lookup, still query the table evrytime to gett eh max value would be time consuming.

Another way, I may suggest would be using a data base seqeuence and calling it from Informatica mapping through a stored procedure trans. In DB, you can have a small proc written, which gives you nestval of seq, everytime it is called. This may help cater to all your probs. Let me know your thoughts.

Radhakrishna Sarma said...

Hi Anonymous,

Thanks for visiting my blog and leaving a comment and regarding your disagreement, yeah, its your opinion and the context differs.
But for the approach that you have suggested about stored proc to increment a database sequence will be the worst performer because for each row Informatica establishes a Oracle connection and there is a context switch b/n SQL & PL/SQL engines and then a Oracle connection close. All of these are expensive I/O operations. So big NO from my side...

Cheers
Sarma.

Anonymous said...

By using the unconnected lookup,we will call the lookup transformation only once to get the max seq value from the database and assign it to a variable.From there onwards, for that load the variable gets incremented.(assuming no parallel loads on the target table)
if you have to do parallel loading (using informatica) then you can use shared seq generator with cache.

Ajain said...

Seems I gone through this blog very late. However, if my suggestion still hold valid here I'm posting.

Best approach as per my understanding will be to create a DB sequence and call it from Infa mapping via a resuable SQL transformation.
Pros: 1. Pushdown optimization is achieved.
2. Infa sequence generation's persistent value maintenance to upstream environment is no more an issue.
3. Parallel execution of some other ETL's will not cause any issue in terms of surrogate key value conflicts.

Anonymous said...

It looks like the guy who wrote this article has no idea on how the oracle database works. He need some real time experience rather than blabbering something.

Miracle Electronics said...

These technical details had been of great help. Thanks for sharing!!Keep updating new posts on your blog!!
Power transformers in India | Transformer Manufacturer in India

 
HTML Hit Counters