Radhakrishna Sarma

Tuesday, June 15, 2010

SQL Overrides in Lookups, Source Qualifier etc

You will often want to see every SQL override in a mapping: the Source Qualifier SQL query, a Lookup SQL override, Pre-SQL, Post-SQL and so on. The query below takes the workflow name and folder name as input and returns all the SQL overrides, wherever they occur in the workflow.

select folder, wf_name, 
       sess_name, mapping_name, 
       transformation_name, attr_name, 
       line_no, sql_value
from (select f.subj_name folder, 
             wf.task_name wf_name, 
             sess.instance_name sess_name, 
             m.mapping_name mapping_name, 
             w_inst.instance_name transformation_name, 
             attr.line_no, attr.attr_value sql_value, 
             attr_type.attr_name attr_name,
             row_number() over (partition by wf.task_name, 
                                             sess.instance_name, 
                                             m.mapping_name, 
                                             w_inst.instance_name, 
                                             attr.line_no, 
                                             attr.attr_value
                                order by attr.session_task_id desc
                               ) rn
      from opb_task_inst wf_inst
           ,opb_task_inst sess
           ,opb_session s
           ,opb_mapping m
           ,opb_subject f
           ,opb_widget_attr attr
           ,opb_widget_inst w_inst
           ,opb_task wf
           ,(select o.object_type_id object_type_id, 
                    o_attr.attr_id attr_id, 
                    o.object_type_name||': '||o_attr.attr_name attr_name
             from opb_attr o_attr, 
                  opb_object_type o
             where o.object_type_id = o_attr.object_type_id
             and o_attr.attr_datatype = 2
             and o_attr.attr_value is null
             and upper(o_attr.attr_name) like '%SQL%'
            ) attr_type
      where wf_inst.task_id = sess.task_id
      and sess.task_type = 68
      and sess.task_id = s.session_id
      and wf.subject_id = f.subj_id
      and s.mapping_id = m.mapping_id
      and attr.widget_id = w_inst.widget_id
      and w_inst.mapping_id = m.mapping_id
      and w_inst.widget_type = attr_type.object_type_id
      and wf_inst.workflow_id = wf.task_id
      and wf.task_type = 71
      and (attr.session_task_id = s.session_id
           or attr.session_task_id = 0)
      and attr.attr_id = attr_type.attr_id
      and attr.attr_value is not null
      and attr.attr_value <> '0'
      and wf.task_name = 'WORKFLOW_NAME'
      and f.subj_name = 'FOLDER_NAME'
      )
where rn = 1
order by 1, 2, 3, 4, 5, 6, 7
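
The query returns line_no next to sql_value, so one override may come back as several rows. Here is a minimal Python sketch of putting such rows back together. It assumes that the pieces of one override differ only in line_no and concatenate without any separator; I have not seen that documented, and the function name is my own.

```python
from collections import defaultdict

def stitch_overrides(rows):
    """Rebuild each SQL override from its line_no-ordered pieces.

    rows: iterable of tuples in the query's column order
    (folder, wf_name, sess_name, mapping_name,
     transformation_name, attr_name, line_no, sql_value).
    Assumption: pieces of one override join without a separator.
    """
    pieces = defaultdict(list)
    for *key, line_no, sql_value in rows:
        pieces[tuple(key)].append((line_no, sql_value))
    return {
        key: "".join(value for _, value in sorted(parts, key=lambda p: p[0]))
        for key, parts in pieces.items()
    }
```

Feed it the rows fetched by any database client; the result is a dictionary keyed by the first six columns of the query.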

Shortcut Object and its parent folder

Have you ever wondered how to find a shortcut object and its parent folder? The need comes up regularly in folders that hold tens of shortcut transformations and mappings. The query below gets you what you want.

select shortcut_name, 
       case s.object_type
         when 21 then 'Mapping'
         when 23 then (select o.object_type_name
                       from opb_object_type o
                           ,opb_widget w
                       where w.widget_id = s.object_id
                       and w.widget_type = o.object_type_id
                      )
         when 24 then 'Target'
         when 25 then 'Source'
         when 44 then 'Mapplet'
         else 'Nothing'
       end Object_Type,
       decode(s.object_type, 21, (select f.subj_name
                                  from opb_subject f,
                                       opb_mapping m
                                  where m.mapping_id = s.object_id
                                  and m.subject_id = f.subj_id
                                  ),
                             23, (select f.subj_name
                                  from opb_subject f,
                                       opb_widget w
                                  where w.widget_id = s.object_id
                                  and w.subject_id = f.subj_id
                                  ),
                             25, (select f.subj_name
                                  from opb_subject f,
                                       opb_src src
                                  where src.src_id = s.object_id
                                  and src.subj_id = f.subj_id
                                  ),
                             24, (select f.subj_name
                                  from opb_subject f,
                                       opb_targ t
                                  where t.target_id = s.object_id
                                  and t.subj_id = f.subj_id
                                  ),
                             44, (select f.subj_name
                                  from opb_subject f,
                                       opb_widget w
                                  where w.widget_id = s.object_id
                                  and w.subject_id = f.subj_id
                                  )
              ) obj_original_folder
from opb_shortcut s, 
     opb_subject f
where s.subject_id = f.subj_id
and f.subj_name = 'FOLDER_NAME'

Thursday, June 10, 2010

Repository tables Expression query

In a mapping with multiple Expression transformations and multiple unconnected lookups, it is difficult to identify which expressions call the unconnected lookups. The SQL below lists every expression in the Expression transformations (widget_type = 5) of the mapping, so you can pick out the ones that contain a :LKP call.

select w.instance_name,
       g.expression
from opb_widget_expr f,
     opb_expression g,
     opb_widget_inst w,
     opb_mapping m
where f.expr_id = g.expr_id
and f.widget_id = w.widget_id
and w.widget_type = 5
and w.mapping_id = m.mapping_id
and m.mapping_name = 'm_Open_Trades_Target_load'
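
The query returns every expression in the mapping, so the rows still have to be narrowed down to the ones that call a lookup. A small Python sketch of that filter follows; it relies on the :LKP.lookup_name(...) call syntax for unconnected lookups, and the function name is my own.

```python
import re

# An unconnected lookup is called as :LKP.lookup_name(args) in an expression.
LKP_CALL = re.compile(r":LKP\.(\w+)", re.IGNORECASE)

def lookup_callers(rows):
    """Map each transformation instance to the lookups its expressions call.

    rows: iterable of (instance_name, expression) tuples, in the
    column order of the query above.
    """
    callers = {}
    for instance_name, expression in rows:
        names = LKP_CALL.findall(expression or "")
        if names:
            callers.setdefault(instance_name, set()).update(names)
    return callers
```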

Wednesday, November 11, 2009

Informatica 9 Launch

Although the Informatica 9 launch webinar was packed with a keynote, breakout sessions and more, I could only gather the following points about the new version.

Logical data objects - Something like a source to which you can apply certain rules (a set of mapplets or transformations) and which you can deploy as SQL services or web services, so that you can view the data in a browser. Note that up to Informatica 8.x you could only run transformations by creating a mapping; in Informatica 9 you don't need a mapping, a data object will do.
Informatica Developer tool - The only client tool I could see in the webinar had a "D" as its task-bar icon, which doesn't necessarily mean "Designer" as in 8.x.
Powerful search tool - Searches the entire metadata for a port name or even for a transformation name.
Analyst tools - These give the business user extensive flexibility to customize Informatica objects to their needs. This should substantially reduce the development effort, but considering the effort and cost of training business users to customize the developed objects, I am not sure it is all a cake-walk.
No separate source and target objects - You can import one object and use it both as a source and as a target, thereby reducing metadata storage.

Please note that I have not seen the new version yet; all of the above comes only from the webinar that I attended at http://www.informatica.com/9

Wednesday, September 2, 2009

Informatica PowerCenter Repository tables

I am sure every PowerCenter developer either wants or needs to know about the Informatica metadata tables and where information is stored. For starters, all the objects that we create in Informatica PowerCenter (sources, targets, mappings, workflows, sessions, expressions, anything related to PowerCenter) are stored in a set of database tables, variously called metadata tables, OPB tables or repository tables. Knowing them helps with requirements such as these:

* I want to know all the sessions in my folder that are calling some shell script/command in the Post-Session command task.
* I want to know how many mappings have transformations that contain "STOCK_CODE" defined as a port.
* I want to know all unused ports in my repository of 100 folders.

In repositories with a large number of sessions, workflows or mappings, it is difficult to answer such questions with the Informatica PowerCenter client tools. All of this information, however, is stored in some form in the metadata tables. So if you know the data model of these repository tables, you will be in a better position to answer these questions.

Before we proceed, let me stress something very important. The data in the repository/metadata/OPB tables is very sensitive, and modifications such as inserts or updates must be made using the PowerCenter tools ONLY. DO NOT DIRECTLY USE UPDATE OR INSERT COMMANDS AGAINST THESE TABLES.

Please also note that there is no official documentation from Informatica Corporation on how these tables behave. The details I provide here are based purely on my own assumptions, research and experience. I will not be responsible for any damage caused if you use any statement other than SELECT on the basis of this blog article. This is my disclaimer. Let us move on to the contents now.

There are around a couple of hundred OPB tables in the 7.x version of PowerCenter, but in 8.x this number crosses 400. I am going to talk about a few important tables in this article, since this is not a topic that can be covered in one go. I shall write a few more articles to cover other important tables like OPB_TDS, OPB_SESSLOG etc.

We shall start with OPB_SUBJECT now.

OPB_SUBJECT - PowerCenter folders table

This table stores the name of each PowerCenter repository folder.

Usage: Join SUBJECT_ID in any repository table that has it to SUBJ_ID in this table to get the folder name.

OPB_MAPPING - Mappings table

This table stores the name and ID of each mapping and its corresponding folder.

Usage: Join MAPPING_ID in any repository table that has it to MAPPING_ID in this table to get the mapping name.

OPB_TASK - Tasks table (sessions, workflows etc)

This table stores the name and ID of each task, such as a session or a workflow, and its corresponding folder.

Usage: Join TASK_ID (or SESSION_ID) in any repository table that has it to TASK_ID in this table to get the task name. Observe that both sessions and workflows are stored as tasks in the repository. TASK_TYPE is 68 for a session and 71 for a workflow.

OPB_SESSION - Session & Mapping linkage table

This table stores the linkage between the session and the corresponding mapping. As mentioned in the earlier paragraph, you can join the SESSION_ID in this table to the TASK_ID of the OPB_TASK table.
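
To see how the four tables described so far fit together, here is a small self-contained Python sketch that uses sqlite3 with toy stand-ins for them. The tables hold only the join columns named above and the rows are invented, so treat it as an illustration of the joins rather than a picture of the real repository.

```python
import sqlite3

# Toy stand-ins for four repository tables, holding only the columns
# the join needs. The rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE opb_subject (subj_id INTEGER, subj_name TEXT);
    CREATE TABLE opb_mapping (mapping_id INTEGER, mapping_name TEXT, subject_id INTEGER);
    CREATE TABLE opb_task    (task_id INTEGER, task_name TEXT, task_type INTEGER, subject_id INTEGER);
    CREATE TABLE opb_session (session_id INTEGER, mapping_id INTEGER);

    INSERT INTO opb_subject VALUES (1, 'Customer_Data');
    INSERT INTO opb_mapping VALUES (10, 'm_load_emp', 1);
    INSERT INTO opb_task    VALUES (100, 's_m_load_emp', 68, 1);  -- session
    INSERT INTO opb_task    VALUES (200, 'wf_load_emp', 71, 1);   -- workflow
    INSERT INTO opb_session VALUES (100, 10);
""")

def sessions_in_folder(folder_name):
    """Return (folder, session, mapping) rows for one folder."""
    return conn.execute("""
        SELECT f.subj_name, t.task_name, m.mapping_name
        FROM opb_task t
        JOIN opb_subject f ON t.subject_id = f.subj_id
        JOIN opb_session s ON s.session_id = t.task_id
        JOIN opb_mapping m ON m.mapping_id = s.mapping_id
        WHERE t.task_type = 68          -- sessions only
          AND f.subj_name = ?
        ORDER BY t.task_name
    """, (folder_name,)).fetchall()
```

The workflow row (task_type 71) drops out because of the task_type filter and because it has no matching row in the toy opb_session table.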

OPB_TASK_ATTR - Task attributes table

This is the table that stores the attribute values (like Session log name etc) for tasks.

Usage: Join the ATTR_ID of this table to the ATTR_ID of the OPB_ATTR table to find what each attribute in this table means. You can know more about the OPB_ATTR table in the next paragraphs.

OPB_WIDGET - Transformations table

This table stores the names and IDs of all the transformations with their folder details.

Usage: Join WIDGET_ID in this table to WIDGET_ID in any other table to get the transformation name and the folder details. Use this table in conjunction with OPB_WIDGET_ATTR or OPB_WIDGET_EXPR to learn more about each transformation.

OPB_WIDGET_FIELD - Transformation ports table

This table stores the names and IDs of all the transformation fields for each of the transformations.

Usage: Match the FIELD_ID in this table against the FIELD_ID in tables such as OPB_WIDGET_DEP to get the corresponding information.

OPB_WIDGET_ATTR - Transformation properties table

This table stores the property details of each transformation.

Usage: Join the ATTR_ID of this table to the ATTR_ID of the OPB_ATTR table to find what each attribute of the transformation means.

OPB_EXPRESSION - Expressions table

This table stores the details of the expressions used anywhere in PowerCenter.

Usage: Use this table in conjunction with OPB_WIDGET/OPB_WIDGET_INST and OPB_WIDGET_EXPR to get the expressions in the Expression transformations of a particular mapping or a set of mappings.

OPB_ATTR - Attributes

This table has a list of attributes and their default values, if any. You can take the ATTR_ID from this table and look it up in any of the tables that hold the attribute value. Make a note of the ATTR_TYPE and OBJECT_TYPE_ID before you pick up the ATTR_ID, because the same ATTR_ID can appear more than once in the table with a different ATTR_TYPE or OBJECT_TYPE_ID.
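
The point about repeated ATTR_IDs is easier to see with data. Below is a minimal Python sketch using sqlite3 and two toy tables. The rows, the attribute names and the widget_type column on the toy opb_widget_attr are assumptions made to keep the example small; they are not a statement about the real table layout.

```python
import sqlite3

# Toy tables: the same attr_id (1) exists for two object types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE opb_attr (attr_id INTEGER, object_type_id INTEGER, attr_name TEXT);
    CREATE TABLE opb_widget_attr (widget_id INTEGER, widget_type INTEGER,
                                  attr_id INTEGER, attr_value TEXT);
    INSERT INTO opb_attr VALUES (1, 3,  'Sql Query');
    INSERT INTO opb_attr VALUES (1, 11, 'Lookup Sql Override');
    INSERT INTO opb_widget_attr VALUES (500, 3, 1, 'SELECT empno FROM emp');
""")

def attr_names(match_object_type):
    """Attribute names found for the one stored value.

    With match_object_type=False the join uses attr_id alone,
    which also picks up the attribute of the other object type.
    """
    sql = """
        SELECT a.attr_name
        FROM opb_widget_attr wa
        JOIN opb_attr a ON a.attr_id = wa.attr_id
    """
    if match_object_type:
        sql += " AND a.object_type_id = wa.widget_type"
    sql += " ORDER BY a.attr_name"
    return [name for (name,) in conn.execute(sql)]
```

Joining on the ID alone returns two names for a single stored value; adding the object type to the join leaves only the right one.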

OPB_COMPONENT - Session Component

This table stores the component details like Post-Session-Success-Email, commands in Post-Session/pre-Session etc.

Usage: Match the TASK_ID with the SESSION_ID in the OPB_SESSION table to get the SESSION_NAME. To get the shell command or batch command defined for the session, join this table with the OPB_TASK_VAL_LIST table on TASK_ID.

OPB_CFG_ATTR - Session Configuration Attributes

This table stores the attribute values for Session Object configuration like "Save Session log by", Session log path etc.

Wednesday, July 15, 2009

Error Logging in PowerCenter

In order to capture any Informatica PowerCenter errors into a flat file or database during runtime, Informatica Corporation suggests row-level error logging. The major disadvantage with this is that the performance of the workflow is affected because of row-level processing as opposed to block processing.

To overcome this, a simple approach can be used that is common to all workflows. It relies on the fact that $Session_Name.ErrorMsg holds NULL if the session runs fine, and otherwise holds the latest error message from the session run.

1) Create two workflow variables - one for Error message $$ERROR_MESSAGE and the other $$SESSION_NAME to store the failed session name.

2) Create an assignment task in the workflow and create links to it from each of the sessions. Please note that the flow should be TOWARDS the assignment task from the sessions.

3) Modify the link expression for all these links to $Session_Name.PrevTaskStatus = FAILED.

4) In the assignment task, assign $Session_Name.ErrorMsg to the workflow variable $$ERROR_MESSAGE and assign Session_Name to $$SESSION_NAME.

5) You need a bit of nested iifs to achieve this.

For variable $$ERROR_MESSAGE, the expression contains

:udf.if_null_or_blank($Session_Name_1.ErrorMsg,
    :udf.if_null_or_blank($Session_Name_2.ErrorMsg,
        :udf.if_null_or_blank($Session_Name_3.ErrorMsg,
            :udf.if_null_or_blank($Session_Name_4.ErrorMsg ,
                :udf.if_null_or_blank($Session_Name_5.ErrorMsg ,
                    :udf.if_null_or_blank($Session_Name_6.ErrorMsg ,
                        :udf.if_null_or_blank($Session_Name_7.ErrorMsg ,
                            :udf.if_null_or_blank($Session_Name_8.ErrorMsg ,
                                :udf.if_null_or_blank($Session_Name_9.ErrorMsg ,
                                    :udf.if_null_or_blank($Session_Name_10.ErrorMsg ,
                                        :udf.if_null_or_blank($Session_Name_11.ErrorMsg ,
                                            :udf.if_null_or_blank($Session_Name_12.ErrorMsg ,
                                                 'A Fatal Error occurred' 
                                 ,$Session_Name_12.ErrorMsg
                                 )
                              ,$Session_Name_11.ErrorMsg
                              )
                           ,$Session_Name_10.ErrorMsg
                           )
                        ,$Session_Name_9.ErrorMsg
                        )
                     ,$Session_Name_8.ErrorMsg
                     )
                  ,$Session_Name_7.ErrorMsg
                  )
               ,$Session_Name_6.ErrorMsg
               )
            ,$Session_Name_5.ErrorMsg
            )
         ,$Session_Name_4.ErrorMsg
         )
      ,$Session_Name_3.ErrorMsg
      )
   ,$Session_Name_2.ErrorMsg
   )
,$Session_Name_1.ErrorMsg
)

:udf.if_null_or_blank(input_String_Argument, output_if_NULL_Blank, output_if_not_NULL_Blank) is a user-defined function with expression contents

iif(isnull(input_String_Argument) or length(ltrim(rtrim(input_String_Argument)))
= 0, output_if_NULL_Blank, output_if_not_NULL_Blank)

In the same way, create the expression for the $$SESSION_NAME. It should be the same expression as for the $$ERROR_MESSAGE but in the else part of the iif(), the session names should be specified instead of ErrorMsg.

6) From this assignment task, link to a session that stores the contents of these workflow variables in a database table or a flat file. Let us call this session's mapping the LOG mapping.

You may question the scope of these workflow variables inside the LOG mapping. If you can use the workflow variables in the Source Qualifier SQL override, you can select their values directly, like this:

select '$$ERROR_MESSAGE', '$$SESSION_NAME'
from dual

Take the 2 ports out of the Source Qualifier into an Expression transformation and then continue loading into a relational table target or a flat file target.

7) It is IMPORTANT to set the Assignment task's General tab property "Treat Input Links as" to "OR". This makes sure that the assignment task is triggered and the error is logged if at least one session fails.

If you implement the Error Logging this way, you will be able to catch all kinds of Informatica errors.
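
The nested expression in step 5 is easier to follow outside PowerCenter. Below is a rough Python equivalent, written only to show the logic: it returns the first non-blank error message together with its session name. The name first_failure, and returning None as the session name when nothing failed, are my own choices for the sketch.

```python
def if_null_or_blank(value, if_blank, if_not_blank):
    """Python counterpart of :udf.if_null_or_blank.

    Note: str.strip() removes all whitespace, while ltrim(rtrim())
    in the expression removes spaces only.
    """
    if value is None or len(value.strip()) == 0:
        return if_blank
    return if_not_blank

def first_failure(sessions, default="A Fatal Error occurred"):
    """Return (session_name, error_message) for the first session with an error.

    sessions: list of (session_name, error_message) pairs, in the order
    in which they are nested in the expression.
    """
    result = (None, default)
    # Build the nesting from the innermost call outwards.
    for name, error_msg in reversed(sessions):
        result = if_null_or_blank(error_msg, result, (name, error_msg))
    return result
```

The loop walks the list backwards because the innermost call of the expression belongs to the last session and the outermost to the first.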

Wednesday, April 22, 2009

Dynamically generate parameter files

I've seen this done in many places, but I want to share the way I have been doing it for a few years now. I will set out the context in some detail so that everyone can follow it.

1. PowerCenter objects – Introduction:

• A repository is the highest physical entity of a project in PowerCenter.

• A folder is a logical entity in a PowerCenter project. For example, Customer_Data is a folder.

• A workflow is synonymous to a set of programs in any other programming language.

• A mapping is a single program unit that holds the logical mapping between source and target with the required transformations. A mapping only says that a source table named EMP exists with some structure and that a target flat file named EMP_FF exists with some structure. The mapping doesn't say in which schema the EMP table exists or in which physical location the EMP_FF file is going to be stored.

• A session is the physical representation of the mapping. The session defines what the mapping leaves out: where the EMP table comes from, in which schema it lives, and with what username and password we can access it. It also specifies the physical location in which the target flat file is going to be created.

• A transformation is a sub-program that performs a specific task with the input it gets and returns some output. It can be assumed as a stored procedure in any database. Typical examples of transformations are Filter, Lookup, Aggregator, Sorter etc.

• A set of reusable transformations can be built into something called a mapplet. A mapplet is a set of transformations aligned in a specific order of execution.

As with any other tool or programming language, PowerCenter also allows parameters to be passed to have flexibility built into the flow. Parameters are always passed as data in flat files to PowerCenter, and that file is called the parameter file.

2. Parameter file format for PowerCenter:

For a workflow parameter which can be used by any session in the workflow, below is the format in which the parameter file has to be created.

[Folder_name.WF:Workflow_Name]
$$parameter_name1=value
$$parameter_name2=value

For a session parameter which can be used by the particular session, below is the format in which the parameter file has to be created.

[Folder_name.WF:Workflow_Name.ST:Session_Name]
$$parameter_name1=value
$$parameter_name2=value
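
As a quick illustration of the two section types, here is a small Python sketch that renders them for one workflow. The headings follow the folder.WF:workflow and folder.WF:workflow.ST:session form that the PARAMETER_FILE view in section 3 builds; the function and argument names are made up for the example.

```python
def render_parameter_file(folder, workflow, workflow_params, session_params=None):
    """Render parameter file text for one workflow.

    workflow_params: dict of $$name -> value, visible to every session.
    session_params: dict of session name -> dict of $$name -> value.
    """
    lines = [f"[{folder}.WF:{workflow}]"]
    lines += [f"{name}={value}" for name, value in workflow_params.items()]
    for session, params in (session_params or {}).items():
        lines.append(f"[{folder}.WF:{workflow}.ST:{session}]")
        lines += [f"{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"
```

Calling it with a folder, a workflow, one dictionary of workflow parameters and one dictionary per session gives text that can be written straight to a file.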

3. Parameter handling in a data model:

Maintaining the parameters in database tables has three aims:

• To have flexibility in maintaining the parameter files.

• To reduce the overhead for the support team of changing the parameter file every time the value of a parameter changes.

• To ease the deployment.

To achieve this, all the parameters are maintained in Oracle (or any other database) tables, and a PowerCenter session is created to generate the parameter file in the required format automatically.

For this, 4 tables are to be created in the database:

1. FOLDER table will have entries for each folder.

2. WORKFLOWS table will have an entry for each workflow, with a reference to the FOLDER table to say which folder the workflow is created in.

3. PARAMETERS table will hold all the parameter names irrespective of folder/workflow.

4. PARAMETER_VALUES table will hold the parameter values of each session, with references to the PARAMETERS table for the parameter name and the WORKFLOWS table for the workflow name. When the session name is NULL, the parameter is a workflow parameter that can be used across all the sessions in the workflow.

Because the PARAMETER_VALUES table holds only the ID columns of the workflow and the parameter, we create a view that fetches the actual names and presents them in the format the parameter file requires. Below is the DDL for the view.

a. Parameter file view:

CREATE OR REPLACE VIEW PARAMETER_FILE
(
HEADER,
DETAIL
)
AS
select '[' || fol.folder_name || '.WF:' || wfw.workflow_name || ']' header
,pmr.parameter_name || nvl2(dtl.logical_name, '_' || dtl.logical_name, NULL) || '=' ||
dtl.value detail
from folder fol
,parameters pmr
,WORKFLOWS wfw
,PARAMETER_VALUES dtl
where fol.id = wfw.folder_id
and dtl.pmr_id = pmr.id
and dtl.wfw_id = wfw.id
and dtl.session_name is null
UNION
select '[' || fol.folder_name || '.WF:' || wfw.workflow_name || '.ST:' || dtl.session_name || ']' header
,decode(dtl.mapplet_name, NULL, NULL, dtl.mapplet_name || '.') ||
pmr.parameter_name || nvl2(dtl.logical_name, '_' || dtl.logical_name, NULL) || '=' || dtl.value detail
from folder fol
,parameters pmr
,WORKFLOWS wfw
,PARAMETER_VALUES dtl
where fol.id = wfw.folder_id
and dtl.pmr_id = pmr.id
and dtl.wfw_id = wfw.id
and dtl.session_name is not null

b. FOLDER table

ID (NUMBER)
FOLDER_NAME (varchar50)
DESCRIPTION (varchar50)

c. WORKFLOWS table

ID (NUMBER)
WORKFLOW_NAME (varchar50)
FOLDER_ID (NUMBER) Foreign Key to FOLDER.ID
DESCRIPTION (varchar50)

d. PARAMETERS table

ID (NUMBER)
PARAMETER_NAME (varchar50)
DESCRIPTION (varchar50)

e. PARAMETER_VALUES table

ID (NUMBER)
WFW_ID (NUMBER) Foreign Key to WORKFLOWS.ID
PMR_ID (NUMBER) Foreign Key to PARAMETERS.ID
LOGICAL_NAME (varchar50)
VALUE (varchar50)
SESSION_NAME (varchar50)
MAPPLET_NAME (varchar50)

• LOGICAL_NAME is a normalization initiative in the above parameter logic. For example, in a mapping if we need to use $$SOURCE_FX as a parameter and also $$SOURCE_TRANS as another mapping parameter, instead of creating 2 different parameters in the PARAMETERS table, we create one parameter $$SOURCE. Then FX and TRANS will be two LOGICAL_NAME records of the PARAMETER_VALUES table.

• m_PARAMETER_FILE is the mapping that creates the parameter file in the desired format and the corresponding session name is s_m_PARAMETER_FILE.
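
To make the view logic and the LOGICAL_NAME idea concrete, here is a small Python sketch. The first function mirrors how the view builds a DETAIL value (optional mapplet prefix, parameter name, optional logical name suffix). The second shows one possible way of turning the (HEADER, DETAIL) rows of the view into file text, with each header written once. It is only an illustration with names of my own, not a description of what m_PARAMETER_FILE does internally.

```python
from itertools import groupby

def detail_line(parameter_name, value, logical_name=None, mapplet_name=None):
    """Build one DETAIL value the way the view's decode/nvl2 calls do.

    None stands in for SQL NULL here.
    """
    prefix = f"{mapplet_name}." if mapplet_name is not None else ""
    suffix = f"_{logical_name}" if logical_name is not None else ""
    return f"{prefix}{parameter_name}{suffix}={value}"

def build_parameter_file(rows):
    """Write each header once, followed by all of its detail lines.

    rows: iterable of (header, detail) pairs, as selected from the view.
    """
    lines = []
    # groupby needs its input sorted so that equal headers are adjacent.
    for header, group in groupby(sorted(rows), key=lambda row: row[0]):
        lines.append(header)
        lines.extend(detail for _, detail in group)
    return "\n".join(lines) + "\n"
```

With the $$SOURCE example above, detail_line("$$SOURCE", "some_value", "FX") gives $$SOURCE_FX=some_value, so one row in the PARAMETERS table serves both $$SOURCE_FX and $$SOURCE_TRANS.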
