Informatica Interview Questions

Suraj Jeswara
16 min read · Mar 31, 2021

Questions

All sort of settings for look up transformation.

Difference between active and passive transformation.

Is update strategy an active or passive transformation? Why?

How do you replace a mapping that uses an Update Strategy transformation with one that does not?

When to use lookup vs when to use joiner transformation?

When to use auto memory settings for caching transformations and when not.

How does target load order work?

What is incremental aggregation?

When should you do incremental aggregation?

Define SCD1 and SCD2?

What are the command line utilities for PowerCenter and when to use them?

What are the most important session settings?

How will you debug a complex mapping error?

What happens in persistent cache?

What is the use of MD5?

How will you ensure that a mapping is optimised?

What happens when you use partitioning in informatica PC?

Suggest some ways to reduce query time from a database having approx. 10 million records.

Suggest ways to remove duplicate data using Informatica PC?

What is the difference between snowflake schema and star schema?

What is the benefit of using a star schema?

What are the different kinds of dimensions you have worked on and define them?

How will you implement an SCD type 2 logic?

What is push-down optimization?

All sort of settings for look up transformation.

The Lookup transformation has a long list of properties; naming a few frequently used ones will do: Lookup SQL Override, Lookup Source Filter, Lookup Caching Enabled, Lookup Policy on Multiple Match, Lookup Cache Persistent, Dynamic Lookup Cache, Recache From Lookup Source, and Insert Else Update.

Difference between active and passive transformation

An active transformation can change the number of rows that pass through the transformation and can also alter the row type or transaction boundary. For example, the Filter transformation is active because it removes rows that do not meet the filter condition.

A passive transformation does not change the number of rows that pass through the transformation.

You can connect multiple branches to a downstream passive transformation when all transformations in the branches are passive.

You cannot connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or transformation input group, because the rows might not concatenate: an active transformation changes the number of rows, so its output might not match the row count coming from another transformation.

For example, one branch in a mapping contains an Expression transformation, which is passive, and another branch contains an Aggregator transformation, which is active. The Aggregator transformation performs aggregations on groups, such as sums, and reduces the number of rows. If you connect the branches, Data Integration cannot combine the rows from the Expression transformation with the different number of rows from the Aggregator transformation. Use a Joiner transformation to join the two branches.

Is update strategy an active or passive transformation? Why?

The Update Strategy transformation is an active and connected transformation. It is used to insert, update, and delete records in the target table, and it can also reject records so that they never reach the target. When you design a target table, you need to decide what data should be stored in it. The transformation is active because it flags rows for different operations: insert, update, delete, or reject.

The description of each flag is given below; a small sketch of how these flags are typically assigned follows the list:

a) DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.

b) DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.

c) DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.

d) DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.
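As a minimal illustration (plain Python rather than PowerCenter expression language), this is the kind of decision an Update Strategy expression typically encodes; the column names and the flagging rule are assumptions for the example.

```python
# Numeric codes used by the Update Strategy transformation.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(source_row, existing_keys):
    """Illustrative flagging rule: reject rows with a missing amount,
    insert rows whose key is new, otherwise update."""
    if source_row["amount"] is None:
        return DD_REJECT
    return DD_UPDATE if source_row["id"] in existing_keys else DD_INSERT

print(flag_row({"id": 7, "amount": 120.0}, existing_keys={1, 2, 3}))  # 0 -> DD_INSERT
```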

How do you replace a mapping that uses an Update Strategy transformation with one that does not?

You can do so by setting the update strategy at the session level. Configuring it at the session level instructs the Integration Service either to treat all rows in the same way (insert, update, or delete) or to use the instructions coded in the mapping's Update Strategy transformations to flag rows for different database operations.

You can select a single database operation for all rows using the Treat Source Rows As setting on the ‘Properties’ tab of the session.

Insert :- Treat all rows as inserts.

Delete :- Treat all rows as deletes.

Update :- Treat all rows as updates.

Data Driven :- The Integration Service follows the instructions coded into the Update Strategy transformation to flag rows for insert, delete, update, or reject.

Once you determine how to treat all rows in the session, you can also set options for individual target tables, which gives additional control over how each row behaves. Define these options in the Transformations view on the Mapping tab of the session properties.

Insert :- Select this option to insert a row into a target table.

Delete :- Select this option to delete a row from a table.

Update :- You have the following options in this situation:

Update as Update :- Update each row flagged for update if it exists in the target table.

Update as Insert :- Insert each row flagged for update.

Update else Insert :- Update the row if it exists; otherwise, insert it (see the sketch after this list).

Truncate Table :- Select this option to truncate the target table before loading data.
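A toy Python sketch of the row-level "Update else Insert" behaviour described in this list; the target is modelled as a dictionary keyed by primary key, and the column names are illustrative.

```python
# Target table modelled as a dict keyed on the primary key (illustrative columns).
target = {101: {"id": 101, "city": "Pune"}}

def update_else_insert(target, row):
    """Update the row if its key already exists in the target; otherwise insert it."""
    if row["id"] in target:
        target[row["id"]].update(row)   # behaves like "Update as Update"
    else:
        target[row["id"]] = row         # falls back to an insert

update_else_insert(target, {"id": 101, "city": "Mumbai"})  # existing key -> update
update_else_insert(target, {"id": 202, "city": "Delhi"})   # new key -> insert
print(target)
```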

When to use lookup vs when to use joiner transformation?

When we want to return only a single matching row from the other table, use a Lookup transformation, because a Joiner will return all matching records when there are multiple matches. If the table is very large and only a single matching record needs to be returned, a Lookup works well. If data from two different databases needs to be joined with an outer or inner join, a Joiner is the better choice. If the lookup table is being updated while the session runs, use a Lookup with the dynamic cache option.


When to use auto memory settings for caching transformations and when not.

To enable automatic memory settings, you must set a value for the Maximum Memory Allowed for Auto Memory Attributes option (or the corresponding maximum-percentage option). When these options are set to zero, automatic memory attributes are disabled.

These settings default to zero for sessions and session configuration objects in upgraded repositories, so the feature is disabled there. You can improve the performance of such sessions by changing these memory settings to non-zero values.

How does target load order work?

In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially regardless of the type of target. If the target is a copy of the same flat file target and you specify the Append if Exists property for the target in the session properties, the Integration Service processes all the target load order groups.

To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping.

Steps to set a target load order:

· Create a mapping that contains multiple target load order groups.

· Click Mappings > Target Load Plan.

· The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.

· Select a source qualifier from the list.

· Click the Up and Down buttons to move the source qualifier within the load order.

· Repeat the previous two steps for any other source qualifiers you want to reorder.

· Click OK.

What is incremental aggregation?

Incremental aggregation is the process of capturing only the changes in the source and calculating the aggregations in a session. It lets the Integration Service update the target incrementally instead of recalculating the aggregations on the entire source.

Consider the sales table below and see how incremental aggregation works. For simplicity, only the year and price columns of the sales table are used; the goal is to find the total price for each year.

Source:

YEAR  PRICE
2010  100
2010  200
2010  300
2011  500
2011  600
2012  700

When you run the session for the first time with incremental aggregation enabled, the Integration Service processes the entire source and stores the aggregate data in two cache files, an index file and a data file, created in the cache directory specified in the Aggregator transformation properties.

After the aggregation, the target table will have the data below.

Target:

YEAR  PRICE
2010  600
2011  1100
2012  700

Now assume that the next day a few more rows are added to the source table.

Source:

YEAR  PRICE
2010  100
2010  200
2010  300
2011  500
2011  600
2012  700
2010  400
2011  100
2012  200
2013  800

For the second run, you pass only the new changes to the incremental aggregation, so the source will contain just the last four records. The Integration Service uses the data stored in the cache to calculate the aggregations, and once the aggregation is done it writes the changes to both the target and the cache. The target table will then contain the data below.

Target:

YEAR  PRICE
2010  1000
2011  1200
2012  900
2013  800
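A toy Python sketch of what the aggregate cache is doing conceptually, using the numbers from the example above (an illustration of the idea, not PowerCenter internals):

```python
# Aggregate cache left behind by the first (full) run.
agg_cache = {2010: 600, 2011: 1100, 2012: 700}

# Only the new rows are passed in the second run.
new_rows = [(2010, 400), (2011, 100), (2012, 200), (2013, 800)]

for year, price in new_rows:
    agg_cache[year] = agg_cache.get(year, 0) + price   # update the cached aggregate

print(agg_cache)   # {2010: 1000, 2011: 1200, 2012: 900, 2013: 800} -> written to the target
```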

When should you do incremental aggregation?

1. When you use incremental aggregation, you run the session the first time with the complete source data; in subsequent runs you pass only the changes in the source data.

2. Use incremental aggregation only if the target is not going to change significantly. If the incremental run changes more than half of the data in the target, the session performance may not benefit; in that case go for normal aggregation.

Define SCD1 and SCD2?

SCD Type 1: The attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.

SCD Type 2: A new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.

What are the command line utilities for PowerCenter and when to use them?

pmrep

· Insert, delete, and update Informatica metadata objects.

· List Informatica metadata objects.

· Perform deployments and migrations.

· Sequence Generator updates.

· Property changes for PowerCenter objects.

· Plug-in installation (Salesforce, SAP).

pmcmd

· Trigger (kick off) and abort PowerCenter workflows (see the sketch after this list).

· Get the status and run statistics of PowerCenter workflows and sessions.
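As a hedged example, the snippet below shells out to pmcmd startworkflow from Python; the service, domain, user, folder, and workflow names are placeholders, and in real setups the password is usually supplied through an environment variable rather than on the command line.

```python
# Kick off a PowerCenter workflow via pmcmd (all names below are placeholders).
import subprocess

cmd = [
    "pmcmd", "startworkflow",
    "-sv", "INT_SVC_DEV",     # Integration Service name (placeholder)
    "-d", "Domain_Dev",       # domain name (placeholder)
    "-u", "infa_user",        # repository user (placeholder)
    "-p", "secret",           # password (prefer an environment variable in practice)
    "-f", "SALES_FOLDER",     # repository folder (placeholder)
    "wf_load_sales",          # workflow name (placeholder)
]
subprocess.run(cmd, check=True)   # raises CalledProcessError if the workflow fails to start
```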

What are the most important session settings?

· Treat source rows as

· Commit Interval

· Session log file Property

· Test Load Property

· Log options

· Error Handling

· Source/Target Properties

· Connections

What happens in persistent cache?

When you use a persistent cache, Data Integration saves and reuses the cache files from mapping run to mapping run.

If the lookup table does not change between mapping runs, you can use a persistent cache. A persistent cache can improve mapping performance because it eliminates the time required to read the lookup table. The first time that Data Integration runs a mapping using a persistent lookup cache, it saves the cache files to disk. The next time that Data Integration runs the mapping, it builds the memory cache from the cache files.

What is the use of MD5?

MD5 (Message Digest Function) is a hash function in Informatica which is used to evaluate data integrity. The MD5 function uses Message-Digest Algorithm 5 (MD5) and calculates the checksum of the input value. MD5 is a one-way cryptographic hash function with a 128-bit hash value. MD5 returns a 32 character string of hexadecimal digits 0–9 & a-f and returns NULL if the input is a null value.

Example:

When you want to write only changed data to a database, use MD5 to generate checksum values for the rows of data that you read from a source.

When you run a session, compare the previously generated checksum values against the new checksum values, and write the rows whose checksum has changed to the target: a changed checksum value indicates that the data has changed.
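A short Python sketch of the checksum-comparison pattern described above, using hashlib's MD5; the column names are illustrative.

```python
import hashlib

def row_checksum(row: dict) -> str:
    """Concatenate the row's column values and return a 32-character hex MD5 digest."""
    concatenated = "|".join(str(row[col]) for col in sorted(row))
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()

old = row_checksum({"id": 1, "name": "Asha", "city": "Pune"})
new = row_checksum({"id": 1, "name": "Asha", "city": "Mumbai"})
print(old != new)   # True -> the row changed, so write it to the target
```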

How will you ensure that a mapping is optimised?

To ensure a mapping is optimised, you can perform the following tasks:

· Configure the mapping with the least number of transformations and expressions to do the most amount of work possible.

· Delete unnecessary links between transformations to minimize the amount of data moved.

· Choose an optimizer level that determines which optimization methods the Data Integration Service can apply to the mapping. When the Data Integration Service optimizes a mapping, it attempts to reduce the amount of data to process. For example, the Data Integration Service can use early selection optimization to move a filter closer to the source. It can use the cost-based optimization method to change the join processing order.

· Choose a pushdown type to enable the Data Integration Service to determine whether it can push down partial or full transformation logic to the source database.

· Configure data object caching to enable the Data Integration Service to cache logical data objects and access pre-built logical data objects when it runs a mapping. By default, the Data Integration Service extracts source data and builds required data objects when it runs a mapping. Mapping performance increases when the Data Integration Service can access pre-built data objects.

· Indicate if the SQL transformation, Web Service Consumer transformation, and the Java transformation do not have side effects when you configure these transformations. Some transformations have side effects that restrict optimization. For example, a transformation can have a side effect if the transformation writes to a file or database, adds to a count, raises an exception, or writes an email. In most cases, the Data Integration Service identifies which transformations have side effects that restrict optimization.

What happens when you use partitioning in informatica PC?

The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing. Partitioning option will let you split the large data set into smaller subsets which can be processed in parallel to get a better session performance.

There are different types of partitioning, namely the following (a toy sketch contrasting round-robin and hash partitioning follows the list):

· Database partitioning : The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.

· Round-Robin Partitioning : Using this partitioning algorithm, the Integration service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.

· Hash Auto-Keys Partitioning : The PowerCenter Server uses a hash function to group rows of data among partitions. When hash auto-key partition is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

· Hash User-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key.

· Key Range Partitioning : With this type of partitioning, you specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.

· Pass-through Partitioning : In this type of partitioning, the Integration Service passes all rows at one partition point to the next partition point without redistributing them.
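A toy Python contrast between round-robin and hash-key partitioning (the department column and three partitions are assumptions for the example): round-robin spreads rows evenly, while hash partitioning keeps rows with the same key in the same partition.

```python
rows = [{"id": i, "dept": d} for i, d in enumerate(["HR", "IT", "HR", "FIN", "IT", "HR"])]

n_parts = 3
round_robin = [[] for _ in range(n_parts)]
hash_parts = [[] for _ in range(n_parts)]

for i, row in enumerate(rows):
    round_robin[i % n_parts].append(row)                 # even distribution, no grouping
    hash_parts[hash(row["dept"]) % n_parts].append(row)  # same dept always lands in the same partition

print([len(p) for p in round_robin])                # balanced partition sizes
print([[r["dept"] for r in p] for p in hash_parts])  # departments grouped per partition
```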

Suggest some ways to reduce query time from a database having approx. 10 million records.

1. Partitioning the table (and reading the partitions in parallel)

2. Indexing the columns used in filters and joins

Suggest ways to remove duplicate data using Informatica PC?

For Relational Tables

1. Source Qualifier > ‘SELECT DISTINCT’ option

2. Source Qualifier > SQL override (Write your own Query)

For Flat files or other sources

1. Sorter > Aggregator

2. Sorter > Expression (compare the current and previous key values and flag duplicates) > Filter (drop the flagged duplicate records); see the sketch after this list

3. Sorter > ‘DISTINCT’ option (ONLY removes duplicates across all ports)
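A Python sketch of option 2 for flat files (sort, compare each row's key with the previous one, and filter); the key column name is an assumption for the example.

```python
rows = [{"id": 2, "val": "b"}, {"id": 1, "val": "a"}, {"id": 2, "val": "b"}]

rows.sort(key=lambda r: r["id"])          # Sorter: group duplicates together
deduped, prev_key = [], None
for r in rows:
    if r["id"] != prev_key:               # Expression + Filter: keep only the first row per key
        deduped.append(r)
    prev_key = r["id"]

print(deduped)   # [{'id': 1, 'val': 'a'}, {'id': 2, 'val': 'b'}]
```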

What is the difference between snowflake schema and star schema?

In a star schema, denormalized dimension tables join directly to the fact table. In a snowflake schema, the dimensions are normalized into multiple related tables, which reduces redundancy but adds joins.

What is the benefit of using a star schema?

Star schemas are quick to retrieve data from because the dimensions are denormalized, so fewer joins are needed; there is no normalization as in the case of a snowflake schema.

What are the different kinds of dimensions you have worked on and define them?

Conformed Dimension

A conformed dimension is a dimension that is shared across multiple data marts or subject areas. A company may use the same dimension table across different projects without making any changes to it.

A typical example is the Customer dimension: both the marketing and sales departments can use it for their reporting.

Junk Dimension

A junk dimension is a grouping of typically low-cardinality attributes, so you can remove them from the main dimension.

You can also use a junk dimension to implement a rapidly changing dimension, storing the attributes that change rapidly.

For example, attributes such as flags, weights, BMI (body mass index), etc.

Degenerated Dimension

A degenerated dimension is a dimension attribute that is stored in the fact table and does not have its own dimension table.

For example, a receipt number has no dimension table associated with it; such details are kept just for information purposes.

Role Playing Dimension

Dimensions which are often used for multiple purposes within the same database are called role-playing dimensions.

For example, you can use a date dimension for “date of sale”, as well as “date of delivery”, or “date of hire”.

How will you implement an SCD type 2 logic?

Explained here http://www.disoln.org/2012/08/slowly-changing-dimension-type-2-implementation-using-informatica.html

However, different approaches exist; a simplified sketch of the core logic is shown below.
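For instance, a simplified Python sketch of SCD type 2 logic, where the dimension is modelled as a list of dicts and the customer/city columns and surrogate keys are assumptions for the example:

```python
from datetime import date

def apply_scd2(dimension, source_row, next_key):
    """Expire the current version of the row if a tracked attribute changed, then add a new version."""
    current = next((r for r in dimension
                    if r["customer_id"] == source_row["customer_id"] and r["current_flag"]), None)
    if current is not None and current["city"] == source_row["city"]:
        return                                   # nothing changed: no new version needed
    if current is not None:                      # change detected: close out the old version
        current["end_date"] = date.today()
        current["current_flag"] = False
    dimension.append({                           # insert the new version with a new surrogate key
        "surrogate_key": next_key,
        "customer_id": source_row["customer_id"],
        "city": source_row["city"],
        "start_date": date.today(),
        "end_date": None,
        "current_flag": True,
    })

dim = []
apply_scd2(dim, {"customer_id": 10, "city": "Pune"}, next_key=1)     # first version
apply_scd2(dim, {"customer_id": 10, "city": "Mumbai"}, next_key=2)   # city changed -> second version
print(dim)
```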

What is push-down optimization?

When the Data Integration Service applies pushdown optimization, it pushes transformation logic to the source database. The Data Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source database runs the SQL queries to process the transformations.

Pushdown optimization increases mapping performance when the source database can process transformation logic faster than the Data Integration Service. The Data Integration Service also reads less data from the source.

USE CASES:

UC-1:

- You have a file containing employee and department names. You need to develop a mapping that writes the employees of each department into a different department-specific file. (A rough Python sketch of the equivalent logic follows the steps.)

1. Read the data from the file one by one.

2. Use a comparison to see if the new and previous records are from different departments.

3. Keep adding rows to the current file one by one, passing them through a Transaction Control transformation.

4. When the department name changes, use the Transaction Control transformation to commit the file.

5. Then start loading the next department's rows into a new file and repeat from step 2.
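A rough Python equivalent of these steps, streaming rows sorted by department and starting a new file whenever the department changes (column names and file naming are assumptions):

```python
import csv

def split_by_department(rows):
    """rows must arrive sorted by department, mirroring the sorted input to the mapping."""
    current_dept, out, writer = None, None, None
    for row in rows:
        if row["dept"] != current_dept:            # department changed: "commit" and open a new file
            if out:
                out.close()
            current_dept = row["dept"]
            out = open(f"{current_dept}.csv", "w", newline="")
            writer = csv.DictWriter(out, fieldnames=["emp", "dept"])
            writer.writeheader()
        writer.writerow(row)                       # keep appending rows for the current department
    if out:
        out.close()

employees = [{"emp": "A", "dept": "HR"}, {"emp": "B", "dept": "IT"}, {"emp": "C", "dept": "HR"}]
split_by_department(sorted(employees, key=lambda r: r["dept"]))
```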

UC-2:

- You have a file with some N records. How will you develop a mapping to write the records to a file in reverse order, i.e. the Nth record first, then the (N-1)th, and so on? (A rough sketch of the logic follows the steps.)

· Create a new port

· Increment that port by 1 for every row (for example, with a Sequence Generator)

· Sort based on that port in descending order

· Load the data into target
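A rough Python equivalent of these steps (the file names are placeholders): number every record, then sort on that number in descending order before writing.

```python
with open("input.txt") as src:                            # placeholder input file
    numbered = list(enumerate(src.readlines(), start=1))  # the extra "port": a running row number

numbered.sort(key=lambda pair: pair[0], reverse=True)     # sort descending on the row number

with open("reversed.txt", "w") as out:                    # placeholder output file
    out.writelines(line for _, line in numbered)
```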

UC-3:

- You have a location where a new file is generated every week. You are asked to develop a process that will load the data from only the most recent file.

1. Your workflow will have a cmd task along with a session task

2. The cmd task will be executed first

3. The cmd task will run a Unix command that finds the latest file name and writes it into a file, say abc.txt.

4. Use abc.txt as the source file list for your session and set the source filetype to ‘Indirect’ in the session properties (a Python stand-in for the command in step 3 follows).
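A Python stand-in for the command in step 3 (in practice this would typically be a Unix one-liner): pick the most recently modified file and write its name into the file list abc.txt that the indirect-load session reads. The directory and pattern are placeholders.

```python
import glob
import os

files = glob.glob("/data/weekly/*.csv")               # placeholder location and pattern
latest = max(files, key=os.path.getmtime)             # most recently modified file

with open("/data/weekly/abc.txt", "w") as file_list:  # file list consumed by the indirect-load session
    file_list.write(latest + "\n")
```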

UC-4:

- Suppose you have a mapping that takes 5 mins to load a table. You have found today that it is taking 20 mins. How will you troubleshoot it to fix the issue?

· The session log needs to be opened

· Check the time taken to read the data

· Check the time taken to write the data

· Check time taken in lookups

The whole idea here is to check for bottlenecks, for example at the database level: the database may have been in use by another process while the session was executing. Once found, the bottleneck needs to be addressed to prevent a recurrence.

UC-5:

- What test cases will you consider for testing a one-to-one mapping that loads from one database to another?

1. The SQL override query executes in the database

2. Data fetched from the source

3. All data loaded to target

4. Based on the session property ‘Treat source rows as’, check that existing rows are updated and new rows are inserted

5. Verify that the session succeeds

6. Tally the target row count against the source row count
