DWH Fundas: 2011

Tip 1:
Always use sorted data for very large data aggregations, or use PowerCenter server 64-bit , and allocate a large amount of memory. Sorted aggregations run much faster than unsorted aggregations.

Tip 2:
Aggregator transformations do not sort data. The aggregator uses a clustering algorithm, not a sort algorithm. When there are duplicate rows the aggregator may put data out in a seemingly sorted order but it does not guarantee it.

Tip 3:
Keep mappings as simple as possible. The smaller the better in terms of performance and tuning. Divide and conquer is the best strategy for fastest mapping performance. Sometimes multi-staging the work, or splitting the workload between the database and stages can release dependencies upstream, and increase parallelism.

Tip 4:
Make sure to allocate a large amount of memory (as much as possible) for mapping objects that cache.

Tip 5:
Aggregator transformations can be used to pivot (de-normalize) data.

Tip 6:
When replacing PERL code, make sure to break the code into units of work. Use each unit as a design step in the mapping architecture. Develop the overall complex mapping, then break it apart into smaller manageable steps.

Tip 7:
Keep the mapping objects as streamlined as possible. Run the data through the transforms, not around them. This helps with the partitioning options at the session level, as well as the parallelism capabilities of the mapping.

Tip 8:
When using a Sort, Aggregator, Joiner, or Lookup transformation keep the keys as "small" as possible (measured in precision). Much of the same mathematics that play in computing relational database indexes also play in computing the"indexed" fields that perform the operations listed above.

Tip 9:
Keep filter conditions simple, move the complex condition expressions into expression objects. This keeps the filter fast. When the filter runs slowly it's usually because of a complex condition.

Tip 10:
Break complex conditions down into smaller parts. Use the variables within an expression to build complex expression logic. This keeps the mappings more maintainable.

Tip 11:
Never have more than five (5) targets per mapping. This will slow down the mapping exponentially. Complex maps usually demand multiple targets, but the more targets you have, the poorer the performance.

Tip 12:
Complex architectures usually require update strategies within the mapping. The update strategies can result in a performance hit to the session, sometimes significant. It is recommended to minimize the usage of Update Strategies transformations for optimal performance.

Tip 13:
If you have very large and complex mappings that are running with a large amount of data (~50 million+ rows) then it is recommended to use the PowerCenter 64-bit server for optimal performance. It provides you with access to plenty of memory and high speed performance for large mappings.

Tip 14:
Any mapping with 50+ objects is simply too large and MUST be broken down into multiple mappings.

Tip 15:
To create complex output (say a mainframe ASCII file), use a single flat file, single string (4k if necessary), format the string in one or more export "expressions". Use the LPAD and RPAD functions to re-format data, and put record indicator columns on the output side.

Tip 16:
Always set the "master" in the joiner to be the smaller of the two tables (except when using detail outer join or full outer join). This will keep the caching of the two set to the minimum number of rows.

Tip 17:
Replace a lookup with a joiner, whenever you are faced with extremely large data sets

Tip 18:
Use reusable lookups instead of the same lookup multiple times. This will assist in reusing the lookup caches, and improve performance.

Tip 19:
If you are going to use a sequence generator, and share it across multiple mappings (or make the session run in parallel) then set it to cache a minimum of 10,000 values.

The use of unconnected lookup transformation.

1) The big question is why should we use unconnected lkp transformation,
if we can achieve same with connected one?

2) Which gives better performance, connected or unconnected and why?

Answers 1 :
Depending upon the scenario that we have we use the connected and unconnected LKP..

Scenario:
In our scenario we need to lookup multiple times,creating lkp multiple times and putting it in flow is not a good idea..bcoz it
will create more cache and thus deteroiting the performance
Instead of that we can create one unconnected lookup use it multiple times where ever u want which will be much more faster.

Answers 2:
You can use conn lookup but if u need 1 row so u can go for unconnected lKP becoz u can
use it multiple time in same mapping from multiple expression transformation.

Benefit of the above approach :
The benefit of an uncncted lKP is that you can do conditional lookups (using IIF,DECODE, etc.).
In other words, you can control the lookup in unconnected lookup.

Uncnted LKP gives better performance because it's not connectd to mapping pipeline.
so data transform is faster compared to connected one

Unconnected lookup can be used when we need only one returning port from the lookup table or if the same lookup condition is used
many times in the mapping. And moreover unconnected lookup gives
better performance as it just acts as a function call. but if
more than one ports has to be returned from look up table then we must use a
connected lookup.

Monday, June 20, 2011

Informatica Tips

A simpler way to write nested IF

Friday, June 17, 2011

The Use of UNConnected Lookup

Blog Archive