Intelligent Enterprise

Better Insight for Business Decisions

March 8, 2002

Real-Life Data Mart Processing

Is your mart the symmetric "information diamond" at the end of the data pipeline?

By Gabriel Tanase
Edited by Ralph Kimball

Continued from Page 1

I've seen three kinds of nonlinear aggregations in the data marts I've built. All of these are from service industries such as insurance. I call the nonlinear aggregation types MSI (for "multistep iterative"), PTS (parameterized time series), and MTS (multiple time series).

An example of an MSI nonlinear aggregation is the calculation of a series of estimated future repayment or residual values for a loan over a number of years. The calculation is iterative because the formula used gives the value for year n+1 starting from the value calculated for year n in a previous step.
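The iterative character of an MSI calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recurrence (add a fixed annual interest rate, subtract a fixed annual payment) and the names `residual_values`, `annual_rate`, and `annual_payment` are hypothetical and are not taken from any particular loan product.

```python
def residual_values(principal, annual_rate, annual_payment, years):
    """Return the estimated residual balance at the end of each year.

    Assumed (hypothetical) recurrence:
        balance[n+1] = balance[n] * (1 + annual_rate) - annual_payment
    """
    balances = []
    balance = principal
    for _ in range(years):
        # The value for year n+1 starts from the value calculated for year n.
        balance = balance * (1 + annual_rate) - annual_payment
        balances.append(round(balance, 2))
    return balances


schedule = residual_values(
    principal=10_000.0, annual_rate=0.05, annual_payment=2_000.0, years=3
)
print(schedule)  # [8500.0, 6925.0, 5271.25]
```

Because each year depends on the previous result, the value for year 3 cannot be obtained by a simple count-and-sum over independent rows; the intermediate steps have to be computed in order.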

An example of a PTS nonlinear aggregation is the calculation of the "booked" (or "earned") value of a paid-in-advance service, such as an insurance policy premium, over the service lifetime. The booked value at a given target moment is calculated as a percentage of the advance payment. The percentage depends nonlinearly on the time span between the service inception date and the target time for the calculation, which can be "now" or a time in the future. When the percentage is not computed on the fly but taken from a parameter table, the time granularity of that table constrains the time granularity of the measure calculated from it.
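A minimal Python sketch of a table-driven PTS calculation follows. The parameter table `EARNED_FRACTION_BY_MONTH`, its quarterly steps, and its percentages are invented for illustration, as are the function names. The sketch shows how a quarterly table limits the booked value to quarterly precision.

```python
from datetime import date

# Hypothetical parameter table: cumulative earned fraction of the premium
# after N whole months since inception. It has quarterly granularity only.
EARNED_FRACTION_BY_MONTH = {0: 0.00, 3: 0.40, 6: 0.65, 9: 0.85, 12: 1.00}


def whole_months_between(start, end):
    """Count completed months from start to end (never negative)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return max(months, 0)


def booked_value(premium, inception, target):
    """Booked value at the target date, using the latest applicable table row."""
    elapsed = whole_months_between(inception, target)
    step = max(m for m in EARNED_FRACTION_BY_MONTH if m <= elapsed)
    return round(premium * EARNED_FRACTION_BY_MONTH[step], 2)


inception = date(2002, 1, 1)
print(booked_value(1200.0, inception, date(2002, 5, 15)))  # 480.0
print(booked_value(1200.0, inception, date(2002, 6, 30)))  # 480.0
print(booked_value(1200.0, inception, date(2002, 7, 1)))   # 780.0
```

The first two target dates fall in the same table step, so they produce the same booked value even though they are six weeks apart. A finer-grained measure would need a finer-grained parameter table or an on-the-fly formula.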

An example of an MTS nonlinear aggregation is the calculation of the booked value of a paid-in-advance service, as in the previous example, but with an open-ended number of additions or changes to the service components, such as adjustments or cancellations. Any such calculation must carefully take into account all of the separate time spans between the service inception date and the dates of these service changes.
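One plausible way to handle the separate time spans is sketched below in Python. It is an assumption-laden illustration, not a description of any real insurer's rules: the earned-fraction table, the idea that each change alters the amount in force from its effective date onward, and all names are hypothetical.

```python
from datetime import date

# Hypothetical parameter table: cumulative earned fraction after N whole
# months since the service inception date.
EARNED_FRACTION_BY_MONTH = {0: 0.00, 3: 0.40, 6: 0.65, 9: 0.85, 12: 1.00}


def earned_fraction(inception, moment):
    """Earned fraction for the span from inception to the given moment."""
    months = (moment.year - inception.year) * 12 + (moment.month - inception.month)
    if moment.day < inception.day:
        months -= 1
    months = max(months, 0)
    step = max(m for m in EARNED_FRACTION_BY_MONTH if m <= months)
    return EARNED_FRACTION_BY_MONTH[step]


def booked_value_with_changes(premium, inception, changes, target):
    """Sum the booked value segment by segment between service changes.

    changes is a list of (effective_date, delta_amount) pairs; a cancellation
    would be a negative delta. Changes after the target date are ignored.
    """
    in_force = premium
    segment_start = inception
    booked = 0.0
    for change_date, delta in sorted(changes):
        if change_date > target:
            break
        # Earn the amount in force over the span ending at this change.
        span = earned_fraction(inception, change_date) - earned_fraction(inception, segment_start)
        booked += in_force * span
        in_force += delta
        segment_start = change_date
    # Earn the final span from the last applied change up to the target date.
    span = earned_fraction(inception, target) - earned_fraction(inception, segment_start)
    booked += in_force * span
    return round(booked, 2)


inception = date(2002, 1, 1)
target = date(2002, 10, 1)
changes = [(date(2002, 4, 1), 300.0)]  # one upward adjustment after three months

segmented = booked_value_with_changes(1200.0, inception, changes, target)
naive = round((1200.0 + 300.0) * earned_fraction(inception, target), 2)
print(segmented, naive)  # 1155.0 1275.0
```

The naive figure applies the final percentage to the final amount and ignores when the adjustment took effect. The segmented figure differs because each span between inception and a change date is earned on the amount that was actually in force during it.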

AGGREGATE MEASURES IN ATOMIC MARTS

There is one final real-life issue challenging the conventional vision. We may have to deal with data that is:

  • Available only at some aggregate level. It is either temporarily or permanently unavailable at the atomic transaction level because of current limitations in business processes or IT applications, but, if available, it would be meaningful at the lowest grain. Perhaps monthly summaries arrive from a source days or even weeks before the underlying atomic-level data can be loaded. The danger, of course, is that tricky little adjustments may be made to the underlying data during the time delay.
  • Available and meaningful only at some aggregate level. It is unavailable at the atomic transaction level because according to the current business processes and understanding it does not make sense at the lowest grain. This can happen when a source system itself summarizes data to a reporting period using complex and unobservable business rules.


Furthermore, when such aggregate-only data is assembled together with summary items obtained via normal count-and-sum aggregation from the atomic level, the process of calculating more derived business measures will create an asymmetry in the data mart. Some higher-level summaries will include the summary-only data, while others won't. Therefore, aggregation results from different paths may not match when compared at the top level. By any conventional vision, this situation is very undesirable in a data mart.
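The mismatch between aggregation paths can be made concrete with a small reconciliation check. The Python sketch below uses invented figures and hypothetical names (`atomic_transactions`, `source_monthly_summary`, `reconcile`); it assumes the source system's monthly summary includes an adjustment that never appears at the atomic level.

```python
from collections import defaultdict

# Hypothetical atomic-level rows: (month, product, amount).
atomic_transactions = [
    ("2002-01", "motor", 100),
    ("2002-01", "motor", 250),
    ("2002-01", "home", 400),
    ("2002-02", "motor", 300),
]

# Hypothetical summary supplied by the source system. January includes an
# adjustment of 10 that is not visible in the atomic rows.
source_monthly_summary = {"2002-01": 760, "2002-02": 300}


def sum_by_month(rows):
    """Normal count-and-sum aggregation from the atomic level."""
    totals = defaultdict(int)
    for month, _product, amount in rows:
        totals[month] += amount
    return dict(totals)


def reconcile(rolled_up, supplied):
    """Return supplied-minus-rolled-up differences for months that disagree."""
    return {
        month: supplied[month] - rolled_up.get(month, 0)
        for month in supplied
        if supplied[month] != rolled_up.get(month, 0)
    }


rolled_up = sum_by_month(atomic_transactions)
mismatches = reconcile(rolled_up, source_monthly_summary)
print(rolled_up)   # {'2002-01': 750, '2002-02': 300}
print(mismatches)  # {'2002-01': 10}
```

A check like this does not resolve the asymmetry, but it makes visible which summary levels draw on aggregate-only data and by how much the two paths diverge.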

STAY TUNED: A STRUCTURE FOR REAL LIFE

I have laid out several real-world threads most data mart designers are likely to encounter: three distinct usage styles; the likelihood that the data mart is not the end of the information delivery pipeline; the existence of complex nonlinear aggregated measures; and the conflict between source-aggregated and data mart-aggregated data that should produce the same results but don't.

In my next column, I'll draw these threads together and suggest an adaptable structure you can use to tackle these issues if they arise in your environment. I'll show you how I create a specific asymmetric aggregate level of the data mart. We'll look at how to recognize business requirements for it and decide how to build a custom user interface for storing the underlying component factors of the nonlinear aggregations in recognizable data mart dimensions.


Gabriel Tanase [gabriel@gabrieltanase.com] is a system designer based in Ireland. He has worked on several business intelligence projects for a leading European insurance provider.





