Archive for the ‘OLAP’ category

From Constable to Cubism

September 18th, 2008

The BI/DW landscape in many organisations appears to have embraced cubism in all its forms and many value the capability immensley, however I just don’t see any need for them in a modern single campus university installation which is why UNE has no cubes at all.  Maybe I’m missing something but with a browser client, portability is no issue, performance is fine and since investing time in constructing conformed date and time dimensions we have all the flexibility we need there (so far).

The benefit for us is our nightly warehouse cycle is missing a significant step, one that therefore never fails and doesn’t need extra expertise and skills to develop and maintain.  Given our regional location and small team size, that is a big plus for us.

So what kind of performance do we get with drill-down and adhoc analysis?
Well of course I have only demonstrations from vendor data and examples from other sites to compare with but it seems good.  We certainly don’t see many spinning hour glasses unless we’re running something against a big (30M row) speriodic snapshot history base fact table, and even then with bitmap indexes we can get a 2-year time-series report on daily enrolment for a unit in maybe 20 seconds.  That seems acceptable to us.

Do we have lots of expensive hardware to achieve this?
Absolutely not, I mean its good but its not bleeding edge technology and we don’t exactly have millions to throw around at this type of thing.  Our warehouse is an Oracle 10g RAC installation on 64-bit Linux technology, however we do have highly optimised Kimball dimensional models and  and a low number of users actively writing queries and performing analysis - so things are scaled appropriately and load is rarely an issue.

Mastering Data Warehouse Aggregates

This book is one that has helped us a little, it is really just common sense however - if you only ever need a view of enrolment by course then create an aggregate of the monster atomic-grained enrolment fact at course level and point everything to that.  The original source fact table is always there to drill-down or across or into should the detail be required, but instead of having a row per student unit per day you can perhaps get away with one per course per day - maybe 200 rows a day instead of 80,000.

I think the other thing that has helped is a piece of advice I retained from a very early training course in Melbourne, that was to ‘push everything down’ to the database level.  So if you need to do some kind of value banding for example, don’t do the decode in your reporting, or even your modelling layer.  Make your ETL work for you so your tables are all crammed full of attributes and pre-calculated values so your presentation layer just reads and presents.  Of course that means up-front thinking but that’s just IT for you - the delivery is the easy bit, the hard stuff is all hidden behind the scenes.

And speaking of being hidden, I’m taking a bit of time off work next week so this blog will remain silent from my side but I really hope you feel the urge to write something or contribute to this in some way - I’m going to try and keep this going but it is here as a resource for all Aus/NZ university BI teams so please do chip in with your views, we really do want to hear them and it also gives me ideas about what to write…

Looking forward to lots of comments…