Archive for the ‘Dimensional Models’ category

Metric Epiphany

April 17th, 2009

Thanks for the email feedback and advice about metric star schema designs in response to my last post.  Some very useful suggestions from Griffith got me thinking about storing the actuals as numerator and denominator for percentage measures.

The problem I had was with roll-ups of differing metric ‘currencies’ - percentage v absolute values being handled by the same model.  It is easy to aggregate EFTSL actuals on the fly but percentages won’t roll up unless you have the actuals and can recalculate them at the higher level as suggested by Griffith.  The problem with this is that the model allocates a status to the measure row which just became dynamic and is now only known at reporting time.

lightbulbIt then struck me that there are only two types of behaviour we need to worry about:

  • Percentage
  • Absolute

The absolute values encompass things like enrolments, load, income, headcounts etc, they all roll up from an atomic grain very well and the percentage actual elements all roll up too but their targets need to be averaged.

So we need pairs of measure types in the fact table - a target and actual for absolute values and a target and actual for percentage values.

In the roll-up construction we specific sum(x) for the absolute values and avg(x) for the percentage ones and then in the reporting we determine which measures to display based on the properties of the metric being displayed.

At least that is the theory, lets see what happens…

Metric Facts

April 14th, 2009

Is anyone out there using an aggregated fact table for managing KPI or organisational metric performance?  I’m purely talking about the star schema design here to enable high-level reporting on data that already exists in the warehouse at an atomic level, things like:

  • EFTSL by School or Faculty
  • Student Satisfaction
  • Unit Enrolments

Having tried (and failed) to use Cognos Metric Store for this purpose, I think the most flexible and  best performing results could be obtained from something like the following.  I stress this is not in production, or even built yet, but does it ring any bells with anyone out there?  Please comment if it does.

metrics_star_schema

I won’t get into product specifics here as I am interested in the design rather than the capabilities of specific products although I realise the product does at least influence the design to a certain extent.

There are two potential issues that I recognise with the above at present:

  • The grain of the fact varies according to the metric - some are annual, some semester-based
  • The dimensions are not all relevant for all metrics - we would need some N/A rows in the dim

Seeing the Big Pineapple

March 5th, 2009

Since my last post about Tableau, I have been thinking much harder about the viability of putting a visualisation tool on top of a well architected and implemented dimensional warehouse.

I don’t think there is much dispute that in terms of exploring data through play, there are some pretty sophisticated visualisation tools out there and potentially these can help you find startlingly obvious things in your data that you simply hadn’t realised were there.

bigpineapple

The problem for me previously was that these tools were great at manipulating small volumes of pre-aggregaeted data in a spreadsheet or simple client database but throwing large volumes of highly granualar data around from an enterprise-scale data warehouse was a completely different ball game.

Some of these tools could  interrogate a proprietary OLAP cube and do so rapidly but but what is that if not a highly specialised set of pre-aggregated data?

An industrial-strength BI product such as Cognos that gives you ETL-to-Delivery in a single platform is certainly an attractive single-vendor solution and it comes with all manner of report scheduling and delivery capability - truly an end to end solution.

The downside of the sledgehammer is where it misses the specialisation that can only comes with the niche products.  There is no way you can get outputs from Cognos to look like those from Tableau but they aren’t meant to, they are entirely different products for entirely different purposes.

So are they complimentary?

Well yes, that would have been my initial suggestion.  Have a high-end visualiastion tool to do some really smart reporting on a set of aggregate tables or even Excel output from your main BI platform.  This is definitely a workable solution although the downside is that you move away from what was probably a hard-fought stance of having a single BI platform in your organisation.  I could probably live with that if the platform was administered and managed by the same people who administer and manage the main BI platform.  Also, this is an ever-changing world and we shouldn’t be constrained by yesterday.

But is there another option?

You know I think there is, or at least I think there will be very soon.  These niche products are evolving very rapidly but I don’t think we will end up with more of the same.  I believe we are rapidly approaching a point where you can have your warehouse and ETL entirely separate from your BI visualisation tools without compromising performance and all the other elements that have traditionally only been provided with a mainstream BI vendor - things like enterprise-level security, integrated scheduling, delivery and formatting options, browser-based interfaces, dashboards etc.

If extremely powerful end-user visualisation tools can effectively sit on top of an enterprise data warehouse in an organisation with large volumes of high quality dimensionally-modeled data then this is an exciting prospect indeed.

Attribute Dementia

February 20th, 2009

Something that we all have to do from time to time in a dimensionally modelled world is add new attributes to an aging dimension. Our Student dimension is a regular for updates and minor surgery and consequently the process has increased risk as there are more attributes that could get forgotten in a rebuild. Accidently rebuilding a dimension and missing an attribute is a disaster, especially if that has been T2′d over time. It may be impossible to recover without a database restore and if you only noticed today then…

zimmer

Incredibly, in Oracle 10g there is still no way to reorder columns in a physical table so the only way to neatly add an attribute to a dimension is to add the new column to the end of the physical table and then create a new copy of the table using ‘create table dimension_new as select a,b,c,d..n from dimension_old.

This is pretty straightforward when your dimension has 10 attributes but what about when it has 50? SQL Developer, TOAD and other tools all have ways of letting you get the DDL for a table, but you usually still need to fiddle with the output (remove column types and lengths etc) before you can run it. That fiddling gives opportunity for error.

So here is the solution that we created today - a function to call with the before and after table name that gives you a create statement for the new table. You then just need to put the new column in the right spot. It’s not pretty but it works and it reduces the risk. I used it to add a new attribute to our Student dimension tonight and life was less complicated. Big thanks to Coops for hacking this together in about 5 minutes. Please take the code and use it if you can. Even better, improve it and post it back.

2nd March 2009 update: Check here for a Kimball forum discussion on this
create or replace function bi_column_shift_2k(p_old_table_name varchar2,
p_new_table_name varchar2) return varchar2 as
» Read more: Attribute Dementia

A very big number indeed

November 20th, 2008

Movember - Sponsor MeAre you faced with justifying the required time and effort investment of dimensional modeling to people who believe you can ‘just point your BI tool at the data and let it do the work’?

If you are, and I hope you resist this crazy notion, then the following might just help.

I had to do an exec presentation a few months ago and was wondering how best to convey the flexibility of a dimensional model or star schema in terms of report development potential. I went back to basic maths and discovered that the enormous number below is the number of potential unique reports that could be produced from our current student admissions model - just one of many in the warehouse.

Combinations

The conditions are that each report has 6 unique columns which are attributes of a dimension or measures in a fact.  These are not permutations (where I can repeat numbers) but unique combinations.

If this doesn’t get their attention, try a derivation of it that can be related to. Maybe suggest that if you could produce these reports at the rate of one per second then it would take over 115 years to produce them all, or if they were each printed on a single A4 page (no! no! no!) and put end to end they would stretch over 1,084,865KM, or 27 times round the earth.

So next time you’re advised that ‘we know what reports we want already’ or that ‘this flexible system comes pre-packaged with hundreds of reports’ you have some ammunition to put those volumes in perspective.

We know that tomorrow, someone, somewhere will need to view the data in a different way, responding to a different demand of a different market. Dimensional models have the inherent agility to deliver to that need. Invest in them, unlike the ones on Wall St, these don’t go South.