Archive for the ‘Kimball’ category

Tracking ETL Performance

June 19th, 2009

Something that has been sitting on our list of things-to-do-when-we-have-a-moment for a long time is some automated auditing of ETL.  Ralph Kimball talked about the need to build monitoring into the ETL process during his ETL Architecture in Depth course and with our warehouse growing in size and complexity every day, it is no longer something we instinctively know.

James had a bit of time during our last sprint and created a neat Framework Manager package referencing the Cognos audit tables.  We now have a really useful set of automated charts that are produced each night showing the performance of each ETL step within the following three major ETL subsystems.

  • ODS - Operational Data Extraction
  • STG - Dimension and Fact Staging
  • PRS - Dimension and Fact Presentation

The following snippet of one chart from the STG subsystem shows build times in seconds for each table.  The S_x prefix is our naming convention for Staging tables where x is either (F)act, (B)ridge or (D)imension.  The actual charts are much lengthier than this example and are great for identifying bottlenecks and where to do some fine-tuning to improve performance.

picture-5

Our overall ETL jobstream in production typically takes between 1-2 hours depending on workload and environmental factors.

What is interesting is comparing these charts with the Development and User Acceptance warehouse jobstreams.  We’ve spotted instances where we’re missing an index or an index is different between environments purely through looking at the difference in performance.  Its a really useful tool in that respect.

What we plan to do in the future is track performance of individual steps over time to see in the impact of seasonality - such as heavy transactional load when results are published or during peak admission or enrolment times.  This same technique on a more macro timescale will help us in understanding warehouse growth and the impact of that on database performance which we can feed back to our DBA’s and Data Centre to assist capacity planning.

This is the type of thing that I imagine gets forgotten or just not done but if you can find the time, I reckon it will pay huge dividends.  You’ll spot issues in time to do something about them and gain a lot of valuable intelligence about your warehouse.  Of course, no one in the business will sponsor you to do this…

We’re off to see the Wizard

March 21st, 2009

wizard-of-ozI’m looking forward to having my views of ETL turned upside down and inside out and then hopefully reconstructed in a much more sensible way in the space of 4 days next week.

Yes, Ralph the Kimball Wizard is back in town and giving his highly acclaimed ETL Architecture in Depth course along with Bob Becker in Sydney from 23rd to 26th March.

I suspect it is a little too late but if you still think you can attend then here is the link to the registration details page at Altis.

I know that there will be people from UNE and Wollongong attending but if there are more of us and its worth catching up ‘after school’ then use this post to let me know and I’ll try and organise something.

Attribute Dementia

February 20th, 2009

Something that we all have to do from time to time in a dimensionally modelled world is add new attributes to an aging dimension. Our Student dimension is a regular for updates and minor surgery and consequently the process has increased risk as there are more attributes that could get forgotten in a rebuild. Accidently rebuilding a dimension and missing an attribute is a disaster, especially if that has been T2′d over time. It may be impossible to recover without a database restore and if you only noticed today then…

zimmer

Incredibly, in Oracle 10g there is still no way to reorder columns in a physical table so the only way to neatly add an attribute to a dimension is to add the new column to the end of the physical table and then create a new copy of the table using ‘create table dimension_new as select a,b,c,d..n from dimension_old.

This is pretty straightforward when your dimension has 10 attributes but what about when it has 50? SQL Developer, TOAD and other tools all have ways of letting you get the DDL for a table, but you usually still need to fiddle with the output (remove column types and lengths etc) before you can run it. That fiddling gives opportunity for error.

So here is the solution that we created today - a function to call with the before and after table name that gives you a create statement for the new table. You then just need to put the new column in the right spot. It’s not pretty but it works and it reduces the risk. I used it to add a new attribute to our Student dimension tonight and life was less complicated. Big thanks to Coops for hacking this together in about 5 minutes. Please take the code and use it if you can. Even better, improve it and post it back.

2nd March 2009 update: Check here for a Kimball forum discussion on this
create or replace function bi_column_shift_2k(p_old_table_name varchar2,
p_new_table_name varchar2) return varchar2 as
» Read more: Attribute Dementia

He’s a Kimball Wizard…

February 6th, 2009
…there has to be a twist…
kglogobasic

Ever since he was a young boy
He’s played with data stores
from Adabas to xbase he must have played them all…

I see from the latest Design Tip #109 that there is now an online Kimball Forum available at http://forum.kimballgroup.com I imagine this will become a fantastic resource for us to use - judging by the General Usage Guidelines it will be moderated and free from commercial advertising and sales pitches - how refreshing.

I was fortunate enough to get to the Kimball Dimensional Modeling in Depth course last year and I’m already looking forward to the ETL Architecture in Depth course in Sydney next month. If you’re interested then it is being managed by Altis on behalf of Kimball University.

How do you think he does it?
I don’t know!
What makes him so good?

If you’re under 35 or have spent your life under a plant pot and have no idea what I’m on about then check here.

A very big number indeed

November 20th, 2008

Movember - Sponsor MeAre you faced with justifying the required time and effort investment of dimensional modeling to people who believe you can ‘just point your BI tool at the data and let it do the work’?

If you are, and I hope you resist this crazy notion, then the following might just help.

I had to do an exec presentation a few months ago and was wondering how best to convey the flexibility of a dimensional model or star schema in terms of report development potential. I went back to basic maths and discovered that the enormous number below is the number of potential unique reports that could be produced from our current student admissions model - just one of many in the warehouse.

Combinations

The conditions are that each report has 6 unique columns which are attributes of a dimension or measures in a fact.  These are not permutations (where I can repeat numbers) but unique combinations.

If this doesn’t get their attention, try a derivation of it that can be related to. Maybe suggest that if you could produce these reports at the rate of one per second then it would take over 115 years to produce them all, or if they were each printed on a single A4 page (no! no! no!) and put end to end they would stretch over 1,084,865KM, or 27 times round the earth.

So next time you’re advised that ‘we know what reports we want already’ or that ‘this flexible system comes pre-packaged with hundreds of reports’ you have some ammunition to put those volumes in perspective.

We know that tomorrow, someone, somewhere will need to view the data in a different way, responding to a different demand of a different market. Dimensional models have the inherent agility to deliver to that need. Invest in them, unlike the ones on Wall St, these don’t go South.