Archive for the ‘Business Intelligence’ category

Data Testing using Visualisation

August 6th, 2009

I found a really useful and interesting way of testing large volumes of data this week, thanks to my favourite bit of visualisation software, Tableau.  We’ve been pushing big volumes of data into the warehouse, then out into Cognos Metric Store and then using ETL to bring them back in again for reporting.  All in all a fairly complicated, many-stepped and relatively risky process.

After following the usual development test steps, checking results and peer-reviewing code, we would ordinarily have handed the results over for client testing.  The problem in this case is that the results are produced at an isolated unit level, so overall totals and patterns are very hard to spot.  While 99.9% of individual cases may look absolutely fine, the outliers are the ones that, if wrong, might upset the overall result.

So, how can we review hundreds of thousands of data items at a macro level as part of testing?

Well, one way would be to write simple SQL statements that check the validity of data at different aggregation points - in our case these might be unit, course, discipline, school and faculty.  But at best you would still be looking at lots of numbers and using personal interpretation to gauge correctness.
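As a rough sketch of that SQL approach, the following runs the same range check at several aggregation levels using Python's built-in sqlite3 module.  The table name, columns and sample rows are all invented for illustration and are not the warehouse's real schema; the check itself simply flags any group whose pass rate falls outside 0 to 1.

```python
import sqlite3

# Hypothetical schema and sample data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unit_results ("
    " faculty TEXT, school TEXT, unit TEXT, passes INTEGER, attempts INTEGER)"
)
conn.executemany(
    "INSERT INTO unit_results VALUES (?, ?, ?, ?, ?)",
    [
        ("Science", "Physics", "PHY101", 80, 100),
        ("Science", "Physics", "PHY102", 45, 50),
        ("Science", "Maths", "MAT101", 90, 60),  # deliberately bad row
        ("Arts", "History", "HIS101", 70, 100),
    ],
)


def out_of_range(level_columns):
    """Return groups at the given level whose pass rate is outside 0..1."""
    cols = ", ".join(level_columns)  # trusted, hard-coded column names only
    rate = "1.0 * SUM(passes) / SUM(attempts)"
    sql = (
        f"SELECT {cols}, {rate} AS pass_rate "
        f"FROM unit_results GROUP BY {cols} "
        f"HAVING {rate} < 0 OR {rate} > 1 "
        f"ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


# Run the same check at three aggregation points.
unit_level = out_of_range(["faculty", "school", "unit"])
school_level = out_of_range(["faculty", "school"])
faculty_level = out_of_range(["faculty"])

for name, result in [
    ("unit", unit_level),
    ("school", school_level),
    ("faculty", faculty_level),
]:
    print(name, result)
```

Even with a check like this at every level, the output is still a list of numbers to be read and interpreted, which is the limitation described above.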

picture-1

Instead, how about playing with the data using a visualisation tool?  Manipulate the data in real time, drag and drop data items and attributes in and out of the visualisation at will until you’re convinced things look as they should.  What I found in the above example is that things didn’t look as they should in the case of a single combination of data.  This was the mean pass rate for a single unit across all teaching periods (semesters) in a single year.  It could not validly be anything outside the range 0% to 100%, yet the visualisation clearly shows one combination way in excess of 150%.

The important point here is that this wouldn’t have been identified by simply sorting rows of data by a column, or by basic data profiling checks, because the visualisation is a representation of a formula applied to the aggregation of many rows of data (in this case passes/attempts across units, by one delivery mode, in one school in one historical year).  Yet it clearly stands out in the above visualisation.
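To make that concrete, here is a minimal Python sketch of one hypothetical way an impossible pass rate can hide behind rows that each look fine.  The row layout, unit codes and the missing "attempts" row are assumptions made up for the example, not a description of what actually went wrong in this case.

```python
from collections import defaultdict

# Hypothetical fact rows: (unit, teaching_period, measure, value).
# Every value looks plausible on its own, so sorting reveals nothing odd.
rows = [
    ("ABC123", "S1", "passes", 40),
    ("ABC123", "S1", "attempts", 50),
    ("ABC123", "S2", "passes", 45),
    # "attempts" for ABC123 in S2 was never loaded (assumed fault)
    ("XYZ200", "S1", "passes", 30),
    ("XYZ200", "S1", "attempts", 40),
    ("XYZ200", "S2", "passes", 20),
    ("XYZ200", "S2", "attempts", 25),
]


def pass_rates(fact_rows):
    """Sum passes and attempts per unit across periods, then divide."""
    totals = defaultdict(lambda: {"passes": 0, "attempts": 0})
    for unit, _period, measure, value in fact_rows:
        totals[unit][measure] += value
    return {
        unit: t["passes"] / t["attempts"]
        for unit, t in totals.items()
        if t["attempts"] > 0  # skip units with nothing to divide by
    }


def suspicious(rates, low=0.0, high=1.0):
    """Keep only the rates that fall outside the valid range."""
    return {unit: rate for unit, rate in rates.items() if not low <= rate <= high}


rates = pass_rates(rows)
bad = suspicious(rates)
print(bad)
```

The fault only appears once the formula is applied to the aggregated totals, which is exactly the level at which a chart of pass rates operates.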

Modern BI involves pushing out dimensional models from which literally billions of report combinations are possible to maybe hundreds or thousands of users.  As developers we no longer have the luxury of being able to validate and verify columns and rows of numbers in specific reports before they are published.  Visualisation software is a powerful tool to have at your disposal during pre-release testing of massive models like these.  I know I will be using it from now on to enhance confidence and improve the quality of outputs from the data warehouse.

Is there an Ostrich in the house?

August 1st, 2009

You might have noticed a slightly lower posting rate than usual for the last few weeks.  That has been due to a mammoth effort in getting our Unit Monitoring data together and ready for consumption.  We have yet to organise the awareness, education, training and roll-out, but the data is all now transformed and ready to go, which is a huge relief and a not insignificant achievement.  We have 19 measures for every one of the units taught over the last 4 years and can display those measures by mode of study and teaching period.  That amounts to over 280,000 individual data points and some fairly involved ETL and reporting.

What I’m now most interested in is how best to roll out this information resource.  I’m confident that democratisation of data is a concept that most people will wholeheartedly embrace, and I am expecting to be told about all the things we aren’t but should be measuring rather than being challenged about the ones we are.

However I think we also need to be ready for and receptive to people who are nervous or at least a little reticent about the whole idea of measurement.  Also, following on from Paul’s article earlier in the week, we need to keep the needs and motivations of the recipients of this information at the forefront of our minds.

It does of course seem a little ironic that in an education environment, where we measure and assess our students, we would be concerned about being measured ourselves, so perhaps I’m unnecessarily worrying about this and in fact all the Ostriches stood up long ago.  Only time will tell.

Finding Patterns in Australian Research

July 15th, 2009

I love blogging; so many positive benefits result from it.  On Tuesday I received an email out of the blue from Marco Fahmi, who is a PhD student at QUT.  Marco is working on a project called the Australian Research Atlas, which mines Australian university and research databases to find interesting patterns.

This particular example shows CRC participants with node size representing dollar value.  More details here.

crc_participants

Marco has found that gaining access to publicly available institutional data can be difficult, and of course he’s right.  We’re back onto that pet subject of mine - the democratisation of data for the common good - let’s not go there today…

Marco has also written Pajektools - a JavaScript 3D network visualisation tool, which looks interesting.

The Importance of Best Practice

July 3rd, 2009

Michael Gibson provides his perspective on best practice BI in the first of a series of Guest Posts from other Australian Universities.

Michael is Data Warehouse Manager at Deakin University in Victoria, Australia.

Because Australian Universities commonly fail to properly recognise the promise of Business Intelligence initiatives, it seems clear to me that many institutions have embarked on the BI journey ill-prepared.  The result is institutions that have not been aware of accepted best practice, and initial implementations that are lacking in many critical aspects.

Some institutions will recognise this and look to rectify the situation at a later date – once more funding becomes available.  But wouldn’t it be nice if this didn’t have to happen?

I understand that most institutions are genuinely constrained financially, and will not have the resources of the private sector available for these sorts of programmes.  This means that compromises are often necessary, but after talking to many people from other Universities I believe it’s more than just a financial issue.

It seems to me to also be a cultural issue.  I don’t believe that Universities are in the practice of looking externally for expertise, nor are they usually open to many new ideas (not that BI is a very new idea).  Universities will often assume they can achieve their goals without assistance or sufficient funding.  But be that as it may, we all know that changing the culture within any organisation is a very difficult prospect.

So what are the problems I speak of?  Well, not being aware of best practices means institutions are usually not aware of the ideal way(s) to approach BI.  They will often start down the path with no, or poor, planning (i.e. no BI strategy), and even if they do have an effective plan, they implement it poorly.  The overall result is an initiative that delivers far less than it could have.

How do you become aware of best practice?  With BI being a mature, specialised discipline, I personally believe there is little point in trying to learn it all yourself – as you are destined to fall into the many traps others did before you.  Essentially you need the sort of expertise that comes with experience.

How do you do this if you do not have any money?  Well, the short answer is that you probably can’t.  There aren’t any magic bullets.  You need at least some money to acquire the necessary expertise.

The best hope is to find an executive with vision, who is open to new ways of doing things, and try to convince them to give you some cash!

Michael Gibson
Data Warehouse Manager
Deakin University

Big BI Fish in a Shrinking Pond?

June 15th, 2009

Timo Elliott has posted a rather nice visualisation of Gartner’s 2008 BI market share research as reported by Information Week.  Timo has, not surprisingly, used Xcelsius and although I’m not quite sure why IBM/Cognos is seemingly struggling to keep up with the pack (when Microsoft actually has the smallest share) it clearly illustrates the staggering 24% market share held by SAP/BusinessObjects.  Bear in mind that these vendors collectively have 64.5% of the market.

You can see Timo’s full post here, complete with a more readily interpreted, but not so pretty, bar chart.