Archive for the ‘Visualisation’ category

Visualising Leadership

August 11th, 2009

I know this is a little out of my usual range, but I love visualisation and am always interested in the concept of leadership (as compared to management).  After all, BI isn’t just about producing tables of numbers and pretty charts.

I’m working on a presentation for the Griffith DWSIG later this month and so dug out my copy of Andrew Abela’s Extreme Presentation Method again. He is still pushing out some interesting content on his own blog and had a link to this beautiful visualisation, created by a company called Xplane.  It touches on many things, including the level and importance of education in our world today.  It is worth 6 minutes of your time.

I found it pretty inspiring, hope you do too…

Data Testing using Visualisation

August 6th, 2009

I found a really useful and interesting way of testing large volumes of data this week, thanks to my favourite bit of visualisation software, Tableau.  We’ve been pushing big volumes of data into the warehouse, then out into Cognos Metric Store and then using ETL to bring them back in again for reporting.  All in all a fairly complicated, many-stepped and relatively risky process.

After following the usual development test steps, checking results and peer-reviewing code we would ordinarily have handed the results over for client testing.  The problem in this case is that the results are produced at an isolated unit level and that overall totals and patterns are very hard to spot.  While 99.9% of individual cases may look absolutely fine, the outliers are the ones that, if wrong, might upset the overall result.

So, how can we review hundreds of thousands of data items at a macro level as part of testing?

Well, one way would be to write simple SQL statements that check the validity of data at different aggregation points - in our case these might be unit, course, discipline, school and faculty.  At best, though, you would still be looking at lots of numbers and relying on personal interpretation to gauge correctness.

picture-1

Instead, how about playing with the data using a visualisation tool?  Manipulate the data in real time, drag and drop data items and attributes in and out of the visualisation at will until you’re convinced things look as they should.  What I found in the above example is that things didn’t look as they should for one single combination of data.  This was the mean pass rate for a single unit across all teaching periods (semesters) in a single year.  It could not validly be anything other than between 0% and 100%, but the chart clearly shows one combination way in excess of 150%.

The important point here is that this wouldn’t have been identified by simply sorting rows of data by a column, or by basic data profiling checks, because the visualisation represents a formula applied to the aggregation of many rows of data (in this case passes/attempts across units, by one delivery mode, in one school, in one historical year).  Yet it clearly stands out in the above visualisation.
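To make the point concrete, here is a minimal sketch of the same kind of check done in code rather than in Tableau.  Everything in it is an assumption for illustration: the row layout, the unit codes and the duplicated row are hypothetical and are not taken from the actual warehouse, and the real cause of the outlier above is not known.  No single row looks wrong on its own; only the ratio of the aggregated totals falls outside 0% to 100%.

```python
from collections import defaultdict

# Hypothetical fact rows: (unit, year, teaching_period, measure, value).
# The last row is an accidental duplicate of the one before it.
ROWS = [
    ("U101", 2008, "S1", "attempts", 50),
    ("U101", 2008, "S1", "passes", 45),
    ("U101", 2008, "S2", "attempts", 40),
    ("U101", 2008, "S2", "passes", 38),
    ("U202", 2008, "S1", "attempts", 40),
    ("U202", 2008, "S1", "passes", 36),
    ("U202", 2008, "S2", "attempts", 20),
    ("U202", 2008, "S2", "passes", 18),
    ("U202", 2008, "S2", "passes", 18),
]


def pass_rates(rows):
    """Sum passes and attempts per (unit, year), then divide the totals."""
    totals = defaultdict(lambda: {"passes": 0, "attempts": 0})
    for unit, year, _period, measure, value in rows:
        totals[(unit, year)][measure] += value
    return {
        key: t["passes"] / t["attempts"]
        for key, t in totals.items()
        if t["attempts"] > 0
    }


def out_of_range(rates, low=0.0, high=1.0):
    """Return only the groups whose rate cannot be a valid proportion."""
    return {key: rate for key, rate in rates.items() if not low <= rate <= high}


rates = pass_rates(ROWS)
suspects = out_of_range(rates)
for (unit, year), rate in suspects.items():
    print(f"{unit} {year}: pass rate {rate:.0%} is outside 0-100%")
```

A check like this only finds the combinations you thought to aggregate, which is where dragging dimensions around in a visualisation tool still has the edge.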

Modern BI involves pushing out dimensional models from which literally billions of report combinations are possible to maybe hundreds or thousands of users.  As developers we no longer have the luxury of being able to validate and verify columns and rows of numbers in specific reports before they are published.  Visualisation software is a powerful tool to have at your disposal during pre-release testing of massive models like these.  I know I will be using it from now on to enhance confidence and improve the quality of outputs from the data warehouse.

Finding Patterns in Australian Research

July 15th, 2009

I love blogging; so many positive benefits result from it.  On Tuesday I received an email out of the blue from Marco Fahmi, who is a PhD student at QUT.  Marco is working on a project called the Australian Research Atlas, which mines Australian university and research databases to find interesting patterns.

This particular example shows CRC participants with node size representing dollar value.  More details here.

crc_participants

Marco has found that gaining access to publicly available institutional data can be difficult and of course he’s right.  We’re back onto that pet subject of mine - the democratisation of data for the common good - let’s not go there today…

Marco has also written Pajektools - a JavaScript 3D network visualisation tool which looks interesting.

Visualising Data Warehouse Volumes

July 13th, 2009

Envisaging the size of data warehouses can be hard and these snippets are from a much larger infographic on mozy.com which communicates data volume very well.  The whole post is here and is worth a look if you need to put data volume into context.

picture-2

picture-4

Our biggest warehouse table has just over 43 million rows in it.  This is a daily periodic snapshot of enrolment which we’ve been running since July 2007.  With a row length of 82 bytes that equates to a mere 3.2GB.  A relative drop in the petabyte ocean…
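As a back-of-envelope check of that size, here is a small sketch.  It assumes exactly 43 million rows (the real count is "just over") and ignores indexes, page overhead and compression, so it only gives the order of magnitude.

```python
ROW_COUNT = 43_000_000   # rounded down from "just over 43 million"
ROW_BYTES = 82           # row length quoted above

total_bytes = ROW_COUNT * ROW_BYTES
size_gb = total_bytes / 10**9    # decimal gigabytes
size_gib = total_bytes / 2**30   # binary gibibytes

print(f"{total_bytes:,} bytes = {size_gb:.2f} GB = {size_gib:.2f} GiB")
```

That works out at roughly 3.5 GB in decimal units, or about 3.3 GiB in binary units, so the figure above is in the right region either way.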

picture-1

Metric Distribution Analysis

July 6th, 2009

Being a bit of a Stephen Few fan, I’ve now absorbed his latest book, Now You See It: Simple Visualization Techniques for Quantitative Analysis.

Unlike everyone else (yes everyone) who has reviewed it on Amazon.com so far, I don’t actually think it deserves a full 5-Star rating on all fronts (for instance, the reproduction of Tableau charts and some other outputs is low in terms of image quality, which for a visualisation publication surely is a problem).  That aside however, it does have some excellent content and one particular device which I’ve latched onto is contained in the chapter on Distribution Analysis.

Representing a metric and providing information on the value relative to the entire distribution is having your cake and eating it.  Not only do you get the measure, but you see where it sits relative to its peers, so I see this as a very complementary graphic alongside the more traditional metric traffic light and trend presentation.

So what could a metric distribution analysis look like?

Few uses an example where you have the Low, Median and High values of a particular measure displayed on a single axis; he then improves the example by adding a marker for the 25th and 75th percentiles.

This then starts to sound very similar to the tertile distribution (where we chop the data up into thirds) we have been working on for presenting unit and course metrics at UNE.  We have calculated a lower, mid and upper tertile with boundaries at 33.34% and 66.67%.  Each measure for each unit and course is then calculated and given the appropriate traffic light depending on which tertile it falls into.
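The tertile idea can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the actual UNE calculation: the attrition figures are made up, the cut points come from the standard library's `statistics.quantiles` (whose interpolation method may differ from the one used in the warehouse), and the colour mapping assumes that lower attrition is better.

```python
from statistics import quantiles


def tertile_cuts(values):
    """Return the two cut points that split the values into thirds."""
    lower, upper = quantiles(values, n=3)
    return lower, upper


def traffic_light(value, cuts, lower_is_better=True):
    """Map a value to a colour according to the tertile it falls into."""
    lower, upper = cuts
    if value <= lower:
        band = 0
    elif value <= upper:
        band = 1
    else:
        band = 2
    colours = ("green", "amber", "red")
    if not lower_is_better:
        colours = colours[::-1]
    return colours[band]


# Hypothetical attrition percentages, one per unit.
attrition = {
    "U101": 5, "U102": 9, "U103": 12,
    "U104": 14, "U105": 17, "U106": 20,
    "U107": 22, "U108": 26, "U109": 31,
}

cuts = tertile_cuts(list(attrition.values()))
lights = {unit: traffic_light(value, cuts) for unit, value in attrition.items()}
for unit, colour in lights.items():
    print(unit, attrition[unit], colour)
```

For a measure where higher is better, such as a pass rate, `lower_is_better=False` reverses the colours.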

What we hadn’t thought of was being able to supplement the traffic light, trend and value with a distribution chart.  So here’s how it currently looks in development for a sample unit for our Attrition measure based on Few’s ideas:

picture-9

  • Attrition is less than 10% for this unit (green star)
  • The lowest attrition of any unit is just under 5% (left red triangle)
  • The 33rd percentile is at around 14% attrition (blue vertical bar)
  • The 66th percentile is at around 22% attrition (light brown vertical bar)
  • The highest attrition of any unit is around 31% (right red triangle)
  • This unit is performing very well and attrition is significantly low
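The markers listed above can be mocked up as a plain-text strip, which is handy for checking positions before building the real chart.  This is only a sketch: the numbers are approximate readings of the bullets (for example 4.8 for "just under 5%" and 9 for "less than 10%"), and the axis range and marker characters are arbitrary choices.

```python
def position(value, axis_max, width):
    """Convert a value on a 0..axis_max scale into a character index."""
    return round(value / axis_max * (width - 1))


def distribution_strip(lowest, p33, p66, highest, unit_value,
                       axis_max=35, width=36):
    """Draw extremes (^), tertile boundaries (|) and the unit (*) on one axis."""
    strip = ["-"] * width
    for value, marker in ((lowest, "^"), (highest, "^"),
                          (p33, "|"), (p66, "|"),
                          (unit_value, "*")):   # unit drawn last so it stays visible
        strip[position(value, axis_max, width)] = marker
    return "".join(strip)


# Approximate values read from the bullets above (percent attrition).
strip = distribution_strip(lowest=4.8, p33=14, p66=22, highest=31, unit_value=9)
print(strip)
```

With these defaults each character stands for one percentage point, so the star sits between the left-hand triangle and the first tertile bar, as it does in the chart.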

In combination with the traffic light, trend and time-series line chart, there is a huge amount of information being conveyed by a very simple instrument.  Should I add values to this?  Well maybe.  I tried and it got really cluttered and of course the traffic light scorecard itself will have the values so maybe this graphic is fine just as it is when used in conjunction with the other devices.