Something that has been sitting on our list of things-to-do-when-we-have-a-moment for a long time is some automated auditing of ETL. Ralph Kimball talked about the need to build monitoring into the ETL process during his ETL Architecture in Depth course and with our warehouse growing in size and complexity every day, it is no longer something we instinctively know.
James had a bit of time during our last sprint and created a neat Framework Manager package referencing the Cognos audit tables. We now have a really useful set of automated charts that are produced each night showing the performance of each ETL step within the following three major ETL subsystems.
- ODS - Operational Data Extraction
- STG - Dimension and Fact Staging
- PRS - Dimension and Fact Presentation
The following snippet of one chart from the STG subsystem shows build times in seconds for each table. The S_x prefix is our naming convention for Staging tables where x is either (F)act, (B)ridge or (D)imension. The actual charts are much lengthier than this example and are great for identifying bottlenecks and where to do some fine-tuning to improve performance.
Our overall ETL jobstream in production typically takes between 1-2 hours depending on workload and environmental factors.
What is interesting is comparing these charts with the Development and User Acceptance warehouse jobstreams. We’ve spotted instances where we’re missing an index or an index is different between environments purely through looking at the difference in performance. Its a really useful tool in that respect.
What we plan to do in the future is track performance of individual steps over time to see in the impact of seasonality - such as heavy transactional load when results are published or during peak admission or enrolment times. This same technique on a more macro timescale will help us in understanding warehouse growth and the impact of that on database performance which we can feed back to our DBA’s and Data Centre to assist capacity planning.
This is the type of thing that I imagine gets forgotten or just not done but if you can find the time, I reckon it will pay huge dividends. You’ll spot issues in time to do something about them and gain a lot of valuable intelligence about your warehouse. Of course, no one in the business will sponsor you to do this…

I’m looking forward to having my views of ETL turned upside down and inside out and then hopefully reconstructed in a much more sensible way in the space of 4 days next week.


