I had a few conversations, actually more like 6 different conversations, pertaining to BI (Business Intelligence), OLAP (Online Analytical Processing), Cube or UDM (Unified Dimensional Model) Processing, ETL (Extract Transform Load), normalization and de-normalization for Reporting Purposes, and other such related topics. It seems that there are tons of BI Projects starting up around the area. Some of the projects are small “it’s just a possibility” type of project to the fully ramped we’re going to “toss a few million at this and get some serious BI out of it” type of project.
With all these conversations I decided to do a quick write up of what BI is all about. I figured, what better way to refresh the memory of experience than to do some writing on the topic.
First I gotta knock out the crazy acronyms. There are a bunch.
There are also a few key concepts that one must understand. I’m just going to discuss and define these concepts below, in a somewhat haphazard way as I think of them.
A dimension is a way data is cut, what appears vertically or horizontally on a report, to display information based on data. A dimension can be broken out by time, linked, parent-child, standard or other means. The data the dimensions cut are stored via a fact table. A fact table stores all the detailed values for the measures or facts. A fact table is usually a denormalized table with the bulk of the data to be reported on. The main thing to know, is that each dimension is a way to break out the data, across various measurements such as time. Another key word to know is that when data forms grouped attributes they are referred to as a natural hierarchy.
One can also have a dimension table that has a break down of information to use as a dimension cut across the data. For instance a product could be displayed in the fact table, and the individual rows could be aggregated across the product types.
One odd thing, at least for newcomers to the BI world, is how often a relational database table has an application key but a surrogate key is assigned the row once it is in the OLAP Cube or Warehouse. The reason this is often, if not always done, is because the application key could lose its uniqueness once brought across ETL or stored over a period of times to represent points of data. If this occurs one needs a surrogate key to prevent confusion about which row is which.
I’ve puttered out at this point, and am contemplating posting this incomplete write up at a wonderful 1:00am. But instead I’ll sleep on it so I’ll probably post it around 8am or so. I’ll be writing a follow up to complete out the basic concepts and ideas behind BI. In the near future, maybe even via a screen captured VLOG entry I’ll post how to actually setup and prepare a sample OLAP Cube for reporting via BIDS. In the meantime stay tuned for the awesome power of the Snowflake and Star Designs!