Data Warehousing, Part I: What Is It?

By nate

(This was originally posted to the NTEN Data Warehousing group list, by Nate Wilbert. Nate has kindly allowed us to publish his work here.)

I'm not really sure how many in this group have worked with data warehouses so I'm going to start this first post off with an analogy. In the group description you heard me liken data warehousing to putting a jigsaw puzzle together. Let's work that out just a bit.

Say it's your birthday and one of your uncles just got you a thousand piece jigsaw puzzle. Whew! That's going to take some time to put together. Let's talk broadly about some of the steps involved. You might need to set aside a table for a few days or weeks where you can have a dedicated space for assembling the puzzle. As you start putting it together you know that it helps to try and get the perimeter established first and then take sections at a time. You can then attach those sections to the larger framework as they complete. It may take many hours, but eventually that picture begins to take shape and pretty soon you're moving right along as there are fewer and fewer pieces left to add. Finally, when I was growing up these puzzle events were not usually reserved for just one person. They were a family activity.

When you're assembling the thousand piece jigsaw puzzle known as your data warehouse the steps are going to be very similar. Just like the table is set aside for putting together the puzzle, we need a dedicated space for reporting. Performance is a first priority when it comes to data warehousing and sharing a server with several other applications can ruin performance and a data warehouse. A framework for reporting must be established. This could be done via a combination of requirements gathering and collecting all the known data requests currently outstanding. There are usually major areas one can address in pieces such as your donor / constituent data, your financial data or your operational data. We frequently don't do it all at once. It takes time and over time one area is added to another as we work towards the complete picture. Finally, I know in the nonprofit world it is typically hard to build a larger team unless you work for a very large nonprofit. You may be the lone developer…that's ok! This list is here to walk with you on your journey.

Let's recap:

1. Get a framework for reporting established.
2. Recognize there are usually many subject areas to include in a warehouse and pace yourself, keeping in mind that they will all eventually connect to each other.
3. Having a dedicated environment for the data warehouse will help everyone, and with open source technologies (or deeply discounted technology) available paired with cheaper hardware it's not that expensive anymore.
4. Building a data warehouse is a team effort. Even if you are the sole developer, you will need to interact well with the various owners of your source data. They will come to depend on you for reporting so the better you get along the easier your job will be.
5. I did not really talk about this above, but the following is important. You will likely need a sponsor who helps direct your data warehouse activities. This would be someone who is in a position to champion your activities. It's hard as a developer running reports to just say, "we need a warehouse!" It helps to have that sponsor who recognizes the need, who will go to bat for you when budgeting comes out and who will mediate conflicts over whose data gets loaded next.

I want to close by pointing you towards two of the primary people responsible for getting data warehousing to the masses. Both have written numerous books and articles on this subject. They have their differences, primarily when it comes to actually designing the database structures that hold your data, but they are brilliant practitioners and reading their words will hopefully help get you more familiar with some of the concepts we'll be discussing.

Ralph Kimball
http://ralphkimball.com/
http://ralphkimball.com/html/designtips.html

His design tips are really helpful, and if you scroll all the way to the bottom of that page you'll see him talk about a "clickstream" data mart which was just a fancy name for the place you collect and report off for web traffic data. That was the first type of data warehousing project I ever worked on and I recall using his articles extensively.

Bill Inmon
http://www.inmoncif.com/home/
http://www.inmoncif.com/library/cif/

The library page for Inmon has some interesting diagrams. They're a bit busy (really busy!) but the term definitions are included and the links off to the right side help to explain them further.

I think that perhaps next time we'll dive into a bit more of the technical framework for data warehousing and explain some of the differences between a database design for data warehousing and (e.g.) the design of your donor or constituent database.

Have a great week.

Nate Wilbert