
NEED HELP?
Visit our FAQ if you run into problems downloading our eBooks. If you are wondering why all of the chapters aren't available for some of the eBooks, we'll explain it here!
PAGE FEEDBACK

by Don Jones
To be clear, Online Analytical Processing (OLAP), the basis for data warehousing, is not universally faster that Online Transaction Processing (OLTP), the basis for most databases that support real-time applications. OLAP is simply faster for certain types of reporting. Typically, an OLAP database is populated by copying data from an OLTP database and expanding or transforming that data into a different structure. Let’s take an example.
Consider an order-entry application for a large retail chain. The corporate headquarters aggregates all sales from all stores into a single database. Broadly speaking (this is a simplified example), that information is coming from an OLTP database—or a set of OLTP databases, like the stores’ on-site computerized cash registers—and it might have a table structure that lists the store name, the date of the sale, and the product that was sold. It would be fairly time consuming, from a database server perspective, to find out how many TV sets were sold in 2008—the data is simply not in a form that a database server can use to quickly execute that query. Expanding the data into a star schema—one of the simplest OLAP schemas—might create a structure like this:

(Source: This graphic is taken with permission from http://en.wikipedia.org/wiki/Star_schema.)
The three “dimension” tables list every date on which a product was sold, every store that sold a product, and every product that was sold. The central “fact” table connects them so that we know which store sold any given product on any given day. This structure makes it easier—and faster—to query facts like how many TVs were sold in 2008, how many stores sold refrigerators, or how many stores sold a washer and a dryer in the same sale. Actual star schemas are usually much more complicated; see http://opensourceanalytics.com/2006/04/28/sales-data-mart-dimensional-model-for-retail/ for a discussion on a star schema for general retail sales.
Star schemas are often referred to as cubes: If you can imagine a three-dimensional cube, products would be one axis, stores another axis, and dates the third axis. It’s tougher to imagine when the “cube” has four, five, six, or more dimensions, but those are very common schemas as well—and the “cube” image is why those outer tables in the schema are called “dimension” tables.
A somewhat more complex schema is the snowflake schema. You still use central fact tables to relate multiple dimension tables, but dimension tables may also be connected to related, supporting tables:

(Source: This graphic is taken with permission from http://en.wikipedia.org/wiki/Snowflake_schema.)
As you can see, each dimension table will have less redundant data because the repetitive data is broken out into supporting tables: Month, Quarter, Day of Week, and so forth. This structure makes the data warehouse somewhat more compact because there is less repeated data to store, and can actually make certain types of queries even faster. For example, it’s now quicker to determine how many DVD players were sold in the fourth quarter of any given year because “quarter” information is drawn into a separate table. The query would need to join only the Quarter, Date, and Product Category dimension tables with the central Fact table.
These examples both overlook one important fact: A data warehouse rarely contains data from only one source. In fact, the whole point is to pull in data from multiple sources so that it can all be correlated. You might combine data from the order-entry application, UPS and FedEx shipping receipts, orders to vendors, and so forth—enabling you to answer questions like, “How quickly can vendors respond when we have a rush of orders?”
About the Author
Don Jones has more than a decade of professional experience in the IT industry. He's the author of more than 30 IT books, including Windows PowerShell: TFM; VBScript, WMI, and ADSI Unleashed; Managing Windows with VBScript and WMI; and many more. He's a top-rated and in-demand speaker at conferences such as Microsoft TechEd and TechMentor, and writes the monthly Windows PowerShell column for Microsoft TechNet Magazine. Don is a multiple-year recipient of Microsoft's "Most Valuable Professional" (MVP) Award with a specialization in Windows PowerShell. Don's broad IT experience includes work in the financial, telecommunications, software, manufacturing, consulting, training, and retail industries and he's one of the rare IT professionals who can not only "cross the line" between administration and software development, but also between IT workers and IT management. Don is a co-founder of Concentrated Technologies, and serves as author and series editor for Realtime Publishers.
Sign up for our Realtime Nexus newsletters and book alerts and discover when new books on your favorite IT topics are available!
