Thursday, March 13, 2014

Selecting the right BI tool

Firms these days are investing heavily in Business Intelligence and Analytics as part of their marketing strategy. Selecting the right BI tool is a critical decision in making sure the enterprise is able to fully utilize the potential of its "untapped" data. Factors that are an important part of the decision can be broadly classified under the below overarching aspects:

1) Infrastructure and architecture: Before making the decision, a thorough analysis needs to be done against the firm's existing infrastructure. Questions to be answered are: What new hardware would be required for the BI tool? Would it integrate well with the company's existing IT architecture?

2) Organizational requirements: The firm-specific BI requirements also play a role in deciding the tool. The volume of data to be analysed, frequency of reporting, and the performance levels required of the BI application come under this aspect.

3) Cost-benefit analysis: Detailed demos with possible BI tool vendors must be undertaken to understand the extent of features as it applies to the firm's data. The cost of ownership is then evaluated against the benefits provided.

Besides the above considerations, a good place to start would be to look at the current competitive landscape of BI tools. Gartner's magic quadrant for Business Intelligence and Analytics applications provides a comprehensive study on the various BI tools available in the market and their current positioning. The magic quadrant is divided into leaders, challengers, niche players, and visionaries. Gartner evaluates the BI products on 17 different capabilities that can be broadly classified under information delivery, analysis, and integration.

Tableau, QlikView, Microsoft, Oracle, SAP, SAS, IBM, and MicroStrategy are the top players in BI and come under the "leaders" segment of the market.




References:

1) CHOOSING AND SELECTING BI TOOLS. (n.d.). Retrieved from Passionned Group: http://www.passionned.com/business-intelligence/business-intelligence-tools/choosing-and-selecting-bi-tools/

2) Hendrickson, J. (2011, March 2). Trying to decide which BI tool is best? Retrieved from TechNet Blogs: http://blogs.technet.com/b/tothesharepoint/archive/2011/03/02/trying-to-decide-check-out-this-white-paper-quot-how-to-choose-the-right-business-intelligence-technology-to-suit-your-style-quot.aspx

3) Joshi, N. (2012, May 10). 10 things you (probably) want to consider for BI tool selection. Retrieved from e-zest: http://www.e-zest.net/blog/10-things-you-probably-want-to-consider-for-bi-tool-selection/

4) Sallam, R. L., Tapadinhas, J., Parenteau, J., Yuen, D., & Hostmann, B. (2014, February 20). Magic Quadrant for Business Intelligence and Analytics Platforms. Retrieved from Gartner.

5) To, S. (2012, December 4). How To Choose The Right Business Intelligence Tool. Retrieved from SAP Business Innovation: http://blogs.sap.com/innovation/analytics/how-choose-the-right-business-intelligence-tool-022501


Thursday, February 27, 2014

Bridge tables and the purist's connundrum

Bridge tables are sometimes an essential requirement for dimensional modeling. Although bridge tables may not be the most ideal solution, certain situations warrant their usage. Simply put, bridge tables are required when a fact row might correspond to more than one row in the dimension table. There are however many cases where this could occur. Ragged hierarchies, multivalued attributes, many-to-many dimensions are some of the main scenarios where bridge tables are required. Another common pattern across bridge tables is that they are used when there is frequent changes or a possibility of frequent changes to the information contained in the dimension tables. Of these examples we will focus on the scenario of multivalued dimensions.

After choosing the grain of  the fact table, we sometimes discover that a single fact table row could contain multiple rows in the dimension table. In such situations, we "snowflake" the dimension and connect a bridge table to it. Let us take the example where there are multiple suppliers for an item. In order to build the bridge, we require to create a supplier "group" dimension table which contains a separate supplier group primary key and attributes of the group. After this, we link the supplier group bridge table to this newly created supplier group dimension containing all possible supplier key values for the specific supplier group key. The supplier group also links to a supplier dimension containing attributes of the supplier. The bridge creation process is complete. It should be noted that the same creation process can be applied for a many-to-many dimension relation.

Bridge tables relieve much complexity from the dimensional schema, but add some complexity to the query. From a dimensional modelling purists point of view however, this is not the preferred solution. In an ideal situation, all dimension tables must be connected to the fact table, and not other dimension tables. A purist therefore proposes that we create an artificial fact table resulting in a multi-star schema model (Reference 1). In our above example, we will have a fact table called item_supplier_fact containing a row for every supplier-item. Any calculations and derived attributes pertaining to this grouping will be placed in the new artificial fact table.


References:

1) Darmawikarta, D. (2009, December). Solving Many-to-Many Relationship in Dimensional Modeling. Retrieved from PACKT Publishing: http://www.packtpub.com/article/solving-many-to-many-relationship-dimensional-modeling


Thursday, February 13, 2014

Type of fact tables and their applications

We will now look at the different types of fact tables used for building data warehouses . There are three major fact table types that we will discuss today: transaction snapshot, periodic snapshot, and accumulating snapshot fact table. Each of these three kinds of fact tables are used for different purposes. Generally more than one type of fact table is used in conjunction with another to obtain a "holistic business picture".

Transaction fact tables record information at the lowest possible grain level: the transaction. Transaction fact tables are therefore more densely populated than the other types. They also capture the most amount of information pertaining to a particular transaction. A date key is associated with a transaction fact table and it represents the specific date on which the transaction was "processed" or "completed". Given below is an example of a basic e-commerce order fact table.



Periodic snapshot fact tables record information over a period of time. They are one of the two kinds of time-based fact tables that can be implemented. Periodic snapshots could aggregate information over a day, week, month, or even year. Since data is aggregated over a period of time, this fact table is clearly of a higher grain level than the transaction fact table. Periodic snapshot is useful to obtain summary level performance of a business process over a period of time to drive decision making. But to get complete information at the transaction level we will still need to couple this data with a transaction fact table.
Periodic snapshots are also unique because they contain semi-additive facts. Semi-additive facts can be aggregated across some dimensions but not across others; the latter is most often a time dimension. An example of a semi additive fact would be the final state of inventory in a warehouse at the end of the week. It would not make sense to add the inventory levels across different weeks but it would make sense to add this fact across different warehouses at the end of a particular week.




Accumulating snapshot fact tables are the third type of fact tables. They can be described similar to transaction facts tables or periodic snapshot fact tables depending on the way we look at them. They are similar to transaction tables because every row contains information of a single event and are similar to periodic snapshot tables because they record information over the entire lifetime of the event. Accumulating snapshots require multiple updates on the same row for every "milestone"  of the event. As a result, a single row has multiple date keys to capture the milestones for the event. The different stages of a particular customer order with dates corresponding to order entry, shipment processing, order execution and shipment completion could be an example for the e-commerce scenario.
Although accumulating snapshots have multiple date key facts on a single row, they still reference a single date dimension table which acts as a role-playing dimension.

With this blog piece hopefully we should have a better understanding of the different types of fact tables and their possible applications

Thursday, February 6, 2014

Delving deeper into grain, facts, and dimensions

I will focus on grain, facts, and dimensions in this blog, since these are important topics that provides the foundation to the rest our BI learning.

Choosing the right grain level:

As discussed in the earlier blog, the grain measures the business process or event in question. Choosing the grain is critical because it determines the level of analyses that you can do with the data in your data warehouse (DW). As a result, it is unwise to begin the design by selecting a higher level of detail as your grain. Storing the most atomic level possible will allow the application to "roll up" data or summarize based on requests from the end users. For example, in terms of time period of the business process, if we choose the grain as a month, it would not be possible for business users to select custom time frame represented by a particular number of days. Selecting the grain as a day-level transaction data however provides the ability to customize for this purpose. Therefore selecting the right grain is the first critical step in the design of a DW.

Constructing dimensions and facts:

Allocating the right attributes to the facts and dimensions tables is another important aspect of a correct DW design. As discussed in the earlier blog there can be attributes which can be ambiguous in terms of facts and dimensions. Simply put, facts are items which we can quantify. They can be categorized as: additive, non-additive, and semi-additive. The vast majority of facts that we encounter are completely additive. Non-additive facts are facts that cannot be aggregated under any dimension (because they don't make sense through aggregation). Examples of non-additive facts may be numerical attribute, such as unit price, that make sense only from the grain or transaction perspective. Ratios are classified as non-additive facts. There may also be facts that are additive only under certain dimensions and hierarchies of the model. These facts are called semi-additive facts. We also have a fact type called derived facts which are not explicitly present in the source operational database but are computed and stored in the fact table for convenience purpose. Derived facts avoid business users the hassle of having to compute the value through a query during runtime. Through derived facts, the risk of committing calculation errors is also eliminated.

Although we have said that numerical attributes would most probably be categorized as facts, what would we do in case the attribute is numerical but represents a classification? For example: 1- Freshman, 2- Sophomore, 3-Junior, 4-Senior. In this case, the attributes come under the dimension table. Attributes that are used to categorize items or attributes that describe the 'who', 'what', 'when', 'where', 'how, and 'why' can be placed under dimensions (Note: Attributes which are both categorical and quantifiable should be stored in both the fact and dimension tables). Dimensions often comprise of fields that are often used to "filter" the final result set. For dimension tables, the primary key is often not the natural key that is obtained from the operational source database. The source database stores only one record for changes over time. The source natural key may also be poorly administered. As a result, a new dimension surrogate key is created as the primary key of the dimension table.

Hopefully this blog piece further helps in the development of a BI dimensional model.

Tuesday, January 28, 2014

Business Intelligence: An Introduction; Fact Tables and Dimensions

In today's world of abundant information and 'Big Data', business intelligence (BI) is a highly relevant topic of study. BI provides techniques and tools required for analyzing and reporting large amounts of related data in order to drive business decisions. Due to their purpose, BI tools and applications are used by people of different authority, spanning the depth of the organization. For this reason, the output of BI applications must deliver readily understandable business insights and a highly intuitive user experience.

In order to accomplish this, the design of the application/data warehouse must be simple. A confusing design will inevitably lead to a confusing user-experience and a lack of acceptance from end users. Therefore, we breakdown the fundamental data elements of the business into meaningful categories: fact tables and dimensions. The classification of data warehouse information into fact tables and dimensions was interesting to me as it provided a different perspective on the way we looked at entities and attributes. I would say this is a more relevant form of classification because it makes sense from a business reporting point of view.

In simple words, facts are business measures: they are used to provide measurement of a particular business process. For example, the total revenue of a product line comes under the fact table. The definition of fact must also be accompanied by the grain. Grain is the level of detail present in the fact table. The level of detail of the data is inversely proportional to the granularity of the data. In other words, the grain gives you an idea about the kind of data is present in the fact table.

Dimension is the other category of data present in a data warehouse model. Dimensions contrast with facts because they are not measurements of the data, but provide descriptive information of the data present in fact tables. As a result they are mostly textual in nature. Sometimes, there can be situations of ambiguity in determining whether a particular numerical attribute is a fact or a dimension. Under such circumstances, Visualizing the final report or result set we require will help us make this determination. It also helps to see if the numerical attribute is continuous or discrete ; continuous attributes are mostly facts and discrete attributes are mostly dimensions (this rule is bound to have exceptions!)

Hopefully, this brief introduction on facts and dimensions will provide us with a basic understanding of data warehouse models and guide us in the design of simple and flexible BI applications.