I will focus on grain, facts, and dimensions in this blog, since these are important topics that provides the foundation to the rest our BI learning.
Choosing the right grain level:
As discussed in the earlier blog, the grain measures the business process or event in question. Choosing the grain is critical because it determines the level of analyses that you can do with the data in your data warehouse (DW). As a result, it is unwise to begin the design by selecting a higher level of detail as your grain. Storing the most atomic level possible will allow the application to "roll up" data or summarize based on requests from the end users. For example, in terms of time period of the business process, if we choose the grain as a month, it would not be possible for business users to select custom time frame represented by a particular number of days. Selecting the grain as a day-level transaction data however provides the ability to customize for this purpose. Therefore selecting the right grain is the first critical step in the design of a DW.
Constructing dimensions and facts:
Allocating the right attributes to the facts and dimensions tables is another important aspect of a correct DW design. As discussed in the earlier blog there can be attributes which can be ambiguous in terms of facts and dimensions. Simply put, facts are items which we can quantify. They can be categorized as: additive, non-additive, and semi-additive. The vast majority of facts that we encounter are completely additive. Non-additive facts are facts that cannot be aggregated under any dimension (because they don't make sense through aggregation). Examples of non-additive facts may be numerical attribute, such as unit price, that make sense only from the grain or transaction perspective. Ratios are classified as non-additive facts. There may also be facts that are additive only under certain dimensions and hierarchies of the model. These facts are called semi-additive facts. We also have a fact type called derived facts which are not explicitly present in the source operational database but are computed and stored in the fact table for convenience purpose. Derived facts avoid business users the hassle of having to compute the value through a query during runtime. Through derived facts, the risk of committing calculation errors is also eliminated.
Although we have said that numerical attributes would most probably be categorized as facts, what would we do in case the attribute is numerical but represents a classification? For example: 1- Freshman, 2- Sophomore, 3-Junior, 4-Senior. In this case, the attributes come under the dimension table. Attributes that are used to categorize items or attributes that describe the 'who', 'what', 'when', 'where', 'how, and 'why' can be placed under dimensions (Note: Attributes which are both categorical and quantifiable should be stored in both the fact and dimension tables). Dimensions often comprise of fields that are often used to "filter" the final result set. For dimension tables, the primary key is often not the natural key that is obtained from the operational source database. The source database stores only one record for changes over time. The source natural key may also be poorly administered. As a result, a new dimension surrogate key is created as the primary key of the dimension table.
Hopefully this blog piece further helps in the development of a BI dimensional model.
Choosing the right grain level:
As discussed in the earlier blog, the grain measures the business process or event in question. Choosing the grain is critical because it determines the level of analyses that you can do with the data in your data warehouse (DW). As a result, it is unwise to begin the design by selecting a higher level of detail as your grain. Storing the most atomic level possible will allow the application to "roll up" data or summarize based on requests from the end users. For example, in terms of time period of the business process, if we choose the grain as a month, it would not be possible for business users to select custom time frame represented by a particular number of days. Selecting the grain as a day-level transaction data however provides the ability to customize for this purpose. Therefore selecting the right grain is the first critical step in the design of a DW.
Constructing dimensions and facts:
Allocating the right attributes to the facts and dimensions tables is another important aspect of a correct DW design. As discussed in the earlier blog there can be attributes which can be ambiguous in terms of facts and dimensions. Simply put, facts are items which we can quantify. They can be categorized as: additive, non-additive, and semi-additive. The vast majority of facts that we encounter are completely additive. Non-additive facts are facts that cannot be aggregated under any dimension (because they don't make sense through aggregation). Examples of non-additive facts may be numerical attribute, such as unit price, that make sense only from the grain or transaction perspective. Ratios are classified as non-additive facts. There may also be facts that are additive only under certain dimensions and hierarchies of the model. These facts are called semi-additive facts. We also have a fact type called derived facts which are not explicitly present in the source operational database but are computed and stored in the fact table for convenience purpose. Derived facts avoid business users the hassle of having to compute the value through a query during runtime. Through derived facts, the risk of committing calculation errors is also eliminated.
Although we have said that numerical attributes would most probably be categorized as facts, what would we do in case the attribute is numerical but represents a classification? For example: 1- Freshman, 2- Sophomore, 3-Junior, 4-Senior. In this case, the attributes come under the dimension table. Attributes that are used to categorize items or attributes that describe the 'who', 'what', 'when', 'where', 'how, and 'why' can be placed under dimensions (Note: Attributes which are both categorical and quantifiable should be stored in both the fact and dimension tables). Dimensions often comprise of fields that are often used to "filter" the final result set. For dimension tables, the primary key is often not the natural key that is obtained from the operational source database. The source database stores only one record for changes over time. The source natural key may also be poorly administered. As a result, a new dimension surrogate key is created as the primary key of the dimension table.
Hopefully this blog piece further helps in the development of a BI dimensional model.
No comments:
Post a Comment