Snowflake schema

The snowflake schema is a variation of the star schema used in a data warehouse. The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database.

The snowflake schema (sometimes called snowflake join schema) consists of one Fact table connected to many dimension tables, which can be connected to other dimension tables. A diagram of the schema resembles a snowflake due to a single center point branching out in many directions. The tables which describe the dimensions in the snowflake scheme are in Third normal form, whereas star schema dimensions are in Second normal form.

Data normalization and storage
Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into a new table. Normalization therefore tends to increase the number of tables that need to be joined in order to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes.

In a data warehouse, the fact table in which data values (and its associated indexes) are stored, is typically responsible for 90% or more of the storage requirements, so the benefit here is normally insignificant.

However, normalization of the dimension tables ("snowflaking") can impair the performance of a data warehouse. Whereas conventional databases can be tuned to match the regular pattern of usage, such patterns rarely exist in a data warehouse. Snowflaking will increase the time taken to perform a query, and the design goals of many data warehouse projects is to minimize these response times.

Also, many data warehouses are designed to be used by business users, and without appropriate views, the added complexity of the snowflake schema will often preclude non-specialist users from forming their own queries.

Benefits of "snowflaking"

 * If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
 * A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalised. A snowflake schema will hence be easier to implement.
 * A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
 * Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools such as COGNOS Powerplay to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed.