Association rule learning

In data mining and treatment learning, association rule learners are used to discover elements that co-occur frequently within a data set consisting of multiple independent selections of elements (such as purchasing transactions), and to discover rules, such as implication or correlation, which relate co-occurring elements. Questions such as "if a customer purchases product A, how likely is he to purchase product B?" and "What products will a customer buy if he buys products C and D?" are answered by association-finding algorithms. This application of association rule learners is also known as market basket analysis. As with most data mining techniques, the task is to reduce a potentially huge amount of information to a small, understandable set of statistically supported statements.

Technical Discussion
The input for a typical associations-mining algorithm is a set T of itemsets t, each of which is drawn from a set I of all possible items. Each t is a member of the power set $$2^I$$, but T is not considered a subset of $$2^I$$ since it may contain duplicates (it is a multiset). Since I is typically large, the general problem of finding all common subsets in an arbitrary selection of itemsets is considered intractable. Therefore input sets in T, and any results derived therefrom, are typically assumed to be small. It is an ongoing area of research to find algorithms which relax this assumption and allow processing of larger sets.

Fixed-Confidence Mining
Commonly, association-finding algorithms attempt to find all sets of elements which occur in at least a fraction C of the data, where C is a chosen Confidence Threshold (e.g. 2%). The number of occurrences of a subset is called its support. Sets whose support exceeds C are called frequent itemsets.

If a set s is frequent, then any subset of s must also be, and most association-finding algorithms attempt to exploit this fact. Most association-finding algorithms reduce to a traversal of this subset lattice of I in some order, extending frequent itemsets and pruning out infrequent sets and their supersets.

The fixed confidence threshold has little basis in statistics, since some sets may exceed it simply by random coincidence (thereby defeating the goal of finding meaningful correlations), and some meaningful associations may occur in the data without reaching the threshold. However, in practice it does eliminate vast numbers of insignificant sets.

For a given data set, the set of its frequent itemsets can be described by its maximal frequent itemsets, which are frequent itemsets S that are not subsets of any larger frequent itemset T. During mining, finding maximal frequent itemsets first allows their subsets to be skipped, an important improvement if sets are large.

Example
In this example, we have several itemsets representing different days a sporting match was scheduled to be played on an outdoor pitch. On some days, bad weather meant the match was not played. Our aim is to predict whether a match will be played based on the weather.

For the example data given, a reasonable association rule is (Cold, Raining) -> Play=No.

A rule such as (Warm, Dry) -> Play=Yes has a low support value, as only one itemset (1) contains that association.

A rule such as (Windy) -> Play=No has a low confidence value, as some itemsets support it (3) and some contradict it (5).

The set (Cold, Raining, No) is a frequent itemset.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred itemsets before it can be considered statistically significant, and datasets often contain thousands or millions of itemsets.

Lore
A famous story about association rule mining is the "beer and diaper" story. A purported survey of behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers tend also to buy beer. This anecdote became popular as an example of how unexpected association rules might be found from everyday data.

Other types of Association Mining
Contrast set learning is a form of associative learning. Contrast set learners use rules that differ meaningfully in their distribution across subsets.

Weighted class learning is another form of associative learning in which weight may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.

K-optimal pattern discovery provides an alternative to the standard approach to association rule learning that requires that each pattern appear frequently in the data.