Market basket analysis

Market basket analysis (MBA) applies association rule learning to purchase data with the goal of identifying cross-selling opportunities. Given a data set of past transactions, the algorithm identifies product baskets and product association rules. Product baskets (referred to as item sets) are groups of products purchased together at checkout. Product association rules predict the purchase of one or more other products (the consequent) given the known presence of some products in a basket (the antecedent).

Example
Consider sitting in an English pub and buying a pint of beer but not a bar meal. While serving you, the barkeeper asks whether you would also like a bag of crisps. Why would the barkeeper ask such a question? Because the barkeeper's goal is, in part, to maximize the revenue per transaction: if you accept, the barkeeper may earn a bigger tip or the bar may make more revenue. The barkeeper knew to ask this question, and knew there was a good chance (a high probability) that you would accept. That knowledge came from experience, specifically from previous interactions with customers.

Similarly, the association rule finding algorithm is trained on historical data, i.e. past transactions. The data contains checkout information and a list of the products purchased in each transaction, perhaps along with other information such as volume or sale amount, although in many cases just the presence or absence of a product in a transaction is sufficient. While training, the algorithm may identify a relationship (a form of association) between buying beer without a bar meal and buying crisps (US: chips), and so predict that you are more likely to buy crisps than a customer who does not match that pattern.

Typically the relationship will be in the form of a rule such as:
 * IF {beer, no bar meal} THEN {crisps}

The probability that a customer will buy beer without a bar meal (i.e. that the antecedent is true) is referred to as the support for the rule. The conditional probability that a customer will purchase crisps, given that the antecedent is true, is referred to as the confidence of the rule.
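As a rough illustration, the two measures can be computed by counting transactions. The sketch below uses a small invented data set and hypothetical helper names (`antecedent_holds`, `rule_metrics`); it follows the definition of support given above (the share of transactions in which the antecedent is true), whereas many texts instead define the support of a rule as the share of transactions containing both antecedent and consequent.

```python
# Hypothetical toy data: each transaction is the set of items on one bar tab.
transactions = [
    {"beer", "crisps"},
    {"beer", "bar meal"},
    {"beer", "crisps"},
    {"beer"},
    {"wine", "bar meal"},
    {"beer", "crisps", "bar meal"},
    {"wine", "crisps"},
    {"beer", "crisps"},
]


def antecedent_holds(basket):
    # Antecedent {beer, no bar meal}: beer is present and a bar meal is absent.
    return "beer" in basket and "bar meal" not in basket


def rule_metrics(transactions):
    """Return (support, confidence) for IF {beer, no bar meal} THEN {crisps}."""
    n = len(transactions)
    matching = [t for t in transactions if antecedent_holds(t)]
    with_consequent = [t for t in matching if "crisps" in t]
    # Support as defined in the text: share of transactions where the antecedent is true.
    support = len(matching) / n
    # Confidence: share of those matching transactions that also contain crisps.
    confidence = len(with_consequent) / len(matching) if matching else 0.0
    return support, confidence


support, confidence = rule_metrics(transactions)
print(f"support={support:.3f} confidence={confidence:.3f}")
```

With this invented data, four of the eight tabs match the antecedent and three of those four include crisps, so the sketch reports a support of 0.5 and a confidence of 0.75.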

Usage
In retailing, most purchases are made on impulse, according to models of consumer behavior. Market basket analysis gives clues as to what a customer might have bought if the idea had occurred to them.

Market basket analysis can be used as a first step in deciding the location and promotion of goods inside a store. If, as has been observed, purchasers of Barbie dolls are more likely to buy candy, then high-margin candy can be placed near the Barbie doll display. Customers who would have bought candy with their Barbie dolls had they thought of it will now be suitably tempted. This, however, is only the first level of analysis.

Challenges
The algorithms for performing market basket analysis are fairly straightforward. The complexities mainly arise in exploiting taxonomies, avoiding combinatorial explosions (a supermarket may stock 10,000 or more line items), and dealing with the large amounts of transaction data that may be available.
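The scale of the combinatorial explosion can be seen by counting how many distinct item sets of a given size could be formed from a 10,000-item catalogue. The following is a small arithmetic sketch; the function name `candidate_itemsets` is hypothetical.

```python
from math import comb

N_ITEMS = 10_000  # catalogue size used as the example in the text


def candidate_itemsets(n_items, size):
    """Number of distinct item sets of the given size drawn from n_items products."""
    return comb(n_items, size)


for size in (2, 3, 4):
    print(size, candidate_itemsets(N_ITEMS, size))
```

There are already about 50 million possible pairs and roughly 1.7 × 10^11 possible triples, which is why practical algorithms avoid enumerating every candidate item set.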

A major difficulty is that a large number of the rules found may be trivial for anyone familiar with the business. Although the volume of data has been reduced, we are still asking the user to find a needle in a haystack. Requiring rules to have a high minimum support level and a high confidence level risks missing any exploitable result we might have found. One partial solution to this problem is differential market basket analysis.

The computational complexity involved in calculating the results of market basket analysis is at least the square of the number of transaction item-lines (records of every item purchased). With data warehouses storing billions of transaction lines, this yields extremely high computational requirements. Special techniques involving filtering or aggregation of the transaction database are commonly used in analysis algorithms to increase performance and allow some level of interactivity, such as in business intelligence applications.
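One common filtering idea, shown here only as an illustrative sketch and not necessarily the technique any particular product uses, is to discard infrequent items before counting pairs: a pair can only reach a minimum count if each of its items does. The function name `frequent_pairs`, the `min_count` parameter, and the sample baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count):
    """Count item pairs, skipping any item that is itself too rare to matter."""
    # Pass 1: count how many transactions contain each single item.
    item_counts = Counter(item for basket in transactions for item in set(basket))
    frequent_items = {item for item, c in item_counts.items() if c >= min_count}

    # Pass 2: count pairs built only from the surviving frequent items.
    pair_counts = Counter()
    for basket in transactions:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c for pair, c in pair_counts.items() if c >= min_count}


# Hypothetical sample data.
baskets = [
    ["beer", "crisps"],
    ["beer", "crisps", "nuts"],
    ["beer", "wine"],
    ["crisps", "beer"],
    ["wine", "olives"],
]
print(frequent_pairs(baskets, min_count=2))
```

In this sample, "nuts" and "olives" are dropped after the first pass, so no pair involving them is ever generated or counted in the second pass.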

Caveats
In textbooks as well as in the business literature, market basket analysis is often promoted as a means of obtaining product associations on which to base a retailer’s promotion strategy. The argument is that associated products with a high lift (also called interest) can be promoted effectively by discounting just one of the two products. This implicitly assumes that market basket analysis automatically identifies complements. Academic studies, however, have shown that this implicit assumption does not hold: their empirical analysis reveals that market basket analysis identifies as many substitutes as complements. Therefore, market basket analysis should not be used to build a promotion expert system for retailers unless it is supplemented by other, more empirical, methods of determining product relationships.
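For reference, the lift of a pair of products is usually computed as the observed rate at which they are bought together divided by the rate expected if the two purchases were independent. The sketch below uses invented data and a hypothetical `lift` helper. A value above 1 only says that the products co-occur more often than independence would predict; in line with the caveat above, it does not by itself show that they are complements.

```python
def lift(transactions, a, b):
    """Lift of items a and b: P(a and b) / (P(a) * P(b))."""
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    if p_a == 0 or p_b == 0:
        return float("nan")  # lift is undefined if either item never occurs
    return p_ab / (p_a * p_b)


# Hypothetical sample data: 8 baskets, 3 of which contain both products.
sample = [
    {"cola", "crisps"},
    {"cola", "crisps"},
    {"cola", "crisps"},
    {"cola"},
    {"crisps"},
    {"bread"},
    {"bread"},
    {"milk"},
]
print(lift(sample, "cola", "crisps"))
```

Here each product appears in half of the baskets and both appear together in three of eight, giving a lift of 1.5.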

Differential market basket analysis
Differential market basket analysis can find interesting results and can also eliminate the problem of a potentially high volume of trivial results.

Differential analysis compares results between different stores, between customers in different demographic groups, between different days of the week, different seasons of the year, etc. If the results show that a rule holds in one store, but not in any other (or does not hold in one store, but holds in all others), then we can infer that there is something interesting about that store. Perhaps its clientele are different, or perhaps it has organized its displays in a novel and more lucrative way. Investigating such differences, via data mining or other methods, may yield useful insights which will improve company sales.
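A minimal sketch of such a comparison is given below: it evaluates one rule separately for each store and reports any store whose outcome differs from all the others. The store names, the data, the confidence threshold, and the function names are all hypothetical, and a real analysis would also need to check that the difference is statistically meaningful.

```python
def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return None  # the rule cannot be evaluated for this segment
    return sum(consequent <= t for t in matching) / len(matching)


def rule_holds_by_store(store_transactions, antecedent, consequent, min_conf):
    """Map each store to True/False: does the rule reach min_conf there?"""
    holds = {}
    for store, transactions in store_transactions.items():
        conf = confidence(transactions, antecedent, consequent)
        holds[store] = conf is not None and conf >= min_conf
    return holds


def odd_ones_out(holds):
    """Stores whose outcome differs from every other store (needs at least 3 stores)."""
    values = list(holds.values())
    if len(values) < 3:
        return []
    return [store for store, h in holds.items() if values.count(h) == 1]


# Hypothetical per-store transaction data.
stores = {
    "north": [{"doll", "candy"}, {"doll", "candy"}, {"doll"}, {"candy"}],
    "south": [{"doll"}, {"doll"}, {"doll", "candy"}, {"candy"}],
    "east": [{"doll"}, {"doll", "candy"}, {"doll"}, {"bread"}],
}
holds = rule_holds_by_store(stores, {"doll"}, {"candy"}, min_conf=0.6)
print(holds, odd_ones_out(holds))
```

In this invented data the rule IF {doll} THEN {candy} reaches the threshold only in the "north" store, which is therefore the one flagged for further investigation. The same grouping approach applies to demographic groups, weekdays, or seasons in place of stores.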

Other application areas
Although market basket analysis conjures up pictures of shopping carts and supermarket shoppers, it can be applied in many other areas. These include:

 * Analysis of credit card purchases.
 * Analysis of telephone calling patterns.
 * Identification of fraudulent medical insurance claims.
 * Analysis of telecom service purchases.

Despite the terminology, there is no requirement for all the items to be purchased at the same time. Algorithms can be adapted to look at a sequence of purchases (or events) spread out over time. Predictive market basket analysis can be used to identify sets of item purchases (or events) that generally occur in sequence, which is something of interest to direct marketers, criminologists, and others.
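One simple way to adapt the counting to sequences, given purely as an illustrative sketch, is to count for how many customers one item was bought before another. The customer identifiers, the (day, item) event format, and the function name `sequential_pair_counts` are assumptions made for the example.

```python
from collections import Counter


def sequential_pair_counts(histories):
    """For each ordered pair (earlier item, later item), count the customers showing it."""
    counts = Counter()
    for events in histories.values():
        # Sort each customer's events by day; same-day ties fall back to item name.
        ordered = [item for _, item in sorted(events)]
        seen_pairs = set()
        for i, first in enumerate(ordered):
            for later in ordered[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        # Each customer contributes at most once per ordered pair.
        counts.update(seen_pairs)
    return counts


# Hypothetical purchase histories: customer -> list of (day, item) events.
histories = {
    "c1": [(1, "phone"), (5, "case"), (9, "charger")],
    "c2": [(2, "phone"), (3, "case")],
    "c3": [(4, "case"), (8, "phone")],
}
counts = sequential_pair_counts(histories)
print(counts[("phone", "case")], counts[("case", "phone")])
```

In this sample, two customers bought a phone and later a case, while only one bought them in the opposite order, an asymmetry that plain basket counts would not reveal.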