Random multinomial logit

Introduction
In statistics and machine learning, random multinomial logit (RMNL) is a technique for multiclass statistical classification that applies the random-forest idea of Leo Breiman to repeated multinomial logit analyses.

Rationale for the new method
Several learning algorithms have been proposed to handle multiclass classification. Some are merely extensions or combinations of intrinsically binary classifiers (e.g., building a multiclass classifier from one-versus-one or one-versus-all binary classifiers), while others, such as multinomial logit (MNL), are specifically designed to map features to a multiclass output. MNL is valued for its robustness and has a proven track record in many disciplines, including transportation research and customer relationship management (CRM). Unfortunately, MNL cannot overcome the curse of dimensionality, which implicitly necessitates feature selection, i.e., selecting a best subset of variables from the input feature set. In contrast to the binary logit case, software packages to date mostly lack any feature selection algorithm for MNL. This absence constitutes a serious problem for several application areas.
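To make the MNL model concrete, the following is a minimal sketch of a multinomial logit fit using scikit-learn; the synthetic dataset and all parameter values are illustrative assumptions, not taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class problem standing in for real data (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# With the default lbfgs solver, scikit-learn's LogisticRegression fits a
# multinomial (softmax) model when there are more than two classes.
mnl = LogisticRegression(max_iter=1000).fit(X, y)

# One coefficient vector per class: shape (n_classes, n_features).
print(mnl.coef_.shape)  # (3, 10)
```

Note that the model estimates a full coefficient vector for every class over every feature, which is why the number of parameters, and hence the curse of dimensionality, grows quickly with the feature set.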

Recently, random forests (i.e., classifiers combining a forest of decision trees, each grown on a random bootstrap sample and splitting nodes on a random subset of features) have been introduced for the classification of binary and multiclass outputs. Feature selection is implicitly incorporated during the construction of each tree. RMNL, a random forest of multinomial logit models, attempts to overcome the feature selection difficulty of MNL in the same way.
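The scheme just described, a forest of MNL models grown on bootstrap samples and random feature subsets, combined by majority vote, can be sketched as follows. This is a hand-rolled illustration under assumed settings (forest size, subset size, synthetic data), not the authors' reference implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

n_models, n_sub = 25, 4  # forest size and features per model (assumed values)
forest = []
for _ in range(n_models):
    rows = rng.integers(0, len(X), len(X))               # bootstrap sample
    cols = rng.choice(X.shape[1], n_sub, replace=False)  # random feature subset
    model = LogisticRegression(max_iter=1000).fit(X[rows][:, cols], y[rows])
    forest.append((model, cols))

def predict(X_new):
    # Each MNL model votes on its own feature subset; the majority class wins.
    votes = np.stack([m.predict(X_new[:, cols]) for m, cols in forest])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

accuracy = (predict(X) == y).mean()
```

Because each base model sees only a random subset of features, no explicit feature selection step is needed: uninformative features simply contribute weak votes that the ensemble averages away.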