Ensembles of classifiers

In machine learning, combining classifiers has been proposed as a way to improve on the performance of individual classifiers. The component classifiers may be based on a variety of classification methodologies and may achieve different rates of correctly classified instances. The goal of algorithms that integrate classification results is to produce more certain, precise and accurate overall results. Dietterich (2001) gives an accessible, informal account, from statistical, computational and representational viewpoints, of why ensembles can improve results.

Methods
Numerous methods have been suggested for creating ensembles of classifiers. They fall into three categories:
 * Using different subsets of the training data with a single learning method
 * Using different training parameters with a single learning method (e.g. using different initial weights for each neural network in an ensemble)
 * Using different learning methods
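
The three categories can be illustrated with a minimal sketch. The toy data set, the helper names (knn, nearest_centroid, majority_vote) and the choice of nearest-neighbour and nearest-centroid learners are assumptions made for illustration only; they do not come from the cited works.

```python
from collections import Counter

def knn(train, k):
    """k-nearest-neighbour learner on a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda inst: abs(inst[0] - x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def nearest_centroid(train):
    """Predict the class whose mean feature value is closest to x."""
    sums = {}
    for value, label in train:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    centroids = {label: total / count for label, (total, count) in sums.items()}
    def predict(x):
        return min(centroids, key=lambda label: abs(centroids[label] - x))
    return predict

def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

# Toy training set of (feature, label) pairs, invented for this sketch.
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]

# 1. Different subsets of the training data, one learning method.
by_subset = [knn(train[0::2], 1), knn(train[1::2], 1), knn(train, 1)]

# 2. Different training parameters, one learning method.
by_params = [knn(train, k) for k in (1, 3, 5)]

# 3. Different learning methods (an odd number of members would avoid ties).
by_method = [knn(train, 3), nearest_centroid(train)]

for ensemble in (by_subset, by_params, by_method):
    print(majority_vote(ensemble, 2.5), majority_vote(ensemble, 8.5))
```

In each case the members are combined in the same way; only the source of diversity among them differs.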

Weaknesses

 * Increased storage
 * Increased computation
 * Decreased comprehensibility

The first weakness, increased storage, is a direct consequence of the requirement that all component classifiers, rather than a single classifier, be stored after training. The total storage depends on the size of each component classifier and on the size of the ensemble (the number of classifiers in it). The second weakness is increased computation: to classify an input query, all component classifiers (rather than a single classifier) must be evaluated, which requires more execution time. The last weakness is decreased comprehensibility. With multiple classifiers involved in decision-making, it is more difficult for users to follow the underlying reasoning process that leads to a decision.

Bagging
Bagging is a method of the first category, in which a single learning method is trained on different subsets of the training data (Breiman, 1996). Given a training set of size t, one draws t instances from it uniformly at random with replacement, trains a classifier on those t instances, and repeats the process several times. Because the draw is with replacement, each sample will usually contain some duplicates and omit some instances of the original training set. Each cycle through the process produces one classifier. Once several classifiers have been constructed, the final prediction is made by taking a vote of their predictions.
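
The procedure can be sketched as follows. This is a minimal illustration, not Breiman's reference implementation: the 1-nearest-neighbour base learner, the toy data and all function names are assumptions chosen to keep the example short.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) instances uniformly at random, with replacement."""
    t = len(data)
    return [data[rng.randrange(t)] for _ in range(t)]

def train_nearest_neighbour(sample):
    """Hypothetical base learner: 1-nearest-neighbour on one numeric feature."""
    def predict(x):
        nearest = min(sample, key=lambda inst: abs(inst[0] - x))
        return nearest[1]
    return predict

def bagging(data, n_classifiers, seed=0):
    """Train one classifier per bootstrap sample."""
    rng = random.Random(seed)
    return [train_nearest_neighbour(bootstrap_sample(data, rng))
            for _ in range(n_classifiers)]

def vote(classifiers, x):
    """Final prediction: unweighted majority vote of the classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy training set of (feature, label) pairs, invented for this sketch.
data = [(0.0, "a"), (1.0, "a"), (2.0, "a"),
        (8.0, "b"), (9.0, "b"), (10.0, "b")]

ensemble = bagging(data, n_classifiers=11)
print(vote(ensemble, 0.5), vote(ensemble, 9.5))
```

An odd number of classifiers is used here so that a two-class vote cannot tie.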

Boosting
Another method of the first category is boosting; AdaBoost is a practical version of the boosting approach (Freund and Schapire, 1996). Boosting is similar in overall structure to bagging, except that it keeps track of the performance of the learning algorithm and forces it to concentrate on instances that have not been learned correctly. Instead of choosing the t training instances randomly from a uniform distribution, it chooses them so as to favour the instances that have not been learned accurately. After several cycles, the prediction is made by a weighted vote of the classifiers’ predictions, with each classifier’s weight increasing with its accuracy on its training set.
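
A minimal sketch of an AdaBoost-style loop is given below. It makes several assumptions that are not in the text above: labels are +1 and -1, the base learner is a one-feature threshold rule (a decision stump), and instance weights are passed directly to the learner rather than used to resample the training set. The vote weight alpha = 0.5 * ln((1 - err) / err) is the usual AdaBoost choice; it grows as the weighted error err shrinks.

```python
import math

def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump.

    A stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    """Train n_rounds stumps, reweighting the instances after each round."""
    n = len(xs)
    weights = [1.0 / n] * n          # start from a uniform distribution
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified instances, lower the rest.
        updated = []
        for x, y, w in zip(xs, ys, weights):
            pred = polarity if x <= threshold else -polarity
            updated.append(w * math.exp(-alpha * y * pred))
        total = sum(updated)
        weights = [w / total for w in updated]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (polarity if x <= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can fit, invented for this sketch.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs] == ys)
```

On this toy set a single stump misclassifies at least three instances, while three boosted stumps classify all ten training instances correctly.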

Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, Kotsiantis and Pintelas (2004) built an ensemble that combines bagging and boosting ensembles through a voting methodology and gives better classification accuracy.
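
One way such a combination might look is sketched below. The text above does not specify the voting rule, so the sketch assumes a simple sum rule over per-class scores; the two score functions are hypothetical stand-ins for trained bagging and boosting ensembles, and the numbers they return are invented.

```python
def sum_rule(score_functions, x):
    """Add the per-class scores of several ensembles and return the top class."""
    totals = {}
    for score in score_functions:
        for label, value in score(x).items():
            totals[label] = totals.get(label, 0.0) + value
    return max(totals, key=totals.get)

# Hypothetical stand-ins: each returns a score per class for a query x.
def bagging_scores(x):
    return {"pos": 0.55, "neg": 0.45}

def boosting_scores(x):
    return {"pos": 0.20, "neg": 0.80}

# Totals are pos = 0.75 and neg = 1.25, so the combined vote is "neg".
print(sum_rule([bagging_scores, boosting_scores], x=None))
```

Under this rule a confident sub-ensemble can outvote a hesitant one, which a plain majority vote between two members could not express.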