Ensemble Learning
A machine learning paradigm that combines predictions from multiple models to produce more accurate and robust results than any single model alone.
Also known as: Ensemble Methods, Model Ensembling
Category: AI
Tags: ai, machine-learning, optimization, models, fundamentals
Explanation
Ensemble learning is a fundamental machine learning strategy that improves predictive performance by combining the outputs of multiple individual models, often called base learners or weak learners. The core insight is that a collection of diverse models, each capturing different aspects of the data, can collectively make better predictions than any single model. This principle is sometimes called the wisdom of crowds applied to algorithms.
The three main ensemble strategies are bagging, boosting, and stacking. Bagging (Bootstrap Aggregating), introduced by Leo Breiman in 1996, trains multiple models on different random subsets of the training data and aggregates their predictions through voting or averaging. Random Forest is the most well-known bagging method, combining hundreds of decision trees trained on random data subsets with random feature selection. Bagging primarily reduces variance and helps prevent overfitting.
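The bagging loop is simple enough to sketch in plain Python. The following is a toy illustration, not library code: a hypothetical 1D decision-stump base learner, bootstrap resampling, and majority-vote aggregation (all function names are made up for this sketch).

```python
import random

def fit_stump(points):
    # Toy base learner: pick the threshold/sign on x that best fits the labels
    best = (points[0][0], 1, -1.0)  # (threshold, sign, accuracy)
    for t, _ in points:
        for sign in (1, -1):
            acc = sum((1 if sign * (x - t) > 0 else 0) == y
                      for x, y in points) / len(points)
            if acc > best[2]:
                best = (t, sign, acc)
    return best[0], best[1]

def predict_stump(stump, x):
    t, sign = stump
    return 1 if sign * (x - t) > 0 else 0

def bagging_fit(points, n_models, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: resample the training set with replacement,
        # so each stump sees a slightly different dataset
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    # Aggregate the ensemble by majority vote
    votes = sum(predict_stump(m, x) for m in models)
    return 1 if 2 * votes >= len(models) else 0

data = [(x / 10, 1 if x >= 5 else 0) for x in range(10)]
models = bagging_fit(data, n_models=25)
```

Each stump is trained on a different bootstrap sample, so individual stumps disagree near the decision boundary, but the vote across 25 of them is stable. Random Forest adds one more ingredient on top of this loop: each tree also considers only a random subset of features at each split.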
Boosting trains models sequentially, with each new model focusing on the errors made by previous ones. AdaBoost (1997) was the first successful boosting algorithm, followed by Gradient Boosting and its modern implementations like XGBoost, LightGBM, and CatBoost. These gradient boosting frameworks have dominated structured data competitions and remain among the most effective algorithms for tabular data. Boosting primarily reduces bias.
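The sequential reweighting that defines boosting can be made concrete with a minimal AdaBoost sketch, again with toy names and decision stumps as the weak learners (labels are in {-1, +1}, as AdaBoost conventionally assumes):

```python
import math

def fit_weighted_stump(points, w):
    # Weak learner: threshold/sign on x minimizing the *weighted* error
    best = None
    for t, _ in points:
        for sign in (1, -1):
            err = sum(wi for (x, y), wi in zip(points, w)
                      if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(points, rounds):
    n = len(points)
    w = [1.0 / n] * n                # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_weighted_stump(points, w)
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, t, sign))
        # Up-weight the points this stump got wrong, down-weight the rest,
        # so the next stump focuses on the remaining errors
        w = [wi * math.exp(-alpha * y * (sign if x > t else -sign))
             for (x, y), wi in zip(points, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# No single stump can separate this interval pattern (positives on both
# ends), but a few boosted stumps together can.
data = [(0, 1), (1, 1), (2, -1), (3, -1), (4, 1), (5, 1)]
ens = adaboost(data, rounds=3)
```

Note the contrast with bagging: the stumps are not trained independently on resampled data but sequentially on reweighted data, and their votes are weighted by `alpha`, which grows as a stump's weighted error shrinks. Gradient boosting generalizes this idea by fitting each new model to the gradient of a loss function rather than to reweighted examples.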
Stacking (stacked generalization) uses a meta-learner to combine the predictions of multiple diverse base models. Rather than simple voting or averaging, the meta-learner learns the optimal way to weight and combine base model outputs. This allows the ensemble to leverage the complementary strengths of very different model types.
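A stripped-down sketch shows the core mechanic: the meta-learner is trained on base-model *outputs* over held-out data, not on the raw features. Here the two base models and the one-parameter blending meta-learner are hypothetical stand-ins; real stacking would use cross-validated predictions and a proper meta-model.

```python
def base_a(x):          # hypothetical base model 1
    return 2.0 * x

def base_b(x):          # hypothetical base model 2
    return x + 1.0

def fit_meta(pairs):
    # Meta-learner: find the blend weight w minimizing squared error of
    # w*base_a + (1-w)*base_b on held-out (x, y) pairs, by grid search
    best_w, best_loss = 0.0, float("inf")
    for i in range(101):
        w = i / 100
        loss = sum((w * base_a(x) + (1 - w) * base_b(x) - y) ** 2
                   for x, y in pairs)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w

# The held-out target happens to be an equal blend of the two base models,
# so the learned weight lands on 0.5
held_out = [(x, 1.5 * x + 0.5) for x in range(5)]
w = fit_meta(held_out)
```

The important design point survives even in this toy: the blend weight is fit on data the base models did not train on, which is what keeps the meta-learner from simply trusting whichever base model overfits hardest.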
Ensemble methods succeed by balancing the accuracy of individual models against the diversity of their errors. Individual models make different mistakes on different inputs; when these errors are sufficiently uncorrelated, combining predictions cancels many of them out. This is formalized in Condorcet's jury theorem and the bias-variance decomposition. The key requirement is that base models be both reasonably accurate and diverse in their error patterns.
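The jury-theorem arithmetic is easy to check directly: if n voters are independent and each is correct with probability p > 0.5, the majority vote's accuracy is a binomial tail sum, and it rises quickly with n.

```python
from math import comb

def majority_vote_accuracy(n, p):
    # P(majority of n independent voters is correct), each correct
    # with probability p; assumes n is odd so there are no ties
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(needed, n + 1))

# e.g. 21 independent models at 60% individual accuracy vote their way
# well above 60% -- but only because the errors are assumed independent
acc = majority_vote_accuracy(21, 0.6)
```

The independence assumption is exactly where real ensembles fall short of the theorem: correlated errors shrink the gain, which is why the paragraph above stresses diversity as much as accuracy.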
In the deep learning era, ensemble principles appear in techniques like dropout (which implicitly trains an ensemble of sub-networks), mixture of experts (which routes inputs to specialized sub-models), and model averaging. Even modern large language models benefit from ensemble-like approaches through techniques like self-consistency prompting and committee-based evaluation.