Random Forest
An ensemble learning method that builds multiple decision trees on random subsets of the data and features, then combines their predictions for more accurate and robust results.
A random forest is a machine learning algorithm that constructs a collection (ensemble) of decision trees during training and outputs the average prediction (regression) or majority vote (classification) of the individual trees. Each tree is trained on a random bootstrap sample of the data, and at each split only a random subset of features is considered, which decorrelates the trees and reduces overfitting.
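The mechanics above can be sketched from scratch. The following is a minimal illustration, not a production implementation: each "tree" is reduced to a single-split decision stump for brevity, but the two sources of randomness the paragraph describes are both present, namely bootstrap sampling of rows and a random feature subset per tree (using the common sqrt-of-features default, an assumption here). Prediction is a majority vote. All function names are made up for this sketch.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    # Sample n rows with replacement: the "bootstrap" in bagging.
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def best_stump(X, y, feature_idx):
    # Find the (feature, threshold) split with the fewest misclassifications,
    # considering only the randomly chosen features.
    best = None
    for f in feature_idx:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # sqrt(n_features), a common default
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        feats = rng.sample(range(n_features), k)  # random feature subset
        forest.append(best_stump(Xb, yb, feats))
    return forest

def predict(forest, row):
    # Majority vote across the ensemble.
    votes = [(l if row[f] <= t else r) for _, f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

# Tiny separable dataset: class 0 in the lower-left, class 1 upper-right.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

Real random forests grow full (or depth-limited) trees and re-randomize the feature subset at every split rather than once per tree, but the decorrelation-by-randomness idea is the same.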
The key insight behind random forests is that while individual decision trees are prone to overfitting and high variance, averaging many diverse trees produces a model that generalizes much better. This technique - training multiple models on random subsets and combining their outputs - is called bagging (bootstrap aggregating). Random forests add an extra layer of randomness by also randomizing the feature selection at each split.
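The variance-reduction claim can be made concrete with a toy simulation, assuming (as a simplification) that the individual models' errors are independent: averaging m such models shrinks the standard deviation of the prediction by roughly a factor of sqrt(m). The noisy predictor below is a stand-in for a single high-variance tree, not an actual tree.

```python
import random
import statistics

rng = random.Random(42)
true_value = 10.0

def noisy_predictor():
    # Stand-in for one high-variance model (e.g. a single deep tree).
    return true_value + rng.gauss(0, 3)

def bagged_predictor(m=50):
    # Average m independent "models"; variance drops roughly by 1/m,
    # which is the core idea behind bagging.
    return sum(noisy_predictor() for _ in range(m)) / m

single = [noisy_predictor() for _ in range(2000)]
bagged = [bagged_predictor() for _ in range(2000)]
print(statistics.stdev(single))  # around 3
print(statistics.stdev(bagged))  # around 3 / sqrt(50), i.e. roughly 0.4
```

In a real forest the trees' errors are correlated because they see overlapping data, which is exactly why the extra per-split feature randomization helps: it decorrelates the trees and recovers more of this averaging benefit.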
Random forests remain one of the most widely used algorithms in traditional machine learning due to their strong out-of-the-box performance, resistance to overfitting, and ability to handle both numerical and categorical data without extensive preprocessing. They also provide useful diagnostics like feature importance scores. While deep learning has surpassed them on unstructured data like images and text, random forests are still a go-to choice for tabular data and are often used as a strong baseline in machine learning competitions.
Last updated: March 1, 2026