XGBoost (Boosting)
An optimized gradient boosting library that adds regularization, parallel tree construction, and efficient handling of sparse data, making it one of the most widely used ML algorithms for tabular data.
Like a championship-winning sports team — individually decent players, but the coaching system (regularization, optimization) turns them into something dominant.
XGBoost — eXtreme Gradient Boosting — was created by Tianqi Chen and described in a 2016 paper with Carlos Guestrin. It quickly became the dominant algorithm in machine learning competitions and production systems. It implements gradient boosted decision trees with a set of engineering and algorithmic improvements that make it faster, more accurate, and more resistant to overfitting than previous implementations.
The key innovations include L1 and L2 regularization on leaf weights (preventing overfitting), a weighted quantile sketch for approximate split finding (enabling distributed training on large datasets), sparsity-aware algorithms (handling missing values natively), and column block structure for parallelized tree construction. These improvements sound incremental but their combined effect was dramatic — XGBoost won virtually every structured data competition on Kaggle for several years running.
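The regularization on leaf weights can be made concrete with the closed-form expressions from the XGBoost objective: the optimal weight of a leaf is w* = -G / (H + λ), where G and H are the sums of the per-example gradients and hessians in that leaf, and the gain of a split subtracts a per-leaf penalty γ. Below is a minimal sketch of these two formulas in plain NumPy; the function names and the toy squared-error example are illustrative, not part of the xgboost library API.

```python
import numpy as np

def leaf_weight(g_sum, h_sum, lam):
    # Optimal leaf weight from the regularized objective: w* = -G / (H + lambda)
    return -g_sum / (h_sum + lam)

def split_gain(gl, hl, gr, hr, lam, gamma):
    # Gain of a candidate split; gamma penalizes adding an extra leaf
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma

# Toy example with squared-error loss: gradient g_i = pred_i - y_i, hessian h_i = 1
y = np.array([1.0, 1.0, 0.0, 0.0])
pred = np.zeros(4)
g = pred - y          # per-example gradients: [-1, -1, 0, 0]
h = np.ones(4)        # per-example hessians

# Larger lambda shrinks the leaf weight toward zero (L2 regularization)
print(leaf_weight(g.sum(), h.sum(), lam=0.0))   # -(-2)/4  = 0.5
print(leaf_weight(g.sum(), h.sum(), lam=4.0))   # -(-2)/8  = 0.25

# Splitting examples {0,1} from {2,3} cleanly separates the labels
print(split_gain(-2.0, 2.0, 0.0, 2.0, lam=0.0, gamma=0.0))  # 0.5
```

Note how λ directly damps the leaf weight: this is why raising `reg_lambda` makes individual trees more conservative.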
XGBoost remains a top choice for tabular data problems in production: fraud detection, credit risk scoring, insurance pricing, recommendation system ranking, and ad click-through prediction. It is often the strongest baseline before trying neural approaches, and on many tabular tasks it outperforms deep learning entirely. The main downsides are that it requires more hyperparameter tuning than random forests and is slower to train than LightGBM on very large datasets.
Last updated: March 9, 2026