XGBoost

XGBoost, short for eXtreme Gradient Boosting, is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements gradient boosted decision trees and is widely used for supervised learning tasks such as regression, classification and ranking. The project is open-source under the Apache License 2.0 and is implemented in C++ with bindings for Python, R, Java, and other languages.

Its core algorithm builds an ensemble of decision trees in a stagewise fashion. Each new tree is trained to predict the residuals or gradients of the loss function with respect to the current ensemble, using second-order (Hessian) information to improve optimization. It supports tree-based boosters (gbtree) as well as linear models (gblinear) and a dropout variant (DART). Regularization terms for L1 and L2 penalties are included to reduce overfitting, and missing values are handled natively by a sparsity-aware split finding algorithm.

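To make the use of second-order information concrete, the sketch below trains a booster with a user-supplied squared-error objective that returns the per-example gradient and Hessian, the same pair of statistics the built-in objectives compute internally. This is a minimal illustration: the synthetic data and hyperparameter values are placeholders, not recommendations.

```python
import numpy as np
import xgboost as xgb

# Illustrative synthetic regression data (placeholder values throughout).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)
dtrain = xgb.DMatrix(X, label=y)

def squared_error(predt, dtrain):
    """Return the per-example gradient and Hessian of 0.5 * (predt - y)**2."""
    y_true = dtrain.get_label()
    grad = predt - y_true        # first-order term of the loss
    hess = np.ones_like(predt)   # second-order term (constant for squared error)
    return grad, hess

# Each boosting round fits a new tree to these gradient/Hessian statistics.
booster = xgb.train(
    {"max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=50,
    obj=squared_error,
)
```
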
XGBoost emphasizes speed and performance through an efficient split finding algorithm, parallel tree construction, cache-aware data access, and an optional histogram-based approximation for large datasets. It supports early stopping, cross-validation, and user-defined objective functions and evaluation metrics. It can operate in distributed environments using MPI or systems like Dask or Spark, and exposes a scikit-learn compatible API in Python.

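The following is a minimal sketch of several of these features together, using the native Python API: the histogram-based tree method, evaluation on a held-out set, and early stopping. The data, split, and parameter values are illustrative only.

```python
import numpy as np
import xgboost as xgb

# Illustrative synthetic binary classification data and train/validation split.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",    # histogram-based split finding
    "max_depth": 4,
    "eta": 0.1,
    "eval_metric": "logloss",
}

booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(dvalid, "validation")],
    early_stopping_rounds=20,  # stop once validation logloss stops improving
)
print("best iteration:", booster.best_iteration)
```
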
Usage and impact: It is widely adopted in data science competitions and industry models for tabular data, often providing strong baseline or final models. A large ecosystem of hyperparameter tuning, feature engineering, and interpretability tools has been built around it.

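As one illustration of that ecosystem fit, the scikit-learn compatible wrapper below drops directly into scikit-learn's grid search for hyperparameter tuning; the grid shown is a small placeholder rather than a tuning recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Illustrative data; the parameter grid is a placeholder, not a recommendation.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = GridSearchCV(
    estimator=XGBClassifier(tree_method="hist"),
    param_grid={"max_depth": [3, 5], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```
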
History: XGBoost originated around 2014 as an efficient implementation of gradient boosting by Tianqi Chen and collaborators at the University of Washington; it gained popularity for its performance improvements over traditional gradient boosting implementations and has since become a standard tool in machine learning.
