Experiments
We performed experiments on XGBDistribution
across various datasets for probabilistic
regression tasks. Comparison were made with both NGBoost’s NGBRegressor
, as well
as a standard xgboost XGBRegressor
(point estimate only).
Probabilistic regression
For probabilistic regression, within errorbars, XGBDistribution
performs essentially
identically to NGBRegressor
(measured on the negative log likelihood [NLL] of a normal
distribution).
However, XGBDistribution
is substantially faster, typically at least an order of
magnitude. For example, for the MSD dataset, the fit and predict steps took 18 minutes
for XGBDistribution
vs a full 6.7 hours for NGBRegressor
:
XGBDistribution |
NGBRegressor |
||||
---|---|---|---|---|---|
Dataset |
N |
NLL |
Time |
NLL |
Time |
Boston |
506 |
2.62(26) |
0.067(1) (s) |
2.55(24) |
2.68(45) (s) |
Concrete |
1,030 |
3.14(21) |
0.13(3) (s) |
3.09(13) |
5.79(59) (s) |
Energy |
768 |
0.58(41) |
0.15(3) (s) |
0.62(28) |
5.33(35) (s) |
Naval |
11,934 |
-5.11(6) |
5.8(8) (s) |
-3.91(2) |
43.6(5) (s) |
Power |
9,568 |
2.77(11) |
1.21(52) (s) |
2.77(7) |
14.9(3.1) (s) |
Protein |
45,730 |
2.81(4) |
12.2(4.0) (s) |
2.91(1) |
146.5(1.8) (s) |
Wine |
1,588 |
0.98(15) |
0.11(3) (s) |
0.93(7) |
4.85(99) (s) |
Yacht |
308 |
0.89(1.1) |
0.093(25) (s) |
0.75(64) |
4.95(50) (s) |
MSD |
515,345 |
3.450(4) |
18.9(1.5) (m) |
3.526(4) |
6.70(7) (h) |
Point estimation
For point estimates, we compared XGBDistribution
to both the NGBRegressor
and the
XGBRegressor
(measured on the RMSE). Generally, the XGBRegressor
will offer the
best performance for this task. However, compared with XGBRegressor
,
XGBDistribution
only incurs small penalties on both performance and speed, thus
making XGBDistribution
a viable “drop-in” replacement to obtain probabilistic predictions.
XGBDistribution |
NGBRegressor |
XGBRegressor |
||||
---|---|---|---|---|---|---|
Dataset |
RMSE |
Time |
RMSE |
Time |
RMSE |
Time |
Boston |
3.41(69) |
0.067(1) (s) |
3.25(66) |
2.68(45) (s) |
3.27(65) |
0.035(1) (s) |
Concrete |
5.41(74) |
0.13(3) (s) |
5.62(69) |
5.79(59) (s) |
4.38(70) |
0.09(2) (s) |
Energy |
0.45(7) |
0.15(3) (s) |
0.49(7) |
5.33(35) (s) |
0.40(6) |
0.05(2) (s) |
Naval |
0.0014(1) |
5.8(8) (s) |
0.0059(1) |
43.6(5) (s) |
0.00123(5) |
1.93(7) (s) |
Power |
3.79(24) |
1.21(52) (s) |
3.93(19) |
14.9(3.1) (s) |
3.31(22) |
0.59(19) (s) |
Protein |
4.35(12) |
12.2(4.0) (s) |
4.78(5) |
146.5(1.8) (s) |
4.09(7) |
8.26(1.4) (s) |
Wine |
0.63(4) |
0.11(3) (s) |
0.62(3) |
4.85(99) (s) |
0.62(3) |
0.035(13) (s) |
Yacht |
0.76(29) |
0.093(25) (s) |
0.74(28) |
4.95(50) (s) |
0.74(37) |
0.047(35) (s) |
MSD |
9.03(4) |
18.9(1.5) (m) |
9.47(4) |
6.70(7) (h) |
8.97(3) |
16.3(1.7) (m) |
Methodology
We used 10-fold cross-validation. In each training fold, 10% of the data were used as a validation set for early stopping. This process was repeated over 5 random seeds. For the MSD dataset, we used a single 5-fold cross-validation.
The negative log-likelihood (NLL) and root mean squared error (RMSE) were estimated for each test fold, the above are the mean and standard deviation of these metrics (across folds and random seeds).
For all estimators, we used default hyperparameters, with the exception of setting
max_depth=3
in XGBDistribution
and XGBRegressor
, since this is the default
value of NGBRegressor
. For all experiments, XGBDistribution
and NGBRegressor
estimated normal distributions, with natural gradients.
Please see the experiments script for the full details.