xgboost_distribution.distributions package

Submodules

xgboost_distribution.distributions.base module

Distribution base class

class xgboost_distribution.distributions.base.BaseDistribution[source]

Bases: ABC

Base class distribution for XGBDistribution.

Note that distributions are stateless: a distribution is just a collection of functions that operate on the data (y) and on the outputs of the xgboost booster (params).

check_target(y)[source]: Ensure that the target is compatible with the chosen distribution

abstract gradient_and_hessian(y, params, natural_gradient=True)[source]: Compute the gradient and hessian of the distribution

abstract loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

abstract property params: The parameter names of the distribution

abstract predict(params)[source]: Predict the parameters of a given distribution

abstract starting_params(y)[source]: The starting parameters of the distribution

xgboost_distribution.distributions.exponential module

Exponential distribution

class xgboost_distribution.distributions.exponential.Exponential[source]

Bases: BaseDistribution

Exponential distribution with log score

Definition:

f(x) = 1 / scale * e^(-x / scale)

We reparameterize scale -> log(scale) = a to ensure scale > 0, as scale = e^a. Gradient:

d/da -log[f(x)] = d/da -log[1/e^a e^(-x / e^a)]
= 1 - x e^-a = 1 - x / scale

The Fisher information = 1 / scale^2, which needs to be expressed in the reparameterized form:

1 / scale^2 = I ( d/d(scale) log(scale) )^2 = I ( 1/ scale )^2

Hence we find: I = 1
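The derivation above can be checked numerically. The following is a minimal sketch using only the standard library; the function names are hypothetical helpers for illustration and are not part of the package.

```python
import math

def exponential_nll(x, a):
    """Negative log-likelihood of the exponential density with scale = e^a."""
    scale = math.exp(a)
    return math.log(scale) + x / scale

def exponential_gradient(x, a):
    """Analytic gradient with respect to a = log(scale): 1 - x / scale."""
    return 1.0 - x / math.exp(a)

x, a, eps = 3.0, 0.5, 1e-6

# Central finite difference of the negative log-likelihood in a.
numeric = (exponential_nll(x, a + eps) - exponential_nll(x, a - eps)) / (2 * eps)
analytic = exponential_gradient(x, a)

# The reparameterized Fisher information is I = 1, so dividing the
# gradient by it leaves the gradient unchanged.
natural = analytic / 1.0
```

Because I = 1 in the log(scale) parameterization, the natural gradient and the ordinary gradient coincide in this sketch.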

check_target(y)[source]: Ensure that the target is compatible with the chosen distribution

gradient_and_hessian(y, params, natural_gradient=True)[source]: Gradient and diagonal hessian

loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

property params: The parameter names of the distribution

predict(params)[source]: Predict the parameters of a given distribution

starting_params(y)[source]: The starting parameters of the distribution

class xgboost_distribution.distributions.exponential.Params(scale)

Bases: tuple

Create new instance of Params(scale,)

scale: Alias for field number 0

xgboost_distribution.distributions.laplace module

Laplace distribution

class xgboost_distribution.distributions.laplace.Laplace[source]

Bases: BaseDistribution

Laplace distribution with log scoring

Definition:

f(x) = 1/2 * exp( -| (x - loc) / scale | ) / scale

We reparameterize:

loc -> loc = a
scale -> log(scale) = b

to ensure scale > 0.

Thus:

f(x) = 1/2 * exp( -| (x - a) / e^b | ) / e^b

We compute the gradients:

d/da -log[f(x)] = (a - x) / (scale * | a-x | )
d/db -log[f(x)] = 1 - | a-x | / scale

To second order:

d2/da2 -log[f(x)] = 2 δ(x-a) / scale
d2/db2 -log[f(x)] = | a-x | / scale

The Fisher information is:

I(loc) = 1 / scale^2
I(scale) = 1 / scale^2

which needs to be expressed in reparameterized form:

1 / scale^2 = I_r(loc) (d/d(loc) loc) ^2
= I_r(loc)

1 / scale^2 = I_r(scale) ( d/d(scale) log(scale) )^2
= I_r(scale) ( 1/ scale )^2

Hence:

I_r(loc) = 1 / scale^2
I_r(scale) = 1
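As an illustration of how these quantities combine, the sketch below evaluates the gradient, checks it against finite differences away from the kink at x = a, and divides by the reparameterized Fisher information. All function names are hypothetical and the code is not taken from the package.

```python
import math

def laplace_nll(x, a, b):
    """Negative log-likelihood with loc = a and scale = e^b."""
    return math.log(2.0) + b + abs(x - a) / math.exp(b)

def laplace_gradient(x, a, b):
    """Gradient with respect to (a, b); undefined at x == a."""
    scale = math.exp(b)
    grad_a = math.copysign(1.0, a - x) / scale  # (a - x) / (scale * |a - x|)
    grad_b = 1.0 - abs(a - x) / scale
    return grad_a, grad_b

def laplace_natural_gradient(x, a, b):
    """Divide each gradient component by I_r(loc) = 1 / scale^2 and I_r(scale) = 1."""
    scale = math.exp(b)
    grad_a, grad_b = laplace_gradient(x, a, b)
    return grad_a * scale ** 2, grad_b / 1.0

x, a, b, eps = 1.0, 2.5, 0.3, 1e-6
grad_a, grad_b = laplace_gradient(x, a, b)
num_a = (laplace_nll(x, a + eps, b) - laplace_nll(x, a - eps, b)) / (2 * eps)
num_b = (laplace_nll(x, a, b + eps) - laplace_nll(x, a, b - eps)) / (2 * eps)
nat_a, nat_b = laplace_natural_gradient(x, a, b)
```

In this sketch the natural gradient for loc reduces to scale * sign(a - x), so its magnitude grows with the scale rather than shrinking with it.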

gradient_and_hessian(y, params, natural_gradient=True)[source]: Gradient and diagonal hessian

loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

property params: The parameter names of the distribution

predict(params)[source]: Predict the parameters of a given distribution

starting_params(y)[source]: The starting parameters of the distribution

class xgboost_distribution.distributions.laplace.Params(loc, scale)

Bases: tuple

Create new instance of Params(loc, scale)

loc: Alias for field number 0

scale: Alias for field number 1

xgboost_distribution.distributions.log_normal module

LogNormal distribution

class xgboost_distribution.distributions.log_normal.LogNormal[source]

Bases: BaseDistribution

LogNormal distribution with log scoring.

Definition:

f(x) = exp( -[ (log(x) - log(scale)) / s ]^2 / 2 ) / (x s sqrt(2 pi))

with parameters (scale, s).

We reparameterize:

s -> log(s) = a
scale -> log(scale) = b

Note that b essentially becomes the ‘loc’ of the distribution:

log(x/scale) / s = ( log(x) - log(scale) ) / s

which can then be taken analogous to the normal distribution’s

(x - loc) / scale

Hence we can re-use the computations in distributions.normal, exchanging:

y -> log(y)
scale -> s
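The exchange can be illustrated with a small standalone sketch. The helper names below are hypothetical and do not reproduce the package's implementation; the finite-difference values are only there to confirm that the reused normal gradient matches the log-normal negative log-likelihood.

```python
import math

def normal_gradient(y, loc, log_scale):
    """Gradient of the normal negative log-likelihood w.r.t. (loc, log(scale))."""
    var = math.exp(2.0 * log_scale)
    return (loc - y) / var, 1.0 - (y - loc) ** 2 / var

def log_normal_nll(y, a, b):
    """Log-normal negative log-likelihood with a = log(s), b = log(scale)."""
    z = (math.log(y) - b) / math.exp(a)
    return 0.5 * z * z + a + math.log(y) + 0.5 * math.log(2.0 * math.pi)

def log_normal_gradient(y, a, b):
    """Reuse the normal gradient with y -> log(y) and scale -> s.

    Returns (d/da, d/db); b plays the role of the normal's loc.
    """
    grad_b, grad_a = normal_gradient(math.log(y), loc=b, log_scale=a)
    return grad_a, grad_b

y, a, b, eps = 2.5, -0.4, 0.3, 1e-6
grad_a, grad_b = log_normal_gradient(y, a, b)
num_a = (log_normal_nll(y, a + eps, b) - log_normal_nll(y, a - eps, b)) / (2 * eps)
num_b = (log_normal_nll(y, a, b + eps) - log_normal_nll(y, a, b - eps)) / (2 * eps)
```

The extra log(y) term in the log-normal loss does not depend on a or b, which is why the gradients carry over unchanged.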

check_target(y)[source]: Ensure that the target is compatible with the chosen distribution

gradient_and_hessian(y, params, natural_gradient=True)[source]: Gradient and diagonal hessian

loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

property params: The parameter names of the distribution

predict(params)[source]: Predict the parameters of a given distribution

starting_params(y)[source]: The starting parameters of the distribution

class xgboost_distribution.distributions.log_normal.Params(scale, s)

Bases: tuple

Create new instance of Params(scale, s)

s: Alias for field number 1

scale: Alias for field number 0

xgboost_distribution.distributions.negative_binomial module

Negative binomial distribution

class xgboost_distribution.distributions.negative_binomial.NegativeBinomial[source]

Bases: BaseDistribution

Negative binomial distribution with log score

Definition:

f(k) = p^n (1 - p)^k binomial(n + k - 1, n - 1)

with parameters (n, p), where n >= 0 and 0 <= p <= 1

We reparameterize:

n -> log(n) = a | e^a = n
p -> log(p/(1-p)) = b | e^b = p / (1-p) | p = 1 / (1 + e^-b)

The gradients are:

d/da -log[f(k)] = -e^a [ digamma(k+e^a) - digamma(e^a) + log(p) ]
= -n [ digamma(k+n) - digamma(n) + log(p) ]

d/db -log[f(k)] = (k e^b - e^a) / (e^b + 1)
= (k - e^a e^-b) / (e^-b + 1) = p * (k - e^a e^-b) = p * (k - n e^-b)

The Fisher Information:

I(n) ~ p / [ n (p+1) ]
I(p) = n / [ p (1-p)^2 ]

where we used an approximation for I(n) presented here: http://erepository.uonbi.ac.ke:8080/xmlui/handle/123456789/33803

In reparameterized form, we find I_r(n) and I_r(p):

p / [ n (p+1) ] = I_r(n) [ d/dn log(n) ]^2
= I_r(n) ( 1/n )^2

-> I_r(n) = np / (p+1)

n / [ p (1-p)^2 ] = I_r(p) [ d/dp log(p/(1-p)) ]^2
= I_r(p) ( 1/ [ p (1-p) ] )^2

-> I_r(p) = [ p^2 (1-p)^2 n ] / [ p (1-p)^2 ] = np

Hence the reparameterized Fisher information:

[ np / (p+1), 0 ]
[ 0, np ]

Ref:

https://www.wolframalpha.com/input/?i=d%2Fda+-log%28+%5B1+%2F+%281+%2B+e%5E%28-b%29%29%5D+%5E%28e%5Ea%29+%281+-+%5B1+%2F+%281+%2B+e%5E%28-b%29%29%5D%29%5Ek+binomial%28%28e%5Ea%29+%2B+k+-+1%2C+%28e%5Ea%29+-+1%29+%29
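The gradients and the diagonal Fisher information above can be sanity-checked with the standard library alone. This is a sketch under stated assumptions: the helper names are hypothetical, and digamma is approximated here by a recurrence plus an asymptotic series rather than by whatever the package uses internally.

```python
import math

def digamma(x):
    """Approximate digamma via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def neg_binomial_nll(k, a, b):
    """Negative log-likelihood with n = e^a and p = 1 / (1 + e^-b)."""
    n = math.exp(a)
    p = 1.0 / (1.0 + math.exp(-b))
    log_binom = math.lgamma(n + k) - math.lgamma(k + 1) - math.lgamma(n)
    return -(n * math.log(p) + k * math.log(1.0 - p) + log_binom)

def neg_binomial_gradient(k, a, b):
    """Analytic gradient with respect to (a, b)."""
    n = math.exp(a)
    p = 1.0 / (1.0 + math.exp(-b))
    grad_a = -n * (digamma(k + n) - digamma(n) + math.log(p))
    grad_b = p * (k - n * math.exp(-b))
    return grad_a, grad_b

def neg_binomial_natural_gradient(k, a, b):
    """Divide by the diagonal I_r = [np / (p+1), np]."""
    n = math.exp(a)
    p = 1.0 / (1.0 + math.exp(-b))
    grad_a, grad_b = neg_binomial_gradient(k, a, b)
    return grad_a * (p + 1.0) / (n * p), grad_b / (n * p)

k, a, b, eps = 4, 0.7, -0.2, 1e-5
grad_a, grad_b = neg_binomial_gradient(k, a, b)
num_a = (neg_binomial_nll(k, a + eps, b) - neg_binomial_nll(k, a - eps, b)) / (2 * eps)
num_b = (neg_binomial_nll(k, a, b + eps) - neg_binomial_nll(k, a, b - eps)) / (2 * eps)
nat_a, nat_b = neg_binomial_natural_gradient(k, a, b)
```

Since I(n) is itself an approximation, the first natural-gradient component in this sketch is approximate as well.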

check_target(y)[source]: Ensure that the target is compatible with the chosen distribution

gradient_and_hessian(y, params, natural_gradient=True)[source]: Gradient and diagonal hessian

loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

property params: The parameter names of the distribution

predict(params)[source]: Predict the parameters of a given distribution

starting_params(y)[source]: The starting parameters of the distribution

class xgboost_distribution.distributions.negative_binomial.Params(n, p)

Bases: tuple

Create new instance of Params(n, p)

n: Alias for field number 0

p: Alias for field number 1

xgboost_distribution.distributions.normal module

Normal distribution

class xgboost_distribution.distributions.normal.Normal[source]

Bases: BaseDistribution

Normal distribution with log scoring

Definition:

f(x) = exp( -[ (x-mean) / std ]^2 / 2 ) / std

We reparameterize:

a = mean | a = mean
b = log ( std ) | e^b = std

(Note: reparameterizing to log(std) ensures that std > 0, regardless of what the xgboost booster internally outputs, as std = e^b > 0.)

The gradients are:

d/da -log[f(x)] = e^(-2b) * (a-x) = (a-x) / var
d/db -log[f(x)] = 1 - e^(-2b) * (x-a)^2 = 1 - (x-a)^2 / var

as var = std^2 = e^(2b)

The Fisher Information (diagonal):

I(mean) = 1 / var
I(std) = 2 / var

In reparameterized form, we find I_r:

1 / var = I_r [ d/d(mean) mean ]^2 = I_r
2 / var = I_r [ d/d(std) log(std) ]^2 = I_r ( 1/(std) )^2

Hence the reparameterized Fisher information:

[ 1 / var, 0 ]
[ 0, 2 ]

Ref:

https://www.wolframalpha.com/input/?i=d%2Fda+-log%28%28e%5E%28-%5B%28x-a%29%2Fe%5Eb%29%5D%5E2+%2F+2%29+%2F+e%5Eb%29%29
https://www.wolframalpha.com/input/?i=d%2Fdb+-log%28%28e%5E%28-%5B%28x-a%29%2Fe%5Eb%29%5D%5E2+%2F+2%29+%2F+e%5Eb%29%29
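A short numeric check may help when reading the derivation. The sketch below uses hypothetical helper names and only the standard library. It differentiates the negative log-likelihood by central finite differences, which gives (a - x) / var for the mean and 1 - (x - a)^2 / var for log(std), and then divides by the diagonal of the reparameterized Fisher information.

```python
import math

def normal_nll(x, a, b):
    """Negative log-likelihood with mean = a and std = e^b."""
    z = (x - a) / math.exp(b)
    return 0.5 * z * z + b + 0.5 * math.log(2.0 * math.pi)

def normal_gradient(x, a, b):
    """Analytic gradient with respect to (a, b)."""
    var = math.exp(2.0 * b)
    return (a - x) / var, 1.0 - (x - a) ** 2 / var

def normal_natural_gradient(x, a, b):
    """Divide by the diagonal I_r = [1 / var, 2]."""
    var = math.exp(2.0 * b)
    grad_a, grad_b = normal_gradient(x, a, b)
    return grad_a * var, grad_b / 2.0

x, a, b, eps = 1.2, 0.4, -0.3, 1e-6
grad_a, grad_b = normal_gradient(x, a, b)
num_a = (normal_nll(x, a + eps, b) - normal_nll(x, a - eps, b)) / (2 * eps)
num_b = (normal_nll(x, a, b + eps) - normal_nll(x, a, b - eps)) / (2 * eps)
nat_a, nat_b = normal_natural_gradient(x, a, b)
```

With the natural gradient, the mean component reduces to a - x, independent of the current variance.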

gradient_and_hessian(y, params, natural_gradient=True)[source]: Gradient and diagonal hessian

loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

property params: The parameter names of the distribution

predict(params)[source]: Predict the parameters of a given distribution

starting_params(y)[source]: The starting parameters of the distribution

class xgboost_distribution.distributions.normal.Params(loc, scale)

Bases: tuple

Create new instance of Params(loc, scale)

loc: Alias for field number 0

scale: Alias for field number 1

xgboost_distribution.distributions.poisson module

Poisson distribution

class xgboost_distribution.distributions.poisson.Params(mu)

Bases: tuple

Create new instance of Params(mu,)

mu: Alias for field number 0

class xgboost_distribution.distributions.poisson.Poisson[source]

Bases: BaseDistribution

Poisson distribution with log score

Definition:

f(k) = e^(-mu) mu^k / k!

We reparameterize mu -> log(mu) = a to ensure mu > 0, as mu = e^a. Gradient:

d/da -log[f(k)] = e^a - k = mu - k

The Fisher information = 1 / mu, which needs to be expressed in the reparameterized form:

1 / mu = I ( d/dmu log(mu) )^2 = I ( 1/ mu )^2

Hence we find: I = mu
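To make the result concrete, here is a minimal sketch with hypothetical helper names, not the package's code. It compares the analytic gradient mu - k with a finite-difference estimate and forms the natural gradient by dividing by I = mu.

```python
import math

def poisson_nll(k, a):
    """Negative log-likelihood with mu = e^a."""
    return math.exp(a) - k * a + math.lgamma(k + 1)

def poisson_gradient(k, a):
    """Analytic gradient with respect to a = log(mu): mu - k."""
    return math.exp(a) - k

def poisson_natural_gradient(k, a):
    """Divide the gradient by the reparameterized Fisher information I = mu."""
    mu = math.exp(a)
    return (mu - k) / mu

k, a, eps = 3, 0.8, 1e-6
analytic = poisson_gradient(k, a)
numeric = (poisson_nll(k, a + eps) - poisson_nll(k, a - eps)) / (2 * eps)
natural = poisson_natural_gradient(k, a)
```

In this sketch the natural gradient equals 1 - k / mu, so it is bounded above by 1 however large mu becomes.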

check_target(y)[source]: Ensure that the target is compatible with the chosen distribution

gradient_and_hessian(y, params, natural_gradient=True)[source]: Gradient and diagonal hessian

loss(y, params)[source]: Evaluate the per sample loss (typically negative log-likelihood)

property params: The parameter names of the distribution

predict(params)[source]: Predict the parameters of a given distribution

starting_params(y)[source]: The starting parameters of the distribution

xgboost_distribution.distributions.utils module

Utility functions for distributions

xgboost_distribution.distributions.utils.check_all_ge_zero(x)[source]

xgboost_distribution.distributions.utils.check_all_gt_zero(x)[source]

xgboost_distribution.distributions.utils.check_all_integer(x)[source]

xgboost_distribution.distributions.utils.safe_exp(x)[source]

Like np.exp, but clipped to prevent overflow (in float32 world)

Ensures that:

large numbers cannot hit infinity,
small numbers cannot hit precisely zero.

NB: The limits are chosen such that we have some stability in subsequent computations. E.g., the minimum returned value should be safe in a division with a numerator of size up to ~1e6.
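The idea can be sketched in plain Python as follows. The function name and the clipping bounds are assumptions made for illustration; the actual limits used by safe_exp are not stated here, and the real function operates on arrays in the style of np.exp.

```python
import math

# Hypothetical clipping bounds, chosen only so that the properties
# described above hold for float32; not the package's actual values.
LOG_MIN, LOG_MAX = -70.0, 80.0
FLOAT32_MAX = 3.4028235e38

def clipped_exp(x):
    """Exponential of x after clipping x to [LOG_MIN, LOG_MAX]."""
    return math.exp(min(max(x, LOG_MIN), LOG_MAX))

largest = clipped_exp(1000.0)    # stays finite, below the float32 maximum
smallest = clipped_exp(-1000.0)  # stays strictly positive
ratio = 1e6 / smallest           # division by the minimum stays in float32 range
```

Clipping the argument rather than the result keeps the function monotone and avoids ever evaluating an exponential that would overflow.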

Module contents

xgboost_distribution.distributions.format_distribution_name(class_name)[source]

xgboost_distribution.distributions.get_distribution(name)[source]: Get instantiated distribution based on name

xgboost_distribution.distributions.get_distribution_doc()[source]: Construct docstring for distribution param in XGBDistribution model