GPBoost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models
Table of Contents
Get started
GPBoost is a software library for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. The GPBoost library is predominantly written in C++, and there exist both a Python package and an R package.
For more information, you may want to have a look at:
Modeling Background
Both tree-boosting and Gaussian processes are techniques that achieve state-of-the-art predictive accuracy. Besides this, tree-boosting has the following advantages:
- Automatic modeling of non-linearities, discontinuities, and complex high-order interactions
- Robustness to outliers in, and multicollinearity among, predictor variables
- Scale-invariance to monotone transformations of the predictor variables
- Automatic handling of missing values in predictor variables
Gaussian process and mixed effects models have the following advantages:
- Probabilistic predictions which allow for uncertainty quantification
- Modeling of dependency which, among other things, can allow for more efficient learning of the fixed effects / regression function
For the GPBoost algorithm, it is assumed that the response variable (aka label) y is the sum of a potentially non-linear mean function F(X) and random effects Zb:

    y = F(X) + Zb + xi,

where xi is an independent error term and X are predictor variables (aka covariates or features).
The random effects can consist of:
- Gaussian processes (including random coefficient processes)
- Grouped random effects (including nested, crossed, and random coefficient effects)
- A sum of the above
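As an illustration, the following sketch simulates data from this model with a single grouping variable: a non-linear mean function F(X), grouped random effects b entering through the incidence structure Zb, and an independent error term xi. All variable names and parameter values here are illustrative, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_groups = 500, 25
X = rng.uniform(-2, 2, (n, 2))
F = np.sin(3 * X[:, 0]) + X[:, 1] ** 2      # non-linear mean function F(X)
group = rng.integers(0, n_groups, n)        # grouping variable
b = rng.normal(0.0, 1.0, n_groups)          # grouped random effects b
Zb = b[group]                               # Zb: incidence structure maps effects to samples
xi = rng.normal(0.0, 0.5, n)                # independent error term
y = F + Zb + xi                             # response variable
```

Samples sharing a group value share the same realization of b, which induces the within-group dependence that the random-effects part of the model captures.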
The model is trained using the GPBoost algorithm, where training means learning the covariance parameters (aka hyperparameters) of the random effects and the predictor function F(X) using a tree ensemble. In brief, the GPBoost algorithm is a boosting algorithm that iteratively learns the covariance parameters and adds a tree to the ensemble of trees using a gradient and/or a Newton boosting step. In the GPBoost library, covariance parameters can be learned using (Nesterov accelerated) gradient descent or Fisher scoring (aka natural gradient descent). Further, trees are learned using the LightGBM library. See Sigrist (2020) and Sigrist (2021) for more details.
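To make the alternating structure concrete, here is a minimal toy sketch of the idea, not the GPBoost implementation: a single grouping variable, decision stumps in place of LightGBM trees, and variance parameters held fixed rather than learned. Each iteration first computes the posterior means of the grouped random effects given the current ensemble, then takes one gradient-boosting step on the remaining residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_groups = 1000, 20
x = rng.uniform(-2, 2, n)
group = rng.integers(0, n_groups, n)
b_true = rng.normal(0.0, 1.0, n_groups)            # true grouped random effects
F_true = np.where(x > 0, 2.0, -2.0)                # true (step) mean function
y = F_true + b_true[group] + rng.normal(0, 0.5, n)

def fit_stump(x, r):
    """Best single split on x minimizing squared error of residuals r."""
    best = (np.inf, 0.0, r.mean(), r.mean())
    for s in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]

F = np.zeros(n)                         # ensemble prediction of F(X)
sigma2_b, sigma2_e, lr = 1.0, 0.25, 0.3  # variances assumed known in this toy version
for _ in range(30):
    # 1) posterior means of random effects given current ensemble (shrunken group means)
    resid = y - F
    b_hat = np.zeros(n_groups)
    for j in range(n_groups):
        m = group == j
        nj = m.sum()
        if nj > 0:
            b_hat[j] = nj * sigma2_b / (nj * sigma2_b + sigma2_e) * resid[m].mean()
    # 2) one gradient-boosting step on residuals net of the random effects
    r = y - F - b_hat[group]
    s, vl, vr = fit_stump(x, r)
    F += lr * np.where(x <= s, vl, vr)
```

In the actual algorithm the variance (covariance) parameters are re-estimated, e.g. by Fisher scoring or Nesterov-accelerated gradient descent, rather than held fixed, and the base learners are LightGBM trees rather than stumps.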
News
Open Issues - Contribute
Software issues
Computational issues
- Add GPU support for Gaussian processes
- Add CHOLMOD support
Methodological issues
- Add a spatio-temporal Gaussian process model (e.g. a separable one)
- Add possibility to predict latent Gaussian processes and random effects (e.g. random coefficients)
- Implement more approaches such that computations scale well (memory and time) for Gaussian process models and mixed effects models with more than one grouping variable for non-Gaussian data
References
License
This project is licensed under the terms of the Apache License 2.0. See LICENSE for additional details.