Initialization
(1) Start with iteration counter $m = 0$. Initialize the additive predictor $\hat{\eta}^{[0]}$ with an offset value.
Specify a set of prediction functions as base-learners $h_1(x_1), \ldots, h_P(x_P)$; typically each
base-learner is a regression function incorporating one possible candidate variable.
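A minimal sketch of this setup in Python/NumPy, assuming squared-error loss (so the offset is the response mean) and simple univariate linear base-learners without intercept; the data and the name `fit_base_learner` are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # n = 100 observations, P = 5 candidate variables
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=100)

# Step (1): initialize the additive predictor with an offset value;
# for squared-error loss the mean of y is a common choice.
eta = np.full(len(y), y.mean())

def fit_base_learner(x, u):
    """Univariate least-squares base-learner: regress the working
    response u on the single candidate variable x (no intercept).
    Returns the fitted values on the training data."""
    slope = (x * u).sum() / (x * x).sum()
    return slope * x
```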
Component-wise fitting of base-learners |
(2) Set iteration counter $m := m + 1$.
(3) Fit the base-learners $h_j(x_j)$, $j = 1, \ldots, P$, one-by-one:
Gradient boosting |
Base-learners are fitted to the negative gradient vector $u^{[m]}$ of the loss function (e.g. the negative
log-likelihood), evaluated at the current additive predictor $\hat{\eta}^{[m-1]}$. To ensure small steps, the
base-learner fits are multiplied by a small step-length factor $\nu$ ($0 < \nu \le 1$): $\hat{h}_j := \nu \cdot \hat{h}_j$ (see the sketch following step (3)).
Likelihood-based boosting |
Base-learners are estimated via maximizing the overall likelihood, using one step of Fisher
scoring with the current additive predictor $\hat{\eta}^{[m-1]}$ as offset. To ensure small steps, a penalty
term is attached to the likelihood (a sketch of this variant follows at the end of the box).
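Continuing the sketch above, the gradient-boosting variant of step (3) under squared-error loss, where the negative gradient is simply the residual vector $y - \hat{\eta}$; the value `nu = 0.1` is a conventional choice, not prescribed by the box:

```python
nu = 0.1  # small step-length factor

# Step (3), gradient boosting: fit every base-learner to the negative
# gradient of the loss, evaluated at the current additive predictor eta.
u = y - eta                                 # negative gradient of squared-error loss
fits = [nu * fit_base_learner(X[:, j], u)   # shrink each fit by the step length nu
        for j in range(X.shape[1])]
```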
Update best performing component |
(4) Select the best performing base-learner $j^*$:
Gradient boosting |
Based on the smallest residual sum of squares with respect to the negative gradient vector. |
Likelihood-based boosting |
Based on the largest overall likelihood after the update. |
(5) Update the additive predictor via the corresponding base-learner:
$\hat{\eta}^{[m]} = \hat{\eta}^{[m-1]} + \hat{h}_{j^*}(x_{j^*})$
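Steps (4) and (5) for the gradient variant then reduce to an argmin over residual sums of squares and an additive update; a sketch continuing the code above (for least-squares base-learners, selecting on the shrunken fits yields the same $j^*$ as selecting on the unshrunken ones):

```python
# Step (4): pick the base-learner whose fit best approximates the
# negative gradient, measured by residual sum of squares.
rss = [((u - fit) ** 2).sum() for fit in fits]
j_star = int(np.argmin(rss))

# Step (5): update the additive predictor with the selected component
# only, so each iteration touches a single candidate variable.
eta = eta + fits[j_star]
```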
Iteration |
Iterate steps (2) to (5) until $m = m_{\mathrm{stop}}$. The stopping iteration $m_{\mathrm{stop}}$ is the main tuning parameter,
typically selected via resampling procedures.
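For the likelihood-based variant, a compact sketch for a binary-response (logistic) model: the one-step penalized Fisher-scoring fit and the likelihood-based selection follow the box, while the ridge-type penalty `lam` and the fixed `m_stop` are illustrative assumptions (in practice $m_{\mathrm{stop}}$ would be chosen by resampling, as stated above):

```python
import numpy as np
from scipy.special import expit  # logistic function

def loglik(y, eta):
    """Bernoulli log-likelihood of the linear predictor eta."""
    p = expit(eta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def boost_likelihood(X, y, m_stop=100, lam=50.0):
    n, P = X.shape
    eta = np.full(n, np.log(y.mean() / (1 - y.mean())))  # offset: logit of base rate
    for m in range(m_stop):                              # steps (2)-(5)
        best_ll, best_update = -np.inf, None
        for j in range(P):                               # step (3), one-by-one
            x = X[:, j]
            p = expit(eta)
            w = p * (1 - p)                              # Fisher weights
            score = x @ (y - p)                          # score at beta_j = 0, offset eta
            info = x @ (w * x)                           # Fisher information
            beta = score / (info + lam)                  # one penalized Fisher-scoring step
            ll = loglik(y, eta + beta * x)               # step (4): likelihood after update
            if ll > best_ll:
                best_ll, best_update = ll, beta * x
        eta = eta + best_update                          # step (5)
    return eta
```

Note that the penalty `lam` plays the role the step-length factor $\nu$ plays in the gradient variant: larger values force smaller coefficient updates per iteration.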