Computational and Mathematical Methods in Medicine

Volume 2017, Article ID 6083072, 12 pages

https://doi.org/10.1155/2017/6083072

## An Update on Statistical Boosting in Biomedicine

^{1}Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany

^{2}Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany

^{3}Paul-Ehrlich-Institut, Langen, Germany

Correspondence should be addressed to Andreas Mayr; andreas.mayr@fau.de

Received 24 February 2017; Accepted 8 June 2017; Published 2 August 2017

Academic Editor: Andrzej Kloczkowski

Copyright © 2017 Andreas Mayr et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.

#### 1. Introduction

Statistical boosting algorithms are one of the advanced methods in the toolbox of a modern statistician or data scientist [1]. While still yielding classical statistical models with well-known interpretability, they offer multiple advantages in the presence of high-dimensional data as they are applicable in situations with more explanatory variables than observations [2, 3]. Key features in this context are automated variable selection and model choice [4, 5].

The research field embraces the world of statistics and computer science, bridging the gap between two rather different points of view on how to extract information from data [6]: on the one hand, there is the classical statistical modelling community who focus on models *describing* and *explaining* the outcome to find an approximation to the underlying stochastic data generating process. On the other hand, there is the machine learning community who focus primarily on algorithmic models *predicting* the outcome while treating the nature of the underlying process as unknown. Statistical boosting algorithms have their roots in machine learning [7] but were later adapted to estimate classical statistical models [8, 9]. A pivotal aspect of these algorithms is that they incorporate data-driven variable selection and shrinkage of effect estimates similar to classical penalized regression [10].

In a review some years ago [1], we highlighted this evolution of boosting from machine learning to statistical modelling. Furthermore, we emphasized the similarity of two boosting approaches, gradient boosting [2] and likelihood-based boosting [3], introducing *statistical boosting* as a generic term for these algorithms.

An accompanying article [11] highlighted the multiple extensions of the basic algorithms towards (i) enhanced variable selection properties, (ii) new types of predictor effects, and (iii) new regression settings. Substantial methodological developments on statistical boosting algorithms throughout the last few years (e.g., stability selection [12]) and a growing community have opened the door to new model classes and frameworks (e.g., joint models [13] and functional data [14]), asking for an up-to-date review on the available extensions.

This article is structured as follows: In Section 2 we briefly highlight both the basic structure and the properties of statistical boosting algorithms and point to their connections to classical penalization approaches such as the lasso. In Section 3 we focus on new developments regarding variable selection (including an exemplary analysis of gene expression data), which can also be combined with boosted functional regression models presented in Section 4. Section 5 focuses on advanced survival models such as joint modelling; in Section 6 we briefly summarize other relevant developments and applications in the framework of statistical boosting.

#### 2. Statistical Boosting

##### 2.1. From Machine Learning to Statistical Models

The original boosting concept by Schapire [15] and Freund [7] emerged from the field of supervised learning where typically a function is trained based on data with known outcome classes or labels to correctly classify new observations. The aim of the boosting concept is to *boost* (i.e., to improve) the accuracy of weak classifiers (i.e., classifiers with poor correct classification rates) by iteratively applying them to reweighted data. Even if these so-called *base-learners* individually only slightly outperform random guessing, the ensemble solution can often be boosted to a perfect classification [16].

The introduction of AdaBoost [17] was the breakthrough for boosting in the field of supervised machine learning, allegedly leading Leo Breiman to praise its performance: *Boosting is the best off-the-shelf classifier in the world* [18].
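The reweighting idea behind this family of algorithms can be sketched in a few lines. The following is a minimal illustration of discrete AdaBoost with decision stumps as base-learners, written for this overview and not taken from the cited works; the function names (`fit_stump`, `adaboost`, `predict`), the toy sample, and the number of rounds are assumptions made for the example.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """Decision stump: predicts `sign` where X[:, j] <= thr and `-sign` elsewhere."""
    return sign * np.where(X[:, j] <= thr, 1, -1)

def fit_stump(X, y, w):
    """Search all features, thresholds and orientations for the lowest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, sign) != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds=3):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of weighted stumps."""
    w = np.full(len(y), 1.0 / len(y))          # start with uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this stump in the ensemble
        pred = stump_predict(X, j, thr, sign)
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified observations
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Weighted majority vote of all stumps."""
    score = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in ensemble)
    return np.sign(score)

# Toy sample (hypothetical): one feature, labels that no single threshold can separate.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
ensemble = adaboost(X, y, n_rounds=3)
print("training accuracy:", np.mean(predict(ensemble, X) == y))
```

On this toy sample every single stump misclassifies at least three of the ten observations, whereas the weighted vote of three stumps fitted to successively reweighted data reproduces all ten labels.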

The main target of classical machine learning approaches is predicting observations of the outcome $Y$ given one or more input variables $X$. The estimation of the prediction rule (also called generalization function) is based on an observed sample $(y_1, x_1), \ldots, (y_n, x_n)$. However, the focus is not on quantifying or describing the underlying data generating process, but on predicting $y_{\text{new}}$ for new observations $x_{\text{new}}$ as accurately as possible. As a consequence, many machine learning approaches (also including the original AdaBoost with trees or stumps as base-learners) can be regarded as black box prediction schemes. Although typically yielding accurate predictions [19], they do not offer much insight into the structure of the relationship between explanatory variables $X$ and the outcome $Y$.

Statistical regression models on the other hand particularly aim at describing and explaining the underlying relationship in a structured way. Not only can the impact of single explanatory variables be quantified in terms of variable importance measures [20, 21], but also the actual effect of these variables is interpretable. The work of Friedman et al. [8, 9] laid the groundwork to understand the concept of boosting from a statistical perspective and to adapt the general idea in order to estimate statistical models.

##### 2.2. General Model Structure

The aim of *statistical boosting* algorithms is to estimate and select the effects in structured additive regression models. The most important model class are generalized additive models (“GAM” [22]), where the conditional distribution of the response variable is assumed to follow an exponential family distribution. The expected response is modelled given the observed value $x$ of one or more explanatory variables using a link function $g(\cdot)$ as

$$g(\mathrm{E}(Y \mid X = x)) = f(x).$$

In the typical case of multiple explanatory variables, the function $f(\cdot)$, which is often called additive predictor, consists of the additive effects of the single predictors:

$$f(x) = \beta_0 + h_1(x_1) + \cdots + h_p(x_p),$$

where $\beta_0$ represents a common intercept and the functions $h_j(x_j)$, $j = 1, \ldots, p$, are the individual effects of the variables $x_j$. The generic notation $h_j(x_j)$ may comprise different types of predictor effects such as classical linear effects, $x_j \beta_j$, smooth nonlinear effects constructed via regression splines, spatial effects, or random effects of the explanatory variable $x_j$, to name but a few.
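As a concrete instance of this model structure, consider a binary outcome with a logit link, one explanatory variable entering linearly and a second one entering through a smooth nonlinear effect. This example is added for illustration and does not appear in the article; the symbols $\beta_0$, $\beta_1$, and $h_2$ are notation assumed here for the intercept, the linear coefficient, and the smooth effect.

```latex
\log\!\left( \frac{P(Y = 1 \mid x_1, x_2)}{1 - P(Y = 1 \mid x_1, x_2)} \right)
  = f(x_1, x_2)
  = \beta_0 + x_1 \beta_1 + h_2(x_2)
```

Each summand on the right-hand side remains interpretable on its own: $\beta_1$ is a log odds ratio per unit of $x_1$, and $h_2$ can be plotted against $x_2$ to inspect the shape of the nonlinear effect.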

In statistical boosting algorithms, like the two approaches described in the following sections, the different effects are estimated by separate base-learners (*componentwise boosting* [2]). These base-learners are typically the corresponding simple regression-type prediction functions; for a linear effect, the corresponding base-learner would be a simple linear model: $h_j(x_j) = x_j \beta_j$.

##### 2.3. The Generic Structure of Statistical Boosting

For a generic overview of the structure of statistical boosting algorithms, see Box 1. The base-learners are applied one by one, and in every iteration only the best performing base-learner is selected to be updated. The final additive model is hence the sum of all selected base-learner fits.
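The selection step described above can be illustrated with a short sketch. It assumes squared error loss, one simple linear base-learner per (centered) explanatory variable, and a fixed step length; the function name `componentwise_l2_boost`, the step length `nu = 0.1`, the number of iterations, and the simulated data are assumptions made for this example and are not taken from Box 1.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise boosting with linear base-learners and squared error loss.

    Assumes the columns of X are centered. Returns the intercept and the
    coefficient vector accumulated over all iterations.
    """
    n, p = X.shape
    intercept = y.mean()
    coef = np.zeros(p)
    fit = np.full(n, intercept)            # current additive predictor
    col_ss = (X ** 2).sum(axis=0)          # sum of squares of each column
    for _ in range(n_iter):
        u = y - fit                        # residuals (negative gradient of the loss)
        b = X.T @ u / col_ss               # least squares slope of every base-learner
        rss = ((u[:, None] - X * b) ** 2).sum(axis=0)
        j = int(np.argmin(rss))            # best performing base-learner
        coef[j] += nu * b[j]               # update only the selected component
        fit += nu * b[j] * X[:, j]
    return intercept, coef

# Simulated data (hypothetical): 20 candidate variables, only two are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=100)

intercept, coef = componentwise_l2_boost(X, y)
print("largest effects:", np.argsort(-np.abs(coef))[:2])
print("variables selected after 5 iterations:",
      np.count_nonzero(componentwise_l2_boost(X, y, n_iter=5)[1]))
```

Because only one coefficient changes per iteration, stopping after a small number of iterations leaves most coefficients at exactly zero, which is how variable selection and shrinkage arise in this scheme.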