Journal of Probability and Statistics

Volume 2019, Article ID 8953530, 13 pages

https://doi.org/10.1155/2019/8953530

## On the Use of Min-Max Combination of Biomarkers to Maximize the Partial Area under the ROC Curve

^{1}Merck & Co. Inc., Kenilworth, NJ 07033, USA

^{2}Department of Biostatistics and Bioinformatics, Box 2717, Duke University Medical Center, Durham, NC 27710, USA

^{3}Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Rockville, MD, USA

Correspondence should be addressed to Susan Halabi; susan.halabi@duke.edu

Received 3 May 2018; Revised 6 December 2018; Accepted 3 January 2019; Published 3 February 2019

Guest Editor: Yichuan Zhao

Copyright © 2019 Hua Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

*Background*. Evaluation of diagnostic assays and predictive performance of biomarkers based on the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) is vital in diagnostic and targeted medicine. The partial area under the curve (pAUC) is an alternative metric focusing on a range of practical and clinical relevance of the diagnostic assay. In this article, we adopt and extend the min-max method to the estimation of the pAUC when multiple continuous-scaled biomarkers are available and compare the performance of our proposed approach with existing approaches via simulations. *Methods*. We conducted extensive simulation studies to investigate the performance of different methods for the combination of biomarkers based on their abilities to produce the largest pAUC estimates. Data were generated from different multivariate distributions with equal and unequal variance-covariance matrices. Different shapes of the ROC curves, false positive fraction ranges, and sample size configurations were considered. We obtained the mean and standard deviation of the pAUC estimates through re-substitution and leave-one-pair-out cross-validation. *Results*. Our results demonstrate that the proposed method provides the largest pAUC estimates under the following three important practical scenarios: (1) multivariate normally distributed data for nondiseased and diseased participants have unequal variance-covariance matrices; or (2) the ROC curves generated from individual biomarkers are relatively close regardless of the latent normality distributional assumption; or (3) the ROC curves generated from individual biomarkers have straight-line shapes. *Conclusions*. The proposed method is robust and investigators are encouraged to use this approach in the estimation of the pAUC for many practical scenarios.

#### 1. Introduction

The area under the entire curve (AUC) is one of the most commonly used summary indices in receiver operating characteristic (ROC) analysis and can be interpreted as the average value of sensitivity for all possible values of specificity [1]. The empirical estimate of the AUC is closely related to the Mann-Whitney U statistic for comparing ratings of nondiseased and diseased participants [1]. Although methods based on the AUC have been well developed and widely implemented [2, 3], one of the major limitations of the AUC is that it summarizes the performance over the entire curve, including regions that may not be clinically relevant (e.g., the regions with low specificity levels). The partial area under the ROC curve (pAUC) can be used as a summary index of diagnostic/prognostic accuracy over a certain range of specificity that is of clinical interest [4, 5]. In many applications, tests with false positive rates outside of a particular domain will be of no practical use and hence are irrelevant for evaluating the accuracy of the test. In particular, for a certain disease with low prevalence, the unnecessary follow-up resulting from high false positive rate will burden the health system. There are several proposed methods for analyzing the pAUC [4, 6–10].

When multiple continuous-scaled biomarkers are available in the evaluation of prognostic accuracy, it may be possible to improve the accuracy by combining several biomarkers. The use of linear combinations is popular due to its ease of implementation and interpretation. Finding the optimal linear combination to maximize the area under the ROC curve has been extensively studied [11–14]. By extending Fisher’s discriminant function, Su and Liu [11] first proposed the best linear combination to maximize AUC based on the multivariate normality assumption. Su and Liu’s method relies on this strong distributional assumption, and therefore the resulting pAUC may be unsatisfactory in many practical scenarios where the distributional assumption is not satisfied. Liu et al. [12] provided an approach to construct the best linear combination that can produce the ROC curve dominating any other ROC curve in some particular specificity ranges. However, this approach depends on the distributional assumption about the mean vectors and the specificity range. Therefore, it may fail to be dominant for a particular range of specificity and sensitivity that may be of clinical interest. In addition, this approach involves the calculation of an eigenvector corresponding to a specific eigenvalue, and thus the stability of this approach depends on the behavior of the eigenvector under small perturbations of the corresponding matrix [15].

Under the assumption of a generalized linear model, Jin and Lu [13] proved that the combination coefficients estimated by logistic regression yield an ROC curve with the highest sensitivity uniformly over the entire range of specificity. Without distributional assumptions on the data, Pepe and Thompson [16] considered maximizing AUC and pAUC through a rank-based estimate, i.e., the Mann-Whitney U statistic [1]. They proposed an algorithm to search for optimal linear combinations when the number of biomarkers equals 2. This approach is computationally formidable when the number of biomarkers is greater than or equal to 3 [17]. Hsu and Hsueh [18] and Yu and Park [19] proposed methods to maximize the partial area under the ROC curve based on the multivariate normality assumption.

Liu et al. [20] developed a nonparametric min-max approach that reduces data into two dimensions to maximize the Mann-Whitney statistic of the AUC. This approach is robust against distributional assumptions due to its nonparametric nature and is computationally efficient since the min-max procedure involves searching for only one single coefficient. Although useful, this approach was developed based on the full range of specificity. In many medical areas, the ROC curve is only clinically relevant and of interest when the assay has high specificities. For example, high specificity of an assay is required for screening any healthy population. Similarly, in using diagnostic assay with multiple genes, only high sensitivity and specificity classifiers have clinical utility (Sparano 2015).

We adapt and extend the min-max method to estimating the pAUC when several markers are considered. This article is organized as follows. In Section 2, we provide a thorough review of existing methods that maximize the AUC and pAUC. In Section 3, we extend the min-max combination method to the optimization of the pAUC and discuss the leave-one-pair-out (LOPO) cross-validation approach for evaluation of the combination methods based on their accuracy for future observations. In Section 4, we then conduct extensive simulations to investigate the performance of the different combination methods based on their abilities to yield the largest pAUC estimates. In Section 5, two real life examples are presented. We then discuss the results in Section 6 and provide guidelines for practical use of the different approaches.

#### 2. Existing Methods

##### 2.1. Definition

Without loss of generality, we consider the partial area under the ROC curve (pAUC) over the range of high specificity values, i.e., over false positive fractions in $(0, t_0)$. In this article, values of $t_0$ less than or equal to 0.2, i.e., specificity greater than or equal to 0.8, were considered. This is due to the fact that an assay is unlikely to be used if it has a lower specificity rate.

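Let $X_i$, $i = 1, \ldots, n_1$, and $Y_j$, $j = 1, \ldots, n_2$, be the biomarker levels for nondiseased and diseased participants. The corresponding empirical estimate of pAUC by utilizing the Mann-Whitney U statistic is

$$\widehat{\mathrm{pAUC}}(t_0) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_j > X_i,\; X_i > \hat{q}_{1-t_0}\right),$$

where $\hat{q}_{1-t_0}$ is the $(1 - t_0)$ quantile of the empirical distribution of *X*.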
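As an illustration, the empirical pAUC can be computed by direct pair counting. The following Python sketch assumes the range of interest is the false positive fractions in (0, t0) and takes the threshold to be an order statistic of the nondiseased values; the function name `empirical_pauc` and this particular quantile convention are illustrative assumptions and not the authors' R implementation.

```python
import math


def empirical_pauc(x, y, t0):
    """Empirical pAUC over false positive fractions in (0, t0).

    x: marker values for nondiseased participants
    y: marker values for diseased participants
    The estimate is not normalized, so its maximum is about t0.
    """
    n1, n2 = len(x), len(y)
    # Number of nondiseased values allowed above the threshold
    # (assumed convention; the small constant guards against float error).
    m = int(t0 * n1 + 1e-9)
    xs = sorted(x)
    # Threshold playing the role of the (1 - t0) empirical quantile of X.
    q = xs[n1 - m - 1] if m < n1 else -math.inf
    # Count pairs with Y_j > X_i among the X_i that exceed the threshold.
    count = sum(1 for xi in x if xi > q for yj in y if yj > xi)
    return count / (n1 * n2)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5.5, 8.5, 9.5, 10.5, 11.0]
print(empirical_pauc(x, y, 0.2))  # partial area for FPF in (0, 0.2)
print(empirical_pauc(x, y, 1.0))  # reduces to the full empirical AUC
```

With t0 = 1 the threshold condition is vacuous and the function returns the usual Mann-Whitney estimate of the full AUC.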

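Assume that we have *p* diagnostic tests or biomarkers on each subject, *n*_{1} nondiseased participants with ratings

$$\mathbf{X}_i = (X_{i1}, \ldots, X_{ip})^T, \quad i = 1, \ldots, n_1,$$

and *n*_{2} diseased participants with ratings

$$\mathbf{Y}_j = (Y_{j1}, \ldots, Y_{jp})^T, \quad j = 1, \ldots, n_2.$$

The best linear combination coefficient $\boldsymbol{\lambda}$ which maximizes the pAUC can be estimated by maximizing the empirical estimate of pAUC, i.e.,

$$\hat{\boldsymbol{\lambda}} = \arg\max_{\boldsymbol{\lambda}} \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(\boldsymbol{\lambda}^T \mathbf{Y}_j > \boldsymbol{\lambda}^T \mathbf{X}_i,\; \boldsymbol{\lambda}^T \mathbf{X}_i > \hat{q}_{1-t_0}(\boldsymbol{\lambda})\right),$$

where $\hat{q}_{1-t_0}(\boldsymbol{\lambda})$ is the $(1 - t_0)$ quantile of the empirical distribution of $\boldsymbol{\lambda}^T \mathbf{X}$.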

##### 2.2. Su and Liu’s Method for pAUC

Assume that $\mathbf{X}$ and $\mathbf{Y}$ follow multivariate normal distributions with mean vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ and covariance matrices $\Sigma_1$ and $\Sigma_2$, i.e., $\mathbf{X} \sim N_p(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathbf{Y} \sim N_p(\boldsymbol{\mu}_2, \Sigma_2)$, respectively. Su and Liu derived the best linear combination coefficient that can maximize AUC based on the invariance property of the ROC curve to scalar transformation and Fisher’s discriminant coefficient [11]. When the two covariance matrices are equal or proportional to each other, the best linear coefficient based on Su and Liu’s method also generates the ROC curve dominating all the others within any range of specificities.
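For reference, the coefficient usually attributed to Su and Liu [11] can be written explicitly. The notation below is an assumption made for illustration: $\boldsymbol{\mu}_1, \Sigma_1$ denote the mean vector and covariance matrix of the nondiseased ratings, $\boldsymbol{\mu}_2, \Sigma_2$ those of the diseased ratings, and $\Phi$ the standard normal distribution function. Under multivariate normality, the AUC of a linear combination $\boldsymbol{\lambda}^T \mathbf{X}$ versus $\boldsymbol{\lambda}^T \mathbf{Y}$ and its maximizer are

$$\mathrm{AUC}(\boldsymbol{\lambda}) = \Phi\left( \frac{\boldsymbol{\lambda}^T (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)}{\sqrt{\boldsymbol{\lambda}^T (\Sigma_1 + \Sigma_2) \boldsymbol{\lambda}}} \right), \qquad \boldsymbol{\lambda}_{\mathrm{SL}} \propto (\Sigma_1 + \Sigma_2)^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1),$$

$$\mathrm{AUC}(\boldsymbol{\lambda}_{\mathrm{SL}}) = \Phi\left( \sqrt{ (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^T (\Sigma_1 + \Sigma_2)^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) } \right).$$

In practice the population quantities would be replaced by sample means and sample covariance matrices.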

##### 2.3. Liu et al.’s Method for pAUC

Recognizing the unsatisfactory performance of Su and Liu’s best linear combination coefficient, Liu et al. considered the scenario where [12]. The authors provided an approach to construct the best linear combination that can maximize sensitivity over a certain range of specificities. In particular, if the high specificity region of an ROC curve is of interest, then the best linear combination coefficient is proportional to where is the eigenvector corresponding to the smallest eigenvalue of matrix . It has been shown that this linear combination produces the ROC curve dominating any other ROC curve in some particular specificity ranges.

##### 2.4. Logistic Regression for pAUC

Logistic regression has been widely used to predict binary outcomes by considering a linear combination of multiple predictors [13]. It models the probability of disease $P(D = 1 \mid \mathbf{Z})$ for a given subject with covariates $\mathbf{Z}$ (the vector of *p* biomarker values) by using the logit link function, i.e.,

$$\operatorname{logit}\{P(D = 1 \mid \mathbf{Z})\} = \beta_0 + \boldsymbol{\beta}^T \mathbf{Z},$$

where $\beta_0$ is the intercept and $\boldsymbol{\beta}$ is the vector of combination coefficients. Under the assumption of a generalized linear model, the estimate $\hat{\boldsymbol{\beta}}$ obtained from the logistic regression maximizes the likelihood function of the binary outcomes. Jin and Lu proved that this estimate also provides the highest sensitivity uniformly over the entire range of specificity. This implies that the best linear combination equals $\hat{\boldsymbol{\beta}}^T \mathbf{Z}$, resulting in an ROC curve which not only has the maximum full AUC, but also dominates any other ROC curve within any range of potential interest and therefore leads to the maximum pAUC.

##### 2.5. Pepe and Thompson’s Method for pAUC

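Without distributional assumptions on the data $\mathbf{X}$ and $\mathbf{Y}$, Pepe and Thompson [16] considered maximizing AUC and pAUC through a rank-based estimate, i.e., the Mann-Whitney U statistic [1]. For simplicity, they proposed an algorithm to search for optimal linear combinations with the number of biomarkers equal to 2 (*p*=2), i.e., $(X_{i1}, X_{i2})$ for $i = 1, \ldots, n_1$ and $(Y_{j1}, Y_{j2})$ for $j = 1, \ldots, n_2$. Based on the fact that the ROC curve is invariant to scale transformation, in order to maximize AUC or pAUC, finding the best combination coefficient $(\lambda_1, \lambda_2)$ for $\lambda_1 X_1 + \lambda_2 X_2$ is equivalent to finding a single coefficient $\alpha$ for the combination $X_1 + \alpha X_2$ (with the roles of the two markers interchanged as needed). Let $(0, t_0)$ denote the range of false positive fractions of potential interest. The estimate of AUC based on the Mann-Whitney U statistic and the estimate of pAUC can be obtained as

$$\widehat{\mathrm{AUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_{j1} + \alpha Y_{j2} > X_{i1} + \alpha X_{i2}\right)$$

and

$$\widehat{\mathrm{pAUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_{j1} + \alpha Y_{j2} > X_{i1} + \alpha X_{i2},\; X_{i1} + \alpha X_{i2} > c_\alpha\right),$$

respectively, where $c_\alpha$ is the $(1 - t_0)$ quantile of $X_1 + \alpha X_2$. The authors chose to implement a semiparametric method based on Heagerty and Pepe [21] to estimate $c_\alpha$, while they also pointed out that other quantile estimation methods may be applied.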

##### 2.6. Min-Max Method for AUC

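Liu et al. considered the min-max combination of biomarkers [20]. Let

$$X_{i,\max} = \max_{1 \le k \le p} X_{ik}, \qquad Y_{j,\max} = \max_{1 \le k \le p} Y_{jk}$$

be the maximum value of *p* biomarkers for nondiseased and diseased participants, respectively. Similarly, let

$$X_{i,\min} = \min_{1 \le k \le p} X_{ik}, \qquad Y_{j,\min} = \min_{1 \le k \le p} Y_{jk}$$

be the minimum value of *p* biomarkers for nondiseased and diseased participants, respectively.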

The nonparametric estimate of AUC based on the Mann-Whitney U statistic by linearly combining the minimum and maximum values of *p* biomarkers for each subject can be obtained as

$$\widehat{\mathrm{AUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_{j,\max} + \alpha Y_{j,\min} > X_{i,\max} + \alpha X_{i,\min}\right).$$

Since this is not a continuous function of *α*, a search rather than a derivative-based method is required for the maximization. The searching method for the best value of *α* is exactly the same as Pepe and Thompson’s method.
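The search over *α* can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each subject's combined score is taken to be the maximum marker value plus *α* times the minimum marker value, and the search is a plain grid search. The function names `minmax_auc` and `search_alpha` and the grid itself are hypothetical choices, not the authors' R code.

```python
def minmax_auc(X, Y, alpha):
    """Empirical AUC of the combined marker max + alpha * min.

    X, Y: lists of per-subject marker lists (nondiseased, diseased).
    """
    sx = [max(row) + alpha * min(row) for row in X]
    sy = [max(row) + alpha * min(row) for row in Y]
    # Mann-Whitney count of correctly ordered (nondiseased, diseased) pairs.
    wins = sum(1 for a in sx for b in sy if b > a)
    return wins / (len(sx) * len(sy))


def search_alpha(X, Y, grid):
    """Return (alpha, auc) maximizing the empirical AUC over a finite grid."""
    best = max(grid, key=lambda a: minmax_auc(X, Y, a))
    return best, minmax_auc(X, Y, best)


X = [[0.1, 0.5], [0.3, 0.2], [0.9, 0.4]]   # nondiseased, p = 2 markers
Y = [[0.8, 0.7], [0.6, 1.0], [0.2, 0.95]]  # diseased
grid = [k / 10 for k in range(-10, 11)]    # assumed search grid on [-1, 1]
print(search_alpha(X, Y, grid))
```

Because the objective is a step function of *α*, many grid points typically share the maximum; this sketch simply returns the first one encountered.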

#### 3. Methodology Extension: Min-Max Method for pAUC

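We extend the min-max method to maximize the pAUC. Let $(0, t_0)$ denote the range of false positive fractions of potential interest. By considering only the minimum and maximum values of *p* biomarkers for each individual, it follows that the nonparametric estimate of pAUC can be obtained as

$$\widehat{\mathrm{pAUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_{j,\max} + \alpha Y_{j,\min} > X_{i,\max} + \alpha X_{i,\min},\; X_{i,\max} + \alpha X_{i,\min} > c_\alpha\right),$$

where $c_\alpha$ is the $(1 - t_0)$ quantile of $X_{\max} + \alpha X_{\min}$. For simplicity, the quantile of the empirical distribution of $X_{\max} + \alpha X_{\min}$ can be used to estimate $c_\alpha$. Then Pepe and Thompson’s [16] algorithm can be applied to search for the optimal value of *α* to maximize the estimate of the pAUC.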
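A minimal Python sketch of this extension follows. It assumes the combined score max + *α* min, the false positive range (0, t0), an order-statistic estimate of the quantile, and a simple grid search standing in for the search algorithm; the names `minmax_pauc` and `best_alpha_pauc` are hypothetical.

```python
import math


def minmax_pauc(X, Y, alpha, t0):
    """Empirical pAUC over FPF in (0, t0) for the marker max + alpha * min."""
    sx = [max(row) + alpha * min(row) for row in X]
    sy = [max(row) + alpha * min(row) for row in Y]
    n1, n2 = len(sx), len(sy)
    m = int(t0 * n1 + 1e-9)  # nondiseased scores allowed above the threshold
    # Empirical (1 - t0) quantile of the combined nondiseased scores.
    c = sorted(sx)[n1 - m - 1] if m < n1 else -math.inf
    count = sum(1 for a in sx if a > c for b in sy if b > a)
    return count / (n1 * n2)


def best_alpha_pauc(X, Y, t0, grid):
    """Grid search for the alpha with the largest empirical pAUC."""
    best = max(grid, key=lambda a: minmax_pauc(X, Y, a, t0))
    return best, minmax_pauc(X, Y, best, t0)


X = [[0.1, 0.5], [0.3, 0.2], [0.9, 0.4], [0.6, 0.1], [0.2, 0.7]]
Y = [[0.8, 0.7], [0.6, 1.0], [0.2, 0.95], [0.4, 0.3]]
grid = [k / 10 for k in range(-10, 11)]  # assumed search grid
print(best_alpha_pauc(X, Y, 0.2, grid))
```

The quantile is recomputed for every candidate *α* because the distribution of the combined nondiseased score changes with *α*.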

The new marker defined by the maximum has larger sensitivity and smaller specificity for any given threshold *c* than any other individual marker, given that

$$P(Y_{\max} > c) \ge P(Y_k > c)$$

and

$$P(X_{\max} \le c) \le P(X_k \le c)$$

for all $k = 1, \ldots, p$; similarly, the new marker defined by the minimum has smaller sensitivity and larger specificity for any given threshold *c* than any other individual marker, given that

$$P(Y_{\min} > c) \le P(Y_k > c)$$

and

$$P(X_{\min} \le c) \ge P(X_k \le c)$$

for all $k = 1, \ldots, p$. Therefore, we expect that the linear combination of the min-max biomarkers may provide a larger partial area under the ROC curve than other methods. We employ a simulation study to investigate how well the proposed method performs compared to other established methods.

Cross-validation has been widely used to evaluate the generalizability of statistical results. Huang et al. [22] proposed a LOPO approach to evaluating the performance of the linear combination coefficient to estimate AUC for future observations. The estimate of AUC based on LOPO cross-validation is as follows:

$$\widehat{\mathrm{AUC}}_{\mathrm{LOPO}} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(\hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{Y}_j > \hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{X}_i\right),$$

where $\hat{\boldsymbol{\lambda}}_{(-i,-j)}$ is the best linear combination coefficient based on the observed data without both the *i*th observation from the nondiseased subjects and the *j*th observation from the diseased subjects. They also demonstrated that 5-fold and 10-fold cross-validation can be computationally efficient and that the resulting estimate can be asymptotically unbiased for future observations.

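We implement the LOPO cross-validation on the pAUC to evaluate the generalizability of the statistical results. The estimate of the pAUC based on the LOPO cross-validation can be obtained as

$$\widehat{\mathrm{pAUC}}_{\mathrm{LOPO}} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(\hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{Y}_j > \hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{X}_i,\; \hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{X}_i > c_{(-i,-j)}\right),$$

where $c_{(-i,-j)}$ is the $(1 - t_0)$ quantile of $\hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{X}$. For simplicity, in our simulation study the quantile of the empirical distribution of $\hat{\boldsymbol{\lambda}}_{(-i,-j)}^T \mathbf{X}$ will be used to estimate $c_{(-i,-j)}$.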
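The leave-one-pair-out loop can be sketched as follows. This is an illustration under several assumptions rather than the authors' implementation: `fit` is a hypothetical callback that takes the training data and returns a scoring function, `fit_minmax` is one example of such a callback (a coarse grid search for the min-max coefficient by training AUC), and the quantile is taken from the empirical distribution of all nondiseased scores under the coefficient fitted without the held-out pair.

```python
import math


def fit_minmax(X_tr, Y_tr, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Hypothetical fitting step: choose alpha on a coarse grid by training AUC."""
    def auc(alpha):
        sx = [max(r) + alpha * min(r) for r in X_tr]
        sy = [max(r) + alpha * min(r) for r in Y_tr]
        return sum(1 for a in sx for b in sy if b > a) / (len(sx) * len(sy))
    best = max(grid, key=auc)
    return lambda row: max(row) + best * min(row)


def lopo_pauc(X, Y, fit, t0):
    """Leave-one-pair-out estimate of the pAUC over FPF in (0, t0)."""
    n1, n2 = len(X), len(Y)
    m = int(t0 * n1 + 1e-9)  # nondiseased scores allowed above the threshold
    hits = 0
    for i in range(n1):
        for j in range(n2):
            # Refit without the i-th nondiseased and j-th diseased subject.
            score = fit(X[:i] + X[i + 1:], Y[:j] + Y[j + 1:])
            sx = sorted(score(r) for r in X)
            c = sx[n1 - m - 1] if m < n1 else -math.inf
            xi, yj = score(X[i]), score(Y[j])
            if yj > xi and xi > c:
                hits += 1
    return hits / (n1 * n2)


X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.0, 0.5]]  # nondiseased
Y = [[1.1, 1.5], [1.3, 1.2], [1.9, 1.4]]              # diseased
print(lopo_pauc(X, Y, fit_minmax, 0.5))
```

The loop refits the combination n1 × n2 times, which is why a k-fold variant is attractive for larger samples.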

#### 4. Simulation

##### 4.1. Description of Simulations

We conducted extensive simulation studies to investigate the performance of our proposed method with established combination methods based on the partial area under the ROC curves. Ratings of participants were simulated from different multivariate distributions with equal and unequal variance-covariance matrices. We examined false positive fraction ranges 0 – 0.1 and 0 – 0.2 and we considered different samples sizes: 50:50, 50:100, 100:50, and 100:100 for nondiseased and diseased participants, respectively.

For each simulated dataset, we computed the pAUC based on four different approaches: (1) min-max, denoted as MIN-MAX; (2) Su and Liu’s [11], denoted as SULIU; (3) Liu et al.’s (2006), denoted as LIU; and (4) logistic regression, denoted as LOGISTIC. In addition, we utilized two estimation methods in computing the pAUC: re-substitution (denoted as Re-Sub) and 10-fold leave-one-pair-out cross-validation (denoted as LOPO). The re-substitution method estimated the pAUC based on the linear combination of the coefficients derived using all the data for each method. The re-substitution method is usually overoptimistic for estimating the diagnostic/prognostic accuracy for future observations because the same data serve as both the training set and the validation set, in the terminology of machine learning [22]. We obtained the mean of the pAUC by averaging over the 1,000 simulations, and the standard deviation was the square root of the estimated sample variance of the estimated pAUC from the 1,000 simulated datasets.

##### 4.2. Multivariate Normal Distributions with Equal Variance-Covariance

We first compared the performance of the min-max approach on the pAUC with the other methods by generating dataset consisting of ratings from multivariate normal distributions (*p*=4) with different mean vectors and equal variance-covariance matrices (scenario #1). Exploiting the invariance property of the ROC curve to monotonically increasing transformation of the ratings, the distributions of ratings of nondiseased participants were set to be a multivariate normal distribution with mean and variance-covariance matrix

Under this scenario, ratings of diseased participants were generated from multivariate normal distributions with variance-covariance matrix equal to , and the mean vectors were selected to generate the AUC equal to 0.70, 0.73, 0.76, and 0.80 for markers # 1, 2, 3, and 4, respectively (Case #1), and the AUC equal to 0.6, 0.7, 0.8, and 0.9 for markers # 1, 2, 3, and 4, respectively (Case #2).
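To illustrate how a mean can be chosen to hit a target marginal AUC, recall that for a single normally distributed marker the AUC equals Φ(δ / sqrt(σ0² + σ1²)), where δ is the difference in means and σ0, σ1 are the standard deviations in the two groups. The Python sketch below inverts this relation. Unit variances are assumed by default purely for illustration, and the function name `mean_shift_for_auc` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_shift_for_auc(auc, sd_nondiseased=1.0, sd_diseased=1.0):
    """Mean difference giving the target AUC for one normal marker."""
    pooled = math.sqrt(sd_nondiseased ** 2 + sd_diseased ** 2)
    return NormalDist().inv_cdf(auc) * pooled


# Marginal AUC targets from Case #1 and Case #2.
for target in (0.70, 0.73, 0.76, 0.80, 0.6, 0.9):
    print(target, round(mean_shift_for_auc(target), 4))
```

With a mean of zero for the nondiseased group, the returned value can be used directly as the corresponding component of the diseased mean vector under these assumptions.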

##### 4.3. Multivariate Normal Distributions with Unequal Variance-Covariance

We also considered multivariate normal distributions with different mean and unequal variance-covariance matrices for nondiseased and diseased participants (scenario #2). The mean settings are the same as Case 1 and Case 2 as discussed in scenario 1. The variance-covariance matrices were

##### 4.4. Multivariate Log-Normal Distributions with Unequal Variance-Covariance

We investigated the performance of the different combination methods by generating datasets consisting of ratings from multivariate log-normal distributions (scenario #3). Ratings were first generated as in scenario #2 and then exponentiated to obtain the multivariate log-normal marker values.

##### 4.5. Multivariate Gamma Distributions

We further examined the performance of the different combination methods by generating gamma ROC curves with the AUC settings in Case 1 and Case 2 (scenario #4). The gamma family is one of the well-known families of ROC curves [9, 10, 23–26]. Due to the concavity and flexibility in the shape, Ma et al. [9] and Ma et al. [10] demonstrated that the families of gamma ROC curves provided practically reasonable straight-line shaped concave ROC curves, where the statistical inference based on pAUCs is preferable.

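The probability density function of the underlying rating model of the gamma ROC curve has the following form:

$$f(x; \kappa, \theta) = \frac{x^{\kappa - 1} e^{-x/\theta}}{\Gamma(\kappa)\, \theta^{\kappa}}, \quad x > 0,$$

where $\kappa > 0$ is the shape parameter and $\theta > 0$ is the scale parameter.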

When *κ* approaches 0, the gamma ROC curve approaches the shape of a straight line, and when *κ* is large the shape of the gamma ROC curve resembles an ROC curve with latent normality assumptions. When *κ*=1 the gamma ROC curve is equivalent to the power-law ROC curve [23, 27]. Here we are interested in investigating a scenario with straight-line shaped gamma ROC curves (*κ*=1/3), because this type of ROC curve cannot be generated by the previous scenarios.

Each simulated dataset consisted of ratings generated from multivariate gamma distributions with *κ*=1/3. Due to the invariance property of the ROC curves, without any loss of generality, we set *θ*=1 for latent ratings of nondiseased participants. We then selected *θ* for the latent diseased ratings to reflect the targeted area under the ROC curve in Case #1 and Case #2. The between-modality correlation of 0.5 was established using a Gaussian copula model [28]. All the programs were written by the first author in R version 2.15.3 and are available at https://duke.box.com/s/u32h7aayxd9bjo41b619xpb21sj1nm67.

##### 4.6. Simulation Results

We compared the performance of the min-max method in estimating the pAUC with three established methods assuming the ratings are from multivariate normal distributions with equal variance-covariance matrices (Table 1). SULIU and LOGISTIC almost always performed better than MIN-MAX and LIU based on the pAUCs estimated from both the re-substitution and the LOPO cross-validation. In addition, the performances of the SULIU and LOGISTIC approaches were similar whether the AUCs were close or further apart. The min-max approach produced slightly smaller pAUC estimates than those of SULIU and LOGISTIC when the AUCs among biomarkers were relatively close (i.e., Case #1), while its performance deteriorated when the AUCs were far apart (i.e., Case #2).