Journal of Probability and Statistics
Volume 2019, Article ID 8953530, 13 pages
https://doi.org/10.1155/2019/8953530
Research Article

On the Use of Min-Max Combination of Biomarkers to Maximize the Partial Area under the ROC Curve

1Merck & Co. Inc., Kenilworth, NJ 07033, USA
2Department of Biostatistics and Bioinformatics, Box 2717, Duke University Medical Center, Durham, NC 27710, USA
3Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Rockville, MD, USA

Correspondence should be addressed to Susan Halabi; susan.halabi@duke.edu

Received 3 May 2018; Revised 6 December 2018; Accepted 3 January 2019; Published 3 February 2019

Guest Editor: Yichuan Zhao

Copyright © 2019 Hua Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background. Evaluation of diagnostic assays and of the predictive performance of biomarkers based on the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) is vital in diagnostic and targeted medicine. The partial area under the curve (pAUC) is an alternative metric focusing on a range of practical and clinical relevance of the diagnostic assay. In this article, we adopt and extend the min-max method to the estimation of the pAUC when multiple continuous scaled biomarkers are available and compare the performance of our proposed approach with existing approaches via simulations. Methods. We conducted extensive simulation studies to investigate the performance of different methods for the combination of biomarkers based on their abilities to produce the largest pAUC estimates. Data were generated from different multivariate distributions with equal and unequal variance-covariance matrices. Different shapes of the ROC curves, false positive fraction ranges, and sample size configurations were considered. We obtained the mean and standard deviation of the pAUC estimates through re-substitution and leave-one-pair-out cross-validation. Results. Our results demonstrate that the proposed method provides the largest pAUC estimates under the following three important practical scenarios: (1) multivariate normally distributed data for nondiseased and diseased participants have unequal variance-covariance matrices; or (2) the ROC curves generated from individual biomarkers are relatively close regardless of the latent normality distributional assumption; or (3) the ROC curves generated from individual biomarkers have straight-line shapes. Conclusions. The proposed method is robust and investigators are encouraged to use this approach in the estimation of the pAUC for many practical scenarios.

1. Introduction

The area under the entire curve (AUC) is one of the most commonly used summary indices in receiver operating characteristic (ROC) analysis and can be interpreted as the average value of sensitivity for all possible values of specificity [1]. The empirical estimate of the AUC is closely related to the Mann-Whitney U statistic for comparing ratings of nondiseased and diseased participants [1]. Although methods based on the AUC have been well developed and widely implemented [2, 3], one of the major limitations of the AUC is that it summarizes the performance over the entire curve, including regions that may not be clinically relevant (e.g., the regions with low specificity levels). The partial area under the ROC curve (pAUC) can be used as a summary index of diagnostic/prognostic accuracy over a certain range of specificity that is of clinical interest [4, 5]. In many applications, tests with false positive rates outside of a particular domain will be of no practical use and hence are irrelevant for evaluating the accuracy of the test. In particular, for a certain disease with low prevalence, the unnecessary follow-up resulting from a high false positive rate will burden the health system. There are several proposed methods for analyzing the pAUC [4, 6–10].

When multiple continuous-scaled biomarkers are available in the evaluation of prognostic accuracy, it may be possible to improve the accuracy by combining several biomarkers. The use of linear combinations is popular due to its ease of implementation and interpretation. Finding the optimal linear combination to maximize the area under the ROC curve has been extensively studied [11–14]. By extending Fisher’s discriminant function, Su and Liu [11] first proposed the best linear combination to maximize the AUC based on the multivariate normality assumption. Su and Liu’s method relies on this strong distributional assumption, and therefore the resulting pAUC may be unsatisfactory in many practical scenarios where the distributional assumption is not satisfied. Liu et al. [12] provided an approach to construct the best linear combination that can produce the ROC curve dominating any other ROC curves in some particular specificity ranges. However, this approach depends on the distributional assumption about the mean vectors and the specificity range. Therefore, it may fail to be dominant for a particular range of specificity and sensitivity that may be of clinical interest. In addition, this approach involves the calculation of the eigenvector corresponding to the eigenvalue, and thus the stability of this approach depends on the behavior of the eigenvector under small perturbations of the corresponding matrix [15].

Under the assumption of a generalized linear model, Jin and Lu [13] proved that the combination coefficients from the estimates of logistic regression yield the ROC curve with the highest sensitivity uniformly over the entire range of specificity. Without distributional assumptions on the data, Pepe and Thompson [16] considered maximizing the AUC and pAUC through a rank-based estimate, i.e., the Mann-Whitney U statistic [1]. They proposed an algorithm to search for optimal linear combinations when the number of biomarkers equals 2. This approach becomes computationally formidable when the number of biomarkers is greater than or equal to 3 [17]. Hsu and Hsueh [18] and Yu and Park [19] proposed methods to maximize the partial area under the ROC curve based on the multivariate normality assumption.

Liu et al. [20] developed a nonparametric min-max approach that reduces data into two dimensions to maximize the Mann-Whitney statistic of the AUC. This approach is robust against distributional assumptions due to its nonparametric nature and is computationally efficient since the min-max procedure involves searching for only one single coefficient. Although useful, this approach was developed based on the full range of specificity. In many medical areas, the ROC curve is only clinically relevant and of interest when the assay has high specificities. For example, high specificity of an assay is required for screening any healthy population. Similarly, in using diagnostic assay with multiple genes, only high sensitivity and specificity classifiers have clinical utility (Sparano 2015).

We adapt and extend the min-max method to estimating the pAUC when several markers are considered. This article is organized as follows. In Section 2, we provide a thorough review of existing methods that maximize the AUC and pAUC. In Section 3, we extend the min-max combination method to the optimization of the pAUC and discuss the leave-one-pair-out (LOPO) cross-validation approach for evaluation of the combination methods based on their accuracy for future observations. In Section 4, we then conduct extensive simulations to investigate the performance of the different combination methods based on their abilities to yield the largest pAUC estimates. In Section 5, two real life examples are presented. We then discuss the results in Section 6 and provide guidelines for practical use of the different approaches.

2. Existing Methods

2.1. Definition

Without loss of generality, we consider the partial area under the ROC curve (pAUC) over the range of high specificity values, i.e., over false positive rates in $(0, t_0)$. In this article, values of $t_0$ less than or equal to 0.2, i.e., specificity greater than or equal to 0.8, were considered. This is due to the fact that an assay is unlikely to be used if it has a lower specificity rate.

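Let $X_i$, $i = 1, \ldots, n_1$, and $Y_j$, $j = 1, \ldots, n_2$, be the biomarker levels for nondiseased and diseased participants. The corresponding empirical estimate of pAUC over false positive rates in $(0, t_0)$ by utilizing the Mann-Whitney U statistic is $\widehat{\mathrm{pAUC}}(t_0) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(Y_j > X_i,\ X_i > \hat{q}_{1-t_0})$, where $\hat{q}_{1-t_0}$ is the $(1 - t_0)$ quantile of the empirical distribution of X.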
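As a minimal sketch of this estimator, the Python snippet below counts the pairs in which a diseased value exceeds a nondiseased value that itself lies above the empirical quantile of the nondiseased values at level one minus the upper false positive limit. The function names, the quantile convention (the smallest order statistic whose empirical CDF reaches that level), and the strict inequalities are assumptions made for illustration; ties may be handled differently in the authors' R programs.

```python
import math

def empirical_quantile(values, prob):
    """Smallest order statistic whose empirical CDF is at least prob (assumed convention)."""
    ordered = sorted(values)
    # The small tolerance guards against floating-point noise in prob * n.
    k = max(1, math.ceil(prob * len(ordered) - 1e-9))
    return ordered[k - 1]

def empirical_pauc(x, y, t0):
    """Mann-Whitney estimate of the pAUC over false positive rates in (0, t0)."""
    q = empirical_quantile(x, 1.0 - t0)
    # Count pairs with y_j > x_i, restricted to nondiseased values above the quantile.
    hits = sum(1 for xi in x if xi > q for yj in y if yj > xi)
    return hits / (len(x) * len(y))

# Toy data: ten nondiseased values and three diseased values.
x_demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_demo = [5.5, 9.5, 10.5]
pauc_demo = empirical_pauc(x_demo, y_demo, 0.2)
```

With this convention the estimate cannot exceed the upper false positive limit itself, which is the value reached when every diseased value lies above every nondiseased value.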

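Assume that we have p diagnostic tests or biomarkers on each subject, $n_1$ nondiseased participants with ratings $X_i = (X_{i1}, \ldots, X_{ip})^T$, $i = 1, \ldots, n_1$, and $n_2$ diseased participants with ratings $Y_j = (Y_{j1}, \ldots, Y_{jp})^T$, $j = 1, \ldots, n_2$. The best linear combination coefficient $\beta$ which maximizes the pAUC over false positive rates in $(0, t_0)$ can be estimated by maximizing the empirical estimate of pAUC, i.e., $\hat{\beta} = \arg\max_{\beta} \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(\beta^T Y_j > \beta^T X_i,\ \beta^T X_i > \hat{q}_{1-t_0}(\beta))$, where $\hat{q}_{1-t_0}(\beta)$ is the $(1 - t_0)$ quantile of the empirical distribution of $\beta^T X$.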

2.2. Su and Liu’s Method for pAUC

Assume that $X$ and $Y$ follow multivariate normal distributions with mean vectors $\mu_X$ and $\mu_Y$ and covariance matrices $\Sigma_X$ and $\Sigma_Y$, i.e., $X \sim N(\mu_X, \Sigma_X)$ and $Y \sim N(\mu_Y, \Sigma_Y)$, respectively. Su and Liu derived the best linear combination coefficient that can maximize AUC, which is proportional to $(\Sigma_X + \Sigma_Y)^{-1}(\mu_Y - \mu_X)$, based on the invariance property of the ROC curve to scalar transformation and Fisher’s discriminant coefficient [11]. When the two covariance matrices are equal or proportional to each other, the best linear coefficient based on Su and Liu’s method also generates the ROC curve dominating all the others within any range of specificities.
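For illustration, Su and Liu's coefficient is proportional to the inverse of the sum of the two covariance matrices applied to the difference between the diseased and nondiseased mean vectors. The sketch below plugs sample means and sample covariances into that expression using only the Python standard library. The helper names and the Gauss-Jordan solver are assumptions of this sketch, not the authors' implementation.

```python
def mean_vector(rows):
    """Column-wise sample means."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting.
    A singular matrix raises ZeroDivisionError in this simple sketch."""
    p = len(rhs)
    aug = [list(matrix[a]) + [rhs[a]] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[a][p] / aug[a][a] for a in range(p)]

def su_liu_coefficient(x, y):
    """Plug-in coefficient: (S_x + S_y)^{-1} (mean_y - mean_x)."""
    sx, sy = covariance_matrix(x), covariance_matrix(y)
    summed = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(sx, sy)]
    diff = [my - mx for mx, my in zip(mean_vector(x), mean_vector(y))]
    return solve(summed, diff)

# Toy data: the diseased rows are the nondiseased rows shifted by (1, 2).
x_demo = [[0, 0], [2, 0], [0, 2], [2, 2]]
y_demo = [[1, 2], [3, 2], [1, 4], [3, 4]]
coef_demo = su_liu_coefficient(x_demo, y_demo)
```

Because the ROC curve is unchanged by positive rescaling of the combined score, only the direction of the returned vector matters.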

2.3. Liu et al.’s Method for pAUC

Recognizing the unsatisfactory performance of Su and Liu’s best linear combination coefficient, Liu et al. considered the scenario where [12]. The authors provided an approach to construct the best linear combination that can maximize sensitivity over a certain range of specificities. In particular, if the high specificity region of an ROC curve is of interest, then the best linear combination coefficient is proportional to where is the eigenvector corresponding to the smallest eigenvalue of matrix . It has been shown that this linear combination produces the ROC curve dominating any other ROC curves in some particular specificity ranges.

2.4. Logistic Regression for pAUC

Logistic regression has been widely used to predict binary outcomes by considering a linear combination of multiple predictors [13]. It models the probability of disease ($D = 1$) for a given subject with covariates $Z = (Z_1, \ldots, Z_p)^T$ by using the logit link function, i.e., $\operatorname{logit} P(D = 1 \mid Z) = \beta_0 + \beta^T Z$, where $\beta_0$ is the intercept, $\beta$ is the vector of combination coefficients defined as before, and $Z$ is the subject's vector of biomarker values. Under the assumption of a generalized linear model, the estimate $\hat{\beta}$ obtained from the logistic regression maximizes the likelihood function of the binary outcomes. Jin and Lu proved that this estimate also provides the highest sensitivity uniformly over the entire range of specificity. This implies that the best linear combination equals $\hat{\beta}^T Z$, resulting in an ROC curve which not only has the maximum full AUC, but also dominates any other ROC curves within any range of potential interest and therefore leads to the maximum pAUC.

2.5. Pepe and Thompson’s Method for pAUC

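Without distributional assumptions on the data $X$ and $Y$, Pepe and Thompson [16] considered maximizing AUC and pAUC through a rank-based estimate, i.e., the Mann-Whitney U statistic [1]. For simplicity, they proposed an algorithm to search for optimal linear combinations with the number of biomarkers equal to 2 (p=2), i.e., $(X_{i1}, X_{i2})$ for $i = 1, \ldots, n_1$ and $(Y_{j1}, Y_{j2})$ for $j = 1, \ldots, n_2$. Based on the fact that the ROC curve is invariant to scale transformations, in order to maximize AUC or pAUC, finding the best combination coefficient $(\beta_1, \beta_2)$, where the combined score is $\beta_1 X_{i1} + \beta_2 X_{i2}$, is equivalent to finding $\alpha$, where the combined score is $X_{i1} + \alpha X_{i2}$. Let $(0, t_0)$ denote the range of false positive rates of potential interest. The estimate of AUC based on the Mann-Whitney U statistic and the estimate of pAUC can be obtained as $\widehat{\mathrm{AUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(Y_{j1} + \alpha Y_{j2} > X_{i1} + \alpha X_{i2})$ and $\widehat{\mathrm{pAUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(Y_{j1} + \alpha Y_{j2} > X_{i1} + \alpha X_{i2},\ X_{i1} + \alpha X_{i2} > q_{1-t_0}(\alpha))$, respectively, where $q_{1-t_0}(\alpha)$ is the $(1 - t_0)$ quantile of $X_1 + \alpha X_2$. The authors chose to implement a semiparametric method based on Heagerty and Pepe [21] to estimate $q_{1-t_0}(\alpha)$, while they also pointed out that other quantile estimation methods may be applied.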
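The one-dimensional search described above can be sketched in Python as a grid search. The grid used here (201 equally spaced values of the single coefficient in [-1, 1], applied once with the coefficient on the second marker and once with it on the first, so that large ratios are also covered) is an assumption of this sketch, as are the function names. The criterion is passed in as a function, so the same loop could maximize a pAUC estimate instead of the full AUC.

```python
def auc(x_scores, y_scores):
    """Mann-Whitney AUC: share of (nondiseased, diseased) pairs ranked correctly."""
    wins = sum(1 for xs in x_scores for ys in y_scores if ys > xs)
    return wins / (len(x_scores) * len(y_scores))

def search_two_markers(x, y, criterion=auc, steps=200):
    """Grid search over alpha in [-1, 1] for scores m1 + alpha*m2 and alpha*m1 + m2."""
    best_value, best_weights = -1.0, None
    for k in range(steps + 1):
        alpha = -1.0 + 2.0 * k / steps
        for w1, w2 in ((1.0, alpha), (alpha, 1.0)):
            x_scores = [w1 * m1 + w2 * m2 for m1, m2 in x]
            y_scores = [w1 * m1 + w2 * m2 for m1, m2 in y]
            value = criterion(x_scores, y_scores)
            if value > best_value:  # keep the first maximizer found on the grid
                best_value, best_weights = value, (w1, w2)
    return best_value, best_weights

# Toy data: the second marker separates the groups, the first one is noise.
x_demo = [(0.3, 0.0), (0.9, 1.0), (0.1, 2.0)]
y_demo = [(0.2, 3.0), (0.8, 4.0), (0.5, 5.0)]
best_auc, best_w = search_two_markers(x_demo, y_demo)
```

A search of this kind is needed because the criterion is a step function of the coefficient, so derivative-based optimizers do not apply.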

2.6. Min-Max Method for AUC

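Liu et al. considered the min-max combination of biomarkers [20]. Let $X_{i,\max} = \max_{1 \le k \le p} X_{ik}$ and $Y_{j,\max} = \max_{1 \le k \le p} Y_{jk}$ be the maximum value of p biomarkers for nondiseased and diseased participants, respectively. Similarly, let $X_{i,\min} = \min_{1 \le k \le p} X_{ik}$ and $Y_{j,\min} = \min_{1 \le k \le p} Y_{jk}$ be the minimum value of p biomarkers for nondiseased and diseased participants, respectively.

The nonparametric estimate of AUC based on the Mann-Whitney U statistic by linearly combining the minimum and maximum values of p biomarkers for each subject can be obtained as $\widehat{\mathrm{AUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(Y_{j,\max} + \alpha Y_{j,\min} > X_{i,\max} + \alpha X_{i,\min})$. Since this is not a continuous function of α, a search rather than a derivative-based method is required for the maximization. The searching method for the best value of α is exactly the same as Pepe and Thompson’s method.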

3. Methodology Extension: Min-Max Method for pAUC

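We extend the min-max method to maximize the pAUC. Let $(0, t_0)$ denote the range of false positive rates of potential interest. By considering only the minimum and maximum values of p biomarkers for each individual, it follows that the nonparametric estimate of pAUC can be obtained as $\widehat{\mathrm{pAUC}}(\alpha) = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(Y_{j,\max} + \alpha Y_{j,\min} > X_{i,\max} + \alpha X_{i,\min},\ X_{i,\max} + \alpha X_{i,\min} > q_{1-t_0}(\alpha))$, where $X_{i,\max}$, $X_{i,\min}$, $Y_{j,\max}$, and $Y_{j,\min}$ are the maximum and minimum marker values of the ith nondiseased and jth diseased participant and $q_{1-t_0}(\alpha)$ is the $(1 - t_0)$ quantile of $X_{\max} + \alpha X_{\min}$. For simplicity, the quantile of the empirical distribution of $X_{\max} + \alpha X_{\min}$ can be used to estimate $q_{1-t_0}(\alpha)$. Then Pepe and Thompson’s [16] algorithm can be applied to search for the optimal value of α to maximize the estimate of the pAUC.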
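A minimal Python sketch of this extension follows: each subject is reduced to its largest and smallest marker value, and a single coefficient is searched to maximize the empirical pAUC, with the quantile recomputed for every candidate coefficient. The grid, the quantile convention, and all function names are assumptions of this sketch rather than the authors' R code.

```python
import math

def minmax_scores(rows, w_max, w_min):
    """Combine each subject's largest and smallest marker value linearly."""
    return [w_max * max(r) + w_min * min(r) for r in rows]

def pauc(x_scores, y_scores, t0):
    """Empirical pAUC over false positive rates in (0, t0)."""
    ordered = sorted(x_scores)
    k = max(1, math.ceil((1.0 - t0) * len(ordered) - 1e-9))
    q = ordered[k - 1]  # empirical (1 - t0) quantile of the nondiseased scores
    hits = sum(1 for xs in x_scores if xs > q for ys in y_scores if ys > xs)
    return hits / (len(x_scores) * len(y_scores))

def minmax_pauc_search(x, y, t0, steps=200):
    """Grid search over alpha in [-1, 1] for max + alpha*min and alpha*max + min."""
    best_value, best_weights = -1.0, None
    for k in range(steps + 1):
        alpha = -1.0 + 2.0 * k / steps
        for w in ((1.0, alpha), (alpha, 1.0)):
            value = pauc(minmax_scores(x, *w), minmax_scores(y, *w), t0)
            if value > best_value:
                best_value, best_weights = value, w
    return best_value, best_weights

# Toy data: five nondiseased and three diseased subjects with three markers each.
x_demo = [[0, 1, 2], [1, 0, 1.5], [0.5, 0.2, 1], [2, 1, 0], [1, 1, 1]]
y_demo = [[3, 0, 1], [0, 4, 1], [1, 2, 5]]
best_pauc, best_w = minmax_pauc_search(x_demo, y_demo, 0.4)
```

Whatever the number of markers, the search stays one-dimensional, which is what keeps the approach computationally light.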

The new marker given by the maximum, $(X_{\max}, Y_{\max})$, has larger sensitivity and smaller specificity for any given threshold c than any other individual marker, given that $P(Y_{\max} > c) \ge P(Y_k > c)$ and $P(X_{\max} \le c) \le P(X_k \le c)$ for all $k = 1, \ldots, p$; similarly, the new marker given by the minimum, $(X_{\min}, Y_{\min})$, has smaller sensitivity and larger specificity for any given threshold c than any other individual marker, given that $P(Y_{\min} > c) \le P(Y_k > c)$ and $P(X_{\min} \le c) \ge P(X_k \le c)$ for all $k = 1, \ldots, p$. Therefore, we expect that the linear combination of the min-max biomarkers may provide a larger partial area under the ROC curve than other methods. We employ a simulation study to investigate how well the proposed method performs compared to other established methods.

Cross-validation has been widely used to evaluate the generalizability of statistical results. Huang et al. [22] proposed a LOPO approach to evaluating the performance of the linear combination coefficient to estimate AUC for future observations. The estimate of AUC based on LOPO cross-validation is as follows: $\widehat{\mathrm{AUC}}_{\mathrm{LOPO}} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(\hat{\beta}_{(-i,-j)}^T Y_j > \hat{\beta}_{(-i,-j)}^T X_i)$, where $\hat{\beta}_{(-i,-j)}$ is the best linear combination coefficient based on the observed data without both the ith observation from nondiseased subjects and the jth observation from diseased subjects. They also demonstrated that the 5-fold and 10-fold cross-validation can be computationally efficient and the resulting estimate can be asymptotically unbiased for future observations.

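We implement the LOPO cross-validation on the pAUC to evaluate the generalizability of the statistical results. With $(0, t_0)$ the false positive range of interest and $\hat{\beta}_{(-i,-j)}$ the combination coefficient fitted without the ith nondiseased and the jth diseased observation, the estimate of the pAUC based on the LOPO cross-validation can be obtained as $\widehat{\mathrm{pAUC}}_{\mathrm{LOPO}} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I(\hat{\beta}_{(-i,-j)}^T Y_j > \hat{\beta}_{(-i,-j)}^T X_i,\ \hat{\beta}_{(-i,-j)}^T X_i > q)$, where $q$ is the $(1 - t_0)$ quantile of $\hat{\beta}_{(-i,-j)}^T X$. For simplicity, in our simulation study the quantile of the empirical distribution of $\hat{\beta}_{(-i,-j)}^T X$ will be used to estimate $q$.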
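The leave-one-pair-out loop can be sketched in Python as below. For every (nondiseased, diseased) pair the combination rule is refitted without that pair and then used to score the held-out pair. Two choices here are assumptions of this sketch: the quantile is taken over the scores of all nondiseased subjects under the fold-specific rule, and the hypothetical `max_rule` stands in for a real fitting routine such as a min-max or logistic fit.

```python
import math

def upper_quantile(scores, t0):
    """Empirical (1 - t0) quantile: smallest order statistic with CDF >= 1 - t0."""
    ordered = sorted(scores)
    k = max(1, math.ceil((1.0 - t0) * len(ordered) - 1e-9))
    return ordered[k - 1]

def lopo_pauc(x, y, t0, fit):
    """Leave-one-pair-out estimate of the pAUC over false positive rates in (0, t0).

    fit(x_train, y_train, t0) must return a function mapping one subject's
    marker values to a combined score.
    """
    hits = 0
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            # Refit the combination rule without the held-out pair (i, j).
            score = fit(x[:i] + x[i + 1:], y[:j] + y[j + 1:], t0)
            q = upper_quantile([score(r) for r in x], t0)
            sx, sy = score(xi), score(yj)
            if sy > sx and sx > q:
                hits += 1
    return hits / (len(x) * len(y))

def max_rule(x_train, y_train, t0):
    """Hypothetical stand-in for a fitted rule: score by the largest marker."""
    return max

x_demo = [[1], [2], [3], [4], [5]]
y_demo = [[4.5], [6]]
lopo_demo = lopo_pauc(x_demo, y_demo, 0.4, max_rule)
```

The loop refits the rule once per pair, so the cost grows with the product of the two group sizes, which is one reason k-fold variants are attractive for larger samples.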

4. Simulation

4.1. Description of Simulations

We conducted extensive simulation studies to investigate the performance of our proposed method with established combination methods based on the partial area under the ROC curves. Ratings of participants were simulated from different multivariate distributions with equal and unequal variance-covariance matrices. We examined false positive fraction ranges 0 – 0.1 and 0 – 0.2 and we considered different samples sizes: 50:50, 50:100, 100:50, and 100:100 for nondiseased and diseased participants, respectively.

For each simulated dataset, we computed the pAUC based on four different approaches: (1) min-max, denoted as MIN-MAX; (2) Su and Liu’s [11], denoted as SULIU; (3) Liu et al.’s (2006), denoted as LIU; and (4) logistic regression, denoted as LOGISTIC. In addition, we utilized two estimation methods in computing the pAUC: re-substitution (denoted as Re-Sub) and 10-fold leave-one-pair-out cross-validation (denoted as LOPO). The re-substitution method estimated the pAUC based on the linear combination with the coefficients derived using all the data for each method. The re-substitution method is usually overoptimistic for estimating the diagnostic/prognostic accuracy for future observations because, in machine learning terms, the same data serve as both the training set and the validation set [22]. We obtained the mean of the pAUC by averaging over the 1,000 simulations, and the standard deviation was the square root of the sample variance of the estimated pAUC from the 1,000 simulated datasets.

4.2. Multivariate Normal Distributions with Equal Variance-Covariance

We first compared the performance of the min-max approach on the pAUC with the other methods by generating dataset consisting of ratings from multivariate normal distributions (p=4) with different mean vectors and equal variance-covariance matrices (scenario #1). Exploiting the invariance property of the ROC curve to monotonically increasing transformation of the ratings, the distributions of ratings of nondiseased participants were set to be a multivariate normal distribution with mean and variance-covariance matrix

Under this scenario, ratings of diseased participants were generated from multivariate normal distributions with variance-covariance matrix equal to , and the mean vectors were selected to generate the AUC equal to 0.70, 0.73, 0.76, and 0.80 for markers # 1, 2, 3, and 4, respectively (Case #1), and the AUC equal to 0.6, 0.7, 0.8, and 0.9 for markers # 1, 2, 3, and 4, respectively (Case #2).
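One way to pick a mean shift that yields a target AUC for a single normally distributed marker is the binormal identity: the AUC equals the standard normal CDF evaluated at the mean difference divided by the square root of the sum of the two variances. The Python sketch below inverts that identity. The unit variances used as defaults are purely an illustrative assumption and are not taken from the simulation settings; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def mean_shift_for_auc(target_auc, sd_x=1.0, sd_y=1.0):
    """Mean difference (diseased minus nondiseased) giving the target binormal AUC."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2) * NormalDist().inv_cdf(target_auc)

def binormal_auc(shift, sd_x=1.0, sd_y=1.0):
    """AUC of a single normal marker with the given mean shift."""
    return NormalDist().cdf(shift / math.sqrt(sd_x ** 2 + sd_y ** 2))

# Mean shifts for the marker-wise AUCs of Case #1 and Case #2 (unit variances assumed).
case1_shifts = [mean_shift_for_auc(a) for a in (0.70, 0.73, 0.76, 0.80)]
case2_shifts = [mean_shift_for_auc(a) for a in (0.6, 0.7, 0.8, 0.9)]
```

Larger target AUCs map to larger shifts, and an AUC of 0.5 corresponds to no shift at all.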

4.3. Multivariate Normal Distributions with Unequal Variance-Covariance

We also considered multivariate normal distributions with different mean and unequal variance-covariance matrices for nondiseased and diseased participants (scenario #2). The mean settings are the same as Case 1 and Case 2 as discussed in scenario 1. The variance-covariance matrices were

4.4. Multivariate Log-Normal Distributions with Unequal Variance-Covariance

We investigated the performance of the different combination methods by generating dataset consisting of ratings from multivariate log-normal distributions (scenario #3). Ratings were first generated similarly to scenario #2 and then exponentiated to obtain the multivariate log-normal marker values.
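The effect of the exponentiation step can be illustrated with a small Python sketch. Because the exponential is strictly increasing, the rank-based AUC of each individual marker is unchanged, whereas the AUC of a linear combination (here the plain sum of two markers, chosen only for illustration) can change. The toy values below are made up for the demonstration and are not simulation output.

```python
import math

def auc(x_scores, y_scores):
    """Mann-Whitney AUC: share of (nondiseased, diseased) pairs ranked correctly."""
    wins = sum(1 for xs in x_scores for ys in y_scores if ys > xs)
    return wins / (len(x_scores) * len(y_scores))

def exponentiate(rows):
    """Turn normal-scale marker values into log-normal-scale values."""
    return [tuple(math.exp(v) for v in row) for row in rows]

# Made-up two-marker values on the normal scale.
x_normal = [(0.0, 1.0), (1.0, -1.0), (0.5, 0.5)]
y_normal = [(0.8, 0.9), (2.0, -2.0), (-0.5, 3.0)]
x_lognormal, y_lognormal = exponentiate(x_normal), exponentiate(y_normal)

# AUC of the first marker alone, before and after the transformation.
first_raw = auc([r[0] for r in x_normal], [r[0] for r in y_normal])
first_log = auc([r[0] for r in x_lognormal], [r[0] for r in y_lognormal])

# AUC of the sum of both markers, before and after the transformation.
sum_raw = auc([sum(r) for r in x_normal], [sum(r) for r in y_normal])
sum_log = auc([sum(r) for r in x_lognormal], [sum(r) for r in y_lognormal])
```

This is why skewed data can favor or penalize a given combination method even though each marker's own ROC curve is unaffected by the transformation.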

4.5. Multivariate Gamma Distributions

We further examined the performance of the different combination methods by generating gamma ROC curves with the AUC settings in Case 1 and Case 2 (scenario #4). The gamma family is one of the well-known families of ROC curves [9, 10, 23–26]. Due to the concavity and flexibility in the shape, Ma et al. [9] and Ma et al. [10] demonstrated that the families of gamma ROC curves provided practically reasonable straight-line shaped concave ROC curves, where the statistical inference based on pAUCs is preferable.

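The probability density function of the underlying rating model of the gamma ROC curve has the following form: $f(x; \kappa, \theta) = \frac{x^{\kappa - 1} e^{-x/\theta}}{\Gamma(\kappa)\, \theta^{\kappa}}$, $x > 0$, with shape parameter $\kappa$ and scale parameter $\theta$.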

When κ approaches 0, the gamma ROC curve approaches the shape of a straight-line and when the shape of the gamma ROC curve resembles an ROC curve with latent normality assumptions. When κ=1 the gamma ROC curve is equivalent to the power-law ROC curve [23, 27]. Here we are interested in the investigation of a scenario with straight-line shaped gamma ROC curves (κ=1/3), because this type of ROC curves cannot be generated by the previous scenarios.

Each simulated dataset consisted of ratings generated from multivariate gamma distributions with κ=1/3. Due to the invariance property of the ROC curves, without any loss of generality, we set θ=1 for latent ratings of nondiseased participants. We then selected θ for the latent diseased ratings to reflect the targeted area under the ROC curve in Case #1 and Case #2. The between-modality correlation of 0.5 was established using a Gaussian copula model [28]. All the programs were written by the first author in R version 2.15.3 and are available: https://duke.box.com/s/u32h7aayxd9bjo41b619xpb21sj1nm67.

4.6. Simulation Results

We compared the performance of the min-max method in estimating the pAUC with three established methods assuming the ratings are from multivariate normal distributions with equal variance-covariance matrices (Table 1). The SULIU and LOGISTIC almost always performed better than the min-max and LIU based on the pAUCs estimated from both the re-substitution and the LOPO cross-validation. In addition, the performances of SULIU and LOGISTIC approaches were similar when the AUCs were either close or further apart. The min-max approach produced slightly smaller pAUC estimates than that of SULIU and LOGISTIC when the AUCs among biomarkers were relatively close (i.e., Case #1), while this approach became worse when the AUCs were far apart (i.e., Case #2).

Table 1: Means (standard deviations) of the partial area under the ROC curve for different combination methods based on the dataset consisted of ratings from multivariate normal distributions with equal variance-covariance matrices (scenario#1) with 1000 simulated datasets.

Moreover, we examined the performance of the four methods, i.e., MIN-MAX, SULIU, LIU, and LOGISTIC, assuming ratings are from multivariate normal distributions with unequal variance-covariance matrices (Table 2). When the AUCs were close (Case #1), the min-max method was superior to the other methods in terms of its ability to produce the largest pAUCs based on both the re-substitution and the LOPO cross-validation. When the AUCs were far apart (i.e., Case #2), the SULIU and LOGISTIC methods had similar performances superior to the other two methods. The SULIU method was slightly better than the LOGISTIC based on the LOPO cross-validation since this takes into account the normality of data with unequal variance-covariance matrices. It should be noted that the difference in the estimates of the pAUCs between the re-substitution and the LOPO cross-validation was very small under this scenario.

Table 2: Means and standard deviation (SD) of the partial area under the ROC curve for different combination methods based on the dataset consisted of ratings from multivariate normal distributions with unequal variance-covariance matrices (scenario#2) with 1000 simulated datasets.

Furthermore, we studied the performance of the different combination methods assuming multivariate log-normal distributions. From Table 3, under this scenario where data are highly skewed, the min-max approach dominated the other approaches when the AUCs were close (Case #1). On the other hand, the LOGISTIC approach performed better when the AUCs are far apart. It is interesting to observe that the LIU method was suboptimal under both cases in terms of its ability to estimate the pAUC through the LOPO cross-validation whereas the SULIU method had the worst performance since the normality assumption was violated.

Table 3: Means and standard deviation (SD) of the partial area under the ROC curve for different combination methods based on the dataset consisted of ratings from multivariate log-normal distributions with unequal variance-covariance matrices (scenario#3) with 1,000 simulated datasets.

Lastly, we considered the performance of the different combination methods by generating gamma ROC curves. From Table 4 (scenario #4), where the data suggest a straight-line shaped ROC curve, when the AUCs were close (Case #1), the min-max approach performed better than the other three approaches in obtaining the largest pAUCs through both the re-substitution and the LOPO cross-validation. When the AUCs were far apart (Case #2), the min-max approach yielded the best pAUC estimates through LOPO cross-validation. The LOGISTIC approach was best based on the re-substitution.

Table 4: Means and standard deviation (SD) of the partial area under the ROC curve for different combination methods based on the dataset consisted of ratings from multivariate gamma distributions (scenario#4) with 1000 simulated datasets.

5. Example

5.1. Example 1

We used data from Cancer and Leukemia Group B study 90206, a Phase III clinical trial of metastatic renal-cell carcinoma [29, 30], to provide an example of our proposed method. The study randomized 732 patients, 369 to anti-VEGF treatment and 363 to a control group [29, 30]. The trial was designed with 588 deaths so that the log-rank statistic would have 86% power to detect a hazard ratio of 0.76 for deaths assuming a two-sided significance level of 0.05. The trial collected plasma from patients in order to study the relationship of angiogenic and inflammatory markers with clinical outcomes. A primary objective of the correlative science study was to associate the anti-VEGF biomarkers from the angioma assay with overall survival and build a prognostic model that predicts the clinical outcome [31, 32]. Another objective was to correlate the anti-VEGF biomarkers with the best objective response rate (defined as either partial or complete response). The angioma multiplex array has gone through a rigorous evaluation to ensure data quality [31, 32]. Markers performed include Ang-2, bFGF, BMP-9, CRP, Endoglin, Gro-a, HGF, ICAM-1, IGFBP-1, IGFBP-2, IGFBP-3, IL-6, IL-8, MCP-1, OPN, P-selectin, Pai-1-active, Pai-1-total, PDGF-AA, PDGF-BB, PEDF, PlGF, SDF-1, TGFβ1, TGFβ2, TGFβ3-R3, TSP-2, VCAM-1, VEGF, VEGF-C, VEGF-D, VEGF-R1, and VEGF-R2.

We used the random forest, LASSO, and adaptive LASSO to select the top three biomarkers of the 33 biomarkers for best objective response. The top three genes (HGF, IL_6, and VEGF_R2) with highest full AUC (0.576, 0.610, and 0.563) were chosen as an example to demonstrate the scenario where the AUCs were close to each other as a potential advantage of the use of the proposed method. The empirical estimates for the pAUC for these three biomarkers are 0.012, 0.012, and 0.028. The correlation matrices for nonresponders and responders are

The proposed method provided the following combination:with the estimated pAUC of 0.0427 and the estimated standard deviation of 0.0080 based on 1,000 bootstrap sampling.
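A bootstrap standard deviation of this kind can be sketched in Python as follows. The sketch resamples nondiseased and diseased subjects separately with replacement and recomputes a user-supplied statistic on each resample. Whether the combination coefficient was refitted within each bootstrap sample is not stated in the text, so that choice is left to the `statistic` function, and the function names and the seed are assumptions.

```python
import random
import statistics

def auc(x_scores, y_scores):
    """Mann-Whitney AUC, used here only as an example statistic."""
    wins = sum(1 for xs in x_scores for ys in y_scores if ys > xs)
    return wins / (len(x_scores) * len(y_scores))

def bootstrap_sd(x, y, statistic, n_boot=1000, seed=0):
    """Standard deviation of statistic(x*, y*) over stratified bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = []
    for _ in range(n_boot):
        x_star = [rng.choice(x) for _ in x]  # resample nondiseased subjects
        y_star = [rng.choice(y) for _ in y]  # resample diseased subjects
        values.append(statistic(x_star, y_star))
    return statistics.stdev(values)

# Overlapping toy scores give a positive bootstrap standard deviation.
sd_demo = bootstrap_sd([1, 2, 3, 4], [2.5, 3.5, 5], auc, n_boot=200)
```

Passing a function that refits the combination and then evaluates the pAUC would propagate the fitting uncertainty into the reported standard deviation.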

In contrast, the SULIU method provided the following combination:with the estimated pAUC of 0.0426 and the estimated standard deviation of 0.0084.

The LIU method provided the following combination:with the estimated pAUC of 0.0254 and the estimated standard deviation of 0.0099, whereas the LOGISTIC’s method had the following combination:with the estimated pAUC of 0.0422 and the estimated standard deviation of 0.0084.

5.2. Example 2

In this section, the proposed MIN-MAX method as well as the SULIU, LIU, and LOGISTIC methods are applied to a real dataset of 125 females from the Duchenne Muscular Dystrophy (DMD) study. These biomedical data, originally containing 209 observations (134 for “normals” and 75 for “carriers”), were studied by Cox et al. [33] in order to develop screening methods to identify carriers of a rare genetic disorder based on four measurements made on blood samples. This dataset has been widely studied in the literature for improving the classification accuracy by using ROC analysis. The main objective is to combine four markers to increase the diagnostic accuracy of screening females as potential DMD carriers. For example, Kang et al. [14] applied stepwise methods to combine the four markers in these data to improve the AUC; Hsu and Hsueh [18] and Yu and Park [19] applied their proposed algorithms to the pAUC in these data.

Since four different variables M1–M4 were measured in each blood sample, we processed the data by taking average values for each measurement if one had blood drawn at several different time points. Among the 125 females, there are 87 normals and 38 carriers.

Similarly, we investigated the performance of the four different methods on the pAUC over the range 0–0.2. Since the four measurements are in different scales, we applied the standardization method by dividing each value by the range of that variable before the use of MIN-MAX approach. denote the standardized marker values. The empirical estimates for the pAUC for these four biomarkers are 0.1472, 0.0436, 0.1086, and 0.1229 for the M1–M4, respectively. The empirical estimates for the full AUC are 0.9034, 0.6057, 0.8232, and 0.8814. The correlation matrices for nonrespondents and respondents are

The proposed method provided the following combination (Table 5):with the estimated pAUC of 0.161 and the estimated standard deviation of 0.0119 based on 1,000 bootstrap sampling.
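The range standardization applied before the MIN-MAX step in this example can be sketched in Python as below. Each marker is divided by its range, so that markers measured on different scales become comparable before the per-subject maximum and minimum are taken. Computing the range over the pooled normals and carriers is an assumption of this sketch, and the function name is hypothetical.

```python
def scale_by_range(rows):
    """Divide every marker (column) by its range, max minus min, over all subjects.

    A marker that is constant across subjects has zero range and would raise
    ZeroDivisionError in this simple sketch.
    """
    columns = list(zip(*rows))
    ranges = [max(c) - min(c) for c in columns]
    return [[v / r for v, r in zip(row, ranges)] for row in rows]

# Toy data: two markers on very different scales for three subjects.
scaled_demo = scale_by_range([[0, 10], [2, 30], [4, 20]])
```

Without a step of this kind, the per-subject maximum would almost always come from the marker with the largest numerical scale.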

Table 5: The coefficients of the optimal linear combination and the corresponding estimated pAUC.

In contrast, the SULIU method provided the following combination (Table 5):with the estimated pAUC of 0.137 and the estimated standard deviation of 0.0157.

The LIU method provided the following combination (Table 5):with the estimated pAUC of 0.151 and the estimated standard deviation of 0.0135, whereas the LOGISTIC’s method had the following combination (Table 5):with the estimated pAUC of 0.156 and the estimated standard deviation of 0.0138.

Figure 1 presents the performance for each method.

Figure 1

6. Discussion

In this article, we extend the min-max method to the estimation of the pAUC and compare its performances to three commonly utilized methods. The proposed method has the advantage of both the min-max method and Pepe and Thompson’s method [16]. The expected advantages of this approach are threefold. First, it may yield larger partial area under the ROC curves. Second, it is a nonparametric approach and therefore it is robust against distributional assumptions. Lastly, it is computationally feasible and efficient since the min-max procedure involves searching for only one single coefficient. Our works [9, 10] have shown that the use of pAUC not only is clinically useful but also is statistically more efficient than the use of the full AUC in the families of area under the ROC curves that are nearly straight-line shaped. Another advantage of this method demonstrated through our simulation study is that in the scenario of straight-line shaped gamma ROC curves the estimate of pAUC based on re-substitution is close to the estimate based on the LOPO cross-validation. This implies that the min-max method on pAUC leads to good generalizability.

As pointed out by several authors [14, 22, 34], the use of re-substitution to estimate the area under the ROC curve usually leads to overoptimistic results, i.e., upward biased estimates for an independent dataset or for future observations. Huang et al. [22] proposed to use the LOPO cross-validation to obtain less biased estimates. Kang et al. [14] applied the LOPO cross-validation to compare different combination methods to maximize the AUC. Because the estimates obtained through cross-validation are more reliable in terms of their ability to generalize to an independent dataset, we recommend relying on the cross-validation results when decisions based on the re-substitution and the cross-validation approaches differ. Based on our simulation results, it is not surprising to observe that the standard deviation of the estimated pAUC decreased as the sample size increased and that the estimate of the pAUC based on the re-substitution approach became closer to the estimate of the pAUC based on the LOPO cross-validation as the sample size increased.

Evaluation of diagnostic assays and prognostic performance of biomarkers will continue to remain an important research topic in several medical areas. This is especially true in oncology where diagnostic assays based on several combinations of biomarkers are developed and validated. For example, a 22-gene model was developed and validated to predict prostate cancer risk [35]. In addition, identifying predictive markers of clinical outcomes is a hot area of research as finding the optimal treatment to tailor patients is attractive not only to patients but also to physicians, insurance company, and society as a whole. Currently, several predictors or signatures of outcomes are being used to guide therapies in clinical trials [35]. For example OncotypeDx, a 21-gene expression signature, is being used to select treatment in patients with breast cancer based on the recurrence score [36]. Recognizing the fact that more predictors will continue to be applied in the clinic, it is critical that when a combination of biomarkers is developed this would result in the highest pAUC.

Based on our extensive simulations, our recommendations are the following:

(1) Use the SULIU or LOGISTIC approach to estimate the pAUC with approximately equal-variance multivariate normal data, regardless of whether the AUCs among biomarkers are relatively close or far apart. LIU’s approach underestimated the pAUC by approximately 1/3. This is partly due to the instability of the eigenvectors of the identity matrix: LIU’s approach requires the eigenvector corresponding to the smallest eigenvalue of a matrix that reduces to the identity matrix under this scenario, and that eigenvector is not stable under small perturbations of the identity matrix [15].

(2) Use the min-max approach to estimate the pAUC with unequal-variance multivariate normal data when the AUCs are relatively close, and use the SULIU approach when the AUCs are far apart.

(3) Employ the min-max approach to estimate the pAUC with highly skewed data when the AUCs are relatively close, but use the LOGISTIC method when the AUCs are far apart.

(4) Use the min-max approach to estimate the pAUC with straight-line shaped ROC curves, regardless of whether the AUCs are close or far apart.

In summary, the min-max approach seems to be robust and investigators are encouraged to use it in the estimation of the pAUC. It is simple to implement and is computationally feasible. In an era of personalized medicine, it is anticipated that the evaluation of diagnostic assays and the performance of the combination of biomarkers will remain an important area of research not only in diagnosing patients but also in treating patients with the disease.
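To make the implementation concrete, the following is a minimal sketch under stated assumptions, not the code used for the simulations. It assumes the min-max score takes the form max + alpha * min over each participant's biomarkers, as in the min-max combination of [20], that alpha is chosen by grid search over an illustrative range of -1 to 1, and that the pAUC over false positive fractions up to `max_fpf` is estimated empirically from the nondiseased participants with the highest scores. The function names, the grid, and the simulated data are all hypothetical.

```python
import random


def empirical_pauc(controls, cases, max_fpf):
    """Empirical pAUC over FPF in (0, max_fpf]; ties count one half.

    Only the nondiseased scores in the top max_fpf fraction contribute,
    so the value lies between 0 and (approximately) max_fpf.
    """
    n0, n1 = len(controls), len(cases)
    k = int(max_fpf * n0 + 1e-9)  # controls allowed to be false positives
    top_controls = sorted(controls, reverse=True)[:k]
    total = 0.0
    for c in top_controls:
        for d in cases:
            total += 1.0 if d > c else 0.5 if d == c else 0.0
    return total / (n0 * n1)


def min_max_scores(markers, alpha):
    """Assumed min-max combination: max + alpha * min of each row."""
    return [max(row) + alpha * min(row) for row in markers]


def fit_min_max(controls, cases, max_fpf, grid):
    """Grid search for the alpha giving the largest re-substitution pAUC."""
    best_alpha, best_pauc = None, -1.0
    for alpha in grid:
        p = empirical_pauc(min_max_scores(controls, alpha),
                           min_max_scores(cases, alpha), max_fpf)
        if p > best_pauc:
            best_alpha, best_pauc = alpha, p
    return best_alpha, best_pauc


# Illustrative data: three markers, unequal variances (assumed values).
random.seed(0)
controls = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
cases = [[random.gauss(0.8, 1.5) for _ in range(3)] for _ in range(60)]
grid = [i / 10 for i in range(-10, 11)]

alpha, pauc = fit_min_max(controls, cases, max_fpf=0.2, grid=grid)
print("selected alpha:", alpha, "re-substitution pAUC:", pauc)
```

Because only one coefficient is searched regardless of the number of biomarkers, the cost grows with the grid size rather than with the marker dimension. The value printed here is a re-substitution estimate and would be expected to be optimistic for the reasons discussed above.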

Data Availability

The data from the simulation are available from the first author. The data from CALGB 90206 can be accessed through the Alliance for Clinical Trials in Oncology.

Disclosure

The content of this article was presented at the 2016 Eastern North American Region Annual Meeting in Austin, TX.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded in part by the NIH R01CA155296, U01CA157703, the Prostate Cancer Foundation Challenge Award, and the United States Army Medical Research (Awards W81XWH-15-1-0467 and W81XWH-18-1-0278). Research of A. Liu was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development Intramural Research Program.

References

  1. J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, no. 1, pp. 29–36, 1982.
  2. X.-H. Zhou, N. A. Obuchowski, and D. K. McClish, Statistical Methods in Diagnostic Medicine, Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], New York, NY, USA, 2002.
  3. M. S. Pepe, The Statistical Evaluation of Medical Tests for Classification and Prediction, vol. 28 of Oxford Statistical Science Series, Oxford University Press, Oxford, UK, 2003.
  4. D. K. McClish, “Analyzing a Portion of the ROC Curve,” Medical Decision Making, vol. 9, no. 3, pp. 190–195, 1989.
  5. Y. Jiang, C. E. Metz, and R. M. Nishikawa, “A receiver operating characteristic partial area index for highly sensitive diagnostic tests,” Radiology, vol. 201, no. 3, pp. 745–750, 1996.
  6. L. E. Dodd and M. S. Pepe, “Partial AUC estimation and regression,” Biometrics: Journal of the International Biometric Society, vol. 59, no. 3, pp. 614–623, 2003.
  7. Y. He and M. Escobar, “Nonparametric statistical inference method for partial areas under receiver operating characteristic curves, with application to genomic studies,” Statistics in Medicine, vol. 27, no. 25, pp. 5291–5308, 2008.
  8. D. D. Zhang, X.-H. Zhou, D. H. Freeman Jr., and J. L. Freeman, “A non-parametric method for the comparison of partial areas under ROC curves and its application to large health care data sets,” Statistics in Medicine, vol. 21, no. 5, pp. 701–715, 2002.
  9. H. Ma, A. I. Bandos, H. E. Rockette, and D. Gur, “On use of partial area under the ROC curve for evaluation of diagnostic performance,” Statistics in Medicine, vol. 32, no. 20, pp. 3449–3458, 2013.
  10. H. Ma, A. I. Bandos, and D. Gur, “On the use of partial area under the ROC curve for comparison of two diagnostic tests,” Biometrical Journal, vol. 57, no. 2, pp. 304–310, 2015.
  11. J. Q. Su and J. S. Liu, “Linear combinations of multiple diagnostic markers,” Journal of the American Statistical Association, vol. 88, no. 424, pp. 1350–1355, 1993.
  12. A. Liu, E. F. Schisterman, and Y. Zhu, “On linear combinations of biomarkers to improve diagnostic accuracy,” Statistics in Medicine, vol. 24, no. 1, pp. 37–47, 2005.
  13. H. Jin and Y. Lu, “The optimal linear combination of multiple predictors under the generalized linear models,” Statistics & Probability Letters, vol. 79, no. 22, pp. 2321–2327, 2009.
  14. L. Kang, A. Liu, and L. Tian, “Linear combination methods to improve diagnostic/prognostic accuracy on future observations,” Statistical Methods in Medical Research, vol. 25, no. 4, pp. 1359–1380, 2013.
  15. R. Allez and J. Bouchaud, “Eigenvector dynamics: General theory and some applications,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 86, no. 4, Article ID 046202, 2012.
  16. M. S. Pepe and M. L. Thompson, “Combining diagnostic test results to increase accuracy,” Biostatistics, vol. 1, no. 2, pp. 123–140, 2000.
  17. M. S. Pepe, T. Cai, and G. Longton, “Combining predictors for classification using the area under the receiver operating characteristic curve,” Biometrics: Journal of the International Biometric Society, vol. 62, no. 1, pp. 221–229, 319, 2006.
  18. M.-J. Hsu and H.-M. Hsueh, “The linear combinations of biomarkers which maximize the partial area under the ROC curves,” Computational Statistics, vol. 28, no. 2, pp. 647–666, 2013.
  19. W. Yu and T. Park, “Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve,” Computational Statistics & Data Analysis, vol. 88, pp. 15–27, 2015.
  20. C. Liu, A. Liu, and S. Halabi, “A min-max combination of biomarkers to improve diagnostic accuracy,” Statistics in Medicine, vol. 30, no. 16, pp. 2005–2014, 2011.
  21. P. J. Heagerty and M. S. Pepe, “Semiparametric estimation of regression quantiles with application to standardizing weight for height and age in US children,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 48, no. 4, pp. 533–551, 1999.
  22. X. Huang, G. Qin, and Y. Fang, “Optimal combinations of diagnostic tests based on AUC,” Biometrics: Journal of the International Biometric Society, vol. 67, no. 2, pp. 568–576, 2011.
  23. J. P. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York, NY, USA, 1975.
  24. D. D. Dorfman, K. S. Berbaum, C. E. Metz, R. V. Lenth, J. A. Hanley, and H. A. Dagga, “Proper receiver operating characteristic analysis: the bigamma model,” Academic Radiology, vol. 4, no. 2, pp. 138–149, 1997.
  25. D. Faraggi, B. Reiser, and E. F. Schisterman, “ROC curve analysis for biomarkers based on pooled assessments,” Statistics in Medicine, vol. 22, no. 15, pp. 2515–2527, 2003.
  26. Y. Huang and M. S. Pepe, “A parametric ROC model-based approach for evaluating the predictiveness of continuous markers in case-control studies,” Biometrics: Journal of the International Biometric Society, vol. 65, no. 4, pp. 1133–1144, 2009.
  27. J. A. Hanley, “Receiver operating characteristic (ROC) methodology: the state of the art,” Critical Reviews in Computed Tomography, vol. 29, no. 3, pp. 307–35, 1989.
  28. R. B. Nelsen, An Introduction to Copulas, vol. 139 of Lecture Notes in Statistics, Springer, Berlin, Germany, 1999.
  29. B. I. Rini, S. Halabi, J. E. Rosenberg et al., “Bevacizumab plus interferon alfa compared with interferon alfa monotherapy in patients with metastatic renal cell carcinoma: CALGB 90206,” Journal of Clinical Oncology, vol. 26, no. 33, pp. 5422–5428, 2008.
  30. B. I. Rini, S. Halabi, J. E. Rosenberg et al., “Phase III trial of bevacizumab plus interferon alfa versus interferon alfa monotherapy in patients with metastatic renal cell carcinoma: final results of CALGB 90206,” Journal of Clinical Oncology, vol. 28, no. 13, pp. 2137–2143, 2010.
  31. A. B. Nixon, S. Halabi, I. Shterev et al., “Identification of predictive biomarkers of overall survival (OS) in patients (pts) with advanced renal cell carcinoma (RCC) treated with interferon alpha (I) +/- bevacizumab (B): Results from CALGB 90206 (Alliance),” Journal of Clinical Oncology, vol. 31, article no. 4520, 2013.
  32. A. B. Nixon, H. Pang, M. D. Starr et al., “Prognostic and predictive blood-based biomarkers in patients with advanced pancreatic cancer: Results from CALGB80303 (alliance),” Clinical Cancer Research, vol. 19, no. 24, pp. 6957–6966, 2013.
  33. L. H. Cox, M. M. Johnson, and K. Kafadar, “Exposition of statistical graphics technology,” in Proceedings of the ASA Proceedings of the Statistical Computation Section, pp. 55-56, 1982.
  34. J. B. Copas and P. Corbett, “Overestimation of the receiver operating characteristic curve for logistic regression,” Biometrika, vol. 89, no. 2, pp. 315–331, 2002.
  35. N. Erho, A. Crisan, I. A. Vergara et al., “Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy,” PLoS ONE, vol. 8, no. 6, Article ID e66855, 2013.
  36. J. A. Sparano, R. J. Gray, D. F. Makower et al., “Prospective validation of a 21-gene expression assay in breast cancer,” The New England Journal of Medicine, vol. 373, no. 21, pp. 2005–2014, 2015.