Confidence Interval Estimation of an ROC Curve: An Application of Generalized Half Normal and Weibull Distributions
In the recent past, ROC analysis has gained attention for explaining the accuracy of a test and identifying the optimal threshold. ROC models that assume a distribution for each of the two populations are referred to as bidistributional ROC models, for example, Binormal, Bi-Exponential, and Bi-Logistic. However, in practical situations we come across data which are skewed in nature with extended tails. To address this issue, the accuracy of a test has to be explained by involving both scale and shape parameters. Hence, the present paper proposes an ROC model based on two generalized distributions, which helps in explaining the accuracy of a test. Further, confidence intervals are constructed for the proposed curve, that is, for the coordinates of the curve (FPR, TPR) and for the accuracy measure, Area Under the Curve (AUC); these help in explaining the variability of the curve and provide the sensitivity at a particular value of specificity and vice versa. The proposed methodology is supported by a real data set and simulation studies.
In classification analysis, the Receiver Operating Characteristic (ROC) curve is a widely used tool to evaluate the performance of a test. Further, intrinsic measures such as sensitivity, specificity, and accuracy are essential to describe a diagnostic test's ability to classify an individual into one of the two groups/populations. Sensitivity provides an estimate of how good the test is at predicting a disease. Specificity estimates how likely it is that patients without the disease are correctly identified. The ROC curve is a plot of sensitivity against 1 − specificity. That is, the points of the curve are obtained by moving the classification threshold from the most positive classification value to the most negative. For a random classification, the ROC curve is a straight line connecting the origin to the top right corner of the graph. Therefore, the criterion widely used to measure the accuracy of a test in the ROC context is the area under the ROC curve (AUC).
In classification, the main aim is to discriminate between normal and abnormal populations with better accuracy. Many ROC models in the literature are based on bidistributional assumptions, such as binormal (Egan), bilogistic and bilognormal (Dorfman and Alf Jr. [2, 3]), bibeta and biexponential (Zou et al.; Tang et al.; Tang and Balakrishnan), and bigamma (Hussain), among others. If the test scores of the normal and abnormal populations follow different distributions, then these ROC forms will not produce reliable outputs. For instance, consider a marker, namely, APACHE (Acute Physiology and Chronic Health Evaluation) II, used to predict the mortality status of patients admitted to the ICU. The APACHE scores of surviving and deceased patients are not normally distributed and are skewed. Here, the conventional binormal ROC model will fail to produce reliable outputs in terms of AUC, threshold, sensitivity, and specificity; the scores may instead follow any of several skewed distributions. Hence, the paper concentrates on situations in which the distributions of the two populations are different and the data are skewed. We propose an ROC model that takes into account the Generalized Half Normal (normal population) and Weibull (abnormal population) distributions with shape and scale parameters. In medical, engineering, and life studies, data tend to have extended tails; in this situation, the conventional binormal ROC curve fails to explain the hidden accuracy of the test considered. Recently, Balaswamy et al. addressed this issue and developed a Hybrid ROC (HROC) curve based on the Half Normal and Exponential distributions. However, that model uses only scale parameters to illustrate the accuracy, whereas other statistical measures, such as shape parameters, carry information about the tail behavior of the data.
In this paper, an extended version of the HROC curve is proposed by considering the Generalized Half Normal and Weibull distributions with both scale and shape parameters corresponding to normal as well as abnormal populations. A bootstrap study is used to construct the 95% confidence intervals and other measures of the proposed ROC curve. Further, the proposed methodology is demonstrated using simulation studies as well as a real data set.
The present paper is organized as follows. The ROC curve is developed based on the Generalized Half Normal (GHN) and Weibull distributions, using the scale and shape parameters of both, and the accuracy measure of the resulting GHROC curve, the Area Under the Curve, is derived. Further, the confidence intervals for the AUC and the proposed ROC curve are estimated through the bootstrap method. Finally, the results obtained using the proposed methodology are illustrated in Results and Discussion.
Let the test scores be observed in the normal and abnormal populations. Here, it is assumed that the normal and abnormal populations follow the Generalized Half Normal (GHN) and Weibull distributions, respectively, each with its own shape and scale parameters. The probability density and cumulative distribution functions of the GHN (Cooray and Ananda) and Weibull distributions are used in their standard forms; the GHN cumulative distribution function is written in terms of Φ, the c.d.f. of the standard normal distribution.
In classification, the ROC curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold varies (Green and Swets). The curve is obtained by plotting the true positive rate (TPR) against the false positive rate (FPR).
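For reference, the standard forms of the two distributions can be written as below. The symbols are assumptions made here for illustration, since the notation of the original displays is not available: $\alpha$ and $\theta$ denote the shape and scale of the GHN distribution (the parameterization of Cooray and Ananda), and $\beta$ and $\sigma$ denote the shape and scale of the Weibull distribution.

```latex
\begin{align*}
% GHN (normal population), x > 0; notation assumed for illustration
f_X(x) &= \sqrt{\frac{2}{\pi}}\,\frac{\alpha}{x}\left(\frac{x}{\theta}\right)^{\alpha}
          \exp\!\left[-\frac{1}{2}\left(\frac{x}{\theta}\right)^{2\alpha}\right],
&
F_X(x) &= 2\,\Phi\!\left[\left(\frac{x}{\theta}\right)^{\alpha}\right] - 1, \\
% Weibull (abnormal population), y > 0
f_Y(y) &= \frac{\beta}{\sigma}\left(\frac{y}{\sigma}\right)^{\beta-1}
          \exp\!\left[-\left(\frac{y}{\sigma}\right)^{\beta}\right],
&
F_Y(y) &= 1 - \exp\!\left[-\left(\frac{y}{\sigma}\right)^{\beta}\right], \\
% standard normal c.d.f.
\Phi(z) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-u^{2}/2}\,du .
\end{align*}
```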
The expression for FPR is derived from its probabilistic definition under the GHN distribution. On further simplification, the threshold can be expressed in terms of FPR through Φ^{-1}, the inverse cumulative distribution function of the standard normal distribution.
Similarly, the expression for TPR is derived from its probabilistic definition under the Weibull distribution. On substituting (4) into (6), TPR can be written as a function of FPR; the resulting expression (7) is the ROC curve based on the Generalized Half Normal and Weibull distributions. Expression (7) can be referred to as the Generalized Hybrid ROC (GHROC) curve, since the ROC curve is developed from two generalized distributions.
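The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes that a subject is classified as abnormal when the score exceeds the threshold t, so that FPR(t) = 2[1 − Φ((t/θ)^α)] and TPR(t) = exp(−(t/σ)^β). The parameter names (`alpha`, `theta` for the GHN shape and scale; `beta`, `sigma` for the Weibull shape and scale) and all function names are hypothetical.

```python
from math import exp
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def fpr_at(t, alpha, theta):
    """FPR(t) = P(X > t) for X ~ GHN(shape alpha, scale theta)."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf((t / theta) ** alpha))

def tpr_at(t, beta, sigma):
    """TPR(t) = P(Y > t) for Y ~ Weibull(shape beta, scale sigma)."""
    return exp(-((t / sigma) ** beta))

def threshold_for_fpr(fpr, alpha, theta):
    """Invert FPR(t): t = theta * [Phi^{-1}(1 - FPR/2)]^(1/alpha), valid for 0 < fpr <= 1."""
    z = max(_STD_NORMAL.inv_cdf(1.0 - fpr / 2.0), 0.0)  # guard against tiny negative round-off
    return theta * z ** (1.0 / alpha)

def ghroc(fpr, alpha, theta, beta, sigma):
    """TPR written as a function of FPR by substituting the threshold into TPR(t)."""
    return tpr_at(threshold_for_fpr(fpr, alpha, theta), beta, sigma)
```

Evaluating `ghroc` on a grid of FPR values in (0, 1] traces the curve for a given parameter combination.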
In ROC methodology, the statistical measure which helps in explaining the overlapping area and the accuracy of a classifier is the Area Under the Curve (AUC). It can be interpreted as the probability that a subject randomly selected from the group with the condition will have a score indicating greater likelihood of the condition than that of a randomly selected subject from the group without the condition (Bamber). The AUC can take values between 0 and 1, with a practical lower bound of 0.5 (chance diagonal). The expression for the accuracy measure AUC is obtained by integrating the ROC expression (7) over the range [0, 1] with respect to the false positive rate. The resulting expression has no closed form; hence it has to be evaluated using numerical integration. In the next subsection, the variance and confidence intervals for AUC are estimated through the bootstrap method.
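One simple way to carry out the numerical integration is a midpoint rule over the FPR axis, sketched below. The choice of integration rule, the function name `ghroc_auc`, and the parameter names (`alpha`, `theta` for the GHN shape and scale; `beta`, `sigma` for the Weibull shape and scale) are assumptions made here for illustration. The paper does not state which quadrature method was used.

```python
from math import exp
from statistics import NormalDist

def ghroc_auc(alpha, theta, beta, sigma, n=20000):
    """Midpoint-rule approximation of the area under the GHROC curve over FPR in (0, 1)."""
    phi_inv = NormalDist().inv_cdf
    total = 0.0
    for i in range(n):
        fpr = (i + 0.5) / n                     # midpoints avoid FPR = 0, where Phi^{-1} diverges
        z = max(phi_inv(1.0 - fpr / 2.0), 0.0)  # guard against tiny negative round-off
        t = theta * z ** (1.0 / alpha)          # threshold that yields this FPR under the GHN model
        total += exp(-((t / sigma) ** beta))    # Weibull survival function = TPR at that threshold
    return total / n
```

With all four parameters set to 1 the function returns approximately 0.523, in line with the near-chance AUC reported for the fully overlapping case in the simulation study.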
2.1. Confidence Intervals for AUC
The confidence interval for AUC is defined as the AUC estimate plus or minus the standard normal percentile times the square root of the estimated variance of the AUC estimate, where the variance is obtained using bootstrapping. Bootstrap samples are drawn from the data, with the respective sample sizes of the normal and abnormal populations. The bootstrapped AUC estimate is then the mean of the individual bootstrap estimates of AUC, and its variance is the variance of those bootstrap estimates about their mean. The next subsection deals with the construction of confidence intervals for the proposed ROC curve to explain the variability of the curve at every threshold value.
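The bootstrap procedure can be sketched as follows. To keep the example self-contained, a rank-based (empirical) AUC is swapped in as the estimator; in the paper the AUC would instead be computed from the fitted GHN and Weibull parameters, and any such function can be passed through the `estimator` argument. The function names, the default number of replicates, and the use of the sample variance (divisor B − 1) are assumptions.

```python
import random
from statistics import NormalDist, mean, variance

def empirical_auc(normal, abnormal):
    """Stand-in estimator: share of (normal, abnormal) pairs ranked correctly, ties counted as half."""
    wins = sum((y > x) + 0.5 * (y == x) for x in normal for y in abnormal)
    return wins / (len(normal) * len(abnormal))

def bootstrap_auc_ci(normal, abnormal, estimator=empirical_auc, B=500, level=0.95, seed=0):
    """Return the bootstrapped AUC estimate and its normal-approximation confidence interval."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        xs = rng.choices(normal, k=len(normal))      # resample each group with replacement,
        ys = rng.choices(abnormal, k=len(abnormal))  # keeping the original group sizes
        reps.append(estimator(xs, ys))
    auc_b = mean(reps)                               # bootstrapped AUC estimate
    se = variance(reps) ** 0.5                       # square root of the bootstrap variance
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # standard normal percentile
    return auc_b, (auc_b - z * se, auc_b + z * se)
```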
2.2. Confidence Intervals for GHROC Curve
The confidence intervals for the GHROC curve are estimated using the delta method. The confidence interval for the ROC curve represents the range at each value of the false positive rate and its corresponding true positive rate. The confidence intervals for FPR and TPR take the normal-approximation form: the estimated FPR (or TPR) plus or minus the standard normal percentile times the square root of its estimated variance, where the variances follow from the delta method applied to the estimated parameters of the two distributions (for the complete proof, refer to the appendix). These confidence limits show the variability of the proposed ROC curve at every point on the curve.
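A first-order (delta method) form of these variances and intervals is sketched below. The notation is assumed for illustration: $\alpha, \theta$ are the GHN shape and scale, $\beta, \sigma$ are the Weibull shape and scale, and $z_{1-\gamma/2}$ is the standard normal percentile for confidence level $1-\gamma$. Covariance terms between parameter estimates are neglected here, which is also an assumption.

```latex
\begin{align*}
\widehat{\operatorname{Var}}\bigl(\widehat{\mathrm{FPR}}\bigr) &\approx
  \left(\frac{\partial\,\mathrm{FPR}}{\partial\alpha}\right)^{2}\widehat{\operatorname{Var}}(\hat{\alpha})
  + \left(\frac{\partial\,\mathrm{FPR}}{\partial\theta}\right)^{2}\widehat{\operatorname{Var}}(\hat{\theta}), \\
\widehat{\operatorname{Var}}\bigl(\widehat{\mathrm{TPR}}\bigr) &\approx
  \left(\frac{\partial\,\mathrm{TPR}}{\partial\beta}\right)^{2}\widehat{\operatorname{Var}}(\hat{\beta})
  + \left(\frac{\partial\,\mathrm{TPR}}{\partial\sigma}\right)^{2}\widehat{\operatorname{Var}}(\hat{\sigma}), \\
\widehat{\mathrm{FPR}} &\pm z_{1-\gamma/2}\sqrt{\widehat{\operatorname{Var}}\bigl(\widehat{\mathrm{FPR}}\bigr)},
\qquad
\widehat{\mathrm{TPR}} \pm z_{1-\gamma/2}\sqrt{\widehat{\operatorname{Var}}\bigl(\widehat{\mathrm{TPR}}\bigr)} .
\end{align*}
```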
In the next section, results from simulation studies and a real data set are presented to explain the proposed methodology. Further, the confidence intervals are evaluated for the summary measure AUC and the intrinsic measures FPR and TPR.
3. Results and Discussion
The proposed methodology is demonstrated using simulation studies and a real data set (SAPS III).
3.1. Simulation Studies
Simulation studies are conducted with different combinations of the scale and shape parameters of both normal and abnormal populations, and all simulations are run at various sample sizes with bootstrap resampling. At every parameter combination and sample size, the AUC and its confidence intervals are obtained. The main purpose of the simulations is to show how the AUC of the GHROC curve changes as the scale and shape parameters of the normal and abnormal distributions change. The variations in the parameter values of both populations are used to explain the overlapping area in terms of AUC; this means that the higher the AUC, the smaller the overlapping area and vice versa. Further, to demonstrate the behavior of AUC, the simulation work is carried out as three different experiments. In the first experiment, the shape parameter of the abnormal population is varied while the other parameters are held constant; in the second experiment, the scale parameter of the abnormal population is varied while the other parameters are held constant; and, in the third experiment, the shape parameters of both populations are set equal while the scale of the abnormal population is varied. The results obtained from these experiments are reported in Table 1.
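A simulation of this kind needs random draws from both distributions. The sketch below generates GHN variates as θ|Z|^(1/α) with Z standard normal (which has c.d.f. 2Φ((x/θ)^α) − 1) and Weibull variates by inverting the c.d.f. The parameter values in the loop are hypothetical and are not the combinations used in the paper. The rank-based `empirical_auc` is a simple stand-in for the model-based AUC.

```python
import random
from math import log

def sample_ghn(n, alpha, theta, rng):
    """Draw n GHN(shape alpha, scale theta) variates as theta * |Z|^(1/alpha)."""
    return [theta * abs(rng.gauss(0.0, 1.0)) ** (1.0 / alpha) for _ in range(n)]

def sample_weibull(n, beta, sigma, rng):
    """Draw n Weibull(shape beta, scale sigma) variates by inverse-c.d.f. sampling."""
    return [sigma * (-log(1.0 - rng.random())) ** (1.0 / beta) for _ in range(n)]

def empirical_auc(normal, abnormal):
    """Share of (normal, abnormal) pairs in which the abnormal score is larger (ties count half)."""
    wins = sum((y > x) + 0.5 * (y == x) for x in normal for y in abnormal)
    return wins / (len(normal) * len(abnormal))

rng = random.Random(2024)
# Hypothetical settings: vary the abnormal shape while the other parameters stay fixed.
for beta in (1.0, 3.0, 5.0):
    x = sample_ghn(200, alpha=1.0, theta=1.0, rng=rng)
    y = sample_weibull(200, beta=beta, sigma=2.0, rng=rng)
    print(f"beta={beta}: empirical AUC = {empirical_auc(x, y):.4f}")
```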
In the first experiment, at the initial value of the abnormal shape parameter with the remaining parameters held fixed, the AUC is observed to be around 0.6791 (67.91% accuracy). As the shape parameter takes higher values such as 3 and 5, the AUC improves, indicating a high level of accuracy. This reflects the scenario that, as the discrepancy between the shape parameters of the normal and abnormal populations increases, the AUC attains a larger value, indicating a better extent of correct classification with a minimum percentage of overlapping area. If a real data set had these parameter values, the corresponding test would provide better accuracy. Along with the shape, the scale parameter also influences the AUC. In the second experiment, the scale parameter of the abnormal population is varied while all the other parameters are kept constant. Moderate levels of discrepancy in the shape and scale parameters influence the accuracy of the classification. As this scale parameter attains a larger value, the AUC of the GHROC curve tends to take better values. This reveals that, along with the discrepancy in the shape parameters of the two populations, the scale parameter also helps explain the variability in the data and hence the actual performance of the test considered. The accuracy of the test also needs to be examined when the shape parameters are equal and the scale parameters vary. This is addressed by the third experiment. Here, the first part takes the scale and shape parameters of both populations to be equal and, in the second part, the scale parameters are varied while the shape parameters are kept equal. The first part reveals that, when all parameters are equal to one, the two populations overlap, giving an AUC close to 0.5.
The results of the second part show that, even though the shape parameters are equal, a discrepancy in the scale parameter of the abnormal population reveals the hidden accuracy, and the larger the discrepancy between the scale values of the two populations, the better the accuracy of the test can be explained. Thus, from the three experiments it is noticed that the shape parameter has a greater influence on the accuracy of a test than the scale parameter alone. However, the scale parameter also plays a role in explaining the accuracy and should not be neglected.
To visualize the proposed methodology, ROC curves are drawn for the three experiments (Figure 1). From Figure 1(a), it can be seen that the curve moves towards the top left corner of the plot, with increasing accuracy, as the shape parameter of the abnormal population takes larger values. Further, Figure 1(b) shows the effect of the scale in the abnormal population: the curve moves away from the chance line, with high accuracy, as the scale attains larger values. Figure 1(c) illustrates the effect of the scale parameter in the presence of equal shape parameters, and it is observed that the shape of the ROC curve is affected as the scale changes. From Figure 1, it is reasonable to say that the proposed ROC curve depends on both the shape and scale parameters of the normal and abnormal populations.
Figure 1: ROC curves for the three experiments. (a) Effect of shape parameter in abnormal population (first experiment). (b) Effect of scale parameter in abnormal population (second experiment). (c) Effect of scale parameter in abnormal population with constant shape parameter (third experiment).
Apart from explaining the importance and the influence of the scale and shape parameters in the GHROC context, it is essential to construct confidence intervals for the measures of the GHROC curve, in order to illustrate how the estimates of the proposed ROC curve vary. In the statistical literature, interval estimation has gained importance over point estimation because it conveys the uncertainty attached to an estimate. Hence, it is important to locate the estimate within its range of uncertainty at each sample size. The confidence intervals are constructed for all the parameter combinations defined in the three experiments.
With respect to confidence intervals, two aspects need to be addressed: the impact of sample size on the width of the confidence intervals, and the graphical visualization of the estimates of the GHROC curve along with their confidence intervals. From the results, it is evident that the sample size affects the width of the confidence interval: the point estimate is essentially unaffected by the sample size, whereas its confidence interval narrows as the sample size grows. The simulation studies indicate that, irrespective of the sample size and the width of the confidence interval, the estimate of the ROC curve lies within the confidence limits. Although this is a generally observed phenomenon, the point to note is that the variability of the estimates diminishes as the sample size increases, giving rise to a shorter confidence interval.
Figure 2 shows the variability of the GHROC curve at every point on the curve. The lower and upper confidence limits of the proposed ROC curve are plotted at a particular sample size (Figure 2), and these curves give the range of the false positive rate and true positive rate at every threshold. The optimal threshold is also depicted in Figure 2, along with the (FPR, TPR) pair obtained at that threshold. This optimal threshold is used to classify subjects with better accuracy and can serve as a reference value for future classification. For example, at one of the parameter combinations considered, the optimal threshold is found to be 2.0592 with a true positive rate of 0.8096. That is, abnormal subjects are correctly identified as abnormal in 80.96% of cases at the optimal threshold value 2.0592 for that combination. In the case of worst classification (equal scale and shape parameters), the optimal threshold is observed to be 1.9914 with a very low true positive rate of 0.1846 (Figure 2). The remaining combinations considered in the study can be interpreted similarly using Figure 2.
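The text does not state the criterion used to pick the optimal threshold. As one common choice, the sketch below maximizes Youden's index J(t) = TPR(t) − FPR(t) over a grid of thresholds. This criterion, the grid search, the parameter names (`alpha`, `theta` for GHN; `beta`, `sigma` for Weibull) and the parameter values are all assumptions made for illustration, so the output is not expected to reproduce the thresholds reported above.

```python
from math import exp
from statistics import NormalDist

def youden_optimal_threshold(alpha, theta, beta, sigma, t_max, steps=20000):
    """Grid search over (0, t_max] for the threshold maximizing J(t) = TPR(t) - FPR(t)."""
    phi = NormalDist().cdf
    best = None
    for i in range(1, steps + 1):
        t = t_max * i / steps
        fpr = 2.0 * (1.0 - phi((t / theta) ** alpha))  # GHN survival function
        tpr = exp(-((t / sigma) ** beta))              # Weibull survival function
        if best is None or tpr - fpr > best[0]:
            best = (tpr - fpr, t, fpr, tpr)
    _, t, fpr, tpr = best
    return t, fpr, tpr

# Hypothetical parameter values, chosen only to exercise the function.
t_opt, fpr_opt, tpr_opt = youden_optimal_threshold(
    alpha=2.0, theta=1.0, beta=3.0, sigma=3.0, t_max=10.0
)
print(f"threshold={t_opt:.4f}, FPR={fpr_opt:.4f}, TPR={tpr_opt:.4f}")
```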
3.2. Real Data Set
The real data set concerns an ICU scoring system: SAPS III is a system for predicting the mortality status (dead or alive) of a patient in the ICU. SAPS III has been designed to provide a real-life predicted mortality for a patient by following a well defined procedure, based on a mathematical model that needs calibration. The data set consists of 111 respondents, of whom 66 (59.46%) are alive and 45 (40.54%) are dead.
From this data set it is observed that the SAPS III scores for dead patients follow the Weibull distribution (KS statistic = 0.1280; p value = 0.4165 at the 0.05 level of significance), whereas the scores for patients who are alive follow the GHN distribution (KS statistic = 0.0901; p value = 0.6243 at the 0.05 level of significance). The results for the prognosis of disease are reported in Table 2. The accuracy of the test is 62.78%, indicating that the SAPS III score identifies the mortality status with about 62.78% accuracy. The optimal threshold value is observed to be 22.00, which means that when the SAPS III score exceeds 22.00, the patient will have a 71.35% chance of death. Further, the confidence interval of AUC is (0.5530, 0.6917), and the proposed ROC curve for SAPS III lies uniformly above the chance line. Figure 3 depicts the lower and upper confidence limits of the proposed ROC curve along with its optimal threshold.
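The Kolmogorov-Smirnov statistic quoted above measures the largest gap between the empirical c.d.f. of the scores and the fitted model c.d.f. A minimal sketch of that computation is given below. It returns the statistic only (no p value), and the helper names and parameter names (`alpha`, `theta`, `beta`, `sigma`) are assumptions. The SAPS III data and the fitted parameter values are not reproduced here.

```python
from math import exp
from statistics import NormalDist

def ghn_cdf(x, alpha, theta):
    """GHN c.d.f.: 2 * Phi((x / theta)^alpha) - 1 for x > 0."""
    return 2.0 * NormalDist().cdf((x / theta) ** alpha) - 1.0

def weibull_cdf(y, beta, sigma):
    """Weibull c.d.f.: 1 - exp(-(y / sigma)^beta) for y > 0."""
    return 1.0 - exp(-((y / sigma) ** beta))

def ks_statistic(sample, cdf):
    """One-sample KS statistic: sup |F_n(x) - F(x)|, checked on both sides of each jump of F_n."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))
```

For instance, `ks_statistic(scores, lambda y: weibull_cdf(y, beta_hat, sigma_hat))` would compare a list of scores against a fitted Weibull model, where `scores`, `beta_hat` and `sigma_hat` are placeholders for the user's own data and estimates.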
The present paper addresses the practical situation in which the populations with and without the condition follow two different generalized skewed distributions, with scale and shape parameters that are useful in explaining and handling the skewed nature of the data. Simulation studies are conducted at various combinations of the parameters. The entire exercise is carried out as three experiments, and the effect of sample size is also noted. It is observed that the width of the confidence interval is affected by the sample size, with shorter confidence intervals obtained at larger sample sizes. Moreover, with the proposed methodology it is feasible to identify the sensitivity at a specific false positive rate and vice versa.
Further, the proposed methodology is applied to a real data set, namely, SAPS III, which is used to predict the mortality status of a patient in the ICU. The accuracy of the SAPS III system in predicting death is 62.78%. The optimal threshold is identified to be 22.00, which can be used to classify a new individual whose SAPS III score has been calculated.
The partial derivatives of FPR and TPR with respect to their parameters are obtained first. Substituting these partial derivatives in (12) gives the delta-method expressions for the variances of FPR and TPR. The bootstrapped estimates of the four scale and shape parameters, and their variances, are computed as the mean and the variance of the corresponding bootstrap estimates of each parameter.
Now, by substituting the above variances of the parameters of the two distributions in (A.2) and (A.3), we obtain the expressions for the variances of FPR and TPR, respectively. Further, using (A.2) and (A.3), the confidence intervals can be estimated for the intrinsic measures, which in turn yields the confidence intervals for the proposed ROC curve.
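The appendix steps can be sketched numerically as follows. The code assumes FPR(t) = 2[1 − Φ((t/θ)^α)] and TPR(t) = exp(−(t/σ)^β), differentiates each with respect to its own two parameters, and combines the squared derivatives with the parameter variances while neglecting covariance terms. The parameter names, function names, and the omission of covariances are assumptions for illustration. The variances passed in would come from the bootstrap.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_N = NormalDist()

def rates_and_gradients(t, alpha, theta, beta, sigma):
    """Return FPR(t), TPR(t) and their partial derivatives w.r.t. the parameters (requires t > 0)."""
    u = (t / theta) ** alpha
    v = (t / sigma) ** beta
    fpr = 2.0 * (1.0 - _N.cdf(u))
    tpr = exp(-v)
    grads = {
        "dFPR_dalpha": -2.0 * _N.pdf(u) * u * log(t / theta),
        "dFPR_dtheta": 2.0 * _N.pdf(u) * u * alpha / theta,
        "dTPR_dbeta": -tpr * v * log(t / sigma),
        "dTPR_dsigma": tpr * v * beta / sigma,
    }
    return fpr, tpr, grads

def delta_ci(t, params, variances, level=0.95):
    """Delta-method intervals for FPR and TPR at threshold t (intervals are not clipped to [0, 1])."""
    alpha, theta, beta, sigma = params
    v_alpha, v_theta, v_beta, v_sigma = variances
    fpr, tpr, g = rates_and_gradients(t, alpha, theta, beta, sigma)
    se_fpr = sqrt(g["dFPR_dalpha"] ** 2 * v_alpha + g["dFPR_dtheta"] ** 2 * v_theta)
    se_tpr = sqrt(g["dTPR_dbeta"] ** 2 * v_beta + g["dTPR_dsigma"] ** 2 * v_sigma)
    z = _N.inv_cdf(0.5 + level / 2.0)
    return {
        "FPR": (fpr - z * se_fpr, fpr + z * se_fpr),
        "TPR": (tpr - z * se_tpr, tpr + z * se_tpr),
    }
```

Repeating `delta_ci` over a grid of thresholds gives lower and upper limit curves of the kind plotted around the estimated ROC curve.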
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors would like to thank and acknowledge Dr. Vimal Kumar, Department of Public Health and Medicine, SRM Medical College Hospital and Research Centre, Chennai, India, for sharing the data used to obtain the results.
J. P. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York, NY, USA, 1975.
D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics, John Wiley & Sons, New York, NY, USA, 1966.