Computational and Mathematical Methods in Medicine

Volume 2015, Article ID 128930, 7 pages

http://dx.doi.org/10.1155/2015/128930

## Efficient Noninferiority Testing Procedures for Simultaneously Assessing Sensitivity and Specificity of Two Diagnostic Tests

^{1}Epidemiology and Biostatistics Program, Department of Environmental and Occupational Health, School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA

^{2}Department of Mathematical Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA

^{3}Division of Health Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA

Received 28 May 2015; Revised 3 August 2015; Accepted 6 August 2015

Academic Editor: Qi Dai

Copyright © 2015 Guogen Shan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Sensitivity and specificity are often used to assess the performance of a diagnostic test with binary outcomes. Wald-type test statistics have been proposed for testing sensitivity and specificity individually. In the presence of a gold standard, simultaneous comparison between two diagnostic tests for noninferiority of sensitivity and specificity based on an asymptotic approach has been studied by Chen et al. (2003). However, the asymptotic approach may suffer from unsatisfactory type I error control, as observed in many studies, especially in small to medium sample settings. In this paper, we compare three unconditional approaches for simultaneously testing sensitivity and specificity: approaches based on estimation, on maximization, and on a combination of estimation and maximization. Although the estimation approach does not guarantee type I error control, it performs satisfactorily in this regard. The other two unconditional approaches are exact. The approach based on estimation and maximization is generally more powerful than the approach based on maximization.

#### 1. Introduction

Sensitivity and specificity are often used to summarize the performance of a diagnostic or screening procedure. Sensitivity is the probability of a positive diagnostic result given that the subject has the disease, and specificity is the probability of a negative diagnostic result in the nondiseased group. Diagnostic tests with high sensitivity and specificity are preferred, and both quantities can be estimated in the presence of a gold standard. For example, two diagnostic tests, the technetium-99m methoxyisobutylisonitrile single photon emission computed tomography (Tc-MIBI SPECT) and the computed tomography (CT), were compared for diagnosing recurrent or residual nasopharyngeal carcinoma (NPC) from benign lesions after radiotherapy in the study by Kao et al. [1]. The gold standard in their study was biopsy. The sensitivity and specificity were 73% and 88% for the CT test and 73% and 96% for the Tc-MIBI SPECT test.
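As a minimal sketch of the two definitions above, sensitivity and specificity can be estimated from counts classified against the gold standard. The function names and the counts below are hypothetical and are not taken from Kao et al. [1].

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Estimate P(test positive | diseased) from gold-standard-confirmed cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Estimate P(test negative | nondiseased) from gold-standard-confirmed non-cases."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only.
print(sensitivity(true_pos=8, false_neg=2))   # 8 of 10 diseased subjects test positive
print(specificity(true_neg=18, false_pos=2))  # 18 of 20 nondiseased subjects test negative
```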

Traditionally, noninferiority of sensitivity and specificity between two diagnostic procedures is tested individually using the McNemar test [2–6]. Recently, Tange et al. [7] developed an approach to simultaneously test sensitivity and specificity in noninferiority studies. Lu and Bean [2] were among the first researchers to propose a Wald-type test statistic for testing a nonzero difference in sensitivity or specificity between two diagnostic tests for paired data. Later, Nam [3] pointed out that the test statistic by Lu and Bean [2] has unsatisfactory type I error control. Nam [3] then proposed a new test statistic based on a restricted maximum likelihood method and showed that it performs well, with actual type I error rates closer to the desired rates. This test statistic was used by Chen et al. [8] to compare sensitivity and specificity simultaneously in the presence of a gold standard. Actual type I error rates for a compound asymptotic test were evaluated only at some specific points in the sample space. It is well known that the asymptotic method behaves poorly when the sample size is small. Therefore, it is necessary to comprehensively evaluate the type I error rate [9–14].

An alternative to an asymptotic approach is an exact approach conducted by enumerating all the possible tables for given total sample sizes of diseased and nondiseased subjects. The first commonly used unconditional approach is a method based on maximization [15]. In the unconditional approach, only the numbers of subjects in the diseased and nondiseased groups are fixed, not the total number of responses from both groups. Fixing the latter as well gives the usual conditional approach, which treats both margins of the table as fixed. The p-value of the unconditional approach based on maximization is calculated as the maximum of the tail probability over the range of a nuisance parameter [15]. This approach has been studied for many years, and it can be conservative, with an actual type I error rate below the nominal test size in small sample settings. One possible reason for this conservativeness is the presence of spikes in the tail probability curve. Storer and Kim [16] proposed another unconditional approach based on estimation, which is also known as the parametric bootstrap approach. The maximum likelihood estimate (MLE) of the nuisance parameter is plugged into the null likelihood. Other estimates may be considered if the MLE is not available [7]. Although this estimation based approach is often shown to have type I error rates closer to the desired size than asymptotic approaches, it still does not guarantee that the type I error rate stays within the test size.
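The difference between the maximization and estimation p-values can be sketched in a deliberately simplified setting. The code below assumes two independent binomial samples and a one-sided test of equal proportions with a pooled Z statistic; this is not the paired noninferiority design studied in this paper, and all function names are hypothetical. The supremum over the nuisance parameter is approximated on a finite grid, which is a further simplification.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def z_pooled(x1, n1, x2, n2):
    """Pooled-variance Z statistic; larger values favour p1 > p2."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if var == 0:
        return 0.0  # all successes or all failures: no evidence either way
    return (x1 / n1 - x2 / n2) / sqrt(var)


def tail_prob(p, x1, n1, x2, n2):
    """Null probability of all tables at least as extreme as the observed
    one, when both groups share the success probability p (the nuisance
    parameter)."""
    z_obs = z_pooled(x1, n1, x2, n2)
    return sum(
        binom_pmf(a, n1, p) * binom_pmf(b, n2, p)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if z_pooled(a, n1, b, n2) >= z_obs - 1e-12
    )


def p_value_m(x1, n1, x2, n2, grid=200):
    """Maximization: largest tail probability over a grid of nuisance values."""
    return max(tail_prob(i / grid, x1, n1, x2, n2) for i in range(grid + 1))


def p_value_e(x1, n1, x2, n2):
    """Estimation: tail probability at the null MLE of the nuisance parameter."""
    return tail_prob((x1 + x2) / (n1 + n2), x1, n1, x2, n2)


# Hypothetical data: 7/10 successes in group 1 versus 2/10 in group 2.
print("M p-value:", p_value_m(7, 10, 2, 10))
print("E p-value:", p_value_e(7, 10, 2, 10))
```

In this example the null MLE (0.45) lies on the grid, so the maximization p-value cannot be smaller than the estimation p-value. That ordering is what makes the maximization approach exact but potentially conservative, while the estimation approach is closer to nominal without a guarantee.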

A combination of the two approaches based on estimation and maximization has been proposed by Lloyd [4, 17] for the testing of noninferiority with binary matched-pairs data, which can be obtained from a case-control study or a twin study. The p-value of the approach based on estimation is used as a test statistic in the following maximization step. It should be noted that there could be multiple estimation steps before the final maximization step. The final step must be a maximization step in order to make the test exact. This approach has been successfully extended to testing for trend with binary endpoints [5, 18].

The rest of this paper is organized as follows. Section 2 presents relevant notation and testing procedures for simultaneously testing sensitivity and specificity. In Section 3, we extensively compare the performance of the competing tests. A real example is given in Section 4 to illustrate the application of asymptotic and exact procedures. Section 5 provides a discussion.
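The two-step idea, estimation followed by maximization, can be sketched as follows. This is an illustration under simplifying assumptions rather than the procedure of Lloyd [4, 17] or of this paper: it uses two independent binomial samples, a one-sided test of equal proportions with a pooled Z statistic, and a finite grid in place of the supremum. All function names are hypothetical.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def z_pooled(x1, n1, x2, n2):
    """Pooled-variance Z statistic; larger values favour p1 > p2."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return 0.0 if var == 0 else (x1 / n1 - x2 / n2) / sqrt(var)


def e_p_value(x1, n1, x2, n2):
    """Estimation step: tail probability at the null MLE of the nuisance
    parameter (the common success probability)."""
    p_hat = (x1 + x2) / (n1 + n2)
    z_obs = z_pooled(x1, n1, x2, n2)
    return sum(
        binom_pmf(a, n1, p_hat) * binom_pmf(b, n2, p_hat)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if z_pooled(a, n1, b, n2) >= z_obs - 1e-12
    )


def em_p_value(x1, n1, x2, n2, grid=100):
    """Maximization step: the estimation p-value of every table serves as
    the test statistic (smaller means more extreme), and the tail
    probability of that ordering is maximized over a grid of nuisance
    values."""
    tables = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)]
    e_stat = {t: e_p_value(t[0], n1, t[1], n2) for t in tables}
    e_obs = e_stat[(x1, x2)]
    extreme = [t for t in tables if e_stat[t] <= e_obs + 1e-12]
    return max(
        sum(binom_pmf(a, n1, p) * binom_pmf(b, n2, p) for a, b in extreme)
        for p in (i / grid for i in range(grid + 1))
    )


# Hypothetical data: 7/10 successes in group 1 versus 2/10 in group 2.
print("E+M p-value:", em_p_value(7, 10, 2, 10))
```

Because the last step takes a maximum over the nuisance parameter, the resulting p-value is valid for every value on the grid; the estimation step only changes how the tables are ordered.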

#### 2. Testing Approaches

Each subject in a study is evaluated by two dichotomous diagnostic tests, $T_1$ and $T_2$, in the presence of a gold standard. Suppose the status of each subject, diseased or nondiseased, was determined by the gold standard before the two diagnostic tests were performed. Within the diseased group, $n_{ij}$ ($i, j = 0, 1$) is the number of subjects with diagnostic results $T_1 = i$ and $T_2 = j$, where $0$ and $1$ represent negative and positive diagnostic results from the $k$th test $T_k$, respectively, with $p_{ij}$ being the associated probability. The total number of diseased subjects is $n$. Similarly, $m_{ij}$ ($i, j = 0, 1$) is the number of subjects with diagnostic results $T_1 = i$ and $T_2 = j$ in the nondiseased group, $q_{ij}$ is the associated probability, and $m$ is the total number of nondiseased patients. Such data can be organized in a contingency table (Table 1), where $n = \sum_{i,j} n_{ij}$ and $m = \sum_{i,j} m_{ij}$. It is reasonable to assume that the diseased group is independent of the nondiseased group.
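One possible way to hold the paired data described above is a pair of 2 x 2 tables, one per gold-standard group, keyed by the results of the two tests. The sketch below assumes the coding 1 = positive and 0 = negative; the class name, method name, and counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairedTable:
    """Counts keyed by (result of test 1, result of test 2),
    with 1 = positive and 0 = negative (assumed coding)."""
    counts: dict

    @property
    def total(self) -> int:
        """Group size, the sum of the four cells."""
        return sum(self.counts.values())

    def rate(self, test: int, result: int) -> float:
        """Fraction of the group with the given result on test 1 or 2."""
        idx = test - 1
        hits = sum(c for key, c in self.counts.items() if key[idx] == result)
        return hits / self.total


# Hypothetical counts for illustration only.
diseased = PairedTable({(1, 1): 30, (1, 0): 5, (0, 1): 3, (0, 0): 2})
nondiseased = PairedTable({(1, 1): 1, (1, 0): 2, (0, 1): 4, (0, 0): 43})

for test in (1, 2):
    sens = diseased.rate(test, result=1)     # positives among the diseased
    spec = nondiseased.rate(test, result=0)  # negatives among the nondiseased
    print(f"test {test}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

The marginal rates use all four cells of each table, so the sensitivity of one test sums the cells in which that test is positive regardless of the other test's result.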