#### Abstract

We propose a new general Bayesian latent class model, based on a computationally intensive approach, for evaluating the performance of multiple diagnostic tests when no gold standard test exists. The model is a suitable alternative to models with complex structures, covering the general case of several conditionally independent diagnostic tests, covariates, and strata with different disease prevalences. Stratifying the population according to different disease prevalence rates adds little complexity to the model, but it makes the model more flexible and interpretable. To illustrate the proposed general model, we evaluate the performance of six diagnostic screening tests for Chagas disease, taking some epidemiological variables into account. Serology at the time of donation (negative, positive, inconclusive) was used as the stratification factor. The general model with stratification of the population performed better than its competitors without stratification. The pair of tests from the Biomanguinhos FIOCRUZ kit (c-ELISA and rec-ELISA) is the best option in the confirmation process, presenting a false-negative rate of 0.0002% under the serial scheme. According to the fitted model, a donor is healthy with probability 100% when these two tests are both negative and chagasic with probability 100% when they are both positive.

#### 1. Introduction

A major challenge for diagnostic medicine is determining the true health status of an individual (sick or healthy) with respect to a certain disease when a gold standard test does not exist or its use is limited. This problem arises especially when the gold standard technique is invasive, risky, or costly, so that definitive verification for apparently healthy individuals is neither practical nor ethical.

The development of statistical procedures for estimating the performance parameters (sensitivity and specificity) of diagnostic tests is relatively straightforward when the gold standard exists and is applied to all subjects of the sample under investigation (see [1–3]). When the gold standard test does not exist or its use is limited, the most widely used alternative is latent class modeling [4–8].

The latent class structure allows modeling flexibility for both frequentist and Bayesian approaches, and the estimates of interest can be obtained by numerical methods. For instance, one has expectation-maximization algorithms [9] in the frequentist approach, and Gibbs sampling [10] and Metropolis-Hastings [11] algorithms in the Bayesian approach.

Under the conditional independence structure between tests, that is, given the health condition of the subject, the result of one test is independent of the result of any other test, several researchers have developed and implemented models that evaluate the performance of one or multiple diagnostic tests, considering partial or complete absence of a gold standard, sample stratification, and inclusion of covariates.

Several of these studies have examined models in which only subjects with positive results in at least one of the tests are submitted to the gold standard. For instance, Schatzkin et al. introduced a model to evaluate the relative performance of two tests using a frequentist approach [12]. In a simulation study, Cheng et al. evaluated the average coverage probability of the confidence intervals of the parameters of this model [13], while Pepe and Alonzo proposed the inclusion of a vector of covariates in this model [14]. Also using a frequentist approach, Macaskill et al. presented a model for point and interval estimation of likelihood ratios [15], and Walter proposed an alternative model able to estimate these measures directly from the latent class [6]. Bayesian extensions of this model were developed by Van der Merwe and Maritiz [16] and Martinez et al. [17].

Situations in which none of the sample subjects are submitted to the gold standard are also widely reported in the literature. Gastwirth et al. proposed Bayesian modeling for the case of only one test and low disease prevalence, and they called attention to the care needed when choosing a prior distribution [18]. Johnson and Gastwirth added parameter transformations to this model [19]. Joseph et al. applied Bayesian modeling based on the Gibbs sampling algorithm to the case of two diagnostic tests [4]. Leisenring and Pepe developed a logistic regression model for the case of one test in the presence of a covariate vector [20]. Using a frequentist approach, Albert et al. proposed a model including a finite mixture of distributions and covariates based on a random effect model [21]. Rathouz et al. presented a conditional logistic regression model including missing covariate data [22]. Martinez et al. first developed an extension of the model proposed by Joseph et al. [4] including a covariate vector, based on Metropolis-Hastings algorithms [23], and then a generalization to the case of multiple tests and covariates, which was applied to a problem with three diagnostic tests and two covariates [8].

To evaluate the performance of two conditionally independent diagnostic tests, Hui and Walter developed, from the frequentist point of view, a model based on the stratification of the sample into two strata, considering different disease prevalences in each stratum but similar sensitivity and specificity values [24]. Johnson and Gastwirth discussed the applicability of this method to other datasets [25]. Enøe et al. proposed a Bayesian extension of this model based on a Gibbs sampling scheme [7]. According to Toft et al., it is important to find a stratifying factor that does not violate the assumption of constant sensitivity and specificity among the strata. When this assumption is questionable, it is advisable to find another stratifying factor, preferably based on a practical criterion, and to consider a model that allows the inclusion of covariates [26].

The model in [24] is known as the Hui-Walter paradigm and has been widely discussed and applied in the literature using frequentist and Bayesian approaches (see, e.g., [8, 17, 23, 26–34], among others), but without including in the model structure the general case of multiple tests, covariates, and strata, which is what we propose here.

Based on the above, the main aim of this paper is to present a Bayesian latent class approach, for the case in which none of the subjects of the sample under investigation is submitted to the gold standard, for evaluating the performance of imperfect tests applied to subsets and considering the presence of covariates.

The paper is structured as follows. Our general model is presented in Section 2, including the covariate case as well as the inferential procedure. Section 3 presents the results of a small simulation study in which the general model is compared with competing particular cases. Section 4 presents the results of our modeling applied to a real dataset on Chagas' disease. Section 5 presents the final comments.

#### 2. Model: Full Absence of Gold Standard

In this section we present a latent class model for a population stratified under the assumption of Hui and Walter [24], with full absence of a gold standard.

##### 2.1. Performance Parameters

In the study of the performance of diagnostic tests, the probability that test $j$ yields a positive result ($T_j = 1$), given that the individual has the disease ($D = 1$), is known as the sensitivity of test $j$ and is expressed as $S_j = P(T_j = 1 \mid D = 1)$, $j = 1, \ldots, K$. If test $j$ is not 100% sensitive, it will fail to detect the disease in some individuals known to be sick. The proportion of negative results of test $j$ among sick individuals is known as the false-negative rate, $1 - S_j$.

The specificity of test $j$ refers to the probability that this test yields a negative result ($T_j = 0$), given that the individual does not have the disease ($D = 0$), and it is expressed as $E_j = P(T_j = 0 \mid D = 0)$, $j = 1, \ldots, K$. If test $j$ is not 100% specific, it will falsely indicate the presence of disease in some individuals known to be healthy. The proportion of positive results among healthy individuals is known as the false-positive rate, $1 - E_j$.
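As a small illustration of these four quantities, the sketch below computes them from a 2x2 table in the idealized situation where the true health status is known. The function name and the counts are hypothetical and are not taken from the study; in the latent class setting of this paper the true status is not observed, so these rates must be estimated through the model instead.

```python
def performance_from_counts(tp, fn, tn, fp):
    """Return sensitivity, specificity and error rates from a 2x2 table.

    tp, fn: diseased subjects with positive / negative test results
    tn, fp: healthy subjects with negative / positive test results
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1.0 - sensitivity,
        "false_positive_rate": 1.0 - specificity,
    }


# Hypothetical counts, for illustration only (not data from the study).
rates = performance_from_counts(tp=45, fn=5, tn=90, fp=10)
print(rates)
```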

##### 2.2. Stratification

According to Singer et al. [27], considering the assumption of the model of Hui and Walter [24] in which the population is divided into strata with different disease prevalences , but with similar performance of the tests among strata , the likelihood function of the observed data can be described as where, , is the number of known subjects in the stratum, and is the observed result of the test in the stratum to individual.

When none of these diagnostic tests can be considered to be the gold standard for the disease in question, the true but unknown health status of the subject (0: healthy or 1: sick) in the stratum, called , can be modeled from a Bernoulli distribution with success probability . Then some algebraic work is given by

Combining the likelihood function of observed data (1) with the likelihood function of latent variable , after some algebra, we obtain the augmented likelihood function of the latent class model for the general case of conditionally independent tests and strata, given by
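For orientation, a standard way of writing these three likelihoods for conditionally independent tests is sketched below. The notation is assumed here for illustration and may differ from the authors' own: $\xi_v$ is the prevalence in stratum $v$, $S_j$ and $E_j$ are the sensitivity and specificity of test $j$, $t_{vij} \in \{0, 1\}$ is the result of test $j$ for subject $i$ of stratum $v$, $d_{vi} \in \{0, 1\}$ is the latent health status, and $N_v$ is the number of subjects in stratum $v$.

```latex
\begin{align*}
% observed-data (mixture) likelihood
L(\theta \mid t) &= \prod_{v=1}^{V} \prod_{i=1}^{N_v}
  \left[ \xi_v \prod_{j=1}^{K} S_j^{t_{vij}} (1 - S_j)^{1 - t_{vij}}
       + (1 - \xi_v) \prod_{j=1}^{K} E_j^{1 - t_{vij}} (1 - E_j)^{t_{vij}} \right], \\
% likelihood of the latent Bernoulli status
L(\xi \mid d) &= \prod_{v=1}^{V} \prod_{i=1}^{N_v}
  \xi_v^{d_{vi}} (1 - \xi_v)^{1 - d_{vi}}, \\
% augmented likelihood
L(\theta \mid t, d) &= \prod_{v=1}^{V} \prod_{i=1}^{N_v}
  \left[ \xi_v \prod_{j=1}^{K} S_j^{t_{vij}} (1 - S_j)^{1 - t_{vij}} \right]^{d_{vi}}
  \left[ (1 - \xi_v) \prod_{j=1}^{K} E_j^{1 - t_{vij}} (1 - E_j)^{t_{vij}} \right]^{1 - d_{vi}}.
\end{align*}
```

Summing each factor of the augmented form over $d_{vi} \in \{0, 1\}$ recovers the mixture form in the first line, which is a quick consistency check on the two expressions.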

##### 2.3. Including Covariates

The model structure including covariates can be constructed by linking the covariate matrix to the mean function of the response variable from the linear predictor according to the relation , where is a monotone and differentiable function, is a link function, is the vector of original parameters, and is the new vector of parameters of dimension .

On the basis of the logit link function, the relations of the vector of covariates with the original parameters of interest are given by where, , and , correspond to the intercepts of the logit link functions (4) for the prevalence in the stratum and sensitivity and specificity of the test, respectively (, ), that is, the estimates of these parameters when all covariates are null. We have and indicating how much the covariate influences the sensitivity and specificity, of the test, respectively and the disease prevalence in the stratum . Therefore, the new vector of parameters , , , will have dimension .
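The effect of a logit link can be illustrated with a few lines of code. This is a minimal sketch with hypothetical coefficient values and function names; it only shows how an intercept and covariate effects on the logit scale map to a probability such as a prevalence, sensitivity, or specificity.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the real line to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def linked_probability(intercept, slopes, covariates):
    """Probability implied by logit(p) = intercept + sum(slope * covariate)."""
    eta = intercept + sum(b * x for b, x in zip(slopes, covariates))
    return inv_logit(eta)


# Hypothetical coefficients for one test and two binary covariates.
baseline = linked_probability(2.0, [0.5, -0.3], [0, 0])  # all covariates null
shifted = linked_probability(2.0, [0.5, -0.3], [1, 0])   # first covariate present
print(round(baseline, 4), round(shifted, 4))
```

With all covariates equal to zero the probability depends on the intercept alone, which matches the interpretation of the intercepts given in the text.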

Replacement of the logit links (4) with their respective original parameters in the augmented likelihood function (3) results in the augmented likelihood function:

Model (5) covers a wide spectrum of latent class models. The frequentist framework proposed by Hui and Walter [24] and the Bayesian approaches proposed by Joseph et al. [4] and Martinez et al. [8] are particular cases of our model (5), when , , , , , and , , , respectively.
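How the stratified structure contains simpler models as special cases can be seen in a short sketch of the observed-data log-likelihood without covariates. The function, argument names, and data below are hypothetical; the code assumes conditional independence of the tests and sensitivities and specificities shared across strata, as in the Hui and Walter assumption.

```python
import math


def log_likelihood(strata, prevalence, sens, spec):
    """Observed-data log-likelihood of a latent class model with
    conditionally independent tests and stratum-specific prevalence.

    strata: list of strata; each stratum is a list of result tuples (0/1 per test)
    prevalence: one prevalence per stratum
    sens, spec: one sensitivity / specificity per test, shared across strata
    """
    total = 0.0
    for results, prev in zip(strata, prevalence):
        for pattern in results:
            p_sick = prev
            p_healthy = 1.0 - prev
            for t, se, sp in zip(pattern, sens, spec):
                p_sick *= se if t == 1 else 1.0 - se
                p_healthy *= 1.0 - sp if t == 1 else sp
            total += math.log(p_sick + p_healthy)
    return total


# Hypothetical data: two tests, two strata.
strata = [[(1, 1), (1, 0), (0, 0)], [(0, 0), (0, 0), (1, 1)]]
ll_two_strata = log_likelihood(strata, [0.6, 0.2], [0.9, 0.8], [0.95, 0.9])

# No stratification is the special case with a single stratum and one prevalence.
pooled = [strata[0] + strata[1]]
ll_pooled = log_likelihood(pooled, [0.4], [0.9, 0.8], [0.95, 0.9])
print(ll_two_strata, ll_pooled)
```

Setting all stratum prevalences to the same value gives the same log-likelihood as the pooled call, which is one way to check that the unstratified model is nested in the stratified one.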

##### 2.4. Inference

For inference, we adopt a fully Bayesian approach. This choice was based on the complex structure of this model, which may have many parameters to be estimated depending on the configuration adopted. Also, if necessary or appropriate, expert opinion may be included in modeling via a prior distribution.

This approach is by no means the only possible modeling method, but it is a natural first step and has the advantage of simplifying calculations, especially in view of the computational implementation of Markov chain Monte Carlo (MCMC) algorithms.

###### 2.4.1. Priors

Assuming independence between the parameters of the vector and following Dendukuri and Joseph [35] and Menten et al. [36], we use the normal distribution to model prior knowledge about each of the parameters of interest in the vector , with and being known location and scale hyperparameters of the normal distribution:

###### 2.4.2. Conditional Posteriors

Irrespective of the form of the priors considered, the joint posterior distributions of the parameters are analytically intractable. We overcome this computational difficulty by using the Metropolis-Hastings algorithm [11], which allows us to simulate observations from nontrivial joint distributions by generating random samples successively from the full conditional distributions for the unknown parameters. The full conditional posterior densities for the parameters which are used in each step of the iterative sampling-based algorithms are given by where, , , , , and are given by
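The accept/reject mechanics of a Metropolis-Hastings update can be sketched in a deliberately simplified setting. The example below is not the authors' sampler: it updates a single logit-scale parameter with a normal prior and a binomial likelihood, using a symmetric random-walk proposal, and all data values, step sizes, and function names are hypothetical. In the full model, an update of this kind would be applied in turn to each parameter using its full conditional density.

```python
import math
import random


def log_posterior(beta, positives, n, prior_mean, prior_sd):
    """Log posterior (up to a constant) of a logit-scale parameter beta:
    binomial likelihood with p = inv_logit(beta) and a normal prior."""
    log_p = -math.log1p(math.exp(-beta))   # log of p
    log_q = -math.log1p(math.exp(beta))    # log of 1 - p
    log_lik = positives * log_p + (n - positives) * log_q
    log_prior = -0.5 * ((beta - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def metropolis_hastings(n_iter, start, step, rng, **data):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    chain = [start]
    current_lp = log_posterior(start, **data)
    for _ in range(n_iter):
        proposal = chain[-1] + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, **data)
        # Accept with probability min(1, posterior ratio); proposal terms cancel.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            chain.append(proposal)
            current_lp = proposal_lp
        else:
            chain.append(chain[-1])
    return chain


rng = random.Random(1)
chain = metropolis_hastings(
    5000, start=0.0, step=0.5, rng=rng,
    positives=27, n=30, prior_mean=0.0, prior_sd=10.0,
)
kept = chain[1000:]  # discard burn-in draws
post_mean_p = sum(1.0 / (1.0 + math.exp(-b)) for b in kept) / len(kept)
print(round(post_mean_p, 3))
```

Working on the logit scale keeps the proposal unconstrained, which is one practical reason for combining logit links with normal priors.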

Convergence of the Metropolis-Hastings algorithm to a stationary distribution was evaluated using the potential scale reduction factor proposed by Gelman and Rubin [37].
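A minimal sketch of this diagnostic for a single scalar parameter is given below, assuming several chains of equal length. It follows the basic form of the statistic (between-chain and within-chain variances) without the refinements found in dedicated packages, and the simulated chains are hypothetical.

```python
import random


def gelman_rubin(chains):
    """Potential scale reduction factor for chains of equal length."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1)
        for c, mu in zip(chains, means)
    ) / m
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


rng = random.Random(0)
chain_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
chain_b = [rng.gauss(0.0, 1.0) for _ in range(1000)]
r_hat = gelman_rubin([chain_a, chain_b])

# Chains exploring different regions give a factor well above 1.
shifted = [x + 5.0 for x in chain_b]
r_hat_bad = gelman_rubin([chain_a, shifted])
print(round(r_hat, 3), round(r_hat_bad, 3))
```

Values close to 1 suggest that the chains are sampling from the same distribution; values clearly above 1 indicate that more iterations are needed.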

#### 3. Simulation Study

A small simulation study was performed to compare the performance of the proposed general model with that of previous models, which can be seen as its particular cases.

We considered a situation with three tests under investigation , stratified into two strata , in the presence of a dichotomous covariate for a sample with 150 observations. From (9) and (10) we calculated the probabilities as well as their respective numbers of elements , for each combination of results of the tests under investigation in the stratum *v* conditioned on the dichotomous covariate and the health condition of the subject , which are given by
where and are the sensitivity and specificity rates of the test on the covariate level, respectively. And is the result of the test in the stratum for the subject:

To generate the dataset, we considered the prevalence, sensitivity, and specificity rates shown in Table 1.
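The calculation of the probability of each combination of test results, and of the corresponding expected number of subjects, can be sketched as follows for a single stratum and a single covariate level. The rates and the stratum size used here are hypothetical and are not the values of Table 1; conditional independence of the tests given the health status is assumed.

```python
from itertools import product


def pattern_probabilities(prevalence, sens, spec):
    """Probability of each combination of test results in one stratum,
    assuming conditional independence given the true health status."""
    probs = {}
    for pattern in product((0, 1), repeat=len(sens)):
        p_sick, p_healthy = prevalence, 1.0 - prevalence
        for t, se, sp in zip(pattern, sens, spec):
            p_sick *= se if t else 1.0 - se
            p_healthy *= 1.0 - sp if t else sp
        probs[pattern] = p_sick + p_healthy
    return probs


# Hypothetical rates for three tests in one stratum of 75 subjects.
probs = pattern_probabilities(0.3, [0.9, 0.85, 0.8], [0.95, 0.9, 0.9])
expected_counts = {pattern: 75 * p for pattern, p in probs.items()}
for pattern, count in expected_counts.items():
    print(pattern, round(count, 2))
```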

The proposed model (5) was fitted to the simulated dataset for four special cases: SSensI , , ; SSensII , , ; SSensIII , , ; and SSensIV , , .

Following Martinez et al. [8], the hyperparameters of the normal distributions in (6), referring to the respective intercepts, were determined from the respective estimates of , obtained from the particular model without covariates considering noninformative Beta priors (Uniform: Beta) for each parameter of the vector . For instance, in this fitting, the sensitivity of test presented a 95% credible interval of (91.71%–99.99%); thus, the value of the location hyperparameter for the intercept of this test, in the logit model (4), was considered to be the mean of the two results between and , that is, . All scale hyperparameters for the intercepts were defined as of the respective location hyperparameter, with for this test. The location and scale hyperparameters, referring to the respective parameters , , and , were fixed at and .

Two parallel MCMC chains of iterations were run, with the first iterations of each chain being discarded and the remaining iterations being selected at intervals of . Thus, a final stationary sample of approximately independent draws, of size , from the conditional posterior distributions (7) was obtained for each particular case (SSensI, SSensII, SSensIII, and SSensIV).
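The burn-in and thinning step described above amounts to simple slicing of each chain before pooling. The sketch below uses hypothetical chain lengths, burn-in, and thinning interval, since the sizes used in the study are not reproduced here.

```python
def thin_chains(chains, burn_in, thin):
    """Discard the first burn_in draws of each chain, keep every thin-th
    remaining draw, and pool the chains into one posterior sample."""
    pooled = []
    for chain in chains:
        pooled.extend(chain[burn_in::thin])
    return pooled


# Hypothetical sizes, for illustration only: integers stand in for draws.
chains = [list(range(1000)), list(range(1000, 2000))]
sample = thin_chains(chains, burn_in=200, thin=10)
print(len(sample))  # 160
```

Thinning reduces autocorrelation between retained draws at the cost of a smaller sample, so the interval is usually chosen after inspecting the autocorrelation of the chains.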

To choose the best model to be fitted, we considered the information criteria proposed by Akaike [38] (Akaike's information criterion, AIC) and Schwarz [39] (Bayesian information criterion, BIC). These criteria have been discussed by Raftery [40], Kuha [41], and Posada and Buckley [42], among others. We also considered the deviance information criterion (DIC), which was proposed by Spiegelhalter et al. [43] and subsequently discussed by Kateri et al. [44] and Shriner and Yi [45], among others. Iliopoulos et al. suggested a Bayesian approximation for AIC and BIC [46]. Basically, these criteria quantify the deviation of the fitted model from the observed data, penalized by model complexity. The model presenting the lowest AIC, BIC, and DIC values provides the best fit.
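For reference, the textbook definitions of the three criteria can be written in a few lines. This sketch uses the classical forms (AIC and BIC from a maximized log-likelihood, DIC from posterior draws of the log-likelihood) and not the Bayesian approximation of Iliopoulos et al.; the function names and numerical inputs are hypothetical.

```python
import math


def aic(log_lik_max, n_params):
    """Akaike's information criterion."""
    return -2.0 * log_lik_max + 2.0 * n_params


def bic(log_lik_max, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * log_lik_max + n_params * math.log(n_obs)


def dic(log_lik_draws, log_lik_at_posterior_mean):
    """DIC = mean posterior deviance + effective number of parameters."""
    mean_deviance = sum(-2.0 * ll for ll in log_lik_draws) / len(log_lik_draws)
    p_d = mean_deviance - (-2.0 * log_lik_at_posterior_mean)
    return mean_deviance + p_d


# Hypothetical values: 5 parameters, 150 observations.
print(aic(-100.0, 5), bic(-100.0, 5, 150), dic([-101.0, -103.0], -100.0))
```

All three criteria reward fit and penalize complexity, so a richer structure is preferred only when its improvement in fit outweighs the penalty.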

The Metropolis-Hastings algorithms, their convergence evaluation, and the AIC, BIC, and DIC criteria can be easily implemented in the software **R**, which is freely available at http://www.r-project.org/, by researchers with moderate computer programming knowledge. The code built for the present study can be requested by e-mail from the authors.

In addition to presenting estimates closer to the nominal values (Table 1), the proposed general model showed, according to the information criteria, superior performance compared with the structures without stratification of the population and without covariates (see Table 2).

#### 4. Application: Chagas Disease Data

To illustrate the proposed model, we consider a study on the performance of diagnostic tests used in screening for Chagas disease. A sample of 90 blood donors from a blood center in the region of Triângulo Mineiro, Brazil, was considered.

The participants were selected randomly from the three serology strata (SI: 30 donors selected among those with positive serology; SII: 30 donors selected among those with negative serology; SIII: 30 donors selected among those with inconclusive serology in the ELISA screening test). This study was approved by the UFTM Ethics Committee (protocol number 464).

The ELISA test used for Chagas' disease screening at the time of blood donation was repeated, and the participants were submitted to five other diagnostic tests. Four were serological tests: indirect immunofluorescence (IIF), indirect hemagglutination (IHA), which uses red blood cells covered with soluble *T. cruzi* antigen (performed at the Central Laboratory of UFTM), and conventional and recombinant ELISA, which use CRA- and FRA-specific membrane antigens of *T. cruzi* (kit from Biomanguinhos Laboratory, Fundação Oswaldo Cruz/FIOCRUZ, Brazilian Ministry of Health). The fifth test was hemoculture, a parasitological test (blood culture in *T. cruzi*-specific), which is 100% specific. However, none of these tests is a gold standard for the detection of Chagas' disease.

In addition to the application of these six tests, the following epidemiological data were obtained from the donor records in order to evaluate their influence on the performance parameters of the diagnostic tests: age (, years), gender (male, female), and presence (yes, no) of at least two of three epidemiological risk factors (origin from an endemic region, contact with the vector transmitting Chagas' disease, and a family history of Chagas' disease).

On the basis of a practical need, namely, estimating the prevalence rate of *T. cruzi* infection among serologically inconclusive donors, we consider the serology at the time of donation as a stratifying factor in our model, where the first stratum (SI) is composed of donors with negative or positive serology and the second stratum (SII) is composed of donors with inconclusive serology at the time of screening.

However, to judge whether this decision was adequate, we consider here three particular cases of the model (5), hereafter called ModI (no stratification), ModII (two strata, SI: negative or positive serology at the time of screening; SII: inconclusive serology), and ModIII (three strata, SI: negative serology at the time of screening; SII: positive serology; SIII: inconclusive serology). In all cases, we consider six tests and three covariates.

The location and scale hyperparameters for model ModII were obtained as described in Section 3, except for those related to the blood culture test (HEMO), which has a specificity of 100%. They are shown in Table 3.

##### 4.1. Model Summary

Following the sensitivity study, based on the Bayesian approximation of the AIC and BIC criteria as well as on the DIC, the model structure with stratification of the population showed superior performance to the structure without stratification, as can be seen from Table 4.

The results reported and discussed here refer to the posterior inferences obtained for the particular case ModII of the proposed model (5), which fits the data best according to our model comparison criteria and is therefore taken as our working model.

Table 5 shows the estimates of the parameters describing the effects of the covariates (age, gender, and the presence of at least two of three epidemiological risk factors: origin from an endemic region, contact with the vector transmitting Chagas' disease, and a family history of Chagas' disease) on the sensitivity and specificity of each diagnostic test and on disease prevalence in each stratum. The posterior estimates most distant from zero indicate an effect of the covariate, with these estimates being significant when the credible interval (; ) obtained for each parameter does not contain zero.

With respect to the effect of the covariates on the original parameters , we observed a significant effect of all covariates on the specificity of all tests, except for blood culture, which is 100% specific according to expert opinion. Regarding the prevalence of Chagas' disease, a significant effect of all covariates on all strata was observed, except for the age and presence of at least two of three epidemiological risk factors (origin from an endemic region, contact with the vector transmitting the disease, and family history of Chagas' disease) covariates in stratum II (inconclusive serology; see Table 5).

The following covariates exerted nonsignificant effects on the sensitivity of the tests: presence of at least two of three epidemiological risk factors (yes, no) for ELISA, blood culture (HEMO), and rec-ELISA; age group ( ; years) for blood culture and rec-ELISA. On the other hand, gender (male, female) had a significant effect on the sensitivity of all tests (see Table 5).

##### 4.2. Practical Results

Table 6 shows the estimates of the original parameters, that is, the sensitivity and specificity of each test.

The estimates of disease prevalence in each stratum are shown in Table 7. In both tables, the results are presented according to each level of the three analyzed covariates. In summary, the lowest sensitivity was observed for the blood culture test (HEMO) in donors younger than 30 years, and the highest sensitivity was observed for the ELISA in female donors older than 30 years. Despite closely similar nominal values, the sensitivity rates of the serological tests (IIF, IHA, and c-ELISA) were significantly higher in donors older than 30 years, female donors, and donors with presence of at least two of three epidemiological risk factors. In contrast, the blood culture test (HEMO) and the rec-ELISA presented higher sensitivity only for female donors. The ELISA, IIF, IHA, c-ELISA, and rec-ELISA tests were found to be significantly more specific in donors younger than 30 years, male donors, and donors with presence of at least two of three epidemiological risk factors.

Overall, in the present study, sensitivity rates of , and , and specificity of , and were obtained for ELISA, IHA, and IIF tests, respectively. However, analysis of the estimates obtained with the present model according to each covariate level resulted in sensitivity rates of , , and , respectively, for donors younger than years, of , , and for male donors, and of , , and for donors with presence of at least two of three risk factors epidemiological. Specificity rates were , , and , respectively, for donors older than years; , , and for female donors, and , , and for donors without the presence of at least two of three risk factors epidemiological (see Table 6).

With respect to the prevalence of Chagas' disease, an overall estimate of was observed among donors with inconclusive serology in the screening test at the time of blood donation (stratum II). Considering the levels of the covariates, we observed that the prevalence of Chagas' disease was significantly higher in donors older than 30 years, in female donors, and in donors with presence of at least two of three epidemiological risk factors (see Table 7).

Particularly, the estimated rate of chagasic infection of among donors with inconclusive results in the serological screening test (stratum II) indicates that of these donors do not have Chagas' disease. This result is below the rates reported by Furuchó et al. [47] who observed that of donors in the inconclusive group were positive for Chagas' disease, that is, of donors with inconclusive serology in the screening test do not have the disease.

The mean false-positive rate , which estimates the probability of the test being positive given that the individual is healthy and excluding the blood culture test, with a false-positive rate of , was lower for rec-ELISA testing and higher for c-ELISA , whereas the mean false-negative rate , which estimates the probability of a test being negative given that the individual has the disease, was lower for c-ELISA and higher for blood culture . Analysis of the parallel testing scheme, indicated for urgent cases or for quality control as done at blood banks, in which the set of tests is considered to be positive when at least one of the tests presents a positive result, showed lower false-positive rates of and for the sets of serological tests performed using the tests conducted at the Central Laboratory of UFTM (Sit2: ELISA, IIF, IHA) and the Biomanguinhos-FIOCRUZ kit (Sit4: c-ELISA and rec-ELISA), with false-negative rates of and , respectively. For the serial testing scheme, in which the set of tests is considered to be positive only when all tests performed are positive, the false-positive rates were and for the same serological test sets (Sit2 and Sit4), respectively.
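The error rates of the two combined schemes follow directly from the individual sensitivities and specificities when the tests are conditionally independent. The sketch below encodes the two decision rules described above; the function names and the sensitivity and specificity values are hypothetical, not estimates from this study.

```python
from math import prod


def parallel_scheme(sens, spec):
    """Set is positive when at least one test is positive."""
    false_negative = prod(1.0 - se for se in sens)  # every test misses a sick subject
    false_positive = 1.0 - prod(spec)               # some test flags a healthy subject
    return false_negative, false_positive


def serial_scheme(sens, spec):
    """Set is positive only when every test is positive."""
    false_negative = 1.0 - prod(sens)               # some test misses a sick subject
    false_positive = prod(1.0 - sp for sp in spec)  # every test flags a healthy subject
    return false_negative, false_positive


# Hypothetical performance of two tests.
sens, spec = [0.99, 0.98], [0.97, 0.99]
print(parallel_scheme(sens, spec))
print(serial_scheme(sens, spec))
```

The parallel rule trades a lower false-negative rate for a higher false-positive rate, and the serial rule does the opposite, which is why the former suits screening and the latter suits confirmation.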

Overall analysis showed that the probability of predicting that an individual is truly infected, given each combination of the results of the six tests, is for each of the two strata whenever the blood culture result is positive, irrespective of the results of the other tests, since blood culture is 100% specific. When three tests are positive and three tests are negative, excluding the positive blood culture result, the mean probability of predicting that an individual is truly infected is for stratum II.
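The probability that an individual is truly infected given a combination of test results follows from Bayes' theorem. The sketch below assumes conditional independence of the tests and uses hypothetical prevalence, sensitivity, and specificity values; it illustrates why a positive result on a perfectly specific test forces the probability to 1 and how strongly the result depends on the prevalence of the stratum.

```python
def prob_infected(pattern, prevalence, sens, spec):
    """P(infected | test results) by Bayes' theorem, assuming the tests
    are conditionally independent given the true health status."""
    p_sick, p_healthy = prevalence, 1.0 - prevalence
    for t, se, sp in zip(pattern, sens, spec):
        p_sick *= se if t else 1.0 - se
        p_healthy *= 1.0 - sp if t else sp
    return p_sick / (p_sick + p_healthy)


# Second test is perfectly specific: a positive result rules out a healthy subject.
certain = prob_infected((0, 1), 0.1, [0.9, 0.5], [0.9, 1.0])

# Same single positive result, two different stratum prevalences.
high_prev = prob_infected((1,), 0.5, [0.9], [0.9])
low_prev = prob_infected((1,), 0.1, [0.9], [0.9])
print(certain, round(high_prev, 3), round(low_prev, 3))
```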

#### 5. Final Comments

To the best of our knowledge, this is the first study proposing a general Bayesian latent class model such as model (5), based on MCMC algorithms, for the evaluation of the performance of conditionally independent tests considering the general case of covariates and strata according to the assumption of the model proposed by Hui and Walter [24]. The true health condition of the subject (sick or healthy) in the stratum is determined by the latent variable with a Bernoulli distribution instead of a gold standard, which is not available in some cases, such as Chagas' disease.

The proposed general model is very flexible and includes all possible configurations of tests, covariates, and strata as particular cases. In particular, it includes the models proposed by Hui and Walter [24], Joseph et al. [4], and Martinez et al. [8, 23].

Our simulation study shows, based on the information criteria AIC, BIC, and DIC, superior performance of the proposed overall structure in comparison with its particular cases (without stratification and/or without covariates).

ModII fitted the Chagas' disease dataset better than ModI and ModIII, corroborating the initial idea of considering a two-stratum structure (SI: negative or positive serology; SII: inconclusive serology) instead of three strata (SI: negative; SII: positive; SIII: inconclusive serology) or no stratification (SI: negative, positive, or inconclusive serology).

According to Swartz et al., a strong correlation between posterior parameters may impair convergence of the algorithm within a reasonable computation time, a fact not observed for the present modeling fitted to the Chagas' data discussed here [48].

Although the latent class structure is not new in studies on diagnostic evaluation in the absence of a gold standard, in the specific case of screening tests for Chagas' disease among blood donors, there is a lack of studies in the literature regarding more elaborate models to estimate the sensitivity and specificity of diagnostic tests and disease prevalence. In most cases, the proposed models estimate these parameters individually for each test and/or consider an imperfect test as the gold standard.

The performance parameters of the tests, such as sensitivity and specificity, reached values close to depending on the level of some covariates, a finding supporting the need for a model structure containing covariates to evaluate the performance of multiple conditionally independent diagnostic tests. In particular, the technique of stratifying the population according to different disease prevalence rates adds little complexity to the model but makes the model more flexible and comprehensive. It should be emphasized that the probability of predicting that an individual is truly infected or healthy is strongly influenced by disease prevalence rates and may differ widely between strata, as observed in the present study. Thus, the lack of stratification in the model may lead to invalid estimates of these important predictors of the presence or absence of a disease given the results of the combinations of the tests used, such as in the diagnosis of Chagas' disease in blood donors.

The probability of predicting that an individual is truly infected provides useful information that contributes to decisions regarding the true health status of the subject based on the combination of multiple diagnostic tests under evaluation. In the specific case of the diagnosis of Chagas' disease in blood donors, when the results of the five serological tests are combined, repetition of the ELISA used at the time of blood donation plus the four other serological tests (IIF, IHA, c-ELISA, rec-ELISA) resulted in a probability of to identify a truly chagasic subject as long as at least four of these tests present positive results in stratum II.

When the assumption of conditional independence is violated, that is, when the tests are correlated, bias may occur in the estimates of the performance parameters of the diagnostic tests, as demonstrated by Thibodeau [49] and Vacek [50]. However, according to Georgiadis et al., the model with a conditional independence structure yields estimates closely similar to those of a conditional dependence model when the performance parameters of the tests are close to 100%, even in the presence of a moderate-to-strong correlation between pairs of tests [51]. A correlated testing framework based on the proposed model should be considered in future work.

According to Forman and Engel et al., a latent class model can present weak identifiability if the sample size is small in relation to the number of latent classes, with the number of individuals being insufficient to attribute an element to each class and, consequently, to estimate its probabilities. Although we have not faced identifiability problems in our modeling, the problem of weak or absent identifiability should be the subject of future research in the context of the proposed model [34, 52].

#### Acknowledgments

We thank the Brazilian funding agencies Capes and CNPq for financial support.