Abstract

We introduce a new estimator of the odds ratio for rare events, based on the Empirical Bayes (EB) method for two independent binomial distributions. We compare the proposed estimator with two existing estimators, the modified maximum likelihood estimator (MMLE) and the modified median unbiased estimator (MMUE), using the Estimated Relative Error (ERE) as the criterion of comparison. The new estimator is found to be more efficient than the other two.

1. Introduction

The odds ratio is a measure of association between two independent groups on a categorical response with two possible outcomes, success and failure. The two independent groups can be two treatment groups or treatment and control groups. The odds ratio is widely used in many fields of medical and social science research. It is most commonly used in epidemiology to express the results of some clinical trials, such as in case-control studies.

The number of subjects in each group falling in each category can be summarized in a two-way contingency table. The total numbers of subjects in group 1 and group 2 are $n_1$ and $n_2$, which are assumed to be fixed. The numbers of successes in group 1 and group 2 are $X_1$ and $X_2$, which are considered as independent binomial random variables. Let $p_1$ and $p_2$ be the probabilities of success in group 1 and group 2, respectively. The odds of success in group 1 are defined to be $p_1/(1-p_1)$, and similarly for group 2. The usual maximum likelihood estimator of the odds ratio is defined as

$$\widehat{OR} = \frac{X_1 (n_2 - X_2)}{X_2 (n_1 - X_1)}. \tag{1}$$

The odds ratio is a nonnegative real value. When the odds of success are the same in both groups, the odds ratio is equal to 1, meaning that the response is independent of the group. When the odds of a positive response are higher in group 1 than in group 2, the odds ratio is greater than 1, and vice versa for a value less than 1. The farther the odds ratio is from 1 in a given direction, the stronger the association. In addition, its sampling distribution is highly skewed. The natural logarithm of the sample odds ratio, which is less skewed, is often used for inference. However, the estimated odds ratio can be zero (if a zero cell count appears in the numerator of (1)), infinite (if a zero cell count appears in the denominator of (1)), or undefined (if there are zero cell counts in both the numerator and the denominator of (1)). Haldane [1] and Gart and Zweifel [2] suggested adding a correction term of 0.5 to each cell when a zero cell count occurs, which gives the modified maximum likelihood estimator (MMLE)

$$\widehat{OR}_{MMLE} = \frac{(X_1 + 0.5)(n_2 - X_2 + 0.5)}{(X_2 + 0.5)(n_1 - X_1 + 0.5)}. \tag{2}$$

Even though $\widehat{OR}_{MMLE}$ always lies strictly between 0 and infinity, some investigators have discouraged adding 0.5 to each cell because it gives the appearance of adding “fake data”; see Bishop et al. [3] and Agresti and Yang [4]. Amid this controversy, several alternatives to the modified maximum likelihood estimator have been proposed. Hirji et al. [5] proposed the median unbiased estimator (MUE) of the odds ratio, obtained from the conditional noncentral hypergeometric distribution.
However, the median unbiased estimator of the odds ratio is still undefined for certain boundary outcomes of the observed counts. Parzen et al. [6] proposed an estimator of the odds ratio based on the MUE, called the modified median unbiased estimator (MMUE), for which the estimated probability of success always lies in the interval $(0, 1)$, even if a group has no successes or only successes. Consequently, the estimated odds ratio always lies strictly between 0 and infinity. Additionally, this method performs well with respect to bias in small samples and is an alternative to adding “fake data.”
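The behaviour of the usual estimator with zero cells, and the effect of the 0.5 correction, can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and chosen only for illustration. For simplicity the sketch adds the correction term unconditionally, whereas the text describes it as a remedy for tables that contain a zero cell.

```python
import math

def odds_ratio_mle(x1, n1, x2, n2):
    """Usual estimate x1*(n2 - x2) / (x2*(n1 - x1)).

    Returns 0.0, math.inf, or math.nan when zero cells make the
    ratio zero, infinite, or undefined.
    """
    num = x1 * (n2 - x2)
    den = x2 * (n1 - x1)
    if den == 0:
        return math.nan if num == 0 else math.inf
    return num / den

def odds_ratio_mmle(x1, n1, x2, n2, c=0.5):
    """Corrected estimate: add c (0.5 by default) to every cell."""
    return ((x1 + c) * (n2 - x2 + c)) / ((x2 + c) * (n1 - x1 + c))

# Hypothetical counts: successes out of 20 subjects per group.
print(odds_ratio_mle(0, 20, 3, 20))   # 0.0 (zero cell in the numerator)
print(odds_ratio_mle(2, 20, 0, 20))   # inf (zero cell in the denominator)
print(odds_ratio_mle(0, 20, 0, 20))   # nan (zero cells in both)
print(odds_ratio_mmle(0, 20, 0, 20))  # 1.0 (finite and positive)
```

The corrected estimate stays finite and positive in all three boundary cases, which is the property the correction is meant to secure.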

In this paper, we focus on “rare events,” that is, events of interest for which zero or small counts are observed within a given time period or a given sample, such as natural disasters or certain diseases. As mentioned above, rare events make estimation of the odds ratio difficult because zeros or small observed counts occur in the numerator, in the denominator, or in both, resulting in a large standard error and therefore a less precise confidence interval. Only a rough estimate of the odds ratio is thus obtained. Association between categorical variables in contingency tables has long been studied using both classical and Bayesian approaches. Good [7] studied, at an early stage, the association factor in large contingency tables with small entries, assuming log-normal and Pearson type III distributions. The author also mentioned that these assumptions may be less accurate but are easy to handle. Fisher [8] estimated the odds ratio based on the hypergeometric distribution, using an exact method in a $2 \times 2$ table. Thomas and Gart [9] constructed a table of 95% confidence limits for differences and ratios of two proportions, including the odds ratio, and of one-tailed $p$-values for the Fisher-Irwin exact test in various types of $2 \times 2$ table. Altham [10] studied association and the exact $p$-value in a $2 \times 2$ contingency table based on cumulative posterior probabilities, which were not easy to extract. Nurminen and Mutanen [11] proposed a Bayesian approach for the estimation of the difference between two proportions, the risk ratio, and the odds ratio, using independent beta priors, and provided integral expressions for the cumulative posterior distribution. They also applied the proposed method to real data on malignant lymphoma and colon cancer cases exposed to phenoxy acids and chlorophenols in agriculture. Nouri et al. [12] presented the estimation of the odds ratio in $2 \times 2$ tables when exposure was misclassified. They compared the matrix and inverse matrix methods to the MLE method in a simulation study and found that the inverse matrix method, which has a closed form, was more efficient than the matrix method.

As previously mentioned, measures of association in a two-way contingency table can be estimated using classical or Bayesian approaches. The exact distribution in the classical approach is, however, rather difficult to handle mathematically. In the Bayesian approach, where prior belief is incorporated into the derivation of the posterior density, the hyperparameters characterizing the prior density are often unknown to researchers and need to be assessed irrespective of the current data. How to choose them remains controversial. Alternatively, the Empirical Bayes method estimates the unknown hyperparameters from the current data, in contrast to the fully Bayesian approach. We therefore use the Empirical Bayes method to estimate the odds ratio in a two-way contingency table, focusing on small proportions of success. Our proposed estimator tends to outperform the traditional estimators, MMLE and MMUE, without altering the original data.

The rest of this paper is organized as follows. The next section discusses the modified median unbiased estimator. The third section describes odds ratio estimation using the EB method. The fourth section presents simulation results and compares the efficiency of EB with that of MMLE and MMUE. The fifth section applies our method to real data. The final section gives our conclusion.

2. The Modified Median Unbiased Estimator of Odds Ratio

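Parzen et al. [6] suggested the modified median unbiased estimator (MMUE) for two independent binomial distributions. Let $\tilde{p}_i$ be the estimator of the success probability $p_i$ which satisfies

$$P(X_i \le x_i \mid p_i = \tilde{p}_i) \ge 0.5 \quad \text{and} \quad P(X_i \ge x_i \mid p_i = \tilde{p}_i) \ge 0.5.$$

To obtain $\tilde{p}_i$, they use the binomial distribution, $X_i \sim \mathrm{Bin}(n_i, p_i)$, where $X_i$ denotes the random variable representing the number of successes in group $i$, $i = 1, 2$. Let $x_i$ be the observed value of $X_i$. The MMUE can be computed from the distribution of the sufficient statistic for binomial data.

Compute $p_{iL}$ and $p_{iU}$ as the smallest and largest values of $p_i$, respectively, for which both inequalities above hold. Then the MMUE is defined as

$$\tilde{p}_i = \frac{p_{iL} + p_{iU}}{2}.$$

When $0 < x_i < n_i$, we can find values of $p_{iL}$ and $p_{iU}$ which satisfy

$$P(X_i \ge x_i \mid p_{iL}) = 0.5 \quad \text{and} \quad P(X_i \le x_i \mid p_{iU}) = 0.5.$$

Then, solve $p_{iL}$ from

$$\sum_{k=x_i}^{n_i} \binom{n_i}{k} p_{iL}^{k} (1 - p_{iL})^{n_i - k} = 0.5$$

and solve $p_{iU}$ from

$$\sum_{k=0}^{x_i} \binom{n_i}{k} p_{iU}^{k} (1 - p_{iU})^{n_i - k} = 0.5.$$

The values of $p_{iL}$ and $p_{iU}$ can be obtained in practice by using the relationship between the cumulative beta distribution and the cumulative binomial distribution function as follows (Daly [13] and Johnson et al. [14]).

Let $X \sim \mathrm{Bin}(n, p)$ and $1 \le x \le n$:

$$P(X \ge x) = \sum_{k=x}^{n} \binom{n}{k} p^{k} (1 - p)^{n - k} = \frac{1}{B(x, n - x + 1)} \int_0^{p} t^{x - 1} (1 - t)^{n - x} \, dt.$$

We need to find $p_{iL}$ and $p_{iU}$ such that the two tail probabilities above equal 0.5. In particular,

$$p_{iL} = B_{0.5}(x_i, n_i - x_i + 1), \qquad p_{iU} = B_{0.5}(x_i + 1, n_i - x_i),$$

where $B_{0.5}(a, b)$ is the 0.5 quantile of the beta distribution with parameters $a$ and $b$.

Now suppose $x_i = 0$; then $P(X_i \ge 0 \mid p_i) = 1$ for every $p_i$. Any value of $p_i$ in the interval $[0, p_{iU}]$ satisfies

$$P(X_i \le 0 \mid p_i) = (1 - p_i)^{n_i} \ge 0.5,$$

where $p_{iL} = 0$ is the smallest possible value of $p_i$.

Similarly, when $x_i = 0$, $p_{iU}$ satisfies $(1 - p_{iU})^{n_i} = 0.5$. Consequently, $p_{iU} = 1 - 0.5^{1/n_i}$, and $\tilde{p}_i$ equals $(1 - 0.5^{1/n_i})/2$. Similarly, when $x_i = n_i$, $p_{iU} = 1$ is the largest possible value of $p_i$, and $p_{iL}$ satisfies $p_{iL}^{n_i} = 0.5$, so that $p_{iL} = 0.5^{1/n_i}$ and $\tilde{p}_i = (1 + 0.5^{1/n_i})/2$.

Then, the MMUE of the odds ratio is defined as

$$\widehat{OR}_{MMUE} = \frac{\tilde{p}_1 / (1 - \tilde{p}_1)}{\tilde{p}_2 / (1 - \tilde{p}_2)},$$

where $\tilde{p}_1$ and $\tilde{p}_2$ denote the success probability estimators in groups 1 and 2, respectively.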
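The MMUE construction can be sketched in Python using only the standard library. This is a minimal illustration, not the authors' implementation: instead of beta quantiles it solves the two binomial tail equations directly by bisection, and the names `half_point`, `mmue_probability`, and `odds_ratio_mmue` are hypothetical. It relies on the identity $P(X \le x) = 1 - P(X \ge x + 1)$, so both endpoints come from the same upper-tail equation.

```python
import math

def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(x, n + 1))

def half_point(x, n, tol=1e-12):
    """Solve P(X >= x | p) = 0.5 for p by bisection (needs 1 <= x <= n).

    The upper tail is increasing in p, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_upper_tail(x, n, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mmue_probability(x, n):
    """Midpoint of the interval of median unbiased values of p."""
    p_lower = 0.0 if x == 0 else half_point(x, n)
    # P(X <= x | p) = 0.5 is the same as P(X >= x + 1 | p) = 0.5.
    p_upper = 1.0 if x == n else half_point(x + 1, n)
    return (p_lower + p_upper) / 2

def odds_ratio_mmue(x1, n1, x2, n2):
    """Odds ratio built from the two MMUE success probabilities."""
    p1 = mmue_probability(x1, n1)
    p2 = mmue_probability(x2, n2)
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Hypothetical counts: no successes in group 1, three in group 2.
print(odds_ratio_mmue(0, 20, 3, 20))
```

Because the estimated probabilities never reach 0 or 1, the resulting odds ratio is finite and positive even when a group has no successes or only successes.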

3. Proposed Estimation of Odds Ratio

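In this section, we propose a new method for odds ratio estimation using the Empirical Bayes method for two independent binomial distributions. Let $X_1$ and $X_2$ be independent binomial random variables with equal or unequal sample sizes and unknown success probabilities, $X_1 \sim \mathrm{Bin}(n_1, p_1)$ and $X_2 \sim \mathrm{Bin}(n_2, p_2)$, where $n_1, n_2$ and $p_1, p_2$ denote the two sample sizes and the two unknown success probabilities. Adopt informative priors on $p_i$, $p_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$, $i = 1, 2$, where $\alpha_i$ and $\beta_i$ denote unknown hyperparameters. The hyperparameters can be estimated from the marginal distribution function of $X_i$, obtained as follows:

$$m(x_i \mid \alpha_i, \beta_i) = \int_0^1 \binom{n_i}{x_i} p_i^{x_i} (1 - p_i)^{n_i - x_i} \, \frac{p_i^{\alpha_i - 1} (1 - p_i)^{\beta_i - 1}}{B(\alpha_i, \beta_i)} \, dp_i = \binom{n_i}{x_i} \frac{B(x_i + \alpha_i, n_i - x_i + \beta_i)}{B(\alpha_i, \beta_i)}.$$

Consequently, the marginal distribution function of $X_i$ is the beta-binomial distribution (BBD).

Then, both hyperparameters in each group can be estimated by the maximum likelihood method applied to the likelihood function of this marginal distribution. The likelihood equations are nonlinear, so the maximum likelihood estimators $\hat{\alpha}_i$ and $\hat{\beta}_i$ are obtained with the Newton-Raphson method, iterating

$$\hat{\theta}_i^{(t+1)} = \hat{\theta}_i^{(t)} - H^{-1}\big(\hat{\theta}_i^{(t)}\big) \, g\big(\hat{\theta}_i^{(t)}\big),$$

where $\theta_i = (\alpha_i, \beta_i)^{T}$ and $g$ and $H$ denote the gradient vector and the Hessian matrix of the log-likelihood. The moment estimators of the hyperparameters of the beta-binomial distribution are used as initial values; see Minka [15].

The posterior distribution function of $p_i$ is thus calculated, yielding

$$\pi(p_i \mid x_i) = \frac{p_i^{x_i + \alpha_i - 1} (1 - p_i)^{n_i - x_i + \beta_i - 1}}{B(x_i + \alpha_i, n_i - x_i + \beta_i)},$$

that is, $p_i \mid x_i \sim \mathrm{Beta}(x_i + \alpha_i, n_i - x_i + \beta_i)$. Substituting the estimators of $\alpha_i$ and $\beta_i$, we obtain the estimated posterior $\mathrm{Beta}(x_i + \hat{\alpha}_i, n_i - x_i + \hat{\beta}_i)$. Let $\hat{p}_1$ and $\hat{p}_2$ be the estimators of $p_1$ and $p_2$, respectively, obtained from these estimated posterior distributions. Thus, the EB estimator of the odds ratio can be obtained as follows:

$$\widehat{OR}_{EB} = \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_2 / (1 - \hat{p}_2)},$$

where $\hat{p}_1$ and $\hat{p}_2$ denote the success probability estimators in groups 1 and 2, respectively.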
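The beta-binomial marginal and the final plug-in step can be sketched in Python as follows. Several things here are assumptions rather than the authors' procedure: the hyperparameter estimates are passed in as given (the Newton-Raphson fit is not shown), the posterior mean is used as the point estimate of each success probability, and all function names and example values are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x, n, alpha, beta):
    """Marginal pmf of x when p ~ Beta(alpha, beta) and x | p ~ Bin(n, p)."""
    log_pmf = (math.log(math.comb(n, x))
               + log_beta(x + alpha, n - x + beta)
               - log_beta(alpha, beta))
    return math.exp(log_pmf)

def posterior_mean(x, n, alpha, beta):
    """Mean of the posterior Beta(x + alpha, n - x + beta).

    ASSUMPTION: the posterior mean is used as the point estimate of p.
    """
    return (x + alpha) / (n + alpha + beta)

def odds_ratio_eb(x1, n1, x2, n2, hyper1, hyper2):
    """Plug-in odds ratio from the two posterior-mean estimates.

    hyper1 and hyper2 are (alpha, beta) pairs, assumed to have been
    estimated beforehand from the beta-binomial likelihood.
    """
    p1 = posterior_mean(x1, n1, *hyper1)
    p2 = posterior_mean(x2, n2, *hyper2)
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Hypothetical counts and hyperparameter values, for illustration only.
print(odds_ratio_eb(0, 30, 4, 30, (1.0, 1.0), (1.0, 1.0)))
```

With positive hyperparameters the posterior mean lies strictly inside $(0, 1)$, so this plug-in odds ratio is defined even when a group has zero successes, and no counts are added to the observed table.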

4. Simulation Study for MMLE, MMUE, and EB Method

Simulation studies were carried out in R (version 3.2.0) [16] to assess the efficiency of the EB method in comparison with the two existing methods. Binomial data are generated with both equal and unequal sample sizes and with several small probabilities of success in group 1, $p_1$, including 0.15. For each value of $p_1$, the probability of success in group 2, $p_2$, is varied over small values including 0.01, 0.03, 0.05, and 0.1. Each situation is repeated 5,000 times after removing the first 1,000 iterations (1,000 burn-ins). The efficiency of the proposed estimator is evaluated using the Estimated Relative Error (ERE), which is computed from the usual maximum likelihood estimator of the odds ratio and the estimate of the odds ratio obtained using EB, MMLE, or MMUE.
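The structure of such a simulation can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of the study: the exact ERE formula is not available here, so the sketch assumes a relative error of the form |estimate − reference| / reference; it uses the true odds ratio as the reference, because the usual MLE can be zero or infinite for rare events; it evaluates only the 0.5-corrected estimator; and the sample sizes, probabilities, and function names are hypothetical.

```python
import random

def mmle(x1, n1, x2, n2):
    """Odds ratio estimate with 0.5 added to every cell."""
    return ((x1 + 0.5) * (n2 - x2 + 0.5)) / ((x2 + 0.5) * (n1 - x1 + 0.5))

def relative_error(estimate, reference):
    """ASSUMED form of a relative error: |estimate - reference| / reference."""
    return abs(estimate - reference) / reference

def simulate_mean_relative_error(n1, n2, p1, p2, reps=5000, seed=1):
    """Average relative error of the corrected estimator over reps data sets."""
    rng = random.Random(seed)
    # ASSUMPTION: the true odds ratio is used as the reference value.
    true_or = (p1 / (1 - p1)) / (p2 / (1 - p2))
    total = 0.0
    for _ in range(reps):
        # Draw each binomial count as a sum of Bernoulli trials.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        total += relative_error(mmle(x1, n1, x2, n2), true_or)
    return total / reps

# Hypothetical setting: 30 subjects per group, rare successes.
print(simulate_mean_relative_error(30, 30, 0.05, 0.1, reps=1000))
```

Replacing `mmle` with another estimator function of the same signature would give a like-for-like comparison under the same assumed criterion.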

The odds ratio estimates for the sample sizes considered are given in Tables 1–3. The performance of the estimators, measured by ERE, is given in Tables 4–6 and shown graphically in Figure 1; the other cases give similar results. The EB method yields the smallest ERE in 78.67% of the simulated situations, while the MMLE and MMUE methods yield the smallest ERE in only 6.67% and 14.66%, respectively.

5. Illustrative Examples Using Real Data

Our first example is taken from the studies of Good [7] and Hardell [17]. Table 7 shows the malignant lymphoma and colon cancer cases and controls with brief exposure to phenoxy acids in agriculture and forestry, together with the true odds ratios and their estimates using EB, MMLE, and MMUE. For the counts observed among cases and controls, the estimate of the odds ratio using the EB method yields the smallest ERE, 0.5523, while the MMLE and MMUE methods give EREs of 1.2805 and 4.1483, respectively.

The second example is taken from the study of Perondi et al. [18], which compared high-dose epinephrine with standard-dose epinephrine in children with cardiac arrest, with 34 children in each treatment group. Table 7 shows these data together with the true odds ratios and their estimates using EB, MMLE, and MMUE. The outcome measure was survival at 24 hours. For the counts observed in the high-dose and standard-dose groups, the estimate of the odds ratio using the EB method yields the smallest ERE, 5.2097, while the MMUE and MMLE methods give EREs of 15.5305 and 40.4643, respectively.

6. Conclusion

Based on the simulation study of odds ratio estimation for rare events with two independent binomial samples, the proposed method performs rather well. The EB estimator of the odds ratio is also more efficient than the other two estimators, MMLE and MMUE. In addition, our proposed estimator is an alternative to the MMLE method for odds ratio estimation that does not disturb the original data.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors are grateful to the Graduate College, King Mongkut’s University of Technology North Bangkok, for the financial support.