Estimation of Log-Linear-Binomial Distribution with Applications

Habib, Elsayed Ali

doi:https://doi.org/10.1155/2010/423654

Journal of Probability and Statistics


Research Article | Open Access

Volume 2010 | Article ID 423654 | https://doi.org/10.1155/2010/423654


Academic Editor: Daniel Zelterman

Received 14 Apr 2010

Revised 21 Aug 2010

Accepted 23 Nov 2010

Published 26 Dec 2010

Abstract

The log-linear-binomial distribution was introduced to describe the behavior of the sum of dependent Bernoulli random variables. It generalizes the binomial distribution and allows the construction of a broad class of distributions. In this paper, we consider the problem of estimating the two parameters of the log-linear-binomial distribution by the method of moments and by maximum likelihood. The distribution is used to fit genetic data and to obtain the sampling distribution of the sign test under dependence among trials.

1. Introduction

Over the last three decades, a growing literature has generalized the classical discrete distributions. The main idea is to use the extended versions to model different kinds of dependent count or frequency data in various fields; see Johnson et al. [1], Bowman and George [2], George and Bowman [3], and Yu and Zelterman [4, 5]. Failing to account for correlation in the data reduces the precision of binomial-based estimates; see, for example, Kolev and Paiva [6].

As a generalization of the binomial distribution, Lovison [7] derived the distribution of the sum of dependent Bernoulli random variables from Cox's log-linear representation [9] of the joint distribution of dependent binary responses, as an alternative representation of Altham's multiplicative-binomial distribution [8]; we call it the log-linear-binomial distribution. The distribution has two parameters and covers a wider range of distributions than the binomial: it includes underdispersed and overdispersed models, with the binomial distribution as a special case.

In this paper, the method of moments and maximum likelihood are used to estimate the two parameters of Lovison's log-linear-binomial distribution, and the variance-covariance matrix of the estimated parameters is obtained. The log-linear-binomial distribution is also used to fit genetic data and to obtain the sampling distribution of the sign test under dependence among trials.

In Section 2, we define Lovison's log-linear-binomial distribution. Moment and maximum likelihood estimators of its parameters are derived in Section 3. Ungrouped data are treated in Section 4 as a special case. Two applications are given in Section 5.

2. Lovison's Log-Linear-Binomial Distribution

Consider the random vector $\mathbf{X} = (X_1, \ldots, X_n)$, $X_i$ being a binary response that records whether some event of interest is present ("1") or absent ("0") for a sample of $n$ units, and let $Y = \sum_{i=1}^{n} X_i$ denote the sample frequency of successes. To accommodate possible dependence among $X_1, \ldots, X_n$, and assuming that the units are exchangeable, Lovison [7] obtained the distribution of $Y$ as
$$P(Y = y) = \frac{\binom{n}{y}\,\psi^{y}\,\omega^{y(n-y)}}{\sum_{k=0}^{n}\binom{n}{k}\,\psi^{k}\,\omega^{k(n-k)}}, \qquad y = 0, 1, \ldots, n,$$
where $\psi > 0$ and $\omega > 0$ are the parameters; for more details about this distribution, see Lovison [7]. This distribution provides a wider range of shapes than the binomial distribution; for example, Figures 1 and 2 show the distribution of $Y$ for fixed $n$ and $\psi$ and different values of $\omega$. For $\omega > 1$, the distribution is sharper in the middle than the binomial.

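Figure 1: Distribution of $Y$ for different parameter values, panels (a) to (d).

Figure 2: Distribution of $Y$ for different parameter values, panels (a) to (d).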

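As the figures show, depending on the value of $\omega$, the distribution can take a variety of shapes, including bimodal and unimodal ones. The expected value and the variance of $Y$ are given by
$$E(Y) = \sum_{y=0}^{n} y\,P(Y = y), \qquad \operatorname{Var}(Y) = \sum_{y=0}^{n} y^{2}\,P(Y = y) - [E(Y)]^{2}.$$
Because the units are exchangeable, the variance can also be written as
$$\operatorname{Var}(Y) = np(1-p) + n(n-1)\operatorname{Cov}(X_i, X_j), \qquad i \neq j,$$
where $p = P(X_i = 1) = E(Y)/n$ and Cov stands for covariance. The first term is the variance of a binomial distribution with the same success probability $p$. Therefore, the variance of the log-linear-binomial distribution equals the binomial variance when $\operatorname{Cov}(X_i, X_j) = 0$ (in particular when $\omega = 1$, where the $X_i$ are independent with $p = \psi/(1+\psi)$), exceeds it when the covariance is positive (overdispersion), and falls below it when the covariance is negative (underdispersion).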
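As a rough numerical illustration, the Python sketch below evaluates the probability function, mean, and variance of $Y$ by direct summation, assuming $P(Y = y) \propto \binom{n}{y}\psi^{y}\omega^{y(n-y)}$. The helper names `llb_pmf` and `llb_mean_var` are our own and the parameter values in the loop are arbitrary; the loop prints the variance next to the binomial variance $np(1-p)$ with $p = E(Y)/n$, so the comparison above can be checked for specific values.

```python
from math import comb


def llb_pmf(n, psi, omega):
    """P(Y = y), y = 0..n, assuming weights binom(n, y) * psi**y * omega**(y*(n-y))
    divided by their sum (the normalizing constant)."""
    weights = [comb(n, y) * psi**y * omega ** (y * (n - y)) for y in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def llb_mean_var(n, psi, omega):
    """Mean and variance of Y computed directly from the probabilities."""
    probs = llb_pmf(n, psi, omega)
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean**2


# Compare with the binomial variance n*p*(1 - p) at the same mean.
n, psi = 10, 1.5
for omega in (0.9, 1.0, 1.1):
    mean, var = llb_mean_var(n, psi, omega)
    p = mean / n
    print(f"omega={omega}: mean={mean:.3f}, var={var:.3f}, "
          f"binomial var={n * p * (1 - p):.3f}")
```

Direct evaluation of `omega ** (y * (n - y))` can overflow for large $n$; computing the weights on the log scale avoids this.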

The expected value and the variance of $Y$ are nonlinear functions of both $\psi$ and $\omega$; for example, Figures 3 and 4 depict the nonlinearity of the variance of $Y$.

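Figure 3: Variance of $Y$ as a function of the parameters, panels (a) to (d).

Figure 4: Variance of $Y$ as a function of the parameters, panels (a) to (d).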

3. Estimation of the Parameters

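Suppose a random sample of $N$ sets of $n$ trials each is available, the number of sets giving $y$ successes being $f_y$, $y = 0, 1, \ldots, n$, with $\sum_{y=0}^{n} f_y = N$.

3.1. Method of Moments

We can use the first two sample moments to find moment estimates of $\psi$ and $\omega$ as follows. The first two sample moments are
$$m_1 = \frac{1}{N}\sum_{y=0}^{n} y f_y, \qquad m_2 = \frac{1}{N}\sum_{y=0}^{n} y^{2} f_y.$$
Equating these sample moments to the corresponding population moments, we obtain the estimates of $\psi$ and $\omega$ by solving the two equations
$$\sum_{y=0}^{n} y\,P(Y = y) = m_1, \qquad \sum_{y=0}^{n} y^{2}\,P(Y = y) = m_2,$$
where $P(Y = y)$ is the log-linear-binomial probability function of Section 2. These equations have no closed-form solution and must be solved numerically, for example with the nonlinear equation solver of the nleqslv package in R. The author has an R program for finding the moment estimates, available upon request.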
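The moment equations can be solved with any nonlinear solver; the sketch below uses Newton's method in $(\log\psi, \log\omega)$ as a stand-in for nleqslv, not a reproduction of the author's R program. It assumes $P(Y = y) \propto \binom{n}{y}\psi^{y}\omega^{y(n-y)}$ and uses the fact that, since $y(n-y) = ny - y^{2}$, matching $E(Y)$ and $E(Y^{2})$ is the same as matching $E(Y)$ and $E\{Y(n-Y)\}$, whose Jacobian with respect to $(\log\psi, \log\omega)$ is the covariance matrix of $Y$ and $Y(n-Y)$ under that form. The function names and the simple step cap are illustrative choices of ours.

```python
from math import comb, exp, log


def llb_pmf(n, psi, omega):
    """Assumed pmf binom(n, y) psi^y omega^(y(n-y)), normalized on the log scale."""
    logw = [log(comb(n, y)) + y * log(psi) + y * (n - y) * log(omega)
            for y in range(n + 1)]
    top = max(logw)
    w = [exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def llb_moment_estimates(freqs, tol=1e-10, max_iter=100):
    """Solve E(Y) = m1 and E(Y^2) = m2 for (psi, omega) by Newton's method.

    freqs[y] is the number of sets with y successes, y = 0..n (needs n >= 2
    and 0 < m1 < n). Iterates on a = log(psi), b = log(omega).
    """
    n = len(freqs) - 1
    N = sum(freqs)
    m1 = sum(y * f for y, f in enumerate(freqs)) / N
    m2 = sum(y * y * f for y, f in enumerate(freqs)) / N
    target_w = n * m1 - m2            # sample mean of Y(n - Y)
    a, b = log(m1 / (n - m1)), 0.0    # binomial starting values (omega = 1)
    for _ in range(max_iter):
        p = llb_pmf(n, exp(a), exp(b))
        ey = sum(y * q for y, q in enumerate(p))
        ew = sum(y * (n - y) * q for y, q in enumerate(p))
        vyy = sum((y - ey) ** 2 * q for y, q in enumerate(p))
        vww = sum((y * (n - y) - ew) ** 2 * q for y, q in enumerate(p))
        cyw = sum((y - ey) * (y * (n - y) - ew) * q for y, q in enumerate(p))
        gy, gw = m1 - ey, target_w - ew
        det = vyy * vww - cyw**2
        da = (vww * gy - cyw * gw) / det  # Newton step: solve Cov * step = gap
        db = (vyy * gw - cyw * gy) / det
        step = max(abs(da), abs(db))
        if step > 1.0:                    # crude damping of large steps
            da, db = da / step, db / step
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return exp(a), exp(b)


# Frequencies equal to their expectations under psi = 1.3, omega = 0.8.
freqs = [1000 * q for q in llb_pmf(8, 1.3, 0.8)]
print(llb_moment_estimates(freqs))
```

Because the input frequencies in the usage line are exactly the expected ones, the printed estimates should be $\psi = 1.3$ and $\omega = 0.8$ up to rounding.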

3.2. Method of Maximum Likelihood

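The method of maximum likelihood provides estimators that have a reasonable intuitive basis and many desirable statistical properties. The likelihood of the sample can be written as
$$L(\psi, \omega) = \prod_{y=0}^{n} [P(Y = y)]^{f_y},$$
and its logarithm is
$$\log L = \sum_{y=0}^{n} f_y \log\binom{n}{y} + \log\psi \sum_{y=0}^{n} y f_y + \log\omega \sum_{y=0}^{n} y(n-y) f_y - N \log D(\psi, \omega),$$
where $D(\psi, \omega) = \sum_{k=0}^{n} \binom{n}{k}\psi^{k}\omega^{k(n-k)}$ is the normalizing constant of the distribution of Section 2. The first partial derivatives with respect to $\psi$ and $\omega$ are
$$\frac{\partial \log L}{\partial \psi} = \frac{1}{\psi}\sum_{y=0}^{n} y f_y - \frac{N}{D}\frac{\partial D}{\partial \psi}, \qquad \frac{\partial \log L}{\partial \omega} = \frac{1}{\omega}\sum_{y=0}^{n} y(n-y) f_y - \frac{N}{D}\frac{\partial D}{\partial \omega}.$$
Since
$$\frac{1}{D}\frac{\partial D}{\partial \psi} = \frac{E(Y)}{\psi}, \qquad \frac{1}{D}\frac{\partial D}{\partial \omega} = \frac{E[Y(n-Y)]}{\omega},$$
the derivatives can be expressed through moments of $Y$.

3.2.1. Estimation of $\psi$ and $\omega$

After simplification, the first partial derivatives can be written as
$$\frac{\partial \log L}{\partial \psi} = \frac{N}{\psi}\left[\bar{y} - E(Y)\right], \qquad \frac{\partial \log L}{\partial \omega} = \frac{N}{\omega}\left[\bar{w} - E\{Y(n-Y)\}\right],$$
where $\bar{y} = N^{-1}\sum_{y} y f_y$ and $\bar{w} = N^{-1}\sum_{y} y(n-y) f_y$. Setting both derivatives to zero and solving simultaneously gives the maximum likelihood estimates $\hat{\psi}$ and $\hat{\omega}$, under the sufficient condition that the Hessian matrix of $\log L$ be negative definite at the solution, that is, $\partial^{2}\log L/\partial\psi^{2} < 0$ and the determinant of the Hessian is positive.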

These two equations can also be solved numerically with the nonlinear equation solver of the nleqslv package in R. The author has an R program for finding the maximum likelihood estimates, available upon request.
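To check the score expressions numerically, the sketch below codes the grouped-data log-likelihood and its analytic first derivatives and compares them with central finite differences. It assumes $P(Y = y) \propto \binom{n}{y}\psi^{y}\omega^{y(n-y)}$; the counts are hypothetical and the helper names are ours. Under that form, setting the two derivatives to zero amounts to equating the sample means of $Y$ and $Y(n-Y)$ with $E(Y)$ and $E\{Y(n-Y)\}$, which, because $y(n-y) = ny - y^{2}$, are the same conditions as equating the first two sample moments with the population moments.

```python
from math import comb, exp, log


def llb_log_pmf(n, psi, omega):
    """log P(Y = y), y = 0..n, for the assumed pmf binom(n,y) psi^y omega^(y(n-y)) / D."""
    logw = [log(comb(n, y)) + y * log(psi) + y * (n - y) * log(omega)
            for y in range(n + 1)]
    top = max(logw)
    log_d = top + log(sum(exp(v - top) for v in logw))  # log of the normalizer D
    return [v - log_d for v in logw]


def log_likelihood(freqs, psi, omega):
    """Grouped-data log-likelihood: sum over y of f_y * log P(Y = y)."""
    n = len(freqs) - 1
    return sum(f * lp for f, lp in zip(freqs, llb_log_pmf(n, psi, omega)))


def score(freqs, psi, omega):
    """Analytic derivatives (sum y f_y - N E[Y]) / psi and
    (sum y(n-y) f_y - N E[Y(n-Y)]) / omega."""
    n = len(freqs) - 1
    N = sum(freqs)
    p = [exp(v) for v in llb_log_pmf(n, psi, omega)]
    ey = sum(y * q for y, q in enumerate(p))
    ew = sum(y * (n - y) * q for y, q in enumerate(p))
    sy = sum(y * f for y, f in enumerate(freqs))
    sw = sum(y * (n - y) * f for y, f in enumerate(freqs))
    return (sy - N * ey) / psi, (sw - N * ew) / omega


# Hypothetical counts for n = 6 and an arbitrary evaluation point.
freqs = [3, 10, 25, 30, 20, 9, 3]
psi, omega, h = 1.2, 0.9, 1e-6
num_psi = (log_likelihood(freqs, psi + h, omega)
           - log_likelihood(freqs, psi - h, omega)) / (2 * h)
num_omega = (log_likelihood(freqs, psi, omega + h)
             - log_likelihood(freqs, psi, omega - h)) / (2 * h)
print(score(freqs, psi, omega), (num_psi, num_omega))
```

The two printed pairs should agree to several decimal places; any root-finder applied to `score` then yields the estimates.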

3.2.2. The Asymptotic Variances

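Let $\boldsymbol{\theta} = (\psi, \omega)^{\top}$. Under certain regularity conditions, the information matrix is
$$I(\boldsymbol{\theta}) = -E\begin{pmatrix} \dfrac{\partial^{2} \log L}{\partial \psi^{2}} & \dfrac{\partial^{2} \log L}{\partial \psi\,\partial \omega} \\[1ex] \dfrac{\partial^{2} \log L}{\partial \omega\,\partial \psi} & \dfrac{\partial^{2} \log L}{\partial \omega^{2}} \end{pmatrix},$$
and the asymptotic variance-covariance matrix of $(\hat{\psi}, \hat{\omega})$ is $I^{-1}(\boldsymbol{\theta})$. To find the information matrix for the log-linear-binomial distribution, we differentiate the first partial derivatives once more, using $\partial E(Y)/\partial\psi = \operatorname{Var}(Y)/\psi$, $\partial E(Y)/\partial\omega = \operatorname{Cov}(Y, W)/\omega$, and $\partial E(W)/\partial\omega = \operatorname{Var}(W)/\omega$, where $W = Y(n-Y)$. Taking the expectation, the information matrix is obtained as
$$I(\psi, \omega) = N\begin{pmatrix} \dfrac{\operatorname{Var}(Y)}{\psi^{2}} & \dfrac{\operatorname{Cov}(Y, W)}{\psi\omega} \\[1ex] \dfrac{\operatorname{Cov}(Y, W)}{\psi\omega} & \dfrac{\operatorname{Var}(W)}{\omega^{2}} \end{pmatrix}.$$
The variance-covariance matrix is then
$$\begin{pmatrix} \operatorname{Var}(\hat{\psi}) & \operatorname{Cov}(\hat{\psi}, \hat{\omega}) \\ \operatorname{Cov}(\hat{\psi}, \hat{\omega}) & \operatorname{Var}(\hat{\omega}) \end{pmatrix} = I^{-1}(\psi, \omega) = \frac{1}{I_{11}I_{22} - I_{12}^{2}}\begin{pmatrix} I_{22} & -I_{12} \\ -I_{12} & I_{11} \end{pmatrix},$$
and the estimated variance-covariance matrix is obtained by evaluating it at $(\hat{\psi}, \hat{\omega})$. The author has an R program for finding the estimated variance-covariance matrix, available upon request.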
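A minimal sketch of the information matrix and its inverse, assuming $P(Y = y) \propto \binom{n}{y}\psi^{y}\omega^{y(n-y)}$, for which the expected information from $N$ sets is $N$ times the matrix with entries $\operatorname{Var}(Y)/\psi^{2}$, $\operatorname{Cov}(Y, W)/(\psi\omega)$, and $\operatorname{Var}(W)/\omega^{2}$, with $W = Y(n-Y)$. Evaluating it at the estimates gives the estimated variance-covariance matrix. The parameter values below are hypothetical placeholders, not estimates from the paper, and the helper names are ours.

```python
from math import comb, exp, log, sqrt


def llb_probs(n, psi, omega):
    """Assumed log-linear-binomial pmf, normalized on the log scale."""
    logw = [log(comb(n, y)) + y * log(psi) + y * (n - y) * log(omega)
            for y in range(n + 1)]
    top = max(logw)
    w = [exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def information_matrix(n, N, psi, omega):
    """Expected information for N sets of n trials, with W = Y(n - Y)."""
    p = llb_probs(n, psi, omega)
    ey = sum(y * q for y, q in enumerate(p))
    ew = sum(y * (n - y) * q for y, q in enumerate(p))
    vyy = sum((y - ey) ** 2 * q for y, q in enumerate(p))
    vww = sum((y * (n - y) - ew) ** 2 * q for y, q in enumerate(p))
    cyw = sum((y - ey) * (y * (n - y) - ew) * q for y, q in enumerate(p))
    i12 = N * cyw / (psi * omega)
    return [[N * vyy / psi**2, i12], [i12, N * vww / omega**2]]


def inverse_2x2(m):
    """Inverse of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


# Hypothetical parameter values, for illustration only.
cov = inverse_2x2(information_matrix(n=7, N=500, psi=1.1, omega=0.95))
se_psi, se_omega = sqrt(cov[0][0]), sqrt(cov[1][1])
print(cov, se_psi, se_omega)
```

With $\omega = 1$ the $(1,1)$ entry reduces to $N\,np(1-p)/\psi^{2}$ with $p = \psi/(1+\psi)$, the binomial information on the $\psi$ scale, which is a convenient sanity check.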

4. Special Case: Ungrouped Data

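If the values of the individual Bernoulli random variables $x_1, \ldots, x_n$ are known, the parameters $\psi$ and $\omega$ can be estimated as follows. In a vector of $n$ binary responses there are $n(n-1)/2$ pairs of responses, and if order is irrelevant, three types of pairs are distinguishable: $n_{00}$ pairs of type $(0,0)$, $n_{01}$ pairs of type $(0,1)$, and $n_{11}$ pairs of type $(1,1)$, where
$$n_{00} = \binom{n-y}{2}, \qquad n_{01} = y(n-y), \qquad n_{11} = \binom{y}{2}, \qquad n_{00} + n_{01} + n_{11} = \frac{n(n-1)}{2},$$
and $y = \sum_{i=1}^{n} x_i$; see Lovison [7]. These counts lead to an explicit parameter estimate and to the estimated cross-product ratio (CPR) of the pairs. The maximum likelihood estimate of the remaining parameter is then obtained by solving the corresponding likelihood equation, and its estimated variance can be obtained as a special case of the results in Section 3.2.2.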
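The pair counts can be verified by brute force. The sketch below (hypothetical data; the helper name `pair_counts` is ours) enumerates all unordered pairs of a binary vector and compares the counts with $\binom{n-y}{2}$, $y(n-y)$, and $\binom{y}{2}$. Note that the number of mixed pairs, $y(n-y)$, is exactly the exponent of $\omega$ when the probability function is taken as $\binom{n}{y}\psi^{y}\omega^{y(n-y)}$ (normalized), which is one way to read $\omega$ as a pairwise association parameter.

```python
from itertools import combinations
from math import comb


def pair_counts(x):
    """Count unordered pairs (i < j) of types (0,0), (0,1) and (1,1)."""
    counts = {"00": 0, "01": 0, "11": 0}
    for xi, xj in combinations(x, 2):
        key = "11" if xi + xj == 2 else "00" if xi + xj == 0 else "01"
        counts[key] += 1
    return counts


x = [1, 0, 1, 1, 0, 0, 1, 1]   # hypothetical binary responses
n, y = len(x), sum(x)
c = pair_counts(x)
# Closed forms C(n-y, 2), y(n-y), C(y, 2); together they give n(n-1)/2 pairs.
print(c, comb(n - y, 2), y * (n - y), comb(y, 2))
```

For this vector, $n = 8$ and $y = 5$, so the counts are 3, 15, and 10, which add up to $28 = 8 \cdot 7/2$.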

5. Applications

5.1. Sampling Distribution of the Sign Test for Comparing Paired Sample

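The sign test is a nonparametric test that makes very few assumptions about the form of the distributions being compared. It is used with two repeated (or correlated) measures, and measurement is assumed to be at least ordinal. The usual null hypothesis for this test is that there is no difference between the two treatments (groups). Formally, let $p$ be the probability that, within a pair, the first measurement exceeds the second; we test $H_0\colon p = 0.5$ (no difference) against $H_1\colon p \neq 0.5$ (a difference). The sign test statistic is
$$S = \sum_{i=1}^{n} Z_i, \qquad Z_i = \begin{cases} 1, & \text{if the first measurement of pair } i \text{ exceeds the second},\\ 0, & \text{otherwise},\end{cases}$$
where tied pairs are discarded and $n$ is the number of untied pairs. Under the assumptions of two outcomes, a fixed probability of success, and independent trials, the sampling distribution of $S$ is binomial with parameters $n$ and $p$. For a two-tailed test at a prespecified significance level $\alpha$, with observed value $s$, we reject $H_0$ if
$$2\min\{P(S \le s),\, P(S \ge s)\} \le \alpha,$$
with the probabilities computed under $H_0$, and do not reject $H_0$ otherwise.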
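A minimal Python sketch of the two-tailed sign test under the binomial null distribution, with ties dropped and the p-value taken as $2\min\{P(S \le s), P(S \ge s)\}$ capped at 1. The paired scores are hypothetical (they are not the Table 1 data), and `sign_test_binomial` is our own helper name.

```python
from math import comb


def sign_test_binomial(a, b):
    """Two-tailed sign test for paired samples a, b (ties dropped).

    Returns (n, s, p_value): n untied pairs, s pairs with a_i > b_i, and
    2 * min(P(S <= s), P(S >= s)) under Binomial(n, 0.5), capped at 1.
    """
    diffs = [ai - bi for ai, bi in zip(a, b) if ai != bi]
    n = len(diffs)
    s = sum(d > 0 for d in diffs)
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    lower = sum(pmf[: s + 1])   # P(S <= s)
    upper = sum(pmf[s:])        # P(S >= s)
    return n, s, min(1.0, 2 * min(lower, upper))


# Hypothetical paired scores.
a = [12, 15, 9, 20, 14, 11, 17, 13, 10, 16]
b = [10, 16, 7, 18, 14, 8, 12, 13, 11, 9]
print(sign_test_binomial(a, b))
```

For these scores there are 8 untied pairs with 6 positive differences, and the p-value is $2 \times 37/256 \approx 0.289$.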

5.1.1. Sign Test Under Dependence of the Trials

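Suppose the assumption of mutual independence in the data is violated and the trials are dependent; see, for example, Tallis [10] and Luceño [11]. In this case, we suggest the log-linear-binomial distribution, rather than the binomial distribution, as the sampling distribution of $S$. Let $p = E(S)/n$ represent the probability of success. Then the null hypothesis $H_0\colon p = 0.5$ is equivalent to $H_0\colon \psi = 1$: when $\psi = 1$ the distribution of Section 2 is symmetric about $n/2$, and $E(S)$ is strictly increasing in $\psi$. Therefore, under the null hypothesis, the rejection region is obtained as in the binomial case, with the tail probabilities $P(S \le s)$ and $P(S \ge s)$ computed from the log-linear-binomial distribution with $\psi = 1$ and the given value of $\omega$. For a two-tailed test, we reject $H_0$ (there is no difference) in favor of $H_1$ (there is a difference) if $2\min\{P(S \le s), P(S \ge s)\} \le \alpha$, where $\alpha$ is a prespecified level, and do not reject $H_0$ otherwise.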
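Under dependence, the same tail calculation can be carried out with the log-linear-binomial null distribution. The sketch assumes the null probability function $P(S = k) \propto \binom{n}{k}\omega^{k(n-k)}$ (that is, $\psi = 1$) and treats $\omega$ as known; in practice it would be replaced by an estimate. The values of $n$, $s$, and $\omega$ are hypothetical and the helper names are ours. With $\omega = 1$ the result coincides with the binomial sign test; for $s$ away from $n/2$, $\omega > 1$ concentrates mass near $n/2$ and shrinks the tail probabilities, while $\omega < 1$ inflates them.

```python
from math import comb


def llb_null_pmf(n, omega):
    """Assumed log-linear-binomial pmf with psi = 1 (the null hypothesis)."""
    w = [comb(n, k) * omega ** (k * (n - k)) for k in range(n + 1)]
    total = sum(w)
    return [v / total for v in w]


def sign_test_llb(n, s, omega):
    """Two-tailed p-value 2 * min(P(S <= s), P(S >= s)) under H0, capped at 1."""
    pmf = llb_null_pmf(n, omega)
    return min(1.0, 2 * min(sum(pmf[: s + 1]), sum(pmf[s:])))


# Same hypothetical count, three hypothetical values of omega.
for omega in (0.9, 1.0, 1.1):
    print(omega, sign_test_llb(14, 11, omega))
```

Because the null pmf is symmetric in $k \leftrightarrow n - k$, the two tails are mirror images and the doubling rule is natural here.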

Example 5.1. A physiologist wants to know if monkeys prefer stimulation of brain area A to stimulation of brain area B. In the experiment, 14 rhesus monkeys from the same family are taught to press two bars. When a light comes on, presses on Bar 1 result in stimulation of area A and presses on Bar 2 result in stimulation of area B. After learning to press the bars, the monkeys are tested for 15 minutes, during which time the frequencies for the two bars are recorded. The data are shown in Table 1.
Using the binomial distribution, the two-tailed p-value exceeds $\alpha$, so we do not reject $H_0$.
Using the log-linear-binomial distribution under $H_0$ ($\psi = 1$), the p-value does not exceed $\alpha$, so we reject $H_0$. That is, we would conclude that monkeys prefer stimulation in brain area B to stimulation in area A. Note that rejecting $H_0$ agrees with the Wilcoxon signed-ranks test on the same data; see Weaver [12] and Siegel and Castellan [13].

Power of the Test
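Following the method of Groebner et al. [14], we use the normal approximation to study the power of the sign test at a prespecified level $\alpha$, using a one-tailed test for simplicity. The power of the test is
$$1 - \beta = 1 - \Phi\left(\frac{s_c - \mu_1}{\sigma_1}\right),$$
where $\Phi$ is the standard normal distribution function, $s_c$ is the critical value, and $\mu_1$ and $\sigma_1$ are the mean and standard deviation of $S$ under $H_1$. For the one-tailed test of $H_0\colon p = 0.5$ against a one-sided alternative, say $H_1\colon p > 0.5$, using the binomial distribution, $\mu_0 = n/2$ and $\sigma_0 = \sqrt{n}/2$ under $H_0$, and the critical value is
$$s_c = \mu_0 + z_{\alpha}\,\sigma_0,$$
where $z_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution. Under $H_1$, $\mu_1 = np$ and $\sigma_1 = \sqrt{np(1-p)}$, which gives the power for each value of $p$.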
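Using the log-linear-binomial distribution, the success probability under $H_1$ determines $\psi$ for the given $\omega$; then $\mu_0$ and $\sigma_0$ (with $\psi = 1$) and $\mu_1$ and $\sigma_1$ are computed from the mean and variance of Section 2, and the critical value $s_c$ and the power of the test are obtained in the same way.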
Table 2 gives the power of the sign test. In this case, the log-linear-binomial distribution improves the power of the sign test relative to the binomial; for example, for one of the alternatives considered, the power increases from 0.44 to 0.61, about 1.4 times.
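The power calculation can be sketched as follows, assuming the normal approximation without continuity correction, $P(S = k) \propto \binom{n}{k}\psi^{k}\omega^{k(n-k)}$, the same $\omega$ under $H_0$ and $H_1$, and an alternative specified through $p_1 = E(S)/n$, from which $\psi$ is found by bisection (valid because $E(S)$ increases with $\psi$). Function names and settings are hypothetical, so the printed values are not those of Table 2. With $\omega = 1$ the function reduces to the binomial calculation with $\mu_0 = n/2$, $\sigma_0 = \sqrt{n}/2$, $\mu_1 = np_1$, and $\sigma_1 = \sqrt{np_1(1-p_1)}$.

```python
from math import comb, exp, sqrt
from statistics import NormalDist


def llb_moments(n, psi, omega):
    """Mean and variance of S under the assumed log-linear-binomial pmf."""
    w = [comb(n, k) * psi**k * omega ** (k * (n - k)) for k in range(n + 1)]
    total = sum(w)
    p = [v / total for v in w]
    mean = sum(k * q for k, q in enumerate(p))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(p))
    return mean, var


def psi_for_p(n, omega, p, lo=-20.0, hi=20.0):
    """Find psi with E(S)/n = p by bisection on log(psi); E(S) increases with psi."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if llb_moments(n, exp(mid), omega)[0] / n < p:
            lo = mid
        else:
            hi = mid
    return exp((lo + hi) / 2)


def power_one_tailed(n, omega, p1, alpha=0.05):
    """Normal-approximation power of H0: p = 0.5 against H1: p = p1 > 0.5."""
    z = NormalDist().inv_cdf(1 - alpha)
    mu0, var0 = llb_moments(n, 1.0, omega)       # psi = 1 under H0
    s_c = mu0 + z * sqrt(var0)                   # critical value
    mu1, var1 = llb_moments(n, psi_for_p(n, omega, p1), omega)
    return 1 - NormalDist(mu1, sqrt(var1)).cdf(s_c)


# Hypothetical settings; omega = 1 reproduces the binomial calculation.
for omega in (1.0, 1.1):
    print(omega, power_one_tailed(n=14, omega=omega, p1=0.75))
```

The bisection bounds on $\log\psi$ and the direct evaluation of the weights are adequate for moderate $n$ only; larger $n$ would call for log-scale weights.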

5.2. Fitting Genetic Data

The data are taken from Salmaniyaa hospital records in Bahrain for a genetic study of the gender ratio. Table 3 shows the distribution of the number of male children among 3475 families with 7 children each.

The first two sample moments $m_1$ and $m_2$ are computed from the frequencies in Table 3. If we use the binomial distribution, the estimated probability of a male child is $\hat{p} = m_1/7$; this value was used to obtain the expected frequencies shown in Table 3. The resulting $\chi^{2}$ statistic with 7 degrees of freedom is significant, so the simple binomial model has to be rejected. If we instead fit the log-linear-binomial distribution by maximum likelihood, we obtain the estimates $\hat{\psi}$ and $\hat{\omega}$ together with their estimated variance-covariance matrix; the expected frequencies based on these estimates are shown in Table 3. The $\chi^{2}$ statistic with 6 degrees of freedom is not significant. Thus, the log-linear-binomial distribution provides a good fit to the observed data.
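A small sketch of the goodness-of-fit computation: the binomial expected frequencies use $\hat{p}$ equal to the sample mean divided by $n$, and the Pearson statistic $\sum (O - E)^{2}/E$ is taken over all classes. For the log-linear-binomial fit, the expected frequencies would be $N\,P(Y = y)$ evaluated at $(\hat{\psi}, \hat{\omega})$ and passed to the same function. The observed counts below are made up for illustration and are not the Table 3 data; the helper names are ours.

```python
from math import comb


def binomial_expected(freqs):
    """Expected frequencies N * Binomial(n, p_hat), with p_hat = sample mean / n."""
    n, N = len(freqs) - 1, sum(freqs)
    p_hat = sum(y * f for y, f in enumerate(freqs)) / (N * n)
    return [N * comb(n, y) * p_hat**y * (1 - p_hat) ** (n - y) for y in range(n + 1)]


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts of 0..7 male children (not the data of Table 3).
observed = [30, 110, 230, 300, 240, 120, 40, 10]
expected = binomial_expected(observed)
stat = pearson_chi_square(observed, expected)
print([round(e, 1) for e in expected], round(stat, 2))
```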

6. Conclusion

The parameters of the log-linear-binomial distribution were estimated by the method of moments and by maximum likelihood. Both methods require solving nonlinear equations to obtain the estimates; we used the nonlinear equation solver of the nleqslv package in R. The variance-covariance matrix of the maximum likelihood estimates was obtained. Moreover, the sampling distribution of the sign test was studied when trials are dependent. A set of genetic data from Salmaniyaa hospital in Bahrain was fitted using the log-linear-binomial distribution, which was found to fit better than the binomial distribution.

Acknowledgment

The author wishes to thank the two referees for their valuable comments that enhanced the contents and presentation of the paper.

References

[1] N. L. Johnson, S. Kotz, and A. W. Kemp, Univariate Discrete Distributions, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 2nd edition, 1992.
[2] D. Bowman and E. O. George, “A saturated model for analyzing exchangeable binary data: application to clinical and development toxicity studies,” Journal of the American Statistical Association, vol. 90, pp. 871–879, 1995.
[3] E. O. George and D. Bowman, “A full likelihood procedure for analysing exchangeable binary data,” Biometrics, vol. 51, no. 2, pp. 512–523, 1995.
[4] C. Yu and D. Zelterman, “Sums of dependent Bernoulli random variables and disease clustering,” Statistics & Probability Letters, vol. 57, no. 4, pp. 363–373, 2002.
[5] C. Yu and D. Zelterman, “Sums of exchangeable Bernoulli random variables for family and litter frequency data,” Computational Statistics & Data Analysis, vol. 52, no. 3, pp. 1636–1649, 2008.
[6] N. Kolev and D. Paiva, “Random sums of exchangeable variables and actuarial applications,” Insurance: Mathematics and Economics, vol. 42, no. 1, pp. 147–153, 2008.
[7] G. Lovison, “An alternative representation of Altham's multiplicative-binomial distribution,” Statistics & Probability Letters, vol. 36, no. 4, pp. 415–420, 1998.
[8] P. M. E. Altham, “Two generalizations of the binomial distribution,” Journal of the Royal Statistical Society Series C, vol. 27, no. 2, pp. 162–167, 1978.
[9] D. R. Cox, “The analysis of multivariate binary data,” Applied Statistics, vol. 21, pp. 113–120, 1972.
[10] G. M. Tallis, “The use of a generalized multinomial distribution in the estimation of correlation in discrete data,” Journal of the Royal Statistical Society Series B, vol. 24, pp. 530–534, 1962.
[11] A. Luceño, “A family of partially correlated Poisson models for overdispersion,” Computational Statistics and Data Analysis, vol. 20, no. 5, pp. 511–520, 1995.
[12] B. Weaver, Nonparametric Tests, Lecture Notes, chapter 3, 2002.
[13] S. Siegel and N. J. Castellan, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, New York, NY, USA, 2nd edition, 1988.
[14] D. F. Groebner, P. W. Shannon, P. C. Fry, and K. D. Smith, Business Statistics: A Decision Making Approach, Prentice Hall, Upper Saddle River, NJ, USA, 7th edition, 2007.

Copyright

Copyright © 2010 Elsayed Ali Habib. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
