Estimation of Finite Population Mean in Multivariate Stratified Sampling under Cost Function Using Goal Programming
In practical applications of stratified random sampling, the investigator faces the problem of selecting a sample that maximizes the precision of the estimator of a finite population mean under a cost constraint. The allocation of the sample becomes complicated when more than one characteristic is observed on each selected unit. In many real-life situations, a linear cost function of the sample size is not a good approximation to the actual cost of a survey when the travelling cost between selected units within a stratum is significant. In this paper, the sample allocation problem in multivariate stratified random sampling under a proposed cost function is formulated as an integer nonlinear multiobjective mathematical programming problem. A solution procedure is proposed using the extended lexicographic goal programming approach. A numerical example is presented to illustrate the computational details and to compare the efficiency of the proposed compromise allocation with that of other allocations.
It is common practice in sample surveys in agriculture, marketing, industry, social research, and so forth to observe more than one characteristic on each sampled unit of the population. For reasons of economy and efficiency, stratified random sampling is more suitable than other survey designs for obtaining information from a heterogeneous population. The theory of stratified random sampling deals with the properties of estimators constructed from a stratified random sample and with the best (optimum) choice of the sample sizes to be selected from the various strata, either to maximize the precision of the estimator for a fixed cost or to minimize the cost of the survey for a fixed precision of the estimator. Sample sizes selected according to these criteria are known as an “optimum allocation.” In general, the variance of the study variable differs from stratum to stratum, which provides the basis for selecting optimum sample sizes.
Tschuprow [1] and Neyman [2] independently proposed an allocation procedure that minimizes the variance of the sample mean under a linear cost function of the sample size in stratified random sampling. Neyman [2] used the Lagrange multiplier technique to obtain the optimum sample sizes for a single study variable. In stratified sampling, the allocation problem becomes complicated when more than one characteristic is observed on each selected unit of a finite population. An allocation that is optimum for one characteristic may not be optimum for the others unless the characteristics are highly correlated. A compromise criterion is therefore needed that yields an allocation that is optimum for all characteristics in some sense: for example, an allocation that minimizes the trace of the variance-covariance matrix of the estimators of the population means, an allocation that minimizes a weighted average of the variances, or an allocation that maximizes the total relative efficiency of the estimators compared with the corresponding individual optimum allocations (Varshney et al. [3]). Many authors, such as Dalenius [4, 5], Ghosh [6], Folks and Antle [7], Chromy [8], Bethel [9], Jahan et al. [10, 11], Khan et al. [12], Khan et al. [13, 14], Ansari et al. [15], Khan et al. [16], and Varshney et al. [17], used different compromise criteria to solve the allocation problem in stratified random sampling.
The cost of the survey is an important factor in allocating the sample to the various strata. The linear cost function used in stratified sampling is
$$\sum_{h=1}^{L} c_h n_h \le C - c_0, \qquad (1)$$
where $C$ denotes the total budget available for the survey, $c_h$, $h = 1, \ldots, L$, is the per-unit measurement cost in the $h$th stratum, $c_0$ is the fixed cost of the survey, and $n_h$ is the number of sample units selected in the $h$th stratum. In many practical situations, both the per-unit measurement cost and the travel cost within strata are important components of the survey cost, and a nonlinear cost function that includes both is a better approximation to the actual cost of a survey. Beardwood et al. [18] showed that the length of the shortest route through $n$ randomly dispersed points in a region is asymptotically proportional to $\sqrt{n}$ for large $n$. Varshney et al. used the following nonlinear cost function for large sample sizes:
$$\sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h \sqrt{n_h} \le C - c_0, \qquad (2)$$
where $t_h$ is the travel cost within the $h$th stratum. The problem of finding the shortest route among the selected units in the $h$th stratum is often called the “shortest route problem” in the operations research literature. If the route map and route lengths are given for each stratum, the shortest route among the selected units within a stratum can be found whether the number of units is small or large, and this shortest route can be used for practical purposes with confidence (Beardwood et al. [18]).
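To make the difference between the linear cost $c_0 + \sum_h c_h n_h$ and the travel-cost version $c_0 + \sum_h c_h n_h + \sum_h t_h\sqrt{n_h}$ concrete, the following Python sketch evaluates both for one allocation. The sample sizes, unit costs $c_h$, travel coefficients $t_h$, fixed cost $c_0$, and budget $C$ are hypothetical values chosen for illustration, not data from this paper.

```python
import math

def linear_cost(n, c, c0):
    """Linear cost: c0 + sum_h c_h * n_h."""
    return c0 + sum(ch * nh for ch, nh in zip(c, n))

def travel_cost(n, c, t, c0):
    """Nonlinear cost: c0 + sum_h (c_h * n_h + t_h * sqrt(n_h)).
    The sqrt(n_h) travel term reflects the Beardwood et al. result that the
    shortest route through n random points grows like sqrt(n)."""
    return c0 + sum(ch * nh + th * math.sqrt(nh) for ch, th, nh in zip(c, t, n))

# Hypothetical three-stratum survey (illustrative values only).
n = [16, 25, 9]      # sample sizes n_h
c = [2, 3, 4]        # measurement cost per unit, c_h
t = [10, 8, 6]       # travel-cost coefficients, t_h
c0, C = 50, 250      # fixed cost and total budget

print(linear_cost(n, c, c0), linear_cost(n, c, c0) <= C)       # 193 True
print(travel_cost(n, c, t, c0), travel_cost(n, c, t, c0) <= C)  # 291.0 False
```

With these assumed numbers the allocation fits the budget under the linear cost but not once the travel term is included, which shows how ignoring travel can understate the real cost of a design.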
Consider the following proposed nonlinear cost function (3), in which additional parameters represent the effect of travel within strata on the cost of the survey. The value of the travel term is determined by solving the shortest route problem using the methods discussed by Hillier and Lieberman [19]. The cost function (2) is a particular case of the proposed cost function (3) for a particular choice of these parameters.
Generally, the Lagrange multiplier technique (LMT) is used to determine the sample sizes. However, LMT ignores the constraint that each $n_h$, $h = 1, \ldots, L$, must be an integer. To obtain integer sample sizes, a rounding rule is then applied, which may violate the optimality or feasibility conditions (or both). Since integer sample sizes are needed for practical implementation, the authors did not use LMT; instead, integer programming was used to obtain integer stratum sample sizes $n_h$.
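The feasibility problem caused by rounding can be seen in a small single-characteristic sketch. It assumes a linear cost $\sum_h c_h n_h \le C - c_0$ with hypothetical stratum weights, standard deviations, and unit costs, computes the continuous LMT allocation ($n_h \propto W_h S_h/\sqrt{c_h}$, using the whole budget), rounds it, and compares the result with an exact integer solution found by exhaustive search. The lower bound $n_h \ge 2$ is an assumption of the sketch, and exhaustive search is used only because the example is tiny; it is not the solver used in this paper.

```python
import math
from itertools import product

# Hypothetical single-characteristic example with three strata.
W = [0.5, 0.3, 0.2]        # stratum weights W_h
S = [10.0, 20.0, 30.0]     # stratum standard deviations S_h
c = [1.0, 4.0, 9.0]        # per-unit costs c_h
BUDGET = 100.0             # C - c_0

def variance(n):
    # Variance of the stratified mean, ignoring the finite population correction.
    return sum(w * w * s * s / nh for w, s, nh in zip(W, S, n))

def cost(n):
    # Variable part of the linear cost function.
    return sum(ch * nh for ch, nh in zip(c, n))

# LMT solution: n_h proportional to W_h S_h / sqrt(c_h), spending the full budget.
denom = sum(w * s * math.sqrt(ch) for w, s, ch in zip(W, S, c))
n_lmt = [BUDGET * w * s / math.sqrt(ch) / denom for w, s, ch in zip(W, S, c)]
n_rounded = [round(x) for x in n_lmt]

# Integer programming by exhaustive search (assumes n_h >= 2; viable only for tiny problems).
ranges = [range(2, int(BUDGET // ch) + 1) for ch in c]
best = min((n for n in product(*ranges) if cost(n) <= BUDGET), key=variance)

print([round(x, 3) for x in n_lmt])   # [14.286, 8.571, 5.714]
print(n_rounded, cost(n_rounded))     # [14, 9, 6] 104.0
print(list(best), cost(best), round(variance(best), 4))
```

Here the rounded allocation exceeds the budget, while the exhaustive search returns a feasible integer allocation whose variance cannot fall below the continuous bound $(\sum_h W_h S_h\sqrt{c_h})^2/(C - c_0)$.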
In this paper, we discuss a compromise allocation based on minimizing the coefficients of variation of the regression estimators of the population means in multivariate stratified random sampling under the proposed nonlinear cost function (3). The problem is formulated as a multiobjective integer nonlinear programming problem, and the extended lexicographic goal programming technique is applied to solve it. The AlphaECP solver of the GAMS optimization software (Rosenthal [20]) is used to solve a numerical example that illustrates the computational details of the allocation procedure.
2. Formulation of the Problem
Consider a population of $N$ units divided into $L$ mutually exclusive strata of sizes $N_1, \ldots, N_L$ such that $\sum_{h=1}^{L} N_h = N$. A simple random sample of size $n_h$ is drawn independently from each stratum. Suppose $p$ characteristics, $j = 1, \ldots, p$, are observed on each unit in the $h$th stratum, and the population means of the characteristics are to be estimated. Let $\bar{y}_{jh}$ and $\bar{x}_{jh}$ be the sample means, and $\bar{Y}_{jh}$ and $\bar{X}_{jh}$ the population means, of the study variable $y$ and the auxiliary variable $x$, respectively, for the $j$th characteristic in the $h$th stratum. $S_{yjh}^2$ and $S_{xjh}^2$ are the population variances and $S_{xyjh}$ is the population covariance between the $j$th study and auxiliary variables in the $h$th stratum. $b_{jh}$ and $B_{jh}$ are the sample and population regression coefficients, and $W_h = N_h/N$ is the stratum weight.
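Consider the regression estimator of the population mean of the $j$th characteristic,
$$\bar{y}_{j,\mathrm{lrs}} = \sum_{h=1}^{L} W_h \bar{y}_{jh,\mathrm{lr}}, \qquad (4)$$
where
$$\bar{y}_{jh,\mathrm{lr}} = \bar{y}_{jh} + b_{jh}\left(\bar{X}_{jh} - \bar{x}_{jh}\right). \qquad (5)$$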
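A minimal Python sketch of the stratum-wise regression estimator $\sum_h W_h[\bar{y}_{jh} + b_{jh}(\bar{X}_{jh} - \bar{x}_{jh})]$ for one characteristic follows. It assumes that the sample values of the study and auxiliary variables in each stratum and the population means $\bar{X}_{jh}$ are available, and that $b_{jh}$ is the least-squares slope $s_{xy}/s_x^2$ computed within the stratum; the function names and the small data set are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)

def regression_coefficient(x, y):
    """Sample regression coefficient b_h = s_xy / s_x^2 (the n_h - 1 divisors cancel)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def separate_regression_estimate(samples, Xbar, W):
    """sum_h W_h * (ybar_h + b_h * (Xbar_h - xbar_h)) for one characteristic.
    samples: list of (x_values, y_values), one pair per stratum."""
    est = 0.0
    for (x, y), Xh, Wh in zip(samples, Xbar, W):
        b = regression_coefficient(x, y)
        est += Wh * (mean(y) + b * (Xh - mean(x)))
    return est

# Hypothetical two-stratum data.
samples = [([1, 2, 3], [2, 4, 6]), ([10, 20, 30], [15, 25, 35])]
print(round(separate_regression_estimate(samples, [2.5, 18.0], [0.6, 0.4]), 6))  # 12.2
```

Each stratum adjusts its sample mean of $y$ by the slope times the gap between the known population mean and the sample mean of $x$, and the adjusted means are then combined with the stratum weights.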
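To the first order of approximation, the mean square error of $\bar{y}_{j,\mathrm{lrs}}$ is
$$\mathrm{MSE}(\bar{y}_{j,\mathrm{lrs}}) = \sum_{h=1}^{L} \frac{W_h^2 A_{jh}}{n_h} - \sum_{h=1}^{L} \frac{W_h^2 A_{jh}}{N_h}, \qquad (6)$$
where
$$A_{jh} = S_{yjh}^2 - 2B_{jh} S_{xyjh} + B_{jh}^2 S_{xjh}^2 = S_{yjh}^2\left(1 - \rho_{jh}^2\right), \qquad B_{jh} = \frac{S_{xyjh}}{S_{xjh}^2},$$
and $\rho_{jh} = S_{xyjh}/(S_{xjh} S_{yjh})$ is the population correlation coefficient. If we ignore the second term on the right-hand side of (6), because it does not depend on the sample sizes $n_h$, then
$$\mathrm{MSE}(\bar{y}_{j,\mathrm{lrs}}) \approx \sum_{h=1}^{L} \frac{W_h^2 A_{jh}}{n_h}. \qquad (7)$$
Since the characteristics are measured in different units, a unit-free measure of precision is needed. Therefore, the coefficient of variation is used instead of the mean square error:
$$CV_j = \frac{\sqrt{\mathrm{MSE}(\bar{y}_{j,\mathrm{lrs}})}}{\bar{Y}_j} = \frac{1}{\bar{Y}_j}\sqrt{\sum_{h=1}^{L} \frac{W_h^2 A_{jh}}{n_h}}, \qquad \bar{Y}_j = \sum_{h=1}^{L} W_h \bar{Y}_{jh}.$$
The sample sizes $n_h$ are determined so as to minimize the coefficient of variation $CV_j$ of the estimator of the population mean for each characteristic $j = 1, \ldots, p$ under the proposed nonlinear cost function (3). This problem can be formulated as the multiobjective integer nonlinear programming problem
$$\min_{\mathbf{n} \in \Omega}\; \left(CV_1(\mathbf{n}), CV_2(\mathbf{n}), \ldots, CV_p(\mathbf{n})\right), \qquad (12)$$
where $\mathbf{n} = (n_1, \ldots, n_L)$ and $\Omega$ is the feasible region: the set of integer allocations that satisfy the cost constraint (3), the bounds $n_h \le N_h$, and the sign restrictions. Any solution within the feasible region is implementable in practice.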
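The following sketch evaluates the approximate mean square error $\sum_h W_h^2 A_{jh}/n_h$ and the coefficient of variation for a single characteristic, with $A_{jh} = S_{yjh}^2 - 2B_{jh}S_{xyjh} + B_{jh}^2 S_{xjh}^2$ and $B_{jh} = S_{xyjh}/S_{xjh}^2$. The stratum parameters are hypothetical, and the term that does not depend on $n_h$ is dropped, as in the text.

```python
import math

def approx_mse(W, n, S2y, S2x, Sxy):
    """Approximate MSE sum_h W_h^2 A_h / n_h of the regression estimator.
    A_h = S2y - 2 B S_xy + B^2 S2x with B = S_xy / S2x, i.e. S2y * (1 - rho^2)."""
    total = 0.0
    for Wh, nh, sy2, sx2, sxy in zip(W, n, S2y, S2x, Sxy):
        B = sxy / sx2
        A = sy2 - 2.0 * B * sxy + B * B * sx2
        total += Wh * Wh * A / nh
    return total

def cv(W, n, S2y, S2x, Sxy, Ybar):
    """Coefficient of variation: sqrt(approximate MSE) divided by the population mean."""
    return math.sqrt(approx_mse(W, n, S2y, S2x, Sxy)) / Ybar

# Hypothetical two-stratum parameters for one characteristic.
W, n = [0.6, 0.4], [20, 10]
S2y, S2x, Sxy, Ybar = [100.0, 400.0], [25.0, 100.0], [40.0, 120.0], 50.0
mse = approx_mse(W, n, S2y, S2x, Sxy)
print(round(mse, 6))   # 4.744
print(cv(W, n, S2y, S2x, Sxy, Ybar))
```

Because the dropped term does not depend on $n_h$, omitting it changes the value of each individual objective but not which allocation minimizes it.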
3. Extended Lexicographic Goal Programming
Romero [21] proposed the extended lexicographic goal programming method, which provides a general framework that covers, and allows mixtures of, the most common methods for solving multiobjective decision-making problems. It also encompasses distance-based multicriteria decision-making techniques. Romero [22] extended this work to a more general form of the achievement function. Goal programming is a technique used by decision makers to optimize more than one objective subject to constraints: all specified objectives are included in the model, and the decision maker tries to minimize the deviations from the specified objectives.
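Consider the following individual optimum problem for each characteristic $j = 1, \ldots, p$:
$$\min_{n_1, \ldots, n_L} CV_j \quad \text{subject to the constraints of problem (12)}. \qquad (13)$$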
Let $CV_j^*$, $j = 1, \ldots, p$, be the individual optimum values of $CV_j$ obtained by solving problem (13). These optimum values specify the goals, which we then try to achieve using multiobjective mathematical programming. Let $CV_j$ denote the values of the objectives obtained by applying a multiobjective optimization method. Clearly $CV_j \ge CV_j^*$; that is, $CV_j - CV_j^* \ge 0$ is the increase in $CV_j$ caused by the compromise among the objectives under the chosen compromise criterion. Denote this increase by $d_j$. To achieve the specified goals, we must have
$$CV_j - CV_j^* \le d_j, \qquad (14)$$
or, equivalently,
$$CV_j - d_j \le CV_j^*, \qquad j = 1, \ldots, p. \qquad (15)$$
In goal programming, the deviations $d_j$ are minimized subject to the additional constraints (15). To solve the multiobjective allocation problem (12), the extended lexicographic goal programming model (16) is used, where $\lambda$ is a constant that can take any value from zero to one and $d_j \ge 0$ is the positive deviational variable.
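The goal programming step can be illustrated on a deliberately tiny, hypothetical problem (three strata, two characteristics). The Python sketch below assumes the travel-cost function $\sum_h c_h n_h + \sum_h t_h\sqrt{n_h} \le C - c_0$ as a stand-in for the proposed cost function (3), a lower bound $n_h \ge 2$, and the single-priority achievement function of Romero's extended lexicographic goal programming, $\lambda D + (1-\lambda)\sum_j d_j$ with $D = \max_j d_j$. We do not claim this is exactly model (16), and exhaustive enumeration replaces the AlphaECP solver used in the paper.

```python
import math
from itertools import product

# Hypothetical data (not the paper's): L = 3 strata, p = 2 characteristics.
W = [0.40, 0.35, 0.25]          # stratum weights W_h
N = [200, 150, 100]             # stratum sizes N_h (upper bounds on n_h)
A = [[36.0, 100.0, 400.0],      # A_1h = S_y^2 (1 - rho^2) for characteristic 1
     [400.0, 64.0, 100.0]]      # A_2h for characteristic 2
YBAR = [50.0, 30.0]             # population means of the two characteristics
c = [2.0, 3.0, 4.0]             # measurement cost per unit, c_h
t = [5.0, 5.0, 5.0]             # travel-cost coefficients, t_h
BUDGET = 150.0                  # C - c_0
P = len(A)

def cost(n):
    # Cost with a sqrt(n_h) travel term, used here as a stand-in for (3).
    return sum(ch * nh + th * math.sqrt(nh) for ch, th, nh in zip(c, t, n))

def cv(j, n):
    # CV_j from the approximate MSE sum_h W_h^2 A_jh / n_h.
    return math.sqrt(sum(w * w * a / nh for w, a, nh in zip(W, A[j], n))) / YBAR[j]

# Feasible region: integers 2 <= n_h <= N_h (lower bound assumed) within the budget.
ranges = [range(2, min(Nh, int(BUDGET // ch)) + 1) for Nh, ch in zip(N, c)]
omega = [n for n in product(*ranges) if cost(n) <= BUDGET]

# Goals: individual optimum values CV_j^*.
cv_star = [min(cv(j, n) for n in omega) for j in range(P)]

def elgp(lam):
    """Minimize lam * D + (1 - lam) * sum_j d_j, with d_j = CV_j - CV_j^* and D = max_j d_j."""
    def achievement(n):
        d = [cv(j, n) - cv_star[j] for j in range(P)]
        return lam * max(d) + (1 - lam) * sum(d)
    return min(omega, key=achievement)

for lam in (0.0, 0.5, 1.0):
    n = elgp(lam)
    print(lam, n, [round(cv(j, n), 5) for j in range(P)], round(cost(n), 2))
```

Setting $\lambda = 0$ minimizes the total deviation from the goals, while $\lambda = 1$ minimizes the largest single deviation; intermediate values trade off the two. For a fixed allocation, the smallest deviation satisfying $CV_j - d_j \le CV_j^*$ is $d_j = CV_j - CV_j^*$, which is why the sketch computes $d_j$ directly instead of treating it as a decision variable.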
4. Some Other Compromise Allocations
In this section, some other compromise allocations are discussed for the sake of comparison with the proposed allocation.
4.1. Cochran Compromise Allocation
Cochran [23] proposed a compromise allocation criterion that averages, over the characteristics, the individual optimum allocations obtained by solving the integer nonlinear programming problem (INLPP) (13).
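Cochran’s compromise allocation is given by
$$n_h^{c} = \frac{1}{p} \sum_{j=1}^{p} n_{jh}^{*}, \qquad h = 1, \ldots, L, \qquad (17)$$
where $n_{jh}^{*}$ is the individual optimum sample size of the $h$th stratum for the $j$th characteristic.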
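The sketch below, with hypothetical individual optima and cost parameters, computes Cochran's averaged allocation and checks it against a travel-cost budget of the form $\sum_h c_h n_h + \sum_h t_h\sqrt{n_h} \le C - c_0$. Because $\sqrt{n_h}$ is concave, the cost of an average of allocations can exceed the average of their costs (Jensen's inequality), so averaging two allocations that each exactly meet the budget can produce one that does not.

```python
import math

c = [1.0, 1.0]     # hypothetical measurement costs c_h
t = [10.0, 10.0]   # hypothetical travel-cost coefficients t_h
BUDGET = 224.0     # hypothetical C - c_0

def cost(n):
    # Measurement cost plus a sqrt(n_h) travel cost.
    return sum(ch * nh + th * math.sqrt(nh) for ch, th, nh in zip(c, t, n))

def cochran_compromise(individual):
    """Average the individual optimum allocations n*_jh over the p characteristics."""
    p = len(individual)
    return [sum(col) / p for col in zip(*individual)]

n_star = [[100, 4], [4, 100]]   # hypothetical individual optima for p = 2
n_c = cochran_compromise(n_star)
print(n_c)                                       # [52.0, 52.0]
print([cost(n) for n in n_star])                 # [224.0, 224.0]
print(round(cost(n_c), 2), cost(n_c) <= BUDGET)  # 248.22 False
```

This is one mechanism by which an averaged allocation can become infeasible under a nonlinear cost function; under a purely linear cost, the average of feasible allocations is always feasible before rounding.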
4.2. Khan et al. Compromise Allocation
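The Khan et al. compromise allocation is obtained by minimizing a weighted sum of the variances. Its mathematical model is
$$\min \sum_{j=1}^{p} a_j V_j \quad \text{subject to the constraints of problem (12)}, \qquad (18)$$
where $V_j$ is the variance (mean square error) of the estimator of the $j$th population mean and $a_j$ are the relative weights proposed by Khan et al.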
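A weighted sum of variances has a convenient structure: with $V_j \approx \sum_h W_h^2 A_{jh}/n_h$, the objective $\sum_j a_j V_j$ collapses to $\sum_h \alpha_h/n_h$ with $\alpha_h = \sum_j a_j W_h^2 A_{jh}$. The Python sketch below computes these pooled coefficients and, assuming a linear cost $\sum_h c_h n_h = C - c_0$ and ignoring integrality, the resulting Lagrange-type allocation $n_h \propto \sqrt{\alpha_h/c_h}$. The weights and data are hypothetical; under the nonlinear cost function (3) with integer restrictions, the model must be solved numerically instead.

```python
import math

def pooled_coefficients(W, A, weights):
    """alpha_h = sum_j a_j W_h^2 A_jh, so that sum_j a_j V_j = sum_h alpha_h / n_h.
    A is indexed as A[j][h]."""
    return [sum(aj * Wh * Wh * Aj[h] for aj, Aj in zip(weights, A))
            for h, Wh in enumerate(W)]

def continuous_allocation(alpha, c, budget):
    """Minimize sum_h alpha_h / n_h subject to sum_h c_h n_h = budget (no integrality)."""
    denom = sum(math.sqrt(ah * ch) for ah, ch in zip(alpha, c))
    return [budget * math.sqrt(ah / ch) / denom for ah, ch in zip(alpha, c)]

# Hypothetical example: two strata, two characteristics, equal weights.
alpha = pooled_coefficients([0.5, 0.5], [[100.0, 400.0], [400.0, 100.0]], [0.5, 0.5])
print(alpha)                                                          # [62.5, 62.5]
print([round(v, 6) for v in continuous_allocation(alpha, [1.0, 1.0], 100.0)])  # [50.0, 50.0]
```

The pooling step shows why a weighted-sum criterion behaves like a single-characteristic allocation with combined stratum coefficients $\alpha_h$.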
5. Numerical Example
The data are taken from the agricultural census of the state of Iowa conducted by the National Agricultural Statistics Service, USDA, Washington, DC, as reported by Khan et al. We assume that , , , , , , , and .
Let $y_1$ and $y_2$ denote the quantities of corn and oats harvested in 2002 (the study variables), and let $x_1$ and $x_2$ denote the quantities of corn and oats harvested in 1997 (the auxiliary variables).
The data summary is given as , , , and . A detailed summary of the data is given in Tables 1 and 2.
The allocation problem is formulated as a multiobjective integer nonlinear programming problem of the form (12) with $p = 2$ characteristics.
5.1. (a) Individual Optimum Allocation Method
5.1.1. Individual Optimum Allocation for Characteristic $y_1$
Consider the problem of minimizing $CV_1$ subject to the constraints of the allocation problem.
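5.1.2. Individual Optimum Allocation for Characteristic $y_2$
Consider the problem of minimizing $CV_2$ subject to the same constraints. The coefficients of variation $CV_1$ and $CV_2$ under the individual optimum allocations, at different values of the constants, are given in Table 3.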
5.2. Proposed Compromise Allocation
We used the extended lexicographic goal programming model (16) to allocate the sample to the strata, taking both characteristics $y_1$ and $y_2$ into account, subject to the same constraints.
The coefficients of variation $CV_1$ and $CV_2$ under the proposed allocation, at various values of the constants, are given in Table 6.
5.3. Khan et al. Compromise Allocation
We applied model (18) to find the compromise allocation proposed by Khan et al. The coefficients of variation $CV_1$ and $CV_2$ under the Khan et al. compromise allocation, obtained by solving this model at different values of the constants, are given in Table 5.
In this section, the proposed compromise allocation is compared with the Cochran compromise allocation, the Khan et al. compromise allocation, and the individual optimum allocations. The comparison is based on the trace of the variance-covariance matrix of the estimators of the finite population means under each allocation, that is, the sum of the variances of the estimators. We assume that the characteristics are independent; therefore, the covariances are zero. Table 3 gives the individual optimum allocations. Tables 4 and 5 give the Cochran and Khan et al. compromise allocations, as discussed in Sections 4 and 5. The proposed compromise allocation is given in Table 6.
Table 4 shows that, for some values of the constants, the Cochran compromise allocation gives higher trace values than the proposed compromise allocation given in Table 6. For certain other values, the Cochran compromise allocation gives a slightly lower trace value but is infeasible, because the corresponding cost exceeds the available budget. Table 5 shows that the Khan et al. compromise allocation gives higher trace values than the proposed compromise allocation. The performance of the proposed compromise allocation relative to the individual optimum allocation, when the individual optimum allocation of one characteristic is used for both characteristics, is given in Table 7 in terms of the percentage relative efficiency
$$\mathrm{PRE} = \frac{T_{\mathrm{ind}}}{T_{\mathrm{prop}}} \times 100,$$
where $T_{\mathrm{ind}}$ is the trace under the individual optimum allocation and $T_{\mathrm{prop}}$ is the trace under the proposed compromise allocation. Table 7 shows that the proposed compromise allocation provides more efficient estimates of the population means than the individual optimum allocations.
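Once the variances are known, computing the comparison criterion is simple. The following sketch takes hypothetical variance-covariance matrices (with zero covariances, as assumed in the text) and computes their traces and the percentage relative efficiency, $100 \times$ (trace under the individual optimum allocation) / (trace under the proposed allocation). The matrices are illustrative values, not results from Tables 3 to 7.

```python
def trace(cov):
    """Trace of a variance-covariance matrix: the sum of its diagonal (the variances)."""
    return sum(cov[i][i] for i in range(len(cov)))

def pre(trace_individual, trace_proposed):
    """Percentage relative efficiency of the proposed allocation; above 100 favors it."""
    return 100.0 * trace_individual / trace_proposed

# Hypothetical matrices with zero covariances.
t_ind = trace([[1.0, 0.0], [0.0, 2.0]])
t_prop = trace([[1.0, 0.0], [0.0, 1.5]])
print(t_ind, t_prop, pre(t_ind, t_prop))   # 3.0 2.5 120.0
```

A PRE above 100 indicates that the proposed compromise allocation yields a smaller trace, that is, more precise estimates overall.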
On the basis of the comparison made in Section 6, we conclude that the extended lexicographic goal programming approach always yields a feasible solution, which is not guaranteed by Cochran’s compromise method, and that it provides better results than the Khan et al. compromise approach and the individual optimum allocation approach in terms of efficiency.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
[1] A. A. Tschuprow, “On the mathematical expectation of the moments of frequency distributions in the case of correlated observations,” Metron, vol. 2, pp. 461–493, 1923.
[2] J. Neyman, “On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection,” Journal of the Royal Statistical Society, vol. 97, no. 4, pp. 558–625, 1934.
[3] R. Varshney, M. J. Ahsan, and M. G. M. Khan, “An optimum multivariate stratified sampling design with nonresponse: a lexicographic goal programming approach,” Journal of Mathematical Modelling and Algorithms, vol. 10, no. 4, pp. 393–405, 2011.
[4] T. Dalenius, “The problem of optimum stratification-II,” Scandinavian Actuarial, vol. 33, pp. 203–213, 1950.
[5] T. Dalenius, Sampling in Sweden: Contributions to the Methods and Theories of Sample Survey Practice, Almqvist and Wiksell, Stockholm, Sweden, 1957.
[6] S. P. Ghosh, “A note on stratified random sampling with multiple characters,” Calcutta Statistical Association Bulletin, vol. 8, pp. 81–90, 1958.
[7] J. L. Folks and C. E. Antle, “Optimum allocation of sampling units to strata when there are R responses of interest,” Journal of the American Statistical Association, vol. 60, pp. 225–233, 1965.
[8] J. R. Chromy, “Design optimization with multiple objectives,” in Proceedings of the Survey Research Section, pp. 194–199, American Statistical Association, Washington, DC, USA, 1987.
[9] J. Bethel, “An optimum allocation algorithm for multivariate surveys,” Proceedings of the Survey Research Section, pp. 209–212, 1985.
[10] N. Jahan, M. G. M. Khan, and M. J. Ahsan, “A generalized compromise allocation,” Journal of Indian Statistical Association, vol. 32, pp. 95–101, 1994.
[11] N. Jahan, M. G. M. Khan, and M. J. Ahsan, “Optimum compromise allocation using dynamic programming,” Dhaka University Journal of Science, vol. 49, no. 2, pp. 197–202, 2001.
[12] E. A. Khan, M. G. M. Khan, and M. J. Ahsan, “Optimum stratification: a mathematical programming approach,” Calcutta Statistical Association Bulletin, vol. 52, pp. 323–333, 2002.
[13] M. G. M. Khan, E. A. Khan, and M. J. Ahsan, “An optimal multivariate stratified sampling design using dynamical programming,” Australian & New Zealand Journal of Statistics, vol. 45, no. 1, pp. 107–113, 2003.
[14] M. G. M. Khan, E. A. Khan, and M. J. Ahsan, “Optimum allocation in multivariate stratified sampling in presence of non-response,” Journal of the Indian Society of Agricultural Statistics, vol. 62, no. 1, pp. 42–48, 2008.
[15] A. H. Ansari, Najmussehar, and M. J. Ahsan, “On multiple response stratified random sampling design,” International Journal of Statistical Sciences, Kolkata, India, vol. 1, no. 1, pp. 45–54, 2009.
[16] M. G. M. Khan, T. Maiti, and M. J. Ahsan, “An optimal multivariate stratified sampling design using auxiliary information: an integer solution using goal programming approach,” Journal of Official Statistics, vol. 26, no. 4, pp. 695–708, 2010.
[17] R. Varshney, Najmussehar, and M. J. Ahsan, “Estimation of more than one parameters in stratified sampling with fixed budget,” Mathematical Methods of Operations Research, vol. 75, no. 2, pp. 185–197, 2012.
[18] J. Beardwood, J. H. Halton, and J. M. Hammersley, “The shortest path through many points,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 55, pp. 299–327, 1959.
[19] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research, McGraw-Hill, New York, NY, USA, 1995.
[20] R. E. Rosenthal, A User's Guide Tutorial, GAMS Development Corporation, Washington, DC, USA, 2008.
[21] C. Romero, “Extended lexicographic goal programming: a unifying approach,” Omega, vol. 29, no. 1, pp. 63–71, 2001.
[22] C. Romero, “A general structure of achievement function for a goal programming model,” European Journal of Operational Research, vol. 153, no. 3, pp. 675–686, 2004.
[23] W. G. Cochran, Sampling Techniques, John Wiley & Sons, New York, NY, USA, 1977.