Abstract

In this study, we develop a model that describes the relationship between the efficiency of an estimator and survey cost. The model is based on a multiobjective optimization programming structure. Survey cost and the efficiency of the related estimator(s) are competing objectives: lowering the cost lowers the efficiency, and raising the efficiency raises the cost. The model computes the cost for a desired level of efficiency on various characteristics (goals). The calibrated model minimizes the cost of the compromise optimal sample selection from different strata when each characteristic is required to achieve at least a specified level of efficiency for its estimator. In the first step, the proposed model minimizes the variance for a fixed cost; it then finds the rise in cost for a given percentage rise in the efficiency of any characteristic j. The resulting model is a multiobjective compromise allocation goal programming model.

1. Introduction

An efficient estimator, that is, one with a small mean square error (MSE), is always desirable when estimating the true population parameter from sample data. Survey cost increases with sample size, and a larger sample in turn increases the efficiency of an estimator [1].

This situation becomes more interesting in a multivariate multiobjective study. When several goals compete with each other in a multivariate study, the estimation calls for a compromise allocation programming model. This motivates us to calibrate a multiobjective compromise allocation model in which several estimators are optimized (their MSEs minimized) over a single feasible space.

Sample allocation in multivariate surveys plays an important role in determining the cost and the efficiency of an estimator, and its impact is especially large when the variance among groups is high. The sampling design, the method of estimation (estimator), and the varying cost among groups are the main factors in allocating the sample. Among the many objectives of a survey, the two major ones are that it must give efficient estimates for a fixed cost and that it should optimize the cost of the complete survey plan at a reasonable efficiency level. In multivariate surveys, sample allocation among heterogeneous strata is an important decision for increasing the efficiency of estimates.

Sample selection in survey sampling with a balanced design (i.e., equal allocation) is common. It maximizes statistical power when group variances are equal, but many other scenarios need to be addressed. For example, groups with large variation usually need a larger sample to reduce the variation due to that group. In one-way analysis of variance (ANOVA), if we assume heterogeneous variances among groups, alternative estimators with unequal sample size allocation are reasonable to consider [2]. The heterogeneity among groups and the survey cost are important factors to consider when allocating samples. Suitable survey participants are becoming increasingly scarce, which raises survey costs.

The trade-off between precision (or efficiency) and cost is formulated mathematically by optimizing the variance or the cost function subject to basic constraints. Multiobjective programming models with multiple goals for transportation problems were discussed by Roy et al. [3], who calibrated these models to minimize transportation cost. A limited budget and the optimal variance give only a rough approximation of a sampling design. The efficiency that can actually be achieved in practice for a fixed cost, and vice versa, has been discussed in depth by, for example, Cochran [1], Groves [4], and Kish [5]. For an extensive discussion of cost and precision as the only evaluation criteria, see de Vries [6]. A statistical analysis of costs and a comprehensive discussion of the trade-off between cost and error variance for various survey designs are given by Kish [5] and Groves [4].

An allocation that must be optimal for several characteristics, in terms of either cost or variance, requires a compromise allocation criterion. Varshney et al. [7] discussed in detail the optimal allocation that minimizes cost vectors, minimizes the weighted average of variances, or maximizes the relative efficiency of the estimators. Many other studies, including Bethel [8], Khan et al. [9], Ansari et al. [10], Khan et al. [11], and Varshney et al. [12], developed different compromise criteria. Most of them solved the allocation problem in a stratified sampling design.

This paper focuses on the trade-off between cost and the relative efficiency of estimators. In this context, it investigates the extent to which survey costs and other related components can improve the design and technique of similar surveys, and it addresses the problem of reducing the cost of a survey or increasing the efficiency of its estimates. Such an analysis is useful in two ways. First, it offers a practical solution when information on the cost or the efficiency of estimates is scarce. Second, it exploits similarities in sampling designs, survey infrastructure, and population distributions, so that information obtained from one survey can be used in another survey or can improve the efficiency of estimates from the same survey. Handling many components is typically a problem, especially when some fixed factors are to be compared.

Survey science always seeks a better sampling design, an efficient estimator, and a reduced total survey cost [8–12]. The proposed method addresses all these aspects simultaneously and supports studies in which the cost and the efficiency of a survey compete. The model minimizes the survey cost for a desired level of efficiency for a particular characteristic, which is introduced as a constraint in the model. In other words, efficiency is a function of survey cost: increasing the cost increases the efficiency of the survey. When budgeting a proposed study, the expected efficiency level can be estimated in advance from previous studies, and if the team wants more efficient results, it should expect a relatively higher survey cost. The model provides a direct relationship between the relative changes in cost and efficiency even before the survey starts.

The rest of the paper is organized as follows. Section 2 formulates the problem, and Section 3 presents the multiobjective goal programming structure of the model. Sections 4 and 5 describe the solution methodology and a numerical example, respectively. Results and discussion are presented in Section 6, and Section 7 summarizes our research.

2. Problem Formulation

A minimum is sought either for the survey cost or for the variance of the estimates, with the remaining quantities imposed as constraints. Survey cost and the variance or relative efficiency of the estimates are the major elements in allocating stratified samples. Many surveys are based on a multistage sampling design.

2.1. Sampling Frame

Suppose we have data $y_{hij}$ (study variable) and $x_{hij}$ (associated auxiliary variable) for $i = 1, \ldots, N_h$ sampling units in the $h$th stratum, $h = 1, \ldots, L$, on $j = 1, \ldots, p$ characteristics. The structure of the auxiliary attribute can be defined as

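Let $\bar{y}_{hj}$ and $\bar{Y}_{hj}$ be the sample and population means of the study variable $y$, respectively, in the $h$th stratum for the $j$th characteristic, defined as

$$\bar{y}_{hj} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hij}, \qquad \bar{Y}_{hj} = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hij}.$$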

2.2. Survey Cost

Kokan and Khan [13] formulated the minimization of survey cost for a desired precision on several characteristics as a convex programming problem. Many other authors discussed linear functions that minimize the cost of selecting a sample in a stratified sampling design. Suppose a population of total size $N = \sum_{h=1}^{L} N_h$ is divided into $L$ mutually exclusive strata, and an independent random sample of size $n_h$ is selected without replacement from the $h$th stratum. The linear cost function is

$$C = c_0 + \sum_{h=1}^{L} c_h n_h,$$

where $C$ is the total budget, $c_h$ is the per-unit cost in the $h$th stratum, and $c_0$ is the initial fixed cost of the survey.
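As a small illustration of the linear cost function, the sketch below evaluates $C = c_0 + \sum_h c_h n_h$ for a given allocation and checks it against a budget. The function names `linear_cost` and `within_budget` and all numbers are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def linear_cost(n: Sequence[int], c: Sequence[float], c0: float) -> float:
    """Linear survey cost C = c0 + sum_h c_h * n_h."""
    if len(n) != len(c):
        raise ValueError("need one per-unit cost c_h for each stratum size n_h")
    return c0 + sum(ch * nh for ch, nh in zip(c, n))


def within_budget(n: Sequence[int], c: Sequence[float], c0: float, budget: float) -> bool:
    """True if allocation n does not exceed the total budget."""
    return linear_cost(n, c, c0) <= budget


# Hypothetical two-stratum values, for illustration only.
print(linear_cost([10, 20], [50.0, 80.0], 1000.0))  # 3100.0
```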

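Beardwood et al. [14] showed that the length of the shortest route through $k$ randomly located units is asymptotically proportional to $\sqrt{k}$ when $k$ is large, which motivates a cost function in which the travel cost among the selected units in a stratum grows with the square root of the stratum sample size. A quadratic cost function (quadratic in $\sqrt{n_h}$) discussed by Varshney et al. [12] is

$$C = c_0 + \sum_{h=1}^{L} c_h n_h + \sum_{h=1}^{L} t_h \sqrt{n_h},$$

where $t_h$ is the per-unit travel cost.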
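The following sketch evaluates a cost function with a travel term $t_h \sqrt{n_h}$ and shows that the marginal travel cost of one more unit shrinks as the stratum sample grows. It assumes the form $C = c_0 + \sum_h c_h n_h + \sum_h t_h \sqrt{n_h}$; the names `travel_cost_function` and `marginal_travel` and the numbers are hypothetical.

```python
import math
from typing import Sequence


def travel_cost_function(n: Sequence[int], c: Sequence[float],
                         t: Sequence[float], c0: float) -> float:
    """C = c0 + sum_h c_h n_h + sum_h t_h sqrt(n_h)."""
    if not (len(n) == len(c) == len(t)):
        raise ValueError("n, c and t need one entry per stratum")
    measurement = sum(ch * nh for ch, nh in zip(c, n))
    travel = sum(th * math.sqrt(nh) for th, nh in zip(t, n))
    return c0 + measurement + travel


def marginal_travel(th: float, nh: int) -> float:
    """Extra travel cost of adding one unit to a stratum that already has nh units."""
    return th * (math.sqrt(nh + 1) - math.sqrt(nh))


# Hypothetical values: 100 + (40 + 180) + (5*2 + 3*3) = 339
print(travel_cost_function([4, 9], [10.0, 20.0], [5.0, 3.0], 100.0))  # 339.0
```

The square-root term makes travel cheaper per unit in larger samples, which is why such a function can favor concentrating units in fewer strata when per-unit costs are similar.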

Determining the true functional form of the cost is important. In practice, the unit cost, travel cost, reward to respondents, and labor cost are important factors in a survey. Muhammad et al. [15] discussed a more representative polynomial cost function that includes the unit cost, traveling cost, reward to respondents, and labor cost, and they described how the labor cost can be computed from the time units consumed for each respondent. In this cost function (Eq. (5)), the per-unit cost in each stratum includes the reward paid to respondents, a separate term represents the traveling effect, and the labor term represents the aggregate labor time consumed in each stratum to collect data on the sampled units.

2.3. Estimators

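Let the first estimator of the population mean $\bar{Y}_j$ be the stratified sample mean $\bar{y}_{st,j} = \sum_{h=1}^{L} W_h \bar{y}_{hj}$, with variance

$$V(\bar{y}_{st,j}) = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{yhj}^2,$$

where $S_{yhj}^2$ is the population variance of the study variable in the $h$th stratum for the $j$th characteristic and $W_h = N_h/N$ is the known stratum weight.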
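A direct numerical evaluation of the variance of a stratified sample mean is sketched below, assuming the standard textbook form $V = \sum_h W_h^2 (1/n_h - 1/N_h) S_h^2$ with $W_h = N_h/N$. The function name `var_stratified_mean` and the inputs are hypothetical.

```python
from typing import Sequence


def var_stratified_mean(n: Sequence[int], N: Sequence[int], S2: Sequence[float]) -> float:
    """V = sum_h W_h^2 (1/n_h - 1/N_h) S_h^2, with W_h = N_h / N."""
    total = sum(N)
    v = 0.0
    for nh, Nh, s2 in zip(n, N, S2):
        if not 1 <= nh <= Nh:
            raise ValueError("each n_h must satisfy 1 <= n_h <= N_h")
        w = Nh / total
        v += w * w * (1.0 / nh - 1.0 / Nh) * s2
    return v


# Hypothetical strata: W_h = 0.5, fpc-adjusted factor 0.09, S_h^2 = 4
print(var_stratified_mean([10, 10], [100, 100], [4.0, 4.0]))  # approximately 0.18
```

Taking the whole population as the sample ($n_h = N_h$) makes every finite population correction vanish, so the variance is zero.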

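When auxiliary information is available, the regression estimator can be used. It gives precise estimates when the study and auxiliary variables are strongly linearly related and, unlike the ratio estimator, it does not require the regression line to pass through the origin. For multivariate stratified sampling, consider the separate regression estimator $\bar{y}_{lr,j} = \sum_{h=1}^{L} W_h \bar{y}_{lr,hj}$, where

$$\bar{y}_{lr,hj} = \bar{y}_{hj} + b_{hj}\left(\bar{X}_{hj} - \bar{x}_{hj}\right).$$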

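The variance of $\bar{y}_{lr,j}$ is approximately

$$V(\bar{y}_{lr,j}) \approx \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)\left(S_{yhj}^2 - 2 b_{hj} S_{yxhj} + b_{hj}^2 S_{xhj}^2\right),$$

where $b_{hj}$ is the sample regression coefficient in the $h$th stratum for the $j$th characteristic, and $S_{xhj}^2$ and $S_{yxhj}$ are the stratum variance of the auxiliary variable and the stratum covariance of the study and auxiliary variables. The minimum, attained at $b_{hj} = S_{yxhj}/S_{xhj}^2$, can be computed as [1]

$$V_{\min}(\bar{y}_{lr,j}) = \sum_{h=1}^{L} W_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right) S_{yhj}^2 \left(1 - \rho_{hj}^2\right),$$

where $\rho_{hj} = S_{yxhj}/(S_{yhj} S_{xhj})$ is the correlation between the study and auxiliary variables in the $h$th stratum.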
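To check numerically that the regression-estimator variance is smallest at $b_h = S_{yxh}/S_{xh}^2$, where it equals $\sum_h W_h^2 (1/n_h - 1/N_h) S_{yh}^2 (1 - \rho_h^2)$, the sketch below evaluates both expressions. It assumes the standard large-sample variance with fixed slopes $b_h$; the function names and inputs are hypothetical.

```python
from typing import List, Sequence


def _weights(N: Sequence[int]) -> List[float]:
    total = sum(N)
    return [Nh / total for Nh in N]


def var_regression(n, N, Sy2, Sx2, Syx, b) -> float:
    """sum_h W_h^2 (1/n_h - 1/N_h) (S_yh^2 - 2 b_h S_yxh + b_h^2 S_xh^2)."""
    return sum(w * w * (1 / nh - 1 / Nh) * (sy2 - 2 * bh * syx + bh * bh * sx2)
               for w, nh, Nh, sy2, sx2, syx, bh
               in zip(_weights(N), n, N, Sy2, Sx2, Syx, b))


def var_regression_min(n, N, Sy2, Sx2, Syx) -> float:
    """Minimum over b_h: sum_h W_h^2 (1/n_h - 1/N_h) S_yh^2 (1 - rho_h^2)."""
    return sum(w * w * (1 / nh - 1 / Nh) * sy2 * (1 - syx * syx / (sy2 * sx2))
               for w, nh, Nh, sy2, sx2, syx
               in zip(_weights(N), n, N, Sy2, Sx2, Syx))


# Hypothetical strata; the optimal slopes are B_h = S_yxh / S_xh^2 = [1.2, 0.75]
n, N = [10, 10], [100, 100]
Sy2, Sx2, Syx = [4.0, 9.0], [1.0, 4.0], [1.2, 3.0]
B = [syx / sx2 for syx, sx2 in zip(Syx, Sx2)]
print(var_regression(n, N, Sy2, Sx2, Syx, B))  # approximately 0.209475
print(var_regression_min(n, N, Sy2, Sx2, Syx))  # approximately 0.209475
```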

The minimum can be obtained by minimizing the variance subject to a fixed cost. The difference between two minimum variances (both subject to same fixed cost) is

As discussed by Cochran [1] and Yousaf and Ijaz [16], the relative increase in the optimum variances can be expressed in terms of the two vectors of stratum sample sizes obtained when each of the two variances is optimized. Both variances are optimal for the same cost, so the difference between the estimates is due only to their structure: the regression estimator removes from the total variation the additional component explained by the auxiliary variable. If the correlation coefficients between the study variable and the auxiliary variable are the same, or almost the same, in all strata, it is reasonable to suppose that the two allocation vectors are almost identical. Substituting a common allocation for both vectors in the above equation and solving (ignoring the fpc) for the efficiency of the two estimators, we have

If , the above equation reduces to
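A sketch of one way to compute the relative efficiency of the two estimators under the assumptions stated above (common allocation, fpc ignored) is given below: the ratio of the stratified-mean variance to the minimum regression variance. This is an illustration under those assumptions, not a reproduction of the paper's exact expression; the function name `efficiency_common_allocation` and the inputs are hypothetical. With a correlation $\rho$ shared by all strata, the ratio reduces to $1/(1-\rho^2)$ whatever the allocation.

```python
from typing import Sequence


def efficiency_common_allocation(n: Sequence[int], N: Sequence[int],
                                 Sy2: Sequence[float], rho: Sequence[float]) -> float:
    """e = [sum W^2 S^2 / n] / [sum W^2 S^2 (1 - rho^2) / n]  (same n_h, fpc ignored)."""
    total = sum(N)
    num = den = 0.0
    for nh, Nh, s2, r in zip(n, N, Sy2, rho):
        if not -1.0 < r < 1.0:
            raise ValueError("need |rho_h| < 1")
        w2 = (Nh / total) ** 2
        num += w2 * s2 / nh
        den += w2 * s2 * (1.0 - r * r) / nh
    return num / den


# Common correlation 0.6 in both strata: e = 1 / (1 - 0.36) = 1.5625
print(efficiency_common_allocation([10, 30], [100, 300], [4.0, 9.0], [0.6, 0.6]))  # approximately 1.5625
```

Because $1/e$ is a weighted average of the $(1-\rho_h^2)$, unequal correlations give an efficiency between the smallest and largest single-stratum values.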

3. Multiobjective Goal Programming

3.1. Single Characteristic Optimization Program

The allocation to each stratum that minimizes the cost for a desired level of efficiency of a particular characteristic “j” can be obtained subject to the necessary constraints. The cost function in Eq. (5) is minimized subject to the desired level of efficiency given by Eq. (10), together with the constraints on the stratum sample sizes. The optimization program for the minimum cost of characteristic “j” is given below, where the feasible region is the set of grid points formed by the decision space.
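Because the feasible region is a set of integer grid points, a small single-characteristic program can be solved by plain enumeration. The sketch below minimizes a cost function over all allocations $2 \le n_h \le N_h$ whose variance does not exceed a cap `v_max`; encoding the efficiency requirement as a variance cap, the cost and variance functions, and all names and numbers are hypothetical assumptions, and exhaustive search only scales to a few small strata.

```python
import itertools
import math
from typing import Callable, Optional, Sequence, Tuple

Allocation = Tuple[int, ...]


def min_cost_allocation(N: Sequence[int],
                        cost: Callable[[Allocation], float],
                        variance: Callable[[Allocation], float],
                        v_max: float,
                        n_min: int = 2) -> Optional[Tuple[Allocation, float]]:
    """Enumerate every grid point n_min <= n_h <= N_h and return the cheapest
    allocation whose variance does not exceed v_max (None if none qualifies)."""
    best: Optional[Allocation] = None
    best_cost = math.inf
    for alloc in itertools.product(*(range(n_min, Nh + 1) for Nh in N)):
        if variance(alloc) > v_max:
            continue  # efficiency requirement not met at this grid point
        c = cost(alloc)
        if c < best_cost:
            best, best_cost = alloc, c
    return None if best is None else (best, best_cost)


# Hypothetical two-stratum inputs (not the paper's Table 1).
N_EX = (20, 20)
S2_EX = (4.0, 9.0)


def example_cost(n: Allocation) -> float:
    return 100.0 + 10.0 * n[0] + 15.0 * n[1]


def example_variance(n: Allocation) -> float:
    total = sum(N_EX)
    return sum((Nh / total) ** 2 * (1 / nh - 1 / Nh) * s2
               for nh, Nh, s2 in zip(n, N_EX, S2_EX))


print(min_cost_allocation(N_EX, example_cost, example_variance, v_max=0.3))
```

Returning `None` mirrors the situation in which no integer allocation satisfies the requirement.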

3.2. Multiobjective Goal Program

Our problem is multivariate, so we need an allocation within each stratum that is optimal for all the characteristics. The allocations that optimize individual characteristics may differ from one another if the characteristics are not correlated. Hence we need a suitable compromise allocation, as discussed by Cochran [1].

In the context of the sampling frame discussed above, a total sample size $n = \sum_{h=1}^{L} n_h$ is determined that minimizes the cost vector for all $p$ characteristics. We formulate this problem as a multiobjective nonlinear integer goal program using a single vector of sample sizes from all strata:

4. Solution Methodology

Our goal program has p goals. To optimize all goals, we define p vectors of decision variables, each of dimension L. The decision-maker(s) control these vectors. Each of the p goals has an optimum value, which is a function of the decision vector as given in (14).

The generic form in (14) makes no assumptions about the decision variables. The decision-maker(s) set a target level for each goal in (14); these targets are generally the individual optimal values of the respective objectives. When a compromise decision vector is selected instead, each goal leads to the following deviation structure:

$$f_i(\mathbf{n}) + d_i^{-} - d_i^{+} = b_i, \qquad d_i^{-}, d_i^{+} \ge 0, \qquad i = 1, \ldots, p,$$

where $f_i(\mathbf{n})$ is the objective value at the compromise allocation vector $\mathbf{n}$, $b_i$ is the target level, and $d_i^{-}$ and $d_i^{+}$ are the negative and positive deviational variables, respectively. The deviational variables are also called unmet goal variables.
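The deviation structure can be illustrated for a single goal: given an achieved value and a target, the negative deviation records a shortfall and the positive deviation an excess, and at most one of them is nonzero. The function name `goal_deviations` and the numbers are hypothetical.

```python
from typing import Tuple


def goal_deviations(achieved: float, target: float) -> Tuple[float, float]:
    """Return (d_minus, d_plus) with achieved + d_minus - d_plus == target."""
    d_minus = max(target - achieved, 0.0)  # shortfall below the target
    d_plus = max(achieved - target, 0.0)   # excess above the target
    return d_minus, d_plus


print(goal_deviations(120.0, 100.0))  # (0.0, 20.0)
print(goal_deviations(90.0, 100.0))   # (10.0, 0.0)
```

For a cost goal, only the positive deviation (spending above the target) is unwanted; for an efficiency goal, the negative deviation is.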

When the goals are treated as a set of (soft) constraints, the decision-maker(s) aim to satisfy each goal as closely as possible, and the solution remains feasible even if a goal is not achieved. However, if the hard constraints of the goal program are violated, the solution becomes infeasible. Within the feasible vector space established by the decision vectors, we add the following condition on the stratum sample sizes:

The resulting compromise solution performs worse than the individual optimal solutions. We sum all unwanted deviations defined above and minimize this sum so that the solution is “as close as possible” to the desired goals. Lexicographic goal programming is one solution technique for such problems. It follows a predefined priority sequence in which the objectives are ranked by importance: the first goal is optimized, and the remaining goals are compromised by minimizing their unwanted deviations in order of priority. A generic form of such a compromise allocation is given below, where the objective is a priority-ordered sequence of functions of the unwanted directional deviation vectors (for maximization or minimization goals). Weighted goal programming (WGP) instead minimizes a weighted objective function composed of the unwanted deviations. A few other techniques are also used in such compromise allocation problems. A goal programming achievement function can be regarded as a utility function to be maximized; it can be linear or nonlinear and sets an aspiration goal within the feasible region [17]. The techniques discussed by Charnes and Cooper [18] and Romero [17] minimize a weighted vector of unmet aspirations in the feasible region. If the negative and positive deviational variables affect a goal differently, weight vectors are attached to the negative and positive deviations of each goal, respectively, and formulation (18) changes to the following objective [17]:
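The difference between the weighted and lexicographic approaches can be seen with two candidate allocations. The sketch below treats both goals as cost-type (so only positive deviations are unwanted), scores candidates by a weighted sum of deviations and, separately, by a tuple of per-priority deviation sums that Python compares lexicographically. The goal values, weights, priorities, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def deviations(achieved: float, target: float) -> Tuple[float, float]:
    """(d_minus, d_plus) such that achieved + d_minus - d_plus == target."""
    return max(target - achieved, 0.0), max(achieved - target, 0.0)


def weighted_achievement(values: Sequence[float], targets: Sequence[float],
                         u: Sequence[float], v: Sequence[float]) -> float:
    """Weighted GP objective: sum_i (u_i * d_i_minus + v_i * d_i_plus)."""
    total = 0.0
    for f, b, ui, vi in zip(values, targets, u, v):
        d_minus, d_plus = deviations(f, b)
        total += ui * d_minus + vi * d_plus
    return total


def lexicographic_key(values: Sequence[float], targets: Sequence[float],
                      priorities: List[List[int]]) -> Tuple[float, ...]:
    """One sum of unwanted (positive) deviations per priority level, highest first.
    Tuples compare element by element, i.e. lexicographically."""
    return tuple(sum(deviations(values[i], targets[i])[1] for i in level)
                 for level in priorities)


# Hypothetical goal values (e.g. costs) for two candidate allocations.
targets = [100.0, 100.0]
candidates = {"A": [110.0, 100.0], "B": [100.0, 130.0]}

best_wgp = min(candidates, key=lambda k: weighted_achievement(
    candidates[k], targets, [0.0, 0.0], [1.0, 1.0]))
best_lex = min(candidates, key=lambda k: lexicographic_key(
    candidates[k], targets, [[0], [1]]))
print(best_wgp, best_lex)  # A B
```

The weighted sum prefers A (total overshoot 10 against 30), while giving goal 1 strict priority selects B, which meets goal 1 exactly.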

If there is a difference between the desired aspiration level and the maximum utility, Archimedean goal programming interprets it as “the maximization of a separable and additive utility function in the attributes considered” (Romero [17]). Minimax (Chebyshev) goal programming instead minimizes the maximum deviation, as discussed by Tamiz et al. [19] and Romero [17]. If D is the maximum deviation from the respective utility or aspiration level, program (18) becomes the following:

If φ is the importance attached to the minimax criterion (20), that is, to minimizing the maximum deviation, and (1 − φ) is the weight attached to the weighted sum of unwanted deviations in (19), then a generalized goal programming model of utility maximization can be written as
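The role of φ can be illustrated by scoring fixed candidates with $\varphi D + (1-\varphi)\sum(\text{weighted deviations})$, where $D$ is taken directly as the largest weighted deviation of the candidate (in the optimization model, $D$ is a variable bounded below by each weighted deviation). This is a sketch of extended goal programming in Romero's sense, assuming cost-type goals; φ = 0 recovers the weighted model and φ = 1 the minimax model. Names and numbers are hypothetical, apart from φ = 0.3, which the Appendix uses.

```python
from typing import List, Sequence


def unwanted_deviations(values: Sequence[float], targets: Sequence[float],
                        weights: Sequence[float]) -> List[float]:
    """Weighted positive deviations w_i * max(f_i - b_i, 0) for cost-type goals."""
    return [w * max(f - b, 0.0) for f, b, w in zip(values, targets, weights)]


def extended_gp_score(values: Sequence[float], targets: Sequence[float],
                      weights: Sequence[float], phi: float) -> float:
    """phi * D + (1 - phi) * sum of weighted deviations, with D the largest one."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    dev = unwanted_deviations(values, targets, weights)
    return phi * max(dev) + (1.0 - phi) * sum(dev)


targets = [100.0, 100.0]
weights = [1.0, 1.0]
candidates = {"A": [110.0, 100.0], "B": [106.0, 106.0]}

for phi in (0.0, 0.3, 1.0):
    best = min(candidates, key=lambda k: extended_gp_score(candidates[k], targets, weights, phi))
    print(phi, best)
# 0.0 A
# 0.3 A
# 1.0 B
```

Candidate A has the smaller total deviation (10 against 12), while B spreads the deviation evenly and has the smaller maximum (6 against 10); φ decides which property dominates.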

The feasible region can be expanded by relaxing the constraint into a pair of inequalities governed by D. Here D is an arbitrary choice of how far the positive and negative directional deviations of a goal may vary, and it depends on how close the modelers or surveyors want their results to be. This strategy provides leverage in setting the feasible region and helps to extend it so that an integer sample size can be allocated. This matters because the feasible set of an integer nonlinear program is restricted to grid points and is therefore always smaller than that of its continuous relaxation.

5. Numerical Illustration

For the numerical illustration, we use the data given by Shafi [20]. Our objective is to compute the trade-off among the estimators, that is, how the sample allocation and the cost change for a given change in the desired level of their MSE.

The study variables are Y1 = biological yield of wheat varieties and Y2 = harvest index of wheat varieties.

The variables used as covariates are X1 = the number of plants of wheat varieties and X2 = total grain weight of wheat varieties. is 45.19618523 and is 16.1223179.

The data are divided into 2 strata according to protein content. An artificial dichotomous variable is constructed by transforming the quantitative variable into an attribute, with the cutoff set at the respective stratum mean for each characteristic. Table 1 shows summary statistics of the data used.

In this illustration, we assume a per-unit selection cost in each stratum. The unit cost of time ω and the constant δ are determined using the methodology discussed by Winston [21] and Taha [22]. In the cost function (Eq. (5)), the time taken to collect the data within a stratum, as well as for the whole sample, is treated as a sum of independent and identically distributed random variables, namely the times consumed for the individual units. Each follows an exponential distribution, so their sum follows a Gamma distribution whose shape parameter is the sample size in the stratum and whose mean is that sample size multiplied by the mean time per unit (Ross [23] and Hogg et al. [24]). Assuming the same exponential rate in all strata, the same holds for the whole sample from all strata, with the total sample size as the shape parameter. The total expected time can then be computed as in [23, 24]. A complete formulation of the problem is given in the Appendix.
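The expected data-collection time used in the labor cost can be computed from the Gamma mean and checked by simulation. The sketch below assumes exponential unit times with a stratum-specific mean $\mu_h$, so the stratum total has mean $n_h \mu_h$ and the expected labor cost is $\omega \sum_h n_h \mu_h$; expectations add even when the strata have different rates. The function names, the time unit (hours), and all numbers are hypothetical.

```python
import random
from typing import Sequence


def expected_collection_time(n: Sequence[int], mean_unit_time: Sequence[float]) -> float:
    """E[T] = sum_h n_h * mu_h; a sum of n_h iid exponential times with mean mu_h
    is Gamma(shape=n_h, scale=mu_h), whose mean is n_h * mu_h."""
    return sum(nh * mh for nh, mh in zip(n, mean_unit_time))


def expected_labor_cost(n: Sequence[int], mean_unit_time: Sequence[float], omega: float) -> float:
    """Labor cost = omega (cost per unit of time) * expected total time."""
    return omega * expected_collection_time(n, mean_unit_time)


def simulated_collection_time(n: Sequence[int], mean_unit_time: Sequence[float],
                              reps: int = 5000, seed: int = 1) -> float:
    """Monte Carlo check: average total collection time over reps simulated surveys."""
    rng = random.Random(seed)
    total = 0.0
    for _rep in range(reps):
        # expovariate takes the rate, i.e. 1 / mean
        total += sum(rng.expovariate(1.0 / mh)
                     for nh, mh in zip(n, mean_unit_time) for _ in range(nh))
    return total / reps


n, mu, omega = [10, 15], [0.25, 0.5], 100.0  # hypothetical: hours per unit, cost per hour
print(expected_collection_time(n, mu))   # 10.0
print(expected_labor_cost(n, mu, omega))  # 1000.0
print(simulated_collection_time(n, mu))   # close to 10.0
```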

6. Results and Discussion

Selecting the complete population as the sample gives the highest efficiency of the estimators (the variance is minimum). The efficiency is calculated from Eq. (10).

For an arbitrary survey cost of 25000 (Figure 1), the efficiency is 1.2915 for characteristic one and 1.5645 for characteristic two. We compute the minimum cost from (21) for varying levels of the percentage rises in efficiency of the two characteristics, as shown in Figures 1 and 2. Beyond certain values of these parameters, the solution repeats or the model gives no integer solution. A minimum cost of 25000 yields a 29.15 percent gain in efficiency for characteristic one and a 56.45 percent gain for characteristic two when the variance of only that characteristic is optimized. Different combinations of the two parameters give minimum cost solutions under the compromise allocation structure in (21). The compromise solution can even reduce the initial survey cost of 25000 if both parameters are small. Keeping one parameter constant and increasing the other increases the cost of the compromise solution, and increasing both simultaneously increases the survey cost (Figure 2). Figures 1(a) and 1(b) plot the percentage rise in efficiency of an estimator (x-axis) against the rise in cost (y-axis) for each characteristic; they show that, for successive one percent increases, the rise in the compromise cost for the first characteristic is smaller than for the second. Figures 1(c) and 1(d) show the sample allocation in each stratum for various choices of the two parameters. The allocation changes because, each time one or both parameters change, (21) selects the samples considering the cost and variance of both strata independently; this may be due to differences in the cost and variance structures of the two strata. In this goal programming structure, any percentage can be specified for the proportional change in the variances of both characteristics. For some choices of the parameters an integer solution may not exist, because the model must return a grid point (n1, n2). We restricted the goal programming model to select at least 2 sampling units from each of strata 1 and 2.
Figure 2 is a 3-D plot of the percentage rise in efficiency of estimator 1, the percentage rise in efficiency of estimator 2, and the compromise cost given by (21).
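Reading the reported figures as efficiency ratios relative to a baseline of 1, the percentage gains quoted above follow from $(e - 1) \times 100$. The helper `percent_gain` is hypothetical; only the two efficiency values come from the text.

```python
def percent_gain(efficiency: float) -> float:
    """Percentage gain of an efficiency ratio over the baseline ratio 1."""
    return (efficiency - 1.0) * 100.0


for e in (1.2915, 1.5645):
    print(f"{e} -> {percent_gain(e):.2f} percent")
# 1.2915 -> 29.15 percent
# 1.5645 -> 56.45 percent
```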

7. Summary

It may not be possible to obtain a compromise integer solution for all possible choices of the percentage rises specified for the two characteristics, although a real-valued solution may exist for all choices. A high unit cost in a particular stratum may limit selection from that stratum. This optimization structure works as a dual procedure: we first optimize a compromise variance for a given survey cost, and we then optimize the survey cost required to achieve a certain level of efficiency for the different estimates. Applying other goal programming techniques may also be of interest. We used a nonlinear cost function; other cost functions may produce different results. This work can also be extended using stochastic, dynamic, or stochastic dynamic programming.

Appendix

A. Single Characteristic Models for

The decision variables are the stratum sample sizes n1 and n2. The required parameter values are taken from Table 1. δ is set to the arbitrary values 0.5 and 2.0, and ω is the cost per unit of labor time (e.g., 100 or 150). The Gamma-distributed collection time is replaced by its total expected time (e.g., 15 or 20 minutes). The first-stage programs (14) are given below, with the remaining values computed from Table 1:

B. Multiobjective Model for

From the above models, we obtain the optimal values for the two characteristics. Selecting an arbitrary value of φ (here φ = 0.3), we establish the following model:

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RG-1437-027.