Abstract

In this study, we consider two study variables and one auxiliary variable. The auxiliary variable is used as the stratification variable, and the population means of the study variables are estimated with a mixture of ratio and product estimators. Under a superpopulation set-up, minimal equations are obtained by minimizing the aggregated variance of the estimators of the study variables, subject to the constraints under consideration. The dynamic programming approach is then used to minimize the variance and obtain the optimum strata boundaries. Empirical studies of the proposed rule are carried out for different distributions of the auxiliary variable, and a simulation study shows the gain in precision achieved by the proposed method.

1. Introduction

In stratified random sampling, choosing the strata boundaries optimally leads to higher relative precision. Hansen and Hurwitz [1] pioneered the concept of strata boundaries as an extension of Dalenius' [2] work for the univariate case. Sadasivan and Aggarwal [3] used the variables under consideration as stratification variables under Neyman allocation. When several characteristics are to be estimated, optimum allocation cannot be applied directly. Ghosh [4] considered proportional allocation using two stratification variables. Methods for other situations have been proposed by Singh [5], Dalenius and Gurney [6], Danish et al. [7], Danish and Rizvi [8, 9], and Gupt and Ahamed [10].

In recent years, researchers have shown considerable interest in determining stratification points. Rizvi et al. [11] used the compromise method together with Verma's [12] ratio and regression methods to obtain approximately optimum strata boundaries (AOSB). Danish and Rizvi [8] proposed a method for obtaining stratification points using two highly correlated variables. Two stratification variables have been used by Danish et al. [13] and by Danish and Rizvi [9, 14]. Abo-El Hassan et al. [15] proposed goal programming for obtaining the stratification points. The number of studies on determining strata boundaries has grown markedly; recent examples include Brito et al. [16], Reddy and Khan [17], and Danish et al. [18]. Alshqaq et al. [19] discussed a linear approximation of the multivariate stratified sampling problem, with examples. Hamid et al. [20] suggested that a mathematical goal programming model can determine the optimum strata boundaries for bivariate variables in multiobjective problems with minimum variance.

Brito et al. [21] proposed a hybrid approach that first identifies the cutoff points of the strata through an optimization method and then performs the optimal allocation of the sample to the strata with an exact method. Fadal et al. [22] proposed a heuristic algorithm based on the Biased Random Key Genetic Algorithm (BRKGA) metaheuristic for the univariate stratification problem for objective (ii). de Moura Brito et al. [23] proposed an exact algorithm, based on concepts from graph theory, that minimizes the variance expression under proportional allocation. Lisic et al. [24], assuming that the stratification variable follows a Weibull distribution, solved the stratification problem using dynamic programming and Neyman allocation. Furthermore, Rizvi and Danish [25] used a product estimator to obtain the stratification points under the classical approach.

In practical situations with two study variables, one study variable may be highly positively correlated with the stratification variable while the other is negatively correlated with it. Accordingly, we assume that the study variable Y has a high positive correlation with the auxiliary variable, whereas the second study variable, Z, is negatively correlated with the stratification variable.

In the present investigation, we study the determination of stratification points for two study variables when the population means are estimated from a stratified simple random sample, using an auxiliary variable with a mixture of ratio and product estimators, and we apply the technique of dynamic programming. Under the superpopulation set-up, minimal equations are obtained by minimizing the aggregated variance of the estimators of the study variables. Prior information on the functional relationships of the two study variables with the stratification variable, and on their conditional variance functions, is also assumed to be available. The problem is solved as a multistage decision problem, with the auxiliary variable serving as the stratification variable. A simulation study is performed to compare the relative precision of the proposed method with that of existing methods.

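The rest of the paper is organized as follows. Section 2 presents the variance and covariance expressions for the mixture of ratio and product estimators under the superpopulation set-up. Section 3 derives the minimal equations, and Section 4 describes dynamic programming as the solution procedure. Section 5 obtains the optimum sample size, Sections 6 and 7 present the empirical and simulation studies, and Section 8 concludes the paper.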

2. Variance and Covariance Expressions under Super-Population Set-Up

Let the given population of size N be divided into L strata, and assume that in each stratum the regression lines of the two study variables on the highly correlated stratification variable are linear and pass through the origin.

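Let us assume a superpopulation model in which each study variable is a real function of the stratification variable plus a disturbance term, where the disturbance has zero conditional mean and a positive conditional variance for every value of the stratification variable in its finite range. If the joint density function of the variables and the marginal density of the stratification variable are specified under the superpopulation model, then the stratum weight, the stratum mean, and the conditional variance of each subpopulation can be written as integrals between consecutive stratification points.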

Let the population of N units be split into L strata. The separate ratio estimator of the population mean in stratified random sampling is obtained by multiplying, in each stratum, the ratio of the sample mean of the study variable to the sample mean of the auxiliary variable by the known stratum population mean of the auxiliary variable and by the stratum weight, and then summing over the strata.

We now assume that in each stratum the regression line of the second study variable, Z, on the auxiliary variable is linear and passes through the origin, so that a combined product estimator can be used for Z. In stratified sampling, the combined product estimator is the product of the stratified sample mean of Z and the stratified sample mean of the auxiliary variable, divided by the population mean of the auxiliary variable; it is built from the stratum weights, the stratum sample means of Z and of the auxiliary variable, and the population mean of the auxiliary variable.
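As a minimal illustration of the two estimators described above, the Python sketch below computes the separate ratio estimate of the mean of Y and the combined product estimate of the mean of Z from stratum summaries. It assumes the standard textbook forms, the sum over strata of W_h (ybar_h / xbar_h) Xbar_h and zbar_st xbar_st / Xbar; the function and argument names are hypothetical, and the paper's equations may use different notation.

```python
import numpy as np


def separate_ratio_estimate(W, ybar_h, xbar_h, Xbar_h):
    """Separate ratio estimator: sum over strata of W_h * (ybar_h / xbar_h) * Xbar_h."""
    W, ybar_h, xbar_h, Xbar_h = (np.asarray(v, dtype=float) for v in (W, ybar_h, xbar_h, Xbar_h))
    return float(np.sum(W * ybar_h / xbar_h * Xbar_h))


def combined_product_estimate(W, zbar_h, xbar_h, Xbar):
    """Combined product estimator: zbar_st * xbar_st / Xbar."""
    W, zbar_h, xbar_h = (np.asarray(v, dtype=float) for v in (W, zbar_h, xbar_h))
    zbar_st = np.sum(W * zbar_h)   # stratified sample mean of Z
    xbar_st = np.sum(W * xbar_h)   # stratified sample mean of the auxiliary variable
    return float(zbar_st * xbar_st / Xbar)


# Example with two equally weighted strata (illustrative numbers only)
y_ratio = separate_ratio_estimate([0.5, 0.5], [2.0, 4.0], [1.0, 2.0], [1.1, 2.2])
z_product = combined_product_estimate([0.5, 0.5], [3.0, 1.0], [1.0, 2.0], 1.5)
```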

If the finite population correction (FPC) is neglected, the approximate variances of these estimators, under proportional allocation, are given by
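For reference, the standard first-order variances of these two estimators under proportional allocation with the FPC ignored are (1/n) Σ_h W_h (S²_yh + R_h² S²_xh − 2 R_h S_yxh) for the separate ratio estimator, with R_h = Ybar_h / Xbar_h, and (1/n) Σ_h W_h (S²_zh + R² S²_xh + 2 R S_zxh) for the combined product estimator, with R = Zbar / Xbar. The sketch below evaluates these textbook expressions from stratum variances and covariances; it is not a transcription of the paper's equations, and the names are hypothetical.

```python
import numpy as np


def var_separate_ratio(n, W, S2y, S2x, Syx, R_h):
    """(1/n) * sum_h W_h (S2y_h + R_h^2 S2x_h - 2 R_h Syx_h), with R_h = Ybar_h / Xbar_h."""
    W, S2y, S2x, Syx, R_h = (np.asarray(v, dtype=float) for v in (W, S2y, S2x, Syx, R_h))
    return float(np.sum(W * (S2y + R_h ** 2 * S2x - 2.0 * R_h * Syx)) / n)


def var_combined_product(n, W, S2z, S2x, Szx, R):
    """(1/n) * sum_h W_h (S2z_h + R^2 S2x_h + 2 R Szx_h), with R = Zbar / Xbar."""
    W, S2z, S2x, Szx = (np.asarray(v, dtype=float) for v in (W, S2z, S2x, Szx))
    return float(np.sum(W * (S2z + R ** 2 * S2x + 2.0 * R * Szx)) / n)
```

A positive covariance between Z and the auxiliary variable inflates the product-estimator variance, which is why that estimator is paired here with the negatively correlated study variable.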

For the covariance expression, we have the following lemma.

Lemma 1. To the first order of approximation, the covariance between the separate ratio estimator and the combined product estimator, defined by equations (3) and (5), respectively, is given by

Proof. Following, in part, the proofs of Lemmas 5.1 and 5.3 of Rizvi [26], we obtain the first-order expansion of the covariance, which simplifies to the stated expression, thereby proving the lemma.
Under proportional allocation of the sample to the strata, the covariance given by equation (7) reduces to
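The following first-order derivation is a hedged sketch of how a covariance of this kind arises, written in standard notation (R_h, R, S_{yzh}, and so on) that may not match the paper's symbols. It linearizes both estimators, uses the independence of sampling across strata, and ignores the FPC; it illustrates the result under these assumptions and is not a reproduction of equation (7) or its proportional-allocation form.

```latex
\begin{aligned}
\hat{\bar{Y}}_{RS}-\bar{Y} &\approx \sum_{h=1}^{L} W_h\left[(\bar{y}_h-\bar{Y}_h)-R_h(\bar{x}_h-\bar{X}_h)\right], & R_h &= \bar{Y}_h/\bar{X}_h,\\
\hat{\bar{Z}}_{PC}-\bar{Z} &\approx \sum_{h=1}^{L} W_h\left[(\bar{z}_h-\bar{Z}_h)+R(\bar{x}_h-\bar{X}_h)\right], & R &= \bar{Z}/\bar{X},\\
\operatorname{Cov}\left(\hat{\bar{Y}}_{RS},\hat{\bar{Z}}_{PC}\right) &\approx \sum_{h=1}^{L}\frac{W_h^{2}}{n_h}\left(S_{yzh}+R\,S_{yxh}-R_h S_{zxh}-R_h R\,S_{xh}^{2}\right)\\
&= \frac{1}{n}\sum_{h=1}^{L} W_h\left(S_{yzh}+R\,S_{yxh}-R_h S_{zxh}-R_h R\,S_{xh}^{2}\right) \quad \text{(with } n_h = nW_h\text{)}.
\end{aligned}
```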

3. Minimal Equations

Let the stratification points be chosen within the range of the stratification variable. Corresponding to these strata boundaries, the generalized variance is as follows, where the shorthand terms denote the corresponding variance and covariance expressions, respectively.

Differentiating the generalized variance partially with respect to each stratification point and equating the derivative to zero, we get

Substituting the variance and covariance expressions from equations (6) and (11) into equation (13), we have

Now let us assume that the functional relationships between the variables are linear in each stratum and that the regression lines pass through the origin. The approximate regression models are then as follows, where the two error terms correspond to the study variables Y and Z, respectively.

Under these models, the variance expressions for proportional allocation [27] can be written as follows, where the general variance terms refer to the study variables Y and Z, respectively.
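To see why regression lines through the origin simplify the variance expressions, consider, as an illustrative assumption, the within-stratum models y = βx + e_y and z = γx + e_z with common slopes, errors of zero conditional mean, conditional variances φ_y(x) and φ_z(x), and stratum averages μ_{φy,h} and μ_{φz,h} of these variances. Using the law of total variance and replacing R_h by β and R by γ (their model approximations), the stratum terms of the textbook variances reduce as shown below; the paper's expressions under [27] may be arranged differently.

```latex
\begin{aligned}
S_{yh}^{2} &\approx \beta^{2}\sigma_{xh}^{2}+\mu_{\phi_y,h}, \qquad S_{yxh}\approx\beta\,\sigma_{xh}^{2}, \qquad R_h\approx\beta,\\
S_{yh}^{2}+R_h^{2}\sigma_{xh}^{2}-2R_h S_{yxh} &\approx \beta^{2}\sigma_{xh}^{2}+\mu_{\phi_y,h}+\beta^{2}\sigma_{xh}^{2}-2\beta^{2}\sigma_{xh}^{2} = \mu_{\phi_y,h},\\
S_{zh}^{2}+R^{2}\sigma_{xh}^{2}+2R\,S_{zxh} &\approx \gamma^{2}\sigma_{xh}^{2}+\mu_{\phi_z,h}+\gamma^{2}\sigma_{xh}^{2}+2\gamma^{2}\sigma_{xh}^{2} = 4\gamma^{2}\sigma_{xh}^{2}+\mu_{\phi_z,h}.
\end{aligned}
```

Under this model, the ratio part depends only on the average error variance in each stratum, while the product part also depends on the spread of the stratification variable within the stratum.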

The covariance term can be obtained as

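If the frequency function of the stratification variable is known and integrable, the stratum-level quantities in these expressions can be written in terms of the stratification points as follows: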

Let the estimated frequency distribution of the stratification variable be defined over its range. We then need to find the intermediate points that divide this range into L strata such that the total variance given in equations (17)–(19) is minimized.
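When the density of the stratification variable is known, the weight, mean, and variance of the stratum between two stratification points can be evaluated by numerical integration. The sketch below assumes these are the three stratum quantities expressed in equations (20a)–(20c) and uses `scipy.integrate.quad`; the function name `stratum_quantities` is hypothetical.

```python
from scipy.integrate import quad


def stratum_quantities(f, lo, hi):
    """Weight, mean and variance of the stratum [lo, hi] under the density f."""
    W = quad(f, lo, hi)[0]                                  # stratum weight
    mean = quad(lambda x: x * f(x), lo, hi)[0] / W          # stratum mean
    second = quad(lambda x: x * x * f(x), lo, hi)[0] / W    # stratum second moment
    return W, mean, second - mean ** 2                      # variance = second moment - mean^2


# Example: uniform density on [0, 1], first of two equal strata
W, mean, var = stratum_quantities(lambda x: 1.0, 0.0, 0.5)
```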

This can be written as

For a fixed total sample size n, minimizing the preceding variance is equivalent to minimizing the following quantity, because under proportional allocation the variance is proportional to 1/n:

Thus, the objective function in equation (23) can be written as a function of the stratification points only:

Thus, the problem of obtaining the stratification points can be expressed as

Minimize

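The length (total deviation) of the range can be written as the difference between its upper and lower ends. In the same fashion, the width of each stratum is the difference between consecutive stratification points. Thus, we can write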

Hence, the last stratification point can be expressed as
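The change of variables from stratification points to stratum widths, in which the widths add up to the total deviation of the range and the last point coincides with the upper end, can be sketched as follows. The helper names and the symbols x_h (points) and y_h (widths) used in the comments are hypothetical.

```python
from itertools import accumulate


def widths_from_points(a, points, b):
    """Stratum widths y_h = x_h - x_{h-1}, with x_0 = a and x_L = b."""
    edges = [a, *points, b]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def points_from_widths(a, widths):
    """Cumulative points x_h = a + y_1 + ... + y_h; the last one equals a + d = b."""
    return [a + s for s in accumulate(widths)]


widths = widths_from_points(0.0, [0.25, 0.5], 1.0)
points = points_from_widths(0.0, widths)
```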

Taking equation (26) as a constraint, the optimization problem can be expressed as

Minimize

Subject to constraints

If the lower end of the range is given, the first term of the objective function of the mathematical programming problem (MPP) in equation (28) is a function of the width of the first stratum only. Similarly, once the first stratum is fixed, the second term is a function of the width of the second stratum only, and in the same way each subsequent term can be expressed as a function of the width of the corresponding stratum.

In view of this sequential dependence between the terms, the optimization problem can be expressed as

Minimize

Subject to constraint

In practice, the values of the study variable are usually unknown at the design stage, so a highly correlated auxiliary variable is used to estimate the stratification points. In the proposed technique, we solve the optimization problem in equation (30) over the range of this auxiliary variable. Note that any parameters appearing in the objective function must be either fixed or taken from the literature in advance.

4. Dynamic Programming as a Solution Procedure

The problem in equation (28) is a multistage decision problem in which both the objective function and the constraint are separable functions of the stratum widths, which allows us to use dynamic programming (DP) [28]. DP applies to problems that can be written as a recursion whose plain evaluation makes repeated calls with the same inputs. It avoids this repetition by reusing the optimal solutions of smaller subproblems when solving larger ones, until the optimal solution of the full problem is obtained.

Now, consider the subproblem for the first k strata:

Minimize

Subject to the constraint

Let the minimum value of the MPP in equation (30) be denoted as follows, that is,

With this notation, solving equation (28) is equivalent to computing this minimum value for all strata recursively, starting with a single stratum and adding one stratum at a time. We have

For a fixed value of the width of the last stratum in the subproblem, we have

Thus, using Bellman's principle of optimality, we obtain the recursive equation of the DP for the second and subsequent stages:

For the first stage, the minimum value is simply the objective term of the first stratum evaluated at its total deviation, that is, at the range covered by the first stratum. Thus, equations (36) and (37) can be solved forward, stage by stage, for different values of the total deviation to obtain the optimal value of each subproblem, and the optimum strata boundaries (OSB) are then recovered by tracing the optimal decisions backward.
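The forward-then-backward procedure can be sketched on a discretized grid of stratum widths. This is a minimal illustration, not the authors' implementation: the stage objective `stage_cost(lo, hi)` is a hypothetical user-supplied function (for example, the stratum weight times a within-stratum variance computed from an assumed density), the range is split into `steps` equal grid units, and every stratum is required to have positive width. The forward pass fills `phi[k, j]`, the smallest total objective of k strata covering j grid units, by minimizing over the width of the kth stratum; the backward pass reads the stored optimal widths to recover the stratification points.

```python
import numpy as np


def optimum_strata_dp(stage_cost, a, b, L, steps=200):
    """Forward DP over a grid of stratum widths, then backward recovery of the points.

    Returns (interior stratification points, optimal objective value).
    """
    d = b - a                                         # total deviation (range)
    delta = d / steps                                 # grid resolution for widths
    phi = np.full((L + 1, steps + 1), np.inf)         # phi[k, j]: best value, k strata over j units
    choice = np.zeros((L + 1, steps + 1), dtype=int)  # optimal width (grid units) of stratum k
    phi[0, 0] = 0.0
    for k in range(1, L + 1):                         # forward pass, stage by stage
        for j in range(k, steps + 1):                 # each stratum needs at least one unit
            for w in range(1, j - k + 2):             # width of the kth stratum
                prev = phi[k - 1, j - w]
                if np.isinf(prev):
                    continue
                lo = a + (j - w) * delta              # kth stratum starts where k-1 strata end
                val = prev + stage_cost(lo, lo + w * delta)
                if val < phi[k, j]:
                    phi[k, j] = val
                    choice[k, j] = w
    widths, j = [], steps                             # backward pass: recover optimal widths
    for k in range(L, 0, -1):
        w = choice[k, j]
        widths.append(w * delta)
        j -= w
    widths.reverse()
    points = [a + float(s) for s in np.cumsum(widths)[:-1]]
    return points, float(phi[L, steps])


# Uniform density on [0, 1]: weight * variance of a stratum of width w is w**3 / 12
pts, value = optimum_strata_dp(lambda lo, hi: (hi - lo) ** 3 / 12.0, 0.0, 1.0, 4)
```

For this hypothetical uniform case the sketch returns the equally spaced points 0.25, 0.5, and 0.75, as expected for a uniform density. The grid resolution trades accuracy against the O(L · steps²) running time.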

5. Obtaining the Optimum Sample Size

Once the stratification points have been determined as described in the previous section, the optimum sample size for each stratum can be readily obtained.

Using the functional relationships between the study variables and the auxiliary variable in all strata, defined in equations (16) and (17), we apply equation (23) with a fixed total sample size n.

For the hth stratum, the sample size is obtained from the stratum weight and the stratum variance. The stratum variance involves the variance of the functional form of the auxiliary variable and the variance of the error term, both of which can be derived in terms of the stratification points. Note also that the stratum weight is the ratio of the stratum size to the population size.
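As a hedged sketch of this step, the code below assumes a Neyman-type rule, in which n_h is proportional to the stratum weight times the stratum standard deviation, and takes the stratum variance as β² times the stratum variance of the auxiliary variable plus the mean error variance. The paper's own allocation formula may differ; the largest-remainder rounding and all names here are choices made for illustration.

```python
import math


def stratum_sd(beta, var_x_h, mean_phi_h):
    """sqrt(beta^2 * var_x_h + mean_phi_h): linear-part spread plus mean error variance."""
    return math.sqrt(beta ** 2 * var_x_h + mean_phi_h)


def neyman_allocation(n, W, sd):
    """n_h proportional to W_h * sd_h, rounded by largest remainders so the sizes sum to n."""
    shares = [w * s for w, s in zip(W, sd)]
    total = sum(shares)
    raw = [n * s / total for s in shares]
    alloc = [math.floor(r) for r in raw]
    leftover = n - sum(alloc)                 # units still to be assigned after flooring
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


sizes = neyman_allocation(100, [0.5, 0.5], [1.0, 3.0])
```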

6. Empirical Study

Let the auxiliary variable follow a log-normal distribution with probability density function (pdf)

Using equations (20a)–(20c), we obtain the corresponding stratum expressions for the log-normal distribution.
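For a log-normal stratification variable, the stratum quantities have closed forms in terms of the standard normal distribution function Φ: for X ~ LN(μ, σ) and 0 < lo < hi, the integral of x^k f(x) over [lo, hi] equals exp(kμ + k²σ²/2) [Φ((ln hi − μ − kσ²)/σ) − Φ((ln lo − μ − kσ²)/σ)]. The sketch below uses this standard identity with k = 0, 1, 2; it may not match the exact form of the paper's equations (40)–(42), and the function names are hypothetical.

```python
import math


def std_normal_cdf(t):
    """Standard normal distribution function via math.erf (also handles +/- inf)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def lognormal_partial_moment(k, lo, hi, mu=0.0, sigma=1.0):
    """Integral of x**k f(x) over [lo, hi] for X ~ LogNormal(mu, sigma); requires lo > 0."""
    shift = mu + k * sigma ** 2
    scale = math.exp(k * mu + 0.5 * k ** 2 * sigma ** 2)
    upper = std_normal_cdf((math.log(hi) - shift) / sigma)
    lower = std_normal_cdf((math.log(lo) - shift) / sigma)
    return scale * (upper - lower)


def lognormal_stratum(lo, hi, mu=0.0, sigma=1.0):
    """Weight, mean and variance of the stratum [lo, hi]."""
    W = lognormal_partial_moment(0, lo, hi, mu, sigma)
    mean = lognormal_partial_moment(1, lo, hi, mu, sigma) / W
    second = lognormal_partial_moment(2, lo, hi, mu, sigma) / W
    return W, mean, second - mean ** 2


# Over a very wide interval the standard log-normal moments are recovered
W, mean, var = lognormal_stratum(1e-12, 1e12)
```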

Using equations (40)–(42) in equation (30), we get

Minimize

Subject to the constraints

Let us now assume that the auxiliary variable follows the standard log-normal distribution restricted to a finite interval. Solving the MPP given in equation (47) for a fixed total sample size, we obtain the stratification points, together with the variances and stratum sample sizes, presented in Table 1.

Now let us assume that the auxiliary variable follows a gamma distribution with probability density function as follows, where the two parameters are the shape and scale parameters, respectively, and Γ(·) is the gamma function, defined as

The gamma function can also be written as the sum of the upper and lower incomplete gamma functions, which are defined, respectively, as

The upper and lower regularized incomplete gamma functions, whose values lie between 0 and 1, are obtained by dividing the upper and lower incomplete gamma functions, respectively, by the gamma function.
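For reference, the standard definitions are given below in generic notation (shape s, argument x, and for the density shape α and scale θ), which may differ from the paper's symbols. The last line shows why the stratum quantities of a gamma stratification variable can be expressed through the regularized function P.

```latex
\begin{aligned}
\Gamma(s) &= \int_0^\infty t^{s-1}e^{-t}\,dt,\\
\gamma(s,x) &= \int_0^x t^{s-1}e^{-t}\,dt, \qquad \Gamma(s,x) = \int_x^\infty t^{s-1}e^{-t}\,dt, \qquad \gamma(s,x)+\Gamma(s,x)=\Gamma(s),\\
P(s,x) &= \frac{\gamma(s,x)}{\Gamma(s)}, \qquad Q(s,x) = \frac{\Gamma(s,x)}{\Gamma(s)} = 1-P(s,x),\\
\int_a^b x^{k}\,\frac{x^{\alpha-1}e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}\,dx &= \theta^{k}\,\frac{\Gamma(\alpha+k)}{\Gamma(\alpha)}\left[P\!\left(\alpha+k,\tfrac{b}{\theta}\right)-P\!\left(\alpha+k,\tfrac{a}{\theta}\right)\right].
\end{aligned}
```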

Using these values in equations (20a)–(20c), we get

Using equations (52) and (53) in equation (30), we have

Minimize

Subject to the constraints
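Because the integral of x^k times the gamma(α, θ) density over [lo, hi] equals θ^k Γ(α+k)/Γ(α) [P(α+k, hi/θ) − P(α+k, lo/θ)], the stratum weight, mean, and variance for a gamma stratification variable can be computed with `scipy.special.gammainc`, SciPy's regularized lower incomplete gamma function P(s, x). A minimal sketch, assuming the shape/scale parameterization; the function names are hypothetical.

```python
import math

from scipy.special import gammainc   # regularized lower incomplete gamma function P(s, x)


def gamma_partial_moment(k, lo, hi, shape, scale):
    """Integral of x**k f(x) over [lo, hi] for a gamma(shape, scale) density, k in {0, 1, 2}."""
    # theta**k * Gamma(shape + k) / Gamma(shape), written out for k = 0, 1, 2
    coef = {0: 1.0, 1: shape * scale, 2: shape * (shape + 1.0) * scale ** 2}[k]
    return coef * (gammainc(shape + k, hi / scale) - gammainc(shape + k, lo / scale))


def gamma_stratum(lo, hi, shape, scale):
    """Weight, mean and variance of the stratum [lo, hi]."""
    W = float(gamma_partial_moment(0, lo, hi, shape, scale))
    mean = float(gamma_partial_moment(1, lo, hi, shape, scale)) / W
    second = float(gamma_partial_moment(2, lo, hi, shape, scale)) / W
    return W, mean, second - mean ** 2


# Shape 1 is the exponential distribution, whose median is ln 2
half_weight = gamma_stratum(0.0, math.log(2.0), shape=1.0, scale=1.0)[0]
```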

The parameters of the gamma distribution were estimated by maximum likelihood.
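In practice, the two gamma parameters can be estimated by maximum likelihood with `scipy.stats.gamma.fit`, fixing the location at zero so that only the shape and scale are estimated. The sketch below fits simulated data; the seed, sample size, and true parameter values are hypothetical and are not the estimates obtained in this study.

```python
import numpy as np
from scipy import stats


def fit_gamma(data):
    """Maximum likelihood shape and scale of a gamma distribution with location fixed at 0."""
    shape, _loc, scale = stats.gamma.fit(data, floc=0)
    return shape, scale


rng = np.random.default_rng(12345)                      # hypothetical seed
sample = rng.gamma(shape=2.0, scale=1.5, size=20_000)   # hypothetical true values
shape_hat, scale_hat = fit_gamma(sample)
```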

Using the estimated parameters for the auxiliary variable and a fixed total sample size, and solving the corresponding MPP, we obtain the stratification points presented in Table 2.

7. A Simulation Study

We performed a simulation study in R to assess the validity of the proposed DP-based method by comparing the relative precision of the following methods:
(i) the cum √f method of Dalenius et al. [29];
(ii) the geometric method of Gunning and Horgan [30];
(iii) the approach of Lavallée and Hidiroglou [31] using Kozak's [32] method;
(iv) the mathematical programming approach of Khan et al. [33];
(v) the proposed method.
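Two of the comparison rules are simple enough to sketch. The geometric method of Gunning and Horgan places the points at a·r^h with r = (b/a)^(1/L), which requires a > 0. The cum √f rule builds a histogram, accumulates the square roots of the bin frequencies, and cuts at equal fractions of the total. The bin count and the rule for choosing the bin edge (the first bin whose cumulative value reaches each target) are implementation choices made here, and published versions of the cum √f rule differ in such details.

```python
import numpy as np


def geometric_boundaries(a, b, L):
    """Geometric rule: points a * r**h for h = 1..L-1, with r = (b/a)**(1/L); needs a > 0."""
    r = (b / a) ** (1.0 / L)
    return [a * r ** h for h in range(1, L)]


def cum_sqrt_f_boundaries(x, L, bins=50):
    """cum sqrt(f) rule on a histogram of x; returns L-1 interior points (bin upper edges)."""
    freq, edges = np.histogram(x, bins=bins)
    cum = np.cumsum(np.sqrt(freq))                # cumulative square-root frequencies
    targets = cum[-1] * np.arange(1, L) / L       # equal shares of the total
    idx = np.searchsorted(cum, targets)           # first bin whose cumulative value reaches each target
    return edges[idx + 1].tolist()


geo = geometric_boundaries(1.0, 16.0, 4)
cum = cum_sqrt_f_boundaries(np.arange(1000) / 1000.0, 2, bins=10)
```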

We generated a data set of 8000 observations of a uniformly distributed auxiliary variable in R. The minimum and maximum values were 0.0046 and 1.8842, respectively, giving a total deviation of 1.8842 − 0.0046 = 1.8796.

We then obtained the stratification points with the proposed method and with each of the comparison methods. The variances obtained by all methods are presented in Table 3; the proposed method gives a smaller variance, and hence a more precise estimate, than the existing methods.
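A minimal way to reproduce the kind of comparison summarized in Table 3 is to cut the simulated data at each method's points and compute the proportional-allocation variance (1/n) Σ W_h s_h² of the stratification variable. The sketch below does this for hypothetical uniform data; it uses only the stratification variable as a proxy for the study variables, and the seed, range, sample size n, and candidate boundaries are illustrative assumptions, not the study's settings. The relative precision of a method can then be taken as the ratio of its variance to that of a reference method.

```python
import numpy as np


def stratified_variance(x, points, n):
    """(1/n) * sum_h W_h * s_h^2 for data x cut at sorted interior points.

    Proportional allocation with the FPC ignored; x stands in for the study variables.
    """
    x = np.asarray(x, dtype=float)
    labels = np.searchsorted(np.asarray(points, dtype=float), x, side="right")  # stratum index
    total = 0.0
    for h in np.unique(labels):
        xh = x[labels == h]
        if xh.size > 1:                                   # a variance needs at least two units
            total += (xh.size / x.size) * xh.var(ddof=1)
    return total / n


rng = np.random.default_rng(2024)                         # hypothetical seed
x = rng.uniform(0.0, 2.0, size=8000)                      # hypothetical uniform data
v_equal = stratified_variance(x, [0.5, 1.0, 1.5], n=100)   # equal-width cut points
v_skewed = stratified_variance(x, [0.1, 0.2, 0.3], n=100)  # deliberately poor cut points
relative_precision = v_skewed / v_equal
```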

8. Conclusion

In the current investigation, the case of a mixture of ratio and product estimators has been treated using minimal equations obtained by minimizing the variance of the estimators. We proposed a method for determining the strata boundaries using dynamic programming, together with the sample size for each stratum. The empirical study shows that the gain in efficiency is remarkably high for the different distribution functions of the auxiliary variable considered, and Tables 1 to 3 suggest the superiority of the developed method over the existing methods. The proposed methodology should therefore be useful for obtaining the OSB for the characteristics under consideration when the frequency distribution of the auxiliary variable is available. When data come from a complex process, neutrosophic statistics may be preferred over classical statistics; studies in this direction include Reddy et al. [34], Martínez et al. [35], Cruzaty et al. [36], and Danish [37]. The use of neutrosophic statistics can therefore be considered in future studies.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Faizan Danish, Rafia Jan, Muhammad Daniyal, and Kassim Tawiah conceptualized the study. Faizan Danish, Rafia Jan, and Muhammad Daniyal performed the formal analysis. Faizan Danish, Rafia Jan, Muhammad Daniyal, and Kassim Tawiah developed the methodology, performed the validation, and prepared the visualization. Faizan Danish, Rafia Jan, and Muhammad Daniyal wrote the original draft. Muhammad Daniyal and Kassim Tawiah reviewed and edited the manuscript.