Special Issue: Analytical Methods to Model Nature
Coverage Properties of a Neural Network Estimator of Finite Population Total in High-Dimensional Space
The problem in nonparametric estimation of finite population total particularly when dealing with high-dimensional datasets is addressed in this paper. The coverage properties of a robust finite population total estimator based on a feedforward backpropagation neural network developed with the help of a superpopulation model are computed, and a comparison with existing model-based estimators that can handle high-dimensional datasets is conducted to evaluate the estimator’s performance using simulated datasets. The results presented in this paper show good performance in terms of bias, MSE, and mean absolute error for the feedforward backpropagation neural network estimator as compared to other identified existing estimators of finite population total in high-dimensional datasets. In this regard, the paper recommends the use of the proposed estimator in estimating population parameters such as population total in the presence of high-dimensional datasets.
1. Introduction
Assume a finite population of $N$ unique and identifiable units, $U = \{1, 2, \ldots, N\}$. Let each population unit $i$ have a value $Y_i$ of the variable of interest $Y$. It is assumed that an auxiliary variable $X$ exists which is closely related to $Y$ and is known for the entire population (i.e., $X_1, X_2, \ldots, X_N$ are known). Researchers encounter the problem of estimating a population function (i.e., a function of $Y_1, Y_2, \ldots, Y_N$), for instance, the population total $T = \sum_{i=1}^{N} Y_i$.
To estimate the population total $T$, a sample $s$ is drawn so that the pairs $(x_i, y_i)$, $i \in s$, are obtained from the variables $X$ and $Y$. These are then used at the design stage, the estimation stage, or both. For these auxiliary variables, a superpopulation model [1, 2] can be used at the estimation stage of inference. It should be noted that all these methods are based on simple statistical models (linear regression models) that describe the underlying relationships between the survey and auxiliary variables. Hansen showed that under a parametric superpopulation model, misspecification can lead to substantial errors in inference. To solve this problem, nonparametric regression involving robust estimators in finite population sampling has been proposed [4–6].
When nonparametric kernel-based regression estimators are applied over a finite range to estimate finite population parameters, one of the most common problems encountered is bias at the edges. It is also known that kernel and polynomial regression estimators provide good estimates of population totals when the auxiliary information is low-dimensional [6, 8].
Although high-dimensional auxiliary information can in principle be accommodated by the aforesaid estimators, the sparsity of regressors in the design space makes kernel methods and local polynomials infeasible, as performance deteriorates significantly as the dimension increases [8–10]. This poor performance is due to the curse of dimensionality: a phenomenon induced by the sparsity of data in high-dimensional spaces, which lowers the fastest attainable rate of convergence of regression function estimators towards their target curve as the dimension of the regressor vector grows. Friedman provided an overview of the concept of the curse of dimensionality.
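The sparsity argument can be made concrete with a small calculation. The sketch below is an illustration only and is not taken from the paper: it assumes auxiliary values uniformly distributed on the unit cube and asks how long the edge of a sub-cube must be to contain, on average, a fixed fraction of the data. The function name `edge_length_for_fraction` is hypothetical.

```python
def edge_length_for_fraction(fraction, dim):
    """Edge of a sub-cube of [0, 1]^dim that is expected to contain
    `fraction` of points drawn uniformly from the unit cube.

    A sub-cube with edge r has volume r**dim, so r = fraction**(1/dim).
    """
    return fraction ** (1.0 / dim)


# A "local" neighbourhood holding 10% of the data stops being local
# as the dimension grows (illustrative dimensions only).
for dim in (1, 2, 3, 10):
    print(dim, round(edge_length_for_fraction(0.10, dim), 3))
```

Under this assumption, a neighbourhood holding 10% of the data spans about 10% of each axis in one dimension but close to 80% of each axis in ten dimensions, which is why local smoothers need much larger samples as the dimension increases.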
Given the challenge of the curse of dimensionality, one has to use different nonparametric estimators to retain a large degree of flexibility. Recursive covering in a model-based approach and generalized additive modelling in a model-assisted framework are two ways of getting around the curse of dimensionality when dealing with multivariate auxiliary information. These estimation methods come at the cost of reduced flexibility, with the associated risk of increased bias [9–11, 14].
In this regard, a robust nonparametric estimator of the finite population total based on a feedforward backpropagation neural network is proposed in this paper to address the shortcomings of the estimation approaches identified above. Although kernel and local approximators have the same property as artificial neural networks (ANNs), they usually require a large number of components to achieve similar approximation accuracy. As a consequence, ANNs are regarded as an efficient method of performing parametric and nonparametric functional analysis.
2. Neural Network Estimator of Finite Population Total
In describing this estimator, the procedure provided in  is followed. Let $Y_i$ be the survey variable associated with an auxiliary variable $X_i$, assumed to follow a superpopulation model under a model-based approach. A commonly used working model for the finite population is
$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, 2, \ldots, N, \tag{2}$$
such that the $\varepsilon_i$ are i.i.d. with mean zero and the $X_i$ are considered as the auxiliary information.
Also let
$$T = \sum_{i \in s} Y_i + \sum_{i \notin s} Y_i \tag{3}$$
be the finite population total, where $i \in s$ are the sample units and $i \notin s$ are the nonsampled units. Assume that $Y_i$ is given according to equation (2) with $\varepsilon_i$ i.i.d. Consider estimating the regression function $m$ based on a feedforward backpropagation neural network. The neurons, which act as the basic building blocks, can be considered as a nonlinear transformation of the input variables $X_i$.
A feedforward neural network with at least one layer of hidden units is considered; more complex networks that allow for information feedback can also be specified. Without loss of generality, the paper concentrates only on the structure presented in equation (4), which is commonly used for a wide range of applications and has the appealing feature of being implemented in standard statistical software.
In the simplest case of one hidden layer with $H$ neurons, the network function can be written as follows:
$$f_H(x; \theta) = v_0 + \sum_{h=1}^{H} v_h \, \psi\!\left(w_{0h} + \sum_{j=1}^{d} w_{jh} x_j\right) \tag{4}$$
with $x = (x_1, \ldots, x_d)^{\top}$ and $\theta = (v_0, \ldots, v_H, w_{01}, \ldots, w_{dH})^{\top}$, where $\theta$ represents the vector of all weights of the network and $\psi$ is a given activation function. For regression problems, sigmoid functions, which resemble the distribution function of a real-valued random variable, typically produce good results. The logistic sigmoid and the bipolar sigmoid are two extensively used sigmoid functions that can be employed depending on the needed output. Whenever the goal is to approximate functions that map into probability space, the logistic function is preferred. Because the input signals are "squashed" between zero and one, the activation function can be viewed as a smooth equivalent of the indicator function. As an illustration, consider the logistic function
$$\psi(u) = \frac{1}{1 + e^{-u}},$$
which tends to one (zero) as its argument approaches infinity (negative infinity). As a result, based on the received input signals, the logistic activation function creates partially on/off signals.
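The one-hidden-layer network with a logistic activation described above can be sketched in a few lines. This is a minimal illustration under assumed notation, not the authors' implementation: the names `v0`, `v`, `w0`, and `w` (output bias, output weights, hidden biases, and hidden weights) are hypothetical labels for the network weights.

```python
import math


def logistic(u):
    """Logistic sigmoid, evaluated in a numerically stable way."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def network_output(x, v0, v, w0, w):
    """Output of a one-hidden-layer feedforward network.

    Computes v0 + sum_h v[h] * logistic(w0[h] + sum_j w[h][j] * x[j]),
    where len(v) is the number of hidden neurons (assumed notation).
    """
    total = v0
    for h in range(len(v)):
        pre_activation = w0[h] + sum(w[h][j] * x[j] for j in range(len(x)))
        total += v[h] * logistic(pre_activation)
    return total
```

For example, `logistic(0.0)` returns 0.5, the midpoint between the "off" and "on" limits, and large positive or negative arguments push the output towards one or zero.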
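For this work, $f_H(\cdot\,; \theta)$ specifies a mapping from the $d$-dimensional input space to the one-dimensional output space, $f_H : \mathbb{R}^d \to \mathbb{R}$; for each continuous function $m$, any $\epsilon > 0$, and any compact set $C \subset \mathbb{R}^d$, there exists a function $f_H(x; \theta)$ with uniform approximation qualities [17–19], for example,
$$\sup_{x \in C} \left| m(x) - f_H(x; \theta) \right| < \epsilon.$$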
This suggests that any regression function $m$ can be approximated well with a sufficiently large number of neurons $H$ and suitable parameters $\theta$.
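Therefore, a nonparametric estimate for $m$ is obtained by first choosing $H$, which serves as a tuning parameter and determines the smoothness of the estimate. The parameter $\theta$ is estimated from the data by nonlinear least squares:
$$\hat{\theta}_n = \arg\min_{\theta} D_n(\theta) \tag{8}$$
with
$$D_n(\theta) = \frac{1}{n} \sum_{i \in s} \left( Y_i - f_H(X_i; \theta) \right)^2.$$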
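The least-squares fit described above can be sketched with plain full-batch gradient descent, where the gradients are obtained by backpropagation. This is an illustrative sketch under stated assumptions rather than the authors' procedure: the learning rate, number of steps, initialization, and the names `fit`, `loss`, and `predict` are all hypothetical choices, and a production fit would normally use an established optimizer.

```python
import math
import random


def logistic(u):
    """Logistic sigmoid with the argument clipped to avoid overflow."""
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, u))))


def hidden_outputs(x, w0, w):
    """Activations of the hidden neurons for one input vector x."""
    return [logistic(w0[h] + sum(wj * xj for wj, xj in zip(w[h], x)))
            for h in range(len(w0))]


def predict(x, params):
    """Network output v0 + sum_h v[h] * a_h(x); params = (v0, v, w0, w)."""
    v0, v, w0, w = params
    return v0 + sum(vh * ah for vh, ah in zip(v, hidden_outputs(x, w0, w)))


def loss(xs, ys, params):
    """Empirical criterion: mean squared residual over the sample."""
    return sum((y - predict(x, params)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, hidden=3, steps=300, lr=0.05, seed=0):
    """Minimize the mean squared residual by full-batch gradient descent."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    v0 = sum(ys) / n  # start the output bias at the sample mean
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w0 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    for _ in range(steps):
        g_v0 = 0.0
        g_v = [0.0] * hidden
        g_w0 = [0.0] * hidden
        g_w = [[0.0] * d for _ in range(hidden)]
        for x, y in zip(xs, ys):
            a = hidden_outputs(x, w0, w)
            resid = v0 + sum(vh * ah for vh, ah in zip(v, a)) - y
            g_v0 += 2.0 * resid / n
            for h in range(hidden):
                g_v[h] += 2.0 * resid * a[h] / n
                # Chain rule through the logistic: psi'(u) = a * (1 - a).
                back = 2.0 * resid * v[h] * a[h] * (1.0 - a[h]) / n
                g_w0[h] += back
                for j in range(d):
                    g_w[h][j] += back * x[j]
        # Update all weights only after the full gradient is accumulated.
        v0 -= lr * g_v0
        for h in range(hidden):
            v[h] -= lr * g_v[h]
            w0[h] -= lr * g_w0[h]
            for j in range(d):
                w[h][j] -= lr * g_w[h][j]
    return v0, v, w0, w
```

Here the number of hidden neurons plays the role of the tuning parameter: a larger value gives a more flexible, less smooth fit.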
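Under the right circumstances, $\hat{\theta}_n$ converges in probability, as $n \to \infty$ with $H$ constant, to the parameter vector $\theta_0$ which corresponds to the best approximation of $m(x)$ by a function of type $f_H(x; \theta)$:
$$\theta_0 = \arg\min_{\theta} D_0(\theta) \quad \text{with} \quad D_0(\theta) = E\left( m(X) - f_H(X; \theta) \right)^2.$$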
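Also, under some stronger assumptions, the asymptotic normality of $\hat{\theta}_n$, and thus of the estimator $f_H(x; \hat{\theta}_n)$ of the regression function $m(x)$, follows. Therefore, the immediate consequence of these results is that $f_H(x; \hat{\theta}_n) \to f_H(x; \theta_0)$ in probability for $n \to \infty$.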
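The estimation error can be broken down into two asymptotically independent parts: $\hat{\theta}_n - \theta_0 = (\hat{\theta}_n - \theta_n) + (\theta_n - \theta_0)$, where the value
$$\theta_n = \arg\min_{\theta} \frac{1}{n} \sum_{i \in s} \left( m(X_i) - f_H(X_i; \theta) \right)^2$$
minimizes the sample version of $D_0(\theta)$. $f_H(x; \theta_0)$ converges to the regression function $m(x)$ for $H \to \infty$, due to the universal approximation property of neural networks. As $H$ grows with $n$ at an adequate rate, $f_H(x; \hat{\theta}_n)$ becomes a consistent nonparametric estimator of $m(x)$. As a result of these findings, Were and Orwa showed that the corresponding estimate of the finite population total is as follows:
$$\hat{T}_{NN} = \sum_{i \in s} Y_i + \sum_{i \notin s} \hat{m}(X_i), \tag{12}$$
which is the proposed estimator for the finite population total, where $\hat{m}(X_i) = f_H(X_i; \hat{\theta}_n)$.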
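The prediction form of a model-based total estimator, observed values for sampled units plus model predictions for nonsampled units, can be sketched as follows. This is a minimal illustration: `predicted_total` and the `predictor` argument are hypothetical names, and any fitted regression function (a neural network in this paper) could be passed in.

```python
def predicted_total(sample_y, nonsample_x, predictor):
    """Model-based estimate of a finite population total.

    sample_y    -- observed survey values for the sampled units
    nonsample_x -- auxiliary vectors for the nonsampled units
    predictor   -- fitted regression function mapping x to a predicted y
    """
    observed_part = sum(sample_y)
    predicted_part = sum(predictor(x) for x in nonsample_x)
    return observed_part + predicted_part


# Illustrative use with a made-up predictor y = 2 * x1.
estimate = predicted_total([1.0, 2.0], [[1.0], [3.0]], lambda x: 2.0 * x[0])
```

Only the nonsampled part depends on the model, so the quality of the estimate rests on how well the fitted function predicts the units that were not observed.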
As noted in , $\hat{T}_{NN}$ is a model-based estimator, so that all inference is with respect to the model for the $Y_i$, not the survey design. This estimator is identical to that proposed in , except that a neural network is used in place of the kernel-based regression. Lastly, this estimator can be used to estimate the total of a finite population as long as each of the unsampled elements has the same distribution as the sampled elements.
It should be noted that:
(1) Where certain conditions are satisfied and if the activation function is Lipschitz continuous and strictly increasing, then it can be shown that the neural network estimate of the population total given by (12) with and given by (8) is consistent in the following sense. where with , provided that the number and the bound of the network weights satisfy such that where determines how fast the tail probability of the and decreases. White showed that the appropriate choice for is such that as and , i.e., as .
(2) Where certain conditions are met, it can be shown that the mean squared error defined by where denotes the true population total of the proposed estimators reduces to where estimate of is given as
For the details and complete proof of these properties, see .
3. Coverage Properties
To compute and understand the coverage properties of the proposed estimator, its performance is compared, through a simulation study, with that of existing nonparametric regression estimators that can handle high-dimensional data: multivariate adaptive regression splines (MARS), the generalized additive model (GAM), and local polynomial (LP) regression.
Scenarios where the true function is the sum of two-dimensional linear function, two-dimensional quadratic function, and three-dimensional mixed function given below are considered: 2-dim linear: . 2-dim quadratic: . 3-dim mixed model: .
For all of the simulation performed, data are generated according to model 2 where . The auxiliary variable vector was generated from uniform (0,1) random vector. The errors were generated from i.i.d with noise level . is used as the activation function in this neural network.
A total of 1000 samples of sizes and were generated using simple random sampling from a population of size . Because the hypothesized relationship between the study variable and the auxiliary variable must be preserved in the simulation, the sampling is done on the unit indices, so that each sampled unit keeps its own pair of study and auxiliary values.
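Sampling on indices can be sketched as below. All numerical settings here are hypothetical and chosen only for illustration: the population size, sample size, seed, noise level, and the linear mean function `1 + 2*x1 + 3*x2` are not the paper's values, and `draw_srswor` is an assumed helper name.

```python
import random


def draw_srswor(population_size, sample_size, rng):
    """Simple random sample without replacement, returned as unit indices.

    Returns (sampled, nonsampled) index lists that partition the population.
    """
    sampled = sorted(rng.sample(range(population_size), sample_size))
    sampled_set = set(sampled)
    nonsampled = [i for i in range(population_size) if i not in sampled_set]
    return sampled, nonsampled


rng = random.Random(2024)
N, n = 20, 5  # illustrative sizes only

# Hypothetical population: uniform(0, 1) auxiliary pairs and a noisy mean function.
x = [(rng.random(), rng.random()) for _ in range(N)]
y = [1.0 + 2.0 * x1 + 3.0 * x2 + rng.gauss(0.0, 0.1) for x1, x2 in x]

sampled, nonsampled = draw_srswor(N, n, rng)

# Indexing keeps each unit's (x, y) pair together, preserving their relationship.
sample_pairs = [(x[i], y[i]) for i in sampled]
```

Drawing indices rather than values also yields the nonsampled set directly, which is what a prediction-type estimator needs.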
Tables 1–3 summarize the findings of this simulation investigation. They show the unconditional bias (UB), unconditional mean square error (UMSE), unconditional relative mean square error (URMSE), and unconditional mean absolute error (UMAE) of the estimators at different sample sizes. The MAE reveals how near the estimate being examined is to the true value, while the MSE and RMSE represent the estimator's precision. For example, if the UMSE and URMSE of TNN are comparatively small, it can reasonably be considered "better" or "more desirable" than the other estimators.
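The unconditional measures can be computed from the replicate estimates as in the sketch below. The exact form of the relative measure is not spelled out here, so the code assumes one common choice, the root mean square error divided by the true total; that definition and the name `performance_summary` are assumptions.

```python
import math


def performance_summary(estimates, true_total):
    """Bias, MSE, relative root MSE, and MAE over replicate estimates."""
    errors = [t_hat - true_total for t_hat in estimates]
    replicates = len(errors)
    bias = sum(errors) / replicates
    mse = sum(e * e for e in errors) / replicates
    mae = sum(abs(e) for e in errors) / replicates
    # Assumed definition: root MSE expressed relative to the true total.
    relative_rmse = math.sqrt(mse) / abs(true_total)
    return {"bias": bias, "mse": mse, "relative_rmse": relative_rmse, "mae": mae}


# Illustrative replicate estimates around a true total of 100.
summary = performance_summary([98.0, 102.0, 101.0, 103.0], 100.0)
```

In this toy example the errors are -2, 2, 1, and 3, giving a bias of 1.0, an MSE of 4.5, and an MAE of 2.0.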
The bias of a population total estimator is the deviation of the estimator's expected value from the true total. All of the estimators of the finite population total discussed here are biased, but the neural network estimator $\hat{T}_{NN}$ is the least biased. $\hat{T}_{NN}$ can be seen as the most efficient estimator of the finite population total in all models and sample sizes, closely followed by the MARS estimator $\hat{T}_{MARS}$. Because of their relatively large bias values, the generalized additive estimator and the local polynomial regression estimator overestimate the finite population total under all models.
In addition, the neural network estimator $\hat{T}_{NN}$ has the lowest mean square error, relative mean square error, and mean absolute error, followed closely by the MARS estimator $\hat{T}_{MARS}$. It is also observed that as the sample size increases, all the estimators record a significant improvement in their performance in estimating the finite population totals. The local polynomial regression estimator is noteworthy for its significant reduction in bias and mean square error. This follows the argument by Stone: to improve the efficiency of a local smoother in high-dimensional spaces, one has to use a large sample size. The neural network estimator still outperforms the other estimators, with significant reductions in biases, mean square errors, relative root mean square errors, mean absolute errors, and mean absolute percentage errors as the sample size increases.
The results in Table 3, which reports the performance of the estimators for the three-dimensional mixed model, are noteworthy. Compared to the two-dimensional case, the performance of all the estimators has marginally decreased, as indicated by a marginal increase in biases, mean square errors, relative mean square errors, and mean absolute errors across all the estimators of the finite population total. It is also observed that the generalized additive estimator and the local polynomial regression estimator still perform poorly in terms of biases, mean square errors, relative mean square errors, and mean absolute errors. By contrast, the neural network estimator $\hat{T}_{NN}$ has lower biases, mean square errors, relative root mean square errors, mean absolute errors, and mean absolute percentage errors, followed closely by the estimator TMARS.
Here too, as the sample size increases, all the estimators record a significant improvement in their performance in estimating the finite population totals. For instance, the local polynomial regression estimator shows a significant reduction in bias and mean square error as the sample size increases. The neural network estimator remains the estimator of choice compared to the other estimators as the sample size increases.
The conditional performance of the estimator was also assessed and compared with that of the other finite population total estimators identified for high-dimensional settings. To do this, the 1000 simple random samples were sorted by their sample means of the $X$ values. The samples were then grouped into sets of twenty, so that the first set consists of the samples with the lowest sample means of $X$, the second set consists of the samples with the next lowest sample means, and so on until the last set, which consists of the samples with the largest sample means of $X$. In each group, the bias, mean square error, relative mean square error, and mean absolute error were computed.
The group conditional bias (CB), conditional mean square error (CMSE), conditional relative mean square error (CRMSE), and conditional mean absolute error (CMAE) for the finite population total estimators $\hat{T}_{NN}$, $\hat{T}_{MARS}$, $\hat{T}_{GAM}$, and $\hat{T}_{LP}$ are plotted against the group average values, denoted as Xbar, for the fifty groups of sample means of $X$.
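The grouping step can be sketched as follows: samples are ordered by their sample mean of the auxiliary variable, split into consecutive groups of twenty, and the bias is computed within each group. The sketch treats each sample's mean as a single number, which is an assumption when the auxiliary variable is a vector, and `conditional_bias_by_group` is a hypothetical name. The other conditional measures can be computed within each group in the same way.

```python
def conditional_bias_by_group(sample_xbars, estimates, true_total, group_size=20):
    """Group replicate samples by sample mean of x and compute bias per group.

    sample_xbars -- one sample mean of the auxiliary variable per replicate
    estimates    -- the matching estimate of the total for each replicate
    Returns a list of (group mean of xbar, conditional bias), ordered by xbar.
    """
    order = sorted(range(len(sample_xbars)), key=lambda k: sample_xbars[k])
    groups = []
    for start in range(0, len(order), group_size):
        members = order[start:start + group_size]
        mean_xbar = sum(sample_xbars[k] for k in members) / len(members)
        bias = sum(estimates[k] - true_total for k in members) / len(members)
        groups.append((mean_xbar, bias))
    return groups
```

With 1000 replicate samples and groups of twenty, this produces fifty points per estimator, which can be plotted against the group mean to give conditional bias curves.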
The conditional findings for the estimators under the two-dimensional linear model, two-dimensional quadratic model, and three-dimensional mixed model are shown in Figures 1–3. The bias characteristics of the estimators differ significantly in the majority of circumstances. A closer look at the plots reveals that the neural network estimator $\hat{T}_{NN}$ and the MARS estimator $\hat{T}_{MARS}$ have lower levels of bias overall, as seen by the proximity of their curves to the horizontal (no bias) line at 0.0 on the vertical axis. Consequently, despite the complex structure of some of the plots, the estimator $\hat{T}_{NN}$ emerges as the least biased for practically every set of auxiliary variable means and every model.
Similarly, plotting the conditional MSE against the group means of the auxiliary variable reveals that the estimators behave in a similar way. The lowest MSE values are produced by the neural network estimator $\hat{T}_{NN}$ and the MARS estimator $\hat{T}_{MARS}$. $\hat{T}_{NN}$, for instance, has the lowest MSE of all the estimators in the majority of circumstances. For bias, MSE, and MAE, $\hat{T}_{NN}$ consistently outperforms all other estimators.
4. Conclusion
In this paper, the coverage properties of an estimator for finite population total based on a feedforward backpropagation neural network technique in nonparametric regression have been studied. The properties such as the bias, mean squared error, and mean absolute error have been computed for the case of high-dimensional datasets through a simulation, and the findings were compared with those of existing estimators such as multivariate adaptive regression splines (MARS), generalized additive model (GAM), and local polynomial (LP) which can handle high-dimensional data.
From the results, the following observations and conclusions have been made:
(i) The neural network estimator estimates the finite population total better than all the other robust estimators in the high-dimensional case.
(ii) The performance of the local polynomial estimator in the estimation of the finite population total becomes poor as the dimension of the data increases.
(iii) For all the estimators, as the sample size increases, biases, mean square errors, relative root mean square errors, mean absolute errors, and mean absolute percentage errors decrease for all the models considered.
(iv) For all the estimators, as the dimension increases, biases, mean square errors, relative root mean square errors, mean absolute errors, and mean absolute percentage errors increase for all the models considered.
To this end, the main conclusion is that the estimator of finite population total based on the feedforward backpropagation neural network has proved to yield results with great precision, and therefore it is recommended for estimating finite population total. It should be noted that the proposed estimator has been considered in case of simple random sampling without replacement (SRSWoR). An extension to other sampling techniques such as stratification may be done since they rely on SRSWoR, and it is hypothesized that efficiency will be improved compared to other existing estimators in the literature.
Data Availability
The data used are artificial data generated by simulation from a specified model.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
F. J. Breidt and J. D. Opsomer, “Local polynomial regression estimators in survey sampling,” Annals of Statistics, vol. 28, no. 4, pp. 1026–1053, 2000.
A. H. Dorfman, “Nonparametric regression for estimating totals in finite populations,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, USA, pp. 622–625, 1992.
R. Odhiambo Otieno and T. Mbithi Mwalili, “Nonparametric regression method for estimating the error variance in unistage sampling,” East African Journal of Science, vol. 2, no. 2, pp. 107–112, 2002.
G. E. Montanari and M. G. Ranalli, “On calibration methods for design based finite population inferences,” Bulletin of the International Statistical Institute, vol. 54, p. 60, 2003.
P. J. Bickel and B. Li, “Local polynomial regression on unknown manifolds,” Lecture Notes-Monograph Series, vol. 54, pp. 177–186, 2007.
A. Di Ciaccio and M. Ge, “A nonparametric regression estimator of a finite population mean,” Book of Short Papers, CLADAG, pp. 173–176, 2001.
A. R. Barron, “Universal approximation bounds for superpositions of a sigmoidal function,” IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 930–945, 1993.
F. Were, O. Orwa, and R. Odhiambo, “Estimation of finite population totals in high dimensional datasets,” in Proceedings of the Analytical Methods to Model Nature, Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya, 2022.