Computing the Average Body Mass Index: A Study with Systematic Sampling Using Auxiliary Information
Background. Body mass index (BMI) is widely used as a measure of body fat. In a clinical survey, measurements of different body parts may be available even when the actual weight and height are not. In this article, we present a method to estimate BMI from the measurements of different body parts. Systematic sampling should be applied only if the given population is logically homogeneous, because the units of a systematic sample are uniformly distributed over the population. Methods. The method of estimating the mean of the study variable under systematic sampling using auxiliary information is used to estimate BMI. We also show the effect of observational error on the estimation. The measurements of different body parts are taken as auxiliary variables. The correlation coefficient between BMI and the circumference of each body part is obtained, and the efficacy of the methods is assessed in terms of mean square error. The observations available on different body parts are also assumed to be recorded with observational error, so we propose a method of estimating BMI in the presence of observational error. A simulation study is conducted to demonstrate the effect of the observational error on the estimation of BMI. Results. The properties of the proposed estimation method are derived under a large-sample approximation, and the conditions under which the proposed method is more efficient are found. We assume the presence of observational error in the study of 252 men. The efficiency of the difference estimators is better in the presence of observational error. Also, the presence of observational error does not change the properties of the estimators. Conclusions. The study provides a simple approach to estimating BMI with and without observational error. Thus, the suggested method may be used by statisticians for this problem and for many other similar problems in the estimation of the mean.
In a survey, the data are often observed with some error, which is termed measurement error or observational error. It is defined as the discrepancy between the observed value and the true value. There are several examples of real-life situations in which data are obtained with errors [1, 2]. Observational error in the context of linear and nonlinear regression models has also been thoroughly discussed in the literature [3–6]. Several studies of observational error in the ratio, product, and regression methods of estimation are available in [7–11]. If the data are systematically arranged, systematic sampling has the convenient feature of selecting every kth element after choosing the first element at random. Many authors have done pioneering work using systematic sampling at the estimation stage (see [12–16]). The estimation of parameters for certain natural populations is convenient using systematic sampling [17, 18]. Auxiliary variables are commonly used through ratio, product, and regression estimators. For estimating the volume of timber, the ratio estimator proposed under systematic sampling suggested that the leaf area or the girth of the tree may be taken as the auxiliary variable . The product estimators in the context of systematic sampling have been discussed in . Some pioneering works in systematic sampling have been introduced in [21, 22].
A study was conducted to derive a prediction equation for body fat percentage in men (n = 252, age 22–81 years) from simple body measurements . Body density was determined by underwater weighing, and body fat percentage was determined from it. The dataset includes the following variables, given in , pp. 45–48, for observational techniques: density determined from underwater weighing, percent body fat from Siri’s equation , age in years, weight in lbs, height in inches, and circumference of the neck, chest, abdomen, hip, thigh, knee, ankle, bicep, arm, and wrist in centimetres. In this article, we propose a different method to estimate the body mass index rather than the already established multiple regression method in [23, 26]. BMI is highly correlated with the measurements of body parts, so if BMI is not known for a large population, it can be estimated using the ratio, product, and difference estimators. We attempt to estimate the body mass index in place of body fat by using one of the auxiliary variables. The circumference of the hip, thigh, knee, ankle, bicep, arm, or wrist can be taken as a single auxiliary variable to estimate the body mass index. The correlation coefficient for each auxiliary variable has been obtained. Since the data are natural, we used systematic sampling. An estimated optimal sample size, using the body mass index for a dietetic supplement, has been calculated .
The method of estimating the mean of the study variable under systematic sampling using auxiliary information is used to estimate the body mass index (BMI). We also show the effect of observational error on the estimation. The measurements of different body parts are taken as auxiliary variables. The correlation coefficient between BMI and the circumference of each body part is obtained, and the efficacy of the methods is assessed in terms of mean square error. The observations available on different body parts are also assumed to be recorded with observational error, so we also propose a method of estimating BMI in the presence of observational error. A simulation study is conducted to demonstrate the effect of the observational error on the estimation of the body mass index.
Suppose, the population consists of units from a finite population. The population size is divided into intervals such that . To select a sample, the first unit is selected at random from the first units. This sampling method is similar to that of selecting a cluster at random out of cluster (each cluster containing units) made such that ith cluster contains serially numbered units . After the sampling of units, we observe both the study and auxiliary variables. In this article, we consider a situation where each data value may be observed with error. In order to compute the effect of observational error, it is assumed that are the observed values instead of their true values for every unit. In such a way, these values are expressible in additive form as and . We consider that the errors are normally distributed with mean zero and variance . We assume that the error variables and are uncorrelated to each other as well as uncorrelated to all combinations with and , respectively. This implies = = = = = 0, and . Let , be the population mean and , be the population variance of the study and the auxiliary variables, respectively. is the correlation coefficient between the study and auxiliary variable. Furthermore, the sample means of the observed data are the unbiased estimators of the population means and , respectively.
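The sampling scheme and the additive error model described above can be sketched in code. This is a minimal illustration only: the symbols `N` (population size), `k` (sampling interval), and `sigma` (error standard deviation), and both function names, are our own labels and are not notation recovered from the text. The sketch assumes that `N` is an exact multiple of `k`.

```python
import numpy as np

def systematic_sample_indices(N, k, rng):
    """Return 0-based indices of a 1-in-k systematic sample with a random start."""
    if N % k != 0:
        raise ValueError("this sketch assumes N is a multiple of k")
    start = int(rng.integers(0, k))   # random start among the first k units
    return np.arange(start, N, k)     # start, start + k, ..., start + (n - 1) k

def observe_with_error(true_values, sigma, rng):
    """Additive error model: observed = true + e, with e ~ Normal(0, sigma^2)."""
    true_values = np.asarray(true_values, dtype=float)
    return true_values + rng.normal(0.0, sigma, size=true_values.shape)

rng = np.random.default_rng(0)
idx = systematic_sample_indices(N=252, k=4, rng=rng)  # 63 equally spaced units
```

Each of the `k` possible starts yields one cluster of `N / k` serially spaced units, which is why the scheme is equivalent to selecting one cluster at random.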
The population means are and .
The sample means are the unbiased estimators of the population means and , respectively.
For determining variance, it is expressed by means of error terms and , which are defined as and .
We can write
Three well-known forms of the estimator have been proposed to estimate the body mass index. We use ratio estimator , product estimator , and difference estimator under systematic sampling.
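For orientation, the textbook forms of these three estimators can be written as short functions. The expressions below are the standard ones from the sampling literature and are an assumption on our part; the function names, the argument `X_bar` (the known population mean of the auxiliary variable), and the constant `d` are illustrative labels.

```python
import numpy as np

def ratio_estimator(y, x, X_bar):
    """Ratio estimator: sample mean of y scaled by X_bar / sample mean of x."""
    return np.mean(y) * X_bar / np.mean(x)

def product_estimator(y, x, X_bar):
    """Product estimator: sample mean of y scaled by sample mean of x / X_bar."""
    return np.mean(y) * np.mean(x) / X_bar

def difference_estimator(y, x, X_bar, d):
    """Difference estimator: sample mean of y plus d * (X_bar - sample mean of x)."""
    return np.mean(y) + d * (X_bar - np.mean(x))

# Hypothetical sample values, for illustration only.
y = np.array([20.0, 25.0, 30.0])    # study variable (e.g. BMI)
x = np.array([90.0, 100.0, 110.0])  # auxiliary variable (e.g. hip circumference, cm)
est_ratio = ratio_estimator(y, x, X_bar=105.0)
est_product = product_estimator(y, x, X_bar=105.0)
est_difference = difference_estimator(y, x, X_bar=105.0, d=0.2)
```

The ratio form is usually preferred when the study and auxiliary variables are positively correlated, and the product form when they are negatively correlated.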
The mean square error of the ratio estimator is given as
The mean square error of the product estimator is obtained in  as
The variance of the difference estimator is given as
2.1. The Proposed Estimation under Observational Error
Observations recorded during data collection are obtained with some error, and inference based on such data can be seriously misleading. In this section, we propose ratio, product, difference, and mean estimators for data recorded with observational error. Whereas the previous section used well-known methods of estimation, here we derive the expressions for the mean square error and variance of all the estimators when the data are observed with error.
If the observations are recorded with observational error, the variance is where the term is the variance due to observational error.
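The additive structure of this variance can be seen from a short generic argument. The notation here is assumed for illustration and is not recovered from the text: $y_i = Y_i + u_i$ is the observed value of unit $i$, $\bar{Y}_s$ is the sample mean of the true values, and the errors $u_i$ have mean zero and variance $\sigma_u^2$, are mutually uncorrelated, and are uncorrelated with the true values.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} (Y_i + u_i) = \bar{Y}_s + \bar{u},
\qquad
\operatorname{Var}(\bar{y})
  = \operatorname{Var}(\bar{Y}_s) + \operatorname{Var}(\bar{u})
  = \operatorname{Var}(\bar{Y}_s) + \frac{\sigma_u^{2}}{n}.
```

The first term is the sampling variance of the mean under the chosen design, and the second is the contribution of observational error, which vanishes when $\sigma_u^2 = 0$.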
There are situations when both the study variables and the auxiliary variables are observed with observational error. In that case, we propose the ratio estimator as
In order to obtain the bias and mean square error, we can write equation (8) as
For the bias of the estimator, we obtained from equation (10) as
Taking the expectation of equation (11), we get the bias of the estimator as
For the mean square error, we can write from equation (10) as
Taking the expectation of equation (13), we get the mean square error as
The result under no observational error is obtained by putting and equal to zero, which gives the same result as obtained in . Comparing equations (4) and (14), the MSE in the presence of observational error is always higher than the MSE without it.
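The reason the error-contaminated MSE cannot be smaller can be sketched with the generic first-order expansion of a ratio estimator. This is not the paper's equation (14); the symbols are assumed for illustration: $R = \bar{Y}/\bar{X}$, $V_Y$ and $V_X$ are the design variances of the true sample means, $C_{YX}$ is their covariance, and $\sigma_u^2$, $\sigma_v^2$ are the error variances of the study and auxiliary variables.

```latex
\operatorname{MSE}(\bar{y}_R)
  \approx \operatorname{Var}(\bar{y}) + R^{2}\operatorname{Var}(\bar{x})
          - 2R\operatorname{Cov}(\bar{y},\bar{x}),
\\[4pt]
\operatorname{Var}(\bar{y}) = V_Y + \frac{\sigma_u^{2}}{n},
\qquad
\operatorname{Var}(\bar{x}) = V_X + \frac{\sigma_v^{2}}{n},
\qquad
\operatorname{Cov}(\bar{y},\bar{x}) = C_{YX},
\\[4pt]
\operatorname{MSE}_{\text{error}}(\bar{y}_R) - \operatorname{MSE}_{\text{no error}}(\bar{y}_R)
  \approx \frac{\sigma_u^{2} + R^{2}\sigma_v^{2}}{n} \;\ge\; 0 .
```

Because the errors are uncorrelated with each other and with the true values, they inflate the two variance terms but leave the covariance term unchanged, so the difference is nonnegative.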
The product estimator is proposed under the consideration of observational error as
To obtain the bias and mean square error, we can write equation (15) as
For the bias, by taking the expectation of equation (16), we get
For the mean square error, we can write from equation (16) as
Taking the expectation of equation (18), we get the mean square error as
By setting and equal to zero, we obtain the MSE without observational error, which is the same as obtained in . Comparing equations (5) and (19), we can conclude that the MSE in the presence of observational error is always higher than the MSE without it.
The difference type estimator as proposed under the influence of observational error is
In order to obtain variance, we can write equation (20) as
By squaring both sides of equation (22) and taking expectation,
From equation (23), we can get the variance of the estimator as
To obtain the minimum variance, we differentiate equation (24) with respect to and equate the derivative to zero, which gives
By substituting the value of in equation (24), we get the minimum variance of the estimator as
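The minimisation step can be checked numerically with a small sketch. It relies only on the general identity Var(ȳ + d(X̄ − x̄)) = Var(ȳ) + d² Var(x̄) − 2d Cov(ȳ, x̄); the function names and the numbers are hypothetical, and `var_y`, `var_x` stand for the variances of the observed sample means, so they already include any observational-error contribution.

```python
def difference_variance(var_y, var_x, cov_xy, d):
    """Variance of the difference estimator for a fixed constant d."""
    return var_y + d ** 2 * var_x - 2.0 * d * cov_xy

def optimal_d(var_x, cov_xy):
    """Value of d that sets the derivative 2*d*var_x - 2*cov_xy to zero."""
    return cov_xy / var_x

def minimum_variance(var_y, var_x, cov_xy):
    """Variance of the difference estimator evaluated at the optimal d."""
    return var_y - cov_xy ** 2 / var_x

# Hypothetical values: without error, then with extra error variance in x.
d_clean = optimal_d(var_x=2.0, cov_xy=1.5)
v_clean = minimum_variance(var_y=4.0, var_x=2.0, cov_xy=1.5)
v_noisy = minimum_variance(var_y=4.0, var_x=2.5, cov_xy=1.5)
```

In this illustration, inflating `var_x` from 2.0 to 2.5 while keeping the covariance fixed raises the attainable minimum variance, which mirrors the comparison between the error-free and error-contaminated cases.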
Comparing equations (6) and (25), the MSE in the presence of observational error is always higher than the MSE without it. By putting and equal to zero, we obtain the MSE under no observational error, which is the same as given in equation (6).
A numerical study has been carried out to show the efficacy of the proposed methods. We have taken the data from https://lib.stat.cmu.edu/datasets/bodyfat. This is a comprehensive dataset that lists estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. With this population, two sample populations for have been chosen using systematic sampling.
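Since the dataset records weight in pounds and height in inches, the study variable presumably has to be derived from these two columns. A minimal sketch of that conversion follows; the function name is hypothetical, and the example values are illustrative rather than taken from the dataset.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms
IN_TO_M = 0.0254       # exact definition of the inch in metres

def bmi_from_imperial(weight_lb, height_in):
    """Body mass index in kg/m^2 from weight in pounds and height in inches."""
    weight_kg = weight_lb * LB_TO_KG
    height_m = height_in * IN_TO_M
    return weight_kg / height_m ** 2

# Illustrative values only (not a record from the dataset).
example_bmi = bmi_from_imperial(weight_lb=180.0, height_in=70.0)
```

The same result follows from the familiar shortcut of multiplying pounds per square inch by roughly 703.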
In this manuscript, we also consider the presence of observational error in sample data. For the study of observational error, we have conducted a simulation study. A hypothetical population has been generated by using the mean and variance of original data under study. A population of size 5000 units with mean vector and a covariance matrix has been generated. The data matrices on , , , and have been generated using multivariate normal distribution for four variables with the mean vector and covariance matrix
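A population of this kind could be generated along the following lines. The mean vector and covariance matrix used here are placeholder values chosen only for illustration, since the ones used in the study are not reproduced in this text. The interpretation of the four variables as true study values, true auxiliary values, and their two error terms is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 5000  # population size stated in the text

# Hypothetical parameters: (Y, X) are correlated true values, (U, V) are
# zero-mean errors that are uncorrelated with everything else.
sigma_u2, sigma_v2 = 0.5, 0.5
mean = np.array([25.0, 100.0, 0.0, 0.0])
cov = np.array([
    [9.0, 15.0, 0.0, 0.0],
    [15.0, 49.0, 0.0, 0.0],
    [0.0, 0.0, sigma_u2, 0.0],
    [0.0, 0.0, 0.0, sigma_v2],
])

population = rng.multivariate_normal(mean, cov, size=N)
Y, X, U, V = population.T  # one column per variable

y_obs = Y + U  # study variable as observed, with additive error
x_obs = X + V  # auxiliary variable as observed, with additive error
```

Setting `sigma_u2` and `sigma_v2` to other values, such as 0.1, changes only the last two diagonal entries of the covariance matrix.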
Two sets for have been chosen by using systematic sampling. The mean and variances have been computed for all the auxiliary variables. The mean square error and the variance have been computed. The above process has been replicated 5000 times, and the corresponding grand mean has been obtained. The percent relative efficiency of an estimator with respect to the usual unbiased estimator is calculated by
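The efficiency comparison can be sketched as follows. We assume the usual definition of percent relative efficiency, 100 × MSE(reference) / MSE(estimator), because the formula itself is not reproduced in this text. The population, the interval `k`, and the helper names are hypothetical. For a single fixed population, the design MSE under systematic sampling can be computed exactly by enumerating all `k` possible starts, which is a simplification of the replicated simulation described above.

```python
import numpy as np

def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator (assumed definition)."""
    return 100.0 * mse_reference / mse_estimator

def design_mse(estimates, target):
    """Mean squared deviation from target over equally likely samples."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - target) ** 2))

rng = np.random.default_rng(1)
N, k = 5000, 10
X = rng.normal(100.0, 7.0, size=N)           # hypothetical auxiliary variable
Y = 0.25 * X + rng.normal(0.0, 0.5, size=N)  # hypothetical study variable
Y_bar, X_bar = Y.mean(), X.mean()

# One estimate per possible random start s = 0, ..., k - 1.
mean_estimates = [Y[s::k].mean() for s in range(k)]
ratio_estimates = [Y[s::k].mean() * X_bar / X[s::k].mean() for s in range(k)]

mse_mean = design_mse(mean_estimates, Y_bar)
mse_ratio = design_mse(ratio_estimates, Y_bar)
pre_ratio = percent_relative_efficiency(mse_mean, mse_ratio)
```

A PRE above 100 indicates that the estimator has a smaller MSE than the reference estimator for the population at hand.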
The results of the numerical and simulation studies are given in Tables 1 and 2. Table 1 shows the MSE and PRE of the data linked in the abstract. From the table, we can see that for all the body measurements, the ratio and difference estimators perform better than the usual estimator. In all cases, the hip measurement gives the maximum efficiency among the body measurements, as it has the highest correlation coefficient with the body mass index. After the hip, the thigh measurement gives the next highest efficiency. The abdomen measurement also has a good correlation with the body mass index, so it also gives good efficiency. The ankle and forearm measurements have lower correlation coefficients with the body mass index and consequently give lower efficiency. The circumference of the wrist has the lowest correlation coefficient with the body mass index, and its mean square error is the largest; thus, it is better not to use the wrist circumference in the estimation of the body mass index. Table 2 shows the results for the data with error variance (, = 0.5, 0.1). The MSE in the presence of observational error is always higher for all the estimators. The results for the different body measurements follow the same trends in the presence of observational error. Hence, the properties of the estimators do not change in the presence of observational error, but the value of the mean square error is larger. With regard to the sample size, the mean square error is smaller when the sample is large, i.e., is small. When is large, the sample is small and the MSE is high for all the proposed estimators and all the body measurements. This result can be seen from Tables 1 and 2.
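Because the efficiency of each auxiliary variable is tied to its correlation with the study variable, candidate variables can be screened by ranking their correlation coefficients. The sketch below uses made-up numbers and hypothetical variable names; it does not reproduce the correlations reported for the dataset.

```python
import numpy as np

def rank_by_correlation(study, auxiliaries):
    """Return (name, r) pairs sorted by decreasing absolute Pearson correlation."""
    scores = {
        name: float(np.corrcoef(study, values)[0, 1])
        for name, values in auxiliaries.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up illustrative data, not taken from the dataset.
study = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
auxiliaries = {
    "aux_b": np.array([1.0, 3.0, 2.0, 5.0, 4.0]),
    "aux_a": np.array([2.0, 4.0, 6.0, 8.0, 10.0]),
}
ranking = rank_by_correlation(study, auxiliaries)
```

The first entry of `ranking` is the candidate most strongly correlated with the study variable.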
We have given a different approach to estimate BMI rather than the available method . The study uses systematic sampling with auxiliary variables, and the different body measurements serve as the auxiliary variables. From the study, we may conclude that the difference estimator under systematic sampling has maximum efficiency in the estimation of the body mass index. The efficacy of the methods depends on the correlation between the body mass index and the circumference of the different body parts. The correlation coefficients for the hip, abdomen, and thigh measurements are good, so these variables provide better estimates of the body mass index when the circumferences of these parts are used as auxiliary variables. The circumferences of the wrist, forearm, and ankle have the lowest correlation coefficients with the body mass index and thus may not be used in the estimation of BMI. From the tables, we can also conclude that the ratio estimator and the difference estimator are always more efficient than the unbiased mean estimator. So it is better to use the ratio and difference methods of estimation with the different body measurements as auxiliary variables. In this article, we assume the presence of observational error in the study of 252 men. The efficiency of the difference estimators is better in the presence of observational error. Also, the presence of observational error does not change the properties of the estimators. Tables 1 and 2 show the effect of the observational error on the mean square error. The study provides a simple approach to estimating BMI with and without observational error. Thus, the suggested method may be used by statisticians for this problem and many other similar problems in the estimation of parameters of a natural population.
5. Limitations of Study
The present study proposes a simple method to estimate BMI. However, the current methodology is confined to homogeneous populations, natural populations, or populations from close geographical areas. Its strengths include the fact that BMI is cheap and relatively easy to use. Its weaknesses include the fact that BMI percentiles are not extensively used and that the classification of BMI percentiles may not satisfactorily define the risk of comorbid conditions. In addition, percentiles are not optimal for stratifying children and adolescents with a very high BMI. In spite of these limitations, BMI and BMI percentiles have immense utility in the clinical setting and have the potential to be even more useful as BMI is used more frequently and more appropriately by primary care providers.
BMI: Body mass index
MSE: Mean square error
PRE: Percent relative efficiency
All the relevant data are included in the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
GKV conceptualized the study. NS developed the data analysis methodology. NS analyzed the data and drafted this manuscript. SPS reviewed the draft manuscript and provided technical inputs to GKV to finalize the manuscript. All the authors read and approved the final manuscript.
The authors thank the editor Professor Peter Dabnichki and learned referees for their constructive suggestions leading to improving the quality of contents and presentation of the original manuscript.
W. G. Cochran, “Relative accuracy of systematic and stratified random samples for a certain class of populations,” The Annals of Mathematical Statistics, vol. 17, no. 2, pp. 164–177, 1946.
M. N. Murthy, Sampling Theory and Methods, Statistical Publishing Society, Calcutta, India, 1967.
P. Biemer and S. L. Stokes, “Approaches to the modeling of measurement error,” Measurement Error in Surveys, Wiley, Hoboken, NJ, USA, 1991.
W. A. Fuller, Measurement Error Models, John Wiley & Sons, Hoboken, NJ, USA, 1987.
C.-L. Cheng and J. W. Van Ness, “On estimating linear relationships when both variables are subject to errors,” Journal of the Royal Statistical Society: Series B, vol. 56, no. 1, pp. 167–183, 1994.
R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu, Measurement Error in Nonlinear Models: A Modern Perspective, CRC Press, Boca Raton, FL, USA, 2006.
Shalabh, “Ratio method of estimation in the presence of measurement error,” Journal of the Indian Society of Agricultural Statistics, vol. 50, no. 2, pp. 150–155, 1997.
S. Maneesha and R. K. Singh, “Role of regression estimator involving measurement errors,” Brazilian Journal of Probability and Statistics, vol. 16, pp. 39–46, 2002.
L. N. Sahoo, R. K. Sahoo, and S. C. Senapati, “An empirical study on the accuracy of ratio and regression estimators in the presence of measurement error,” Monte Carlo Methods and Applications, vol. 12, no. 5, pp. 495–501, 2006.
N. Singh, G. K. Vishwakarma, and J. M. Kim, “Computing the effect of measurement error on efficient variant of ratio and product estimator using auxiliary variable,” Communications in Statistics-Simulation and Computation, 2019.
N. Singh and G. K. Vishwakarma, “A generalised class of estimator of population mean with the combined effect of measurement errors and non-response in sample survey,” Revista Investigación Operacional, vol. 40, no. 2, pp. 275–285, 2019.
W. G. Cochran, “Errors of measurement in statistics,” Technometrics, vol. 10, no. 4, pp. 637–666, 1968.
W. Gautschi, “Some remarks on systematic sampling,” The Annals of Mathematical Statistics, vol. 28, no. 2, pp. 385–394, 1957.
W. G. Madow, “On the theory of systematic sampling, II,” The Annals of Mathematical Statistics, vol. 20, no. 3, pp. 333–354, 1949.
W. G. Madow, “On the theory of systematic sampling, III. Comparison of centered and random start systematic sampling,” The Annals of Mathematical Statistics, vol. 24, no. 1, pp. 101–106, 1953.
R. M. Williams, “The variance of the mean of systematic samples,” Biometrika, vol. 43, no. 1-2, pp. 137–148, 1956.
A. A. Hasel, “Estimation of volume in timber stands by strip sampling,” The Annals of Mathematical Statistics, vol. 13, no. 2, pp. 179–206, 1942.
A. L. Griffth, “The efficiency of enumerations,” Indian Forest Leaflets, Forest Research Institute, Dehradun, India, 1945.
A. K. P. C. Swain, “The use of systematic sampling in ratio estimate,” Journal of Indian Statistical Association, vol. 2, no. 213, pp. 160–164, 1964.
N. D. Shukla, “Systematic sampling and product method of estimation,” in Proceedings of the all India Seminar on Demography and Statistics, BHU, Varanasi, India, 1971.
Z. Khan, J. Shabbir, S. Gupta, and A. Shamim, “An optimal systematic sampling scheme,” Journal of Statistical Computation and Simulation, vol. 90, no. 11, Article ID 29023, 2020.
Z. Khan, J. Shabbir, and S. Gupta, “Generalized systematic sampling,” Communications in Statistics - Simulation and Computation, vol. 44, no. 9, pp. 2240–2250, 2015.
K. W. Penrose, A. G. Nelson, and A. G. Fisher, “Generalized body composition prediction equation for men using simple measurement techniques,” Medicine & Science in Sports & Exercise, vol. 17, no. 2, p. 189, 1985.
A. R. Behnke and J. H. Wilmore, Evaluation and Regulation of Body Build and Composition, Prentice-Hall, Hoboken, NJ, USA, 1974.
W. E. Siri, “The gross composition of the body,” Advances in Biological and Medical Physics, vol. 4, pp. 239–280, 1956.
R. W. Johnson, “Fitting percentage of body fat to simple body measurements,” Journal of Statistics Education, vol. 4, no. 1, pp. 1–8, 1996.
C. N. Bouza, S. M. Allende-Alonso, G. K. Vishwakarma, and N. Singh, “Estimation of optimum sample size allocation: an illustration with body mass index for evaluating the effect of a dietetic supplement,” International Journal of Biomathematics, vol. 12, no. 8, Article ID 1950086, 2019.