Abstract

Accurately predicting the adult height of children has great social value, both for selecting outstanding athletes and for the early detection of children’s growth disorders. The mainstream method currently used to predict adult height in China has three problems: its standards are not uniform, it is outdated for today’s Chinese children, and its accuracy is unsatisfactory. This article uses data collected by the Chinese Children and Adolescents’ Physical Fitness and Growth Health Project in primary and secondary schools in Zhejiang. We propose a new multidimensional, high-precision youth growth curve prediction model based on a multilayer perceptron. First, the model takes multidimensional growth data of children as predictors and uses a multilayer perceptron to predict each child’s adult height. Second, we take the Table of Height Standard Deviation of Chinese Children and fit the zero-standard-deviation data to obtain a curve, which is regarded as the mean growth curve of Chinese children. Third, we use the least-squares method and the mean curve to calculate the individual growth curve. Finally, the individual curve is used to predict the child’s stage height. Experimental results show that the accuracy of the adult height prediction model (error within 2 cm) reaches 90.20% for boys and 88.89% for girls, and the stage height prediction accuracy reaches 77.46% and 74.93%, respectively. Compared with the Bayley–Pinneau method, the adult height prediction accuracy improves by 19.61% for boys and 13.33% for girls. Compared with BoneXpert, it improves by 25.49% for boys and 6.67% for girls. Compared with the method based on the bone age growth map, it improves by 15.69% for boys and 24.45% for girls.

1. Introduction

Children and adolescents are the future and hope of the country and the nation. Scientifically describing and predicting the height growth of children and adolescents is therefore a research topic of great significance. With such results, experts can intervene in abnormal height growth in time to ensure that children grow normally.

Because physical condition and growth phase differ from person to person, every child has his or her own tempo of growth. Some adolescents grow quickly during puberty, but their growth ends early. Others grow slowly, but their puberty ends late. Predicting stage height during adolescent development can help determine whether a child needs medical intervention and whether a growth treatment is effective, and thus provides a basis for clinical diagnosis [1]. Therefore, we need a practical model that can accurately predict both the adult height and the stage height of children.

Research on the height growth of children and adolescents is difficult. It needs a large number of samples of normally developing adolescents, and researchers must observe and record multiple growth and development indicators over a long period. In addition, validating whether a predicted adult height is correct is a long process, so accuracy is difficult to verify in a short period of time. At present, statistical methods are the main approach to adult height prediction: samples are classified and prediction methods are derived from them. The simplest height prediction method is the genetic height algorithm. It was proposed by Luo in Sweden in 1998 and has been continuously improved [2]. The new genetic height prediction formula proposed in China is as follows [3]:

In practice, apart from genetic factors, the growth environment also has a great influence on growth and development. As a result, the error between the genetic height and the actual adult height can be very large.

The Tanner–Whitehouse method, proposed by Tanner et al., is a classical adult height prediction method [4]. It predicts the adult height of children from a series of multiple regression models compiled by sex, age, and additional information (menarche, height increase, and bone age increase), which makes the calculation complicated. The method originated abroad, and it produces large errors when predicting the adult height of Chinese children [5]. Moreover, Chinese hospitals do not have a TW3 adult height prediction model adapted to Chinese children. Although some sports organizations use the TW3 model, their criteria are designed for selecting athletes, not for predicting the adult height of typical adolescents.

In 1952, Nancy Bayley of the University of California in the United States proposed the Bayley–Pinneau height prediction method, which is also a classical adult height prediction method [6]. In this method, children and adolescents are divided into three developmental types: early, normal, and late. For each type, the method calculates the proportion of adult height that has been reached at each bone age stage. Given a child’s current age, bone age, and height, the adult height can then be determined. The prediction formula is as follows:

Here, Y is the predicted adult height, H is the current height, and FP is the growth rate obtained by looking up the Height Growth Percentage Table.
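Once FP has been looked up, the Bayley–Pinneau calculation is a single arithmetic step. The following is a minimal sketch in Python, assuming that FP is expressed as the fraction of adult height already attained, so that Y = H / FP. The function name and the example values are hypothetical and are not taken from the published tables.

```python
def bayley_pinneau_adult_height(current_height_cm: float, fraction_attained: float) -> float:
    """Predict adult height Y from current height H and the table value FP.

    Assumption: FP is the fraction (0 < FP <= 1) of adult height already
    reached at the child's bone age, so Y = H / FP.
    """
    if not 0.0 < fraction_attained <= 1.0:
        raise ValueError("fraction_attained must be in (0, 1]")
    return current_height_cm / fraction_attained


# Hypothetical example: a child measuring 150 cm whose table lookup
# indicates that 90% of adult height has been reached.
predicted = bayley_pinneau_adult_height(150.0, 0.90)
print(round(predicted, 1))
```

Under this reading, the accuracy of the method depends entirely on how well the lookup table matches the population being measured.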

In addition, there is an attribute model named PB curve proposed by Preece and Baines in 1978 to describe the growth curve. In 2013, the research team also corrected three errors in the manuscript [7].

In China, the most classic adult height prediction method is the RC (radius, ulna, and short bone-Chinese wrist bone development standard) height prediction method, which Zhang Shaoyan developed over several years using samples from five cities: Dalian, Shijiazhuang, Shanghai, Wenzhou, and Guangzhou [8]. Bone age was assessed every three months with the CHN (Chinese wrist bone development standard) method. Once bone development was complete, the adult height of each child or adolescent was compared with the height at each bone age, and the percentage of height attained was tabulated in the form used by the Bayley–Pinneau height prediction method. However, because of the small sample size and uneven coverage, this method is not applicable to children in most provinces and cities in China.

After 2000, many scholars proposed new methods for predicting adult height. In 2005, Sherar et al. proposed a cumulative height velocity curve based on maturity to predict adult height [9]. In 2009, Thodberg et al. proposed an adult height prediction method based on automatic bone age determination with BoneXpert [10]. This method uses data of adolescents from the first and third Zurich longitudinal studies, together with the height information of parents, combined with a Bayesian neural network [11]. Its adult height prediction, available on the Internet, performs very well. Zhang Shaoyan also conducted a study on a new Chinese bone age reference standard based on BoneXpert in 2013 [12]. In 2010, Tim J Cole built on the nonlinear mixed-effects model and proposed the SITAR (superimposition by translation and rotation) model to describe the growth trend of adolescents [13]. Based on the height and weight data of children and adolescents, Michael et al. analysed the feasibility of a series of regression models, such as linear regression, decision trees, and random forests, for predicting children’s adult height. Their experiments found that the random forest regression model was the most accurate and that, for children aged 0–6, its adult height prediction was better than that of the TW3 method based on bone age [14]. Zhi et al. screened 22 height-related SNP loci and used them to construct a height prediction model from a genetic perspective. They verified it on 1220 Han individuals from northern and southern China. The feasibility of the results needs further study; if more SNP loci closely related to the height of the Chinese Han population can be found, the accuracy of the model can be further improved [15]. Shi proposed a growth trend map of children and adolescents in Zhejiang Province based on bone age and then designed an algorithm based on shortest-Euclidean-distance fitting to predict individual growth trends and adult height [16]. Kang established a multifeature time-series evaluation model with a neural network to evaluate the height of children and adolescents after 6, 12, 18, and 24 months of growth hormone intervention [17].

With the popularity of machine learning in recent years, experts and scholars in the medical field have also been actively exploring applications of the multilayer perceptron model. Raoul combined floor sensors with multilayer perceptrons to interpret the sensor data, judge a person’s gait pattern, and predict age in order to assess aging; this work is expected to lead to further research and applications in healthcare and medicine [3]. Boyang Su made the first attempt to predict wall shear stress in stenotic coronary arteries using multiple linear regression, multilayer perceptron, and convolutional neural network architectures [18]. Satish selected categorical features with the AdaBoost technique and developed a new stacking technique that combines a multilayer perceptron, a support vector machine, and logistic regression. The stacking technique performs better than other models on the PIMA Indian Diabetes dataset [19]. Ertugrul used multilayer perceptrons to predict the number of people recovering from COVID-19 in order to identify potential donors for convalescent (immune) plasma (CIP) treatment of COVID-19 [20]. Lee used patient information for malaria diagnosis with machine learning models and compared the predictive performance of six of them: support vector machine, random forest, multilayer perceptron, AdaBoost, gradient boosting, and CatBoost [21].

To sum up, height prediction research faces several difficulties: long-term data are hard to collect, the verification cycle is long, most prediction methods are based on research from other countries, and prediction accuracy is low. Although there are many methods for adult height forecasting, there is no unified one. With the continuous improvement of Chinese people’s living standards, the previous methods are no longer suitable for contemporary children and adolescents. Clinical diagnosis data show that the CHN-BP (Chinese wrist bone development standard-Bayley–Pinneau) method is generally used in Zhejiang Province to predict adult height and that its results are generally too high. The average prediction error for children and adolescents with developmental delays has reached 4 cm [22]. In order to predict the adult height of children and adolescents accurately, this study proposes a multilayer perceptron-based model for predicting their adult height and stage height. The main contributions of this study are as follows:
(1) We constructed an adult height dataset and a stage height dataset. These two datasets cover children and adolescents in several cities of Zhejiang Province, including Hangzhou, Shaoxing, and Wenzhou. There are 1068 data items in the adult height dataset and 45,416 data items in the stage height dataset. The data include height, weight, age, bone age, and so on.
(2) We adopted an optimized multilayer perceptron model to predict adult height. In addition to age and bone age, this model adds BMI to improve the prediction accuracy. The loss function was also improved so that the model trains better for height prediction.
(3) We compared the MLP adult height model with other models. The accuracy for boys increases by 15.69%–25.49%, and the accuracy for girls increases by 6.67%–24.45%.

2. Main Research Work

2.1. Datasets

The data used in this study come mainly from student physical health examinations carried out in Zhejiang Province in recent years. The number of samples with complete bone age, height, and BMI data reaches 88,752. Students who were tested only once are excluded, because the model’s prediction for them cannot be verified. This leaves 11,814 boys and 10,894 girls with measurements more than 1 year apart, for a total of 45,416 stage height data items. In addition, data for 1068 people (615 boys and 453 girls) who had undergone bone age assessment and are now adults were obtained from the Bone Age Research Center of Zhejiang Province. Through telephone return visits, we obtained the adult height of these adolescents in order to verify whether the prediction is accurate. The adult height data include gender, age, bone age, height, and weight. Tables 1 and 2 list part of the return-visit data. Compared with data collected by hospitals, the health status of the samples taken from primary and middle schools across Zhejiang Province is in line with the general population. Processing the data of urban and rural schools together further improves the generalization of the model.

2.2. Multilayer Perceptron

The multilayer perceptron is a mathematical model that mimics the activity mechanism of biological brain neurons. But it is not equivalent to the biological brain and nervous system. It does not rely on targets, objects, and datasets. It abstracts a certain function of biological neurons and uses simple mapping to approximate and implement complex mapping mathematical models [23].

The multilayer perceptron is a generalization of the perceptron. Its main feature is that it has multiple layers of neurons; networks of this kind are also called deep neural networks. It is a neural network model composed of an input layer, one or more hidden layers, and an output layer. Unlike the single-layer perceptron, it can solve linearly inseparable problems [24]. Figure 1 shows the topology of a multilayer perceptron network with one hidden layer.

2.3. Adult Height Prediction

Using scikit-learn (sklearn), this study builds an MLP regression model for predicting the adult height of adolescents. The multilayer perceptron has 3 layers: 1 input layer, 1 hidden layer, and 1 output layer. In the input layer, we select the adolescent’s bone age ($x_{\mathrm{boneage}}$), age ($x_{\mathrm{age}}$), height ($x_{\mathrm{height}}$), weight ($x_{\mathrm{weight}}$), BMI ($x_{\mathrm{bmi}}$), and difference from the standard BMI ($x_{\mathrm{deltbmi}}$) as input values. BMI (body mass index) is a commonly used international standard for measuring body weight status and health. It is calculated as follows:

$$\mathrm{BMI} = \frac{W}{H^{2}}$$

Here, W is the weight (in kilograms) and H is the height (in meters). Height, weight, and BMI are all important indicators for evaluating the growth and development of children and adolescents. Studies [25] have shown that excess aromatase produced in adipose tissue can induce the conversion of androgen into estrogen. The estrogen represented by estradiol degrades the neoplastic matrix proteins of cartilage proliferating cells and mast cell populations. It promotes the maturation and apoptosis of cartilage mast cells, thereby accelerating the development of bone age, so that the development of obese children is characterized by advanced bone age, faster height growth, and delayed development of secondary sexual characteristics. The standard BMI used in this article comes from data released by the World Health Organization [26]. The expression formula of the input layer is as follows:
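The six inputs listed above can be assembled into one feature vector per child. The following is a minimal sketch, assuming that the standard BMI difference is the child’s BMI minus the reference BMI for the same age and sex. The function names, the sign convention, and the example values (including the reference BMI) are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def build_features(bone_age, age, height_cm, weight_kg, standard_bmi):
    """Return [bone age, age, height, weight, BMI, BMI difference].

    Assumption: the BMI difference is (child BMI - reference BMI); the
    source does not state the sign convention.
    """
    child_bmi = bmi(weight_kg, height_cm / 100.0)
    return [bone_age, age, height_cm, weight_kg, child_bmi, child_bmi - standard_bmi]


# Hypothetical child: 11 years old, bone age 11.5, 150 cm, 45 kg,
# with an illustrative reference BMI of 17.5.
features = build_features(bone_age=11.5, age=11.0, height_cm=150.0,
                          weight_kg=45.0, standard_bmi=17.5)
print(features)
```

Keeping both the raw BMI and its difference from the reference lets the model see absolute body composition as well as deviation from the age norm.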

After many experiments, the number of neurons in the hidden layer is set to 100 in both the male and female models [27]. As an activation function, ReLU is very fast to compute, converges faster than the sigmoid function, and mitigates the vanishing gradient problem, so we adopt it. Because the multilayer perceptron processes data through hidden layers, it is suitable for fitting nonlinear functions. Using the BP (back propagation) algorithm and adjusting the learning rate helps the model avoid falling into a local optimum. For the human body, height fluctuates slightly: it can differ by about 1 cm between morning and evening measurements. Therefore, in order to avoid overfitting, we do not use the mean absolute error as the loss function. The new loss function is expressed as follows:

The loss function divides the error between the expected value and the actual value into two parts: if the error is within 0.5 cm, it is regarded as the expected result; if the error exceeds the range of 0.5 cm, it is trained in a way similar to the mean square error.
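The two-part behavior described above can be illustrated with a small function. This is a minimal sketch under one possible reading: errors within the tolerance contribute nothing, and larger errors contribute their full squared value. The exact form used in the study is not given here, so the function name and the choice to square the full error (rather than the excess over 0.5 cm) are assumptions.

```python
def tolerant_squared_loss(y_true, y_pred, tolerance_cm=0.5):
    """Mean loss where errors within the tolerance count as zero.

    Assumption: errors above the tolerance are penalized by their full
    squared value, similar to mean squared error.
    """
    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        error = abs(prediction - target)
        if error > tolerance_cm:
            total += error ** 2
    return total / len(y_true)


# Hypothetical heights in cm: absolute errors are about 0.3, 1.0, and 2.0.
loss = tolerant_squared_loss([170.0, 165.0, 160.0], [170.3, 166.0, 158.0])
print(loss)
```

The dead zone keeps the model from chasing differences that are smaller than ordinary measurement noise.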

The adult height dataset used in this article contains 615 samples for boys and 453 samples for girls. The data are divided into a training set and a validation set at a ratio of 3:1. Because the amount of data is limited, all training set data are used in each iteration to improve the coverage of the training samples. The resulting split is listed in Table 3.

After many experiments, we determined that the boys’ adult height network converges best after 46,000 iterations and the girls’ adult height network after 48,500 iterations. In addition, the initial learning rate is set to 0.00005, so that during gradient descent the network can search for the optimal solution quickly without falling into a local optimum.
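The configuration described in this section maps onto scikit-learn’s `MLPRegressor` roughly as follows. This is a sketch, not the authors’ code: the solver choice (`adam`) and the random seed are assumptions, the training data are synthetic placeholders, and `MLPRegressor` optimizes a squared-error loss, so the custom loss function described above is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def make_adult_height_mlp(n_train: int, max_iter: int = 46000) -> MLPRegressor:
    """One hidden layer of 100 ReLU units, full-batch updates, small learning rate."""
    return MLPRegressor(
        hidden_layer_sizes=(100,),    # 1 hidden layer with 100 neurons
        activation="relu",
        solver="adam",                # assumption: the solver is not stated in the text
        learning_rate_init=0.00005,   # initial learning rate from the text
        batch_size=n_train,           # all training samples in each iteration
        max_iter=max_iter,            # 46,000 for boys, 48,500 for girls in the text
        random_state=0,               # assumption, for reproducibility only
    )


# Synthetic placeholder data: 60 samples with 6 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 6))
y_train = 170.0 + X_train @ np.array([2.0, -1.0, 3.0, 0.5, 0.0, 1.0])

# A short run keeps the example fast; it is not expected to converge.
model = make_adult_height_mlp(n_train=len(X_train), max_iter=200)
model.fit(X_train, y_train)
pred = model.predict(X_train[:5])
print(pred.shape)
```

In practice the inputs would be the six growth features per child, and separate models would be fitted for boys and girls.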

2.4. Least-Squares Method

The basic idea of the least-squares method is as follows [28]: given a set of experimental data, usually ordered pairs $(x_i, y_i)$, the function that best matches these data is found by minimizing the sum of squared errors. The standard formula of the least-squares method is as follows, where P(x) is the polynomial to be fitted:

$$\min \sum_{i=1}^{k} \left( P(x_i) - y_i \right)^{2}$$

Solving the least-squares problem means solving for P(x). The curve to be fitted is set to a polynomial of degree n:

$$P(x) = a_0 + a_1 x + a_2 x^{2} + \cdots + a_n x^{n}$$

The above equation is transformed into the matrix form $XA = Y$, where $A = (a_0, a_1, \ldots, a_n)^{T}$ is the coefficient matrix and $Y = (y_1, y_2, \ldots, y_k)^{T}$ contains the observed values. The formula of X is as follows, where k represents the number of points to be fitted:

$$X = \begin{bmatrix} 1 & x_1 & x_1^{2} & \cdots & x_1^{n} \\ 1 & x_2 & x_2^{2} & \cdots & x_2^{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_k & x_k^{2} & \cdots & x_k^{n} \end{bmatrix}$$

Both sides of the equation are multiplied by the transposed matrix of X at the same time to get

$$X^{T} X A = X^{T} Y$$

Finally, the coefficient matrix A of the polynomial is obtained by solving the equation.
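The normal-equation procedure described in this section can be sketched with NumPy as follows. The function name and the sample points are hypothetical; the points are generated from a known quadratic so that the recovered coefficients can be checked.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Fit a polynomial of the given degree by solving the normal equations.

    Returns the coefficients in increasing order of power (constant term first).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with one row per point and columns 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    # Multiply both sides by the transpose of X, then solve for the coefficients.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical points lying exactly on 50 + 10x - 0.5x^2.
ages = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
heights = 50.0 + 10.0 * ages - 0.5 * ages ** 2
coeffs = fit_polynomial(ages, heights, degree=2)
print(coeffs)
```

For high polynomial degrees the normal equations can become ill-conditioned, in which case `numpy.linalg.lstsq` or `numpy.polyfit` is usually the more stable choice.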

2.5. Average Growth Curve

Using the data in the Chinese Children and Adolescents’ Height Unit Standard Deviation Numerical Table published by the Beijing Capital Academy of Pediatrics [18], we select the heights in the 0SD column of the male and female tables as the standard height for each age and then fit these points with the least-squares method. The resulting curve equation is taken as the standard height curve. The average growth curves of boys and girls are shown in Figure 2.

2.6. Personal Growth Curve

First of all, we set the height equation obtained by the least-squares method as . According to the principle of curve transformation, we add three parameters α, β, and γ, so that the height equation becomes . In this formula, α represents the contraction and extension of the curve in the x-axis direction. Children with a fast pubertal growth rate have α < 1, and children with a relatively flat growth curve have α ≥ 1.

β represents the translation of the curve on the x axis. For children who develop early, the value of β is positive, and for children who develop late, the value of β is negative.

γ represents the amount of translation of the curve in the y-axis direction, and the value of γ in the growth curve of short children is relatively small.

The effect of the three parameters is shown in Figure 3. The solid line in the figure is taken as the original curve . The dotted curve is the curve compressed horizontally by an α value greater than 1. The dashed-dotted line is the curve shifted to the left by a positive β value. The dashed line is the curve shifted downward by a negative γ value.
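The three effects described for Figure 3 are consistent with a transformation of the form f(αx + β) + γ, where f is the mean growth curve. The sketch below assumes that form; it is one reading of the description and not necessarily the exact equation used in the study. The stand-in mean curve and all numeric values are hypothetical.

```python
def transform_curve(f, alpha, beta, gamma):
    """Return the curve g(x) = f(alpha * x + beta) + gamma.

    Assumed form: alpha > 1 compresses the curve along the age axis,
    beta > 0 shifts it to the left, and gamma < 0 shifts it downward.
    """
    def g(age):
        return f(alpha * age + beta) + gamma
    return g


def mean_curve(age):
    """Hypothetical stand-in for the fitted mean growth curve (height in cm)."""
    return 80.0 + 5.0 * age


# A child whose curve is shifted one year to the left and 3 cm downward.
personal_curve = transform_curve(mean_curve, alpha=1.0, beta=1.0, gamma=-3.0)
print(personal_curve(10.0))
```

With α = 1 and γ = 0 the transformed curve at age 10 equals the mean curve at age 11, which is the leftward shift associated with a positive β.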

We use the multilayer perceptron adult height prediction model described above to obtain the predicted adult height, which is taken as the child’s height Hfinal at the age of 18. The parameter γ can then be calculated as follows:

Under the premise that the current age height Hcurrent and the 18-year-old height are known, the simultaneous equations can be obtained:

Because of measurement error and differences in each child’s constitution and growth environment, the above equations cannot be solved exactly for α and β. Systems of contradictory equations like this one are often encountered in engineering, and a solution can be obtained with the least-squares method by minimizing the sum of squared errors over all equations [29]. After obtaining the personal growth curve, we verified it by substituting each child’s age at the second test into the curve equation and comparing the predicted height at that age with the true value. Tables 4 and 5 show that the prediction error is within a reasonable range, so the stage height calculated by this method is reliable. Figures 4 and 5 show the personal growth curves of the 40 children predicted by the model.
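The least-squares step for α and β can be illustrated with a simple grid search over candidate values. This sketch assumes the personal curve has the form f(αx + β) + γ with γ already known, and it substitutes a brute-force search for whatever solver was actually used. The stand-in mean curve, the grids, and the observations are all hypothetical.

```python
def fit_alpha_beta(f, gamma, observations, alphas, betas):
    """Pick the (alpha, beta) pair minimizing the sum of squared height errors.

    observations is a list of (age, height_cm) pairs; the assumed personal
    curve is f(alpha * age + beta) + gamma.
    """
    best = None
    for alpha in alphas:
        for beta in betas:
            sse = sum((f(alpha * age + beta) + gamma - height) ** 2
                      for age, height in observations)
            if best is None or sse < best[0]:
                best = (sse, alpha, beta)
    return best[1], best[2]


def mean_curve(age):
    """Hypothetical stand-in for the fitted mean growth curve (height in cm)."""
    return 80.0 + 5.0 * age


alphas = [0.8 + 0.05 * i for i in range(9)]   # candidate values from 0.8 to 1.2
betas = [-2.0 + 0.5 * i for i in range(9)]    # candidate values from -2.0 to 2.0
gamma = -3.0

# Build two synthetic observations (current age and age 18) from known parameters.
true_alpha, true_beta = alphas[6], betas[5]
observations = [(age, mean_curve(true_alpha * age + true_beta) + gamma)
                for age in (10.0, 18.0)]

alpha_hat, beta_hat = fit_alpha_beta(mean_curve, gamma, observations, alphas, betas)
print(alpha_hat, beta_hat)
```

With noisy real measurements no grid point gives zero error, and the search returns the pair with the smallest residual, which is the least-squares compromise described above.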

3. Result

After tuning the multilayer perceptron models for male and female adolescents, the results show that the male adult height model performs well, with a maximum absolute error of 3.25 cm on the validation set. The female model performs slightly worse, with a maximum absolute error of 5.13 cm. Because of limited equipment accuracy and variation in posture during height measurement, the measured height carries an error of about 0.5 cm. Height also fluctuates slightly over the day and can differ by about 1 cm between morning and evening measurements. Therefore, this study treats a prediction as accurate if the difference between the predicted value and the true value is within ±2 cm. Table 6 lists the error and accuracy of the adult height prediction models for male and female students on the adult height validation set. This model can then be used for the next stage, stage height prediction.
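The ±2 cm accuracy criterion can be computed as the share of predictions whose absolute error does not exceed the tolerance. A minimal sketch follows; the function name and the example heights are hypothetical.

```python
def accuracy_within(y_true, y_pred, tolerance_cm=2.0):
    """Fraction of predictions whose absolute error is at most tolerance_cm."""
    hits = sum(1 for target, prediction in zip(y_true, y_pred)
               if abs(prediction - target) <= tolerance_cm)
    return hits / len(y_true)


# Hypothetical heights in cm: absolute errors are 1.5, 3.0, 1.0, and 0.0.
true_heights = [170.0, 165.0, 160.0, 175.0]
predicted_heights = [171.5, 168.0, 159.0, 175.0]
acc = accuracy_within(true_heights, predicted_heights)
print(acc)
```

Three of the four example predictions fall within the tolerance, so the accuracy is 0.75.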

In addition, this experiment compares the multilayer perceptron-based adult height prediction model for teenagers with the traditional BP (Bayley–Pinneau) method, the Bayesian network-based BX (BoneXpert) method, and the method based on the bone age growth trend map (BAGTM), using the same batch of samples. Tables 7 and 8 list part of the boys’ and girls’ adult height predictions by the three models. The accuracy of the three models on the whole test set is listed in Tables 9 and 10.

We then use boxplots to analyse the errors in this experiment. Boxplots show the spread and distribution of the errors intuitively and make outliers easy to identify. The range of the error is expressed by the vertical distance between the minimum and maximum values, and the interquartile range (IQR) of the error is expressed by the height of the box. Figure 6 shows the error analysis result for boys, and Figure 7 shows the error analysis result for girls. As can be seen from Figure 6, the interquartile range and median of the MLP model’s prediction error are better than those of the other prediction models. As can be seen from Figure 7, the interquartile range and median of the MLP model’s prediction error are similar to those of the Bayley–Pinneau method, but the MLP model has fewer outliers, which indicates better stability. Based on the above analysis, we conclude that the MLP prediction model outperforms the other prediction models.

On the basis of the adult height prediction model, we draw the personal growth curve and predict the stage height. A large amount of stage data is available for verification, which better demonstrates the generalization of the model. Unlike the twenty-person test experiment above, the data of 12,793 boys and 11,427 girls were fed into the stage height prediction model, and each child’s next test data were used for verification. The experimental results are listed in Table 11.

4. Discussion

The tables in the experimental results show that the multilayer perceptron model predicts adult height more accurately than the Bayley–Pinneau method and the BoneXpert adult height prediction method; the improvement is especially large for boys. On this basis, the individual height growth curve of each child is drawn, and the stage height is then predicted. The stage height prediction results in Table 11 can be used in growth and development diagnosis to judge whether a child’s growth and development are abnormal, whether medical intervention is needed, and whether an intervention treatment is effective.

Compared with the adult height prediction model, the accuracy of the stage height prediction model within 2 cm is about 13% lower. There are three main reasons. First, because the amount of data is relatively small, the MLP adult height prediction model cannot achieve the best generalization during feature learning, so the adult height prediction itself contains some error. Because there are fewer data for girls in this article, the prediction for girls is worse than that for boys. Second, there is error in matching the personal curve. When solving the equations, the least-squares method obtains the values of the parameters α and β by minimizing the sum of the squared errors over all equations. The actual growth of adolescents is also affected by factors such as environment and daily routine, which are not captured by the model. Third, the curve matching error is superimposed on the error of the adult height prediction model, which widens the range of the error and makes the stage height prediction less accurate than the adult height prediction.

We see two main ways to address these problems and improve the accuracy of the model. The first is to continue adjusting the neuron structure and parameters of the multilayer perceptron and to increase the amount of training data, thereby improving the accuracy of the adult height prediction model. The second is to optimize the method for determining the parameters α and β in stage height prediction. Samples with an error greater than 2 cm can be singled out and their characteristics analysed in order to find new methods for stage height prediction.

5. Conclusion

The adolescent adult height prediction model proposed in this study uses a multilayer perceptron and adds the dimension of BMI to the traditional age and bone age predictors, which improves the accuracy of adult height prediction. Experiments on 1068 adult height samples of boys and girls show that the accuracy of the boys’ adult height model within 2 cm reaches 90.20% and that of the girls’ adult height model reaches 88.89%. This model surpasses the traditional adult height prediction methods. The subsequent stage height prediction model uses the least-squares method to fit the average growth curve and combines it with the adult height prediction results to derive a growth curve for each individual. Validation on the stage height data of 12,793 boys and 11,427 girls shows that the accuracy of stage height prediction within 2 cm reaches 77.467% for boys and 74.931% for girls. Compared with traditional height prediction methods, this stage height prediction model shows the future growth of adolescents more intuitively. The model has two main shortcomings: there are not enough samples with adult height data, and because stage height prediction is built on adult height prediction, its accuracy decreases considerably. Future research has two main directions: one is to collect more adult height samples to improve the accuracy of adult height prediction, and the other is to improve the stage height prediction method so that the generated growth curve better matches each individual’s growth.

Data Availability

The dataset used to support the findings of this study was supplied by the Zhejiang Provincial Bone Age Research Center in China, under license, and the dataset involving privacy cannot be shared.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Basic Public Welfare Research Project of Zhejiang Province (Grant no. LGG22F020014), National Natural Science Foundation of China (Grant no. 62072410), and the researchers supporting project (Grant no. RSP2022R509) King Saud University, Riyadh, Saudi Arabia.