Abstract

Due to the harsh working environment of the construction machinery, a simple distribution cannot be used to approximate the shape of the rainflow matrix. In this paper, the Weibull-normal (W-n) mixture distribution is used. The lowest Akaike information criterion (AIC) value is employed to determine the components number of the mixture. A parameter estimation method based on the idea of optimization is proposed. The method estimates parameters of the mixture by maximizing the log likelihood function (LLF) using an intelligent optimization algorithm (IOA), genetic algorithm (GA). To verify the performance of the proposed method, one of the already existing methods is applied in the simulation study and the practical case study. The fitting effects of the fitted distributions are compared by calculating the AIC and chi-square () value. It can be concluded that the proposed method is feasible and effective for parameter estimation of the mixture distribution.

1. Introduction

The construction machinery, such as the wheel loader, often operates under complex and changeable operating conditions, which leads to the random characteristics of the load time history ‎[1, 2]. To extract load cycles for extrapolation from load time history, the rainflow counting method is used. The rainflow matrix contains the number of load cycles ordered by range and mean is obtained. As a result, the rainflow matrix also presents complex and uncertain features which cannot be modelled by a simple distribution ‎[3]. The finite mixture distribution is available as it is a flexible and powerful probabilistic modelling tool for univariate and multivariate data ‎[4].

The mixture distribution model is widely acknowledged in the area which involves the statistical modelling of data ‎[4]. The fitting effect of the load data can be improved when the mixture distribution replaces a poor single distribution. Nagode M ‎[5] and Tovo R ‎[6] suggested that the load range probability density function (PDF) should be defined as a linear combination of Weibull distributions. Klemenc J ‎[7] thought the PDFs of both load mean and load range are the multivariate normal PDF. Ni Yiqing ‎[8] used three types of finite mixture distributions (normal, lognormal, and Weibull) to model the stress range. For the load range in time domain, Wang Jixin ‎[9] adopted the normal mixture distribution. In order to have an accurate mixture distribution, the parameters estimation methods have been used. For the rainflow matrix, ‎[7] shows the advantages of mixture distribution with the Expectation Maximum (EM) algorithm on which the correlation between ranges and means is considered. According to the nonnegative characteristics of the load ranges of the rainflow matrix, Nagode M ‎[3] proposed a mixture of joint Weibull-normal (W-n) distributions to avoid the shortcomings by the multivariate normal PDF. Reference ‎[10] reviewed that load range of the rainflow matrix of the construction machinery could be defined as the Weibull distribution with 3 parameters. In the present paper, the mixture of joint Weibull with 3 parameters and normal distributions are used to model the rainflow matrix for load extrapolation.

The EM algorithm is proved to be the most appropriate algorithm to estimate parameters of the mixtures ‎[8, 9, 11], but it also has a strong sensitivity to the initial values, which is the cause that the solution does not converge. Some studies are aimed on the initial values ‎[12]. Laird proposed a grid search for setting the initial values ‎[13]. Leroux used the means of clusters as the initial values ‎[14]. McLachlan suggested the use of principal component analysis for selecting initial values ‎[15]. Nagode and Fajdiga proposed an alternative algorithm for the parameter estimation of the finite mixture distribution ‎[1618]. Bučar T used the results of the algorithm proposed by Nagode and Fajdiga as initial values to reduce the effects of the drawback ‎[11]. Although there are many improvements, the main idea of EM, which is the maximization of the conditional expectation, is unchanged. Consequently the results do not necessarily converge, especially for a great number of components ‎[3]. In order to avoid the above problem, the intelligent optimization algorithm (IOA) is employed to convert the problem into an optimization problem. The genetic algorithm (GA) ‎[19], particle swarm optimization ‎[20], and ant colony optimization ‎[21] belong to the IOA. In this paper, we prefer GA for its ability to maximize and search globally. And it is an attempt for the parameter estimation study of W-n mixture distribution.

The rest of paper is organized as follows: In Section 2, the W-n mixture distribution for the rainflow matrix is introduced. In Section 3, the proposed method based on GA is illustrated in detail. Comparative analysis are shown in Section 4, the feasibility and effectiveness of the proposed method are proven through the simulation study and case study. Section 5 ends the paper by presenting some conclusions and discussions.

2. W-n Mixture Distribution Model

In general, an arbitrary rainflow matrix can be defined as the form of mixture distribution functions ‎[3].which satisfies the following constraints:where represents a vector of component unknown parameters. Constant stands for a weighting factor, whereas represents a component PDF of two random variables which is the ranges and means points of the rainflow matrix. The corresponding vector parameter contains two parts, which are the Weibull distribution and normal distribution, respectively.

The random variables corresponding to the th component distribution are statistically independent as preconditions for the establishment of the mixture of joint W-n distributions ‎[22]. According to the difference between the 3-parameter Weibull distribution and the 2-parameter Weibull distribution, the load ranges are shifted by threshold . Therefore, the mixture of joint Weibull with 3 parameters and normal distributions is shown as follows:The cumulative distribution function (CDF) can be expressed asThe mixture of joint Weibull with 3 parameters and normal distributions is obtained through inserting (3) into (1).where and represent the scale and shape parameters of the Weibull distribution, is the threshold, and and stand for the mean value and standard deviation of the normal distribution, respectively.

3. Parameters Estimation Based on GA

The Maximum Likelihood estimation fits the parameters of the mixture PDF by maximizing the likelihood function (LF). To facilitate the calculation, (8), the log likelihood function (LLF), is used instead of the LF. For EM algorithm, the Newton-Raphson method is involved to solve the maximized LLF equation ‎[22]. However, unsuitable selection of initial values may cause the fact that the solution does not converge.

GA ‎[23] is a parallel and efficient global search method. The chromosome, population, and generation are involved. The unknown parameter is set as a chromosome representing the individual’s original characteristic. The unknown parameters of (8) form a set of chromosomes called population here. The dimension of the chromosome D represents the number of unknown parameters. The number of the initial populations NP is typically from 10 to 200. Through the operations of selection, crossover, and mutation of the populations, a new set of individuals is generated. The process of evolving from the predefined individuals to a new set of individuals is named a generation. And the number of generations NG is the number of evolutions. The general range of NG is 100~1000. GA is an optimization process. Equation (8) is the objective function (OBF). During the operations of selection, crossover, and mutation, the offspring are produced. The offspring meet the requirements of OBF which are selected as the parents of the next generation. The new parents produce new offspring through combination which depends on the crossover rate . The value range of is 0.25~1. According to the mutation rate with the value range of 0.001~0.1, the mutation step is handled. Then the offspring obtained after mutating become the next generation. The above procedures are repeated until the OBF values obtained by the parents and the offspring are nearly the same. Finally, the offspring are the desired parameter estimation results.

In addition, when dealing with the data, the number of components of the mixture distribution should be known. The Akaike information criterion (AIC) ‎[24] is used to estimate the components number . The value of AIC varies with the value of and represents the number of estimated parameters. The best mixture distribution model will be the model with lowest AIC value.

Above all, the process chart of the proposed method is shown in Figure 1.

4. Numerical Examples

In this section, two cases are considered for validation the feasibility of the parameter estimation method based on GA. At the same time, the EM is also applied. The FlexMix provides infrastructure for flexible fitting of mixture distributions using the EM algorithm. The performance has been shown in ‎[25, 26]. So the role of EM using the FlexMix here is to show the effectiveness of the method based on GA.

During the application of GA, the relevant parameter values are set as follows. The number of the initial populations NP is 10. The dimension of the chromosome is , which is related to the components number c. The number of generations NG is 1000. The values of crossover and mutation rates are 0.8 and 0.1, respectively.

4.1. Simulation Study

The simulation data are selected from ‎[22]. The 10000 random data are generated from the 3-component W-n mixture distribution. The parameters of the mixture are set with the following values: , , , , and .

The components number c of the mixture is supposed to be unknown. The value of c is set at the range from 1 to 5. According to (9), the curves of AIC values are obtained by the two parameter estimation methods based on GA and EM, respectively.

Figure 2 shows the AIC values under different components number. The figure also shows a partial enlargement of AIC values when c is not less than 3. According to the AIC value, the W-n mixture distribution with 3 components is the optimal choice for both methods. The estimated parameters and the AIC values are listed in Table 1.

Compare the estimated parameter values with the preset true values. Each true value is set as the reference value. The differences between the estimated values and the true values have been calculated. The percentages of true value occupied by the difference have been marked in Figure 3. The positive values indicate that the estimated parameter is larger than the corresponding true value, and the negative values show the contrary result. For the simulated data, the estimated parameters by GA and EM are both near the true value. The estimated error range by GA is [-3.25%, 8%] and by EM is [-1.68%, 2.6%]. Since the relevance exists not only between the scale and shape parameters but also between the mean and standard deviation, different degrees of influence on other parameters will occur when one of the parameters is changed. In order to verify the effectiveness of the method based on GA, the fitting test criteria are used.

The AIC provides the entitle description of the whole data without any other operations such as binning. So the AIC value is chosen as a criteria to judge the mixture distribution fitting effect. The AIC value obtained by GA is 223749, which is 0.0058% larger than the value 223736 obtained by EM. According to the comparison result of AIC values, the fitting effect of the two methods is almost the same. Then the modelled mixed PDFs for simulated data are obtained with the aid of the estimated parameters.

Seen from Figure 4, there is almost no difference between the PDF that are modelled by the GA and EM. Though the difference of 0.0058% is quite small, the projections of the models on xy surface have been obtained in Figure 5 to find the difference between these two methods in detail.

In Figure 5, the empirical densities are represented by colored circles, the fitted W-n mixture densities are shown by colored contour lines, and the color bar represents the densities of the contour lines. The densities and the shape of the contour lines show a good agreement of the empirical and fitted probability densities by the both methods.

4.2. Case Study

The mechanical components often suffer from fatigue load. The mean and amplitude of fatigue load and their corresponding frequencies are obtained through the rainflow counting method. In this section, the torque data of the semiaxle of wheel loader is collected for mixture distribution estimating. According to working mode and load characteristics, six phases ‎[1, 27] are included, shown in Figure 6(a). The spading phase has been extracted as an example for analysis. Figure 6(b) is the load data for analysis.

During the data acquisition, the torque data is obtained through the sensor. Equation (10) is used to convert the torque load to stress load. Figure 7 is the histogram of mean-range rainflow matrix.where is the torsion section coefficient, = 54 mm is the diameter of the semiaxle, and is the torque, Nm.

The components number of the mixture is supposed to be unknown. The mixture of joint Weibull with 3 parameters and normal distributions is the component PDF. The minimum stress range 16.64 is chosen as the shifted value . The value of is set at the range from 1 to 9. Due to the convergence problem, the search range in EM algorithm becomes [1, 7]. According to (9), the curves of AIC values are obtained by the two parameter estimation methods based on GA and EM, respectively.

Figure 8 shows the AIC values under different components number. The dashed line in the figure indicates that it does not converge. According to the AIC value, the W-n mixture distribution with 6 and 7 components is the optimal choice for GA and EM, respectively. The estimated parameters and the related values are listed in Table 2

The modelled mixed PDFs for stress load are obtained with the aid of the estimated parameters.

Seen from Figure 9, there is a little difference between the PDF that are modelled by the GA and EM. The projections of the models on xy surface have been obtained in Figure 10 to find the difference between the two methods in detail.

In Figure 10, the empirical densities are represented by colored circles, and the fitted densities are shown by colored contour lines. And the color bar represents the densities of the contour lines. The shapes of the contour lines by the two methods are similar. And the fitted probability densities of these two methods are in good agreement with their empirical probability densities. However, there are some differences between the two fitted probability densities figures. Three high-density regions existed in Figure 10(a), and two high-density regions existed in Figure 10(b). The position of region a in both figures are basically the same. But the area of region a in Figure 10(a) is larger. The position of region b in Figure 10(b) is almost between b and c of Figure 10(a). Another difference can be seen from the density values. The largest density in Figure 10(a) is , and the corresponding value in Figure 10(b) is . The inconsistency of the information on figures indicates that the two methods have different emphasis when doing estimation.

The true values of the distribution function of the spading phase are unknown. It is subjective to judge the fitting effect using the graphic method. To assess the quality of agreement between the modelled mixed PDF and the experimental rainflow matrix based on mean-range PDF, the fitting test criteria are needed. In addition to the AIC value, a chi-square () value ‎[11] is used. The test values are shown in Table 3.

The smaller the test value, the better the fitting effect. Comparing the fitting test values obtained by GA and EM, the AIC value obtained by GA is 20804, which is 0.13% larger than the value 20778 obtained by EM, while the comparison of shows a contrary result. The difference between the (0.892) obtained by EM and the (0.7319) obtained by GA is 21.87% with a factor of 1.22. In general, the lowest does not coincide with the lowest AIC ‎[11]. Due to the different focuses of different test criteria for fitting effect, the proposed method is feasible and effective from the comparison results.

5. Conclusions and Discussions

In order to avoid the possibility that the solution of the common used EM dose not converge, this paper proposes a parameter estimation method for mixture distribution from the view of optimization. The main idea of the method based on GA is to maximize the LLF. The application of the method based on GA does not require solving complex equations and problem involving convergence. And different kinds of distributions can be estimated only by changing the OBF. To verify the estimation performance of the proposed method, simulation study and practical case study are adopted.

The simulation study shows that the components number and unknown parameters were properly estimated by the both methods. And a good agreement of the empirical and fitted probability densities by the both methods were shown. In the practical case, the mixture of joint Weibull with 3 parameters and normal distributions is used for model fit of the semiaxle. The optimal number of components is selected as the minimum AIC value. The performances of the methods based on GA and EM have shown their advantages from different perspectives according to different fitting test criteria. The method proposed in this paper can be taken as an alternative especially when the EM does not converge. And the proposed method in this paper can be seen as an attempt for the study of the mixture distribution parameter estimation.

The proposed method used is GA which is a kind of IOA. The computational efficiency of the method will be seriously affected when the number of components increases. Other IOA can be considered for alternatives. In addition, not only the AIC but also Bayesian information criterion (BIC) ‎[28] can be used for components number determination. The principles of the methods are different. For the same dataset, the same parameter estimation method, different components numbers may be obtained through the AIC and BIC. Therefore, finding a method to determine the reasonable components number will be the next research content.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors acknowledge the funding support from National Natural Science Foundation of China, no. 51375202, and also acknowledge the support of Programs for Science Research and Development of Jilin Province, no. 20160101285JC.