Identification of Mine Water Inrush Source Based on PCA-FDA: Xiandewang Coal Mine Case
When a mine water inrush accident occurs, identifying the inrush source promptly and accurately is essential for determining the cause of the inrush and devising a response. Based on the differences in hydrochemical composition among the water sources of a mine, eight hydrochemical indicators were selected as sample variables for water inrush source identification. On this basis, an identification model was established by combining principal component analysis (PCA) with Fisher discriminant analysis (FDA). The model was applied to 14 groups of training samples and 12 groups of samples to be identified from different water sources of the Xiandewang coal mine, and its results were compared with those of a conventional identification model that used FDA alone. The results show that preprocessing the data with PCA effectively eliminates the information overlap between sample indicators and raises the identification accuracy from 78.6% to 85.7% for the training samples and from 75% to 91.7% for the samples to be identified. This study can serve as a basis and reference for research on mine water inrush source identification technology.
Coal is a primary energy source. Various disasters and accidents occur during coal mining, and water inrush is one of the factors that cause serious accidents in coal mines [1–3]. Many water inrush cases show that, when an inrush accident occurs, identifying the water source promptly and accurately helps to find the cause quickly and to devise countermeasures in time, which is important for the prevention and control of water inrush disasters in coal mines [4–7]. Groundwater chemistry, isotopes, water temperature, water level, and other indicators are commonly used to identify the water inrush source, and experience has shown that hydrochemical analysis is among the more effective of these methods [8–10]. Hydrochemical identification rests on the fact that the groundwater of different aquifers has a different chemical composition. The components that can be used to distinguish the groundwater of different aquifers are called “standard components”; the most widely used are conventional components such as Na+, K+, Ca2+, Mg2+, Cl−, SO42−, HCO3−, alkalinity, acidity, hardness, TDS, and pH. Based on analyses of the hydrochemical composition of aquifers, researchers have proposed various methods for water source identification. Representative examples include the following: the fuzzy comprehensive evaluation method and the cluster analysis method were used to identify the water inrush sources of the Mindong No. 1 mine; the maximum likelihood method was used to identify the potential water sources of the Sanshandao Gold Mine on the basis of hydrogeochemical and isotopic analyses; and the distance discrimination method was used to identify mine inrush water sources, with the results verified through the grey relational analysis method. Liu et al. (2015) established a water source identification model based on BP neural network theory and applied it to randomly selected water samples collected during mine excavation to predict their water sources.
These methods have enriched water inrush source identification technology, but they do not take into account the information overlap between hydrochemical identification indicators, which leads to problems such as low classification precision and long response time. To solve these problems, this paper introduces the principal component analysis (PCA) method into water inrush source identification: the hydrochemical indicator data of different water sources are refined, multiple correlated indicator variables are converted into a new set of independent variables by linear combination, and the effects of information overlap between indicators are eliminated, so that the characteristics of different water sources can be described more effectively. On this basis, the Fisher discriminant analysis (FDA) method is combined with PCA to establish a water source identification model. Using this model, the water inrush sources of a typical coal mine were identified with good results.
2. Methods of Mine Water Inrush Source Identification
2.1. Principal Component Analysis (PCA)
PCA is a statistical method that uses an orthogonal transformation to convert a set of potentially correlated variables into a new set of linearly uncorrelated variables. The new variables are called principal components, and together they preserve the information (total variance) carried by the original variables. PCA-based data processing effectively removes the correlation in high-dimensional data, reduces data dimensionality, and simplifies data structures [14, 15]. The mathematical model of PCA is as follows.
Let $X_1, X_2, \ldots, X_p$ be the $p$ variables of a raw data matrix $X$. They form linear combinations denoted as $F_1, F_2, \ldots, F_p$, namely,

$$F_i = a_{1i}X_1 + a_{2i}X_2 + \cdots + a_{pi}X_p, \quad i = 1, 2, \ldots, p,$$

where $a_{1i}^2 + a_{2i}^2 + \cdots + a_{pi}^2 = 1$; $F_i$ is uncorrelated with $F_j$ ($i \neq j$; $i, j = 1, 2, \ldots, p$); $F_1$ has the maximum variance among all linear combinations of $X_1, X_2, \ldots, X_p$; among all linear combinations of $X_1, X_2, \ldots, X_p$ that are uncorrelated with $F_1$, $F_2$ has the maximum variance; $F_p$ has the maximum variance among all linear combinations that are uncorrelated with each of $F_1, F_2, \ldots, F_{p-1}$; and the sum of the variances of $F_1, F_2, \ldots, F_p$ is equal to the sum of the variances of $X_1, X_2, \ldots, X_p$.
The principal components are generally calculated in the following steps:
(1) The original variable data are normalized, and the covariance matrix of all variables is then calculated.
(2) The eigenvalues of the covariance matrix are ranked as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, and the corresponding unit eigenvectors are $a_1, a_2, \ldots, a_p$, where $a_i = (a_{1i}, a_{2i}, \ldots, a_{pi})^{T}$. In the conversion matrix $A = (a_1, a_2, \ldots, a_p)^{T}$, row $i$ of $A$ is the eigenvector corresponding to $\lambda_i$, and the variance of principal component $F_i$ is the eigenvalue $\lambda_i$.
(3) The variance contribution rate of principal component $F_i$ is denoted by $w_i = \lambda_i / \sum_{k=1}^{p} \lambda_k$. If $m$ principal components are adopted, the cumulative variance contribution rate of principal components $F_1, F_2, \ldots, F_m$ is expressed as

$$\rho_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k}.$$

(4) The number of principal components is generally determined from the cumulative variance contribution rate. Usually, a cumulative variance contribution rate of at least 80% indicates that the first $m$ principal components extracted contain most of the information in the original samples.
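Steps (3) and (4) can be sketched in a few lines of Python. This is only an illustration: the eigenvalues are hypothetical placeholders rather than values from the Xiandewang data, the function names are invented for this example, and the 80% threshold is the one quoted above.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative variance contribution rates, eigenvalues taken in descending order."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates = []
    running = 0.0
    for lam in ordered:
        running += lam
        rates.append(running / total)
    return rates


def components_needed(eigenvalues, threshold=0.80):
    """Smallest number of principal components whose cumulative rate reaches the threshold."""
    for m, rate in enumerate(cumulative_contribution(eigenvalues), start=1):
        if rate >= threshold:
            return m
    return len(eigenvalues)


# Hypothetical eigenvalues of an 8 x 8 covariance matrix (assumption, not study data).
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625, 0.0625]

rates = cumulative_contribution(example_eigenvalues)
print(rates[:3])                                # cumulative rates of the first three components
print(components_needed(example_eigenvalues))   # number of components kept at the 80% threshold
```

With these placeholder eigenvalues the first two components explain 75% of the variance and the first three explain 87.5%, so three components would be retained.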
2.2. Fisher Discriminant Analysis (FDA)
FDA is a multivariate statistical analysis method that uses the characteristic values of a research object to identify its type. The basic idea of FDA is as follows: multidimensional data are projected onto a lower-dimensional space to simplify the problem, and a discriminant function is determined on the principle of maximizing the between-class distance and minimizing the within-class distance [16–19]. The mathematical model of FDA is as follows.
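It is assumed that there are $k$ populations $G_i$ ($i = 1, 2, \ldots, k$) whose mean vectors and covariance matrices are $\mu_i$ and $\Sigma_i$, respectively. Suppose that a sample of size $n_i$ is taken from population $G_i$, that is,

$$x_1^{(i)}, x_2^{(i)}, \ldots, x_{n_i}^{(i)}, \quad i = 1, 2, \ldots, k.$$

Then $u^{T}x_j^{(i)}$ ($j = 1, 2, \ldots, n_i$) is the projection of sample $x_j^{(i)}$ on an axis $u$, and the sample means are denoted as

$$\bar{x}^{(i)} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_j^{(i)}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_j^{(i)}, \qquad n = \sum_{i=1}^{k} n_i,$$

where $\bar{x}^{(i)}$ and $\bar{x}$ are the mean values of the sample taken from $G_i$ and of the total sample, respectively. The within-group deviation of the projected samples is

$$e = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(u^{T}x_j^{(i)} - u^{T}\bar{x}^{(i)}\right)^2 = u^{T}Eu,$$

where $u^{T}x_j^{(i)} - u^{T}\bar{x}^{(i)}$ is the deviation of the projection of sample $x_j^{(i)}$ from the projection of its group mean on the axis, and $E = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_j^{(i)} - \bar{x}^{(i)}\right)\left(x_j^{(i)} - \bar{x}^{(i)}\right)^{T}$ is the total within-class scatter matrix. The between-group deviation of the projected samples is

$$b = \sum_{i=1}^{k} n_i\left(u^{T}\bar{x}^{(i)} - u^{T}\bar{x}\right)^2 = u^{T}Bu,$$

where $B = \sum_{i=1}^{k} n_i\left(\bar{x}^{(i)} - \bar{x}\right)\left(\bar{x}^{(i)} - \bar{x}\right)^{T}$ is the between-class scatter matrix of the sample. For the discriminant function to separate the groups as well as possible, the following ratio should be as large as possible:

$$\Delta(u) = \frac{u^{T}Bu}{u^{T}Eu}. \tag{6}$$

If $\Delta(u) = \lambda$ and its partial derivative with respect to $u$ is set to zero, then

$$(B - \lambda E)u = 0. \tag{7}$$

In Equation (7), $\lambda$ is an eigenvalue. Through simplification (assuming that $E$ is invertible), the following equation is acquired:

$$E^{-1}Bu = \lambda u. \tag{8}$$

In the above equation, $u$ is the eigenvector to which the maximum eigenvalue $\lambda$ corresponds, and $\lambda$ represents the ratio of the between-group sum of squared deviations to the within-group sum of squared deviations of the projected samples. From the above equation, both the maximum eigenvalue and the corresponding eigenvector of $E^{-1}B$ can be obtained, which gives the discriminant function $u^{T}x$.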
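As a minimal illustration of this eigenvalue problem, the following Python sketch treats the simplest case of two groups and two variables. In that case the eigenvector belonging to the largest eigenvalue is proportional to the inverse of the within-class scatter matrix applied to the difference of the group means, so no general eigenvalue solver is needed. The data points and all function names are hypothetical and are not taken from this study; the four-group case used later would require a general eigenvalue routine.

```python
def mean2(points):
    """Mean vector of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def within_scatter(groups):
    """Total within-class scatter matrix (2 x 2), summed over all groups."""
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for points in groups:
        m = mean2(points)
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for r in range(2):
                for c in range(2):
                    scatter[r][c] += d[r] * d[c]
    return scatter


def fisher_direction(group_a, group_b):
    """Two-class Fisher direction: inverse within-class scatter times (mean_a - mean_b)."""
    e = within_scatter([group_a, group_b])
    det = e[0][0] * e[1][1] - e[0][1] * e[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    ma, mb = mean2(group_a), mean2(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Explicit inverse of a 2 x 2 matrix applied to the mean difference.
    return [(e[1][1] * diff[0] - e[0][1] * diff[1]) / det,
            (-e[1][0] * diff[0] + e[0][0] * diff[1]) / det]


def project(u, point):
    """Discriminant score: projection of a point on the axis u."""
    return u[0] * point[0] + u[1] * point[1]


# Hypothetical two-indicator samples from two water sources (assumption, not study data).
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
group_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]

u = fisher_direction(group_a, group_b)
scores_a = [project(u, p) for p in group_a]
scores_b = [project(u, p) for p in group_b]
print(scores_a, scores_b)
```

On this toy data the projected scores of the two groups do not overlap, which is the behavior the discriminant function is designed to produce.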
2.3. Procedures of Mine Water Inrush Source Identification Based on PCA–FDA
The procedure for identifying the source of mine water inrush with the PCA and FDA methods is shown in Figure S1. The identification process is as follows:
(1) Water sample data are normalized.
(2) The correlation matrix of the normalized data is calculated to analyze and identify the correlation between variables.
(3) PCA is applied to the correlated variables to reduce their dimensionality.
(4) FDA is used to identify the training samples, determine the differences between their actual and discriminated values, and verify the accuracy of the model.
(5) The FDA model is used to identify the samples to be discriminated.
3. Project Case
3.1. Study Area
3.1.1. Physical Geography
Under the jurisdiction of Baita Town in Shahe City, the Xiandewang coal mine is located about 35 km southwest of Xingtai City, Hebei Province, China. The location is shown in Figure 1. The terrain is undulating, high in the west and low in the east, with maximum and minimum elevations of +339.6 m and +194.10 m, respectively. The region has a semiarid warm temperate continental monsoon climate, and precipitation falls mainly from July to September each year. Over the past 10 years, annual precipitation has ranged between 351.5 mm and 800 mm, with an average of 507.74 mm. The geographical coordinates of the mine are 114°11′15″~114°15′00″E and 36°48′45″~36°55′00″N.
According to hydrogeological prospecting data of the mining area, the aquifers that threaten safe mining include an Ordovician limestone karst fractured aquifer, a Permian sandstone fractured aquifer, and the Daqing and Yeqing Carboniferous limestone karst fractured aquifers. Because the rock strata differ in chemical composition, structure, lithological association, and fracture development, their water-bearing characteristics and water yield properties also differ significantly. Once a fault is encountered or the roof or floor strata are damaged during coal mining, groundwater in an aquifer is highly likely to burst into the mine and cause a water inrush. For this reason, an investigation of mine water inrush source identification for the Xiandewang coal mine is beneficial for water inrush control there.
4. Indicators for Source Identification
Because each aquifer contains a diverse hydrochemical composition, the chemical constituents of any single type of water cannot serve as indicators for source identification. Groundwater testing in coal mines generally uses a brief water quality analysis, and, with reference to the relevant literature [20–22], 8 hydrochemical indicators were selected as identification indicators of the mine water inrush source discrimination model: total hardness, pH value, Na++K+, Ca2+, Mg2+, Cl−, SO42−, and HCO3−. From the groundwater monitoring data of the Xiandewang coal mine over the years, hydrochemical data of 14 water samples taken from 4 different aquifers were used as training samples. The aquifers are an Ordovician limestone karst fractured aquifer (I), a Permian sandstone fractured aquifer (II), a Carboniferous Daqing limestone karst fractured aquifer (III), and a Carboniferous Yeqing limestone karst fractured aquifer (IV). The sample data are given in Table 1.
4.1. Data Processing Based on PCA
The training sample data in Table 1 were first normalized: each normalized value equals the difference between the actual value and the minimum value divided by the difference between the maximum and minimum values. Table 2 presents the normalized data. The normalized data were then processed with PCA, and the correlation coefficient matrix of the hydrochemical constituents of the water sources is shown in Table 3. Table 3 shows that the 8 hydrochemical constituents are clearly correlated with each other. For example, the correlation coefficient between Ca2+ and total hardness is 0.983, and that between Na++K+ and Cl− is 0.919. The information carried by the sample indicators therefore overlaps significantly, which inevitably affects the accuracy of a water inrush source identification model built directly on these 8 indicators and may lead to misjudgments. For this reason, the training sample data were processed with PCA using the mathematical model described above, which yields the cumulative contribution rate diagram of all indicators in Figure 2. Figure 2 shows that the first 4 principal components contain 97.37% of the information in the raw data and can thus effectively summarize it.
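The normalization rule and the correlation check described above can be sketched as follows. The concentrations are hypothetical placeholders (Table 1 is not reproduced here), and the function names are invented for this example; the sketch only shows how a min-max normalized column and a Pearson correlation coefficient are computed.

```python
import math


def min_max_normalize(values):
    """(value - minimum) / (maximum - minimum) for every value in one indicator column."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("indicator is constant and cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two indicator columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Ca2+ and total hardness values for four samples (assumption, not study data).
calcium = [40.0, 60.0, 80.0, 120.0]
hardness = [150.0, 215.0, 265.0, 390.0]

calcium_norm = min_max_normalize(calcium)
hardness_norm = min_max_normalize(hardness)
print(calcium_norm)
print(pearson(calcium_norm, hardness_norm))
```

Min-max normalization is a positive linear rescaling, so the correlation coefficient of two indicators is the same before and after normalization; a value close to 1, as for these placeholder columns, signals the kind of information overlap that PCA is used to remove.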
To reconstruct the characteristics of the original data, the eight correlated indicators were recombined into a new set of four independent indicators (the principal components) that replace the original indicators. The PCA calculation yields the principal component matrix shown in Table 4.
Each value in the principal component matrix in Table 4 indicates the degree of influence of one of the four extracted principal components on one of the eight original indicators. Higher values indicate a closer relationship between the indicator and the principal component, and the values in the matrix are also the coefficients of the factor expressions of the indicators. The relationship between the extracted principal components and the original indicators is therefore given by linear expressions whose coefficients are the entries of Table 4.
4.2. Analysis of Source Identification
The four principal components obtained through PCA were selected as identification indicators, and the water sources were classified into 4 types according to their aquifers. On the assumption that the within-group covariance matrices are equal, the coefficients of the discriminant functions were calculated according to Equation (6) of the FDA principle described above. The coefficients are determined so as to maximize the distance between types and minimize the distance within each type. In this way, three discriminant functions of the four principal components were obtained.
The central values of the 3 discriminant functions for the 4 groups are presented in Table 5. Taking the first discriminant function as an example, its central values for the type I water source (Ordovician limestone water), type II water source (Permian sandstone water), type III water source (Carboniferous Daqing limestone water), and type IV water source (Carboniferous Yeqing limestone water) are 3.164, -2.261, 1.406, and -3.248, respectively. The three discriminant functions give the coordinates of each water sample in each dimension. A water sample to be identified is assigned to the water source group whose central value lies closest to it.
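The assignment rule can be illustrated with a short Python sketch. It uses only the central values of the first discriminant function quoted above, because the remaining entries of Table 5 are not reproduced here; the sample scores and the function names are hypothetical. The same function works unchanged if each center is a tuple of three discriminant scores.

```python
import math

# Central values of the first discriminant function for the four water-source types
# (quoted from the text); each center is a tuple so that further functions can be added.
CENTERS_FIRST_FUNCTION = {
    "I": (3.164,),
    "II": (-2.261,),
    "III": (1.406,),
    "IV": (-3.248,),
}


def distance(a, b):
    """Euclidean distance between two score tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_group(scores, centers):
    """Water-source type whose center is closest to the sample's discriminant scores."""
    return min(centers, key=lambda group: distance(scores, centers[group]))


# Hypothetical discriminant scores of four unknown samples (assumption, not study data).
unknown_scores = [(2.9,), (-2.0,), (1.0,), (-3.0,)]
assigned = [nearest_group(s, CENTERS_FIRST_FUNCTION) for s in unknown_scores]
print(assigned)
```

With a single discriminant function, types II and IV have centers that lie close together, which suggests why the additional discriminant functions are useful for separating them.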
4.3. Validity Check for Source Identification
To check the validity of the PCA and FDA-based water source identification model, all training samples in Table 1 were substituted into the established model one by one for back-substitution. The results are presented in Table 6. Water samples 4 and 11 are misidentified, giving an identification accuracy of 85.7% (12 of 14). With the conventional Fisher water source identification model, water samples 3, 4, and 12 are misidentified, and the identification accuracy is only 78.6% (11 of 14). The comparison between the 2 models indicates that the PCA and FDA-based water source identification model is more reliable and better meets the requirements of mine water inrush source identification.
To further check the accuracy of the model, the verified PCA and FDA-based water source identification model was used to identify 12 samples taken from the Xiandewang coal mine. The samples and the identification results are shown in Table 7. Except for water sample 4, which is misidentified, the identification results of all water samples are consistent with their actual sources, giving an identification accuracy of 91.7% (11 of 12). The model can therefore classify the water samples to be identified effectively. The conventional Fisher identification model was also applied to the same samples and gave an identification accuracy of 75%. In summary, the PCA and FDA-based water source identification model performs better in identifying water samples.
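The accuracy figures follow directly from the counts of misidentified samples, as the short Python check below shows. The function name is invented for this example, and the count of 3 misidentified samples for the conventional model on the 12 samples is inferred from the reported 75% rather than stated explicitly in the text.

```python
def identification_accuracy(total, misidentified):
    """Identification accuracy in percent, rounded to one decimal place."""
    return round((total - misidentified) / total * 100, 1)


# Training samples (14): PCA-FDA misidentifies samples 4 and 11; FDA alone misidentifies 3, 4, and 12.
pca_fda_training = identification_accuracy(14, 2)
fda_training = identification_accuracy(14, 3)

# Samples to be identified (12): PCA-FDA misidentifies sample 4;
# 3 misidentifications for FDA alone is inferred from the reported 75%.
pca_fda_unknown = identification_accuracy(12, 1)
fda_unknown = identification_accuracy(12, 3)

print(pca_fda_training, fda_training, pca_fda_unknown, fda_unknown)
```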
Based on the characteristics of mine water inrush sources, the PCA-FDA water source identification model and the conventional FDA water source identification model were both used to identify the water inrush sources of the 14 groups of training samples and the 12 groups of samples to be identified. For the training samples, the identification accuracies were 85.7% (PCA-FDA) and 78.6% (FDA); for the samples to be identified, they were 91.7% (PCA-FDA) and 75% (FDA). The results show that processing the data with the PCA method clearly increases the identification accuracy of mine water inrush sources compared with using the FDA method alone.
In the process of water source identification, the eight indicators in the original data are reduced to four principal components. The PCA method projects high-dimensional data into a low-dimensional space, reduces the dimensionality of the data, and characterizes the water chemistry of each water source accurately with fewer, mutually independent identification indicators. This greatly reduces the number of input factors when building the identification model and eliminates the effects of information overlap between identification indicators. The FDA method gathers samples of the same kind together, separates samples of different kinds, and thereby identifies the source of mine water inrush. Combining the two methods establishes an accurate water source identification model with minimal characteristic information, simplifies the data structure, shortens analysis time, and improves analysis precision, which makes it a more effective method for identifying water inrush sources.
In this case, a few misjudgments occurred when the water sample data were substituted back into the identification model, because the number of training samples was relatively small. To increase the prediction accuracy of the model, subsequent research should collect hydrochemical data in large quantities, establish a comprehensive database of mine hydrochemistry, and train the model on more samples so that its identification accuracy improves.
All data included in this study are available upon request by contact with the corresponding author.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This research was financially supported by the National Natural Science Foundation (41702270), Guizhou Science and Technology Department (Qian Ke He Ji Chu (2019)1413; Qian Ke He Zhi Cheng (2020)4Y048, (2020)4Y007, and (2017)5788), Department of Education of Guizhou Province ((2018)113).
Figure S1: Technical approach for identifying the source of mine water inrush. (Supplementary Materials)
J. Qian, L. Wang, L. Ma, Y. H. Lu, W. Zhao, and Y. Zhang, “Multivariate statistical analysis of water chemistry in evaluating groundwater geochemical evolution and aquifer connectivity near a large coal mine, Anhui, China,” Environmental Earth Sciences, vol. 75, no. 9, p. 747, 2016.
Z. L. Guan, Z. F. Jia, Z. Q. Zhao, and Q. Y. You, “Identification of inrush water recharge sources using hydrochemistry and stable isotopes: a case study of Mindong No. 1 coal mine in north-east Inner Mongolia, China,” Systems Science, vol. 128, no. 7, p. 200, 2019.
J. Zhou, X. Z. Shi, and H. Y. Wang, “Water-bursting source determination of mine based on distance discriminant analysis model,” Journal of China Coal Society, vol. 35, no. 2, pp. 278–282, 2010.
I. M. Farnham, K. J. Stetzenbach, A. K. Singh, and K. H. Johannesson, “Deciphering groundwater flow systems in Oasis Valley, Nevada, using trace element chemistry, multivariate statistics, and geographical information system,” Mathematical Geology, vol. 32, no. 8, pp. 943–968, 2000.
H. J. Chen, X. B. Li, A. H. Liu, and S. Q. Peng, “Identifying of mine water inrush sources by Fisher discriminant analysis method,” Journal of Central South University, vol. 40, no. 4, pp. 1114–1120, 2009.
S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K. R. Mullers, “Fisher discriminant analysis with kernels,” in Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468), pp. 41–48, Madison, WI, USA, August 1999.
P. C. Yan, M. R. Zhou, Q. M. Liu, R. Wang, and J. Liu, “Research on the source identification of mine water inrush based on LIF technology and PLS-DA algorithm,” Guang pu xue yu guang pu fen xi= Guang pu, vol. 36, no. 9, pp. 2858–2862, 2016.
M. Sugiyama, “Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis,” Journal of Machine Learning Research, vol. 8, pp. 1027–1061, 2007.
X. Liu, L. W. Chen, M. L. Lin, and S. D. Li, “Fisher recognition analysis for coal mining inrush water source under mining-induced disturbance and inversion of groundwater recharge relation,” Hydrogeology & Engineering Geology, vol. 40, no. 4, pp. 36–43, 2013.