#### Abstract

To address the low measurement accuracy and repeated calibration that affect coal mine water quality analysis, a noncontact measurement technique based on hyperspectral reflectance is proposed. In the laboratory, KCl, NaCl, pH buffer, NaHCO_{3}, and CaCl_{2} standard solutions were used to represent the characteristic ion information of mine water (Na^{+}, K^{+}, Ca^{2+}, Cl^{−}, HCO_{3}^{−}, and pH), and 2220 spectra were acquired. Savitzky–Golay convolution smoothing was used to smooth and denoise the original spectral data of each ion, after which the relationship between the spectrum and the concentration of each reagent was evident. The principal component regression method was used to build an inversion model for each ion content. On the prediction sets, the KCl model reached *R*^{2} = 0.907 and RPD = 2.7; the NaCl model, *R*^{2} = 0.957 and RPD = 3.1; the pH model, *R*^{2} = 0.785 and RPD = 2.1; the NaHCO_{3} model, *R*^{2} = 0.137 and RPD = 1.2; and the CaCl_{2} model, *R*^{2} = 0.622 and RPD = 1.7. The results show that the hyperspectral method performs well in the extraction of K^{+}, Cl^{−}, Na^{+}, Ca^{2+}, and pH, whereas HCO_{3}^{−} ions are difficult to extract.

#### 1. Introduction

Water hazard is one of the main threats to the safety of coal mine production, which causes serious loss of life and property. The prevention and control of water disaster in coal mines take water filling channel, water filling source, and water filling intensity as the main objects and take exploration, prevention, blocking, dredging, drainage, interception, and monitoring as the main means. Water samples are collected after water inrush or water gushing occurs in a mine, and the source of the water inrush or water gushing is judged by using the chemical composition of the water. It is a method widely used by technicians of geological survey and water control engineering in coal mines.

Outside China, the rock mass structure of coal seam floors and water prevention and drainage technology have been studied in depth, and considerable experience has been accumulated on the mechanism of water inrush and the identification of water hazards. In the book Hydrogeochemistry, Clevers et al. systematically discuss the application of groundwater pollution and chemical evaluation in hydrochemical analysis from the perspective of hydrogeochemistry [1–4]. Clevers et al. applied the 3D edge detection seismic attribute method [1–4] and used hydrological observation and a tracer test to assess the effect of a tunnel drainage system [1–4]. However, little research has addressed the application of mine water chemistry to the identification of mine water inrush sources.

In China, the main method of discriminating the source of water inrush in coal mines is the conventional hydrochemical discrimination method. It measures the eight most widely distributed ions in groundwater (Ca^{2+}, Mg^{2+}, K^{+}, Na^{+}, CO_{3}^{2−}, HCO_{3}^{−}, SO_{4}^{2−}, and Cl^{−}), whose concentrations together account for more than 90% of the total ion concentration in groundwater, as well as the characteristic ion ratio, hardness, temperature, TDS index, and pH value [5–10]. The mine water chemical data of Taoyuan Coal Mine were processed by using the Piper trilinear diagram [5–7]. The hydrochemical characteristics of each aquifer in the Xuzhou mining area were introduced [8, 9]. Conventional hydrochemical methods were used to carry out hydrogeochemical analysis of underground aquifers in a mine in Xuzhou [10–12]. The conventional hydrochemistry of four water-bearing subsystems in Yaoqiao Mine, Xuzhou, was studied [13–16]. A systematic study was made of the hydrochemical characteristics of groundwater in the Ordovician karst aquifer in the middle part of the Taihang Mountains [17–20]. The Chongqing Research Institute of China Coal Science and Technology Group, Beijing Huaan Auto, and Wuhan Dida Huarui have carried out research on water quality analysis technology and equipment and have applied it in various coal mine groups [21–29].

However, current underground ion-electrode monitoring still suffers from inaccurate measurement and the need for repeated calibration during use, so it cannot meet the needs of online water source identification. A new type of online water quality analysis sensor is urgently needed.

#### 2. Hyperspectral Experimental Determination of Common Ions in Mine Water

The purpose of the experiment is to find the hyperspectral characteristic bands of liquids related to coal mines. The spectral acquisition equipment is a self-made spectral probe, and the measurement process consists of four parts: spectrometer calibration, standard solution preparation, spectral measurement, and accuracy evaluation [30].

Five reagents, NaCl, KCl, CaCl_{2}, NaHCO_{3}, and pH buffer, were measured to represent Na^{+}, K^{+}, Ca^{2+}, Cl^{−}, HCO_{3}^{−}, and pH; the potassium ion and the chloride ion are both represented by one set of data from the KCl standard solution (see Table 1 for details) [31–33]. Before measurement, the mother liquor was diluted with deionized water. According to the test requirements, the dilution levels for the sodium, potassium, chloride, and calcium ions were 10, 50, 100, 500, 1000, and 10000 mg/L; the carbonate dilution levels were 0.44, 2.2, 4.4, 22, 44, and 440 mg/L; and the pH levels were 4, 6.86, and 9.18. Eight kinds of targets were measured in the order KCl, NaCl, pH, NaHCO_{3}, CaCl_{2}, pure water, empty barrel, and green plants, giving 2220 hyperspectral data in total. Figure 1 shows the number of spectra for each standard solution.

#### 3. Ion Hyperspectral Data Preprocessing and Sensitive Band Selection

Spectral quality evaluation was carried out on all acquired spectral data, and the qualified spectra were selected [34–37]. Because of the influence of the external environment, the spectral curves contain many “burr” noises, so the noise must be reduced by smoothing and filtering. In this study, Savitzky–Golay convolution smoothing was used to smooth and denoise the original spectral data of each ion. The value of the spectrum after Savitzky–Golay smoothing at wavelength *i* is

$$x_{i}^{*} = \frac{1}{N}\sum_{j=-m}^{m} c_{j}\, x_{i+j}$$

In the formula, $x_{i}^{*}$ is the smoothed value at wavelength *i*, $x_{i+j}$ is the value before smoothing, *m* is the number of smoothing window points on each side of the wavelength, *N* is the normalization factor, and $c_{j}$ is the smoothing coefficient, which can be obtained by polynomial fitting.
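The smoothing step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the processing code used in the study: the window half-width, polynomial order, wavelength grid, and synthetic reflectance values are all hypothetical, and the edge handling (reflection padding) is one choice among several. The smoothing weights are obtained by least-squares polynomial fitting, as described above.

```python
import numpy as np


def savgol_coefficients(m: int, poly_order: int) -> np.ndarray:
    """Return the 2m+1 normalized smoothing weights (coefficient / normalization)."""
    offsets = np.arange(-m, m + 1)
    # Column k holds offsets**k, so A @ a evaluates the polynomial at each offset.
    A = np.vander(offsets, poly_order + 1, increasing=True).astype(float)
    # The least-squares fit is a = pinv(A) @ x; the fitted value at offset 0
    # equals a[0], so the weights are the first row of pinv(A).
    return np.linalg.pinv(A)[0]


def savgol_smooth(x, m: int = 2, poly_order: int = 2) -> np.ndarray:
    """Smooth a 1-D spectrum with a window of half-width m."""
    x = np.asarray(x, dtype=float)
    weights = savgol_coefficients(m, poly_order)
    # Assumption: reflect the spectrum at both ends so the output keeps its length.
    padded = np.pad(x, m, mode="reflect")
    # Reversing the kernel turns np.convolve into a sliding weighted sum.
    return np.convolve(padded, weights[::-1], mode="valid")


# Hypothetical noisy reflectance curve (values are synthetic, not measured data).
rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 1000.0, 61)
clean = 0.05 + 0.02 * np.sin(wavelengths / 100.0)
noisy = clean + rng.normal(scale=0.002, size=wavelengths.size)
smoothed = savgol_smooth(noisy, m=3, poly_order=2)
```

For a five-point window and a quadratic polynomial, the weights returned by `savgol_coefficients(2, 2)` are the classical values (−3, 12, 17, 12, −3) divided by 35, so 35 plays the role of the normalization factor in that case.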

After smoothing and denoising, the relationship between the spectrum and the concentration of each reagent is evident. Compared with the spectral data of pure water, the concentration gradients of KCl, NaCl, pH, NaHCO_{3}, and CaCl_{2} show obvious sensitive bands and regular trends. The higher the concentration of KCl, NaCl, and CaCl_{2}, the lower the overall reflectivity, which is presumably caused by Cl^{−}. The pH data show that the reflectivity of pure water and the acidic liquid is intermediate, that of the neutral liquid is low, and that of the alkaline liquid is the highest. Overall, the higher the concentration of NaHCO_{3}, the higher the reflectivity. Figure 2 compares the KCl, NaCl, and pH spectral data before and after denoising, and Figure 3 compares the NaHCO_{3}, CaCl_{2}, and pure water spectral data before and after denoising.

#### 4. Establishment of the Quantitative Inversion Prediction Model for Ion Hyperspectral Data

Mine water is a complex system containing various chemical ions. In this study, the principal component regression (PCR) method, which is based on principal component analysis (PCA), is used to establish the quantitative inversion model [38–46]. PCR is a regression analysis method for data with multicollinearity. Its principle is that, after the multicollinearity in the regression model is eliminated by principal component analysis, the principal component variables are used as independent variables for regression analysis, and the original variables are then substituted back into the new model according to the score coefficient matrix.

The basic steps of PCR are as follows:

(1) Principal component analysis is applied to the independent variable data to obtain the principal components, and a subset of the principal components is selected according to a standard selection criterion.

(2) The principal components obtained in step (1) are used as new independent variables, and an estimated regression coefficient vector is obtained through linear regression analysis (its dimension equals the number of selected principal components).

(3) The regression coefficient vector is transformed back to the scale of the actual independent variables by using the selected PCA loadings (the eigenvectors corresponding to the selected principal components), which gives the final PCR estimator of the regression coefficients (its dimension equals the total number of independent variables).
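The three steps can be sketched with NumPy as follows. This is a minimal illustration, not the software used in the study: the function names `fit_pcr` and `predict_pcr`, the use of a singular value decomposition to obtain the loadings, and the synthetic spectra and concentrations are all assumptions made for the example.

```python
import numpy as np


def fit_pcr(X, y, n_components: int):
    """Fit a principal component regression; return (beta, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean

    # Step 1: PCA of the centered spectra; the rows of Vt are the loadings,
    # ordered from the largest to the smallest explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # shape (n_bands, n_components)
    scores = Xc @ loadings                  # shape (n_samples, n_components)

    # Step 2: ordinary least squares of the response on the scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)

    # Step 3: map the coefficients back to the original bands.
    beta = loadings @ gamma                 # shape (n_bands,)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


def predict_pcr(X, beta, intercept):
    """Predict the response from spectra with fitted PCR coefficients."""
    return np.asarray(X, dtype=float) @ beta + intercept


# Synthetic example: 40 "spectra" of 30 bands driven by 3 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 3))
spectra = latent @ rng.normal(size=(3, 30)) + rng.normal(scale=0.01, size=(40, 30))
concentration = latent @ np.array([2.0, -1.0, 0.5]) + 10.0

beta, intercept = fit_pcr(spectra, concentration, n_components=3)
predicted = predict_pcr(spectra, beta, intercept)
```

Because the regression in step (2) is done on uncorrelated scores, the coefficient vector stays well conditioned even when neighboring bands are almost collinear, which is the usual motivation for PCR on spectral data.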

The model was evaluated by cross-validation, with the coefficient of determination (*R*^{2}), the root mean squared error (RMSE), and the relative percent deviation (RPD) as evaluation indexes. The closer the *R*^{2} of the validation set is to 1, the lower the RMSE, and the further the RPD exceeds 2, the more stable and accurate the model. When *R*^{2} < 0.50 and RPD < 1.40, the model estimates the samples poorly and is not usable; when 0.50 < *R*^{2} < 0.75 and 1.40 < RPD < 2.00, the estimation ability is improved, but only rough estimates can be made, and the model is usable; when *R*^{2} > 0.75 and RPD > 2.00, the model accuracy is high and the model is good. The calculation formulas are

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}, \qquad \mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}}$$

In the formulas, $y_{i}$ represents the measured value of sample *i*, $\hat{y}_{i}$ represents the predicted value of sample *i*, $\bar{y}$ represents the mean of all samples, *n* is the number of samples, and SD is the standard deviation of the measured values of the validation set samples.
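A small sketch of these evaluation indexes is given below, using only the Python standard library. Two points are assumptions rather than statements from the text: the standard deviation is computed with the sample (*n* − 1) denominator, and mixed cases that the thresholds above do not cover (for example, a high *R*^{2} with a low RPD) are assigned to the lower grade. The function names are hypothetical.

```python
import math


def evaluate(measured, predicted):
    """Return (R2, RMSE, RPD) for paired measured and predicted values."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Assumption: sample standard deviation (n - 1) of the measured values.
    sd = math.sqrt(ss_tot / (n - 1))
    return r2, rmse, sd / rmse


def grade(r2: float, rpd: float) -> str:
    """Classify a model with the thresholds quoted in the text."""
    if r2 > 0.75 and rpd > 2.00:
        return "good"
    # Assumption: mixed cases fall through to the lower applicable grade.
    if r2 > 0.50 and rpd > 1.40:
        return "rough estimate"
    return "not usable"


# Grades for the prediction-set values reported in this paper.
reported = {
    "KCl": (0.907, 2.7),
    "NaCl": (0.957, 3.1),
    "pH": (0.785, 2.1),
    "NaHCO3": (0.137, 1.2),
    "CaCl2": (0.622, 1.7),
}
grades = {name: grade(r2, rpd) for name, (r2, rpd) in reported.items()}
```

Applied to the reported values, this grading places KCl, NaCl, and pH in the "good" class, CaCl_{2} in the "rough estimate" class, and NaHCO_{3} in the "not usable" class.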

##### 4.1. KCl Content Spectral Prediction Modeling

A total of 382 standard solution spectra were selected, the seven largest principal components were retained, and the weights were set equally [47–51]. Cross-validation (CV) prediction was used, and the principal component model was established with the proportion of the validation set to the modeling set at 0.70. The first three principal components represent more than 80% of the content information. In the modeling set, *R*^{2} reaches 0.908; in the prediction set, *R*^{2} reaches 0.907 and RPD reaches 2.7. An analysis of the importance of all sample points in the modeling shows that the samples collected in the middle section play a greater role. Figure 4 shows the KCl principal component results, Figure 5 compares the measured and predicted KCl sets, and Figure 6 shows the role of sample points in the calculation of KCl content.
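The share of information carried by the leading principal components, as quoted for the first three components above, can be checked with a cumulative explained-variance calculation. The sketch below is illustrative only: the helper names and the small toy matrix are hypothetical, and the explained variance is taken from the singular values of the mean-centered data.

```python
import numpy as np


def cumulative_explained_variance(X) -> np.ndarray:
    """Cumulative fraction of variance explained by successive components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    return np.cumsum(ratio)


def components_needed(X, threshold: float) -> int:
    """Smallest number of components whose cumulative share reaches threshold."""
    cumulative = cumulative_explained_variance(X)
    # searchsorted finds the first index whose cumulative share is >= threshold.
    index = int(np.searchsorted(cumulative, threshold))
    return min(index + 1, cumulative.size)


# Toy data: the first column carries 90% of the variance, the second 10%.
toy = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
share = cumulative_explained_variance(toy)
```

With real spectra, `cumulative_explained_variance(spectra)[2]` would give the share represented by the first three components, which is the quantity reported in each of the modeling subsections.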

##### 4.2. NaCl Content Spectral Prediction Modeling

A total of 399 standard solution spectra were selected, the seven largest principal components were retained, and the weights were set equally [47–51]. Cross-validation (CV) prediction was used, and the principal component model was established with the proportion of the validation set to the modeling set at 0.70. The first three principal components represent more than 90% of the content information. In the modeling set, *R*^{2} reaches 0.958; in the prediction set, *R*^{2} reaches 0.957 and RPD reaches 3.1. An analysis of the importance of all sample points in the modeling shows that the samples collected in the middle section play a greater role. Figure 7 shows the NaCl principal component results, Figure 8 compares the measured and predicted NaCl sets, and Figure 9 shows the role of sample points in the calculation of NaCl content.

##### 4.3. pH Content Spectral Prediction Modeling

A total of 240 standard solution spectra were selected, the seven largest principal components were retained, and the weights were set equally [47–51]. Cross-validation (CV) prediction was used, and the principal component model was established with the proportion of the validation set to the modeling set at 0.70. The first three principal components represent more than 85% of the content information. In the modeling set, *R*^{2} reaches 0.791; in the prediction set, *R*^{2} reaches 0.785 and RPD reaches 2.1. An analysis of the importance of all sample points in the modeling shows that the samples collected in the front section play a greater role. Figure 10 shows the pH principal component results, Figure 11 compares the measured and predicted pH sets, and Figure 12 shows the role of sample points in the calculation of pH.

##### 4.4. NaHCO_{3} Content Spectral Prediction Modeling

A total of 404 standard solution spectra were selected, the seven largest principal components were retained, and the weights were set equally [47–51]. Cross-validation (CV) prediction was used, and the principal component model was established with the proportion of the validation set to the modeling set at 0.70. The first three principal components represent more than 75% of the content information. In the modeling set, *R*^{2} reaches only 0.162; in the prediction set, *R*^{2} reaches only 0.137 and RPD is 1.2. An analysis of the importance of all sample points in the modeling shows that the samples collected in the middle and back sections play a greater role. Figure 13 shows the NaHCO_{3} principal component results, Figure 14 compares the measured and predicted NaHCO_{3} sets, and Figure 15 shows the role of sample points in the calculation of NaHCO_{3} content.

##### 4.5. CaCl_{2} Content Spectral Prediction Modeling

A total of 417 standard solution spectra were selected, the seven largest principal components were retained, and the weights were set equally [47–51]. Cross-validation (CV) prediction was used, and the principal component model was established with the proportion of the validation set to the modeling set at 0.70. The first three principal components represent more than 55% of the content information. In the modeling set, *R*^{2} reaches 0.630; in the prediction set, *R*^{2} reaches 0.622 and RPD reaches 1.7. An analysis of the importance of all sample points in the modeling shows that the samples collected in the middle section play a greater role. Figure 16 shows the CaCl_{2} principal component results, Figure 17 compares the measured and predicted CaCl_{2} sets, Figure 18 shows the role of sample points in the calculation of CaCl_{2} content, and Figure 19 compares the extraction accuracy of the various ions.

#### 5. Conclusion

Through spectral analysis of the characteristic ions of mine water, the principal component regression method was used to build quantitative inversion models for the various ions. The five standard solutions (KCl, NaCl, pH buffer, NaHCO_{3}, and CaCl_{2}) represent six characteristic indicators (KCl covers both K^{+} and Cl^{−}). The extraction precision (*R*^{2}) of KCl and NaCl is higher than 0.9, followed by pH and CaCl_{2}, with a precision above 0.6. The extraction precision of HCO_{3}^{−} is the lowest, only 0.162 in the modeling set and 0.137 in the prediction set. The results show that the hyperspectral method performs well in the extraction of K^{+}, Cl^{−}, Na^{+}, Ca^{2+}, and pH, whereas HCO_{3}^{−} ions are difficult to extract.

#### Data Availability

The dataset can be accessed from the corresponding author upon request.

#### Conflicts of Interest

The author declares no conflicts of interest.

#### Acknowledgments

The project was supported by the “Thirteenth Five-Year Plan,” the Key National R&D Program (Online Dynamic Detection Technology and Equipment of Radio Wave Perspective in Mining Face, 2018YFC0807805).