Abstract

Efficient, comprehensive, continuous, and accurate monitoring of organic pollution in lakes can provide a reliable basis for water quality assessment and water pollution prevention. This paper takes Dianchi Lake as the research object and targets four important water quality indexes: permanganate index (COD), dissolved oxygen (DO), hydrogen ion concentration (pH), and ammonia nitrogen (NH3-N). Based on a correlation analysis of Landsat 8 data and measured water quality data, inversion models are constructed to obtain the spatial distribution of the four indexes. The results show that the relative errors of the neural network and multiple regression models are 9.68% and 17.48%, respectively, for COD; 3.81% and 3.36% for DO; 1.25% and 1.58% for pH; and 15.39% and 24.97% for NH3-N. In 2018, COD in the study area ranges from 6.2 mg/L to 9.8 mg/L, DO from 5.81 mg/L to 9.05 mg/L, pH from 5.9 to 8.54, and NH3-N from 0.22 mg/L to 0.41 mg/L. The inversion results for overall pollutant concentration in the study area are consistent with the actual situation, with only slight deviations in some areas. The two inversion models can effectively monitor the water quality of Dianchi Lake and its spatial distribution. The remote sensing inversion of water quality merits further research and wider application.

1. Introduction

At present, industrial production, livestock and poultry breeding, and daily human life produce large quantities of organic pollutants, which seriously pollute surrounding water bodies and endanger aquatic organisms. Through gradual amplification along the food chain, these pollutants also affect human health [1–4]. Timely and accurate knowledge of the degree of organic pollution is therefore of great significance for water quality evaluation and inland water pollution prevention.

Traditional water resources monitoring relies mainly on manual sampling and testing at fixed points and along profiles. This approach involves a large workload, is costly, and is time-consuming and laborious. With the development of earth observation technology, remote sensing inversion has gradually become an important means of monitoring water resources and the water environment. Its basic principle is to use appropriate remote sensing image bands to build an inversion model for the qualitative or quantitative evaluation of water quality. Because remote sensing covers large areas, the method is mostly used to monitor water quality in large water bodies.

With the development of remote sensing technology, remote sensing has been increasingly used to monitor water quality. The types of pollutants that can be retrieved from satellite images have increased greatly, and retrieval accuracy has improved continuously. The retrieval methods for water quality parameters mainly include empirical [5], semiempirical [6], and analytical [7] methods. Many algorithms have been applied to the remote sensing inversion of water quality parameters [8, 9]. Based on research on the inversion of surface water environmental parameters from remote sensing and measured data, inversion models such as statistical regression and neural networks have been established [10]. Deshpande et al. took the Jui dam in Jalna District, Maharashtra, as the study area, considered pH, total soluble solids, total hardness, total alkalinity, calcium, magnesium, sodium, potassium, chlorine, sulfate, nitrate, and fluoride, and used the weighted arithmetic water quality index method to determine the surface water quality index [11]. Artificial neural networks [12–16] are capable of distributed association, self-learning, and self-organization. They are often used to mine data relationships and build prediction models, and in recent years they have been widely used in water quality parameter inversion and water quality evaluation. Scholars at home and abroad have made progress in applying neural networks to the water environment. Li et al. combined particle swarm optimization (PSO), chaos theory, an adaptive strategy, and a back propagation artificial neural network (BP ANN) to propose a new water quality evaluation model for the Weihe River [17]. Woo Kim Young et al. used machine learning (ML) algorithms to extract Chla from MSI; the model interpretation and the spatial variation of Chla within and between lakes confirmed the effectiveness of LGBM in retrieving Chla from MSI for lakes and estuaries [18]. Chen Jinyue et al. used a machine learning method combining a genetic algorithm and an artificial neural network (GA ANN) to retrieve Chla concentration [19]. Chen Zhu et al. analyzed the correlation between MODIS remote sensing data and measured chlorophyll a concentration [20]. Based on multitemporal remote sensing images and field observation data, Yanhu et al. used a back propagation (BP) neural network to establish an inversion model for the water quality parameters of inland reservoirs [21]. Huang Jingjing et al. used a machine learning algorithm to establish a remote sensing model for Shenzhen Bay [22].

At present, research on chlorophyll a concentration, water temperature, suspended solids, salinity, and similar parameters is relatively mature, whereas research on other water quality indicators is relatively scarce [23–25]. Chemical oxygen demand (COD) is the amount of oxidant required to oxidize the reducing substances in a water sample when it is measured by chemical methods [26]. In the research and management of river pollution caused by domestic and industrial wastewater, COD is one of the important indicators that can be measured quickly. Ammonia nitrogen is a nutrient in water and one of the main oxygen-consuming pollutants in rivers. Excessive ammonia nitrogen leads to eutrophication of the water body, which is harmful to the healthy growth of aquatic organisms. Water quality parameters that do not directly affect water color, such as ammonia nitrogen, hydrogen ion concentration, permanganate index, and dissolved oxygen, are also important indicators of water quality. High concentrations of nitrogen and phosphorus directly cause eutrophication, which is currently the main problem in most water environments.

In this study, multiple regression and artificial neural network methods are used to establish remote sensing inversion models for four water quality parameters: permanganate index (COD), dissolved oxygen (DO), hydrogen ion concentration (pH), and ammonia nitrogen (NH3-N). The inversion accuracy of the multiple regression model and the neural network model is compared using measures such as relative error and correlation coefficient. Water quality inversion by remote sensing offers strong information synthesis ability and fast acquisition, saves time and labor, and can better reflect the spatial distribution of water quality. The artificial neural network is a large-scale parallel nonlinear dynamic system. Further in-depth analysis of the inversion of water quality parameters and the establishment of a rigorous, scientific, and efficient water quality monitoring system for Dianchi Lake are required as soon as possible, so as to help address increasingly frequent water resource shortages, water environment pollution, and flood disasters.

2. Study Area and Data

2.1. Overview of the Study Area

Kunming, Yunnan Province, is located at 102°10′–103°40′ E and 24°23′–26°22′ N, on the southwest border of China and in the middle of the Yunnan-Guizhou Plateau. The city center is 1895 m above sea level, surrounded by mountains on three sides and bordered by Dianchi Lake to the south. The Dianchi Lake Reserve studied here is located in the southwest of Kunming (Figure 1). Dianchi is the largest freshwater lake in Yunnan Province; it lies in the plain area of the central Yunnan basin, and 34 rivers flow into it. The average elevation of the lake surface is about 1886 m, the average water depth is 5.3 m, the shoreline is about 163 km long, the lake area is 309.5 km2 (at a water level of 1887.4 m), and the water storage capacity is 1.56 billion cubic meters. In the north of the lake, a ridge running east to west divides the lake into the outer lake, or "open sea" (298.7 km2), and Caohai (10.8 km2). Dianchi Lake is an earthquake fault subsidence lake at low latitude and high altitude, with a stable temperature and a long water exchange cycle. Rainfall in the Dianchi Lake Basin is divided into a rainy season from May to October and a dry season from November to April. The physical and chemical environment of the lake water changes continuously as pollution develops, and the ecological environment quality of the Dianchi Lake Basin is only at a medium level.

2.2. Data Collection and Processing

On the premise of ensuring data quality, this paper uses Landsat 8 images and measured data from monitoring points on Dianchi Lake. The remote sensing images are from the United States Geological Survey (USGS) (https://glovis.usgs.gov/). The measured data are from the China national environmental monitoring station. Ten Landsat 8 images with little or no cloud from 2013–2018 were selected.

To reflect surface information objectively and reduce errors caused by the data, the remote sensing images are preprocessed by radiometric calibration, atmospheric correction, and image clipping. (1) Radiometric calibration converts the gray value (digital number) recorded by the sensor into the radiance at the entrance pupil of the sensor. (2) Atmospheric correction obtains the required correction parameters from the image header file, derives the reflectance through the atmospheric radiative transfer equation, and then applies spectral smoothing to remove the noise left by the correction. (3) Image clipping removes areas outside the study area. When the reflectance data at the sampling points are acquired, factors beyond control such as weather and climate cause small variations in the data, which must be corrected by calculation before the data can be used.
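The radiometric calibration step can be illustrated with a minimal sketch. It assumes the linear rescaling that USGS documents for Landsat 8 Level-1 products, in which a multiplicative and an additive coefficient from the scene metadata convert a digital number (DN) to radiance or to top-of-atmosphere (TOA) reflectance. The function names, the DN values, and the sun elevation below are illustrative assumptions, not values from this study, and the sketch does not perform the atmospheric correction described above.

```python
import math


def dn_to_radiance(dn, radiance_mult, radiance_add):
    """Convert a digital number to at-sensor spectral radiance.

    radiance_mult and radiance_add are the per-band rescaling
    coefficients read from the scene metadata file.
    """
    return radiance_mult * dn + radiance_add


def dn_to_toa_reflectance(dn, reflectance_mult, reflectance_add, sun_elevation_deg):
    """Convert a digital number to TOA reflectance, corrected for sun angle."""
    uncorrected = reflectance_mult * dn + reflectance_add
    return uncorrected / math.sin(math.radians(sun_elevation_deg))


# Illustrative inputs only: the coefficients and sun elevation must be read
# from the metadata of the actual scene being processed.
REFLECTANCE_MULT = 2.0e-5
REFLECTANCE_ADD = -0.1
SUN_ELEVATION_DEG = 60.0

sample_dns = [7500, 8200, 9100]  # hypothetical pixel values
reflectance = [
    dn_to_toa_reflectance(dn, REFLECTANCE_MULT, REFLECTANCE_ADD, SUN_ELEVATION_DEG)
    for dn in sample_dns
]
```

In practice the same conversion would be applied to whole image arrays, and the atmospherically corrected reflectance, not the TOA value, would feed the inversion models.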

The measured data come from the buoy stations along the shore of Dianchi Lake and from subsequent laboratory processing. The distribution of the buoy stations is shown in Figure 1. The stations are evenly distributed over the Dianchi Lake area, which ensures the completeness and scientific soundness of water environment monitoring in the basin. There are 36 groups of measured data (Table 1). After abnormal points are removed, 24 groups are used to establish the inversion models and the other 8 groups are used to test model accuracy. Inversion models for permanganate index (COD), dissolved oxygen (DO), hydrogen ion concentration (pH), and ammonia nitrogen (NH3-N) were constructed by multiple regression and neural network, and the accuracy of the two remote sensing inversion models is tested and compared using the test sample points.

The purpose of water quality remote sensing monitoring is to work backward from the total radiation recorded by the satellite sensor to the water quality concentration information contained in the light reflected by the water. Making full use of the water body information captured by remote sensing satellites therefore helps improve the accuracy of water quality inversion.

3. Establishment of Remote Sensing Inversion Model for Water Quality Parameters

3.1. Multiple Regression Model

A multiple regression model is a statistical analysis method for studying the linear or nonlinear relationship between a dependent variable and one or more independent variables. After the dependent and independent variables are specified, the regression model is established, its parameters are estimated from the measured data, and its fit to the measured data is evaluated. Predictions can then be made from the independent variables.

The multiple regression models in this study capture a purely empirical correlation; no physical relationship is assumed between the Landsat bands and the water quality variables. Therefore, although the procedure can be extended, the results are limited in scope and apply only to Landsat imagery and the lake in this study.

3.2. Artificial Neural Network Model

Neural network is short for artificial neural network (ANN). An ANN uses physically realizable devices or computers to simulate some structures and functions of biological neural networks and is applied in engineering fields. Its aim is not to replicate the neural cell network of an organism completely, but to extract the useful parts in order to handle problems that current computers or other systems cannot solve well, such as learning, control, recognition, and expert systems.

Using the measured water quality parameters and Landsat 8 remote sensing image data, a three-layer neural network model with an input layer, a hidden layer, and an output layer is constructed. The input layer receives the single-band and band-combination remote sensing reflectance data that are highly correlated with the measured values, and the output layer gives the water quality parameter concentration. Because the number of hidden-layer neurons directly affects model accuracy, it is set through experimental comparison, and the network with the most suitable number is selected. After training, the results of each group become stable and meet the error requirement, and the error performance curve, training state, and linear regression fit are all satisfactory. The fit for each part of the sample is shown in Figure 2.
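A three-layer network of this kind can be sketched in a few lines. The example below is a minimal illustration rather than the configuration used in this study: the tanh hidden activation, the linear output, plain stochastic gradient descent, the learning rate, the seven hidden neurons, and the toy input and target values are all assumptions made for the sketch.

```python
import math
import random


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights for an input-hidden-output network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, scalar output) for one input vector."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def mean_squared_error(net, inputs, targets):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)


def train(net, inputs, targets, epochs=2000, lr=0.05):
    """Back propagation with one weight update per training sample."""
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            hidden, out = forward(net, x)
            err = out - t  # gradient of 0.5 * (out - t)^2 with respect to out
            for j, h in enumerate(hidden):
                # Gradient at hidden unit j, computed before w2[j] is updated.
                grad_hidden = err * net["w2"][j] * (1.0 - h * h)
                net["w2"][j] -= lr * err * h
                net["b1"][j] -= lr * grad_hidden
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * grad_hidden * xi
            net["b2"] -= lr * err
    return net


# Toy data standing in for (band factors -> concentration); not measured values.
toy_inputs = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.5], [0.8, 0.3], [0.6, 0.9], [0.9, 0.7]]
toy_targets = [x0 + 0.5 * x1 for x0, x1 in toy_inputs]

network = init_network(n_inputs=2, n_hidden=7)
error_before = mean_squared_error(network, toy_inputs, toy_targets)
train(network, toy_inputs, toy_targets)
error_after = mean_squared_error(network, toy_inputs, toy_targets)
```

Repeating the training for several values of `n_hidden` and comparing the error on held-out samples is one simple way to carry out the experimental comparison of hidden-layer sizes.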

4. Results and Analysis

4.1. Multiple Regression Model Analysis

The correlation between the measured data and the single bands and band combinations of the remote sensing images is analyzed [27], and the data groups with good correlation are selected to build statistical regression models [28]. The fitting effects of linear, quadratic polynomial, cubic polynomial, logarithmic, and other functions are compared. For each factor, the functional relations with large R2 are listed, and the fitting models based on a single band or band combination with high R2 are retained. These models are then tested with the 8 test samples. Finally, the best multiple regression model is selected according to R2 and error size.

The correlations between the image bands and permanganate index (COD), dissolved oxygen (DO), hydrogen ion concentration (pH), and ammonia nitrogen (NH3-N) are analyzed. The bands highly correlated with COD concentration are B5, B2, and B4, and the band combinations are B2 + B5, B4 + B5, B5/B2, (B2 − B5)/(B2 + B5), and B5/(B2 + B5) (Table 2). The bands highly correlated with DO concentration are B7 and B1, and the band combinations are B1/B7, B1/(B1 + B7), (B1 − B7)/(B1 + B7), B1 − B7, B1 + B7, and B1 × B7 (Table 3). The bands highly correlated with pH are B3 and B4, and the band combinations are B3 + B4, B3 − B4, B3 × B4, and B3/(B3 + B4) (Table 4). The bands highly correlated with NH3-N concentration are B6 and B7, and the band combinations are B6 + B7 and B6 × B7 (Table 5). These single bands and band combinations, which are highly correlated with the water quality parameters, can be used as factors for retrieving the parameters and establishing multiple regression models.
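The factor screening above can be sketched as follows: build the candidate combinations for a pair of bands, compute the Pearson correlation coefficient of each with the measured values, and rank them by absolute correlation. The function names and the reflectance and concentration values are hypothetical and serve only to show the mechanics.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def band_combinations(a, b):
    """Candidate factors for two band reflectances (assumed positive)."""
    return {
        "a+b": a + b,
        "a-b": a - b,
        "a*b": a * b,
        "a/b": a / b,
        "(a-b)/(a+b)": (a - b) / (a + b),
        "a/(a+b)": a / (a + b),
    }


def rank_factors(band_a, band_b, measured):
    """Return (factor name, r) pairs sorted by decreasing |r|."""
    columns = {}
    for a, b in zip(band_a, band_b):
        for name, value in band_combinations(a, b).items():
            columns.setdefault(name, []).append(value)
    scores = {name: pearson_r(values, measured) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical reflectances at five stations and hypothetical concentrations.
band_2 = [0.10, 0.12, 0.15, 0.11, 0.14]
band_5 = [0.05, 0.09, 0.06, 0.08, 0.07]
measured_cod = [6.5, 8.1, 8.0, 7.4, 7.9]

ranking = rank_factors(band_2, band_5, measured_cod)
```

The top entries of `ranking` would then be carried forward as regression factors or as neural network inputs.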

A multiple regression model is established between each selected band combination factor and the corresponding water quality parameter. With B2 + B5 as the factor for retrieving COD concentration, the linear model has the best R2, reaching 0.828. The best model of each factor is tested with the data of the eight test stations. Finally, according to R, R2, and error, the linear model with B2 + B5 as the factor is selected as the statistical regression model (Table 6). With (B1 − B7)/(B1 + B7) as the factor for retrieving DO concentration, the cubic polynomial model has the best R2, reaching 0.820. After comparison and analysis, the linear model with (B1 − B7)/(B1 + B7) as the factor is selected as the statistical regression model (Table 7). With B3 + B4 as the factor for retrieving pH, the linear model has the best R2, reaching 0.654. After comparison and analysis, the linear model with B3 + B4 as the factor is selected as the multiple regression model (Table 8). With B6 + B7 as the factor for retrieving NH3-N concentration, the cubic polynomial model has the best R2, reaching 0.672. After comparison and analysis, the cubic polynomial model with B6 + B7 as the factor is selected as the multiple regression model (Table 9).
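The comparison of functional forms for a single factor can be sketched as below. This is a minimal illustration, assuming ordinary least squares fitted through the normal equations; the function names and the toy data are hypothetical, and a numerical library would normally replace the hand-written solver. Only the training R2 is computed here, whereas the selection described above also uses the error on the test stations.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ..."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def compare_models(xs, ys):
    """Return {model name: (coefficients, training R^2)} for each candidate form."""
    results = {}
    for name, degree in (("linear", 1), ("quadratic", 2), ("cubic", 3)):
        coeffs = fit_polynomial(xs, ys, degree)
        results[name] = (coeffs, r_squared(ys, [predict(coeffs, x) for x in xs]))
    log_xs = [math.log(x) for x in xs]  # logarithmic form requires x > 0
    coeffs = fit_polynomial(log_xs, ys, 1)
    results["logarithmic"] = (coeffs, r_squared(ys, [predict(coeffs, lx) for lx in log_xs]))
    return results


# Hypothetical factor values (e.g. a band sum) and concentrations.
factor = [0.12, 0.15, 0.18, 0.21, 0.25, 0.30]
concentration = [6.4, 7.0, 7.5, 8.3, 8.9, 9.6]
comparison = compare_models(factor, concentration)
```

Because a higher-degree polynomial never fits the training data worse than a lower-degree one, R2 alone favors the cubic form; checking the error on independent test samples is what allows a simpler model to be preferred.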

4.2. Analysis of Artificial Neural Network Model

The correlation analysis between the single bands and band combinations and the measured water quality parameters shows that B5, B2, B2 + B5, B4 + B5, B5/B2, (B2 − B5)/(B2 + B5), and B5/(B2 + B5) are highly correlated with the measured COD concentration, and the model fits best when the hidden layer has 7 neurons. The fitting degree of each group of samples is above 0.86, and the model fit is very good (Figure 2(a)). The data of B7, B1, B1/B7, B1/(B1 + B7), (B1 − B7)/(B1 + B7), B1 − B7, B1 + B7, and B1 × B7 (9 groups) are highly correlated with the measured DO concentration, and the model fits best when the hidden layer has 8 neurons. The fitting degree of each group of samples is above 0.91, and the model fit is good (Figure 2(b)). The data of B3, B4, B3 + B4, B3 − B4, B3 × B4, and B3/(B3 + B4) (6 groups) are highly correlated with the measured pH, and the model fits best when the hidden layer has 7 neurons. The fitting degree of each group of samples is above 0.88, and the model fit is very good (Figure 2(c)). The data of B6, B7, B6 + B7, and B6 × B7 (4 groups) are highly correlated with the measured NH3-N concentration, and the model fits best when the hidden layer has 6 neurons. The fitting degree of each group of samples is above 0.96, and the model fit is very good (Figure 2(d)). Overall, the fitting effect for the water quality parameters is good.

4.3. Model Comparison

To verify the models, the same test samples are used to compare differences and errors (Figure 3 and Table 10). For COD (Figure 3(a)), the overall trend of the neural network predictions is closer to the measured values than that of the multiple regression predictions, although the difference at individual points is large. The average relative error is 9.68% for the neural network and 17.48% for multiple regression, so the neural network model performs better. For DO (Figure 3(b)), the predictions of both models are close to the measured values. The average relative error is 3.31% for the neural network and 3.86% for multiple regression, and both models perform well. For pH (Figure 3(c)), the predictions of both models are close to the measured values. The average relative error is 1.25% for the neural network and 1.58% for multiple regression, and both models perform well. For NH3-N (Figure 3(d)), the predictions of the two models are close to each other, but both differ considerably from the measured values. The average relative error is 15.39% for the neural network and 24.97% for multiple regression, and the performance of both models is only moderate.
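The average relative error used in this comparison can be computed as in the following sketch. The definition used here, the mean of |predicted − measured| / |measured| expressed as a percentage, is an assumption, since the formula is not stated explicitly, and the sample values are hypothetical.

```python
def mean_relative_error(measured, predicted):
    """Mean of |predicted - measured| / |measured|, as a percentage."""
    errors = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical test-station values for one parameter, in mg/L.
measured_values = [7.2, 8.0, 6.5, 9.1]
network_predictions = [7.0, 8.4, 6.9, 8.8]
regression_predictions = [6.1, 8.9, 7.6, 8.0]

network_error = mean_relative_error(measured_values, network_predictions)
regression_error = mean_relative_error(measured_values, regression_predictions)
```

Evaluating both models on the same held-out stations, as done here, keeps the two error figures directly comparable.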

The analysis shows that the neural network has clear advantages for monitoring pollutant concentrations in Dianchi Lake. The COD, DO, and pH values retrieved by this model agree well with the measured data, whereas the retrieved ammonia nitrogen concentration is relatively low. The applicability of this model to the retrieval of NH3-N concentration in Dianchi Lake is therefore limited.

4.4. Distribution Map of Water Quality Parameters

Based on the more accurate neural network inversion model and the band data of the Landsat 8 images, the four water quality parameters were retrieved to obtain the spatial distribution of COD, DO, pH, and NH3-N in the study area in 2013, 2014, and 2018 (Figure 4). The monitoring results show that COD shows a downward trend, DO shows a small decrease that is consistent with the actual situation, and pH and NH3-N change little and may be affected by factors such as acquisition time and water temperature.

Take the 2018 water quality inversion results as an example. The monitoring results show that COD in the water bodies of the study area ranges from 6.2 mg/L to 9.8 mg/L, DO from 5.81 mg/L to 9.05 mg/L, pH from 5.9 to 8.54, and NH3-N from 0.22 mg/L to 0.41 mg/L. The inversion results for overall pollutant concentration in the study area are consistent with the actual situation, with only slight deviations in some areas. COD and pH are relatively high. Generally speaking, the water quality in this section is poor.

The experimental results show that the water of Dianchi Lake, which belongs to a first-class surface water source protection zone, is moderately polluted. Where the middle reaches flow through urban and rural residential areas, discharges of domestic and industrial sewage increase, raising the COD content and hence the degree of water pollution. In addition, nitrogen fertilizer carried by farmland runoff increases the NH3-N content of the water body. DO content is affected by many factors, such as water temperature, dissolved ions, and microorganisms; in eutrophic water it is mainly controlled by biological processes. Therefore, once the number of algae rises to a certain order of magnitude, algal abundance and the intensity of algal life activities inevitably play the leading role in DO variation. The pH of the water body is mainly affected by the CO2 content: the rapid consumption of CO2 by algal photosynthesis increases the pH. Verification shows that the actual NH3-N content in the water body in this area is more than 0.3 mg/L, so the retrieved spatial distribution of this pollutant in Dianchi Lake is not consistent with the actual situation, and the retrieved concentration is lower than the actual one. The possible reasons for the monitoring error in some water bodies are as follows: (1) Although the remote sensing images and the water quality samples were acquired on the same day, the acquisition times cannot be matched exactly, and there is always some difference between morning and evening; pollutant concentrations in water flowing through residential areas change over this interval. (2) Errors are introduced by atmospheric correction. (3) The measured concentrations at the sampling points contain errors: errors arising when the samples are analyzed in the laboratory can make the measured water quality factor content relatively small. (4) Some rivers are relatively narrow, and the Landsat 8 images used in this study have a resolution of 30 m, so the narrow reaches contain few pixels and the river is not clearly distinguished from the bank surface features; the resulting mixed pixels affect the DN value of the image and lead to low retrieved pollutant concentrations. The evaluation results are similar to the water quality status reported in the water resources bulletin. The neural network model can monitor the concentration of water pollutants with high accuracy.

5. Discussion and Conclusion

Based on the measured water quality data and synchronous image data, the correlation analysis shows that there is an internal relationship between spectral reflectance and the concentration of organic pollutants, and the spectral characteristic bands of four important water quality parameters in the Dianchi Lake area are identified. Band operations are performed on the Landsat 8 data, and the single-band and band-combination image data are correlated with the measured water quality data. The bands or band combinations with high correlation are selected to establish a neural network model for the inversion of water quality factors, and the inversion results are compared with the measured data. The results show that it is feasible to use Landsat 8 remote sensing image data to monitor the concentration of water pollutants.

Statistical regression models cannot handle nonlinear relationships between data well, whereas artificial neural networks can handle complex nonlinear relationships between interacting variables and show good estimation performance. The development of artificial neural network models has therefore become a research hotspot in the field of pollution in recent years. In addition, the combination of various neural network models with statistical regression models has become an important means of estimating concentration, and adding neural network models can effectively improve the accuracy of concentration estimation.

The construction of a water quality parameter inversion model must take many factors into account and deal with inaccurate information. In this situation, the strong nonlinear fitting ability of neural networks can greatly improve the accuracy of remote sensing inversion models, and neural network algorithms are now widely used in the inversion of water quality parameters. Based on the measured data, this paper develops multiple regression and artificial neural network inversion models for permanganate index (COD), dissolved oxygen (DO), hydrogen ion concentration (pH), ammonia nitrogen (NH3-N), and suspended solids concentration, and briefly analyzes the temporal and spatial characteristics of the inversion results. In addition, the components of inland water bodies differ greatly between seasons and regions. It is therefore still necessary to build local models based on the actual situation to provide reasonable data support for local water pollution prevention and control.

Data Availability

Data used to support the findings of this study are included within the article and additional data are available from the author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors gratefully acknowledge the USGS and the Hydrology and Environmental Monitoring of China for providing data used in this study.