Abstract

This study examines the environmental controls of the underground CO2 concentration, taking the CO2 concentration 4 m beneath the soil surface as an example. An SVD-PCA-ANN (singular value decomposition-principal component analysis-artificial neural network) preview model is proposed using data on the underground CO2 concentration and 12 environmental variables (soil and meteorological data). The R2, RMSE, and RPD values of the proposed model are 0.8874, 0.3351, and 2.7929, respectively, which is better than popular preview models such as SAE (stacked autoencoders), SVM (support vector machine), and LSTM (long short-term memory). The results indicate that the underground CO2 concentration can be approximated by a nonlinear function of the considered variables. Soil moisture, soil salinity, and wind speed are the leading environmental controls, with contribution ratios of 32.04%, 13.68%, and 11.21% to the variability of the underground CO2 concentration, respectively. Possible mechanisms associated with these environmental controls are also preliminarily discussed.

1. Introduction

In previous studies of land-atmosphere CO2 exchange, measurements of CO2 concentration mainly focused on the dynamics of the soil surface CO2 concentration, which is characterized as soil CO2 flux or soil respiration [1–5]. However, the soil surface CO2 concentration is determined not only by the atmospheric CO2 concentration but potentially also by the underground CO2 concentration [6–8].

In particular, several recent studies of desert ecosystems have revealed non-negligible CO2 absorption by saline-alkali soils [9–11]. This implies that the underground CO2 concentration in arid regions is closely linked with the soil surface CO2 concentration. However, the controls of soil CO2 absorption in arid regions remain poorly explained [12–15]. One important reason is that the CO2 concentration beneath the soil surface may be influenced by many factors, and the corresponding theoretical basis has not yet been well described [16]. Research on the effects of various environmental factors on the underground CO2 concentration in arid regions is still lacking [17].

To further understand soil CO2 absorption in arid regions, this study investigates the effects of various environmental factors on the underground CO2 concentration. The objectives of this study are (1) to examine the environmental controls of the underground CO2 concentration, taking the CO2 concentration 4 m beneath the soil surface as an example, (2) to propose a preview model for analyzing the concentration dynamics, and (3) to discuss how environmental factors may influence the underground CO2 concentration. The preview model is proposed in Section 2. In Section 3, the environmental controls of the underground CO2 concentration are examined using the proposed model, and possible mechanisms are discussed in Section 4.

2. Materials and Methods

2.1. Data Collection

All data used in this paper, including the underground CO2 concentration (FX), soil temperature (Ts), soil moisture (Sm), atmospheric temperature (At), atmospheric humidity (Ah), soil salinity (EC), wind speed (WS), air pressure (AP), wind direction (WD), groundwater level (WL), soil alkalinity (pH), atmospheric CO2 concentration (Ca), and rainfall (R), were collected from three automatic weather stations (equipped with 12 sensors for soil and meteorological data). FX is the dependent variable, and the other 12 environmental factors are the independent variables. The three weather stations are located at the southern edge of the Gurbantunggut Desert in the northern Xinjiang Uygur Autonomous Region, China, as shown in Figure 1.

2.2. The Proposed Model

Unlike previous studies on soil CO2 absorption in arid regions, we integrate two methods, principal component analysis (PCA) and an artificial neural network (ANN), to examine both the linear and nonlinear relationships between FX and the 12 candidate environmental controls. A series of PCA algorithms have been proposed in previous studies, some of which are widely used for the dimensionality reduction of various data [18–23]. Singular value decomposition (SVD) can be used to improve the numerical stability of PCA [24, 25]. Hence, we use SVD-based PCA to examine the contributions of the 12 environmental variables to FX, where SVD is used to calculate the eigenvalues and eigenvectors of the covariance matrix [25]. For convenience, the proposed model is denoted SVD-PCA-ANN throughout the paper.
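A minimal NumPy sketch of SVD-based PCA follows, assuming the data are arranged as an N × 12 matrix (samples × environmental variables). It illustrates why SVD can replace an explicit eigendecomposition: the squared singular values of the centered data, divided by N − 1, are the eigenvalues of the sample covariance matrix, and the right singular vectors are the corresponding eigenvectors. The function name `svd_pca` is hypothetical, and the paper does not state whether the variables were standardized; variables measured in different units are often z-scored before this step.

```python
import numpy as np

def svd_pca(X, k):
    """PCA of an (N samples x n variables) matrix via SVD (hypothetical helper).

    Returns the k-dimensional scores, the eigenvalues of the sample
    covariance matrix, and the explained variance ratios.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    # Thin SVD: Xc = U @ diag(s) @ Vt; rows of Vt are eigenvectors of cov(X)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)          # eigenvalues of cov(X), descending
    ratios = eigvals / eigvals.sum()             # explained variance ratios
    scores = Xc @ Vt[:k].T                       # map n dimensions to k components
    return scores, eigvals, ratios
```

Because the covariance matrix is never formed explicitly, this route avoids squaring the condition number of the data, which is the usual numerical argument for SVD-based PCA.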

The original n-dimensional features are mapped onto k new orthogonal features (the principal components), with k < n [19, 21, 25]. The output of the SVD-based PCA is then fed into the ANN, a multilayer feedforward network trained by backpropagation, with 2 input layers, 4 hidden layers, and 1 output layer. Nonlinear ReLU transfer functions are used in the hidden layers and a linear transfer function in the output layer, so that the network can approximate the nonlinear relationship between its input and output. The structure of the proposed model is shown in Figure 2.
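To illustrate a network of this type, the following NumPy sketch trains a fully connected regressor with ReLU hidden layers and a single linear output neuron by backpropagation on mean squared error. It assumes the inputs are principal-component scores; the names `train_mlp` and `predict`, the four hidden layers of 16 units, the learning rate, the epoch count, and full-batch gradient descent are all hypothetical choices, not the settings used in this study.

```python
import numpy as np

def train_mlp(X, y, hidden=(16, 16, 16, 16), lr=0.005, epochs=500, seed=0):
    """Full-batch gradient descent (backpropagation) on mean squared error.

    Hidden layers use ReLU; the single output neuron is linear.
    Layer widths and hyperparameters are hypothetical.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    # He initialization suits ReLU layers
    Ws = [rng.normal(0.0, np.sqrt(2.0 / a), (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(b) for b in sizes[1:]]
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping the input to every layer
        acts = [X]
        for i, (W, b) in enumerate(zip(Ws, bs)):
            z = acts[-1] @ W + b
            acts.append(z if i == len(Ws) - 1 else np.maximum(z, 0.0))
        err = acts[-1] - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: grad holds dLoss/d(pre-activation) of layer i
        grad = 2.0 * err / len(X)
        for i in reversed(range(len(Ws))):
            gW = acts[i].T @ grad
            gb = grad.sum(axis=0)
            if i > 0:
                # Propagate through W (before updating it) and the ReLU derivative
                grad = (grad @ Ws[i].T) * (acts[i] > 0)
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs, losses

def predict(Ws, bs, X):
    """ReLU on hidden layers, identity on the output layer."""
    h = np.asarray(X, dtype=float)
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return (h @ Ws[-1] + bs[-1]).ravel()
```

The linear output neuron lets the network produce any real-valued concentration, while the ReLU hidden layers supply the nonlinearity.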

2.3. Examination of the Controls

The following 3 indices are calculated to evaluate the performance of the SVD-PCA-ANN model in previewing the underground CO2 concentration from its possible environmental controls. They are also used to compare the model with stacked autoencoders (SAE), support vector machine (SVM), and long short-term memory (LSTM) [26–30]. For a fair comparison, the data set was divided in the same way into 3 subsets in all experiments: one for training (half of the data), one for validation (a quarter of the data), and one for testing (a quarter of the data).
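A minimal sketch of the 50%/25%/25% split follows. It assumes the samples are shuffled randomly with a fixed seed so that every model (SVD-PCA-ANN, SAE, SVM, and LSTM) is evaluated on identical subsets. The function name and seed are hypothetical, and the paper does not state whether the split was random or chronological; for time-series models such as LSTM, a chronological split is a common alternative.

```python
import random

def split_indices(n_samples, seed=42):
    """Shuffle sample indices and split them 50% / 25% / 25%
    into training, validation, and testing subsets (hypothetical helper)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)   # fixed seed: same split for every model
    n_train = n_samples // 2
    n_val = n_samples // 4
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]       # remainder, about one quarter
    return train, val, test
```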

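The coefficient of determination (R2), the root mean square error (RMSE), and the ratio of prediction to deviation (RPD) are calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

$$\mathrm{RPD} = \frac{\sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2}}{\mathrm{RMSE}}$$

where $y_i$ is the true value, $\hat{y}_i$ is the previewed value, $\bar{y}$ is the average of the true values, and N is the number of samples.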
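The three indices can be computed with the Python standard library alone. This sketch assumes the sample standard deviation (N − 1 denominator, as used by `statistics.stdev`) for the deviation term in RPD, which is a common convention; the function names are hypothetical.

```python
import math
import statistics

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error over the N samples."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def rpd(y_true, y_pred):
    """Sample standard deviation of the true values divided by RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r2(y_true, y_pred), 4))    # 0.98
print(round(rmse(y_true, y_pred), 4))  # 0.1581
print(round(rpd(y_true, y_pred), 3))   # 8.165
```

A higher R2 and RPD and a lower RMSE indicate a better preview.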

The pseudocode for the SVD-PCA-ANN model is shown in Figure 3.

3. Experimental Results

3.1. Outputs of the SVD-Based PCA

Figure 4 shows the calculated contribution ratios of the 12 environmental variables and the explained ratios of the 9 principal components from the SVD-based PCA. The contribution ratios of soil moisture (Sm), soil salinity (EC), wind speed (WS), soil temperature (Ts), groundwater level (WL), wind direction (WD), atmospheric temperature (At), atmospheric humidity (Ah), air pressure (AP), atmospheric CO2 concentration (Ca), soil alkalinity (pH), and rainfall (R) to the underground CO2 concentration are 32.04%, 13.68%, 11.21%, 8.38%, 8.31%, 7.87%, 7.21%, 5.22%, 3.46%, 1.23%, 0.82%, and 0.51%, respectively. The explained ratios of the 9 principal components are 28.5%, 15.2%, 11.4%, 9.08%, 7.87%, 7.63%, 6.23%, 5.77%, and 5.05%, respectively.

This suggests that soil moisture, soil salinity, and wind speed are the 3 leading drivers of the underground CO2 concentration, with a total contribution ratio of 56.93% to FX. The remaining 9 variables (soil temperature, groundwater level, wind direction, atmospheric temperature, atmospheric humidity, air pressure, atmospheric CO2 concentration, soil alkalinity, and rainfall) contribute 43.07% in total, and the contribution ratios of soil alkalinity and rainfall are each less than 1%. According to the outputs of the SVD-based PCA, the 9 principal components together account for 96.73% of the total variance of the data.
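The reported percentages are internally consistent, as a few lines of arithmetic confirm: the explained ratios of the 9 principal components accumulate to 96.73%, and the three leading contribution ratios sum to 56.93%, leaving 43.07% for the other nine variables. The numbers below are copied from the results above; the variable names are illustrative.

```python
from itertools import accumulate

# Explained ratios (%) of the 9 principal components, as reported above
pc_ratios = [28.5, 15.2, 11.4, 9.08, 7.87, 7.63, 6.23, 5.77, 5.05]
cumulative = [round(c, 2) for c in accumulate(pc_ratios)]
print(cumulative)  # [28.5, 43.7, 55.1, 64.18, 72.05, 79.68, 85.91, 91.68, 96.73]

# Contribution ratios (%) of the three leading controls: Sm, EC, WS
leading = [32.04, 13.68, 11.21]
print(round(sum(leading), 2))        # 56.93
print(round(100 - sum(leading), 2))  # 43.07
```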

3.2. Performance of the SVD-PCA-ANN Model

The original PCA model is linear and simple, which makes it straightforward to calculate the contribution ratio of each environmental variable. However, a purely linear model is not accurate enough to serve as a preview model (RMSE = 1.2468; R2 = 0.5028). Since the 9 principal components account for 96.73% of the total variance of the data, they are used as the inputs of the ANN. This step combines the advantages of a linear model (PCA) and a nonlinear model (ANN). The R2, RMSE, and RPD of the integrated model (the SVD-PCA-ANN preview model) as the number of training epochs increases are shown in Figures 5–7, respectively.
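The step of feeding the 9 principal components into the ANN can be sketched as follows. The sketch assumes the PCA is fitted on the training subset only and that the same mean and principal axes are then reused for the validation and testing subsets, so that no information from the held-out data leaks into the ANN inputs. The function names and the synthetic data are hypothetical; the paper does not specify how the projection was applied to each subset.

```python
import numpy as np

def fit_svd_pca(X_train, k=9):
    """Fit PCA on the training subset only; return its mean and top-k axes."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k]

def to_components(X, mean, axes):
    """Project any subset onto the training-set principal axes (N x k scores)."""
    return (np.asarray(X, dtype=float) - mean) @ axes.T

# Synthetic stand-in for the 12 environmental variables (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
X_train, X_val, X_test = X[:200], X[200:300], X[300:]

mean, axes = fit_svd_pca(X_train, k=9)
Z_train, Z_val, Z_test = (to_components(S, mean, axes) for S in (X_train, X_val, X_test))
print(Z_train.shape, Z_val.shape, Z_test.shape)  # (200, 9) (100, 9) (100, 9)
```

The score matrices `Z_train`, `Z_val`, and `Z_test` would then serve as the ANN inputs for training, validation, and testing.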

According to the calculated R2, RMSE, and RPD, the SVD-PCA-ANN model performs well on the training, validation, and test data sets. This suggests that the proposed model is robust for predicting the underground CO2 concentration in arid regions.

3.3. Comparison with SAE, SVM, and LSTM

Comparing PCA, SAE, and SVD-PCA-ANN (Table 1), we find that the proposed model performs better than both PCA and SAE on the training, validation, and testing data sets. The SVD-PCA-ANN model explains more than 88.7% of the variability in the training data set with an RMSE of 0.3351, whereas the SVD-based PCA explains only 48.9% with an RMSE of 1.2330. SAE explains no more than 1% of the variability in the training data set, although its RMSE is about half that of the PCA model. On the validation and testing data sets, the SVD-PCA-ANN model also predicts well. SAE explains about 17% and 37% of the variability in the validation and testing data sets, respectively, but its RMSE increases. PCA explains about 50% of the variability in the validation and testing data sets, with no evident change in RMSE.

Comparing the SVM model with the SVD-PCA-ANN model, we also find that the proposed model performs better. The SVD-PCA-ANN model explains about 80% of the variability in the training, validation, and testing data sets with good accuracy (RMSE < 0.43). The SVM model explains only 12.1%, 48%, and 61% of the variability in the training, validation, and testing data sets, respectively, with RMSE > 1. The SVD-PCA-ANN model also outperforms LSTM. As shown in Table 1, LSTM explains about 33.8%, 32.7%, and 27.2% of the variability in the training, validation, and testing data sets, respectively, and its RMSE values are clearly larger than those of SVD-PCA-ANN. The calculated RPD further confirms that SVD-PCA-ANN predicts better than SAE, SVM, and LSTM.

4. Discussion

Unlike the soil surface CO2 concentration and the atmospheric CO2 concentration, the CO2 concentration beneath the ground may be influenced by many environmental factors and by unknown subterranean processes [9–17]. The full picture of soil CO2 absorption in arid regions remains a gap in our knowledge [31]. The contribution ratios of the 12 environmental variables calculated in the present study suggest that soil moisture, soil salinity, and wind speed are the 3 leading controls among them. Possible mechanisms are as follows. Soil moisture may interact with CO2 when soil salinity is high, and this process can be influenced by wind speed [32]. Our experimental results also imply that wind direction is less important than wind speed. Among the other 9 variables, the contribution ratio of groundwater level is second only to that of soil temperature. This provides further evidence that groundwater discharge and recharge should be taken into account when analyzing the underground CO2 concentration in arid regions [33].

In the present study, we proposed the SVD-PCA-ANN model for predicting the CO2 concentration 4 m beneath the soil surface. To our knowledge, this is the first SVD-PCA-based neural network for learning the underground CO2 concentration in arid regions. Based on the experimental results, the major advantage of the proposed method is that it integrates the linear model PCA with the nonlinear model ANN. However, according to the outputs of the SVD-based PCA, the 9 principal components account for only 96.73% of the total variance of the data, so some information is discarded before the ANN stage. This is one limitation of the proposed method. The SVD-PCA-ANN model introduces 9 linear components into the prediction of the underground CO2 concentration in arid regions to improve the interpretability of traditional neural networks [34–36], and this may also be the main reason why the proposed method predicts better than SAE, SVM, and LSTM. However, 11.3%, 18.1%, and 23.4% of the variability in the training, validation, and testing data sets, respectively, remain unexplained by the proposed model. This is another limitation of the SVD-PCA-ANN model.

A priority for subsequent studies is to overcome these two limitations of the SVD-PCA-ANN model and thereby achieve a fuller understanding of changes in the underground CO2 concentration in arid regions. Directions to be considered include constructing better linear components for the preview model, finding more efficient learning systems, and introducing new environmental factors to reduce the uncertainty of the model analyses.

5. Conclusion

The considered environmental variables, including soil and meteorological factors, can be regarded as potential controls of the underground CO2 concentration in arid regions, among which soil moisture, soil salinity, and wind speed are the leading controls. The experimental results show that the proposed method performs better than SAE, SVM, and the well-known LSTM on the training, validation, and test data sets. The relationship between the underground CO2 concentration in arid regions and the considered environmental variables cannot be characterized by a single linear function. By integrating the advantages of both linear and nonlinear models, the SVD-PCA-ANN model can effectively predict the underground CO2 concentration in arid regions and therefore offers a novel method for studying soil CO2 absorption in arid regions.

Data Availability

The data utilized to support the theory and models of the present study are available from the corresponding authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (41571299), the Shanghai High-Level Base-Building Project for Industrial Technology Innovation (1021GN204005-A06), and the Ningbo Natural Science Foundation (2019A610106).