Previous studies have demonstrated CO2 absorption by soils in arid regions, where the absorbed CO2 is conjectured to be ultimately sequestered in the “subterranean ocean” (groundwater). This study compares the environmental controls of ocean CO2 concentration (surface ocean pCO2) and quasi-ocean CO2 concentration (deep-soil pCO2). We aim to explore the latent relationships, both linear and nonlinear, between the environmental variables and CO2 concentration, using two intelligent algorithms: partial least squares regression (PLSR) and the artificial neural network (ANN). For quasi-ocean CO2 concentration, RPD < 1.4 and R2 < 40%, whereas for ocean CO2 concentration, RPD > 1.4 and R2 is 99.7%. Linear relationships between the considered environmental controls and ocean CO2 concentration are confirmed; however, there is no evident relationship between most of the considered environmental controls and quasi-ocean CO2 concentration. Groundwater level is found to be a relatively important environmental control on the quasi-ocean CO2 concentration, suggesting groundwater discharge/recharge as a significant modulator of soil CO2 absorption in arid regions.

1. Introduction

During the past decades, emissions of carbon dioxide (CO2) from fossil fuel combustion and deforestation have rapidly increased the atmospheric concentration of CO2 [1–5]. This has potentially driven global warming and reduced the pH of the oceans [6, 7]. The increase in atmospheric CO2 concentration will also result in a decrease in marine biodiversity and changes in the structure of phytoplankton communities [8]. Scientists around the world are making great efforts to better control the atmospheric CO2 concentration, and CO2 sequestration technology has continued to develop in recent years [9, 10].

Benefiting from the rapid development of communications, sensing, and automation technology, a series of electrochemical and semiconductor sensors have been manufactured and applied for real-time measurement of atmospheric CO2 concentration [11]. Monitoring systems for above-ground CO2 concentration have also been developed based on mid-infrared absorption spectroscopy and have been widely applied [12]. However, observations of below-ground CO2 concentration have rarely been conducted [13].

Some previous studies have demonstrated CO2 absorption by soils in arid regions, where the absorbed CO2 is conjectured to have been largely sequestered in the “subterranean ocean” (groundwater) [12–15]. Hence, interpreting the environmental controls of both the above-ground and the below-ground CO2 concentration is essential for understanding the evolution of atmospheric CO2 concentration [16]. Considering the physical and chemical properties of soils in arid regions, many researchers have tried to analyze the important environmental factors that affect the above-ground CO2 concentration, but the environmental controls of below-ground CO2 concentration are still poorly understood [11–16].

Taking subterranean runoff into account, there is a strong relationship between the ocean and groundwater. Furthermore, CO2 uptake in the ocean and in groundwater is similar. Consequently, we are motivated to analyze and compare the environmental controls of ocean CO2 concentration (surface ocean pCO2) and quasi-ocean CO2 concentration (deep-soil pCO2, i.e., the underground high humidity pCO2).

The objectives of this study are as follows: (1) to present a better understanding of the environmental controls on the below-ground CO2 concentration and (2) to find latent factors that have not yet been taken into consideration. For convenience of problem formulation and theoretical analysis, we used two complementary algorithms: partial least squares regression (PLSR, representing a linear approach) and the artificial neural network (ANN, representing a nonlinear approach).

2. Materials and Methods

2.1. Data Collection

The ocean CO2 concentration data used in the present study are from the MORB database PetDB (http://www.erathchem.org/petdb). This dataset also includes the corresponding environmental controls: pH value (on a special scale), TA (total alkalinity, μmol kg−1), salinity, temp (temperature), pA (pH on total scale), fCO2 (fugacity of CO2), HCO3 (the concentration of bicarbonate ion, μmol kg−1), CO3 (the concentration of carbonate ion, μmol kg−1), DIC (total dissolved CO2), OA (omega aragonite), and OC (omega calcite). The quasi-ocean CO2 data (deep-soil pCO2, i.e., the underground high humidity pCO2), along with the environmental data, were collected from three automatic weather stations (located in the 144th regiment, Xinjiang, China). The environmental variables include WS (wind speed), WD (wind direction), AT (atmospheric temperature), AH (atmospheric humidity), AP (atmospheric pressure), UHH CO2 (underground high humidity CO2) concentration, ST (soil surface temperature), SM (soil moisture), salinity, groundwater level (WL), and pH value.

2.2. PLSR and ANN

In order to investigate the relationships, both linear and nonlinear, between the two dependent variables (ocean CO2 concentration and quasi-ocean CO2 concentration) and the other 10 independent variables, we use two different algorithms: partial least squares regression (PLSR) and the artificial neural network (ANN). As a regression modeling method that deals with two groups of multiple correlated variables, PLSR combines the strengths of multivariate linear regression, principal component analysis, and canonical correlation analysis. Besides discovering the linear relationships between two groups of multiple correlated variables, PLSR can also be used for feature selection for ANN.

It is well known that ANN is a ‘black-box model’ [17–21]. Even though its predictions can be better, they are hard to explain. PLSR can compensate for this drawback of ANN because of its strong explanatory ability. The combination of PLSR and ANN allows a more comprehensive exploration of the problem under consideration. The implementation steps of PLSR are shown in Figure 1.

ANN is a system composed of a large number of interconnected processing units that processes nonlinear and adaptive information. Its four characteristics are nonlinearity, nonlimitation, nonstationarity, and nonconvexity. In the present study, ANN is used to explore nonlinear relationships because ANN performs better than PLSR on nonlinear data. The implementation steps of the artificial neural network are shown in Figure 2.

We used Python 3.7 to implement PLSR and ANN, where the corresponding functions are PLSRegression() and MLPRegressor(), respectively. Some important tuned parameters of these two functions are listed in Table 1.

Two groups of experiments were conducted (the first with quasi-ocean CO2 concentration as the dependent variable, the second with ocean CO2 concentration as the dependent variable), where the important parameter “n_components” is tuned from 0 to 10 in each group of experiments. In each experiment, we split the dataset into a training set (1/3 of the dataset) and a testing set (2/3 of the dataset).

The performance of both PLSR and ANN is evaluated by RMSE, RPD, and R2. Let C and C′ be the measured and predicted values, respectively, and let n be the number of observations. RPD is defined as the standard deviation of prediction (SDP) over RMSE, as shown in Table 2.
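Since Table 2 is not reproduced in the text, the three metrics can be illustrated with a short sketch. It uses the standard definitions of RMSE and R2; for RPD it assumes that SDP is the sample standard deviation (n − 1 denominator) of the measured values in the prediction set, which is a common convention but is an assumption here. The function names are illustrative only.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean square error between measured C and predicted C'."""
    n = len(measured)
    return math.sqrt(sum((c - cp) ** 2 for c, cp in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    It can be negative when the model predicts worse than the mean of C.
    """
    mean_c = sum(measured) / len(measured)
    ss_res = sum((c - cp) ** 2 for c, cp in zip(measured, predicted))
    ss_tot = sum((c - mean_c) ** 2 for c in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """RPD = SDP / RMSE.

    Assumption: SDP is the sample standard deviation of the measured values.
    """
    return statistics.stdev(measured) / rmse(measured, predicted)


# Toy values, for illustration only (not data from the study).
C = [1.0, 2.0, 3.0, 4.0, 5.0]
C_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(rmse(C, C_pred), r_squared(C, C_pred), rpd(C, C_pred))
```

With this definition of R2, a negative value simply means that the model predicts worse than a constant equal to the mean of the measured values.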

Finally, for each PLSR and ANN model in the present study, if RPD > 2, we conclude that the model has good predictive ability; if RPD < 1.4, we conclude that the model is unable to make good estimates. This rule has been proposed and widely adopted in previous studies [22, 23].
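The decision rule above can be written as a small helper. The thresholds 2 and 1.4 come from the text; the function name and the "intermediate" label for values between the two thresholds are hypothetical, since the text does not name that range.

```python
def interpret_rpd(rpd_value):
    """Classify a model by its RPD using the thresholds stated in the text."""
    if rpd_value > 2.0:
        return "good"  # good ability for prediction
    if rpd_value < 1.4:
        return "poor"  # unable to make good estimates
    # Assumption: the text does not label 1.4 <= RPD <= 2.
    return "intermediate"


for value in (0.9, 1.7, 3.2):
    print(value, interpret_rpd(value))
```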

3. The Major Results

3.1. Performance of PLSR and ANN with regard to Quasi-Ocean CO2 Concentration

As can be seen from Figure 3, the RPD values are all less than 1.4 regardless of the number of factors chosen. In other words, we cannot obtain effective information about the crucial factors controlling the quasi-ocean CO2 concentration from the PLSR models. Furthermore, with R2 of only 12.5% for predicting the concentration of quasi-ocean CO2, we conclude that the PLSR model built is not a good predictor.

Considering that the PLSR algorithm can only verify the linear correlation between independent variables and dependent variables, we train artificial neural network algorithms to explore the possible nonlinear correlation between the environmental factors and CO2 concentration.

For the ANN models of quasi-ocean CO2 concentration (Figure 4), we added one independent variable at a time to the model. R2 shows an overall upward trend as more factors are taken into consideration. However, the maximum R2 value is about 40%, which is relatively low for precise prediction. This means that the nonlinear relationship between quasi-ocean CO2 concentration and the other 10 environmental variables is also weak, and the changes in quasi-ocean CO2 concentration cannot be explained by these environmental factors.
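The one-variable-at-a-time procedure can be sketched as below. This is an illustration under several assumptions: it uses scikit-learn's `MLPRegressor` as the ANN, the hyperparameters (one hidden layer of 16 units, the L-BFGS solver, input standardization) are placeholders rather than the values in Table 1, the data are synthetic, and the helper name `incremental_ann_r2` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def incremental_ann_r2(X, y, seed=0):
    """Train one ANN per prefix of columns; return the test R2 of each model."""
    n_train = len(X) // 3  # 1/3 training, 2/3 testing, as in Section 2.2
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=n_train, random_state=seed
    )
    scores = []
    for k in range(1, X.shape[1] + 1):
        # Assumed hyperparameters; the tuned values of Table 1 are not used.
        model = make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                         max_iter=500, random_state=seed),
        )
        model.fit(X_tr[:, :k], y_tr)  # use only the first k variables
        scores.append(model.score(X_te[:, :k], y_te))  # score() returns R2
    return scores


# Synthetic stand-in: 4 predictors and a nonlinear target.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 4))
y_demo = np.sin(X_demo[:, 0]) + X_demo[:, 1] ** 2 + 0.1 * rng.normal(size=300)

ann_scores = incremental_ann_r2(X_demo, y_demo)
r2_gains = np.diff(ann_scores)  # change in R2 when each further variable is added
print(ann_scores, r2_gains)
```

The differences between consecutive R2 values indicate how much each newly added variable contributes, although the result depends on the order in which variables are added.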

3.2. Performance of PLSR and ANN with regard to Ocean CO2 Concentration

Unlike the performance of PLSR on the quasi-ocean CO2 data, Figure 5 clearly shows that all RPD values are over 1.4. The positive indicator RPD increases sharply while the negative indicator RMSE decreases sharply, implying that the PLSR model has good predictive capability. Another positive indicator, R2, reaches almost 100% (99.7%), showing that the linear relationship between ocean CO2 concentration and the considered environmental variables is strong.

Next, we train ANN models to explore the possible nonlinear correlation between the environmental factors and ocean CO2 concentration. As in training the ANN models for quasi-ocean CO2 concentration, we added one independent variable at a time to the model. As shown in Figure 6, although the R2 of the first five ANN models we trained is less than zero, once variables that are highly linearly correlated with the ocean CO2 concentration are included in the model, R2 remains at a high level, even reaching up to 100%. These results show that the variables considered, especially those with a strong linear relationship with the ocean CO2 concentration, also have a strong nonlinear relationship with the mechanism of ocean CO2 concentration change.

4. Discussion

Both PLSR and ANN algorithms have been developed through interdisciplinary applications. Bong et al. compared the performance of three kinetic models using the predicted value of the ANN model [24]. Parzhin et al. designed an ANN algorithm for image processing [25]. Ali Sohani et al. conducted a comprehensive parameter study using the ANN model [26]. Ma et al. applied the ANN model in the field of deep reinforcement learning [27]. Beyond its wide range of applications, the ANN algorithm can also be implemented in multiple forms; for example, the ANN model can be implemented using MATLAB and FPGA [28].

Since ANN is a ‘black-box model,’ PLSR models were also studied for their strong explanatory ability. ANN models are much more complicated than PLSR models (more parameters need to be determined in ANN models than in PLSR models), but ANN models have also been reported to perform better than PLSR models. Farifteh et al. concluded that ANN is superior to PLSR in predicting salt concentration [29]. The same conclusion, that ANN is a better predictor than PLSR, was drawn by Xu et al. [30].

The dynamics of ocean CO2 concentration have attracted considerable attention in previous studies, and the models and methods employed there are also relevant to the present study. To date, various machine learning methods, including MLR, MNR, PCR, decision tree, SVMs, MPNN, and RFRE, have been used to estimate surface ocean pCO2 with an overall R2 of about 0.95 [31], while the performance of PLSR and ANN in our study was better, with R2 reaching 0.997 and 0.982, respectively. Among the machine learning methods mentioned above, RFRE proved to be the best approach [32].

In the process of training the ANN models, we found that when the factor “groundwater level” was added to the model, R2 showed a more obvious increase (about 10%) than when other factors were added. From this, we conjecture that groundwater level, along with fCO2, HCO3, and CO3 in the groundwater, may be significant environmental controls of quasi-ocean CO2 concentration. If this is true, groundwater discharge/recharge is a significant modulator of soil CO2 absorption in arid regions.

To further improve the robustness of the PLSR and ANN models, we should, on the one hand, collect data on additional environmental variables and take into account the evolution of groundwater environments [33]. On the other hand, we can conduct a reliability analysis of the employed models and methods using Monte Carlo simulation [34]. Last but not least, many improvements can be made to both PLSR and ANN. In addition to being applied as a single model, ANN can also work well with other algorithms. Hadi et al. combined ANN with MLP [20, 23]. ANN has also been applied together with Molecular Dynamic (MD) [21, 35]. Hervice et al. found that the proposed optimal ANN model usually had higher prediction accuracy [36]. Dynamic changes have also been described by the dynamic model-based ANN algorithm [28]. In some previous studies, the SVM algorithm has already been combined with PLSR and ANN, respectively [37–48]. Until the change mechanism of underground CO2 concentration is fully understood, all the above regression methods are reasonable based on our current knowledge. In this sense, further improvement of the models and methods also requires a better understanding of the underlying mechanisms of CO2 absorption by saline-alkaline soils.

5. Conclusion

From an overview of the performance of all the above models, we conclude that the environmental controls of quasi-ocean CO2 concentration are still poorly understood. However, the good performance of PLSR and ANN in predicting ocean CO2 concentration reveals much useful information. The ten environmental variables we took into consideration could not explain the changes in quasi-ocean CO2 concentration well. The next research priority is to investigate the influences of the groundwater level and groundwater chemical properties on the dynamics of the quasi-ocean CO2 concentration.

Data Availability

The data used to support the theory and models of the present study are available from the corresponding authors upon request. The dataset related to ocean CO2 is from the MORB database PetDB (http://www.erathchem.org/petdb). The other data were collected by the authors’ project.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.


This research was funded by the National Natural Science Foundation of China (41571299), the Shanghai High-Level Base-Building Project for Industrial Technology Innovation (1021GN204005-A06), and the Ningbo Natural Science Foundation (2019A610106).