Abstract

In this paper, we propose a supervised deep learning neural network (D-CNN) approach to predict CO2 adsorption from the textural and compositional features of biomass porous carbon waste together with the adsorption parameters. Both the textural and compositional features of biomass porous carbon waste are utilized as inputs for the D-CNN architecture, and the D-CNN is also used to predict the rate of CO2 adsorption on zeolites. The adsorbed amount is classified and predicted by the D-CNN. Three tree machine learning models, namely, the gradient decision model (GDM), the scalable boosting tree model (SBT), and the gradient variant decision tree model (GVD), were fused. A feature importance metric was proposed using feature permutation, and the effect of each feature on the target output variable was investigated. The important features extracted from the three employed models were fused and used as the fusion feature set in our proposed model: the fusion matrix deep learning model (FMDL). A dataset of 1400 data items, covering various adsorbent types and adsorption pressures, is used as input for the D-CNN model. The proposed model is compared against the three tree models, each of which utilizes a single training layer. Using the mean square error as the error measure, the D-CNN and the tree model architectures yield errors of 0.00003 for our model, 0.00062 for the SBT, 0.00091 for the GDM, and 0.00098 for the GVD, after 150 epochs, confirming the efficiency of our model. The produced weight matrix was able to predict the adsorption under diverse process settings with a high accuracy of 96.4%.

1. Introduction

Machine and deep learning models were introduced in the 1980s [15], and machine learning models have since been at the forefront of intelligent computer models for prediction and classification. Deep learning is a type of machine learning with more depth for feature extraction. Deep learning, with its reliance on feature extraction instead of explicit training to achieve optimal solutions, accomplishes high performance in stochastic settings [6] and is thus loosely associated with simulating the inferential thinking found in human brains, using a supervised learning process [7]. Many scientists have studied machine learning models and their mathematical and stochastic nature for numerous applications. This paper utilizes the mathematical nature of machine learning models in an organic application, adsorption prediction, and first surveys existing deep learning models in this discipline.

As an advance over classical machine learning models, deep learning is distinguished by extracting deep features iteratively through unsupervised learning. Such feature extraction guarantees model independence from human control (human feature engineering). Moreover, deep learning model accuracy is powered by the existence of adequate training data, from which the model assembles significant information and data correlations [8, 9].

Carbon capture is employed as an essential tool for reducing the CO2 emission rate [10–14], as atmospheric CO2 concentration endures a steady increase. In [10], the authors concluded that the capture process is an expensive operation, accounting for more than 60% of the total capture and storage cost. In [11], postcombustion CO2 capture is considered the main method for capturing CO2 from manufacturing emission sources and also a cost-effective technique; however, due to the low CO2 concentration (less than 20%), the key challenge of this operation is to develop a cost-effective capture process. The absorption operation known as the regenerative amine solvent process for CO2 capture is not cost-effective and suffers from a high degree of corrosion, degradation loss, and toxicity [12]. The development of inexpensive membranes with carbon permeability for CO2 capture from gas streams is under consideration in the research community. In [13], the authors proposed solid porous carbon adsorption for a second-generation carbon dioxide capture process, characterized by low cost, a regulated pore arrangement, and a low energy requirement. Biomass waste is also an encouraging, inexpensive, and plentiful source for manufacturing porous carbon adsorbents. Biowaste porous carbons are compounds that are extensively employed in ecological waste control and carbon emission reduction. In [14], the biowaste porous carbon process (BWPC) for CO2 capture is shown to alleviate the ecological pollution triggered by biowaste management and to contribute to decarbonization and climate change mitigation.

The CO2 adsorption of BWPC at various temperatures has been studied to explicate the thermodynamic features of the adsorption operations and to manage adsorption operation optimization. Thermodynamic features such as entropy and isosteric heat of adsorption specify that CO2 adsorption on solid adsorbents is controlled by physisorption [11, 12]. CO2 adsorption improvement can be achieved using carbon selectivity and heteroatom doping [13, 14].

A deep learning model (DLM) is an intelligent model that undergoes a training phase to accomplish classification from input data, such as images, with high precision. A DLM is trained by utilizing supervised learning on a labeled dataset [15]. Machine learning and deep learning in many adsorption fields, such as waste-to-energy transfiguration [16–20], compound sorption [21], and biowaste treatment [22, 23], have received intensive interest. Machine learning models comprise linear regression, k-nearest neighbors, support vector machines, neural networks, and deep learning models. A deep learning model has more layers than a convolutional neural network (CNN), with deeper layers for higher classification accuracy [9]. Tree-based classification models are a type of supervised machine learning model that utilize recursive data splitting in a binary manner by minimizing the mean sum of squares. Standard models comprise decision trees, gradient decision models, random forests, and gradient boost models. Classification tree-based models are characterized by the capacity to work with small-size datasets, less overfitting, and resistance to noisy counterparts [24–27].

In this paper, we propose a data-driven model for mapping CO2 adsorption by BWPC based on textural and compositional properties and adsorption parameters. The mapping utilizes the adsorption pressure and temperature behavior of the adsorption process. The core idea of the proposed research is to show that machine learning techniques can serve as predictive models and be utilized to obtain valuable insights into CO2 capture. With this purpose, three tree machine learning methods, namely, the gradient decision model (GDM), the scalable boosting tree model (SBT), and the gradient variant decision tree model (GVD), were employed and validated for CO2 capture prediction accuracy. Using the tree models, a feature metric was proposed using feature permutation, and the effect of each feature on the target output variable was investigated. The important features extracted from the three employed models were fused and used as the fusion feature set in our proposed model: the fusion matrix deep learning model (FMDL).

Table 1 depicts different gas adsorption simulation deep learning prediction models.

In this paper, adsorption data are used as input for the training layers, and the adsorption rate is predicted by the D-CNN. Using the D-CNN model, the adsorption of CO2 on zeolites is predicted. This research employs the deep learning CNN to predict the amount of CO2 adsorption.

This paper is structured as follows: Section 2 proposes the new methodology. Section 3 presents the data collection and statistics. In Section 4, experimental results are demonstrated. Section 5 depicts a comparative study and result discussion. Section 6 presents the conclusions of the proposed work.

2. Model Description

The feature fusion process enhances neural network prediction performance by removing redundant properties from the datasets. In our proposed model, we initiate the feature fusion process, then enter the learning process, followed by the prediction process. A D-CNN with 12 convolutional layers and a dropout layer is incorporated with the feature fusion procedure. The accuracy of the D-CNN with different weights is computed and ordered so that the D-CNN can pick the most and least important attributes for each run of the D-CNN learning phase. The algorithm repeats itself to eliminate multiple input attributes. When the D-CNN does not reach a sufficient accuracy as defined in the model, the process halts, and no more features are removed. The dropout layer is employed to lessen the overfitting of the input training data. The framework of the proposed model is shown in Figure 1.

2.1. Model Work Frame

As seen in Figure 2, the work frame for the D-CNN model consists of several phases. The first phase is to identify and gather data items based on parameters such as textural composition and adsorption parameters. The next phase is to input these data into the D-CNN network training phase. The input variables are normalized to the range between −1 and 1 according to the following equation:

x_n = 2 (x − x_min) / (x_max − x_min) − 1,

where x_n indicates the normalized data item, x indicates the raw data, and x_max and x_min denote the maximum and minimum of x.
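As a minimal sketch, the min-max normalization to [−1, 1] described above can be written in Python (the sample surface-area values here are hypothetical, not taken from the dataset):

```python
import numpy as np

def normalize(x):
    """Scale raw feature values to the range [-1, 1] via min-max scaling."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

# Hypothetical surface-area values (m2/g)
sa = [800.0, 1618.0, 2436.0]
print(normalize(sa))  # minimum maps to -1, midpoint to 0, maximum to 1
```

Each feature column is scaled independently before being fed to the training layers.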

The proposed model tunes the D-CNN parameters and weights to enhance prediction performance. In the subsequent phase, the validation data is tuned using a dataset partition, and verification is then executed using the validated inputs. The correlation ratio (R²) and the mean square error are employed to evaluate the D-CNN prediction accuracy. The D-CNN model with the highest accuracy is selected and constructed. In the various machine learning and regression models, the number of inner layers and the total number of neurons are updated accordingly. The training halts when the optimal D-CNN reaches the best assessment for the parameters. The mean square error and R² are employed as evaluation parameters, and the outputs are matched to the dataset to decide on the best network.

The model simulation was implemented on 32-bit Windows 10, using Python 3.6. The experiments were implemented using Keras on the TensorFlow platform, on which the model training was also performed. The system ran on an Intel® Core™ i7-7300 CPU @ 5.60 GHz with 16 M Cache, 32 GB RAM, and a GTX 1070 video card. The model employed the Scikit-learn function library [28].

3. Data Collection and Statistics

3.1. Data Collection

Data collection was conducted from a literature review on BWPC for carbon capture, utilizing keywords such as biomass, porous carbon, waste, adsorption, and CO2 capture, from indexed databases. 1400 data items were collected and utilized in our research [7, 8]. The common features of the BWPC adsorbent dataset incorporate adsorption capacity, cost-effectiveness, adsorbent selectivity, and adsorption kinetics. In our research, we focus on the adsorption capability achieved at various temperatures and pressures versus the textural features and composition.

The data collection presumes the following assumptions:
(1) Screened data were accepted, without bias towards data validity.
(2) The data were acquired from experiments performed by scientists. Data items that were not listed directly were extracted from published figures utilizing the Plot-Digitizer tool and cleaned to circumvent duplicates.
(3) The selected input features were extracted and categorized into three classes: (a) texture features, (b) BWPC compositions, and (c) adsorption properties such as the temperature at which adsorption was performed.
(4) The primary texture features of the BWPC incorporate the surface area and the pore volume.
(5) The secondary texture features include the macropore area and volume and the weight content.
(6) The uptake rate at various adsorption properties was utilized as the target optimization variable.

3.2. Data Preprocessing

The collected data were converted into predefined units. At the data cleaning phase, the missing data were found to be mostly the total pore and macropore volumes. This missing data is due to the variance in the published data and the selection of the textural features stated. Most papers described the surface area and the total pore and macropore volumes. Therefore, in many data items either the area or the pore volume is listed while the macropore volume is not, and in other cases, the area and macropore volume are listed but the total pore volume is not. Data cleaning is crucial for the imputation of the missing total pore volume and macropore volume values using machine learning. Data cleaning techniques are performed to avoid the removal of tuples with missing attributes.
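As a hedged illustration of machine-learning-based imputation of missing pore volumes (the paper does not name its imputer; scikit-learn's KNNImputer is our stand-in choice, and the rows below are hypothetical):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical rows: [surface area (m2/g), total pore volume, macropore volume (cm3/g)]
X = np.array([
    [1200.0, 0.55, 0.30],
    [1800.0, 0.90, np.nan],   # macropore volume not reported
    [2400.0, np.nan, 0.70],   # total pore volume not reported
    [1500.0, 0.70, 0.40],
])

# Fill each missing value from the 2 most similar complete rows,
# so tuples with missing attributes need not be dropped
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```

This keeps all 1400 tuples in play instead of discarding those with a single missing textural value.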

The linear correlation between the inputs is formulated using the Pearson coefficient (ψ) as depicted in the following equation:

ψ = Σ_i (x_i − x̄)(y_i − ȳ) / sqrt( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² ),

where ψ is the Pearson coefficient for the feature x and the target y, whereas x̄ and ȳ are the average values of the input x and output y, respectively. The value of ψ lies in the range [−1, 1].
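The Pearson coefficient above can be sketched directly in Python (the sample vectors are illustrative only):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: centered cross-product normalized by the
    product of the centered sums of squares."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

x = [1.0, 2.0, 3.0, 4.0]
print(pearson(x, [2.0, 4.0, 6.0, 8.0]))   # perfectly correlated -> 1.0
print(pearson(x, [8.0, 6.0, 4.0, 2.0]))   # perfectly anti-correlated -> -1.0
```

Values near the ends of [−1, 1] flag redundant feature pairs, as used in the correlation analysis of Section 4.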

3.3. The Collected Dataset (DS)

The processed 1400 data items are exposed to several training phases by dividing the original dataset into a random training set and a test set. 80% of the data items are labeled and utilized as the learning data, and the remaining 20% partition is used as the test set for the supervised models. A cross-validation process is utilized to tune the parameters to concurrently enhance the models' prediction accuracy using k-fold validation [19]. In the k-fold validation process, the data are partitioned into k folds; in every training iteration, one of them is employed for validation. This aids in solving the overfitting challenge and the bias problem in machine learning models. We utilized different values of k (5, 7, 10, 12) and tuned the model to the best value as depicted in Table 2. k = 7 is found to be the best value in validation. The training dataset has 1120 data items (80% of the 1400 total items). Therefore, using k = 7 generates 7 partitions of 160 data items each (Table 2).
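The split described above can be sketched with scikit-learn's KFold (a plausible implementation, not necessarily the authors' exact code; the data array is a synthetic stand-in for the 1120 training items):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1120).reshape(-1, 1)   # stand-in for the 1120 training items

# 7-fold split: each iteration trains on 6 folds and validates on the 7th
kf = KFold(n_splits=7, shuffle=True, random_state=0)
for i, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f"fold {i}: train={len(train_idx)} validate={len(val_idx)}")
# Each of the 7 folds holds out 160 items for validation
```

With 1120 items, k = 7 divides evenly into folds of 160, matching the partition size stated above.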

3.4. Parameter Tuning

Three tree machine learning models were employed and fused to predict CO2 adsorption on BWPCs. Recent research [34–39] has revealed the fitness of tree machine learning models for small-size datasets with fewer than 1200 data items when the number of selected features is between 5 and 20. Such data are usually collected from experimental research on midsize datasets.

The gradient decision model (GDM) is an ensemble learning model that combines several connected sequential decision trees [30]. Decision trees are considered weak learners, but in gradient decision models, adding sequential decision trees to an ensemble induces learning boosting. Each sequential decision subtree optimizes the solutions from the previous subtree, and the boosting algorithm induces the high efficiency of the gradient decision model. The scalable boosting tree model (SBT) is a scalable decision tree with a trailing gradient decision algorithm, using many decision trees and a biased quantile search to execute distributed computing. The GVD is a gradient variant decision tree model that utilizes fast feature bundling to enhance efficiency without compromising accuracy. In summary, the three employed tree models are the gradient decision model (GDM), the scalable boosting tree model (SBT), and the gradient variant decision tree model (GVD).

Parameter tuning is the procedure of selecting parameters to optimize the performance. Parameters are usually tuned in the training phase. Parameter tuning is performed using grid search and Bayesian inference [31]. In our research, multiple parameter tuning procedures were performed and tested, and the best parameters were selected with the highest accuracy. In the proposed model, the grid search technique was performed for parameter tuning, yielding a small set of input features (seven features).

3.4.1. Metrics

The performance metrics of the regression techniques mostly use the linear dependency metric (R²) and the mean absolute error (MAE) [32, 33]. The greater the R² and the lesser the MAE, the higher the accuracy, as depicted in the following equations:

MAE = (1/n) Σ_i |p_i − t_i|,

R² = 1 − Σ_i (t_i − p_i)² / Σ_i (t_i − t̄)²,

where p_i and t_i are the predicted and the ground-truth values, respectively, t̄ is the average of the ground-truth values, and n is the number of data items.
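The two metrics can be computed directly from their definitions (the sample predictions below are illustrative only):

```python
import numpy as np

def mae(t, p):
    """Mean absolute error: average |p_i - t_i|."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    return np.abs(t - p).mean()

def r_squared(t, p):
    """R^2: 1 minus residual sum of squares over total sum of squares."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    ss_res = ((t - p) ** 2).sum()
    ss_tot = ((t - t.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

t = [1.0, 2.0, 3.0, 4.0]
p = [1.1, 1.9, 3.2, 3.8]
print(mae(t, p))        # close to 0.15
print(r_squared(t, p))  # close to 0.98
```

A perfect predictor gives MAE = 0 and R² = 1; R² below zero means the model is worse than predicting the mean.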

3.4.2. Feature Importance

Decision tree models face the challenge of computing the significance of an input feature and its effect on the output accuracy. Permutation mean decline accuracy (PMD) is a method utilized to identify the significance of each feature in tree prediction models by computing the changes in the model prediction performance when an input feature is used or not used. PMD is the average drop in the Gini score, which computes the contribution of each independent variable to the homogeneity of the tree nodes [34]. The greater the value of the PMD score, the greater the significance of the input in the prediction. The model is fit for identifying the significance when the permutation count of the features is reasonable; otherwise, the computation becomes resource and time intensive. Our proposed model utilizes seven input features, so the permutation model is utilized to compute the feature ensemble importance. Permutation significance accuracy can identify the relative importance of an input feature to the prediction model as a whole and the effect of the input on the target dependent variable. The PMD is a regression formulation that suppresses the effect of all inputs on the prediction of the deep learning model except the single input of interest, thus constituting a sensitivity analysis; the isolated impact of one independent input feature is thereby attained. In our model, each single feature is used at one instance of time to measure its impact on the target output for each data item in the dataset. We should note that the GDM utilizes global and local estimates, and the local sensitivity study from the PMD was utilized in this research. The PMD reveals the effect of the inputs on the output variables by computing their impact across individual data items.
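A permutation importance pass of this kind can be sketched with scikit-learn's `permutation_importance` (a standard implementation of the shuffle-one-feature idea, used here as an illustrative stand-in for the paper's PMD on synthetic seven-feature data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Synthetic 7-feature dataset standing in for the adsorption inputs
X, y = make_regression(n_samples=400, n_features=7, n_informative=3,
                       random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Shuffle one feature at a time and measure the resulting drop in R^2
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```

A feature whose shuffling barely moves the score contributes little to the prediction; a large drop marks a feature the model depends on, matching the PMD interpretation above.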

3.4.3. Data Statistics

Analysis of the features and the target dependent variable was performed using the raw data: the dataset of 1400 data items [7, 8] was analyzed statistically, realizing the minimum and maximum values as well as the means of the input values. The target dependent variable was also analyzed to gain insights. Figures 3–5 depict the data statistics for the inputs and the target output. The maximum CO2 adsorption on the porous biomass for the collected data was 8.23 mmol/g at 0°C and 1 bar, and the minimum was 0.24 mmol/g at 0.13 bar. The surface area is the texture feature that was described for the whole dataset. The reported surface area (SA) varies from 800 to 2436 m2/g. The average value of the pore volume (PV) is 0.77 with a standard deviation of 0.46, while the macropore volume (MV) has an average of 0.52 with a standard deviation of 0.49 cm3/g. The summarized results are depicted in Figures 3–5. The SA and PV were considerably impacted by the carbonation treatments. The authors in [35] produced coconut shell porous carbons for CO2 adsorption using KOH activation. The outcomes depicted that the SA ranges from 880 to 2687 m2/g and the pore volume ranges from 0.378 to 1.329 cm3/g. The greatest CO2 uptake of 4.257 mmol/g at 26°C and 1 bar was produced with an SA of 1478 m2/g and a pore volume of 0.67 cm3/g. These results infer that no simple path is available to produce the best porous carbons for CO2 capture from biomass wastes.

4. Experiments

Three different feature extraction algorithms were utilized to obtain the best features: the gradient decision models, random forests, and gradient boost models. The texture type, composition, and adsorption parameters are utilized as input data, and the amount of CO2 adsorbed is the target output. The experimental dataset (1400 data items) was distributed into a training subset (85%) and a validation subset (15%) at random. By measuring different parameters, the D-CNN activation functions are selected. The CPU utilized in executing the experiments is an Intel i7-8200 CPU @ 3.30 GHz, and the memory is 16.00 GB. We used MATLAB R3012a. The experiments were executed for 200 runs, and the results were averaged.

4.1. Neuron Selection

A number of activation functions have been employed in this research. The sigmoid function is utilized for the hidden layers, and the purelin transfer function is utilized for the output layers. The least mean square error value and the maximum correlation (R²) value are utilized to compute the optimum number of neurons, which ranges from 1 to 80, as depicted in Figure 6. The adsorption parameters (pressure and temperature) are selected using the Bayesian ordered function with the least mean square error values, as depicted in Table 3.

4.2. D-CNN Architecture for Adsorption

Both the machine learning and the Bayesian regression models have several hidden layers of 20 and 15 neurons and one layer of 65 neurons. The D-CNN consists of input, hidden, and output layers. The input layer utilizes the input data for the adsorption parameters and the other inputs. The number of hidden layers is selected by the precision required; in this research it is set to three layers, which achieves suitable precision. The neurons in the hidden layers are set to 15 or 20, and the sigmoid function is employed in these layers. The output layer utilizes a linear transfer function, namely, purelin.

The R² parameter for the D-CNN model is depicted in Figure 7 and is nearly one. Figure 7 exhibits that the D-CNN model outputs and the labeled adsorption amounts from the benchmark dataset are almost exactly correlated.

The machine learning model with the Bayesian regression technique for the adsorption procedure is fitted to normalize the impact. These results indicate that the predicted D-CNN model fits the labeled adsorption dataset closely. The regression correlation coefficients (R²) of the proposed model and the Bayesian regression model are 0.99989 and 0.99784, respectively. We can conclude that the developed D-CNN offers a prediction accuracy that is consistent with the benchmark labeled dataset. Based on the experimental results, the proposed model avoided being caught in local optima by altering the radial basis range function. The results have also shown that the RBF model can perform the same functions as the MLP model on most datasets.

To study the relationship of the adsorption parameters (i.e., pressure, temperature, and adsorbent amount) and to identify the impact of each variable on the adsorption amount, response-surface charts of the proposed model's predictions versus the labeled benchmark data are depicted in Figure 8. The adsorption maximum of 8.23 mmol/g occurs at 0°C and 1 bar, and the minimum of 0.24 mmol/g at 25°C and 0.13 bar; the remaining dataset statistics, including the surface area, pore volume, and macropore volume distributions and the coconut shell porous carbon results [38–41], are as reported in Section 3.4.3. The result summaries are depicted in Figure 8.

The proposed model underwent experiments and the results are summarized using the optimized D-CNN weights for adsorption rate prediction. The results are summarized in Table 4.

In the linear dependency correlation between the independent input variables, a high positive correlation was perceived among the textural features, including the surface area, pore volume, and macropore volume. The Pearson coefficient for those variables is greater than 0.736, indicating a high correlation. Only the textural properties are highly correlated; no substantial correlation was perceived for the other input variables, whose Pearson coefficients lie between −0.5 and 0.5. Table 3 represents the Pearson coefficient matrix. The shortage of correlation between the inputs aided in retaining all of them for constructing the prediction method, as every single feature contributes independently to the prediction. There is a high correlation between the textural features, and this set of inputs contained a large portion of the missing information in the raw data.

Overfitting is countered, as the 7-fold validation is reached in terms of R², thus enhancing model generalization, as depicted in Table 5.

Figure 9 displays the joint plots of the actual versus predicted cases of CO2 adsorption, as computed by the three tree models. Although the GDM and SBT presented analogous performances in training and cross-validation, the GDM outperformed the SBT with a higher test R² (0.85 versus 0.78) and a lower MAE (0.63 versus 0.70). These experimental results show that the models have broadly similar performances. In general, the GDM outperformed the SBT and GVD with less overfitting, proving its generalization competence. Our fusion matrix deep learning model outperforms the three models when they act separately.

4.3. Feature Analysis

The permutation mean decline accuracy (PMD) is utilized to determine the impact of the inputs, which comprise the compositions and textural features versus the adsorption parameters, on the output target (CO2 adsorption rate). This study was performed for the GDM model, which showed the best performance in our research. Figure 10 depicts the impact of each input on the target output. The experiment illustrates the permutation importance of each independent variable on the output. A high permutation importance value for an input indicates a reduction in the model accuracy when that factor is not employed. Thus, a factor with high permutation importance has a weighty impact on the accuracy.

5. Comparison Study and Discussion

5.1. Comparison Study

We conducted a comparative study of our model versus similar published machine and deep learning models on the collected dataset (DS). The comparison is portrayed in Table 6 in terms of the recall, precision, and F-measure metrics. The results indicate that our adsorption prediction model outperforms the other models in adsorption precision. Our system demonstrated enhancements of 6.59%, 6.59%, and 4.3% in the F-measure with respect to the other state-of-the-art adsorption prediction models. Our model's performance is found to be greater than that of its peer models.

Table 7 demonstrates the statistical metrics for the compared adsorption prediction models. Table 8 portrays the confusion matrix of the accuracy, specificity, and sensitivity for the compared adsorption prediction models for three adsorption states (high, moderate, and low) for temperature 25°C and pressure of 1 bar.

5.2. Discussion

In the experimental study, we applied the accuracy, specificity, and sensitivity metrics. The results demonstrate that by employing feature fusion, all prediction models, including our proposed model, achieved higher accuracy with respect to the same classifiers without feature fusion. The best accuracy for adsorption detection was attained by our proposed D-CNN classifier, which gained 98% accuracy, outperforming the other classifiers by about 6%.

The experiments with feature fusion indicated that suitable feature space fusion can enhance the results by a realistic margin. The accuracy outcomes of these cases are represented in Figure 11.

Figure 12 exhibits correctly predicted and incorrectly predicted cases. The results display an improvement with feature fusion. The mean square error stayed considerably lower when using feature fusion. The Kappa metric for all the compared prediction models was also better when feature fusion was incorporated. This suggests that feature fusion increases accuracy because it fuses all relevant features. Of all the compared models, our model attained the maximum improvement with feature fusion.

Table 9 depicts the comparison of the execution time of classifying CO2 adsorption with the same training dataset. Our model with feature fusion has the least prediction time (in contrast to the training time, which is longer because of the additional features incorporated in training). Model 1 is next in prediction time with feature fusion, still slower than our model by a factor of two.

6. Conclusions

In this paper, we employed a supervised deep learning model for adsorption prediction from fused adsorption features. Both the textural and compositional features of biomass porous carbon waste are utilized as inputs for the D-CNN architecture. The deep learning neural network (D-CNN) predicts the rate of CO2 adsorption on zeolites. A dataset of 1400 data items of different adsorbent rates and adsorption pressures was built and used as input for the D-CNN model. The adsorbed rate is classified and predicted by the D-CNN. The correlations (R²) for the deep learning model and the Bayesian model were 0.9998 and 0.9978, respectively. The produced weight matrix was able to predict the adsorption under diverse process settings with a high accuracy of 96.4%.

The permutation importance of the fused features yields the following observations: the pressure and temperature are the weightiest parameters impacting the model prediction accuracy; the textural features (SA, TV, MV) follow in order of declining precedence; and the compositional factors are the least important in the feature importance analysis. The significance of the features was individually observed and categorized into three classes, with the adsorption parameters being the most important.

A comparison of our model versus published deep learning models (recall, precision, F-measure, and execution time) was performed. The comparison results indicate that our adsorption prediction D-CNN model is better than the other models in adsorption precision. Our system demonstrated enhancements of 6.59%, 6.59%, and 4.3% in the F-measure versus the other models. Our model exhibits fast computation, with an average execution time of 4.2 seconds, which is at least twice as fast as all other models.

These results indicate that the adsorption parameters highly impact the CO2 adsorption rate. For example, the CO2 adsorption decreases with higher temperature and lower pressure.

Data Availability

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflict of interest to report regarding the present study.

Acknowledgments

This research was funded by Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R113), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.