#### Abstract

Gas sensors have been widely reported for industrial gas detection and monitoring. However, the rapid detection and identification of industrial gases remain a challenge. In this work, we use an electronic nose (EN) to measure four typical industrial gases, CO_{2}, CH_{4}, NH_{3}, and volatile organic compounds (VOCs), at different concentrations. To classify and identify the different industrial gases effectively, we propose an algorithm based on selective local linear embedding (SLLE) that reduces the dimensionality of the high-dimensional data and extracts its features. Combining the Euclidean distance (ED) formula with the proposed algorithm improves the classification and identification of the four gases. We compared the classification and recognition results of classical principal component analysis (PCA), linear discriminant analysis (LDA), and PCA + LDA with those of the proposed SLLE algorithm, both on the original data and after feature extraction. The experimental results show that the recognition accuracy of SLLE reaches 91.36%, which is higher than that of the other three algorithms. In addition, the SLLE algorithm responds more efficiently and accurately to high-dimensional industrial gas data. Combined with gas sensor networks, it can be used for real-time industrial gas detection and monitoring.

#### 1. Introduction

Industrial gases are widely produced in the manufacturing of industrial raw materials and can cause environmental pollution and disastrous accidents. Rapid and accurate detection and identification of industrial gases is therefore valuable. The electronic nose (EN) has developed rapidly as a novel gas detection technology over the past two decades. It is widely applied in toxic/harmful gas detection [1], food safety [2], environmental monitoring [3], the medical industry [4], and other fields [5–10].

Sensors and pattern recognition methods are hot topics in the field of electronic nose (EN) systems and identification tools. Hossein-Babaei and Hooshyar Zare reported a novel conductive polymer electrochemical sensor for the detection of hexane, benzene, and CO [11].

On the other hand, the pretreatment of odor detection signals, feature extraction, and feature space reconstruction have attracted increasing attention. A common tin dioxide gas sensor was reported to identify complex odors [12]. The approach depended on the heating function of the device, where a predetermined temperature affected the voltage pulses so that different odors produced different signals.

Furthermore, classical linear dimensionality reduction methods such as principal component analysis (PCA) and Fisher linear discriminant analysis (FLDA), combined with the extreme learning machine (ELM), were applied to the determination of Chinese liquor quality [13] and fruit freshness [14].

Artificial neural network (ANN) learning methods have performed well in assessing the quality of cigarettes and tea and the freshness of chicken [15–17]. The electronic nose (EN) system has also been used in the diagnosis of respiratory diseases [18] and the rapid identification of Chinese herbal medicines [19], and machine olfaction combined with a robot has been applied to locating local odor sources [20]. However, current reports on electronic nose detection and identification focus mainly on qualitative analysis of materials. Detecting and identifying the same substance at different concentrations is a greater challenge.

At the same time, when the electronic nose system uses an array of multiple sensors, the acquired high-dimensional raw data give rise to the “curse of dimensionality.” In this study, we propose the selective local linear embedding (SLLE) method to solve the problem of dimensionality reduction and classification of high-dimensional data in industrial gas detection. A comparison with the classification and recognition results of classical PCA, LDA, and PCA + LDA demonstrates the features and benefits of the SLLE algorithm. This provides a new method for the rapid detection and identification of industrial gases.

#### 2. Experimental and Methods

##### 2.1. Experimental System of Electronic Nose

The structure of the industrial gas detection experiment is shown in Figure 1. First, the industrial gas sample is stored in a reinforced steel bottle, and the gas flow is released by adjusting the air pressure valve. When the gas flows into the sampling interface of the electronic nose, the sensor array responds and the gas response map is recorded. The saved response map data are then used for feature extraction and analysis by the pattern recognition method, which finally outputs the identification results.

We used a dynamic cylinder formulation method for standard gas distribution: the flow rate and release time of the gas were controlled, and the gas was mixed with pure dry air. The gas samples are listed in Table 1.

In this study, we used the PEN3 electronic nose (AIRSENSE Ltd., Germany) to measure industrial gases at different concentrations. The PEN3 contains an array of 10 metal oxide sensors, as shown in Table 2.

The electronic nose technique simulates the olfactory function of an organism, transforms the multidimensional response signal into a sensory evaluation index value, and produces an intelligent interpretation of the qualitative and quantitative analysis results of the odor.

The detailed information of the selected electronic nose and its technical specifications is presented in Supplementary Materials Table S1.

We measured the gas samples at room temperature and other room conditions (the temperature of °C, relative humidity of %). The sampling period is 120 seconds, and the intake air flow during sampling is set to 7.76 ml/s.

##### 2.2. Sensor Acquisition Signal

Using the electronic nose (EN), we obtained typical response maps of the sensor array for the four gases at a concentration of 300 ppm, as shown in Figure 2.

**Figure 2:** Sensor array response maps for the four gases at 300 ppm, panels (a)–(d).

Figure 2 shows the response values of the 10 sensors to carbon dioxide (CO_{2}), methane (CH_{4}), ammonia (NH_{3}), and volatile organic compounds (VOCs) over time. The response values between 50 and 100 seconds differ significantly among the four gases. The sensor array can therefore extract odor information effectively and distinguish the four gases. Using the electronic nose system to build odor libraries of different gases can thus help to identify multiphase gases and complex gases.
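The paper does not state how the response curves are turned into feature vectors. As one plausible illustration only, the sketch below averages each sensor's response over the 50 to 100 second window in which the gases differ most. The function name `window_mean_features`, the 1 Hz sample rate, and the synthetic recording are all assumptions made for this example.

```python
def window_mean_features(response, t_start=50, t_end=100, sample_rate_hz=1.0):
    """Average each sensor's response over [t_start, t_end) seconds.

    response: list of rows, one per time step; each row holds one value per sensor.
    The sample rate is an assumption; the source only gives a 120 s sampling period.
    """
    i0 = int(t_start * sample_rate_hz)
    i1 = int(t_end * sample_rate_hz)
    window = response[i0:i1]
    if not window:
        raise ValueError("window is empty; check the time range and sample rate")
    n_sensors = len(window[0])
    return [sum(row[s] for row in window) / len(window) for s in range(n_sensors)]


# Hypothetical 120 s recording at 1 Hz from a 10-sensor array (not measured data):
# sensor s reads 1.0 outside the window and (s + 1) inside it.
demo = [[float(s + 1) if 50 <= t < 100 else 1.0 for s in range(10)]
        for t in range(120)]
features = window_mean_features(demo)
```

Each recording then yields one 10-dimensional feature vector, which is the kind of input the dimensionality reduction methods in Section 2.3 operate on.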

##### 2.3. Data Processing Method

###### 2.3.1. An Overview of PCA

Principal component analysis (PCA) is one of the most widely used linear dimensionality reduction methods. Its key idea is to use variance as the measure of information: the greater the variance, the more information a component provides, and vice versa. PCA forms linear combinations of the original components and keeps those with large variance, the principal components, which carry most of the information, thereby reducing the data dimension. The PCA calculation can be carried out by matrix singular value decomposition (SVD).
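The SVD route to PCA can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study; the function name `pca_svd` and the random 10-sensor data are assumptions.

```python
import numpy as np


def pca_svd(X, n_components=2):
    """Project the rows of X (samples x features) onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (sensor channel)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S**2 / (X.shape[0] - 1)  # sample variance along each direction
    scores = Xc @ Vt[:n_components].T        # low-dimensional coordinates
    return scores, explained_var[:n_components]


# Hypothetical data: 6 samples from a 10-sensor array (random, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 10))
scores, var = pca_svd(X_demo, n_components=2)
```

The returned variances make the "more variance, more information" criterion explicit: components are kept in order of decreasing variance.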

###### 2.3.2. An Overview of LDA

Linear discriminant analysis (LDA) is a generalization of Fisher’s linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before the classification.
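As a hedged illustration of how such a linear combination can be found, the sketch below computes multi-class Fisher directions from the within-class and between-class scatter matrices. The function name `lda_directions`, the use of a pseudo-inverse, and the synthetic three-class data are assumptions for this example, not details from the study.

```python
import numpy as np


def lda_directions(X, y, n_components=2):
    """Return projection directions that maximize between-class over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of pinv(Sw) @ Sb; the pseudo-inverse is an assumption that
    # keeps the sketch usable when Sw is singular.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_components]]


# Hypothetical, well-separated classes in 4 dimensions (synthetic data)
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [6, 0, 0, 0], [0, 6, 0, 0]], dtype=float)
X_demo = np.vstack([m + 0.5 * rng.normal(size=(30, 4)) for m in means])
y_demo = np.repeat([0, 1, 2], 30)
W_lda = lda_directions(X_demo, y_demo, n_components=2)
Z_demo = X_demo @ W_lda  # projected samples
```

With C classes, at most C − 1 directions carry discriminative information, which is one reason the projection space of LDA is limited.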

###### 2.3.3. Selective Local Linear Embedding (SLLE) Method

Local linear embedding (LLE) is an important dimensionality reduction method. Compared with traditional principal component analysis (PCA) and linear discriminant analysis (LDA), LLE focuses on the linear features of the local area around each sample when the dimensionality is reduced. It divides a complex nonlinear region into multiple small, locally linear areas, so that the data after dimensionality reduction retain more of the local linear features of the original data. However, the LLE algorithm has some shortcomings. For example, it is sensitive to unevenly distributed training samples, which affects the distribution of the feature space after dimensionality reduction.

In this study, selective local linear embedding (SLLE) is proposed, in which each sample point is reconstructed by a linear combination of multiple neighbor points in its local region. Once the output dimensionality is selected, the optimal low-dimensional feature space is obtained by minimizing the reconstruction error of the sample points in the low-dimensional space while keeping the weights of each local neighborhood unchanged. The SLLE algorithm and its steps are described in Figure 7.

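In Step 1, the spatial distance between each sample point and every other sample point is calculated, and the $k$ nearest sample points are taken as the nearest neighbors of that sample point. The distance formula is as follows:

$$d_{ij} = \lVert x_i - x_j \rVert_2, \tag{1}$$

where $x_i$ and $x_j$ ($i, j = 1, 2, \ldots, N$) are sample points.

In Step 2, the error function is defined as follows:

$$\varepsilon(W) = \sum_{i=1}^{N} \Bigl\lVert x_i - \sum_{j=1}^{k} w_{ij} x_{ij} \Bigr\rVert^2, \tag{2}$$

where $x_{ij}$ ($j = 1, 2, \ldots, k$) are the $k$ neighbors of $x_i$. $w_{ij}$ denotes the weighting coefficient of the $j$th nearest neighbor when reconstructing $x_i$ from the linear combination of its $k$ nearest neighbors, satisfying the following equation:

$$\sum_{j=1}^{k} w_{ij} = 1. \tag{3}$$

Then (2) is equivalent to

$$\varepsilon(W) = \sum_{i=1}^{N} \Bigl\lVert \sum_{j=1}^{k} w_{ij} (x_i - x_{ij}) \Bigr\rVert^2. \tag{4}$$

Let $W_i = (w_{i1}, w_{i2}, \ldots, w_{ik})^T$; it represents the local reconstruction weight vector for the $i$th sample point. Then, with $D_i = (x_i - x_{i1}, x_i - x_{i2}, \ldots, x_i - x_{ik})$ collecting the difference vectors as columns,

$$\varepsilon(W) = \sum_{i=1}^{N} \lVert D_i W_i \rVert^2 = \sum_{i=1}^{N} W_i^T D_i^T D_i W_i. \tag{5}$$

Let $Z_i = D_i^T D_i$, and then

$$\varepsilon(W) = \sum_{i=1}^{N} W_i^T Z_i W_i. \tag{6}$$

Equation (6) can be solved by the Lagrange multiplier method. For each sample point,

$$L(W_i) = W_i^T Z_i W_i + \lambda_i \bigl(\mathbf{1}_k^T W_i - 1\bigr), \tag{7}$$

where $\mathbf{1}_k$ is the $k$-dimensional vector of ones. Taking the partial derivative of (7) with respect to $W_i$, setting it to zero, and applying constraint (3), we can solve for $W_i$:

$$W_i = \frac{Z_i^{-1} \mathbf{1}_k}{\mathbf{1}_k^T Z_i^{-1} \mathbf{1}_k}. \tag{8}$$

In Step 3, the output matrix $Y = (y_1, y_2, \ldots, y_N)$ of reduced dimension $d$ is calculated. The matrix should satisfy the following conditions:

$$\sum_{i=1}^{N} y_i = 0, \qquad \frac{1}{N} \sum_{i=1}^{N} y_i y_i^T = I. \tag{9}$$

Then,

$$\Phi(Y) = \sum_{i=1}^{N} \Bigl\lVert y_i - \sum_{j=1}^{k} w_{ij} y_{ij} \Bigr\rVert^2 = \operatorname{tr}\bigl(Y (I - W)(I - W)^T Y^T\bigr), \tag{10}$$

where $y_{ij}$ is the low-dimensional image of $x_{ij}$ and $W$ is the $N \times N$ matrix whose $i$th column holds the weights $w_{ij}$ in the rows that index the neighbors of $x_i$ and zeros elsewhere.

Let $M = (I - W)(I - W)^T$; then $\Phi(Y) = \operatorname{tr}(Y M Y^T)$.

Repeating the solution process of Step 2 gives

$$M Y^T = Y^T \Lambda, \tag{11}$$

where $\Lambda$ is a diagonal matrix determined by the Lagrange multipliers, so the columns of $Y^T$ are eigenvectors of $M$.

Finally, we select the eigenvectors corresponding to the $d$ smallest nonzero eigenvalues of the matrix $M$ as the dimensionality reduction output. To minimize the loss function value, the eigenvalues of $M$ are calculated and sorted in ascending order. Because the smallest eigenvalue is zero and its eigenvector is constant, the eigenvalues from the 2nd to the $(d+1)$th are selected, and the corresponding eigenvectors form the output of the dimensionality reduction.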
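The three steps above (neighbor search, reconstruction weights, eigen-decomposition) can be sketched in NumPy as follows. This is a sketch of standard LLE under stated assumptions, not the authors' selective variant, whose selection rule is not given in the text. The function name `lle`, the regularization term `reg`, and the random test data are assumptions for this example.

```python
import numpy as np


def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Standard LLE: returns the embedding (N x d) and the N x N weight matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: pairwise Euclidean distances, then the k nearest neighbors of each
    # sample (column 0 of the sort is the sample itself; assumes no duplicates).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    # Step 2: reconstruction weights, stored column-wise (column i belongs to x_i).
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        D = X[i] - X[neighbors[i]]  # rows are the differences x_i - x_ij
        Z = D @ D.T                 # local Gram matrix (k x k)
        # Regularization (an assumption): Z is singular when k exceeds the
        # input dimension, so a small multiple of the identity is added.
        Z = Z + reg * (np.trace(Z) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(Z, ones)
        W[neighbors[i], i] = w / w.sum()  # enforce the sum-to-one constraint
    # Step 3: eigenvectors of M = (I - W)(I - W)^T with the smallest eigenvalues.
    I = np.eye(n)
    M = (I - W) @ (I - W).T
    evals, evecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    return evecs[:, 1:n_components + 1], W    # skip the constant eigenvector


# Hypothetical data: 40 samples from a 10-sensor array (random, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 10))
Y, W = lle(X_demo, n_neighbors=8, n_components=2)
```

The cost of Step 3 is dominated by the eigen-decomposition of the $N \times N$ matrix, while the cost of Step 2 grows with the number of neighbors because each sample requires solving a $k \times k$ linear system.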

###### 2.3.4. Euclidean Distance (ED) Formula

In this study, the Euclidean distance (ED) formula was used to classify the testing samples. The calculation was based on the two-dimensional sample feature space using the following formula:

$$ED = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.$$

The absolute distance value is an effective reference for discriminating the classification, where $x_1$ and $y_1$ are the two-dimensional feature distributions of the testing samples and $x_2$ and $y_2$ are the two-dimensional feature distributions of the training samples.
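A minimal sketch of ED-based classification in the two-dimensional feature space follows. The text does not state the exact decision rule, so assigning the label of the nearest training sample is an assumption here, as are the function name `classify_by_ed` and the made-up feature coordinates.

```python
import math


def classify_by_ed(test_point, train_points, train_labels):
    """Return the label and distance of the training sample closest to test_point."""
    best_label, best_dist = None, math.inf
    for (x2, y2), label in zip(train_points, train_labels):
        # Euclidean distance between the test and training features
        d = math.sqrt((test_point[0] - x2) ** 2 + (test_point[1] - y2) ** 2)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical 2D features (not measured data), one training sample per gas
train_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 4.0)]
train_labels = ["CO2", "CH4", "NH3", "VOCs"]
label, dist = classify_by_ed((0.9, 0.2), train_points, train_labels)
```

A nearest-centroid rule, which compares the test point with the mean feature of each class, would be an equally plausible reading of the text and is less sensitive to individual outliers.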

#### 3. Results and Discussion

To show the characteristics of the proposed method in a concise and intuitive manner, we selected a total of 124 samples (31 groups per class) of the four measured gases to train the classifier. The gas concentration measured by the electronic nose reached 300 ppm. First, we present the detailed training classification results for the four sample gases obtained with PCA, LDA, and PCA + LDA before and after feature extraction. We then compare the training classification results of the SLLE algorithm with these results.

##### 3.1. Classification Results before Feature Extraction

The classification results of steady-state PCA, steady-state LDA, and steady-state PCA + LDA before feature extraction are shown in Figures 3(a), 3(b), and 3(c), respectively. The four gas samples cannot be clearly classified based on the original data. According to the sensor variance analysis, many data points overlap in the two-dimensional distribution space.

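**Figure 3:** Classification results before feature extraction: (a) steady-state PCA, (b) steady-state LDA, (c) steady-state PCA + LDA.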

##### 3.2. Classification Results after Feature Extraction

Figure 4 shows the classification results of the three methods after feature extraction. Figure 4(a) shows the PCA classification results for the four gas samples. Black, red, green, and blue crosses represent CO_{2}, CH_{4}, NH_{3}, and VOCs, respectively. Because the within-class aggregation is poor, the samples are scattered in the space, and some samples are even misclassified. Thus, PCA is not the optimal method for this study.

**Figure 4:** Classification results after feature extraction: (a) PCA, (b) LDA, (c) PCA + LDA.

Figure 4(b) shows the classification of the four gases by LDA. The classification is significantly better than that of PCA. However, because the projection space is limited, when the sample data are highly similar, the interclass distance is small and misjudgment is likely.

Figure 4(c) shows the classification results of PCA + LDA. PCA is first used to reduce the dimensionality of the data, and LDA then optimizes the classification projection space for the reduced data. However, because of the high overlap and similarity of the original data samples, the classification space of the four gas samples is improved but still not optimal.

##### 3.3. Classification Results of SLLE

The dimensionality reduction and classification results for the four industrial gases obtained with the SLLE algorithm are shown in Figure 5.

**Figure 5:** Classification results of SLLE with the number of nearest neighbors set to (a) 10, (b) 20, and (c) 30.

Figures 5(a), 5(b), and 5(c) show the classification results when the selected number of nearest neighbors is 10, 20, and 30, respectively. The spatial distribution of the four gas samples is significantly better than that in Figures 3 and 4. Because the SLLE algorithm performs locally linear dimensionality reduction of high-dimensional complex data, it can maintain the spatial features of the data and facilitate feature reconstruction.

Furthermore, we can classify high-dimensional, nonlinear, and highly similar sample data by selecting the neighborhood of the sample and the dimension of the output sample. However, the method also has a significant drawback. As the number of nearest neighbors selected in the sample space increases, the computational complexity also increases, and the computational complexity is (equivalent to matrix calculation).

##### 3.4. Comparison of Recognition Rate of Four Kinds of Algorithms

The classification training on the data of the four sample gases has been presented earlier in Section 3. The parameters of the test recognition results are shown in Table 3.

In Table 3, data sets of 168, 124, 95, and 84 groups of the four gas samples were used for training, and 115, 80, 64, and 81 groups were used for testing, respectively.

The accuracy for the four gas samples is shown in Figure 6. The test accuracy ranges from 73.43% to 91.36%. In detail, the test accuracy of SLLE is close to 90%, followed by LDA and PCA + LDA. The test accuracy of PCA does not exceed 78%. Therefore, the SLLE algorithm proposed in this paper performs best of the four algorithms on the test data of the four sample gases.
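For clarity, recognition accuracy here is taken to mean the percentage of test groups whose predicted gas matches the true gas; this definition is an assumption, since the text does not spell it out. A minimal sketch with made-up labels (not the data behind Table 3):

```python
def recognition_accuracy(predicted, actual):
    """Percentage of test samples whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for five test groups; one CO2 group is misread as CH4
actual = ["CO2", "CO2", "CH4", "NH3", "VOCs"]
predicted = ["CO2", "CH4", "CH4", "NH3", "VOCs"]
acc = recognition_accuracy(predicted, actual)
```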

#### 4. Conclusion

In this paper, we proposed the selective local linear embedding (SLLE) algorithm to reduce the dimension of the collected gas data for the identification of industrial gas samples. The features extracted from the high-dimensional data maintain the original topology and have larger interclass spacing and smaller intraclass spacing. Compared with traditional algorithms such as PCA and LDA, SLLE performs better in distinguishing between different gases.

Therefore, the SLLE algorithm, with its low degree of freedom and simple procedure, can enhance the detection accuracy for the selected gas samples, which gives the machine olfaction system higher sensitivity. The SLLE algorithm is an effective method to quantitatively analyze various mixed gases, and it can meet the requirements of multimode gas-state identification. The outcome of the proposed approach can add more value to applications in the field of industrial gas monitoring and safety.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61571140), Guangdong Provincial Science and Technology Foundation of China (Grant nos. 2016B030303011, 2017A010101032), Guangdong Provincial Natural Science Foundation (Grant no. 2017A030310071), and Guangzhou Science and Technology Foundation of Guangdong Province (Grant no. 201607010247).

#### Supplementary Materials

Two tables and three figures covering more information on the following subjects are included: the detailed information of the employed PEN3 electronic nose and its technical specifications; full results for low concentrations of 10 ppm and 30 ppm for the selected gases; and classification results for low gas concentrations based on Euclidean distance (ED).