Abstract

Fourier-transform infrared (FTIR) spectroscopy is a rapid and nondestructive technology for monitoring atmospheric quality. The identification of each component from the FTIR spectra is a prerequisite for the accurate quantitative analysis of gaseous pollutants. Because the absorption peaks of different gases overlap and water vapor interferes with actual measurements, existing methods for identifying gas spectra suffer from low identification rates and cannot support real-time online analysis in atmospheric quality monitoring. In this work, independent component analysis (ICA) is applied to the spectral separation of heavily overlapped spectra of gaseous pollutants. The proposed method is validated by analyzing mixture spectra obtained in the laboratory and actual atmospheric spectra collected from a stationary source. The average time consumption of the separation process is less than 0.2 seconds, and the identification rate of the experimental gases reaches 100%, as shown by the results of peak searching and the analysis of the correlation coefficient between the separated spectra and the standard spectral database. The identification results for the actual atmospheric spectra demonstrate that the proposed method can effectively identify the gaseous pollutants whose concentrations change in the measured spectra, and that it is a promising qualitative spectral analysis tool that can shorten the identification time and increase the identification rate. Therefore, this method can be a useful alternative to traditional qualitative identification methods for real-time online atmospheric pollutant detection.

1. Introduction

Air is fundamental for living organisms on the Earth, and the quality of the atmospheric environment is closely related to human activities. With the development of modern industrialization and the improvement of human living standards, the combustion of fossil fuels and the emission of automobile exhaust cause severe atmospheric pollution, which has raised growing concern in recent years. Typical gaseous pollutants include nitrogen oxides (NOX), sulfur oxides (SOX) [1], volatile organic compounds (VOCs) [2], and ozone (O3) [3]. These gaseous pollutants exist as trace gases in the atmosphere but have detrimental effects on human health and the environment [4]. The health effects include cancer, increased hospitalizations, and reduced life expectancy from heart and lung diseases. For example, even short exposure to SO2 can cause serious respiratory illnesses, such as asthma and bronchitis [5]. Moreover, two-thirds of VOCs are carcinogenic and harmful to the central nervous system and hematopoietic system, especially the immune system [6]. The environmental effects are mainly reflected in the greenhouse effect, ozone layer destruction, and acid rain [7]. As incidents caused by atmospheric pollution become increasingly serious, local governments are paying more attention to atmospheric pollution, and laws and regulations aimed at controlling it have been promulgated. Despite this, some industrial chemical parks still secretly discharge toxic, high-concentration exhaust gases directly into the air from time to time. Therefore, the establishment and improvement of a real-time continuous gaseous pollutant monitoring system that can confirm the source and concentration of specific pollutants are essential for the control of atmospheric pollution.

Advanced online atmospheric monitoring technologies include gas chromatography–mass spectrometry (GC-MS) [8], proton transfer reaction mass spectrometry (PTR-MS) [9], and chemical ionization reaction mass spectrometry (CIRMS) [10, 11]. Among these, GC-MS is most widely used in commercial monitoring instruments, and it can rapidly identify and quantify pollutants from atmospheric samples [12]. PTR-MS and CIRMS are improvements of mass spectrometry and have the advantages of high sensitivity and continuous detection. Although these technologies have been proven to detect gaseous pollutants accurately, their implementation is complex and requires reagents. Because it is reagent free and nondestructive and provides continuous and simultaneous multicomponent measurement, optical detection has become a new trend in the development of atmospheric monitoring [13].

Fourier-transform infrared (FTIR) spectroscopy is a powerful spectral detection technology that has been recommended by the US Environmental Protection Agency for monitoring air pollutants. FTIR spectra are composed of absorption peaks generated by infrared radiation absorption during the vibrational transitions of polyatomic molecules with asymmetric dipole moments, and a wide variety of gaseous pollutants can be measured by FTIR technology owing to their physical structures. FTIR has high sensitivity, permitting the detection of changes in gas concentration at the ppb (parts per billion, volume concentration) level. Open-path FTIR has been applied to the online detection of trace gas emissions from forest fires [14] and soil during the spring thaw [15], and from traffic emissions along highways [16] and in harbors [17]. The identification of each component in the FTIR spectrum is crucial for accurate quantitative analysis. In the early stage of FTIR spectrum identification, library searching (LS) methods [18] were adopted when the spectrum contained a single component or the absorption peaks rarely overlapped. Lavine et al. [19] used pattern recognition methods as search prefilters for library searching and made the identification more precise and faster. Owing to the strong curve-fitting and anti-interference abilities of the artificial neural network (ANN), a variety of ANN-based FTIR spectrum identification algorithms have been proposed. Yang and Griffiths [20] encoded reference FTIR spectra of five alcohols as prototype vectors for the Hopfield network and identified each of the five alcohols correctly from more than 100 spectra of different compounds through this network. At present, the most common methods used in practice combine calibration spectra from a spectral database with chemometric methods for quantitative as well as qualitative analyses. Griffith [21] calculated the calibration spectra from the high-resolution transmission molecular absorption (HITRAN) database and retrieved the gas information from FTIR spectra by the nonlinear least squares algorithm. The traditional component identification method for open-path FTIR spectra is manual identification, in which a professional spectroscopist determines the possible components contained in the observed spectrum and then subtracts the corresponding components from the observed spectrum until a relatively flat baseline is produced [22]. These qualitative identification methods and their improvements can effectively identify spectra in specific situations; however, in cases of serious spectral overlap, small sample size, absence of reference spectra, and high real-time requirements, it is difficult to accurately recognize trace gases from FTIR spectra.

Independent component analysis (ICA) is a higher-order statistical model–based signal processing method that can separate statistically independent source signals from a set of observed signals without any prior knowledge [23]. ICA has been widely applied to blind source separation (BSS) fields such as speech separation [24], biomedical signals analysis [25], and facial recognition [26, 27]. As a novel unmixing method that can recover the original information from observed overlapping mixtures, ICA has attracted great interest in spectral analytical chemistry [28]. Pasadakis and Kardamakis [29] used ICA to identify constituents in commercial gasoline from its FTIR spectra. Al-Mbaideen and Benaissa [30] investigated ICA combined with multiple linear regression to determine the concentration of glucose from its near-infrared spectra. Yu et al. [31] utilized ICA and wavelet analysis to discriminate the seriously overlapped three-dimensional fluorescence spectra of water pollutants. Monakhova et al. [32] presented the effectiveness of four different ICA deconvolution algorithms for overlapped NMR spectra of eight component mixtures of foods and related products. The studies showed the accuracy and robustness of ICA method in recovering overlapped spectra. However, due to the complexity of atmospheric gaseous compositions and water vapor interference in the actual measured spectra, there are few studies where ICA is applied to the component identification of atmospheric gaseous pollutant spectra.

2. Materials and Methods

In this study, we conducted identification experiments on spectra collected both in the laboratory and in the actual atmosphere. The infrared spectra were collected with an extractive FTIR gas spectrometer independently developed by the Anhui Institute of Optics and Fine Mechanics, CAS, which contains a multipass gas cell with a 10 m optical path. The spectrometer has a measuring bandwidth of 500 cm−1–5000 cm−1 and a spectral resolution of 1 cm−1. The output spectrum of the spectrometer is the average of 16 scans.

2.1. Materials

Methane (CH4), ethanol (C2H5OH), ethyne (C2H2), and ethylene (C2H4) are common air pollution gases. Because of their low cost, simple preparation, and seriously overlapping infrared absorption bands, we selected these four gases as the experimental gases. The infrared spectra of the experimental gases from the standard spectral database are shown in Figure 1; as the figure shows, the absorption bands of the experimental gases overlap heavily in the wavebands of 860–1500 cm−1 and 2800–3200 cm−1. The pure gases were produced by Hefei Ningte Gas Company and separately stored in 4 L sealed air cylinders. The initial concentrations of the four gases were 65 ppm (CH4), 207 ppm (C2H5OH), 506 ppm (C2H2), and 609 ppm (C2H4). We obtained target gases of different concentrations through a four-channel high-precision gas distribution system independently developed by the Hefei Institute of Physical Science, Chinese Academy of Sciences (CAS). The gas distribution platform can simultaneously distribute four gases and reach the required concentration by adjusting the volume ratio of the auxiliary gas (nitrogen) to the pure gas. In total, we obtained 12 samples of mixed gases. The concentration changes of the four gases across the samples are mutually independent, which is a necessary condition for ICA separation. To simulate the composition of pollutants in the atmosphere, the concentrations of the gases were kept below 60 ppm.

2.2. FTIR Spectra Acquisition of Experimental Gases

Before spectra acquisition, the gas storage cylinders were placed in a ventilation room to prevent injury caused by dangerous gas leakage, the experimental system and the connecting pipes were completely sealed, and a tightness test was performed. The four gases and nitrogen were set to the required concentrations by the software built into the gas distribution platform. The spectra were recorded when the gas flow and barometric pressure were stable. The experimental setup for gas mixture spectra acquisition is shown in Figure 2. Spectra obtained from the spectrometer include not only the absorption information of the target gases but also random noise; in this experiment, a Savitzky-Golay smoothing filter was used to denoise the raw spectra. Figure 3 shows the denoised spectra of the 12 groups of mixed experimental gases.
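To make the denoising step concrete, the following is a minimal sketch of Savitzky-Golay smoothing. The window length (5 points) and polynomial order (quadratic) are assumptions chosen for illustration, since the text does not state the filter settings used in the experiment, and the function name `savgol5` is hypothetical.

```python
# 5-point quadratic Savitzky-Golay smoothing weights (they sum to 1).
# Window length and polynomial order are illustrative assumptions.
SG5_WEIGHTS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The two points at each edge are returned unchanged, which keeps the
    sketch short; library implementations usually fit the edges as well.
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window))
    return smoothed
```

Because the filter fits a local polynomial, it leaves smooth (up to cubic) trends unchanged while attenuating isolated spikes, which is why it tends to preserve the shape of absorption peaks better than a plain moving average.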

In addition to the absorption peaks of the experimental gases, water vapor and carbon dioxide also have strong absorption bands in the raw spectra. H2O (g) and CO2 produce serious interference in the analysis of atmospheric infrared spectra. As Figure 3 shows, H2O (g) has strong and wide absorption peaks in the ranges of 1350 cm−1–1800 cm−1 and 3500 cm−1–3900 cm−1, and CO2 has strong absorption peaks in the range of 2280 cm−1–2385 cm−1. When FTIR is used to analyze gaseous components, the common way to avoid the interference of H2O (g) is to analyze the gaseous components in the "infrared atmospheric window", which is far away from the H2O (g) absorption wavebands. Based on this idea, we removed the strong absorption bands of H2O (g) and CO2 from the spectra and joined the remaining wavebands together. Eliminating the interference of water vapor and carbon dioxide before the identification procedure greatly reduces the complexity of the analysis.
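The band removal described above can be sketched as follows. The band limits are the ones quoted in the text; the function name `remove_bands` and the choice to treat the band edges as inclusive are assumptions made for this illustration.

```python
# Interference bands quoted in the text, in cm^-1:
# two H2O (g) bands and one CO2 band.
INTERFERENCE_BANDS = [(1350, 1800), (3500, 3900), (2280, 2385)]


def remove_bands(wavenumbers, values, bands):
    """Drop every point whose wavenumber lies inside an excluded band
    (edges inclusive) and join what is left into one continuous sequence."""
    kept_wn, kept_values = [], []
    for wn, value in zip(wavenumbers, values):
        if not any(low <= wn <= high for low, high in bands):
            kept_wn.append(wn)
            kept_values.append(value)
    return kept_wn, kept_values
```

Keeping the surviving wavenumbers alongside the values makes it possible to restore the separated spectra to their original waveband positions later on.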

2.3. FTIR Spectra Acquisition of the Actual Atmosphere

The atmospheric spectra were measured on the roof of a three-story student apartment building at the Hefei University of Technology, which is located near the center of the city. There is no large-scale chemical industry source nearby, and the atmospheric gaseous pollutants mainly come from the exhaust gas of motor vehicles. Vehicle exhaust contains a variety of toxic and harmful gases, including CO, CO2, and VOCs such as benzene, alkenes, and aromatic hydrocarbons. Because these VOCs all contain C-H bonds, they have obvious absorption peaks in the range of 2800 cm−1–3200 cm−1, and these absorption peaks are similar in spectral profile; hence, it is difficult for traditional identification methods to distinguish the individual components in the overlapped spectrum. In this study, spectral data in the region from 2800 cm−1 to 3200 cm−1 were used for component identification. Figure 4 shows one of the measured atmospheric spectra.

Figure 5 shows 12 measured spectra in the range of 2800 cm−1 to 3200 cm−1. We can see that the actual atmospheric spectra have a relatively small variation in absorption amplitude, and it is difficult to distinguish the specific gases from the absorption peaks except methane.

2.4. Background Spectra Elimination

The background spectrum is caused by the response of the infrared detector to external radiation. It changes relatively slowly and can be regarded as a slowly varying baseline superimposed on the absorption spectrum. In FTIR applications, the background spectrum can be obtained by measuring the spectrum without the absorption of the target component. This is easy to achieve under laboratory conditions but difficult when monitoring pollution gases by open-path FTIR. To suppress the effect of the background spectra, Monte Carlo simulation is performed to generate new spectral samples from the experimental spectra: two spectra are randomly selected, and one is divided by the other to produce a new spectrum. For the 12 groups of mixed gas spectra obtained in the laboratory, there are 66 different pairs and therefore 66 different spectra in total; however, 12 spectra are enough to form a sample set. Finally, 12 spectra of the experimental gases and 12 spectra of the actual atmosphere are obtained by the background spectra elimination method. Unlike background removal in the ordinary FTIR process, we do not seek the pure background, and the new spectra not only eliminate the nonlinear part of the background but also keep the useful information of the target gases.
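The pairwise division can be sketched as below. This is a simplified illustration under the assumption that each measured spectrum is the product of a common background and a sample-specific transmittance; the function names and the fixed random seed are hypothetical.

```python
import random
from itertools import combinations


def ratio_spectra(spectra):
    """Return the point-wise ratio of every unordered pair of spectra.

    For n spectra this yields n * (n - 1) / 2 ratio spectra, so 12 input
    spectra give 66 candidates.
    """
    return [
        [a / b for a, b in zip(first, second)]
        for first, second in combinations(spectra, 2)
    ]


def sample_ratio_spectra(spectra, count, seed=0):
    """Randomly draw `count` distinct ratio spectra (seed is an assumption
    added here so that the draw is reproducible)."""
    rng = random.Random(seed)
    return rng.sample(ratio_spectra(spectra), count)
```

Under the stated assumption, any factor shared by both spectra of a pair cancels in the ratio. That covers the common background, but it also covers the absorption of any gas whose concentration is the same in both spectra, so only components whose concentrations differ between the two samples survive the division.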

2.5. ICA

ICA is a powerful signal processing tool that can restore the original source signals from a set of observed mixed signals. It aims to find a separation matrix that transforms the observed signals into mutually independent components that are close to the source signals.

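The noise-free ICA model can be expressed by the following linear relationship:

$$\mathbf{x} = \mathbf{A}\mathbf{s}$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$ denotes the observed m-dimensional signal vector, $\mathbf{s} = [s_1, s_2, \ldots, s_n]^T$ is an n-dimensional source component vector, and $\mathbf{A}$ denotes the unknown $m \times n$ mixing matrix. ICA attempts to find the optimal separation matrix $\mathbf{W}$ to make the components of $\mathbf{y} = \mathbf{W}\mathbf{x}$ as independent as possible.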

Because the source signals and mixing information are unknown, the implementation of ICA requires two important restrictions [33]:
(1) The components of the source signal need to be statistically independent.
(2) At most one component of the source signal obeys the Gaussian distribution.

The concentration and existence of different gaseous pollutants in the atmosphere rarely appear in proportion to each other, and the spectral signal is not subject to the Gaussian distribution. Therefore, ICA can be used in the separation of spectral signals of atmospheric gaseous pollutants.

Among the various ICA approaches, fixed-point ICA (FastICA) has become one of the most classical algorithms because of its fast operation speed and high accuracy. FastICA was proposed by Hyvärinen and Oja [34] in 1997, and it adopts a fixed-point iteration scheme that makes the separation process 10 to 100 times faster than traditional ICA optimization algorithms. Traditional ICA optimization is based on stochastic gradient methods, which can be implemented by neural network learning [35]. The main problems of neural network learning are slow convergence and strong dependence on the choice of the learning rate. In contrast, FastICA has cubic convergence and does not rely on any user-defined parameters. In this study, FastICA was used as the ICA approach for the rapid spectra separation process.
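For readers who want to see the shape of the fixed-point iteration, the following NumPy sketch implements one common FastICA variant. It is not the implementation used in this study: the tanh nonlinearity, the symmetric decorrelation step, the PCA whitening, and all parameter defaults are assumptions chosen for illustration.

```python
import numpy as np


def fastica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity.

    X has shape (n_observations, n_points), one mixed signal per row.
    Returns an array of shape (n_components, n_points) holding the
    estimated sources, recovered only up to order, sign, and scale.
    """
    rng = np.random.default_rng(seed)

    # Center each observed signal, then whiten with PCA.
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    whitening = (eigvecs[:, top] / np.sqrt(eigvals[top])).T
    Z = whitening @ Xc  # whitened data, shape (n_components, n_points)

    def decorrelate(W):
        # Symmetric orthogonalization: W <- (W W^T)^(-1/2) W
        s, u = np.linalg.eigh(W @ W.T)
        return (u / np.sqrt(s)) @ u.T @ W

    W = decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        G_prime = 1.0 - G ** 2
        # Fixed-point update for every row w of W:
        #   w <- E[z g(w^T z)] - E[g'(w^T z)] w
        update = G @ Z.T / Z.shape[1] - G_prime.mean(axis=1, keepdims=True) * W
        W_new = decorrelate(update)
        # Converged when each new row is parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z
```

In the setting of this paper, each row of `X` would be one absorbance spectrum and `n_components` the estimated number of gases. Note that the update contains no learning rate, which is the practical difference from gradient-based ICA mentioned above.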

3. Results and Discussion

3.1. Establishment of the Linear Model

ICA can efficiently separate the original signals from a mixed system only when the mixing is linear, so we need to confirm that the system satisfies the linear mixture model before the analysis. The Beer–Lambert law and absorbance additivity describe the relationship between the absorption intensity of multicomponent spectra and the concentration of each light-absorbing substance. The basic principle can be described as follows:

$$A(\lambda) = \sum_{i=1}^{n} a_i(\lambda)\, c_i\, L$$

where $A(\lambda)$ is the total absorption intensity of the spectrum at a certain wavelength $\lambda$, $a_i(\lambda)$ is the absorption coefficient of component $i$ at wavelength $\lambda$, $c_i$ is the concentration of component $i$, and $L$ is the optical path length. According to absorbance additivity, the absorbance of a multicomponent spectrum is the sum of the absorbances of the individual components, and a change in the concentration of a component changes only the absorbance amplitude, not the positions of the absorption peaks or the spectral profile. It follows that the absorption intensity of atmospheric spectra conforms to the linear model, and the ICA algorithm can be used to separate the spectra of mixed gases.

In Section 2, we obtained the transmittance spectra from the FTIR spectrometer. The transmittance information should be converted into absorbance information so that the separation algorithm can analyze the experimental data. The Beer–Lambert law also states the relationship between transmittance $T$ and absorbance $A$:

$$A = \log_{10}\frac{1}{T} = -\log_{10} T$$

Thus, we get the absorption spectra by taking the logarithm of the reciprocal transmittance data. The calculated 12 absorption spectra of experimental gases are shown in Figure 6, which will be used as the input of the separation algorithm after the discrete wavebands are connected to each other. The calculated 12 absorption spectra of actual atmosphere in waveband of 2800 cm−1–3200 cm−1 are shown in Figure 7, which can be directly used as the input of the separation algorithm.
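The conversion from transmittance to absorbance can be sketched in a few lines. The base-10 logarithm follows the usual absorbance convention and is an assumption here, as is the small `floor` value that guards against taking the logarithm of zero in saturated regions.

```python
import math


def absorbance(transmittance, floor=1e-6):
    """Convert transmittance values in (0, 1] to absorbance, A = -log10(T).

    Values below `floor` are clipped so that fully absorbed points do not
    raise a math domain error (the floor value is an illustrative choice).
    """
    return [-math.log10(max(t, floor)) for t in transmittance]
```

The logarithm is what turns the multiplicative effect of several absorbers on transmittance into the additive mixture that ICA requires: a transmittance of 0.5 × 0.2 corresponds to the sum of the two individual absorbances.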

3.2. Identification Results of Experimental Gases Spectra

All calculations were performed in MATLAB R2016 (The MathWorks, Natick, MA, USA). We divided the 12 calculated absorption spectra of the experimental gases into three groups, and ICA decomposition was carried out on each group to obtain the separated spectra; the component identification results were obtained by analyzing the separated spectra.

Figure 8 shows the separated spectra of three groups. The separated spectra are restored to the original waveband positions.

The identification of separated spectra is achieved through peak searching and comparing the similarity among separated spectra and the standard spectral database. Table 1 lists the infrared absorption characteristics of the four experimental gases from standard spectra database.

Because the spectrum may produce wavenumber drift in the actual measurement process, it is necessary to set an offset of peak position for peak searching. Generally, the wavenumber drift will not be greater than 1 cm−1, and therefore, we set the peak position offset to 1 cm−1.

Table 2 shows the peak searching results of the separated spectra; the peak positions are obtained from the maximum value of absorption bands. By sorting the absorption peaks, the positions of the first two peaks are listed in the table. The gas (or gases) that the spectrum may correspond to is discriminated by the position of the strongest peak, and the sub–strong peak is used to assist this discrimination.
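The peak-searching logic can be illustrated with the sketch below. The function names are hypothetical, and the reference peak positions used in any call would come from the standard spectral database (Table 1); none are hard-coded here.

```python
def strongest_peaks(wavenumbers, absorbances, count=2):
    """Return the wavenumbers of the `count` highest local maxima,
    strongest first."""
    peaks = [
        (absorbances[i], wavenumbers[i])
        for i in range(1, len(absorbances) - 1)
        if absorbances[i - 1] < absorbances[i] >= absorbances[i + 1]
    ]
    peaks.sort(reverse=True)
    return [wn for _, wn in peaks[:count]]


def candidate_gases(peak_wn, reference_peaks, offset=1.0):
    """List the gases that have a reference peak within `offset` cm^-1
    of the observed peak position."""
    return sorted(
        gas
        for gas, positions in reference_peaks.items()
        if any(abs(peak_wn - p) <= offset for p in positions)
    )
```

With placeholder reference data such as `{"gas_a": [1003.6], "gas_b": [1007.0, 1002.5]}`, an observed peak at 1003 cm−1 matches both gases within the 1 cm−1 offset, which is the kind of ambiguity that a subsequent spectral similarity comparison has to resolve.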

The peak searching results show that most of the possible gases can be recognized by the strongest absorption peak of the separated spectrum, especially the spectrum with a single absorption peak like C2H2 and CH4. In the first separated spectrum of group (b), there are two absorption peaks with a similar amplitude, corresponding to two possible gases. In this case, the comparison of spectra is necessary for accurate identification.

Based on the results of peak searching, the final identification is completed by analyzing the similarity between the possible components and the corresponding standard spectrum. The correlation coefficient is a measure of the closeness between two column vectors and is used here for spectra similarity discrimination. The correlation coefficient r between the standard absorbance spectrum x and the spectrum y to be identified is as follows:

$$r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2}}$$

where N is the number of data points in the spectra, and $\bar{x}$ and $\bar{y}$ are the mean values of x and y. It follows from the definition that −1 ≤ r ≤ 1. According to the previous study [36], the two spectra match well when r > 0.9. A further criterion is required for the spectral identification when r ≤ 0.8. The calculated results of correlation coefficients are presented in Table 3.
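As a concrete illustration, the correlation coefficient between two sampled spectra can be computed as in the following sketch (the function name is hypothetical, and both spectra are assumed to be sampled on the same wavenumber grid):

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("spectra must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denominator = math.sqrt(var_x * var_y)
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return covariance / denominator
```

Because r is unchanged when either spectrum is multiplied by a positive constant or shifted by an offset, it is a convenient similarity measure for ICA outputs, whose amplitudes are not determined by the separation.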

All values of r are greater than 0.8, which means that the separated spectra match well with the standard spectra. The highest correlation is observed with CH4, whose r values are all greater than 0.98. Meanwhile, the lowest correlation is observed with C2H5OH, whose r values are in the range of 0.8259–0.9213. This may be because the absorption bands of CH4 do not overlap with the absorption bands of H2O and CO2, whereas the absorption bands of C2H5OH are complex and part of them overlap with the absorption band of H2O, which is removed before the separation. The first spectrum of Figure 8(b) has two corresponding gases according to peak searching, and the spectrum can be identified by calculating the correlation coefficient between this spectrum and the standard spectrum of the two possible gases. The r value is 0.4126 for CH4 and 0.8259 for C2H5OH; therefore, it is easy to judge that the spectrum is strongly related to C2H5OH. The peak searching and correlation coefficients calculated results show that all the four experimental gases can be correctly recognized in three groups of experiment, and the identification rate of the mixed gases spectra is 100%.

From the provided figures, we can see that the separation results are not completely consistent with the source spectra in the standard database, and the separated spectra may have redundant absorption peaks related to interfering gases. This may be attributed to the incomplete removal of H2O absorption wavebands. Although redundant absorption peaks exist, their amplitudes are small, and thus, do not affect recognition results.

3.3. Identification Results of Actual Atmospheric Spectra

The number of sources, also known as the number of independent components (ICs), needs to be evaluated before ICA decomposition. The number of observed signals used as the input of ICA should be equal to or greater than the actual number of components in the mixture; otherwise, the ICA results will be erroneous. Many studies have addressed how to evaluate the number of components in a chemical mixture [37–39]. The number of components present in the actual atmosphere can be evaluated by the singular value decomposition of the denoised spectral matrix, where the number of components equals the number of singular values that exceed a certain threshold. Because the singular vectors contain more useful information than the singular values, it is more accurate to judge the source number from the singular vectors. Therefore, the source number of the atmospheric spectra can be determined by comparing the singular vector waveforms after decomposition. Figure 9 shows the first eight singular vectors obtained from the singular value decomposition of the atmospheric spectral matrix; the first five singular vectors have obvious waveforms, whereas from the sixth singular vector onward they fluctuate only slightly around zero. According to the result of the singular value decomposition, the number of components in the atmospheric spectra is determined to be 5.
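The simpler of the two criteria mentioned above, counting singular values above a threshold, can be sketched with NumPy as follows. The relative threshold of 1% of the largest singular value is an arbitrary assumption for illustration; the text does not specify a threshold, and the study itself relies on visual inspection of the singular vectors.

```python
import numpy as np


def estimate_component_count(spectra, rel_threshold=0.01):
    """Estimate the number of components in a spectral matrix.

    `spectra` has shape (n_spectra, n_points). The estimate is the number
    of singular values larger than `rel_threshold` times the largest one.
    """
    singular_values = np.linalg.svd(np.asarray(spectra), compute_uv=False)
    return int(np.sum(singular_values > rel_threshold * singular_values[0]))
```

To inspect the singular vectors themselves, `np.linalg.svd(spectra, full_matrices=False)` returns them as the rows of its third output, which can then be plotted against wavenumber.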

We selected 10 spectra from the 12 calculated absorption spectra data of actual atmosphere and divided the 10 spectra into two groups; ICA decomposition was carried out on each group to get the separated spectra; Figure 10 shows the separated spectra of two groups.

Because the specific gases in the actual atmosphere are unknown, the component identification of atmospheric spectra can be achieved through calculating and comparing the correlation coefficient between the separated spectra and all the gas spectra with absorption peaks in the waveband of 2800 cm−1–3200 cm−1. The gaseous pollutants in the measurement site were mainly vehicle exhaust, so we compared the separated spectra with the standard spectra of common VOCs in vehicle exhaust, and recorded the gas type corresponding to the maximum value of correlation coefficient. Table 4 lists identification results of separated spectra.

According to the correlation coefficients between the separated spectra and the standard spectra, the five gases present in the atmosphere were o-xylene (C8H10), methane (CH4), toluene (C7H8), p-xylene (C8H10), and n-hexane (C6H14). The built-in software of the spectrometer, which is based on the classical least squares algorithm, also reported all five of these gases in the atmospheric spectra. In addition, according to the software analysis, there were traces of benzene and styrene in the atmosphere, which were not identified by the ICA separation. A possible reason is that the concentrations of these two gases did not change across the selected spectral samples, so their absorption peaks were eliminated by the division operation during the background spectra elimination process.

The separation results show that the ICA algorithm can separate single-gas spectra from heavily overlapped spectra, transforming the set of mixed spectra into independent absorption peaks. The results also show that the order and amplitude of the components obtained by the ICA algorithm are indeterminate.

3.4. Efficiency of ICA Spectra Separation

There are 6831 spectral data points in total in the target waveband of the experimental gas spectra and 830 in the actual atmospheric spectra. Table 5 lists the number of iterations and the running time of the FastICA algorithm for spectral separation.

Table 5 shows operation efficiency of FastICA algorithm. The average running time of ICA separation is less than 0.2 seconds, and it is conducive to the application of signal separation algorithm in online monitoring with high real-time requirements. The identification results of the separated spectra demonstrate that the ICA algorithm is a remarkable tool in the qualitative identification of heavily overlapped mixed gases FTIR spectra. It can rapidly retrieve the qualitative information from complex multicomponent spectra.

Although the ICA algorithm performs well in extracting the source signals, the separated signals have uncertain scaling factors. Besides, in practice the absorbance of a gas is not strictly proportional to its concentration. Therefore, a quantitative spectral model established by multivariate regression is necessary for obtaining the concentration of each pollutant. Common quantitative spectral methods include partial least squares (PLS), classical least squares (CLS), stepwise regression analysis (SRA), and support vector regression (SVR).

It is noteworthy that only several spectral samples are needed to complete the whole identification process by the separation algorithm. Conversely, the neural network algorithm and other identification methods like CLS require multiple samples for modeling. Thus, the proposed method is more suitable in practice when reference substances are not available or are expensive. Moreover, the spectral separation process does not depend on the existing spectral database. When the separated spectra are not included in the spectral database, they may belong to new species. We can thus enrich the spectral database by adding the spectra of new species into the existing database.

The identification results of actual atmospheric spectra show that the ICA method can effectively identify the gaseous pollutants whose concentrations change in the measured spectra. However, the background spectrum elimination method used in this study also removes the absorption peaks of gases whose concentrations remain constant. Future work therefore needs to focus on a background elimination method that removes the background spectrum while retaining all target gas information. In addition, selecting a proper waveband for ICA separation can reduce the difficulty of spectra identification.

4. Conclusion

Because the absorption peaks of different gas infrared spectra overlap extensively, the traditional atmospheric spectral detection methods have difficulties rapidly and accurately distinguishing specific gas types from overlapping spectra when monitoring atmospheric gaseous pollutants. In this study, we use the source information recovery ability of ICA algorithm to separate single gas spectra from seriously overlapped, multicomponent FTIR spectra. A series of pretreatments are presented to combat the noise and interference produced in the spectral acquisition process, and experiments on laboratory obtained spectra and actual atmospheric spectra verify the effectiveness of ICA algorithm on mixed spectra separation. The proposed method can improve the identification rate as well as the identification speed and enrich the spectral library compared with traditional gaseous spectral identification methods. The whole separation process only needs several mixed spectral samples, and it can make the online atmospheric spectral monitoring and other linear system analysis with high real-time requirements simpler and more effective.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no potential conflicts of interest related to this manuscript.

Acknowledgments

This work was supported by the National Key Scientific Instrument and Equipment Development Project of China (no. 2013YQ220643) and The National 863 Program of China (no. 5702014AA06A503).