Abstract

Laser-induced breakdown spectroscopy combined with soft independent modeling of class analogy is used in this study to identify a large number of unprocessed geological samples with similar components. Given the variety of data from different samples, spectral regions representing the major components were extracted. In addition, principal component analysis was applied to remove noninformative variables from the spectra. The unclassification rate, misclassification rate, and average correct classification rate for 25 types of geological samples were 1.2%, 4.7%, and 94.1%, respectively. These results suggest that laser-induced breakdown spectroscopy using soft independent modeling of class analogy can be used to identify a wide variety of geological samples. Furthermore, we found that this approach can be used to identify spectral differences among similar sample types that arise from matrix effects and trace element impurities.

1. Introduction

Laser-induced breakdown spectroscopy (LIBS) [1, 2] is a simple multielement atomic emission technique that provides semidestructive and efficient analysis, particularly in harsh and dangerous environments. Thus, LIBS has been widely used for various applications, such as industry-oriented analysis [3], archaeological investigation [4], geological and environmental studies [5–7], and jewelry characterization [8]. Geological materials, including rocks and minerals, convey important information about particular geological environments, and this information can be extremely useful for studies such as determining mineral provenance, reservoir description, prospecting, and geochemical mapping [9–11]. Nevertheless, there is a wide variety of geological materials with overlapping characteristics, which compromises their proper discrimination. The conventional geological survey depends on the geologist’s assessment and subsequent laboratory analyses, which can be time-consuming and complicated. To simplify the analysis, LIBS applications on geological materials have been proposed over the last two decades. Furthermore, multivariate methods have been increasingly studied, including approaches based on principal component analysis (PCA) [12, 13], partial least squares discriminant analysis (PLS-DA) [14], graph theory (GT) [15], independent component analysis (ICA) [16], and artificial neural networks (ANNs) [17, 18]. Such methods account for redundant information and hence increase the efficiency of data analysis and reduce the influence of minor fluctuations resulting from experimental conditions and instrumental instability [19]. Specifically, soft independent modeling of class analogy (SIMCA) is widely used to classify high-dimensional data because it incorporates PCA for dimensionality reduction [20]. It was originally developed to increase the accuracy and speed of classification in near-infrared spectroscopy [21–25] and was subsequently applied to the classification of LIBS data [26, 27]. Although suitable results have been reported in the classification of some geological materials, it is still challenging to provide a method that suitably classifies a large number of materials, especially when they present similar major elements. In this study, we applied SIMCA and PCA to LIBS data to classify a wide variety of geological samples.

2. Experiment and Methods

2.1. LIBS Instrument

Figure 1 illustrates the complete experimental system used in this study. The Brilliant B Nd:YAG laser (Quantel SA, Les Ulis Cedex, France) was operated at a fundamental wavelength of 1064 nm, a repetition rate of 10 Hz, and a pulse width of 10 ns. The laser energy was optimized to maximize the peak intensity without saturating the intensified charge-coupled device (ICCD) camera. The laser beam was focused on the target with a long-focus (f1 = 100 mm) lens to prevent contamination from spatter particles generated by the laser shots. Light emitted from the plasma was collected by a pair of matching coaxial fused silica planoconvex lenses (f2 = f3 = 38.1 mm) and guided into a 230 μm diameter optical fiber coupled to a Mechelle 5000 spectrometer (Andor Technology Ltd., Belfast, UK). Then, the dispersed light from the spectrometer was recorded with an iStar DH734i-18F-03 ICCD camera (Andor Technology Ltd., Belfast, UK) having a wide spectral range (212 nm–1032 nm) with 0.1 nm resolution. The angle between the collection direction and the sample stage surface was approximately 45°. The samples were fixed on a rotating platform and mechanically rotated to different positions following laser ablation. Collecting spectra from different positions minimized crater effects and partially compensated for sample inhomogeneity.

2.2. Samples and Measurements

This study involved the analysis of 25 types of geological samples representing a mixture of minerals and rocks (carbuncle), which are listed in Table 1. Figure 2 shows photographs of six types of samples used in this study. These samples were obtained from the China Institute of Geology in Qingdao City, Shandong Province. Five different blocks of each sample were collected at different but nearby geographical locations. In addition, several geological samples with similar chemical compositions were purposely included in this study to verify the robustness of the proposed model. In fact, compositionally similar minerals can exhibit a very high spectral correlation, thus posing a challenge to intersample discrimination. For instance, samples No. 20, 21, and 22 can be considered types of gypsum, which is mostly composed of calcium sulfate (CaSO4), whereas samples No. 7, 10, and 17 basically consist of iron oxide (Fe2O3), and samples No. 8, 19, and 25 were also considered to be of the same type. Samples were measured using LIBS without pretreatment to obtain raw data resembling those of in-field measurements. All five blocks of each geological sample were measured; three were assigned to determining the method parameters, and the remaining two were used to test the method performance. To partially balance spectral heterogeneity, each spectrum was obtained from 5 laser shots, and 20 spectra were acquired per block at different points on the sample surface. The integration time and delay were set to 15 µs and 200 ns, respectively, to eliminate continuum emission.

2.3. Model

Multivariate analysis can be applied to reduce or compress spectral data while retaining important spectral information of the samples [28]. In particular, SIMCA is a widely used supervised pattern recognition method that assigns sample spectra to specific categories. It consists of a collection of PCA models, one per category, and can provide independent classification for each category, as detailed in [29, 30]. Specifically, PCA was carried out to reduce the dimensionality of the data set, allowing an overview of the samples. The results of PCA are typically analyzed using score and loading plots. Score plots reveal similarities among samples as well as outliers and clusters. Loading plots identify the variables that most influence the sample positions in the score plots. The optimal number of principal components (PCs) for characterizing the data set was based on the cumulative variance retained by the principal components [30, 31]. In addition, we used a toolbox for SIMCA that was developed by the Milano Chemometrics and QSAR Research Group at the University of Milano-Bicocca in Italy [32, 33]. Moreover, we considered a statistical confidence level of 95% (α = 0.05) and implemented the calculations using MATLAB version 7.2.
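The structure of SIMCA described above (one PCA model per category, with each unknown spectrum compared against every class model) can be illustrated with a minimal Python/NumPy sketch. This is not the MATLAB toolbox used in this study: the function names, the simplified residual-variance scaling, and the user-supplied `threshold` are all assumptions made for illustration, and the threshold stands in for the critical value that a full SIMCA implementation would derive at the 95% confidence level.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a PCA model to the training spectra of one class (rows = spectra)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered class data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T              # loadings, shape (n_vars, n_components)
    resid = Xc - Xc @ P @ P.T            # part of the data the model does not explain
    # Mean residual sum of squares per training spectrum. This is a simplified
    # (assumed) degrees-of-freedom correction, not the toolbox's exact formula.
    s0_sq = (resid ** 2).sum() / max(X.shape[0] - n_components - 1, 1)
    return {"mean": mean, "P": P, "s0_sq": s0_sq}

def residual_distance(model, x):
    """Squared orthogonal distance of spectrum x to a class model, scaled by s0_sq."""
    d = x - model["mean"]
    e = d - model["P"] @ (model["P"].T @ d)
    return float(e @ e) / model["s0_sq"]

def classify(models, x, threshold):
    """Return the label of the closest accepting class model, or None (unclassified).

    `threshold` is a hypothetical acceptance limit on the scaled residual.
    """
    dist = {label: residual_distance(m, x) for label, m in models.items()}
    accepted = {label: v for label, v in dist.items() if v <= threshold}
    return min(accepted, key=accepted.get) if accepted else None
```

A spectrum rejected by every class model is returned as `None`, which corresponds to the "unclassified" outcome reported in the results; a spectrum accepted by the wrong model corresponds to a misclassification.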

A total of 1000 spectra (40 spectra × 25 samples) from known samples were used to build the SIMCA recognition model, and 500 spectra (20 spectra × 25 samples) were used to optimize the parameters by cross validation. The remaining 1000 spectra (40 spectra × 25 samples), treated as unknown samples, formed the test set used to determine the classification performance.
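The spectrum counts above follow directly from the sampling scheme in Section 2.2 (25 sample types, 5 blocks per type, 20 spectra per block). The short Python check below reproduces them; the 2/1/2 block split is inferred from the stated counts of 40, 20, and 40 spectra per sample type, and the subset names are illustrative.

```python
N_TYPES = 25            # geological sample types
BLOCKS_PER_TYPE = 5     # blocks collected per sample type
SPECTRA_PER_BLOCK = 20  # spectra acquired per block

# Inferred block-level split: 2 blocks build the model, 1 block tunes the
# parameters by cross validation, and 2 blocks are held out as the test set.
SPLIT = {"train": 2, "validation": 1, "test": 2}

def spectra_per_subset(split):
    """Number of spectra in each subset, summed over all sample types."""
    return {name: n_blocks * SPECTRA_PER_BLOCK * N_TYPES
            for name, n_blocks in split.items()}

counts = spectra_per_subset(SPLIT)
total = sum(counts.values())
```

Splitting by block rather than by individual spectrum keeps every test spectrum on a physical block that the model has never seen.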

2.4. Emission Line Selection

The intensity of an analytical spectral line of an element in the plasma is related to the ejected sample mass and depends on the laser radiation parameters, that is, energy and focusing. Either random or systematic changes of these parameters can strongly affect the analytical precision and accuracy and may introduce nonlinearity in the classification [34]. Moreover, the roughness of the sample surface further increases nonlinearity through the interaction between the laser and the sample. However, normalization can compensate for some of these shortcomings and for signal variations resulting from experimental conditions and instrumental instability [35]. In this study, the signals corresponding to various elements were normalized to the total spectrum intensity. Figure 3 illustrates the normalized spectral lines of five mineral samples, where the highlighted elements were identified using the NIST Atomic Spectral Database. The emission lines in the spectra of the 5 analyzed samples correspond to these elements and show both strong similarities and clear differences among the distinct LIBS fingerprints.
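Normalization to the total spectrum intensity can be sketched in a few lines of Python. The function name is illustrative, and the sketch assumes that "total spectrum intensity" means the sum over all recorded pixels of a spectrum.

```python
def normalize_total_intensity(spectrum):
    """Divide every pixel intensity by the summed intensity of the spectrum."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in spectrum]

# Two shots of the same line pattern, the second with twice the signal level,
# for example because more mass was ablated.
weak = normalize_total_intensity([1.0, 1.0, 2.0])
strong = normalize_total_intensity([2.0, 2.0, 4.0])
```

After normalization the two spectra coincide, which is why this step reduces shot-to-shot variation caused by changes in ablated mass or laser energy.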

The recorded spectra consist of more than 20,000 pixels spanning a wide wavelength region from ultraviolet to near-infrared. However, a substantial portion of the feature space may not be relevant for classification. Hence, feature selection must be applied to eliminate spurious correlations, especially when interclass differences are subtle. Feature selection removes regions of the LIBS data that do not convey useful classification information. We found that the most important features correspond to wavelengths of the elemental emission lines of K, Li, Na, Ba, Mn, Ca, Al, Ti, Si, Mg, and Fe. In fact, these elements are the main components of the continental crust and determine unique chemical fingerprints, which are useful for geological study. For classification, a set of spectral regions from the major elements commonly used in spectral analysis was selected [36], totaling 1107 variables. The selected spectral variables are listed in Table 2 and illustrated in Figure 4.
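Restricting a spectrum to selected wavelength regions amounts to a simple masking step, sketched below in Python. The windows in the example are hypothetical and are not the regions of Table 2; they were chosen only to bracket the well-known Na doublet near 589 nm and the K doublet near 766.5 and 769.9 nm.

```python
def select_windows(wavelengths, intensities, windows):
    """Keep only the pixels whose wavelength falls inside any (low, high) window."""
    keep = [i for i, wl in enumerate(wavelengths)
            if any(low <= wl <= high for low, high in windows)]
    return [wavelengths[i] for i in keep], [intensities[i] for i in keep]

# Hypothetical windows in nm, for illustration only.
EXAMPLE_WINDOWS = [(588.5, 590.0), (766.0, 770.5)]

wl = [500.0, 589.0, 589.6, 700.0, 766.5]
counts = [1.0, 2.0, 3.0, 4.0, 5.0]
selected_wl, selected_counts = select_windows(wl, counts, EXAMPLE_WINDOWS)
```

Applied to the full spectra with the regions of Table 2, the same masking would reduce more than 20,000 pixels to the 1107 retained variables.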

3. Results and Discussion

3.1. PCA Optimization

Collinearity and high computational cost remained problematic after the spectral regions were selected. Hence, the selected LIBS regions were projected onto lower-dimensional independent variables using PCA, and those having the maximal interclass variance and minimal intraclass variance were iteratively determined. The selection of principal components greatly affects the classification capabilities and can prevent both underfitting and overfitting. The scores and loadings of PCA for the 25 types of samples are shown in Figure 4, where the principal components PC1, PC2, and PC3 contribute 28.84%, 22.17%, and 14.34% of the overall variance, respectively. In addition, a high loading indicates that the element corresponding to that wavelength strongly affects the principal component [37] and suggests a high concentration of that element in the samples. The loading and score of PC1 suggest a high correlation with the concentrations of Ca and Al. Similarly, PC2 is clearly related to Ca, Al, Si, and Mg, and PC3 is more relevant to Ba, Mn, Ca, and Al. Considering that the first three principal components convey only 65.35% of the original information, further principal components were also introduced into SIMCA. Figure 5 shows that an increasing number of principal components reduces the root mean square error of cross validation (RMSECV), thus indicating a more accurate selection. The RMSECV converges to a stable minimum after approximately 15 principal components are included. By selecting the most relevant principal components based on the RMSECV results, the classification for the validation data set was as follows: unclassification of 0.4%, misclassification of 1.4%, and correct classification of 98.2%.
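Two steps of this procedure can be checked with a short Python sketch: accumulating the variance explained by the leading principal components, and choosing the smallest number of components at which the RMSECV has effectively reached its minimum. The plateau rule and its tolerance `tol` are assumptions made for illustration, and the RMSECV values in the example are hypothetical rather than read from Figure 5.

```python
def cumulative_variance(explained_percent):
    """Running total of the variance (in %) explained by successive PCs."""
    total, running = 0.0, []
    for value in explained_percent:
        total += value
        running.append(round(total, 2))
    return running

def smallest_stable_rank(rmsecv, tol=0.01):
    """Smallest number of PCs whose RMSECV lies within tol of the minimum."""
    best = min(rmsecv)
    for n_components, error in enumerate(rmsecv, start=1):
        if error - best <= tol:
            return n_components

# Variance contributions of PC1 to PC3 as reported in the text.
cumulative = cumulative_variance([28.84, 22.17, 14.34])

# Hypothetical RMSECV curve for 1 to 6 components.
rank = smallest_stable_rank([0.9, 0.6, 0.4, 0.31, 0.30, 0.30], tol=0.02)
```

Choosing the first point on the plateau rather than the absolute minimum keeps the model as small as possible, which limits the risk of overfitting mentioned above.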

3.2. SIMCA Evaluation

Table 3 lists the SIMCA classification results for the test data. The results also demonstrate that misclassification mostly occurs among similar samples, which reflects the difficulty of classifying minerals that have the same cations as major constituents, such as gypsum (samples No. 20 (selenite), 21 (alabaster), and 22 (anhydrite)) and hematite (samples No. 7 (oolitic hematite), 10 (black hematite), and 17 (reniform hematite)). Nevertheless, the correct classification rate among similar samples remained acceptable. This rate can be related to the slight differences resulting from physical matrix effects, including hardness, structure, and texture. These factors can result in different amounts of ablated mass and a consequent variation of the spectral lines, even when the geological samples have a similar chemical composition [38]. Therefore, these influences can be useful in studies for recognizing similar geological samples. The ability to distinguish similar samples can also be attributed to minor impurities, whose trace-level distribution in natural minerals produces different spectral features among similar matrices.

The correct classification rate of 60% for carbuncle is lower than that of other minerals. This may be ascribed to the fact that the carbuncle, as a type of rock, is a mixed body of mineral grains, pores, and cement. There are also subtle differences between the blocks of carbuncle because they come from different locations. The measurements at different points on the surface of a carbuncle block do not exhibit consistent proportions of these constituents. In contrast, minerals are a single substance or a compound formed by geological action, presenting a relatively fixed chemical composition.

The remaining unclassified and misclassified samples can be attributed to target heterogeneity. To obtain representative spectra of heterogeneous samples for calibration and validation of the model, it is necessary to collect LIBS information from a sufficiently large number of analysis spots, and the number used in this study may not have been sufficient to obtain representative analyses for all minerals.

Overall, high performance is achieved across the samples, with an average correct classification rate of 94.1% and low rates of unclassification and misclassification. Therefore, the proposed SIMCA with LIBS was capable of correctly classifying samples in multiple categories, even when they present similar compositions. Furthermore, the selected wavelength ranges, which reduce the amount of analyzed data to the major elements of the geological minerals, enabled a successful and more efficient classification.
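The three reported rates can be computed from true and predicted labels as in the Python sketch below. It assumes that a spectrum rejected by all class models is recorded as `None` and that every other spectrum receives exactly one label; the study's exact bookkeeping, for instance for spectra accepted by several class models, is not specified here.

```python
def classification_rates(true_labels, predicted_labels):
    """Percentages of correct, misclassified, and unclassified spectra."""
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    unclassified = sum(pred is None for pred in predicted_labels)
    correct = sum(pred == true for true, pred in zip(true_labels, predicted_labels))
    misclassified = n - unclassified - correct
    return {
        "correct": 100.0 * correct / n,
        "misclassified": 100.0 * misclassified / n,
        "unclassified": 100.0 * unclassified / n,
    }

# Toy example with four spectra from two classes.
rates = classification_rates(["a", "a", "b", "b"], ["a", None, "a", "b"])
```

Under these definitions the three rates always sum to 100%, as do the reported values of 94.1%, 4.7%, and 1.2%.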

4. Conclusions

In this paper, LIBS combined with SIMCA is evaluated for the robust classification of a wide range of mineral types, which makes it suitable for real-world applications and is an improvement over many previous studies that were limited to a few minerals. The LIBS data of different minerals can be used to identify samples based on major constituents. In fact, PCA makes it possible to evaluate the discriminating ability of the different elements present in geological samples through the corresponding loadings and scores of the principal components. In this study, PCA-optimized SIMCA was employed to classify 25 types of geological samples. Although the correct identification was compromised for some varieties of rock such as carbuncle, which is composed of a variety of minerals having a high degree of trace element variability, the overall correct classification rate of 94.1% was high, and the unclassification rate of 1.2% and the misclassification rate of 4.7% were acceptable.

Data Availability

The data used to support the findings of this study are available in the Microsoft Excel format from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant no. 41503063). The authors would like to thank Ye Tian for the helpful discussion and suggestions and Zhao Luo for providing language assistance.