Reflectance spectroscopy is a nondestructive, rapid, and easy-to-use technique that can be used to assess the composition of rocks qualitatively or quantitatively. Although it is a powerful tool, it has limitations, especially for measurements of rocks with a phaneritic texture. The variability of the external rock surface is captured only by spectroscopy and not by the chemical-mineralogical analyses performed on crushed rock in certified laboratories. Hence, in most cases, the spectral variability of the surface of an uncrushed rock will be higher than its internal chemical-mineralogical variability, which may impair statistical models built on field measurements. For this reason, studying ore-bearing rocks and evaluating their spectral variability at different scales is important for understanding the factors that may influence the qualitative and quantitative analysis of rocks. The objectives of this study are to quantify the spectral variability of three types of altered granodiorite using well-established statistical methods with an upscaling approach. Within this approach, the samples were measured in the laboratory under supervised ambient conditions and in the field under semisupervised conditions. This study further aims to determine which statistical method provides the most practical and accurate classification for use in future studies. Our results showed that all statistical methods enable the separation of the rock types, although two of the rock types exhibited almost identical spectra. Furthermore, the method that supplied the most significant results for classification was principal component analysis (PCA) combined with k-nearest neighbors (kNN), with classification accuracies of 68.1% and 100% for the laboratory and field measurements, respectively.

1. Introduction

Over the past few decades, many studies have investigated the spectral properties of metal-bearing minerals and clay minerals in igneous rocks using visible, near-infrared, and shortwave infrared (VNIR-SWIR) spectroscopy. This spectral domain (350–2500 nm) has proven to be a reliable and rapid tool for detecting and identifying clay minerals [1, 2] and for predicting Cu concentrations in waste dump material [3]. Reflectance spectroscopy has also been used to discriminate between ore samples of different grades [4, 5], to identify alteration zones associated with copper deposits [6], and for classification purposes [7–10].

The SWIR is considered the best spectral region for identifying various aspects of hydrothermal alteration zones [11]. Hydroxyl-bearing minerals in the alteration assemblages, including clays and sulfates, as well as carbonates, show absorption features caused by vibrational processes of Al-OH at 2200 nm, Mg-OH at 2300 nm, and CO3 groups at 2350 nm [12, 13]. In addition, phyllosilicates such as kaolinite, montmorillonite, and chlorite, which are Al-Si-(OH)- and Mg-Si-(OH)-bearing minerals, and Ca-Al-Si-(OH)-bearing minerals such as the epidote group can also be identified in the SWIR region [2, 14–16].

Fe-oxides are among the most important mineral groups associated with alteration zones and hydrothermal sulfide deposits over porphyry copper bodies [17–20]; they are spectrally active in the VNIR region (400–900 nm) because of electronic transitions (charge transfer) in the Fe cations [21]. The mineral zoisite shows distinct absorptions in both the VNIR and the SWIR [22]: features at 430, 530, and 800 nm due to the presence of ferric iron, an unusual OH feature at 1680 nm, and features at 2300, 2350, and 2480 nm caused by combinations of OH with lattice or Al-OH bending modes [23]. Feldspar, quartz, and pyrite, on the other hand, do not exhibit any spectral features in the VNIR-SWIR region, but their presence may ‘mask’ other absorption features [24].

The spectral properties of rocks and minerals are affected by particle size, which governs two major scattering processes: surface scattering at the particle boundaries and volume scattering within the particles [25–27]. In the VNIR-SWIR range, smaller particle sizes are usually associated with higher reflectance than the same material at a larger particle size [28]. For rocks, which are mineral assemblages, factors such as texture and weathering must also be considered because they affect the spectral signatures. Such factors can strongly influence the spectral properties of rocks and may even mask specific mineral features that would otherwise be visible [29]. Genetically related rocks can display systematic variations of spectral parameters as functions of systematic variations of petrographical and geochemical parameters [10]. Thus, for classification as well as for the quantification of geochemical properties, it is important to study the rocks’ microcomplexity, which is governed by mineral chemistry and structure, grain size, and texture. Moreover, this microcomplexity affects the spectral properties and the spectral variability at different observational scales [10].

The identification and classification of rocks and minerals have been the focus of many studies, either through their spectral characteristics [30] or through various spectral processing methods such as the spectral angle mapper (SAM) [10, 31–33], support vector machines (SVM) [32, 34], and principal component analysis (PCA) [35]. Recent studies found an overall classification accuracy of 66% based on the spectral data of various rocks [36] and accuracies of 67.4% and 69.7% based on SAM and spectral information divergence (SID), respectively [37]. In addition, alteration minerals have been identified with a test accuracy of 97.8% by applying a multilayer perceptron (MLP) and a convolutional neural network (CNN) to SWIR reflectance spectra [38].

Quantifying the spectral variability is important because it determines the ability to distinguish between different altered rocks for qualitative (classification) and quantitative (statistical modelling) purposes. The outcome may have a considerable economic impact through more accurate rock classification and the exploitation of valuable raw materials. In addition, it is fundamental to ensuring the reliability of statistical models for predicting the physical, chemical, and mineralogical composition of rocks. It should be clarified that although rocks are aggregates of minerals, this work examines the spectral variability of the rocks and not of the rock components (i.e., the mineralogical composition). Nevertheless, the mineralogical composition is used to explain the variability observed in the samples.

The main objective of this study is to examine and quantify the spectral variability of three types of granodiorite in order to bridge the gap between laboratory measurements and field data and to enable a more precise classification of the selected rock types. This is achieved by applying well-established statistical methods: the mean and spectral standard deviation (SSD), SAM, the average sum of deviation squared (ASDS), and PCA followed by the k-nearest neighbors (kNN) algorithm for classification.

2. Materials and Methods

2.1. The Rock Samples

The samples were collected from an open pit mine in the Erdenet porphyry copper-molybdenum deposit, southeast of the city of Erdenet, Mongolia [39]. The area belongs to the Selenge Intrusive Complex of the Central Asian Orogenic Belt and consists predominantly of late Permian granodiorite [40], as well as andesite, diorite, granite, and breccias [39]. The two most abundant minerals in the rock samples are feldspar and quartz. However, other minerals are also present, and they make it possible to characterize the different rock types: a rock type that contains feldspar, quartz, and pyrite (M1), a rock type that contains mostly feldspar and quartz (M2), and a rock type that, besides feldspar and quartz, exhibits various contents of zoisite, ferric oxide, and magnetite (M3). The rock samples, shown in Figure 1, were brought directly from the open pit mine to the experimental site to create semisupervised conditions for the spectral measurements. We consider this setup an essential step toward more precise ore deposit exploration and mineral mapping by spectral means.

As shown in Table 1, the rocks contain four main minerals (quartz, feldspar, zoisite, and pyrite), which make up between 96 and 99.9% of the rock samples. Because each mineral has its own spectral signature, any variability in mineral abundance between rock pieces of the same rock type will affect the SSD.

2.2. Spectral Measurements

The spectral data were acquired using a portable Spectral Evolution SR-3500 spectrometer, with a Zenith Lite™ panel serving as the white reference. The spectrometer covers the 350–2500 nm range (VNIR-SWIR) with spectral resolutions of 2.8, 8, and 6 nm at 700, 1500, and 2100 nm, respectively (Spectral Evolution).

2.2.1. Field Measurements

The experimental site was a concrete basketball court in the city of Erdenet, Mongolia (49.03°N/104.06425°E), which served as a flat and homogeneous background; the site is also described in [41]. The area was divided into squares containing the piles of the three rock types placed on the surface. The field spectra were acquired with a bare fiber-optic cable with a 25° field of view (FOV), using sunlight as the energy source; each rock type was measured 30 times from a distance of 1 m.
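For orientation, the ground footprint of the bare fiber can be estimated from the stated 25° FOV and 1 m distance. This back-of-the-envelope sketch assumes a nadir view onto a flat surface, which a rock pile only approximates:

$$d = 2h\tan\left(\frac{\mathrm{FOV}}{2}\right) = 2 \times 1\,\mathrm{m} \times \tan(12.5^\circ) \approx 0.44\,\mathrm{m}, \qquad A = \frac{\pi d^{2}}{4} \approx 0.15\,\mathrm{m}^{2}$$

Under these assumptions, each field spectrum integrates over roughly 0.15 m² of the pile, far more than a contact-probe spot, which is consistent with the field measurements averaging out small-scale mineral variability.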

2.2.2. Laboratory Measurements

During the field measurements, we noticed differences within each rock type (e.g., variations in color). To check whether these differences are important, we collected several rock pieces of each type. The final collection contains 41 rock pieces belonging to one of the three rock types (M1, M2, or M3), and each type was divided into 3–5 groups to check whether spectral variability exists between the groups of a given rock type. The laboratory measurements were performed with a contact probe. Each rock piece was measured 10 times from all sides to quantify the variance of each piece at different scales, yielding a total of 410 spectra: 90 for M1, 120 for M2, and 200 for M3.

2.3. The Multilevel Approach

The spectral variability was examined using a multilevel approach, processed as follows. Level 0 (L0) consists of the preprocessed spectra, 10 for each rock piece (410 spectra in total). Level 1 (L1) contains the average spectrum of each rock piece, computed from L0, which results in 9, 12, and 20 spectra for M1, M2, and M3, respectively. Level 2 (L2) contains the average spectrum of each group, computed from L0, which results in 3, 4, and 5 spectra for M1, M2, and M3, respectively. Level 3 (L3) contains the average spectrum of each rock type, computed from L0 (Table 2). The differences in the number of pieces are related to the large visual variability of M3 compared to M1 when the rocks were collected for the laboratory measurements. Level 0x (L0x) consists of the 30 raw field spectra of each rock type. Level 3x (L3x) contains the average spectrum of each rock type (M1–M3), computed from L0x.
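The averaging hierarchy from L0 to L3 can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the authors' processing code: the spectra are random placeholders, the helper name `average_by` is hypothetical, and the round-robin assignment of pieces to groups is an assumption, since the text gives only the number of groups per rock type.

```python
import numpy as np


def average_by(spectra, labels):
    """Average the rows of `spectra` that share a label.

    Returns the sorted unique labels and one mean spectrum per label."""
    labels = np.asarray(labels)
    keys = np.unique(labels)
    means = np.stack([spectra[labels == key].mean(axis=0) for key in keys])
    return keys, means


# Hypothetical sampling design: 10 contact-probe spectra per rock piece,
# pieces assigned round-robin to groups (the real assignment is not given).
pieces_per_type = {"M1": 9, "M2": 12, "M3": 20}
groups_per_type = {"M1": 3, "M2": 4, "M3": 5}
n_meas, n_bands = 10, 1633

type_lab, group_lab, piece_lab = [], [], []
for rock_type, n_pieces in pieces_per_type.items():
    for piece in range(n_pieces):
        group = piece % groups_per_type[rock_type]
        for _ in range(n_meas):
            type_lab.append(rock_type)
            group_lab.append(f"{rock_type}-G{group}")
            piece_lab.append(f"{rock_type}-P{piece:02d}")

rng = np.random.default_rng(0)
L0 = rng.random((len(piece_lab), n_bands))  # placeholder spectra, not real data
_, L1 = average_by(L0, piece_lab)  # one spectrum per rock piece
_, L2 = average_by(L0, group_lab)  # one spectrum per group
_, L3 = average_by(L0, type_lab)   # one spectrum per rock type
print(L0.shape, L1.shape, L2.shape, L3.shape)
# → (410, 1633) (41, 1633) (12, 1633) (3, 1633)
```

Because every rock piece contributes the same number of measurements (10), averaging L0 directly per rock type gives the same spectrum as first averaging per piece (L1) and then averaging those piece spectra.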

2.4. Preprocessing and Spectral Analysis

After the spectra were collected, several preprocessing steps were applied. First, the spectra were corrected using the white reference correction factor to convert the relative reflectance measurements to absolute reflectance values. Then, the Savitzky-Golay (SG) smoothing algorithm was applied with a polynomial order of 3 and a window size of 21 bands, followed by the removal of noisy bands, which left 1633 bands for the statistical analysis.
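A minimal NumPy sketch of the three preprocessing steps follows. It assumes that the white reference correction is a per-band multiplication by the calibrated panel reflectance, and it uses a hand-written Savitzky-Golay filter (comparable to SciPy's `scipy.signal.savgol_filter(y, 21, 3)`, whose default edge mode is `"interp"`) so that only NumPy is required. The function names, panel factor, wavelength grid, and noisy-band mask are hypothetical; the study does not list which bands were removed, so the demo keeps a different number of bands than the 1633 reported here.

```python
import numpy as np


def savgol_smooth(y, window=21, polyorder=3):
    """Savitzky-Golay smoothing of a single spectrum (NumPy-only sketch).

    Interior points use the least-squares convolution coefficients; the first
    and last window // 2 points come from a polynomial fitted to the first/last
    full window, like the "interp" edge mode of scipy.signal.savgol_filter."""
    y = np.asarray(y, dtype=float)
    if window % 2 == 0 or window < 3 or window <= polyorder or y.size < window:
        raise ValueError("window must be odd, >= 3, > polyorder and <= len(y)")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Row 0 of the pseudo-inverse of the Vandermonde matrix yields the value of
    # the least-squares polynomial at the window centre.
    coeffs = np.linalg.pinv(np.vander(offsets, polyorder + 1, increasing=True))[0]
    out = np.empty_like(y)
    out[half:-half] = np.convolve(y, coeffs[::-1], mode="valid")
    pos = np.arange(window)
    out[:half] = np.polyval(np.polyfit(pos, y[:window], polyorder), pos[:half])
    out[-half:] = np.polyval(np.polyfit(pos, y[-window:], polyorder), pos[-half:])
    return out


def preprocess(relative, panel_factor, keep_mask, window=21, polyorder=3):
    """White reference correction, SG smoothing, then removal of noisy bands.

    relative     : (n_spectra, n_bands) reflectance relative to the panel
    panel_factor : (n_bands,) calibrated panel reflectance (assumed input)
    keep_mask    : (n_bands,) booleans, True for bands to keep (hypothetical)"""
    absolute = np.asarray(relative, dtype=float) * np.asarray(panel_factor, dtype=float)
    smoothed = np.apply_along_axis(savgol_smooth, 1, absolute, window, polyorder)
    return smoothed[:, np.asarray(keep_mask, dtype=bool)]


# Demo on synthetic data (all inputs hypothetical)
wl = np.arange(350, 2501)                     # 1 nm sampling assumed
rng = np.random.default_rng(1)
truth = 0.4 + 0.1 * np.sin(wl / 300.0)        # placeholder "true" reflectance
panel = np.full(wl.size, 0.98)                # placeholder panel reflectance
relative = truth / panel + rng.normal(0.0, 0.005, (5, wl.size))
keep = (wl >= 400) & (wl <= 2400)             # placeholder noisy-band mask
clean = preprocess(relative, panel, keep)
print(clean.shape)  # → (5, 2001)
```

A cubic SG filter reproduces any cubic polynomial exactly, so smoothing removes high-frequency noise while leaving absorption features that are broad relative to the 21-band window essentially unchanged.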

2.5. Statistical Analysis

The spectral data at all levels in both the field and laboratory domains were compared using four statistical methods:

(1) The mean (μ) and the standard deviation (σ) were calculated for each wavelength at each level of processing. While the mean summarizes a large dataset, the standard deviation quantifies the degree of variation at each wavelength. The mean is given by (1) and the standard deviation by (2):

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1}$$

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mu\right)^{2}} \tag{2}$$

where n is the number of spectra and x_i is an observed value.

(2) SAM assesses the similarity between two spectra by treating them as vectors and computing the angle between them [42]. It is rapid and easy to apply, robust against changes in illumination, and enables a comparison between field and laboratory measurements. The spectral angle α is given by

$$\alpha = \cos^{-1}\left(\frac{\sum_{i=1}^{N} r_i s_i}{\sqrt{\sum_{i=1}^{N} r_i^{2}}\,\sqrt{\sum_{i=1}^{N} s_i^{2}}}\right) \tag{3}$$

where r and s are the reference spectrum and the measured spectrum, respectively, and N is the number of wavelengths.

(3) The ASDS between two spectra [43] quantifies the differences along the entire spectrum but, unlike SAM, is sensitive to illumination. ASDS therefore provides a comparison of the overall reflectance level. The ASDS is given by

$$\mathrm{ASDS} = \frac{1}{N}\sum_{i=1}^{N}\left(r_i - s_i\right)^{2} \tag{4}$$

where r_i is the value at a given wavelength in the reference spectrum, s_i is the value at the same wavelength in the measured spectrum, and N is the number of wavelengths (in our case, N = 1633).

(4) PCA is used to reduce dimensionality while retaining as much of the variability in the dataset as possible [44]. The kNN algorithm is then applied as a supervised machine-learning classifier: each sample is assigned to the class that is most common among its k nearest training samples in terms of Euclidean distance. The data are first divided into 70% training and 30% test samples. Because no single fixed value of k suits all datasets, the optimal k is determined by cross-validation prior to classification. PCA and kNN were conducted using the SPSS statistical software [45].
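The mean, standard deviation, SAM, and ASDS described above translate directly into array operations. The following NumPy sketch is illustrative only: the function names are hypothetical, the standard deviation is assumed to be the sample form (n − 1 in the denominator), and SAM is returned in radians.

```python
import numpy as np


def mean_and_ssd(spectra):
    """Per-wavelength mean and sample standard deviation (n - 1 denominator)."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra.mean(axis=0), spectra.std(axis=0, ddof=1)


def sam(reference, measured):
    """Spectral angle (radians) between a reference and a measured spectrum."""
    r = np.asarray(reference, dtype=float)
    s = np.asarray(measured, dtype=float)
    cos_angle = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def asds(reference, measured):
    """Average of the squared per-wavelength deviations between two spectra."""
    r = np.asarray(reference, dtype=float)
    s = np.asarray(measured, dtype=float)
    return float(np.mean((r - s) ** 2))


spectra = np.array([[10.0, 20.0, 30.0],
                    [12.0, 22.0, 28.0]])
mu, sigma = mean_and_ssd(spectra)
print(mu)                                # → [11. 21. 29.]
print(sigma)                             # every entry equals sqrt(2)
print(sam(spectra[0], 2 * spectra[0]))   # close to 0: scaling leaves the angle unchanged
print(asds([1, 2, 3], [1, 2, 5]))        # (0 + 0 + 4) / 3
```

In practice, `mean_and_ssd` would be applied to the stack of spectra at one level (rows are spectra, columns are the 1633 bands), and `sam` or `asds` would compare each spectrum with its reference.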
Figure 2 depicts a schematic representation of the statistical analysis performed on each dataset.
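The PCA-kNN workflow can be sketched with NumPy alone. The study used SPSS, and this sketch does not reproduce that software's implementation. It computes PCA by singular value decomposition, keeps three components, splits the data 70/30, selects k by 5-fold cross-validation on the training part from a hypothetical candidate set, and classifies test samples by majority vote among their k nearest neighbors in Euclidean distance. Here PCA is fitted on the training samples only so that the test set does not influence the projection; whether the original analysis did the same is not stated. The data are synthetic placeholders shaped like the field dataset (30 spectra per rock type), and all function names are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components=3):
    """PCA via SVD of the mean-centred data matrix."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # explained-variance ratio of each component
    return mean, Vt[:n_components], explained[:n_components]


def pca_transform(X, mean, components):
    """Project spectra onto the retained principal components (scores)."""
    return (X - mean) @ components.T


def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for neighbour_labels in y_train[nearest]:
        labels, counts = np.unique(neighbour_labels, return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties -> first label in sorted order
    return np.array(preds)


def choose_k(X, y, candidates=(1, 3, 5, 7), n_folds=5, seed=0):
    """Select k by n-fold cross-validation accuracy on the training data."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    scores = []
    for k in candidates:
        fold_acc = []
        for held_out in folds:
            fit = np.setdiff1d(np.arange(len(y)), held_out)
            pred = knn_predict(X[fit], y[fit], X[held_out], k)
            fold_acc.append(np.mean(pred == y[held_out]))
        scores.append(np.mean(fold_acc))
    return candidates[int(np.argmax(scores))]


# Synthetic stand-in for the field dataset: 30 spectra per rock type, 200 bands.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 200)
X = np.vstack([0.3 + 0.1 * c + 0.05 * np.sin((c + 1) * np.pi * x)
               + rng.normal(0.0, 0.005, (30, x.size)) for c in range(3)])
y = np.repeat(np.array(["M1", "M2", "M3"]), 30)

idx = rng.permutation(len(y))            # 70 % training / 30 % test split
n_train = int(round(0.7 * len(y)))
train_idx, test_idx = idx[:n_train], idx[n_train:]

mean, comps, explained = pca_fit(X[train_idx], n_components=3)
Z_train = pca_transform(X[train_idx], mean, comps)
Z_test = pca_transform(X[test_idx], mean, comps)
k = choose_k(Z_train, y[train_idx])
pred = knn_predict(Z_train, y[train_idx], Z_test, k)
accuracy = np.mean(pred == y[test_idx])
print(f"k={k}, cumulative variance={explained.sum():.3f}, accuracy={accuracy:.2f}")
```

With k = 1, a test sample takes the label of its single nearest training point, so a handful of training samples in PCA space (as at the more heavily averaged levels) leaves little room for the vote to correct an unlucky neighbor.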

3. Results

3.1. Mean and Spectral Standard Deviation (SSD)

The mean spectra (L3 and L3x for the laboratory and field measurements, respectively) are presented in Figure 3. Using the mean spectra as a reference, M1 differs from M2 and M3 in the overall shape of its spectrum and in its weaker absorption features (excluding the O-H absorptions at 1400 and 1900 nm). The differences between M2 and M3, on the other hand, are smaller, which makes these two rock types more difficult to distinguish: both have almost the same spectral features and reflectance and present the same absorption bands.

The SSD was calculated for the laboratory and field measurements, and the results are presented in Figure 4. The average SSD values are 7.4, 8.5, and 13.2 in the laboratory and 2.1, 3.1, and 2.3 in the field for M1, M2, and M3, respectively. The high SSD values in the laboratory are caused by differences in small mineral clusters and oxidation processes within each rock piece, which are not observed in the field measurements because of the large area covered by each measurement. As discussed in Sections 1 and 2.1, the presence and quantity of various minerals in the rocks cause absorptions at different wavelengths. Therefore, variability in the mineral contents of the rocks can be expected to produce high spectral variability at those same wavelengths. Accordingly, variability in the Fe-oxide content results in a high SSD in the 400–900 nm range, and variability in the zoisite content results in a high SSD where zoisite absorbs, mainly in the 430–800, 1680, and 2300–2450 nm ranges, whereas variability in the quartz and feldspar contents changes the SSD across the entire VNIR-SWIR range rather than at specific wavelengths.

M1 shows the lowest SSD values in both the laboratory and field spectra, which may indicate a higher homogeneity of M1 compared to M2 and M3. However, with the contact probe in the laboratory, M2 shows lower SSD values than M3, whereas with the bare fiber in the field, M3 shows lower SSD values than M2. This reversal is due to the high spectral heterogeneity of M3 in the microscale measurements, whereas in the field data this mineralogical diversity is not reflected.

3.2. Spectral Angle Mapper (SAM)

The SAM algorithm was used to analyze the spectral similarity of the entire dataset to a reference spectrum. For that purpose, both internal and external reference spectra were used. An internal reference is a spectrum derived from within the dataset; here, the internal reference spectra were the average spectra of the three rock types, that is, L3 or L3x. L3 was used as the reference for L0, L1, and L2, whereas L3x was used as the reference for L0x. To examine the spectral similarity between all samples, an external reference spectrum with a reflectance equal to unity across the entire spectral range was used. Using the same external reference for all measurements allows the differences between rock types to be examined at each level. Moreover, it shows at which level of averaging the endmembers can be clearly distinguished, and it allows L3 and L3x to be included in the results.
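The behavior of the two kinds of reference can be illustrated with a short NumPy sketch on synthetic spectra; the wavelength grid, the Gaussian absorption near 2200 nm, and the noise level are all hypothetical. Against the unity reference, SAM measures how far a spectrum's shape departs from a flat line, so a featureless spectrum yields an angle of zero and scaling a spectrum leaves its angle unchanged. Against an internal (mean) reference, SAM measures the spread of individual spectra around their type average.

```python
import numpy as np


def sam(reference, measured):
    """Spectral angle (radians) between a reference and a measured spectrum."""
    r = np.asarray(reference, dtype=float)
    s = np.asarray(measured, dtype=float)
    cos_angle = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


n_bands = 1633
unity = np.ones(n_bands)                   # external reference: reflectance 1 everywhere
wl = np.linspace(350.0, 2500.0, n_bands)   # hypothetical, evenly spaced grid
flat = np.full(n_bands, 0.4)               # featureless placeholder spectrum
# Placeholder spectrum with one Gaussian absorption near 2200 nm (hypothetical)
absorbing = 0.4 - 0.1 * np.exp(-((wl - 2200.0) / 30.0) ** 2)

print(sam(unity, flat) < 1e-6)                       # True: flat shape, zero angle
print(sam(unity, absorbing) > sam(unity, flat))      # True: features increase the angle
print(np.isclose(sam(unity, absorbing), sam(unity, 0.5 * absorbing)))  # True

# Internal reference: the mean spectrum of one rock type (analogous to L3/L3x)
rng = np.random.default_rng(0)
pieces = absorbing + rng.normal(0.0, 0.01, (10, n_bands))  # placeholder spectra
internal_ref = pieces.mean(axis=0)
mean_sam_internal = np.mean([sam(internal_ref, p) for p in pieces])
print(round(float(mean_sam_internal), 3))
```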

3.3. Internal Reference

At each level, the average SAM was calculated for each rock type using L3 and L3x as the reference spectra. The lowest SAM values were obtained when the same rock type was used as the reference. However, the values for M2 and M3 remained close to each other, unlike those for M1, which shows that this method also has difficulty distinguishing between these two rock types. Additionally, the SAM values decrease with the level of processing, most markedly between L0 and L0x, which indicates lower spectral variability in the field measurements than in the laboratory measurements. This decrease is observed for each rock type when the same rock type is used as the reference: from 0.073 to 0.017 for M1, from 0.083 to 0.027 for M2, and from 0.115 to 0.022 for M3. The results are summarized in Table 3, and the lowest values for each level are given in Figure 5.

3.4. External Reference

The degree of similarity was also examined using an external spectrum, which enables a comparison between the rock types and also includes L3 and L3x in the analysis. The results shown in Table 4 are consistent with those obtained using the internal reference: M1 shows the lowest SAM values both in the laboratory and in the field, while M2 and M3 show relatively similar values. However, despite this similarity, M2 and M3 are still distinguishable from one another at all levels.

3.5. Average Sum of Deviation Squared (ASDS)

The ASDS method was used to gain additional insight into the spectral variability, using the same internal reference spectra as in the SAM analysis (L3 for L0, L1, and L2, and L3x for L0x). The lowest ASDS values for each rock type were obtained when its own type was used as the reference; for example, the lowest value for M3 was achieved with M3 as the reference, regardless of the level. Furthermore, the minimum value was achieved by M1 at all levels of the laboratory measurements, whereas in L0x the minimum value was shared by M1 and M3. The results are summarized in Table 5 and Figure 6.

3.6. Principal Component Analysis (PCA) and k-Nearest Neighbors (kNNs)

The PCA and kNN algorithms were applied to each dataset separately. The first three principal components (factors) were retained, as they explain more than 97% of the cumulative variance at all levels. Although the cumulative variance at the third component always exceeded 97%, the levels differed mainly in the variance explained by the first component. As shown in Table 6, PC1 explained 79.35, 75.58, 70.42, and 63.24% of the variance for L0, L1, L2, and L0x, respectively. Although PC1 explained the least variance in L0x, PC2 explained the most variance in this dataset (34.04%), so L0x also reached the highest cumulative variance at PC2 (97.28%). Subsequently, kNN was applied to the PCA results; a visual presentation is given in Figures 7(a)–7(d).

Table 7 provides the kNN results for the L0, L1, L2, and L0x datasets. M1 obtained the highest prediction accuracy in L0 and L1, with 97% and 100% success, respectively. M2 obtained the second highest prediction accuracy in L0 (63.6%), and M3 obtained the second highest in L1 (71.4%). The poor result for L2 (0% prediction) is due to the small number of samples in the PCA space combined with k = 1. The best overall prediction was obtained in L0x, with 100% success for all rock types. The average prediction accuracies of the rock types in L0, L1, L2, and L0x were 72.5%, 57%, 0%, and 100%, respectively.

4. Discussion

4.1. The Essence of the Spectral Variability

The spectral variability arises mostly from the mineralogical variability on the rock surface. Assessing this variability is important for rock classification, especially for rocks that are similar in their average mineralogical and spectral composition but exhibit diverse spectral features on their surface. Understanding the spectral variability is also important for assessing the uncertainty of quantitative predictions. Therefore, finding the most precise method for identifying the variability is of great importance for classifying the different types of ore-bearing granodiorite, with a correspondingly large economic impact on the excavation and exploitation of natural resources.

4.2. The Feasibility of Classifying the Rocks Samples

Any approach to classifying rocks, especially rocks with similar chemical compositions, involves some uncertainty. Moreover, in some cases the spectra of one type may closely match those of another, which creates the possibility of misidentification [46]. Longhi et al. [47] concluded that spectral and petrographic rock classifications do not necessarily agree, and Sgavetti et al. [10] found that only a limited number of basalt spectral classes measured in the laboratory were comparable to field spectral classes. They added that laboratory spectra were more useful for assigning absorption features than as endmembers for classification.

The spectral variability of similar rock types was examined by Mierczyk et al. [31], who found an overall accuracy of 63.5% using SAM. Schneider et al. [48] compared laboratory and field spectral images using several classification methods and found differences in performance due to illumination conditions, the calibration approach, and the presence of dust deposits. They also concluded that small-scale spatial variability in rock type and mineralogy within each geographical area affects the spectral variability. Pan et al. [49] argued that spectral variability inhibits an appropriate quantitative assessment. The difference in overall accuracy between L0 (68.1%) and L0x (100%) given in Table 7 complements these findings by demonstrating that spectral variability also hinders efficient classification.

In this work, several methods for assessing the spectral variability were used: the mean and standard deviation spectra, SAM, ASDS, and PCA followed by kNN. The best performance was obtained by PCA combined with kNN, for three principal reasons. (i) It does not require a reference spectrum: PCA is an unsupervised method, and kNN learns from labeled training samples rather than from a predefined reference spectrum. (ii) The classification was adequate for the laboratory measurements (L0), with an overall accuracy of 68.1%, and excellent for the field measurements (L0x), with an overall accuracy of 100%. (iii) kNN classifies each point by a majority vote among its k nearest training points rather than by its distance to a single training point, which reduces misclassification when abnormal points exist in the training dataset.

5. Summary and Conclusions

In this study, several methods were used to assess the spectral variability of three endmembers (M1–M3) in both laboratory and field measurements. The mean and standard deviation spectra show that M1 has a distinctive spectrum with different spectral features from M2 and M3, such as weaker absorptions at 400–2300 nm and a stronger absorption at 2350–2450 nm. Additionally, M1 shows the lowest SSD compared to M2 and M3 in both laboratory and field data. Moreover, the overall albedo of the laboratory spectra was 30–40 percent higher than that of the field spectra, which results in differences in the SSD. In the field, differences in SSD can be due to differences in mineralogy, rock size, or shading; M2 and M3 show high SSD (>3) between 1600 and 1750 nm in the field, whereas the SSD of M1 does not exceed 2.6.

The SAM method was assessed using both an internal reference (the average spectrum of each rock type) and an external reference (a spectrum equal to 1 at all wavelengths). The internal reference was used to evaluate the variability within each rock type, as it is the average spectrum of the same type. In contrast, the external reference enables us to evaluate the variability in comparison to the entire dataset, including the measurements of other types.

The SAM results using the internal reference show that the lowest values for L0 and L0x were obtained for M1 and the lowest values for L1 and L2 for M2. Moreover, the L0x values were comparable: 0.017, 0.027, and 0.022 for M1, M2, and M3, respectively. The results using the external reference show that, in both the laboratory and the field, the lowest SAM values were obtained for M1 at all levels, while the highest were obtained for M2. However, no significant changes were observed across levels, which indicates that the differences between the external spectrum and the spectra at each level remain the same.

The ASDS was assessed using the same internal reference as in SAM. The main difference between the SAM and ASDS methods is the influence of albedo: while albedo has no effect on SAM, it is the most significant factor in ASDS.
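The contrast can be made concrete with a small NumPy sketch: a placeholder laboratory spectrum is scaled so that its albedo is 35% higher than that of the corresponding field spectrum (a hypothetical value within the 30–40% range reported above). SAM stays essentially at zero, while ASDS grows with the square of the reflectance difference.

```python
import numpy as np


def sam(reference, measured):
    """Spectral angle (radians); insensitive to a uniform scaling of either spectrum."""
    r = np.asarray(reference, dtype=float)
    s = np.asarray(measured, dtype=float)
    cos_angle = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def asds(reference, measured):
    """Average squared per-wavelength deviation; depends on absolute reflectance."""
    r = np.asarray(reference, dtype=float)
    s = np.asarray(measured, dtype=float)
    return float(np.mean((r - s) ** 2))


rng = np.random.default_rng(3)
lab = 0.3 + 0.2 * rng.random(1633)   # placeholder laboratory spectrum
field = lab / 1.35                   # same shape; lab albedo 35 % higher (hypothetical)

print(sam(lab, field))    # close to 0: the albedo difference is ignored
print(asds(lab, field))   # clearly > 0: grows with the albedo difference
```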

From the PCA and kNN analyses, it can be concluded that the spectral variability of the laboratory measurements within the same rock type was high. This allowed an excellent prediction for M1 (97% accuracy) but lower prediction accuracies for M2 (63.6%) and M3 (56.9%). In contrast, the spectral variability in the field data between measurements of the same rock type was low, while the spectral variability between the different rock types was high enough to allow a clear separation of the types, resulting in a prediction accuracy of 100%.

This study reveals the heterogeneity of the selected rock types by examining their spectral variability in six datasets, in the laboratory (L0, L1, L2, and L3) and in the field (L0x and L3x), using several well-known statistical methods. Since rocks are aggregates of minerals, it can be argued that the spectral variability increases with spatial resolution, because the mineralogical components and their variation on the rock surface are portrayed more strongly in laboratory data than in field measurements.

The results of this study will serve as a basis for future studies aimed at quantifying the chemical, physical, and mineralogical parameters of rock samples. They can be used to choose the wavelengths with the minimum spectral variance for future statistical modelling and to select a suitable method for the qualitative assessment and characterization of rocks. In addition, they emphasize the importance of field measurements for accurate prediction.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest in this work.

Authors’ Contributions

Yaron Ogen is the main author of this manuscript; he developed the research concept for this study, conducted the field and laboratory spectral measurements, and performed all statistical analyses presented in this study. Michael Denk conducted the field spectral measurements together with Yaron Ogen. Cornelia Gläßer and Michael Denk were involved in developing the concept for this study and substantially contributed to writing the manuscript. The experimental site was set up based on a concept mainly designed by Holger Eichstädt with significant contributions from René Kant and Cornelia Gläßer. Rudolf Suppes, Ralf Löser, and Munkhjargal Chimeddorj substantially contributed to setting up the experimental site. Tugsbuyan Tsedenbaljir, Undrakhtamir Alyeksandr, and Tsedendamba Oyunbuyan contributed their geological expertise to this experiment and organized the provision of the rock samples for the field as well as the laboratory experiments.

Acknowledgments

The study is part of the ADRIANA project, which was funded by the Client II Program of the German Federal Ministry of Education and Research (BMBF), funding code: 033R213B. The authors wish to thank the staff members from EMC, EIT, and GMIT, especially Mr. Tumendelger Batsuuri, Mr. Galsanjamts Otgonbaatar, and Mr. Tushig Zolboo, for their assistance during the field work.