Abstract

Due to its high spatial and spectral information content, hyperspectral imaging opens up new possibilities for a better understanding of data and scenes in a wide variety of applications. An essential part of this process of understanding is classification. However, the high spatial and spectral resolution also leads to enormous amounts of data. The effective handling and use of such datasets for classification requires processing steps (dimensionality reduction through feature selection or feature extraction) that are not always goal-oriented. In this article, a new general classification approach is presented that uses the geometric shape of spectral signatures instead of purely statistical methods. In contrast to classical classification approaches (e.g., SVM, KNN), not only reflectance values are taken into account: parameters such as curvature points, curvature values, and the curvature behavior of spectral signatures are used to develop shape-describing rules, which are then applied for classification in a rule-based procedure with IF-THEN queries. The flexibility and efficiency of the methodology are demonstrated on datasets from two different application domains and lead to convincing results with good performance.

1. Introduction

Optical technology developments are extending the possibilities to better understand the world and its resources. Starting with images consisting of three color channels covering the visual electromagnetic spectrum, the developments since the late 1960s opened up the possibility of using spectral properties for identification of materials by using multispectral images with tens of channels. Especially with the developments in the last two decades, another enormous step forward was made and the low spectral resolution of multispectral images was overcome. With several hundred narrow channels, hyperspectral imaging (HSI) opens up completely new possibilities for analysis in a wide variety of application fields [1]. For example, [2–4] use HSI to maintain and increase crop yields in precision agriculture. The evaluation of food quality and safety through the use of HSI is part of the research of [5–7]. Other applications of HSI can be found in medicine to diagnose diseases or to monitor wound healing [8–10], in the art market to verify the authenticity of artworks [11–13], and in forensics to analyze crime scenes [14–16].

The key technology behind those applications is HSI with its detailed spectral and spatial information, which makes it a powerful information source for advanced classification methods like k-nearest neighbor, support vector machines, random forests, neural networks, and deep learning approaches. A comparison of these classification methods is conducted in [17] and shows that there is no classifier that consistently provides the best performance and that the quality of the classification result mainly depends on factors such as the availability of training samples, processing requirements, tuning parameters, and speed of the algorithm. Another aspect to be considered in the context of these classifiers is the Curse of Dimensionality. If high-dimensional HSI are directly used as input, the classification accuracy decreases, while the computational effort of the model tends to increase exponentially. To avoid this problem, a dimensionality reduction is essential [18]. The reduction of dimensions implies that algorithms automatically have to extract, from the entire course of a spectral signature, a set of characteristic spectral values that represent the deciding features of the objects in question. The automatic determination of the optimal number of relevant features [19] and the time required, for example in the case of graph-based methods [20], are among the main problems. The number of existing studies (e.g., [19–27]) indicates that there is no satisfactory, robust, and reliable methodology yet and that this topic is one of the main open issues in spectral imaging. As a consequence, this recommended step leads to a loss of information in the spectral space, because the selected bands do not give an accurate description of the original spectral signature.

Other classification approaches based on indices [28–30] also arbitrarily select a subset of spectral values. One of the most popular is the Normalized Difference Vegetation Index (NDVI) from the field of remote sensing [31], which allows the classification of vegetation-covered areas on the Earth’s surface by combining only a few bands of NIR and Red wavelengths. Another example of such an index is the Normalized Difference Plastic Index (NDPI), which combines shortwave infrared bands for the classification of plastic materials in urban areas [32]. By limiting to a small number of spectral bands in a restricted spectral range, broad classifications such as separation between plants, water, or urbanized areas can be made, but finer separation (e.g., between plant species or plastic materials) would be difficult due to the reduced number of spectral bands and contradicts the use of HSI, which offers the enormous advantage of high-resolution spectral information.
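As a brief illustration of how such an index reduces a spectrum to very few bands, the following minimal sketch computes the NDVI in its commonly used form, (NIR − Red) / (NIR + Red). The function names and the reflectance values are illustrative assumptions and are not taken from the datasets of this work.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b); 0.0 is returned when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def ndvi(nir: float, red: float) -> float:
    """NDVI from one NIR and one Red reflectance value."""
    return normalized_difference(nir, red)


# Hypothetical reflectance values for a vegetated and a non-vegetated pixel.
print(ndvi(nir=0.50, red=0.08))
print(ndvi(nir=0.30, red=0.25))
```

Only two values per pixel enter the computation, which is why such indices support broad separations but discard most of the spectral shape.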

Both band selection methods and indices consider only a small set of characteristic values, while the shape of the spectra is not taken into account. A consideration of the shape of spectral curves as an important information source can be found in [33–35].

The work of [34] consists of a classification approach that fully utilizes the shape of spectral curves by using a code to parameterize the spectral curve shape. The research of [35] deals with the definition of analysis rules based on spectral features like band position, band depth, band width, and band asymmetry. These key parameters are used to describe the absorption features of spectra. A disadvantage of both works is the time required to develop the description parameters of the curve shapes. Both approaches use tables to store the parameters and a subsequent matching process between the parameters of the reference data and the spectra that need to be classified, which also makes the classification process a time-consuming task. The advantage of considering shapes, especially in combination with high-resolution spectral data, is clear: spectra express the mixed reflectivity of the elements that make up an object (e.g., molecules, pigments, cell structure, and water content), which is why each individual component has only a proportional influence on a spectrum. Changes in the composition of the elements then mainly change the mixture of all spectral contributions, resulting in local or regional variations and changing the shape of a spectrum.

In this article, we follow the shape-based works of [33–35] and propose a new rule-based classification method using shape-based properties of spectral curves like curvature points, curvature values, curvature direction, and spectral values.

Particular attention should be paid to the following points:
(i) The formulation of the rules should not become complex
(ii) The classification process should not be time-consuming

Keeping these points in mind, an approach was developed that allows establishing rules, using any kind of logical elements, in a straightforward manner. The establishment of rules implies existing knowledge. A prior analysis of the data and the acquisition of knowledge enable a better understanding of the data and allow structuring and simplifying a problem. Research papers from the field of remote sensing that use the advantage of knowledge can be found in [36–41]. The experimental part for the evaluation of the method was performed on two different datasets. A detailed description of the method and the datasets used is part of the following section.

2. Materials and Methods

2.1. Spectral Data Set Acquisition

The hyperspectral systems used in this work are two pushbroom cameras from Specim Ltd. (Oulu, Finland). The Specim FX10 captures the spectral signature from 400 nm to 1000 nm (233 bands), while the Specim FX17 captures the spectral signature from 900 nm to 1700 nm (229 bands). All spectral images acquired by these cameras are radiometrically normalized by using dark reference images for the dark current (closed shutter) and a white reference image to reduce the influence of the intensity variability. For the white calibration, a 99% reflectance tile was used. As shown in Figure 1, a set of different HSI covering two different object types is used to demonstrate the proposed method.
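The radiometric normalization described above is commonly implemented as a per-band flat-field correction, (raw − dark) / (white − dark). The sketch below assumes this common form; the exact procedure used with the Specim cameras is not specified here, and the function name, the `tile_reflectance` parameter, and the sample values are hypothetical.

```python
def normalize_reflectance(raw, dark, white, tile_reflectance=1.0, eps=1e-9):
    """Per-band flat-field correction for one pixel spectrum.

    raw, dark, white: sequences of sensor counts with one entry per band.
    tile_reflectance: optional scaling for a non-ideal white tile
                      (assumption: e.g. 0.99 for a 99% reflectance tile).
    """
    out = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        # Guard against bands where the white and dark references coincide.
        out.append(tile_reflectance * (r - d) / span if abs(span) > eps else 0.0)
    return out


# Hypothetical sensor counts for a two-band pixel.
print(normalize_reflectance(raw=[110, 60], dark=[10, 10], white=[210, 110]))
```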

The first dataset was captured with the Specim FX17 and shows an image of seven classes of different plastic types. It has a resolution of 661 × 500 pixels and 229 bands with a spectral range from 900 nm to 1700 nm. The second scene was captured with the Specim FX10 and the Specim FX17 and shows ten classes of different plant types. The HSI of both cameras were combined and have a resolution of 1220 × 640 pixels and 462 bands covering the spectral range from 400 nm to 1700 nm.

The HSI with different plastic types is used to demonstrate the basic functionality of the approach and gets more attention due to existing ground truth (the datasets can be found at https://doi.org/10.5281/zenodo.5068201). As part of a waste sorting application, the demonstration also covers experiments with real waste consisting of plastics and electronic waste like printed circuit boards (PCB). The plant-based dataset, on the other hand, is used to demonstrate the potential and flexibility of the approach regarding different kinds of applications.

2.2. Methodology

Spectral signatures offer the ability to distinguish between materials and are the result of reflected light from the surface, which is captured within a broad electromagnetic spectrum. The reason for different signatures is a combination of molecules in materials and the morphological structure. Different molecules result in different spectral signatures. The morphological structure, on the other hand, is important because of the resulting light path that is created by reflection, absorption, transmission, and deflection from the different components of an object. That is why plants show different spectra when the cell structure changes, e.g., due to stress or aging. Differences in both molecules and structure result in differently shaped spectral curves. The idea of our work is to describe the spectral curve shape by a combination of spectral values and shape-based parameters and to use this knowledge for the formation of rules.

Changes in the material composition of objects lead to local or regional changes in the course of spectral signatures. These changes inevitably lead to changes in the curvature behavior, which makes the curvature κ a significant parameter for the shape description and the modelling of spectral changes.

Mathematically, the curvature is the change in direction of a curve that occurs when the curve is traversed and can be expressed in parametric form for each point using equation (1), where dots denote derivatives.
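For reference, the textbook form of the signed curvature of a plane curve given parametrically as (x(t), y(t)) is written out below. It is assumed, not confirmed, that this corresponds to equation (1). The second line is the special case of a spectrum treated as a graph y = f(λ), where the symbol f for the reflectance as a function of wavelength is introduced here only for illustration.

```latex
\kappa(t) = \frac{\dot{x}(t)\,\ddot{y}(t) - \dot{y}(t)\,\ddot{x}(t)}
                 {\left(\dot{x}(t)^{2} + \dot{y}(t)^{2}\right)^{3/2}}

% Special case x = \lambda, y = f(\lambda), i.e. \dot{x} = 1 and \ddot{x} = 0:
\kappa(\lambda) = \frac{f''(\lambda)}{\left(1 + f'(\lambda)^{2}\right)^{3/2}}
```

In the graph form, the sign of κ follows the sign of the second derivative, which matches the convex (positive) and concave (negative) distinction used in the text.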

While the curvature of a straight line is zero everywhere and the curvature of a circle is equal at all points, the curvature for all other curves changes from point to point and indicates how strongly the curve at a point deviates from a straight line. Thus, for the description of spectral shapes, we use the following properties of curvature:
(i) The magnitude of the current rate of change of the direction of a point moving on the curve: the greater the curvature, the greater the change
(ii) The behavior of the curvature: if the curvature value is positive, the curve is called convex, and in the case of a negative curvature value, it is considered concave

As shown in Figure 2, extreme values of the second derivative are used to select significant parameters. The combination of these parameters with selected spectral values allows a precise description of the shape of spectral curves using a few selective spectral bands. The curvatures are calculated from preprocessed spectral curves. The preprocessing consists of smoothing followed by Continuum Removal. A schematic representation of all essential steps can be found in Figure 3.

Continuum removal is a normalization procedure which allows a better quantification of absorption peaks after removing the overall concave shape of spectral curves [42, 43] and is illustrated in Figure 4. Because the absorption bands are particularly highlighted, rules based on curvature values can be developed much more efficiently. An example of continuum removed spectra and the calculated curvatures is shown in Figure 5. Depicted are continuum removed spectra for two different plastic types (PE, PS). In addition to the course of the spectral signature, the calculated curvature values are shown as positive and negative vertical lines in green and red color. The longer the line, the stronger the curvature at the respective band.
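A common way to implement continuum removal is to divide each spectrum by its upper convex hull. The following minimal sketch assumes this hull-quotient variant, which may differ in detail from the procedure in [42, 43]; all function names and the sample spectrum are hypothetical, and the wavelengths are assumed to be strictly increasing.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of points sorted by x (monotone chain)."""
    hull = []
    for p in points:
        # Drop the middle point while it lies on or below the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide the spectrum by its linearly interpolated upper hull."""
    pts = list(zip(wavelengths, reflectance))
    hull = upper_hull(pts)
    result, seg = [], 0
    for x, y in pts:
        # Advance to the hull segment that contains wavelength x.
        while seg < len(hull) - 2 and x > hull[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = hull[seg], hull[seg + 1]
        continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        result.append(y / continuum if continuum > 0 else 0.0)
    return result


# Hypothetical three-band spectrum with one absorption dip in the middle.
print(continuum_removed([900.0, 1300.0, 1700.0], [1.0, 0.5, 2.0]))
```

Hull points map to 1.0 after the division, so absorption features appear as dips below a flat baseline.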

The curve behavior is represented by red and magenta colored dots. These points are maximum and minimum points and are automatically determined from the local maxima and minima of the second derivative. They help to distinguish between concave and convex curve behavior, providing an important source of information for describing the shape. For the subsequent formation of rules, mainly the red lines are used, since these reflect both the concave and convex behavior and a significant change in curvature. All other curvature values (green lines) are not considered in the rule formation. To ensure that only significant changes in the curve are captured, a threshold is set for the selection of the relevant curvatures (red lines). The setting of the threshold mainly depends on the curve shape. The smaller the threshold, the more detailed the description of the curves. However, for highly variant spectra (Figure 6), a higher threshold is sufficient. The examples in Figure 5 show the result of selected red lines for a threshold of 0.1. Building on these extracted parameters regarding the spectra for each material, a collection of conditions is formulated and used for the classification. It should be mentioned here that, in addition to the local shape values used, other rules can also be integrated. For example, rules can describe global shape effects (e.g., the expressivity of the green peak for plants) or add relations of averaged reflectivity in different regions, as indices like NDVI do.
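The selection of significant curvature points can be sketched as follows. The sketch makes two simplifying assumptions that are not confirmed by the text: the second derivative, approximated by central differences, is used directly as the curvature value, and the input spectrum is already smoothed and continuum removed. Function names and sample values are hypothetical.

```python
def second_derivative(values, step=1.0):
    """Central second difference per band; the two endpoints are set to 0.0."""
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (step * step)
    return d2


def significant_curvature_points(values, threshold, step=1.0):
    """Return (band_index, curvature_value) pairs at local extrema of the
    second derivative whose magnitude exceeds the threshold."""
    d2 = second_derivative(values, step)
    picks = []
    for i in range(1, len(d2) - 1):
        is_max = d2[i] > d2[i - 1] and d2[i] >= d2[i + 1]
        is_min = d2[i] < d2[i - 1] and d2[i] <= d2[i + 1]
        if (is_max or is_min) and abs(d2[i]) > threshold:
            picks.append((i, d2[i]))
    return picks


# Hypothetical continuum removed spectrum with one absorption dip at band 2.
spectrum = [1.0, 1.0, 0.4, 1.0, 1.0]
for band, cv in significant_curvature_points(spectrum, threshold=1.0):
    print(band, "convex" if cv > 0 else "concave")
```

Lowering the threshold keeps more points and yields a more detailed shape description, in line with the trade-off described above.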

One of the main advantages of a rule-based classifier is its simplicity. Once the knowledge on which rules are based has been worked out, conditions can be easily set up. Further advantages are the performance, the ability to handle redundant and irrelevant attributes, and the flexible extensibility of rule sets [44]. Acquiring knowledge can seem effortful, but in view of the resulting advantages, it should be seen as a clear benefit which allows structuring and simplifying problems by using expert knowledge. A comparison of deep learning and a knowledge-based method can be found in [45] and shows that a rule-based method can even be better than machine learning-based methods. For better illustration of rule formation, we refer to the spectral curves of PE and PS from Figure 5 and express them as shown in Table 1.

The parameter CV stands for the curvature value at a specific spectral band, and a positive or negative threshold is used to distinguish between convex and concave curve behavior. All conditions with a negative threshold will capture the downward red lines (concave behavior), while all conditions with a positive threshold will capture upward red lines (convex behavior). As already mentioned, for rule formation mainly the bands with high curvature are taken into account (red lines). Nevertheless, this does not mean that all bands are really necessary for the rule formation. By analyzing the data in advance, it is possible to achieve a clean classification even with a smaller number of selected bands. Therefore, only six conditions are defined for sample PS in Table 1, while in Figure 5 a total of seven red lines are present. The continuum removed spectra and rules for all other existing plastic types in the dataset are listed in the Appendix (Figures 7–15). The corresponding spectral signatures are also shown in Figure 6. As an additional condition, the continuum removed reflectance value (CRRV) can be used if the shape alone is not sufficient for a distinction, for instance, to separate PF-black from all other available plastic types.
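The threshold convention described above can be sketched as a small rule check for a single spectrum. It is assumed here that all conditions of a rule are combined with a logical AND; the band indices and thresholds of `RULE_A` are placeholders and are not the values from Table 1.

```python
def condition_holds(cv, band, threshold):
    """Positive threshold: require CV[band] > threshold (convex behavior).
    Negative threshold: require CV[band] < threshold (concave behavior)."""
    if threshold > 0:
        return cv[band] > threshold
    return cv[band] < threshold


def matches_rule(cv, conditions):
    """IF every (band, threshold) condition holds THEN the rule matches."""
    return all(condition_holds(cv, band, thr) for band, thr in conditions)


# Hypothetical rule: convex at band 2 and concave at band 5.
RULE_A = [(2, 0.1), (5, -0.1)]

# Hypothetical per-band curvature values of two spectra.
print(matches_rule([0.0, 0.0, 0.3, 0.0, 0.0, -0.2], RULE_A))   # True
print(matches_rule([0.0, 0.0, 0.05, 0.0, 0.0, -0.2], RULE_A))  # False
```

An additional CRRV condition could be added in the same way as a further predicate on the continuum removed reflectance at a chosen band.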

3. Results and Discussion

Once the rules are established, the next step is to apply them. This involves a pixel-by-pixel processing of the corresponding dataset and a check of the rule conditions for each individual pixel. If a condition applies, this pixel will be assigned to the appropriate class.
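The pixel-by-pixel application can be sketched as a loop in which each rule is an arbitrary predicate on the per-band curvature values of a pixel. The first-match-wins order, the "unclassified" label, and the example rules are assumptions made for illustration; the original implementation was done in MATLAB and is not reproduced here.

```python
from typing import Callable, Dict, List, Sequence

Spectrum = Sequence[float]
Rule = Callable[[Spectrum], bool]


def classify_image(image: List[List[Spectrum]],
                   rules: Dict[str, Rule],
                   unclassified: str = "unclassified") -> List[List[str]]:
    """Assign each pixel the class of the first rule whose condition holds."""
    labels = []
    for row in image:
        label_row = []
        for pixel in row:
            label = unclassified
            for name, rule in rules.items():
                if rule(pixel):  # IF the condition applies THEN assign the class
                    label = name
                    break
            label_row.append(label)
        labels.append(label_row)
    return labels


# Hypothetical rules on curvature values (placeholder bands and thresholds).
rules = {
    "class_A": lambda cv: cv[1] > 0.1 and cv[3] < -0.1,
    "class_B": lambda cv: cv[2] > 0.1,
}

# Hypothetical 2 x 2 image with four curvature values per pixel.
image = [
    [[0.0, 0.2, 0.0, -0.3], [0.0, 0.0, 0.4, 0.0]],
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.3, 0.0, -0.2]],
]
print(classify_image(image, rules))
# [['class_A', 'class_B'], ['unclassified', 'class_A']]
```

Because every pixel is processed independently, the loop can be restricted to the few bands that the rules actually reference.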

3.1. Classification Results for Plastic Samples

Applying the rules developed in the Appendix to the dataset consisting of plastic samples in Figure 1 yields the results shown in Figure 16. The result of the classification shows that the individual samples were classified correctly. A closer look at the border areas of the samples shows that a shadow effect occurs and that this effect can influence the quality of the classification. This shadow effect occurs especially with samples (e.g., PS) that do not lie in a planar position.

Another point to consider here is the classification of PF samples. The continuum removed reflectance spectrum illustrated in the Appendix does not correspond to the real spectrum of the material PF. The PF samples used here contain black colorants. A well-known problem is the strong absorption of black colorants like carbon black. Due to the strong absorption of light from the UV to the NIR, there is no reflected light that can be detected by the sensor and thus no spectral information that can be used for a classification [46, 47]. The rule-based approach presented here offers the advantage that, even in the absence of reflectance, rules can be established based on the existing limited information, which permit such a classification. In the case of these samples, it means all black plastics will be classified as PF-black. It must also be noted in this example that the samples used have clean, homogeneous surfaces and the resulting spectra have a correspondingly high degree of shape similarity. However, considering applications in the field of waste sorting, it is more common to work with dirty and damaged materials.

Therefore, for the evaluation of the methodology, a dataset (1253 × 578 pixels and 229 bands) with real waste consisting of plastic parts (objects 1–19) and circuit boards (objects 20–27) was processed additionally. The classification result for the real waste dataset is shown in Figure 17. The objects in the dataset were chosen randomly from a collection of different plastic parts. Therefore, it is not surprising that certain materials (e.g., PMMA, PVC, PE, and UP) are not present. In addition to the plastic types mentioned so far, polypropylene (PP) and acrylonitrile butadiene styrene (ABS) were identified on the basis of spectra from [48, 49] and formulated as rules. Furthermore, an additional rule was established for the classification of printed circuit boards. A look at the classified plastic parts shows that the nonhomogeneity of the surfaces is partly reflected in the results in the form of unclassified or misclassified pixels. Nevertheless, it can be stated that the classification was generally successful using the ten developed rules. The black plastic parts (objects 1 and 17 in Figure 17) that are missing were not modelled in this example, because the background also consists of black plastic and a separation in this special case proves to be difficult. Also, since the spectra of object number 5 in Figure 17 could not be assigned to any material, no rule for its classification was established.

An important step in the recycling process is the separation of plastics and electronic waste. In particular, the high variation of different material compositions from which PCBs are made makes separation difficult. However, the example shown here also demonstrates that prior analysis of data and the use of rules based on acquired knowledge can lead to a more efficient recycling process. Due to the already mentioned high variation of PCBs and the composition of different materials (e.g., board, conductors, solder joints, resistors, and capacitors), a holistic assessment of a PCB is difficult to implement.

Nevertheless, it is evident in this example that the board has been identified in all cases and, as expected, only the areas with conductive tracks and metallic or electronic components have not been assigned and consequently ended up in a category with black plastics and the background. A significant difference to the samples shown in Figure 16 is the heterogeneity of the surfaces. Differences in depth, shadow effects on the surface, and dirt lead to a very high variability of spectral signatures. As an example, a region of object number 12 in Figure 17 was selected. In this area, differences in depth, shadow effects, and soiling can be found. A representation of the high variability in form of spectral shifts is shown in Figure 18. The shape-based classification result for object number 13, however, shows that a satisfactory result was achieved despite the high variability. The reason for the robustness of the proposed method is that even if the spectral values change and result in spectral shifts due to certain factors like dirt or shadow, the geometric shape remains stable.

The processing time using MATLAB on a machine with an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz and 16 GB RAM was 136 seconds for the real waste dataset and 44 seconds for the plastic dataset from Figure 16.

In order to compare the quality of the proposed method to standard methods, we decided to use one of the most commonly used pixel-based classifiers. Therefore, the dataset shown in Figure 16 (plastic samples) and the dataset shown in Figure 17 (real waste samples) were processed using a supervised multiclass SVM classifier (C-SVC, One-Versus-All), which is based on the selection of a training set to obtain a model and the subsequent application of this model for the prediction of classes. The dataset with the real waste was limited to the plastic objects. As already described in [17], the quality of the results strongly depends on the chosen training samples and parameterization. For this reason, three different kernels, namely, Radial Basis Function (RBF), linear, and polynomial, with different parameters have been tested on the datasets. Because the classification is multiclass, a classification threshold is used, which is the maximum distance to the hypersurface up to which a pixel is assigned to a specific class. The parameters for the kernels and the classification threshold were obtained by testing and checking the results. Because of the homogeneity of the plastic samples, one material sample of each material class was manually labelled and used as training set. A description of the datasets can be found in Table 2. The best results for the plastic dataset using different kernels and classification thresholds are shown in Figure 19.

The accuracy of each classification method compared is reported in Table 3. In principle, the results for this homogeneous and optimal dataset are comparable for most of the classification metrics used. Nevertheless, significant differences can be seen in the resulting images, especially at the edges of objects, in the form of unclassified or misclassified pixels. As already mentioned, this is due to a shadow effect that occurs locally in some areas and mainly impacts the quality of the SVM methods. While the rule-based approach also assigns these edge areas and some shadow parts to the corresponding material, the SVM produces misclassifications, especially when using linear and polynomial kernels.
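One of the metrics typically reported in such comparisons is the Overall Accuracy (OA), i.e., the fraction of correctly classified pixels among all ground-truth pixels. The sketch below assumes that pixels without a ground-truth label are excluded, which is a common convention but is not stated for Tables 3 and 4; the function name and sample labels are hypothetical.

```python
def overall_accuracy(predicted, truth, ignore=None):
    """Fraction of labelled pixels whose predicted class equals the truth.

    predicted, truth: flat sequences of class labels, one entry per pixel.
    ignore: ground-truth value marking unlabelled pixels (assumption).
    """
    total = 0
    correct = 0
    for p, t in zip(predicted, truth):
        if t == ignore:
            continue  # skip pixels without ground truth
        total += 1
        if p == t:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical labels for four pixels; the last one has no ground truth.
pred = ["PE", "PS", "PE", "PS"]
gt = ["PE", "PS", "PS", None]
print(overall_accuracy(pred, gt))
```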

Compared to the plastic dataset, the real waste dataset represents a typical situation with objects of high variety. This not only increases the effort required to train the model, but also makes the training process more difficult. Small subsets of the individual samples were manually selected as training set. This strategy was considered to be appropriate due to the strong differences between the plastic samples. Thus, when selecting the subsets, special care was taken to ensure that particularly critical areas, such as shadow areas caused by depth, dirty areas, and damaged surfaces, were also covered in the training set. The best resulting class-labels are presented in Figure 20 and also show the effect of different classification thresholds. The corresponding numerical values are listed in Table 4.

In principle, it can also be stated for this dataset that, despite the challenging objects of the real waste dataset, a high degree of accuracy can be achieved with both methods. The best SVM result was obtained with an RBF kernel with a C value of 0.2 and a classification threshold of 0.3. In comparison, the rule-based method produces slightly better values for the different metrics. Also noticeable is the more homogeneous representation of objects in the resulting class-labels in Figure 20. The reason for this is again the robustness against spectral shifts. While SVM is based on pure reflectance values, the shape-based approach only needs to consider the geometric shape of spectral signatures. One of the advantages of the shape-based approach is that, theoretically, only one spectrum per material is required for the rule formation. When using an SVM, it is necessary to ensure that, in the case of high-variability datasets, all data representing the spectral variations of one material class are included in the training set. Fulfilling this requirement can be a challenging task.

3.2. Classification Results for Plants

Compared to the dataset consisting of plastics and electronic waste, spectral signatures of plants often show a very similar curve shape. This fact generally complicates the classification process.

To illustrate the ability to classify even in such difficult cases and the flexibility of the methodology presented in this work, rules were developed to distinguish between plant species based on the spectral signatures shown in Figure 21, which are coherent with spectral signatures provided in [50]. Due to the similarity of the shape, in this particular case, rules have been formulated using mainly Continuum Removed Reflectance Values (CRRV) as well as curvature parameters. This involved using the entire spectrum from 500 nm to 1700 nm to work out fine differences regarding the water content, the cell structure, and the pigments of the plants used. In this context, it must be taken into account that the spectral properties of a plant species can also vary, as spectral signatures will be affected by factors such as species, variety, age, internal cell structure, environmental conditions, chemical composition, and nutrient content [51]. An example of such a variation is given in Figure 22 for moss. Although not implemented in this work, one possible way to deal with such heterogeneous spectra could be the addition of rules based on texture features or other local image features [52–54].

A total of six rules were developed to separate the ten samples in groups of moss, lichen, red sedum, green sedum, Geranium robertianum, and green leaves. The results in Figure 23 show that, in principle, a classification between different plant species is possible. For example, it can be clearly seen that red sedum is distinctly different from the other plant species due to its coloring. Green sedum also differs based on the cell structure, whereby the properties of this plant seem also partially reflected in moss, which can be relatively well distinguished from the other plants due to the spectral signatures in the NIR and the low reflectance around 550 nm. Furthermore, it was possible to classify the lichens in object 10.

The very strong spectral similarity between the leaves does not allow a clean separation between the individual species. Only Geranium robertianum differs due to its low reflectance in the range around 550 nm. In this context, it is important to note that the spectral signatures shown in Figure 21 only reflect the spectral signature of one pixel and that the recognizable differences, which refer to reflectance values or shape, are not consistently present due to the high variability within a plant species. Nevertheless, it can be stated also for this example that a rule-based approach is a flexible method and, depending on prior analytical efforts, has the potential to provide useful results.

4. Discussion

Results of a rule-based classification approach were presented, based on the shape of spectral signatures. The rules are established in a supervised way and are based not only on spectral values, but especially on parameters that describe the geometric form of a spectral signature. These parameters include the automatically determined curvature points (i.e., specific spectral values obtained through the 2nd derivative), curvature values, and curvature behavior. The effectiveness of this method was demonstrated with different datasets from completely different fields of application. In particular, the separation of materials with significant geometric differences in the course of the curve leads to convincing results, which is reflected in the Overall Accuracy (OA) of 96.94% for the plastic dataset and 98.42% for the real waste dataset.

An essential advantage over classical classification approaches is the possibility of describing the course of a spectrum in detail on the basis of a few selective parameters. Classic approaches do not require a previous analysis of the data and offer the advantage of automated processing based on the actual data. The prerequisites for this are, on the one hand, the availability of a large amount of training data, which is not always given and whose preparation (e.g., annotation) can be very time-consuming, and, on the other hand, preprocessing in the form of dimensionality reduction. This reduction is a selection of informative bands and often leads to a loss of information within the spectral signatures, because it is based on purely statistical reduction. With the method presented here, the finest changes in the course of the curve can be identified and modelled by using theoretically only one spectrum instead of a large amount of training data. The prerequisite for this is a previous analysis of the spectral signatures and the establishment of the rules based on expert knowledge. This process may be considered time-consuming, but considering the parameters used, it does not require a particularly high investment of time.

An automated generation of the rules would also be conceivable, since the essentially used parameters such as curvature value, curvature point, and curvature behavior are determined by simple mathematical methods. A disadvantage that can arise with the automatic formulation of rules is that the number of rules increases with the number of categories. This can lead to complex rules that conflict with each other. Modelling the rules based on expert knowledge can avoid this by understanding the interaction of the factors, which makes it possible to formulate the simplest concept that gives the best results. Basically, when dealing with spectral data, it can be stated that the number of categories that can be classified from spectral bands is small enough that a limited number of rules can be used. The analysis of spectral signatures in advance has the further advantage that there is no need to use the entire spectrum for classification. If significant shape differences are detected within a limited spectral range, it is sufficient to consider only this range in order to establish the rules based on it, which leads to an increase in performance.

While spectral values can vary strongly depending on different factors, differences in the shape of spectral signatures are only due to differences in material composition. This fact is the reason for the promising results and confirms our expectations of the proposed analytical method, which is robust to spectral variations and allows an unambiguous description of the spectra based on the basic idea of using the shape of the individual fingerprint of different materials or objects.

Furthermore, the processing time must be mentioned. The pixel-by-pixel processing and checking for conditions using MATLAB on a machine with an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz and 16 GB RAM takes 113.42 seconds for the plant dataset (1220 × 640 pixels and 462 bands) shown in Figure 23. In comparison, the shape-based classification in [34] for a dataset of 512 × 512 pixels and 6 bands takes about 9 minutes (computer configuration: CPU 2.93 GHz and installed memory 4.00 GB). A further comparison with a study using a classic classification method like SVM also shows a better performance of the proposed method. There, the processing of a comparable dataset of 1168 × 696 pixels and 520 bands takes 2980.21 seconds using 30 bands and 9381.37 seconds using all bands of the dataset (computer configuration: Intel Core i7-6800k 3.40 GHz CPU and installed memory 64 GB) [55].

5. Conclusions

Considering the shape of spectral signatures and describing them by curvature parameters and spectral values proves to be a convincing supervised method for classifying diverse groups of materials from different fields of application. Further evaluation on publicly available datasets, e.g., from the field of remote sensing, also with respect to other classification methods, would be of interest. A possible disadvantage of this methodology could be the necessary prior analysis of the data. This circumstance could be eliminated by automation. Especially in the case of significantly different spectra, like the plastic samples, automation is certainly feasible and would bring the benefit of an unsupervised method.

Data Availability

The HSI datasets used to support the findings of this study have been deposited in the Zenodo repository (https://doi.org/10.5281/zenodo.5068201).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the European Union from the European Regional Development Fund and the State of Rhineland-Palatinate. The authors thank Pellenc ST for supplying the real waste materials.