Abstract

Imaging spectroscopy is a continually evolving remote sensing technology that has given rise to hyperspectral imaging (HSI), which records information about objects on the Earth’s surface in hundreds of bands. HSI integrates conventional imaging with spectroscopy to capture rich spectral and spatial features of an object. However, HSI suffers from high dimensionality and data redundancy, which demand large storage space, complex computation, and long processing times. Therefore, this study aims to find the optimal bands for characterizing roof surfaces using supervised classifiers. To deal with the high dimensionality of hyperspectral data, the study assesses a band selection method against data transformation methods and compares both the data reduction methods and the classifiers used. Height information from LiDAR was used to isolate urban roofs more than 2.5 meters above the ground. The optimal bands were investigated by comparing the accuracies of three supervised classifiers: artificial neural network (ANN), support vector machine (SVM), and spectral angle mapper (SAM). The classification results show that the ANN and SVM classifiers performed best, whereas SAM performed poorly in roof characterization. The band selection method worked more efficiently than the transformation methods. The classification procedure successfully identified the optimum bands with high accuracy.

1. Introduction

Urban environments are typically complex and heterogeneous, which makes their ecological conditions challenging to map [1–3]. The distribution of varied urban surface materials affects the surrounding environment [4, 5]. This calls for detailed accounting of surface materials and their properties, such as magnitude, abundance, location, geometry, and spatial pattern [6]. Additionally, the high diversity of urban environments has increased the importance of monitoring and updating information on a changing urban world [7]. Urban surface materials vary widely in color, coating, degradation, and illumination [8], and the highest variability is found in roofing materials [9]. Mapping roof materials is challenging because of the high variability of roof surfaces (shape, slope, and texture), complex structures on roofs (solar, heating, and air-conditioning installations), and the wide range of roofing materials available on the market (metal, plastic, ceramic, etc.) [10, 11].

In remote sensing, imaging spectroscopy (IS) is also referred to as hyperspectral imaging (HSI), which integrates conventional imaging with high spectral and spatial information about an object. Hyperspectral remote sensing is more economical than conventional surveying, since it allows vast land surfaces to be studied in less time and at lower cost [2]. Initially, IS in remote sensing was limited to multispectral imaging. Complex urban areas require high spatial and spectral detail [12], and urban mapping progressed slowly because of the lower spatial and spectral resolution of multispectral imagery [9]. The advent of airborne HSI sensors has opened new opportunities in urban remote sensing, owing to the combination of high spectral and high spatial resolution in HSI images [3, 9]. An HSI sensor stores the information of a pixel in hundreds of spectral bands, which enables accurate identification of objects or classes with similar spectral characteristics [13]. HSI collects continuous narrowband reflectance across the electromagnetic spectrum, specifically 400–2500 nm, which provides higher separability for characterizing complex urban objects [9, 14]. Increasing spectral and spatial resolution allows features to be segregated, and researchers have successfully used various spectral indices and spectral unmixing methods to isolate features [15, 16]. Overall, HSI has been advantageous over the last decade for the superior level of spectral detail it provides compared with multispectral imaging [10, 17].

Although HSI provides higher feature separability, it places greater demands on storage and processing because of its larger number of bands [15]. Moreover, adjacent bands are inherently redundant, which raises the computational burden and complexity [18]. Increasing the number of bands does not proportionally increase the useful information; therefore, HSI encounters the Hughes phenomenon. That is, for a given training sample size, accuracy increases up to a certain number of bands and then decreases as more bands are added [19, 20]. Hence, the dimensionality of HSI increases both the computational burden and the need for efficient data reduction methods. Most remote sensing applications do not use all the available spectral bands; rather, customized applications seek to retain only the information that is relevant and vital for the classification [21, 22]. Therefore, dimensionality reduction methods such as feature selection and band selection have become essential and popular.

A significant increase in the number of features in an urban area increases the complexity of classification and therefore demands platforms with diverse sensor technologies [23, 24]. Moreover, no single technology is sufficient for reliable image interpretation [25]. Data fusion integrates meaningful information by combining images from two or more sensor platforms that collect diverse information about an object [26]. Fusing two or more datasets collected from different platforms, such as hyperspectral and LiDAR data, increases the separability of features [27]. LiDAR data are popular for delineating the surface geometry of objects and can be used in conjunction with another sensor platform [6]. Past studies suggest that LiDAR features improve accuracy when combined with hyperspectral data [26–29]. Alonzo et al. [25] reported an increase of nearly 4.2%, and Onojeghuo and Blackburn [30] reported an 11% increase in accuracy using a LiDAR-generated mask. In urban studies specifically, height information from LiDAR has been found complementary to hyperspectral data for better discrimination of objects [1, 12, 27].

Standard approaches to urban classification rely on field surveys and visual interpretation of aerial photography to create land-use maps and urban inventories [10]; however, they are time-consuming. Airborne remote sensing offers a practical, affordable, and rapid alternative for mapping urban areas [9]. Numerous studies estimate and map impervious urban areas using remote sensing methods; Weng [6] presents a comprehensive review of remote sensing methods for classifying impervious urban surfaces. Studies that characterize urban surfaces using the spectral information of objects are popular (to mention a few, [9, 10, 31–33]). Initially, high-resolution multispectral imaging was used to discriminate urban surfaces through object-based image classification [34, 35]; further development of hyperspectral sensors has since raised the discriminating power of imagery [31]. HSI suffers from autocorrelation and data redundancy in adjacent wavebands [36, 37], which creates the need for an optimal selection of bands and a powerful classifier. Several studies deal with per-pixel, subpixel, and object-based algorithms [6, 38], but fewer have addressed the configuration of a spectral subset (optimal bands) for urban roof classification.

Identifying a spectral subset is an effective strategy [17] that optimizes roof discrimination without losing the spectral content of the object. Therefore, this study aims to assess the optimum number of bands for classifying urban roof surfaces using a set of advanced supervised classifiers. The study proposes a band selection method that uses the magnitudes of band loadings derived from principal component analysis. Dimensionality reduction methods and supervised classifiers were used to find the optimal bands for successful characterization of roof surfaces, and the study compares the efficiency of the data reduction methods and of the supervised classifiers (SAM, ANN, and SVM).

2. Material and Methods

2.1. Study Area

Bialystok is the largest city in northeastern Poland (53°08′N, 23°10′E) (Figure 1). The city contains a variety of urban roof surfaces in its built-up, old town, and suburban areas. Bialystok consists of parks, palace, church, buildings, numerous types of roads, pavements, river, lakes, canals, grasslands, concrete, and historical old town. The climate is continental (Central Europe), and the mean annual precipitation is around 580 mm/year. The flight line (study area) runs diagonally through the city and covers a length of 8.2 km and a width of 0.5 km; because of uneven urban structures and the sun’s illumination angle, the image contains shadows (Figure 1).

2.2. Field Data Collection and Polygons

Field data were collected on 20 July 2015, close to the time of the remote sensing data acquisition. We recorded roof surfaces on hard-copy Google Earth images of the study area; only clearly visible roof surfaces were recorded. Roofs consisted predominantly of metal and ceramic materials coated in different colors. Asbestos, copper, and white roofs (plastic roofs of petrol filling stations, large restaurant umbrellas, and plastic-covered buildings under construction) were rare in the region. From the fieldwork, we identified 15 classes encompassing all roofing types in the area: copper, asbestos, metal slate (uncolored metal sheets), new red tiles (ceramic), old reddish tiles (ceramic), black tiles (ceramic), brown metal (different brownish metal shades), reddish metal (a large variety of dark reddish materials), yellow (color) metal roof, green (color concrete) roof, blue (color) metal roof, concrete (gray color) roof, white/plastic roof, bitumen, and bitumen-asphalt mixes.

For rare roof surfaces such as copper, it was difficult to obtain a sufficient number of training and test pixels. In contrast, red ceramic tile was widely used for roofing but shows a wide range of spectral reflectance owing to ageing and rusting, which made it difficult to represent the red ceramic class with a limited number of training pixels. Similarly, approximately 20 shades of red metal roofs were found, caused by color coating, ageing, and rusting. Bitumen and asphalt roofing also showed numerous shades because of the wide-ranging proportions of the material mixtures.

Field data were divided into training and testing polygons by random sampling, which was the most feasible and practical sampling method because the classes were not uniformly distributed across the study area. During data collection, polygon size depended on roof size, because each identified roof was marked as one polygon. However, during the actual selection of test and verification data, only a few pixels were selected from a single roof rather than the entire roof. To capture all spectral variation within a class, various shades of each roof class were selected, and samples were taken from all over the study area so that each class was well represented. In total, nearly 195 roof locations were noted in the field. After the field data had been marked on the remote sensing image, the spectral signatures of each material were compared with the available spectral library to avoid misclassification.

The standard spectral library included in the ENVI software was used for this comparison to identify the correct class. Similarly, nonvisible roofs (roofs of tall buildings) were identified by comparing their spectral signatures with the ENVI reference spectral library. From the collected field measurements, pure pixels were selected for the training and test polygons. The polygons were drawn over homogeneous areas and distributed across the whole scene (hyperspectral image) for better generalization of the algorithm. On average, 634 training pixels were used to build each training class. To build more suitable classes, we associated the real-world roofing with the spectral properties of the hyperspectral image.

2.3. Remote Sensing Data and Preprocessing

The HySpex image was acquired by MGGP Aero on 15 July 2015 using a HySpex scanner. HySpex spectrometers collect a total of 451 spectral bands in two wavelength ranges: VNIR (400–1000 nm) and SWIR (930–2500 nm). The HySpex instrument is a pushbroom scanner that records the VNIR and SWIR regions at spatial resolutions of 0.5 meters and 1 meter, respectively. MGGP Aero delivered the HySpex image after initial preprocessing, which included basic orthorectification, atmospheric correction, and resampling to uniform dimensions. Specifically, to obtain uniform spatial and spectral dimensions across the whole spectrum, the VNIR image was resampled to produce a HySpex cube with a uniform spatial resolution of 1 meter. Atmospheric correction was performed in ATCOR4 and compared with standard ground reflectance measurements. ATCOR4 also performs orthorectification, using a digital surface model to eliminate image distortions. This study used LiDAR (airborne laser scanning, ALS) data obtained from the National Geodetic and Cartographic resources, at a density of 6 pulses/m². The ALS data were processed to generate a digital surface model (DSM) with a cell size of 0.5 meters. The DSM was then resampled to a 1-meter resolution to enable fusion with the HySpex data, since fusion requires the same spatial dimensions for an effective overlap.
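The resampling step can be illustrated with a minimal sketch. The source does not state which resampling method was used, so block averaging is an assumption here, and the function name `block_mean_resample` and the toy array are hypothetical.

```python
import numpy as np

def block_mean_resample(dsm, factor=2):
    """Aggregate a 2-D raster by averaging non-overlapping factor x factor blocks."""
    rows, cols = dsm.shape
    # Trim edges so both dimensions are exact multiples of the factor.
    rows -= rows % factor
    cols -= cols % factor
    trimmed = dsm[:rows, :cols]
    # Split each axis into (block index, position inside block) and average.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Toy 4 x 4 raster standing in for a 0.5 m DSM; the result mimics a 1 m grid.
dsm_05m = np.arange(16, dtype=float).reshape(4, 4)
dsm_1m = block_mean_resample(dsm_05m, factor=2)
```

Averaging is only one option; nearest-neighbor or maximum aggregation would preserve roof edges differently, so the choice would need to match the actual preprocessing.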

2.4. Methods

The optimal bands for roof classification were identified using dimensionality reduction methods together with classifier performance. This study proposes a specialized band selection method for dimensionality reduction and compares it with conventional data reduction methods. Supervised spectral (per-pixel) classifiers were used. The main processing comprises dimensionality reduction, data fusion, and classification.

2.4.1. Dimensionality Reduction

The dimensionality reduction methods available in remote sensing fall into two groups: band selection, also called band prioritization or band subsetting, and feature extraction, also referred to as transformation [39]. Band selection has an advantage over feature extraction in that it retains the original physical properties of an object, which carry abundant information about it. Specifically, band selection retains target-oriented information in carefully selected bands. In contrast, feature extraction methods are essentially data compression methods that convert the raw image data into a new, transformed image whose bands express variance information in terms of eigenvector magnitudes [40].

Initial data reduction was performed by visual interpretation of the HySpex image: noisy and grainy bands were eliminated from processing. Specifically, the last five bands (447–451) suffered seriously from sensor noise and were therefore excluded. Furthermore, the interpolated water-vapor bands were removed, because their values had been estimated from sensor-acquired bands rather than measured. Eliminating these 143 water-vapor bands left a spectral cube of 303 bands (451 − 5 − 143 = 303). Further dimensionality reduction was performed using conventional compression methods, namely principal component analysis (PCA) and minimum noise fraction (MNF), and the proposed band selection method using PCA loadings (BS-PCALo).

(1) Principal Component Analysis (PCA). PCA is a well-established image compression method [41] that preserves the total variance of the image during transformation and minimizes the mean square approximation error [29, 42]. It uses second-order statistics: the data are projected onto an orthogonal space defined by the eigenvectors of the covariance matrix, and the largest eigenvalues correspond to the directions of maximum variance of the uncorrelated, linearly transformed data [19].
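As an illustration of the covariance-based projection described above, the following is a minimal NumPy sketch. It is not the ENVI implementation used in the study; the function name `pca_transform` and the synthetic six-band data are assumptions made for the example.

```python
import numpy as np

def pca_transform(pixels, n_components):
    """pixels: (n_pixels, n_bands). Returns scores, eigenvalues, eigenvectors."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # band-by-band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric eigen-decomposition
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs[:, :n_components] # project onto leading axes
    return scores, eigvals, eigvecs

# Synthetic stand-in for image pixels: six bands, two of them strongly correlated.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(200, 6))
pixels[:, 1] = 0.9 * pixels[:, 0] + 0.1 * pixels[:, 1]
scores, eigvals, eigvecs = pca_transform(pixels, n_components=3)
explained = eigvals / eigvals.sum()               # share of variance per component
```

The `explained` vector is the kind of quantity on which a cut-off such as "the components holding about 90% of the information" can be based.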

(2) Minimum Noise Fraction (MNF). The MNF transform is another well-known feature extraction and denoising method [43], also called the noise-adjusted principal component transform [44]. MNF is essentially an advanced form of PCA that handles noise in the variance matrix better [43]. Like PCA, MNF transforms the original data into a new dimensional space whose eigenvectors are mutually orthogonal. MNF has been used in many hyperspectral and multispectral studies to reduce data dimensions [17, 43, 45–47]. ENVI 5.2 was used to perform the PCA and MNF transformations.
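The idea of a noise-adjusted PCA can be sketched as two steps: whiten the estimated noise, then apply ordinary PCA. The sketch below is a generic illustration and is not claimed to match the ENVI 5.2 implementation; estimating noise from differences between horizontally adjacent pixels is an assumption, and `mnf_transform` and the synthetic cube are hypothetical.

```python
import numpy as np

def mnf_transform(cube, n_components):
    """cube: (rows, cols, bands). Returns component images and sorted eigenvalues."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    pixels = pixels - pixels.mean(axis=0)
    # Assumed noise estimate: differences between horizontally adjacent pixels.
    noise = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(noise, rowvar=False) / 2.0
    data_cov = np.cov(pixels, rowvar=False)
    # Step 1: build a transform that turns the noise covariance into the identity.
    n_vals, n_vecs = np.linalg.eigh(noise_cov)
    whitener = n_vecs / np.sqrt(n_vals)
    # Step 2: ordinary PCA on the noise-whitened data covariance.
    w_vals, w_vecs = np.linalg.eigh(whitener.T @ data_cov @ whitener)
    order = np.argsort(w_vals)[::-1]              # highest signal-to-noise first
    transform = whitener @ w_vecs[:, order]
    components = (pixels @ transform[:, :n_components]).reshape(rows, cols, n_components)
    return components, w_vals[order]

# Synthetic cube: a smooth signal plus white noise in five bands.
rng = np.random.default_rng(1)
signal = np.linspace(0, 1, 20)[:, None, None] * np.arange(1, 6)[None, None, :]
cube = signal + 0.05 * rng.normal(size=(20, 20, 5))
components, snr_eigvals = mnf_transform(cube, n_components=3)
```

Components are ordered by decreasing signal-to-noise ratio rather than by raw variance, which is the practical difference from plain PCA.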

(3) Band Selection Using PCA Loadings (BS-PCALo). The proposed band selection was executed in R (version 3.2.3) using a set of scripts based on PCA libraries [48] and varimax rotation. The primary outputs of the PCA scripts are the eigenvalues and eigenvectors of each component. The process yielded three principal components with high eigenvalues; each eigenvalue expresses the percentage of the original data (bands) captured by that component. Band loadings are the coefficients relating each variable (band) to a component; therefore, the loadings can be used to find the best bands [40, 49]. To identify the best bands, we decomposed the principal components into their band loading values. The loading magnitudes represent the contribution of each band to the component, and the algorithm selects only the bands with high loading magnitudes [50]. In this way, the relevant bands with high loading values were identified by decomposing the component information.

Conventional PCA and MNF take all pixels of the HySpex bands as input, whereas this band selection method uses only the digital numbers (DN) of the HySpex bands at the locations of the training polygons. We plotted the training polygons on the HySpex image (303 bands) in ENVI 5.1 and exported them in .csv format; the file contains only the DN of each pixel, with no information about location or class. Because the training polygons contain only target information, the algorithm finds only the relevant spectral bands, and only the relevant variance within the polygons is recorded in each component. We also processed the 303 bands in ENVI to obtain PCA- and MNF-transformed bands, in order to compare the efficiency of the transformed bands with that of the selectively chosen spectral bands, that is, the BS-PCALo bands.
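A loading-based band selection of this kind can be sketched as follows. This is an illustrative Python approximation, not the authors' R scripts: the varimax rotation is omitted, the loading threshold of 0.7 is a hypothetical value (the source does not report one), and `select_bands_by_loading` and the synthetic training matrix are assumptions.

```python
import numpy as np

def select_bands_by_loading(train_pixels, n_components=3, threshold=0.7):
    """train_pixels: (n_pixels, n_bands) values sampled inside training polygons."""
    corr = np.corrcoef(train_pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Loading = correlation between a band and a component.
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    # Keep, per component, the bands whose loading magnitude is high.
    per_component = {
        k: np.flatnonzero(np.abs(loadings[:, k]) >= threshold)
        for k in range(n_components)
    }
    # A band may load highly on several components; count it once.
    distinct = sorted(set(np.concatenate(list(per_component.values())).tolist()))
    return per_component, distinct, loadings

# Synthetic data: bands 0-3 follow one factor, bands 4-6 another, band 7 is noise.
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 1000))
demo = np.column_stack(
    [f1 + 0.1 * rng.normal(size=1000) for _ in range(4)]
    + [f2 + 0.1 * rng.normal(size=1000) for _ in range(3)]
    + [rng.normal(size=1000)]
)
per_component, distinct, loadings = select_bands_by_loading(demo, n_components=2)
```

Taking the union of per-component selections mirrors the step in which bands shared between components are counted only once.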

2.4.2. Feature Enhancement: Fusion

HySpex and LiDAR data can serve as the fundamental building blocks for quantitative and morphological characterization of urban surfaces using the fusion method [6]. Fusing sensors from different platforms that use different physical principles to record diverse properties of an object is advantageous [15]. The high complexity of roof surfaces requires the integration of multiplatform sensors for successful feature classification [1]. LiDAR data have gained popularity for delineating the surface geometry of objects [51] and can be used in conjunction with another sensor platform through fusion, in which they complement the radiometric properties of hyperspectral data [15]. Additionally, LiDAR benefits from low processing requirements and high spatial resolution [6, 7]. LiDAR data have been shown to improve accuracy by up to 25% when surfaces are differentiated by height [4].

The study area contains asphalt roads and asphalt roofs with similar spectral reflectance; therefore, roof and ground surfaces must be distinguished. Since we are interested in roof classification, we masked ground features using the LiDAR data. The HySpex and LiDAR data were coregistered by resampling them to the same spatial dimensions and registering them in the same geographical coordinate system. The coregistration was checked at building edges, and the two datasets showed excellent overlap along roofs and edges. The LiDAR data were used to build a roof mask at a height of 2.5 meters above the ground. Vegetation was masked using HySpex NDVI values (Figure 1(b)). Shadows have a negative influence on classification, because the varying angle between the sun’s illumination and the roof surfaces creates heterogeneous shadow patterns that make classification erroneous [52]. Therefore, the infrared band at a wavelength of 1240 nm was used to mask shadows and water, since both appear as the darkest pixels in that band.
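The three masks can be combined as in the sketch below. The 2.5 m height limit comes from the text; the NDVI and dark-pixel cut-offs are hypothetical values (the source does not report them), the use of a height-above-ground raster is an assumption, and `roof_candidate_mask` and the toy arrays are illustrative only.

```python
import numpy as np

def roof_candidate_mask(height, red, nir, swir_1240,
                        min_height=2.5, ndvi_max=0.3, dark_min=0.05):
    """Boolean mask of pixels kept for roof classification."""
    elevated = height > min_height                     # height above ground, metres
    ndvi = (nir - red) / np.maximum(nir + red, 1e-9)   # guard against divide-by-zero
    not_vegetation = ndvi < ndvi_max                   # hypothetical NDVI cut-off
    not_dark = swir_1240 > dark_min                    # hypothetical shadow/water cut-off
    return elevated & not_vegetation & not_dark

# Toy 2 x 2 scene: ground, roof, tree canopy, shadowed roof.
height = np.array([[0.2, 6.0], [8.0, 5.0]])
red = np.array([[0.10, 0.20], [0.05, 0.02]])
nir = np.array([[0.30, 0.25], [0.50, 0.03]])
swir_1240 = np.array([[0.20, 0.30], [0.25, 0.01]])
mask = roof_candidate_mask(height, red, nir, swir_1240)
```

In this toy scene only the unshadowed, non-vegetated, elevated pixel survives all three tests.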

2.4.3. Roof Surface Classification

Roof surface classification was performed with the supervised SAM, SVM, and ANN classifiers using a fixed training set. The search for optimal bands was not limited to band selection: further band optimization was based on each classifier’s performance for a given set of input bands at a desired level of accuracy, and in each subsequent iteration a smaller number of bands was tested using the accuracy matrix. Spectral (per-pixel) supervised classification algorithms were used because of their proven ability to differentiate material properties in complex areas [7]. The established spectral libraries of roof or urban surfaces available in ENVI were not used as training data, since spectral signatures extracted from the HySpex image differed from signatures measured in laboratory settings. Therefore, to increase the precision of classification, field-acquired polygons were used for roof surface characterization.

(1) Spectral Angle Mapper (SAM). SAM is a nonparametric supervised classification method widely used in spectral studies where material identification is the primary goal. The SAM algorithm is based on the concept that the spectra of similar objects are linearly scaled versions of one another [53, 54]. Linearly scaled spectra fall within the same spectral angle, whereas a spectrum that falls beyond the defined spectral angle is assigned to another class or remains unclassified [53]. Therefore, prior information about the data structure is advantageous for endmember selection in SAM classification [54]. However, given the complexity of roofing materials and the volume of spectral information held in hundreds of hyperspectral bands, accurate endmember selection requires profound knowledge of the spectral characteristics [32]. In this study, endmembers were obtained by averaging the spectra of the training polygons. Spectral angle thresholds from 0.10 to 0.30 radians were tested to find the angle that minimizes the number of unclassified pixels and improves accuracy.
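The spectral angle rule can be written down compactly. The following sketch is a generic illustration rather than the ENVI routine; the function names, the label `-1` for unclassified pixels, and the three-band toy spectra are assumptions.

```python
import numpy as np

def spectral_angles(pixels, endmembers):
    """Angle in radians between each pixel and each endmember spectrum."""
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    e = endmembers / np.linalg.norm(endmembers, axis=1, keepdims=True)
    cosine = np.clip(p @ e.T, -1.0, 1.0)     # clip guards against rounding error
    return np.arccos(cosine)

def sam_classify(pixels, endmembers, max_angle=0.10):
    """Assign each pixel to the closest endmember, or -1 if beyond max_angle."""
    angles = spectral_angles(pixels, endmembers)
    labels = angles.argmin(axis=1)
    labels[angles.min(axis=1) > max_angle] = -1
    return labels

endmembers = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
pixels = np.array([
    [2.0, 4.0, 6.0],   # brighter copy of endmember 0: same direction
    [3.1, 2.0, 0.9],   # close to endmember 1
    [1.0, 0.0, 0.0],   # far from both endmembers
])
labels = sam_classify(pixels, endmembers, max_angle=0.10)
```

Because only the direction of the spectrum matters, the first pixel matches endmember 0 despite being twice as bright, which is the illumination insensitivity SAM is known for; widening `max_angle` toward 0.30 reduces the number of unclassified pixels at the cost of looser matches.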

(2) Support Vector Machine. SVM is also a nonparametric supervised classification method. It separates classes using decision surfaces called hyperplanes [3]. SVM is a machine learning method that finds an optimal decision hyperplane between classes and solves optimization problems in complex data structures [55]. Most conventional classifiers work better with prior knowledge of the data distribution; the maximum likelihood method, for example, assumes that the data are normally distributed. However, high-dimensional hyperspectral images often have an unknown data distribution [56]. SVM is popular for complex datasets and has proven effective with minimal training samples without compromising accuracy [55, 57, 58]. Moreover, in their holistic review, Mountrakis et al. [56] suggest that the SVM classifier is effective for highly variable surfaces and compound spectra of surface materials. In ENVI 5.2, SVM was run with the well-known radial basis function (RBF) kernel, which has worked well in a variety of applications reviewed by Mountrakis et al. [56]. We set the penalty parameter (the cost assigned to misclassified training pixels) to 100. A high penalty enforces rigid margins, which can lead to local overfitting of the training data; overfitting in turn reduces the generalization of the algorithm and ultimately decreases accuracy [59]. The penalty parameter and the RBF kernel were chosen on the basis of the reviewed methods and expert experience, using trial and error on sample data.
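For reference, the role of the penalty parameter can be read from the standard soft-margin formulation, shown here for a single two-class problem. The symbols are introduced only for this illustration: $w$ and $b$ define the hyperplane, $\xi_i$ are slack variables for margin violations, $C$ is the penalty parameter (100 in this study), and $\gamma$ is the RBF kernel width, whose value is not reported in the text.

```latex
\begin{align}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i)+b\bigr) \ge 1-\xi_i,\qquad \xi_i \ge 0,\quad i=1,\dots,n, \\
K(x_i,x_j) &= \phi(x_i)^{\top}\phi(x_j) = \exp\!\bigl(-\gamma\,\lVert x_i-x_j\rVert^{2}\bigr),\qquad \gamma>0 .
\end{align}
```

A larger $C$ makes each margin violation more expensive, so the decision surface follows the training pixels more rigidly; a smaller $C$ tolerates more violations in exchange for a wider margin.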

(3) Artificial Neural Network (ANN). ANN is also a supervised, nonparametric machine learning method [6, 60], used in this study to classify roof surfaces. In its basic form, a neural network was designed for pattern recognition and works by imitating the function of the human brain, so that it can learn and adapt through trial and error to perform complex processes [38]. ANN is an iterative, adaptive method that uses calibration information from the previous iteration to generate more accurate classes [61]. Additionally, ANN classifies quickly and can incorporate different types of data [60]. ANN has been used in many extensive urban studies (to mention a few, [55, 61–65]).

ANN was run in ENVI 5.2 using a standard backpropagation approach to supervised learning. To avoid overfitting and to achieve good generalization to new data, we set the following parameters. The first is the training threshold contribution, which ranges from 0 to 1 and was set to 0.8. It determines the initial internal weights: a value of zero means no change in the internal weights, and the highest value of one can force convergence in the values of the statistical model. The second is the training rate, which ranges from 0 to 1 and was set to 0.2. A higher training rate speeds up training by changing the weights in larger increments; however, it can also prevent the training from converging. The third is the training momentum, which ranges from 0 to 1 and was set to 0.8; a higher momentum takes a larger step (bounce) in the weight adjustment, which can lead to higher errors. The last is the training root mean square (RMS) exit criterion, which ranges from 0 to 1 and was set to 0.1; training stops when the RMS error reaches 0.1, to avoid overtraining. The number of training iterations was set to 999, with one hidden layer. These settings were determined by trial and error using different combinations. Because we used standard backpropagation, the error, estimated from the target output versus the actual output, is propagated backward to adjust the internal weights in the nodes of the hidden layer. This learning and adaptation allowed the error to be reduced gradually through the backfeeding of error.
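The interplay of training rate, momentum, exit criterion, and iteration limit can be illustrated on a one-parameter toy problem. The update rule below is the textbook gradient-descent-with-momentum form and is an assumption, since the internal update used by ENVI is not described in the text; `train_with_momentum` and the quadratic error function are hypothetical.

```python
def train_with_momentum(grad, w0, rate=0.2, momentum=0.8,
                        rms_exit=0.1, max_iter=999, error=abs):
    """Gradient descent with momentum that stops once the error is small enough."""
    w, step = w0, 0.0
    for iteration in range(1, max_iter + 1):
        # Blend the previous step (momentum) with the new gradient (rate).
        step = momentum * step - rate * grad(w)
        w += step
        if error(w) <= rms_exit:      # exit criterion reached, stop training
            break
    return w, iteration

# Toy problem: minimise w**2, whose gradient is 2*w and whose error is |w|.
w_final, n_iter = train_with_momentum(lambda w: 2.0 * w, w0=5.0)
```

With a momentum of 0.8 the weight overshoots the minimum and oscillates before settling, which is the "bounce" behavior described above; the exit criterion ends training well before the 999-iteration cap.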

(4) Validation. After classification, an error matrix was calculated to validate the results against ground truth test polygons. The contingency matrix was computed in ENVI 5.2, and several accuracy measures were assessed: the overall accuracy (OA), the kappa coefficient (κ), the user accuracy, and the producer accuracy. OA gives a general idea of classification quality as the percentage of correctly classified pixels out of all ground truth pixels. Producer accuracy (PA) is the ratio of correctly classified pixels to the total ground truth pixels of a given class, and user accuracy (UA) is the ratio of correctly classified pixels to the total pixels assigned to that class by the classifier (ENVI documentation). The kappa coefficient indicates the level of agreement between classification and ground truth relative to a random classification; its value ranges from 0 to 1, where a value near one indicates near-perfect agreement and a value near zero indicates that the classification results are random or due to chance. We used a total of 8169 ground truth pixels, that is, an average of 510 pixels per thematic class, to verify the results.
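The accuracy measures can be computed directly from an error matrix, as in the sketch below. The layout (rows are classified labels, columns are ground truth) and the three-class toy matrix are assumptions for illustration; they are not values from this study.

```python
def accuracy_report(matrix):
    """matrix[i][j] = number of pixels of ground-truth class j assigned to class i."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[k][k] for k in range(n)]                           # correct pixels
    row_totals = [sum(matrix[k]) for k in range(n)]                       # classified as k
    col_totals = [sum(matrix[i][k] for i in range(n)) for k in range(n)]  # truth is k
    overall = sum(diagonal) / total
    producer = [diagonal[k] / col_totals[k] for k in range(n)]
    user = [diagonal[k] / row_totals[k] for k in range(n)]
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    return overall, producer, user, kappa

toy = [[45, 5, 0],
       [5, 40, 10],
       [0, 5, 40]]
overall, producer, user, kappa = accuracy_report(toy)
```

Kappa is lower than OA here because it discounts the share of agreement that a random assignment with the same class totals would achieve.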

3. Results

The supervised classifiers SAM, SVM, and ANN were used to classify the roof surfaces. For dimensionality reduction, we used the proposed band selection method, BS-PCALo, and the conventional transformation methods, MNF and PCA. To find the optimal HySpex wavebands for roof classification, we used a purpose-designed iterative procedure that tested multiple combinations of bands; in each iteration, fewer input bands were used than in the previous one. This study used both supervised and unsupervised methods for dimensionality reduction: PCA and MNF are conventional, unsupervised methods, whereas BS-PCALo is a supervised method designed to obtain a set of the best bands. The bands used in each subsequent iteration were chosen on the basis of a careful analysis of the accuracy assessment.

The BS-PCALo method produced three principal components (BS-PCA1, 2, and 3), which together contain 113 spectral bands. However, some bands appear in more than one component, so 76 distinct bands were identified. The first principal component has the largest loading magnitudes and an information content of 70.74%, while the other two components hold 23.63% and 3.08%, respectively. The first two components thus contain 94% of the information, stored in 48 unique bands (Figure 2), whereas only 3% of the information is stored in the remaining 21 bands, which can be excluded from further classification.

Of the 303 bands, the visible spectrum (400–700 nm) holds 86 bands, the NIR spectrum (700–1500 nm) holds 76, and the SWIR spectrum (1500–2500 nm) holds 141. Figure 2 shows that PCA1 and 2 encompass 48 bands: 18, 7, and 23 from the visible, NIR, and SWIR regions, respectively. The MNF and PCA transformations produced compressed bands. We selected 14 bands each from MNF and PCA on the basis of information content: approximately 90% of the information was stored in 14 transformed bands.
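The reported figures can be cross-checked with a few lines of arithmetic. The numbers below are copied from the text; the variable names are illustrative only.

```python
from itertools import accumulate

# Information content (percent) of BS-PCA1, 2, and 3 as reported.
component_share = [70.74, 23.63, 3.08]
cumulative = [round(v, 2) for v in accumulate(component_share)]

# Band counts per spectral region: all retained bands, and those in the first two components.
bands_by_region = {"visible": 86, "NIR": 76, "SWIR": 141}
selected_by_region = {"visible": 18, "NIR": 7, "SWIR": 23}

total_bands = sum(bands_by_region.values())
total_selected = sum(selected_by_region.values())
# Fraction of each region's bands that was selected.
selected_fraction = {
    region: selected_by_region[region] / bands_by_region[region]
    for region in bands_by_region
}
```

The running total shows how the first two components account for roughly 94% of the information, and the per-region fractions show that the NIR region contributes the smallest share of its bands to the selection.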

3.1. Iteration Sequence

(1) Initially, all 303 bands, which exclude the interpolated bands, were selected for classification.
(2) The three components produced by the BS-PCALo algorithm (BS-PCA1, 2, and 3), which encompass 76 bands, were used.
(3) The first two principal components from BS-PCALo (BS-PCA1 and 2), which contain 48 bands, were selected as input, because the third component holds very little information.
(4) From the BS-PCALo method, 37 bands were selected using two rules. First, bands common to the three principal components of the BS-PCALo loadings were retained, on the assumption that bands repeated across components (bands with high loading values in every component) carry more information. Second, of two adjacent bands, only one was selected, because adjacent bands show similar properties.
(5) From the 48 bands (BS-PCA1 and 2), we selected 20 bands at equal intervals.
(6) To explore the potential of each spectral region, we used 5 bands each from the visible, NIR, and SWIR regions, that is, 15 bands. These 15 bands were selected from the 48 BS-PCALo bands.
(7) In further iterations, we reduced the number of input bands to 10, 6, and 4 spectral bands taken at equal intervals from the 48 bands (BS-PCALo-1 and BS-PCALo-2).
(8) We selected 10 bands at equal intervals from the 303 bands.
(9) To compare with conventional data transformation methods, we used 14 MNF-transformed bands.
(10) We used 14 PCA-transformed bands.
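Equal-interval subsetting, as used in several of the iterations above, can be sketched as follows. The exact rule the authors applied is not given, so evenly spaced positions with rounding are an assumption; `equal_interval_subset` is hypothetical, and the indices 0 to 47 are placeholders for the 48 selected bands, which are not listed in the text.

```python
def equal_interval_subset(bands, count):
    """Pick `count` items spread evenly across an ordered list of bands."""
    if count >= len(bands):
        return list(bands)
    if count == 1:
        return [bands[0]]
    # Evenly spaced positions from the first to the last band.
    step = (len(bands) - 1) / (count - 1)
    return [bands[round(i * step)] for i in range(count)]

# Placeholder indices standing in for the 48 BS-PCALo bands.
candidate_bands = list(range(48))
subsets = {n: equal_interval_subset(candidate_bands, n) for n in (20, 10, 6, 4)}
```

Each smaller subset still spans the full range from the first to the last candidate band, so the visible, NIR, and SWIR regions all remain represented as the band count shrinks.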

3.2. Classification Results and Accuracy

The classification results with the highest OA achieved by each classifier across the different sets of input bands were identified and plotted in Figure 3. The red circle marks a brown metal roof that SVM and ANN classified correctly, whereas SAM, using all 303 spectral bands, misclassified it as asbestos.

The classification results obtained with the (a) SAM, (b) ANN, and (c) SVM classifiers for 48 bands are plotted in Figure 4. The red square in Figure 4(b) indicates that ANN correctly classified a roof as metal slate, whereas SAM and SVM misclassified it as copper (Figures 3 and 4).

A confusion matrix was computed to gain a comprehensive understanding of the classification results; it allows algorithm performance to be evaluated and compared using various accuracy parameters (Table 1). Google Earth images were also used to validate the results by visual analysis; in extreme cases where visual analysis was difficult, we telephoned the owner of the house to validate the result. The highest OA and kappa, 94.19% and 0.93, respectively, were obtained using the ANN classifier with 76 spectral bands (BS-PCA1, 2, and 3). A kappa value of 0.93 indicates that the agreement between the classification and the reference data is 93% better than would be expected by chance. Surprisingly, using all 303 spectral bands, the SVM classifier showed a good OA of 89.72%. SVM yielded a higher OA of 91.94% and a kappa of 0.91 when using PCA-transformed bands. Surprisingly, the ANN classifier was unable to classify the PCA-transformed bands, resulting in a poor OA of 35.25% and a kappa of 0.29. The thematic accuracies, PA and UA, were averaged across classes to give a better picture of classifier performance. The averaged PA and UA show the percentage of correctly classified pixels for the selected classifier (Table 1).
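The accuracy parameters reported in Table 1 follow from the confusion matrix by standard formulas. The sketch below computes OA, Cohen's kappa, and per-class PA and UA, assuming that rows hold the reference classes and columns the predicted classes; the function name and the 2-class matrix are illustrative only and are not taken from the study.

```python
import numpy as np

def accuracy_metrics(confusion):
    """Return OA, kappa, producer's and user's accuracy per class.

    Assumption: confusion[i, j] counts pixels of reference class i
    that were assigned to class j.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    reference_totals = cm.sum(axis=1)   # row sums
    predicted_totals = cm.sum(axis=0)   # column sums

    oa = correct.sum() / total
    # Agreement expected by chance, from the marginal totals.
    chance = (reference_totals * predicted_totals).sum() / total**2
    kappa = (oa - chance) / (1.0 - chance)
    pa = correct / reference_totals     # producer's accuracy
    ua = correct / predicted_totals     # user's accuracy
    return oa, kappa, pa, ua

# Illustrative 2-class matrix (not data from the study).
oa, kappa, pa, ua = accuracy_metrics([[45, 5], [10, 40]])
```

Averaging `pa` and `ua` over the classes gives the averaged thematic accuracies of the kind described above.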

In ANN classification, each iteration ends with an error value, the root mean square error (RMSE), which is backpropagated in the next iteration to adjust the internal weights. The final RMSE value indicates the distinction and quality of the classification; the RMSE values of the last iteration are listed in Table 1. SAM showed lower accuracies both in general and in comparison with the other classifiers. Moreover, SAM resulted in lower PA and UA; its highest PA was 65.83% using all 303 spectral bands. SAM reached an OA of up to 60.17% when all spectral features (303 bands) were used for the classification; furthermore, its accuracy did not fall drastically with decreasing numbers of bands. Even with 10 spectral bands, PA remained at 62.66% and OA at 57.70%, a fall of only about 3%. SVM achieved its highest accuracy of 91.94% using the 14 PCA-transformed bands, which was higher than with the 14 MNF-transformed bands.

The spectral efficacy of HySpex was tested using 5 bands each from the visible, NIR, and SWIR regions of the electromagnetic spectrum. The lowest accuracies were observed when the visible spectrum was excluded from the classification (Table 1). The SAM and SVM classifiers showed good OA using the visible spectrum together with the NIR spectrum, whereas ANN showed good accuracies for the visible and SWIR regions.

4. Discussion

In this study, we used a HySpex image of the city of Bialystok covering the wavelength range from 400 nm to 2500 nm. As stated, the study aims to find the optimal number of bands that delivers the highest accuracy in the classification of roof surfaces. For the classification of urban roof surfaces, we sought a robust classification approach that combines high discriminating capability with stability, time efficiency, and high classification accuracy. To accomplish this, we used the supervised classifiers SAM, ANN, and SVM, which were run with the same training polygons (pixels) and different numbers of input bands.

Prior to setting up the classification, we attempted to reduce the data dimension, noise, and undesirable complexity in the data. Remarkably, excluding the noisy visible bands and the interpolated water absorption bands reduced the feature data space by nearly 30%. Importantly, the fusion of HySpex and LiDAR added the extra dimension of height to the data, which helped to distinguish between ground and roof features. This morphological dimension (height) allowed unnecessary information to be eliminated from the data, which ultimately helps the classifier to focus on the available target information for object discrimination [15, 32, 51]. Water bodies, shadows, and vegetation, which might otherwise introduce additive noise into the data, were masked. This selective preservation of target information most likely helped the classification process, since using 303 spectral bands with water, shadows, and vegetation masked resulted in a substantial SVM OA of 89.72% (Table 1).
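The combination of a LiDAR height criterion with thematic masks can be sketched as a simple boolean operation on co-registered rasters. This is an illustration only, assuming a per-pixel height-above-ground raster, a single combined exclusion mask (water, shadow, vegetation), and the 2.5 m threshold stated in the abstract; the function names and array layout are hypothetical and do not describe the authors' processing chain.

```python
import numpy as np

def roof_candidate_mask(height_m, exclude_mask, min_height_m=2.5):
    """True where a pixel is higher than min_height_m and not excluded.

    height_m: 2-D array of heights above ground (assumed LiDAR-derived).
    exclude_mask: 2-D boolean array, True for water, shadow, or vegetation.
    """
    height_m = np.asarray(height_m, dtype=float)
    exclude_mask = np.asarray(exclude_mask, dtype=bool)
    return (height_m > min_height_m) & ~exclude_mask

def extract_pixels(cube, mask):
    """Return the spectra of masked-in pixels as (n_pixels, n_bands).

    cube: 3-D array with shape (rows, cols, bands).
    """
    return np.asarray(cube)[mask]

# Tiny synthetic scene: 2 x 2 pixels, 3 bands.
heights = [[0.5, 3.0], [10.0, 2.5]]
excluded = [[False, False], [True, False]]
cube = np.arange(12).reshape(2, 2, 3)
roof_spectra = extract_pixels(cube, roof_candidate_mask(heights, excluded))
```

Only the retained spectra would then be passed to the classifiers, which is how the height dimension narrows the data to roof candidates.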

4.1. Dimensionality Reduction

Although narrow bands offer a continuous spectrum, the information content of adjacent bands is inherently redundant [18]. The BS-PCALo method in R worked efficiently by integrating feature extraction and feature selection in two steps: first, it finds the data variance using the feature extraction technique, and then the band separability loadings reveal the band positions. In R, 76 highly efficient band positions were traced using only the training pixels rather than by feeding the whole image into the software. Using only training pixels to find the best bands eliminated billions of mathematical operations that would otherwise be required to process the whole image. In the BS-PCALo algorithm, each pixel was assessed independently, as we did not supply information on training class, location, or feature boundary. Nevertheless, it estimated the minimum and maximum variance between adjacent bands at a certain threshold in the given training pixels. Running the BS-PCA script in R is far more time efficient (approximately 1 minute) than running PCA and MNF in ENVI software (which took 3–4 hours with advanced computing processors). The BS-PCA components helped to locate the most relevant band positions and allowed us to retain the original bands with their rich spectral properties. Here, we should note that the 48 bands comprise 18 bands from the visible spectrum (400–700 nm), 7 bands from the near-infrared (NIR) region (700–1500 nm), and 23 bands from the short-wave infrared (SWIR) region (1500–2500 nm). Of the information in the original 303 bands, ~97% was preserved in the first three BS-PCA components.
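The general idea of locating original band positions from principal-component loadings can be sketched as follows. This is not the authors' BS-PCALo script (which was written in R and whose thresholds are not given here); it is a generic loading-based ranking, assuming a matrix of training pixels with one column per band. The function name, the number of bands kept per component, and the synthetic data are all hypothetical.

```python
import numpy as np

def top_loading_bands(pixels, n_components=3, bands_per_component=5):
    """Rank original bands by absolute PCA loading, per component.

    pixels: array of shape (n_training_pixels, n_bands).
    Returns {component index: sorted band indices} and the share of
    variance explained by each of the first n_components.
    """
    x = np.asarray(pixels, dtype=float)
    centred = x - x.mean(axis=0)
    # Rows of vt are the unit-length loading vectors of the components.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)

    selected = {}
    for c in range(n_components):
        order = np.argsort(np.abs(vt[c]))[::-1]  # highest |loading| first
        selected[c] = sorted(order[:bands_per_component].tolist())
    return selected, explained[:n_components]

# Synthetic training pixels: bands 3 and 7 carry most of the variance.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
pixels = 0.01 * rng.normal(size=(200, 20))
pixels[:, 3] += 5 * factor
pixels[:, 7] += 4 * factor
bands, variance_share = top_loading_bands(pixels, n_components=1, bands_per_component=2)
```

Because the selection returns indices of original bands rather than transformed components, the spectral properties of the retained bands stay intact, which is the property the paragraph above emphasizes.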

Conversely, conventional PCA and MNF processing yields transformed bands that carry data variance information rather than spectral properties [41]. The spectral properties of bands hold significant information, which makes original bands more conducive to the classification process than transformed bands [66]. Unexpectedly, ANN worked well with the MNF bands but not with the PCA bands. Possibly, the PCA bands suffer more from noise than the MNF bands, owing to the effective noise elimination in the MNF method. SAM showed better results with PCA-transformed bands than with MNF bands, which suggests that SAM tolerates the noise present in the PCA-transformed bands well, a finding similar to that of Panigrahi and Prashnani [67]. The sensitivity of ANN to noise would also explain its relatively lower accuracies using 303 bands.

Remarkably, SVM showed the highest accuracy of 91.94% using 14 PCA-transformed bands, but at the cost of PCA processing time. However, the 14 MNF bands gave low accuracies in the SVM classification, possibly because the noise-whitening effect of MNF reduced the data variance. On the other hand, using 10 spectral bands (BS-PCALo) in the SVM and SAM classifications resulted in higher accuracy than using 14 MNF bands; this demonstrates that the spectral properties of bands are advantageous in the classification process. Collectively, removing redundant data while preserving the corresponding variance information along with the spectral properties not only enriched but also substantially simplified and organized the information for classification of the target area.

4.2. Classifier Stability and Band Optimization

In this study, the SAM classifier showed the poorest accuracy among the classifiers, even though multiple adjustments of the spectral angle (in radians) were tested to obtain competitive results. An angle of 0.25 radians was found to fit the data best; it left fewer unclassified pixels and raised the accuracy. For instance, increasing the angle from 0.1 to 0.25 radians raised OA from 49.43% to 57.92% using 76 spectral bands, whereas the 0.1 radian angle left 30.75% of pixels unclassified. SAM reached a maximum OA of 60.17% using all 303 bands, evidently because of the rich and continuous spectral information they contain. With the best spectral angle of 0.25 radians, the SAM classification showed a variable percentage of unclassified pixels, ranging from 0.5% to 5%, and a salt-and-pepper noise effect. The OA for 48 and 37 spectral bands was the same, which indicates that 11 bands hold duplicate or redundant information that can be excluded from the classification process. Poor SAM results have also been observed in a few forest and urban land cover studies when compared with pixel classifiers [68, 69]. Class separability was higher in SVM and ANN than in SAM, as unclassified pixels were noted only in the SAM classification.
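The role of the angle threshold in SAM can be made concrete with a small sketch. It uses the standard spectral angle, the arccosine of the normalized dot product between a pixel spectrum and a class reference spectrum, and leaves a pixel unclassified when its smallest angle exceeds the threshold. The reference spectra and pixels are synthetic, the function names are hypothetical, and all spectra are assumed to be non-zero.

```python
import numpy as np

def spectral_angles(pixels, reference):
    """Angle in radians between each pixel spectrum and one reference."""
    pixels = np.atleast_2d(np.asarray(pixels, dtype=float))
    reference = np.asarray(reference, dtype=float)
    cosine = (pixels @ reference) / (
        np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference)
    )
    return np.arccos(np.clip(cosine, -1.0, 1.0))

def sam_classify(pixels, references, max_angle=0.25):
    """Assign each pixel to the reference with the smallest angle.

    Pixels whose smallest angle exceeds max_angle get the label -1
    (unclassified).
    """
    angles = np.stack([spectral_angles(pixels, r) for r in references], axis=1)
    labels = np.argmin(angles, axis=1)
    labels[angles.min(axis=1) > max_angle] = -1
    return labels

# Two synthetic 2-band reference spectra and four test pixels.
references = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, 0.1], [0.1, 3.0], [1.0, 1.0], [1.0, 0.2]]
labels_wide = sam_classify(pixels, references, max_angle=0.25)
labels_narrow = sam_classify(pixels, references, max_angle=0.10)
```

In this toy case the last pixel lies about 0.2 radians from the first reference, so it is classified at 0.25 radians but left unclassified at 0.1 radians, which mirrors the reduction in unclassified pixels reported when the angle was widened.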

SVM appeared to be very stable; whether using as many as 303 spectral bands or as few as 14 PCA-transformed bands, it produced OAs that were among the highest. However, SVM has a longer processing time (20 hours). When the number of bands was reduced from 303 to 48, SVM’s accuracy decreased only marginally, by 1.73%; moreover, using 6 spectral bands, its OA (81.99%) was the highest among all classifiers. We agree with Mountrakis et al. [56] and Alonso et al. [70] that SVM is less sensitive to data noise and effective in handling large data sizes. SVM’s penalty parameter allowed noise or nonclass features in the data to be detected using the decision hyperplane, which prevents rigid features from entering the model building, hence demonstrating its reputation for stability as detailed in Mountrakis et al. [56]. The average thematic accuracies, such as PA, of the SVM classifier were above 80% using as few as 6 spectral bands (Table 1); high PAs for SVM were also found by Benarchid and Raissouni [71].

In this study, we observed the Hughes effect. The HySpex image suffered significantly from the Hughes phenomenon; similar findings were reported by Thenkabail et al. [13] and Ma et al. [72]. With SVM, the saturation in accuracy observed while reducing the bands from 303 to 48 BS-PCALo bands reflects the Hughes effect. Similarly, the Hughes phenomenon was found in ANN: using 76 bands (BS-PCALo) produced a higher OA than using 303 bands, and the OA dropped only marginally using 10 BS-PCALo bands. The Hughes phenomenon not only manifests as a saturation of accuracies but can also reduce accuracies if the number of bands is increased further. Noise and data redundancy in the 303 bands lower the ANN accuracies, as ANN is sensitive to input data quality and noise [73]. In this study, SVM accuracy using the uncompressed 303 bands was high; however, the bands produced by the BS-PCALo method allowed the optimal bands to be found.

Numerous studies have found ANN to be the best among the classifiers compared [55, 61, 74], while others have found SVM to be better than ANN [75, 76]. In the present study, SVM turned out to be stable at all levels of classification (Table 1); however, ANN accuracies were higher than those of SVM. In this study, ANN offered the advantages of faster processing, rapid training momentum, a hidden layer that decreases the complexity of the process, and multiple adjustable parameters to control data overfitting. Overall, the SVM and ANN classifiers performed well in roof surface characterization, with significant accuracy.

In the process of finding the optimal bands, the accuracy obtained using 10 bands was noteworthy; for instance, with the 10 spectral bands derived from BS-PCALo, the ANN classifier reached an OA of 90.85%, which was higher than using 303 bands (OA of 83.73%). Likewise, the OA of the SAM classifier using 10 spectral bands (BS-PCALo) was 57.70%, which was higher than using 48 spectral bands (OA of 56.12%). The accuracy did not drop drastically for any classifier when the bands were reduced from 303 to 10 BS-PCALo bands; the average drop in accuracy across all classifiers was only 3%. The high resolution of the HySpex image showed remarkable potential for successful classification, and we agree with Heiden et al. [10] that the high resolution of hyperspectral sensors such as HySpex can capture discrete spectral features to differentiate roof surface materials. Remarkably, the average PAs and UAs were above 91.86% using the ANN classifier and above 86.36% using SVM for 10 BS-PCA bands. Using fewer than 10 (BS-PCALo) bands, the accuracies fell markedly below the acceptable level of 85%. Hence, the algorithm led us to deduce that the 10 bands selected by the band selection method (BS-PCA) have a high potential to retain the spectral information of objects needed to achieve the desired accuracies.

4.3. Spectrum Efficacy

Spectrum effectiveness was evaluated in terms of robustness in determining the separability between features using specific wavelength ranges of the spectrum. The accuracy matrix demonstrates that the visible and SWIR spectra contain a great deal of information owing to the higher absorption of objects in these regions. Most spectral features were defined in the 450–650 nm, 1100 nm, 2120–2160 nm, and 2290–2470 nm ranges (Figure 2); these results agree with the findings of Heiden et al. [9]. Using the visible spectrum together with NIR or SWIR raised the accuracy by 12–17% compared with using the NIR and SWIR spectra together. The BS-PCA1 and 2 components computed in R showed a tendency towards the SWIR region: the SWIR region holds 23 bands, the visible spectrum holds 18 bands, and the remaining 7 bands are from the NIR region. These findings suggest that roof objects exhibit strong absorption features in the visible and SWIR electromagnetic regions.

We found low thematic accuracies in a few classes, such as asphalt and asphalt mixes. These roofs were most likely confused with one another because of varied roofing types, as roofing agencies have constantly changed the proportions of mixed materials to find the most cost-effective and durable combination for rust and water resistance. The variability of materials was one of the major limitations of this study; however, the sampling pattern covered the variability of each material, which helps to overcome this limitation. Interestingly, asbestos roofing was long-standing and was found on old, broken houses. In particular, asbestos and copper roofs were rarely observed; they were validated using Google Earth and field-collected data. White roofs reflecting bright light were validated along with the other roof classes using images.

5. Conclusions

This paper outlines the classification of urban roof materials using HySpex spectrometers and aims to find the optimum bands for successful feature separation in the complex urban area of Bialystok. The study uses band reduction methods and classifier efficiency to reach an optimal number of bands. The fusion of LiDAR and HySpex proved to be an asset, adding an extra dimension to the classification. The algorithm successfully determined the optimum bands for classifying roofs with substantial classification accuracy.

From the findings of this study, we deduce that the proposed band selection method, BS-PCALo, is able to locate band positions and preserve the spectral properties of bands. The algorithm identified the redundant bands using the information in the principal components (BS-PCA1, 2, and 3). The 76 bands obtained from BS-PCALo were rich in target information and allowed us to pursue the optimal bands. We conclude that the 10 spectral bands selected by the BS-PCALo method showed considerable accuracy in roof surface mapping. The visible and SWIR spectral regions have the strongest absorption features for differentiating urban roof features. The comparison between the MNF and PCA transformations shows that the PCA-transformed bands gave a higher OA (91.94%) than the MNF-transformed bands.

In the HySpex classification, the SAM classifier showed lower overall performance, having failed to reconcile the collective spectra of the target classes, which resulted in lower accuracies. Meanwhile, SVM and ANN showed statistically significant higher accuracies. Although both SVM and ANN were affected by the Hughes phenomenon, the SVM classifier was relatively stable with respect to it, as its accuracies did not drop with increasing numbers of bands. The disadvantage of SVM is its long processing time; however, SVM was found to be robust and stable at all levels of classification. ANN was found to be promising for its high accuracy and time efficiency. ANN achieved the highest classification OA of 94.19% using 76 bands and, using the optimal 10 bands, an OA of 90.85%. The findings also show that ANN is sensitive to noise (input data quality) and requires multiple parameters to be adjusted; however, ANN yielded the highest accuracies and greatly helped to find the minimum set of best bands for the algorithm.

Overall, both SVM and ANN achieved substantial accuracies and are suitable for studying urban roof surface types and band optimization; however, ANN excels in overall performance. Recommendations for future roof studies include the use of more advanced classifiers such as multiple decision classifier algorithms, multikernel classification approaches, or deep learning neural networks. Urban surfaces are ever changing and need repeated mapping, which creates the need to develop sophisticated automation of the algorithm.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

The methods of image preprocessing were tested within the framework of the project “The Innovative Approach Supporting Monitoring of Non-Forest Natura 2000 Habitats, Using Remote Sensing Methods” cofinanced by the funds of the National Centre for Research and Development (NCBiR) in the framework of the programme “Natural Environment, Agriculture and Forestry” BIOSTRATEG II no. BIOSTRATEG2/297915/3/NCBR/2016 and grants for science awarded by the Polish Ministry of Science and Higher Education under the Theme no. 501-D119-64-0180200-15. The authors would like to thank the European Union’s Erasmus Mundus NAMASTE consortium for financing the tenure of the research program. Special thanks are due to the MGGP Aero Company for the acquisition and initial preprocessing of the HySpex and LiDAR images. The authors would also like to thank the anonymous reviewers for enhancing this manuscript.