Abstract

The development of geospatial technologies has opened a new era in terms of data collection techniques and analysis procedures. Digital elevation models as 3D visualization of the Earth’s surface have many mapping and spatial analysis applications. The primary terrain factors derived from the raster dataset are usually less critical than secondary ones, e.g., ruggedness index, which plays a vital role in engineering, hydrological information derivation, and geomorphological processes. Surface ruggedness is a significant predictor of topographic heterogeneity by calculating the absolute value of elevation differences within a specified neighborhood surrounding a central pixel. The current study investigates the impacts of various topographic metrics obtained from a digital elevation model on characterizing terrain ruggedness utilizing stepwise principal component analysis. This popular multivariate statistical technique is applied to conduct a comprehensive assessment and treat the information redundancy of terrain parameters. Simultaneously, the standard deviation of elevation is also proposed as an alternative approach to quantifying topographic ruggedness. Besides, quantitative and qualitative method is espoused to validate the algorithms and compare their capabilities to the previously introduced models in the literature. The findings have shown that principal component analysis provides superior performance against other models. Furthermore, they indicated that the standard deviation of elevation could be used instead of the available ones.

1. Introduction

Geomorphometry, which is defined as the science of quantitative Earth’s terrain analysis [1], has mainly concentrated on studying terrestrial landscapes and derivation of a large spectrum of environmental variables [2, 3]. The essential data source for performing such spatial analysis is digital elevation models [4]. The digital elevation model is a three-dimensional raster representation of the land surface to identify geographical features [5, 6] by interpolating the landform using modern geospatial techniques [79]. It can be established through conventional ground topographic surveys or remote sensing methods [1013]. In general, DEM is high-quality spatial data that is applied to extract terrain factors such as slopes, curvatures, aspects, and drainage network [1418].

Generally, irregularity in elevation makes cultivation difficult and costly to traverse [19, 20], but useful for keeping newborn reindeer calves against weather and windchill [21]. Relief ruggedness is a valuable attribute of wildlife habitat models and one of the prime components affecting vegetation diversity, snowmelt patterns, and water drainage that are key elements of nutrient obtainability [2225] and the book [26]. Many previous studies have addressed quantifying surface ruggedness, in which some of these investigations were adopted on the statistical dispersion of elevations, slopes, aspects, and vectors orthogonal to planar facets on the landscape. Beasom et al. [27] described an approach for characterizing land surface ruggedness as a function of the total length of contour lines in a study area. More recently, Riley et al. [28] proposed a methodology to determine topographic heterogeneity by calculating the terrain ruggedness index value for each grid cell from the sum elevation changes within a specified neighborhood surrounding a central pixel. Sappington et al. [29] developed and tested a vector ruggedness measure algorithm to estimate landscape ruggedness as the variation in the 3D orientation of grid cells in a moving window by decomposing slope and aspect into 3D vectors. In contrast, Popit and Verbovšek [30] identified the ruggedness index by computing the differences between the lowest and highest elevation for each cell location on the output raster.

In a multivariate statistical analysis, the dataset’s dimension plays a critical role in determining the problem’s complexity. One of the most common techniques for reducing the data dimensions to increase its interpretability is the principal component analysis that was introduced by Pearson [31]. Presently, PCA is considered a powerful statistical approach in image processing due to its ability to decrease the number of correlated image bands resulting in a smaller dataset capable of representing most of the original variabilities [32]. Over the last few decades, PCA was used in geosciences and remote sensing [3337]. For instance, Petrişor et al. [38] highlight the potential of integrating geographic information systems and PCA as a decision-support tool for underdeveloped countries. Huang et al. [39] concluded that PCA could provide an accurate representation of terrain complexity distribution features compared to k-means clustering. However, this method has not yet been introduced to the process of modeling terrain ruggedness.

As stated before, no optimal measure would provide a powerful tool for defining habitat availability [40, 41]. Hence, selecting an appropriate method to evaluate this indicator is an essential task. In contrast, GIS software such as ArcGIS provides a variety of geoprocessing tools, but still has some gaps that must be addressed. The article’s main objective is to investigate the variability of quantifying topographic ruggedness characteristics, relying on PCA, a standard deviation of elevation, and traditional indices. Additionally, the spatial tools using ArcGIS Model Builder and Python script will be developed for automating the processes. As the basis for terrain ruggedness analysis, the DEM is generated using synthetic data acquired by a scanned topographic map with varying coefficients to emulate Earth’s physical land-surface features. The comprehensive assessment of several terrain metrics extracted from DEM is examined to describe topographic ruggedness using PCA. After that, the surface ruggedness obtained from the PCA and STD methods was compared against the ones from two different literature models by adopting some statistical approaches and visual depictions.

2. Materials and Methods

Topography relates to the shape and characteristics of the Earth’s land surface and is a central controlling element in a wide range of natural processes [42, 43]. It must be quantitatively evaluated [44] by building accurate and reliable geospatial data [45]. The terrain analysis is concerned with extracting topographic attributes from DEM raster in GIS as a convenient environment for handling data [46]. In general, DEM quality and resolution have a critical influence on the derived factors [4749]. The analysis scale and chosen approach for determining different terrain variables will also play a role in the final results. Surface ruggedness is a significant geomorphological parameter applied in geoscience and environmental studies for indicating the relative change in elevation between adjacent grid cells [50]. Many approaches have been applied to verify which algorithm is the most appropriate for depicting topographic ruggedness. The principal component analysis was utilized in this study to evaluate the correlations between terrain factors and reduce the dimension of the original parameters to characterize surface ruggedness. At the same time, the standard deviation of elevation was also tested to be another ruggedness model. Figure 1 describes a methodology flowchart adopted in the current study for quantifying relief ruggedness.

2.1. Case Study Description

The chosen case study is the Wadi Musa area (Valley of Moses) as one of Ma’an governorate towns in Jordan. It is characterized by gorgeous natural scenery and a heterogeneous landscape with mountains in the eastern part (Figure 2). The geographic position of Wadi Musa has situated about 200 km to the south of Amman, between 30° 15′ 00″ N and 30° 30′ 00″ N and 35° 10′ 00″ E and 35° 30′ 00″ E, as shown in Figure 3. The elevation of the terrain surface in this region varies from 49 m to 1725 m above the mean sea level (AMSL). This area has several types of features to assess the topographic ruggedness map under different landform conditions. Indeed, characteristics of terrain relief, such as flat, hilly, and mountainous, have a significant influence on the reliability of DEM [51] and extracted terrain factors. The digital elevation model of the site was generated using a topographic map with a scale of 1/25000. The map was scanned and digitized to capture contour lines with an interval of 10 m. After that, the ANUDEM method, which is entailed in ArcGIS 10.2 under the name of Topo to Raster, is applied to derive the DEM model with 10 m accuracy from CLs. This algorithm was developed by Hutchinson [52] to fit the shape and drainage structure of the land surface. The dataset’s quality control was performed by superimposing and inspecting the CLs extracted from the built DEM with the original ones. Figure 4 illustrates the flowchart of data acquisition.

2.2. Surface Ruggedness

Geomorphologic analysis as a measure of landforms geometry was broadly applied to understand various natural processes that impact humans and the environment [53, 54]. Generally, the terrain characteristics have tremendous leverage over the habitat and socioeconomic activities. The emergence of modern geospatial techniques has brought new ways to represent and quantify the bare-earth topography using grid-based DEMs and develop novel algorithms for extracting geomorphometric attributes [55, 56]. Neighborhood tools are common to carry out DEM analysis workflows by computing a specified statistic for a focal cell within a defined moving window that is designated in relation to the central point [57]. In this study, a 3 × 3 sampling window is effectively shifted across the DEM surface to identify terrain information, as indicated in Figure 5. The main reason behind this selection is to limit the effects of the cell size on the results of the investigated models since finer windows require highly accurate DEMs [58].

Terrain ruggedness index is assigned for each raster cell using the algorithm described by Riley et al. [28] by calculating the square root of the sum of the squared elevation differences between the central pixel and the adjacent ones within a specified neighborhood, as presented in equation (1), in which zo is the elevation of the central cell, z is the elevation of each neighbor grid cell to the focal one, and N is the width of the roving window (number of pixels). Hence, according to Figure 5, equation (2) is written. The source code was developed in Python programming in the context of ArcPy to calculate the TRI algorithm, as indicated in Figure 6.

Vector ruggedness measure provides another procedure to quantify terrain ruggedness as the variation in cells’ 3D orientation within a neighborhood, as presented in Figure 7. It was initially developed by Hobson [59] and later espoused by Sappington et al. [29]. The vector ruggedness measure can be calculated from the slope and aspect using equation (3). Figure 8 illustrates a complete workflow ArcGIS Model Builder scheme to generate a VRM grid from the DEM dataset.where n is the sample size (number of pixels in the moving window) and r is the resultant vector of central cells in a given neighborhood.

2.3. Ruggedness Conditioning Parameters

The terrain metrics shown in Table 1 are tested to evaluate and analyze ruggedness mapping to determine the most influential predictive factors. These parameters include six topographic features: slope, aspect, total curvature, surface roughness, topographic relief, and topographic position index. The relevant hydrological factors were the stream power index, sediment transport index, and topographic wetness index. Figure 9 demonstrates an automated workflow in ArcGIS Model Builder that uses standard tools to extract morphometric variables from DEM.

2.4. Standard Deviation of Elevation

The standard deviation of elevation is determined by analyzing a DEM at a pixel level via sliding window scanning using equation (5). The standard deviation of elevation is examined in this study as an alternative ruggedness measure. The key point about choosing this method is to ease calculating its parameters using focal statistics in ArcGIS with a reasonable processing time for a large spatial dataset.where N is the width of the roving window, is the mean elevation in the window, and z is cell elevation. Hence, according to Figure 5, it can be written as follows:

2.5. Principal Component Analysis

The simplified algorithm for conducted PCA is seen in Figure 10. In general, four main steps are used to perform this method based on terrain factors. The correlation and covariance matrices are computed, and then the principal components are defined, and finally the dataset is shifted and rotated to the new axis. Usually, to improve the performance, avoid the numerical conflicting, and training stability of the ruggedness values, the following equation is used to normalize the obtained results [66].

2.6. Model’s Performance Assessment

As mentioned previously, the obtained ruggedness grids from each method were classified into five different classes. These classes were defined as level (1), slight (2), moderate (3), high (4), and very high (5). In line with this study’s aim, the identified classes from each model were compared by adopting the principle of confusion matrix that is always used for evaluating the performance of classification algorithms in machine learning. Each matrix’s row represents the instances in a true class, whereas each column accounts for the predicted class cases. For a specific class in a multiple classes matrix such as the one shown in Figure 11, it can be divided into four main categories to indicate the number of times that the model: (1) true positive (TP) has rightly predicted the positive class, (2) true negative (TN) has precisely estimated the negative one, (3) false positive (FP) has incorrectly predicted the positive class, and (4) false negative (FN) has incorrectly estimated the negative class. Thus, using this matrix provides an easy and direct way for investigating the matching between two different approaches of the selected ruggedness modeling methods.

In general, matching any two models in this study is evaluated by computing the confusion matrix’s overall accuracy. The accuracy is defined as the number of corresponding points in all cases divided by the total number of points selected for the investigation. Furthermore, the models’ performance at the class level is assessed using two different approaches: precision and recall, as demonstrated in Figure 12.

3. Results and Discussion

Digital elevation models are fundamental to many applications in Earth sciences such as geomorphology, geology, ecology, and engineering [6769]. Hence, the accuracy of DEM-derived elements must be deemed to reduce the inherent errors. The terrain ruggedness is extracted from DEM to measure local elevation variability within a defined window. This study’s main objective is to carry out a comparative analysis to assess the quality of surface ruggedness generated by the PCA and STD algorithm. This analysis includes representing topographic ruggedness indices and verifying their performance. The standard deviation of elevation was determined using focal statistics over a 3 × 3 cell neighborhood of the DEM dataset. The chosen conditioning parameters (Figure 13) were normalized, and then the PCA approach was applied to describe surface ruggedness. On the other hand, the geoprocessing workflow in ArcGIS 10.2 software was automated by Python Script and Model Builder tool to calculate TRI and VRM indices, respectively.

The terrain variables in Table 1 were all used to develop an initial PCA model and choose the most suitable ones through its correlation matrix and eigenvalues. Indeed, the curvature, TWI, TPI, and aspect have indicated very low correlations than other metrics. Pearson’s correlation coefficient of each pair for a selected dataset (sample size 64635 points) and descriptive statistics were computed to investigate the impact of various terrain parameters on the ruggedness indices, as shown in Tables 2 and 3. Hence, the influencing parameters, which highly correlate with the ruggedness index, were selected for building the PCA model. The Pearson correlation coefficient varies between −1 and 1, with values close to the limits reflect high significance. In general, the blue color refers to have a positive correlation in which the ruggedness index is directly proportional to the terrain feature, while the orange color represents having a negative correlation means that the ruggedness index is inversely proportional to the terrain feature. Accordingly, the slope, TR, and SR have revealed the highest correlations; besides, SPI and STI gave a good correlation with different ruggedness methods. Hence, it is vital to use them to generate an accurate PCA map. Although TWI shows a significant relationship with all approaches, it minimized the model’s capabilities when utilized in the PCA due to its negative interaction. Additional analysis was conducted to identify the bivariate correlation of factors with the PCA model. Table 3 indicates that the trends in PCA results follow those of the other approaches. In contrast, the correlation coefficients between the examined parameters and VRM are significantly lower than the other models. For instance, the correlation coefficient between the slope and the ruggedness index reduced from about 0.99 to 0.291 when using the VRM model. A similar drop was observed in the VRM case, whether the correlation is positive or negative. On the other side, Figure 14 illustrates the correlation plots to measure the degree of agreement between the models and chosen topographic metrics. The x-axis of all diagrams represents the ruggedness index for each of the investigated methods (i.e., PCA, TRI, STD, and VRM) stated on the top, and the y-axis is the topographic parameters. In general, the TRI and STD algorithms display nearly the same, and TRI is highly correlated with the slope and the SR. In contrast, the PCA is related to several variables while there is no correlation between the VRM and comparative factors.

Figure 15 represents topographic ruggedness maps generated by TRI, VRM, PCA, and STD algorithms examined in the research area. These datasets are split into five homogeneous classes (level, slight, moderate, high, and very high) using the Jenks natural breaks optimization [70, 71]. Indeed, the smoothest areas represented by shades of a gray tone are entirely related to plain regions. Moreover, the more rugged zones, clarified by cyan color, are fully seen around the mountains’ topmost, as illustrated in Figure 16. The graphical representations of the ruggedness index indicate that the texture and colors/tones of the STD model closely match the corresponding TRI one. At the same time, these two surfaces show a slight difference from that of PCA, especially in places with very high ruggedness. Also, the VRM index reveals a considerable variation in the tone color patches from other algorithms. As a result, visual inspection has highlighted the PCA approach as practically the most superior model among the tested ones. Nevertheless, further analysis will be performed to evaluate tested models using statistical methods.

The PCA model was developed using selected topographic parameters in the case study, and covariance and correlation matrices are shown in Tables 4 and 5. Furthermore, the eigenvectors and eigenvalues are provided in Tables 6 and 7, respectively. Typically, the first component of the updated PCA cumulative contribution reached about 82.3%, while it was possible to exceed 95% using the first and second components only. However, the surface ruggedness was created using all the components to reach the best accuracy.

The confusion matrix was defined for each pair of methods to evaluate their capabilities and variations, as clarified in Figure 17. It is observed that both TRI and STD are almost identical in which they provided the same zonal classification with an overall accuracy (matching) of 99.7%. This means that the STD model can replace the TRI without any negative influence. The difference between PCA and STD and TRI can be represented by an overall variation of about 20%. The PCA classified more points in the very high category compared to the TRI one that was capable of correctly identifying about 45% of the points only. This means that the PCA model is more sensitive for rugged areas in comparison with other methods. Pairwise comparison between the PCA and VRM models showed an overall difference of about 70%; besides, the ratio of points classified in the first class by VRM was almost 97% of the testing population. This observation is in line with the previous findings in Figure 14 and means that the VRM method is incapable of detecting moderate, high, and very high roughness areas. Finally, both STD and TRI results indicated about 44% correlation with the ones extracted from the VRM due to having a majority of the points classified in the level class of the VRM model.

The following comparisons and postanalysis of the results will omit the VRM model and consider PCA, TRI, and STD ones as most VRM points concentrated in the level category. Figure 18 provides a class-based disparity between the PCA, TRI, and STD results through box plots. In general, it can be seen that the points’ distribution in each PCA class takes a larger range as compared to the TRI and STD models that shrink the classes into minimal extents of propagation. Furthermore, it can be noticed that the limits of the level class in the PCA model are equal to that of the level, slight, and moderate of the TRI and STD ones combined. This means that the PCA method increases the ruggedness index values of the entire area as compared to the TRI and STD results even though their maps look very similar. On the other hand, the datasets’ normal distribution in each of the investigated models is shown in Figure 19. It can be observed from the plots that the only significant difference between the PCA and the other algorithm is defined in the first class (level), where the PCA follows the normal distribution perfectly. Also, the spread of the other classes looks pretty similar in all of the indices. In general, Figure 20 shows a strong relationship between the results of PCA and those of TRI and STD. Thus, the regression equation (8) can be used to transfer TRI and STD models to the PCA one in Wadi Musa were proposed in this paper. The correlation coefficient of these equations is 0.9925 and 0.9926 in TRI and STD, respectively. These models can find application in many instances to overcome the need for high computational efforts.

4. Conclusion

Topographic ruggedness is an essential factor of terrain analysis for many applications in engineering and geomorphology. Several algorithms were introduced in the literature to show the spatial distribution of surface ruggedness but did not necessarily provide an optimal solution. This study presents a substitutional approach to generate terrain ruggedness using PCA and STD algorithms. Firstly, the effects of various DEM-derived topographic metrics on the ruggedness model were investigated using PCA. Secondly, the proposed methods are compared with the most common ones, namely, TRI and VRM. Indeed, the performance analysis of the ruggedness indices was conducted relying on visual inspection and statistical methods. The findings indicated the feasibility of applying PCA to generate an accurate ruggedness map consistent with the actual topographic features. STD model is simple to implement, available in ArcGIS software, and produces nearly identical results to TRI, the more complex method, while VRM showed the lowest performance. According to the statistical analysis, the ruggedness index’s most influential variables are slope, TR, and SR. Finally, additional research is required to understand PCA’s capabilities for describing terrain characteristics across various landscapes [72] and at different scales.

Abbreviation

TRI:Terrain ruggedness index
DEM:Digital elevation model
VRM:Vector ruggedness measure
PCA:Principal component analysis
GIS:Geographic information systems
STD:Standard deviation of elevation
CLs:Contour lines
AMSL:Above mean sea level
α:Aspect
β:Slope
C:Curvature
SR:Surface roughness
TR:Topographic relief
TPI:Topographic position index
TWI:Topographic wetness index
SPI:Stream power index
STI:Sediment transport index.

Data Availability

Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The author declares no conflicts of interest.