A microzonation study deals with the classification of hazards in a town or city in terms of surface ground motions that result from amplification and resonance frequency in soils against seismic tremors. This paper presents the result of a microzonation study in terms of resonance frequency and peak amplitude for Lahore city, Pakistan. In order to recognize the local soil effects of the covered geology at 159 sites in Lahore city, the horizontal-to-vertical spectral ratio (HVSR) Nakamura method was implemented. A fuzzy C-mean (FCM) clustering algorithm was adopted to obtain the best cluster solution of the analyzed HVSR parameters. The results of the Silhouette Index suggest that the FCM clustering solution of observation data points is more consistent. The results of clustering reveal three solutions. Clusters 1 and 2 reveal that a major part of the research area possesses low to moderate frequencies (0.66–1.03 Hz) with a peak amplitude of 2.25–4.38 mm, indicating the presence of soft to hard rock and thick alluvial sedimentary cover. Cluster 3 reveals the presence of soft to compact rocks (with frequencies and amplitudes of 0.73–1.03 Hz and 3.02–4.11 mm, respectively) overlaying the bedrock. Lahore city has 60% of soil cover with an amplitude of 2–3 mm (for the central part) and about 40% of 3–4 mm in the northern, southern, and southwest portions. According to the NEHRP soil classification code of 1997, a major part of the city has stiff nature of the soil, while a few places reveal the presence of very dense soil. The maps produced in this study will provide expected ground motion-related useful information to reduce the seismic risk for infrastructure in Lahore city.

1. Introduction

An earthquake is a natural phenomenon that has caused huge loss of physical damage in different countries, e.g., Armenia, Iran, California, Japan, Greece, Turkey, India, Sumatra, Pakistan, China, and Haiti. Therefore, globally multi-dimensional earthquake hazard assessment studies by [14] have been performed. Several studies have shown that most Asian regions, including Pakistan, are vulnerable to seismic events [5] because they are located near active faults [6]. Historically, Pakistan has gone through many damaging earthquakes that resulted in the loss of lives and important infrastructures, e.g., the 1935 Quetta and 2005 Kashmir earthquakes [7]. Northern and northwestern parts of Pakistan are also considered as vulnerable due to moderate to high seismicity [8].

Seismic hazard analysis for a region can result in two forms, i.e., seismic macrozonation and microzonation. In seismic macrozonation, the probability of earthquake ground shaking corresponding to a return period is calculated by considering the shear wave velocity of bedrock. Structures such as roads, buildings, dams, are mostly built on soils having different characteristics (shear wave velocity, frequency, amplification, etc.). These parameters cannot be estimated at a regional level, and hence a process called seismic microzonation is introduced to map and identify the response of an earthquake on local soil by estimating these parameters.

Various seismic microzonation techniques have been proposed, among which the site effect technique is generally followed to estimate the peak amplitude, resonance frequency, vulnerability to ground instabilities [9], top thirty-meter shear wave velocity (Vs30), etc. [10]. The recent experimental, numerical, and theoretical studies [11] of local soil in terms of damage distribution against earthquakes proved to be very effective. It is now accepted that local soil conditions play a significant role in earthquakes. Calculation of sediment thickness from resonance frequency is an appropriate way to characterize site conditions [12]. Mapping the average top 30-meter shear wave velocity (Vs30) of the sedimentary layers is a globally accepted parameter used in the recent development of buildings for forecasting the potential amplification against seismic shaking [13, 14].

The existing seismic hazard assessment studies of [1518] have classified Lahore and its surroundings into microzones without incorporating the soil/subsoil condition. Limited research has been carried out to highlight the site response parameter (amplification, frequency, Vs30, etc.) in Pakistan. [19] estimated site effects of local soils in Fateh Jang region; [20] performed site response analysis to mitigate the impact on natural hazards; [10] performed seismic microzonation of Islamabad-Rawalpindi metropolitan area; [21] produced microzonation map for Abbottabad basin and its immediate surrounding; [22] highlighted the application of machine learning techniques on ambient noise data for Potwar region; and [23] performed site response studies in Peshawar using HVSR Nakamura technique.

Since Lahore city majorly lies over alluvial cover (Figure 1), no such studies have been opted for Lahore city, Pakistan. Therefore, this study is focused on the microzonation of Lahore city that takes into account the soil/subsoil conditions while designing and constructing multistory buildings and other engineering structures.

Clustering techniques have been found very promising in developing solutions for several geoscience problems [2428]. In this study, a fuzzy C-mean clustering algorithm is used to find the cluster solution for HVSR data in order to investigate the site’s effect parameters [2931]. In Figure 2, the flowchart represents the important steps that are performed in the clustering analysis of observation data. Based on clustering analysis, these site response parameters are further classified in Lahore city into microzones that incorporate local soil/subsoil conditions and characterize Lahore city into different risk classifications against future earthquakes in the design of engineering structures.

2. Materials and Methods

In this study, the Nakamura method [32] of HVSR (Horizontal-to-Vertical Spectral Ratio) for the calculation of site effects parameters was adopted.

In this method, a single station measurement can easily be done to obtain subsurface information. This method was initially used to investigate the risk of seismicity in Japan [32, 33]. The HVSR method of Nakamura has been adopted for the seismic site characterization studies of large cities all over the world and obtained fruitful outputs from HVSR surveys by different researchers, i.e., [34] characterized the site of Iran based on HVSR; [35] adopted this method for seismic microzonation study of Catania Italy; [11] in Lubaljana area Slovenia; in Almerìa City [36, 37] in Beijing; [38] in Urban Las Vegas Nevada basin and [10] in Islamabad and Rawalpindi, Pakistan, etc.

Data acquisition is the initial stage in performing research work in any area or field. Horizontal-to-vertical spectral ratio data were acquired from 159 test locations using a Tromino Engy system for high-resolution wireless seismic and geophysical surveys (housed at NCE in Geology, University of Peshawar). For each data collecting point, the sensor was set to record data for 10 minutes duration at a sampling frequency of 128 Hz. This instrument sends electromagnetic pulse in the form of Rayleigh waves to the ground and detects soft/compact sedimentary layers.

The acquired HVSR data set was processed in the Grilla software by setting the engineering interest frequency range of 0.5–20 Hz. The H/V curves of HVSR data were generated using 10% smooth triangular windows (Figure 3). The features of H/V curves were analyzed in accordance with the worldwide recommendation introduced by [39]. First, the reliability of the curves was checked (i.e., is significant cycles for a definite fo, low scattering, and enough windows number acceptable for a set frequency range). Secondly, HVSR curves were checked to determine whether the criteria for a clear and reliable peak are fulfilled or not [39], as shown in Table 1. Based on these two criteria, 8 points were skipped due to so much noise and poor data.

2.1. Clustering Analysis of HVSR Data
2.1.1. Data Preprocessing

After analyzing HVSR data in the Grilla software, site response parameters, i.e., fundamental frequency (fo) of soil and H/V spectral amplitude (Ao) of corresponding H/V spectral ratios, were selected for clustering analysis.

A preprocessing tool was applied prior to clustering to remove any inconsistent (missing, unknown, or redundant) values. The data set was found to be consistent, and no missing or unknown values were found. However, different ranges are found in the data which need to be scaled. This task was done using a z-standardization test on two variables of amplitude and frequency. For elements Xi = Aoi and foi, the z-standardization can be calculated by where σ = standard deviation and μ = mean value of variable Ao and fo. Some upscale points were found (Figure 4) using the z-standardization of the data. These points were scaled using a standard deviation error value. Thus, a dataset of 151 records containing two variables was compiled for input in the clustering analysis stage.

2.1.2. Clustering Analysis

In this study, the processed dataset was assigned for clustering in XLSTAT statistical analysis software. XLSTAT is a flexible add-on for excel data analysis that allows operators to customize, analyze and share results with powerful features of machine learning, test validation, parametric/no parametric test, data visualization, clustering, etc. [40]. The Fuzzy C-mean (FCM) clustering algorithm developed by [41] and improved by [42] was used to find a number of cluster solutions in the HVSR data.

In the first step, the FCM algorithm attempts to divide a finite collection of n elements into a group of c fuzzy clusters according to some given principle. Given a finite set of data, this algorithm returns a list of cluster centers C and a partition matrix:

In the second step, coefficients are assigned randomly to each data point for being in the clusters. Any point xk has a set of coefficients giving the degree of being in the kth cluster γik. In the third step, the mean of all points, weighted by their degree of fitting to the cluster, is estimated by relation:where m is the hyper-parameter, a fuzzier cluster will result at the end, if the value of m is higher.

For each observation, the clusters’ coefficients are computed by updating the partition matrix using relation:

This process is repeated until the algorithm has converged (that is, the coefficients’ change between two iterations is no more than the given sensitivity threshold). Where for each element, γ tells the degree to which element belongs to a cluster.

For an overlapped data set, FCM results are quite better than the K-mean. However, the FCM algorithm responds poorly to data sets that contain unequal size clusters. It is also sensitive to noise and outliers.

2.2. Tukey-Kramer Post Hoc Test

To find which cluster means are exactly different, a most frequently used post hoc test of Tukey-Kramer was performed. In this test mean between pairs of samples is compared by calculating the absolute mean difference in each cluster. Then the critical Q value was determined using relation:where Q = 5.99 from the K-Wallis test; S2pooled = Pooled variance across all groups; and n = Sample size for a given group = 10.

The pooled variance is calculated as the average of the variances for the clusters, which turns out to be 0.073. Thus, Qcriticalvalue is calculated as 0.511.

In the last step, the comparison was made between the absolute mean-variance and each cluster to the Q critical value. The difference is significant if the absolute mean variance is larger than the Q critical value.

3. Comparison with Different Approaches

In this study, the cluster solution of the observed data was compared with four clustering techniques of K-mediod, Density-based spatial clustering of applications with noise (DBSCAN), and Hierarchical and K-mean algorithms [4346]. Table 2 shows the comparison of these algorithms in terms of different criteria (distance measurement, granularity, initial and termination condition, etc.).

A comparison of -value obtained using K-mean, Hierarchical, and FCM is presented in Table 3 in terms of cluster profiling of amplitude and frequency using a non-parametric classical one-way ANOVA Kruskal-Wallis test. The results are shown in Figures 5 and 6 as box plots. The Kruskal-Wallis test gives the interpretation of consistency in observed data samples. In all three approaches, the computed -value is lower than the significance level of alpha = 0.05 for H/V amplitude, rejects the null hypothesis (Ho) for variable amplitude (Ao), and accepts the alternative hypothesis (Ha). For variable fo the computed -value is greater than the significance level alpha = 0.05, indicating that Kruskal-Wallis cannot reject the null hypothesis Ho.

The results of the Silhouette Index suggest that the FCM clustering solution is closer to +1, indicating the cluster is more consistent in differentiating the characteristics of observation data points. In contrast, Hierarchical and K-mean clustering solutions have negative Silhouette values, which tell about some misclassified locations of observed data.

4. Results and Discussion

This study is based on 159 sites. Some locations were very restricted and not allowed to carry out a survey. The official geological map on a local scale is also not available; the geological maps are only available from different research works in parts. The reason is the lack of studies that incorporate local soil conditions. But in this study, we try to map local soil and categorize it into different soil classes.

In this study, a meaningful cluster solution of the observed data was evaluated using the clustering algorithm of fuzzy C-mean. To see the effect of various centroids, the observed data set was iterated ten times for the optimal number of clusters by applying the fuzzy C-mean technique.

The profiling of a cluster for a variable plays a significant role in defining the differences between the clustering solutions while finding its characteristics. The clustering solutions generated by FCM were further analyzed in terms of cluster profiling of amplitude and frequency based on the median, mean, standard error, and standard deviation values using a non-parametric classical one-way ANOVA Kruskal-Wallis test. The profiling results for these two variables are listed in Table 4. In this table, the four rows for each variable fo and Ao contain information about the values of a number of observations without missing data, minimum, maximum, mean, and standard deviation for each cluster. The Kruskal-Wallis test gives the interpretation that the observed data samples either come from the same location (Ho) or do not come from the same location (Ha). In each table, the computed -value is lower than the significance level of alpha = 0.05 for H/V amplitude rejecting the null hypothesis (Ho) for variables amplitude (Ao) and accepting the alternative hypothesis (Ha). For variables fo, the computed -value is greater than the significance level alpha = 0.05, indicating that Kruskal-Wallis cannot reject the null hypothesis Ho.

This indicates the cluster solutions of variables are from the same distribution (for Ao) and from different distributions (for fo). In the statistical analysis, the Silhouette Index of clusters has been found to be very effective in the interpretation of the cluster solutions [47]. The results of the Silhouette Index suggest that the FCM clustering solution is closer to +1 (average Silhouette width of cluster 1 = 0.480; cluster 2 = 0.498 and cluster 3 = 0.533), indicating the cluster is more consistent in differentiating the characteristics of observation data points.

The distribution of two variables (frequency and amplitude) in clusters 1, 2, and 3 is also shown as box-plots (in Figure 7) and scatter plot (in Figure 8) for the FCM clustering algorithm. The solution of cluster results shows that two variables H/V amplitude and frequency are very significant for discriminating among clusters. Cluster 1 is the largest cluster (63 observations), having a high H/V amplitude of 2.64–4.38 mm and low to medium range frequencies (0.66–1.03 Hz).

Cluster 2 has 61 observations with low to medium range frequencies of 0.66–1.03 Hz and moderate amplitude values (2.25–3.49 mm). The frequencies of HVSR peaks in Cluster 3 range from 0.73–1.03 Hz with an amplitude of 3.02–4.11 mm, resembles of soft to compact rocks overlaying the bedrock (Table 5). The fundamental frequency of cluster 1 ranges from 0.73 to 1.03 Hz with an amplification of 3.02–4.11 mm also indicates the presence of soft to hard rock and more sediment. The entire three clusters indicate the sites are moderately vulnerable in case of any seismic event. The observations in all clusters represent more sediment thickness, which indicates the bedrock is not shallow in the area.

Based on the one-way ANOVA Kruskal-Wallis test, the null hypothesis is not rejected for frequency data, while it is rejected for amplitude (Ao) data since the p-value is lower than the significance level of 5%. The rejection tells at least one sample is from a different location, and the means between the three clusters are not equal. Based on the Tukey-Kramer post hoc test, the mean difference between cluster 1-cluster 2, and cluster 2-cluster 3 is not statistically significant, while it is noted only between cluster 3 and cluster 1.

The contour maps of these variables, i.e., resonance frequency and amplification data, were prepared using the Kriging interpolation tool in ArcGIS 10.4 software developed by ESRI. In the contour map of frequency, the lowest dominant frequency ranging from 0.1–1.0 Hz and the highest ranging from 1.01–2.0 Hz is represented by green and red colors, respectively. Figure 9 also represents the correlation between topography and distribution of predominant frequency in the area. High topography has high frequencies, and low topography possesses low predominant frequencies. This reveals a consistent deposition of sediment layers. Low topography resembles the thickness of soft sediment layers corresponding to very low predominant frequencies, while high topography is related to weak layer deposition corresponding to high frequencies represented by red contours.

The study area is comprised of Quaternary sediments [48] (Figure 1). Almost 60% of the study area (for the central part, green color) has soil with amplification ranging from 2 to 3 mm. While 40 percent area, i.e., the northern, southern and southwest portions of the city, is covered by soil with amplification ranging from 3 to 4 mm (Figure 10 red contours). The peak amplitude is associated with the impedance contrast of bedrock and soil cover. Some locations with high soil amplification (i.e., Lahore Zoo, Lahore Museum, UET housing society, Hamza Town, Sukh Chain gardens, and Tariq gardens) have the tendency of high risk to earthquakes, while soil with a low amplification factor bears a low risk of damage in some locations including Model Town, Johar Town, PIA colony, Faisal Town, University of Punjab, Baghbanpura, Green town. Stiff soil is characterized by low soil amplification, while soft soils are characterized by high soil amplification values. The townships of Baghbanpura and model town are built over sandy gravel, sandy hard clay, and loam type soil that can weaken the soil strength against earthquakes greater than 5.0 M. About 90% of the city is built on deltaic-mud soil up to 30 meters in depth. The locations of the University of Punjab, Lahore Zoo, Lahore Museum, and many towns (Model town, Johar town, Garden town, Faisal town, etc.) are very sensitive to larger earthquakes if proper geotechnical and geophysical testing are not followed prior to construction. The construction along Lahore Branch Canal needs special study in terms of soil liquefaction and groundwater seepage. This canal needs proper maintenance on a yearly basis, and cleaning of sediments is required.

The horizontal-to-vertical spectral ratio technique is favorable for the locations where the stiff layer is underlying a low rigid layer. Likewise, a clear variation in the mechanical properties of the hard rocks and soft soils is seen. Besides this, HVSR also identifies the difference between soft rocks and stiff soils based on the shape of curves. Soft soils reveal a clear curve like a bell in shape, while stiff to soft soils have flat-type curves.

Several studies have examined the variations in predominant frequencies. After such studies, it is concluded that predominant frequencies with low values mainly correspond to the presence of thick deposition and high-frequency values resemble thin deposition of sediments.

The maps produced could be used in the risk assessment (according to expected ground motion) of infrastructures in Lahore city. The results will be fruitful in designing underground constructions while considering subsurface soil. The present study can be significant for the researchers integrating the seismic risk assessment studies with the clustering techniques.

5. Conclusions

The comparison of partitioned and non-partitioned clustering techniques recognized three cluster solutions that combine the data in terms of low, high, and medium risky sites.

However, the FCM reveals further study of additional variables, vulnerability index, sediment thickness, soil liquefaction, etc., that need to be carried out at certain locations that reveal more than one cluster feature.

The predominant frequency fo corresponds to the topography of the study area. High fo values reveal deposition of a thin alluvial cover, and the low-frequency belongs to a thick alluvial cover.

A major part of the study area possesses a very low-frequency range (0.5–1 Hz) corresponding to a thick alluvial cover.

High amplification is noted in the southern and western parts of the study area, which shows a thick cover of sediments and high ground resonance potential. This part of the city might be harmful against medium to large earthquakes and has a high potential to damage engineering structures in the area.

Lahore city has 60% of alluvium cover exhibiting resultant amplification of 2–3 mm in the central area and about 40% of 3–4 mm in the northern, southern, and southwest portions.

A major part of the city has stiff-soft sediments, while a small part of the city is covered with soil comprising of a very dense nature of soils.

The locations of the University of Punjab, Lahore Zoo, Lahore Museum, and many towns (i.e., Model town, Johar town, Garden town, Faisal town, etc.) are very sensitive to larger earthquakes if proper geotechnical and geophysical testing are not followed prior to construction.

The construction along Lahore Branch Canal needs to carry out a special study in terms of soil liquefaction and groundwater seepage.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


The authors are thankful to the National Centre of Excellence in Geology, University of Peshawar, for arranging field work for the data collection. This research received no external funding.