Abstract

Earthquake is one of the natural disasters that has always influenced human life. It is not currently possible to predict exactly when and where an earthquake will occur, nor how large it will be. It is also impossible to prevent an earthquake. However, by designing seismic-resistant structures, the amount of financial losses and casualties can be reduced. This resistant design requires the use of earthquake risk analysis. By using the earthquake risk analysis, it will be possible to estimate the parameters of the strong ground motion, including acceleration, velocity, and displacement in each area. Estimating the parameters of strong ground motion will be possible just by obtaining the appropriate attenuation relationship. The aim of this paper is to present an appropriate attenuation relationship to estimate the horizontal component of the possible occurrence of peak ground acceleration in each region. Two methods were used to calculate attenuation relationship: gene expression programing (GEP) and group method of data handling (GMDH). In the first step, an up-to-date and comprehensive catalog consisting of 1,185 earthquake records that occurred around the world has been prepared. In the next step, the parameters of magnitude, hypocentral distance, and shear wave velocity of these records have been used as variables of the attenuation relationship. Then, the fitness function (f) was determined, and attenuation relationships were calculated using GEP and GMDH. The amount of fitness function (f) was obtained 766.12 from 1,000 in the GEP method and 767.77 from 1,000 in the GMDH method. The values of the fitness function, residuals and comparison plots showed a high-agreement between “the values predicted by the attenuation relationships” and “the actual values observed in the earthquakes.” Finally, according to the results of this research, it can be said that the use of GEP and GMDH methods has provided better results than the other similar researches. Also, the use of up-to-date records makes the results of this research more reliable than the previous researches.

1. Introduction

Human life has always been influenced by the natural disasters such as floods, storms, and earthquakes. These disasters have always threatened human life and property. The occurrence of these natural disasters will never stop and posterity will always be threatened. One of the most important and unpredictable events is an earthquake. Many earthquakes usually occur annually in the world. Some of them occur in vacant and barren areas, which are certainly not considered a threat to humans. Some earthquakes are so mild that they are not even felt by humans and are only recorded by the accelerometers. However, high-intensity earthquakes may occur in residential environments and pose a constant threat to humans.

Japan, the United States, China, India, New Zealand, and Iran are considered as the seismic countries in the world. In Japan, from 1945 to 1995, 14 earthquakes with a magnitude greater than 6.5-Richter happened in 50 years that, left more than 8,000 victims. The Fukushima earthquake (2011) with a moment magnitude about 9, is one of the strongest earthquakes in Japan since 1900. The record for the largest earthquake in history with a moment magnitude 9.5 also belongs to Chile, which occurred in 1960 [1].

Earthquakes had been considered an unknown phenomenon several centuries ago. However, today scientific advances made in the field of seismic geotechnical engineering and more knowledge of the internal structure of the earth have led to the understanding of the various phases of the earthquake and its effects on the earth’s movement. Despite the achieved progress, it is impossible to predict its occurrence anywhere in the world, but it is possible to prepare for it by building seismic-resistant structures. The purpose of designing earthquake resistant structures is to construct structures that withstand seismic loads, do not undergo a lot of damage and rescue the inhabitants. To achieve this goal, it is necessary to predict and estimate the severity of possible earthquakes in each area following its various features. This goal is possible just through risk analysis. The prerequisite of risk analysis studies is to identify the parameters of strong ground motion for different areas [2].

The amount of strong ground motion is attributed to the magnitude, source to site distance, the source mechanism (type of faulting), geology of the region, surface topography, and dynamic properties of the material propagation [3]. Also, other parameters including soil nonlinear behavior, directivity, rupture propagation, and basin effects on ground motion can be even more effective than some of the above mentioned independent parameters [4].

In order to determine the impact of various earthquake parameters on a structure or a site located at a distance from the earthquake center, attenuation relationships are used.

Attenuation relationships are usually presented for important earthquake characteristics such as peak horizontal and vertical acceleration, peak ground velocity (PGV), elastic response spectra, and inelastic response spectra. The largest number of attenuation relationships is presented for peak ground acceleration (PGA) estimation.

The purpose of the present study is to present a new global attenuation relationship to estimate the horizontal component of peak ground acceleration (PGAH) using the two methods of gene expression programing (GEP) and group method of data handling (GMDH) and compare the results.

2. Overview of the Attenuation Relationships

Attenuation relations were first introduced in the 1960s by Neumann. Neumann [5] provided the attenuation model of PGA in 1954 for earthquakes in America. The only parameter involved was the distance of the accelerograph from the earthquake’s center. In Newman’s attenuation relationship, even the role of earthquake magnitude was ignored. Subsequently, models were developed from the 1970s to mid-1980s by Milne and Davenport [6], Esteva [7], Denham and Small [8], Donovan [9], Esteva and Villaverde [10], Orphal and Lahoud [11]. In these models, in addition to the distance of the accelerometer from the earthquake center, the magnitude of the earthquake was also included in the calculations. However, the strong role of soils in providing these relationships and the amount of damage was still ignored. Following the damage caused by the 1964 earthquakes in Niigata, Japan, and Alaska, progress was made in expanding the effective parameters in the attenuation relationships. Since 1976, a new parameter indicating soil type was added to the attenuation relationship [12]. After that, many attenuation models were presented by various researchers for the different regions, some of them are explained below.

In 1981, Joyner and Boore [13] proposed a new attenuation relationship based on the North American earthquakes. The catalog consisted of earthquakes with surface wave magnitude (MS) from 5 to 7.7 and records were taken from stations less than 370 km from the epicenter. Also, soil type has been effective in estimating acceleration as a coefficient for both soil and rock [13].

The Atkinson and Boore [14] attenuation model was also presented in 1990 based on North American earthquakes with a moment magnitude (MW) between 5 and 7, which occurred at a distance of 10–100 km of earthquake accelerometers.

Sarma and Free [15] presented Equation 1 in 1995where M, R, and PGA are the magnitude, the hypocentral distance (km) and PGA (cm/s2), respectively. Also, the variable S in this relation is “zero” for the rock and “one” for the soil.

Following advances in technology and computer science in the early 21st century, intelligent methods have emerged and gradually replaced traditional methods. These changes led to the creation of more complex models in attenuation relationships. In 2007, Sobhaninejad et al. [16] used the intelligent genetic algorithm (GA) method to provide an attenuation relationship. Subsequently, Cabalar and Cevik [17] introduced a new attenuation relationship for Turkey in 2009 using the same method. In 2010, Ornthammarath et al. [18] also introduced a new relationship in which the shear wave velocity was also involved in this relationship and is shown in Equation (2).where Mw is the moment magnitude, rjb is Joyner–Boore distance (km), and PGA is the peak ground acceleration (cm/s2). Ss that specifies the type of soil is selected according to the value of shear wave velocity as Equation 3.

Other researchers who have used this method to present attenuation relationships including Kermani et al. [19], Jafarian et al. [20], Alavi and Gandomi [21], Ghodrati et al. [22, 23], and Gandomi et al. [24]. The relationship presented by Alavi and Gandomi [21] in 2011 is as Equation (4).

In this relation, Mw is the moment magnitude, RClstD is the closest distance to the rupture surface (km), Vs,30 is the average shear-wave velocity over the top 30 m of the site (m/s), and PGA is the peak ground acceleration (cm/s2).

One of the most recent attenuation relations was offered by Kumar et al. [25]. The relationship that was determined for the northeastern Himalaya region is presented as Equation (5):where M is the magnitude, R is the hypocentral distance(km), and PGA is the peak ground acceleration (cm/s2). This relationship is based on the earthquakes that occurred in the northeastern part of the Himalayas with a magnitude range of 4–6.8.

In 2018, Javan-Emrooz et al. [26] presented Equation (6):

In this relation, M and R are the moment magnitude and the hypocentral distance (km), respectively. Also, PGA is the peak ground acceleration in centimeteres per square second (cm/s2). This attenuation model is based on earthquakes in northern Iran with a range of moment magnitude (MW) of 4.5–7.4 and a hypocentral distance of 2–100 km.

Also, Erken et al. [27] provided attenuation relation equations by using nonlinear regression analysis in Northwest Anatolia region for two different site conditions.

Karimi-Ghalehjough and Mahinroosta [28] presented attenuation relationship of horizontal and vertical PGA using regression analysi fuzzy logic model in Iranian plateau. In this study, three different site conditions were considered: rock, stiff soil, and soft soil [28].

Shiuly et al. [29] in 2020 presented a new model for prediction of PGA by using artificial neural network (ANN) and GA in Himalayan region for earthquake data recorded in rock sites.

One of the most recent suggested relationships is the Abdelfattah et al. [30] model, which was presented in 2021 as Equation (7).where ML is the local magnitude, R is the hypocentral distance (km), and PGA is the peak ground acceleration (cm/s2).

Also, in 2021 Kumar et al. [31] offered an attenuation relationship using regression based on the earthquakes that occurred in the central Himalayas with a magnitude range of 5–6.8.

Pourzeynali and Khadivyan [32] have provided attenuation relationship of horizontal and vertical PGA using regression analysis for Alborz zone of Iran.

3. Data Collection

The first step to present the attenuation relationship is to prepare a catalog of occurred earthquakes in a sufficient number and with high accuracy. In the present study, considering that the final relationship is global, records from all over the world were used and were chosen completely randomly and with appropriate dispersion. These records are obtained from the PEER (Public Employees for Environmental Responsibility (https://ngawest2.berkeley.edu/) website by using the possibility of filtering the information of records such as earthquake record number, earthquake name, earthquake recording station name, earthquake occurrence date, earthquake magnitude, fault distance, rupture radius, shear wave velocity, etc. which can be used from one or more of them. In this study, only the filters of magnitude and distance were used, then 1,218 records were found, which include earthquakes that occurred continuously from 1938 onwards.

As mentioned before, in this research, the horizontal component of the PGA is considered the target. Also, magnitude, distance, and the shear wave velocity were considered as variables of the seismic attenuation model. In the following, each of these parameters is described.

3.1. Magnitude (MW)

The magnitude of the earthquake indicates the amount of energy released by the earthquake and is directly related to it. Therefore, it will have the greatest impact on the amount of strong ground motion and it can be considered as the most important and effective parameter in determining the attenuation relationships. Magnitude has a direct effect on the horizontal component of PGA. The higher magnitude of earthquake occurrence will cause higher parameters of strong ground motion at a certain distance from the source of the earthquake.

Magnitude is expressed in different scales due to the variety of recorded seismic waves and the institutions and devices recording these earthquakes. The magnitude types are local magnitude (ML), surface-wave magnitude (MS), body-wave magnitude (mb), and moment magnitude (MW). Local magnitude (ML) based on amplitude recorded by Wood Anderson seismograph, surface-wave magnitude (MS) based on Riley wave amplitude, and body-wave magnitude (mb) based on P-wave amplitude are measured. According to the saturation effects, although the energy released during an earthquake increases, the seismic properties do not necessarily increase at the same rate. Therefore, these three scales do not have high accuracy in large earthquakes [14]. The moment magnitude is independent of the characteristics of the earth’s motion and can accurately define the magnitude of earthquakes. Therefore, in this research, moment magnitude has been used. To prepare the earthquake catalog in this research, earthquakes with a moment magnitude of 4–7.5 have been used because earthquakes with a magnitude of less than 4 are not considered dangerous for construction engineering structures.

3.2. Distance (R)

The distance indicates the path that seismic waves pass from the source of the earthquake to the site. It is the second effective parameter in the attenuation relationships. The distance has an inverse effect on the parameters of the strong ground motion, so if the distance from the site to the epicenter is longer, the PGA will be smaller. The distance in seismology is expressed with different types, which types are: the shortest horizontal distance to the vertical rupture image (rjb), the shortest distance to the rupture surface (rrup), the shortest distance to the seismic rupture surface (rseis), and distance from the epicenter (rhypo) [23].

In this study, the path taken by the waves from the source of the earthquake to the desired site is the most effective parameter on the ground motion. The used catalog data have a hypocentral distance (rhypo) from 15 to 150 km.

3.3. Shear Wave Velocity (VS)

Although most of the path, passed by the waves, is in the bedrock, these waves often pass through a layer of soil at the end of the path. This layer of soil can play a major and significant role in changing the frequency of waves reached to the surface of the ground. It can also cause the damping of some frequencies or intensify other frequencies and change the magnitude of the earthquake. Therefore, this is necessary to consider the ground type and tectonic conditions in providing attenuation relationships [3]. Substrate properties can be entered into computations in both qualitative and quantitative forms. If the effect of this parameter is considered qualitatively, based on the shear wave values, the ground type is divided into several categories and the effect of the ground type is entered as constant coefficients in the calculation of attenuation relationships. If the soil effect is considered quantitatively, the value of shear wave velocity will be used in the relations. According to the ability of the methods used in this research, the effect of the soil type parameter has been applied quantitatively and using the value of shear wave velocity on the attenuation relationships. Obviously, this approach has increased the accuracy of calculations.

3.4. Preparing the Final Catalog

In the first step, data have been collected and a preliminary catalog has been prepared to consist of magnitude, distance, and shear wave velocity of the 1,218 earthquake records. In the next step, all the records were processed using Seismosignal software and considering the appropriate correction frequency range of 0.2–30 [33]. Then the peak acceleration of each recorded earthquake at each station was taken. The important point in this section is that there are three components of acceleration in three orthogonal directions per record, but in the present study, whereas the study was done on the horizontal component of acceleration, the vertical component was set aside. Then for the horizontal components were considered to be the most critical case and for this purpose, square root of the sum of the squares (SRSS) of two orthogonal components was used.

After this stage, due to error possibility in recording and extracting the records, the records were corrected through sorting in an incremental magnitude. In the next step, the graph was drawn in terms of the horizontal component of PGA. Then the points of the graph that have relatively large dispersion were removed from the earthquake catalog. After this step, 1,185 records remain as the final records that are involved in the attenuation relationship. Magnitude distribution diagrams in terms of distance and shear wave velocity in Figure 1 and the number of records in terms of magnitude and distance in Figure 2 are shown.

4. Methods for Estimation of the Attenuation Relationships in This Research

4.1. Gene Expression Programing (GEP)

GEP is a developed GA and genetic programing (GP) that was proposed by Ferreira [34] in 1999. It can be said that this algorithm, like GA and GP, is a GA that uses people as a population, choosing is based on the suitability and introduction of the genetic diversity using one or more genetic operators.

The main elements in GEP are just two elements, chromosomes and expression trees. The structure of chromosomes is organized in such a way that it allows the creation of multiple genes, and each of them is identified as a subexpression tree. The structure of genes is also designed so that each gene has a head and a tail, and this structural and functional organization of the gene in the GEP always ensures the production of valid programs, and it is not important how much they change the depth of the chromosome. Expression trees are also genetic information encoded in chromosomes. As in nature, the process of decoding information is called translation, and this translation clearly indicates the type of code and set of rules. The genetic code is also very simple and shows the one-to-one relationship between chromosome symbols and functions or terminals. The rules are also very simple, they determine the organization consisting of functions and terminals in the development trees and the type of interaction between subexpression trees [35, 36].

Figure 3 shows the steps of the GEP and the operators defined in this algorithm are also shown in Table 1.

4.2. Group Method of Data Handling (GMDH)

The GMDH is used to identify the behavior of nonlinear systems. Among all intelligent computational methods, the GMDH method is known as a self-organized system with a very complex nonlinear problem solving capacity [37, 38].

The GMDH was proposed by Ivakhnenko [39] in 1971. In this algorithm, by keeping the input variables in a flexible network, a connection is made between the input variables and the output, and the possibility of initial estimation of the output by regression equations including small subsets of input variables is provided. Each of these small subsets contains 2–3 independent variables [40, 41]. The main purpose of this method is to build a network based on a quadratic transfer function. The number of hidden layers and neurons, effective input variables, and the optimal structural model is automatically determined in this algorithm [42].

The communication between the input and output variables is done by the data management neural network in a group method which is a nonlinear function called Volterra series, which is as Equation (8).

The Volterra series is processed as a quadratic polynomial by utilizing the following relation (Equation 9).

The purpose of the data management algorithm is to find fixed and unknown coefficients ai in a group method, which will be obtained by regression methods, based on obtaining the least squares of error for each pair of input variables xi and xj [43, 44].

5. Results

In this research, 80% of the records were considered as training records and the remaining 20% as testing records. The method of selecting these records is that the testing records are in the range of training records and allow comparison. As mentioned, in this study, the parameters of magnitude, hypocentral distance, and shear wave velocity are defined as variables and the PGAH as a target function.

5.1. Results of Gene Expression Programing (GEP)

Finally, after several trials, the best answer with the highest accuracy and least error was considered as the final answer. The final relationship chromosomes are shown in Figure 4. In this relation, d0 = M, d1 = R, d2 = V, and the values of the coefficients c0 and c1 for each chromosome are given after that. Also, the links of chromosomes are sum operator (+). By replacing the variables, mathematical operators and the link operator(+) in the calculated equation, the global attenuation relationship for the PGA was obtained as Equation (10).

In this relationship, M is the magnitude, R is the hypocentral distance (km), V is the shear wave velocity (m/s), and PGA is peak ground acceleration (cm/s2). For better visualization of the results, PGA diagrams were drawn in terms of magnitude and hypocentral distance.

Figure 5 shows the PGA variation curve in terms of hypocentral distance (15–150 km) for three magnitudes of 4.5, 5.5, and 6.5. In this diagram, the shear wave velocity is considered 400 m/s. As can be seen, the PGA decreases nonlinearly as it moves away from the epicenter. Figure 6 also shows the PGA variation curve in terms of magnitude (4–7.5) for five hypocentral distances of 20, 30, 60, 100, and 150 km. In this diagram, the shear wave velocity is considered 400 m/s. As can be seen, with increasing magnitude, the PGA increases nonlinearly.

5.2. Results of Group Method of Data Handling (GMDH)

The final attenuation relationship obtained from the GMDH method is shown in Equation (11).

In this relationship, M is the magnitude, R is the hypocentral distance (km), V is the shear wave velocity (m/s), and PGA is peak ground acceleration (cm/s2).

Figures 7 and 8 show the changes in PGA with the attenuation relationship calculated by the GMDH method in terms of magnitude and hypocentral distance.

Figure 7 shows the PGA variation curve in terms of hypocentral distance (15–150 km) for 3 magnitudes of 4.5, 5.5, and 6.5. In this diagram, the shear wave velocity is considered 400 m/s. As can be seen, the PGA decreases nonlinearly as it moves away from the epicenter. Figure 8 also shows the PGA variation curve in terms of magnitude (4–7.5) for five hypocentral distances of 20, 30, 60, 100, and 150 km. In this diagram, the shear wave velocity is considered 400 m/s. As can be seen, while magnitude increases, the PGA increases nonlinearly.

In Figures 58, it can be seen that in addition to the nonlinearity of the diagrams, the lines of each diagram do not follow a pattern. While in the past researches, usually the lines in each diagram had a single pattern. The reason, is due to the methods used in this research. In the past, a fixed formula was used to calculate attenuation relations, which only considered fixed coefficients as variables for different regions. But in the GEP and GMDH methods, due to the nonlinear changes of the chromosomes and the terms of the equation, the patterns are not assumed to be constant.

5.3. Verification and Comparison of Results

As mentioned before, the attenuation relationships obtained from GEP and GMDH methods have high accuracy. The accuracy of each of the relationships presented in this research can be measured with the parameters defined below.

One of these parameters is the root-mean-square error (RMSE), which can be evaluated and calculated by Equation (12).

In the above relation, PGAactual is the PGAH values of the target and the PGAmodel is the PGAH values obtained from the final attenuation relationship presented in this article. It is clear when the predicted values and the target values are equal, the RMSE will be zero, and as this number increases, it indicates more error and less compliance between the target value and the predicted value.

Now by using the RMSE value, the value of the fitness function (f) is calculated from Equation (13).

As a result, from the above equation, the amplitude of the changes in the fit function is 0–1,000. The maximum value is 1,000 for zero error and shows the high proportion of the predicted values with measured values from the previous earthquakes that are the most possible ideal state. The lower value of the fitness function shows a more inadequate matching.

Another parameter that was used to calculate the accuracy is the coefficient of determination (R2), which can be determined by Equation (14).

The range of changes of this parameter is 0–1. The closer this number is to one, the greater the match between the predicted values and the target, and overall, there is a higher accuracy in the results.

Another parameter used in this research to measure accuracy is the standard deviation (SD) of residuals. Using Equation (15) the SD of residuals can be calculated.

For each attenuation relationship obtained from GEP and GMDH methods, the parameters of RMSE, Fitness function, Coefficient of determination, and SD of residuals were calculated and presented in Table 2.

On the other hand, to show the agreement between target and predicted values, it is possible to present a diagram for the difference values between these parameters. These scatter diagrams for GEP and GMDH methods are drawn in Figures 9 and 10, respectively.

Finally, in this section, the relationships presented in this study are compared with some relationships presented in the past. For comparison, it was tried to use attenuation relationships calculated by the other methods. The relationships selected for comparison are: Sarma and Frey [15], Ornthammarath et al. [18], Kumar et al. [25], and Abdelfattah et al. [30], were presented in 1995, 2010, 2017 and 2021, respectively. Also, the relationships of Alavi and Gandomi [21] and Javan et al. [26], which were obtained using intelligent algorithms. For comparison, the values of parameters RMSE, Fitness function, Coefficient of determination, and SD of residuals were calculated for each of these relationships and presented in Table 2. As seen from Table 2, the values of Fitness function and Coefficient of determination in the present study have the highest values and the values of RMSE and SD of residuals have the lowest values compared to the other methods. From this comparison, it can be concluded the methods used in this research are appropriate and highly accurate.

In the continuation of the comparison with other attenuation relationships, comparative diagrams were drawn. The PGAH change curve in terms of hypocentral distance of 15–150 km for earthquakes with magnitudes of 4.5, 6, and 7.5 and shear wave velocities of 400 m/s, for the results of the present study and the mentioned researches, are shown in Figures 1113. As a result, there is an acceptable correlation between the results of the present study and the previous studies.

Also, the horizontal component of the PGA change curve in terms of magnitude is 4–7.5, for a hypocentral distance of 30, 60, and 100 km and shear wave velocities of 400 m/s, belong to the results of the present study and the mentioned researches are shown in Figures 1416. As it is clear from the graphs, the values obtained from the relationships presented in this research are in the range of other researches and have a good match.

6. Conclusion

Every year, new earthquakes occur, accelerometers are installed in new areas and the accelerometer network expands, and in general, the earthquake catalog becomes more complete and accurate. On the other hand, providing new methods for regression has caused that new attenuation relationships are always needed in earthquake risk analysis.

In this study, correcting the existing records and deleting some of them, 1,185 earthquake records were used to provide the attenuation relationship of the horizontal component of PGA. The records used have a moment magnitude of 4–7.5 and a hypocentral distance of 15–150 km. The parameters of moment magnitude, hypocentral distance, and shear wave velocity were considered as variables and the square root of the sum of the squares of peak ground horizontal acceleration in two horizontal orthogonal directions was treated as an objective function. Then new attenuation relationships were extracted by the GEP and the GMDH.

One of the innovations of this research is the addition of the shear wave velocity parameter as a variable in the attenuation relationship, which was rarely seen in the past researches.

The most important advantage of using GEP and GMDH methods in this research is that a predetermined pattern was not used to calculate relationships. As seen, in past researches, PGA changes in terms of magnitude were assumed to be linear, but in this research, it was shown that these changes are nonlinear. Also, changes in terms of hypocentral distance are not assumed to have a fixed pattern and it is observed that the pattern also changes in different intervals. For this reason, as it is clear from the comparisons, the accuracy of these methods is higher and shows a lower amount of error than the previous researches.

Data Availability

The data used in this research have been uploaded in the supplemental files section.

Conflicts of Interest

The authors declare that they have no conflicts of interest.