Abstract

The discrimination between natural seismic events and nuclear explosions is a complex and nonlinear problem. Three nonlinear methodologies, Random Forests (RF), Support Vector Machines (SVM), and the Naïve Bayes Classifier (NBC), were applied to discriminate seismic events. Twenty earthquakes and twenty-seven explosions, characterized by nine ratios of the energies contained within predetermined “velocity windows” together with the calculated distance, were used to build the discriminators. Based on leave-one-out cross-validation, the ROC curve, and the calculated accuracies on the training and test samples, the discriminating performances of RF, SVM, and NBC were discussed and compared. The RF method clearly shows the best predictive power, with a maximum area under the ROC curve of 0.975 among RF, SVM, and NBC. The discriminant accuracies of RF, SVM, and NBC for the test samples are 92.86%, 85.71%, and 92.86%, respectively. It is demonstrated that the presented RF model can not only identify seismic events automatically with high accuracy, but can also rank the discriminant indicators according to their calculated weights.

1. Introduction

The problems of seismic source location and identification are two of the most important and fundamental issues in earthquake monitoring, microseismic monitoring, analyses of active tectonics, and assessment of seismic hazards [1–4].

Seismic analysts distinguish earthquake signals from those of explosions or blasts by visual inspection and by calculating characteristics of the seismogram. Because recorded quarry blasts or nuclear explosions can mislead scientists interpreting the active tectonics and lead to erroneous results in the analysis of seismic hazards in an area, event classification is an important step in seismic signal processing. This task analyzes the data to determine the class to which each recorded event belongs.

Such work imposes a heavy workload on seismic analysts. Therefore, an automatic classification tool needs to be developed to reduce this arduous task dramatically, make the classification reliable, and remove errors associated with tedious evaluations and changes of personnel.

Most discrimination methods are designed for a particular source region and a particular distance of the recording station from the epicenter [5]. Some of them heavily depend on the heterogeneity of the uppermost crust in the sense that they might be effective only for a given region.

The widely used methods for discriminators include simulating explosion spectra in order to predict spectral details indicative of explosions and not of earthquakes or single-event explosions [6, 7]; examining compressional- and shear-wave ratios (amplitude and spectral) between all types of explosions and earthquakes, in an attempt to apply the basic physical conclusion that explosions excite more compressional waves than earthquakes relative to shear waves [8–11]; differences in high-frequency S-to-P ratios between all types of explosions and earthquakes [12–14]; analyzing observed spectra of ripple-fired explosions, instantaneous explosions, and earthquakes and contrasting time-independent modulations, path-independent modulations, spectral ratios, spectral slopes, and spectral maxima and minima [15–17]; and examining differences in energy ratios of various waves in velocity windows [18, 19].

However, most of the methods developed above are based on a single index or on linear discriminant methods, and they seem to fail to capture the discontinuities, nonlinearities, and high complexity of the wave series.

Random Forests (RFs), Support Vector Machines (SVMs), and the Naive Bayes Classifier (NBC) provide sufficient learning capacity and are more likely to capture complex nonlinear relationships; they are widely used across the natural sciences and engineering, including medicine, agriculture, and geotechnics.

To the best of our knowledge, RFs and SVMs have not previously been used for seismic classification, and the performance of RFs, SVMs, and NBC in this type of application has not been thoroughly compared.

In the present work, RF, SVM, and NBC were applied to discriminate between earthquakes and nuclear explosions, and their discriminating performances were discussed and compared based on leave-one-out cross-validation, the ROC curve, and test accuracy.

2. Materials and Methods

2.1. Materials

The measurements, or parameters, consist of ratios of the "high energies" contained within predetermined "velocity windows" on the seismograms [18]. The choice of velocity windows is guided by the assumption that an earthquake source mechanism is extended in both time and space and generates a larger fraction of energy in shear waves compared with an explosion source mechanism.

The different "velocity windows" are defined as follows:
(i) first arrival to 4.6 km/s;
(ii) 4.6 to 2.5 km/s;
(iii) first arrival to 4.9 km/s;
(iv) 4.9 to 2.0 km/s;
(v) 6.2 to 4.9 km/s;
(vi) 4.9 to 3.6 km/s;
(vii) 3.6 to 3.2 km/s;
(viii) 3.2 to 2.8 km/s; and
(ix) 2.8 to 2.5 km/s.

The factors, comprising the nine energy ratios together with the Average Distance, were denoted Ratio1, Ratio2, Ratio3, Ratio4, Ratio5, Ratio6, Ratio7, Ratio8, Ratio9, and AD, respectively.

Nine ratios of the energies contained within certain velocity windows were computed for 20 earthquakes and 27 nuclear explosions by Booker and Mitronovas [18]. All seismograms were recorded by the VELA UNIFORM LRSM Network on short-period Benioff instruments [18]. Ratio1 through Ratio9 and AD were selected as the discriminant indicators. The z-score is used to standardize the variables in this work. First, the mean is subtracted from the value of each case, resulting in a mean of zero. Then, the difference between the individual score and the mean is divided by the standard deviation, which results in a standard deviation of one. If we start with a variable X and generate a standardized variable Z, the process is

Z = (X − μ)/σ,

where μ is the mean of X and σ is the standard deviation of X. The z-scores of each ratio and of the distance for the seismic events and nuclear explosions are listed in Tables 1 and 2, respectively.
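The standardization step above can be sketched in a few lines; this is a minimal illustration assuming NumPy, and the sample values below are placeholders rather than the paper's data.

```python
import numpy as np

def z_score(x):
    """Standardize a variable: subtract its mean, divide by its std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Illustrative values only (not the paper's measurements):
ratios = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
z = z_score(ratios)
# z now has mean 0 and standard deviation 1
```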

Box plots of the energy ratios and the distance are shown in Figures 1 and 2, respectively. Each group is represented as a box whose top and bottom are drawn at the upper and lower quartiles, with a small square at the median; thus, the box contains the middle half of the scores in the distribution. Vertical lines outside the box extend to the largest and smallest observations within 1.5 interquartile ranges. Ratio1 through Ratio9 and AD are clearly different between earthquakes and nuclear explosions, so it is reasonable to select these ten factors as discriminant indicators.

2.2. Methodologies

The first 70% of the earthquake and nuclear explosion dataset was used to establish the discriminating models, and the remaining 30% was used to test them.

2.2.1. Overview of Random Forest

Random Forest (RF), a metalearner comprising many individual trees, was first developed by Tin Kam Ho in 1995 and later improved by Breiman in 2001. It was designed to operate quickly over large datasets and to be diverse, using random samples to build each tree in the forest. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them [20]. Comprehensive reviews of applications of Random Forests have been provided by Rodriguez-Galiano et al. [21], Granitto et al. [22], and Genuer et al. [23]. In addition, a number of studies have compared the performance of Random Forests with other data mining techniques on different kinds of problems [23–26]. The theory of RF is summarized as follows [20].

A Random Forest is a classifier consisting of a collection of tree-structured classifiers {h(x, Θ_k), k = 1, …}, where the {Θ_k} are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input x [20].

Given an ensemble of classifiers h_1(x), h_2(x), …, h_K(x), with the training set drawn at random from the distribution of the random vector (X, Y), define the margin function as

mg(X, Y) = av_k I(h_k(X) = Y) − max_{j ≠ Y} av_k I(h_k(X) = j),

where I(·) is the indicator function and av_k denotes the average over k. The margin measures the extent to which the average number of votes at (X, Y) for the right class exceeds the average vote for any other class. The larger the margin, the more confidence in the classification. The generalization error is given by

PE* = P_{X,Y}(mg(X, Y) < 0),

where the subscripts X, Y indicate that the probability is over the (X, Y) space. In Random Forests, h_k(X) = h(X, Θ_k). For a large number of trees, the following results from the Strong Law of Large Numbers and the tree structure.

As the number of trees increases, for almost surely all sequences Θ_1, Θ_2, …, PE* converges to

P_{X,Y}(P_Θ(h(X, Θ) = Y) − max_{j ≠ Y} P_Θ(h(X, Θ) = j) < 0).

The margin function for a Random Forest is

mr(X, Y) = P_Θ(h(X, Θ) = Y) − max_{j ≠ Y} P_Θ(h(X, Θ) = j),

and the strength of the set of classifiers {h(x, Θ)} is

s = E_{X,Y}[mr(X, Y)].

Assuming that s ≥ 0, Chebyshev's inequality gives

PE* ≤ var(mr)/s².

A more revealing expression for the variance of mr is derived in the following. Let

ĵ(X, Y) = argmax_{j ≠ Y} P_Θ(h(X, Θ) = j),

so

mr(X, Y) = P_Θ(h(X, Θ) = Y) − P_Θ(h(X, Θ) = ĵ(X, Y)).

The raw margin function is

rmg(Θ, X, Y) = I(h(X, Θ) = Y) − I(h(X, Θ) = ĵ(X, Y)).

Thus, mr(X, Y) is the expectation of rmg(Θ, X, Y) with respect to Θ. For any function f, the identity

[E_Θ f(Θ)]² = E_{Θ,Θ′}[f(Θ)f(Θ′)]

holds, where Θ and Θ′ are independent with the same distribution, implying that

mr(X, Y)² = E_{Θ,Θ′}[rmg(Θ, X, Y) rmg(Θ′, X, Y)].

Using this identity gives

var(mr) = E_{Θ,Θ′}[ρ(Θ, Θ′) sd(Θ) sd(Θ′)],

where ρ(Θ, Θ′) is the correlation between rmg(Θ, X, Y) and rmg(Θ′, X, Y) holding Θ and Θ′ fixed, and sd(Θ) is the standard deviation of rmg(Θ, X, Y) holding Θ fixed. Then,

var(mr) = ρ̄(E_Θ[sd(Θ)])² ≤ ρ̄ E_Θ[var(Θ)],

where ρ̄ is the mean value of the correlation; that is,

ρ̄ = E_{Θ,Θ′}[ρ(Θ, Θ′) sd(Θ) sd(Θ′)] / E_{Θ,Θ′}[sd(Θ) sd(Θ′)].

Combining this with the Chebyshev bound above yields an upper bound on the generalization error in terms of the strength of the individual classifiers and the correlation between them:

PE* ≤ ρ̄(1 − s²)/s².

In this work, an RF model discriminating between natural earthquakes and nuclear explosions was established with an optimal 5000 trees and 8 variables tried at each node. In the developed RF model, the calculated weighted values of Ratio1, Ratio2, Ratio3, Ratio4, Ratio5, Ratio6, Ratio7, Ratio8, Ratio9, and AD are 1.2713, 0.1034, 0.0759, 0.3093, 0.3432, 0.1782, 0.2536, 0.0943, 0.2463, and 0.1512, respectively.
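A model of this kind can be sketched with scikit-learn's RandomForestClassifier; this is a minimal illustration, not the authors' code, and the feature matrix below is a random placeholder standing in for the 47 standardized records. The feature_importances_ attribute plays the role of the weighted values used to rank the indicators.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the 47 records x 10 indicators
# (Ratio1..Ratio9 and AD); 0 = earthquake, 1 = explosion.
X = rng.normal(size=(47, 10))
y = np.array([0] * 20 + [1] * 27)

# Parameters mirroring the paper: 5000 trees, 8 candidate
# variables tried at each node split.
rf = RandomForestClassifier(n_estimators=5000, max_features=8,
                            random_state=0)
rf.fit(X, y)

# Rank the indicators by importance, most important first.
ranking = np.argsort(rf.feature_importances_)[::-1]
```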

2.2.2. SVM Algorithm

The original SVM algorithm was invented by Vladimir N. Vapnik and the current standard incarnation (soft margin) was proposed by Cortes and Vapnik in 1995 [27].

SVM models were originally defined for the classification of linearly separable classes of objects. For any separable set of two-class objects, SVMs find the optimal hyperplane that separates them, providing the largest margin between the two classes. Furthermore, they can also be used to separate classes that cannot be separated by a linear classifier in the original space.

In that case, every object is projected into a high-dimensional feature space in which the two classes can be separated by a linear classifier. The effectiveness of an SVM depends on the selection of the kernel, the kernel's parameters, and the soft-margin parameter C.

In the present work, we used the Radial Basis Function (RBF) kernel for the SVM models because of its efficiency in providing high-performance classification results. The optimal penalty parameter C and kernel parameter gamma were found to be 9 and 0.6, respectively, which also helps ensure that the model does not overfit.
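An RBF-kernel SVM with these parameter values can be sketched with scikit-learn's SVC; this is an illustrative fragment only, with placeholder training data in place of the paper's records.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(33, 10))        # placeholder training data
y = np.array([0] * 16 + [1] * 17)    # 0 = earthquake, 1 = explosion

# RBF kernel with the parameter values reported in the paper:
# penalty C = 9 and kernel parameter gamma = 0.6.
svm = SVC(kernel="rbf", C=9, gamma=0.6)
svm.fit(X, y)
pred = svm.predict(X)
```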

2.2.3. Naive Bayes Classifier

The Naive Bayes Classifier produces a very efficient probability estimate based on a simple structure, requiring only a small amount of training data to estimate the parameters necessary for classification. Its construction relies on two main assumptions: independence of the features and absence of hidden or latent attributes.

An advantage of Naive Bayes is that it requires only a small amount of training data to estimate the parameters (the means and variances of the variables) necessary for classification. Because the variables are assumed independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix.

The aim of the NBC, as with other classifiers, is to assign an object to one of a discrete set of categories based on its observable attributes. The NBC calculates the probability that the object belongs to each category conditioned on the observed attributes; the object is typically assigned to the category with the greatest probability. This classifier is naive in the sense that it makes the strong assumption that the attributes are mutually conditionally independent; that is, the conditional probability that the object belongs to a particular class given the value of some attribute is independent of the values of all other attributes. Despite this unrealistic assumption, empirical studies demonstrate that it need not significantly compromise the accuracy of the prediction, and NBCs are used in a variety of applications, including document classification [28], medical diagnosis [29, 30], systems performance management [31], probability classification of rockburst [32], and other fields. Domingos and Pazzani [33] prove the optimality of the NBC under certain conditions even when the conditional independence assumption is violated.

In this paper, the prior probabilities of natural earthquakes and nuclear explosions are calculated from the sizes of the data groups. The prior probabilities of earthquakes and nuclear explosions are 0.424 and 0.576, respectively.
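A Gaussian Naive Bayes model with these priors can be sketched with scikit-learn's GaussianNB; again this is an illustration under stated assumptions, with random placeholder data, and the prior values are taken from the proportions above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = rng.normal(size=(33, 10))        # placeholder training data
y = np.array([0] * 14 + [1] * 19)    # 14 earthquakes, 19 explosions

# Class priors fixed to the sample proportions reported in the
# paper (0.424 earthquakes, 0.576 explosions).
nbc = GaussianNB(priors=[0.424, 0.576])
nbc.fit(X, y)
post = nbc.predict_proba(X)   # posterior probability per class
```

The record is assigned to whichever class has the larger posterior probability.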

The discriminant functions for the earthquake and nuclear classes are the corresponding posterior probabilities; if the posterior probability of the earthquake class exceeds that of the nuclear class, the record is classified as an earthquake, and otherwise as a nuclear event.

2.2.4. Classification Performance

The ROC curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied [34]. It is created by plotting the fraction of true positives out of the positives (TPR, true positive rate) against the fraction of false positives out of the negatives (FPR, false positive rate) at various threshold settings.

ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
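The construction of an ROC curve from classifier scores can be sketched with scikit-learn's roc_curve and auc utilities; the scores below are hypothetical values for illustration, not outputs of the paper's models.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical scores for illustration: 1 marks an event, 0 a
# blast; a higher score means the classifier considers the
# record more event-like.
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.75, 0.7, 0.6,
                    0.45, 0.4, 0.35, 0.2, 0.1])

# Sweep the threshold, collect (FPR, TPR) pairs, and integrate
# to obtain the area under the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = auc(fpr, tpr)
```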

In this study, the discrimination of seismic events and nuclear explosions was considered a two-class prediction problem (binary classification), in which the outcomes were labeled either positive (events) or negative (blasts). There are four possible outcomes from a binary classifier. If the outcome of a prediction is positive and the actual value is also positive, it is called a true positive (TP); if the actual value is negative, it is a false positive (FP). Conversely, a true negative (TN) occurs when both the prediction outcome and the actual value are negative, and a false negative (FN) occurs when the prediction outcome is negative while the actual value is positive.

For an experiment with a given number of positive and negative instances, the four outcomes can be formulated as a 2 × 2 contingency table, or confusion matrix, as shown in Table 3.

The specificity, or true negative rate (TNR), is defined as the percentage of seismic records correctly identified as blasts:

TNR = TN/(TN + FP).

The quantity 1 − specificity is the false positive rate (FPR), the percentage of seismic records incorrectly identified as blasts:

FPR = FP/(FP + TN).

The sensitivity, or true positive rate (TPR), is defined as the percentage of seismic records correctly identified as events:

TPR = TP/(TP + FN).

The accuracy (ACC) can be expressed as

ACC = (TP + TN)/(TP + TN + FP + FN).
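These four rates follow directly from the confusion-matrix cells; a minimal sketch, with hypothetical counts rather than the paper's actual confusion matrix:

```python
def binary_metrics(tp, fn, fp, tn):
    """Rates computed from the four confusion-matrix cells."""
    tnr = tn / (tn + fp)                    # specificity
    fpr = fp / (fp + tn)                    # 1 - specificity
    tpr = tp / (tp + fn)                    # sensitivity
    acc = (tp + tn) / (tp + fn + fp + tn)   # overall accuracy
    return tnr, fpr, tpr, acc

# Hypothetical counts for a 14-record test set (illustration
# only; not the paper's confusion matrix):
tnr, fpr, tpr, acc = binary_metrics(tp=6, fn=1, fp=0, tn=7)
# acc = 13/14, i.e. about 92.86%
```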

3. Results and Discussions

The back-test classification of the training samples is calculated using the established models. The back-test accuracies of RF, SVM, and NBC for the training samples are 100%, 100%, and 96.97%, respectively. Leave-one-out cross-validation was used to validate the methods; the resulting accuracies of RF, SVM, and NBC are 100%, 96.97%, and 84.88%, respectively.
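The leave-one-out procedure can be sketched with scikit-learn's LeaveOneOut splitter; this is an illustration only, using a placeholder dataset and an RF classifier standing in for the fitted models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(33, 10))        # placeholder training data
y = np.array([0] * 14 + [1] * 19)

# Each of the 33 records is held out once and predicted by a
# model fitted on the remaining 32; the mean of the per-record
# scores is the leave-one-out accuracy.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
loo_accuracy = scores.mean()
```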

The ROC curve is also used to verify and compare the discriminating performance of established models. The established RF model, SVM model, and NBC model were applied to both the training and test samples. The ROC curve is shown in Figure 3. The area under the curve is listed in Table 4. The classification results of test samples using all developed models are presented in Table 5.

In Figure 3, the closer a result is to the upper left corner, the better the prediction; however, the distance from the random-guess line, or equivalently the area under the curve, is the best indicator of how much predictive power a method has.

As shown in Figure 3 and Table 4, the RF method clearly shows the best predictive power, with a maximum area of 0.975 among RF, SVM, and NBC. The result of SVM (area: 0.963) is better than that of NBC (area: 0.956).

According to Table 5, the discriminant accuracies of RF, SVM, and NBC for the test samples are 92.86%, 85.71%, and 92.86%, respectively. From the back-test results, leave-one-out cross-validation, ROC analysis, and test results, we conclude that the RF discriminant model has the best accuracy and discriminant ability. Moreover, according to the weighted values of the RF model, the most important factor is Ratio1, followed by Ratio5, Ratio4, Ratio7, Ratio9, Ratio6, AD, Ratio2, Ratio8, and Ratio3.

4. Conclusions

RF, SVM, and NBC were applied to seismic event identification. A thorough investigation of the discrimination capabilities of the techniques was undertaken using seismograms from 20 earthquakes and 27 nuclear explosions. Nine energy ratios within certain velocity windows (Ratio1 through Ratio9), as well as the average distance (AD), were selected as discriminant indicators.

The classification performance of RF, SVM, and NBC was analyzed and compared based on the back test of the training samples, leave-one-out cross-validation, and the ROC curve. The RF method clearly shows the best predictive power, with a maximum ROC area of 0.975 among RF, SVM, and NBC. The result of SVM (area: 0.963) is better than that of NBC (area: 0.956). The test results show that the discriminant accuracies of RF, SVM, and NBC are 92.86%, 85.71%, and 92.86%, respectively.

From the back-test results, leave-one-out cross-validation, the ROC curve, and the test results, we conclude that the RF discriminant model has the best accuracy and discriminant ability. Not only can the RF method be applied to seismic identification with high accuracy, but it can also rank the discriminant indicators by weight. In this study, the most important factor is Ratio1, followed by Ratio5, Ratio4, Ratio7, Ratio9, Ratio6, AD, Ratio2, Ratio8, and Ratio3.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China (50934006 and 41272304), National Basic Research (973) Program of China (2010CB732004), China Scholarship Council (CSC), Scholarship Award for Excellent Doctoral Student from Ministry of Education of China (105501010), and Support Program for Cultivating Excellent Ph.D. Thesis of Central South University.