Special Issue: Scaling, Self-Similarity, and Systems of Fractional Order
Research Article | Open Access
Longjun Dong, Xibing Li, Gongnan Xie, "Nonlinear Methodologies for Identifying Seismic Event and Nuclear Explosion Using Random Forest, Support Vector Machine, and Naive Bayes Classification", Abstract and Applied Analysis, vol. 2014, Article ID 459137, 8 pages, 2014. https://doi.org/10.1155/2014/459137
Nonlinear Methodologies for Identifying Seismic Event and Nuclear Explosion Using Random Forest, Support Vector Machine, and Naive Bayes Classification
Discriminating between seismic events and nuclear explosions is a complex, nonlinear problem. Three nonlinear methodologies, Random Forests (RF), Support Vector Machines (SVM), and the Naïve Bayes Classifier (NBC), were applied to discriminate seismic events. Twenty earthquakes and twenty-seven explosions, each described by nine ratios of the energies contained within predetermined “velocity windows” together with the calculated distance, were used to build the discriminators. The discriminating performances of RF, SVM, and NBC were discussed and compared on the basis of leave-one-out cross-validation, the ROC curve, and the calculated accuracy on training and test samples. The RF method shows the best predictive power, with the largest area under the ROC curve (0.975) among RF, SVM, and NBC. The discriminant accuracies of RF, SVM, and NBC for test samples are 92.86%, 85.71%, and 92.86%, respectively. The results demonstrate that the presented RF model can not only identify seismic events automatically with high accuracy but also rank the discriminant indicators according to their calculated weights.
The problems of seismic source locations and identifications are two of the most important and fundamental issues in earthquake monitoring, microseismic monitoring, analyses of active tectonics, and assessment of seismic hazards [1–4].
Seismic analysts distinguish seismic signals from those of explosions or blasts by visual inspection and by calculating certain characteristics of the seismogram. Because recorded quarry blasts or nuclear explosions can mislead scientists interpreting active tectonics and lead to erroneous results in the analysis of seismic hazards in an area, event classification is an important step in seismic signal processing. Such a task analyzes the data in order to determine the class to which each recorded event belongs.
This work imposes a heavy workload on seismic analysts. An automatic classification tool is therefore needed to reduce this arduous task dramatically, to make classification reliable, and to remove the errors associated with tedious evaluations and changes of personnel.
Most discrimination methods are designed for a particular source region and a particular distance of the recording station from the epicenter. Some of them depend heavily on the heterogeneity of the uppermost crust, in the sense that they might be effective only for a given region.
Widely used discrimination methods include simulating explosion spectra in order to predict spectral details indicative of explosions and not of earthquakes or single-event explosions [6, 7]; examining compressional- and shear-wave ratios (amplitude and spectral) between all types of explosions and earthquakes, in an attempt to apply the basic physical conclusion that explosions excite more compressional-wave energy, relative to shear waves, than earthquakes do [8–11]; comparing high-frequency S-to-P ratios between all types of explosions and earthquakes [12–14]; analyzing observed spectra of ripple-fired explosions, instantaneous explosions, and earthquakes and contrasting time-independent modulations, path-independent modulations, spectral ratios, spectral slopes, and spectral maxima and minima [15–17]; and examining differences in the energy ratios of various waves in velocity windows [18, 19].
However, most of the methods above are based on a single index or on linear discriminant methods, and they seem to fail to capture the discontinuities, the nonlinearities, and the high complexity of wave series.
Random Forests (RFs), Support Vector Machines (SVMs), and the Naive Bayes Classifier (NBC) provide sufficient learning capacity and are more likely to capture complex nonlinear relationships. They are widely used in the natural and applied sciences, including medicine, agriculture, and geotechnics.
To our knowledge, RFs and SVMs have not previously been used for seismic classification, and the performance of RFs, SVMs, and NBC in this type of application has not been thoroughly compared.
In the present work, RF, SVM, and NBC were applied to discriminate between earthquakes and nuclear explosions. Their discriminating performances were discussed and compared on the basis of leave-one-out cross-validation, the ROC curve, and test accuracy.
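As an illustration of the evaluation procedure named above, the following is a minimal sketch of leave-one-out cross-validation wrapped around a Gaussian naive Bayes classifier, written with the Python standard library only. It is not the authors' implementation: the Gaussian likelihood, the helper names (`fit_gaussian_nb`, `predict_gaussian_nb`, `leave_one_out_accuracy`), and the two-feature synthetic data are all assumptions made for the example. Only the class sizes (20 and 27) mirror the study.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class label with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_gaussian_nb(X_train, y_train)
        correct += predict_gaussian_nb(model, X[i]) == y[i]
    return correct / len(X)


# Synthetic stand-in data (NOT the measurements used in the paper):
# two well-separated Gaussian clusters with the study's class sizes.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
X += [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(27)]
y = ["earthquake"] * 20 + ["explosion"] * 27

loo_accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {loo_accuracy:.3f}")
```

With only 47 events, leave-one-out uses every event for testing exactly once while training on the other 46, which is why it suits small catalogs like this one. The same loop could wrap any classifier that exposes a fit step and a predict step.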
2. Materials and Methods
The measurements or parameters consist of ratios of the “high energies” contained within predetermined “velocity windows” on the seismograms. The choice of velocity windows is guided by the assumption that the earthquake source mechanism is extended in both time and space and generates a larger fraction of its energy in shear waves than the explosion source mechanism does.
The “velocity windows” are listed as follows:
(i) first arrival to 4.6 km/s;
(ii) arrivals from 4.6 to 2.5 km/s;
(iii) first arrival to 4.9 km/s;
(iv) arrivals from 4.9 to 2.0 km/s;
(v) arrivals from 6.2 to 4.9 km/s;
(vi) arrivals from 4.9 to 3.6 km/s;
(vii) arrivals from 3.6 to 3.2 km/s;
(viii) arrivals from 3.2 to 2.8 km/s; and
(ix) arrivals from 2.8 to 2.5 km/s.
The factors, namely the nine ratios of the energies contained in these velocity windows together with the average distance, are denoted Ratio1, Ratio2, Ratio3, Ratio4, Ratio5, Ratio6, Ratio7, Ratio8, Ratio9, and AD, respectively.
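To make the idea of a velocity window concrete, here is a small hedged sketch in Python. It assumes that a window bounded by group velocities v_fast and v_slow maps to the time interval from distance/v_fast to distance/v_slow after the origin time, and that the energy in a window is the sum of squared sample amplitudes. Both are assumptions of this example, as are the function names, the synthetic trace, and the pairing of windows (vii) and (viii) into a ratio. The source does not state which window pairs form Ratio1 to Ratio9.

```python
def window_energy(samples, dt, t0, distance_km, v_fast, v_slow):
    """Sum of squared amplitudes for samples arriving between two group velocities.

    samples[i] is taken to be recorded at t0 + i * dt seconds after origin time.
    """
    t_start = distance_km / v_fast  # faster waves arrive first
    t_end = distance_km / v_slow
    energy = 0.0
    for i, amplitude in enumerate(samples):
        t = t0 + i * dt
        if t_start <= t < t_end:
            energy += amplitude * amplitude
    return energy


def energy_ratio(samples, dt, t0, distance_km, window_a, window_b):
    """Ratio of the energies in two velocity windows given as (v_fast, v_slow)."""
    e_a = window_energy(samples, dt, t0, distance_km, *window_a)
    e_b = window_energy(samples, dt, t0, distance_km, *window_b)
    return e_a / e_b


# Synthetic trace (illustrative only): amplitude 2.0 before 90 s, 1.0 afterwards.
dt, t0, distance_km = 0.5, 0.25, 288.0
trace = [2.0 if t0 + i * dt < 90.0 else 1.0 for i in range(240)]

window_vii = (3.6, 3.2)   # 3.6 to 3.2 km/s
window_viii = (3.2, 2.8)  # 3.2 to 2.8 km/s

e_vii = window_energy(trace, dt, t0, distance_km, *window_vii)
e_viii = window_energy(trace, dt, t0, distance_km, *window_viii)
ratio = energy_ratio(trace, dt, t0, distance_km, window_vii, window_viii)
print(e_vii, e_viii, ratio)
```

Windows that begin at the "first arrival" would need a picked first-arrival time in place of distance/v_fast, which this sketch does not model.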
Nine ratios of energies included within certain velocity windows have been computed for 20 earthquakes and 27 nuclear explosions by Booker and Mitronovas. All seismograms were recorded by the VELA UNIFORM LRSM Network on short-period Benioff instruments. Ratio1, Ratio2, Ratio3, Ratio4, Ratio5, Ratio6, Ratio7, Ratio8, Ratio9, and AD were selected as discriminant indicators. The Z-score is used to standardize the variables in this work. First, the mean is subtracted from the value for each case, resulting in a mean of zero. Then, the difference between the individual’s score and the mean is divided by the standard deviation, which results in a standard deviation of one. If we start with a variable $x$ that has mean $\bar{x}$ and standard deviation $s$ and generate a standardized variable $z$, the process is
$$z = \frac{x - \bar{x}}{s}$$
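The standardization just described (subtract the mean, then divide by the standard deviation) can be sketched in Python as follows. The function name `z_score` and the example values are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def z_score(values):
    """Standardize a list of numbers to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


# Illustrative values only, not data from the study.
standardized = z_score([2.0, 4.0, 6.0])
print(standardized)
```

Standardizing each indicator in this way puts the energy ratios and the distance on a common scale, which matters for scale-sensitive classifiers such as SVM.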