Abstract

Seismicity partitioning is an important step in geological structure interpretation and seismic hazard assessment. In this paper, seismic event location and Euclidean distance were selected as the -Means cluster, the Gaussian mixture model (GMM), and the self-organizing maps (SOM) input features and cluster determination measurement, respectively, and 1516 seismic events obtained from the Yongshaba mine (China) were chosen for the cluster analysis. In addition, a Silhouette and Krzanowski-Lai- (KL-) combined S-KL index was proposed to obtain the possible optimum cluster number and to compare the cluster methods. Results show that the -Means cluster obtains the best cluster “quality” with higher S-KL indexes on the whole and meaningful clusters. Furthermore, the optimal number for detailed geological structure interpretation is confirmed as eleven clusters, and we found that two areas probably have faults or caves, and two faults may be falsely inferred by mine geologists. Seismic hazard assessment shows that C5 and C7 have a high mean moment magnitude () and C1, C2, C3, and C4 have a relatively high , where special attention is needed when mining. In addition, C7 is the most shear-related area with a mean S-wave to P-wave energy ratio () of 41.21. In conclusion, the -Means cluster provides an effective way for mine seismicity partitioning, geological structure interpretation, and seismic hazard assessment.

1. Introduction

Microseismic monitoring has been widely used in mining [13], where seismicity partitioning is an important step in geological structure interpretation and seismic hazard assessment [4]. Although this task can be implemented by an expert visually, the expert knowledge-based cluster is nonquantitative, subjective, and hard to interpret for a large amount of data [5]. Intelligent cluster analysis, taking advantages of spatial, temporal, and other features of seismic events, is more highlighted due to its advances in extracting useful knowledge and finding hidden information from numerous seismic data [4]. Motivated by these facts, various cluster methods have been introduced/proposed for seismicity partitioning, among which -Means cluster is the most popular one. However, it is still hard to determine the -Means cluster number, and few works have been done to compare it with other cluster methods.

The general outline of this paper is shown in Figure 1. Firstly, the -Means cluster, taking advantages of seismic event location features , was applied to cluster 1516 seismic events () obtained from the Yongshaba mine (China). Then, to test the effectiveness of the -Means cluster, the Gaussian mixture model (GMM) and the self-organizing maps (SOM) were selected as comparisons. In addition, the Silhouette and Krzanowski-Lai- (KL-) combined S-KL index was proposed to obtain possible optimum cluster numbers and to compare the cluster methods. Results show that the -Means cluster obtains a best cluster “quality” with higher S-KL indexes on the whole and continuous seismic clustering areas. The optimal number of clusters for detailed geological structure interpretation is confirmed as eleven clusters, and seismic hazard assessment shows that C5 and C7 have a high mean moment magnitude (), and C1, C2, C3, and C4 have a relatively high , where special attention is needed when mining. In addition, C7 is the most shear-related area with a mean S-wave to P-wave energy ratio () of 41.21, and we found that two areas probably have faults or caves, and two faults may be falsely inferred by mine geologists.

2. State of the Art

The commonly used seismicity partitioning methods include the -Means cluster, the hierarchical cluster, the self-organizing maps (SOM), the fuzzy cluster, the Gaussian mixture model (GMM), the density-based clustering algorithm (DBSCAN), and some other cluster means, which have been listed in Table 1.

Hierarchical cluster has been applied for seismicity partitioning since the 1990s, and Davis and Frohlich [11], Frohlich and Davis [12], Wardlaw et al. [13], Hudyma [14], Hudyma and Potvin [15], Hashemi and Mehdizadeh [16], and Hashemi and Karimi [17] selected the hierarchical clustering technique using single-link analysis/Ward’s method to evaluate spatial and temporal properties of earthquake catalogues. However, the hierarchical cluster has a tendency to form seismic events into linear groups [12]. SOM is another widely used seismic cluster method, and Zamani and Hashemi [5] introduced a self-organized tectonic zoning for Iran, and then Zamani et al. [18] chose Wilk’s Lambda criterion and a relative discrepancy of Wilk’s Lambda to determine the optimum tectonic zoning number; and Mojarab et al. [19] discussed the effect of SOM input parameters. Fuzzy clusters have also been used in seismic clusters, Ansari et al. [4] and Benitez et al. [21] used the Gath and Giva (GG) fuzzy cluster for Iran and South West Colombia, respectively; while Monem and Hashemy [20] applied the fuzzy -means (FCM) and the Gustafson-Kessel (GK) clusters to the Ghazvin canal irrigation network. In addition, the density-based algorithms [2224] and the TriGen-based method [25] have also been applied for the seismic cluster. A comparison of using different seismic clusters can be found in Leśniak and Isakow [26] and Konstantaras et al. [27].

-Means cluster analysis can be applied to the hypocentral distribution of observed earthquakes in any region [6], which has been widely used in seismic cluster. Weatherill and Burton [6] implemented the -Means cluster to provide a catalogue of earthquakes for Aegea, and results demonstrate that from 20 to 30 clusters are the most appropriate number in capturing the hypocentral distribution and fault type. Rehman et al. [7] applied the -Means cluster to estimate seismic hazard and risk in Pakistan and found that 19 clusters were the best to include the sufficient knowledge of subsurface geological structure and geophysical information. Morales-Esteban et al. [8] proposed an adaptive Mahalanobis -Means algorithm to study the seismic catalogues of the Croatia and Iberian Peninsula, and the result showed the ability of this method to discover not only circular but also elliptical shapes. Ramdani et al. [9] took advantages of earthquake centroids of the -Means cluster to find subduction evidence beneath the Gibraltar Arc and Andean regions. Besheli et al. [10] used the -Means cluster to partition Iran into six main clusters and to study the importance of foreshock precursors for different zones.

It can be seen that most seismic clusters are focused on earthquake data, and few works have been done on mine seismic clusters. Therefore, the cluster adaptively to mine seismic events should be discussed. Though the -Means cluster has been widely used now, it is still hard to determine the -Means cluster number and few works have been done to compare it with other cluster methods. In addition, Table 1 shows that most optimum cluster numbers are below 20, and the Silhouette index and the Krzanowski-Lai (KL) index are the most frequently used indexes to evaluate cluster “quality.” In this paper, the -Means cluster was applied to partition 1516 seismic events obtained from the Yongshaba mine (China) and the GMM and SOM clusters were selected as comparisons, and the cluster results with cluster numbers from 2 to 20 have been evaluated by the Silhouette and Krzanowski-Lai- (KL-) combined S-KL index.

3. Methodology

3.1. -Means Cluster

The -Means cluster is a hard partitioning algorithm proposed by Hartigan and Wong [28]. A dataset of seismic events with dimensions consists of an matrix (1), and the -Means partitions the dataset into clusters with each element allocated to a particular cluster.where each row represents one seismic event and each column represents one variable.

The -Means cluster is an iterative process with initial cluster centroids, and each datum is allocated to its nearest cluster centroid. Then, the mean of each group is calculated to obtain a new cluster centroid. The procedure terminates when no datum changes clusters or the iteration number reaches a preset maximum. The details of the -Means cluster are as follows.

Step 1. Import dataset , set the maximum iteration number, and randomly generate points to be initial cluster centroids .

Step 2. Calculate the Euclidean distance between each datum and each cluster centroid, and the data is allocated to its nearest cluster centroid.

Step 3. Calculate the mean of each cluster by , where and is the data number in cluster .

Step 4. Has any datum changed its cluster?(1)Yes: continue and repeat Steps 25.(2)No: output cluster each datum allocates to and list the cluster centroids.

Step 5. Has it reached the maximum iteration number?(1)No: continue, and repeat Steps 25.(2)Yes: output cluster each datum allocates to and list the cluster centroids.

3.2. Cluster Validity Indexes

Assessment of the “quality” of a partition is an important consideration in cluster analysis. From Section 2, it is known that the Silhouette and the KL indexes are the most widely used, and their brief introductions are given as follows.

3.2.1. Silhouette Index

The Silhouette provides a measure of separation quality between clusters, and the higher the Silhouette index, the better the cluster results [29]. For an event belonging to cluster , the Silhouette width is defined aswhere is the average distance between event and other events in cluster and is the minimum distance between event and events in cluster .

The average Silhouette width and the Silhouette index are defined by (3) and (4), respectively,

3.2.2. Krzanowski-Lai Index

The KL index produces the traces of the within-clusters matrices for two consecutive clusters, and the higher the KL index, the better the cluster results [30]. The KL index is defined aswhere and is the number of features in the dataset. is the sum of all point-to-point distances of observations within each cluster, summed across all clusters.

4. Application of -Means Cluster

4.1. Data Set

We used 1516 seismic events as the cluster dataset, which were obtained from the Yongshaba mine (26°38′N, 106°37′E) from 01/01/2014 to 04/30/2014, with moment magnitude larger than −1.5 (Note: the Institute of Mine Seismology (IMS) system of the Yongshaba mine can monitor a seismic event whose moment magnitude is as small as −3; however, events too small usually have a large location error due to unclear P arrivals and less triggered sensors. In this paper, we chose a moment magnitude larger than −1.5 for seismic cluster).

The distribution of geological structures and sensors of Yongshaba mine is shown in Figure 2(a). It shows that the IMS system contains 28 sensors, including two triaxial sensors (T1 and T2) and twenty-six uniaxial sensors (sensors from 1 to 26), and the sensors have a sampling frequency of 6000 Hz. There are seven main faults, namely, F331, F310, F316, F313, F350, F302, and F309, in which the F350 is inferred by mine geologists. In addition, there are 18 small faults, which are marked from I to XVIII, and the faults V, VI, X, and XI are inferred by mine geologists. The symbols ① and ② represent two caves.

Figure 2(b) shows the two view distributions ( and views) of the seismic dataset. The color and size of the circle represent seismic moment magnitude. It can be seen that the main seismic moment magnitude is 0.0~0.5, then 0.5~1.0, and −0.5~0.0. There are some seismic moment magnitudes smaller than −0.5 (It can be interpreted as that there are many small moment magnitude seismic events not recorded by the system, though the system can monitor a seismic moment magnitude as small as −3) and few seismic moment magnitudes are larger than 1.0.

It is easy to see that the main faults are distributed in the north, then in the south, and fewer faults in the middle, which are in a good agreement with the distribution of seismic events. Therefore, we believe that there should be a good relationship between seismic events and geological structures which needs further study.

4.2. Possible Cluster Number Selection

To compare the cluster “quality” and select possible cluster numbers, the most commonly used indexes Silhouette and KL are selected, and their values of -Means, GMM, and SOM clusters are shown in Figure 3, in which the seismic event location is selected as cluster input variables.

It is easy to note that the Silhouette values within 4 clusters are higher than those of cluster numbers larger than 4 and that the -Means cluster has a higher Silhouette index than that of the GMM and the SOM clusters on the whole. The three clusters all have a high KL value when , and there are some local maximum KL values. Therefore, it is hard to determine possible cluster numbers by considering both Silhouette and KL indexes. To solve this problem, a Silhouette and KL combined S-KL index (6) is proposed, which takes advantage of the product of the normalized Silhouette index and KL index, and the normalization is calculated by (7). The relatively large S-KL values correspond to possible optimum cluster numbers. Generally, the -Means cluster has higher S-KL indexes than those of the GMM and the SOM clusters, which means that the -Means obtains a best cluster “quality” among the three clusters, and the cluster numbers 2, 3, 6, 9, 11 and 15 are selected as possible cluster numbers for further study.

4.3. -Means Cluster Results

Seismicity partitioning with few clusters produces a simple and general classification defining the most basic structure of the area under investigation; however, it is hard to interpret a detailed geological structure, while for a too large cluster number, there may be some clusters that are hard to interpret [7]. Therefore, a balanced cluster number should be selected as the optimum cluster number to interpret the detailed geological structure. Five typical variables, number of seismic events (), maximum moment magnitude , mean moment magnitude (), mean S-wave to P-wave energy ratio , and number of events with moment magnitude >0.5 (), are selected for seismic hazard assessment and seismic source mechanism analysis, where is a measure to evaluate the seismic source mechanism, and the higher the is, the more the geological structure is shear-related [14].

The cluster results and typical variables of 2, 3, 6, 9, 11, and 15 clusters are shown in Figure 4, and the clustering process diagram and geological structure interpretation of 11 clusters are shown in Figure 5 for better understanding. Figures 4 and 5 show that the -Means has a stable cluster result (there are some similar zones for different cluster numbers), and the number of events in each cluster has no significant difference, which indicates that the -Means has a good cluster performance.

When the cluster number is less than 6, the cluster results are in a good agreement with regional faults (the cluster results match with known geological structures). In the two-class division (Figure 4(a)), the cluster divides the dataset into north zone (C1, concentrated faults) and south zone (C2, less faults), and C1 has many more events and a higher than that of the C2; that is to say that the north zone is more active and dangerous than the south zone. Then the dataset was divided into north zone (C1), middle zone (C2), and south zone (C3), looking for a relation between the clustering and the existing geology. C2 has a nearly equal to C3, though there are much more seismic events in C3, so C2 should also be paid attention when mining. In addition, C2 is more shear-related than C3. For the six-partition division (Figure 4(c)), the clusters C1, C2, and C3 with concentrated faults have a higher and more than C4, C5, and C6, and C5 and C6 are less shear-related.

When the cluster number is larger than 9, the cluster results mainly correspond to the detailed geological structures (most clusters for just have few faults except the cluster C4). For the nine-partition division (Figure 4(d)), C2, C3, and C4 have relatively large and , while C1 and C5 have a relatively large or . C6 has a relatively large , while C7 and C9 have both low and . However, there still are many faults in C4, and a more detailed cluster should be studied.

The 11-partition division (Figure 4(d)) separates C4 into clusters C5, C6, and C7; a detailed geological structure interpretation is shown in Figure 5. It can be seen that C7 has the highest and , with values of 0.33 and 41.21, respectively; thus special attention should be paid to the C7 area. Further study shows that there should be faults or caves in areas a and b, and the cluster analysis also confirms that there should be faults X~XII, F350a, and F350b. However, there are probably no faults for VI and the bottom F309, which needs further study, while for , the clusters C3, C4, C5, and C6 are so connected, which makes it hard for geological structure interpretation, and the clusters C14 and C15 contain areas too large to find detailed geological structure information. In conclusion, the authors believe that the 11 clusters are the best for a detailed geological structure interpretation.

5. Cluster Comparisons

For a better comparison of the -Means, GMM, and SOM clusters, the same cluster number of is selected and a possible optimum cluster number near 11 is also discussed in this analysis ( for the GMM cluster and for the SOM cluster), and the cluster results are shown in Figure 6.

It can be observed that there are some discontinuous zones for both GMM and SOM clusters. For the GMM cluster: (1) , C1 surrounds C2 and C3, and C6 is isolated, while C8 and C9 are mixed together; (2) , C1 also surrounds C2 and C3, and C8 is isolated. For and of the SOM cluster, the clusters C1, C2, C3, and C4 are mixed together, while C7 is isolated. In addition, there are small seismic event number clusters for both GMM and SOM clusters. For example, C6 and C11 for the GMM cluster have small seismic event numbers. In conclusion, the results of the GMM and SOM clusters cannot be considered satisfactory, and they partition some areas with no meaningful information.

6. Discussions

The above analysis has shown that the -Means cluster is effective in seismicity partitioning for geological structure interpretation. The GMM cluster is on the basis that the clusters follow the Gauss distributions, and its aim is to make the probability function maximum. The GMM cluster can be affected by the seed number, and the GMM cluster results () of seed = 10, 25, 50, and 100 are shown in Figure 7(b). It is easy to observe that all the four GMM cluster results have discontinuous zones or cross-section, such as the C1, C2, C9, and C10 when seed number is equal to 10. This can be interpreted as that the neighboring seismic event locations sometimes are not in the Gauss distribution and the close seismic events may not be subjected to the same Gauss distribution. The SOM converts high dimension input space into a low dimension representation (typically two dimension), which has a very good image of visualizing. The SOM cluster can be affected by the learning rate, ordering Epochs, and lattice width and height. The lattice width × height is set to ; then we randomly selected ten combinations of learning rate (0.1, 0.3, 0.5, 0.7, and 0.9) and ordering Epochs (2000, 3000, 5000, and 10000) to test the SOM cluster (). However, we obtain the same result with Figure 6(c); this may be due to the fact that the input variable just has a few dimensions (three dimensions), which makes the SOM cluster find an optimum result with different parameters. However, research has shown that the SOM may only be valid for zoning high seismic activity areas [25], and, for low seismic activity zones, it may have bad cluster results. The -Means cluster can be affected by the initial clustering centers, and the typical -Means cluster results are shown in Figure 7(a). It is easy to see that most -Means cluster zones are the same, though there may be some difference using different initial clustering centers: there may be some seismic events that are not in the same cluster, for example, C7 and C8 between Figures 7(a1) and 7(a2); the clusters may be divided into different clusters or combined into one cluster, for example, C1 and C2 in Figure 7(a1) are equal to C1 in Figure 7(a3), and C10 in Figure 7(a1) is equal to C6 and C10 in Figure 7(a3). However, there is no assumption of Gauss distribution for the -Means cluster, and it partitions closer events to one cluster. This makes the -Means cluster obtain continuous cluster zones and help us better interpret the relation between seismicity and geological structure.

7. Conclusions

Three intelligent spatial clusters, the -Means cluster, the GMM, and the SOM, have been applied to partition the seismicity of the Yongshaba mine (China). Then, the Silhouette and KL combined S-KL index was applied to obtain possible optimum cluster numbers and to compare the cluster methods. Results show that the -Means cluster obtains the best cluster “quality” with higher S-KL indexes on the whole and meaningful clusters. The optimal cluster number for detailed geological structure interpretation is confirmed to be eleven clusters, and we found that two areas probably have faults or caves, and two faults may be falsely inferred by mine geologists. Furthermore, seismic hazard assessment shows that C5 and C7 () have a high mean moment magnitude (), and C1, C2, C3, and C4 () have a relatively high , where special attention is needed when mining. In addition, C7 () is the most shear-related area, with a mean S-wave to P-wave energy ratio ) of 41.21. In conclusion, the -Means cluster provides an effective way for mine seismicity partitioning, geological structure interpretation, and seismic hazard assessment.

Additional Points

Highlights. (i) The -Means cluster has been applied for seismicity partitioning, geological structure interpretation, and seismic hazard assessment. (ii) The possible optimum cluster number is determined by the proposed Silhouette and Krzanowski-Lai- (KL-) combined S-KL index. (iii) The -Means cluster has a better seismicity partitioning “quality” for geological structure interpretation than that of the Gaussian mixture model (GMM) and the self-organizing maps (SOM).

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

The authors gratefully acknowledge the financial support of the National Key Research and Development Program of China (2016YFC0600706), National Natural Science Foundation of China (51504044), the Research Fund of The State Key Laboratory of Coal Resources and Mine safety Project (14KF05), CUMT, China, and the Research Fund of Chongqing Basic Science and Cutting-Edge Technology Special Projects (cstc2016jcyjA1861).