Abstract

This study aims to obtain the spatial position of microseismic focal points quickly and to improve the accuracy of rapid microseismic location, so that corresponding measures can be taken in time. A microseismic focal point location system that differs fundamentally from traditional microseismic location methods is proposed. Search engine technology is introduced into the system, which allows the microseismic focal point to be located quickly and accurately. First, the propagation characteristics of microseismic signals in coal and rock layers are analyzed, and the focal position information is obtained. Because the microseismic signals collected in the coal mine contain noise, they are denoised first. Then, a waveform database is established from the denoised waveform data and the corresponding focal point positions. The structure and mathematical model of locality-sensitive hashing (LSH) based on stable distributions are introduced and improved, yielding the optimized multiprobe LSH algorithm. A microseismic location model is established according to the characteristics of microseismic data, and the values of three parameters (the number of hash tables, the dimension of the hash function family, and the interval size) are determined. Analysis of the experimental data shows that when the number of hash tables is 6, the dimension of the hash function family is 14, and the interval size is 8000, the retrieval time is relatively short, the recall rate is high, and the proportion of retrieved candidates is large. For the measured coal mine microseismic data, 4 hash tables, a hash function family dimension of 6, and an interval size of 500 likewise give a relatively short retrieval time, a high recall rate, and a large proportion of retrieved candidates.
This work is of great significance for evaluating destructive mine earthquakes and rockburst (impact) risk.

1. Introduction

In recent years, with the rapid progress of Internet of Things (IoT) technology, intelligent sensing algorithms, and computer vision, fast and accurate search of large-scale spatial data has become a major challenge for researchers worldwide. The key to searching large-scale, high-dimensional spatial data is to convert the data points into compact codes. The distance between a data point and the query can then be approximated by the distance between their compact codes, which is cheap to compute, thereby shortening the time required for retrieval.

The advantage of this kind of retrieval technology is that it can search large-scale data accurately without compressing it [1, 2]. Combining it with microseismic focal point location can balance computational efficiency against positioning accuracy, providing a feasible scheme for fast microseismic focal point location. When the locality-sensitive hashing (LSH) algorithm is used to build a microseismic search engine, the waveform most similar to an input microseismic waveform can be found quickly in a complete prebuilt database, and the spatial position of the microseismic focal point can then be located accurately [3–5]. LSH is used for computational problems in many fields, such as computer vision, computational linguistics and statistics, drug design, and text clustering, and many scholars have therefore studied specific problems in its application. For example, some researchers have studied entropy-based LSH, which queries the neighbors of randomly generated perturbation points around the query point and returns the union of the results. This method can reduce the number of hash tables, improve their utilization, and save storage space [6]. In China, LSH is also widely used in audio and image search; however, its main application has been detecting duplicate and near-duplicate web pages to reduce the cost of web page indexing.

Current LSH technology faces a bottleneck: traditional LSH usually needs a large number of hash tables to ensure precision and recall, but too many hash tables consume too much memory and make the index difficult to store [7, 8]. Therefore, in this work, an LSH structure based on stable distributions is designed and improved into a multiprobe LSH algorithm, which improves the accuracy of rapid microseismic location and increases the recall rate. Studying the application of LSH-based approximate nearest neighbor search to microseismic focal point location is therefore of great significance.

2. Materials and Methods

2.1. Principle of Microseismic Monitoring Technology
2.1.1. Generation Principle of Microseismic Signal in Coal Mine Strata [9–11]

A microseism is a release of elastic energy caused by the failure of rock structures under the influence of mining activity: when the energy accumulated in the rock reaches a limit, it is released and propagates as elastic waves. Figure 1 is a schematic diagram of this process.

Mine microseisms are governed by the stress that damages the rock mass and by the ability of the rock mass to accumulate elastic energy. The stress required to damage the rock mass is positively correlated with its strength: the stronger the rock mass, the greater the stress needed to damage it. The Mohr-Coulomb criterion describes the failure of intact rock blocks and of weak structural planes in fractured rock under compressive shear stress:

$$\tau = c + \sigma\tan\varphi = c + f\sigma, \tag{1}$$

where $\tau$ and $\sigma$ are the shear stress and normal stress on the failure surface of the rock mass, $\varphi$ is the internal friction angle of the rock mass, $f = \tan\varphi$ is the internal friction coefficient, and $c$ is the cohesion of the rock mass.
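A minimal sketch of how the Mohr-Coulomb criterion $\tau = c + \sigma\tan\varphi$ is applied: compute the shear strength on a plane and compare it with the acting shear stress. The function names and the rock parameters (c = 5 MPa, friction angle 30°) are hypothetical and chosen only for illustration; compression is assumed positive.

```python
import math


def mohr_coulomb_strength(sigma_n, cohesion, phi_deg):
    """Shear strength tau_f = c + sigma_n * tan(phi) on a plane.

    Compression is taken as positive; stresses in MPa, phi in degrees."""
    return cohesion + sigma_n * math.tan(math.radians(phi_deg))


def fails(tau, sigma_n, cohesion, phi_deg):
    """True when the shear stress reaches or exceeds the Mohr-Coulomb strength."""
    return tau >= mohr_coulomb_strength(sigma_n, cohesion, phi_deg)


# Hypothetical rock parameters, not taken from the study.
c, phi = 5.0, 30.0
print(round(mohr_coulomb_strength(10.0, c, phi), 2))  # 10.77
print(fails(8.0, 10.0, c, phi), fails(12.0, 10.0, c, phi))  # False True
```

Raising the normal stress raises the strength linearly with slope $\tan\varphi$, which is why confined rock at depth can store more strain energy before failing.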

Figure 2 shows the causes of microseisms in the coal and rock mass.

2.1.2. Propagation of Microseismic Signal in Coal Mine Strata

The vibration wave generated in the coal and rock strata propagates through the coal and rock mass as an elastic stress wave. The propagation equation of the vibration wave in the coal and rock medium is therefore derived from stress wave theory [12–14], as follows.

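Assume that a body force $\mathbf{F}$, given by equation (2), acts on any microunit of the coal and rock mass, and let $\mathbf{u}$ denote its displacement. The displacement can be decomposed as

$$\mathbf{u} = \nabla\phi + \nabla\times\boldsymbol{\psi}, \tag{3}$$

where $\phi$ and $\boldsymbol{\psi}$ are the scalar potential and the vector potential, respectively. In addition to the body force, the microunit is subject to surface forces and an inertial force. The inertial force per unit volume is given by equation (4):

$$\mathbf{f}_I = -\rho\,\frac{\partial^{2}\mathbf{u}}{\partial t^{2}}, \tag{4}$$

where $\rho$ is the density of the coal and rock mass.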

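When the microunit moves, the balance of surface forces, body force, and inertial force gives its equation of motion, equation (5):

$$\frac{\partial \sigma_{ij}}{\partial x_j} + F_i = \rho\,\frac{\partial^{2} u_i}{\partial t^{2}}, \quad i = 1, 2, 3, \tag{5}$$

where $\sigma_{ij}$ are the stress components, $F_i$ and $u_i$ are the components of the body force and the displacement, and repeated indices are summed.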

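Equations (6)–(9) are the geometric and physical equations of elasticity.

Geometric equations (strain-displacement relations):

$$\varepsilon_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right).$$

Physical equations (Hooke's law for an isotropic medium):

$$\sigma_{ij} = \lambda\,\theta\,\delta_{ij} + 2G\,\varepsilon_{ij}, \qquad \theta = \varepsilon_{kk} = \nabla\cdot\mathbf{u},$$

where $\delta_{ij}$ is the Kronecker delta, $\lambda$ is the Lamé constant, and $G$ is the shear modulus.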

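Substituting equations (6)–(9) into equation (5) and neglecting the body force gives the differential equation of elastic wave propagation in the coal and rock stratum:

$$(\lambda + G)\,\nabla(\nabla\cdot\mathbf{u}) + G\,\nabla^{2}\mathbf{u} = \rho\,\frac{\partial^{2}\mathbf{u}}{\partial t^{2}}, \tag{10}$$

where $\lambda = \dfrac{E\nu}{(1+\nu)(1-2\nu)}$ is the Lamé constant, $\nu$ is Poisson's ratio, $E$ is Young's modulus, and $G = \dfrac{E}{2(1+\nu)}$ is the shear modulus.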

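Letting $\theta = \nabla\cdot\mathbf{u}$ denote the volumetric strain and taking the divergence of equation (10) gives equation (15):

$$\frac{\partial^{2}\theta}{\partial t^{2}} = \frac{\lambda + 2G}{\rho}\,\nabla^{2}\theta. \tag{15}$$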

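Letting $\boldsymbol{\omega} = \nabla\times\mathbf{u}$ denote the rotation and taking the curl (rotor) of equation (10) gives equation (16):

$$\frac{\partial^{2}\boldsymbol{\omega}}{\partial t^{2}} = \frac{G}{\rho}\,\nabla^{2}\boldsymbol{\omega}. \tag{16}$$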

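Equations (15) and (16) describe, respectively, the volume change and the rotation of the coal and rock stratum caused by a microseism. They can be written as

$$\frac{\partial^{2}\theta}{\partial t^{2}} = v_P^{2}\,\nabla^{2}\theta, \tag{17}$$

$$\frac{\partial^{2}\boldsymbol{\omega}}{\partial t^{2}} = v_S^{2}\,\nabla^{2}\boldsymbol{\omega}, \tag{18}$$

where $v_P$ and $v_S$ are the P-wave and S-wave velocities.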

These equations show that the disturbance propagates through the coal and rock stratum in two forms: the longitudinal wave (P wave) and the shear wave (S wave). The P wave travels faster than the S wave in coal and rock strata [15–17]. Its velocity is

$$v_P = \sqrt{\frac{\lambda + 2G}{\rho}} = \sqrt{\frac{E(1-\nu)}{\rho(1+\nu)(1-2\nu)}}. \tag{19}$$

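The S wave differs from the P wave in its direction of particle motion: in a P wave, particles vibrate along the propagation direction, whereas in an S wave they vibrate perpendicular to it. The velocity of the S wave is

$$v_S = \sqrt{\frac{G}{\rho}} = \sqrt{\frac{E}{2\rho(1+\nu)}}, \tag{20}$$

where $\rho$ is the density of the coal and rock stratum, $E$ is Young's modulus, and $\nu$ is Poisson's ratio.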
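The velocity expressions above can be evaluated directly from the elastic constants. The sketch below assumes illustrative sandstone-like values (E = 30 GPa, ν = 0.25, ρ = 2500 kg/m³) that are not taken from the study; the function name is hypothetical.

```python
import math


def wave_speeds(E, nu, rho):
    """P- and S-wave velocities of an isotropic elastic medium.

    E: Young's modulus (Pa), nu: Poisson's ratio, rho: density (kg/m^3)."""
    G = E / (2.0 * (1.0 + nu))                      # shear modulus
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))  # Lame constant
    vp = math.sqrt((lam + 2.0 * G) / rho)
    vs = math.sqrt(G / rho)
    return vp, vs


# Hypothetical rock properties, for illustration only.
vp, vs = wave_speeds(E=30e9, nu=0.25, rho=2500.0)
print(round(vp), round(vs))  # 3795 2191
```

For ν = 0.25 the ratio $v_P/v_S$ equals $\sqrt{3}$, which confirms that the P wave always arrives first; this is why P-wave arrivals are normally picked for location.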

2.2. Principle and Method of Microseismic Monitoring and Positioning

Figure 3 shows the distribution of focal points and sensors.

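Let the microseismic focal point in Figure 3 be $(x_0, y_0, z_0, t_0)$ and the $i$th sensor be $(x_i, y_i, z_i, t_i)$, where $x$, $y$, and $z$ give the spatial position and $t$ is the time (the origin time $t_0$ of the event, or the arrival time $t_i$ at sensor $i$) [18–21]. Because P-wave arrivals are easier to pick than S-wave arrivals, the P-wave arrival times are used, and the focal point can be located with equation (21):

$$\sqrt{(x_i - x_0)^2 + (y_i - y_0)^2 + (z_i - z_0)^2} = v\,(t_i - t_0), \quad i = 1, 2, \dots, n, \tag{21}$$

where $v$ is the P-wave velocity and all parameters except $x_0$, $y_0$, $z_0$, and $t_0$ are known.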

One of the methods to locate the microseismic focal point is the simplex method, as shown in Figure 4.

Equation (22) gives the objective function minimized by the simplex method; in it, $n$ is the number of microseismic sensors and $v$ is the preset vibration-wave velocity.

Assume that the four vertices of the initial tetrahedron in the left panel of Figure 4 are candidate focal points. First, the objective function of equation (22) is evaluated at the four vertices, and the vertex with the largest function value is identified and removed. Operations such as reflection, stretching (expansion), and shrinking (contraction) then generate a new vertex, which forms a new tetrahedron with the three remaining vertices. These operations are repeated until the objective function converges to the preset tolerance or the maximum number of iterations is reached. The vertex with the smallest function value at that point is taken as the focal point of the microseism [22, 23].
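A sketch of the simplex procedure, under stated assumptions. Equation (22) is not reproduced in the text, so the misfit below is an assumption built from equation (21): with $d_i = R_i(\mathbf{x}) - v\,t_i$ (where $R_i$ is the focus-to-sensor distance), every $d_i$ equals $-v\,t_0$ at the true focus, so the origin time drops out and the spread of the $d_i$ around their mean is minimized over the three spatial coordinates only, which is why the simplex is a tetrahedron. The minimizer is a minimal hand-written Nelder-Mead simplex (reflection, stretching, shrinking toward the centroid, and whole-simplex shrink). The sensor layout, velocity, and arrival times are synthetic and hypothetical.

```python
import math


def nelder_mead(f, x0, step=50.0, tol=1e-10, max_iter=5000):
    """Minimal Nelder-Mead minimizer; in 3-D the simplex is a tetrahedron."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst (highest f).
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        c = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]  # centroid without worst
        xr = [2 * c[i] - worst[i] for i in range(n)]                 # reflect the worst vertex
        fr = f(xr)
        if fr < values[0]:
            xe = [3 * c[i] - 2 * worst[i] for i in range(n)]         # stretch (expand)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = [0.5 * (c[i] + worst[i]) for i in range(n)]         # shrink toward centroid
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:                                                    # shrink the whole simplex
                b = simplex[0]
                simplex = [b] + [[0.5 * (b[i] + v[i]) for i in range(n)] for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best = min(range(n + 1), key=lambda k: values[k])
    return simplex[best]


def make_objective(sensors, arrivals, v):
    """Assumed misfit: spread of d_i = R_i(x) - v * t_i around its mean."""
    def f(x):
        d = [math.dist(x, s) - v * t for s, t in zip(sensors, arrivals)]
        m = sum(d) / len(d)
        return sum((di - m) ** 2 for di in d)
    return f


# Hypothetical layout: 8 sensors (m), a 4000 m/s P-wave speed, synthetic picks.
sensors = [(0, 0, -100), (1000, 0, 0), (0, 1000, 100), (1000, 1000, -50),
           (500, 0, 100), (0, 500, -100), (1000, 500, 50), (500, 1000, 0)]
true_focus, t0, v = (420.0, 610.0, 30.0), 0.1, 4000.0
arrivals = [t0 + math.dist(true_focus, s) / v for s in sensors]
start = [sum(s[i] for s in sensors) / len(sensors) for i in range(3)]
x = nelder_mead(make_objective(sensors, arrivals, v), start)
t0_est = sum(t - math.dist(x, s) / v for s, t in zip(sensors, arrivals)) / len(sensors)
print([round(ci, 1) for ci in x], round(t0_est, 4))
```

Once the spatial coordinates are found, the origin time follows from averaging $t_i - R_i/v$ over the sensors. Each evaluation needs all sensor distances and the search is iterative, which is the cost that the search engine approach avoids.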

2.3. Introduction of the Microseismic Monitoring System

The proposed microseismic focal point location method differs fundamentally from conventional location methods. The advantage of a search engine built on the LSH algorithm is that no iterative calculation is required to determine the focal point position, so the focal point can be located faster and measures can be taken in time to reduce the damage that rockbursts cause to the rock mass [24].

Microseismic focal point location technology determines the spatial position of the focal point from the vibration waves released by the coal and rock mass. It can detect mine seismic activity in real time, determine the spatial position of the focal point, and calculate the microseismic energy to assess the strength of microseismic events in the coal and rock strata [25–27]. Figure 5 is a schematic diagram of the microseismic monitoring system.

There are 10 geophones in the mining area studied. Figure 6 shows their spatial coordinates.

The underground geophones are powered by the surface acquisition station, which supplies each geophone with a safety voltage of 37-42 V. The acquisition station has a fault indicator light for monitoring faults in the underground lines in real time. Because the downhole environment is complex, the geophone sensors must be protected; Figure 7 is a schematic diagram of the protection device.

2.4. Scheme Design of Ensemble Empirical Mode Decomposition (EEMD) Wavelet Threshold Denoising Algorithm for Coal Mine Microseismic Waveform

Mining is often accompanied by other construction activities, so the signals collected by the microseismic monitoring system contain a lot of noise, and the collected microseismic signals must be denoised. However, traditional denoising methods are not suitable for the nonlinear, nonstationary, and non-Gaussian signals generated by microseisms. Empirical mode decomposition (EMD) can process such signals because it is adaptive, but it suffers from mode aliasing (mode mixing). EEMD improves on EMD by adding white noise to the signal in many trials, decomposing each noisy copy with EMD, and averaging the results; this suppresses mode aliasing while retaining all the advantages of EMD. Like EMD, EEMD decomposes the collected signal according to its own characteristics without requiring a preset basis function [28, 29]. Figure 8 is the flow chart of the algorithm.

2.4.1. Selection of EEMD Algorithm Parameters

The main parameters of the EEMD algorithm are the amplitude of the added white noise and the ensemble size (the number of trials that are averaged). The noise amplitude is usually chosen empirically: differences between datasets may affect the reliability of the results, and signals of different frequencies differ in their sensitivity to noise. If the added noise is too large, the signal fluctuates more strongly. This helps decompose the high-frequency components but disrupts the distribution of extreme points of the low-frequency components, causing mode aliasing. If the added noise is small, mode aliasing of the low-frequency components is avoided; however, because the high-frequency part of the signal occupies a wide frequency band, weak noise cannot separate it effectively, and several high-frequency components become mixed across multiple intrinsic mode functions (IMFs), which is also mode aliasing. Following related research, the standard deviation of the added noise amplitude is set to 0.2. The amplitude should be reduced appropriately when the processed signal contains many high-frequency components, and increased otherwise.

Related studies show that when a signal is decomposed with EEMD, the ensemble size should satisfy equation (23), where $M$ is the ensemble size, $\varepsilon$ is the ratio of the standard deviation of the added white noise to that of the signal to be processed, and $e$ is the maximum relative error. Based on extensive experiments, researchers have found that an ensemble size of 100-300 makes full use of the advantages of the EEMD algorithm.
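To illustrate why the ensemble size matters, the sketch below averages $M$ independent white-noise series of standard deviation $\varepsilon$ (relative to a unit-variance signal) and measures what survives the averaging. It assumes the commonly cited EEMD rule that the residual added noise falls off roughly as $\varepsilon/\sqrt{M}$; it does not perform EMD itself and does not reproduce equation (23). The function name is hypothetical.

```python
import math
import random


def residual_noise_std(eps, n_ensemble, length=2000, seed=0):
    """Std of the average of n_ensemble white-noise series with std eps.

    In EEMD the added noise is scaled relative to the signal's std, and the
    part of it that survives ensemble averaging shrinks with the ensemble size."""
    rng = random.Random(seed)
    acc = [0.0] * length
    for _ in range(n_ensemble):
        for j in range(length):
            acc[j] += rng.gauss(0.0, eps)
    avg = [a / n_ensemble for a in acc]
    mu = sum(avg) / length
    return math.sqrt(sum((a - mu) ** 2 for a in avg) / (length - 1))


# Measured residual noise versus the eps / sqrt(M) rule, for eps = 0.2.
for m in (1, 25, 100, 300):
    print(m, round(residual_noise_std(0.2, m), 4), round(0.2 / math.sqrt(m), 4))
```

With ε = 0.2, ensembles of 100-300 leave residual added noise of roughly 1-2% of the signal's standard deviation, consistent with the range recommended above.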

2.4.2. Wavelet Threshold Denoising Based on EEMD

The high-frequency IMFs are not discarded directly; instead, they are denoised with the wavelet threshold algorithm so that the useful information they contain is retained [30, 31]. Before the high-frequency IMFs are denoised, the dividing point between the high-frequency and low-frequency IMFs must be determined with the consecutive mean square error (CMSE) criterion. First, the partial reconstruction of the signal is defined as

$$x_k(t) = \sum_{i=k}^{n} \mathrm{IMF}_i(t) + \mathrm{res}(t), \quad k = 1, 2, \dots, n, \tag{24}$$

where $\mathrm{IMF}_i$ is the $i$th IMF component, $n$ is the total number of IMF components, and $\mathrm{res}$ is the residual component.

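Applying the CMSE criterion gives equation (25):

$$\mathrm{CMSE}(x_k, x_{k+1}) = \frac{1}{N}\sum_{t=1}^{N}\left[x_k(t) - x_{k+1}(t)\right]^{2} = \frac{1}{N}\sum_{t=1}^{N}\mathrm{IMF}_k^{2}(t). \tag{25}$$

In equation (25), $N$ is the signal length and $\mathrm{IMF}_k$ is the $k$th IMF component.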

The dividing point $k_s$ between the high- and low-frequency IMFs is the index at which the objective function reaches its minimum, as in equation (26):

$$k_s = \arg\min_{1 \le k \le n-1} \mathrm{CMSE}(x_k, x_{k+1}). \tag{26}$$

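The IMF components up to the dividing point, $\mathrm{IMF}_1, \dots, \mathrm{IMF}_{k_s}$, are the high-frequency components; denoising them with the wavelet threshold algorithm gives $\widetilde{\mathrm{IMF}}_1, \dots, \widetilde{\mathrm{IMF}}_{k_s}$. These are then combined with the remaining low-frequency components $\mathrm{IMF}_{k_s+1}, \dots, \mathrm{IMF}_n$ and the residual $\mathrm{res}$ to obtain the denoised signal:

$$\tilde{x}(t) = \sum_{i=1}^{k_s}\widetilde{\mathrm{IMF}}_i(t) + \sum_{i=k_s+1}^{n}\mathrm{IMF}_i(t) + \mathrm{res}(t). \tag{27}$$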
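The dividing-point search and the final reconstruction can be sketched as follows, assuming the IMFs have already been produced by an EEMD implementation (not included here). Because consecutive partial reconstructions differ by exactly one IMF, the CMSE of step $k$ reduces to the mean square of $\mathrm{IMF}_k$. The IMF values are made-up toy numbers, and `halve` is a hypothetical stand-in for the wavelet threshold denoiser.

```python
def cmse_dividing_point(imfs):
    """Return k_s (1-based) minimizing CMSE_k = mean(IMF_k(t)^2), k = 1..n-1."""
    length = len(imfs[0])
    cmse = [sum(v * v for v in imf) / length for imf in imfs[:-1]]
    return min(range(len(cmse)), key=lambda k: cmse[k]) + 1


def reconstruct(imfs, res, k_s, denoise):
    """Denoise IMF_1..IMF_ks, keep the low-frequency IMFs, add the residual."""
    out = list(res)
    for i, imf in enumerate(imfs, start=1):
        comp = denoise(imf) if i <= k_s else imf
        out = [o + c for o, c in zip(out, comp)]
    return out


def halve(imf):
    """Hypothetical stand-in denoiser: scales the IMF by 0.5."""
    return [0.5 * v for v in imf]


# Toy 4-sample IMFs (hypothetical numbers), ordered from high to low frequency.
imfs = [[0.9, -1.1, 1.0, -0.8],
        [0.3, -0.2, 0.1, -0.3],
        [0.5, 0.6, -0.4, -0.6],
        [1.0, 1.2, 0.8, 1.1]]
res = [2.0, 2.0, 2.0, 2.0]
k_s = cmse_dividing_point(imfs)
print(k_s)  # 2
print([round(v, 2) for v in reconstruct(imfs, res, k_s, halve)])
```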

Figure 9 is the algorithm flow chart of denoising based on the EEMD wavelet threshold.
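The text does not specify the wavelet basis, decomposition depth, or threshold rule, so the following sketch of the wavelet threshold step makes its own assumptions: an orthonormal Haar transform with 3 levels, the universal threshold $\sigma\sqrt{2\ln N}$ with $\sigma$ estimated from the median absolute finest-scale coefficient divided by 0.6745, and soft thresholding of all detail coefficients. It is pure Python and is applied here to a synthetic noisy sine rather than to a real IMF.

```python
import math
import random

S = 1.0 / math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    a = [(x[2 * i] + x[2 * i + 1]) * S for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * S for i in range(len(x) // 2)]
    return a, d


def haar_inverse_step(a, d):
    """Invert one Haar level."""
    out = []
    for ai, di in zip(a, d):
        out += [(ai + di) * S, (ai - di) * S]
    return out


def soft(v, thr):
    """Soft threshold: shrink |v| by thr and zero anything below thr."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def wavelet_denoise(x, levels=3):
    """Haar DWT, soft-threshold details, inverse DWT (len(x) divisible by 2**levels)."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    finest = sorted(abs(v) for v in details[0])
    sigma = finest[len(finest) // 2] / 0.6745       # robust noise estimate
    thr = sigma * math.sqrt(2.0 * math.log(len(x)))  # universal threshold
    for d in reversed(details):                      # coarsest level first
        a = haar_inverse_step(a, [soft(v, thr) for v in d])
    return a


rng = random.Random(1)
clean = [math.sin(2 * math.pi * t / 256) for t in range(1024)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
den = wavelet_denoise(noisy)


def rms(u):
    """RMS error of u relative to the clean signal."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, clean)) / len(clean))


print(round(rms(noisy), 3), round(rms(den), 3))
```

Soft thresholding is used here because it avoids the discontinuities that hard thresholding introduces; other bases (for example Daubechies wavelets) would usually be preferred for real microseismic waveforms.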

2.5. The Framework of Microseismic Search Engine
2.5.1. Theoretical Framework

The microseismic search engine can determine the microseismic focal point location accurately and quickly. The waveform of a microseismic signal varies with time, much like a sound signal [32]. To determine the focal point position, the waveform database is searched for a waveform very similar to the input signal; if one is found, the focal location recorded for that stored waveform is taken as the location of the new microseismic event. Figure 10 shows the steps of the microseismic search.
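The lookup principle can be shown with an exhaustive (linear-scan) search, which is what LSH approximates. In this sketch the database pairs each stored waveform with the focal point it came from; the four-sample "waveforms", their coordinates, and the function name are hypothetical. A linear scan costs time proportional to the database size times the waveform length for every query, which is what motivates hashing.

```python
import math


def locate_by_search(query, database):
    """Return the focal location of the stored waveform closest (Euclidean
    distance) to the query waveform, together with that distance."""
    best_loc, best_dist = None, math.inf
    for waveform, location in database:
        d = math.dist(query, waveform)
        if d < best_dist:
            best_loc, best_dist = location, d
    return best_loc, best_dist


# Hypothetical 4-sample waveforms with the focal points (x, y, z) they came from.
database = [
    ([0.0, 1.0, 0.5, -0.2], (120.0, 340.0, -15.0)),
    ([0.1, 0.2, 0.9, 0.4], (610.0, 80.0, 5.0)),
    ([0.8, -0.3, 0.0, 0.1], (300.0, 900.0, -40.0)),
]
loc, dist = locate_by_search([0.05, 0.95, 0.55, -0.15], database)
print(loc)  # (120.0, 340.0, -15.0)
```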

2.5.2. Fast Retrieval Algorithm

Finding waveforms similar to the input microseismic waveform in the microseismic database is a crucial step of the microseismic search engine; in short, it is a similarity search over microseismic waveforms. Locality-sensitive hashing has already been applied to image and audio retrieval, text retrieval, and other tasks, where it retrieves similar items from high-dimensional data quickly and with high accuracy.

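Locality-sensitive hashing is built on locality-sensitive hash functions. A hash function $h$ maps points of $\mathbb{R}^d$ to a set $U$ of bucket labels and is drawn at random from a hash function family $\mathcal{H}$. The family is called $(r_1, r_2, p_1, p_2)$-sensitive if, for any two points $\mathbf{x}$ and $\mathbf{y}$, the conditions in equation (28) hold:

$$\begin{cases} \Pr_{\mathcal{H}}\left[h(\mathbf{x}) = h(\mathbf{y})\right] \ge p_1, & \text{if } \lVert \mathbf{x} - \mathbf{y} \rVert \le r_1, \\ \Pr_{\mathcal{H}}\left[h(\mathbf{x}) = h(\mathbf{y})\right] \le p_2, & \text{if } \lVert \mathbf{x} - \mathbf{y} \rVert \ge r_2. \end{cases} \tag{28}$$

In general, $r_1 < r_2$ and $p_1 > p_2$, so nearby points collide more often than distant ones.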

2.6. Locality-Sensitive Hashing Based on Stable Distributions
2.6.1. Stable Distribution

Let $D$ be a distribution over the real numbers $\mathbb{R}$. $D$ is called $p$-stable, for a real number $p \in (0, 2]$, if for any $n$ real numbers $v_1, \dots, v_n$ and independent, identically distributed random variables $X_1, \dots, X_n$ drawn from $D$, the variable $\sum_i v_i X_i$ has the same distribution as $\left(\sum_i |v_i|^p\right)^{1/p} X$, where $X$ is a random variable drawn from $D$.

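The cases $p = 1$ and $p = 2$ are the 1-stable and 2-stable distributions, namely the standard Cauchy distribution and the standard normal (Gaussian) distribution. Their probability density functions are given in equation (29):

$$c(x) = \frac{1}{\pi}\,\frac{1}{1 + x^{2}}, \qquad g(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^{2}/2}. \tag{29}$$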
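The stability property can be checked empirically. In the sketch below, the projection $\sum_i v_i X_i$ with Gaussian $X_i$ should have standard deviation $\lVert \mathbf{v} \rVert_2$, and with Cauchy $X_i$ it should behave like $\lVert \mathbf{v} \rVert_1$ times a Cauchy variable (whose absolute value has median 1). The vector and sample size are arbitrary illustrative choices; Cauchy samples are drawn with the inverse-CDF formula $\tan(\pi(U - 1/2))$.

```python
import math
import random


def gauss(rng):
    """2-stable sample (standard normal)."""
    return rng.gauss(0.0, 1.0)


def cauchy(rng):
    """1-stable sample (standard Cauchy) via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def projections(v, sampler, n=20000, seed=0):
    """Draw n samples of sum_i v_i * X_i with X_i i.i.d. from sampler."""
    rng = random.Random(seed)
    return [sum(vi * sampler(rng) for vi in v) for _ in range(n)]


v = [3.0, -1.0, 2.0, 0.5]
# 2-stable: the projection behaves like ||v||_2 * X, so its std is ||v||_2.
g = projections(v, gauss)
std_g = math.sqrt(sum(s * s for s in g) / len(g))
# 1-stable: the projection behaves like ||v||_1 * X, and median(|X|) = 1.
c_abs = sorted(abs(s) for s in projections(v, cauchy))
med_c = c_abs[len(c_abs) // 2]
print(round(std_g, 2), round(math.sqrt(sum(x * x for x in v)), 2))
print(round(med_c, 2), sum(abs(x) for x in v))
```

This is exactly the property that lets a random projection preserve the distance between two vectors in distribution, which the hash function below relies on.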

2.6.2. Algorithm Description

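The $p$-stable LSH scheme first defines the hash function in equation (30):

$$h_{\mathbf{a},b}(\mathbf{v}) = \left\lfloor \frac{\mathbf{a}\cdot\mathbf{v} + b}{w} \right\rfloor, \tag{30}$$

where $\mathbf{a}$ is a $d$-dimensional random vector whose entries follow the standard normal distribution; $\mathbf{v}$ is a vector (data point) from the dataset; $b$ is a random number drawn uniformly from $[0, w]$; and $w$ is the segment length on the real line.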

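For any two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, if the distance $\lVert \mathbf{v}_1 - \mathbf{v}_2 \rVert$ is small, the two vectors are likely to collide (fall into the same segment); if the distance is large, they are unlikely to collide. By the stability property, $\mathbf{a}\cdot\mathbf{v}_1 - \mathbf{a}\cdot\mathbf{v}_2$ has the same distribution as $\lVert \mathbf{v}_1 - \mathbf{v}_2 \rVert_p X$, so the projection onto the real line preserves distances in distribution; dividing the line into equal segments and labeling them then turns nearby projections into equal hash values.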

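$k$ functions $h_1, \dots, h_k$ are selected from the hash function family to form the function $g$:

$$g(\mathbf{v}) = \left(h_1(\mathbf{v}), h_2(\mathbf{v}), \dots, h_k(\mathbf{v})\right). \tag{31}$$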

$L$ such functions $g_1, \dots, g_L$ are used, and each $g_j$ maps the dataset into its own hash table. The random vectors $\mathbf{a}$ serve as projection vectors that map each point $\mathbf{v}$ of the dataset into the hash bucket $g_j(\mathbf{v})$; points in the same bucket have the same discretized projection values. The algorithm based on stable distributions proceeds as follows.

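Given $k$ and the dataset dimension $d$, $k$ random $d$-dimensional projection vectors are generated for each hash table and used to map the dataset into the tables, giving equation (32).

The query point $\mathbf{q}$ is mapped into the same random space with the same projection vectors, giving equation (33).

The points stored in the buckets that $\mathbf{q}$ falls into are returned as the approximate neighbors of the query point.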
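The three steps can be sketched as a small in-memory index for the Gaussian (2-stable) case, using $h(\mathbf{v}) = \lfloor (\mathbf{a}\cdot\mathbf{v} + b)/w \rfloor$ and $g = (h_1, \dots, h_k)$ in each of $L$ tables. The class and method names are hypothetical, and the random 16-dimensional vectors stand in for waveforms; a real search engine would store waveform identifiers and their focal coordinates.

```python
import math
import random
from collections import defaultdict


class StableLSH:
    """Gaussian (2-stable) LSH index with L tables of k concatenated hashes."""

    def __init__(self, dim, k, L, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        # One set of k projection vectors a and offsets b ~ U[0, w] per table.
        self.a = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
                  for _ in range(L)]
        self.b = [[rng.uniform(0.0, w) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.data = []

    def _key(self, j, v):
        """Bucket label g_j(v) of vector v in table j."""
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                     for a, b in zip(self.a[j], self.b[j]))

    def add(self, v):
        """Step 1: map a data point into every hash table."""
        idx = len(self.data)
        self.data.append(v)
        for j, table in enumerate(self.tables):
            table[self._key(j, v)].append(idx)

    def candidates(self, q):
        """Step 2: map the query with the same projections; collect bucket members."""
        found = set()
        for j, table in enumerate(self.tables):
            found.update(table.get(self._key(j, q), []))
        return found

    def query(self, q):
        """Step 3: return the candidate closest to q in exact distance, or None."""
        return min(self.candidates(q), key=lambda i: math.dist(q, self.data[i]),
                   default=None)


# Synthetic data: 500 random 16-dimensional vectors and a near-duplicate query.
rng = random.Random(7)
index = StableLSH(dim=16, k=4, L=6, w=4.0, seed=1)
for _ in range(500):
    index.add([rng.gauss(0.0, 1.0) for _ in range(16)])
q = [x + rng.gauss(0.0, 0.01) for x in index.data[42]]
print(index.query(q), len(index.candidates(q)))
```

Only the handful of candidates that share a bucket with the query are compared exactly, instead of all 500 stored vectors.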

2.6.3. Locality-Sensitive Hashing Based on Multiprobe Search

Figure 11 shows the steps of the multiprobe LSH algorithm.

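Equation (34) is the hash function based on the stable distribution:

$$h_{\mathbf{a},b}(\mathbf{v}) = \left\lfloor \frac{\mathbf{a}\cdot\mathbf{v} + b}{w} \right\rfloor. \tag{34}$$

This function projects $\mathbf{v}$ onto the direction of $\mathbf{a}$ and assigns it to one of the slots of width $w$ on that line. Multiprobe LSH exploits the fact that a near neighbor that misses the query's bucket most likely lies in an adjacent slot: besides the bucket $g(\mathbf{q})$, it also probes buckets whose labels differ from $g(\mathbf{q})$ by $\pm 1$ in one or more coordinates, starting with the slots whose boundaries are closest to the query's projections. This achieves a given recall with fewer hash tables.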
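A simplified sketch of query-directed probing for one hash table. Each coordinate of $g(\mathbf{q})$ may be shifted by $-1$ or $+1$; the cost of a shift is the squared distance from the query's projection to the slot boundary it crosses, and a probe's score is the sum of its shift costs. For brevity, this version enumerates all perturbation sets of at most two shifts and sorts them, rather than generating them lazily with a heap as the original multiprobe LSH proposal does. The function name and table parameters are hypothetical.

```python
import itertools
import math
import random


def probe_sequence(a_rows, b_vals, w, q, num_probes):
    """Return up to num_probes bucket keys for query q, most promising first.

    a_rows/b_vals hold the k projection vectors and offsets of one table's g.
    Shifting coordinate i by -1 (+1) costs the squared distance from the
    projection to the lower (upper) slot boundary."""
    f = [sum(ai * qi for ai, qi in zip(a, q)) + b for a, b in zip(a_rows, b_vals)]
    base = [math.floor(fi / w) for fi in f]
    moves = []                                   # (cost, coordinate, shift)
    for i, fi in enumerate(f):
        lower = fi - base[i] * w                 # distance to the lower boundary
        moves.append((lower ** 2, i, -1))
        moves.append(((w - lower) ** 2, i, +1))
    scored = [(0.0, ())]                         # the query's own bucket
    scored += [(cost, ((i, s),)) for cost, i, s in moves]
    for (c1, i1, s1), (c2, i2, s2) in itertools.combinations(moves, 2):
        if i1 != i2:                             # at most one shift per coordinate
            scored.append((c1 + c2, ((i1, s1), (i2, s2))))
    scored.sort(key=lambda item: item[0])
    probes = []
    for _, shifts in scored[:num_probes]:
        key = list(base)
        for i, s in shifts:
            key[i] += s
        probes.append(tuple(key))
    return probes


# Hypothetical table: k = 3 hashes on 8-dimensional data with w = 4.
rng = random.Random(3)
k, dim, w = 3, 8, 4.0
a_rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
b_vals = [rng.uniform(0.0, w) for _ in range(k)]
q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
for key in probe_sequence(a_rows, b_vals, w, q, num_probes=5):
    print(key)
```

The number of probes returned here corresponds to the number of buckets examined per table; raising it increases recall and retrieval time together.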

2.6.4. LSH Microseismic Search Engine Model Based on Multiprobe Search

Figure 11 shows the specific steps of the multiprobe LSH microseismic search engine.

In this experiment, ten groups of microseismic events are selected as the parameter test set of the LSH algorithm. Each parameter setting is tested 10 times, and the average is taken to ensure the reliability of the parameter results. When the search engine algorithm queries the input data, the query results must satisfy the corresponding conditions, and their quality can be judged by two indicators, recall and precision:

$$\mathrm{recall} = \frac{\lvert A(q) \cap R(q) \rvert}{\lvert R(q) \rvert}, \tag{35}$$

$$\mathrm{precision} = \frac{\lvert A(q) \cap R(q) \rvert}{\lvert A(q) \rvert}, \tag{36}$$

where $q$ is the query point, $A(q)$ is the set of query results, and $R(q)$ is the set of results satisfying the conditions.
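As a small worked example of recall $= |A \cap R|/|R|$ and precision $= |A \cap R|/|A|$: the event IDs below are hypothetical, and returning 1.0 for an empty denominator is an assumed convention, not one stated in the text.

```python
def recall_precision(returned, relevant):
    """recall = |A & R| / |R|, precision = |A & R| / |A| for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    recall = hits / len(relevant) if relevant else 1.0       # assumed convention
    precision = hits / len(returned) if returned else 1.0    # assumed convention
    return recall, precision


# Hypothetical query: events 3, 7, 9, 12 are returned; 3, 7, 21 truly qualify.
print(recall_precision([3, 7, 9, 12], [3, 7, 21]))  # (0.6666666666666666, 0.5)
```

Averaging these per-query values over the test events gives the recall figures reported in the parameter study.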

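The performance of the LSH algorithm depends largely on three parameters: the dimension $k$ of the hash function family, the number of hash tables $L$, and the interval size $w$. These parameters therefore need to be determined first. For two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ at distance $c = \lVert \mathbf{v}_1 - \mathbf{v}_2 \rVert_p$, equation (37) gives the probability that they collide under one hash function:

$$p(c) = \Pr\left[h(\mathbf{v}_1) = h(\mathbf{v}_2)\right] = \int_0^{w} \frac{1}{c}\, f_p\!\left(\frac{t}{c}\right)\left(1 - \frac{t}{w}\right) dt, \tag{37}$$

where $f_p$ is the probability density function of the absolute value of the $p$-stable distribution.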

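$k$ functions $h_1, \dots, h_k$ are selected arbitrarily from the hash function family and concatenated into $g(\mathbf{v}) = \left(h_1(\mathbf{v}), \dots, h_k(\mathbf{v})\right)$, so two vectors collide under $g$ with probability

$$\Pr\left[g(\mathbf{v}_1) = g(\mathbf{v}_2)\right] = p(c)^{k}. \tag{38}$$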

$L$ such functions $g_1, \dots, g_L$ are drawn independently, one per hash table. Equation (39) gives the probability that two vectors collide in at least one table after projection:

$$P(c) = 1 - \left(1 - p(c)^{k}\right)^{L}. \tag{39}$$
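To see how $k$, $L$, and $w$ trade off, equations (37) and (39) can be evaluated numerically. The sketch below assumes the Gaussian (2-stable) case, for which the integral in equation (37) has the closed form $1 - 2\Phi(-w/c) - \frac{2}{\sqrt{2\pi}\,(w/c)}\left(1 - e^{-w^2/(2c^2)}\right)$, with $\Phi$ the standard normal CDF, and cross-checks it with a small Monte Carlo simulation. The distance and parameter values are illustrative only, not those of the study.

```python
import math
import random


def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_collision(c, w):
    """Gaussian-case probability that one hash agrees for vectors at distance c."""
    if c == 0:
        return 1.0
    r = w / c
    return (1.0 - 2.0 * phi(-r)
            - 2.0 / (math.sqrt(2.0 * math.pi) * r) * (1.0 - math.exp(-r * r / 2.0)))


def p_any_table(c, w, k, L):
    """Probability of colliding under g = (h_1..h_k) in at least one of L tables."""
    return 1.0 - (1.0 - p_collision(c, w) ** k) ** L


def p_collision_mc(c, w, n=50000, seed=0):
    """Monte Carlo check: the first projection sits uniformly inside its slot
    (because of the random offset b); the second differs from it by c * Z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = rng.uniform(0.0, w)
        if math.floor(u / w) == math.floor((u + c * rng.gauss(0.0, 1.0)) / w):
            hits += 1
    return hits / n


print(round(p_collision(1.0, 4.0), 3), round(p_collision_mc(1.0, 4.0), 3))
for L in (1, 4, 6, 10):
    print(L, round(p_any_table(1.0, 4.0, k=6, L=L), 3))
```

Increasing $k$ lowers the collision probability of distant pairs (fewer candidates, faster search), while increasing $L$ restores the collision probability of near pairs (higher recall) at the cost of memory and time, which is the trade-off explored in the parameter study.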

The purpose of a waveform query is to retrieve from the database all waveforms whose Euclidean distance to the query is less than the query range $R$. Once the collision probability and the probability density function of the distance between vectors are known, the expected recall and precision can be calculated from equations (40) and (41).

Retrieval time is another important index for evaluating the retrieval process: for a given recall and precision, the shorter the retrieval time, the higher the retrieval efficiency. With the other two parameters fixed, the number of hash tables has a noticeable effect on the retrieval time, so it must be chosen to balance recall against retrieval time.

3. Results

3.1. Parameter Analysis of LSH Microseismic Search Engine Model Based on Multiprobe Search

The number of hash tables $L$, the dimension $k$ of the hash function family, and the interval size $w$ are analyzed in detail below.

The number of hash tables $L$ affects both search accuracy and search time: the larger $L$ is, the longer the search takes. With limited memory, $L$ is generally 1-10. The dimension $k$ of the hash function family determines how dispersed the dataset is after projection: the larger $k$ is, the fewer points are projected into the same hash bucket. $k$ is generally 2-20. The interval size $w$ depends on the norm (magnitude) of the data points: the larger $w$ is, the more points are projected into the same bucket, and the longer the search takes. The value of $w$ must be determined through repeated experiments.

The number of hash tables is varied from 1 to 10. Each value is tested 10 times, and the average is taken to obtain the recall rate. Figure 12 shows how the search time and the proportion of retrieved candidates vary with the number of hash tables.

Figure 12 shows that the more hash tables there are, the longer the retrieval time. Overall, 6 hash tables give the best combination of proportion of retrieved candidates, retrieval time, and recall rate. The dimension of the hash function family is then varied from 2 to 20; each value is tested 10 times, and the average is taken to obtain the recall rate. Figure 13 shows how the search time and the proportion of retrieved candidates vary with the dimension of the hash function family.

Figure 13 shows that the larger the dimension of the hash function family, the shorter the search time. A comprehensive comparison shows that a dimension of 14 gives the best combination of proportion of retrieved candidates, retrieval time, and recall rate. The interval size is then varied from 6000 to 11500; each value is tested 10 times, and the average is taken to obtain the recall rate. Figure 14 shows how the search time and the proportion of retrieved candidates vary with the interval size.

Figure 14 shows that an interval size of 8000 gives the best combination of proportion of retrieved candidates, retrieval time, and recall rate. In summary, with 6 hash tables, a hash function family dimension of 14, and an interval size of 8000, the data retrieval time is short, the recall rate is high, and the proportion of retrieved candidates is large.

Next, the number of detection boxes (probed buckets) is varied from 1 to 90, and the resulting changes in recall rate, search time, and proportion of retrieved candidates are analyzed. Figure 15 shows the results.

Figure 15 shows that retrieval takes longer as the number of detection boxes increases. With 10 detection boxes, the retrieval time is 0.73021 s; with 20, it already exceeds 1 s. The number of detection boxes is therefore set to 10, for which the recall rate and the proportion of retrieved candidates are 0.825 and 0.6967, respectively.

The changes in recall rate and proportion of retrieved candidates with the number of detection boxes are compared for stable-distribution LSH and multiprobe LSH. Figure 16 shows the results.

Figure 16 shows that multiprobe LSH retrieves a significantly larger proportion of candidates than stable-distribution LSH (about twice as large) and also has a clear advantage in recall rate.

3.2. Parameter Analysis of Search Engine Algorithm for Measured Coal Mine Microseismic Data

Because the experimental microseismic database differs from the measured one, the LSH parameters must be determined again before the measured coal mine microseismic data are searched.

The selected training set contains 296 groups of microseismic events. Ten groups of microseismic events are selected as the parameter test set of the LSH algorithm. Each parameter setting is tested 10 times, and the average is taken to ensure the reliability of the parameter results.

Consistent with the method of determining the parameters in the previous section, the number of hash tables, the dimension of the hash function family, and the size of the interval are analyzed. Figure 17 shows the results.

Figure 17 shows that, for the measured coal mine microseismic data, the retrieval time, recall rate, and proportion of retrieved candidates reach good values when the number of hash tables, the dimension of the hash function family, and the interval size are 4, 6, and 500, respectively.

4. Conclusions

As one of the crucial problems in microseismic monitoring, microseismic focal point location underlies the study of focal point distribution and mechanisms. Accurate location of the microseismic focal point is of great significance for monitoring and early warning of mine earthquakes and for assessing the risk of highly destructive dynamic hazards such as rockbursts. Based on the stress wave method, signals from the microseismic model are collected, the relevant segments of the collected waveforms are extracted, and a waveform database is constructed. Search engine location over this database is performed with LSH based on stable distributions. To address inaccurate location information and the low recall rate of retrieval results, a microseismic search engine model is constructed. The main work is as follows. EEMD combined with wavelet threshold denoising is used to denoise coal mine microseismic signals. A waveform database is then established from the denoised waveform data and the focal point locations. The structure and mathematical model of LSH based on stable distributions are introduced and improved, yielding the optimized multiprobe LSH algorithm. A microseismic positioning model is established according to the characteristics of microseismic data. The effects of the number of hash tables, the dimension of the hash function family, and the interval size on the LSH microseismic positioning model are discussed, and these parameters are determined. The results show that the microseismic search engine based on multiprobe LSH achieves a higher recall rate and more accurate positioning.

Although LSH has achieved good results in fast and accurate positioning with the microseismic search engine, several deficiencies remain. (1) Because of limited experimental conditions, there are few experimental models and little measured coal mine microseismic data. More microseismic waveforms will be added to the database to broaden the applicability of LSH to focal point location. (2) The LSH parameters (the number of hash tables, the dimension of the hash function family, and the interval size) were determined here by repeated experiments. In future work, suitable algorithms could adjust these parameters adaptively to improve the generalization ability of the model. (3) Although data-oriented multiprobe LSH has achieved good results in the microseismic search engine, the dataset information is not fully used. Future research should therefore enlarge the dataset and apply multiple principal component analyses (PCA) to exploit the dataset information more fully while maintaining positioning accuracy.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.