Abstract

To improve the detection of dim and small targets in dynamic scenes, this paper first proposes an anisotropic gradient background modeling method that combines spatial and temporal information and then uses the multidirectional gradient maximum of neighborhood blocks to segment the difference maps. Building on the candidate targets extracted by this background modeling and segmentation, a dim small target detection algorithm based on the local energy aggregation degree of sequence images is proposed. Experiments show that, compared with traditional algorithms, this method eliminates the interference of noise on the target and effectively improves the detection ability of the system.

1. Introduction

Dim and small target detection is a critical technique for imaging detection systems. Such a system combines algorithms from the field of machine vision to automatically discriminate targets in scenes with a low signal-to-noise ratio (SNR). The detection distance of the imaging detection system depends heavily on the quality of the algorithm [1]. Because a target detected at long distance occupies only a small area in the image, the energy of the target signal arriving at the detector’s plane is weak. The target is easily submerged in the background clutter, especially when it is affected by strong noise. As a result, it is difficult to accurately detect dim small targets because the image lacks effective information such as shape and texture.

In recent years, researchers at home and abroad have carried out extensive work on dim small target detection algorithms, and some algorithms have achieved good detection results in engineering applications. They can be summarized into two categories: detection based on single-frame filtering and detection based on multiframe association. Specifically, detection based on single-frame filtering preprocesses single-frame images with filtering and learning-based classification algorithms from signal processing to acquire candidate target points, and then uses prior knowledge (such as a pretrained model or template matching) to detect the real target points among the numerous candidates. These algorithms primarily include detection algorithms based on background modeling and detection algorithms based on machine learning. Detection algorithms based on background modeling are mainly composed of traditional detection algorithms based on time-domain [2], space-domain [3, 4], or frequency-domain filtering [5] and background modeling methods based on statistical characteristics [6, 7]. Detection algorithms based on machine learning mainly include those based on visual saliency [8, 9], dictionary learning and sparse representation [10, 11], low-rank background [12, 13], and convolutional neural networks (CNNs) [14–17]. When the target is in a low-SNR scene, using only single-frame information to detect dim-small targets may result in a high false alarm rate, so that targets cannot be discriminated easily. To effectively improve the detection capability of the imaging system, most detection and tracking algorithms today have therefore adopted multiframe detection.
Detection based on multiframe association selects real target points from masses of candidate targets by making use of the association of multiframe motion trajectories. For example, Moyer et al. proposed a projection transformation method [18]. The method first transforms the image from three-dimensional space into two-dimensional space, then uses the Hough transform to correlate the multiframe images in the two-dimensional space and extract suspicious target trajectories, and finally accumulates energy along each suspicious target trajectory. When the target energy of a trajectory is larger than a preset threshold, the trajectory is extracted and judged to be a final target. Dong et al. proposed a pipeline filter dim small target detection algorithm based on motion direction estimation [19]. This method estimates the moving speed and direction of the target according to its motion state in multiframe images to further eliminate the interference of noise on the target, and it can effectively detect dim small targets in a strongly fluctuating background. Xiu put forward a modified high-order correlation method [20]. Based on the continuity of target motion in adjacent frames, the method first calculates the high-order correlation of many suspicious target points in the neighborhood region of the multiframe images and then extracts target points using the correlation degree and threshold of the target in the multiframe images.

In most of the above algorithms, whether single-frame or multiframe, a filtering algorithm first removes most of the background interference and a corresponding detection method then extracts the small and weak targets, which provides a good research idea for this paper. Nevertheless, these algorithms still have deficiencies in dynamically changing scenes with low SNR (SNR ≤ 3 dB). For instance, some need a gradually changing background, a high signal-to-noise ratio, or some known information about the target to achieve good detection results; some need a priori assumptions about the model; and some need to extract features of the target from the image and then use these features to train a model in advance. It is practically impossible to obtain prior information about the target, which restricts the application range of these algorithms. Furthermore, it is challenging to construct a training model because dim-small targets lack effective shapes and textures. To enhance the capability of detecting dim-small targets in dynamic scenes, this paper proposes an anisotropic gradient background modeling algorithm that combines spatial and temporal information. The method compares the gradient relationship of the neighborhood blocks in four directions within the spatial region of a pixel point of the t-th frame image. The block in the middle is then filled by the mean of the pixels at the corresponding positions of the two neighborhood blocks with the least gradient difference, which gives the background image of the t-th frame, so that the acquired background image contains the information of the pixel point in the spatial domain. Next, the image background is constructed by calculating the gray mean of the pixel point over multiple frames in the continuous time domain.
At this point, the background image containing the spatial and temporal information is obtained. After that, the difference image is segmented with the multidirectional gradient maximum of neighborhood blocks. Finally, combined with the coherence of the motion of the target in the sequence image and the randomness of the noise, that is, the degree of aggregation of the target energy in the sequence image is higher than the noise, a dim small target detection algorithm based on local energy aggregation degree of sequence images is proposed, which effectively eliminates the interference of noise on the target and obtains the real target point.

2. Anisotropic Gradient Background Modeling Algorithm

The anisotropic gradient background modeling method builds on the imaging mechanism of the target signal, which can be described by a point spread function: the energy of a target block spreads outward from the central pixel, so the energy in the region where the background is located is lower than that in the target area. The background can therefore be estimated by comparing the gradient relationships between the background blocks in four directions within the neighborhood of the target block, as shown in Figure 1.

The anisotropic gradient background modeling algorithm combining spatial and temporal information proceeds as follows. First, move the pixel point of the t-th frame image upwards, downwards, to the left, and to the right by a step length k to obtain neighborhood blocks in four directions (in Figure 1, the upper and lower blocks are the yellow areas and the left and right blocks are the red areas). Second, calculate the mean of each of the four neighborhood blocks. Third, compare the gradient difference of the upper and lower blocks with that of the left and right blocks, and fill the block in the middle with the mean of the pixels at the corresponding positions of the pair of neighborhood blocks with the smaller gradient difference; the result is regarded as the background image of the t-th frame, which thus contains the information of the pixel point in the spatial domain. Finally, taking the accumulated frame length as M, the background of the cur-th frame is estimated by calculating the gray mean of the image pixel point in the temporal domain. In this way, the background is estimated using the information of the pixel point in both its spatial and temporal domains, while the target signal is effectively preserved in the difference image.

The quantities in the corresponding formulas are defined as follows. The block means are those of the upper, lower, left, and right neighborhood blocks. The background image of the t-th frame is determined by the mean of the corresponding coordinates of either the upper and lower or the left and right neighborhood blocks: when the gradient difference of the left and right neighborhood blocks is greater than or equal to that of the upper and lower neighborhood blocks, the background block in the middle is filled by the mean of the corresponding coordinates of the upper and lower neighborhood blocks; otherwise, it is filled by the mean of the corresponding coordinates of the left and right neighborhood blocks. This gives the background image at moment t. The remaining quantities are the original image of the current frame cur, the background estimate of the cur-th frame image, the differential image, the neighborhood size, the coordinate positions within the neighborhood, and a rounding function.
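The background modeling steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`box_mean`, `shift`, `spatial_background`, `difference_image`), the block half-width `r`, edge replication at the image border, the absolute difference of block means as the gradient difference, and a trailing window of M frames ending at the current frame are all choices made here for illustration.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) block around each pixel (edges replicated)."""
    h, w = img.shape
    p = np.pad(img, r, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def shift(img, dy, dx):
    """Return s with s[y, x] = img[y + dy, x + dx] (edges replicated)."""
    h, w = img.shape
    k = max(abs(dy), abs(dx))
    p = np.pad(img, k, mode="edge")
    return p[k + dy:k + dy + h, k + dx:k + dx + w]

def spatial_background(frame, k=3, r=1):
    """Spatial background of one frame from the block pair with the smaller gradient difference."""
    f = frame.astype(np.float64)
    m = box_mean(f, r)
    # Assumed gradient difference: absolute difference of opposite block means.
    g_ud = np.abs(shift(m, -k, 0) - shift(m, k, 0))
    g_lr = np.abs(shift(m, 0, -k) - shift(m, 0, k))
    fill_ud = (shift(f, -k, 0) + shift(f, k, 0)) / 2.0
    fill_lr = (shift(f, 0, -k) + shift(f, 0, k)) / 2.0
    # Left/right difference >= upper/lower difference: fill from upper and lower blocks.
    return np.where(g_lr >= g_ud, fill_ud, fill_lr)

def difference_image(frames, cur, M=3, k=3, r=1):
    """Temporal mean of spatial backgrounds over an assumed trailing window of M frames."""
    start = max(0, cur - M + 1)
    bg = np.mean([spatial_background(frames[t], k, r) for t in range(start, cur + 1)], axis=0)
    return frames[cur].astype(np.float64) - bg, bg
```

On a flat background containing a single bright pixel, this sketch returns a flat background estimate, so the bright pixel survives in the difference image while its surroundings cancel.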

3. Multidirectional Gradient Maximum Segmentation Method of Neighborhood Blocks

Reference [21] segments dim-small targets by searching for the local maximum of the pixel point. This method achieves a certain segmentation effect, but it only compares the gradient relationship between the central pixel and the surrounding pixels. It makes insufficient use of local information and does not use the neighborhood block information of the pixel to construct the gradient difference. Therefore, a multidirectional gradient maximum segmentation method based on neighborhood blocks is proposed in this paper as follows:
(1) Construct five blocks according to Figure 1: a central block with the pixel point as its center, and an upper, a lower, a left, and a right block, each of a specified width and height.
(2) Construct the multidirectional gradient of the four directional blocks according to formula (4), which is computed on the differential image with k as the step length moved by the pixel and the other parameters defined as above. The result is the gradient differences between the central block and the upper, lower, left, and right neighborhood blocks.
(3) Obtain the maximum gradient by varying the step length, based on the gradient differences obtained in step (2). If, for a given step length, the gradient differences in more than two of the four directions are greater than a preset threshold, the central block is segmented and extracted.
(4) Repeat this judgment after changing the step length; the step length is changed three times in total. If more than half of the three step lengths satisfy the criterion in step (3), the central block is deemed a candidate target point, which yields a binary image.
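The four segmentation steps can be sketched as follows in Python with NumPy. This is an illustrative sketch only: the gradient difference is assumed here to be the central block mean minus the neighboring block mean, and the function names, the half-width `r`, and the threshold value are hypothetical, because formula (4) and the threshold are not recoverable from the text.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) block around each pixel (edges replicated)."""
    h, w = img.shape
    p = np.pad(img, r, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def shift(img, dy, dx):
    """Return s with s[y, x] = img[y + dy, x + dx] (edges replicated)."""
    h, w = img.shape
    k = max(abs(dy), abs(dx))
    p = np.pad(img, k, mode="edge")
    return p[k + dy:k + dy + h, k + dx:k + dx + w]

def segment(diff, steps=(3, 4, 5), r=1, thresh=5.0):
    """Binary candidate mask from block gradients voted over several step lengths."""
    m = box_mean(diff.astype(np.float64), r)
    votes = np.zeros(diff.shape, dtype=np.int64)
    for k in steps:
        offsets = ((-k, 0), (k, 0), (0, -k), (0, k))  # up, down, left, right
        # Assumed gradient: central block mean minus neighbor block mean.
        n_dirs = sum((m - shift(m, dy, dx) > thresh).astype(np.int64)
                     for dy, dx in offsets)
        # Step (3): more than two directions exceed the threshold.
        votes += (n_dirs > 2)
    # Step (4): more than half of the step lengths agree.
    return votes > len(steps) / 2.0
```

With a 3 × 3 block (`r=1`), an isolated bright pixel in the difference image marks its whole 3 × 3 neighborhood as candidate, since every block overlapping that pixel has an elevated mean.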

4. The Proposed Detection Algorithm

The local energy aggregation degree of sequence images is defined primarily by the motion coherence of the target in the sequence image: because the target moves coherently, the aggregation of its energy in the sequence image is high. The aggregation of the target is measured by the mean gray scale of the target in the local neighborhood over continuous multiple frames. The quantities in the corresponding formula are defined as follows: the local neighborhood; the coordinate of the pixel point; the original image; the binary image, which is obtained by the multidirectional gradient maximum segmentation algorithm; the gray image of the candidate target point, which is obtained by multiplying the original image by the binary image; the image number t; the number of frames N in the continuous time domain; the number of times a candidate target point moves within its neighborhood of (2r + 1, 2r + 1); the total number T of movements of a candidate point over the continuous N frames; the coordinate of the central point of a candidate block in the binary image; the neighborhood of a candidate block; the set of coordinates of all candidate points belonging to that neighborhood; the area of the candidate block; and the mean gray value of the candidate target points (that is, the aggregation), which is obtained as the ratio of the sum of the gray values of the target over the continuous N frames to the total area.

This paper calculates the mean gray value of each candidate target within its neighborhood over the continuous N frames. When the area, the mean gray value, and the total number of movements of a candidate block satisfy the decision formula with its preset threshold, the candidate is judged to be a real target point.

The pseudocode of the dim-small target detection based on local energy aggregation of the sequence image is shown in Table 1.
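As a rough companion to the description of the aggregation measure and decision rule, the following Python sketch accumulates candidate energy in a fixed neighborhood over N frames. It is an assumption-laden illustration, not the content of Table 1: the function names, the fixed neighborhood centered on the first-frame candidate, and the three separate thresholds (`min_hits`, `min_area`, `min_gray`) are hypothetical stand-ins for the decision formula, which is not recoverable from the text.

```python
import numpy as np

def aggregation(frames, masks, y0, x0, r=2):
    """Return (hits, area, mean_gray) for the (2r+1) x (2r+1) neighborhood of (y0, x0).

    hits      -- number of frames in which a candidate pixel falls in the neighborhood
    area      -- total number of candidate pixels found there over all frames
    mean_gray -- summed gray value of those pixels divided by the total area
    """
    ys = slice(max(0, y0 - r), y0 + r + 1)
    xs = slice(max(0, x0 - r), x0 + r + 1)
    hits, area, energy = 0, 0, 0.0
    for img, mask in zip(frames, masks):
        m = mask[ys, xs].astype(bool)
        n = int(m.sum())
        if n > 0:
            hits += 1
            area += n
            # Gray image of candidates = original image times binary image.
            energy += float(img[ys, xs][m].sum())
    mean_gray = energy / area if area else 0.0
    return hits, area, mean_gray

def is_target(frames, masks, y0, x0, r=2, min_hits=3, min_area=3, min_gray=50.0):
    """Hypothetical decision rule: all three measures must reach their thresholds."""
    hits, area, mean_gray = aggregation(frames, masks, y0, x0, r)
    return hits >= min_hits and area >= min_area and mean_gray >= min_gray
```

A coherently moving target keeps reappearing inside the neighborhood and so accumulates hits and energy, whereas random noise typically appears in a single frame and fails the hit-count test.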

5. Results and Analysis

Sequence images from real scenes are used for the analysis, which covers the evaluation of the background modeling effect, the analysis of the segmentation effect, and the evaluation of the detection effect.

5.1. Background Modeling Results and Analysis

The algorithm proposed in this paper is compared with other algorithms, and the contrast gain, the SNR gain, and the background inhibitor factor are used to evaluate the background modeling effect of the different algorithms [23]. These indicators involve the following quantities: the mean of the target area and the mean of the background area; the contrasts of the original image and the difference image; the contrast gain; the standard deviation of the background area; the SNRs of the original image and the difference image; the SNR gain; the standard deviations of the original image and the difference image; and the background inhibitor factor.
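For orientation, the three indicators can be computed as in the Python sketch below. The exact formulas of [23] are not recoverable from the text, so the definitions used here are commonly used forms adopted as assumptions: contrast as the absolute difference between target and background means, SNR as that contrast divided by the background standard deviation, each gain as the ratio of the difference-image value to the original-image value, and the background inhibitor factor as the ratio of the original-image standard deviation to the difference-image standard deviation. The function name and the `target_mask` argument are hypothetical.

```python
import numpy as np

def background_metrics(orig, diff, target_mask):
    """Return (contrast_gain, snr_gain, inhibitor_factor) under the assumed definitions."""
    orig = np.asarray(orig, dtype=np.float64)
    diff = np.asarray(diff, dtype=np.float64)
    target_mask = np.asarray(target_mask, dtype=bool)

    def stats(img):
        bg = img[~target_mask]
        return img[target_mask].mean(), bg.mean(), bg.std()

    mt_in, mb_in, sb_in = stats(orig)     # original image
    mt_out, mb_out, sb_out = stats(diff)  # difference image

    c_in = abs(mt_in - mb_in)             # assumed contrast of the original image
    c_out = abs(mt_out - mb_out)          # assumed contrast of the difference image
    contrast_gain = c_out / c_in
    snr_gain = (c_out / sb_out) / (c_in / sb_in)
    inhibitor_factor = orig.std() / diff.std()
    return contrast_gain, snr_gain, inhibitor_factor
```

Under these assumed definitions, larger values of all three indicators mean that the target stands out more strongly and the residual background fluctuates less after background subtraction.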

The cumulative frame length is M = 3. Images with different SNRs from different scenes are selected for testing, and the proposed algorithm is compared with TDLMS [3], Top-Hat [24], improved bilateral filtering (IBF) [25], and anisotropic filtering (AF) [26]. Specific results are shown in Tables 2–4.

Indicators in Tables 2–4 indicate that the contrast gain, SNR gain, and background inhibitor factor obtained by the algorithm proposed in this section are greater than 18, 3, and 6, respectively, for images with different SNRs. It achieves a better background modeling effect than the traditional algorithms because it combines spatial-domain information to construct the gradient differences of the blocks in different directions, fills the image with the mean of the corresponding pixels in the two neighborhood blocks with the smallest gradient difference, and then takes the image mean in the continuous time domain as the background image. An algorithm that uses both the spatial and the temporal information of the pixel blocks to construct the background image can achieve a better background modeling effect than algorithms that rely on the temporal domain or the spatial domain alone.

Besides, in order to demonstrate the modeling effects of different algorithms, the above five algorithms are used to model the background of the same image. At the same time, background images, difference images, and 3D images obtained by different algorithms are plotted, as shown in Figure 2. Of which, (a) is the original image (the target position is marked in red); (b) is the background image, the difference image, and the 3D image obtained by TDLMS; (c) is the background image, the difference image, and the 3D image obtained by Top-hat; (d) is the background image, the difference image, and the 3D image obtained by IBF; (e) is the background image, the difference image, and the 3D image obtained by AF; (f) is the background image, the difference image, and the 3D image obtained by the algorithm proposed in this paper.

According to Figure 2, the brightness of the background image obtained by TDLMS is unbalanced, and the difference image preserves only the target signals, with dim energy, as shown in Figure 2(b). Since Top-Hat filtering is sensitive to the choice of structural elements, the background image it obtains is blurred, and false targets appear as blocks in the difference image, as shown in Figure 2(c). The improved bilateral filtering can construct the background effectively, using the spatial position of the pixel and the similarity of pixel gray levels to obtain smooth backgrounds, but it leaves a great deal of edge noise in the difference image, as shown in Figure 2(d). The anisotropic filtering algorithm uses only the mean of the four directions as the difference filtering result, which makes it difficult to eliminate edge contour areas in the difference image, as shown in Figure 2(e). The algorithm proposed in this section constructs gradient differences of neighborhood blocks, making full use of the spatial-temporal information of pixel blocks to obtain background images; it effectively eliminates the edge contour areas in the difference image while preserving target signals well and keeping the rate of false targets low.

5.2. Segmentation Results and Analysis

Three scenes are adopted to verify the segmentation effect of different algorithms on images, and the proposed method is compared with Otsu's method and the algorithm proposed in [21]. Parameters in this section are set as follows: the step length k = 3, 4, 5; the neighborhood size is 3 × 3; and a preset threshold is applied to the gradient difference. The rate of true segmentation frames (TR) and the rate of failed segmentation frames (FR) in the same scene are also computed, where T and TF represent the number of true frames and the total number of frames of the image, respectively.
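The two frame-level rates can be illustrated with a short Python helper. The formula itself is not recoverable from the text, so this sketch assumes the natural reading of the definitions: TR is the percentage of correctly segmented frames among all frames, and FR is the percentage of the remaining frames. The function name is hypothetical.

```python
def segmentation_rates(true_frames, total_frames):
    """Return (TR, FR) in percent, assuming TR = T / TF and FR = (TF - T) / TF."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= true_frames <= total_frames:
        raise ValueError("true_frames must lie between 0 and total_frames")
    tr = 100.0 * true_frames / total_frames
    return tr, 100.0 - tr
```

For example, under this assumed definition a sequence of 100 frames with 96 correctly segmented frames gives TR = 96% and FR = 4%.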

As can be observed from Table 5, the multidirectional gradient maximum segmentation algorithm based on neighborhood blocks achieves a better segmentation effect than the other two algorithms: its TR is greater than 95%, while its FR is less than 4.2%. Varying the step length to search for the maximum value of the neighborhood block, with gradient differences constructed between neighborhood blocks in different directions, not only makes full use of the information in the neighborhood blocks but also effectively highlights the difference between pixel blocks. The algorithm proposed in [21] depends only on the differences between a pixel point and its neighboring pixels, so it segments less effectively in unstable edge contour areas with slight gradient differences, obtaining a TR greater than 90% and an FR less than 10%. Otsu's method has the worst segmentation effect, mainly because, in dynamically changing scenes with a low signal-to-noise ratio, the small gray difference between pixels makes it difficult to segment and extract targets.

Segmentation results of the different algorithms in the three scenarios are presented in Figures 3–5. Specifically, Figures 3(a)–3(d), 4(a)–4(d), and 5(a)–5(d) show, in order, the original images, the Otsu segmentation results, the segmentation results of Reference [21], and the segmentation results of the algorithm proposed in this paper. In all three scenes, the segmentation results obtained by Otsu's method contain a large number of false targets, and because the difference between the target gray level and the background is small, some of the target signals are lost in the segmentation image. The algorithm of [21] also loses pixels in the target area, although it produces fewer false targets than Otsu's method. The algorithm proposed in this section effectively extracts target areas and produces the fewest false targets.

5.3. Detection Results and Analysis

Sequence images of three real scenes are employed to verify the effectiveness of the detection algorithm proposed in this paper. The targets of the three sequence images are in dynamically changing scenes. The target in scene A moves diagonally and passes through the unstable edge contour area during the motion; the target in scene B moves from top to bottom and also passes through the unstable edge contour area; the target in scene C moves randomly in the center of the image and is affected by strong background interference there. The detection algorithm proposed in this paper is compared with TDLMS [3], Top-Hat [24], improved bilateral filtering [25], and anisotropic filtering [26]. Detection results of all these algorithms in the three scenes are presented in Figures 6–8, where Figures 6(a)–6(f), 7(a)–7(f), and 8(a)–8(f) show, in order, the original images, the TDLMS detection results, the Top-Hat detection results, the improved bilateral filtering detection results, the anisotropic filtering detection results, and the results of the detection algorithm proposed in this paper.

As can be seen from Figures 6–8, since the targets in the three scenes are seriously disturbed by the background, TDLMS, Top-Hat, and anisotropic filtering produce more false targets in the detection results. Improved bilateral filtering, which adds a filtering template on the basis of the gray characteristics and spatial information of the pixels, effectively decreases the number of false targets. The detection algorithm proposed in this paper builds on the earlier background modeling and the multidirectional gradient maximum segmentation algorithm to eliminate background interference effectively while preserving target signals well, and it detects real target points by exploiting their motion coherence within the neighborhood over continuous multiple frames.

The detection rate (Pd) and false alarm rate (Pf) are adopted to evaluate the performance of these algorithms, and the ROC curves of the three scenes are shown in Figure 9. The detection algorithm proposed in this paper performs best, followed by the improved bilateral filtering. In Figure 9(a), the Pd of the proposed algorithm is greater than 90% at Pf = 0.01, while those of the other algorithms are lower than 83%. In Figure 9(b), the Pd of the proposed algorithm is greater than 92% at Pf = 0.032, while those of the other algorithms are lower than 85%. In Figure 9(c), the Pd of the proposed algorithm is 91% at Pf = 0.026, while those of the other algorithms are lower than 88%. These tests show that the proposed algorithm achieves better detection results than the other algorithms.

6. Conclusion

This paper puts forward an algorithm to improve the capability of detecting dim-small targets in dynamically changing, low-SNR scenes. Based on the imaging mechanism described by the point spread function, gradient differences of blocks in different directions are first constructed with spatial-temporal information. Second, the central block is filled by the mean of the pixels at the corresponding positions of the two neighborhood blocks with the least gradient difference among the four directions. The image background is then estimated by taking the mean of the images in the temporal domain. On this basis, the multidirectional gradient maximum segmentation algorithm is constructed by making full use of neighborhood blocks; it segments dim-small targets by searching for the maximum gradient of the neighborhood blocks while varying the step length. Finally, combined with the coherence of the target motion in the sequence image, the average gray level of the target in the local neighborhood over continuous multiple frames is used to define the energy aggregation degree of the target, and the detection of dim and small targets is realized.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was partly supported by the Guangxi Science and Technology Base and Talent Project (Acceptance No. 2019AC20147) and Doctoral Fund of the Guangxi University of Science and Technology (No. 19Z31).