Abstract

The design of conventional correlation filters requires explicit knowledge of the appearance and shape of a target object, so the performance of correlation filters is significantly affected by changes in the appearance of the object in the input scene. In particular, the performance of correlation filters worsens when the objects to be recognized are partially occluded by other objects and the input scene contains a cluttered background and noise. In this paper, we propose a new algorithm for the design of a system consisting of a set of adaptive correlation filters for recognition of partially occluded objects in noisy scenes. Since the input scene may contain different fragments of the target, false objects, and background to be rejected, the system is designed so as to guarantee equally high correlation peaks corresponding to parts of the target in the scenes. The key points of the system are as follows: (i) it consists of a bank of composite optimum filters, which yield the best performance for different parts of the target; (ii) it includes a fragmentation of the target into a given number of parts in the training stage to provide equal intensity responses of the system for each part of the target. With the help of computer simulation, the performance of the proposed algorithm for recognition of partially occluded objects is compared with that of common algorithms in terms of objective metrics.

1. Introduction

Recognition and tracking of objects in observed scenes degraded by additive noise, cluttered backgrounds, geometric modifications such as pose changes and scaling, nonuniform illumination, and occasional object occlusions are challenges that a modern recognition algorithm must solve. In this paper, we deal with partial occlusion of objects to be recognized, in other words, the case when only some parts of the target are visible. Recent works have paid much attention to this problem [1–5].

Nowadays, object recognition based on correlation filters receives much research interest due to its high impact in real-life activities, such as video surveillance, human-computer interaction, robotics, biometrics, and target tracking [6–12]. Correlation filtering is a powerful technique for object recognition because of its ability to perform two essential tasks simultaneously: detection of a target within an observed scene and computation of the exact position of the detected object [13, 14]. Another advantage of correlation filters is their ability to detect multiple objects in a single scene simultaneously [15–17].

The performance of correlation pattern recognition may be improved either by discarding noise components from the output of a linear system [18] or by using an adaptive approach to the filter design [19]. The former approach is suitable for classification problems [20], whereas the latter is preferable for detection and tracking applications. For nonstationary noise such as a cluttered background, the statistical parameters of the noise are space-variant. In this case, the frequency response of a correlation filter is locally adapted to the parameters estimated in small spatially homogeneous fragments of the input scene. The locally adaptive filter reduces location errors in a noisy environment, which is important for accurate target detection.

Conventional correlation filters without training may perform poorly when recognizing a target partially occluded by other objects [21], for example, a pedestrian partially covered by a tree or a man wearing sunglasses. There are several proposals to treat partial occlusions with correlation filters [22–29]. All of them use independent parts of the target to synthesize a composite correlation filter. However, no study was carried out on an augmented division of the object into parts.

Campos et al. [22] carried out a study on the performance of some correlation filters to discriminate occluded objects. They compared the phase-only filter, the inverse filter, and the trade-off filter between the minimum variance and minimum average correlation energy. All of these filters enhance the edges of the object in order to achieve good discrimination. The target is divided uniformly into seven parts without any justification. Moreover, the performance of the filters in the presence of noise and geometric distortions was not analyzed. Adaptive correlation filters for recognition of fragmented objects embedded into real-life scenes in the presence of additive noise were presented in [23, 24]. The target is divided into independent fragments for the design of an adaptive filter. It was supposed that at least one of the fragments corresponds to a visible fragment of the target embedded into the scene. Additionally, the algorithm uses available contour and texture information to improve recognition of partially occluded objects. Recent work [28] improves recognition of partially occluded objects embedded into a known cluttered background with an adaptive composite filter. The proposed filters are able to discriminate noisy similar objects even when the available information about a target is about 19%. Khoury et al. [30] developed several optimal correlation algorithms for detection of obscured targets embedded into a disjoint background. It was noted that the boundary between obscuring and obscured objects makes a significant contribution to the correlation peak. So, blurring of the boundaries was utilized for detection of obscured targets.

Recently, masked correlation filters (MCFs) were designed [31] to handle partial occlusions in face images. MCFs utilize prior knowledge of the location of partial occlusions in test images as well as the zero-aliasing correlation filtering (ZACF) [27]. Since in real-life applications the location of partial occlusions is usually unknown, the filters cannot be widely used.

Finally, note that, in the design of common correlation-based methods, the target is arbitrarily divided into a number of parts, which are used for the design of composite filters. One of the motivations of this research is to determine a reasonable way for the target division to guarantee a high level of the overall system performance. In order to obtain a good recognition of each target part in noisy input scenes, the optimum correlation filters are also utilized [32].

The paper is organized as follows. Section 2 recalls the design of composite correlation filters. Section 3 describes the proposed algorithm for target fragmentation and robust recognition of partially occluded objects with multiple composite filters. Section 4 presents, with the help of computer simulation, the performance of the proposed algorithm in terms of detection efficiency. The results are discussed and compared with those obtained with common correlation filters. Finally, Section 5 presents our conclusions.

2. Composite Correlation Filters

We are interested in the design of a correlation filter that is able to recognize a fragment of the target embedded into a disjoint background in a scene corrupted with additive noise. The designed filter should also be able to recognize geometrically distorted versions of the target. Let $T=\{t_i(x,y);\ i=1,\dots,N\}$ be an image set containing geometrically distorted versions of the target. The input scene $f(x,y)$ is assumed to be composed of the target $t(x,y)$ embedded into a disjoint background $b(x,y)$ at unknown coordinates $(x_0,y_0)$, and the scene is corrupted with additive noise $n(x,y)$, as follows:

$$f(x,y)=t(x-x_0,y-y_0)+b(x,y)\,\bar{w}(x-x_0,y-y_0)+n(x,y), \tag{1}$$

where $\bar{w}(x,y)$ is a binary function defined as zero inside the target area and unity elsewhere. The optimum filter for detecting the target, in terms of the maximum of the signal-to-noise ratio (SNR) and the minimum variance of location errors (LE), is the generalized matched filter (GMF) [13], whose frequency response is given by

$$H(u,v)=\frac{\left[T(u,v)+\mu_b\,\bar{W}(u,v)\right]^{*}}{B_0(u,v)\otimes\left|\bar{W}(u,v)\right|^{2}+N(u,v)}, \tag{2}$$

where $T(u,v)$ and $\bar{W}(u,v)$ are the Fourier transforms of $t(x,y)$ and $\bar{w}(x,y)$, respectively; $\mu_b$ is the mean value of the background $b(x,y)$; $B_0(u,v)$ and $N(u,v)$ denote the power spectral densities of $b_0(x,y)=b(x,y)-\mu_b$ and $n(x,y)$, respectively. Symbol $\otimes$ denotes convolution.

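Let $h_i(x,y)$ be the impulse response of a GMF constructed for the $i$th available view of the target $t_i(x,y)$. Let $H=\{h_i(x,y);\ i=1,\dots,N\}$ be the set of all GMF impulse responses constructed for all training images in $T$. Additionally, let $P=\{p_i(x,y);\ i=1,\dots,M\}$ be an image set containing unwanted patterns to be rejected. In order to recognize all target views in $T$ and reject the false patterns in $P$, we synthesize a composite correlation filter by combining the optimal filter templates contained in $H$ with the patterns in $P$. The filter $h(x,y)$ can be constructed as follows [33]:

$$h(x,y)=\sum_{i=1}^{N}a_i\,h_i(x,y)+\sum_{i=N+1}^{N+M}a_i\,p_{i-N}(x,y), \tag{3}$$

where the coefficients $a_i$ are chosen to satisfy prespecified output values for each pattern in $T\cup P$. Using vector-matrix notation, we denote by $\mathbf{R}$ a matrix with $N+M$ columns and $d$ rows, where $d$ is the number of pixels in each image and each column is the vector version of an element of $H\cup P$. Let $\mathbf{a}=[a_1,\dots,a_{N+M}]^{T}$ be a vector of coefficients. Thus, (3) can be rewritten as

$$\mathbf{h}=\mathbf{R}\mathbf{a}. \tag{4}$$

Let us denote by

$$\mathbf{u}=[\underbrace{1,\dots,1}_{N},\underbrace{0,\dots,0}_{M}]^{T} \tag{5}$$

the desired responses to the training patterns and denote by $\mathbf{Q}$ the matrix whose columns are the elements of $T\cup P$. The response constraints can be expressed as

$$\mathbf{u}=\mathbf{Q}^{+}\mathbf{h}, \tag{6}$$

where superscript $+$ denotes conjugate transpose. Substituting (4) into (6), we obtain

$$\mathbf{u}=\mathbf{Q}^{+}\mathbf{R}\mathbf{a}, \tag{7}$$

$$\mathbf{a}=\left(\mathbf{Q}^{+}\mathbf{R}\right)^{-1}\mathbf{u}. \tag{8}$$

Finally, substituting (8) into (4), the solution for the composite filter is given by

$$\mathbf{h}=\mathbf{R}\left(\mathbf{Q}^{+}\mathbf{R}\right)^{-1}\mathbf{u}. \tag{9}$$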

Note that the value of the correlation peak obtained with (9) is expected to be close to unity for true-class objects and close to zero for false-class objects.
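As an illustration of the composite filter synthesis described above, the following minimal sketch solves the response constraints numerically. The symbol names are assumptions made for this example: `R` holds the vectorized templates as columns, `Q` holds the vectorized training patterns as columns, and `u` holds the desired responses (unity for true-class and zero for false-class patterns). Random vectors stand in for real GMF templates and images.

```python
import numpy as np

def composite_filter(R, Q, u):
    """Return h = R (Q^+ R)^{-1} u for column-stacked templates R and patterns Q."""
    # Solve (Q^+ R) a = u for the weights instead of forming the inverse explicitly.
    a = np.linalg.solve(Q.conj().T @ R, u)
    return R @ a

rng = np.random.default_rng(0)
d, n_true, n_false = 64, 3, 2                       # illustrative sizes, not from the paper
Q = rng.standard_normal((d, n_true + n_false))      # stand-in training patterns
R = Q + 0.1 * rng.standard_normal(Q.shape)          # stand-in templates (assumption)
u = np.concatenate([np.ones(n_true), np.zeros(n_false)])

h = composite_filter(R, Q, u)
responses = Q.conj().T @ h                          # inner products with the training patterns
```

By construction, `responses` reproduces `u` up to rounding, which is the sense in which the peak is close to unity for true-class patterns and close to zero for false-class patterns.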

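The MACE [34] filter minimizes the average correlation energy of the correlation outputs for the training images while simultaneously satisfying the correlation peak constraints at the origin. The effect of minimizing the average correlation energy is that the resulting correlation planes yield values close to zero everywhere except at the location of a trained object, where an intense peak is produced. In the Fourier domain, the MACE filter can be expressed in vector form as follows:

$$\mathbf{h}_{\mathrm{MACE}}=\mathbf{D}^{-1}\mathbf{X}\left(\mathbf{X}^{+}\mathbf{D}^{-1}\mathbf{X}\right)^{-1}\mathbf{u}, \tag{10}$$

where the columns of $\mathbf{X}$ are the vectorized Fourier transforms of the training images, $\mathbf{u}$ is the vector of desired responses, superscript $+$ denotes conjugate transpose, and the matrix $\mathbf{D}$ contains along its diagonal the average power spectrum of the training images (i.e., the average of the squared magnitudes of the columns of $\mathbf{X}$).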
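The MACE synthesis can be sketched in a few lines, assuming the standard closed form $\mathbf{h}=\mathbf{D}^{-1}\mathbf{X}(\mathbf{X}^{+}\mathbf{D}^{-1}\mathbf{X})^{-1}\mathbf{u}$. The function name, the small floor on the power spectrum, and the random training images are assumptions of this example, not part of the original method.

```python
import numpy as np

def mace_filter(images, u):
    """Frequency-domain MACE filter h = D^{-1} X (X^+ D^{-1} X)^{-1} u."""
    # Columns of X are the vectorized 2-D DFTs of the training images.
    X = np.stack([np.fft.fft2(im).ravel() for im in images], axis=1)
    # Diagonal of D: average power spectrum of the training images.
    # The tiny floor (an assumption) guards against division by zero.
    d = np.maximum(np.mean(np.abs(X) ** 2, axis=1), 1e-12)
    Dinv_X = X / d[:, None]
    return Dinv_X @ np.linalg.solve(X.conj().T @ Dinv_X, u.astype(complex))

rng = np.random.default_rng(3)
images = [rng.random((16, 16)) for _ in range(3)]   # stand-in training images
u = np.array([1.0, 1.0, 0.0])                       # two true-class, one false-class

h = mace_filter(images, u)
# Frequency-domain inner products X^+ h, which the constraints fix to u.
peaks = np.array([np.vdot(np.fft.fft2(im).ravel(), h) for im in images])
```

Dividing by the diagonal of `D` elementwise avoids building a full diagonal matrix, which matters because the number of rows equals the number of pixels.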

The Optimal Trade-off Synthetic Discriminant Function (OTSDF) [14] filter is a correlation filter that is similar to the MACE filter. In the OTSDF formulation, the diagonal matrix of the MACE filter is replaced with a weighted sum of that matrix and an identity matrix, with the weights set by a trade-off parameter. The inclusion of the identity matrix improves noise tolerance.

The discrimination capability (DC) is a measure of the ability of the filter to distinguish a target from unwanted objects; it is defined by the following [33]:

$$DC=1-\frac{\left|c^{B}\right|^{2}}{\left|c^{T}\right|^{2}}, \tag{11}$$

where $c^{B}$ is the value of the maximum correlation sidelobe in the background area and $c^{T}$ is the value of the correlation peak generated by the target. A DC value close to unity indicates that the filter has a good capability to distinguish between the target and any false object. Negative values of the DC indicate that the filter is unable to detect the target. Also, if the obtained DC is greater than a prespecified threshold, then the target is considered as detected; otherwise, the target is rejected.
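A small worked example may help to fix the definition. The sketch below assumes the common form DC = 1 - |c_B|^2 / |c_T|^2 and takes the target area as a boolean mask supplied by the caller; both the function name and the toy correlation plane are illustrative assumptions.

```python
import numpy as np

def discrimination_capability(plane, target_mask):
    """DC = 1 - |c_B|^2 / |c_T|^2 for a correlation plane and a boolean target-area mask."""
    magnitude = np.abs(plane)
    c_t = magnitude[target_mask].max()     # correlation peak generated by the target
    c_b = magnitude[~target_mask].max()    # highest sidelobe in the background area
    return 1.0 - (c_b ** 2) / (c_t ** 2)

plane = np.zeros((8, 8))
plane[3, 4] = 2.0                          # target peak
plane[6, 1] = 1.0                          # strongest background sidelobe
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:6] = True                      # target area

dc = discrimination_capability(plane, mask)   # 1 - 1/4 = 0.75
```

If the background sidelobe exceeded the target peak, the same function would return a negative value, matching the case in which the filter is unable to detect the target.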

3. Recognition of Partially Occluded Objects

In this section we describe the proposed algorithm for recognition of partially occluded objects using a new target fragmentation procedure and a bank of composite correlation filters. To improve the detection performance of correlation filters an adaptive approach to the filter design is utilized [35].

The proposed algorithm for automatic fragmentation of the target into parts is shown in Figure 1.

First, suppose that a visible fragment of the object obtained due to occlusion always contains a part of the object contour. We therefore define fragments as sectors of the circle circumscribing the object. For each of the fragments and for the entire object, the output of the linear system can be defined as the correlation peak between the impulse response of the correlation filter and the corresponding fragment or the entire object, respectively. In general, the impulse response of the optimum filter (see (2)) depends on the input scene information. If such information is unavailable, the phase-only filter may be used for an approximate solution of the problem. We want to divide the target into sectors in such a manner as to obtain equal responses of the linear system to each fragment. In other words, we look for the division that minimizes the functional (12), taken over all possible divisions of the object into sectors. Actually, minimization of the functional (12) may produce numerous solutions. In this case, additional constraints, such as equal area of the fragments, can be used to select a unique one.
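The fragmentation idea can be sketched as follows. This is only an illustration under stated assumptions: the sectors have equal angular width and only their common rotation offset is searched, the phase-only filter of the whole object is used as the correlation filter, and the spread between the largest and smallest fragment peak is a hypothetical stand-in for the functional (12), whose exact form is not reproduced here. All function names are illustrative.

```python
import numpy as np

def sector_masks(shape, angles):
    """Boolean masks of the sectors between consecutive boundary angles.

    `angles` are ascending boundary angles in radians within [0, 2*pi).
    """
    rows, cols = np.indices(shape)
    cy, cx = (shape[0] - 1) / 2.0, (shape[1] - 1) / 2.0
    theta = np.mod(np.arctan2(rows - cy, cols - cx), 2 * np.pi)
    bounds = list(angles) + [angles[0] + 2 * np.pi]
    # Measure each pixel angle from the sector start so the wrap-around
    # sector needs no special case.
    return [np.mod(theta - lo, 2 * np.pi) < (hi - lo)
            for lo, hi in zip(bounds[:-1], bounds[1:])]

def fragment_peaks(obj, masks):
    """Correlation peak of each fragment with the phase-only filter of the object."""
    F = np.fft.fft2(obj)
    pof = np.conj(F) / np.maximum(np.abs(F), 1e-12)   # phase-only filter
    return np.array([np.abs(np.fft.ifft2(np.fft.fft2(obj * m) * pof)).max()
                     for m in masks])

def response_spread(peaks):
    # Assumed objective: zero when all fragment responses are equal.
    return float(peaks.max() - peaks.min())

rng = np.random.default_rng(1)
obj = rng.random((32, 32))                            # stand-in object image
base = 2 * np.pi * np.arange(3) / 3                   # three equal sectors
offsets = np.linspace(0.0, 2 * np.pi / 3, 12, endpoint=False)
spreads = [response_spread(fragment_peaks(obj, sector_masks(obj.shape, base + o)))
           for o in offsets]
best_offset = offsets[int(np.argmin(spreads))]        # most balanced division found
```

A fuller search would also vary the individual boundary angles; the single-offset search is kept here only to keep the example short.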

Figure 2 illustrates the optimal fragmentation of the object from 2 to 9 fragments with the help of the proposed algorithm.

The filter design requires knowledge of a typical background image and a target. We construct a bank of composite optimum filters. The proposed algorithm for the design of composite correlation filters is given as follows.

Step 1. The algorithm starts with an optimal fragmentation of the object into fragments (see Figure 1).

Step 2. Synthesize the GMF correlation filters of (2), one per fragment, for reliable detection and location estimation of the fragments.

Step 3. For each fragment, synthesize a composite adaptive correlation filter as follows [35]: (i) perform correlation between the current filter for the fragment (initially the GMF from Step 2) and the background image. If the obtained DC is greater than a prespecified threshold, then the fragment could be successfully detected in the input scene, and the fragment is added to the set of true-class patterns; otherwise, the detected object around the false peak is added to the set of false-class patterns; (ii) synthesize a composite filter with the help of (9); (iii) iteratively perform steps (i) and (ii) with the updated filter until the DC exceeds the threshold.

Step 4. The bank of composite adaptive filters is used for reliable recognition of partially occluded objects. Detection can be carried out by correlating the input scene with each filter of the bank. Next, the DC in each of the correlation planes is calculated, and the plane with the highest DC value is chosen as the system output. If the obtained DC is greater than a prespecified threshold, then the target is considered as detected; otherwise, the object is rejected. Finally, the location of the correlation peak in the chosen output plane is taken as an estimate of the location of the object in the scene. The recognition procedure with the bank of filters is summarized in Figure 3.
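The recognition stage of Step 4 can be sketched as below. Several details are assumptions of this example rather than the authors' implementation: correlation is circular and computed with the FFT, the target area is approximated by a small guard window around the global peak, the DC is taken as 1 - |c_B|^2 / |c_T|^2, and plain fragment templates stand in for the composite adaptive filters. The default threshold of 0.6 is the value reported in Section 4.

```python
import numpy as np

def correlate(scene, h):
    """Circular cross-correlation of the scene with impulse response h via the FFT."""
    H = np.fft.fft2(h, s=scene.shape)                 # zero-pad h to the scene size
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.conj(H)))

def peak_and_dc(plane, guard=2):
    """Return (DC, peak location), excluding a guard window around the peak."""
    mag = np.abs(plane)
    r0, c0 = (int(v) for v in np.unravel_index(np.argmax(mag), mag.shape))
    background = mag.copy()
    background[max(r0 - guard, 0):r0 + guard + 1,
               max(c0 - guard, 0):c0 + guard + 1] = 0.0
    dc = 1.0 - background.max() ** 2 / mag[r0, c0] ** 2
    return dc, (r0, c0)

def detect_with_bank(scene, bank, threshold=0.6):
    """Correlate with every filter, keep the plane with the highest DC, then threshold."""
    results = [peak_and_dc(correlate(scene, h)) for h in bank]
    best = max(range(len(bank)), key=lambda k: results[k][0])
    dc, location = results[best]
    return dc > threshold, location, dc, best

rng = np.random.default_rng(2)
frag_a = rng.choice([-1.0, 1.0], size=(5, 5))         # stand-in filter for fragment A
frag_b = rng.choice([-1.0, 1.0], size=(5, 5))         # stand-in filter for fragment B
scene = np.zeros((32, 32))
scene[10:15, 12:17] = frag_a                          # only fragment A is visible

detected, location, dc, best = detect_with_bank(scene, [frag_a, frag_b])
```

The cost grows linearly with the size of the bank, since each filter requires one correlation, which is the trade-off between detection quality and computational complexity noted in the text.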

Note that in the proposed algorithm the number of fragments can be properly chosen to ensure a desired accuracy using a minimum number of correlation operations. Detection performance and location accuracy are monotonically increasing with the number of filters in the bank; therefore, there exists a trade-off between a desired quality of detection and computational complexity in terms of required correlations.

4. Computer Simulation

In this section, the performance of the proposed algorithm for recognition of partially occluded objects is presented in terms of detection efficiency. The results are compared with those obtained with successful composite correlation filters, that is, MACE [34], OTSDF [22], SDF [14, 35], and SDF with MACE (SMACE) filters [23, 24].

In this paper, type I and type II recognition errors are used for comparing the recognition accuracy of tested algorithms. Type I error occurs when the algorithm asserts something that is absent, a false hit. Type I error is called false positive (FP). Type II error occurs when the algorithm fails to assert what is present, a miss. Type II error is called false negative (FN).
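For concreteness, the two error types can be counted as in the sketch below. The normalization is an assumption: FP is expressed as a percentage of the trials in which the target is absent, and FN as a percentage of the trials in which it is present. The function name and the toy decisions are illustrative.

```python
def error_rates(predicted, actual):
    """Return (FP %, FN %) from boolean detection decisions and ground truth."""
    pairs = list(zip(predicted, actual))
    false_pos = sum(1 for p, a in pairs if p and not a)   # asserted but absent
    false_neg = sum(1 for p, a in pairs if a and not p)   # present but missed
    negatives = sum(1 for _, a in pairs if not a)
    positives = sum(1 for _, a in pairs if a)
    fp_rate = 100.0 * false_pos / negatives if negatives else 0.0
    fn_rate = 100.0 * false_neg / positives if positives else 0.0
    return fp_rate, fn_rate

predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
fp_rate, fn_rate = error_rates(predicted, actual)     # one false hit of two, one miss of three
```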

With the help of extensive computer simulation, we show how detection reliability and localization accuracy for recognition of partially occluded objects with common and proposed correlation filters depend on the number of fragments, the level of input noise, and the level of target overlapping. To obtain statistically reliable results, we use 55 different scenes and 16 different objects (see Figure 4). The algorithms are tested in input scenes containing the entire target and randomly occluded targets with the level of available information of 20%, 40%, 60%, 80%, and 100% of the object area. Also, input scenes are corrupted by additive white noise with signal-to-noise ratio (SNR) of 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB.
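Test scenes at a prescribed noise level can be generated as in the following sketch. It assumes that the SNR is the ratio of the mean signal power to the noise variance and that the noise is zero-mean Gaussian; the paper does not state which definition it uses, and the function name is illustrative.

```python
import numpy as np

def add_white_noise(image, snr_db, rng):
    """Add zero-mean white Gaussian noise so that the scene has the requested SNR in dB."""
    image = np.asarray(image, dtype=float)
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return image + rng.normal(0.0, np.sqrt(noise_power), size=image.shape)

rng = np.random.default_rng(4)
clean = rng.random((256, 256))                        # stand-in scene
noisy = add_white_noise(clean, 10.0, rng)
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

The measured value fluctuates slightly around the requested one because the noise variance is estimated from a finite number of pixels.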

Tables 1 and 2 show the FP and FN errors for the proposed algorithm as a function of the number of fragments for SNR of 5 dB, 10 dB, 15 dB, and 25 dB and for the level of available information of 20%, 40%, 60%, 80%, and 100%. Note that a DC threshold of 0.6 provides the minimum of the FP and FN errors.

One can observe that the proposed algorithm with a strong occlusion (20% of available information) yields high FN errors of 62%–100% depending on the number of fragments and noise level. However, the recognition performance of the algorithm improves rapidly when either the number of the fragments or SNR increases. On the other hand, the performance of the proposed algorithm in terms of FP errors is excellent.

Tables 3 and 4 show FP and FN errors for the proposed algorithm with division of the object into 9 fragments, MACE, OTSDF, SDF, and SDF with MACE.

The tested algorithms use the same number of fragments. However, the division of the object into nonoverlapping fragments is performed uniformly according to the authors’ recommendation. One can observe that the proposed algorithm yields much better performance for different circumstances with respect to the objective criteria.

5. Conclusion

In this paper, we proposed a new algorithm for the design of a linear system consisting of a set of adaptive correlation filters for recognition of partially occluded objects in noisy scenes. The system consists of a bank of composite optimum filters, which yield the best performance for different parts of the target. In the training stage, the system divides the target into a given number of parts to provide equal intensity responses of the system for each part of the target. With the help of computer simulation, we showed that the performance of the proposed algorithm for recognition of partially occluded objects is much better than that of common algorithms in terms of objective metrics.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work was supported by the Ministry of Education and Science of Russian Federation, Grant 2.1743.2017.