Abstract

Moving object detection is a fundamental step in video surveillance systems. To eliminate the influence of illumination changes and of shadows associated with moving objects, we propose a local intensity ratio model (LIRM) that is robust to illumination change. Based on an analysis of the illumination and shadow model, we discuss the distribution of the local intensity ratio. The moving objects are then segmented without shadows by applying a Gaussian mixture model (GMM) to the normalized local intensity ratio. Next, erosion is used to obtain the moving object contours and to erase scattered shadow patches and noise. After that, the moving object contours are enhanced by a new contour enhancement method that considers the foreground ratio and spatial relations. Finally, a new method is used to fill holes in the foreground. Experimental results demonstrate that the proposed approach extracts moving objects without cast shadows and shows excellent performance under various illumination change conditions.

1. Introduction

Moving object detection is a fundamental step in many image analysis applications, including automated visual surveillance, video indexing, and human-machine interaction. Many approaches to moving object detection have been proposed, such as background subtraction, optical flow, and temporal differencing [1]. However, these techniques are often affected by factors such as shadows, illumination changes, and noise, and different factors have different consequences. In this paper, we focus on shadows and illumination changes.

Shadows cause many problems in object localization, segmentation, detection, and tracking: objects may merge with each other, object shapes may be altered, the background may be misclassified as foreground, and objects may be missed [2]. Shadows associated with moving objects can also easily be misinterpreted as additional objects. Similarly, when the illumination changes, some background pixels may be detected as foreground pixels, which makes it hard to obtain clean moving objects. Eliminating shadows and handling illumination changes therefore have a great effect on the performance of subsequent steps such as tracking, recognition, classification, and activity analysis, which require accurate detection of a moving object and its exact shape.

Several reviews of shadow detection have been reported in the literature [2–4], covering physical, geometrical, and heuristic techniques. Most real-time shadow detection techniques work at the pixel level and use color information, directly or indirectly, wholly or partially. Shadow detection methods can be roughly divided into two categories: statistic-based methods and video feature-based methods.

Statistic-based methods build pixel-based statistical models to detect cast shadows. In [5], Zivkovic and van der Heijden use a Gaussian mixture model (GMM) to detect moving cast shadows. This method consists of building a GMM for moving objects, identifying the distributions of moving objects and shadows, and modifying the learning rates of the distributions. Joshi and Papanikolopoulos [6] propose a semisupervised learning technique that uses human input to detect shadows in static settings. Jung [7] uses a statistical approach combined with geometrical constraints to detect shadows, but it applies only to gray-scale video sequences. The statistic-based methods identify the distribution of shadow pixel values and are robust across different scenes. Although these methods reduce false hits based on property descriptions of shadows (i.e., color or intensity), they cannot eliminate them. Generally, these methods require training video sequences, from which shadows must be extracted by hand. Furthermore, they are hard to run online and in real time because they need an additional learning step.

The video feature-based methods rely on the fact that video features such as geometry, color, gradient, and brightness differ among shadows, background, and moving objects. These methods are more general than the statistic-based ones. They can be further divided into the following four categories [2]: chromaticity-based, light physical characteristics-based, geometric relations-based, and texture-based.

The chromaticity-based methods mostly operate on single pixels and use color information to test for shadows. These methods make certain assumptions about shadow properties: (1) a shadow darkens the background area on which it falls; (2) a shadow falls on the background plane; (3) a shadow changes the luminance of an area more significantly than its color [8]. If these assumptions are not satisfied, the accuracy of color-based shadow detection decreases markedly. To better separate intensity from chromaticity, several color spaces such as HSV [9], c1c2c3 [10], HSL [11], and RGB [12] have been used to detect moving cast shadows robustly. Most of these methods are computationally inexpensive and easy to implement. However, they are sensitive to noise and fail when shadow regions are darker or when moving objects have color information similar to that of the background.

The light physical characteristics-based methods rely on a linear attenuation model of light intensity, which assumes that the illumination source produces pure white light. In outdoor environments, the main light source is sunlight (white), and the reflected light comes from the sky (blue). Generally, other light sources are influenced by sunlight. If sunlight is blocked, the effect of the light reflected from the sky increases, and the chromaticity of the shaded region shifts toward the blue component. Nadimi and Bhanu [13] presented a dichromatic model that takes two light sources into account to predict the color changes of shaded regions effectively. Further works consider a variety of light intensity conditions and build a more general nonlinear attenuation model that adapts to both indoor and outdoor scenes [14, 15]. These methods still rely on chromaticity characteristics, so errors occur when the foreground color is close to that of the cast shadow.

The geometric feature-based methods assume that, for a given light source, the shape and size of a shadow and its position relative to the object can be determined, and they use this relationship to segment shadows from the foreground [16–18]. These methods do not need a background model; however, they need detailed shape information and other information about the foreground targets. They are limited to detecting specific objects and require the positional relationship between objects and shadows. When there are multiple shadows and multiple objects, these methods do not work well.

The texture-based methods assume that the texture of a shaded region is invariant. They generally include the following steps: (1) detect the foreground together with its shadows and (2) classify each foreground region as either object or shadow based on texture correlation. If the texture of a candidate area is similar to that of the background, it may be misclassified as shadow. Leone and Distante [19] propose a moving cast shadow detection method based on Gabor functions and a matching pursuit strategy. Zhang et al. [20] define the ratio edge as the ratio between the intensity of one pixel and that of its neighboring pixels and use it to detect shadows. After confirming the existence of shadows, Xiao et al. [21] reconstruct coarse object shapes and then extract cast shadows by subtracting the moving objects from the change mask. The texture-based methods are effective without color information and robust to illumination changes. However, they need to compare adjacent pixels, so their computational complexity is high.

Generally, the methods above require the true foreground pixels to be obtained before shadow detection. Background subtraction methods such as the Gaussian mixture model (GMM) and its modified versions are representative methods for detecting moving pixels together with shadows. However, in a dynamic scene, the varying background is also detected as a moving object. To remove shadows directly and remain robust to illumination variations, we present a local intensity ratio model (LIRM), which is illumination invariant. First, we replace each pixel value with its normalized local intensity ratio and detect moving objects without shadows via a Gaussian mixture model. Second, erosion is used to obtain the moving object contours and to erase scattered shadow patches and noise. After that, we obtain enhanced moving object contours by applying a contour enhancement method to the erosion image and the foreground image. Finally, we use the local foreground density and contour orientation to fill holes in the enhanced moving objects. Experimental results demonstrate that the proposed approach extracts moving objects without cast shadows and shows excellent performance under various changing illumination conditions.

This paper is organized as follows. Section 2 introduces the illumination change model, presents the LIRM, and analyzes its distribution. The Gaussian mixture model for foreground detection and the corresponding postprocessing algorithm are described in Section 3. Section 4 analyzes foreground detection results on four test videos with different light conditions and compares the results with those of other methods. Finally, Section 5 concludes the paper.

2. Illumination Change Model

2.1. Local Intensity Ratio and Illumination Invariant

Zhang et al. [20] proved that the ratio edge is illumination invariant and used it to classify each moving pixel as either foreground object or moving shadow. Inspired by this work, we define the local intensity ratio and analyze its illumination invariance.

First, the definition of the local intensity ratio (LIR) is where is the intensity of pixel ; is the number of pixels in a local region; is the local region of given pixel (a rectangle region) and defined as follows: and is the size of the local region.

For images or videos acquired by a fixed camera, the intensity value of a pixel is

$$I(x,y) = \rho(x,y)\,E(x,y)\,S(x,y), \tag{3}$$

where $\rho(x,y)$ is the reflectance of the object surface at pixel $(x,y)$, that is, the reflection coefficient; $E(x,y)$ is the amount of light power per unit of receiving object surface area at pixel $(x,y)$; and $S(x,y)$ is the sensor sensitivity of the camera. The light in a scene can be divided into the direct light of the light source and the scattered light of the environment. We assume that the light source is distant and that its light is parallel, such as sunlight. The light of the environment is scattered: its direction is random, and its intensity in the scene is assumed to be constant. If objects are occluded, the scene can be divided into three cases: illuminated area, penumbra area, and umbra area [22, 23], and the light intensity at the real target pixel is

$$E(x,y) = c_a + t(x,y)\,c_p\cos\theta(x,y), \tag{4}$$

where $c_a$ is the intensity of the ambient light; $c_p$ is the intensity of the light source; $t(x,y)$ is the transition inside the penumbra, which depends on the light source and scene geometry, with $0 \le t(x,y) \le 1$; and $\theta(x,y)$ is the angle between the light source direction and the surface normal.
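As an illustration, the illumination model described above can be sketched in a few lines of Python. The sketch assumes that the pixel intensity is the product of reflectance, received light power, and sensor sensitivity, and that the received light power is the ambient term plus the source term scaled by the penumbra transition and the cosine of the incidence angle. The names rho, c_a, c_p, t, and theta and all numeric values are illustrative assumptions, not values from the experiments.

```python
import math


def irradiance(c_a, c_p, theta, t):
    """Light power received by a surface patch (assumed model).

    c_a   : ambient light intensity
    c_p   : light source intensity
    theta : angle between light direction and surface normal, in radians
    t     : penumbra transition; 1 = illuminated, 0 = umbra, in between = penumbra
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return c_a + t * c_p * math.cos(theta)


def pixel_intensity(rho, c_a, c_p, theta, t, sensitivity=1.0):
    """Intensity = reflectance * received light power * sensor sensitivity."""
    return rho * irradiance(c_a, c_p, theta, t) * sensitivity


# Hypothetical surface with reflectance 0.5 lit head-on (theta = 0).
lit = pixel_intensity(0.5, c_a=20.0, c_p=200.0, theta=0.0, t=1.0)
penumbra = pixel_intensity(0.5, c_a=20.0, c_p=200.0, theta=0.0, t=0.5)
umbra = pixel_intensity(0.5, c_a=20.0, c_p=200.0, theta=0.0, t=0.0)
```

With these hypothetical values the same surface becomes progressively darker from the illuminated area to the umbra, while its reflectance stays fixed.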

While using RGB model to express light intensity, (4) can be shown as follows

where is the number of color channels, , and 3. For simplification, in this paper we consider only one light source and ignore the influence of different color channels.

The analysis of the local intensity ratio under the three different scenes is as shown in Figure 1.

If all the local regions belong to one of the three areas shown in Figure 1, that is, or , or , and , , and express the illuminate area, penumbra area, and umbra area, respectively, we can obtain the following results by formulas (4), (3), and (2) and assume .

Case 1 (). Consider

Case 2 (). Consider

Case 3 (). Consider
If the pixels are in the same local region, we assume that (1) pixels in the same local region belong to the same object plane, thus ; (2) sensor sensitivity of camera in all local regions is the same; ; (3) intensity of penumbra is also the same; .

According to the previous assumptions, we get where .

By the previous assumptions, and formulas (6), (7), (8), and (9), we also get

If a local region belongs entirely to one of the three illumination cases, the local intensity ratio is influenced only by reflectivity. The local intensity ratio therefore depends on the reflectivity of the target surface and not on the illumination changes or types; that is, it is illumination invariant.

According to formula (12), the local intensity ratio of a pixel is constant under different illumination conditions. As a result, the local intensity ratio model not only removes the influence of illumination but also eliminates the shadows of the foreground targets.
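The invariance argument can be checked numerically under one plausible reading of the definition, namely that the local intensity ratio divides each pixel by the mean intensity of its rectangular neighborhood. This reading, the function name local_intensity_ratio, the parameters radius and eps, and the illumination levels 220 and 20 are assumptions made for this sketch only.

```python
import numpy as np


def local_intensity_ratio(image, radius=2, eps=1e-6):
    """Ratio of each pixel to the mean of its (2*radius+1)^2 neighbourhood.

    Assumed definition for illustration; borders are handled by edge padding.
    """
    img = np.asarray(image, dtype=np.float64)
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Sum shifted copies of the padded image to obtain a box-filter sum.
    window_sum = np.zeros_like(img)
    for dy in range(size):
        for dx in range(size):
            window_sum += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    local_mean = window_sum / (size * size)
    return img / (local_mean + eps)


# Same reflectance pattern under two uniform illumination levels.
rng = np.random.default_rng(0)
reflectance = rng.uniform(0.2, 0.9, size=(16, 16))
lit = reflectance * 220.0       # hypothetical fully illuminated surface
shadowed = reflectance * 20.0   # hypothetical umbra (ambient light only)

lir_lit = local_intensity_ratio(lit)
lir_shadow = local_intensity_ratio(shadowed)
max_difference = float(np.max(np.abs(lir_lit - lir_shadow)))
```

Because the illumination factor is common to the pixel and to its neighborhood, it cancels in the ratio, so max_difference is negligible even though the raw intensities differ by a factor of 11.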

2.2. Distribution of Local Intensity Ratio

Generally, it is assumed that the images are corrupted by Gaussian white noise, which can be expressed as where represents actual pixel value; denotes real pixel value in scene; represents noise. The actual local intensity ratio is defined as

Usually, ; formula (14) can be denoted as where is the real local intensity ratio:

If pixel value in scene is constant, obeys Gaussian white noise distribution. Therefore, local intensity ratio and pixel value have the same distribution. On the other hand, if is small, the white noise will be amplified.
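The amplification effect can be made explicit with a first-order approximation. The following is a sketch under stated assumptions, not the derivation of the paper: the local intensity ratio is taken to be the pixel intensity divided by the local mean $\bar{I}_\Omega$, the noise $n$ is zero-mean with standard deviation $\sigma$, and the average of the noise over the local region is neglected. The symbols $L$, $\tilde{L}$, $\bar{I}_\Omega$, and $\sigma$ are introduced here for illustration.

```latex
\tilde{L}(x,y)
  = \frac{I(x,y) + n(x,y)}
         {\frac{1}{N}\sum_{(i,j)\in\Omega}\bigl(I(i,j) + n(i,j)\bigr)}
  \approx \frac{I(x,y) + n(x,y)}{\bar{I}_\Omega}
  = L(x,y) + \frac{n(x,y)}{\bar{I}_\Omega},
\qquad
\bar{I}_\Omega = \frac{1}{N}\sum_{(i,j)\in\Omega} I(i,j).
```

Under these assumptions the noise term of the ratio is Gaussian with standard deviation $\sigma / \bar{I}_\Omega$, so it grows as the local mean intensity becomes small, which is consistent with the remark on amplified noise.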

Theoretically, the local intensity ratio ranges from 0 to infinite. In order to be consistent with the scope of a pixel value, the normalized definition of the local intensity ratio is as follows:

3. Moving Object Detection Based on Local Intensity Ratio

In this section, we present the detailed shadow removal procedure. The proposed method is a multistage approach whose flow chart is shown in Figure 2.

Generally, shadow detection methods use object detection or image segmentation algorithms to detect the foreground together with its shadows and then classify each foreground region as object or shadow. According to the results in the previous section, the local intensity ratio follows a Gaussian distribution. This paper therefore replaces each pixel value with its normalized LIR to detect the foreground; the process is shown in Figure 2. A Gaussian mixture model [5] is used to acquire the foreground. In a mixture of Gaussians, the recent history of each pixel is modeled by $K$ Gaussian distributions, where $K$ is typically chosen from 3 to 5. Each distribution has associated attributes: a weight $\omega$, a mean $\mu$, and a variance $\sigma^2$. Each Gaussian updates its parameters using the LIR in every new frame.
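The per-pixel update can be sketched as follows. This is a simplified online mixture-of-Gaussians update in the classic style, not necessarily the variant of [5] used in the experiments. The class name PixelMixture, the learning rate alpha, the matching threshold of 2.5 standard deviations, the background ratio of 0.7, and the variance bounds are all assumed values; the inputs 100 and 200 stand for hypothetical normalized LIR values.

```python
import numpy as np


class PixelMixture:
    """Online mixture of K Gaussians for the value history of one pixel."""

    def __init__(self, k=3, alpha=0.05, init_var=100.0, min_var=4.0,
                 match_sigmas=2.5, background_ratio=0.7):
        self.alpha = alpha
        self.init_var = init_var
        self.min_var = min_var
        self.match_sigmas = match_sigmas
        self.background_ratio = background_ratio
        self.weights = np.full(k, 1.0 / k)
        self.means = np.linspace(0.0, 255.0, k)
        self.variances = np.full(k, init_var)

    def _background_components(self):
        # Rank components by weight / sigma; the top-ranked ones whose
        # cumulative weight reaches background_ratio model the background.
        order = np.argsort(-self.weights / np.sqrt(self.variances))
        cumulative = np.cumsum(self.weights[order])
        count = int(np.searchsorted(cumulative, self.background_ratio)) + 1
        return set(order[:count].tolist())

    def update(self, value):
        """Absorb one observation; return True if it is classified as foreground."""
        background = self._background_components()
        distance = np.abs(value - self.means) / np.sqrt(self.variances)
        best = int(np.argmin(distance))
        if distance[best] <= self.match_sigmas:
            # Matched: raise the weight and pull mean and variance toward the value.
            self.weights *= 1.0 - self.alpha
            self.weights[best] += self.alpha
            self.means[best] += self.alpha * (value - self.means[best])
            residual = value - self.means[best]
            self.variances[best] = max(
                self.min_var,
                (1.0 - self.alpha) * self.variances[best] + self.alpha * residual ** 2,
            )
            is_foreground = best not in background
        else:
            # No match: replace the weakest component with a new wide Gaussian.
            weakest = int(np.argmin(self.weights))
            self.means[weakest] = value
            self.variances[weakest] = self.init_var
            self.weights[weakest] = self.alpha
            is_foreground = True
        self.weights /= self.weights.sum()
        return is_foreground


model = PixelMixture()
flags = [model.update(100.0) for _ in range(200)]  # stable background value
sudden_change = model.update(200.0)                # value far from the learned model
```

After the stable value has been observed long enough, its component dominates the mixture and is treated as background, so a value far from it is flagged as foreground.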

After foreground detection, we obtain the foreground target with little shadow (called the foreground image in what follows). Some parts of the foreground object may be darker than the background and are easily misdetected as background. The misdetected parts of shadows are discrete, whereas the misdetected parts of a moving object, such as holes, are surrounded by foreground pixels (such as the contour). The contours of moving objects can be detected, while the contours of shadows can be removed in most cases. To obtain enhanced moving objects without discrete shadow patches, erosion is used to erase scattered patches and noise and to obtain an image of the moving object contours without shadows (called the contour image in what follows). The contour image is then used to enhance the moving objects by a contour enhancement method. The enhancement procedure is as follows: for a pixel that is not a foreground pixel in the contour image, we calculate the weighted foreground pixel ratio of a local region in the contour image and the foreground image. This ratio, which represents the rate of enhanced moving object contours, is computed from the number of foreground pixels in a neighboring region of the pixel in the contour image, the number of foreground pixels in the same neighboring region in the foreground image, the weights assigned to the contour image and to the foreground image, and the number of pixels in the local region. The weights are fixed in our experiments, and the local region of a given pixel is the rectangular region defined in (2).
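The erosion step and the weighted foreground ratio can be sketched with NumPy as follows. The sketch assumes a square structuring element, a ratio of the form (weighted contour count plus weighted foreground count) divided by the window size, and equal weights of 0.5; these choices, like the function names box_count, erode, and weighted_foreground_ratio, are assumptions for illustration and are not taken from the experiments. For brevity the ratio is computed at every pixel rather than only at non-foreground pixels of the contour image.

```python
import numpy as np


def box_count(mask, radius):
    """Number of set pixels in the (2*radius+1)^2 window around each pixel."""
    m = np.asarray(mask, dtype=np.int64)
    padded = np.pad(m, radius, mode="constant")
    size = 2 * radius + 1
    total = np.zeros_like(m)
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
    return total


def erode(mask, radius=1):
    """Binary erosion: keep a pixel only if its whole window is foreground."""
    size = 2 * radius + 1
    return box_count(mask, radius) == size * size


def weighted_foreground_ratio(contour, foreground, radius=1,
                              w_contour=0.5, w_foreground=0.5):
    """Weighted share of foreground pixels in the window (assumed form)."""
    n = (2 * radius + 1) ** 2
    return (w_contour * box_count(contour, radius)
            + w_foreground * box_count(foreground, radius)) / n


# Hypothetical foreground image: a 5x5 object plus one isolated noise pixel.
foreground = np.zeros((9, 9), dtype=bool)
foreground[2:7, 2:7] = True
foreground[0, 8] = True

contour = erode(foreground)  # the isolated pixel disappears
ratio = weighted_foreground_ratio(contour, foreground)
```

In this toy example the erosion shrinks the object to its 3x3 core and removes the isolated pixel, and the ratio is 1 at the object center and 0 far from the object.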

In addition, we consider the eight directions around a pixel, determine whether there is a foreground pixel in each direction within the local region, and count the directions in which a foreground pixel is reached, as shown in Figure 3. Each direction contributes 1 if a foreground pixel exists in that direction and 0 otherwise, and the direction ratio is the ratio of foreground over the eight directions. After both ratios are calculated, the pixel is classified by checking whether they fall within the feasible range defined by two threshold values.
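The eight-direction test can be sketched as follows. The search radius of 3 pixels, the function name direction_ratio, and the toy mask are assumptions made for this sketch.

```python
# Offsets for the eight directions around a pixel (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def direction_ratio(mask, row, col, radius=3):
    """Fraction of the eight directions in which a foreground pixel is found
    within `radius` steps of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    hits = 0
    for dr, dc in DIRECTIONS:
        for step in range(1, radius + 1):
            r, c = row + dr * step, col + dc * step
            if not (0 <= r < rows and 0 <= c < cols):
                break  # left the image without meeting foreground
            if mask[r][c]:
                hits += 1
                break
    return hits / 8


# Hypothetical 7x7 mask: a closed square contour with an empty interior.
mask = [[0] * 7 for _ in range(7)]
for i in range(1, 6):
    mask[1][i] = mask[5][i] = mask[i][1] = mask[i][5] = 1

inside = direction_ratio(mask, 3, 3)   # pixel enclosed by the contour
outside = direction_ratio(mask, 0, 0)  # pixel outside the contour
```

A pixel enclosed by a contour reaches foreground in every direction, whereas a pixel outside the object reaches it in only a few, which is what makes the ratio useful for separating holes from true background.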

Thus we obtain the enhanced foreground contour image. However, some moving object pixels are still falsely detected as background pixels and appear as holes in the image. To obtain complete foreground objects, we use a filling method to correct pixels falsely detected as background (Figure 4). For a pixel detected as background in the enhanced foreground contour image, we first consider its eight directions: we determine whether there is a foreground pixel in each direction within the local region and count the directions in which a foreground pixel is reached, as expressed in (19). We also consider the ratio, within a local region, of the number of pixels detected as foreground in the foreground image and in the contour image to the total number of pixels in the region.

After calculating these two quantities, we decide whether the pixel is a moving object pixel by comparing them with two threshold values, which gives the final value of the pixel.
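A minimal sketch of the filling decision is given below. It assumes that a background pixel is relabeled as foreground when both the direction ratio and the local foreground density reach their thresholds, and it uses a single mask instead of the separate foreground and contour images. The conjunction of the two tests, the threshold values 0.75 and 0.5, the radius, and all function names are assumptions for illustration.

```python
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def reaches_foreground(mask, row, col, dr, dc, radius):
    """True if a foreground pixel lies within `radius` steps along (dr, dc)."""
    for step in range(1, radius + 1):
        r, c = row + dr * step, col + dc * step
        if not (0 <= r < len(mask) and 0 <= c < len(mask[0])):
            return False
        if mask[r][c]:
            return True
    return False


def local_density(mask, row, col, radius):
    """Share of foreground pixels in the window, clipped at the image border."""
    count = total = 0
    for r in range(max(0, row - radius), min(len(mask), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(mask[0]), col + radius + 1)):
            total += 1
            count += 1 if mask[r][c] else 0
    return count / total


def fill_holes(mask, radius=2, t_direction=0.75, t_density=0.5):
    """Relabel background pixels that pass both tests as foreground."""
    filled = [list(row) for row in mask]
    for row in range(len(mask)):
        for col in range(len(mask[0])):
            if mask[row][col]:
                continue
            hits = sum(reaches_foreground(mask, row, col, dr, dc, radius)
                       for dr, dc in DIRECTIONS)
            if hits / 8 >= t_direction and local_density(mask, row, col, radius) >= t_density:
                filled[row][col] = 1
    return filled


# Hypothetical 7x7 mask: a 5x5 object with a one-pixel hole at its center.
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
mask[3][3] = 0
filled = fill_holes(mask)
```

In this example only the enclosed hole is filled; background pixels next to the object fail the direction test and stay unchanged.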

4. Experiment Results

In this section, we describe the test videos used in this study, measure the performance of the proposed method, and compare it with state-of-the-art methods. In general, state-of-the-art methods use separate steps to remove shadows and to reduce the influence of illumination changes. To compare with them at each step, we first compare the shadow removal results and then compare the results under various illumination change conditions on different data sets.

4.1. Experiment Conditions and Parameter Setting

The size of the local rectangular region in (2) is important to the proposed method. If the region is too large, its interior cannot satisfy the assumptions in (9). On the other hand, if it is too small, the change in its value is also small. In our experiments, the size is set to 2.

4.2. Experiment Results

To test the effect of shadow removal based on the local intensity ratio and the effect of foreground detection, we use four typical videos from the ATON video set [4] as test videos: Campus, Hallway, Highway 1, and Intelligent room. Intelligent room and Hallway are typical indoor environments in which people walk through the scene and cast shadows onto the background. Campus is a video of a campus parking lot with shadows of people and cars.

To compare the performance of the proposed moving object detection method with other shadow detection methods, we count a shadow pixel detected as background and a moving object pixel detected as foreground as correct detections, and all other outcomes as false detections. The proposed method is compared with five methods introduced in [2]: the chromaticity-based method, the physical method, the geometry-based method, the small region (SR) texture-based method, and the large region (LR) texture-based method. To evaluate these methods, the shadow detection rate ($\eta$) and the shadow discrimination rate ($\xi$) are used as measures [4]; they are defined as

$$\eta = \frac{TP_S}{TP_S + FN_S}, \qquad \xi = \frac{TP_F}{TP_F + FN_F},$$

where the subscript $S$ denotes shadow and $F$ denotes foreground; $TP_S$ is the number of shadow pixels detected correctly; $FN_S$ is the number of shadow pixels mistakenly detected as foreground; $TP_F$ is the number of foreground pixels detected correctly after removing shadows; and $FN_F$ is the number of foreground pixels mistakenly detected as shadow. The shadow discrimination rate is concerned with keeping the pixels that belong to the moving object as foreground. In this paper, we use the average of the two rates as a single performance measure (avg).
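For concreteness, the two rates and their average can be computed from pixel counts as in the following sketch. The function name shadow_rates and the counts in the example are hypothetical and are not taken from Table 1.

```python
def shadow_rates(tp_s, fn_s, tp_f, fn_f):
    """Return (shadow detection rate, shadow discrimination rate, average).

    tp_s : shadow pixels detected correctly
    fn_s : shadow pixels mistakenly detected as foreground
    tp_f : foreground pixels detected correctly after removing shadows
    fn_f : foreground pixels mistakenly detected as shadow
    """
    detection = tp_s / (tp_s + fn_s)
    discrimination = tp_f / (tp_f + fn_f)
    return detection, discrimination, (detection + discrimination) / 2


# Hypothetical counts for one frame.
detection, discrimination, average = shadow_rates(tp_s=80, fn_s=20, tp_f=90, fn_f=10)
```

With these hypothetical counts the detection rate is 0.8 and the discrimination rate is 0.9, so the single performance measure is about 0.85.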

Table 1 compares the LIRM-based method with the other methods in [2] and shows the average shadow detection and discrimination rates on each test sequence. The method proposed in this paper performs better in most cases. The main reason is that the presented method detects the foreground directly from the video data, whereas the other methods extract shadows from a foreground obtained with a true background and a foreground with shadows. Because our method uses the local intensity ratio in place of the actual pixel values, its complexity is lower. Figure 6 shows the results of the proposed method for moving object detection without shadows in different scenes.

Figure 5 compares the results of LIRM moving object detection via GMM alone and with the postprocess. The postprocess markedly improves the shadow discrimination rate and slightly reduces the shadow detection rate, while the average of the two rates increases markedly. The postprocess therefore improves the moving object detection result.

Finally, we show that the proposed method adapts rapidly to variations in the environment. To test moving object detection under illumination changes, we compare our results with those of other techniques: GMM [5]; adaptive GMM [24], which controls the learning rate of the GMM adaptively; LBP [25], which detects moving objects using texture and edges; Xu's method [26], which uses color chromaticity; Pilet's method [27], which uses illumination and spatial likelihoods; and Choi's method [28], which develops a chromaticity difference model and a brightness model that estimates the intensity difference and intensity ratio of false foreground pixels. For the quantitative comparison, we have prepared three test videos [28]. Figure 7 shows the moving object detection results of each method. The postprocessing of the proposed method strengthens the moving objects and removes shadow patches and noise, but it sometimes incorrectly enhances the edges of some shadows and blurs the edges of moving objects. In summary, the proposed method adapts to the changed background completely and detects moving object pixels without shadow pixels. Although some of the other methods are robust to illumination changes, their results show that they cannot detect moving object pixels while also removing shadows.

5. Conclusions

In this paper, we propose a method that detects moving objects while removing cast shadows and that is robust under changing illumination conditions. We present a local intensity ratio model derived from the illumination model and prove its illumination invariance. Moreover, if the noise in the video follows a Gaussian distribution, the local intensity ratio also follows a Gaussian distribution. When the Gaussian mixture model is used to obtain the foreground, the actual pixel value is replaced by the local intensity ratio. Finally, postprocessing methods are used to obtain clean moving object pixels without noise: erosion removes shadow patches and noise falsely detected as foreground, foreground contour enhancement strengthens the moving object contours, and a filling method deals with holes in the foreground objects. Experimental results demonstrate that the presented method eliminates shadows from the foreground effectively and is robust under changing illumination conditions. However, in scenes where the foreground is similar to the background or to the shadow, the foreground may easily be detected as background. If the illumination changes severely, for example when a light is turned on or off in an indoor scene, the background may easily be detected as foreground, and the performance drops.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Jinhai Xiang is supported by the Fundamental Research Funds for the Central Universities under Grant no. 2013QC024. Jun Xu is supported by the National Natural Science Foundation of China under Grant no. 11204099 and self-determined research funds of CCNU from the colleges basic research and operation of MOE. Weiping Sun is supported by the National Natural Science Foundation of China under Grant no. 61300140.