Abstract

This paper proposes a shot boundary detection approach using Genetic Algorithm and Fuzzy Logic. The membership functions of the fuzzy system are calculated using the Genetic Algorithm from preobserved actual values for shot boundaries. The fuzzy system then classifies the types of shot transitions. Experimental results show that the accuracy of the shot boundary detection increases with the number of iterations or generations of the GA optimization process. The proposed system is compared with the latest techniques and yields better results in terms of the F1 parameter.

1. Introduction

With the growth of the Internet, the generation of multimedia content is also increasing. This leads to the problem of effectively utilizing and managing video data. Effective utilization and management of multimedia content need an effective indexing and retrieval system, which is much more difficult to build in the case of video. For an effective video retrieval system, the content of the video should be understood so that a proper indexing system can be created for better video retrieval. The content of the video can be obtained by first performing video segmentation, that is, dividing the video into meaningful shots, and then analyzing each segment (shot) to find its key feature. A scene is a combination of more than one shot with different camera angles or a combination of similar shots.

In video segmentation (shot boundary detection), the video is divided into meaningful scenes so that each scene can be analyzed to find its key feature(s). Shot boundary detection mainly consists of finding two types of transitions: abrupt transitions and gradual transitions [1, 2]. An abrupt transition (also known as a hard cut) is a sudden change between consecutive frames in a video, which marks the scene change due to the sudden release of the camera rolling. Gradual transitions (also known as soft cuts) are of four types: fade-in, fade-out, dissolve, and wipe. All these gradual transitions are a result of editing effects in a video. Fade-in and fade-out are caused by changes in the lightness value. In a fade-in, a picture appears slowly from a darker (usually black) empty frame. In a fade-out, a picture slowly diminishes to an empty frame (usually a black frame). Dissolve and wipe transitions are effects due to the overlapping of the current scene and the future scene. In a dissolve, the overlapping is done in such a way that the current scene starts disappearing and the future scene starts appearing simultaneously. In a wipe, the overlapping is done in such a way that the future scene grows over the current scene until the future scene appears completely.

Many researchers [13] have tried to detect the transitions in a video (known as shot boundary detection or temporal video segmentation) in the compressed and uncompressed domains. MPEG (Motion Picture Expert Group) provides video formats which offer a wide range of frame features for analysis in the compressed domain, such as motion vectors [4], Discrete Cosine Transform coefficients [5], and so forth. Frame feature extraction can be global or local. Global feature extraction considers features of the whole frame, such as the pixel values [6]. Local feature extraction considers only some regions of the frame and the features within those regions; in other words, only the necessary or important features of the whole frame are considered. MSER [7], SURF [8], and so forth are some of the popular local feature descriptors used for shot boundary detection. These features are extracted from each frame of the video, and the differences between consecutive frames are calculated to find the transitions. Gradual transitions are more difficult to detect than abrupt transitions, as large object motion and camera motion may produce the same effect [1]. Thus, it is necessary to extract features which are affected little or not at all by large object motion, camera motion, or lighting effects.

Intensity histogram and Color Histogram Difference are among the effective, simple, and widely used methods for shot boundary detection in the uncompressed domain, and they are not sensitive to motion [6]. In [10, 11], SVD is applied to the frame histogram matrix and a similarity measure is applied to find the abrupt and gradual transitions. In [10], consecutive frames between two frames are skipped for analysis, which reduces the computational time drastically. In [9], an HSV color histogram and an adaptive threshold are used for shot boundary detection, and the algorithm can also detect flashes. In [8], entropy and SURF features are used to find the cut and gradual transitions, where the intensity histogram is used to calculate the entropy of a frame.

Genetic Algorithm [12, 13] and Fuzzy Logic [6, 14, 15] have been used for shot boundary detection. In [16], color histogram is generated using Fuzzy Logic for abrupt and gradual transition detection. In [17], an Adaptive Fuzzy Clustering/Segmentation (AFCS) algorithm is proposed and the fuzzy clustering algorithm is used for image segmentation where it takes into account the inherent image properties like the nonstationarity and the high interpixel correlation. A Multiresolution Spatially Constrained Adaptive Fuzzy Membership Function is used for tuning the AFCS. In [18], Genetic Algorithm is used to generate the membership function of the fuzzy system for image segmentation.

In this paper, we introduce a method of shot boundary detection using a Fuzzy Logic system optimized by GA. The fuzzy system is used to classify the video frames into different types of transitions (cut and gradual) using the normalized Color Histogram Difference. GA is used as an optimizer to find the optimal range of values of the fuzzy membership functions. The results show that this combination is efficient and that the accuracy increases with the number of iterations/generations of GA.

The paper is organized as follows. Section 3 explains the feature extraction of the system. A detailed explanation of the GA optimized fuzzy system, which finds the range of values of the membership functions, is given in Section 4. Experimental Results and Discussion and Conclusion are given in Sections 5 and 6, respectively.

3. Feature Extraction

This section discusses the feature extraction used in our proposed system.

3.1. Color Histogram Difference

Color histogram is a global feature extraction technique and one of the simplest and most widely used image features for shot boundary detection [19]. It is insensitive to motion [6, 14]. In [6], the normalized color histogram difference between two frames, say the $i$th and $j$th frames, in a video is defined as follows:

$$D(i, j) = 1 - \frac{1}{3N} \sum_{b} \left[ \min\left(H_i^{R}(b), H_j^{R}(b)\right) + \min\left(H_i^{G}(b), H_j^{G}(b)\right) + \min\left(H_i^{B}(b), H_j^{B}(b)\right) \right], \tag{1}$$

where $N$ is the number of pixels in a frame, $H_i^{R}(b)$ is the number of red pixels of the $i$th frame in bin $b$, and likewise for the other terms. $R$, $G$, and $B$ represent the red, green, and blue components of a frame. It is observed that (1) yields a value in the interval $[0, 1]$. $D(i, j)$ yields a value of 0 when the $i$th and $j$th frames are the same, and the value increases as the similarity between the $i$th and $j$th frames decreases.
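As an illustration, the following is a minimal sketch of a normalized color histogram difference built on histogram intersection. It assumes 8-bit RGB frames given as lists of (r, g, b) tuples; the choice of 16 bins per channel and the function names `channel_histogram` and `histogram_difference` are hypothetical and are not taken from [6].

```python
def channel_histogram(pixels, channel, bins=16):
    """Count the pixels falling into each bin for one channel (0=R, 1=G, 2=B)."""
    hist = [0] * bins
    width = 256 // bins  # assumes 8-bit values and a bin count that divides 256
    for px in pixels:
        hist[px[channel] // width] += 1
    return hist


def histogram_difference(frame_a, frame_b, bins=16):
    """Return 1 minus the histogram intersection, normalized over R, G, and B."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and have the same pixel count")
    n = len(frame_a)
    overlap = 0
    for channel in range(3):
        h_a = channel_histogram(frame_a, channel, bins)
        h_b = channel_histogram(frame_b, channel, bins)
        overlap += sum(min(a, b) for a, b in zip(h_a, h_b))
    return 1.0 - overlap / (3.0 * n)


# Toy frames of four pixels each (illustrative data only).
dark = [(10, 10, 10)] * 4
bright = [(250, 250, 250)] * 4
print(histogram_difference(dark, dark))    # identical frames
print(histogram_difference(dark, bright))  # completely different frames
```

In this sketch identical frames give 0 and frames with no histogram overlap give 1, which matches the behavior described for the difference value.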

4. Fuzzy Logic System with GA Optimization for Finding the Value Range of the Membership Function

Genetic Algorithm (GA) is used as optimizer to find optimal values of the membership functions of the Fuzzy Logic system [20, 21]. The steps are shown as follows.

4.1. Fuzzification

First we define the input and output variables of the fuzzy system.

The input variables are three histogram difference values, referred to here as input 1, input 2, and input 3. Each is the difference between a pair of frames, computed using normalized histogram intersection, and each takes the linguistic values negligible (N), small (S), significant (Sig), large (L), and huge (H).

The output variable is transition, with linguistic values no (NO), abrupt (AB), and gradual (GR). It is the type of transition that can occur from one frame to another; no represents a frame where there is no transition.

The rule base consists of 28 rules of the form used in [6]. Table 1 gives the rules for detecting no transition (a frame without any transition). The rules for detecting gradual and abrupt transitions are provided in Tables 2 and 3, respectively.

4.2. Optimization with Genetic Algorithm

GA is used to find the range of values of the membership functions. We use the triangular membership function. The values of the three input variables range from 0 to 10. The values of the output variable are 0, 5, and 10 for no transition, gradual transition, and abrupt transition, respectively.

4.2.1. Initialization

The unknown variables in this problem are the lengths of the bases of the five membership functions negligible, small, significant, large, and huge, which are the same for all three input variables.

We use a 6-bit binary string to define the base of each of the five membership functions. The five strings, each of 6 bits, are then concatenated to form a 30-bit string, which is one solution in the population.

4.2.2. Evaluation

The strings are mapped to values representing the lengths of the bases of the membership functions. This mapping is computed using the following equation:

$$C = C_{\min} + \frac{b}{2^{L} - 1}\left(C_{\max} - C_{\min}\right), \tag{2}$$

where $C_{\min}$ and $C_{\max}$ are user-defined constants, usually chosen as the minimum and the maximum value of the variable, $b$ is the decimal value of each substring, $L$ is the number of bits in each substring, and $C$ is the base of the membership function.

In the beginning, the GA randomly creates a population of 10 strings. For a string, the five bases of the five membership functions are calculated using (2).
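The decoding of a 30-bit string into five base lengths can be sketched as follows. This is an illustration under stated assumptions: the linear mapping from a 6-bit substring to a value is the usual one for binary-coded GAs, and the limits 0 and 10 are assumed from the stated range of the input variables. The names `decode_bases` and `random_population` are hypothetical.

```python
import random

BITS_PER_BASE = 6
NUM_BASES = 5  # negligible, small, significant, large, huge


def decode_bases(bit_string, c_min=0.0, c_max=10.0):
    """Map each 6-bit substring of a 30-bit string to a base length in [c_min, c_max]."""
    if len(bit_string) != BITS_PER_BASE * NUM_BASES:
        raise ValueError("expected a 30-bit string")
    bases = []
    for k in range(NUM_BASES):
        chunk = bit_string[k * BITS_PER_BASE:(k + 1) * BITS_PER_BASE]
        b = int(chunk, 2)  # decimal value of the substring
        bases.append(c_min + b / (2 ** BITS_PER_BASE - 1) * (c_max - c_min))
    return bases


def random_population(size=10, rng=None):
    """Create `size` random 30-bit strings, as in the initial GA population."""
    rng = rng or random.Random()
    length = BITS_PER_BASE * NUM_BASES
    return ["".join(rng.choice("01") for _ in range(length)) for _ in range(size)]


population = random_population(rng=random.Random(0))
print(decode_bases(population[0]))
```

An all-zero substring decodes to the lower limit and an all-one substring to the upper limit, so every decoded base stays within the allowed range.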

Using the bases, we then find the initial, middle, and final values of the triangular membership function of each linguistic value, as given in Table 4. The fuzziness index used in this computation is a constant.

We then find the degree of the membership of the values in Table 6 using the rules. Using the degree of membership of the values in a rule, we then find the weight of the rule.

For example, consider the following rule:

If (input 1 is huge) and (input 2 is negligible) and (input 3 is negligible) then transition is abrupt.

We find the degree of membership of the values contained in the rule as follows:
(a) the degree of membership of input 1 in huge;
(b) the degree of membership of input 2 in negligible;
(c) the degree of membership of input 3 in negligible.

We then find the weight of rule as follows:

In this way, we find the weights of all 28 rules. Using the weights, we then compute the crisp output for each row of input values in Table 6 for a string/solution, using (4). The output value attached to each rule is preset by us and is either 0, 5, or 10.

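The sum of the squares of the differences between the actual output and the crisp output for all the values in Table 6 forms the fitness equation. This sum is subtracted from 1000 to convert the problem from minimization to maximization:

$$\text{fitness} = 1000 - \sum_{r} \left(y_r - \hat{y}_r\right)^2, \tag{5}$$

where $y_r$ and $\hat{y}_r$ are the actual output and the crisp output for row $r$ of Table 6.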
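The fitness computation described here can be sketched in a few lines. The function name `fitness` and the sample values are illustrative; the constant 1000 and the sum of squared differences come from the description above.

```python
def fitness(actual_outputs, crisp_outputs, offset=1000.0):
    """Return offset minus the sum of squared differences between the two lists."""
    if len(actual_outputs) != len(crisp_outputs):
        raise ValueError("both lists must have one value per training row")
    squared_error = sum((a - c) ** 2 for a, c in zip(actual_outputs, crisp_outputs))
    return offset - squared_error


# Illustrative rows: the actual outputs are 0, 5, or 10; the crisp outputs are made up.
print(fitness([0, 5, 10], [0, 5, 10]))  # perfect agreement
print(fitness([0, 5, 10], [1, 5, 8]))   # squared errors 1, 0, and 4
```

A string whose crisp outputs match the actual outputs exactly reaches the maximum fitness of 1000, and larger errors lower the fitness.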

The above processes are repeated for all the strings/solutions of the population to find the fitness of all the strings.

4.2.3. Selection

We then choose a set of strings whose fitness value is greater than some specific number.

4.2.4. Reproduction

The population is modified using operators, namely, crossover and mutation.

The whole process (evaluation, selection, and reproduction) is repeated for many generations, and finally we choose the bit string with the largest fitness value.

The string with the largest fitness value gives the optimal range of values for all the membership functions of the linguistic values.
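The evaluation, selection, and reproduction cycle can be sketched as below. Several details are assumptions made only for illustration and are not specified in the text: the selection cut-off is taken as the median fitness, crossover is single-point, the mutation rate is 0.02, and the best string is carried over unchanged to the next generation. The toy fitness function (the number of 1 bits) merely stands in for the fuzzy-system fitness.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length bit strings."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in bits)


def evolve(fitness_fn, generations=50, pop_size=10, length=30, rate=0.02, seed=0):
    """Return the fittest bit string found after the given number of generations."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        scores = [fitness_fn(s) for s in population]
        threshold = sorted(scores)[len(scores) // 2]  # assumed cut-off: median fitness
        selected = [s for s, f in zip(population, scores) if f >= threshold]
        children = [best]  # assumed elitism: keep the best string unchanged
        while len(children) < pop_size:
            a, b = rng.choice(selected), rng.choice(selected)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = children[:pop_size]
        best = max(population, key=fitness_fn)
    return best


def count_ones(bits):
    """Toy stand-in fitness: the number of 1 bits in the string."""
    return bits.count("1")


best_string = evolve(count_ones)
print(best_string, count_ones(best_string))
```

Because the best string is always retained in this sketch, its fitness cannot decrease from one generation to the next, which mirrors the reported trend of fitness growing with the number of generations.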

After the GA finds the optimal values for the membership functions of the Fuzzy Logic system, the rule evaluation and the defuzzification procedure of the fuzzy system will start.

4.3. Rule Evaluation

We need to find the degree of membership of the linguistic values of the input variables of the fuzzy system, in the range of 0 to 1. We use the triangular membership function to find the degree of membership for the input variables. As shown in Figure 1, the intervals from the initial value to the middle value and from the middle value to the final value form the range of values of a variable for a particular linguistic value.
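A triangular membership function of this kind can be sketched as follows. The parameter names `start`, `middle`, and `end` are illustrative labels for the initial, middle, and final values, and the sample range 0, 5, 10 is made up rather than taken from Table 8.

```python
def triangular(x, start, middle, end):
    """Degree of membership of x in a triangle rising from start to middle, then falling to end."""
    if not (start <= middle <= end):
        raise ValueError("expected start <= middle <= end")
    if x == middle:
        return 1.0  # peak of the triangle (also covers shoulder-shaped cases)
    if x <= start or x >= end:
        return 0.0  # outside the support of the membership function
    if x < middle:
        return (x - start) / (middle - start)  # rising edge
    return (end - x) / (end - middle)          # falling edge


print(triangular(2.5, 0.0, 5.0, 10.0))
print(triangular(5.0, 0.0, 5.0, 10.0))
print(triangular(12.0, 0.0, 5.0, 10.0))
```

The returned degree is always between 0 and 1, reaching 1 at the middle value and falling to 0 at and beyond the initial and final values.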

4.4. Defuzzification

To find the crisp or actual output which is either no transition, gradual, or abrupt, we calculate the weights of the set of rules of the fuzzy system using the degree of membership.

Finally, we can calculate the crisp output by using (4).
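The defuzzification step can be illustrated with the sketch below. Two assumptions are made that the text does not confirm: the weight of a rule is taken as the minimum of the degrees of membership of its antecedents (a common fuzzy AND), and the crisp output is taken as the weighted average of the preset rule outputs (0, 5, or 10). The function names and the sample numbers are hypothetical.

```python
def rule_weight(degrees):
    """Combine the antecedent degrees of one rule with min (an assumed fuzzy AND)."""
    return min(degrees)


def crisp_output(weights, rule_outputs):
    """Weighted average of the preset rule outputs; None if no rule fired."""
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * c for w, c in zip(weights, rule_outputs)) / total


# Made-up example: two rules fire, one with a gradual output (5), one abrupt (10).
weights = [rule_weight([0.2, 0.9, 0.6]), rule_weight([0.1, 0.4, 0.8])]
print(weights)
print(crisp_output(weights, [5, 10]))
```

Under these assumptions, when every rule that fires shares the same preset output, the crisp output equals that value, so the result falls on 0, 5, or 10 exactly in such cases.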

5. Experimental Results and Discussion

5.1. Dataset

The TRECVID 2001 video dataset for shot boundary detection is used for the experiments. TRECVID provides a set of video test data in MPEG compressed format for video segmentation. The TRECVID 2001 test video data is available from the Open Video Project. The details of the videos are given in Table 5.

5.2. Discussion

For discussion of the proposed system, two videos from TRECVID 2001, namely, Airline Safety (D5) and Perseus Global Watcher (D6), are used. Table 7 shows the strings of the first generation of the GA operation with their decimal values, base values, value ranges of the membership functions, and fitness values. The strings are sorted according to their fitness value. The fitness is calculated from the difference between the actual outputs of some input data, as shown in Table 6, and the crisp outputs of the same input data calculated using the membership functions optimized by GA. Table 8 shows the string with the largest fitness value in different generations. We can see from the table that the fitness increases as the number of generations increases.

Figures 2 and 3 show the graphs of shot boundary detection for the two videos by our Fuzzy-GA system. The x-axis represents the iteration/generation of the GA operations. The y-axis represents the gradual and abrupt transitions of the video frames detected by our Fuzzy-GA application. We can see from the graphs that the number of detected transitions increases as the number of iterations/generations increases.

In Figure 2, it is observed that, using the range of membership function values obtained at 50000 (5K) iterations/generations of the GA optimization given in Table 8, our proposed system detects 20 gradual transitions and 44 abrupt transitions. The actual numbers of gradual and abrupt transitions in the video are 26 and 45, respectively, as given in Table 5.

In Figure 3, at 40000 (40K) iterations/generations, our proposed system detects 40 gradual transitions and 38 abrupt transitions, out of the actual 45 gradual transitions and 40 abrupt transitions given in Table 5.

Figures 4(a), 4(b), and 4(c) show three frames with abrupt transition of a video. The frame numbers of Figures 4(a), 4(b), and 4(c) are 6359, 6360, and 6361, respectively. The values of the input variables of the fuzzy system of this abrupt transition of the frames are as follows:(1), (2) , and (3) .

Using the membership function value range of 10000 generation shown in Table 8, we then find the degree of membership of the linguistic values of the input variables present in the rules. We then calculate the weights of the set of rules using the degrees of membership. The weights of the 28 rules starting from rule number 0 are 0, 0, 0, 0, 0.0355, 0.0079, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, respectively. Finally, using the weights, we calculate the crisp output. crisp output = , which indicates abrupt transition.

Figures 5(a), 5(b), and 5(c) show frames with a gradual transition of the video (a dissolve in this case). The frame numbers of Figures 5(a), 5(b), and 5(c) are 4675, 4676, and 4677, respectively. The values of the input variables of the fuzzy system for this gradual transition of the frames are as follows: (1), (2), and (3).

Using the membership function value range of 10000 generations shown in Table 8, we then find the degree of membership of the linguistic values of the input variables present in the rules. We then calculate the weights of the set of rules using the degrees of membership. The weights of the 28 rules starting from rule number 0 are 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2089, 0.0514, 0, 0, 0.1107, 0, 0, 0, 0, 0, 0, 0, 0, 0. Finally, using the weights, we calculate the crisp output. crisp output = , which indicates gradual transition.

Similarly, Figures 6(a), 6(b), and 6(c) show another gradual transition (i.e., a fade transition) which occurs between frames 4, 5, and 6, respectively. The values of the input variables of the fuzzy system for this gradual transition of the frames are as follows: (1), (2), and (3).

Using the membership function value range of 10000 generations shown in Table 8, we then find the degree of membership of the linguistic values of the input variables present in the rules. We then calculate the weights of the set of rules using the degrees of membership. The weights of the 28 rules starting from rule number 0 are 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2939, 0.0829, 0, 0.2464, 0.0639, 0, 0, 0, 0, 0, 0, 0, 0, 0. Finally, using the weights, we calculate the crisp output. crisp output = , which indicates a gradual transition.

A pictorial representation of the fuzzy membership functions for inputs and output using the bases of 40K iterations or generations of the Genetic Algorithm is shown in Figure 7.

5.3. Evaluation

Recall, precision, and the F1 parameter are used to evaluate the proposed system. The proposed system is compared with the latest techniques, SBD using SVD and pattern matching [10] and SBD using Color Feature [9], and shows better performance in terms of the F1 parameter. A comparison of the computational time is also provided in Table 9.

The computational time of the proposed system for all the videos in Table 5 is provided in Table 10. For each iteration/generation, the computational time includes the approximate time taken in seconds by the GA process, feature extraction, and the shot detection of the proposed system for all the videos.

In Table 11, recall, precision, and the F1 parameter are represented by R, P, and F1, respectively.
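For reference, the evaluation measures can be computed as in the sketch below. It assumes the definitions commonly used in shot boundary detection: recall is the fraction of actual transitions that are correctly detected, precision is the fraction of detections that are correct, and F1 is their harmonic mean. The function name `evaluate` and the counts in the example are hypothetical and are not results from the paper.

```python
def evaluate(correct, missed, false_alarms):
    """Return (recall, precision, f1) from counts of correct, missed, and false detections."""
    actual = correct + missed          # number of true transitions
    detected = correct + false_alarms  # number of reported transitions
    recall = correct / actual if actual else 0.0
    precision = correct / detected if detected else 0.0
    denominator = recall + precision
    f1 = 2 * recall * precision / denominator if denominator else 0.0
    return recall, precision, f1


# Hypothetical counts: 8 correct detections, 2 missed transitions, 2 false alarms.
recall, precision, f1 = evaluate(correct=8, missed=2, false_alarms=2)
print(round(recall, 3), round(precision, 3), round(f1, 3))
```

Since F1 combines recall and precision in a single value, it penalizes a detector that achieves high recall only by reporting many false transitions.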

6. Conclusion

This paper proposes a shot boundary detection system using Genetic Algorithm and Fuzzy Logic. In the proposed system, GA is used as an optimizer for the fuzzy system. The GA uses preobserved actual input and output values of shot boundaries for some videos to calculate the range of the fuzzy membership values for the fuzzy system. The fuzzy system, with its membership functions optimized by GA, is used as a classifier which classifies the frames into abrupt and gradual transitions. The normalized Color Histogram Difference is used for feature extraction and for finding the differences between two consecutive frames in a video. From the experimental results, it is observed that the detection of shot boundaries increases with the number of iterations or generations of the GA optimization process. Experimental results also show that the proposed system yields better results and lower computational time compared with the latest techniques.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.