Abstract

We use a new method based on discrete fuzzy transforms for coding/decoding frames of color videos in which the GOP sequences are determined dynamically. Frames are classified into intraframes, predictive frames, and bidirectional frames, and we consider particular frames, called Δ-frames (resp., R-frames), for coding P-frames (resp., B-frames) by using two similarity measures based on the Łukasiewicz t-norm; moreover, a preprocessing phase is proposed to determine the similarity thresholds used for classifying the above types of frame. The proposed method provides acceptable results in terms of quality of the reconstructed videos, to a certain extent comparable with those of the classical F-transform method and of the standard MPEG-4.

1. Introduction

A video can be considered as a sequence of frames of the same size; a frame is an image that can be compressed by using a lossy compression method. We can classify each frame as an intraframe (for short, I-frame), a predictive frame (for short, P-frame), or a bidirectional frame (for short, B-frame), the last being more compressible than an I-frame. A B-frame can be predicted or interpolated from an earlier and/or a later frame. In order to avoid a growing propagation error, a B-frame is not used as a reference for further predictions in most encoding standards, except in AVC [1]. A frame can be considered as a P-frame if it is “similar” to the previous I-frame in the frame sequence; otherwise, it must be considered as a new I-frame. This similarity relation between a P-frame and the previous I-frame is fundamental in video-compression processes because a P-frame has pixel values very close to those of the previous I-frame. This suggests defining a frame containing the differences between a P-frame and the previous I-frame, called Δ-frame, which has a low quantity of information and hence can be coded with a low compression rate. A P-frame is decoded via the previous I-frame and the Δ-frame. In the MPEG-4 method [2, 3], which adopts the JPEG technique [4] for coding/decoding frames, the I-frames, P-frames, and B-frames are arranged in a Group of Pictures (for short, GOP) sequence. A B-frame is reconstructed by using the previous and/or the successive reference frame. Here the results of [5] are improved by using a technique based on F-transforms for coding B-frames. For convenience, we assume that the first frame of a video is an I-frame. We assign an ID number to each frame of the video. Then we say that a frame is a B-frame or a P-frame if it is “very similar” to the previous I-frame, in the sense that their similarity, a parameter defined via the Łukasiewicz t-norm (see formula (12)), is greater than a threshold value [5]; otherwise the frame is assumed to be a new I-frame, that is, the first frame of the successive GOP sequence.

The first algorithm is used for determining the GOP sequences; the second algorithm is used for determining whether each frame of a GOP sequence is a P-frame or a B-frame. The first frame of a GOP sequence is always an I-frame and the last frame is a P-frame. The function “analyze GOP sequence (ID1, ID2)” reported in Algorithm 1 describes this process, where ID1 is the ID of the first I-frame and ID2 is the ID of the last P-frame in the GOP sequence. Algorithm 2 is then used for determining whether a frame strictly between the first and the last frame of the GOP sequence is a B-frame or a P-frame. We define a similarity threshold th and we compare each such frame with a new frame whose normalized pixels are the mean of those of the previous reference frame (I-frame or P-frame) and of the last P-frame, by obtaining a similarity value. In the variable NPMin we store the ID number of the first frame for which this similarity test fails (NPMin = ID2 if the test never fails). The variable containing the ID number of the previous I-frame or P-frame is initially set to ID1; the variable pointing to the last frame in the GOP sequence is set to ID2.

Algorithm 1 (analyze GOP sequence (ID1, ID2)). Pseudo-code for determining a GOP sequence:
(1) ID1 := ID of the first I-frame // initially, ID1 is the ID of the first frame of the video
(2) ID2 := number of frames // initially, ID2 is the ID of the last P-frame of the video
(3) i := ID1 + 1
(4) IF i ≤ ID2
(5) calculate the similarity S between the i-th frame and the ID1-th frame
(6) If S ≥ th, (a) the i-th frame is a B-frame or a P-frame and is inserted in the GOP sequence; (b) i := i + 1 and go to step (4)
(7) Else (a) analyze the GOP sequence (ID1, i − 1), whose last P-frame is the (i − 1)-th frame; (b) ID1 := i; (c) go to step (3)
(8) End.
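As a minimal illustration of Algorithm 1, the following Python sketch partitions a list of (already normalized) single-band frames into GOP sequences by comparing each frame with the first I-frame of the current GOP. The function similarity is a placeholder for formula (12) (here the mean of 1 − |a − b| over all pixels, which is only an assumed form), the threshold th is the one obtained in the preprocessing phase, and the recursive structure of the pseudo-code is replaced by a simple loop.

import numpy as np

def similarity(f1, f2):
    # Placeholder for formula (12): mean of 1 - |a - b| over all normalized
    # pixels (an assumed form based on the Lukasiewicz t-norm, cf. [5]).
    return float(np.mean(1.0 - np.abs(f1 - f2)))

def analyze_gop_sequences(frames, th):
    """Split a video into GOP sequences (ID of the first I-frame, ID of the last P-frame).

    frames: list of 2D numpy arrays with values normalized in [0, 1] (one band).
    th:     similarity threshold determined in the preprocessing phase.
    """
    gops, id1 = [], 0                  # id1 = ID of the I-frame opening the current GOP
    for i in range(1, len(frames)):
        if similarity(frames[i], frames[id1]) >= th:
            continue                   # the i-th frame stays in the GOP (B- or P-frame)
        gops.append((id1, i - 1))      # the (i - 1)-th frame is the last P-frame of the GOP
        id1 = i                        # the i-th frame opens a new GOP as its I-frame
    gops.append((id1, len(frames) - 1))
    return gops

The pairs returned by analyze_gop_sequences delimit the GOP sequences; as stated above, the frame following each last P-frame is the I-frame of the next GOP sequence.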

Algorithm 2. Pseudo-code for determining the type of the frames:
(1) ID1 := ID of the first frame of the GOP sequence
(2) ID2 := ID of the last P-frame of the GOP sequence
(3) NPMin := ID2
(4) FOR each i strictly between ID1 and ID2
(5) create a new frame whose normalized pixels are obtained as the mean between the normalized pixels of the ID1-th and ID2-th frames
(6) calculate the similarity S between the i-th frame and this new frame. If S ≥ th, (a) the i-th frame is a candidate B-frame; (b) else (c) NPMin := i and go to step (8)
(7) next for
(8) the frames strictly between the ID1-th and the NPMin-th frames are labelled as B-frames
(9) the NPMin-th frame is labelled as a P-frame
(10) If NPMin < ID2 then (a) ID1 := NPMin and NPMin := ID2; (b) go to step (4)
(11) End.
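Along the same lines, the following Python sketch summarizes Algorithm 2; it reuses the similarity helper of the previous sketch. Comparing each intermediate frame with the pixelwise mean of the two reference frames that enclose it is our reading of steps (5)-(6), so the sketch should be taken as an assumption rather than as the exact procedure of the paper.

def label_gop_frames(frames, id1, id2, th):
    """Label the frames id1..id2 of a GOP sequence as 'I', 'P', or 'B'."""
    labels = {id1: "I", id2: "P"}
    anchor = id1                                     # previous I-frame or P-frame
    while anchor < id2 - 1:
        np_min = id2                                 # default: all intermediate frames are B-frames
        mean_frame = (frames[anchor] + frames[id2]) / 2.0
        for i in range(anchor + 1, id2):
            if similarity(frames[i], mean_frame) >= th:
                labels[i] = "B"                      # close to the interpolated frame: B-frame
            else:
                np_min = i                           # first frame that is not similar enough
                break
        if np_min != id2:
            labels[np_min] = "P"                     # it becomes a new P-frame anchor
        anchor = np_min
    return labels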

In our approach we determine a GOP sequence at each step. The frame after the last P-frame is the I-frame of the new GOP sequence. After determining the GOP sequences of the color video, we use the F-transforms [5, 7–10] for compressing the frames. The F-transform compression method has been developed in [5]. In this paper each frame is converted into the YUV space. Indeed, since the human eye perceives an image mostly in the Y band (brightness) rather than in the U and V bands (chrominance), we can use a stronger compression for coding the image in the U and V bands with respect to the one used for coding the image in the Y band, without appreciable loss of quality in the reconstructed image. In [5] the authors show that the quality of the reconstructed images is better than the one obtained by using the F-transform method directly in the RGB space (see also [11, 12]). The proposed method is widely discussed in Section 4. In Sections 2 and 3 the theory of F-transforms and its application to image compression are recalled, respectively. In Section 5 the results obtained on a dataset of color videos are discussed.

2. Fuzzy Transforms

We recall from [9] some essential definitions. Let $n \ge 2$ and let $x_1 < x_2 < \cdots < x_n$ be points (nodes) of $[a,b]$ such that $x_1 = a$ and $x_n = b$. The fuzzy sets $A_1, \dots, A_n : [a,b] \to [0,1]$ form a fuzzy partition of $[a,b]$ if

$A_i(x_i) = 1$ for any $i = 1, \dots, n$;

$A_i(x) = 0$ if $x \notin (x_{i-1}, x_{i+1})$, where $x_0 = a$ and $x_{n+1} = b$, $i = 1, \dots, n$;

$A_i$ is a continuous function on $[a,b]$;

$A_i$ is strictly increasing on the interval $[x_{i-1}, x_i]$ for $i = 2, \dots, n$ and strictly decreasing on the interval $[x_i, x_{i+1}]$ for $i = 1, \dots, n-1$;

$\sum_{i=1}^{n} A_i(x) = 1$ for any $x \in [a,b]$.

We say that $A_1, \dots, A_n$ constitute a symmetric fuzzy partition if the following hold:

equidistance of the nodes, that is, $x_i = a + h \cdot (i-1)$ for $i = 1, \dots, n$, where $h = (b-a)/(n-1)$;

$A_i(x_i - x) = A_i(x_i + x)$ for any $x \in [0, h]$ and $i = 2, \dots, n-1$;

$A_{i+1}(x) = A_i(x - h)$ for any $x \in [x_i, x_{i+1}]$ and $i = 1, \dots, n-1$.

Considering functions taking values on a finite set $P = \{p_1, \dots, p_N\} \subseteq [a,b]$, $N \ge 2$, we suppose that $P$ is sufficiently dense with respect to a fuzzy partition $\{A_1, \dots, A_n\}$ of $[a,b]$, that is, $n \le N$ and for each $k = 1, \dots, n$ there exists an index $i \in \{1, \dots, N\}$ such that $A_k(p_i) > 0$. Now let $y_1, \dots, y_m$ be other assigned nodes of an interval $[c,d]$ such that $c = y_1 < \cdots < y_m = d$, and let $\{B_1, \dots, B_m\}$ be another fuzzy partition of $[c,d]$. Let $f$ be a function defined on the finite set $P \times Q$, with $P = \{p_1, \dots, p_N\} \subseteq [a,b]$ and $Q = \{q_1, \dots, q_M\} \subseteq [c,d]$, where $P$ (resp., $Q$) is sufficiently dense with respect to the fuzzy partition $\{A_1, \dots, A_n\}$ of $[a,b]$ (resp., $\{B_1, \dots, B_m\}$ of $[c,d]$). Then the matrix $[F_{kl}]$, $k = 1, \dots, n$ and $l = 1, \dots, m$, is the fuzzy matrix defined as the discrete F-transform of $f$ with respect to $\{A_1, \dots, A_n\}$ and $\{B_1, \dots, B_m\}$ if the following holds:
$$F_{kl} = \frac{\sum_{j=1}^{M} \sum_{i=1}^{N} f(p_i, q_j) \, A_k(p_i) \, B_l(q_j)}{\sum_{j=1}^{M} \sum_{i=1}^{N} A_k(p_i) \, B_l(q_j)}.$$
Afterwards we define the inverse F-transform of $f$ with respect to $\{A_1, \dots, A_n\}$ and $\{B_1, \dots, B_m\}$ as
$$f^{F}_{nm}(p_i, q_j) = \sum_{k=1}^{n} \sum_{l=1}^{m} F_{kl} \, A_k(p_i) \, B_l(q_j)$$
for every $i = 1, \dots, N$ and $j = 1, \dots, M$. The following theorem holds.
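A compact numerical sketch of the two definitions above may help. The helper below builds a uniform triangular partition (one admissible choice of symmetric fuzzy partition; the partitions actually used in the paper may differ) over the integer points 1, …, N, and computes the discrete F-transform and its inverse of a matrix f sampled on those points.

import numpy as np

def triangular_partition(n_points, n_nodes):
    """Uniform triangular fuzzy partition A_1..A_n of the points 1..n_points
    (n_nodes >= 2); returns an (n_nodes, n_points) matrix of membership values."""
    xs = np.arange(1, n_points + 1, dtype=float)
    h = (n_points - 1) / (n_nodes - 1)
    nodes = 1.0 + h * np.arange(n_nodes)
    A = np.maximum(0.0, 1.0 - np.abs(xs[None, :] - nodes[:, None]) / h)
    return A                                # the memberships sum to 1 at every point

def f_transform(f, A, B):
    """Discrete F-transform F[k, l] of the matrix f with respect to the partitions A and B."""
    num = A @ f @ B.T                       # sum_{i,j} f(i, j) A_k(i) B_l(j)
    den = A.sum(axis=1)[:, None] * B.sum(axis=1)[None, :]
    return num / den

def inverse_f_transform(F, A, B):
    """Inverse F-transform: f^F(i, j) = sum_{k,l} F[k, l] A_k(i) B_l(j)."""
    return A.T @ F @ B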

Theorem 3. Let $f : P \times Q \to [0,1]$ be a function assigned on the finite set $P \times Q$. Then for every $\varepsilon > 0$, there exist two integers $n(\varepsilon)$, $m(\varepsilon)$, with $n(\varepsilon) \le N$ and $m(\varepsilon) \le M$, and some fuzzy partitions $\{A_1, \dots, A_{n(\varepsilon)}\}$ of $[a,b]$ and $\{B_1, \dots, B_{m(\varepsilon)}\}$ of $[c,d]$ for which $P$ and $Q$ are sufficiently dense with respect to these partitions, respectively, and such that the following inequality holds for every $i = 1, \dots, N$, $j = 1, \dots, M$:
$$\bigl|f(p_i, q_j) - f^{F}_{n(\varepsilon) m(\varepsilon)}(p_i, q_j)\bigr| < \varepsilon.$$

3. The Coding/Decoding Process

Let $R$ be an image of sizes $N \times M$, considered as a fuzzy relation $R : \{1, \dots, N\} \times \{1, \dots, M\} \to [0,1]$; that is, $R(i,j) \in [0,1]$ is the normalized value of the pixel $(i,j)$ with respect to the length of the scale used. For simplicity, let $p_i = i$, $q_j = j$, $[a,b] = [1, N]$, and $[c,d] = [1, M]$. Following [8], $R$ is subdivided into submatrices $R_B$ of sizes $N(B) \times M(B)$, called blocks, coded to matrices $F_B$ of sizes $n(B) \times m(B)$, with $n(B) < N(B)$ and $m(B) < M(B)$, via the following discrete F-transform: let the fuzzy sets $A_1, \dots, A_{n(B)}$ and $B_1, \dots, B_{m(B)}$ form fuzzy partitions of $[1, N(B)]$ and $[1, M(B)]$, respectively; then, for every $k = 1, \dots, n(B)$ and $l = 1, \dots, m(B)$,
$$F_B(k,l) = \frac{\sum_{j=1}^{M(B)} \sum_{i=1}^{N(B)} R_B(i,j) \, A_k(i) \, B_l(j)}{\sum_{j=1}^{M(B)} \sum_{i=1}^{N(B)} A_k(i) \, B_l(j)},$$
and we decode $F_B$ via the inverse F-transform $R^{F}_B$ defined as
$$R^{F}_B(i,j) = \sum_{k=1}^{n(B)} \sum_{l=1}^{m(B)} F_B(k,l) \, A_k(i) \, B_l(j),$$
which approximates $R_B$ in the sense of Theorem 3; that is, for every $\varepsilon > 0$ there exist two integers $n(B)$, $m(B)$ such that the following holds for every $i = 1, \dots, N(B)$ and $j = 1, \dots, M(B)$:
$$\bigl|R_B(i,j) - R^{F}_B(i,j)\bigr| < \varepsilon.$$
Unfortunately the previous theorem does not suggest a method for finding such integers, and then we try to assign values to $n(B)$ and $m(B)$ for getting compression rates given by
$$\rho(B) = \frac{n(B) \cdot m(B)}{N(B) \cdot M(B)},$$
which are useful to code any original block $R_B$. The recomposition of the decoded blocks $R^{F}_B$ gives the image $R^{F}$ whose PSNR with respect to the original image $R$ is calculated via the following well-known formula:
$$\mathrm{PSNR} = 20 \log_{10} \frac{L}{\mathrm{RMSE}}, \qquad \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{M} \bigl(R(i,j) - R^{F}(i,j)\bigr)^2}{N \cdot M}},$$
where $L$ is the length of the scale used. In accordance with [8], in the proposed experiments the best results are obtained with symmetric fuzzy partitions $\{A_1, \dots, A_{n(B)}\}$ and $\{B_1, \dots, B_{m(B)}\}$ built on equidistant nodes of $[1, N(B)]$ and $[1, M(B)]$, respectively.
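As an illustration of the block-wise scheme just described, the sketch below codes/decodes a normalized single-band image using the helpers defined after the F-transform definitions in Section 2, and computes the PSNR of the reconstruction. The block size 8 and the coded size 4 are illustrative choices only (giving ρ(B) = 16/64 = 0.25), not the experimental settings of the paper, and for simplicity the image sides are assumed to be multiples of the block size.

import numpy as np

def compress_image(R, block=8, coded=4):
    """Code/decode a normalized image R block-by-block via F-transforms;
    the compression rate of each block is (coded * coded) / (block * block)."""
    N, M = R.shape
    A = triangular_partition(block, coded)       # partition of the block rows
    B = triangular_partition(block, coded)       # partition of the block columns
    out = np.zeros_like(R)
    for i in range(0, N, block):
        for j in range(0, M, block):
            F = f_transform(R[i:i + block, j:j + block], A, B)
            out[i:i + block, j:j + block] = inverse_f_transform(F, A, B)
    return out

def psnr(R, R_dec, peak=1.0):
    """PSNR of the decoded image with respect to the original (pixels in [0, peak])."""
    rmse = np.sqrt(np.mean((R - R_dec) ** 2))
    return 20.0 * np.log10(peak / rmse)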

4. Our Proposal

The proposed process includes the following steps: (1) each color frame, seen as a fuzzy relation, is converted from the RGB space to the YUV space; (2) a classification of the frames is made via the previous algorithms; (3) the compression rate $\rho_I$ of the I-frames is the mean of the three (possibly different) compression rates used in the three bands; that is, if any block of an I-frame has sizes $N(B) \times M(B)$ (say) in the Y band and is coded to a block of sizes $n(B) \times m(B)$ (say), the related compression rate is given by $\rho_I(Y) = \frac{n(B) \cdot m(B)}{N(B) \cdot M(B)}$, and the symbols $\rho_I(U)$, $\rho_I(V)$ have an analogous meaning. Of course we have $\rho_I = \frac{\rho_I(Y) + \rho_I(U) + \rho_I(V)}{3}$ (a numerical example is sketched after this list). A similar meaning can be given to $\rho_\Delta$ (resp., $\rho_R$) for Δ-frames (resp., R-frames).
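For instance, with the purely illustrative block sizes below (assumptions made for this sketch, not the settings adopted in Section 5), the mean compression rate of the I-frames is computed as follows.

# Hypothetical block sizes: 8x8 -> 4x4 in the Y band, 8x8 -> 2x2 in the U and V bands.
rho_I_Y = (4 * 4) / (8 * 8)              # 0.25
rho_I_U = rho_I_V = (2 * 2) / (8 * 8)    # 0.0625 (stronger compression on the chrominance bands)
rho_I = (rho_I_Y + rho_I_U + rho_I_V) / 3
print(rho_I)                             # 0.125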

A color image in the RGB space, with pixels normalized in $[0,1]$, is converted to the YUV space via the formula given in [5]. Since no misunderstanding can arise, a frame is denoted by a capital letter instead of its ID number in the sequence of a video. In step (2), the similarity measure adopted in [5] is used for classifying the type of frame. It is based on the Łukasiewicz t-norm between two frames with normalized pixel values (formula (12)); the similarity is computed separately in each band. The authors of [5] have shown that the Łukasiewicz t-norm provides the best results with respect to other t-norms such as the classical minimum and the arithmetical product. For convenience, we assume that the first frame of a video is an I-frame. For determining a GOP sequence in a single band, it can be verified whether the successive frame is a B-frame or a P-frame, that is, whether it is “very similar” to the preceding I-frame in the sense that their similarity is not below a prefixed threshold value; otherwise it is assumed to be a new I-frame. We determine a GOP sequence in an assigned band using (12) with the following process (see also the sketch after this paragraph): (1) we consider the first frame $I$ as an I-frame; (2) we compare $I$ with the successive frame $F$; (3) if the similarity between $I$ and $F$ is not below the threshold, the frame $F$ is a B-frame or a P-frame and is enclosed in the GOP sequence; then we consider the successive frame and go to step (2); otherwise $F$ is a new I-frame, the previous frame is a P-frame, and it represents the last frame of the GOP sequence.
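A short Python sketch of the two ingredients of this step follows: the RGB→YUV conversion, for which the standard BT.601 analog coefficients are assumed here since the formula of [5] is not reproduced above, and a similarity measure obtained by averaging the quantity 1 − |a − b| over all pixels, which is only our assumed explicit form of formula (12).

import numpy as np

def rgb_to_yuv(rgb):
    """Convert an (N, M, 3) RGB image, normalized in [0, 1], to the YUV space.
    The BT.601 analog coefficients are assumed here for illustration."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return np.stack([y, u, v], axis=-1)

def lukasiewicz_similarity(f1, f2):
    """Similarity between two single-band frames with normalized pixels:
    the mean of 1 - |f1 - f2|, related to the Lukasiewicz t-norm
    (our assumed reading of formula (12))."""
    return float(np.mean(1.0 - np.abs(f1 - f2)))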

After determining the GOP sequence, we check whether each frame of the sequence is a B-frame or a P-frame by using the previous algorithms. In step (3) we finally compress the frames. In order to reduce the mean compression rate for a P-frame, in [5] and the references therein the authors introduce a “difference” frame $D$, called Δ-frame, between a P-frame $P$ and the previous I-frame $I$, defined pixelwise from the normalized difference between $P$ and $I$ (formula (13)). The usage of the Δ-frame has the advantage of allowing a stronger compression for the P-frames with respect to the I-frames; indeed a P-frame has pixel values very close to those of the previous I-frame, hence the Δ-frame in (13) has a low quantity of information and it can be coded with a low compression rate. Then, if $I'$ and $D'$ are the frames obtained after coding/decoding $I$ and $D$, the frame $P'$ (reconstruction of the frame $P$) is deduced from the membership values of $I'$ and $D'$ by inverting (13) (formula (14)). Now we present a new schema for coding/decoding a B-frame $B$ which is inserted in a GOP between an I-frame $I$ and a P-frame $P$. We consider a frame $R$, defined pixelwise from the normalized difference between $B$ and the mean of $I$ and $P$ (formula (15)), and we code it. Let $R'$ be the frame obtained after decoding $R$. All the coding/decoding processes are realized via the F-transforms with the symmetric fuzzy partitions given in Section 3. We reconstruct the B-frame, say $B'$, by combining the membership values of $I'$, $P'$, and $R'$ (formula (16)). We use formulas (14) and (16) for reconstructing the P-frames and the B-frames of the videos, respectively; a sketch of both reconstructions is given after this paragraph. In accordance with [5], each decoded image is converted back to the RGB space. For simplicity of presentation, in our tests we adopt fixed compression rates in all the bands (cf. Section 5). In [5] a preprocessing phase is adopted for determining the similarity threshold, calculated with the following steps: (1) the initial frame is considered as an I-frame and is compressed in each band with compression rate $\rho_I$; each successive frame is treated as a P-frame and we archive the similarity value calculated with formula (12); we compress the Δ-frame in each band with compression rate $\rho_\Delta$ (less than $\rho_I$) and, if $D'$ is the related decompressed frame, we derive the reconstructed P-frame via (14); (2) each P-frame is also coded directly in each band with compression rate $\rho_I$ by using the F-transforms; then we determine the difference diff(PSNR) between the PSNR of the two reconstructions; (3) the trend of diff(PSNR) is plotted with respect to the similarity in each band of the image; as similarity threshold, we assume that value of the similarity beyond which diff(PSNR) does not exceed a prefixed limit, here equal to 3 (cf. [5] for details); (4) the threshold so determined is then associated with the first I-frame of the GOP sequence. In addition, in our tests we fix the parameters used in the preprocessing phase.
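The following Python sketch summarizes the coding/decoding of P-frames and B-frames in a single band. The normalization of the Δ-frame and of the R-frame into [0, 1] (adding 1 and halving the difference) is an assumption made so that these frames can be treated as fuzzy relations; formulas (13)-(16) of the paper define the exact forms. The function code_decode stands for the block-wise F-transform coding/decoding of Section 3 (e.g., compress_image above).

import numpy as np

def code_decode(frame):
    # Placeholder for the block-wise F-transform coding/decoding of Section 3.
    return frame  # identity here; in practice compress_image(frame) would be used

def reconstruct_p_frame(I, P):
    """Code a P-frame through its Delta-frame and rebuild it from the decoded I-frame."""
    delta = (P - I + 1.0) / 2.0                  # assumed normalization, cf. formula (13)
    I_dec, delta_dec = code_decode(I), code_decode(delta)
    P_rec = I_dec + 2.0 * delta_dec - 1.0        # assumed inversion, cf. formula (14)
    return np.clip(P_rec, 0.0, 1.0)

def reconstruct_b_frame(I, P, B):
    """Code a B-frame through its R-frame and rebuild it from the decoded references."""
    mean_ref = (I + P) / 2.0
    R = (B - mean_ref + 1.0) / 2.0               # assumed normalization, cf. formula (15)
    I_dec, P_dec, R_dec = code_decode(I), code_decode(P), code_decode(R)
    B_rec = (I_dec + P_dec) / 2.0 + 2.0 * R_dec - 1.0   # cf. formula (16)
    return np.clip(B_rec, 0.0, 1.0)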

5. The Results

For brevity of discussion, we show the results obtained for the color video “tennis2” [6]. We present all the results obtained by assuming fixed compression rates $\rho_I$ for the I-frames, $\rho_\Delta$ for the Δ-frames, and $\rho_R$ for the R-frames. Figures 1(a)–1(d) show the first frame of the video and the corresponding single-band images in the YUV space, respectively. As an example of diff(PSNR), Figure 2 contains the plots of diff(PSNR) ≤ 3 for the similarity values obtained in the Y, U, and V bands, for which we choose the threshold (as average). As examples, we show some Δ-frames and R-frames in each band.

(i) Y Band. The first P-frame is given by the fourth frame. Figure 3(a) contains the Δ-frame obtained by using (13) from the fourth frame and the first frame (an I-frame). The second and the third frames are B-frames. Figure 3(b) (resp., Figure 3(c)) shows the R-frame obtained by using (15) from the second (resp., third) frame, the first frame (an I-frame), and the fourth frame (a P-frame).

(ii) U Band. The first P-frame is given by the sixth frame. Figure 4(a) contains the Δ-frame obtained by using (13) from the sixth frame and the first frame (an I-frame). The frames 2, 3, and 4 are B-frames. Figures 4(b)–4(d) show the R-frames obtained by using (15) from the first frame (an I-frame), the B-frames 2, 3, and 4, and the sixth frame (a P-frame), respectively.

(iii) V Band. The first P-frame is given by the fifth frame. Figure 5(a) contains the Δ-frame obtained by using (13) from the fifth frame and the first frame (an I-frame). The frames 2, 3, and 4 are B-frames. Figures 5(b)–5(d) show the R-frames obtained by using (15) from the first frame (an I-frame), the B-frames 2, 3, and 4, and the fifth frame (a P-frame), respectively.

All the results obtained for the video “tennis2” are summarized in Table 1.

Figures 6(a)–6(c) contain Frame 2 decoded with the proposed method, the classical F-transform method, and MPEG-4, respectively.

In Table 2 we report the final PSNR values obtained with the three methods.

6. Conclusions

We present a new method for coding/decoding color videos in which each frame is classified as an I-frame, a P-frame, or a B-frame by using similarity measures for determining the GOP sequences and the types of frame. For similar mean compression rates, our method is, to a certain extent, comparable with the classical F-transform method and MPEG-4 in terms of the quality of the reconstructed videos.

Acknowledgments

The authors thank the referees and the editor whose suggestions have greatly improved the contents of this paper.