EURASIP Journal on Advances in Signal Processing
Volume 2008 (2008), Article ID 460913, 8 pages
doi:10.1155/2008/460913
Research Article

Motion Entropy Feature and Its Applications to Event-Based Segmentation of Sports Video

1Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
2Department of Electrical and Computer Engineering, University of Wisconsin Madison, Madison, WI 53706, USA

Received 17 January 2008; Revised 8 May 2008; Accepted 14 July 2008

Academic Editor: Mark Liao

Copyright © 2008 Chen-Yu Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

An entropy-based criterion is proposed to characterize the pattern and intensity of object motion in a video sequence as a function of time. By applying a homoscedastic error model-based time series change point detection algorithm to this motion entropy curve, one is able to segment the corresponding video sequence into individual sections, each consisting of a semantically relevant event. The proposed method is tested on six hours of sports videos including basketball, soccer, and tennis. Excellent experimental results are observed.

1. Introduction

Fully automated video skimming and summarization are actively pursued research topics in the field of content-based video analysis. They allow operators to quickly scan surveillance videos to spot events of interest, and viewers to efficiently browse large collections of long video clips. They may also facilitate content-based multimedia authoring and content creation. Sports videos in particular stand to benefit from automated video summarization technology. Most sports games are naturally organized into successive, alternating plays of offence and defence, culminating in events such as a goal (or an attempt at one) or a hit. If a sports video can be segmented according to these semantically meaningful events, it can then be used in numerous applications to enhance their value and enrich the users’ viewing experience.

A number of content-based analysis techniques have been proposed to analyze a particular type of sports video [1–7]. Gong et al. [1] adopted domain knowledge to parse the content of soccer video programs based on four kinds of components: a soccer court, a ball, the players, and the motion vectors. Image content analysis and panoramic reconstruction techniques are proposed in [2] to automatically detect and extract soccer highlights. Sadlier and O'Connor [3] proposed an audio-visual feature-based framework for event detection in broadcast video of multiple field sports. The framework comprises crowd image detection, speech-band audio activity detection, on-screen graphic tracking, and a motion activity measure. Ye et al. [4] proposed a method for exciting event detection in broadcast soccer video based on mid-level visual description and incremental SVM learning. A retrieval system is investigated in [5] that treats the spatiotemporal behavior of an object in the footage as the embodiment of a particular semantic event. Shih and Huang [6] introduced a content-based multifunctional video retrieval (MFVR) system in which content analysis is carried out based on different content semantics. In [7], Liu et al. presented several feature extraction methods, including wavelet-based motion analysis, a hybrid field-color model, and prior knowledge-driven line detection; a boosting chain is then used for feature selection and decision making for play detection in American football video. In these earlier investigations, domain-specific, heuristic, ad hoc rules are proposed to detect specific scenes or objects in video of a specific type of sport. It is unclear whether these rules can easily be generalized to other types of sports.

Motion is perhaps the most significant feature of most sports videos. Dominant motion estimation for video clips has been analyzed in several previous works [8–10]. The results led to the development of motion estimation models for interpreting motion activities in video. Liu et al. [11] devised a perceived motion energy (PME) model for the purpose of video key frame extraction. The PME value is defined as the motion magnitude between successive frames along the dominant motion direction. However, the authors also observed that apparent camera motion may cause excessive false alarms.

For many professionally produced sports videos, different sports events are often associated with different motion patterns. For example, scenes of a slam dunk in a basketball game are often quite similar, not only within the same game but also across different games played by different teams. This may be due to camera positions as well as common production practice in the studio. Based on this observation, we are motivated to investigate the feasibility of using motion pattern and intensity to characterize potentially semantically significant events in a sports video.

Previously, it has been observed [12] that global camera motion often causes excessive false alarms when the dominant motion value is used as a feature to analyze a sports video. In this work, we define an entropy-based motion value to alleviate this problem. Conceptually, motion vectors due to global camera motion are quite regular, as opposed to motion vectors due to complex object motion in a sports event. We therefore propose an entropy measure that simultaneously characterizes both the magnitudes and the directions of motion vectors. Regular motion vectors are expected to yield a lower entropy value, while irregular motion vectors yield a higher one. Hence, by incorporating an entropy-based criterion, it becomes possible to distinguish genuine high-motion sports events from global camera motion.

The entropy-based motion value feature yields a one-dimensional time function corresponding to the original time-indexed video sequence. Using a maximum-likelihood estimation method based on the homoscedastic error model [13], this time function is approximated by a series of piece-wise linear segments joined at change points. By examining this approximated motion entropy function against the original sports video, it is observed that semantically meaningful sports events are often highly synchronized with certain change patterns, by which the video sequence can be divided.

In the remainder of this paper, the motion entropy measure is presented in Section 2. This is followed by a short discussion in Section 3 of applying a change point detection method to break the motion entropy time function into a piece-wise linear approximation. Significant sports event detection is described in Section 4, and experiments using six hours of video from three different kinds of sports are reported in Section 5.

2. Entropy-Based Motion Analysis

As discussed above, previously, dominant motion feature has been used to aid key-frame identification in sport videos [11]. However, it has also been observed that global camera motion may be confused with genuine sports events when dominant motion feature is used. In this work, a motion entropy feature is proposed that is both effective in rejecting false alarms and efficient in computation.

The notion of motion entropy has been used for video watermarking [14]. In [15], an entropy measure defined on angle distributions of motion vectors is used to compute a global motion ratio, which then is used to calculate the perceived motion energy [11].

In Figure 1, motion vectors of two video frames, representing (a) nonsignificant and (b) significant sports events are shown to the right of corresponding frames. It can be observed that motion vectors of the frame containing more significant events (b) are much less regular than those of a nonsignificant frame (a). This motivates our development of an entropy-based motion value to characterize the randomness of motion vectors.

Figure 1: Motion vectors obtained from uncompressed domain by full-search motion estimation approach: (a) a nonsignificant event, (b) a significant event.

Toward this goal, we exploit the statistical distribution of motion vectors within a video frame along different directions to gauge their degree of regularity. To do so, we equally divide the angles from $0$ to $2\pi$ into AngNum subangles. In this work, the value of AngNum is chosen based on extensive experimentation. Let $p_i$ be the fraction of motion vectors whose directions fall within the $i$th subangle. We define the motion directivity entropy along the $i$th subangle as
$$E_i = -p_i \log p_i. \tag{1}$$
Furthermore, we denote by $M_i$ the sum of the magnitudes of all motion vectors whose directions fall within the $i$th subangle, and compute a magnitude weight factor
$$w_i = \frac{M_i}{\sum_{j=1}^{\mathrm{AngNum}} M_j}. \tag{2}$$
Given $E_i$ and $w_i$, we propose a novel motion entropy measure, called the entropy motion value (EMV), for a given video frame as follows:
$$\mathrm{EMV} = \sum_{i=1}^{\mathrm{AngNum}} w_i E_i. \tag{3}$$
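The per-frame computation described above can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in the experiments. It assumes that the directivity entropy of a subangle is $-p \log_2 p$, that the magnitude weight is the subangle's share of the total motion magnitude, and that the EMV is the weighted sum of the entropies. The function name `entropy_motion_value`, the default `ang_num=8`, the base-2 logarithm, and the skipping of zero-length vectors are all assumptions made for the example.

```python
import math


def entropy_motion_value(vectors, ang_num=8):
    """Hedged sketch of an entropy motion value for one frame.

    vectors: iterable of (dx, dy) motion vectors.
    ang_num: number of equal subangles over [0, 2*pi); the default of 8
             is an illustrative assumption, not the paper's setting.
    """
    counts = [0] * ang_num      # number of vectors per subangle
    mags = [0.0] * ang_num      # summed magnitudes per subangle
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # zero vectors have no direction; skipped (assumption)
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        idx = min(int(angle / (2.0 * math.pi) * ang_num), ang_num - 1)
        counts[idx] += 1
        mags[idx] += mag

    total = sum(counts)
    if total == 0:
        return 0.0
    total_mag = sum(mags)

    emv = 0.0
    for count, mag_sum in zip(counts, mags):
        if count == 0:
            continue  # an empty subangle contributes nothing
        p = count / total                    # fraction of vectors in subangle
        entropy = -p * math.log2(p)          # assumed directivity entropy
        weight = mag_sum / total_mag         # assumed magnitude weight
        emv += weight * entropy
    return emv
```

Under these assumptions, a frame whose vectors all point in the same direction (as in a camera pan) yields an EMV of zero, while vectors spread evenly over several directions yield a larger value.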

Previously, the perceived motion energy proposed in [11] also utilizes the magnitudes and angle of motion vectors to characterize the sports contents in a video frame. However, by incorporating the motion directivity entropy in the formula, EMV is insensitive to global motion such as camera panning. The use of motion vector magnitude as a weight to the directivity entropy further improves the performance.

In Figure 2(a), both the EMV and PME values corresponding to the same sports video sequence are plotted against time. Since each sports scene usually lasts more than one second, it is sufficient to calculate the EMV and PME values once per second. The horizontal and vertical axes represent the second index in a video sequence and the corresponding normalized motion value, respectively. The figure shows that, while most of the values are approximately the same, there are a couple of notable differences. (i) Near time index 400, the EMV value (~60) is markedly larger than the PME value (~40). A close examination of the soccer video reveals that it corresponds to an attacking-goal event (cf. Figure 2(b)). (ii) Around time index 1400, the EMV value (~40) is much smaller than the corresponding PME value (~80). It turns out that this scene (cf. Figure 2(c)) corresponds to a closeup shot of players walking around aimlessly and hence is not a semantically significant sports event. Both instances demonstrate the potential advantage of the proposed EMV measure over the previously proposed PME measure in capturing semantically significant sports events.

Figure 2: (a) An example of the EMV and PME [11] curves generated by the same sports video sequence, (b) attacking event only detected by EMV feature, (c) non-event (false) only detected by PME feature.

3. Event Segmentation

The computed EMV value is a function of time. As shown in Figure 2, it includes multiple peaks corresponding, potentially, to various semantically significant sports events. In [11], the PME function is approximated by a train of triangular pulses and then segmented accordingly. However, careful comparison of the EMV function against actual sports video indicates that such an ad hoc segmentation approach may be insufficient. In particular, the triangle model requires the determination of an additional variable [11], and may lead to higher false-alarm rate.

In this work, we formulate the task of sports event segmentation as a change point detection problem, which is a well-studied subject in the field of statistics [13, 16–18]. In particular, we assume that the EMV curve over frame indices $t = 1, 2, \ldots, n$ can be represented with $k+1$ line segments (the EMV curve is divided into $k+1$ segments by $k$ change points $c_1 < c_2 < \cdots < c_k$):
$$\mathrm{EMV}(t) = a_j t + b_j + e_j(t), \quad c_{j-1} < t \le c_j, \quad j = 1, \ldots, k+1, \tag{4}$$
with $c_0 = 0$ and $c_{k+1} = n$. In (4), the model fitting errors $e_j(t)$ of the $j$th segment at time $t$ are modelled as independent, identically distributed random variables with zero mean and variance $\sigma_j^2$. Given the set of break points $\{c_1, \ldots, c_k\}$ and the underlying model parameters $\{a_j, b_j\}$, and imposing a homoscedasticity assumption (homogeneity of variance) [13], that is, $\sigma_j^2 = \sigma^2$ for all $j$, the maximum-likelihood function of the set of change points can then be expressed as in (5), where $k$ is the number of change points and $m_j$ is the number of frames in segment $j$. In practice, since the change points are unknown, the iterative method for finding the set of change points given in [13] is adopted in our work. Specifically, a likelihood value (accumulated from several segments) is evaluated for each candidate frame under the homoscedastic error model. The range of candidates is selected such that at least a minimum number of frames is assigned to each segment; this minimum segment length is set to a fixed value in this work. The change point within a segment is then selected as the candidate that minimizes this likelihood value.
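To make the segmentation step concrete, the following Python sketch locates a single change point in a sampled curve. It is a simplification under stated assumptions, not the full iterative procedure of [13]: the errors are assumed Gaussian with a common variance, in which case maximizing the likelihood amounts to minimizing the total residual sum of squares of the fitted lines. The names `line_rss` and `best_change_point`, and the parameter `min_len` with its default of 3, are hypothetical.

```python
def line_rss(y, start, end):
    """Residual sum of squares of a least-squares line fitted to y[start:end]."""
    n = end - start
    xs = range(start, end)
    mean_x = sum(xs) / n
    mean_y = sum(y[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in xs)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y[x] - (slope * x + intercept)) ** 2 for x in xs)


def best_change_point(y, start, end, min_len=3):
    """Find the split of y[start:end] into two lines with the smallest total RSS.

    min_len is a hypothetical minimum number of samples per segment.
    Returns (index, cost); index is None when no split improves on a
    single line or when the range is too short to split.
    """
    best_idx = None
    best_cost = line_rss(y, start, end)  # cost of not splitting at all
    for c in range(start + min_len, end - min_len + 1):
        cost = line_rss(y, start, c) + line_rss(y, c, end)
        if cost < best_cost:
            best_idx, best_cost = c, cost
    return best_idx, best_cost
```

Applying `best_change_point` recursively to the two resulting sub-ranges, and stopping when the cost no longer improves enough, would give a multi-segment approximation; the stopping criterion is left open here because it depends on the selection rule adopted.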

A complication of the original change point detection in [13] is that every point in the time function is examined to determine whether it qualifies as a change point. In this work, a slope change concept is used to reduce the number of points that need to be examined. Specifically, a SlopeChange set is maintained that contains the points corresponding to zero crossings of the slope of the EMV curve, and only these points are examined. Figure 3(a) presents an example result of the proposed approach and that of the original one. In this example, the proposed method considers fewer candidate change points and detects fewer change points. In addition, a heuristically selected threshold used in [13] is eliminated. This leads to an improved change point selection criterion, given by (6), subject to the constraint (7).
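One way to build such a candidate set is sketched below in Python. It treats a zero crossing of the slope as a sign change between consecutive frame-to-frame differences of the sampled EMV curve. The function name `slope_change_points` and the handling of flat runs (the last flat sample is reported) are assumptions made for illustration.

```python
def slope_change_points(emv):
    """Return indices where the discrete slope of emv changes sign.

    These are the local peaks and valleys of the sampled curve. Flat
    runs (zero difference) are skipped, so a plateau is reported at its
    last flat sample (an assumption of this sketch).
    """
    candidates = []
    prev_sign = 0
    for t in range(1, len(emv)):
        diff = emv[t] - emv[t - 1]
        sign = (diff > 0) - (diff < 0)   # +1 rising, -1 falling, 0 flat
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            candidates.append(t - 1)     # slope changed sign at sample t-1
        prev_sign = sign
    return candidates
```

Restricting the search to these indices means the selection criterion is evaluated only at peaks and valleys rather than at every sample.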

Figure 3: (a) An example result of the modified change point detection (solid line) compared to that of the original one (dash line), (b) event segmentation using change point detection under a homoscedastic error model.

These changes result in marked improvement of computation efficiency and more accurate results. The change point selection algorithm is summarized in Algorithm 1.

An example of the detected change points is presented in Figure 3(b). Each vertical line represents a change point found by the above algorithm to partition the whole curve into segments.

4. Significant Sports Events Detection

The homoscedastic error model-based change point detection method provides a tentative segmentation of the sports video based on the EMV curve. However, some of these dramatic changes in motion pattern may not be semantically significant. Here, semantic significance is defined based on the context of the particular sport. Further empirical analysis indicates that semantically significant sports events are highly correlated with certain patterns of the EMV curve. In particular, it is observed that, in the approximated EMV curve, line segments with positive slope often correspond to significant sports actions while line segments with negative slope do not. Figure 4(a) displays an example of motion patterns from a soccer video. Region A in Figure 4(a) shows the offensive team trying to pass the ball from midfield to the front of the goal, and its motion pattern tends to be horizontal. The positive-slope motion pattern in region B corresponds to a player preparing for a direct free kick. Region C, the region around the peak value of the motion pattern, comes from the most salient activities, that is, the offensive and defensive teams both responding to the free kick. After that, the motion pattern of region I decreases slowly until another significant event starts.

Figure 4: (a) Significant event segment consisting of the motion pattern, (b) criterion for reserving or dropping segments based on IVal: 1st dropped, 2nd dropped, 3rd reserved, and 4th dropped.

To quantify this empirically determined rule, an accumulated impact value, IVal, is defined for each event segment as the difference between the increasing part of the curve, from the start of the segment to the peak (PIVal), and the decreasing part, from the peak (the maximum value) to the end of the segment (NIVal). Whether an event segment is reserved or dropped depends on the difference obtained by subtracting NIVal from PIVal. The equation is given in (8), in which two counts appear: the number of frames in the increasing part of the curve and the number of frames in the decreasing part of the curve of the segment. For each of the candidate video segments obtained using the change point detection method, a simple rule-based classifier is applied to decide whether the segment is semantically significant.

Semantically significant sport event detection rule. For each candidate sport event segment, if the corresponding impact value $\mathrm{IVal} > 0$, then it is semantically significant. Otherwise, it is not.
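The rule above can be sketched in Python as follows. Only the decision threshold (an impact value greater than zero) is taken from the text; the way PIVal and NIVal are accumulated is an assumption of this sketch. Here PIVal is taken to be the sum of the EMV samples from the start of the segment up to and including its peak, and NIVal the sum of the samples after the peak. The function names `impact_value` and `is_significant` are hypothetical.

```python
def impact_value(segment):
    """Assumed impact value of one candidate segment of EMV samples.

    PIVal: sum of samples from the start up to and including the peak.
    NIVal: sum of samples after the peak.
    Both accumulations are assumptions of this sketch.
    """
    if not segment:
        return 0.0
    # Index of the first occurrence of the maximum value.
    peak = max(range(len(segment)), key=segment.__getitem__)
    pival = sum(segment[: peak + 1])
    nival = sum(segment[peak + 1:])
    return pival - nival


def is_significant(segment):
    """Apply the detection rule: keep the segment when IVal > 0."""
    return impact_value(segment) > 0
```

With this reading, a segment that spends most of its duration rising toward its peak is reserved, while one that peaks early and then decays over many frames is dropped.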

Figure 4(b) gives an example illustrating this rule. Only the third segment is verified as a significant event, since its IVal is greater than zero. As shown in the experiments discussed in Section 5, this simple decision rule is fairly accurate in distinguishing significant events from nonsignificant ones.

The overall significant sport event segmentation algorithm is summarized in Figure 5. The first stage is the entropy-based motion analysis module. Then, it is followed by a homoscedastic-based event segmentation module. Finally, the candidate sports event is subject to the significant sports event detection rule, and the output is the detected significant sports event segments.

Figure 5: Block diagram of the proposed architecture.

5. Experimental Results

Experiments in this paper were conducted using three kinds of video: soccer, tennis, and basketball. Table 1 presents detailed information on the experimental video. Different kinds of video are included to test the robustness of the proposed approach. In total, around six hours of video clips at a frame rate of 30 fps were used.

Table 1: Characteristics of the dataset.

The proposed event detection approach detects a set of significant event segments from the video. Events were detected by the three-phase entropy-based motion analysis proposed in this paper. First, the entropy-based motion feature is extracted from the uncompressed sports videos based on (1)–(3). The EMV time series is then segmented by the change point detection module; the detailed algorithm is presented in Algorithm 1. Finally, segments without meaningful events are removed by the significant sports event detection module. A sports video summary is then composed of the reserved segments. Three standard performance evaluation metrics are used: Precision, Recall, and Fscore. These metrics are defined as follows:
$$\mathrm{Precision} = \frac{N_c}{N_c + N_f}, \quad \mathrm{Recall} = \frac{N_c}{N_c + N_m}, \quad \mathrm{Fscore} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{9}$$
where $N_c$, $N_f$, and $N_m$ denote the numbers of correctly detected, falsely detected, and missed events, respectively.
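For reference, the three metrics can be computed from event counts as in the short Python sketch below. It uses the standard definitions of precision, recall, and the balanced F-score; the function name `precision_recall_fscore` and its argument names are hypothetical, and returning zero when a denominator is zero is a convention chosen for the example.

```python
def precision_recall_fscore(num_correct, num_false, num_missed):
    """Compute precision, recall, and balanced F-score from event counts.

    num_correct: detected segments that contain a true significant event.
    num_false:   detected segments that do not (false alarms).
    num_missed:  true significant events that were not detected.
    """
    detected = num_correct + num_false
    relevant = num_correct + num_missed
    precision = num_correct / detected if detected else 0.0
    recall = num_correct / relevant if relevant else 0.0
    denom = precision + recall
    fscore = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore
```

The F-score is the harmonic mean of precision and recall, so it is high only when both false alarms and missed events are few.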

The overall performance of significant sports event segmentation algorithm is summarized in Table 2. For each of the three sports under study, three significant sport events are identified manually. They are Fault, Ace, and Volley in tennis video; Scoring, Miss, and Free throw in basketball video; Goal, Free kick, and Corner in soccer video. To conduct the experiment, the EMV curve is first evaluated, and the homoscedastic error model is applied to compute the change points and associated candidate sport video segments. Each of these candidate segments then is manually labelled as being significant or not depending on whether it contains one of the corresponding significant sport events specified above. The impact value detection rule described in Section 4 is also applied to the corresponding EMV curve of each candidate sports event segment. Comparing the algorithm results against the manual labels leads to the Precision, Recall, and Fscore metrics which are summarized in Table 2. These results are also compared against those obtained using PME method [11].

Table 2: Evaluation on event detection using entropy-based motion analysis.

The better performance of the proposed EMV feature is attributed to two main reasons. First, the proposed feature is more sensitive to the players’ actions in sports video, so it can detect more events that involve interaction among players. Second, the entropy-based motion feature reduces the effect of camera motion.

To validate the effectiveness of the proposed segmentation method, we also conducted an experiment comparing its results with those obtained using the original homoscedastic error model [13] and the triangle model [11]. The detailed results are listed in Table 3.

Table 3: Comparison of experimental results among triangle model () [11], original [13] and our modified homoscedastic error model in a soccer video.

To meet the demand for video summarization, this paper presents an efficient event detection scheme based on entropy-based motion analysis and a modified homoscedastic error model. The entropy-based motion analysis yields the motion entropy curve, which the modified homoscedastic error model divides into event segments. The significant sports event detection rule then extracts the significant events from all the segmented events. Future research directions for improving the proposed architecture include (a) extending the proposed framework to movie video and (b) integrating audio features, such as the melody of background music and environmental sounds, to further improve the video semantic analysis.

6. Conclusions

In this paper, an efficient sports video segmentation method is proposed. A motion entropy curve is extracted from a given sport video as a succinct feature to characterize the motion pattern as a function of time. A time series change point detection algorithm that minimizes the homoscedastic error is employed to approximate the motion entropy curve with a piece-wise linear model. The accumulated impact value is then used to decide which segment is a significant sport event. Future research directions for improving the proposed architecture include (a) using machine learning or probabilistic-based methods to enhance semantic information in the significant sports event detection module, and (b) integration of audio feature for enriching the proposed feature extraction model.

Algorithm 1

Acknowledgments

The authors wish to thank Professor Yu-Hen Hu for his valuable suggestion and revision. This work was supported in part by National Science Council (Taiwan) under the grant NSC97-2218-E-006-012.

References

  1. Y. Gong, L. T. Sin, C. H. Chuan, H. Zhang, and M. Sakauchi, “Automatic parsing of TV soccer programs,” in Proceedings of the International Conference on Multimedia Computing and Systems (ICMCS '95), pp. 167–174, Washington, DC, USA, May 1995.
  2. D. Yow, B.-L. Yeo, M. Yeung, and B. Liu, “Analysis and presentation of soccer highlights from digital video,” in Proceedings of the 2nd Asian Conference on Computer Vision (ACCV '95), Singapore, December 1995.
  3. D. A. Sadlier and N. E. O'Connor, “Event detection in field sports video using audio-visual features and a support vector machine,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 10, pp. 1225–1233, 2005.
  4. Q. Ye, W. Gao, and S. Jiang, “Exciting event detection in broadcast soccer video with mid-level description and incremental learning,” in Proceedings of the 13th Annual ACM International Conference on Multimedia (MULTIMEDIA '05), pp. 455–458, Singapore, November 2005.
  5. N. Rea, R. Dahyot, and A. Kokaram, “Semantic event detection in sports through motion understanding,” in Proceedings of the 3rd International Conference on Image and Video Retrieval (CIVR '04), pp. 88–97, Dublin, Ireland, July 2004.
  6. H.-C. Shih and C.-L. Huang, “Content-based multi-functional video retrieval system,” in Proceedings of the International Conference on Consumer Electronics (ICCE '05), pp. 383–384, Las Vegas, Nev, USA, January 2005.
  7. T. Y. Liu, W.-Y. Ma, and H.-J. Zhang, “Effective feature extraction for play detection in American football video,” in Proceedings of the 11th International Multimedia Modelling Conference (MMM '05), pp. 164–171, Melbourne, Australia, January 2005.
  8. M. J. Black and P. Anandan, “The robust estimation of multiple motions: parametric and piecewise-smooth flow fields,” Computer Vision and Image Understanding, vol. 63, no. 1, pp. 75–104, 1996.
  9. J. M. Odobez and P. Bouthemy, “Robust multiresolution estimation of parametric motion models,” Journal of Visual Communication and Image Representation, vol. 6, no. 4, pp. 348–365, 1995.
  10. P. Bouthemy, M. Gelgon, and F. Ganansia, “A unified approach to shot change detection and camera motion characterization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 7, pp. 1030–1044, 1999.
  11. T. Liu, H.-J. Zhang, and F. Qi, “A novel video key-frame-extraction algorithm based on perceived motion energy model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 10, pp. 1006–1013, 2003.
  12. F. Coldefy and P. Bouthemy, “Unsupervised soccer video abstraction based on pitch, dominant color and camera motion analysis,” in Proceedings of the 12th ACM International Conference on Multimedia, pp. 268–271, New York, NY, USA, October 2004.
  13. V. Guralnik and J. Srivastava, “Event detection from time series data,” in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '99), pp. 33–42, San Diego, Calif, USA, August 1999.
  14. S. Suthaharan, S.-W. Kim, S. Sathananthan, H.-K. Lee, and K. R. Rao, “Perceptually tuned video watermarking scheme using motion entropy masking,” in Proceedings of the IEEE Region 10 Conference (TENCON '99), vol. 1, pp. 182–185, Cheju Island, South Korea, September 1999.
  15. Y.-F. Ma and H.-J. Zhang, “A new perceived motion based shot content representation,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '01), vol. 3, pp. 426–429, Thessaloniki, Greece, October 2001.
  16. D. M. Hawkins and D. F. Merriam, “Optimal zonation of digitized sequential data,” Mathematical Geology, vol. 5, no. 4, pp. 389–395, 1973.
  17. S. B. Guthery, “Partition regression,” Journal of the American Statistical Association, vol. 69, no. 348, pp. 945–947, 1974.
  18. D. M. Hawkins, “Point estimation of the parameters of piecewise regression models,” Journal of the Royal Statistical Society. Series C, vol. 25, no. 1, pp. 51–57, 1976.