EURASIP Journal on Advances in Signal Processing
Volume 2008 (2008), Article ID 693731, 10 pages
doi:10.1155/2008/693731
Research Article
Optimizing Training Set Construction for Video Semantic Classification
1Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China
2Microsoft Research Asia, Beijing 100080, China
Received 9 March 2007; Revised 14 September 2007; Accepted 12 November 2007
Academic Editor: Mark Kahrs
Copyright © 2008 Jinhui Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
We explore criteria for optimizing training set construction for large-scale video
semantic classification. Due to the large gap between low-level features and higher-level semantics, as
well as the high diversity of video data, it is difficult to represent the prototypes of semantic concepts
by a training set of limited size. In video semantic classification, most learning-based approaches
require a large training set to achieve good generalization capacity, which makes a large amount of labor-intensive
manual labeling unavoidable. However, it has been observed that the generalization capacity of
a classifier depends heavily on the geometrical distribution of the training data rather than on its size.
We argue that a training set that captures most of the temporal and spatial distribution information of
the whole data set will achieve good performance even if its size is limited. In order to
capture the geometrical distribution characteristics of a given video collection, we propose four metrics
for constructing/selecting an optimal training set: salience, temporal dispersiveness, spatial
dispersiveness, and diversity. Furthermore, based on these metrics, we propose a set of optimization
rules to capture as much distribution information of the whole data as possible using a training set of a given size.
Experimental results demonstrate that these rules are effective for training set construction in video
semantic classification and that they significantly outperform random training set selection.