Advances in Multimedia

Research Article

Multimodal Semantics Extraction from User-Generated Videos

Area of interest identification.

(1) We obtain the location and horizontal orientation of each camera that is recording during the considered instant (or temporal segment); we refer to such a camera as a recording camera.
(2) We derive a linear equation for each recording camera, which represents the direction in which the camera is pointed. We express the equation in the point-slope form:
$$ y - y_c = m\,(x - x_c), $$
where the point coordinates $(x_c, y_c)$ are the camera location coordinates, and the slope $m$ is derived from the horizontal orientation.
(3) For each pair i of recording cameras, we solve a system of two such linear equations, one for the pointing direction of each camera in the pair. The solution of each system is the intersection point of the two pointing directions. As a result, we obtain the set of intersection points over all pairs of recording cameras for the considered instant t.
(4) We cluster all intersection points in order to find the main cluster. In this way we isolate outlier intersections that do not belong to the area of interest. The main cluster represents the instantaneous area of interest of the event.
(5) For each instantaneous area of interest, we determine its representative point of interest as the centroid of the main cluster. In particular, we compute it as the average of the intersection points belonging to the main cluster.
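
The five steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the text does not state: camera locations are given in a local planar coordinate system, the horizontal orientation is an angle in degrees measured counter-clockwise from the x-axis, and the clustering step is replaced by a simple distance-threshold grouping, since no specific clustering algorithm is named. All function names and the `radius` parameter are hypothetical. Each pointing direction is stored in the general form a·x + b·y = c, which is equivalent to the point-slope form but also covers a camera pointing along the y-axis, where the slope is undefined.

```python
import math
from itertools import combinations


def pointing_line(x, y, heading_deg):
    """Step 2: line a*X + b*Y = c through (x, y) along the camera heading.

    Assumption: heading is in degrees, counter-clockwise from the x-axis.
    """
    theta = math.radians(heading_deg)
    dx, dy = math.cos(theta), math.sin(theta)
    return dy, -dx, dy * x - dx * y


def intersect(line1, line2, eps=1e-9):
    """Step 3: solve the 2x2 linear system; None for (near-)parallel lines."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def main_cluster(points, radius):
    """Step 4 (stand-in): largest group of points linked by hops <= radius."""
    unvisited = set(range(len(points)))
    best = []
    while unvisited:
        seed = unvisited.pop()
        stack, group = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            group.extend(near)
        if len(group) > len(best):
            best = group
    return [points[i] for i in best]


def centroid(points):
    """Step 5: average of the points in the main cluster."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def point_of_interest(cameras, radius=5.0):
    """Steps 1-5 for one instant; cameras is a list of (x, y, heading_deg)."""
    lines = [pointing_line(*cam) for cam in cameras]
    hits = []
    for line1, line2 in combinations(lines, 2):
        p = intersect(line1, line2)
        if p is not None:
            hits.append(p)
    if not hits:
        return None
    return centroid(main_cluster(hits, radius))


# Three cameras aimed at roughly (5, 5) and one camera pointing elsewhere.
cameras = [(0, 0, 45), (10, 0, 135), (5, -5, 90), (0, 20, 0)]
print(point_of_interest(cameras, radius=5.0))
```

In this example the fourth camera produces three intersections far from the others; they do not join the main cluster, so the resulting point of interest stays close to (5, 5). The sketch treats each pointing direction as a full line, as in the description above, so an intersection lying behind a camera is not filtered out.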