Research Article
Multimodal Semantics Extraction from User-Generated Videos
Table 4
Performance comparison for the event genre classification task using different feature-sets.
Columns 3 to 7 give the automatic event genre classification obtained with each feature-set.

| Event | Ground truth event genre | Feature-set (audio, sensors) | Feature-set (DSIFT, sensors) | Feature-set (global visual, sensors) | Feature-set (audio, DSIFT, sensors) | Feature-set (audio, global visual, sensors), proposed set |
| --- | --- | --- | --- | --- | --- | --- |
| Football match 1 | Sport | Sport | Sport | Sport | Sport | Sport |
| Football match 2 | Sport | Sport | Sport | Sport | Sport | Sport |
| Football match 3 | Sport | Sport | Sport | Sport | Sport | Sport |
| Ice-hockey match 1 | Sport | Sport | Sport | Sport | Live music | Sport |
| Ice-hockey match 2 | Sport | Sport | Sport | Sport | Live music | Sport |
| Concert 1 | Live music | Sport | Sport | Sport | Live music | Live music |
| Concert 2 | Live music | Live music | Live music | Live music | Live music | Live music |
| Concert 3 | Live music | Live music | Live music | Live music | Live music | Live music |
| Concert 4 | Live music | Live music | Live music | Live music | Live music | Live music |
| Total accuracy (%) | - | 88.9 | 88.9 | 88.9 | 77.8 | 100 |
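The total accuracy figures are the percentage of the nine events whose predicted genre matches the ground truth. As a minimal sketch, the values can be recomputed from the table entries. The names `ground_truth`, `predictions`, and `total_accuracy` are illustrative and do not come from the article, and the label lists are transcribed by hand from Table 4.

```python
# Event order: Football match 1-3, Ice-hockey match 1-2, Concert 1-4.
# Labels are transcribed by hand from Table 4 (illustrative, not the authors' code).
ground_truth = ["Sport"] * 5 + ["Live music"] * 4

predictions = {
    "audio, sensors": ["Sport"] * 6 + ["Live music"] * 3,
    "DSIFT, sensors": ["Sport"] * 6 + ["Live music"] * 3,
    "global visual, sensors": ["Sport"] * 6 + ["Live music"] * 3,
    "audio, DSIFT, sensors": ["Sport"] * 3 + ["Live music"] * 6,
    "audio, global visual, sensors": ["Sport"] * 5 + ["Live music"] * 4,
}


def total_accuracy(predicted, truth):
    """Return the percentage of events whose predicted genre equals the truth."""
    if len(predicted) != len(truth):
        raise ValueError("prediction and ground-truth lists differ in length")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return round(100.0 * correct / len(truth), 1)


accuracies = {
    name: total_accuracy(labels, ground_truth)
    for name, labels in predictions.items()
}

for name, value in accuracies.items():
    print(f"{name}: {value}%")
```

With nine events, one misclassification (Concert 1) gives 8/9, about 88.9%, and two misclassifications (both ice-hockey matches) give 7/9, about 77.8%, which matches the total accuracy row.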