Learning a Mid-Level Representation for Multiview Action Recognition
Framework of our method. First, we densely sample spatiotemporal cuboids from subvolumes of each view. Suppose 24 cuboids are extracted from one subvolume; six multitask random forests are then constructed, each built on cuboids from multiple views sampled at four adjacent positions. All cuboids are classified by their corresponding random forests, and an integrated histogram is created to represent the cuboids from all views sampled at the same position. The concatenation of the histograms for all positions constitutes the mid-level representation of the input subvolumes. Finally, a random forest serves as the action classifier.
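The pipeline above can be sketched in a few lines of code. Everything here is illustrative: the view count, feature dimension, class count, and synthetic descriptors are assumed, and scikit-learn's standard `RandomForestClassifier` stands in for the paper's multitask random forests.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_views, n_positions, feat_dim, n_classes = 3, 24, 32, 5  # assumed sizes
group_size = 4                          # positions covered by one forest
n_forests = n_positions // group_size   # 6 forests, as in the figure

# Hypothetical training data: one descriptor per (sample, view, position).
n_samples = 40
X_train = rng.normal(size=(n_samples, n_views, n_positions, feat_dim))
y_train = np.arange(n_samples) % n_classes  # toy action labels

# One forest per group of 4 adjacent positions, trained on cuboids from all views.
forests = []
for g in range(n_forests):
    pos = slice(g * group_size, (g + 1) * group_size)
    Xg = X_train[:, :, pos, :].reshape(-1, feat_dim)
    yg = np.repeat(y_train, n_views * group_size)
    forests.append(
        RandomForestClassifier(n_estimators=20, random_state=0).fit(Xg, yg)
    )

def mid_level(X):
    """Per-position class histograms, integrated over views, then concatenated."""
    reps = []
    for g, forest in enumerate(forests):
        for p in range(group_size):
            pos = g * group_size + p
            cuboids = X[:, :, pos, :].reshape(-1, feat_dim)  # all views at pos
            proba = forest.predict_proba(cuboids)
            proba = proba.reshape(X.shape[0], n_views, n_classes)
            reps.append(proba.sum(axis=1))   # integrate histogram over views
    return np.concatenate(reps, axis=1)      # (n_samples, n_positions * n_classes)

Z = mid_level(X_train)  # mid-level representation of the subvolumes
final_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Z, y_train)
```

The concatenated representation `Z` has one `n_classes`-bin histogram per sampled position, so its width is `n_positions * n_classes`; the final random forest is trained on this representation rather than on raw cuboid descriptors.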