Research Article

Ensemble Convolution Neural Network for Robust Video Emotion Recognition Using Deep Semantics

Algorithm 3

Occlusion detection from patched image.
Input: Keyframes
Output: Representation of occluded face
Input the extracted keyframe as a face image
Generate a feature map (FM) from each keyframe
 Obtain 24 local patches
 Decompose the feature map into 24 sub-feature maps, one per local patch
 For each local patch
  Encode the local feature (lf) as a weighted vector (wv) using a PG-Unit
  The PG-Unit computes the weight with an attention net, based on how obstructed the patch is
 End For
 Concatenate the weighted local features
 Return the representation of the occluded face
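
The steps of Algorithm 3 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the source does not say how the 24 patches are positioned or how the PG-Unit is built, so the 6 x 4 grid of patch centers, the patch size, the random projection matrices `w_feat` and `w_att`, and the function names `crop_patches`, `pg_unit`, and `occluded_face_representation` are all assumptions made for illustration. In a trained model the projections would be learned layers rather than random matrices.

```python
import numpy as np

NUM_PATCHES = 24  # number of local patches stated in Algorithm 3


def crop_patches(feature_map, centers, size):
    """Decompose a (C, H, W) feature map into one size x size sub-feature map per center."""
    _, height, width = feature_map.shape
    patches = []
    for row, col in centers:
        # Clamp the window so it stays inside the feature map.
        top = int(np.clip(row - size // 2, 0, height - size))
        left = int(np.clip(col - size // 2, 0, width - size))
        patches.append(feature_map[:, top:top + size, left:left + size])
    return np.stack(patches)  # shape: (num_patches, C, size, size)


def pg_unit(patch, w_feat, w_att):
    """Hypothetical PG-Unit: encode a local feature and gate it with a scalar weight."""
    flat = patch.reshape(-1)
    local_feature = np.tanh(w_feat @ flat)          # lf, shape (D,)
    weight = 1.0 / (1.0 + np.exp(-(w_att @ flat)))  # attention weight in (0, 1)
    return weight * local_feature, float(weight)    # wv and its weight


def occluded_face_representation(feature_map, centers, size, w_feat, w_att):
    """Run every patch through the PG-Unit and concatenate the weighted vectors."""
    weighted, weights = [], []
    for patch in crop_patches(feature_map, centers, size):
        wv, weight = pg_unit(patch, w_feat, w_att)
        weighted.append(wv)
        weights.append(weight)
    return np.concatenate(weighted), np.array(weights)


# Toy data standing in for a keyframe's feature map (assumed sizes).
rng = np.random.default_rng(0)
channels, height, width, size, dim = 8, 28, 28, 6, 16
feature_map = rng.standard_normal((channels, height, width))

# Assumed patch centers on a 6 x 4 grid, giving 24 patches.
rows = np.linspace(3, height - 4, 6).astype(int)
cols = np.linspace(3, width - 4, 4).astype(int)
centers = [(r, c) for r in rows for c in cols]
assert len(centers) == NUM_PATCHES

# Random stand-ins for the learned encoder and attention-net parameters.
w_feat = rng.standard_normal((dim, channels * size * size)) * 0.05
w_att = rng.standard_normal(channels * size * size) * 0.05

representation, weights = occluded_face_representation(
    feature_map, centers, size, w_feat, w_att
)
print(representation.shape, weights.shape)
```

With these assumed sizes the concatenated representation has 24 x 16 = 384 entries, and each patch contributes one weight between 0 and 1. A weight near 0 suppresses the local feature of a patch, which is the role the algorithm assigns to the attention net for obstructed regions.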