Research Article

A Hierarchical Static-Dynamic Encoder-Decoder Structure for 3D Human Motion Prediction with Residual CNNs

Figure 1

The SDnet architecture for motion prediction, which comprises the input model, the dynamic model, and the output model. The encoder of the input model processes the spatial appearance of the input poses and passes the mapped features to the dynamic model. The v-CMUs of the dynamic model explicitly capture the dynamic motion information between adjacent frames. The decoder then maps the features from the dynamic model to predicted velocity information. Finally, the future poses are predicted from this velocity information together with the static information from the last input pose.
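The caption splits each pose into a static part (the last observed pose) and a dynamic part (velocity between adjacent frames). The minimal sketch below illustrates only that static-dynamic bookkeeping, not the encoder, v-CMUs, or decoder. It assumes that a pose is a flat list of joint coordinates and that "velocity" means the per-frame displacement x_t - x_{t-1}. The function names `frame_velocities` and `integrate_velocities` are hypothetical, and this is not presented as the authors' implementation.

```python
from typing import List

# Assumption: one pose = flattened joint coordinates for a single frame.
Pose = List[float]


def frame_velocities(poses: List[Pose]) -> List[Pose]:
    """Dynamic information between adjacent frames: v_t = x_t - x_{t-1}."""
    return [
        [cur - prev for cur, prev in zip(poses[t], poses[t - 1])]
        for t in range(1, len(poses))
    ]


def integrate_velocities(last_pose: Pose, velocities: List[Pose]) -> List[Pose]:
    """Rebuild future poses from the static last pose plus predicted velocities.

    x_{T+k} = x_{T+k-1} + v_{T+k}, starting from x_T = last_pose.
    """
    future: List[Pose] = []
    current = list(last_pose)
    for v in velocities:
        # Add one frame's displacement to the running pose.
        current = [c + dv for c, dv in zip(current, v)]
        future.append(current)
    return future


# Usage: two observed-frame differences, then two stand-in "decoder" velocities.
observed = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
velocities = frame_velocities(observed)          # [[1.0, 0.5], [1.0, 0.5]]
predicted_velocities = [[1.0, 0.5], [1.0, 0.5]]  # placeholder for the decoder output
future = integrate_velocities(observed[-1], predicted_velocities)
print(future)  # [[3.0, 1.5], [4.0, 2.0]]
```

Because each future pose is the last observed pose plus accumulated displacements, a decoder that predicts near-zero velocities still produces a plausible, static continuation. This is one common motivation for predicting velocities rather than absolute poses.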