Research Article

Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information

Table 1

The architecture of the CNN used.

Name                      Patch size/stride    Output size
Conv 1                    3 × 3/1              32 × 128 × 64
Conv 2                    3 × 3/1              32 × 128 × 64
Max pool 3                3 × 3/2              32 × 64 × 32
Residual 4                3 × 3/1              32 × 64 × 32
Residual 5                3 × 3/1              32 × 64 × 32
Residual 6                3 × 3/2              64 × 32 × 16
Residual 7                3 × 3/1              64 × 32 × 16
Residual 8                3 × 3/2              128 × 16 × 8
Residual 9                3 × 3/1              128 × 16 × 8
Dense 10                                       128
Batch and normalization                        128
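
The output sizes in Table 1 can be checked against the patch sizes and strides with the standard convolution output-size formula. The sketch below is only an illustration, not the authors' implementation: it assumes an input of 128 × 64 (height × width) and a padding of 1 for every 3 × 3 operation, neither of which is stated in the table, and the names `LAYERS`, `out_size`, and `trace_shapes` are hypothetical.

```python
# Layer spec transcribed from Table 1: (name, kernel, stride, output channels).
LAYERS = [
    ("Conv 1", 3, 1, 32),
    ("Conv 2", 3, 1, 32),
    ("Max pool 3", 3, 2, 32),
    ("Residual 4", 3, 1, 32),
    ("Residual 5", 3, 1, 32),
    ("Residual 6", 3, 2, 64),
    ("Residual 7", 3, 1, 64),
    ("Residual 8", 3, 2, 128),
    ("Residual 9", 3, 1, 128),
]


def out_size(n, kernel, stride, padding=1):
    """Standard output-size formula for a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(height=128, width=64, padding=1):
    """Return [(name, (channels, height, width)), ...] for each layer.

    The 128 x 64 input and the padding of 1 are assumptions made for this
    sketch; they are not values given in Table 1.
    """
    shapes = []
    for name, kernel, stride, channels in LAYERS:
        height = out_size(height, kernel, stride, padding)
        width = out_size(width, kernel, stride, padding)
        shapes.append((name, (channels, height, width)))
    return shapes


if __name__ == "__main__":
    for name, (channels, height, width) in trace_shapes():
        print(f"{name:<12} {channels} x {height} x {width}")
```

Under these assumptions, each stride-2 row halves both spatial dimensions, which reproduces the output sizes in the table. The last residual block then yields 128 × 16 × 8 values, which the dense layer maps to a 128-dimensional vector.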