Research Article

Scale Adaptive Feature Pyramid Networks for 2D Object Detection

Figure 1

Feature maps used for region proposal and object class prediction in Faster R-CNN [5] (a), feature pyramid networks (FPNs) [11] (b), and the proposed scale adaptive FPN (SAFPN) (c). (a) Faster R-CNN [5] predicts object bounding boxes from a highly semantic yet low-resolution feature map; because of the feature map's low resolution, small objects tend to be missed. (b) The feature pyramid network (FPN) [11] predicts object bounding boxes from multiresolution, highly semantic feature maps formed by fusing feature maps of the top-down and bottom-up pathways. Adjacent-scale feature maps are integrated using fixed weights of 0.5. RetinaNet [10] also uses this FPN. (c) The proposed scale adaptive FPN (SAFPN) integrates feature maps using weights computed at each resolution level by the scale attention module (SAM), so that the weights suit the scales of the objects in each input image.
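To make the contrast in Figure 1 concrete, the sketch below compares the two fusion rules in NumPy: standard FPN fusion, which blends the upsampled top-down map and the lateral bottom-up map with fixed weights of 0.5 each, and an adaptive variant in which the two weights are computed per image from pooled features. The internals of the scale attention module shown here (global average pooling, a linear projection `w_proj`, and a softmax) are an illustrative assumption, not the authors' exact SAM design.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(top_down, lateral):
    # Standard FPN fusion: adjacent-scale maps combined with fixed 0.5 weights.
    return 0.5 * upsample2x(top_down) + 0.5 * lateral

def scale_attention_weights(top_down, lateral, w_proj, b):
    # Hypothetical scale attention module (an assumption, for illustration):
    # global-average-pool both maps, apply a learned linear projection,
    # and softmax to get two per-image fusion weights that sum to 1.
    pooled = np.concatenate([upsample2x(top_down).mean(axis=(0, 1)),
                             lateral.mean(axis=(0, 1))])
    logits = w_proj @ pooled + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

def safpn_fuse(top_down, lateral, w_proj, b):
    # Adaptive fusion: the fixed 0.5 weights are replaced by weights
    # computed from the input image's own feature statistics.
    w_top, w_lat = scale_attention_weights(top_down, lateral, w_proj, b)
    return w_top * upsample2x(top_down) + w_lat * lateral

# Toy example with random feature maps and random projection weights.
rng = np.random.default_rng(0)
td = rng.standard_normal((4, 4, 8))    # coarse top-down map (4x4, 8 channels)
lat = rng.standard_normal((8, 8, 8))   # finer bottom-up (lateral) map
w_proj = rng.standard_normal((2, 16)) * 0.1
b = np.zeros(2)

fixed = fpn_fuse(td, lat)
adaptive = safpn_fuse(td, lat, w_proj, b)
print(fixed.shape, adaptive.shape)
```

Both rules produce a fused map at the lateral map's resolution; the difference is only in how the two blending weights are chosen, which is exactly the distinction panels (b) and (c) draw.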