Research Article

Enhancing Point Features with Spatial Information for Point-Based 3D Object Detection

Figure 3

Details of the image backbone network. Conv2d(cin, cout, k, s, p) denotes a 2D convolution and DeConv2d(cin, cout, k, s) denotes a 2D deconvolution, where cin, cout, k, s, and p are the number of input channels, the number of output channels, the kernel size, the stride, and the padding, respectively. Each convolution block consists of a convolution, BatchNorm, and ReLU.
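To make the roles of k, s, and p concrete, the following minimal sketch computes the spatial output size of a convolution and a deconvolution along one axis. It assumes the standard size formulas with dilation 1 and no output padding, and it assumes zero padding for the deconvolution because the caption lists no p for DeConv2d. The function names and the layer settings in the example are hypothetical and are not taken from Figure 3.

```python
def conv2d_out_size(size: int, k: int, s: int, p: int) -> int:
    """Output size along one axis of a 2D convolution (dilation 1)."""
    return (size + 2 * p - k) // s + 1


def deconv2d_out_size(size: int, k: int, s: int, p: int = 0) -> int:
    """Output size along one axis of a 2D deconvolution
    (dilation 1, no output padding; p defaults to 0 as an assumption)."""
    return (size - 1) * s - 2 * p + k


# Hypothetical settings, chosen only for illustration (not from Figure 3).
h = 384
h_down = conv2d_out_size(h, k=3, s=2, p=1)   # stride-2 convolution halves the size
h_up = deconv2d_out_size(h_down, k=2, s=2)   # stride-2 deconvolution restores it
print(h, h_down, h_up)
```

With these illustrative settings, a stride-2 convolution with k = 3 and p = 1 halves the feature-map size, and a stride-2 deconvolution with k = 2 brings it back to the input resolution, which is the usual way downsampled feature maps are upsampled to a common size before being combined.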