Mathematical Problems in Engineering / 2020 / Research Article | Open Access
Special Issue: Advanced Intelligent Fuzzy Systems Modeling Technologies for Smart Cities

Yong Zhang, Weiwu Kong, Dong Li, Xudong Liu, "On Using XMC R-CNN Model for Contraband Detection within X-Ray Baggage Security Images", Mathematical Problems in Engineering, vol. 2020, Article ID 1823034, 14 pages, 2020. https://doi.org/10.1155/2020/1823034

On Using XMC R-CNN Model for Contraband Detection within X-Ray Baggage Security Images

Academic Editor: Chenxi Huang
Received 18 Jul 2020
Accepted 08 Sep 2020
Published 16 Sep 2020

Abstract

We present an X-ray material classifier region-based convolutional neural network (XMC R-CNN) model for detecting typical guns and typical knives in X-ray baggage images. The XMC R-CNN model solves the problem of contraband detection in overlapped X-ray baggage images through the X-ray material classifier algorithm and the organic stripping and inorganic stripping algorithm, and a better detection rate and miss rate are achieved. The detection rates of guns and knives are 96.5% and 95.8%, and the miss rates of guns and knives are 2.2% and 4.2%. The contraband detection technology based on the XMC R-CNN model is applied to X-ray baggage images in security inspection. According to user needs, safe X-ray baggage images can be automatically filtered in some specific fields, which reduces the number of X-ray baggage images that security inspectors need to screen. The efficiency of security inspection is improved, and the labor intensity of security inspection is reduced. In addition, security inspectors can screen X-ray baggage images according to the automatically detected boxes, which can improve the effect of security inspection.

1. Introduction

In recent years, with the increasing seriousness of terrorist activities, the safety of air transport has received more and more attention from countries around the world. At present, there are three pain points of security inspection that need to be solved urgently: first, how to improve the effect of security inspection; second, how to improve the efficiency of security inspection; and third, how to reduce the labor intensity of security inspection. Contraband detection technology is not used in the traditional technical solution, because baggage contents are complex and highly varied. Accurately identifying contraband in X-ray baggage images is the most important and most difficult challenge for human operators. In addition, during peak periods, human operators have limited time to screen images.

As the most popular machine learning method, deep learning has achieved excellent results in object classification and detection. For the task of X-ray image classification, previous work relied on traditional machine learning methods. In this paper, the XMC R-CNN method based on deep learning is used to detect contraband within X-ray baggage security images.

In the ImageNet Large-Scale Visual Recognition Challenge 2012 (ILSVRC12), Hinton's team won the championship with the AlexNet model constructed from convolutional neural networks (CNNs), which ignited the enthusiasm of academia and industry for deep learning. In order to use deep learning methods to detect contraband in X-ray baggage security images, the frameworks, backbone models, and detection models of deep learning should be understood, as shown in Figure 1.

With the upsurge of deep learning research, various open-source deep learning frameworks have emerged in endless succession. The main frameworks, such as TensorFlow, Caffe, MXNet, Keras, CNTK, Torch, and Theano, are introduced in [1–4]. Different deep learning frameworks usually differ in model design, interface, deployment, performance, and architecture design, so their advantages and disadvantages are different.

The classic backbone models mainly include LeNet [5], AlexNet [6], ZFNet [7], VGGNet [8], GoogLeNet [9], ResNet [10], and DetNet [11]. Although the algorithms of object detection are different, convolutional neural networks are usually used to process the input image and generate the feature maps, and then various algorithms complete the region generation and loss calculation. The convolutional neural networks are the backbone of the whole detection algorithm. The basic components of the backbone include the convolution layer, activation function layer, pooling layer, dropout layer, batch normalization layer, and fully connected layer.

The main backbone models introduced above are usually used for object classification. Contraband detection involves both classification and localization, and classification alone is not enough. There may be multiple items of contraband in an X-ray baggage security image, so it is necessary not only to classify different contraband but also to determine the locations and sizes of the contraband. The classic classification and detection models include R-CNN [12], Fast R-CNN [13], Faster R-CNN [14], Mask R-CNN [15], SSD [16], YOLO [17], and R-FCN [18]. These models have developed from two stages to one stage, from bottom-up only to top-down, and from the single-scale network to the feature pyramid network, and many algorithms have achieved excellent results on the ImageNet dataset.

Computer-aided screening (CAS) [19] has also been widely used in the automatic detection of contraband in X-ray baggage images; however, this largely remains an unsolved problem. Contraband detection based on multiview X-ray images is carried out in each X-ray image, and then the constraints between multiview images are used to improve the detection accuracy in [20–27]. Contraband detection in computed tomography (CT) images extends the detection methods for single 2D X-ray images to 3D images in [28–35]. As mentioned above, both the contraband detection in multiview images and that in CT images are traditional technologies. Contraband detection by deep learning is proposed in [19, 36–38], and the accuracy of contraband detection is improved through deep learning methods. At present, there are still some problems in using these techniques to detect contraband in overlapped X-ray baggage images. For example, when a knife or a gun is covered by other objects in the baggage, the outline of the knife or the gun is not clear in the original X-ray image, and its color is shown as orange (not a metal color), so it cannot be correctly detected by the prior detection algorithms. Also, an explosive hidden under several steel plates in the baggage cannot be seen at all in the original image, and it is difficult for the security inspector to determine that there is a dangerous object in this area. The XMC R-CNN model is used to solve the problem of contraband detection in overlapped X-ray baggage images by the X-ray material classifier algorithm and the organic stripping and inorganic stripping algorithm.

In the next section, we review related work on the dual-energy X-ray material classifier and the contraband detection based on dual-energy X-ray data. In Section 3, we discuss the contraband detection technology based on the XMC R-CNN model, including the detection model design and the algorithm implementation. In Section 4, we introduce the training and test results of the model. We summarize the work of this paper in Section 5.

2. Related Work

We now summarize the related work on the dual-energy X-ray material classifier and the contraband detection based on dual-energy X-ray data.

2.1. X-Ray Material Classifier

The dual-energy method of material classification has been widely used in X-ray security inspection systems by measuring the difference in the attenuation coefficients of different materials for high- and low-energy X-rays in [39–46]. The physical principle of dual-energy X-ray radiography is based on the exponential law of photon radiation attenuation. When an X-ray beam passes through an object, the detector signal can be described by the Beer–Lambert law:

I(E) = I_0(E) exp(−μ(E, Z) t),  (1)

where E is the photon energy, I_0(E) is the initial energy intensity of photons emitted from the source, μ(E, Z) is the attenuation coefficient of a material with atomic number Z for impinging photons with energy E, and t is the material thickness.
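As a concrete illustration, the transparency T = I/I_0 recorded by the detector for a single homogeneous slab follows directly from the Beer–Lambert law. A minimal sketch (the attenuation coefficient value is hypothetical, chosen only for illustration):

```python
import math

def transparency(mu, thickness):
    """Detector transparency T = I/I0 = exp(-mu * t) for one homogeneous slab."""
    return math.exp(-mu * thickness)

mu_example = 2.0                     # hypothetical attenuation coefficient, 1/cm
t = transparency(mu_example, 0.5)    # transparency of a 0.5 cm slab
```

Thicker or denser material attenuates more, so the transparency decreases monotonically with both `mu` and `thickness`.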

In previous work on the material classifier in X-ray baggage images, the log-ratio R is defined as the vertical axis and the high-energy transparency is defined as the horizontal axis by Ogorodnikov and Petrunin in [47]:

R = ln(T_L) / ln(T_H),  (2)

where T_H is the high-energy transparency and T_L is the low-energy transparency. By formulas (1) and (2), for a given material with atomic number Z, R is a unique value of that material and does not depend on the thickness. Therefore, R can be used to discriminate materials.
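The thickness independence of the log-ratio can be checked numerically: since T = exp(−μt), the ratio of log-transparencies reduces to a ratio of attenuation coefficients. A small sketch with hypothetical coefficients:

```python
import math

def log_ratio(t_high, t_low):
    """Ratio of log-transparencies; for one material this equals
    mu_low / mu_high, independent of thickness."""
    return math.log(t_low) / math.log(t_high)

# hypothetical attenuation coefficients of one material at two energies
mu_h, mu_l = 0.5, 1.2
ratios = []
for thickness in (1.0, 2.0, 5.0):    # three different thicknesses
    t_h = math.exp(-mu_h * thickness)
    t_l = math.exp(-mu_l * thickness)
    ratios.append(log_ratio(t_h, t_l))
# every entry equals mu_l / mu_h = 2.4, whatever the thickness
```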

The α-curve method computes the features of the X-ray material in [48–50], with the α value plotted on the vertical axis against the high-energy attenuation on the horizontal axis. Two materials can be easily differentiated if their α-curves are largely separated. On the contrary, the materials cannot be easily differentiated if their α-curves are too close.

2.2. Contraband Detection

Most contraband detection algorithms based on deep learning use RGB images and do not really use the X-ray high- and low-energy data. Therefore, the material information is not fully used by the detection algorithm. Contraband detection within X-ray baggage images has not been well explored by the machine vision community due to the lack of publicly available X-ray image datasets. Since contraband detection within X-ray baggage images is a challenging problem, detection models that use the X-ray high- and low-energy data are significantly more limited in the literature.

The dual-energy images, which provide material information about the objects, are used for contraband detection in [21, 51]. The authors show that using multiple local color and texture features improves classification and detection performance. They first present an extensive evaluation of standard local features for object detection on a large X-ray image dataset in a structured learning framework. Then, they propose two dense sampling methods as key-point detectors for textureless objects and extend the SPIN color descriptor (the image generation process can be visualized as a sheet spinning about the normal of a point) to utilize the material information. Finally, they propose a multiview branch-and-bound search algorithm for multiview object detection.

The two-channel and four-channel dual-energy networks are described in [37]. For each input, the high- and low-energy images are transformed into the feature space of a given method. The final convolutional layer feeds into the fully connected (FC) layers. The networks used in that work consist of 16 convolutional layers and 3 fully connected layers. The authors experimented with the four-channel dual-energy networks, which achieved the best performance metrics.

3. Detection Model Design

In this paper, the Faster R-CNN model is combined with the characteristics of the X-ray baggage image, and the XMC R-CNN model is designed for the automatic detection of the typical guns and the typical knives in the X-ray baggage image. The framework of the system is Caffe, and the backbone model of the system is VGG-16 [8].

3.1. XMC R-CNN Detection Model

The XMC R-CNN model is composed of two modules. The first module is the X-ray material classifier (XMC), which strips the organic and inorganic components, and the second module is the Faster R-CNN detector, which uses the proposed regions. The XMC R-CNN algorithm mainly includes the following steps, as shown in Figure 2.

3.1.1. X-Ray Material Classifier

High-energy data and low-energy data are input, and different material values of organic, mixture, and inorganic are output through the material classification algorithm.

3.1.2. Organic Stripping and Inorganic Stripping

High-energy data, low-energy data, and material values are input, and the material value of the organic, the gray value of the organic, the material value of the inorganic, and the gray value of the inorganic are output through the organic stripping and inorganic stripping algorithm.

3.1.3. Convolutional Layer Features

The convolutional layers adopt the Simonyan and Zisserman model [8] (VGG-16), which has 13 shareable convolutional layers. The organic, mixture, and inorganic images, whose sizes differ, are input. The feature maps of the input image are output and will be used by the region proposal network (RPN) layers and the fully connected layers.

3.1.4. Region Proposal Networks

A region proposal network [14] is mainly used to generate region proposals. First, anchor boxes of multiple scales and aspect ratios are designed. A SoftMax layer determines whether each anchor box belongs to the foreground or the background. Then, the anchor boxes are refined by bounding box regression, and more accurate region proposals are obtained.
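The anchor-box design can be sketched as follows; the base size, scales, and aspect ratios below are illustrative defaults commonly used with Faster R-CNN, not values stated in this paper:

```python
def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """(x1, y1, x2, y2) anchor boxes centred at the origin, one per
    scale/aspect-ratio pair, keeping the box area fixed for each scale."""
    anchors = []
    for scale in scales:
        for ratio in ratios:              # ratio = height / width
            area = (base_size * scale) ** 2
            w = (area / ratio) ** 0.5
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

anchors = generate_anchors()   # 3 scales x 3 ratios = 9 anchors per location
```

At inference time these 9 templates are replicated at every feature-map position, and the RPN scores and regresses each one.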

3.1.5. Region of Interest Pooling

The RoI (region of interest) pooling layer in Faster R-CNN is adopted. The feature map from the last layer of VGG-16 and the region proposals from the RPN are input, and fixed-size 7 × 7 proposal feature maps are output.
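A simplified, single-channel sketch of the RoI pooling idea: the proposal region is divided into a fixed 7 × 7 grid of bins and max-pooled within each bin (the real layer operates per channel on the shared feature map):

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    """Max-pool the RoI region of a 2D feature map into a fixed
    output_size x output_size grid (single-channel sketch)."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    out = np.zeros((output_size, output_size), dtype=feature_map.dtype)
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)
    for i in range(output_size):
        for j in range(output_size):
            # guard against empty bins when the RoI is very small
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out

fm = np.arange(56 * 56, dtype=np.float32).reshape(56, 56)
pooled = roi_pool(fm, (0, 0, 56, 56))   # shape (7, 7)
```

Whatever the proposal size, the output is always 7 × 7, which is what lets the fully connected head accept proposals of arbitrary shape.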

3.1.6. Contraband Detection

The proposal feature maps are input. Through the fully connected layers and the SoftMax layer, we can get the classification of each region proposal. At the same time, we can get the locations of the detection boxes by bounding box regression. Finally, the classifications and the locations of the typical guns and the typical knives are output.

The detailed implementation of Steps (3)–(6) can be found in Faster R-CNN [14] and VGG-16 [8]. The X-ray material classifier algorithm of Step (1) and the organic stripping and inorganic stripping algorithm of Step (2) are described in the following sections.
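The six steps above compose into a single forward pass. A schematic sketch, with each stage represented by a stand-in callable rather than the paper's actual networks:

```python
def xmc_rcnn_detect(high, low, classify_material, strip, backbone, rpn,
                    roi_pool, head):
    """Compose the six XMC R-CNN steps; every stage is a stand-in callable."""
    material = classify_material(high, low)              # step 1: material map
    stripped = strip(high, low, material)                # step 2: stripping
    features = backbone(stripped)                        # step 3: feature maps
    proposals = rpn(features)                            # step 4: proposals
    pooled = [roi_pool(features, p) for p in proposals]  # step 5: RoI features
    return [head(f) for f in pooled]                     # step 6: class + box

# wiring check with trivial stand-ins
out = xmc_rcnn_detect(1, 2,
                      classify_material=lambda h, l: h + l,
                      strip=lambda h, l, m: m,
                      backbone=lambda s: s,
                      rpn=lambda f: [f],
                      roi_pool=lambda f, p: p,
                      head=lambda f: f * 10)
```

The point of the sketch is only the data flow: material classification and stripping happen before the standard Faster R-CNN stages, so the detector sees material-separated images instead of the raw overlapped image.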

3.2. X-Ray Material Classifier

In this paper, the organic glass (chemical formula: C5H8O2) is selected to represent the organic material in Table 1, the aluminum (model: 2A12) is selected to represent the mixture material in Table 2, and the carbon steel (model: 45#) is selected to represent the inorganic material in Table 3.


Table 1. Composition of the organic glass (C5H8O2).

Element | Atomic number | Atomic weight | Percentage (%)
C | 6 | 12.01 | 59.98
H | 1 | 1.008 | 8.05
O | 8 | 16.00 | 31.97

Table 2. Composition of the aluminum (2A12).

Element | Atomic number | Atomic weight | Percentage (%)
Cu | 29 | 63.55 | 4.35
Mg | 12 | 24.31 | 1.5
Mn | 25 | 54.94 | 0.6
Al | 13 | 26.98 | 93.55


Table 3. Composition of the carbon steel (45#).

Element | Atomic number | Atomic weight | Percentage (%)
C | 6 | 12.01 | 0.46
Si | 14 | 28.09 | 0.27
Mn | 25 | 54.94 | 0.65
P | 15 | 30.97 | 0.04
S | 16 | 32.07 | 0.40
Cr | 24 | 52.00 | 0.25
Ni | 28 | 58.69 | 0.25
Fe | 26 | 55.85 | 97.68

Three kinds of materials are selected over different thickness ranges, and data are collected; then, the curves are drawn as shown in Figure 3. The horizontal axis is the high-energy value (Hi) of the dual-energy X-ray, and the vertical axis is the material value (Mat) of the dual-energy X-ray. The log-ratio R is defined as the material value. 122 curves are evenly inserted between the three measured curves, which respectively represent the organic, mixture, and inorganic materials. Through these 125 curves, the material value of any point in the space is calculated, and the material table of the dual-energy X-ray is generated.

The whole material space is divided into five regions: low-gray unrecognizable region where the image shows red, high-gray unrecognizable region where the image shows gray, organic region where the image shows orange, mixture region where the image shows green, and inorganic region where the image shows blue, as shown in Figure 3.

The material value of the low-gray unrecognizable region is defined as 125, the material value of the high-gray unrecognizable region is defined as 0, the material value of the organic region is defined as 1, and the material value of the inorganic region is defined as 124. The material value of the mixture region, which is the transition region from the inorganic region to the organic region, needs to be calculated. Three material curves in the mixture region have been obtained. According to the X-ray material classifier model in Figure 3, the low- and high-energy data of the other 122 curves with different thicknesses are calculated. Through the 125 material curves and 100 data sampling points of different material thicknesses, the mixture region can be divided into 12276 grid cells. As shown in Figure 3, each grid cell is composed of four points (A, B, C, and D). Using the linear interpolation algorithm, we can calculate the material value of any point in the quadrilateral (A, B, C, and D). Finally, the material values of all points in the whole material space can be calculated and exported to the material table.
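The table-generation idea, interpolating a material value between neighboring reference curves, can be sketched as follows; the grid values here are toy numbers, not measured data:

```python
import numpy as np

def material_value(hi, lo, hi_grid, lo_curves, mat_values):
    """Interpolate a material value for one (hi, lo) detector measurement.

    hi_grid:    shared high-energy sample positions, shape (P,)
    lo_curves:  low-energy value of each reference curve at hi_grid, shape (C, P),
                curves ordered so the low-energy value increases with the index
    mat_values: material value assigned to each curve, shape (C,)
    """
    # low-energy value of every reference curve at this hi position
    lo_at_hi = np.array([np.interp(hi, hi_grid, c) for c in lo_curves])
    # linear interpolation between the two curves bracketing `lo`
    return float(np.interp(lo, lo_at_hi, mat_values))

# toy example: three flat curves carrying material values 1, 62, and 124
hi_grid = [0.0, 1.0, 2.0]
lo_curves = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
mat = material_value(0.5, 0.5, hi_grid, lo_curves, [1.0, 62.0, 124.0])
```

Evaluating this for every (hi, lo) pair, and exporting the results, corresponds to building the material lookup table described above.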

3.3. Organic Stripping and Inorganic Stripping

By means of physical model design, experimental data sampling, and the use of the grid cell method and the linear interpolation algorithm, the organic or the inorganic can be screened separately from the overlapped X-ray baggage image. The organic glass (chemical formula: C5H8O2) is selected to represent the organic material, the carbon steel (model: 45#) is selected to represent the inorganic material, and overlapped simulants of the organic glass and the carbon steel with different thicknesses are selected to represent the mixture material.

The effective atomic number [52] is given by

Z_eff = (Σ_i f_i Z_i^k)^(1/k),  f_i = (m_i Z_i / A_i) / Σ_j (m_j Z_j / A_j),

where Z_eff is the effective atomic number, m_i is the mass percentage of the i-th element, Z_i is its atomic number, and A_i is its atomic weight. According to this formula, the effective atomic number of the carbon steel is equal to 25.91, and the effective atomic number of the organic glass is equal to 6.56. Materials with effective atomic numbers of 7–25 can be obtained by overlapping the organic glass and the carbon steel with different thicknesses.
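A sketch of the effective atomic number calculation using the common power-law (Mayneord-type) form with exponent 2.94; the paper's exact formula may differ slightly, but this form reproduces the quoted value for the carbon steel:

```python
def effective_atomic_number(composition, k=2.94):
    """Power-law effective atomic number with exponent k (assumed here).

    composition: list of (Z, A, mass_percent) tuples; the weights f_i are
    the fractional electron contents m_i * Z_i / A_i, normalized."""
    weights = [(m * z / a, z) for z, a, m in composition]
    total = sum(w for w, _ in weights)
    return sum(w / total * z ** k for w, z in weights) ** (1 / k)

# carbon steel 45# composition from Table 3: (Z, A, mass %)
steel = [(6, 12.01, 0.46), (14, 28.09, 0.27), (25, 54.94, 0.65),
         (15, 30.97, 0.04), (16, 32.07, 0.40), (24, 52.00, 0.25),
         (28, 58.69, 0.25), (26, 55.85, 97.68)]
z_steel = effective_atomic_number(steel)   # ~25.9, close to the paper's 25.91

# organic glass (C5H8O2) composition from Table 1
glass = [(6, 12.01, 59.98), (1, 1.008, 8.05), (8, 16.00, 31.97)]
z_glass = effective_atomic_number(glass)   # ~6.5 (the paper reports 6.56)
```

The small gap for the organic glass suggests the paper may use a slightly different exponent or weighting; the exponent is an assumption of this sketch.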

As shown in Figure 4, the horizontal axis is the high-energy value (Hi) of the dual-energy X-ray, and the vertical axis is the material value (Mat) of the dual-energy X-ray. The log-ratio R is defined as the material value. The orange curve is the organic glass curve, which represents the organic material. The blue curve is the carbon steel curve, which represents the inorganic material. The curves between the orange curve and the blue curve are the overlapped curves of the organic glass and the carbon steel at different thicknesses, which represent the mixture material.

By overlapping the organic glass (Z_eff = 6.56) and the carbon steel (Z_eff = 25.91), materials with effective atomic numbers of 7–25 are obtained. For example, to produce the material with Z_eff = 7, substituting into the effective atomic number formula gives the mass ratio of the organic glass to the carbon steel (m_gl : m_fe = 0.997572383 : 0.002427617). The length and width of the organic glass and carbon steel models are the same in the test process; the density of the carbon steel is 7.82 g/cm3 and the density of the organic glass is 1.18 g/cm3, so the volume ratio of the organic glass to the carbon steel can be calculated and then converted to the thickness ratio (T_gl : T_fe = 0.999632928 : 0.000367072). Similarly, we can calculate the materials with Z_eff = 8–25; the corresponding thicknesses of the organic glass and the carbon steel are shown in Table 4.
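The mass-to-thickness conversion can be reproduced directly from the stated densities; this sketch recovers Table 4's entry for Z_eff = 7:

```python
def thickness_fractions(mass_frac_gl, mass_frac_fe, rho_gl=1.18, rho_fe=7.82):
    """Convert a glass/steel mass ratio into a thickness ratio, assuming both
    slabs share the same length and width (so volume ratio = thickness ratio)."""
    v_gl = mass_frac_gl / rho_gl   # relative volume of organic glass
    v_fe = mass_frac_fe / rho_fe   # relative volume of carbon steel
    total = v_gl + v_fe
    return v_gl / total, v_fe / total

# mass ratio the paper derives for Z_eff = 7
t_gl, t_fe = thickness_fractions(0.997572383, 0.002427617)
# t_gl ~ 0.999632928 and t_fe ~ 0.000367072, Table 4's first data row
```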


Table 4. Thickness percentages of the organic glass and the carbon steel for each effective atomic number.

Effective atomic number | Organic glass thickness percentage | Carbon steel thickness percentage
6.5 | 1 | 0
7 | 0.999632928 | 0.000367072
8 | 0.998548186 | 0.001451814
9 | 0.997047753 | 0.002952247
10 | 0.995034526 | 0.004965474
… | … | …
22 | 0.896018167 | 0.103981833
23 | 0.863513626 | 0.136486374
24 | 0.817831277 | 0.182168723
25 | 0.749959651 | 0.250040349
26 | 0 | 1

According to Table 4, materials with effective atomic numbers Z_eff = 7–25 can be obtained by overlapping the organic glass and the carbon steel with different thicknesses. 9 sampling points are designed for each material, so 171 sampling points are obtained in total (19 materials × 9 points). Six groups of data are collected at each sampling point, giving a total of 1026 sampling data points.

For example, when the thickness of the organic glass is 32.7 mm and the thickness of the carbon steel is 0.012 mm, given that the density of the carbon steel is 7.82 g/cm3 and the density of the organic glass is 1.18 g/cm3, we can calculate the mass ratio of the organic glass to the carbon steel (m_gl : m_fe = 38.568 : 0.09384). Substituting m_gl and m_fe into the effective atomic number formula gives Z_eff = 7.000398316523857 (i.e., row 7-6 in Table 5). Similarly, it can be verified that each point in Table 5 matches the corresponding effective atomic number. See Table 5 for the experimental data of the designed sampling points, where Z-n is the effective atomic number and the serial number of the sampling point, T_gl is the thickness of the organic glass, T_fe is the thickness of the carbon steel, and the three Hi/Lo column pairs are the high- and low-energy values of the mixture material, the organic glass, and the carbon steel, respectively.


Table 5. Experimental data of the designed sampling points.

Z-n | T_gl | T_fe | Hi (mixture) | Lo (mixture) | Hi (glass) | Lo (glass) | Hi (steel) | Lo (steel)

7-110.90.004262223082687248539603732
7-213.60.005200916572081183939133575
7-316.30.006172713741833159038803478
7-421.80.00812709511349111638233342
7-527.20.011137826122099337973256
7-632.70.01283757990770837543151
7-743.60.01663541569852636512952
7-849.00.01831318435824735322712
7-954.50.021638619412633872440
.........
.........
.........
10-111.770.06314228503213307339963773
10-219.960.1263122552724252439323602
10-335.790.18186314471998175938013260
10-444.950.22152311291674143237383126
10-558.140.311297731270104136132866
10-679.930.470844082763834602569
10-799.960.545826455340333352361
10-8119.90.5830016137526132022156
10-9159.60.781447117811529781848
.........
.........
.........
25-10.771260514354026400726581468
25-21.632.121773731395139161828753
25-32.082.701458537392338811520560
25-42.662.471245421388938341305442
25-54.003.9271319737963729764209
25-66.376.38301703613352233478
25-79.3412.12190443520341721348
25-810.7914.0112263384325413431
25-911.7215.25916322330786917

An organic stripping and inorganic stripping table is generated from the data in Table 5. As shown in Figure 4, there are four points: A, B, C, and D. Through Table 5, we can get A (Gm, Ggl, Gfe), B (Gm, Ggl, Gfe), C (Gm, Ggl, Gfe), and D (Gm, Ggl, Gfe). Through the bilinear interpolation algorithm, we can calculate the (Gm, Ggl, Gfe) value of any point in the quadrilateral (A, B, C, and D), so that we can calculate the (Gm, Ggl, Gfe) value of any point in the coordinate system. Finally, we can realize the organic stripping and inorganic stripping of dual-energy X-ray security images.
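The stripping lookup reduces to bilinear interpolation in a precomputed table. A minimal sketch over a regular grid (the paper interpolates within measured quadrilaterals; a regular grid and toy values are assumed here for simplicity):

```python
import numpy as np

def bilinear(table, x, y):
    """Bilinear interpolation in a regular lookup table.

    table: shape (H, W, 3), holding (G_m, G_gl, G_fe) at integer grid points;
    (x, y) is a fractional query position inside the grid."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, table.shape[1] - 1)
    y1 = min(y0 + 1, table.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = table[y0, x0] * (1 - fx) + table[y0, x1] * fx
    bottom = table[y1, x0] * (1 - fx) + table[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

# toy 2 x 2 table: querying the cell centre averages the four corners
tbl = np.array([[[0, 0, 0], [1, 1, 1]],
                [[2, 2, 2], [3, 3, 3]]], dtype=float)
centre = bilinear(tbl, 0.5, 0.5)   # -> [1.5, 1.5, 1.5]
```

Applying such a lookup to every pixel of the dual-energy image yields the stripped organic and inorganic gray values.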

In Figure 5(a), a knife is covered with some objects. In the original image, the outline of the knife is not clear, and its color is shown as orange, so it cannot be judged as a dangerous metal. In Figure 5(b), the knife is clearly visible, and it is shown as a metal color (blue), because the organic stripping algorithm is used. The outline of this knife is clear in the organic stripping image, and it is easy to be detected by the XMC R-CNN model.

In Figure 6(a), an explosive is hidden under several steel plates. In the original image, it is difficult for the security inspector to determine that there is a dangerous object in this area. In Figure 6(b), the explosive is clearly visible, and it is shown as an explosive color (orange), because the inorganic stripping algorithm is used. This is a good basis for detecting more kinds of contraband in X-ray baggage images in the future.

4. Evaluation

Training and testing are performed via the use of Caffe [53], a deep learning tool designed and developed by the Berkeley Vision and Learning Center. The framework of the system is Caffe, the backbone model of the system is VGG-16, and the algorithm model of the system is XMC R-CNN.

4.1. Dataset Design

The experimental samples of different material guns are shown in Table 6, and the experimental samples of different type knives are shown in Table 7.


Table 6. Experimental samples of guns of different materials (photo column omitted).

No. | Name | Material
G-1 | Pistol | Metal
G-2 | Pistol | Plastic
G-3 | Pistol | Wood
G-4 | Pistol | Metal
G-5 | Revolver | Metal
G-6 | Revolver | Plastic
G-7 | Revolver | Wood
G-8 | Revolver | Metal
G-9 | Rifle | Metal
G-10 | Rifle | Plastic


Table 7. Experimental samples of knives of different types (photo column omitted).

No. | Name | Material
K-1 | Dagger | Metal
K-2 | Dagger | Metal and plastic
K-3 | Dagger | Metal
K-4 | Double edged knife | Metal
K-5 | Double edged knife | Metal
K-6 | Jackknife | Metal
K-7 | Jackknife | Metal
K-8 | Keychain knife | Metal
K-9 | Watermelon knife | Metal
K-10 | Tools | Metal
K-11 | Fruit knife | Metal
K-12 | Fruit knife | Metal
K-13 | Utility knife | Metal
K-14 | Utility knife | Metal and plastic
K-15 | Blade | Metal
K-16 | Kitchen knife | Metal
K-17 | Scissors | Metal and plastic
K-18 | Scissors | Metal and plastic
K-19 | Sword | Metal
K-20 | Sword | Metal

4.2. Training and Testing

Thirty pieces of baggage of different sizes and materials are filled with different objects; six pieces are complex, eighteen pieces are of medium complexity, and six pieces are simple. Each piece of baggage is divided into three layers with the top facing up, and each layer is divided into nine areas. As shown in Figure 7, 1–9, 11–19, and 21–29 represent the nine areas of each layer, and the baggage internal space is divided into 27 areas for placing contraband samples.

The four directions of each piece of luggage are shown in Figure 8: the top of the luggage upward entering the X-ray equipment channel, the top of the luggage upward entering the channel at an angle of about 45 degrees, the bottom of the luggage upward entering the channel, and the bottom of the luggage upward entering the channel at an angle of about 45 degrees.

The gun samples in Table 6 and the knife samples in Table 7 are placed in the 27 space areas within each piece of luggage, and the sample data are collected in the four directions shown in Figure 8. The total number of collected samples is (10 + 20) × 30 × 27 × 4 = 97200. As shown in Figure 9, 80000 samples are selected as the training dataset, and 17200 samples are selected as the test dataset.

4.3. Test Results

At present, the accuracy and speed of various classical models are tested on the ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) dataset or the PASCAL VOC (Pattern Analysis, Statistical Modelling, and Computational Learning, Visual Object Classes) dataset. Because the characteristics of the images in these datasets differ from those of X-ray baggage images, the performance and indicators of different models on X-ray baggage image datasets differ from those on the ILSVRC and PASCAL VOC datasets. Therefore, according to the test results of various classical models on the ILSVRC and PASCAL VOC datasets, two classic models are selected for comparative testing on the X-ray baggage image dataset.

By comparing the speed and accuracy of various classical models in [12–18], the basic conclusion is that the accuracy of the Faster R-CNN model is higher, while the processing speeds of the SSD model and the YOLO model are faster. Therefore, the Faster R-CNN model and the SSD model are selected for comparative testing on the X-ray baggage image dataset. The experimental environment is built as follows. The configuration of the high-performance graphics workstation is shown in Table 8. The operating system is Windows 10 × 64. The CUDA (Compute Unified Device Architecture) version is CUDA 10.0, the CUDNN (CUDA Deep Neural Network) version is CUDNN 10.1, and the Python version is Python 3.7.


Table 8. Configuration of the high-performance graphics workstation.

Component | Specifications
Processor | Intel Xeon® Processor E5-2623 v3 (10M Cache, 3.00 GHz)
Memory | 16 GB (2 × 8 GB) DDR4 2133 MHz with ECC REG
GPU card | Nvidia Tesla Titan X
System disk | 120 GB Intel® SSD Pro 2500 Series 2.5 SATA 6.0 Gb/s Solid State Drive
Storage | 4 TB Seagate Enterprise Class 3.5 SATA 6.0 Gb/s

The Faster R-CNN model and the SSD model are tested on the X-ray baggage image dataset. The Caffe deep learning framework is used for the Faster R-CNN model, the Torch deep learning framework is used for the SSD model, and both network models use the VGG-16 backbone. IoU (intersection over union) is a concept used in object detection: the overlap rate between a generated candidate bounding box and the ground-truth bounding box. Here, IoU = 0.5 is used to test the X-ray baggage image dataset.
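The IoU criterion can be stated in a few lines; boxes are (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# a detection counts as a match here when iou(...) >= 0.5
```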

The performance of the proposed method and the prior work is evaluated by comparing the following indicators:

True positive (TP%): a positive sample is predicted to be positive; it can be called the true accuracy.
True negative (TN%): a negative sample is predicted to be negative; it can be called the false accuracy.
False positive (FP%): a negative sample is predicted to be positive; it can be called the false alarm rate.
False negative (FN%): a positive sample is predicted to be negative; it can be called the miss rate.
Precision (PRE = TP/(TP + FP)): the proportion of true positive samples among the predicted positive examples.
Recall (REC = TP/(TP + FN)): the proportion of true positive samples among the actual positive examples.
Accuracy (ACC = (TP + TN)/(TP + FN + FP + TN)): the proportion of true positive and true negative samples among all samples.

The PR curve is a precision-recall curve, and AP is the area under the PR curve; mAP (mean average precision) is the average of the per-category AP values. AP measures the quality of the model in each category, and mAP measures the quality of the model over all categories.
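The indicator definitions translate directly into code; the counts below are illustrative, not the paper's results (which are reported as rates rather than raw counts):

```python
def detection_metrics(tp, tn, fp, fn):
    """Precision, recall, and accuracy from raw detection counts."""
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return pre, rec, acc

# illustrative counts
pre, rec, acc = detection_metrics(tp=90, tn=85, fp=10, fn=15)
```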

The Faster R-CNN model merges the RPN and Fast R-CNN into a single network by sharing their convolutional features. The framework of the system is Caffe, and the backbone model of the system is VGG-16. The SSD model is a one-stage object detection algorithm. The framework of the system is Torch, and the backbone model of the system is VGG-16. Table 9 shows the test results on the PASCAL VOC dataset and the X-ray baggage image dataset. The mAP (VOC) and the FPS (VOC) are obtained on the PASCAL VOC dataset, and the mAP (X-ray) and the FPS (X-ray) are obtained on the X-ray baggage image dataset. The accuracy of the Caffe + VGG-16 + Faster R-CNN model is lower on the PASCAL VOC dataset but higher on the X-ray baggage image dataset. The processing speed of the Torch + VGG-16 + SSD512 model is faster on both the X-ray baggage image dataset and the PASCAL VOC dataset. The processing speeds of both models are faster on the PASCAL VOC dataset than on the X-ray baggage image dataset, because the image size of the PASCAL VOC dataset is 500 × 375, while the image size of the X-ray baggage image dataset is 1024 × 700.


Table 9. Test results on the PASCAL VOC dataset and the X-ray baggage image dataset.

Model | mAP (VOC) | mAP (X-ray) | FPS (VOC) | FPS (X-ray)
Faster R-CNN | 73.2 | 62.5 | 7 | 4
SSD512 | 76.8 | 52.1 | 19 | 10

The accuracy of the Faster R-CNN model is higher than that of the SSD model, and the processing speed of the Faster R-CNN model also meets the current application needs, in which it takes 2 seconds to collect an X-ray image. Therefore, the XMC R-CNN model was designed based on the Faster R-CNN model, combined with the characteristics of X-ray baggage images, for the automatic detection of the typical guns and the typical knives in X-ray baggage images.

Table 10 shows the test results of the XMC R-CNN model. The detection rates (TP%) of guns and knives are 96.5% and 95.8%, the miss rates (FN%) of guns and knives are 2.2% and 4.2%, and the accuracies (ACC) of guns and knives are 97.1% and 93.1%. It can be seen from the data that the X-ray image features of the guns are obvious: all types of guns include barrels, butts, triggers, and other components, and these features differ clearly from the features of common objects in passenger baggage, so the detection rate of the guns is relatively high and the miss rate of the guns is relatively low. On the contrary, the X-ray image features of the knives are weak, and the features of different types of knives differ significantly. In addition, these features are similar to the features of objects in passenger baggage, such as baggage handles, baggage locks, umbrellas, and electronic equipment. Therefore, the detection rate of the knives is relatively low and the miss rate of the knives is relatively high. At present, the maximum processing speed of the XMC R-CNN model is 250 milliseconds per image, which meets the requirement of collecting an X-ray image every 2 seconds. In Figure 10, the knives in these images are covered by other objects. The knives are not detected without the material classifier, whereas they are detected by the XMC R-CNN model with the material classifier.


          TP (%)   TN (%)   FP (%)   FN (%)   PRE    REC    ACC
Guns      97.8     96.9     3.1      2.2      91.3   97.8   97.1
Knives    95.8     91.3     8.8      4.2      87.9   95.8   93.1
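The PRE, REC, and ACC columns of Table 10 derive from confusion-matrix counts in the standard way. The sketch below uses illustrative counts (not the paper's data, which reports per-class rates rather than raw counts) to show the relationships:

```python
# How precision, recall, and accuracy derive from TP/TN/FP/FN counts.
# The counts below are illustrative, not the paper's data.
def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                 # PRE
    recall = tp / (tp + fn)                    # REC: equals detection rate TP%
    accuracy = (tp + tn) / (tp + tn + fp + fn) # ACC
    return precision, recall, accuracy

pre, rec, acc = metrics(tp=489, tn=1938, fp=62, fn=11)
print(round(pre, 3), round(rec, 3), round(acc, 3))  # 0.887 0.978 0.971
```

Note that because the positive and negative classes in the test set have different sizes, ACC is not simply the average of TP% and TN%; it is weighted by the class counts.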

5. Conclusion and Future Work

In this work, the XMC R-CNN model is explored for the tasks of classification and detection within X-ray baggage images. The main contribution is that the XMC R-CNN model solves the problem of contraband detection in overlapped X-ray baggage images through the X-ray material classifier algorithm and the organic stripping and inorganic stripping algorithm, and the deep learning method achieves a detection rate and a miss rate that meet the requirements of on-site screening. The detection rate is greater than 95%, and the miss rate is less than 5%. In some applications, it has exceeded the level of human security inspectors.

The automatic detection technology of the contraband based on the XMC R-CNN model is applied to the X-ray baggage security image. According to user needs, the safe X-ray baggage images can be automatically filtered in some specific fields, which reduces the number of X-ray baggage images that security inspectors need to screen. The efficiency of security inspection is improved and the labor intensity of security inspection is reduced. In addition, the security inspector can screen the X-ray baggage image according to the box of automatic detection, which can improve the effect of security inspection.
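The screening workflow described above can be sketched as a simple triage loop: images with no detection above a confidence threshold are auto-cleared, and the rest are routed to an inspector together with the detection boxes. This is a minimal sketch; `run_detector` is a hypothetical stand-in for XMC R-CNN inference:

```python
# Triage sketch: auto-clear safe images, route flagged ones to an inspector.
CONFIDENCE_THRESHOLD = 0.5

def run_detector(detections):
    """Hypothetical stand-in for model inference; the demo below passes
    pre-computed (label, score, box) tuples instead of real images."""
    return detections

def screen(images):
    auto_cleared, needs_review = [], []
    for image_id, image in images:
        boxes = [d for d in run_detector(image) if d[1] >= CONFIDENCE_THRESHOLD]
        # Flagged images carry their boxes so the inspector can screen
        # against the automatically detected regions.
        (needs_review if boxes else auto_cleared).append((image_id, boxes))
    return auto_cleared, needs_review

bags = [
    ("bag-001", []),                                    # nothing detected
    ("bag-002", [("knife", 0.91, (40, 60, 120, 150))]), # flagged for review
]
cleared, review = screen(bags)
print(len(cleared), len(review))  # 1 1
```

Raising the threshold auto-clears more images (higher throughput) at the cost of a higher miss rate, which is the trade-off the reported TP%/FN% figures quantify.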

Future work will explore threat image projection (TIP) in order to enlarge the training dataset, which can improve the accuracy of the automatic detection algorithm. In addition, future work will consider extending the XMC R-CNN model to detect safe objects in order to reduce the false alarm rate. These safe objects have distinctive features and are common in passenger baggage, such as baggage handles, baggage locks, umbrellas, watches, power banks, and electronic equipment. By improving the detection rate of these safe objects, the false alarm rate for contraband is reduced.
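The TIP idea mentioned above superimposes a threat signature onto a benign bag image. Because X-ray images record transmission and overlapping objects attenuate the beam multiplicatively, a simple composite multiplies transmission values. The arrays below are toy data and the compositing is deliberately simplified (real TIP also handles placement, scaling, and dual-energy consistency):

```python
import numpy as np

# TIP sketch: project a dense threat patch into a benign bag image by
# multiplying transmission values (multiplicative attenuation).
rng = np.random.default_rng(0)
bag = rng.uniform(0.7, 1.0, size=(8, 8))  # mostly transparent baggage
threat = np.ones((8, 8))                  # transmission 1.0 = transparent
threat[2:6, 2:6] = 0.3                    # dense threat signature

composite = bag * threat

# Inside the projected region transmission drops; outside it is unchanged.
print(composite[3, 3] < bag[3, 3], composite[0, 0] == bag[0, 0])
```

Each benign image can be composited with many threat signatures at varied positions and orientations, multiplying the effective size of the training set.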

Data Availability

The datasets used in this work were provided by FISCAN Systems. Because the datasets involve security, they cannot be shared.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The datasets used in this work were provided by FISCAN Systems. This work was partially funded by the First Research Institute of the Ministry of Public Security Project AMRSS (FRI-AX2003) and supported in part by FISCAN Systems.


Copyright © 2020 Yong Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

