Computational Intelligence Techniques for Information Security and Forensics in IoT Environments
Research Article | Open Access
Rui Yang, Yonglin Zhang, Zhenrong Deng, Wenming Huang, Rushi Lan, Xiaonan Luo, "SK-FMYOLOV3: A Novel Detection Method for Urine Test Strips", Wireless Communications and Mobile Computing, vol. 2020, Article ID 8847651, 14 pages, 2020. https://doi.org/10.1155/2020/8847651
SK-FMYOLOV3: A Novel Detection Method for Urine Test Strips
To accurately detect small defects in urine test strips, the SK-FMYOLOV3 defect detection algorithm is proposed. First, the prediction box clustering algorithm of YOLOV3 is improved: the fuzzy C-means clustering algorithm generates the initial clustering centers, which are then passed to the K-means algorithm to cluster the prediction boxes. To better detect smaller defects, the YOLOV3 feature map fusion is extended from the original three-scale prediction to a four-scale prediction. At the same time, 23 convolutional layers in the YOLOV3 network are replaced with SkNet structures, so that different feature maps can independently select different convolution kernels for training, improving the accuracy of defect classification. We collected and augmented urine test strip images from industrial production and labeled the small defects in the images. A total of 11634 images were used for training and testing. The experimental results show that the algorithm obtains anchor boxes with an average intersection over union (IOU) of 86.57%, while the precision rate and recall rate for nonconforming products are 96.8% and 94.5%, respectively. The algorithm can also accurately identify the category of defects in nonconforming products.
Defect recognition is one of the important applications of machine vision in the field of industrial manufacturing. It can improve factory production efficiency, reduce human labor, and monitor product quality in real time [1]. However, the accurate identification of product defects is still a challenging problem under active investigation. To date, two main approaches have been used: traditional image recognition methods that extract and classify image features, and the direct use of deep neural networks for defect identification.
The traditional defect recognition algorithm includes the following steps: image preprocessing, image segmentation, feature extraction, and classifier training. The goal of image preprocessing is to reduce the noise contained in the images collected in the industrial field [2, 3]. Image segmentation decomposes the image into several areas with different characteristics, with the same or similar image characteristics within each area. Commonly used methods in image segmentation include threshold-based segmentation [4, 5] and edge-based segmentation [6, 7]. Commonly used edge detection operators include the Canny, Sobel, and Roberts operators. Image feature extraction maps a high-dimensional image space to a low-dimensional feature space. Commonly used image features include color, shape, and texture [8, 9]. Color-based feature extraction methods include color histograms [10], color moments, and color sets. Texture-based feature extraction methods mainly include statistical methods [11], frequency spectrum methods [12–14], and model methods [15]. Commonly used spectral methods include the Fourier transform, wavelet transform, and Gabor transform. Image texture features can also be extracted from the second-order moments in the gray histogram, entropy, inverse moments, contrast, and correlation [16]. Huang et al. proposed a CDD-based defect detection algorithm to detect and classify defects [17]. The classifier learns the mapping relationship between the feature vector and the category through the training of features and labels [18–20] and finds the model parameters with the smallest classification error. Commonly used classifiers include the ANN, Bayes, and SVM classifiers.
Traditional image recognition methods are inefficient and inaccurate. To improve the recognition algorithm, Girshick et al. [21] proposed the R-CNN algorithm, which brought deep convolutional networks to object detection. Then, He et al. [22] proposed the SPP-Net algorithm, which solved the problem of object deformation caused by scaling the candidate frames to a uniform size. Girshick [23] proposed fast R-CNN by addressing the shortcomings of R-CNN and SPP-Net. Ren et al. [24] proposed faster R-CNN, which improves the detection speed while maintaining accuracy. The subsequently developed R-FCN [25] method is also a region-based target recognition method.
Later, regression-based target recognition methods, such as the SSD [26] and YOLO [27] series, appeared. The region-based methods have higher positioning accuracy but low detection speed. The regression-based YOLO network has fast processing speed and high accuracy [28], is easy to deploy in industrial production, and has been widely used.
For the recognition of small targets, shortcomings such as a small field of vision, a single aspect ratio, and low detection accuracy are always present [29]. To solve these problems, many researchers have improved network performance by modifying the structure, introducing a top-down structure, and proposing algorithms such as DSSD [30, 31] and YOLOV3. For example, Tao et al. developed the OYOLO network by increasing the weight of the positioning error function. After combining with R-FCN, the detection speed is improved, but the weight of the confidence error function is reduced, affecting the confidence prediction of the network. Deng et al. proposed a small target recognition algorithm based on CGAN that performs accurate target recognition but only in a single application scenario. Zheng et al. proposed a dense-YOLO network that improves the recognition of small targets in remote sensing images through feature reuse but has the disadvantage of a huge memory footprint.
To meet the needs of real-time detection in industrial production, this paper improves the detection accuracy of small defects while ensuring fast detection speed. The YOLOV3 network is used as the base detection model, because it performs the detection and classification of images simultaneously, greatly improving the detection speed. First, the prediction box clustering algorithm of YOLOV3 is improved to avoid the influence of the randomly initialized prediction box on the prediction result and improve the accuracy of the prediction box. Additionally, YOLOV3 misses some of the smaller defects. Therefore, this paper adds a scale to the original YOLOV3, uses 4 scales to detect the target image, and improves the recall rate of small defects. Finally, to raise the precision for small defects, the SkNet structure is added to YOLOV3 to improve the score of small defects and obtain higher recognition accuracy. For the identification of qualified products and nonconforming products, the precision rate is 96.8%, and the recall rate is 94.5%. Moreover, our method can accurately identify six minor defects in the nonconforming products, including “Crooked,” “Stains,” “Marker pen,” “Burr,” “Short,” and “Peeling.” The contributions of this paper are as follows:
(i) A prediction box clustering method using a combination of fuzzy C-means and K-means is proposed
(ii) A fusion framework for small target recognition is proposed, and the SkNet structure is embedded in the YOLOV3 network model
(iii) A urine test strip image data set of 11634 images was collected, providing a data basis for future research and demonstrating new approaches for the identification of small defects in industrial products
Article structure: the article is divided into five parts. The second part introduces the YOLOV3 algorithm and the SkNet structure. The third part introduces the designed SK-FMYOLOV3 network model. The fourth part analyzes the performance of the SK-FMYOLOV3 network model and compares the experimental results of industrial product defect detection. The fifth part summarizes the algorithm.
YOLOV3 is a new end-to-end target detection model after R-CNN, fast R-CNN, and faster R-CNN, as shown in Figure 1. It combines training with target classification and detection and returns the position and category of the target detection box directly at the output layer, transforming the detection problem into a regression problem.
YOLOV3 predicts 4 values for each border in each cell, that is, the coordinates of the border $(t_x, t_y)$ and the width and height of the target $(t_w, t_h)$, recorded as $(t_x, t_y, t_w, t_h)$. If the cell containing the target center is offset by $(c_x, c_y)$ from the upper left corner of the image, and the anchor box has a width and height $(p_w, p_h)$, the revised border is
$$b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h},$$
where $\sigma$ is the sigmoid function.
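The box decoding of YOLOV3, in which the predicted offsets pass through a sigmoid and the predicted sizes scale the anchor exponentially, can be sketched as follows. This is a minimal illustration in grid-cell units; the function name `decode_box` and the sample values are assumptions, not taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(t, cell_offset, anchor):
    """Map raw predictions (t_x, t_y, t_w, t_h) to a box (b_x, b_y, b_w, b_h).

    cell_offset is (c_x, c_y), the offset of the cell from the image corner;
    anchor is (p_w, p_h), the anchor box width and height.
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell_offset
    p_w, p_h = anchor
    b_x = sigmoid(t_x) + c_x      # center stays inside the predicting cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)     # anchor size rescaled by the prediction
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h

# Hypothetical values: zero predictions place the box at the cell center
# with exactly the anchor size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell_offset=(3, 5), anchor=(2.0, 4.0)))
# (3.5, 5.5, 2.0, 4.0)
```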
The anchor box sizes are selected by dimensional clustering. Traditional clustering algorithms include hierarchical clustering, K-means clustering, and model-based methods.
YOLOV3 uses the K-means clustering algorithm to cluster the sizes of the target frames in the training set in order to obtain the optimal anchor box sizes, so that a more accurate target frame can be predicted. The distance metric of the K-means clustering algorithm is given by
$$d(\text{box}, \text{centroid}) = 1 - \text{IOU}(\text{box}, \text{centroid}).$$
Here, box refers to the border size sample in the data set, and centroid refers to the cluster center size. The K-means clustering algorithm randomly selects K target points as the initial clustering centers, where K is the number of clusters. This random initialization makes the clustering result vary from run to run and can degrade the clustering quality.
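As a rough sketch of anchor clustering with the distance 1 − IOU(box, centroid), the snippet below runs K-means on (width, height) pairs with randomly chosen initial centers. The function names, the toy box sizes, and the fixed seed are illustrative assumptions; the paper's own implementation is not shown in the text.

```python
import random

def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means on box sizes using d = 1 - IoU and random initial centers."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)          # random initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: 1.0 - iou_wh(b, centroids[i]))
            clusters[nearest].append(b)
        updated = []
        for j, members in enumerate(clusters):
            if members:                        # mean width and height
                updated.append((sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members)))
            else:                              # keep an empty cluster's center
                updated.append(centroids[j])
        if updated == centroids:               # centers stopped moving
            break
        centroids = updated
    return centroids

# Toy box sizes (hypothetical): two small and two large boxes.
boxes = [(10, 10), (11, 11), (50, 50), (52, 52)]
print(sorted(kmeans_anchors(boxes, k=2)))
```

Because the starting centers are drawn at random, a different seed can lead to a different result on less well-separated data, which is the sensitivity the text describes.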
In a standard convolutional neural network, the receptive fields of the neurons in each layer have the same size, but in human vision, the size of the receptive fields changes depending on the size of the object. To make the neuron adaptively adjust the size of its receptive field for different sizes of input information, the selective kernel network (SkNet) module is proposed. This module uses a nonlinear method to aggregate kernels of different sizes, and these kernels are mixed together via softmax attention. The size of the receptive field in different fusion layers is different, as shown in Figure 2.
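SkNet is divided into three parts: split, fuse, and select. The first is the split operation. For the input $X \in \mathbb{R}^{C \times H \times W}$ (where $C$ is the dimension, $H$ is the height, and $W$ is the width), different receptive fields are obtained through two convolution kernels of different sizes ($3 \times 3$ and $5 \times 5$ in the original SkNet design). The two feature maps are $\tilde{U}$ and $\hat{U}$. Next is the fuse operation, which adds the two feature maps to get $U$ as
$$U = \tilde{U} + \hat{U}.$$
To obtain global information, global average pooling is performed on each channel $c$:
$$s_c = F_{gp}(U_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j).$$
At the same time, to improve the accuracy and adaptability of the network, a fully connected layer is added after the pooling layer:
$$z = F_{fc}(s) = \delta(\mathcal{B}(Ws)),$$
where $\delta$ is ReLU, $\mathcal{B}$ is BN, and $F_{fc}$ is equivalent to a squeeze operation, that is, $z$ has a smaller dimension than $s$. The dimension of $W$ is set to $d \times C$, and the value of $d$ is
$$d = \max(C / r, L).$$
Among them, $L$ and $r$ are artificially set, and $r$ is a ratio that compresses the dimensions. The select operation indicates that soft attention between channels is used to select information of different scales:
$$a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \quad b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}, \quad V_c = a_c \cdot \tilde{U}_c + b_c \cdot \hat{U}_c, \quad a_c + b_c = 1,$$
where $A_c z$ and $B_c z$ are the attention scores of channel $c$ for the two branches and $V$ is the output feature map.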
To summarize the idea of SkNet: the feature maps of the several scales are fused and squeezed into a compact feature, which is mapped back to C channels by N fully connected layers, one per scale. The N fully connected results are stacked, and softmax is applied across them for each channel, so the same channel receives a different weight at each scale.
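The select step can be illustrated with a small two-branch sketch: for every channel, a softmax over the two branch scores gives weights that sum to 1, and the output is the weighted sum of the two branch features. Spatial dimensions are omitted (one number per channel), and the function names, the sample values, and the defaults r = 16 and L = 32 are assumptions for illustration.

```python
import math

def squeeze_dim(channels, r=16, L=32):
    """Size of the compact feature: d = max(C / r, L)."""
    return max(channels // r, L)

def sk_select(u_tilde, u_hat, logits_a, logits_b):
    """Blend two branch features per channel with softmax attention.

    u_tilde, u_hat: per-channel features of the two kernel branches.
    logits_a, logits_b: per-channel attention scores of the two branches.
    """
    fused = []
    for ut, uh, la, lb in zip(u_tilde, u_hat, logits_a, logits_b):
        m = max(la, lb)                       # subtract max for stability
        ea, eb = math.exp(la - m), math.exp(lb - m)
        a = ea / (ea + eb)                    # weight of the first branch
        b = eb / (ea + eb)                    # weight of the second branch
        fused.append(a * ut + b * uh)         # a + b == 1 for every channel
    return fused

print(squeeze_dim(1024))                                      # 64
print(sk_select([1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [0.0, 0.0]))  # [2.0, 4.0]
```

With equal scores both branches get weight 0.5; a much larger score for one branch makes the output follow that branch, which is how a channel "chooses" its kernel size.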
3.1.1. Predictive Box Clustering Algorithm with Fuzzy C-Means (FYOLOV3)
To reduce the randomness and improve the accuracy of the prediction frame, the clustering method is improved. The fuzzy clustering algorithm is used to generate the initial clustering centers. These centers are then passed into the K-means algorithm, and the result of the clustering gives the initial anchors.
First, the data are standardized. The set of the image classification objects is . In this set, there are indicators in any sample , and the sample is used to label the characteristic index vector.
In Eq. (3), represents the index of the th characteristic in the sample , and the matrix of the characteristic indicators of the samples is . The values in are constrained to [0,1] through data transformation. The algorithm uses local neighborhood information and covariance to construct an objective function. The objective function and constraints of the algorithm are as follows: where is the total number of image pixels, is the number of image classifications, and represents the degree of membership of the pixel belonging to the th category. is a fuzzy weighting coefficient greater than 1, and represents the th cluster center. (, ) represents the Mahalanobis distance from the th data point to the th cluster center, which is the covariance distance of the data. represents the balance parameter that controls the influence of neighboring pixels. The Mahalanobis formula is as follows: where represents the matrix determinant, and represents the dimension of the problem. The clustering center and the membership of the th pixel can be obtained based on the Lagrange multiplier method:
The cluster centers $\{\mu_1, \mu_2, \ldots, \mu_k\}$ produced by the fuzzy clustering step are the initial cluster centers of the K-means clustering algorithm, where $k$ is the number of categories. For each sample $x_i$ in the data set, calculate its distance to the $k$ cluster centers and assign it to the class $c_j$ of the nearest cluster center $\mu_j$. For each class $c_j$, recalculate its cluster center:
$$\mu_j = \frac{1}{|c_j|} \sum_{x \in c_j} x.$$
Repeat the distance calculation and the cluster center update until the positions of the cluster centers no longer change. The algorithm flow is presented in Algorithm 1.
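The two-stage idea, fuzzy clustering to produce initial centers followed by K-means refinement, can be sketched as below. Note the simplification: this sketch uses the plain fuzzy C-means update with Euclidean distance, not the neighborhood-aware Mahalanobis variant described above, and the K-means stage uses the 1 − IOU distance. All function names, parameter defaults, and sample boxes are illustrative assumptions.

```python
import random

def iou_wh(a, b):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def fcm_centers(boxes, k, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means (Euclidean distance) on (width, height) samples."""
    rng = random.Random(seed)
    u = []                                     # memberships, rows sum to 1
    for _ in boxes:
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        centers = []                           # membership-weighted means
        for j in range(k):
            weights = [u[i][j] ** m for i in range(len(boxes))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * b[d] for w, b in zip(weights, boxes)) / total
                for d in range(2)))
        for i, b in enumerate(boxes):          # membership update
            dist = [max(sum((b[d] - c[d]) ** 2 for d in range(2)), 1e-12)
                    for c in centers]
            u[i] = [1.0 / sum((dist[j] / dist[l]) ** (1.0 / (m - 1.0))
                              for l in range(k))
                    for j in range(k)]
    return centers

def kmeans_from(boxes, centers, iters=100):
    """K-means with the 1 - IoU distance, started from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for b in boxes:
            nearest = min(range(len(centers)),
                          key=lambda j: 1.0 - iou_wh(b, centers[j]))
            groups[nearest].append(b)
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)]
        if updated == centers:                 # centers no longer change
            break
        centers = updated
    return centers

boxes = [(10, 10), (11, 11), (50, 50), (52, 52)]   # hypothetical box sizes
anchors = kmeans_from(boxes, fcm_centers(boxes, k=2))
print(sorted(anchors))
```

The design point is that K-means no longer starts from randomly drawn samples: the soft memberships of the first stage place the starting centers near the dense regions of the data, so the hard assignment stage only has to refine them.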
3.1.2. Multiscale Detection (MYOLOV3)
The YOLOV3 algorithm fuses feature maps at three different scales, combining the high resolution of low-level features with the rich semantic information of high-level features. By upsampling the features of different layers, objects are detected on three feature layers of different scales. As shown in Figure 3, predictions are made on the bottom-level feature map, which is downsampled 32 times, and on two upsampled feature maps, which correspond to 16-times and 8-times downsampling, respectively.
The YOLOV3 network downsamples the input image by a factor of 32. Because the downsampling factor is high, the receptive field of the feature map is relatively large, and the shallow information is not fully utilized, resulting in some information loss after multilayer convolution. Therefore, this network is better suited to detecting large objects in an image. In industrial production, the defects of objects are relatively small. For better detection of small defects, the original three-scale detection is extended to four-scale detection.
As shown in Figure 4, when multiscale fusion is performed, an additional upsampling fusion operation is used, so that a further upsampled, higher-resolution feature map is added as a fourth scale. Due to the addition of a scale, the anchor values must also be readjusted, as shown in Table 1.
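To make the effect of an extra scale concrete, the sketch below lists the grid size of each prediction scale for a given input size. The strides 32, 16, and 8 are those of standard YOLOV3; the fourth stride of 4 and the 416-pixel input are assumptions made for illustration, since the text does not state them.

```python
def grid_sizes(input_size, strides=(32, 16, 8, 4)):
    """Grid side length at each prediction scale for a square input."""
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    return [input_size // s for s in strides]

def total_predictions(input_size, strides=(32, 16, 8, 4), anchors_per_scale=3):
    """Number of predicted boxes over all scales."""
    return sum(anchors_per_scale * g * g for g in grid_sizes(input_size, strides))

print(grid_sizes(416))                          # [13, 26, 52, 104]
print(total_predictions(416, strides=(32, 16, 8)))  # 10647
print(total_predictions(416))                   # 43095
```

The finest grid has cells of only a few pixels, which is why an added high-resolution scale helps with small defects, at the cost of many more candidate boxes to evaluate.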
In industrial images with relatively small defects, the conventional YOLOV3 often produces incorrect or missed detections. This misidentification is caused by the imbalance of the confidence distribution. To enable the network to learn the global features and autonomously improve the score of small defects, the SkNet structure is embedded in the improved YOLOV3 network, which lets the network choose among information at different scales. While the detection speed is maintained, the detection accuracy is improved, and with it the efficiency of real-time quality inspection in industry.
The convolution operation used in the YOLOV3 convolution layers also appears in SkNet. To maintain the original detection speed, starting from the original convolutional layer at layer 4 of YOLOV3, the subsequent convolutional layers were replaced with SkNet structures. This gives the network different receptive fields for feature maps of different sizes. A total of 23 convolutional layers are replaced with SkNet structures, as shown in Figure 5.
A feature map of size $W \times H \times C$ is passed into a convolution layer, where $W$ is the width, $H$ is the height, and $C$ is the number of channels. Different receptive fields are obtained through two convolution kernels of different sizes, and the two feature maps are $\tilde{U}$ and $\hat{U}$, respectively. The two feature maps are added to obtain $U$, and then, fuse and select operations are performed on $U$ to output the feature maps. The specific parameter configuration of the designed SK-YOLOV3 is provided in Table 2.
SeNet (Squeeze and Excitation Networks) and SkNet are network structures proposed by the same team, and both of these introduce attention to improve the global receptive field. SeNet adaptively selects the channel, and SkNet adaptively selects the convolution kernel. Therefore, this paper also designed the SE-YOLOV3 network model for experimental comparison.
To accelerate the convergence of the network and avoid overfitting, a momentum of 0.9, a weight decay coefficient of 0.0005, and an initial learning rate of 0.0005 are used. The experimental environment is the Ubuntu 14.04 operating system with an Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20 GHz processor and 16 GB of memory (RAM), and the GPU is an NVIDIA Tesla K80 with 16 GB of video memory.
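The text does not name the optimizer. Assuming stochastic gradient descent with momentum, which is one common reading of these three hyperparameters, a single parameter update could look like the sketch below. The update rule shown is one widespread formulation and the function name is hypothetical; frameworks differ in where they apply the learning rate and the decay term.

```python
def sgd_momentum_step(weight, grad, velocity,
                      lr=0.0005, momentum=0.9, weight_decay=0.0005):
    """One scalar SGD update with momentum and L2 weight decay."""
    g = grad + weight_decay * weight          # decay pulls weights toward 0
    velocity = momentum * velocity - lr * g   # running, damped update
    return weight + velocity, velocity

w, v = 1.0, 0.0
for _ in range(3):                            # constant gradient for the demo
    w, v = sgd_momentum_step(w, grad=2.0, velocity=v)
print(w, v)
```

With a constant gradient the velocity grows from step to step, which is how momentum speeds up convergence along a consistent descent direction.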
The evaluation indicators are precision and recall. Precision indicates the proportion of the samples predicted as a given category that in fact belong to that category:
$$\text{Precision} = \frac{TP}{TP + FP}.$$
Here, the 8 categories “qualified test paper,” “unqualified test paper,” “Crooked,” “Stains,” “Marker pen,” “Burr,” “Short,” and “Peeling” are used as detection targets, and predictions are evaluated per category. TP (True Positive) indicates the number of samples in which the defective target is correctly identified. FN (False Negative) indicates the number of samples for which no defective target was identified. FP (False Positive) indicates the number of samples in which a defective target is incorrectly identified. Recall represents the ratio of the number of correctly detected targets to the total number of the targets in the test set:
$$\text{Recall} = \frac{TP}{TP + FN}.$$
The denominator of recall is true positives plus false negatives and represents the total number of actual targets of the category.
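As a small worked example of the two indicators, precision = TP / (TP + FP) and recall = TP / (TP + FN), the following sketch computes both from per-category counts. The counts are made up for illustration and are not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts for one category: 90 correct detections,
# 10 false alarms, and 30 missed defects.
print(precision_recall(tp=90, fp=10, fn=30))   # (0.9, 0.75)
```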
While the target objects in the images of large public datasets such as COCO are relatively complete, there have been almost no studies of small defect classification for industrial products. Therefore, to verify the practicability of the algorithm, it is necessary to manually collect and create the data sets.
(1) The urine test strip data are mainly obtained through high-definition camera shooting and crawler technology. The captured data are the main component, and the data obtained by the crawler technology are the minor component. A total of 1562 urine test strip images were collected with a resolution of pixels.
(2) The following methods are used for data enhancement of the original image: magnification (image width and height are enlarged to 1.5 times), reduction (image width is reduced to 0.3, height is reduced to 0.5, and image size is guaranteed to be a multiple of 32), brightness enhancement and reduction, flipping (90° and 180°), and clipping. Finally, 11634 images were obtained.
(3) Labeling was used to mark 8 categories in 1562 images. As shown in Figure 6, the images are classified as “qualified test paper,” “unqualified test paper,” “Crooked,” “Stains,” “Marker pen,” “Burr,” “Short,” and “Peeling.” Specifically, every defect in the image is enclosed in a box, and an XML file in the VOC format is obtained.
(4) The XML file is converted to a txt file with the format of ‘tag’ + ‘X’ + ‘Y’ + ‘W’ + ‘H’, and 6634 images are randomly selected as the training set and 5000 are selected as the test set.
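The conversion from a VOC-style XML annotation to ‘tag X Y W H’ lines can be sketched as follows. The text does not say whether X and Y denote the box center or a corner, or whether values are normalized; this sketch assumes the common Darknet convention of a box center and size normalized by the image dimensions. The function name and the sample annotation are hypothetical.

```python
import xml.etree.ElementTree as ET

def voc_to_txt_lines(xml_text):
    """Convert one VOC annotation into 'tag X Y W H' lines (normalized)."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        tag = obj.findtext("name")
        xmin = float(obj.findtext("bndbox/xmin"))
        ymin = float(obj.findtext("bndbox/ymin"))
        xmax = float(obj.findtext("bndbox/xmax"))
        ymax = float(obj.findtext("bndbox/ymax"))
        x = (xmin + xmax) / 2.0 / img_w       # box center, relative to width
        y = (ymin + ymax) / 2.0 / img_h       # box center, relative to height
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{tag} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return lines

# Hypothetical annotation with a single "Burr" defect.
sample = """<annotation>
  <size><width>200</width><height>100</height></size>
  <object>
    <name>Burr</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_txt_lines(sample))   # ['Burr 0.500000 0.400000 0.500000 0.400000']
```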
5.1. SK-FMYOLOV3 Convergence Verification
Based on the improved YOLOV3 structure, the SkNet structure is embedded, and the model is trained on a homemade urine test strip dataset. After 300 training iterations on the GPU server, the results show that the model quickly converges to a stable state. During training, log information is collected for each iteration of the SK-FMYOLOV3 model. GIoU is used as the loss of the detection task, and the objectness is recorded during the training process, as well as the val GIoU and val objectness of the validation set. GIoU takes into account the nonoverlapping areas that IOU ignores and can reflect the manner in which the predicted box and the ground truth overlap. The objectness value represents the probability that the prediction box contains a target. Visualizing this information shows that, as the number of iterations increases, the loss function gradually converges within the first 200 iterations. The GIoU value of the training set and the validation set is stable at approximately 1.15, and the objectness value is stable at approximately 1.0, as shown in Figure 7.
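For reference, GIoU extends IOU with a penalty based on the smallest box enclosing both inputs, and a loss is typically formed as 1 − GIoU. The sketch below computes GIoU for two axis-aligned boxes given as (x1, y1, x2, y2); the function name and sample boxes are illustrative, and the exact loss scaling used in the paper's training code is not stated in the text.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest axis-aligned box that encloses both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))   # 1.0 for identical boxes
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))   # negative for disjoint boxes
```

Unlike plain IOU, which is 0 for every pair of disjoint boxes, GIoU becomes more negative as the boxes move apart, so it still provides a training signal when the prediction does not overlap the ground truth.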
5.2. Impact of Different Improvement Strategies on the Prediction Box
The impact of the three improvement strategies proposed above on the accuracy of the prediction box is calculated using the original YOLOV3 as the reference, as shown in Table 3. In Table 3, FYOLOV3 denotes YOLOV3 with the improved prediction box clustering algorithm. MYOLOV3 denotes YOLOV3 with the multiscale improvement. FMYOLOV3 denotes YOLOV3 with both the improved prediction box clustering algorithm and the multiscale improvement. SK-FMYOLOV3 denotes FMYOLOV3 with the SkNet structure added.
Each improvement strategy used in this paper improves the performance of the original YOLOV3 detection network to varying degrees. Among these, the improvement of the prediction box clustering algorithm displays the most significant improvement in the model accuracy, and the average IOU has increased by nearly 6 percentage points. The improvement of the multiscale algorithm leads to the average IOU increase of nearly 4 percentage points. The improved clustering box prediction algorithm and multiscale algorithm based on YOLOV3 increased the average IOU by nearly 7 percentage points. Combining all of the improvement strategies, the final average IOU is improved by nearly 8 percentage points over the original YOLOV3 network.
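The average IOU used to compare anchor quality can be computed by matching every labeled box to its best anchor and averaging the resulting IOU values. A minimal sketch follows, assuming boxes and anchors are compared by size only, as (width, height) pairs; the function names and numbers are illustrative and not taken from Table 3.

```python
def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)

def average_iou(boxes, anchors):
    """Mean over all boxes of the IoU with the best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)

# Hypothetical data: one box matches an anchor exactly, one matches poorly.
boxes = [(10, 10), (20, 20)]
anchors = [(10, 10), (40, 40)]
print(average_iou(boxes, anchors))   # 0.625
```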
5.3. Performance Evaluation
The precision rate and recall rate are used as evaluation indicators, and the same data set is used in the same experimental environment. The YOLOV3 method with different improvement strategies is compared with R-CNN, fast RCNN, and faster RCNN. The test results of the qualified and unqualified products are shown in Table 4.
As seen from the above table, the precision and recall of the SK-FMYOLOV3 network are the highest. The reason is that the SkNet structure allows feature maps to be selected for training by different convolution kernels, improving the score of small features. At the same time, the accuracy of classification is increased, making the precision and recall of the network higher. The fastest network is YOLOV3, because the improved algorithm increases the number of layers in the network and therefore increases the recognition time. FYOLOV3 is better than MYOLOV3, because the prediction box clustering algorithm is added to avoid the impact of random initial points on the prediction result. MYOLOV3 is better than YOLOV3, because the annotations in the data set are small defects. After adding a small-scale feature fusion, the recall rate is improved, so that previously unrecognized defects are identified. SK-FMYOLOV3 achieved the highest recall and precision, because the convolution kernels are selected autonomously according to the size of the different feature maps, leading to an improvement in the accuracy of classification. By embedding SkNet in the improved YOLOV3 structure, the precision rate is increased by 9 percentage points and the recall rate is increased by 23 percentage points.
The precision and recall of SK-FMYOLOV3 for the 8 categories of the homemade urine test strip data set are shown in Figure 8. In the first 300 iterations, a higher precision is observed, but as the number of iterations increases, overfitting will occur, the recall will increase, and the precision will decrease. The test results of SK-FMYOLOV3 for the 8 categories are shown in Figure 9.
5.4. Comparison of Experimental Results
In the same experimental environment, the same number of iterations is used for training (epoch = 300). The test results of this method, faster-RCNN, and YOLOV3 on the homemade urine test strip data set are shown below.
The first is the detection of qualified products, as shown in Figure 10. The accuracy rate for the qualified urine test paper is 0.99 for this method, 0.73 for YOLOV3, and 0.89 for faster-RCNN; thus, this method is more effective for the detection of qualified products.
Next is the defect detection of nonconforming products, as shown in Figure 11. For the detection of burr, SK-FMYOLOV3 detected 11 burrs on the urine test strip. YOLOV3 pays less attention to small defects than to large defects; therefore, it detected crooked on the urine test strip but no burr. Faster-RCNN detected 7 burrs on the urine test strip. For the detection of crooked defects, also shown in Figure 11, all three algorithms were successful, because crooked defects are relatively large and their features are obvious. For the detection of marker defects, again shown in Figure 11, the three algorithms all detect the marker well, as the marker is also a relatively obvious defect.
The detection of peeling defects is shown in Figure 12. SK-FMYOLOV3 and Faster RCNN can detect these defects, while YOLOV3 cannot. As shown in Figure 12, for the detection of short defects, all three algorithms detect the defect, but the accuracy of the algorithm in this paper is approximately 0.5, while those of the other two models are approximately 0.2. The detection of stains defects is also shown in Figure 12. All three algorithms can detect them. The accuracy of this algorithm and of Faster-RCNN is approximately 0.5, and that of YOLOV3 is 0.45. It is important to note that, in addition to the individual defects, each nonconforming urine test strip is itself detected as a nonconforming product. As can be observed from the above group of figures, the accuracy rate for unqualified products is approximately 0.99, showing that the method is suitable for classifying industrial products as qualified or unqualified.
To solve the problem of detection accuracy of defective products in industrial production, this paper proposes an SK-FMYOLOV3 algorithm based on the YOLOV3 network. First, fuzzy C-means clustering is used to generate the initial clustering points to avoid the influence of the randomly initialized prediction frame on detection accuracy. Then, the original three-scale prediction is changed to four scales, making the algorithm more suitable for detecting smaller defects. Finally, the SkNet structure is merged, so that the feature map selects the appropriate convolution kernel for training through the attention mechanism, and the scores of the defects that are not easy to identify are higher. On the homemade urine test strip data set, the proposed network structure obtains a precision rate of 96.8% and a recall rate of 94.5% for the detection of qualified and unqualified urine test strips. This method can accurately identify the 6 types of small defects in nonconforming products. In future research, we will consider applying the network structure to other industrial products in order to save human resources and improve production efficiency.
The [Urine dipstick dataset] data used to support the findings of this study were supplied by [Rui Yang] under license and so cannot be made freely available. Requests for access to these data should be made to [Rui Yang, email@example.com].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported in part by the National Natural Science Foundation of China (Nos. 61772149, U1701267, and 61762028), GUET Excellent Graduate Thesis Program (No. 18YJPYSS15), Guangxi Key Laboratory of Image and Graphic Intelligent Processing Project (No. GIIP2003), and Guangxi Science and Technology Project (Nos. AB20238013, ZY20198016, 2018GXNSFAA294127).
- Y. Min, B. Xiao, J. Dang, B. Yue, and T. Cheng, “Real time detection system for rail surface defects based on machine vision,” EURASIP Journal on Image and Video Processing, vol. 2018, no. 1, Article ID 3, 2018.
- R. Lan, Y. Zhou, Z. Liu, and X. Luo, “Prior knowledge-based probabilistic collaborative representation for visual recognition,” IEEE Transactions on Cybernetics, vol. 50, no. 4, pp. 1498–1508, 2020.
- R. Lan, L. Sun, Z. Liu, H. Lu, C. Pang, and X. Luo, “Madnet: a fast and lightweight network for single-image super resolution,” IEEE Transactions on Cybernetics, vol. 99, pp. 1–11, 2020.
- R. C. Hardie, R. Ali, M. S. De Silva, and T. M. Kebede, “Skin lesion segmentation and classification for ISIC 2018 using traditional classifiers with hand-crafted features,” https://arxiv.org/abs/1807.07001.
- C. Yu, G. Zhang, and Y. Gao, “Improved threshold-based segmentation method for millimeter wave radiometric image,” in Proceedings of the 2019 International Conference on Modeling, Simulation, Optimization and Numerical Techniques (SMONT 2019), Shenzhen, Guangdong, China, 2019.
Copyright © 2020 Rui Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.