Abstract

Image target detection and recognition are widely used in many fields. However, existing methods have poor robustness: they have a high recognition error rate and depend heavily on parameter settings, which limits their application. This paper therefore proposes an image target detection and recognition method based on an improved R-CNN model that detects and recognizes dynamic image targets in real time. Building on an analysis of existing deep learning theory for detection and recognition, the paper summarizes the composition and working principle of the traditional image target detection and recognition system and compares the basic detection and recognition models, namely R-CNN, Fast-RCNN, and Faster-RCNN. To improve the accuracy and real-time performance of the model, a target feature matching module is added to the existing R-CNN network model, so that feature maps belonging to the same target are obtained through similarity calculation on the features extracted by the model. On this basis, an image target detection and recognition algorithm based on the improved R-CNN network model is proposed. The experimental results show that the proposed algorithm is better suited to image target detection and classification in complex environments and achieves higher detection efficiency and recognition accuracy than the existing models. The proposed algorithm has reference value for further application research in related fields.

1. Introduction

With the rapid development of artificial intelligence and computer technology, deep learning method has been widely used in various fields, such as image detection and recognition processing, machine vision, and natural language recognition. In recent years, some scholars have made great progress in the research of image target detection and recognition using deep learning method [1]. For example, for target detection and recognition in complex environment or natural scene, compared with traditional methods, using deep learning method can greatly improve the target detection rate. Most of the existing research works on target detection and recognition based on deep learning are based on natural images. Because the shooting effect of objects in complex scenes is not ideal, image information is usually collected by satellite or infrared remote sensing. The traditional infrared image target detection and recognition method generally preprocesses the infrared image and then extracts the features manually and then uses the image segmentation algorithm to segment the target from the image [2]. When using traditional methods to detect and recognize image targets, the accuracy and efficiency of target recognition are greatly disturbed by human operation methods and various factors. At the same time, the features extracted manually are generally limited to some specific targets, which usually cannot meet the requirements of relevant target detection and recognition.

Image target detection is essential for applying artificial intelligence technology in related fields. Its task is to search an image for regions containing targets of interest and to determine the category and location of each target. Image target detection usually annotates the image, encloses each target region of interest in a rectangular box, and assigns a category label to the target. To address the shortcomings of traditional target detection methods, some scholars proposed deep learning based methods that obtain candidate target regions from images. However, detection algorithms based on candidate target windows were found to be inefficient in feature extraction and recognition and not accurate enough in target classification [3]. In recent years, target detection has developed rapidly from traditional algorithms based on hand-crafted features to current algorithms based on deep learning, which have attracted extensive attention from scholars at home and abroad and have become one of the research emphases in this field. Because the convolutional neural network (CNN) is relatively mature, a CNN model is usually used to extract the features of the target object in a deep neural network [4].

Target detection and recognition algorithms based on deep learning can be classified from different perspectives. Examples are region-based algorithms, such as R-CNN (regions with convolutional neural network features), Fast-RCNN, and Faster-RCNN; regression-based algorithms, such as YOLO and SSD [5, 6]; and search-based algorithms, such as AttentionNet. Although deep learning improves the efficiency of image target detection and recognition to a certain extent, its computational cost is high and its processing of large scenes is inefficient. In addition, the existing methods have poor robustness: they have a high error rate in target recognition, and the deep learning network model depends heavily on its parameters, whose settings have a great impact on the detection and recognition results. Therefore, this paper proposes a method of image target detection and recognition using an improved deep learning model. By adding a feature similarity matching function to the existing R-CNN model, a target detection and recognition algorithm based on the improved R-CNN model is proposed. Compared with the existing target recognition models, the proposed model improves both target recognition efficiency and detection accuracy.

In the early days, image targets were mainly detected by human visual interpretation to obtain the relevant attribute information of targets. The acquisition of target information therefore depended largely on manual operation and the professional skill of the operator. With the emergence of advanced remote sensing technology and information acquisition methods, statistical methods came into use for detecting and identifying targets. For example, to improve the accuracy of target classification, the maximum likelihood method is used to classify remote sensing images. Traditional target detection methods mainly rely on manually designed features: local features such as the color and shape of the image target are extracted and then matched against a designed template to complete the detection task. These methods typically traverse the image with a sliding window without any target-specific search, which produces a large number of candidate windows and is therefore time-consuming and inefficient [7]. Hand-crafted features are also incomplete and yield low accuracy. Moreover, because traditional detection methods depend on manual design, their environmental adaptability and generalization ability are relatively weak; they cannot meet the real-time requirements of target detection, and their application in many fields is limited.

With the emergence of labeled datasets and powerful computing resources such as GPUs, deep neural networks have been widely used in image target detection and recognition [8]. A deep neural network learns the specific characteristics of the target object through the training of modules at different levels and can therefore meet the application needs of many fields. At present, image target detection using deep learning mainly includes two-stage and single-stage detection methods. The two-stage method uses a convolutional neural network to extract feature information from the original image and build feature maps, obtains the candidate regions, that is, the likely locations of target objects, from the feature maps, and then regresses and classifies the candidate regions to detect and recognize the target objects. Because candidate regions first screen the targets and are then used for further detection and identification, the recognition accuracy of the two-stage method is high [9]. However, the two-stage algorithm must map every candidate region to its corresponding feature map and must also regress and classify every candidate region, so it executes slowly. The single-stage method samples the possible target areas and obtains the detection results directly: it only needs to generate candidate boxes for the target object on the feature map and to classify and regress these boxes. Therefore, it is faster than the two-stage algorithm.

Most image target recognition methods slice the target out of the image and classify the slice, so image target recognition belongs to the research category of image classification. At present, the deep learning methods for image classification mainly include the deep belief network (DBN) and the convolutional neural network (CNN) [10]. CNN-based recognition is one of the most widely used approaches in the image field. Some scholars use a convolution model based on a similarity measure to recognize image targets, and existing experiments show that CNNs are robust in dynamic target recognition and perform well in image target recognition. The algorithm based on the deep belief network identifies targets through a stack of restricted Boltzmann machines (RBMs), combining unsupervised and supervised learning [11]. An RBM consists of a visible layer and a hidden layer. Neurons in different layers are connected bidirectionally, while neurons in the same layer are not connected. The visible layer receives the external image input, and the hidden layer acts as a set of feature detectors over the visible units. By training the RBM, the connection weights between neurons can be learned and the optimal features obtained.

Target detection and recognition using deep learning exploits the powerful feature extraction ability of neural network models and can effectively simplify the adjustment of a large number of parameters, so its performance is better than that of traditional methods [12]. However, depending on how the images are acquired, problems remain. For example, when image sample labels are incomplete and the imaging features are complex, it is difficult to obtain effective data and features with the existing neural network models. The quality of the image samples and the dependence of deep learning models on sample data therefore affect the detection and recognition results to a certain extent. In addition, the existing deep learning models generally do not consider the specific imaging methods and characteristics of images, even though the imaging characteristics of different images may differ. As a result, it is difficult to apply the existing deep learning methods to different image targets, and the existing detection models cannot guarantee the recognition effect. How to design a neural network model suited to the characteristics of the images at hand has therefore become a widespread concern of researchers.

3. Theoretical Basis of Deep Learning Detection and Recognition

3.1. Traditional Image Target Recognition Method

The traditional methods of image target detection and recognition are mainly based on image color, shape, texture, and other feature information. The processing pipeline generally includes image preprocessing, feature extraction and selection, feature classification, detection, and recognition. The structure of a traditional image target detection and recognition system is shown in Figure 1. Because images collected by different methods have different attributes, traditional detection and recognition algorithms are not readily transferable from one image type to another.

Image target detection is the premise and foundation of target recognition. It mainly preprocesses the input original image and obtains the area where the image target is located. The accuracy of target detection plays an important role in subsequent target recognition. In the traditional method of target detection, the fixed threshold is compared with the pixel value of the original image, and then the result of target detection is obtained. With the emergence of a large number of remote sensing images, Constant False Alarm Rate (CFAR) detection algorithm is mainly used for image target detection [13]. CFAR uses the adaptive method to calculate the threshold, compares the threshold with the pixel value of the point to be detected, and then infers whether the pixel is in the image target area.
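The adaptive thresholding idea behind CFAR can be illustrated with a minimal sketch of the cell-averaging variant on a one-dimensional list of pixel intensities. The function name `ca_cfar`, the window sizes, and the scale factor are assumptions made for illustration; they are not taken from this paper or from reference [13].

```python
def ca_cfar(signal, num_train=4, num_guard=1, scale=3.0):
    """Cell-averaging CFAR over a 1-D list of intensities.

    For each cell under test, the threshold is `scale` times the mean of
    the training cells on both sides (guard cells next to the cell under
    test are skipped). Returns the indices flagged as target.
    All parameter values are illustrative assumptions.
    """
    detections = []
    n = len(signal)
    for i in range(n):
        train = []
        for offset in range(num_guard + 1, num_guard + num_train + 1):
            if i - offset >= 0:
                train.append(signal[i - offset])
            if i + offset < n:
                train.append(signal[i + offset])
        if not train:
            continue
        # The threshold adapts to the local background level.
        threshold = scale * sum(train) / len(train)
        if signal[i] > threshold:
            detections.append(i)
    return detections


# A flat background with one bright pixel at index 10.
background = [1.0] * 20
background[10] = 12.0
print(ca_cfar(background))
```

Because the threshold follows the local background estimate, the same scale factor can be used in bright and dark parts of an image, which is what distinguishes CFAR from the fixed-threshold approach described above.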

Common methods for target detection in remote sensing images include detection methods based on the histogram of oriented gradients (HOG) and the support vector machine (SVM) [14]. HOG uses the directional density distribution of image edge pixels to represent the shape characteristics of local targets. First, after the remote sensing image is preprocessed, a local image patch of a given size is extracted for computing the HOG feature, and the gradients $G_x(x, y)$ and $G_y(x, y)$ in the horizontal and vertical directions are calculated. The gradient amplitude and direction are then calculated as follows:

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}.$$

The HOG feature is then computed as follows. First, the image is divided into small cells. Second, the gradient amplitude and direction of every pixel are calculated, a gradient histogram is accumulated for each cell, adjacent cells are combined into blocks, and the gradient histograms within each block are normalized. Finally, the HOG feature is used as the input of the SVM classifier, which uses the maximum margin as its classification criterion.
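The gradient and histogram steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a two-dimensional list indexed as `img[y][x]`, gradients are central differences, orientations are unsigned and split into nine bins, and a single cell is normalized on its own for brevity, whereas the procedure in the text normalizes over blocks of adjacent cells. The function names are hypothetical.

```python
import math


def gradient_magnitude_direction(img, x, y):
    """Central-difference gradient at interior pixel (x, y) of img[y][x]."""
    gx = img[y][x + 1] - img[y][x - 1]
    gy = img[y + 1][x] - img[y - 1][x]
    magnitude = math.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    direction = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, direction


def cell_histogram(img, num_bins=9):
    """Magnitude-weighted orientation histogram over the interior pixels
    of one cell, L2-normalised."""
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            mag, ang = gradient_magnitude_direction(img, x, y)
            hist[int(ang // bin_width) % num_bins] += mag
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


# A horizontal intensity ramp: every gradient points along the x axis.
ramp = [[x for x in range(5)] for _ in range(5)]
print(cell_histogram(ramp))
```

In a full HOG pipeline, the normalized histograms of all blocks would be concatenated into one feature vector and passed to the SVM classifier.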

In image target recognition, the key technologies are image feature extraction and classifier recognition. The extracted classification features should differ clearly between classes while remaining similar within a class, so as to achieve the best classification effect. At the same time, the classifier can improve the classification effect by mapping the features to a high-dimensional space according to the feature distribution and by using the collected sample information.

In terms of image target feature extraction, it usually includes image geometric features, local invariant features, and gray statistical features. Among them, geometric features and gray statistical features belong to features based on visual mechanism. Considering the different image types and acquisition methods, there are some differences in feature extraction. For example, due to the noise interference of remote sensing image, it is difficult to obtain all the feature information of remote sensing image by using the basic feature extraction method. Therefore, it is necessary to eliminate the interference of image noise by transform domain features.

The geometric features of the image include the size, shape, and structure of the target. Generally, the pixels of the image can be used to represent the basic information such as the perimeter, area, length, and width of the target. In order to more comprehensively describe the geometric characteristics of the image target, the moment feature can be used. Moment features can not only reflect the boundary and internal attribute information of image targets but also transform high-dimensional image data into a small number of features, so as to improve the recognition accuracy of image targets.

If the image of the target detection area is represented by a two-dimensional function $f(x, y)$, the moment features of the image can be obtained from the moments of that function [15]. The central moment feature of the image target can be expressed as follows:

$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y),$$

where $\bar{x}$ and $\bar{y}$ denote the abscissa and ordinate of the image target center, respectively, and $p$ and $q$ represent the order of the moment.

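The origin moment feature of the image target can be expressed as follows:

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y),$$

where $f(x, y)$ is the two-dimensional image function and $p$ and $q$ are the moment orders.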
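As a minimal sketch of the two moment features, the functions below compute the origin (raw) moment and the central moment of a small image stored as a two-dimensional list `img[y][x]`. The function names are hypothetical, and the target center is taken to be the intensity centroid, which is the usual convention and is assumed here.

```python
def raw_moment(img, p, q):
    """Origin (raw) moment: sum over pixels of x^p * y^q * f(x, y)."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(img)
        for x, value in enumerate(row)
    )


def central_moment(img, p, q):
    """Central moment about the intensity centroid (x_bar, y_bar)."""
    m00 = raw_moment(img, 0, 0)
    x_bar = raw_moment(img, 1, 0) / m00
    y_bar = raw_moment(img, 0, 1) / m00
    return sum(
        ((x - x_bar) ** p) * ((y - y_bar) ** q) * value
        for y, row in enumerate(img)
        for x, value in enumerate(row)
    )


# A 2 x 2 bright square in the lower-right corner of a 3 x 3 image.
square = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
print(raw_moment(square, 0, 0), central_moment(square, 2, 0))
```

Because the central moment is taken about the centroid, it does not change when the target is translated within the image, which is why a handful of such values can summarize the target shape.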

The gray statistical features of image targets mainly extract the brightness and texture features of the image according to the gray distribution and structural characteristics of image pixels. Among them, the brightness features mainly include the brightness mean, maximum, and minimum brightness of the image, while the texture features include attribute information such as standard deviation, variance, and fractal dimension.
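The brightness and simple texture statistics listed above can be computed directly from the gray values. The sketch below uses only the Python standard library; the function name and the dictionary keys are illustrative choices, and the fractal dimension is omitted because its estimation depends on a method the text does not specify.

```python
import statistics


def gray_statistics(img):
    """Brightness and dispersion statistics of a 2-D list of gray values."""
    pixels = [value for row in img for value in row]
    return {
        "mean": statistics.fmean(pixels),
        "max": max(pixels),
        "min": min(pixels),
        "variance": statistics.pvariance(pixels),  # population variance
        "std": statistics.pstdev(pixels),          # population std. deviation
    }


patch = [[0, 2], [4, 6]]
print(gray_statistics(patch))
```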

From the composition and extraction methods of the above geometric features and gray statistical features, they all extract the features by using the image spatial information and according to the distribution of pixel values. In order to use the frequency change in the frequency domain to reflect the image information, the transform domain feature method can be used. It mainly uses Fourier transform and other methods to map the image target information to the frequency domain.

Although conventional feature extraction methods can obtain some static image features, they perform poorly on remote sensing images. An image target acquired by a remote sensing satellite is usually translated or rotated to some degree, so it is often difficult to recognize with basic feature extraction methods. To extract features of such moving targets, the SIFT (scale-invariant feature transform) method can be used [16]. SIFT features are preserved when the target is translated or its structure changes and are robust to illumination changes and noise, so they better solve the problem of image target recognition and detection in complex environments. SIFT builds a scale space from the two-dimensional image to achieve multiscale invariance of the image target. Each pixel is then compared with its surrounding pixels to determine the scale-space extreme points. Unstable edge points are then removed from the extreme points obtained, leaving extreme points of higher accuracy.

For each extreme point, it is necessary to determine the Gaussian-smoothed image $L$ whose scale is closest to that of the point and to calculate the gradient value and direction from the pixel differences between the extreme point and its surrounding points. The gradient value and direction are calculated as follows:

$$m(x, y) = \sqrt{\bigl(L(x+1, y) - L(x-1, y)\bigr)^2 + \bigl(L(x, y+1) - L(x, y-1)\bigr)^2},$$

$$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}.$$

3.2. Basic Model of Deep Learning Detection and Recognition

Compared with the traditional methods, image detection and recognition based on deep learning is more general. A neural network model learns the image characteristics and can detect and recognize targets in complex environments. Because deep learning outperforms traditional target detection algorithms, it is widely used in image target detection and recognition. Existing research shows that such an algorithm first extracts features from the original image and then processes them with a deep neural network to obtain the specific information of the target. Most image target detection models use a convolutional neural network, and most of the detection algorithms in current use are region-based or regression-based.

At present, the widely used region based deep learning target detection algorithms include R-CNN network, Fast-RCNN network, and Faster-RCNN network. Using these algorithms, the candidate regions of image targets to be detected can be obtained, and the targets in the region can be classified by CNN.

As the earliest CNN-based target detection method, the R-CNN algorithm applies a neural network to image target detection. Because different convolution kernels respond to different characteristics in different regions of the image, different features of the image target can be extracted. Unlike traditional image detection methods, R-CNN classifies candidate regions of the image. The method first obtains candidate regions from the image to be detected and resizes all candidate regions to the same size. Second, the AlexNet model is used to extract features from the candidate regions. Finally, a support vector machine classifier is used to classify each region [17]. The process of image target detection using R-CNN is shown in Figure 2.

Compared with traditional image target detection algorithms, R-CNN has certain advantages, but it must first select many candidate regions on the image and run the deep network on each of them for feature extraction, so the amount of computation is large and detection is slow. In addition, R-CNN classifies with a support vector machine, so feature extraction and classification are processed separately, which affects the detection effect.

Fast-RCNN is a detection network model built on R-CNN. By adjusting the network model, it trains the classification of image targets within a single network and also realizes multitask learning. Fast-RCNN is faster than R-CNN in both training and detection. However, because its candidate regions are still generated outside the network, Fast-RCNN is not an end-to-end detection algorithm; detection is still slow and cannot meet the requirements of real-time image target detection.

To detect image targets in real time, the Faster-RCNN detection model was proposed on the basis of the Fast-RCNN detection algorithm. Faster-RCNN is a real-time image target detection algorithm based on deep learning. The model uses a region proposal network (RPN) to generate candidate regions and adopts multiple reference windows (anchors). Unlike traditional target detection algorithms, Faster-RCNN integrates candidate region generation and feature extraction into the same deep network and can realize end-to-end deep learning detection [18]. The basic structure of the Faster-RCNN target detection algorithm is shown in Figure 3.

In the Faster-RCNN detection model, the Conv layers consist of convolution, pooling, and activation (ReLU) operations. After processing the input image, these layers output the feature maps. In the RPN layer, a 3 × 3 convolution kernel is convolved with the feature maps output by the previous layer to obtain feature vectors, which are combined with the anchor mechanism to generate proposals. In the ROI pooling layer, the feature maps generated by the Conv layers and the proposals obtained by the RPN are combined to obtain the proposal feature maps.
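The anchor mechanism can be illustrated with a short sketch that generates the reference boxes centred on one feature-map position. The base size, scales, and aspect ratios below are illustrative defaults chosen for this example, not values reported in this paper, and `generate_anchors` is a hypothetical helper name.

```python
import math


def generate_anchors(center_x, center_y, base_size=16,
                     scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes centred on one position.

    Each anchor has area (base_size * scale) ** 2 and aspect ratio
    height / width = ratio. The default values are assumptions.
    """
    anchors = []
    for scale in scales:
        area = float(base_size * scale) ** 2
        for ratio in ratios:
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


anchors = generate_anchors(0, 0)
print(len(anchors), anchors[1])
```

With three scales and three ratios, every feature-map position contributes nine reference windows; the RPN then scores each window and regresses its offsets to produce the proposals.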

Finally, the classifier of the model, which combines fully connected layers with the Softmax function, calculates the category of each image target from the proposal feature maps output by the previous layer. At the same time, the proposal boxes are refined by regression to obtain the accurate position of the target.

4. Image Target Detection and Recognition Algorithm Based on Improved R-CNN

4.1. Image Target Feature Extraction Based on Selective Search Algorithm

Following the traditional idea of image target detection, candidate target windows that may contain objects are first selected from the original image and used as the input of the deep learning network model. The model extracts deep features from each window, a Softmax classifier classifies the candidate features, the classes are ranked by confidence, and the class with the highest confidence is output as the recognition result.

To extract object features from candidate target regions, an exhaustive method can be used: the original image is processed at different scales, the areas containing the target object are located, and the image target is searched by adjusting the size of the candidate window. To find the image target more quickly, the selective search algorithm can be used to find target objects in candidate windows at different scales. Selective search uses hierarchical grouping: it divides the image into small regions, that is, sets of pixels with similar characteristics (superpixels), groups them into regions of different scales, and takes the rectangular area enclosing each region that meets the target scale requirement as a candidate window for the target object.

To reduce the computational complexity, selective search merges the most similar neighboring superpixels step by step. When superpixels with similar features are merged, the features of the merged superpixel are derived from the features of the superpixels being merged, which avoids recomputing the features from the individual pixels. According to the characteristics of the superpixels and the similarity between them, candidate windows of different sizes are obtained and used as the input of the deep learning network model. The required image features are then extracted through model processing.
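The feature reuse described above can be sketched as a single merge step. The dictionary layout of a region (`size`, `hist`, `box`) is a hypothetical representation chosen for illustration; the size-weighted combination of normalized histograms follows the usual formulation of selective search and is assumed here rather than taken from this paper.

```python
def merge_regions(region_a, region_b):
    """Merge two regions and derive the merged features from the old ones.

    Each region is a dict with:
      'size' - number of pixels,
      'hist' - normalised feature histogram,
      'box'  - bounding box (x1, y1, x2, y2).
    No pixel of the image is revisited during the merge.
    """
    size = region_a["size"] + region_b["size"]
    # Size-weighted average keeps the merged histogram normalised.
    hist = [
        (region_a["size"] * ha + region_b["size"] * hb) / size
        for ha, hb in zip(region_a["hist"], region_b["hist"])
    ]
    ax1, ay1, ax2, ay2 = region_a["box"]
    bx1, by1, bx2, by2 = region_b["box"]
    box = (min(ax1, bx1), min(ay1, by1), max(ax2, bx2), max(ay2, by2))
    return {"size": size, "hist": hist, "box": box}


a = {"size": 10, "hist": [1.0, 0.0], "box": (0, 0, 2, 2)}
b = {"size": 30, "hist": [0.0, 1.0], "box": (1, 1, 5, 4)}
print(merge_regions(a, b))
```

The bounding box of each merged region is what becomes a candidate window, so repeating this step until one region remains yields windows at every scale.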

To compare the similarity of different superpixels, a comprehensive calculation can be carried out according to their color, texture, size, and coincidence [19]. To obtain the color similarity of two superpixels, the color histogram of each superpixel is calculated first, and the color similarity of two superpixels $r_i$ and $r_j$ is calculated as follows:

$$s_{\text{color}}(r_i, r_j) = \sum_{k=1}^{n} \min\bigl(c_i^{k}, c_j^{k}\bigr),$$

where $c_i^{k}$ is the $k$-th bin of the normalized color histogram of $r_i$ and $n$ is the number of bins.

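To calculate the texture similarity of two superpixels, the texture feature vector $t_i$ is extracted from each superpixel $r_i$, and the texture similarity of two superpixels $r_i$ and $r_j$ is calculated as follows:

$$s_{\text{texture}}(r_i, r_j) = \sum_{k=1}^{n} \min\bigl(t_i^{k}, t_j^{k}\bigr),$$

where $t_i^{k}$ is the $k$-th component of the normalized texture feature vector of $r_i$ and $n$ is its length.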

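The size of a superpixel is the number of pixels it contains. The size similarity of two superpixels $r_i$ and $r_j$ is therefore calculated as follows:

$$s_{\text{size}}(r_i, r_j) = 1 - \frac{\operatorname{size}(r_i) + \operatorname{size}(r_j)}{\operatorname{size}(im)},$$

where $\operatorname{size}(r_i)$ and $\operatorname{size}(r_j)$ denote the sizes of superpixels $r_i$ and $r_j$, respectively, and $\operatorname{size}(im)$ is the size of the original image.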

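The coincidence degree describes how well the regions occupied by superpixels $r_i$ and $r_j$ fit together. If $r_i$ is covered by $r_j$, the two superpixels are merged. If the intersection of $r_i$ and $r_j$ is too small, they are not merged. The coincidence degree of two superpixels $r_i$ and $r_j$ is calculated as follows:

$$s_{\text{fill}}(r_i, r_j) = 1 - \frac{\operatorname{size}(BB_{ij}) - \operatorname{size}(r_i) - \operatorname{size}(r_j)}{\operatorname{size}(im)},$$

where $BB_{ij}$ is the rectangular area containing the two superpixels $r_i$ and $r_j$, and $\operatorname{size}(im)$ is the size of the original image.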

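To reflect the similarity of superpixels $r_i$ and $r_j$ more comprehensively, the individual similarities can be weighted to obtain the overall similarity of the two superpixels, which is calculated as follows:

$$s(r_i, r_j) = a_1\, s_{\text{color}}(r_i, r_j) + a_2\, s_{\text{texture}}(r_i, r_j) + a_3\, s_{\text{size}}(r_i, r_j) + a_4\, s_{\text{fill}}(r_i, r_j),$$

where $s_{\text{color}}$, $s_{\text{texture}}$, $s_{\text{size}}$, and $s_{\text{fill}}$ are the color, texture, size, and coincidence similarities, and $a_1$, $a_2$, $a_3$, and $a_4$ represent the weight coefficients of the corresponding similarities.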
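The four similarity measures and their weighted combination can be sketched in a few lines. Histogram intersection is assumed for the color and texture terms, as in the usual formulation of selective search; the region dictionary keys (`color`, `texture`, `size`), the function names, and the unit weights are hypothetical choices for this example.

```python
def histogram_intersection(hist_a, hist_b):
    """Sum of bin-wise minima of two normalised histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def size_similarity(size_a, size_b, image_size):
    """Close to 1 when both regions are small relative to the image."""
    return 1.0 - (size_a + size_b) / image_size


def fill_similarity(size_a, size_b, bbox_size, image_size):
    """Close to 1 when the joint bounding box has little empty space."""
    return 1.0 - (bbox_size - size_a - size_b) / image_size


def combined_similarity(region_a, region_b, bbox_size, image_size,
                        weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of color, texture, size, and coincidence similarity."""
    a1, a2, a3, a4 = weights
    return (
        a1 * histogram_intersection(region_a["color"], region_b["color"])
        + a2 * histogram_intersection(region_a["texture"], region_b["texture"])
        + a3 * size_similarity(region_a["size"], region_b["size"], image_size)
        + a4 * fill_similarity(region_a["size"], region_b["size"],
                               bbox_size, image_size)
    )


r_i = {"color": [0.5, 0.5], "texture": [1.0, 0.0], "size": 10}
r_j = {"color": [0.25, 0.75], "texture": [0.5, 0.5], "size": 30}
print(combined_similarity(r_i, r_j, bbox_size=50, image_size=100))
```

Setting a weight to zero removes the corresponding cue, which is one simple way to study how much each similarity contributes to the candidate windows.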

4.2. Image Target Detection Algorithm Based on Improved R-CNN

Image target detection algorithms using deep learning usually fall into two-stage and single-stage methods. The two-stage method uses a convolutional neural network (CNN) to extract the feature information of the image and generate ROI features in the first stage and then classifies the ROI features in the second stage [20]. The single-stage method integrates feature extraction and ROI feature generation in one deep network model. The earliest two-stage method is R-CNN.

When R-CNN is used to detect image targets, ROI regions are obtained with the selective search method, features are extracted from each ROI, and the features are fed to an SVM for classification. Although R-CNN is more accurate than traditional methods, it goes through multiple stages in both training and testing, which results in slow running speed and poor real-time performance.

To overcome these shortcomings of R-CNN in image target detection, the existing R-CNN algorithm can be improved. The improved algorithm consists mainly of a multifeature map extraction operation and a subfeature map processing operation. As shown in Figure 4, to detect small image targets, the feature map extracted from the highest layer of the VGG16 network and the feature maps extracted from lower layers are combined into a hybrid feature map.

To generate the mixed feature map, the feature maps from different layers must be converted to the same scale by upsampling or downsampling. In the improved network model, Conv1 is treated as the low-level feature map, Conv2 as the intermediate feature map, and Conv3 as the high-level feature map, and the three levels are combined into a multiscale mixed feature map. The low-level feature map is downsampled to the size of the intermediate feature map by a max pooling layer, and the high-level feature map is upsampled to the size of the intermediate feature map by a deconvolution layer. A normalization operation is then applied before the feature layers are fused, which keeps the ROI feature pixels generated by the RPN within a 2 × 2 area of the original image, so that the improved model can detect small targets.
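The rescale, normalize, and fuse sequence can be sketched on single-channel feature maps stored as two-dimensional lists. This is a simplified illustration under stated assumptions: nearest-neighbour upsampling stands in for the learned deconvolution layer, L2 normalization is assumed as the normalization operation, and fusion is shown as stacking the three maps as channels. All function names are hypothetical.

```python
import math


def max_pool_2x2(fmap):
    """Downsample by taking the maximum of each 2 x 2 block."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]


def upsample_2x(fmap):
    """Nearest-neighbour 2x upsampling (stand-in for deconvolution)."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def l2_normalize(fmap):
    """Scale the map so that its values have unit L2 norm."""
    norm = math.sqrt(sum(v * v for row in fmap for v in row)) or 1.0
    return [[v / norm for v in row] for row in fmap]


def fuse(low, mid, high):
    """Bring all maps to the size of `mid`, normalise, stack as channels."""
    channels = [max_pool_2x2(low), mid, upsample_2x(high)]
    return [l2_normalize(channel) for channel in channels]


low = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mid = [[1, 1], [1, 1]]
high = [[3]]
fused = fuse(low, mid, high)
print(fused)
```

Normalizing each level before fusion keeps a layer with large activations from dominating the mixed feature map.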

The ROI pooling layer extracts subfeature maps from the mixed feature map. It extracts the relevant features through the network model and different convolution kernel operations and uses them to construct each subfeature map. Because targets are observed from different angles, the dimensions of the subfeature maps differ, which may affect the detection results. A coding vector can therefore be constructed to adjust the dimension of the subfeature map and thereby improve the detection performance of the network model.
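One common way to bring subfeature maps of different sizes to a common dimension is ROI max pooling, sketched below. It is offered as an illustration of the dimension adjustment discussed above, not as the coding vector construction used in this paper, whose details are not given. The function name and the region format `(x1, y1, x2, y2)` with exclusive upper bounds are assumptions.

```python
def roi_max_pool(fmap, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x1, y1, x2, y2) of fmap[y][x] onto a
    fixed out_h x out_w grid. Upper bounds x2 and y2 are exclusive."""
    x1, y1, x2, y2 = roi
    height, width = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        y_start = y1 + (i * height) // out_h
        y_end = max(y_start + 1, y1 + ((i + 1) * height) // out_h)
        row = []
        for j in range(out_w):
            x_start = x1 + (j * width) // out_w
            x_end = max(x_start + 1, x1 + ((j + 1) * width) // out_w)
            row.append(max(
                fmap[y][x]
                for y in range(y_start, y_end)
                for x in range(x_start, x_end)
            ))
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(roi_max_pool(feature_map, (0, 0, 4, 4)))
print(roi_max_pool(feature_map, (1, 1, 4, 4)))
```

Both calls return a 2 × 2 grid although the two regions differ in size, so the layers that follow always receive an input of the same dimension.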

The deconvolution layer enlarges the feature map and makes the abstract information in it concrete, producing a map with more detail that is easier to interpret. Deconvolution converts the current feature map to the required size by upsampling [21]. Because the upsampled feature values are sparsely distributed, the feature map is then scanned with a convolution kernel so that its values are no longer sparse. The operation process of deconvolution is shown in Figure 5.
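The two steps of deconvolution described above, sparse upsampling followed by a convolution that fills in the gaps, can be sketched as follows. Zero insertion with stride 2 and a fixed 3 × 3 kernel of ones are assumptions made for illustration; in a trained network the kernel weights are learned. The function names are hypothetical.

```python
def zero_insert_upsample(fmap, stride=2):
    """Place each value on a stride-spaced grid and fill gaps with zeros."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (w * stride) for _ in range(h * stride)]
    for y in range(h):
        for x in range(w):
            out[y * stride][x * stride] = float(fmap[y][x])
    return out


def convolve_same(fmap, kernel):
    """Same-size 2-D sliding-window product with zero padding
    (cross-correlation, as used in deep learning; odd-sized kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += fmap[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def deconvolve(fmap, kernel, stride=2):
    """Upsample sparsely, then densify with a convolution kernel."""
    return convolve_same(zero_insert_upsample(fmap, stride), kernel)


small = [[1, 2], [3, 4]]
ones = [[1.0] * 3 for _ in range(3)]
sparse = zero_insert_upsample(small)
dense = deconvolve(small, ones)
print(sparse)
print(dense)
```

After zero insertion, 12 of the 16 values are zero; after the kernel has scanned the map, every position holds a nonzero value.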

4.3. Image Target Recognition Based on Improved R-CNN

On the basis of image target detection, specific target objects can be further obtained according to the detection results. Therefore, an image target recognition algorithm based on improved R-CNN is given. The specific process is shown in Figure 6. The target recognition algorithm mainly includes two parts: training and testing. The training part uses the training dataset to train the network model in order to determine the relevant parameters of the network model, while the testing part mainly uses the test dataset to verify the performance of the network model.

In the process of image target recognition, the feature classification and location of the image target are the key steps. The improved algorithm adopts a Softmax classifier, which classifies the features according to the cross-entropy loss [15]. The Softmax function is represented as follows:

$$P(y = j \mid x) = \frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}}$$

where $x$ is the feature of the input image, $x_j$ is its component for class $j$, $K$ is the number of classes, and $P(y = j \mid x)$ represents the probability of class $j$. When testing the model, the feature class with the largest probability under the Softmax function is taken as the final recognition result.

When training the model, we need to calculate the classification vector according to the Softmax function and then calculate the corresponding loss function and finally get the parameter update value of the network model.
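The training step described above (Softmax output, loss, then parameter update) can be sketched in NumPy. This is a minimal illustration, not the paper's training code: the single linear layer `weights`, the learning rate, and the toy data are assumptions introduced for the example.

```python
import numpy as np

def softmax(scores):
    """Row-wise Softmax; subtracting the row maximum avoids overflow."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true classes."""
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

def sgd_step(weights, features, labels, lr=0.1):
    """One gradient step for a linear layer followed by Softmax.
    Assumption: `weights` (D x K) stands in for the final classification layer."""
    probs = softmax(features @ weights)              # classification vectors
    loss = cross_entropy(probs, labels)              # loss before the update
    onehot = np.eye(weights.shape[1])[labels]
    grad = features.T @ (probs - onehot) / features.shape[0]
    return weights - lr * grad, loss

# Toy data: two features, two classes (illustrative only).
features = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
weights = np.zeros((2, 2))
for _ in range(50):
    weights, loss = sgd_step(weights, features, labels, lr=0.5)
predicted = softmax(features @ weights).argmax(axis=1)   # class with largest probability
```

The gradient of the cross-entropy loss with respect to the class scores is simply the predicted probabilities minus the one-hot labels, which is why the update takes such a compact form.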

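If a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ is given, where the condition $y^{(i)} \in \{1, 2, \ldots, K\}$ is satisfied, there are $K$ classifications in total. For the feature $x$ of each input image, there is a probability of the corresponding class $j$, that is, $P(y = j \mid x)$. Therefore, with $x_k$ denoting the component of $x$ for class $k$, it can be expressed in vector form as follows:

$$h(x) = \begin{bmatrix} P(y = 1 \mid x) \\ P(y = 2 \mid x) \\ \vdots \\ P(y = K \mid x) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} e^{x_k}} \begin{bmatrix} e^{x_1} \\ e^{x_2} \\ \vdots \\ e^{x_K} \end{bmatrix}$$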

The positioning algorithm for the image target is shown in Figure 7. First, candidate target windows are obtained from the input image. The improved R-CNN model then yields the category and confidence of every candidate target window. Next, taking each candidate target window as the center, the surrounding adjacent windows are sorted by the confidence of their feature category. The window with the highest confidence is taken as the region of the image target, and its feature category is used as the recognition category of the target. Finally, the image target is located according to the recognized feature category and its confidence, and the selected window represents the positioning result.
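The window selection described above can be sketched in plain Python. The sketch uses greedy non-maximum suppression, which is a standard technique substituted here for illustration and not confirmed as the paper's exact procedure. Treating windows as "adjacent" when their intersection over union (IoU) exceeds a threshold, the threshold value, and the `(box, label, confidence)` tuple layout are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def locate_targets(windows, iou_thresh=0.5):
    """Keep the highest-confidence window in every group of adjacent windows.

    windows: list of (box, label, confidence) tuples (hypothetical layout).
    Assumption: two windows are adjacent when their IoU reaches iou_thresh.
    """
    remaining = sorted(windows, key=lambda w: w[2], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)   # highest confidence among the rest
        kept.append(best)         # its box and label become the result
        remaining = [w for w in remaining if iou(best[0], w[0]) < iou_thresh]
    return kept

candidates = [
    ((0, 0, 10, 10), "car", 0.6),
    ((1, 1, 11, 11), "car", 0.9),    # overlaps the first window
    ((50, 50, 60, 60), "bus", 0.8),
]
targets = locate_targets(candidates)
# Two windows remain: the 0.9 "car" window and the 0.8 "bus" window.
```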

5. Experiment and Analysis

To verify the effectiveness of the deep learning-based image target detection and recognition method proposed in this paper, remote sensing images of moving objects obtained by satellites are used in the experiment, so that the performance of the proposed model is tested on the detection and recognition of dynamic targets. The dataset is composed of five types of targets: bus, car, bicycle, ship, and aircraft. To evaluate the performance of the algorithm, 35% of each category is randomly selected as the test set, and the rest is used for training. Table 1 shows the sizes of the training, validation, and test sets for each type of target.
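The per-category random split described above can be sketched with the Python standard library. The function name, the fixed seed, and the toy label list are assumptions for illustration. The sketch covers only the 35% test split stated in the text, because the text does not say how the validation sets listed in Table 1 were drawn.

```python
import random
from collections import defaultdict

def split_per_class(labels, test_fraction=0.35, seed=0):
    """Randomly assign `test_fraction` of each class to the test set.

    labels: class label of every sample, indexed by sample position.
    Returns (train_indices, test_indices), both sorted.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)   # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Toy label list, not the paper's dataset.
labels = ["bus"] * 20 + ["car"] * 40
train_idx, test_idx = split_per_class(labels)
```

Splitting within each category keeps the class proportions of the test set equal to those of the full dataset.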

After the parameters of the model are initialized, the stochastic gradient descent algorithm is used to optimize them. The model is trained for 12 cycles. To verify the effectiveness of the model, the improved R-CNN, CNN, and Faster-RCNN models are tested on the given image dataset, and the confusion matrices of their recognition results are compared. Table 2 shows the confusion matrix of the recognition results obtained with the improved R-CNN proposed in this paper, Table 3 shows the confusion matrix obtained with the CNN model, and Table 4 shows the confusion matrix obtained with the Faster-RCNN model.

Table 5 shows the recognition results of the different models on the image dataset. The improved model performs best on the test set. The accuracy of the CNN model is 98.61%, while the Faster-RCNN model, which reduces the parameters of the CNN, improves on this by 0.47% to reach 99.08%. This suggests that redundant parameters in a model affect the recognition of image targets and that the Faster-RCNN model can optimize the network parameters. Compared with the other models, the improved model produces the fewest falsely recognized samples on the test images.

During the experiment, the recognition accuracy of each model was also recorded for every target type, and the comparison is shown in Figure 8. Figure 8 shows that the proposed model has the highest recognition accuracy both for the larger targets and for smaller targets such as bicycles. In short, across the different types of target objects, the improved model in this paper outperforms the other models.
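The evaluation quantities used in this section (confusion matrix, overall accuracy, and per-class accuracy) can be computed as in the following NumPy sketch. The function names and the toy labels are illustrative assumptions and do not reproduce the values in Tables 2 to 5 or Figure 8.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """cm[t, p] counts samples of true class t that were recognized as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of all samples on the diagonal (correctly recognized)."""
    return np.trace(cm) / cm.sum()

def per_class_accuracy(cm):
    """Fraction of each true class that was recognized correctly (per-class recall)."""
    return np.diag(cm) / cm.sum(axis=1)

# Toy labels with classes encoded as integers 0..2; not the paper's results.
true_labels = [0, 0, 1, 1, 2]
pred_labels = [0, 1, 1, 1, 2]
cm = confusion_matrix(true_labels, pred_labels, n_classes=3)
acc = overall_accuracy(cm)
class_acc = per_class_accuracy(cm)
```

Off-diagonal entries of the matrix are the falsely recognized samples, so a model with fewer misrecognitions has smaller off-diagonal counts.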

6. Conclusion

Traditional target detection and recognition methods had poor robustness and low recognition accuracy, and the existing deep learning models depended on a large number of parameters in target detection, resulting in a heavy computational workload. Therefore, this paper proposed an image target detection and recognition method based on an improved R-CNN network model. By analyzing the theoretical basis of deep learning in target detection and recognition, this paper compared the characteristics of traditional methods and deep learning methods in image target detection and recognition. Building on the existing R-CNN, a target feature matching module was introduced into the network model, and similarity calculation on the features extracted by the model was used to obtain feature maps close to the same target. Compared with the existing models, the experimental results showed that the proposed algorithm was not only feasible but also had higher detection efficiency and recognition accuracy. The target detection and recognition algorithm proposed in this paper can provide some theoretical reference and guidance for application research on image target processing in complex environments and in other related fields.

Data Availability

The labeled dataset used to support the findings of this study is available from the author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This work was supported by the Project of Sichuan Health Information Society (Project no. 2018029).