This paper proposes a contour extraction model based on cosaliency detection for airport detection in remote sensing images and improves the traditional line segment detector (LSD) algorithm to better suit this task. The model consists of two parts: a cosaliency detection module and a contour extraction module. The cosaliency detection module uses the Visual Geometry Group-19 (VGG-19) network framework to obtain result maps for interimage comparison and intraimage consistency; the two maps are then multiplied pixel by pixel to obtain the cosaliency mask. The contour extraction module uses superpixel segmentation and parallel line segment detection (PLSD) to refine the airport contour and runway information into a preprocessed result map, which is then merged with the cosaliency detection result to obtain the final airport contour. We compared the proposed model with four commonly used methods. The experimental results show that its accuracy is 15% higher than that of target detection based on the saliency model and 1% higher than that of the active contour model based on saliency analysis. This shows that the proposed model can extract a contour that closely matches the actual target.

1. Introduction

Airport target recognition in remote sensing images is an important topic in remote sensing image processing, and the quality of the detection algorithm strongly affects recognition accuracy. Remote sensing image target recognition plays a vital role in both the military and civilian fields. In the military field, remote sensing image detection technology has become an important means of military reconnaissance and early warning [1]. In the civilian field, it is often used in resource surveying, urban planning, and surveying and mapping [2].

Researchers at home and abroad have made progress on airport detection in remote sensing images. Earlier methods fall into three main types: edge-based, region segmentation-based, and saliency-based. With the development of deep learning, deep neural networks have considerably advanced image detection and image saliency analysis, and methods based on convolutional neural networks have gradually emerged.

The edge-based method focuses on hand-designed representations of the airport's geometric features, chiefly the runway. In 2011, Wang et al. [3] proposed a method based on perceptual organization and designed belief functions for the runway features. In 2012, Kou et al. [4] used line segment detection to identify airports. This type of method can detect airport targets rapidly with good recognition results, but complex background noise and unrelated linear objects make the airport targets difficult to identify.

The method based on region segmentation focuses on the salient structural features of the airport and relies mainly on the texture difference between the airport and its surroundings. Image segmentation extracts candidate regions, and the features within the candidate regions are then identified. Liu et al. [5] proposed a classifier based on the KMP algorithm, segmented aerial images and classified them into different textures, and extracted regions of interest from the texture segmentation results. This method analyzes the image pixel by pixel during segmentation, which is slow and complex, and its recognition performance depends on the choice of the segmentation threshold.

The method based on the salient region of the image mainly uses various saliency cues to replace the traditional airport feature description. In 2011, Tao et al. [6] proposed matching scale-invariant feature transform (SIFT) key points to cluster information while extracting regional information through image segmentation. In 2012, Wang et al. [7] proposed a method based on the salient area of the image that introduces a computational model of human visual attention selection into remote sensing image airport detection: the Hough transform is first used to screen whether a remote sensing image contains an airport target, and the proposed model is then used to extract salient regions, which speeds up detection. In 2015, Zhu et al. [8] combined top-down and bottom-up saliency to form a saliency map and used SIFT and support vector machines (SVMs) to determine whether a candidate area separated as an airport target is correct.

Among deep learning methods, the convolutional neural network is one of the representative algorithms. A convolutional neural network can classify input information in a translation-invariant manner and supports both supervised and unsupervised learning. Li et al. [9] surveyed the existing representative deep learning-based target detection algorithms and proposed a large-scale, publicly available data set; because the detection goal of this article is slightly different, we did not adopt it. Zhang et al. [10] and Xiao et al. [11] used convolutional neural networks to extract high-level features and hierarchical representations of targets. In 2017, Cai et al. [12] and Zhang et al. [13] applied convolutional neural networks to airport detection and recognition in remote sensing images and greatly improved the speed and accuracy of detection. Cai et al. achieved airport detection in complex backgrounds by improving the region proposal network and the loss function. However, due to insufficient samples, the detection may yield an incomplete bounding box or multiple bounding boxes for a single airport. Zhang et al. replaced the region proposal network (RPN) of the original convolutional neural network with a weak semantic rule for line segmentation. Although efficiency improved, false detections also increased, and the overall detection accuracy was 88.8%. Cheng et al. [14] addressed problems such as target rotation and intraclass diversity, which interfere with detection performance, by optimizing the objective function: building on existing state-of-the-art object detection systems, they proposed a simple but effective method to train rotation-invariant and Fisher discriminative CNN models to further boost object detection performance.

Methods based on convolutional neural networks (CNNs) can consistently produce highly reliable detection results, and supervised detection algorithms generally achieve better recognition rates than unsupervised methods. However, they have the following shortcomings: (1) they require a large number of image samples accurately labeled by researchers, which makes the work labor-intensive; (2) the reusability of the model is reduced; (3) pattern matching and sample training may be very time-consuming and largely determine the quality of the entire detection framework.

The concept of cosaliency first appeared in 2010. For a pair of similar images, Jacobs et al. [15] used machine learning to fuse multiple features and find the salient objects common to the two images, which they called cosaliency. To a certain extent, it provides better information for image classification. In 2013, Fu et al. [16] proposed cluster-based cosaliency detection, which uses bottom-up cues to express saliency and compares images with one another to determine whether a target is salient, improving data processing speed to a certain extent. In 2014, the method in [17] combined multiple saliency maps generated by different single-image saliency models under a rank-one constraint to obtain cosaliency maps. In 2015, Ye et al. [18] combined the color and SIFT features of a group of images to form an overall similarity, took the common salient area of each image as a sample, and obtained the sample saliency map using a single-image saliency model. Local and global recovery of the salient area are combined, and boundary connectivity and the focus of the area of interest are used to generate the set of cosaliency maps. In 2019, Zhang et al. [19] proposed a hierarchical image cosaliency detection framework that follows a coarse-to-fine strategy and uses a mask-guided fully convolutional network structure to generate the initial cosaliency detection result. Tsai et al. [20] proposed stacked autoencoder-enabled fusion (SAEF), which fuses saliency proposals by jointly exploring the image-level confidence based on the reconstruction error of a stacked autoencoder (SAE) and the region-level confidence from the cosalient object likelihood; self-trained convolutional neural networks (STCNNs) can then gradually learn cosalient objects in a self-taught fashion.

In summary, there are two main approaches to airport target detection in remote sensing images. One is unsupervised and based on airport feature modeling; the other introduces a supervised learning mechanism into target detection. The first approach focuses on the geometric characteristics of the airport and relies mainly on line detection: the airport target is obtained by detecting the runway area. The second approach mainly uses the saliency cues of the image in place of hand-crafted airport features. The second approach generally achieves a better recognition rate, but the images must be labeled manually, which lowers the reusability of the model.

Our main contributions are summarized as follows:
(1) This paper improves the straight-line detection algorithm LSD according to the characteristics of airport runways. The improvement is based mainly on the length, width, and parallelism of the runway: line segments are screened on these three criteria, segments that are too short or that differ too much from an airport runway are discarded, and the remaining segments mark the suspected airport area. The improved line detection algorithm is better suited to locating airport targets than the original LSD algorithm.
(2) This paper builds a cosaliency detection model. We use VGG-19 to extract the deep features of the image and merge the saliency within each image with the similarity between images; we call the result cosaliency. The proposed model is easy to reproduce, and the images do not need to be labeled manually, which saves time and cost to a certain extent.

2.1. Visual Saliency Detection

Visual saliency detection refers to the extraction of salient areas in an image by simulating human visual characteristics through intelligent algorithms, that is, the area that humans are most interested in when seeing an image, and the specific process is shown in Figure 1.

The development of image saliency detection can be divided into three main stages. In the first stage, Itti et al. proposed a saliency-based visual attention model in 1998 and further improved the model theory; Itti’s saliency model has become the standard bottom-up visual attention model. In the second stage, Liu et al. and Achanta et al. defined saliency detection as a binary segmentation problem, after which a large number of saliency detection models appeared. The third stage is based on the convolutional neural network (CNN) [21, 22]. A CNN-based model usually contains many thousands of adjustable parameters and neurons with variable receptive field sizes [23, 24]. Neurons with larger receptive fields provide global information, which helps identify the most salient areas in the image. CNNs achieve unprecedented performance and have gradually become the mainstream direction of salient object detection. Cornia et al. [25] combined convolutional neural networks and stacked autoencoders to predict eye fixation points. In addition, there are methods based on frequency domain analysis [26, 27] and graph theory [28, 29].

2.2. Cosaliency Detection

Cosaliency detection is a weakly supervised extension of saliency detection. For a pair of images, Chen [30] proposed a progressive algorithm based on the distribution of sparse representations to enhance the response before the attention shift. Li and Ngan [31] combined conventional saliency detection methods to calculate the single-image saliency and then applied a comultilayer graph model to obtain the multi-image saliency.

Li et al. [32] linearly fused the interimage and intraimage saliency maps and extracted the common salient objects from a set of images. To further improve performance, fusion-based methods combine several saliency models to remove the bias of any single prediction while retaining shared information. Such methods use fixed-weight fusion, adaptive weight fusion [17], or regional adaptive fusion [33–36] to integrate the saliency proposals generated by different models. Methods based on deep learning [37–39] are very effective at extracting semantic object information in complex scenes and greatly enhance cosaliency detection. However, these methods work in a supervised manner and require pretrained deep models or labeled training data. In addition, the supervised setting reduces their ability to handle objects of unseen categories.

2.3. Active Contour Model

A contour is a curve that connects points with the same gray value or color, and extracting contours means extracting these curves. Generally, a contour search is performed on a binary image; that is, a white object is found on a black background. In 1985, Suzuki and Abe [40] introduced a border-following algorithm that uses the surrounding relationships between borders to determine the boundaries in a binary image.

Generally speaking, active contour models are either edge-based or region-based, and they are mainly used on medical images, for example, images of cancers and tumors. Edge-based methods include the Sobel, Canny, and Laplacian operators. Jing et al. [41] proposed an edge detection technology based on local minimization, and Li et al. [42] proposed an edge detection technology based on deep learning, which is mainly for cancer images.

The C-V model is a typical region-based active contour model, and many subsequent active contour models build on it. It uses the pixel gray-level information of the image as energy, constructs an energy function, and segments the target by minimizing that function. Later work improved it and proposed active contour models based on local regions [43, 44].

2.4. Airport Detection

The development of deep learning has led trends in industry and academia and has also provided new ideas for airport target detection in remote sensing images. Based on the characteristics of airport runways, Zhang et al. [45] used the transfer learning ability of convolutional neural networks first to identify airport runways and then to identify airports directly. Xiao et al. [11] used convolutional neural networks to extract multiscale deep fusion features of images and then used support vector machines for classification. Although airports across the country differ, most of them share similar features. The above methods make full use of the powerful classification and recognition capabilities of convolutional neural networks, but each image is analyzed separately, which increases the amount of network computation to a certain extent. Therefore, this paper proposes a cosaliency detection method and applies it to airport target detection in remote sensing images, which reduces the network running time to a certain extent and improves the accuracy of airport detection.

3. Model Building

In the field of computer vision, VGG-19 is mainly used for image recognition and classification, image feature extraction, and similar tasks. It can also serve as a pretrained model for other tasks. This paper uses the 16 convolution layers and 5 pooling layers of the VGG-19 network to preprocess remote sensing images and extract their deep features. We call the feature map extracted by VGG-19 a mask and use it as the input of the fourth and sixth layers of the cosaliency contour extraction module. Feature map fusion is used to generate a cosaliency prediction map. Finally, the results of simple linear iterative clustering (SLIC), PLSD, and cosaliency prediction are combined with contour extraction to refine the cosaliency prediction map, which completes the remote sensing image airport target extraction based on cosaliency detection. The flowchart is shown in Figure 2.

Algorithm steps:
(i) Use SLIC for superpixel segmentation.
(ii) Use the PLSD algorithm to segment the image.
(iii) Take 9000 training set images and their corresponding truth maps as the input of the VGG-19 network. After 16 layers of convolution and 5 layers of pooling, the final output is a 7 × 7 × 512 feature map mask.
(iv) Design a cosaliency detection module to compare and fuse the clues in a single image and between multiple images to obtain a predicted cosaliency map.
(v) Design a cosaliency contour extraction module to fuse the results of superpixel segmentation, line segmentation, cosaliency detection prediction map, and contour extraction method to obtain the airport target map of the remote sensing image.
(vi) Optimize the entire network.
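The 7 × 7 × 512 output size in step (iii) can be checked with a little arithmetic. The sketch below assumes the standard VGG-19 layout (3 × 3 convolutions with padding 1, which keep the spatial size, and 2 × 2 max-pooling with stride 2, which halves it). It also assumes the training images are resized to 224 × 224 before entering the network, which the text does not state; the function name is hypothetical.

```python
def vgg19_feature_shape(height, width, num_pools=5, channels=512):
    """Shape of the last VGG-19 pooling output for a given input size.

    Assumes padded 3x3 convolutions (size-preserving) and 2x2 max-pooling
    with stride 2 (each pooling halves height and width, rounding down).
    """
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height, width, channels


print(vgg19_feature_shape(224, 224))  # (7, 7, 512)
print(vgg19_feature_shape(896, 896))  # (28, 28, 512)
```

Under these assumptions, a 224 × 224 input gives the stated 7 × 7 × 512 mask, whereas an 896 × 896 input would give a 28 × 28 × 512 feature map after the same five pooling layers.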

3.1. Superpixel Segmentation Model

In computer vision, image segmentation is an important and basic research direction. Common terms in image segmentation include superpixel, semantic segmentation, instance segmentation, and panoptic segmentation. The concept of the superpixel was first proposed by Ren et al. [46] in 2003. One of the most commonly used superpixel methods is simple linear iterative clustering (SLIC), proposed by Achanta et al. [47] in 2010. The algorithm is conceptually simple and easy to implement. Each pixel of the color image is converted into a 5-dimensional feature vector consisting of its CIELAB color and XY coordinates, and a distance metric is constructed on this 5-dimensional vector. Image pixels are then clustered locally by a procedure derived from the k-means algorithm, which has been widely used. Compared with a variety of other segmentation algorithms, SLIC rates highly overall in terms of computing speed, object contour preservation, and so on. SLIC generates compact and approximately uniform superpixels, which matches the expected segmentation effect, so this article uses the SLIC algorithm to preprocess the image.

As shown in Figure 3, the colored dots in Figure 3(a) represent different clustering centers. Each clustering center corresponds to a superpixel, and the number of superpixels is denoted by $K$. Assuming an image has $N$ pixels, the size of each superpixel can be expressed as $N/K$, and the distance between adjacent superpixel centers is $S = \sqrt{N/K}$.

SLIC algorithm steps are as follows:
(i) Initialize the seed points. Distribute the centers of the K superpixels evenly over the pixels of the image.
(ii) Fine-tune the positions of the seeds. Within the 3 × 3 neighborhood of each seed, move the center of the superpixel to the point with the smallest gradient among these 9 points. This prevents the center from falling on a contour boundary with a large gradient, which would affect the subsequent clustering.
(iii) Initialize the data. The array label stores which superpixel each pixel belongs to, and the array dis holds the distance from each pixel to the center of the superpixel to which it belongs.
(iv) For each superpixel center x, examine every point within its 2S range: if the 5-dimensional distance from the point to the center x is less than the distance from the point to the center of the superpixel to which it currently belongs, the point is assigned to superpixel x, and label and dis are updated for that point.
(v) For each superpixel center, recalculate its position.
(vi) Iterate: repeat steps (iv) and (v) until the error converges. Generally, 10 iterations are used.
(vii) Enhance connectivity. Create a new label table whose elements are all −1, traverse the image in a Z-shaped order (from left to right and from top to bottom), and reassign discontinuous or undersized superpixels to adjacent superpixels; pixels that have been traversed are assigned the corresponding labels, and the process ends when all pixels have been traversed.
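The clustering loop in steps (i) and (iii) to (vi) can be sketched in NumPy as follows. This is a simplified illustration rather than the implementation used in the paper: it omits the gradient-based seed adjustment of step (ii) and the connectivity enforcement of step (vii). The function name `slic_sketch`, the compactness weight `m`, and the distance form D = sqrt(d_color² + (d_space / S)² m²) follow the usual SLIC description and are assumptions here.

```python
import numpy as np


def slic_sketch(lab, k, m=10.0, n_iter=10):
    """Simplified SLIC: lab is an (H, W, 3) float CIELAB image.

    Returns an (H, W) integer array holding the superpixel index per pixel.
    """
    h, w, _ = lab.shape
    s = max(int(np.sqrt(h * w / k)), 1)            # grid interval S = sqrt(N / K)
    ys, xs = np.meshgrid(np.arange(s // 2, h, s),
                         np.arange(s // 2, w, s), indexing="ij")
    ys, xs = ys.ravel(), xs.ravel()
    # Step (i): each center is a 5-D vector [L, a, b, x, y] placed on a grid.
    centers = np.column_stack([lab[ys, xs], xs, ys]).astype(float)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.full((h, w), -1, dtype=int)        # step (iii): label array

    for _ in range(n_iter):                        # step (vi): fixed iterations
        dist = np.full((h, w), np.inf)             # step (iii): distance array
        for idx, (l, a, b, cx, cy) in enumerate(centers):
            # Step (iv): only look at a 2S x 2S window around the center.
            y0, y1 = max(int(cy) - s, 0), min(int(cy) + s + 1, h)
            x0, x1 = max(int(cx) - s, 0), min(int(cx) + s + 1, w)
            patch = lab[y0:y1, x0:x1]
            d_color = np.sqrt(((patch - np.array([l, a, b])) ** 2).sum(axis=2))
            d_space = np.sqrt((xx[y0:y1, x0:x1] - cx) ** 2 +
                              (yy[y0:y1, x0:x1] - cy) ** 2)
            d = np.sqrt(d_color ** 2 + (d_space / s) ** 2 * m ** 2)
            better = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = idx
        for idx in range(len(centers)):            # step (v): move the centers
            mask = labels == idx
            if mask.any():
                centers[idx] = [*lab[mask].mean(axis=0),
                                xx[mask].mean(), yy[mask].mean()]
    return labels
```

Because the grid interval is rounded to an integer, the number of superpixels actually produced can differ slightly from the requested `k`.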

It can be seen from Figure 4 that the target regions after superpixel segmentation are roughly in the same cluster, which makes a good preparation for the subsequent contour extraction part.

3.2. Straight Line Segmentation Module

An airport is composed of a flight area, a ground transportation area, and a terminal area. Figure 5 shows that airports differ in size from place to place and that their architectural styles also differ greatly. However, airports also share an obvious similarity: the airport runway is composed of long, parallel straight lines.

There are three main algorithms for detecting line segments: the line segment detector (LSD), the edge drawing line detector (EDLD), and the Hough line detector (HLD). In our comparison, LSD gave the best line segment detection results. Therefore, we improve the LSD algorithm proposed in [48], mainly by adding a constraint on the distance between parallel straight lines, which makes it easier to detect the airport runway and helps locate the airport in the remote sensing image.

The detection goal of LSD is to detect local straight contours in the image. The improved PLSD in this paper mainly detects the contours of parallel lines. The design feature of the airport runway is that the length and width are basically the same, and the distance between the two line segments is mostly fixed and consistent and can be well distinguished from street roads. The gradient is calculated as follows:

The horizontal angle is calculated as follows:
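For reference, the original LSD formulation computes the gradient at pixel $(x, y)$ from a 2 × 2 neighborhood of the gray-level image $i$ and derives the level-line angle from it. Assuming the definitions used here follow that formulation (the symbols $g_x$, $g_y$, $G$, and $\theta$ are notation chosen for this sketch), they can be written as:

```latex
\begin{aligned}
g_x(x,y) &= \frac{i(x+1,y) + i(x+1,y+1) - i(x,y) - i(x,y+1)}{2},\\
g_y(x,y) &= \frac{i(x,y+1) + i(x+1,y+1) - i(x,y) - i(x+1,y)}{2},\\
G(x,y) &= \sqrt{g_x^2(x,y) + g_y^2(x,y)},\\
\theta(x,y) &= \arctan\!\left(\frac{g_x(x,y)}{-g_y(x,y)}\right).
\end{aligned}
```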

If the horizontal angles in the above formula are the same, it proves that the two line segments are parallel. At this time, use the following formula to calculate the length of the two line segments:

LSD is a very effective line segment detection model. The runway feature extraction layer uses the improved LSD algorithm to extract straight lines and generate line density maps. For the superpixel area , define the weight of line segment:

In formula 5, is the number of pixels in the parallel line segment detected by PLSD, and is the number of all pixels in the superpixel area. The items are used to measure the quality of the line segment and consist of two parts. The first part is the length of the line segment, and the second part is the distance between parallel line segments. The specific definition is as follows:where , .

Based on the line density map generated by PLSD, the paper uses Gaussian form to formulate LKS, so high-density areas can be highlighted:where , according to the survey, the actual runway width is generally 60 meters, and according to the satellite image and the actual ratio of 1 : 1000, therefore, if the length of the runway is equal and the distance is greater than 0.05 meters, it is considered an airport runway.

Compared with the LSD algorithm proposed in [49], the improved method PLSD in this paper reduces the false alarm rate and improves the accuracy of target detection. An example is shown in Figure 6.

It can be seen from Figure 6 that, compared with the traditional LSD algorithm, the improved algorithm yields fewer small line segments, and the quality of the detected straight lines is relatively high. The length and parallelism constraints on the line segments effectively reduce the interference of other targets with the airport runway.
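The length and parallelism screening that distinguishes PLSD from LSD can be illustrated with a small post-filter over detected segments. This is a minimal sketch, not the paper's implementation: the function names and the thresholds `min_len`, `angle_tol`, `min_gap`, and `max_gap` are hypothetical parameters, and segments are assumed to be given as endpoint pairs in pixel coordinates.

```python
import math


def segment_angle(seg):
    """Orientation in [0, pi) of a segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def segment_length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def line_distance(seg_a, seg_b):
    """Perpendicular distance from the midpoint of seg_b to the line of seg_a."""
    (x1, y1), (x2, y2) = seg_a
    mx = (seg_b[0][0] + seg_b[1][0]) / 2
    my = (seg_b[0][1] + seg_b[1][1]) / 2
    cross = (x2 - x1) * (my - y1) - (y2 - y1) * (mx - x1)
    return abs(cross) / segment_length(seg_a)


def parallel_pairs(segments, min_len, angle_tol, min_gap, max_gap):
    """Keep pairs of long, nearly parallel segments whose spacing is in range."""
    long_segs = [s for s in segments if segment_length(s) >= min_len]
    pairs = []
    for i, a in enumerate(long_segs):
        for b in long_segs[i + 1:]:
            diff = abs(segment_angle(a) - segment_angle(b))
            diff = min(diff, math.pi - diff)   # orientations wrap around at pi
            if diff <= angle_tol and min_gap <= line_distance(a, b) <= max_gap:
                pairs.append((a, b))
    return pairs
```

For example, two horizontal segments of length 100 that are 6 pixels apart form a candidate runway pair under `min_len=50`, `angle_tol=math.radians(2)`, `min_gap=4`, and `max_gap=8`, while a short segment or a perpendicular one is discarded.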

3.3. Cosaliency Detection Module

Given N remote sensing images as the input of the network, where , each single input remote sensing image I has the salient object of the airport. Our goal is to detect all salient objects in the data set A.

This article first designs two networks, both of which use the structure of the fully convolutional network (FCN) model [50]. Their convolutional and pooling layers have the same structure as those of VGG-19. The first network extracts the saliency cues within each image, and the second extracts the saliency features shared between the images in a set. Second, the two detection results are fused by pixel-wise multiplication, and deconvolution restores the fused result to the size of the input; the resulting predicted cosaliency map is denoted as the mask. The schematic diagram is shown in Figure 7, and the network layer design is shown in Table 1.
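The fusion step, pixel-wise multiplication of the two detection results, can be sketched as follows. The function name and the toy 2 × 2 maps are hypothetical, and both maps are assumed to hold values in [0, 1].

```python
import numpy as np


def fuse_cosaliency(intra_map, inter_map):
    """Pixel-wise product of the intra-image and inter-image saliency maps."""
    intra_map = np.asarray(intra_map, dtype=float)
    inter_map = np.asarray(inter_map, dtype=float)
    if intra_map.shape != inter_map.shape:
        raise ValueError("saliency maps must have the same shape")
    return intra_map * inter_map


# Toy example: a pixel stays salient only if both maps agree.
intra = [[0.9, 0.8], [0.1, 0.7]]
inter = [[0.9, 0.1], [0.2, 0.8]]
print(fuse_cosaliency(intra, inter))
```

Because both factors lie in [0, 1], the product is high only where a pixel is salient within its own image and also consistent across the image group, which is what suppresses salient but non-common regions.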

The feature map after the fusion of the two networks is called a cosaliency map, denoted as , , where represents width, h represents height, and c represents channel.

The objective function proposed in this paper iswhere is the saliency detection of and is the common saliency detection of a group of images, represents the parameters of the model, , , and are all learnable parameters of the network.

We learn two FCN models, denoted as and , respectively. is used to generate the saliency map of a single image, and is used to extract the common saliency between the entire group of images and generate the saliency map . The final saliency map is calculated by the following formula:

Term : through training the FCN network, the saliency map of image is generated. At the same time, this paper uses the saliency detection algorithm proposed in [51] to detect the saliency of and obtain the feature map , which is used as the output reference target of the network:where i means pixel, means saliency score, that is , corresponding to pixel i, means value corresponding to pixel i, means importance score of pixel i, and if pixel i belongs to the salient area, then let , where is the salient pixel. According to the ratio of pixels to the entire image, the image F is divided into two parts. The image is divided into two parts: the foreground and the background, namely, the salient area and the insignificant area. The distinguishing criterion is the average value of . If the pixel i is greater than the average value of , it is significant; otherwise, it is not significant.
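The mean-value criterion described above, under which a pixel is salient if its value exceeds the average of the map, can be sketched as follows. The function name is hypothetical, and the saliency map is assumed to be a 2-D array of scores.

```python
import numpy as np


def split_by_mean(saliency):
    """Split a saliency map into foreground/background masks at its mean value."""
    saliency = np.asarray(saliency, dtype=float)
    threshold = saliency.mean()
    foreground = saliency > threshold      # salient pixels
    return foreground, ~foreground         # background is the complement
```

Using the mean as the threshold adapts the split to each image, so no fixed global threshold has to be chosen.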

Term : the purpose of this term is to find the part common to a group of images, which is also the learning goal of the interimage network. This requires the input images to share common objects, so that the similar parts of the images can be extracted. Remote sensing images have complex backgrounds and contain many targets, so extracting the objects of interest from them is quite difficult. We therefore use as one input set remote sensing images that all contain an airport target but have different backgrounds, for example, mountain, city, and seaside. This makes the target relatively easier to extract. The airport area should look similar across a set of images while differing from its surrounding areas. Therefore, the similarity of the salient areas across images is the main factor considered in the design of the network.

Image is generated through network to generate a total saliency map . The last activation layer of the network uses the sigmoid function, and the value of the total saliency map is between 0 and 1, which is . In this part, image will be divided into two parts, the foreground and the background, which can also be called the cosaliency area and the remaining area of the image, denoted as and :

As a measure of the similarity between images, this paper uses feature extractor to calculate the features of an image and at the same time applies feature extractor to all mask images to obtain feature . Based on the obtained features, can be defined aswhere is the estimated score of the common significant area between the detected images, which is composed of the distance between the foreground and the background between any two images and the distance between the foreground and the background in a single image. After the learning of the two networks and , the cosaliency map of image is produced by .

In remote sensing images, the target we pay attention to often occupies a small part of the entire image; that is, the proportions of salient and nonsalient areas are very unbalanced. Therefore, we need asymmetric weighted loss, which can improve the network performance [52].

According to the cosaliency maps generated by the two networks and , the loss function is defined as follows:where , it is the proportion of salient pixels in the truth map, and and indicate the number of training sets and the number of pixels in each training image, respectively. represents the cosaliency map generated by the network in this paper, and represents the truth map.
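One common way to realize an asymmetric weighted loss of this kind is a class-balanced cross-entropy, in which the weights come from the proportion of salient pixels in the truth map. The sketch below is an assumption about the general form, not a reconstruction of the paper's exact loss: the function name, the symbol `rho`, and the choice to weight salient pixels by `1 - rho` and background pixels by `rho` are all hypothetical.

```python
import numpy as np


def balanced_bce(pred, truth, eps=1e-7):
    """Class-balanced binary cross-entropy averaged over all pixels.

    pred:  predicted cosaliency map with values in (0, 1)
    truth: binary truth map (1 = salient, 0 = background)
    """
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1 - eps)
    truth = np.asarray(truth, dtype=float)
    rho = truth.mean()                     # proportion of salient pixels
    loss = -((1 - rho) * truth * np.log(pred)
             + rho * (1 - truth) * np.log(1 - pred))
    return loss.mean()
```

With this weighting, missing a rare salient pixel costs more than raising a false alarm on an abundant background pixel, which counteracts the imbalance between salient and nonsalient areas.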

3.4. Cosaliency Contour Extraction Module

Contour detection uses certain techniques and methods to extract the contour of the target while ignoring the influence of the background, the internal texture of the target, and noise. Contour extraction models mainly target grayscale images and are applied chiefly to medical images. The general idea is to start from a certain point, search the surrounding area pixel by pixel, and classify pixels as inside or outside the region until the search returns to the starting point. Influenced by the method of Chan and Vese [53], this paper proposes an adaptive contour extraction model based on cosaliency detection, referred to as CSACEM, as shown in Figure 8.

In remote sensing images, using the model directly greatly reduces the accuracy of contour recognition. Therefore, this article first uses VGG-19 to extract features from the input image and combines the results of cosaliency detection, PLSD, and SLIC segmentation to select the target box and build a mask. The mask suppresses other image features, reduces the complexity of the image to a certain extent, and increases the running speed. It also fixes the initial contour inside the target area, which avoids slow curve evolution and false alarms.

3.4.1. Target Positioning Mask under Cosaliency

In order to refine the edge part of the cosaliency map and improve the problem of the image being too smooth, this paper takes into account the feature information obtained by superpixel segmentation and straight-line segmentation. Divide an image into K superpixel areas, represents the K-th superpixel area, and the line segment in the K-th superpixel area is marked as . The pixels in the superpixel are often the most similar parts in the adjacent area. Therefore, they generally belong to the salient object or background at the same time. We divide the superpixels into three groups according to the given cosaliency map.

In equation (11), , contains superpixels that belong to the salient target, contains superpixels that belong to the background, and contains the remaining superpixels; that is, it is not clear whether it is the foreground or the background, so it is not considered for the time being. represents the average significant value of the superpixel, and represent the average and standard deviation of the significant value. The cosaliency map is optimized for training by the following formula:

In equation (12), is generated by the trained network , and are the weights that balance the foreground and the background, recorded as constants, is equal to the ratio of the background area to the foreground plus the background, and is the opposite.
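The three-way grouping of superpixels by their average cosaliency value can be sketched as follows. The exact thresholds of equation (11) are not recoverable from the text, so this sketch assumes a superpixel is foreground if its mean value exceeds the average by more than `k_sigma` standard deviations, background if it falls below the average by the same margin, and undecided otherwise. The function name, the parameter `k_sigma`, and the choice to compute the statistics over the superpixel means are all assumptions.

```python
import numpy as np


def group_superpixels(cosal, labels, k_sigma=1.0):
    """Partition superpixel ids into foreground, background, and undecided."""
    cosal = np.asarray(cosal, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # Average cosaliency value of each superpixel.
    means = np.array([cosal[labels == i].mean() for i in ids])
    mu, sigma = means.mean(), means.std()
    foreground = ids[means > mu + k_sigma * sigma]
    background = ids[means < mu - k_sigma * sigma]
    undecided = ids[(means >= mu - k_sigma * sigma) &
                    (means <= mu + k_sigma * sigma)]
    return foreground, background, undecided
```

Only the confident foreground and background groups would then feed the refinement; the undecided group is set aside, as the text describes.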

Therefore, our objective function is

, , and are all learnable parameters of the network. After optimizing the objective function, the cosaliency map can be expressed as follows: , and the corresponding parts of the three networks are multiplied pixel by pixel.

3.4.2. Contour Extraction

For input image , corresponding to two-dimensional gray image , based on the LRCV model [43], level set function and CSACEM can be expressed as

According to the LRCV model, and are variables, not constants. They take into account the local characteristics of the image and are more sensitive to the initial contour. In the method of this article, the mask gives a good airport object, which makes the initial contour have an accurate starting point. Use to represent contour C. If the pixel is inside the contour C, ; if the pixel is outside the contour C, ; if the pixel is on the contour C, . represents the length of the contour. The calculation method of and in equation (14) is defined as

The in equation (16) represents the Gaussian kernel function; the Heaviside function and the Dirac function are also used. Solving the minimization problem with the Euler–Lagrange equation yields the level set function evolution equation:

With the cosaliency mask as the reference, our contour is closer to the airport target, and the fragmented areas in the image set are successfully removed.
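To make the roles of the Gaussian kernel, the Heaviside function, and the Dirac function concrete, the following sketch computes Gaussian-weighted local means inside and outside the contour and applies one simplified evolution step. It rests on several assumptions that are not taken from the paper: the arctangent-smoothed Heaviside and Dirac functions, the sign convention (level set positive inside the contour), a pointwise data term, and the omission of the length and regularization terms. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D normalized Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing with edge padding (output keeps the input shape)."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def heaviside(phi, eps=1.0):
    """Smoothed Heaviside function."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def dirac(phi, eps=1.0):
    """Derivative of the smoothed Heaviside function."""
    return (eps / np.pi) / (eps ** 2 + phi ** 2)

def local_means(image, phi, sigma=3.0):
    """Gaussian-weighted mean intensity inside and outside the contour."""
    h = heaviside(phi)
    f_in = smooth(h * image, sigma) / (smooth(h, sigma) + 1e-8)
    f_out = smooth((1.0 - h) * image, sigma) / (smooth(1.0 - h, sigma) + 1e-8)
    return f_in, f_out

def evolve_step(image, phi, sigma=3.0, dt=0.1):
    """One simplified gradient step using only the data term."""
    f_in, f_out = local_means(image, phi, sigma)
    # Positive force where the pixel resembles the local inside mean more.
    force = (image - f_out) ** 2 - (image - f_in) ** 2
    return phi + dt * dirac(phi) * force
```

In the setting of this section, `phi` would be initialized from the cosaliency mask (for example, +1 inside the mask and -1 outside), so the contour starts close to the airport.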

Optimization: to reduce memory consumption and speed up training, this article proposes the following procedure: all images are passed through the network for 30 cycles to train networks , , and with the functions of equations (7), (9), and (12). After that, the resulting cosaliency detection map is very stable.

4. Experiment

The network in this article is deployed on an NVIDIA GeForce GTX Titan 1080 GPU. The initial learning rate is set to , the momentum is 0.9, and the weight decay is 0.005. To verify the detection performance of the model, we use 323 remote sensing images collected from Google Earth; each image has a resolution of 896 × 896 and contains an airport target.

The data set used in this paper comes from Google Earth (Data SIO, NOAA, U.S. Navy, NGA, GEBCO, Landsat/Copernicus, IBCAO, and U.S. Geological Survey). We collected 323 remote sensing images containing airports. Because the data set is small, this paper applies data augmentation: random rotation, blurring, illumination changes, and noise. In this way, we obtain 10,000 images with a size of 896 × 896, of which 9,000 are used as the training set and the remaining 1,000 as the test set.
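The four augmentation types listed above can be sketched with NumPy alone. The parameter ranges, the restriction of rotation to multiples of 90 degrees, the 3 × 3 box blur, and the function names are all assumptions chosen for illustration; the paper does not state its augmentation settings.

```python
import numpy as np

def box_blur(image):
    """3 x 3 mean filter with edge padding; works for (H, W) and (H, W, C)."""
    pad_width = [(1, 1), (1, 1)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="edge")
    h, w = image.shape[:2]
    acc = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return acc / 9.0

def augment(image, rng):
    """Apply rotation, optional blur, illumination scaling, and noise.

    image: float array in [0, 1]; rng: numpy.random.Generator.
    """
    # Rotation, simplified here to a random multiple of 90 degrees.
    out = np.rot90(image, k=int(rng.integers(0, 4)), axes=(0, 1))
    # Blurring, applied with probability 0.5 (assumed).
    if rng.random() < 0.5:
        out = box_blur(out)
    # Illumination change as a global gain (assumed range).
    out = out * rng.uniform(0.7, 1.3)
    # Additive Gaussian noise (assumed standard deviation).
    out = out + rng.normal(0.0, 0.02, size=out.shape)
    return np.clip(out, 0.0, 1.0)
```

Calling `augment` about 31 times per source image with different random draws would yield roughly the 10,000 images mentioned above from the 323 originals.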

4.1. Experimental Results

Shadow removal has important research significance. Researchers at home and abroad have been extensively studying this issue, especially in the past decade, but a database specifically constructed for this problem is very rare. Currently, the most widely used databases are presented by [17, 20], but they only contain 76 and 37 pairs of shadow/shadow-free image pairs. In order to better evaluate the shadow removal algorithm proposed in this paper, we constructed a new RSDB database, which contained 2685 shadow and shadow-free image pairs. The comparison with other shadow removal databases in recent years is shown in Table 2.

Figure 9 shows the contour extraction based on the cosaliency map and the detection results obtained by the proposed method. The experimental results show that the algorithm can accurately locate the airport target and has strong applicability. It obtains detection results across different complex scenarios and different types of airports. For interference items that affect airport target detection, such as urban areas, roads, and coastlines, our method fuses superpixel segmentation with cosaliency map information to effectively eliminate their interference. Small airports have a simple structure and often have farmland in the background; if there are many line segments parallel to a road, they bring great interference to the detection.

This paper eliminates airport target through local contrast. Compared with small airports, large international airports have more runways and larger areas, and the linear characteristics and saliency of their runways are more prominent, so their detection accuracy is higher. In addition, the results contain some false alarms and missed detections. The main reason is that some airports have simple structures and weak straight-line runway characteristics, so they cannot be effectively located and detected from their runways. Moreover, some urban areas are highly similar to airports, and their interference cannot be completely eliminated, so certain errors remain.

4.2. Subjective Comparison

Compared with the airport detection algorithm proposed by Liu et al. [54], the cosaliency detection method proposed in this paper is more effective. The visualization results are shown in Figure 10. Compared with the baseline model without edge constraints, the fused feature information in our method plays a great role in target detection. Line segment detection and superpixel segmentation improve the details of the target edge to a certain extent. Adding a contour extraction module that incorporates superpixel segmentation after the original cosaliency detection method locates and extracts the complete target more accurately, which shows that the edge described by our method is more accurate.

Although Liu's method can frame the area where the airport is located, the framed range is too large to locate the airport precisely, whereas the method in this article accurately frames the area where the airport target is located. Our frame selection is therefore better than that of Liu et al. in both aspects.

4.3. Objective Comparison

We use five indicators to evaluate the detection accuracy. They are as follows: precision, recall, mAP, IoU, and MAE.

A predicted salient target is used for evaluation only when its IoU with the real target is greater than 0.5 and its confidence is greater than 0.8; that is, the IoU threshold used in this paper is 0.5, and the confidence threshold is 0.8.

4.3.1. Calculation Formula of Precision

where indicates a true positive case, indicates a false positive case, and indicates a detected target. The precision reflects the proportion of the detection results that are correct.

4.3.2. Recall Rate Formula

In equation (19), represents a false negative example, represents the target actually contained in the image, and the recall rate reflects the proportion of the target that the detection algorithm can detect.
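As a small worked example, precision and recall can be computed from detection counts using the standard definitions (true positives over all detections, and true positives over all real targets). The function names and the example counts are illustrative and are not results from this paper; the thresholds are the ones stated in Section 4.3.

```python
def is_true_positive(iou, confidence, iou_thr=0.5, conf_thr=0.8):
    """A detection counts as correct if it passes both thresholds."""
    return iou > iou_thr and confidence > conf_thr

def precision_recall(tp, fp, fn):
    """Standard precision and recall from true/false positive and false negative counts."""
    detected = tp + fp   # all detected targets
    actual = tp + fn     # all targets actually contained in the images
    precision = tp / detected if detected else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

For instance, 8 correct detections, 2 false alarms, and 2 missed airports give a precision of 0.8 and a recall of 0.8.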

4.3.3. Average Detection Accuracy

The detection results on the test set are counted, where a prediction is counted only if its IoU with the real target is greater than 0.5 and its confidence is greater than 0.8, thereby obtaining mAP:

In equation (20), and , respectively, represent the union area and the intersection area of the detected target and the ground-truth region. The IoU indicator reflects the positioning accuracy of the detection frame.
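The IoU of two detection frames can be illustrated as follows, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates. The box format and function names are assumptions; the paper does not specify how its frames are represented.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def box_iou(a, b):
    """Intersection area divided by union area of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0
```

Two 2 × 2 boxes that overlap in a 1 × 1 square have an IoU of 1/7, which is well below the 0.5 threshold used for evaluation.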

4.3.4. Mean Absolute Error

If the extracted area contains more than 50% of the absolute true value, we consider it a successful detection. MAE is defined as the average pixel difference between the measurement result and the real situation:

where represents the width of the image, represents the height of the image, represents the pixel value of the prediction target, and is the true value corresponding to the image.
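The MAE described above, the absolute pixel difference averaged over the image width and height, can be computed as in this short sketch. The function name and the assumption that both maps are scaled to [0, 1] are choices made for illustration.

```python
import numpy as np

def mean_absolute_error(pred, truth):
    """Average absolute per-pixel difference between prediction and ground truth."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    # Equivalent to summing |pred - truth| and dividing by width * height.
    return float(np.abs(pred - truth).mean())
```

A lower value means the predicted map is closer to the ground-truth map.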

We also compare the model with four target detection methods: the regression-based saliency model used by Zhao et al. to detect airports [55], the YOLOv3 algorithm of Redmon et al. [56], a saliency-oriented active contour method [57], and a target detection method based on multidimensional feature fusion [58]. Table 2 shows the test data. As shown in Table 2, the algorithm proposed in this paper is slightly slower than the regression-based method and the YOLOv3 algorithm. However, the MAE results show that our results are closer to the ground-truth map than those of the four models. The correct detection ratio of our method is 15% higher than that of the saliency model-based detection, 13% higher than that of the YOLOv3 algorithm, and 1% higher than that of the contour extraction method for detecting airports. The proportion of airport targets detected is also the largest. Overall, the detection accuracy is higher.

4.4. Ablation Experiment

To prove the effectiveness of the components of the proposed method, ablation experiments are carried out. The model is mainly composed of two parts. The first part uses the PLSD algorithm to detect line segments in remote sensing images. This algorithm removes particularly short line segments and performs pixel completion on suspected airport runways, which to a large extent makes airport detection more accurate. The second part is the constructed cosaliency detection model, which fuses the saliency features within each image and the consistency between images to extract the common salient target in the remote sensing images.
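The short-segment removal and the search for parallel segments can be illustrated with the sketch below. It is not the PLSD algorithm itself: the segment format `(x1, y1, x2, y2)`, the length and angle thresholds, and the function names are assumptions, and the pixel completion step is not covered.

```python
import math

def segment_length(seg):
    """Euclidean length of a segment (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg
    return math.hypot(x2 - x1, y2 - y1)

def segment_angle(seg):
    """Undirected orientation of a segment in degrees, in [0, 180)."""
    x1, y1, x2, y2 = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def filter_short(segments, min_len):
    """Drop segments shorter than min_len."""
    return [s for s in segments if segment_length(s) >= min_len]

def parallel_pairs(segments, angle_tol=5.0):
    """Return index pairs whose orientations differ by at most angle_tol degrees."""
    angles = [segment_angle(s) for s in segments]
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            diff = abs(angles[i] - angles[j])
            diff = min(diff, 180.0 - diff)  # 179 and 1 degrees are nearly parallel
            if diff <= angle_tol:
                pairs.append((i, j))
    return pairs
```

Long, nearly parallel pairs that survive this filtering are the kind of runway candidates that the first component passes on for fusion with the cosaliency mask.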

Figure 11 clearly shows that when there is a large difference between the target and the background, the cosaliency detection model alone still performs well. However, for the image with a complicated background in the third row, the target cannot be accurately located without the PLSD algorithm. Therefore, the PLSD algorithm is of great significance.

Figure 12 clearly shows that when only the PLSD algorithm is used, the complete airport target cannot be detected. It must be used together with the cosaliency detection to position the airport target accurately.

Therefore, every component of the model proposed in this article is necessary.

5. Conclusions

This article focuses on airport target detection in remote sensing images. A contour extraction model based on cosaliency detection is proposed to estimate the location of the airport. Our model multiplies the intraimage and interimage saliency maps pixel by pixel to obtain a cosaliency mask. The mask, superpixel segmentation, and PLSD detection play auxiliary roles in refining the airport contour, so that a more accurate airport contour is obtained. We also compared the model with four commonly used methods. The experimental results show that the model not only extracts contours that are highly consistent with the actual airport target but also achieves high detection accuracy and efficiency.


Abbreviations

CNN: Convolutional neural network
CSACEM: Cosaliency active contour extraction model
EDLD: Edge drawing line detector
FCN: Fully convolutional network
HLD: Hough line detector
LRCV: Local region-based Chan–Vese model
LSD: Linear segmentation detector
PLSD: Parallel linear segmentation detector
SAE: Stacked autoencoder
SAEF: Stacked autoencoder-enabled fusion
SIFT: Scale invariant feature transform
SLIC: Simple linear iterative clustering
SVM: Support vector machine
VGG-19: Visual Geometry Group-19

Data Availability

The data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 12 months after publication of this article, will be considered by the corresponding author.


The funding sponsors had no role in the design of the study; in the collection of image; in the construction, training, and optimization of the model; in the writing of the manuscript, and in the decision to publish the results.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Z.H., Z.Z.B., and J.J.L. conceived and designed the review article and analyzed the literature and image data. Z.Z.B. drafted the manuscript, and Z.H. and J.J.L. performed critical review of the manuscript. J.J.L. contributed to the idea of building model. Z.Z.B. performed experimental verification. All the authors have agreed to have the paper submitted for publication and have contributed substantially to the work reported.


Acknowledgments

The authors acknowledge the support of the National Natural Science Foundation of China (grant nos. 61772319, 62002200, 61976125, 61976124, and 61773244) and Shandong Natural Science Foundation of China (grant no. ZR2017MF049).