Abstract

Edge detection is a boundary-based segmentation method for extracting important information from an image, and it is a research hotspot in computer vision and image analysis. As a form of feature extraction, it also underlies image segmentation, target detection, and recognition. In recent years, to address insufficiently refined edges and low detection accuracy, researchers have proposed multiscale fusion wavelet, spectral clustering, network reconstruction, and other edge detection algorithms based on deep learning. To help researchers understand the current state of edge detection research, this paper first introduces the classic traditional edge detection algorithms and compares their advantages and disadvantages. It then summarizes the main deep learning-based edge detection methods of recent years and classifies and compares them by implementation technique. Finally, it outlines directions for future research on edge detection algorithms.

1. Introduction

An image edge is a step change in the gray level of neighboring pixels; it is the most basic feature of an image [1] and often carries its most important information. Edge detection is a boundary-based segmentation method that plays an important role in computer vision, image analysis, and other applications and provides valuable feature parameters for describing or recognizing targets and interpreting images. Many existing studies have shown that edge detection is of great significance in fields such as high-level image feature extraction, feature description, target recognition, and image segmentation. How to locate and extract image edge information quickly and accurately therefore remains an open problem in image processing and a research hotspot at home and abroad [2].

To address the two main problems of edge localization and edge extraction, researchers have put forward a variety of edge detection methods through continued experimentation. In chronological order of research progress, the methods fall into two categories: classic traditional algorithms and algorithms based on deep learning. Figure 1 lists some classic traditional algorithms (above the arrow) and deep learning-based algorithms (below the arrow) from research on image edge detection. In recent years, with the continued rise of neural networks and image fusion technology, edge detection algorithms based on image fusion have also been proposed [3].

Because image edges carry a great deal of background and structural information, traditional edge detection methods often use hand-crafted low-level features (such as color, brightness, texture, and gradient) as priors for edge detection. Examples include (1) traditional edge detection operators, such as the gradient operators Roberts [4], Prewitt [5], and Sobel [6] and the widely used Canny operator [7]; (2) hand-designed feature generation methods based on information theory, such as the gPb-owt-ucm algorithm [8] and the SCG (Sparse Code Gradients) algorithm [9]; and (3) structured edge detection algorithms such as the SE (structured forests edge detection) algorithm [10]. Although edge detection methods using low-level features have made great progress, their limitations are also obvious. With the development of deep learning technology [11], and especially the emergence of convolutional neural networks (CNNs), which can automatically learn high-level representations of natural images, using CNNs for edge detection has become a new trend. In 2015, Xie and Tu [12] proposed holistically nested edge detection (HED) to detect and extract the edges of natural images in a nested manner; in 2015, Ou et al. [13] proposed applying full convolution to semantic segmentation, which laid the foundation for the use of full convolution in contour detection; in 2016, Ma et al. [4] proposed the end-to-end convolutional architecture DeepEdge; in 2016, Lu et al. [14] proposed a deep conditional random field stereo matching method based on convolutional neural networks; in 2017, Zhang et al. [15] proposed an edge detection technique for multiscale moving targets; in 2017, Zhong et al. [16] proposed using VLAD (vector of locally aggregated descriptors) and deep learning-based descriptors for efficient retrieval of regions of interest; in 2018, Liu et al. [17] proposed a richer convolutional feature edge detection algorithm based on five-layer feature diversity; in 2018, Han et al. [18] proposed an end-to-end edge-preserving neural network (called regional network) based on the Fast R-CNN (Region-CNN) framework for salient target detection; in 2019, Zhao et al. [19] proposed a subdivided network for salient target detection; and in 2022, Zheng proposed an anisotropic multiscale edge detection algorithm.

The above algorithms require considerable professional knowledge, sophisticated processing algorithms, and network architecture design to convert the original image data into appropriate feature vectors to construct edge detection models and classifiers.

2. Overview of Classic Algorithms for Edge Detection

2.1. First-Order and Second-Order Edge Detection Operators

First-order and second-order edge detection operators identify and locate abrupt changes in the image, which usually occur between target and background, between targets, between regions, and between primitives. By principle, they fall into two categories. First-order operators take the first derivative of the gray-level profile, which highlights the edges of objects in the image. Second-order operators take the second derivative of the gray-level profile; they are very sensitive to strong gray-level changes and highlight the texture structure of the image (Table 1).
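The difference between the two operator families can be seen on a tiny synthetic image. The following is a minimal sketch in plain Python (no third-party libraries), assuming the common 3 × 3 Sobel kernels as the first-order operator and the 4-neighbor Laplacian as the second-order operator; the test image and the helper names are illustrative and not taken from Table 1.

```python
# First-order (Sobel) and second-order (Laplacian) responses on a step edge.
# Kernels are the usual textbook 3x3 masks; helper names are illustrative.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over the interior pixels; border pixels stay 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            )
    return out


def gradient_magnitude(image):
    """Combine horizontal and vertical Sobel responses into one magnitude."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[(a * a + b * b) ** 0.5 for a, b in zip(ra, rb)]
            for ra, rb in zip(gx, gy)]


# Vertical step edge: dark left half (0), bright right half (10).
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]

grad = gradient_magnitude(step)         # peaks on both sides of the step
lap = correlate3x3(step, LAPLACIAN)     # changes sign across the step

print(grad[2])  # [0.0, 0.0, 40.0, 40.0, 0.0, 0.0]
print(lap[2])   # [0, 0, 10, -10, 0, 0]
```

The first-order response is a peak at the edge, while the second-order response crosses zero there, which is why second-order operators are usually paired with zero-crossing detection.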

Among them, the Canny algorithm is the most representative and is used across many edge detection applications. It is comparatively robust to noise and can detect true weak edges: strong and weak edges are detected separately with two different thresholds, and a weak edge is included in the output image only when it is connected to a strong edge.
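The double-threshold rule can be sketched as follows. This is only the final hysteresis stage, assuming a gradient-magnitude map has already been computed and thinned; the smoothing, gradient, and non-maximum suppression stages of Canny are omitted, and the function name, the 8-connectivity, and the threshold values are illustrative choices.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) that are
    8-connected to a strong pixel, directly or through other weak pixels."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the search with every strong pixel.
    for r in range(h):
        for c in range(w):
            if magnitude[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))

    # Grow outward from strong pixels into neighbouring weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w
                        and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges


# One strong pixel (9), two weak pixels attached to it, one isolated weak pixel.
magnitude = [
    [0, 0, 0, 0, 0, 0],
    [9, 5, 5, 0, 0, 5],
    [0, 0, 0, 0, 0, 0],
]
result = hysteresis(magnitude, low=4, high=8)
print(result[1])  # [True, True, True, False, False, False]
```

The isolated weak pixel at the right is discarded even though it exceeds the low threshold, which is how the rule suppresses noise without losing weak segments of real edges.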

2.2. Traditional Edge Detection

Edge detection is part of image segmentation, whose main purpose is to identify regions in the image. Technically speaking, edge detection is the process of locating edge pixels, while edge enhancement is the process of increasing the contrast between the edge and the background so that the edge can be seen more clearly. In addition, edge tracing is the process of following an edge, usually collecting the edge pixels into a list. There are many possible definitions of an edge, each applicable to specific situations. One of the most commonly used and most general definitions is the ideal step edge.

Edge detection characterizes the grayscale changes in an image in terms of the physical processes that cause them. The physical process that causes a grayscale discontinuity may be geometric (depth discontinuities, surface orientation, and color and texture differences) or optical (surface reflections, shadows cast by nontarget objects, internal reflections, etc.). The mixing of these causes makes subsequent extraction very difficult, and in practice the image data are often contaminated by noise. In addition, numerical differentiation of a signal is an ill-conditioned problem: a small change in the input signal can cause a large change in the output. Noise elimination and edge localization are therefore conflicting goals, and they constitute the two main difficulties in edge detection.
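The ill-conditioning of numerical differentiation can be checked numerically. The sketch below assumes a smooth signal sin(x) contaminated by a small high-frequency perturbation; the amplitude 0.01, the frequency 1000, and the step size are illustrative values chosen only to make the effect visible.

```python
import math

EPS = 0.01       # amplitude of the perturbation (illustrative)
OMEGA = 1000.0   # angular frequency of the perturbation (illustrative)


def clean(x):
    return math.sin(x)


def noisy(x):
    # Same signal plus a perturbation that is at most EPS in size.
    return math.sin(x) + EPS * math.sin(OMEGA * x)


def central_difference(f, x, h=1e-4):
    """Standard second-order finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


xs = [i * 0.001 for i in range(1000)]

# Largest difference between the two signals.
signal_gap = max(abs(noisy(x) - clean(x)) for x in xs)

# Largest difference between their numerical derivatives.
derivative_gap = max(
    abs(central_difference(noisy, x) - central_difference(clean, x))
    for x in xs
)

print(signal_gap, derivative_gap)
```

The two signals never differ by more than about 0.01, yet their derivatives differ by nearly 10, roughly EPS × OMEGA. This is the reason most edge detectors smooth the image before differentiating, at the cost of less precise localization.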

2.3. Structured Edge Detection Algorithm

In 2017, Wang et al. [25] proposed an image crack detection method based on multiscale down-sampled normalized cuts to address the low accuracy of multiscale normalized cuts in edge detection and the long time required to solve for the eigenvectors. The method first uses the semireconstruction property of the antisymmetric biorthogonal wavelet transform to extract edge features at multiple scales of the image under test. Second, it combines the strength and location characteristics at each scale to construct a multiscale similarity matrix and a multiscale normalized similarity matrix; the multiscale similarity matrix is then down-sampled, and spectral segmentation is used to solve for the down-sampled eigenvectors. Finally, the multiscale normalized similarity matrix is used to up-sample the down-sampled eigenvectors, which are discretized to obtain the final result. Experiments comparing the proposed method with multiscale normalized cuts on single-target images from three datasets show that it both improves detection accuracy and reduces computing time. Because existing deep learning methods cannot guarantee effective transmission and fusion of crack feature maps under complex backgrounds, and structured forests cannot accurately distinguish similar and random crack features, Wang et al. [26] proposed in 2020 a structural crack segmentation method based on a fully convolutional neural network and structured forests [27]. First, based on the fully convolutional neural network framework, five ablation neural networks are constructed to expand the global characteristics of microcracks; second, a crack segmentation parameter competition strategy based on multiscale structured forests is proposed to improve the ability to distinguish microcracks; finally, a coupled segmentation method based on the ablation neural networks and structured forests is used for joint prediction on crack images. Experimental verification on two types of structural crack datasets shows that the proposed method improves the accuracy of crack detection under complex and similar backgrounds and enables effective structural health monitoring.

3. Overview of Edge Detection Algorithms for Deep Learning

With the worldwide rise of artificial intelligence, deep learning, as one of its main means of realization, has also become a research hotspot. Traditional edge detection technology has made great progress, but it has many limitations [28]. As the scenes from which edges are extracted become more and more complex, the texture and background of the image strongly affect classic edge detection methods that rely on low-level edge features. The biggest difference between deep learning and traditional edge detection methods is that the features it uses are learned automatically from large amounts of data rather than designed by hand. Deep learning models have powerful learning capabilities and efficient feature representations. A more important advantage is that they extract information layer by layer, from pixel-level raw data up to abstract semantic concepts, which makes them outstanding at extracting global image features and contextual information and brings new ideas to traditional computer vision problems (such as image recognition and image edge detection). Many scholars at home and abroad have tried to bring the high-level semantics of deep learning to image edge detection in order to handle complex scenes. The classification and typical algorithms of deep learning-based edge detection are shown in Figure 2 and are introduced separately below. Deep learning edge detection algorithms are divided into fully supervised and weakly supervised learning edge detection algorithms.

3.1. Fully Supervised Learning Edge Detection Algorithm

Fully supervised learning uses samples of known categories (that is, labeled samples) to adjust the parameters of a classifier and train a model that achieves the required performance. The trained model then maps every input to a corresponding output, and a simple decision on the outputs yields the detected edges. At present, most edge detection algorithms are fully supervised. According to the overall design ideas and key technologies used in their implementation, this paper divides them into five categories: edge detection algorithms based on spectral clustering, multiscale fusion edge detection algorithms, network reconstruction edge detection algorithms, codec-based edge detection algorithms, and subpixel convolution edge detection algorithms. Methods based on spectral clustering and subpixel convolution achieve high detection accuracy but have poor noise resistance, whereas methods based on neural networks and codecs resolve the noise problem but fall short in detection accuracy [29].

3.1.1. Multiscale Fusion Wavelet Edge Detection Algorithm

To further improve edge detection performance in image processing, a fusion-based edge detection algorithm with good noise resistance has been proposed [30]; it combines the strengths of the wavelet transform and multiscale morphology [31]. The main idea is to preprocess the image, use multiscale morphological filtering and the wavelet transform to detect the image edges in both the horizontal and vertical directions, and then merge the two results by image fusion to obtain complete image edges. Experimental analysis and comparison show that the algorithm has good noise resistance [32], retains more detail, and is reasonably timely, which makes it suitable for edge detection on different types of images.

3.1.2. Edge Detection Algorithm Based on Spectral Clustering

The spectral clustering algorithm is based on spectral graph theory and clusters data by using the eigenvectors of the data's similarity matrix. Its advantages are that it is conceptually simple, easy to implement, and capable of identifying non-Gaussian distributions, which allows it to be used in edge detection algorithms [33].
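How eigenvectors of a similarity matrix separate data can be illustrated with a toy two-way split. This is a minimal sketch, not the method of any cited paper: it assumes a Gaussian similarity on scalar gray levels, uses the random-walk normalization of the similarity matrix, and finds the second eigenvector by power iteration so that no linear algebra library is needed. The function name, sigma, and the sample values are illustrative, and the input is assumed to contain at least two distinct values.

```python
import math


def spectral_bipartition(values, sigma=1.0, iterations=200):
    """Split scalar samples into two groups using the second eigenvector of
    the lazy random-walk matrix (I + D^-1 W) / 2, found by power iteration."""
    n = len(values)
    # Gaussian similarity matrix W and the degree of each node.
    w = [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
         for a in values]
    degree = [sum(row) for row in w]
    total = sum(degree)

    def remove_constant(vec):
        # Project out the trivial constant eigenvector (degree-weighted mean).
        mean = sum(d * x for d, x in zip(degree, vec)) / total
        return [x - mean for x in vec]

    v = remove_constant([float(i) for i in range(n)])  # deterministic start
    for _ in range(iterations):
        walked = [sum(w[i][j] * v[j] for j in range(n)) / degree[i]
                  for i in range(n)]
        # Lazy step keeps all eigenvalues non-negative, so the iteration
        # converges to the second-largest one.
        v = remove_constant([0.5 * a + 0.5 * b for a, b in zip(v, walked)])
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # The sign of each entry gives the cluster label.
    return [x > 0 for x in v]


# Gray levels from a dark region and a bright region.
gray_levels = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = spectral_bipartition(gray_levels)
print(labels)
```

The dark and bright samples receive opposite labels without any threshold on the gray level itself; in an edge detection setting the same idea is applied to pixel or patch features rather than to six scalars.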

In 2018, Jianmin and Jing [34] observed that distinguishing high-frequency noise points from edge points is one of the main difficulties in extracting edges from noisy images. To obtain clear edges from noisy images, they proposed an edge detection algorithm based on spectral curvature clustering (SCC) [35]. The method transforms edge detection into a classification problem and exploits the fact that edge points, smooth points, and noise points lie in different subspaces. While effectively clustering smooth points and edge points, SCC suppresses noise points. In addition, the algorithm edits the clustering labels and converts them into a binary image, on which simple processing yields the image edges [36], thereby avoiding the threshold selection problem of traditional algorithms. Comparative experiments against traditional edge detection methods demonstrate the effectiveness of the algorithm.

3.1.3. Edge Detection Algorithm Based on Codec

The encoder-decoder structure is a mechanism for semantic parsing of images that uses a symmetric network. In essence, an encoder built from the convolution and pooling operations of deep learning encodes the captured pixel position information and image features, and a decoder built from deconvolution or unpooling operations then analyzes the image and restores its spatial dimensions and pixel location information.
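The pooling and unpooling pair at the heart of this structure can be sketched without a deep learning framework. The example below assumes 2 × 2 max pooling with recorded argmax positions in the encoder and position-based unpooling in the decoder; the convolution and deconvolution layers are omitted, the image height and width are assumed to be even, and the function names are illustrative.

```python
def max_pool_2x2(image):
    """2x2 max pooling that also records where each maximum came from.
    Assumes the image height and width are even."""
    pooled, indices = [], []
    for r in range(0, len(image), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(image[0]), 2):
            window = [(image[r + i][c + j], (r + i, c + j))
                      for i in range(2) for j in range(2)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, shape):
    """Place each pooled value back at its recorded position; all other
    pixels are filled with 0."""
    h, w = shape
    out = [[0] * w for _ in range(h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


image = [
    [1, 2, 0, 0],
    [3, 4, 0, 9],
    [5, 0, 7, 0],
    [0, 0, 0, 8],
]
pooled, indices = max_pool_2x2(image)            # encoder side
restored = max_unpool_2x2(pooled, indices, (4, 4))  # decoder side

print(pooled)    # [[4, 9], [5, 8]]
print(restored)  # [[0, 0, 0, 0], [0, 4, 0, 9], [5, 0, 0, 0], [0, 0, 0, 8]]
```

The decoder recovers the original resolution and the exact locations of the strongest responses, which is what makes this design attractive for pixel-accurate tasks such as edge detection.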

The function of the encoder is to transform a variable-length input sequence into a fixed-length context variable that encodes the information in the sequence [37]. A commonly used encoder is the RNN [38] model, which reads the entire source sequence and converts it into a fixed-length code. The decoder is also an RNN model, which decodes the encoded input sequence to output the target sequence. The encoder is thus used to analyze the input sequence, and the decoder to generate the output sequence [39]. The spatial dimensions and pixel location information of the image are restored through operations such as convolution and pooling carried out by the two recurrent neural networks [40].

3.1.4. Network Reconstruction Edge Detection Algorithm

A CNN is a feedforward neural network whose artificial neurons respond to the units within a local receptive field. CNNs perform well on large images, and speed and accuracy are pursued by redesigning the network architecture, as in AlexNet [41], VGGNet, Inception, and ResNet [42].

In 2017, Wang et al. [43] noted that edge detection is a core step of image processing whose quality directly determines the quality of subsequent processing, yet quantitative, standardized methods for evaluating image edge detection have been lacking. They proposed an algorithm that reconstructs the source image from the edge image: it searches the source set in the horizontal and vertical directions, reconstructs new pixels by a mixture of linear interpolation and gradual interpolation, and uses the similarity between the reconstructed image and the source image to evaluate the edge detection result. The performance of the reconstruction algorithm was verified in a variety of experiments. The results show that the algorithm reconstructs the source image from the edge image quickly and effectively and provides an effective evaluation of edge detection. Its reconstruction and evaluation results agree with human visual perception, which gives it good application value for high-level and automatic image processing.

3.1.5. Subpixel Convolution Edge Detection Algorithm

A subpixel is a subdivision of the pixel, that is, a unit smaller than a pixel, and working at this level improves the effective image resolution. Subpixel edge points normally lie in areas of the image where the gray level transitions gradually, and polynomial fitting and similar methods can be used to obtain their subpixel positions. Subpixel positioning can be understood as using software algorithms to improve the accuracy of edge detection while the hardware of the camera system remains unchanged, or as an image processing technique that achieves a resolution finer than one pixel.
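Polynomial fitting for subpixel positioning can be illustrated in one dimension. The sketch below assumes a sampled gradient-magnitude profile across an edge and fits a parabola through the largest sample and its two neighbors; the vertex of that parabola is taken as the subpixel edge position. The function name and the synthetic profile are illustrative.

```python
def subpixel_peak(values):
    """Refine the index of the largest interior sample by fitting a parabola
    through it and its two neighbours; returns a fractional position."""
    k = max(range(1, len(values) - 1), key=lambda i: values[i])
    left, centre, right = values[k - 1], values[k], values[k + 1]
    curvature = left - 2 * centre + right
    if curvature == 0:
        return float(k)  # flat neighbourhood: no refinement possible
    # Vertex of the parabola through (-1, left), (0, centre), (1, right).
    return k + 0.5 * (left - right) / curvature


# Gradient profile sampled at integer pixels; its true peak lies at x = 3.3.
profile = [12 - (x - 3.3) ** 2 for x in range(7)]
position = subpixel_peak(profile)
print(round(position, 6))  # 3.3
```

The pixel-level maximum is at index 3, while the fit recovers the true peak at 3.3. The recovery is exact here only because the synthetic profile is itself a parabola; on real gradient profiles the fit is an approximation.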

In 2021, Liu and Zhu [44] adopted a subpixel edge detection algorithm based on improved Zernike moments to meet the high-precision positioning requirements for spot edges in beam quality detection. First, the light spot is roughly located with the Sobel operator, and the resulting pixel-level edge is then relocated with the Zernike moment edge model. Finally, the actual subpixel edge points in the image are determined according to the improved edge judgment conditions, completing the subpixel edge extraction of the light spot image. Analysis of the subpixel edge extraction results on simulated images shows that the improved Zernike moment algorithm has a maximum error of 0.338 pixels and a minimum error of 0.088 pixels and that its average running time is 319 ms, which is 35.294% lower than that of the traditional algorithm.

3.2. Weakly Supervised and Unsupervised Learning Edge Detection Algorithms

Weakly supervised target detection accomplishes the detection task using only the class labels of the samples. It is prone to a local optimum in which the detection result covers only a small part of the target object. Unsupervised learning means that the data learned by the model have no labels; its goal is to reveal the inherent characteristics and regularities of the data by learning from unlabeled samples, and its representative technique is clustering.

To address the low adaptability, growing parameter counts, heavy computation, and discontinuous detected edges of deep learning-based multiscale edge detection, Zhang and Ren [45] put forward a multiscale edge detection method based on improved holistically nested edge detection in 2020. The method combines multiscale detection with a weakly supervised model to reduce the number of parameters and the amount of computation [46]. To make full use of the powerful feature representation of convolution, a multiscale deep learning structure is built on holistically nested edge detection, namely, a multiscale structure composed of multiple mutually independent networks with different depths and outputs. In addition, a holistically nested weighted fusion layer is introduced, which combines all the weakly supervised predictions and learns the blending weights during training. Evaluation on the BSDS500 dataset shows that the proposed method achieves good performance.
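The role of a weighted fusion layer can be sketched as follows. This is a minimal illustration in the spirit of holistically nested edge detection, assuming each side network outputs a map of pre-sigmoid activations and the fusion layer forms a pixel-wise weighted sum followed by a sigmoid. In a trained network the blending weights are learned; here the weights, the activations, and the function names are fixed, hypothetical values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_side_outputs(side_outputs, weights):
    """Pixel-wise weighted sum of side-output activations, then a sigmoid,
    giving one fused edge-probability map."""
    h, w = len(side_outputs[0]), len(side_outputs[0][0])
    return [
        [sigmoid(sum(wk * side[r][c]
                     for wk, side in zip(weights, side_outputs)))
         for c in range(w)]
        for r in range(h)
    ]


# Three side outputs (pre-sigmoid activations) for a 1 x 3 image strip.
side_outputs = [
    [[-4.0, 0.0, 4.0]],
    [[-2.0, 1.0, 3.0]],
    [[-3.0, -1.0, 5.0]],
]
weights = [0.5, 0.25, 0.25]  # hypothetical blending weights

fused = fuse_side_outputs(side_outputs, weights)
print([round(p, 3) for p in fused[0]])  # [0.037, 0.5, 0.982]
```

Where the side outputs agree, the fused probability is confident; where they disagree, as at the middle pixel, the blending weights decide the outcome, which is why learning them during training matters.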

In 2016, Zhang and Zhao [47] proposed an unsupervised boundary detection algorithm based on orientation contrast, aimed at large-scale Internet image collections for which ground-truth boundaries are unavailable or costly to obtain; the orientation contrast is computed from differences along multiple directions. The model is particularly suitable for detecting the boundaries of objects surrounded by natural textures. Test results on the RuG standard database show that the proposed algorithm outperforms the best existing unsupervised boundary detection algorithm, which verifies the effectiveness of the model (Table 2).

4. Overview of Edge Detection Algorithms for Image Fusion

Multiscale morphological edge detection reduces the impact of image noise to a certain extent [58] and can improve the accuracy of edge detection performed by the wavelet transform [59]. The wavelet transform separates the low-frequency and high-frequency information of the image, provides a multiresolution representation of the image, fuses the multiazimuth information of sequential images, and completes the image preprocessing. By fusing multiscale morphological filtering with the wavelet transform, the algorithm combines the advantages of both methods to improve the resolution of target detection and suppress the detection noise of different sensors [60]. The specific method is as follows [61]: (1) add Gaussian white noise to the original image and use mathematical morphology to denoise it; (2) apply multiscale structuring-element morphology to detect the image edges; (3) detect the image edges with the wavelet transform: select sym4 as the wavelet function and transform the image, obtain from the transform coefficients the horizontal and vertical wavelet coefficients together with the modulus and argument of the dyadic wavelet transform, derive the edge points from the modulus and the argument, and finally link and select the edge points; and (4) fuse the two edge images: perform a 3-level sym4 wavelet decomposition of both images, average their low-frequency and high-frequency components, and apply the inverse wavelet transform, whose reconstruction is the final edge image.
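Steps (2) and (4) can be sketched in plain Python under two loudly stated simplifications. First, the wavelet edge map of step (3) is replaced by a simple central-difference edge map as a stand-in, because a sym4 dyadic wavelet transform is beyond a short dependency-free example. Second, because the wavelet transform is linear, averaging every coefficient band of two images and inverting the transform gives the same result as averaging the two images pixel by pixel, so the fusion is written as a pixel-wise average. The structuring-element radii and all function names are illustrative.

```python
def morph_gradient(image, radius):
    """Dilation minus erosion with a (2*radius+1)-square structuring element;
    windows are clipped at the image border."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(h, r + radius + 1))
                      for j in range(max(0, c - radius), min(w, c + radius + 1))]
            out[r][c] = max(window) - min(window)
    return out


def multiscale_morph_edges(image, radii=(1, 2)):
    """Average the morphological gradients obtained at several scales."""
    maps = [morph_gradient(image, k) for k in radii]
    return [[sum(m[r][c] for m in maps) / len(maps)
             for c in range(len(image[0]))]
            for r in range(len(image))]


def difference_edges(image):
    """Stand-in for the wavelet edge map: horizontal central difference."""
    w = len(image[0])
    return [[abs(row[min(c + 1, w - 1)] - row[max(c - 1, 0)]) / 2
             for c in range(w)]
            for row in image]


def fuse_by_average(edges_a, edges_b):
    """Average fusion; equals averaging all wavelet bands and inverting."""
    return [[(a + b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(edges_a, edges_b)]


step = [[0, 0, 0, 10, 10, 10] for _ in range(3)]
morph = multiscale_morph_edges(step)
diff = difference_edges(step)
fused = fuse_by_average(morph, diff)

print(morph[1])  # [0.0, 5.0, 10.0, 10.0, 5.0, 0.0]
print(fused[1])  # [0.0, 2.5, 7.5, 7.5, 2.5, 0.0]
```

The larger structuring element widens the morphological response, and the fused map keeps the strongest response at the true step while damping the wider flanks.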

5. The Key Technology of Edge Detection

5.1. The Main Model of Edge Detection

The basic idea of edge detection is to simplify image information and use edge lines to represent the information carried by the image. Edge detection mainly comprises four steps: filtering, enhancement, detection, and positioning. The fully convolutional neural network is widely used in image detection. It is a deep convolutional structure that accepts images of any format and size and obtains the features of multiple image regions in a single convolutional pass [62]. As shown in Figure 2, the fully convolutional neural network model has two parts: a fully convolutional network and a loss layer. The fully convolutional network represents the image features. The loss layer processes the resulting feature representation: its loss function computes the loss for the image, from which the model parameters are obtained and the fully convolutional neural network model is optimized.
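The four classical steps can be traced on a one-dimensional signal. The sketch below assumes a 3-tap moving average for filtering, a central difference for enhancement, a fixed threshold for detection, and a local-maximum test for positioning; each of these is one simple choice among many, and the function name and threshold are illustrative.

```python
def detect_edges_1d(signal, threshold):
    """Return the indices of edges in a 1-D signal using the four steps
    filtering, enhancement, detection, and positioning."""
    n = len(signal)

    # 1. Filtering: 3-tap moving average with replicated end samples.
    padded = [signal[0]] + list(signal) + [signal[-1]]
    smooth = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
              for i in range(n)]

    # 2. Enhancement: magnitude of the central difference.
    strength = ([0.0]
                + [abs(smooth[i + 1] - smooth[i - 1]) / 2
                   for i in range(1, n - 1)]
                + [0.0])

    # 3. Detection: keep samples whose edge strength reaches the threshold.
    candidates = [s >= threshold for s in strength]

    # 4. Positioning: keep candidates that are local maxima
    #    (a plateau is resolved to its right-most sample).
    return [i for i in range(1, n - 1)
            if candidates[i]
            and strength[i] >= strength[i - 1]
            and strength[i] > strength[i + 1]]


signal = [0, 0, 0, 0, 9, 9, 9, 9]
edges = detect_edges_1d(signal, threshold=2.0)
print(edges)  # [4]
```

Smoothing spreads the step over several samples, so the detection step alone would report two neighboring samples; the positioning step reduces them to a single edge location.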

CNNs have three key mechanisms, namely, local receptive fields, weight sharing, and pooling layers, which effectively reduce the number of network parameters and alleviate overfitting. Typical convolutional neural network architectures (Table 3) include LeNet5 [63], AlexNet [64], VGGNet [65], Google InceptionNet [66], ResNet [67], and DenseNet [68].

The convolutional neural network structure most widely used for edge detection algorithms is VGGNet, an improvement on AlexNet made by the Visual Geometry Group of Oxford University. The entire network uses the same 3 × 3 convolution kernels and 2 × 2 max pooling. The network structure is simple and has fewer parameters, and the 3 × 3 convolution kernel preserves image features better.
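The parameter saving from small kernels can be checked with simple arithmetic. The sketch below compares two stacked 3 × 3 convolutions with one 5 × 5 convolution covering the same receptive field; the channel count of 64 is an illustrative assumption, and the helper names are not part of any library.

```python
def conv_params(kernel, channels_in, channels_out, bias=True):
    """Number of learnable parameters in one square convolution layer."""
    weights = kernel * kernel * channels_in * channels_out
    return weights + (channels_out if bias else 0)


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


C = 64  # illustrative channel count

two_3x3 = 2 * conv_params(3, C, C)   # two stacked 3x3 layers
one_5x5 = conv_params(5, C, C)       # one 5x5 layer

print(stacked_receptive_field(3, 2))  # 5
print(two_3x3, one_5x5)               # 73856 102464
```

Two 3 × 3 layers see the same 5 × 5 neighborhood as a single 5 × 5 layer but need about 28% fewer parameters, and they add an extra nonlinearity between the layers.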

5.2. Dataset

At present, the most commonly used edge detection dataset is the Berkeley segmentation dataset (BSDS500), provided by the computer vision group of the University of California, Berkeley, which can be used for image segmentation and object edge detection. The dataset contains 200 training images, 100 validation images, and 200 test images, and all ground-truth annotations, including segmentations and boundaries, are stored in .mat files. Each image has five ground-truth annotations, marked by five different people. During training, the annotations can be averaged or used to expand the data. The evaluation code compares against the five annotations in turn. Many recent edge detection algorithms, such as BDCN [69], CRF [70], HED [71], and VCF [72], augment the BSDS500 training and validation sets with rotation, flipping, and scaling to improve detection accuracy.
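Averaging the annotations and augmenting the data can be sketched on toy arrays. The example assumes the five boundary maps have already been loaded from the .mat files into nested lists of 0 and 1 (the loading step is omitted) and shows only flipping and 90-degree rotation; scaling, the maps themselves, and the function names are illustrative.

```python
def average_annotations(annotations):
    """Pixel-wise mean of several binary boundary maps, giving a soft
    edge-probability map."""
    n = len(annotations)
    h, w = len(annotations[0]), len(annotations[0][0])
    return [[sum(a[r][c] for a in annotations) / n for c in range(w)]
            for r in range(h)]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Five annotators marking boundaries on the same 2 x 3 image.
annotations = [
    [[0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
]

soft = average_annotations(annotations)
print(soft)                      # [[0.2, 0.8, 0.0], [0.0, 0.8, 0.2]]
print(flip_horizontal(soft)[0])  # [0.0, 0.8, 0.2]
print(rotate_90(annotations[0])) # [[0, 0], [1, 1], [0, 0]]
```

Any geometric transform applied to a training image has to be applied identically to its boundary map so that pixels and labels stay aligned.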

In addition to the classic and deep learning image edge detection algorithms described above, there are also simulated annealing algorithms, ant colony algorithms, genetic algorithms, wavelet transform algorithms, and mathematical morphology algorithms. These have been studied extensively for image edge detection and are not covered further here [73].

6. Conclusions

Edge detection remains a very challenging technical problem for the following reasons [74]. First, regarding weakly supervised and unsupervised edge detection, the training of deep learning-based edge detectors usually relies on a large number of well-annotated images, and the annotation process is time-consuming, expensive, and inefficient. Weakly supervised detection trains the detector with only image-level annotations or partial bounding box annotations, which is of great significance for reducing labor costs and improving detection flexibility [75]. Second, regarding edge detection of small targets, detecting small targets in large scenes has always been a challenge. Potential applications of this research direction include using remote sensing images to count wild animal populations and to monitor the status of important military targets. Third, regarding dynamic video edge detection, real-time target tracking and edge detection in high-definition video is of great significance for video surveillance and autonomous driving [76]. General edge detectors are usually designed for still images and ignore the correlation between video frames. Using spatio-temporal correlation to improve detection is an important research direction [77].

With the rapid development of graphics processing and automated detection, improved classic edge detection algorithms and deep learning edge detection algorithms will become research hotspots [78]. Especially in complex scenes, algorithms with accurate image edge positioning, short algorithm response time, and strong antinoise ability will be an important research direction in the future [79].

Data Availability

The simulation experiment data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Natural Science Fund Project of Gansu Province (No. 21JR7RA300), the Open Project of Gansu Provincial Research Center for Conservation of Dunhuang Cultural Heritage (No. GDW2021YB15), the Science Foundation of Shanxi Province of China (2021JM-344), and Shanxi Key Laboratory of Intelligent Processing for Big Energy Data (No. IPBED7).