To address the insufficient extraction of small retinal blood vessels, we propose a retinal blood vessel segmentation algorithm that combines supervised and unsupervised learning. The method uses a multiscale matched filter with vessel enhancement capability and a U-Net model with an encoder-decoder network structure. Three channels extract vessel features separately, and the segmentation results of the three channels are then merged. The algorithm has been verified and evaluated on the DRIVE, STARE, and CHASE_DB1 datasets. The experimental results show that it segments small blood vessels better than most other methods. Its sensitivity reaches 0.8745, 0.8903, and 0.8916 on the three datasets, respectively, which is nearly 0.1 higher than that of other existing methods.

1. Introduction

The human eyes consist of the following parts: cornea, pupil, iris, vitreous, and retina. Abnormalities in any of these tissue structures may cause vision defects or even blindness. Among them, the study of retinal structure and its blood vessels is significant [1]. The extraction of retinal blood vessels and the characterization of morphological properties, such as diameter, shape, distortion, and bifurcation, can be used to screen, evaluate, and treat different ocular abnormalities [2]. Evaluation of retinal vascular properties, such as changes in width, is used to analyze hypertension, while bifurcation points and tortuosity can help identify cardiovascular disease and diabetic retinopathy [3].

The retinal vessel extraction methods, including pattern recognition, are classified into five core classes [4]. The pattern recognition techniques are generally divided into two categories: supervised learning and unsupervised learning. Supervised learning methods are trained on images segmented manually by ophthalmologists. They require many training images and a longer training time than other methods, but they generalize well and can be applied to other images of the same type. In contrast, unsupervised learning methods, such as matched filtering, mathematical morphology operations, blood vessel tracking, and clustering, do not require image labels but analyze and process the existing data directly. Both types of methods have been applied and extended by many researchers in recent years.

1.1. Unsupervised Learning Methods

Literature [5] proposed a new kernel-based technique, namely, a Fréchet PDF-based matched filter, which achieves a better match between the vessel profile and the Fréchet template. Literature [6] improved the extraction of blood vessels by using a series of morphological operations to extract small blood vessels and then fusing them with the segmented image to supplement the small vessels. Compared with other algorithms, it can segment as many tiny blood vessels as possible. However, the steps of the algorithm are too complicated, and although the final segmentation captures the smallest blood vessels, these vessels are fragmented overall and are not well connected with thicker vessels. Literature [7] proposed a new matched filtering method, which applies contrast-limited adaptive histogram equalization and a Gaussian second-derivative-based matched filter in preprocessing and uses an entropy-based optimal threshold method for binarization. This algorithm effectively improves the sensitivity of segmentation, but like literature [6], it does not perform well in accuracy. Literature [8] proposed an automatic segmentation method for retinal blood vessels using a matched filter and fuzzy c-means clustering. The algorithm uses contrast-limited adaptive histogram equalization to enhance the contrast of the image. After Gabor and Frangi filters are used for noise and background removal, fuzzy c-means clustering extracts the initial vascular network, and an integrated level set method further refines the segmentation. The algorithm has good sensitivity and specificity, but its ability to segment small blood vessels is limited, and many segmentation details are missed. Literature [9] proposed a novel method to extract retinal blood vessels using local contrast normalization and a second-order detector. The proposed methodology achieves higher accuracy in vessel segmentation than existing techniques. Literature [10] proposed a novel matched filter approach with the Gumbel probability distribution function as its kernel; its higher accuracy is attributed to the better match provided by the Gumbel PDF-based kernel.

1.2. Supervised Learning Methods

Literature [11] proposed a method using deep convolutional neural networks and a hysteresis threshold method to detect the vessels accurately. The proposed method performs well and detects more tiny vessels. Literature [12] proposed a multilevel CNN model for automatic blood vessel segmentation in retinal fundus images. A novel max-resizing technique is proposed to improve the generalization of the training procedure for predicting blood vessels from retinal fundus images. Literature [13] proposed a new segment-level loss used together with the pixel-wise loss to balance the importance of thick vessels and thin vessels in the training process. Literature [14] proposed a cross-connected convolutional neural network (CcNet) to automatically segment retinal vessel trees. The cross connections between a primary path and a secondary path fuse the multilevel features. This method achieves relatively advanced performance, including competitive robustness and segmentation speed. Literature [15] proposed a method for retinal vessel segmentation using patch-based fully convolutional networks. Literature [16] applied dilated convolutions in a deep neural network to improve the segmentation of retinal blood vessels from fundus images. Literature [17] proposed a new improved algorithm based on the U-Net network model. The algorithm integrates the Inception-Res structure module and the Dense-Inception structure module into the U-Net structure. It greatly deepens the network without adding training parameters, segments retinal blood vessels well, and has strong generalization ability. Literature [18] proposed a new hybrid algorithm for retinal vessel segmentation on fundus images. The proposed algorithm applies a new directionally sensitive blood vessel enhancement before sending fundus images to U-Net. Literature [19] proposed a supervised method based on a pretrained fully convolutional network through transfer learning. This method simplifies the typical retinal vessel segmentation problem into regional semantic vessel element segmentation tasks. Generally, unsupervised methods are less complex than supervised methods but suffer from relatively lower accuracy [13].

To solve the problem of insufficient segmentation of small blood vessels in most papers, we have devised a new automatic segmentation framework for retinal vessels based on an improved U-Net and a multiscale matched filter. The contributions of this paper are summarized as follows:
(1) We propose an improved black hat algorithm to enhance the characteristics of blood vessels and reduce the interference of other tissues.
(2) We propose an algorithm combining a multiscale matched filter and a U-Net neural network. The improved U-Net convolutional neural network is combined with a multiscale matched filter to perform multichannel blood vessel segmentation on the retinal fundus image.
(3) We devise a new loss function to train the improved U-Net neural network so as to better handle pixel imbalance in the image.

The rest of this paper is organized as follows. Section 2 outlines the proposed method and datasets. The performance of the proposed method and the discussion are described in detail in Section 3. A conclusion is drawn in Section 4.

2. Materials and Methods

2.1. System Overview

The proposed algorithm consists of three steps: preprocessing the datasets, training U-Net in three channels, and postprocessing. Its main feature extraction framework is based on the improved U-Net model and uses three feature extraction channels. Channel 1 extracts features from the whole image, so some morphological operations are performed in its preprocessing part to reduce image artifacts and noise. In the remaining two channels, matched filters extract retinal vessels of different scales before the improved U-Net model extracts features. Finally, the OR-type operator fuses the outputs of the three channels into the final output image. Experimental results verify that the image processed by multichannel matched filtering is better than the unprocessed image. The overall flowchart is shown in Figure 1.
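The OR-type fusion of the three channel outputs can be illustrated with a minimal sketch. It assumes each channel produces a binary mask of the same size, stored as nested lists of 0/1 values; the function name `fuse_or` and the toy masks are hypothetical and are not taken from the paper.

```python
def fuse_or(masks):
    """Pixel-wise logical OR of equally sized binary masks (lists of rows of 0/1)."""
    height, width = len(masks[0]), len(masks[0][0])
    for mask in masks:
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must have the same shape")
    # A pixel is a vessel in the fused image if any channel marks it as a vessel.
    return [[1 if any(mask[y][x] for mask in masks) else 0 for x in range(width)]
            for y in range(height)]


# Toy outputs of the three channels (hypothetical data).
ch1 = [[1, 0, 0], [0, 0, 0]]
ch2 = [[0, 1, 0], [0, 0, 0]]
ch3 = [[0, 0, 0], [0, 0, 1]]
fused = fuse_or([ch1, ch2, ch3])
```

Because the OR operator keeps every pixel marked by any channel, it also accumulates the false positives of all channels, which is why a noise-removal step is needed afterwards.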

2.2. Datasets

To verify the effectiveness of the proposed algorithm, we choose three commonly used public datasets for training and testing: DRIVE, STARE, and CHASE_DB1. These datasets include a wide range of challenging images. DRIVE contains 40 color retinal fundus images divided into a training set and a testing set, with a resolution of 565 × 584 pixels. STARE contains 20 color retinal fundus images with a resolution of 700 × 605 pixels. Unlike DRIVE, this dataset is not divided into a training set and a testing set. CHASE_DB1 contains 28 color retinal fundus images with a resolution of 999 × 960 pixels, and it is also not divided into a training set and a testing set. Each image in these three datasets has a retinal blood vessel label segmented manually by two professional physicians. We randomly selected 5 images in the STARE dataset as test images (im0002, im0077, im0163, im0255, and im0291), and the remaining 15 images were used as the training set. In CHASE_DB1, we select the last 8 images as the test set and the remaining 20 images as the training set. Note that mask images of STARE and CHASE_DB1 are not available, so we extracted the green channel of the images and then used morphological and threshold algorithms to obtain the mask images.

2.3. Preprocessing

In this paper, the green channel is selected as the input image of the preprocessing part. This is because the retinal blood vessels presented by the green channel have better contrast with the background compared with the red channel and the blue channel [20, 21], as shown in Figure 2.

It can be seen from Figure 2 that blood vessels in the green channel of the color image carry more information than in the red and blue channels, but the overall image is still dark, and the contrast is not obvious. To improve this situation, contrast-limited adaptive histogram equalization (CLAHE) [22] and gamma transformation are applied to the extracted green channel grayscale image, as shown in Figure 3. In this part of the process, CLAHE is used to enhance the contrast between nonvessels and blood vessels, and gamma transformation is used to adjust and reduce the background noise in the image. Tables 1–3 in Supplementary Materials give a comprehensive comparison of blood vessel enhancement algorithms; these data show that the CLAHE method improves the general performance of the proposed method.
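As a minimal sketch of two of the steps described above, the following code extracts the green channel and applies a gamma transformation to an 8-bit image stored as nested lists. CLAHE is omitted here because it is considerably more involved. The gamma value of 0.5 and the function names are assumptions for illustration only; the paper does not state the gamma value in this section.

```python
def green_channel(rgb):
    """Return the green channel of an RGB image given as rows of (r, g, b) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb]


def gamma_transform(gray, gamma):
    """Apply s = 255 * (r / 255) ** gamma to an 8-bit grayscale image."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute the mapping once for all 256 possible intensities.
    lut = [round(255 * (value / 255) ** gamma) for value in range(256)]
    return [[lut[value] for value in row] for row in gray]


# Toy 2 x 2 RGB image (hypothetical data).
rgb = [[(10, 64, 5), (20, 128, 7)], [(30, 0, 9), (40, 255, 11)]]
green = green_channel(rgb)
adjusted = gamma_transform(green, 0.5)  # gamma < 1 brightens dark regions
```

A gamma below 1 lifts dark intensities, while a gamma above 1 suppresses them; the endpoints 0 and 255 are unchanged in both cases.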

2.4. Multichannel Feature Extraction
2.4.1. Channel 1

In order to retain as much of the blood vessel feature information of the image as possible, some morphological operations are used in channel 1 to remove background noise, and then, the U-Net network is used for feature extraction. For the artifacts caused by uneven illumination and nonvascular structures, we use the morphological closing operation to estimate the background and then compute the result using the mathematical operation shown in equation (1).

The terms of equation (1) are the processed image, the image after a morphological closing operation, the original image, and the image size in pixels. We select a disk-type structuring element with a radius of eleven pixels for the closing operator. It can be seen intuitively from Figure 4 that the brighter optic disc structure in the original image is removed, and most of the artifacts are also processed.

2.4.2. Channel 2

By analyzing the gray image of retinal blood vessels, it can be found that the cross-sectional gray intensity of blood vessels is distributed in an inverted Gaussian curve, the gray value of the center line of the blood vessel is low, and the gray value at the edge of the blood vessel is high [5]. Aiming at this remarkable feature of retinal blood vessel images, Chaudhuri et al. [23] designed a Gaussian matched filter and used its distribution to simulate the grayscale intensity distribution of blood vessel cross sections and filter the blood vessels in sections. In this paper, the matched filters are used in channel 2 and channel 3 to separately enhance and extract the large and small blood vessels to realize the comprehensive segmentation of retinal blood vessels.

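Define the two-dimensional Gaussian kernel function as

$$K(x, y) = -\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right), \quad |y| \le \frac{L}{2}, \tag{2}$$

where $\sigma$ is the width of the Gaussian kernel and $L$ is the length of the Gaussian kernel. The blood vessels start from the center of the optic disc and extend in multiple directions, so the Gaussian kernel is rotated to filter vessels in multiple directions.

Assuming that $p = [x, y]$ is a discrete point in the kernel function, the rotation matrix is

$$r_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix}, \tag{3}$$

where $\theta_i$ is the angle of the $i$-th kernel function, and the coordinate value of $p$ after rotation is $p_i = [u, v] = p\, r_i^{T}$; then, the $i$-th template kernel function is

$$K_i(x, y) = -\exp\left(-\frac{u^{2}}{2\sigma^{2}}\right), \quad \forall p_i \in N, \tag{4}$$

where $N$ is the template field, and the value range is

$$N = \left\{(u, v) : |u| \le 3\sigma,\ |v| \le \frac{L}{2}\right\}. \tag{5}$$

In actual algorithm applications, it is often necessary to consider the mean value of the correlation coefficient of the template filter, as shown in

$$m_i = \frac{1}{A} \sum_{p_i \in N} K_i(x, y). \tag{6}$$

Among them, $A$ represents the number of points in the template area. So, the final template kernel function is

$$K_i'(x, y) = K_i(x, y) - m_i, \quad \forall p_i \in N. \tag{7}$$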
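The kernel construction described above can be sketched as follows: rotate each grid point, evaluate the inverted Gaussian inside the template field, and subtract the mean so that the kernel sums to zero. This is a sketch of the classical Gaussian matched filter under stated assumptions, not the authors' implementation; the values `sigma=2.0`, `length=9`, and `size=17`, as well as the even spacing of directions over 180 degrees, are hypothetical choices for illustration.

```python
import math


def matched_filter_kernel(sigma, length, theta, size):
    """Zero-mean Gaussian matched-filter kernel rotated by theta (radians)."""
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = [[0.0] * size for _ in range(size)]
    support = []
    for row in range(size):
        for col in range(size):
            x, y = col - half, row - half
            # Coordinates of the grid point in the rotated frame.
            u = x * cos_t - y * sin_t
            v = x * sin_t + y * cos_t
            # Template field: |u| <= 3 * sigma and |v| <= length / 2.
            if abs(u) <= 3 * sigma and abs(v) <= length / 2:
                kernel[row][col] = -math.exp(-(u * u) / (2 * sigma * sigma))
                support.append((row, col))
    # Subtract the mean over the template field so flat regions give zero response.
    mean = sum(kernel[r][c] for r, c in support) / len(support)
    for r, c in support:
        kernel[r][c] -= mean
    return kernel


def kernel_bank(sigma, length, directions, size):
    """One kernel per direction, assumed equally spaced over 180 degrees."""
    return [matched_filter_kernel(sigma, length, i * math.pi / directions, size)
            for i in range(directions)]


bank = kernel_bank(sigma=2.0, length=9, directions=8, size=17)
```

In the usual matched-filter scheme, the image is convolved with every kernel in the bank and the maximum response over all directions is kept at each pixel.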

This paper improves and optimizes the dependence of the Gaussian matched filter response on the vessel diameter. The image enhancement result using large-scale Gaussian matched filtering in channel 2 is shown in Figure 5, where the kernel width and length are fixed and 8 directions are used for the rotation in equation (3). It can be seen from the image that the algorithm segments thicker blood vessels well and is robust to noise, but it segments small blood vessels poorly: the smaller vessels cannot be distinguished from the background, and the vessels are easily broken. To solve this problem, this paper proposes an improved method based on the black hat algorithm, which effectively reduces the influence of background noise by taking the difference between the original image before matched filtering and the image obtained after processing, thereby enhancing the characteristics of blood vessels. We performed a series of processing transformations, shown in equations (8) and (9), on the images processed by large-scale matched filtering. We call this algorithm black hat2. The terms of equations (8) and (9) are the morphological closing operation with a disk-type structuring element, the black hat transformation, the original image, and the final processed image.
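For orientation, the standard black hat transformation (closing minus the image) can be sketched as below. This is the textbook operator, not the black hat2 variant proposed in the paper, whose exact definition is not reproduced here. The border handling (out-of-image pixels are ignored) and the radius of 1 in the example are assumptions.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) covered by a disk-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def _morph(image, offsets, pick):
    """Apply pick (max for dilation, min for erosion) over each neighbourhood."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [image[y + dy][x + dx] for dy, dx in offsets
                      if 0 <= y + dy < height and 0 <= x + dx < width]
            row.append(pick(values))
        out.append(row)
    return out


def closing(image, radius):
    """Grayscale closing: dilation followed by erosion with the same disk."""
    offsets = disk_offsets(radius)
    return _morph(_morph(image, offsets, max), offsets, min)


def black_hat(image, radius):
    """Standard black hat: closing(image) - image, which highlights dark thin structures."""
    closed = closing(image, radius)
    return [[c - v for c, v in zip(closed_row, image_row)]
            for closed_row, image_row in zip(closed, image)]


# Bright background (200) with a dark one-pixel-wide vertical line (50).
image = [[200, 200, 50, 200, 200] for _ in range(5)]
result = black_hat(image, radius=1)
```

The closing fills in the dark line because it is narrower than the structuring element, so the difference is nonzero only along the line.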

2.4.3. Channel 3

This paper uses a small-scale Gaussian matched filter to enhance the image of small blood vessels, as shown in Figure 6. After many experiments, the kernel width and length of the matched filter are fixed, and 18 directions are used for the rotation in equation (3). Small-scale filters effectively enhance the small blood vessels in the image, but they also enhance a large amount of striped noise, and they enhance thick blood vessels with central reflection poorly. To reduce the background noise, the black hat2 algorithm used in channel 2 is also used to remove the background in channel 3.

2.5. U-Net Model

In image semantic segmentation using deep learning, the U-Net network model is the most widely used; it is an improvement on the classic fully convolutional network (FCN) [24]. U-Net is an image-to-image pixel-level classification network, and its network structure is clear, as shown in Figure 7. U-Net differs from other standard segmentation networks in its feature fusion method: it concatenates the features along the channel dimension. This fuses the deep features extracted from the image with the shallow features to form thicker features, whereas the fusion operation of FCN only uses pointwise addition and does not obtain thicker features.

Unlike the structure in the original literature [24], this paper sets the padding value to 1 in each layer's convolution operation, and the convolution kernel size is 3 × 3. The purpose is to keep the output and input image sizes consistent and to avoid a size-increasing operation in the output layer. The output layer of U-Net essentially performs a binary classification, for which we use an adaptive threshold segmentation algorithm. Rather than calculating one global image threshold, this algorithm calculates a local threshold for each area of the image, so it can adaptively calculate different thresholds for different areas and perform binary segmentation. The calculation involves a fixed parameter, the area, and the area's threshold.

This paper proposes a new loss function that combines the Dice coefficient with the two-class cross-entropy loss function. The Dice coefficient is widely used in the evaluation of image segmentation. To turn it into a loss function to be minimized, the Dice loss is defined as

$$L_{dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|},$$

where $|X \cap Y|$ represents the common elements of the prediction map and the label map, and $|X|$ and $|Y|$ represent the number of elements of the prediction map and the label, respectively. To facilitate the calculation, $|X \cap Y|$ is approximated as the dot product between the predicted probability map and the label, summed over the elements of the result, while $|X|$ and $|Y|$ are quantified by summing the squares of each element:

$$L_{dice} = 1 - \frac{2\sum_{i=1}^{N} p_{k,i}\, t_{k,i}}{\sum_{i=1}^{N} p_{k,i}^{2} + \sum_{i=1}^{N} t_{k,i}^{2}},$$

where $N$ is the number of pixels, and $p_{k,i}$ and $t_{k,i}$ are the predicted probability and true label of pixel $i$ belonging to category $k$.
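The approximation described above (dot product in the numerator, sums of squares in the denominator) can be sketched for a single category on flattened pixel lists. The small constant `eps`, added to avoid division by zero on empty inputs, is an assumption of this sketch and is not mentioned in the paper.

```python
def dice_loss(pred, label, eps=1e-7):
    """Soft Dice loss for one category on flattened probability and label lists."""
    # Numerator: dot product between predicted probabilities and labels.
    intersection = sum(p * t for p, t in zip(pred, label))
    # Denominator: sum of squared predictions plus sum of squared labels.
    denominator = sum(p * p for p in pred) + sum(t * t for t in label)
    return 1.0 - 2.0 * intersection / (denominator + eps)


perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])   # close to 0
disjoint = dice_loss([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0])  # equal to 1
```

Because the loss depends on the overlap ratio rather than on per-pixel counts, it is less dominated by the large number of background pixels than a plain pixel-wise loss.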

The cross-entropy loss function used to optimize the network is defined in terms of the following quantities: TP and TN, the numbers of true positive and true negative pixels, respectively; the numbers of segmented pixels and nonsegmented pixels; the label value, which distinguishes the segmentation target from the background; and the predicted probability value of the pixel.

A coefficient is introduced to define the new loss function, which weights the Dice loss against the cross-entropy loss.

Notably, the coefficient is set to 0.5 in this work, and the flow of U-Net training is summarized in Algorithm 1.
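One plausible way to combine the two terms is a convex combination, sketched below. Both the combination form `lam * dice + (1 - lam) * bce` and the use of a plain (unweighted) binary cross-entropy are assumptions of this sketch; the paper's cross-entropy term additionally involves TP and TN counts, and its exact combined formula is not reproduced here. Only the coefficient value of 0.5 comes from the text.

```python
import math


def dice_loss(pred, label, eps=1e-7):
    """Soft Dice loss on flattened probability and label lists."""
    intersection = sum(p * t for p, t in zip(pred, label))
    denominator = sum(p * p for p in pred) + sum(t * t for t in label)
    return 1.0 - 2.0 * intersection / (denominator + eps)


def bce_loss(pred, label, eps=1e-7):
    """Plain binary cross-entropy averaged over pixels (assumed, unweighted form)."""
    total = 0.0
    for p, t in zip(pred, label):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def combined_loss(pred, label, lam=0.5):
    """Assumed convex combination of Dice loss and cross-entropy."""
    return lam * dice_loss(pred, label) + (1.0 - lam) * bce_loss(pred, label)


label = [1, 0, 1, 0]
good = combined_loss([0.9, 0.1, 0.8, 0.2], label)
bad = combined_loss([0.1, 0.9, 0.2, 0.8], label)
```

With `lam=0.5`, both terms contribute equally, so a prediction that agrees with the label yields a much smaller loss than one that contradicts it.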

Algorithm 1: Flow of U-Net training.
Input: training images, ground truth
Input: initial epochs 30, batch size 1, learning rate 0.01
Input: best loss initialized to Inf
Output: predicted images, U-Net parameters
preprocessing (...)
enhancement (...)
for ... = 0 to ... do
    if ... then
        ...
    else if ... then
        ...
    end if
    compute the number of training images
    initialize the parameters of U-Net
    while ... do
        ... (...)
        ...
        if ... then
            ...
        end if
        ...
    end while
end for
...
return predicted images, U-Net parameters

2.6. Postprocessing

In the postprocessing, since the final segmentation image merges the three segmentation images, the noise of all three images is superimposed in the result. This noise has a significant impact on the quality of the segmented image, so it is addressed in the final postprocessing step. A morphological algorithm is used to calculate the size of each connected area of the image. Using 8-connectivity, areas smaller than 25 pixels are eliminated; that is, their pixels are reclassified as background. A test image from the DRIVE dataset is selected for experimental comparison, and the comparison images are shown in Figure 8.
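The small-region removal described above can be sketched with a breadth-first search over 8-connected neighbours. The function name and the toy mask are hypothetical; the threshold of 25 pixels and the 8-connectivity follow the text.

```python
from collections import deque


def remove_small_components(mask, min_area=25):
    """Set 8-connected foreground regions smaller than min_area pixels to background."""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not seen[y][x]:
                # Collect one connected component with breadth-first search.
                component = [(y, x)]
                seen[y][x] = True
                queue = deque(component)
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in neighbours:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            component.append((ny, nx))
                            queue.append((ny, nx))
                # Reclassify small regions as background.
                if len(component) < min_area:
                    for cy, cx in component:
                        out[cy][cx] = 0
    return out


# 10 x 10 mask with a 6 x 6 block (36 pixels) and a 2-pixel diagonal speck.
mask = [[1 if (y < 6 and x < 6) else 0 for x in range(10)] for y in range(10)]
mask[8][8] = 1
mask[9][9] = 1
cleaned = remove_small_components(mask, min_area=25)
```

The two diagonal pixels form a single component under 8-connectivity, and since that component has fewer than 25 pixels, it is removed while the large block is kept.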

2.7. Experimental Design
2.7.1. U-Net Implementation Details

The U-Net model used in this paper is slightly different from the structure in literature [24]. In order to keep the input and output image sizes of the model consistent, the convolution structure is adjusted accordingly. The specific model structure parameters are shown in Table 1.

In training, we set the number of epochs to 30 and the initial learning rate to 0.01; the learning rate is then updated according to a three-stage schedule.

A larger learning rate is set at the beginning so that the model reaches the vicinity of the optimal global parameters faster, which reduces the training time. After training for a certain number of epochs, the learning rate is reduced so that the parameters move closer to the optimal value in subsequent updates. The stochastic gradient descent (SGD) algorithm is used to optimize the loss function.
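A three-stage, piecewise-constant schedule of this kind can be sketched as follows. Only the initial learning rate of 0.01 and the 30 epochs come from the text; the stage boundaries (epochs 10 and 20) and the decay factor (0.1) are hypothetical values chosen for illustration.

```python
def three_stage_lr(epoch, base_lr=0.01, boundaries=(10, 20), factor=0.1):
    """Piecewise-constant learning rate with three stages (assumed boundaries and factor)."""
    if epoch < boundaries[0]:
        return base_lr                    # stage 1: large steps toward the optimum
    if epoch < boundaries[1]:
        return base_lr * factor           # stage 2: reduced rate
    return base_lr * factor * factor      # stage 3: fine adjustment


schedule = [three_stage_lr(epoch) for epoch in range(30)]
```

The schedule never increases, so later epochs refine the parameters with progressively smaller updates.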

2.7.2. Training Image Preparation

We randomly select 15 images from STARE and the first 20 images from CHASE_DB1 as their respective training set. Due to the limited number of images in the existing dataset, to avoid the overfitting phenomenon in the model training, we perform data expansion processing on the training set of each dataset. Thanks to the translation invariance of the convolutional structure, the images in the training set in this paper were flipped horizontally and vertically and rotated 180 degrees to increase the amount of data 4 times.
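The fourfold data expansion can be sketched on images stored as nested lists. The function names are hypothetical; in practice, the same transformation must also be applied to the corresponding label image so that image and label stay aligned.

```python
def hflip(image):
    """Flip left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Flip top to bottom."""
    return [row[:] for row in image[::-1]]


def rot180(image):
    """Rotate by 180 degrees (equivalent to a horizontal flip plus a vertical flip)."""
    return [row[::-1] for row in image[::-1]]


def augment(image):
    """Return the original image plus its three transformed copies."""
    return [[row[:] for row in image], hflip(image), vflip(image), rot180(image)]


image = [[1, 2], [3, 4]]
augmented = augment(image)
```

The four outputs are distinct for any image without mirror symmetry, which is why these three operations together with the original give exactly four times the data.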

2.7.3. Measuring Metrics

In order to evaluate the segmentation performance of this algorithm, we use the following metrics to perform a comprehensive evaluation of the segmentation result. These metrics are accuracy (ACC), sensitivity (Se), specificity (Sp), and AUC, calculated as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad Se = \frac{TP}{TP + FN}, \quad Sp = \frac{TN}{TN + FP},$$

where TP is true positive, FP is false positive, TN is true negative, and FN is false negative. Se is the sensitivity, which indicates the proportion of blood vessel pixels that are correctly classified. In this paper, higher sensitivity indicates that more tiny blood vessels can be detected. Sp is the specificity, which expresses the ability of the algorithm to recognize nonvascular pixels. ACC is the accuracy of the segmentation, reflecting the gap between the segmentation result and the ground truth. AUC is the area under the ROC curve, and we adopt another calculation method to get the AUC, as shown in equation (19) [11].
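The three count-based metrics can be sketched on flattened binary masks as follows. The function names and the toy data are hypothetical, and the sketch assumes that both vessel and nonvessel pixels are present in the ground truth (otherwise a denominator would be zero). The AUC is not included because its calculation follows a separate formula.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for flattened binary prediction and ground truth lists."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def segmentation_metrics(pred, truth):
    """Sensitivity, specificity, and accuracy from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "ACC": (tp + tn) / (tp + fp + tn + fn),
    }


pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
```

In this toy case, one of the three vessel pixels is missed, so the sensitivity is 2/3 even though the accuracy is 0.75; this illustrates why sensitivity is the more telling metric for thin vessels.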

Besides, we also use two other evaluation metrics to measure the effect of segmentation: MCC and CAL.

MCC is a correlation coefficient between the segmentation output of the algorithm and the ground truth:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$

It comprehensively considers TP, TN, FP, and FN, which makes it a relatively balanced metric that is well suited to an imbalanced class ratio.
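The Matthews correlation coefficient can be computed directly from the four confusion counts, as in the sketch below. Returning 0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denominator


example = mcc(tp=2, fp=1, tn=4, fn=1)   # (8 - 1) / sqrt(3 * 3 * 5 * 5) = 7 / 15
perfect = mcc(tp=5, fp=0, tn=5, fn=0)
```

The value ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect agreement).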

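CAL can be expressed as the product of $C$, $A$, and $L$ as follows:

$$CAL(S, S_G) = C(S, S_G) \cdot A(S, S_G) \cdot L(S, S_G).$$

Suppose $S$ and $S_G$ are the segmentation result and the corresponding ground truth, respectively. These functions are defined as follows:

(1) Connectivity ($C$): it evaluates the fragmentation degree between $S$ and $S_G$ by comparing the number of connected components:

$$C(S, S_G) = 1 - \min\left(1, \frac{|\#_C(S_G) - \#_C(S)|}{\#(S_G)}\right),$$

where $\#_C(\cdot)$ means the number of connected components, while $\#(\cdot)$ means the number of vessel pixels in the considered binary image.

(2) Area ($A$): it evaluates the degree of intersecting area between $S$ and $S_G$ and is defined as

$$A(S, S_G) = \frac{\#\left((\delta_\alpha(S) \cap S_G) \cup (S \cap \delta_\alpha(S_G))\right)}{\#(S \cup S_G)},$$

where $\delta_\alpha$ is a morphological dilation using a disc of $\alpha$ pixels in radius, with $\alpha$ fixed in the experiments.

(3) Length ($L$): it evaluates the equivalent degree between $S$ and $S_G$ by computing the total length:

$$L(S, S_G) = \frac{\#\left((\varphi(S) \cap \delta_\beta(S_G)) \cup (\delta_\beta(S) \cap \varphi(S_G))\right)}{\#(\varphi(S) \cup \varphi(S_G))},$$

where $\varphi$ is the homotopic skeletonization and $\delta_\beta$ is a morphological dilation with a disc of $\beta$ pixels in radius, with $\beta$ fixed in the experiments.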

According to [26], the CAL metric is essential to quantify thick and thin vessels more equally.

3. Results and Discussion

As shown in Figure 9, one test image is selected from each of the three datasets to display the segmentation results of each channel and the fusion results. It can be seen that some of the intermittent blood vessels of each channel are reconnected after fusion, and the number of small blood vessels in the fusion map is significantly higher than that of each channel segmentation map.

The DRIVE dataset is selected to compare the metric data of the three channels. The results show that the overall fusion of the three channels is better than the segmentation result of any single channel; in particular, the sensitivity is dramatically improved, as shown in Table 2.

To illustrate the segmentation performance of this paper, we list various metrics reported by recent papers on the DRIVE, STARE, and CHASE_DB1 datasets in Tables 3–5. It can be seen that the algorithm in this paper is superior to most similar papers in the sensitivity and AUC metrics. To give a more comprehensive view of the overall segmentation performance on the test set, we show the relevant metrics of the prediction results for all test set images in Table 6. The other essential metrics are MCC and CAL; the values achieved by the proposed method are compared with those of existing segmentation techniques on the DRIVE, STARE, and CHASE_DB1 datasets in Table 7.

We selected image 19_test from the test set of the DRIVE dataset to display the segmentation results, as shown in Figure 10. Literature [5, 27] segmented some small blood vessels, but it is still slightly insufficient compared to this paper’s segmentation diagram. Literature [10] lacks many details, and the small blood vessels are not segmented. The segmentation result of literature [11] contains a lot of edge noise, and there are many intermittent blood vessels. Compared with the existing segmentation methods, the segmentation results in this paper have a good performance in terms of the integrity of the whole blood vessels and the segmentation of small blood vessels.

As shown in Figure 11, we select the test results of the image im0163 in the STARE dataset for comparison. The segmentation results of this paper are similar to those of literature [13, 14], but the background noise in literature [13] is not eliminated. Compared with literature [5, 10, 27], the algorithm in this paper eliminates the optic disc structure in the original image as much as possible in the preprocessing part, so the problem of incorrectly classifying part of the optic disc structure as blood vessels, which occurs in those papers, does not appear in the final segmentation result.

The CHASE_DB1 dataset is not used in most papers on retinal blood vessel segmentation. One of the reasons is that half of the images in the dataset are abnormal, which may interfere with the trained segmentation model. Meanwhile, this dataset is also new and challenging compared to the classic DRIVE and STARE datasets. To verify the generalizability of the proposed algorithm, we selected four images (image_12R, image_13L, image_13R, and image_14L) from the test set of the CHASE_DB1 dataset to compare the segmentation results, as shown in Figure 12. The segmentation result of the algorithm in literature [19] has much noise, and some blood vessels are not effectively separated. Literature [28] does an excellent job in the segmentation of small blood vessels, but some blood vessels are not connected. Owing to the postprocessing in this paper, the segmentation result on this dataset contains less noise and preserves the continuity of most blood vessels. However, compared with the manual label, some tiny blood vessels cannot be completely segmented from the image background.

The source code of the proposed framework was run on a PC (Intel Core i5-6300HQ CPU, 2.30 GHz, 12.0 GB RAM, NVIDIA GTX 950M GPU). Training on DRIVE, STARE, and CHASE_DB1 took 11.3 h, 7.1 h, and 16.4 h, respectively, in each channel. The average testing time per test image was 1.34 s. Table 8 compares the number of parameters of the proposed method with that of other U-Net-based methods, which helps compare the framework complexity of different methods. Note that the number of parameters does not determine the training time because some methods use slices of a training image as the input of the network. For example, literature [19] has 42421 slices as the training set, which means it needs more time to train the network.

4. Conclusion

This paper proposes a new retinal blood vessel segmentation method, which combines a multiscale matched filter with a U-Net deep learning model. First, we use an improved morphological image algorithm to effectively reduce the impact of the image background on feature extraction. Additionally, to avoid ignoring the characteristics of small blood vessels, this paper performs multichannel feature extraction and segmentation on retinal blood vessel images. Finally, the segmented images of the three channels are merged so that as many characteristics of retinal blood vessels as possible are obtained. In the training of the U-Net model, we use a loss function that weights the Dice coefficient and the binary cross-entropy to address the image pixel imbalance problem. The algorithm is tested on the public datasets DRIVE, STARE, and CHASE_DB1. The experimental results show better performance in four metrics compared with similar papers. The average sensitivity of the algorithm reached 0.8745, 0.8903, and 0.8916 on the DRIVE, STARE, and CHASE_DB1 datasets, respectively, which is nearly 0.1 higher than the average sensitivity of other papers. The improvement in sensitivity also reflects the good performance of the algorithm in extracting small blood vessels. The focus of this paper is to combine the advantages of unsupervised and supervised algorithms, and we did not make many improvements to the U-Net network itself. Therefore, how to prune the deep learning network model structure will be an interesting research direction in the future.

Data Availability

The three public open-source datasets used to support this study are available at http://www.isi.uu.nl/Research/Databases/DRIVE/, http://cecas.clemson.edu/~ahoover/stare/, and https://blogs.kingston.ac.uk/retinal/chasedb1/.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grant nos. 62071161 and 62001149, Key R&D Projects of Shandong under Grant no. 2019JZZY021005, Natural Science Foundation of Shandong Province under Grant no. ZR2020MF067, and Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province.

Supplementary Materials

Table 1: channel 1 results of DRIVE test images. Table 2: channel 1 results of STARE test images. Table 3: channel 1 results of CHASE_DB1 test images.