Abstract

The automatic segmentation of main vessels in X-ray angiography (XRA) images is of great importance for smart coronary artery disease diagnosis systems. Although several methods have been developed for this task, they still have difficulty recognizing the coronary artery structure in XRA images. Main vessel segmentation remains challenging because of the diversity and small size of the vessel regions in XRA images. In this study, we propose a robust method for main vessel segmentation using deep learning architectures based on fully convolutional networks. Four deep learning models built on the UNet architecture are evaluated on a clinical dataset consisting of 3200 X-ray angiography images collected from 1118 patients. Using precision (Pre), recall (Re), and F1 score (F1) as evaluation metrics, the average Pre, Re, and F1 for main vessel segmentation over the entire experimental dataset are 0.901, 0.898, and 0.900, respectively, and 89.8% of the images exhibit a high F1 score (>0.8). Our results demonstrate that, with a more optimized implementation, the main vessels in XRA images could be segmented in real time, further facilitating online diagnosis in the smart medical system.

1. Introduction

Automatic medical diagnosis is important in the construction of the smart medical system, which can provide data collection, data analysis, and online diagnosis for patients anywhere. In 2016, more than 17 million people died of cardiovascular diseases, accounting for about 31% of all deaths, and more than 75% of these deaths occurred in low-income and middle-income countries. Moreover, coronary artery disease (CAD) comprises many different types of lesions with various characteristics [1]. For small hospitals in rural areas and low-income countries, coronary artery disease diagnosis often lacks both the rich experience and knowledge of experts and advanced diagnostic technology. This may cause misdiagnosis, inappropriate therapy, and health hazards to patients and may even lead to improperly performed percutaneous coronary intervention (PCI) procedures. Therefore, the automatic analysis of coronary imaging data is necessary in the smart medical system.

In the clinical setting, X-ray angiography (XRA) is the main imaging modality used to guide coronary artery disease diagnosis. With the help of contrast-enhanced XRA images, doctors can diagnose coronary artery disease and evaluate therapeutic effects based on morphological characteristics. However, this subjective visual diagnosis inevitably leads to unreliable assessment of the lesions, which not only decreases operative workflow efficiency but also raises health hazards to patients. Thus, in an XRA analysis system, automated coronary artery segmentation of XRA images is important for the diagnosis and intervention of cardiovascular diseases.

Although vessel segmentation has long been studied because of its significance and complexity in clinical practice, coronary artery segmentation remains highly challenging due to the poor visual quality of XRA images, which is caused by the low contrast and high Poisson noise of low-dose X-ray imaging, the overlap of vessels with stenoses and other artery branches, complex motion patterns, and the disturbance of spatially distributed noisy artifacts. Most existing 3D/2D vessel segmentation methods for coronary arteries are designed to segment vessels from computed tomography angiography (CTA); however, these methods cannot be directly applied to 2D XRA images. To extract the morphological features of the vascular structures in XRA images, several image analysis and machine learning-based methods have been developed. Kerkeni et al. [2] proposed a multiscale region growing method for coronary artery segmentation, whose growing rule incorporates both vesselness and local orientation information; however, it fails to recognize thin vessels and stenotic lesion regions in low-contrast XRA images. Jin et al. [3] developed Radon-like feature filtering with a local-to-global adaptive threshold to tackle the spatially varying noisy residuals in the extracted vessels. Although it refines the vessel segmentation, some residuals remain around vessel regions, especially the stenosis regions. Felfelian et al. [4] used a Hessian filter to detect coronary artery regions of interest and identified vessel pixels with flux flow measurements; nevertheless, a series of postprocessing steps must be performed to eliminate falsely identified vessel pixels. Overall, these methods are not efficient enough for online XRA analysis in the smart medical system, and they are not designed for main vessel segmentation in XRA images, which is more important for clinical decision-making. In this work, we mainly study the segmentation of the main vessels, i.e., the left anterior descending artery (LAD), the left circumflex artery (LCX), and the right coronary artery (RCA), from XRA images (as shown in Figure 1).

With the development of neural network-based deep learning [5], convolutional neural networks (CNNs) with fully connected layers have been used to perform vessel segmentation. In [5], the input XRA images are divided into many patches, which overlooks the global structural information of the different vessel regions. To alleviate this issue, Fan et al. [6] proposed a multichannel fully convolutional neural network to learn the hierarchical features that distinguish vessel and background. However, mask images must be collected to distinguish real blood vessel structures, which is impractical and inefficient in clinical applications. Some XRA image segmentation algorithms integrate the temporal-spatial contextual information in XRA sequences, which can be important for inferring whether pixels belong to the foreground vessel regions. However, the clinical diagnosis of CAD is based on the keyframe, which is typically the end-diastolic angiographic frame with the best image quality and full contrast-agent penetration. Although current vessel segmentation methods have made great progress in segmentation accuracy, they are still inefficient at segmenting the main vessels from XRA images with many noisy and overlapping background artifacts.

To achieve XRA image segmentation in the smart medical system, we apply four deep neural networks (DNNs) constructed on the basis of the UNet architecture [7]. Deep learning models based on UNet have demonstrated powerful performance in binary semantic segmentation of gray-scale images. UNet consists of a fully convolutional encoder, called the backbone, and a deconvolution-based decoder. By replacing the backbone of UNet with one of the most popular networks for image classification, such as ResNet [8], DenseNet [9], or Residual Attention Network [10], we obtain deep learning models for the segmentation of X-ray angiography images. The class imbalance between the number of foreground vessel pixels and background pixels is typical of challenging vessel segmentation tasks and must be handled carefully to boost segmentation performance. We therefore employ the Dice loss function in the deep networks to alleviate the severe class imbalance problem in vessel segmentation. Figure 2 shows the schematic diagram of main vessel segmentation in the smart medical system: various XRA images are collected from different hospitals, transmitted online for segmentation, and the analysis results and diagnosis suggestions from experts are then returned quickly.

2. Related Work

In this section, we briefly review the previous methods most related to our work, including X-ray angiography image segmentation, conventional vessel segmentation methods, and deep learning-based segmentation methods.

2.1. X-Ray Angiography Image Segmentation

The World Health Organization (WHO) has reported that cardiovascular diseases (CVDs) are the leading cause of death worldwide. X-ray angiography (XRA) is the primary imaging method for observing CAD because it presents a visualization of the coronary morphology [11]. In XRA images, it is especially important to identify the main blood vessels correctly, because the morphological information of the main vessel is most related to the clinical quantification of coronary stenosis. Moreover, functional information related to coronary revascularization can also be extracted from these morphological features [12]. In clinical practice, however, the identification of main vessels currently depends on manual segmentation by radiologists, which is time-consuming and variable across radiologists (e.g., depending on experience and board certification in cardiology) [13]. This may lead to unreliable assessment during clinical decision making. Moreover, because of the low contrast, nonuniform illumination, and low signal-to-noise ratio of X-ray angiography, it is difficult to perform the vessel segmentation task accurately [5]. Therefore, an automated vessel segmentation method is needed to meet the clinical requirements of reducing time and cost and assisting the relevant experts. Among segmentation tasks, main vessel segmentation is particularly difficult because it does not identify all the blood vessels shown in the image; only the main blood vessel must be segmented.

2.2. Conventional Vessel Segmentation Method

Many traditional approaches have been proposed in the past decades, including filtering-based methods [14–18] and tracking-based algorithms [19, 20]. Commonly, the filtering-based methods design specific filters that are convolved with the original images to enhance the vessel structures [14–16]. In [14], an operator for feature extraction based on the optical and spatial properties of vessels was introduced; specifically, 12 different matched filtering templates were designed to search for vessel segments along different possible directions. To improve vessel enhancement, Frangi et al. [15] proposed a widely used vesselness enhancement technique in which the multiscale second-order local structure of an image (the Hessian) is examined. Other filtering-based methods have been proposed to extract vessel features, such as ridge features, Radon-like features [19], and Gabor wavelet features [17, 20]. Although these methods enhance the vessel structure, they are not efficient in clinical practice because of the high computational complexity of their pixel-wise operations. Moreover, these methods are often used only as a preprocessing step, and the final segmentation requires further postprocessing steps such as morphological operations.
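
As an illustration of the Hessian-based filtering approach described above, the following sketch applies a multiscale Frangi vesselness filter to an XRA frame as a preprocessing step. This is only an illustrative example assuming scikit-image (>=0.14) and NumPy are available; the file path, the chosen scales, and the final thresholding choice are our own assumptions, not values from this study.

```python
import numpy as np
from skimage import io, filters

# Load a gray-scale XRA frame (hypothetical file path).
image = io.imread("xra_frame.png", as_gray=True).astype(np.float64)

# Multiscale Frangi vesselness: the Hessian is analyzed at several scales
# (sigmas) so that both thin and thick vessels respond. Vessels appear
# dark on XRA images, hence black_ridges=True.
vesselness = filters.frangi(image, sigmas=(1, 2, 3, 4), black_ridges=True)

# A simple global threshold gives a rough vessel mask; in practice this is
# followed by postprocessing such as the morphological operations noted above.
mask = vesselness > filters.threshold_otsu(vesselness)
```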

Tracking-based segmentation methods first choose seed points on the vessel edges, and the tracking process then starts under the guidance of image-derived constraints. Makowski et al. [21] employed a two-phase method for vessel extraction, which combines two active contour approaches (balloons and snakes). A recursive tracking method was used to extract the branch structures in CTA by analyzing the binary connected components on the surface of a sphere that moves along the vessels [22]. Manniesing et al. [23] presented an approach for tracking the vessel axis based on surface evolution, imposing shape constraints on the skeleton topology. However, these tracking methods usually need human intervention to tune their performance, and they often fail to segment small vessels well in low-quality images such as XRA images.

2.3. Deep Learning-Based Segmentation Method

Compared with conventional segmentation methods, deep learning methods can learn discriminative feature representations effectively and have strong generalization capacity. Recently, convolutional neural network (CNN)-based methods have been broadly applied to medical image segmentation, including vessel segmentation tasks such as retinal vessel segmentation [24]. In [5], a common convolutional neural network with fully connected layers was used to perform vessel segmentation; the input XRA images are divided into many patches, which overlooks the global structural information of the different vessel regions. In addition, the model in [5], with three fully connected layers, contains many parameters, which may lead to overfitting and inefficiency, especially for high-dimensional images. Xu et al. [25] proposed a spatio-temporal patch-based LSTM-RNN network for direct delineation of the myocardial infarction area using cardiac motion features [26]. Some deep learning-based segmentation methods for coronary arteries have used 3D CT/CTA data [27]; however, these methods cannot be directly applied to main vessel segmentation in 2D XRA images.

3. Materials and Methods

In this section, we introduce the materials used in this study and our framework for main vessel segmentation using deep learning networks, which integrates the UNet architecture with deep convolutional backbone networks.

3.1. Dataset

A total of 3200 coronary angiography images obtained from 1118 patients who visited the Sun Yat-Sen Cardiovascular Hospital of Shenzhen from September 2015 to July 2018 were evaluated. This retrospective observational study involving all participating patients was approved by the institutional review board. The subjects' ages ranged from 30 to 89 years, with an average of 62.8 years. The size of the XRA images in our dataset is 512 × 512 pixels, and the pixel spacing ranges from 0.183 mm/pixel to 0.741 mm/pixel, with a mode of 0.37 mm/pixel.

Several XRA image sequences are first selected by a physician with more than 10 years of experience from the appropriate viewpoints according to the standard clinical procedure [28]. Specifically, given an X-ray angiography image sequence, the keyframe (i.e., the end-diastolic angiographic frame) with the best image quality and full contrast-agent penetration is selected by the physician [29].

Two experts, each with more than 10 years of experience, manually and independently segment the main vessels (LAD, LCX, and RCA) from the ostium to the distal site using the ITK-SNAP software [30]. As a result, three main vessel angiography images are obtained per patient. Images of poor quality are excluded from the dataset; representative reasons for exclusion include blurring and devices appearing in the image.

3.2. Data Processing

Our dataset consists of 3200 gray-scale images with a size of 512 × 512. Of the 3200 XRA images, 1100 are LAD, 980 are LCX, and 1120 are RCA. We first normalize the data using 2-D min/max normalization. Because the dataset in our study is small compared with those of some general computer vision tasks (e.g., image classification [31]), the models would inevitably face an overfitting problem. To avoid this problem, we perform data augmentation in this study.
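
For reference, a minimal sketch of the 2-D min/max normalization step is shown below; the function name and the small epsilon guard are illustrative additions, not part of the original implementation.

```python
import numpy as np

def minmax_normalize(image):
    # Scale a 2-D gray-scale XRA image to the [0, 1] range.
    lo, hi = float(image.min()), float(image.max())
    return (image - lo) / (hi - lo + 1e-8)
```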

(1) We zoom in or out of an image by a random ratio within ±15%. (2) The height and width of the image are shifted by a random ratio within ±15% of the 512 × 512 image size. (3) To reduce the influence of viewpoint selection in real clinical angiography, we rotate the XRA images by a random angle within ±20°; that is, an angle between −20° and 20° is randomly selected and the image is rotated in the clockwise direction. (4) Because the brightness of the XRA images varies, the brightness is also changed by a random ratio within ±30%.
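
A minimal sketch of this augmentation scheme is shown below, using the Keras ImageDataGenerator that ships with the TensorFlow version reported in Section 4.1. The generator and its parameter values are our own illustrative choices rather than the exact training pipeline of this study, and in practice the same spatial transforms must also be applied to the corresponding mask images (e.g., via a second generator with the same random seed).

```python
import tensorflow as tf

# Random zoom, shift, rotation, and brightness changes matching the ranges
# described in the text (illustrative sketch only).
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    zoom_range=0.15,             # (1) zoom in/out within +/-15%
    width_shift_range=0.15,      # (2) horizontal shift within +/-15% of width
    height_shift_range=0.15,     #     vertical shift within +/-15% of height
    rotation_range=20,           # (3) rotation within +/-20 degrees
    brightness_range=(0.7, 1.3), # (4) brightness change within +/-30%
    fill_mode="nearest",
)

# images: float32 array of shape (N, 512, 512, 1) scaled to [0, 1]
# augmented_batches = augmenter.flow(images, batch_size=8, seed=42)
```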

3.3. Networks

We present four deep learning models constructed on the basis of the UNet framework for semantic segmentation [7]. Many recent works have demonstrated that deep learning models based on UNet perform well in gray-scale image segmentation. The encoder of UNet is a fully convolutional network (the backbone), and the decoder is a deconvolution-based network. In this study, we replace the backbone of UNet with each of three popular image classification networks (ResNet [8], DenseNet [9], and Residual Attention Network [10]) and evaluate the resulting four deep learning models for the segmentation of X-ray angiography images. The framework of our deep learning model is presented in Figure 3.

UNet [7]: It is an encoder-decoder architecture, which has achieved very good performance on very different biomedical segmentation applications.
ResNet [8]: It is a residual learning framework, which explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. ResNet-50 is used in our work.
DenseNet [9]: It connects each layer to every other layer in a feed-forward fashion. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers.
Residual Attention Network [10]: It integrates an attention mechanism and stacks multiple attention modules. It can capture mixed attention and is an extensible convolutional neural network.
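
To make the encoder-decoder structure concrete, the sketch below shows a simplified UNet-style model in tf.keras with a plain convolutional encoder; swapping this encoder for a classification backbone (ResNet, DenseNet, or Residual Attention Network) follows the same pattern of tapping intermediate feature maps for skip connections. The channel counts, depths, and skip points here are illustrative assumptions, not the exact configurations used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, as in the original UNet.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 512, 1), base_filters=32):
    inputs = layers.Input(shape=input_shape)

    # Encoder (backbone): progressively downsample and widen.
    skips = []
    x = inputs
    for i in range(4):
        x = conv_block(x, base_filters * (2 ** i))
        skips.append(x)                      # feature map kept for the skip connection
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, base_filters * 16)     # bottleneck

    # Decoder: transposed convolutions plus skip connections.
    for i in reversed(range(4)):
        x = layers.Conv2DTranspose(base_filters * (2 ** i), 2,
                                   strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[i]])
        x = conv_block(x, base_filters * (2 ** i))

    # One output channel with sigmoid for binary vessel/background masks.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```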

The cross entropy (CE) loss is commonly adopted in most semantic segmentation tasks. In main vessel segmentation from XRA images, however, the vessel region is small compared with the large background region, and this class imbalance biases the CE loss toward the background. The Dice loss can alleviate this problem by measuring the overlap ratio between the ground truth and the prediction. We integrate the cross entropy loss (equation (1)) and the Dice loss (equation (2)) to optimize the network, so the loss function of our deep learning models is given by equation (3). In equations (1) and (2), c is the number of classes, and y_i and p_i are the ground truth label and the prediction for pixel i over all pixels.
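
A minimal sketch of such a combined loss is given below. The binary formulation, the smoothing constant, and the use of α as a simple weight on the Dice term are our own illustrative assumptions and may differ from the exact form of equation (3).

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Soft Dice loss: 1 minus the overlap ratio between ground truth and prediction.
    y_true = tf.reshape(y_true, [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (union + smooth)

def combined_loss(alpha=0.15):
    # Weighted sum of cross entropy and Dice loss; treating alpha as the
    # Dice weight is an assumption made for this sketch.
    def loss(y_true, y_pred):
        ce = tf.reduce_mean(
            tf.keras.backend.binary_crossentropy(y_true, y_pred))
        return ce + alpha * dice_loss(y_true, y_pred)
    return loss
```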

4. Results and Discussion

This section presents the results and discussion of our experimental study. We evaluate the deep learning models on the clinical XRA dataset.

4.1. Evaluation Metrics and Experimental Setup

The deep learning models are trained with the Adam optimizer using a batch of 8 images per step and an initial learning rate of 0.0002; the decay rate is set to 0.95, and the hyperparameter α is set to 0.15. In our experiments, 10-fold cross-validation is employed on the dataset. The models were implemented in TensorFlow 1.13.0, trained for 200 epochs, and tested on an NVIDIA Tesla P40 24 GB GPU.
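
A sketch of this training setup is shown below, combining the 10-fold split with the model and loss sketches given earlier. Interpreting the 0.95 decay as an exponential learning-rate decay applied once per epoch is an assumption of this sketch, as are the helper names.

```python
import numpy as np
from sklearn.model_selection import KFold
import tensorflow as tf

# images: (N, 512, 512, 1) float32 array; masks: (N, 512, 512, 1) binary array.
# build_unet() and combined_loss() refer to the sketches given earlier.

def lr_schedule(epoch):
    # Assumed interpretation: exponential decay of the learning rate per epoch.
    return 2e-4 * (0.95 ** epoch)

def train_cross_validation(images, masks, n_splits=10, epochs=200):
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    fold_losses = []
    for train_idx, test_idx in kfold.split(images):
        model = build_unet()
        model.compile(optimizer=tf.keras.optimizers.Adam(lr=2e-4),
                      loss=combined_loss(alpha=0.15))
        model.fit(images[train_idx], masks[train_idx],
                  batch_size=8, epochs=epochs,
                  callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
        fold_losses.append(model.evaluate(images[test_idx], masks[test_idx]))
    return np.mean(fold_losses)
```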

The evaluation metrics used to assess the segmentation performance of the deep learning models were precision, recall, and F1 score, defined as precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 = 2 × precision × recall/(precision + recall), where TP is true positive, FP is false positive, and FN is false negative.
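
These pixel-wise metrics can be computed directly from the binary prediction and ground truth masks, as in the short sketch below (an illustrative helper, not code from this study).

```python
import numpy as np

def segmentation_metrics(pred_mask, gt_mask, eps=1e-8):
    # pred_mask and gt_mask are binary arrays of the same shape (1 = vessel pixel).
    tp = np.sum((pred_mask == 1) & (gt_mask == 1))
    fp = np.sum((pred_mask == 1) & (gt_mask == 0))
    fn = np.sum((pred_mask == 0) & (gt_mask == 1))
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1
```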

4.2. Results and Discussion

To assess the segmentation performance of the deep learning methods based on the UNet architecture, we evaluate the precision, recall, and F1 score. Table 1 presents the results of the four deep learning methods on the entire dataset and on the different artery types (LAD, LCX, and RCA). The result values are presented as mean ± standard deviation. As shown in Table 1, the models with the ResNet, DenseNet, and ResAttNet backbones statistically outperform the plain UNet in terms of precision, recall, and F1 score. Specifically, ResAttNet achieves the highest F1 scores of 0.921 ± 0.102 on the overall dataset, 0.897 ± 0.122 on the LCX vessel, and 0.912 ± 0.105 on the RCA vessel. Although DenseNet is slightly superior to ResAttNet in terms of the F1 score on the LAD vessel, ResAttNet obtains the best performance in terms of the three metrics on the overall dataset and the main vessels in general.

Compared with ResNet, ResAttNet performs better on each main vessel. The main vessel occupies a small region compared with the other regions (e.g., the background) in the XRA images, and the attention mechanism in ResAttNet yields more discriminative feature representations. In particular, ResAttNet achieves average F1 score improvements of 2.8%, 2.9%, 3.9%, and 1.8% for the overall dataset, LAD, LCX, and RCA, respectively. We can also see that the performance decreases on the LCX vessel, because the number of LCX samples is smaller than that of LAD and RCA. Although the numbers of LAD and RCA samples are approximately equal, the deep learning models perform better on the RCA vessel than on the LAD vessel because the anatomical diversity of the LAD vessel is larger than that of the RCA vessel.

Figure 4 depicts representative examples of main vessel segmentation by the deep learning models. For small and thin vessel regions, ResAttNet outperforms the other three models, because the attention mechanism helps extract more discriminative features for small regions. In Figure 4, we can also see that the four deep learning models fail to segment the angulated segments well because of the large diversity of the segment curvature; this could be further improved when more XRA image data become available. We also perform a Bland–Altman analysis for ResAttNet: we randomly select one fold of the data and use the area of the main vessel region as the index. No significant difference is found between the predictions obtained from ResAttNet and the physician-annotated masks in the XRA images. The details are presented in Figure 5.
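
A minimal sketch of this Bland–Altman comparison on the vessel areas is given below; the pixel-count area measure and the 1.96-standard-deviation limits of agreement are standard choices, and the variable names are illustrative.

```python
import numpy as np

def bland_altman(pred_masks, gt_masks):
    # Use the segmented main vessel area (pixel count) of each image as the index.
    pred_area = np.array([m.sum() for m in pred_masks], dtype=np.float64)
    gt_area = np.array([m.sum() for m in gt_masks], dtype=np.float64)

    mean_pair = (pred_area + gt_area) / 2.0   # x-axis of the Bland-Altman plot
    diff = pred_area - gt_area                # y-axis: prediction minus annotation
    bias = diff.mean()                        # mean difference (bias)
    loa = 1.96 * diff.std(ddof=1)             # 95% limits of agreement half-width
    return mean_pair, diff, bias, (bias - loa, bias + loa)
```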

In terms of computational efficiency, we measure the inference time for one XRA image. The methods are fast, requiring on average only 0.05 s to process one XRA image. In the standard X-ray angiography procedure, the frame rate of XRA sequences is commonly 7.5–15 frames/s, so our methods have the potential to run in real time with a more optimized implementation and thus further facilitate the PCI procedure.
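
As a rough check, the per-image inference time can be timed and compared with the acquisition frame interval, as sketched below; the warm-up runs and averaging strategy are our own illustrative choices. At 15 frames/s the budget is about 0.067 s per frame, so an average of 0.05 s per image is compatible with real-time use.

```python
import time

def average_inference_time(model, images, warmup=5):
    # Warm-up runs so that one-time graph/initialization costs are excluded.
    for i in range(min(warmup, len(images))):
        model.predict(images[i:i + 1])

    start = time.perf_counter()
    for i in range(len(images)):
        model.predict(images[i:i + 1])      # one 512 x 512 XRA frame per call
    per_image = (time.perf_counter() - start) / len(images)

    frame_interval = 1.0 / 15.0             # 15 frames/s acquisition
    return per_image, per_image <= frame_interval
```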

The main limitation of this study is the lack of sufficient XRA image data for the different artery types. The proposed deep learning-based methods still have difficulty with stenosis region segmentation as well as thin vessel segmentation. To achieve robust and general performance, the introduced deep learning segmentation framework requires diverse data with more artery morphological features and lesion characteristics. In future work, we intend to segment the main vessels and the stenosis lesions simultaneously in XRA images. Moreover, the quantification of stenosis is also significant in clinical angiography [32]; the segmentation and quantification tasks could be performed simultaneously using a multitask deep learning method [33], which would help physicians make diagnoses more accurately and quickly and improve clinical workflow efficiency.

5. Conclusion

In the smart medical system, automated coronary artery segmentation of XRA images is important for the diagnosis and intervention of cardiovascular diseases. In this study, we apply deep learning networks to main vessel segmentation. The experimental results on a clinical dataset show that the deep learning networks accurately identify and segment the main vessels in XRA images. The models have the potential to run in real time with a more optimized implementation and thus further facilitate the PCI procedure.

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded in part by the ShenZhen Innovation Funding (JCYJ20170413114916687).