Abstract

Automatic and accurate segmentation of ground glass opacity (GGO) nodules remains challenging because of their inhomogeneous interiors, irregular shapes, and blurred boundaries, which vary across patients. Despite its successful applications in image processing, the random walker has some limitations for segmentation of GGO pulmonary nodules. In this paper, an improved random walker method is proposed for the segmentation of GGO nodules. Intensity, spatial, and texture features are incorporated to calculate a new affinity matrix, which strengthens the discriminative power between two adjacent nodes on the graph. To address the problem of robustness in seed acquisition, the geodesic distance is introduced and a novel local search strategy is presented to automatically acquire reliable seeds. For segmentation, a label constraint term is introduced to the energy function of the original random walker, which alleviates the accumulation of errors caused by the initial seed acquisition. Extensive experiments conducted on the Lung Image Database Consortium (LIDC) dataset demonstrate that the proposed method achieves visually satisfactory results without user interaction. Both qualitative and quantitative evaluations also demonstrate that the proposed method performs better than the conventional random walker method and state-of-the-art segmentation methods in terms of the overlap score and F-measure.

1. Introduction

Lung cancer is the leading cause of cancer-related deaths among both men and women. It accounts for approximately 27% of all cancer deaths because it is usually diagnosed at an advanced stage of the disease. Lung cancer is controllable if it is diagnosed in time and treated appropriately at an early stage. Currently, only 15% of all lung cancers are diagnosed at an early stage, which results in a five-year survival rate of only 16%. Therefore, diagnosing lung cancer at an early stage, when treatment options are better, is important for increasing the chances of survival and reducing the mortality rate. Lung cancer potentially manifests itself as pulmonary nodules [1]. Computed tomography (CT) is one of the most prevalent modalities for early inspection and analysis of pulmonary nodules. In recent years, medical image processing research has addressed the detection and segmentation of pulmonary nodules in CT images. In particular, segmentation of pulmonary nodules is valuable for subsequent planning of treatment strategies, monitoring of disease progression, and prediction of treatment outcome, because key indicators such as volume [2, 3] and size [4] can be readily calculated from segmented pulmonary nodules. In clinical routine, pulmonary nodules are delineated manually in a slice-by-slice manner under the guidance of radiologists. However, manual segmentation is time-consuming and subjective in larger studies. Therefore, automatic segmentation of pulmonary nodules is highly desirable to relieve radiologist workload.

Based on the position of pulmonary nodules in the lung parenchyma and their proximity to other anatomical structures, five types of nodules are identified: well-circumscribed, juxta-vascular, juxta-pleural, cavitary pulmonary nodules, and ground glass opacities (GGO). Over the last decades, considerable effort has been devoted to studying the segmentation of pulmonary nodules. However, most previously published nodule segmentation methods focus on well-circumscribed pulmonary nodules. The literature on segmentation of GGO pulmonary nodules is limited. Unfortunately, GGO pulmonary nodules have a higher malignancy rate than other solitary nodules [5]. Therefore, an accurate and efficient method is urgently needed for segmentation of GGO nodules, which is the focus of our paper.

Segmentation of GGO pulmonary nodules remains challenging because of inhomogeneous interiors, blurred boundaries, and irregular shapes that vary across patients. In this paper, an improved random walker-based method is proposed for segmentation of GGO pulmonary nodules. The random walker was first introduced by Grady [6] for interactive image segmentation. The initialization of the random walker model requires user inputs to guide the segmentation process. The user provides seeds that mark certain pixels as the object and a few others as the background. For each pixel, it is necessary to compute the probability that a random walker leaving the pixel will first arrive at each seed. Many variants of the random walker have been widely applied to image segmentation tasks [7–9].

The random walker model needs the user to specify object and background seed points. However, it is extremely sensitive to the locations and quantity of seeds. If the locations of the seeds are not precise or the quantity of seeds is not sufficient for accurate pulmonary nodule segmentation, unlabeled pixels can be assigned the wrong labels, which degrades the quality of segmentation. Therefore, the user needs to select seed points repeatedly to improve the segmentation performance. However, this process is considerably tedious and computationally expensive. In addition, user interaction seriously restricts the applications of the random walker method. In clinical practice, most clinicians would benefit from automated methods. This motivates us to present an alternative solution strategy for automatic seed acquisition. Please see Section 3.1 for more details.

In addition, the random walker method might be prone to get stuck at a local minimum in the energy landscape. At present, some existing random walker methods have attempted to exploit some prior knowledge to address this shortcoming. For instance, T. Messay et al. [10] proposed the guided random walk method for left ventricle segmentation. The authors incorporated prior knowledge into the energy function of the random walks. However, the sensitivity of seeds location and quantity has not been addressed under guided random walk framework. Mi et al. [11] proposed an iterative method based on random walkers for tumor segmentation. The authors took into account prior knowledge of the influence of tumor growth prediction. In this paper, an improved random walker method is proposed for segmentation of pulmonary nodules. A constraint term is introduced by extending the fundamental energy function of the random walker. The main contributions of the proposed method are summarized as follows:

(1) Introducing an alternative solution strategy for automatic seeds acquisition. Different from many random walker methods, we propose a fully automatic method for segmentation of GGO pulmonary nodules. An automatic solution strategy is proposed to locate nodule and background seeds. Subsequently, the acquired seeds are further fed to the random walker.
(2) Constructing a new affinity matrix that measures the similarity between a pair of neighboring nodes in some predefined feature space. The affinity entry is composed of two components: an adjacency component and a feature-based component. The adjacency component is defined based on the spatial distance to enforce the spatial coherence. The feature-based component is defined based on intensity and texture features. Finally, two components are combined by a point-wise multiplication operation.
(3) Defining a novel energy function for segmentation of GGO nodules. A label constraint term is introduced to the fundamental energy function of the random walker, which alleviates the accumulation of errors caused by the acquired seeds.

The remainder of the paper is organized as follows. Section 2 briefly reviews the relevant literature for segmentation of pulmonary nodules. The details of the proposed method are described in Section 3. Section 4 reports extensive experimental evaluations on the LIDC dataset, which compare the proposed method with the traditional random walker and several other previously published methods in terms of the overlap score, F-measure, and execution time. Discussions are given in Section 5. Finally, conclusions and future work are provided in Section 6.

2. Related Work

Over the past decade, considerable effort has been devoted to the segmentation of pulmonary nodules. These methods can be broadly categorized as intensity-based, shape-based, and deep learning methods. Intensity-based methods separate pulmonary nodules from the surrounding background by using purely intensity information, such as thresholding [12, 13], pixel classification [14], region growing [15, 16], clustering [17–19], and mathematical morphology [20, 21]. Some hybrid methods have been applied in pulmonary nodule segmentation. Dehmeshki et al. [16] combined the thresholding and region growing methods for the segmentation of pulmonary nodules. The mathematical morphology-based methods also have been applied to segmentation of pulmonary nodules. The main difficulty with these methods is to decide a suitable size of structuring elements to segment all different kinds of pulmonary nodules. Shape-based methods [22, 23] for segmentation of pulmonary nodules have been well studied. This category yields better segmentation results than intensity-based methods, since it takes into account nodule-specific geometrical constraints. Strictly speaking, these geometrical assumptions do not hold for ground glass opacity (GGO) nodules because these types of pulmonary nodules show large variability in morphology.

Deformable models [24–27] have been attracting more and more attention for segmentation of pulmonary nodules. Deformable models are flexible enough to cope with topological variability and have been shown to achieve very good segmentation results. Although these deformable models have achieved satisfactory segmentation of pulmonary nodules with strong boundaries, they are highly sensitive to noise and also depend on the location of the initial contour. Moreover, deformable models impose a high computational burden because they require a large number of iterations.

Deep learning techniques [28–35] have been used extensively in segmentation of pulmonary nodules. Compared with traditional segmentation methods, deep learning trains deep neural network models on a large number of images. From the data, deep learning learns the low-level features of nodules and forms more abstract high-level features, so as to achieve a better segmentation. Although these deep learning models have achieved satisfactory segmentation of pulmonary nodules, deep learning requires the setting of hyperparameters in the network model, and the challenges are still not very well understood. Different from these models, our proposed method is simple and efficient and obtains the segmentation results by solving a linear system. In addition, the proposed method is more robust against intensity variations and noise than deformable models, since the random walker captures the spatial connectivity much better than deformable models. The details of the proposed method are described in the following section.

3. Method

In this section, an improved random walker method is proposed for segmentation of GGO pulmonary nodules, which consists of four main steps: acquisition of seeds, construction of undirected weighted graph, designation of the energy function, and optimization of the energy function. The flowchart of the proposed method for GGO nodule segmentation is shown in Figure 1. The details of each step are described in the following sections. Before the improved random walker method is presented, the description of some basic notations is firstly given in this section.

3.1. Acquisition of Seeds

Before the GGO pulmonary nodules are segmented by the proposed random walker method, nodule and background seed points must first be acquired. In this section, an efficient local search strategy is presented for acquiring reliable seeds, which is often unattainable in most existing studies of random walker methods.

Preprocessing of the lung CT image is essential before pulmonary nodule segmentation. The coherence filter is adopted to suppress image noise while preserving the nodule boundaries. The random walk segmentation method strongly depends on the initial seed acquisition: the user needs to acquire seed points repeatedly until satisfactory segmentation results are achieved, which is tedious and time-consuming for large-scale images. Therefore, automatic acquisition of seed points is essential for the next step. We focus on presenting an effective method that automates the manual seed acquisition and thus reduces user interaction. Firstly, the adaptive threshold method [12] is adopted to find a global threshold. Pixels whose intensities are greater than the global threshold are roughly identified as part of the pulmonary nodule. The remaining pixels are considered background pixels. Figures 2 and 3 show an example of the nodule and background seeds acquired using the proposed method as red and blue spots, respectively. Figure 2(b) shows the binary image obtained from the filtered CT image. As observed from Figure 2(b), the adaptive threshold can roughly separate the nodule from the background. The morphological opening operation is used to eliminate small holes and noise, and connected component analysis is employed to remove the undesired regions. Finally, an initial segmentation result is obtained.
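The thresholding and connected component steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `largest_component_above` is hypothetical, the global threshold is passed in rather than computed by the adaptive method of [12], the coherence filter and morphological opening are omitted, and "removing the undesired regions" is interpreted here as keeping only the largest 8-connected region.

```python
from collections import deque


def largest_component_above(image, threshold):
    """Threshold a 2D image and return the largest 8-connected region.

    `image` is a list of rows of intensities. The result is a set of
    (row, col) pixel coordinates whose intensity exceeds `threshold`.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over the 8-neighborhood.
            component = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best
```

In practice an image processing library would normally replace the hand-written flood fill; the pure-Python version is used here only to keep the sketch self-contained.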

Secondly, a small region is created from the initial segmentation result based on geodesic distance [36]. This small region is the largest connected component within the initial segmentation result whose pixels satisfy a geodesic distance condition, which is defined by the maximum geodesic distance value measured from the boundary to the center of the initial segmentation result and by a predefined parameter. The pixels in the small region are identified as the nodule seeds. Noticeably, these nodule seeds may be restricted to a homogeneous part of the nodule. Figure 2(c) shows the nodule seeds marked with red spots. However, it is well known that GGO pulmonary nodules are often inhomogeneous. The accuracy of GGO nodule segmentation could be further improved if more nodule seeds are available, so the nodule seeds need to be distributed as uniformly as possible over the nodule. Hence, we introduce a local search strategy to acquire the other nodule seeds. The pixels of the small region are used as the initial pixels for the search. Pixels that are close to the boundary of the small region and have features similar to those of the pixels inside it are identified as nodule seeds.

In GGO pulmonary nodules with intensity inhomogeneity, the intensity feature alone is not sufficient. The texture feature provides information complementary to the intensity feature. It measures the variation in the intensities at pixels of interest and has proved to be extremely effective for pulmonary nodule detection [37–40]. To calculate the texture feature, the gray-level cooccurrence matrix (GLCM) [40] and the Gabor filter are used in this paper. The GLCM is generated by counting the occurrences of intensity pairs between the current and neighbor pixels of an l-gray-level image. The normalized GLCM is calculated as

$$P(i, j) = \frac{C(i, j)}{\sum_{i=0}^{l-1} \sum_{j=0}^{l-1} C(i, j)},$$

where $i$ and $j$ are intensity values in the l-gray level and $C$ is the relative frequency matrix: $C(i, j)$ counts the pairs of pixel positions, taken over the x-axis and y-axis spatial domains, whose intensities are $i$ and $j$, respectively. Gabor transformation [41] is another commonly used texture feature extraction method. A Gabor filter is the product of a Gaussian envelope and a harmonic function, and it is parameterized by the standard deviation $\sigma$ of the 2D Gaussian envelope, the wavelength $\lambda$, the orientation $\theta$, the phase shift $\psi$, and the spatial aspect ratio $\gamma$. In this paper, eight orientations, two wavelengths, and two standard deviations are employed to extract texture features. The magnitude map of the Gabor filter responses is calculated to describe the local texture features, and the corresponding maximum amplitude is taken over the images filtered by the set of Gabor filters with different orientations. After the intensity and texture features are extracted, a local search strategy is introduced to acquire the other nodule and background seeds. Let $\Omega$ be a bounded Lipschitz domain and let $I$ be defined as the gray-level image function on $\Omega$. 
The similarity between a pixel in the current nodule seed set and its adjacent pixel is calculated from the intensity values and the texture values at the two pixels. Eight-neighbor connections at a pixel are used as the neighborhood system in this paper, and a norm measures the feature difference between two adjacent pixels. Herein, the exponential function is used to stress the importance of the intensity feature. If the similarity satisfies [34] the rule in (7), which compares it against a predefined threshold, the adjacent pixel is added to the nodule seed set. The threshold is set empirically, which works well in our method. The rule is very simple and effective. The local search strategy is implemented iteratively, and the iteration stops when the search has reached the boundary. The newly identified nodule seeds are added to obtain the final nodule seed set. Figure 3(a) shows the final nodule seeds marked with red spots. Figure 3(b) shows the zoomed-in region of Figure 3(a).
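The GLCM computation described in this section can be sketched as below. The function name `normalized_glcm`, the single-offset formulation, and the default offset (the right-hand neighbor) are assumptions made for illustration; the paper does not state which neighbor offsets it uses.

```python
def normalized_glcm(image, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    Entry [i][j] of the result is the relative frequency with which a
    pixel of level i has a neighbor of level j at the given offset.
    """
    dy, dx = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    # Divide every count by the total so that all entries sum to 1.
    return [[count / total for count in row] for row in counts]
```

For slices normalized to 0–255, quantizing to a smaller number of levels before calling such a function keeps the matrix compact; that quantization step is likewise an assumption, not something the text specifies.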

After the nodule seeds are acquired, the background seeds are acquired automatically. Pixels that are close to the boundary of the initial segmentation result and have a large feature difference from the pixels of the nodule seed region are identified as background seeds. A pixel is added to the background seed set if its similarity satisfies the rule defined in (8). The rule restricts the candidates to a search region, which consists of the pixels outside the pulmonary nodule region whose distance from the center of the nodule region satisfies a predefined threshold. The background seed set is updated iteratively, and the iteration stops when no new pixels of the search region can be added to it. Figure 3(c) shows the background seeds marked with blue spots.

3.2. Undirected Weighted Graph Construction

The construction of a suitable graph is essential. The key step in graph construction is to define a discriminative affinity matrix. An unreasonable affinity weight may capture an erroneous spatial relationship between two adjacent pixels, so that erroneous information accumulates. These errors are then propagated into the subsequent step, the energy function of the random walker.

The input image is represented as an undirected weighted graph $G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges. The image consists of $N$ pixels. A node $v_i \in V$ represents the $i$-th pixel of the input image. An edge $e_{ij} \in E$ connects a pair of neighboring nodes $v_i$ and $v_j$. Edges are weighted by the nonnegative weight function $w$, where $w_{ij}$ on edge $e_{ij}$ reflects the similarity between two neighboring nodes $v_i$ and $v_j$. Note that $w_{ij} = 0$ indicates no relation between a pair of nodes $v_i$ and $v_j$. In addition, the edge weights are symmetric in an undirected graph; that is, $w_{ij} = w_{ji}$. The nodule seed set $S_n$ and the background seed set $S_b$ together constitute the overall seed set $S = S_n \cup S_b$. The remaining unlabeled pixels are denoted as $U$, such that $S \cup U = V$ and $S \cap U = \emptyset$.

3.3. Definition of an Affinity Matrix

The success of the random walker depends on how accurately the relationships between neighboring nodes of the weighted graph represent the pulmonary nodule and the background. In other words, the nodes belonging to pulmonary nodules should have high affinity among them, and the nodes belonging to the background should also have high affinity among them. The key step of the graph construction is to define an affinity matrix, which measures the similarity between two neighboring nodes in some predefined feature space. Intensity alone is insufficient for segmentation of GGO pulmonary nodules. In general, different features, such as texture, Haar, and histogram of gradient (HOG) features, describe an object from different views and provide complementary information to each other. We exploit the texture feature and spatial distance to define an affinity matrix. A texture descriptor can characterize properties of the object surface, such as contrast, regularity, coarseness, and structural arrangement. Recently, the effectiveness of the texture feature has been demonstrated [43, 44]. Distinct from [43, 44], we integrate local binary pattern (LBP) [45–47] and Gabor filter [48] to extract texture features. In addition, the spatial distance is employed to control the spatial influence between two adjacent nodes. In other words, the closer two nodes are in spatial distance, the more likely they are to influence each other.

The affinity entry between a pixel and its neighbors is calculated by incorporating the texture feature and spatial distance. It is composed of two components: an adjacency component and a feature-based component. The adjacency component is defined based on the spatial distance. The closer two nodes are, the stronger the penalty for assigning them different labels; in other words, increasing the spatial distance decreases the affinity entry. The feature-based component is defined based on intensity and texture features. We adopt the Gaussian kernel function as the affinity measure for simplicity. Euclidean distances of features between two neighboring nodes are associated with the edges of the weighted graph. The two components are combined by a point-wise multiplication operation. Hence, the affinity entry from a node to its neighboring node is the product of a Gaussian kernel on the distance between the spatial coordinates of the two nodes and a Gaussian kernel on the distance between their features, where a norm measures the distance between a pair of adjacent nodes in each feature space. A higher affinity entry indicates that two neighboring nodes are more similar; as a result, they tend to be assigned the same label. Conversely, a smaller affinity entry indicates a larger feature difference between two neighboring nodes, so the random walker is less likely to cross these edges. The discriminative affinity matrix makes the segmentation of pulmonary nodules more accurate and efficient. After the affinity matrix is constructed, we discuss how to design a new energy function for accurate GGO pulmonary nodule segmentation, which is described in more detail in Section 3.4.
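As an illustration of how the two components combine, the following sketch computes one affinity entry as the product of a Gaussian kernel on spatial distance and a Gaussian kernel on feature distance. The function name `affinity`, the bandwidth parameters `sigma_s` and `sigma_f`, their default values, and the `2 * sigma ** 2` normalization are all assumptions; the text only states that Gaussian kernels are used and that the components are multiplied.

```python
import math


def affinity(pos_i, pos_j, feat_i, feat_j, sigma_s=1.0, sigma_f=10.0):
    """Affinity between two neighboring nodes (hypothetical form).

    pos_*  : spatial coordinates, e.g. (row, col)
    feat_* : feature vectors, e.g. (intensity, texture)
    """
    spatial_sq = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    feature_sq = sum((a - b) ** 2 for a, b in zip(feat_i, feat_j))
    # Adjacency component: decays with squared spatial distance.
    adjacency = math.exp(-spatial_sq / (2 * sigma_s ** 2))
    # Feature-based component: decays with squared feature distance.
    similarity = math.exp(-feature_sq / (2 * sigma_f ** 2))
    return adjacency * similarity
```

Because both factors depend only on squared differences, the resulting weight is symmetric in the two nodes, which matches the requirement that edge weights of an undirected graph are symmetric.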

3.4. The Energy Function Designation

The random walker segmentation method is formulated as an energy function minimization problem. The fundamental energy function of the random walk imposes the consistency in the labels of neighboring pairs. In other words, the labels will be assigned the same in a neighborhood system, when their corresponding features are similar. How to adequately use the acquired seeds is crucial for accurate segmentation of GGO pulmonary nodules. Note that it is not always possible to accurately initialize seeds from the segmented pulmonary nodule regions. Once the initial segmentation results are not precise, the label assignment inevitably shows the potential errors. The accumulation of errors can degrade the quality of segmentation. To address this problem, a label constraint term is added to the fundamental energy function of the random walk by incorporating the prior knowledge of seeds. Then, a weight function of the label constraint term is based on fuzzy membership value, which is discussed below.

The fuzzy membership is the degree to which a node belongs to the foreground or background. In this paper, we build a foreground Gaussian Mixture Model (GMM) and a background GMM as the global guidance for segmentation of GGO pulmonary nodules, where each GMM is generated from the acquired seeds. The fuzzy membership value is calculated based on the posterior probability of the GMM. The intensity and texture features are incorporated to construct an augmented feature vector for each node. An indexing function is defined that indicates whether a node is assigned to the foreground or to the background. One posterior probability represents the probability that a node belongs to nodules, and the other represents the probability that a node belongs to the background. The higher the foreground posterior is, the higher the probability that the node belongs to the foreground; a similar statement applies to the background posterior. The weight function is assigned a large value when the predefined label and the calculated label are similar during the energy minimization. To obtain the desired probability vector, the energy function is defined as the sum of two terms. The first is the fundamental random walker term, which is defined on the probabilities of neighboring nodes. The second is the label constraint term, which is defined over the seeds and relates the probability on a node to its preassigned label, weighted by the membership function on that node and scaled by a tradeoff parameter. The preassigned label on a node is defined from the acquired seeds, and the membership function is calculated from the GMM posterior probabilities.

The label constraint term enforces the consistency between the calculated probability and the preassigned probability after the energy minimization, which reflects the information of seeds. After the energy function is defined, we discuss the minimization of the energy function in Section 3.5.

3.5. The Energy Function Minimization

The energy function is minimized by expanding (10); its matrix form is given in (13). The matrix form involves a diagonal matrix and an N-dimensional indicator vector. The degree matrix $D$ is a diagonal matrix with the degrees of the nodes on its main diagonal. Every pixel is identified uniquely by a node in our undirected graph, and the degree of each vertex is computed as the sum of the weights of all edges incident on the vertex, $d_i = \sum_j w_{ij}$. The graph Laplacian matrix is $L = D - W$, where $W$ is the affinity matrix. The tradeoff parameter is a positive constant that controls the balance between the two terms.

Setting the partial derivatives of the energy function with respect to the probability vector to zero yields a system of linear equations, which is solved for each seed to obtain the probabilities of the unlabeled nodes.

After the probabilities of the unlabeled nodes are solved, each node is assigned either the foreground label “+1” or the background label “−1” according to its probability.
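The minimization can be illustrated with a small sketch. It assumes, hypothetically, that the energy is the usual quadratic random walker term (the sum over edges of the weight times the squared probability difference) plus a tradeoff-weighted sum over nodes of the membership times the squared difference between probability and preassigned label. Under that assumption, setting the gradient to zero gives the linear system (L + tradeoff · M) x = tradeoff · M y, where M is the diagonal membership matrix and y holds the preassigned labels. The function names, the 0/1 encoding of y, and the 0.5 decision threshold are illustrative choices, not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def constrained_random_walker(weights, membership, labels, tradeoff):
    """Foreground probabilities under the assumed constrained energy.

    weights    : symmetric N x N affinity matrix with zero diagonal
    membership : per-node weight of the label constraint (0 if unseeded)
    labels     : preassigned labels, 1 for nodule seeds and 0 otherwise
    """
    n = len(weights)
    # Start from -W, then put degree + constraint weight on the diagonal,
    # which yields L + tradeoff * M with L = D - W.
    system = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        system[i][i] = sum(weights[i]) + tradeoff * membership[i]
    rhs = [tradeoff * membership[i] * labels[i] for i in range(n)]
    return solve_linear(system, rhs)


def to_labels(probabilities, threshold=0.5):
    """Map probabilities to the labels +1 (foreground) and -1 (background)."""
    return [1 if p >= threshold else -1 for p in probabilities]


# Five-node chain with a weak edge between nodes 2 and 3.
chain = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.01, 0.0],
    [0.0, 0.0, 0.01, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
probs = constrained_random_walker(chain, [1, 0, 0, 0, 1], [1, 0, 0, 0, 0], 1000.0)
```

With a zero tradeoff the constraint term vanishes and the system matrix reduces to the plain Laplacian, which is singular; a dense solver like the one above is only practical for toy graphs, and a sparse solver would be needed for real CT slices.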

4. Experimental Setup and Results

In this section, the experimental results of the improved random walker are shown and the performances are validated on the LIDC dataset. The visual results of extensive experiments and the results of quantitative analysis in terms of overlap score and F-measure are shown. We also validate the sensitivity of the improved random walker by varying the number of background seeds in Section 4.5 and the robustness of running time in Section 4.6, respectively.

Experimental results demonstrate that the improved random walker is capable of segmenting GGO pulmonary nodules without user interaction. Quantitative and qualitative evaluations on the LIDC dataset also show that the improved random walker significantly improves the segmentation performance for GGO pulmonary nodules. All tests are performed on a Windows platform using MATLAB R2013a and under the same computer configuration: Intel (R) CPU E3-1225 v5 @3.30 GHz with 4.0 GB RAM.

4.1. LIDC Dataset

All experiments are conducted on the LIDC dataset. The Lung Image Database Consortium (LIDC) [49] dataset is a web-accessible international pulmonary nodule dataset for the evaluation of pulmonary nodule segmentation methods. It contains 1018 CT scans and associated XML files that record the nodule information from a two-phase reading process performed by four board-certified thoracic radiologists. The lung images were acquired by several CT scanners from different manufacturers. Figure 4 shows an example of ground truth generation from the annotations of a CT slice. As shown in Figure 4(b), the outlines of the pulmonary nodules were drawn manually by four radiologists. For visualization, the aquamarine, yellow, blue, and purple outlines indicate the segmentation results obtained by the four radiologists. Figure 4(c) shows the corresponding ground truth used in this paper. A 50% consensus criterion [14] is used to produce the outline of the ground truth. We randomly selected 100 CT images with GGO nodules from the LIDC database, which provide different shapes, sizes, and texture information. Their diameters range from 3 mm to 30 mm (average 9.80 mm). All slices used in the experiments are intensity-normalized to gray levels from 0 to 255.
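The 50% consensus criterion can be sketched as a per-pixel vote over the radiologists' masks. The function name `consensus_mask` is hypothetical, and treating a pixel as nodule when at least half of the readers marked it (rather than strictly more than half) is an assumption about how the criterion of [14] is applied.

```python
def consensus_mask(masks, fraction=0.5):
    """Combine binary reader masks into one ground-truth mask.

    `masks` is a list of equally sized 2D lists of 0/1 values, one per
    reader. A pixel is kept when at least `fraction` of the readers
    marked it as nodule.
    """
    needed = fraction * len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [int(sum(mask[r][c] for mask in masks) >= needed) for c in range(cols)]
        for r in range(rows)
    ]
```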

4.2. Parameter Setting

In the seed acquisition step, there are four parameters that control the location and quantity of the seeds. The first parameter determines the size of the seed region. In this paper, it is chosen from a wide range, from 0.5 to 3.5, depending on the nodule size. One validation metric for segmentation performance is the overlap score, which measures the overlapping area between the segmentation result and the ground truth. The overlap score is formulated as

$$\mathrm{Overlap}(A, B) = \frac{|A \cap B|}{|A \cup B|},$$

where $A$ and $B$ are the segmentation result and the ground truth, respectively. $|A \cap B|$ represents the number of pixels in both $A$ and $B$, and $|A \cup B|$ is the number of pixels in either $A$ or $B$, or both. The overlap score ranges from 0 to 1 and indicates the degree of accuracy. A high overlap score indicates better segmentation performance. Conversely, the overlap score is low when the segmentation result and the ground truth are inconsistent.
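The overlap score defined above is the ratio of intersection to union of the two pixel sets, and it can be computed with a short sketch such as the following. The function name `overlap_score` and the convention of returning 1.0 when both masks are empty are assumptions added for illustration.

```python
def overlap_score(segmentation, ground_truth):
    """Overlap score: pixels in both masks divided by pixels in either."""
    seg = {(r, c) for r, row in enumerate(segmentation)
           for c, value in enumerate(row) if value}
    truth = {(r, c) for r, row in enumerate(ground_truth)
             for c, value in enumerate(row) if value}
    union = seg | truth
    if not union:
        # Both masks are empty; treated here as perfect agreement.
        return 1.0
    return len(seg & truth) / len(union)
```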

The quantity of background seeds is controlled by a second parameter. It is well known that the random walk method depends on the initial nodule and background seeds. We validate how sensitive the improved random walker is to the number of background seeds. First, we run the proposed method on 10 different cases using different numbers of background seeds. The number of background seeds is in the range of [10, 100], where the step length is 20. Table 1 shows the overlap score results for ten different cases with ten different numbers of background seeds. To better verify the effectiveness of the proposed method, 538 CT images with GGO pulmonary nodules are randomly selected from the LIDC dataset to perform the experiment. The overlap scores increase slightly as the number of background seeds increases and eventually remain stable. The chosen value of this parameter produces good results in the experiments.

In the energy function designation step, the tradeoff parameter in (8) is introduced to control the balance between the two terms. When the value of this parameter is zero, the segmentation reduces to the conventional random walk. The impact of the label constraint term increases as the value of the parameter increases. We vary the value of the parameter, starting from 10, for each CT image to obtain different segmentation results. Figure 5 shows the segmentation results of the improved random walker with varying values of the parameter; from left to right are the results obtained with four different values. Table 2 shows the overlap scores of 10 cases using the four different values. As shown in Figure 5 and Table 2, there are no significant changes when the parameter is a large positive number. In all experiments, the parameter is fixed to a single value. The parameters of the local search strategy are set similarly.

4.3. Experimental Tests

In this section, we present a visual analysis of extensive experiments to validate the improved random walker. The LIDC dataset is used to conduct all experiments. To verify the effectiveness of the acquired nodule seeds, we compare the improved random walker with the acquired nodule seeds against the improved random walker without the acquired nodule seeds. For a fair comparison, the same background seeds are employed in this experiment to reduce the influence of background seeds.

Figure 6 shows the acquired nodule seeds. Figure 6(b) shows the nodule seeds obtained by a user and Figure 6(c) shows the nodule seeds obtained by the geodesic distance and the local search strategy. Red spots specify the nodule seeds. Figure 7 shows the segmentation results of the improved random walker with and without the acquired nodule seeds. Figure 7(a) shows the background seeds obtained by the local search strategy. Blue spots specify the background seeds. Figure 7(b) shows the segmentation results of the improved random walker without the acquired nodule seeds and Figure 7(c) shows those of the improved random walker with the acquired nodule seeds. As shown in Figure 7(b), many pixels belonging to the nodule region are not accurately segmented. From Figure 7(c), we can see that the improved random walker with the acquired nodule seeds improves GGO pulmonary nodule segmentation. Therefore, the experimental result indicates the benefit of the acquired nodule seeds.

To verify the effectiveness of the proposed energy function, we conduct a comparison experiment between the improved random walker and the conventional random walk and discuss the results. To run the conventional random walker, the source code was downloaded from the author’s homepage. The results of the improved random walker are displayed in Figure 8(b). The results of the conventional random walker are displayed in Figure 8(a) for comparison. As shown in Figure 8(a), the conventional random walk yields serious oversegmentation. This is because the conventional random walk is highly sensitive to its seeds, and some nonnodule pixels are inevitably labeled as nodule seeds in the initial nodule seed acquisition step. In contrast, the improved random walker removes the oversegmented part. As observed from Figure 8(b), the improved random walker achieves a better segmentation result than the conventional random walk. The outlines of the pulmonary nodule segmentation by the improved random walker are closer to the ground truth than those of the conventional random walk, as shown in Figure 8(d).

Although the conventional random walker can segment most of the pulmonary nodule from the surrounding pulmonary parenchyma, the segmented nodule region leaks, to a greater or lesser extent, into the pulmonary parenchyma. The improved random walker considers the consistency between the redefined labels and the calculated labels in the new energy function optimization to alleviate the disturbance of seeds to some extent. Therefore, the segmentation performance is further improved by encouraging label consistency in the energy optimization. After adding the label constraint term to the energy function, our method performs well for segmentation of GGO pulmonary nodules, as shown in Figure 8(d). This improvement may also be because the local search strategy of seed acquisition yields reliable seeds to guide the segmentation.

4.4. Quantitative Results
4.4.1. Quantitative Results Using Overlap Score

To verify the effectiveness of the improved random walker, we perform a quantitative comparison between the improved random walker, the conventional random walker with the acquired seeds, and the conventional random walker without the acquired seeds by using the overlap score. Twenty-three CT images with GGO pulmonary nodules are randomly selected from the LIDC dataset, covering different shapes, sizes, and texture information. In the experiment, the parameter is set to 2, is set to 10, and is set to 5 in the local search strategy process. The parameter is set to 100 in background seed acquisition. The parameter is set to 100 in the energy function design step.

The comparison results of overlap scores are shown in Table 2. The mean value and variance of overlap scores are then calculated in Table 3. The first column shows the case ID numbers. The third column shows the overlap scores calculated by the improved random walker. The fourth and fifth columns show the overlap scores calculated by the conventional random walk with our seeds and the conventional random walk without our seeds, respectively. As shown in Table 3, the improved random walker generated a higher average overlap score than the conventional random walk with our seeds, which suggests that the label constraint term of the energy function provides better prior information and further improves the segmentation result. Besides the advantage of the improved energy function, the comparison between the conventional random walker with our seeds and the conventional random walker without our seeds also shows the necessity of the acquired seeds. As shown in the fourth and fifth columns of Table 3, the conventional random walk with the acquired seeds obtains an average of 0.8011, and the conventional random walk without the acquired seeds obtains an average of 0.7604. The conventional random walk with the acquired seeds thus outperforms the conventional random walk without the acquired seeds by about 0.04 on average (0.8011 - 0.7604 = 0.0407). To better verify the effectiveness of the acquired seeds, 538 CT images with GGO pulmonary nodules are randomly selected from the LIDC dataset; the conventional random walk with the acquired seeds obtains an average of 0.8354, and the conventional random walk without the acquired seeds obtains an average of 0.7849. The experimental results show that the acquired seeds are effective.

Hence, the acquired seeds can improve segmentation performance. The improved random walker also appears to be more stable, in that it has a smaller standard deviation of 0.0529. The results demonstrate that the improved random walker has a higher degree of accuracy, with the highest average overlap score, and a higher degree of robustness, with the lowest standard deviation of overlap scores. To better verify the effectiveness of the proposed method, 849 CT images with GGO pulmonary nodules are randomly selected from the LIDC dataset; the proposed method obtains an average of 0.8649, and the conventional random walk obtains an average of 0.7937. The experimental results show that the proposed method outperforms the conventional random walk.

Overall, the improved random walker outperforms the conventional random walk. To assess the efficiency of the proposed method, we compare its execution time with that of the two other methods. The conventional random walk without the acquired seeds has an average execution time of 3.8497 seconds with a standard deviation of 1.2478. The conventional random walk with the acquired seeds has an average of 2.6462 seconds with a standard deviation of 0.8394. The improved random walker has the smallest average of 1.3536 seconds with a standard deviation of 0.1165. The results demonstrate that the improved random walker has the shortest execution time.

4.4.2. Quantitative Results Using F-Measure

To further verify the performance of the improved random walker, we adopt a second metric, the F-measure. Precision is defined as the ratio of the sum of intensities inside the nodule region to the total intensities calculated in the CT image. Recall is defined as the ratio of the total pixels captured inside the nodule region to the area of the user-annotated window. The F-measure [51] is defined as the weighted harmonic mean of the Precision and Recall values, which is formulated as follows:
\[ F_{\beta} = \frac{(1 + \beta^{2})\,\text{Precision} \times \text{Recall}}{\beta^{2}\,\text{Precision} + \text{Recall}}, \]
where $\beta^{2}$ is a tradeoff factor controlling the importance of Precision and Recall. In our experiments, it is fixed to 0.3 empirically to weight Precision more than Recall.
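As a small worked example of the weighted harmonic mean described above, the sketch below assumes the common form F = (1 + b) * P * R / (b * P + R), with the tradeoff factor b playing the role of the value 0.3 mentioned in the text. The function name `f_measure` and the argument name `beta_sq` are illustrative, and the Precision and Recall values used here are made-up inputs, not results from the paper.

```python
def f_measure(precision, recall, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall.

    beta_sq < 1 weights precision more than recall (assumed form, see lead-in).
    """
    denom = beta_sq * precision + recall
    if denom == 0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denom

# Illustrative inputs only.
f_high_precision = f_measure(0.9, 0.6)
f_high_recall = f_measure(0.6, 0.9)
print(f_high_precision, f_high_recall)
```

With the factor below 1, swapping a high Precision for a high Recall lowers the score, which is the sense in which Precision is weighted more.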

In this experiment, 10 cases with GGO pulmonary nodules are randomly selected from the LIDC dataset, covering different shapes, sizes, and texture information. For each binary map of pulmonary nodules, the Precision, Recall, and F-measure are calculated on the ten images. The results are shown in Table 4. As shown in Table 4, the improved random walker obtains a high average F-measure of 0.8951. To further evaluate the proposed method, the improved random walker is compared with the other two methods in terms of F-measure. The results are shown in Table 5, where the improved random walker again obtains a high average F-measure of 0.8951.

Further, to evaluate the computational efficiency, the comparison between the improved random walker and Kubota’s method [14] in terms of execution time, measured in seconds on the LIDC dataset, is shown in Table 6. The improved random walker required 1.35 seconds on average to segment each CT image, which was much faster than Kubota’s method [14]. We expect that the execution time can be reduced further.

4.5. Comparisons with Other State-of-the-Art Methods

We compare the improved random walker with several state-of-the-art methods, including Kostis’s method [3], Okada’s method [23], Kuhnigk’s method [20], Kubota’s method [14], Messay’s method [52], Ye’s method [33], and Wang’s method [34]. These methods employed the LIDC database to evaluate the performance of pulmonary nodule segmentation, and the overlap scores were calculated by their authors. Deep learning networks can obtain satisfactory segmentation results on the LIDC dataset, but they rely more heavily on data to train the network. Moreover, the methods cannot be guaranteed to have used the same cases. Therefore, the evaluation results may vary to a certain degree for different nodules. Despite these differences, the performance comparison between the improved random walker and the state-of-the-art methods is valuable. Table 7 summarizes the average and standard deviation of overlap score values of five segmenting methods on the LIDC dataset. In total, we obtained an average overlap score of 0.86. As we can see, Okada’s method [23] presented an overlap score of , which is relatively low compared to other methods due to the discrepancy between the ellipsoid model and nonellipsoidal nodules; it did not handle nonellipsoidal GGO nodules well. Kostis’s method [3] and Kuhnigk’s method [20] presented the overlap scores of and , respectively. Kostis’s method implicitly assumed that a pulmonary nodule is (usually) roughly spherical or ellipsoidal in shape, so it also did not handle GGO nodules well. Kuhnigk’s method used morphological opening for pulmonary nodule segmentation, which is suitable for small and regular nodules. However, for GGO nodules with fuzzy and irregular boundaries, the segmentation results were unsatisfactory, since the erosion operation may remove a portion of the nodule. Therefore, this method presented a relatively low overlap score.
Kubota’s method [14] reported the overlap score of , which is relatively high compared to the above three methods. Kubota et al. employed a competition-diffusion (CD) method to obtain the foreground object and region growing to obtain the final segmentation results. This approach was less robust and accurate than our proposed method, resulting in undersegmentation for GGO nodules, especially for GGO nodules with spiculations. Messay et al. [52] reported the overlap score of for the hybrid method, which is higher than those of the above-mentioned methods. The segmentation performance is considerably boosted by a regression neural network approach; however, this method required carefully supplied manual control points to improve the segmentation results. Ye’s method [33] used AlexNet and GoogLeNet to detect GGO pulmonary nodules and created input images of three-dimensional features to train the deep network, which obtained the overlap score of . Ye’s method has a relatively higher overlap score than the above-mentioned methods. Wang’s method [34] built a cascade architecture with both segmentation and classification networks for automatic GGO nodule segmentation and obtained the overlap score of . The cascade model at the data level performs better and is more stable than Ye’s method. Herein, we achieved a relatively higher overlap score compared to the other methods, . The good performance of the proposed method can be attributed to the discriminative affinity matrix and the label constraint term of the energy function for handling fuzzy and irregular GGO nodules. In addition, the rapid training on large amounts of data and the determination of hyperparameters remain open problems for deep learning. Combining existing network models with traditional methods is likely to become a popular trend.

5. Discussion

The irregular shapes, fuzzy boundaries, and low contrast between the pulmonary nodule and the surrounding background prevent accurate GGO pulmonary nodule segmentation using simple methods based on thresholding, region growing, and morphological operations. The segmentation of GGO pulmonary nodules therefore requires a specialized method. The random walker has received increasing attention for interactive image segmentation and produces good segmentation results. In the proposed method, intensity, texture, and spatial features are incorporated to construct a new affinity matrix, which strengthens the discriminative power between two adjacent nodes on the graph. To acquire seeds automatically, the geodesic distance is introduced and a novel local search strategy is presented to select reliable seeds. For segmentation, a label constraint term is introduced to the energy function of the original random walker, which alleviates the accumulation of errors caused by the initial seed acquisition. The improved random walker requires no user interaction.

The differences in performance between the improved random walker and the conventional random walker were found to be statistically significant in terms of the overlap score. Based on 23 cases covering different sizes, shapes, and locations of pulmonary nodules, the improved random walker obtains a higher average overlap score than the conventional random walker, as shown in Table 3. This good segmentation performance can probably be ascribed to the acquired seeds and the design of the energy function. To further verify the performance of the improved random walker, we adopted a second metric, the F-measure. The improved random walker has a high average and a low standard deviation of F-measure values. Dakua and Sahambi [53] proposed a method of automatic seed selection using the cantilever beam equation combined with an adaptive threshold technique and located the seeds on demand at different locations around the LV boundary. The proposed method adopted the adaptive threshold method to roughly separate the nodule from the background. Pixels whose intensities are greater than the global threshold are roughly identified as part of the pulmonary nodule. The remaining pixels are considered background pixels. The geodesic distance was adopted to create a small region from the initial segmentation result. To acquire nodule seeds as uniformly as possible, a local search strategy was introduced to acquire the other nodule seeds.
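The rough thresholding step described above can be sketched as follows. The exact threshold rule is not given in this section, so the sketch substitutes a standard iterative intermeans threshold as a stand-in; the function name `iterative_threshold`, the tolerance, and the toy image are all assumptions made for illustration.

```python
import numpy as np

def iterative_threshold(image, tol=0.5, max_iter=100):
    """Iterative intermeans threshold (a stand-in for the paper's rule).

    Repeatedly sets the threshold to the midpoint of the mean intensities
    below and above the current threshold until it changes by less than tol.
    """
    img = np.asarray(image, dtype=float)
    t = img.mean()
    for _ in range(max_iter):
        low = img[img <= t]
        high = img[img > t]
        if low.size == 0 or high.size == 0:
            break  # all pixels fall on one side; keep the current threshold
        new_t = 0.5 * (low.mean() + high.mean())
        if abs(new_t - t) < tol:
            t = new_t
            break
        t = new_t
    return t

# Toy image: dark background (10) with a bright 2x2 "nodule" (100).
image = np.full((4, 4), 10.0)
image[1:3, 1:3] = 100.0
threshold = iterative_threshold(image)
rough_nodule_mask = image > threshold   # pixels above the threshold
print(threshold, int(rough_nodule_mask.sum()))
```

The resulting mask is only a rough initial separation; in the method described above it is refined by the geodesic distance and the local search strategy before segmentation.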

6. Conclusions and Future Works

In this paper, we propose an improved random walker method for segmentation of GGO pulmonary nodules. This algorithm is an extension of the previously proposed algorithm [54]. Automatic seed acquisition is important when a massive CT dataset needs to be examined. The geodesic distance and a local search strategy are introduced to automatically acquire GGO nodule and background seeds. The main advantage of the improved random walker is that it segments GGO pulmonary nodules automatically and accurately without any user interaction or shape assumption. The proposed local search strategy incorporates intensity and texture features to define a similarity rule, which assigns pixels to a nodule seed set or a background seed set. The proposed affinity matrix consists of an adjacency component and a feature-based component. The adjacency component is defined based on the spatial distance. The feature-based component is defined based on intensity and texture features. The proposed energy function adds a label constraint term, which alleviates the accumulation of errors caused by the initial seed acquisition. The weight of the label constraint term is based on a fuzzy membership value, which is calculated by building two Gaussian mixture models (GMMs). The proposed method is simple and efficient to implement.

The results have demonstrated the robustness and efficiency of the proposed method for the segmentation of GGO pulmonary nodules. The proposed energy function is minimized by solving a linear system. The experimental results have shown that the improved random walker achieves satisfactory segmentation results in both quantitative and qualitative performance assessments, especially for complex GGO pulmonary nodules.

In future work, we will improve the accuracy and efficiency of the improved random walker for some complex GGO nodules. We will also extend this method to segment various pulmonary nodules. When the inverse matrix in (14) is calculated directly, the computational cost is high, especially when the number of image pixels is very large. Therefore, we will investigate how to speed up the computation of the improved random walker.
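One common way to avoid forming an explicit inverse is to solve the linear system iteratively. The sketch below is not the paper's equation (14), which is not reproduced here; it assumes only that the system matrix is symmetric positive definite, as is typical for random-walker-type systems, and compares a plain conjugate gradient iteration with the explicit inverse on a small stand-in matrix. The names `conjugate_gradient`, `A`, and `b` are illustrative.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A without inverting A."""
    x = np.zeros(len(b), dtype=float)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Stand-in system: a shifted 1D Laplacian, which is symmetric positive definite.
n = 50
A = 2.1 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

x_cg = conjugate_gradient(A, b)
x_inv = np.linalg.inv(A) @ b   # explicit inverse, shown only for comparison
print(np.max(np.abs(x_cg - x_inv)))
```

Each iteration needs only matrix-vector products, so with a sparse matrix representation the cost per iteration grows with the number of graph edges rather than with the square of the number of pixels.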

Data Availability

Supporting data are available from the LIDC database: https://download.csdn.net/download/aristocles118/12390876.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank Dr. P. Chen, Dr. R. Bai, Dr. L. Zhang, Dr. F. Long, Radiologist G. Q. Qiao, and Engineer L. Tang for their helpful comments and advice which contributed much to this paper. This work was supported by the National Natural Science Foundation of China (61305038 and 11901113), the Guangdong Youth Innovation Talent Project, China (natural sciences, 2018KQNCX086), the Guangzhou Science and Technology Plan Project, China (202002030231), Guangdong Basic and Applied Basic Research Foundation (2019A1515011148), and the Fundamental Research Funds for the Central Universities, SCUT (2019MS139).