Abstract

The accurate segmentation of cervical cell images is one of the key steps in a cervical cancer computer-aided diagnosis system. To address the problems of cell overlap and boundary blurring in cervical cell clusters, this paper proposes a segmentation algorithm for overlapping cells in cervical cytology images based on nuclear radial boundary enhancement. The method not only suppresses noise in cervical cytology images but also preserves the contrast of overlapping cell boundaries. A weight graph is generated from candidate contour points and contour line segment attributes, and a dynamic programming algorithm is used to find the shortest path in the weight graph; this shortest path corresponds to the coarse segmentation contour in the cell image. A level set model is then used to finely segment the coarse cell boundary, yielding the final cervical cell boundary. Quantitative and qualitative evaluation results, including the Dice similarity coefficient, true positive rate, and false positive rate, show that the proposed overlapping cell segmentation algorithm achieves better segmentation results and compares favourably with other current overlapping cell segmentation algorithms.

1. Introduction

The accurate segmentation of cervical cell images is one of the key steps in a cervical cancer computer-aided diagnosis system. After years of development, segmentation technology for cervical cytology images has improved considerably, and the performance of cervical cancer computer-aided diagnosis systems has been greatly enhanced. However, due to the characteristics of real cervical cytology images and the complexity of cell morphology, current segmentation technology still needs improvement.

Cytoplasmic characteristics have been shown to be critical for the identification of abnormal cells [1]. The accurate segmentation of cytoplasm in cervical cytology images is the core part of cervical cell segmentation. Once the cytoplasmic boundary is located, quantitative evaluation indices such as cell diameter and nuclear-to-cytoplasmic ratio can be calculated. Some early methods addressed the segmentation of free cells, which do not overlap with neighbouring cells or overlap only slightly, and of partially overlapping cells [27].

When the intensity of a cervical cytology image follows a bimodal distribution, automatic threshold segmentation methods [3, 5] can obtain good segmentation results. Wu et al. [5] used thresholding to identify the cell in single-cell images. However, in cervical cytology images with poor contrast and uneven staining, the segmentation results obtained by the threshold method were not ideal. Harandi et al. [3] proposed a multi-resolution threshold method for cell clump and cytoplasm detection. This method locates cell clumps in low-resolution images and detects cytoplasm and cell nucleus boundaries in high-resolution images. A geometric active contour model is then used to segment the cytoplasm of single-cell cervical images and cell images with a low overlap rate.

The active contour model based on gradient vector flow (GVF-ACM) [8] is not only robust to the initial contour but also has the advantage of converging to concave cell boundaries, so it is widely used for cytoplasmic segmentation of single-cell images. However, the GVF-ACM method cannot obtain good segmentation results when contour contrast is poor. To solve this problem, Changkong et al. [6] proposed a Cytoplast and Nucleus Contour (CNC) detector for extracting the nuclear and cytoplasmic regions in cervical cytology images. This method also utilizes the K-means method to segment the cytoplasm from the background area of the cervical cell image. Song et al. [9] proposed that, when malignant cells are exposed to cytotoxic effects, cysteamine accumulates in diseased cells and is converted to DMS on a massive scale. This method is not only reliable for identifying malignant cell lines, but may also be used to investigate how drug-related cytotoxic effects affect cancer cell metabolism.

Yang-Mao et al. [2] proposed an Edge Enhancement Nucleus and Cytoplast Contour (EENCC) detector based on the research results of Changkong [6]. EENCC effectively removes Gaussian noise and impulse noise from the cervical cell image with a truncated median filter while retaining sharp boundaries. The main contribution of this detector is the Mean Vector Difference (MVD) algorithm, which assigns a directional gradient vector flow to each pixel in the image and finally detects the nucleus and cytoplasmic boundaries by the Otsu threshold method. However, this method cannot solve the problems of boundary closure and continuity, and GVF-based methods are too sensitive to noise. By adjusting the level of speckle noise, Bhardwaj et al. [10] showed that the mean filter offers better results than the median filter on sampled ultrasonic images. That experiment used extremely simple filtering techniques that could be improved with a variety of other filters, such as bilateral, trilateral, wavelet-based, and entropy-based filters.

Tsai et al. [7] combined an unsupervised classification algorithm and the ACM model for cytoplasmic segmentation of single-cell images. The ACM model based on boundaries and regions can accurately detect pixel-level closed target boundaries; therefore, this method solves the problems of boundary closure and continuity in the literature [2]. To address the noise sensitivity of the GVF algorithm, Li et al. [4] proposed the radial GVF snake algorithm on the basis of the multi-directional GVF snake algorithm [11]. The algorithm calculates the radial gradient of each pixel along the radial line segments of a specified nucleus and uses this radial gradient in place of the gradient in the GVF algorithm. The ability of the radial-gradient-based GVF method to locate fuzzy boundaries is significantly improved, and the gradient distribution along the radial direction reduces false gradients caused by staining and uneven illumination. This method can accurately segment the cytoplasm and nucleus in an image containing only one cervical cell.

Generally speaking, overlapping cell segmentation is the most challenging task in cervical cytology image segmentation. During cell sample preparation, uneven illumination and differences in dye concentration result in low contrast between different components in the cell image. Other factors, such as dried cells, red blood cells, mucus, bacteria, and white blood cells, further increase the difficulty of overlapping cell segmentation [12, 13]. In addition, complex cell shapes and a high overlap rate between cells also decrease the contrast of cell boundaries. To promote the development of overlapping cervical cell image segmentation technology [14, 15], the IEEE International Symposium on Biomedical Imaging (ISBI) held the first and second cervical cell image segmentation challenges in 2014 and 2015, respectively. The challenge dataset is publicly available and contains a training set and a test set, with manual annotations provided for the training set. This makes it possible to evaluate and compare different overlapping cell segmentation methods, and most overlapping cell segmentation methods in recent years have been proposed to solve the segmentation task on this dataset.

Ushizima's team [16] won the first challenge. Ushizima et al. used a graph-based linear-time algorithm [17] and a global search truncation algorithm [18] to obtain cell clumps and nuclear regions based on the intensity similarity of adjacent pixels. The overlapping cells in a cell clump are divided into convex polygons through the narrow-band seed region of the nucleus, a graph-based region-growing algorithm [19], and the Voronoi diagram method [20], as shown in Figure 1. It can be seen from the figure that this method can only divide the overlapping area of cells with straight lines and cannot recover the region shared by overlapping cells. Therefore, the accuracy of cell segmentation by this method still has considerable room for improvement [21].

The champion of the second challenge was Phoulady’s team [22], which adopted an iterative threshold method and a regularized level set evolution algorithm based on an ellipse shape assumption. The iterative threshold is used to segment the cell clump and cell nucleus areas, and the cell clump area is divided into windows of a specified size. According to attribute values such as the intensity mean and variance of each window, the area of the window closest to the nucleus is detected to complete the segmentation of overlapping cells. Finally, a smooth cytoplasmic boundary is obtained by the regularized level set evolution algorithm with the ellipse shape assumption. On this basis, the literature [23] defines a focus metric for each subimage in the image and assigns a nucleus to each subimage according to position information and focus similarity. A two-step method comprising subimage-level coarse boundary segmentation and pixel-level boundary refinement is used to complete overlapping cell segmentation.

In order to obtain a more accurate cytoplasmic segmentation boundary, Phoulady et al. [24] improved the two-step boundary segmentation algorithm. This method finds new boundary candidate points by defining a weight vector and applies a smoothing filter to the candidate boundary points. Outliers among the candidate boundary points are identified by the distance between each candidate point and its new position. The smoothing filter is applied again to the candidate boundary points after the outliers are deleted, yielding the final cytoplasmic boundary. The greatest contribution of the literature [24] is the use of the depth information of the stacked images to obtain a more accurate coarse cytoplasmic segmentation boundary.

On the basis of the ellipse shape prior model in the literature [22], Nosrati and Hamarneh [25] optimized the algorithm in the literature [17] by replacing the original elliptical shape prior with a star shape prior and obtained more accurate cytoplasm segmentation results. These methods prove that incorporating shape priors into the parametric segmentation process can significantly enhance the segmentation of overlapping cells. However, the ellipse and star shape priors are too simple to represent the true shape of cervical cells, and an overlapping cell segmentation algorithm using a fixed shape prior cannot provide sufficient shape information for the overlapping part of a cell. To solve this problem, the authors of the literature [26] proposed an automatic shape prior algorithm based on sparse approximation to segment overlapping cervical cells. In the literature [27], Tareef et al. improved this method by using deep learning and dynamic shape models to separate individual nuclei and cytoplasm. This method improves the performance of cell nucleus detection, solves the problem of low sensitivity of cervical cell segmentation, and improves the accuracy of overlapping cervical cell segmentation.

In view of the computational efficiency of the super-pixel method, some scholars utilize improved super-pixel methods to complete overlapping cervical cell segmentation. In the literature [26], the authors used a fast migration algorithm to generate super-pixel images and used unsupervised binary classifiers and the MSER algorithm to complete cell clump and nucleus segmentation. Finally, the overlapping cytoplasm is segmented by a combination of multiple level set functions. Huang et al. [28] used the SLIC method, an adaptive threshold method, and a local threshold method to obtain super-pixel maps, cell clusters, and cell nucleus regions, respectively. Coarse segmentation of the overlapping cervical cell image is completed by assigning each super-pixel to the nearest cell nucleus, and fine segmentation of the coarse boundary is then completed by a graph cut algorithm. Since this method uses the “closest distance” criterion to segment overlapping cells, it achieves good segmentation results only when the nucleus lies roughly at the centre of the cell.

With the development of artificial neural network technology, some neural network methods have also been applied to overlapping cervical cell segmentation [29]. Song et al. [30] used a segmentation framework based on deep learning and deformation models to segment overlapping cells. This framework not only explores the shape relevance of overlapping cells but also uses cell structure, background information, and a multi-cell labelling model to complete overlapping cell segmentation. The multi-label model transforms the segmentation problem into a discrete point-labelling problem and marks cytoplasmic pixels with the corresponding nuclear labels obtained by the CNN method. Gaussian kernel fitting is then used to determine the coarse segmentation boundary of the cell nucleus, and finally a dynamic multi-template deformation model refines the segmentation of the cell boundary. However, CNN-based methods require a large number of samples and have high computational complexity.

In cervical cell images, the segmentation of overlapping cells depends largely on image quality. As is well known, when an image is corrupted by noise, completing the denoising operation while retaining boundary features is a very difficult problem. Typically, cervical cell images are affected by Gaussian noise and impulse noise. Many scholars have carried out extensive research on how to remove noise from images, using techniques such as median filtering, Gaussian filtering, and Type-B filtering. Early filtering methods can remove Gaussian noise and impulse noise from the image very well, but they also reduce the resolution of cell boundaries. To address this problem, Li et al. [31] proposed an improved median filter (Trim-meaning), which can effectively remove impulse noise and Gaussian noise while retaining sharp parts of the boundary. The method extracts the pixel values in the neighbourhood window of each pixel, arranges them in ascending order, deletes a fixed number of values from the head and tail of the sorted sequence, and then replaces the pixel with the mean of the remaining values. Although this method can retain sharp boundaries, the boundaries of overlapping cells in cervical cell images remain indistinguishable [32].
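The trimmed-mean idea described above can be sketched as follows; the window size and the number of trimmed values are illustrative assumptions, not the parameters used in [31]:

```python
import numpy as np

def trimmed_mean_filter(img, win=3, trim=2):
    """Trimmed-mean filtering: sort the values in each pixel's window,
    drop `trim` values from each end of the sorted sequence, and replace
    the pixel with the mean of the remaining values."""
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = np.sort(padded[y:y + win, x:x + win].ravel())
            kept = window[trim:len(window) - trim]  # discard head and tail
            out[y, x] = kept.mean()
    return out
```

Because an isolated impulse value falls into the trimmed tail of the sorted window, it is removed entirely rather than merely attenuated, which is the property the text attributes to this filter.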

Guan et al. [33] proposed an edge enhancement method based on gradient decomposition. The method first uses morphological filtering to denoise the cell image, and then uses the direction of the gradient at each pixel and the angle between the pixel and the centre of the nucleus to enhance the cytoplasmic boundary belonging to that nucleus. This method needs to calculate the gradient and radial direction angle of every pixel and depends heavily on the cells having convex boundaries. Many boundaries of real overlapping cervical cells do not satisfy this convexity condition, so the method cannot achieve the ideal boundary enhancement effect.

2. Methods

2.1. Preprocessing

In the preprocessing step, this paper uses the filtering and gradient calculation algorithm based on the radial region of the nucleus to complete the image denoising and gradient calculation tasks. The flowchart of the cervical cell segmentation method in this paper is shown in Figure 2.

2.1.1. Filtering Based on the Radial Region of the Nucleus

In the overlapping clumps of cervical cell images, different cells overlap each other, making their boundaries blurred. Structural analysis of cervical cells shows that most nuclei are located inside their cells [34]; although nuclear positions vary greatly, the cytoplasmic boundary surrounds the nucleus, and cell boundaries exhibit different orientations in different radial directions of the cell. Based on this observation, a filtering algorithm for the radial region of the nucleus is proposed, which not only effectively removes the noise in this region but also preserves the resolution of the cytoplasmic boundary to the maximum extent.

For a specified cell, a radial line is drawn from the centre of the cell in each direction, and the coordinate of a point on the radial line is given by its Euclidean distance from the centre. According to this definition, with the nucleus as the centre, the cell area is divided into several radial regions according to an angular step size. For different radial regions, different kernel functions are utilized to denoise the region. If the step size is too large, the robustness of the algorithm is poor; if the step size is too small, the corresponding cytoplasmic boundary cannot be preserved. A step size of 22.5° was selected experimentally, dividing the image into 16 radial regions; the resulting division of the radial area is shown in Figure 3.
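The division into radial regions can be illustrated with a small helper that maps a pixel to its region index; the 22.5° step and the resulting 16 regions follow the text, while the coordinate and angle conventions are assumptions:

```python
import math

def radial_region_index(point, centre, step_deg=22.5):
    """Return which of the 360/step_deg radial regions a pixel falls into,
    measured by the angle of the pixel around the nucleus centre.
    Points and centres are (row, col) pairs."""
    dy, dx = point[0] - centre[0], point[1] - centre[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0  # angle in [0, 360)
    return int(angle // step_deg)
```

With the 22.5° step this yields 16 region indices (0 through 15), matching the 16 intervals stated above.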

According to the radial direction of each region, this paper sets four denoising kernel functions; the kernel function settings are shown in Figure 4.

For each pixel point on the cell image, the corresponding kernel function is selected according to its position within the specified cell. The selection of the kernel function is shown as follows:

In the formula, the radial direction of the pixel is combined with the round-up, remainder, and round-down operations to index the kernel. When the corresponding kernel function is found for a point on the image, the window area centred on that point is convolved with the selected kernel function, as shown in

Here, the indicator variable of the contour points in the region and the convolution operation are used. A comparison of the filtering results of different methods is shown in Figure 5.

The method in this paper uses the resulting response to replace the gray value of the point, so as to filter the image while retaining the boundary. The definition is shown in

Here, the boundary judgment threshold is set to 5. If the response is greater than the boundary judgment threshold, the point may lie on a contour and is replaced by the local maximum; if the response is smaller than the threshold, the point is replaced by the mean value of the window area. As shown in Figure 6, the radial-region filtering algorithm filters the cervical cell image more effectively while preserving the cell boundary, and the overlapping cytoplasmic boundary has a higher resolution after filtering.
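The replacement rule just described (local maximum above the threshold, window mean below it) can be sketched as follows; the precomputed kernel-response map and the 3 × 3 window are illustrative assumptions:

```python
import numpy as np

def boundary_preserving_replace(img, response, thresh=5.0, win=3):
    """Replace each pixel by the local maximum of its window where the
    kernel response exceeds the boundary threshold (likely contour),
    and by the window mean elsewhere (smoothing)."""
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + win, x:x + win]
            out[y, x] = window.max() if response[y, x] > thresh else window.mean()
    return out
```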

2.1.2. Gradient Algorithm Based on the Radial of the Nucleus

A contour corresponds to a series of brightness jumps, so the image gradient can be used to extract the target contour. There are many boundary gradient operators, such as the simple gradient operator, the Prewitt operator, and the Sobel operator [30]. Figure 6 shows the boundary gradient images obtained by these three operators. As can be seen from Figure 6, the Sobel operator gives a stronger gradient response than the other two. Therefore, this paper uses the Sobel operator based on the radial region of the nucleus to generate the gradient map of the image.

This paper proposes a gradient map calculation method based on the radial region Sobel algorithm, which is used to calculate the gradient image of the denoised image. The algorithm utilizes Sobel gradient operators in different directions to enhance the cell boundary. Figure 7 shows the 5 × 5 Sobel convolution operators in four different directions.

The radial region gradient of a point is defined in formula (4), where the convolution operation is used. The maximum and minimum gradients of the radial region are then used to define the normalized gradient of each pixel in the region, as shown in formula (5). The gradient image of the specified nucleus is defined in formula (6).
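As an illustration of directional Sobel filtering and per-region normalization, the sketch below uses standard 3 × 3 Sobel kernels in four directions (the paper's operators are 5 × 5) and an assumed min-max reading of the per-region rescaling in formula (5):

```python
import numpy as np

# Standard 3x3 Sobel kernels for four directions (0, 45, 90, 135 degrees).
SOBEL = {
    0:   np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float),
    45:  np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], float),
    90:  np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float),
    135: np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], float),
}

def directional_gradient(img, direction):
    """Convolve the image with the Sobel kernel for one direction
    (interior pixels only) and return the absolute response."""
    k = SOBEL[direction][::-1, ::-1]  # flip for true convolution
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.sum(img[y - 1:y + 2, x - 1:x + 2] * k)
    return np.abs(out)

def normalise_region(grad):
    """Min-max rescaling of a radial region's gradient values between the
    region's maximum and minimum gradient."""
    gmin, gmax = grad.min(), grad.max()
    return (grad - gmin) / (gmax - gmin) if gmax > gmin else np.zeros_like(grad)
```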

The filtering and gradient calculation method based on the radial region proposed in this paper can remove the noise in most images while retaining the specified cell boundary. Compared with the methods in the literature [2, 33], it achieves better results, as shown in Figure 8.

2.2. Overlapping Cell Segmentation of Cervical Cytology Images
2.2.1. Constructing a Weight Graph Based on Candidate Boundary Points

After specifying a cell in the cell clump, a gradient image is obtained by the radial region gradient algorithm, and candidate contour points are then searched for in this gradient image. The weight graph of the cell is constructed using these candidate pixel points. For a given cell, there is one and only one true contour point in any radial direction. In general, the contour points have the largest gradient values in the radial direction. However, due to cell overlap and other factors, the point with the maximum gradient in a radial direction may not be the real contour point. Therefore, it is necessary to find the pixel points with larger gradient values in each radial direction as candidate contour points, and then find the true cell contour among them.

For a cell in the cell clump, consider the pixel points on the radial line drawn from the nucleus centre in a given direction. To obtain the candidate contour points on a radial line, the pixels on the line are first arranged in descending order of gradient value, and the first 1/3 of the points are taken as the primary selection of candidate contour points; the true candidate contour points are contained among these pixels. If a candidate pixel has a large gradient value, the surrounding pixels may also have large gradient values, so the Linkage algorithm [35] is used to cluster the coordinates of the candidate contour points, and the point with the largest gradient value in each cluster is selected as a candidate cell contour position. The candidate contour positions of the gradient image are shown as red marks in Figure 9(a).
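The candidate-point selection on one radial line can be sketched as below; a simple gap-based grouping stands in for the Linkage clustering of [35], and the gap size is an assumption:

```python
import numpy as np

def candidate_contour_points(radial_grad, top_frac=1 / 3, gap=2):
    """Select candidate contour points along one radial line: keep the
    positions whose gradient is in the top third, group positions that
    are within `gap` of each other, and keep the strongest point of
    each group."""
    n = len(radial_grad)
    order = np.argsort(radial_grad)[::-1]               # descending gradient
    keep = np.sort(order[:max(1, int(n * top_frac))])   # top-third positions
    groups, current = [], [keep[0]]
    for idx in keep[1:]:
        if idx - current[-1] <= gap:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    # one candidate per group: the position with the largest gradient
    return [max(g, key=lambda i: radial_grad[i]) for g in groups]
```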

It can be seen from the candidate points in the gradient map that the candidate cell contour points in each radial direction contain the real contour point. Next, these candidate contour points are built into a weight graph, and a path search algorithm on the weight graph completes the rough segmentation of the cell boundary.

To construct the graph of candidate contour points, the starting points are the specified seed contour points. When there is a seed contour point on a radial line segment, the other candidate contour points in that radial direction are deleted. The following three steps determine whether a candidate contour point is a seed contour point:

(1) Detect the candidate contour points located on the boundary of the cell clump.

(2) Connect the point to the centre point of the nucleus; if any point on the connecting line lies in the background, exclude the point.

(3) Calculate the mean distance of all seed contour points from the centre of the nucleus, and retain only the seed contour points whose distance from the centre of the nucleus is less than this mean distance.

The seed contour points obtained by the above three steps are marked in green in Figure 8(b). Next, based on the obtained candidate contour points and seed contour points, a cell coarse-segmentation weight graph is constructed in four main steps:

(1) Construct an empty weight graph in which each column corresponds to a radial direction of the cervical cytology image.

(2) Select any seed contour point in the image as the starting contour point, and add a node to the first column of the graph.

(3) Starting from the starting contour point, select the three candidate contour points closest to it in the next radial direction, and add three nodes to the next column of the graph.

(4) Repeat until all columns of the weight graph have been filled.
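The four construction steps can be sketched as a layered graph; plain Euclidean distances are used here as placeholder edge weights for the combined weights defined in the next subsection:

```python
import math

def build_weight_graph(candidates_per_dir, start, k=3):
    """Build a layered weight graph: one column per radial direction.
    From each node, edges go to the k candidate points closest to it in
    the next direction. Nodes are (layer, point) pairs; edge weights here
    are Euclidean distances (placeholder)."""
    graph = {}  # (layer, point) -> [((next_layer, point), weight), ...]
    frontier = [start]
    for layer, nxt in enumerate(candidates_per_dir):
        new_frontier = []
        for p in frontier:
            nearest = sorted(nxt, key=lambda q: math.dist(p, q))[:k]
            graph[(layer, p)] = [((layer + 1, q), math.dist(p, q)) for q in nearest]
            new_frontier.extend(nearest)
        frontier = list(dict.fromkeys(new_frontier))  # dedupe, keep order
    return graph
```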

2.2.2. Generating Boundary Weights and Shortest Paths

The generation of weights between nodes in the weight graph is the core step in solving the optimal contour problem using graph methods. The following constructs the weights between nodes in the weight graph around the attributes of contour points and the attributes of line segments between contour points.

Although overlapping cells, uneven illumination, and artefacts can degrade image quality, most cell boundaries are still identifiable. Observation of the contour points in the image shows that the intensity and gradient of different candidate contour points located on the boundary have a certain similarity. The radial gradient of the cell contour points and the window mean are therefore used to define the weights between nodes in the weight graph.

Define the pixel value at each position and the centre of the specified nucleus; in each radial direction, the candidate contour point farthest from the centre of the nucleus is taken, and the line segment between the centre of the nucleus and that point is considered. Figure 10 shows an example of two such segments in different directions.

In this paper, the Bresenham algorithm [36] is used to obtain the coordinate points on these line segments, and the radial difference of each coordinate point along the segment is defined as shown as follows:

The radial gradient of the coordinate point is shown as follows:

Figure 11(a) shows the gradient intensity distribution in the radial direction. In order to better measure the intensity similarity between contour points, the gray window mean is used to replace the intensity value of the contour points, and the definition of the window mean is shown as follows:

In the formula, is the neighbourhood window centred at the point, and is the intensity value of the pixel in the window. Figure 11(b) shows the radial window mean intensity distribution.

For two connected nodes in the weight graph, the smaller the difference between their radial gradient values and window means, the more likely they belong to the same cell. The gradient-window mean weight is defined in formula (10), and its intensity distribution is shown in Figure 11(c).

The distance between candidate contour points is a very important condition for measuring the similarity between contour points. In general, the smaller the distance between two contour points in adjacent radial directions, the greater the possibility that the two points belong to the same cell. As shown in Figure 12, for node C, the contour point of C is clearly closer than that of B. In fact, both the “A contour point” and the “C contour point” lie on the contour of the specified cell. There are two main advantages to using distance weights: (1) distance weights reduce the possibility of contours moving away from their corresponding nuclei; (2) distance weights keep the cytoplasmic boundary as smooth as possible. In this paper, the distance attribute between candidate contour points is selected as the distance weight in the weight graph. The distance weight between nodes is defined as

The contour point weight between adjacent nodes in the weight graph can be obtained from the gradient-window mean attribute and distance attribute. The definition of is shown as follows:

Here, the two coefficients are the weight factors of the gradient-mean attribute term and the distance attribute term, respectively. Before the two terms are combined, both are normalized to [0, 1].
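The combination of the two normalized attribute terms can be sketched as follows; the weight factor values are illustrative assumptions:

```python
import numpy as np

def contour_point_weights(grad_mean_diff, dist, alpha=0.5, beta=0.5):
    """Combine the gradient-window-mean term and the distance term into
    the contour-point weight. Both terms are min-max normalised to [0, 1]
    before being combined, as stated in the text; alpha and beta are the
    two weight factors."""
    def norm(v):
        v = np.asarray(v, float)
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)
    return alpha * norm(grad_mean_diff) + beta * norm(dist)
```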

The similarity between contour points has certain limitations, which can only reflect the local features of contour points, and cannot extract sufficient cytoplasmic contour features. Therefore, the attribute of line segment between contour points is further adopted as a supplement to the attribute of contour point.

The contour point line segment refers to the line segment between a candidate contour point in one radial direction and a candidate contour point in the adjacent radial direction. For most cell boundaries, the closer the angle between this segment and the radial direction is to 90 degrees, the more likely the segment is to lie on the true cell outline. Therefore, the angle between the radial direction and the line segment is defined as the direction attribute, as shown as follows:

By observing the cell image, it can be seen that the cell contour has a significant gradient, and in the gray domain of the image, the boundary region has a large variance. Therefore, the gradient and variance are used to define the attribute weights of the contour segment regions. The contour segment area is obtained by performing morphological operations on the contour point line segment, as shown as follows:

Among them, a structural element with a radius of 3 is used, together with a morphological dilation operation. The definition of the contour segment region attribute is shown as follows:

Here, the terms represent the gradient value of each pixel, the number of pixels in the region, and the standard deviation of the region, respectively.

Through the analysis and definition of the above-mentioned contour segment direction and area attributes, this paper defines the contour segment weight as shown as follows:

Finally, the attribute weights of the candidate contour points and the contour segments are combined to obtain the weight between connected nodes in the weight graph, which is defined as follows:

After the weight graph is constructed, the Dijkstra dynamic programming algorithm is used to find the shortest path in the graph. The shortest path in the graph corresponds to the rough segmentation contour in the cell image, as shown in Figure 13(a).
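The shortest-path search over the layered weight graph can be sketched with a standard Dijkstra implementation; the graph is represented as an adjacency map from node to weighted neighbours:

```python
import heapq

def shortest_path(graph, source, targets):
    """Dijkstra search: return the minimum-cost node sequence from the
    seed node `source` to any node in `targets`, together with its cost.
    `graph` maps node -> [(neighbour, weight), ...]."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in targets:                      # reconstruct and return path
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1], d
        if d > dist.get(u, float("inf")):     # stale heap entry
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None, float("inf")
```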

2.2.3. Cell Contour Refinement Segmentation

Since the rough segmentation boundary is only the connection between candidate contour points in adjacent radial directions, it can only roughly describe the boundary of the specified cell. To obtain accurate cytoplasmic contours, this paper uses the distance regularized level set evolution (DRLSE) model [31] to refine the obtained rough segmentation boundaries; the rough cytoplasmic segmentation contour is used as the initial value of the DRLSE level set algorithm.

The energy functional of the DRLSE level set algorithm can be approximately expressed as

The minimum value of the energy function is obtained by solving the gradient flow, as shown as follows:

When the energy functional reaches a minimum, the level set iteration stops. Figure 13(b) shows the refined cytoplasmic segmentation result of the DRLSE algorithm. It can be seen that the coarsely segmented cell boundary has evolved to the vicinity of the real cytoplasmic boundary, and the red boundary is the refined cytoplasmic segmentation boundary obtained by the DRLSE algorithm.

3. Results

3.1. Data

In order to evaluate the performance of the proposed overlapping cervical cytology segmentation algorithm, the proposed algorithm was evaluated using the datasets published in the first and second “ISBI Overlapping Cervical Cell Segmentation Challenge.” The samples in the dataset contained different numbers of cervical cells, and the cells varied in contrast, texture, and overlap between adjacent cells. Table 1 summarizes the datasets published by the two challenges.

The first ISBI Overlapping Cervical Cell Challenge released two datasets containing a total of 945 synthetic images with a resolution of 512 × 512. The organizing committee first released 45 synthetic training images and 90 test images: the 45 synthetic images are used to tune and train the parameters of the segmentation algorithm, and the 90 test images are used to evaluate algorithm performance. The second released dataset contains 810 synthetic cervical overlapping-cell images, which were used to evaluate the performance of the algorithms submitted by the contestants.

The second ISBI Overlapping Cervical Cell Challenge dataset contains 17 sets of cervical cell images from different cervical cell samples, of which 8 sets form the training set and the remaining 9 form the test set. Each set contains 20 stacked images and 1 EDF (Extended Depth of Field) image [37], each at a resolution of 1024 × 1024. Each set contains 20-60 cells distributed across different cell clumps, with varying numbers of cells per clump and varying overlap ratios between adjacent cells. Within each stack, each image corresponds to a different focal plane of the same sample, and the focal planes are spaced approximately 1 micron apart under the microscope. The cytoplasmic boundaries of the test images were manually annotated. The overlapping cell segmentation algorithm proposed in this paper is evaluated on the test images of both datasets.

3.2. Cytoplasm Segmentation Evaluation Method

The cervical cytology segmentation evaluation methods specified by the ISBI challenge include the pixel-based Dice similarity coefficient (DSC), the pixel-based true positive rate (TP_p), the pixel-based false positive rate (FP_p), and the object-based false negative rate (FN_o). If the DSC of the overlap between a segmented cell region and the corresponding manually labelled (gold standard) cell region exceeds the specified threshold, the segmentation algorithm is considered to have successfully identified the cell. The DSC is defined as

DSC = \frac{2\,|S \cap G|}{|S| + |G|},

where S is the segmented cell region and G is the gold-standard region.

Here, |·| denotes the number of pixels in a region. The ISBI challenge specifies a threshold of DSC = 0.7 for correct cytoplasmic identification: for a cell in the manually labelled (gold standard) dataset, if no segmented cell in the result has a DSC with that cell greater than 0.7, the segmentation algorithm is said to have failed to segment that cell correctly.
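The Dice coefficient and the 0.7 identification threshold can be computed for a pair of binary masks as follows (a minimal NumPy sketch; the mask names are illustrative):

```python
import numpy as np

def dice(seg: np.ndarray, gold: np.ndarray) -> float:
    """Pixel-based Dice similarity coefficient between two binary masks."""
    seg = seg.astype(bool)
    gold = gold.astype(bool)
    inter = np.logical_and(seg, gold).sum()
    denom = seg.sum() + gold.sum()
    return 2.0 * inter / denom if denom else 0.0

def is_correctly_identified(seg, gold, threshold=0.7):
    """A cell counts as correctly identified if its DSC exceeds the threshold."""
    return dice(seg, gold) > threshold

# Toy example: two overlapping 10x10 squares on a 20x20 grid.
a = np.zeros((20, 20)); a[2:12, 2:12] = 1
b = np.zeros((20, 20)); b[4:14, 4:14] = 1
# intersection = 8*8 = 64 pixels, |a| = |b| = 100 -> DSC = 128/200 = 0.64
```

Here the two squares overlap in an 8 × 8 region, giving a DSC of 0.64, below the 0.7 identification threshold.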

The cytoplasmic evaluation standard provided by the ISBI challenge is the pixel-based false positive rate FP_p. Because the cells are small relative to the image size, previous studies [4, 15, 23, 28, 37] have achieved FP_p values below 0.01 on different datasets, so this metric cannot effectively discriminate between cytoplasmic segmentation algorithms. Therefore, this paper adopts the object-based false recognition rate FP_o proposed in [24] in place of the FP_p used in the ISBI challenge. FP_o is the proportion of misidentified cytoplasms among the objects identified by the segmentation algorithm. The combination of FN_o and FP_o effectively characterizes the cytoplasm identification performance of different algorithms. To measure the overall performance of a segmentation algorithm in terms of both accuracy and recognition rate, this paper uses the geometric mean (GM) of the accuracy and recognition metrics [24], i.e., the square root of their product.
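The object-based rates and the geometric-mean summary can be sketched as follows (a hedged illustration: the matching rule assumed here is the DSC > 0.7 criterion from the evaluation protocol, and the function names are our own):

```python
import numpy as np

def object_rates(segmented, gold, threshold=0.7):
    """Object-based false negative / false positive rates under a DSC match
    threshold. `segmented` and `gold` are lists of binary masks (one per cell).
    A gold cell is missed (false negative) if no segmented mask matches it;
    a segmented mask is a false positive if it matches no gold cell."""
    def dice(a, b):
        a, b = a.astype(bool), b.astype(bool)
        d = a.sum() + b.sum()
        return 2.0 * np.logical_and(a, b).sum() / d if d else 0.0
    matched_gold = [any(dice(s, g) > threshold for s in segmented) for g in gold]
    matched_seg = [any(dice(s, g) > threshold for g in gold) for s in segmented]
    fn_o = 1.0 - sum(matched_gold) / len(gold)
    fp_o = 1.0 - sum(matched_seg) / len(segmented)
    return fn_o, fp_o

def geometric_mean(x, y):
    """Geometric mean used to summarise accuracy vs. recognition rate."""
    return (x * y) ** 0.5
```

For example, if a segmentation recovers one of two gold-standard cells exactly, the object-based false negative rate is 0.5 and the false positive rate is 0.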

3.3. Cytoplasm Segmentation Results

In this paper, the parameters of the DRLSE algorithm for refined cell segmentation are set as follows: 10 inner iterations, 5 outer iterations, and a time step of 5; the weight of the distance regularization term and the weights of the contour length and area energy functionals of the zero level set are chosen empirically.

For the two weights of the distance attribute, the researchers need to find the combination for which the overlapping cell segmentation algorithm achieves the highest DSC on the training set. Searching the first weight over five candidate values (0.2, 0.5, 0.8, 1, 1.5) and the second over ten candidate values (0.2, 0.4, 0.6, …, 2), the optimal combination (0.8, 1.4) was found experimentally to maximize the DSC. The experimental results with the highest DSC and the lowest FN_o and FP_o are shown in Table 2.
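The exhaustive search over the two distance-attribute weights amounts to a simple grid search; the sketch below illustrates it with a toy objective standing in for the actual training-set evaluation (the objective and its peak location are purely illustrative):

```python
from itertools import product

def grid_search(candidates_w1, candidates_w2, objective):
    """Return the (w1, w2) pair maximising `objective`, e.g. the mean DSC
    obtained on a training set with those weights."""
    return max(product(candidates_w1, candidates_w2),
               key=lambda pair: objective(*pair))

# Hypothetical stand-in objective, peaking at (0.8, 1.4) for illustration only.
def toy_objective(w1, w2):
    return -((w1 - 0.8) ** 2 + (w2 - 1.4) ** 2)

w1_grid = [0.2, 0.5, 0.8, 1.0, 1.5]                    # 5 candidate values
w2_grid = [round(0.2 * k, 1) for k in range(1, 11)]    # 0.2, 0.4, ..., 2.0
```

With 5 × 10 = 50 candidate pairs, exhaustive evaluation is cheap; in the paper each pair would instead be scored by running the segmentation algorithm on the training images.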

The test dataset published by the ISBI2014 challenge contains 810 cell images with different cell numbers and overlap ratios. The 810 test images were divided into 45 subsets according to cell number and overlap ratio, each subset containing 18 synthetic images with the same cell number and overlap ratio. Based on these 45 subsets, the performance of the segmentation algorithm under different cell numbers and overlap rates can be evaluated. The segmentation results of our algorithm at a DSC threshold of 0.7 for different cell numbers and overlap rates are shown in Table 3. It can be clearly seen that when the overlap rate between cells is in [0, 0.1], our algorithm identifies all cells and obtains a DSC of no less than 0.95. When the overlap rate is in [0.1, 0.2], the algorithm also identifies all cells when the number of cells is less than 6; when the number of cells is greater than 6, only individual cells are missed. Therefore, when the overlap ratio is less than 0.2, our algorithm is insensitive to the number of overlapping cells in the image, and the segmentation metrics DSC and TP_p are higher than 0.95 and 0.93, respectively. This analysis shows that for overlap rates in [0, 0.2], the proposed algorithm can not only accurately identify the cells in overlapping cell images but also accurately segment them.

When the overlap ratio is in [0.2, 0.4], the proportion of misidentified cells increases slightly with the number of cells. When the number of overlapping cells is less than 8, the segmentation accuracy metrics DSC and TP_p are both higher than 0.86. When the overlap rate is in [0.4, 0.5], the correct identification rate of overlapping cells decreases significantly as the number of cells increases. This is because, as cells overlap, the resolution of the cell boundary is significantly reduced, so the algorithm cannot accurately identify cells.

It can be seen from Table 3 that when there are only 2 or 3 cells in the image, the FN_o value of our algorithm is less than 0.07. When the overlap rate is in [0, 0.3], our algorithm identifies all overlapping cells. Although some cells with overlap rates in [0.4, 0.5] are not correctly identified, the object-based false negative rate FN_o of our algorithm remains close to 0.1. This shows that when the number of cells is small, most overlapping cells can be correctly identified even at high overlap rates.

When the overlap ratio between cells is in [0.4, 0.5], both the DSC and TP_p metrics for correctly identified cells are higher than 0.8. Thus, even in overlapping clumps with a high overlap rate and a large number of cells, our algorithm still performs well in identifying and segmenting cells. Figure 14 shows more clearly how the segmentation results vary with the number of cells and the overlap rate.

For the ISBI2014 and ISBI2015 challenge datasets, other researchers have proposed various overlapping cell segmentation methods. Below, based on the ISBI challenge test datasets, the proposed method is compared quantitatively and qualitatively with these overlapping cervical cell segmentation methods.

First, the algorithm in this paper is compared on the ISBI2014 challenge test set with the algorithms reported in other literature, including the ISBI2014 challenge champion and runner-up algorithms of Ushizima et al. [16] and Nosrati and Hamarneh [38], as well as several algorithms published after the challenge [14, 23, 28, 37]. Then, based on the 9 EDF images of the ISBI2015 test set, our algorithm is compared quantitatively and qualitatively with the ISBI2015 challenge champion and runner-up algorithms of Phoulady et al. [23] and Ramalho et al. [39], as well as some recent algorithms [22, 28, 37, 40].

Table 4 reports the comparison on the ISBI2014 challenge test set, with the best result for each metric shown in bold. The test set contains 810 cervical cell images with different cell numbers and overlap ratios. From the DSC values in Table 4, it can be seen that previous cytoplasmic segmentation algorithms already obtain high DSC values; the best-performing one, Phoulady's algorithm [24], reaches 0.901.

For this dataset, the TP_p values of the previous algorithms are all greater than 0.8: the lowest is Lu's algorithm at 0.805, and the highest is Tareef's three-step watershed algorithm at 0.94. Combining the DSC and TP_p metrics shows that earlier algorithms already achieve high segmentation accuracy for correctly identified cytoplasm. The DSC and TP_p values of our algorithm are 0.904 and 0.927, respectively; the DSC is 0.33% higher than the best previous value, obtained by Phoulady's algorithm [24]. For the TP_p metric, our algorithm is slightly lower than Tareef's three-step watershed algorithm but higher than the other six algorithms. The comparison of the DSC and TP_p values shows that the proposed segmentation method has an advantage in segmentation accuracy over the other cervical cell segmentation algorithms.

The FN_o and FP_o metrics characterize the cell identification ability of a segmentation algorithm. From the values reported in Table 4, the FP_o values differ considerably across algorithms. Under the FP_o metric, the proposed algorithm and Phoulady's algorithm [24] have a clear advantage over the other methods, and the proposed algorithm obtains the lowest FP_o value. Among the algorithms of Ushizima, Lu, and Phoulady that report FN_o values, Phoulady's algorithm [24] achieves the lowest FN_o, and our algorithm is only slightly higher. The FP_o of our algorithm and of Phoulady's algorithm is much lower than that of the other three methods reporting FP_o. Combining the FN_o and FP_o results, it can be seen that, compared with the other methods, our algorithm and Phoulady's [24] have a clear advantage in overlapping cell identification.

To evaluate overlapping cell detection and segmentation more comprehensively, this paper also evaluates the above methods using the GM metric. It can be seen from Table 4 that the GM values of the Lu and Ushizima methods are low, at 0.782 and 0.799, respectively; the GM values of the Nosrati, Phoulady [23], Lee, and Tareef methods are intermediate, between 0.868 and 0.898; and the algorithms of this paper and of Phoulady [24] achieve the highest GM values, 0.918 and 0.915, respectively.

4. Discussion

In summary, for the ISBI2014 synthetic dataset, the algorithm in this paper obtains the best results under the DSC, FP_o, and GM metrics, and under the TP_p and FN_o metrics it is only slightly worse than the best-performing method. Therefore, for the ISBI2014 dataset with its different cell numbers and overlap rates, the proposed algorithm achieves better segmentation results overall.

Table 5 shows the comparison of overlapping cell segmentation between our algorithm and other algorithms on the ISBI2015 challenge test set, with the best result for each metric shown in bold. The test set contains 9 EDF images; each image contains multiple cell clumps, with varying numbers of cells and overlap rates.

On the ISBI2015 dataset, our algorithm achieves the best values relative to the other methods on the DSC, FN_o, and GM metrics. For the DSC metric, the previous algorithms already obtain high values: the lowest is Phoulady's [22] algorithm at 0.831, and the highest is Lee's algorithm at 0.879. Our algorithm is 5.57% and 0.11% higher than these two algorithms, respectively. In terms of recognition rate, the differences between algorithms are large: the FN_o of Ramalho's algorithm is only 0.501. In recent years, some algorithms, such as those of Phoulady and Tareef, have significantly improved overlapping cell recognition, obtaining FN_o values of 0.352 and 0.336, which the source reports as 42% and 47% lower than Ramalho's algorithm, respectively. Compared with Phoulady et al. [23], the corresponding values of Tareef's [37] algorithm decreased by 5% and 10%, respectively. It can be seen that the algorithm in this paper has a clear advantage in overlapping cell detection. Our algorithm also achieves the best result on the GM metric, 3% and 3.5% higher than the algorithms of Phoulady [23] and Tareef [37], respectively.

For both the ISBI2014 and ISBI2015 challenge datasets, our algorithm achieves the best results on the DSC and GM metrics as well as on the object-based identification metrics. Since the ISBI2015 dataset consists of real cervical cytology images, image quality is more seriously affected by uneven illumination and cell overlap; therefore, the recognition rate and segmentation accuracy on the ISBI2015 dataset are lower than on the ISBI2014 dataset.

Figure 15 shows the overlapping cell segmentation results of manual annotation, our method, and four other algorithms on ISBI2014 synthetic images and ISBI2015 EDF images. As shown in Figure 15(b), Ushizima's overlapping cell segmentation method [16] separates adjacent overlapping cells with straight line segments and therefore cannot obtain accurate boundaries between overlapping cells: it relies on the segmentation results of the watershed algorithm, which cannot describe the true boundaries between overlapping cells. In contrast, Tareef's algorithm [37] addresses this problem using an ellipse model and an iterative watershed algorithm; as Figure 15(e) shows, the overlapping boundary between cells can be estimated with this ellipse-model watershed.

As can be seen from Figure 15(c), the cells segmented by Nosrati's algorithm [38] usually lie inside the real cells, which causes a high false negative rate and yields only approximate cell boundaries. The method proposed by Lu et al. [14] performs well on synthetic overlapping cell images; however, for EDF images, although it can roughly segment overlapping cells, the results are not ideal, with serious oversegmentation and undersegmentation. Among the four qualitatively compared methods, the MPFW method proposed by Tareef et al. [37] produces the best segmentation results, on both synthetic cell images and EDF images. However, because this method adopts an iterative level set with an ellipse prior, overlapping boundaries are replaced by ellipses, which leads to many cases of oversegmentation.

Figure 15(f) shows the segmentation results of our algorithm on synthetic and EDF images. It can be seen that the proposed overlapping cell segmentation algorithm successfully detects cell boundaries located in the overlapping areas. Comparing Figure 15(f) with the gold standard in Figure 15(a), the cytoplasmic segmentation results are very close to the gold-standard boundaries. Therefore, the proposed overlapping cervical cell segmentation method can segment cells in overlapping cell clumps across different types of datasets. The algorithm does make some segmentation errors on EDF images: the occlusion between some cells in these images is too severe, and some cells lie entirely within overlapping regions, so accurate candidate boundary points cannot be obtained.

5. Conclusions

This paper proposes an overlapping cytoplasm segmentation method evaluated on the ISBI2014 synthetic dataset and the ISBI2015 real cervical cell dataset. The proposed cell-based radial region filtering algorithm significantly suppresses the noise of overlapping cervical cell images while preserving the contrast of cell boundaries. A coarse segmentation boundary is obtained by building a weight graph with boundary weights derived from the attributes of candidate contour points and contour line segments. Comparisons with algorithms from other literature on the ISBI2014 and ISBI2015 test sets show that our segmentation method has a clear advantage in segmentation accuracy over other cervical cell segmentation algorithms, and that it successfully detects cell boundaries located in overlapping areas. The quantitative and qualitative evaluation results show that the proposed overlapping cytoplasm segmentation algorithm achieves better segmentation results and compares favourably with other current overlapping cytoplasm segmentation algorithms. In future work, the author will study key technologies for cervical cancer screening based on graph attention neural networks.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported in part by the Youth Reserve Talent Support Scheme of Harbin University of Commerce (2020CX22) and Doctoral Research Support Program of Harbin University of Commerce (22BQ29).