Abstract

The intestine is an important organ of the human body, and its internal structure often needs to be observed in clinical applications to provide a basis for accurate diagnosis. However, because the intestinal data obtained by a single institution are limited, deep learning models cannot be trained effectively on intestinal images, and the results are unsatisfactory. For this reason, we propose a distributed training method based on federated learning to alleviate the shortage, nonsharing, and uneven distribution of patient sample data. A blockchain is introduced to enhance the interaction between networks and to solve the single point of failure of the federated learning server. The multiscale features of the samples are fully mined to construct a fusion enhancement model and an intestinal segmentation module for accurate localization. At the local end, the centerline extraction algorithm is optimized, taking the boundary (edge) distance as the primary cue and the source distance as the auxiliary cue to realize centerline extraction.

1. Introduction

The intestine is an important organ of the abdominal cavity. At present, imaging-based detection of the intestine is the mainstream method; it is welcomed by physicians and patients because of its convenience and noninvasiveness. In order to obtain the internal conditions of the intestine more accurately, a large number of processing technologies have been derived, such as virtual colonoscopy [1], virtual valgus [1], and virtual flattening [2].

Medical imaging is special in that it is both highly specialized and privacy sensitive. In order to make full use of the network to obtain better data analysis while ensuring privacy, connections between local domain servers have been established. Representative works in data transmission include knowledge-driven models [3, 4], which extract features based on the analysis of known data and present data transmission in the form of probabilities. This type of model is more effective for known data features.

Distributed network structures [5, 6] construct a network to extract data from decentralized data sources, and the network structure directly affects the data transmission effect. Dynamic data fusion [7, 8] uses a variety of methods to obtain information of different dimensions and extracts the statistical characteristics of the data to enhance data transmission performance. In terms of data privacy, representative works include data coding [9, 10], which encodes the data according to its characteristics and decodes it with a secret key; multisource enhanced encryption [11, 12], which uses multisource information to assist data encryption; and partial data sharing [13, 14], which shares only part of the data or feature-level information according to actual needs to achieve data privacy protection.

Computer-aided intestinal detection mainly focuses on intestinal segmentation and intestinal centerline extraction. Representative works on intestinal segmentation include human-computer interaction models [15, 16], in which the physician marks the start and end points or part of the region to guide the computer segmentation and corrects the segmentation result. Boundary focusing models [17, 18] construct a constraint function that approaches the boundary iteratively and work better in regions where the image boundary is distinct. 3D models [19, 20] construct a spatial structure according to the three-dimensional geometry of the organ to realize its overall extraction; however, such models involve a large amount of computation and consume considerable computing resources.

Medical anatomical models [21, 22] build on the continuous development of medical technology: a deeper understanding of the tissue structure of organs makes it possible to construct morphological constraint models that extract the tissue structure. Deep learning models [23, 24] convert the image segmentation problem into a probability problem, learn from labeled data, and achieve target segmentation. For intestinal centerline extraction, local models [25, 26] introduce the idea of segmentation: the intestinal tract is divided into segments, the center point of each segment is found locally, and the points are connected to form the centerline. Global models [27, 28] extract the centerline based on intestinal connectivity and medical morphology constraints.

In summary, although some progress has been made in research on intestinal data, given the distribution of intestinal data across hospitals, the main problems of centerline extraction are (1) uneven distribution and privacy constraints of image data; (2) insufficient interaction of deep learning network parameters; and (3) high computational redundancy of centerline extraction algorithms that take the source distance as the primary cue and the boundary distance as the auxiliary.

Based on the above three deficiencies, this paper proposes a new intestinal centerline extraction algorithm. (1) A federated learning framework is established, providing a data learning mechanism while ensuring the privacy of the datasets. (2) A blockchain mechanism is introduced to enhance data interaction and to solve the single point of failure of the federated learning server. (3) The existing centerline extraction framework is optimized, and an edge-primary, source-auxiliary strategy is put forward to realize centerline extraction.

2. Algorithm

Through the above analysis, we build an algorithm flow under the framework of federated learning (as shown in Figure 1). First, the image data are fully mined while data privacy is preserved, and the blockchain mechanism is introduced to enhance data interaction and solve the single point of failure of the server; then, the complete region of the intestine is extracted. At the client, a centerline extraction algorithm with the edge distance as the primary cue and the source distance as the auxiliary is proposed, finally realizing the virtual endoscopy of the intestine.

2.1. Network Framework

Federated learning (FL) refers to a machine learning setting in which multiple clients collaborate to train a model under the coordination of a central server, and its structure is shown in Figure 2. Federated learning enables multiple organizations to achieve AI collaboration without sharing data, so as to ensure the privacy and security of user data. The traditional federated learning is implemented under the coordination of a central server. Each participant only uses locally owned data to train a machine learning model, obtain model parameters, and send updated model parameters to the central server. Then, the central server aggregates the model parameters received from different devices and distributes the aggregated model parameters to each local device after updating, and the local device uses the aggregated parameters to update the local model.
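As a minimal illustration of this workflow, the sketch below shows one communication round with sample-weighted aggregation on the server side; the function names and the flat NumPy parameter vector are our simplifications, and the local SGD loop is omitted.

```python
import numpy as np

def local_update(global_params, local_data):
    """Stand-in for a client's local training.

    In the real system each client runs stochastic gradient descent on its
    own CT data; here we simply return a copy of the global parameters to
    represent the locally updated weights.
    """
    params = global_params.copy()
    # ... local SGD over local_data would modify `params` here ...
    return params

def aggregate(client_params, client_sizes):
    """Sample-weighted averaging of client updates (FedAvg-style)."""
    total = float(sum(client_sizes))
    return sum((n / total) * p for p, n in zip(client_params, client_sizes))

# One round with three hypothetical clients holding 78, 10, and 40 samples.
global_params = np.zeros(10)
client_sizes = [78, 10, 40]
client_params = [local_update(global_params, None) for _ in client_sizes]
global_params = aggregate(client_params, client_sizes)
```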

Blockchain has the advantage of decentralization: instead of relying on a central organization to establish trust between distributed nodes, it stores data in the network nodes and updates it in real time. All nodes in the blockchain network jointly participate in its maintenance. Blocks are linked in the order of their generation time to form a data chain. As long as not all participating nodes crash at the same time, the network can keep running.

Through the above analysis, we realize the intestinal segmentation of abdominal CT images based on a client-server system. Each medical institution is taken as a client, the blockchain network constitutes the server side, and the data of all medical institutions are jointly used to train a segmentation model. The server is responsible for maintaining the global parameter model and coordinating the model training of the clients. The structure is shown in Figure 3.

2.2. Configuration of Client and Server

For client model training, suppose that the client device set is $\{D_1, \ldots, D_K\}$, the dataset owned by the $k$-th client device is $S_k$ with $n_k$ samples, and the set of blockchain node devices is $\{M_1, \ldots, M_J\}$. Each client device uploads its local model update to the blockchain node connected to it. The corresponding training objective is

$F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w) = \frac{1}{n_k} \sum_{(x_i, y_i) \in S_k} \ell(w; x_i, y_i),$

where $\ell$ is the deep learning loss function and $n = \sum_{k} n_k$. The client layout follows the federated learning setting, and each client model is trained by stochastic gradient descent. An approximate Newton method is used to aggregate the model updates of all client devices:

$w \leftarrow w + \sum_{k=1}^{K} \frac{n_k}{n} \Delta w_k,$

where $w$ is the global update parameter; each client sends its local update $\Delta w_k$ and sample count $n_k$ to the server. When the nodes in the blockchain generate new blocks, the client downloads the new blocks and updates its local model.

For server parameter aggregation, each client device has a server node associated with it, and the server devices connected to the client devices serve as the nodes of the blockchain. The server receives the parameters $(\Delta w_k, n_k)$ uploaded by each client together with the local computation time $T_k$. In order to ensure the authenticity of the exchanged local model updates, each block is divided into two parts. The first part (the header) stores the pointer to the address of the previous block, the block generation rate $\lambda$, and the output value of the proof-of-work (PoW) mechanism; the block size is designed as $h + \delta_m$, where $h$ represents the length of the header and $\delta_m$ represents the size of the overall model update. The second part stores the updated data $(\Delta w_k, n_k, T_k)$. When a block is filled to the predetermined size or the waiting time is exceeded, it is transmitted.

PoW generates hash values by repeatedly changing its input until the generated hash value is less than the target value. When a qualified hash value is found, the corresponding candidate block becomes a new block; the block generation rate $\lambda$ reflects the difficulty of PoW. The newly generated block is propagated to all nodes, and every node that receives it must stop its current operation and add the new block to its local ledger.
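A simplified proof-of-work loop is sketched below, assuming a SHA-256 hash and a numeric target (a smaller target means higher difficulty and a lower block generation rate); the header fields are illustrative, not the exact block layout of our system.

```python
import hashlib
import json
import time

def pow_mine(header, target, max_iters=10_000_000):
    """Vary a nonce until the block hash is below the target value."""
    for nonce in range(max_iters):
        header["nonce"] = nonce
        digest = hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest        # qualifying hash found, block is generated
    return None                         # budget exhausted without meeting the target

# Illustrative header: pointer to the previous block plus a hash of the model updates.
header = {"prev_hash": "00ab...", "updates_hash": "f3c9...", "timestamp": time.time()}
result = pow_mine(header, target=1 << 244)   # roughly 12 leading zero bits of difficulty
```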

If another node successfully generates a block within the propagation delay of the first generated block, some nodes may mistakenly add the second generated block to their local ledger, which creates a fork. A fork may cause some devices to apply an incorrect global model update to their next local model update. Once a fork occurs, we restart a new round of iteration.

The blockchain network also introduces a reward mechanism for client devices and servers. A client device receives a data reward from its associated server, and the amount is directly proportional to its data sample size $n_k$. When a node generates a block, it obtains a mining reward from the blockchain, and the amount of the mining reward is proportional to the total number of samples of its associated devices.

The specific steps of the whole operation process are as follows (a toy simulation of one such round is sketched after this list):
(1) Local device parameter update: each device computes its local parameters through several iterations of training.
(2) Local model parameter upload: each device is randomly connected to a blockchain node and uploads its update to build the block data.
(3) Cross-validation: the nodes exchange parameters and verify all local model updates obtained from the connected local devices or from other nodes; the verified local model updates are stored in the candidate block of the node until the candidate block is full or the maximum waiting time is exceeded.
(4) Block generation: each node runs PoW until it finds a qualifying hash value or receives a generated block broadcast by another node.
(5) Block propagation: the first node to reach the target broadcasts its block. To handle forks, a flag indicates whether a fork has occurred; if so, the process restarts from step (1).
(6) Global model download: each local device downloads the newly generated block from its associated node.
(7) Global model update: each local device updates its parameters with the aggregation result recorded in the new block, and the process repeats until the stopping condition is met.
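The toy simulation below strings steps (2)-(7) together for one round; the model update is reduced to a single float, the fork is modeled as a small random event, and all names are ours rather than the actual implementation.

```python
import hashlib
import random

def run_round(local_params, sample_sizes, difficulty_bits=12):
    """One illustrative round: upload, PoW race, propagation, aggregation."""
    target = 1 << (256 - difficulty_bits)

    # (2)-(3) the nodes collect and cross-check the same candidate updates
    candidate = list(zip(local_params, sample_sizes))

    # (4) nodes race to find a qualifying hash; the fastest one generates the block
    payload = repr(candidate).encode()
    nonce = 0
    while int(hashlib.sha256(payload + str(nonce).encode()).hexdigest(), 16) >= target:
        nonce += 1

    # (5) a fork forces the round to restart; modeled here as a small random event
    if random.random() < 0.05:
        return None

    # (6)-(7) clients download the new block and apply the sample-weighted aggregate
    total = sum(sample_sizes)
    return sum(p * n for p, n in candidate) / total

new_global = run_round(local_params=[0.9, 1.1, 1.0], sample_sizes=[78, 10, 40])
```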

2.3. Intestinal Segmentation and Centerline Extraction Model

We build the deep learning network model shown in Figure 4 and need to enhance the abdominal CT images. The DeeplabV3+ network model implements feature extraction, but atrous convolutions with multiple dilation rates can easily cause a checkerboard effect, resulting in the loss of small intestinal regions or discontinuous segmentation. The HRNet network model extracts more detailed information from the feature maps, which is helpful for the segmentation of small targets; however, it has a complex structure and a large number of parameters.

We therefore propose a multiscale fusion enhancement network architecture. The structure runs as a separate path, independent of the ASPP module. In the process of feature extraction, high-resolution information is fully preserved, and the ability of the model to learn detail features is improved. At the same time, the structure of the network model is simplified, and the computational efficiency is improved.

The Xception Module in DeeplabV3+ is used to replace the Bottleneck in the HRNet network model. The Xception Module continuously deepens the network through residual learning units, extracts rich features, and uses depthwise separable convolutions to replace the standard convolutions in the Bottleneck. While maintaining accuracy, the number of model parameters and the computation cost are reduced.
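A depthwise separable residual unit of this kind can be sketched in PyTorch as follows; the channel sizes, normalization choices, and the 1×1 residual projection are illustrative rather than the exact configuration of our network.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

class XceptionUnit(nn.Module):
    """Residual unit built from three separable convolutions (Xception-style)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            SeparableConv2d(in_ch, out_ch),
            SeparableConv2d(out_ch, out_ch),
            SeparableConv2d(out_ch, out_ch),
        )
        # 1x1 projection so the residual connection matches the output channels
        self.skip = nn.Conv2d(in_ch, out_ch, 1, bias=False) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.body(x) + self.skip(x)

x = torch.randn(1, 64, 128, 128)
y = XceptionUnit(64, 128)(x)   # -> (1, 128, 128, 128)
```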

The number of residual structures is adjusted from 4 to 2. The residual learning units of both the Bottleneck and the Xception Module contain three convolution layers, but the first and third convolution layers of the Bottleneck are used only to adjust the dimensions of the output features through convolution, whereas the Xception Module uses all three convolution layers to extract features. In order to avoid feature redundancy or overfitting, the number of repeated residual structures is reduced to 2.

The single-input, single-output structure of HRNet is transformed into a three-input, three-output structure. The feature extraction and exchange unit uses the Xception Module to extract features repeatedly within each resolution branch; after feature extraction, multiscale fusion and enhancement are performed on the output feature maps of the three branches, realizing repeated extraction and exchange of features and obtaining richer context information. During feature extraction, HRNet would need to downsample repeatedly to generate low-resolution feature extraction branches; instead, three middle-layer features of the DCNN are used as the inputs of the multiscale fusion enhancement network, removing the need to keep a high-resolution branch and continuously downsample to generate the low-resolution branches.

The multiscale fusion enhancement network consists of two identical feature extraction and exchange units. Each unit contains three independent branches that extract information at different scales: the resolution of the second branch is half that of the first, and the number of feature map channels is twice that of the first; likewise, the resolution of the third branch is half that of the second. The steps are as follows: (1) the feature maps with different resolutions are input into the multiscale fusion enhancement network; (2) in each feature extraction and exchange unit, features are extracted and then information is exchanged among the different feature maps through feature fusion, yielding three feature maps with different resolutions.
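A minimal sketch of one such exchange step is given below, assuming three branches whose channels are (32, 64, 128): every branch output is channel-adjusted with a 1×1 convolution, resized to the target branch's resolution, and summed. The layer choices (bilinear resizing instead of strided convolutions) are simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExchangeUnit(nn.Module):
    """Fuses three branches whose resolutions halve and channels double."""
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        # 1x1 convolutions to match channel counts when moving between branches
        self.adapt = nn.ModuleList([
            nn.ModuleList([nn.Conv2d(c_in, c_out, 1, bias=False) for c_out in channels])
            for c_in in channels
        ])

    def forward(self, feats):            # feats: three maps, coarse ones half-sized
        fused = []
        for j, target in enumerate(feats):
            h, w = target.shape[-2:]
            acc = torch.zeros_like(target)
            for i, src in enumerate(feats):
                x = self.adapt[i][j](src)
                # up- or downsample every source map to the target resolution
                x = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
                acc = acc + x
            fused.append(acc)
        return fused

feats = [torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
out = ExchangeUnit()(feats)              # three fused maps at the original resolutions
```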

In order to give the network model a larger receptive field during feature extraction, enhance the contextual semantic information while maintaining high-resolution features, and improve segmentation accuracy for small-scale targets, a deep learning model with small-scale target segmentation ability is constructed by embedding the multiscale fusion enhancement network into the DeeplabV3+ network model.

The steps are as follows: (1) the original image is input into the DCNN to extract initial features. (2) The initial features output by the DCNN are input into the ASPP module; the multiscale feature maps obtained are then concatenated and fused, and the number of channels is adjusted to 256. (3) The output features of the three middle convolution layers of the DCNN are extracted as the input of the multiscale fusion enhancement network; the output feature maps of its three branches are then concatenated and fused, and the number of channels is adjusted to 128. (4) The outputs of the multiscale fusion enhancement network and the ASPP module are upsampled by a factor of 4 and concatenated with the shallow features of the corresponding level in the DCNN, whose number of channels is adjusted to 48. (5) The fused features are upsampled by a factor of 4 to obtain a feature map with the same resolution as the original image. (6) Through a convolution, the number of output channels is adjusted to the number of categories to be segmented, and the predicted segmentation image is obtained through the Softmax activation function.
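Using the channel sizes quoted above (256 for the ASPP output, 128 for the multiscale branch, 48 for the projected shallow features), steps (4)-(6) can be sketched as below; the module names and the backbone feature shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Fuses ASPP output, multiscale-branch output, and shallow DCNN features."""
    def __init__(self, num_classes, shallow_in=256):
        super().__init__()
        self.shallow_proj = nn.Conv2d(shallow_in, 48, 1, bias=False)    # step (4): 48 channels
        self.refine = nn.Sequential(
            nn.Conv2d(256 + 128 + 48, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(256, num_classes, 1)                # step (6)

    def forward(self, aspp_feat, ms_feat, shallow_feat):
        size = shallow_feat.shape[-2:]
        # step (4): upsample deep features 4x and concatenate with shallow features
        aspp_up = F.interpolate(aspp_feat, size=size, mode="bilinear", align_corners=False)
        ms_up = F.interpolate(ms_feat, size=size, mode="bilinear", align_corners=False)
        x = torch.cat([aspp_up, ms_up, self.shallow_proj(shallow_feat)], dim=1)
        x = self.refine(x)
        # step (5): upsample 4x to the original image resolution
        x = F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
        # step (6): per-pixel class scores, followed by softmax
        return torch.softmax(self.classifier(x), dim=1)

head = FusionHead(num_classes=2)
pred = head(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 32, 32), torch.randn(1, 256, 128, 128))
```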

On the basis of DeeplabV3+, the network adds the multiscale fusion enhancement branch. The shallow features extracted by the DCNN undergo multiscale feature extraction and repeated information fusion. Repeated feature extraction allows the model to obtain more comprehensive information at different scales; the same key points on feature maps of different scales help the model to predict different semantics more accurately, while repeated information fusion helps the model obtain high-resolution detail features. The multiscale fusion enhancement network therefore strengthens key features across scales and high-resolution detail features, making full use of multiscale information and shallow information to improve the segmentation accuracy.

2.4. Intestinal Centerline Extraction

To realize the extraction of the intestinal centerline, as shown in Figure 5, the main idea is to retain only the data that play a key role in centerline extraction. The source distance field reflects the distance from each point of the binary image to the source point: the source point is taken as the initial node, and the source distance of a voxel is the length of the shortest path from that voxel to the source within the segmented region. The point corresponding to the minimum value is assigned as the current node, and the centerline can be extracted by traversal. However, this method involves a large amount of computation. We therefore introduce a maximum spanning tree (MST) algorithm to obtain the centerline. We describe the algorithm in terms of search direction and step size: the search direction is toward the adjacent node with the largest boundary distance among all nodes adjacent to the current tree.

Firstly, the boundary distance field of the intestinal data is calculated and transformed into a bidirectional weighted graph: each voxel is a vertex of the graph, the 26-neighborhood of each voxel defines the edges, each edge has two directions, and the boundary distance value of the point an edge points to is taken as its weight. The starting point is specified as the root node. Using the concept of a spanning tree in a connected graph, the vertex connected to the tree with the largest weight is attached next, until all points are connected. The path from the specified end point along the spanning tree back to the starting point is the centerline.
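A compact sketch of this construction is shown below: the boundary distance field is obtained with a Euclidean distance transform, and a Prim-style search repeatedly attaches the unvisited 26-neighbor with the largest boundary distance until the end point is reached, after which the centerline is recovered by backtracking. The helper is ours and omits the edge-primary, source-auxiliary pruning described next.

```python
import heapq
import numpy as np
from scipy.ndimage import distance_transform_edt

def mst_centerline(mask, source, sink):
    """Maximum-spanning-tree search over the boundary distance field."""
    dist = distance_transform_edt(mask)                  # boundary distance of each voxel
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]   # 26-neighborhood
    visited, parent, heap = {source}, {source: None}, []

    def push_neighbors(p):
        for dx, dy, dz in offsets:
            q = (p[0] + dx, p[1] + dy, p[2] + dz)
            if all(0 <= q[i] < mask.shape[i] for i in range(3)) and mask[q] and q not in visited:
                heapq.heappush(heap, (-dist[q], q, p))   # max-heap via negated weights

    push_neighbors(source)
    while heap:
        _, q, p = heapq.heappop(heap)
        if q in visited:
            continue
        visited.add(q)                                   # attach the largest-weight vertex
        parent[q] = p
        if q == sink:
            break
        push_neighbors(q)

    path, node = [], sink                                # backtrack from end point to root
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path[::-1]

# Toy example: a straight tube along the first axis.
mask = np.zeros((30, 7, 7), dtype=bool)
mask[:, 1:6, 1:6] = True
centerline = mst_centerline(mask, source=(0, 3, 3), sink=(29, 3, 3))
```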

In the MST algorithm flow, voxels with low centrality (far from the center) are also connected into the tree; this part of the data contributes nothing to centerline extraction and introduces data redundancy. Therefore, we adopt a path search strategy with the boundary (edge) distance as the primary cue and the source distance as the auxiliary, improving the performance of the algorithm by eliminating a large amount of redundant data.

3. Experiment and Result Analysis

The experimental data are abdominal CT data collected with Siemens equipment. A total of 403 sets of intestinal CT data (1624 DICOM slices) were obtained; the intestinal regions were labeled by professional physicians, and the centerlines were labeled based on endoscopic images. To better show the intestinal structure, the data can be displayed in the axial, sagittal, and coronal views. As shown in Figure 6, the intestine appears as a connected, closed tubular structure: there is a volume effect in the intestine (Figure 6(a)), the intestinal cavity is large and winds through the abdomen (Figure 6(b)), and the intestine runs through the whole abdominal region (Figure 6(c)).

We distribute the 403 sets of data over 10 clients as shown in Figure 7. In order to simulate the real situation, the data are distributed randomly: the largest amount is 78 sets on the eighth client, and the smallest is 10 sets on the third client.

3.1. Network Effectiveness Evaluation

In order to verify the performance of the federated learning algorithm proposed in this paper, we compare traditional federated learning and centralized training with our algorithm. We use the Dice index,

$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN},$

together with the true positive rate $TPR = TP/(TP + FN)$ and the false positive rate $FPR = FP/(FP + TN)$ used for the ROC curves, where TP represents the number of pixels correctly predicted as positive samples, TN the number of pixels correctly predicted as negative samples, FP the number of pixels incorrectly predicted as positive samples, and FN the number of pixels incorrectly predicted as negative samples.
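These pixel counts, together with the Dice index reported in Table 1 and the rates plotted on the ROC curves, can be computed from a predicted mask and its ground truth as in the short NumPy sketch below (function and variable names are ours).

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Confusion counts, Dice, and ROC rates for binary masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)        # correctly predicted positive pixels
    tn = np.sum(~pred & ~truth)      # correctly predicted negative pixels
    fp = np.sum(pred & ~truth)       # incorrectly predicted positive pixels
    fn = np.sum(~pred & truth)       # incorrectly predicted negative pixels
    dice = 2 * tp / (2 * tp + fp + fn)
    tpr = tp / (tp + fn)             # true positive rate (ROC y-axis)
    fpr = fp / (fp + tn)             # false positive rate (ROC x-axis)
    return {"dice": float(dice), "tpr": float(tpr), "fpr": float(fpr)}

scores = pixel_metrics(np.random.rand(128, 128) > 0.5, np.random.rand(128, 128) > 0.5)
```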

It can be seen from Table 1 that, on the Dice index, the result of the algorithm proposed in this paper is slightly lower than that of centralized training, but the difference is small. However, our algorithm adopts the blockchain system to solve the single point of failure of the federated learning server, and the blockchain guarantees the security of the data to a certain extent. It meets practical application requirements, and the data stored by different clients are fully used.

The corresponding ROC curves are shown in Figure 8. The centralized training algorithm performs best because all the data are local and no data interaction is required. Traditional federated learning performs worst because the data are unevenly distributed and the interaction between the data is not taken into consideration.

Figure 9 shows the change in Dice during convergence. It can be seen that after the blockchain is introduced on the server side, the accuracy loss is small, and in the initial stage of training the effect is significantly better than that of the traditional federated algorithm. Because the devices are trained in parallel and each device carries the same weight, the updates are averaged, so the learning efficiency is slightly lower than that of training on the pooled data. Compared with traditional federated learning, the convergence speed of our algorithm is higher.

For the same signal-to-noise ratio (SNR), the effect is shown in Figure 10. In general, the learning completion delay is inversely related to the SNR: the larger the SNR, the more useful information the device receives, and the shorter the learning completion time. For a fixed SNR, the learning completion delay first decreases and then increases with the block generation rate: a low generation rate means that PoW is difficult and block generation takes a long time, so the delay decreases as the generation rate increases; beyond a certain level, however, a larger block generation rate makes forks more frequent, and the time cost of repeating the process increases the learning completion delay.

Considering possible device failures, we count the relationship between the number of local devices and the learning completion delay; the results are shown in Figure 11. The red line indicates the case where a miner is faulty, and the blue line indicates the case without faults. A failure is simulated by adding Gaussian noise to the aggregated local model update of each miner.

3.2. Intestinal Segmentation Algorithm

In order to verify the performance of the proposed segmentation network, we run the algorithm on a local computer and introduce the following indicators:

$\mathrm{AOM} = \frac{|S \cap G|}{|S \cup G|}, \quad \mathrm{AVM} = \frac{|S \setminus G|}{|S|}, \quad \mathrm{AUM} = \frac{|G \setminus S|}{|G|}, \quad \mathrm{CM} = \frac{\mathrm{AOM} + (1 - \mathrm{AVM}) + (1 - \mathrm{AUM})}{3},$

where $G$ is the gold standard and $S$ is the segmentation result. AOM and CM increase with segmentation quality, while AVM and AUM decrease with it.
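Under the set-overlap definitions reconstructed above (which are our reading of the usual formulation of these indicators, so treat them as illustrative), the four region metrics can be computed as follows.

```python
import numpy as np

def region_metrics(seg, gold):
    """AOM, AVM, AUM, and the combined measure CM for binary masks."""
    S, G = seg.astype(bool), gold.astype(bool)
    aom = np.sum(S & G) / np.sum(S | G)        # overlap relative to the union
    avm = np.sum(S & ~G) / np.sum(S)           # segmented area outside the gold standard
    aum = np.sum(~S & G) / np.sum(G)           # gold-standard area that was missed
    cm = (aom + (1 - avm) + (1 - aum)) / 3     # combined measure
    return aom, avm, aum, cm

aom, avm, aum, cm = region_metrics(np.random.rand(64, 64) > 0.5, np.random.rand(64, 64) > 0.5)
```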

The segmentation results of different algorithms are shown in Table 2. The double-threshold method [29] manually sets two thresholds to deal with the volume effect and realize intestinal segmentation; its performance depends heavily on the threshold selection and does not account for differences within the same group of data, so the results are poor. The 3D model [19] builds a model based on the closed, connected structure of the intestine to realize intestinal extraction. The FCN algorithm [30] replaces the fully connected layers of a traditional CNN with convolution layers, extracts deeper features, and improves segmentation accuracy. The UNet network [31] incorporates local texture information, which further improves segmentation accuracy. Our algorithm constructs a multiscale enhancement module and a semantic segmentation module and achieves the best segmentation results at both the local and the semantic level.

3.3. Effect Display

The intestinal segmentation results of the algorithm proposed in this paper are shown in Figure 12. In Figure 12(a), incomplete emptying of the intestine produces bright and dark areas in the image; in this case, the traditional threshold method cannot segment it effectively. As shown in Figures 12(b) and 12(c), because the intestine winds through the abdomen, a single frame may contain multiple connected intestinal regions, which the traditional threshold algorithm also cannot segment effectively. Our algorithm builds a deep learning module and proposes a multiscale fusion enhanced network structure to achieve complete segmentation of the intestine.

In order to show the extraction effect of the intestinal centerline intuitively, we display it at the 2D and 3D levels (as shown in Figure 13), where red marks the area where the centerline is located. It can be seen that the centerline cannot be determined from the 2D level alone. As shown in Figure 14, the intestine is a tubular structure, and based on our proposed edge-primary, source-auxiliary algorithm, the centerline can be extracted well.

4. Conclusion

To solve the problem that the uneven distribution of intestinal medical imaging data leads to insufficient model training, we combine federated learning with a blockchain to fully mine data features and enhance the interaction between networks. On this basis, the existing deep learning framework is optimized, and a multiscale fusion enhanced network is proposed to realize accurate segmentation of the intestinal tract. To handle multiple intestinal regions in a single-frame image, we propose a centerline extraction algorithm with the edge distance as the primary cue and the source distance as the auxiliary, and its effectiveness is also demonstrated through 3D visualization.

Data Availability

All data are from the hospital; please contact the authors if access is needed. The data may be used for scientific research only.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the Shaanxi Provincial Key R&D Plan (Nos. 2020SF-163 and 2019SF-105).