Abstract

A computed tomography image (CTI) sequence is essentially time-series data that typically consists of a large number of adjacent, similar CTIs. Because of the high communication and computational costs, progressive distributed similarity retrieval over large CTI sequences (CTISs) is difficult, particularly in resource-constrained mobile telemedicine networks (MTNs). In this paper, we present DPRS, a progressive distributed and parallel similarity retrieval scheme for CTISs in the MTN. To the best of our knowledge, there is little research on DPRS processing, especially in the MTN. Four supporting techniques are developed: (1) PCTI-based similarity measurement, (2) a lightweight privacy-preserving strategy, (3) an SSL-based data distribution scheme, and (4) the UDI framework. The experimental evaluation indicates that the proposed DPRS method is more progressive than the state of the art, with a significant reduction in response time.

1. Introduction

The continuous innovation of medical imaging and diagnosis technologies makes the CT image (CTI) sequence increasingly important in disease diagnosis. With a CTI sequence (CTIS), doctors can examine a patient’s lesion tissue more fully than with two-dimensional medical imaging, which makes accurate diagnosis easier. Figure 1 illustrates an example of a CTIS, which is made up of several adjacent and similar CTIs with temporal information. As one of the important image data types in the mobile telemedicine network (MTN), the high-quality CTISs created by high-end medical imaging instruments have become a significant foundation for helping doctors diagnose illness.

Mobile-terminal-based CTIS retrieval is one of the MTN’s key tasks, as it can assist doctors in detecting abnormal CTIs that resemble lesion tissue and thus support diagnosis and treatment. Traditional CTI retrieval, however, uses a single CTI rather than the entire CTIS as the retrieval input for similarity comparison. A single CTI describes the retrieval CTIS inadequately, which leads to poor retrieval precision. Meanwhile, the power reserve and computational capability of mobile terminals are both limited [1], and their screen resolution is relatively low. Data communication is also hampered by unstable network bandwidth, which causes retrieval and transmission delays, particularly in rural areas with poor mobile communication facilities [2]. Moreover, since CTIS data contain patients’ personal information, they need to be encrypted during retrieval processing; otherwise, there is a great risk of privacy disclosure. During remote consultations, clinicians routinely retrieve and examine patients’ CTISs in real time to better understand their condition, which entails high computational and communication costs for similarity, encryption, and decryption calculations, as well as intensive delivery of the CTISs. This significant increase in computing overhead has a detrimental impact on overall retrieval efficiency. To speed up retrieval effectively, this paper proposes a progressive distributed and parallel similarity retrieval method for large CTISs in the MTN, called DPRS. To our knowledge, this is the first study on the DPRS processing issue, especially in the resource-constrained MTN.

In the DPRS scheme, a user submits a retrieval CTIS to the master node, which determines the destination slave nodes. An efficient filtering and refinement process is then performed at the slave nodes to retrieve the answer CTISs swiftly and safely using a unified distributed index (UDI) with a security guarantee. Finally, the answer CTISs are transmitted to the user end. The main contributions of the paper are summarized as follows:
(i) We propose the DPRS, a progressive distributed and parallel similarity retrieval method for large CTIS databases in the MTN
(ii) We develop four supporting techniques to facilitate progressive DPRS processing: (1) PCTI-based similarity measurement, (2) a lightweight privacy-preserving strategy, (3) an SSL-based data distribution scheme, and (4) the UDI framework
(iii) We conduct extensive experiments to evaluate the effectiveness and efficiency of the proposed DPRS method

The rest of the paper is laid out as follows. The related work is presented in Section 2. Section 3 contains preliminary work. Four supporting techniques are introduced in Section 4. Section 5 describes the DPRS method. The experimental evaluations are performed in Section 6 before we conclude the paper in Section 7.

2. Related Work

Content-based image retrieval (CBIR) has been a long-standing and challenging research issue over the last 50 years. Most of the state-of-the-art methods [3–6] are based on low-level visual features, and their retrieval accuracy is still unsatisfactory because of the “semantic gap.”

For the research of content-based medical image retrieval (CBMIR), ASSERT [7] is the first CBMIR system designed for high-resolution lung CT images. After that, other prototype systems have been developed such as IRMA [8] and FIRE [9]. Huang et al. [10] proposed a medical image retrieval method based on unclean image bags. Huang et al. [11] applied a noisy-smoothing-based relevance feedback to the CBMIR. Kitanovski et al. [12] designed a multimodal CBMIR system. Lan et al. [13] presented a simple texture feature extraction approach for retrieval of medical images. Ali et al. [14] provided a multipanel medical image segmentation framework for the CBMIR system. Kasban et al. [15] developed a robust CBMIR system based on the combination of the image encoding techniques. Recently, Tuyet et al. [16] applied the deep learning techniques to the salient region-based CBMIR scheme.

Because of their limited scalability, the aforementioned single-PC-based CBMIR systems do not deliver satisfactory retrieval performance, especially when dealing with enormous volumes of medical images [17]. Anbarasi et al. [18] applied traditional distributed database techniques to a distributed CBMIR system. Exploiting the parallel processing capability of peer-to-peer (P2P) computation, Charisi et al. [19] developed a CBMIR system based on a P2P network. Depeursinge et al. [20] proposed a mobile retrieval method for medical information based on hybrid features. Zhuang et al. [21] proposed a fast and reliable CBMIR technique in a mobile cloud network, but its retrieval performance suffers from a poor load-balancing strategy. Building on [21], Zhuang et al. [22] introduced a progressive batch medical image retrieval approach for mobile wireless networks from the perspective of multiretrieval optimization. Cruz et al. [23] developed a mobile teleradiology system suitable for facilitating the CBMIR process. Chitra et al. [24] proposed an improved retrieval algorithm for brain images using a carrier-frequency-offset-compensated OFDM technique for telemedicine scenarios. Jiang et al. [25] addressed the “semantic gap” of CBMIR using a crowdsourcing model in the MTN, an approach that was empirically verified to be both successful and efficient.

Most CTI retrieval methods are still based on a single CTI. Lei et al. [26] introduced a high-definition CTI retrieval approach based on a sparse CNN model. Yu et al. [27] developed a nontensor product wavelet-based liver CTI retrieval algorithm. Hatibaruah et al. [28] proposed a CTI retrieval approach that uses an adder to combine two local bit-plane-based dissimilarities. Hwang et al. [29] applied a CNN-based CBIR technique to chest CT for retrieving diffuse interstitial lung disease. Although Zhuang et al. [30] presented a distributed CBMIR method for large CTIS databases in the MTN, its retrieval accuracy and security are still unsatisfactory.

3. Preliminaries

Firstly, Table 1 lists the major notations frequently used throughout the paper.

Definition 1. A mobile telemedicine network (MTN) can be modeled by a graph with the following components:
(i) a collection of nodes, comprising a user node, which is used to (1) submit a retrieval, (2) decrypt the RIBs, and (3) reconstruct and display the CTISs; a master node, which routes the retrieval to the corresponding slave nodes; and a set of slave nodes, which are used to (1) segment each CTI into IBs, (2) encrypt the RIBs and store the IBs, and (3) transmit the answer CTISs to the user node
(ii) a collection of edges, which represent the network bandwidths available for transmission at a given time; each edge connects two nodes, which may be of the same type or of different types

Due to the instability and the heterogeneity of the resource-constrained MTN, as illustrated in Figure 2, the network bandwidth between nodes in the MTN may vary greatly over time. Furthermore, the MTN’s data transmission distance is limited.

As indicated in Section 1, there are usually some lesion tissues that the doctors may concentrate on in the CTISs. Pathological regions (PR) are the regions of such lesion organs that can be preliminarily annotated by medical specialists.

Definition 2 (PR). A pathological region (PR) in a CTI can be denoted by a two-tuple consisting of the PR’s ID and PO, where PO means the position of the PR in the CTI.

Based on Definition 2, a nonpathological region is denoted as NPR.

Definition 3 (IB). An image block (IB) can be represented by a three-tuple $\langle bid, PO, TP \rangle$, where bid means the block ID, PO refers to the position of the IB in a CTI, and TP is the transmission priority of the IB.

According to Definitions 2 and 3, Figure 3 shows an example of a CTI that contains two PRs and one NPR. The CTI is segmented into IBs, which are delimited by blue dashed lines.

Definition 4 (RIB). Given a PR in a CTI, a relevant image block (RIB) of that PR is an IB that is contained in the PR or intersects with it.

Definition 5 (NIB). An NIB is an IB that is contained in an NPR in a CTI.

For example, based on Definitions 4 and 5, the RIBs in Figure 3 are IB21, IB22, IB23, IB24, IB25, IB26, IB31, IB32, IB33, IB34, IB35, IB36, IB41, IB42, IB43, IB44, IB45, IB46, IB51, IB52, IB53, IB54, IB55, and IB56. The remaining IBs are NIBs.
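
The RIB/NIB distinction in Definitions 4 and 5 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each PR is given as an axis-aligned rectangle and that the CTI is cut into square blocks of equal size; the names `classify_blocks`, `intersects`, and the rectangle layout are hypothetical and do not appear in the paper.

```python
from typing import List, Tuple

# Assumed layout: (left, top, right, bottom), right/bottom exclusive.
Rect = Tuple[int, int, int, int]


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def classify_blocks(width: int, height: int, block: int, prs: List[Rect]):
    """Split a width x height CTI into block x block IBs and label each one.

    An IB that is contained in or intersects any PR is an RIB;
    every other IB is an NIB.
    """
    ribs, nibs = [], []
    for row in range(height // block):
        for col in range(width // block):
            ib = (col * block, row * block, (col + 1) * block, (row + 1) * block)
            label = (row + 1, col + 1)  # 1-based (row, col), as in IB21, IB22, ...
            if any(intersects(ib, pr) for pr in prs):
                ribs.append(label)
            else:
                nibs.append(label)
    return ribs, nibs


# A 60 x 60 image cut into 10 x 10 blocks with one hypothetical PR.
ribs, nibs = classify_blocks(60, 60, 10, [(5, 12, 25, 28)])
```

In this toy input the PR touches rows 2 to 3 and columns 1 to 3, so six of the 36 blocks are labeled as RIBs and the rest as NIBs.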

4. Supporting Techniques

Four supporting techniques are introduced in this section to speed up the retrieval performance of the DPRS: (1) PCTI-based similarity measurement, (2) lightweight privacy-preserving strategy, (3) SSL-based data distribution scheme, and (4) the UDI framework.

4.1. PCTI-Based Similarity Measurement

A CTIS is time-series data in which not all CTIs are visually similar; as shown in Figure 1, only a few neighboring CTIs resemble one another. The 2nd and 3rd CTIs, for example, are visually similar, as are the 4th and 5th CTIs. As a preprocessing step, the CTIS is therefore partitioned into three subsequences (SSs), namely SS1 (CTI1), SS2 (CTI2, CTI3), and SS3 (CTI4, CTI5), to improve the accuracy of IB restoration during image reconstruction. Based on this observation, we first present a cluster-based pivotal CTI (PCTI) selection scheme.

Definition 6 (PCTI). Given a CTIS: , suppose that there are clusters () that are obtained based on the CTIs in , the pivotal CTI (PCTI) in the -th cluster () can be formally expressed as follows: where function is defined in Table 1, and represent the number of CTIs and PCTIs in , respectively, means the number of CTIs in the -th cluster, and .

Algorithm 1 details the PCTI selection processing in which the PCTIs correspond to the SSs. Note that, in line 6, the cluster center CTI refers to a CTI whose total distance to other CTIs in the cluster is the smallest.

Input: : a CTIS
Output: the PCTIs
1. for each in do
2. Its corresponding visual features are extracted as a high-dimensional vector
3. end for
4. Grouping the CTIs in to obtain clusters using the -means algorithm
5. for each cluster() do
6. Choose the cluster center CTI in as a PCTI based on Equation (4)
7. end for
8. return PCTIs
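
The cluster-center choice in Algorithm 1 can be sketched as follows. The sketch assumes that feature vectors have already been extracted and that cluster labels are already available (for example, from a k-means run); it only shows the step that picks, in each cluster, the CTI whose total distance to the other CTIs is smallest. Euclidean distance and the name `select_pctis` are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def select_pctis(features: List[Sequence[float]], labels: List[int]) -> Dict[int, int]:
    """Return {cluster label: index of the pivotal CTI in that cluster}.

    The pivotal CTI is the member with the smallest total distance
    to all other members of the same cluster.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    pctis = {}
    for label, members in clusters.items():
        pctis[label] = min(
            members,
            key=lambda i: sum(math.dist(features[i], features[j]) for j in members),
        )
    return pctis


# Six toy 2-D feature vectors in two clusters (hypothetical data).
feats = [[0, 0], [1, 0], [3, 0], [5, 5], [5, 6], [5, 5.4]]
pivots = select_pctis(feats, [0, 0, 0, 1, 1, 1])
```

When two members tie, `min` keeps the first one encountered, so the result depends on input order in that case.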

Based on Algorithm 1, given a CTIS , the PCTIs are obtained in which can be modeled by a vector: . Next, we will focus on designing a similarity measurement of two CTISs. Since the corresponding visual feature of a PCTI can be regarded as a high-dimensional vector, the similarity measure between two CTISs (i.e., and ) can be derived in where function is described in Table 1 and is a positive similarity threshold.

According to Equation (5), the similarity of two CTISs (i.e., and ) is determined by the percentage of similar PCTIs in and .
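
A minimal sketch of a similarity measure of this kind is given below. It assumes one plausible reading of "percentage of similar PCTIs": a PCTI counts as similar if some PCTI of the other sequence lies within the threshold, and the score is the matched fraction over both sequences. This reading, the Euclidean distance, and the name `ctis_similarity` are assumptions and may differ from the exact form of Equation (5).

```python
import math
from typing import List, Sequence


def ctis_similarity(
    pctis_a: List[Sequence[float]],
    pctis_b: List[Sequence[float]],
    eps: float,
) -> float:
    """Fraction of PCTIs (over both sequences) that have a counterpart
    within distance eps in the other sequence."""

    def matched(src, dst) -> int:
        return sum(1 for p in src if any(math.dist(p, q) <= eps for q in dst))

    total = len(pctis_a) + len(pctis_b)
    if total == 0:
        return 0.0
    return (matched(pctis_a, pctis_b) + matched(pctis_b, pctis_a)) / total


# Two toy sequences with 2 and 3 PCTIs; one pair lies within eps = 1.0.
score = ctis_similarity([[0, 0], [10, 10]], [[0, 0.5], [20, 20], [30, 30]], 1.0)
```

Under this reading the score is symmetric and equals 1.0 for two sequences with identical PCTIs.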

4.2. Lightweight Privacy-Preserving Strategy

Before introducing the lightweight privacy-preserving strategy, let us first give a definition.

Definition 7 (PRR). Given a PR (i.e., ), its corresponding PR-related region (PRR) is composed of all RIBs of , subjecting to the following criteria: where PRR() refers to the corresponding PRR of and Num(●) means the number of RIBs in ●.

As shown in Figure 4, the CTI is evenly segmented into IBs and contains two PRs. Based on Definition 7, the corresponding PRRs of the two PRs are represented by the green shadow areas, which consist of 20 RIBs. Because the original ID numbers of neighboring RIBs are distributed continuously, it is relatively easy to reconstruct the image from the RIBs’ ID numbers. Therefore, the goal of the privacy-preserving strategy is to disrupt the ID numbers of neighboring RIBs in the CTI by encoding them, so that CTI reconstruction is hard to perform.

For each RIB in a CTI, its corresponding IB replica ID (IBID) can be derived below: where SID is a sequence ID of the CTIS that the RIB belongs to, IID is a CTI ID in the corresponding CTIS, row means row ID, col refers to column ID, are stretch constants, and .

Based on Equation (7), the encryption and decryption principles for the RIBs are illustrated below.

4.2.1. Encryption Principle

Algorithm 2 summarizes the encryption process, where and are two key values and , , and .

Input: SID, IID, row, col of a RIB
Output: IBID: the encrypted ID number of the RIB
1. if row is an odd number then
2. if col is an odd number then
3.  
4. else
5.  
6. end if
7. else
8. if col is an odd number then
9.  
10. else
11.  
12. end if
13. end if
14. return IBID

4.2.2. Decryption Principle

Similarly, the decryption is discussed in Algorithm 3.

Input: the encrypted SID, IID, row, col of a RIB
Output: IBID: the original ID number of the RIB
1. if row is an odd number then
2. if col is an odd number then
3.  
4. else
5.  
6. end if
7. else
8. if col is an odd number then
9.  
10. else
11.  
12. end if
13. end if
14. return IBID

For example, assume that SID is 6, IID is 3, and are 1000, 100, and 10, respectively, and then, the original ID numbers of the RIBs before encryption are depicted in Figure 5(a). Figure 5(b) shows the encrypted ID numbers of the RIBs after encryption when and .

Figure 5(a) shows that the ID numbers of neighboring RIBs are continuous before encryption, whereas in Figure 5(b) they are discrete after encryption. As a result, it becomes much harder to find the neighboring RIBs needed for image reconstruction, which is how the scheme protects the RIBs.
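
Why unencrypted IDs are easy to exploit can be seen in a small sketch. It assumes, for illustration only, that the ID is a weighted sum of SID, IID, row, and col using the stretch constants 1000, 100, and 10 from the example above; the exact form of Equation (7) may differ, and the function names are hypothetical. The key-dependent offsets of Algorithms 2 and 3 are not modeled here.

```python
def make_ibid(sid: int, iid: int, row: int, col: int,
              a: int = 1000, b: int = 100, c: int = 10) -> int:
    """Compose an unencrypted block ID (assumed weighted-sum form)."""
    return a * sid + b * iid + c * row + col


def split_ibid(ibid: int, a: int = 1000, b: int = 100, c: int = 10):
    """Recover (sid, iid, row, col); valid only while each field
    stays below the stretch constant that separates it from the next."""
    sid, rest = divmod(ibid, a)
    iid, rest = divmod(rest, b)
    row, col = divmod(rest, c)
    return sid, iid, row, col


# With SID = 6 and IID = 3, horizontally adjacent blocks get consecutive IDs.
left = make_ibid(6, 3, 2, 1)
right = make_ibid(6, 3, 2, 2)
```

Under this assumed form, horizontal neighbors differ by exactly 1 and vertical neighbors by exactly 10, which is the regularity that the encryption step is meant to break.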

4.3. SSL-Based Data Distribution Scheme

Because a good data distribution is very important for retrieval performance, in this subsection we design a start-slice (SSL)-based data distribution scheme at the slave nodes to better parallelize DPRS processing.

Definition 8 (SDist). Given a CTIS (), its corresponding start distance (SDist) is represented as , where is a virtual CTIS in which the visual features extracted from the PCTIs can be denoted as a vector: . sim() is the same as Equation (5).

Definition 9 (SSL). Given a CTIS (), the ID number of the start-slice (SSL) in which is contained can be derived in

According to Definition 9, as shown in Figure 6, the high-dimensional feature space is evenly segmented into SSLs based on the similarity distance between and (i.e., ), where is a virtual CTIS in which the visual features of the PCTIs are denoted as and .

Algorithm 4 describes the SSL-based data distribution at the slave node level. It should be noted that refers to the total number of CTISs in the -th SSL. Since the CTISs are randomly and equally distributed at each slave node, our DPRS method selects all slave nodes to execute the similarity retrieval in parallel for each retrieval.

Input: : the CTISs
Output: the optimal distribution of the CTISs at the slave nodes
1. The high-dimensional space is equally segmented into the SSLs
2. for to do //for each SSL
3. for to do //for each slave node
4.  Randomly select CTISs in to
5. end for
6. end for
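
The distribution idea in Algorithm 4 can be sketched as follows. The sketch assumes that each CTIS already has a start distance, that slices have equal width over the observed distance range, and that each slice is shuffled and then dealt out to the slave nodes in turn. The slice formula, the fixed seed, and the name `distribute` are assumptions for illustration, not the paper's exact procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List


def distribute(sdist: Dict[str, float], num_slices: int, num_nodes: int,
               seed: int = 0) -> Dict[int, List[str]]:
    """Assign CTIS IDs to slave nodes so that every start-slice is
    spread randomly and evenly over the nodes."""
    rng = random.Random(seed)
    width = (max(sdist.values()) / num_slices) or 1.0  # avoid zero width

    # Step 1: group the CTISs by start-slice (assumed equal-width slices).
    slices = defaultdict(list)
    for cid, d in sdist.items():
        slices[min(int(d / width), num_slices - 1)].append(cid)

    # Step 2: shuffle each slice and deal its members out to the nodes.
    nodes: Dict[int, List[str]] = {n: [] for n in range(num_nodes)}
    nxt = 0
    for sl in sorted(slices):
        members = slices[sl]
        rng.shuffle(members)
        for cid in members:
            nodes[nxt % num_nodes].append(cid)
            nxt += 1
    return nodes


# 40 hypothetical CTISs, 4 slices, 4 slave nodes.
placement = distribute({f"s{i}": i / 10 for i in range(40)}, 4, 4)
```

Because members of one slice land on consecutive nodes, a range query that touches only a few slices still engages all slave nodes, which is the load-balancing effect the scheme aims for.
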
4.4. The UDI Framework

To support faster CTIS filtering and refinement processing at the slave nodes, as illustrated in Figure 7, we propose a unified distributed indexing (UDI) framework in which two types of index schemes (i.e., LSI and GSI) are introduced.

In the UDI framework, the corresponding local index in the -th slave node () can be represented as the that is based on iDistance [31]. Specifically, let the CTISs in be . To begin, based on similarity metric (i.e., Equation (5)), the CTISs in are clustered into the clusters using the well-known AP-cluster algorithm [32]. For the CTISs in each cluster, their corresponding organ categories are obtained previously. So given a CTIS , its indexing key is derived in (i)cID means the corresponding category ID of , (ii) and are two constants used to stretch the key value ranges(iii) is the cluster center sequence in that belongs to, as shown in Equation (11), , and is the same as Definition 6

Algorithm 6 summarizes the global SI (GSI) creation processing in which the index (i.e., ) construction for each slave node is detailed in Algorithm 5.

Input: : the CTISs in
Output:
1.
2. The CTISs in are grouped into clusters
3. for each in do
4.  
5.  Insert into a B+-Tree ()
6. end for
7. return

Input: : the CTISs at the slave nodes
Output: GSI
1.
2. for each slave node () do
3. 
4. 
5. 
6. end for
7. return GSI
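
A local index of this kind can be sketched with a sorted list standing in for the B+-Tree. The key is assumed to follow the usual iDistance pattern, that is, a stretched category ID plus a stretched cluster ID plus the distance to the cluster center; the constants, the key formula, and the names `index_key` and `LocalIndex` are illustrative assumptions rather than the paper's exact definitions.

```python
import bisect
from typing import List


def index_key(cid: int, cluster: int, dist: float,
              c1: float = 1000.0, c2: float = 10.0) -> float:
    """Assumed key: category and cluster are stretched so that key
    ranges of different (category, cluster) pairs do not overlap.
    Requires dist < c2 and c2 * cluster < c1."""
    return c1 * cid + c2 * cluster + dist


class LocalIndex:
    """Sorted-list stand-in for a B+-Tree over one-dimensional keys."""

    def __init__(self) -> None:
        self._keys: List[float] = []
        self._ids: List[str] = []

    def insert(self, key: float, ctis_id: str) -> None:
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, ctis_id)

    def range_search(self, lo: float, hi: float) -> List[str]:
        """Return the IDs whose keys fall in the closed range [lo, hi]."""
        left = bisect.bisect_left(self._keys, lo)
        right = bisect.bisect_right(self._keys, hi)
        return self._ids[left:right]


idx = LocalIndex()
idx.insert(index_key(2, 1, 0.3), "a")
idx.insert(index_key(2, 1, 0.8), "b")
idx.insert(index_key(2, 3, 0.1), "c")
idx.insert(index_key(5, 1, 0.3), "d")
hits = idx.range_search(index_key(2, 1, 0.0), index_key(2, 1, 0.5))
```

A real deployment would use a disk-based B+-Tree; the sorted list only mirrors its ordered range-scan behavior.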

Definition 10 (cluster sphere). For the -th cluster, given a cluster center CTIS () and a cluster radius (), its corresponding cluster sphere can be denoted as .

As depicted in Figure 8, when user submits a retrieval request with a category ID cID, before introducing the index-support similarity range retrieval algorithm (i.e., Algorithm 7), let us first study the six cases in terms of the placements of and .

Case 1: as in Figure 8(a), intersects with in which is contained, formally expressed as and , then the search range can be represented by .

Case 2: as in Figure 8(b), intersects with and is not contained in , formally expressed as and , then the search range can be represented by .

Case 3: as in Figure 8(c), is contained by and is not in , formally expressed as and , then the search range can be represented by .

Case 4: as in Figure 8(d), is contained by and is in , formally expressed as and , then the search range can be represented by .

Case 5: as in Figure 8(e), is contained by , formally expressed as , then the search range can be represented by .

Case 6: as in Figure 8(f), and do not intersect, formally expressed as . There are no CTISs retrieved.
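
The geometric reasoning behind these cases can be sketched in its generic iDistance form. The sketch is a simplification and does not reproduce the exact six-case breakdown: given the distance d from the query to a cluster center, the retrieval radius r, and the cluster radius cr, it returns the interval of distances-to-center that must be scanned, or nothing when the two spheres do not intersect. The function name and argument names are assumptions.

```python
from typing import Optional, Tuple


def distance_range(d: float, r: float, cr: float) -> Optional[Tuple[float, float]]:
    """Interval of center distances to scan for one cluster.

    d  : distance between the query and the cluster center
    r  : retrieval radius
    cr : cluster radius
    """
    if d - r > cr:
        # The query sphere lies entirely outside the cluster sphere.
        return None
    # Clamp the annulus [d - r, d + r] to the cluster sphere [0, cr].
    return max(0.0, d - r), min(cr, d + r)


partial = distance_range(4.0, 2.0, 5.0)   # query sphere cuts the cluster border
inside = distance_range(1.0, 3.0, 5.0)    # query sphere covers the center
outside = distance_range(10.0, 2.0, 5.0)  # no intersection
```

The returned interval would then be mapped to index keys and passed to the range search on the local index.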

Algorithm 7 summarizes the detailed steps of the UDI-based similarity range retrieval of the CTISs in which the function BTreeSearch() is to return the candidate CTISs of the B+-Tree-based range search.

Input: : the retrieval request
   : the CTISs in
Output: : the answer CTISs
1. ;          /* initialization */
2. for each slave node do
3. ;        /* initialization */
4. for the CTISs in do
5.  ifandthen  //case 1
6.   ,
7.  else ifandthen  //case 2
8.   ,
9.  else ifandthen  //case 3
10.   ,
11.  else ifandthen //case 4
12.   ,
13.  else ifthen  //case 5
14.   ,
15.  else             //case 6
16.   exit()
17.  end if
18.  
19. end for
20. for each in do
21.  ifthen
22. end for
23. 
24. end for
25. return

5. The DPRS Algorithm

In this section, we present the DPRS algorithm, which builds on the supporting techniques discussed above. As a preparatory step, the PRs of the first CTI in a retrieval CTIS are preliminarily annotated by medical experts and saved in the database. Each CTI in the sequence is evenly split into a number of IBs (i.e., NIBs and RIBs), of which the RIBs require encryption and the NIBs do not. Finally, since RIBs have higher transmission priorities than NIBs, the IBs are transmitted in descending order of priority.
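
The priority-ordered transmission can be sketched as a simple sort. The `ImageBlock` fields follow Definition 3 (block ID, position, transmission priority); the extra `is_rib` flag, the concrete priority values, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageBlock:
    bid: int                   # block ID
    position: Tuple[int, int]  # (row, col) of the block in the CTI
    priority: int              # transmission priority (higher is sent first)
    is_rib: bool               # assumed flag: True for RIBs, False for NIBs


def transmission_order(blocks: List[ImageBlock]) -> List[ImageBlock]:
    """Order blocks by descending priority; the sort is stable, so
    blocks with equal priority keep their original relative order."""
    return sorted(blocks, key=lambda b: b.priority, reverse=True)


blocks = [
    ImageBlock(11, (1, 1), 1, False),
    ImageBlock(22, (2, 2), 3, True),
    ImageBlock(12, (1, 2), 1, False),
    ImageBlock(23, (2, 3), 2, True),
]
ordered = [b.bid for b in transmission_order(blocks)]
```

With the toy priorities above, the two RIBs are sent before the two NIBs, so the pathological region can be displayed first on a slow link.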

Our proposed DPRS algorithm is summarized in Algorithm 8. As in Figure 9, a retrieval CTIS () is first sent to , and then, it is routed to the corresponding for efficient filtering and refinement processing in parallel with the aid of the UDI framework; finally, the user node receives the answer CTISs. It is worth noting that in line 4, before transmitting the CTISs to , the decryption processing of the RIBs in the CTISs must be completed. As a result, the reconstruction and display of the answer CTISs may be guaranteed to be accurate. DPRSearch () is implemented in detail in Algorithm 7.

Input: : a retrieval CTIS, : retrieval radius, and : the CTISs
Output: : the result CTISs
1. A retrieval CTIS () is submitted to ;      //at the user node level
2. The retrieval request is routed to ;       //at the master node level
3. ;         //at the slave node level
4. Transmit the CTISs in to based on different transmission priorities
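
The overall flow of Algorithm 8 can be sketched as a fan-out to the slave nodes followed by a merge. The sketch makes several simplifying assumptions: each slave is an in-memory dictionary, each CTIS is reduced to a single feature vector, the distance is Euclidean rather than the measure of Equation (5), and a linear scan replaces the UDI-based search. The names `slave_search` and `dprs_search` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Features = Dict[str, Sequence[float]]  # CTIS ID -> feature vector


def slave_search(local: Features, query: Sequence[float], radius: float) -> List[str]:
    """Filtering and refinement at one slave node (linear scan stand-in)."""
    return [cid for cid, feat in local.items() if math.dist(feat, query) <= radius]


def dprs_search(slaves: List[Features], query: Sequence[float], radius: float) -> List[str]:
    """Master-side routine: query all slave nodes in parallel, then merge."""
    if not slaves:
        return []
    with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
        parts = list(pool.map(lambda s: slave_search(s, query, radius), slaves))
    return sorted(cid for part in parts for cid in part)


slaves = [
    {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    {"c": [0.5, 0.0], "d": [9.0, 9.0]},
]
answers = dprs_search(slaves, [0.0, 0.0], 1.0)
```

In a real deployment the slave calls would be remote requests and the merged answers would be sent on in priority order; the thread pool only mimics the parallel fan-out.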

6. Experiments

In this section, we conduct extensive simulation experiments to demonstrate the efficiency of the proposed DPRS method.

The mobile client is powered by a Qualcomm® Snapdragon™ 650 processor with a 1.8 GHz quad-core CPU and a 5.9-inch full HD 1080p screen. The client system is built using the Java programming language and runs on the Android platform [33]. The master and slave nodes have 1 Gbps network connections. On the slave nodes, the IB (RIB and NIB) replicas with varied transmission priorities are stored in a file system, and some structured data are stored in MySQL [34]. Each of these nodes is equipped with a 2.7 GHz quad-core Xeon CPU, 2.0 GB of RAM, and a 1 TB hard disk. The wireless network communication rate ranges from 10 Mbps to 100 Mbps.

The experimental dataset comes from the Affiliated Hangzhou First People’s Hospital, Zhejiang University School of Medicine, and it contains 50,000 CTISs used to diagnose various types of lesions. Table 2 shows the distribution proportion of the lesion organs.

6.1. A Demo for Prototype System

Figure 10 depicts a demonstration of the prototype system. An example of the CTIS preprocessing backend interface is shown in Figure 10(a), in which a PR has been marked by a blue rectangle. In Figure 10(b), a CTIS with the category “stomach” has been submitted as a retrieval sequence. Six result CTISs were quickly retrieved, and their matching IBs are restored and shown.

6.2. Effectiveness of the DPRS Method

In the first experiment, we use three categories of CTISs (i.e., lung, leg, and heart) as experimental data to demonstrate the effectiveness of our DPRS method. To evaluate the retrieval effectiveness objectively, we adopt two metrics, recall and precision, defined as

$$\mathrm{recall} = \frac{|rel \cap ret|}{|rel|}, \qquad \mathrm{precision} = \frac{|rel \cap ret|}{|ret|},$$

where rel refers to the set of ground-truth CTISs and ret means the set of result CTISs returned by a similarity range search.
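
The two metrics can be computed directly from the two sets, as in the following minimal sketch. The function name and the convention of returning 0.0 for an empty set are assumptions for illustration.

```python
from typing import Set, Tuple


def precision_recall(rel: Set[int], ret: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for a ground-truth set rel and a
    result set ret; an empty denominator yields 0.0 by convention."""
    hits = len(rel & ret)
    precision = hits / len(ret) if ret else 0.0
    recall = hits / len(rel) if rel else 0.0
    return precision, recall


# Four relevant CTISs, three returned, two of them relevant.
p, r = precision_recall({1, 2, 3, 4}, {3, 4, 5})
```

In this toy case two of the three returned CTISs are relevant and two of the four relevant CTISs are returned, so precision is 2/3 and recall is 1/2.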

Figure 11 compares the retrieval effectiveness of our technique on the three categories of CTISs. For each organ (i.e., lung, leg, and heart), 10 retrieval CTISs are chosen at random from the database. The figure shows that the precision decreases steadily as the recall increases.

6.3. Effect of Data Size

This experiment investigates the effect of data size (i.e., the number of CTISs) on the retrieval efficiency. The network bandwidth is 100 Mbps, the number of slave nodes is 20, and the UDI framework is used. As shown in Figure 12, the overall response time rises steeply at first as the data size increases and then grows more slowly. The reason is that the benefit of the index becomes more pronounced as the amount of data grows.

6.4. Evaluation of Data Distribution Scheme

In this experiment, we empirically investigate the effect of the data distribution scheme on DPRS processing performance. Method 1 uses the SSL-based data distribution scheme, and method 2 distributes CTISs randomly across multiple slave nodes. The network bandwidth is assumed to be relatively steady (e.g., 100 Mbps), and the retrieval radius is fixed. Figure 13 shows that, as the number of slave nodes in the retrieval process grows, the total response time of method 1 is smaller than that of method 2, and the performance gap widens markedly as more slave nodes are added. This is because the load balance of method 1 is superior to that of method 2, particularly when the number of slave nodes is large.

6.5. Evaluation of the UDI Framework

The last experiment evaluates the effect of the UDI framework on retrieval performance. Here, method 1 employs the UDI, while method 2 linearly scans the CTISs at each slave node to obtain the answers. With a data size of 50,000 and a network bandwidth of 100 Mbps, the gap between the response times of the two approaches widens as the number of slave nodes increases from 20 to 100 (see Figure 14). This is because finding the answer CTISs is significantly faster with the UDI framework than without it, especially when there are many slave nodes.

7. Conclusions

In this paper, we introduced the DPRS method, a progressive distributed and parallel similarity retrieval scheme for CTISs that sustains high-performance, privacy-preserving similarity retrieval under low and unstable network bandwidth. To minimize the image transmission cost and enhance the retrieval reliability, we also devised four supporting techniques: (1) PCTI-based similarity measurement, (2) a lightweight privacy-preserving strategy, (3) an SSL-based data distribution scheme, and (4) the UDI framework. The experimental results show that the DPRS method retrieves CTISs with lower network communication costs and higher I/O and CPU parallelism while maintaining security.

Data Availability

The raw data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported in part by the Natural Science Foundation of Zhejiang Province, China, under Grant No. LY22F020010 and the Zhejiang Province Public Welfare Technology Application Research Project under Grant No. LGF22H180039.