Abstract

The Internet of Things has a wide range of applications in the medical field. Due to the heterogeneity of medical data generated by different hospitals, it is very important to analyze and integrate data from different institutions. Functional magnetic resonance imaging (fMRI) is widely used in clinical medicine and cognitive neuroscience, and resting-state fMRI (rs-fMRI) can help reveal functional biomarkers of neurological disorders for computer-assisted clinical diagnosis and prognosis. Recently, retrieving similar images or case histories from large-scale medical image repositories acquired at multiple sites has attracted widespread attention in intelligent disease diagnosis. Although using multisite data effectively increases the sample size, it inevitably introduces data heterogeneity across sites. To address this problem, we propose a multisite fMRI retrieval (MSFR) method that uses a deep hashing approach and an optimal transport-based domain adaptation strategy to mitigate multisite data heterogeneity for accurate fMRI search. Specifically, for a given target domain site and multiple source domain sites, our approach uses a deep neural network to map the source and target domain data into a latent feature space and minimizes their Wasserstein distance to reduce the distribution difference. We then use the source domain data to learn high-quality hash codes through a global similarity metric, thereby improving the performance of cross-site fMRI retrieval. We evaluated our method on the publicly available Autism Brain Imaging Data Exchange (ABIDE) dataset. Experimental results show the effectiveness of our method in resting-state fMRI retrieval.

1. Introduction

With the rapid construction of digital hospitals, a large number of medical images are generated in hospitals every day. Fusing medical images from different hospitals and establishing an effective medical image retrieval system can reduce the workload of doctors to a certain extent, assist doctors in diagnosis, and strengthen cooperation and exchange between hospitals. Autism spectrum disorder (ASD) is a common neurodevelopmental disorder that usually emerges in early childhood. The main symptoms of this disease are social and verbal communication difficulties, narrow interests, stereotyped behavior, and impaired self-care ability [1]. According to the Centers for Disease Control and Prevention (CDC), one in every 59 American children was diagnosed with ASD in 2018, with the prevalence continuing to rise (https://www.cdc.gov/ncbddd/autism/data). The American Autism Association estimates that the lifetime treatment costs of ASD run into the millions of dollars, bringing a heavy burden on patients and their families [2]. According to the World Health Organization, ASD has become one of the major diseases seriously affecting quality of life and physical health. To assist clinical diagnosis, it is important to develop an effective medical image retrieval system that leverages previous cases or medical images for better diagnosis and treatment of ASD. Several studies have shown that neuroimaging can help identify imaging biomarkers and pathological changes in the brain, and it thus holds great promise for the diagnosis of ASD.

Magnetic resonance imaging (MRI) techniques have been widely used in the analysis of brain diseases such as ASD [3, 4], Alzheimer's disease [5–7], Parkinson's disease, and others [8–11]. Previous studies have revealed that behavioral and cognitive deficits in people with ASD are closely related to abnormal connectivity in brain networks, including both hyperconnectivity and underconnectivity [12–14]. Currently, the rs-fMRI technique is the best noninvasive tool for observing changes of neural activity in brain networks. The most widely used rs-fMRI approach is the blood oxygen level-dependent (BOLD) method, which measures brain neural activity by detecting changes in blood flow while subjects perform no specific task. Supervised learning usually rests on a strong assumption of independent and identically distributed data, i.e., that all training and test data are sampled independently from the same unknown distribution. However, this assumption is difficult to meet in most real-world settings. For multisite studies, the data distributions differ across research institutions due to differences in equipment, acquisition parameters, and the demographic characteristics of the collected cohorts. Although many studies have demonstrated the effectiveness of rs-fMRI-based machine learning methods for automated diagnosis of ASD [15], most of them ignore the heterogeneity of fMRI across imaging sites, which can significantly degrade the generalization ability of models.

Conventional approaches to using multisite data usually train on data from each site separately and test on the other sites [16], or simply mix data from all sites [17]. These approaches clearly ignore the heterogeneity of data across sites and lead to poor model generalization. Other approaches take the data heterogeneity across imaging sites into account and alleviate it through appropriate domain adaptation methods [18, 19]. Typically, these approaches train the model on the source domain, perform domain adaptation on both source and target domain data, and finally test the model on the target domain data. As shown in Figure 1, a model trained on source domain data may perform poorly in the target domain when the distribution difference between the two is significant. It is therefore helpful to improve model generalization by bringing the source and target distributions closer together through appropriate domain adaptation. Furthermore, these methods can be divided into semisupervised and unsupervised domain adaptation, depending on whether labels of the target domain can be used. Specifically, semisupervised domain adaptation requires labels for part of the target domain data in addition to the necessary source domain labels. However, labeling data is labor-intensive, costly, and time-consuming, so unsupervised domain adaptation approaches are more widely used.

Existing unsupervised domain adaptation methods can be broadly divided into two categories. Many approaches use a predefined distribution divergence [20], such as maximum mean discrepancy (MMD) [21, 22], cosine similarity, Kullback–Leibler divergence, mutual information, or higher-order moments. Others are based on generative adversarial networks (GANs), learning domain-invariant features with which the feature extractor tries to fool a domain discriminator [23, 24]. Optimal transport (OT) is another popular approach [25, 26] that seeks a probabilistic coupling minimizing the cost of transporting the source distribution to the target distribution. The coupling is used to transform the source data through an estimated mapping (called the barycentric mapping), which ultimately brings the source and target domain distributions closer together.

Most current retrieval works based on medical images such as MRI and CT rely on natural image retrieval techniques. Owing to the powerful nonlinear representational capability of deep learning and the advantages of hash codes in storage and fast search, deep hash learning for image retrieval has achieved promising performance in both retrieval accuracy and speed. Deep hashing methods can generally be classified into (1) unsupervised hashing [27] and (2) supervised hashing [28, 29], according to whether supervised information is used. Unsupervised hashing usually exploits topological information and the data distribution to learn hash functions but often requires longer hash codes to obtain good retrieval accuracy. Supervised hashing can learn higher-quality hash functions from supervised information and therefore outperforms unsupervised hashing. Existing supervised hashing methods typically learn hash functions from pairwise or triplet relations, which capture data similarity only locally. A recent paper introduced the concept of a hash center to compute a central (global) similarity metric, encouraging hash codes of similar sample pairs to approach the same hash center and those of dissimilar pairs to converge to different hash centers, thus improving hash learning efficiency and retrieval accuracy [28]. However, achieving truly end-to-end deep hash training remains difficult, since the sign function used for binarization leads to vanishing gradients. Most approaches therefore employ a relaxation scheme in place of the strict discrete constraint [28–30], but such relaxations usually lead to suboptimal hash codes due to quantization errors. Su et al. [31] designed a hash coding layer to learn hash codes directly: it applies a strict sign function in forward propagation to maintain the discrete constraint and transmits the gradients intact to the preceding layer in backward propagation to avoid vanishing gradients.

In this article, we propose a multisite fMRI retrieval (MSFR) method. To achieve better retrieval results, MSFR first reduces the heterogeneity of rs-fMRI across sites and then employs a deep hashing method for fMRI retrieval, in which a novel hash coding layer introduces a classification loss that assists hash code learning. As shown in Figure 2(b), we design a multiway parallel framework for domain adaptation retrieval. Specifically, we assign a structurally identical deep neural network to each source domain. On this basis, an optimal transport module is used to reduce the discrepancy between each source domain and the target domain while hash function learning is performed. Experimental results on rs-fMRI scans from the publicly available Autism Brain Imaging Data Exchange (ABIDE) dataset demonstrate the effectiveness of the proposed method. To our knowledge, this is among the first studies to apply domain adaptation methods to multisite functional MRI retrieval.

Major contributions of this work can be summarized as follows. (1) We propose to reduce the marginal distribution discrepancy between the source domain sites and the target domain site through optimal transport theory, alleviating the data heterogeneity across sites. (2) We develop a method to learn high-quality hash functions using a hashing scheme with a global similarity metric, combined with a classification loss introduced through a novel hash coding layer. (3) We introduce a parallel framework of independent, non-weight-sharing submodels (one per source domain) to further improve multisite fMRI retrieval.

The rest of this paper is organized as follows. In Section 2, we review relevant studies. In Section 3, we introduce the materials and present the proposed method. In Section 4, we describe the experimental settings and report the experimental results. In Section 5, we perform an ablation study, analyze the influence of two hyperparameters, and discuss the limitations of the current work. Section 6 concludes this paper.

2. Related Work

Unlike general natural image datasets, medical images are more difficult to collect because of privacy concerns and the high level of expertise required for annotation. Data from multiple institutions are therefore typically pooled to form larger datasets and to facilitate research. However, multisite data introduces the problem of data heterogeneity, so domain adaptation has attracted extensive research in disease diagnosis and analysis in recent years as a way to reduce data heterogeneity and improve model performance. For example, Wang et al. [32] proposed a multisite ASD detection algorithm. To eliminate intersite data heterogeneity, they first divided the multisite training data into two groups according to whether a sample was ASD or healthy control (HC); a similarity-based multiview linear reconstruction model was then used to learn latent representations and cluster them within each group, and a nested singular value decomposition (SVD) method was designed to mitigate the heterogeneity of the data across sites. Zhang et al. [33] aligned feature and label distributions based on optimal transport theory to reduce multisite heterogeneity in ASD diagnosis and achieved better classification performance. Many studies have demonstrated that domain adaptation can improve the efficiency of medical image analysis, so it is natural that domain adaptation becomes an essential component of multisite learning tasks.

Medical image retrieval can help medical practitioners quickly and accurately retrieve relevant data from vast amounts of medical images and detect brain abnormalities in a timely manner, enabling automatic prediction and diagnosis of brain diseases so that effective preventive measures can be taken. Much work has been done on medical image retrieval. For example, Peng et al. [34] designed a retrieval model that combines a triplet loss with existing deep Cauchy hashing methods to accelerate nearest neighbor search in Hamming space and tested it on a colorectal cancer (CRC) histology dataset.

3. Materials and Method

3.1. Materials and Image Preprocessing

In this work, we used rs-fMRI data from the international Autism Brain Imaging Data Exchange (ABIDE) repository (http://preprocessed-connectomes-project.org/abide/), a publicly available online project that provides imaging data from ASD and healthy control participants together with their phenotypic information [35]. It comprises 17 international sites with 1112 subjects, including 539 subjects with ASD and 573 healthy controls (HCs). Considering the limited amount of data at some sites, we selected the 4 of the 17 sites with sample sizes above 50 for our experiments, namely, Leuven, NYU, UCLA, and UM. These 4 imaging sites contribute 408 subjects, including 228 ASD patients and 180 HCs. Table 1 shows the demographic information and imaging parameters of the four sites.

To improve the signal-to-noise ratio of rs-fMRI and to better extract blood oxygen level-dependent (BOLD) time series, we used the rs-fMRI data provided by the Preprocessed Connectomes Project (http://preprocessed-connectomes-project.org/abide/), preprocessed with the Configurable Pipeline for the Analysis of Connectomes (C-PAC) [36]. This preprocessing pipeline includes slice-timing correction, motion correction, nuisance signal regression, temporal filtering, and spatial normalization to the Montreal Neurological Institute (MNI) template. The brain was then divided into 116 regions of interest (ROIs) using the automated anatomical labeling (AAL) atlas [37]. Subsequently, for each subject, the mean BOLD time series of each brain region was extracted, and the Pearson correlation coefficients between the individual ROIs were calculated, resulting in a symmetric 116 × 116 matrix: the resting-state functional connectivity matrix. To use these data as model input, we obtained the functional connectivity features by retaining only the elements of the upper triangle of the functional connectivity matrix and flattening them into a 6,670-dimensional feature vector representing each subject. The construction process of the functional connectivity features is shown in Figure 2(a).
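To make the feature construction concrete, the following minimal NumPy sketch (our own illustration with hypothetical variable names and a toy input, not the authors' released code) computes the Pearson correlation matrix over 116 ROI time series and vectorizes its upper triangle into the 6,670-dimensional representation described above:

```python
import numpy as np

def connectivity_features(bold):
    """bold: (T, 116) array of mean BOLD time series (T time points, 116 AAL ROIs)."""
    fc = np.corrcoef(bold.T)            # (116, 116) Pearson correlation matrix
    iu = np.triu_indices_from(fc, k=1)  # strictly upper triangle: 116*115/2 = 6670 entries
    return fc[iu]                       # (6670,) feature vector for one subject

features = connectivity_features(np.random.randn(200, 116))  # toy time series
assert features.shape == (6670,)
```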

3.2. Proposed Method

The overall framework of our proposed MSFR is illustrated in Figure 2. It consists of two major parts: (a) constructing the functional connectivity matrix and extracting functional connectivity features for each subject and (b) learning the entire hash retrieval model and acquiring the retrieval database and query sample hash codes. The training of the hash retrieval model consists of two important processes: (1) reducing the marginal distribution discrepancy between source and target domain based on optimal transport theory to alleviate the problem of data heterogeneity between different sites and (2) learning high-quality hash functions using a hashing method with a global similarity metric combined with a classification loss introduced by a novel hash coding layer. More detailed descriptions are given below.

3.2.1. Optimal Transport-Based Multisite fMRI Adaptation

Given $K$ labeled source domains $\{\mathcal{D}_{s}^{k}\}_{k=1}^{K}$ with $\mathcal{D}_{s}^{k}=\{(\mathbf{x}_{i}^{s_{k}}, y_{i}^{s_{k}})\}_{i=1}^{n_{k}}$, where $\mathbf{x}_{i}^{s_{k}}$ denotes the functional connectivity features of a subject and $y_{i}^{s_{k}}$ is the category label associated with the $k$-th source domain data, the unlabeled target domain is represented as $\mathcal{D}_{t}=\{\mathbf{x}_{j}^{t}\}_{j=1}^{n_{t}}$, where $n_{t}$ is the number of target subjects. In this work, we adopt a traditional assumption in unsupervised multisource domain adaptation, namely that the conditional probability distributions coincide, $P_{s}^{k}(y \mid \mathbf{x})=P_{t}(y \mid \mathbf{x})$, but the marginal probability distributions differ, $P_{s}^{k}(\mathbf{x}) \neq P_{t}(\mathbf{x})$ [38].

Recently, Damodaran et al. [39] combined deep learning with optimal transport- (OT-) based domain adaptation for a classification task, learning a new representation that brings the source and target domains into the same distribution while preserving the discriminative ability of the classifier. Their DeepJDOT uses a deep convolutional neural network $g$ to map the input to a latent feature space and a classifier $f$ to map the data in that space to the label space over the target domain. By jointly optimizing the feature space and label space to solve for the coupling $\gamma$ and the model $f \circ g$, DeepJDOT achieves good performance on the target domain.

Inspired by this, our work draws on DeepJDOT but differs in that we consider only the marginal distributions. This is because obtaining a robust hash function is more important in our retrieval task than obtaining a robust classification model; moreover, aligning the conditional distributions as in DeepJDOT assumes that the target domain samples are predicted accurately enough, which requires a more robust classifier and differs from our task. On this basis, in each submodel we embed the feature extractor $g_k$ into the optimization of the optimal transport coupling between the marginal distributions $\mu_{s}^{k}$ and $\mu_{t}$.

The loss function of our domain adaptation component between source domain $k$ and the target domain is accordingly defined as
$$\mathcal{L}_{ot}^{k}=\min _{\gamma^{k} \in \Pi\left(\mu_{s}^{k}, \mu_{t}\right)} \sum_{i, j} \gamma_{i j}^{k}\left\|g_{k}\left(\mathbf{x}_{i}^{s_{k}}\right)-g_{k}\left(\mathbf{x}_{j}^{t}\right)\right\|^{2},$$
where $\Pi(\mu_{s}^{k}, \mu_{t})$ denotes the set of probabilistic couplings with marginals $\mu_{s}^{k}$ and $\mu_{t}$.
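As a concrete illustration, the following sketch (ours, not the authors' released code) computes this loss for one source/target batch using the POT library, whose `ot.emd` routine implements the network simplex solver mentioned in Section 3.2.3; variable names are assumptions:

```python
import numpy as np
import ot      # POT: Python Optimal Transport (pip install POT)
import torch

def ot_loss(zs, zt):
    """zs: (ns, d) source features g_k(x^s); zt: (nt, d) target features g_k(x^t)."""
    C = torch.cdist(zs, zt, p=2) ** 2                     # squared Euclidean cost matrix
    ns, nt = zs.size(0), zt.size(0)
    a, b = np.full(ns, 1.0 / ns), np.full(nt, 1.0 / nt)   # uniform marginals
    # Exact coupling via the network simplex solver; no gradient flows through gamma
    gamma = ot.emd(a, b, C.detach().cpu().numpy())
    gamma = torch.as_tensor(gamma, dtype=C.dtype, device=C.device)
    # With gamma fixed, the transport cost is differentiable w.r.t. the features
    return torch.sum(gamma * C)
```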

3.2.2. Central Similarity Metric-Based Hashing Learning

While existing supervised hashing methods typically capture data similarity only locally, we build on the central similarity metric proposed by Yuan et al. [28] to capture global similarity between data while using a novel hash coding layer to introduce classification loss to accelerate the convergence of the retrieval model.

Specifically, we define the hash centers as a set of points $\mathcal{C}=\{c_{1}, \ldots, c_{q}\} \subset\{0,1\}^{K}$ in the $K$-dimensional Hamming space. Note that the hash centers must be sufficiently distant from each other in Hamming space: similar samples should be clustered around the same center and dissimilar samples around different centers, so that similar samples are aggregated and dissimilar samples are well separated. In the $K$-dimensional Hamming space, the average pairwise distance between hash centers satisfies
$$\frac{1}{T} \sum_{i \neq j} d_{H}\left(c_{i}, c_{j}\right) \geq \frac{K}{2},$$
where $T$ denotes the number of distinct combinations of hash centers $c_i$ and $c_j$, $q$ is the number of hash centers, and $d_{H}(\cdot, \cdot)$ is the Hamming distance. For single-label data, a corresponding number of hash centers is generated based on the number of categories $q$. Specifically, each bit of a hash center is sampled from the Bernoulli distribution Bern(0.5). After that, each sample is assigned the hash center corresponding to its category, yielding the semantic hash centers $\mathcal{C}^{\prime}=\{c_{1}^{\prime}, \ldots, c_{n}^{\prime}\}$, where $c_{i}^{\prime}$ is the hash center of sample $\mathbf{x}_{i}$.
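A minimal sketch of this center-generation procedure, assuming a binary (ASD/HC) labeling and an illustrative code length, is given below; it resamples until the average pairwise Hamming distance condition is met:

```python
import numpy as np

def generate_hash_centers(num_classes, K, seed=0):
    """Sample {0,1}^K centers bitwise from Bern(0.5) until they are far enough apart."""
    rng = np.random.default_rng(seed)
    while True:
        centers = rng.binomial(1, 0.5, size=(num_classes, K))
        # Average Hamming distance over all distinct center pairs
        dists = [np.sum(centers[i] != centers[j])
                 for i in range(num_classes) for j in range(i + 1, num_classes)]
        if np.mean(dists) >= K / 2:   # the separation condition above
            return centers

centers = generate_hash_centers(num_classes=2, K=48)   # one center per class (ASD/HC)
```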

The hashing learning objective can be obtained by maximizing the logarithmic posterior of the hash codes. Specifically, given the semantic hash centers $\mathcal{C}^{\prime}$, the logarithmic maximum a posteriori estimate of the hash codes $\mathbf{H}=\{h_{1}, \ldots, h_{n}\}$ of the training data is obtained by maximizing
$$\log P\left(\mathbf{H} \mid \mathcal{C}^{\prime}\right) \propto \log P\left(\mathcal{C}^{\prime} \mid \mathbf{H}\right)+\log P(\mathbf{H}),$$

where $h_{i}=H(\mathbf{x}_{i})$ is obtained by a hash function $H(\cdot)$ mapping the data from the input space to the $K$-dimensional Hamming space, $P(\mathbf{H})$ is the prior distribution of the hash codes, and $P(\mathcal{C}^{\prime} \mid \mathbf{H})$ is the likelihood function modelled as a Gibbs distribution. We then obtain the central similarity loss for source domain $k$:
$$\mathcal{L}_{C}^{k}=-\frac{1}{K} \sum_{i=1}^{n_{k}} \sum_{m=1}^{K}\left[c_{i, m}^{\prime} \log h_{i, m}+\left(1-c_{i, m}^{\prime}\right) \log \left(1-h_{i, m}\right)\right],$$
where $c_{i, m}^{\prime}$ is the $m$-th bit of the hash center of the $i$-th sample in the $k$-th source domain and $h_{i, m} \in(0,1)$ is the correspondingly rescaled bit of its hash code. It is worth noting that $h_{i}$ in the training phase is not a truly binary code. Binary hash codes would require a sign function appended after the hash coding layer, but the sign function causes gradients to vanish, making the optimization NP-hard. A common practice is to use the tanh function in place of the discrete constraint of the sign function. However, such a relaxation scheme generates quantization errors, so the bimodal Laplace prior proposed in the deep hashing network for efficient similarity retrieval (DHN) [40] is introduced here for quantization; and since $|\cdot|$ is nonsmooth and its derivative is difficult to calculate, we replace it with the smooth $\log \cosh$ function. The quantization loss is defined as
$$\mathcal{L}_{Q}^{k}=\sum_{i=1}^{n_{k}} \sum_{m=1}^{K} \log \cosh \left(\left|h_{i, m}\right|-1\right).$$
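The two losses above can be written compactly in PyTorch. The sketch below is our own hedged illustration: `h` is the tanh output of the hash coding layer in $(-1,1)$, rescaled to $(0,1)$ inside the cross-entropy term, and the quantization term uses the smooth $\log\cosh$ surrogate:

```python
import torch

def central_similarity_loss(h, c, eps=1e-7):
    """h: (n, K) tanh outputs in (-1, 1); c: (n, K) float hash centers in {0, 1}."""
    p = (h + 1.0) / 2.0                                   # rescale to (0, 1)
    bce = -(c * torch.log(p + eps) + (1 - c) * torch.log(1 - p + eps))
    return bce.mean()                                     # L_C, averaged over bits and samples

def quantization_loss(h):
    """Smooth log-cosh surrogate of the bimodal Laplace prior, pushing |h| toward 1."""
    return torch.log(torch.cosh(h.abs() - 1.0)).mean()    # L_Q
```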

The hashing method based on the central similarity metric still suffers from quantization error, so we introduce the novel hash coding layer from Greedy Hash [31] and utilize a classification loss to assist in learning better hash codes. Specifically, the sign function (i.e., $b_{i}=\operatorname{sgn}(h_{i})$) is applied strictly to the output of the hash coding layer in forward propagation, while the gradient is transmitted directly to the previous layer during backward propagation, effectively preventing the gradient from vanishing. Thus, the classification loss using cross-entropy in source domain $k$ can be defined as
$$\mathcal{L}_{cls}^{k}=\frac{1}{n_{k}} \sum_{i=1}^{n_{k}} \ell_{ce}\left(f_{k}\left(\operatorname{sgn}\left(h_{i}\right)\right), y_{i}^{s_{k}}\right),$$
where $\ell_{ce}(\cdot, \cdot)$ is the cross-entropy and $f_{k}$ is the classifier. We define the overall loss function as
$$\mathcal{L}=\sum_{k=1}^{K}\left(\mathcal{L}_{C}^{k}+\mathcal{L}_{Q}^{k}+\alpha \mathcal{L}_{ot}^{k}+\beta \mathcal{L}_{cls}^{k}\right),$$

where $\alpha$ and $\beta$ are parameters that control the contributions of the OT domain adaptation loss and the classification loss, respectively.
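For clarity, the hash coding layer borrowed from Greedy Hash [31] can be expressed as a custom autograd function: a strict sign in the forward pass and an identity (straight-through) gradient in the backward pass. The snippet below is a sketch of this idea, not the authors' code:

```python
import torch

class SignHash(torch.autograd.Function):
    """Strict sign in the forward pass; identity (straight-through) gradient backward."""
    @staticmethod
    def forward(ctx, h):
        return torch.sign(h)          # true binary codes in {-1, +1}

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # gradients pass to the previous layer intact

h = torch.randn(8, 48, requires_grad=True)   # toy hash-layer activations
b = SignHash.apply(h)                        # binary codes fed to the classifier f_k
b.sum().backward()                           # gradients reach h without vanishing
```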

3.2.3. Alternating Optimization Algorithm

In the training phase, we randomly sample a batch from the target domain and a batch from one of the source domains and assign a submodel to that source domain. Here, each submodel consists of a feature extractor $g_{k}$, a hash coding layer $h$, and a classification layer $f_{k}$. The following alternating optimization is then performed:

Step A: fixing the parameters of the submodel (i.e., $g_{k}$, $h$, and $f_{k}$), the domain adaptation part becomes a standard OT problem, and the coupling can be obtained with the network simplex flow algorithm:
$$\gamma^{k}=\underset{\gamma \in \Pi\left(\mu_{s}^{k}, \mu_{t}\right)}{\arg \min } \sum_{i, j} \gamma_{i j}\left\|g_{k}\left(\mathbf{x}_{i}^{s_{k}}\right)-g_{k}\left(\mathbf{x}_{j}^{t}\right)\right\|^{2}.$$

Step B: let $\theta_{k}$ denote all parameters of submodel $k$ to be optimized. With the coupling $\gamma^{k}$ fixed from the previous step, the submodel can be updated using stochastic gradient descent (SGD) on the objective
$$\min _{\theta_{k}}\; \mathcal{L}_{C}^{k}+\mathcal{L}_{Q}^{k}+\alpha \mathcal{L}_{ot}^{k}\left(\gamma^{k}\right)+\beta \mathcal{L}_{cls}^{k}.$$
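Putting the two steps together, a condensed and deliberately simplified training loop for one submodel might look as follows, reusing the hypothetical helpers sketched above (`ot_loss`, `central_similarity_loss`, `quantization_loss`, `SignHash`); `extractor`, `hash_layer`, `classifier`, `optimizer`, `loader`, `centers` (a label-indexed float tensor of hash centers), `alpha`, and `beta` are assumed to be defined:

```python
import torch
import torch.nn.functional as F

for xs, ys, xt in loader:                       # paired source/target mini-batches
    zs, zt = extractor(xs), extractor(xt)       # latent features g_k(.)
    hs = torch.tanh(hash_layer(zs))             # relaxed codes in (-1, 1)
    loss = (central_similarity_loss(hs, centers[ys])          # L_C
            + quantization_loss(hs)                           # L_Q
            + alpha * ot_loss(zs, zt)                         # Step A solved inside
            + beta * F.cross_entropy(                         # L_cls on strict codes
                classifier(SignHash.apply(hash_layer(zs))), ys))
    optimizer.zero_grad()
    loss.backward()                             # Step B: SGD with the coupling fixed
    optimizer.step()
```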

3.2.4. Implementation

Our proposed MSFR model is implemented in Python with PyTorch. Submodels are instantiated according to the number of source domains, and each submodel has exactly the same structure, including a feature extractor $g_{k}$, a hash coding layer $h$, and a classification layer $f_{k}$. Here, $g_{k}$ consists of two fully connected layers, each followed by a ReLU activation function. $h$ consists of a single fully connected layer whose number of neurons is determined by the length of the hash code (we evaluate four hash code lengths in the experiments), followed by a tanh function or a sign function according to the needs of the task. Specifically, the hash coding layer is followed by the tanh activation function when learning the global similarity of samples using the central similarity metric, and by the sign function when training the classifier directly on binary hash codes. Note that the sign function is used only in forward propagation; the gradient is transmitted directly to the previous layer during backward propagation to prevent it from vanishing. $f_{k}$ consists of a fully connected layer of two neurons and a softmax layer. We used the SGD optimizer and fixed the batch size, learning rate, and the hyperparameters $\alpha$ and $\beta$ across all experiments.
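An illustrative PyTorch skeleton of one submodel is shown below; the hidden-layer sizes are placeholders, since the exact neuron counts were not recoverable from the text, and the softmax of $f_k$ is folded into the cross-entropy loss:

```python
import torch.nn as nn

class SubModel(nn.Module):
    def __init__(self, in_dim=6670, hidden=(1024, 512), code_len=48):
        super().__init__()
        self.extractor = nn.Sequential(             # g_k: two FC + ReLU layers
            nn.Linear(in_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU())
        self.hash_layer = nn.Linear(hidden[1], code_len)   # h: one FC layer
        self.classifier = nn.Linear(code_len, 2)           # f_k (softmax in the loss)

    def forward(self, x):
        z = self.extractor(x)
        return z, self.hash_layer(z)   # tanh or sign applied outside, per task
```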

4. Experiments

4.1. Experimental Setup

We used data from four sites in the ABIDE database (Leuven, NYU, UCLA, and UM) to evaluate the effectiveness of our method. In the training stage, we select one site in turn as the target domain, and the remaining three sites are used as the source domains. In the test stage, the retrieval database consists of the hash codes of all source domain data, and the query samples consist of the hash codes of the target domain data.

Following previous work [30, 41, 42], we use the precision-recall curve (PR curve), precision curves with respect to different numbers of top returned samples, and mean average precision (mAP) as evaluation metrics to measure retrieval effectiveness. The mAP is the most widely used metric for the Hamming ranking protocol. It is computed as the mean of the average precision (AP) over all queries, where AP is computed as
$$A P=\frac{\sum_{k=1}^{N} P(k)\, \delta(k)}{\sum_{k=1}^{N} \delta(k)},$$
where $N$ is the number of top search results returned and $P(k)$ denotes the precision of the top $k$ returned results. $\delta(k)=1$ means that the $k$-th returned result is correct, and $\delta(k)=0$ otherwise. Here, the mAP value is calculated based on the first half of the returned neighbors from the source domain sites; namely, assuming the number of samples in the database is $M$, then $N=M / 2$.
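The following NumPy sketch (ours) makes the metric concrete: it ranks the database by Hamming distance to a query code and evaluates AP over the first half of the returned neighbors, as defined above:

```python
import numpy as np

def average_precision(query_code, query_label, db_codes, db_labels):
    """AP over the first half of the Hamming-ranked database (N = M/2)."""
    N = db_codes.shape[0] // 2
    hamming = np.count_nonzero(query_code != db_codes, axis=1)
    order = np.argsort(hamming)[:N]                           # Hamming-distance ranking
    delta = (db_labels[order] == query_label).astype(float)   # delta(k)
    if delta.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(delta) / (np.arange(N) + 1.0)  # P(k)
    return float(np.sum(precision_at_k * delta) / delta.sum())
```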

4.2. Competing Methods

In our experiments, we compare our MSFR with the following 12 methods: six deep hashing methods, four traditional hashing methods, one deep domain adaptation hashing method, and one traditional domain adaptation hashing method. A brief description of each follows.

(1) DHN [40] uses a pairwise cross-entropy loss to ensure similarity learning between samples and a quantization loss to control the quality of hash learning. To eliminate the error caused by using the inner product instead of the Hamming distance, the authors adopt a bimodal Laplacian prior, whose probability density is largest at the values ±1, meaning the hash function learns codes near ±1 with maximum probability. We use the default value of 10 for the diversity parameter of the bimodal Laplacian prior.

(2) HashNet [43] improves on DHN in two main ways: first by balancing positive and negative sample pairs, and second by using a scaled tanh with an increasing scale parameter in the relaxed quantization phase so that the output keeps approaching the sign function, i.e., $\lim_{\beta \rightarrow \infty} \tanh (\beta z)=\operatorname{sgn}(z)$.

(3) LCDSH [44] points out that directly reducing the quantization error changes the feature distribution of the neural network, which in turn changes the similarity between query and retrieved images; the authors propose a locally constrained deep supervised hashing algorithm to solve this problem. We use the default value of 3 for the trade-off parameter that balances discriminability and the locality constraint.

(4) DCH [30] also uses a similarity-control loss and a quantization-control loss, differing in its use of a sharply varying Cauchy distribution to convert distance into similarity, thus making the learned hash codes more discriminative. We use the default value of 20 for the Cauchy distribution scale parameter, which controls the balance between precision and recall.

(5) QSMIH [45] uses quadratic mutual information (QMI), an information-theoretic measure, to learn hash codes. To scale to large retrieval tasks and further improve retrieval accuracy, quadratic spherical mutual information (QSMI) is proposed on this basis. We use the default value of 0.01 for the parameter that forces the learned hash codes to be close to 1 or -1.

(6) DTSH [46] uses a triplet-label likelihood function to learn hash codes; maximizing the triplet-label likelihood makes query samples more similar to positive samples and more distinguishable from negative samples. We use the default value of 1 for the hyperparameter that balances the negative log triplet likelihood and the quantization error.

(7) TAH [42] is a deep domain adaptation hashing algorithm that combines a pairwise t-distribution cross-entropy loss to learn concentrated hash codes with an adversarial network to align the data distributions of the source and target domains. The trade-off parameter between the maximum posterior loss and the adversarial learning loss, i.e., the penalty of the adversarial network, is increased gradually from 0 to 1.

(8) SH [47] views the process of encoding image feature vectors as a graph partitioning problem, for which a relaxed solution can be obtained by analyzing the eigenvalues and eigenvectors of the Laplacian matrix of the similarity graph.

(9) ITQ [41] first reduces the dimension of the data in the original space using PCA and then maps the data points onto the vertices of a binary hypercube so that the corresponding quantization error is minimized, resulting in an excellent binary encoding of the dataset.

(10) LFH [48] proposes a latent factor-based model that uses the Hamming distance to model the similarity between pairs of samples. To avoid the time-consuming search for hyperparameters on different datasets, suitable hyperparameters are assigned automatically based on the number of samples and the number of similarity labels.

(11) SDH [49] directly learns binary hash codes without relaxation, with the objective of generating hash codes that are optimal for linear classification. Specifically, the training data is first mapped into the Hamming space, and the transformed data is then classified in that space. A key step of the algorithm is to use discrete cyclic coordinate descent (DCC) to generate the hash code bit by bit, which solves the NP-hard binary optimization problem. We use the default value of 1 for the regularization parameter.

(12) GTH [50] is a traditional unsupervised domain adaptation hashing algorithm that considers the hash projection errors in the source and target domains and seeks the maximum likelihood estimate of the errors to reduce the domain discrepancy. The hash projections of the source and target domains are optimized iteratively, with the two influencing each other to reach the final optimal state.

The first seven competing methods (DHN, HashNet, LCDSH, DCH, QSMIH, DTSH, and TAH) are deep hashing approaches; three methods (SH, ITQ, and GTH) are unsupervised traditional hashing methods, and two (LFH and SDH) are supervised traditional hashing methods. In all deep hashing methods, to match our input data and ensure fairness, we replace the convolutional neural network with one consistent with the structure of $g_{k}$, and the hash coding layer is consistent with the setting of $h$. All deep hash retrieval methods are supervised with labels, and each model consists of two parts: a feature extraction part and a hash coding layer that generates the hash codes.

The use of data (including preprocessing and functional connectivity feature extraction) in all comparison methods is consistent with our MSFR. In the training phase, we treat one site as the test set, and the remaining three sites are combined to be used as the training set. In the test phase, the retrieval database consists of hash codes of all source domain data, and the query samples consist of hash codes of the target domain data. We use the default values of hyperparameters in their corresponding articles for all comparison methods.

4.3. Results on ABIDE with Multisite rs-fMRI Data

We evaluate our MSFR method using data from four real sites in the ABIDE database (Leuven, NYU, UCLA, and UM) and compare it with other state-of-the-art hash retrieval methods. In all experiments, data from each site is used in turn as the target domain (test set), and data from the remaining sites are used as the source domains (training set). The hash codes of the target domain data serve as the query set, and the hash codes of all source domain data serve as the retrieval database at test time, with the aim of retrieving samples similar to the query samples from the database. The comparison methods include seven deep hashing algorithms (DHN, HashNet, LCDSH, DCH, QSMIH, DTSH, and TAH) and five traditional hashing algorithms (SH, ITQ, LFH, SDH, and GTH). The experimental results are shown in Table 2 and Figure 3.

As shown in Table 2, our method has the best mAP values for different bits on almost all target domain sites. Taking the hash code length of 48 bits as an example, when Leuven is used as the target domain, our MSFR (74.56%) is 7.78% higher than the best performance achieved by QSMIH (66.78%). When NYU is used as the target domain, our MSFR (69.25%) is 3.85% higher than the best performance of DCH (65.40%). Our MSFR (75.42%) outperforms the best comparison method (i.e., DCH with an mAP of 74.23%) by 1.19% when UCLA is used as the target domain, and our MSFR (75.45%) is 6.84% higher than the best performance yielded by TAH (68.61%) when UM is used as the target domain. This shows that our MSFR can be effective for cross-site retrieval. Furthermore, we can see that TAH has an advantage over other deep comparison methods on almost all target domain sites, especially when the target domain site is UM. This suggests that a proper domain adaptation method can be helpful for multisite fMRI retrieval.

Figure 3 shows that our method achieves the best average results across different hash code lengths when each of the four sites is used as the target domain for retrieval against the other source domains. Specifically, our MSFR attains a higher average mAP than both the deep hashing methods and the traditional hashing methods for all hash code lengths when each of the four sites serves as the query set retrieved against the source domain database. In addition, the traditional hashing methods are generally less effective than the deep hashing methods. A possible reason is that shallow hashing models may not be able to learn discriminative features and compact hash codes, whereas deep hashing methods learn fMRI features and hash codes in an end-to-end manner.

In Figures 4 and 5, we further show the PR curves, recall curves, and precision curves (from left to right) for a fixed hash code length, with NYU and UM as the target domain, respectively, when retrieving source domain samples and returning the top-ranked samples based on the Hamming distance. Figures 4 and 5 show that our MSFR achieves nearly the best results in terms of the PR curve (a), recall curve (b), and precision curve (c) among all compared methods. Specifically, the PR curves in Figures 4(a) and 5(a) show that MSFR achieves essentially the highest precision at all recall levels. Moreover, MSFR achieves higher precision than the other methods at low recall, which is important for accuracy-oriented medical image retrieval systems. Precision curves with respect to different numbers of top returned samples are shown in Figures 4(c) and 5(c). As can be seen, our MSFR achieves almost the best precision, especially when the number of retrieved samples is within 100, indicating more precise retrieval. In medical image retrieval tasks, users tend to pay most attention to the top-ranked results, and MSFR is significantly more accurate than the other methods when the number of returned samples is small.

5. Discussion

5.1. Ablation Study

We introduce three variants of MSFR (denoted MSFR-1, MSFR-2, and MSFR-3) for ablation experiments. Specifically, MSFR-1 retains only the central similarity metric; MSFR-2 retains the central similarity metric and the classification loss; and MSFR-3 retains the central similarity metric and the OT domain adaptation component. In Table 3, we report the mAP results achieved by MSFR and its three variants with different hash code lengths on the four sites of the ABIDE database. As shown in Table 3, MSFR generally outperforms the other three variants, which implies that both the optimal transport domain adaptation loss and the classification loss contribute to model performance. Specifically, MSFR-2 generally outperforms MSFR-1 when the target domain is Leuven, NYU, or UCLA; only when UM is the target domain does its performance drop sharply at hash code lengths of 48 bits and 64 bits, where the results of MSFR-2 are 2.23%, 10.42%, and 6.75% lower than those of MSFR-1 at 24 bits, 48 bits, and 64 bits, respectively, probably due to the large data heterogeneity. When we address the data heterogeneity problem for MSFR-1 and MSFR-2, yielding MSFR-3 and MSFR, respectively, the results improve significantly in all target domains. Moreover, MSFR-3 and MSFR usually outperform MSFR-1 and MSFR-2, implying that OT-based domain adaptation helps improve retrieval performance. With UM as the target domain, MSFR is 5.18% higher than MSFR-3 at a hash code length of 48 bits, and the gap narrows to 1.56% and 4.78% at 24 bits and 64 bits, respectively.

We also conducted single-source-domain experiments, with the UM site as the target domain and a fixed hash code length. The experimental results are shown in Figure 6, from which we can observe that our approach achieves the best overall results. Comparing MSFR-2 and MSFR shows that domain adaptation between the source and target domains significantly improves the retrieval results, further illustrating the effectiveness of our domain adaptation method in alleviating data heterogeneity across sites. The proposed fMRI retrieval framework is thus expected to scale to multiple source domains, reducing data heterogeneity across sites and improving retrieval performance.

5.2. Parameter Analysis

We further analyzed the influence of two hyperparameters of the proposed method. Figure 7 reports the mAP results of our method under different values of the hyperparameter $\alpha$ (for the domain adaptation component) and the hyperparameter $\beta$ (for the classification loss). Figure 7(a) shows that our MSFR yields stable results when $\alpha$ lies within an appropriate range but cannot generate good results outside it. Figure 7(b) shows that the results of MSFR fluctuate only within a small range under different values of $\beta$, suggesting that our method is not very sensitive to this hyperparameter.

5.3. Limitations and Future Work

This work still has some limitations to be addressed in the future. First, to address the data heterogeneity of multiple source and target domains, we designed a multiway parallel framework based on the number of source domains. This means that the larger the number of source domains, the larger the number of submodels, which may ultimately lead to higher overall complexity and training costs. Therefore, a unified model needs to be investigated in the future to solve the problem of high computational cost under a large number of source domains. Second, our model only considers data heterogeneity in the source and target domains, ignoring the fact that there is also data heterogeneity between different source domains. It is interesting to employ multisource domain adaptation strategies to further promote the performance of our method. Third, there is insufficient exploitation of the constructed brain functional connectivity data. In general, each brain network contains not only node features but also topological information between different nodes, and the high-level topological information of brain networks cannot be captured using traditional neural networks alone. As graph convolutional networks (GCN) can automatically learn node features and topological information between nodes, we will design GCN-based models to fully exploit the information in fMRI data, which will also be our future work.

6. Conclusion

In this paper, we proposed a multisite fMRI retrieval (MSFR) method that uses a deep hashing approach and a domain adaptation strategy to mitigate multisite data heterogeneity for accurate fMRI search. For a given target domain site and multiple source domain sites, our MSFR uses a deep neural network to map the source and target domain data into a latent feature space and minimizes their Wasserstein distance to reduce the distribution difference. We then use the source domain data to learn high-quality hash codes through a global similarity metric, thereby improving the performance of cross-site fMRI retrieval. We validated MSFR on a real multisite ASD dataset, with results demonstrating its effectiveness in rs-fMRI retrieval.

Data Availability

The dataset used in this work can be found on the public Autism Brain Imaging Data Exchange (ABIDE) website (http://preprocessed-connectomes-project.org/abide/). The investigators within the ABIDE contributed to the design and implementation of ABIDE and provided data but did not participate in the analysis or writing of this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Acknowledgments

J. Yang, Q. Wang, and T. Tao were supported in part by the Taishan Scholar Program of Shandong Province and National Natural Science Foundation of China (No. 62176112). J. Yang and S. Niu were supported in part by the Development Program Project of Youth Innovation Team of Institutions of Higher Learning in Shandong Province.