Abstract

The existing studies have shown that miRNAs are related to human diseases by regulating gene expression. Identifying miRNA association with diseases will contribute to diagnosis, treatment, and prognosis of diseases. The experimental identification of miRNA-disease associations is time-consuming, tremendously expensive, and of high-failure rate. In recent years, many researchers predicted potential associations between miRNAs and diseases by computational approaches. In this paper, we proposed a novel method using deep collaborative filtering called DCFMDA to predict miRNA-disease potential associations. To improve prediction performance, we integrated neural network matrix factorization (NNMF) and multilayer perceptron (MLP) in a deep collaborative filtering framework. We utilized known miRNA-disease associations to capture miRNA-disease interaction features by NNMF and utilized miRNA similarity and disease similarity to extract miRNA feature vector and disease feature vector, respectively, by MLP. At last, we merged outputs of the NNMF and MLP to obtain the prediction matrix. The experimental results indicate that compared with other existing computational methods, our method can achieve the AUC of 0.9466 based on 10-fold cross-validation. In addition, case studies show that the DCFMDA can effectively predict candidate miRNAs for breast neoplasms, colon neoplasms, kidney neoplasms, leukemia, and lymphoma.

1. Introduction

miRNAMicroRNAs (miRNAs)areshort endogenous noncoding RNAs with about 22 nucleotides. A number of studies have shown that miRNAs play important roles in many biological processes including cell proliferation, development, differentiation, death, apoptosis, metabolism, aging, signal transduction, and viral infection [16]. Biological studies have revealed that dysregulation of miRNAs is closely related to the occurrence and development of complex diseases [79]. Dysregulation of miR-15 and miR-16 was discovered to be related with B-cell chronic lymphocytic leukemia firstly [10]. So far, it has been verified that many miRNAs are related to cancers. Five members of the miRNA-200 family (miR-200a, miR-200b, miR-200c, miR-141, and miR-429) are downregulated in the development of breast cancer [11]. Epigenetic modulation of the miR-200 family relates to transition to a breast cancer stem cell-like state [12]. Some studies demonstrate that in human colorectal cancer cells, miR-186, miR-216b, miR-337-3p, and miR-760 could work in synergy to induce cellular senescence by targeting the alpha subunit of protein kinase CKII [13]. By accurately measuring expression levels of miRNAs in the serum of 220 patients with early-stage non-small cell lung cancer and 220 matched controls, researchers found that the expressions of miR-27a, miR-106a, miR-221, miR-146b, miR-155, miR-17-5p, and let-7 were lower than those in controls, while the expression of miR-29c was increased [14].

Identifying the miRNAs associated with diseases will contribute to exploring the pathogenesis, diagnosis, treatment, and prognosis of diseases and help to develop new drugs. Some studies showed that miRNA-23, miRNA-24, and miRNA-27 contained underlying therapeutic factors in ischemic heart and vascular disease [15]. By targeting the BCL6 corepressor such as BCORL1, the migration and invasion of hepatocellular carcinoma (HCC) cells are restrained by miR-876-5p, which provides a new idea for the treatment of HCC [16]. However, the experimental methods for finding associations between miRNAs and diseases are expensive and time-consuming. The computational methods for predicting potential miRNA-disease associations can provide verifiable hypotheses for further experimental verification, which can reduce biological experiment time and improve the experimental efficiency.

Recently, plenty of computational methods have been proposed to predict potential miRNA-disease associations [17]. Most of the computational methods are based on the assumption that miRNAs with similar functions are more likely to be associated with phenotypically similar diseases and vice versa. These methods are based on different principles to predict miRNA-disease associations, such as similarity-based methods, machine learning-based methods, and matrix factorization-based methods.

The previous similarity-based computational methods were based on miRNA-target interaction network and protein-protein interaction (PPI). For example, Jiang et al. [18] proposed a method to predict potential miRNA-disease associations by applying a scoring system to human phenome-microRNAome network and functionally related miRNA network. Shi et al. [19] developed a computational framework to identify miRNA-disease associations by preforming random walk with restart. The method utilized the function connections between miRNAtargets and disease genes in protein-protein interaction (PPI) networks. Mrk et al. [20] presented a miRNA-protein-disease association prediction model (miRPD) in which miRNAs are linked to diseases via the underlying proteins. However, these methods largely relied on miRNA-target interactions which have high false-positive rate and false-negative rate, so they cannot achieve satisfying prediction performance.

To solve the above-mentioned problem, some similarity-based methods without relying on miRNA-target interactions were proposed. Similarity computation strategy is the key issue for miRNA-disease prediction [21]. Xuan et al. [22] presented a prediction algorithm called HDMP. HDMP predicts potential disease-associated miRNAs based on weighted -nearest similar neighbors. However, HDMP could not predict miRNAs (diseases) associated with new diseases (miRNAs) due to local similarity networks. So, some global similarity-based methods were proposed, which construct a heterogeneous global network by integrating miRNA similarity, disease similarity, and known human miRNA-disease associations. For example, Chen et al. [23] proposed a global network-based prediction model, RWRMDA, to infer potential miRNA-disease association by implementing the random walk algorithm on a global network. However, it was not applicable for new diseases without any known associated miRNAs. Xuan et al. [24] proposed another prediction model called MIDP based on random walk. Compared with RWRMDA, it could predict related miRNA for new diseases. Liu et al. [25] proposed a miRNA-disease association prediction method by random walk on heterogeneous network constructed by integrating multiple data sources. In addition, some improved algorithms based on random walk were proposed [26, 27]. In addition to the random walk algorithm, other global network-based methods were proposed. For example, Chen et al. [28] developed the model for miRNA-disease association prediction (WBSMDA) by utilizing within score and between score. The within-score can capture miRNA similarity and disease similarity in known miRNA disease pairs, and the between-score can capture miRNA similarity and disease similarity in unknown miRNA-disease pairs. Next year, Chen et al. [29] proposed a computational model based on super-disease and miRNA for potential miRNA-disease association (SDMMDA) prediction. You et al. [30] proposed a path-based miRNA-disease association (PBMDA) prediction model. PBMDA adopted depth-first search algorithm on a heterogeneous graph. Zeng et al. [31] applied link prediction algorithm named structural perturbation method (SPM) on the miRNA-disease bilayer network to predict potential miRNA-disease associations. Chen et al. [32] proposed a computational model of bipartite network projection for miRNA-disease association (BNPMDA) prediction. The model took advantage of the agglomerative hierarchical clustering and improved the baseline algorithm of bipartite network recommendation based on the constructed bias ratings. In addition, some researcher utilized lncRNA-related other information to predict potential miRNA-disease associations. Chen et al. [33] developed a triple layer heterogeneous network miRNA-disease association (TLHNMDA) prediction model. In the model, the triple layer network was constructed by integrating the known miRNA-disease associations, miRNA-lncRNA interactions, miRNA function similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. Zhao et al. [34] developed a computational method based on a distance correlation set to predict miRNA-disease associations (DCSMDA), which integrated known lncRNA-disease associations, known miRNA-lncRNA associations, disease semantic similarity, and various lncRNA and disease similarity measures to construct a miRNA-lncRNA-disease network.

Furthermore, many computational models using machine learning to identify potential associations between miRNAs and diseases have also begun to appear. Chen and Yan [35] proposed regularized least squares for miRNA-disease association (RLSMDA) to uncover the relationship between diseases and miRNAs. Next, Chen and Huang [36] presented a prediction model based on Laplacian regularized sparse subspace learning called LRSSLMDA, which extracted two informative feature profiles by performing feature extraction from the integrated similarity. Chen et al. [37] developed a model of random forest for miRNA-disease association (RFMDA) prediction based on machine learning. Liang et al. [38] developed a method to discover disease-related candidate miRNAs based on adaptive multiview multilabel learning. Zhao et al. [39] developed adaptive boosting for miRNA-disease association (ABMDA) prediction to predict potential associations between diseases and miRNAs, which can balance the positive and negative samples by performing random sampling based on -mean clustering on negative samples, and integrated weak classifiers to form a strong classifier based on corresponding weights. Wang et al. [40] proposed miRNA-disease association prediction model- (LMTRDA) based logistic model tree, which fused multisource information, especially the introduced miRNA sequence information. Chen et al. [41] proposed an ensemble of decision tree-based miRNA-disease association (EDTMDA) prediction model, which is a computational framework of integrating ensemble learning and dimensionality reduction. Deep learning can capture hidden, complex, and nonlinear relationships from the original data. Deep learning has been applied to various fields of bioinformatics. With the rapid development of deep learning, some deep learning-based methods have been proposed to solve the problem about miRNA-disease association prediction. For example, Chen et al. [42] proposed a model of restricted Boltzmann machine to predict multiple types of miRNA-disease association (RBMMMDA). Xuan et al. [43] presented the convolutional network-based methods for predicting candidate disease. Zeng et al. [44] developed a neural network model to predict miRNA-disease associations (NNMDA). NNMDA not only aggregated the neighbor information during the process but also preserved the topology of the original network at the same time. Gong et al. [45] proposed a network embedding-based multiple information integration method (NEMII) for miRNA-disease association prediction. Peng [46] proposed a learning-based framework, MDA-CNN, for miRNA-disease association identification. The model captures interaction features based on disease similarity network, miRNA similarity network, and protein-protein interaction network and employed an autoencoder to identify the essential feature combination for each miRNA-disease pair, and it used a convolutional neural network to predict the final label. Chen et al. [47] developed a model of deep-belief network for miRNA-disease association (DBNMDA) prediction. DBNMDA utilizes the information of all miRNA-disease pairs by introducing the unsupervised pretraining process. Then, according to the parameters obtained by pretraining, positive samples and the same number of randomly selected negative samples were applied to fine-tune deep-belief network.

Recently, some researchers have introduced the recommendation system to predict miRNA-disease association. Matrix factorizations are widely used in the recommendation systems. Some computational models based on matrix completion have been proposed. For example, Li et al. [48] proposed a miRNA-disease association prediction method based on matrix completion (MCMDA). MCMDA could not predict miRNAs for new diseases with no associations. In order to solve this problem, Chen et al. [49] proposed a computational model-based inductive matrix completion for miRNA-disease association prediction (IMCMDA). The model integrated miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. In addition, Chen et al. [50] integrated neighborhood constraint with matrix completion and proposed a computational model based neighborhood constraint matrix completion for miRNA-disease association (NCMCMDA) prediction. On the other hand, matrix decomposition is also used for identifying potential miRNA-disease associations. For example, Xiao et al. [51] proposed a prediction framework called graph regularized nonnegative matrix factorization (GRNMF) to infer the unknown miRNA-disease associations in heterogeneous omics data. Chen et al. [52] took advantage of the matrix factorization and network algorithm to develop a matrix decomposition and heterogeneous graph inference (MDHGI) for miRNA-disease association prediction. Cui et al. [53] proposed a robust collaborative matrix factorization method to predict novel miRNA-disease associations. The method improved the prediction accuracy by introducing the weighted nearest known neighbors and the . Gao et al. [54] presented a computational framework based on graph Laplacian regularized -nonnegative matrix factorization () for inferring possible disease-connected miRNAs.

To further improve the prediction performance, we study to predict potential miRNA-disease associations based on matrix factorization and deep learning. We propose a new miRNA-disease association prediction method called DCFMDA, which combines the multilayer perceptron (MLP) and the neural nonnegative matrix factorization (NNMF) in a deep collaborative filtering framework. Firstly, we obtain miRNA and disease similarity matrices by integrating multiple heterogeneous data. Then, we utilize MLP to extract high-level features from miRNA and disease similarity matrices and decompose the known miRNA-disease association into two low rank matrices by NNMF. Finally, we merge the output of the MLP submodel and the NNMF submodel to obtain prediction results for miRNA-disease potential associations.

The rest of this paper is organized as follows. Section 2 describes the data and method. Section 3 presents experimental results. Section 4 summarizes the paper.

2. Data and Methods

2.1. Data
2.1.1. Human miRNA-Disease Association

HMDD is a database that curated experiment-supported evidence for human miRNA-disease associations. We downloaded known miRNA-disease association data from HMDD V2.0 [55], which includes 5430 experimentally verified miRNA-disease associations between 383 miRNAs and 495 diseases. We used adjacency matrix to formalize the miRNA-disease associations, where and are the number of miRNAs and diseases, respectively. If miRNA is experimentally verified to be related with disease , the value of is 1, otherwise 0.

2.1.2. Disease Semantic Similarity

In the National Library of Medicine MeSH, each disease is described as a hierarchical Directed Acyclic Graph (DAG). As described in [56], the disease semantic similarity can be calculated based on these DAGs. For example, disease can be represented as a graph , where is the disease set of all ancestor nodes of disease including disease itself and is the edge set of corresponding links. The contribution of disease in DAG to the semantic value of disease is defined as follows: where is children set of .

The semantic value of disease is calculated by

The larger the part the two diseases share in their DAGs, the higher the similarity between the two diseases is. The semantic similarity of disease and is defined as follows:

2.1.3. Disease Functional Similarity by Functional Gene Network

The score of functional similarity between two diseases can be measured by disease-gene and gene-gene association data [57]. The functional similarity between gene and is defined as follows: where denotes edge between gene and , denotes the set of all edges in the HumanNet V2 database [58], and is an associated log likelihood score (LLS) that measures the probability of a functional linkage between gene and after normalization.

The functional association between gene and gene set is defined as follows: where indicates the number of genes in , is the ith gene of , .

The functional similarity between disease and is defined as follows: where and are gene set related to diseases and , respectively, is the number of genes in , and is the number of genes in .

2.1.4. miRNA Functional Similarity

We obtained the miRNA functional similarity data by the method provided in [56]. In the previous subsection, we have described calculating the semantic similarity between diseases. The functional similarity for each miRNA pair was calculated based on the semantic similarity of diseases. Firstly, the similarity between disease and disease group is calculated by where denotes the number of diseases in and is the ith disease of , .

Then, calculation of the functional similarity between miRNA and is equal to calculating the similarity between and , where and represent the related disease sets of miRNA and , respectively. Finally, the matrix is used to denote the miRNA functional similarity. represents the functional similarity between miRNAs and , which is calculated as follows:

2.1.5. Gaussian Interaction Profile Kernel Similarity

Gaussian kernel is a commonly used kernel function, which has been proven effective for measuring both miRNA similarity and disease similarity [59]. The interaction profile of disease is the ith column vector of the miRNA-disease association matrix. It is a binary vector representing the presence or absence of its associations with each miRNA. Gaussian interaction profile kernel similarity between disease and is defined as follows [59]: where and is used to control kernel bandwidth, which is obtained by normalizing the average number of associated miRNAs per disease.

Similarly, the Gaussian interaction profile kernel similarity between miRNA and is defined as follows: where .

2.1.6. Integrated Similarity for miRNAs and Diseases

In order to overcome the shortcomings of single similarity measure to accurately reflect the characteristics of miRNA similarity and disease similarity from different perspectives, some method integrated multiple different similarity measure data to construct the miRNA similarity matrix and disease similarity matrix to improve the prediction performance.

We integrate miRNA functional similarity and miRNA Gaussian interaction profile kernel similarity to construct the miRNA similarity matrix between miRNA and .

We also integrate disease semantic similarity , the disease functional similarity , and the disease Gaussian interaction profile kernel similarity to construct the disease similarity matrix between disease and . The formula is as follows:

2.2. Methods

As a universal computational algorithm, the recommendation algorithm has been applied in many fields including bioinformatics. The miRNA-disease association prediction can be regarded as a recommendation problem. This kind of prediction method regards miRNAs as users and diseases as commodities and recommends miRNAs to a disease according to its known preference on miRNAs and vice versa. Traditional recommendation models are mainly divided into collaborative filtering, content-based recommendation system, and hybrid recommendation system. Recently, researchers have proposed some recommendation algorithms using deep learning to overcome the shortcomings of traditional collaborative filtering models [60].

In this paper, we proposed a new deep collaborative filtering framework for miRNA-disease association prediction called DCFMDA. This method combines the multilayer perceptron (MLP) submodel and neural nonnegative matrix factorization (NNMF) submodel in deep collaborative filtering framework. Firstly, in the MLP submodel, the mth row of the miRNA similarity matrix (i.e., the similarity data between miRNA and all the other miRNAs) was fed into a multilayer perception, and the dth row of the disease similarity matrix (i.e., the similarity data between disease and all the other diseases) was fed into another multilayer perception. The two MLPs would be trained to learn high-level biological patterns from miRNA similarity and disease similarity, respectively. Secondly, all known miRNA-disease association pairs were fed into the neural NNMF submodel to train. Finally, the output of the two submodels was merged to get prediction scores of miRNA-disease pairs. The proposed method is shown in Figure 1.

2.2.1. Neural Nonnegative Matrix Factorization (NNMF)

Matrix factorization (MF) based approaches are proven to be highly accurate and scalable in addressing collaborative filtering (CF) problems [61]. The purpose of nonnegative matrix factorization (NMF) is to find two nonnegative matrices whose product is optimal approximation to the original matrix. Given miRNA-disease association matrix A, it can be decomposed into the product of two low rank nonnegative matrices and , namely, . Solving the problem of prediction miRNA-disease potential associations using NMF can be described as the following objective function:

However, the matrix factorization method only uses the fixed inner product to predict miRNA-disease associations, which leads to some limitation of prediction algorithm. So we use a nonnegative matrix factorization submodel (NNMF) based on a two-layer fully connected neural network to predict the potential association between miRNA and disease. In the NNMF submodel, one-hot encodings of miRNA and disease are used as the input vectors and two embedding vectors and are obtained by the embedding layer. We have two two-layer fully connected neural networks to transform the representations of and . Through the neural network, and are mapped to low-dimensional vectosr and in a latent space, respectively. The miRNA-disease association prediction submodel based on NNMF is as follows: where

2.2.2. Multilayer Perceptron (MLP)

Multilayer perceptron (MLP) is a deep learning structure, which is a feedforward neural network with multiple hidden layers between input and output layers. A single layer perceptron cannot classify linear inseparable problems, but the multilayer perceptron can overcome this weakness by the nonlinear mapping of input space based on activation function. MLP has high ability of nonlinear modeling. The structure of the multilayer perceptron model is shown in Figure 2.

We regard the miRNA-disease association prediction as a binary classification problem. That is, if there is a correlation between miRNA and disease , the corresponding label will be 1, otherwise 0. We use a submodel based on MLP to solve the problem for miRNA-disease association prediction. The number of neurons in each hidden layer of MLP is less than that in the previous layer. The number of neurons in the first hidden layer is equal to the dimension of the input vector, and the number of neurons in the last hidden layer is equal to the dimension of the output vector. Formally, we denote the input vector by , the output vector by , the number of hidden layers by , the connection weight matrix from hidden layer to hidden layer by , and the bias vector of the lth layer by , where . The MLP model is formulated as follows [62]: where denotes the output of the lth layer and denotes the activation function of the lth layer. We select ReLU as the activation function of each hidden layer, which can be computed by. ReLU is employed to alleviate the problem of the gradient disappearance and solve the overfitting problem of machine learning [63].

In our proposed MLP submodel, we used two MLPs to transform the representations of miRNA and disease. The miRNA similarity matrix and disease similarity matrix are the input of these two multilayer perceptrons. is the ith row of matrix , which represents the similarity feature of miRNA . is the jth row of matrix , which represents similarity feature of disease . We used to train the left MLP and used to train the right MLP. Through the neural network, and are finally mapped to a low-dimensional vector in a latent space. So the similarity feature vectors of the miRNA and disease can be formulated as follows:

The miRNA-disease association prediction submodel based on MLP is as follows: where , .

2.2.3. Method DCFMDA

We construct a prediction model based on NNMF and MLP. We capture the linear relationship between miRNAs and diseases by NNMF and learn the nonlinear relationship between miRNAs and diseases by MLP. The NNMF submodel and the MLP submodel share the embedding layer.

The NNMF submodel learns from known miRNA-disease association to obtain the original prediction score. The MLP submodel learns the low-dimensional feature of miRNA and disease from the miRNA similarity matrix and disease similarity matrix, respectively. Finally, we merge outputs of the NNMF submodel and MLP submodel to obtain the final prediction score for disease-related miRNAs. The presented model is formulated as follows: where and are two embedding vectors of miRNA and disease , respectively. is the similarity feature of miRNA , is the similarity feature of disease , denotes the output of the left MLP, denotes the output of the right MLP, denotes original prediction score, is obtained by concatenating with the miRNA embedding vector , and is obtained by concatenating with the disease embedding vector .

The final layer of the DCFMDA is used for the classification task, its activation function is the sigmoid function .

In our proposed model, we use the binary crossentropy loss function: where is the index of the training examples, represents the true label for the given input sample , and represents predicted result. The purpose of deep learning is to minimize loss function through continuous training iterations to get the best prediction.

Algorithm 1 describes our proposed miRNA-disease association prediction algorithm using deep collaborative filtering called DCFMDA.

Input: the number of miRNAs, the number of diseases, iterative number , the known miRNA-disease associations , the miRNA similarity matric , the disease similarity .
Output: the predicted score matrix
for each do
  for each known miRNA-disease association do
   Randomly generate four negative samples;
   Obtain the embedding vector of miRNA ;
   Obtain the embedding vector of disease ;
   Obtain the latent vector of miRNA ;
   Obtain the latent vector of disease ;
   ;
   ;
   ;
   Concatenating and to form a vector according to formula (21);
   Concatenating and to form a vector according to formula (22);
   ;
   Compute by loss function according to formula (24);
   Optimize model parameters by back propagation
  end for
end for

3. Result

3.1. Performance Evaluation

To evaluate the prediction performance of algorithm DCFMDA, we perform a 10-fold cross-validation on known experimentally verified miRNA-disease associations. The 5430 experiment-supported miRNA-disease associations are considered as positive samples. We are not sure which miRNAs are not associated with diseases. So, for each known miRNA-disease pair, we will randomly sample four unobserved miRNA-disease pairs as negative samples. For the 10-fold cross-validation, all positive and negative samples are randomly divided into ten parts. In each fold, nine of the ten parts are used for the training model in turn, and the remaining one is used as test samples.

We use the receiver operating characteristic (ROC) curve, area under ROC (AUC), precision-recall (PR) curve, and F1 score to evaluate the performance of the predictive algorithm. The ROC curve plots the true-positive rate (TPR) versus the false-positive rate (FPR) at different thresholds. The value of AUC is usually between 0.5 and 1. When AUC is 1, it means that the prediction result will achieve the best effect. Our data is seriously unbalanced, because the number of negative samples (unconfirmed miRNA-disease associations) is much larger than the number of positive samples (experiment-supported miRNA-disease associations). Therefore, we also draw a PR curve to evaluate the prediction ability of different miRNA-disease association prediction algorithms. We compare DCFMDA with four existing miRNA-disease prediction algorithms HDMP, RLSMDA, IMCMDA, and SPM. Figure 3 shows ROC curves and PR curves of the five prediction algorithms and reports their corresponding AUCs in a 10-fold cross-validation on experimentally verified miRNA-disease associations. Figure 3(a) shows that DCFMDA almost always has the highest TPRs under the same false-negative rates, and obtains the highest AUC (0.94662) among these algorithms, whereas the AUCs of HDMP, RLSMDA, IMCMDA, and SPM are 0.87156, 0.87453, 0.84081, and 0.86274, respectively. Figure 3(b) shows that DCFMDA achieves a higher precision than all the other algorithms for any given recall value. For ROC curve or PR curve, DCFMDA performs significantly better than the other four algorithms. F1 score is the harmonic mean of both metrics of recall and precision. Since there is a trade-off between precision and recall, F1 score is also used to evaluate the performance of algorithms. Table 1 shows the F1 score of the top- candidates. The F1 score of DCFMDA is more stable, while F1 scores of the other four algorithms were decreasing from the top 50 to top 1000. From Table 1, we can see that DCFMDA achieved better performance than other algorithms in terms of the F1 score.

The experiment results indicate that our method can achieve higher prediction performance. One reason is that we introduce deep learning into miRNA-disease prediction and effectively integrated neural nonnegative matrix factorization and multilayer perceptron technology to predict potential miRNA-disease associations. Another reason is that different miRNA similarity and disease similarity data are used to construct the miRNA similarity matrix and the disease similarity matrix.

3.2. Case Study

In order to further verify the prediction performance of DCFMDA, we carried out case studies on five diseases including breast neoplasms, colon neoplasms, kidney neoplasms, leukemia, and lymphoma. From HMDD V2.0, we obtained 5430 known associations and 184155 unknown associations between 495 miRNAs and 383 diseases. For our case study, all the known miRNA-disease associations were used as training samples, and other unknown associations were regarded as candidate associations for validation. For each investigated disease, we ranked candidate miRNAs according to their predicted scores and selected the top-50 candidate miRNAs to verify whether the candidate miRNAs were associated with the current disease by two other databases, namely, dbDEMC [64] and miRCancer [65], as well as published literatures. The validation results are shown in Table 2. The database dbDEMC 2.0 is an integrated database that documents 209 expression profiling data sets with 36 cancer types and 73 subtypes, and a total of 2224 differentially expressed miRNAs were identified. It allows users to make a quick search of the differentially expressed miRNAs in certain cancer types. The database miRCancer is a miRNA-cancer association database that provides comprehensive collection of miRNA expression profiles in various human cancers. A user can search the database by miRNA or cancer name.

There are intersections between known miRNA-disease associations obtained from databases HMDD V2.0, dbDEMC, and miRCancer. For example, 546 of the 5430 known miRNA-disease associations in HMDD V2.0 also exist in dbDEMC 2.0. Because we only predict and verify the candidate miRNAs unrelated to the investigated disease in HMDD V2.0, none of these candidate miRNAs exist in HMDD V2.0. We can be sure that the validation of candidate miRNAs is completely independent of HMDD V2.0.

Breast neoplasms are one of the most common cancers for women. We have inferred associations between all the candidate miRNAs for breast neoplasm and confirmed 49 of the top-50 candidate miRNAs to be association with breast cancer by dbDEMC, miRCancer, and published literatures (see Table 3). Because the same miRNA gene may have different identifiers, we can use the alias to verify whether the miRNA is associated with breast cancer. We obtain alias of miRNAs by retrieving the miRBase and GeneCards databases. For example, hsa-mir-371 (the alias of hsa-mir-371a) and has-mir-642 (the alias of hsa-mir-642) can be confirmed to be related to breast cancer by dbDEMC2.

Colon neoplasms are a common malignant tumor of the digestive tract occurring in the colon. Various evidences indicate that miRNAs potentially play an important role in predicting markers of early diagnosis, prognosis, and chemosensitivity of colon cancer [66]. DCFMDA has inferred associations between all the candidate miRNAs for colon cancer, and all top-50 candidate miRNAs are confirmed to be associated with colon neoplasms by dbDEMC, miRCancer, and published literatures (see Table 4).

Kidney neoplasms are one of the most rapidly growing malignant tumors. Abnormal expression of miRNAs has been detected in several kinds of kidney cancers. For example, compared with the normal samples, the expression of hsa-mir-194 (the third in Table 5) was reported to be downregulated in kidney neoplasm patients. Literature [67] confirmed that the expression level of hsa-mir-378 (the twelfth in Table 5) is up-regulated in the blood of patients with renal cell carcinoma compared to healthy controls. Predicting miRNAs related with kidney neoplasm by DCFMDA, 49 of the top-50 candidate miRNAs have been validated (see Table 5). Considering the different names of the same miRNA, we can find previous IDs of these miRNAs in the miRBase database. For example, hsa-mir-378a cannot be retrieved to be related with kidney cancer from database dbDEMC and miRCancer, but it can be retrieved by its previous ID has-mir-378.

Leukemia is a cancer caused by an overproduction of damaged white blood cells. It is the most common cancer among people under 15 years old. MiRNAs play an important role in the development of leukemia. One of the most typical examples is the association of miR-15a and miR-16a with chronic lymphocytic leukemia. Researchers found that 65% of B cell chronic lymphoblastic leukemia patients have deletions of chromosome 13q14, a locus that includes miR-15a and miR-16a, which consequently present downregulated expression [10]. In our case study, 46 of the 50 candidate miRNAs related to leukemia have been verified by relevant databases (see Table 6). We have verified hsa-mir-323 to be associated with leukemia by database dbDEMC, where hsa-mir-323 is the previous ID of hsa-mir-323a. We are not sure whether the remaining four of the top-50 miRNAs, namely, hsa-mir-302b, hsa-mir-216b, hsa-mir-668, and has-mir-489, are related to leukemia.

Lymphomas are the most common ones of hematologic tumors. For the top-50 lymphoma-associated miRNAs predicted by DCFMDA, 49 of them have experimental literature evidence (see Table 7). For example, Literature [68] found that the expression of hsa-mir-223 was downregulated more than twice in diffuse large B-cell lymphoma (DLBCL) .

4. Conclusion

Predicting disease-related miRNAs will help people understand the underlying pathogenesis of diseases. To overcome the time-consuming and expensive shortcomings of experimental methods, researchers have focused on identifying miRNA-disease potential association by computational methods. Compared with existing methods, the main contribution of our work is to propose a method of predicting potential miRNA-disease association by deep collaborative filtering. In addition to the experimental confirmed miRNA-disease association, our method constructs the miRNA similarity matrix by integrating the miRNA functional similarity and miRNA Gaussian interaction profile kernel similarity and constructs the disease similarity matrix by integrating the disease semantic similarity, disease functional similarity, and disease Gaussian interaction profile kernel similarity. The performance of our method is validated by 10-fold cross-validation and case studies. The experiment results indicate that our method can achieve effective and reliable prediction results. In the future, we will further improve the prediction performance of DCFMDA by the following three aspects. Firstly, considering that the random selection of negative samples may lead to a false negative, we will use an unsupervised deep learning model for prediction. Secondly, our model simply combined the lncRNA similarity and disease similarity as the feature vector of miRNA-disease association, which cannot accurately describe the association features. Lastly, to better integrate various miRNA similarity and disease similarity by weighted average, we will study an optimal weighting strategy so that object similarity matrices can be appropriately constructed.

Data Availability

Known miRNA-disease association data were taken from database HMDD 2.0 (http://www.cuilab.cn/hmdd), human microRNA functional similarity (http://www.lirmed.com/misim/), and disease semantic similarity (https://github.com/IMCMDAsourcecode/IMCMDA).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61962004 and 61462005.