BioMed Research International
Volume 2018, Article ID 5747489, 11 pages
https://doi.org/10.1155/2018/5747489
Research Article

SRMDAP: SimRank and Density-Based Clustering Recommender Model for miRNA-Disease Association Prediction

1College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
2Key Laboratory of Trusted Computing and Networks, Hunan Province, Changsha 410082, China
3School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China

Correspondence should be addressed to Yaping Lin; star@hnu.edu.cn

Received 26 November 2017; Accepted 23 January 2018; Published 21 March 2018

Academic Editor: Tao Huang

Copyright © 2018 Xiaoying Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Aberrant expression of microRNAs (miRNAs) can be applied for the diagnosis, prognosis, and treatment of human diseases. Identifying the relationship between miRNA and human disease is important to further investigate the pathogenesis of human diseases. However, experimental identification of the associations between diseases and miRNAs is time-consuming and expensive. Computational methods are efficient approaches to determine the potential associations between diseases and miRNAs. This paper presents a new computational method based on the SimRank and density-based clustering recommender model for miRNA-disease associations prediction (SRMDAP). The AUC of 0.8838 based on leave-one-out cross-validation and case studies suggested the excellent performance of the SRMDAP in predicting miRNA-disease associations. SRMDAP could also predict diseases without any related miRNAs and miRNAs without any related diseases.

1. Introduction

MicroRNAs (miRNAs) are small endogenous noncoding RNAs approximately 22 nt long. Since the discovery of the first two miRNAs, lin-4 and let-7, thousands of miRNAs have been identified in eukaryotic cells [1, 2]. A series of studies have shown that miRNAs play an important role in many biological processes, such as cell growth and apoptosis, proliferation, differentiation, and signal transduction [3–6]. Given that miRNAs are involved in the normal function of cells, aberrant miRNA expression has been associated with many types of human diseases, ranging from common diseases to cancers [7–9]. Therefore, identifying disease-related miRNAs helps in understanding the molecular mechanisms of disease pathogenesis, supports disease diagnosis, and can further improve treatment and prevention.

To date, biological experiments have identified a large number of miRNA-disease associations. Many studies have built databases, such as HMDD [10], miR2Disease [11], dbDEMC [12], miRCancer [13], and PhenomiR [14], to serve as a solid data foundation for predicting miRNA-disease associations. HMDD is a database manually curated from the literature [10]. The latest version is HMDD v2.0, which integrates 10,368 miRNA-disease associations of approximately 572 miRNA genes and 378 diseases from 3,511 papers. MiR2Disease documents 1,939 manually curated miRNA-disease associations between 299 human miRNAs and 94 human diseases [11]. The dbDEMC stores differentially expressed miRNAs in human cancers obtained from microarray data [12]. The updated version, dbDEMC 2.0, contains 2,224 differentially expressed miRNAs in 36 cancer types [15]. The miRCancer database stores miRNA-cancer associations obtained by text mining [13]. PhenomiR provides information about differentially regulated miRNA expression in diseases and other biological processes [14].

However, using experimental methods to identify disease-related miRNAs is time-consuming and costly. Based on existing data, computational methods have been developed as a valuable supplement to experimental methods to save experimental time and cost. Computational methods can calculate and rank the similarity scores of all miRNAs for a given disease. Top-ranked miRNAs are treated as the most promising candidate disease miRNAs for further experimental studies. Similarity calculation is the key issue in computational methods [16]. According to how the similarity score is calculated, most computational methods fall into two categories [17, 18], namely, network-based methods [19–28] and machine-learning-based methods [24, 29–34]. Network-based methods predict miRNA-disease associations under the hypothesis that miRNAs with similar functions usually tend to be associated with phenotypically similar diseases [10]. Jiang et al. [19] constructed a human phenome-miRNAome functional association network and used the hypergeometric distribution scoring system to select the candidate disease miRNAs. However, high prediction accuracy may not be obtained when only the local information of each miRNA is used, and the method depends strongly on the predicted miRNA-target interactions. Chen et al. [21] adopted global network similarity measures and developed RWRMDA to infer the associations between diseases and miRNAs by implementing random walk on the miRNA-miRNA functional similarity network. Based on the weighted k most similar neighbors, Xuan et al. [22] proposed HDMP to infer disease-related miRNAs. HDMP evaluates miRNA functional similarity by incorporating the information content of disease terms, disease phenotype similarity, and weight information of the miRNA family or cluster. However, RWRMDA and HDMP cannot be applied to diseases without any known related miRNAs. Based on social network analysis, Zou et al. [24] proposed the KATZ method to compute the similarity score based on walks of different lengths between the miRNA and disease nodes. However, KATZ performs relatively poorly when known associations are sparse. Gu et al. [25] calculated miRNA similarity and disease similarity of known miRNA-disease associations through the Jaccard similarity measure. They incorporated miRNA similarity of known miRNA-disease associations, miRNA functional similarity, and miRNA family information to construct a miRNA similarity network and incorporated disease similarity of known miRNA-disease associations to construct a disease similarity network. Then, they applied the network consistency projection method to predict the disease-related miRNAs.

Machine-learning-based methods extract features from data to initially obtain effective features of miRNAs and diseases and then utilize machine learning models to predict miRNA-disease associations. Jiang et al. [29] showed a support vector machine (SVM) classifier method by integrating the feature vectors of miRNA-target and phenotype similarity. Xu et al. [31] introduced an approach based on the miRNA-target-dysregulated network to prioritize novel disease miRNAs. This method also constructs a support vector machine classifier based on the features and changes in miRNA expression. However, these two computational methods are mainly limited by the difficulty or impossibility of obtaining negative training samples, and this drawback would largely influence the predictive accuracy. To solve this problem, Chen and Yan [30] developed a semisupervised method of regularized least squares for miRNA-disease association (RLSMDA). RLSMDA integrates known disease-miRNA associations, disease similarity dataset, and miRNA functional similarity network to infer potential disease-related miRNAs. The main drawback of RLSMDA is the intricate adjustment of parameters. Xiao et al. [35] used graph-regularized nonnegative matrix factorization framework to predict potential miRNA-disease associations using weighted nearest neighbor profiles to incorporate miRNA similarity and disease matrices. Chen et al. [34] presented a computational method DRMDA based on stacked autoencoder, greedy layer-wise unsupervised pretraining algorithm and SVM, and this method was implemented to predict potential miRNA-disease associations. However, DRMDA results are not highly accurate, because of the difficulty in obtaining negative samples and optimizing the complex parameters.

Similarity calculation mainly considers miRNA-miRNA similarity measurement. Several computational methods use the known miRNA-disease associations in calculating miRNA-miRNA similarity [19–26, 29, 30]. In these methods, miRNA-miRNA similarity is derived from disease-disease similarity and known experimental miRNA-disease associations. However, these methods may overestimate predictive accuracy. The miRNA-miRNA similarity depends heavily on the known miRNA-disease associations, and the cross-validation experiments are not performed correctly: these methods fail to remove the known information of the tested element from the similarity calculation at each round of cross-validation. Other limitations include the inability to predict isolated miRNAs and the lack of disease semantic similarity [36]. An isolated miRNA is a miRNA that has no known associated disease. Thus, miRNA-disease associations cannot be used to calculate the miRNA similarity of an isolated miRNA. Instead of using experimentally verified miRNA-disease associations, other computational methods calculate miRNA similarity using the interaction of miRNAs with other biomolecules [31, 36–38]. For example, Liu et al. [36] calculated miRNA similarity using the miRNA-target gene and miRNA-long noncoding RNA associations. However, the predictive performance of these methods remains limited.

Based on the assumption that miRNAs with similar functions are normally associated with phenotypically similar diseases and vice versa, we addressed the aforementioned limitations by establishing a novel computational method based on SimRank [39] and a density-based clustering [40] recommender model for miRNA-disease association prediction (SRMDAP). SRMDAP constructs the miRNA similarity subnetwork using SimRank to calculate the network topological similarity between miRNAs based on the miRNA-messenger RNA (mRNA) interaction network. The disease similarity subnetwork is constructed in the same way from the disease-gene network. Then, SRMDAP uses the density-based clustering recommender model to integrate the miRNA similarity subnetwork, the disease similarity subnetwork, and experimentally verified miRNA-disease associations to predict potential associations between miRNAs and diseases. In this work, a leave-one-out cross-validation experiment and case studies of two important cancers, namely, kidney and colorectal neoplasms, indicated the excellent predictive performance of SRMDAP. SRMDAP can also make predictions for isolated diseases and isolated miRNAs.

2. Methods

2.1. Data

Three datasets were used in our approach. Experimentally verified miRNA-mRNA interactions were downloaded from the miRTarBase database [41] (http://mirtarbase.mbc.nctu.edu.tw/, Release 6.0: Sept-15-2015) to construct the miRNA similarity network. Meanwhile, experimentally verified disease-related mRNAs were downloaded from the DisGeNET database [42] (http://www.disgenet.org/web/DisGeNET/menu/home, DisGeNET 4.0: October 2016) to construct the disease similarity network. Experimentally verified miRNA-disease associations were downloaded from the HMDD v2.0 database [43] (http://www.cuilab.cn/hmdd, Jun-14-2014 Version).

2.2. Data Processing
2.2.1. MiRNA-Disease Association Network

The disease names of the DisGeNET and HMDD databases were mapped to MeSH descriptors (https://www.ncbi.nlm.nih.gov/mesh). Diseases in the HMDD database not found in the DisGeNET database and repeated associations were removed. Then, we obtained 5,048 known miRNA-disease associations, including 475 miRNAs and 334 diseases, as the benchmark dataset. Formally, we denoted the miRNA set as $M = \{m_1, m_2, \ldots, m_{n_m}\}$ and the disease set as $D = \{d_1, d_2, \ldots, d_{n_d}\}$. The variables $n_m$ and $n_d$ denote the number of miRNAs and diseases, respectively. Matrix $A \in \{0, 1\}^{n_m \times n_d}$ represents the adjacency matrix of miRNA-disease associations. $A(i, j) = 1$ denotes that miRNA $m_i$ is associated with disease $d_j$; otherwise, $A(i, j) = 0$.
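As a minimal sketch of how such a binary adjacency matrix could be assembled from a list of association pairs, consider the following. The function name `build_adjacency` and the toy identifiers `m1`, `m2`, `d1`, `d2` are illustrative assumptions, not taken from the paper or from HMDD.

```python
def build_adjacency(pairs):
    """Build a binary miRNA-by-disease adjacency matrix from (miRNA, disease) pairs.

    Repeated pairs collapse to a single 1, which mirrors the removal of
    repeated associations described in the text.
    """
    mirnas = sorted({m for m, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    row = {m: i for i, m in enumerate(mirnas)}
    col = {d: j for j, d in enumerate(diseases)}
    A = [[0] * len(diseases) for _ in mirnas]
    for m, d in pairs:
        A[row[m]][col[d]] = 1
    return mirnas, diseases, A


# Toy input with placeholder identifiers (not real associations).
toy_pairs = [
    ("m1", "d1"),
    ("m1", "d2"),
    ("m2", "d2"),
    ("m1", "d2"),  # repeated association, counted once
]
mirnas, diseases, A = build_adjacency(toy_pairs)
print(A)
```

Rows follow the sorted miRNA identifiers and columns follow the sorted disease names, so the same index conventions can be reused for the similarity matrices.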

2.2.2. MiRNA Similarity Network

SimRank [39] was employed to calculate the disease and miRNA similarities based on the miRNA-mRNA interaction network and disease-related mRNA associations. SimRank is a model that measures the degree of similarity between any two objects on the basis of the topology of a graph, and it has been successfully applied to web page ranking [44], recommender systems [45], outlier detection [46], network graph clustering [47], and approximate query processing [48], among others. The SimRank model defines the similarity of two nodes recursively: two nodes are similar if the nodes pointing to them are similar. SimRank defines the similarity of two nodes as follows:

$$s(a, b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\big(I_i(a), I_j(b)\big), \tag{1}$$

where $s(a, b)$ is the similarity between nodes $a$ and $b$ and $C$ is a decay factor. $I(a)$ denotes the set of all nodes that point to node $a$, and $|I(a)|$ is the number of elements of $I(a)$.

The adjacency matrix of the miRNA-mRNA interaction bipartite network is represented as $A_{mg}$, where the entry $A_{mg}(i, k)$ in row $i$ and column $k$ is 1 if miRNA $m_i$ is associated with mRNA $g_k$, and 0 otherwise. The matrix $A_{mg}$ is normalized by column to determine the matrix $W_m$, and the similarity matrix can be calculated as follows:

$$S_m = C \cdot \left(W_m^{T} S_m W_m\right) + (1 - C) \cdot I, \tag{2}$$

where $S_m$ is the miRNA similarity matrix and $S_m(i, j)$ is the similarity between miRNAs $m_i$ and $m_j$. $W_m^{T}$ is the transpose matrix of $W_m$, $C$ is a decay factor, and $I$ is the unit matrix.
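The matrix form of SimRank is commonly written as the fixed point $S = C \cdot W^{T} S W + (1 - C) \cdot I$, with $W$ the column-normalized adjacency matrix. A minimal sketch of that iteration follows, under the assumption that the bipartite miRNA-mRNA network is treated as one undirected graph over both node types (so that the matrix products are dimensionally consistent) and that the miRNA-miRNA block is read off at the end. The decay factor 0.8, the iteration count, and all helper names are illustrative choices, not values from the paper.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def bipartite_to_graph(B):
    """Embed an n_m x n_g bipartite matrix into a symmetric adjacency matrix
    over all n_m + n_g nodes (miRNAs first, then mRNAs)."""
    n_m, n_g = len(B), len(B[0])
    n = n_m + n_g
    adj = [[0] * n for _ in range(n)]
    for i in range(n_m):
        for k in range(n_g):
            if B[i][k]:
                adj[i][n_m + k] = adj[n_m + k][i] = 1
    return adj


def simrank_matrix(adj, C=0.8, iterations=10):
    """Iterate S <- C * W^T S W + (1 - C) * I, starting from S = I."""
    n = len(adj)
    col_sums = [sum(adj[r][c] for r in range(n)) for c in range(n)]
    # Column-normalize; columns of isolated nodes stay all zero.
    W = [[adj[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
         for r in range(n)]
    Wt = [list(col) for col in zip(*W)]
    I = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    S = [row[:] for row in I]
    for _ in range(iterations):
        WSW = matmul(matmul(Wt, S), W)
        S = [[C * WSW[r][c] + (1 - C) * I[r][c] for c in range(n)] for r in range(n)]
    return S


# Toy network: miRNA 0 targets mRNAs 0 and 1; miRNA 1 targets mRNA 1 only.
B = [[1, 1], [0, 1]]
S = simrank_matrix(bipartite_to_graph(B))
S_m = [row[:len(B)] for row in S[:len(B)]]  # miRNA-miRNA block
one_step = simrank_matrix(bipartite_to_graph(B), iterations=1)
```

In this form the diagonal entries are not forced to 1, which is a known property of the matrix approximation. The same routine applied to a disease-gene matrix would give a disease similarity block.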

2.2.3. Disease Similarity Network

We can obtain the similarity matrix of diseases using the same process as for the miRNA similarity network. The adjacency matrix of the disease-gene network is represented as $A_{dg}$, where the entry $A_{dg}(j, k)$ in row $j$ and column $k$ is 1 if disease $d_j$ is associated with gene $g_k$, and 0 otherwise. Matrix $A_{dg}$ is normalized by column to obtain the matrix $W_d$, and the similarity matrix can be calculated as follows:

$$S_d = C \cdot \left(W_d^{T} S_d W_d\right) + (1 - C) \cdot I, \tag{3}$$

where $S_d$ is the disease similarity matrix and $S_d(i, j)$ is the similarity between diseases $d_i$ and $d_j$. $W_d^{T}$ is the transpose matrix of $W_d$, $C$ is a decay factor, and $I$ is the unit matrix. A simple example of constructing miRNA and disease similarity is provided in Figure 1.

Figure 1: Illustration of the process of constructing the miRNA and disease similarity networks and predicting miRNA-disease associations. (a) A simple example of constructing the similarity of miRNAs 1 and 2. (b) A simple example of constructing the similarity of diseases 1 and 2. (c) The known miRNA-disease associations. (d) Predicting miRNA-disease associations through the density-based recommender model by integrating the miRNA similarity network, the disease similarity network, and the known miRNA-disease associations.
2.3. Prediction Method

In this work, a density-based clustering recommendation model is developed based on the miRNA and disease similarity network to predict potential miRNA-disease associations. The flowchart of SRMDAP is shown in Figure 2.

Figure 2: The flowchart of SRMDAP.

For example, the calculation for predicting the association of miRNA $m_i$ and disease $d_j$ is as follows. First, given the assumption that miRNAs with similar functions are normally associated with phenotypically similar diseases and vice versa [10, 49], the closer the neighbors of miRNA $m_i$ are to disease $d_j$, the closer miRNA $m_i$ will be to disease $d_j$ in the miRNA similarity network. Using miRNA $m_i$ as the cluster center and a greedy method, we added the most similar neighbor nodes to form new clusters, until the cluster density no longer increased. The cluster density of cluster $V$ is defined as follows:

$$f(V) = \frac{W_{in}(V)}{W_{in}(V) + W_{out}(V) + p\,|V|}, \tag{4}$$

where $W_{in}(V)$ and $W_{out}(V)$ denote the sum of the weights of the inner and external edges of cluster $V$, respectively [50]. Item $p\,|V|$ is a penalty item, and $|V|$ is the number of members of cluster $V$. In our experiments, we set $p$ to a fixed value. Then, using $N(m_i)$, which denotes the closest neighbors of miRNA $m_i$, the predictive score between miRNA $m_i$ and disease $d_j$ is calculated as follows:

$$P_m(i, j) = \sum_{m_k \in N(m_i)} S_m(i, k)\, A(k, j), \tag{5}$$

where $P_m(i, j)$ is the predictive score between miRNA $m_i$ and disease $d_j$ calculated by the neighbors of miRNA $m_i$, $S_m(i, k)$ is the similarity of miRNA $m_i$ and miRNA $m_k$, and $A(k, j)$ is the association between miRNA $m_k$ and disease $d_j$. Equation (5) calculates the predictive score based on the nearest neighbors of miRNA $m_i$ and the associations between the neighbors and disease $d_j$.
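The greedy cluster growth and the neighbor-based score can be sketched as follows. This is one possible reading of the procedure, assuming a density of the form inner weight divided by (inner weight + boundary weight + penalty times cluster size), candidates tried in descending similarity to the seed, and an unnormalized weighted sum for the score. The penalty value `p = 1.0`, the toy matrices, and the function names are hypothetical.

```python
def cluster_density(S, members, p):
    """Inner edge weight over (inner + boundary weight + p * cluster size)."""
    inside = set(members)
    n = len(S)
    w_in = sum(S[a][b] for a in inside for b in inside if a < b)
    w_out = sum(S[a][b] for a in inside for b in range(n) if b not in inside)
    denom = w_in + w_out + p * len(inside)
    return w_in / denom if denom else 0.0


def grow_cluster(S, seed, p):
    """Start from the seed and add nodes in descending similarity to the seed,
    stopping as soon as the density would no longer increase.
    Returns the neighbors found (the cluster without the seed)."""
    cluster = [seed]
    density = cluster_density(S, cluster, p)
    candidates = sorted((k for k in range(len(S)) if k != seed),
                        key=lambda k: S[seed][k], reverse=True)
    for k in candidates:
        new_density = cluster_density(S, cluster + [k], p)
        if new_density <= density:
            break
        cluster.append(k)
        density = new_density
    return cluster[1:]


def neighbor_score(S, A, i, j, neighbors):
    """Sum of similarity-weighted known associations of the neighbors with j."""
    return sum(S[i][k] * A[k][j] for k in neighbors)


# Toy miRNA similarity matrix (4 miRNAs) and toy associations (4 x 2).
S = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
A = [[0, 0], [1, 0], [1, 1], [0, 1]]

neighbors = grow_cluster(S, seed=0, p=1.0)
score_d0 = neighbor_score(S, A, 0, 0, neighbors)
score_d1 = neighbor_score(S, A, 0, 1, neighbors)
```

With these toy values the weakly connected miRNA 3 is left out of the cluster, so only miRNAs 1 and 2 contribute to the scores of miRNA 0. The disease-side score can be computed with the same functions applied to a disease similarity matrix and the transposed association matrix.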

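Second, in the same way, based on the assumption that diseases with similar functions often have similar semantic descriptions and vice versa [20], the closer the neighbors of disease $d_j$ are to miRNA $m_i$, the closer disease $d_j$ will be to miRNA $m_i$ in the disease similarity network; the predictive score between miRNA $m_i$ and disease $d_j$ is calculated as follows:

$$P_d(i, j) = \sum_{d_k \in N(d_j)} S_d(j, k)\, A(i, k), \tag{6}$$

where $N(d_j)$ denotes the closest neighbors of disease $d_j$, $S_d(j, k)$ is the similarity of disease $d_j$ and disease $d_k$, and $A(i, k)$ is the association between miRNA $m_i$ and disease $d_k$.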

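Finally, the final predictive score between miRNA $m_i$ and disease $d_j$ is calculated by integrating the miRNA-side score $P_m$ and the disease-side score $P_d$ as follows:

$$P = \beta\, P_m + (1 - \beta)\, P_d, \tag{7}$$

where $\beta$ is an integration parameter to balance the contributions from miRNA and disease similarities. The entry $P(i, j)$ in row $i$ and column $j$ is the prediction value of miRNA $m_i$ to disease $d_j$.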
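A short sketch of the integration step is given below. It assumes the weight $\beta$ is applied to the miRNA-side score and $1 - \beta$ to the disease-side score, which is one natural reading of the text; the default of 0.4 is the value reported as best in this paper, and the toy score matrices and function names are hypothetical.

```python
def combine_scores(P_m, P_d, beta=0.4):
    """Entry-wise convex combination of miRNA-side and disease-side scores."""
    return [[beta * pm + (1 - beta) * pd for pm, pd in zip(row_m, row_d)]
            for row_m, row_d in zip(P_m, P_d)]


def rank_candidates(P, j):
    """miRNA indices for disease j, best score first."""
    return sorted(range(len(P)), key=lambda i: P[i][j], reverse=True)


# Toy scores for three miRNAs and one disease.
P_m = [[1.0], [0.2], [0.0]]
P_d = [[0.5], [0.9], [0.3]]
P = combine_scores(P_m, P_d)
ranking = rank_candidates(P, 0)

# If every miRNA-side score is zero, the ranking is driven by P_d alone.
P_isolated = combine_scores([[0.0], [0.0], [0.0]], P_d)
ranking_isolated = rank_candidates(P_isolated, 0)
```

Because only the order of candidates matters for ranking, scaling the disease-side score by $1 - \beta$ in the all-zero case does not change which miRNAs come first.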

When the predictive score between an isolated disease $d_j$ and miRNA $m_i$ is calculated, all associations of the isolated disease $d_j$ are ignored, and the contribution of the neighbors of miRNA $m_i$ to the predictor is zero. Thus, the miRNA-side score $P_m(i, j)$ equals 0. The final predictive score between the isolated disease $d_j$ and miRNA $m_i$ is determined by the disease-side score $P_d(i, j)$, which is the predictive score between the similarity neighbors of disease $d_j$ and miRNA $m_i$. Therefore, SRMDAP can predict associated miRNAs for an isolated disease. Similarly, when the predictive score between a new miRNA $m_i$ and disease $d_j$ is calculated, $P_m(i, j)$ is the predictive score between the similarity neighbors of miRNA $m_i$ and disease $d_j$, and only $P_m(i, j)$ is used as the predictive score between the new miRNA and related diseases.

To find a suitable value of $\beta$, we tested different values from 0.1 to 0.9 and calculated the average area under the curve (AUC) in the framework of leave-one-out cross-validation. The results showed that SRMDAP achieved the highest average AUC when $\beta$ was 0.4 (Figure 3).

Figure 3: Average AUCs affected by the value of β. When β is 0.4, the average AUC is 0.8838 and SRMDAP achieves the best performance.

3. Results

3.1. Characteristics of the miRNA-Disease Association Network

In our study, 5,048 known miRNA-disease associations involving 475 miRNAs and 334 diseases were included. To comprehensively illustrate the known miRNA-disease association network, we list its characteristics in Table 1. The degree of a disease (or miRNA) is the number of neighboring miRNAs (or diseases) related to it. The average degrees of diseases and miRNAs were 15.11 and 10.63, respectively. The degree distributions of diseases and miRNAs in the known miRNA-disease association network (Figure 4) followed a power-law distribution. Most of the miRNAs and diseases had a degree of 1. Hepatocellular carcinoma had the maximum disease degree, with 208 miRNAs related to this malignancy. Meanwhile, hsa-mir-21 had the maximum miRNA degree, with 112 diseases related to this miRNA.
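The reported average degrees follow directly from the counts above, since each association contributes one to the degree of exactly one miRNA and one disease. A small sketch (the helper name `degrees` and the toy matrix are illustrative assumptions):

```python
def degrees(A):
    """Row sums give miRNA degrees; column sums give disease degrees."""
    mirna_deg = [sum(row) for row in A]
    disease_deg = [sum(col) for col in zip(*A)]
    return mirna_deg, disease_deg


# Toy adjacency matrix: 3 miRNAs x 2 diseases.
A_toy = [[1, 1], [0, 1], [0, 1]]
mirna_deg, disease_deg = degrees(A_toy)

# Counts reported in the text.
n_assoc, n_mirna, n_disease = 5048, 475, 334
avg_disease_degree = n_assoc / n_disease  # associations per disease
avg_mirna_degree = n_assoc / n_mirna      # associations per miRNA
print(round(avg_disease_degree, 2), round(avg_mirna_degree, 2))
```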

Table 1: Global characteristic of the known miRNA-disease association network.
Figure 4: Disease degree distribution and miRNAs degree distribution in the known miRNA-disease association network. (a) shows the bar diagram of disease degree. (b) shows the bar diagram of miRNAs degree.
3.2. Performance Evaluation of SRMDAP

We implemented leave-one-out cross-validation (LOOCV) on the known miRNA-disease associations to evaluate the predictive performance of SRMDAP. For a given disease $d$, each known association between a miRNA and disease $d$ was left out in turn as a test sample, and the other known associations between miRNAs and disease $d$ were considered as the training set. The remaining miRNAs without evidence of a relation to disease $d$ composed the candidate miRNA set. We calculated the relevance scores of these candidate miRNAs with disease $d$ and ranked them by their scores. If the test miRNA ranked above a given threshold, then the SRMDAP model was considered to have successfully predicted this miRNA-disease association. The threshold was varied to draw the receiver operating characteristic (ROC) curve, and the AUC was calculated to demonstrate the predictive performance. The ROC curve plots the relationship between the true positive rate (TPR, sensitivity) and the false positive rate (FPR, 1 − specificity) at different thresholds. Sensitivity represents the percentage of test miRNA-disease associations ranked above a given threshold. Meanwhile, specificity represents the percentage of candidate (unverified) miRNA-disease associations ranked below the threshold.
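The leave-one-out loop described above can be sketched generically as follows. The scoring function is passed in as a parameter; `shared_disease_score` is a deliberately simple stand-in and is not the SRMDAP predictor, and all names and the toy matrix are hypothetical.

```python
def loocv_ranks(A, score_fn):
    """For each known association (i, j): hide it, score every candidate miRNA
    for disease j on the reduced matrix, and record the held-out miRNA's rank."""
    results = []
    for i, row in enumerate(A):
        for j, known in enumerate(row):
            if not known:
                continue
            A_train = [r[:] for r in A]  # copy, so A itself is never modified
            A_train[i][j] = 0
            # Candidates: the held-out miRNA plus miRNAs with no known link to j.
            candidates = [k for k in range(len(A)) if A_train[k][j] == 0]
            scores = {k: score_fn(A_train, k, j) for k in candidates}
            rank = 1 + sum(1 for k in candidates if scores[k] > scores[i])
            results.append(((i, j), rank, len(candidates)))
    return results


def shared_disease_score(A_train, k, j):
    """Toy scorer: diseases that miRNA k shares with miRNAs already linked to j."""
    score = 0
    for t, row in enumerate(A_train):
        if t != k and row[j]:
            score += sum(a and b for a, b in zip(A_train[k], row))
    return score


# Toy association matrix: 4 miRNAs x 3 diseases.
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1]]
A_before = [r[:] for r in A]
results = loocv_ranks(A, shared_disease_score)
```

Note that the held-out entry is zeroed before any score is computed, so information about the test association cannot leak into the scores.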

The TPR and FPR were calculated as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN},$$

where TP, FP, TN, and FN indicate true positive, false positive, true negative, and false negative, respectively. Given a threshold, TP and FP are the numbers of known and unknown associations above the threshold, respectively. TN and FN are the numbers of unknown and known associations below the threshold, respectively. An AUC value of 1 indicates perfect performance of the prediction method, whereas an AUC value of 0.5 indicates random performance.
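A minimal sketch of how ROC points and the AUC can be computed from scores and binary labels is shown below, using the TPR and FPR definitions above and the trapezoidal rule. The function names and the toy scores are illustrative; the sketch assumes at least one positive and one negative label.

```python
def roc_points(scores, labels):
    """Sweep the threshold from high to low and collect (FPR, TPR) points.
    Items with tied scores are moved across the threshold together."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    idx = 0
    while idx < len(ranked):
        current = ranked[idx][0]
        while idx < len(ranked) and ranked[idx][0] == current:
            if ranked[idx][1]:
                tp += 1
            else:
                fp += 1
            idx += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy example: 1 marks a known association, 0 an unknown one.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

For this toy input, five of the six positive-negative pairs are ordered correctly, so the area equals 5/6.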

To our knowledge, RLSMDA [30], KATZ [24], and Liu et al.’s method [36] are three state-of-the-art computational methods that predict miRNA-disease associations. In our work, we compared SRMDAP with these methods and implemented LOOCV for the three methods. SRMDAP achieved the highest AUC of 0.8838 when $\beta = 0.4$. When optimal parameters were selected as described by the authors, the AUC values of RLSMDA, KATZ, and Liu’s method were 0.8584, 0.8522, and 0.7983, respectively. Comparative results of the overall ROC curves and AUCs of all methods are shown in Figure 5.

Figure 5: Method comparison: comparison between SRMDAP, RLSMDA, KATZ, and Liu’s method in terms of ROC curve and AUC.

To obtain a reliable judgment, we tested 18 human diseases associated with at least 70 miRNAs, because diseases related to only a few miRNAs were not sufficient to evaluate the performance of the prediction methods. Table 2 shows that SRMDAP achieved the highest AUC of 0.8874 with lung neoplasms and the lowest AUC of 0.7367 with renal cell carcinoma. The average AUC value for the 18 diseases was 0.8056. The average AUC values for the 18 diseases obtained from RLSMDA, KATZ, and Liu’s method were 0.6671, 0.6901, and 0.5178, respectively. The average AUC achieved by SRMDAP was therefore approximately 14, 12, and 29 percentage points higher than those of the other three methods, respectively. The AUC values of SRMDAP for the 18 diseases were all higher than those of RLSMDA, KATZ, and Liu’s method. These results indicated that the prediction performance of SRMDAP was superior to that of RLSMDA, KATZ, and Liu’s method.

Table 2: Prediction result of SRMDAP and other methods for LOOCV.
3.3. Case Studies

To further evaluate the SRMDAP’s ability to discover potential miRNA-disease associations, we selected two important diseases (kidney neoplasms and colorectal neoplasms) as case studies. We analyzed the top 50 candidates in detail. Prediction results were supported by dbDEMC [15] database and literature.

Kidney neoplasm, which forms in tissues of the kidneys, is one of the top 10 cancer killers. This malignancy is still difficult to diagnose and treat. Based on 2010–2014 cases and deaths, the annual number of new cases of kidney and renal pelvis cancer was 15.6 per 100,000 persons. The five-year survival rate in the United States is 74.1% [51]. MiRNAs showing altered expression in the kidney are promising biomarkers for diagnosis. For example, miR-141 and miR-200b are underexpressed in renal cell carcinoma (a kidney neoplasm type) relative to normal kidney and oncocytoma tissue samples. The expression profiles of miR-141 or miR-200b might provide an ancillary tool for the correct discrimination of kidney neoplasms [52]. Candidate miRNAs were ranked based on SRMDAP. The top 50 potential miRNAs associated with kidney neoplasms and the evidence for these associations are listed in Table 3. Among the top 50 predicted candidates, 49 miRNAs have been confirmed by dbDEMC, and only hsa-mir-7 is not confirmed by dbDEMC. However, downregulation of miR-7 with a synthesized inhibitor inhibited cell migration in vitro, suppressed cell proliferation, and induced renal cancer cell apoptosis. Thus, miR-7 could be characterized as an oncogene in renal cell carcinoma [53].

Table 3: The top 50 potential kidney neoplasms-related miRNAs predicted by SRMDAP and the confirmation of these associations. Forty-nine of the top 50 kidney neoplasms-related miRNAs have been confirmed by dbDEMC. Hsa-mir-7 ranked 48th has been confirmed by the literature (PMID: 23793934).

Colorectal neoplasm is the third most common cancer and the fourth most common cancer-related cause of death worldwide, with more than 1.2 million new cases and 600,000 deaths annually [54]. MiRNAs can serve as useful biomarkers for colorectal cancer diagnosis, prognosis, and prediction of treatment response because of several unique characteristics [55]. For example, serum miR-21, miR-29a, and miR-125b levels could discriminate patients with early colorectal neoplasms from healthy controls [56]. The top 50 potential miRNAs associated with colorectal neoplasms and the evidence for these associations are listed in Table 4. Among the top 50 predicted candidates, 49 miRNAs were confirmed by dbDEMC. Only 1 miRNA (hsa-mir-663a) was not confirmed in dbDEMC.

Table 4: The top 50 potential colorectal neoplasms-related miRNAs predicted by SRMDAP and the confirmation of these associations. Forty-nine of the 50 colorectal neoplasms-related miRNAs have been confirmed by dbDEMC. Only 1 miRNA (hsa-mir-663a is ranked 30th) is unconfirmed.
3.4. Prediction of Isolated Diseases and Isolated miRNAs

An isolated disease is a disease without any known related miRNAs, such as a newly discovered disease. When we tested the capability of SRMDAP to make predictions for isolated diseases, we removed all known verified associations between miRNAs and the disease being predicted. This operation ensured that we only used the association and similarity information of other diseases to predict candidate miRNAs associated with the given disease. Then, these candidate miRNAs were ranked according to their scores. The average AUC of SRMDAP for isolated diseases was 0.7990. For colorectal neoplasms, we removed the 143 known miRNAs related to colorectal neoplasms and ranked the candidate miRNAs based on the predictive result of SRMDAP. Among the top 50 predicted candidates, 49 miRNAs have been confirmed by dbDEMC. The remaining candidate, hsa-mir-494, is supported by the literature (PMID: 25270723). Specifically, hsa-mir-494 is an independent prognostic marker for colorectal neoplasm patients, and this miRNA promotes cell migration and invasion in colorectal neoplasms by directly targeting PTEN [57]. The predicted results for colorectal neoplasms are listed in Table 5.

Table 5: The top 50 potential miRNAs predicted for colorectal neoplasms treated as an isolated disease. Forty-nine of the top 50 colorectal neoplasms-related miRNAs have been confirmed by dbDEMC. The miRNA hsa-mir-494, which is ranked 45th, has been confirmed by the literature.

As previously stated, an isolated miRNA is a miRNA without any known related disease, such as a newly discovered miRNA. The known verified disease-miRNA associations of the miRNA being predicted were removed to demonstrate the ability of SRMDAP to make predictions for miRNAs without any known related disease. This procedure ensures that only the known disease-miRNA associations and similarity information of other miRNAs are used to predict candidate diseases. Then, these candidate diseases were ranked according to their scores. The average AUC of SRMDAP for isolated miRNAs was 0.8464. The predicted results for hsa-mir-106b are listed in Table 6. For hsa-mir-106b, we removed its 31 known disease associations and ranked the candidate diseases based on the predictive result of SRMDAP. Among the top 10 predicted candidates, all diseases have been confirmed by dbDEMC, miR2Disease, or HMDD. These results demonstrate that SRMDAP may be recommended for predicting associations of isolated diseases and miRNAs.

Table 6: The top 10 potential diseases predicted for hsa-mir-106b treated as an isolated miRNA. All of the top 10 hsa-mir-106b-related diseases have been confirmed by the dbDEMC, miR2Disease, or HMDD databases.

4. Discussion

The success of SRMDAP could largely be attributed to several factors. First, the similarity measurement in SRMDAP does not depend on experimentally supported miRNA-disease associations to calculate the functional similarity of miRNAs and diseases. Thus, overestimation of the predictive accuracy was avoided. A density-based recommender model then integrates the miRNA similarity subnetwork and the disease similarity subnetwork with experimentally verified miRNA-disease associations. Second, SRMDAP incorporates miRNA-mRNA information, disease-gene information, and experimentally verified miRNA-disease associations. This characteristic improved prediction accuracy. Third, only one parameter was used to balance the contributions from the miRNA similarity subnetwork and the disease similarity subnetwork, and this parameter was easy to adjust. Fourth, the LOOCV experiment and case studies of kidney and colorectal neoplasms demonstrated that SRMDAP had excellent predictive performance. Finally, SRMDAP could make predictions for isolated diseases and isolated miRNAs, because disease similarity and miRNA similarity were obtained independently of the known miRNA-disease associations.

Although SRMDAP contains several innovative concepts, the current version of the method has several limitations. First, similarity measurement is of vital importance. Hence, miRNA similarity measurement should use more interaction information of miRNAs with other biomolecules. Disease similarity measurement should consider not only functional similarities but also semantic similarities. A fusion of more information sources can benefit the similarity measurement. Second, considering that SRMDAP is constructed on the basis of known miRNA-disease associations, the performance of SRMDAP can be improved as more experimentally verified miRNA-disease associations become available.

5. Conclusions

Identifying the most promising miRNA-disease associations helps biological experimentation save time and cost. In this work, we developed SRMDAP to predict miRNA-disease associations using a miRNA similarity subnetwork and a disease similarity subnetwork established with SimRank. We integrated these similarity networks with known experimentally verified miRNA-disease associations using the density-based clustering recommender model. SRMDAP obtained an average AUC of 0.8838 in LOOCV. Case studies of kidney and colorectal neoplasms were evaluated, and 49 of the top 50 miRNAs were confirmed by dbDEMC in each case. SRMDAP also performed well in predicting associations for isolated diseases and miRNAs. For colorectal neoplasms and hsa-mir-106b, all top 50 predicted miRNAs and all top 10 predicted diseases have been confirmed by dbDEMC, miRCancer, HMDD, or the literature. These results demonstrated that SRMDAP has superior performance over the other tested methods.

Conflicts of Interest

There are no conflicts of interest to declare.

Acknowledgments

This work was supported by the Natural Science Foundation of China under Grant no. 61672223 and the Natural Science Foundation of Hunan Province under Grant no. 2016jj4029.

References

  1. R. C. Lee, R. L. Feinbaum, and V. Ambros, “The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14,” Cell, vol. 75, no. 5, pp. 843–854, 1993.
  2. B. J. Reinhart, F. J. Slack, M. Basson et al., “The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans,” Nature, vol. 403, no. 6772, pp. 901–906, 2000.
  3. D. P. Bartel, “MicroRNAs: genomics, biogenesis, mechanism, and function,” Cell, vol. 116, no. 2, pp. 281–297, 2004.
  4. A. M. Cheng, M. W. Byrom, J. Shelton, and L. P. Ford, “Antisense inhibition of human miRNAs and indications for an involvement of miRNA in cell growth and apoptosis,” Nucleic Acids Research, vol. 33, no. 4, pp. 1290–1297, 2005.
  5. E. A. Miska, “How microRNAs control cell division, differentiation and death,” Current Opinion in Genetics & Development, vol. 15, no. 5, pp. 563–568, 2005.
  6. X. Karp and V. Ambros, “Encountering microRNAs in cell fate signaling,” Science, vol. 310, no. 5752, pp. 1288-1289, 2005.
  7. G. A. Calin and C. M. Croce, “MicroRNA signatures in human cancers,” Nature Reviews Cancer, vol. 6, no. 11, pp. 857–866, 2006.
  8. I. Alvarez-Garcia and E. A. Miska, “MicroRNA functions in animal development and human disease,” Development, vol. 132, no. 21, pp. 4653–4662, 2005.
  9. L. He, J. M. Thomson, M. T. Hemann et al., “A microRNA polycistron as a potential human oncogene,” Nature, vol. 435, no. 7043, pp. 828–833, 2005.
  10. M. Lu, Q. Zhang, M. Deng et al., “An analysis of human microRNA and disease associations,” PLoS ONE, vol. 3, no. 10, Article ID e3420, 2008.
  11. Q. Jiang, Y. Wang, Y. Hao et al., “miR2Disease: a manually curated database for microRNA deregulation in human disease,” Nucleic Acids Research, vol. 37, no. 1, pp. D98–D104, 2009.
  12. Z. Yang, F. Ren, C. Liu et al., “dbDEMC: a database of differentially expressed miRNAs in human cancers,” BMC Genomics, vol. 11, supplement 4, article S5, 2010.
  13. B. Xie, Q. Ding, H. Han, and D. Wu, “miRCancer: a microRNA-cancer association database constructed by text mining on literature,” Bioinformatics, vol. 29, no. 5, pp. 638–644, 2013. View at Publisher · View at Google Scholar · View at Scopus
  14. A. Ruepp, A. Kowarsch, D. Schmidl et al., “PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes,” Genome Biology, vol. 11, no. 1, article R6, 2010. View at Publisher · View at Google Scholar · View at Scopus
  15. Z. Yang, L. Wu, A. Wang et al., “DbDEMC 2.0: Updated database of differentially expressed miRNAs in human cancers,” Nucleic Acids Research, vol. 45, no. 1, pp. D812–D818, 2017. View at Publisher · View at Google Scholar · View at Scopus
  16. Q. Zou, J. Li, L. Song, X. Zeng, and G. Wang, “Similarity computation strategies in the microRNA-disease network: a survey,” Briefings in Functional Genomics, vol. 15, no. 1, pp. 55–64, 2016. View at Publisher · View at Google Scholar
  17. X. Zeng, X. Zhang, and Q. Zou, “Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks,” Briefings in Bioinformatics, vol. 17, no. 2, pp. 193–203, 2016. View at Publisher · View at Google Scholar
  18. X. Chen, D. Xie, Q. Zhao, and Z.-H. You, “MicroRNAs and complex diseases: from experimental results to computational models,” Briefings in Bioinformatics, bbx130, 2017. View at Publisher · View at Google Scholar
  19. Q. Jiang, Y. Hao, G. Wang et al., “Prioritization of disease microRNAs through a human phenome-microRNAome network,” BMC Systems Biology, vol. 4, supplement 1, article S2, 2010. View at Publisher · View at Google Scholar · View at Scopus
  20. D. Wang, J. Wang, M. Lu, F. Song, and Q. Cui, “Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases,” Bioinformatics, vol. 26, no. 13, pp. 1644–1650, 2010. View at Publisher · View at Google Scholar · View at Scopus
  21. X. Chen, M. X. Liu, and G. Y. Yan, “RWRMDA: predicting novel human microRNA-disease associations,” Molecular BioSystems, vol. 8, no. 10, pp. 2792–2798, 2012. View at Publisher · View at Google Scholar · View at Scopus
  22. P. Xuan, K. Han, and M. Guo, “Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors,” PLoS ONE, vol. 8, no. 8, Article ID e70204, 2013. View at Publisher · View at Google Scholar
  23. X. Chen, C. C. Yan, X. Zhang et al., “WBSMDA: within and between score for MiRNA-disease association prediction,” Scientific Reports, vol. 6, article 21106, 2016. View at Publisher · View at Google Scholar
  24. Q. Zou, J. Li, Q. Hong et al., “Prediction of microRNA-disease associations based on social network analysis methods,” BioMed Research International, vol. 2015, Article ID 810514, 9 pages, 2015. View at Publisher · View at Google Scholar · View at Scopus
  25. C. Gu, B. Liao, X. Li, and K. Li, “Network consistency projection for human miRNA-disease associations inference,” Scientific Reports, vol. 6, Article ID 36054, 2016. View at Publisher · View at Google Scholar · View at Scopus
  26. X. Li, Y. Lin, and C. Gu, “A network similarity integration method for predicting microRNA-disease associations,” RSC Advances, vol. 7, no. 51, pp. 32216–32224, 2017. View at Publisher · View at Google Scholar · View at Scopus
  27. Z.-H. You, Z.-A. Huang, Z. Zhu et al., “PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction,” PLoS Computational Biology, vol. 13, no. 3, Article ID e1005455, 2017. View at Publisher · View at Google Scholar
  28. C. Gu, B. Liao, X. Li et al., “Network-based collaborative filtering recommendation model for inferring novel disease-related miRNAs,” RSC Advances, vol. 7, no. 71, pp. 44961–44971, 2017. View at Publisher · View at Google Scholar · View at Scopus
  29. Q. Jiang, G. Wang, S. Jin, Y. Li, and Y. Wang, “Predicting human microRNA-disease associations based on support vector machine,” International Journal of Data Mining and Bioinformatics, vol. 8, no. 3, pp. 282–293, 2013. View at Publisher · View at Google Scholar · View at Scopus
  30. X. Chen and G.-Y. Yan, “Semi-supervised learning for potential human microRNA-disease associations inference,” Scientific Reports, vol. 4, article 5501, 2014. View at Publisher · View at Google Scholar · View at Scopus
  31. J. Xu, C.-X. Li, J.-Y. Lv et al., “Prioritizing candidate disease miRNAs by topological features in the miRNA target-dysregulated network: case study of prostate cancer,” Molecular Cancer Therapeutics, vol. 10, no. 10, pp. 1857–1866, 2011. View at Publisher · View at Google Scholar · View at Scopus
  32. J.-Q. Li, Z.-H. Rong, X. Chen, G.-Y. Yan, and Z.-H. You, “MCMDA: matrix completion for MiRNA-disease association prediction,” Oncotarget , vol. 8, pp. 21187–21199, 2017. View at Publisher · View at Google Scholar
  33. X. Chen, C. Clarence Yan, X. Zhang et al., “RBMMMDA: predicting multiple types of disease-microRNA associations,” Scientific Reports, vol. 5, article 13877, 2015. View at Publisher · View at Google Scholar · View at Scopus
  34. X. Chen, Y. Gong, D. H. Zhang, Z. H. You, and Z. W. Li, “DRMDA: deep representations-based miRNAdisease association prediction,” Journal of Cellular and Molecular Medicine, vol. 22, no. 1, pp. 472–485, 2018. View at Publisher · View at Google Scholar
  35. Q. Xiao, J. Luo, C. Liang, J. Cai, and P. Ding, “A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations,” Bioinformatics, vol. 34, no. 2, pp. 239–248, 2018. View at Publisher · View at Google Scholar
  36. Y. Liu, X. Zeng, Z. He, and Q. Zou, “Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources,” IEEE Transactions on Computational Biology and Bioinformatics, vol. 14, no. 4, pp. 905–915, 2017. View at Publisher · View at Google Scholar
  37. Q. Jiang, G. Wang, and Y. Wang, “An approach for prioritizing disease-related microRNAs based on genomic data integration,” in Proceedings of the 3rd International Conference on BioMedical Engineering and Informatics (BMEI '10), pp. 2270–2274, IEEE, October 2010. View at Publisher · View at Google Scholar · View at Scopus
  38. H. Shi, G. Zhang, M. Zhou et al., “Integration of multiple genomic and phenotype data to infer novel miRNA-disease associations,” PLoS ONE, vol. 11, no. 2, Article ID e0148521, 2016. View at Publisher · View at Google Scholar
  39. G. Jeh and J. Widom, “SimRank: a measure of structural-context similarity,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543, July 2002.
  40. M. Ester, H.-P. Kriegel, and X. Xu, “Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 169–194, 1998. View at Google Scholar
  41. S. D. Hsu, Y. T. Tseng, S. Shrestha et al., “miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions,” Nucleic Acids Research, vol. 42, no. D1, pp. D78–D85, 2014. View at Publisher · View at Google Scholar
  42. J. Piñero, N. Queralt-Rosinach, À. Bravo et al., “DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes,” Database, vol. 2015, Article ID bav028, 2015. View at Publisher · View at Google Scholar · View at Scopus
  43. Y. Li, C. Qiu, J. Tu et al., “HMDD v2.0: a database for experimentally supported human microRNA and disease associations,” Nucleic Acids Research, vol. 42, no. D1, pp. D1070–D1074, 2014. View at Publisher · View at Google Scholar
  44. Y. Pi, H. Peng, S. Zhou, and Z. Zhang, “A scalable approach to column-based low-rank matrix approximation,” in Proceedings of the 23rd International Joint Conference on Artificial Intelligence, pp. 1600–1606, August 2013.
  45. R. Meymandpour and J. G. Davis, “A semantic similarity measure for linked data: An information content-based approach,” Knowledge-Based Systems, vol. 109, pp. 276–293, 2016. View at Publisher · View at Google Scholar · View at Scopus
  46. J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han, “On community outliers and their efficient detection in information networks,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 813–822, ACM, Washington, DC, USA, July 2010.
  47. H. Cheng, Y. Zhou, and J. X. Yu, “Clustering large attributed graphs: a balance between structural and attribute similarities,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 5, no. 2, article 12, 33 pages, 2011. View at Google Scholar
  48. P. Lee, L. V. S. Lakshmanan, and J. X. Yu, “On top-k structural similarity search,” in Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE '12), pp. 774–785, April 2012. View at Publisher · View at Google Scholar · View at Scopus
  49. S. Bandyopadhyay, R. Mitra, U. Maulik, and M. Q. Zhang, “Development of the human cancer microRNA network,” Silence, vol. 1, no. 1, article 6, 2010. View at Publisher · View at Google Scholar · View at Scopus
  50. T. Nepusz, H. Yu, and A. Paccanaro, “Detecting overlapping protein complexes in protein-protein interaction networks,” Nature Methods, vol. 9, no. 5, pp. 471-472, 2012. View at Publisher · View at Google Scholar · View at Scopus
  51. SEER Stat Fact Sheets: Kidney and Renal Pelvis Cancer, National Cancer Institue.
  52. R. M. Silva-Santos, P. Costa-Pinheiro, A. Luis et al., “microRNA profile: a promising ancillary tool for accurate renal cell tumour diagnosis,” British Journal of Cancer, vol. 109, no. 10, pp. 2646–2653, 2013. View at Publisher · View at Google Scholar · View at Scopus
  53. Z. Yu, L. Ni, D. Chen et al., “Identification of miR-7 as an oncogene in renal cell carcinoma,” Journal of Molecular Histology, vol. 44, no. 6, pp. 669–677, 2013. View at Publisher · View at Google Scholar · View at Scopus
  54. H. Brenner, M. Kloor, and C. P. Pox, “Colorectal cancer,” The Lancet, vol. 383, no. 9927, pp. 1490–1502, 2014. View at Publisher · View at Google Scholar · View at Scopus
  55. Y. Okugawa, W. M. Grady, and A. Goel, “Epigenetic alterations in colorectal cancer: emerging biomarkers,” Gastroenterology, vol. 149, no. 5, pp. 1204–1225.e12, 2015. View at Publisher · View at Google Scholar · View at Scopus
  56. A. Yamada, T. Horimatsu, Y. Okugawa et al., “Serum MIR-21, MIR-29a, and MIR-125b are promising biomarkers for the early detection of colorectal neoplasia,” Clinical Cancer Research, vol. 21, no. 18, pp. 4234–4242, 2015. View at Publisher · View at Google Scholar · View at Scopus
  57. H.-B. Sun, X. Chen, H. Ji et al., “MiR-494 is an independent prognostic factor and promotes cell migration and invasion in colorectal cancer by directly targeting PTEN,” International Journal of Oncology, vol. 45, no. 6, pp. 2486–2494, 2014. View at Publisher · View at Google Scholar · View at Scopus