Computational and Mathematical Methods in Medicine

Volume 2019, Article ID 5145646, 10 pages

https://doi.org/10.1155/2019/5145646

## A Novel Neighborhood-Based Computational Model for Potential MiRNA-Disease Association Prediction

^{1}Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China

^{2}College of Computer Engineering and Applied Mathematics, Changsha University, Changsha 410001, China

Correspondence should be addressed to Xiang Feng; fengxiang@xtu.edu.cn and Lei Wang; wanglei@xtu.edu.cn

Received 23 September 2018; Accepted 11 December 2018; Published 17 January 2019

Academic Editor: Nadia A. Chuzhanova

Copyright © 2019 Yang Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A growing number of studies have shown that miRNAs can affect a variety of biological processes, so studying the relationships between human diseases and miRNAs is important for disease prevention, treatment, diagnosis, and prognosis. However, traditional experimental methods are time-consuming and labour-intensive. Hence, in this paper, a novel neighborhood-based computational model called NBMDA is proposed for predicting potential miRNA-disease associations. Because known miRNA-disease associations are sparse and many diseases (or miRNAs) are associated with only one or a few miRNAs (or diseases), NBMDA uses the *K*-nearest neighbor (KNN) method as a recommendation algorithm, based on known miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity for miRNAs and diseases, to improve its prediction accuracy. Simulation results demonstrate that NBMDA can infer miRNA-disease associations with higher accuracy than previous state-of-the-art methods. Moreover, in independent case studies of esophageal neoplasms, breast neoplasms, and colon neoplasms, 47, 48, and 48 of the top 50 predicted miRNAs, respectively, were confirmed by previously published literature, which also indicates that NBMDA can be utilized as a powerful tool to study the relationships between miRNAs and diseases.

#### 1. Introduction

MiRNAs are small RNAs about 20–24 nucleotides in length that regulate gene expression at the posttranscriptional level; each miRNA may have multiple target genes, and each target gene may in turn be regulated by multiple miRNAs [1–4]. Recently, more and more studies have shown that miRNAs play important roles in many physiological processes of the human body, such as cell growth [5], proliferation [6], differentiation [7], immune response [8], and embryonic development [5]. In addition, emerging evidence implies that miRNAs can affect the occurrence and development of various tumors by regulating the signaling pathways in which their target genes are involved, playing a role similar to that of oncogenes or tumor suppressor genes [9]. For example, miR-203 can inhibit the formation of esophageal tumors [10]; miR-328 is a key oncogene in hepatocellular carcinoma, and its expression level will be significantly upregulated and downregulated in hepatocellular carcinoma tissues [11]. MiR-143 and miR-145 are expressed at low levels in esophageal cancer and gastric cancer, which means that the downregulation of these two miRNAs can be considered a potential biomarker for the related tumors [12]. Hence, exploring potential relationships between miRNAs and diseases is important for disease prevention, treatment, diagnosis, and prognosis [13–15].

Up to now, many human miRNA-disease association databases, such as HMDD [16] and miR2Disease [17], have been established, in which the stored associations are mainly collected from previous biological experiments. With the rapid growth of biological information associated with miRNAs and diseases, the known miRNA-disease associations fall increasingly short of the needs of modern medical research, since traditional methods of detecting miRNA-disease associations (e.g., PCR [18] and northern blotting [19]) are very time-consuming and labour-intensive. Therefore, in recent years, a large number of computational models have been proposed [20–22]. For instance, based on the assumption that similar miRNAs tend to be related or unrelated to similar diseases [23], Zeng et al. [24] proposed a model to infer potential associations between miRNAs and diseases based on miRNA similarity, disease similarity, and known miRNA-disease associations. Yu et al. [25] proposed a model called KATZMDA to predict potential miRNA-disease associations by measuring the number and length of paths between a miRNA node and a disease node in the miRNA-disease association network. In addition, as more and more biological databases are established, prediction performance can be expected to improve if information collected from more databases is integrated to predict potential miRNA-disease associations. For example, Yu et al. [26] proposed a model called NBCLDA to infer potential associations between lncRNAs and diseases by integrating known miRNA-disease associations, miRNA-lncRNA associations, lncRNA-disease associations, gene-lncRNA associations, gene-disease associations, and gene-miRNA associations. Moreover, in the past few years, as machine learning has become a hot topic in many fields, some machine learning algorithms have been adopted to predict miRNA-disease associations as well. 
For instance, Zhang et al. [27] proposed a semisupervised model to infer potential miRNA-disease associations by implementing label propagation algorithms on the miRNA-disease association network. Chen et al. [28] proposed a computational model called SDMMDA based on super-diseases and super-miRNAs to predict potential miRNA-disease associations, in which as many similar diseases or miRNAs as possible are first clustered into super-diseases or super-miRNAs, and then a Naive Bayesian scheme is utilized to infer potential associations between miRNAs and diseases. Luo et al. [29] proposed a semisupervised method called KRLSM to identify potential miRNA-disease associations, in which, because known miRNA-disease associations are sparse, different omics data are first integrated to improve the prediction accuracy, and then a semisupervised regularized least squares classifier is adopted to calculate the probabilities of associations between miRNAs and diseases.

In this paper, unlike the models mentioned above, a novel neighborhood-based computational model called NBMDA is developed to infer potential miRNA-disease associations. Because known miRNA-disease associations are quite sparse and a variety of diseases (or miRNAs) are associated with only one or a few miRNAs (or diseases), the *K*-nearest neighbor (KNN) method is first utilized as a recommendation algorithm to improve the prediction accuracy of NBMDA. Then, according to two newly constructed miRNA-disease association networks and the original miRNA-disease association network, the possibilities of potential associations between miRNAs and diseases are calculated based on the concept of common neighbors. Finally, in order to evaluate the prediction performance of NBMDA, global leave-one-out cross validation (global LOOCV) and 5-fold cross validation (5-fold CV) were implemented, and simulation results demonstrated that NBMDA could achieve reliable AUCs of 0.8983/0.8153 and 0.8975 under the global LOOCV and 5-fold CV, respectively, which were higher than those of several state-of-the-art computational models. In addition, we further implemented case studies of esophageal neoplasms, breast neoplasms, and colon neoplasms with NBMDA, and the results showed that 47, 48, and 48 of the top 50 predicted miRNAs, respectively, were confirmed by previously published literature, which also demonstrated that NBMDA performs well in predicting potential miRNA-disease associations. Hence, NBMDA can be further applied to predict both diseases without any known related miRNAs and miRNAs without any known related diseases.

#### 2. Materials and Methods

##### 2.1. Human miRNA-Disease Associations

To evaluate the performance of NBMDA, we used two datasets. The first dataset (denoted as dataset1) was downloaded from the HMDD v2.0 database and consists of 5430 experimentally validated human miRNA-disease associations involving 495 different miRNAs and 383 different diseases [16]. The second dataset (denoted as dataset2) was downloaded from the miR2Disease database and the HMDD database and consists of high-quality experimentally verified miRNA-disease associations [14, 30]. For dataset2, after deleting 13 miRNAs that could not be found in the file at http://www.cuilab.cn/files/images/cuilab/misim.zip, we finally obtained 250 miRNA-disease associations involving 105 different miRNAs and 52 different diseases. For convenience, we adopted an adjacency matrix *A* to represent the miRNA-disease associations: for any given disease *d*(*i*) and miRNA *m*(*j*), *A*(*i*, *j*) is set to 1 if there is a known association between them and to 0 otherwise. Therefore, the *i*th row of *A* denotes the interaction profile of the disease *d*(*i*) with each of the collected miRNAs, and the *j*th column of *A* denotes the interaction profile of the miRNA *m*(*j*) with each of the collected diseases. The numbers of diseases and miRNAs collected in this paper are represented by *N*_{d} and *N*_{m}, respectively. Hence, based on the adjacency matrix *A*, we can obtain an original miRNA-disease association network MDA.
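As a minimal sketch of the adjacency matrix described above, the following Python snippet builds *A* from a list of (disease, miRNA) pairs. The function name `build_adjacency` and the toy identifiers are hypothetical and are not taken from HMDD or miR2Disease; plain lists are used so the example runs without third-party packages.

```python
def build_adjacency(pairs, diseases, mirnas):
    """Return A with A[i][j] = 1 if disease i and miRNA j have a known association."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(mirnas)}
    A = [[0] * len(mirnas) for _ in diseases]
    for d, m in pairs:
        A[d_index[d]][m_index[m]] = 1
    return A


# Toy data (hypothetical; real inputs would be the curated association lists).
diseases = ["d1", "d2", "d3"]
mirnas = ["m1", "m2", "m3", "m4"]
pairs = [("d1", "m1"), ("d1", "m3"), ("d2", "m3"), ("d3", "m4")]

A = build_adjacency(pairs, diseases, mirnas)
disease_profile = A[0]                  # row i: interaction profile of d(i)
mirna_profile = [row[2] for row in A]   # column j: interaction profile of m(j)
```

Here `len(A)` plays the role of *N*_{d} and `len(A[0])` the role of *N*_{m}.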

##### 2.2. miRNA Functional Similarity

The miRNA functional similarity network can be established based on the assumption that functionally similar miRNAs tend to be associated with similar diseases [31]. To construct the miRNA functional similarity network, we downloaded the functional similarity scores between the miRNAs collected in this study from http://www.cuilab.cn/files/images/cuilab/misim.zip. For convenience, we use *FS* to represent the miRNA functional similarity matrix, in which the value of *FS*(*i*, *j*) represents the similarity score between the miRNA *m*(*i*) and the miRNA *m*(*j*).

##### 2.3. Disease Semantic Similarity

The relationships between diseases can be represented by a directed acyclic graph (DAG), in which a disease *D* can be described as DAG(*D*) = (*D*, *T*(*D*), *E*(*D*)), where *T*(*D*) represents the set of nodes including *D* itself and all of its ancestor nodes, and *E*(*D*) is the set of direct edges that connect parent nodes and child nodes in *T*(*D*). The contribution of a disease *d* to the semantic value of *D*, denoted $D_D(d)$, can be calculated according to the following formula:

$$D_D(d) = \begin{cases} 1, & \text{if } d = D, \\ \max\left\{\Delta \cdot D_D(d') \mid d' \in \text{children of } d\right\}, & \text{if } d \neq D. \end{cases} \tag{1}$$

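Additionally, the semantic value of disease *D*, denoted $DV(D)$, can be obtained by summing the contributions $D_D(d)$ of all nodes in *T*(*D*) as follows:

$$DV(D) = \sum_{d \in T(D)} D_D(d). \tag{2}$$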

Here, ∆ is a semantic contribution factor between 0 and 1, which is set to 0.5 in this paper according to related works [32, 33]. According to Formula (1), the contribution of the disease *D* to its own semantic value is 1, and the contribution of an ancestor disease *d* to the semantic value of *D* gradually decreases as the distance between them increases, at a rate regulated by ∆. According to Formula (2), the semantic value of *D* is the sum of the contributions of *D* itself and its ancestor diseases. In general, based on the assumption that two diseases sharing larger parts of their DAGs should have a higher semantic similarity, the semantic similarity between the diseases *d*(*i*) and *d*(*j*) can be obtained according to the following formula, where $D_{d(i)}(t)$ denotes the contribution of disease *t* to the semantic value of *d*(*i*) and $DV(\cdot)$ denotes the semantic value of a disease:

$$SS(d(i), d(j)) = \frac{\sum_{t \in T(d(i)) \cap T(d(j))} \left(D_{d(i)}(t) + D_{d(j)}(t)\right)}{DV(d(i)) + DV(d(j))}. \tag{3}$$

Thereafter, according to Formula (3), we can obtain an *N*_{d} × *N*_{d} dimensional disease semantic similarity matrix *SS* for the *N*_{d} diseases collected previously.
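The following Python sketch illustrates the semantic similarity computation on a toy DAG. It assumes the usual form of the three steps described above: a disease contributes 1 to itself, each step up to an ancestor multiplies the contribution by ∆ (keeping the largest value when several paths exist), the semantic value is the sum of all contributions, and the similarity is the shared contributions divided by the sum of the two semantic values. The `parents` dictionary and all function names are hypothetical.

```python
def contributions(disease, parents, delta=0.5):
    """Contribution of `disease` and each of its ancestors to the semantic value of `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, ()):
                value = delta * contrib[node]
                # Keep the largest contribution when an ancestor is reachable by several paths.
                if value > contrib.get(parent, 0.0):
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib


def semantic_value(contrib):
    """Sum of the contributions of the disease itself and all of its ancestors."""
    return sum(contrib.values())


def semantic_similarity(a, b, parents, delta=0.5):
    """Shared contributions divided by the sum of the two semantic values."""
    ca = contributions(a, parents, delta)
    cb = contributions(b, parents, delta)
    shared = ca.keys() & cb.keys()
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (semantic_value(ca) + semantic_value(cb))


# Toy DAG (hypothetical): child -> list of parents.
parents = {"A": ["C"], "B": ["C"], "C": ["R"]}
sim_ab = semantic_similarity("A", "B", parents)
sim_aa = semantic_similarity("A", "A", parents)
```

In this toy DAG, diseases A and B share the ancestors C and R but not each other, so their similarity is 1.5 / 3.5, while the similarity of A with itself is 1.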

##### 2.4. Gaussian Interaction Profile Kernel Similarity for miRNAs and Diseases

In this section, based on the hypothesis that similar miRNAs tend to be related or unrelated to similar diseases, we adopt the topological information of the known miRNA-disease association network to calculate the Gaussian interaction profile kernel similarity for miRNAs. First, let the binary vector IP(*m*(*i*)) denote the *i*th column of the adjacency matrix *A*. Then the Gaussian kernel similarity between the miRNA *m*(*i*) and the miRNA *m*(*j*) can be obtained according to the following formula:

$$KM(m(i), m(j)) = \exp\left(-\gamma_m \left\| IP(m(i)) - IP(m(j)) \right\|^2\right), \tag{4}$$

where $\gamma_m$ is a parameter used to control the Gaussian kernel bandwidth and is defined as follows:

$$\gamma_m = \frac{\gamma'_m}{\frac{1}{N_m} \sum_{i=1}^{N_m} \left\| IP(m(i)) \right\|^2}. \tag{5}$$

As shown in Formula (5), there is a new bandwidth parameter $\gamma'_m$, which is set to 1 according to previous work [34]. Thereafter, an *N*_{m} × *N*_{m} dimensional miRNA Gaussian interaction profile kernel similarity matrix *KM* can be obtained based on Formula (4).

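Similarly, the Gaussian interaction profile kernel similarity between the disease *d*(*i*) and the disease *d*(*j*) can be calculated according to the following formulas:

$$KD(d(i), d(j)) = \exp\left(-\gamma_d \left\| IP(d(i)) - IP(d(j)) \right\|^2\right), \tag{6}$$

$$\gamma_d = \frac{\gamma'_d}{\frac{1}{N_d} \sum_{i=1}^{N_d} \left\| IP(d(i)) \right\|^2}, \tag{7}$$

where IP(*d*(*i*)) represents the *i*th row of the adjacency matrix *A*, $\gamma_d$ is a parameter used to control the Gaussian kernel bandwidth, and $\gamma'_d$ is a bandwidth parameter that is set to 1 according to previous work [34]. Hence, an *N*_{d} × *N*_{d} dimensional disease Gaussian interaction profile kernel similarity matrix *KD* is obtained based on Formula (6).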
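The kernel computation can be sketched in a few lines of Python. This is an illustration only, assuming the standard Gaussian interaction profile kernel: the similarity is the exponential of the negative squared Euclidean distance between two profiles, scaled by a bandwidth that equals the bandwidth parameter (1 here) divided by the mean squared norm of all profiles. The function name `gip_kernel` and the toy matrix are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over a list of binary profile vectors."""
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    # Note: this divides by zero if every profile is all zeros.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


# Toy adjacency matrix (hypothetical): rows are diseases, columns are miRNAs.
A = [[1, 0, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

KD = gip_kernel(A)                            # disease profiles are the rows of A
KM = gip_kernel([list(c) for c in zip(*A)])   # miRNA profiles are the columns of A
```

The same function serves both cases because only the orientation of the profiles changes: rows of *A* for diseases, columns of *A* for miRNAs.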

##### 2.5. Integrated Similarity for miRNAs and Diseases

In this section, in order to improve the accuracy of our prediction results, we further construct an integrated miRNA similarity matrix *S*_{m} and an integrated disease similarity matrix *S*_{d} from the matrices *FS*, *SS*, *KM*, and *KD* obtained above, according to the following formulas, respectively:

##### 2.6. The Prediction Model of NBMDA

For a disease node *d*(*i*) and a miRNA node *m*(*j*) in the miRNA-disease association network, following the concept of common neighbors given in the previous literature [35] and taking computational complexity into account, we define the common neighbors (*CNs*) between *d*(*i*) and *m*(*j*) as the nodes involved in all possible quadrangular closures between *d*(*i*) and *m*(*j*) in the miRNA-disease association network. The more *CNs* two seed nodes such as *d*(*i*) and *m*(*j*) have, the greater the possibility that these two seed nodes are associated with each other. In addition, according to LCP-theory (i.e., the theory of the local community paradigm) [35, 36], the information content related to the common neighbor nodes should be complemented with the topological information emerging from their interactions. Hence, in this section, we introduce LCLs to indicate the number of links that exist between CNs in the miRNA-disease association network. While searching for CNs between two seed nodes in the miRNA-disease association network, we temporarily remove the connection between these two seed nodes if one exists. Based on the above LCP-theory, we propose a novel neighborhood-based computational model called NBMDA for potential miRNA-disease association prediction. In NBMDA, first, an integrated miRNA similarity and an integrated disease similarity are obtained based on the miRNA functional similarity, the disease semantic similarity, and the Gaussian interaction profile kernel similarity for miRNAs and diseases, respectively. Then, considering that known miRNA-disease associations are quite sparse and many diseases (or miRNAs) are associated with only one or a few miRNAs (or diseases), we adopt KNN as a recommendation algorithm to improve the prediction accuracy of NBMDA. 
Its main idea is to obtain the *K* diseases that are most similar to a given disease *d*(*i*) based on the integrated disease similarity; if most of these *K* diseases are associated with the same miRNA *m*(*k*), then we assume that the disease *d*(*i*) is also associated with the miRNA *m*(*k*), and in this way we can construct a new miRNA-disease association network SDA. Similarly, we can obtain the *K* miRNAs that are most similar to a given miRNA *m*(*j*) based on the integrated miRNA similarity; if most of these *K* miRNAs are associated with the same disease *d*(*k*), then we assume that the miRNA *m*(*j*) is associated with the disease *d*(*k*), and in this way we can construct another new miRNA-disease association network SMA. To select the value of *K*, we tried values from 1 to 5 and found that NBMDA achieves the best experimental results when *K* is set to 3. An example is shown in Figure 1: to predict the potential association between *D*_{1} and *M*_{1}, in SDA, based on the integrated disease similarity, we obtain the three diseases *D*_{2}, *D*_{5}, and *D*_{6} that are most similar to *D*_{1}, and then obtain a new disease-miRNA association matrix based on 3NN. Additionally, in SMA, while calculating the association probability between the seed nodes *D*_{1} and *M*_{1}, we temporarily remove the connection between them.
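The KNN recommendation step can be illustrated with a short Python sketch. It rests on two assumptions that the text does not spell out: "most" is read as a strict majority of the *K* neighbors, and the returned matrix holds only the recommended links (whether these are merged with the known associations is left open here). The function name `knn_recommend` and the toy matrices are hypothetical.

```python
def knn_recommend(A, S, k=3):
    """Recommend associations for each row entity by majority vote among its k most similar entities.

    A: association matrix (entities x targets) as lists of 0/1.
    S: entity-entity similarity matrix, same row order as A.
    """
    n = len(A)
    n_targets = len(A[0])
    R = [[0] * n_targets for _ in range(n)]
    for i in range(n):
        others = [x for x in range(n) if x != i]
        neighbours = sorted(others, key=lambda x: S[i][x], reverse=True)[:k]
        for j in range(n_targets):
            votes = sum(A[x][j] for x in neighbours)
            # Assumption: "most" means a strict majority of the neighbours.
            if 2 * votes > len(neighbours):
                R[i][j] = 1
    return R


# Toy data (hypothetical): 5 diseases x 3 miRNAs; d1 has no known associations.
A = [[0, 0, 0],
     [1, 0, 1],
     [1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]]
S_d = [[1.0, 0.9, 0.8, 0.1, 0.7],
       [0.9, 1.0, 0.6, 0.2, 0.5],
       [0.8, 0.6, 1.0, 0.3, 0.4],
       [0.1, 0.2, 0.3, 1.0, 0.2],
       [0.7, 0.5, 0.4, 0.2, 1.0]]

SDA = knn_recommend(A, S_d, k=3)
```

In this toy example, the three diseases most similar to the first disease are all associated with the first miRNA, so that link is recommended even though the first disease has no known associations. Under the same assumptions, an SMA-style matrix could be obtained by applying the function to the transpose of *A* together with the integrated miRNA similarity and transposing the result back.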