Abstract

Many studies have shown that complex human diseases are associated not only with microRNAs (miRNAs) but also with long noncoding RNAs (lncRNAs). However, most existing studies focus on predicting disease-related miRNAs or disease-related lncRNAs separately, and to our knowledge, few have examined the impact of miRNA-lncRNA pairs on diseases, even though a growing number of recent studies show that both lncRNAs and miRNAs play important roles in cell proliferation and differentiation. The identification of disease-related genes provides great insight into the underlying pathogenesis of diseases at a system level. In this study, a novel model called PADLMHOOI is proposed to predict potential associations between diseases and lncRNA-miRNA pairs based on higher-order orthogonal iteration. To evaluate its prediction performance, global and local LOOCV were implemented, and simulation results demonstrated that PADLMHOOI achieved reliable AUCs of 0.9545 and 0.8874 in global and local LOOCV, respectively. Moreover, case studies further demonstrated the effectiveness of PADLMHOOI in inferring unknown disease-related lncRNA-miRNA pairs.

1. Introduction

Noncoding RNAs can be roughly divided by size into small and long noncoding RNAs. Small RNAs generally include tRNAs, miRNAs, piRNAs, and snoRNAs [1–4]. miRNAs are widely present in the cytoplasm of eukaryotic cells and are approximately 18–22 nucleotides in length; they can bind to the 3′-untranslated region (3′-UTR) of mRNA to inhibit translation or to degrade the mRNA, thereby affecting the expression of related genes [5–7]. miRNAs play important roles in a series of life activities such as cell differentiation [8], growth and development [9], and apoptosis [10]. Compared with small ncRNAs, a lncRNA has a longer chain of more than 200 nucleotides, has a specific and complex secondary and spatial structure, and can provide multiple sites for protein binding [11]. In addition, both lncRNAs and miRNAs are key members of the noncoding RNA family and play important roles in the coding and regulation of many complex human diseases [12–16].

Up to now, there have been many studies on the relationships between diseases and miRNAs, for example, the methods proposed by Xing Chen et al. [17–20] and Zou et al. [21–24]. For the prediction of potential associations between lncRNAs and diseases, Yu et al. [25] and Xing et al. [26] proposed two computational models called NBCLDA and LRLSLDA, respectively. Moreover, studies have also shown that there are relationships between lncRNAs and miRNAs. For example, Gernapudi et al. demonstrated that miRNA 140 can induce the expression of lncRNA NEAT1 [27]. Dey et al. showed that silencing lncRNA H19 and knocking out the H19 gene in myoblasts significantly decreased skeletal muscle differentiation [28]. Yilong et al. discovered that XIST could act on glioma stem cells through miR-152 and that low XIST expression in gliomas inhibited cell proliferation, migration, and invasion [29]. Xinyu et al. demonstrated that lncRNA MALAT1 could achieve posttranscriptional regulation of esophageal squamous cell carcinoma cells through miR-101 and miR-217 [30]. Er-bao et al. proposed that lncRNA ANRIL interacts with miR-99a/miR-449a to regulate cell proliferation during gastric cancer formation [31]. You et al. found that miR-449a and lncRNA NEAT1 inhibit each other's expression in the lung cancer cell line L9981: when miR-449a was overexpressed, NEAT1 expression decreased, cell proliferation was inhibited, and apoptosis increased, and vice versa [32]. Emmrich et al. found that the expression of lncRNA MONC and MIR100HG was closely related to the miRNA groups miR-99a∼125b-2 and miR-100∼125b-1, and that after silencing MONC and MIR100HG, the tumor cells of patients with early-stage acute megakaryoblastic leukemia were severely inhibited [33]. Amy et al. found that lncRNA Ang362 was the host transcript of miR-221 and miR-222 and that their interactions regulated Ang II and induced the proliferation of vascular smooth muscle cells [34]. Miaojun et al. found that the interactions between lncRNA H19 and miRNA-675 play an important role in the metastasis of prostate cancer [35]. The exploration of these relationships is conducive to the construction of gene regulatory networks and to the identification of the mechanisms of complex human diseases [36–38].

From the above description, it is clear that a growing number of studies have shown that lncRNA-miRNA interactions are involved in the development of complex diseases. However, to the best of our knowledge, apart from the PADLMP model proposed by Zhou et al. [39], few models have been proposed for large-scale prediction of potential associations between diseases and lncRNA-miRNA interactions. Hence, inspired by state-of-the-art methods [40–44], which show that miRNA-miRNA pairs can work cooperatively to regulate a single gene or gene clusters involved in similar processes [45], and based on the reasonable assumption that functionally similar lncRNA-miRNA pairs tend to be associated with similar diseases, a new prediction model called PADLMHOOI is proposed in this paper to infer potential associations between diseases and lncRNA-miRNA pairs. As illustrated in Figure 1, PADLMHOOI consists of the following four major steps:

Step 1 (Data Integration and Network Construction). In this step, we first downloaded known disease-lncRNA associations from three disease-lncRNA databases, namely LncRNADisease [46], MNDR [47, 48], and Lnc2Cancer [49], and constructed a disease-lncRNA bipartite network from these datasets. Next, we downloaded known disease-miRNA associations from three databases, namely miR2Disease [50], HMDD [51], and miRCancer [52], and constructed a disease-miRNA bipartite network from these datasets. Moreover, we downloaded the 2015 and 2017 versions of known lncRNA-miRNA associations from the starBase v2.0 database [53] (http://starbase.sysu.edu.cn/) on Feb 2, 2017, and constructed a lncRNA-miRNA bipartite network from these datasets. Finally, based on these three bipartite networks, we constructed an integrated disease-lncRNA-miRNA tripartite network, which can be denoted as a tensor T.

Step 2 (Similarity Calculation). In this step, we first integrate the disease semantic similarity and the Gaussian Interaction Profile Kernel similarity to measure the similarity of diseases. Next, we integrate the lncRNA functional similarity and the miRNA functional similarity in three different ways to measure the functional similarity of lncRNA-miRNA pairs.

Step 3 (Weighted K-Nearest Neighbor Profile). Some diseases may have no known association with any lncRNA-miRNA pair, which may lead to unsatisfactory prediction results when PADLMHOOI is used to infer potential associations between diseases and lncRNA-miRNA pairs. Hence, in this step, we introduce the weighted K-nearest neighbor profile (WKNNP) to add more interaction information among diseases, lncRNAs, and miRNAs and thereby improve the prediction performance of PADLMHOOI.

Step 4 (Tensor Decomposition). In this step, we perform tensor decomposition on the newly constructed disease-lncRNA-miRNA tensor T. Since the results of tensor decomposition include a core tensor and three matrices, we define the final predicted association tensor as the modal product of the core tensor and these three matrices. Thereafter, for each disease, we sort the scores of the lncRNA-miRNA pairs in the final predicted association tensor in descending order; the higher a pair ranks, the more likely it is that a potential association exists between the disease and that lncRNA-miRNA pair.

2. Materials and Methods

2.1. Construction of the Bipartite Network of Disease-lncRNA

To construct the disease-lncRNA bipartite network, known associations between diseases and lncRNAs were first downloaded from three databases, namely LncRNADisease, MNDR, and Lnc2Cancer. After feature processing (including feature cleaning and data imbalance processing), 2048 distinct disease-lncRNA associations were obtained (Supplementary Table 1). Based on these 2048 known disease-lncRNA associations, we construct a disease-lncRNA bipartite network $G_1 = (V_1, E_1)$ according to the following steps:

Step 1. Let $L_1$ be the set of all distinct lncRNAs and $D_1$ be the set of all distinct diseases in these 2048 known disease-lncRNA associations; then we define $V_1 = L_1 \cup D_1$ as the vertex set of $G_1$.

Step 2. For any $l_i \in L_1$ and $d_j \in D_1$, if $(l_i, d_j)$ belongs to these 2048 known disease-lncRNA associations, then there is an edge between $l_i$ and $d_j$ in $G_1$; in this way, we obtain the edge set $E_1$ of $G_1$.

2.2. Construction of the Bipartite Network of Disease-miRNA

To construct the disease-miRNA bipartite network, known disease-miRNA associations were first downloaded from three databases, namely miR2Disease, HMDD, and miRCancer. After the acquired miRNAs and diseases were mapped to miRBase v21 [54] and the disease ontology (DO) [55], respectively, 4041 distinct disease-miRNA associations were obtained (Supplementary Table 2). Based on these 4041 known disease-miRNA associations, we construct a disease-miRNA bipartite network $G_2 = (V_2, E_2)$ according to the following steps:

Step 1. Let $M_2$ be the set of all distinct miRNAs and $D_2$ be the set of all distinct diseases in these 4041 known disease-miRNA associations; then we define $V_2 = M_2 \cup D_2$ as the vertex set of $G_2$.

Step 2. For any $m_i \in M_2$ and $d_j \in D_2$, if $(m_i, d_j)$ belongs to these 4041 known disease-miRNA associations, then there is an edge between $m_i$ and $d_j$ in $G_2$; in this way, we obtain the edge set $E_2$ of $G_2$.

2.3. Construction of the Bipartite Network of lncRNA-miRNA

To construct the lncRNA-miRNA bipartite network, two versions (2015 and 2017) of the lncRNA-miRNA dataset were first downloaded from the starBase v2.0 database. After feature processing (including feature cleaning and data imbalance processing), 20324 distinct lncRNA-miRNA interactions were obtained (Supplementary Table 3). Based on these 20324 known lncRNA-miRNA associations, we construct a lncRNA-miRNA bipartite network $G_3 = (V_3, E_3)$ according to the following steps:

Step 1. Let $L_3$ denote the set of all distinct lncRNAs and $M_3$ denote the set of all distinct miRNAs in these 20324 known lncRNA-miRNA associations; then we define $V_3 = L_3 \cup M_3$ as the vertex set of $G_3$.

Step 2. For any $l_i \in L_3$ and $m_j \in M_3$, if $(l_i, m_j)$ belongs to these 20324 known lncRNA-miRNA associations, then there is an edge between $l_i$ and $m_j$ in $G_3$; in this way, we obtain the edge set $E_3$ of $G_3$.

2.4. Construction of the Tripartite Network of Disease-lncRNA-miRNA

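Based on the networks $G_1$, $G_2$, and $G_3$ obtained above, we construct a tripartite network $G_4 = (V_4, E_4)$ according to the following steps:

Step 1. Let $V_d$ be the set of candidate diseases, $V_4 = \emptyset$, and $E_4 = \emptyset$.

Step 2. While $V_d$ is not empty, repeat: for $d_i \in V_d$, if there exist a lncRNA $l_j$ and a miRNA $m_k$ that satisfy the following three conditions simultaneously:

(a) there is an edge between $l_j$ and $d_i$ in $E_1$;

(b) there is an edge between $m_k$ and $d_i$ in $E_2$;

(c) there is an edge between $l_j$ and $m_k$ in $E_3$;

then $(d_i, l_j)$, $(d_i, m_k)$, and $(l_j, m_k)$ are added to $E_4$, $d_i$ is added to $V_4$ and removed from $V_d$, and $l_j$ and $m_k$ are added to $V_4$ if they are not already in $V_4$. Otherwise, $d_i$ is removed from $V_d$.

Step 3. Let $G_4 = (V_4, E_4)$.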

Following the above steps, we obtain a tripartite disease-lncRNA-miRNA association network. This network contains three kinds of nodes: disease nodes, lncRNA nodes, and miRNA nodes. The numbers of disease nodes, lncRNA nodes, and miRNA nodes are 68, 44, and 211, respectively, and the number of associations between diseases and lncRNA-miRNA pairs is 3,047.

2.5. Construction of the Disease-lncRNA-miRNA Tensor

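Based on the newly constructed tripartite network, for any given disease node $d_i$, lncRNA node $l_j$, and miRNA node $m_k$ in $G_4$, we can define a tensor T as follows:

$$T(i, j, k) = \begin{cases} 1, & \text{if } (d_i, l_j),\ (d_i, m_k),\ \text{and } (l_j, m_k) \text{ all belong to } E_4, \\ 0, & \text{otherwise.} \end{cases}$$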
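As an illustration of Sections 2.4 and 2.5, the following minimal sketch builds the binary tensor from three edge sets. It assumes that a triple is kept only when all three pairwise edges are known; the function name `build_tensor`, the (disease, lncRNA) tuple ordering, and the toy identifiers are hypothetical and are not taken from the datasets.

```python
def build_tensor(dis_lnc, dis_mir, lnc_mir):
    """Build a binary disease x lncRNA x miRNA tensor from three edge sets.

    Each argument is a set of 2-tuples: (disease, lncRNA), (disease, miRNA),
    and (lncRNA, miRNA). A triple (d, l, m) is kept only if all three
    pairwise edges are present (an assumption of this sketch).
    """
    triples = sorted(
        (d, l, m)
        for (d, l) in dis_lnc
        for (l2, m) in lnc_mir
        if l2 == l and (d, m) in dis_mir
    )
    diseases = sorted({d for d, _, _ in triples})
    lncrnas = sorted({l for _, l, _ in triples})
    mirnas = sorted({m for _, _, m in triples})
    d_idx = {d: i for i, d in enumerate(diseases)}
    l_idx = {l: j for j, l in enumerate(lncrnas)}
    m_idx = {m: k for k, m in enumerate(mirnas)}
    # Nested lists indexed as T[disease][lncRNA][miRNA].
    T = [[[0] * len(mirnas) for _ in lncrnas] for _ in diseases]
    for d, l, m in triples:
        T[d_idx[d]][l_idx[l]][m_idx[m]] = 1
    return diseases, lncrnas, mirnas, T


# Toy edge sets; the identifiers are illustrative only.
dis_lnc = {("d1", "lncA"), ("d2", "lncB")}
dis_mir = {("d1", "mir1"), ("d2", "mir2")}
lnc_mir = {("lncA", "mir1"), ("lncB", "mir3")}
diseases, lncrnas, mirnas, T = build_tensor(dis_lnc, dis_mir, lnc_mir)
print(diseases, lncrnas, mirnas, T)
```

In this toy input only (d1, lncA, mir1) closes a triangle, so d2, lncB, mir2, and mir3 do not appear in the resulting tensor.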

2.6. Calculation of the Similarity of Disease Pairs
2.6.1. Calculation of the Disease Semantic Similarity (DisSemSim)

To estimate the semantic similarity between diseases, we first downloaded the MeSH descriptors from the National Library of Medicine (http://www.nlm.nih.gov/) and selected the standard MeSH disease terminology. Then, for each disease $d$, we construct a Directed Acyclic Graph (DAG) $DAG(d) = (N(d), E(d))$, where $N(d)$ denotes the set of nodes containing $d$ itself and its ancestors, and $E(d)$ denotes the set of direct links from parent nodes to child nodes [56]. Based on $DAG(d)$, the semantic contribution of an ancestor node $d_s$ to the disease $d$ can be calculated as follows:

$$D_d(d_s) = \begin{cases} 1, & \text{if } d_s = d, \\ \max\{\Delta \cdot D_d(d') \mid d' \in \text{children of } d_s\}, & \text{if } d_s \neq d, \end{cases}$$

where $\Delta$ is the semantic contribution decay factor with a value between 0 and 1. According to the experimental results of previous state-of-the-art methods [57, 58], the most appropriate value for $\Delta$ is 0.5. Hence, based on the assumption that two diseases sharing more ancestor nodes in their DAGs have higher semantic similarity, the semantic similarity between two diseases $d_i$ and $d_j$ can be defined as follows:

$$DisSemSim(d_i, d_j) = \frac{\sum_{t \in N(d_i) \cap N(d_j)} \left( D_{d_i}(t) + D_{d_j}(t) \right)}{\sum_{t \in N(d_i)} D_{d_i}(t) + \sum_{t \in N(d_j)} D_{d_j}(t)}$$
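The DAG-based semantic similarity can be sketched in a few lines. The sketch below assumes the widely used formulation in which a disease contributes 1 to itself and each ancestor receives the largest decayed contribution among its children; the `parents` mapping and the node names are hypothetical placeholders for a MeSH hierarchy.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Return {node: contribution} for `disease` and all of its ancestors.

    `parents` maps each term to the list of its direct parents in the DAG.
    """
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the maximum over all child-to-parent paths.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(d1, d2, parents, delta=0.5):
    """Shared contributions divided by the total semantic values."""
    c1 = semantic_contributions(d1, parents, delta)
    c2 = semantic_contributions(d2, parents, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Hypothetical toy hierarchy: A and B are both children of C, C is a child of root.
parents = {"A": ["C"], "B": ["C"], "C": ["root"]}
print(semantic_similarity("A", "B", parents))
```

With a decay factor of 0.5, each of A and B has a semantic value of 1 + 0.5 + 0.25 = 1.75, and the shared ancestors C and root contribute 1.5 in total, giving 1.5 / 3.5.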

2.6.2. Calculation of the Gaussian Interaction Profile Kernel Similarity for Diseases (GIPSim)

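Based on the hypothesis that functionally similar genes are often associated with similar diseases, in this section, we adopt the Gaussian Interaction Profile Kernel to calculate the similarity of diseases according to the following steps.

Firstly, based on the networks $G_1$ and $G_2$ constructed above, for any given lncRNA $l_i$ and disease $d_j$, we define

$$Y_1(i, j) = \begin{cases} 1, & \text{if } l_i \text{ is associated with } d_j \text{ in } G_1, \\ 0, & \text{otherwise.} \end{cases}$$

Next, for any given miRNA $m_i$ and disease $d_j$, we define

$$Y_2(i, j) = \begin{cases} 1, & \text{if } m_i \text{ is associated with } d_j \text{ in } G_2, \\ 0, & \text{otherwise.} \end{cases}$$

Let $IP_1(d_i)$ denote the $i$th column of the matrix $Y_1$; then we can calculate the Gaussian Kernel Similarity between diseases $d_i$ and $d_j$ based on their interaction profiles as follows:

$$GIP_1(d_i, d_j) = \exp\left(-\gamma_{d_1} \left\| IP_1(d_i) - IP_1(d_j) \right\|^2\right), \qquad \gamma_{d_1} = 1 \Big/ \left( \frac{1}{n_{d_1}} \sum_{k=1}^{n_{d_1}} \left\| IP_1(d_k) \right\|^2 \right),$$

where the parameter $n_{d_1}$ denotes the number of different diseases in $G_1$.

In a similar way, let $IP_2(d_i)$ denote the $i$th column of the matrix $Y_2$; then we can calculate the Gaussian Kernel Similarity between diseases $d_i$ and $d_j$ based on their interaction profiles as follows:

$$GIP_2(d_i, d_j) = \exp\left(-\gamma_{d_2} \left\| IP_2(d_i) - IP_2(d_j) \right\|^2\right), \qquad \gamma_{d_2} = 1 \Big/ \left( \frac{1}{n_{d_2}} \sum_{k=1}^{n_{d_2}} \left\| IP_2(d_k) \right\|^2 \right).$$

Here, the parameter $n_{d_2}$ denotes the number of different diseases in $G_2$.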

Thereafter, based on these above formulas, we can calculate the Gaussian Interaction Profile Kernel Similarity between diseases di and dj as follows:
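The Gaussian Interaction Profile Kernel for a single association matrix can be sketched as follows. This assumes the standard form in which the bandwidth is the reciprocal of the mean squared norm of the interaction profiles; how the kernels from the two networks are merged is not covered here. The function name `gip_kernel` and the toy matrix are illustrative.

```python
import numpy as np


def gip_kernel(profiles):
    """Gaussian interaction profile kernel.

    `profiles` has one interaction profile per row, e.g. the transpose of
    Y1 when computing disease similarity from columns of Y1.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = 1.0 / mean_sq
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.clip(sq_dist, 0.0, None))


# Toy Y1 with 3 lncRNAs (rows) and 2 diseases (columns).
Y1 = np.array([[1, 0], [1, 1], [0, 1]])
disease_gip = gip_kernel(Y1.T)   # columns of Y1 are the disease profiles
lncrna_gip = gip_kernel(Y1)      # rows of Y1 are the lncRNA profiles
print(disease_gip)
```

The same function serves Sections 2.7.2 and 2.8.2, since row profiles of Y1 and Y2 give the lncRNA and miRNA kernels.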

2.7. Calculation of the Similarity of lncRNA Pairs
2.7.1. Calculation of the lncRNA Functional Similarity

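For any two given lncRNAs $l_i$ and $l_j$, let $D(l_i)$ be the set of all diseases related to $l_i$ in $G_1$ and $D(l_j)$ be the set of all diseases related to $l_j$ in $G_1$; then we can define the functional similarity between $l_i$ and $l_j$ as follows:

$$lncfunSim(l_i, l_j) = \frac{\sum_{d \in D(l_i)} S(d, D(l_j)) + \sum_{d \in D(l_j)} S(d, D(l_i))}{|D(l_i)| + |D(l_j)|},$$

where

$$S(d, D) = \max_{d' \in D} DisSim(d, d')$$

and $DisSim$ denotes the disease similarity obtained in Section 2.6.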
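A minimal sketch of this kind of functional similarity is given below, assuming the common best-match-average form: each disease of one RNA is matched with its most similar disease in the other RNA's set, and the matches are averaged. The `dis_sim` callable and the toy similarity values are hypothetical; the same function applies to miRNAs when their disease sets are taken from G2.

```python
def functional_similarity(diseases_a, diseases_b, dis_sim):
    """Best-match-average similarity between two disease sets."""
    if not diseases_a or not diseases_b:
        return 0.0

    def best(d, group):
        # Similarity of d to its closest disease in `group`.
        return max(dis_sim(d, g) for g in group)

    total = sum(best(d, diseases_b) for d in diseases_a)
    total += sum(best(d, diseases_a) for d in diseases_b)
    return total / (len(diseases_a) + len(diseases_b))


# Hypothetical pairwise disease similarities.
table = {
    frozenset(("d1", "d2")): 0.5,
    frozenset(("d1", "d3")): 0.2,
    frozenset(("d2", "d3")): 0.4,
}


def dis_sim(a, b):
    return 1.0 if a == b else table.get(frozenset((a, b)), 0.0)


score = functional_similarity({"d1", "d2"}, {"d2", "d3"}, dis_sim)
print(score)
```

Here the best matches are 0.5 and 1.0 for the first set and 1.0 and 0.4 for the second, so the score is 2.9 / 4.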

2.7.2. Calculation of the Gaussian Interaction Profile Kernel Similarity for lncRNAs

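For any two given lncRNAs $l_i$ and $l_j$, similar to the definition of formula (6), let $IP(l_i)$ and $IP(l_j)$ denote the $i$th and the $j$th row of the matrix $Y_1$, respectively; then we can calculate the Gaussian Kernel Similarity between lncRNAs $l_i$ and $l_j$ based on their interaction profiles as follows:

$$GIP(l_i, l_j) = \exp\left(-\gamma_l \left\| IP(l_i) - IP(l_j) \right\|^2\right), \qquad \gamma_l = 1 \Big/ \left( \frac{1}{n_l} \sum_{k=1}^{n_l} \left\| IP(l_k) \right\|^2 \right),$$

where $n_l$ denotes the number of different lncRNAs in $G_1$.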

Hence, based on these formulas given above, we can finally define the similarity measurement between lncRNAs li and lj as follows:

2.8. Calculation of the Similarity between miRNAs (miRSim)
2.8.1. Calculation of the miRNA Function Similarity (miRfunSim)

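For any two given miRNAs $m_i$ and $m_j$, let $D(m_i)$ be the set of all diseases related to $m_i$ in $G_2$ and $D(m_j)$ be the set of all diseases related to $m_j$ in $G_2$; then we can define the functional similarity between $m_i$ and $m_j$ as follows:

$$miRfunSim(m_i, m_j) = \frac{\sum_{d \in D(m_i)} S(d, D(m_j)) + \sum_{d \in D(m_j)} S(d, D(m_i))}{|D(m_i)| + |D(m_j)|},$$

where $S(d, D) = \max_{d' \in D} DisSim(d, d')$ and $DisSim$ denotes the disease similarity obtained in Section 2.6.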

2.8.2. Calculation of the Gaussian Interaction Profile Kernel Similarity for miRNAs

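For any two given miRNAs $m_i$ and $m_j$, in a similar way, let $IP(m_i)$ and $IP(m_j)$ represent the $i$th and $j$th row of the matrix $Y_2$, respectively; then we can calculate the Gaussian Kernel Similarity between miRNAs $m_i$ and $m_j$ based on their interaction profiles as follows:

$$GIP(m_i, m_j) = \exp\left(-\gamma_m \left\| IP(m_i) - IP(m_j) \right\|^2\right), \qquad \gamma_m = 1 \Big/ \left( \frac{1}{n_m} \sum_{k=1}^{n_m} \left\| IP(m_k) \right\|^2 \right),$$

where $n_m$ denotes the number of miRNAs in $G_2$.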

Hence, based on these formulas presented above, we can finally define the similarity measurement between miRNAs mi and mj as follows:

2.9. Weighted K Nearest Neighbor Profiles for Diseases, lncRNAs, and miRNAs (WKNNP)

Let $D = \{d_1, \ldots, d_{n_d}\}$, $L = \{l_1, \ldots, l_{n_l}\}$, and $M = \{m_1, \ldots, m_{n_m}\}$ denote the sets of diseases, lncRNAs, and miRNAs, respectively. Let $T(i,:,:)$ denote the $i$th horizontal slice of the tensor T along the disease axis; $T(i,:,:)$ therefore represents the interaction profile of the disease $d_i$. Let $T(:,j,:)$ denote the $j$th lateral slice of T along the lncRNA axis; $T(:,j,:)$ therefore represents the interaction profile of the lncRNA $l_j$. Let $T(:,:,p)$ denote the $p$th frontal slice of T along the miRNA axis; $T(:,:,p)$ therefore represents the interaction profile of the miRNA $m_p$. The interaction profile of any novel disease, lncRNA, or miRNA consists entirely of zeros, which may lead to unsatisfactory prediction performance when inferring potential associations between diseases and lncRNA-miRNA pairs. Hence, in this section, we construct new interaction profiles to address this problem. For each disease $d_i$, its $K$ nearest known diseases (each with at least one experimentally verified association) and their $K$ interaction profiles are used to obtain the following interaction profile:

$$T_d(i,:,:) = \frac{1}{Q_d} \sum_{t=1}^{K} w_t \, T(d_t,:,:),$$

where $d_1, \ldots, d_K$ are the diseases sorted in descending order of their similarity to $d_i$, $w_t$ is the weight coefficient, and $w_t = \eta^{t-1} \cdot DisSim(d_t, d_i)$, which means that a higher weight is assigned if $d_t$ is more similar to $d_i$. The parameter $\eta$ is a decay term with a value between 0 and 1. The parameter $Q_d$ is a normalization term, with $Q_d = \sum_{t=1}^{K} DisSim(d_t, d_i)$.

In the same manner, the new interaction profile for each lncRNA $l_j$ can be determined as follows:

$$T_l(:,j,:) = \frac{1}{Q_l} \sum_{t=1}^{K} w_t \, T(:,l_t,:),$$

where $l_1, \ldots, l_K$ are the lncRNAs sorted in descending order of their similarity to $l_j$, $w_t$ is the weight coefficient, and $w_t = \eta^{t-1} \cdot lncSim(l_t, l_j)$, which means that a higher weight is assigned if $l_t$ is more similar to $l_j$. The parameter $Q_l$ is a normalization term, with $Q_l = \sum_{t=1}^{K} lncSim(l_t, l_j)$.

Similarly, the new interaction profile for each miRNA $m_p$ can be determined as follows:

$$T_m(:,:,p) = \frac{1}{Q_m} \sum_{t=1}^{K} w_t \, T(:,:,m_t),$$

where $m_1, \ldots, m_K$ are the miRNAs sorted in descending order of their similarity to $m_p$, $w_t$ is the weight coefficient, and $w_t = \eta^{t-1} \cdot miRSim(m_t, m_p)$, which means that a higher weight is assigned if $m_t$ is more similar to $m_p$. The parameter $Q_m$ is a normalization term, with $Q_m = \sum_{t=1}^{K} miRSim(m_t, m_p)$.

Thereafter, after combining the above three kinds of tensors , and obtained from different data spaces and replacing with an associated likelihood score, we can update the original adjacency matrix T as follows:where .
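The WKNNP procedure can be sketched as below. Several choices here are assumptions rather than details confirmed by the text: neighbour weights take the form decay^(t-1) times similarity, the normalization term is the sum of the K similarities, the three mode-wise tensors are averaged with equal weights, and the merge with the original tensor is an elementwise maximum. The names `wknn_profiles`, `update_tensor`, and `eta` are illustrative.

```python
import numpy as np


def wknn_profiles(T, sim, axis, K=3, eta=0.1):
    """Weighted K-nearest-neighbour interaction profiles along one mode."""
    T = np.moveaxis(np.asarray(T, dtype=float), axis, 0)
    sim = np.asarray(sim, dtype=float)
    # Entities with at least one known association can act as neighbours.
    known = np.flatnonzero(T.reshape(T.shape[0], -1).sum(axis=1) > 0)
    out = np.zeros_like(T)
    for i in range(T.shape[0]):
        cand = known[known != i]
        order = cand[np.argsort(-sim[i, cand], kind="stable")][:K]
        if order.size == 0:
            continue
        weights = eta ** np.arange(order.size) * sim[i, order]
        norm = sim[i, order].sum()
        if norm > 0:
            out[i] = np.tensordot(weights, T[order], axes=1) / norm
    return np.moveaxis(out, 0, axis)


def update_tensor(T, sims, K=3, eta=0.1):
    """Average the three mode-wise WKNN tensors and merge by elementwise max."""
    T = np.asarray(T, dtype=float)
    parts = [wknn_profiles(T, s, ax, K, eta) for ax, s in enumerate(sims)]
    return np.maximum(T, sum(parts) / len(parts))


# Toy tensor: 2 diseases, 1 lncRNA, 1 miRNA; disease 1 has no known association.
T = np.zeros((2, 1, 1))
T[0, 0, 0] = 1.0
sims = [np.array([[1.0, 0.8], [0.8, 1.0]]), np.array([[1.0]]), np.array([[1.0]])]
T_new = update_tensor(T, sims)
print(T_new[:, 0, 0])
```

In the toy case, the empty profile of the second disease is filled from its only neighbour, so its entry becomes nonzero while the known entry stays at 1.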

2.10. PADLMHOOI

Inspired by the successful application of tensor decomposition in link prediction and of nonnegative matrix decomposition methods in inferring disease-miRNA associations, in this section, we propose a novel model called PADLMHOOI to predict new associations between diseases and miRNA-lncRNA pairs. As described above, a tensor is a multidimensional array. Currently, the most commonly used tensor decomposition techniques include Tucker decomposition [59], HOSVD [60], and HOOI [61]. In this section, we perform Tucker decomposition on the tensor T constructed above. Assuming $T \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, the Tucker decomposition aims at finding factor matrices $Z_n \in \mathbb{R}^{I_n \times R_n}$ ($n = 1, 2, 3$) and a core tensor $G \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ that solve the following optimization problem:

$$\min_{G, Z_1, Z_2, Z_3} \left\| T - G \times_1 Z_1 \times_2 Z_2 \times_3 Z_3 \right\|_F^2$$

Hence, formula (21) can be rewritten in the following simpler form:

$$\min_{G, Z_1, Z_2, Z_3} \left\| T - [\![G; Z_1, Z_2, Z_3]\!] \right\|_F^2,$$

where $Z_1$, $Z_2$, and $Z_3$ are the factor matrices, which are usually orthogonal and can be regarded as the principal components of each mode. $R_1$, $R_2$, and $R_3$ are the numbers of columns in the factor matrices $Z_1$, $Z_2$, and $Z_3$, respectively. The notation $\times_n$ denotes the n-mode product, and $[\![G; Z_1, Z_2, Z_3]\!]$ is the shorthand introduced by Kolda and Gibson [62] (Supplementary File A).
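For readers unfamiliar with the n-mode product, the following sketch shows how a Tucker model is turned back into a full tensor by applying one factor matrix per mode. The helper names `mode_product` and `tucker_reconstruct` and the random toy shapes are illustrative; the result is checked against an explicit `einsum` contraction.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply every mode-`mode` fibre of `tensor` by `matrix`."""
    moved = np.moveaxis(tensor, mode, 0)               # bring the mode to the front
    result = np.tensordot(matrix, moved, axes=(1, 0))  # contract over that mode
    return np.moveaxis(result, 0, mode)                # restore the axis order


def tucker_reconstruct(core, factors):
    """Apply one factor matrix per mode to the core tensor."""
    out = core
    for mode, Z in enumerate(factors):
        out = mode_product(out, Z, mode)
    return out


rng = np.random.default_rng(0)
core = rng.random((2, 3, 2))                       # R1 x R2 x R3
factors = [rng.random((4, 2)), rng.random((5, 3)), rng.random((6, 2))]
full = tucker_reconstruct(core, factors)           # I1 x I2 x I3
reference = np.einsum("abc,ia,jb,kc->ijk", core, *factors)
print(full.shape, np.allclose(full, reference))
```

Each factor matrix has shape (I_n, R_n), so the reconstructed tensor has the shape of the original disease-lncRNA-miRNA tensor.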

Based on equation (22), the above optimization problem can be solved according to the following steps:

Considering that the derivation forms of Z1, Z2, and Z3 are similar, we will only derive the iterative formula of Z1 as an example. Firstly, as illustrated in formula (23), the objective function given in formula (22) can be rewritten as a matrix form of T along the first dimension:where is the unfolding of T along the first dimension (Supplementary File A). Assuming that the optimal solution Z1 satisfies all the constraints in equation (22), we havewhere denotes the Kronecker product, and moreover, we have

Hence, formula (24) can be regarded as a nonnegative matrix factorization (NMF) form [63]. Then, we can finally obtain the solution of Z1 by updating NMF as follows:

Hence, we can finally obtain the factor matrices Z2 and Z3 in a similar way. Thereafter, while fixing the factor matrices Z1, Z2, and Z3, the objective function in formula (22) can be converted to the following form:where denotes the vectorization of the tensor. And moreover, based on formula (27), the following linear equation can be obtained:

Let , then obviously, formula (28) can also be regarded as a NMF, and thereafter, the core tensor in formula (28) can be obtained as follows [63]:

Based on the above formulas, the pseudocode of our prediction model PADLMHOOI based on tensor decomposition can be described as follows:

Step 1. Input: T, R1, R2, R3, Z1, Z2, Z3, G, and the convergence threshold.

Step 2. Repeat
    For i = 1 to 3:
        Update Zi according to formula (26)
    End For
    Update G according to formula (30)
Until the convergence threshold is met

Step 3. Return Z1, Z2, Z3, G

Following the above steps, we obtain the final predicted disease-lncRNA-miRNA association tensor $\hat{T} = G \times_1 Z_1 \times_2 Z_2 \times_3 Z_3$. After the lncRNA-miRNA pairs of each disease are prioritized according to the entries of $\hat{T}$, the top-ranked lncRNA-miRNA pairs can be regarded as the most likely to be related to the corresponding disease.
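The alternating update loop can be sketched as follows. This is not necessarily the exact update of formulas (26) and (30); it assumes the standard multiplicative updates for nonnegative Tucker decomposition, random nonnegative initialization, and a stopping rule based on the change in reconstruction error. All function names and default values are illustrative.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """n-mode product of `tensor` with `matrix` along `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)


def unfold(tensor, mode):
    """Mode-`mode` matricization of `tensor`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def multi_product(tensor, matrices, skip=None):
    """Apply matrices[n] along mode n for every n except `skip`."""
    for mode, M in enumerate(matrices):
        if mode != skip:
            tensor = mode_product(tensor, M, mode)
    return tensor


def nonneg_tucker(T, ranks, n_iter=200, tol=1e-6, eps=1e-12, seed=0):
    """Nonnegative Tucker decomposition with multiplicative updates (sketch)."""
    T = np.asarray(T, dtype=float)
    rng = np.random.default_rng(seed)
    Z = [rng.random((dim, r)) for dim, r in zip(T.shape, ranks)]
    G = rng.random(tuple(ranks))
    prev = np.inf
    for _ in range(n_iter):
        for n in range(T.ndim):
            Gn = unfold(G, n)
            num = unfold(multi_product(T, [z.T for z in Z], skip=n), n) @ Gn.T
            gram = unfold(multi_product(G, [z.T @ z for z in Z], skip=n), n) @ Gn.T
            Z[n] *= num / (Z[n] @ gram + eps)
        num_g = multi_product(T, [z.T for z in Z])
        den_g = multi_product(G, [z.T @ z for z in Z])
        G *= num_g / (den_g + eps)
        err = np.linalg.norm(T - multi_product(G, Z))
        if abs(prev - err) < tol:   # stop once the error no longer changes
            break
        prev = err
    return G, Z


# Toy run on a random binary tensor with rank 5 in every mode.
rng = np.random.default_rng(1)
T_toy = (rng.random((6, 4, 5)) > 0.7).astype(float)
G, Z = nonneg_tucker(T_toy, (5, 4, 5))
scores = multi_product(G, Z)                    # predicted association tensor
order = np.argsort(-scores[0], axis=None)       # rank pairs for disease 0
top = [tuple(int(x) for x in np.unravel_index(i, scores[0].shape)) for i in order[:3]]
print(top)
```

The final lines mirror the ranking step: for a chosen disease, the (lncRNA, miRNA) index pairs are sorted by their predicted scores in descending order.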

3. Results and Analysis

3.1. Leave-One-Out Cross-Validation (LOOCV)

To estimate the prediction performance of our newly proposed model, global leave-one-out cross-validation (LOOCV), 2-fold cross-validation (2-fold CV), and 10-fold cross-validation (10-fold CV) were implemented on PADLMHOOI. In K-fold cross-validation, the initial sample is divided into K subsample sets; a single subsample set is retained as validation data, while the other K − 1 subsample sets are used to train the model. The cross-validation is performed K times so that each subsample set is used for validation exactly once, and the K results are averaged to obtain a single estimate. Moreover, to reduce the performance deviation caused by random sample partitioning, we repeated the random partitioning 100 times and then obtained the ROC curve and the AUC value in the same way as in LOOCV. As shown in Table 1, PADLMHOOI achieves reliable AUCs of 0.9545, 0.9730 ± 0.0119, and 0.9626 ± 0.0150 in the frameworks of global LOOCV, 2-fold CV, and 10-fold CV, respectively. Additionally, to further estimate the prediction performance of PADLMHOOI, we implemented it under the framework of local LOOCV, and the simulation results for 50 predicted related diseases are given in Supplementary Table 4.
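The evaluation protocol can be sketched with two small helpers: one that yields K-fold partitions of the known associations (LOOCV corresponds to K equal to the number of samples) and one that computes the AUC as the probability that a held-out positive is scored above a candidate negative. The function names and toy scores are illustrative; the sketch does not reproduce the exact global or local LOOCV ranking scheme.

```python
import random


def k_fold_splits(samples, k, seed=0):
    """Yield (train, held_out) lists for each of the k folds."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, held_out


def auc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


splits = list(k_fold_splits(range(10), k=5))
print(len(splits), auc([0.9, 0.8], [0.1, 0.85]))
```

Repeating `k_fold_splits` with different seeds and averaging the resulting AUC values gives a mean and standard deviation over random partitions.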

3.2. Performance Comparison with Other Methods

To the best of our knowledge, PADLMP [39] is so far the only model proposed for predicting potential associations between diseases and lncRNA-miRNA pairs in which the three kinds of nodes (disease nodes, lncRNA nodes, and miRNA nodes) are considered simultaneously to construct a triple network. The major difference between PADLMP and our model PADLMHOOI is that PADLMP is based on link prediction, whereas PADLMHOOI is based on tensor decomposition. Therefore, to compare the two models, we implemented LOOCV on both of them using the 3047 known disease-lncRNA-miRNA associations obtained above. In the first experiment, we set the parameters of PADLMP to their best values; specifically, the step size K is set to 2 and the attenuation coefficient is set to 0.01. Meanwhile, for convenience, we set the parameters of PADLMHOOI as follows: the parameters a1, a2, and a3 in formula (20) are all set to 1, the parameters r1, r2, and r3 in formula (21) are all set to 5, and the parameter K and the decay term in formulas (17)–(19) are set to 3 and 0.1, respectively. As illustrated in Figure 2, PADLMHOOI and PADLMP achieve AUCs of 0.9545 and 0.9318, respectively, which demonstrates that the prediction performance of PADLMHOOI is superior to that of PADLMP.

Since some of the databases have been updated, to further evaluate PADLMHOOI, we collected the latest disease-lncRNA associations from the databases Lnc2Cancer v2.0, LncRNADisease 2.0 [64], and MNDR v2.0 [48], the latest disease-miRNA associations from the database HMDD v3.0, and the latest lncRNA-miRNA associations from the database RAID v2.0 [65]. We then reconstructed the triple network from these newly collected datasets. In the newly constructed triple network, the numbers of disease nodes, lncRNA nodes, and miRNA nodes are 42, 234, and 251, respectively; the number of known associations between diseases and lncRNA-miRNA pairs is 3,768; the number of known associations between diseases and lncRNAs is 733; and the number of known associations between diseases and miRNAs is 674. Based on the new triple network, we compared PADLMHOOI with PADLMP once more. In this second experiment, we set the parameter K and the decay term of PADLMHOOI to 10 and 0.5, respectively, and kept the other parameters unchanged from the first experiment. As illustrated in Figure 3, PADLMHOOI and PADLMP achieve AUCs of 0.9026 and 0.9013, respectively, which indicates that PADLMHOOI performs slightly better than PADLMP on the updated datasets.

Additionally, an interesting point is that our model can also infer potential disease-lncRNA associations and disease-miRNA associations while predicting potential associations between diseases and lncRNA-miRNA pairs. Hence, it is reasonable to compare PADLMHOOI with prediction models designed for inferring potential disease-lncRNA or disease-miRNA associations. Therefore, in this section, we compare PADLMHOOI with several state-of-the-art computational prediction models, namely LRLSLDA [26], NBCLDA [25], WBSMDA [66], and RLSMDA [67]. Among them, LRLSLDA is a semisupervised learning-based prediction model for inferring potential lncRNA-disease associations; NBCLDA is a probabilistic model for predicting potential associations between diseases and lncRNAs; WBSMDA is a prediction model for potential associations between diseases and miRNAs; and RLSMDA predicts disease-related miRNAs based on the framework of regularized least squares. In the comparison with LRLSLDA, the known disease-lncRNA associations were obtained from the triple disease-lncRNA-miRNA network, and the parameters of LRLSLDA were set to the values given in the literature. In the comparison with NBCLDA, four kinds of nodes (diseases, lncRNAs, miRNAs, and genes) are included in NBCLDA, whereas only three kinds of nodes (diseases, lncRNAs, and miRNAs) are included in PADLMHOOI. Hence, for the sake of fairness, we only compared PADLMHOOI with the submethod NBCLDA-GN1-SD. As illustrated in Figure 4, PADLMHOOI, NBCLDA-G1-SD, and LRLSLDA achieve AUCs of 0.9568, 0.7928, and 0.5924, respectively, which demonstrates that PADLMHOOI clearly outperforms both NBCLDA-G1-SD and LRLSLDA. In the comparison with WBSMDA and RLSMDA, 674 known disease-miRNA associations were obtained from the triple disease-lncRNA-miRNA network, and the parameters of both WBSMDA and RLSMDA were set to the values given in the literature. As illustrated in Figure 5, PADLMHOOI, WBSMDA, and RLSMDA achieve AUCs of 0.9157, 0.8544, and 0.8991, respectively, which demonstrates that PADLMHOOI also outperforms both WBSMDA and RLSMDA.

3.3. Recall Ratio Analysis

In this section, to further evaluate the prediction performance of PADLMHOOI, we compared the recall of PADLMHOOI with that of other state-of-the-art models. A higher recall over all selected diseases in a top-k ranking list means that more positive testing samples (real disease-related lncRNA-miRNA pairs) have been identified successfully. Figure 6 illustrates the recall of all selected diseases in different top-k ranking lists. Moreover, Supplementary Table 5 lists the recall for the diseases associated with at least 80 verified lncRNA-miRNA associations.
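The recall in a top-k list for a single disease can be computed as in the following sketch, which assumes that recall is the fraction of the verified pairs of that disease found among the first k ranked candidates. The function name and the toy pairs are illustrative.

```python
def recall_at_k(ranked_pairs, true_pairs, k):
    """Fraction of verified (lncRNA, miRNA) pairs found in the top-k list."""
    true_pairs = set(true_pairs)
    if not true_pairs:
        raise ValueError("no verified pairs for this disease")
    hits = sum(1 for pair in ranked_pairs[:k] if pair in true_pairs)
    return hits / len(true_pairs)


# Candidates sorted by predicted score, best first (toy data).
ranked = [("H19", "miR-675"), ("XIST", "miR-152"), ("NEAT1", "miR-449a")]
verified = {("H19", "miR-675"), ("NEAT1", "miR-449a")}
print([recall_at_k(ranked, verified, k) for k in (1, 2, 3)])
```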

3.4. Case Studies

In this section, case studies of breast neoplasms, colon neoplasms, and prostate neoplasms were conducted to further verify the capability of PADLMHOOI to detect novel associations between diseases and lncRNA-miRNA pairs. Among these three diseases, breast cancer is the second leading cause of female cancer death and comprises 22% of all cancers in women [68, 69]. The related literature has suggested that lncRNAs and miRNAs play an important role in the formation of many diseases and that the formation of breast cancer may be especially relevant to them [70, 71]. Predicting breast cancer-associated lncRNA-miRNA pairs and identifying lncRNAs and miRNAs as biomarkers may make a significant contribution to better diagnosis and treatment of breast cancer [71]. In Supplementary Table 6, we list the top 30 candidate lncRNA-miRNA pairs related to breast cancer. In this table, the columns lncRID and miRID denote the lncRNA ID and the miRNA ID, respectively. Evi1 and Evi2 denote the authoritative databases or published literature that contain verified disease-lncRNA and disease-miRNA associations, respectively. "#" and "∗" stand for the databases lncRNADisease and MNDR v2.0, respectively, which consist of known disease-lncRNA associations or contain published literature supporting the association between the predicted lncRNAs and breast cancer. "!," "&," and "+" stand for the databases HMDD, miR2Disease, and miRCancer, respectively, which consist of known disease-miRNA associations or contain published literature supporting the association between the predicted miRNAs and breast cancer. "Nan" indicates that no database or published literature supports the predicted result. From Supplementary Table 6, it can be seen that all candidate disease-lncRNA associations have been verified by lncRNADisease and MNDR v2.0 or by the publications collected in these databases. In addition, 42 out of 50 candidate disease-miRNA associations have been reported by HMDD, miR2Disease, and miRCancer or by the publications collected in these databases. Moreover, we found that the novel miRNAs with miRID 35, 51, 73, 164, and 186 are related to some important factors affecting the development of breast neoplasms. Hence, we infer that these lncRNA-miRNA pairs may be associated with breast cancer.

In addition, colonic tumors are a type of malignancy that is common at the junction of the rectum and sigmoid colon [72]. Early colon cancer is difficult to detect because its symptoms are not obvious [73]. Unfortunately, the literature reports that its incidence has been rising in recent years [74]. Therefore, predicting potential miRNAs and lncRNAs associated with colon tumors is of great significance for the diagnosis of early colon cancer. Supplementary Table 7 lists the top 30 candidate lncRNA-miRNA pairs predicted to be associated with colon tumors. All of these candidate lncRNAs and most of these candidate miRNAs have been verified by the lncRNADisease database and MNDR v2.0, respectively.

Moreover, prostate neoplasm is one of the most common cancers in white and African-American men, and it is reported that about one in six white men and one in five African-American men develop prostate cancer in their lifetime. Recent research has shown that prostate neoplasm is caused by the malignancy of prostate epithelial cells [75] and that its formation involves many factors such as age, family history, and race [76]. In particular, some miRNAs such as hsa-let-7a-5p and lncRNAs such as XIST have been found to be involved in the formation of prostate neoplasms. Hence, it is of interest to infer potential miRNAs and lncRNAs associated with prostate neoplasms. Supplementary Table 8 lists the top 30 prostate neoplasm-related candidate lncRNA-miRNA pairs. All of these candidate lncRNAs and most of these candidate miRNAs have been verified by lncRNADisease and MNDR v2.0, respectively.

3.5. Parameter Sensitivity Analysis

Some key parameters, such as K and , may significantly affect the performance of our prediction model PADLMHOOI, so in this section we further estimate their effects on its prediction performance. First, we varied K from 1 to 10 during simulation, and Table 2 illustrates the impact of K on the performance of PADLMHOOI. Table 2 shows that PADLMHOOI achieves its maximum AUC value of 0.9708 when K = 8. As for the impact of the parameter , considering the time costs, we set K = 3 and varied it from 0.1 to 0.9 during simulation, and Table 3 illustrates the impact of this parameter on the performance of PADLMHOOI. Table 3 shows that PADLMHOOI achieves its maximum AUC value of 0.9591 when this parameter is set to 0.7.
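A sensitivity analysis of this kind can be sketched as a one-dimensional sweep: each candidate value of a parameter is evaluated, and the value with the highest AUC is retained. The minimal Python sketch below assumes that AUC is computed from ranked prediction scores and binary labels, and that a caller-supplied function evaluate(value) runs the model and returns its AUC. The names auc, sweep, and evaluate are hypothetical and do not come from the PADLMHOOI code.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a positive sample outscores a
    negative one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sweep(values, evaluate):
    """Evaluate every candidate value and return (best_value, best_score, results).

    evaluate is a hypothetical callback that trains and tests the model for
    one parameter value and returns its AUC.
    """
    results = {value: evaluate(value) for value in values}
    best = max(results, key=results.get)
    return best, results[best], results
```

For example, sweep(range(1, 11), evaluate) would cover the range of K used here. Sweeping one parameter while the other is held fixed, as done above to limit time costs, finds the best value along that axis only and does not guarantee a jointly optimal pair.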

4. Discussion and Conclusion

Research on predicting potential associations between lncRNA-miRNA pairs and diseases is not only helpful for understanding disease mechanisms at the lncRNA and miRNA levels but also plays an important role in the detection of disease biomarkers and in diagnosis, prognosis, and prevention. However, to our knowledge, although many studies have demonstrated that lncRNA-miRNA interactions are associated with the development of complex diseases, few models have so far been proposed for large-scale prediction of potential associations between diseases and lncRNA-miRNA pairs. Since traditional biological experiments are expensive and time-consuming, in this paper we first constructed a third-order tensor T by adopting the WKNNP method, based on the existing disease-miRNA associations, disease-lncRNA associations, and lncRNA-miRNA interactions and on the assumption that genes with similar functions are often associated with similar diseases. Then, based on tensor factorization, we proposed a prediction model called PADLMHOOI to infer potential relations between diseases and lncRNA-miRNA pairs. Simulation results under the frameworks of global and local LOOCV, 2-fold CV, and 10-fold CV all confirmed the superiority of PADLMHOOI. Moreover, case studies of breast neoplasms, colon neoplasms, and prostate neoplasms further demonstrated that PADLMHOOI is an effective method for predicting potential disease-associated lncRNA-miRNA pairs. PADLMHOOI still has some limitations. For example, although a large number of datasets have been integrated in PADLMHOOI, the amount of available data is still insufficient, and its prediction performance is expected to improve if more datasets can be collected. In addition, in this paper we only predicted associations between a disease and a single lncRNA-miRNA pair. In the future, we will further modify PADLMHOOI to predict potential associations between diseases and multiple lncRNA-miRNA pairs.
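The tensor-factorization step summarized above can be illustrated with a generic Tucker decomposition computed by higher-order orthogonal iteration (HOOI). The following minimal NumPy sketch shows the textbook procedure under stated assumptions: it is not the exact objective function or update rule of PADLMHOOI (those are given in Supplementary File 1), and the function names unfold, mode_product, hooi, and reconstruct, as well as the ranks and iteration count, are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hooi(tensor, ranks, n_iter=20):
    """Generic Tucker decomposition by higher-order orthogonal iteration.

    Returns a core tensor of shape `ranks` and one orthonormal factor
    matrix per mode. Each rank is assumed not to exceed the product of
    the other ranks.
    """
    # Initialise each factor with the leading left singular vectors (HOSVD).
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])

    for _ in range(n_iter):
        for mode, rank in enumerate(ranks):
            # Project the tensor onto all factors except the one being updated.
            projected = tensor
            for other, factor in enumerate(factors):
                if other != mode:
                    projected = mode_product(projected, factor.T, other)
            u, _, _ = np.linalg.svd(unfold(projected, mode), full_matrices=False)
            factors[mode] = u[:, :rank]

    # The core is the tensor projected onto every factor.
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the low-rank approximation from the core and factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out
```

In a setting like the one described here, the entries of the reconstructed tensor would serve as prediction scores for (lncRNA, miRNA, disease) triples; the ordering of the three modes is likewise an assumption of this sketch.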

Abbreviations

PADLMHOOI: Prediction of potential associations between diseases and lncRNA-miRNA pairs based on the higher-order orthogonal iteration.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

The project is partly sponsored by the National Natural Science Foundation of China (Nos. 61873221 and 61672447), the Natural Science Foundation of Hunan Province (Nos. 2018JJ4058 and 2017JJ5036), and the CERNET Next Generation Internet Technology Innovation Project (Nos. NGII20160305 and NGII20170109).

Supplementary Materials

Supplementary 1. File 1: introduction to tensor and optimization of objective function and update of factor matrix and core tensor.

Supplementary 2. Table 1: 2048 known disease-lncRNA associations.

Supplementary 3. Table 2: 4041 known disease-miRNA associations.

Supplementary 4. Table 3: 20324 known lncRNA-miRNA associations.

Supplementary 5. Table 4: AUC score for 50 diseases in the framework of local LOOCV.

Supplementary 6. Table 5: recall ratio of some important diseases.

Supplementary 7. Table 6: the candidate lncRNA-miRNA pairs associated with breast cancer. In addition, the LncRNADisease and MNDR v2.0 databases have confirmed that these lncRNAs or miRNAs are associated with breast cancer.

Supplementary 8. Table 7: the candidate lncRNA-miRNA pairs associated with colon cancer. In addition, the LncRNADisease and MNDR v2.0 databases have confirmed that these lncRNAs or miRNAs are associated with colon cancer.

Supplementary 9. Table 8: the candidate lncRNA-miRNA pairs associated with prostate cancer. In addition, the LncRNADisease and MNDR v2.0 databases have confirmed that these lncRNAs or miRNAs are associated with prostate cancer.