Abstract

Coupled matrix and tensor factorizations have been successfully used in many data fusion scenarios where datasets are assumed to be exactly coupled. However, in the real world, not all datasets share the same factor matrices, which makes joint analysis of multiple heterogeneous sources challenging. For this reason, approximate coupling or partial coupling is widely used in real-world data fusion, with exact coupling as a special case of these techniques. To address this challenge, we propose two improved coupled tensor factorization methods: one for approximately coupled datasets and the other for partially coupled datasets. A series of experiments on both simulated data and three real-world datasets demonstrates the improved accuracy of these approaches over existing baselines. In particular, in experiments on MRI data, our method improves accuracy by as much as 12.47% compared with traditional methods.

1. Introduction

With the rapid development of cyber-physical systems, a soaring amount of data from heterogeneous sources is now easily accessible. Analysing data from multiple sources has been shown to enhance knowledge discovery by capturing underlying structures that are otherwise difficult to extract. For instance, recommendation systems can rely not only on past user ratings but also on additional information for joint analysis, such as the supply chain surrounding a product or the similarity between users [1-3]. Drawing upon such related information can improve recommendation performance. In metabolomics, where analytical techniques such as LC-MS (liquid chromatography-mass spectrometry) and NMR (nuclear magnetic resonance) are used to study biological fluids, joint analysis helps to accurately identify the various component chemicals [4, 5]. Electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) are complementary modalities and, when jointly analyzed, can provide “the best of both worlds,” i.e., EEG’s superior time resolution and fMRI’s superior spatial resolution. Hence, fusing models can, for example, provide deeper insights into the activities of the brain or help improve medical treatments for nervous system diseases [6-8].

A common and effective way to deal with multisource data is to represent them as matrices and then use collective matrix factorization (CMF) [9] for joint analysis. Matrix-based joint analysis is used extensively in many fields, including bioinformatics [10, 11], social network analysis [12, 13], and signal processing [14, 15]. However, this type of analysis only works with two-dimensional data and cannot be applied to datasets of three or more dimensions. Meanwhile, recent developments in sensor technology allow more and different aspects of data to be captured, and higher-order tensors have become an important tool for representing these multidimensional datasets. Accordingly, tensor decomposition was introduced to accurately extract the correlations between different dimensions, and extensions to joint decomposition for heterogeneous and high-dimensional datasets have naturally followed.

Applying coupled higher-order tensors and matrices to heterogeneous datasets from multiple sources has been a topic of interest in many areas, such as metabolomics [16], blind source separation [17], recommendation systems [18, 19], link prediction [20], and brain imaging [21, 22]. The various problems to be solved with this technique are called coupled matrix and tensor factorization (CMTF) problems. Acar et al. proposed an all-at-once optimization approach, called CMTF-OPT [23], which is based on gradients. The advanced version of CMTF-OPT, ACMTF-OPT [16], places additional constraints on the CMTF model to force good behavior when distinguishing between shared and unshared data components. Many researchers have subsequently made improvements to CMTF to allow for joint analysis on large-scale data [24, 25], increase the speed of calculation on large-scale data, and provide for situations with data sparsity. As a result, through the joint decomposition of high-order tensors and matrices, CMTF can extract shared and hidden patterns from most heterogeneous datasets and construct those patterns into a factor matrix. However, multisource datasets hold unique forms of shared relationships, including approximately shared or partially shared data. Models that are solely designed for an exactly shared factor matrix may not be suitable. For example, traffic flow data from upstream and downstream highways are clearly related to each other but may not be exactly coupled. There are many other examples, such as MRI images from the same patient or continuous tensor data streams that hold their own internal relationships. For these types of real-life data fusion tasks, joint analysis is crucial [26, 27].

In this paper, we focus on joint data analysis with datasets that are partially or approximately coupled [28]. We propose two improved coupled tensor factorization methods: one for partially coupled datasets, called CTF-PSF, and one for approximately coupled datasets, called CTF-AC.

A summary of our contributions follows.
(i) The proposed CTF-AC method is the very first tensor factorization model to address data fusion with approximately coupled datasets. This model is also suitable for multisource datasets that are not coupled but where the data are highly correlated.
(ii) By combining individual decomposition and coupled decomposition, a new coupled tensor factorization method called CTF-PSF emerges. This method handles data fusion with partially coupled datasets.
(iii) Extensive experiments on synthetic and real-world datasets verify that the two proposed methods generate more accurate results than the traditional methods.

The rest of this paper is organized as follows. Section 2 introduces some background knowledge on tensor decomposition and provides the problem definition. The details of how our approaches work are introduced in Section 3. Section 4 describes the experimental design of this paper and numerical experiments to illustrate the advantages of the proposed methods. Finally, we conclude our work and discuss future research directions in Section 5.

2. Preliminaries and Problem Definition

Following the notations in [29], vectors (tensors of order one) are denoted in boldface lowercase letters, e.g., $\mathbf{a}$. Matrices (tensors of order two) appear as boldface capital letters, e.g., $\mathbf{A}$. The $r$th column of $\mathbf{A}$ is denoted as $\mathbf{a}_r$. $\mathbf{A}^{(n)}$ indicates the $n$th matrix in a sequence. For example, $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}$ represent a sequence of matrices. The transpose of matrix $\mathbf{A}$ is denoted by $\mathbf{A}^{\top}$. Higher-order tensors (third-order or higher) appear as boldface Euler script letters, e.g., $\mathcal{X}$. $\mathbf{X}_{(n)}$ indicates the mode-$n$ matricization of an $N$-order tensor $\mathcal{X}$, which can be obtained by permuting the dimensions of $\mathcal{X}$ and reshaping the permuted tensor into a matrix. $\|\mathbf{x}\|$ and $\|\mathcal{X}\|$ denote the two-norm of $\mathbf{x}$ and the Frobenius norm of $\mathcal{X}$, respectively. The Hadamard product is indicated by $\ast$. Table 1 lists all the symbols used in this paper.
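The permute-and-reshape description of mode-n matricization can be illustrated with a minimal NumPy sketch. The function name `unfold` and the column ordering (earlier modes varying fastest, as in the common convention of [29]) are assumptions for illustration, not definitions taken from this paper.

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization (illustrative helper, name is hypothetical).

    Mode n is moved to the front, then the remaining modes are flattened
    in column-major order so that earlier modes vary fastest.
    """
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

# Small third-order tensor of size 2 x 3 x 4 with distinct entries.
X = np.arange(24, dtype=float).reshape(2, 3, 4)
X0, X1, X2 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
```

Each unfolding keeps all entries of the tensor, so its Frobenius norm equals that of the original tensor.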

2.1. CANDECOMP/PARAFAC Decomposition

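CANDECOMP/PARAFAC (CP) is one of the most popular tensor decompositions. The goal of CP decomposition is to factorize a tensor into a sum of rank-one tensors. For instance, given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, after CP decomposition, $\mathcal{X}$ can be approximately represented as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r, \tag{1}$$

where $\mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$ is a rank-one tensor and the symbol “$\circ$” represents the vector outer product operator [29]. $R$ is a positive integer, which means it approximates $\mathcal{X}$ with $R$ rank-one tensors. This CP model can be concisely described by

$$\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!], \tag{2}$$

where $[\![\cdot]\!]$ denotes the CP decomposition operator [30]. In this CP decomposition, $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are the factor matrices of $\mathcal{X}$, which represent a combination of the vectors from the rank-one components in Figure 1; i.e., $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_R]$. Later, Acar et al. improved the CP algorithm and developed an algorithm named CP-WOPT [31], which uses a first-order optimization method to solve the weighted least squares problem.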
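To make the sum-of-rank-one-tensors view concrete, the following NumPy sketch rebuilds a third-order tensor from three factor matrices in two ways: with a single `einsum` call and with an explicit loop over rank-one outer products. The sizes and the helper name `cp_reconstruct` are illustrative assumptions.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Tensor whose (i, j, k) entry is sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 2, 2  # illustrative sizes and rank
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

X_hat = cp_reconstruct(A, B, C)

# The same tensor, built one rank-one component at a time.
X_loop = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
```

The two constructions agree up to floating-point rounding, which is a quick sanity check when implementing a CP model by hand.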

2.2. Coupled Tensor Factorization

Coupled factorization methods have become an effective means for jointly analyzing multisource datasets. The simplest form of coupled tensor factorization is collective matrix factorization (CMF). For example, in a movie recommendation system, additional information about the movie, such as the movie genre, its actors, or the user’s social network, in addition to the user’s historical ratings, could be used to improve the accuracy of rating predictions. For example, a user rating matrix for the movie can be expressed as matrix , which represents , coupled with matrix , which represents . This CMF model can be defined aswhere , , and are the factor matrices.

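As shown in Figure 2, a high-order extension of CMF, i.e., a CMTF model, can be simply defined as

$$f(\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{V}) = \|\mathcal{X} - [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]\|^2 + \|\mathbf{Y} - \mathbf{A}\mathbf{V}^{\top}\|^2, \tag{4}$$

where $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{V}$ are the factor matrices. In this problem, CMTF-OPT is used to vectorize all the factor matrices and their partial derivatives so the problem can be solved by any gradient-based optimization algorithm, such as the nonlinear conjugate gradient (NCG) method. More details can be found in [23].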
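As a rough illustration of the coupled model, the sketch below evaluates a CMTF-style objective for a tensor and a matrix that share their first-mode factor matrix. It assumes the plain sum of squared Frobenius errors without any scaling constant; the function and variable names are hypothetical and this is not the CMTF-OPT implementation itself.

```python
import numpy as np

def cp_full(A, B, C):
    """Third-order tensor built from CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cmtf_objective(X, Y, A, B, C, V):
    """Squared fitting error of the tensor plus that of the coupled matrix.

    The factor matrix A is shared: it appears in both the CP model of X
    and the matrix factorization of Y.
    """
    tensor_err = np.sum((X - cp_full(A, B, C)) ** 2)
    matrix_err = np.sum((Y - A @ V.T) ** 2)
    return tensor_err + matrix_err

rng = np.random.default_rng(0)
I, J, K, M, R = 5, 4, 3, 6, 2  # illustrative sizes
A, B, C, V = (rng.standard_normal((n, R)) for n in (I, J, K, M))
X, Y = cp_full(A, B, C), A @ V.T  # exactly coupled, noise-free data

f_exact = cmtf_objective(X, Y, A, B, C, V)
f_perturbed = cmtf_objective(X, Y, A + 0.1, B, C, V)
```

With the true factors the objective is zero, and perturbing the shared factor raises both error terms at once, which is what lets the matrix inform the tensor reconstruction.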

2.3. Problem Definition

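Consider a coupled tensor factorization of two third-order tensors $\mathcal{X}$ and $\mathcal{Y}$ that are coupled in the first dimension. $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are the factor matrices of $\mathcal{X}$, and $\mathbf{A}$, $\mathbf{D}$, and $\mathbf{E}$ are the factor matrices of $\mathcal{Y}$. To jointly factorize $\mathcal{X}$ and $\mathcal{Y}$, the objective function can be written as

$$f(\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}, \mathbf{E}) = \|\mathcal{X} - [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]\|^2 + \|\mathcal{Y} - [\![\mathbf{A}, \mathbf{D}, \mathbf{E}]\!]\|^2. \tag{5}$$

Given that our focus is on situations that are not exactly coupled but rather approximately coupled, e.g., $\mathbf{A}^{(1)} \approx \mathbf{A}^{(2)}$, where $\mathbf{A}^{(1)}$ and $\mathbf{A}^{(2)}$ are the first-mode factor matrices of $\mathcal{X}$ and $\mathcal{Y}$, function (5) is no longer applicable. However, as with soft constraints, the matrices can be coupled approximately by adding a regularization term. Then, the objective function becomes

$$f = \|\mathcal{X} - [\![\mathbf{A}^{(1)}, \mathbf{B}, \mathbf{C}]\!]\|^2 + \|\mathcal{Y} - [\![\mathbf{A}^{(2)}, \mathbf{D}, \mathbf{E}]\!]\|^2 + \|\mathbf{A}^{(1)} - \mathbf{A}^{(2)}\|^2. \tag{6}$$

However, function (6) has two potential issues. (i) The model loses accuracy when there is a large difference between the number of entries in tensors $\mathcal{X}$ and $\mathcal{Y}$. The errors from approximating $\mathcal{X}$ and $\mathcal{Y}$ will have a different impact on the objective function depending on whether there are many more or many fewer entries in $\mathcal{X}$ than $\mathcal{Y}$. Therefore, using the same weight ratio will result in a loss of accuracy [33]. (ii) Further, this model is only suitable for cases of two-tensor coupling and cannot be applied to multiple tensor scenarios.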

A completely shared factor matrix, whether approximate or not, is only one type of exact coupling, e.g., . There are other types of exact coupling scenarios, such as partial coupling, e.g., , , where heterogeneous datasets only share some, but not all, components [34]. The methods based on function (4) may not be applicable to such situations. Hence, we turn our attention to partial coupling with an extension to CTF-AC, called CTF-PSF. CTF-PSF is based on Acar et al.’s [23] CMTF-OPT algorithm, but with some modifications to allow for data reconstruction with heterogeneous data that has both shared and unshared components. More details on this model appear in Section 3.2.

3. The Proposed Models

In real life, many heterogeneous datasets are only approximately coupled, which means that their dimensions are not exactly coupled. The CTF-AC model offers a joint decomposition solution to situations with approximately coupled datasets. Moreover, it is relatively common for multisource datasets to be partially coupled, which means that only some of the factors in a potential matrix are shared, not all, as is the case with exact coupling and its variants. To address these situations, we have extended CTF-AC to incorporate the CMTF-OPT algorithm in a method called CTF-PSF to offer joint decomposition for partially coupled datasets. The CTF-AC model is presented in Section 3.1. The CTF-PSF model is presented in Section 3.2.

3.1. CTF-AC

To address the two potential problems associated with function (6), i.e., unbalanced tensor entries and inapplicability to multitensor scenarios, we have developed a “two birds with one stone” solution. To overcome potential inaccuracies caused by unbalanced tensor entry distributions, we have added error weights to the objective function (Section 3.1.1). To extend traditional models for use with more than two tensors, we have added a soft constraint to the transfer factor matrix (Section 3.1.2).

3.1.1. Adding Error Weights

When a weight is assigned to the fitting error for each tensor, function (6) becomes where helps with the derivative calculations and are the error weights from approximating and , respectively. is the error weight of the Frobenius norm of and . To equalize the contribution of errors in each part of the objective function, and are set to the reciprocals of the number of entries in and , respectively. is set to the reciprocal of number of entries in .where and are binary tensors of the same size as and , respectively. Therefore, and indicate the number of entries of and , and denotes the number of entries in . In this way, the model eliminates the influence an imbalanced number of tensor entries has on accuracy.
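The weighting idea can be sketched in NumPy as follows. This is an illustrative reading of the description above, not the paper's exact formula: it assumes that the binary tensors mark the entries that are counted, that the residuals are masked by them, and that each of the three terms carries a factor of one half. All function and variable names are hypothetical.

```python
import numpy as np

def cp_full(A, B, C):
    """Third-order tensor built from CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def weighted_objective(X, Y, WX, WY, A1, B, C, A2, D, E):
    """Weighted objective for two approximately coupled tensors (sketch)."""
    w1 = 1.0 / WX.sum()   # reciprocal of the number of counted entries in X
    w2 = 1.0 / WY.sum()   # reciprocal of the number of counted entries in Y
    w3 = 1.0 / A1.size    # reciprocal of the number of entries in the factor
    fit_x = np.sum((WX * (X - cp_full(A1, B, C))) ** 2)
    fit_y = np.sum((WY * (Y - cp_full(A2, D, E))) ** 2)
    coupling = np.sum((A1 - A2) ** 2)
    return 0.5 * (w1 * fit_x + w2 * fit_y + w3 * coupling)

rng = np.random.default_rng(0)
I, R = 5, 2
A1 = rng.standard_normal((I, R))
A2 = A1 + 1.0  # related, but not identical, first-mode factors
B, C = rng.standard_normal((4, R)), rng.standard_normal((3, R))
D, E = rng.standard_normal((6, R)), rng.standard_normal((2, R))
X, Y = cp_full(A1, B, C), cp_full(A2, D, E)
WX, WY = np.ones(X.shape), np.ones(Y.shape)

f_val = weighted_objective(X, Y, WX, WY, A1, B, C, A2, D, E)
```

Because every term is divided by its own entry count, each contributes a mean squared error, so a tensor with far more entries no longer dominates the objective. In this toy case both fits are exact and only the coupling term remains, giving a value of 0.5.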

3.1.2. Adding a Soft Constraint to the Transfer Factor Matrix

To extend traditional models for use with more than two tensors, we have modified function (8) on the assumption that the Frobenius norm may have possible transitiveness. Assume that tensor is approximately shared with and . , and are the factor matrices of . Three tensors can then be approximately coupled byNow consider a more general situation. Suppose there are tensors from sources. The objective function of the joint decomposition of these tensors, based on the CP model, is defined aswhere Further, assume that there are related factors in these relevant tensors; i.e., Thus, functions (9) and (10) can be modified asLet . Then, the partial derivatives of with respect to can be calculated with function (14) as follows:

The factor matrix between multiple tensors is constrained using the soft constraint transfer method. When taking partial derivatives of the shared factor matrix, the solution of the factor matrix for the first tensor and the last tensor is somewhat different from that in the middle. Hence, the shared factor matrix is divided into these three different types of partial derivatives, i.e., the first, last, and middle.
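The distinction between the first, middle, and last tensors can be seen in a small NumPy sketch. It assumes, purely for illustration, that the soft constraint links consecutive factor matrices in a chain, i.e., a penalty of one half times the weighted sum of squared differences between neighbours; the paper's exact weights and fitting-term gradients are not reproduced here.

```python
import numpy as np

def chain_penalty(factors, w=1.0):
    """0.5 * w * sum over m of ||A_m - A_{m+1}||^2 (assumed chain coupling)."""
    return 0.5 * w * sum(
        np.sum((factors[m] - factors[m + 1]) ** 2)
        for m in range(len(factors) - 1)
    )

def chain_penalty_grads(factors, w=1.0):
    """Gradient of the chain penalty with respect to each factor matrix."""
    M = len(factors)
    grads = []
    for m in range(M):
        g = np.zeros_like(factors[m])
        if m > 0:          # every factor except the first has a left neighbour
            g += w * (factors[m] - factors[m - 1])
        if m < M - 1:      # every factor except the last has a right neighbour
            g += w * (factors[m] - factors[m + 1])
        grads.append(g)
    return grads

# Three toy factor matrices with constant entries 0, 1, and 3.
factors = [np.zeros((2, 2)), np.ones((2, 2)), 3.0 * np.ones((2, 2))]
penalty = chain_penalty(factors)
grads = chain_penalty_grads(factors)
```

The first and last factors receive one difference term each, while a middle factor receives two, which is why the three cases need separate derivative expressions.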

With all the gradients of the factor matrix derived, the problem can be solved with any gradient-based method. The algorithm flow of the joint-filled nonlinear conjugate gradient with C related factors of these M relevant tensors is shown in Algorithm 1. Convergence is achieved when the relative change of the objective function is less than the set threshold. The algorithm terminates when the number of iterations reaches its maximum.

Input:
Output: factor matrices
1 = 0;
2 ;
3 ;
4 initialize ;
5 while or do
6 ,vectorized factor matrices;
7 calculate objective function using (13);
8 calculate the partial derivatives of with respect to using (14);
9 vectorized combination of ;
10 , update with nonlinear conjugate gradient and line search;
11 , convert the vector to the factor matrices;
12 convergence analysis;
13 termination analysis;
14 ;
15 end

3.2. CTF-PSF

The CTF-AC model outlined above can further be evolved into a new model that deals with partially coupled datasets, i.e., CTF-PSF [35].

As shown in Figure 3, when heterogeneous datasets only share some components rather than all, methods based on objective function (4) may not be applicable. Without loss of generality, we take the coupled datasets of a tensor and a matrix as an example. Figure 3 shows that a third-order tensor and a matrix are coupled in the first dimension. However, suppose they have the same low-rank structures, i.e., the same number of . Let , , and be the factor matrices of extracted through a individual decomposition with components. Similarly, and are the factor matrices extracted from matrix using a matrix factorization with components. In partially coupled multisource datasets, the factor matrix derived from each source will not match exactly; i.e., and are likely to only share some columns. Further suppose that tensor and matrix have shared components and () unshared components. Then, the objective function (4) can be modified towhereand where and are the unshared columns of and , respectively, and are the shared columns. Thus, the objective function (15) can be further modified to

Here, the shared and unshared components are optimized separately. The unshared components of the tensors and matrix are updated using individual decompositions, and the shared components of the tensor and the matrix are updated using joint decompositions. Specific details of this optimization can be found in [35].
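The bookkeeping behind separate updates can be sketched as a split of the first-mode factor matrix into shared and unshared column blocks. The convention that shared columns come first, and all names and sizes, are assumptions made for illustration.

```python
import numpy as np

def split_columns(A, n_shared):
    """Return (shared block, unshared block) of a factor matrix."""
    return A[:, :n_shared], A[:, n_shared:]

def combine_columns(A_shared, A_unshared):
    """Rebuild a full factor matrix from its shared and unshared blocks."""
    return np.hstack([A_shared, A_unshared])

rng = np.random.default_rng(0)
I, R, n_shared = 6, 4, 3  # illustrative sizes: 3 shared, 1 unshared component

A_shared = rng.standard_normal((I, n_shared))
# The tensor and the matrix each keep their own unshared column(s).
A_tensor = combine_columns(A_shared, rng.standard_normal((I, R - n_shared)))
A_matrix = combine_columns(A_shared, rng.standard_normal((I, R - n_shared)))

shared_t, unshared_t = split_columns(A_tensor, n_shared)
shared_m, unshared_m = split_columns(A_matrix, n_shared)
```

In a scheme of this kind, the shared block would be passed to the joint decomposition and the unshared blocks to the individual ones, after which the blocks are recombined into full factor matrices.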

The pseudocode for CTF-PSF is shown in Algorithm 2. The algorithm terminates when the number of iterations and the number of function evaluations reach their respective maximums. The algorithm converges based on the relative change value and the two-norm of the gradient of all factor matrices divided by the number of entries in the gradient. The functions of and in Algorithm 2 denote individual and joint decompositions, respectively. First, each single dataset is decomposed individually to update the unshared columns (, ) in the matrix of the shared dimension (lines 7 to 17 in Algorithm 2). The other factor matrices (, , and ) and the shared columns () in the matrix of the shared dimension are updated through joint decomposition (line 19 in Algorithm 2). However, this does mean that the number of shared components needs to be determined in advance. To ensure proper modeling, an adjustment step is necessary to combine , , and into , (line 20 in Algorithm 2). It is worth noting here that CMTF-OPT is a special case of CTF-PSF when the coupled datasets do not have any unshared components.

Input:
Output:
1 initialize where ;
2 = 0;
3 ;
4 ;
5 while or do
6 individual decompositions;
7 if then
8 if iter = 0 then
9 update by initialization;
10 ;
11 ;
12 else
13 update by the result of last iteration;
14 ;
15 ;
16 end
17 end
18 joint decomposition;
19 ;
20 ;
21 convergence analysis;
22 termination analysis;
23 ;
24 end

4. Experiments

4.1. Experimental Design

We tested and verified the advantages of both CTF-AC and CTF-PSF through a series of comparative experiments with several baselines on both simulated data and three real-world datasets.

4.1.1. Baselines

The baselines roughly fall into two different categories: methods that jointly decompose multiple tensors, such as CMTF-OPT and ACMTF-OPT, and methods that individually decompose a single tensor, such as CP-WOPT. A description of each follows.
(i) CMTF-OPT first vectorizes all the factor matrices and their partial derivatives so that problems can be solved using any gradient-based optimization algorithm.
(ii) ACMTF-OPT is an advanced version of CMTF-OPT that includes additional constraints to allow analysis of more complex coupled data. It can also be used for missing value completion. More details can be found in [16].
(iii) CP-WOPT is an individual decomposition method for single tensors; it uses a first-order optimization method to solve the weighted least squares problem.

4.1.2. Performance Metrics

The performance of all methods, including CTF-AC and CTF-PSF, was evaluated according to the accuracy of missing completions. Hence, missing index tensors in the models were added to deal with incomplete data. The assessment metric is defined as the difference between the original and the estimated entries for missing values, known as a tensor completion score (TCS). TCS is defined as

$$\mathrm{TCS} = \frac{\|(1 - \mathcal{W}) \ast (\mathcal{X} - \hat{\mathcal{X}})\|}{\|(1 - \mathcal{W}) \ast \mathcal{X}\|}, \tag{18}$$

where $\mathcal{X}$ is the initial tensor and $\hat{\mathcal{X}}$ indicates the datasets estimated by different methods. $\mathcal{W}$ is a binary tensor of the same size as $\mathcal{X}$ and represents the missing entries in $\mathcal{X}$, with zeros to represent the missing data and ones to represent valid data. The smaller the TCS value, the better the result.

In the CTF-AC experiments, we also used RMSE to measure the fitness of the observable values for each method, defined as
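Both metrics can be sketched in a few lines of NumPy. The TCS function follows the description above (relative error restricted to the missing entries, which the binary tensor marks with zeros); the RMSE function assumes the usual root mean square error over the observed entries, which is an assumption since the exact formula is not reproduced here. Function names are hypothetical.

```python
import numpy as np

def tcs(X, X_hat, W):
    """Tensor completion score: relative error on the missing entries (W == 0)."""
    missing = 1.0 - W
    num = np.sqrt(np.sum((missing * (X - X_hat)) ** 2))
    den = np.sqrt(np.sum((missing * X) ** 2))
    return num / den

def rmse_observed(X, X_hat, W):
    """Root mean square error over the observed entries (W == 1), assumed form."""
    return np.sqrt(np.sum((W * (X - X_hat)) ** 2) / W.sum())

X = np.arange(1.0, 9.0).reshape(2, 2, 2)  # true tensor with entries 1..8
W = np.ones_like(X)
W[0, 0, 0] = 0.0                          # one missing entry (true value 1)

X_hat = X.copy()
X_hat[0, 0, 0] = 0.0                      # estimate misses that entry by 1

score = tcs(X, X_hat, W)
fit = rmse_observed(X, X_hat, W)
perfect = tcs(X, X, W)
```

Here the only missing entry is estimated as 0 instead of 1, so the TCS is 1, while the observed entries are matched exactly and the RMSE is 0. A perfect estimate gives a TCS of 0.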

4.1.3. Real-World Datasets

Dataset1 (Dataset1 is available at http://www.models.life.ku.dk/3Dnosedata) contains data from an electronic nose sensor. It comprises structural data from readings on the smell of licorice, arranged as 18 licorice samples × 241 times × 12 sensors [36], including 6 good licorice samples, 6 bad licorice samples, and 6 fabricated bad licorice samples. These were mixed into three tensors, each with a dimension of 6 × 241 × 12, to test approximately coupled datasets with CTF-AC.

Dataset2 (Dataset2 is available at http://www.medinfo.cs.ucy.ac.cy) consists of 38 patients who underwent two MRI scans of different parts of the brain within the same year. One patient was randomly selected from the 38 patients, and then we randomly selected two MRI brain scans of the same site as the two sources of data. Each scan has a size of 378 × 378. The two MRI images are shown in Figure 4. This dataset was also used to assess CTF-AC.

Dataset3 (Dataset3 is available at http://www.models.life.ku.dk/joda/prototype) contains 29 chemical mixtures, each comprising five chemicals measured using LC-MS (liquid chromatography-mass spectrometry) and NMR (nuclear magnetic resonance). NMR was able to detect all five component chemicals and the results can be formulated as tensors . LC-MS, however, only detected four components and therefore the results were formulated as a matrix . More details about these coupled datasets can be found in [16]. This dataset was used to assess CTF-PSF.

4.1.4. General Experimental Parameters

For all comparative experiments, each method was given the same termination conditions. The maximum number of iterations was set to $10^4$, and the maximum number of function evaluations was set to $10^5$. Additionally, the relative change in loss function values was set to and the two-norm of the gradient divided by the number of entries in the gradient was set to . The sparsity penalty parameters for ACMTF-OPT were set to ; i.e., .

4.2. Partially Coupled Data

4.2.1. Simulated Data

Experimental Set-Up. Tensor data and matrix data were generated according to the same technique in [32] using the following formulation:where is the factor matrix shared by both datasets and the other matrices ( and ) are the factor matrices for other dimensions of the tensor. denotes the factor matrix corresponding to the second dimension of matrix . denotes the number of shared components. The factor matrices , , are the unshared factors for each dimension of the tensor. and are the unshared factors of the matrix , and represents the number of the unshared components in each dataset. , , and correspond to , , and in Section 3.2, respectively.

All matrices, except for and , were generated randomly with entries drawn from a standard normal distribution. All matrix columns were normalized to a unit norm. Then, Gaussian noise was added to the tensor and matrix using , , respectively, where indicates that the noise levels, tensors , and matrix are the same size. All entries had a standard normal distribution. Finally, the simulated missing values were added to tensor according to a sampling ratio, denoted as SR.
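The generation procedure can be sketched in NumPy as follows. The sizes, component counts, noise level, and helper names are illustrative assumptions, and the noise is scaled relative to the norm of the clean data, which is a common convention but is assumed here rather than taken from the paper's formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_columns(M):
    """Normalize every column to unit two-norm."""
    return M / np.linalg.norm(M, axis=0, keepdims=True)

def add_noise(T, eta, rng):
    """Add Gaussian noise whose norm is eta times the norm of T (assumed scaling)."""
    N = rng.standard_normal(T.shape)
    return T + eta * N * np.linalg.norm(T.ravel()) / np.linalg.norm(N.ravel())

def missing_mask(shape, sr, rng):
    """Binary tensor with a fraction sr of entries set to 0 (missing)."""
    W = np.ones(int(np.prod(shape)))
    n_missing = int(round(sr * W.size))
    W[rng.choice(W.size, size=n_missing, replace=False)] = 0.0
    return W.reshape(shape)

I, J, K, M = 10, 8, 6, 12          # hypothetical sizes
n_shared, n_unshared = 2, 2        # hypothetical component counts
R = n_shared + n_unshared

A_shared = unit_columns(rng.standard_normal((I, n_shared)))
A_tensor = np.hstack([A_shared, unit_columns(rng.standard_normal((I, n_unshared)))])
A_matrix = np.hstack([A_shared, unit_columns(rng.standard_normal((I, n_unshared)))])
B, C, V = (unit_columns(rng.standard_normal((n, R))) for n in (J, K, M))

X_clean = np.einsum("ir,jr,kr->ijk", A_tensor, B, C)
Y_clean = A_matrix @ V.T
X, Y = add_noise(X_clean, 0.1, rng), add_noise(Y_clean, 0.1, rng)
W = missing_mask(X.shape, 0.5, rng)  # half of the tensor entries missing
```

With this scaling, the ratio of the noise norm to the clean-data norm equals the chosen noise level, which makes noise levels comparable across the tensor and the matrix.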

In the first dimension, we used a tensor size of coupled with a matrix of . The factor matrices were generated and constructed as coupled datasets using (20). Let indicate the estimation rank of the coupled datasets. In individual decompositions, denotes the estimation rank of the individual datasets. In the experiments with CTF-PSF, the total number of shared and unshared components were set to four unless otherwise specified; i.e., the rank of each individual dataset was set to 4 (). The noise level and the sampling ratio of missing values (SR) were set independently for each comparative test. As previously mentioned, performance was evaluated according to the estimation accuracy of missing values as calculated by (18).

Numerical Results. Figures 5 and 6 show the TCSs of all four methods for different SR at , , and when and , respectively. Figure 5(a) shows similar performance by the three joint-decomposition-based methods when . In Figure 5(b), we see that CP-WOPT, which is based on individual decomposition, was relatively stable and ostensibly equivalent to the joint decomposition methods with a missing value ratio of less than 90%. However, Figure 5(c) shows that the joint decomposition methods performed well with many missing values, while individual decomposition produced completely inaccurate results. CTF-PSF considers the shared components and does not take the unshared components into account when . In other words, CMTF-OPT is a special case of CTF-PSF when the coupled datasets do not have any unshared components. Figure 6(a) shows that CMTF-OPT and ACMTF-OPT gave almost the same performance when . However, in contrast to the previous experiment, individual decomposition had certain advantages with a missing value ratio below , as shown in Figure 6(b). It is worth noting that since CTF-PSF considers both the unshared components and the shared components, this method performed almost as well as the methods based on individual decomposition. However, as can be seen in Figure 6(c), once the proportion of missing values reached , the TCS for the individual decomposition method rapidly increased, while CTF-PSF continued to provide good performance.

The influence of different numbers of shared components with the methods based on joint decomposition is shown in Figure 7. CTF-PSF and CMTF-OPT were tested with , , and at different missing value ratios. As shown, increasing the number of shared components helped to improve completion accuracy. This is mainly because the factor matrix provides more auxiliary information as the number of shared components increases. In addition, the advantages of CTF-PSF became more obvious as the number of shared components increased compared to CMTF-OPT.

Figure 8 shows the TCSs for the joint decomposition methods when and were simultaneously sampled at and SR(%) = . Here, . S-1 and S-2 represent and , respectively. The results show that our method still achieved good completion accuracy when every tensor and matrix contained at least some missing values.

4.2.2. Real-World Data

Recall that and in (described in Section 4.1.3) are partially coupled in terms of the constituent chemicals. These datasets have four shared components and has an unshared component. LC-MS data are often noisy and contain many irrelevant features. Therefore, noise can also be regarded as an unshared component of Y.

To compare the performance of different methods, was simulated with different proportions of missing values, and TCSs were evaluated for all baselines using joint decomposition, as shown in Table 2. CMTF-OPT performed better than CTF-PSF with lower amounts of missing values (SR(%)), and ACMTF-OPT was superior to CTF-PSF when the missing value ratio reached 80%. The reason for this is that CTF-PSF does not consider the weight of shared and unshared components. Unsurprisingly, CTF-AC did not perform as well as the other methods, including CTF-PSF, when faced with partially coupled datasets.

4.2.3. Discussion

Figure 9 shows the TCSs for the methods based on joint decomposition at , SR(%) = , , and . As indicated in the figure, the accuracy of these methods deteriorated as noise increased, particularly with higher proportions of missing values. CTF-PSF performed better than the other methods when , but not obviously so with high levels of noise (). This is because unshared components can be very helpful with data reconstruction when . The accuracy of these methods improved as the number of shared components () increased. And their performance was almost the same when , i.e., when all components are shared, where CMTF-OPT becomes a special case of CTF-PSF.

Figure 10 shows the TCSs for the joint decomposition-based methods for different numbers of estimated components (i.e., , at SR(%) = 80, , and , ). The TCSs for CMTF-OPT and ACMTF-OPT improved as the number of estimated components increased, peaking at . However, despite CTF-PSF’s performance improvement until , there were no significant changes when . These results demonstrate that CTF-PSF is able to improve accuracy at relatively low estimation ranks over CMTF-OPT and ACMTF-OPT.

4.3. Approximately Coupled Data

4.3.1. Simulated Data

Experimental Set-Up. The multisource datasets we generated contained two types of shared data to simulate the different kinds of shared relationships found in reality. Without loss of generality, two third-order tensors are used as an example to explain the way the data was generated. Suppose the tensors and are two related tensors. The factor matrices for tensor derived through individual decomposition are , , and , and the factor matrices for tensor derived through individual decomposition are , , and . From these matrices, two simulated datasets were generated in two different ways:

Case 1. The unshared factor matrices , , , and and the shared factor matrix were randomly generated from the normal distribution. , where the function represents the random arrays generated from a specified distribution. Two third-order tensors were then formed based on the generated factor matrix and normalized to arrive at and . Then, Gaussian noise was added to the tensor, i.e., , where corresponds to the random noise tensor and is used to adjust the noise level. The same process applies to .

Case 2. The only difference between this case and Case 1 is that , where indicates the parameter related to the number of tensors and the function produces random arrays from a normal distribution.

The difference between these two datasets is the relationship between the factor matrices and . In Case 1, has a linear relationship to . In Case 2, the elements in and have normal distributions that satisfy the same variance with different mean values. Both these datasets approximate real-world data.

We use CMTF-OPT to realize the traditional multisource tensor decomposition, i.e., MTF. In the experiments with simulated datasets, we compared CTF-AC with a joint decomposition method (MTF) and an individual decomposition method (CP-WOPT). We set the dimensions of the tensor to 50 and the estimation rank to 5. The estimated rank for joint decomposition methods was set to = 10. is the estimated rank for CP-WOPT. All noise levels were set to = 0.1 and, again, the sampling ratio of missing values is denoted as SR.

Numerical Results. Table 3 shows the completion degree of the observed values and the completion accuracy of the missing values for both simulated datasets with a missing value ratio of SR(%) = . A missing value ratio of SR(%) = means that the first tensor has no missing values, while 95% of the values in the second tensor are missing. Since the two tensors are the same size, but the amount of missing values is very different, the completions for the first tensor and the second differ greatly, i.e., 100 : 5. Table 3 shows the root mean square errors for the observable data in the first tensor (RMSE1) and the second tensor (RMSE2). TCS1 represents the missing value results for the first tensor, with TCS2 representing the second tensor. The results in Table 3 show that RMSE1 for MTF was similar to that for CTF-AC. In Case 2, CTF-AC’s RMSE1 (2.8270e-4) was actually larger than that of MTF (2.8111e-4). However, regardless of the type of dataset, the RMSE2 for MTF was larger than that for CTF-AC, especially in Case 1. This indicates that traditional models, like MTF, sacrifice completion accuracy with tensors that have a smaller number of observables to compensate for completion accuracy with tensors that have a larger number of observables. Therefore, the CTF-AC tensor completion score for the second tensor (TCS2) is better than that of MTF under this circumstance. CTF-PSF was not included in the experiments with these simulated datasets given that both are approximately coupled. However, CTF-PSF is included in the following experiments with real-world datasets.

4.3.2. Real-World Datasets

Table 4 shows the TCSs for the four methods with the electronic nose dataset and missing value sampling ratios of SR (%) = . Here, MTF had better completion accuracy with small missing value ratios (TCS=0.0029 at SR(%)= 5), but this accuracy was significantly reduced on tensors with large ratios (TCS=0.1635 at SR(%) = 90). CTF-AC shows better TCSs with more missing values because, again, traditional methods, like MTF, sacrifice accuracy with fewer observables in favor of better accuracy with more. With a missing value ratio of 90%, CTF-AC’s TCS was half that of MTF. Adding the TCSs for all three missing value ratios, CTF-AC achieved better overall accuracy. CTF-PSF was less effective than other methods because it is specifically designed to address partial coupling and is not well-suited to approximately coupled datasets.

Table 5 lists the TCSs for the four methods on the brain MRI dataset with missing value sampling ratios of SR(%) = and x = . The bold results denote the best scores. Here, CTF-AC shows a significant advantage over MTF. Although there was no significant difference between the completion accuracy of CTF-AC and that of CP-WOPT with small missing value ratios, CTF-AC’s completion accuracy on the first image improved in comparison as the ratio increased because it borrows auxiliary information from the second image. Hence, CTF-AC’s TCS was superior to that of CP-WOPT with a high missing value ratio. Figure 11 shows the completion accuracy of the first MRI image for all methods when SR (%) = .

4.3.3. Discussion

Figures 12(a) and 12(b) show the completion accuracy comparison plots for each method on the two simulated multi-source datasets: one with balanced missing value ratios (SR(%) = ), the other with imbalanced ratios (SR(%) = ). With a balanced ratio, the numbers of observable and missing values in the tensors are roughly equivalent, while with imbalanced ratios, there is great disparity.

From Figure 12, we see that the TCSs for the weighted model, CTF-AC, and the traditional model, MTF, are similar with balanced observable and missing values. However, as the disparity increases, the disadvantages of traditional models become very obvious. Figure 12(a) shows an even higher TCS for CP-WOPT, while the completion accuracy for CTF-AC with tensors that contain a great many missing values remains high. To further investigate the impact of missing values on each method, we sampled the MRI dataset with missing value ratios of SR(%) = and x = and list the results in Table 6. With a balanced ratio (SR(%) = ), CTF-AC still performed better than the other methods, including CTF-PSF. Therefore, using a model that is completely shared or partially shared based on factor matrices may introduce some errors. Overall, these experimental results support the validity and accuracy of the CTF-AC model.

Figure 13 shows the completion accuracy of CTF-AC, CTF-PSF, MTF, and CP-WOPT on the two images with an imbalanced ratio of SR (%) = . Unlike traditional joint decomposition methods (such as MTF), CTF-AC adds error weights to the objective function and sets a discriminant factor for both datasets. The discriminant factor reflects the correlations in the data to fit each tensor. Therefore, CTF-AC provides better completion accuracy than MTF with greater levels of missing values.
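To make the role of the error weights concrete, the following is a minimal sketch of a weighted, approximately coupled objective for two third-order tensors. It is not the exact CTF-AC objective: the weights `w1` and `w2`, the coupling strength `lam`, the choice to couple only the first-mode factor matrices, and all function names are assumptions introduced for illustration. Each tensor is fitted by its own CP model on its observed entries, and a penalty keeps the two first-mode factor matrices close rather than forcing them to be identical.

```python
import numpy as np

def cp_reconstruct(a, b, c):
    """Rebuild a third-order tensor from CP factor matrices of shapes (I,R), (J,R), (K,R)."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def weighted_coupled_loss(x1, m1, f1, x2, m2, f2, w1=1.0, w2=1.0, lam=1.0):
    """Weighted fit on observed entries plus a soft coupling penalty.

    x1, x2 : data tensors; m1, m2 : masks (1 = observed, 0 = missing)
    f1, f2 : tuples (A, B, C) of CP factor matrices for each tensor
    w1, w2 : hypothetical error weights for each tensor's fit term
    lam    : hypothetical strength of the approximate coupling on mode 1
    """
    r1 = m1 * (x1 - cp_reconstruct(*f1))   # residual on observed entries of tensor 1
    r2 = m2 * (x2 - cp_reconstruct(*f2))   # residual on observed entries of tensor 2
    coupling = np.sum((f1[0] - f2[0]) ** 2)  # squared distance between first-mode factors
    return 0.5 * (w1 * np.sum(r1 ** 2) + w2 * np.sum(r2 ** 2) + lam * coupling)

# Toy check: exact fits, so only the coupling term contributes.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 2)), rng.normal(size=(4, 2)), rng.normal(size=(5, 2))
f1 = (a, b, c)
f2 = (a + 1.0, b, c)                        # first-mode factors differ by 1 in every entry
x1, x2 = cp_reconstruct(*f1), cp_reconstruct(*f2)
ones = np.ones_like(x1)
loss_same = weighted_coupled_loss(x1, ones, f1, x1, ones, f1)
loss_shift = weighted_coupled_loss(x1, ones, f1, x2, ones, f2, lam=2.0)
```

Under this sketch, raising the weight on the sparsely observed tensor keeps its few observed entries from being dominated by the fit term of the densely observed one, which is the imbalance that the discussion above attributes to unweighted models such as MTF.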

5. Conclusions

Jointly analyzing data from multiple sources has the potential to extract underlying data structures to enhance knowledge discovery. However, in data fusion, traditional coupled tensor factorization has been unable to deal with the diverse relationships found between multi-source datasets, such as approximate or partial couplings. Existing techniques are only appropriate for modeling exact couplings. Therefore, to address this challenge, we propose two improved coupled tensor factorization methods: one for approximately coupled datasets, CTF-AC, and the other for partially coupled datasets, CTF-PSF. CTF-AC is also suitable for multisource datasets with no dimension couplings if the data is highly correlated. CTF-PSF is an extension of CTF-AC, based on the CMTF-OPT algorithm, which factorizes datasets with both shared and unshared components by combining individual and coupled decompositions. Through numerical experiments, we demonstrate that the tensor completion accuracy of the proposed methods outperforms traditional coupled tensor factorization methods on datasets with approximate and partial couplings. However, there are some disadvantages to the proposed methods. These are highlighted because they provide opportunities for future research.

In future work, we will pursue two research directions. (i) CTF-AC incurs additional computational cost from calculating the factor matrix constraint term in the objective function; we intend to rewrite this calculation as a parallel algorithm to improve overall operational efficiency. (ii) CTF-PSF is based on a predetermined number of shared components and does not consider the weights of those shared and unshared components; we will therefore strive to make our framework more accurate and robust by adding more constraints.

Data Availability

The data sources are available from http://www.models.life.ku.dk/3Dnosedata, http://www.medinfo.cs.ucy.ac.cy and http://www.models.life.ku.dk/joda/prototype.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61702146, 61772163, and 61761136010) and the Science and Technology Program of Zhejiang Province (no. 2018C04001). A previous version of this work was accepted as a regular paper and published at the International Joint Conference on Neural Networks (IJCNN 2018), and we have cited it as [35].