Abstract

Corticomuscular activity modeling based on multiple data sets such as electroencephalography (EEG) and electromyography (EMG) signals provides a useful tool for understanding human motor control systems. In this paper, we propose modeling corticomuscular activity by combining partial least squares (PLS) and canonical correlation analysis (CCA). The proposed method takes advantage of both PLS and CCA to ensure that the extracted components are maximally correlated across two data sets and meanwhile can well explain the information within each data set. This complementary combination generalizes the statistical assumptions beyond both PLS and CCA methods. Simulations were performed to illustrate the performance of the proposed method. We also applied the proposed method to concurrent EEG and EMG data collected in a Parkinson’s disease (PD) study. The results reveal several highly correlated temporal patterns between EEG and EMG signals and indicate meaningful corresponding spatial activation patterns. In PD subjects, enhanced connections between the occipital region and other regions are noted, which is consistent with previous clinical findings. The proposed framework is a promising technique for performing multisubject and bimodal data analysis.

1. Introduction

Corticomuscular activity modeling is important for assessing functional interactions in the motor control system, that is, studying simultaneous cortical and muscular activities during a sustained isometric muscle contraction. Traditionally, the most common method to assess the interactions between motor-related brain areas and the muscles is magnitude-squared coherence (MSC), which is a normalized measure of correlation between two waveforms or signals in the frequency domain. For instance, in monkeys, coherent oscillations in the 20–30 Hz band could be detected between cortical local field potentials and the rectified electromyography (EMG) from contralateral hand muscles that were modulated during different phases of a precision grip task [1]. In humans, similar findings were discovered in the beta-band corticomuscular coherence between magnetoencephalography (MEG) [2] and electroencephalography (EEG) [3] from the primary motor cortex during isometric contractions.

Although MSC has been popular in studying corticomuscular coupling, it suffers from several limitations. First, addressing the intersubject variability challenge to make a robust group inference is not straightforward with MSC because the exact frequency of maximum coupling may be inconsistent across subjects. Second, MSC emphasizes the role of an individual locus in the brain in driving the motor system, while motor activity is known to be more distributed [4]. In fact, recent work has suggested that interactions between brain regions correspond more closely to ongoing EMG than activity at discrete sites [5–7]. Moreover, when the brain activity is measured by EEG, applying MSC directly to raw EEG and EMG signals normally yields a very low coherence value, because only a small fraction of ongoing EEG activity is related to motor control [8]. This implies that extensive statistical testing is required to determine whether the EEG/EMG coherence is statistically significant.

Recently, several data-driven multivariate methods have been developed for analyzing biological data, and they seem to be appropriate for modeling corticomuscular activity because these methods explore dependency relationships between data sets. These methods include multiple linear regression, principal component regression, partial least squares (PLS), and canonical correlation analysis (CCA) [9]. Among these methods, the latent-variable- (LV-) based approaches, such as PLS and CCA, play a dominating role, probably due to the fact that the extracted LVs could help the biological interpretations of the results.

PLS, first developed for process monitoring in the chemical industry, exploits the covariation between predictor variables and response variables and finds a new set of latent components that maximally relate to them [10]. An advantage of PLS is that it can handle high-dimensional and collinear data, which is often the case in real-world biological applications. PLS and its variants have been investigated in many medical applications, such as assessing the spatial patterns of brain activity in functional magnetic resonance imaging (fMRI) data associated with behavioural measures [11] and the common temporal components between EEG and fMRI signals [12]. In addition to its ability to handle high-dimensional and collinear data, PLS is sufficiently flexible that it can be extended to perform group-level analysis and to accommodate multiway data [5].

CCA is commonly used to seek a pair of linear transformations between two sets of variables, such that the data are maximally correlated in the transformed space. Generally, CCA is not as popular as PLS in practical applications [13]. This is probably because real-world data are usually high dimensional and collinear, and thus applying CCA directly to the raw data can be ill-conditioned. However, with appropriate preprocessing strategies, CCA has been shown to be quite useful in many medical applications. For instance, in [14], De Clercq et al. successfully removed muscle artifacts from a real ictal EEG recording without altering the recorded underlying ictal activity. In [15], Gumus et al. found significant correlations at expected places, indicating a palindromic behavior surrounding the viral integration site. CCA can also be extended to accommodate multiple data sets simultaneously [16].

Although PLS and CCA have been investigated in many medical applications, to the best of our knowledge no report has thoroughly explored their underlying differences, compared their characteristic performances, and combined their advantages to overcome their drawbacks. For corticomuscular activity modeling, as we will elaborate in Section 2, both PLS and CCA have their advantages and disadvantages, but perhaps more importantly, these two methods can be considered complementary. In this paper, we propose combining PLS and CCA to improve the performance of the joint LV extraction, and the proposed method is denoted as PLS + CCA. More specifically, the proposed PLS + CCA follows a two-step modeling strategy: we first adopt PLS to obtain LVs across two data sets and then perform CCA on the extracted LVs. In the first step, PLS is performed for preliminary LV preparation. The aim of this step is to extract LVs which can best explain their own data sets and meanwhile are well correlated with the LVs in the other data set. Besides, this step can also prevent the ill-conditioned problem that arises when applying CCA directly to the raw data. In the second step, CCA is applied to the LVs extracted by PLS to construct new LVs by maximizing the correlation coefficients. With these two steps, it is ensured that the extracted components are maximally correlated across the two data sets and meanwhile can well explain the information within individual data sets.

We will evaluate the performance of the proposed method on both synthetic data and real-world data. We first illustrate its performance using simulations, and we then apply the method to concurrent EEG and EMG data collected from patients with Parkinson’s disease (PD) and age-matched normal subjects while they perform a dynamic, visually guided tracking task. We note highly correlated temporal patterns between EEG and EMG signals and meaningful spatial activation patterns. While the method is proposed specifically for corticomuscular coupling analysis, it can also be applied to other types of concurrent signals, including, but not limited to, fMRI, photoplethysmograph (PPG), electrocardiography (ECG), and kinematic data.

2. Materials and Methods

2.1. Methods

In this section, we first analyze the properties of PLS and CCA and demonstrate their complementarity. Based on this observation, we then propose combining the two approaches to form the PLS + CCA method. The two zero-mean data sets are stored in two matrices, the predictor matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ and the response matrix $\mathbf{Y} \in \mathbb{R}^{n \times q}$, where $n$ is the number of observations and $p$ and $q$ indicate the numbers of variables in the corresponding matrices.

2.1.1. Partial Least Squares

PLS exploits the covariation between predictor variables and response variables and tries to find a new set of LVs that maximally relate to them [13]. In other words, the covariance between the extracted LVs should be maximized:
$$\max_{\mathbf{w}_x,\,\mathbf{w}_y} \operatorname{cov}(\mathbf{X}\mathbf{w}_x, \mathbf{Y}\mathbf{w}_y) \quad \text{s.t. } \mathbf{w}_x^T\mathbf{w}_x = \mathbf{w}_y^T\mathbf{w}_y = 1, \tag{1}$$
where $\mathbf{w}_x$ and $\mathbf{w}_y$ are the weight vectors. A typical PLS model can be fitted with the classical NIPALS algorithm [10]; alternatively, it can be computed through an eigenvalue-eigenvector decomposition [17]. In the latter case, the maximum of (1) is achieved by taking $\mathbf{w}_x$ and $\mathbf{w}_y$ as the largest eigenvectors of the matrices $\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}$ and $\mathbf{Y}^T\mathbf{X}\mathbf{X}^T\mathbf{Y}$, respectively. To obtain subsequent weights, the algorithm is repeated with deflated $\mathbf{X}$ and $\mathbf{Y}$ matrices. The detailed calculation procedure can be found in the appendix.

The number of components to be extracted is an important parameter of a PLS model. Although it is possible to extract as many PLS components as the rank of the data matrix $\mathbf{X}$, not all of them are generally used. The main reasons are the following: the measured data are never noise-free, some small components only describe noise, and it is common to ignore small components because of collinearity problems. Therefore, appropriate measures are needed to determine when to stop. Typically, the number of components needed to describe the data matrices is determined based on the amount of variation remaining in the residual data [10].
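As an illustration, the first pair of PLS weight vectors can be obtained from the singular value decomposition of the cross-product matrix $\mathbf{X}^T\mathbf{Y}$, whose leading left and right singular vectors coincide with the largest eigenvectors of $\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}$ and $\mathbf{Y}^T\mathbf{X}\mathbf{X}^T\mathbf{Y}$. The following NumPy sketch is ours, not the authors' code, and the toy data sets driven by a shared latent signal are hypothetical:

```python
import numpy as np

def pls_weights(X, Y):
    # First PLS weight pair: leading left/right singular vectors of X'Y,
    # i.e. the largest eigenvectors of X'YY'X and Y'XX'Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U[:, 0], Vt[0, :]

rng = np.random.default_rng(0)
latent = np.sin(2 * np.pi * np.arange(1000) / 50)   # shared latent signal
X = np.outer(latent, [3.0, 1.0]) + 0.1 * rng.standard_normal((1000, 2))
Y = np.outer(latent, [2.0, -1.0]) + 0.1 * rng.standard_normal((1000, 2))
X -= X.mean(axis=0); Y -= Y.mean(axis=0)
wx, wy = pls_weights(X, Y)
tx, ty = X @ wx, Y @ wy                             # first LV pair
```

Because both toy data sets are dominated by the same latent signal, the first LV pair `tx`, `ty` comes out strongly correlated.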

2.1.2. Canonical Correlation Analysis

Different from PLS, CCA finds linear combinations of the $\mathbf{X}$ and $\mathbf{Y}$ variables that have the maximum correlation coefficient with each other. This leads to the same objective function as (1) but with different constraints:
$$\max_{\mathbf{w}_x,\,\mathbf{w}_y} \mathbf{w}_x^T\mathbf{X}^T\mathbf{Y}\mathbf{w}_y \quad \text{s.t. } \mathbf{w}_x^T\mathbf{X}^T\mathbf{X}\mathbf{w}_x = \mathbf{w}_y^T\mathbf{Y}^T\mathbf{Y}\mathbf{w}_y = 1, \tag{2}$$
where $\mathbf{w}_x$ and $\mathbf{w}_y$ are the weight vectors.

The solutions to this problem are the largest eigenvectors of the matrices $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\mathbf{X}$ and $(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$, respectively. The subsequent weights are the eigenvectors of the same matrices in the order of decreasing eigenvalues. The predictor LVs can be calculated directly from the original matrix as $\mathbf{T}_x = \mathbf{X}\mathbf{W}_x$, the columns of which are uncorrelated with each other. The detailed derivation is shown in the appendix. However, the solution depends heavily on whether the covariance matrix $\mathbf{X}^T\mathbf{X}$ is invertible. In practice, it is possible to have $\operatorname{rank}(\mathbf{X}) < p$, so that invertibility is not satisfied, and directly applying the eigenvalue decomposition in the raw data space may lead to an ill-conditioned problem. Therefore, appropriate preprocessing strategies are needed in practice before applying CCA.
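A minimal NumPy sketch of this eigenproblem is given below; the small ridge term added to the covariance matrices is our own assumption (it guards against the ill-conditioning just discussed and is not part of the textbook formulation), and the toy data are hypothetical:

```python
import numpy as np

def cca_weights(X, Y, reg=1e-8):
    # Leading canonical weight pair from the eigenproblem
    # (X'X)^-1 X'Y (Y'Y)^-1 Y'X w_x = rho^2 w_x.
    # The ridge `reg` is our addition against ill-conditioning.
    Sxx = X.T @ X + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    wx = np.real(vecs[:, np.argmax(np.real(vals))])
    wy = np.linalg.solve(Syy, Sxy.T) @ wx      # w_y follows from w_x
    wx /= np.linalg.norm(X @ wx)               # unit-variance variates
    wy /= np.linalg.norm(Y @ wy)
    return wx, wy

rng = np.random.default_rng(1)
z = rng.standard_normal(500)                   # shared latent source
X = np.column_stack([z, rng.standard_normal(500)])
Y = np.column_stack([z + 0.1 * rng.standard_normal(500),
                     rng.standard_normal(500)])
X -= X.mean(axis=0); Y -= Y.mean(axis=0)
wx, wy = cca_weights(X, Y)
rho = (X @ wx) @ (Y @ wy)                      # canonical correlation
```

Since the canonical variates are scaled to unit norm, their inner product `rho` is exactly the correlation coefficient of the first canonical pair.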

2.1.3. The Combined PLS + CCA Method

Based on the discussion above, we can see that the fundamental difference between PLS and CCA is that PLS maximizes the covariance while CCA maximizes the correlation. The objective of PLS is to construct LVs which best explain their own data set and meanwhile are well correlated with the corresponding LVs in the other set. In other words, the first priority of PLS is to find the LVs which can explain a significant proportion of variance in each data set, and the second priority is to find the LVs with relatively high correlation coefficients between the two data sets. In contrast, the only objective of CCA in the construction of LVs is to maximize their correlation coefficients with the LVs in the other data set. From this point of view, the LVs extracted by PLS are able to represent the major information in individual data sets, while the ones extracted by CCA may be trivial (e.g., noise with similar patterns) even if their correlation coefficient is maximal. This is an advantage of PLS over CCA. Besides, PLS can handle high-dimensional and collinear data, which is often the case in real-world biological applications, while applying CCA directly to the raw data may be ill-conditioned. However, we should note that our goal is to find the relationships between two data sets, not just to explore the information within individual data sets. It is possible that a higher covariance merely results from the larger variance of LVs, which does not necessarily imply strong correlations. To overcome this, CCA is a powerful tool to ensure that the extracted LVs have similar patterns across the data sets.

For corticomuscular activity modeling, the coupling relationships between EEG and EMG signals are what we aim to explore. In practice, EEG and EMG signals can be contaminated by other types of signals and are never noise-free. In addition, the signals from adjacent channels are generally similar, which leads to collinear data. By employing PLS, we can deal with the collinear EEG/EMG data sets and extract significant LVs, but we cannot guarantee that the corresponding LVs are highly correlated with each other. Using CCA, we can extract highly correlated LVs from EEG and EMG signals, but we cannot ensure that such LVs are nontrivial, and we may face the ill-conditioned problem.

For corticomuscular coupling analysis, both PLS and CCA have their advantages and disadvantages, but perhaps most importantly, these two methods can be considered complementary. It is therefore natural to combine PLS and CCA into a two-step modeling strategy. In the first step, PLS is performed for preliminary LV preparation. The aim of this step is to extract LVs which can best explain their own data sets and meanwhile are well correlated with the LVs in the other data set. In this way, the trivial and irrelevant information across data sets can be removed. Besides, this step can also prevent the ill-conditioned problem that arises when applying CCA directly to the raw data. In the second step, CCA is applied to the LVs prepared by PLS to construct new LVs by maximizing the correlation coefficients. After these two steps, it is ensured that the extracted components are maximally correlated across data sets and meanwhile can well explain the information within each individual data set. The details of the proposed PLS + CCA method are given in the appendix, and the specific implementation procedure is shown in Algorithm 1.

Input: two data sets $\mathbf{X}$ (with size $n \times p$) and $\mathbf{Y}$ (with size $n \times q$)
Output: corresponding LV matrices $\mathbf{Z}_x$ and $\mathbf{Z}_y$
The First Step:
Solve the eigen decomposition problems:
        $\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}\mathbf{w}_x = \lambda^2\mathbf{w}_x$ and $\mathbf{Y}^T\mathbf{X}\mathbf{X}^T\mathbf{Y}\mathbf{w}_y = \lambda^2\mathbf{w}_y$.
Determine $k_x$ and $k_y$, the numbers of LVs extracted, corresponding to
    the above two problems by the ratio of explained variance.
Determine the final number of LVs: $k = \min(k_x, k_y)$.
Set $i = 1$.
Initialize both LV matrices to be empty, that is, $\mathbf{T}_x = [\;]$ and $\mathbf{T}_y = [\;]$.
        while $i \le k$ do
      Set $\mathbf{w}_x$ and $\mathbf{w}_y$ to be the largest eigenvectors of the matrices
         $\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}$ and $\mathbf{Y}^T\mathbf{X}\mathbf{X}^T\mathbf{Y}$, respectively.
      Calculate the LVs as $\mathbf{t}_x = \mathbf{X}\mathbf{w}_x$ and $\mathbf{t}_y = \mathbf{Y}\mathbf{w}_y$.
      Set $\mathbf{T}_x = [\mathbf{T}_x, \mathbf{t}_x]$ and $\mathbf{T}_y = [\mathbf{T}_y, \mathbf{t}_y]$.
  Deflate $\mathbf{X}$ by subtracting the effects of the LV from the data space:
         $\mathbf{X} \leftarrow \mathbf{X} - \mathbf{t}_x(\mathbf{t}_x^T\mathbf{t}_x)^{-1}\mathbf{t}_x^T\mathbf{X}$.
  Deflate $\mathbf{Y}$ by subtracting the effects of the LV from the data space:
         $\mathbf{Y} \leftarrow \mathbf{Y} - \mathbf{t}_y(\mathbf{t}_y^T\mathbf{t}_y)^{-1}\mathbf{t}_y^T\mathbf{Y}$.
  Let $i = i + 1$.
  end while
The Second Step:
Solve the following eigen decomposition problems:
         $(\mathbf{T}_x^T\mathbf{T}_x)^{-1}\mathbf{T}_x^T\mathbf{T}_y(\mathbf{T}_y^T\mathbf{T}_y)^{-1}\mathbf{T}_y^T\mathbf{T}_x\mathbf{a}_x = \rho^2\mathbf{a}_x$ and
         $(\mathbf{T}_y^T\mathbf{T}_y)^{-1}\mathbf{T}_y^T\mathbf{T}_x(\mathbf{T}_x^T\mathbf{T}_x)^{-1}\mathbf{T}_x^T\mathbf{T}_y\mathbf{a}_y = \rho^2\mathbf{a}_y$.
Set $\mathbf{A}_x$ and $\mathbf{A}_y$ to be the associated eigenvector matrices, respectively.
The recovered LVs $\mathbf{Z}_x$ and $\mathbf{Z}_y$ can be calculated by
         $\mathbf{Z}_x = \mathbf{T}_x\mathbf{A}_x$ and $\mathbf{Z}_y = \mathbf{T}_y\mathbf{A}_y$.
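The two steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the 95% explained-variance rule for choosing the number of LVs and the toy data with one shared source are our assumptions.

```python
import numpy as np

def pls_cca(X, Y, var_ratio=0.95):
    X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # --- Step 1: PLS with deflation ---
    def n_comp(M):
        s = np.linalg.svd(M, compute_uv=False) ** 2
        return int(np.searchsorted(np.cumsum(s) / s.sum(), var_ratio)) + 1
    k = min(n_comp(X), n_comp(Y))              # final number of LVs
    Tx_cols, Ty_cols = [], []
    Xd, Yd = X.copy(), Y.copy()
    for _ in range(k):
        U, _, Vt = np.linalg.svd(Xd.T @ Yd)    # leading PLS weight pair
        wx, wy = U[:, 0], Vt[0, :]
        tx, ty = Xd @ wx, Yd @ wy
        Tx_cols.append(tx); Ty_cols.append(ty)
        Xd = Xd - np.outer(tx, tx @ Xd) / (tx @ tx)   # deflate X
        Yd = Yd - np.outer(ty, ty @ Yd) / (ty @ ty)   # deflate Y
    Tx, Ty = np.column_stack(Tx_cols), np.column_stack(Ty_cols)
    # --- Step 2: CCA on the PLS latent variables ---
    Sxx, Syy, Sxy = Tx.T @ Tx, Ty.T @ Ty, Tx.T @ Ty
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    Ax = np.real(vecs[:, np.argsort(-np.real(vals))])
    Ay = np.linalg.solve(Syy, Sxy.T) @ Ax
    return Tx @ Ax, Ty @ Ay                    # recovered LV matrices

rng = np.random.default_rng(2)
s = np.sin(np.linspace(0, 20 * np.pi, 800))    # shared source
X = np.outer(s, rng.standard_normal(6)) + 0.2 * rng.standard_normal((800, 6))
Y = np.outer(s, rng.standard_normal(4)) + 0.2 * rng.standard_normal((800, 4))
Zx, Zy = pls_cca(X, Y)
r1 = np.corrcoef(Zx[:, 0], Zy[:, 0])[0, 1]     # first LV pair correlation
```

Because the shared source dominates both data sets, the first recovered LV pair is both high-variance (step 1) and highly correlated (step 2).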

2.2. Synthetic Data

In this simulation, we applied the proposed method to synthetic data and also reported the results of the PLS and CCA approaches for comparison. As an illustrative example, without loss of generality, four sources were generated and analyzed for each data set.

The four source signals considered for the data set $\mathbf{X}$ are given in (3), where $t$ denotes the time index vector, valued from 1 to 1000, and $\mathbf{s}_1, \ldots, \mathbf{s}_4$ represent the four simulated sources, as shown in Figure 1(a). Note that here the $\mathbf{s}_i$'s are column vectors.

Similarly, four source signals were considered for the data set $\mathbf{Y}$, as given in (4), where the notations are similarly defined. The four simulated sources are shown in Figure 1(b).

Two mixed data sets $\mathbf{X}$ and $\mathbf{Y}$ were generated as in (5), with each row denoting one observation in its respective data space, where $\mathbf{A}_x$ and $\mathbf{A}_y$ denote the corresponding mixing matrices.

The patterns of the corresponding sources are similar across the two data sets, representing common information. However, from (3) and (4), we can see that there are some time-shifts between corresponding source pairs, and their correlation coefficients (CCs) are given in Table 1. The first pair of sources has the lowest CC, but in the mixed data sets we intentionally assign the highest weights to this pair, as shown in the mixing matrices $\mathbf{A}_x$ and $\mathbf{A}_y$. This pair represents the major information within individual data sets but reflects little of the coupling relationship between the two sets. The second and third pairs have relatively high CCs and moderate weights in the mixed data sets. These two pairs not only contain major information within individual data sets, but also represent the coupling relationships across data sets. The fourth pair of sources has the highest CC, but we assign it the smallest weights; consequently, it does not represent significant information and can generally be regarded as trivial. Moreover, white Gaussian noise with 10% power was added to each source in each data space.
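A hypothetical analogue of this simulation design can be sketched as follows. The exact sources and mixing matrices in (3)-(5) are not reproduced here; instead we use paired sinusoids whose time shifts control the pairwise CC and whose weights (a simple diagonal "mixing", our simplification) control their share of the variance, plus white Gaussian noise at 10% power:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1000)
shifts = [15, 10, 5, 1]                 # larger shift -> lower CC
weights = [4.0, 2.0, 2.0, 0.5]          # pair 1 dominant, pair 4 trivial
periods = [60 + 15 * i for i in range(4)]
Sx = np.column_stack([np.sin(2 * np.pi * t / p) for p in periods])
Sy = np.column_stack([np.sin(2 * np.pi * (t - d) / p)
                      for d, p in zip(shifts, periods)])

def mix(S):
    # scale each source pair, then add white Gaussian noise at 10% power
    scaled = S * np.asarray(weights)
    noise = rng.standard_normal(S.shape) * np.sqrt(0.1 * scaled.var(axis=0))
    return scaled + noise

X, Y = mix(Sx), mix(Sy)
cc = [np.corrcoef(Sx[:, i], Sy[:, i])[0, 1] for i in range(4)]
```

As in Table 1 of the paper, the pair with the largest time shift has the lowest CC and the pair with the smallest shift the highest, while the assigned weights decouple "importance within a data set" from "correlation across data sets".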

2.3. Real Data

In many medical applications, the results of analyzing one subject's data cannot be generalized to the population level because of intersubject variability. Therefore, it is necessary to recruit a proper number of subjects and perform a group analysis. For modeling the corticomuscular activity, we apply the proposed method to concurrent EEG and EMG signals collected from normal subjects and patients with PD during a motor task.

2.3.1. Data Collection

The study was approved by the University of British Columbia Ethics Board, and all subjects gave written, informed consent prior to participating. Nine PD patients (mean age: 66 yrs) were recruited from the Pacific Parkinson’s Research Centre at the University of British Columbia (Vancouver, Canada). They all displayed mild to moderate levels of PD severity (stages 1-2 on the Hoehn and Yahr scale) and were being treated with L-dopa medication (mean daily dose of 720 mg). All PD subjects were assessed after a minimum of 12-hour withdrawal of L-dopa medication, and their motor symptoms were assessed using the Unified Parkinson’s Disease Rating Scale (UPDRS), resulting in a mean score of 23. In addition, eight age-matched healthy subjects were recruited as controls. During the experiment, subjects were seated 2 m away from a large computer screen. The visual target was displayed on the screen as a vertical yellow bar oscillating in height at 0.4 Hz. Subjects were asked to squeeze a pressure-responsive bulb with their right hand. The visual feedback representing the force output of the subject was displayed as a vertical green bar superimposed on the target bar, as shown in Figure 2. Applying greater pressure to the bulb increased the height of the green bar, and releasing pressure decreased it. Subjects were instructed to match the height of the green bar to that of the target bar as closely as possible. Each squeezing period lasted 15 seconds and was followed by a 15-second rest period. The squeezing task was performed twice. The force required was up to 10% of each subject’s maximum voluntary contraction (MVC), which was measured at the beginning of each recording session.

The EEG data were collected using an EEG cap (Quick-Cap, Compumedics, TX, USA) with 19 electrodes based on the International 10–20 system. The EEG data were sampled at 1000 Hz using SynAmps2 amplifiers (NeuroScan, Compumedics, TX, USA). A surface electrode on the tip of the nose was used as ground. Ocular movement artifacts were measured using surface electrodes placed above and below the eyes (Xltek, ON, Canada). Data were later processed by a band-pass filter (1 to 70 Hz) offline and downsampled to 250 Hz. Artifacts associated with eye blinks and muscular activities were removed using the Automated Artifact Removal in the EEGLAB Matlab Toolbox [18]. The raw time sequences of the electrodes were then normalized to have zero-mean and unit variance. For subsequent analysis, data collected during the squeezing periods were concatenated in time into a single matrix for each individual subject. Data from the rest periods were excluded from the analysis.

The EMG signals were recorded using self-adhesive, silver, silver-chloride pellet surface electrodes with 7 mm diameter. A bipolar montage was used with a fixed interelectrode distance of 30 mm. The surface EMG signals were simultaneously collected together with the EEG signals and were amplified and sampled at 1000 Hz. To be consistent with the EEG preprocessing, the EMG signals were downsampled offline to 250 Hz, and only the squeezing periods were used for subsequent analysis.

2.3.2. Feature Extraction

In most existing studies, the analysis of corticomuscular coupling is performed directly on the raw EEG and EMG data, which typically yields quite small correlation values. Nonetheless, with appropriate preprocessing steps, highly correlated EEG and EMG features can be extracted from the raw signals. In this work, we examine the coupling relationships between time-varying EEG features and the amplitudes of the EMG signals, constituting $\mathbf{X}_i$ and $\mathbf{Y}_i$, respectively, for each subject $i$ ($i = 1, \ldots, N$). We have a total of $N$ subjects ($N = 17$ in this study). To achieve a group analysis, all subjects' data sets are concatenated in the temporal dimension, with the assumption that all subjects share common group patterns in the temporal dimension [19].

EEG Features. Pairwise Pearson's correlations [20] are considered in this study. Pearson's correlation measures the dependency between a pair of EEG signals $x_j$ and $x_k$ in the time domain as follows:
$$r_{jk} = \frac{\sum_{t}\left(x_j(t) - \bar{x}_j\right)\left(x_k(t) - \bar{x}_k\right)}{\sqrt{\sum_{t}\left(x_j(t) - \bar{x}_j\right)^2}\sqrt{\sum_{t}\left(x_k(t) - \bar{x}_k\right)^2}},$$
where $\bar{x}_j$ and $\bar{x}_k$ are the sample means of $x_j$ and $x_k$. In this work, we calculate the time-varying pairwise correlations between EEG channels, using a Hamming window with length 300 and a 95% overlap. Therefore, the raw EEG information can be represented by a two-dimensional matrix of size $480 \times 171$, where the rows correspond to the samples at different time points and the columns correspond to the features, that is, the pairwise correlations between the EEG channels.
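The windowed correlation features can be sketched as below. This is our illustration rather than the study's code: the Hamming taper mentioned in the text is omitted for simplicity (plain rectangular windows), and the toy 19-channel recording is hypothetical.

```python
import numpy as np

def sliding_pairwise_corr(eeg, win=300, overlap=0.95):
    # Time-varying pairwise Pearson correlations between channels;
    # `eeg` is (samples x channels).
    step = max(1, int(round(win * (1 - overlap))))   # 15-sample hop
    n, c = eeg.shape
    iu = np.triu_indices(c, k=1)                     # C(c, 2) channel pairs
    feats = [np.corrcoef(eeg[s:s + win].T)[iu]
             for s in range(0, n - win + 1, step)]
    return np.asarray(feats)                         # (windows x pairs)

rng = np.random.default_rng(4)
eeg = rng.standard_normal((3000, 19))                # toy 19-channel EEG
F = sliding_pairwise_corr(eeg)
```

With 19 channels there are 19 × 18 / 2 = 171 pairwise features per window, matching the feature dimension quoted in the text.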

EMG Features. An individual EMG channel signal can be considered a zero-mean, band-limited, wide-sense stationary stochastic process modulated by the EMG amplitude, which represents the overall activity of the underlying muscle fibers [21]. While different techniques have been proposed for accurate amplitude estimation, in this study we employ the root-mean-square (RMS) approach to calculate the EMG amplitude of a short-duration EMG segment $x(t)$, $t = 1, \ldots, L$:
$$\text{RMS} = \sqrt{\frac{1}{L}\sum_{t=1}^{L}x(t)^2}.$$
A moving window with length 300 and a 95% overlap is applied here, the same as in the EEG feature calculation, to ensure that the obtained EEG and EMG features are temporally aligned and matched.
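A minimal sketch of the windowed RMS amplitude, using the same window length and overlap as the EEG features so the two feature streams stay temporally aligned (the toy EMG trace is hypothetical):

```python
import numpy as np

def emg_rms(emg, win=300, overlap=0.95):
    # Windowed RMS amplitude sqrt(mean(x^2)) of one EMG channel.
    step = max(1, int(round(win * (1 - overlap))))
    return np.array([np.sqrt(np.mean(emg[s:s + win] ** 2))
                     for s in range(0, len(emg) - win + 1, step)])

rng = np.random.default_rng(5)
emg = 2.0 * rng.standard_normal(3000)   # zero-mean toy EMG, amplitude ~2
amp = emg_rms(emg)
```

For a zero-mean process the RMS tracks the modulating amplitude, so the estimated values hover around the true amplitude of 2.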

In the above setting, for each subject $i$ ($i = 1, \ldots, 17$), $\mathbf{X}_i$ and $\mathbf{Y}_i$ represent the time-varying feature matrices of EEG and EMG, respectively. The length of the time sequences here is 480, associated with the 300-length moving window and a 95% overlap. For the EEG correlation feature, since we have 19 EEG channels based on the International 10–20 system, there are a total of $19 \times 18/2 = 171$ correlation connections. Therefore, $\mathbf{X}_i$ is of size $480 \times 171$. For the EMG amplitude feature, since there are three surface EMG channels, $\mathbf{Y}_i$ is of size $480 \times 3$.

2.3.3. Significance Assessment

To determine the statistical significance levels of the extracted LVs, we employ a nonparametric permutation test [22] in which the temporal order of the EEG features is uniformly permuted for all subjects while keeping the EMG features intact. Two hundred random permutations are generated, and the proposed PLS + CCA method described in Section 2.1.3 is applied to each of them. The correlation coefficients between the temporal patterns extracted from the permuted EEG features and the unchanged EMG features are then calculated to form an empirical null distribution. The $p$ value of the original EEG/EMG correlation coefficient is computed from the null distribution as the proportion of sampled permutations whose correlation coefficients are greater than or equal to the original correlation coefficient. Components with $p$ value less than 0.05 are considered statistically significant, denoted as $\mathbf{Z}_x^s$ and $\mathbf{Z}_y^s$, both of size $480 \times S$, where $S$ is the number of significant components.
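The permutation logic can be sketched as below. Note a simplification: the paper re-runs the full PLS + CCA fit for every permutation, whereas this illustration only permutes one of two fixed temporal patterns and recomputes their correlation.

```python
import numpy as np

def permutation_pvalue(tx, ty, n_perm=200, seed=0):
    # Permutation p-value for the correlation between two temporal
    # patterns: permute the temporal order of one side, recompute |r|,
    # and count permutations at least as extreme as the observed value.
    rng = np.random.default_rng(seed)
    r_obs = abs(np.corrcoef(tx, ty)[0, 1])
    null = np.array([abs(np.corrcoef(rng.permutation(tx), ty)[0, 1])
                     for _ in range(n_perm)])
    return float(np.mean(null >= r_obs))

t = np.linspace(0, 8 * np.pi, 480)
rng = np.random.default_rng(6)
tx = np.sin(t) + 0.3 * rng.standard_normal(480)   # toy correlated patterns
ty = np.sin(t) + 0.3 * rng.standard_normal(480)
p = permutation_pvalue(tx, ty)
```

Permuting the temporal order destroys the shared oscillation, so the null correlations cluster near zero and a genuinely coupled pair yields a small $p$ value.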

2.3.4. Spatial Pattern Extraction

Our goal is to investigate the differences in spatial patterns of EEG channels between the normal and PD patient groups when the subjects perform a motor task. After the identification of significant temporal patterns, we can regress the EEG-related components back onto the EEG features $\mathbf{X}_i$ (for $i = 1, \ldots, 17$) for each subject as follows:
$$\mathbf{a}_i^{(k)} = \left(\mathbf{z}_x^{(k)T}\mathbf{z}_x^{(k)}\right)^{-1}\mathbf{X}_i^T\mathbf{z}_x^{(k)},$$
where $\mathbf{z}_x^{(k)}$ is the $k$th column of $\mathbf{Z}_x^s$ and $\mathbf{a}_i^{(k)}$ is the spatial pattern of the $k$th component for subject $i$. In addition, we want to determine which EEG features in the spatial patterns make significant contributions to the corresponding temporal patterns. This is done by identifying EEG features whose weights are statistically different from zero. To determine the group-level spatial pattern, for each significant component, we apply a two-tailed $t$-test to each element of the spatial patterns of all subjects within each group.
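This back-projection and group test can be sketched as follows; the least-squares regression form, the group size of 8, and the synthetic "true" spatial pattern are our assumptions for illustration:

```python
import numpy as np

def spatial_pattern(Xi, z):
    # Least-squares regression of a subject's EEG feature matrix Xi
    # (time x features) onto a significant temporal component z.
    return Xi.T @ z / (z @ z)

rng = np.random.default_rng(7)
z = np.sin(np.linspace(0, 8 * np.pi, 480))     # a significant temporal LV
true_pattern = np.zeros(171); true_pattern[:5] = 1.0
group = [np.outer(z, true_pattern) + 0.5 * rng.standard_normal((480, 171))
         for _ in range(8)]                    # 8 subjects in one group
A = np.vstack([spatial_pattern(Xi, z) for Xi in group])  # subjects x features
# one-sample t statistic per feature (two-tailed test of weights vs. zero)
tstat = A.mean(axis=0) / (A.std(axis=0, ddof=1) / np.sqrt(A.shape[0]))
```

Features that genuinely carry the temporal component produce large $|t|$ values across subjects, while the remaining features stay near zero, which is how connections are selected for display in Figure 6.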

3. Results and Discussion

3.1. The Synthetic Data Case

The extracted components using PLS, CCA, and the proposed PLS + CCA methods are shown in Figures 3, 4, and 5, respectively. The LVs extracted by PLS are automatically ordered in terms of their significance. To some extent, the LVs successfully reflect the corresponding relationships of the underlying sources between and . However, compared with the original sources, the extracted LVs are distorted, suggesting that a higher covariance may merely result from the larger variance of LVs, which may not necessarily imply strong correlations. We can see that CCA can recover the original sources accurately in both data spaces, and the LVs are ordered strictly according to their correlation coefficients, but it completely ignores the influence of the variance and thus the extracted LVs may only reflect trivial information of the data sets (e.g., the 1st LV). For instance, although the first pair of LVs has the highest correlation coefficient, they do not contain major information of the data spaces. In practice, such LVs generally represent the noises with similar patterns simultaneously coupled into the two data modalities. When jointly modeling the data sets, they should be removed. We also note that PLS only extracts three LVs since they are sufficient to describe the data sets. These LVs do not include the first pair recovered by CCA due to their triviality. The above observations motivate us to employ the proposed PLS + CCA method.

When the proposed method is employed, the dominant sources which make significant contributions to both data spaces are first identified and ordered in terms of covariance. At the same time, trivial information is removed. Then, within the extracted major information, sources that are highly correlated are accurately recovered with the focus on correlation. In this case, it is ensured that the extracted LVs are maximally correlated across two data sets and meanwhile can well explain the information within each individual data set.

3.2. The Real Data Case

In this case study, we applied the proposed method for corticomuscular activity modeling to the EEG and EMG features generated using the procedure described in Section 2.3.2 from 8 normal and 9 PD subjects simultaneously. The joint modeling of normal and PD data allows the identification of common temporal patterns across the groups. Meanwhile, the spatial patterns may be different across subjects, from which we could identify specific correlation connections that are differently recruited by PD subjects during the motor task.

Using the permutation test, two components were deemed significant (Figure 6). Note that in the figure only connections whose weights are statistically different from zero are shown. The results based on real data from PD and normal subjects performing a dynamic motor task are promising. In the past, most EEG/EMG coupling studies have compared EEG activity at a specific locus (e.g., sensorimotor cortex contralateral to the hand performing the task) with the EMG during sustained contractions. However, we found that in normal subjects, correlations between the contralateral sensorimotor cortex and other regions are closely associated with ongoing EMG features during dynamic motor tasks (Figure 6). It is likely that the dynamic nature of the task might require the recruitment of additional regions such as frontal regions for motor selection [23], contralateral (i.e., ipsilateral to hand movement) sensorimotor cortex for fine modulatory control [24], and occipital regions for post-error adaptations [25].

From Figure 6, we note similar connections between the PD and control groups, especially when comparing connections associated with each component, but we also note significant differences between the two groups. PD subjects show increased connectivity between the frontal regions and the central and sensorimotor cortices, compared with control subjects. This may reflect the enhanced effort required by PD subjects for motor task switching [26], a problem common in this PD population [27]. In addition, PD subjects have a significant connection between the left sensorimotor and occipital regions that is not present in the control group. We note that connections with occipital regions are prominent in PD subjects. Compared to normal subjects, PD subjects rely heavily on visual cues for the initiation [28] and ongoing control of movement [29]. Moreover, the increased intra- and interhemispheric connections observed in the PD subjects are consistent with the findings of previous MEG studies [30].

4. Conclusions

In this paper, we combine the advantages of PLS and CCA and propose a PLS + CCA method to improve the performance of the joint LV extraction. We illustrate the performance of the proposed approach using both synthetic data and real-life data. For corticomuscular activity modeling, we note highly correlated temporal patterns between EEG and EMG signals and meaningful spatial activation patterns. The proposed method is a promising analysis technique for multisubject and bimodal data sets, including, but not limited to, fMRI, PPG, ECG, and kinematic data.

Appendix

A. The Derivation of the Algorithm

In this appendix, we show how to mathematically derive the solution of the proposed PLS + CCA method.

A.1. The First Step: PLS

The cost function of PLS is as follows (the same as (1) in Section 2.1.1):
$$\max_{\mathbf{w}_x,\,\mathbf{w}_y} \mathbf{w}_x^T\mathbf{X}^T\mathbf{Y}\mathbf{w}_y \quad \text{s.t. } \mathbf{w}_x^T\mathbf{w}_x = \mathbf{w}_y^T\mathbf{w}_y = 1, \tag{A.1}$$
where $\mathbf{w}_x$ and $\mathbf{w}_y$ are the weight vectors.

By employing the method of Lagrange multipliers, we rewrite the initial cost function as
$$L = \mathbf{w}_x^T\mathbf{X}^T\mathbf{Y}\mathbf{w}_y - \frac{\lambda_x}{2}\left(\mathbf{w}_x^T\mathbf{w}_x - 1\right) - \frac{\lambda_y}{2}\left(\mathbf{w}_y^T\mathbf{w}_y - 1\right), \tag{A.2}$$
where $\lambda_x$ and $\lambda_y$ are Lagrange multipliers.

Now we only present the detailed derivations regarding $\mathbf{w}_x$ since $\mathbf{w}_y$ can be similarly derived. Taking the derivatives of $L$ with respect to $\mathbf{w}_x$ and $\mathbf{w}_y$ and setting them to zero, we have
$$\mathbf{X}^T\mathbf{Y}\mathbf{w}_y = \lambda_x\mathbf{w}_x, \tag{A.3}$$
$$\mathbf{Y}^T\mathbf{X}\mathbf{w}_x = \lambda_y\mathbf{w}_y. \tag{A.4}$$
Left multiplying both sides of (A.3) by $\mathbf{w}_x^T$ and using the unit-norm constraint, we have
$$\lambda_x = \mathbf{w}_x^T\mathbf{X}^T\mathbf{Y}\mathbf{w}_y. \tag{A.5}$$
According to (A.4), $\lambda_y$ can be calculated as
$$\lambda_y = \mathbf{w}_y^T\mathbf{Y}^T\mathbf{X}\mathbf{w}_x = \lambda_x. \tag{A.6}$$
Through the similar procedure, the common multiplier and $\mathbf{w}_y$ can be easily derived as
$$\lambda \triangleq \lambda_x = \lambda_y, \tag{A.7}$$
$$\mathbf{w}_y = \frac{1}{\lambda}\mathbf{Y}^T\mathbf{X}\mathbf{w}_x. \tag{A.8}$$
Substituting (A.8) into (A.3) and (A.7), respectively, we have the following two expressions:
$$\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}\mathbf{w}_x = \lambda\lambda_x\mathbf{w}_x, \tag{A.9}$$
$$\lambda_x = \lambda. \tag{A.10}$$
By substituting (A.10) into (A.9), we can formulate an eigenvalue-eigenvector decomposition problem:
$$\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}\mathbf{w}_x = \lambda^2\mathbf{w}_x. \tag{A.11}$$
Similarly, we have the formulation for $\mathbf{w}_y$ as
$$\mathbf{Y}^T\mathbf{X}\mathbf{X}^T\mathbf{Y}\mathbf{w}_y = \lambda^2\mathbf{w}_y. \tag{A.12}$$

The above solutions are straightforward. A practical issue is to determine the number of LVs. In our study, we determine the number by setting a threshold on the ratio of explained variance (e.g., 95%). The corresponding LVs in $\mathbf{X}$ and $\mathbf{Y}$ can then be calculated by
$$\mathbf{T}_x = \mathbf{X}\mathbf{W}_x, \quad \mathbf{T}_y = \mathbf{Y}\mathbf{W}_y, \tag{A.13}$$
where $\mathbf{W}_x$ is composed of the first eigenvectors associated with (A.11) and the columns of $\mathbf{T}_x$ represent the components extracted from $\mathbf{X}$. $\mathbf{W}_y$ and $\mathbf{T}_y$ are similarly defined.

However, the collinearity problem may exist in the LVs calculated through the above procedure since each data set is used repetitively for each LV's calculation, and the extracted LVs are not necessarily uncorrelated with each other. To effectively implement the second step and avoid the ill-conditioned problem, we need to address this concern, and thus we design a deflation procedure: before extracting the second common LV in each data space, $\mathbf{X}$ and $\mathbf{Y}$ are deflated by their corresponding first LVs as follows:
$$\mathbf{X} \leftarrow \mathbf{X} - \mathbf{t}_x\left(\mathbf{t}_x^T\mathbf{t}_x\right)^{-1}\mathbf{t}_x^T\mathbf{X}, \quad \mathbf{Y} \leftarrow \mathbf{Y} - \mathbf{t}_y\left(\mathbf{t}_y^T\mathbf{t}_y\right)^{-1}\mathbf{t}_y^T\mathbf{Y}. \tag{A.14}$$
Then the above procedure is repeated for the further extraction of common LVs. In this way, each new LV is uncorrelated with the previous ones.

The purpose of this step is to extract LVs which can most explain the individual data sets and meanwhile are well correlated to the LVs in another data set. With this step, trivial and irrelevant information across data sets could be removed. However, a higher covariance may merely result from the larger variance of LVs, which may not necessarily imply strong correlations. To address this concern, the 2nd step will help further refine the results.

A.2. The Second Step: CCA

Based on the extracted LVs in the first step, the objective function of CCA can be constructed as follows:
$$\max_{\mathbf{a}_x,\,\mathbf{a}_y} \mathbf{a}_x^T\mathbf{T}_x^T\mathbf{T}_y\mathbf{a}_y \quad \text{s.t. } \mathbf{a}_x^T\mathbf{T}_x^T\mathbf{T}_x\mathbf{a}_x = \mathbf{a}_y^T\mathbf{T}_y^T\mathbf{T}_y\mathbf{a}_y = 1, \tag{A.15}$$
where $\mathbf{a}_x$ and $\mathbf{a}_y$ are the weight vectors.

By employing the method of Lagrange multipliers, we rewrite the initial objective function as
$$L = \mathbf{a}_x^T\mathbf{T}_x^T\mathbf{T}_y\mathbf{a}_y - \frac{\rho_x}{2}\left(\mathbf{a}_x^T\mathbf{T}_x^T\mathbf{T}_x\mathbf{a}_x - 1\right) - \frac{\rho_y}{2}\left(\mathbf{a}_y^T\mathbf{T}_y^T\mathbf{T}_y\mathbf{a}_y - 1\right), \tag{A.16}$$
where $\rho_x$ and $\rho_y$ are Lagrange multipliers. Similar to the derivation in the first step, we can obtain the following eigenvalue-eigenvector decomposition problem:
$$\left(\mathbf{T}_x^T\mathbf{T}_x\right)^{-1}\mathbf{T}_x^T\mathbf{T}_y\left(\mathbf{T}_y^T\mathbf{T}_y\right)^{-1}\mathbf{T}_y^T\mathbf{T}_x\mathbf{a}_x = \rho^2\mathbf{a}_x. \tag{A.17}$$
Similarly, for $\mathbf{a}_y$, we have
$$\left(\mathbf{T}_y^T\mathbf{T}_y\right)^{-1}\mathbf{T}_y^T\mathbf{T}_x\left(\mathbf{T}_x^T\mathbf{T}_x\right)^{-1}\mathbf{T}_x^T\mathbf{T}_y\mathbf{a}_y = \rho^2\mathbf{a}_y. \tag{A.18}$$
The solutions to this problem are the largest eigenvectors of the corresponding matrices. The recovered LVs $\mathbf{Z}_x$ and $\mathbf{Z}_y$ can be calculated directly from the matrices $\mathbf{T}_x$ and $\mathbf{T}_y$ by
$$\mathbf{Z}_x = \mathbf{T}_x\mathbf{A}_x, \quad \mathbf{Z}_y = \mathbf{T}_y\mathbf{A}_y, \tag{A.19}$$
where $\mathbf{A}_x$ is composed of the eigenvectors associated with (A.17) and the columns of $\mathbf{Z}_x$ represent the components extracted from $\mathbf{T}_x$. $\mathbf{A}_y$ and $\mathbf{Z}_y$ are similarly defined.

After these two steps, it is ensured that the extracted components $\mathbf{Z}_x$ and $\mathbf{Z}_y$ are maximally correlated across data sets and meanwhile can well explain the information within each individual data set.