Abstract

Electrocardiogram (ECG) data classification is an active research area owing to its applications in medical information processing. However, insufficient data, privacy preservation, and local deployment remain challenging. To address these problems, a novel personalized federated learning method for ECG classification is proposed in this paper. First, a global model is trained within a federated learning framework over multiple local data clients. Then, the global model and private data are used to train the local model. To reduce the feature inconsistency between global and private local data and to better fit the private local data, a novel "feature alignment" module is devised to guarantee uniformity; it contains two parts, global alignment and local alignment. For global alignment, a graph metric over batch data is used to constrain the dissimilarity between features generated by the global model and the local model. For local alignment, a triplet loss is adopted to increase the discriminative ability on local private data. Comprehensive experiments are conducted on our collected dataset. The results show that the proposed method adapts better to local data and exhibits superior generalization ability.

1. Introduction

Statistics from the WHO show that heart disease is the most lethal chronic disease. Nearly 17.7 million people die of cardiovascular disease every year, which accounts for 31% of all deaths in the world [1]. The electrocardiogram (ECG) is a physiological signal widely used in heart health monitoring. It contains much pathological information related to heart activity and is an effective means of monitoring and diagnosing cardiovascular disease [2]. ECG recordings are long data series that can last for days, so monitoring and diagnosis by human experts is very time- and labor-consuming. Therefore, it is necessary to use artificial intelligence technology for automatic cardiovascular disease diagnosis.

Using advanced machine learning technology, especially deep learning, a cardiovascular recognition model can be trained with labeled ECG data; e.g., the method proposed in [3] achieves an F1 score of 0.837 for 12 kinds of cardiac irregularities. However, most state-of-the-art methods are based on publicly available training datasets, which are relatively small and of limited variety. Moreover, they are very difficult to deploy in practical applications.

In order to expand the available cardiovascular information while guaranteeing privacy, data from multiple medical institutions can be combined into a unified dataset, which can be used to train a superior global model with a federated learning framework [4]. Federated learning is a special machine learning paradigm that uses datasets distributed across multiple devices while preventing data leakage. It is also a privacy-preserving decentralized collaborative learning technique [5]. There have been works that adopt federated learning for medical data processing and model training [6, 7].

In this way, a centralized global model can be trained on data from a large number of local nodes. Then, the global model is deployed to the local nodes for data prediction. However, a major challenge is that different local models obtain different performance from the same global model, because the data distributions of local clients differ from the distribution of the globally trained data. Therefore, personalizing the global model for each client becomes necessary to overcome the problems posed by heterogeneity of data distributions [8].

Devices differing in storage, computation, and communication capabilities generate heterogeneous data. For ECG, the sampling frequency and duration differ according to parameter settings and environments, which leads to nonuniformity of pacing signals, left ventricular high voltage, and other diagnostic patterns. When there is a large difference between the data distributions of the global server and the local clients, it is hard to directly measure the feature inconsistency between them. This makes it difficult to deploy the global ECG classification model to local clients with acceptable performance.

In order to solve these problems, a novel personalized federated learning framework for ECG data classification is proposed. First, a global ECG classification model is trained with a typical federated learning method across multiple local clients. Then, the global model is inherited by the local model and serves as the backbone part. During personalized training of the local model, the inherited part is fixed. The local model is personalized based on the proposed feature alignment module. To utilize the generalization of the global model, we form the features of batch data into a graph representation so that the internal structure between nodes can be preserved. The graph distance between global features and local features is used as the global alignment constraint. From another point of view, local alignment is used to make the model better adapt to local private data. A metric learning method is adopted, and a triplet loss is designed to pull data points of the same class close to each other and push negative data points far away. Finally, these loss functions are combined for model training. With the proposed method, the consistency between global and local data can be learned, and the local personalized model is built with better adaptability and generalization. As far as we know, there are no related research studies on personalized ECG classification. The main contributions of this paper are twofold:

(i) In order to reduce the difference between global and local data, a novel feature alignment module is designed. With a graph constraint and a triplet metric, it endows the local model with better adaptability and generalization.

(ii) Extensive experimental evaluations are carried out, and performance analyses are reported from multiple aspects.

The rest of this paper is organized as follows. Section 2 gives the related works. Section 3 describes the methodology. Experimental evaluation and analysis are given in Section 4. Section 5 concludes this paper.

2. Related Works

Related works are introduced in this section, including ECG data classification methods, federated learning frameworks, and personalization in federated learning.

2.1. ECG Classification Method

With well-designed feature representations, ECG classification can be realized by models based on Bayes, K-means, decision tree, and linear discriminant classifiers [9–11], along with commonly used optimization techniques [12, 13]. Features such as the cycle and higher-order characteristics of the QRS wave were extracted in reference [14], and a fuzzy neural network was then trained as the classifier. Wavelet transform was used for feature extraction in reference [15]. Reference [16] adopted a support vector machine for ECG classification.

A convolutional neural network suitable for multilead ECG data was proposed in reference [17]. A stacked denoising autoencoder recognition model was used for ECG data in reference [18]. In reference [19], deep belief networks were used to construct a model for arrhythmia diagnosis. A convolutional neural network model was used for heartbeat classification in reference [20]. In reference [21], a deep factor decomposition method was used to decrease the influence of complex noise on the ECG signal: a deep autoencoder was first used for signal denoising and reconstruction, and a fully convolutional network was then trained as the classifier. In reference [22], nonnegative matrix factorization was used for dimension reduction, and features were extracted with sparse representation. Feature representation at multiple scales was proposed in reference [23], and progressive decisions were fused for the final classification.

2.2. Federated Learning

In order to utilize massive distributed data storage while keeping privacy, federated learning is a useful framework that enables efficient training over data islands and model collaboration [4]. In reference [24], local models are trained at each local node, and only the updated parameter set is shared for global training. A multitask-based federated learning method was proposed in reference [25], which addresses the problems of high communication cost and fault tolerance. In reference [26], a secure client-server architecture was first constructed, and data were allocated according to different local users. A homomorphic encryption method was designed for model parameter aggregation so as to improve server security [27]. A differential privacy method for federated learning was introduced in reference [28]; it protects client data by hiding each customer's contribution during training. A ternary quantization method was proposed in reference [29], which optimizes the quantized networks and reduces redundant parameters and excessive communication costs. To address the problem of unlabeled and unannotated on-device data, reference [30] used a deep temporal neural network to train an auxiliary task by optimizing a contrastive objective with a multiview strategy on diverse datasets.

2.3. Personalization in Federated Learning

Federated learning can be used to train a global model by utilizing distributed local data. However, the benefits that different local clients obtain from the global model may vary greatly because of their differing data distributions. To cope with non-IID client data, personalized federated learning has been proposed to improve the performance of each local client by training a personalized local model. Reference [31] reported that local models' performance was hard to improve during federated learning and may even be worse than training only with local data. Therefore, it is important to personalize the model for each specific local node. In reference [32], local nodes were first clustered, and the models of each group were trained separately. In reference [33], part of the parameters of the global model were copied to the local model, which was then fine-tuned using local data. A metalearning method was adopted in reference [34], which treated model personalization as a meta-testing procedure. In reference [35], a balance was struck between the global and local models, which were combined for the final classification. In reference [36], the local model and global model were considered as two experts, and the personalized model was trained from the mixed output of the personalized model and the global model. Similar work was studied in reference [37]. Reference [38] introduced an attentive message passing mechanism to facilitate collaboration effectiveness between clients.

3. Methodology

In this section, the proposed personalized federated learning method for ECG classification based on feature alignment is described. Figure 1 gives the main framework of the proposed method. First, a global model is constructed with a typical federated learning framework over multiple local clients. Then, for each client, a personalized local model is trained with its private dataset. The local model contains three parts: a backbone inherited from the global model, a convolutional neural network module for local feature extraction and representation, and a fully connected or softmax layer that forms the final classifier. During local model training, the inherited backbone is fixed to maintain the generalization ability of the global model. In particular, two alignment modules, global alignment and local alignment, are designed to constrain the feature distributions of the local and global models; they are realized by a constraint on the graph representations of the features generated by the global and local branches, together with metric learning for intraclass and interclass losses. Finally, the global alignment loss, the local alignment loss, and the cross-entropy loss are all incorporated into the objective function. Details are described in the following subsections.

3.1. Global Model Training

In order to make the best use of distributed data while guaranteeing privacy, federated learning is a popular approach for model training. In the first step, the global model is trained with the most widely used federated learning framework, FedAvg [39], which performs a synchronous update in each communication round.

Figure 2 demonstrates the framework for global model training in our work. There are multiple clients, and each contains a local dataset and a local model. Initially, the global server sends the model parameters to all clients. Then, each client performs local model training based on its local data and sends a parameter update back to the server. The server collects all local updates and optimizes the global model parameters, and this process is repeated for multiple rounds. For efficiency, the global model can also be updated when only part of the local updates has been collected.

For each client, its model parameters are trained with its local data. Equation (1) gives the loss function, which is the mean of the per-sample losses over all training data in the client's dataset. Equation (2) then gives the minimization objective, which is solved by adjusting the model parameters with SGD and backpropagation.
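
For concreteness, a plausible form of equations (1) and (2), following the standard FedAvg local objective, is sketched below; the symbols $w$, $D_k$, $n_k$, and $\ell$ are introduced here for illustration:

```latex
% Sketch of equations (1)-(2) in standard FedAvg notation (symbols are ours):
% F_k(w) is the local empirical loss of client k over its private dataset D_k.
\begin{align}
F_k(w) &= \frac{1}{n_k} \sum_{(x_i, y_i) \in D_k} \ell\bigl(w;\, x_i, y_i\bigr), \tag{1}\\
&\min_{w}\; F_k(w), \tag{2}
\end{align}
% where n_k = |D_k| is the number of local training samples and \ell is the
% per-sample loss, minimized with SGD and backpropagation.
```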

When the clients finish their training, the server updates the global model parameters by averaging all client models' parameters. As shown in equations (3) and (4), the aggregation is taken over the participating clients, with a weight assigned to each client model.
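
The corresponding aggregation step, equations (3) and (4), plausibly follows the usual FedAvg weighted average sketched below, with $K$ clients and per-client weights $p_k$ (notation ours):

```latex
% Sketch of equations (3)-(4): weighted averaging of client parameters (notation is ours).
\begin{align}
p_k &= \frac{n_k}{\sum_{j=1}^{K} n_j}, \tag{3}\\
w^{t+1} &= \sum_{k=1}^{K} p_k \, w_k^{t+1}, \tag{4}
\end{align}
% where K is the number of clients and p_k is the weight assigned to client k's model.
```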

The aggregated global model is distributed to all clients after each iteration, and each client uses it as the base model for further training with equations (1) and (2). After multiple iterations, the global model is trained to optimal performance. Moreover, other federated learning frameworks can also be adopted for global model training.
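
To summarize the communication round described above, a minimal FedAvg-style sketch in PyTorch is given below; the function names and the client representation are our own, and `local_update` stands for the client-side SGD training of equations (1) and (2):

```python
import copy
from typing import Dict, List

import torch


def fedavg_aggregate(client_states: List[Dict[str, torch.Tensor]],
                     client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Weighted average of client model parameters (cf. equations (3)-(4))."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            (n / total) * state[key].float()
            for state, n in zip(client_states, client_sizes)
        )
    return global_state


def run_round(global_model: torch.nn.Module, clients, local_update) -> None:
    """One synchronous FedAvg communication round.

    `clients` is a list of (dataset, num_samples) pairs, and `local_update`
    performs a few epochs of SGD on a copy of the global model (eqs. (1)-(2)).
    """
    states, sizes = [], []
    for dataset, n_k in clients:
        local_model = copy.deepcopy(global_model)    # server sends parameters
        local_update(local_model, dataset)           # client-side SGD training
        states.append(local_model.state_dict())      # client sends update back
        sizes.append(n_k)
    global_model.load_state_dict(fedavg_aggregate(states, sizes))
```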

3.2. Local Model Training

The global model is trained with federated learning over distributed data as described in the last subsection. However, it cannot be directly deployed to local clients for inference when there is a large difference between the data distributions of the global server and the local clients. In this subsection, a personalized model adaptation method is designed based on the globally trained model and the private data of a specific client. The local model contains three main components: a backbone inherited from the globally trained model, which is fixed during local model training; a module for local feature representation; and the final classifier of the local model.

To better fit the local private data, a special constraint strategy, "feature alignment," is devised to guarantee uniformity between the global and local models. The alignment module is further divided into two parts, global alignment and local alignment, which are described as follows.
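
The structure just described can be sketched in PyTorch as follows; the layer sizes, the 1-D convolutional local module, and the assumption that the local module operates on the backbone output are illustrative choices, not the exact architecture of the paper:

```python
import torch
import torch.nn as nn


class PersonalizedLocalModel(nn.Module):
    """Sketch: frozen global backbone, a trainable local feature module,
    and a fully connected classifier (all sizes are illustrative)."""

    def __init__(self, global_backbone: nn.Module, feat_dim: int = 512,
                 local_dim: int = 256, num_classes: int = 6):
        super().__init__()
        self.backbone = global_backbone              # inherited from the global model
        for p in self.backbone.parameters():         # fixed during personalization
            p.requires_grad = False

        # Assumption: the local module refines the backbone features with 1-D convolutions.
        self.local_feat = nn.Sequential(
            nn.Conv1d(feat_dim, local_dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(local_dim, num_classes)

    def forward(self, x: torch.Tensor):
        g = self.backbone(x)                              # global feature (for global alignment)
        l = self.local_feat(g.unsqueeze(-1)).squeeze(-1)  # local feature (for local alignment)
        logits = self.classifier(l)                       # classification output
        return g, l, logits
```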

3.2.1. Global Alignment

A global alignment module is first designed to constrain local model training by constructing the structure between feature nodes. Different from other methods, the batch training data are used to form a graph structure, which represents the relationships between data nodes. In this way, the effect of feature shift in a single sample is reduced, and the relations between data nodes are retained. As shown in Figure 3, for a training batch of samples in the private dataset, global features and local features are extracted through the inherited global backbone and the local feature module, respectively. Then, the batch of training data is treated as the basic group for the global alignment operation.

The features of the samples are then used to construct graph representations from the global and local branches. For the samples in a training batch, each graph representation contains one node per sample and edges connecting the samples. The nodes of the graph are represented by the features of the samples in the training batch, and the edges are represented by the distances between nodes. Two graphs are used to denote the representations of the batch data produced by the global and local models, respectively.

In order to measure the similarity between the two batch representations, a matrix format is used to represent the graph structure. In this paper, only the edges between nodes are incorporated in the representation. Our hypothesis is that the internal skeleton between the nodes of a graph is more important and should be learned from the global model; if the node features themselves were used, the local model would very likely become too similar to the global model. The white and yellow matrices in Figure 3 denote the two edge matrices, which are formulated in equations (5) and (6): each entry represents the edge between two training samples in the batch.
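
Under this edge-only representation, equations (5) and (6) plausibly take the following form, where $f^{g}_{i}$ and $f^{l}_{i}$ denote the global and local features of the $i$-th sample in the batch (notation ours):

```latex
% Sketch of equations (5)-(6): edge matrices of the global and local feature graphs.
\begin{align}
G^{g}_{ij} &= d\bigl(f^{g}_{i},\, f^{g}_{j}\bigr), \tag{5}\\
G^{l}_{ij} &= d\bigl(f^{l}_{i},\, f^{l}_{j}\bigr), \tag{6}
\end{align}
% where d(\cdot,\cdot) is a pairwise distance (e.g., Euclidean) between the features
% of samples i and j in the same training batch.
```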

Equation (7) gives the distance metric between the two graph representations of a given batch, where a distance computation method is applied to corresponding edges and the result is normalized by the batch size. Essentially, all edges of the two feature graph representations are compared.
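
A minimal PyTorch sketch of this global alignment computation is given below; the Euclidean pairwise distance between features and the mean absolute difference between edge matrices are assumptions for the unspecified distance functions:

```python
import torch


def graph_alignment_loss(global_feats: torch.Tensor,
                         local_feats: torch.Tensor) -> torch.Tensor:
    """Compare the edge matrices of the global and local feature graphs
    built from one training batch (shapes: [batch, feat_dim])."""
    # Edge matrices: pairwise Euclidean distances between samples (cf. eqs. (5)-(6)).
    edges_global = torch.cdist(global_feats, global_feats, p=2)
    edges_local = torch.cdist(local_feats, local_feats, p=2)
    # Distance between the two graphs, normalized by the batch size (cf. eq. (7)).
    batch_size = global_feats.size(0)
    return (edges_global - edges_local).abs().sum() / (batch_size * batch_size)
```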

3.2.2. Local Alignment

For model personalization with local private data, a local alignment module is also designed, which aims to increase the classification performance for local private data.

As shown in Figure 4, for a training batch from the private dataset, local features are extracted through the local feature module. Then, the batch of training data is used as the basic group for the local alignment operation. Using a metric learning method, we try to decrease the distance between features from the same class and increase it otherwise. A training sample is randomly selected from the batch and is called the anchor sample. Then, another two samples are selected: one with the same label as the anchor (the positive sample) and one with a different label (the negative sample). These three samples constitute a triplet. Through training, the distance between the anchor and the positive sample is decreased, and the distance between the anchor and the negative sample is increased. The constraint is shown in equation (8), where a margin threshold specifies the minimal required distance gap and the triplets are drawn from the triplet set.
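
Equation (8) presumably takes the standard triplet form sketched below, where $f_{a}$, $f_{p}$, and $f_{n}$ denote the local features of the anchor, positive, and negative samples, $\alpha$ is the margin, and $T$ is the triplet set (the symbols are ours):

```latex
% Sketch of equation (8): standard triplet constraint over the triplet set T.
\begin{equation}
d\bigl(f_{a},\, f_{p}\bigr) + \alpha \;<\; d\bigl(f_{a},\, f_{n}\bigr),
\qquad \forall\, (a, p, n) \in T, \tag{8}
\end{equation}
% where d(\cdot,\cdot) is a feature distance and \alpha is the minimal required
% margin between positive and negative pairs.
```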

3.3. Training Objective Function

The final loss function of our proposed model contains three parts: the global alignment loss, the local alignment loss, and the cross-entropy loss.

The global alignment loss is given in equation (9), which is derived from equation (7) by averaging the graph distance over all training batches, indexed over the total number of batches in the training dataset.

The local alignment loss is given in equation (10), which is based on equation (8). The subscript "+" means that the value is 0 when the content inside the brackets is smaller than 0; otherwise, it is the normal loss value. The loss is computed over the triplet set of the batch data.

The cross-entropy loss is given in equation (11), using the standard cross-entropy function over the same batch data as in equation (8), with the ground-truth labels of the batch and the outputs of the local model.

The final loss function is a weighted combination of the global alignment loss, the local alignment loss, and the cross-entropy loss, as shown in equation (12), with three weighting hyperparameters. It is used to update the parameters of the local feature module and the classifier in the local personalized model.
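
A sketch of how this combined objective might be implemented is given below; the built-in PyTorch triplet and cross-entropy losses, the Euclidean edge distances, and the assignment of the 0.3/0.3/0.4 weights (reported later in Section 4.3) to the individual terms are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F


def personalization_loss(global_feats, local_feats, logits, labels,
                         anchors, positives, negatives,
                         w_global=0.3, w_local=0.3, w_ce=0.4, margin=1.0):
    """Weighted sum of global alignment, local alignment (triplet), and
    cross-entropy losses for one training batch (cf. equation (12))."""
    # Global alignment: difference of batch feature-graph edge matrices (cf. eqs. (7), (9)).
    b = global_feats.size(0)
    l_global = (torch.cdist(global_feats, global_feats)
                - torch.cdist(local_feats, local_feats)).abs().sum() / (b * b)
    # Local alignment: hinge-style triplet loss over sampled triplets (cf. eq. (10)).
    l_local = F.triplet_margin_loss(anchors, positives, negatives, margin=margin)
    # Standard cross-entropy classification loss (cf. eq. (11)).
    l_ce = F.cross_entropy(logits, labels)
    return w_global * l_global + w_local * l_local + w_ce * l_ce
```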

4. Experimental Evaluation

In this section, the dataset description and experimental settings are first given. Then, the performance of the proposed method for personalized ECG classification is evaluated under various settings.

4.1. Dataset and Experiment Setting

As there are no prior studies on personalized federated learning for ECG classification, we construct a specific setting to evaluate the proposed algorithm. We collected about 120,000 ECG records from 8 hospitals. Table 1 gives a detailed description of the dataset. Six categories are selected: sinus rhythm, sinus arrhythmia, sinus tachycardia, sinus bradycardia, T-wave alternans, and normal; these are the most common types and have abundant data. There are 20,201, 18,581, 13,682, 15,854, 14,211, and 38,524 records for sinus rhythm, sinus arrhythmia, sinus tachycardia, sinus bradycardia, T-wave alternans, and normal, respectively. The data distribution of each medical institution is also listed in Table 2.

In our research, each medical institution corresponds to a local node, and these local nodes provide private data for global training. Hence, the federated learning environment is set up.

4.2. Base Model Training

The server (global model) employs FedAvg to train the model globally, whereas each local client updates its model locally between successive global aggregations using an SGD-style algorithm. A ResNet-34 CNN is used as the base network structure for both the global model and the local models.

Each experiment is run for 100 global aggregations, with e = 4 local epochs of SGD between successive global aggregations. A constant learning rate of 0.01 is used across global aggregations and clients.
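
The federated training setup just described can be summarized as a small configuration sketch (the names are ours; the values follow the text, and the ResNet-34 backbone is our reading of the base network):

```python
# Hypothetical configuration mirroring the reported federated training setup.
FED_TRAINING_CONFIG = {
    "aggregation": "FedAvg",       # synchronous update per communication round
    "backbone": "ResNet-34",       # base CNN for global and local models (assumed)
    "global_rounds": 100,          # number of global aggregations
    "local_epochs": 4,             # SGD epochs between successive aggregations
    "learning_rate": 0.01,         # constant across rounds and clients
    "num_clients": 8,              # one client per medical institution
}
```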

Table 3 gives the classification result of each local node after federated learning. The average classification rate is used as the metric in our work, and results are reported for both the global model and the local nodes. The global model achieves a classification rate of 82.6%. For the local nodes, there are two evaluations. Local nodes 1 to 8 obtain classification rates of 89.48%, 88.76%, 90.25%, 91.54%, 88.31%, 89.57%, 90.18%, and 90.54%, respectively, on their corresponding local private test sets. Meanwhile, local nodes 1 to 8 obtain classification rates of 54.65%, 57.18%, 49.75%, 51.43%, 48.92%, 46.35%, 52.45%, and 50.20%, respectively, on the global test set. It can be clearly seen that the performance of the local models on private data is better than that of the global model, while the local models perform poorly on global data, about 30% lower than the global model. This indicates that the local models are better suited to local data, while the global model trained with the traditional federated learning framework needs further improvement. Therefore, model personalization is urgently required.

4.3. Model Personalization Evaluation

In this subsection, the proposed feature-alignment-based personalization model is evaluated.

The global model obtained in the above subsection is first downloaded to each local node, and then the personalized model for each node is trained on the basis of its local dataset and the downloaded global model. Following the objective function in Section 3.3, the three loss weights are set to 0.3, 0.3, and 0.4, respectively. The batch size is set to 16, and the learning rate is 0.001.

Table 4 gives the results of model personalization. Column 1 is the node index. Columns 2 and 3 are the average performance on local data of the model trained with only local data and the model trained with personalization, respectively. Columns 4 and 5 are the corresponding average performance on global data. It can be seen that on local data the performance of the personalized model decreases by about 3%, which indicates that the personalized model deviates less toward the distribution of the local data. On the global test data, the performance of the personalized model increases greatly, by about 15–18%. This validates that the personalized model generalizes better.

4.4. Comparisons with Other Methods

In this subsection, some related model personalization methods are compared with our proposed model. We implement the algorithms of [31, 34, 35] and evaluate the average classification rate on the local and global test data.

Table 5 demonstrates the comparison results. The methods in [33–35] obtain average performances of 84.41%, 82.70%, and 83.55% on the local node test data and 79.80%, 78.68%, and 76.45% on the global test data. These are about 4% and 6% lower than our proposed method, which validates the effectiveness of the proposed personalization framework and feature alignment module.

4.5. Effect of Global Alignment and Local Alignment

In this subsection, the effect of global alignment and local alignment is evaluated. Global alignment and local alignment are two novel operations proposed in our work, which aim to capture the generalization ability of the global model and exploit the discriminative information in the local private data. Here, their effect is evaluated by assigning them different weights.

Table 6 demonstrates the comparison results. Five settings of the three weighting parameters are adopted. It can be seen from the table that as one of the alignment weights is raised, the avg. (local) value increases while the avg. (global) value decreases, and a similar trend appears for the other alignment weight. These weights therefore trade off the performance balance between global data and local data. The setting of 0.3, 0.3, and 0.4 obtains the optimal performance.

4.6. Evaluation of Execution Time

In this subsection, the execution time of the personalized model is evaluated. Three CNN structures with various training batch sizes are tested. Table 7 shows that the most expensive model requires 1.26 s, 1.67 s, 2.31 s, and 3.74 s per training iteration for batch sizes of 8, 12, 16, and 24, respectively, while ResNet-34 requires the least execution time, about 58% of that of the most expensive model. For model inference, the three models take 0.120 s, 0.076 s, and 0.063 s on the test data, respectively. ResNet-34 is the preferred model for its excellent performance and acceptable cost.

5. Conclusions

This work proposes a novel personalized federated learning method for ECG classification. We explore feature alignment as the personalization strategy on both the global and local sides. Experiments on our collected dataset show that personalization provides the local model with high performance and better generalization. To our knowledge, this is the first evaluation of personalized federated learning for ECG data analysis.

Our future work will focus on two aspects: (1) we will conduct more in-depth research on personalization methods with specific structures, and (2) external datasets, such as webly grabbed data [40], will be used to improve model performance.

Data Availability

The dataset used in this research is private.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by National Key R&D Program of China under Grant no. 2017YFB1003000, National Natural Science Foundation of China under Grant nos. 61632008, 62072099, 61972085, 61872079, and 61972083, Jiangsu Provincial Key Laboratory of Network and Information Security under Grant no. BM2003201, and Key Laboratory of Computer Network and Information Integration of Ministry of Education of China under Grant no. 93K-9 and partially supported by Collaborative Innovation Center of Novel Software Technology and Industrialization, Collaborative Innovation Center of Wireless Communications Technology, and the Fundamental Research Funds for the Central Universities.