Abstract

The rapid growth of the Internet of Medical Things (IoMT) has led to ubiquitous home health diagnostic networks. Excessive demand from patients leads to high cost, high latency, and communication overload. Although edge computing can reduce latency to some extent, further reducing system latency remains a significant challenge. Federated learning is an emerging paradigm that has recently attracted great interest in academia and industry. The basic idea is to train a globally optimal machine learning model among all participating collaborators. However, during parameter updating, the communication cost of the system becomes very large because of the many iterations and participants. In this paper, a gradient reduction algorithm based on federated random variance is proposed to reduce the number of iterations between the participants and the server from the perspective of the system while preserving accuracy, and the corresponding convergence analysis is given. Finally, the method is verified on linear regression and logistic regression. Experimental results show that the proposed method significantly reduces the communication cost compared with general federated learning based on stochastic gradient descent.

1. Introduction

The Internet of Medical Things (IoMT) uses a variety of communication systems to connect many devices into best-in-class systems that can detect, collect, exchange, analyze, and transmit valuable information [1, 2], helping companies manage more intelligently and deliver business solutions faster. IoMT can support a large number of applications by combining various “smart” sensors with technologies such as artificial intelligence and machine learning (ML), thereby revolutionizing ubiquitous computing [3, 4]. Secure communications and sensing technologies can leverage a participatory approach to implement integrated solutions while establishing new applications relevant to the industry, particularly healthcare. One of the key applications of 5G-based IoMT is healthcare, which aims to maintain patients’ medical information in electronic environments (such as cloud and edge cloud systems) through the latest telecom paradigm [5, 6]. For healthcare applications, ML models are typically trained on sufficient user data to track health status information. Traditional machine learning methods such as support vector machine (SVM), decision tree (DT), and hidden Markov model (HMM) can be used in a variety of healthcare applications [7]. These methods analyze and classify patterns by constructing explicit or implicit models, and ML has been used to improve the detection rate of malicious data [8]. However, they still have many problems in detecting new or evolving malicious data, and the accuracy of unsupervised anomaly detection for new security threats is low [9]. As the number of variants grows, this becomes a major bottleneck, mainly because of the amount of work required to gather sufficient datasets. In addition, when new features from different network layers need to be combined to deal with evolving malicious data, a learned classifier cannot be applied directly to data with a different feature space [10].

This paper attempts to overcome these challenges, which involve data aggregation under security and privacy protection. First, in the real world, data often exists in separate, decentralized forms. Although different sensors hold a lot of data, it is not shared because of privacy and security concerns [11]. If the same user uses two different sensors, the data stored in their respective clouds cannot be exchanged, making it difficult to train powerful models on this valuable data. Another important issue is personalization: most approaches rely on a common server model for almost all users. After enough user data has been captured, a satisfactory machine learning model is trained and distributed to all user devices, which can then track health information on a daily basis, but this process lacks personalization. Different users have different characteristics and daily behavior patterns. As a result, general models cannot deliver personalized healthcare. Based on this idea, a federated transfer learning algorithm is proposed as an IoMT-enabled intelligent healthcare framework named FT-IoMT Health [12]. FT-IoMT Health addresses data decentralization and model personalization through federated learning and homomorphic multiparty encryption [13]. It aggregates data from different systems to build powerful machine learning models while appropriately protecting user privacy. After building the cloud model, FT-IoMT Health uses transfer learning to obtain a personalized model for each network entity [14]. Transfer learning is a machine learning technique that uses knowledge learned from related training (source) sets to improve the prediction accuracy on test (target) sets with almost no labeled data [15], and it enables the framework to be updated gradually. FT-IoMT Health is scalable and can be used in many healthcare applications, enabling them to update their learning capabilities every day.

In short, the main contributions of the paper are as follows:
(1) This paper proposes FT-IoMT Health, the first federated transfer learning mechanism based on IoMT. This mechanism aggregates data from different entities without compromising privacy and security and obtains relatively personalized models by means of transfer learning.
(2) Building on the analysis of known data, the transfer learning technique is used to detect new, unknown data. The use of transfer learning is the main factor in enhancing the adaptability of the detection model.
(3) This paper validates the superior performance of FT-IoMT Health in recognizing human activity on UCI smartphones. The experimental results show that FT-IoMT Health greatly improves the recognition accuracy compared with traditional ML methods.

2. Related Work

In traditional healthcare applications, models are typically built by aggregating all user data. In practice, however, data is often isolated and difficult to share because of privacy issues, and the models built by applications lack personalization. A well-known network data detection technique is signature-based detection, which relies on detailed knowledge of the specific characteristics of each detection target. Another technique used for network data detection is supervised learning [16, 17]. Both techniques are less accurate in detecting new data because they typically rely on known cases. Federated machine learning was first proposed by Google [18], which trains machine learning models on distributed mobile phones, with the primary purpose of protecting user data in the process. Federated learning is a technical approach that solves the data isolation problem by training privacy-preserving models across the network. The goal of transfer learning is to transfer information from known related fields to new fields, in the manner of analogical reasoning; its main objective is to reduce the distribution differences between fields. There are two main implementation methods: instance reweighting [19] and feature matching [20]. Recently, deep transfer learning has achieved great success in many applications, and FT-IoMT Health mostly involves deep transfer learning. Many methods assume that training data is freely available, which is obviously unrealistic. FT-IoMT Health builds deep transfer learning into a federated learning framework, eliminating the need to access raw user data and thereby achieving greater security.

Federated transfer learning targets the setting in which the participants have few samples or features in common. In recent years, a number of researchers have begun to work in this field. In [12], Liu et al. put forward a secure federated transfer learning algorithm for a two-party privacy-preserving setting, which paid more attention to data security. Other studies propose federated domain adaptation, which extends domain adaptation to the constraints of the federated setting to achieve both data privacy and domain transfer. Although research is developing rapidly, there are still many challenges in the practical application of federated transfer learning. The work in this paper is the first federated transfer learning mechanism designed specifically for IoMT applications, and it can be extended with a variety of transfer learning techniques.

3. System Model

3.1. Problem Definition

Consider data from $N$ different users. The users are denoted by $\{S_1, S_2, \ldots, S_N\}$, and the sensor readings they provide are denoted by $\{D_1, D_2, \ldots, D_N\}$. The conventional method trains a general model $M_{ALL}$ by combining all the data $D = D_1 \cup D_2 \cup \cdots \cup D_N$. The data of different users follow different distributions. In our problem, we aim to use all the data collaboratively to train a federated model $M_{FED}$, in which no user $S_i$ discloses its data $D_i$ to the others. If we denote the accuracy by $A$, the goal of FT-IoMT Health is to guarantee that the accuracy of federated learning is close to or better than the accuracy of conventional learning: $A_{ALL} - A_{FED} < \Delta$, where $\Delta$ is a very small positive real number.

FT-IoMT Health aims to leverage federated transfer learning to obtain accurate personal healthcare information without compromising user privacy. Figure 1 shows an overview of the mechanism. Suppose there are several users (network data sources) and one server; the setting then extends to a more general situation. The main components of the framework are described below. First, the cloud model is trained on the server using a common dataset. Then, the cloud model is distributed to all users so that each user trains its own model on its own dataset. The user models are then uploaded to the cloud, where a new cloud model is obtained by model aggregation. Finally, each user trains a personalized model that combines the cloud model with its own network data and uses it to predict future data. In this process, because the distributions of server data and user data differ greatly, transfer learning is adopted to make the model more suitable for each user, as shown on the right of Figure 1. It is important to note that no user data is leaked during any parameter-sharing step, because all parameters are shared under homomorphic encryption.

The federated learning model is the central computation model of the FT-IoMT Health mechanism. Its role is to handle model construction and parameter sharing. In traditional healthcare applications, the server model is applied directly to users once it has been trained. However, the probability distribution of the samples on the server is clearly very different from that of the data generated by each user, so it is difficult for the general model to be personalized. In addition, because of privacy and security issues, the user model cannot easily be updated continuously.

3.2. Federated Learning

FT-IoMT Health uses the federated learning paradigm to train and share encrypted models. The procedure involves two key parts: cloud model learning and user model learning. In FT-IoMT Health, deep neural networks are used to learn both models. The network takes raw user data as input and performs end-to-end feature learning and classifier training. Let $f_S$ denote the server model to be learned; the learning goal is

$$\arg\min_{\Theta} \mathcal{L} = \sum_{i=1}^{n} \ell\left(y_i, f_S(x_i)\right), \tag{1}$$

where $\ell(\cdot, \cdot)$ indicates the network loss function, such as the cross-entropy loss for classification tasks, $\{x_i, y_i\}_{i=1}^{n}$ are the samples of server data, of size $n$, and $\Theta$ represents all the parameters to be learned, namely, the weights and biases.

After the cloud model is obtained, it is distributed to all users. As the barrier in Figure 1 indicates, direct sharing of user information is prohibited. The process uses homomorphic encryption to prevent information leakage. Because encryption is not the focus of this paper, only additively homomorphic encryption over real numbers is considered. In this way, parameter sharing can be completed without leaking any user information. Federated learning is thus applied to aggregate user data without compromising privacy and security. The learning goal for user $u$ is defined as

$$\arg\min_{\Theta^{u}} \mathcal{L}_1 = \sum_{i=1}^{n^{u}} \ell\left(y_i^{u}, f_u(x_i^{u})\right), \tag{2}$$

where $f_u$ is the model of user $u$ with parameters $\Theta^{u}$, $\{x_i^{u}, y_i^{u}\}_{i=1}^{n^{u}}$ are the samples of user $u$, and $\ell(\cdot, \cdot)$ is the network loss function.

After all user models have been trained from the shared cloud model, they are uploaded to the server for aggregation. Evaluation shows that, with shared initialization, the federated averaging method [21] can be adopted to average the models and performs well in reducing the loss. Therefore, following [21], the user models are aligned by model averaging, and the cloud model is updated in each training round by averaging the user models. The updated cloud model is expressed as

$$f'_S(w) = \frac{1}{K} \sum_{k=1}^{K} f_{u_k}(w), \tag{3}$$

where $w$ is the parameter of the network and $K$ is the number of users. After enough iterations, the updated server model $f'_S$ has better generalization capability. New users can then join the next round of server model training. Therefore, FT-IoMT Health supports incremental learning.
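The averaging step in Equation (3), combined with additive homomorphic encryption over real numbers, can be illustrated with a small sketch. This is an illustration under stated assumptions rather than the scheme used in FT-IoMT Health: it assumes a textbook Paillier cryptosystem with toy primes (insecure by design), a fixed-point encoding with a hypothetical `SCALE` factor, and a key pair held by the users so that the server only handles ciphertexts. All function names are hypothetical.

```python
import math
import random

# Toy Paillier setup. The primes are tiny so the arithmetic is easy to follow;
# this is NOT secure and only illustrates the additive homomorphism.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse (Python 3.8+); valid for generator g = N + 1
SCALE = 1000  # assumed fixed-point scale: three decimal places


def encrypt(value):
    """Encrypt one real-valued parameter after fixed-point encoding."""
    m = round(value * SCALE) % N  # negative values wrap around modulo N
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(cipher):
    """Decrypt a ciphertext and undo the fixed-point encoding."""
    m = ((pow(cipher, LAMBDA, N_SQ) - 1) // N * MU) % N
    if m > N // 2:  # map the upper half of the range back to negatives
        m -= N
    return m / SCALE


def aggregate(encrypted_models):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    total = list(encrypted_models[0])
    for model in encrypted_models[1:]:
        total = [(a * b) % N_SQ for a, b in zip(total, model)]
    return total


def decrypt_average(encrypted_sum, num_users):
    """Key-holder side: decrypt the summed parameters and divide by K."""
    return [decrypt(c) / num_users for c in encrypted_sum]


# Three users, each holding a two-parameter model.
user_weights = [[0.25, -0.5], [0.75, 0.1], [0.5, 0.1]]
encrypted = [[encrypt(w) for w in weights] for weights in user_weights]
averaged = decrypt_average(aggregate(encrypted), len(user_weights))
```

Here `averaged` is approximately `[0.5, -0.1]`, the element-wise mean of the three models. With these toy parameters the scaled sum of any parameter must stay below `N / 2` in absolute value; a practical deployment would use a vetted cryptographic library and much larger keys.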

3.3. Transfer Learning

Transfer learning is applied to improve the detection of new network data by transferring the information learned from the analysis of known network data, and to distinguish the common coarse-grained feature model shared by all users from the fine-grained feature model of each individual user. The terms source and target denote the training and test datasets in the machine learning task, respectively. Both source and target data consist of normal flow records and abnormal flow records. The purpose of transfer learning here is to adapt the source data to help distinguish new detections in the target, thereby building a personalized model for each user.

The transfer learning mechanism is composed of the following three major processes: (1) feature extraction process (obtained from the original network), (2) feature-based learning process, and (3) supervised classification process. The first step is to perform data tracking on the original network to extract features based on the statistical calculation of network traffic. In the second step, a feature-based transfer learning algorithm is used to learn the new feature representation from both the source data and the target data, and the new representation will be fed to the general basic classifier.

Data detection is modeled as a binary classification problem, i.e., the data state is classified as malicious or normal. Assume source training instances $S = \{x_i\}$, $x_i \in \mathbb{R}^{m}$, with labels $L_S = \{y_i\}$, and target data $T = \{u_i\}$, $u_i \in \mathbb{R}^{n}$, where $x_i$ and $u_i$ are both users’ data extracted from the network. $S$ and $T$ come from different distributions, $P_S(X) \neq P_T(X)$, and have different dimensions, $m \neq n$. Our goal is to accurately predict the labels on $T$.

The approach is to find a new common latent space through spectral transformation, in which malicious examples from the source and the target have similar distributions while remaining clearly separated from the other examples. The ultimate purpose is to learn new representations $V_S$ and $V_T$ of the source data $S$ and the target data $T$ in a $k$-dimensional latent semantic space, so that $V_S$ and $V_T$ can be used instead of the original $S$ and $T$ to classify malicious data better. The key idea is illustrated in Figure 2: in the projected common latent space (Figure 2(c)), the distributions of malicious A and malicious B are indistinguishable, even though they originate from 2D and 3D spaces, respectively.

The following describes how to find the optimal common latent subspace.

3.3.1. Optimization

Given the source data $S$ and the target data $T$, find the best projections $V_S$ and $V_T$ of $S$ and $T$ onto the optimal subspace according to the following optimization goal:

$$\min_{V_S, V_T} \; \ell(V_S, S) + \ell(V_T, T) + \beta D(V_S, V_T), \tag{4}$$

where $\ell(\cdot, \cdot)$ is a distortion function used to evaluate the difference between the original data and the projected data, and $D(V_S, V_T)$ indicates the difference between the projected source data and the projected target data. $\beta$ is a trade-off parameter used to control the similarity between the two datasets.

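The first two components of (4) ensure that the projected data preserves the structure of the original data as much as possible. Define $\ell(\cdot, \cdot)$ as follows:

$$\ell(V_S, S) = \|S - V_S P_S\|^2, \qquad \ell(V_T, T) = \|T - V_T P_T\|^2, \tag{5}$$

where $V_S$ and $V_T$ are realized via linear transformations with linear mapping matrices $P_S \in \mathbb{R}^{k \times m}$ and $P_T \in \mathbb{R}^{k \times n}$ applied to the source data and target data, with $m$ and $n$ the source and target feature dimensions. $\|X\|^2$ indicates the squared Frobenius norm, which can also be written as the matrix trace $\mathrm{tr}(X^{\top} X)$. From another point of view, $P_S^{\top}$ and $P_T^{\top}$ project the original data $S$ and $T$ into a $k$-dimensional space, in which the projected data $S P_S^{\top}$ and $T P_T^{\top}$ are comparable. However, this alone can produce the trivial solution $P_S = 0$, $P_T = 0$. Therefore, Equation (5) is applied. It can be regarded as a matrix factorization, which is a well-known and effective tool for extracting latent subspaces while preserving the original data structure.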

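In terms of $\ell(\cdot, \cdot)$, $D(V_S, V_T)$ is defined as

$$D(V_S, V_T) = \ell(V_T, S) + \ell(V_S, T) = \|S - V_T P_S\|^2 + \|T - V_S P_T\|^2, \tag{6}$$

where $P_S$ and $P_T$ are the linear mapping matrices for the source and target data. $D(V_S, V_T)$ represents the difference between the projected target data and the projected source data. Therefore, minimizing the difference function (6) constrains the projected source data and target data to be similar.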
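The distortion and difference terms can be made concrete with a short NumPy sketch. It assumes the notation $S$, $T$ for source and target data, $V_S$, $V_T$ for their latent representations, and $P_S$, $P_T$ for the linear mapping matrices, and it assumes that source and target contain the same number of instances so that the cross terms are well defined. The function names are hypothetical.

```python
import numpy as np


def distortion(V, P, X):
    """Squared Frobenius norm ||X - V P||^2 between data and its reconstruction."""
    residual = X - V @ P
    return float(np.sum(residual * residual))


def projection_difference(V_S, V_T, P_S, P_T, S, T):
    """Cross-reconstruction error: each latent representation explains the other domain."""
    return distortion(V_T, P_S, S) + distortion(V_S, P_T, T)


rng = np.random.default_rng(0)
r, m, n, k = 6, 4, 3, 2          # instances, source dim, target dim, latent dim
S = rng.standard_normal((r, m))  # source data with m features
T = rng.standard_normal((r, n))  # target data with n features (m != n)
V_S = rng.standard_normal((r, k))
V_T = rng.standard_normal((r, k))
P_S = rng.standard_normal((k, m))
P_T = rng.standard_normal((k, n))

print(distortion(V_S, P_S, S), projection_difference(V_S, V_T, P_S, P_T, S, T))
```

Although `S` and `T` live in spaces of different dimension, `V_S` and `V_T` both have `k` columns, which is what makes the two domains comparable after projection.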

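Substituting (5) and (6) into (4) gives the following optimization goal, to be minimized with respect to $V_S$, $V_T$, $P_S$, and $P_T$:

$$\min_{V_S, V_T, P_S, P_T} G = \|S - V_S P_S\|^2 + \|T - V_T P_T\|^2 + \beta \left( \|S - V_T P_S\|^2 + \|T - V_S P_T\|^2 \right). \tag{7}$$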

Therefore, the loss function of the user model can be calculated by the following formula:

The learning process of FT-IoMT Health is given in Algorithm 1. The framework works continuously with newly emerging user data. When faced with new user data, FT-IoMT Health can simultaneously update the user model and the network-based cloud model. Thus, the longer a user contributes data, the more personalized the model becomes. In addition to transfer learning, other common methods (e.g., incremental learning) can also be embedded in FT-IoMT Health for personalization.

1: Input: learning rate ,
2: Output:,
3: Construct an initial cloud model $f_S$ with common datasets by applying Equation (1)
4: Distribute $f_S$ to all users
5: Train user model $f_u$ by Equation (2)
6: Upload all user models to the server through homomorphic multiparty encryption. Aggregate the models using Equation (3). The server then treats the aggregated model as the updated cloud model $f'_S$.
7: Distribute $f'_S$ to all users and then execute transfer learning on each user to obtain their personalized model by applying Equation (8)
8: while the objective function $G$ in Equation (7) has not converged do
9:   Update $V_S$ by gradient descent with $\partial G / \partial V_S$
10:   Update $V_T$ by gradient descent with $\partial G / \partial V_T$
11:   Update $P_S$ by gradient descent with $\partial G / \partial P_S$
12:   Update $P_T$ by gradient descent with $\partial G / \partial P_T$
13:   step++
14: end while
15: Repeat the above process as new user data continue to appear
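
Lines 8-14 of Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch under assumptions, not the authors' implementation: it assumes the objective takes the form $G = \|S - V_S P_S\|^2 + \|T - V_T P_T\|^2 + \beta(\|S - V_T P_S\|^2 + \|T - V_S P_T\|^2)$, that source and target have the same number of rows, and that a fixed learning rate with a simple convergence tolerance is adequate. The function names, the initialization scale, and the default values are hypothetical.

```python
import numpy as np


def objective(S, T, V_S, V_T, P_S, P_T, beta):
    """Assumed form of the objective G in Equation (7)."""
    def sq(A):
        return float(np.sum(A * A))
    return (sq(S - V_S @ P_S) + sq(T - V_T @ P_T)
            + beta * (sq(S - V_T @ P_S) + sq(T - V_S @ P_T)))


def fit_latent_space(S, T, k, beta=1.0, lr=0.01, max_steps=500, tol=1e-8, seed=0):
    """Block-wise gradient descent on G over V_S, V_T, P_S, P_T."""
    rng = np.random.default_rng(seed)
    r, m = S.shape
    n = T.shape[1]
    V_S = 0.1 * rng.standard_normal((r, k))
    V_T = 0.1 * rng.standard_normal((r, k))
    P_S = 0.1 * rng.standard_normal((k, m))
    P_T = 0.1 * rng.standard_normal((k, n))
    history = [objective(S, T, V_S, V_T, P_S, P_T, beta)]
    for _ in range(max_steps):
        # d/dV ||X - V P||^2 = -2 (X - V P) P^T
        V_S -= lr * (-2 * (S - V_S @ P_S) @ P_S.T - 2 * beta * (T - V_S @ P_T) @ P_T.T)
        V_T -= lr * (-2 * (T - V_T @ P_T) @ P_T.T - 2 * beta * (S - V_T @ P_S) @ P_S.T)
        # d/dP ||X - V P||^2 = -2 V^T (X - V P)
        P_S -= lr * (-2 * V_S.T @ (S - V_S @ P_S) - 2 * beta * V_T.T @ (S - V_T @ P_S))
        P_T -= lr * (-2 * V_T.T @ (T - V_T @ P_T) - 2 * beta * V_S.T @ (T - V_S @ P_T))
        history.append(objective(S, T, V_S, V_T, P_S, P_T, beta))
        if abs(history[-2] - history[-1]) < tol:  # treated as converged
            break
    return V_S, V_T, history


rng = np.random.default_rng(1)
S = rng.standard_normal((6, 4))  # toy source data, 4 features
T = rng.standard_normal((6, 3))  # toy target data, 3 features
V_S, V_T, history = fit_latent_space(S, T, k=2)
```

The returned `V_S` and `V_T` are the latent representations that a base classifier would consume in place of `S` and `T`; `history` records the objective value after every step so that convergence can be inspected.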

4. Experiments

4.1. Datasets

We employ a public human activity recognition dataset named UCI smartwatch. The dataset involves 6 activities gathered from 35 users who wear a smartwatch on the wrist. 10 channels of accelerometer and gyroscope data are gathered at a constant rate of 50 Hz. There are 10,300 instances. To construct the isolated-user setting of FT-IoMT Health, five subjects (subject IDs 31-35) are extracted and regarded as independent users who do not share data because of privacy and security concerns. The data of the remaining 30 users is used to train the cloud model. The goal is then to use the cloud model and all 5 independent subjects to improve the activity recognition accuracy for these 5 subjects without compromising privacy. This can be viewed as an instance of the framework in Figure 1 with 5 users.

For the feature-based transfer learning used to construct the personalized model, the analysis focuses on network data detection. The network features involved can be summarized into three groups; here, we focus on traffic features, which are generally extracted by flow analysis tools, and content features, which require processing the packet content.

4.2. Specific Implementation Steps

Both the server and the user side use a CNN for training and testing. The network consists of 2 convolutional layers, 2 pooling layers, and 3 fully connected layers, which employ a convolution size. It is optimized using mini-batch stochastic gradient descent (SGD). In the training process, 80% of the training data is used for model training, and the remaining 20% is used for evaluation. Set user and fixed. The batch size is 64, the number of training epochs is 80, and the learning rate is 0.01. Network data detection is modeled as a binary classification problem that differentiates malicious traffic from normal traffic.

To assess the transfer learning method effectively, source and target datasets are generated as follows. To assess how well the transfer learning method detects unknown model variants in the cloud, and thus constructs personalized models, the problem is treated as a detection that exists only in the target network and is not visible in the source network. Suppose there is one type of detection in the source and another in the target, so that the distribution of feature values differs between the source and the target. Accordingly, three datasets are reconstructed, each of which includes a set of randomly chosen normal cases and a set of detections from one category. One of the datasets is set as the target, and another is set as the source. This yields three detection tasks: Seen⟶Unseen (i.e., source Seen data for training, target Unseen data (new network) for testing), Seen⟶Detection, and Detection⟶Unseen. It is presumed that the source and the target share the same feature space.

The accuracy of user $u$ is computed by the following formula:

$$A_u = \frac{\left|\{x : x \in D_u \wedge \hat{y}(x) = y(x)\}\right|}{\left|\{x : x \in D_u\}\right|},$$

where $y(x)$ and $\hat{y}(x)$ denote the true and predicted labels on sample $x$, respectively, and $D_u$ is the data of user $u$. Federated learning is performed under homomorphic encryption. During transfer learning, all convolution and pooling layers in the network are frozen, and only the parameters of the fully connected layers are updated using SGD. To verify the validity of FT-IoMT Health, its performance is compared with conventional deep learning (DL). For conventional deep learning, only the primary server model is used; the performance on each subject is also recorded for other conventional machine learning methods. The hyperparameters of all comparison methods are tuned by cross-validation. For a fair comparison, all experiments are run 5 times and the average accuracy is recorded.

Table 1 shows the performance comparison between the detection technologies based on FT-IoMT and the benchmark methods. Table 2 shows the accuracy of activity classification for each subject. Figure 3 shows the ROC curves. Figure 4 compares FT-IoMT with other transfer learning methods. Figure 5 shows the results of extending FT-IoMT with other transfer learning methods.
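
The per-user accuracy described above is the fraction of a user's samples whose predicted label matches the true label. A minimal sketch, with a hypothetical function name and toy labels:

```python
def user_accuracy(true_labels, predicted_labels):
    """Fraction of one user's samples whose prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(1 for y, y_hat in zip(true_labels, predicted_labels) if y == y_hat)
    return correct / len(true_labels)


# Toy example: 1 = malicious, 0 = normal; three of four predictions are correct.
accuracy = user_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

In the evaluation protocol described above, this value would be computed per user for each of the 5 runs and then averaged.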

4.3. Evaluation

FT-IoMT Health achieves the best classification accuracy for all users. From the results in Tables 1 and 2, it can be concluded that FT-IoMT Health significantly improves performance in all cases. Compared with DL, it increases the average result by 5.6%. This is mainly because federated learning indirectly draws on more information from distributed data to train a better model, while transfer learning makes the model more personalized to each user’s features. Compared with traditional methods such as KNN, SVM, and RF, FT-IoMT Health also significantly improves the recognition results. Overall, this demonstrates the validity of the FT-IoMT Health mechanism. For activity recognition, the results also show that deep learning methods (DL and FT-IoMT Health) attain better results than conventional methods.

This is attributable to the representation capability of deep neural networks, whereas conventional methods depend on manual feature engineering. Deep learning has the further advantage that the model can be updated online and incrementally without retraining, whereas conventional methods need additional incremental algorithms. This property is very valuable for model reuse and federated transfer learning.

For the detection of unseen new network data, we compare FT-IoMT Health with common base classifiers that do not use transfer learning, on the three detection tasks. We chose random forest (RF), SVM, and KNN as the common base classifiers. The ROC curves in Figure 3 show that FT-IoMT Health improves the detection rate compared with the baselines.

Comparison of IoT-based transfer learning methods: we have applied other feature-based transfer learning methods (such as HeMap [22] and CORrelation ALignment (CORAL) [23]) to the FT-IoMT network data detection tasks. The results in Figure 4 show that FT-IoMT performs better than the other feature-based methods with all classifiers on the network data detection tasks. There are two adjustable parameters, the similarity confidence parameter $\beta$ and the size of the new feature space $k$, which can be set manually or determined automatically through empirical study. There are methods for automatically determining the best parameters, for example, by calculating the degree of similarity between the source and the target data to determine the similarity confidence parameter $\beta$. In this work, a small labeled dataset (300 labeled samples) from the test set is used to search for the best parameters.
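
The parameter search on a small labeled set can be sketched as an exhaustive grid search. The sketch assumes that the two parameters are the similarity confidence parameter (called `beta` here) and the latent dimension (called `k` here), and that a caller-supplied `evaluate(beta, k)` function returns the accuracy on the labeled subset. The candidate grids and the toy scoring function are hypothetical.

```python
from itertools import product


def grid_search(evaluate, betas, ks):
    """Return the (beta, k) pair with the highest score, and that score."""
    best_params, best_score = None, float("-inf")
    for beta, k in product(betas, ks):
        score = evaluate(beta, k)
        if score > best_score:
            best_params, best_score = (beta, k), score
    return best_params, best_score


def toy_evaluate(beta, k):
    """Stand-in for accuracy on the labeled subset; peaks at beta = 0.5, k = 4."""
    return -((beta - 0.5) ** 2) - (k - 4) ** 2


best_params, best_score = grid_search(toy_evaluate, betas=[0.1, 0.5, 1.0], ks=[2, 4, 8])
```

In practice `evaluate` would fit the latent space with the given parameters, train the base classifier on the projected source data, and score it on the labeled target samples.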

To analyze the scalability of FT-IoMT Health, it is extended with other transfer learning methods, and two alternatives are compared: (1) fine-tuning, which only fine-tunes the network on each subject and does not explicitly reduce the distribution difference between datasets; (2) transfer with MMD (maximum mean discrepancy), in which the MMD loss replaces the alignment loss. The comparison results are shown in Figure 5. The figure shows that, in addition to the alignment loss, FT-IoMT Health can also attain desirable results through fine-tuning or MMD.
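
As an illustration of the MMD alternative, the simplest (linear-kernel) form of the MMD loss is the squared distance between the mean feature vectors of the two domains. The sketch below assumes this linear-kernel variant, which may differ from the kernel used in the experiments; the function name is hypothetical.

```python
import numpy as np


def linear_mmd(source_features, target_features):
    """Squared distance between the feature means of two sample sets."""
    mean_gap = source_features.mean(axis=0) - target_features.mean(axis=0)
    return float(mean_gap @ mean_gap)


# Toy features: every target feature is shifted by 2 in each of 3 dimensions.
source = np.zeros((5, 3))
target = np.full((4, 3), 2.0)
loss = linear_mmd(source, target)
```

The two sets may contain different numbers of samples, but they must share the same feature dimension. During transfer, a loss of this kind would be added to the classification loss so that the fully connected layers learn features whose distributions match across domains.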

Transfer learning yields considerably better average accuracy than no transfer, which shows that the transfer learning process of FT-IoMT Health is effective and scalable. Thus, FT-IoMT Health is general and can be extended to many fields by incorporating other transfer learning algorithms. In addition, other encryption algorithms can also be used to extend the federated learning, which may be a future research direction.

5. Application of Assistance in Diagnosis and Treatment of Neurological Diseases

Parkinson’s disease is a neurological disease characterized by motor symptoms, so biosensors in IoMT can be used to help with diagnosis [24]. In addition, patient data is privacy-sensitive, an issue that federated learning can address. Therefore, FT-IoMT Health is applied to assist in the diagnosis and treatment of Parkinson’s disease and is deployed in hospitals. After the user model has been trained on the user side, the patient downloads it to the biosensor and connects to the network to update it at the next access. This allows users to run detection themselves and obtain real-time feedback, making it easier to follow their disease status.

Based on this, a biosensing application was developed to collect the patient’s acceleration and gyroscope signals at a frequency of 80 Hz for symptom testing. The symptom test covers the following states: arm swing, balance, walking, postural normal tremor, and resting tremor. For each test, each symptom is divided into five levels from normal to severe. The treating doctor evaluated the collected symptoms. We collected sensor data from 150 patients aged 18 to 85 years. In the following evaluation, the test data for arm swing and postural normal tremor are evaluated, because these two categories have sufficient data.

We evaluate the classification accuracy on the collected dataset. The data is gathered from three hospitals; 80% of the data from each hospital is randomly chosen as the public dataset, the remaining 20% is randomly assigned to 5 users, and .
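
The split described above can be sketched as follows for the data of a single hospital. The function name, the seed, and the round-robin assignment of the private portion to users are assumptions made for illustration; the same function would be applied to each of the three hospitals.

```python
import random


def split_public_and_users(samples, public_fraction=0.8, num_users=5, seed=0):
    """Shuffle one hospital's samples, keep a public share, spread the rest over users."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * public_fraction)
    public, private = shuffled[:cut], shuffled[cut:]
    # Round-robin assignment keeps the user partitions disjoint and balanced.
    users = [private[i::num_users] for i in range(num_users)]
    return public, users


public, users = split_public_and_users(range(100))
```

With 100 samples this gives 80 public samples and 5 users holding 4 samples each, with no sample shared between partitions.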

Table 3 shows the comparison results. In addition, the result of an ideal scheme is reported. Because all the data is kept in one location in that scheme, it indicates the upper bound of the model performance. The results show that FT-IoMT Health achieves the best classification accuracy, clearly exceeding the best comparison method and narrowing the gap with the ideal case. This demonstrates that, with federated transfer learning, the FT-IoMT Health mechanism can achieve effective symptom classification in practical applications.

Consistent with the experimental setup described above, Figures 6 and 7 show the scalability results on the arm swing and postural normal tremor test data, respectively. In most cases, FT-IoMT Health achieves satisfactory results using fine-tuning or MMD, which shows that FT-IoMT Health remains effective and scalable in practical applications when combined with other transfer learning algorithms.

To understand the performance of the model further, we conduct an ablation analysis (also called sensitivity analysis) to evaluate the two components, federated learning and transfer learning. We use No-TL to denote the averaged model without personalized transfer learning. The results are shown in Figures 8 and 9. Both federated learning and transfer learning contribute significantly to the performance of FT-IoMT Health. Comparing No-TL with DL shows that the federated model increases the classification accuracy, which demonstrates the effectiveness of federated learning. Comparing No-TL with our federated transfer learning mechanism FT-IoMT Health shows that, once transfer learning is integrated, each user model attains better classification performance. The reasons are as follows. (1) Using federated learning, the server can indirectly aggregate more information from multiple users to obtain a more general network cloud model. (2) Using transfer learning, each user obtains a more personalized model based on the network cloud model.

6. Conclusion

In this paper, we propose FT-IoMT Health, a federated transfer learning mechanism for IoMT healthcare. FT-IoMT Health aggregates data from different network users without compromising privacy and security and achieves relatively personalized model learning for each user through transfer learning. The key is a feature-based transfer learning technique that copes with the variants that arise across detection tasks in the network. Experiments and applications have verified the validity and accuracy of the mechanism compared with other benchmark methods. The experimental results also indicate that the transfer learning method improves the detection of unseen new malicious network data compared with the baseline and show that FT-IoMT Health can support the detection of new data in different feature spaces. In the future, we plan to extend FT-IoMT Health through incremental learning to achieve a more personalized, flexible, and efficient healthcare system.

Data Availability

All the data is available in the paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors gratefully acknowledge the support from the Shandong National Science Foundation of China (Grant No. ZR202103040468).