Abstract

The Medical Internet of Things (MIoT) has been widely adopted because of its convenience and practicality. MIoT enables machine learning to predict diseases of various kinds automatically and accurately, supporting effective and efficient medical treatment. However, MIoT devices are vulnerable to cyberattacks, which are constantly advancing. In this paper, we establish an MIoT platform and demonstrate a scenario in which a trained Convolutional Neural Network (CNN) model for predicting lung cancer complicated with pulmonary embolism can be attacked. First, we use a CNN to build a model that predicts lung cancer complicated with pulmonary embolism and obtains high detection accuracy. Then, we build a copycat model using only a small amount of data labeled by the target network, aiming to steal the established prediction model. Experimental results show that the stolen model also achieves relatively high prediction performance, revealing that the copycat network can copy the prediction performance of the target network to a large extent. This shows that a prediction model deployed on MIoT devices can be stolen by attackers; effective prevention strategies remain an open question for researchers.

1. Introduction

The number of intelligent Medical Internet of Things (MIoT) devices deployed online has been constantly increasing, reaching 20.35 billion in 2017, and is estimated to increase to 75.44 billion in the next decade [1]. In addition, according to the International Data Corporation (IDC), the last five years have witnessed a 17.0% annual growth rate in IoT spending, from approximately $700 billion in 2015 to nearly $1.3 trillion in 2019 [2]. MIoT accounts for a large proportion of this spending. Tan and Varghese [3] pointed out that there is huge potential for the application of IoT in the health industry, although practical constraints must be taken into consideration. Vicini et al. [4] presented an approach that combines vending machines with IoT technology to facilitate a healthy lifestyle. However, cyberattacks are not new to IoT and can lead to terrible consequences [5, 6]. Most MIoT devices lack any defense mechanism. As IoT devices become more widespread, cyberattacks are also improving, posing a more severe threat to the secure operation not only of IoT devices but also of the entire cyberspace [7, 8].

With an increasing number of IoT-related cyber incidents being reported, experts and researchers from the IoT industry and academia have been working to design secure systems and solutions to combat attacks of various types [9, 10]. Many researchers have devoted extensive efforts to ensuring MIoT security and privacy, providing practical guidance for MIoT security. Fu et al. [11] highlighted both the opportunities and the possible threats that IoT faces in two important application scenarios: the home and the hospital. Yang et al. [12] provided an extensive survey that classifies MIoT attacks from the perspectives of MIoT security research, threats, and open issues. Boejen and Grau [13] used Unmanned Aerial Vehicles (UAVs) to launch an attack in a simulated smart hospital environment and compromise a small collection of wearable healthcare sensors. Sethuraman et al. [14] proposed a new deep learning approach, DFEL, for real-time cyberattack detection in the IoT environment and demonstrated its high accuracy and significant time savings.

However, few studies investigate attacks that target the services deployed on MIoT devices, particularly MIoT-based AI services such as machine learning-based disease prediction/detection services. Unlike the model proposed by Mohan [15], which uses lightweight encryption and attribute-based authorization to protect the model, we selected patient data from specific areas (Yunnan, Chongqing) for our data set, which greatly reduced the risk of the established network being attacked through vulnerabilities in the data set. At the same time, we store the prediction model of lung cancer complicated with pulmonary embolism in the cloud so that it is further protected by the measures the cloud provides. In this paper, we study a scenario in which a trained Convolutional Neural Network (CNN) [16] model for predicting lung cancer complicated with pulmonary embolism can be stolen by attackers. Specifically, we build a Copycat CNN [17] using only a small amount of data labeled by the original network, aiming to steal the established prediction model. We show that the stolen model can copy the prediction performance with a minor difference of approximately 3%. This demonstrates that a prediction model deployed on MIoT devices can be stolen by attackers. Overall, the contributions of our work are as follows:

(1) Create a new platform of surgical IoT for cybersecurity study in high-performance medicine
(2) Propose a model stealing attack on the intelligent medical platform
(3) Implement and evaluate the proposed intelligent medical platform and model stealing attack

This paper is organized as follows: In Section 2, we review related work on cyberattacks using deep neural networks for the MIoT. Sections 3 and 4 present the methodology, covering the medical platform and the design of the model stealing attack. In Section 5, the attack scheme is evaluated on the medical platform and the results are discussed. In Section 6, we summarize the results and conclude this paper.

2. Related Work

2.1. VR for MIoT

IoT applications have been widely used in the medical industry. In recent years, it has become common to combine Virtual Reality (VR) technology with medical disciplines. Integrating the Internet of Things and VR technology in education enables learners to combine conceptual learning with practical experience in a novel way [18]. Coogan and He used Unity Software, combined with a brain-computer interface, to control a VR environment and MIoT devices [19]. To make the operation of the entire medical platform more transparent, we combined VR technology with MIoT to reproduce, through the medical platform, the prediction process of lung cancer complicated with pulmonary embolism.

2.2. Cyberattacks with Deep Neural Networks

Because the Medical Internet of Things builds on the Internet of Things, we first review that concept. It was put forward in 1995 by Bill Gates in The Road Ahead, and the term “Internet of Things” was first proposed in 1999 by Auto-ID; afterwards, the Internet of Things found applications in various fields, including the medical field. In 2013, Hu and his team [20] argued that, with the support of powerful Internet of Things technology, personal networking platforms in the medical field would soon have a strong foundation. This became reality in 2018, when Jagadeeswari et al. [21] proposed a healthcare monitoring system based on big data training on a powerful computing platform, showing that the Medical Internet of Things had become a reality. In 2020, in response to the growing number of cyberattacks, Flynn et al. [22] provided a proof of concept that an MIoT device and its accompanying smartphone app are vulnerable to attacks. A recent survey on Android malware detection is provided in [23]. These works provide a theoretical basis for our attack model.

Emerging deep learning techniques have shown impressive performance in various fields, from speech and object recognition to natural language processing (NLP), and even cybersecurity tasks such as bug and vulnerability detection [24, 25]. Nevertheless, deep learning models can easily be fooled by crafted adversarial examples, which have attracted considerable attention since 2014, when Szegedy et al. [26] and follow-up studies [27, 28] showed that imperceptibly perturbed input images could fool deep networks. Earlier, Dalvi et al. [29] and Lowd and Meek [30, 31] had investigated carefully crafted adversarial samples that can fool linear classifiers in the context of spam email detection. In 2006, Barreno et al. [32] pointed out that machine learning algorithms can be targets of a malicious adversary, and deep learning algorithms are no exception. Regarding attacks on deep models in the grey-box setting, Papernot et al. [33] attacked a grey-box target deep neural network (DNN) trained on the MNIST database, crafting adversarial examples by approximating the decision boundaries of the target DNN. Subsequently, Bapiyev et al. [34] demonstrated that one of the most promising approaches to developing detection systems for network cyberattacks is to apply modern models based on deep neural networks; their model testing showed that the accuracy of the basic variant is comparable with that of modern detection systems for network cyberattacks.

Table 1 expresses the attacker’s knowledge in different formulas, each representing a different level of knowledge of the attacker. Compared with white-box attacks, grey-box attacks differ in the enumerated expressions and in the trained parameters/hyperparameters, which are treated in the literature as unknown. The formula for a black-box attack indicates that nothing about the original network is known when the attack is carried out. Our attack adopts the grey-box setting: because the data are drawn from the same selection range as the data set of the target network, reasonably reliable labels can be obtained while accuracy is maintained.

In this paper, we examine a copy attack in which one CNN (which we call the copycat network, a grey-box attack) copies information from another CNN (the target network) in a disease prediction scenario. By leveraging a small amount of data labeled by the target network, the copycat network can reach performance similar to that of the target network, showing that the MIoT-based prediction model is vulnerable to adversarial attacks.

3. New Platform for Mobile and Intelligent Medicine

3.1. MIoT System Design

Unity is a multiplatform integrated game development tool that allows developers to easily create interactive content such as 3D video games, architectural visualizations, and real-time 3D animations. It is a fully integrated professional game engine. The core of the Unity engine is written in C/C++, and the image, sound, and physics engines are all written in C++. Dynamic link library (DLL) files encapsulate a series of methods and classes, which C#, Python, and other programs call to build games flexibly and with high performance. Unity runs across platforms such as Android, iOS, PC, and Web; this article targets the Android platform. Unity publishes the APK file of the VR project to the Android device, and the scene is then displayed through the VR headset. The headset is a Pico G2 (Beijing Bird-Watch Technology Co., Ltd.) mobile VR headset with a field of view of 101°, a refresh rate of 90 Hz, and a resolution of 3K, providing the wearer with immersive medical VR application scenes (Figure 1).

As shown in Figure 1, the whole MIoT system consists of two parts: The left part is the construction of a three-dimensional lung model, in which three-dimensional voxel segmentation was performed on CT images of patients (lung cancer with pulmonary embolism), and the lesions were marked. The right part processes the patient’s textual data and uses LSTM and RNN deep learning model algorithms to predict and classify the data, respectively. A safety module is then added to make up the MIoT system (Visual-Haptic Navigation System).

3.2. A Deep Neural Model for LC&PE Prediction

In this part, we use a CNN to perform the prediction of lung cancer with pulmonary embolism (LC&PE).

As shown in Figure 2, our CNN-Net architecture contains two 1D convolution layers and two fully connected layers, followed by a sigmoid activation layer. Each 1D convolution layer uses a kernel of size 3 and is followed by a LeakyReLU activation layer and a max-pooling layer with a stride of 2 that downsamples the text features. The first fully connected layer has an input size of 320 and an output size of 120, and the second has an input size of 120 and an output size of 2; a LeakyReLU activation layer sits between them. Finally, a sigmoid layer serves as the classifier.
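The way the layer sizes fit together can be checked with a short calculation. The sketch below is only illustrative: the input length of 46, the 32 output channels of the second convolution, the absence of padding, and the pooling window of 2 are assumptions chosen so that the flattened size equals the stated 320; none of these values is reported in this paper.

```python
def conv1d_out_len(length, kernel_size=3, stride=1, padding=0):
    """Output length of a 1D convolution (dilation 1)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def pool_out_len(length, kernel_size=2, stride=2):
    """Output length of a 1D max-pooling layer (pool window is an assumption)."""
    return (length - kernel_size) // stride + 1


def flattened_size(input_len, out_channels):
    """Size of the vector fed to the first fully connected layer
    after two (conv -> LeakyReLU -> max-pool) stages."""
    length = input_len
    for _ in range(2):
        length = conv1d_out_len(length)   # kernel size 3, as in the text
        length = pool_out_len(length)     # stride 2, as in the text
    return out_channels * length


# Hypothetical input length and channel count, not taken from the paper.
print(flattened_size(input_len=46, out_channels=32))  # 46 -> 44 -> 22 -> 20 -> 10, times 32
```

Under these assumptions the sequence length shrinks from 46 to 10, and 32 channels of length 10 give the 320 inputs of the first fully connected layer.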

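We use the convolution layer to extract features from the data. The output value of the layer with input size $(N, C_{in}, L)$ and output $(N, C_{out}, L_{out})$ can be precisely described as

$$\mathrm{out}(N_i, C_{out_j}) = \mathrm{bias}(C_{out_j}) + \sum_{k=0}^{C_{in}-1} \mathrm{weight}(C_{out_j}, k) \star \mathrm{input}(N_i, k),$$

where $\star$ is the valid cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, and $L$ is the length of the signal sequence.

When the number of groups equals $C_{in}$ and $C_{out} = K \times C_{in}$, where $K$ is a positive integer, the operation is also called depthwise convolution in the literature.

For an input of size $(N, C_{in}, L_{in})$, depthwise convolution with depth multiplier $K$ can be constructed by the parameters input: $(N, C_{in}, L_{in})$, output: $(N, C_{out}, L_{out})$, where

$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \mathrm{padding} - \mathrm{dilation} \times (\mathrm{kernel\_size} - 1) - 1}{\mathrm{stride}} + 1 \right\rfloor.$$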
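To make the convolution operation concrete, the following is a minimal pure-Python sketch of a 1D convolution computed as a valid cross-correlation: each output channel is a bias plus the sum, over input channels, of the kernel slid along the input without padding. It assumes stride 1, dilation 1, and a single group; the function name and the toy numbers are illustrative and not part of the platform.

```python
def conv1d(inputs, weight, bias):
    """Valid cross-correlation.

    inputs: nested list of shape [N][C_in][L]
    weight: nested list of shape [C_out][C_in][K]
    bias:   list of length C_out
    returns nested list of shape [N][C_out][L - K + 1]
    """
    c_in = len(weight[0])
    k_size = len(weight[0][0])
    l_out = len(inputs[0][0]) - k_size + 1
    out = []
    for sample in inputs:                      # loop over the batch
        sample_out = []
        for j, kernels in enumerate(weight):   # loop over output channels
            row = []
            for t in range(l_out):             # slide the kernel along the sequence
                acc = bias[j]
                for c in range(c_in):          # sum over input channels
                    for m in range(k_size):
                        acc += kernels[c][m] * sample[c][t + m]
                row.append(acc)
            sample_out.append(row)
        out.append(sample_out)
    return out


# Toy example: one sample, one input channel, one output channel, kernel size 3.
x = [[[1.0, 2.0, 3.0, 4.0, 5.0]]]
w = [[[1.0, 0.0, -1.0]]]
b = [0.5]
print(conv1d(x, w, b))  # [[[-1.5, -1.5, -1.5]]]
```

With an input of length 5 and a kernel of size 3, the output length is 5 - 3 + 1 = 3, which agrees with the output-length formula for padding 0, dilation 1, and stride 1.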

4. Model Stealing Attack to the New Platform

4.1. Overview of the Threat Model

As shown in Figure 3, the MIoT structure consists of three layers: the perception layer, the network layer, and the application layer. Healthcare data are collected mainly in the perception layer by a variety of devices. The network layer is a wireless system that, with the support of the technology platform, processes and transmits the input obtained by the perception layer. At the application layer, medical information resources are integrated according to the actual situation and service needs of the target population to provide personalized medical services that meet the needs of end users.

Dividing MIoT into these three layers enables a more thorough analysis of where the network is at risk. In the perception layer, Wang et al. [10] described inputs formed by applying small but intentionally worst-case perturbations to examples in the data set, which cause the model to output an incorrect answer with high confidence. In the network layer, an attacker can steal a model already trained by others for its business value, greatly reducing the early investment in research and development and obtaining higher profits.

4.2. Theoretical Description of the Model Stealing Attack

In this part, we introduce how to build our imitation network (copycat network) using data stolen from an existing target network (a CNN in this case). The stealing process uses random natural data to extract a copycat network from the existing target network. It consists of two steps: creating pseudo training data and training the copycat network. In the first step, the target network is used as a grey box to label random natural data, generating a pseudo data set. Then, this pseudo data set is used to train a copycat network that replicates the behavior of the target network.

A data set is needed to train the copycat network (Figure 4). We recommend using pseudo data sets labeled by the target network (including text data related or unrelated to the problem domain (PD)). Therefore, the pseudo data set is completely different from the original data set. During the stealing operation, the target network receives text data as input and returns class labels as output. The data set can come from the same PD as the target network, or it can consist of random natural text data. We consider two cases. In the first, we assume that the attacker has text data in the same PD as the data used to train the target network. In the second, we suppose that the attacker can only access publicly available large-scale data sets; their original labels are considered irrelevant in our research. The attacker uses the target network to label these data sets (PD and/or nonproblem domain (NPD)) automatically. Another network can then be trained on the labeled pseudo data sets, with the aim of capturing the nuances of the feature regions and reaching performance close to that of the target network. This hypothesis rests mainly on the observation that adding imperceptible noise to the input text data of a CNN can steer the answer of the network in a certain direction. NPD data can be obtained from the Internet for free. When only small databases are available (for example, PD data sets), data augmentation can increase the size of the database to obtain better results.

Once a pseudo data set is obtained, training of the copycat network can start. First, the attacker must choose a model architecture for the copycat. The attacker may not know the model architecture of the target network, but this makes no difference. We use a well-known architecture (a CNN) so that the copycat can be compared with the original network. A CNN is designed for classification, so its output layer can be set according to the specific problem. The same applies to the architecture chosen by the attacker to imitate the target network: the output of the selected model must be adapted to the PD of the target network, so the number of outputs of the copycat must match the number of classes handled by the target network.

The purpose of this copycat network is to evaluate whether the proposed method can replicate the target model with a small text data set from the same PD. In this case, we assume that the attacker can access a small amount of data in the same domain but without labels. Therefore, the samples of this data set come from the same PD as the original data set but are labeled by the target network.
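The two steps described above (labeling unlabeled data with the target network, then training a copycat on the pseudo data set) can be sketched as follows. This is a toy illustration under stated assumptions, not the experimental code of this paper: `target_network` is a hypothetical stand-in that hides a simple linear rule and returns only hard labels, the copycat is a small logistic regression rather than a CNN, and the two-dimensional random inputs replace real patient text data.

```python
import math
import random


def target_network(x):
    """Hypothetical deployed model: the attacker only observes its hard label."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.1 > 0 else 0


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_copycat(samples, labels, epochs=50, lr=0.5):
    """Fit a logistic-regression copycat with plain SGD on the pseudo labels."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            grad = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            w[0] -= lr * grad * x[0]
            w[1] -= lr * grad * x[1]
            b -= lr * grad
    return w, b


def copycat_predict(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


rng = random.Random(0)

# Step 1: create the pseudo data set by querying the target with unlabeled data.
unlabeled = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
pseudo_labels = [target_network(x) for x in unlabeled]

# Step 2: train the copycat on the stolen labels only.
copycat = train_copycat(unlabeled, pseudo_labels)

# Measure how often the copycat agrees with the target on fresh inputs.
held_out = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
agreement = sum(
    copycat_predict(copycat, x) == target_network(x) for x in held_out
) / len(held_out)
print(f"agreement with target: {agreement:.3f}")
```

The attacker never sees the original training labels or the parameters of the target; the only signal is the label returned for each query, which is what makes the pseudo data set sufficient for copying.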

The transferability of adversarial samples can be defined precisely. Suppose an adversary is interested in producing an adversarial sample $x^*$ that is misclassified into a class different from the class assigned to the legitimate input $x$ by the model $f$. This can be achieved by solving the following optimization problem:

$$x^* = x + \delta_x, \quad \text{where } \delta_x = \arg\min_{z} f(x + z) \neq f(x).$$

The sample $x^*$ is crafted deliberately to mislead the model $f$. However, as mentioned earlier, such adversarial samples are in practice often also misclassified by models $f'$ other than $f$. To facilitate discussion, we formalize the concept of transferability of adversarial samples as

$$\Omega_X(f, f') = \left| \{ f'(x) \neq f'(x + \delta_x) : x \in X \} \right|.$$

The set $X$ represents the expected input distribution of the tasks solved by model $f$ and model $f'$. We divide adversarial sample transferability into two variants that characterize the pair of models $(f, f')$. The first is transferability within a technology, which covers transferability between models trained with the same machine learning technology but with different parameter initializations or data sets (for example, $f$ and $f'$ are both neural networks or both decision trees). The second is crosstechnology transferability, which considers models trained with two different technologies (for example, $f$ is a neural network and $f'$ is a decision tree).
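The transferability count can be illustrated with a toy example: for each input, find the smallest perturbation that changes the label given by the first model, then count how many of those perturbed inputs also change the label given by a second model. The two threshold classifiers, the integer inputs, and the brute-force search below are hypothetical choices made for illustration only; real attacks use gradient-based methods on high-dimensional inputs.

```python
def f(x):
    """Hypothetical model the adversarial samples are crafted against."""
    return 1 if x >= 0 else 0


def f_prime(x):
    """Hypothetical second model with a slightly different decision boundary."""
    return 1 if x >= 1 else 0


def minimal_perturbation(model, x, max_shift=10):
    """Smallest integer shift z (by absolute value) with model(x + z) != model(x)."""
    for z in sorted(range(-max_shift, max_shift + 1), key=abs):
        if model(x + z) != model(x):
            return z
    raise ValueError("no label-changing perturbation within max_shift")


def transferability(source, other, inputs):
    """Number of inputs whose label under `other` changes when the
    perturbation crafted against `source` is added."""
    count = 0
    for x in inputs:
        delta = minimal_perturbation(source, x)
        if other(x) != other(x + delta):
            count += 1
    return count


X = [-4, -1, 0, 2, 5]
print(transferability(f, f_prime, X))  # 2 of the 5 perturbations transfer
print(transferability(f, f, X))        # 5: every sample fools the model it was crafted for
```

In this toy setting only perturbations that cross both decision boundaries transfer, which is why minimal perturbations crafted against one model do not always fool another.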

4.3. Discussion on the Specific Medical Scenario and the Attack

Lung cancer with pulmonary embolism accounts for a large proportion of medical mortality, much of which is due to errors in diagnosing patients with this condition. After several training steps, our system can accurately predict whether a lung cancer patient will also have pulmonary embolism.

This would allow doctors to diagnose the patient accurately and develop a suitable plan to reduce the mortality rate. The system is of great value both medically and economically. However, it can be vulnerable to attacks. The attack we designed steals a trained model. As intellectual property becomes increasingly important, attacks of this kind can severely damage the profit of the model owner and leak patients’ private information. In this paper, we implement a copycat model to steal a trained network for predicting lung cancer with pulmonary embolism and demonstrate that the performance of a trained model can be copied successfully.

The data set we use consists of 179 lung cancer patients with pulmonary embolism, 1372 lung cancer patients without pulmonary embolism, and 71 samples randomly collected from natural data, which together form the original data set (1622 samples in total). Of the 1622 samples, 60% were used as the training set and 40% as the test set. Our system predicted lung cancer with pulmonary embolism with a precision of 79.43%.

5. Experiments and Results

5.1. Implementation of the Platform and the Attack

The process by which Unity releases the VR project to the Android platform is shown in Figure 5. Overall, Figure 5 depicts a copycat model attacking the medical prediction model of lung cancer with pulmonary embolism; in this process, the copycat model plays the role of a thief. The prediction model we established is stored in the cloud. First, we determine the network model used by the copycat and build it in code. We then feed the target model a data set that follows its input requirements and collect the returned labels, which we use to train the copycat network. To make the prediction result easier to observe, we used the Unity 3D platform to generate a 3D lung model: we used code to isolate the lesion area in the CT image of the patient and generated an OBJ file, which was imported into Unity 3D for modelling. The upper part of the figure shows the 3D modelling process, and the lower part shows the copycat model attacking the cloud-hosted prediction model of lung cancer with pulmonary embolism. The whole framework shows the process of the copycat model attacking the MIoT.

The prepared data set is fed into the target network stored in the cloud, which outputs the label corresponding to each sample. The copycat network we selected is then trained on the data set and the stolen labels. During training, the parameters and hyperparameters of the network were continually fine-tuned so that the copycat network fit the target network ever more closely; achieving similar performance meant that our attack was successful.

5.2. Performance of Intelligent Medical Platform

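We use the confusion matrix as the evaluation standard of the intelligent medical platform. In predictive analysis, the confusion matrix (sometimes called a confusion table) is a two-row, two-column table composed of TP (True Positive), FN (False Negative), FP (False Positive), and TN (True Negative). It allows a more detailed analysis than the proportion of correct predictions alone. The following expressions apply these quantities of the confusion matrix:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$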

In a predictive classification model, a large number of TP and TN together with a small number of FP and FN means that the prediction accuracy is high (as can be seen from Figure 6). However, the confusion matrix only records counts. With a large amount of data, it is difficult to judge the quality of a model from raw counts alone. Therefore, secondary and tertiary indicators are derived from the basic counts in the confusion matrix (by adding, subtracting, multiplying, and dividing the lowest-level indicators).

Therefore, after we obtain the confusion matrix of lung cancer with pulmonary embolism, we need to see how many observed values fall in the second and fourth quadrants. These values take up a large proportion of the total (311), which means that our prediction model is effective.

The macro average is the unweighted mean of a metric over the classes, for example, the average of the recall of class 1 and the recall of class 0. The weighted average is calculated using the proportion of samples in each class as the weight. As Table 2 shows, our model achieves high prediction accuracy and precision.
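The relation between the confusion matrix, the per-class metrics, and the two kinds of average can be illustrated with a short calculation. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the results reported in Table 2.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_average(per_class, name):
    """Unweighted mean of a metric over the classes."""
    return sum(m[name] for m in per_class) / len(per_class)


def weighted_average(per_class, supports, name):
    """Mean of a metric weighted by the number of true samples in each class."""
    total = sum(supports)
    return sum(m[name] * s for m, s in zip(per_class, supports)) / total


# Hypothetical counts for the positive class (class 1).
tp, fn, fp, tn = 30, 10, 20, 140

class1 = class_metrics(tp=tp, fp=fp, fn=fn)
# For class 0 the roles are swapped: TN are its hits, FN its false alarms, FP its misses.
class0 = class_metrics(tp=tn, fp=fn, fn=fp)
supports = [tn + fp, tp + fn]          # true samples in class 0 and class 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
macro_recall = macro_average([class0, class1], "recall")
weighted_recall = weighted_average([class0, class1], supports, "recall")

print(f"accuracy        = {accuracy:.4f}")
print(f"macro recall    = {macro_recall:.4f}")
print(f"weighted recall = {weighted_recall:.4f}")
```

With imbalanced classes the macro average treats both classes equally, while the weighted average is dominated by the larger class; for recall, the weighted average coincides with the overall accuracy.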

5.3. Effectiveness of Model Stealing Attack

We trained a CNN to predict LC&PE using an adaptive learning rate that starts at $10^{-4}$ and is reduced when the validation loss plateaus. Other hyperparameters include a batch size of 8, the number of instances set to 200 (unless otherwise specified), the Adam optimizer with a weight decay of 0.01, and binary cross-entropy loss. The implementation is based on PyTorch and runs on an NVIDIA GTX 1070 GPU.
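The idea of lowering the learning rate when the validation loss stops improving can be sketched in a few lines. This is a simplified illustration, not the training code of this paper: the initial rate is read as 1e-4, while the reduction factor of 0.1, the patience of 2 epochs, and the loss values are assumptions made for the example.

```python
def reduce_on_plateau(val_losses, initial_lr=1e-4, factor=0.1, patience=2):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` once the validation loss has failed
    to improve on its best value for more than `patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= factor
                bad_epochs = 0
        history.append(lr)
    return history


# Hypothetical validation losses: improvement stalls at 0.6 for three epochs.
losses = [0.9, 0.7, 0.6, 0.6, 0.61, 0.6, 0.5]
for epoch, lr in enumerate(reduce_on_plateau(losses)):
    print(epoch, lr)
```

In this example the rate stays at 1e-4 for the first five epochs and drops by a factor of ten at the sixth, after three epochs without improvement.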

The Receiver Operating Characteristic (ROC) curve shows the detection capabilities of the trained CNN model and the imitated CNN under different classification thresholds. The abscissa of the plane is the false positive rate (FPR), and the ordinate is the true positive rate (TPR). For the classifier, we can get the TPR and FPR point pairs according to the performance of the classifier on the test sample.
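The construction of the ROC curve can be made concrete with a small example: each distinct score is used as a threshold, and the resulting (FPR, TPR) pairs are joined into a curve whose area can be computed with the trapezoidal rule. The scores and labels below are hypothetical and serve only to show the mechanics.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print(f"AUC = {auc(curve):.4f}")  # 8/9, about 0.8889
```

A classifier that ranks every positive sample above every negative one would reach the top-left corner of the plane and an area of 1.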

Table 3 lists the performance metrics of the Copycat CNN, and Table 4 lists the absolute differences in performance between the original network and the copycat network after training. Taken together, Tables 3 and 4 show that the copycat model attains accuracy almost identical to that of the stolen prediction model of lung cancer with pulmonary embolism. Figure 7 plots the absolute value of the difference between the original network and the copycat network after training; in terms of precision/recall, the performance variations between the Copycat CNN and the original network range from 0.3% to 2.6%. Figure 8 shows the ROC curves for LC with PE and LC without PE in Chongqing and Yunnan. The nearly identical bar charts and the ROC curves close to 1 show that the copycat network we built functions almost the same as the original network. In summary, the performance difference between the network stolen from the target medical platform through the copycat model and the original network is not evident. This means that a deep learning model can be used to steal the target network with only a small amount of labeled data.

Comparing the experimental data, we can see that the differences between the copycat network and the original network are generally small across the metrics; for lung cancer prediction, the differences in accuracy and F1 score are 0. This shows that the gap between the stolen network and the original network is very small, which confirms our hypothesis. We may conclude that the prediction results of the copycat model are 99% identical to those of the original model.
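The two kinds of comparison used here, per-metric absolute differences and the share of identical predictions, can be computed as in the sketch below. All numbers are hypothetical placeholders for illustration and are not the values of Tables 3 and 4.

```python
def absolute_gaps(target_metrics, copycat_metrics):
    """Absolute difference between the two networks for each metric."""
    return {
        name: abs(target_metrics[name] - copycat_metrics[name])
        for name in target_metrics
    }


def agreement_rate(target_preds, copycat_preds):
    """Fraction of test samples on which both networks predict the same class."""
    matches = sum(t == c for t, c in zip(target_preds, copycat_preds))
    return matches / len(target_preds)


# Hypothetical metric values for the target and the copycat.
target_metrics = {"precision": 0.80, "recall": 0.75, "f1": 0.77}
copycat_metrics = {"precision": 0.78, "recall": 0.75, "f1": 0.76}

gaps = absolute_gaps(target_metrics, copycat_metrics)
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}")

# Hypothetical predictions of both networks on ten test samples.
target_preds = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
copycat_preds = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(f"agreement: {agreement_rate(target_preds, copycat_preds):.2f}")
```

Small gaps in every metric together with a high agreement rate indicate that the copycat reproduces both the quality and the individual decisions of the target.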

6. Conclusions

In this paper, we establish a new platform based on surgical IoT for cybersecurity study. On the established intelligent medical platform, we propose a CNN for predicting lung cancer with pulmonary embolism. To demonstrate an attack on an established model on the surgical IoT platform, we implemented a copycat model that mimics the target CNN and is trained with a small number of samples labeled by the target network. Experimental results show that the copycat model can replicate the performance of the target CNN with only a minor performance difference (less than 3%). The success of the attack shows that intellectual property such as a trained AI model built on private and sensitive information can be stolen. How to prevent such attacks effectively is an open question for researchers in the fields of cybersecurity, MIoT, and deep learning.

Data Availability

The data supporting the results of this study can be obtained from the corresponding author.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Authors’ Contributions

Liqiang Zhang and Gunjun Lin contributed equally to this paper.

Acknowledgments

We thank Professor Jun Peng of the Yunnan First People’s Hospital for the helpful data processing guidance and Xuejuan Wang and Shangjin Lv for collecting the data together. This research is funded by the National Natural Science Foundation of China (61741516) and the National Science Foundation of Yunnan Province, China (ZD2014004) of Yunnan Key Laboratory of Optoelectronic Information Technology, Kunming, China.