A Privacy Risk Assessment Model for Medical Big Data Based on Adaptive Neuro-Fuzzy Theory
Information leakage in the medical industry has become an urgent problem in Internet security. Privacy protection in the big data environment requires automated or semiautomated authorization management, and traditional privacy protection models cannot adapt to this complex, open environment. Although some scholars have studied risk assessment models for privacy disclosure in the medical big data environment, this work is still at an early, exploratory stage. Through literature review and expert consultation, this paper analyzes, from the user’s perspective, the key indicators that affect medical big data security and privacy leakage, including user access behavior and trust. Based on the user’s historical access information and interaction records, the user’s access behavior and trust are then quantified with the help of information entropy and probability, and explicit definitions are given. Finally, the experiments are described in terms of the experimental environment, the experimental data, and the experimental process, and the model’s predictions are compared with the actual output through 10-fold cross-validation in Matlab. The results indicate that the model is feasible. In addition, the method is compared with more classical medical big data risk assessment models, and the results show that when the proportion of illegal users is less than 15%, the proposed model achieves higher accuracy and recall.
With the development of information technology, the era of big data has arrived, bringing both opportunities and challenges to all walks of life. The medical field is a special case: its data are closely tied to every person’s life and cover the whole life course, including food, clothing, housing, life, illness, and death, which makes them a core asset of big data. The outline of the “Healthy China 2030” plan, issued by the CPC Central Committee and the State Council in October 2016, pointed out that the total size of the health service industry would reach 8 trillion by 2020 and 16 trillion by 2030. In the future holographic digital era, each person will generate about 605 Tbit of data in a lifetime, and the country will generate 1,000 Zbit of data every year, so the industry has extremely broad prospects. Globally, medical big data has become an emerging industry that promotes economic development, and it provides a basic resource for scientific and technological innovation.
At present, China has carried out major “Internet + medical” engineering projects such as telemedicine and cross-domain medical care, which have brought great convenience to people’s lives. But technology is a double-edged sword: it brings convenience while also introducing risks. In its 2016 “Internet Security Threat Report,” Symantec listed the ten industries with the most severe data leakage, and the medical industry ranked first. In addition, the survey found that more than 90% of medical social security information in the United States has been sold in recent years, and more and more medical equipment is out of control. In most hospitals in China, the hospital information system (HIS) has no privacy protection at all. A higher-level doctor who wants to obtain patient information only needs to log in to a terminal to obtain a patient’s entire medical record, while ordinary doctors can obtain all patient information available at their own workstations. A set of data from the United States illustrates the current status of the medical industry: medical equipment without any security protection accounts for 77%, and medical equipment with some security strategy accounts for 27%. In addition, 17% of attacks came from medical equipment, 75% of the traffic on hospital LANs was not monitored or audited, and the hospitals themselves knew that patient privacy was leaking every day. Although the privacy leak rate of medical big data in China is slightly lower, personal privacy leaks occur from time to time, and there are currently no complete laws and regulations on personal privacy protection. Medical data are special because their main source is “people.” At every level of application, they involve personal privacy and social stability.
Therefore, in the process of the rapid development of the big data Internet, in order to better serve the people, accelerate the development of the digital economy, and promote the integration and open sharing of medical data resources, research on the privacy protection of medical big data is imperative.
The rest of this paper is organized as follows. Section 2 reviews research progress at home and abroad in two areas: medical big data security and privacy protection technology, and risk-based access control. Section 3 introduces the relevant theories and principles, formalizes the definition of the risk indicators, and combines fuzzy theory with a neural network to establish a risk quantification model based on adaptive neuro-fuzzy theory. Section 4 presents simulation experiments that test the feasibility and efficiency of the model. Section 5 summarizes the work of this paper.
2. Related Works
2.1. Medical Big Data Security and Privacy Protection Technology
Scholars at home and abroad have carried out related research on medical big data security and privacy leakage. A review of the relevant literature shows that current academic research in this field mainly adopts technologies such as differential privacy, encryption algorithms, anonymization, and authentication, while research on access control is scarce. For example, literature [2, 3] protects sensitive information in the patient’s genome sequence with differential privacy and homomorphic encryption; literature [4, 5] focuses on medical big data generated by wearable medical sensors and achieves user privacy protection through agitation thresholds and the introduction of binary trees; literature [6–10] studies personal privacy protection at the technical level, establishes differential privacy protection models, and offers suggestions for the privacy protection of medical big data; Zhang and Zhang advocated the use of data encryption technology to ensure the security of medical data before, during, and after an incident; Tian et al. proposed an attribute-based encryption method that uses a data access structure as the authorization policy, so that a decryptor may request access to data only when its attributes satisfy the structure; He et al. protected patient identity information and data confidentiality through the anonymity of user identity and mutual authentication between the client, server, and network administrator.
Xing designed a disease-based secure routing protocol and an emergency response scheduling mechanism based on the social layer and cloud service layer of the wireless medical network; Wei and Xu proposed that privacy protection technology should take the three nodes of medical big data generation, storage, and computation as its starting point in order to strengthen the “medical network technology structure”; Wang et al. analyzed privacy leakage in regional medical care and suggested that privacy protection work should start from the data sharing platform; Chen analyzed the possible causes of medical data leakage from a systematic perspective and suggested that information should be protected through means such as anonymization, access control, and hierarchical management; Mounia and Habiba analyzed the opportunities and challenges faced by the medical field in the context of big data and then discussed privacy protection issues and coping methods for medical information; Gao and Sang first summarized the characteristics of medical big data, then outlined the entire life cycle of medical data on this basis, and finally discussed the problems faced at each stage.
In addition, some scholars have studied the security and privacy protection of medical big data from the perspective of ethics and law. Liu and Wang analyzed the ethical problems in the protection of medical information privacy from the perspective of the patient’s right of informed consent and the relationship between risks and benefits, and they proposed coping strategies: adhering to the principles of transparency and autonomy, establishing a comprehensive medical information privacy protection law, and constructing a medical-oriented access control system to achieve the privacy protection of medical big data.
2.2. Risk-Based Access Control Technology
Risk refers to the harm that may result when an event occurs. From a management perspective, risk-based access control applies risk assessment as a decision-making tool for access control and grants access to subjects dynamically. The concept of risk in this setting was first proposed in earlier literature, which provides the principles and suggestions that a risk-based access control model should satisfy. Kandala et al. proposed an attribute-based risk access control framework. The authors constructed an attribute-based RAdAC model from the user’s access purpose, user credibility, historical access behavior, and device attributes, but did not provide a specific risk quantification method. Literature [23, 24], by contrast, gives a method for quantifying risk based on factors such as the subject’s security level, the sensitivity of the objects, and the mutually exclusive relationships between objects. In addition, a risk-aware RBAC model has been proposed in the literature; it considers three factors, namely user trust, the user’s ability to assume roles, and the compatibility between roles and permissions, and presents a risk assessment model that combines them.
In literature [26, 27], users’ viewing, modifying, and deleting of medical records are evaluated in terms of the integrity, availability, and confidentiality of the records, according to risk assessment principles, context, and other information. Wang and Hong statically calculated the risk of a doctor’s access behavior by measuring the deviation between the resources accessed by the doctor and the work objective. Another risk-based access control system in the literature is not specific to the medical field but was proposed for dynamic environments such as the medical industry. It considers both the user’s historical and recent access behavior, and it dynamically adjusts the user’s trust and access risk accordingly. Choi et al. constructed a context-based risk access control framework for medical information, which decides whether to grant access rights based on authority files, user access logs, and context information. Hui et al. improved on earlier work by considering not only the degree of deviation between the medical information accessed by doctors and the work objective but also the possibility that doctors may steal patients’ privacy by forging work objectives, that is, the degree of deviation between the work objective selected by a doctor and the patient’s condition. Literature [32, 33] analyzed the risk indicator system affecting the privacy leakage of medical big data in the cloud environment across the stages of collection, transmission, storage, and use, without designing a specific risk quantification model. Literature [34–36] established risk assessment models for medical big data with the help of fuzzy theory, but this approach has obvious disadvantages: the fuzzy rules and membership functions are determined from expert experience, so the results are highly subjective.
2.3. Summary and Analysis of Research Status at Home and Abroad
A comprehensive analysis of the relevant literature at home and abroad shows that some scholars are already working at the intersection of information science and medicine and have achieved good results. From the perspective of privacy protection technology and methods, existing work can be roughly divided into techniques based on anonymity and techniques based on differential privacy. From the perspective of big data security technology, current research is mainly based on cryptography. From a management perspective, existing work falls into two categories: one uses electronic information technology to monitor networks, platforms, and management systems; the other uses computational methods, such as machine learning, to analyze and mine medical data.
Although there are some similar studies from the perspective of risk, this line of work is still at an early stage of exploration, and there is no mature theoretical model system; in particular, research on risk-based privacy protection for medical big data is extremely scarce.
The main contributions of this article are as follows:
(1) Due to the particularity of the medical field, it is difficult to determine whether a user is an “illegal user” based on the user’s access behavior alone. Therefore, this article introduces the user’s trust value as a second risk evaluation indicator. Access behavior and trust jointly evaluate users’ access requests to reduce the possibility of system misjudgment.
(2) This paper uses mathematical methods such as information entropy, neural networks, fuzzy theory, and probability to establish an adaptive fuzzy neural network model. First, information entropy and probability are used to quantify the risk indicators. Then, the knowledge expression ability of fuzzy theory and the self-learning ability of the neural network are combined, so that the data processing can be presented in a way that people can easily accept and, at the same time, the risk can be dynamically predicted as the scenario changes.
3. Risk Assessment Model Based on Adaptive Neural Fuzzy Theory
Fuzzy theory solves the problems of unclear and uncertain boundaries in intelligent systems by imitating human perception and reasoning . From the perspective of practical application, the application of fuzzy theory mainly focuses on the fuzzy system, especially on fuzzy control. For example, the fuzzy expert system in the medical field is often used for medical diagnosis and decision support . However, there are some disadvantages of fuzzy theory in practical application. For example, in the fuzzy control system, the corresponding rule base should be established according to experience, and the number of rules increases exponentially with the increase of input variables. In addition, the selection of membership functions and optimization work need to be completed manually, and the workload of fuzzy systems in the big data environment becomes extremely complicated . However, the biggest feature of neural networks is to automatically learn new things by imitating the thinking mode of the human brain. The introduction of neural networks into fuzzy theory can help people deal with complex tasks such as rule bases and membership function optimization in fuzzy systems. Therefore, the combination of the two methods can not only improve the expression and learning ability of fuzzy systems but also make the processing of neural networks appear in a way that people can easily accept.
Before introducing how to deal with the security and privacy issues of medical big data with an adaptive neuro-fuzzy system, the relevant theories involved in this model are firstly introduced.
3.1. Relevant Theories and Principles
The risk assessment model based on adaptive neural fuzzy theory mainly involves three key concepts of neural network, neural fuzzy theory, and adaptive neural fuzzy theory. The related content will be described in detail below.
3.1.1. Basic Principles of Neural Networks
A neural network is a network structure formed by the interconnection of many neurons. According to the different connection methods, neural networks can be divided into feed-forward neural networks, feedback neural networks, and self-organizing networks. This article mainly uses feed-forward neural networks, so the other two connection methods will not be introduced here.
Feed-forward neural networks are mainly composed of three parts: the input layer, hidden layer, and output layer. As shown in Figure 1, each circle represents a neuron node, and the output of each layer of neurons will be used as the input of the next layer of neurons .
A BP (back-propagation) neural network is a typical feed-forward neural network. The basic idea is to compute the error at the output layer, use it to estimate the error of the preceding layer, and then use that error in turn to estimate the error of the layer before it. The weight coefficients of the neurons in each layer are adjusted along the way, and the process repeats until the final error falls within an acceptable range.
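The back-propagation idea described above can be illustrated with a minimal sketch. It assumes a single hidden layer, sigmoid activations, a squared-error loss, and no bias terms; the layer sizes, learning rate, and sample values are arbitrary choices for illustration and are not taken from this paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One forward/backward pass for one sample; returns the error before the update."""
    # Forward pass: input -> hidden -> output (biases omitted for brevity).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    error = 0.5 * (target - out) ** 2

    # Backward pass: the output-layer error term is computed first ...
    delta_out = (out - target) * out * (1.0 - out)
    # ... and then propagated back to each hidden neuron through the output weights.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]

    # Gradient-descent weight updates, layer by layer.
    for j in range(len(w_out)):
        w_out[j] -= lr * delta_out * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * delta_hidden[j] * x[i]
    return error

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs, 4 hidden neurons
w_out = [random.uniform(-1, 1) for _ in range(4)]
sample, target = [0.2, 0.7, 0.5], 0.9  # hypothetical training sample

errors = [train_step(sample, target, w_hidden, w_out) for _ in range(200)]
```

Repeating the step drives the error on the sample down, which mirrors the "adjust until the error is acceptable" loop in the text; a practical network would also use bias terms and many samples.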
3.1.2. Neural Fuzzy Theory
Although the neural network has strong self-learning ability, its modeling and data processing behave like a black box, and the processing cannot be presented in a way that people can easily accept. Combining the ability of fuzzy theory to express the learning process with the self-learning ability of neural networks is therefore a natural choice. In a fuzzy system, fuzzy models can be divided into two types according to their output. One is the Mamdani-type fuzzy model, and the other is the Takagi–Sugeno-type fuzzy model. The output of the former is a fuzzy set, while the output of the latter is a linear combination of the input variables or a constant. Because a specific risk value is ultimately calculated in this article, this section mainly introduces the combination of the Takagi–Sugeno fuzzy model and the neural network, as shown in Figure 2.
The first layer receives the input variable and passes it to the second layer. The second layer fuzzifies the input variables and calculates the membership function of each variable, where represents the input variable and represents the number of fuzzy sets corresponding to the variable . The third layer is used to match the antecedents of the fuzzy rules, and each node represents a rule. For a specific input , represents the fitness of each rule, where , , , , , and represents the total number of rules. The fourth layer performs normalization processing according to the antecedent of the rule, that is, , where . The fifth layer performs defuzzification on the aggregated results of the rules to obtain the output result , . For a more intuitive representation, can be written in the form of the following vector :
However, a fuzzy control system built with the help of neural networks still faces difficulties in practice, such as tuning parameters and determining the number of hidden layers.
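The five layers described above can be sketched as a single forward pass. This is only an illustration under stated assumptions: Gaussian membership functions, the product as the rule-matching operator, and first-order linear consequents are common choices for Takagi–Sugeno networks but are not confirmed by this text, and all names and numbers below are hypothetical.

```python
import math

def gaussian(x, center, width):
    """Membership degree of x in a fuzzy set (Gaussian shape is an assumption)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def ts_forward(x, fuzzy_sets, rules):
    """Forward pass through the five layers for one input vector x.

    fuzzy_sets[i] is a list of (center, width) pairs for input i.
    Each rule is (set_indices, coeffs): one fuzzy-set index per input and
    the linear consequent coefficients [c0, c1, ..., cn].
    """
    # Layer 1 passes the inputs through; layer 2 fuzzifies each input.
    memberships = [[gaussian(xi, c, w) for c, w in sets]
                   for xi, sets in zip(x, fuzzy_sets)]
    # Layer 3: fitness (firing strength) of each rule, here a product of degrees.
    strengths = [math.prod(memberships[i][k] for i, k in enumerate(idx))
                 for idx, _ in rules]
    # Layer 4: normalize the strengths so that they sum to 1.
    total = sum(strengths)
    normalized = [s / total for s in strengths]
    # Layer 5: weighted sum of the linear rule consequents (defuzzified output).
    consequents = [coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))
                   for _, coeffs in rules]
    output = sum(n * y for n, y in zip(normalized, consequents))
    return output, normalized

# Two inputs with two fuzzy sets each ("low", "high") and four constant-output rules.
fuzzy_sets = [[(0.0, 0.5), (1.0, 0.5)], [(0.0, 0.5), (1.0, 0.5)]]
rules = [((0, 0), [0.1, 0.0, 0.0]),
         ((0, 1), [0.4, 0.0, 0.0]),
         ((1, 0), [0.6, 0.0, 0.0]),
         ((1, 1), [0.9, 0.0, 0.0])]
y, weights = ts_forward([0.5, 0.5], fuzzy_sets, rules)
```

At the midpoint input all four rules fire equally, so the output is the plain average of the four consequents. Training would adjust the centers, widths, and consequent coefficients, which is the part the neural network contributes.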
3.2. Quantification of Medical Big Data Risk Based on Adaptive Neuro-Fuzzy Theory
At present, hospital data are basically stored in a local area network. Generally, the outside world cannot steal the patient’s private information. In addition, the patient’s information will be printed out and stored in the medical record room after the patient is discharged. The workstation of the ordinary user can only query the recent patient’s medical information. However, in order not to affect the normal work of the doctor, some highly qualified doctors or experts will be granted extremely high permissions. They can not only access the patient information of their workstations but also log in to the hospital’s information center to view all the patients’ treatment information. Therefore, their access behavior needs to be evaluated to prevent them from stealing or snooping on patient information.
For convenience of description, this article divides users into two categories: legal users and illegal users. Legal users generally access only medical records within their own scope of responsibility. Illegal users, in order to steal more patient information, will also access related medical records by falsifying the patient’s condition while completing their own work, or will access medical records of patients that are unrelated to the condition. Therefore, legal users can be distinguished from illegal users based on differences in their access behavior. However, some special situations also need to be considered, such as a patient whose condition is rare and difficult to diagnose. To ensure the accuracy of the diagnosis, legal users may access additional information from the database that is not related to their work objective, and the more senior the expert, the more often he or she is asked to diagnose such difficult diseases.
In this case, it is difficult for the system to judge whether the user is legal or illegal based on access behavior alone, and the access request of a legal user may be rejected because of misjudgment. To solve this problem, we introduce the user’s trust and evaluate each access request using the user’s access behavior and trust together. When a user’s access behavior is abnormal, the system also considers the user’s trust: if the trust is very high, access may still be allowed, but if the trust is low, the user is considered at risk of stealing patient privacy. To some extent, this reduces the possibility of misjudgment. Therefore, this article evaluates user access requests mainly from two aspects: user access behavior and user trust.
3.2.1. Formal Definition of Risk Indicators
Before quantifying the risk of privacy leakage of medical big data, this section first formalizes the key index factors that affect the privacy of medical data and turns it into specific mathematical problems. A user’s access is recorded as a six-tuple:where represents the set of all access requesters, including doctors, nurses, technicians, administrators, or the initiators of other actions; represents the set of all patients in the hospital, and each patient has corresponding medical records; represents a set of task objectives, which is an activity set corresponding to a business process, and each user has his own work objective; represents the collection of patient medical information, including basic patient information, medical conditions, and medical records; represents the trust of all users; is the result value of risk quantification; and , , , and , respectively, represent the number of users, patients, work objectives, and medical records.
Since the Adaptive Neural Fuzzy Inference System (ANFIS) cannot identify qualitative index factors, it is necessary to formally describe the user’s access behavior and trust so that it turns into a quantitative mathematical problem. This section mainly quantifies the user’s access behavior and trust value. The following will introduce the quantification method and process of indicators in detail:
(1) Quantification of User Access Behavior. To compare differences in access behavior between users, we use information entropy to describe a user’s access behavior. Suppose $X$ is a discrete random variable with probability distribution $p(x)$; then, the entropy of $X$ is
$$H(X) = -\sum_{x} p(x) \log p(x).$$
Following the cited literature, the probabilities that a user selects a work objective and accesses a medical record are first defined from the user’s historical access records; the information entropy of the work-objective selection stage and of the medical-record access stage is then defined.
Definition 1. Probability that user selects work objective when diagnosing patient .where represents the set of work objective that user accesses when treating patient and represents the number of times the user selects work objective .
Definition 2. Probability that user selects medical record under job objective .where represents the set of medical records accessed by the user when patient and work objective are determined and represents the number of times the user accesses medical records .
Definition 3. Entropy for choosing work objectives (EFCWO) when user diagnoses patient .
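Definitions 1 and 3 can be illustrated with a short sketch: probabilities are estimated as relative frequencies in the access history, and the entropy is the Shannon entropy of that distribution. The function names, the base-2 logarithm, and the sample history are assumptions made for illustration; the same two functions would apply to medical records accessed under a fixed work objective (Definitions 2 and 4).

```python
import math
from collections import Counter

def selection_probabilities(history):
    """Relative frequency of each item (work objective or medical record) in a history."""
    counts = Counter(history)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}

def entropy(probabilities):
    """Shannon entropy of a discrete distribution (base 2 is an assumption)."""
    return -sum(p * math.log2(p) for p in probabilities.values() if p > 0)

# Hypothetical history: work objectives one user chose while treating one patient.
objectives = ["diagnosis", "diagnosis", "prescription", "diagnosis"]
p = selection_probabilities(objectives)
efcwo = entropy(p)
```

A user who always picks the same objective has entropy 0, while a user who spreads choices evenly over many objectives has high entropy, which is what makes the measure useful for comparing access behavior.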
Definition 4. Entropy of access to medical records (EATMR) when user is under job objective .
(2) Quantification of User Trust. Based on existing research, this paper divides trust into direct trust and recommended trust according to the way the trust is obtained. The following first introduces the related concepts and definitions.
Definition 5. (trust). Trust refers to the dependency relationship between entities. Even when one party believes that the other is trustworthy and upright, trust carries certain risks, because trusting the other party means bearing the losses and harm that the other party’s behavior may cause.
Definition 6. (trust value). Trust is an evaluation between entities, which itself has a certain degree of uncertainty and ambiguity; the trust value is a quantification of this uncertainty and is expressed by . Among them, represents direct trust, represents recommendation trust, and and represent two entities.
Definition 7. (trust matrix). It refers to a matrix composed of the trust between entities in a specific context, denoted by .where the element represents the trust degree of entity to entity and the diagonal elements are all 1.
According to the relevant definitions, we evaluate the user’s trust from two aspects: direct trust and recommended trust.
(a) Direct Trust. When evaluating the trust degree of the user , if the evaluation result is the direct experience from the user , the relationship between and is called a direct trust relationship. Assuming that, during the users’ historical interaction, the number of successful interactions between user and user is and the number of interaction failures is , the direct trust relationship function between user and user is defined as in which increases with the number of successful interactions. The relationship function is valid only if there are sufficient historical data; that is, the number of interactions between the two users must be sufficient. If the number of interactions is very small, the accuracy of the results will be affected. To solve this problem, this paper follows the literature and introduces the interaction threshold . When the number of interactions is less than the threshold , the abovementioned formula is adjusted as follows: Therefore, the final direct trust relationship between user and user can be expressed by the following relationship function:
(b) Recommendation Trust. Recommendation trust differs from direct trust in that there is no direct empirical relationship between the trustee and the client ; instead, the trust relationship is established indirectly through the introduction of acquaintances. When there is no direct interaction experience between and , or the interaction experience is very limited, in order to be able to objectively evaluate the trust of , can establish an indirect trust relationship with through the introduction of acquaintance . As shown in Figure 3, and are direct trust relationships, and are also direct trust relationships, but and are recommended trust relationships established through .
As shown in Figure 3, there are two indirect recommended paths between users and : and ; each path corresponds to a trust value, and the greater the path depth, the lower the trust between entities. Therefore, all reachable paths must be taken into account to obtain the final trust degree between and . Assuming that the path depth is and the reachable path is , the corresponding recommendation trust degree has the following definition : Finally, the comprehensive recommended trust degree between and is calculated over all possible path depths and corresponding reachable paths. where and , respectively, represent the minimum path depth and the maximum path depth and indicates the weight of the corresponding recommendation trust when the path depth is and satisfies .
(c) Comprehensive Trust. The comprehensive trust degree is the result of combining the direct trust degree and the recommended trust degree in a certain way. This article uses the following expression to express it: Among them, represents the proportion of direct trust in the comprehensive trust. At present, there is no unified standard for the value of , which is generally determined subjectively based on expert experience.
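The three trust quantities can be sketched as follows. The exact expressions are not reproduced here, so every formula in this block is an assumption chosen to match the verbal description: direct trust as a success ratio that is pulled toward an initial value when interactions fall below the threshold, path trust as the product of the direct trusts along a recommendation path (so deeper paths yield lower trust), recommendation trust as a depth-weighted average whose weights sum to 1, and comprehensive trust as a weighted sum of the two. All function and parameter names are hypothetical.

```python
def direct_trust(successes, failures, threshold, initial=0.5):
    """Assumed form: success ratio, blended with an initial value when data are scarce."""
    total = successes + failures
    if total == 0:
        return initial
    ratio = successes / total
    if total >= threshold:
        return ratio
    # Below the interaction threshold, weight the ratio by the amount of evidence.
    weight = total / threshold
    return weight * ratio + (1.0 - weight) * initial

def path_trust(direct_values):
    """Assumed form: trust along one path is the product of the direct trusts on it."""
    result = 1.0
    for value in direct_values:
        result *= value
    return result

def recommendation_trust(paths_by_depth, depth_weights):
    """Average the path trust at each depth, then combine depths with weights summing to 1."""
    assert abs(sum(depth_weights.values()) - 1.0) < 1e-9
    total = 0.0
    for depth, paths in paths_by_depth.items():
        mean = sum(path_trust(p) for p in paths) / len(paths)
        total += depth_weights[depth] * mean
    return total

def comprehensive_trust(direct, recommended, direct_share):
    """Weighted combination; direct_share is the proportion given to direct trust."""
    return direct_share * direct + (1.0 - direct_share) * recommended

dt = direct_trust(8, 2, threshold=5)          # enough interactions: plain ratio
dt_sparse = direct_trust(2, 0, threshold=4)   # too few interactions: pulled toward 0.5
rt = recommendation_trust({2: [[0.9, 0.8]], 3: [[0.9, 0.5, 0.8]]}, {2: 0.7, 3: 0.3})
ct = comprehensive_trust(dt, rt, direct_share=0.6)
```

Because each direct trust lies in [0, 1], multiplying along a path guarantees that a longer chain of acquaintances never produces more trust than a shorter one through the same entities, which matches the behavior described for Figure 3.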
3.2.2. Risk Quantification Method Based on Adaptive Neural Fuzzy Theory
Quantifying the risk of medical big data privacy leakage is a very complicated process because the access behavior and trust of users at each stage are mutually independent and interrelated, and different indicators have different effects on the final risk, which is a nonlinear changing relationship. This article establishes a risk quantification method based on adaptive neuro-fuzzy theory, aiming at solving some problems existing in existing methods, providing a risk assessment model and method specifically for medical big data information security, achieving academic innovation, and providing reference for relevant institutions.
Adaptive neural fuzzy theory uses the self-learning ability of the neural network to learn from existing data and automatically generates the rule base and membership functions of the fuzzy system, without relying on subjective factors such as expert experience. Matlab provides an adaptive neuro-fuzzy inference system based on the Takagi–Sugeno model. As shown in Figure 4, the quantification process of medical big data privacy leakage risk based on this inference system can be roughly divided into the following four steps:
Step 1: the user’s access behavior data and trust data are preprocessed, and the processed data are loaded into the Matlab workspace.
Step 2: fuzzy C-means clustering or subtractive clustering is used to process the input data and generate the initial fuzzy inference system (FIS).
Step 3: starting from the initial FIS, the adaptive neural fuzzy inference system trains the inference system on the existing data, correcting and adjusting the parameters of each membership function and output function, and generates the final FIS.
Step 4: according to the final training results, the membership functions and rule base for the user’s access behavior, trust, and final risk are recorded.
4. Simulation Experiment
4.1. Experimental Environment
The experiments mainly use Matlab to model the network structure designed in this paper and to carry out the data processing. The performance of the model is then tested. The configuration of the experimental environment is shown in Table 1.
4.2. Experimental Data
As noted above, training a fuzzy neural network requires both an input data set and a corresponding output data set. Before the experimental test, this article therefore prepares three inputs, entropy for choosing work objectives (EFCWO), entropy of access to medical records (EATMR), and the user’s trust (UT), together with one output, risk (Risk).
At present, we have obtained part of the user information table, the medical orders, and the user access records from a hospital. The user information table mainly includes fields such as the user’s ID, department, and title; the medical order is what we usually call the electronic medical record, which mainly records the patient’s medical records, medical plans, and other information; the user access record is extracted from the user’s access log, which records the computer model, login time, user access information, and user’s operation. In addition, for partially missing fields, we assume that appropriate software components can automatically obtain the information from the system, such as the numbers of successful and failed interactions between users and the interaction relationship, and can dynamically adjust these factors when the context changes. The initial value of is set as 0.5, the interaction threshold as , and the weight of the direct trust as . Formulas (5), (6), and (14) are combined to simulate the generation of the user’s EFCWO, EATMR, and UT.
The output data are calculated through a risk index. Related risk quantification methods are introduced in the literature [25, 28, 31, 34–36]. Building on this research, this paper uses cross entropy to measure how far an individual user’s access behavior deviates from the access behavior of all users and calculates the risk from that deviation. Assume that the trust of a user is , the risk caused by choosing the work objective is , and the risk caused by accessing medical records is ; then, the risk calculation formula introduced in the literature can be used to simulate and generate the corresponding output data set.
In summary, 1500 records were generated for the simulation experiments in this paper. To make use of all the data and reduce the influence of chance on the test results, this paper uses 10-fold cross validation to test the accuracy of the model. The data set is divided into ten equal parts. In the first run, the first part is used as testing data and the remaining nine parts as training data; in the second run, the second part is used for testing and the remaining nine parts for training; and so on, until ten runs have been completed. In this paper, the first group of data (training data1, testing data1) is used to introduce the whole experimental process in detail.
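The cross-entropy idea in the preceding paragraph can be illustrated with a small sketch. It is not the risk formula from the cited literature, which is not reproduced in this section; the four record categories and all probabilities are hypothetical, and the deviation is taken here as cross entropy minus the user's own entropy (the Kullback–Leibler divergence), which is one plausible reading.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log2(q_i), in bits.

    p is one user's access distribution, q the distribution over all users.
    eps guards against log(0) when q gives a category zero probability.
    """
    return -sum(pi * math.log2(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical access distributions over four record categories.
all_users = [0.40, 0.30, 0.20, 0.10]
typical_user = [0.40, 0.30, 0.20, 0.10]
unusual_user = [0.05, 0.05, 0.10, 0.80]

# Deviation = H(p, q) - H(p): zero when the user behaves like the population.
typical_dev = cross_entropy(typical_user, all_users) - entropy(typical_user)
unusual_dev = cross_entropy(unusual_user, all_users) - entropy(unusual_user)
```

A user whose access pattern matches the population has a deviation of zero, while a user concentrated on a rarely accessed category has a clearly positive one.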
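The 10-fold split described above can be sketched as follows. The sketch assumes contiguous folds without shuffling, which the text does not specify, and the function name `k_fold_indices` is illustrative.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when n_samples % k != 0.
        stop = n_samples if fold == k - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx

# 1500 records split into ten runs of 1350 training and 150 testing records.
folds = list(k_fold_indices(1500, k=10))
train1, test1 = folds[0]  # corresponds to (training data1, testing data1)
```

Each record is used for testing exactly once and for training in the other nine runs.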
4.3. Experimental Process
As shown in Figure 4, the risk quantization method based on adaptive neural fuzzy theory is roughly divided into five steps: loading data, generating initial FIS, training FIS, generating final FIS, and outputting results. This section will follow these five steps to conduct the following operations and display the results.
4.3.1. Load Data
First, we need to load the training data into the workspace to form a multi-input single-output data matrix, where the last column is treated as the output data by default because the Takagi–Sugeno model-based fuzzy inference system supports only single-output data formats. Through the graphical interface window shown in Figure 5, training data1 is loaded into the Matlab workspace, and the final data distribution is shown in Figure 6.
4.3.2. Generating the Initial FIS
Before the FIS is trained, an initial FIS structure is required. In this paper, fuzzy C-means clustering is used to extract the features of the input data and generate the initial FIS. The output of this clustering method is the membership degree of each data point to each cluster center. The cluster centers are corrected iteratively until the membership-weighted sum of the distances from the data points to the cluster centers is minimized, and the result can then be used to establish the fuzzy inference system. Subtractive clustering, by contrast, is mainly used to estimate the number of clusters and the locations of the cluster centers, so the fuzzy C-means clustering algorithm is selected. The fuzzy subsets of the input variables EFCWO, EATMR, and UT are each divided into 4 categories: very low (VL), low (L), medium (M), and high (H). According to the distribution characteristics of the user’s risk indicators, most users’ EFCWO, EATMR, and UT values lie near the mean, and users with very low or very high values of the three indicator variables account for only a small part. The general trend is consistent with the characteristics of the Gaussian distribution. Therefore, Gaussian membership functions are selected for the input variables, while the output variable can only be a constant or a linear combination of the input variables. The resulting neural network structure and the membership function corresponding to each index before system training are shown in Figures 7–10.
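The fuzzy C-means iteration described above can be sketched for a single input variable. This is a minimal illustration, not the Matlab routine used in the experiments; the fuzzifier `m = 2`, the initial centers, and the sample values are assumptions.

```python
def fuzzy_c_means_1d(xs, centers, m=2.0, n_iter=100, tol=1e-9):
    """Fuzzy C-means on scalar data.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which xs[i] belongs to cluster j, and each row sums to 1.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in xs:
            dists = [abs(x - c) for c in centers]
            if any(d == 0.0 for d in dists):
                # x sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([h / sum(hits) for h in hits])
            else:
                memberships.append(
                    [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
                )
        # Move each center to the membership-weighted mean of the data.
        new_centers = []
        for j in range(len(centers)):
            weights = [u[j] ** m for u in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, xs)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Hypothetical values of one indicator, clustered into 4 fuzzy subsets.
xs = [0.05, 0.08, 0.30, 0.33, 0.55, 0.58, 0.85, 0.88]
centers, memberships = fuzzy_c_means_1d(xs, centers=[0.0, 0.33, 0.66, 1.0])
```

The four resulting centers can serve as the centers of the VL, L, M, and H Gaussian membership functions of an initial FIS.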
4.3.3. Training the Initial FIS and Generating the Final FIS
Based on the initial FIS structure, the neuro-fuzzy inference system is trained on the data loaded into the workspace. Before training, the training method, error tolerance, and number of training epochs must be determined. The available training methods are a hybrid algorithm and the backpropagation (BP) algorithm. In this paper, the hybrid algorithm is used to train the FIS, the error tolerance is set to 1e-5, and the number of training epochs is set to 20. As shown in Figures 11–13, after training, the range of the fuzzy subsets of the input variables and the shape of the membership function of each index have changed, but the general trend still conforms to the Gaussian distribution. In addition, the ANFIS model structure does not change during training; only some of its parameters change.
4.3.4. Output Results
This article combines fuzzy theory and neural networks, using the self-learning ability of neural networks to adaptively train the membership functions and rule base of the fuzzy inference system. At the same time, fuzzy theory presents the input-output relationship learned by the neural network in a form that people can easily understand. This section therefore presents the results of the training in Section 4.3.3 formally.
(1) Membership function of input variables: the membership functions of the input variables after training are shown in Figures 11–13, and all conform to the Gaussian distribution. From these, the parameters of each membership function can be obtained, as shown in Table 2. The Gaussian membership function has the form $f(x; \sigma, c) = e^{-(x-c)^{2}/(2\sigma^{2})}$; substituting the parameters in Table 2 gives the membership function expression of each input variable, such as .
(2) Membership function of output variables: there are 3 input variables, and each input variable corresponds to 4 fuzzy subsets, so the input combinations produce $4^3 = 64$ output functions. As shown in Figure 14, the output variable corresponds to 64 membership functions. Among them, the membership function parameters corresponding to u1 after training are [0.03051 0.01467 0.01683 0.1191]; with the inputs EFCWO, EATMR, and UT written as a, b, and c, the output function u1 is $u_1 = 0.03051a + 0.01467b + 0.01683c + 0.1191$, and all the other output functions can be obtained in the same way. For convenience, only the parameters corresponding to each output function are listed here, as shown in Table 3.
(3) Rule base: from the input and output membership functions of each indicator, the rule base shown in Table 4 is easy to obtain. For convenience of writing, a, b, and c are used instead of the indicators EFCWO, EATMR, and UT.
To represent more intuitively how changes in each indicator variable affect the final risk, this article fixes one of the indicator variables, analyzes the influence of the other two on the final risk, and generates three-dimensional views, as shown in Figures 15–17.
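The pieces listed in (1) to (3) can be sketched in a few lines. The sketch assumes the Matlab conventions for a Gaussian membership function with parameters (sigma, c) and for a linear Sugeno output whose parameter vector is ordered [coefficients..., constant]; the function names are illustrative.

```python
import math
from itertools import product

def gaussmf(x, sigma, c):
    """Gaussian membership function exp(-(x - c)^2 / (2 * sigma^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def linear_output(inputs, params):
    """First-order Sugeno output: params = [p1, ..., pn, constant]."""
    *coeffs, constant = params
    return sum(p * x for p, x in zip(coeffs, inputs)) + constant

LABELS = ["VL", "L", "M", "H"]
# One rule per combination of fuzzy subsets of (EFCWO, EATMR, UT).
rules = list(product(LABELS, repeat=3))

u1_params = [0.03051, 0.01467, 0.01683, 0.1191]  # parameters quoted in the text
u1 = linear_output([0.549, 0.504, 0.474], u1_params)
```

Here `u1` is the output of a single rule for one input vector, not the final risk value; the final value is an aggregate over all 64 rules.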
When a user requests access to medical information, the system first fuzzifies the indicators EFCWO, EATMR, and UT to calculate the membership degree of each indicator. Then, all possible output results are listed according to the rule base, and finally, rule aggregation and defuzzification are performed to obtain the final risk value. Assuming that the three index values of a user are [0.549, 0.504, 0.474], the final output risk value is 0.432, as shown in Figure 18. If the risk value is within the range that the system can tolerate, the system allows the user to access; otherwise, it denies access or applies a risk reduction policy until the risk falls within the tolerable range.
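The fuzzify, fire, aggregate, and decide sequence above can be sketched as a first-order Sugeno inference. All membership and output parameters below are hypothetical placeholders rather than the trained values in Tables 2 and 3, so the sketch does not reproduce the 0.432 reported in Figure 18; product firing strengths and the `RISK_TOLERANCE` threshold are likewise assumptions.

```python
import math
from itertools import product

def gaussmf(x, sigma, c):
    """Gaussian membership degree of x for a subset (width sigma, center c)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_risk(inputs, input_mfs, consequents):
    """First-order Sugeno inference with product firing strengths.

    inputs      -- [EFCWO, EATMR, UT]
    input_mfs   -- per input, a list of (sigma, center) pairs
    consequents -- maps a tuple of subset indices to [p1, p2, p3, constant]
    """
    degrees = [[gaussmf(x, s, c) for s, c in mfs]
               for x, mfs in zip(inputs, input_mfs)]
    weighted_sum = 0.0
    total_weight = 0.0
    for combo in product(*(range(len(mfs)) for mfs in input_mfs)):
        weight = 1.0
        for var, idx in enumerate(combo):
            weight *= degrees[var][idx]  # rule firing strength
        *coeffs, constant = consequents[combo]
        rule_output = sum(p * x for p, x in zip(coeffs, inputs)) + constant
        weighted_sum += weight * rule_output
        total_weight += weight
    # Defuzzification: firing-strength-weighted average of rule outputs.
    return weighted_sum / total_weight

# Hypothetical parameters, NOT the trained values from Tables 2 and 3.
subsets = [(0.15, 0.0), (0.15, 1 / 3), (0.15, 2 / 3), (0.15, 1.0)]  # VL, L, M, H
input_mfs = [subsets, subsets, subsets]
consequents = {
    (i, j, k): [0.0, 0.0, 0.0, (i + j + (3 - k)) / 9.0]
    for i, j, k in product(range(4), repeat=3)
}

RISK_TOLERANCE = 0.5  # hypothetical system threshold
risk = sugeno_risk([0.549, 0.504, 0.474], input_mfs, consequents)
decision = "allow" if risk <= RISK_TOLERANCE else "deny or mitigate"
```

In this toy rule base the risk rises with the two entropy indicators and falls with trust, mirroring the intent of the model.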
4.4. Performance Analysis
First, the overall effect of the model is evaluated using testing data1, and the degree of agreement between the output of the model and the actual output is analyzed through comparative experiments. Figure 19 shows a partial screenshot of the experimental workspace, where the variable ANFIS is the adaptive neuro-fuzzy inference system after training. As described above, the system has three input variables and one output variable. Therefore, the input variable of testing data1 is named Testingdata1_input and the output variable is named Testingdata1_output, giving 150 × 3 and 150 × 1 data structures, respectively.
Then, the comparative analysis results shown in Figure 20 can be obtained through the following code:
    x = (1:1:150);
    % Argument order as in the original; recent Matlab releases document evalfis(ANFIS, Testingdata1_input)
    y = evalfis(Testingdata1_input, ANFIS);
    y1 = plot(x, Testingdata1_output, 'or');   % actual output: red circles
    hold on;
    y2 = plot(x, y, '+k');                     % ANFIS output: black plus signs
    legend([y1, y2], 'Actual output', 'ANFIS output');
From the comparison results in Figure 20, it can be clearly seen that the ANFIS output after training is basically consistent with the actual output. There is no obvious error, and the sum of squared errors is 7.53521e-06. The same method was used to perform the remaining nine experiments in sequence, and the results are shown in Table 5.
According to Table 5, the final error value of the 10-fold cross validation is 7.0159e-6, which is less than 1e-5. Therefore, the model in this paper is feasible for predicting the risk of medical big data privacy disclosure.
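The error measure used above can be sketched as follows. The sketch assumes that the final cross-validation error is the mean of the ten per-fold sums of squared errors, which the text does not state explicitly; the output values are hypothetical and are not taken from Table 5.

```python
def sum_squared_error(actual, predicted):
    """Sum of squared differences between actual and predicted outputs."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

# Hypothetical actual and predicted risk values for two small folds.
fold_outputs = [
    ([0.40, 0.55, 0.30], [0.401, 0.549, 0.300]),
    ([0.62, 0.25, 0.48], [0.620, 0.251, 0.479]),
]
fold_errors = [sum_squared_error(a, p) for a, p in fold_outputs]
cv_error = mean(fold_errors)
ERROR_TOLERANCE = 1e-5  # the error tolerance set for training in Section 4.3.3
within_tolerance = cv_error < ERROR_TOLERANCE
```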
Next, the accuracy rate, recall rate, and F1 value of the model are analyzed under different proportions of illegal users. To facilitate comparison, this article follows the experimental methods of Hui et al. and Wang and Hong to evaluate the performance of the risk-based access control model for medical big data security and privacy protection proposed in this paper. As noted above, illegal users show more diversity and instability in selecting work objectives and accessing medical records, so their risk values are higher than those of legitimate users. The model is therefore considered effective when the users with the highest risk values are the illegal users. Six groups of experiments are set up. In each group, EFCWO, EATMR, and UT values are generated for 600 users, and the corresponding risk values are calculated through the risk quantification model based on adaptive neuro-fuzzy theory. The proportion of curious doctors (illegal users) in the six groups is 2.5%, 5%, 7.5%, 10%, 12.5%, and 15%, respectively. Each group of data is then sorted by risk value from high to low, and the experimental results are calculated, as shown in Table 6.
The experimental results show that the performance of the model improves as the number of illegal users increases. This is because, when the total number of users is constant, more illegal users means fewer legal users and a larger proportion of illegal users among the high-risk users. In addition, a comparison with the methods of Hui et al. and Wang and Hong shows that when the proportion of illegal users reaches 15%, the performance of the model does not change significantly relative to these methods, but when the proportion is below 15%, the model performs significantly better. The reason is that this paper additionally considers the user’s historical trust value, which reduces the possibility of system misjudgment to some extent.
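The ranking-based evaluation described above can be sketched as follows. The sketch assumes that the k highest-risk users are flagged as illegal and that the accuracy rate corresponds to precision; the cutoff k, the risk values, and the labels are hypothetical.

```python
def top_k_metrics(risks, is_illegal, k):
    """Flag the k highest-risk users as illegal and score against the labels."""
    ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    flagged = set(ranked[:k])
    true_pos = sum(1 for i in flagged if is_illegal[i])
    total_illegal = sum(1 for flag in is_illegal if flag)
    precision = true_pos / k if k else 0.0
    recall = true_pos / total_illegal if total_illegal else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical risk values for eight users, two of whom are illegal.
risks = [0.91, 0.15, 0.78, 0.40, 0.22, 0.85, 0.30, 0.10]
is_illegal = [True, False, False, False, False, True, False, False]
precision, recall, f1 = top_k_metrics(risks, is_illegal, k=3)
```

With both illegal users among the three highest-risk users, recall is perfect while precision is reduced by the one legitimate user flagged.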
5. Conclusion
This article proposes a risk assessment model for the medical field. The model considers not only the risks that users may introduce when choosing work objectives and accessing medical records but also the user’s trust, which reduces the rate at which the system misjudges legitimate users under special circumstances. In our model, when a user requests access to medical information, the system evaluates the risk of the request using the trained membership functions, output functions, and rule base and decides whether to grant access based on the size of the risk. This prevents excessive access by illegal doctors without affecting the normal work of legitimate doctors. In addition, comparison experiments show that the evaluation output of the model is basically consistent with the actual output and that its recall and accuracy are superior to those of the existing models.
Data Availability
The original data of this article are covered by a confidentiality agreement with the hospital and are temporarily unavailable, but the processed data (the data used to support the research in this article) can be shared publicly and are submitted with the manuscript.
Conflicts of Interest
The authors have no conflicts of interest to declare.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 71972165, 61763048, 61263022, and 61303234), National Social Science Foundation of China (No. 12XTQ012), Innovation and Promotion of Education Foundation Project of Science and Technology Development Center of Ministry of Education (No. 2018A01042), Science and Technology Foundation of Yunnan Province (Nos. 2017FB095 and 201901S070110), and the 18th Yunnan Young and Middle-aged Academic and Technical Leaders Reserve Personnel Training Program (No. 2015HB038).
References
X. T. Jin, Big Data of Health Care, People’s Medical Publishing House, Beijing, China, 2018, in Chinese.
Y. Li, W. Wen, and G. Q. Xie, “Review of differential privacy protection research,” Journal of Computer Applications, vol. 29, no. 9, pp. 3201–3205, 2014, in Chinese.
P. Xiong, T. Q. Zhu, and X. F. Wang, “Differential privacy protection and application,” Journal of Computers, vol. 37, no. 1, pp. 101–122, 2014, in Chinese.
Z. Z. Xian and Q. L. Li, “Application of differential privacy protection in recommendation system,” Journal of Computer Applications, no. 5, pp. 1549–1553, 2016, in Chinese.
X. T. Wu, Research on Privacy Protection and its Key Technologies in Big Data Environment, Nanjing University, Nanjing, China, 2017, in Chinese.
Y. L. Bai, “Application of differential privacy protection in medical big data,” Electronic Technology and Software Engineering, no. 24, p. 163, 2017, in Chinese.
J. Zhang and H. L. Zhang, “Research on security and protection system of big data based on cryptography,” Information Security Research, vol. 3, no. 7, pp. 652–656, 2017, in Chinese.
Y. Tian, Y. B. Peng, and Y. L. Yang, “Attribute-based data access control scheme in wireless body area network,” Journal of Computer Applications, vol. 32, no. 7, pp. 2163–2167, 2015, in Chinese.
H. Xing, Research on Privacy Protection Technology of Wireless Mobile Medical Monitoring Network, Shanghai Jiaotong University, Shanghai, China, 2014, in Chinese.
L. Wei and F. Xu, “Research on privacy protection technology of medical grid,” Computer Technology and Development, vol. 5, pp. 254–257, 2012, in Chinese.
H. H. Wang, X. Wu, and H. Wang, “Research on regional health medical data sharing and privacy protection strategy,” Technology Innovation and Application, no. 31, pp. 181-182, 2017, in Chinese.
H. Q. Chen, “Challenges to privacy protection of medical data in the big data environment and related technologies,” Electronic Technology and Software Engineering, no. 16, pp. 51–53, 2014, in Chinese.
H. S. Gao and Z. Q. Sang, “Life cycle and governance of big data in medical industry,” Journal of Medical Information, vol. 34, no. 9, pp. 7–11, 2013, in Chinese.
X. Liu and X. M. Wang, “Ethical issues in the construction of medical big data,” Ethics Research, no. 6, pp. 119–122, 2015, in Chinese.
JASON Report: JSR-04-132, Broader Access Models for Realizing Information Dominance, MITRE Corporation, McLean, VA, USA, 2004.
Q. Ni, E. Bertino, and J. Lobo, “Risk-based access control systems built on fuzzy inferences,” in Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security: ASIACCS, pp. 250–260, Beijing, China, April 2010.
L. Chen and J. Crampton, “Risk aware role-based access control,” Security and Trust Management, vol. 7071, pp. 140–156, 2011.
N. Diep, L. X. Hung, Y. Zhung, and S. Lee, “Enforcing access control using risk assessment,” Universal Multiservice Networks, vol. 2, pp. 419–424, 2007.
M. Sharma, Y. Bai, S. Chung, and L. Dai, “Using risk in access control for cloud-accessed eHealth,” in Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communications, HPCC, pp. 1047–1052, Liverpool, UK, June 2012.
Q. Wang and J. Hong, “Quantified risk-adaptive access control for patient privacy protection in health information systems,” in Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security: ASIACCS, pp. 406–410, Hong Kong, China, March 2011.
Z. Hui, H. Li, M. Zhang, and D. G. Feng, “Risk-adaptive access control model for medical big data,” Journal of Communications, vol. 36, no. 12, pp. 190–199, 2015.
J. Zhang, Medical Big Data Privacy Security Risk Assessment in Cloud Environment, Yunnan University of Finance and Economics, Kunming, China, 2018, in Chinese.
J. Li, Y. Bai, and N. Zaman, “A fuzzy modeling approach for risk-based access control in eHealth cloud,” in Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, vol. 2103, pp. 17–21, Melbourne, Australia, July 2013.
X. Q. Lian, Fuzzy Control Technology, China Electric Power Press, Beijing, China, 2003, in Chinese.
X. M. Shi and Z. Q. Hao, Fuzzy Control and MATLAB Simulation, Tsinghua University Press, Beijing Jiaotong University Press, Beijing, China, 2018, in Chinese.
Z. L. Jiang, Introduction to Artificial Neural Networks, Higher Education Press, Beijing, China, 2001, in Chinese.
X. L. Wu and Z. H. Lin, MATLAB-aided Fuzzy System Design, Xidian University Press, Xi’an, China, 2002, in Chinese.
G. Y. Li and L. J. Yang, Neural, Fuzzy, Predictive Control and Their MATLAB Implementation, Electronic Industry Press, Beijing, China, 2018, in Chinese.
H. H. Niu and L. X. Liu, “Application research of neural network in information security risk assessment,” Computer Simulation, vol. 28, no. 6, pp. 117–120, 2011, in Chinese.
Y. B. Liu, W. F. Zhang, and X. M. Wang, “Access control scheme based on multi-attribute fuzzy trust evaluation in cloud manufacturing environment,” Computer Integrated Manufacturing Systems, vol. 24, no. 2, 2018, in Chinese.
X. J. Shi and W. H. Yu, “Quantitative method of access control risk based on fuzzy neural network,” Intelligent Computer and Application, vol. 8, no. 1, pp. 1–4, 2018, in Chinese.