Abstract

The purpose of this paper is to identify the medical risks of applying big data technology, to build a medical big data risk (MBDR) control process, and to manage MBDR from a systematic perspective. We first used the systematic literature reviews (SLRs) method to search 322 papers in Web of Science on the topics of “medical risk” and “big data risk” and built a dimensional system of MBDR at the theoretical level. Then, based on a case study of a hospital in Shanghai, we explored the formation mechanism and interaction effects of MBDR using the Bayesian belief networks (BBNs) method and built a systematic risk control process. The findings are as follows. First, the dimensional system of MBDR comprises 5 major dimensions and 24 subdimensions, which helps to explore the medical application of big data technology from a risk perspective. Second, the MBDR control process constructed in this paper covers 4 aspects: risk prediction, reverse reasoning, risk control, and risk prevention, which is important for hospitals that carry out MBDR control in practice.

1. Introduction

The emergence of smart healthcare can reduce the risk of patients contracting the novel coronavirus during hospital visits and improve the level of isolation control in hospital areas [1]. Thus, the development of smart healthcare has become an important and widely followed research issue. In recent years, with the development of big data technology, many hospitals have been mining and analyzing medical big data through big data-related technologies, laying the foundation for further building smart hospitals [2–4]. It has also been confirmed that the medical application of big data technology plays an important role in safeguarding human lives [5–9]. However, these studies ignore the fact that the medical application of big data technology carries risks and that such risks can hinder the development of smart healthcare [10]. Therefore, it is important to explore the risks of the medical application of big data technology and eventually build a control process for medical big data risk (MBDR).

At present, research on the medical application of big data technology still suffers from three deficiencies. First, previous research has focused on explaining how big data technology can contribute to the development of smart healthcare; for example, the medical application of big data technology can help hospitals obtain information on key characteristics over time [11] and can support personalized medical decisions for realizing personalized precision medicine for patients [12, 13]. Some studies have identified risks associated with the medical application of big data technology, such as patient information security, especially the leakage of patient privacy [14], and the lack of laws governing the access, use, and intellectual property protection related to big data [15]. Even so, medical big data research on risk prevention and early warning is lacking. Second, the risk mechanism of big data technology applied to the medical process needs to be further explored, which is important for hospitals that carry out smart healthcare. Third, existing research lacks an MBDR control process. The few studies that do construct a medical risk control process do not connect it with big data technology [8]. The overall process of MBDR control is also incomplete: the prediction and diagnosis functions emphasized by risk control are not reflected.

Based on a review of the existing literature, this paper finds that current research still lacks an analysis of the risks in the medical process and lacks a complete, systematic, and operable control process. This paper aims to answer three questions: first, how to construct a dimensional system of medical big data risk at the theoretical level; second, how to clarify the internal mechanism of this theoretical risk system; and third, how to construct a medical big data risk control process. To this end, this paper first retrieved 322 papers using the SLRs method with the keywords “smart medical,” “healthcare,” “big data,” and “risk,” then narrowed these down to 104 papers that fit the theme, summarized 24 subdimension risks of MBDR, and grouped them into 5 major risk dimensions: customer risk, financial risk, external environment risk, medical quality management risk, and information system risk. After that, this paper predicts and diagnoses the MBDR of a hospital in Shanghai based on structured interviews with 6 experts of the hospital. Finally, based on the case study of the hospital, this paper constructs a comprehensive and systematic risk control process using Bayesian belief networks (BBNs), which includes 4 aspects: risk prediction, reverse reasoning, risk control, and risk prevention.

This paper makes three contributions. First, it examines the medical application of big data technology from the perspective of risk and establishes a dimensional system of MBDR. Second, it explores the relationships between the risks that arise in the medical application of big data technology through the case of a hospital in Shanghai, which helps hospitals recognize the importance of MBDR and carry out risk control. Third, it constructs a risk control process that covers 4 aspects (prediction, reverse reasoning, control, and prevention) and helps hospitals to better carry out smart medical practice using big data technology.

2. Literature Review Based on SLRs

The systematic literature reviews (SLRs) method is systematic and replicable. It was initially applied in the field of sociology [16] and, owing to its scientific nature, has gradually been accepted in other research fields [17]. In this paper, a systematic review of the literature is conducted in accordance with the specifications of SLRs, which aim to search, evaluate, and analyze all literature related to a specific research area, thus avoiding the impact of the limitations of a single piece of literature on the results. SLRs are characterized by an exhaustive literature search that minimizes judgmental errors through a scientific and transparent process. There are many studies on how to conduct a literature review using SLRs. In this paper, the review follows these steps: determining the purpose of the review; literature search; literature screening; quality evaluation; data acquisition; integration of studies; and writing the literature review [18].

2.1. Operation Steps of SLRs

In the first step, keywords were selected according to the purpose of the study, and literature inclusion criteria were identified. A search was then conducted in Web of Science on the topics “smart medical,” “healthcare,” “big data,” and “risk,” and a total of 322 English-language papers published up to September 2021, the time of writing, were obtained.

In the second step, in order to exclude possible errors in the search process, the 322 retrieved papers were first screened by source journal, and those that did not fit the theme of “big data in healthcare” were excluded; a total of 190 papers were excluded at this stage, leaving 132. Subsequently, the literature that did not fit the theme of “big data risk in health care” was excluded, which removed a further 28 papers [19]. Finally, 104 documents highly relevant to the topic of “big data risk in health care” were identified and coded as the English literature base.
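The screening arithmetic described above can be checked with a few lines of Python. The counts are the ones reported in this section; the variable names are illustrative only.

```python
# Screening funnel reported in Section 2.1 (counts taken from the text).
retrieved = 322            # papers returned by the Web of Science search
excluded_by_journal = 190  # did not fit "big data in healthcare"
excluded_by_theme = 28     # did not fit "big data risk in health care"

after_journal_screen = retrieved - excluded_by_journal
final_base = after_journal_screen - excluded_by_theme

print(after_journal_screen)  # 132
print(final_base)            # 104
```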

2.2. Defining the Risk Dimensions of Medical Big Data Based on SLRs

In this paper, we used the “explore-word-frequency-cluster analysis” function in NVivo 12 software to group the risks in the screened literature into five types of MBDR: customer risk, financial risk, external environment risk, medical quality management risk, and information system risk. The word cloud is shown in Figure 1. The frequency of a keyword determines its font size, and the words “patient, financial, environment, quality, and information” occur with high frequency.
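The idea behind a word-frequency analysis can be illustrated with a minimal Python sketch. This is not NVivo's procedure: the titles, the stop-word list, and the function name `word_frequencies` are all hypothetical and serve only to show how keyword counts of the kind that drive a word cloud are obtained.

```python
import re
from collections import Counter

# Hypothetical titles standing in for the screened literature base.
titles = [
    "Patient privacy risk in healthcare big data",
    "Financial cost of big data quality management",
    "Information system security and patient data",
]

# Small illustrative stop-word list (an assumption, not NVivo's list).
STOP_WORDS = {"in", "of", "and", "the", "big", "data"}

def word_frequencies(texts):
    """Count lowercase word occurrences across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

freq = word_frequencies(titles)
print(freq.most_common(1))  # [('patient', 2)]
```

In a word cloud, the font size of each word would then be scaled by its count.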

2.2.1. Customer Risk

On the customer side, there are concerns that the application of big data technologies in healthcare will marginalize certain populations and put those who lack the knowledge or ability to use digital resources at risk of digital exclusion [20]. It will also deepen the digital divide, because these populations will not be able to use digital health services [21]. In addition, some scholars point out that the application of big data, if not properly managed, will raise customer privacy issues, such as the leakage of patient medical information [22] and the use of data for discriminatory or other harmful purposes [23].

2.2.2. Financial Risk

On the financial side, reducing the cost of applying big data is a major issue facing the medical industry [24]. It has been pointed out that the significant growth in patient data will lead to a dramatic increase in costs, which will eventually make it difficult for hospitals to maintain large amounts of data [24]. The use of big data technology also incurs additional costs and premiums, because insurers and medical organizations must maintain complex analytics systems to identify cost overruns caused by fraud, abuse, and errors [25, 26], and acquiring and cleaning data in the process of using big data technology is expensive and time-consuming [27]. Beyond these cost risks, the medical application of big data technology requires significant investment [24].

2.2.3. External Environmental Risk

The external environment risks mainly concern the legal, policy, and security regulatory risks facing the medical application of big data technology. On the legal side, regulatory policies regarding big data healthcare are changing due to the dynamic nature of laws [26], relevant laws need to be further improved, and there is a lack of effective protection of patients’ or users’ privacy [28]. On the policy side, the main issues concern the structure of big data medical policies, the responsibilities and powers of policy subjects, and the lack of detailed big data medical policies [29]. In terms of safety regulation, there are problems such as the absence of unified, authoritative national standards for smart medical safety regulation and the absence of a specialized organization for reviewing smart medical devices [26].

2.2.4. Medical Quality Management Risk

In the medical application of big data technology, some scholars have noted that there are risks in the quality management of healthcare [30]. There is often a lack of digital management and quality control during the drug implementation process, and the business process of medication implementation lacks real-time, accurate, and nonrepudiable implementation records [31]. Medical service delivery based on big data technology may also pose risks: the risk of malpractice, misdiagnosis, and omission is unpredictable, and telemedicine practices based on medical big data technologies further increase the likelihood that such risks occur (Abugabah et al., 2017). In addition, AI based on big data technologies must be integrated into the physician’s workflow, which means it must be closely linked to the workflow of the image review workstation, and this may result in disruptions or slowdowns in workflow and productivity [32].

2.2.5. Information System Risk

In terms of information systems, there may be a series of problems with data access and data sharing. Some scholars have pointed out the difficulties in accessing patients’ data after applying big data technologies to healthcare services, which may be due to the increasing restrictions that laws and regulations place on the release of private health data. Moreover, given the competitive nature of the information technology market, electronic medical record providers may be reluctant to provide data to big data software providers [32]. In addition, medical information systems face issues of medical data transmission and medical data security. Medical data are profitable for hackers, who may be driven by profit to break into medical information systems illegally and steal patients’ personal health data [29]. The MBDR dimension system is shown in Table 1.

3. Bayesian-Based MBDR Control Process Construction

3.1. Bayesian Belief Networks

The Bayesian belief networks (BBNs) model is a probabilistic graphical network based on probabilistic reasoning [38]. It is a directed acyclic graph consisting of a set of random variables linked by conditional probabilities, with many nodes, each of which has a finite number of mutually exclusive states. In 1988, Pearl proposed BBNs, which are mainly used to solve problems of uncertainty and incompleteness [39].

BBNs have the ability to describe event polymorphism and relational nondeterminism, enabling forward inference, backward inference, and sensitivity inference.
(1) Forward inference: It is mainly used for reliability analysis, where the probability of occurrence of a leaf node can be inferred based on the prior probability of the root node. This inference method predicts the “outcome” based on the “cause”. It predicts the possible outcome of the node state if the variables are known to be in a certain state.
(2) Backward inference: It is mainly used for cause diagnosis, applying the Bayesian formula to reason out the most probable cause chain from the bottom up. It infers the “cause” from the “result”, which means inferring the possible causes of an event that is known to have occurred.
(3) Sensitivity inference: It identifies the nodal variables that have a significant impact on the target node and analyzes how strongly each “cause” influences the “result”. In other words, when multiple causes are known, the degree of correlation between different causes is analyzed, and the primary and secondary causes of the event are identified.
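Forward and backward inference can be illustrated on the smallest possible network, a single root “cause” node pointing to a single leaf “effect” node. The following Python sketch uses invented probabilities (they are not values from the case study) and hypothetical function names; it only shows the two directions of reasoning.

```python
# Toy two-node network: root "cause" -> leaf "effect".
# All probabilities are hypothetical, not values from the case study.
p_cause_high = 0.3                      # prior P(cause = high)
p_effect_high_given = {True: 0.8,       # P(effect = high | cause = high)
                       False: 0.1}      # P(effect = high | cause = low)

def forward(p_cause):
    """Forward inference: marginal P(effect = high) from the root prior."""
    return (p_cause * p_effect_high_given[True]
            + (1 - p_cause) * p_effect_high_given[False])

def backward(p_cause):
    """Backward inference: P(cause = high | effect = high) via Bayes' rule."""
    joint = p_cause * p_effect_high_given[True]
    return joint / forward(p_cause)

p_effect = forward(p_cause_high)
p_cause_post = backward(p_cause_high)
print(round(p_effect, 3))      # 0.31
print(round(p_cause_post, 3))  # 0.774
```

Observing the effect raises the belief in the cause from the prior 0.3 to about 0.77, which is the sense in which backward inference “diagnoses” a cause.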

In the practical application of BBNs, the network structure is generally modeled based on the knowledge and experience of experts in the relevant field, and then the conditional probability table of each node in the network is determined based on questionnaires and statistical data [40]. BBNs built by combining observed data with expert experience are more objective and accurate and are called posterior BBNs [41].

Among practical applications, there are several important formulas in BBNs, as follows [42].

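For the identified BBNs, the joint probability distribution of the nodes is formulated as

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right), \tag{1}$$

where $\mathrm{Pa}(X_i)$ denotes the set of parent nodes of $X_i$.

Bayesian parameter learning algorithm: compute the expectation of the log-likelihood of the full sample based on the observed variables and the current parameter values. In standard expectation-maximization notation, this is

$$Q\left(\theta \mid \theta^{(t)}\right) = E_{Z \mid D, \theta^{(t)}}\left[\log P(D, Z \mid \theta)\right], \tag{2}$$

where $D$ is the sample data set, $Z$ denotes the unobserved variables, $\theta$ is the conditional probability distribution parameter of interest, and $\theta^{(t)}$ is its current value.

The parameter value that maximizes the expectation in formula (2) is then taken as the updated value:

$$\theta^{(t+1)} = \arg\max_{\theta} Q\left(\theta \mid \theta^{(t)}\right). \tag{3}$$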
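The expectation and maximization steps described above can be illustrated on a two-node network A → B in which A is sometimes unobserved. The sketch below is an assumption about how such parameter learning can be carried out (a plain expectation-maximization loop); the records, starting values, and function names are hypothetical and do not come from the case study or from GeNIe.

```python
import math

# Each record is (a, b) with binary values; a is None when unobserved.
# The records are hypothetical, not data from the case study.
data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (None, 1), (None, 0)]

def em_step(theta, records):
    """One EM iteration for the network A -> B with A sometimes missing."""
    pa, pb1, pb0 = theta  # P(A=1), P(B=1|A=1), P(B=1|A=0)
    n_a1 = n_a1_b1 = n_a0_b1 = 0.0
    for a, b in records:
        if a is None:
            # E-step: posterior weight that the hidden A equals 1.
            like1 = pa * (pb1 if b else 1 - pb1)
            like0 = (1 - pa) * (pb0 if b else 1 - pb0)
            w = like1 / (like1 + like0)
        else:
            w = float(a)
        n_a1 += w
        n_a1_b1 += w * b
        n_a0_b1 += (1 - w) * b
    n = len(records)
    # M-step: re-estimate the parameters from the expected counts.
    return (n_a1 / n, n_a1_b1 / n_a1, n_a0_b1 / (n - n_a1))

def log_likelihood(theta, records):
    """Log-likelihood of the observed part of the data under theta."""
    pa, pb1, pb0 = theta
    total = 0.0
    for a, b in records:
        p1 = pa * (pb1 if b else 1 - pb1)
        p0 = (1 - pa) * (pb0 if b else 1 - pb0)
        if a is None:
            total += math.log(p1 + p0)
        else:
            total += math.log(p1 if a else p0)
    return total

theta = (0.5, 0.6, 0.4)  # arbitrary starting guess
history = [log_likelihood(theta, data)]
for _ in range(20):
    theta = em_step(theta, data)
    history.append(log_likelihood(theta, data))

print([round(p, 3) for p in theta])
```

Each iteration cannot decrease the observed-data log-likelihood, which is why the values in `history` form a non-decreasing sequence.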

The analysis process of a Bayesian network relies on probability theory, and working out the relationships between its nodes and directed edges provides a good solution to the problem. In this paper, we mainly use the Bayesian network model to evaluate the risk of applying big data in healthcare. The advantages of the Bayesian network model are mainly reflected in the following four points.
(1) A Bayesian network is a directed acyclic graph, which presents the whole risk formation mechanism completely and clearly in graphical form, expresses the logical relationships between risk factors, makes the process of uncertainty reasoning more intuitive, and shows how risk factors lead to the occurrence of medical big data risk and how they relate to one another.
(2) The Bayesian network model can make full use of both expert prior knowledge and data for machine learning, which greatly reduces the influence of human subjective factors on the evaluation results. Many factors affect the risk of medical big data, and the relationships between them are complex. The network can therefore be learned using the logical reasoning ability of Bayesian networks and then corrected with expert knowledge, which avoids both overfitting the medical big data risk data and the influence of human subjective factors on the evaluation results.
(3) The Bayesian network model supports forward inference, reverse inference, and sensitivity inference. It can therefore both project the size of medical big data risks and reason in reverse to find the key factors affecting them, which provides the basis for formulating targeted measures and rectification opinions.
(4) With continued research, Bayesian networks have been used for risk evaluation in many fields. Meanwhile, the algorithms for Bayesian network learning and inference are maturing, as is the software that implements them. Bayesian network analysis software can build models quickly and accurately, which reduces tedious calculation and improves the accuracy of the results.

3.2. Bayesian-Based Case Study

The case selected in this paper is a well-known medical unit in Shanghai. It is a comprehensive hospital integrating medical treatment, teaching, scientific research, and prevention, and is also a national clinical drug trial institution, a specialist training base of the Ministry of Health, and a clinical base for the standardized training of resident doctors and general practitioners in Shanghai. At present, the unit is focusing on improving modern hospital governance and lean management and on supporting hospital development with intelligent management and smart medical care; through digital transformation and intelligent support, the hospital’s operation, management, and service mode will undergo significant changes. This unit is one of the first hospitals in China to begin digital transformation and is therefore highly representative. Using platforms such as the pre-Internet hospital, various Internet applications, remote diagnosis and treatment, blockchain, and 5G, a new model of integrated online and offline services will be formed, and the current smart ward of general medicine will be continuously upgraded and transformed, bringing patients a more comfortable medical experience.

Common data collection methods used in case studies include the documentary method, archival recording method, interview method, direct observation method, and participant observation method. In this paper, the interview method is used to conduct the case study. The interviewees are six experts from a hospital in Shanghai: experts from the outpatient, finance, administration, quality management, and logistics departments, who are familiar with the business involved in each of the five risk dimensions of big data in healthcare, and one senior manager of the hospital with expertise in the application of big data in healthcare. In the questionnaire research phase, the six experts were asked to answer a questionnaire on the risk of big data application in their hospital based on their experience and knowledge, and this paper used GeNIe software to construct initial Bayesian belief networks (BBNs) from the collected questionnaires. In the adjustment phase, six semistructured interviews, each lasting 60 minutes on average, were conducted to collect the experts’ suggestions on the initial BBNs; the experts suggested adjusting the relationships between some subfactors. In the structure determination stage, this paper entered the suggested subfactor relationships under “Background Knowledge” in GeNIe software for background learning, further refined the BBNs structure, and determined the final structure, which is shown in Figure 2.

4. Discussion

Based on the results of the SLRs, this study collects 6 valid questionnaires from experts and constructs a Bayesian belief network structure. First, with the help of the risk prediction and inverse reasoning functions of GeNIe software, it reasons in both the forward and reverse directions to find the sensitive factors that cause risk in the application of big data technology in a hospital in Shanghai and to explore the formation mechanism of medical big data risk. Second, it uses risk control means to control the system risk of medical big data. Finally, it uses risk prevention means to propose targeted risk control suggestions.

4.1. Risk Prediction Function

First, the H-value obtained from the BBNs structure shows that the probability of medical big data risk being in the high-risk state is 34%. The root causes, located at the root nodes, are “customer privacy issues,” “the rejection of digital technology,” “distrust of information,” “cost of customer growth,” “data processing costs,” “need for large amounts of data,” “require huge investment,” “privacy laws are missing,” “complexity of medical services,” “medical data transmission issues,” “hacking problem,” and “limited infrastructure.” Second, a change in the probability that a subfactor is at high risk changes the probabilities of the whole risk system. The hospital can use this function to predict the result of each risk when it occurs, as shown in Figure 3, and take risk prevention measures in advance. When this paper sets each root node variable to a known state, the probability of MBDR occurrence increases. The cost of customer growth and infrastructure limitations trigger the greatest change, which indicates that these two factors have the greatest impact on the occurrence of MBDR. Hospitals should therefore pay more attention to risk prevention in the areas of cost and infrastructure.
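The prediction procedure described above (fix one root node in a known state, then observe the change in the probability of high risk) can be sketched in Python as follows. The three root nodes borrow names from the text, but every number, the noisy-OR form of the conditional probabilities, and the function name `p_target_high` are assumptions made for illustration; they are not the network or the values obtained in GeNIe.

```python
from itertools import product

# Hypothetical priors P(root = high); names echo the text, values are invented.
priors = {"cost of customer growth": 0.4,
          "limited infrastructure": 0.3,
          "hacking problem": 0.2}

# Hypothetical noisy-OR strengths: chance each high root alone triggers high risk.
strength = {"cost of customer growth": 0.6,
            "limited infrastructure": 0.5,
            "hacking problem": 0.2}
LEAK = 0.05  # assumed background probability of high risk

def p_target_high(evidence=None):
    """P(target = high), enumerating root states; evidence fixes some roots."""
    evidence = evidence or {}
    names = list(priors)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        weight = 1.0
        p_not_high = 1.0 - LEAK
        for name, high in zip(names, states):
            if name in evidence:
                weight *= 1.0 if evidence[name] == high else 0.0
            else:
                weight *= priors[name] if high else 1.0 - priors[name]
            if high:
                p_not_high *= 1.0 - strength[name]
        total += weight * (1.0 - p_not_high)
    return total

baseline = p_target_high()
# Change in P(target = high) when each root is set to "high" in turn.
changes = {name: p_target_high({name: True}) - baseline for name in priors}
ranking = sorted(changes, key=changes.get, reverse=True)

print(round(baseline, 4))
for name in ranking:
    print(name, round(changes[name], 4))
```

Ranking the roots by the size of the change gives a simple indication of which factors deserve the most attention in prevention.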

4.2. Reverse Reasoning Function

First, when the risk level of medical big data is set to its highest state, the influence of the intermediate nodes on that risk level is analyzed. Figure 4 shows that the probabilities of the intermediate nodes “customer risk,” “financial risk,” “external environment risk,” “medical quality management risk,” and “information system risk” being in high risk, medium risk, and low risk have increased, indicating that the intermediate nodes have a significant impact on the level of big data medical risk.

Second, after determining which factors affect the occurrence of MBDR, we find that the most influential cause node of MBDR is medical quality management risk. We then continue to reason backwards from this cause node, gradually trace back to the root node, and finally determine the chain of causes leading to the target node. The chain of causes for the occurrence of MBDR (complexity of medical services - incorrect diagnosis - workflow risk - medical quality management risk - medical big data risk) is shown in Figure 5; external factors can then be identified to control the occurrence of this risk. This analysis of the causal chain in the BBNs structure further reveals the causes and mechanisms of big data application risk in the hospital.
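The backward tracing described above can be sketched as a walk from the target node toward a root node that always follows the parent most likely to be at high risk. In the Python sketch below, the network fragment, the posterior values, and the function name `trace_cause_chain` are hypothetical; only the node names on the reported chain come from the text.

```python
# Parents of each node in a small, hypothetical fragment of the network.
parents = {
    "medical big data risk": ["medical quality management risk", "financial risk"],
    "medical quality management risk": ["workflow risk"],
    "workflow risk": ["incorrect diagnosis"],
    "incorrect diagnosis": ["complexity of medical services"],
    "financial risk": ["data processing costs"],
}

# Hypothetical posteriors P(node = high | target = high); invented values.
posterior_high = {
    "medical quality management risk": 0.62,
    "financial risk": 0.41,
    "workflow risk": 0.55,
    "incorrect diagnosis": 0.48,
    "complexity of medical services": 0.37,
    "data processing costs": 0.33,
}

def trace_cause_chain(target):
    """Follow the most probable high-risk parent from the target to a root."""
    chain = [target]
    node = target
    while parents.get(node):
        node = max(parents[node], key=posterior_high.get)
        chain.append(node)
    return list(reversed(chain))  # root first, target last

chain = trace_cause_chain("medical big data risk")
print(" - ".join(chain))
```

With these invented values the walk reproduces the order of the chain reported in the text, starting at “complexity of medical services” and ending at “medical big data risk”.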

4.3. Risk Control Means

Based on risk prediction and cause diagnosis, this paper proposes that the process of risk control be guided by the principles of system dynamics. System dynamics views all factors in a system as a whole and focuses on the dynamic change patterns that complex systems present under the action of various cause-effect relationships at the strategic level. Medical risk management is one of its key areas of application. System dynamics models outperform other traditional methods in identifying the interrelationships between risk factors [43]. They help to visualize the internal structure of the risk system by means of a chain of loops, thus reflecting the specific transmission process of risk [44]. They can be used to analyze, adjust, and control the changing dynamic relationships among factors, which helps to accurately measure the probability of occurrence of risk factors and develop corresponding control strategies [45].

In the research process, this paper finds that MBDR is an overall “chain” system. The system consists of multiple risk factors, each of which is constantly changing, and different chains of causes interact with each other, so the system is relatively complex. From the perspective of system dynamics, hospitals should find the cause of the overall dynamic change, accurately classify each risk factor, and make different risk control decisions accordingly, which helps to improve the hospital’s risk analysis ability and comprehensive risk management level. In addition, this chain system reveals the key paths that lead to the occurrence of MBDR and provides a reference for the accurate identification of risk sources. In the control process, the interplay across three levels of transition is crucial: from cause node to cause chain, from cause chain to substructure, and from substructure to system. This helps hospitals to cut off and control process risks and thus effectively safeguard patients’ lives.

4.4. Risk Prevention Means

The MBDR system is a complex and dynamic system, and all influencing factors within the system can have an impact on the development of intelligent medical care in hospitals. To achieve the purpose of improving the development of intelligent medical care, we should start from five aspects: raise the awareness of protecting customers’ privacy and security, improve financial management capabilities, improve laws and regulations and policies, improve the quality of medical management, and improve the medical information system.

4.4.1. Raise Awareness of Protecting Customers’ Privacy and Security

With the rapid development of smart medical care, more and more detailed patient information is exposed to the Internet, so it is important to protect customer privacy and security. Hospitals should focus on three aspects in the application of big data technology. First, hospitals should strengthen the management of patients’ personal information, including patients’ personal data, disease diagnoses, and medical and nursing treatment operations. Second, hospitals should focus on strengthening the professionalism of medical and nursing staff, restricting access to patient medical records, increasing medical safety supervision and inspection, and effectively safeguarding patient privacy and medical data security. Finally, hospitals should respect patients’ right to informed consent whenever patient information is reused for a secondary purpose in applying big data technology.

4.4.2. Improve Financial Management Capabilities

Improving financial management capability and preventing financial risks are important for hospitals that apply big data technology. Financial management capability can be improved in two ways: external prevention and internal enhancement. External prevention mainly requires a comprehensive data analysis and feasibility study before any specific big data technology investment is implemented. Internal enhancement has two aspects. One is to improve the financial management system, mainly by establishing a sound financial budget mechanism and improving the hospital’s investment analysis and risk control capabilities. The other is to give full play to the supervisory role of internal audit: the internal audit organization needs to monitor the hospital’s internal control work from several dimensions to ensure that this work is sound and that its management function is fully exercised.

4.4.3. Improve Laws and Regulations and Policies

A good legal and policy environment can encourage hospitals to actively participate in medical big data work. First, in the formulation of relevant policies, the use of mandatory policies should be reduced and the application of demand-based policies increased, so that a diversified mix of policy instruments is used. At the same time, attention should be paid to training more big data technology talent and to establishing a sound professional training system and assessment mechanism. In the formulation of laws and regulations, two aspects deserve attention. On the one hand, research on laws and regulations should be strengthened to clarify the types, characteristics, and conditions of application of the various laws and regulations, which provides theoretical support for hospitals to select and apply big data technology reasonably. On the other hand, the implementation of laws and regulations should be ensured, and illegal acts in the application of big data technology in hospitals should be cracked down on, so that medical management can be carried out in an orderly manner.

4.4.4. Improve the Quality of Medical Management

To improve the quality of medical management, we can follow the concept of “one optimization and two standards”. “One optimization” refers to the optimization of business processes: the hospital should make full use of the computer network, establish a “patient-centered” medical service model, and simplify service steps that are not directly related to patient consultation. The “two standards” refer to the following: (1) medical behavior norms, that is, the hospital should standardize the operation of the business platform; for example, electronic medical documents generated in accordance with national norms are more standardized than the original handwritten format and retain traces of modifications; (2) monitoring link norms, that is, standardized system monitoring should be used to reduce manual operation steps and ensure the standardization and accuracy of the treatment process.

4.4.5. Improve the Information System

The concept of “one center with three protections” can be used to improve the information system in the process of applying big medical data. “One center” refers to the construction of a centralized monitoring platform for medical system security in hospitals. The “three protections” are: security protection for the computer environment, i.e., to improve the single function of intelligent medical devices to block data leakage and virus transmission; security protection for network communication, i.e., to deeply analyze the special communication protocols in the medical field to trace and control the source of data leakage and network attacks; security protection for regional boundaries, i.e. security partitioning of hospital networks and protection of intranet security by deploying firewalls in a series of ways. The control process of MBDR is shown in Figure 6.

This risk control process helps to assess the drivers of MBDR and the dependencies among factors and to combine these influence relationships with MBDR control instruments. It provides an effective visual flowchart that helps decision makers adjust the probability of subrisk factors according to the risk HIGH value and, by observing the resulting probability changes, take a series of measures including risk prediction, reverse reasoning, risk control, and risk prevention. The risk control process introduced in this paper gives researchers from different fields, including but not limited to big data technology applications in medicine and healthcare, finance and insurance, transportation management, logistics and retail, and culture and entertainment, an opportunity to manage the various risks of applying big data technology more systematically, and it can be easily adopted by practitioners.

5. Conclusion

The main contribution of this paper is to combine the SLRs method with the Bayesian belief network method. First, the SLRs method is used to systematically organize the medical big data risk dimension system, which lays the foundation for the initial determination of the Bayesian belief network structure. Second, the initially determined Bayesian belief network structure is revised by combining the case study of a hospital in Shanghai with expert knowledge. Finally, the risk prediction and sensitivity inference functions of Bayesian networks are used to explore, from a systematic perspective, the formation mechanism and interaction effects of medical big data risk in medical industry practice, and the whole process of medical big data risk control is constructed. This brings medical big data risk control to an operable level and has theoretical and practical significance for hospitals that carry out smart medical practice.

The research still has some limitations. Firstly, the case study part only focuses on one comprehensive medical unit, a hospital in Shanghai, and the survey subjects are not comprehensive enough, so the generalizability of the study findings needs to be verified, and other medical institutions that are developing big data technology applications can be further explored in the future. Secondly, only six valid questionnaires were collected from experts in this study, and the number of respondents is small, so the sample data needs to be further expanded in the future. Finally, the respondents of this study hold different views on the questionnaire items due to their own knowledge and experience, and the subjective judgment bias of different experts should be minimized in the future to explore a more effective MBDR control process.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Social Science Research Planning Project of the Education Department of Jilin Province (JJKH20221179SK).