Abstract

This research identifies the factors influencing the reduction of autopsies in a hospital in Veracruz. The study is based on the application of data mining techniques, such as association rules and Bayesian networks, to data sets obtained from the opinions of physicians. For the exploration and extraction of knowledge, we analyzed algorithms such as Apriori, FPGrowth, PredictiveApriori, Tertius, J48, NaiveBayes, MultilayerPerceptron, and BayesNet, all of them provided by the WEKA API. We also developed a web application to generate the mining models and present the new knowledge in natural language. The results presented in this study are those obtained from the best-evaluated algorithms, which were validated by specialists in the field of pathology.

1. Introduction

Autopsy [1] is a very important practice in medicine. It is the only study that allows identifying the true cause of death of the deceased, studying the evolution of diseases, determining the effectiveness of traditional treatments, discovering new diseases, and much more. However, in a hospital in the state of Veracruz, this method is practically in disuse, which has motivated the pathology department to investigate the probable causes. Figure 1 shows how autopsy studies have declined in the hospital; in the years 2012 and 2016, none were performed. For this reason, in [2, 3], the Apriori algorithm was applied to several data sets obtained by applying an instrument (a survey) to doctors of the hospital, with the aim of finding interesting association rules that identify the main causes of the nonrealization of autopsies in the hospital. In this paper, we present a more complete analysis: we apply four association rule mining algorithms (Apriori, FPGrowth, PredictiveApriori, and Tertius), as well as Bayesian network learning, to the data sets in order to identify the factors that influence the reduction of autopsies in the hospital. We also evaluated different classification algorithms, such as J48, NaiveBayes, MultilayerPerceptron, and Sequential Minimal Optimization (SMO), to determine which is the best to apply in the classification of the open questions of the survey.

The purpose of this study is to provide the department of pathology with a tool that makes the analysis of the above-described situation easier and allows testing or discarding hypotheses about the origin of the problem, so that physicians can base their solution proposals on intelligent, statistical, and probabilistic methods. Therefore, we developed a web application capable of generating mining models, interpreting them, and returning the results in natural language that pathologists can understand.

Prior to the development of the web application, we reviewed several related research papers, which are described in Section 2. The solution proposals and results of these studies demonstrate the usefulness of data mining (DM) for solving problems such as the one presented here, so we considered it relevant to identify the factors influencing the reduction of autopsies through the application of machine learning algorithms for DM tasks.

As a social research technique, we used the survey, a tool that allowed us to collect the opinion of physicians regarding the importance, requesting, and realization of autopsies in the hospital. For some years, autopsies have been a subject of interest for researchers, as is evidenced in [4]. This demonstrates that the problem under study is not trivial and that it also affects areas beyond this Mexican hospital, so research like this one is necessary to help continue the use of this practice and maintain quality in medicine.

The disuse of autopsies is an evident fact in many parts of the world, as mentioned in [5]. That paper stated that autopsies are currently not performed at health centers in Nigeria and that, most alarmingly, nothing is being done about it. The author, in addition to exposing the real benefits of this practice and mentioning possible reasons that have generated this situation, cited a set of works reporting data that make evident the decrease of autopsies in other places, such as the United Kingdom (UK), where autopsies are currently performed on only 10% of the deceased. In addition, the study ruled out the possibility that the autopsy could be replaced by high-technology techniques, because none of them has so far managed to mitigate errors in clinical diagnoses.

Another work that showed that autopsies are declining in many countries of the world is [6]. The authors analyzed the Cancer Registries of Zurich between 1980 and 2010 to see how the number of autopsies in cancer patients has changed. The investigation showed that the autopsy rate decreased from 60% to 7% in the analyzed period.

The autopsy rates for different regions of the UK presented in [7] are alarming. This research analyzed data from autopsies conducted in 2013 and found that England has an average autopsy rate of 0.51%, Scotland of 2.13%, Wales of 0.65%, and Northern Ireland of 0.46%. As can be seen, most of these values are below 1%, which is why the authors proposed carrying out investigations that analyze the impact of this situation on patient safety, health, and medical education.

In [8], it was argued that the situation the autopsy practice is going through is serious. The paper refers to the alarming decline of the autopsy practice, explaining that in a quarter of the UK National Health Service autopsies are no longer performed and that, among other regions, in Europe and the United States the number of autopsies has also been reduced considerably. The main cause of this is that doctors are not requesting them. Therefore, the authors intend to raise awareness of the importance of this medical study in the area of health, as well as among politicians and the general public, all with the goal of resuming autopsies as a routine procedure.

The research in [9] showed the results of a study conducted at Punjab Medical College, Faisalabad, where a group of medical students was surveyed during the 2011 and 2012 academic years. The survey recorded a set of emotional reactions to this practice, but in general all the students recommended its use. This work points out the importance of maintaining autopsies in medical education, because without this learning resource, future doctors will have problems explaining procedures they have never seen.

The rest of the article is structured as follows. Section 2 reviews works related to the object of study of this research, that is, data mining techniques applied to the medical field. The method followed to obtain the results of this research is described in Section 3. The association rules and Bayesian networks obtained are presented in Section 4. Finally, the conclusions are given in Section 5.

2. Related Works

We can classify the related works into two groups according to the type of DM techniques applied to solve problems in the medical area. We first discuss the articles focused on the use of classification techniques, or a combination of several techniques, followed by the works done exclusively with association rules.

2.1. Classification Techniques

With the motivation of supporting the development of the health sector in smart cities, several advances and trends of data mining in this area were described in [10]. According to the authors, neural networks and decision trees are the data mining techniques most used in predictive analysis.

In [11], support vector machines (SVMs) were used to classify features associated with the effects of Human Immunodeficiency Virus (HIV) on the brain during three different periods of the early clinical course: primary infection, 4–12 months postinfection (pi), and >12–24 months pi.

Moreover, the goal of [12] was to demonstrate how artificial intelligence-based methods such as Bayesian networks open up opportunities for the creation of new knowledge in the management of chronic conditions. The research found links between asthma and renal failure, which demonstrated the usefulness of this method to discern clinically relevant and not always evident correlations.

Also, in [13] it was determined that the best predictor of hypertriglyceridemia, based on traditional indicators derived from anthropometric measurements, may differ according to gender and age. To identify suitable predictors among the group of measures, the authors employed two machine learning algorithms widely used to solve classification problems: logistic regression and the Naïve Bayes (NB) algorithm.

In order to achieve highly accurate, concise, and interpretable classification rules that facilitate the diagnosis of type 2 diabetes mellitus and medical decision-making, the combined use of the Recursive-Rule eXtraction (Re-RX) and J48graft algorithms was proposed in [14]. Using this combination, a new extraction algorithm was developed, which the authors recommended testing on other data sets to validate its accuracy.

Different schemes for the identification of class labels for a given data set were compared in [15] to show how the proposed Improved Global Feature Selection Scheme (IGFSS) is more efficient than the classic ones. The author also described the use of algorithms commonly employed for text classification, such as NB and SVM, to demonstrate the effectiveness of the proposal.

With the goal of predicting the causes of death according to the World Health Organization standard classification of diseases, machine learning techniques were applied to forensic text reports in [16]. In turn, the authors compared feature extraction approaches, feature set representation approaches, and text classifiers such as SVM, Random Forest (RF), and NB for the classification of forensic autopsy reports. The data set comprised 400 forensic autopsy reports from a hospital in Kuala Lumpur, Malaysia, covering eight of the most common causes of death. The results of the decision models for SVM exceeded those of RF and NB.

The results of [17] were also interesting, because the authors proposed an automatic (multiclass) classification system to predict the causes of death from decision models for the automatic classification of texts. The data analyzed were 2200 autopsy records of accidents at a Kuala Lumpur hospital. The researchers evaluated the SVM, NB, K-nearest neighbor (KNN), decision tree (DT), and RF algorithms according to the precision, recall, F-measure, and ROC (Receiver Operating Characteristic) area metrics, using the data mining tool WEKA (Waikato Environment for Knowledge Analysis). RF and J48 proved to be the best-evaluated decision models.

The development of efficient and robust schemes for classifying text is of great importance to business intelligence and other areas. For this reason, [18] performed an empirical analysis of statistical methods for keyword extraction using the ACM (Association for Computing Machinery) and Reuters-21578 document collections. The authors also described the predictive behavior of classification algorithms and ensemble learning methods when using keywords to represent scientific text documents, demonstrating that as the number of keywords increases, the predictive performance of the classifiers tends to increase too.

In many countries with poor medical care, most deaths occur in households. Unlike hospital deaths, home deaths do not have a standard against which to be validated, as indicated in [19], which explains that for this reason previous studies have shown contradictory performance of automated methods compared to physician-based classification of causes of death (COD). Therefore, the authors compared the NB, open-source Tariff Method (OTM), and InterVA-4 (a model for interpreting verbal autopsy data into COD) classifiers on three data sets comprising about 21,000 records of child and adult deaths. The NB classifier outperformed the other classifiers, although it was evident that none of the current automated classifiers is capable of adequately performing individual COD classification.

The problem addressed in [20] is that it is difficult for experts to determine the degree of a disease either when they lack sufficient evidence for a medical diagnosis or when they have too much evidence. For this reason, the authors analyzed important research dealing with the application of machine learning algorithms to data mining tasks aimed at supporting the diagnosis of heart disease, breast cancer, and diabetes. The purpose of that research was to identify the DM algorithms that can be used efficiently in the field of medical prediction. In that sense, it reaffirmed the importance of diagnosing these diseases in their early stages and established the need for a new approach to reduce the rate of false alarms and increase the detection rate of the diseases.

Lastly, a review of sources describing the application of different data mining techniques in the medical field is presented in [21] to identify classification and clustering approaches useful for the development of prediction systems. The available data processing and classification tools are also discussed, and it is explained that, for pattern recognition, the choice of mining tasks depends on the characteristics of the data. Accordingly, the authors indicated the use of clustering techniques when the data are not labeled and classification in the opposite case. Their study highlighted the importance of accuracy in the diagnosis of life-threatening diseases, such as cancer and heart disease, and pointed out that it is a factor requiring a novel approach that alleviates false alarms and improves diagnosis in the early stages of the diseases.

2.2. Association Rule Mining

Given the variety of medicinal herbs used in traditional Korean medicine, association rule mining techniques were used in [22] to establish various ways of treating the same disease by addressing etiological factors. As a result of the analysis, representative herbs used specifically for each disease were identified.

In order to overcome the disadvantage of the large volume of rules derived from applying association mining algorithms to big medical databases, an ontology based on interestingness measures, which favors the establishment of association rule hierarchies, was proposed in [23]. This ontology-based knowledge mining approach is used to rank the semantically interesting rules. The method was applied to data from an ontology corresponding to the mammographic domain.

With the goal of improving the quality of the healthcare service for the elderly, satisfying the medical needs of that social group, and achieving better management of the medical resources involved, an intelligent medical replenishment system was designed in [24] that, based on fuzzy association rule mining and fuzzy logic, proved to be very effective.

Moreover, also using data mining techniques, [25] presented a medical diagnostic system for web applications, which helps reduce the expense and time of visiting doctors. Using association rules, the system processes the information entered by users, analyzes symptoms and their correlations, and, based on that, is able to give a preliminary diagnosis.

In addition, the risk factors correlated with diabetes mellitus type 2 (DM2) and the way healthcare providers manage this disease were identified in [26] by applying association rules. The experiment was conducted using a database of patients with DM2 treated by a healthcare provider entity in Colombia. Moreover, in [27] risk factors and comorbid conditions associated with diabetes were identified through frequent item set mining applied to a set of medical data. That research proposed a new approach based on the integration of improved association and classification techniques, which resulted in an algorithm with greater analytical and predictive power.

Also, a new data mining framework based on generalized association rules to discover multiple-level correlations among patient data was proposed in [28]. The framework discovers and analyzes correlations among prescribed examinations, drugs, medical treatments, and patient profiles at different abstraction levels. Rules were classified according to the data features involved (medical treatments or patient profiles) and then explored in a top-down fashion: from the small subset of high-level rules, a drill-down was performed to target more specific rules. Moreover, in [29] the Intensive Care Units risk prediction system ICU ARM-II (Association Rule Mining for Intensive Care Units) was presented. ICU ARM-II is based on a set of association rules that forms a flexible model for personalized risk prediction. This approach follows classification based on association.

Data mining techniques can be used to improve decision-making in areas such as hospital management. In that sense, they can be very useful to replace the manual analysis of health insurance data, a task that, relying only on limited professional knowledge, has become increasingly difficult and inefficient as the number of people enrolled in insurance plans grows. Therefore, [30] proposed a classification based on three criteria (precision, stability, and complexity) to allow a more efficient analysis of the volume of data, compared to a manual analysis. The data set used to test the effectiveness of this approach comprises tens of thousands of patients in a city and hundreds of thousands of medical reimbursement records from the 2010–2015 period. The results of the experiment performed on the medical data analyzed with the FPGrowth algorithm demonstrated that the proposed approach improves the decision model, so that decision-making gains flexibility and efficiency and surpasses the other schemes in terms of classification precision.

Based on machine learning algorithms for data mining, a study was performed in [31] on the characteristics of mosquito-borne diseases such as dengue-1, dengue-4, yellow fever, West Nile virus, and filariasis. Although some of the above-mentioned diseases have a cure, the authors assumed that since these diseases mainly affect areas of great poverty, such as the African continent and Western Asia, people cannot afford the indicated treatment. Therefore, the objective of the study was to find similar characteristics in the amino acid sequences that would allow creating a single cure capable of healing the patient at once. The results showed that although there appear to be similar features among dengue virus, yellow fever virus, West Nile virus, and the Brugia malayi mitochondrion, the differences between them are stronger than their similarities. On the other hand, the authors found that Leucine control might contribute to the development of a single effective cure for the cases of West Nile virus and the Brugia malayi mitochondrion.

Finally, because of the high cancer mortality, [32] investigated the sequences of various cytokines using algorithms such as Apriori, decision trees, and support vector machines (SVMs). Cytokines play a central role in the immune system, so the study, if its goal is achieved, may help others find new rules to determine whether a cytokine has anticancer properties or not.

Table 1 shows a summary of these investigations.

As we can see in Table 1, the studies mentioned above demonstrate the usefulness of data mining techniques for solving problems in the medical area, which constitutes a broad object of study with many problems suitable for this perspective. Nevertheless, to the best of our knowledge, there are no works that use association rule mining and Bayesian networks to analyze the decrease in the number of autopsies performed in a hospital; this determines the appropriateness, novelty, and interest of this research.

3. Methods

3.1. Collection and Preparation of Data

In order to carry out the study, it was necessary to collect data recording the opinions, attitudes, and behaviors of the physicians regarding the practice of autopsies, as well as the values, beliefs, or motives that characterize them. To do this, one of the department’s pathologists compiled a survey of 16 questions, three open-ended and thirteen closed-ended, of which five include a section for the doctors to specify other responses. Table 2 shows a summary of the survey applied to the physicians and the number of categories generated per response.

The survey was answered by 86 physicians of the hospital. Their answers were processed, resulting in the following:
(i) 27 categories related to factors that the doctors considered negative for the realization of autopsies were generated, along with 26 categories for the positive factors.
(ii) Nine motives for the family’s rejection of an autopsy and eight possible reasons for the nonrealization of enough autopsies in the hospital were extracted.
(iii) Regarding the opinion of the physicians about the procedure to request an autopsy, 14 efficient methods and six options about the suitable staff to request an autopsy were considered.
(iv) The general comments given by the doctors were reduced to 25 categories.
(v) The remaining questions kept the options proposed in the questionnaire: three possible answers for the area and the grade of the doctor, and five for each of the three questions related to the medical opinions about the discoveries found in autopsies.

A database was designed and implemented to store the information obtained from the surveys and ensure the persistence of these data, so that they could be used in subsequent analyses. The database was implemented using the PostgreSQL database management system. From the database records, two suitable representations (binary-matrix and minable-view) were created to apply the DM techniques. These structures were created by SQL functions, generated dynamically from the tables, and in this way two different data sets were formed from the same data. In this paper, the binary-matrix table is named C and the minable-view table D. Their characteristics are described in Table 3.

Minable-View. The function constructs a matrix where rows represent combinations of answers for the polls and columns represent the answers. The value of each cell corresponds to the intersection, which can be read as a (row, column) pair.

Binary-Matrix. The function constructs a binary matrix in which each row represents a respondent and the columns represent the answers. The value of each cell corresponds to the intersection, which can be read as a (respondent, answer) pair, the value being equal to “S” if the respondent gave that answer and null otherwise.
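As an illustration of this representation (the actual tables are built by dynamically generated SQL functions in PostgreSQL; the answer identifiers below are hypothetical), the binary matrix can be sketched as follows:

```python
# Sketch of the binary-matrix representation: one row per respondent and one
# column per possible answer, with "S" when the respondent gave that answer
# and None (null) otherwise. Answer identifiers are made up for illustration.
def build_binary_matrix(surveys, all_answers):
    """surveys: one set of answer identifiers per respondent."""
    return [["S" if answer in answered else None for answer in all_answers]
            for answered in surveys]

answers = ["q1_yes", "q1_no", "q2_cost", "q2_family"]      # columns
surveys = [{"q1_yes", "q2_cost"}, {"q1_no", "q2_family"}]  # two respondents
matrix = build_binary_matrix(surveys, answers)
# matrix[0] == ["S", None, "S", None]
```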

Open answers add complexity to the research because in these cases respondents can answer the question by writing a free idea. Because of this, the system has to perform an automatic categorization of the text, predicting or assigning a category to each such response. To do this, we needed classified data sets for each question to train the prediction model, so it was necessary for the experts to establish the possible categories and manually sort the open answers of the recorded surveys (see Tables 4, 5, and 6). In this way, the data sets “aut_reason”, “reason_no_aut”, and “com_sug_op” were created with the answers to the open questions, corresponding to the reasons for requesting autopsies, the reasons for not requesting them, and the general comments, respectively (see their characteristics in Table 7).

The characteristics of the data sets were analyzed, and it was determined that no transformation was necessary because they did not affect the performance of the algorithms contemplated for evaluation. So, we went directly to the DM phase.

3.2. Evaluation of Algorithms

According to the data and the objective of this paper, two DM tasks were considered to solve the problem. We first considered performing an association analysis to determine the relationships between the attributes and, on the other hand, using classification to recognize the relevant dependencies between attributes, according to probability and statistics, also by means of Bayesian networks. Other data mining techniques for text classification were considered too. The comparison of the evaluation of the WEKA algorithms for each DM task considered in the research is presented below.

3.2.1. Association Algorithms

Apriori [33]. It is a classic algorithm for association rule mining. It generates rules through an incremental process that searches for frequent relationships between attributes, bounded by a minimum confidence. The algorithm can be configured to run under certain criteria, such as upper and lower coverage limits, the minimum confidence that accepted item sets must meet, ordering criteria for displaying the rules, and a parameter to indicate the specific number of rules we want to show.
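To make the incremental search concrete, the following is a minimal sketch (not WEKA’s implementation) of the frequent-itemset stage of Apriori over a tiny hypothetical transaction list:

```python
def apriori_frequent_itemsets(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup.
    transactions: list of sets of items; min_sup: a fraction in [0, 1]."""
    n = len(transactions)
    support = lambda s: sum(1 for t in transactions if s <= t) / n

    # Level 1: frequent individual items
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # Join step: build (k+1)-item candidates from the frequent k-itemsets,
        # then keep only those that still reach the minimum support
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        current = {c for c in candidates if support(c) >= min_sup}
        k += 1
    return frequent

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
frequent = apriori_frequent_itemsets(transactions, min_sup=0.5)
# {"a","b"} has support 0.5; {"a","b","c"} (support 0.25) is pruned
```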

FPGrowth [34]. It relies on an Apriori-style first scan of the data, in which it identifies the sets of frequent items and their support, a value that allows organizing the sets in descending order. The method offers good selectivity and substantially reduces the cost of the search, given that it starts by looking for the shortest frequent patterns and then concatenates them with the less frequent ones (suffixes), thus identifying the longest frequent patterns. It has been shown to be approximately one order of magnitude faster than the Apriori algorithm.

PredictiveApriori [35]. The algorithm achieves a favorable computational performance due to its dynamic pruning technique that uses the upper bound of all rules of the supersets of a given set of elements. In addition, through a backward bias of the rules, it manages to eliminate redundant ones that are derived from the more general ones. For this algorithm, it is necessary to specify the number of rules that are required.

Tertius [36]. It performs an optimal search based on finding the most confirmed hypotheses using a nonredundant refinement operator to eliminate duplicate results. The algorithm has a series of configuration parameters that allow its application to multiple domains.

The measures considered in the evaluation were as follows:
(i) Confirmation: a statistical measure that indicates how interesting a rule can be.
(ii) Support: the percentage of transactions from a transaction database that the given rule satisfies.
(iii) Confidence: the degree of certainty of the detected association.
(iv) Time: the number of milliseconds that the construction of a model takes.
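For example, support and confidence can be computed directly from a transaction list (the item names here are hypothetical survey answers, not the actual categories of the study):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Certainty of the rule antecedent -> consequent:
    support(antecedent | consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

transactions = [{"low_demand", "cost"}, {"low_demand", "cost"},
                {"low_demand"}, {"cost"}]
sup = support({"low_demand", "cost"}, transactions)        # 2/4 = 0.5
conf = confidence({"low_demand"}, {"cost"}, transactions)  # 0.5 / 0.75 = 2/3
```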

In order to determine which algorithms could be applied to the C and D data sets, an analysis was made based on the characteristics of the data sets, such as their attribute types and whether they contained missing, out-of-range, or inconsistent values. The results of this analysis are shown in Table 8.

The association algorithms were grouped according to their configuration characteristics in order to compare them with each other. Therefore, Apriori and FPGrowth were analyzed first. For this, different support and confidence thresholds were established, since rules are considered interesting and strong if they satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) [34]. Moreover, the number of rules generated, the execution time of the algorithm (in milliseconds), and the support and confidence averages were recorded for each case. On the other hand, to analyze PredictiveApriori and Tertius, it was necessary to specify the number of rules to be generated by these algorithms; the execution time and the averages of support and confidence were registered too. Finally, we compared the best-evaluated algorithms in each of these cases considering the following variables: number of rules generated, execution time, and average support values.

The results of the evaluations of the algorithms on the C data set were recorded in Tables 9 and 10. Each evaluation was executed 100 times to estimate the average time for the construction of the models, and the average values of support and confidence were taken into account. Table 9 shows the comparison between Apriori and FPGrowth; as we can see, the latter is computationally faster than the former. Moreover, although Apriori found more rules than FPGrowth in seven cases, the average confidence of the rules found by FPGrowth is greater in three cases and lower in only one, while the average support of its rules overcame that of the rules obtained by Apriori in four cases. Therefore, FPGrowth is better than Apriori for the C data set.

The comparison between PredictiveApriori and Tertius is presented in Table 10. The experiments demonstrate that Tertius obtained the same number of rules as PredictiveApriori in considerably less time. Nevertheless, the average support of the rules found by the latter is greater than that of the former in three cases and lower only once. Figures 2 and 3 show the comparison between the four algorithms with respect to support and time, respectively. Table 9 and Figure 3 show that FPGrowth is the fastest algorithm. In contrast, Table 10 and Figure 3 demonstrate that PredictiveApriori is the slowest. Also, the results indicate that the algorithms that generate rules with better support within the C data set are Apriori and FPGrowth. Thus, the best algorithm for the C data set is FPGrowth.

Table 11 shows the comparison results of the algorithm evaluations for the D data set. Again, each evaluation was performed 100 times to estimate the average time for the construction of the models, and the number of rules and average support were taken into account. Apriori reports better results than Tertius because the time it takes to obtain the same number of rules is considerably shorter and the rules it identifies have better support.

Although FPGrowth was the best algorithm for the C data set, it has the disadvantage that it cannot be applied to the D data set; therefore the Apriori algorithm is considered more appropriate for this work, since it generates more rules than FPGrowth and can be used for the two data sets contemplated in this research, as illustrated in Figure 4.

3.2.2. Classification Algorithms

Bayesian networks were considered to analyze the data of the surveys, whereas J48, Neural Networks, NaiveBayes, and Sequential Minimal Optimization (SMO) were studied considering their application in the classification process for the open questions. We performed 10-fold cross-validation in all cases to avoid overfitting.
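The fold construction behind k-fold cross-validation can be sketched as follows (a plain index split, independent of WEKA’s stratified implementation):

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k folds; each fold serves once as the test
    set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With the 86 respondents of this study and k = 10, every record is tested
# exactly once, in test folds of 8 or 9 records.
test_sizes = [len(test) for _, test in k_fold_indices(86, 10)]
```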

BayesNet. It determines the relations of dependence and probabilistic independence between all the variables of a data set, thus conforming the structure of the Bayesian network, represented by an acyclic graph where the nodes are the variables and the arcs are the probabilistic dependencies between the linked attributes [37, 38].

J48. It constructs a decision tree to model the classification process. This algorithm either ignores missing values or predicts them from the known values of the attribute in the other records [39, 40].

Neural Networks. These are mathematical procedures based on the exploitation of parallel local processing and the properties of distributed representation that imitate the structure of the nervous system and can be interpreted as the way to obtain knowledge from experience [41].

NaiveBayes. It is a probabilistic classifier that calculates the probabilities according to the combinations and frequencies of occurrence of the data in a given data set [39, 40].

SMO. It implements the sequential minimal optimization algorithm for training SVMs, solving the quadratic programming problem that this training entails [42].

The measures considered in the evaluation were as follows:

Accuracy. It is the percentage of test set tuples that are correctly classified by the classifier.

ROC Area. It refers to the area under the ROC curve, which plots the true positive rate (y-axis) against the false positive rate (x-axis); the result is better the closer it gets to one.

Kappa. It determines how good a classifier is by measuring the agreement between predicted and observed classifications, corrected for the agreement expected by chance. Values close to 1 indicate good agreement, while values close to 0 indicate agreement due exclusively to chance.

Time. It is the number of milliseconds that the construction of a model takes.
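The first and third measures can be computed directly from a confusion matrix. The following illustrative calculation (the counts are hypothetical, not taken from the paper) derives accuracy and Cohen's kappa for a two-class problem; rows are the true class, columns the predicted class.

```python
# Hypothetical 2x2 confusion matrix: rows = true class, columns = predicted.
matrix = [[40, 10],   # true class 0: 40 correct, 10 misclassified
          [5, 45]]    # true class 1: 5 misclassified, 45 correct

total = sum(sum(row) for row in matrix)

# Accuracy: observed agreement, i.e., the diagonal over the total.
observed = sum(matrix[i][i] for i in range(len(matrix))) / total

# Chance agreement: sum over classes of (row marginal * column marginal).
expected = sum(
    (sum(matrix[i]) / total) * (sum(row[i] for row in matrix) / total)
    for i in range(len(matrix))
)

# Kappa: observed agreement corrected for chance agreement.
kappa = (observed - expected) / (1 - expected)

print(observed)  # 0.85
print(kappa)     # 0.7: well above chance, but short of perfect agreement
```

WEKA reports both values for every cross-validated model; the sketch only makes their definitions concrete.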

The possibility of applying Bayesian network learning with various search algorithms to the C and D data sets was analyzed (see Table 12), as well as J48, Neural Networks, NaiveBayes, and SMO on the aut_reason, reason_no_aut, and com_sug_op data sets (see Table 13).

To evaluate the Bayesian networks, each of the 18 attributes of the D data set was in turn considered as the class. Each test was executed 100 times to estimate the average time of construction of the network; accuracy and ROC area values were also considered. Table 14 shows the results of the tests carried out taking the grade of the respondent (grade) as the class. The results for the remaining 17 classes were recorded in the same way, showing that the best results (see Table 18) were obtained with the TAN (Tree Augmented Naïve Bayes) search algorithm for 14 of the classes and with HillClimber for the remaining 4. As Table 14 shows, HillClimber is the best algorithm for the class grade: although TabuSearch, RepeatedHillClimber, and HillClimber all present higher accuracy and ROC area than the other algorithms, HillClimber obtains the Bayesian network in less time.

The evaluation of the J48, Neural Networks (MultilayerPerceptron), NaiveBayes, and SMO algorithms, considered for text classification, is described in Tables 15, 16, and 17 for the aut_reason, reason_no_aut, and com_sug_op data sets, respectively. Each test was performed 100 times to estimate the average time; other metrics, such as accuracy, ROC area, and kappa, were also considered.

Table 18 presents the best case for each algorithm analyzed according to the different data sets. This information can be very useful to guide the specialist about the parameters with which the models should be generated in order to obtain more accurate results. It should be noted that only the best configuration of each algorithm is reported; regardless of this, the specialists can configure them according to their interests.

3.2.3. Robustness Evaluation

Robustness is the ability of the classifier to make correct predictions given noisy data or data with missing values. It is typically assessed with a series of synthetic data sets representing increasing degrees of noise and missing values [34]. To evaluate the robustness of the Bayesian network search algorithms, we created 18 data sets with 10% of missing values for each attribute and then executed every algorithm, considering each attribute as the class. Figures 5 and 6 show the evaluation results. TAN was the best algorithm because it achieved the highest accuracy for 14 of the 18 classes, as well as the best ROC area for all 18 classes.
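The construction of the corrupted data sets can be sketched as follows (our illustration, not the authors' code): for a given attribute, a copy of the data set is derived in which 10% of that attribute's values are replaced by a missing-value marker (None here; '?' in WEKA's ARFF format). The record contents and attribute names are hypothetical.

```python
import random

def inject_missing(records, attribute, fraction=0.10, seed=7):
    """Return a copy of the records with `fraction` of one attribute missing."""
    corrupted = [dict(r) for r in records]        # copy; originals stay intact
    n = max(1, round(fraction * len(corrupted)))  # number of rows to corrupt
    for i in random.Random(seed).sample(range(len(corrupted)), n):
        corrupted[i][attribute] = None
    return corrupted

# Hypothetical survey records: 50 respondents, 2 attributes each.
records = [{"grade": "specialist", "years": "10+"} for _ in range(50)]
noisy = inject_missing(records, "grade")
print(sum(1 for r in noisy if r["grade"] is None))  # 5, i.e., 10% of 50
```

Repeating this for each of the 18 attributes yields the 18 synthetic data sets on which the search algorithms were compared.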

4. Results

The algorithms evaluated in Section 3.2 were implemented in the web application, a tool proposed by the authors of this work to support the process of finding useful knowledge in the applied survey, generating the models, and submitting the results to a thorough evaluation by the experts. These algorithms were configured with the parameters that produced the best results during the evaluation stage.

The application allows physicians to answer the survey that is the subject of study in this research. In addition, it guarantees data persistence and, from the stored answers, generates the data sets to be analyzed by the mining algorithms considered in this work. Using the WEKA API, the application generates models according to the selected algorithms and returns the results. The format of the rules generated by the algorithms can be understood by data mining experts but is difficult for a common user to understand, so it was decided to describe in the application each variable involved in the survey. From these descriptions, and using a pattern to define the explanation of a rule, it was possible to program a function that, given the rules of a model, returns their explanation as a natural language expression. In this way, specialists can perform evaluations without necessarily relying on data mining experts.
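The templating idea can be sketched in a few lines (a minimal illustration under our own assumptions; the variable names, descriptions, and sentence pattern are hypothetical, not the application's actual ones): each attribute-value pair is mapped to a textual description, and a fixed pattern assembles the antecedent and consequent of a mined rule into a sentence.

```python
# Hypothetical descriptions for attribute=value pairs from the survey.
DESCRIPTIONS = {
    "family_refusal=yes": "the family refuses the procedure",
    "years=10+": "the physician has more than 10 years of practice",
    "requests=none": "the physician never requests autopsies",
}

def explain_rule(antecedent, consequent, confidence):
    """Render an association rule X -> Y as a natural-language sentence."""
    conds = " and ".join(DESCRIPTIONS[a] for a in antecedent)
    concl = " and ".join(DESCRIPTIONS[c] for c in consequent)
    return (f"When {conds}, then {concl} "
            f"(this holds in {confidence:.0%} of the matching cases).")

sentence = explain_rule(["years=10+"], ["requests=none"], 0.9)
print(sentence)
```

A dictionary of per-variable descriptions plus one sentence pattern is enough to translate every rule of a model, which is what lets the pathologist evaluate results without a data mining expert at hand.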

4.1. Association

We obtained the best 20 rules from each data set using Apriori, FPGrowth, PredictiveApriori, and Tertius algorithms. Then, the rules were evaluated by the expert. Figure 7 shows the rules generated by the application from the C data set using the Apriori algorithm.

4.2. BayesNet

Given the specialist's interest in knowing the behavior of the data for the questions about the reasons why autopsies are not requested and the reasons why they are requested, two Bayesian networks were built taking these attributes as classes. The application shows the accuracy of each network and also represents them as directed acyclic graphs, which allows the relationships between the nodes to be appreciated graphically. The graph of the second network is shown in Figure 8.

4.3. Evaluation of the Results

To support the expert in the evaluation process, the system provides a natural language explanation of the results of the mining models, so that the pathologist understands them. In this way, the specialist can evaluate the results by subjectively analyzing, based on his experience and knowledge, the information extracted by the models.

The association rules analyzed by the specialist were the top 20 of each algorithm implemented in this research. For example, the 20 association rules obtained by Apriori for the C data set are shown in Figure 7.

For each node in a Bayesian network, the application shows a conditional probability table containing the probabilities of occurrence of all of its attribute values. Thus, the Bayesian networks provide a large amount of information to be analyzed, but not all of it is interesting. For this reason, the analysis of the two generated networks was limited to six survey questions, considering only the highest probability values. The questions are related to the years of practice of the physician, the number of cases in which the doctor has intervened, discrepant findings, demand findings, causes of autopsy rejection, reasons for not requesting autopsies in the hospital, and why the physician does not perform them.

To illustrate this procedure more clearly, we describe the analysis process for the no_hosp question in the Bayesian network generated from the aut_reason query of Figure 8. From this network, we selected the node corresponding to the question of interest in this case (no_hosp), which generated a total of 1875 conditional probabilities. The next step was to rule out the conditional probabilities below 50%, which reduced the number of results to 14 (see Figure 9). In this particular case, the specialist ruled out conditional probabilities 1, 2, 3, 4, and 11. The remaining questions selected in each network were evaluated with this same procedure.
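The filtering step above amounts to a threshold-and-sort over the node's conditional probability table. The following sketch (with hypothetical entries; the real table for no_hosp has 1875 rows) shows the idea: keep only entries of at least 0.5 and present the strongest relations first, so the specialist reviews a short, ranked list.

```python
# Hypothetical excerpt of a node's conditional probability table:
# (human-readable description of the conditional probability, value).
cpt = [
    ("no_hosp=yes | years=10+, grade=specialist", 0.82),
    ("no_hosp=yes | years=0-5", 0.31),
    ("no_hosp=no  | grade=resident", 0.64),
    ("no_hosp=yes | grade=resident", 0.36),
]

# Keep only probabilities >= 50%, strongest first.
relevant = sorted(
    ((entry, p) for entry, p in cpt if p >= 0.5),
    key=lambda item: -item[1],
)
for entry, p in relevant:
    print(f"{p:.2f}  {entry}")
```

On the real table, this same filter is what reduced the 1875 conditional probabilities of the no_hosp node to the 14 results shown in Figure 9.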

4.3.1. Association

The results comprised 120 rules, 20 per model, of which the specialist evaluated only 100. The decision not to include the rules of FPGrowth was made to avoid repetitions in the results, given that the majority of its rules are also obtained by Apriori.

After a thorough analysis of the rules, the conclusive results were as follows: of the 100 rules, the expert approved 75. Eight rules were discarded in each Apriori model, seven in PredictiveApriori, and two in the Tertius model for the D data set. The most accepted algorithm turned out to be Tertius, with 90% of its rules approved for the D data set and 100% for the C data set (see Table 19). In general, it can be concluded that the association analysis results had 75% approval.

4.3.2. BayesNet

It is complex to analyze the data generated by the Bayesian models due to the large volume of probability relationships extracted from these networks. For this reason, the expert limited the analysis to the results with a probability greater than 50% relating the following attributes: years of practice, cases in which the physician has intervened, discrepant findings, demand findings, causes of autopsy rejection, reasons for not requesting autopsies in the hospital, and why the physician does not perform them. In this way, 352 probability relationships were evaluated: 168 extracted from the network generated for the aut_reason class and 184 from the one for the reason_no_aut class (see Table 20).

After a thorough analysis, the conclusive results were as follows: of a total of 352 conditional probabilities, 347 were approved by the expert. The specialist discarded one probability in the Bayesian network for aut_reason and four in the Bayesian network for reason_no_aut. In general, it is concluded that the Bayesian networks had 98.6% approval (see Table 21).

The networks also made it possible to establish a situational diagnosis of autopsies in the hospital, which is detailed in Figures 10 and 11. These diagnoses refer to the causes and reasons for requesting and not requesting autopsies, respectively.

5. Conclusions and Future Work

The prominent decrease in the number of autopsies in hospitals around the world has raised questions about the reasons behind this phenomenon. The purpose of this work was to analyze the possible causes of the reduction of autopsies in the hospital system of “Servicios de Salud de Veracruz” by means of association rule mining and Bayesian networks, applied to data on medical opinions about this practice.

The analyzed data were collected through a survey applied to the doctors of the hospital. The survey focused on the medical opinions about the causes or reasons why autopsies are not performed, the study level of the specialists, their years of experience, and the autopsy cases in which they have been involved, among others.

The use of association rule mining techniques and Bayesian networks allowed us to perform a descriptive analysis of the problematic situation and find correlations between the categorical attributes of the data sets built from the information provided by the medical staff, all through a web application developed especially for this purpose. Since the system explains the results of the mining models in natural language, the pathologist can evaluate them by subjectively analyzing, based on his experience and knowledge, the information extracted by the models.

The rules generated by the association models implemented for the research had 75% approval by the specialist. As for the algorithms, Tertius proved to be the most accurate, with 90% of its rules approved in the D data set and 100% in the C data set.

As future work, we suggest studying the clinical records of the patients who died in the hospital and analyzing, with real data, the trend of the causes that lead to performing autopsies on some patients and not on others; this would help confirm the results of this research. We also recommend performing similar studies in other parts of the country to identify whether the medical opinions and the consequences of autopsy rejection differ by region.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are very grateful to the Tecnológico Nacional de México for supporting this work. This research was also sponsored by the National Council of Science and Technology (CONACYT) and by the Secretariat of Public Education (SEP) through PRODEP.