Data Preprocessing and Model Design for Medicine ProblemsView this Special Issue
Machine Learning Approach to Extract Diagnostic and Prognostic Thresholds: Application in Prognosis of Cardiovascular Mortality
Machine learning has become a powerful tool for analysing medical domains, assessing the importance of clinical parameters, and extracting medical knowledge for outcomes research. In this paper, we present a machine learning method for extracting diagnostic and prognostic thresholds, based on a symbolic classification algorithm called REMED. We evaluated the performance of our method by determining new prognostic thresholds for well-known and potential cardiovascular risk factors that are used to support medical decisions in the prognosis of fatal cardiovascular diseases. Our approach predicted 36% of cardiovascular deaths with 80% specificity and 75% general accuracy. The new method provides an innovative approach that might be useful to support decisions about medical diagnoses and prognoses.
Machine learning (ML) disciplines provide computational methods and learning mechanisms that can help generate new knowledge from large databases. Applications of ML are useful for constructing approaches to solving problems of classification, prediction, recognition patterns, and knowledge extraction, where the data take the form of a set of examples, and the output takes the form of prediction of new examples [1, 2]. In this sense, ML can provide techniques and tools that help solve diagnostic and prognostic problems in medical domains, where the input is a dataset with characteristics of the subjects, and the output is a diagnosis or prognosis of a specific disease . Although diagnosis and prognosis are relatively straightforward ML problems, clinical decision-making using ML applications is not yet widely used by the medical community , because such a complex task requires not only accuracy, but also the confidence of physician specialists about the functional use of ML approaches in the medical field.
To successfully implement an ML application in problems related to clinical decisions, it is necessary to consider some specific requirements [4, 5]. For example, the prediction of disease progression is generally associated with the evolution of certain risk factors; in the case of some chronic diseases (e.g., cancer, cardiovascular diseases, and diabetes), the risk factors include nonchangeable characteristics, such as age or gender. The use of such nonchangeable qualities to predict the onset of a disease might not be as useful for avoiding evolution of the disease, because currently there is no medical treatment for modifying these biological characteristics. Thus, ML applications usually focus on changeable qualities, which make the prognostic task more difficult and complex.
Another important aspect to consider is the need to obtain interpretable approximations, in order to provide medical staff with useful information about the given problem. This is typically achieved using symbolic learning methods (e.g., decision trees and rules systems), which allow decisions to be explained in an easily comprehensible manner. However, the use of a symbolic learning algorithm to obtain a more comprehensible model frequently sacrifices accuracy in the prediction.
Another problem that often hinders high overall performance in the analysis of medical datasets is that generally these exhibit an unbalanced class distribution , which include a majority or negative class of healthy people (normal data) and a minority or positive class of sick people (the important class) with higher cost of erroneous classification. The latter usually has a higher rate of misclassification, because the performance of standard ML algorithms tends to be overwhelmed by the majority class, ignoring the minority class examples and obtaining results with acceptable accuracy and specificity (healthy subjects diagnosed correctly), but low sensitivity (sick subjects diagnosed correctly).
In addition to developing ML approaches that result in good overall performance and provide medical staff with interpretable prognostic information, providing the ability to support decisions and to reduce the number of medical tests for a reliable prognosis are also desirable. A measure of reliability of the diagnosis or prognosis is also important, because this would give medical staff sufficient confidence to put the new approach into practice. On the other hand, it is also desirable to have an approach that can provide reliable predictions based on a small amount of information about the patient, because collection of that information is often expensive, possibly subject to privacy issues, time consuming, and possibly harmful to the patient .
The present study focused on the implementation of a ML method to support medical decisions in the prognosis of fatal cardiovascular diseases, which are ranked among the top ten in the global disease burden . The goal was to solve previously identified problems, through interdisciplinary work that included the collection and preprocessing of data from an ambulatory blood pressure (ABP) monitoring study , the implementation of a current ML algorithm with specific application to medical diagnosis and prognosis , and the identification of new prognostic thresholds for risk factors of cardiovascular mortality.
2.1. Ambulatory Blood Pressure Monitoring
Currently available ABP monitors are fully automatic and portable devices (Figure 1) that can record BP for 24 hours or longer, while patients go about their normal daily activities . This BP measurement technique provides a better estimate of risk in an individual patient than the traditional method, because it removes variability among individual observers, avoids the “white coat” effect (the transient but variable elevation of BP in a medical environment)  and the “masked hypertension” (normotensive by clinic measurement and hypertensive by ambulatory measurement)  and includes the inherent variability of BP . Detailed descriptions of the ABP measurement methods are provided in previous reports of the Maracaibo Aging Study (MAS) [8, 14, 15].
The MAS is an ongoing population-based, longitudinal study that includes 2500 subjects older than 55 years, residing in the Santa Lucia County, Maracaibo, Venezuela. All participants underwent extensive clinical and laboratory examinations and randomly selected individuals also underwent ABP monitoring. Informed consent was obtained from the subjects who agreed to participate, and from a close family member when doubts existed about the competence of the subject. The ethical review board of the Institute of Cardiovascular Diseases of the University of Zulia approved the protocol.
2.3. Cardiovascular Risk Factors
The leading global risk factor for mortality is high BP, which is responsible for 13% of deaths globally. Eight changeable risk factors (alcohol use, tobacco use, high BP, high body mass index, high cholesterol, high blood glucose, low fruit and vegetable intake, and physical inactivity) account for 61% of cardiovascular deaths. Combined, these same risk factors account for over three quarters of ischaemic heart disease, the leading cause of death worldwide .
However, investigators continue to look for new and emerging risk factors for cardiovascular disease. Recent ABP monitoring studies using a novel variability index  reported significant relationships between high BP variability (BPV) and cardiovascular outcomes [17–19]. BPV is a multifaceted phenomenon, influenced by the interaction between external emotional stimuli, such as stress and anxiety, and internal cardiovascular mechanisms that can vary from heartbeat to heartbeat. However, the complexity of BPV makes analysis difficult, and its independent contribution as a predictor of cardiovascular outcomes is not yet clear . The present study aimed to identify new prognostic thresholds of risk factors for cardiovascular mortality, including high BP (the most significant cardiovascular predictor) and abnormal BPV (a potential independent predictor).
To estimate 24-hour BP level, we computed the weighed mean of valid BP readings (WBP) using the time interval between successive valid measurements as weighting factors . In the case of BPV over 24 hours, we calculated the Average Real Variability (ARV) index  using (1): where is the number of valid BP readings, ranges from 1 to −1, and is the time interval between and .
2.4. Machine Learning Approach
We implemented an interdisciplinary ML method that encompassed all stages of knowledge extraction from databases (data preprocessing, attribute selection, data mining, and knowledge extraction), to examine the application of ML to support clinical decisions (Figure 2).
To improve the accuracy of predictions for affected subjects (positive class), we used the Rule Extraction for MEdical Diagnosis (REMED) algorithm , a symbolic one-class classification approach that implements internal bias strategies during the learning process . REMED employs three main procedures in the knowledge extraction process: selection of attributes, (2) selection of initial partitions, and (3) construction of classification rules.
First, REMED attempts to select the best combination of relevant attributes, using a simple logistic regression model. This is a standard method of analysis in medical research that uses the odds ratio metric  to determine if there is a significant association () between a considered attribute and the positive class. REMED then begins to build initial partitions (exclusionary and exhaustive conditions) to maximize sensitivity and maintain acceptable accuracy without significantly decreasing specificity. Finally, REMED uses the respective partitions for each selected attribute to construct a system of rules that includes conditions (one for each selected attribute) in the following way: If Condition 1 <relation> and Condition 2 <relation> and Condition <relation> and ⋯⋯⋯ and Condition <relation> then class = 1 Else class = 0,
where <relation> is either ≥ or ≤ depending on whether is positively or negatively associated with the positive class through (partition for attribute ).
To avoid overfitting during the training and testing phase, REMED implements the k-fold cross validation technique, which is based on randomly shuffling sample vectors among training and testing spaces . REMED also maintains the approximate imbalance of the original dataset through the iterations.
2.5. Data Preprocessing and Attributes Selection
Based on current medical guidelines , we only included participants that had ABP recordings of good technical quality. Therefore, subjects with <40 BP readings during the 24-hour ABP period were excluded. Systolic BP readings values >260 mmHg or <70 mmHg, and diastolic BP readings >150 mmHg or <40 mmHg were considered outliers or erroneous values and discarded. The treatment of missing values was addressed with predictive techniques, specifically multiple linear regression analysis considering medical criterions.
Only continuous and changeable attributes were considered in the knowledge extraction process. Continuous attributes have a higher degree of uncertainty than discrete attributes, because discrete attributes are usually binary in the clinical environment (e.g., smoker versus nonsmoker), and their associations with specific diseases are almost always well known. We also excluded age, which is a nonchangeable attribute. The attributes considered in the initial ML analysis were body mass index (BMI), serum cholesterol level, 24-hour heart rate, and systolic and diastolic 24-hour WBP and ARV.
The minable dataset was composed of 551 observations with 7 attributes, with only 43 missing values (1.1%) in the serum cholesterol attribute. The missing data were estimated from the regression slope on sex and age, according to the criteria of physician specialists. The sample included 374 women (67.8%) and 170 patients (30.9%) undergoing treatment with antihypertensive drugs (Table 1). The average number of BP readings was 65.1 (5th to 95th percentile = 51.5−77.5), indicating good quality ABP recordings. Mean age was years. At enrolment, 61 participants (11.1%) had a history of cardiovascular disease; 100 (18.1%) had a history of diabetes mellitus, of whom 59 (59%) were undergoing diabetes treatment; 86 (15.6%) were current smokers; 174 (31.6%) reported intake of alcohol. The average total cholesterol level was mmol L−1, and BMI averaged kg m−2. Mean 24-hour systolic WBP was mmHg, and diastolic WBP was mmHg. Average heart rate was bpm.
The median follow-up period was years (5th to 95th percentile = 1.7−12.3 years). Only the participants that died from cardiovascular diseases () were classified as positive examples. Cardiovascular mortality included 10 strokes and 51 cardiac deaths for a high event rate of 15.5 per 1000 person-years. The imbalance ratio between the positive (affected) and negative (unaffected) class was approximately of 1 : 9.
3.2. Machine Learning Process
3.2.1. Selection of Attributes
Using the simple logistic regression model, REMED found only two attributes significantly associated with the positive class: systolic WBP () and ARV (). However, other well-known cardiovascular risk factors, such as serum cholesterol level, BMI, and diastolic WBP [16, 25], were considered in further analyses.
3.2.2. Rule System
To provide medical staff with more information and comprehensible models, we used REMED to build several simple rule systems, which included individual and combined predictions of the more significant attributes (systolic WBP and ARV), as well as the combined predictions with the additional risk factors.
The confusion matrix from the predictions of the system rule, combining only high systolic ARV and WBP and using 10-fold cross-validation, indicated that REMED performed at 0.36 sensitivity, correctly diagnosing more than 35% of the cardiovascular deaths (Table 2). REMED focuses on improving sensitivity over specificity, because in the case of medical diagnosis/prognosis, the cost of misclassification of false negatives (FN, i.e., sick subjects diagnosed incorrectly) is higher than that of false positives (FP, healthy subjects diagnosed incorrectly), because more specific medical tests could discover the FP error, but an FN could cause a life-threatening condition and possibly lead to death . Additionally, to compare the performance of our approach in terms of reliable prediction, we selected from the WEKA framework  the ML approach that better performed with our dataset: the Naïve Bayes classifier, which is one of the most effective and efficient classification algorithms and has been successfully applied to many medical problems [27, 28]. The performance of all classifiers is showed in Table 3.
Use of the REMED algorithm selecting only the more significant attributes provided some of the desired features for solving medical diagnosis/prognosis problems: good overall performance for imbalanced datasets, with 36.1% of sensitivity, 80% specificity, and 75.1% general accuracy; (2) comprehensible prognostic information, based on a rule system with a high degree of abstraction (only one rule to predict positive class examples, independent of the number of instances and initial attributes); (3) the ability to provide the medical staff with sufficient confidence to use the rule system in practice, because it was based on attributes with high confidence levels (>99%), estimated with a standard method of medical analysis; (4) the ability to reduce the number of medical tests necessary to obtain a reliable diagnosis/prognosis, because a simple logistic regression model was used to select attributes strongly associated with the specific disease.
The ML approach generated a new prognostic threshold for cardiovascular mortality: systolic WBP ≥ 137 mmHg, which is lower than the currently proposed by hypertension guidelines (≥140 mmHg) and in agreement with recent ABP studies [29, 30], but with the advantage that our analysis was fully automated and had a smaller sample. Moreover, our ML approach generated a new prognostic threshold for abnormal systolic ARV (≥9.6 mmHg). Together, these new thresholds could provide improved predictions of cardiovascular mortality.
Both systolic WBP and ARV were independent predictors of cardiovascular mortality, performed >50% of sensitivity, but sacrificed significantly in specificity and general accuracy (≤60%). The addition of other well-known cardiovascular risk factors decreased considerably the accuracy in the prediction of affected subjects (<23%). Therefore, the use of logistic regression for the selection of significant attributes (>99%) could be an effective strategy in this stage of ML analysis in medical datasets.
Undoubtedly, one of the most important goals of the application of ML in the medical field is to generate new knowledge, providing the medical community with tools to develop novel points of view about any given problem. In our case, for example, although previous medical studies determined possible ranges of a low and high BPV measured whit ARV through statistical methods (median and quartiles analysis) [17, 18], our work is pioneer proposing a prognostic threshold for abnormal systolic ARV (≥9.6 mmHg). This threshold has a good performance as an independent a composed predictor of fatal cardiovascular events. The use of this threshold should facilitate new fields of investigation regarding BPV and its prognostic relevance.
We do not claim that our ML analysis using REMED is the ultimate solution for medical diagnosis/prognosis problems from unbalanced datasets, because it is necessary to implement modifications that improve REMED’s predictive capacity in terms of sensitivity (≥50%) without significantly deteriorating its specificity. However, we obtained better results than the Naïve Bayes classifier (11.48%), which is considered as a benchmark algorithm that in any medical domain has to be tried before any other advanced method . Therefore, we believe that our approach could improve performance in these medical tasks, and increase the confidence of the medical community in the use of ML approaches to support clinical decisions.
The authors are grateful to the referees for their detailed review on the paper and thoughtful comments. This paper was supported by the Secretaria de Educación Pública, México DF, México (PROMEP/103-5/11/4145). The Maracaibo Aging Study was funded by the Venezuelan Grant FONACIT G-97000726, FundaConCiencia, and by Award no. R01AG036469 from the National Institute on Aging.
E. Alpaydin, Introduction to Machine Learning, The MIT Press, Cambridge, Mass, USA, 2nd edition, 2010.
I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Burlington, Mass, USA, 3rd edition, 2011.
S. Karpagavalli, K. S. Jamuna, and M. S. Vijaya, “Machine learning approach for preoperative anaesthetic risk prediction,” International Journal of Recent Trends in Engineering, vol. 1, no. 2, pp. 19–22, 2009.View at: Google Scholar
N. V. Chawla, N. Japkowicz, and A. Kolcz, “Editorial: special issue on learning from imbalanced data sets,” ACM SIGKDD Explorations, vol. 6, no. 1, pp. 1–6, 2004.View at: Google Scholar
T. G. Pickering, G. D. James, C. Boddie, G. A. Harshfield, S. Blank, and J. H. Laragh, “How common is white coat hypertension?” Journal of the American Medical Association, vol. 259, no. 2, pp. 225–228, 1988.View at: Google Scholar
L. Mena, S. Pintos, N. V. Queipo, J. A. Aizpúrua, G. Maestre, and T. Sulbarán, “A reliable index for the prognostic significance of blood pressure variability,” Journal of Hypertension, vol. 23, no. 3, pp. 505–511, 2005.View at: Google Scholar
L. Mena, J. D. Melgarejo, C. Chavez et al., “Relevance of blood pressure variability among the elderly: findings from the Maracaibo aging study,” Journal of Hypertension, vol. 29, Article ID e312, 2011.View at: Google Scholar
World Health organization, Global Health Risks-Mortality and Burden of Disease Attributable to Selected Major Risk, World Health Organization, Geneva, Switzerland, 2009.
S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, “Handling imbalanced datasets: a review,” GESTS International Transactions on Computer Science and Engineering, vol. 30, no. 1, pp. 25–36, 2006.View at: Google Scholar
R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proceedings of the 14th International Conference on Artificial Intelligence, pp. 1137–1143, 1995.View at: Google Scholar
G. Mancia, G. De Backer, A. Dominiczak et al., “2007 guidelines for the management of arterial hypertension: the task force for the management of arterial hypertension of the European Society of Hypertension (ESH) and of the European Society of Cardiology (ESC),” Journal of Hypertension, vol. 25, no. 6, pp. 1105–1187, 2007.View at: Publisher Site | Google Scholar
G. M. Weiss, “Mining with rarity a unifying frame-work,” ACM SIGKDD Explorations, no. 1, pp. 7–19, 2004.View at: Google Scholar
Y. Cui, R. S. Blumenthal, J. A. Flaws et al., “Non-high-density lipoprotein cholesterol level as a predictor of cardiovascular disease mortality,” Archives of Internal Medicine, vol. 161, no. 11, pp. 1413–1419, 2001.View at: Google Scholar
M. Kukar and C. Grošelj, “Reliable diagnostics for coronary artery disease,” in Proceedings of the 15th IEEE Symposium on Computer-Based Medical Systems, pp. 7–12, June 2002.View at: Google Scholar