Abstract

Preeclampsia affects from 5% to 14% of all pregnant women and is responsible for about 14% of maternal deaths per year worldwide. This paper focuses on the use of a decision analysis tool for the early detection of preeclampsia in women at risk. This tool applies a fuzzy linguistic approach implemented in a wearable device. To develop this tool, a real dataset from a health center containing data on pregnant women at high risk of preeclampsia has been analyzed using a fuzzy linguistic methodology with two main phases. First, a linguistic transformation is applied to the dataset to increase the interpretability and flexibility of the analysis of preeclampsia. Second, knowledge is extracted by inferring rules with decision trees that classify the dataset. The resulting linguistic rules provide understandable monitoring of preeclampsia through wearable applications and devices. Furthermore, this paper not only introduces the proposed methodology but also presents a wearable application prototype that applies the rules inferred from the fuzzy decision tree to detect preeclampsia in women at risk. The proposed methodology and the developed wearable application can be easily adapted to other contexts such as diabetes or hypertension.

1. Introduction

We are currently living in the age of data: our everyday lives are surrounded by sensors, such as vision, motion, light, and medical sensors, that capture information associated with objects, humans, or environments. The so-called Internet of Things (IoT) [1] is all around us, producing huge amounts of data that need to be automatically organized and processed to produce easy-to-understand reports for users, since dealing with this raw data directly is far beyond human capabilities.

Data analysis [2] is at the core of a substantial number of recently published works, many of which focus on data mining techniques that extract useful knowledge from the data and shape it so that users understand what is happening and can act accordingly. Research on methods for communicating this knowledge in a user-friendly way has led to the concept of linguistic descriptions [3], which allow humans to abstract useful data at different levels and along different dimensions, thereby providing interpretability.

In this context, the computing with words (CWW) methodology emulates human cognitive processes to reason and make decisions in environments of uncertainty and imprecision [4]. CWW considers that inputs and outputs should be expressed in a linguistic domain close to human natural language, thereby providing interpretable and understandable results [5, 6].

The application of soft computing techniques to the development of data analysis methods [7] has proved very successful. However, in most cases these methods produce results that still require some level of expertise to understand and use properly. This is the main motivation for the development of linguistic knowledge extraction techniques, which rely on the principles of fuzzy logic [7] to communicate relevant information obtained from the analysis of the data.

This paper focuses on the health care area, given its current relevance, although the presented techniques could also be applied in other areas such as wellness [8]. In general, modern health care systems use many technological advances to measure biomedical signals, which produce large amounts of raw data.

In the context of the early detection of preeclampsia [9, 10], generating user-friendly knowledge to support diagnosis and monitoring in scalable contexts is a problem that has not yet been completely solved, partly because a rigid interpretation of the data can lead to misdiagnosis. For example, [11] highlights that factors such as physiology, body size, and variations in instrumentation can produce differences in measured heart-rate values that might lead to misdiagnosis.

Therefore, we propose a methodology based on the linguistic approach to extract knowledge by applying a decision tree analysis [12] on a supervised preeclampsia dataset with the aim of early detection of preeclampsia. The proposed methodology is composed of two phases: (i) a linguistic transformation of the dataset to increase the interpretability and flexibility in the analysis of preeclampsia and (ii) the extraction of linguistic knowledge by means of inferring rules using decision trees to classify the dataset.

White-box classification based on decision trees and fuzzy logic [13] provides a wide range of benefits for real deployments in health systems, among which we highlight the following:
(i) Increased human interpretability. It makes it possible to identify risk conditions and extract interpretable linguistic knowledge for the medical staff, thanks to the intuitive representation of concepts instead of numeric values [14].
(ii) Independence from health measuring instruments. In our approach, values are translated into degrees of membership in different fuzzy sets, which makes the classification more robust to variations in precision or granularity. We highlight that the flexibility achieved by fuzzifying the attributes is key for transferring the classification learned from the original dataset, which is based on measurements taken with traditional health instruments, to different contexts and to advanced medical devices such as wearable devices.
(iii) Inclusion of human observations in the system. Diagnosing preeclampsia in developing countries, where some areas are remote or isolated from health centers, requires integrating measurements described by itinerant health staff without advanced instruments [15]. In our case study, age or weight, and even urine infection, can be described through human observation of the patient, thanks to the linguistic transformation of the data used in the decision tree classifier.

The rest of the paper is structured as follows. Section 2 describes preeclampsia and provides some significant data about its incidence worldwide. Section 3 reviews some notions related to mobile information systems and their application to health. Section 4 describes the proposed fuzzy linguistic methodology. Section 5 presents the results of the case study. Section 6 presents a prototype wearable application that applies the rules inferred from the obtained fuzzy decision tree. Finally, Section 7 summarizes our conclusions and future work.

2. Preeclampsia Disease

In this section, the medical context of preeclampsia is reviewed and global quantitative data is provided in order to highlight the seriousness of the problem.

Preeclampsia (PE) [9, 10] is a multisystemic disorder that occurs during pregnancy and is characterized by hypertension and proteinuria (excess protein in the urine). PE usually appears after 20 weeks of pregnancy. It is one of the most serious and most feared complications of pregnancy; in severe cases, it endangers the lives of both the mother and the fetus [16]. Therefore, it must be diagnosed and treated as soon as possible. According to [17], severe preeclampsia is associated with increased maternal mortality (0.2%) and a higher rate of maternal complications (5%), such as convulsions, pulmonary edema, acute renal or liver failure, liver hemorrhage, disseminated intravascular coagulopathy, and stroke. These complications are usually detected in women who develop preeclampsia before the 32nd week of gestation and in those with preexisting medical conditions [18]. PE may be mild or severe, according to the following clinical parameters [19]:
(i) Mild PE: blood pressure of at least 140/90 mmHg on two occasions six hours apart after the 20th week of pregnancy; proteinuria greater than 300 mg in 24 hours; moderate edema; and urinary volume in 24 hours greater than 500 ml.
(ii) Severe PE: blood pressure greater than 160/90 mmHg on two occasions six hours apart after the 20th week of gestation; systolic blood pressure more than 60 mmHg over baseline and diastolic blood pressure more than 30 mmHg over baseline; proteinuria greater than 5 g in 24 hours; massive edema; and systemic symptoms such as pulmonary edema, headache, visual disturbances, pain in the right hypochondrium, elevated liver enzymes, or thrombocytopenia.

It is important to emphasize that “blood pressure” is one of the most important factors to control in pregnant women at risk of PE. Therefore, it would be very useful to have a device for monitoring this factor, preferably in real time, with the possibility of remotely monitoring the reports generated by the device.

On the other hand, maternal mortality continues to occur despite the efforts of health institutions all over the world. To give a general view of the problem, Table 1 briefly summarizes the statistics on maternal mortality related to preeclampsia (PE), mainly for recent years.

Since the methodology presented in this paper extracts knowledge by applying a decision tree analysis [12] to a supervised preeclampsia dataset from Colombia, with the aim of the early detection of preeclampsia, it is also important to consider the PE statistics for this country. A statistical summary can be found in Table 2.

It has also been pointed out [20] that the management of PE has not changed significantly over time, possibly as a result of the poor progress in understanding this condition. All these factors motivate our work, which is based on a dataset of patients who suffered from PE over a five-year period, kindly provided by a hospital in Colombia.

In this paper, we analyze this information using data mining techniques and, as a result, propose a prediction model and a pathology monitoring process that take into account the values of relevant vital signs, using a fuzzy linguistic system to support the diagnosis of PE. This prediction model has been implemented on a mobile device as a prototype wearable application to detect preeclampsia in women at risk.

3. Mobile Applications in Health Care

In this section, we review some notions related to mobile information systems and some applications of mobile systems to health. A mobile application is software designed to run on a mobile device. Typically, mobile applications access data, devices, and other applications from anywhere [21].

One of the features offered by mobile platforms is real-time connectivity, which has facilitated the creation of various tools that efficiently guide their users' daily activities. Mobile applications have become the preferred way for users to connect from their devices [22] and are used in many areas, such as health, industry, commerce, marketing, entertainment, and sports. In our case, the specific domain is the application of mobile information systems to health care. To give a general idea of the applications developed in this context, Table 3 lists some related works.

Mobile health applications can be divided into two main groups: applications designed for specific purposes (the treatment of a single pathology) and general-purpose applications intended to prevent or respond to emergency situations. What all these applications have in common is that they rely on mobile technologies, including the remote transmission of data to monitor the pathology or purpose for which they were created.

In this paper, we present a mobile information system in the context of health, specifically a prototype wearable application to detect preeclampsia in women at risk. Moreover, we identify the most important variables to be monitored; their values can be straightforwardly collected from health sensors and/or from human observation.

4. Methodology

In this section, a methodology based on a linguistic approach is presented to extract knowledge by applying a decision tree analysis on a supervised preeclampsia dataset. The aim of this process is to allow early detection of preeclampsia.

The dataset contains quantitative data from patients with pregnancy disorders and risk of preeclampsia in Colombia, as well as a human expert diagnosis label for each case.

Firstly, a series of attributes is identified as key factors for diagnosing preeclampsia, based on the experience of health experts and other academic studies. This process led us to the following selection of attributes:
(i) Age: between 13 and 46 years old, as stated in [39].
(ii) Body mass index (BMI): between 22 and 38 kg/m², as explained in [40].
(iii) Trimester of pregnancy: the pregnancy must be in the second or third trimester.
(iv) Blood pressure: a diastolic (DBP) or systolic (SBP) blood pressure between 80 and 200 mmHg is highly related to preeclampsia, according to [41].
(v) Family history: if the patient's mother suffered from preeclampsia, it is labeled as first degree; if only one of the patient's grandmothers suffered from preeclampsia, it is labeled as second degree; otherwise, no label is set. This is pointed out as a risk factor in [42].
(vi) Socioeconomic stratum: it is related to preeclampsia because of its consequences on the intake of multivitamin and folic acid supplements [43]. It is described with discrete values in the range [1, 4], where 1 corresponds to the highest socioeconomic stratum and 4 to the lowest, which is the most critical for preeclampsia.
(vii) Race/ethnicity: according to [44], African-American women are more prone to suffer from severe preeclampsia. This attribute takes discrete values (indigenous, African-American, and mestizo).
(viii) New mother: this attribute indicates whether it is the patient's first pregnancy.
(ix) Proteinuria: the presence of excess protein in the urine. It can be measured by a simple dipstick test that returns a value on a scale ranging from 0 to 8 mg/dl. It is also related to preeclampsia diagnosis, as explained in [45].
(x) Preeclampsia label: the target attribute to classify.

The last attribute, the preeclampsia label, is a discrete attribute with three possible values: nonpreeclampsia, moderate preeclampsia, and severe preeclampsia.
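To make the attribute list concrete, a record can be sketched as a Python data class. This is a sketch under assumptions: the field names, the `PatientRecord` and `PELabel` types, and the `family_history_label` helper are hypothetical; only the domains (trimester, family history labels, stratum range [1, 4], ethnicity values, the 0 to 8 mg/dl proteinuria scale, and the three target classes) come from the text. We also assume that a mother with preeclampsia takes precedence over a grandmother when both apply.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PELabel(Enum):
    """Target attribute: the three diagnosis classes used in the dataset."""
    NON_PREECLAMPSIA = "nonpreeclampsia"
    MODERATE = "moderate preeclampsia"
    SEVERE = "severe preeclampsia"


ETHNICITIES = {"indigenous", "African-American", "mestizo"}


def family_history_label(mother_had_pe: bool, grandmother_had_pe: bool) -> Optional[str]:
    """Encode family history as described in the text (mother takes precedence)."""
    if mother_had_pe:
        return "first degree"
    if grandmother_had_pe:
        return "second degree"
    return None  # otherwise, no label is set


@dataclass
class PatientRecord:
    """One dataset record; field names are hypothetical."""
    age: int                        # years (13-46 in the dataset)
    bmi: float                      # kg/m^2 (22-38 in the dataset)
    trimester: int                  # 2 or 3
    systolic_bp: float              # mmHg
    diastolic_bp: float             # mmHg
    family_history: Optional[str]   # "first degree", "second degree", or None
    stratum: int                    # 1 (highest) to 4 (lowest, most critical)
    ethnicity: str                  # one of ETHNICITIES
    first_pregnancy: bool           # the "new mother" attribute
    proteinuria: float              # dipstick reading, 0-8 mg/dl
    label: Optional[PELabel] = None  # target attribute

    def validate(self) -> None:
        """Raise ValueError if a discrete or bounded attribute is outside its domain."""
        problems = []
        if self.trimester not in (2, 3):
            problems.append("trimester")
        if self.family_history not in ("first degree", "second degree", None):
            problems.append("family_history")
        if self.stratum not in (1, 2, 3, 4):
            problems.append("stratum")
        if self.ethnicity not in ETHNICITIES:
            problems.append("ethnicity")
        if not 0 <= self.proteinuria <= 8:
            problems.append("proteinuria")
        if problems:
            raise ValueError(f"out-of-domain attributes: {problems}")
```

Keeping the domains in one place makes it easy to reject malformed inputs before fuzzification, whether they come from a device or from human observation.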

We highlight that the cases included in the dataset were collected because of their complexity and relevance to preeclampsia detection. No trivial episodes have been included in the dataset, and the diagnosis of preeclampsia is divided into moderate and severe. The complexity of the cases in the dataset should be taken into account when interpreting the results.

In the next subsections, the two phases of the proposed methodology are presented.

4.1. Extracting Knowledge by Means of Decision Trees

This phase of our methodology focuses on extracting knowledge from the dataset of pregnant women at high risk of preeclampsia by means of decision trees, which are used to infer rules that classify the dataset.

Among the standard decision tree strategies, we propose using the C4.5 statistical classifier [46] to analyze the dataset. This classifier builds a tree-like graph that handles both continuous and discrete attributes [47], with the most relevant attributes located in the upper levels. The hierarchy is built from the entropy of each attribute [48]: at each node, C4.5 selects the attribute with the highest information gain ratio, that is, the attribute that most reduces the entropy (increases the homogeneity) of the target attribute to be classified, normalized by the entropy of the split itself.
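The attribute-selection criterion can be illustrated with a minimal Python sketch of entropy and the gain ratio that C4.5 uses for discrete attributes. It is an illustration, not the J48 code: C4.5 additionally handles continuous attributes through binary thresholds and missing values, which are omitted here, and the toy values in the example are made up.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain ratio of a discrete attribute with respect to the target labels.

    gain       = H(labels) - sum_v |S_v|/|S| * H(labels restricted to S_v)
    split_info = H(values), the entropy of the partition itself
    """
    n = len(labels)
    partitions: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy example: an attribute that separates the classes perfectly scores 1.0.
print(gain_ratio(["high", "high", "normal", "normal"],
                 ["severe", "severe", "none", "none"]))
```

An attribute whose values are unrelated to the class (for example, `["a", "b", "a", "b"]` against `["s", "s", "n", "n"]`) yields a gain ratio of 0, so it would not be placed near the root.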

The result of applying a decision tree is a white-box model [49], in which the tree-like graph can be easily understood and modified by humans. Moreover, it can be translated into an inference rule system in which each path from the root to a leaf represents an induction rule [50].
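The translation from tree to rules can be sketched as a recursive traversal that emits one IF-THEN rule per root-to-leaf path. The nested-tuple tree representation, the `extract_rules` function, and the toy tree are hypothetical and do not reproduce the tree learned from the dataset.

```python
from typing import Dict, List, Tuple, Union

# A node is either a leaf (a class label) or an internal node:
# (attribute name, {branch value: child node}).
Node = Union[str, Tuple[str, Dict[str, "Node"]]]


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF-THEN rule for every path from the root to a leaf."""
    if isinstance(node, str):  # leaf reached: emit the accumulated path
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN {node}"]
    attribute, branches = node
    rules: List[str] = []
    for value, child in branches.items():
        rules.extend(extract_rules(child, conditions + (f"{attribute} is {value}",)))
    return rules


# Hypothetical toy tree (not the tree learned in this paper).
toy_tree: Node = (
    "systolic pressure",
    {
        "normal": "nonpreeclampsia",
        "high": ("proteinuria", {"low": "moderate preeclampsia",
                                 "high": "severe preeclampsia"}),
    },
)

for rule in extract_rules(toy_tree):
    print(rule)
```

This prints three rules, for example `IF systolic pressure is high AND proteinuria is low THEN moderate preeclampsia`; with linguistic terms on the branches, the rules read directly as statements a clinician can check.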

When analyzing real-world datasets, the model tends to grow the tree to maximize classification accuracy. However, this may result in overfitting and makes the tree harder to interpret; therefore, pruning techniques are applied to reduce the tree complexity [51]. Pruning decreases the size of the tree-like graph, either by stopping the growth of branches during construction or by replacing, after construction, subtrees that do not reduce the estimated classification error with leaf nodes. In Section 5, the effects of applying two different pruning techniques to real data are discussed.
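Post-pruning in C4.5 compares pessimistic (upper-bound) error estimates of a subtree and of the leaf that would replace it. The sketch below uses the normal approximation popularized in Witten and Frank's description of C4.5, with the confidence factor of 0.25 mentioned in Section 5; Weka's J48 may compute the bound differently in detail, and `pessimistic_error` and `should_prune` are hypothetical names.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pessimistic_error(errors: int, n: int, confidence: float = 0.25) -> float:
    """Upper confidence bound on the true error rate of a node covering n > 0 records.

    Normal approximation to the binomial bound; `errors` is the number of
    training records the node misclassifies.
    """
    z = NormalDist().inv_cdf(1 - confidence)  # about 0.674 for confidence = 0.25
    f = errors / n  # observed error rate
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def should_prune(leaf: Tuple[int, int], children: Sequence[Tuple[int, int]],
                 confidence: float = 0.25) -> bool:
    """Replace a subtree by a single leaf if that does not increase estimated errors.

    `leaf` is (errors, n) if the subtree were collapsed into one leaf;
    `children` holds (errors, n) for each leaf of the subtree.
    """
    errors, n = leaf
    leaf_estimate = n * pessimistic_error(errors, n, confidence)
    subtree_estimate = sum(m * pessimistic_error(e, m, confidence) for e, m in children)
    return leaf_estimate <= subtree_estimate


# Illustrative counts: a node with 14 records (5 errors as a leaf)
# whose subtree leaves cover 6, 2, and 6 records.
print(should_prune((5, 14), [(2, 6), (1, 2), (2, 6)]))
```

A lower confidence factor yields a larger z, hence more pessimistic estimates for small leaves and more aggressive pruning.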

4.2. Linguistic Fuzzy Transformation

The fuzzy linguistic approach is integrated into the classification of the dataset in order to increase the interpretability of the resulting decision tree. Fuzzy logic has been proposed for this stage because of the increased expressiveness of learning methods based on fuzzy logic predicates [52].

We have integrated a fuzzification step in which each numeric attribute is related to an attribute described using linguistic terms, as presented in [13]. In this step, each attribute with continuous values is translated into a linguistic variable with different terms, each associated with a membership degree in the interval [0, 1]. For example, a systolic pressure of 138 mmHg would be described as systolic pressure is normal (0.8) and systolic pressure is high (0.2). Next, the continuous value of each attribute is replaced by the linguistic term with the highest membership degree.

The membership functions of the linguistic terms were provided by health experts, who defined the values and ranges for each attribute based on their experience and on the relevance of the attribute as a risk factor. The attributes, linguistic terms, and membership functions proposed are detailed in Table 4 using a trapezoidal representation, as shown in Figure 1.
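The fuzzification step with trapezoidal membership functions can be sketched as follows. The breakpoints in `SYSTOLIC_TERMS` are hypothetical, chosen only so that 138 mmHg reproduces the example above (normal 0.8, high 0.2); the expert-defined values are those in Table 4.

```python
from typing import Dict, Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d) with a <= b <= c <= d

# Hypothetical breakpoints for systolic pressure in mmHg (not the Table 4 values).
SYSTOLIC_TERMS: Dict[str, Trapezoid] = {
    "low": (80, 80, 90, 100),
    "normal": (90, 100, 130, 170),
    "high": (130, 170, 200, 200),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0  # plateau
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)  # falling edge


def fuzzify(x: float, terms: Dict[str, Trapezoid]) -> Dict[str, float]:
    """Membership degree of x in every linguistic term of the variable."""
    return {name: trapezoid(x, *params) for name, params in terms.items()}


def to_linguistic(x: float, terms: Dict[str, Trapezoid]) -> str:
    """Replace a crisp value by the term with the highest membership degree."""
    degrees = fuzzify(x, terms)
    return max(degrees, key=degrees.get)


print(fuzzify(138, SYSTOLIC_TERMS))  # {'low': 0.0, 'normal': 0.8, 'high': 0.2}
```

Calling `to_linguistic(138, SYSTOLIC_TERMS)` returns `"normal"`, the term with the highest degree, which is the value that replaces the crisp measurement in the dataset. Because only the breakpoints encode the expert knowledge, a different instrument or context can reuse the same tree by adjusting them.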

5. Evaluation of Results

In this section, the proposed methodology is evaluated on real data in order to show the effectiveness of our proposal.

We have used a dataset of patients with a diagnosis of possible preeclampsia collected at the Departmental Hospital of Nariño (Colombia); compiling it was an arduous task that required both private and public funding. According to medical experts, the dataset contains a representative sample of the preeclampsia cases in this area over the last 5 years and consists of 729 records with the attributes and values described in Section 4. We present the results of analyzing the dataset with J48 [54], the open source implementation of C4.5 available in Weka [53].

The initial approach was to classify the whole dataset using plain C4.5, without pruning and without a minimum number of instances required to create a tree node. The accuracy obtained was 99.45%, but at the cost of a tree with 598 nodes. As stated before, overfitting and poor interpretability can be reduced at the expense of accuracy by pruning. For this reason, and to obtain a more realistic estimate, in the second and third attempts the accuracy is computed on unseen data by means of leave-one-out cross-validation, a particular case of k-fold cross-validation in which k equals the number of records, so that each test fold contains a single record. The main advantage of this validation is that every record in the dataset is used for both training and testing, which avoids the problem of deciding how to split the dataset.
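Leave-one-out validation can be sketched generically: each record is classified by a model trained on all the remaining records. Here `loocv_accuracy` and the `majority_class` baseline are hypothetical stand-ins; in the paper the model is J48, which this sketch does not reproduce. With 729 records, the procedure trains 729 models.

```python
from collections import Counter
from typing import Any, Callable, Hashable, List, Sequence

# train_and_predict(train_records, train_labels, query_record) -> predicted label
TrainAndPredict = Callable[[List[Any], List[Hashable], Any], Hashable]


def loocv_accuracy(records: Sequence[Any], labels: Sequence[Hashable],
                   train_and_predict: TrainAndPredict) -> float:
    """Leave-one-out accuracy: k-fold cross-validation with k = len(records)."""
    correct = 0
    for i in range(len(records)):
        # Train on every record except i, then test on record i alone.
        train_x = [r for j, r in enumerate(records) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        if train_and_predict(train_x, train_y, records[i]) == labels[i]:
            correct += 1
    return correct / len(records)


def majority_class(train_x: List[Any], train_y: List[Hashable], query: Any) -> Hashable:
    """Hypothetical baseline that always predicts the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


print(loocv_accuracy([0, 1, 2, 3], ["none", "none", "none", "severe"], majority_class))
```

With labels `["none", "none", "none", "severe"]`, the majority baseline misclassifies only the last record, so the function returns 0.75. Unlike the resubstitution accuracy of the first attempt, every prediction is made on a record the model has not seen.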

In the second attempt, we applied two pruning techniques, one to the finished tree and the other during tree construction: (i) post-pruning of the generated tree with a confidence factor of 0.25 and (ii) preventing, during construction, the creation of nodes with just one instance. This approach produced a notable accuracy of 82.16%, with a tree size of 236 nodes.

For the third attempt, we evaluated the fuzzy approach to tree classification, integrating the linguistic terms and membership functions proposed by health experts as described in Section 4.2. This resulted in a tree with 197 nodes and improved interpretability, at the expense of a lower accuracy of 75.03%. As expected, generating a more scalable knowledge-based classification system involves a loss of accuracy, which [55, 56] describe as a reasonably small loss for the sake of interpretability. In exchange, the classifier can be adapted to other contexts using human observation or different health devices, such as wearables.
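The trade-off can be quantified directly from the figures reported above (no additional data is assumed). Relative to the unpruned and the pruned crisp trees, the fuzzy tree reduces the number of nodes by

$$
\frac{598 - 197}{598} \approx 67.1\%, \qquad \frac{236 - 197}{236} \approx 16.5\%,
$$

while its leave-one-out accuracy is $82.16\% - 75.03\% = 7.13$ percentage points lower than that of the pruned crisp tree. The 99.45% of the first attempt is not directly comparable, since it was not obtained with leave-one-out validation.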

Boxes 1 and 2 include a fragment of the rules generated from the original data and from the linguistic approach, respectively. Note the interpretability of the linguistic approach, which can deal with human perception of age and symptoms.

We have summarized the results of analyzing the preeclampsia dataset in Table 5.

6. Development Using Wearable Devices

In this section, we present a prototype wearable application that contains the rules from the fuzzy decision tree detailed in Sections 4 and 5. It has been developed using the Android Wear platform. As described in this paper, the aim of the methodology is to generate medical knowledge that can be applied with the aid of wearable devices in situations in which access to health centers is restricted.

Firstly, the wearable application collects from the patient the input values for the attributes to monitor. This information can be collected in different ways:
(i) Measurements from wireless health devices, such as wireless blood pressure monitors connected through Bluetooth Low Energy (BLE). In this case, the continuous value from the device is replaced by a linguistic term using the fuzzification described in Section 4.2.
(ii) Linguistic terms related to human interpretation, for example, overweight or obesity.
(iii) Linguistic terms related to human observation of colored strips from portable urine tests. This is effective, for example, for visually evaluating excess proteinuria [57] in a cheap and portable way.

Secondly, the linguistic terms are evaluated in the decision tree integrated in the wearable device, which matches them against the inferred rules.

Finally, the risk of preeclampsia is reported as the degree of membership, in the unit interval [0, 1], to each target class (nonpreeclampsia, moderate preeclampsia, and severe preeclampsia). Figure 2 shows the wearable application.
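One possible way to turn matched rules into per-class degrees is standard min-max fuzzy inference: each rule fires with the minimum membership of its conditions, and each class keeps the maximum over its rules. The paper does not specify its inference scheme, so this is an assumption, and the rules, attribute names, and `class_degrees` function are hypothetical. Terms reported by human observation (items (ii) and (iii) above) are simply given membership 1.0.

```python
from typing import Dict, List, Tuple

# A rule is (conditions {attribute: linguistic term}, target class).
Rule = Tuple[Dict[str, str], str]

CLASSES = ("nonpreeclampsia", "moderate preeclampsia", "severe preeclampsia")

# Hypothetical rules for illustration; they are not the rules of Box 2.
RULES: List[Rule] = [
    ({"systolic pressure": "normal"}, "nonpreeclampsia"),
    ({"systolic pressure": "high", "proteinuria": "low"}, "moderate preeclampsia"),
    ({"systolic pressure": "high", "proteinuria": "high"}, "severe preeclampsia"),
]


def class_degrees(memberships: Dict[str, Dict[str, float]],
                  rules: List[Rule]) -> Dict[str, float]:
    """Degree of membership (in [0, 1]) of the patient to each target class.

    Assumed min-max inference: a rule fires with the minimum membership of its
    conditions (fuzzy AND); each class keeps the maximum over its rules (fuzzy OR).
    """
    degrees = {c: 0.0 for c in CLASSES}
    for conditions, target in rules:
        strength = min(memberships.get(attr, {}).get(term, 0.0)
                       for attr, term in conditions.items())
        degrees[target] = max(degrees[target], strength)
    return degrees


# A sensor value fuzzified as in Section 4.2, plus a strip read by eye (degree 1.0).
patient = {
    "systolic pressure": {"normal": 0.8, "high": 0.2},
    "proteinuria": {"high": 1.0},
}
print(class_degrees(patient, RULES))
```

In this example, the result is 0.8 for nonpreeclampsia, 0.0 for moderate preeclampsia, and 0.2 for severe preeclampsia. Keeping the rules as data rather than code is also one way to support swapping in rule sets learned from other datasets.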

Two properties guided the development of the software for the wearable device: on the one hand, experimentability, which allows the tool to be used to evaluate and experiment with different linguistic rules generated from different datasets; on the other hand, maintainability, achieved through the use of frameworks and standard, well-documented programming languages. Thanks to these properties, the software can be easily modified and fixed in further development.

7. Conclusions and Future Work

Early detection of preeclampsia is an important worldwide problem. This paper has presented a solution to support the diagnosis and monitoring of this disease that overcomes the limitation of a rigid interpretation of the data, which can lead to misdiagnosis. In this paper, biomedical signals are used to identify risk conditions and extract linguistic knowledge for the medical staff. We have proposed a methodology based on a fuzzy linguistic approach that provides interpretable results. The proposed methodology is composed of two phases: a linguistic transformation of the data and knowledge extraction by means of decision trees. This methodology enables linguistic monitoring in real time. The proposed methodology has been evaluated on a real dataset of 729 records, yielding interpretable rules for monitoring with an accuracy of 75.03% under leave-one-out cross-validation. Furthermore, a prototype wearable application that applies the rules from the fuzzy decision tree derived from the analysis has been presented. As future work, we plan to obtain a preeclampsia dataset from another health center, preferably in another country, and to apply the proposed methodology to it in order to compare and analyze the inferred linguistic rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement no. 734355, the Spanish government with the TIN2015-66524-P Project, and the Asociación Universitaria Iberoamericana de Postgrado.