Abstract

Pharmacological agents are often used to induce labor. Failed inductions are associated with unnecessarily long waits and greater maternal-fetal risks, as well as higher costs. No reliable models are currently able to predict the induction outcome from common obstetric data (area under the ROC curve (AUC) between 0.6 and 0.7). The aim of this study was to design an early success-predictor system by extracting temporal, spectral, and complexity parameters from the uterine electromyogram (electrohysterogram (EHG)). Different types of feature sets were used to design and train artificial neural networks: Set_1: obstetrical features, Set_2: EHG features, and Set_3: EHG+obstetrical features. Predictor systems were built to classify three scenarios: (1) induced women who reached active phase of labor (APL) vs. women who did not achieve APL (non-APL), (2) APL and vaginal delivery vs. APL and cesarean section delivery, and (3) vaginal vs. cesarean delivery. For Scenario 3, we also proposed 2-step predictor systems consisting of the cascading predictor systems from Scenarios 1 and 2. EHG features outperformed traditional obstetrical features in all the scenarios. Little improvement was obtained by combining them (Set_3). The results show that the EHG can potentially be used to predict successful labor induction and outperforms the traditional obstetric features. Clinical use of this prediction system would help to improve maternal-fetal well-being and optimize hospital resources.

1. Introduction

The induction of labor consists of promoting uterine contractions and cervical ripening before the onset of spontaneous labor. This common procedure is indicated when continuing pregnancy increases maternal and/or fetal risks. In the United States, 22.8% of all births were induced in 2012 [1]. Pharmacological labor induction is mainly obtained by prostaglandins [2] but can take up to 20 hours [3] and has been known to take more than 36 hours, with no guarantee of success. It has also been associated with maternal and fetal risks such as abnormal uterine activity, fetal distress, and higher cesarean rates [4]. Failed inductions lead to unnecessary waits, greater maternal-fetal exhaustion and suffering, and the need for additional resources, thus increasing medical care costs. Predicting successful induction is an important aspect in improving maternal and fetal well-being, reducing healthcare costs and improving labor management.

Obstetric variables have been considered for this purpose and are usually based on cervix assessment by the Bishop score [5, 6], although cervical length, maternal age, height, weight, parity, and birth weight [7-9] have also been used. The predictive capacity, given by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, is 0.69 for cervical length [7], 0.72 for cervical dilatation [7], 0.52 for Bishop score [6], and 0.60 for fetal weight [8], showing that obstetrical data cannot at present be used to reliably predict the outcome of labor induction.

The electrohysterogram (EHG), i.e., uterine myoelectrical activity recorded on the abdominal surface, is an alternative method of monitoring uterine dynamics and consists of intermittent bursts of action potentials derived from the simultaneous activation of multiple uterine muscle cells. Uterine myoelectric activity evolves throughout gestation, being scarce and uncoordinated in the early stages and becoming intense and synchronized as delivery approaches [10]. Previous studies have shown that EHG signals can discriminate effective contractions associated with imminence of labor [11] or whether delivery will be term or preterm [12]. EHG records have also been used to characterize the uterine myoelectrical response to labor induction drugs [13-16]. Aviram et al. found that uterine electrical activity significantly increases 2 hours after prostaglandin E2 (PGE2) vaginal application and up to 8 hours after PGE2 application [13]. However, their aim was not to predict labor induction success or to compare the responses between successful and failed groups. Toth studied the possibility of predicting induction success using local prostaglandin [14]. Uterine activity was assessed by means of an index that takes the intrinsic characteristics of EHG bursts into account (number of impulses, amplitudes, series, and shape), and a statistically significant difference in the uterine activity index was found between successful (vaginally completed) and unsuccessful inductions between the 210th and 270th minutes. Benalcazar-Parra et al. also studied the differences between failed and successful (reaching the active phase of labor (APL)) inductions by comparing the evolution of different EHG parameters. They found different responses, mainly in amplitude and spectral parameters, 60-120 minutes after labor induction onset [15, 16].
However, to date, no work has been done on predicting successful induction from EHG records, while EHG-based neural networks have been applied to the prediction of term and preterm labor [12, 17-19]. In this context, the aim of the present study was to design a system capable of reliably predicting successful labor induction, based on EHG features and obstetrical data in the first 4 hours after labor induction onset.

Vaginal delivery can be considered a 2-step process. First, the woman has to reach the APL, i.e., regular uterine dynamics with 3-5 contractions every 10 minutes, 4 cm of cervical dilatation, and cervical effacement [20]. This is a necessary condition for expelling the fetus from the uterus via the vaginal route (Step 2). It should be noted that, although there is some controversy regarding the values of cervical dilatation and cervical effacement associated with APL, in the present work we considered 4 cm, as this is the most widely used definition [21]. A cesarean is needed if the APL cannot be reached. However, even if APL has been reached, various conditions may prevent vaginal delivery, such as labor arrest, pelvic-fetal disproportion, or loss of maternal-fetal well-being [22]. In the labor induction context and from the pharmacologic point of view, induction can be considered successful if drug action helps the patient achieve APL [15, 16, 23]. From the medical point of view, only vaginal deliveries are commonly considered successful [24, 25]. Taking this into account, we considered three different scenarios in designing and validating prediction systems for labor induction success (see Figure 1).

2. Materials and Methods

2.1. Signal Acquisition

The study was conducted on 115 healthy pregnant women with gestational ages of between 40 and 41 weeks and singleton pregnancies for whom labor induction was medically prescribed. The distribution of the labor outcome population is shown in Figure 1 according to the different scenarios:
(i) Scenario 1: women achieving active phase of labor (successful group; n = 98) vs. women not achieving active phase of labor (failed group; n = 17)
(ii) Scenario 2: from women who achieved active phase of labor, those achieving vaginal delivery (successful group; n = 82) vs. cesarean section (failed group; n = 16)
(iii) Scenario 3: women achieving vaginal delivery (successful group; n = 82) vs. cesarean deliveries (failed group; n = 33)

The recordings were performed at the Hospital Universitario y Politécnico La Fe de Valencia (Spain), and the study was approved by the Hospital Ethics Committee (2015/0455, 12/01/2016). The women were previously informed of the nature of the study and gave their written consent. Labor induction was by vaginal administration of two different types of drugs commonly used in obstetrics: either a vaginal insert of 25 μg of misoprostol tablets (Misofar, Bial S.A., Portugal) with repeated doses every 4 hours up to a maximum of 3 doses or 10 mg of vaginal dinoprostone insert (Propess, Ferring, Germany). The women were kept under constant observation until the end of labor. The women’s obstetrical characteristics and labor induction outcomes are shown in Table 1.

TOCO and EHG signals were simultaneously acquired by tocodynamometer and four monopolar disposable Ag/AgCl electrodes (3M Red Dot 2560), respectively, in the recording sessions, which comprised 30 minutes of basal activity (before drug administration) and 4 hours of recording after drug administration. The abdominal surface was first exfoliated (Nuprep, Weaver and Company, USA) to reduce skin-electrode impedance. The monopolar electrodes (M1 and M2) were placed over the navel at each side of the median axis at a distance of 8 cm from each other, which has been found to be the optimal electrode placement in the literature [26]. A reference electrode was placed on the right hip and a ground electrode on the left hip (Figure 2). Monopolar EHG signals were amplified and filtered between 0.1 and 30 Hz by a commercial biosignal amplifier (Grass 15LT+4 Grass 15A94; Grass Instruments, West Warwick, RI) and digitized at a sampling frequency of 1000 Hz. Since EHG signal energy principally ranges from 0.1 to 4 Hz, the signal was digitally filtered between 0.2 and 4 Hz to eliminate undesired components and then downsampled to 20 Hz to reduce the amount of data and the computational cost, obtaining the preprocessed M1P and M2P signals. One bipolar EHG signal was then obtained (M1P-M2P) to further reduce common-mode interference. The TOCO signal was recorded by a Corometrics 250cx (General Electric Healthcare, US) commercial maternal monitor at a sampling rate of 4 Hz. All EHG bursts associated with uterine contractions were identified by visual inspection of the bipolar EHG signal using the same criteria as in Benalcazar-Parra et al. [15].

2.2. EHG Signal Characterization

Several studies have shown that the temporal and spectral parameters obtained from EHG recordings change between pregnancy and labor onset [11]. It has been reported that temporal parameters such as amplitude, duration, and number of contractions (EHG bursts) change during pregnancy [27, 28]. As for spectral features, parameters such as peak frequency, mean frequency, and deciles, among others, have been extracted from the power spectral density to characterize EHG burst frequency components [27, 29-31]. In this regard, it is worth mentioning that EHG bursts are mainly composed of two distinct frequency components: fast wave low (FWL), a low frequency component associated with EHG propagation, and fast wave high (FWH), a high frequency component related to uterine cell excitability [32]. Both components are mainly distributed between 0.2 and 1 Hz [32], although some authors consider that the content can extend up to 4 Hz [33]. However, some studies focus only on the FWH, restricting the bandwidth to between 0.34 and 1 Hz to minimize breathing and cardiac interference [30]. It has also been shown that EHG burst spectral content shifts to higher frequencies, in the range of 0.34 to 1 Hz, as labor approaches [34]. Furthermore, considering the nonlinear nature of the underlying mechanisms of biological systems, parameters such as sample entropy, spectral entropy, and Lempel-Ziv complexity have also been proposed to characterize EHG signals [33, 35].

Therefore, in the present work, 21 temporal, spectral, and complexity parameters were computed from each EHG burst (see Table 2). Peak-to-peak amplitude was computed from the temporal series associated with uterine contractions. The following parameters were extracted from the power spectral density distribution estimated by the periodogram method: dominant frequency in the range of 0.2-1 Hz (DF), ratio between the energy contents in the high (0.34-1 Hz) and low (0.2-0.34 Hz) frequency bands (H/L ratio), and deciles (D1, D2, …, D9), which correspond to the frequencies below which 10, 20, …, 90%, respectively, of the total energy in the range 0.2-1 Hz is contained [36]. The Teager energy operator was computed to measure the energy of the EHG burst. This measure takes into account not only the amplitude but also the frequency of the signal [37].
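As an illustration of the spectral parameters described above, the following minimal Python sketch computes DF, the H/L ratio, and the deciles from a periodogram restricted to 0.2-1 Hz. It is not the implementation used in the study: the function names, the direct DFT, the mean removal, and the two-tone test signal are assumptions made only for the example.

```python
import cmath
import math


def band_periodogram(x, fs, f_lo=0.2, f_hi=1.0):
    """Periodogram of x (sampled at fs Hz), restricted to [f_lo, f_hi] Hz."""
    n = len(x)
    mean = sum(x) / n
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            # Direct DFT of the mean-removed signal at bin k (adequate for short bursts).
            xk = sum((v - mean) * cmath.exp(-2j * math.pi * k * i / n)
                     for i, v in enumerate(x))
            freqs.append(f)
            power.append(abs(xk) ** 2 / (fs * n))
    return freqs, power


def spectral_features(x, fs, split=0.34):
    """Dominant frequency, high/low energy ratio and deciles D1..D9 in 0.2-1 Hz."""
    freqs, power = band_periodogram(x, fs)
    total = sum(power)
    dominant = freqs[power.index(max(power))]
    low = sum(p for f, p in zip(freqs, power) if f <= split)
    high = sum(p for f, p in zip(freqs, power) if f > split)
    # Cumulative energy; decile d is the first frequency reaching d*10% of the total.
    cumulative, acc = [], 0.0
    for p in power:
        acc += p
        cumulative.append(acc)
    deciles = [next(f for f, c in zip(freqs, cumulative) if c >= d / 10 * total)
               for d in range(1, 10)]
    ratio = high / low if low > 0 else math.inf
    return {"DF": dominant, "HL": ratio, "deciles": deciles}


# Hypothetical 40 s "burst" at 20 Hz: a 0.25 Hz tone plus a stronger 0.5 Hz tone.
fs = 20
burst = [math.sin(2 * math.pi * 0.25 * i / fs) + 2 * math.sin(2 * math.pi * 0.5 * i / fs)
         for i in range(800)]
features = spectral_features(burst, fs)
```

For this synthetic signal the dominant frequency falls on the 0.5 Hz component, and the H/L ratio reflects the fourfold energy difference between the two tones.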

As previously mentioned, due to the nonlinear nature of the underlying physiological mechanism of the biological systems, a set of 8 nonlinear parameters was computed for each EHG burst, where some of them were already used to characterize EHG signals: sample entropy (SampEn) has been used to discriminate between preterm and term labor and to assess the progress of labor [33], and the Lempel-Ziv (LZ) parameter has been used to distinguish between patients who give birth in less/more than 7 days [38]. We also computed some complexity parameters that have been used in other applications. Fuzzy entropy (FuzzEn) has been shown to be efficient at measuring the regularity of time series in surface EMG signals [39]. Spectral entropy (SpEn) has also given good results in monitoring the depth of anesthesia [40] and predicting epileptic seizures [41]. Poincare parameters (SD1, SD2, SDRR, and SD1/SD2) have been widely used for heart rate variability analysis [42] and have been claimed to be valuable for their ability to extract the nonlinear characteristics of time series [43].
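To make the Poincaré descriptors more concrete, the sketch below computes SD1, SD2, and SD1/SD2 from the lag-1 Poincaré plot of a series. It follows the usual heart rate variability formulation; the expression used for SDRR and the function name are assumptions, since the text does not give the exact definitions applied to the EHG bursts.

```python
import math
import statistics


def poincare_descriptors(x):
    """SD1, SD2, SDRR and SD1/SD2 from the lag-1 Poincare plot of series x."""
    pairs = list(zip(x[:-1], x[1:]))
    # SD1: dispersion perpendicular to the identity line (short-term variability).
    sd1 = statistics.pstdev([(b - a) / math.sqrt(2) for a, b in pairs])
    # SD2: dispersion along the identity line (long-term variability).
    sd2 = statistics.pstdev([(b + a) / math.sqrt(2) for a, b in pairs])
    # Assumed definition: overall dispersion combining both axes.
    sdrr = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    ratio = sd1 / sd2 if sd2 > 0 else math.inf
    return {"SD1": sd1, "SD2": sd2, "SDRR": sdrr, "SD1/SD2": ratio}


# A linear ramp has identical successive differences, so SD1 is zero.
ramp = poincare_descriptors([1, 2, 3, 4, 5])
```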

In a previous work, to analyze the evolution of the EHG burst parameters in response to labor induction drugs, we first computed the median values of each parameter associated with the EHG bursts present in nonoverlapping intervals of 30 minutes [15, 16]. Results showed that for successful inductions, statistically significant and sustained increases with respect to the basal period were obtained after 60 minutes and 120 minutes in patients induced with misoprostol and dinoprostone, respectively [15, 16]. For this reason, in the present work, in order to use only the intervals that are significant for both drugs, we analyzed 5 intervals of 30 minutes for each parameter (the basal period before drug administration and the intervals at 120, 150, 180, and 210 minutes), which together make up the EHG feature set.
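The interval-wise summary described above can be sketched as follows. This is only an illustration: the representation of bursts as (time, value) pairs, the convention that each interval is labeled by its start time in minutes relative to drug administration, and the function name are assumptions.

```python
import statistics


def interval_medians(bursts, starts, width=30):
    """Median of a burst parameter in each nonoverlapping interval.

    bursts: list of (time_min, value), time relative to drug administration
            (negative times belong to the basal period).
    starts: interval start times in minutes (assumed labeling convention).
    """
    medians = {}
    for start in starts:
        values = [v for t, v in bursts if start <= t < start + width]
        # An interval without bursts has no median; it is reported as None.
        medians[start] = statistics.median(values) if values else None
    return medians


# Hypothetical peak-to-peak amplitudes (arbitrary units) for a few bursts.
bursts = [(-20, 1.0), (-5, 3.0), (125, 4.0), (130, 6.0), (140, 8.0), (155, 10.0)]
medians = interval_medians(bursts, starts=[-30, 120, 150, 180, 210])
```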

Additionally, we considered the following obstetric parameters that have been used in the literature [5-9]: maternal age, body mass index (BMI), number of gestations, parity, number of abortions, Bishop score before drug administration, and fetal weight.

Then, for the inputs to the different labor induction success predictor systems developed, the parameters were grouped into three sets: Set_1—containing only obstetrical features, Set_2—containing only EHG features, and Set_3—containing both EHG and obstetrical features.

2.3. Data Balancing

The disadvantage of imbalanced datasets is that classification learning algorithms are often biased towards the majority class, so that there is a higher misclassification rate for the minority class instances. The synthetic minority oversampling technique (SMOTE) was used in this study to deal with the unbalanced data problem. SMOTE is an oversampling approach proposed by Chawla et al. [44] and consists of increasing the number of observations of the minority class in the original dataset by creating new synthetic observations. SMOTE is an accepted technique for dealing with class imbalance and has been used in several studies (e.g., [12, 45]).
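The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines. The code below is a simplified illustration rather than the implementation used in the study; the value of k, the seed, and the toy data are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # The new sample lies at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority class with two features; six synthetic observations are generated.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2)
```

Because each synthetic observation lies on a segment joining two real minority samples, it stays within the region already occupied by that class.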

Nine databases (DBSCENARIO_SET: DB1_1, DB1_2, …, DB3_3) were generated (see Table 3) using SMOTE to balance the number of observations of each class in every database.

2.4. Feature Selection

In order to use only relevant data and avoid redundant information, particle swarm optimization (PSO) was used for feature selection. PSO is a population-based stochastic optimization technique that is based on the social behavior of flocking birds or schooling fish developed by Eberhart and Kennedy [46]. PSO is an iterative algorithm that consists of a number of particles (the swarm) moving around in the search space in order to achieve the best solution. A particle representing a candidate solution moves to the optimal position by updating its position and velocity.

PSO was adapted for feature selection as shown in Figure 3. The algorithm starts from a training set to select a subset of relevant features with PSO (the winning particle). A reduced training set and a reduced validation set are obtained by removing the features that are not selected. An artificial neural network for classification is trained with the reduced training set and then applied to the reduced validation set to obtain the final validation classification accuracy. The algorithm is run iteratively, with the number of selected features ranging from 1 to the number of original features (7 for Set_1, 21 for Set_2, and 28 for Set_3). Then, the subset of features with the lowest classification error is chosen. The algorithm was computed for each database to reduce the dimensionality.
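The particle update behind PSO-based feature selection can be illustrated with a small binary PSO. This sketch is not the algorithm of Figure 3: the sigmoid-based binary variant, the inertia and acceleration constants, and the toy fitness function (standing in for the validation accuracy of a trained network) are all assumptions made for the example.

```python
import math
import random


def binary_pso(n_features, fitness, n_particles=10, n_iter=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                # Velocity: inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # A sigmoid maps the velocity to the probability of selecting feature d.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask):
    """Hypothetical stand-in for validation accuracy: features 0 and 2 help, others cost."""
    relevant = {0, 2}
    return sum(1.0 if d in relevant else -0.5 for d, bit in enumerate(mask) if bit)


mask, score = binary_pso(n_features=7, fitness=toy_fitness)
```

In a wrapper setting such as the one described here, `toy_fitness` would be replaced by a routine that trains the classifier on the masked features and returns its validation accuracy, which is what makes the approach computationally expensive.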

2.5. Classifiers

Artificial neural networks (ANN) have been used to classify term and preterm deliveries [12, 17]. In the present study, we used the multilayer perceptron network which is a unidirectional network with one input layer, one output layer, and a certain number of hidden layers. The hyperbolic tangent function was used as the transfer function of each neuron. After selecting the optimal structure, for each scenario and set of features, we obtained a total of nine predictor systems (PS) based on ANN (PSSCENARIO_SET: PS1_1, PS1_2, …, PS3_3). For each PSSCENARIO_SET, the corresponding DBSCENARIO_SET database was used for training and validation (five-fold cross-validation). Figure 4 shows the scheme of each of the predictor systems.

In order to choose the optimal structure for each predictor system, we performed a grid search to select the number of hidden layers and hidden neurons. The rules in the grid search were as follows: maximum 2 hidden layers and maximum 10 hidden neurons in the first hidden layer. In addition, the number of neurons in the second hidden layer must not exceed the number of neurons of the first hidden layer, thus yielding a pyramidal structure with 2 hidden layers, which ensures optimal learning for multilayer networks [47]. In each scenario, we trained 165 ANN (55 structures for each of the 3 feature sets). The best structure was selected from the 55 ANN of each case by measuring the average performance of each ANN on the validation set in a five-fold cross-validation. The implementation of the proposed algorithms to obtain the nine optimal predictor systems is shown in Figure 5.

Considering that vaginal delivery (Scenario 3) is a 2-step process, a fourth classifier was generated by cascading the predictor systems of Scenario 1 and Scenario 2 (PS1_SET-PS2_SET). The first system (PS1_SET) separates patients who achieve APL from those who fail to do so (non-APL) when using a particular set of features. Women classified as non-APL are directly classified as cesarean deliveries, while those who achieve APL are subclassified by a second system trained with the same set of features (PS2_SET). To evaluate this 2-step predictor system, the same validation partitions of the corresponding one-step predictor systems (DB3_SET) were used to compare the results between both approaches; i.e., validation partitions from DB3_1 were used to evaluate PS1_1-PS2_1, from DB3_2 to evaluate PS1_2-PS2_2, and from DB3_3 to evaluate PS1_3-PS2_3.
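The decision logic of the 2-step predictor system can be summarized in a short sketch. The two classifiers are represented here by hypothetical threshold functions; in the study they are the trained neural networks PS1_SET and PS2_SET, and the feature names used below are invented for the example.

```python
def two_step_predict(features, ps1, ps2):
    """Cascade: ps1 predicts APL vs. non-APL, ps2 predicts vaginal vs. cesarean.

    ps1(features) -> True if APL is predicted.
    ps2(features) -> True if vaginal delivery is predicted.
    """
    if not ps1(features):
        # Women classified as non-APL are directly classified as cesarean deliveries.
        return "cesarean"
    return "vaginal" if ps2(features) else "cesarean"


# Hypothetical stand-ins for the trained predictor systems.
def ps1(features):
    return features["apl_score"] >= 0.5


def ps2(features):
    return features["vaginal_score"] >= 0.5


outcome_a = two_step_predict({"apl_score": 0.2, "vaginal_score": 0.9}, ps1, ps2)
outcome_b = two_step_predict({"apl_score": 0.8, "vaginal_score": 0.9}, ps1, ps2)
outcome_c = two_step_predict({"apl_score": 0.8, "vaginal_score": 0.1}, ps1, ps2)
```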

2.6. Performance Measures

We validated the performance of each classifier by five-fold cross-validation. The following measures were calculated to evaluate classification performance:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

where TP represents the true positives, TN represents the true negatives, FP represents the false positives, and FN represents the false negatives. The area under the ROC curves (AUC) was also computed for each PSSCENARIO_SET.
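As a worked example, the accuracy, sensitivity, and specificity reported in the results can be obtained from the four confusion counts with their standard definitions, as in the sketch below. The function name and the counts are illustrative and do not come from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # fraction of positives correctly detected
        "specificity": tn / (tn + fp),   # fraction of negatives correctly detected
    }


# Illustrative counts only: 12 positive and 8 negative cases.
metrics = classification_metrics(tp=8, tn=6, fp=2, fn=4)
```

With these counts, 14 of the 20 cases are classified correctly, 8 of the 12 positives are detected, and 6 of the 8 negatives are correctly rejected.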

3. Results

A total of 115 women with singleton pregnancies took part in the study. Their obstetric characteristics and labor induction outcome are summarized in Table 1. Of these, 98 women reached the active phase of labor and 82 delivered vaginally. The remaining 33 underwent a cesarean section: those who did not reach APL and some who did but required a cesarean due to complications in labor progression.

The mean and 95% confidence interval (CI) of the performance measures of the training and validation subsets when predicting APL (Scenario 1) are shown in Table 4. The predictor system using EHG features (PS1_2) outperformed that of obstetrical features (PS1_1). The highest performance measures were obtained when combining obstetrical and EHG features (PS1_3). The accuracy achieved in PS1_3 was 93.5% (CI 92.6-95.6%) for training subsets and 84.6% (CI 83.4-86.6%) for validation subsets. ROC curves of the three systems in Scenario 1 are depicted in Figure 6(a). The AUC was greater for PS1_3 with an AUC of 0.96, while PS1_2 and PS1_1 yielded an AUC of 0.94 and 0.89, respectively.

The performance of the predictor systems in Scenario 2, which is aimed at distinguishing between APL-vaginal and APL-cesarean, is shown in Table 5. The best performance measures were reached for PS2_3, yielding an accuracy value of 95.2% (CI 94.4-96.1%) in the training subset and 86.5% (CI 85.3-87.8%) in the validation subset. The performance measures of this scenario were slightly better than those in Scenario 1 in Set_2 and Set_3. The ROC curves of the three classifiers in Scenario 2 are depicted in Figure 6(b). The AUC was 0.98 for PS2_3, 0.95 for PS2_2, and 0.84 for PS2_1.

The results of the 1-step predictor systems, which are aimed at distinguishing between vaginal and cesarean deliveries (Scenario 3), are shown in Table 6. Accuracy values are around 80% for the training subset and 70% for the validation subset. The table shows that the best performance measures in the training and validation subsets were obtained for PS3_3 but were quite close to those of PS3_2. PS3_3 gave an accuracy of 70.4% (CI 67.7–70.5%), a sensitivity of 67.4% (CI 65.3–69.3%), and a specificity of 74.2% (CI 71.2–75.7%) in the validation subset. However, these figures are only slightly higher (around 2% in training, around 0.5% in validation) than those obtained using only EHG features (PS3_2). The ROC curves of the three systems are depicted in Figure 6(c). The highest AUC was found for system PS3_3. A slightly lower AUC was found for PS3_2, while the lowest AUC was found for PS3_1.

The results of the vaginal vs. cesarean predictor system with a 2-step approach are shown in Table 7. Performance values were calculated for the same validation partitions of the database used in the 1-step predictor system in Scenario 3. The best performance measures were obtained by the 2-step system that combines obstetrical and EHG features (PS1_3-PS2_3). The accuracy reached for the 2-step prediction system using Set_1 (PS1_1-PS2_1) was 71.9% (CI 70.8-73.0%). A great improvement was noted when cascading PS1_2-PS2_2 for Set_2, with an accuracy of 79.9% (CI 78.8-81.0%), and accuracy was slightly higher for PS1_3-PS2_3 for Set_3, at 81.4% (CI 80.3-82.5%). The latter also achieved a better balance between sensitivity and specificity: 80.3% (CI 78.8–81.8%) and 82.8% (CI 81.2–84.8%), respectively. The best 2-step predictor system (PS1_3-PS2_3) also gave a much better performance than the best 1-step predictor system (PS3_3): average accuracy 81.4% vs. 70.4%, sensitivity 80.3% vs. 67.4%, and specificity 82.8% vs. 74.2%.

4. Discussion

Predicting the success of labor induction has always been a challenge for obstetricians, and a reliable technique would be an invaluable aid that would help to minimize long waits, maternal-fetal exhaustion and suffering, and medical costs. Although several attempts have already been made to predict labor induction success from obstetrical information [6-9], these studies have shown poor predictive performance. In this study, we therefore opted to assess the potential role of EHG for this task.

In the active phase of labor, a necessary step before delivery, the electrical properties of the uterine myocytes undergo changes that generate increased uterine activity. The aim of pharmacologically induced labor is to promote uterine contractions and cervical ripening to achieve vaginal delivery. The reliable prediction of whether or not an induction agent will trigger APL would help clinicians to reduce unnecessary waits and decide whether or not to perform a cesarean section. Benalcazar et al. found a significantly different response between the EHG characteristics of patients that succeeded in achieving APL and those that did not [15, 16]. In the present work, we developed APL predictor systems (Scenario 1) with different sets of features: obstetrical (PS1_1), EHG (PS1_2), and a combination of both (PS1_3). The best performance measures were obtained with PS1_3, which yielded an accuracy of 84.6% in the validation subset and an AUC of 0.96.

Vaginal delivery is not always guaranteed even after reaching APL, e.g., in conditions of labor arrest, pelvic-fetal disproportion or loss of maternal-fetal well-being. Knowing that it will definitely happen would help to reduce unnecessary waits. We designed PS2_1, PS2_2, and PS2_3 to discriminate between APL-vaginal and APL-cesarean (Scenario 2). However, as it is necessary to wait until the APL is reached (rarely in the first 4 hours from the onset of labor induction), its clinical significance is lower. In this scenario, combining obstetrical and EHG features also provided the best performance. However, this combination did not significantly improve the predictive performance with EHG features only (3.2% more accuracy in Scenario 1 and 3.8% in Scenario 2), and the EHG feature sets outperformed the results of the obstetrical features in both scenarios, indicating that EHG features provide more accurate information for classifying labor induction success.

As induction success after drug administration is usually defined as vaginal delivery, we developed vaginal delivery predictor systems (Scenario 3), which are potentially of the greatest clinical interest. Our first approach was a 1-step predictor system (PS3_1, PS3_2, and PS3_3). The average accuracy with obstetrical data only (PS3_1) was 68.9%, slightly lower than that in Sievert et al., in which 73.9% of the subjects were correctly classified in the validation cohort using obstetrical data only: gestational age, Bishop score, suspected growth restriction, chronic hypertension, and body mass index [9]. In that study, the area under the receiver operating characteristic curve was 75%, which is lower than the 81% obtained in the present work. The larger AUC could be due to the different methods used to design the systems: we used neural networks, while Sievert et al. used multivariate logistic regression. Our results were also quite close to those obtained by Pitarello et al. [7], in which transvaginal sonographic cervical measurements were carried out on 190 pregnant women to predict success (defined as vaginal delivery). The AUCs of the ultrasound cervical prediction parameters were 68.9% for cervical length, 71.6% for fetal head stage, and 72.0% for cervical dilatation.

Using alternative or additional EHG features only slightly improved the accuracy of the validation sets (<71%), in contrast to the enhanced EHG prediction achieved in the previous scenarios. This could have been due to the heterogeneous myoelectrical response to induction drugs in the cesarean delivery cohort, composed of subjects that succeeded in achieving regular and intense contractile activity and APL but could not deliver vaginally for other reasons, plus those who did not reach the necessary contractile activity. This situation would have given rise to poor training, poor generalization capacity, and reduced system performance. We thus turned to a second, 2-step approach for predicting successful APL and vaginal delivery. The accuracy improved insignificantly when using only obstetrical data, but remarkably when using the EHG parameters (79.9% average accuracy in validation), suggesting that a two-phase assessment of the uterine muscle response to the induction drug reduces class heterogeneity, makes it easier to extract information from the EHG, and gives more accurate predictions. It can also be seen that adding obstetrical information to EHG features does not significantly improve accuracy, but does help to balance sensitivity and specificity.

To the best of our knowledge, this is the first time that EHG has been used to predict successful labor induction. The results obtained show that EHG can play an important role in labor management decisions and would help clinicians to avoid or reduce unnecessarily long inductions, decrease maternal-fetal risk and suffering, and reduce hospitalization costs.

The study has certain methodological limitations; firstly, it was composed of subjects administered with two different drugs (prostaglandin E1 and prostaglandin E2), which could have given rise to different electrophysiological responses. However, in a clinical context, the ability to predict the success of labor induction with an overall accuracy of 80%, regardless of the drug used, would be a huge advantage. Furthermore, the results of a randomized study would have had greater impact, especially if it compared the effects of various drugs. However, our aim here was to predict pharmacological induction outcomes using EHG and obstetrical information. In this regard, a previous study revealed no statistically significant differences between women who received prostaglandin E1 and prostaglandin E2 in the obstetrical parameters related to labor progress or outcomes, such as the number of women who delivered vaginally before or after 24 h of induction, the number of women who achieved active labor period and time to reach labor, and the number of women who underwent cesarean section, arterial pH, and vein pH [15]. In our case, we observed the results of the pharmacological induction and its predictive capacity. Secondly, the unbalanced database of success and failure records in the different scenarios could have caused a bias in favor of the majority class, as was found in [12]. For this reason, the SMOTE data oversampling technique was used, which adds synthetic data to alleviate the problem of class imbalance. Other techniques such as ADASYN have been explored to deal with the problem of imbalance and have given similar results. The use of classification methods that take into account unbalanced data such as the weighted extreme learning machine [48] or weighted decision trees [49] could also be explored. 
In the same context, we would like to point out that we applied SMOTE before splitting the data into subsets (training/validation), as has been done in several studies [50-52]. It has been shown that when performing cross-validation after simple oversampling, the same samples can be used both to build the prediction model and to evaluate its performance [53]. Although this is not exactly the case when oversampling with the SMOTE technique, the samples in the training subset can be correlated with samples in the validation subset. It is thus advisable to oversample after data splitting. However, our limited database, mainly the small samples of the minority class in Scenarios 1 and 2, would yield non-extrapolatable validation performance results, since the validation subset would contain very few samples of the minority class in each iteration of the k-fold cross-validation (3 samples in Scenarios 1 and 2). On the other hand, applying SMOTE to such a small minority class would yield samples similar to the original ones and would not solve this limitation. We thus opted to perform SMOTE on the entire database, as has been done in numerous other studies [50-52]. We hope to address this limitation in a future work with a larger database. Finally, PSO is a type of wrapper approach for feature selection that uses a learning/classification algorithm to evaluate the quality of a particular feature subset and so is computationally expensive [54]. In a future work, we plan to evaluate other methods with similar performance that are computationally less expensive, such as embedded or hybrid approaches [55].

5. Conclusions

In this work, the use of uterine electromyography for the prediction of the success of labor induction was evaluated for the first time. Predictor systems for three labor induction scenarios were designed using different sets of features: obstetrical, EHG, and both. The EHG features outperformed traditional obstetric features in all the scenarios of labor induction outcome prediction. The combination of the obstetrical and the EHG features resulted in higher performance measures, but these were close to those obtained when using only EHG features. Average accuracies of about 85% were obtained when classifying APL vs. non-APL (Scenario 1) and APL-vaginal vs. APL-cesarean (Scenario 2). Two approaches were assessed and compared for the classification of vaginal vs. cesarean deliveries (Scenario 3). One-step predictor systems resulted in a low predictive capacity. The 2-step predictor system, a cascade of the classifiers of Scenario 1 and Scenario 2, yielded accuracy values of about 80% or higher when EHG features were used. These results indicate that EHG parameters can be used to predict labor induction success in the early stages of labor induction. Therefore, an EHG-based labor induction success predictor system could be implemented to assist obstetricians in the task of labor management, improving maternal-fetal well-being and reducing hospitalization times and costs.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

In accordance with my ethical obligation as a researcher, I declare that this research project received funding from Bial S.A., which could be affected by the results reported in the enclosed paper. I declare that none of the authors have a conflict of interest.

Acknowledgments

This work received financial support from the Spanish Ministry of Economy and Competitiveness, the European Regional Development Fund (DPI2015-68397-R and RTI2018-094449-A-I00), Universitat Politècnica de València VLC/Campus (UPV-FE-2018-B02), Generalitat Valenciana (GV/2018/104), and Bial S.A. The authors are grateful to the Obstetrics Unit of the Hospital Universitario y Politécnico La Fe de Valencia, where recording sessions were carried out.