Abstract

Objective. To compare diagnostic values of four intrapartum cardiotocography (CTG) classifications in predicting neonatal acidemia at birth. Methods. Retrospective case-control study. Forty-three CTG traces with an umbilical artery pH < 7.00 (study group) and 43 traces with an umbilical artery pH ≥ 7.00 (control group) were analyzed. Inclusion criteria were singleton pregnancy, cephalic presentation, admission to labour ward during active phase of first stage of labour, and gestational age 37+0 to 41+6 weeks. Exclusion criteria were suspected intrauterine growth restriction, oligohydramnios, polyhydramnios, pregestational or gestational insulin-dependent diabetes mellitus, and preeclampsia. The last 30-60 minutes of each CTG trace before delivery were classified retrospectively according to four classification systems: International Federation of Gynecology and Obstetrics (FIGO), Royal College of Obstetricians and Gynaecologists (RCOG), National Institute of Child Health and Human Development (NICHD), and the 5-tier system by Parer and Ikeda. Predictive value of each classification for neonatal acidemia was assessed using receiver operating characteristic (ROC) analysis. Results. FIGO, RCOG, and NICHD classifications predicted neonatal acidemia with areas under the ROC curve (AUC) of 0.73, 95% confidence interval (CI) 0.63-0.84; 0.72, 95% CI 0.60-0.83; and 0.69, 95% CI 0.57-0.80, respectively. The 5-tier system by Parer and Ikeda had significantly better predictive value with an AUC of 0.96, 95% CI 0.91-1.00. Conclusions. The 5-tier classification system proposed by Parer and Ikeda for assessing CTG in labour was superior to FIGO, RCOG, and NICHD intrapartum CTG classifications in predicting severe neonatal acidemia at birth.

1. Introduction

Cardiotocography (CTG) has been an established method of fetal monitoring since its introduction in the late 1950s [1]. It currently represents the gold standard for fetal surveillance during labour despite the lack of firm evidence that its use prevents neonatal mortality and long-term morbidity caused by hypoxic-ischemic injury [1, 2]. High inter- and intraobserver variability in interpretation of CTG traces is one of the main limitations of this intrapartum fetal monitoring methodology [3–5]. To address this shortcoming, several CTG classification systems have been developed during the last decades with the intent to make CTG interpretation more objective and consistent [6].

Currently, classifications proposed by the International Federation of Gynecology and Obstetrics (FIGO), Royal College of Obstetricians and Gynaecologists (RCOG), National Institute of Child Health and Human Development (NICHD), and the 5-tier system by Parer and Ikeda are most widely used. Tables 1 and 2 present CTG parameter definitions and classification criteria in these four classification systems. Only a few studies to date, however, have compared predictive values of various classifications. Coletta et al. found the 5-tier system to better predict umbilical artery acidemia compared with the NICHD 3-tier system but did not study predictive values of other classifications [7]. More recently, Di Tommaso et al. found relatively poor predictive values of all CTG classifications studied, including Parer and Ikeda’s 5-tier, the NICHD’s 3-tier, and RCOG’s classifications, for neonatal acidemia [8]. They did not, however, include fetuses/neonates with severe acidosis (pH < 7.00), which is truly associated with neonatal mortality and morbidity such as hypoxic-ischemic encephalopathy [9].

The objective of our study was to compare predictive values of four commonly used intrapartum CTG classifications for predicting severe neonatal acidemia at birth.

2. Methods

This was a single-center, retrospective, case-control study. CTG tracings from patients delivering at our tertiary perinatal center between 2016 and 2020 with neonatal umbilical artery pH < 7.00 at birth were matched to tracings from patients without fetal/neonatal acidemia (umbilical artery pH ≥ 7.00 at birth) by the following parameters: singleton pregnancy, cephalic fetal presentation, term gestation (between 37+0 and 41+6 weeks of pregnancy), and admission to labour ward during active phase of first stage of labour (regular contractions every 5 minutes with evidence of cervical ripening), without suspected intrauterine growth restriction, oligohydramnios ( cm or single deepest vertical (SDP) pocket of amniotic  cm), polyhydramnios ( cm and  cm), gestational or preexisting insulin-dependent diabetes mellitus, or preeclampsia. All births fulfilling inclusion criteria with neonatal umbilical artery pH < 7.00 during the study period were included in the study group. As more births were candidates for controls, we included consecutive births fulfilling matching criteria with neonatal umbilical artery pH ≥ 7.00 as controls until the same number of births was reached in both groups (1 matched control per case).

At our institution, CTG is routinely monitored throughout labour and umbilical blood is collected at birth for acid-base analysis in all deliveries. Blood gases are measured in a calibrated automated analyzer (ABL80 Flex, Radiometer, Denmark) with a pH difference of 0.02 for estimation of arterial and venous umbilical blood samples. The last 30-60 minutes before delivery of the selected CTG tracings were retrospectively reviewed and classified according to four intrapartum CTG classifications (FIGO, RCOG, NICHD, and 5-tier classification by Parer and Ikeda) by three obstetricians (K.R., U.L., and M.L.) certified in CTG interpretation. The time window of 30-60 min before delivery was used because it best reflects neonatal pH and was adjusted according to the specific classification system’s definitions [7]. An internal CTG transducer was used when the quality of external monitoring was inadequate and there were no contraindications. Traces were classified using the definitions and terminology of each classification. Reviewers were blinded to maternal and neonatal clinical characteristics, including umbilical artery pH results at birth. Each reviewer classified CTG tracings independently. In cases of discordant classification, they reviewed, discussed, and classified the tracing together until they reached consensus. When the classification process was completed, electronic medical records were used to obtain maternal and fetal clinical information.

For continuous variables, data were expressed as median with 25th and 75th percentiles. Categorical data were summarized as frequencies and percentages. For comparison between the two study groups (neonatal umbilical artery pH < 7.00 vs. ≥ 7.00), the Mann–Whitney test was used for continuous variables and the Chi-square test or Fisher’s exact test for categorical variables, as appropriate. Receiver operating characteristic (ROC) curves were used to evaluate the capacity of each classification to predict neonatal acidemia. Areas under the ROC curve (AUC) were calculated with 95% confidence intervals (CI). Predictive value of a classification was considered low with an AUC < 0.7, moderate with an AUC 0.7–0.9, and high with an AUC > 0.9. Sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, positive predictive value, and negative predictive value for neonatal acidemia were evaluated for each category within each classification system. The Fleiss kappa coefficient (κ) was calculated to assess the interobserver reliability in classifying CTG parameters for each classification system. Statistical analysis was performed using IBM SPSS Statistics for Windows version 28.0.1.1 (14) (Armonk, NY: IBM Corp.).
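As an illustration of the ROC analysis described above, the following minimal Python sketch computes an AUC for an ordinal classification as the probability that a randomly chosen acidemic trace falls in a higher category than a randomly chosen control, with ties counted as one half. This is not the SPSS procedure used in the study. The function names and the category scores are hypothetical, and the interpretation bands assume the thresholds are read as below 0.7, 0.7 to 0.9, and above 0.9.

```python
from itertools import product


def auc_ordinal(case_scores, control_scores):
    """AUC as P(case category > control category) + 0.5 * P(tie)."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_band(auc):
    """Interpretation bands (assumed reading: <0.7 low, 0.7-0.9 moderate, >0.9 high)."""
    if auc < 0.7:
        return "low"
    if auc <= 0.9:
        return "moderate"
    return "high"


# Hypothetical category scores (1 = normal ... 3 = pathological), not study data.
cases = [3, 3, 2, 3, 1]
controls = [1, 2, 1, 1, 2]

auc = auc_ordinal(cases, controls)
print(round(auc, 2), auc_band(auc))
```

With these made-up scores, 20.5 of the 25 case-control pairs favour the case, giving an AUC of 0.82.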

The National Medical Ethics Committee approved the study (reference number: 0120-65/2017-3; KME 60/03/17).

3. Results

We identified 43 CTG tracings from patients with an umbilical artery pH < 7.00 and 43 controls (pH ≥ 7.00) fulfilling our inclusion criteria. Table 3 presents baseline maternal characteristics and modes of delivery in the two groups. External CTG transducer was used in two cases (5%) in the umbilical artery pH < 7.00 at birth group and none in the pH ≥ 7.00 group. In 18 cases, the indication for operative delivery (cesarean section or operative vaginal delivery) was suspected fetal acidemia based on CTG pattern. Neonatal umbilical artery pH was <7.00 in 11 (61%) and ≥ 7.00 in 7 (39%) of these cases.

Table 4 presents neonatal characteristics and outcomes. Besides lower umbilical artery pH, the Apgar scores at 1 and 5 minutes were significantly lower in neonates with umbilical artery pH < 7.00.

In 66 (77%) CTG traces, all three reviewers were unanimous in their classification without discussing the trace. In 20 (23%) cases, consensus was reached after discussion. Overall, the interobserver agreement for classifying a CTG tracing was substantial for the FIGO (), RCOG (), and Parer and Ikeda () classifications and almost perfect () for the NICHD classification. Type of decelerations, by which the trace could be classified in different categories, was the object of discussion in all cases. Proportions of CTG tracings in each classification category in the acidemia vs. no-acidemia groups are presented in Table 5.
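For readers who wish to reproduce this kind of agreement statistic, the sketch below shows one way to compute Fleiss' kappa from a table in which each row is a tracing and each column counts how many reviewers chose that category. It is a minimal illustration, not the SPSS routine used here. The ratings are hypothetical, and the verbal labels assume the commonly used Landis and Koch bands (0.61-0.80 substantial, 0.81-1.00 almost perfect).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = raters assigning subject i to category j.

    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement per subject, averaged over subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)

    return (p_bar - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Verbal labels, assuming the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings: 4 tracings, 3 reviewers, 3 categories (not study data).
ratings = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [2, 1, 0]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 3), agreement_label(kappa))
```

In this made-up table the reviewers disagree on one of four tracings, which gives a kappa of 35/47, or about 0.745.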

Analysis of ROC curves showed low to moderate discriminative capacity in prediction of fetal/neonatal acidemia for FIGO (AUC 0.73; 95% CI 0.63-0.84), RCOG (AUC 0.72; 95% CI 0.60-0.80), and NICHD (AUC 0.69; 95% CI 0.57-0.80) classifications. Parer and Ikeda classification had an excellent predictive value with an AUC of 0.96 (95% CI 0.91-1.00) (Figure 1).

Table 6 presents diagnostic values (sensitivity, specificity, positive/negative likelihood ratios, and positive/negative predictive values) for each category within the four classifications. Among all categories in the four classifications analyzed, the green and blue categories in the Parer and Ikeda classification had the highest sensitivities (95.4% and 100%, respectively). Yellow, orange, and red Parer and Ikeda categories also had the highest specificities among all specific categories (97.6%, 97.7%, and 100%, respectively). Similarly, positive and negative likelihood ratios in all Parer and Ikeda categories as well as positive and negative predictive values were higher compared to other classifications.
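The diagnostic values listed in Table 6 all derive from a 2×2 table of classification result against acidemia status. The sketch below shows the standard formulas under that assumption. The function name and the counts are hypothetical and do not reproduce any row of Table 6.

```python
def diagnostic_values(tp, fp, fn, tn):
    """Standard diagnostic measures from a 2x2 table.

    tp: acidemic, test positive    fp: non-acidemic, test positive
    fn: acidemic, test negative    tn: non-acidemic, test negative
    """
    def ratio(num, den):
        # An undefined ratio (zero denominator) is reported as infinity.
        return num / den if den else float("inf")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": ratio(sensitivity, 1 - specificity),
        "lr_neg": ratio(1 - sensitivity, specificity),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts for one category cut-off with 43 cases and 43 controls.
values = diagnostic_values(tp=30, fp=5, fn=13, tn=38)
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```

When specificity is 100%, as reported for the red category, the positive likelihood ratio is undefined; this sketch returns infinity in that case.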

4. Discussion

Among the four currently most widely used CTG classifications, the 5-tier classification by Parer and Ikeda seems to have the best predictive value for identifying fetuses with severe acidemia during labour. While we found only low to moderate discriminative capacity in prediction of severe neonatal acidemia at birth for FIGO, RCOG, and NICHD classifications, the 5-tier classification predicted neonatal umbilical artery pH < 7.00 very accurately with an area under the ROC curve of 0.96.

Our results are in accordance with several studies published to date. When comparing the 5-tier classification proposed by Parer and Ikeda with the 3-tier NICHD classification, Coletta et al. found that the 5-tier system performed better in identifying fetuses at risk for acidemia [7]. These results were recently confirmed by Kikuchi et al. [10]. Gyamfi Bannerman et al. also found the 5-tier classification to be superior to the 3-tier NICHD classification, with a 79% sensitivity and 100% specificity for an umbilical artery pH of <7.00 [11]. In their study, 8.3% of tracings with severe acidemia were classified as “green” [11]. This is similar to the 5% of tracings classified as “green” and also to the 5% of tracings classified as normal according to other classifications in fetuses with umbilical artery pH < 7.00 in our study. The usefulness of the 5-tier classification is further corroborated by the Japanese experience with extensive countrywide adoption of this method for classifying and managing CTG tracings in labour [12]. Katsuragi et al. found a sevenfold decrease in the incidence of metabolic acidemia without a concurrent increase in operative delivery rates following introduction of the 5-tier classification [13, 14]. In addition, Elliot et al. demonstrated that the degree and duration of CTG abnormality, defined by the 5-tier system and analyzed using specialized software, correlated with fetal metabolic acidemia and neurologic injury [15].

While there are plenty of data demonstrating superiority of the 5-tier classification over the 3-tier classification proposed by the NICHD, only a few studies have compared the 5-tier classification with other CTG classification systems used mostly outside the United States today. In 2013, Di Tommaso et al. compared predictive values of five CTG classifications for diagnosis of fetal/neonatal acidemia (defined in their study as umbilical artery ) [8]. Besides the 5-tier and the 3-tier classification, they also analyzed CTG classifications proposed by RCOG, Dublin Fetal Heart Rate Monitoring Trial (DFHRMT), and the Society of Obstetricians and Gynaecologists of Canada (SOGC) [8]. Similarly to our study, they found Parer and Ikeda’s 5-tier classification to have the best “trade-off” between sensitivity and specificity. However, in contrast to our results, they found low or moderate predictive values for low neonatal umbilical artery pH in all classifications. Unlike our study, they evaluated only a sample of fetuses with milder acidemia ( and < 7.15). This could explain the much lower predictive value of the 5-tier classification observed in their study compared to our results. A case-control study published in 2016 compared the FIGO CTG classification system with the 5-tier system by Parer and Ikeda [16]. Gamboa et al. found both classifications to have comparable diagnostic accuracy for mild and severe acidemia. This is in contrast with our results, which indicate a significantly better predictive value of the 5-tier system compared to the FIGO classification for diagnosing severe neonatal acidemia.

Only a few studies to date have compared predictive values of other frequently used CTG classifications. Santo et al. evaluated the accuracy of CTG interpretation in prediction of newborn acidemia using the FIGO, American College of Obstetricians and Gynecologists (ACOG), and National Institute for Health and Care Excellence (NICE) guidelines interpreted by 27 observers. FIGO and NICE guidelines achieved higher sensitivities compared to ACOG (89 and 97% vs. 32%), whereas ACOG guidelines showed a significantly higher specificity (95%) than FIGO (63%) or NICE (66%) guidelines. According to the authors, the higher specificity of the ACOG classification is due to ACOG having more restrictive criteria for classifying tracings as category III (the pathological category) compared to the other two CTG guidelines. Some cases of acidemia are therefore classified as category II; consequently, the sensitivity of the guideline is lower and its specificity higher than in the other classifications [17]. Even more recently, Zamora del Pozo et al. compared the predictive value of FIGO, ACOG, NICE, and Chandraharan guidelines for neonatal acidemia with three independent observers. Chandraharan guidelines had the highest discrimination capacity for neonatal acidemia (AUC 0.66; 95% CI, 0.55-0.77), but it did not differ significantly from the other guidelines. As in the former study, ACOG reached the highest specificity (95.73%) among the classifications, while Chandraharan guidelines reached the highest sensitivity (78.79%) [18].

Strengths of our study include the blinded review of CTG tracings, universal cord gas collection, and an index group defined by umbilical artery pH < 7.00 where the median base excess was -18.1. This is a clinically meaningful index group with umbilical artery gas values well below the 2nd percentile that are associated with significantly elevated risks of newborn encephalopathy, which allows identification of features associated with true pathology. Our study also has several limitations. We only included 43 CTG traces with neonatal umbilical artery pH < 7.00. This reflects the low incidence of severe fetal acidemia among term uncomplicated pregnancies. Our results could, therefore, be due to the small number of cases included and should be confirmed or refuted by further studies. We chose to include three different CTG reviewers given the well-described interobserver variability in CTG interpretation. In this way, we managed to avoid studying predictive values of CTG classifications based solely on one clinician’s ability to interpret and classify CTG tracings. However, in everyday clinical practice, we do not always have the luxury of three different reviewers discussing ambiguous cases and reaching consensus. Clinical applicability of our results should, therefore, be tested in prospective studies. This is especially true since the higher predictive value of the 5-tier classification could be due to the fact that traces can be classified in more than just three categories. This could make interpretation and classification more cumbersome and could even increase inter- and intraobserver variability. Our results support this, since further discussion among CTG reviewers was necessary in 23% of cases to reach consensus on type of decelerations. Interobserver agreement for classifying CTG traces was the lowest for the 5-tier Parer and Ikeda classification. Due to its complexity, the 5-tier system could, therefore, require specific staff training and adjustment. Further research is also needed to assess whether subcategorizing the second category of the 3-tier system might produce similar results.

Many decades have been needed for experts and different professional societies to agree on basic definitions of normal and abnormal CTG parameters. The current challenge is to translate this consensus into uniform classification of CTG tracings, which could eventually lead to more standardized intrapartum CTG management. Different classification systems are now being used in different parts of the world. Our study was designed to compare the performance of four methods of grading intrapartum CTG tracings using visual inspection and existing rule-based methods that are in clinical use. This research question is important because CTG classification systems are often adopted or promoted without data describing their predictive values, let alone justifying one over another. Such comparisons are essential for clinicians, given the potential for devastating sequelae of delayed intervention and subsequent fetal brain injury when a classification gives false reassurance, and conversely the potential for unnecessary interventions when a classification method is overly sensitive. Our study indicates that among the most widely used CTG classifications today, the 5-tier classification proposed by Parer and Ikeda has the highest discriminative capacity in prediction of neonatal acidemia.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.