Abstract

This study aimed, first, to compare the significant predictors of mortality identified by artificial neural network (ANN) and logistic regression (LR) models for hepatocellular carcinoma (HCC) patients undergoing resection and, second, to evaluate the predictive accuracy of ANN and LR in different survival year estimation models. We constructed a prognostic model for 434 patients with 21 potential input variables selected by a Cox regression model. Model performance was measured by the number of significant predictors and by predictive accuracy. The results indicated that ANN had two to three times as many significant predictors as LR in the 1-, 3-, and 5-year survival models. Accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) of the 1-, 3-, and 5-year survival estimation models using ANN were superior to those of LR in all the training sets and most of the validation sets. The study demonstrated that ANN not only identified a greater number of predictors of mortality but also provided more accurate prediction than conventional methods. It is suggested that physicians consider using data mining methods as supplemental tools for clinical decision-making and prognostic evaluation.

1. Introduction

Hepatocellular carcinoma (HCC) is the fifth most common cancer and the third leading cause of cancer death worldwide. According to World Health Organization (WHO) statistics from 2000, at least 564,000 new cases of HCC are estimated to occur per year around the world [1]. Although Asia and Africa have accounted for 80% of incident cases of HCC for years, the incidence rates have been found to be significantly increasing in the United States [2] and some European nations [3].

Hepatic resection is one of the most effective treatments and the standard modality to achieve a long-term survival for HCC [4, 5]. However, even with progress in diagnosis and treatment, the overall mortality in HCC patients is still higher than in other types of cancer patients. The factors associated with mortality have been explored by traditional statistical methods, such as logistic regression (LR) and Cox regression [6]. Logistic analysis models hypothesize that as mean values of a given predictor variable increase, the predicted risk of the outcome increases. Despite its recognized limitations [7], LR is still widely used in clinical outcome studies.

Recently, artificial neural networks (ANNs) have proven effective for nonlinear mapping based on human knowledge [8]. Like a network of brain neurons, an ANN containing multiple layers of simple computing nodes can accurately approximate continuous nonlinear functions and can reveal previously unknown relationships between given input and output variables [8–10]. The unique structure of ANNs is well suited for machine learning methods such as backpropagation [11] and evolutionary algorithms [8, 12, 13]. Because of their universal approximation capability, potential applications of ANNs have attracted interest in some fields [14–18]. The novel application of ANN in this study was in predicting postresection prognosis in HCC patients in order to enhance their clinical management by quantifying expected risks.

To our knowledge, no study has applied ANN in predicting the prognosis of HCC patients after resection. Additionally, despite the numerous comparisons of ANN and LR in the literature, no study has convincingly demonstrated which is superior in terms of predictive accuracy [19]. The objectives of the study are accordingly, firstly, to construct an ANN model and predict the input variables associated with the mortality of HCC patients undergoing resection and examine the differences in significant predictors between the ANN and LR models, and secondly, to compare the predictive accuracy of ANN and LR in different survival year estimation models.

2. Patients and Methods

The inclusion criteria and characteristics of the study population are the same as those described in the previous report [6]. Briefly, the study population consisted of 608 consecutive patients with HCC who underwent liver resection at Kaohsiung Medical University Hospital and Yuan’s Hospital in Taiwan. In this study, we first excluded patients who met any of the following conditions: (i) previous liver resection ( ); (ii) treatment with radiofrequency ablation ( ) or microwave ablation ( ); (iii) histopathological reports indicating benign tumor and/or nonprimary liver cancer ( ); (iv) missing and/or incomplete case history ( ); (v) death within thirty days after surgery ( ); and (vi) residual tumor after resection ( ). Further, to enhance data completeness, we excluded patients with missing values in key explanatory variables ( ) and patients with less than one year of follow-up ( ). Finally, 434, 341, and 264 patients were included in the 1-, 3-, and 5-year survival groups, respectively.

Two sources of data were used in our study: patient clinical information and death registry data. Patients’ clinical information was derived from medical charts reviewed by attending physicians at both hospitals using a structured questionnaire. The information included patients’ demographics and hepatic biochemical parameters. The mortality data bank is established and maintained by the Statistics Office, Department of Health, Taiwan. The two datasets were merged by a unique identifier. All patients were followed until death or December 31, 2008, whichever came first.

2.1. Development of the Artificial Neural Network Models

Waikato Environment for Knowledge Analysis (WEKA) software V3.6.0 (with backpropagation algorithm) was used to construct the ANN model. This user-friendly software is compatible with Microsoft Windows and has been validated for use in developing new machine learning schemes [20].

The outcome variable in this study was death during the study period (event) versus survival (no event), coded as 1 and 0, respectively. To minimize the effects of extreme values and to enhance the computing efficiency of the ANN model, all continuous explanatory variables were first transformed into categorical variables. The cut-off points for these variables were based on those used in previous clinical studies [6, 21–25]. Low and high risk were coded as 0 and 1, respectively. The variables included BUN, AST, α-fetoprotein, ALT, total bilirubin, and others. TNM stage, a common prognostic index of cancer risk or severity, and ASA, a risk score for surgical procedures, were also recoded. The TNM stage ranges from 1 to 6, and the ASA score ranges from 1 to 4. These two variables were recoded as 0 for low risk, 1 for medium risk, and 2 for high risk (Table 1). High risk was assumed to increase the probability of death (event).
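The recoding described above can be illustrated with a minimal Python sketch. The function names, the cut-off values, and the level boundaries below are hypothetical placeholders chosen only for illustration; the actual cut-offs are those of the cited clinical studies and Table 1 and are not reproduced here.

```python
def dichotomize(value, cutoff, high_is_risk=True):
    """Recode a continuous value as 0 (low risk) or 1 (high risk).

    `cutoff` is a placeholder; the study took its cut-offs from
    previous clinical studies. Set high_is_risk=False for variables
    where LOW values indicate risk (e.g., albumin).
    """
    above = value > cutoff
    return int(above if high_is_risk else not above)


def recode_three_level(score, low_max, medium_max):
    """Recode an ordinal score (e.g., TNM stage or ASA class) as
    0 (low), 1 (medium), or 2 (high risk).

    `low_max` and `medium_max` are hypothetical level boundaries.
    """
    if score <= low_max:
        return 0
    if score <= medium_max:
        return 1
    return 2


# Hypothetical patient record with illustrative cut-offs only.
patient = {"ast": 85.0, "albumin": 3.1, "asa": 3}
encoded = {
    "ast": dichotomize(patient["ast"], cutoff=40.0),
    "albumin": dichotomize(patient["albumin"], cutoff=3.5, high_is_risk=False),
    "asa": recode_three_level(patient["asa"], low_max=1, medium_max=2),
}
print(encoded)
```

Passing the direction of risk explicitly keeps the convention that 1 always denotes the higher-risk category, regardless of whether high or low laboratory values are unfavorable.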

Model development in this study was performed in two stages. Firstly, to enhance the calculation efficiency and prediction performance of the ANN model construct, a univariate Cox proportional hazard model was used to test variables for potential associations with survival or death. Variables with statistically significant (log-rank test) associations with survival were retained to construct the ANN model (Table 1). Of the 33 input variables, the following 21 statistically significant variables were retained for constructing ANN models: age, comorbidity, liver cirrhosis, α-Fetoprotein, AST, total bilirubin, albumin, BUN, platelet, ASA classification, Child-Pugh classification, TNM stage, tumor number, tumor size, portal vein invasion, biliary invasion, surgical procedure, postoperative complication, recurrence, and postoperative treatment. Additionally, gender was included as a control variable.

Secondly, Figure 1 shows the numbers of neurons in the input, hidden, and output layers of the ANN models of 1-, 3-, and 5-year survival. In all three models, the input layers contained 21 neurons. In the hidden layers, the numbers of neurons were optimized using training and validation data in a trial-and-error process to maximize predictive accuracy [26], which resulted in 13, 28, and 17 neurons in the 1-, 3-, and 5-year models, respectively. The output layer in all models contained only one neuron, which represented survival status.
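To make the layer structure concrete, the following Python sketch builds a network with 21 input neurons, 13 hidden neurons (the 1-year configuration), and one output neuron, and runs a single forward pass. It is an illustration only: the sigmoid activation, the weight range, and all function names are assumptions, and the weights are random and untrained, so the output is not a meaningful prediction.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    # One row per neuron: n_in weights followed by one bias term.
    # The uniform(-0.5, 0.5) range is an arbitrary choice for this sketch.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(layer, inputs):
    # Weighted sum plus bias for each neuron, passed through a sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in layer]


def build_network(n_inputs=21, n_hidden=13, seed=0):
    rng = random.Random(seed)
    hidden_layer = init_layer(n_inputs, n_hidden, rng)
    output_layer = init_layer(n_hidden, 1, rng)
    return hidden_layer, output_layer


def predict(network, inputs):
    hidden_layer, output_layer = network
    hidden = layer_forward(hidden_layer, inputs)
    # Single output neuron: a value in (0, 1) representing survival status.
    return layer_forward(output_layer, hidden)[0]


network = build_network()            # 21-13-1, as in the 1-year model
example_inputs = [0] * 20 + [1]      # hypothetical recoded input vector
print(predict(network, example_inputs))
```

Calling build_network(n_hidden=28) or build_network(n_hidden=17) gives the layer sizes reported for the 3- and 5-year models.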

Studies suggest that an ROC plot should present the trade-off between sensitivity and specificity for all possible cut-offs [27]. The SPSS Windows version 6.1 software used for model building in this study automatically generated 110 possible cut-offs for each of the 1-, 3-, and 5-year models. For each of the three models, the authors then selected the best cut-off in terms of accuracy, sensitivity, and specificity.
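The trade-off between sensitivity and specificity across cut-offs can be sketched as follows. In this study the best cut-off was chosen by the authors by inspection; the sketch instead selects the cut-off that maximizes Youden's J (sensitivity + specificity - 1), which is a substituted, commonly used rule and not the study's procedure. The function names and toy data are hypothetical, label 1 is assumed to be the positive class, and both classes are assumed to be present.

```python
def roc_points(y_true, scores):
    """Return (cutoff, sensitivity, specificity) for each candidate cut-off.

    A case is predicted positive when its score is >= the cut-off.
    Assumes y_true contains at least one 1 and at least one 0.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= cutoff)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def best_cutoff_youden(y_true, scores):
    # Youden's J = sensitivity + specificity - 1 (assumed selection rule).
    return max(roc_points(y_true, scores), key=lambda p: p[1] + p[2] - 1)


# Toy data for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for cutoff, sens, spec in roc_points(y_true, scores):
    print(cutoff, sens, spec)
print(best_cutoff_youden(y_true, scores))
```

An explicit selection rule of this kind is one way to reduce the subjectivity that comes with choosing a cut-off by hand.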

2.2. Training Groups and Validation Groups

The 1-, 3-, and 5-year survival data were randomly divided into training sets and validation sets. The training data set was used to develop the model, whereas the validation data set was used to assess its predictive accuracy [28]. In accordance with the literature, 80% of the data were used for training, and the remaining 20% were used for validation [29, 30]. In the 1-year survival group, for example, data for 347 and 87 patients were used for training and for validation, respectively. Validation is needed to avoid overtraining an ANN to recognize specific subjects in the training data rather than learning general predictive values. Additionally, chi-square and Fisher’s exact tests were performed to compare the distribution of each input variable between the training and validation sets. Table 2 shows that none of the input variables in any of the three survival models differed significantly between training and validation, which confirmed the reliability of the data selection.
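A random 80/20 split of this kind can be sketched in Python as follows. The function name, the fixed seed, and the use of integer patient indices are assumptions made for illustration; the study does not describe its randomization procedure in detail.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Randomly split records into training and validation sets."""
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 434 patients in the 1-year survival group, represented by indices.
train, valid = split_train_validation(range(434))
print(len(train), len(valid))  # 347 87
```

With 434 records and an 80% training fraction, the split yields the 347 training and 87 validation patients reported for the 1-year group.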

In accordance with the criteria used for performance comparisons reported in the literature, the ANN and LR models were compared in terms of overall accuracy (sum of correct predictions divided by total predictions), sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) [9, 14]. Higher scores were considered better for validation. In the WEKA program, ANN model parameters for learning rate, momentum, and training time were set to 0.3, 0.2, and 500, respectively.
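For reference, the three threshold-based criteria can be computed from the confusion matrix as in the following Python sketch. Treating label 1 as the positive class is an assumption made here for illustration; the function name and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity from binary labels.

    Label 1 is treated as the positive class (an assumption of this sketch).
    Assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),   # correct / total predictions
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
    }


# Toy labels for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0}
```

Reporting all three values together matters because a model can reach high accuracy on an imbalanced outcome while performing poorly on the minority class.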

3. Results

In this section, the significant predictors were selected according to predictive error ratio (greater than one) for the 1-, 3-, and 5-year survival models using ANN and LR, in the order of demographic, clinical, surgical outcome, and prognosis features. Overall, ANN models had more significant input variables in the 1-, 3-, and 5-year survival models than LR models did. More specifically, ANN had 15, 13, and 9 significant predictors in the 1-, 3-, and 5-year survival models, whereas LR had only 8, 4, and 4 variables, respectively.

Notably, six variables in the clinical features dimension were significant predictors in all three survival models constructed by ANN: comorbidity, liver cirrhosis, α-fetoprotein, platelet, ASA classification, and TNM stage. Among these variables, liver cirrhosis, α-fetoprotein, and TNM stage were significant predictors for LR only in the 1-year survival model but were consistently significant for ANN in the 1-, 3-, and 5-year models.

Table 4 shows the accuracy, sensitivity, and specificity of the 1-, 3-, and 5-year survival estimation models using ANN and LR of the training groups. All three performance criteria were superior in the models using ANN to those using LR in any survival estimation models. For the 1-year survival ANN model, the accuracy was 99.1% in contrast with the 1-year survival model using LR, whose accuracy was 89.0%. Sensitivity for ANN was 100% at the 5-year survival model compared to 67.5% for LR. Specificity for ANN was 96.2% at the 1-year model whereas it was 34.6% for LR.

Table 5 shows the accuracy, sensitivity, and specificity of the 1-, 3-, and 5-year survival estimation models using ANN and LR for the validation groups. Although the results were mixed, most performance criteria were superior in the models using ANN to those using LR. In the 5-year survival model, for example, the accuracy was 79.2% for ANN and 70.6% for LR. In the 1-year survival model, LR had a relatively high sensitivity (94.9%) but poor specificity (25.0%). In contrast, ANN had relatively high values for both sensitivity (88.6%) and specificity (50.0%).

AUROCs for training data and validation data (Figures 2 and 3, resp.) were significantly higher in ANN models than in LR models. For training data, 1-, 3-, and 5-year survival AUROCs were 0.980, 0.989, and 0.993 in ANN models and 0.845, 0.844, and 0.847 in LR models, respectively. For validation data, the 1-, 3-, and 5-year survival AUROCs were 0.875, 0.798, and 0.810 in ANN models and 0.799, 0.783, and 0.743 in LR models, respectively.
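The AUROC can be interpreted as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The Python sketch below computes it with this pairwise (Mann-Whitney) formulation; it is an illustration on toy data, not the software routine used in the study, and the function name and the choice of label 1 as the positive class are assumptions.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Each (positive, negative) pair counts 1 if the positive case has the
    higher score, 0.5 for a tie, and 0 otherwise. Assumes both classes
    occur in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(y_true, scores))  # 0.75
```

Unlike accuracy, sensitivity, and specificity, this measure does not depend on the choice of a single cut-off.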

4. Discussion

We have created models for prediction of outcome of HCC patients undergoing resection using ANN with input variables which were found to be significantly associated at univariate analysis. Clinical factors such as comorbidity, liver cirrhosis, α-Fetoprotein, platelet, ASA classification, and TNM stage were significant for 1-, 3-, and 5-year survival in ANN models as shown in Table 3. Among those, only liver cirrhosis, α-fetoprotein, and TNM stage were also found significant for LR at the 1-year prediction model. The consistently significant variables in mortality are suggested to be reviewed by clinicians to examine both short- and long-term clinical outcomes for HCC patients.

The appropriate selection of input variables is vital to the success of ANN construction. Variable selection keeps the ANN model at an appropriate complexity (by using the most predictive variables) with low redundancy. We first employed traditional statistics to select statistically significant variables as input variables so that the comparative analysis would be made on an equal basis. The crude hazard ratio has been widely used by biostatisticians and clinicians to explore the difference between crude and adjusted hazard ratios.

Our study found that ANN had double to triple numbers of significant predictors at 1-, 3-, and 5-year survival models as compared with LR models. A previous study also found such a gap between models derived from ANN and traditional statistical methods [17]. The reason for the difference might be owing to the fact that models derived from logistic regression usually employ variables that are statistically significant predictors of the outcome, and ANN utilizes all possible interactions between all input variables and the outcome, regardless of their statistical significance. ANN can be developed using a number of different training algorithms, many of which are continually being developed and may offer improved prediction accuracy. On the other hand, ANN cannot provide detailed information such as the hazard ratio, which generally provides direction and magnitude of individual variables on outcome variables.

As compared with the 1-year mortality model, numbers of predictors at both ANN and LR models decreased at 3- and 5-year survival models, though the ANN model appeared to have lower decreased rates. This suggested that relationship between input variables and survival status may be correlated rather than simply for the prediction of short-term outcome, and that 3- and 5-year survival status may be confounded by factors that are more complex. The change in health status over time should be examined to have better knowledge on long-term survival estimation.

In all training sets and in most validation sets, accuracy, sensitivity, specificity, and AUROC were higher in the 1-, 3-, and 5-year survival models constructed by ANN than in those constructed by LR, which is consistent with other reports that ANN outperforms LR in both training [15, 31–35] and validation [14, 36, 37].

Although the ANN models in the current study generally had higher sensitivity and specificity than the LR models when using both training data and validation data, a notable exception was the 1-year LR model when using validation data (Table 5). Compared to the 1-year ANN model, the 1-year LR model had higher sensitivity (94.9%) and higher accuracy (88.5%) but lower specificity (25.0%) when using validation data. The literature [38] suggests that specificity and sensitivity values lower than 40% should be considered poor. Sensitivity and specificity are important when testing the capability of a model to recognize positive and negative outcomes. Sensitivity and specificity must also be measured to determine the proportion of false negatives or false positives produced by a model [39]. Comparing false positive and false negative rates explains the tendency of a model to misclassify positive patients as negative patients and vice versa [40]. Ideally, both sensitivity and specificity should be high [40]. According to comparisons of ANN and LR models reported in the literature as well as the experimental results in this study, ANN models have fewer prediction errors.

Although the proposed ANN-based models generally outperformed LR models in this study, the findings should be interpreted cautiously. First, the WEKA program cannot be used if the ANN is constructed with numerous input variables, which can cause “insufficient computer memory” error messages. However, the number of input variables used in the present study was 21, which was suitable for the program used. Second, an ROC plot should be constructed for all possible cut-offs for a clear representation of the trade-off between specificity and sensitivity. Since the cut-offs used for each of the 1-, 3-, and 5-year survival models in this study were selected by the authors from possible cut-offs generated by a statistical software package, bias could not be ruled out. Third, although previous works adopted a 20% validation group [29, 30], this study adopted 25% and 30% validation groups to detect the sample difference. Therefore, the potential threat from the sample should be noted. Fourth, since the HCC patient sample in the current study was derived from only two hospitals, the ability to generalize the findings is limited. For a stronger methodological conclusion, future studies should test external validity, for example by analyzing hepatic resection outcomes in HCC patients treated in different medical institutions.

5. Conclusions

In conclusion, survival estimation models at 1-, 3-, and 5-year intervals for HCC patients undergoing hepatic resection could be constructed by ANN, a data mining method as compared with conventional logistic regression. Arguably more significant predictors of mortality were identified by ANN at 1-, 3-, and 5-year models as compared with LR. The values in accuracy, sensitivity, specificity, and AUROC of ANN models were generally higher than those of LR models.

The study supported previous findings that ANN had better predictive performance than LR. The study suggested that ANN could become one tool for predicting short- and long-term clinical outcomes. It is suggested that physicians consider using data mining methods as a supplemental tool for clinical decision-making and prognostic evaluation.

Acknowledgment

This work was in part supported by the National Science Council, Taiwan, under Grant nos. NSC 99-2320-B-037-026-MY2, NSC 95-2314-B-037-079-MY3 and NSC 101-2320-B-037-022.