Gastroenterology Research and Practice

Somaya Hashem, Gamal Esmat, Wafaa Elakel, Shahira Habashy, Safaa Abdel Raouf, Samar Darweesh, Mohamad Soliman, Mohamed Elhefnawi, Mohamed El-Adawy, Mahmoud ElHefnawi, "Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients", Gastroenterology Research and Practice, vol. 2016, Article ID 2636390, 7 pages, 2016.

Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

Academic Editor: Antoni Castells
Received 06 Jul 2015
Revised 06 Oct 2015
Accepted 07 Oct 2015
Published 06 Jan 2016


Background/Aim. Given the worldwide prevalence of chronic hepatitis C, noninvasive methods are increasingly used as an alternative to biopsy for staging chronic liver diseases, because they avoid the drawbacks of biopsy. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as having mild to moderate (F0–F2) or advanced (F3-F4) fibrosis. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are the same as the FIB-4 features except that alpha-fetoprotein replaces alanine aminotransferase. Sensitivity and receiver operating characteristic (ROC) curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved an 86.2% negative predictive value and an area under the ROC curve of 0.78 with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C could be predicted with high accuracy using the decision tree learning algorithm, which could reduce the need for liver biopsy.

1. Introduction

Chronic hepatitis C (CHC) is recognized as a major healthcare problem worldwide and as a common infection in Egypt, especially genotype 4 [1, 2]. The assessment of liver fibrosis in CHC is essential to monitor the prognosis of the disease, to establish the optimal timing for therapy and management strategies, and to predict the response to treatment [3]. Liver biopsy is considered mandatory for the management of patients infected with the hepatitis C virus (HCV), particularly for staging the degree of liver fibrosis, and some consider it the gold standard [4]. However, liver biopsy has limitations, including its invasive nature, its cost, its susceptibility to sampling error, and the variability of histological assessment [5–7]. Therefore, in recent years noninvasive methods have been used increasingly as an alternative for staging chronic liver diseases, to avoid the drawbacks of biopsy.

Many noninvasive methods have been proposed to predict fibrosis and cirrhosis in patients with hepatitis C. Noninvasive methods should be safe, easy to perform, inexpensive, and reproducible and should give numerical and accurate results in real time [8]. There are two kinds of noninvasive methods: those based on indexes derived from serum markers [9–11], such as the FIB-4 score and the aspartate aminotransferase- (AST-) to-platelet ratio index (APRI) [12, 13], and those based on imaging techniques, such as Transient Elastography (TE), which uses ultrasound and vibratory waves to estimate the extent of liver fibrosis [14–16]. According to Parkes et al. [17], serum markers of liver fibrosis offer an attractive alternative to liver biopsy: they are less invasive than biopsy, carry no risk of complications, eliminate sampling and observer variability, and can be performed repeatedly.

In recent years, machine-learning techniques such as classification trees and artificial neural networks (ANN) have been used as prediction, classification, and diagnosis tools [18–20]. The alternating decision tree (ADT) combines the simplicity of a single decision tree with the effectiveness of boosting [21]. This study aims to combine serum biomarkers and clinical information to develop a classification model that can accurately differentiate between mild to moderate liver fibrosis and advanced fibrosis, and to evaluate the usefulness of decision tree algorithms in the prediction of advanced fibrosis.

2. Patients and Methods

2.1. Patients

This study was carried out on 39,567 patients enrolled in the Egyptian National Committee for Control of Viral Hepatitis database in the National Treatment Program of HCV patients in Egypt. They comprised 10,741 females and 28,826 males. The laboratory tests were performed at the same time as the liver biopsy. The blood serum dataset for the patients was investigated and analyzed. The data contain reported clinical information such as age, gender, and body mass index (BMI), histological findings such as grade of fibrosis and activity, and laboratory tests such as albumin, total bilirubin, indirect bilirubin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), alpha-fetoprotein (AFP), postprandial glucose test (PC%), international normalized ratio (INR), quantity of HCV RNA, white blood cell (WBC) count, hemoglobin (Hb), platelet count, creatinine, serology findings, and glucose.

All data were obtained at baseline, before the start of antiviral therapy. Alcohol consumption was included in the baseline patient questionnaire, but most of the fields were missing or recorded a denial of alcohol consumption. Therefore, and because alcohol consumption is rare among Egyptians, a specific history of alcohol consumption was not considered as a covariate. The study was conducted under informed consent obtained by the National Committee for Control of Viral Hepatitis.

2.2. Liver Biopsy Histology

Liver histology was determined via the METAVIR score [22], as assessed by local pathologists in Egypt. All patients underwent liver biopsy at baseline. Total histological activity index and fibrosis scores (F0–F4) were recorded. According to the METAVIR system, fibrosis was staged on a scale from F0 to F4, as follows: F0: no fibrosis; F1: portal fibrosis without septa; F2: few septa; F3: many septa without cirrhosis; and F4: cirrhosis. F0 and F1 were considered as mild fibrosis and F2 as moderate, whereas F3-F4 were considered as advanced fibrosis [23].

2.3. Inclusion Criteria and Exclusion Criteria

Inclusion criteria were age ≥ 18 years and ≤60 years, positive HCV antibodies and detectable HCV RNA by PCR, positive liver biopsy for chronic hepatitis with F1 METAVIR score and elevated liver enzymes or F2/F3 METAVIR score, being naïve to treatment with PEG-IFN and RIB, hepatitis B surface antigen negativity, normal complete blood count, normal thyroid function, prothrombin concentration ≥ 60%, normal bilirubin, α-fetoprotein < 100 (ng/mL), and antinuclear antibody titer < 1/160.

Exclusion criteria were serious comorbid conditions such as severe arterial hypertension, heart failure, significant coronary heart disease, poorly controlled diabetes (hemoglobin A1C > 8.5%), chronic obstructive pulmonary disease, major uncontrolled depressive illness, solid organ transplant (renal, heart, or lung), untreated thyroid disease, history of previous anti-HCV therapy, body mass index (BMI) > 35 kg/m², known human immunodeficiency virus (HIV) coinfection, hypersensitivity to one of the two drugs (PEG-IFN, RIB), and concomitant liver disease other than hepatitis C (chronic hepatitis B, autoimmune hepatitis, alcoholic liver disease, hemochromatosis, α-1 antitrypsin deficiency, and Wilson’s disease).

2.4. Statistical Analysis, Feature Selection, and Classification

The data were statistically analyzed using the MedCalc software and Microsoft Excel, while the decision tree learning was performed with the Weka software. Data were reported as mean value ± standard deviation (SD). The relationship between each variable and the presence of significant fibrosis was assessed. The Kruskal-Wallis test was used for continuous variables with nonnormal distribution, and the chi-square test was used for categorical variables. Pearson’s correlation coefficients between fibrosis and each variable were also calculated.

We applied several types of decision tree learning techniques, namely, classification and regression tree (CART) [24], C4.5 [25], reduced error-pruning tree (REP), and alternating decision tree [21], and evaluated the performance of each of them on the datasets. The test set represents an external dataset that was not used for training. Receiver operating characteristic (ROC) curves, sensitivities, specificities, predictive values, and accuracies were used to evaluate the performance of each model or technique on both the training and test sets.

2.5. The Alternating Decision Tree

The alternating decision tree (ADT) is a machine learning method for classification and prediction. Boosting applied to traditional decision tree algorithms such as CART [24] and C4.5 [25] creates complicated decision tree structures that are hard to interpret. ADTree merges a number of weak hypotheses to induce a boosted one, yet classifiers of this type remain easy to interpret [21]. An alternating decision tree consists of decision nodes and prediction nodes. Each decision node tests a condition on an attribute, and the branches leaving it correspond to the possible outcomes of that test in the observed samples. Each prediction node holds a numeric score, and in an ADT prediction nodes appear as both the root and the leaves. Unlike in CART or C4.5, where an instance follows a single path through the tree, an instance is classified in an ADT by following all paths for which all decision nodes are true and summing the prediction nodes that are traversed [26].
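The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Weka implementation used in the study: the class names `PredictionNode` and `DecisionNode`, the function `adt_score`, and every threshold and node value in the toy tree are hypothetical and serve only to show how all applicable paths are followed and their prediction values summed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class PredictionNode:
    """Holds a numeric score and any decision nodes hanging below it."""
    value: float
    children: List["DecisionNode"] = field(default_factory=list)


@dataclass
class DecisionNode:
    """Tests one condition; each outcome leads to its own prediction node."""
    test: Callable[[Patient], bool]
    if_true: PredictionNode
    if_false: PredictionNode


def adt_score(node: PredictionNode, x: Patient) -> float:
    """Sum the prediction values on every path that the instance satisfies."""
    total = node.value
    for decision in node.children:
        branch = decision.if_true if decision.test(x) else decision.if_false
        total += adt_score(branch, x)
    return total


# Hypothetical toy tree; thresholds and values are NOT taken from the study.
platelet_split = DecisionNode(
    test=lambda p: p["platelets"] < 150,
    if_true=PredictionNode(0.5),
    if_false=PredictionNode(-0.125),
)
age_split = DecisionNode(
    test=lambda p: p["age"] < 40,
    if_true=PredictionNode(-0.25),
    if_false=PredictionNode(0.25, children=[platelet_split]),
)
ast_split = DecisionNode(
    test=lambda p: p["ast"] < 50,
    if_true=PredictionNode(-0.25),
    if_false=PredictionNode(0.25),
)
root = PredictionNode(-0.5, children=[age_split, ast_split])

patient = {"age": 50, "platelets": 120, "ast": 60}
score = adt_score(root, patient)
print(score, "advanced" if score >= 0 else "mild to moderate")  # 0.5 advanced
```

In this toy tree the root contributes −0.5, the age branch 0.25, the nested platelet branch 0.5, and the AST branch 0.25, so two root-level paths contribute to the same sum. The sign of the total gives the predicted class.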

3. Results and Discussion

Liver fibrosis was staged (F0–F4) and the required laboratory tests were performed. The distribution of fibrosis stages and the three strata among the training and test sets is given in Table 1. Patients were divided by random uniform sampling into two separate sets. About 57% of the dataset was used for training (n = 22,690 patients) and the rest for testing (n = 16,877 patients). Table 2 gives the characteristics of patients in the training and test datasets, together with the P value and Pearson correlation coefficient between each variable and fibrosis in the training set. Data are expressed as mean ± SD unless otherwise stated. The training and test sets were closely similar. The correlation and P value results in Table 2 identified age, body mass index (BMI), alpha-fetoprotein (AFP), aspartate aminotransferase (AST), platelet count, and albumin as independent predictors of fibrosis, with the most statistically significant relationship (P value < 0.0001) and an acceptable correlation with fibrosis. Therefore, these variables were used in model 1.
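A random split of this kind might be reproduced as follows. The sketch assumes that patients are identified by integer indices and uses the set sizes reported above; the function name `split_ids` and the seed are illustrative assumptions, since the study does not describe its sampling procedure in more detail.

```python
import random


def split_ids(patient_ids, n_train, seed=0):
    """Shuffle the ids uniformly at random and cut them into two disjoint sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Sizes reported in the text: 39,567 patients, 22,690 of them for training.
train_ids, test_ids = split_ids(range(39567), n_train=22690)
print(len(train_ids), len(test_ids))  # 22690 16877
```

Because every patient appears in exactly one of the two lists, the test set stays external to training.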

Table 1

Fibrosis stage | Training dataset (n = 22,690) | Test dataset (n = 16,877)
--- | --- | ---
Fibrosis strata | |
Mild to moderate (0–2) | 19,349 | 14,200
Advanced (3-4) | 3,341 | 2,677

Table 2

Characteristics | Training dataset (n = 22,690) | Validation dataset (n = 16,877) | Pearson correlation coefficient | P value
--- | --- | --- | --- | ---
Age (yrs) | 40 ± 11 | 40 ± 10 | 0.26 | <0.0001
Female, n (%) | 6,186 (27.3%) | 4,555 (26.9%) | −0.03 | 0.008
Male, n (%) | 16,504 (72.7%) | 12,322 (73.1%) | |
BMI | 26.70 ± 3.79 | 26.79 ± 3.84 | 0.10 | <0.0001
AFP (U/L) | 7.26 ± 26.61 | 7.69 ± 28.49 | 0.10 | <0.0001
ALP (U/L) | 105.41 ± 65.17 | 105.41 ± | |
AST (U/L) | 57.27 ± 33.73 | 56.78 ± 34.61 | 0.12 | <0.0001
ALT (U/L) | 61.84 ± 36.89 | 61.84 ± | |
Platelet count (10⁹/L) | 212.48 ± 60.64 | 211.55 ± 60.86 | −0.18 | <0.0001
Albumin (g/dL) | 4.39 ± 0.42 | 4.40 ± 0.42 | −0.14 | <0.0001
Indirect bilirubin (mg/dL) | 0.57 ± 1.77 | 0.60 ± 2.24 | −0.00 | 0.088
Total bilirubin (mg/dL) | 0.76 ± 0.28 | 0.76 ± 0.28 | 0.05 | <0.0001
Glucose (mg/dL) | 96.57 ± 19.41 | 96.69 ± 20.71 | 0.08 | <0.0001
Hemoglobin (Hb) | 14.03 ± 1.47 | 14.03 ± 1.62 | −0.00 | 0.0005
WBC (10⁹/L) | 6.44 ± 1.90 | 6.44 ± 1.94 | −0.02 | 0.0001

In model 1, an alternating decision tree was learned from the training dataset using the six variables that showed a statistically significant relationship (P value < 0.0001) and acceptable correlation coefficients with fibrosis: age, body mass index (BMI), alpha-fetoprotein (AFP), aspartate aminotransferase (AST), platelet count, and albumin. Figure 1(a) shows the decision tree diagram of model 1. In Figure 1(a), advanced fibrosis is considered as positive, denoted by the symbol (adv), while moderate or mild fibrosis is considered as negative, denoted by the symbol (m). The liver fibrosis of the patient is scored by summing all of the prediction nodes through which the patient's record passes. If the result is greater than or equal to zero, the patient is at high risk of having advanced fibrosis, and vice versa.

The four variables age, AFP, platelet count, and AST had the lowest P values and the highest correlation coefficients according to our previous work on the subject [11]; therefore, in this study we investigated whether BMI and albumin could be excluded from the prediction features. In model 2, an alternating decision tree was learned from the training dataset using the four variables: age, alpha-fetoprotein (AFP), aspartate aminotransferase (AST), and platelet count. These features are the same as the FIB-4 features except that AFP replaces ALT. Figure 1(b) shows the decision tree diagram of model 2.
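For comparison, FIB-4 in its commonly used form [12] is computed as age (years) × AST (U/L) / (platelet count (10⁹/L) × √ALT (U/L)). The small Python helper below makes the difference in inputs explicit: FIB-4 needs ALT, whereas model 2 uses AFP in its place. The function name `fib4` and the example values are illustrative and are not taken from the study data.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 index: (age x AST) / (platelet count x sqrt(ALT))."""
    if alt_u_per_l <= 0 or platelets_1e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


# Hypothetical patient, chosen so the arithmetic is easy to follow:
# (50 * 64) / (100 * sqrt(64)) = 3200 / 800 = 4.0
print(fib4(age_years=50, ast_u_per_l=64, alt_u_per_l=64, platelets_1e9_per_l=100))
```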

For further explanation, Figure 2 shows the flow chart of model 2. The liver fibrosis of the patient is scored by summing all of the prediction nodes through which the patient's record passes. Each positive prediction-node value raises the estimated probability of advanced fibrosis, and each negative value lowers it. If the result is greater than or equal to zero, the patient is predicted to have advanced fibrosis; if the result is negative (<0), the patient is predicted not to have advanced fibrosis. For example, a 45-year-old patient with AFP of 9.7 U/L, AST of 394 U/L, and a platelet count of 139 × 10⁹/L would have a score of −0.878 + 0.252 − 0.156 + 0.374 + 0.107 + 0.212 − 0.078 + 0.262 = 0.095. Because the final score of 0.095 is positive and the ADT threshold is zero, the patient is classified as having advanced liver fibrosis, which agrees with that patient's biopsy result of F3.
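The arithmetic in this example can be checked directly. The snippet below only sums the prediction-node values quoted above and applies the zero threshold; the variable names are illustrative.

```python
# Prediction-node values traversed by the example patient, as quoted in the text.
node_values = [-0.878, 0.252, -0.156, 0.374, 0.107, 0.212, -0.078, 0.262]

score = sum(node_values)
# ADT decision rule: a score of zero or more means advanced fibrosis.
label = "advanced (F3-F4)" if score >= 0 else "mild to moderate (F0-F2)"

print(f"score = {score:.3f} -> {label}")  # score = 0.095 -> advanced (F3-F4)
```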

Table 3 gives the accuracy, ROC analysis, sensitivity, specificity, and positive and negative predictive values of model 1 and model 2 for predicting advanced fibrosis in the training set and compares the results of these models with the FIB-4 algorithm on the test set. Model 2 achieved the highest accuracy, 85.7% in the training set and 84.8% in the test set. It also shows the highest negative predictive value (NPV), 87.3% in the training set and 86.2% in the test set. Figure 3 compares the ROC curves of the proposed ADT model 2 and FIB-4. The areas under the ROC curves of models 1 and 2 are the same (0.78) and better than the area under the ROC curve of FIB-4 (0.73). When we applied the alternating decision tree (ADT) algorithm to the whole cohort with the six variables using 10-fold cross-validation, it achieved 0.78 ROC and 85.3% accuracy, which is very close to the results obtained with separate training and test sets. The low sensitivity of the models can be attributed to the zero cut-off point, at which the ADT algorithm is trained. Any other cut-off point can be chosen from the ROC curve to increase the sensitivity, but this will be at the expense of accuracy.
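The effect of moving the cut-off point can be illustrated with a small sketch. The scores and labels below are invented toy data, not study results, and `sens_spec` is a hypothetical helper; the point is only that lowering the cut-off below zero raises sensitivity while lowering specificity.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Toy ADT-style scores: 8 patients without and 4 with advanced fibrosis.
scores = [-1.2, -0.9, -0.6, -0.4, -0.3, -0.1, 0.1, 0.3, -0.5, -0.2, 0.2, 0.6]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

for cutoff in (0.0, -0.35):
    sensitivity, specificity = sens_spec(scores, labels, cutoff)
    print(f"cutoff {cutoff:+.2f}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
# cutoff +0.00: sensitivity 0.50, specificity 0.75
# cutoff -0.35: sensitivity 0.75, specificity 0.50
```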

Table 3

Dataset | Model | Sensitivity % | Specificity % | PPV % | NPV % | ROC | Accuracy %
--- | --- | --- | --- | --- | --- | --- | ---
Training set | Model 1 | 15.1 | 97.9 | 55.2 | 87 | 0.78 | 85.7
Training set | Model 2 | 17.5 | 97.5 | 54.8 | 87.3 | 0.78 | 85.7
Test set | Model 1 | 14.4 | 97.9 | 56.6 | 85.9 | 0.78 | 84.7
Test set | Model 2 | 17.4 | 97.5 | 56.6 | 86.2 | 0.78 | 84.8

PPV: positive predictive value; NPV: negative predictive value; ROC: receiver operating characteristic curve.
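The measures reported in Table 3 follow from the four counts of a 2×2 confusion matrix. The following sketch shows the standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and do not reproduce the study's confusion matrices.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # advanced cases correctly flagged
        "specificity": tn / (tn + fp),  # non-advanced cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are truly advanced
        "npv": tn / (tn + fn),          # cleared patients who are truly non-advanced
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts for 1,000 patients, 200 of them with advanced fibrosis.
metrics = diagnostic_metrics(tp=50, fn=150, fp=25, tn=775)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these illustrative counts the sensitivity is 25.0% and the accuracy is 82.5%, which shows how a model can combine low sensitivity with high overall accuracy when advanced cases are the minority class.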

As shown in Figure 3, the comparison between the ROC curves of the proposed ADT model 2 and FIB-4 favors model 2, with a difference of 0.05 in the area under the ROC curve (0.78 versus 0.73) and of 2.5% in accuracy.

4. Conclusion

In this study, we conclude that the advanced fibrosis stage in chronic HCV patients can be predicted with high accuracy using decision tree learning. The most important features for predicting advanced fibrosis were age, AFP, AST, and platelet count, as they had the lowest P values and the highest correlation coefficients, as shown in the results of the proposed model. The best model achieved 86.2% NPV, 0.78 ROC, and 84.8% accuracy on the test set, better than the classical FIB-4 method. Using alpha-fetoprotein (AFP) as a feature for predicting advanced fibrosis, together with ADT, improves the results compared with those of the FIB-4 algorithm, which uses ALT instead. The proposed model could be used as an acceptable, safe, and low-cost alternative for predicting advanced fibrosis, in place of relatively risky tools such as liver biopsy, in Egyptian patients with chronic hepatitis C.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

The authors thank the Egyptian National Committee for Control of Viral Hepatitis for the patients’ information.


References

  1. S. Shah and J. Shah, “Advances in management of hepatitis C,” Medicine Update, vol. 22, 2012.
  2. World Health Organization, “Hepatitis C,” Fact Sheet no. 164, World Health Organization, 2013.
  3. D. Crisan, C. Radu, M. D. Grigorescu, M. Lupsor, D. Feier, and M. Grigorescu, “Prospective non-invasive follow-up of liver fibrosis in patients with chronic hepatitis C,” Journal of Gastrointestinal and Liver Diseases, vol. 21, no. 4, pp. 375–382, 2012.
  4. P. Bedossa and F. Carrat, “Liver biopsy: the best, not the gold standard,” Journal of Hepatology, vol. 50, no. 1, pp. 1–3, 2009.
  5. A. Regev, M. Berho, L. J. Jeffers et al., “Sampling error and intraobserver variation in liver biopsy in patients with chronic HCV infection,” American Journal of Gastroenterology, vol. 97, no. 10, pp. 2614–2618, 2002.
  6. E. Nashaat, “Lipid profile among chronic hepatitis C Egyptian patients and its levels pre and post treatment,” Nature and Science, vol. 8, no. 7, 2010.
  7. H. R. Rosen, “Chronic hepatitis C infection,” The New England Journal of Medicine, vol. 364, no. 25, pp. 2429–2438, 2011.
  8. M. Y. Kim, W. K. Jeong, and S. K. Baik, “Invasive and non-invasive diagnosis of cirrhosis and portal hypertension,” World Journal of Gastroenterology, vol. 20, no. 15, pp. 4300–4315, 2014.
  9. X. Forns, S. Ampurdanès, J. M. Llovet et al., “Identification of chronic hepatitis C patients without hepatic fibrosis by a simple predictive model,” Hepatology, vol. 36, no. 4, pp. 986–992, 2002.
  10. M. Bonacini, G. Hadi, S. Govindarajan, and K. L. Lindsay, “Utility of a discriminant score for diagnosing advanced fibrosis or cirrhosis in patients with chronic hepatitis C virus infection,” The American Journal of Gastroenterology, vol. 92, no. 8, pp. 1302–1304, 1997.
  11. S. Hashem, S. Habashy, W. El-Akel et al., “A simple multi-linear regression model for predicting fibrosis scores in chronic Egyptian hepatitis C virus patients,” International Journal of Bio-Technology and Research, vol. 4, no. 3, pp. 37–46, 2014.
  12. R. K. Sterling, E. Lissen, N. Clumeck et al., “Development of a simple noninvasive index to predict significant fibrosis in patients with HIV/HCV coinfection,” Hepatology, vol. 43, no. 6, pp. 1317–1325, 2006.
  13. C.-T. Wai, J. K. Greenson, R. J. Fontana et al., “A simple noninvasive index can predict both significant fibrosis and cirrhosis in patients with chronic hepatitis C,” Hepatology, vol. 38, no. 2, pp. 518–526, 2003.
  14. J. Foucher, E. Chanteloup, J. Vergniol et al., “Diagnosis of cirrhosis by transient elastography (FibroScan): a prospective study,” Gut, vol. 55, no. 3, pp. 403–408, 2006.
  15. S. Gaia, S. Carenzi, A. L. Barilli et al., “Reliability of transient elastography for the detection of fibrosis in Non-Alcoholic Fatty Liver Disease and chronic viral hepatitis,” Journal of Hepatology, vol. 54, no. 1, pp. 64–71, 2011.
  16. A. A. Wahba, N. M. Khalifa, A. F. Seddik, and M. I. El-Adawy, “Liver fibrosis recognition using multi-compression elastography technique,” Journal of Biomedical Science and Engineering, vol. 6, no. 11, pp. 1034–1039, 2013.
  17. J. Parkes, I. N. Guha, P. Roderick, and W. Rosenberg, “Performance of serum marker panels for liver fibrosis in chronic hepatitis C,” Journal of Hepatology, vol. 44, no. 3, pp. 462–474, 2006.
  18. L. Castera, “Noninvasive methods to assess liver disease in patients with hepatitis B or C,” Gastroenterology, vol. 142, no. 6, pp. 1293.e4–1302.e4, 2012.
  19. L. Zhang, Q.-Y. Li, Y.-Y. Duan, G.-Z. Yan, Y.-L. Yang, and R.-J. Yang, “Artificial neural network aided non-invasive grading evaluation of hepatic fibrosis by duplex ultrasonography,” BMC Medical Informatics and Decision Making, vol. 12, article 55, 2012.
  20. M. ElHefnawi, M. Abdalla, S. Ahmed et al., “Accurate prediction of response to Interferon-based therapy in Egyptian patients with Chronic Hepatitis C using machine-learning approaches,” in Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM '12), pp. 771–778, IEEE, Istanbul, Turkey, August 2012.
  21. Y. Freund and L. Mason, “The alternating decision tree learning algorithm,” in Proceedings of the 16th International Conference on Machine Learning, pp. 124–133, Bled, Slovenia, 1999.
  22. P. Bedossa and T. Poynard, “An algorithm for the grading of activity in chronic hepatitis C. The METAVIR Cooperative Study Group,” Hepatology, vol. 24, no. 2, pp. 289–293, 1996.
  23. D. Wang, Q. Wang, F. Shan, B. Liu, and C. Lu, “Identification of the risk for liver fibrosis on CHB patients using an artificial neural network based on routine and serum markers,” BMC Infectious Diseases, vol. 10, article 251, 2010.
  24. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif, USA, 1984.
  25. J. Quinlan, “Bagging, boosting, and C4.5,” in Proceedings of the 13th National Conference on Artificial Intelligence, pp. 725–730, Menlo Park, Calif, USA, 1996.

Copyright © 2016 Somaya Hashem et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
