International Journal of Genomics
Volume 2014 (2014), Article ID 160236, 6 pages
Research Article

γ-H2AX: A Novel Prognostic Marker in a Prognosis Prediction Model of Patients with Early Operable Non-Small Cell Lung Cancer

¹Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
²Department of Oncology, Democritus University of Thrace, Alexandroupolis, Greece
³Department of Pneumonology, Democritus University of Thrace, Alexandroupolis, Greece
⁴Department of Cytopathology, University of Athens Medical School, “Attikon” University Hospital, Athens, Greece
⁵2nd Department of Surgery, Democritus University of Thrace, Alexandroupolis, Greece

Received 7 July 2013; Revised 3 November 2013; Accepted 12 December 2013; Published 8 January 2014

Academic Editor: Margarita Hadzopoulou-Cladaras

Copyright © 2014 E. Chatzimichail et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Cancer is a leading cause of death worldwide, and the prognostic evaluation of cancer patients is of great importance in medical care. The use of artificial neural networks (ANNs) in prediction problems is well established in the medical literature. The aim of the current study was to assess the prognostic value of a series of clinical and molecular variables, with the addition of γ-H2AX, a new DNA damage response marker, for predicting the prognosis of patients with early operable non-small cell lung cancer, by comparing a γ-H2AX-based artificial neural network prediction model with the corresponding logistic regression (LR) model. For a cohort of 96 patients with 27 input variables, neural network and logistic regression prognostic models were constructed using the parameter-increasing method, and their predictive accuracy was compared. The quality of the models was evaluated on an independent validation data set of 11 patients. According to the experimental results, neural networks outperformed logistic regression in predicting patient outcome. To assess the importance of two factors, p53 and γ-H2AX, models without these two variables were also constructed. The judgement ratio (JR) and accuracy of these models were lower than those of the models using all input variables, suggesting that these biological markers are very important for optimal model performance. This study indicates that neural networks may be a more useful decision support tool than conventional statistical methods for predicting the outcome of patients with non-small cell lung cancer, and that molecular markers such as γ-H2AX enhance their predictive ability.

1. Introduction

Prediction is one of the most interesting areas in which intelligent systems are used [1]. In particular, prediction is an attempt to accurately forecast the evolution or outcome of a specific situation, using as input a concrete set of variables that describe that situation [2]. In medicine, valid and effective interpretation of medical data, correct and early diagnosis, and a documented prognostic evaluation of clinical and pathological data are all very important for better disease management [3]. Prediction is nevertheless a difficult task, because human experts can hardly process the huge amount of available data and often lack a thorough and accurate analysis of the laboratory data [4, 5].

Lung cancer is the most common cause of cancer mortality worldwide for both men and women, causing approximately 1.2 million deaths per year. In the United States, there were an estimated 221,000 new cases of lung cancer and 157,000 deaths in 2011 [6]. The traditional evaluation of prognosis in non-small cell lung carcinoma (NSCLC) has relied, as in most other malignant tumors, on the stage of disease at the time of clinical presentation [7, 8]. Other factors commonly considered include performance status, weight loss, and presence or absence of symptoms at diagnosis, as well as established pathologic parameters such as tumor size, tumor differentiation, and histological subtype [9, 10]. During the last decade, however, advances in molecular biology have revealed other potentially significant prognostic biomarkers, such as the histone γ-H2AX [11]. Detection of γ-H2AX foci has been used as a biomarker for aging and cancer, as a biodosimeter in drug development and for radiation exposure, and in clinical trials of cancer chemo- and radiotherapy [12–14].

The most popular choice of prediction model in lung cancer is the multivariable logistic regression (LR) model. For example, Beane et al. [15] used logistic regression models to describe the likelihood of having lung cancer on the basis of biomarker and clinical factors. Although interest in using artificial neural networks (ANNs) to predict lung cancer outcome has grown in recent years, their accuracy has not been fully assessed. There is therefore a great need for formal evaluation and empirical comparison of neural networks with conventional statistical methods. Santos-García et al. [16], for instance, proposed an ensemble model of ANNs to predict cardiorespiratory morbidity after pulmonary resection for non-small cell lung cancer. Despite criticisms such as greater computational burden and proneness to overfitting, supporters argue that neural networks provide more meaningful interpretations of data and conjecture that their performance will surpass that of regression modelling techniques [17].

The aim of the present study was to predict the outcome of patients with early operable non-small cell lung cancer using ANNs that incorporate γ-H2AX, a new DNA damage response biomarker. The cohort variables included both clinical variables (sex, age, smoking status, TNM status, FEV1, and history of adjuvant chemotherapy and/or radiotherapy) and pathological variables (histological subtype, differentiation grade, Ki67, ploidy status, EGFR mutation status, apoptotic status (caspase 3), and lymphatic and vascular infiltration). Additionally, the central role of two biological markers was investigated: p53, which has been reported to be associated with the pathogenesis of lung cancer, and the histone γ-H2AX. This is the first study to use γ-H2AX, a DNA damage biomarker, as an input to a prognosis prediction model for patients with early operable non-small cell lung cancer.

2. Materials and Methods

Data from 96 patients with NSCLC were collected and recorded. All patients underwent radical thoracic surgery of the primary tumour (lobectomy or pneumonectomy), together with regional lymph node excision, between January 2002 and December 2005 at the Cardiac Surgery Department of Evangelismos Hospital. The data for each patient were structured in fields containing information about postsurgical measurements and type of treatment. Medical experts identified a set of important prognostic factors from among these fields. For each patient, 27 clinical and pathological variables were studied; they are listed in Table 1. Histology reports were issued according to World Health Organization criteria, and staging was performed according to the 7th Edition of the TNM classification of lung cancer. Of the 96 patients who underwent surgery, 11 were set aside for validation of the models. The group comprised 77 men and 19 women aged 36 to 80 years (median age: 66 years; mean age ± SD: 65.64 ± 7.23 years) and included 42 adenocarcinomas, 42 squamous cell carcinomas, 8 large cell carcinomas, and 4 undifferentiated carcinomas. Two distinct outcome classes are to be predicted: class 0 refers to patient survival and class 1 to death after resection.

Table 1: Characteristics of the prediction variables.

2.1. Artificial Neural Networks

ANNs are statistical models whose mathematical structure reproduces the biological organisation of neural cells in order to simulate the learning dynamics of the brain [18]. In recent years, considerable attention has been paid to the application of ANN-based regression methods to the development of prognostic models in oncology [19]. Although doubts have been raised about the real advantages of ANNs over traditional techniques [20], recent work highlights their benefits for outcome prediction [21–23].

ANNs were developed as generalizations of mathematical models of biological nervous systems [24–30]. A neural network must be configured so that applying a set of inputs produces the desired set of outputs. Various methods exist for setting the strengths of the connections. One way is to set the weights explicitly, using a priori knowledge. Another is to train the network by feeding it teaching patterns and letting it change its weights according to some learning rule.

An ANN is implemented in two phases: training and testing. In the training phase, the network learns to predict the output on the basis of the input data. In the testing phase, the network is used to predict an output. Training is stopped when the test error reaches the desired tolerance value.

The back propagation (BP) algorithm is the most popular and most widely used training algorithm. Learning in a BP network takes place in two steps. First, each pattern is presented to the network and propagated forward to the output. Second, a method called “gradient descent” is used to minimize the total error on the patterns in the training set. In gradient descent, each weight is changed in proportion to the negative of the derivative of the error with respect to that weight, so that the weights move in the direction of steepest descent on the error surface defined by the total error

$$E = \frac{1}{2}\sum_{p=1}^{P}\left(y_p - t_p\right)^2, \qquad (1)$$

where $P$ is the number of patterns, $y_p$ is the predicted output, and $t_p$ is the target for pattern $p$.
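For concreteness, the gradient-descent step can be written out for the weights feeding a single logsig output unit. This is a worked sketch under stated assumptions: the learning rate $\eta$, the hidden-unit outputs $h_{pj}$, the output weights $w_j$, and the bias $b$ are symbols introduced here for illustration, and $E = \frac{1}{2}\sum_{p=1}^{P}(y_p - t_p)^2$ is the total error over the $P$ training patterns.

```latex
y_p = \sigma\Big(\sum_{j} w_j h_{pj} + b\Big), \qquad \sigma(n) = \frac{1}{1 + e^{-n}}

\frac{\partial E}{\partial w_j} = \sum_{p=1}^{P} \left(y_p - t_p\right) y_p \left(1 - y_p\right) h_{pj},
\qquad
\Delta w_j = -\eta \, \frac{\partial E}{\partial w_j}
```

The factor $y_p(1 - y_p)$ is the derivative of the logistic sigmoid; gradients for the hidden-layer weights follow by applying the chain rule once more.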

The BP training algorithm is a gradient descent method whose main function is to improve the performance of the network by reducing the total error through changes in the weights along its gradient. Training is stopped when the mean squared error (MSE) on the test data stops decreasing and begins to increase, which is a sign of overtraining.
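The training procedure above (forward pass, gradient descent on the total error, and stopping once the test MSE stops improving) can be sketched in plain Python. This is a minimal illustration, not the MATLAB Neural Network Toolbox routine used in the study: it assumes one hidden layer of logsig units, a single logsig output, full-batch updates, and a hypothetical learning rate `lr` and early-stopping `patience`, none of whose values are given in the text.

```python
import math
import random

def logsig(n):
    """Logistic sigmoid transfer function (numerically stable form)."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)

def forward(x, W1, b1, w2, b2):
    """Input -> logsig hidden layer -> single logsig output unit."""
    h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = logsig(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, y

def mse(data, W1, b1, w2, b2):
    """Mean squared error of the network over (inputs, target) pairs."""
    return sum((forward(x, W1, b1, w2, b2)[1] - t) ** 2 for x, t in data) / len(data)

def train_bp(train, test, n_hidden=3, lr=0.5, max_epochs=5000, patience=50, seed=0):
    """Batch back propagation with early stopping on the test-set MSE."""
    rng = random.Random(seed)
    n_in = len(train[0][0])
    # Weights and biases start uniformly in [0, 1), as described in Section 2.2.
    W1 = [[rng.random() for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.random() for _ in range(n_hidden)]
    w2 = [rng.random() for _ in range(n_hidden)]
    b2 = rng.random()
    best_mse, best, since_best = float("inf"), None, 0
    for _ in range(max_epochs):
        # Gradients of E = 1/2 * sum_p (y_p - t_p)^2, accumulated over all patterns.
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gw2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in train:
            h, y = forward(x, W1, b1, w2, b2)
            d_out = (y - t) * y * (1.0 - y)  # dE/d(net input of the output unit)
            gb2 += d_out
            for j in range(n_hidden):
                gw2[j] += d_out * h[j]
                d_hid = d_out * w2[j] * h[j] * (1.0 - h[j])  # dE/d(net input of hidden unit j)
                gb1[j] += d_hid
                for i in range(n_in):
                    gW1[j][i] += d_hid * x[i]
        # Steepest-descent step: move every weight against its gradient.
        for j in range(n_hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
        b2 -= lr * gb2
        # Early stopping: keep the best weights; stop once the test MSE stops improving.
        test_mse = mse(test, W1, b1, w2, b2)
        if test_mse < best_mse:
            best_mse, since_best = test_mse, 0
            best = ([row[:] for row in W1], b1[:], w2[:], b2)
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best, best_mse
```

`train_bp(train, test)` returns the weights with the lowest test MSE together with that MSE; the `patience` counter tolerates brief fluctuations instead of stopping at the very first increase, which is a design choice of this sketch rather than something the text specifies.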

2.2. Application of Artificial Neural Networks to Patient Data

The Neural Network Toolbox of MATLAB was used to develop the artificial neural networks. Three independent models were constructed to estimate patient outcome at 1, 3, and 4 years after surgery. The output values for survival and death in each model were set to 0 and 1, respectively. Multicollinearity would be expected among some of the variables. However, calculating the correlation coefficients for every pair of variables in Table 1 revealed no excessively high correlation: every correlation coefficient was below 0.85. Therefore, all 27 variables were used as potential inputs for the current analysis.
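The pairwise correlation screen described above can be sketched as follows. The text does not say which coefficient was computed, so Pearson's coefficient and an absolute-value threshold of 0.85 are assumptions here; the column names and values in the test are made up.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")  # constant column: correlation undefined
    return cov / math.sqrt(va * vb)

def high_correlations(columns, threshold=0.85):
    """Return (name_a, name_b, r) for every variable pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged
```

An empty result corresponds to the situation reported in the study, where no pair reached the threshold and all variables were kept.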

A multilayer perceptron (MLP) network with an input layer, a hidden layer, and an output layer was used. The number of units in the input layer equals the number of input variables, while the output layer has a single unit that represents the status of the patient. After the ANN structure was designed, the experimental data were normalized to the range [0, 1] using (2) in order to improve training:

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad (2)$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable $x$.
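A minimal sketch of min-max scaling to [0, 1], assuming the standard min-max form for the normalization. Fitting the minimum and maximum on the learning data and reusing them for other data is an assumption of this sketch; the text does not say how the validation set was scaled.

```python
def fit_minmax(column):
    """Return the (minimum, maximum) of a variable, e.g. on the learning data."""
    return min(column), max(column)

def apply_minmax(column, lo, hi):
    """Scale values to [0, 1] with (x - lo) / (hi - lo).

    Values outside the fitted range (possible for unseen data) fall outside [0, 1].
    A constant variable is mapped to 0.0 to avoid division by zero.
    """
    span = hi - lo
    if span == 0:
        return [0.0] * len(column)
    return [(x - lo) / span for x in column]
```

For example, ages spanning the cohort range of 36 to 80 years would map 36 to 0.0 and 80 to 1.0.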

Initially, the connection weights were randomly assigned values between 0 and 1; they were subsequently adjusted by the back propagation learning method to identify the optimal relationships between input and output. If useless variables are used as inputs to the ANN model, its accuracy will be lower than that of a model using only the necessary variables. Moreover, when the number of connection weights is much larger than the size of the learning data set, the resulting model may have less generalizability and flexibility. It was therefore necessary to reduce the number of input and hidden units in order to optimize the model. For this purpose, the parameter-increasing method (PIM) was used. The first step of PIM was to choose the single input variable most useful for accurate prediction. In the next step, the second most useful variable was selected. By repeating this operation, the best combination of input variables for the prediction model was obtained. Similarly, the number of units in the hidden layer was decreased one by one from 10 to determine the ANN architecture that yielded the best result.
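The parameter-increasing method reads as greedy forward selection, which can be sketched as below. The scoring callback, the rule of stopping when no remaining variable improves the score, and the exhaustive scan of hidden-layer sizes from 10 down to 1 are assumptions of this sketch; in practice `score` would be something like the cross-validated accuracy of a network trained on the candidate inputs.

```python
def parameter_increasing_selection(candidates, score):
    """Greedy forward selection in the spirit of PIM.

    score(subset) returns a number where higher is better. At each step the
    variable whose addition gives the best score is kept; selection stops when
    no remaining variable improves on the current best score.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # Evaluate every one-variable extension of the current subset.
        top_score, top_var = max(
            ((score(selected + [v]), v) for v in remaining), key=lambda sv: sv[0]
        )
        if top_score <= best:
            break
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best

def choose_hidden_units(score, start=10):
    """Evaluate hidden-layer sizes start, start-1, ..., 1 and return the best one."""
    results = {n: score(n) for n in range(start, 0, -1)}
    best_n = max(results, key=results.get)
    return best_n, results[best_n]
```

The order in which variables enter `selected` mirrors the ordering reported for the selected inputs: earlier entries contributed more to the score.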

The BP algorithm was also used in the training procedure. Several transfer functions (purelin, tansig, logsig, etc.) were tried for the neurons in the hidden and output layers, and the logarithmic-sigmoid function (logsig) was selected because it yielded the best result.
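The three named transfer functions can be written out in Python as a reference sketch. These follow the definitions documented for MATLAB's `purelin` (identity), `tansig` (2/(1 + exp(-2n)) - 1, mathematically equal to tanh) and `logsig` (1/(1 + exp(-n))); the Python versions themselves are illustrative, not toolbox code.

```python
import math

def purelin(n):
    """Linear transfer function: output equals net input."""
    return n

def tansig(n):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2n)) - 1; tanh avoids overflow."""
    return math.tanh(n)

def logsig(n):
    """Logistic sigmoid 1 / (1 + exp(-n)), range (0, 1); stable for large |n|."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)
```

Because logsig outputs lie in (0, 1), they match the 0/1 coding of survival and death used for the output unit.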

Five-fold cross-validation was used to estimate the performance of the predictive model. The 85 patients not reserved for validation were first divided into 5 subgroups. One subgroup was used as evaluation data and the remaining four as learning data. This was repeated 5 times, so that each subgroup served once as evaluation data. The mean accuracy over the five evaluation subgroups was taken as the estimate of the predictive ability of the ANN model. In addition, a completely independent data set of 11 patients was used to validate the ANN model constructed as described above.
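The five-fold scheme can be sketched as follows. Random (unstratified) assignment of patients to folds and the `train_and_score` callback, which trains on the learning indices and returns the accuracy on the evaluation indices, are hypothetical choices for this sketch; the text does not describe how patients were assigned to subgroups.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle patient indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, train_and_score, k=5, seed=0):
    """Each fold serves once as evaluation data; returns (mean accuracy, per-fold list)."""
    folds = k_fold_indices(n, k, seed)
    accuracies = []
    for i, eval_idx in enumerate(folds):
        # Learning data: all folds except the current evaluation fold.
        learn_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        accuracies.append(train_and_score(learn_idx, eval_idx))
    return sum(accuracies) / k, accuracies
```

With the 85 patients used here, each fold holds 17 patients and each learning set 68.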

The performance of the neural networks was estimated using the judgement ratio (JR) and accuracy. Classifying a normal case as abnormal is counted as a false positive (FP), and classifying an abnormal case as normal as a false negative (FN); true positives (TP) and true negatives (TN) are defined analogously. JR is the percentage of patients for whom a judgement can be made, while accuracy is a statistical measure of how well the binary classification correctly identifies or excludes the patient’s outcome. JR and accuracy are given by

$$\mathrm{JR} = \frac{TP + TN + FP + FN}{N} \times 100\%, \qquad (3)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%, \qquad (4)$$

where $TP$, $TN$, $FP$, $FN$, and $N$ are the numbers of true positives, true negatives, false positives, false negatives, and all collected data, respectively.
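A sketch of how JR and accuracy could be computed from a list of predictions. Two assumptions are made here: a prediction of `None` stands for a patient on whom no judgement was made (the text does not say how such cases arise), and class 1 (death) is treated as the positive class. The toy data in the test are invented.

```python
def judgement_metrics(y_true, y_pred):
    """Count TP/TN/FP/FN and compute JR and accuracy (both in percent).

    y_true: observed outcomes (0 = survival, 1 = death).
    y_pred: predicted outcomes 0 or 1, or None when no judgement was made.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == 1 and t == 1)
    tn = sum(1 for t, p in pairs if p == 0 and t == 0)
    fp = sum(1 for t, p in pairs if p == 1 and t == 0)
    fn = sum(1 for t, p in pairs if p == 0 and t == 1)
    judged = tp + tn + fp + fn
    jr = 100.0 * judged / len(pairs)
    # Accuracy is computed only over the patients on whom a judgement was made.
    accuracy = 100.0 * (tp + tn) / judged if judged else float("nan")
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "jr": jr, "accuracy": accuracy}
```

Under this reading, a JR of 72.7% on an 11-patient set corresponds to a judgement on 8 of the 11 patients.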

3. Results

Logistic regression, an accepted standard for prediction among biostatisticians [31], was chosen in this study as the benchmark for evaluating the modeling method. The LR models were built with SPSS for Windows (SPSS Regression Models 17.0, SPSS Inc., Chicago, IL). The input variables for the LR model were optimized by PIM based on the likelihood ratio.
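The likelihood-ratio criterion can be illustrated by comparing a logistic model with and without a candidate variable. This is a sketch of the statistic only, not SPSS's implementation; it assumes the fitted probabilities of both models are already available, and it omits the chi-square p-value (degrees of freedom equal to the number of added parameters), which would need a statistics library.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))

def likelihood_ratio_statistic(ll_reduced, ll_extended):
    """G = 2 * (LL_extended - LL_reduced); larger G favours adding the variable."""
    return 2.0 * (ll_extended - ll_reduced)
```

In a PIM-style search, the candidate yielding the largest G (or smallest p-value) at each step would be added next.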

Table 2 shows the variables selected for the 1-, 3-, and 4-year prediction models using ANN and LR, in the order in which PIM selected them. Variables selected in earlier steps are more useful for the 1-year prediction. Several biological variables, especially p53 and Ki67 as well as the histone γ-H2AX, were repeatedly selected in the models. The number of hidden units in the 1-, 3-, and 4-year prediction models was optimized to 3, 5, and 2, respectively.

Table 2: Input variables selected by the parameter-increasing method.

Table 3 presents the JR and accuracy of the 1-, 3-, and 4-year prediction models using ANNs and LR. In most cases, both measures were much higher for the ANN models than for the LR models. To assess the importance of two factors, p53 and γ-H2AX, models without these two variables were also constructed. The JR and accuracy of these models were lower than those of the models using all input variables (Tables 4 and 5), suggesting that these biological markers are very important for optimal model performance.

Table 3: Comparison of the 1-, 3-, and 4-year predictive models using ANN and LR.
Table 4: Input variables selected by the parameter-increasing method.
Table 5: Comparison of predictive models for 1-year prediction.

To investigate the efficacy of the constructed ANN model, an independent data set of 11 surgically treated patients was used for validation. JR was 72.7% and 81.8% for the 3- and 4-year outcome predictions, respectively (Table 6).

Table 6: Estimation of 3- and 4-year outcome prediction for unlearned data set.

4. Discussion

With the development of ANNs as an alternative to logistic regression for prediction, research has been conducted to investigate the differences between the two techniques [32, 33]. Artificial neural networks have many advantages and disadvantages as a classification tool. ANNs can learn the input-output mapping from a given data set without any prior knowledge or assumptions about the statistical distribution of the data. This ability to learn from data without a priori knowledge makes neural networks well suited to classification and prediction tasks in practical situations. Furthermore, neural networks are inherently nonlinear, which makes them better suited to accurately modelling complex data patterns than many traditional methods based on linear techniques. Consequently, they have found application in a wide range of medical fields, such as cardiology, gastroenterology, pulmonology, oncology, neurology, and paediatrics [20–27].

One disadvantage of ANNs compared with logistic regression models is that ANNs often have difficulty analyzing systems with a large number of inputs, because learning such systems takes a long time and the model may overfit during training. Linear and logistic regression models are less prone to overfitting, primarily because the range of functions they can model is limited.

Recently, the comparison between these two models has been addressed from different points of view, and several published works in the medical literature have demonstrated the success of ANN approaches. In a review of 28 major studies, Sargent found that ANNs outperformed logistic regression in 10 cases (36%), were outperformed by regression in 4 cases (14%), and performed similarly in the remaining cases. Sargent concluded that both methods should continue to be used and explored in a complementary manner [34].

In this study, both ANNs and LR achieved promising prediction results when clinical parameters and molecular factors were considered together in the prediction model. However, the ANN models outperformed the logistic models in terms of accuracy in most cases, with a prediction success rate of about 88%. Although this rate is not 100%, the study suggests that it can be improved step by step as additional parameters, including novel molecular markers involved in lung cancer, are added and considered together.

Moreover, the present study ranked the prognostic importance of each factor in lung cancer. For the first time, γ-H2AX, a DNA damage biomarker, was used in a prognosis prediction model for patients with early operable non-small cell lung cancer. Our research team was the first to demonstrate that overexpression of γ-H2AX may be an independent prognostic indicator of worse overall survival in patients with non-small cell lung cancer [35].

5. Conclusions

In conclusion, our study demonstrated that, for patients with early operable NSCLC, an artificial neural network prediction model incorporating γ-H2AX outperformed logistic models in terms of accuracy. Better prediction of non-small cell lung cancer prognosis may become possible with more patient data, additional appropriate input parameters such as biomarkers, and artificial intelligence methods that work together with ANNs.

Conflict of Interests

The authors declare that there is no conflict of interests.


  1. B. Pandey and R. B. Mishra, “Knowledge and intelligent computing system in medicine,” Computers in Biology and Medicine, vol. 39, no. 3, pp. 215–230, 2009.
  2. J. A. Gómez-Ruiz, J. M. Jerez-Aragonés, J. Muñoz-Pérez, and E. Alba-Conejo, “A neural network based model for prognosis of early breast cancer,” Applied Intelligence, vol. 20, pp. 231–238, 2004.
  3. O. Er, N. Yumusak, and F. Temurtas, “Chest diseases diagnosis using artificial neural networks,” Expert Systems with Applications, vol. 37, no. 12, pp. 7648–7655, 2010.
  4. I. Saritas, “Prediction of breast cancer using artificial neural networks,” Journal of Medical Systems, vol. 36, no. 5, pp. 2901–2907, 2011.
  5. F. Feng, Y. Wu, Y. Wu, G. Nie, and R. Ni, “The effect of artificial neural network model combined with six tumor markers in auxiliary diagnosis of Lung Cancer,” Journal of Medical Systems, vol. 36, pp. 2973–2980, 2011.
  6. R. Siegel, E. Ward, O. Brawley, and A. Jemal, “Cancer statistics, 2011: the impact of eliminating socioeconomic and racial disparities on premature cancer deaths,” CA-Cancer Journal for Clinicians, vol. 61, no. 4, pp. 212–236, 2011.
  7. P. Goldstraw, D. Ball, J. R. Jett et al., “Non-small-cell lung cancer,” The Lancet, vol. 378, no. 9804, pp. 1727–1740, 2011.
  8. C. Stephan, H. Cammann, H. Rittenhouse et al., “New biomarkers and application of multivariate models for detection of prostate cancer,” Aktuelle Urologie, vol. 40, no. 4, pp. 221–230, 2009.
  9. L. Cagini, M. Monacelli, G. Giustozzi et al., “Biological prognostic factors for early stage completely resected non-small cell lung cancer,” Journal of Surgical Oncology, vol. 74, pp. 53–60, 2000.
  10. M. Gönen, “Bias, biostatistics, and prognostic factors,” Journal of Thoracic Oncology, vol. 6, no. 7, pp. S1705–S1709, 2011.
  11. Q. Hu, B. Li, D. Garfield et al., “Prognostic factors for survival in a Chinese population presenting with advanced non-small cell lung cancer with an emphasis on smoking status: a regional, single-institution, retrospective analysis of 4552 patients,” Thoracic Cancer, vol. 3, pp. 162–168, 2012.
  12. J. S. Dickey, C. E. Redon, A. J. Nakamura, B. J. Baird, O. A. Sedelnikova, and W. M. Bonner, “H2AX: functional roles and potential applications,” Chromosoma, vol. 118, no. 6, pp. 683–692, 2009.
  13. M. Podhorecka, A. Skladanowski, and P. Bozko, “H2AX phosphorylation: its role in DNA damage response and cancer therapy,” Journal of Nucleic Acids, vol. 2010, Article ID 920161, 9 pages, 2010.
  14. D. Matthaios, D. Bouros, and S. Kakolyris, “H2AX and lung cancer: is it the Ariadne’s thread?” DNA Repair, vol. 12, no. 2, pp. 90–91, 2013.
  15. J. Beane, P. Sebastiani, T. H. Whitfield et al., “A prediction model for lung cancer diagnosis that integrates genomic and clinical features,” Cancer Prevention Research, vol. 1, no. 1, pp. 56–64, 2008.
  16. G. Santos-García, G. Varela, N. Novoa, and M. F. Jiménez, “Prediction of postoperative morbidity after lung resection using an artificial neural network ensemble,” Artificial Intelligence in Medicine, vol. 30, no. 1, pp. 61–69, 2004.
  17. E. Bartfay, W. J. Mackillop, and J. L. Pater, “Comparing the predictive value of neural network models to logistic regression models on the risk of death for small-cell lung cancer patients,” European Journal of Cancer Care, vol. 15, no. 2, pp. 115–124, 2006.
  18. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, NY, USA, 1995.
  19. K. Balachandran and R. Anitha, “Supervised learning processing techniques for pre-diagnosis of lung cancer disease,” International Journal of Computer Applications, vol. 1, article 17, 2010.
  20. G. Schwarzer, W. Vach, and M. Schumacher, “On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology,” Statistics in Medicine, vol. 19, pp. 541–561, 2000.
  21. P. J. G. Lisboa, “A review of evidence of health benefit from artificial neural networks in medical intervention,” Neural Networks, vol. 15, no. 1, pp. 11–39, 2002.
  22. S. A. Grumett and P. B. Snow, “Artificial neural networks: a new model for assessing prognostic factors,” Annals of Oncology, vol. 11, no. 4, pp. 383–384, 2000.
  23. E. Biganzoli, P. Boracchi, D. Coradini, M. G. Daidone, and E. Marubini, “Prognosis in node-negative primary breast cancer: a neural network analysis of risk profiles using routinely assessed factors,” Annals of Oncology, vol. 14, no. 10, pp. 1484–1493, 2003.
  24. E. Chatzimichail, E. Paraskakis, M. Sitzimi, and A. Rigas, “Predicting the long-term outcome of preschool children with asthma symptoms,” in Proceedings of the E-Health and Bioengineering Conference (EHB '11), pp. 1–4, November 2011.
  25. A. Ciampi and F. Zhang, “A new approach to training back-propagation artificial neural networks: empirical evaluation on ten data sets from clinical studies,” Statistics in Medicine, vol. 21, no. 9, pp. 1309–1330, 2002.
  26. V. Van Belle, K. Pelckmans, J. A. K. Suykens, and S. Van Huffel, “Additive survival least-squares support vector machines,” Statistics in Medicine, vol. 29, no. 2, pp. 296–308, 2010.
  27. E. A. Chatzimichail, A. G. Rigas, and E. N. Paraskakis, “An artificial intelligence technique for the prediction of persistent asthma in children,” in Proceedings of the 10th International Conference on Information Technology and Applications in Biomedicine (ITAB '10), pp. 1–4, November 2010.
  28. E. Pagano, C. Filippini, D. Di Cuonzo et al., “Factors affecting pattern of care and survival in a population-based cohort of non-small-cell lung cancer incident cases,” Cancer Epidemiology, vol. 34, no. 4, pp. 483–489, 2010.
  29. L. Liu, E. Zhao, C. Li et al., “TRIM28, a new molecular marker predicting metastasis and survival in early-stage non-small cell lung cancer,” Cancer Epidemiology, vol. 37, pp. 71–78, 2013.
  30. E. Biganzoli, P. Boracchi, L. Mariani, and E. Marubini, “Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach,” Statistics in Medicine, vol. 17, pp. 1169–1186, 1998.
  31. C. R. Mehta and N. R. Patel, “Exact logistic regression: theory and examples,” Statistics in Medicine, vol. 14, no. 19, pp. 2143–2160, 1995.
  32. J. V. Tu, “Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes,” Journal of Clinical Epidemiology, vol. 49, no. 11, pp. 1225–1231, 1996.
  33. B. Eftekhar, K. Mohammad, H. E. Ardebili, M. Ghodsi, and E. Ketabchi, “Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data,” BMC Medical Informatics and Decision Making, vol. 5, article 3, 2005.
  34. D. J. Sargent, “Comparison of artificial neural networks with other statistical approaches: results from medical data sets,” Cancer, vol. 91, no. 8, supplement, pp. 1636–1642, 2001.
  35. D. Matthaios, P. G. Foukas, M. Kefala et al., “γ-H2AX expression detected by immunohistochemistry correlates with prognosis in early operable non-small cell lung cancer,” OncoTargets and Therapy, vol. 5, pp. 309–314, 2012.