Advances in Bioinformatics
Volume 2009, Article ID 235320, 9 pages
Research Article

Tree-Based Methods for Discovery of Association between Flow Cytometry Data and Clinical Endpoints

1Division of Biostatistics, University of Massachusetts, Amherst, MA 01003, USA
2Immunology Program, Wistar Institute, Philadelphia, PA 19104, USA
3Clinical HIV Research Unit, University of Witwatersrand, Johannesburg, South Africa
4Department of Hematology and Molecular Medicine, National Health Laboratory Service and University of Witwatersrand, Johannesburg, South Africa

Received 19 May 2009; Revised 14 August 2009; Accepted 12 October 2009

Academic Editor: George Luta

Copyright © 2009 M. Eliot et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


We demonstrate the application and comparative interpretation of three tree-based algorithms for the analysis of data arising from flow cytometry: classification and regression trees (CARTs), random forests (RFs), and logic regression (LR). Specifically, we consider the question of what best predicts CD4 T-cell recovery in HIV-1-infected persons starting antiretroviral therapy with a CD4 count between 200 and 350 cells/μL. A comparison to a more standard contingency table analysis is provided. While contingency table analysis and RFs provide information on the importance of each potential predictor variable, CART and LR offer additional insight into the combinations of variables that together are predictive of the outcome. In all cases considered, baseline CD3-DR-CD56+CD16+ emerges as an important predictor variable, while the tree-based approaches identify additional variables as potentially informative. Application of tree-based methods to our data suggests that a combination of baseline immune activation states, with emphasis on CD8 T-cell activation, may be a better predictor than any single T-cell/innate cell subset analyzed. Taken together, our results show that tree-based methods can be successfully applied to flow cytometry data to discover associations that may not emerge in a univariate analysis.