Abstract

Bioinformatics, a relatively young discipline, has grown up in a world of high-throughput, large-volume data that demands automated analysis. In response, the discipline has developed strategies for finding patterns in an environment with a low signal-to-noise ratio. While processing large amounts of information and extracting hypotheses from it is both laudable and inescapable, the pressures that such requirements introduce can lead to shortcuts and misapprehensions. This is particularly true of assumptions about the evolutionary theories that are implicitly invoked by the algorithms used in analysis pipelines. The classic example is the misuse of the term ‘homologous’ to mean ‘similar’ or even ‘functionally similar’, rather than the correct definition of ‘having the same evolutionary origin’, which may or may not imply similarity of function. In this review, we outline some common phylogenetic questions, viewed from a bioinformatics perspective, that can be better addressed with a deeper understanding of evolutionary principles, and we show, with examples from the amidohydrolase and Toll families, that quite different conclusions can be drawn when such an understanding is applied. The review focuses on the importance of the underlying evolutionary biology rather than on assessing the merits of different phylogenetic techniques. We also discuss the relative merits of a priori and a posteriori inclusion of biological information.