The Scientific World Journal

Volume 2018, Article ID 1916094, 7 pages

https://doi.org/10.1155/2018/1916094

## The Probabilities of Trees and Cladograms under Ford’s $\alpha$-Model

Balearic Islands Health Research Institute (IdISBa) and Department of Mathematics and Computer Science, University of the Balearic Islands, 07122 Palma, Spain

Correspondence should be addressed to Francesc Rosselló; cesc.rossello@uib.es

Received 1 February 2018; Accepted 8 March 2018; Published 18 April 2018

Academic Editor: Béla Tóthmérész

Copyright © 2018 Tomás M. Coronado et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Ford’s $\alpha$-model is one of the most popular random parametric models of bifurcating phylogenetic tree growth, having as specific instances both the uniform and the Yule models. Its general properties have been used to study the behavior of phylogenetic tree shape indices under the probability distribution it defines. But the explicit formulas provided by Ford for the probabilities of unlabeled trees and phylogenetic trees fail in some cases. In this paper we give correct explicit formulas for these probabilities.

#### 1. Introduction

The study of random growth models of rooted phylogenetic trees and the statistical properties of the shapes of the phylogenetic trees they produce was initiated almost one century ago by Yule [1] and it has gained momentum in the last 20 years: see, for instance, [2–8]. The final goal of this line of research is to understand the relationship between the forces that drive evolution and the topological properties of “real-life” phylogenetic trees [3, 9]; see also [10, Chapter 33]. One of the most popular such models is Ford’s $\alpha$-model for rooted bifurcating phylogenetic trees, or *cladograms* [4]. It is a parametric model that generalizes both the uniform model (where new leaves are added equiprobably to any arc, giving rise to the uniform probability distribution on the sets of cladograms with a fixed set of taxa) and Yule’s model (where new leaves are added equiprobably only to *pendant* arcs, i.e., to arcs ending in leaves). It does so by allocating a possibly different probability to the addition of new leaves to pendant arcs and to internal arcs; this probability depends on a parameter $\alpha$, hence the name “$\alpha$-model”.
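The growth step just described can be made concrete with a small sketch. It assumes the weights of Ford's usual formulation, which this passage does not spell out: each of the $n$ pendant arcs carries weight $1-\alpha$ and each of the $n-1$ internal positions (the $n-2$ internal arcs plus one attachment above the root) carries weight $\alpha$. The function name `insertion_probabilities` is hypothetical and is not taken from the authors' SageMath module.

```python
from fractions import Fraction


def insertion_probabilities(n, alpha):
    """Return (p_pendant, p_internal): the probability that the new leaf is
    attached to one given pendant arc, resp. one given internal position,
    of a cladogram with n leaves.

    Assumption: weight 1 - alpha per pendant arc and alpha per internal
    position, with the attachment above the root counted as internal.
    """
    alpha = Fraction(alpha)
    if n < 2 or not 0 <= alpha <= 1:
        raise ValueError("need n >= 2 and 0 <= alpha <= 1")
    # n pendant arcs and n - 1 internal positions; total weight is n - alpha.
    total = n * (1 - alpha) + (n - 1) * alpha
    return (1 - alpha) / total, alpha / total


# alpha = 0: only pendant arcs can receive the new leaf (Yule's model).
yule = insertion_probabilities(4, 0)
# alpha = 1/2: all 2n - 1 positions are equally likely (uniform model).
uniform = insertion_probabilities(4, Fraction(1, 2))
```

Under these assumptions, $\alpha = 0$ puts all the mass on pendant arcs and $\alpha = 1/2$ makes every position equally likely, which matches the two special cases named in the text.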

When models like Ford’s model are used to contrast topological properties of phylogenetic trees contained in databases like TreeBase (https://treebase.org), only their general properties (moments, asymptotic behavior) are employed. But, in the course of research in which we needed to compute the probabilities of several specific cladograms under this model [11], we noticed that the explicit formulas that Ford gives in [4, ] for the probabilities of cladograms and of *tree shapes* (unlabeled rooted bifurcating trees) are wrong, failing for some trees with leaves; see Propositions and in [4], with the definition of given in page 30 therein, for Ford’s formulas.

So, to help future users of Ford’s model, in this paper we give the correct explicit formulas for these probabilities. This paper is accompanied by the GitHub page https://github.com/biocom-uib/prob-alpha, where the interested reader can find a SageMath [12] module to compute these probabilities, together with their explicit values on the sets of cladograms with $n$ leaves labeled $1,\ldots,n$, for every $n$ from 2 to 8.

#### 2. Preliminaries

##### 2.1. Definitions, Notations, and Conventions

Throughout this paper, by a *tree* $T$, we mean a rooted bifurcating tree. As is customary, we understand $T$ as a directed graph, with its arcs pointing away from the root, which we shall denote by $r_T$. Then, all nodes in $T$ have out-degree either 0 (its *leaves*, which form the set $L(T)$) or 2 (its *internal nodes*, which form the set $V_{int}(T)$). The *children* of an internal node $v$ are those nodes $u$ such that $(v,u)$ is an arc in $T$, and they form the set $\mathrm{child}(v)$. A node $x$ is a *descendant* of a node $v$ when there exists a directed path from $v$ to $x$ in $T$. For every node $v$, the *subtree* $T_v$ *of* $T$ *rooted at* $v$ is the subgraph of $T$ induced on the set of descendants of $v$.
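As an illustration of these definitions, the following minimal sketch stores a rooted bifurcating tree as a dictionary from each node to the tuple of its children. The node names and the helper names (`leaves`, `internal_nodes`, `descendants`, `subtree`) are made up for the example, and a node is assumed to count as a descendant of itself (a directed path of length 0), so that the subtree rooted at a node contains that node.

```python
# Hypothetical example tree: each node maps to the tuple of its children.
# Leaves map to the empty tuple; internal nodes have exactly two children.
children = {
    "r": ("a", "b"),
    "a": ("1", "2"),
    "b": ("3", "c"),
    "c": ("4", "5"),
    "1": (), "2": (), "3": (), "4": (), "5": (),
}


def leaves(tree):
    """Nodes of out-degree 0."""
    return {v for v, ch in tree.items() if len(ch) == 0}


def internal_nodes(tree):
    """Nodes of out-degree 2."""
    return {v for v, ch in tree.items() if len(ch) == 2}


def descendants(tree, v):
    """Nodes reachable from v along arcs (v itself included, by assumption)."""
    seen, stack = set(), [v]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(tree[x])
    return seen


def subtree(tree, v):
    """Subgraph induced on the descendants of v."""
    return {x: tree[x] for x in descendants(tree, v)}
```

For instance, `subtree(children, "b")` keeps only the nodes `b`, `3`, `c`, `4`, and `5`, with the arcs among them.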

A tree $T$ is *ordered* when it is endowed with an *ordering* $\prec_v$ on every set $\mathrm{child}(v)$. A *cladogram* (resp., an *ordered cladogram*) on a set $\Sigma$ of taxa is a tree (resp., an ordered tree) with its leaves bijectively labeled in $\Sigma$. Whenever we want to stress the fact that a tree is not a cladogram, that is, it is an unlabeled tree, we shall use the term *tree shape*.

It is important to point out that although ordered trees have no practical interest from the phylogenetic point of view, because the orderings on the sets of children of internal nodes do not carry any phylogenetic information, they are useful from the mathematical point of view, because the existence of the orderings allows one to easily prove certain extra properties that can later be translated to the unordered setting (cf. Proposition 1).

An *isomorphism* of ordered trees is an isomorphism of rooted trees that moreover preserves these orderings. An *isomorphism* of cladograms (resp., of ordered cladograms) is an isomorphism of trees (resp., of ordered trees) that preserves the leaves’ labels. We shall always identify a tree shape, an ordered tree shape, a cladogram, or an ordered cladogram, with its isomorphism class, and in particular we shall make henceforth the abuse of language of saying that two of these objects, $T_1, T_2$, *are the same*, in symbols $T_1 = T_2$, when they are (only) isomorphic. We shall denote by $\mathcal{T}^*_n$ and $\mathcal{OT}^*_n$, respectively, the sets of tree shapes and of ordered tree shapes with $n$ leaves. Given any finite set of taxa $\Sigma$, we shall denote by $\mathcal{T}_\Sigma$ and $\mathcal{OT}_\Sigma$, respectively, the sets of cladograms and of ordered cladograms on $\Sigma$. When the specific set $\Sigma$ is irrelevant and only its cardinality matters, we shall write $\mathcal{T}_n$ and $\mathcal{OT}_n$ (with $n = |\Sigma|$) instead of $\mathcal{T}_\Sigma$ and $\mathcal{OT}_\Sigma$, and then we shall understand that $\Sigma$ is $[n] = \{1,\ldots,n\}$.
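The identification of a tree with its isomorphism class can be illustrated with a canonical encoding. In the sketch below, an ordered cladogram is written as nested pairs whose leaves are labels; sorting the encodings of the two children at every internal node discards the ordering, and replacing every label by a placeholder discards the labels. The function name `canon` and the string encoding are choices made for this example only, and labels are assumed not to contain parentheses or commas.

```python
def canon(t, keep_labels=True):
    """Canonical string of a tree given as nested pairs.

    Two inputs get the same string exactly when they are isomorphic as
    unordered cladograms (keep_labels=True) or as unordered tree shapes
    (keep_labels=False).
    """
    if not isinstance(t, tuple):
        # A leaf: keep its label, or replace it by a placeholder.
        return str(t) if keep_labels else "*"
    left, right = (canon(c, keep_labels) for c in t)
    # Sorting the two child encodings forgets the order of the children.
    return "(" + ",".join(sorted((left, right))) + ")"


t1 = ((1, 2), (3, 4))
t2 = ((4, 3), (2, 1))    # same cladogram as t1, children listed in another order
t3 = ((1, 3), (2, 4))    # same shape as t1, but a different cladogram
t4 = (((1, 2), 3), 4)    # a different shape (a caterpillar)
```

Here `t1` and `t2` receive the same encoding, `t1` and `t3` differ as cladograms but coincide once labels are dropped, and `t4` differs from all of them even as a tree shape.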

There exist natural isomorphism-preserving forgetful mappings