BioMed Research International

Volume 2016, Article ID 7534258, 9 pages

http://dx.doi.org/10.1155/2016/7534258

## A Metric on the Space of Partly Reduced Phylogenetic Networks

School of Computer Science, Inner Mongolia University, Hohhot 010021, China

Received 30 March 2016; Accepted 23 May 2016

Academic Editor: Dariusz Mrozek

Copyright © 2016 Juan Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of evolutionary events acting at the population level, such as recombination between genes, hybridization between lineages, and horizontal gene transfer. Researchers have designed several measures for computing the dissimilarity between two phylogenetic networks, and each measure has been proven to be a metric on a particular class of phylogenetic networks. However, none of the existing measures is a metric on the space of partly reduced phylogenetic networks. In this paper, we provide a polynomial-time computable metric on the space of partly reduced phylogenetic networks.

#### 1. Introduction

Phylogenies reveal the history of evolutionary events of a group of species, and they are central to comparative analysis methods for testing hypotheses in evolutionary biology [1]. Computing the distance between a pair of phylogenies is very important for understanding the evolutionary history of species.

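A metric $d$ on a space $S$ satisfies four properties for all $a, b, c \in S$: (I) $d(a, b) \geq 0$ (nonnegative property); (II) $d(a, b) = 0$ if and only if $a = b$ (separation property); (III) $d(a, b) = d(b, a)$ (symmetry property); (IV) $d(a, c) \leq d(a, b) + d(b, c)$ (triangle inequality).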
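The four properties can be checked by brute force on a finite sample of points. The following is a minimal sketch, not part of the original paper; the function name `is_metric` and the two example dissimilarities are illustrative assumptions.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the four metric properties of d on the finite collection `points`."""
    for a, b in product(points, repeat=2):
        if d(a, b) < -tol:                      # (I) nonnegative property
            return False
        if (abs(d(a, b)) <= tol) != (a == b):   # (II) separation property
            return False
        if abs(d(a, b) - d(b, a)) > tol:        # (III) symmetry property
            return False
    for a, b, c in product(points, repeat=3):
        if d(a, c) > d(a, b) + d(b, c) + tol:   # (IV) triangle inequality
            return False
    return True


# Hypothetical examples: the size of the symmetric difference of two sets
# is a metric, while the squared difference of two numbers is not.
sample_sets = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({2, 3})]
sym_diff = lambda a, b: len(a ^ b)
squared = lambda a, b: (a - b) ** 2

print(is_metric(sample_sets, sym_diff))
print(is_metric([0, 1, 2], squared))
```

The squared difference fails only on property (IV): the value for the pair (0, 2) is 4, which exceeds the sum 1 + 1 obtained by going through 1. Passing such a check on a sample does not prove that a measure is a metric on the whole space.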

Phylogenetic networks can represent reticulate evolutionary events, such as recombination between genes, hybridization between lineages, and horizontal gene transfer [2–5]. For the comparison of phylogenetic networks, there are several metrics on restricted subclasses of networks, including the tripartition metric on the space of tree-child phylogenetic networks [6–9], the $\mu$-distance on the space of tree-sibling phylogenetic networks [10], and the $m$-distance on the space of reduced phylogenetic networks [11]. Later the $m$-distance was also proved to be a metric on the space of tree-child phylogenetic networks, semibinary tree-sibling time consistent phylogenetic networks, and multilabeled phylogenetic trees [12–15].

For any rooted phylogenetic network $N$, we can obtain its reduced version by removing from $N$ all nodes in maximal convergent sets (defined in Section 2) and all nodes with indegree 1 and outdegree 1. The reduced versions of all rooted phylogenetic networks form the space of reduced phylogenetic networks (the $m$-distance, defined by Nakhleh, is a metric on this space). In this paper, we discuss the partly reduced version of a phylogenetic network, obtained by removing from the network the nodes in only a part of the convergent sets and all nodes with indegree 1 and outdegree 1. The partly reduced versions of all rooted phylogenetic networks form the space of partly reduced phylogenetic networks. We then introduce a novel metric on this space. The space is not the space of all rooted phylogenetic networks, but it is the largest space on which a polynomial-time computable metric has been defined so far. The papers [16, 17] have proved that the isomorphism problem for rooted phylogenetic networks is graph isomorphism-complete. Unless the graph isomorphism problem belongs to P, there is no hope of defining a polynomial-time computable metric on the space of all rooted phylogenetic networks. The aim of this paper is therefore to find a larger space on which a polynomial-time computable metric can be defined, so that the space is closer to the space of all rooted phylogenetic networks.

#### 2. Preliminaries

Let $N = (V, E)$ be a directed acyclic graph, or DAG for short. We denote the indegree of a node $u$ as $\mathrm{indeg}(u)$ and the outdegree of $u$ as $\mathrm{outdeg}(u)$. We will say that a node $u$ is a *tree node* if $\mathrm{indeg}(u) \leq 1$. In particular, $u$ is a *root* of $N$ if $\mathrm{indeg}(u) = 0$. If a single root exists, we will say that the DAG is *rooted*. We will say that a node $u$ is a *reticulate node* if $\mathrm{indeg}(u) \geq 2$. A tree node $u$ is a *leaf* if $\mathrm{outdeg}(u) = 0$. A node $u$ is called an *internal node* if $\mathrm{outdeg}(u) \geq 1$. For a DAG $N = (V, E)$, we will say that $v$ is a *child* of $u$ if $(u, v) \in E$; in this case, we will also say that $u$ is a *parent* of $v$. Note that any tree node has a single parent, except for the root of the graph. Whenever there is a directed path from a node $u$ to $v$, we will say that $v$ is a *descendant* of $u$ or $u$ is an *ancestor* of $v$.
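As an illustration, the node types can be read off the indegrees and outdegrees of an edge list. This is a minimal sketch under the usual conventions (tree node: indegree at most 1; reticulate node: indegree at least 2; leaf: tree node with outdegree 0; internal node: outdegree at least 1). The function name `classify_nodes` and the small example network are assumptions made for illustration and do not come from the paper.

```python
def classify_nodes(edges):
    """Classify the nodes of a DAG given as a list of (parent, child) pairs."""
    nodes = {n for edge in edges for n in edge}
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return {
        "roots": {n for n in nodes if indeg[n] == 0},
        "tree_nodes": {n for n in nodes if indeg[n] <= 1},
        "reticulate_nodes": {n for n in nodes if indeg[n] >= 2},
        "leaves": {n for n in nodes if indeg[n] <= 1 and outdeg[n] == 0},
        "internal_nodes": {n for n in nodes if outdeg[n] >= 1},
    }


# Hypothetical network: root r, tree nodes a and b, one reticulate node h,
# and three leaves "1", "2", "3".
example_edges = [
    ("r", "a"), ("r", "b"),
    ("a", "h"), ("b", "h"),
    ("a", "1"), ("h", "2"), ("b", "3"),
]
kinds = classify_nodes(example_edges)
print(kinds["roots"], kinds["reticulate_nodes"], kinds["leaves"])
```

Here `h` is the only node with two parents, so it is the only reticulate node, and the DAG is rooted because `r` is its single node of indegree 0.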

The *height* of a node is the length of a longest path starting at the node and ending in a leaf. The absence of cycles implies that the nodes of a DAG can be stratified by means of their heights: the nodes of height 0 are the leaves; if a node has height $m > 0$, then all its children have heights that are smaller than $m$ and at least one of them has height exactly $m - 1$.

The *depth* of a node is the length of a longest path starting at the root and ending in the node. Similarly, the absence of cycles implies that the nodes of a DAG can also be stratified according to their depths: the node of depth 0 is the root; if a node has depth $m > 0$, then all its parents have depths that are smaller than $m$ and at least one of them has depth exactly $m - 1$.
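Both stratifications follow directly from the recursive characterizations: a leaf has height 0 and any other node is one level above its highest child, while the root has depth 0 and any other node is one level below its deepest parent. A minimal sketch, in which the function name `heights_and_depths` and the example network are illustrative assumptions:

```python
from functools import lru_cache


def heights_and_depths(edges):
    """Return (height, depth) dictionaries for a rooted DAG given as (parent, child) pairs."""
    children, parents = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])
        parents.setdefault(v, []).append(u)
        parents.setdefault(u, [])

    @lru_cache(maxsize=None)
    def height(n):
        # Longest path from n down to a leaf.
        return 0 if not children[n] else 1 + max(height(c) for c in children[n])

    @lru_cache(maxsize=None)
    def depth(n):
        # Longest path from the root down to n.
        return 0 if not parents[n] else 1 + max(depth(p) for p in parents[n])

    return {n: height(n) for n in children}, {n: depth(n) for n in children}


# Hypothetical network with one reticulate node h.
example_edges = [
    ("r", "a"), ("r", "b"),
    ("a", "h"), ("b", "h"),
    ("a", "1"), ("h", "2"), ("b", "3"),
]
height, depth = heights_and_depths(example_edges)
print(height["r"], depth["2"])
```

The recursion is memoized, so each node is evaluated once; for very deep networks an iterative traversal in topological order would avoid Python's recursion limit.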

Let $X$ be a set of taxa. A rooted phylogenetic network on $X$ is a rooted DAG such that (i) no tree node has outdegree 1; (ii) its leaves are labeled by $X$ by a bijective mapping $f$.

We use the notation $N = (V, E, f)$ (or $N$) for the rooted phylogenetic network and the notation $V_L$ for its leaf set.
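The conditions in this definition can be verified mechanically. The sketch below is an illustration only: the function name `is_rooted_phylogenetic_network`, the representation of the labeling as a dictionary, and the example data are assumptions. It checks that the graph has a single root, is acyclic (using Kahn's algorithm), has no tree node of outdegree 1, and labels its leaves bijectively by the taxa.

```python
from collections import deque


def is_rooted_phylogenetic_network(edges, labels, taxa):
    """edges: (parent, child) pairs; labels: dict leaf -> taxon; taxa: set of taxa."""
    nodes = {n for edge in edges for n in edge}
    indeg = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1

    # Rooted: exactly one node of indegree 0.
    if sum(1 for n in nodes if indeg[n] == 0) != 1:
        return False

    # Acyclic: Kahn's algorithm must be able to remove every node.
    remaining = dict(indeg)
    queue = deque(n for n in nodes if remaining[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in children[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    if removed != len(nodes):
        return False

    # (i) No tree node (indegree <= 1) has outdegree 1.
    if any(indeg[n] <= 1 and len(children[n]) == 1 for n in nodes):
        return False

    # (ii) The leaves are labeled bijectively by the taxa.
    leaves = {n for n in nodes if not children[n]}
    return set(labels) == leaves and sorted(labels.values()) == sorted(taxa)


# Hypothetical examples: a small network with one reticulate node, and a
# path whose root is a tree node of outdegree 1.
good_edges = [
    ("r", "a"), ("r", "b"),
    ("a", "h"), ("b", "h"),
    ("a", "1"), ("h", "2"), ("b", "3"),
]
good_labels = {"1": "x1", "2": "x2", "3": "x3"}
bad_edges = [("r", "a"), ("a", "1")]

print(is_rooted_phylogenetic_network(good_edges, good_labels, {"x1", "x2", "x3"}))
print(is_rooted_phylogenetic_network(bad_edges, {"1": "x1"}, {"x1"}))
```

Note that the reticulate node `h` has outdegree 1, which condition (i) permits because it restricts tree nodes only.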

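*Definition 1.* Two rooted phylogenetic networks $N_1 = (V_1, E_1, f_1)$ and $N_2 = (V_2, E_2, f_2)$ are isomorphic if and only if there is a bijection $H$ from $V_1$ to $V_2$ such that (i) $(u, v)$ is an edge in $E_1$ if and only if $(H(u), H(v))$ is an edge in $E_2$; (ii) $f_1(w) = f_2(H(w))$ for every leaf $w$ of $N_1$.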
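For very small networks, Definition 1 can be tested by exhaustive search: the leaf labels force the image of every leaf, and the remaining nodes are tried in every possible order. The sketch below is an exponential-time illustration only and is not the method of this paper; the function name `are_isomorphic` and the example networks are assumptions, and both labelings are assumed to be bijective.

```python
from itertools import permutations


def are_isomorphic(edges1, labels1, edges2, labels2):
    """Brute-force test of Definition 1; labels map each leaf to its taxon."""
    e1, e2 = set(edges1), set(edges2)
    nodes1 = sorted({n for edge in e1 for n in edge})
    nodes2 = sorted({n for edge in e2 for n in edge})
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return False

    # Condition (ii) pins every leaf of the first network to one leaf of the second.
    leaf_by_label2 = {taxon: leaf for leaf, taxon in labels2.items()}
    if set(labels1.values()) != set(leaf_by_label2):
        return False
    fixed = {leaf: leaf_by_label2[taxon] for leaf, taxon in labels1.items()}

    free1 = [n for n in nodes1 if n not in fixed]
    free2 = [n for n in nodes2 if n not in labels2]
    for image in permutations(free2):
        h = dict(fixed)
        h.update(zip(free1, image))
        # Condition (i): with equal edge counts and h a bijection, it is
        # enough that every edge of the first network maps to an edge of the second.
        if all((h[u], h[v]) in e2 for u, v in e1):
            return True
    return False


# Hypothetical examples: net2 renames the internal nodes of net1; net3 has
# the shape of net1 but a different taxon below the reticulate node.
net1 = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("a", "1"), ("h", "2"), ("b", "3")]
lab1 = {"1": "x1", "2": "x2", "3": "x3"}
net2 = [("p", "q"), ("p", "s"), ("q", "t"), ("s", "t"),
        ("q", "u"), ("t", "v"), ("s", "w")]
lab2 = {"u": "x1", "v": "x2", "w": "x3"}
lab3 = {"1": "x2", "2": "x1", "3": "x3"}

print(are_isomorphic(net1, lab1, net2, lab2))
print(are_isomorphic(net1, lab1, net1, lab3))
```

The second comparison fails because the leaf labeled `x2` hangs below the reticulate node in one network and below a tree node in the other, so no label-preserving bijection can preserve the edges.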

Moret et al. (2004) discussed the concept of reduced phylogenetic networks from a reconstruction standpoint. Below, we briefly review the concept of reduced phylogenetic networks and introduce a new definition of partly reduced phylogenetic networks. In the following section, we present a metric on the space of all partly reduced phylogenetic networks. First we review the concept of a maximal convergent set, which has been given in [7, 11].

*Definition 2.* Given a network $N$, we say that a set $U$ of internal nodes in $N$ is convergent if $|U| \geq 2$ and every leaf reachable from some node in $U$ is reachable from all nodes in $U$. If there is no convergent set containing $U$ except itself, we say that $U$ is a maximal convergent set.

Here the leaf set reachable from the nodes in a convergent set $U$ is called the leaf set of $U$.
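One way to read Definition 2 is that a set of at least two internal nodes is convergent exactly when all of its nodes reach the same set of leaves. Under that reading, the maximal convergent sets are the groups of internal nodes sharing a reachable leaf set, restricted to groups of size at least 2. The sketch below follows this observation; the function name `maximal_convergent_sets` and the example network are illustrative assumptions, not taken from the paper.

```python
def maximal_convergent_sets(edges):
    """Map each shared leaf set to its maximal convergent set of internal nodes."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])

    reachable = {}

    def leaves_below(n):
        # Set of leaves reachable from n (memoized over the DAG).
        if n not in reachable:
            if not children[n]:
                reachable[n] = frozenset([n])
            else:
                reachable[n] = frozenset().union(
                    *(leaves_below(c) for c in children[n])
                )
        return reachable[n]

    groups = {}
    for n in children:
        if children[n]:  # internal nodes only
            groups.setdefault(leaves_below(n), set()).add(n)
    return {leaf_set: group for leaf_set, group in groups.items() if len(group) >= 2}


# Hypothetical network: a and b both reach exactly the leaves "1" and "2",
# while the root r also reaches leaf "3".
example_edges = [
    ("r", "a"), ("r", "b"), ("r", "3"),
    ("a", "h1"), ("a", "h2"), ("b", "h1"), ("b", "h2"),
    ("h1", "1"), ("h2", "2"),
]
convergent = maximal_convergent_sets(example_edges)
print(convergent)
```

In this example the only maximal convergent set is `{a, b}`, and its leaf set is `{"1", "2"}`; the nodes `h1` and `h2` each reach a single distinct leaf and so belong to no convergent set.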

We will take Figure 1 as an example in the following. The two networks , on are adapted from refinements () and () in Table in [11].