Noncontiguous Pattern Containment in Binary Trees
We consider the enumeration of binary trees containing noncontiguous binary tree patterns. First, we show that any two $n$-leaf binary trees are contained in the set of all $m$-leaf trees the same number of times. We give a functional equation for the multivariate generating function for the number of $n$-leaf trees containing a specified number of copies of any path tree, and we analyze tree patterns with at most 4 leaves. The paper concludes with implications for pattern containment in permutations.
1. Introduction
Pattern avoidance has been studied in a number of combinatorial objects including permutations, words, partitions, and graphs. In this paper, we consider such pattern questions in trees. Conceptually, tree $T$ avoids tree $t$ if there is no copy of $t$ anywhere inside $T$. Pattern avoidance in vertex-labeled trees has been studied in various contexts by Steyaert and Flajolet , Flajolet et al. , Flajolet and Sedgewick , and Dotsenko , while Khoroshkin and Piontkovski  considered generating functions for general unlabeled tree patterns in a different setting.
In 2010, Rowland  explored contiguous pattern avoidance in binary trees (i.e., rooted ordered trees in which each vertex has 0 or 2 children). He chose to work with binary trees because there is a natural bijection between $n$-leaf binary trees and $n$-vertex rooted ordered trees. In 2012, Gabriel et al.  considered Rowland’s definition of tree pattern in ternary, and more generally in $r$-ary, trees.
The patterns in [6, 7] may be seen as parallel to consecutive patterns in permutations. In those papers, tree $T$ was said to contain tree $t$ as a (contiguous) pattern if $t$ was a contiguous, rooted, ordered subtree of $T$. In 2012, Dairyko et al.  considered noncontiguous patterns in binary trees in order to introduce a tree pattern analogue of classical permutation patterns. In particular, they showed that for any $n$, any two $n$-leaf noncontiguous binary tree patterns are avoided by the same number of $m$-leaf trees, and they gave an explicit generating function for this enumeration.
In this paper, we follow the definition of tree pattern in  to mirror the idea of classical pattern avoidance in permutations. However, instead of focusing on trees that do not contain tree pattern $t$, we turn our attention to the number of trees with exactly $k$ copies of tree pattern $t$, making pattern avoidance the special case where $k = 0$. Ultimately, we study the total number of copies of a given tree pattern in the set of all $n$-leaf trees to mirror the work of Bóna in [9, 10], where he considers the total number of copies of a given permutation pattern of length 3 in the set of all 132-avoiding permutations of length $n$.
All trees in this paper are rooted and ordered. We will focus on full binary trees, that is, trees in which each vertex has 0 or 2 (ordered) children. Two children with a common parent are sibling vertices. A vertex with no children is a leaf, and a vertex with 2 children is an internal vertex. A binary tree with $n$ leaves has $n-1$ internal vertices, and the number of such trees is given by the $(n-1)$th Catalan number $C_{n-1} = \frac{1}{n}\binom{2n-2}{n-1}$ (OEIS A000108). For simplicity of computation, we adopt the convention that there are zero rooted binary trees with zero leaves. The first few binary trees are shown in Figure 1, with names that will be referred to throughout the paper.
2. Definitions and Notation
Tree $T$ contains $t$ as a (noncontiguous) tree pattern if $t$ can be obtained from $T$ via a finite sequence of edge contractions. Conversely, $T$ avoids $t$ if there is no sequence of edge contractions that produces $t$ from $T$. For example, consider the pattern and the two other trees shown in Figure 2. One of the two other trees avoids the pattern as a contiguous pattern, but contains it noncontiguously (contract both dashed edges). The remaining tree, on the other hand, avoids the pattern both contiguously and noncontiguously, since none of its vertices has a left child and a right child that are both internal vertices.
The definition of pattern in the previous paragraph is unambiguous for deciding the question “does $T$ contain $t$?” but becomes more complicated when determining “how many copies of $t$ are in $T$?” To remove ambiguity, we make the convention that if an edge between a parent vertex and a child vertex is contracted, then the edge from the parent to its other child must be contracted simultaneously.
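With this convention, copies can be counted recursively. The proofs of Proposition 7 and Theorem 9 below decompose them the same way. A copy of a pattern $t$ with subtrees $t_L, t_R$ in a tree $T$ with subtrees $T_L, T_R$ can take one of three forms:

- it lies entirely in $T_L$;
- it lies entirely in $T_R$;
- it uses the root of $T$ together with a copy of $t_L$ in $T_L$ and a copy of $t_R$ in $T_R$.

The Python sketch below is a brute-force illustration under that assumption, not the authors' code. Leaves are `None`, internal vertices are pairs, and the names `trees`, `copies`, and `f` are hypothetical.

```python
from functools import lru_cache

LEAF = None  # a leaf; an internal vertex is a pair (left_subtree, right_subtree)


@lru_cache(maxsize=None)
def trees(n):
    """All binary trees with n leaves (the empty tuple when n == 0)."""
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


@lru_cache(maxsize=None)
def copies(t, T):
    """Number of noncontiguous copies of pattern t in tree T."""
    if t is LEAF:
        return 1  # every tree contains exactly one copy of the one-leaf tree
    if T is LEAF:
        return 0  # a leaf has no internal vertex to play the role of t's root
    (t_left, t_right), (T_left, T_right) = t, T
    inside = copies(t, T_left) + copies(t, T_right)              # copies avoiding T's root
    at_root = copies(t_left, T_left) * copies(t_right, T_right)  # copies using T's root
    return inside + at_root


def f(t, k, n):
    """Number of n-leaf trees that contain exactly k copies of t."""
    return sum(1 for T in trees(n) if copies(t, T) == k)


three_leaf = ((LEAF, LEAF), LEAF)  # the root's left child is internal
print([f(three_leaf, k, 4) for k in range(4)])  # [1, 2, 1, 1]
```

For the 3-leaf tree whose root's left child is internal, the five 4-leaf trees contain 0, 1, 1, 2, and 3 copies respectively, which gives the printed distribution.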
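Define $f_{t,k}(n)$ to be the number of $n$-leaf binary trees that contain exactly $k$ copies of tree pattern $t$ noncontiguously. For any tree $T$, let $\#_t(T)$ be the number of copies of $t$ in $T$. We write $\mathcal{T}_n$ for the set of $n$-leaf binary trees and $\#_t(\mathcal{T}_n) = \sum_{T \in \mathcal{T}_n} \#_t(T)$. Further, let $|t|$ be the number of leaves of $t$. We are particularly interested in determining $f_{t,k}(n)$ for various choices of $t$, $k$, and $n$. To this end, we define
$$F_t(x,y) = \sum_{n\ge1}\sum_{k\ge0} f_{t,k}(n)\,x^n y^k = \sum_{T} x^{|T|}\,y^{\#_t(T)}.$$
In , the authors were concerned with pattern avoidance, so they focused on $F_t(x,0)$. They showed the following enumeration.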
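Theorem 1 (Dairyko et al. ). Let $\ell \ge 1$ and let $t$ be a binary tree pattern with $\ell$ leaves. Then
$$F_t(x,0) = \frac{x\,Q_{\ell-2}(x)}{Q_{\ell-1}(x)}, \qquad\text{where } Q_j(x) = \sum_{i=0}^{\lfloor j/2\rfloor} (-1)^i\binom{j-i}{i}x^i \text{ for } j \ge 0 \text{ and } Q_{-1}(x) = 0.$$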
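Corollary 2 (Dairyko et al. ). Fix $\ell \ge 1$. Let $s$ and $t$ be two $\ell$-leaf binary tree patterns. Then $F_s(x,0) = F_t(x,0)$; that is, $f_{s,0}(n) = f_{t,0}(n)$ for all $n \ge 1$.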
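Corollary 2 is easy to test numerically. Let $R_\ell$ be the $\ell$-leaf right comb. A tree avoids $R_\ell$ exactly when it is a leaf, or when its left subtree avoids $R_\ell$ and its right subtree avoids $R_{\ell-1}$. So the avoidance generating functions satisfy $A_\ell = x/(1-A_{\ell-1})$ with $A_1 = 0$. Unrolling this gives $A_\ell = xQ_{\ell-2}(x)/Q_{\ell-1}(x)$, where $Q_j(x) = \sum_i (-1)^i\binom{j-i}{i}x^i$.

The sketch below is an illustration with hypothetical helper names. It expands that ratio and compares it with brute-force avoider counts for every $\ell$-leaf pattern, using the same recursive copy count as a copy lying in the left subtree, in the right subtree, or through the root.

```python
from functools import lru_cache
from math import comb

LEAF = None  # a leaf; an internal vertex is a pair (left, right)


@lru_cache(maxsize=None)
def trees(n):
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


@lru_cache(maxsize=None)
def copies(t, T):
    if t is LEAF:
        return 1
    if T is LEAF:
        return 0
    (tl, tr), (TL, TR) = t, T
    return copies(t, TL) + copies(t, TR) + copies(tl, TL) * copies(tr, TR)


def Q(j):
    """Coefficients of Q_j(x) = sum_i (-1)^i C(j-i, i) x^i; Q_{-1} is the zero polynomial."""
    if j < 0:
        return []
    return [(-1) ** i * comb(j - i, i) for i in range(j // 2 + 1)]


def avoider_series(ell, terms):
    """Coefficients of x^0 .. x^(terms-1) in x * Q_{ell-2}(x) / Q_{ell-1}(x)."""
    num = [0] + Q(ell - 2)  # multiplying by x shifts the coefficients
    den = Q(ell - 1)        # constant term 1, so we can divide term by term
    out = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        c -= sum(den[i] * out[n - i] for i in range(1, min(n, len(den) - 1) + 1))
        out.append(c)
    return out


def brute_force_avoiders(t, n):
    return sum(1 for T in trees(n) if copies(t, T) == 0)


for ell in range(2, 6):
    expected = tuple(avoider_series(ell, 9)[1:])  # coefficients of x^1 .. x^8
    observed = {tuple(brute_force_avoiders(t, n) for n in range(1, 9)) for t in trees(ell)}
    print(ell, expected, observed == {expected})
```

Each printed line should end with `True`. Every $\ell$-leaf pattern yields the same avoider counts, matching the series; for $\ell = 4$ these are $1, 1, 2, 4, 8, \ldots$.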
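We obtain a parallel result to Corollary 2 if we focus on
$$\frac{\partial}{\partial y}F_t(x,y)\Big|_{y=1} = \sum_{n\ge1}\sum_{k\ge0} k\,f_{t,k}(n)\,x^n = \sum_{n\ge1}\#_t(\mathcal{T}_n)\,x^n.$$
We compute $\#_t(\mathcal{T}_m)$ for any tree $t$ and any $m \ge 1$ in Section 3. In Section 4, we find a functional equation for $F_t(x,y)$ for any path tree $t$ (i.e., any tree avoiding the 4-leaf tree in Figure 1 whose root has two internal children), and in Section 5 we consider $F_t(x,y)$ for any tree pattern $t$ with at most 4 leaves. Finally, in Section 6 we consider implications for pattern containment in permutations.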
3. Total Number of Copies
In this section, we compute $\#_t(\mathcal{T}_m)$, that is, the total number of occurrences of tree pattern $t$ in $\mathcal{T}_m$, for any tree pattern $t$ and any positive integer $m$. Theorem 3 is parallel to a result of Steyaert and Flajolet . They showed that the total number of occurrences of a (contiguous) $n$-leaf binary tree pattern in all $m$-leaf binary trees is independent of the tree pattern and is $\binom{2m-n}{m-n}$. As it turns out, for noncontiguous tree patterns, we also have the following.
Theorem 3. Fix $n \ge 1$. Let $s$ and $t$ be two $n$-leaf binary tree patterns. Then $\#_s(\mathcal{T}_m) = \#_t(\mathcal{T}_m)$ for all $m \ge 1$.
In the following argument we give a bijective proof of Theorem 3. Notice that this is a different approach from the proof of Theorem in , which relies on algebraic manipulation of recurrences and generating functions.
Since we are concerned with pairs of $n$-leaf binary trees, we make some definitions that allow us to compare $s$ and $t$ more precisely. First, the intersection $s \cap t$ of trees $s$ and $t$ is the largest contiguous rooted tree that is contained in both $s$ and $t$ and includes the root vertex. For example, Figure 3 shows two trees along with their intersection.
Two $n$-leaf trees whose intersection has exactly $n-1$ leaves are called neighboring trees. Thus, the two trees in Figure 3 are neighboring trees. On the other hand, the 4-leaf left comb and its left-right reflection are nonneighboring, since their intersection has only 2 leaves. By definition, if $s$ and $t$ are neighboring trees, then each of them has exactly two vertices that are not part of the intersection. Call the vertex on each tree that is the parent of the nonintersection vertices the breaking point. For example, in Figure 3, the breaking point of one tree is the left child of the left child of the root, and the breaking point of the other tree is the right child of the root. In fact, since both $s$’s breaking point and $t$’s breaking point are part of their intersection, we can identify both breaking points on either of the original trees or on their intersection.
Given neighboring trees $s$ and $t$, we define a map $\varphi$ from the set of copies of $s$ in $\mathcal{T}_m$ to the set of copies of $t$ in $\mathcal{T}_m$. If $s$ appears noncontiguously, we may still identify a (possibly noncontiguous) copy of $s \cap t$ using only edges from that copy of $s$. The breaking points along this copy of $s \cap t$ are then uniquely determined.
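Given a copy of $s$ in a particular $m$-leaf tree, find both breaking points on the intersection and swap the subtrees that have the breaking points as their roots. Both breaking points are distinct leaves of $s \cap t$, so these two subtrees are disjoint and the swap is well defined. We have now obtained an $m$-leaf tree with a unique copy of $t$ that has the same intersection with $s$ and the same breaking points. This copy of $t$ is the image of the original copy of $s$ under $\varphi$. Figure 4 shows a copy of $s$ being mapped to a copy of $t$ via $\varphi$.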
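Since $s$ and $t$ are neighboring trees, it is clear that $\varphi$ maps a copy of $s$ to a copy of $t$. Further, $\varphi$ is invertible, since it only swaps two well-defined subtrees and performing the same swap again recovers the original copy of $s$. Thus, $\varphi$ is a bijection from the set of copies of $s$ in $\mathcal{T}_m$ to the set of copies of $t$ in $\mathcal{T}_m$.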
The fact that $\varphi$ is a bijection shows that Theorem 3 is true when $s$ and $t$ are neighboring trees. To show that the theorem holds in general, we need the following lemma.
Lemma 4. Given two $n$-leaf trees $s$ and $t$, there is a finite sequence $s = u_0, u_1, \ldots, u_j = t$ of trees such that for any $i$ ($0 \le i < j$), $u_i$ and $u_{i+1}$ are neighboring trees.
Proof. Let $s$ and $t$ be nonneighboring $n$-leaf binary trees whose intersection is a $p$-leaf tree. Clearly, $p \ge 2$, since both trees share the root vertex and its two children.
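To obtain $u_1$ from $s$, remove a pair of leaves with a common parent from $s$ that are not in the intersection of $s$ and $t$, and attach them to a leaf of $s \cap t$ that is not a leaf of $t$. The new tree $u_1$ still has $n$ leaves and differs from $s$ only in the position of one pair of sibling leaves, so $s$ and $u_1$ are neighboring. Moreover, $u_1$ has a larger intersection with $t$. Repeat until the intersection has $n-1$ leaves.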
Since any two $n$-leaf tree patterns are a finite sequence of neighboring trees apart, composing the bijections $\varphi$ along such a sequence provides a bijection between all copies of $s$ in $\mathcal{T}_m$ and all copies of $t$ in $\mathcal{T}_m$, so Theorem 3 is true.
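Theorem 3 can also be confirmed by brute force for small sizes. This sketch is illustrative only and uses hypothetical names. It counts copies recursively (copies in the left subtree, in the right subtree, or through the root) and checks that all $n$-leaf patterns have the same total number of copies in $\mathcal{T}_m$ for $n \le 5$ and $m \le 8$.

```python
from functools import lru_cache

LEAF = None  # a leaf; an internal vertex is a pair (left, right)


@lru_cache(maxsize=None)
def trees(n):
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


@lru_cache(maxsize=None)
def copies(t, T):
    if t is LEAF:
        return 1
    if T is LEAF:
        return 0
    (tl, tr), (TL, TR) = t, T
    return copies(t, TL) + copies(t, TR) + copies(tl, TL) * copies(tr, TR)


def total_copies(t, m):
    """Total number of copies of t over all m-leaf trees."""
    return sum(copies(t, T) for T in trees(m))


def totals_agree(n, max_m):
    """True if all n-leaf patterns have equal totals in every T_m with m <= max_m."""
    return all(len({total_copies(t, m) for t in trees(n)}) == 1
               for m in range(1, max_m + 1))


print([totals_agree(n, 8) for n in range(1, 6)])  # [True, True, True, True, True]
```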
Theorem 3 also generalizes naturally to copies of $r$-ary tree patterns within the set of all $r$-ary trees with a given number of leaves. The definitions of intersection and breaking points remain unchanged, and the swapping action of $\varphi$ still applies. To find a sequence of neighboring trees, we need only move collections of $r$ vertices with a common parent instead of pairs, and the rest of the argument goes through as expected.
Now that we know that $\#_t(\mathcal{T}_m)$ is the same for all $n$-leaf trees $t$, we define $g_n(m) = \#_t(\mathcal{T}_m)$, where $t$ is any $n$-leaf tree, and compute $g_n(m)$ in general.
Our first proposition deals with the pathological case of $n = 1$.
Proposition 5. $g_1(m) = C_{m-1}$, where $C_{m-1}$ is the $(m-1)$th Catalan number.
Proof. There is exactly one way to contract all edges of a tree to produce the one-leaf tree. Since there is one copy of the one-leaf tree in any given binary tree and there are $C_{m-1}$ binary trees with $m$ leaves, we see that there are $C_{m-1}$ copies of the one-leaf tree in $\mathcal{T}_m$.
Proposition 6. $g_2(m) = (m-1)\,C_{m-1}$.
Proof. There is only one two-leaf tree, and the number of copies of this tree in tree $T$ is equal to the number of internal vertices of $T$, which is one less than the number of leaves of $T$. Since there are $m-1$ copies of the two-leaf tree in any $m$-leaf tree and there are $C_{m-1}$ $m$-leaf trees, $g_2(m) = (m-1)\,C_{m-1}$.
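More generally, we obtain the following recurrence for $g_n(m)$.
Proposition 7. For $n \ge 2$ and $m \ge 1$,
$$g_n(m) = 2\sum_{i=1}^{m-1} C_{i-1}\,g_n(m-i) + \sum_{i=1}^{m-1} g_1(i)\,g_{n-1}(m-i).$$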
Proof. Consider $n$-leaf tree pattern $t$ and a tree $T \in \mathcal{T}_m$. A copy of $t$ in $T$ can (a) be fully contained in the left subtree of $T$’s root, (b) be fully contained in the right subtree of $T$’s root, or (c) include $T$’s root.
For the first case, suppose that $T'$ is an $(m-i)$-leaf tree containing $t$. $T'$ appears $C_{i-1}$ times as the left subtree of some $m$-leaf tree in $\mathcal{T}_m$. Therefore, the number of times that $t$ is fully contained in a left subtree of an $m$-leaf tree in $\mathcal{T}_m$ is $\sum_{i=1}^{m-1} C_{i-1}\,g_n(m-i)$. The same sum also counts the number of times that $t$ is fully contained in a right subtree of an $m$-leaf tree in $\mathcal{T}_m$.
If a copy of $t$ includes the root of $T$, we must count copies of $t$’s left subtree to the left of the root and copies of $t$’s right subtree to the right of the root. By Theorem 3, we may assume that $t$ is the $n$-leaf right comb, that is, the unique $n$-leaf tree where every left child is a leaf. This means that the total number, over all $m$-leaf trees $T$, of copies of $t$ that include the root of $T$ is $\sum_{i=1}^{m-1} g_1(i)\,g_{n-1}(m-i)$. Here $g_1(i)$ counts copies of the 1-leaf left subtree addressed in Proposition 5, and $g_{n-1}(m-i)$ counts copies of the $(n-1)$-leaf right comb in the right subtree.
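The recurrence of Proposition 7 is straightforward to evaluate. In this Python sketch, `g(n, m)` is a hypothetical helper implementing Propositions 5 and 7. `g_closed` is an added cross-check, valid for $n \ge 3$ as written. It is a coefficient formula obtained by Lagrange inversion from the generating function $C(x)^n(1-4x)^{-(n-1)/2}$, with $C(x) = (1-\sqrt{1-4x})/2$, which solves the recurrence.

```python
from functools import lru_cache
from math import comb


def catalan(n):
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def g(n, m):
    """g_n(m) via Proposition 5 (n == 1) and Proposition 7 (n >= 2); zero when m < 1."""
    if m < 1:
        return 0
    if n == 1:
        return catalan(m - 1)
    inside = 2 * sum(catalan(i - 1) * g(n, m - i) for i in range(1, m))  # cases (a), (b)
    at_root = sum(g(1, i) * g(n - 1, m - i) for i in range(1, m))       # case (c), right comb
    return inside + at_root


def g_closed(n, m):
    """Coefficient of x^m in C(x)^n / (1-4x)^((n-1)/2), by Lagrange inversion (n >= 3)."""
    return sum(2 ** i * comb(n + i - 3, i) * comb(2 * m - n - i, m - n - i)
               for i in range(m - n + 1))


print([g(3, m) for m in range(1, 8)])  # [0, 0, 1, 7, 37, 176, 794]
```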
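Fix $n$, and let $G_n(x) = \sum_{m\ge1} g_n(m)\,x^m$. Then, using the recurrence of Proposition 7, we have that
$$G_n(x) = 2C(x)\,G_n(x) + G_1(x)\,G_{n-1}(x) \quad (n \ge 2), \qquad\text{where } C(x) = \sum_{m\ge1} C_{m-1}x^m.$$
We know from Proposition 5 that $G_1(x) = C(x) = \frac{1-\sqrt{1-4x}}{2}$, the generating function for the Catalan numbers. Since $1 - 2C(x) = \sqrt{1-4x}$, induction gives
$$G_n(x) = \frac{C(x)^n}{(1-4x)^{(n-1)/2}} = \frac{\left(1-\sqrt{1-4x}\right)^n}{2^n\,(1-4x)^{(n-1)/2}}.$$
Moreover, the following theorem enumerates all copies of a given $n$-leaf tree in $\mathcal{T}_m$ for any $m$.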
Theorem 8. Consider
Table 1 gives values of $g_n(m)$ for small $n$ and $m$. As expected, $g_n(m) = 0$ if $m < n$, and $g_n(n) = 1$. It also follows that, for each fixed $j \ge 0$, $g_n(n+j)$ is a polynomial in $n$ of degree $j$. Further, $g_1(m)$ and $g_2(m)$ were given above. The sequence $g_3(m)$ is entry A006419 in the Online Encyclopedia of Integer Sequences , which gives several other combinatorial interpretations. The sequences $g_n(m)$ for $n \ge 4$ appear to be new to the literature.
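To illustrate where the polynomial behavior comes from, coefficients of $G_n(x)$ can be extracted by Lagrange inversion. The derivation below is a sketch added for illustration, not necessarily the form in which Theorem 8 is stated. Write $C = C(x)$ for the Catalan generating function by leaves. The sketch uses three facts:

- $x = C(1-C)$;
- $G_1 = C$;
- the relation $G_n = \frac{C}{1-2C}\,G_{n-1}$, which follows from the recurrence for $g_n(m)$.

```latex
\begin{aligned}
G_n(x) &= \frac{C^n}{(1-2C)^{n-1}}, \qquad x = C(1-C) \;\Longleftrightarrow\; C = \frac{x}{1-C},\\
[x^m]\,H\bigl(C(x)\bigr) &= [u^m]\,H(u)\,\frac{1-2u}{(1-u)^{m+1}}
  \qquad\text{(Lagrange inversion with } \phi(u) = (1-u)^{-1}\text{)},\\
g_n(m) &= [u^{m-n}]\,(1-2u)^{-(n-2)}(1-u)^{-(m+1)}
  = \sum_{i=0}^{m-n} 2^i\binom{n+i-3}{i}\binom{2m-n-i}{m-n-i}
  \qquad \Bigl(n \ge 2,\ \tbinom{-1}{0} = 1\Bigr),\\
g_n(n+j) &= \sum_{i=0}^{j} 2^i\binom{n+i-3}{i}\binom{n+2j-i}{j-i},
  \qquad\text{e.g. } g_n(n+1) = (n+2) + 2(n-2) = 3n-2.
\end{aligned}
```

The summand with index $i$ is a polynomial in $n$ of degree $i + (j-i) = j$ with positive leading coefficient, so $g_n(n+j)$ has degree exactly $j$ in $n$.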
4. Pattern Containment of Path Trees
Now that we know $g_n(m) = \#_t(\mathcal{T}_m)$ for any $n$-leaf tree $t$, we turn our attention to computing $F_t(x,y)$ for particular tree patterns. In this section we give a functional equation for $F_t(x,y)$ for the case where $t$ is a path tree, that is, no vertex of $t$ has two internal children (equivalently, $t$ avoids the 4-leaf tree whose root has two internal children). Each $n$-leaf path tree can be encoded uniquely by a word in $\{L,R\}^{n-2}$. The two-leaf tree is encoded by the empty word. For $w \in \{L,R\}^{*}$, consider $Lw$; then $Lw$ encodes the tree whose root’s right child is a leaf and whose root’s left child is the root of the subtree encoded by $w$. Similarly, $Rw$ encodes the tree whose root’s left child is a leaf and whose root’s right child is the root of the subtree encoded by $w$. For a path tree $t$ whose encoding is $w_1w_2\cdots w_{n-2}$, the deletion $d(t)$ is the tree whose encoding is $w_2\cdots w_{n-2}$. Note that $d^{n-2}(t)$ is the 2-leaf tree for any $n$-leaf path tree $t$. Several iterations of the deletion map on a path tree are shown in Figure 5.
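The encoding and the deletion map are easy to mechanize. The sketch below uses hypothetical helper names and represents trees as nested pairs with `None` for a leaf. It builds the path tree for a word over $\{L, R\}$ and lists the chain $t, d(t), d^2(t), \ldots$. It also checks that the $2^{n-2}$ words of length $n-2$ give exactly the $n$-leaf trees in which no vertex has two internal children.

```python
from functools import lru_cache
from itertools import product

LEAF = None  # a leaf; an internal vertex is a pair (left, right)


def path_tree(word):
    """Path tree encoded by a word over {'L', 'R'}; the empty word is the 2-leaf tree."""
    if word == "":
        return (LEAF, LEAF)
    rest = path_tree(word[1:])
    # 'L': the left child roots the rest of the tree and the right child is a leaf; 'R' mirrors this
    return (rest, LEAF) if word[0] == "L" else (LEAF, rest)


def deletion_chain(word):
    """Encodings of t, d(t), d^2(t), ..., ending with the empty word (the 2-leaf tree)."""
    return [word[i:] for i in range(len(word) + 1)]


def leaves(T):
    return 1 if T is LEAF else leaves(T[0]) + leaves(T[1])


def is_path_tree(T):
    """True if no vertex of T has two internal children."""
    if T is LEAF:
        return True
    left, right = T
    if left is not LEAF and right is not LEAF:
        return False
    return is_path_tree(left) and is_path_tree(right)


@lru_cache(maxsize=None)
def trees(n):
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


print(deletion_chain("LRL"))  # ['LRL', 'RL', 'L', '']
```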
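Theorem 9. Given $n$-leaf path tree $t$ with $n \ge 2$, let $t_n = t$ and $t_{i-1} = d(t_i)$ for $3 \le i \le n$, so that $t_i$ has $i$ leaves and $t_2$ is the two-leaf tree. Let
$$F_t(x, y_2, \ldots, y_n) = \sum_{T} x^{|T|}\prod_{i=2}^{n} y_i^{\#_{t_i}(T)},$$
where the sum is over all binary trees $T$. Then
$$F_t(x, y_2, \ldots, y_n) = x + y_2\,F_t(x, a_2, \ldots, a_n)\,F_t(x, b_2, \ldots, b_n),$$
where $a_n = b_n = y_n$ and, for $2 \le i \le n-1$, $a_i = y_iy_{i+1}$ and $b_i = y_i$ if $t_i$ is the left subtree of $t_{i+1}$, and $a_i = y_i$ and $b_i = y_iy_{i+1}$ if $t_i$ is the right subtree of $t_{i+1}$.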
Observe that setting $y_2 = \cdots = y_{n-1} = 1$ causes every catalytic variable to drop out, leaving $F_t(x, 1, \ldots, 1, y) = F_t(x,y)$.
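Proof. In this generating function, the weight of a given tree $T$ is $x^{|T|}\prod_{i=2}^{n} y_i^{\#_{t_i}(T)}$. Clearly, for $|T| = 1$, the one-leaf tree has weight $x$, since it contains no copy of any $t_i$.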
Now, for other trees we see that each copy of some $t_i$ ($2 \le i \le n$) either (a) is contained entirely in the left subtree of $T$, (b) is contained entirely in the right subtree of $T$, or (c) includes the root of $T$. The weight-enumerator for copies of $t_i$ ($2 \le i \le n$) covered in cases (a) and (b) is $F_t(x, y_2, \ldots, y_n)^2$. For $3 \le i \le n$, if the word representation of $t_i$ begins with $L$, a copy of $t_i$ including the root consists of the root, the two edges emanating from the root, and a copy of $t_{i-1}$ in the left subtree of $T$. The contributions $a_{i-1} = y_{i-1}y_i$ keep track of copies of $t_i$ formed in this way. Similarly, the contributions $b_{i-1} = y_{i-1}y_i$ keep track of copies of $t_i$ that include the root of $T$ when $t_i$’s word representation begins with $R$. The factor $y_2$ keeps track of the copy of the two-leaf tree, $t_2$, that includes the root of $T$.
For example, if is the tree in Figure 5, we have
For even larger trees, we obtain even more complicated functional equations. These are hard to solve in general, but it is straightforward to extract initial terms from them by computer.
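Extracting initial terms can be done without symbolic algebra. Take $t_n = t$ and $t_{i-1} = d(t_i)$, and track the statistic vector $(\#_{t_2}(T), \ldots, \#_{t_n}(T))$ directly:

- the factor $y_2$ adds one copy of $t_2$ at the root;
- the substitution $a_{i-1}$ or $b_{i-1}$ adds the copies of $t_{i-1}$ in the corresponding subtree to the count for $t_i$.

This Python sketch is an illustration under those assumptions, with hypothetical names. It does exactly this and compares the resulting $f_{t,k}(m)$ with a brute-force count.

```python
from collections import Counter
from functools import lru_cache

LEAF = None  # a leaf; an internal vertex is a pair (left, right)


def path_tree(word):
    """Path tree encoded by a word over {'L', 'R'}; the empty word is the 2-leaf tree."""
    if word == "":
        return (LEAF, LEAF)
    rest = path_tree(word[1:])
    return (rest, LEAF) if word[0] == "L" else (LEAF, rest)


def initial_terms(word, max_m):
    """out[m] is a Counter mapping k to f_{t,k}(m), for the path tree t encoded by word."""
    n = len(word) + 2
    # t_i is encoded by word[n-i:]; left_sub[i] says whether t_{i-1} is the left subtree of t_i
    left_sub = {i: word[n - i] == "L" for i in range(3, n + 1)}
    # D[m] maps statistic vectors (#t_2, ..., #t_n) over m-leaf trees to multiplicities
    D = [None, Counter({(0,) * (n - 1): 1})]
    for m in range(2, max_m + 1):
        Dm = Counter()
        for i in range(1, m):
            for vl, a in D[i].items():
                for vr, b in D[m - i].items():
                    v = [vl[0] + vr[0] + 1]  # factor y_2: the root is one more copy of t_2
                    for j in range(3, n + 1):
                        # substitution a_{j-1} (left) or b_{j-1} (right): copies of t_{j-1}
                        # in that subtree extend through the root to copies of t_j
                        below = vl[j - 3] if left_sub[j] else vr[j - 3]
                        v.append(vl[j - 2] + vr[j - 2] + below)
                    Dm[tuple(v)] += a * b
        D.append(Dm)
    out = [None]
    for m in range(1, max_m + 1):
        marginal = Counter()
        for v, count in D[m].items():
            marginal[v[-1]] += count  # keep only #t_n, i.e. set y_2 = ... = y_{n-1} = 1
        out.append(marginal)
    return out


@lru_cache(maxsize=None)
def trees(n):
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


@lru_cache(maxsize=None)
def copies(t, T):
    if t is LEAF:
        return 1
    if T is LEAF:
        return 0
    (tl, tr), (TL, TR) = t, T
    return copies(t, TL) + copies(t, TR) + copies(tl, TL) * copies(tr, TR)


def brute_force(word, m):
    t = path_tree(word)
    return Counter(copies(t, T) for T in trees(m))


print(sorted(initial_terms("L", 5)[5].items()))
# [(0, 1), (1, 3), (2, 3), (3, 3), (4, 2), (5, 1), (6, 1)]
```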
For nonpath trees, the interaction of the left and right subtrees of $t$ makes this computation more tedious. Analysis of $F_t(x,y)$ for small path trees appears in the following section. A parallel argument holds for $r$-ary path tree containment, although it requires complicated notation for the $a_i$ and $b_i$ terms.
5. Pattern Containment of Small Trees
We have already seen that $f_{t,1}(n) = C_{n-1}$ when $t$ is the one-leaf tree, and $f_{t,k}(n) = 0$ if $k \ne 1$. Similarly, we know that when $t$ is the two-leaf tree and $k = n-1$, then $f_{t,k}(n) = C_{n-1}$, and $f_{t,k}(n) = 0$ if $k \ne n-1$. If $t'$ is the left-right reflection of tree $t$, then $f_{t,k}(n) = f_{t',k}(n)$ for any $k$ and $n$, since if tree $T$ contains $k$ copies of $t$, then its reflection $T'$ contains $k$ copies of $t'$. This means that we need only consider one three-leaf tree and three different four-leaf trees (all shown in Figure 1) to completely classify tree patterns with at most four leaves.
5.1. Containing a 3-Leaf Tree
Following the result of Theorem 9, we have that and .
Each of these formulas can be proved directly by case analysis. In general, the ordinary generating function $\sum_{n\ge1} f_{t,k}(n)\,x^n$ is rational with denominator $(1-x)^{k+1}$, but the numerator has increasingly many terms as $k$ increases.
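Take $t$ to be the 3-leaf tree whose root's left child is internal. A copy of $t$ that uses the root of $T$ is determined by an internal vertex of $T$'s left subtree. So $P_n(y) = \sum_{T\in\mathcal{T}_n} y^{\#_t(T)}$ satisfies $P_1(y) = 1$ and $P_n(y) = \sum_{i=1}^{n-1} y^{i-1}P_i(y)P_{n-i}(y)$.

The sketch below iterates this recursion. It is a computation added for illustration, not taken from the paper, and `three_leaf_distribution` is a hypothetical name. For $3 \le n \le 12$ it reproduces $f_{t,0}(n) = 1$, $f_{t,1}(n) = n-2$, and $f_{t,2}(n) = \binom{n-2}{2}$.

```python
from collections import Counter
from math import comb


def three_leaf_distribution(max_n):
    """P[n][k] = number of n-leaf trees with exactly k copies of ((., .), .).

    Uses P_n(y) = sum_{i=1}^{n-1} y^(i-1) P_i(y) P_{n-i}(y): a copy that uses the root of T
    pairs one of the i-1 internal vertices of the left subtree with the unique copy of the
    one-leaf tree in the right subtree.
    """
    P = [None, Counter({0: 1})]
    for n in range(2, max_n + 1):
        Pn = Counter()
        for i in range(1, n):
            for a, ca in P[i].items():
                for b, cb in P[n - i].items():
                    Pn[a + b + (i - 1)] += ca * cb
        P.append(Pn)
    return P


P = three_leaf_distribution(12)
print([P[n][1] for n in range(3, 9)])  # [1, 2, 3, 4, 5, 6]
print([P[n][2] for n in range(3, 9)])  # [0, 1, 3, 6, 10, 15]
```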
5.2. Containing a 4-Leaf Tree
4-leaf trees provide the first opportunity to consider trees with an equal number of leaves that are not reflections of one another. We must consider three different tree patterns for a complete analysis. Two of these three trees fall under the scope of Theorem 9.
For , we have the functional equation
for is new to the OEIS, but each of the sequences above is referenced as the number of 132-avoiding permutations of a given length containing a particular number of copies of the pattern 123. We will see more about this connection to pattern-avoiding permutations in Section 6.
For , we have the functional equation
Particular sequences are as follows.
(i) As expected, $f_{t,0}(n) = 2^{n-2}$ for $n \ge 2$.
(ii) for (OEIS A001792).
for is new to the OEIS. shows up in a number of combinatorial contexts from compositions to the game of Hex. Also, notice that when .
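The 4-leaf tree $t$ whose root has two internal children is not a path tree, and thus requires other techniques. If we consider the polynomial $P_n(y) = \sum_{T \in \mathcal{T}_n} y^{\#_t(T)}$, we obtain the recursion below:
$$P_1(y) = 1, \qquad P_n(y) = \sum_{i=1}^{n-1} y^{(i-1)(n-i-1)}\,P_i(y)\,P_{n-i}(y) \quad (n \ge 2).$$
Here, $i$ counts the number of leaves to the left of the root, $y^{(i-1)(n-i-1)}$ accounts for copies of $t$ including the root of $T$, and $P_i(y)$ (resp. $P_{n-i}(y)$) accounts for copies of $t$ entirely contained in the left (resp. right) subtree of $T$.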
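Particular sequences are as follows.
(i) As expected, $f_{t,0}(n) = 2^{n-2}$ for $n \ge 2$.
(ii) $f_{t,1}(n) = 2^{n-4}$ for $n \ge 4$ (ogf $\frac{x^4}{1-2x}$).
(iii) $f_{t,2}(n) = 2^{n-3}$ for $n \ge 5$ (ogf $\frac{4x^5}{1-2x}$).
(iv) $f_{t,3}(n) = 2^{n-4}$ for $n \ge 6$ (ogf $\frac{4x^6}{1-2x}$).
(v) $f_{t,4}(n)$ has ogf $\frac{6x^6 + 16x^7}{1-2x}$.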
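In fact, it is clear that for any fixed $k$ and sufficiently large $n$, $f_{t,k}(n) = 2f_{t,k}(n-1)$. This is because there are only finitely many ways to arrange exactly $k$ copies of $t$ before the only option is to take an $(n-1)$-leaf tree with $k$ copies of $t$ and make it either the left subtree or the right subtree of a new $n$-leaf tree whose other subtree is a single leaf. Concretely, suppose the root of an $n$-leaf tree has $i$ leaves to its left, with $2 \le i \le n-2$. Pairing one of the $i-1$ internal vertices on the left with one of the $n-i-1$ internal vertices on the right already gives $(i-1)(n-i-1) \ge n-3$ copies of $t$ that include the root, so the recurrence holds as soon as $n \ge k+4$. The numerators of the ordinary generating functions for $f_{t,k}(n)$ for fixed $k$ have increasingly many terms as $k$ grows larger.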
Larger nonpath trees introduce additional difficulties. For the 4-leaf tree $t$ above, counting copies of the left (resp. right) subtree of $t$ is equivalent to counting single internal vertices. Counting copies of a nonpath tree that include the root of $T$ is more complicated when either subtree of the pattern is larger.
We end this section with a conjecture. It is supported by further computational data, but settling this question in general remains an open problem.
Conjecture 10. for all and if and only if .
6. Connections to Pattern-Avoiding Permutations
Several sequences obtained by counting trees that contain noncontiguous binary tree patterns are already known in the literature for pattern-containing permutations. In this section, we make the relationship between trees and permutations explicit.
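To this end, let $S_n$ denote the set of permutations of length $n$. As in the introduction, given $\pi \in S_n$ and $\rho \in S_k$, we say that $\pi$ contains $\rho$ as a pattern if there exist indices $1 \le i_1 < i_2 < \cdots < i_k \le n$ such that $\pi_{i_a} < \pi_{i_b}$ if and only if $\rho_a < \rho_b$. Let $S_n(\rho) = \{\pi \in S_n : \pi \text{ avoids } \rho\}$ and $s_n(\rho) = |S_n(\rho)|$. For several patterns, $S_n(\rho_1, \ldots, \rho_j)$ and $s_n(\rho_1, \ldots, \rho_j)$ are defined analogously. For example, $s_n(12) = 1$ for $n \ge 1$, since the only way to avoid the pattern 12 is to be the decreasing permutation of length $n$. It is also well known that if $\rho \in S_3$, then $s_n(\rho) = C_n$, where $C_n$ is the $n$th Catalan number.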
The following theorem provides an initial relationship between pattern-avoiding trees and pattern-avoiding permutations that we seek to expand.
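Theorem 11 (Dairyko et al. ). Let $\ell \ge 2$ and let $t$ be any binary tree pattern with $\ell$ leaves. Then, for all $n \ge 1$, $f_{t,0}(n) = s_{n-1}(132, 12\cdots(\ell-1))$, the number of permutations of length $n-1$ that avoid both 132 and $12\cdots(\ell-1)$.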
In fact, a stronger statement is true. It is well known that the set of binary trees with $n$ leaves is in bijection with the set of permutations of length $n-1$ that avoid the pattern 132.
To see this, label the root of an $n$-leaf tree $T$ with the label $n-1$. Now, suppose that there are $a$ internal vertices to the right of the root and $b$ internal vertices to the left of the root (so $a + b = n-2$). The vertices on the right will receive labels from the set $\{1, \ldots, a\}$, and the vertices on the left will receive labels from the set $\{a+1, \ldots, a+b\}$. For each subtree, give the root the largest available label and continue recursively until each internal vertex has been labeled.
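Now, there is a natural left-to-right ordering of the vertices of $T$; in particular, for each vertex $v$, all vertices in $v$’s left subtree are to the left of $v$, and all vertices in $v$’s right subtree are to the right of $v$. Read the labels of the internal vertices from left to right to obtain a permutation $\pi_T$. Necessarily, $\pi_T$ avoids 132. Indeed, each vertex carries the largest label in its subtree, and every label in its left subtree is larger than every label in its right subtree. Hence if $a < b$ appear in this order in $\pi_T$, then $a$ lies in the left subtree of $b$. Every entry between them also lies in that subtree and is therefore smaller than $b$, so no 132 can occur.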
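The labeling and the left-to-right reading can be sketched as follows. This Python code is an illustration with hypothetical names, with trees as nested pairs and `None` for a leaf. It computes the permutation read from a tree recursively: the right subtree gets the smallest block of labels, the left subtree the next block, and the root the largest label. It then checks for $n \le 8$ that the map is injective, lands in the 132-avoiding permutations of length $n-1$, and reaches all of them.

```python
from functools import lru_cache
from itertools import combinations, permutations

LEAF = None  # a leaf; an internal vertex is a pair (left, right)


@lru_cache(maxsize=None)
def trees(n):
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


def internal(T):
    """Number of internal vertices of T."""
    return 0 if T is LEAF else 1 + internal(T[0]) + internal(T[1])


def to_permutation(T, offset=0):
    """Labels of T's internal vertices read left to right (in order).

    The root gets the largest label, the right subtree the smallest block of labels,
    and the left subtree the block in between; `offset` shifts the whole block.
    """
    if T is LEAF:
        return []
    left, right = T
    a, b = internal(right), internal(left)
    return (to_permutation(left, offset + a)   # labels offset+a+1 .. offset+a+b
            + [offset + a + b + 1]             # root: largest label in this subtree
            + to_permutation(right, offset))   # labels offset+1 .. offset+a


def contains_132(p):
    return any(p[i] < p[k] < p[j] for i, j, k in combinations(range(len(p)), 3))


T = ((LEAF, (LEAF, LEAF)), (LEAF, LEAF))
print(to_permutation(T))  # [3, 2, 4, 1]
```

The example tree is the one corresponding to 3241 in Figure 7: the 2 vertex is the right child of the 3 vertex, and the 4 vertex separates the descent 21.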
This correspondence between 132-avoiding permutations and binary trees is not new. If one ignores the leaves in our trees, the bijection given above is a symmetry of the correspondence between postorder-labeled trees and inorder-read permutations found in . Further work connecting permutations to binary trees in the context of sorting can be found in [13–18].
To make this result even stronger we turn to mesh patterns. Mesh patterns were introduced by Brändén and Claesson  in a search for more compact expressions for various permutation statistics. They were later generalized by Úlfarsson  to unify the results for permutation patterns used in characterizing Schubert varieties and in analyzing stack-sortability.
The graph of permutation $\pi = \pi_1\cdots\pi_n$ is obtained by plotting the points $(i, \pi_i)$ in the Cartesian plane. If $\pi$ contains $\rho \in S_k$ as a classical pattern, as defined above, then the graph of $\pi$ has $k$ rows and $k$ columns whose points appear in the same arrangement as the points in the graph of $\rho$.
The graph of a mesh pattern is the graph of a classical permutation with some squares in the graph shaded. For example, the graph of 132 and the graph of a mesh pattern with underlying permutation pattern 132 are shown in Figure 6. A copy of a mesh pattern is a copy of the underlying classical pattern but where no points appear in the shaded regions. A permutation is said to avoid a mesh pattern if it contains no copies of the mesh pattern. For example, the permutation 2413, whose graph is also shown in Figure 6, contains 132 as evidenced by the subsequence 243. However, 2413 avoids the mesh pattern shown because there is no copy of 132 where all gray regions are empty; in particular, the only copy of 132 is given by the subsequence 243, but the point for the digit 1 appears in the gray strip at the bottom of the mesh pattern.
The bijection above between binary trees and 132-avoiding permutations associates each tree with a classical permutation pattern in a natural way. However, sometimes the pattern corresponding to a particular tree may embed in a larger permutation without the tree pattern being embedded in the corresponding larger tree. For example, the permutations 3241, 3421, and 321 and their corresponding trees are shown in Figure 7. Notice that while 3241 contains a copy of 321, the corresponding tree does not contain a copy of the 4-leaf right comb. Also, while 3421 contains precisely 2 copies of the permutation pattern 321, the corresponding tree only contains one copy of the 4-leaf right comb. We repair this discrepancy by associating trees with mesh patterns.
The discrepancy between tree patterns and permutation patterns occurs precisely when pattern $\rho$ has a descent, that is, a pair of adjacent entries $\rho_i\rho_{i+1}$ such that $\rho_i > \rho_{i+1}$. A descent in a permutation pattern can be embedded in a tree in two ways: either one vertex is the right child (or right descendant) of the other, or one vertex is in the left subtree and the other in the right subtree of a third vertex. For example, in Figure 7, when 3241 contains the permutation pattern 321, the descent 32 embeds with the 2 vertex as right child of the 3 vertex, while the descent 21 embeds in two separate subtrees of the 4 vertex.
To prevent the split of a descent between two subtrees, we associate each tree pattern with a mesh permutation pattern in the following way.
(1) Given tree pattern $t$, compute $\pi_t$, the permutation given by the vertex-labeling bijection above.
(2) Construct the permutation graph of $\pi_t$.
(3) For each descent in $\pi_t$, shade all squares between and above the two points involved in the descent. Call the resulting mesh pattern $\mu_t$.
Now, copies of the mesh pattern $\mu_t$ associated with $t$ in the permutation $\pi_T$ read from a tree $T$ correspond precisely to copies of tree $t$ in $T$. This is because the possibility of splitting a descent between left and right subtrees, without using the root, is removed. Figure 8 shows this correspondence for the 4-leaf tree patterns, and Figure 9 shows the correspondence for an even larger tree pattern.
Now, using this map from tree patterns to mesh patterns we obtain the following stronger version of Theorem 11.
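Theorem 12. Let $\ell \ge 2$ and let $t$ be any binary tree pattern with $\ell$ leaves. Then $f_{t,0}(n) = s_{n-1}(132, \mu_t)$ for all $n \ge 1$. Here $\mu_t$ is the mesh pattern associated with $t$ above, and $s_{n-1}(132, \mu_t)$ is the number of permutations of length $n-1$ that avoid both 132 and $\mu_t$.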
In particular, for any integer $\ell \ge 2$, this restatement gives a set of Wilf-equivalent pattern sets of the form $\{132, \mu_t\}$, one for each $\ell$-leaf tree $t$. Furthermore, the increasing pattern $12\cdots(\ell-1)$ corresponding to the $\ell$-leaf left comb has no descents, so each of these pattern pairs is equivalent to the classical pattern pair $\{132, 12\cdots(\ell-1)\}$.
We also obtain a stronger statement for pattern containment once we augment our current notation for permutation patterns. Because of the bijection between trees and 132-avoiding permutations, we are concerned with permutations in $S_n(132)$. Now, let $s_{n,k}(132; \rho)$ denote the number of permutations in $S_n(132)$ that contain exactly $k$ copies of $\rho$. We saw above that $f_{t,0}(n) = s_{n-1,0}(132; \mu_t)$. In fact, the correspondence given above yields the following result, of which Theorem 11 is a special case.
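Theorem 13. Let $\ell \ge 2$ and let $t$ be any binary tree pattern with $\ell$ leaves. Then $f_{t,k}(n) = s_{n-1,k}(132; \mu_t)$ for all $k \ge 0$ and $n \ge 1$.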
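Because of this correspondence, $f_{t,k}(n)$ counts 132-avoiding permutations of length $n-1$ with exactly $k$ copies of 12 when $t$ is the 3-leaf left comb, counts 132-avoiding permutations of length $n-1$ with exactly $k$ copies of 123 when $t$ is the 4-leaf left comb, and so on.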
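The increasing case can be checked directly. An increasing pattern has no descents, so its mesh pattern has no shaded squares. The sketch below is illustrative, and its names are hypothetical. For the $\ell$-leaf left comb with $\ell \in \{2, 3, 4\}$ and $n \le 8$, it compares two distributions: the number of copies of the comb over $n$-leaf trees, and the number of copies of $12\cdots(\ell-1)$ over 132-avoiding permutations of length $n-1$.

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations, permutations

LEAF = None  # a leaf; an internal vertex is a pair (left, right)


@lru_cache(maxsize=None)
def trees(n):
    if n == 1:
        return (LEAF,)
    return tuple((l, r) for i in range(1, n) for l in trees(i) for r in trees(n - i))


@lru_cache(maxsize=None)
def copies(t, T):
    if t is LEAF:
        return 1
    if T is LEAF:
        return 0
    (tl, tr), (TL, TR) = t, T
    return copies(t, TL) + copies(t, TR) + copies(tl, TL) * copies(tr, TR)


def left_comb(ell):
    """The ell-leaf tree in which every right child is a leaf."""
    t = LEAF
    for _ in range(ell - 1):
        t = (t, LEAF)
    return t


def contains_132(p):
    return any(p[i] < p[k] < p[j] for i, j, k in combinations(range(len(p)), 3))


def increasing_copies(p, r):
    """Number of copies of the increasing pattern 12...r in p."""
    return sum(1 for c in combinations(p, r) if all(x < y for x, y in zip(c, c[1:])))


def tree_side(ell, n):
    t = left_comb(ell)
    return Counter(copies(t, T) for T in trees(n))


def permutation_side(ell, n):
    avoiders = [p for p in permutations(range(1, n)) if not contains_132(p)]
    return Counter(increasing_copies(p, ell - 1) for p in avoiders)


print(tree_side(4, 6) == permutation_side(4, 6))  # True
```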
Further, Theorem 3 causes us to revisit the question of the total number of copies of a given pattern within the set of all length permutations. To this end, let be the number of copies of pattern in .
Corollary 14. Given an integer , there exist mesh patterns for which .
This corollary provides a hidden symmetry to Bóna’s result in that there is a mesh pattern associated with each 4-leaf tree for which .
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
Research was supported by the National Science Foundation (Grant no. NSF DMS-0851721).
P. Flajolet, P. Sipala, and J. M. Steyaert, “Analytic variations on the common subexpression problem,” in Automata, Languages and Programming, vol. 443 of Lecture Notes in Computer Science, pp. 220–234, Springer, Berlin, Germany, 1990.
S. Even, Graph Algorithms, Computer Science Press, Woodland Hills, Calif, USA, 1979.
D. E. Knuth, The Art of Computer Programming: Sorting and Searching, vol. 3, Addison-Wesley, Reading, Mass, USA, 1973.
R. P. Stanley, Enumerative Combinatorics, vol. 2, Cambridge University Press, Cambridge, UK, 1999.