The Distribution of the Size of the Union of Cycles for Two Types of Random Permutations

Lengyel, Tamás

doi:https://doi.org/10.1155/2010/751861

International Journal of Combinatorics


Research Article | Open Access

Volume 2010 | Article ID 751861 | https://doi.org/10.1155/2010/751861


Academic Editor: Johannes Hattingh

Received 28 Aug 2010

Accepted 05 Oct 2010

Published 24 Oct 2010

Abstract

We discuss some problems and permutation statistics involving two different types of random permutations. Under the usual model of random permutations, we prove that the shifted coverage of the elements of $\{1, 2, \dots, k\}$ of a random permutation over $\{1, 2, \dots, n\}$, that is, the size of the union of the cycles containing these elements, excluding these elements themselves, follows a negative hypergeometric distribution. This fact gives a probabilistic model for the coverage via the canonical cycle representation. For a different random model, we determine some random permutation statistics regarding the problem of the lost boarding pass and its variations.

1. Introduction

Permutation statistics play an important role in exploring structural properties of random phenomena from a statistical point of view, and they often do so in the context of the analysis of algorithmic applications. We select a permutation $\pi$ of $\{1, 2, \dots, n\}$ at random with $\pi(i)$ denoting the element to which $i$ gets mapped in the permutation. Sometimes, the mapping is also denoted by $i \to \pi(i)$. Each permutation can be uniquely expressed as the product of disjoint cycles. A cycle is an ordered subset of the permutation whose elements cyclically trade places with one another; for example, the permutation 3241 is the product of the (134) 3-cycle and the (2) 1-cycle. Here, the notation (134) means that starting from the original ordering 1234, the first element is mapped into the third, the third into the fourth, and the fourth into the first, while the second stays unchanged.
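As a small illustration of the cycle decomposition described above, the following sketch splits a permutation given in one-line notation into its disjoint cycles. The function name `cycles_of` and the list encoding (entry `perm[i - 1]` is the image of `i`) are choices made here for illustration, not notation from the paper.

```python
def cycles_of(perm):
    """Return the disjoint cycles of a permutation of 1..n.

    perm is in one-line notation: perm[i - 1] is the image of i.
    Each cycle is listed starting from its smallest element.
    """
    n = len(perm)
    seen = [False] * (n + 1)
    cycles = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycle = []
        x = start
        while not seen[x]:          # follow start -> perm(start) -> ...
            seen[x] = True
            cycle.append(x)
            x = perm[x - 1]
        cycles.append(tuple(cycle))
    return cycles


# The example from the text: 3241 is the product of (134) and (2).
print(cycles_of([3, 2, 4, 1]))  # [(1, 3, 4), (2,)]
```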

Let $C_i$ denote the set of elements in the cycle that contains element $i$ of a random permutation $\pi$ of $\{1, 2, \dots, n\}$; that is, $C_i = \{i, \pi(i), \pi^2(i), \dots\}$. We also use the notation $C_S$ for the (set) union of cycles containing all of the elements in $S$ (thus, in fact, $C_i$ is a shortcut for $C_{\{i\}}$). Sometimes, we write $C_i$ in its cycle form when we want to emphasize the actual cyclical order in the cycle and not only its membership.

In the standard model, all permutations are assumed to be equally likely among the permitted permutations. If all the permutations are permitted, then one of the most useful tools in analyzing these permutations is a correspondence known as the canonical cycle representation (although often an alternative version is referred to as canonical cycle form): in the cycle product form of the permutation, write each cycle such that it ends with its least element and order these cycles so that these last elements of the cycles are in increasing order. This correspondence does not seem to match any real life situation. (Had we ordered the cycles in decreasing order or in increasing order but according to their largest elements, then this process models how new records are achieved.) Note that one might benefit from the canonical cycle representation even if the permutations are not equally likely (cf. proof of Theorem 3.9).
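The canonical cycle representation can be sketched in a few lines. The helper names `to_canonical` and `from_canonical` are hypothetical; the code assumes the convention stated above (each cycle ends with its least element, and cycles are ordered by increasing last element). It also shows why the correspondence is a bijection: reading the sequence from left to right, a cycle closes exactly at an entry that is smaller than everything to its right.

```python
def to_canonical(perm):
    """Canonical cycle representation of perm (one-line notation, 1..n)."""
    seq, seen = [], set()
    for least in range(1, len(perm) + 1):   # least unseen element = cycle minimum
        if least in seen:
            continue
        cycle, x = [], perm[least - 1]
        while x != least:                   # walk the cycle, minimum goes last
            cycle.append(x)
            x = perm[x - 1]
        cycle.append(least)
        seen.update(cycle)
        seq.extend(cycle)
    return seq


def from_canonical(seq):
    """Inverse map: recover the permutation from its canonical sequence."""
    perm = [0] * len(seq)
    cycle_start = 0
    for i, x in enumerate(seq):
        if all(x < y for y in seq[i + 1:]):  # x closes the current cycle
            cyc = seq[cycle_start:i + 1]
            for a, b in zip(cyc, cyc[1:] + cyc[:1]):
                perm[a - 1] = b              # a is mapped to its successor
            cycle_start = i + 1
    return perm


print(to_canonical([3, 2, 4, 1]))    # [3, 4, 1, 2]
print(from_canonical([3, 4, 1, 2]))  # [3, 2, 4, 1]
```

Since every sequence of $1, \dots, n$ arises from exactly one permutation, a uniformly random permutation corresponds to a uniformly random sequence, which is what the proofs in Section 2 rely on.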

Except for Corollary 2.7, membership problems are well known for the standard model (cf. [1], Section 3). We prove that the size of the union of the cycles containing any $k$ prescribed elements, excluding these elements themselves, follows a negative hypergeometric distribution. For completeness, we also include the well-known results and their short standard proofs in Section 2.

Note that in the second random permutation model, the cycle structure is simple with one nontrivial cycle. The distribution of its size is determined in Remark 3.4 and Theorem 3.6. The number of valid permutations for a generalized version is given in Theorem 3.9.

Theorems 2.3, 2.4, 3.5, 3.6, 3.9, and Corollary 2.7, as well as the distributional observation given in identity (3.5) and its consequences in the second solution of the lost boarding pass problem seem to be new. We also provide a straightforward proof of Theorem 3.2. Identity (3.3) also appeared in [2].

We note that [3] contains a great selection of results regarding random permutation statistics by using tools of analytic combinatorics mainly applied to bivariate generating functions.

Apparently, the second model has the flavor of a statistical resampling method whose purpose is to exchange labels on data points when performing significance tests.

2. The Union of Cycles with Given Membership and Its Cardinality

Now, we discuss membership issues by using the canonical cycle form.

Theorem 2.1. We have that

$$P(i \in C_j) = \frac{1}{2} \tag{2.1}$$

with $i \neq j$.

Proof. Without loss of generality, we answer the question for $i = 2$ and $j = 1$. We use the canonical cycle form defined in Section 1: the event $2 \in C_1$ is related to the position of 2 relative to that of 1. In fact, 2 belongs to the cycle of element 1 exactly if 2 comes before 1, whose probability is 1/2.

The previous question on $P(2 \in C_1)$ has an equivalent form $P(C_1 = C_2)$. We can generalize this to $k$ cycles about shared membership.

Theorem 2.2. For $2 \le k \le n$, we have that $P(C_1 = C_2 = \cdots = C_k) = 1/k$, and the probability remains the same for any $k$ distinct indices.

Proof. Similarly to the proof of Theorem 2.1, the elements $2, 3, \dots$, and $k$ belong to the cycle of element 1 exactly if $2, 3, \dots$, and $k$ each comes before 1; that is, $P(C_1 = C_2 = \cdots = C_k)$ is the probability that in a random permutation 1 is the last of these $k$ numbers, which is $1/k$. The probability is not affected by the choice of the indices by symmetry.
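The shared-membership probability can be checked by brute force for small $n$. The sketch below enumerates all permutations and counts those in which $1, 2, \dots, k$ lie in one cycle; it assumes the value $1/k$ for the probability (the chance that 1 is the last of $k$ numbers in a random order), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def cycle_of(perm, i):
    """Set of elements in the cycle of i; perm[j - 1] is the image of j."""
    members = {i}
    x = perm[i - 1]
    while x != i:
        members.add(x)
        x = perm[x - 1]
    return members


def shared_cycle_probability(n, k):
    """Exact probability that 1..k share a cycle in a uniform permutation of 1..n."""
    hits = total = 0
    wanted = set(range(1, k + 1))
    for perm in permutations(range(1, n + 1)):
        total += 1
        if wanted <= cycle_of(perm, 1):
            hits += 1
    return Fraction(hits, total)


print(shared_cycle_probability(6, 3))  # 1/3
```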

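Now we find, in probabilistic terms, how much of $\{1, 2, \dots, n\}$ is covered by taking the union of the cycles that include any $k$ given elements, for example, $1, 2, \dots, k$.

Theorem 2.3. For all $1 \le k \le m \le n$ we get that

$$P\left(\left|C_{\{1,2,\dots,k\}}\right| = m\right) = \frac{k}{m} \cdot \frac{\binom{n-k}{m-k}}{\binom{n}{m}} = \frac{\binom{m-1}{k-1}}{\binom{n}{k}}. \tag{2.2}$$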

With some calculations, we get the expected size and the variance of the size of the coverage in

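Theorem 2.4. We have that

$$E\left|C_{\{1,2,\dots,k\}}\right| = \frac{k(n+1)}{k+1}, \qquad \operatorname{Var}\left|C_{\{1,2,\dots,k\}}\right| = \frac{k(n-k)(n+1)}{(k+1)^2(k+2)}.$$

For any fixed $k$, as $n \to \infty$, we get that

$$E\left|C_{\{1,2,\dots,k\}}\right| \sim \frac{k}{k+1}\, n, \qquad \operatorname{Var}\left|C_{\{1,2,\dots,k\}}\right| \sim \frac{k}{(k+1)^2(k+2)}\, n^2.$$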

Remark 2.5. The case with $k = 1$ is trivial since 1 can go to $n$ equally likely positions in the canonical representation. With the indicator variables $X_i$, $X_i = 1$ exactly if $i$ is in the cycle $C_1$, we have $E|C_1| = \sum_{i=1}^{n} E X_i = 1 + (n-1)/2$, since $X_1 = 1$ and $P(X_i = 1) = 1/2$ if $i \ge 2$ by Theorem 2.1, and hence $E|C_1| = (n+1)/2$. Note that the $X_i$s are not independent since $P(X_i = 1, X_j = 1) = 1/3 \neq 1/4$, for $2 \le i < j \le n$, by Theorem 2.2. Indeed, as we have just mentioned, $|C_1|$ has uniform distribution rather than binomial.

Remark 2.6. Another interesting special case deals with $m = n$, that is, determining the probability that the combined cycles covering $k$ preselected elements cover the whole set $\{1, 2, \dots, n\}$. Surprisingly, the most likely number of elements covered is $n$ for $k \ge 2$, as we will see in (2.5) of the proof below.

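Proof of Theorem 2.3. We use the canonical cycle form again: we can characterize all permutations with the required property. We start with the case in which $m = n$ (cf. Remark 2.6). The elements $1, 2, \dots, k$ cover the whole set exactly if the last element of the permutation in the correspondence is not greater than $k$, which has a chance of $k/n$.
In the general case, if the union consists of $m$ elements, then these fill the first $m$ positions of the canonical form, and the probability that the last of them is not greater than $k$ is $k/m$ by the previous case. The set of elements in these $m$ positions can be chosen in $\binom{n}{m}$ ways, out of which $\binom{n-k}{m-k}$ will have the $k$ smallest elements in the union, and this completes the proof of the first part of the identity (2.2).
In order to prove the second part of the theorem, we note that, for $k \ge 2$, the distribution of the random variable $|C_{\{1,2,\dots,k\}}|$ is, perhaps somewhat surprisingly, tail heavy on the right; that is, $P(|C_{\{1,2,\dots,k\}}| = m)$ is getting larger as $m$ is getting closer to $n$. We observe that

$$\frac{P\left(\left|C_{\{1,2,\dots,k\}}\right| = m+1\right)}{P\left(\left|C_{\{1,2,\dots,k\}}\right| = m\right)} = \frac{m}{m-k+1}, \quad k \le m \le n-1. \tag{2.5}$$

For $k = 1$, we get that the ratio is 1, and the coverage follows the uniform distribution over $\{1, 2, \dots, n\}$. For $k \ge 2$, the ratio exceeds 1. It is easy to see that $P(|C_{\{1,2\}}| = m)$ is a linear function in $m$, and in general, $P(|C_{\{1,2,\dots,k\}}| = m)$ as a function of $m$ over the integers is a polynomial of degree $k - 1$ by (2.5). More precisely, it follows by (2.5) and $P(|C_{\{1,2,\dots,k\}}| = n) = k/n$ that

$$P\left(\left|C_{\{1,2,\dots,k\}}\right| = m\right) = \frac{\binom{m-1}{k-1}}{\binom{n}{k}}, \quad k \le m \le n. \tag{2.7}$$

(This fact also guarantees that the sequence $P(|C_{\{1,2,\dots,k\}}| = m)$, $m = k, k+1, \dots, n$, is strictly concave up for $k \ge 3$.)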

It follows that the distribution of the shifted coverage belongs to the family of negative hypergeometric distributions. This observation gives a probabilistic model for the coverage via the canonical cycle representation of permutations.

Corollary 2.7. For the shifted coverage, one has that $|C_{\{1,2,\dots,k\}}| - k \sim$ NegativeHypergeometric[$k$, $n-k$; $k$].

Now, we present an alternative direct proof of the corollary without using (2.7).

Proof of Corollary 2.7. We have $k$ black and $n - k$ white balls. We select balls without replacement until all $k$ black balls have been selected, and, at this point, let $X$ denote the number of white balls selected. Clearly, $X \sim$ NegativeHypergeometric[$k$, $n-k$; $k$]; that is, $P(X = x) = \binom{x+k-1}{k-1} / \binom{n}{k}$ for $0 \le x \le n - k$. With $m = x + k$, we get that $P(X + k = m) = \binom{m-1}{k-1} / \binom{n}{k}$, for the distribution of the total number of selected balls (including the black ones). Note that this probability remains the same if we distinguish the balls of the same color; for example, individually label the black and white balls in two separate groups, since it introduces an extra factor $k!\,(n-k)!$ in both the numerator and denominator. We can find a correspondence between the black and white balls, and the sets $\{1, 2, \dots, k\}$ and $\{k+1, k+2, \dots, n\}$, respectively. First, we index the black balls from 1 to $k$ and the white balls from $k + 1$ to $n$. We set $\pi(1)$ to be the index of the first ball. We also record the color of the ball in every step. If the index is different from 1, then we go on and pick the next ball whose index is to be assigned to $\pi(\pi(1))$. We keep picking elements until $\pi^{j}(1) = 1$ for some $j \ge 1$. Let $i$ be the smallest index in $\{1, 2, \dots, k\}$ not in $C_1$. If such an $i$ exists, then we set $\pi(i)$ to be the index of the next ball and keep continuing until the cycle $C_i$ is completed. We set $i'$ to be the smallest index in $\{1, 2, \dots, k\}$ not in $C_1 \cup C_i$. If such an $i'$ exists, then we set $\pi(i')$ to be the index of the next ball and keep continuing until $C_{i'}$ is completed, and so forth. The process goes on until no such index exists. Note that the coverage of $\{1, 2, \dots, k\}$ consists of $k$ black balls and $X$ white balls.
For $k = 1$, $X$ is the number of elements of the first cycle in the canonical cycle representation before 1 is included; that is, it is the number of white balls before the only black ball is selected. For a general $k$, $X$ counts the white balls before the set $\{1, 2, \dots, k\}$ is completely included.
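The coverage distribution can be verified exhaustively for small $n$. The sketch below computes the exact distribution of the size of the union of the cycles containing $1, \dots, k$ and compares it with $\binom{m-1}{k-1}/\binom{n}{k}$, the form assumed here for the negative hypergeometric law of Corollary 2.7 shifted by $k$. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def coverage_size(perm, k):
    """Size of the union of the cycles of perm that contain 1..k."""
    covered = set()
    for i in range(1, k + 1):
        x = i
        while x not in covered:   # walk the whole cycle of i once
            covered.add(x)
            x = perm[x - 1]
    return len(covered)


def coverage_distribution(n, k):
    """Exact distribution of the coverage under the uniform model."""
    counts = Counter(coverage_size(p, k) for p in permutations(range(1, n + 1)))
    return {m: Fraction(c, factorial(n)) for m, c in sorted(counts.items())}


n, k = 6, 2
for m, prob in coverage_distribution(n, k).items():
    print(m, prob, Fraction(comb(m - 1, k - 1), comb(n, k)))
```

For $k = 2$ the printed probabilities grow linearly in $m$, so the most likely coverage is the whole set, in line with Remark 2.6.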

3. Some New Results Regarding the Lost Boarding Pass

Now, we discuss a popular problem and some of its variations involving random permutations of a different type. We answer questions similar to those addressed in Section 2 and derive some other properties regarding certain generalizations.

The problem of the lost boarding pass (see [4, page 35]). One hundred people line up to board an airplane, but the first has lost their boarding pass and takes a random seat instead. Each subsequent passenger takes their assigned seat if available, otherwise a random unoccupied seat. What is the probability that the last passenger to board finds their seat occupied?

We assume that the passengers and the seats are ordered and the $i$th passenger takes the seat $\pi(i)$. In our original standard model of random permutations, each passenger would randomly pick their seat while here the $i$th passenger's assigned seat is $i$ but then the first one picks their seat at random, out of necessity. The other passengers make random choices only if their seats are occupied. In a more general context, if $k$ stands for the number of passengers who have lost their boarding passes, then the original model corresponds to $k = n$, while here $k = 1$ with the understanding that the first passenger lost their pass. Remarks 3.3 and 3.7 below make it clear that it is irrelevant which of the first $n - 1$ passengers lost their pass. Of course, if only the last passenger lost their boarding pass, then even they will be properly seated. On the other hand, for larger $k$, the actual $k$-element set $S$ of passengers without boarding passes does matter. Note that if the last passenger finds their seat occupied, then they end up at a seat assigned to one of the passengers in $S$. In summary, this model can be viewed as a generalization of the original one with $k = n$. We focus on the case with $k = 1$, and thus $S = \{1\}$.

Remark 3.1. By (2.1), we would get that $P(n \in C_1) = 1/2$. However, it turns out that the random permutation model used for establishing (2.1) does not work here. In fact, $P(i \in C_1)$ changes with $i$, for example, $P(2 \in C_1) = 1/n$ and $P(n \in C_1) = 1/2$ as we will see below in the solution (when we calculate $p_i$). The structure of the cycles is also different from the above random model, and the resulting permutation is far from being typical: it has a long cycle (of asymptotic average length $\ln n$, cf. Remark 3.4), and otherwise, only one-element cycles (fixed points). It is easy to see that there are $2^{n-1}$ such permutations for $k = 1$, since we have $\binom{n-1}{j-1}$ ways to construct the nontrivial cycle with size $j$. In a similar fashion, half of these permutations have the last passenger finding their seat occupied. Note, however, that the permutations occur with very different probabilities depending on not only the size of $C_1$ but its membership according to Theorem 3.5 (notwithstanding, it also gives the answer 0.50 to the problem of the lost boarding pass by the fact that (3.3) has a last factor of one exactly if the last passenger finds their seat already occupied, and thus, making $P(C_1 = (i_1 \cdots i_{j-1}\, n)) = P(C_1 = (i_1 \cdots i_{j-1}))$).

We find the answer by presenting two approaches. There are other solutions using recurrence relations or other techniques. The “definitive” solution is given in [4, pages -] although it is of no help with respect to distributional concerns. (Winkler's solution requires no calculations and is based on conditioning with two competing targets. It also shows that references to the actual permutation can be completely dropped.) Regarding the case $k \ge 2$, we also mention an interesting result by Misra [2] and determine the number of valid permutations.

The first approach is based on finding the probability $p_i$ that passenger $i$ finds their seat occupied. We note that the cycle of 1 is quite special: it looks like $(i_1 i_2 \cdots i_j)$ with $i_1 = 1$, $\pi(i_l) = i_{l+1}$ for $1 \le l < j$, $\pi(i_j) = 1$, and $i_1 < i_2 < \cdots < i_j$; thus, the elements of the only nontrivial cycle are in a monotone increasing order. (We note that, assuming that a little argument erupts whenever a passenger with a boarding pass finds their seat occupied, the first passenger can figure out their original seat assignment since their seat is taken by the last upset person. The distribution of the waiting time for obtaining this information is implicitly discussed in the second solution since the last upset person is passenger $i_j$, and the probability that none of the passengers get upset is $1/n$.) One might ask about the chances that the second, the third, and so forth, passengers will find their own seats occupied rather than simply the last one. With $p_i = P(i \in C_1)$, we get that $p_2 = 1/n$, $p_3 = 1/(n-1)$, and $p_4 = 1/(n-2)$. With the notation $i \to j$ meaning that passenger $i$ takes the seat of passenger $j$:

$$p_2 = P(1 \to 2) = \frac{1}{n}$$

and

$$p_3 = P(1 \to 3) + P(1 \to 2)\, P(2 \to 3 \mid 1 \to 2) = \frac{1}{n} + \frac{1}{n} \cdot \frac{1}{n-1} = \frac{1}{n-1}.$$

In a similar way, $p_4 = \frac{1}{n}\left(1 + \frac{1}{n-1}\right)\left(1 + \frac{1}{n-2}\right) = \frac{1}{n-2}$. More complicated calculations lead to $p_5 = 1/(n-3)$. Indeed, in general, we encounter exactly the various terms in the expansion of $\frac{1}{n} \prod_{m=n-i+2}^{n-1} \left(1 + \frac{1}{m}\right)$ by properly conditioning with respect to all seat selections leading to $i$. We have just proved

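Theorem 3.2. For $2 \le i \le n$, the probability $p_i$ that passenger $i$ finds their seat occupied is

$$p_i = \frac{1}{n-i+2}.$$

In particular, we get $p_n = 1/2$, which provides the solution to the problem.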
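The probabilities for all passengers can be computed exactly by following every possible seating with rational arithmetic. The sketch below assumes that only passenger 1 has lost the boarding pass and that Theorem 3.2 states $p_i = 1/(n-i+2)$ (consistent with the value 1/2 for the last passenger); the function name is hypothetical.

```python
from fractions import Fraction


def occupied_probabilities(n):
    """Exact P(passenger i finds seat i occupied) for i = 1..n,
    assuming only passenger 1 picks a seat without a boarding pass."""
    p = [Fraction(0)] * (n + 1)          # p[i] for passenger i; index 0 unused

    def board(i, taken, prob):
        if i > n:
            return
        free = [s for s in range(1, n + 1) if s not in taken]
        if i != 1 and i in free:
            board(i + 1, taken | {i}, prob)   # own seat is free: take it
            return
        if i != 1:
            p[i] += prob                      # own seat is occupied
        for s in free:                        # uniform choice among free seats
            board(i + 1, taken | {s}, prob / len(free))

    board(1, frozenset(), Fraction(1))
    return p[1:]


print([str(q) for q in occupied_probabilities(5)])
# ['0', '1/5', '1/4', '1/3', '1/2']
```

The enumeration visits $2^{n-1}$ seatings, so it is only meant for small $n$.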

Remark 3.3. We note that $p_i$ depends on the actual set $S$ of passengers who lost their boarding passes if $k \ge 2$. The above derivation assumes that $k = 1$ and the first passenger lost their pass. However, it is easy to see that if it is the $j$th rather than the first passenger who is without a boarding pass, then we can simply ignore the first $j - 1$ passengers since they are properly seated. With this reduction in the size of the actual problem, the original answer is preserved in Theorem 3.2 for every $i > j$, since we simply shift both $n$ and $i$ by $j - 1$ to the left. Clearly, $p_i = 0$ for $i < j$. In contrast, if $k \ge 2$, then things get more complicated. In general, we consider the condition $S$ which represents the event that exactly the passengers in $S$ lost their boarding passes. In this case, the calculation of the conditional probability $p_i(S)$ that the $i$th passenger finds their seat occupied may depend on the choice of $S$. For example, if $k \ge 2$, then the corresponding conditional probability for the last passenger with $i = n$ is $k/(k+1)$ or $(k-1)/k$ depending on whether $S = \{1, 2, \dots, k\}$, that is, the first $k$, or $S = \{n-k+1, \dots, n\}$, that is, the last $k$ passengers lost their boarding passes. Of course, $n \in S$ makes no sense since the last passenger has no choice and takes the only empty seat left; therefore, we assume that $n \notin S$. Another example with a small : if , then with , and . One might speculate that $p_i(S)$ depends only on an ordered subset of $S$.

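Remark 3.4. Now, we can find the average number of people who pick their seats at random. The average is the harmonic number $H_n = \sum_{i=1}^{n} 1/i$. In fact, we set the indicator variables $X_i = 1$ exactly if $i$ is in the cycle $C_1$. We get that $E|C_1| = \sum_{i=1}^{n} P(X_i = 1) = 1 + \sum_{i=2}^{n} \frac{1}{n-i+2} = H_n$ since $P(X_i = 1) = p_i = 1/(n-i+2)$ for $i \ge 2$ by Theorem 3.2, and thus, $E|C_1| \sim \ln n$.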

We can also derive the following theorem on the probability of membership and shared membership, similar to Theorems 2.1 and 2.2 of the original standard random permutation model.

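Theorem 3.5. We have that $P(i \in C_1) = 1/(n-i+2)$ with $2 \le i \le n$. Also, for the full nontrivial cycle, with arbitrary $1 = i_1 < i_2 < \cdots < i_j \le n$, we get that

$$P\left(C_1 = (i_1 i_2 \cdots i_j)\right) = \prod_{l=1}^{j} \frac{1}{n - i_l + 1}. \tag{3.3}$$

It is a trivial lower bound on $P(i_1, i_2, \dots, i_j \in C_1)$, as we might skip some indices of the cycle and $j$ can be less than $n$, and an equality if $j = n$, that is, $C_1 = \{1, 2, \dots, n\}$.
In general, for $i_1 < i_2 < \cdots < i_j$ with $2 \le i_1$ and $i_j \le n$, we have

$$P(i_1, i_2, \dots, i_j \in C_1) = \prod_{l=1}^{j} \frac{1}{n - i_l + 2}, \tag{3.4}$$

and thus, the probability depends on the choice of indices.

Proof. The proof is based on the way random selections are made. Identity (3.4) follows by the repeated applications of Theorem 3.2. Indeed, $P(i_{l+1} \in C_1 \mid i_l \in C_1) = 1/(n - i_{l+1} + 2)$, since we can recast the problem as if the nontrivial cycle started with the $i_l$th passenger rather than the first one.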

As a consequence of Remark 3.4 and Theorem 3.5, now we can determine the exact distribution of $|C_1|$ since the construction of $C_1$ corresponds to that of consecutive records (cf. [5]).

Theorem 3.6. For the distribution of the size of the nontrivial cycle, one gets that $P(|C_1| = j) = \left[{n \atop j}\right] / n!$ for $1 \le j \le n$, with $\left[{n \atop j}\right]$ being the unsigned Stirling number of the first kind indicating the number of permutations over an $n$-element set with exactly $j$ cycles.

Proof. We observe that the variables $X_i$ introduced in Remark 3.4 are independent random variables according to (3.4). The proof can be finished by the usual argument used for proving that the positions of the records form an independent sequence of random variables (cf. [6]). Note that for the usual records, position $i$ is that of a record with probability $1/i$. But here, for the $X_i$s, the order of the indices is reversed between 2 and $n$. We add that the standard proof that the distribution of the number of records and cycles is the same uses the alternative canonical cycle representation mentioned in the introduction.
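The claim of Theorem 3.6 can be checked exactly for small $n$. The sketch below follows the displaced passengers one by one (the current passenger chooses uniformly between seat 1 and the seats of the later passengers), collects the exact distribution of $|C_1|$, and compares it with unsigned Stirling numbers of the first kind divided by $n!$. The comparison target and the helper names are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def cycle_size_distribution(n):
    """Exact distribution of |C_1| when only passenger 1 lost the pass."""
    dist = Counter()

    def walk(current, size, prob):
        options = n - current + 1             # seat 1 and seats current+1..n
        dist[size] += prob / options          # seat 1 chosen: the cycle closes
        for nxt in range(current + 1, n + 1):
            walk(nxt, size + 1, prob / options)

    walk(1, 1, Fraction(1))
    return dict(dist)


def stirling_first_unsigned(n):
    """Row [c(n, 0), ..., c(n, n)] via c(m, j) = c(m-1, j-1) + (m-1) c(m-1, j)."""
    row = [1]
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for j in range(m + 1):
            left = row[j - 1] if j >= 1 else 0
            same = row[j] if j < m else 0
            new[j] = left + (m - 1) * same
        row = new
    return row


n = 6
dist = cycle_size_distribution(n)
stirling = stirling_first_unsigned(n)
for j in range(1, n + 1):
    print(j, dist[j], Fraction(stirling[j], factorial(n)))
```

The mean of the computed distribution equals the harmonic number $H_n$, matching Remark 3.4.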

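The second approach is based on calculating the probability that exactly the last $i$ passengers will get their own seats. We can rephrase Remark 3.4 by saying that the number of people getting their own seats is about $n - H_n$ on the average. A portion of this number is formed by people with consecutive indices immediately following the largest index in $C_1$. This is the point after which people take their own seats one-by-one and follow the proper seating from then on. In fact, if the largest index in $C_1$ is $n - i$ with $0 \le i \le n - 2$, then exactly the last $i$ passengers (without any gap) will get their own seats. Depending on $i$, we get that

$$P(\text{exactly the last } i \text{ passengers get their own seats}) = \begin{cases} \dfrac{1}{(i+1)(i+2)}, & 0 \le i \le n-2, \\[2mm] \dfrac{1}{n}, & i = n. \end{cases} \tag{3.5}$$

The reason for (3.5) is that it suffices to focus on $C_1$. As this cycle develops, at any stage of the selection, the current passenger faces two competing targets: the seat assigned to the first passenger and the set of seats assigned to the last $i$ passengers. The cycle is either finished by the first choice with probability $1/(i+1)$ or it will move to a seat assigned to one of the last $i$ passengers with probability $i/(i+1)$. Hence at least the last $i$ passengers get their own seats with probability $1/(i+1)$, and subtracting the corresponding probability for $i + 1$ gives $1/(i+1) - 1/(i+2) = 1/((i+1)(i+2))$. We note that a similar argument shows up in the proof in [4] with $i = 1$; however, here, we are interested in the complete distribution.

We have that the probability of the last passenger finding their seat occupied is the case $i = 0$, which is $1/2$ by (3.5); this solves the problem, in agreement with the above result.

We note that clearly, the probability in (3.5) is $1/n$ for $i = n$ and $1/(n(n-1))$ for $i = n - 2$; for $i = 0$, not just the permutation of the single cycle $(1\, n)$ but all permutations which include 1 and $n$ in the same cycle should be accounted for. Of course, the case with $i = n - 1$ is impossible; thus, its probability is 0. It is worth mentioning that the question regarding another permutation statistic, the maximum element of a cycle, is answered by (3.5): $P(\max C_1 = m) = 1/((n-m+1)(n-m+2))$ for $2 \le m \le n$ and $P(\max C_1 = 1) = 1/n$.

The expected number of passengers properly seated beyond the passenger with the largest index in $C_1$ is $H_n - 1$. In fact,

$$\sum_{i=0}^{n-2} \frac{i}{(i+1)(i+2)} + \frac{n-1}{n} = \sum_{i=0}^{n-2} \left(\frac{2}{i+2} - \frac{1}{i+1}\right) + \frac{n-1}{n} = H_n - 1.$$

Thus, on the average, there are $H_n$ passengers who are seated at random, and there are asymptotically $n - 2 \ln n$ passengers who happened to take their seats before the procedure settles to proper seating with asymptotically $\ln n$ people involved in it.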
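The distribution of the largest index in the cycle of 1 can also be checked exactly for small $n$. The sketch below assumes the values $P(\max C_1 = m) = 1/((n-m+1)(n-m+2))$ for $m \ge 2$ and $P(\max C_1 = 1) = 1/n$, together with the mean $H_n - 1$ for the number of passengers seated after that index; the function name is hypothetical.

```python
from fractions import Fraction


def max_of_cycle_distribution(n):
    """Exact distribution of the largest index in the cycle of passenger 1,
    assuming only passenger 1 lost the boarding pass."""
    dist = {}

    def walk(current, prob):
        options = n - current + 1     # seat 1 and seats current+1..n
        # Choosing seat 1 closes the cycle with `current` as its largest index.
        dist[current] = dist.get(current, Fraction(0)) + prob / options
        for nxt in range(current + 1, n + 1):
            walk(nxt, prob / options)

    walk(1, Fraction(1))
    return dist


n = 6
dist = max_of_cycle_distribution(n)
for m in sorted(dist):
    print(m, dist[m])
print(sum((n - m) * p for m, p in dist.items()))  # 29/20, that is, H_6 - 1
```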

Remark 3.7. Note that the answer does not depend on $n$. Also, we can start anywhere: it is irrelevant which of the first $n - 1$ passengers lost their boarding pass; the answer is simply $1/2$, if the $i$th passenger ($1 \le i \le n - 1$) lost their pass.

With a proof based on equivalences due to exchanging pairs of seats and (3.3), Misra [2] obtained the following result.

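Theorem 3.8 (see [2, Theorem ]). The probability that the last passenger finds their seat when the first $k$ passengers seat themselves at random is $1/(k+1)$, that is,

$$P\left(\pi(n) = n \mid S = \{1, 2, \dots, k\}\right) = \frac{1}{k+1}.$$

In other words, we can create the ultimate chaos by letting the passengers without their boarding passes enter the plane first; thus, $S = \{1, 2, \dots, k\}$ and the probability that the last passenger finds their seat occupied is $k/(k+1)$, which agrees with our answers based on Theorem 3.2 if $k = 1$ and the ones in Remark 3.3.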

Of course, in real life, these passengers should enter last, since this way they will have a better chance to guess their seats correctly. In fact, this group of people will never know what their original seat assignment was. They might not even care about it and simply choose one of the remaining empty seats as if it were theirs in the first place; thus, all of them are happily seated at the end. Not to mention that this way the passengers with boarding passes will not be aggravated by finding their seats occupied.

We note that this kind of seating arrangement seems to be easier to analyze. For instance, an even better explanation comes for the permutation count $2^{n-1}$, mentioned in Remark 3.1, from applying the canonical cycle representation: for $i \ge 2$, there are two ways to put $i$, either before or after 1, corresponding to the options $i \in C_1$ and $i \notin C_1$, respectively; otherwise, the orders are determined. In a similar fashion, relative to the position of 1 and 2, we find that there are $2 \cdot 3^{n-2}$ valid permutations if $S = \{1, 2\}$. In general, we get

Theorem 3.9. For $S = \{1, 2, \dots, k\}$, the number of valid permutations is $k!\,(k+1)^{n-k}$ if $1 \le k \le n$.

Proof. The proof uses the canonical cycle representation. Any ordering of the elements of $S$ uniquely determines the ordering of the other elements once their relative positions with respect to $S$ are chosen. In fact, for each number $i > k$, there are $k + 1$ segments relative to the elements of $S$ to hold $i$. The segments ending with the elements of $S$ correspond to the cycles, and thus, within the segments, the order of these numbers is set by the nature of the seat selections.
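The count in Theorem 3.9 can be confirmed by brute force for small cases. The sketch below enumerates every seating that can occur when passengers $1, \dots, k$ choose any free seat and each later passenger takes their own seat if it is free (and any free seat otherwise), and compares the number of outcomes with $k!\,(k+1)^{n-k}$, the value assumed here for the theorem. The function name is hypothetical.

```python
from math import factorial


def valid_permutations(n, k):
    """Set of seatings (as tuples: entry i-1 is the seat of passenger i)
    reachable when passengers 1..k have lost their boarding passes."""
    found = set()

    def board(i, seats):
        if i > n:
            found.add(seats)
            return
        free = [s for s in range(1, n + 1) if s not in seats]
        if i > k and i in free:
            board(i + 1, seats + (i,))        # own seat is free: forced choice
        else:
            for s in free:                    # any free seat is possible
                board(i + 1, seats + (s,))

    board(1, ())
    return found


for k in range(1, 6):
    print(k, len(valid_permutations(5, k)), factorial(k) * (k + 1) ** (5 - k))
```

For $k = 1$ the count reduces to $2^{n-1}$, the number quoted in Remark 3.1.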

Note that this argument does not directly apply to a set with a gap. For example, if , then the cycle ; hence, the ordering in the first segment cannot occur.

Acknowledgments

The author wishes to thank Gregory P. Tollisen and the referees for helpful suggestions and comments.

References

[1] L. Lovász, Combinatorial Problems and Exercises, North-Holland, Amsterdam, The Netherlands, 1979.
[2] N. Misra, “The missing boarding pass,” Resonance, vol. 13, pp. 662–679, 2008.
[3] P. Flajolet and R. Sedgewick, Analytic Combinatorics, Cambridge University Press, Cambridge, UK, 2009.
[4] P. Winkler, Mathematical Puzzles: A Connoisseur's Collection, A K Peters Ltd., Natick, Mass, USA, 2004.
[5] D. Collins, T. Lengyel, and G. P. Tollisen, “Keeping scores,” Journal of Statistical Planning and Inference, vol. 137, no. 7, pp. 2208–2213, 2007.
[6] G. Blom, L. Holst, and D. Sandell, Problems and Snapshots from the World of Probability, Springer, New York, NY, USA, 1994.

Copyright

Copyright © 2010 Tamás Lengyel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
