Frosso S. Makri, Zaharias M. Psillakis, "On the Expected Number of Limited Length Binary Strings Derived by Certain Urn Models", Journal of Probability, vol. 2014, Article ID 646140, 6 pages, 2014. https://doi.org/10.1155/2014/646140
On the Expected Number of Limited Length Binary Strings Derived by Certain Urn Models
The expected number of 0-1 strings of a limited length is a potentially useful index of the behavior of stochastic processes describing the occurrence of critical events (e.g., records, extremes, and exceedances). Such model sequences might be derived by a Hoppe-Polya or a Polya-Eggenberger urn model interpreting the drawings of white balls as occurrences of critical events. Numerical results, concerning average numbers of constrained length interruptions of records as well as how on the average subsequent exceedances are separated, demonstrate further certain urn models.
1. Introduction and Preliminaries
Recently, some research on the number of patterns which consist of runs of zeros (0s) between subsequent ones (1s) in sequences of binary random variables (RVs) has appeared in the literature. The 0-1 sequences may have several internal structures including, among others, sequences of independent but not necessarily identically distributed (INID) RVs, with , , and , , and sequences of exchangeable (EXCH) or symmetrically dependent RVs, the joint distribution of which is invariant under any permutation of its arguments; that is, for any and any vector , , it holds that for any permutation of the set and . A common ground for both INID and EXCH sequences is sequences of independent and identically distributed (IID) RVs with probability of 1s , , since an IID sequence is an INID sequence with or an EXCH sequence with .
In population genetics and evolution of species, urn models are frequently used as probabilistic models/devices to explain/apply some theories. Among the plethora of such models (see, e.g., Johnson and Kotz , Blom et al. , and Mahmoud ) we consider in the sequel two of them: the Hoppe-Polya urn model () and the Polya-Eggenberger urn model (). The first one, introduced by Holst [4, 5] as a generalization of the Hoppe urn model, is a device to produce certain INID binary sequences whereas the second one supplies a mechanism for producing particular EXCH binary sequences.
Special cases of are models of a () fixed/random threshold (see, e.g., Eryilmaz and Yalcin, Makri and Psillakis, and Eryilmaz et al.), whereas a special case of is the record indicator model (RIM) (see, e.g., Holst [5, 9, 10], Demir and Eryılmaz, and Makri and Psillakis). and find potential applications in the frequency analysis and risk management of the occurrence of critical events (records, extremes, and exceedances) in several scientific disciplines such as the physical sciences (e.g., seismology, meteorology, and hydrology) and stochastic financial analysis (e.g., insurance and financial engineering). For details see the cited works.
In this paper we wish to use the expected number of an overlapping enumerative statistic (RV) as an index of the average occurrence of some special events over time or space. The statistic counts 0-1 strings of a limited length on sequences derived by the urn models which are discussed in Section 2. In Section 3 we first define the statistic on any 0-1 sequence and then we provide its expected value for any INID and EXCH 0-1 sequence. Accordingly, the latter fact covers as particular cases the and , respectively. Finally, in Section 4 certain and associated with interpretations of special events, specifically the RIM and , are considered. Extensive numerics, concerning the average number of limited length interruptions of records and how on the average subsequent exceedances are separated, demonstrate further 0-1 sequences related to RIM and .
Throughout the paper is the indicator of an event ; that is, if the event occurs, and otherwise and stands for the greatest integer less than or equal to a real number . Also, we apply the conventions , , for ; that is, an empty sum (product) is to be interpreted as a zero (unity).
2. 0-1 Sequences Derived by Certain Urn Models
An urn that initially contains white and black balls, , , is considered. A ball is drawn at random (all balls being equally likely) from the urn and its color is observed. We describe two drawing (or sampling) schemes that yield binary sequences different in structure. To this end we suppose the following.
(i) Every time a white ball or a black ball is drawn from the urn, it is replaced together with a new ball of a color not already in the urn; otherwise the ball is replaced together with a copy of it. This drawing scheme is called the Hoppe-Polya urn model and it is denoted as . If , that is, there is no black ball, the so-called Hoppe urn model is obtained.
(ii) For , we assume that all balls emanating from a draw of a white or a black ball are colored white or, respectively, black. Every time a ball is drawn from the urn, it is replaced together with (: an integer) balls of the same color as the ball drawn. This scheme is the well-known Polya-Eggenberger urn model denoted as . Certain values of the parameter that define various sampling schemes of special interest are (sampling without replacement), (sampling with replacement), (Polya sampling), and .
We note in passing that in there is a fixed set of admissible colors that appear in the urn, whereas in an infinite set of admissible colors is considered. Interpreting colors as species, the Darwinian theory and the Mendelian theory of evolution of species may be captured, respectively (for details see Mahmoud ).
A -valued sequence of RVs where 1 denotes the drawing of a white ball and 0 anything else in a or a , is considered. Then constitutes an INID sequence with for and an EXCH sequence with for , respectively. For , it holds and hence, an IID sequence with probability of 1s is derived.
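The two drawing schemes can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: the parameter names `w`, `b`, and `s` (initial white balls, initial black balls, and balls added per draw) are our own labels, and the Hoppe-Polya routine assumes that the number of white balls stays fixed while the urn grows by one ball per draw, which is how we read scheme (i).

```python
import random


def polya_eggenberger(w, b, s, n, rng):
    """Simulate n draws; each drawn ball is returned with s more balls of its color.

    Returns a list of 0/1 values (1 = white ball drawn). The names w, b, s
    are assumptions made for this sketch.
    """
    xs = []
    for _ in range(n):
        total = w + b
        if total <= 0:
            raise ValueError("the urn is empty")
        white = rng.random() < w / total
        xs.append(1 if white else 0)
        if white:
            w += s
        else:
            b += s
    return xs


def hoppe_polya(w, b, n, rng):
    """Simulate n draws where the white count stays w and the urn gains one ball per draw.

    Under this reading, the i-th draw (i = 0, 1, ...) is white with
    probability w / (w + b + i), independently of the other draws.
    """
    return [1 if rng.random() < w / (w + b + i) else 0 for i in range(n)]


rng = random.Random(2014)
print(polya_eggenberger(3, 4, 1, 10, rng))   # Polya sampling (s = 1 assumed)
print(hoppe_polya(1, 0, 10, rng))            # Hoppe urn: no black ball
```

With `s = -1` the first routine reduces to sampling without replacement, so drawing all `w + b` balls always yields exactly `w` ones; with `s = 0` the urn composition never changes and the draws are IID.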
Next, we consider certain 0-1 stochastic processes describing the occurrences of critical events which are associated with particular and . The connection between the urn models and the model processes is achieved by interpreting the occurrence of a critical event () as a drawing of a white ball from the related urn model. More explicitly we have the following models.
(A) Let be a sequence of IID RVs with continuous cumulative distribution function (CDF). Then the (upper) records of the sequence are those elements with values strictly larger than all previous ones. We define the sequence of record indicators or record indicator model (RIM) and , . Then constitutes an INID sequence derived by repetitions of a . By symmetry, lower records of are obtained by considering the (upper) records of the sequence . Namely, they are s with .
(B) We assume that being associated with a process with continuous CDF, occurs at a step if a value of the process exceeds a threshold (i.e., a critical level) at that step. In this way, a -valued sequence , , is generated depending on the type of the threshold (fixed or random) used. Examples of models of exceedances over a fixed or random threshold are, among others, as follows.
(a) If is fixed and ’s are IID RVs, then the so-called fixed threshold model (FTM) is derived.
(b) Suppose that two independent samples of size (≥1) and (≥1), each one of independent elements, of the process , are considered, specifically the samples and . The th order statistic of the first sample is chosen as a random threshold; that is, , . Then the model sequence which is generated by this random threshold model (RTM) is , .
Both models, FTM and RTM, are derived by repetitions of a with initial composition and such that . Specifically, FTM corresponds to (drawings with replacement so that the urn composition remains constant) and , whereas RTM corresponds to (drawings such that the number of balls in the urn increases) and , , so that . If , so that , , with remaining fixed, then repetitions of a give the same results as repetitions of a . This is so because as , so that , then for fixed the Polya-Eggenberger distribution has as a limit the Binomial distribution with parameters and (cf. Johnson and Kotz, page 371).
3. Counting 0-1 Strings of a Limited Length
In frequency analysis of 0-1 sequences , like the ones derived by the models (A) and (B) or more generally of those associated with and , it may be useful to define a statistic (RV) denoting the distance (or the time) between two subsequent drawings of white balls. Measuring this distance as the number of consecutive 0s (nonwhite balls) between subsequent 1s (white balls) we equivalently face the problem of counting runs of 0s of a limited length between subsequent 1s. The number, , of such 0-1 strings of limited (or constrained) length , , in the first (≥2) 0-1 trials (i.e., ball drawings) can be written as where Readily, for , . The support (range set) of is It should be mentioned that the counting of 0-1 strings via is considered in the overlapping sense; that is, a 1 which is not at either end of the sequence may contribute towards counting two possible strings, the one which ends with its occurrence and the next one which starts with it.
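The counting rule described above can be made concrete with a short routine. This is a sketch under our own naming: `k1` and `k2` stand for the lower and upper bounds on the number of 0s between two subsequent 1s, and the function name is hypothetical. Each 1 closes the string that began at the previous 1 and opens the next one, which is the overlapping sense of counting.

```python
def count_limited_strings(xs, k1, k2):
    """Count pairs of subsequent 1s separated by at least k1 and at most k2 zeros."""
    count = 0
    last_one = None  # index of the most recent 1, if any
    for j, x in enumerate(xs):
        if x == 1:
            if last_one is not None:
                zeros_between = j - last_one - 1
                if k1 <= zeros_between <= k2:
                    count += 1
            last_one = j  # this 1 also starts the next candidate string
    return count


xs = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1]
# The runs of 0s between subsequent 1s have lengths 2, 0, 1, 3.
print(count_limited_strings(xs, 0, 1))
print(count_limited_strings(xs, 1, 3))
```

Setting `k1 = 0` counts strings with at most `k2` zeros, setting `k1 = k2` counts strings with exactly that many zeros, and setting `k2` to the sequence length counts strings with at least `k1` zeros.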
The study, via , of constrained strings of the form where 1 is followed by at least and at most 0s before the next 1 in a 0-1 sequence covers as particular cases strings that have been considered recently by several authors. Specifically, , , and enumerate strings of two subsequent 1s separated by a run of 0s of length at most, exactly, and at least equal to a nonnegative integer number, that is, strings of the form , , and , respectively. Some relevant contributions on the subject are the works of Sarkar et al. , Sen and Goyal , Holst , Dafnis et al. , Huffer and Sethuraman , and Makri and Psillakis .
Applications of constrained strings of the general () or the restricted forms (, , ) have been found in information theory and data compression (see Zehavi and Wolf, Jacquet and Szpankowski, and Stefanov and Szpankowski); in urn models, record models, and random permutations (see Chern et al., Joffe et al., Chern and Hwang, Holst [5, 9, 10, 23], and Huffer et al.); in system reliability (see Eryilmaz and Zuo, Eryilmaz and Yalcin, and Makri); and in biomedical engineering (see Dafnis and Philippou).
Recently, Makri and Psillakis  obtained the probability mass function , as well as the expected value of defined on any 0-1 INID or EXCH sequence. Accordingly, with given by (5) and (6), it holds that for a and for a , respectively.
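The expected value can also be obtained from first principles by linearity of expectation: the count is a sum of indicators, one for each possible placement of a string "1, d zeros, 1", so its mean is the sum of the placement probabilities. The sketch below follows that route and is not meant to reproduce the closed formulae (8) and (9). All names are our own; `p[i]` is the probability of a 1 at trial `i` for an independent sequence, and `w`, `b`, `s` are assumed labels for the Polya-Eggenberger urn parameters. For the exchangeable case the probability of a pattern depends only on how many 1s and 0s it contains, so one probability per gap length suffices.

```python
def expected_count_independent(p, k1, k2):
    """Mean number of strings '1, d zeros, 1' with k1 <= d <= k2, independent trials."""
    n = len(p)
    total = 0.0
    for i in range(n):              # position of the opening 1
        zeros = 1.0                 # probability that the d trials after i are all 0
        for d in range(k2 + 1):
            j = i + d + 1           # position of the closing 1
            if j >= n:
                break
            if d >= k1:
                total += p[i] * zeros * p[j]
            zeros *= 1.0 - p[j]
    return total


def pattern_probability(w, b, s, d):
    """P(two white draws followed by d black draws) in the assumed urn."""
    prob = 1.0
    for colour in [1, 1] + [0] * d:
        total = w + b
        if total <= 0 or w < 0 or b < 0:
            return 0.0
        if colour == 1:
            prob *= w / total
            w += s
        else:
            prob *= b / total
            b += s
    return prob


def expected_count_exchangeable(w, b, s, n, k1, k2):
    """Mean count for n draws from the assumed Polya-Eggenberger urn."""
    total = 0.0
    for d in range(k1, min(k2, n - 2) + 1):
        # a string with d zeros fits in n - d - 1 placements, all equally probable
        total += (n - d - 1) * pattern_probability(w, b, s, d)
    return total


print(expected_count_independent([0.3] * 20, 1, 4))
print(expected_count_exchangeable(3, 7, 0, 20, 1, 4))
```

The two printed values agree, since an urn with `s = 0` produces an IID sequence. The double loop costs on the order of `n * k2` operations, so for very long sequences a closed formula is preferable.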
Since , it holds that Equation (10) offers an alternative way to compute in cases for which we have available , , and , . In this direction the result of Holst  for a which is as follows: where is helpful. Furthermore, he reestablished that is a Poisson RV with when tends to infinity.
As observed from (8) and (9), both expressions of are closed formulae suitable for applications concerning large 0-1 sequences where may be used to extract information about their stochastic behavior. For instance, is a potentially useful device for a trader (a human or a computer system) making transactions in a certain capital market (e.g., a stock market, a bond market, or an exchange market). It is assumed that the trader bases his/her/its buy-sell strategy on the expected value of , whereas the interpretation of what counts depends on the particular sequence on which is defined.
4. Numerical Results
In this section we consider two numerical examples that clarify further the theoretical results of Sections 2 and 3. In both examples 0-1 sequences of large or very large size are concerned. Moreover, the selected values of the urn parameters demonstrate the flexibility of the models as well as their applicability for a variety of situations.
Example 1. In a RIM sequence of length , , helps in understanding how the rather few records , with (see, e.g., Blom et al., page 110) where (0.577216) is Euler’s constant, are distributed among the many nonrecord elements of the sequence. Consequently, indicates how many strings of two subsequent records interrupted by at least and at most nonrecords occur on average in such a sequence.
In Table 1 we compute, via (8), for various values of . The RIM sequences used have lengths , , , and . They contain on average approximately 5, 7, 10, and 12 records, respectively. The entries of the table confirm that for it holds , . Hence, , , because of (10) and the Poisson approximation of (11). Consequently, a precompiled computation of , , for large enough , yields approximately for , too. For instance, let , let , and let . Then by the respective entries of the table we find , whereas a direct computation, via (8) or the usage of the exact value in (10), gives . That is, the relative error between and is . The value means that in a 0-1 sequence of elements containing on average 12 records there occur on average 5 strings of 2 subsequent records separated by at least 10 and at most 998 nonrecord elements.
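The orders of magnitude quoted in this example can be checked with a short calculation. The sketch below rests on two facts and one assumption. The facts: in a sequence of IID continuous RVs the i-th element is a record with probability 1/i, independently of the others, so the expected number of records is the harmonic number, and summing the placement probabilities of a string with d nonrecords between two records telescopes to 1/(d + 1) - 1/n. The assumption: the lengths used here (powers of ten from 100 to 100000) are illustrative choices of ours. The function names are hypothetical, and the telescoped sum is our own derivation rather than the paper's formula (8).

```python
import math

EULER_GAMMA = 0.5772156649015329


def expected_records(n):
    """Expected number of records among n IID continuous observations (harmonic number)."""
    return sum(1.0 / i for i in range(1, n + 1))


def rim_expected_count(n, k1, k2):
    """Mean number of pairs of subsequent records separated by k1..k2 nonrecords.

    Uses the telescoped placement sum 1/(d + 1) - 1/n for each gap length d.
    """
    k2 = min(k2, n - 2)  # a gap longer than n - 2 cannot fit in n trials
    return sum(1.0 / (d + 1) - 1.0 / n for d in range(k1, k2 + 1))


for n in (10**2, 10**3, 10**4, 10**5):   # illustrative lengths (assumed)
    exact = expected_records(n)
    approx = math.log(n) + EULER_GAMMA
    print(n, round(exact, 3), round(approx, 3))

print(rim_expected_count(10**5, 10, 998))
```

Under the assumed length of 100000, the last line gives a value between 4.5 and 4.6, which is in line with the "on average 5 strings" reading of the example when rounded to the nearest integer.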
Example 2. We consider a sequence of length , , with an exceedance probability , . In a RTM there are infinitely many different selections of the parameters and which give the same value of . Let , be positive integers such that and be their greatest common divisor. Then for every pair such that , , for is constant. Therefore, for a given , what values (small, moderate, or large) of the size and the threshold should be used depends on the demanded consideration (short, middle, or long) of the past sequence’s data. Readily, in a FTM sequence there is no such need since . In both sequences (FTM or RTM) the expected value of might be helpful in predicting how many strings of two subsequent exceedances (e.g., of stock prices) over a threshold (fixed or random) are, on average, separated by at least and at most nonthreshold exceedances.
In Table 2 we compute, via (9), the expected value of defined on a and on a with the same exceedance probability , for various values of . The sequences which have been used have lengths , , and and sets of pairs that give a selected value of are depicted for , , , and , as well. The entries of the table confirm that, for large , the dependence in structure of a RTM relaxes, losing its sensitivity with respect to the initial sample sequence’s size and threshold. Eventually, it degenerates, becoming a FTM, and therefore tends to . This situation is more apparent if is smaller than or equal to and . If is larger than and , the expected value departs, even a little, from that of with the same , showing that in this case the dependency of RTM on its and counts. For instance, this is the case for , , and , , and .
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
- N. L. Johnson and S. Kotz, Urn Models and Their Applications, John Wiley & Sons, New York, NY, USA, 1977.
- G. Blom, L. Holst, and D. Sandell, Problems and Snapshots from the World of Probability, Springer, New York, NY, USA, 1994.
- H. M. Mahmoud, Polya Urn Models, CRC Press, Boca Raton, Fla, USA, 2009.
- L. Holst, “Counts of failure strings in certain Bernoulli sequences,” Journal of Applied Probability, vol. 44, no. 3, pp. 824–830, 2007.
- L. Holst, “The number of two consecutive successes in a Hoppe-Pólya urn,” Journal of Applied Probability, vol. 45, no. 3, pp. 901–906, 2008.
- S. Eryilmaz and F. Yalcin, “On the mean and extreme distances between failures in Markovian binary sequences,” Journal of Computational and Applied Mathematics, vol. 236, no. 6, pp. 1502–1510, 2011.
- F. S. Makri and Z. M. Psillakis, “On success runs of length exceeded a threshold,” Methodology and Computing in Applied Probability, vol. 13, no. 2, pp. 269–305, 2011.
- S. Eryilmaz, O. L. Gebizlioglu, and F. Tank, “Modeling of claim exceedances over random thresholds for related insurance portfolios,” Insurance: Mathematics & Economics, vol. 49, no. 3, pp. 496–500, 2011.
- L. Holst, “On consecutive records in certain Bernoulli sequences,” Journal of Applied Probability, vol. 46, no. 4, pp. 1201–1208, 2009.
- L. Holst, “A note on records in a random sequence,” Arkiv for Matematik, vol. 49, no. 2, pp. 351–356, 2011.
- S. Demir and S. Eryılmaz, “Run statistics in a sequence of arbitrarily dependent binary trials,” Statistical Papers, vol. 51, no. 4, pp. 959–973, 2010.
- A. Sarkar, K. Sen, and Anuradha, “Waiting time distributions of runs in higher order Markov chains,” Annals of the Institute of Statistical Mathematics, vol. 56, pp. 317–349, 2004.
- K. Sen and B. Goyal, “Distributions of patterns of two failures separated by success runs of length k,” Journal of the Korean Statistical Society, vol. 33, no. 1, pp. 35–58, 2004.
- S. D. Dafnis, A. N. Philippou, and D. L. Antzoulakos, “Distributions of patterns of two successes separated by a string of k-2 failures,” Statistical Papers, vol. 53, no. 2, pp. 323–344, 2012.
- F. W. Huffer and J. Sethuraman, “Joint distributions of counts of strings in finite Bernoulli sequences,” Journal of Applied Probability, vol. 49, no. 3, pp. 758–772, 2012.
- F. S. Makri and Z. M. Psillakis, “Counting certain binary strings,” Journal of Statistical Planning and Inference, vol. 142, no. 4, pp. 908–924, 2012.
- E. Zehavi and J. K. Wolf, “On runlength codes,” IEEE Transactions on Information Theory, vol. 34, no. 1, pp. 45–54, 1988.
- P. Jacquet and W. Szpankowski, “On (d,k) sequences not containing a given word,” in Proceedings of the International Symposium on Information Theory, pp. 1486–1489, 2006.
- V. T. Stefanov and W. Szpankowski, “Waiting time distributions for pattern occurrence in a constrained sequence,” Discrete Mathematics & Theoretical Computer Science, vol. 9, no. 1, pp. 305–320, 2007.
- H. H. Chern, H. K. Hwang, and Y. N. Yeh, “Distribution of the number of consecutive records,” Random Structures and Algorithms, vol. 17, pp. 169–196, 2000.
- A. Joffe, E. Marchand, F. Perron, and P. Popadiuk, “On sums of products of Bernoulli variables and random permutations,” Journal of Theoretical Probability, vol. 17, no. 1, pp. 285–292, 2004.
- H.-H. Chern and H.-K. Hwang, “Limit distribution of the number of consecutive records,” Random Structures & Algorithms, vol. 26, no. 4, pp. 404–417, 2005.
- L. Holst, “A note on embedding certain Bernoulli sequences in marked Poisson processes,” Journal of Applied Probability, vol. 45, no. 4, pp. 1181–1185, 2008.
- F. W. Huffer, J. Sethuraman, and S. Sethuraman, “A study of counts of Bernoulli strings via conditional Poisson processes,” Proceedings of the American Mathematical Society, vol. 137, no. 6, pp. 2125–2134, 2009.
- S. Eryilmaz and M. Zuo, “Constrained (k,d)-out-of-n systems,” International Journal of Systems Science, vol. 41, no. 6, pp. 679–685, 2010.
- F. S. Makri, “Minimum and maximum distances between failures in binary sequences,” Statistics & Probability Letters, vol. 81, no. 3, pp. 402–410, 2011.
- S. D. Dafnis and A. N. Philippou, “Distributions of patterns with applications in engineering,” IAENG International Journal of Applied Mathematics, vol. 41, no. 1, pp. 68–75, 2011.
- F. S. Makri and Z. M. Psillakis, “Exact distributions of constrained (k, ℓ) strings of failures between subsequent successes,” Statistical Papers, vol. 54, no. 3, pp. 783–806, 2013.
Copyright © 2014 Frosso S. Makri and Zaharias M. Psillakis. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.