Research Article  Open Access
Bayes' Model of the Best-Choice Problem with Disorder
Abstract
We consider the best-choice problem with disorder and imperfect observation. The decision-maker sequentially observes a known number of i.i.d. random variables from a known distribution with the objective of choosing the largest. At a random time the distribution law of the observations changes. The random variables cannot be perfectly observed: each time a random variable is sampled, the decision-maker is informed only whether it is greater than or less than some level specified by the decision-maker. The decision-maker can choose at most one of the observations. The optimal rule is derived in the class of Bayes' strategies.
1. Introduction
In this paper we consider the following best-choice problem with disorder and imperfect observations. A decision-maker sequentially observes i.i.d. random variables . The observations are from a continuous distribution law (state ). At the random time , the distribution law of the observations changes to a continuous distribution function (i.e., the disorder happens and the system enters state ). The moment of the disorder has a geometric distribution with parameter . The observer knows the parameters , , and , but the exact moment is unknown.
Each time a random variable is sampled, the observer must decide either to accept the observation (and stop the observation process) or to reject it (and continue the observation process). If the decision-maker accepts at step (), she receives as the payoff the value of the random variable discounted by the factor , where . The random variables cannot be perfectly observed: the decision-maker is informed only whether the observation is greater than or less than some level specified by her.
The aim of the decision-maker is to maximize the expected value of the accepted discounted observation.
We seek the solution in the following class of strategies. At each moment (), the observer estimates the a posteriori probability of the current state and specifies the threshold . The decision-maker accepts the observation if and only if it is greater than the corresponding threshold .
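The observation process and a threshold rule of this kind can be sketched in Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name `simulate_threshold_rule`, the callable `threshold_at`, and all parameter names are hypothetical. The sketch assumes that the pre-disorder state persists at each step with probability `alpha`, that the payoff for accepting at step k is `lam ** (k - 1)` times the observation, and that the payoff is zero if nothing is accepted. The estimation of the state probability is left out here; the threshold is supplied as a function of the step number.

```python
import random


def simulate_threshold_rule(n, alpha, lam, draw_before, draw_after,
                            threshold_at, rng):
    """Play one run of a threshold rule; return (payoff, stop_step).

    Assumptions (illustrative, not taken from the source):
    - the pre-disorder state persists at each step with probability alpha;
    - accepting x at step k pays lam ** (k - 1) * x;
    - if nothing is accepted in n steps, the payoff is 0 and stop_step is None.
    """
    disorder = False
    for k in range(1, n + 1):
        # The disorder may occur just before the k-th observation is drawn.
        if not disorder and rng.random() > alpha:
            disorder = True
        x = draw_after(rng) if disorder else draw_before(rng)
        # The observer learns only whether x exceeds the chosen level.
        if x > threshold_at(k):
            return lam ** (k - 1) * x, k
    return 0.0, None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters: N(10, 1) before and N(9, 1) after the disorder.
    result = simulate_threshold_rule(
        n=20, alpha=0.9, lam=0.95,
        draw_before=lambda r: r.gauss(10.0, 1.0),
        draw_after=lambda r: r.gauss(9.0, 1.0),
        threshold_at=lambda k: 10.0,
        rng=rng,
    )
    print(result)
```

Passing the threshold as a callable keeps the sketch usable both for constant levels and for levels that change from step to step.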
This problem generalizes both the best-choice problem [1, 2] and the problem of the quickest detection of a change-point (disorder) [3–5]. Best-choice problems with imperfect information were treated in [6–8]. Only a few papers on the combined best-choice and disorder problem have been published [9–11]. Yoshida [9] considered the full-information case and found the optimal stopping rule that maximizes the probability that the accepted value is the largest of all random variables for a given integer . Closely related to this study is the work of Sakaguchi [10], where the optimality equation for the optimal expected reward is derived for the full-information model. In [11], we constructed the solution of the combined best-choice and disorder problem in the class of single-level strategies; in this paper, we search for the Bayes' strategy that maximizes the expected reward in the model with imperfect observation.
2. Optimal Strategy
By the statement of the problem, the observer does not know the current state ( or ). However, she can estimate the state using Bayes' formula:
Here, is the threshold specified by the decision-maker when steps remain until the end (i.e., at the step ), is the a priori probability of the state (i.e., before receiving the information that ), , and .
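A Bayes update of this kind can be sketched in Python. The sketch rests on assumptions of ours, not on the paper's exact formula: the observer has just learned that the observation did not exceed the threshold `s`, the probability of that event under each state is given by the corresponding distribution function, and the pre-disorder state then survives one more step with probability `alpha`. The function name and parameter names are hypothetical.

```python
from statistics import NormalDist


def update_state_probability(pi, s, cdf_before, cdf_after, alpha):
    """Return the probability of the pre-disorder state at the next step,
    given the prior pi and the information that the observation was <= s.

    Assumption (illustrative): the one-step survival factor alpha is applied
    after conditioning on the rejection.
    """
    # Probability of a rejection (x <= s) under each state.
    like_before = cdf_before(s)
    like_after = cdf_after(s)
    evidence = pi * like_before + (1.0 - pi) * like_after
    if evidence == 0.0:
        raise ValueError("a rejection has zero probability under both states")
    posterior = pi * like_before / evidence
    # The pre-disorder state persists for one more step with probability alpha.
    return alpha * posterior


if __name__ == "__main__":
    # Hypothetical distributions: N(10, 1) before and N(9, 1) after the disorder.
    before = NormalDist(10.0, 1.0).cdf
    after = NormalDist(9.0, 1.0).cdf
    print(update_state_probability(0.8, 10.0, before, after, alpha=0.9))
```

When both distribution functions coincide, a rejection carries no information about the state, and the sketch reduces to multiplying the prior by `alpha`.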
We use the dynamic programming approach to derive the optimal strategy. Let be the payoff that the observer expects to receive using the optimal strategy when steps remain until the end. The optimality equation is as follows:
Simplifying (2.2), we get Here, , .
The following theorem gives a representation of the expected payoff as a linear function of .
Theorem 2.1. For any the function can be written in the form where
Proof. Using the formula (2.3), one can show that
where , and
Threshold is the solution of (2.3) for for .
Assume the theorem holds for some . Then, for
where
The theorem is proved.
The following lemma holds.
Lemma 2.2. Assuming , as , there is a limit of the expected payoff .
Proof. Clearly, the sequence is increasing in .
Now, we prove that the sequence of the expected payoffs has an upper bound.
Further, one can show by induction that for any and any the expected payoff at the step has the upper bound
The lemma is proved.
Corollary 2.3. Theorem 2.1 and Lemma 2.2 imply that there exist and such that
As the expected payoff satisfies the following equation:
To find the components of the expected payoff for the case of a large number of observations, we solve the following equation: therefore,
The solution of the system is as follows
The expected payoff is and the optimal threshold is
The above results are summarized in the following theorem.
Theorem 2.4. For , the solution of (2.3) is defined as where
3. Examples
We now consider examples of the Bayes' strategy defined by formula (2.18) and compare it with two strategies whose constant thresholds do not depend on .
3.1. Normal Distribution
Consider the case of normally distributed random variables, where the distribution functions and have variance and expectations and , respectively.
The strategies and have constant thresholds defined by the following formula: where and for the strategy ; and for the strategy .
The values of the thresholds of the strategies and , depending on the discount rate , are tabulated in Table 1.

Table 1 shows how strongly the discount rate affects the thresholds.
Figure 1 shows the graphs of the optimal thresholds for strategies and ( and , resp.) and strategy depending on . As the figure shows, the strategy depends on the a posteriori probability of the state . As tends to zero, the optimal threshold of the strategy tends to the threshold .
We compare the payoffs that the observer expects to receive using different strategies. Define as the expected payoff for and depending on the probability of disorder .
Figure 2 shows the numerical results of the expected payoffs of the observer who uses the strategies , , and (thresholds , , and , resp.).
The expected payoff of the observer who uses the Bayes' strategy is greater than if she uses one of the strategies or . The difference is significant for , because of the uncertainty about the current state of the system.
Table 2 shows the numerical results for the main characteristics of the best-choice process.

For a small probability of the disorder (), the expected payoff under the strategy (10.429) is greater than under the strategy (10.035). However, the Bayes' strategy, which depends on , gives the largest expected payoff (10.500).
Table 2 shows that the average time of accepting the observation is increasing with respect to the value of the threshold. Note that the strategy does not depend on the disorder and this leads to a high value of the average time of accepting the observation. Both strategies and have a small average time of accepting the observation.
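The dependence of the average acceptance time on the threshold can be explored with a small Monte Carlo sketch. It does not reproduce Table 2: the means, variance, horizon, discount factor, and disorder parameter below are hypothetical placeholders, and the sketch assumes that the pre-disorder state persists at each step with probability `alpha`, that accepting at step k pays `lam ** (k - 1)` times the observation, and that runs ending without acceptance pay zero and are excluded from the average stopping time.

```python
import random
from statistics import mean


def play(n, alpha, lam, mu_before, mu_after, sigma, threshold, rng):
    """One run of a constant-threshold rule; returns (payoff, stop_step or None)."""
    disorder = False
    for k in range(1, n + 1):
        # Assumed disorder mechanism: the state changes with probability 1 - alpha.
        if not disorder and rng.random() > alpha:
            disorder = True
        x = rng.gauss(mu_after if disorder else mu_before, sigma)
        if x > threshold:
            return lam ** (k - 1) * x, k
    return 0.0, None


def summarize(threshold, runs=20000, seed=0, n=50, alpha=0.9, lam=0.95,
              mu_before=10.0, mu_after=9.0, sigma=1.0):
    """Estimate mean payoff, acceptance rate, and mean stopping step.

    All default parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    results = [play(n, alpha, lam, mu_before, mu_after, sigma, threshold, rng)
               for _ in range(runs)]
    stops = [k for _, k in results if k is not None]
    return {
        "mean_payoff": mean(p for p, _ in results),
        "accept_rate": len(stops) / runs,
        # Averaged over accepted runs only; None if no run accepted.
        "mean_stop": mean(stops) if stops else None,
    }


if __name__ == "__main__":
    for s in (9.0, 10.0, 11.0):
        print(s, summarize(s))
```

Fixing the seed makes the comparison across thresholds repeatable; the estimates still carry sampling error that shrinks as `runs` grows.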
3.2. Exponential Distribution
Consider the case of exponentially distributed observations. Let and have the exponential distribution with parameters and , respectively. As in the previous example, we compare the strategies and with the Bayes' strategy , where and for the strategy ; and for the strategy .
Table 3 shows the values of the thresholds for the strategies and depending on the discount rate.

As in the case of normally distributed observations, the optimal threshold of the strategy is increasing in and equals the threshold of the strategy at . The graphs of the expected payoffs look the same as in Figure 2. Table 4 shows the main characteristics of the best-choice process for different strategies.

As in the previous example, the Bayes' strategy gives a better payoff than the strategy , but it has a larger average time of accepting the observation. The strategy is the worst for all the parameters.
4. Results
In this article, we considered the best-choice problem with disorder and imperfect observations. We proposed a Bayes' strategy in which the threshold depends on the a posteriori probability of the disorder. The numerical results show that this strategy gives a better expected payoff than the constant-threshold strategies.
Acknowledgment
The paper is supported by grants of the Russian Fund for Basic Research, Project 100100089a, and the Division of Mathematical Sciences, Program “Mathematical and Algorithmic Problems of New Information Systems”.
References
[1] J. P. Gilbert and F. Mosteller, “Recognizing the maximum of a sequence,” Journal of the American Statistical Association, vol. 61, pp. 35–73, 1966.
[2] B. A. Berezovskiĭ and A. V. Gnedin, The Problem of Optimal Choice, Nauka, Moscow, Russia, 1984.
[3] A. N. Širjaev, Statistical Sequential Analysis, vol. 38, American Mathematical Society, Providence, RI, USA, 1973.
[4] T. Bojdecki, “Probability maximizing approach to optimal stopping and its application to a disorder problem,” Stochastics, vol. 3, no. 1, pp. 61–71, 1979.
[5] K. Szajowski, “On a random number of disorders,” Probability and Mathematical Statistics, forthcoming, 2011.
[6] E. G. Enns, “Selecting the maximum of a sequence with imperfect information,” Journal of the American Statistical Association, vol. 70, no. 351, pp. 640–643, 1975.
[7] P. Neumann, Z. Porosiński, and K. Szajowski, “On two person full-information best choice problem with imperfect observation,” in Game Theory and Applications, vol. 2, pp. 47–55, Nova Science Publishers, Hauppauge, NY, USA, 1996.
[8] Z. Porosiński and K. Szajowski, “Modified strategies in two person full-information best choice problem with imperfect observation,” Mathematica Japonica, vol. 52, no. 1, pp. 103–112, 2000.
[9] M. Yoshida, “Probability maximizing approach to a secretary problem with random change-point of the distribution law of the observed process,” Journal of Applied Probability, vol. 21, no. 1, pp. 98–107, 1984.
[10] M. Sakaguchi, “A best-choice problem for a production system which deteriorates at a disorder time,” Scientiae Mathematicae Japonicae, vol. 54, no. 1, pp. 125–134, 2001.
[11] V. V. Mazalov and E. E. Ivashko, “Full-information best-choice problem with disorder,” Surveys in Applied and Industrial Mathematics, vol. 14, no. 2, pp. 215–224, 2007.
Copyright
Copyright © 2012 Vladimir Mazalov and Evgeny Ivashko. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.