A Common Value Experimentation with Multiarmed Bandits

Gao, Xiujuan; Liang, Hao; Wang, Tong

doi:https://doi.org/10.1155/2018/4791590

Mathematical Problems in Engineering


Special Issue

Mathematical Theories and Applications for Nonlinear Control Systems


Research Article | Open Access

Volume 2018 | Article ID 4791590 | https://doi.org/10.1155/2018/4791590


Xiujuan Gao,¹ Hao Liang,¹ and Tong Wang¹

Academic Editor: Ju H. Park

Received 21 May 2018

Revised 16 Jul 2018

Accepted 24 Jul 2018

Published 30 Jul 2018

Abstract

We study a common value experimentation with multiarmed bandits and give an application of the experimentation. The second derivative of value functions at cutoffs is investigated when an agent switches action with multiarmed bandits. If consumers have an identical but unknown preference and purchase products from only two sellers among multiple sellers, we obtain the necessary and sufficient conditions for the common experimentation. The Markov perfect equilibrium and the socially efficient allocation in -armed markets are discussed.

1. Introduction

A debate in finance arises when we need to choose the best good among multiple products. In Robbins [1], this problem is described as a decision maker facing slot machines (called arms), one of which has to be chosen at each instant of time. Gittins and Jones [2] and Katehakis and Veinott [3] calculate the value of pulling an arm, i.e., the Gittins index, in discrete time. Comparing the value to the Gittins index of all other arms, Katehakis and Veinott [3] show that the optimal strategy for an -armed problem is an -dimensional discounted Markov decision chain and that the value of pulling each arm is independent of the cutoff. Karatzas [4] and Kaspi and Mandelbaum [5] transform the problem into a standard optimal stopping problem. Bolton and Harris [6] and Bergemann and Välimäki [7] show that when there are sellers who sell different products and consumers whose preferences are identical (but unknown) in the market, the optimal strategy of the consumers is to buy the products from the same seller; i.e., the equilibrium is symmetric. Cohen and Solan [8] study two-armed bandit problems in continuous time with Lévy processes and obtain the Hamilton-Jacobi-Bellman (HJB) equation for the problem. They conclude that the optimal strategy is a cutoff strategy when the arms have two types. For other optimal strategies and control approaches, the reader is referred to [9–11] and the references therein.

Eeckhout and Weng [12] show that, when there are two arms of different types, the second derivatives of the value functions are equal at the cutoff, in addition to value matching and smooth pasting (value matching requires the value function to be continuous at the cutoff, and smooth pasting requires its first derivative to be continuous there; both are discussed in [13, 14]), and conclude that the optimal strategy is an interval strategy in the market. In [12], an application is given about strategic pricing of two vendors in a competitive market. All consumers have the same type, which is one of two possible types. Two vendors produce two different kinds of goods for the two types, respectively. Eeckhout and Weng [12] describe the socially efficient allocation and the pricing strategies of the two vendors in the market. Moreover, they use value matching, smooth pasting, and second derivatives of value functions to discuss the Markov perfect equilibrium and the socially efficient allocation.

In this article, we investigate multiarmed bandits, whereas two-armed bandits are studied in [12]. We consider multiple sellers instead of two sellers. In general, multiple sellers sell the same type of goods in the market, but the quality, utilities, and prices of the goods sold by each seller are different. Therefore, changing the number of arms from two to a general number is reasonable in the market. It is commonly believed that purchasing from only two sellers is best for a consumer of either type. For example, the first seller offers the highest utility to one type but the lowest utility to the other. Conversely, the second seller offers the highest utility to the latter type but the lowest utility to the former, and the utilities offered by the remaining sellers lie between those of these two sellers for both types. Intuitively, a rational consumer would choose goods only from these two sellers, but this is not always the case: the strategy requires some conditions. We discuss the necessary and sufficient conditions for this strategy.

To obtain our conclusions under certain assumptions on the model for multiarmed bandits, we follow the methods of [12] and calculate the optimal cutoff points by solving the corresponding ordinary differential equations. Then, we obtain the necessary and sufficient conditions for the strategy in which consumers purchase products from only two sellers among multiple sellers.

The rest of the paper is organized as follows. In Section 2, we introduce a multiarmed model under certain assumptions and show a conclusion about the model. In Section 3, we give an application of the model. In Section 3.1, we discuss the optimal choice of consumers in the case of market equilibrium. Based on the optimal choice of consumers, we derive HJB equations for their utility functions. Using solutions of the HJB equations, value matching, and smooth pasting, we get the cutoff point at the Markov perfect equilibrium and the necessary and sufficient conditions for the common experimentation. In Section 3.2, we get the cutoff point at the socially efficient allocation in the same way as in Section 3.1. The relationship between the Markov perfect equilibrium and the socially efficient allocation in -armed markets is discussed.

2. The Model

Eeckhout and Weng [12] consider one agent and a bandit with arms. We study multiarmed bandits and consider the case where there is only one real-valued state and is a connected set. The instantaneous flow payoff of each arm is at state , . Let be the discount rate. For each arm , there is a probability space endowed with filtration , and is the total measure of time up to time during which arm has been chosen. From [12], we know that the updating of in arm , when there are arms in the market, satisfies (1), where is a standard Brownian motion, is independent of (), and is a Poisson random measure that is independent of . The state is updated from arm , and . Let be the change of the state when there is a Poisson jump. Equation (1) shows that the state changes from to (). Thus, (1) does not directly cover the case in which the state jumps from other states. However, because state can jump to any other state and any other state can jump back to , (1) covers all jump processes that describe the changes of state.
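To make the state dynamics described above concrete, the following is a minimal simulation sketch, not the authors' construction. It assumes that the state follows a jump diffusion whose drift, volatility, and jump size depend on the arm currently chosen. The dictionary keys `mu`, `sigma`, `rate`, and `jump`, the constant jump intensity, the Euler discretization, and the function name `simulate_state` are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_state(x0, arms, choose_arm, horizon=1.0, dt=0.001, seed=0):
    """Euler scheme for a state driven by the currently chosen arm.

    arms: list of dicts with hypothetical keys
        'mu'    -> drift as a function of the state
        'sigma' -> volatility as a function of the state
        'rate'  -> Poisson jump intensity (assumed constant here)
        'jump'  -> jump size as a function of the pre-jump state
    choose_arm: maps the current state to an arm index (e.g., a cutoff rule).
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(int(round(horizon / dt))):
        arm = arms[choose_arm(x)]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x_new = x + arm['mu'](x) * dt + arm['sigma'](x) * dw
        # A jump arrives in [t, t + dt) with probability close to rate * dt.
        if rng.random() < arm['rate'] * dt:
            x_new += arm['jump'](x)
        x = x_new
        path.append(x)
    return path


# Illustrative two-arm example with a single cutoff at 0.5 (assumed values).
arms = [
    {'mu': lambda x: 0.1, 'sigma': lambda x: 0.2, 'rate': 1.0, 'jump': lambda x: 0.05},
    {'mu': lambda x: -0.1, 'sigma': lambda x: 0.3, 'rate': 0.5, 'jump': lambda x: -0.05},
]


def cutoff_rule(x):
    return 0 if x < 0.5 else 1


path = simulate_state(0.2, arms, cutoff_rule)
```

The cutoff rule plays the role of an interval strategy: the arm in use, and therefore the law of motion of the state, changes whenever the state crosses the cutoff.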

In the multiarmed bandit problem, the stochastic process is constructed on the product space with filtration . If , the agent chooses an allocation rule adapted to filtration to solve the optimal control problem (2).

Assumption 1 (see [12]). Let be a connected set. Assume that , , , and for are functions of .

Assumption 2 (see [12]). Assume that the first derivatives of , , , and with respect to are bounded. Namely, there exists such that for any , , , , and for each .

Using (2) and the dynamic programming principle (see [15] for the detailed process), for any , we obtain (4). Using Itô’s lemma for and the property of Poisson random measures (see [13]), we get (5), where . Then, we have (6). Substituting (6) into (4), using the mean value theorem for integrals, and sending , we get the HJB equation (7), where is the finite intensity measure of . From [13], we know that there exists a unique solution to (1).

We assume that the optimal strategy of consumers is an interval strategy (see [14]). Namely, there exist points (cutoffs) that partition the space into intervals, where . When and , , we assume that if , the agent chooses arm , where and , . Thus, we have (8).

Using the conclusions in [12, 14], we have the value matching condition (9) and the smooth pasting condition (10).

Now, in the light of the value matching and smooth pasting, we have the following conclusion.

Theorem 3. If for all , , a necessary condition for optimal solution is that satisfies for any possible cutoffs .

Proof. On the basis of assumption , an agent chooses arm , . It follows that (11) holds. For and , inequality (11) becomes (12). Due to the value matching, we have (13) and (14). From the smooth pasting, we obtain . In the same way, we get . Thus, we have for any .

Theorem 3 gives a necessary condition under which second derivatives of value functions are equal at every cutoff when there are arms in the market.
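For orientation, the three conditions at a cutoff can be written out as follows. The notation is assumed here for illustration and is not taken from the source: $V$ denotes the value function, $x_k^{*}$ the $k$-th cutoff, and $x_k^{*}\pm$ the right and left limits at that cutoff. The first two lines are the standard value matching and smooth pasting conditions; the third is the second-derivative condition that Theorem 3 establishes as necessary under its hypothesis.

```latex
\begin{align*}
V(x_k^{*}-) &= V(x_k^{*}+)     && \text{(value matching)} \\
V'(x_k^{*}-) &= V'(x_k^{*}+)   && \text{(smooth pasting)} \\
V''(x_k^{*}-) &= V''(x_k^{*}+) && \text{(second-derivative condition)}
\end{align*}
```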

When , i.e., when there are two arms in the market, the agent has two states and in the model. In this case, we only need to consider an agent jumping between the two states, i.e., . Thus, there is one cutoff, which is discussed in [12].

3. Application

The application of Theorem 3 is similar to that in [12]. The difference is that there are sellers offering different products in the market. We index these sellers with . We assume that all consumers have the same type , either or . Let and be the utilities of consumers who buy good with type or , respectively. We assume and , i.e., the more likely the consumers are to be of type , the more they tend to choose the earlier sellers. We denote by the common belief that the type is high; the expected utility of choosing seller is then . If we denote and , the utility is represented by (15), where and .
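Because the expected utility of each seller is linear in the common belief, a quick way to build intuition for the "only two sellers" question is to check which sellers ever attain the upper envelope of these lines. The sketch below does this on a belief grid. It is a purely myopic comparison that ignores prices and the value of experimentation, so it illustrates the setup rather than the equilibrium condition of Theorem 4. The function names and the numerical utilities are assumptions made for illustration.

```python
def expected_utility(p, u_high, u_low):
    """Expected utility when the belief that the type is high equals p."""
    return p * u_high + (1.0 - p) * u_low


def myopic_best_seller(p, sellers):
    """Index of the seller with the largest expected utility at belief p.

    sellers: list of (u_high, u_low) pairs, one per seller.
    Ties are broken in favor of the lowest index.
    """
    utils = [expected_utility(p, h, l) for h, l in sellers]
    return max(range(len(sellers)), key=lambda i: utils[i])


def sellers_ever_chosen(sellers, grid=1001):
    """Sellers that are myopically optimal for some belief on a uniform grid."""
    chosen = set()
    for k in range(grid):
        chosen.add(myopic_best_seller(k / (grid - 1), sellers))
    return sorted(chosen)


# Utilities decrease across sellers for the high type and increase for the low type.
dominated_middle = [(4.0, 0.0), (2.5, 1.0), (0.0, 4.0)]
useful_middle = [(4.0, 0.0), (3.0, 3.0), (0.0, 4.0)]

print(sellers_ever_chosen(dominated_middle))
print(sellers_ever_chosen(useful_middle))
```

In the first example the middle seller's line lies below the envelope of the two extreme sellers for every belief, so only the first and last sellers are ever selected; in the second it is selected for intermediate beliefs.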

At any time, all market participants observe all previous outcomes. Because of uncertain external factors, the flow utility is a noisy signal of the true value (see [12] for details) and satisfies (16), where is independent of .

The common belief is related to time and is described as a learning process in [16], denoted by . Without loss of generality, we assume that there are consumers choosing seller and = . From the statements in [6, 7], we have that satisfies (17), where is independent of () and . We denote .
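The learning process can be illustrated with a small simulation of Bayesian updating from a noisy flow signal. This is a sketch under stated assumptions, not the paper's equation: a single signal with drift `mu_high` or `mu_low` depending on the true type and common volatility `sigma`, observed on a time grid. For Gaussian increments the log-odds update used below is the exact Bayes rule. All names and parameter values are hypothetical, and the dependence of the learning speed on the number of consumers is not modeled.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_belief(p0, mu_high, mu_low, sigma, true_high=True,
                    horizon=10.0, dt=0.01, seed=0):
    """Posterior probability of the high type along one signal path."""
    rng = random.Random(seed)
    mu_true = mu_high if true_high else mu_low
    log_odds = math.log(p0 / (1.0 - p0))
    beliefs = [p0]
    for _ in range(int(round(horizon / dt))):
        # Observed signal increment: drift plus Brownian noise.
        dx = mu_true * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        # Log-likelihood ratio of the increment under the two types.
        log_odds += (mu_high - mu_low) / sigma ** 2 * (
            dx - 0.5 * (mu_high + mu_low) * dt)
        beliefs.append(sigmoid(log_odds))
    return beliefs


beliefs = simulate_belief(0.5, mu_high=1.0, mu_low=0.0, sigma=1.0)
```

Working in log-odds keeps the belief inside the unit interval without clipping, which a direct Euler step on the belief itself would not guarantee.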

In the next subsections, the Markov perfect equilibrium and the socially efficient allocation in -armed market are discussed.

3.1. Markov Perfect Equilibrium

Let denote the price of goods of seller . The price is related to at instantaneous time . So is a mapping , . We denote as the choice of th consumer which is related not only to his common belief , but also to the prices of the sellers’ goods, i.e., . A symmetric Markov perfect equilibrium is .

When the choice of the previous consumers is observed, the utility of the th consumer is maximized. Let denote the maximum utility of this consumer and denote the number of consumers choosing the th seller. We have . There exists subject to and , . Then, we have (18). Due to the price competition, the consumer obtains equal utility from the other sellers. Thus, we get (19). Therefore, we obtain (20), where and .

Now, we discuss the pricing of the sellers’ goods. Denote as the th seller’s utility. If there are consumers buying goods when the price is , we have (21), where .

From (21), we derive (22).

We get or because all consumers choose only one seller (see [6, 7]). When , the utility of seller is given by (23), where we assume that all consumers choose seller , i.e., and .

When only one consumer chooses seller , i.e., , the utility of seller is given by (24).

As a rational market participant, when no consumer buys its goods, the seller adjusts the price so that the payoff in this case equals the payoff when only one consumer chooses this seller. We obtain the price of the goods of seller in the form (25).

From (20), we get the price of seller in the form (26).

We have cutoffs , , where . When common belief , consumer chooses seller , ( , ). If , the utilities of consumers are indifferent when they choose seller or due to value matching. When , we have the conclusion

Theorem 4. All consumers only choose the first seller or the last seller in the market if and only if cutoffs , , are the same and equal to , where

Proof. First, we prove sufficiency: when all cutoffs are equal to , all consumers choose only the first seller or the last seller in the market.
Letting , we have . When , all consumers choose the first seller. From (25) and (26), we have (28) and (29). Substituting (29) into (22) yields (30). Substituting (28) and (29) into (22) gives rise to (31). From (23), (30), and (31), for , we have (32). Similarly, for , we have (33), where , , , and and are constants, .
Using value matching, smooth pasting, the second derivative condition, and , we have , , , and . Therefore, we obtain , where and . Define and . Thus, we get and . When , , , all consumers choose only the first seller or the last seller in the market.
The sufficiency of Theorem 4 is proved.
Now, we prove necessity: if all consumers choose only the first seller or the last seller in the market, then the cutoffs are the same and equal to , . We prove it by contradiction.
Assuming , we get , . So , i.e., , . When the common belief is , the optimal choice is the th and and . This contradicts the assumption that all consumers choose only the first seller or the last seller. So , . According to the proof of sufficiency, we have . The necessity of Theorem 4 is proved.

Theorem 4 gives the necessary and sufficient conditions for the consumers’ choice. When the consumers choose only the first seller and the last seller, the multiarmed bandit problem reduces to the two-armed bandit problem. In other words, the other sellers gradually disappear from the market because they have no sales, and the problem becomes the two-armed case discussed in [12].

3.2. Socially Efficient Allocation

We consider the optimal choice of a planner facing a multiarmed bandit problem. Let the total social surplus function be . We have

Assume that is the total social surplus in a neighborhood of if the planner chooses seller . From (41), we have , where .

Solving the ordinary differential equation (43), we get , where and , , are constants.
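Equation (43) is not reproduced here, so the following is only a sketch under an explicit assumption: that the homogeneous part of the planner's ODE has the form familiar from Bolton and Harris [6], namely (1/2) s^2 p^2 (1-p)^2 W''(p) = r W(p), with discount rate r and a signal-to-noise parameter s. Under that assumption, p^a (1-p)^(1-a) solves the homogeneous equation when a(a-1) s^2 / 2 = r. The code checks this numerically with a central finite difference; the names `alpha`, `homogeneous`, and `ode_residual` are hypothetical.

```python
import math


def alpha(r, s):
    """Positive root of a * (a - 1) * s**2 / 2 = r."""
    return 0.5 * (1.0 + math.sqrt(1.0 + 8.0 * r / s ** 2))


def homogeneous(p, a):
    """Candidate homogeneous solution p**a * (1 - p)**(1 - a)."""
    return p ** a * (1.0 - p) ** (1.0 - a)


def ode_residual(p, r, s, h=1e-4):
    """Residual of 0.5 * s^2 * p^2 * (1-p)^2 * W'' - r * W at belief p."""
    a = alpha(r, s)
    w = homogeneous(p, a)
    # Central finite-difference approximation of the second derivative.
    w2 = (homogeneous(p + h, a) - 2.0 * w + homogeneous(p - h, a)) / h ** 2
    return 0.5 * s ** 2 * p ** 2 * (1.0 - p) ** 2 * w2 - r * w


for p in (0.2, 0.5, 0.8):
    print(p, ode_residual(p, r=0.1, s=1.0))
```

The second independent solution under the same assumption is p^(1-a) (1-p)^a, and a particular solution for a flow payoff that is linear in the belief is that payoff divided by r. The free constants multiplying the two homogeneous solutions are what the matching conditions at the cutoffs pin down.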

We denote by , subject to , the cutoffs of the planner’s choice. When , the optimal choice is the th seller, and , . Due to value matching, smooth pasting, and the second derivative conditions, one has system (45), where .

There are unknown parameters and equations in system (45) with , for all , i.e., the coefficient matrix of system (45) is nonsingular. Thus, system (45) has a unique solution.

The planner chooses between the first seller and the seller when . Let . System (45) becomes , and we obtain

The results in [12] introduce the necessary and sufficient condition under which the Markov perfect equilibrium with cautious strategies is socially efficient with two-armed bandits.

Corollary 5. When , , the consumers’ cutoffs are , and the planner’s cutoffs are , the Markov perfect equilibrium with cautious strategies is socially efficient if and only if . Moreover, when .

The proof of Corollary 5 is similar to that of Theorem 2 in [12]. We omit its proof.

Corollary 5 gives the necessary and sufficient conditions under which the Markov perfect equilibrium with cautious strategies is socially efficient for multiarmed bandits. Eeckhout and Weng [12] present the conditions in the case of a two-armed market. Thus, Theorem 4 extends parts of the results in [12].

According to the condition in Corollary 5, when , we obtain that the Markov perfect equilibrium with cautious strategies is socially efficient. If , we obtain . In the light of our initial hypothesis, we have and ; i.e., all sellers are identical in the market.

4. Conclusion

We study a common value experimentation with multiarmed bandits and present its application. This extends the two-armed bandits in [12] to multiarmed bandits. We derive the HJB equation with multiarmed bandits. In the application, we get the necessary and sufficient conditions under which consumers choose between only two sellers. These conditions guarantee that the Markov perfect equilibrium with cautious strategies is socially efficient. In future work, we aim to solve for all the cutoffs in system (45) when these cutoffs are different and to give general solutions for them.

Data Availability

No data were used to support this study. The conclusions are obtained by theoretical derivation and proof.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

The article is a joint work of three authors who contributed equally to the final version of the paper. The authors read and approved the final manuscript.

Acknowledgments

This work is supported by the Fundamental Research Funds for the Central Universities (JBK120504).

References

[1] H. Robbins, “Some aspects of the sequential design of experiments,” Bulletin (New Series) of the American Mathematical Society, vol. 58, pp. 527–535, 1952.
[2] J. C. Gittins and D. M. Jones, “A dynamic allocation index in the sequential design of experiments,” in Progress in Statistics, pp. 241–266, North-Holland, Amsterdam, 1974.
[3] M. N. Katehakis and A. F. Veinott Jr., “The multi-armed bandit problem: decomposition and computation,” Mathematics of Operations Research, vol. 12, no. 2, pp. 262–268, 1987.
[4] I. Karatzas, “Gittins indices in the dynamic allocation problem for diffusion processes,” Annals of Probability, vol. 12, no. 1, pp. 173–192, 1984.
[5] H. Kaspi and A. Mandelbaum, “Lévy bandits: multi-armed bandits driven by Lévy processes,” The Annals of Applied Probability, vol. 5, no. 2, pp. 541–565, 1995.
[6] P. Bolton and C. Harris, “Strategic experimentation,” Econometrica, vol. 67, no. 2, pp. 349–374, 1999.
[7] D. Bergemann and J. Välimäki, “Experimentation in markets,” Review of Economic Studies, vol. 67, no. 2, pp. 213–234, 2000.
[8] A. Cohen and E. Solan, “Bandit problems with Lévy payoff,” Mathematics of Operations Research, vol. 38, no. 1, pp. 92–107, 2013.
[9] X. Yu, X. J. Xie, and N. Duan, “Small-gain control method for stochastic nonlinear systems with stochastic iISS inverse dynamics,” Automatica, vol. 46, no. 11, pp. 1790–1798, 2010.
[10] Z.-J. Wu, X.-J. Xie, P. Shi, and Y.-Q. Xia, “Backstepping controller design for a class of stochastic nonlinear systems with Markovian switching,” Automatica, vol. 45, no. 4, pp. 997–1004, 2009.
[11] X. Xie and M. Jiang, “Output feedback stabilization of stochastic feedforward nonlinear time-delay systems with unknown output function,” International Journal of Robust and Nonlinear Control, vol. 28, no. 1, pp. 266–280, 2018.
[12] J. Eeckhout and X. Weng, “Common value experimentation,” Journal of Economic Theory, vol. 160, pp. 317–339, 2015.
[13] D. Applebaum, Lévy Processes and Stochastic Calculus, vol. 93 of Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, UK, 2004.
[14] G. Peskir and A. Shiryaev, Optimal Stopping and Free-Boundary Problems, Birkhäuser Verlag, Basel, Switzerland, 2006.
[15] J. Yong and X. Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, vol. 43 of Applications of Mathematics, Springer, New York, NY, USA, 1999.
[16] R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes II: Applications, vol. 6, Springer-Verlag, Berlin, 2001.

Copyright

Copyright © 2018 Xiujuan Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
