Special Issue: Recent Advances in Information Technology
Research Article | Open Access
Yuhuang Hu, Chu Kiong Loo, "A Generalized Quantum-Inspired Decision Making Model for Intelligent Agent", The Scientific World Journal, vol. 2014, Article ID 240983, 8 pages, 2014. https://doi.org/10.1155/2014/240983
A Generalized Quantum-Inspired Decision Making Model for Intelligent Agent
A novel decision making model for intelligent agents using a quantum-inspired approach is proposed. A formal, generalized solution to the problem is given. Mathematically, the proposed model is capable of modeling higher dimensional decision problems than previous research. Four experiments are conducted, and both the empirical results and the proposed model's results are given for each experiment. The experiments show that the results of the proposed model agree closely with the empirical results. The proposed model provides a new direction for researchers to address the cognitive basis of intelligent agent design.
1. Introduction
A decision making model is crucial to building a successful intelligent agent. The study of decision making models therefore plays a key role in improving the performance of intelligent agents. Traditionally, decision making models are represented and implemented with Bayesian or Markov processes [1, 2]. However, traditional methods usually introduce bias at the design stage, and many empirical findings in cognitive science in recent years indicate that humans often violate the “rational decisions” produced by traditional methods [3–6]. Hence, many researchers in cognitive science, psychology, neuroscience, and artificial intelligence have proposed different explanations to complement traditional methods [7–9]. However, none of these explanations resolves all human violations of “rational decisions” completely.
Two major violations of “rational decisions” have been found in previous studies: violations of the “sure thing principle” and “order effects.” The sure thing principle claims that a person should prefer $A$ over $B$ if $A$ is always better than $B$ in world $W$. This principle was tested by Tversky and Shafir [10] in a simple two-stage gambling experiment, which showed a violation of the principle. The order effect means that human decision patterns violate a fundamental requirement of classical probability theory: $P(A \cap B) = P(B \cap A)$, which implies $P(A)P(B \mid A) = P(B)P(A \mid B)$ according to Bayes' rule. From the perspective of some game theorists, these violations of “rational decisions” are “wrong decisions” or “stupid decisions.” For example, a violation of the sure thing principle is, theoretically, an obviously wrong decision. However, people often choose differently even when they completely understand the risks and benefits in a given scenario. The order effect challenges the classical theory even more fundamentally: the commutative property is not followed in the human decision making process, which means that analyses based on classical probability theory introduce serious bias into models of decision making. Therefore, the traditional methods should be enhanced or replaced for modeling and describing human decision making.
Recently, quantum mechanics inspired explanations of these violations of rationality have been proposed and tested [11–13]. They show that a quantum explanation is able to account for the two violations mentioned above. Essentially, the noncommutative property and the superposition principle of quantum-mechanics-inspired probability theory are natural tools for explaining the violations. Furthermore, the approach is also capable of producing “human-like” decisions in simulation [11–14]. In this paper, “human-like” means that the intelligent agent makes decisions similar to those a human would make in the same scenario. On the other hand, previous research did not model the complex environments and decision spaces that would be needed for practical implementation on an intelligent agent.
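The role of noncommutativity can be illustrated with a small numerical sketch. It is not taken from the cited models: the two measurement axes, the 45 degree angle, and the function names are assumptions chosen only for illustration. Projecting a state onto two different axes in sequence gives a probability that depends on the order, which a classical joint probability cannot do.

```python
import math

def project(state, axis):
    """Project a 2-D real state onto a unit-length axis."""
    dot = state[0] * axis[0] + state[1] * axis[1]
    return (dot * axis[0], dot * axis[1])

def sequence_probability(state, first, second):
    """Probability of 'yes' to `first` followed by 'yes' to `second`."""
    after = project(project(state, first), second)
    return after[0] ** 2 + after[1] ** 2

# Hypothetical setup: the state starts aligned with question A's axis,
# and question B's axis is rotated by 45 degrees.
axis_a = (1.0, 0.0)
axis_b = (math.cos(math.pi / 4), math.sin(math.pi / 4))
state = (1.0, 0.0)

p_a_then_b = sequence_probability(state, axis_a, axis_b)  # about 0.50
p_b_then_a = sequence_probability(state, axis_b, axis_a)  # about 0.25
```

The two orders give different values because the two projections do not commute.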
In this paper, a generalized quantum-inspired decision making model (QDM) is proposed. QDM extends previous research findings and models more complicated decision spaces. Four experiments are conducted to verify QDM, and the experiment results agree with the empirical results almost perfectly. The cognitive biases in the decision making process are resolved in the experiments. QDM is expected to help researchers model real-life decision making processes and improve the ability of current intelligent agents to generate “human-like” decisions.
This paper is based on three hypotheses. First, because QDM is capable of explaining human violations of “rational decisions,” the authors believe that QDM can produce “human-like” decisions. Second, all decisions in a scenario can be quantified. Third, some parameters are predefined because the paper mainly discusses the decision making model. The paper offers preliminary results on QDM and its applications. The representation introduced in this paper has its own advantages and limitations. In the future, more theoretical work on QDM is needed, as is a more elegant representation of QDM.
The paper is organized as follows. Section 2 presents the methodology and mathematical description. Section 3 presents experiment results that verify the model, and Section 4 concludes the paper and discusses future work.
2.1. Environment Setting
In this section, the paper defines two types of environment for further discussion. Only two players are considered, in order to simplify the scenario and establish a fundamental analysis of the topic.
2.1.1. First Type Two Players Game
First Type Two Players Game (FTTP) contains two players: Player 1 and Player 2. In this context, at least one of the players is an intelligent agent that is able to provide and execute the necessary functionalities and make decisions. Mathematically, let $D_1$ be Player 1's decision space and $D_2$ be Player 2's decision space. Elements of $D_1$ and $D_2$ can take any semantic form: names, codes, and so on.
This type of game is used as the main setting in the following sections to describe QDM.
2.1.2. Second Type Two Players Game
Second Type Two Players Game (STTP) contains two players: Player 1 and Player 2. In this context, at least one of the players is an intelligent agent that is able to provide and execute the necessary functionalities and make decisions. Both players share the same decision space $D$. Similarly, the elements of $D$ can take any semantic form. During the game, Player 1 makes a decision from $D$, and Player 2 has to select an appropriate decision from $D$ in response.
Players receive rewards for performing any decision. A payoff matrix that assigns rewards to each decision for both players is defined. In both FTTP and STTP, the payoff matrix for the two players must be produced before the game starts. The payoff a player receives is determined by a utility function or utility vector. The elements of the payoff matrix are not necessarily real numbers. However, the payoff has to be a real number because of a limitation of the Hamiltonian operator (explained in Section 2.4).
2.2. Two-Stage Quantum Decision Model
Two-Stage Quantum Decision Model assumes that Player 1 makes a decision $j$ and then Player 2 has to react with an appropriate decision $k$.
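Let $\psi$ be the state vector, defined as
$$\psi = [\psi_{11}, \psi_{12}, \ldots, \psi_{nm}]^{T},$$
where $j = 1, \ldots, n$, $k = 1, \ldots, m$, $n$ and $m$ are the numbers of decisions available to Player 1 and Player 2, and $\psi_{jk}$ means the state of performing decision $k$ given $j$. Furthermore, the state vector satisfies $\sum_{j,k} |\psi_{jk}|^{2} = 1$, and $|\psi_{jk}|^{2}$ is the probability of state $\psi_{jk}$ according to quantum mechanics. Note that the order of the elements matters in this paper.
Before the game starts, set the initial state as
$$\psi_0 = \frac{1}{\sqrt{nm}} [1, 1, \ldots, 1]^{T},$$
where there are $nm$ elements in the state vector.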
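As a minimal sketch of this setup, assume the initial state spreads its amplitude equally over every pair of a Player 1 decision and a Player 2 decision. The function names and the sizes `n` and `m` (numbers of decisions for Player 1 and Player 2) are illustrative assumptions, not notation confirmed by the paper.

```python
import math

def initial_state(n, m):
    """Uniform amplitudes over all n*m (Player 1, Player 2) decision pairs."""
    size = n * m
    return [1.0 / math.sqrt(size)] * size

def probabilities(state):
    """Probability of each element: the squared magnitude of its amplitude."""
    return [abs(amplitude) ** 2 for amplitude in state]

psi0 = initial_state(2, 2)   # e.g. a 2x2 game such as Prisoner's Dilemma
probs = probabilities(psi0)  # four equal probabilities that sum to one
```

Before any evidence arrives, every combination of decisions is equally likely under this assumption.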
2.2.1. Stage One
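Assume Player 1 makes decision $j$ and Player 2 recognizes it successfully; the state vector $\psi_0$ is transformed to a state $\psi_1$ that rules out the other possibilities and retains only $\psi_{jk}$, $k = 1, \ldots, m$. The state $\psi_1$ is defined as follows:
$$\psi_1 = \frac{1}{\sqrt{m}} [0, \ldots, 0, 1, \ldots, 1, 0, \ldots, 0]^{T},$$
where the $m$ nonzero elements occupy the positions of $\psi_{j1}, \ldots, \psi_{jm}$.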
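Stage one can be sketched as follows, assuming the state vector is stored block by block (all Player 2 decisions for Player 1's first decision, then for the second, and so on) and that the retained block is renormalized to unit length. The function name and the 0-based index `j` are hypothetical.

```python
import math

def stage_one(n, m, j):
    """State after Player 1's decision j (0-based) has been observed.

    Only the m elements belonging to decision j keep a nonzero amplitude.
    """
    if not 0 <= j < n:
        raise ValueError("j must index one of Player 1's n decisions")
    amplitude = 1.0 / math.sqrt(m)
    state = [0.0] * (n * m)
    for k in range(m):
        state[j * m + k] = amplitude
    return state

psi1_first = stage_one(2, 2, 0)   # amplitudes only on the first block
psi1_second = stage_one(2, 2, 1)  # amplitudes only on the second block
```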
2.2.2. Stage Two
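According to the time-dependent Schrödinger equation, the time evolution is determined by (3), where $i$ is the imaginary unit, defined as $i = \sqrt{-1}$:
$$i \frac{d\psi}{dt} = H \psi. \tag{3}$$
The solution to (3) is
$$\psi_2 = e^{-iHt} \psi_1, \tag{4}$$
where $t$ is the time and the Hamiltonian operator $H$ is determined by the sum of two matrices, $H_A$ and $H_B$:
$$H = H_A + H_B. \tag{5}$$
The detailed description of the Hamiltonian operator can be found in Section 2.4.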
State $\psi_1$ is transformed to $\psi_2$ by applying (4) for a given time $t$. Note that (4) is not the conventional form of the solution to the time-dependent Schrödinger equation: in this paper, the Planck constant is omitted. And authors assume that and are associated with .
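A minimal sketch of the evolution step, under simplifying assumptions that are not from the paper: the Hamiltonian `h` is a hypothetical 2×2 real symmetric block whose square is the identity, and `mu` is an illustrative bias parameter. For such a matrix the exponential has the closed form exp(−iHt) = cos(t)·I − i·sin(t)·H, so no matrix-exponential routine is needed.

```python
import math

def rotation_block(mu):
    """Hypothetical 2x2 Hermitian block; its square is the identity."""
    scale = math.sqrt(1.0 + mu * mu)
    return [[mu / scale, 1.0 / scale],
            [1.0 / scale, -mu / scale]]

def evolve(h, psi, t):
    """Apply exp(-i*h*t) to psi. Valid only when h times h is the identity."""
    cos_t, sin_t = math.cos(t), math.sin(t)
    result = []
    for row in range(len(psi)):
        h_psi = sum(h[row][col] * psi[col] for col in range(len(psi)))
        result.append(cos_t * psi[row] - 1j * sin_t * h_psi)
    return result

# Start from equal amplitudes on Player 2's two decisions.
psi1 = [1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]
psi2 = evolve(rotation_block(0.5), psi1, math.pi / 2)
probs = [abs(amplitude) ** 2 for amplitude in psi2]  # about [0.9, 0.1]
```

With a positive `mu`, the probability mass moves toward the first decision, and the total probability stays at one because the evolution is unitary.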
2.3. One-Stage Quantum Decision Model
The previous section presented a decision making strategy based on Player 1's decision. In this section, a one-stage quantum decision model is described. This model does not require Player 1's decision as a reference and makes a decision directly:
$$\psi_2 = e^{-iHt} \psi_0. \tag{6}$$
This approach certainly produces a fuzzier decision; however, it is extremely important when Player 2 is not able to collect enough evidence to apply the two-stage QDM.
Equation (6) indicates that the state $\psi_0$ is directly transformed to $\psi_2$ without knowing $\psi_1$, which may seem unreasonable. In fact, (6) can be viewed as a summation over all possible $\psi_1$, owing to the superposition principle. The interpretation is as follows:
$$\psi_2 = e^{-iHt} \psi_0 = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} e^{-iHt} \psi_1^{(j)}, \tag{7}$$
where $\psi_1^{(j)}$ represents that Player 1 makes decision $j$.
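The superposition argument can be checked numerically. The sketch below assumes a uniform initial state over `n * m` elements and block-wise conditioned states with amplitude `1/sqrt(m)`; both the layout and the function names are illustrative assumptions. It verifies that an equally weighted sum of the conditioned states reproduces the uniform state, so by linearity evolving the uniform state is the same as evolving each conditioned state and summing.

```python
import math

def uniform_state(n, m):
    """Equal amplitude on every (Player 1, Player 2) decision pair."""
    return [1.0 / math.sqrt(n * m)] * (n * m)

def conditioned_state(n, m, j):
    """State in which Player 1's decision j (0-based) is known."""
    state = [0.0] * (n * m)
    for k in range(m):
        state[j * m + k] = 1.0 / math.sqrt(m)
    return state

def superpose(n, m):
    """Sum the conditioned states, each weighted by 1/sqrt(n)."""
    total = [0.0] * (n * m)
    for j in range(n):
        for index, amplitude in enumerate(conditioned_state(n, m, j)):
            total[index] += amplitude / math.sqrt(n)
    return total

mixed = superpose(4, 2)
direct = uniform_state(4, 2)
max_gap = max(abs(a - b) for a, b in zip(mixed, direct))  # numerically zero
```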
2.4. Hamiltonian Operator
According to quantum mechanics, the Hamiltonian operator in matrix form must be a Hermitian matrix to ensure that $e^{-iHt}$ is a unitary operator. Because of this property of Hermitian matrices, the payoff has to be a real number. The Hamiltonian is used to rotate the state vector toward the desired basis. A suggested construction of $H_A$ and $H_B$ is presented.
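The link between a Hermitian Hamiltonian and a unitary evolution can be written out in a few lines. The symbols $U$, $H$, and $\psi$ below are assumed notation for the evolution operator, the Hamiltonian, and the state vector; the derivation itself is the standard one and holds for any real time $t$.

```latex
\begin{aligned}
U &= e^{-iHt}, \qquad H^{\dagger} = H, \quad t \in \mathbb{R} \\
U^{\dagger} &= e^{+iH^{\dagger}t} = e^{iHt} \\
U^{\dagger}U &= e^{iHt}\, e^{-iHt} = I \\
\lVert U\psi \rVert^{2} &= \psi^{\dagger} U^{\dagger} U \psi = \lVert \psi \rVert^{2}
\end{aligned}
```

The last line is why the probabilities of the evolved state still sum to one.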
is a diagonal matrix where rotates the state to the desired decision according to Player 1's decision, and is defined as a edited Hadamard transform: where is same as the defined in Section 2.1.
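For reference, the standard (unedited) Hadamard matrix can be built with the Sylvester construction sketched below. This is only the textbook transform, not the paper's edited version, and the function name is an assumption. The construction yields sizes that are powers of two, and each matrix is real and symmetric, hence Hermitian.

```python
def hadamard(order):
    """Sylvester construction of the 2**order x 2**order Hadamard matrix."""
    matrix = [[1]]
    for _ in range(order):
        top = [row + row for row in matrix]
        bottom = [row + [-value for value in row] for row in matrix]
        matrix = top + bottom
    return matrix

h2 = hadamard(1)  # [[1, 1], [1, -1]]
h4 = hadamard(2)  # 4x4 matrix with entries +1 and -1
is_symmetric = all(h4[r][c] == h4[c][r] for r in range(4) for c in range(4))
```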
An adjust matrix is defined as where is the received payoff of Player 2 given according to utility function or utility vector of Player 2.
Therefore, is defined as where is a scalar and “” represents entrywise product.
exists in STTP only. In FTTP, is set as null matrix. Cognitive Science findings suggest that if Player 1 chooses an action, Player 1 would tend to think that Player 2 will choose the same decision; is constructed in a special way. Let .
For each , it follows certain rules.(i) is a symmetric matrix.(ii)The th row and th column of must be full of 1 s.(iii)Other than the th row, each row contains positive and negative .
Let be the initial matrix. represents null matrix. Let the index of be and the index of be . Replace 0 in as the corresponding element in using the following relationship:
Employing (11) to every . Normalizing the final product from above, where is a scalar and is a constant.
2.5. Payoff Matrix and Utility Function
As discussed previously, the payoff matrix and the corresponding utility function/vector must be produced in advance, and they affect the result fundamentally. Some suggested settings are presented in this section.
The concepts of payoff matrix and utility function/vector are borrowed from Game Theory and are useful for representing a decision space in two dimensions. For certain purposes, the payoff matrix can be abstracted and estimated from the environment. The utility function is used to calculate the expected payoff of a player. There are many ways to implement this function in Game Theory and reinforcement learning. The utility function/vector may be learned during a training process. A reliable utility function/vector would increase the robustness of QDM.
Usually, the payoff matrix is easy to define or estimate. On the other hand, although the utility function is well defined in Game Theory, the payoff actually received differs from the mathematical formalization. For example, a well-known assumption in Game Theory is that every participant in the game is “evil.” Altruism, an important factor in human nature, is by contrast rarely mentioned. Introducing an “altruistic” factor to adjust the utility function may help the model produce more “human-like” decisions.
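One possible way to add an altruistic factor is sketched below. The linear form, the `altruism` weight, and the payoff numbers are all hypothetical choices for illustration; the paper does not specify a formula.

```python
def altruistic_utility(own_payoff, other_payoff, altruism=0.0):
    """Hypothetical utility: own payoff plus a weighted share of the other's.

    altruism = 0.0 gives a purely self-interested player.
    """
    return own_payoff + altruism * other_payoff

# Illustrative payoffs: defecting against a cooperator pays (5, 0),
# mutual cooperation pays (3, 3).
selfish_defect = altruistic_utility(5, 0)
selfish_cooperate = altruistic_utility(3, 3)
caring_defect = altruistic_utility(5, 0, altruism=0.8)
caring_cooperate = altruistic_utility(3, 3, altruism=0.8)
```

With these numbers a self-interested player prefers to defect, while a sufficiently altruistic player values mutual cooperation more.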
3. Experiment Results
3.1. Prisoner’s Dilemma
Prisoner’s Dilemma is a canonical Game Theory problem that has been used to discuss and analyze human behavior and decision making. The payoff matrix is described in Table 1. In standard Game Theory, the Nash Equilibrium is for both parties to defect. However, empirical studies suggest otherwise.
Table 2 presents several well-known empirical studies on Prisoner’s Dilemma. With the proposed model, the experiment result shows that the quantum-inspired decision making model matches the Prisoner’s Dilemma findings almost perfectly.
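The Nash Equilibrium claim can be checked with a short best-response test. The payoff values below are hypothetical textbook-style numbers (higher is better) and are not the values of Table 1; the function and variable names are likewise assumptions.

```python
from itertools import product

ACTIONS = ("defect", "cooperate")

# Hypothetical payoffs: (Player 1 payoff, Player 2 payoff) for each pair.
PAYOFFS = {
    ("defect", "defect"): (1, 1),
    ("defect", "cooperate"): (5, 0),
    ("cooperate", "defect"): (0, 5),
    ("cooperate", "cooperate"): (3, 3),
}

def is_nash(action1, action2):
    """True if neither player gains by changing only their own action."""
    payoff1, payoff2 = PAYOFFS[(action1, action2)]
    best1 = all(PAYOFFS[(alt, action2)][0] <= payoff1 for alt in ACTIONS)
    best2 = all(PAYOFFS[(action1, alt)][1] <= payoff2 for alt in ACTIONS)
    return best1 and best2

equilibria = [pair for pair in product(ACTIONS, repeat=2) if is_nash(*pair)]
# equilibria == [("defect", "defect")]
```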
The experiment is set as follows.(i)The state vector , where 1 represents “defect” and 2 represents “cooperate.”(ii)Initial state .(iii)If Player 1 chooses “defect,” the state vector changes to ; if Player 1 chooses “cooperate,” the state vector changes to .(iv)Rotation matrix is equal to (v)Set and with .
By performing the settings above, the experiment concludes the following result.(i)By known Player 1 choosing “defect,” the probability vector is for the where indicates the probability of choosing according to .(ii)By known Player 1 choosing “cooperate,” the probability vector is for the .(iii)By unknown Player 1's decision, the probability vector is for the .
Therefore, the probability of Player 2's decision, in this case, “defect,” is for the (known “defect”, known “cooperate”, unknown). The average result of empirical studies is . The model produced the similar result to average result of empirical studies in Table 2.
3.2. Splitting Money Game
Splitting Money Game is also a frequently used example in Game Theory. The game is described as follows. You and your friend are splitting 7 dollars. Your friend makes you an offer between 0 and 7 dollars, such as 3 dollars or 5 dollars. If you accept the offer, you receive that amount, and your friend takes the rest. However, if you reject the offer, you and your friend both receive nothing, and the money is donated. The payoff matrix is shown in Table 3.
An anonymous online survey of this game was conducted and received 302 responses. The result is shown in Table 4.
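The rules of the game translate directly into a payoff function. The sketch below assumes whole-dollar offers from 0 to 7; the names `split_payoff` and `payoff_table` are illustrative, and the table it builds is derived from the rules above rather than copied from Table 3.

```python
TOTAL = 7  # dollars to split

def split_payoff(offer, accept):
    """Return (your payoff, your friend's payoff) for a given offer."""
    if not 0 <= offer <= TOTAL:
        raise ValueError("offer must be between 0 and 7 dollars")
    if accept:
        return (offer, TOTAL - offer)
    return (0, 0)  # rejected: both get nothing and the money is donated

payoff_table = {
    (offer, accept): split_payoff(offer, accept)
    for offer in range(TOTAL + 1)
    for accept in (True, False)
}
```

With whole-dollar offers there are eight possible offers and two responses, giving sixteen entries.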
The experiment is set as follows.(i)The state vector , where in , represents offer $, and represents “accept” or “reject”.(ii)Initial state .(iii)Stage vector for offer is .(iv)Rotation matrix is diagonal matrix : (v)Set utility vector as where the elements of are corresponding to in order and .
By performing the setting above, the experiment concludes the following results.(i)For known different offers, experiment produces a probability vector for choosing “accept” by Player 2: .(ii)For known different offers, experiment produces a probability vector for choosing “reject” by Player 2: .(iii)For unknown offer, the probability vector is for choosing “accept” and “reject,” respectively.
3.3. The Price Is Right?
The Price is Right is a game in which participants need to choose the same price as their opponent in order to win. The game is described as follows. Las Vegas proposed a new game. The dealer gives you four cards, each with a price on it; for example, card 1 is 1000$, card 2 is 2000$, and so on. Before the game starts, the dealer secretly writes down the price from one of the cards and seals it in an envelope; a witness makes sure nobody can touch the envelope during the game. When the game starts, you choose one of the cards. After you have made your choice, the witness opens the envelope and the dealer judges the result according to the following rules.
(1) If the price of the card you chose is the same as the price that was written, you win that amount of money. For example, if you choose the card with 1000$ and the dealer also wrote 1000$, you win 1000$.
(2) If the price of the card you chose is different from the price that was written, you lose and need to pay half of the difference. For example, if you choose the card with 1000$ but the dealer wrote 4000$, you need to pay 1500$ to the dealer.
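The two rules can be sketched as a payoff function for the player. The card prices follow the example in the text (1000$ to 4000$); the function name and the sign convention (a loss is a negative payoff) are assumptions.

```python
PRICES = (1000, 2000, 3000, 4000)  # one price per card

def player_payoff(chosen, written):
    """Payoff to the player: win the price on a match, else pay half the gap."""
    if chosen not in PRICES or written not in PRICES:
        raise ValueError("prices must come from the four cards")
    if chosen == written:
        return chosen
    return -abs(chosen - written) / 2

win = player_payoff(1000, 1000)   # 1000
loss = player_payoff(1000, 4000)  # -1500.0
```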
The experiment is set as follows.(i)The state vector , where in .(ii)Initial state .(iii)First stage vector for different is .(iv)Rotation matrix is sum of diagonal matrix and .(a)For , (b)For , (v)Set utility vector as where the elements of are corresponding to in order and .
By performing the setting above, the experiment concludes the following results.
(i) For choosing 1000$: 0.1512.
(ii) For choosing 2000$: 0.1664.
(iii) For choosing 3000$: 0.4572.
(iv) For choosing 4000$: 0.2252.
3.4. A Sheriff’s Dilemma
A Sheriff’s Dilemma is a classic Bayesian Game in Game Theory. A Bayesian Game introduces multiple payoff matrices, each with a corresponding probability, to describe the scenario. The game is described as follows. You, the sheriff, are facing a suspect. The suspect has a gun. You are pointing your guns at each other, and now you need to decide whether to shoot him (assume there is no way to talk). The suspect may be a criminal but may also be innocent. Here, let us say the chances are half and half, which means that you cannot really tell whether the suspect is a criminal or innocent. The criminal would rather shoot even if the sheriff does not, as the criminal would be caught if he does not shoot. The innocent suspect would rather not shoot even if the sheriff shoots. The payoff matrix is presented in Table 7.
Note to Table 7: the rows are the sheriff's decisions; the columns are the suspect's decisions.
An anonymous online survey of this game was conducted and received 89 responses. The result is shown in Table 8.
The experiment is set as follows.(i)The state vector , where in .(a)When , 1 represents “shoot” and 2 represents “not shoot” when suspect is innocent.(b)When , 3 represents “shoot” and 4 represents “not shoot” when suspect is a criminal.(c)When , 1 represents “shoot” and 2 represents “not shoot” for sheriff.(ii)Initial state .(iii)First stage vector for different is .(iv)Rotation matrix is a diagonal matrix : (v)Set utility vector as where the elements of are corresponding to in order and .
By performing the setting above, the experiment concludes the following results.
(i) When the suspect is known to shoot, the probability that the sheriff chooses to shoot is 0.8752.
(ii) When the suspect is known not to shoot, the probability that the sheriff chooses to shoot is 0.2929.
(iii) When it is unknown whether the suspect shoots, the probability that the sheriff chooses to shoot is 0.5827.
4. Conclusions and Future Works
This paper introduced a generalized quantum-inspired decision making model for intelligent agents. The proposed model was verified by four experiments. The model aims to provide a tool for intelligent agents to make “human-like” decisions instead of “machine-like” decisions. Even though this paper limits the setting to two players, two-dimensional decision spaces are in fact the foundation of multiagent environments. Furthermore, the presented model is able to model much more complex and larger decision spaces than previous research.
Several directions for future work are considered. First, the number of decisions does not always follow $2^n$, so how to disable one or more unnecessary dimensions is an important problem to study. Second, since the payoff matrix and utility function affect the results fundamentally, studying both may improve the performance of the model. Third, more social studies and empirical results on human decision making are needed for adjusting and improving the model. Fourth, the performance of the model in multiagent environments is worth studying.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors acknowledge a scholarship from the University of Malaya (Fellowship Scheme). The research is supported in part by HIR Grant UM.C/625/1/HIR/MOHE/FCSIT/10 from the University of Malaya.
References
[1] J. Tenenbaum and T. L. Griffiths, “Generalization, similarity, and Bayesian inference,” The Behavioral and Brain Sciences, vol. 24, no. 4, pp. 629–791, 2001.
[2] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, Springer, 1st edition, 2005.
[3] E. Shafir and A. Tversky, “Thinking through uncertainty: nonconsequential reasoning and choice,” Cognitive Psychology, vol. 24, no. 4, pp. 449–474, 1992.
[4] S. Li and J. Taplin, “Examining whether there is a disjunction effect in prisoner's dilemma games,” Chinese Journal of Psychology, vol. 44, no. 1, pp. 25–46, 2002.
[5] A. Tversky and D. Kahneman, “Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment,” Psychological Review, vol. 90, no. 4, pp. 293–315, 1983.
[6] B. W. Carlson and J. F. Yates, “Disjunction errors in qualitative likelihood judgment,” Organizational Behavior and Human Decision Processes, vol. 44, no. 3, pp. 368–379, 1989.
[7] A. Tversky and D. J. Koehler, “Support theory: a nonextensional representation of subjective probability,” Psychological Review, vol. 101, no. 4, pp. 547–567, 1994.
[8] R. S. Wyer Jr., “An investigation of the relations among probability estimates,” Organizational Behavior and Human Performance, vol. 15, no. 1, pp. 1–18, 1976.
[9] R. M. Hogarth and H. J. Einhorn, “Order effects in belief updating: the belief-adjustment model,” Cognitive Psychology, vol. 24, no. 1, pp. 1–55, 1992.
[10] A. Tversky and E. Shafir, “The disjunction effect in choice under uncertainty,” Psychological Science, vol. 3, no. 5, pp. 305–309, 1992.
[11] E. M. Pothos and J. R. Busemeyer, “A quantum probability explanation for violations of “rational” decision theory,” Proceedings of the Royal Society B, vol. 276, no. 1665, pp. 2171–2178, 2009.
[12] J. S. Trueblood and J. R. Busemeyer, “A quantum probability account of order effects in inference,” Cognitive Science, vol. 35, no. 8, pp. 1518–1552, 2011.
[13] J. S. Trueblood and J. R. Busemeyer, “A quantum probability model of causal reasoning,” Frontiers in Cognitive Science, vol. 3, article 138, 2012.
[14] J. R. Busemeyer and P. D. Bruza, Quantum Models of Cognition and Decision, Cambridge University Press, New York, NY, USA, 2012.
[15] R. T. A. Croson, “The disjunction effect and reason-based choice in games,” Organizational Behavior and Human Decision Processes, vol. 80, no. 2, pp. 118–133, 1999.
[16] J. R. Busemeyer, M. Matthew, and Z. Wang, “Quantum game theory explanation of disjunction effects,” in Proceedings of the 28th Annual Conference of the Cognitive Science Society, pp. 131–135, 2006.
Copyright © 2014 Yuhuang Hu and Chu Kiong Loo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.