The term “learning” is often used to refer to a generally stable behavioral change resulting from practice. It is a fundamental biological capacity that is far more developed in humans than in other living beings. In an animal or a human being, the learning process may often be viewed as a series of choices among several possible responses. Here, we analyze a specific type of human learning process related to gambling, in which a subject inserts a poker chip to operate a two-armed bandit device and then presses one of two keys. By means of an electromagnet, one or more poker chips are delivered to the subject in a container located at the center of the apparatus. If a chip is delivered, the trial is counted as a win; otherwise, it is counted as a loss. The goal of this paper is to examine the subject’s behavior in such situations and to provide an appropriate mathematical model for it. The existence of a unique solution to the suggested human learning model is examined using relevant fixed point results.

1. Introduction

Learning is a fundamental biological capacity that is much more evolved in humans than in any other living being. The central question in the study of learning is how the various forms of learning take place in the human brain and body. This question was first formulated explicitly in the psychology of learning, with additional input from other psychological disciplines and from the adjacent fields of sociology, pedagogy, and biology, including contemporary brain science.

In modern mathematical studies of learning, researchers concluded that a basic learning experiment is compatible with any stochastic process, so this idea is not new (for details, see [1]). However, after 1950, two critical features emerged, mainly in the research initiated by Bush, Estes, and Mosteller. First, the egalitarian essence of the learning method was a core feature of the developed models. Second, these frameworks were studied and applied in areas that did not conceal their quantitative aspects.

Several studies on human actions in probability-learning scenarios have produced different results (for details, see [2–5]).

In 2019, Turab and Sintunavarat [6, 7] proposed a functional equation to examine the experimental work of Bush and Wilson [8] on a paradise fish. In this experiment, the fish could swim to either side (right or left) of the far end of the tank.

In [9], the authors addressed a traumatic avoidance learning experiment for normal dogs suggested by Solomon and Wynne [10]. They examined the psychological responses of 30 dogs enclosed in a small steel grid cage and proposed a mathematical model. The existence and uniqueness of a solution to the suggested avoidance learning model were investigated using an appropriate fixed point method.

For research in this area, especially on two-choice behavior, we refer to [11–13] and the references therein. It is worth noting that most of the studies of animal behavior in a two-choice situation discussed above have focused only on the animals’ approach toward an inevitable conclusion. Bush and Wilson [8], on the other hand, divided such responses into four categories depending on the food source and the side chosen (right-reward, right-nonreward, left-reward, and left-nonreward).

In this work, following Turab and Sintunavarat [6, 9] and the ideas discussed in [8, 14], we aim to discuss the two-armed bandit experiment proposed by Goodnow and Pettigrew [15] and to propose a suitable mathematical model. We evaluate our findings under experimenter-subject-controlled events to assess the feasibility of the suggested model. The existence of a unique solution to the proposed model is examined by using an appropriate fixed point theorem. Finally, we raise some open problems for interested readers.

2. A Two-Armed Bandit Experiment

In [15], Goodnow and Pettigrew presented an experiment related to gambling theory. This gambling activity is played with poker chips worth one penny each (see Figure 1). The subject (S) is given 200 chips by an experimenter (E). He/she inserts one of these chips into the machine and pushes one of two buttons. When the bet is successful, a chip drops into the payoff box with a clatter. The payoff box has a glass face, so S can see the heap of chips he/she has won. The subject is not permitted to take the chips out of this box until the end of the experiment. Whatever the outcome of the bet, the machine becomes unusable for several seconds between trials, and S waits until two signal lights and a loud buzz indicate that the device is ready to take the next bet. The apparatus is programmed so that inserting a chip before the device is ready has no effect.

When the subject S inserts a chip (upper center light) and presses a key (lower left or lower right), the lights on the face of the machine flash on successively (upper outer lights in Figure 1). These lights parallel the lights on the control machine, which E operates in an adjacent room. The control machine also includes a master switch to turn the device on or off, along with a key that, when pushed, makes the machine eject a chip into the payoff box. A one-way mirror enables E to view S’s activities from the control room.

2.1. Procedure

The procedure and instructions for the task were given to S and E. S was told that he/she was playing for cash and would be paid the difference between the number of wins and the number of losses. Each S was allowed 120 trials, divided into 12 blocks of 10 trials each. The payoff probabilities in the task were 50 : 50, 70 : 30, and 90 : 10. When the experiment was completed, S was asked the following questions: (1) How did you decide which alternative to choose? (2) What did you think about the strategy of always betting on one key?
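Question (2) can be made concrete with a small calculation. The sketch below compares the expected number of wins over 120 trials for a subject who always bets on the more likely key with one who matches the payoff probabilities. It assumes that exactly one key pays off on each trial and that the subject's choice is independent of the payoff; this payoff rule and the name `expected_wins` are illustrative assumptions, not details taken from [15].

```python
def expected_wins(p_left_pays, p_choose_left, trials=120):
    """Expected number of wins over `trials` bets.

    Assumptions of this sketch (not from the source): exactly one key
    pays on each trial, the left key with probability p_left_pays, and
    the subject presses left with probability p_choose_left,
    independently of which key pays.
    """
    p_win = (p_left_pays * p_choose_left
             + (1.0 - p_left_pays) * (1.0 - p_choose_left))
    return trials * p_win


for p in (0.5, 0.7, 0.9):
    always = expected_wins(p, 1.0)   # always bet on the left key
    matching = expected_wins(p, p)   # choose left with probability p
    # Net payment is wins minus losses, i.e. 2 * wins - trials.
    print(p, round(always, 1), round(matching, 1), round(2 * always - 120, 1))
```

Under these assumptions, always betting on the more likely key never does worse in expectation than probability matching, and the two strategies coincide only in the 50 : 50 condition.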

2.2. Results

The results were described in terms of the mean proportion of choices of one alternative, namely pushing the left button, which was the alternative with the greater likelihood of payoff in every condition except the 50 : 50 scenario. The findings are presented in Table 1.

3. Mathematical Modeling of the Two-Armed Bandit Experiment

In the above experiment, the main interest lies in the behavior of the subject S, who presses the right or left button and receives the reward in the form of a poker chip. In our view, if a subject chooses the reward side, there would be an occurrence of alternative and if the subject made a move to the other side, then there will be an occurrence of alternative . Thus, from a mathematical point of view, there are four possible events, depending on the action of the subject and the reward. These events are listed in Table 2.

Depending on the action of the subject and getting the chance of the reward, we have the following four events (see Table 3).

The probability of the outcomes and are and , respectively, where ]. The experimental pattern asks for the outcomes of the responses (whether the subject get the reward or not), trials’ fixed proportion of ]. Therefore, we get the event probabilities stated below (see Table 4).

We define as the learning rate parameters and their values can be recognized as a measure of the ineffectiveness of the corresponding events in altering the response probability.

If, on some trial, is the possibility of response with outcome and is fulfilled, the next possibility of with outcome will be , and if is achieved with outcome then the new probability would be with the event probability Similarly, if is performed with outcomes and then the new probabilities of are and , with the event probabilities and , respectively. For the four events , we can define the transition operators ] as for all ].
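To illustrate how transition operators of this kind act on a response probability, the following sketch simulates one subject over 120 trials. It assumes the standard Bush–Mosteller linear form, in which an event moves the probability x toward 0 or 1 and a learning rate parameter close to 1 means the event is ineffective. The direction assigned to each event, the parameter values, and the names `linear_operator` and `simulate_subject` are hypothetical choices made for this sketch and are not the operators of the paper.

```python
import random


def linear_operator(x, alpha, limit):
    """Move x toward `limit` (0 or 1). alpha in [0, 1] measures how
    ineffective the event is: alpha = 1 leaves x unchanged."""
    return alpha * x + (1.0 - alpha) * limit


def simulate_subject(p_left_pays, alphas, x0=0.5, trials=120, seed=0):
    """Return the path of x = P(press left) over the trials.

    `alphas` maps each of the four events to a hypothetical learning
    rate parameter. Assumed here: exactly one key pays on every trial.
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(trials):
        pressed_left = rng.random() < x
        left_pays = rng.random() < p_left_pays
        if pressed_left and left_pays:        # left, reward: toward left
            x = linear_operator(x, alphas["left_reward"], 1.0)
        elif pressed_left:                    # left, nonreward: toward right
            x = linear_operator(x, alphas["left_nonreward"], 0.0)
        elif not left_pays:                   # right, reward: toward right
            x = linear_operator(x, alphas["right_reward"], 0.0)
        else:                                 # right, nonreward: toward left
            x = linear_operator(x, alphas["right_nonreward"], 1.0)
        path.append(x)
    return path


alphas = {"left_reward": 0.9, "left_nonreward": 0.95,
          "right_reward": 0.9, "right_nonreward": 0.95}
path = simulate_subject(0.7, alphas)
print(len(path), path[-1])
```

Because each update is a convex combination of x and a limit point in [0, 1], the simulated probability never leaves the unit interval, which is the property that lets such operators map [0, 1] into itself.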

By considering the work presented in [6, 8, 9] and the above transition operators with their corresponding probabilities and events given in Table 4, we introduce the following functional equation, which can discuss all the aspects of the two-armed bandit model.

Fixed point theory began in the second half of the nineteenth century as a method of using successive approximations to demonstrate the existence and uniqueness of solutions to ordinary differential and integral equations. It combines basic and applied analysis, geometry, and topology. A fixed point theoretic viewpoint can already be seen in Picard’s work, which is fundamental to metric fixed point theory. Nevertheless, the principle is credited to the Polish mathematician Banach, who abstracted the underlying ideas into a framework that can be applied to establish the existence of a unique solution in a broad range of applications beyond differential and integral equations. It has been extended and generalized in numerous directions (for details, see [16–18]). We refer the reader to [19–21] for further information on fixed point theory and its applications in various spaces.

The following result will be required in the sequel.

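Theorem 1 (see [22]). Let $(X, d)$ be a complete metric space and $T : X \to X$ be a Banach contraction mapping (BCM for short), that is, $d(Tx, Ty) \le k\, d(x, y)$ for some $k \in [0, 1)$ and for all $x, y \in X$. Then $T$ has exactly one fixed point. Furthermore, the Picard iteration $\{x_n\}$ in $X$, defined by $x_n = T x_{n-1}$ for all $n \in \mathbb{N}$, where $x_0 \in X$, converges to the unique fixed point of $T$.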
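The Picard iteration in Theorem 1 can be sketched numerically. The example below uses a simple contraction on the real line, T(x) = 0.5x + 1, whose contraction constant is 0.5 and whose fixed point is 2. This map, the stopping tolerance, and the function name `picard_iteration` are chosen only for illustration and do not appear in the paper.

```python
def picard_iteration(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_n = T(x_{n-1}) until successive iterates differ by
    less than `tol`. Returns the last iterate and the number of steps."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


def T(x):
    # Illustrative contraction: |T(x) - T(y)| = 0.5 * |x - y|.
    return 0.5 * x + 1.0


fixed_point, steps = picard_iteration(T, x0=0.0)
print(fixed_point, steps)

# A different starting point leads to the same limit, as the theorem predicts.
other_point, _ = picard_iteration(T, x0=10.0)
print(other_point)
```

The stopping rule based on successive iterates is a practical choice for this sketch; the theorem itself guarantees convergence from any starting point but does not prescribe a tolerance.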

4. Existence and Uniqueness Results

We let For the rest of this article, represents the class with consisting of all real-valued continuous functions which satisfy the following relation

Clearly, is a Banach space with for all .

Following that, we can rewrite the functional equation (2) as where is an unknown function, .

Theorem 2. For and with where If there is a such that is -invariant, that is, , where is defined for each as for all then is a BCM.

Proof. Let . For each pair of distinct points , we obtain By applying the definition of the norm (5), we obtain where is defined in (7). This gives that As a result of we can claim that is a BCM with the metric imposed by .☐

From Theorem 2, we get the following conclusion about the uniqueness of the solution of the functional equation (6).

Theorem 3. The stochastic equation (6) has a unique solution with where is defined in (7). Assume that there is a such that is -invariant, that is, , where is defined for each as for all Furthermore, the following iteration in defined by converges to the unique solution of (12).

Proof. We reach the conclusion of this theorem by combining the Banach fixed point theorem with Theorem 2.☐

The following corollaries arise from the preceding findings.

Corollary 4. For and with where If there is a such that is -invariant, that is, , where is defined for each as for all then is a BCM.

Corollary 5. The stochastic equation (6) has a unique solution with where is defined in (7). Assume that there is a such that is -invariant, that is, , where is defined for each as for all Furthermore, the iteration in () defined by converges to the unique solution of (12).

5. A Certain Case with Experimenter-Subject-Controlled Events

The analysis of any experiment rests on assumptions. Accordingly, experiments are classified as contingent or noncontingent, based on the occurrence of the outcomes. Contingent experiments correspond to experimenter-subject-controlled events, whereas noncontingent experiments correspond to experimenter-controlled events.

In the previous models of imitation problems, such as the T-maze experiments with fish and dogs (see [6, 9]), it was already noted that such experiments require a contingent approach, since the result of each trial depends entirely on the subject’s choice. Thus, such models require experimenter-subject-controlled events. The two responses and along with outcomes and are choosing the right or left side or pushing the right or left button, which coincides with rewarding and non-rewarding or correct and incorrect, respectively. Now we define the probabilities and which indicate the conditional probability of outcomes and of the given alternatives and respectively. Under these conditions, we obtain Table 5.

We have the following functional equation from the data given above: where is an unknown function, and . We shall begin with the following finding.

Theorem 6. For and with where If there is a such that is -invariant, that is, , where is defined for each as for all then is a BCM.

Proof. Let . For each pair of distinct points , we obtain By applying the definition of the norm (5), we obtain where is defined in (19). Thus, we have As a result of one can see that is a BCM.☐

From Theorem 6, we obtain the following conclusion about the unique solution of (18).

Theorem 7. The stochastic equation (18) has a unique solution with Assume that there is a such that is -invariant, that is, , where is defined for each as for all Furthermore, the iteration in defined by converges to the unique solution of (24).

Proof. The conclusion of this theorem follows by combining Theorem 6 with the Banach fixed point theorem.☐

6. Conclusion

In this work, we have discussed a special type of stochastic process related to the two-armed bandit experiment [15], which plays a vital role in observing the subject’s behavior in a two-choice situation. We reviewed the subject’s responses under such conditions and provided a mathematical model for them. The Banach fixed point theorem was used to establish the existence of a unique solution to the two-armed bandit learning model. We investigated the adaptability of the proposed model by subjecting it to experimenter-subject-controlled events. Moreover, the presented approach is straightforward and easy to verify. Thus, the proposed approach can be used to investigate further psychological learning experiments related to animals and humans in the future.

For interested readers, we propose the following open problems.

Question 1. If a subject does not press any button on a specific trial, how can such an event be described by a model?

Finally, we also leave the stability problem (for details, see [23–27]) of the stochastic equation given below as an open problem: where and is an unknown function.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

All authors contributed equally to the manuscript and typed, read, and approved the final manuscript.


Acknowledgments

This work was funded by the University of Jeddah, Saudi Arabia, under grant No. (UJ-21-DR-93). The authors, therefore, acknowledge with thanks the university’s technical and financial support.