Abstract

When presented with multiple choices, each of us has a preference; however, if we insist on satisfying only our own preference, we may suffer losses from conflicts that arise when other people make identical selections. Such a scenario applies when a choice cannot be divided into multiple pieces owing to the intrinsic nature of the resource. Earlier studies examined, in game-theoretic terms, how to conduct fair joint decision-making while avoiding decision conflicts when multiple players have their own deterministic preference profiles. However, probabilistic preferences appear naturally in relation to the stochastic decision-making of humans; therefore, we theoretically derive conflict-free joint decision-making that satisfies the probabilistic preferences of all individual players. To this end, we mathematically prove the conditions under which the deviation of the resultant chance of obtaining each choice from the individual preference profile (the loss) becomes zero; that is, all players are satisfied while conflicts are avoided. Furthermore, even in scenarios where zero-loss conflict-free joint decision-making is unachievable, we present approaches that achieve the theoretical minimum loss while ensuring conflict-free choices. Numerical demonstrations are presented with several benchmarks.

1. Introduction

Since the industrial revolution, we have witnessed important technological advances; however, we currently face issues such as the finiteness of resources, which has led to the need to develop strategies for competition or resource sharing [1, 2]. We must consider not only our own choices but also those of other people or entities. In this regard, decision conflicts are a critical issue that needs to be addressed. For example, assume that many people or devices attempt to connect to the same mobile network simultaneously. In this case, the wireless resources available per person or device become significantly smaller, resulting in poor communication bandwidth or even zero connectivity owing to congestion [3]. This example considers a case in which an individual suffers from choice conflicts with others but in which the division of resources among entities is allowed; in other cases, a separable allocation to multiple entities is not permitted. As another example, consider a draft in a professional football league. Each club has specific players it wants to pick in the draft; however, in principle, only a single club can sign a given player. Thus, decision-making conflicts must be resolved. These examples highlight the importance of achieving individual satisfaction while avoiding conflict.

Shapley and Scarf proposed the top trading cycle (TTC) allocation algorithm [4] to address this problem. A typical example is the house-allocation problem: each student ranks the houses in which they want to live from a list of houses, and the TTC allocates one house to each student. The solution provided by the TTC is known to be game-theoretically core-stable; that is, no group of students can swap houses among themselves and obtain an allocation better than the current one. Many other mechanisms originate from the TTC, including those that allow indifference in preference profiles [5–7] and those in which agents can be allocated fractions of houses [8].

Although the TTC works well when players have deterministic rankings, humans or artificial machines can also have probabilistic preferences. That is, unlike in the house-allocation problem, an agent with probabilistic preferences would not be satisfied by always receiving its top choice; instead, over repeated allocations, the distribution of outcomes should match its probabilistic preferences. Such probabilistic preferences appear naturally in relation to the stochastic decision-making of humans. The multiarmed bandit (MAB) problem [9, 10] encompasses probabilistic preferences.

In the MAB problem, a player repeatedly selects and draws one of several slot machines based on a certain policy. The reward of each slot machine is generated stochastically from a probability distribution that the player does not know a priori. The MAB problem aims to maximize the cumulative reward obtained by the player under such circumstances. At first, the player draws many machines, including those that do not generate large rewards, to accurately estimate the reward distribution that each machine follows; this is referred to as exploration. Once the player is confident in the estimates, however, repeatedly drawing the machine with the highest reward probability increases the cumulative reward; this is known as exploitation. To successfully solve the MAB problem, it is necessary to strike a balance between exploration and exploitation, especially in a dynamically changing environment; this is called the exploration-exploitation dilemma [11].

An example of probabilistic preferences can be observed in the context of the MAB problem. The softmax algorithm, a well-known stochastic decision-making method, is an efficient way to solve this problem [12]. If the empirical reward of machine $i$ at time $t$ is $\hat{\mu}_i(t)$, the probability that a player selects machine $i$ is
$$P_i(t) = \frac{\exp\left(\hat{\mu}_i(t)/\tau\right)}{\sum_{j}\exp\left(\hat{\mu}_j(t)/\tau\right)}.$$
Here, $\tau$ represents the temperature, which controls the balance between exploration and exploitation. Thus, at each time step, the player has a probabilistic preference about which machine to select and with what probability.
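As an illustration of how such a probabilistic preference is generated, the following Python sketch implements softmax selection over empirical reward estimates. The function and variable names (softmax_probabilities, estimated_rewards, tau) are ours and purely illustrative; they are not part of the original formulation.

import numpy as np

def softmax_probabilities(estimated_rewards, tau):
    # Convert empirical reward estimates into selection probabilities.
    # A small temperature tau yields near-greedy exploitation; a large tau
    # yields near-uniform exploration.
    z = np.asarray(estimated_rewards, dtype=float) / tau
    z -= z.max()                      # subtract the maximum for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()

# Example: three machines with empirical mean rewards 0.2, 0.5, and 0.4
probs = softmax_probabilities([0.2, 0.5, 0.4], tau=0.1)
arm = np.random.choice(len(probs), p=probs)   # stochastic selection following the preference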

Furthermore, we can extend the MAB problem to a case involving multiple players, which is called the competitive MAB problem [13, 14]. In a competitive MAB problem, when multiple players simultaneously select the same machine and the machine generates a reward, the reward is distributed among the players. The aim is to maximize the collective rewards of players while ensuring equality [15]. Decision conflicts inhibit players from maximizing the total rewards in such scenarios.

For example, wireless communications suffer from such difficulties: the simultaneous use of the same band by multiple devices degrades the performance of the individual devices, as shown in Figure 1. Players make probabilistic decisions to explore and exploit simultaneously because they do not know the speed of each band in advance (here, we assume that the speed changes probabilistically to reflect an uncertain environment). However, if they choose the same band, the channel is shared and the transmission speed decreases.

For solving competitive MAB problems, Chauvet et al. theoretically and experimentally demonstrated the usefulness of quantum entanglement to avoid decision conflicts without direct communication [16, 17]. A powerful aspect of entanglement is that the optimization of a reward by one player leads to the optimization of the total rewards for the team; this is not easily achievable because if everyone attempts to draw the best machine, it can lead to decision conflicts that will diminish the total rewards in the competitive MAB problem. Further, Amakasu et al. proposed utilizing quantum interference such that the number of choices can be extended to an arbitrary number while perfectly avoiding decision conflicts [18]. A more general example is multiagent reinforcement learning [19, 20]. In this case, multiple agents can learn individually and make probabilistic choices while balancing exploration and exploitation at each time step. Successfully integrating these agents and making cooperative decisions without selection conflicts can help accelerate learning [21].

Given such motivations and backgrounds, it is important to clarify how the probabilistic preferences of individual players can be accommodated while avoiding choice conflicts. The question is whether it is possible to satisfy all player preferences while eliminating decision conflicts; if so, what are the conditions for realizing such requirements?

This study theoretically clarifies the condition under which joint decision-making probabilities exist such that the preference profiles of all players are satisfied perfectly. In other words, we explicitly formulate the condition under which the loss (the deviation of the selection preference of each player from the resulting chance of obtaining each choice determined via the joint decision probabilities) becomes zero. In addition, we derive joint decision-making probabilities that minimize the loss even when this condition is not satisfied.

A related problem of satisfying the needs of multiple players is addressed by a class of games called satisfaction games [22, 23]. Here, a minimum utility threshold is defined for each player, and a joint strategy is called a satisfaction equilibrium if all players obtain utility above their thresholds. As in other game-theoretic formulations, the payoff, reward, or utility of each player in satisfaction games is defined by the joint actions; in our study, however, there is no such external payoff. Instead, we define satisfaction as the correspondence between the players' probabilistic preferences and the selection probability of each choice resulting from the joint selection. In addition, in satisfaction games, players learn the optimal action by observing the outcomes of the environment and updating their actions, whereas in our study, we consider joint selection resulting from deliberation by players cooperating with each other rather than learning over multiple trials.

The consensus reaching process is another field that is closely related to our research [24–26]. In this area, consensus is reached by first aggregating the preferences of multiple individual players using simple methods, such as weighted sums, to obtain a collective preference, then calculating the distance between the collective preference and the individual preferences, and finally adjusting the individual preferences through feedback [27]. The players continue this loop until they reach consensus. Meanwhile, our study focuses not on modifying individual preferences but on the aggregation of the preferences of multiple players in this process.

2. Theory

2.1. Problem Formulation

This section introduces the formulation of the study problem. To begin with, we examine cases where the number of players or agents is two: player A and player B. Each player chooses from the same set of choices, which are called arms in the literature on the MAB problem; the number of arms is a natural number greater than or equal to two.

Each player selects a choice probabilistically depending on his/her preference probability. Here, the preference of Player A is represented by

Similarly, the preference of player B is given by

These preferences have the typical properties of probabilities:

The upper side of Figure 2 schematically illustrates a scenario where players A and B each have their own preferences in a four-arm case.

Now, we introduce the joint selection probability matrix, which is given as

The off-diagonal element in row $i$ and column $j$ (with $i \neq j$) denotes the probability that player A selects arm $i$ and player B selects arm $j$. The diagonal terms are all zero because we consider only conflict-free choices. The summation of all elements is unity, and every element is nonnegative.

Our interest is to find a joint selection probability matrix that meets the preferences of both players A and B. To incorporate this perspective, we formulate the degree of satisfaction of the players in the following way. By summing the elements of the joint selection probability matrix row by row, a preference profile is obtained; the sum of row $i$ represents the probability that player A selects arm $i$ as a result of the joint selection probability matrix. We call this profile the "satisfied preference" (its link with the preference will be discussed hereafter); by the definition of the joint selection probability matrix, the satisfied preferences sum to unity. This structure is illustrated by the red-shaded elements in the second row of the joint selection probability matrix shown in Figure 2.

Similarly, the satisfied preferences along the columns of the matrix represent the probabilities that player B selects each arm as a result of the joint selection probability matrix.

Our aim is to find the optimal joint selection probability matrix for which the players' preferences are the same as, or as close as possible to, the satisfied preferences for every arm and for both players.

To quantify the degree of satisfaction, we define a loss akin to an L2-norm: the sum of the squares of the gaps between the preferences and the satisfied preferences of the two players.
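As a concrete restatement of these definitions, the following Python sketch computes the satisfied preferences (row and column sums of the joint selection probability matrix) and the resulting loss. The array names and the example numbers are illustrative only; they do not reproduce the specific values used in Figure 2.

import numpy as np

def satisfaction_loss(R, pref_A, pref_B):
    # R[i, j]: probability that player A selects arm i and player B selects arm j.
    # The diagonal of R is zero (no conflicts) and all entries of R sum to one.
    satisfied_A = R.sum(axis=1)   # row sums: chance that player A obtains each arm
    satisfied_B = R.sum(axis=0)   # column sums: chance that player B obtains each arm
    return (np.sum((np.asarray(pref_A) - satisfied_A) ** 2)
            + np.sum((np.asarray(pref_B) - satisfied_B) ** 2))

# Illustrative two-arm example: both players prefer each arm with probability 0.5
R = np.array([[0.0, 0.5],
              [0.5, 0.0]])
print(satisfaction_loss(R, [0.5, 0.5], [0.5, 0.5]))   # prints 0.0 (zero loss)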

We consider the scenario shown in Figure 2 as an example. The preferences of players A and B are

Now, we consider the following joint selection probability matrix .

By taking the sum of the first row, we can calculate the probability that player A chooses arm 1 as a result of this matrix.

Similarly, if we calculate all satisfied preferences, we obtain

All satisfied preferences are equal to the original preferences of the players, which implies that the loss is zero. In other words, this joint selection probability matrix completely satisfies the preferences of both players.

In the following sections, we prove that the loss defined previously can be zero; in other words, perfect satisfaction for the players can indeed be realized under certain conditions. In addition, even in cases where zero loss is not achievable, we can systematically derive a joint selection probability matrix that ensures a minimum loss.

Now, we define the popularity of each arm and propose three theorems and one conjecture based on its values.

Definition 1. The popularity of an arm is defined as the sum of the preferences of players A and B for that arm. Because the preferences are probabilities, each popularity lies between zero and two. Hereafter, we refer to the smallest achievable value of the loss as the minimum loss. Meanwhile, the loss function can be defined in other forms, such as the Kullback–Leibler (KL) divergence, which is also known as relative entropy [28]. Extended discussions on such metrics are left for future studies; we consider that the fundamental structure of the problem under study would not change significantly.
Key notations adopted in this paper are summarized in Table 1.
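In code, the popularity and the zero-loss condition established by Theorems 1 and 2 below can be checked directly. The following Python sketch assumes the two-player setting; the function names are illustrative.

import numpy as np

def popularity(pref_A, pref_B):
    # Popularity of each arm: the sum of the two players' preferences for it.
    return np.asarray(pref_A, dtype=float) + np.asarray(pref_B, dtype=float)

def zero_loss_achievable(pref_A, pref_B, tol=1e-12):
    # Zero loss is achievable if and only if every popularity is at most one
    # (Theorem 1); if any popularity exceeds one, it is not (Theorem 2).
    return bool(np.all(popularity(pref_A, pref_B) <= 1.0 + tol))

print(zero_loss_achievable([0.5, 0.3, 0.2], [0.4, 0.3, 0.3]))   # True
print(zero_loss_achievable([0.8, 0.1, 0.1], [0.7, 0.2, 0.1]))   # False: arm 1 has popularity 1.5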

2.2. Theorem 1
2.2.1. Statement

Theorem 1. We assume that all popularities are smaller than or equal to one. Then, it is possible to construct a joint selection probability matrix that makes the loss equal to zero.

2.2.2. Auxiliary Lemma

To prove Theorem 1, we first prove a lemma in which the problem setting is modified slightly. In the original problem, we treated the preference of each player as a probability. In the modified problem, however, the preferences do not have to be probabilities as long as they are nonnegative, and the sum of the preferences of each player is given by a constant, which we refer to as the total preference. We put a hat over each notation to avoid confusion about whether we are referring to the original or the modified problem. We still refer to the resulting matrix as a joint selection probability matrix, although, technically, each entry is no longer a probability but the ratio at which each joint selection occurs, and the sum of all entries equals the total preference instead of 1. The other notations are defined in a manner similar to the original problem.

Lemma 1. We assume that all popularities are smaller than or equal to the total preference . Then, it is possible to construct a joint selection probability matrix that makes the loss equal to 0.

Theorem 1 is the special case of Lemma 1 in which the total preference equals one.

2.2.3. Outline of the Proof

We prove Lemma 1 using mathematical induction. In Section 2.2.4, we show that Lemma 1 holds for the base cases. In Sections 2.2.5 and 2.2.6, we suppose that Lemma 1 has been proven for a given number of arms and then show that Lemma 1 also holds when one more arm is added. In Section 2.2.5, we assume the existence of entries that satisfy certain conditions and prove that perfect satisfaction of the players is achievable using these entries. Then, in Section 2.2.6, we verify their existence.

2.2.4. When and When

When , is the only case in which the assumption of Lemma 1, i.e., , is fulfilled. Due to the constraint (21),

In this case, the following assignment makes the loss equal to zero.

When , the loss becomes zero when we set the values in the joint selection probability matrix as

The satisfied preferences for players A and B via result in and , which perfectly match the preferences of players A and B, respectively.

However, all elements in must be nonnegative.

To summarize these inequalities, the following inequality holds:

If a nonnegative that satisfies (30) exists, given in (28) is a valid joint selection probability matrix, which makes the loss zero. In other words, if we can prove that , which is in the range of (30), exists, regardless of and , Lemma 1 holds for .

To prove that a nonnegative that satisfies (30) always exists, we consider the following three cases depending on the value of the left-hand term of (30). In each case, we show that the three candidates for the right-hand term of (30) are greater than the left-hand term.

[Case 1] When .

These relations are evident from the definitions in (21). In addition, the remaining inequality holds because of the assumption of Lemma 1. Thus, (31) follows.

[Case 2] When .

First, the definition (21) guarantees an initial bound. Next, we use the assumption of Lemma 1 to obtain a second bound. Finally, combining these bounds yields (35).

[Case 3] When .

implies the following equation:

In addition, definition (21) guarantees . Finally,holds because . Hence,

From (31), (35), and (38) for any ,

Therefore, it is evident that that satisfies (30) exists. Hence, given in (28) is a valid joint selection probability matrix that makes the loss zero.

2.2.5. Induction

In this and the following section, we suppose that Lemma 1 has been proven to hold for a given number of arms. Here, we show that Lemma 1 also holds when one more arm is added. In the following argument, the arm with the lowest popularity is denoted as arm .

Now, we make the following assumption, which focuses only on the values in the th row or th column.

Assumption 1. There exist entries of the joint selection probability matrix that satisfy all of the following conditions (41)–(45).
(41) The sum of the th row is equal to player A's preference for that arm.
(42) The sum of the th column is equal to player B's preference for that arm.
(43) The sum of the th column without the th row is nonnegative.
(44) The sum of the th row without the th column is nonnegative.
(45) In the gray-shaded area of the joint selection probability matrix in (48), all remaining popularities are less than or equal to the remaining total preference.
First, we suppose that Assumption 1 is valid and show how to construct a joint selection probability matrix that makes the loss zero using these entries. Later, we prove that Assumption 1 is indeed correct.
Here, we show that, using the entries assumed to exist in Assumption 1, we can make the loss equal to zero. From conditions (41) and (42), the terms in the definition of the loss regarding that arm are zero.
Next, we consider the loss for the remaining part of the joint selection probability matrix, that is, the gray-shaded region described in (48). This remaining loss corresponds to a problem whose preference setting is described in Table 2, with the remaining preferences defined accordingly.
Now, we prove that the optimal remaining loss is zero. The remaining preferences fulfill the requirements of a preference profile; that is, they are nonnegative because of (43) and (44), and the total preferences of both players are the same. In this section, we suppose that Lemma 1 has been proven to hold for one fewer arm; furthermore, because of (45), all remaining popularities are smaller than or equal to the remaining total preference. Thus, Lemma 1 applied to the smaller problem verifies that the remaining loss can be made zero.
From (46) and (50), with the entries of Assumption 1 and the optimal entries for the gray-shaded region, the total loss is zero. Therefore, if Assumption 1 is correct, Lemma 1 holds when the additional arm is included.

2.2.6. Verification of Assumption 1

In this section, we prove that Assumption 1 is correct. We refer to the arm with the highest popularity as the most popular arm.

First, by contradiction, we prove that there is at most one arm that violates condition (53), and that if there is such an arm, the most popular arm is the one that breaks (53). Note that all the other arms, which satisfy (53), follow (45).

We assume that there are two distinct arms that do not satisfy (53), and we prove that this assumption leads to a contradiction.

If we add each side,

As and ,

Together with (55), we obtain the contradiction

Therefore, by contradiction, at most one arm violates (53). Such an arm satisfies the reverse inequality and has the highest popularity because the other popularities are no bigger; hence, it must be the most popular arm.

Now, we can systematically determine the entries in the th row and the th column to show that Assumption 1 is correct; that is, these entries fulfill conditions (41)–(45). We cannot arbitrarily fill in the values in the th row or the th column such that their sums equal the required totals; instead, we must consider the preferences of the other arms. Conditions (43)–(45) give us boundaries within which these entries must lie. We must not only ensure that the entries do not exceed these boundaries but also ensure that a sufficient probability is assigned to the most popular arm; otherwise, the left-hand side of (45) can sometimes be greater than the right-hand side for that arm. We consider three cases, which are collectively exhaustive, depending on the size relations of the quantities involved; for each case, it is possible to construct entries that satisfy all conditions (41)–(45). The defining conditions of the three cases are given in Appendix A.

As a reminder, arm represents the arm with the lowest popularity and arm represents the arm with the highest popularity. For detailed construction procedures, see Appendix A.

The previous three cases prove that entries satisfying (41)–(45) always exist. In other words, Assumption 1 is indeed correct. Therefore, if we assume that Lemma 1 has been proven to hold for a given number of arms, then Lemma 1 also holds when one more arm is added.

Hence, using mathematical induction, Lemma 1 holds for any number of arms.

2.3. Theorem 2
2.3.1. Statement

Here, we return to the original problem, where the preferences of players are probabilities. Popularity can still be greater than one because it is the sum of the preferences for each arm.

Theorem 2. If any value of is greater than one, it is impossible to make the loss equal to zero, and the minimum loss is given by

Note that . In a case where the th arm is the most popular, i.e., , the following joint selection probability matrix is one of the matrices that minimize the loss:

2.3.2. Outline of the Proof

We can prove that the loss function is convex by showing that its Hessian is positive semidefinite; Appendix B provides the details of the proof. In Section 2.3.3, we rewrite the problem as an optimization problem with equality and inequality constraints and derive a point that satisfies the Karush–Kuhn–Tucker conditions [29, 30] (hereinafter, KKT conditions). In a convex optimization problem, where the objective function and all constraints are convex, points that satisfy the KKT conditions provide global optima [31]. Therefore, the optimal joint selection probability matrix comprises the derived entries.

2.3.3. KKT Conditions

We derive the optimal joint selection probability matrix and calculate the minimum loss. The flattened vector of a joint selection probability matrix is defined as

Functions and are defined as

The problem of minimizing the loss while satisfying the constraints can be written as

Here, the objective is the loss corresponding to the flattened vector. Because the objective function and the constraints are all convex, any point that satisfies the following KKT conditions yields the global minimum.

The following parameters satisfy all conditions described in (64).

See Appendix C for the verification of each condition.

Hence, the joint selection probability matrix constructed above provides a global optimum. The minimum loss is

Note that this matrix is only one example of the global optima; there could be other matrices that provide the same minimum loss.

Therefore, Theorem 2 is proven.
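As a complementary numerical check of Theorems 1 and 2, the convex program above can also be solved directly with an off-the-shelf solver. The following Python sketch uses scipy.optimize.minimize (SLSQP) over the off-diagonal entries, with nonnegativity bounds and the sum-to-one equality constraint. The function and variable names are illustrative, and this is a numerical verification rather than the closed-form construction given in the theorems.

import numpy as np
from scipy.optimize import minimize

def minimize_loss(pref_A, pref_B):
    # Numerically find a conflict-free joint selection probability matrix
    # that minimizes the loss for two players with the given preferences.
    pref_A = np.asarray(pref_A, dtype=float)
    pref_B = np.asarray(pref_B, dtype=float)
    M = len(pref_A)
    idx = [(i, j) for i in range(M) for j in range(M) if i != j]   # free entries (diagonal fixed to zero)

    def unflatten(x):
        R = np.zeros((M, M))
        for k, (i, j) in enumerate(idx):
            R[i, j] = x[k]
        return R

    def loss(x):
        R = unflatten(x)
        return (np.sum((pref_A - R.sum(axis=1)) ** 2)
                + np.sum((pref_B - R.sum(axis=0)) ** 2))

    x0 = np.full(len(idx), 1.0 / len(idx))   # start from the uniform random matrix
    res = minimize(loss, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * len(idx),
                   constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])
    return unflatten(res.x), res.fun

# Popularity of arm 1 below is 1.5 > 1, so the minimum loss is strictly positive.
R_opt, min_loss = minimize_loss([0.8, 0.1, 0.1], [0.7, 0.2, 0.1])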

2.4. Theorem 3

Thus far, we have presented two theorems concerning two players and multiple arms. In this section, we consider the case in which the number of players is greater than two. Let the players be called players A, B, C, and so on, and let the preference of each player for each arm be denoted as before. As in the original problem, the preferences of each player are nonnegative and sum to one. The sum of the preferences of all players for each arm is again denoted as the popularity:

We define as the arm index selected by the th player. Then, represents the joint selection probability of the th player selecting arm . The collection of joint selection probabilities is called the joint selection probability tensor. The satisfied preference or resultant selection probability for player comprises the sum of all probabilities of cases in which the th player selects arm :

The loss is defined as the sum of the squares of the gap between the preferences and the satisfied preferences.

Here, the th player is called player .
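In code, the satisfied preferences in the multiplayer case are marginals of the joint selection probability tensor. The following Python sketch assumes that the tensor is stored as an array with one axis per player, each axis indexing the arm selected by that player; the representation and names are illustrative.

import numpy as np

def n_player_loss(T, preferences):
    # T: joint selection probability tensor with one axis per player;
    #    axis k indexes the arm selected by player k, and all entries sum to one.
    # preferences[k]: probabilistic preference (length = number of arms) of player k.
    total = 0.0
    for k, pref in enumerate(preferences):
        other_axes = tuple(a for a in range(T.ndim) if a != k)
        satisfied = T.sum(axis=other_axes)   # marginal: chance that player k obtains each arm
        total += np.sum((np.asarray(pref, dtype=float) - satisfied) ** 2)
    return total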

2.4.1. Statement

Theorem 3. Suppose that there are multiple players and arms. If any popularity is greater than one, it is not possible to make the loss equal to zero.

2.4.2. Proof by Contradiction

We suppose without loss of generality that . We first assume that it is possible to make the loss equal to zero and then prove that this leads to a contradiction. If the loss becomes zero, the following are required:

Terms that appear on the right-hand side of (72) are not identical to those on the right-hand side of (73) because in (73). Similarly, any term on the right-hand side of (72) is not identical to a variable on the right-hand side of all equations (73) and (74) . Therefore, if we consider the sum of the right-hand sides of (72)–(74), , it should be smaller than or equal to the sum of all elements in the joint selection probability tensor, which is 1.

In contrast, if we take the sum of the left-hand sides of (72)–(74), , we get . Thus, by comparing both sides, we obtain . This result contradicts the assumption that .

Hence, with a proof by contradiction, Theorem 3 holds for any number of players and arms.

2.5. Conjecture

The construction method introduced in Theorem 1 is expected to be applicable to a general number of players. Here, we propose the following conjecture.

Conjecture 1. Suppose that there are players and arms. If all the values of are less than or equal to one, it is possible to make the loss equal to zero.

This conjecture appears to be true; however, a proof in full generality has not yet been completed and is left for future studies.

3. Numerical Demonstrations

This section introduces several baseline models that output a joint selection probability matrix for the probabilistic preferences of two players. Then, we show the extent to which the loss can be improved by the construction method of the optimal joint selection probability matrix introduced in Theorems 1 and 2 (henceforth, "the optimal satisfaction matrix"). The definitions of the notations, such as preference and loss, follow those in Section 2.1.

3.1. Baselines
3.1.1. Uniform Random

In what we call the “uniform random” method, the resulting joint selection probability matrix is such that all elements are equal except for the diagonals, which are filled with zeros. In other words, decision conflicts never occur; however, the selection is determined completely randomly by the two players. If we consider the preference settings shown in Table 3, the output of this method is
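A minimal Python sketch of the uniform random baseline; the function name and interface are illustrative.

import numpy as np

def uniform_random(M):
    # All M*(M-1) off-diagonal entries are equal; the diagonal (conflicts) is zero.
    R = np.full((M, M), 1.0 / (M * (M - 1)))
    np.fill_diagonal(R, 0.0)
    return R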

3.1.2. Simultaneous Renormalization

In this method, the product of the preferences of both players is considered first. Then, the diagonals where decision conflicts occur are modified to zero, and finally, the whole joint selection probability matrix is renormalized so that the sum is 1. The formula used is

The joint selection probability matrix generated by this simultaneous renormalization method for the case in Table 3 is
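The same method as a Python sketch, assuming the preferences are given as probability vectors; the names are illustrative.

import numpy as np

def simultaneous_renormalization(pref_A, pref_B):
    # Product of the two players' preferences, with conflicting (diagonal)
    # selections removed and the remaining entries renormalized to sum to one.
    R = np.outer(pref_A, pref_B)
    np.fill_diagonal(R, 0.0)
    return R / R.sum()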

3.1.3. Random Order

In what we call the "random order" method, the players first randomly determine the order in which they will draw arms. This method was inspired by the random priority mechanism proposed by Abdulkadiroğlu and Sönmez, in which preferences are deterministic [32]; here, instead, we consider probabilistic preference profiles. Each player selects an arm according to the predetermined order; however, the arms already drawn by the previous players cannot be selected again, and thus the selection probabilities for those arms are set to zero. Under the settings in Table 3, two possible orders must be considered to calculate a given joint selection probability. In the first case, player A draws first, and the joint selection probability is

In the other case, player B draws first, and the probability is

Therefore, considering the average,

Similarly, we can calculate all joint selection probabilities, and the resulting matrix is
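A Python sketch of the random order baseline, averaging over the two drawing orders. It assumes that when the second player's preference for the already-drawn arm is set to zero, the remaining preference mass is renormalized proportionally, and that no single preference equals one; the names are illustrative.

import numpy as np

def random_order(pref_A, pref_B):
    # Average of the two orders: A draws first, then B; and B draws first, then A.
    # The second player cannot pick the arm already drawn by the first player.
    pref_A = np.asarray(pref_A, dtype=float)
    pref_B = np.asarray(pref_B, dtype=float)
    M = len(pref_A)
    R = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if i == j:
                continue   # conflicts are excluded by construction
            a_first = pref_A[i] * pref_B[j] / (1.0 - pref_B[i])   # A takes arm i, then B takes arm j
            b_first = pref_B[j] * pref_A[i] / (1.0 - pref_A[j])   # B takes arm j, then A takes arm i
            R[i, j] = 0.5 * (a_first + b_first)
    return R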

3.2. Performance Comparison

We compare the losses of uniform random, simultaneous renormalization, random order, and the optimal satisfaction matrix. Four preference settings are examined to evaluate the performance of each method; namely, the degree of satisfaction of the players' preferences is investigated by comparing the loss. A normalization term is used to ensure that the sum of the preferences of each player is one.
(1) Arithmetic progression and same preference.
(2) Modified geometric progression with common ratio 2 and same preference.
(3) Modified geometric progression with common ratio 2 and reversed preference.
(4) Geometric progression with common ratio 3 and same preference.

In cases (1)–(3), the optimal satisfaction matrix achieves zero loss because every popularity is at most one, whereas in case (4), zero loss is impossible because the largest popularity exceeds one.

For , the following numbers are used.

Figure 3 summarizes loss as a function of the number of arms accomplished by each method.

In all cases, the optimal satisfaction matrix performs the best, followed by random order, simultaneous renormalization, and uniform random. The result in case (1) shows that the loss decreases with an increase in the number of arms for all methods. This trend arises because our choice of loss is akin to an L2-norm and the absolute value of each preference decreases as the number of arms increases.

In the real world, the ratio of preference settings in case (2) is likely to appear more frequently than in case (1). For uniform random, simultaneous renormalization, and random order, the loss increases with the number of arms when the two players have the same preference. In contrast, the results show that the optimal satisfaction matrix consistently achieves 0-loss, and this underlines the importance of the construction method of the optimal joint selection probability matrix in the real world, where there are a vast number of choices.

The result in case (3) shows that simultaneous renormalization and random order exhibit good accuracy together with the optimal satisfaction matrix. This result is due to the fact that the probability of decision conflicts happening is significantly smaller in this case (3), where the players have strong, reversed preferences. The diagonal terms are renormalized in simultaneous renormalization, which perturbs the other terms in the joint selection probability matrix. In case (3), these diagonal terms are smaller than in the other cases; therefore, perturbations to the other terms are reduced. In random order, the second player sets his/her preference for the arm drawn by the first player to zero; however, in the setup of case (3), this preference tends to be small because the first player is more likely to select an arm with a higher preference, which is an unfavored arm for the second player. This implies that the second player can select an arm based on a preference almost equal to his/her original preference.

In case (4), the largest popularity is greater than 1, and thus, even the optimal satisfaction matrix does not achieve zero loss. However, at approximately , the loss for random order is approximately 1.2 times smaller than that for simultaneous renormalization, whereas the optimal loss is almost half of the loss for random order.

4. Conclusion

In this study, we theoretically examined how to maximize the satisfaction of each player by properly designing joint selection probabilities while avoiding decision conflicts when there are multiple players with probabilistic preferences. The present study demonstrated how to accomplish conflict-free stochastic decision-making among multiple players wherein the preference of each player is highly respected. We clarified the condition under which the optimal joint selection probabilities perfectly eliminate the deviation between the resulting choice selection probabilities and the probabilistic preferences of the players in two-player, multiple-choice situations, which leads to what we call zero-loss realizations. Furthermore, even under circumstances wherein zero loss is unachievable, we showed how to construct what we call the optimal satisfaction matrix, whose joint selection probabilities minimize the loss. We also generalized the theory to situations with more than two players and proved conditions under which zero-loss joint selection is impossible. In addition, we numerically demonstrated the impact of the optimal satisfaction matrix by comparing it with several approaches that provide conflict-free joint decision-making. The situation addressed in this study is very generic and can be widely applied to scenarios such as communication technology, multiagent reinforcement learning, and resource allocation, where multiple players make probabilistic choices and choosing the same option is detrimental.

There are still many interesting future topics, including the mathematical proof of the conjecture presented in this study, which concerns the condition for zero-loss conflict-free joint stochastic selection involving an arbitrary number of players. A detailed analysis of situations in which the loss is replaced with other metrics, such as the KL divergence, also remains to be explored. Furthermore, extending the discussion to external environments, and not just player satisfaction, will be important for practical applications. This study paves the way for multiagent conflict-free stochastic decision-making.

Appendix

A. Construction Procedures for Assumption 1

In the induction step presented in Section 2.2, we assumed the existence of entries that satisfy certain conditions, and we proved that perfect satisfaction is achievable using these entries. Here, we prove that this assumption, namely, Assumption A.1, is true.

Assumption A.1. There exist entries of the joint selection probability matrix that satisfy all of the following conditions (A.1)–(A.5).
(A.1) The sum of the th row is equal to player A's preference for that arm.
(A.2) The sum of the th column is equal to player B's preference for that arm.
(A.3) The sum of the th column without the th row is nonnegative.
(A.4) The sum of the th row without the th column is nonnegative.
(A.5) In the gray-shaded area of the joint selection probability matrix, all remaining popularities are less than or equal to the remaining total preference.

Now, we can systematically determine the entries in the th row and the th column of the joint selection probability matrix to show that Assumption A.1 is correct; that is, these entries fulfill conditions (A.1)–(A.5). We cannot arbitrarily fill in the values in the th row or the th column such that their sums equal the required totals; instead, we must consider the preferences of the other arms. Conditions (A.3)–(A.5) give us boundaries within which these entries must lie. Not only must we ensure that the entries do not exceed these boundaries, but we must also ensure that a sufficient probability is assigned to the most popular arm; otherwise, the left-hand side of (A.5) can sometimes be greater than the right-hand side for that arm. We consider the following three cases, which are collectively exhaustive, depending on the size relations of the quantities involved; for each case, it is possible to construct entries that satisfy all conditions (A.1)–(A.5). As before, one arm represents the arm with the lowest popularity and another the arm with the highest popularity, and all arms except for the most popular arm satisfy condition (A.6).

[Case 1] The following entries on the th row and th column satisfy conditions (A.1) and (A.2). In addition, (A.3) and (A.4) are satisfied because the corresponding sums are nonnegative in this particular case. Moreover, (A.5) is satisfied for the most popular arm, which leads to the required bound; for the other arms, (A.5) follows because they satisfy (A.6), and (A.6) is equivalent to (A.5) in this case. Hence, the entries described in (A.7) fulfill all conditions (A.1)–(A.5) in [Case 1]. Table 4 illustrates an example of this case; here, the most popular arm is arm 1 and the least popular arm is arm 4. In this case, the first row and first column of the joint selection probability matrix should be filled according to (A.7). Then, the remaining gray-shaded region should satisfy the preference setting described in Table 5. Each popularity is less than or equal to the total preference in the remaining part; i.e., the assumption of Lemma 1 holds for this part.

[Case 2] In this case, the case condition implies the following. In the first subcase, let an arm index be the one satisfying the stated inequality; otherwise, we define it directly. Such an index always exists. Then, the entries given in (A.14) satisfy conditions (A.1) and (A.2). Moreover, the entries in (A.14) fulfill (A.3): for some indices the inequality follows directly, for the defined index the definition itself satisfies the condition, and the remaining indices follow as well. Furthermore, with the entries described in (A.14), condition (A.4) is also satisfied because (A.11) verifies it for the relevant indices, and it holds for the others as well. Finally, (A.5) is satisfied: for the most popular arm, the assigned amount is always nonnegative and (A.20) is equivalent to the required bound, whereas the other arms follow (A.6) and hence (A.5). Therefore, the entries given in (A.14) satisfy all conditions (A.1)–(A.5) in [Case 2]. Table 6 shows an example of [Case 2]; here, the most popular arm is arm 1 and the least popular arm is arm 4. Therefore, the first row and first column of the joint selection probability matrix should be filled according to (A.14). Then, the remaining gray-shaded region should satisfy the preference setting described in Table 7.
Each popularity is less than or equal to the total preference in the remaining part; i.e., the assumption of Lemma 1 holds for this part.
[Case 3] This case is similar to [Case 2]. If we swap and in the discussion of [Case 2], we obtain entries that satisfy all conditions (A.1)–(A.5).

B. Convexity of the Loss

When we derived the optimal joint selection probability matrix in Section 2.3 of the main text, we used the fact that the loss function is convex. Here, we prove that the loss function is convex.

We prove that each term of the loss is convex; the loss is then convex because the sum of convex functions is also convex.

The first term of is defined as

The Hessian matrix of this term with respect to all entries is given below, where

As this is a block diagonal matrix, the eigenvalues of are the union of the eigenvalues of and . The eigenvalues of are obviously , and those of are .

Therefore, all eigenvalues of the Hessian are nonnegative, which means that it is positive semidefinite; this implies that the corresponding term is convex. Similarly, each of the remaining terms is convex because of the symmetry of the loss function. From the previous argument, we conclude that the loss function is convex.
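The convexity argument can also be checked numerically. The following Python sketch builds the Hessian of the two-player loss with respect to the flattened off-diagonal entries (a constant matrix, because the loss is quadratic) and confirms that its eigenvalues are nonnegative; the indexing scheme and names are ours and purely illustrative.

import numpy as np

def loss_hessian(M):
    # Second derivative of the loss with respect to the off-diagonal entries:
    # d^2 L / (dr_ij dr_kl) = 2*[i == k] + 2*[j == l], independent of the preferences.
    idx = [(i, j) for i in range(M) for j in range(M) if i != j]
    H = np.zeros((len(idx), len(idx)))
    for a, (i, j) in enumerate(idx):
        for b, (k, l) in enumerate(idx):
            H[a, b] = 2.0 * (i == k) + 2.0 * (j == l)
    return H

eigenvalues = np.linalg.eigvalsh(loss_hessian(4))
assert np.all(eigenvalues >= -1e-9)   # positive semidefinite, hence the loss is convex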

C. Verification of the KKT Conditions

In Section 2.3.3 of the main text, where we derived the point which satisfies all the KKT conditions, we defined new notations and functions. The flattened vector of a joint selection probability matrix is defined as

Functions and are defined as

The problem can be written as

Here, the objective represents the loss corresponding to the flattened vector. Because the objective function and the constraints are all convex, any point that satisfies the following KKT conditions gives the global minimum.

Here, we show that the parameters given in (C.8)–(C.11) satisfy (C.4)–(C.7).

Condition C.1. Stationarity (C.4).
In the following argument, the element of each vector lying in the same dimension is referred to as "the th element" of that vector.
From the definition of the objective, its th element is given first. With the value described in (C.11), and with the values described in (C.10) and (C.11), the th elements of the remaining terms follow. Therefore, stationarity is successfully achieved with the parameters given in (C.8)–(C.11).

Condition C.2. Complementary slackness (C.5).
From (C.10) and (C.11) in both cases where ,

Condition C.3. Dual feasibility (C.6).
Because , is positive and so is . Thus,

Condition C.4. Primal feasibility (C.7).
Because of the stated condition, it follows that the relevant entries are nonnegative. Then, the inequality constraints are satisfied; furthermore, the equality constraint holds. Therefore, primal feasibility holds.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Disclosure

A preprint was published in [33].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported in part by the CREST project (JPMJCR17N2) funded by the Japan Science and Technology Agency, Grants-in-Aid for Scientific Research (JP20H00233), and Transformative Research Areas (A) (22H05197) funded by the Japan Society for the Promotion of Science. AR was funded by the Japan Society for the Promotion of Science as an International Research Fellow.