Abstract

Photonic accelerators have been intensively studied to provide enhanced information processing capability to benefit from the unique attributes of physical processes. Recently, it has been reported that chaotically oscillating ultrafast time series from a laser, called laser chaos, provides the ability to solve multi-armed bandit (MAB) problems or decision-making problems at GHz order. Furthermore, it has been confirmed that the negatively correlated time-domain structure of laser chaos contributes to the acceleration of decision-making. However, the underlying mechanism of why decision-making is accelerated by correlated time series is unknown. In this study, we demonstrate a theoretical model to account for accelerating decision-making by correlated time sequence. We first confirm the effectiveness of the negative autocorrelation inherent in time series for solving two-armed bandit problems using Fourier transform surrogate methods. We propose a theoretical model that concerns the correlated time series subjected to the decision-making system and the internal status of the system therein in a unified manner, inspired by correlated random walks. We demonstrate that the performance derived analytically by the theory agrees well with the numerical simulations, which confirms the validity of the proposed model and leads to optimal system design. This study paves the way for improving the effectiveness of correlated time series for decision-making, impacting artificial intelligence and other applications.

1. Introduction

Optics and photonics have been extensively studied for high-speed information processing in various applications, especially machine learning [1–5]. One of the important branches of the research frontier is reinforcement learning [6], wherein the impacts of photonics have been intensively examined [7–9]. The multi-armed bandit (MAB) problem concerns decision-making to obtain high rewards from multiple selections, called arms, wherein the best arm is initially unknown. MAB problems involve a difficult trade-off known as the exploration-exploitation dilemma, which captures a fundamental aspect of reinforcement learning [6].

The physical properties of photons have been utilized in solving MAB problems [7, 8]. In particular, chaotically oscillating ultrafast time series generated by semiconductor lasers, called laser chaos, have been successfully utilized in resolving two-armed bandit problems at GHz order; we call this scheme the laser chaos decision-maker hereafter [7]. As introduced below, the principle of the laser chaos decision-maker simply depends on the signal-level comparison between the chaotically oscillating time series and the threshold level. It has also been demonstrated that such a level comparison-based principle is scalable in a tree architecture, which has been experimentally demonstrated for up to 64 arms [10].

Furthermore, the applications of laser chaos decision-makers have been studied to benefit from their prompt adaptation abilities in dynamically changing uncertain environments [11–14]. Takeuchi et al. applied laser chaos decision-making to channel selection problems in wireless communications [11], in which communication channels suffer from dynamically changing disturbances due to traffic, interference, or fading [15]. Kanemasa et al. extended the principle using laser chaos decision-maker to channel bonding in IEEE 802.11ac networks [12]. Furthermore, Duan et al. optimized user-pairing in non-orthogonal multiple access (NOMA) systems by laser chaos decision-maker [13]. Moreover, Kanno et al. combined laser chaos-based decision-making with photonic reservoir computing, where adaptive model selection is realized to enhance the computing capability [14].

In [7], it was demonstrated that the autocorrelation inherent in laser chaos time series impacted the decision-making performances. Indeed, chaotic time series with negative maximum autocorrelation yield superior performances when compared with pseudorandom numbers, colored noise, and random shuffle surrogate data of the original laser chaos time series [7]. Furthermore, Okada et al. extensively examined the decision-making acceleration by laser chaos using surrogate analysis, such as the Fourier transform surrogate [16]. It was found that both statistical distributions of the amplitude of time series and negative autocorrelation therein impact decision-making performances [16].

In the literature, the usefulness of negative autocorrelation in time series has been theoretically analyzed regarding code division multiple access (CDMA) [17–19]. To achieve high performance in CDMA, the cross-correlation between the spreading sequences must be small. The optimal negative autocorrelation to minimize the interference has been mathematically derived, and the chaotic map that generates the smallest cross-correlation was defined. In addition, ref. [19] clarifies that the negative autocorrelation that minimizes cross-correlation accelerates the performance of solution search algorithms for combinatorial optimization problems. An FIR filter to generate the optimal chaotic CDMA sequence was also proposed based on the negative autocorrelation analysis [20]. Moreover, the effectiveness of such optimal negative autocorrelation codes has been experimentally demonstrated using software-defined radio systems [21].

However, regarding decision-making, the fundamental mechanism by which negative autocorrelation inherent in time series yields superior performance is still unclear. That is, the results of the previous studies [7, 16] are all limited to empirical findings. If the effectiveness of the negative autocorrelation in laser chaos or correlated time series for decision-making is theoretically understood, it allows, for example, a systematic design approach to derive the optimal autocorrelation for a given problem. Besides, the insights gained by mathematical modeling support the reliability of the effectiveness provided by the negative autocorrelation in time series.

In this study, we theoretically construct a model to account for the effect of negative autocorrelation in decision-making performances. The theory of this study is inspired by correlated random walk [22, 23]. Contrary to conventional random walks, which have transition probabilities independent of prior events, correlated random walks have probabilities dependent on prior events [22, 23]. That is, the notion of correlated random walks allows us to represent state-dependent, different probability evolution dynamics. Such a theoretical architecture accounts for the interplay between the correlated time series and the evolution of decision-making. We clarify the validity of the proposed theoretical model by confirming the excellent agreement of the decision-making performances derived analytically by the proposed model and by numerical simulations.

The rest of the article is organized as follows: Section 2 reviews the mechanism of laser chaos decision-maker. In Section 3, we introduce a numerical method to generate an arbitrary autocorrelation in time series, by which the relevance between autocorrelation and the resultant decision-making performance is systematically examined. Section 4, which is the most important contribution of this study, demonstrates the theoretical model of decision-making based on correlated time sequences. Section 5 demonstrates the agreement of the decision-making performances predicted by the proposed theory and numerical simulations. Section 6 concludes the article.

2. Laser Chaos Decision Maker: Using Time Series for Decision-Making

As mentioned in Section 1, the laser chaos time series allows ultrafast decision-making. Figure 1(a) schematically illustrates the architecture of the laser chaos decision-maker for a two-arm bandit problem, which is the scope of this study [7]. The two arms are called slot machines A and B. Laser chaos is generated by subjecting a portion of the output light back to the laser by an externally arranged reflector, which is called delayed feedback. We compare the intensity level of the laser chaos with a certain threshold value, which is denoted by T(t).

The decision-making is executed as follows: When the sampled value of the time series is above the threshold, the decision is to choose slot machine A; otherwise, slot machine B is selected. The threshold T(t) is updated according to the result of the slot machine play. Overall, the threshold update is conducted under the assumption that the revised threshold will lead to the same decision in the subsequent decisions when the present action is successful, whereas the threshold is revised to the opposite direction when the present action is a failure [7, 8, 10].

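More precisely, the value of the threshold T(t) is determined by

$$T(t) = k \times \lfloor \mathrm{TA}(t) \rceil, \tag{1}$$

where $\mathrm{TA}(t)$ is called the threshold adjuster and $\lfloor \mathrm{TA}(t) \rceil$ is the nearest integer to $\mathrm{TA}(t)$. $\lfloor \mathrm{TA}(t) \rceil$ can take an integer value ranging from −N to N, with N being a natural number. Therefore, the number of levels that the threshold adjuster can take is 2N + 1. Here, k is a coefficient to convert $\lfloor \mathrm{TA}(t) \rceil$ to $T(t)$.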

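$\mathrm{TA}(t)$ is updated depending on the result of the action conducted at t − 1:

$$\mathrm{TA}(t) = \begin{cases} \alpha\,\mathrm{TA}(t-1) - \Delta & \text{if machine A is selected and wins}, \\ \alpha\,\mathrm{TA}(t-1) + \Omega & \text{if machine A is selected and loses}, \\ \alpha\,\mathrm{TA}(t-1) + \Delta & \text{if machine B is selected and wins}, \\ \alpha\,\mathrm{TA}(t-1) - \Omega & \text{if machine B is selected and loses}, \end{cases} \tag{2}$$

where Δ denotes the increment, which is given by Δ = 1 in this study. α is the forgetting parameter for weighting previous threshold adjuster variables, ranging from 0 to 1, that is, $0 \le \alpha \le 1$. Ω is called the penalty parameter [7, 8].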
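
The update rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the sign convention implied by the text (a win with machine A or a loss with machine B lowers the threshold adjuster, and the opposite outcomes raise it). The function names, the default values of `alpha`, `omega`, and `k`, and the limiting of the rounded adjuster to [−N, N] are illustrative assumptions rather than the authors' implementation.

```python
def decide(signal, threshold):
    """Select machine 'A' when the sampled signal is above the threshold, else 'B'."""
    return "A" if signal > threshold else "B"


def update_adjuster(ta, machine, win, alpha=0.99, delta=1.0, omega=1.0):
    """Return the threshold adjuster after one slot machine play.

    alpha, delta, and omega play the roles of the forgetting parameter, the
    increment, and the penalty parameter; the defaults are placeholders.
    """
    if machine == "A":
        step = -delta if win else omega   # lower threshold: A is chosen more often
    else:
        step = delta if win else -omega   # higher threshold: B is chosen more often
    return alpha * ta + step


def threshold_from_adjuster(ta, k=1.0, n=2):
    """Map the adjuster to T = k * (nearest integer to TA), limited to [-n, n]."""
    level = max(-n, min(n, round(ta)))    # note: Python rounds exact .5 ties to even
    return k * level
```

For example, `update_adjuster(0.0, "A", True, alpha=1.0)` returns `-1.0`, which makes a subsequent choice of machine A more likely.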

A hierarchical formation of such two-armed bandit problems has been proposed to deal with problems with more than two arms [10]. The elemental structure is the abovementioned two-armed situations with a dynamically updated threshold. This study focuses on two-arm situations as the first theoretical analysis on the laser chaos decision-maker. The analysis of cases with more than two arms can be done by extending the method proposed in this study; however, that will become a very complicated analysis. Therefore, we focus on a simple case in this study, and the cases with more than two arms will be our future work.

3. Effectiveness of Correlated Time Series on Decision-Making

As described in Section 1, the performance of the two-armed bandit problem using laser chaos time series depends on the autocorrelation inherent therein [7, 16]. The best performance is obtained when the autocorrelation of the time series exhibits its negative maximum [7]. Furthermore, the surrogate data analysis of laser chaos time series clarifies the impact of time-domain correlation [10]. In this study, to examine the influence of correlations in time series in a systematic manner, we introduce an artificially constructed time-correlated time series and analyze its influence on decision-making performance.

We construct a time series whose amplitude follows a Gaussian distribution while having a prescribed autocorrelation by utilizing the Fourier transform surrogate method [24]. The steps are as follows:
(1) A time series r(t) is constructed with t ranging from 0 to T − 1, where T is the length of the time series. Here, we suppose that $r(t) = \lambda^{t}$. Specifically, $r(t) = \lambda\, r(t-1)$ holds, indicating that r(t) undergoes a time correlation specified by λ to its previous point r(t − 1). We call λ the autocorrelation coefficient in this study.
(2) The Fourier transform of r(t) is calculated and denoted by R(f).
(3) The phase of R(f) is revised by randomly assigned numbers, while the power spectrum is maintained. The revised Fourier domain signal is denoted by R′(f).
(4) By taking the inverse Fourier transform of R′(f), a new time series is generated, which is denoted by r′(t).

Through the process above, the autocorrelation of the resultant r′(t) is equivalent to that of r(t). However, the amplitude distribution of r′(t) follows a Gaussian profile because of the randomized phase factors in the Fourier domain. The above-described process corresponds to a special case of Fourier transform surrogate [24].
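
The four steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the seed series is taken as r(t) = λ^t (which satisfies r(t) = λ r(t − 1)), the phases are drawn uniformly from [0, 2π), and the real-input FFT is used so that the inverse transform is real. The function names, the `seed` argument, and the circular definition of the lag-1 autocorrelation are choices made here for illustration.

```python
import numpy as np


def fourier_surrogate(lam, length, seed=0):
    """Return a real series whose power spectrum equals that of lam**t."""
    rng = np.random.default_rng(seed)
    r = lam ** np.arange(length)                 # Step 1: seed series (assumed form)
    spectrum = np.fft.rfft(r)                    # Step 2: Fourier transform
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0                              # keep the DC bin real
    if length % 2 == 0:
        phases[-1] = 0.0                         # keep the Nyquist bin real
    randomized = np.abs(spectrum) * np.exp(1j * phases)   # Step 3: new phases, same power
    return np.fft.irfft(randomized, n=length)    # Step 4: inverse transform


def lag1_autocorrelation(x):
    """Circular lag-1 autocorrelation normalized by the lag-0 value."""
    return float(np.dot(x, np.roll(x, -1)) / np.dot(x, x))
```

Because only the phases are changed, the circular autocorrelation of the output equals that of the seed series up to floating-point error, so `lag1_autocorrelation(fourier_surrogate(-0.8, 2048))` is very close to −0.8.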

Snapshots of the time series generated for the cases when the time correlation is specified by λ = 0.8, 0, and −0.8 are shown in Figures 1(b)–1(d), respectively. All of the time-series signals appear random, but there are distinct differences in their autocorrelation. With λ = 0.8, the signal level at time t is similar to the signals around that point; that is, radically large signal-level differences in consecutive data points are rarely observed (Figure 1(b)). Conversely, with λ = −0.8, meaning a strong negative autocorrelation, the signal at time t has almost the exact opposite value to the surrounding data (Figure 1(d)). As a result, the time series exhibits a highly time-varying structure. Meanwhile, the histograms of the signal level of these time series follow the same Gaussian distribution.

It should be noted that the above-described Fourier transform surrogate-based procedure does not perfectly reproduce the experimentally observed laser chaos time series. This is because the correlation in the above process is determined only by λ in Step (1), whereas the experimental laser chaos involves very long-range time correlations via delayed optical feedback. However, we consider that the Fourier transform surrogate-based method is quite beneficial to this study for several reasons.

The first is that the correlation between two successive points can be specified by an arbitrary number, allowing values smaller than even −0.5, which was experimentally not feasible, at least in the previous studies [7, 10]. Therefore, systematic analysis is enabled for a wide range of λ. The second is that the amplitude distributions are kept equivalent to each other even when λ is configured to different values, which also allows a clear examination of the impact of autocorrelation inherent in the time series.

For these reasons, we use the time series r′(t) generated using the above process. We then analyze how the MAB performance depends on the autocorrelation specified by λ. In evaluating the performance of the MAB problem, we employ the correct decision rate (CDR). CDR(t) is defined as the ratio of selecting the slot machine with the highest reward probability at time step t, averaged over m simulations or cycles. That is, CDR(t) is expressed by

$$\mathrm{CDR}(t) = \frac{1}{m} \sum_{i=1}^{m} C_i(t), \tag{3}$$

where m is the number of cycles with different random initial conditions. Here, $C_i(t) = 1$ when the slot machine with the highest reward probability is selected at the tth decision (or time t) of the ith cycle, that is, when correct decision-making is conducted. Otherwise, $C_i(t) = 0$, meaning that correct decision-making is not executed. In the following simulations, m = 60000.
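
As a small illustration of this definition, the following Python sketch averages the indicators over cycles. The function name and the nested-list layout of the input (one inner list of 0/1 indicators per cycle) are assumptions made for this example.

```python
def correct_decision_rate(correct):
    """Average the 0/1 indicators over cycles for every time step.

    correct[i][t] is 1 when cycle i selected the machine with the highest
    reward probability at step t, and 0 otherwise.
    """
    m = len(correct)
    steps = len(correct[0])
    return [sum(cycle[t] for cycle in correct) / m for t in range(steps)]
```

With two cycles `[[1, 0, 1], [1, 1, 0]]`, the function returns `[1.0, 0.5, 0.5]`.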

Figure 2 summarizes the calculated CDR at t = 1000 as a function of the autocorrelation coefficient λ in several different reward environments and settings of the decision-maker. The reward probabilities of the two slot machines, called machine A and machine B, are denoted by PA and PB, respectively. For example, in Figure 2(a), PA and PB are given as 0.9 and 0.3, respectively. In this situation, the correct decision is to select machine A as it is the slot machine with the highest reward probability (PA > PB). In addition, the number of levels of the threshold adjuster is 5, as specified by N = 2. It should be emphasized that a higher CDR is obtained when the autocorrelation is negative; indeed, the best CDR is given by λ = −0.6.

Figures 2(b)–2(f) examine other reward settings and decision-maker conditions. Table 1 summarizes the reward probabilities of slot machines and the number of threshold levels N for each MAB problem. In Figures 2(b) and 2(c), PA and PB are differently configured while maintaining the same threshold number as in Figure 2(a) (i.e., N = 2). More specifically, the difference of PA and PB is only 0.1 in Figure 2(b) by setting (PA, PB) = (0.6, 0.5). Similarly, the difference is 0.2 in Figure 2(c) by setting (PA, PB) = (0.9, 0.7). That is, the difficulties in finding the best machine are configured differently. Here, it should be noted that the highest CDR is accomplished when the autocorrelation coefficient λ is given by −0.8 and −0.3 in Figures 2(b) and 2(c), respectively. That is, the best decision-making is realized with negatively correlated time series.

The reward setting of (PA, PB) in Figures 2(d)–2(f) is the same as in Figures 2(a)–2(c), respectively. The only difference is in the threshold value, which is specified by N = 4. The achieved CDR was different because of the change in the value of N. However, it should be noted that the highest CDR performances are all obtained with negative autocorrelation when λ is given by −0.6, −0.9, and −0.6 in Figures 2(d)–2(f), respectively.

4. Theoretical Model of Decision-Making Using Correlated Time Series

This section shows a mathematical model to account for the impact of correlated time series on decision-making. Here, we focus on two-armed bandit problems where two slot machines are called machines A and B. Figure 3 shows a conceptual architecture of the proposed model. We assume that slot machine A has a larger reward probability than slot machine B, that is PA > PB. Therefore, the correct decision would be to choose slot machine A.

Here, we assume that the subjected time sequence takes either of two signal levels, +x or −x, which are denoted by sky blue marks in Figure 3. Recall that the threshold level T(t), given by equation (1), takes 2N + 1 different signal levels in total, represented by −N, −N + 1, …, N − 1, N. Furthermore, we assume that the higher signal level +x satisfies N − 1 < x < N, meaning that the upper signal level of the incoming time series is below the maximum threshold level but greater than the second maximum threshold. Similarly, the lower signal level (−x) satisfies −N < −x < −N + 1, indicating that the lower signal level of the subjected time series is above the minimum threshold level but less than the second minimum threshold.

Based on the decision-making principle described in Section 2, we summarize the decision-making process in the present situation. Let the signal level of the incoming time series at time t and the threshold level at time t be denoted by s(t) and T(t), respectively.
(i) If $T(t) = -N$, regardless of the signal level s(t), slot machine A is selected. This is because T(t) = −N < s(t) always holds since the minimum of s(t) is −x, which is larger than −N.
(ii) Similarly, if $T(t) = N$, slot machine B is selected regardless of the signal level s(t) because T(t) = N > s(t) always holds since the maximum of s(t) is +x, which is smaller than N.
(iii) If $-N + 1 \le T(t) \le N - 1$, the decision of selecting slot machine A or B depends on the signal level s(t).
(1) If s(t) is given by +x, the decision is to select machine A because s(t) = +x is greater than N − 1.
(2) Conversely, if s(t) is given by −x, the decision is to select machine B because s(t) = −x is smaller than −N + 1.

Furthermore, the incoming signal s(t) contains inherent correlations, as discussed in Sections 1 and 2. Since s(t) under study is a two-level signal train, we can consider the probability that the signal level s(t + 1) at time t + 1 is different from s(t) at time t, that is, s(t + 1) = +x results after s(t) = −x or s(t + 1) = −x after s(t) = +x. Since the autocorrelation between two consecutive timings is given by λ, such a signal-level changing probability is given by $\mu = (1 - \lambda)/2$. Conversely, the probability of exhibiting the same signal level is given by $1 - \mu = (1 + \lambda)/2$.
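
The relation between the signal-level changing probability, written μ here, and the autocorrelation coefficient λ can be checked in one line. The sketch below assumes that the two levels +x and −x are equally likely, so that the mean of s(t) is zero and its variance is x².

```latex
\lambda
  = \frac{E\left[s(t)\,s(t+1)\right]}{x^{2}}
  = (+1)\cdot(1-\mu) + (-1)\cdot\mu
  = 1 - 2\mu
\quad\Longrightarrow\quad
\mu = \frac{1-\lambda}{2},
\qquad
1-\mu = \frac{1+\lambda}{2}.
```

For instance, λ = −0.8 corresponds to a changing probability of 0.9, whereas λ = 0.8 corresponds to 0.1.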

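Therefore, such stochastic processes are represented by conditional probabilities given by

$$\begin{aligned} \Pr\left(s(t+1) = +x \mid s(t) = +x\right) &= \Pr\left(s(t+1) = -x \mid s(t) = -x\right) = 1 - \mu, \\ \Pr\left(s(t+1) = -x \mid s(t) = +x\right) &= \Pr\left(s(t+1) = +x \mid s(t) = -x\right) = \mu, \end{aligned} \tag{4}$$

where Pr denotes probability. The important aspect is that the internal status of the decision-maker, represented by T(t), is tightly coupled with the correlated time series subjected to the system as well as with the betting results of the slot machine play, which are governed by PA and PB.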
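
A two-level signal train with these conditional probabilities can be generated as in the following Python sketch. It assumes the changing probability (1 − λ)/2 discussed above and an equally likely initial level; the function names, the default x = 1.5 (which satisfies N − 1 < x < N for N = 2), and the `seed` argument are illustrative choices.

```python
import random


def two_level_signal(lam, length, x=1.5, seed=0):
    """Generate s(0), ..., s(length - 1) in {+x, -x} with flip probability mu."""
    rng = random.Random(seed)
    mu = (1.0 - lam) / 2.0                    # probability of changing the level
    level = x if rng.random() < 0.5 else -x   # assumed: both levels equally likely
    signal = [level]
    for _ in range(length - 1):
        if rng.random() < mu:
            level = -level
        signal.append(level)
    return signal


def lag1_correlation(signal, x):
    """Estimate E[s(t) s(t+1)] / x**2 from consecutive samples."""
    pairs = zip(signal[:-1], signal[1:])
    return sum(a * b for a, b in pairs) / ((len(signal) - 1) * x * x)
```

For a long series, `lag1_correlation(two_level_signal(lam, n), 1.5)` should approach `lam`, up to sampling error.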

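The behavior of the revision of T(t) is described by the following cases:
(i) If $T(t) = -N$, slot machine A is always selected. The threshold is updated as

$$T(t+1) = \begin{cases} T(t) & \text{if machine A wins}, \\ T(t) + 1 & \text{if machine A loses}. \end{cases} \tag{5}$$

(ii) If $T(t) = N$, slot machine B is always selected. The threshold is updated as

$$T(t+1) = \begin{cases} T(t) & \text{if machine B wins}, \\ T(t) - 1 & \text{if machine B loses}. \end{cases} \tag{6}$$

(iii) If $-N + 1 \le T(t) \le N - 1$, when slot machine A is selected, the threshold is updated as

$$T(t+1) = \begin{cases} T(t) - 1 & \text{if machine A wins}, \\ T(t) + 1 & \text{if machine A loses}, \end{cases} \tag{7}$$

and when slot machine B is selected, the threshold is updated as

$$T(t+1) = \begin{cases} T(t) + 1 & \text{if machine B wins}, \\ T(t) - 1 & \text{if machine B loses}. \end{cases} \tag{8}$$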

It should be noted that, regardless of the machine selection and betting result, the threshold level always increases or decreases in this case, meaning that the threshold never stays at the same level.

The procedure summarized above is a special case of the principle shown in Section 2, obtained by specifying the parameters therein as $\alpha = 1$, $\Omega = \Delta = 1$, and $k = 1$. In addition, we emphasize that upper and lower limits of T(t) are newly imposed: the decrement or increment of the threshold is not permitted beyond the range between −N and N. Hereafter, we refer to this as the stopping rule. This setting is the simplest case for the laser chaos decision-maker. We use this simplest case to keep our analysis model from being too complicated. Cases with other settings may be treated by extending our proposed scheme, but this is left for future work.
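
The simplified decision-maker with the stopping rule can be simulated directly, as in the following Python sketch. It assumes unit threshold steps, a signal-level changing probability of (1 − λ)/2, an initial threshold of 0, an equally likely initial signal level, and PA > PB so that selecting machine A is the correct decision. The function name, the default step and cycle counts, and the choice x = N − 0.5 are illustrative.

```python
import random


def simulate_cdr(p_a, p_b, lam, n=2, steps=200, cycles=2000, seed=0):
    """Monte Carlo estimate of the rate of selecting machine A at each step."""
    rng = random.Random(seed)
    mu = (1.0 - lam) / 2.0
    x = n - 0.5                                # any value with n - 1 < x < n
    chose_a = [0] * steps
    for _ in range(cycles):
        threshold = 0                          # assumed initial threshold
        signal = x if rng.random() < 0.5 else -x
        for t in range(steps):
            select_a = signal > threshold
            chose_a[t] += select_a
            win = rng.random() < (p_a if select_a else p_b)
            # A-win or B-loss lowers the threshold; A-loss or B-win raises it.
            step = -1 if select_a == win else 1
            # Stopping rule: the threshold never leaves the range [-n, n].
            threshold = max(-n, min(n, threshold + step))
            if rng.random() < mu:              # correlated two-level signal
                signal = -signal
    return [count / cycles for count in chose_a]
```

Calling `simulate_cdr(0.9, 0.3, -0.6)` returns one estimated rate per time step; the first entry is close to 0.5 because the initial signal level is chosen at random.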

To theoretically deal with the abovementioned seemingly complex situations, we introduce a pair $(T(t), s(t))$, which represents the state of the model at time t. The space spanned by $(T(t), s(t))$ is $\{-N, -N+1, \ldots, N\} \times \{+x, -x\}$.

Herein, we can characterize the state transition probability between two states. Suppose, for example, that the current state is specified by (i, +x) while T(t) is not at the border, that is, $-N + 1 \le i \le N - 1$. Here, we consider the probability of the state transition to (i + 1, −x). It should be noted that the decision is to select machine A in this given situation (i, +x) since the signal level +x is larger than the current threshold T(t). In this state transition from (i, +x) to (i + 1, −x), the threshold is incremented (i → i + 1) and the incoming signal level is reversed (+x → −x). Such a situation occurs when the play of slot machine A is unsuccessful and the incoming signal level is flipped, the probability of which is given by (1 − PA)μ. All other transition probabilities are determined similarly.

The notion of correlated random walk allows us to summarize such transitions in a unified manner [22, 23]. We first denote the probability of the state by $p_{i,\sigma}(t)$, meaning the probability of the state with T(t) = i and s(t) = σ. In addition, we define a probability vector $\boldsymbol{p}_i(t)$, which is given by

$$\boldsymbol{p}_i(t) = \begin{pmatrix} p_{i,+x}(t) \\ p_{i,-x}(t) \end{pmatrix}, \tag{9}$$

which combines the probabilities involving the threshold level being i for different signal levels of the time series (+x and −x).

We denote the probability of the threshold being i at time t, regardless of the incoming signal level, by $\|\boldsymbol{p}_i(t)\|_1$, which is mathematically equivalent to the $L^1$-norm of $\boldsymbol{p}_i(t)$. That is,

$$\|\boldsymbol{p}_i(t)\|_1 = p_{i,+x}(t) + p_{i,-x}(t). \tag{10}$$

Based on these preparations, the recurrence formulae of $\boldsymbol{p}_i(t)$ allow us to precisely characterize the behavior of the system.

Case 1. The probability vector for the case when the threshold is between −N + 1 and N − 1 at time t + 1 is given by

$$\boldsymbol{p}_i(t+1) = P(i+1)\,\boldsymbol{p}_{i+1}(t) + Q(i-1)\,\boldsymbol{p}_{i-1}(t), \tag{11}$$

where the matrices P and Q are given, for $-N + 1 \le i \le N - 1$, by

$$P(i) = \begin{pmatrix} P_A (1-\mu) & (1-P_B)\,\mu \\ P_A\,\mu & (1-P_B)(1-\mu) \end{pmatrix}, \tag{12}$$

$$Q(i) = \begin{pmatrix} (1-P_A)(1-\mu) & P_B\,\mu \\ (1-P_A)\,\mu & P_B (1-\mu) \end{pmatrix}. \tag{13}$$

Equation (11) implies that the probability vector of the threshold being i comprises the transitions from the states with the thresholds being i − 1 and i + 1. The dynamics given by equation (11) are schematically illustrated in Figure 4(a). The elements of the matrices P(i) and Q(i) can be understood intuitively as follows.

The matrix P(i) concerns the probability of decrementing the threshold level. For example, the (1, 1)-element of P(i), or $P_{1,1}(i)$, represents the probability of the transition from the state (i, +x) to (i − 1, +x). The state (i, +x) indicates that the decision is to select machine A. The decrement of the threshold indicates that the result is a win. The probability of consecutive identical signal levels is given by 1 − μ. Hence, $P_{1,1}(i) = P_A (1 - \mu)$. Similarly, $P_{1,2}(i)$ means the probability of the transition from the state (i, −x) to (i − 1, +x); here the decision is to select machine B, the result is a loss, and the polarity of the incoming signal level changes. Therefore, $P_{1,2}(i) = (1 - P_B)\,\mu$. Similarly, $P_{2,1}(i)$ corresponds to the probability of the transition from the state (i, +x) to (i − 1, −x), and $P_{2,2}(i)$ corresponds to the transition from (i, −x) to (i − 1, −x). The blue arrows in Figure 4(a) schematically represent the role of the matrix P(i), which concerns the decrementing of the threshold level.

Conversely, the matrix Q(i) concerns the probability of incrementing the threshold level. $Q_{1,1}(i)$, for example, represents the probability of the transition from the state (i, +x) to (i + 1, +x), meaning that the threshold is incremented while the signal level is unchanged. This situation represents the decision to select machine A, the result being a loss, and the polarity of the incoming signal remaining the same; the corresponding probability is given by $(1 - P_A)(1 - \mu)$. The other elements of Q(i) are specified in the same manner. The red arrows in Figure 4(a) schematically represent the role of the matrix Q(i), which concerns the incrementing of the threshold level.
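
The element-by-element description above can be written down as a short NumPy sketch. It assumes the signal-level changing probability μ = (1 − λ)/2 and the ordering (+x, −x) for rows and columns; the function name `interior_matrices` is hypothetical.

```python
import numpy as np


def interior_matrices(p_a, p_b, lam):
    """Return (P, Q) for threshold levels strictly between -N and N.

    Column 0 is the current signal level +x (machine A is played) and
    column 1 is -x (machine B is played); rows index the next signal level.
    """
    mu = (1.0 - lam) / 2.0
    # P: threshold decrement, caused by a win with A or a loss with B.
    P = np.array([[p_a * (1 - mu), (1 - p_b) * mu],
                  [p_a * mu,       (1 - p_b) * (1 - mu)]])
    # Q: threshold increment, caused by a loss with A or a win with B.
    Q = np.array([[(1 - p_a) * (1 - mu), p_b * mu],
                  [(1 - p_a) * mu,       p_b * (1 - mu)]])
    return P, Q
```

A quick consistency check is that every column of `P + Q` sums to one, because from any interior state the threshold must either decrease or increase.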

Case 2. The probability vector for the case when the threshold is at the edge on the negative side, −N, at time t + 1 is specified by

$$\boldsymbol{p}_{-N}(t+1) = P(-N)\,\boldsymbol{p}_{-N}(t) + P(-N+1)\,\boldsymbol{p}_{-N+1}(t). \tag{14}$$

Edges are to be treated carefully in this case. First, P(−N + 1) in the second term on the right-hand side of equation (14) describes the transition of the decrement of the threshold level from −N + 1 to −N, which has already been defined in equation (12). Second, since there are no threshold levels smaller than −N, the transitions involving increments or any Q matrix are not included in equation (14). Third, what is different from Case 1 above is that the threshold level can be maintained at the edges, which is indicated by the first term on the right-hand side of equation (14). More specifically, the P matrix at −N is given by

$$P(-N) = \begin{pmatrix} P_A (1-\mu) & P_A\,\mu \\ P_A\,\mu & P_A (1-\mu) \end{pmatrix}. \tag{15}$$

$P_{1,1}(-N)$ means the state transition from (−N, +x) to (−N, +x). This corresponds to the decision to select machine A, the result being a win, and the signal polarity being unchanged. Therefore, $P_{1,1}(-N) = P_A (1 - \mu)$. Similarly, $P_{1,2}(-N)$ means the state transition from (−N, −x) to (−N, +x); what is different from $P_{1,1}(-N)$ is the change in polarity. Hence, $P_{1,2}(-N) = P_A\,\mu$. Likewise, $P_{2,1}(-N)$ and $P_{2,2}(-N)$ can be obtained. The blue arrows in Figure 4(b) illustrate the role of the matrix P(−N), which concerns keeping the same threshold level.

Case 3. Similar to Case 2, the probability vector for the case when the threshold is N at time t + 1 is specified by

$$\boldsymbol{p}_{N}(t+1) = Q(N)\,\boldsymbol{p}_{N}(t) + Q(N-1)\,\boldsymbol{p}_{N-1}(t). \tag{16}$$

The meaning of equation (16) is similar to that of equation (14). Q(N − 1) on the right-hand side of equation (16) has already been defined in equation (13). As in Case 2, the threshold level can be maintained at the edge, which is shown by Q(N) in equation (16). This is given by

$$Q(N) = \begin{pmatrix} P_B (1-\mu) & P_B\,\mu \\ P_B\,\mu & P_B (1-\mu) \end{pmatrix}. \tag{17}$$

$Q_{1,1}(N)$ means the state transition from (N, +x) to (N, +x). This corresponds to the decision to select machine B, the result being a win, and the signal polarity being unchanged. Therefore, $Q_{1,1}(N) = P_B (1 - \mu)$. Similarly, $Q_{1,2}(N)$ indicates the state transition from (N, −x) to (N, +x); what is different from $Q_{1,1}(N)$ is the change in polarity. Hence, $Q_{1,2}(N) = P_B\,\mu$. Likewise, $Q_{2,1}(N)$ and $Q_{2,2}(N)$ can be obtained. The red arrows in Figure 4(c) illustrate the role of the matrix Q(N), which concerns keeping the same threshold level.

In addition, a remark is needed for the matrix P at N and the matrix Q at −N, which differ from those given by equations (12) and (13) and are given by

$$P(N) = \begin{pmatrix} (1-P_B)(1-\mu) & (1-P_B)\,\mu \\ (1-P_B)\,\mu & (1-P_B)(1-\mu) \end{pmatrix}, \tag{18}$$

$$Q(-N) = \begin{pmatrix} (1-P_A)(1-\mu) & (1-P_A)\,\mu \\ (1-P_A)\,\mu & (1-P_A)(1-\mu) \end{pmatrix}. \tag{19}$$

This is because the decision at the edges does not depend on the incoming signal level. For example, with the threshold at N, the decision is always to select machine B because both signal levels +x and −x are smaller than the threshold. Hence, $P_{1,1}(N)$ means the probability of the state transition from (N, +x) to (N − 1, +x), meaning that the decision is to select machine B, the result is a loss, and the polarity of the signal is unchanged. Therefore, $P_{1,1}(N) = (1 - P_B)(1 - \mu)$. All other elements in equations (18) and (19) are specified similarly. The blue arrows in Figure 4(c) and the red arrows in Figure 4(b) illustrate P(N) and Q(−N), respectively.

Figure 5 summarizes the chains of the probability vector given by equations (11), (14), and (16). The blue arrows, which represent the decrement of the threshold level, are induced by either a win by selecting machine A or a loss by selecting machine B. In contrast, the red arrows, which represent the increment of the threshold level, are triggered by either a win by selecting machine B or a loss by selecting machine A. The thresholds at the edges (−N and N) involve arrows of transitions to an identical threshold.

Finally, the CDR can be discussed using the probabilities defined above. Assume that the correct decision is to select machine A. The selection of machine A occurs in exactly the following two cases:
(1) The threshold is −N. In this case, both signal levels −x and +x result in the decision to choose machine A.
(2) The threshold is between −N + 1 and N − 1, and the input signal level is +x, which results in the decision to choose machine A.
Hence, the probability of selecting machine A at time t, denoted by $\mathrm{CDR}^{(\mathrm{theory})}(t)$, is given by

$$\mathrm{CDR}^{(\mathrm{theory})}(t) = \|\boldsymbol{p}_{-N}(t)\|_1 + \sum_{i=-N+1}^{N-1} p_{i,+x}(t). \tag{20}$$
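
The recurrences of Cases 1 to 3 and the resulting probability of selecting machine A can be iterated numerically, as in the following NumPy sketch. It is an illustration under stated assumptions rather than the authors' code: the signal-level changing probability is taken as μ = (1 − λ)/2, the initial state is assumed to have the threshold at 0 with both signal levels equally likely, and the function and variable names are hypothetical.

```python
import numpy as np


def theoretical_cdr(p_a, p_b, lam, n=2, steps=1000):
    """Iterate the probability vectors; return (CDR per step, final vectors)."""
    mu = (1.0 - lam) / 2.0
    flip = np.array([[1 - mu, mu], [mu, 1 - mu]])   # rows: next level, columns: current
    # Interior levels: column 0 (+x) plays machine A, column 1 (-x) plays B.
    p_in = np.column_stack([p_a * flip[:, 0], (1 - p_b) * flip[:, 1]])
    q_in = np.column_stack([(1 - p_a) * flip[:, 0], p_b * flip[:, 1]])

    def dec(i):
        """Matrix P(i): decrement from level i, or stay when i == -n."""
        if i == -n:
            return p_a * flip            # A always played; a win keeps level -n
        if i == n:
            return (1 - p_b) * flip      # B always played; a loss lowers the level
        return p_in

    def inc(i):
        """Matrix Q(i): increment from level i, or stay when i == n."""
        if i == n:
            return p_b * flip            # B always played; a win keeps level n
        if i == -n:
            return (1 - p_a) * flip      # A always played; a loss raises the level
        return q_in

    p = np.zeros((2 * n + 1, 2))         # row j holds the vector for level i = j - n
    p[n] = [0.5, 0.5]                    # assumed initial condition: threshold 0
    cdr = np.empty(steps)
    for t in range(steps):
        # Level -n always selects A; interior levels select A only for +x.
        cdr[t] = p[0].sum() + p[1:-1, 0].sum()
        new = np.zeros_like(p)
        for j in range(2 * n + 1):
            i = j - n
            if i == -n:
                new[j] = dec(-n) @ p[j] + dec(-n + 1) @ p[j + 1]
            elif i == n:
                new[j] = inc(n) @ p[j] + inc(n - 1) @ p[j - 1]
            else:
                new[j] = dec(i + 1) @ p[j + 1] + inc(i - 1) @ p[j - 1]
        p = new
    return cdr, p
```

The total probability `p.sum()` stays equal to one at every step, which is a useful sanity check on the matrices at the edges. Sweeping `lam` over a grid and reading off the last entry of the returned array gives the dependence of the theoretical rate on the autocorrelation coefficient without Monte Carlo averaging.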

5. Evaluation

With the theoretical model shown in Section 4, we can calculate the time evolution of the probability vector $\boldsymbol{p}_i(t)$ and its $L^1$-norm $\|\boldsymbol{p}_i(t)\|_1$ from any initial conditions. Consequently, $\mathrm{CDR}^{(\mathrm{theory})}(t)$ is derived by equation (20).

Here, we examine the case when the reward probabilities are given by PA = 0.9 and PB = 0.7 and assume that N is given by 2, meaning that the number of threshold levels is 5. Herein, the initial probability vector is given by while assuming all the other vectors are zero. The autocorrelation coefficient λ specifies the time-correlated, two-level signal trains.

Figure 5(b) shows the analytically calculated chains of probability vectors. As time evolves, the probability vector at the edge (i = −2) increases, indicating a high likelihood of choosing machine A, which is the correct decision (since PA > PB).

To examine the mechanism more deeply, Figures 6(a)–6(c) demonstrate the time evolution of the probability of the threshold being at level i, $\|\boldsymbol{p}_i(t)\|_1$ (i = −2, −1, 0, 1, 2), when the autocorrelation λ is specified by −0.8, 0, and 0.8, respectively. What is commonly observed in these figures is that $\|\boldsymbol{p}_{-2}(t)\|_1$, indicated by blue curves, increases as time elapses, leading to a high chance of selecting machine A, that is, correct decision-making. Meanwhile, the probability indicated by green curves is approximately 0.2 at a time step of 25 when λ is 0.8 (Figure 6(c)), whereas it is nearly zero at the same time step when λ is −0.8 (Figure 6(a)). This indicates that the probability of choosing machine B, which is the wrong decision, is not negligible when λ = 0.8.

From another perspective, the blue, red, and yellow markers in Figure 6(d) characterize the probabilities of the threshold levels at t = 1000, which are written as $\|\boldsymbol{p}_i(1000)\|_1$, when the autocorrelation λ is given by −0.8, 0, and 0.8, respectively. We can clearly observe a large probability greater than 0.6 at the threshold level of −2, regardless of the value of λ.

It is remarkable that for λ = −0.8, the probability monotonically decreases as the threshold increases, whereas for λ = 0.8, the probability increases when the threshold increases from 0 to 2. Even with zero autocorrelation (λ = 0), a slight increase in probability is observed at the threshold level of 2. We presume that a positive autocorrelation tends to produce similar decisions consecutively, and hence the decision can be locked into a state that is not the optimal one. Indeed, a related tendency is observed in Figures 6(a)–6(c), where the dynamic change of probabilities, most notably the one indicated by orange curves, exhibits a strong oscillatory behavior with λ = −0.8, whereas it is attenuated when λ = 0.8.

As discussed in Section 4, the decision-making ability can be theoretically derived as $\mathrm{CDR}^{(\mathrm{theory})}(t)$, given in equation (20), using the probability model. We examined $\mathrm{CDR}^{(\mathrm{theory})}(t)$ under a variety of conditions. Herein, the reward probabilities (PA, PB) and the number of threshold levels specified by N are summarized in Table 1, which are the same as discussed in Section 3 and Figure 2. For example, Figure 7(a) concerns the case (PA, PB) = (0.9, 0.3) and N = 2. The red curves in Figure 7 show $\mathrm{CDR}^{(\mathrm{theory})}$ as a function of the autocorrelation coefficient λ ranging from −0.95 to 0.9 in steps of 0.05. In addition, λ = −0.99 is examined. For all cases in Figure 7, the maximum $\mathrm{CDR}^{(\mathrm{theory})}$ is obtained when the autocorrelation coefficient is negative, as indicated by red arrows therein, which coincides with the numerical observations shown in Figure 2.

Furthermore, we numerically simulate the correct decision rate CDR(t) defined in equation (3) based on the original decision-making algorithm described in Section 3 while adopting the stopping rule in Section 4. The results are shown by the blue curves in Figure 7. We observe in all panels of Figure 7 that the results from theory (red) and simulation (blue) match well. Additionally, while the blue marks exhibit fluctuations because they are obtained as a statistical average over numerical trials, the red marks are smooth because they are analytically derived from the theory described in Section 4.
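A sweep over λ of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Figure 7: the correlated input is a Gaussian AR(1) sequence instead of a Fourier transform surrogate, the threshold update is a simplified one-level step clipped at ±N, the stopping rule of Section 4 is not implemented, and the correct decision rate is estimated only at the final play. The names `lambda_grid` and `correct_decision_rate`, the spacing of 0.5, and the trial counts are hypothetical.

```python
import math
import random


def lambda_grid():
    """Autocorrelation values: -0.99, then -0.95 to 0.9 in steps of 0.05."""
    return [-0.99] + [round(-0.95 + 0.05 * k, 2) for k in range(38)]


def correct_decision_rate(lam, p_a, p_b, n_levels=2, steps=50, trials=500, seed=0):
    """Fraction of trials in which machine A is selected at the final play.

    Assumes p_a > p_b, so that selecting machine A is the correct decision.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - lam * lam)
    correct = 0
    for _ in range(trials):
        level = 0
        x = rng.gauss(0.0, 1.0)
        choose_a = False
        for _ in range(steps):
            choose_a = x > 0.5 * level  # assumed threshold spacing of 0.5
            win = rng.random() < (p_a if choose_a else p_b)
            # Assumed update: reinforce the chosen machine on a win, the other on a loss.
            level += -1 if choose_a == win else 1
            level = max(-n_levels, min(n_levels, level))
            # Advance the AR(1) input, keeping unit variance.
            x = lam * x + scale * rng.gauss(0.0, 1.0)
        correct += choose_a
    return correct / trials


if __name__ == "__main__":
    for lam in lambda_grid():
        print(f"{lam:+.2f}  {correct_decision_rate(lam, 0.9, 0.3):.3f}")
```

Because each estimate is an average over a finite number of trials, the printed values fluctuate from one λ to the next, in the same way that the simulated marks in Figure 7 fluctuate while the analytical ones do not.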

6. Conclusion

In this study, we construct a theoretical model to account for the acceleration of decision-making by correlated time sequences. Previous studies have shown that the solution to the two-armed bandit problem is accelerated by negative autocorrelation inherent in the time series subjected to the decision-making system. However, the underlying mechanism has been unclear. We begin the discussion by clarifying the impact of time-domain correlation on decision-making using time series with a specific autocorrelation designed via the Fourier transform surrogate method. Consistent with prior reports that used experimentally observed laser chaos time series, we confirm that negative autocorrelation yields superior decision-making performance. The difficulty in understanding the underlying mechanism of such acceleration stems from the fact that multiple entities are involved: the dynamical reconfiguration of the internal status of the decision-maker (the threshold level and its revision), the time-domain structure of the incoming time series, and the stochastic attributes of the environment (the reward probabilities of the slot machines). The theoretical model of this study unifies these entities based on correlated random walks. Furthermore, the decision-making performance obtained analytically from the theoretical model agrees with the numerical results from simulations, which validates the proposed theory. Additionally, this indicates that the optimal autocorrelation for maximizing the decision-making performance can be obtained through the model without executing extensive numerical simulations. The proposed scheme of selecting the laser chaos with the best autocorrelation can accelerate performance in applications such as wireless communication systems [11–13]. This study constitutes a foundation for understanding the intellectual mechanism enhanced by correlated time series, which is important for future information and communications technology.

The laser chaos decision-maker can quickly solve MAB problems, making decisions at GHz order. Therefore, it will be possible to optimize decisions in wireless communication systems in real time. However, a dedicated device for the laser chaos decision-maker is necessary. In this respect, a chip-scale photonic implementation has recently been demonstrated [25] on the basis of recent advancements in integrated photonics technology, indicating the potential for system integration and miniaturization.

Data Availability

The data that are used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the CREST project (JPMJCR17N2) funded by the Japan Science and Technology Agency and Grants-in-Aid for Scientific Research (JP20H00233) funded by the Japan Society for the Promotion of Science. A preprint of the article is available on arXiv at https://arxiv.org/abs/2203.16004 [26].