#### Abstract

A circuit of evaluation and selection of the alternatives is considered a reliable model in neurobiology. The prominent contributions of the literature to this topic are reported. In this study, valuation and choice of a decisional process during Two-Alternative Forced-Choice (TAFC) task are represented as a two-layered network of computational cells, where information accrual and processing progress in nonlinear diffusion dynamics. The evolution of the response-to-stimulus map is thus modeled by two linked diffusive modules (2LDM) representing the neuronal populations involved in the valuation-and-decision circuit of decision making. Diffusion models are naturally appropriate for describing accumulation of evidence over the time. This allows the computation of the response times (RTs) in valuation and choice, under the hypothesis of ex-Wald distribution. A nonlinear transfer function integrates the activities of the two layers. The input-output map based on the infomax principle makes the 2LDM consistent with the reinforcement learning approach. Results from simulated likelihood time series indicate that 2LDM may account for the activity-dependent modulatory component of effective connectivity between the neuronal populations. Rhythmic fluctuations of the estimate gain functions in the delta-beta bands also support the compatibility of 2LDM with the neurobiology of DM.

#### 1. Introduction

Even simple decisions imply higher cognitive functions that integrate noisy sensory stimuli, prior knowledge, and the costs-and-benefits related to possible actions in function of their time of occurrence. Accumulation of noisy information is a reliable pattern performed by neural pools in cortical circuitry during decision making (DM) process. This process is time absorbing, especially when the quality of information is poor and there exist many possible alternatives that may be evaluated and compared. There exists large consensus in the studies of DM toward the conformation of a phase of accumulation of evidence until a decision is made [1–11]; that is, the decision maker is expected to keep on gathering information until the evidence in favor of one of the alternatives suffices. Thus, the stochastic integration of information up to a certain threshold gives rise to a speed-accuracy tradeoff (the performance of the responses increases for slower response times) that is bounded by the costs associated with obtaining more information. In this context the responses times (RTs) to the stimuli characterize the speed-accuracy tradeoff because they allow the identification of the time when a decision is made (although not yet completed by the motor action) [12]. RT studies have addressed the implementation of diffusive models for describing decisional behaviors and the identification of the neuronal areas related to the decisional activity. DM is a process that involves different areas of the brain. These regions include the cortical areas that are supposed to integrate evidence supporting alternative actions and the basal ganglia (BG) that are hypothesized to act as a central switch in gating behavioral requests [9–15]. Neurons in the middle temporal area (MT) are known to encode motion stimulus [13], while the decision process itself occurs in other areas including posterior parietal cortex and prefrontal cortex. Perceptual choice experiments with primates [2, 14] enabled to relate the selective activation of neurons in the lateral intraparietal area (LIP) with the perceptual choice and the response time [15], and this activity would persist throughout a delay between the stimulus and the saccadic movement. This implies that the LIP neurons can respond neither purely to a motor signal, nor simply to sensory input [16]. Rather, LIP neurons are also supposed to contribute to the working memory associated with guiding the eye movement [17]; that is, they would store information about the target location. Neurons in the prefrontal cortex display similar properties during visual motion discrimination tasks [18]. Further studies of human neuroimaging and monkey single-neuron physiology have supported the hypothesis that the parietal and frontal cortices form a system for temporal accruing of data and categorical decision making. These areas would exert executive control on sensory neurons by providing top-down signals that convey information on semantic categorization derived by the stimulus-response association [19, 20].

In natural environments several sensory stimuli produce different alternatives and hence demand the evaluation of different possible responses, that is, a variety of behaviors. In other terms, also a selection question arises [21] whereby the (probability) distribution of the correct response has to take control of the individual’s motor plant [22]. The action selection then would resolve a conflict among decisional centers throughout the brain. A central switch that considers the urgency and opportunity of specific response to the stimuli results in an optimal solution in computational terms that is physiologically reliable by taking the basal ganglia (BG) as the neural base for that switch. Accordingly, BG gather input from all over the brain and, by sending tonic inhibition to midbrain and brain stem targets involved in motor actions, block the cortical control over these actions [23–25]. Therefore, the inhibition of the neurons in the output nuclei, caused by BG activity, determines the disinhibition of their targets and the actions would be consequently selected. In other words, BG by acting as a central switch would evaluate the evidence and facilitate the best supported responses [22, 26, 27]. Many studies have reported a significant increase in the firing rate of the neurons of cortical areas representing the alternative choices during DM in visual tasks. The increase of the firing rates then would provide accumulation of evidence (i.e., information) related to the alternatives [1, 2]. The association between the neural firing rates and the DM process is by now an accepted fact and, by the way, some points are worthy to be mentioned. The ramping of the firing rates does not merely anticipate the motor action but would also relate to the target selection. Rather, the rate of growth of the neural activity is proportional to the response times, and so it may predict the decision time. In fact, it triggers the spiking burst of the downstream neurons (in SC and caudate), and the occurrence of their crossing of a defined threshold level marks the decision time. The ramping of the firing rates is also proportional to the prior probabilities of the alternatives and to their probabilities of being rewarded.

The main purpose of this work was to set the theoretical, neurobiologically sustainable bases for representing the two stages of valuation and choice of DM during Two-Alternative Forced-Choice (TAFC) task in terms of two distinguished layers of neuronal populations performing diffusive dynamics (2LDM), under the assumption that in the DM among alternative options the cortical areas (lateral prefrontal and parietal cortex) integrate the corresponding weighted evidence of the alternatives, whilst the ventromedial prefrontal cortex and the striatum encode the value of different options [28]. The secondary objective was to verify the ability of the 2LDM to account for possible influence that the populations exert over each other. Therefore simulation of time series reproducing the probability of doing motor action (visual targeting) during Two-Alternative Forced-Choice (TAFC) visual task was performed. Power spectrum of the gain functions and synchronization analysis of the instantaneous phases of the activities of the neuronal populations in the two layers of the model suggested activity-dependent modulation of the effective connectivity between the populations. The so-called Two-Alternative Forced-Choice (TAFC) task has often characterized the experimental setting for DM analysis [5, 6, 11]. Bogacz and coauthors [29] evidenced that the TAFC task models typically make three fundamental assumptions: (a) evidence favoring each alternative is integrated over time; (b) the process is subject to random fluctuations; and (c) the decision is made when sufficient evidence has accumulated favoring one alternative over the other. The major issue about the modality of integration of evidence is generally solved in favor of the integration of the difference in evidence, rather than the independent integration of evidence for each alternative. The application of the diffusion models in the study of cognitive processes had been introduced by Ratcliff [5] and since then on they had kept their theoretical soundness in the context of the analysis of decision making under uncertainty [1, 2, 4, 6–11] because it is relatively simple and well characterized [30] and it has been proven to implement the optimal mechanism for TAFC decision making [22, 31]. In applying the diffusion model to the TAFC, it is assumed that the accrual of noisy evidence corresponding to the two alternatives is carried on until their difference reaches a decisional threshold (Figure 1).

It has been shown [5, 31] that, under experiments with human subjects performing TAFC tasks, the drift diffusion model yields accuracy and reaction times (RTs). An advantage from drift diffusion models (DDM) is that, given a level of accuracy, it results in the fastest decision maker, for a fixed decision threshold. The accuracy tends to increase proportionally to the rising of threshold which results in a speed-accuracy tradeoff. This speed-accuracy tradeoff is usually considered a basic parameter for interpreting the results of both behavioral and neurological experiments [12, 15, 32]. The surprising capability of DDM to fit behavioral and neurological data seems to indicate that some decision making processes in the brain are really computed by a similar mechanism that accumulates evidence [33].

However, the canonical diffusion models assume that momentary evidence is accumulated continuously and at* constant rate*, that is, linearly, over the time until a decision threshold is reached [2, 15, 34, 35]. The assumption of linear integration of evidence in human decision making has been recently criticized because it misses the occurrence of refractory periods (“decisional blinks”) during DM, which areknown in psychological literature [36, 37]. On the contrary, the rate of evidence accumulation during DM has been found to fluctuate rhythmically in the delta band as a mechanism of sensory and attentional selection [38–40]. The static linear-nonlinear transfer functions (LN cascades) implemented in the 2LDM for modelling the probability of firing rate of the neuronal populations in response to the stimuli accomplish the hypothesis of nonlinear momentary evidence accumulation. Moreover, in 2LDM a saturating, that is, sigmoidal, activation function is used; therefore, the link between mean population depolarization and expected firing rate (i.e., the input-output map) is parameterized by the slope of the sigmoidal function. Interestingly, the slope is allegedly related to the first- and second-order convolution kernels of the Volterra series, which represent a sufficient specification of population dynamics [41–43]. The driving influence and the activity-dependent modulatory components of the effective connectivity between neuronal populations can therefore be estimated by analyzing the synchronization between the gain functions of the two layers. Specifically, modulatory connections are revealed by asynchronous coupling, whilst driving connections relate to synchronous interactions [44].

Although the evidence accumulation and choice formation are usually described as a one-stage process such that a decision is given as soon as the decisional variable reaches a threshold, it is empirically yet unknown whether decision making is performed in a single neuronal circuit [45]. Merging the two recognized stages of evaluation of alternatives and behavior/motor-action selection into one stage not only renders a model unable to explain overlapping (feature-fusion) phenomenon arising from high-rate succeeding stimuli [46], but seems counterintuitive with respect to the well-established statement that decisions are not unitary events since derivation from two distinct sequential processes [28, 47] as well. Neither would a one-stage diffusion model be able to represent the nonlinear interactions between the cortical and subcortical neuronal populations involved throughout the brain, nor would it properly describe high-conflict over long time scale (>3 s) choices, nor would it yield well distinguished estimation of the times for evaluation and action selection. From a computational point of view, one stage would be a natural outcome only if the activation functions of the computational layers in the multilevel neuronal network were linear, because any multilayer neuronal network, under the condition of linearly separable (i.e., independent) input patterns, might be restricted to a single layer of linear units. However, this assumption is not reasonable in neurobiological context, because it rules out the nonlinear coupling among brain areas, that is, the activity-dependent connections.

The most intriguing two-stage models have been proposed in terms of integrate-and-fire attractor networks [33, 48–50], where the first network evaluates through competitive learning the evidence-biased firing rates of the neurons responding to each of the possible choices and consequently takes a provisional decision in favor of the most valued input. The second network provides final decisions on the base of the confidence in the first level decision, so that changes of the first level decisions are made possible. Positive feedback makes the integrate-and-fire attractor network a nonlinear model and hence it is consistent with the neurophysiology. Strikingly, attractor network exhibits, at local (i.e., cortical) level, nonlinear diffusive dynamics [48], where the biasing input stands for the drift and the stochastic spiking of the neurons provides the diffusion component. Hence, both attractor network model and 2LDM consider decision making inherently a process that involves two levels of computation of nonlinear diffusion dynamics. Also the attractor network model contemplates the role of basal ganglia as driving system of the “global” competition for the action/behavior selection, but in this case through a linear diffusion process. This marks the difference with 2LDM, where the possible implication of BG activity is to be considered in terms of nonlinear diffusion system. As abovementioned, independent, separable input patterns are necessary for linear integration, but this would hinder adaptive mechanisms. Moreover, since the adaptive tuning of the decision threshold is expected to be modulated by reward signals [51, 52], the dopamine-dependent corticostriatal synapses are described as the neurobiological locus for threshold modification [53]. This finding enhances the assumption of nonlinear behavior of BG, whereas spiny projection neurons display* bistable* behavior [54] just because bistability calls for nonlinearity, feedback, and hysteresis, which are conditions consistent with the implementation of reinforcement learning in BG.

The paper is organized as follows.

Section 2 is about the neuronal populations codes and the relationship between interpulse intervals and response times. The last part of the section is dedicated to the description of the two-layered diffusion model (2LDM).

Section 3 presents the results of synchronization analysis between the instantaneous phases of the activities of the two neuronal populations and the power spectrum of the gain functions from the application of the 2LDM over simulated data that were obtained by resampling time series of the probability of visual targeting (likelihood) recorded during a Two-Alternative Forced-Choice (TAFC) visual task.

Section 4 summarizes and discusses the main results and adds some comments about computational and neurobiological implications or potential developments of the 2LDM.

The appendix deals with statistical theory on distances between features.

#### 2. Two-Layered Diffusion Model (2LDM)

##### 2.1. Population Code

As long as the cells in a neural population have similar response properties, that is, acting in a statistically similar way [55], then the brain collects and organizes information from patterns of activity involving populations of neurons [56, 57]. In the work of Sanger [55] it is also described that the input-output (stimulus-response) map stems from the modulation of information; that is, calculation on values that are represented by population codes (encoding) and feature extraction about the input stimuli (decoding), in the brain, may be seen as relations between different population codes that provide internal representations of the input-output map. In this perspective, then, computation in the brain relies on commutations from one internal representation to another. By assuming that populations of neurons regulate the responses to stimuli, we can consider the effect of the accumulation of activity from a combination of two neural populations and . The gathering and processing of information during the experiments then would elicit spike-trains from those cells of within the interval . If we count the number of spikes emitted up to a time , we obtain the variables that represent the sequence of evidence accumulation. Over the time, the occurrence of noise makes stochastic variables, and so the process of accumulation of spikes from the neural population traces a random pattern that is expected to end as it encounters a bound at a finite time [12]. The two processes by which the neural system learns the input stimuli , thus, determine the behavior of the decisional variable. This learning activity then would give rise to a first level of codification through that provides the elaboration and probabilistic valuation of the input . In fact, we can imagine a binary code where the “1 s” corresponds to , that is, to the times the bound is trespassed. Afterwards, the codification from is “translated” into the population by the variable . This provides another binary code based on the overtaking of the threshold , which ultimately drives the eye movements during the computational task. Calculation on values that are represented by population codes and feature extraction about the input stimuli in the brain may be seen as relations between different population codes that provide internal representations of the input-output map. Hence, we can “translate” the likelihood of into a pulsed binary code , say the -code, where is a nonlinear transform of such that and at time assume the value “1” (pulse) if the likelihood , or the value “0” (no-pulse) otherwise.

##### 2.2. Interpulse Intervals and Response Times

After the signal has been reformulated on the base of the -code (Figure 2), we obtain a string of symbols and the lengths of the sequences of the zeros provide the holding times, that is, the empirical interpulse intervals, IPI; that is, the variable recoded according to results in periods of subthreshold location that are broken out by sequences of impulses. Analogously, from the recoded we obtain the corresponding empirical holding times IPI given the transform . Thus we can imagine a functional chain among the bounds , , and scaled by some opportune nonlinear transfer functions , (without loss of generality, we set ). Let us assume that behaves a renewal process. The expected value and the variance of a renewal process may be obtained from the observed IPI data. In fact, for large , the variable is normally distributed with mean and variance , where and are the mean and the standard deviation of the corresponding IPI sequence, respectively [58, 59]. Therefore, the time series can be reconstructed by averaging out over the neural population a Gaussian random variable with mean and variance .

The importance of IPI arises from the hypothesis that the information transferred within the nervous system is usually encoded also by the timing of spikes [60–62]. (Since we are dealing ultimately with the threshold-dependent variable , the random variable is implicitly involved, where is the scalar diffusion process that describes the evolution of the potential, that is, of the evidence, between two consecutive neuronal firings. then is the theoretical counterpart of the IPI.) Thus, by studying the properties of the set of the times that correspond to the crossings of the threshold we both realize the relationship between the impulse rate of the variables and IPI and solve the so-called first passage time problem [63] and hence the response time problem as well. We can then consider the IPI as the expression of the response time of the process through the threshold. Given this association, it becomes natural to compare the theoretical distribution of the response times to the observed distribution of the IPI. An impulse of the variable is elicited any time the process crosses the threshold and then starts again according to a renewal process. (This assumption is necessary to identify the time series of successive pulses times as a sample extracted from a population with the same distribution of the random variable [64].) The question then is how to model the distribution of the response times. We hypothesized the ex-Wald distribution of the response times [65] so that the cumulative distribution function of the response time variable is given by where is the diffusion parameter (i.e., noise of the process), , and are the drift and the threshold of the diffusion process, and (>0) is the rate parameter of the exponential distribution. is the cumulative distribution function of the Wald distributed component of the response time variable and is the standard Gaussian distribution: The estimation of the parameters of the response time distribution involves a backward procedure. Firstly, the variable must be determined for an initial value of the threshold so as to obtain the distribution of the corresponding IPI. Secondly, the best combination of the parameters for the RT distribution of will be assigned as the one which minimizes some error function, say the root mean square error (RMSE) of the difference between their corresponding distribution of RT and the observed IPI. (Indeed the RMSE is a quadratic function of the errors and is optimal when the residuals are distributed as normal random variables. In that occurrence the RMSE is a convex surface. On the contrary, in presence of heavy-tailed distributions of the residuals, the RMSE becomes suboptimal, and it had better use other criteria, e.g., based on entropy measures. However, in front of the computational complexity involved in the inverse method for deriving the parameters of the diffusion models, the RMSE may turn out to be economical.) Lastly, the parameters for the RT distribution of the variable can be estimated analogously by comparison to the interpulse-interval distribution IPI. Of course the final result of the sequential procedure is -dependent; that is, the particular initial value of may affect the estimated vector of optimal parameters. Therefore, the question is how to initialize . The assignation to the average of the likelihood, that is, , suggests an interesting interpretation under the information theory perspective. In fact, we may expect that the function should be learned so as to maximize the mutual information between and subject to noise effect. This is the so-called infomax principle, by which the process of stimuli-learning gives rise to optimization algorithms [66]. If the noise that affects the system is Gaussian and independent of the input , then the mutual information between and resolves in the difference between the entropy of the output and the entropy of the noise [60]. It implies that, to improve the information transmission, the entropy of the signal must be maximized. Therefore, the value of the function , which corresponds to the maximal entropy of the binary signal , is expected to be very close to the one that corresponds to the average . To recover the mapping from stimulus to impulse rate we can apply a nonlinear transformation of a convolution of the stimuli , . By assuming the logistic function for the nonlinear transforming function we can write , where is the convolution of the stimuli (i.e., or ) with an opportune function that is obtained in two stages. In the first stage the transfer function estimate is computed for the input signal and the binary output signal , where is the variable representing the probability of impulse rates, that is, if or if . The relationship between and is shaped by the static (i.e., time invariant) transfer function , that is, the ratio of the cross power spectral density of and over the power spectral density of . In the second stage, the inverse discrete Fourier transform of is computed. Since the inputs are generated from Gaussian process, then is Gaussian too. According to the theorem of Bussgang, the cross-correlation between and scales the autocovariance function of by a value ; therefore we can correct with . Next, the convolution of vectors and forms the argument of the logistic transfer function . This procedure yields a static linear-nonlinear model for the probability of firing rate of the neuronal populations in response to the stimuli and the variable is then expressed by , where is the noise term [67].

##### 2.3. Structure of the Model

Let us consider the input-output map between input and the final state of decisional variable . Data inflow at time ; proceeds from external input and recurrent output obtained at time . This relation, which implies relatively complex computational paradigms, is mediated by populations of neurons and in different areas in the brain. Cells of neuron population , activated by input , respond according to a tuning curve and generate the time series of spikes . Variable counts the spikes until the threshold is reached. This event affects the observable variable and thus the final decisional state through a second neuron population . The firing of neurons is integrated in variable that exceeds threshold and ultimately drives the path of the variable . The state of at any time holds the whole information-set available up to , including the implicit reward corresponding to the state at that time. By aiming at the maximization of the reward, the system would give rise to gap evaluation and error reduction that ultimately involves a feedback circuitry. In this way, the information backpropagates from the decision stage to the valuation stage in order to elicit the adaptation of the threshold in the valuation stage. This mechanism of reinforcement triggers the competition between the alternatives and the valuation is ultimately addressed to the most probable rewarded one (Figure 3).

#### 3. Simulation

##### 3.1. Methods

In order to test the ability of the model to detect effective interactions between the neuronal populations, simulation of the 2LDM was carried on by resampling time series of conditional probabilities from a previous experiment of eye tracking. Nine subjects had been asked to look at two abstract images displayed on a screen for 5 seconds (s) at randomly assigned locations (left or right side). Each subject performed ten trials. The two images were balanced by extension and by photometrical characteristics (color, luminance, and contrast). Eye movements had been recorded during the period of 5 s (sampling frequency 1/50 ms) and at the end of that time subjects declared which of the images was their preferred one. The likelihood, that is, the probability of visual targeting towards one of two images conditional to the final chosen stimulus, was then calculated over the total 90 choices. One hundred surrogates of this likelihood time series were obtained by using the iAAFT technique (iterated amplitude adjusted Fourier transform) [68], so preserving the marginal distribution and power spectrum of the original signal. Next, Gaussian noise proportional to the standard deviation of the original likelihood was added at thirty randomly selected points in each surrogate series. Run test was applied to the resulting modified iAAFT surrogates, and were retained only the ones for which the null hypothesis of mutually independence of the elements in the sequence was rejected. This procedure guaranteed the generation of forty realistic-structured data vectors to which the 2LDM was implemented. Paired -test for comparing the rates of populations’ activity variables and was done. Power spectrum of the gain functions calculated for the two layers was reported. Hilbert-transforms of the average rates of populations’ activity variables and were produced so as to derive their instantaneous phases [69]. The* correntropy* coefficient , which is a measure of correlation in the reproducing kernel Hilbert space (RKHS) ranging over , proper for nonlinear relationship [70], operated as coefficient of phase locking (i.e., synchronization) between the phases and . Correntropy measures were calculated dynamically, that is, in running windows (of depth = 6 data points), over the phase signals.

To use the phase locking indices in a meaningful way, we need to know their distribution under the null hypothesis of independent pairs of oscillatory activity. Only values that depart significantly from what would be expected for independent oscillators can be considered as revealing the presence of synchronization. The distribution of the index, computed for pairs drawn randomly from the surrogate ensembles, can be considered as an approximation of the distribution under the null hypothesis [71]. Therefore, the iAAFT surrogates of both average rates of populations’ activity variables and were Hilbert-transformed and the resultant instantaneous phases attained, and the time series of correntropy values over running windows (of the same size as before) between and was obtained.

To test the null hypothesis that the mean of the distances between features ( and ) is zero, the Weibull-like distribution of the variable was considered (see the appendix).

##### 3.2. Results

The resampled time series of the likelihood of visual targeting at the final selected image ranged over , with mean = 0.5853 and SD = 0.095. The original likelihood data series had mean = 0.6630 and SD = 0.0951. A paired-samples -test was conducted to compare the rates of populations’ activity variables and . There was a significant difference between the rates of (mean = 0.2775, SD = 0.001) and the rates of (mean = 0.1891, SD = 0.0009); (99) = 649.85; . Power spectrum of the gain function in showed higher components than in up to the (lower bound of) beta-band (Figure 4). Level of synchronization between the instantaneous phases calculated by the Hilbert-transform of the rates of populations’ activity variables and was determined in terms of correntropy coefficients (Figure 5). Departures from zero values indicate phase locking. To test the null hypothesis of asynchronous state, the vector of correntropies between the surrogate instantaneous phases was considered as representative of the null hypothesis. The distance between and was expected to be distributed as a Weibull random variable (see the appendix and Figure 6) with shape and scale parameters and . According to the Weibull-like distribution, we found that the test statistic, mean (distance)/S.E.(distance) = 15.323, was significantly different from zero (). Values in the distance feature vector greater than the critical value = 0.756 (for the significance level of 5%) revealed the times of synchronization occurrence (Figure 7). Synchronized activities of the neuronal population were concentrated in the time interval between and and peaked also at .

#### 4. Conclusions

The model presented in this study assumes that the trajectories of an observable variable induced by the TAFC decision making task are conditional to the final state , and so they trace the information processing. Under this hypothesis, the possible association between the formation of a decision, as determined by the path, and the final state of the decisional process can be investigated by considering that populations of neurons determine neuronal responses to stimuli (Figure 3). More specifically, here it is hypothesized that the series of the likelihood are generated by sequential activation of two neuronal populations and and that the decisional process is the effect of accumulation of activity by a pool of neuron populations. This would engender diffusive dynamics of the accumulated evidence. Thus, the proposed model, 2LDM, is, to a certain extent, an implementation of the two-stage circuitry of valuation and decision which is computationally reliable in terms of both neurobiology and Bayesian theory [72, 73]. From this perspective, likelihood ultimately relies on commutations from one internal representation to another, according to their diffusive processes of activation.

There is a theoretical linkage between 2LDM and the well-recognized integrate-and-fire attractor network model [33, 48–50] since both models rely on nonlinear diffusive dynamics. Major difference rests in the expected dynamics of the basal ganglia involved during the decision making process, which we considered driven by nonlinear patterns rather than linear patterns. Furthermore, the characterization of the input-output map in terms of the infomax principle makes, ultimately, the 2LDM an entropy-thresholding algorithm where the model’s parameters (threshold, diffusion noise, and drift) should be tuned to maximize the mutual information between the representations they engender and the inputs that feed the layers. This is consistent with the Q-learning adaptation, since learning the “best” action on the two thresholds to maximize the cumulative entropy is equivalent to learning the optimal behavior which maximizes the reward [74, 75]. Nonlinearity in the 2LDM is given by static linear-nonlinear functions that express the gain of the input-output map, so overcoming the theoretical weakness inherent in the canonical diffusion models which assume that momentary evidence is accumulated continuously and at constant rate, that is, linearly, until a decision threshold is reached. This way to model nonlinear dynamics is not a novelty in neuroscience because it fits for Volterra series representation which, through the first- and second-order kernels, estimates the driving and modulatory influence that one population exerts on the other. The slope of sigmoidal transfer function yields information about the effective connectivity between the neuronal populations, because it is a proxy of the Volterra kernels [76].

Simulation was used to test the ability of 2LDM to represent interactions between the neuronal populations on reliable time series and did not aim at investigating the underlying cognitive process. Synchronous interaction was present within a restricted median time interval, where, supposedly, the dynamics of the two neuronal populations were mutually reinforcing [44]. Instead, asynchronous interaction was prominent. This kind of finding is expected for modulatory (i.e., top-down) connections rather than for driving influence. Neurobiological consistency of the results was found also in terms of the power spectrum of the gain functions, which showed rhythmic oscillations in the low-frequency bands (from delta- to beta-bands). The spectral content of neuronal activity in the circuits of valuation and choice may yield information about the mechanisms underlying the DM [77]. In fact, neuronal oscillations are associated with reverberating activity at local and large scale [78] and reverberation would elicit prolonged accumulation of evidence during decision making [79]. Delta-band oscillations in cortical areas have been associated with attention [80], while the occurrence of synchronization in the delta-band is reported to be widespread and modulated by the different decision alternatives and context specific [81]. Theta-band oscillations are expected to operate in many cognitive functions including memory and DM [82, 83]. The striatum oscillations in theta frequency range are prevalent but activity in lower band is also observed [84]. In a study of DM [85], oscillations in alpha and beta frequency bands had been found synchronized with the phase of delta and theta oscillations (*phase-amplitude coupling*) in medial frontal cortex. This synchronization might reflect a mechanism of feedback valence coding in the medial frontal cortex. Beta-band activity has been linked to reverberation, which is a possible mechanism for memory consolidation and accumulation of evidence [86], as well as to computational operating in DM rather than neuronal representation of the sensory evidence [87]. Our finding of increased beta activity (although in the lower bound of the beta frequency range) in the second neuronal population, which would perform the selection of the optimal alternative, seems consistent with this latter perspective.

Improvement in the optimization of the 2LDM parameters is expected by considering other error functions instead of RMSE if the distribution of the residuals is not Gaussian and is heavy-tailed such that it exhibits large skewness and kurtosis. A challenging task would be the implementation of further layers for studying the subcircuits possibly involved in the valuation or choice stage of DM (e.g., the direct and indirect pathways in BG). Finally, the application of 2LDM to specific cognitive experimental task would yield information about how the speed-and-accuracy performance may vary on the base of some psychometric or behavioral smoothing parameter.

#### Appendix

#### Statistics of Distances between Features

To measure the similarity between two feature vectors, many distance measures have been proposed [89, 90]. A common metric class of measures is the -norm. The distance from one reference vector to another feature vector can be formalized as In order to derive the distribution of the variable we can refer to the following.

Lemma A.1. *For nonidentical and correlated random variables , their sum is distributed according to the generalized extreme value distribution (Gumbel, Frechét, and Weibull).*

Lemma A.2. *If in Lemma A.1 the variables are upper bounded, the sum of variables is Weibull-distributed.*

Theorem A.3. *For nonidentical, correlated, and upper bounded variables , the random variable , expressed by their sum, adheres to the Weibull distribution.*

Corollary A.4. *For finite length feature vectors with nonidentical, correlated, and upper bounded values, the -distances for limited , from one reference feature vector to other feature vectors, adhere to the Weibull distribution.*

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.