Abstract

This paper investigates the theoretical bound on reducing parameter uncertainty in Bayesian adaptive estimation for psychometric functions and proposes an exploration-exploitation (E-E) approach to improve the computational efficiency of parameter estimation. As the experimental trials proceed, the uncertainty of the parameters decreases dramatically and the gap between the maximal mutual information and the theoretical bound narrows, so the advantage of the classical Bayesian adaptive estimation algorithm diminishes. The proposed approach trades off exploration (reducing parameter posterior uncertainty) against exploitation (refining the parameter mean estimate). Experimental results show that the proposed E-E approach estimates the parameters of psychometric functions with the same convergence while reducing the computation time by more than 34.27%, compared with classical Bayesian adaptive estimation.

1. Introduction

Bayesian adaptive estimation plays an important role in the estimation of parameters of psychometric functions [15]. In psychophysics, the psychometric function describes the quantitative relationship between a physical stimulus and the subject's psychological perception [2]. Watson and Pelli first applied the QUEST method in psychophysics [4]. Gradually, Bayesian adaptive estimation has been developed and widely used in psychophysics, behavioral and neural sciences [1, 6], clinical fields [7, 8], etc. It sequentially selects the stimulus that minimizes the uncertainty of the parameters and then updates the parameter prior distribution, thereby estimating the parameters efficiently.

More and more practical experiments are undertaken online [9] (e.g., research on driving behaviors [6, 10], clinical applications [7, 8, 11], and visual perception [12–14]). Therefore, a key challenge faced by researchers is the computational efficiency of optimizing the stimulus after collecting the subject's data during typical psychophysical experiments [10]. One line of work estimates multidimensional parameters simultaneously. Kontsevich and Tyler proposed a method to estimate two-dimensional parameters [5], and Kujala and Lukka applied this method to more general psychometric functions [1, 2, 10]. Psychometric functions with higher-dimensional parameters were then estimated, such as the four-dimensional parameters of the contrast sensitivity function and the driving gap acceptance function [6, 13]. Furthermore, Watson extended the QUEST method to estimate psychometric parameters with multiple dimensions [15]. On the other hand, the optimization algorithm of Bayesian adaptive estimation is designed to make full use of the information contained in the parameter distribution and emphasizes the convergence of the estimation. Kuss et al. discussed the importance of parameter prior distributions for extracting the information contained in experimental data [3]. The authors in [16–20] considered the estimation deviation of parameters, using the limited measurement information effectively to improve estimation efficiency.

However, in each trial of Bayesian adaptive estimation, the optimization algorithm selects the most informative stimulus by searching the parameter space of the psychometric function [1, 6, 13]. As the parameter dimension increases, the time complexity of the stimulus selection grows exponentially. The upper bound of the information gained by the optimization algorithm [21, 22] has not been well studied, and how this theoretical bound affects the stimulus selection and the computational efficiency needs further investigation. Furthermore, the MSE curves of the estimated parameters in experiments usually become almost level after some trials [6], which also calls for an explanation.

This paper investigates the theoretical upper bound of the information gain of the parameters under the optimization algorithm of Bayesian adaptive estimation. This bound determines how much information the estimation algorithm can gain trial by trial and explains why the advantage of the information gain from Bayesian adaptive estimation diminishes as the uncertainty of the parameter distribution decreases. The paper therefore proposes the exploration-exploitation (E-E) approach, from the perspective of machine learning [21, 23–25], which improves classical Bayesian adaptive estimation by selecting the stimulus randomly once low parameter uncertainty is detected. The proposed approach trades off exploration (parameter posterior uncertainty) against exploitation (parameter mean estimation). The exploitation trials need not search the stimulus and parameter spaces to compute the maximal mutual information repeatedly, which improves the computational efficiency substantially. The proposed E-E approach is applied to two parameter estimation instances, the contrast sensitivity function (CSF) and the heterogeneous gap acceptance function (GAF). Experiment simulation results demonstrate that the computation time is reduced by 34.74% for the CSF and 34.27% for the GAF with the same MSE convergence. Thus, compared to classical Bayesian adaptive estimation, the proposed algorithm is more suitable for practical online experiment implementations.

2. Problem Statement

2.1. Psychometric Function

In psychophysics, the psychometric function is used to describe the probability of psychological feedback after a certain stimulus is applied to the individual subject [2]. Usually, the psychometric function with a multidimensional parameter θ is represented as Ψ(d; θ), where d is the stimulus and y ∈ {0, 1} is the random binary feedback which indicates that the subject "rejects" (y = 0) or "accepts" (y = 1) the given stimulus. Given the stimulus d, the conditional probability of the subject's feedback can be expressed as

p(y | d, θ) = Ψ(d; θ)^y [1 − Ψ(d; θ)]^(1−y).  (1)

The objective is to estimate the subject's true parameter θ* of the psychometric function in as few trials as possible, given the cost of collecting data from an individual subject. Parameter estimation problems of this kind arise in many fields, such as visual [12, 13], olfactory [26, 27], and behavioral [6, 28] research.

2.2. Bayesian Adaptive Estimation

Bayesian adaptive estimation is mainly used to estimate the subject's parameter θ in the psychometric function Ψ(d; θ). Let Y be the random variable of the subject's feedback and Θ be the random variable of the parameters. Given the feedback space {0, 1}, stimulus space D, and parameter space Θs, it selects the most informative stimulus

d_t = arg max_{d ∈ D} I_t(Θ; Y | d),  (2)

where p_t(θ) is the parameter prior distribution for trial t and the mutual information I_t(Θ; Y | d) measures the information gain between parameter Θ and observation Y of the subject [1, 10]. The subject's feedback y_t is observed after applying stimulus d_t. According to Bayes' rule, the parameter posterior probability is updated as

p_{t+1}(θ) = p_t(θ) p(y_t | d_t, θ) / Σ_{θ′ ∈ Θs} p_t(θ′) p(y_t | d_t, θ′),  (3)

which is the prior probability for trial t + 1. The details of the Bayesian adaptive estimation algorithm can be found in [1, 5], and the flowchart is shown in Figure 1 [6].

The basic idea of Bayesian adaptive estimation is to find the most informative stimulus so as to gain the maximal information in each trial and thus reduce the parameter posterior uncertainty maximally trial by trial, according to equation (3). In practice, Bayesian adaptive estimation adopts a gridding method to discretize the parameter space [29]. The parameters of the psychometric function are estimated by the mathematical expectation (MEAN) or the maximum a posteriori (MAP) of the parameter posterior [6, 13, 29, 30].
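As a concrete illustration, one trial of the gridded procedure can be sketched in Python/NumPy (a minimal sketch, not the authors' MATLAB implementation; `bae_trial`, `psi`, and the observer callback are illustrative names):

```python
import numpy as np

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), elementwise."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def bae_trial(prior, psi, observer):
    """One trial of gridded Bayesian adaptive estimation.

    prior    : (K,) current distribution over K gridded parameter vectors
    psi      : (M, K) psi[m, k] = P(y = 1 | stimulus m, parameter k)
    observer : callable m -> y in {0, 1}, the subject's feedback
    """
    # Mutual information per stimulus: h(E_theta[psi]) - E_theta[h(psi)]
    mi = binary_entropy(psi @ prior) - binary_entropy(psi) @ prior
    m = int(np.argmax(mi))                 # most informative stimulus
    y = observer(m)                        # subject's binary feedback
    likelihood = psi[m] if y == 1 else 1.0 - psi[m]
    posterior = prior * likelihood         # Bayes rule update
    return m, posterior / posterior.sum()
```

The returned posterior serves as the prior of the next trial, and the MEAN estimator is the posterior-weighted average over the parameter grid.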

3. Theoretical Bound of Information Gain

Previous experiments indicate that, as the implementation of Bayesian adaptive estimation converges, the parameter posterior tends to become peaked and the uncertainty of the parameters decreases; the posterior distribution concentrates around its mean. Moreover, Kujala [31] and Paninski [32] presented the asymptotic theory of the convergence of Bayesian adaptive estimation, i.e., the parameter posterior distribution is asymptotically normal [31]. Bayesian adaptive estimation selects the most informative stimulus to gain the maximal information. However, this maximal mutual information has an upper bound, which limits the information that can be gained from a trial. It is therefore important to measure theoretically how much reduction of parameter uncertainty, or room for information gain, can be anticipated under this optimization strategy [1, 10, 21].

Given the psychometric function Ψ(d; θ) and the conditional probability in equation (1), the mutual information can be formulated as [1, 2, 6, 32]

I_t(Θ; Y | d) = Σ_{θ ∈ Θs} Σ_{y ∈ {0,1}} p_t(θ) p(y | d, θ) log [p(y | d, θ) / p_t(y | d)],  (4)

where p_t(y | d) = Σ_{θ ∈ Θs} p_t(θ) p(y | d, θ).

By symmetry, the mutual information can be rewritten as [2, 33]

I_t(Θ; Y | d) = H_t(Y | d) − H_t(Y | Θ, d),  (5)

where

H_t(Y | d) = h( Σ_{θ ∈ Θs} p_t(θ) Ψ(d; θ) ),  (6)

H_t(Y | Θ, d) = Σ_{θ ∈ Θs} p_t(θ) h(Ψ(d; θ)),  (7)

and h(p) = −p log p − (1 − p) log(1 − p) is defined as the entropy of the binary distribution with probabilities p and 1 − p [6].
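The binary-entropy form is what makes the stimulus search cheap: per stimulus, only the predictive probability of y = 1 is needed. The sketch below (function names are illustrative) also checks this form against the direct double-sum definition of mutual information:

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable, in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def mi_symmetric(prior, psi_d):
    """Binary-entropy form: h(sum_theta p(theta) Psi) - sum_theta p(theta) h(Psi)."""
    return binary_entropy(psi_d @ prior) - binary_entropy(psi_d) @ prior

def mi_direct(prior, psi_d):
    """Direct double sum over y in {0, 1} and theta, for comparison."""
    total = 0.0
    for p_y in (psi_d, 1.0 - psi_d):   # p(y | d, theta) for y = 1 and y = 0
        marginal = p_y @ prior         # p(y | d)
        total += np.sum(prior * p_y * np.log2(p_y / marginal))
    return total
```

Both functions take the prior over the parameter grid and the vector psi_d of response probabilities at one stimulus, and return the same mutual information in bits.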

Theorem 1. Let d be the stimulus and p_t(θ) be the prior distribution of parameter Θ; then,

I_t(Θ; Y | d) ≤ H_t(Θ),

where H_t(Θ) = −Σ_{θ ∈ Θs} p_t(θ) log p_t(θ) is the entropy of parameter Θ.

Proof. For stimulus d, the following holds:

I_t(Θ; Y | d) = H_t(Θ) − H_t(Θ | Y, d) ≤ H_t(Θ),

by information theory [31]. In fact, H_t(Θ | Y, d) ≥ 0 holds for any stimulus d; then, I_t(Θ; Y | d) ≤ H_t(Θ). Also, since the feedback Y is binary, I_t(Θ; Y | d) ≤ H_t(Y | d) ≤ log 2.

Proposition 1. Let d be the stimulus and Y be the random variable of the subject's feedback; then, in Bayesian adaptive estimation,

H_t(Θ | Y, d) ≤ H_t(Θ),

where H_t(Θ | Y, d) is the conditional entropy of parameter Θ given the feedback Y.

Proof. By the "information cannot hurt" principle of information theory [33, 34], we get H_t(Θ | Y, d) ≤ H_t(Θ).

Theorem 1 indicates that the mutual information for any stimulus d will never be greater than the entropy of parameter Θ, i.e., in the sequential decisions of Bayesian adaptive estimation, the information gained from a trial is less than the current uncertainty of parameter Θ. Proposition 1 indicates that, under observations of the subject's random feedback Y, the parameter entropy can be reduced by any stimulus d and decreases monotonically and sequentially.
In the implementation of Bayesian adaptive estimation, the parameter posterior becomes peaked and asymptotically normal [31, 32]: the determinant of the posterior covariance in a neighborhood of the subject's true parameter value is asymptotically minimal [32]. This is explained by Proposition 1, since the uncertainty of the parameters decreases monotonically. When the parameter uncertainty is low enough, the maximal mutual information approaches the current parameter entropy. The information gained from the maximal mutual information decreases gradually, and the advantage of Bayesian adaptive estimation diminishes continuously. This clarifies why the MSE curves of the estimated parameters become almost level after some trials, as mentioned in the Introduction. On the other hand, the room to reduce the parameter uncertainty is narrow, and by Proposition 1 the parameter uncertainty continues to decrease under any stimulus. Different stimuli therefore do not create much difference in the information gained from the Bayesian inference, especially in the MSE curves of the parameters. In this case, another strategy can be used to select the stimulus instead of the most informative one, without hurting the accuracy of parameter estimation.
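Both bounds above can be checked numerically on random grids; the sketch below (the function name is illustrative) verifies Theorem 1 and the one-bit limit imposed by the binary feedback:

```python
import numpy as np

def info_gain_and_bound(prior, psi_d):
    """Return (I_t(Theta; Y | d), H_t(Theta)) for one stimulus on a grid.

    prior : (K,) strictly positive distribution over K parameter vectors
    psi_d : (K,) P(y = 1 | d, theta) for each gridded theta, in (0, 1)
    """
    h = lambda p: -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    mi = h(psi_d @ prior) - h(psi_d) @ prior          # I_t(Theta; Y | d)
    entropy = float(-(prior * np.log2(prior)).sum())  # H_t(Theta)
    return float(mi), entropy
```

Running it over many random priors and response curves never produces an information gain above the parameter entropy, nor above one bit.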

4. Exploration-Exploitation Approach for Bayesian Adaptive Estimation

It should be noticed that, for each trial t, the optimization algorithm of classical Bayesian adaptive estimation searches the parameter space and stimulus space to compute the maximal mutual information, calling the psychometric function Ψ(d; θ) repeatedly. According to Theorem 1, when the entropy of the parameters is low enough, the room to gain information narrows dramatically. In this case, another strategy can be used to select the stimulus and avoid the large computation. This paper proposes the exploration-exploitation (E-E) approach, which generates the stimulus randomly once low parameter entropy is detected, instead of computing the most informative stimulus, to enhance the computational efficiency. The proposed approach thus trades off exploration (parameter posterior uncertainty) against exploitation (parameter mean estimation).

4.1. Exploration Based on Maximal Mutual Information

For trial t, when the parameter distribution is still highly uncertain and Bayesian adaptive estimation has a great advantage in exploring the stimulus space, the maximal mutual information is far from the current bound and the algorithm chooses d_t = arg max_{d ∈ D} I_t(Θ; Y | d) to gain the maximal information. The subject's response is then observed, and the parameter prior distribution is updated by Bayesian inference.

4.2. Exploitation Based on Random Stimulus

For trial t, when the parameter distribution has low uncertainty and the maximal mutual information is close to the bound, we carry out the exploitation strategy: randomly select one stimulus d_t ∈ D, observe the subject's response, and update the parameter prior distribution. Because no search of the stimulus space or parameter space is required, this strategy greatly improves the computational efficiency. According to Proposition 1, the exploitation strategy continues to reduce the parameter uncertainty and gradually sharpens the parameter distribution after each Bayesian update.

4.3. Algorithm for Exploration-Exploitation Approach

Based on the above analysis, we present the algorithm of the proposed E-E approach, exploration-exploitation Bayesian adaptive estimation. To implement the algorithm, we adopt a threshold ε on the parameter entropy, which bounds the maximal mutual information by Theorem 1. If H_t(Θ) > ε, the proposed algorithm selects the stimulus through maximal mutual information; otherwise, the algorithm selects the stimulus randomly.

To estimate the parameters of a given psychometric function Ψ(d; θ), all inputs of Algorithm 1 are initialized. Step 2 of Algorithm 1 calculates the mutual information for all stimuli in the current experimental trial. Step 3 selects the most informative stimulus, using the exploration strategy to update the parameter prior for the next trial, and calculates the parameter entropy to decide whether to go to Step 4. Step 4 selects the stimulus randomly, using the exploitation strategy to update the parameter prior. The parameter estimator is calculated as the MEAN of the parameter posterior.

Input: parameter space Θs, initial parameter prior p_0(θ), stimulus space D, threshold ε, number of experiment trials T.
Output: estimated parameter θ̂.
Step 1: Set t = 0.
Step 2: For all d ∈ D and θ ∈ Θs, compute p(y | d, θ) by equation (1). Compute the binary posterior entropy H_t(Y | d) and the binary conditional posterior entropy H_t(Y | Θ, d) by equations (6) and (7). Calculate the mutual information I_t(Θ; Y | d).
Step 3: Select d_t = arg max_{d ∈ D} I_t(Θ; Y | d) and compute the parameter entropy H_t(Θ). Apply the stimulus d_t to the subject and observe the subject's response y_t. Update the parameter prior distribution p_{t+1}(θ) by equation (3), and then let t = t + 1. If H_t(Θ) > ε, go to Step 2; else, go to Step 4.
Step 4: Select d_t ∈ D randomly. Apply the stimulus d_t to the subject and observe the subject's response y_t. Update the parameter prior distribution p_{t+1}(θ) by equation (3). If t < T, let t = t + 1 and go back to Step 4. Else, output the parameter estimator θ̂ = Σ_{θ ∈ Θs} θ p_T(θ).
End
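The steps above can be sketched end to end in Python/NumPy (a minimal sketch of the gridded procedure, not the authors' MATLAB code; the function name and the logistic observer in the test are illustrative):

```python
import numpy as np

def ee_bayes_adaptive(psi, prior, subject, trials, eps, rng):
    """Exploration-exploitation Bayesian adaptive estimation (sketch).

    psi     : (M, K) psi[m, k] = P(y = 1 | stimulus m, parameter k)
    subject : callable m -> y in {0, 1}, the (simulated) observer
    eps     : entropy threshold; below it, stimuli are drawn at random
    Returns the final posterior over the K gridded parameter vectors.
    """
    def h(p):  # binary entropy, elementwise, in bits
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    post = prior.copy()
    for _ in range(trials):
        entropy = float(-(post * np.log2(np.clip(post, 1e-300, 1.0))).sum())
        if entropy > eps:                       # exploration (Step 3)
            mi = h(psi @ post) - h(psi) @ post  # I(Theta; Y | d) per stimulus
            m = int(np.argmax(mi))
        else:                                   # exploitation (Step 4)
            m = int(rng.integers(psi.shape[0]))
        y = subject(m)
        post = post * (psi[m] if y == 1 else 1.0 - psi[m])
        post /= post.sum()
    return post
```

The MEAN estimator is then the posterior-weighted average of the parameter grid; with a simulated logistic observer, the posterior concentrates near the true parameter whether or not the later trials use random stimuli.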

The asymptotic theory presented by Paninski shows that Bayesian adaptive estimation converges for psychometric functions [32]. The convergence also holds when the stimulus is chosen randomly [32]. Thus, the E-E approach will finally converge, no matter when it switches from the exploration procedure based on maximal mutual information to the exploitation procedure based on random stimuli.

5. Experiment Simulations

To demonstrate the performance and computational efficiency of the proposed E-E approach for Bayesian adaptive estimation, we conduct experiment simulations for the parameter estimation problems of the contrast sensitivity function (CSF) and the heterogeneous gap acceptance function (GAF). The CSF and GAF are classic empirical models from the fields of vision [35] and transportation [28], respectively, and Bayesian adaptive estimation for the CSF and GAF models was studied by Lesmes et al. [13] and Zhu and Zhang [6]. In this paper, we conduct computer simulations instead of real-world experiments. At each trial of the simulated experiment, the most informative design or the random design for the parameter estimation is computed, and the simulated subject's feedback is observed. The performance of the proposed E-E algorithm is compared with the classical Bayesian adaptive estimation algorithm; both algorithms are implemented in MATLAB R2018a on a machine with an Intel Core i5-10400F CPU, 16 GB DDR4 RAM, and an Nvidia GeForce RTX 2060S GPU (8 GB).

In Bayesian adaptive estimation, the choice of the initial parameter prior distribution greatly influences the estimation convergence [2, 3, 6, 13]. This paper focuses on the performance and computational efficiency of the E-E approach with the theoretical bound. Therefore, to avoid the influence of the initial prior, the paper adopts the non-informative uniform prior [6] as the initial prior distribution for both the proposed E-E approach and classical Bayesian adaptive estimation. To reduce the effect of randomness in the simulations, each experiment is repeated 5000 times. To make the comparisons fair, all initial settings and gridding settings are identical.

The mean square error (MSE) [6, 13] between the estimated parameter value and the true value is used as the criterion for both algorithms. The MSE for the true value θ_i* of parameter θ_i in the psychometric function is defined as

MSE_t(θ_i) = E[(θ̂_{i,t} − θ_i*)²],

where θ̂_{i,t} is the estimator of parameter θ_i in trial t and the expectation is taken over the repeated experiments.
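For a scalar parameter, this criterion reduces to averaging the squared errors across the repeated runs at each trial; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def mse_curve(estimates, theta_true):
    """MSE per trial of one scalar parameter across repeated experiments.

    estimates : (N, T) array, estimator from N repeated runs over T trials
    Returns   : (T,) array, MSE_t = mean_n (theta_hat[n, t] - theta_true)^2
    """
    estimates = np.asarray(estimates, dtype=float)
    return ((estimates - theta_true) ** 2).mean(axis=0)
```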

5.1. Contrast Sensitivity Function

Contrast sensitivity (CS) is a clinical measure used to predict functional vision. The parameter estimation problem of the contrast sensitivity function (CSF) investigates how grating sensitivity varies with spatial frequency and contrast in visual perception [13, 35]. The CSF is represented by a psychometric function Ψ(f, c; θ) for the probability of a correct response, built on the logarithmic sensitivity S(f), a truncated log-parabola with peak gain γmax, peak frequency fmax, bandwidth β, and low-frequency truncation level δ [13, 35]. Here f and c are the stimuli, where f is the grating frequency and c is the grating contrast, and θ = (γmax, fmax, β, δ) is the parameter vector to be estimated. The subject gives binary feedback y ∈ {0, 1}, where y = 1 indicates a correct response for the grating of frequency f and contrast c and y = 0 indicates a wrong response.
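The truncated log-parabola S(f) can be sketched as follows; the exact parameterization (κ = log10 2 and the bandwidth rescaling β′ = log10 2β) is an assumption taken from the qCSF literature [13, 35], not reproduced from this paper:

```python
import numpy as np

def log_sensitivity(f, gamma_max, f_max, beta, delta):
    """Truncated log-parabola log10-sensitivity S(f) (assumed qCSF form).

    gamma_max : peak gain; f_max : peak frequency;
    beta      : bandwidth (octaves); delta : low-frequency truncation level.
    """
    kappa = np.log10(2.0)
    beta_prime = np.log10(2.0 * beta)
    s = np.log10(gamma_max) - kappa * (
        (np.log10(f) - np.log10(f_max)) / (beta_prime / 2.0)) ** 2
    # Truncate the low-frequency limb at log10(gamma_max) - delta
    floor = np.log10(gamma_max) - delta
    return np.where((f < f_max) & (s < floor), floor, s)
```

At f = fmax the sensitivity equals log10(γmax); below fmax it never falls more than δ below the peak, while above fmax it follows the parabola down.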

The ranges of the parameters γmax, fmax, β, and δ in the CSF and of the stimuli f and c are set following [13, 35]. The experiments are conducted by grid search: 20 grid points are set for each parameter, and 20 points are set for each stimulus (f and c). The parameter estimation simulations are run for 250 trials, and the threshold ε of the E-E approach is chosen so that the exploitation strategy is applied in the last 110 trials. The parameter entropy curves H_t(Θ) obtained by the proposed E-E approach and classical Bayesian adaptive estimation are shown in Figure 2. The MSE performances of the two approaches are compared in Figure 3.

Figure 2 shows that the entropy curves of the two methods decrease monotonically, as discussed in Proposition 1. The curves of both the E-E approach and classical Bayesian adaptive estimation decrease quickly, which means that the proposed E-E approach effectively reduces the uncertainty of the parameters. It is reasonable that the red line lies slightly above the black line after the exploitation strategy is applied, because classical Bayesian adaptive estimation selects the most informative stimulus in every trial.

Both the proposed E-E approach and classical Bayesian adaptive estimation select the most informative stimulus while H_t(Θ) > ε, and the E-E approach selects the stimulus randomly afterwards. The MSE curves of the E-E approach and classical Bayesian adaptive estimation converge and almost overlap, as shown in Figure 3. This differs slightly from the entropy curves because the parameter estimator is computed as the mean of the posterior. Therefore, the proposed E-E approach trades off the parameter posterior uncertainty against the parameter mean estimation.

Figure 3 shows that both the proposed E-E approach and classical Bayesian adaptive estimation accurately estimate all parameters of the CSF, and the difference in their MSE performance is marginal. For 250 trials, the E-E approach runs for 7.57 seconds, while classical Bayesian adaptive estimation runs for 11.60 seconds; the computation time of the E-E approach is thus shortened by 34.74%.

5.2. Heterogeneous Gap Acceptance Function

The heterogeneous gap acceptance problem studies the driver's response (acceptance or rejection) to different driving gaps when crossing a traffic stream, in order to characterize the driving propensity of the individual driver [6, 28]. Miller's heterogeneous gap acceptance function with two parameters is represented as [6, 28]

Ψ(d; μ, σ) = Φ((ln d − μ) / σ).

The stimulus d is the gap which the driver faces, and Φ is the standard cumulative normal probability function. Binary feedback y = 1 indicates that the driver accepts the gap d when facing it, and y = 0 indicates that the driver rejects d. θ = (μ, σ) are the driver's parameters to be estimated.
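Consistent with the description above (the standard normal CDF applied to the log gap, i.e., Miller's lognormal critical-gap model), the acceptance probability can be sketched as:

```python
import math

def gap_acceptance_prob(d, mu, sigma):
    """P(driver accepts gap d) = Phi((ln d - mu) / sigma).

    Phi is computed via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    """
    z = (math.log(d) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

The function is monotone in the gap d: at d = exp(μ) the acceptance probability is exactly 0.5, and σ controls how sharply it rises with larger gaps.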

The ranges of the parameters μ and σ of the GAF and of the stimulus d are set following [6, 28]. The simulations are conducted by grid search: 20 grid points are set for each parameter, and 25 points are set for the stimulus d. The experiments run for 300 trials, and the threshold ε of the E-E approach is set so that the exploitation strategy is applied in the last 160 trials. The parameter uncertainty in each experimental trial under the E-E approach and classical Bayesian adaptive estimation is compared in Figure 4. The MSE comparisons for the GAF parameter estimation by the two methods are shown in Figure 5.

Figure 4 shows that the parameter entropy of the GAF under both methods decreases monotonically and quickly. Similar to the explanation for the CSF results, the red E-E approach line diverges slightly above the black classical-algorithm line.

Figure 5 shows the performance comparison of the GAF parameter estimation between the E-E approach and classical Bayesian adaptive estimation. Similar to the CSF experiments, the MSE curves of both the E-E approach and classical Bayesian adaptive estimation converge, and the MSE difference between the two methods is minor. One of the two parameters obviously converges faster than the other. The GAF has two parameters, while the CSF has four parameters to be estimated. The true parameter values for the CSF were selected far from the mean of the initial prior distribution, whereas those for the GAF were selected close to it; hence, the shapes of the GAF MSE curves differ from those of the CSF, and the convergence of the CSF estimation takes more trials. For 300 trials, the E-E approach runs for 2.11 seconds, while classical Bayesian adaptive estimation takes 3.21 seconds; the computation time of the E-E approach is thus reduced by 34.27%.

6. Conclusion

The paper investigates the theoretical bound on the information gained from Bayesian adaptive estimation for parameter estimation in psychometric functions. The advantage of gaining information from classical Bayesian adaptive estimation is limited once the parameter posterior distribution becomes peaked; in particular, the bound on the information gain gradually decreases as the experimental trials proceed. The paper therefore proposes the exploration-exploitation approach, which accelerates the computation by selecting the stimulus randomly once low parameter uncertainty is detected, trading off the parameter posterior uncertainty against the parameter mean estimation. Simulation results for the parameter estimation of the psychometric functions CSF and GAF indicate that the proposed approach improves the computational efficiency by 34.74% for the CSF and 34.27% for the GAF with the same estimation accuracy. This computational efficiency is well suited to online experiments. The proposed exploration-exploitation approach for Bayesian adaptive estimation can be applied to parameter estimation of various psychometric functions in psychophysics, and it can also be extended to behavioral and neural sciences, clinical applications, and other fields that use the idea of Bayesian adaptive estimation.

Data Availability

No underlying data were collected or produced in this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (grant no. 11126355) and Innovation Team of School of Mathematics and Statistics of Yunnan University (grant no. ST20210106).