Computational Intelligence and Neuroscience

Volume 2015, Article ID 493769, 11 pages

http://dx.doi.org/10.1155/2015/493769

## Estimating Latent Attentional States Based on Simultaneous Binary and Continuous Behavioral Measures

Departments of Psychiatry, Neuroscience and Physiology, School of Medicine, New York University, New York, NY 10016, USA

Received 14 December 2014; Revised 25 February 2015; Accepted 9 March 2015

Academic Editor: Pasi A. Karjalainen

Copyright © 2015 Zhe Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Cognition is a complex and dynamic process. In many behavioral tasks, an essential goal is to estimate latent attentional states from sequences of behavioral measures. Here, we propose a probabilistic modeling and inference framework for estimating the attentional state using simultaneous binary and continuous behavioral measures. The proposed model extends the standard hidden Markov model (HMM) by explicitly modeling the state duration distribution, which yields a special example of the hidden semi-Markov model (HSMM). We validate our methods using computer simulations and experimental data. In computer simulations, we systematically investigate the impacts of model mismatch and of the latency distribution. For the experimental data collected from a rodent visual detection task, we validate the results with predictive log-likelihood. Our work is useful for many behavioral neuroscience experiments, where the common goal is to infer the discrete (binary or multinomial) state sequences from multiple behavioral measures.

#### 1. Introduction

##### 1.1. Motivation

In behavioral neuroscience experiments, a common task is to estimate the latent attentional or cognitive state (i.e., the “mind”) of the subject based on behavioral outcomes. The latent cognitive state may account for an internal neural process, such as motivation or attention. This is important since one can relate the latent attentional or cognitive state to simultaneous neurophysiological recordings or imaging to seek the “neural correlates” in different brain regions (such as the visual cortex, parietal cortex, and thalamus) [1–4]. Naive determination of such latent states might lead to erroneous interpretations of the results and, in some cases, even affect the scientific conclusions. Therefore, it is important to formulate a principled approach to estimate the latent state underlying the behavioral task, such as attention, detection, learning, or decision making [5–9].

In a typical experimental setup of an attention task, animals or human subjects are instructed to follow a certain (such as visual or auditory) cue to direct their attention and execute the task. At each trial, the experimentalist observes the animal’s or subject’s behavioral outcome (either a binary or a multiple-choice outcome) as well as the latency (or reaction time) from the cue onset until the execution. However, it should be cautioned that the observed behavioral choice does not necessarily reflect the underlying attentional or cognitive state. For instance, a “correct” behavioral choice can be due to either unattended random exploration or attended execution. In contrast, an “incorrect” behavioral choice can be induced by unattended random exploration or by attended yet erroneous decision. Therefore, a simple and direct assignment of behavioral outcomes to attentional states can lead to a false statement or misinterpretation of the behavior. To avoid such errors, it is essential to incorporate *a priori* knowledge or all experimental evidence to estimate the latent state. One direct behavioral measure is the statistics of the latency. Another source of prior information is the task difficulty and the animal’s overall performance. Based on the animal’s experience (naive versus well-trained) or the task difficulty, one can make a reasonable assumption about the dynamics of the latent state process. A similar rationale also applies to other cognitive tasks that involve latent states, such as learning, planning, and decision making.

Markovian or semi-Markovian models are powerful tools to characterize temporal dependence of time series data. Markovian models assume history independence beyond the consecutive states (whether with first-order or higher-order dependence), whereas semi-Markovian models allow history dependency; therefore, they are more flexible and accommodate the Markovian model as a special case. In addition, semi-Markovian models can often be transformed into Markovian models by embedding or augmentation (such as the triplet Markov model) [10]. Typically, Markovian or semi-Markovian models presume stationary probability distributions (for state transition as well as the likelihood function) in time, although this assumption may deviate from real-life data that often exhibit different degrees of nonstationarity. Despite such deviation, we still believe that Markovian or semi-Markovian models are appropriate for modeling a large class of behavioral data. In addition, statistical models can be adapted to accommodate nonstationarity via online learning, especially for large data sets [11–13].

##### 1.2. State of the Art

In the literature, there have been a few works attempting to estimate latent attentional or cognitive states based on simultaneous binary and continuous behavioral measures [15]. In their work, the latent cognitive state was modeled as a continuous-valued random-walk process (which is Markovian). The inference was tackled by an expectation-maximization (EM) algorithm [16, 17] based on state space analysis [18, 19].

Alternatively, the attentional state can also be characterized by a discrete or binary variable. Assuming that the attentional state is Markovian or semi-Markovian, one can model the latent process via a hidden Markov model (HMM) [20, 21], a variable-duration HMM [22], or a hidden semi-Markov model (HSMM) [23–27]. We use the semi-Markovian assumption here. The contribution of this paper is twofold. First, motivated by neuroscience experiments, we formulate the behavioral attention task as a latent state Markovian problem, which may open a new avenue for data analysis in behavioral neuroscience. Specifically, we extend the explicit-duration HMM (or HSMM) to mixed observations (with a discrete behavioral outcome and a continuous behavioral latency) and derive the associated statistical inference algorithm. This can be viewed as modeling conditionally independent variables with parametric observation distributions in the HMM or HSMM [28]. Second, we apply the proposed method to analyze preliminary experimental data collected from a mouse visual attention task.

The rest of the paper is organized as follows. In Section 2, we will present the method that details probabilistic modeling and maximum likelihood inference for the HSMM. Section 3 presents the results from simulated data as well as experimental data collected from free-behaving mice performing a visual detection task. We conclude the paper with discussions in Section 4.

#### 2. Method

##### 2.1. Probabilistic Modeling

We formulate the attention process as a hidden semi-Markov chain of two states, where $s_t \in \{0, 1\}$ (0: unattended; 1: attended) denotes the latent binary attention variable at trial $t$. Conditioned on the attention state $s_t$, we observe discrete (here, binary) choice outcomes $y_t \in \{0, 1\}$ (0: incorrect; 1: correct) and continuous, nonnegative latency measures $r_t > 0$. Unlike the HMM, the HSMM implies that the current state depends not only on the previous state, but also on the duration of the previous state [25, 29]. To model such time dependence, we introduce an explicit-duration HMM. Specifically, let $\tau_t$ denote the remaining sojourn time of the current state $s_t$. In general, the probability distribution of the sojourn time is
$$\Pr(\tau_t = d \mid s_t = j) = p_j(d)\,\mathbb{1}(1 \le d \le D),$$
where the indicator function $\mathbb{1}(\cdot) = 1$ if its argument holds and zero otherwise. In the case of modeling intertrial dependence, the sojourn time is a discrete random variable $d \in \{1, \dots, D\}$; therefore, the explicit duration distribution can be characterized by a matrix $P = (p_j(d))$, where $\sum_{d=1}^{D} p_j(d) = 1$ and the integer $D$ is the maximum duration possible in any state, or the maximum interval between any two consecutive state transitions. Because of the state history dependence, the state transition is only allowed at the end of the sojourn:
$$\Pr(s_{t+1} = j, \tau_{t+1} = d \mid s_t = i, \tau_t = 1) = a_{ij}\, p_j(d), \quad i \ne j.$$

Similar to the standard HMM, the HSMM is also characterized by a transition probability matrix $A = (a_{ij})$, where $a_{ij} = \Pr(s_{t+1} = j \mid s_t = i)$ with $a_{ii} = 0$ (self-transitions are absorbed into the duration distribution), as well as an emission probability matrix $B = (b_j(y))$, where $b_j(y) = \Pr(y_t = y \mid s_t = j)$ and $y \in \{0, 1\}$. The initial state probability is denoted by $\pi = (\pi_j)$. For all matrices $A$, $B$, and $P$, each row sums to one.
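The generative process just described — a state holds for a sampled sojourn time, then jumps — can be sketched in a few lines of code. The numerical values of $A$, $P$, and $\pi$ below are illustrative assumptions chosen for this sketch, not parameters from the paper.

```python
import numpy as np

# Sketch of the generative process for a two-state explicit-duration HMM.
# All numerical values are illustrative assumptions, not the paper's.
rng = np.random.default_rng(0)

D = 3                                   # maximum sojourn duration
A = np.array([[0.0, 1.0],               # transition matrix (no self-transitions;
              [1.0, 0.0]])              # each row sums to one)
P = np.array([[0.2, 0.6, 0.2],          # duration matrix: P[j, d-1] = p_j(d)
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])               # initial state probabilities

def simulate_states(T):
    """Draw a length-T state sequence s_1..s_T from the semi-Markov chain."""
    states = []
    s = rng.choice(2, p=pi)
    while len(states) < T:
        d = rng.choice(D, p=P[s]) + 1   # sojourn time of the current state
        states.extend([s] * d)
        s = rng.choice(2, p=A[s])       # transition only at the end of sojourn
    return np.array(states[:T])

s = simulate_states(200)
```

Note how, unlike a plain HMM, the chain never "re-rolls" its state at every trial; the state is frozen until the sampled sojourn expires, which is exactly the semi-Markov property.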

Furthermore, we assume conditional independence between the binary behavioral measure $y_t$ and the continuous behavioral measure $r_t$; this implies that
$$\Pr(y_t, r_t \mid s_t = j) = \Pr(y_t \mid s_t = j)\, p(r_t \mid s_t = j) = b_j(y_t)\, f_j(r_t),$$
where $f_j(r)$ is characterized by a probability density function (PDF) parameterized by $\{\mu_j, \sigma_j^2\}$. Since the latency variable is nonnegative, we can model it with a probability distribution with positive support, such as the exponential, gamma, lognormal, or inverse Gaussian distribution. For illustration purposes, here we model the latency variable with a lognormal distribution:
$$f_j(r) = \frac{1}{r \sigma_j \sqrt{2\pi}} \exp\left( -\frac{(\ln r - \mu_j)^2}{2 \sigma_j^2} \right),$$
where $r$ denotes the univariate latency variable; $\ln r$ is normally distributed with mean $\mu_j$ and variance $\sigma_j^2$; and $r > 0$. The lognormal distribution belongs to the exponential family.
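A minimal sketch of the state-conditional latency model follows. The per-state parameters `mu` and `sigma` are illustrative assumptions (not fitted values from the paper); the density function implements the lognormal PDF above, and sampling uses the fact that $r = e^x$ with $x \sim \mathcal{N}(\mu_j, \sigma_j^2)$.

```python
import numpy as np

# State-conditional lognormal latency model; mu/sigma per state are
# illustrative assumptions, not taken from the paper.
mu = np.array([0.9, 0.3])        # log-mean latency: unattended (0), attended (1)
sigma = np.array([0.4, 0.25])    # log-standard deviation per state

def latency_pdf(r, j):
    """Lognormal density f_j(r) of the latency r > 0 given state j."""
    return np.exp(-(np.log(r) - mu[j]) ** 2 / (2 * sigma[j] ** 2)) / (
        r * sigma[j] * np.sqrt(2 * np.pi))

def sample_latency(j, rng):
    """Draw a latency r_t > 0 for state j: r = exp(x), x ~ N(mu_j, sigma_j^2)."""
    return float(np.exp(rng.normal(mu[j], sigma[j])))

rng = np.random.default_rng(1)
r = sample_latency(1, rng)
```

Any other positive-support family (exponential, gamma, inverse Gaussian) could be substituted by swapping `latency_pdf` and `sample_latency`.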

Note the following.

(i) It is possible to convert a semi-Markovian chain $(s_t, \tau_t)$ into a Markovian chain by defining an augmented state and thereby a triplet Markov chain (TMC) [10]. The triplet Markov models (TMMs) are general and rich and include many Markov-type models as special cases.

(ii) If multivariate observations from the behavioral measure become available, we can introduce multiple probability distributions (independent case) or multivariate probability distributions (correlated case) to characterize the statistical dependency [30].

##### 2.2. Likelihood Inference

The goal of statistical inference is to estimate the unknown latent state sequences $\{s_t, \tau_t\}$ and the unknown parameters $\theta = \{A, B, P, \pi, \{\mu_j, \sigma_j^2\}\}$. Following the derivation of [29], here we present an expectation-maximization (EM) algorithm for simultaneous binary and continuous observations.

We first define a *forward variable* $\alpha_t(j, d)$ as the joint posterior probability of $s_t = j$ and $\tau_t = d$:
$$\alpha_t(j, d) \equiv \Pr(s_t = j, \tau_t = d \mid o_{1:t}),$$
where $o_{1:t} = \{(y_1, r_1), \dots, (y_t, r_t)\}$ denotes the observations up to trial $t$, and the marginal posterior probability
$$\alpha_t(j) \equiv \Pr(s_t = j \mid o_{1:t}) = \sum_{d=1}^{D} \alpha_t(j, d).$$
In addition, we define the ratio of the *filtered* conditional probability over the *predicted* conditional probability:
$$u_t(j) \equiv \frac{\Pr(s_t = j \mid o_{1:t})}{\Pr(s_t = j \mid o_{1:t-1})} = \frac{\Pr(o_t \mid s_t = j)}{\Pr(o_t \mid o_{1:t-1})} = \frac{b_j(y_t)\, f_j(r_t)}{\Pr(o_t \mid o_{1:t-1})},$$
where the second step follows from Bayes’ rule as well as the Markovian property, and the last step follows from the conditional independence between $y_t$ and $r_t$.

To compute the predictive probability, we define
$$c_t \equiv \Pr(o_t \mid o_{1:t-1}),$$
where $c_1 = \Pr(o_1)$. Therefore, the observed data likelihood is given by
$$L = \Pr(o_{1:T}) = \prod_{t=1}^{T} c_t.$$
Conditional on the parameters $\theta$, the expected complete data log-likelihood is written as
$$Q(\theta) = \mathbb{E}\!\left[ \log \Pr(s_{1:T}, \tau_{1:T}, o_{1:T} \mid \theta) \,\middle|\, o_{1:T}, \theta^{\text{old}} \right].$$
Optimizing the expected complete data log-likelihood with respect to the unknown parameters yields the maximum likelihood estimate.

Similar to [29], we introduce notations for two conditional probabilities:
$$\mathcal{D}_t(j, d) \equiv \Pr(s_t = j \text{ starts at trial } t \text{ and lasts } d \text{ time units} \mid o_{1:T}),$$
$$\xi_t(i, j) \equiv \Pr(s_{t-1} = i, s_t = j \mid o_{1:T}),$$
where $\mathcal{D}_t(j, d)$ denotes the conditional probability that state $j$ starts at trial $t$ and lasts for $d$ time units given the observations, and $\xi_t(i, j)$ denotes the conditional probability of a state transition from $i$ to $j$. Note that the consistency $\sum_{i} \xi_t(i, j) = \Pr(s_t = j \mid o_{1:T})$ holds for all $t$.

To derive the forward-backward updates, we further define a *backward variable* $\beta_t(j, d)$ as the ratio of the smoothed conditional probability over the filtered conditional probability:
$$\beta_t(j, d) \equiv \frac{\Pr(s_t = j, \tau_t = d \mid o_{1:T})}{\Pr(s_t = j, \tau_t = d \mid o_{1:t})},$$
so that the smoothed joint posterior satisfies $\Pr(s_t = j, \tau_t = d \mid o_{1:T}) = \alpha_t(j, d)\, \beta_t(j, d)$, where the equality follows from Bayes’ rule.

For notational convenience, we define additional auxiliary variables based on $\alpha_t(j, d)$ and $\beta_t(j, d)$, which represent the forward and backward recursions, respectively; in particular, the smoothed state probability is [29]
$$\gamma_t(j) \equiv \Pr(s_t = j \mid o_{1:T}) = \sum_{d=1}^{D} \alpha_t(j, d)\, \beta_t(j, d).$$

##### 2.3. EM Algorithm

The EM algorithm for the explicit-duration HMM consists of a forward-backward algorithm (E-step) and the reestimation (M-step). The E- and M-steps are run alternately to optimize the expected log-likelihood of the complete data.

In the E-step of the forward-backward algorithm (note that when $D = 1$, the forward-backward algorithm reduces to the standard Baum-Welch algorithm used for the HMM), we can recursively update the forward variable $\alpha_t(j, d)$ and the backward variable $\beta_t(j, d)$, where $u_t(j)$ denotes the filtered-to-predicted probability ratio defined above. Specifically, in the forward update,
$$\alpha_t(j, d) = u_t(j) \left[ \alpha_{t-1}(j, d+1) + \sum_{i \ne j} \alpha_{t-1}(i, 1)\, a_{ij}\, p_j(d) \right],$$
with an initial value $\alpha_0(j, d) = \pi_j\, p_j(d)$ and the convention $\alpha_{t-1}(j, D+1) = 0$. And in the backward update,
$$\beta_t(i, d) = \begin{cases} \displaystyle\sum_{j \ne i} a_{ij} \sum_{d'=1}^{D} p_j(d')\, u_{t+1}(j)\, \beta_{t+1}(j, d'), & d = 1, \\ u_{t+1}(i)\, \beta_{t+1}(i, d-1), & d > 1, \end{cases}$$
with an initial value $\beta_T(j, d) = 1$ for any $j$ and $d$. In the end, we obtain the smoothed conditional probabilities $\gamma_t(j)$, $\mathcal{D}_t(j, d)$, and $\xi_t(i, j)$.
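One step of the normalized forward recursion can be sketched as follows: a state either continues its sojourn (the remaining duration counts down) or a new sojourn begins after a transition. The parameter values and the observation likelihood vector `like` are illustrative assumptions; normalizing by the total mass plays the role of dividing by the predictive probability of the new observation.

```python
import numpy as np

# Sketch of one forward-filter step for a two-state explicit-duration HMM.
# A, P, and the likelihoods are illustrative, not the paper's values.
D = 3
A = np.array([[0.0, 1.0], [1.0, 0.0]])
P = np.array([[0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])

def forward_step(alpha, like):
    """alpha[j, d-1] = Pr(s_t = j, tau_t = d | data up to t); like[j] is the
    state-conditional likelihood b_j(y_t) f_j(r_t) of the new observation."""
    new = np.zeros_like(alpha)
    for j in range(2):
        for d in range(1, D + 1):
            stay = alpha[j, d] if d < D else 0.0           # sojourn counts down
            enter = sum(alpha[i, 0] * A[i, j] for i in range(2)) * P[j, d - 1]
            new[j, d - 1] = like[j] * (stay + enter)
    return new / new.sum()                                 # normalize to filter

alpha0 = np.array([[0.5 * p for p in P[0]],                # pi_j * p_j(d)
                   [0.5 * p for p in P[1]]])
alpha1 = forward_step(alpha0, like=np.array([0.4, 0.8]))
```

Iterating `forward_step` over trials yields the filtered posterior at every trial; the full E-step would pair this with the analogous backward pass.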

In the M-step, we use the smoothed probabilities for reestimating the model parameters $\theta$:
$$\hat{\pi}_j = \frac{\gamma_1(j)}{Z_\pi}, \qquad \hat{a}_{ij} = \frac{\sum_{t} \xi_t(i, j)}{Z_a}, \qquad \hat{b}_j(y) = \frac{\sum_{t:\, y_t = y} \gamma_t(j)}{Z_b}, \qquad \hat{p}_j(d) = \frac{\sum_{t} \mathcal{D}_t(j, d)}{Z_p},$$
where $Z_\pi$, $Z_a$, $Z_b$, and $Z_p$ are normalizing constants such that the sum of probabilities is equal to one. In addition, the maximum likelihood estimates of $\{\mu_j, \sigma_j^2\}$ in the lognormal distribution are given by
$$\hat{\mu}_j = \frac{\sum_{t} \gamma_t(j) \ln r_t}{\sum_{t} \gamma_t(j)}, \qquad \hat{\sigma}_j^2 = \frac{\sum_{t} \gamma_t(j) \left( \ln r_t - \hat{\mu}_j \right)^2}{\sum_{t} \gamma_t(j)},$$
where $\gamma_t(j) = \Pr(s_t = j \mid o_{1:T})$.
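The lognormal updates are weighted maximum-likelihood estimates on the log-latencies. A minimal sketch, with stand-in smoothed probabilities (in practice `gamma` would come from the E-step; here it is drawn at random purely for illustration):

```python
import numpy as np

# Weighted ML reestimation of the lognormal latency parameters for one state.
# gamma[t, j] stands in for the smoothed probability Pr(s_t = j | all data);
# both gamma and the latencies are synthetic, for illustration only.
rng = np.random.default_rng(2)
T = 500
latencies = rng.lognormal(mean=0.5, sigma=0.3, size=T)
gamma = rng.dirichlet([1.0, 1.0], size=T)      # stand-in smoothed probabilities

def lognormal_mstep(r, w):
    """Weighted ML estimates of (mu_j, sigma_j^2) from latencies r, weights w."""
    logr = np.log(r)
    mu = np.sum(w * logr) / np.sum(w)
    var = np.sum(w * (logr - mu) ** 2) / np.sum(w)
    return mu, var

mu0, var0 = lognormal_mstep(latencies, gamma[:, 0])
```

When the weights are uniform, this reduces to the ordinary lognormal ML fit; the smoothed weights simply share each trial between the two states.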

Upon algorithmic convergence (the convergence criterion is that the consecutive log-likelihood increment falls below a small threshold), we compute the *maximum a posteriori* (MAP) estimates of the state and duration as
$$\hat{s}_t = \arg\max_{j} \Pr(s_t = j \mid o_{1:T}), \qquad \hat{\tau}_t = \arg\max_{d} \Pr(\tau_t = d \mid o_{1:T}).$$

##### 2.4. Model Selection

In practice, the maximum length of state duration $D$ is usually unknown, and we need to estimate the order of the HSMM (since the state dimensionality is fixed here). In statistics, common model selection criteria include the Akaike information criterion (AIC) and the Bayesian information criterion (BIC):
$$\mathrm{AIC} = -2 \log L + 2m, \qquad \mathrm{BIC} = -2 \log L + m \log T,$$
where $m$ denotes the total number of free parameters in the model and $T$ the number of trials. An alternative order estimator has also been suggested [25].
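A sketch of selecting $D$ by these criteria follows. The log-likelihood values and the parameter-count breakdown are illustrative assumptions; in practice each log-likelihood would come from running EM to convergence for that candidate $D$.

```python
import numpy as np

# Choosing the maximum duration D by AIC/BIC. log_liks maps each candidate D
# to a made-up maximized log-likelihood; the n_params breakdown is an
# illustrative accounting, not the paper's exact count.
T = 300                                   # number of trials (assumed)
log_liks = {2: -410.0, 3: -395.0, 4: -393.5}

def n_params(D):
    """Illustrative free-parameter count for the two-state model: transition
    (1), initial state (1), emission rows (2), durations 2*(D-1), lognormal
    means/variances (4)."""
    return 1 + 1 + 2 + 2 * (D - 1) + 4

aic = {D: -2 * ll + 2 * n_params(D) for D, ll in log_liks.items()}
bic = {D: -2 * ll + n_params(D) * np.log(T) for D, ll in log_liks.items()}
best_bic = min(bic, key=bic.get)
```

Because BIC penalizes each extra duration slot by $\log T$ rather than 2, it tends to pick a smaller $D$ than AIC when the likelihood gain from a longer duration support is marginal.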

It shall be emphasized that the AIC and BIC are only asymptotically optimal in the presence of a large number of samples. In practice, experimental behavioral data are often short; therefore, these criteria shall be used with caution or combined with other criteria.

##### 2.5. Alternative Parametric Formulation

Previously, we have assumed a nonparametric probability for the duration distribution $p_j(d)$ ($d = 1, \dots, D$), which has $2(D - 1)$ degrees of freedom for the two-state model. Alternatively, we may assume that the state duration is modeled by a parametric distribution, such as the geometric distribution
$$p_j(d) = \lambda_j (1 - \lambda_j)^{d - 1},$$
where $\lambda_j \in (0, 1)$, $j \in \{0, 1\}$, and $d \in \{1, 2, \dots\}$. In this case, the duration component of the probabilistic model has only two degrees of freedom.

For the associated EM algorithm, the E-step remains similar (replacing the calculation of $p_j(d)$), whereas the M-step includes an additional step to update the parameter of the parametric distribution. For instance, in the case of the geometric distribution, the parameter $\lambda_j$ is updated as the inverse of the posterior mean duration:
$$\hat{\lambda}_j = \left( \frac{\sum_{t} \sum_{d=1}^{D} d\, \mathcal{D}_t(j, d)}{\sum_{t} \sum_{d=1}^{D} \mathcal{D}_t(j, d)} \right)^{-1},$$
which is similar to the *method of moments* in maximum likelihood estimation.
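The geometric alternative and its moment-matching update can be sketched compactly. The posterior over durations (`Dpost`) is a made-up example standing in for the smoothed quantities produced by the E-step.

```python
import numpy as np

# Geometric sojourn-time alternative: p_j(d) = lambda_j * (1 - lambda_j)**(d-1).
# The update sets lambda_j to the inverse of the posterior mean duration, in
# the spirit of the method of moments. Dpost is a made-up posterior for one
# state, standing in for the E-step's smoothed duration probabilities.
def geometric_pmf(d, lam):
    """Probability that a sojourn in the state lasts exactly d trials."""
    return lam * (1.0 - lam) ** (d - 1)

Dpost = np.array([0.5, 0.3, 0.15, 0.05])     # Pr(duration = 1..4 | data)
mean_d = np.sum(np.arange(1, 5) * Dpost)     # posterior mean duration
lam_hat = 1.0 / mean_d                       # moment-matching update
```

Since the geometric mean duration is $1/\lambda_j$, matching the first moment of the posterior duration distribution gives the update directly.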

#### 3. Results

##### 3.1. Simulated Data

*Setup*. In computer simulations, we fix the total number of trials and the maximum state duration. We simulate the state sequences and observations using the following matrices: The structure of the duration matrix implies that, for the unattended state, there is a higher probability for a state duration of two; for the attended state, the highest probability is for a state duration of three. Conditional on the attentional state, the latency variable is assumed to follow a lognormal distribution with state-specific parameters for the unattended and attended states. The two distributions have approximately 13.5% overlap in area (Figure 1). One realization of the simulated latent attentional state sequence and behavioral sequences is shown in Figure 2. Comparing Figures 2(d) and 2(e) in this illustration, we can see that the estimate using both behavioral measures is more accurate and closer to the ground truth (Figure 2(a)).