Abstract

Attention allows us to selectively process the vast amount of information with which we are confronted, prioritizing some aspects of information and ignoring others by focusing on a certain location or aspect of the visual scene. Selective attention is guided by two cognitive mechanisms: saliency of the image (bottom up) and endogenous mechanisms (top down). These two mechanisms interact to direct attention and plan eye movements; the movement profile is then sent to the motor system, which must constantly update the command needed to produce the desired eye movement. A new approach is described here to study how eye motor control could influence this selection mechanism in a clinical setting: two groups of patients with well-known problems of motor control (spinocerebellar ataxia type 2, SCA2, and late onset cerebellar ataxia, LOCA) performed a cognitively demanding task, and their results were compared with those of a group of healthy subjects and with a stochastic model based on Monte Carlo simulations. The analytical procedure evaluated a set of energy functions to characterize the search process. The implemented model suggested that patients performed an optimal visual search, reducing intrinsic noise sources. Our findings suggest a strict correlation between the “optimal motor system” and the “optimal stimulus encoders.”

1. Introduction

The human visual system is a foveocentric structure reflecting the specific anatomical distribution of photoreceptors across the retina, which ensures the best resolution only in a small central region called the fovea; outside this region, visual resolution decreases sharply. To overcome this perceptual limit, the human brain has developed fast and accurate eye movements (saccades) for pointing the fovea at interesting objects in space [1]. In other words, each saccade landing point represents the locus in space where the fovea gets the most detailed information; outside this point, elements of a scene may be localized but are less accurately distinguished. Due to this physiological constraint, the amount of information that can be processed at once by the visual system is limited; therefore, spatial attention is used to select relevant locations of the visual field for enhanced processing [2]. This selection may occur overtly, when associated with an eye movement toward the selected location, or covertly, without an eye movement. While overt attention is strictly related to saccade motor programming and focuses on the saccade goal, covert attention is more closely linked to the perceptual characteristics of peripheral vision and is under the influence of stimulus features (bottom-up attraction) and cognitive enhancement (top-down guidance). Visual search of complex scenes is influenced both by top-down factors [3], including previous knowledge, expectations, current cognitive status, and expected goals, and by bottom-up factors that reflect sensory features of the stimulus, such as orientation, luminance, shape, and brightness. In particular, bottom-up driving of gaze toward the most salient stimuli occurs first; then, as visual exploration proceeds, cognitive processes increasingly influence visual search in a top-down manner.
By combining top-down and bottom-up information during search, the brain gets a clear view of the conspicuous items (both in terms of cognitive relevance and feature saliency) and their locations, and is thus able to build an internal, task-specific representation of the scene [4–6]. More specifically, the relative conspicuity of each element of the scene is reproduced in a saliency map, which makes it possible to predict where the eyes will be attracted first during the exploration of a scene. While the saliency map refers only to the bottom-up characteristics of a scene, the top-down influence on visual search is represented in the priority map [7]. Covert attention is particularly efficient during top-down visual search, where it is used to collect global information about the scene through parallel processing of conspicuous elements and to increase the discrimination abilities of peripheral vision by enhancing spatial resolution at the attended target location [8] and by fading irrelevant locations. From a neurophysiological point of view, object features and spatial locations acquired through the retina during visual exploration [9] are principally sent to the primary visual cortex (V1) via the lateral geniculate nucleus (LGN) (see [10–12] for a review). However, fast spatial information about the visual scene is also sent to the superficial layers of the superior colliculi (SC). This subcortical structure, which integrates multisensory inputs with eye and head motor plans, is important for orienting attention, in retinotopic coordinates, towards newly appearing objects in the visual field [10, 13]. Further along the pathway, information about object features and spatial location is elaborated in the dorsal (occipitoparietal: where) and ventral (occipitotemporal: what) streams [14–17] in order to construct a priority map [7, 10, 18, 19] (see Figure 1(a)).

Various formal models have been proposed: Feature Integration Theory [20], Guided Search [21], Premotor Theory [22], Theory of Visual Attention [23], and Winner Takes All [24]. Feature Integration Theory proposes a two-stage visual attention process: during the first stage, humans process several primary visual features; during the second stage, objects are analyzed in detail. Winner Takes All and Guided Search are devoted to assigning saliency to locations in the visual field. Recently, Reynolds and Heeger [18] reviewed a normalization model of attention; the model describes how stimulus features and cognitive attention should work together to perform an efficient visual search.

These models describe how humans could select some aspects of the scene, but how might the motor control system optimize visual search by predicting the sensory consequences of an impending saccade? Eye movements are controlled by the cerebrocerebellar communication loop and are described by feedforward or inverse control models [25–27] (see Figure 1(b)).

Recently, some authors [28, 29] have applied the theory of optimal control to the mechanisms that regulate motor control; humans seem to adapt their behavior to minimize some cost function, such as noise, in performing an action [30]. Najemnik and Geisler argued that humans could choose fixations that maximize the information gained about the target’s location; this strategy [31] allows humans to select features from the scene optimally, like an “ideal searcher.”

Our work aims to extend this principle to the mechanisms that regulate visual search. We propose a method to study how the strategies of selective attention may be modulated by motor control performance. Since the cerebellum is the brain structure where some dynamic aspects of motor control are optimized to reduce errors, we compare our formal model with the ocular motor behavior of well-characterized cerebellar patients who have specific motor control failure [32, 33].

Therefore, we developed a mathematical stochastic model based on the Monte Carlo method, able to simulate ongoing visual search in a cognitively demanding task (Figure 2). Some common energy functions were evaluated, and we compared the model results with a control group of healthy subjects (CTRL) and two groups of patients with degenerative cerebellar ataxia. Indeed, the cerebellum is implicated in keeping the saccadic subsystem efficient for vision; this ability is often disrupted in degenerative cerebellar diseases, as demonstrated by saccade kinetic abnormalities.

Therefore, two groups of patients, one affected by spinocerebellar ataxia type 2 (SCA2) and one with late onset cerebellar ataxia (LOCA), were enrolled in the study. The results of the three groups of subjects (CTRL, SCA2, and LOCA) were compared with the model’s outcome.

2. Material and Methods

2.1. Patient’s Clinical Findings

SCA2 is an autosomal dominantly inherited neurodegenerative disorder, mainly characterized by cerebellar ataxia, cerebellar atrophy at MRI, and slow eye movements [34]. The LOCA group comprises patients with degenerative, genetically unrecognized pure cerebellar ataxia associated with atrophy limited to the cerebellum at MRI. Demographic, clinical, and genetic data of a larger sample including the patients recruited here have been reported by Klockgether, Muzaimi et al., and us [35–38].

2.2. Experiment Design

We used a highly cognitively demanding task, namely, the trail making test (TMT) [39], in which subjects were asked to follow an alphanumeric sequence with their gaze. The trail making stimulus was a pop-up high-contrast image consisting of a sequence of numbers and letters arranged in an unpredictable manner. In our version of the TMT, numbers were at the bottom and letters at the top of the image.

The proposed version was simplified to help subjects perform the task efficiently and to avoid any performance pressure. Because the patients enrolled in this experiment were affected by cerebellar disease, a psychological test could be biased by subjective self-underestimation, fear of judgment, or fatigue; therefore, we used a simplified, easy version to avoid eliciting frustration. The distribution of symbols in a predefined geometric order allowed a clearer definition of the gaze shift during sequencing, since the distance from the center and between the symbols required a real gaze shift and prevented target detection by peripheral vision [40, 41].

During the task, the subject was asked to fixate a central red dot; after the dot disappeared, the subject could explore the TMT image (Figure 2). The TMT is particularly suitable for studying selective attention, as it does not require any explicit feedback from subjects, and the test performance can be evaluated automatically and reproduced by a computational model.

2.2.1. Subjects Enrollment and Training

Seven patients, a mixed group of six patients with genetic cerebellar ataxia (), and 23 healthy subjects were enrolled in the study. All were in the age range of 25–55 years.

The patients included in the study were previously diagnosed as reported by Federighi et al. [37]. Exclusion criteria for control subjects included any history of neurological or eye problems, toxic or drug abuse, and current pharmacological treatment for neurological or eye diseases. All subjects gave their informed consent and the study was approved by the Regional Ethics Committee.

Before the experiment, all subjects were trained by a psychologist using a paper version of the TMT. All subjects performed a first attempt of the experiment for seconds; after a pause of five minutes the procedure started.

2.2.2. TMT Procedure

Subjects were seated at a viewing distance of from a 32′ color monitor (51 cm × 31 cm = 33.2 deg × 21.6 degree of visual angle). Eye position was recorded using an ASL 6000 system, which consists of a remote-mounted camera sampling pupil location at . Head movements were restricted using chin rest and bite. After the calibration phase, subjects performed a simple validation phase eliciting four reflexive saccades and measuring the error between the eye position and the target; the calibration procedure was repeated until the error was less than . A red dot appeared at the screen center for five seconds; then a randomized version of TMT (different from the version used during the training step) appeared for seconds. Subjects could stop the experiment if they thought they had concluded. All subjects performed the experiment within the given time. Different sessions of the experiment were performed for each patient on the same day as well as on different days. When a patient asked to stop the experiment because he/she was tired, the procedure was restarted after a resting period. All subjects were able to perform the task. Seven patients reported fatigue.
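The conversion between screen size and degrees of visual angle mentioned above follows from simple trigonometry. The sketch below is illustrative only: the function name is hypothetical, and the viewing distance is a placeholder, not the value used in the study.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent of size_cm that is
    centered on the line of sight and viewed from distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Hypothetical viewing distance (assumption, for illustration only).
distance_cm = 85.0
width_deg = visual_angle_deg(51.0, distance_cm)
height_deg = visual_angle_deg(31.0, distance_cm)
print(f"{width_deg:.1f} deg x {height_deg:.1f} deg")
```

The same function can be inverted numerically to check whether a reported pair of angles is consistent with a single viewing distance.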

2.2.3. Patient’s Test Postassessment

To verify the patients’ visual acuity and their ability to perform the test, we implemented a “guided TMT search”: a grayscale TMT version was presented to patients (SCA2, LOCA) in which letters and numbers were highlighted in red step by step, following the alphanumeric sequence, every 2 seconds. We measured the sequencing ability and the distribution of eye fixations. Indeed, other studies [37, 42–45] reported that cerebellar lesions may affect the accuracy of saccades.

2.3. Distribution of Eye Fixations Evaluation

To analyze visual search and the model outcome, we defined some indicators. For each target (letter or number), we defined a region of interest (ROI). The ROI was defined as larger than the symbols of the test. We evaluated how humans directed the next exploration according to the distribution of the latest fixations (see Section 2.4). Fixations were identified by the dispersion algorithm developed by Salvucci and Goldberg [46].
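A dispersion-based fixation detector of the kind cited above can be sketched as follows. This is a minimal illustration of the general dispersion-threshold idea, not the authors' implementation; the function name, the threshold, and the minimum window length are assumptions chosen for the example.

```python
def idt_fixations(samples, max_dispersion, min_samples):
    """Dispersion-threshold identification of fixations.

    samples: list of (x, y) gaze positions at a constant sampling rate.
    max_dispersion: threshold on (max x - min x) + (max y - min y).
    min_samples: minimum window length, in samples, for a fixation.
    Returns a list of (start, end_exclusive, centroid_x, centroid_y).
    """
    def dispersion(window):
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    start = 0
    n = len(samples)
    while start + min_samples <= n:
        end = start + min_samples
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the dispersion stays under the threshold.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end, cx, cy))
            start = end
        else:
            # No fixation starts here; slide the window by one sample.
            start += 1
    return fixations


# Two stable gaze clusters separated by one in-flight sample (synthetic data).
gaze = [(100, 100)] * 10 + [(200, 200)] + [(300, 300)] * 10
print(idt_fixations(gaze, max_dispersion=20, min_samples=5))
```

In practice the threshold is expressed in degrees of visual angle and the minimum window in milliseconds, both converted using the screen geometry and the sampling rate.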

For each fixation, we evaluated the Euclidean distance in pixels from the center of the nearest ROI and the Euclidean distance in pixels from the center of the target ROI (see Figure 3).
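The two distances can be computed directly from the fixation coordinates and the ROI centers. The following is a small sketch; the function names and the coordinates are hypothetical.

```python
import math


def distance_to_nearest_roi(fixation, roi_centers):
    """Return (index, distance in pixels) of the ROI center closest to a fixation."""
    distances = [math.dist(fixation, center) for center in roi_centers]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


def distance_to_target_roi(fixation, roi_centers, target_index):
    """Distance in pixels from a fixation to the center of the expected target ROI."""
    return math.dist(fixation, roi_centers[target_index])


# Hypothetical ROI centers and fixation, in screen pixels.
centers = [(0.0, 0.0), (10.0, 0.0)]
print(distance_to_nearest_roi((3.0, 4.0), centers))
print(distance_to_target_roi((3.0, 4.0), centers, target_index=1))
```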

2.4. Selection Mechanism Evaluation

To evaluate ongoing visual search Engel, Ponsoda et al., Findlay et al. [5, 47, 48], and, later, us Veneri et al. [41] developed a geometric method (Figure 4); the proposed procedure defined “observed direction” as the direction of the subject’s gaze. We evaluated the is a special weight avoiding artifact due to borders; for example, the weight () was set to 1 for letter in the center and for letter near the border. The operator is the radial difference between the two directions and ranged from to .

The selection mechanism was evaluated calculating the direction of saccade at given time (Figure 4(a)) versus the previous fixations made () in a given time interval and the direction of saccade versus the expected target () (Figure 4(b)).

Then, setting the “direction reference” as the direction from the current fixation to the previous fixation at the given time , the scalar direction difference was expressed as

We defined also

Finally, setting the “direction reference” as the direction from the current fixation to the expected direction (the direction to target) at a given time , the scalar direction difference was expressed as

models the ability of humans to remember visited ROI and must be calculated.
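The geometric idea of this section, comparing the direction of a saccade with a reference direction, can be illustrated with a wrapped angular difference. This is a sketch under assumptions: the function names are hypothetical, angles are in radians, the difference is wrapped to the interval from -pi to pi, and the border weight described above is not modeled.

```python
import math


def direction(p_from, p_to):
    """Direction (radians) of the vector from p_from to p_to.
    Note that in screen coordinates the y axis usually points downward."""
    return math.atan2(p_to[1] - p_from[1], p_to[0] - p_from[0])


def angular_difference(a, b):
    """Signed difference a - b, wrapped to the interval [-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


# Hypothetical positions: current fixation, next fixation, and target center.
current, nxt, target = (0.0, 0.0), (10.0, 1.0), (10.0, 0.0)
observed = direction(current, nxt)
expected = direction(current, target)
print(math.degrees(angular_difference(observed, expected)))
```

Wrapping through `atan2` avoids the discontinuity that a plain subtraction would produce when the two directions lie on opposite sides of the ±180° boundary.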

2.5. Calculation

Figure 5 shows the model architecture. The model was based on three subsystems: the first block (“relevant item selection” of Figure 5) provided the probability of directing the gaze to the correct target (next symbol) based on an internal representation of the target with probability ; the second block “fixations distribution” of Figure 5 provided the probability of moving the gaze far from the latest fixations with probability [41, 49, 50]. Third block (“peripheral vision”) directed the gaze to target when target was in a neighborhood of with probability and modeled covert attention [19]. Latest block perturbed () fixations location in order to simulate model variability.

After normalization, the weighted union probability between two mutually exclusive events was calculated (5): where .

The model calculated the probability by (5): then it selected the direction to move in accordance with the probability. The procedure was repeated for each symbol and exited when the model performed the sequence correctly. was the level of competition between the blocks “relevant item selection” and “fixations distribution.”
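One way to read the competition between two blocks is as a convex combination of two normalized probability distributions over the candidate symbols, followed by a random draw. The sketch below is an assumption about the general shape of such a rule, not a reconstruction of Eq. (5); the weight `w`, the function names, and the example probabilities are hypothetical.

```python
import random


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def combine(p_item, p_spread, w):
    """Convex combination of two distributions over the same candidates.
    w = 1 follows only the first distribution, w = 0 only the second."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    p_item = normalize(p_item)
    p_spread = normalize(p_spread)
    return [w * a + (1.0 - w) * b for a, b in zip(p_item, p_spread)]


def choose(probabilities, rng):
    """Draw one candidate index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


rng = random.Random(0)
p = combine([1.0, 0.0, 0.0], [0.0, 1.0, 1.0], w=0.5)
print(p, choose(p, rng))
```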

Probabilities are , , and , where is the normal distribution of mean and variance . and were evaluated through (2) and (4).

Output of model was an array of fixations where was set to , according to variance of of subjects, was set to , according to variance of of subjects, , and and were free variables.

2.6. Energy Function

Optimization theory requires the definition of one (or more) energy/cost functions to be minimized. Therefore, according to [45, 51, 52], we defined two cost functions based on saccade properties; for each saccade, we evaluated the Euclidean distance in pixels from the saccade start point to the saccade end point, skipping short saccades inside the same ROI. A global function saccade energy was measured evaluating the path length through the following formulas: where is the sampling time. Equation (7) is the sum of all saccades’ length.

Fixations represent the cognitive act to process the scene; cardinality and duration are measures of task performance [19, 53, 54]. Therefore, the task execution energy was defined counting the number of fixations inside the to complete the task: Equation (8) is the number of steps made to complete the task.
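The two quantities described in this section, total saccade path length and number of fixations inside the regions of interest, can be sketched as follows. The representation is an assumption made for illustration: each saccade is taken as the step between two consecutive fixation centers, and each fixation carries the index of the region it falls in, or `None` if it falls outside every region. The function names and data are hypothetical.

```python
import math


def saccade_path_length(fixations, roi_labels):
    """Sum of Euclidean step lengths (pixels) between consecutive fixations,
    skipping steps that start and end inside the same region of interest."""
    total = 0.0
    for i in range(1, len(fixations)):
        same_roi = roi_labels[i] is not None and roi_labels[i] == roi_labels[i - 1]
        if same_roi:
            continue
        total += math.dist(fixations[i - 1], fixations[i])
    return total


def count_roi_fixations(roi_labels):
    """Number of fixations that landed inside any region of interest."""
    return sum(1 for label in roi_labels if label is not None)


# Hypothetical scanpath: one fixation outside, two in region 0, one in region 1.
path = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0), (6.0, 9.0)]
labels = [None, 0, 0, 1]
print(saccade_path_length(path, labels), count_roi_fixations(labels))
```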

2.7. Stochastic Model Application

The basic idea was to apply the stochastic model defined in Section 2.5 to study some target function (energy , , and ) as an optimal control problem. Optimization problems can mostly be seen as one of two kinds: we need to find the extrema of a target function cost over a given domain; performance is highly dependent on the analytical properties of the target function. Therefore, if the target function is too complex to allow an analytical study or if the domain is too irregular, the method of choice is rather the stochastic approach [55, 56]. Since visual search is a complex system under the influence of many mechanisms, it is not easy to predict the selection mechanism through an implicit and deterministic model; therefore, we developed a stochastic model based on the simulation (To understand Monte Carlo simulation the reader should consider the following case: a player wants to measure the surface of his carpet in his room ; the player may randomly launch a button 100 times and count the number of times () the button falls onto the carpet. It is easy to verify that the carpet surface is . In our work, the sought-after measure is the energy to execute the task (the carpet surface) and the sought after system is the set of parameters (the carpet geometry).) [57]. Therefore, the model attempted all possible explorative strategies, varying parameters , , and . For each simulations, we varied parameters and we evaluated the function cost , , and (Monte Carlo optimization).
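The carpet example can be written as a few lines of code: points are thrown uniformly over the room, and the fraction that lands on the carpet, multiplied by the room area, estimates the carpet area. The room size, carpet shape, number of throws, and function name below are hypothetical values chosen for illustration.

```python
import random


def estimate_carpet_area(room_w, room_h, in_carpet, n_throws, rng):
    """Monte Carlo estimate of the area of a region inside a rectangular room.

    in_carpet(x, y) returns True when the point lies on the carpet.
    """
    hits = 0
    for _ in range(n_throws):
        x = rng.uniform(0.0, room_w)
        y = rng.uniform(0.0, room_h)
        if in_carpet(x, y):
            hits += 1
    return room_w * room_h * hits / n_throws


def on_carpet(x, y):
    # Hypothetical 2 m x 3 m rectangular carpet inside a 4 m x 5 m room.
    return 1.0 <= x <= 3.0 and 1.0 <= y <= 4.0


rng = random.Random(0)
print(estimate_carpet_area(4.0, 5.0, on_carpet, 20000, rng))
```

The estimate converges slowly (the error shrinks with the square root of the number of throws), but the method needs no analytical description of the carpet, which is what makes it attractive for irregular domains.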

From an intuitive point, the model computed the solutions domain perturbing fixations distribution; the outcome could be compared with subjects’ visual search (, , and ) data; see Figure 6.

3. Results

3.1. Subjects

The positive trend of (Figure 7) for all groups suggested that saccade’s direction tended to move away from latest fixations. In particular, the trend of gaze direction with respect to the distribution of fixations made increased in the last second () suggesting that the basic model operation (Section 2.5) was compatible with subjects’ exploration.

To assess the differences of exploration strategy among groups, we evaluated distance to nearest ROI () and number of visited regions of interest (). ANOVA did not report significant difference on (, (2,33) = 2.209) and it was confirmed by posthoc analysis (, , and ). On the contrary, ANOVA reported a significant difference on (,   (2,33) = 9.52 and post-hoc Holm-Sidak confirmed the significant difference of CTRL-SCA2 (, ) and between - (, ) and no significant difference between patients (, ).
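The degrees of freedom reported above, (2, 33), are those of a one-way ANOVA on three groups totalling 36 subjects (23 + 7 + 6, as listed in Section 2.2.1). As a minimal sketch of how such an F statistic is formed, the function below computes it from raw group values; the function name and the sample values are hypothetical and are not the study data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numeric observations, one list per group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical toy data, three groups of three observations each.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The p-value is then obtained from the F distribution with these degrees of freedom, and pairwise post hoc tests are corrected for multiple comparisons.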

Our preliminary conclusion was that performance could be considered equivalent among groups, but with different strategies (Table 1). Indeed, patients (SCA2 and LOCA) preferred sparser fixations instead of targeted saccades. This strategy has been reported by several authors [58–60], who refer to such fixations as “center-of-gravity” fixations. Center-of-gravity fixations occur when targets are surrounded by nontargets and the saccades, instead of landing at the designated target, land in the midst of the whole configuration. This effect was first attributed to an error of the filter selection [61] and was later considered a necessary mechanism to execute more efficient saccades [60, 62]. We concluded that patients could direct the gaze into the ROI but preferred sparse fixations. To understand this effect, we compared human results with model results.

3.2. Model Simulation

Using varying parameters , , and , the model (Figure 5) calculated the , , and : the three surfaces (slightly smoothed for readability) shown in Figures 8 and 10, report the domain of all available explorations choosing the minimum value along the dimension of the parameter .

To assess the validity of the model we evaluated the normalized root mean square error () of . varied from to : , , and . We accepted this result as an acceptable error; indeed, only of subjects reported a and, in any case, .
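A normalized root mean square error between observed and simulated values can be computed as below. How the error was normalized is not stated in the text; this sketch assumes normalization by the range of the observed values, which is one common convention. The function name and the numbers are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error divided by the range of the observed values
    (normalization by range is an assumption of this sketch)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / value_range


# Hypothetical observed and model values.
print(nrmse([0.0, 10.0], [1.0, 9.0]))
```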

The model reported a local minimum , when , , and ; subjects, however, performed a less efficient exploration (Table 1). To study this result, in depth, we calculated the parameter, which provided an estimate of saccadic energy; we have to note that it is not possible to compare of subjects and model due to noise on saccade’s trajectory; indeed, in two recent papers [37, 38], we found that motor control noise of reported a significant difference with and . Figure 8(b) shows the overall solutions domain varying parameters , , and . Overall minimum hyperplane was found at , which corresponded to .

Comparison of model simulations with subjects’ performance was done by setting model’s equal to of subjects (0.90, 0.45, and 0.60); Figure 9 shows the model compared to subjects’ exploration: the three groups of lines of Figure 9(a) showed the model numbers of steps to complete the task () for , , and and varying and . The parameter controlled the dispersion of fixations with direct (nonlinear) influence on , and provided the needful variability to adapt within group variability among subjects. The three lines of Figure 9(b) show the saccade energy () of the model and the corresponding value of subjects.

To evaluate the influence of saccade control on visual search, we analyzed saccade amplitude (): ANOVA reported a significant difference among groups (, ) and post-hoc holm-sidak confirmed the significant difference of between - (, ) and - (, ) and no significant difference between patients (, ). Indeed, we found that saccade amplitude of patients ( and ) was less than 23.5% and 28.1% of healthy subjects’ saccades (Table 1).

Comparing of subjects and model, model reported similar value (Figure 10(a)) to subjects with an error rate of ; analyzing the trend varying fixations’ dispersion (), it seems plausible that and preferred an exploration which minimized saccade amplitude. We tried to minimize the following empirical function cost bringing together the goal parameter , , and We found that and preferred to minimize saccade amplitude () and saccade energy () rather than steps to complete the task (); the opposite was true for ().

3.3. Patients Test Postassessment Results

Since subjects with cerebellar injury have reported low precision on visual search tasks, we asked patients (, ) to perform a “guided TMT search” where letters and numbers were highlighted in red step-by-step. Patients were able to complete the sequence with few fixations outside the ROI (). We concluded that the low precision performance reported by several authors [37, 42, 45] was negligible compared to the size of the ROI. Similar findings were reported by van Beers.

4. Discussion

Assuming an error rate 14% ( for ) of the model, both patients and healthy subjects performed a suboptimal exploration with different strategies: trend reported in Figure 9(b) suggested that patients (, ) preferred sparser fixations at an intermediate position between the targets (“center-of-gravity” fixations) and optimally tuned to keep saccade amplitude () as low as possible and energy saccade () in a neighborhood of the minimum (suboptimal exploration); on the contrary, healthy subjects () preferred saccades directed near targets (target selection). Performance () of patients was similar to healthy subjects (); patients completed the task with more energy and steps but not with a significant difference from .

These different visual search strategies in healthy subjects (CTRL) and cerebellar patients (SCA2 and LOCA) suggested a direct or indirect influence of the cerebellum on visual selection processing.

4.1. Theoretical Considerations about the Cerebellum’s Role

Traditional views of the cerebellum hold that this structure is engaged in the control of action, with a specific role in motor skills [63]. The properties of cerebellar efferent and afferent projections (see Figure 1(b) for a short reference) suggest that the cerebellum is generally, although “not necessarily” [64], involved in integrating motor control and sensory information to coordinate movements. The model of the neuronal circuitry of the cerebellum proposed by Ito [27, 32, 65] makes it possible to consider more concrete ideas about cerebellar processing: the cerebellum is thought to encode internal models that reproduce the dynamic properties of body parts. These models control movement, allowing the brain to precisely predict the consequences of a movement without the need for sensory feedback [66–68] or to perform a quick movement in the absence of visual feedback [27]. Indeed, the cerebellum is able to reproduce a stereotyped function of the desired displacement of the eyes and provides a reference signal during movement [69]. To explain this mechanism, two models have been proposed. Kawato et al. [26] proposed a forward model (cerebellum forward model) that simulates the dynamics (or kinematics) of the controlled object, including the lower motor centers and motor apparatus (Figure 1(b)); the motor cortex should be able to perform a precise movement using internal feedback from the forward model instead of external feedback from the real controlled object [65, 67]. As such, motor learning can be considered a process by which the forward model is formed and reformed in the cerebellum through learning. When the cerebellar cortex operates in parallel with the motor cortex, it instead forms another type of internal model, the inverse model, whose transfer function is the reciprocal of the dynamics (or kinematics) of the controlled object [25, 26].
The inverse model can then play the role of a feedforward controller that replaces the motor cortex serving as a feedback controller.

As further support for this theory, it is well known that cerebellar lesions [43, 44] induce permanent deficits, dramatically affecting the consistency [42, 45] and the accuracy of saccades [37]. While a wide range of evidence accounts for a dominant role of cerebrocerebellar interactions in motor control, and the movement-related functions of the cerebellum are the most solidly established, recent studies have clearly suggested an influence of the cerebellum on cognitive and behavioral functions [65, 67, 70, 71], including fear and pleasure responses [72–74]. Allen et al. [75], in a magnetic resonance experiment, found direct activation, during visual attention, of some cerebellar areas independent from those devoted to motor performance. Paulin [64] developed a computational physical model estimating the movement status, useful for trajectory tracking and trajectory prediction; Paulin concluded that the cerebellum is not a motor control device per se but a device for optimizing sensory information about movements. Kellermann et al. [76] identified a cerebelloparietal loop, involving posterior parietal cortex, visual area V5, and cerebellum, that facilitates predictions of dynamic perceptual events. Other studies provided findings about a direct implication of the cerebellum in language, verbal working memory [77–81], and timing [82]. Gottwald et al. reported significant deficits of patients with focal cerebellar lesions in divided attention and working memory tasks but not in a selective attention task [83]. In addition, Exner et al. [84], Golla et al. [85], and later Hokkanen et al. [86] reported normal attentional shifting in patients affected by cerebellar lesions.

4.2. The Dominant Role of the Cerebellum

A number of functional hypotheses (sometimes contradictory [87]) have been advanced to account for how the cerebellum may contribute to cognition, and a large number of neuroanatomical studies have shown cerebellar connectivity with almost all the associative areas of the cerebral cortex involved in higher cognitive functioning [73]; nevertheless, the hypothesis of a dominant role of the cerebellum in motor control remains the most probable explanation of our findings and is supported by the proposed model. Cerebellar patients have a well-known disturbance in controlling saccade endpoint error. The two groups reported here, however, show substantially different saccadic behavior, due to the involvement of different anatomical structures. The saccadic behavior that mainly characterizes SCA2 is low speed with fairly preserved accuracy, indicating a failure of the brainstem saccade generator more than of the cerebellum. LOCA shows well-preserved speed but a complete loss of saccadic error control. This pathophysiological substrate, however, does not distinguish their visual exploration during sequencing; indeed, both groups of patients similarly performed a multistep visual search, avoiding direct foveation of the target. This multistep spatial sampling strategy probably enables these patients to increase the discriminative potential of covert attention while avoiding unnecessarily large saccades that move the eyes over wrong targets. If this interpretation is correct, saccades are not only associated with an overt shift of attention but also increase the potential of covert attention during active search. Moreover, when the cerebellum has reduced capacities, the control of long saccades is difficult, due to propagation of error; [51] and [45] found that patients were affected by reduced saccade velocity in reflexive tasks (involuntary movement), and Rufa and Federighi [88] found that patients sometimes interrupted saccades.
Then, it is plausible that in voluntary movements (free visual search test) patients, affected by cerebellar disease, adapted visual search exploration to minimize the saccade control effort; they preferred sparse fixations and short saccades, but maintained overall saccade energy around a minimum point. The implemented model supported this hypothesis.

5. Conclusions

In our work, we implemented a stochastic model able to replicate human strategies during an ongoing sequencing test; the model results were compared with the performance of a group of healthy subjects and two groups of patients with well-documented motor control disease.

From the methodological point of view, we introduced the optimization method () to study the properties of selective attention. Todorov and later van Beers proposed the as a valuable instrument to link eye/hand motor control and the central nervous system to minimize the consequence of motor control noise. Najemnik and Geisler implemented a Bayesian framework to validate that human visual search is a sophisticated mechanism that maximizes the information collected across fixations. We “connected” these two theories to integrate the motor control and information-processing systems on a single reproducible model. Our conclusions are similar to the conclusion of Najemnik and Geisler: humans tend to tune visual search in order to maximize information collected during the search, but, in clinical context, patients try to minimize the effort to control eye movements.

Therefore, our opinion is that humans tend to apply an optimal selection to minimize a cost function (effort, energy), and we provided evidence for this theory by studying the influence of motor control through a model based on an optimization method and comparing its results with cerebellar patients. The proposed model, however, has some limitations: it is specific to the test and is not easy to generalize to other psychological tests, and the cost functions have to be defined according to the disease studied. In our future work, we aim to adapt the basic principle of and on the real image processing model.

Conflict of Interests

The authors certify that there is no conflict of interests with any financial organization regarding the material discussed in the paper.

Acknowledgments

Thanks are due to Professor Maria Teresa Dotti for the valuable contribution. The study in part was funded by EC FP7-PEOPLE-IRSES-MC-CERVISO 269263.