Abstract

When we explore a visual scene, our eyes make saccades to jump rapidly from one area to another and fixate regions of interest to extract useful information. While the role of fixational eye movements in vision has been widely studied, their random nature has hitherto been neglected. Here we conducted two experiments to examine the Maxwellian nature of eye movements during fixation. In Experiment 1, eight participants were asked to perform free viewing of natural scenes displayed on a computer screen while their eye movements were recorded. For each participant, the probability density function (PDF) of eye movement amplitude during fixation obeyed the law established by Maxwell for describing molecule velocity in gas. Only the mean amplitude of eye movements varied with expertise: it was lower in experts than in novices. In Experiment 2, two participants underwent fixed-time free viewing of natural scenes and of their scrambled versions while their eye movements were recorded. Again, the PDF of eye movement amplitude during fixation obeyed Maxwell’s law for each participant and for each scene condition (normal or scrambled). The results suggest that eye fixation during natural scene perception describes a random motion regardless of top-down or bottom-up processes.

1. Introduction

In the visual exploration of a 2D scene like a photograph or a painting, the eyes fixate regions of interest to extract useful information and make large saccades to jump from one region to another. During fixation, the eyes continue to move through tremor, drifts, and microsaccades. Little is known about tremor [1–4] and drifts [5–7] due to limitations in recording systems. In contrast, microsaccades have been widely studied and their critical role in vision is being progressively uncovered (for reviews see [8, 9]).

Though until the late 1970s [10] microsaccades were generally thought to be uncritical for vision, several studies did show microsaccades to correct random intersaccadic drifts [11], counteract retinal fatigue [12, 13], prevent visual fading [14–16], enable low-contrast discrimination [17], enhance stereoscopic hyperacuity [18], or enhance fine spatial detail [19], while tremor and drifts were still believed to be uncritical for vision [20]. In the last fifteen years, neurophysiological studies on microsaccade-induced neural activity have corroborated the idea that microsaccades refresh retinal images to prevent fading, and have reopened the debate on the role of all fixational eye movements in vision, including tremor and drifts [8, 21, 22].

During the free viewing of a scene, visually-guided saccades depend on bottom-up processes induced by stimulus properties and on top-down influences inherent in knowledge and expectations. Though involuntary and non-conscious, fixational eye movements can also be influenced by cognitive demands [23–27]. So far, except for a few reports suggesting that fixational eye movements may [5–7, 11, 28–31] or may not be random [13, 32–35], the formalization of the random nature of eye fixation has been almost entirely neglected. To our knowledge, only Engbert and Kliegl [30] examined this issue by showing different eye fixation behaviors depending on different time scales. In the present study, our aim was to test the hypothesis of a random generation of eye fixation. Specifically, we examined whether fixational eye movements (tremor, drifts, and microsaccades taken as a whole) showed a random motion during spontaneous visual perception as do molecules in a gas [36] or particles in a fluid [37].

To achieve this goal, we conducted two experiments, which are presented and discussed in turn. For both experiments, we built the probability density function (PDF) of fixational eye movement amplitude, that is, the distances covered by the eyes during fixation, and compared the experimental distribution to a theoretical PDF based on the Maxwell law [36, 38, 39]. Both experiments involved a natural scene perception task to induce spontaneous and active vision as much as possible. In addition, Experiment 1 tested the influence of expertise (top-down processes) on eye fixation randomness by contrasting novice versus expert participants in perceiving natural scenes. Little is known about the effect of expertise in scene perception and the behavior in fixational eye movements, while it is established that high-attentional demand tends to suppress microsaccades [10, 23, 24, 40]. On the other hand, Experiment 2 explored the influence of stimulus properties (bottom-up processes) on eye fixation randomness by contrasting meaningful (original) versus meaningless (scrambled) scenes. Should eye fixation prove to be random in natural scene perception, the experimental PDF was expected to fit the Maxwellian PDF regardless of internal (Exp. 1) or external (Exp. 2) contingencies.

2. Experiment 1

2.1. Materials and Methods
2.1.1. Participants

Eight healthy adults, 5 women and 3 men aged years (range = 22.0–44.4 yrs), took part in the study. All participants had normal vision and were unaware of the goal of the experiment.

Three participants (1, 3, and 4) were novice to natural scene perception, whereas 4 participants (2, 5, 7 and 8) were experts: participants 2 and 5 were landscape architects; participant 7 was a designer, and participant 8 held a Ph.D. in ecology. Neither a novice nor an expert subject, participant 6 was a postgraduate student in landscape architecture.

2.1.2. Ethics Statement

The study adhered to the tenets of the Declaration of Helsinki. Our study was approved by the University of Angers Ethics Committee. All participants gave their verbal consent to participate in the study; this consent procedure was approved by the local ethics committee (University of Angers, France).

2.1.3. Stimuli

Sixteen black-and-white photographs (2000*1598 pixels, 256 gray-scales) of natural scenes were taken by VB mostly in the French region of the Pays de la Loire (see Figure 1(a)). The luminance of the scenes was on a 256-gray-level scale.

Stimuli were displayed full screen on an NEC monitor (Japan; 21 inch, 1280*1024-pixel definition, 60 Hz refresh rate) located at 1 m from participants’ eyes, such that stimuli covered, respectively, 22.3° and 17.0° of visual angle horizontally and vertically.

2.1.4. Apparatus

We used a faceLAB video device (Seeing Machines, Australia). A computer controlled a stereo-head, which was mounted on a tripod below the stimulation screen and held two cameras, one for each eye. Eye tracking was performed binocularly using faceLAB 4.2.2 software (Seeing Machines, Australia) in Precision Gaze configuration for optimal gaze quality level. The sample frequency was 62.5 Hz. Typical static accuracy of gaze direction was 0.5–1°. Another computer controlled the stimulation through an external screen using Gaze Tracker 05.02.03 software (Seeing Machines, Australia).

2.1.5. Procedure

The experiment was conducted in a dark room; the luminance of the background was 0 cd m−2. The center of the screen was adjusted at eye level and in the median plane. Calibration was performed through six steps: pupil calibration, face point reference, tracking parameters, ocular parameters, gaze calibration, and screen calibration. Gaze calibration was done using a target randomly moving on a 9-point grid that participants had to fixate as accurately as possible.

The experiment started immediately after calibration. There were 16 trials corresponding to the 16 stimuli. The order of presentation of stimuli was random. The time course of each trial is illustrated in Figure 1(b) and was as follows. First, a fixation cross appeared for 3 s in the upper right corner of a grey screen. Then, the stimulus was displayed for as long as the participant took for free visual exploration, during which eye movements were recorded. The participant clicked on the mouse when they had finished their exploration, which caused the scene to disappear. After a pause, they clicked again for the next trial to start. Participants were instructed to look at the scene as spontaneously as possible.

2.1.6. Data Analysis

The 16 scenes were pooled together within each participant as we were not interested in between-scene differences. This resulted in 8 series of data (corresponding to 8 participants) which we analyzed using Gaze Tracker 05.02.03 software (Seeing Machines, Australia). Only periods of fixation were kept for further analysis. Eye fixation was determined using a sliding window algorithm operating over the set of gaze points: for each window we set two parameters (100 ms for minimum fixation duration and 42.6 arcmin for maximum fixation amplitude) to decide whether or not fixation had occurred [8]. Figure 2(a) illustrates periods of fixation.
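The sliding-window criterion described above can be illustrated with a minimal sketch. It is not the Gaze Tracker implementation, whose internals are not described here: the function name `detect_fixations`, the use of the bounding-box diagonal as the measure of fixation amplitude, and the conversion of 100 ms into a whole number of samples are all assumptions made for illustration. Gaze coordinates are assumed to be expressed in arcmin.

```python
import math


def detect_fixations(xs, ys, sample_rate_hz=62.5,
                     min_duration_ms=100.0, max_amplitude_arcmin=42.6):
    """Return (start, end) index pairs (end exclusive) of candidate fixations.

    Hypothetical dispersion-based sketch: a window counts as a fixation when
    it is long enough and the diagonal of its bounding box stays below the
    maximum amplitude.
    """
    # Smallest whole number of samples whose count times the sampling
    # period reaches the minimum duration (7 samples at 62.5 Hz).
    min_samples = math.ceil(min_duration_ms * sample_rate_hz / 1000.0)

    def spread(lo, hi):
        wx, wy = xs[lo:hi], ys[lo:hi]
        return math.hypot(max(wx) - min(wx), max(wy) - min(wy))

    fixations = []
    start, n = 0, len(xs)
    while start + min_samples <= n:
        end = start + min_samples
        if spread(start, end) > max_amplitude_arcmin:
            start += 1  # window too dispersed: slide by one sample
            continue
        # Grow the window while the dispersion criterion still holds.
        while end < n and spread(start, end + 1) <= max_amplitude_arcmin:
            end += 1
        fixations.append((start, end))
        start = end
    return fixations
```

Samples that fall between two returned windows (for instance during a saccade) are simply discarded, which mirrors the statement that only periods of fixation were kept for further analysis.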

Using home-made scripts under Matlab 7.0 (The MathWorks, US), we then calculated the distances from the positions of the eyes in the $x$ and $y$ planes using the Pythagorean theorem, and built the experimental and theoretical distributions. The experimental distribution was the frequency histogram of distances (i.e., eye movement amplitude during fixation), and the theoretical distribution was given by Maxwell’s law as described in (25) of the following section.
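As an illustration of this step, the following sketch computes distances with the Pythagorean theorem and builds a frequency histogram. The original scripts were written in Matlab and are not reproduced here; the function names, the choice of measuring distances between successive gaze samples, and the `bin_width` parameter are assumptions.

```python
import math


def fixation_amplitudes(xs, ys):
    """Distances between successive gaze samples of one fixation period."""
    return [math.hypot(x1 - x0, y1 - y0)
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]


def amplitude_histogram(amplitudes, bin_width):
    """Proportion of amplitudes falling in each class of width bin_width."""
    if not amplitudes:
        return []
    counts = {}
    for a in amplitudes:
        k = int(a // bin_width)  # index of the class containing a
        counts[k] = counts.get(k, 0) + 1
    total = len(amplitudes)
    return [counts.get(k, 0) / total for k in range(max(counts) + 1)]
```

Distances would be computed within each fixation period separately and then pooled, so that the jump between two fixations is not counted as a fixational displacement.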

2.1.7. Maxwell’s Law

To test the hypothesis of a random distribution of eye movements during fixation, we first determined the probability density function (PDF) of eye movement amplitudes during fixation. We hypothesized an analogy between the statistical equilibrium of molecule velocity in gas and eye movement velocity during fixation. The distribution of molecule velocity in gas was formalized by Maxwell in 1859 in a statistical physics law that has since been extensively supported by experimental data [36, 38, 39]. Though Maxwell is especially known for his findings in electromagnetism, he also made a significant contribution to statistical thermodynamics, which is less known but on which we focus here. The location and speed of molecules change when they collide but the distribution law remains identical, thus characterizing the gas at equilibrium: there is always the same number of molecules whose velocity ranges from $v$ to $v + dv$. Similarly, we posited that in 2D space the PDF of fixational eye movement amplitudes is always the same: there is always the same number of amplitudes between $r$ and $r + dr$ regardless of either the observer or the observed scene. The PDF of fixational eye movement amplitudes is also a PDF of velocities as amplitudes were covered in a constant time period.

We considered a symbolic space, which we called distance space. Here, our use of the term “distance” must be understood as referring to the amplitude of the movement of the eyes during fixation. In that distance space, we considered an elementary area $dx\,dy$ around a given point $(x, y)$. That area contained a certain number $dN$ of distance vectors that ended inside this area. If the total number of distance vectors was $N$, we could define a function $f(x, y)$,

$$f(x, y) = \frac{1}{N}\,\frac{dN}{dx\,dy}, \tag{1}$$

and if we imposed $dx$ and $dy$ to tend to zero, the function $f$ represented the density of points in the sense of the geometrical probability, which corresponded to the distance in the symbolic space. If we could find the function $f$, the distance distribution could be determined.

The previous equation was rewritten as

$$dN = N f(x, y)\,dx\,dy. \tag{2}$$

Since the number of distance vectors was constant, the normalization condition gave:

$$\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = 1. \tag{3}$$

The double integral was calculated between $-\infty$ and $+\infty$ in order to include all possible values of both $x$ and $y$. For example, the number of distance vectors whose $x$ component was between $x$ and $x + dx$ was $N\,dx \int_{-\infty}^{+\infty} f(x, y)\,dy$.

Following Maxwell’s method to determine molecule velocity distribution, we put forward two hypotheses: (i) the two components $x$ and $y$ have two independent distribution laws; this assumption is true only if the distance vector set occurs at random; (ii) the distribution of distance vectors in the distance space is isotropic. These two assumptions meant that the function $f$ took the following form:

$$f(x, y) = g(x)\,g(y). \tag{4}$$

In other words it was the product of two identical functions, each of them depending on only one coordinate. Let us emphasize the fact that this was the consequence of the random choice of distance vectors. The second hypothesis was implicitly taken into account in (4) but it also implied that the point density around the origin 0 of the distance space had to respect a circular symmetry. Another way to express this property was to say that when one moves on the circle of equation $x^2 + y^2 = R^2$, where $R$ was any constant, the function $f$ had to remain constant:

$$f(x, y) = C, \tag{5}$$

where $C$ was any constant.

These two hypotheses were sufficient to find the distance distribution function.

Equation (5) above, which expressed the isotropy condition, was rewritten as follows: $df = 0$ along the circle, that is, whenever $x\,dx + y\,dy = 0$. This led to

$$\frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = 0. \tag{6}$$

If we noticed after (4) that $\partial f/\partial x = g'(x)\,g(y)$ and $\partial f/\partial y = g(x)\,g'(y)$, we could rewrite the relation (6), after division by $f$, as follows:

$$\frac{g'(x)}{g(x)}\,dx + \frac{g'(y)}{g(y)}\,dy = 0. \tag{7}$$

Let us recall that in (7) the differentials $dx$ and $dy$ were not independent since they were linked by the circle constraint $x\,dx + y\,dy = 0$.

Applying Lagrange’s multiplier method to this constraint and (7) we obtain

$$\left(\frac{g'(x)}{g(x)} + \lambda x\right) dx + \left(\frac{g'(y)}{g(y)} + \lambda y\right) dy = 0, \tag{8}$$

where the Lagrange multiplier $\lambda$ was an arbitrary constant. This equation could be satisfied only if the two coefficients of the differentials $dx$ and $dy$ were simultaneously equal to zero or, written in another way:

$$\lambda = -\frac{1}{x}\,\frac{g'(x)}{g(x)}, \tag{9}$$

$$\lambda = -\frac{1}{y}\,\frac{g'(y)}{g(y)}. \tag{10}$$

After (9), the Lagrange multiplier $\lambda$ depended only on $x$. Likewise, after (10) $\lambda$ depended only on $y$. This implied that $\lambda$ was constant and that the function $g$ took the form

$$\frac{g'(x)}{g(x)} = -\lambda x \tag{11}$$

or, after integration with $A$ an integration constant,

$$g(x) = A\,e^{-\lambda x^2/2}. \tag{12}$$

Since the constant $\lambda$ had to be positive (otherwise $g$ could not be normalized), we put $\lambda = 2\alpha$ with $\alpha > 0$.

So (12) became

$$g(x) = A\,e^{-\alpha x^2}. \tag{13}$$

As a consequence the function $f$ in (4) in its turn became

$$f(x, y) = A^2\,e^{-\alpha (x^2 + y^2)}. \tag{14}$$

It depended only on $x^2 + y^2$, which satisfied the isotropy condition.

The number of distance vectors that ended in the elementary area $dx\,dy$ was

$$dN = N A^2\,e^{-\alpha (x^2 + y^2)}\,dx\,dy, \tag{15}$$

and the proportion of distance vectors between $r$ and $r + dr$ was obtained by considering the area bounded by two infinitely close circles whose radii were $r$ and $r + dr$: $dN = N A^2\,e^{-\alpha r^2}\,2\pi r\,dr$. In the latter equation, $2\pi r\,dr$ was the area taken into account. This led to

$$\frac{dN}{N} = 2\pi A^2\,r\,e^{-\alpha r^2}\,dr. \tag{16}$$

To determine the two constants $A$ and $\alpha$, we used two additional conditions.

The first one said that the total number of distance vectors was $N$:

$$\int_{r=0}^{r=+\infty} dN = N. \tag{17}$$

The first additional condition could be written

$$2\pi A^2 \int_0^{+\infty} r\,e^{-\alpha r^2}\,dr = 1. \tag{18}$$

That was to say that

$$\int_0^{+\infty} r\,e^{-\alpha r^2}\,dr = \frac{1}{2\pi A^2}. \tag{19}$$

On the other hand, we could say that the cumulative distance was $N\bar{r}$, where $\bar{r}$ was the mean distance, which could be easily obtained from experimental data.

The second one reduced to a simple integral:

$$\int_{r=0}^{r=+\infty} r\,dN = N\bar{r} \tag{20}$$

or, if we compared to (16):

$$2\pi A^2 \int_0^{+\infty} r^2\,e^{-\alpha r^2}\,dr = \bar{r}. \tag{21}$$

Since the general integral form $I_n = \int_0^{+\infty} r^n e^{-\alpha r^2}\,dr$ gives, respectively, for $n = 1$ and $n = 2$:

$$I_1 = \frac{1}{2\alpha}, \qquad I_2 = \frac{1}{4}\sqrt{\frac{\pi}{\alpha^3}}, \tag{22}$$

and by remarking that in (18) and (21) the integrals were calculated from $0$ to $+\infty$, (18) and (21) gave, respectively,

$$A^2 = \frac{\alpha}{\pi}, \tag{23}$$

$$\bar{r} = \frac{1}{2}\sqrt{\frac{\pi}{\alpha}}, \tag{24}$$

so that

$$\frac{dN}{N} = 2\alpha\,r\,e^{-\alpha r^2}\,dr, \tag{25}$$

where $\alpha = \pi / (4\bar{r}^2)$.
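The law referred to as (25) can be checked numerically. The sketch below assumes the closed form $p(r) = 2\alpha r e^{-\alpha r^2}$ with $\alpha = \pi/(4\bar{r}^2)$, which is what the two conditions of the derivation (unit total probability and mean distance $\bar{r}$) yield in 2D; the function names and the midpoint integration rule are illustrative choices, not part of the original analysis.

```python
import math


def maxwell_2d_pdf(r, mean_r):
    """Two-dimensional Maxwell density with mean distance mean_r."""
    alpha = math.pi / (4.0 * mean_r ** 2)
    return 2.0 * alpha * r * math.exp(-alpha * r * r)


def maxwell_2d_cdf(r, mean_r):
    """Proportion of distances smaller than r under the same law."""
    alpha = math.pi / (4.0 * mean_r ** 2)
    return 1.0 - math.exp(-alpha * r * r)


def integrate(func, upper, steps=100000):
    """Midpoint-rule integral of func over [0, upper]."""
    h = upper / steps
    return sum(func((k + 0.5) * h) for k in range(steps)) * h


def check_conditions(mean_r):
    """Return (total probability, mean distance) computed numerically."""
    upper = 12.0 * mean_r  # the density is negligible beyond this point
    total = integrate(lambda r: maxwell_2d_pdf(r, mean_r), upper)
    mean = integrate(lambda r: r * maxwell_2d_pdf(r, mean_r), upper)
    return total, mean
```

Calling `check_conditions(5.291)` with the group mean of Exp. 1 should return values very close to 1 and to 5.291, confirming that a single measured quantity, the mean distance, fixes the whole theoretical curve.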

2.1.8. Statistical Analysis

Under Statistica 7.0 (StatSoft, US), experimental and theoretical distributions of eye-movement amplitude during fixation were compared using the Kolmogorov-Smirnov (KS) two-sample test for every participant. We retained the lowest 95% of values because we observed a systematic gap between experimental and theoretical data for the highest values; that is, we did not take into account the extreme tail of the distribution. Indeed, 95% of the data in Exp. 1 spanned from 0 to 20 arcmin (see Figure 3), while the remaining 5% covered a wide range from 20 to 45 arcmin. As a result, classes above 20 arcmin had a tiny size compared to the overall distribution. For that reason, we performed statistics ignoring the weakest classes, as suggested by Borel et al. [41].
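The idea of comparing distributions while ignoring the extreme tail can be sketched as follows. This is not the Statistica two-sample procedure: for simplicity it computes a one-sample KS distance between the empirical cumulative distribution and the analytic 2D Maxwell cumulative distribution $1 - e^{-\alpha r^2}$, with $\alpha = \pi/(4\bar{r}^2)$, over the lowest 95% of values only. The function name and the `keep_fraction` parameter are assumptions.

```python
import math


def ks_statistic_95(amplitudes, keep_fraction=0.95):
    """Largest gap between empirical and Maxwell CDFs, tail excluded."""
    data = sorted(amplitudes)
    n = len(data)
    mean_r = sum(data) / n  # mean distance estimated from all samples
    alpha = math.pi / (4.0 * mean_r ** 2)
    n_kept = int(keep_fraction * n)  # highest values are not examined
    d_max = 0.0
    for i in range(n_kept):
        theo = 1.0 - math.exp(-alpha * data[i] ** 2)
        # The empirical CDF jumps from i/n to (i+1)/n at data[i].
        d_max = max(d_max, abs((i + 1) / n - theo), abs(i / n - theo))
    return d_max
```

A small distance indicates that the bulk of the distribution follows the theoretical curve; turning it into a p value would require the appropriate KS tables, which this sketch does not attempt.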

To test the goodness of fit between experimental and theoretical distributions of eye movement amplitude during fixation, we took Nash and Sutcliffe’s criterion from hydrology [42]. The law given by (25) in Maxwell’s law section has a continuous graph, whereas the graph obtained from experimental data is necessarily discontinuous. Thus, to compare the PDF of experimental data with that of simulated data, we had to choose the interval width of the $x$-axis in our distribution plots (see Figure 3). To achieve this goal, we used the Nash and Sutcliffe criterion, or $E$ value, which compares experimental and theoretical values according to the following formula:

$$E = 1 - \frac{\sum_i \left(p_i^{\mathrm{exp}} - p_i^{\mathrm{th}}\right)^2}{\sum_i \left(p_i^{\mathrm{exp}} - \bar{p}^{\mathrm{exp}}\right)^2},$$

in which $p_i^{\mathrm{exp}}$ and $p_i^{\mathrm{th}}$ are, respectively, experimental and theoretical values of the proportion of vector distances, and $\bar{p}^{\mathrm{exp}}$ is the mean of experimental values. The closer $E$ is to 1, the better the data fit the Maxwell law. To determine the best fit, we calculated $E$ values using different interval widths of eye-movement amplitude during fixation for each participant and for the group of participants.
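The Nash and Sutcliffe criterion is short enough to be written out. The sketch below uses the standard hydrological definition, one minus the ratio of the squared residuals to the squared deviations of the observations from their mean; the function and argument names are illustrative.

```python
def nash_sutcliffe(observed, simulated):
    """Nash and Sutcliffe criterion; equals 1 for a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread
```

Here `observed` would hold the experimental proportion in each amplitude class and `simulated` the proportion predicted by Maxwell’s law for the same class. A value of 0 means the theoretical curve does no better than the mean of the observations, and negative values mean it does worse.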

2.2. Results
2.2.1. Descriptive Statistics

Table 1 summarizes the viewing time per scene (in seconds), the number of samples recorded during eye fixation, the mean distance, and the SD of eye-movement amplitude (in arcmin) for each participant and for the group, and the level of expertise for each participant.

2.2.2. Nash and Sutcliffe’s Criterion

Table 2 shows the results for the Nash and Sutcliffe criterion using different from to , in which is the mean amplitude of eye movements during fixation. Experimental data were better simulated using a equal to for participants 1–4, using a equal to for participant 5, and using a equal to for participants 6–8.

For the group of participants, it was which provided the highest mean value. Thus, this value was selected for representing the experimental data against the simulated data in Exp. 1 (see Figure 3) and Exp. 2 (see Figure 4). This choice does not imply any basic feature of the data acquisition scheme: it emerges from the necessity to have both a high number of points and for a given point a high number of data.

2.2.3. Experimental versus Simulated Data

Figure 3 shows the PDF of experimental data and of theoretical data using Maxwell’s law for each participant. There was no difference between experimental and theoretical distributions for each participant (KS, ).

We identified two groups with contrasted mean amplitude of eye movements, which was arcmin for participants 1, 3, and 4 versus arcmin for participants 2, 5, 7, and 8 (Mann-Whitney, ).

2.3. Discussion

The results of Exp. 1 suggest that eye fixation during natural scene perception obeys Maxwell’s law, as the PDF of experimental data did not significantly differ from the theoretical distribution. This was true regardless of participants’ expertise. The only parameter that varied was the mean amplitude of eye movements, which was higher in novice participants than in expert ones.

Nevertheless, one could argue that the reported PDF of eye movement amplitude during fixation was simply a description of the noise characteristics of our eye tracker gaze estimation system. It is indeed true that the faceLAB video device is not as optimal as other systems to study eye fixation. Furthermore, Exp. 1 only used natural scenes and did not offer a comparison to controlled stimuli.

To overcome these limitations, we undertook a second experiment in which we used a high-sample-frequency video eye tracker (1000 Hz) and added a control condition in which original scenes were filtered to produce a scrambled version of the images. Visual exploration was still free for facilitating active vision but with a fixed time equivalent to the average spontaneous viewing time of Exp. 1.

3. Experiment 2

3.1. Materials and Methods
3.1.1. Participants

Two healthy men aged years (range = 37.8–44.0 yrs) participated in the study. They had corrected-to-normal vision. Both participants were novice in natural scene perception. Participants 1 and 2 were, respectively, unaware and aware of the goal of the experiment. Ethics statement was as in Exp. 1.

3.1.2. Stimuli

Stimuli were the 16 natural scenes of Exp. 1. As control stimuli, a scrambled version of each scene was built using the image scrambling algorithm based on chaos theory and sorting transformation by Liu et al. [43]. Examples of scrambled stimuli are illustrated in Figure 1(c). The luminance was not different from that of original stimuli ( on a 256 gray-level scale; Student’s $t$ test, ).

Stimuli were displayed full screen on a Sony Trinitron monitor (Japan; 21-inch, 1280*1024-pixel definition, 100-Hz refresh rate) located at 57 cm from participants’ eyes. Thus stimuli covered, respectively, 35.2° and 28.2° of visual angle horizontally and vertically.

3.1.3. Apparatus

We used an EyeLink 1000 high-speed camera device (SR Research, Canada). A host computer controlled the high-speed camera used in desktop mount configuration. Eye tracking was performed monocularly using the EyeLink 1000 Host Application (SR Research, Canada). The sample frequency was 1000 Hz. Typical accuracy with the head supported was 0.25–0.5° and typical spatial resolution was <0.01° RMS. A display computer controlled the stimulation using EyeLink 1000 Host Application (SR Research, Canada).

3.1.4. Procedure

The experiment was conducted in a dark room (luminance of the background was 0 cd m−2). The center of the screen was adjusted at eye level and in the median plane. Head movements were minimized with chin and forehead supports. Gaze calibration was performed using a target randomly moving on a 9-point grid that participants had to fixate as accurately as possible.

After calibration, the experiment made up of 32 trials (16 scenes and their 16 scrambled versions) started. The order of presentation of stimuli was random. The time course of each trial was as follows: a fixation cross appeared for 3 s in the upper left corner of a grey screen, then the stimulus was displayed for 9 s (corresponding to the round average of Exp. 1), then the experiment continued with the next trial. Participants were instructed to look at the scene as spontaneously as possible.

3.1.5. Data Analysis

The 16 scenes were pooled together within each participant and each condition (original versus scrambled). Thus 4 series of data (2 participants times 2 conditions) were analyzed using EyeLink 1000 Host Application (SR Research, Canada). From the EyeLink Data Files provided by the software, we extracted periods of eye fixation during the 9 s visual exploration of scenes using home-made scripts under Matlab 7.0 (The MathWorks, US). Figure 2(b) illustrates periods of fixation.

Using home-made scripts under Matlab 7.0 (The MathWorks, US), we then calculated the distances from the positions of the eyes in the $x$ and $y$ planes using the Pythagorean theorem, and built the experimental and theoretical distributions. The experimental distribution was the frequency histogram of distances (i.e., eye-movement amplitude during fixation). The theoretical distribution was given by Maxwell’s law as described in (25) of the Maxwell’s law section.

3.1.6. Statistical Analysis

Under Statistica 7.0 (StatSoft, US), the experimental and theoretical distributions of eye-movement amplitude during fixation were compared using the Kolmogorov-Smirnov (KS) two-sample test for every participant, using 95% of values.

3.2. Results
3.2.1. Descriptive Statistics

Table 3 presents the number of samples recorded during eye fixation, the mean distance, and the SD of distances (in arcmin) for each participant and for the group, separately for normal or scrambled scenes.

3.2.2. Experimental versus Simulated Data

Figure 4 shows the PDF of experimental data and of theoretical data using Maxwell’s law for each participant, separately for normal (P1n and P2n) and scrambled (P1s and P2s) scenes. There was no difference between experimental and theoretical distributions for each participant and for each scene condition (Kolmogorov-Smirnov, ).

3.3. Discussion

The results of Exp. 2 using a high-sample-frequency video eye tracker suggest that eye fixation during natural scene perception obeys Maxwell’s law, as the PDF of experimental data did not significantly differ from the theoretical distribution. This was true regardless of stimulus conditions, which were meaningful (original scenes) or meaningless (scrambled version of original scenes).

4. General Discussion

The pioneering research by Cornsweet [11] suggested that microsaccades are stochastically corrective of deviations arising from fixational drifts. Following Cornsweet, further studies investigated the randomness of drifts [5–7], whereas other studies suggested control mechanisms within the drifts [13, 32–35]. The present study sought to examine the Maxwellian nature of eye fixation by applying statistical physics to the psychophysics of eye movements. Using a video eye tracker during free (Exp. 1) or fixed-time (Exp. 2) visual exploration of natural scenes, we showed that the amplitude of eye movements during fixation obeys Maxwell’s law, suggesting that fixational eye movements describe a motion similar to that of molecules in a gas [36] or of particles in organic and inorganic bodies [37].

Since in Exp. 1 the sample frequency of our video eye tracker was low (62.5 Hz), Exp. 2 used a higher sample frequency (1000 Hz), which provided a similar pattern of results. In both cases, we did not differentiate between the different subtypes of fixational eye movements, and the Maxwellian nature of eye fixation we report here concerns fixational eye movements as a whole. The fact that our experimental data obeyed Maxwell’s law in both experiments corroborates the proposal that for Brownian motion the result is independent of the measurement frequency, as originally suggested by Perrin [44, 45]. We have presented Maxwell’s law in full as it is the first time, to our knowledge, that such a demonstration has been made in 2D space. We took the Nash and Sutcliffe criterion from hydrology [42] to optimize the goodness of fit between observational and simulated data. To our knowledge, it is also the first time that this criterion has been used in visual science.

We used natural scenes as stimuli enabling us to elicit spontaneous and active vision, that is, without any instruction except spontaneously looking at the scene. As we were only interested in absolute fixational activity, we did not seek the superimposition of eye movements with visual stimuli. Exp. 1 showed two contrasted groups based on the mean amplitude of eye movements. Interestingly, the group showing the highest mean amplitude was made of novice participants (1, 3, and 4), that is of individuals without any professional experience of visual exploration of natural scenes in 2D or 3D space. In contrast, the group exhibiting the lowest mean amplitude was made up of visual experts. Indeed, participants 2 and 5 were two landscape architects, participant 7 was a designer, and participant 8 held a PhD in ecology: all were confirmed practitioners of daily visual observation and analysis of natural or artificial scenes through either photographs of scenes or landscapes in the real 3D world. We therefore suggest that the mean amplitude of eye movements during fixation of natural scenes may be a tool for measuring visual expertise. One hypothesis is that expertise may have led to fixational eye movement suppression due to higher attentional control, in line with studies showing that microsaccades are suppressed in high-acuity or high-attentional demand tasks [10, 23, 24, 40]. Interestingly between the two groups of novice versus expert subjects, participant 6, who was a postgraduate student in landscape architecture, exhibited intermediate mean amplitude of fixational eye movements of 4.93 arcmin, suggesting such a measure may be sensitive.

Another point that needs discussion is that Exp. 1 led to a mean amplitude of eye movements during fixation equal to 5.291 arcmin for the group of eight participants, whereas Exp. 2 yielded a mean amplitude equal to 1.166 arcmin for the group of two participants. Thus, the ratio between Exp. 1 and Exp. 2 was 5.291/1.166 = 4.538. One could argue that we should have obtained a ratio of 16 (i.e., 1000/62.5), given the different sample frequencies of Exp. 2 (1000 Hz) and Exp. 1 (62.5 Hz). However, such an inference would be erroneous. In fact, this ratio of around 4 is an additional demonstration of the Brownian (i.e., random) nature of fixational eye movements during natural scene perception. Indeed, the mean of Brownian displacements in a given time period is not proportional to the number of shocks (in our study, the number of displacements) but to the square root of the number of shocks, thus $\sqrt{16} = 4$ in our study [44, 46, 47]. That we obtained 4.538 and not exactly 4 may simply be explained by the fact that we manipulated not physical but physiological data, added to the fact that Exp. 2 only included two participants, thus lowering statistical power. In all, we suggest that the ratio of ~4 for the mean amplitude of eye movements between the two experiments is further evidence of the random nature of eye fixation during natural scene perception. Importantly, this hypothesis does not exclude that both stochastic and deterministic structures may be hidden below the Maxwellian macrostructure of fixational eye movements taken as a whole, which would warrant further investigation using finer dynamic models (e.g., [30]).
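The square-root argument can be illustrated with a small simulation. The sketch assumes an idealized 2D random walk whose elementary steps are independent Gaussian displacements of unit standard deviation along each axis; this is a textbook model of Brownian motion, not a model fitted to the eye-movement data, and all names are illustrative.

```python
import math
import random


def mean_amplitude(n_steps, n_samples, rng):
    """Mean length of the net displacement after n_steps random steps."""
    total = 0.0
    for _ in range(n_samples):
        dx = sum(rng.gauss(0.0, 1.0) for _ in range(n_steps))
        dy = sum(rng.gauss(0.0, 1.0) for _ in range(n_steps))
        total += math.hypot(dx, dy)
    return total / n_samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
fine = mean_amplitude(1, 20000, rng)      # one displacement per interval
coarse = mean_amplitude(16, 20000, rng)   # 16 displacements per interval
simulated_ratio = coarse / fine           # expected to be close to sqrt(16) = 4
observed_ratio = 5.291 / 1.166            # ratio of the two group means
```

Under these assumptions the simulated ratio comes out close to 4 rather than 16, which is the scaling invoked above when comparing a 62.5 Hz recording (one sample every 16 ms) with a 1000 Hz recording (one sample every 1 ms).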

Why would the central nervous system randomize fixational eye movements? Recently, Engbert and Kliegl [30] applied statistical physics to microsaccades during fixation using an approach developed for analyzing human posture control [48]. In a re-examination of Cornsweet’s hypothesis [11], the authors showed that during eye fixation of a stationary spot, microsaccades both produce fixation errors to enhance perception on a short time scale and reduce fixation errors and binocular disparity on a long time scale, which would weigh against a random uncontrolled movement [30]. Differences in results with the present study may be due to the different nature of the task, or more likely to the different analyses, and thus need to be further investigated. A final issue concerns the neural underpinnings of such a random generator for fixational eye movements. It would be interesting to verify whether some neurones exhibit the Maxwell law in their firing code. Potential candidate neurones may be those of the foveal portion of the superior colliculus, which have recently been shown to play a causal role in microsaccade generation in primates [49].

To conclude, our results show that fixational eye movements during natural scene perception obey Maxwell’s law and support the idea that eye movements during fixation describe a motion similar to that of molecules in gas [36, 38, 39] or particles in organic and inorganic bodies [37]. Such a Maxwellian nature of eye fixation is robust since it is independent of top-down processes such as participants’ expertise (Exp. 1) or of bottom-up processes such as those inherent in physical and semantic properties of the stimulus (Exp. 2).

Acknowledgment

Results were presented at the 8th Forum of Neuroscience (Barcelona, Spain, 2012), and at the XX Biennial Meeting of the International Society for Eye Research (Berlin, Germany, 2012). The authors thank X. Liu (Dalian University, Liaoning, China) for providing the scrambling algorithm.