#### Abstract

The area of interest (AOI) reflects the degree of attention of a driver while driving. Dividing the visual field into AOIs is a visual characteristic analysis step required in both real vehicle tests and simulated driving scenarios, because some key eye tracking parameters and their transformations can only be obtained after the AOIs have been divided. In this study, 9 experienced and 7 novice drivers participated in real vehicle driving tests. They were asked to drive along a freeway section and a highway section while wearing the Dikablis eye tracking device. On average, 8132 fixation points were extracted for each driver. After coordinate conversion, the MSAP (Mean Shift Affinity Propagation) method is proposed to classify the distribution of fixation points into a circle type and a rectangle type. Experienced drivers’ fixation behavior falls into the circle type, in which fixation points are concentrated. Novice drivers’ fixation points, which are more dispersed, fall into the rectangle type. In the clustering algorithm, the damping coefficient determines the algorithm convergence, and the deviation parameter mainly affects the number of clusters, where larger values generate more clusters. This study provides not only the cluster type and cluster counts but also the borderlines of each cluster. The findings contribute to eye tracking research.

#### 1. Introduction

Drivers’ visual attention characteristics are among the most important issues in research on human factors affecting driving, and they play an important role in surrogate safety analysis, behavioral intention recognition, and early risk warnings. The development of eye tracking devices, such as Eyelink, Tobii, and Dikablis, has made capturing eye movement features possible and easier than ever before. The most commonly used parameters include, but are not limited to, saccade features, fixation features, pupil size, and PERCLOS. Such data can be collected by means of real vehicle tests or simulators, where participants are required to wear a pair of eye tracking glasses. Driving simulations are easier to organize, and large samples can be obtained. Data collected in vehicle tests, however, are real and exact, which is important and necessary for the validation of driving simulations.

In real vehicle tests, Henning et al. [1] used turn signals and rearview observations as indicators of lane change intentions. Bhise et al. [2] found that drivers rotate their heads before searching visually in real vehicle driving scenarios. Lethaus and Rataj [3] conducted both real and simulated driving tests. After collecting data on drivers’ visual characteristics in real vehicle tests, they proposed driving behavioral predictors using vision-related parameters and concluded that eye movements precede other behavioral features. In their simulation study, they compared the accuracies of different driving behavior prediction algorithms [4]. In another simulation, Salvucci and Liu [5] found that drivers pay different degrees of attention to rearview mirrors during lane keeping and lane changing, which is consistent with the results of Henning et al. [1].

Psychophysiological research using eye gaze data has been a popular method for measuring drivers’ attention allocation [6], situation awareness [7, 8], and hazard perception [9, 10]. The gaze dispersion patterns provide some understanding of what information drivers use to keep themselves engaged in the driving task and what information they use to keep driving. In Tyron Louw’s study, drivers’ horizontal and vertical gaze dispersion during conventional and automated driving were compared. The study concluded that, during automation, drivers’ horizontal gaze was generally more dispersed than that observed during manual driving, and it was more concentrated when drivers performed a visual secondary task. Drivers’ vertical gaze was most dispersed when the road scene and dashboard were completely occluded during automation [11]. Eye movement features have been an important measurement for driving behavior research. For example, Navarro et al. analyzed the variance of time spent looking in the tangent point area with the assistance factor of automated steering [12]. Tivesten and Dozza investigated the effect of both driving context and visual-manual phone tasks on drivers’ glance behavior in naturalistic driving and concluded that drivers indeed spend more time looking at the road and have a lower proportion of long off-road glances [13]. However, while gaze concentration has been used successfully in manual driving to distinguish between the effects of visual and cognitive load, and it is commonly accepted that drivers have different degrees of interest in different targets or areas while driving, few criteria for area of interest (AOI) division have been established in the literature. Dividing the visual area into several subareas according to drivers’ interests helps researchers to better understand drivers’ eye behavior. The division of AOIs is a visual characteristic analysis step required in both real vehicle tests and simulated driving scenarios.

Based on the AOI division we can obtain statistical eye movement parameters, including the fixation duration, fixation counts, saccade counts, time to first glance, and first glance duration. These basic parameters and their transformations, such as the total glance time, AOI attention ratio, glance rate, glance location probability, percentage of eyelid closure, and saccade trajectory, are key explanatory variables used to interpret drivers’ behavior. If we treat the visual area as a whole, without dividing it into subareas according to the study purposes, most of the basic visual parameters, and especially their transformations, are difficult to obtain. Some researchers provide AOI division results, including AOI counts and AOI borders, without stating how they were derived. The NHTSA [14] divided drivers’ visual area into eight subareas, namely, the front left, front, front right, left window, right window, rearview mirror, left mirror, and right mirror, to analyze the time window of lane changes. To compare the differences in visual characteristics between left lane change and going straight, Olsen et al. [15] divided the visual area into nine regions, namely, the front, rearview mirror, left window, right window, left mirror, right mirror, left blind spot, right blind spot, control panel, other interior, and indeterminate, and the results showed that before a left lane change drivers pay attention to left AOIs twice as often as to front AOIs. Although these studies reached intuitively sound conclusions, it is still not clear how the AOIs were divided.

Using the front left, left window, and left mirror as the indicators of left lane change intention is widely accepted. The problem is how to determine the borderline for each AOI. For example, if the area of the rearview mirror is too large, fixation points that in fact belong to the windshield view may be mistakenly assumed as rearview fixation points. If the area is smaller than it should be, some rearview fixation points may be processed incorrectly. A vague borderline between different AOIs may lead to wrong conclusions. Therefore, determining the number of AOIs and their borderlines is of great significance.

Based on fixation data from real vehicle driving tests, this paper proposes a clustering method to divide the visual area into a certain number of subareas and determine the borderline of each AOI.

#### 2. Methodology

Cluster analysis (clustering) is the process of grouping data into classes or clusters based on similarity. Objects similar to one another are grouped within the same cluster, while different objects are placed in different clusters. That is to say, clusters formed this way should be highly homogeneous internally and highly heterogeneous externally. Cluster analysis methods are mostly used in the exploratory phase of a study, when there are no prior hypotheses. In a sense, cluster analysis finds the most significant solutions. As an important tool of data mining, cluster analysis is widely applied in statistics, image processing, information retrieval, and machine learning.

Commonly used cluster analysis algorithms include $k$-means clustering [16], hierarchical clustering [17], fuzzy clustering [18], spectral clustering [19], and density-based clustering [20]. Due to the complexity of the gaze distribution and the large amount of data, any one clustering method can only solve a particular type of problem. An improved Affinity Propagation (AP) algorithm, the Mean Shift Affinity Propagation (MSAP) algorithm, is used in this paper, and the result is compared to $k$-means clustering. The paper then gives suggestions for the division of critical AOIs.

##### 2.1. Division Algorithm for Area of Visual Interest

###### 2.1.1. Affinity Propagation (AP) Algorithm

For the fixation area clustering problem, the output of the division algorithm is the area where fixation points concentrate. This concept is consistent with the AP (Affinity Propagation) cluster algorithm. The AP cluster method, proposed by Frey and Dueck [21], obtains high quality clusters by transmitting information among nearby data points. It classifies data points on the basis of similarity among data points. The AP method assumes all fixation points as potential cluster centers. In other words, each point can be treated as an initial representative point. This approach eliminates the need for selection of initial points and also leads to more stable and higher quality clustering results. The AP method is described below.

$s(i,k)$ denotes the similarity between fixation points $x_i$ and $x_k$, which is determined by the negative Euclidean distance between $x_i$ and $x_k$, that is, $s(i,k) = -\lVert x_i - x_k \rVert$. The closer the distance between the two points, the greater the similarity between them.

By traversing all $N$ fixation points, the similarity matrix is obtained:

$$S = \left[ s(i,k) \right]_{N \times N},$$

where the diagonal element $s(k,k)$ in $S$ is the deviation parameter $p$. The larger the value of $s(k,k)$ is, the more likely $x_k$ is a cluster center.
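As a minimal sketch of this construction, the snippet below builds the similarity matrix for a few made-up fixation coordinates. The function name `similarity_matrix`, the sample points, and the use of the median off-diagonal similarity as the default deviation parameter are illustrative assumptions, not details given in the text.

```python
import math
from statistics import median

def similarity_matrix(points, preference=None):
    """Return S with s(i, k) = -distance(x_i, x_k) and s(k, k) = preference."""
    n = len(points)
    s = [[-math.dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    if preference is None:
        # Assumption: default to the median of the off-diagonal similarities.
        off_diagonal = [s[i][k] for i in range(n) for k in range(n) if i != k]
        preference = median(off_diagonal)
    for k in range(n):
        s[k][k] = preference  # deviation parameter on the diagonal
    return s

# Hypothetical fixation coordinates, for illustration only.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
S = similarity_matrix(points)
```

Because every off-diagonal similarity is negative, the deviation parameter placed on the diagonal is negative as well; raising it toward zero makes every point a more attractive candidate center.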

Two information parameter matrices $R = [r(i,k)]$ and $A = [a(i,k)]$ are based on $S$, where $R$ is the attractiveness matrix and $A$ is the matrix of membership degrees. Attractiveness is defined as the likelihood that $x_k$ is the cluster center of fixation point $x_i$. The element $r(i,k)$ is the point-to-point information sent from fixation point $x_i$ to potential cluster center $x_k$, representing the attractiveness of $x_k$. The membership degree is defined as the likelihood that $x_i$ chooses $x_k$ as its cluster center. The element $a(i,k)$ is the information sent from $x_k$ to $x_i$, representing the grade of membership for fixation point $x_i$. The attractiveness and membership degrees drive the iterations of the clustering algorithm and thus form its core idea.

Given that $x_k$ is the cluster center of fixation point $x_i$, the assignment of each non-center point to its cluster center represents the clustering results.

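In the information iteration process, the damping coefficient $\lambda$ plays an important role in the algorithm convergence. The larger the value of $\lambda$, the more stable the convergence. In each iteration step, the updated results of $R$ and $A$ are the weighted sums of the last and current iteration outcomes, where the corresponding weights are $\lambda$ and $1 - \lambda$, respectively. Given the number $t$ of the current iteration, the weighted iteration equations are as follows:

$$R_t = (1 - \lambda)\,\tilde{R}_t + \lambda\,R_{t-1}, \tag{3}$$

$$A_t = (1 - \lambda)\,\tilde{A}_t + \lambda\,A_{t-1}, \tag{4}$$

where $\tilde{R}_t$ and $\tilde{A}_t$ are the undamped outcomes of the current iteration and $\lambda$ can suppress possible numerical oscillations of the algorithm.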
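The damped iteration can be sketched in plain Python as follows. The text above only specifies the damping step; the undamped message updates used here are the standard ones from Frey and Dueck [21] and are an assumption about the implementation, as are the function name `affinity_propagation`, its parameter names, and the one-dimensional toy data.

```python
from statistics import median

def affinity_propagation(s, damping=0.9, max_iter=1000, conv_iter=100):
    """Cluster with damped AP messages; s is an N x N similarity matrix (N >= 2)."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # attractiveness matrix R
    a = [[0.0] * n for _ in range(n)]  # membership degree matrix A
    labels, stable = None, 0
    for _ in range(max_iter):
        # Attractiveness: how well k suits i compared with the best rival center.
        for i in range(n):
            for k in range(n):
                rival = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = (1 - damping) * (s[i][k] - rival) + damping * r[i][k]
        # Membership degree: accumulated positive support for k as a center.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # support from all points other than k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, r[k][k] + support - pos[i])
                a[i][k] = (1 - damping) * new + damping * a[i][k]
        # Each point picks the center that maximizes a(i, k) + r(i, k).
        new_labels = [max(range(n), key=lambda k: a[i][k] + r[i][k])
                      for i in range(n)]
        stable = stable + 1 if new_labels == labels else 1
        labels = new_labels
        if stable >= conv_iter:  # assignments unchanged for conv_iter iterations
            break
    return labels

# Toy data (hypothetical): two well-separated groups on a line.
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(xs)
p = median(-abs(xs[i] - xs[k]) for i in range(n) for k in range(n) if i != k)
S = [[p if i == k else -abs(xs[i] - xs[k]) for k in range(n)] for i in range(n)]
labels = affinity_propagation(S)
```

Each entry of `labels` is the index of the cluster center chosen by the corresponding point. The two stopping rules mirror a maximum iteration count and a required number of successive iterations with unchanged assignments.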

###### 2.1.2. Mean Shift Affinity Propagation (MSAP) Algorithm

Through iterative message passing, the AP algorithm uses the information transfer between each pair of points to reach a stable cluster classification. Its computational cost grows at least quadratically with the number of points $N$, because the $N \times N$ similarity matrix must be stored and updated in every iteration. For drivers’ fixation points, the large data size makes the calculation of the similarity matrix more complicated and thus increases the time and space complexity of the clustering process. To accelerate the clustering of the fixation interest area, the scale of the similarity matrix, and with it the time and space complexity, needs to be reduced.
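A rough back-of-the-envelope estimate illustrates the scale of the problem. The sketch assumes three dense $N \times N$ matrices (similarity, attractiveness, and membership degree) stored as 8-byte floats; the helper name `dense_matrix_megabytes` and the figure of 200 areas after preprocessing are hypothetical.

```python
def dense_matrix_megabytes(n_points, bytes_per_value=8, n_matrices=3):
    """Memory in MB for n_matrices dense n_points x n_points float matrices."""
    return n_matrices * n_points * n_points * bytes_per_value / 1e6

# Average number of fixation points per driver reported in this study.
full = dense_matrix_megabytes(8132)
# Hypothetical number of areas left after a preprocessing step.
reduced = dense_matrix_megabytes(200)
```

Under these assumptions the raw data need roughly 1.6 GB for the three matrices, while a few hundred representative areas need less than 1 MB, which is the motivation for shrinking the input before running AP.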

Accordingly, we propose an improved AP method, the Mean Shift Affinity Propagation (MSAP). First, we use the mean shift method to preprocess input fixation points. The number of data sets is replaced by the number of areas, where each area can be treated as a whole. Then, instead of fixation point coordinates, mean coordinate values of fixation points in one area are considered as data points. Using the number of areas rather than the number of data sets, the AP algorithm conducts a redivision process to improve clustering performance.

The AP algorithm clusters fixation points based on similarity: $N$ fixation points constitute an $N \times N$ similarity matrix $S$. Before calculating $S$, we need to define the feature space. Considering the correlation between eye movements and sight lines, the scan angle of the sight line is introduced as the main feature of the feature space. After preprocessing the data using the mean shift method, any fixation point entered is expressed as a three-dimensional information vector $x = (x^s, x^r)$, where $x^s$ represents the two-dimensional coordinates and $x^r$ is the scan angle of the sight line of the entered fixation point.

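The core (kernel) function can be written as follows:

$$K_{h_s,h_r}(x) = \frac{C}{h_s^{d_s}\, h_r^{d_r}}\; k\!\left( \left\lVert \frac{x^s}{h_s} \right\rVert^2 \right) k\!\left( \left\lVert \frac{x^r}{h_r} \right\rVert^2 \right),$$

where $C$ is a constant, $k(\cdot)$ is the kernel profile, $d_s$ and $d_r$ denote the dimensions of the spatial and feature domains (in this research $d_s = 2$, $d_r = 1$), $h_s$ is the radius of the core function, representing the core size of the spatial domain, and $h_r$ is the radius of the feature space, representing the core size of the value domain.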
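The following sketch shows how such a joint kernel can drive a mean shift step on vectors of the form (x, y, scan angle). The Gaussian profile, the bandwidth values, the sample data, and the names `joint_kernel` and `mean_shift_mode` are all assumptions made for illustration; the normalizing constant is omitted because it cancels in the weighted mean.

```python
import math

def joint_kernel(u, v, h_s, h_r):
    """Unnormalized joint kernel: spatial part (x, y) times range part (angle)."""
    ds2 = ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) / h_s ** 2
    dr2 = (u[2] - v[2]) ** 2 / h_r ** 2
    # Assumption: Gaussian profile k(t) = exp(-t / 2) in both domains.
    return math.exp(-0.5 * ds2) * math.exp(-0.5 * dr2)

def mean_shift_mode(start, data, h_s, h_r, tol=1e-6, max_iter=200):
    """Move `start` to the kernel-weighted mean of `data` until it stops moving."""
    x = tuple(start)
    for _ in range(max_iter):
        w = [joint_kernel(x, p, h_s, h_r) for p in data]
        total = sum(w)
        new = tuple(sum(wi * p[d] for wi, p in zip(w, data)) / total
                    for d in range(3))
        if math.dist(new, x) < tol:
            return new
        x = new
    return x

# Hypothetical fixation vectors (x, y, scan angle): two separate areas.
data = [(100.0, 100.0, 5.0), (102.0, 98.0, 6.0), (98.0, 102.0, 4.0),
        (300.0, 120.0, -20.0), (302.0, 118.0, -21.0), (298.0, 122.0, -19.0)]
modes = [mean_shift_mode(p, data, h_s=20.0, h_r=10.0) for p in data]
```

Points that converge to the same mode belong to one area, and the mode (or the mean coordinates of the area) can then stand in for those points as a single input to the AP step.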

#### 3. Fixation Data Acquisition and Characteristics Analysis

##### 3.1. Pixel Coordinates and Reference Markers

In a real driving test, the scenes captured by the cameras of the Dikablis eye tracker (Figure 1) change continually as the driver’s head turns during lane changing, turning, accelerating, and decelerating. To obtain the relative position of a fixation point, pixel coordinates and reference markers (pasted on the control panel or windshield) are introduced. The camera coverage is divided into small squares. Regardless of how scenes change in the camera, the relative coordinates of the extracted fixation point (the reference coordinate origin is fixed at the upper-left corner) never change. The coordinates of the fixation point are labeled as $(x, y)$, where $x$ and $y$ are bounded by the width and height of the camera image.

Assume that the camera projection area is constant, with a fixed length and width, and that the driver’s vision coverage is likewise described by its length and width. The necessary and sufficient condition (6) for the transformation from pixel coordinates to two-dimensional coordinates is formulated in terms of $m$ and $n$, the numbers of scenes and markers, respectively, and $\delta_{ij}$, which denotes the presence of the $j$th marker in the $i$th scene. If marker $j$ exists in scene $i$, $\delta_{ij} = 1$; otherwise, $\delta_{ij} = 0$.

##### 3.2. Conversion of Pixel Coordinates to Two-Dimensional Coordinates

If the setting of reference markers satisfies the first condition in (6), then there is at least one reference marker in the scene. As the marker’s pixel coordinates change from scene to scene, a single reference marker may have more than one pixel coordinate. However, a marker pasted on the windshield, as in Figure 2, has only one fixed two-dimensional coordinate. Although the pixel coordinate of the fixation point shifts, its direction relative to the reference marker does not change. We define the direction of fixation point A relative to reference marker B as vector $\overrightarrow{BA}$. Given the two-dimensional coordinate of marker B, the unique relative two-dimensional coordinate of fixation point A can be computed using the following equation:

$$(X_A, Y_A) = (X_B, Y_B) + \overrightarrow{BA}, \qquad \overrightarrow{BA} = (x_A - x_B,\; y_A - y_B),$$

where $(X_A, Y_A)$ and $(X_B, Y_B)$ are the two-dimensional coordinates of the fixation point and the reference marker, and $(x_A, y_A)$ and $(x_B, y_B)$ are the pixel coordinates of the fixation point and the reference marker in the same scene.
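The conversion can be illustrated with a short sketch. It assumes that the offset between fixation point and marker carries over directly from the pixel frame to the fixed two-dimensional frame, with an optional per-axis `scale` factor added here as a hypothetical extension; the function name and all coordinates are made up.

```python
def to_reference_frame(fix_px, marker_px, marker_xy, scale=(1.0, 1.0)):
    """Map a fixation point from pixel coordinates to the fixed 2-D frame.

    fix_px, marker_px: pixel coordinates in the same scene.
    marker_xy: the marker's fixed two-dimensional coordinate.
    scale: assumed pixel-to-frame scale per axis (1.0 means no rescaling).
    """
    dx = (fix_px[0] - marker_px[0]) * scale[0]
    dy = (fix_px[1] - marker_px[1]) * scale[1]
    return (marker_xy[0] + dx, marker_xy[1] + dy)

marker_xy = (500.0, 200.0)  # hypothetical fixed coordinate of marker B

# The same gaze target in two scenes: the head has turned, so all pixel
# coordinates shift, but the offset from the marker stays the same.
scene_1 = to_reference_frame((450.0, 280.0), (400.0, 300.0), marker_xy)
scene_2 = to_reference_frame((300.0, 290.0), (250.0, 310.0), marker_xy)
```

Both scenes yield the same two-dimensional coordinate, which is what allows fixation points from different head positions to be pooled before clustering.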

##### 3.3. Types of Fixation Points

The real vehicle driving tests involved 16 drivers, whose basic driving attributes are recorded in Table 1. An SUV was used in the tests to give drivers a good field of view. To keep the driving environment consistent for all participants, each experiment lasted at least 30 minutes, was conducted on workdays in good weather during off-peak hours (10:00–11:00 and 15:00–16:30), and covered the same freeway and highway scenarios. Data were recorded with the D-Lab driving analysis system (Ergoneers, Germany). During the tests, all participants were exposed to nearly the same natural lighting condition, about 30000 lux. Drivers wore the Dikablis eye tracking glasses, which track eye movements and collect eye-related data such as fixation points; the data can be exported and preprocessed by the D-Lab system. Before data recording, a five-minute warm-up drive was conducted, and during the tests the participants were asked to behave normally so that their driving behavior was as realistic as possible. In total, more than 8000 fixation points (average 8132, sample rate 81.36%) were extracted for each driver using D-LAB Studio v3.0. A correlation analysis of the fixation points was then conducted to identify the visual characteristics of all participants.

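In the bivariate correlation analysis, the Pearson correlation, confidence intervals, and significance tests were used. The Pearson correlation coefficient is given as follows:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right),$$

where $n$ is the number of fixation points, $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$, respectively, and $s_x$ and $s_y$ represent the standard deviations of $x$ and $y$, respectively.

Given the null hypothesis $H_0\colon \rho = 0$, the statistic $z$ is computed after Fisher’s $z$-transformation:

$$z = \frac{1}{2} \ln \frac{1 + r}{1 - r},$$

where $z$ follows the normal distribution with mean 0 and standard deviation $1/\sqrt{n-3}$.

The significance test value is given by

$$P = 2 \left[ 1 - \Phi\!\left( |z| \sqrt{n-3} \right) \right],$$

where $\Phi$ is the standard normal distribution function.

The confidence interval is given as follows:

$$\left[ \tanh\!\left( z - \frac{z_{\alpha/2}}{\sqrt{n-3}} \right),\; \tanh\!\left( z + \frac{z_{\alpha/2}}{\sqrt{n-3}} \right) \right],$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.

Given $\alpha = 0.05$ (95% confidence interval), the computed values are presented in Table 2. The D-Lab analysis system automatically overlays each driver’s fixation points and draws the hot spot figures shown in Table 3. Further analysis shows two types of fixation point distribution. One type corresponds to the case where $x$ and $y$ are uncorrelated, and the other type corresponds to the case of weak negative correlation. According to the distribution of fixation points in Table 3, the negative correlation case is defined as the circle type and the uncorrelated case as the rectangle type.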
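For readers who want to reproduce this kind of analysis, the sketch below computes the Pearson coefficient, a two-sided significance value based on Fisher’s transformation, and the corresponding confidence interval with the Python standard library. The function name `pearson_with_fisher` and the sample coordinates are hypothetical, and the normal-approximation test shown here is one common choice rather than necessarily the exact procedure used by the authors’ software.

```python
import math
from statistics import NormalDist, mean, stdev

def pearson_with_fisher(xs, ys, alpha=0.05):
    """Return (r, two-sided p-value, confidence interval) for paired samples.

    Requires more than three pairs and |r| < 1.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    z = math.atanh(r)              # Fisher's z-transformation
    se = 1.0 / math.sqrt(n - 3)    # standard deviation of z under H0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z) / se))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
    return r, p_value, ci

# Hypothetical fixation coordinates, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
r, p_value, ci = pearson_with_fisher(xs, ys)
```

With several thousand fixation points per driver the interval becomes very narrow, so even a weak correlation can be statistically significant; the sign and size of $r$ are what separate the two distribution types.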

###### 3.3.1. Circle Type

In this case, the coordinates of fixation points present a weak negative correlation. The most frequently gazed area (also called the hot spot area) is concentrated in a certain region where drivers look straight forward. The most focused point (actually a small area) is the center of the circle. The closer a fixation point is to the center, the more often the driver gazes at that point. Experienced drivers’ fixation points are classified into this type. These drivers focus on the lanes ahead, with fewer head turns and quicker scanning speeds. For prejudging and lane changes, compared to novice drivers, experienced drivers spend less time watching the left/right rearview mirrors, and their sight line returns to the front windshield more quickly. Some experienced drivers even watch the rearview mirror before changing a lane. Their eye movement behavior explains why experienced drivers’ fixation points are more concentrated in the hot spot area and distributed in a circle shape.

###### 3.3.2. Rectangle Type

In this case, the coordinates have no significant correlation. The hot spot is not concentrated and shows a rectangular shape. Novice drivers’ fixation points fall into this category. With less experience, these drivers rarely watch control panels or rearview mirrors. During a lane change, they spend more time (longer duration or repeated views) watching the left/right rearview mirrors. Their eye behavior forms a long strip distribution.

#### 4. Division of Area of Fixation Interest

According to the clustering algorithm proposed above, the procedure to classify the fixation points (more than 8000 for each participant) is described as follows.

##### 4.1. Data Preprocessing

Extract the feature space based on the scan angle of the sight line using the MS (mean shift) method. The initial data is divided into multiple data sets. Then, the central coordinates of data sets (areas) can be calculated and input into the AP algorithm as improved initial fixation points.

##### 4.2. Initialization

Calculate the negative Euclidean distances between the input fixation points to obtain the similarity matrix $S$. Initialize the variables by assigning reasonable values. In particular, the maximum number of iterations is set to 1000, the number of successive iterations over which a given cluster center must remain unchanged is set to 100, and the deviation parameter $p$ is set to a chosen value.

##### 4.3. Iterations

In each iteration, compute the values of $R$ and $A$ according to (3) and (4) to find potential clusters. There are two criteria for ending the iterations, namely, the maximum number of iterations (1000) and the number of successive iterations for a given cluster center (100).

##### 4.4. Result Judgment

If the clustering results, including the distribution and the number of clusters, do not meet the requirements, change the value of $p$ and repeat the clustering procedure until satisfactory results are achieved.

In the clustering algorithm, two parameters affect the number of iterations and the number of clusters. One is the damping coefficient $\lambda$, which determines the algorithm convergence. The other is the deviation parameter $p$, which mainly affects the number of clusters. Different values of $p$ lead to different cluster numbers, and larger values generate more clusters.

Different values of the damping coefficient $\lambda$ result in different numbers of iterations and different fluctuation curves. Taking one driver’s fixation points as an example (Figure 3), a smaller $\lambda$ (0.50) leads to dramatic fluctuations and a lower number of iterations. A larger $\lambda$ (0.90) leads to the opposite results. To explain this phenomenon, we need to consider the relationship among $\lambda$, $R$, and $A$. In each iteration, $R$ and $A$ are both affected by $\lambda$. For smaller values of $\lambda$, the net similarity between two successive iterations fluctuates sharply, possibly leading to a local optimum or instability. Larger values of $\lambda$ lead to slower convergence but make the iterations stable, avoiding the shortcomings of smaller $\lambda$.

**Figure 3**, panels **(a)** and **(b)**.

In this study, there are more than 8000 fixation points for each participant. To make the iteration process as stable and convergent as possible, a larger $\lambda$ (= 0.90) is advisable. Although the number of iterations increases, the changing trend of the net clustering similarity levels off.

In Figures 4(a), 4(b), and 4(c), the deviation parameter $p$, that is, the diagonal element of the similarity matrix, is expressed in terms of median($S$), the median of the similarities. We can see that the larger the value of $p$ (that is, the smaller its absolute value, since the similarities are negative), the larger the number of clusters. We do not consider fewer than two or more than ten clusters, since a too small or too large number of clusters loses internal homogeneity and external heterogeneity. By assigning $p$ the values 2 × median($S$), median($S$), and median($S$)/2, we obtain two, four, and six clusters, respectively. The center of each cluster is marked with a black star.
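Because the similarities are negative distances, the ordering of the three settings is easy to get backwards. The short sketch below, using made-up points and a hypothetical helper `preference_candidates`, shows that 2 × median is the smallest (most negative) value and median/2 the largest.

```python
import math
from statistics import median

def preference_candidates(points):
    """Return the three deviation-parameter settings derived from the median."""
    sims = [-math.dist(u, v)
            for i, u in enumerate(points)
            for j, v in enumerate(points) if i != j]
    m = median(sims)  # negative, since every similarity is a negative distance
    return {"2 x median": 2 * m, "median": m, "median / 2": m / 2}

# Hypothetical fixation coordinates, for illustration only.
candidates = preference_candidates([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

Moving from 2 × median to median/2 therefore raises the deviation parameter toward zero, which makes each point more willing to serve as its own cluster center and yields more clusters.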

**(a)** $p$ = 2 × median($S$); number of clusters = 2

**(b)** $p$ = median($S$); number of clusters = 4

**(c)** $p$ = median($S$)/2; number of clusters = 6

Different numbers of clusters have different explanations. Given $p$ = 2 × median($S$), the fixation area on the windshield for both circle and rectangle types is divided into two parts. The dividing line for the circle type is slightly to the left of the center, and for the rectangle type it is slightly to the right. In Figure 5, a driver looks straight at the target (a car in front or the lane) while driving normally. As the driver sits on the left side of the car, the driver’s sight line intersects the windshield to the left of the center line. A novice driver, however, who usually drives slower than average, pays more attention to passing vehicles or other information on the left and right sides, which disperses the fixation points.

When $p$ = median($S$), clear differences appear. For the circle type (experienced drivers), the area of fixation interest is divided into four parts, which correspond one-to-one to the left rearview, right rearview, rearview, and control panel views. Novice drivers’ eye movement characteristics show little attention to the rearview mirror and control panel; therefore, their area of fixation interest is split horizontally into four nearly equal parts. Figure 5 shows the relationship between the division lines of novice and experienced drivers in the two-AOI case. It is clear that the two types of drivers divide their attention between left and right differently.

Given $p$ = median($S$)/2, the number of clusters increases to six. Figure 4(c) shows trends similar to Figure 4(b). For the circle type, the only difference is that the rearview and control panel areas are each divided into two subareas, which is hard to explain reasonably.

#### 5. Conclusions and Discussion

Using real vehicle driving tests, eye movement data for 16 drivers were obtained. In this research, fixation points were extracted in D-LAB Studio to divide the area of visual interest into subareas, in order to find out how drivers watch objects in sight and to determine the distribution of AOIs and the difference in fixation outcomes between experienced and novice drivers. According to the clustering results using the MSAP method, the following conclusions can be drawn.

The MSAP clustering method is suitable for the division of AOIs. In general, both the number of clusters and the AOI distribution are consistent with common driving experience. The most important finding is that experienced drivers’ fixation points are distributed in a circle whose center is the most-watched region. Novice drivers’ fixation points, by contrast, are relatively dispersed and form a rectangular shape, in which there is no clear concentration region in either the horizontal or the vertical direction. Therefore, in eye movement or eye tracking research, it is imperative to distinguish experienced from novice drivers; otherwise, a bias may arise. To obtain a reasonable division of AOIs, using data from experienced participants is preferable.

According to Figure 4, dividing the front vision area into four subareas (AOIs), in addition to the right mirror area and left mirror area, makes the most sense. As shown in Figure 6, the four front AOIs are the left vision region (left windshield and left rearview mirror), right vision region (right windshield and right rearview mirror), rear vision region, and control panel vision region. Since Lethaus and Rataj [3] concluded that eye movements precede drivers’ behavioral features, the four AOIs are consistent with driving habits. Drivers fixate on the left windshield area during normal driving, pay more attention to the rearview mirror area when they want to understand the traffic behind them, and shift their fixation to the control panel when they want to check whether they are speeding. The differences between experienced and novice drivers are reflected in the rearview mirror and control panel regions. Watching the rearview mirror makes lane changes safer, and in-vehicle entertainment devices (located in the control panel area) make driving more relaxed. However, due to relatively limited driving experience, novice drivers rarely look at rearview mirrors and control panels. This phenomenon calls for control panels whose human-machine interactions require as little visual attention as possible.

Compared with the AOI division methods mentioned in the Introduction, the division of AOIs in Figure 6 is more closely related to driver glance behavior. Olsen et al. [15] made no distinction between experienced and novice drivers and divided the AOIs mechanically, as shown in Figure 7. In practice, drivers sit on the left side of the car and usually pay more attention to the left windshield, so a division that uses a single “front” AOI for the front windshield, as in Figure 7, cannot reflect drivers’ concerns.

In addition, the AOI division result is not consistent with any previous study. This could be explained by the fact that the gaze data were collected during ordinary driving, which includes both going straight and lane changing, rather than under one special driving condition. Since Fitch [14] aimed to analyze the difference between lane change crashes and near-crashes, the front area was qualitatively divided into the front left, front, and front right, but the borderlines between different AOIs are not clear. It is therefore worth noting that the number of clusters depends on the purpose of the study. To predict lane change intentions, it is suggested to divide the left/right view regions into two (or more) subareas, namely, the windshield and left/right rearview mirror subareas. The MSAP clustering method would be an effective way to find the AOI borderlines.

Together with further investigation of drivers’ eye movement characteristics, the AOI division can provide useful information for control panel design, object placement in vehicles, and so forth. More importantly, the concept of the AOI can provide a connection between drivers’ behavioral intentions and eye tracking data. For example, a fixation point projected on the left rearview mirror may indicate an upcoming left lane change. Based on the real-time distribution of fixation points, the lane change probability during driving could be predicted. However, this task is left for future research.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This paper was supported by the National Basic Research Development Program of China (973 Program) (Project 2014CB046801); the National Science Foundation of China (Project 51208261); and the Postdoctoral Science Foundation of China (Project 2015M572728 and Project 2016M602972).