#### Abstract

Car-following behavior is a vital traffic phenomenon in vehicle driving. In modeling car-following behavior, it is crucial to capture the reaction delay in order to balance safety and comfort, yet existing work generally ignores it. This work proposes a car-following model based on attention-based ensemble learning to automatically capture the reaction delay from driving data and better depict traffic flow characteristics. The model integrates a data-driven model and a theory-driven model, and a weight computation method is proposed to combine the advantages of the two. In detail, an encoder-decoder model with an attention mechanism is employed to capture the reaction delay from driving data. Extensive experiments show that the proposed model can balance safety with comfort and help avoid unsafe driving behavior.

#### 1. Introduction

The car-following (CF) model, as the basis of theoretical research on traffic flow, has attracted increasing attention with the emergence of intelligent driving. A CF model is a mathematical description of how a vehicle moves in response to changes in the state of the vehicle in front of it in the same lane, with no overtaking [1]. CF models can generally be divided into theory-driven and data-driven methods. Abundant work has been carried out to improve the simulation accuracy of CF models for driving safety and comfort, which provides dynamic data simulation for traffic flow theory. A theory-driven CF model puts forward a theoretical hypothesis about the research object according to data characteristics and establishes a mathematical model consistent with that hypothesis. A data-driven CF model usually uses a large amount of high-precision vehicle driving data to describe the changes in a vehicle’s state of motion. Both kinds of model need vehicle driving data as support. However, given massive traffic data with complex characteristics, it remains hard to select an accurate mathematical model from a single hypothesis, so theory-driven models fall significantly short of the requirements of an intelligent transportation system. Data-driven models, on the other hand, can achieve higher prediction accuracy without an understanding of the internal mechanism of the research object [2]. In recent years, an increasing number of CF methods have been developed based on data-driven models, including early linear models proposed by Chandler and Herman [3] and Sasaki [4] and nonlinear models by Newell [5], Li et al. [6], and Yu et al. [7].

In modeling CF behavior, the driver’s behavior is essential to reproducing the actual traffic state. A few scholars have incorporated safety and comfort into the driver’s driving behavior and considered reaction-delayed driving behavior. However, it remains challenging to capture the reaction delay automatically. At the same time, improving simulation accuracy and avoiding unreasonable simulation results are also urgent problems [8]. Reaction delay is a common characteristic of humans operating and controlling machines such as automobiles, and it integrates mental processing time, movement time, and device reaction time [9]. The estimation of reaction delay derives from stimulus-reaction theory, in which the delay is expressed as the time between a change in driving conditions and the subsequent reaction. Reaction delay has gradually become an indispensable factor in the study of CF behavior [10, 11]. Recently, it has also received increasing attention in data-driven CF models, where more historical driving behavior information is considered in order to account for the reaction delay in practical applications. For example, Huang et al. proposed a CF model based on long short-term memory neural networks that captures natural traffic flow characteristics by incorporating driving memory [12]. Jafaripournimchahi et al. developed a new CF model to investigate the effects of driver anticipation and driver memory on traffic flow [13]. Fei et al. developed a CF model with driver time memory based on real-world traffic data, which is effective and robust and thereby improves simulation accuracy [14]. Chen et al. argued that data-driven CF models could be a promising research direction [15, 16].

Safety and comfort are the primary goals and essential considerations in modeling the CF problem. So far, the Intelligent Driver Model (IDM) is a widely used theory-driven model that achieves good performance. Moreover, each of its parameters has an explicit physical meaning, and the state changes of the following vehicle can be displayed intuitively. However, driving is generally more comfortable and stable when the speed difference between the leading and following vehicles is smaller [17]. To address this issue, Ma and Qu utilized statistical methods to estimate the reaction delay and used a seq2seq model to build a CF model that effectively reduces unreasonable driving behaviors [8]. Ying integrated a theory-driven model by means of a linear combination, which ensures simulation accuracy and improves the safety of the data-driven model [18]. Still, it remains challenging to balance safety and comfort while maximizing traffic flow.

Moreover, although many CF models have been proposed, avoiding unreasonable driving behaviors, such as frequent braking and acceleration, remains a vital challenge. To address these issues, we propose an attention-based ensemble learning CF (AEL-CF) model that offers control decisions by combining multiple models. The experimental results show that the model can capture the reaction delay, ensure simulation accuracy, improve safety and comfort, and even reproduce the actual CF trajectory.

The main contributions of this work can be summarized as follows:

(i) We first propose an attention-based encoder-decoder car-following (AED-CF) model that automatically captures the driver’s reaction delay. Specifically, the captured reaction delay is reflected in the different weights that the model assigns to the steps of the input sequence. To the best of our knowledge, we are the first to automatically capture reaction delays from driving data in a CF model.

(ii) An attention-based ensemble learning CF (AEL-CF) model is proposed to incorporate the AED-CF and IDM models. The AEL-CF model can automatically capture the reaction delay and effectively overcome a shortcoming of the data-driven model (i.e., frequent changes in acceleration).

The rest of this paper is organized as follows: Section 2 proposes the AED-CF and AEL-CF models and determines their input and output variables; Section 3 introduces the experimental data and experiment details, including the training process of the AED-CF and AEL-CF models, and reports the results; Section 4 discusses the performance of the proposed models in detail; and Section 5 concludes this work.

#### 2. Methodology

This section introduces the construction of the feature data, the framework of the model, and the way the models are combined. Most scholars estimate the reaction delay by analyzing large amounts of data. Generally, the data-driven model is not employed as a substitute for the theory-driven model, and combining the advantages of both to obtain higher simulation accuracy is worthy of further exploration. This work proposes an attention-based ensemble learning (AEL-CF) model with a memory effect, which helps to capture the reaction delay.

The AEL-CF model combines the advantages of the IDM model and the attention-based encoder-decoder car-following (AED-CF) model we proposed. The IDM model will be introduced in the experimental part. The framework of the AED-CF model is shown in Figure 1. This section also introduces the AEL-CF model to combine the IDM and AED-CF models.

As shown in Figure 1, the AED-CF model comprises input, encoder, attention, and decoder layers. To capture the reaction delay, the model considers the temporal characteristics of CF behavior, and its input layer takes the subject vehicle speed, gap distance, and relative speed. The encoder layer extracts the latent features, while the attention layer produces the attention weight coefficients, from which hidden information about the temporal characteristics is acquired. Finally, this hidden information is decoded by the decoder layer. In a CF model, the speed at each step can be regarded as a response caused by a stimulus at a specific historical moment. The input variables of the AED-CF model are regarded as the stimulus and the output variable as the response, and the model’s output is the vehicle speed at the next step. Thus, the relationship between stimulus and response can be expressed as

$$v_n(t + \Delta t) = f\big(X_n(t),\, X_n(t - \Delta t),\, \ldots,\, X_n(t - (M - 1)\Delta t)\big),$$

where $X_n(\tau)$ collects the input variables of subject vehicle $n$ at time $\tau$; $f$ denotes the mapping between input and output variables; $M$ denotes the maximum number of historical time steps; and $\Delta t$ represents the updating time step of the AED-CF model, which is 0.1 s.
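The stimulus-response framing above amounts to turning each trajectory into sliding windows of past states paired with the next-step speed. A minimal sketch of that windowing follows; the function name `build_samples` and the plain-list data layout are illustrative assumptions, not part of the original work.

```python
def build_samples(speed, gap, rel_speed, history=30):
    """Pair each window of `history` past states with the next-step speed.

    speed, gap, rel_speed: equal-length lists sampled every 0.1 s for one
    follower (hypothetical layout). Returns (inputs, targets), where each
    input is a list of `history` rows [speed, gap, rel_speed].
    """
    if not (len(speed) == len(gap) == len(rel_speed)):
        raise ValueError("all series must have the same length")
    inputs, targets = [], []
    for end in range(history, len(speed)):
        # Stimulus: the `history` most recent states before index `end`.
        window = [[speed[k], gap[k], rel_speed[k]]
                  for k in range(end - history, end)]
        inputs.append(window)
        # Response: the subject vehicle speed at the next step.
        targets.append(speed[end])
    return inputs, targets
```

With `history=30` and a 0.1 s sampling interval, each window spans 3.0 s of driving history.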

##### 2.1. Input and Output Variables

In the CF model, the subject vehicle (*n*) adjusts its state according to the driving condition of the leading vehicle (*n* − 1). The input variables of data-driven CF models generally include the space headway, gap distance, relative speed, and speed of the subject vehicle. The output variable is usually the speed or acceleration of the subject vehicle, as shown in Table 1. Following previous studies [8, 12, 19–21], we use the subject vehicle speed, gap distance, and relative speed as the input variables of the AED-CF model and the subject vehicle speed as the output variable. In comparative experiments, the proposed model outperformed the baseline methods with 30 historical time steps, so the input length is set to 30 time steps (3.0 s).

##### 2.2. Encoder-Decoder Model

Recurrent neural networks (RNNs) are increasingly used in CF models. For example, Huang et al. successfully captured the driver’s asymmetric driving behavior by using a long short-term memory (LSTM) model [12]. Ma and Qu employed a Seq2Seq model to simulate the next step from historical driving information [8]. Inspired by this, we establish a CF model based on the encoder-decoder model, which is briefly introduced in this section.

The encoder-decoder model has shown excellent performance in sequence prediction [22] and has been widely used in various fields. The first encoder-decoder model, which integrates RNNs, was proposed to solve the problem of phrase representation [23]. Since then, scholars have built on it and achieved good results in language translation [24]. As shown in Figure 2, LSTM is used as the neuron of the encoder-decoder model in this work, and the encoder and decoder layers use the same framework with the same single-layer LSTM structure. In the encoder layer, the LSTM units take in the information from the input variables and encode it step by step. The final output is the context vector C, which summarizes the information in the input variables. The decoding process obtains its output by analyzing the context vector C and the previous memory cell state.

##### 2.3. AED-CF Model

Ma and Qu used statistical methods to analyze the availability of high-fidelity trajectory data and estimated reaction delay [8]. We hope that the model could discover the latent information of the data and capture reaction delay automatically. Therefore, we propose an AED-CF model at first to capture reaction delay automatically. The framework of the encoder-decoder model is described in the previous section, and the attention layer is described below.

Generally, the driver’s judgment of the current state will be reflected in action after a certain period, leading to the main risk of accidents. The attention mechanism was employed to improve safety for its superior performance. Under the condition of limited computer performance, the attention mechanism could allocate resources well to optimize the computing speed. The role of the attention mechanism in the encoder-decoder model is to structurally select a subset of the input variables, making the model focus on more practical information. Assigning different weights to the input variables is the crucial point, which allows the model to focus on the more valid content information. The framework of the attention layer is shown in Figure 3.

The input-output relationship of the attention layer is described by equation (2). The attention weights are the result of activating the attention scores with the softmax function, as in equation (3), and the scores themselves are calculated by equation (4). These equations are written in terms of the input variables of the attention layer, the output variables of the attention layer, and the matrix transpose.
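To make the role of the softmax weights concrete, here is a minimal sketch of an attention layer over a sequence of hidden states. It assumes a plain dot-product score against a query vector, which is a common choice but not necessarily the scoring function of equation (4); the names `softmax` and `attend` are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden_states, query):
    """Weight each time step's hidden state and return (weights, context).

    Assumption: scores are dot products between each hidden state and
    `query`; the original scoring function may differ.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context
```

The returned `weights` sum to one across time steps, which is what allows them to be read as the relative importance of each historical moment.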

##### 2.4. AEL-CF Model

The data-driven CF model can accurately describe the state changes in the real following process. However, it cannot guarantee comfort during driving; for example, frequent acceleration and deceleration is the primary cause of reduced comfort. The theory-driven CF model can significantly improve comfort, but it has the disadvantage that it cannot accurately describe the state changes of the vehicle. Building on the AED-CF model, this work therefore proposes an AEL-CF model that considers the reaction delay while balancing comfort and safety.

In this work, the combination forecasting method is used to constitute the model. The combination forecasting method combines the predictions of two or more different prediction methods for the same problem. According to its form, it can be divided into equal weight combination and unequal weight combination. Equal weight combination means that the predicted values of the forecasting methods are combined into new predicted values with the same weight. Unequal weight combination means that the forecasting methods are given different weights. In this work, we selected an unequal weight combination. Ying used a single evaluation index of the model to calculate the parameters of the combined model [18], but involving more evaluation indexes in the calculation gives a better fusion effect [25]. Therefore, the safety, comfort, and accuracy evaluation indexes of the model are used to calculate the parameters of the combined model. These parameters are determined by equations (5) and (6), which take the safety evaluation index, the comfort evaluation index, and the mean square error of the model output, and yield the contribution of each model and, from it, the weight of each model.

#### 3. Experiment and Results

##### 3.1. Data Description

Deep learning is a data-driven method, and massive data are required for gaining knowledge from data. The NGSIM datasets, collected from the US Highway 101 in Los Angeles, California, from 7:50 a.m. to 8:05 a.m. on June 15, 2005, were employed in this work. What is more, to avoid the effect of lane changing, the trajectories of vehicles that kept driving on the five lanes (lane 1, 2, 3, 4, 5) without any lane changing were collected. Finally, 810 vehicles’ trajectories (436,518 samples) were collected.

##### 3.2. Data Preprocessing

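Because the raw NGSIM data contain anomalous acceleration and deceleration values, the symmetric exponential moving average (SEMA) is employed to reduce measurement errors [26, 27]. With SEMA, the smoothed vehicle position is determined by the following equations:

$$\tilde{x}(t_i) = \frac{1}{Z} \sum_{k=i-D}^{i+D} x(t_k)\, e^{-|i-k|/\Delta}, \qquad Z = \sum_{k=i-D}^{i+D} e^{-|i-k|/\Delta},$$

$$D = \min\{3\Delta,\ i - 1,\ N - i\}, \qquad \Delta = \frac{T}{dt},$$

where $dt$ denotes the data collection interval, generally set to 0.1 s; $D$ denotes the half-width of the averaging window, chosen so that the moving average is symmetrical; $N$ denotes the number of data points in the trajectory; and $T$ denotes the moving average range, which is set to 0.5 s.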
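As an illustration of the smoothing step, the sketch below implements a symmetric exponential moving average in the form commonly used for NGSIM trajectories, with the window shrinking near the ends of the series so that it stays symmetric. The function name `sema` and the zero-based indexing are assumptions made for this example.

```python
import math

def sema(values, dt=0.1, T=0.5):
    """Symmetric exponential moving average of one trajectory series.

    dt: sampling interval in seconds; T: moving average range in seconds.
    The kernel width in samples is delta = T / dt.
    """
    delta = T / dt
    n = len(values)
    smoothed = []
    for i in range(n):
        # Half-width is capped by the distance to either end of the series,
        # which keeps the window symmetric around index i.
        half = min(int(round(3 * delta)), i, n - 1 - i)
        num = 0.0
        den = 0.0
        for k in range(i - half, i + half + 1):
            w = math.exp(-abs(i - k) / delta)
            num += w * values[k]
            den += w
        smoothed.append(num / den)
    return smoothed
```

Because the window is symmetric, constant and linear series pass through unchanged, while isolated spikes are attenuated.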

After further extraction of the processed data, a total of 1,091 vehicles of car-following data were obtained. The data of 600 vehicles were randomly selected, of which 500 groups were used as training datasets (Data500 for short) and 100 groups as test datasets (Data100 for short). In addition, Data100 will also be used to calibrate the parameters of the IDM model.

##### 3.3. AED-CF Model

This section describes the experiment in detail, including dataset segmentation, training strategy, objective function, and optimization algorithm.

Data500 is used for parameter iteration of the model. We randomly select 70% of the dataset as the training set, while the remaining 30% constitutes the validation set. The model is updated iteratively on the training set only. At the end of each update round, the loss function is used to evaluate the model’s performance on the validation set. To mitigate overfitting, “Early Stopping” is introduced in training: the model stops training when the validation result fails to improve two times.
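The stopping rule can be sketched as a small helper that scans the per-round validation losses. It assumes that the two non-improving rounds must be consecutive and that "improvement" means a strictly lower loss than the best seen so far; the function name `early_stopping_round` is hypothetical.

```python
def early_stopping_round(val_losses, patience=2):
    """Return the 1-based round at which training would stop.

    Assumption: training stops after `patience` consecutive rounds without
    a strictly lower validation loss; otherwise it runs to the last round.
    """
    best = float("inf")
    bad_rounds = 0
    for round_no, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_rounds = 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                return round_no
    return len(val_losses)
```

In a real training loop the same counter would be updated after each validation pass instead of over a precomputed list.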

We define the loss function to evaluate the error between the actual and predicted values. In this work, the mean square error (MSE) of speed and gap distance is used as the loss value to optimize the model. The loss is therefore computed from the actual speed and gap distance and the simulated speed and gap distance.
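A loss of this kind can be sketched as follows. The unweighted sum of the two mean square errors is an assumption, since the exact form of the original loss equation is not recoverable here, and the name `cf_loss` is illustrative.

```python
def cf_loss(v_obs, v_sim, gap_obs, gap_sim):
    """Sum of the speed MSE and the gap-distance MSE (assumed unweighted)."""
    if not (len(v_obs) == len(v_sim) == len(gap_obs) == len(gap_sim)):
        raise ValueError("all series must have the same length")
    n = len(v_obs)
    mse_speed = sum((a - b) ** 2 for a, b in zip(v_obs, v_sim)) / n
    mse_gap = sum((a - b) ** 2 for a, b in zip(gap_obs, gap_sim)) / n
    return mse_speed + mse_gap
```

Including the gap term penalizes trajectories whose speed error is small at each step but accumulates into a drifting position.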

The model is optimized with the adaptive optimization algorithm “Adam.” The detailed parameters are as follows: lr = 0.001, *β*1 = 0.9, *β*2 = 0.999, *ε* = 1e-08, and decay = 0.0. The hyperbolic tangent function tanh is selected as the activation function for both the encoder and decoder parts. The final output uses a parametric rectified linear unit (PReLU) to reduce the risk of overfitting. The formula is as follows:

$$f(x_i) = \max(0, x_i) + \alpha_i \min(0, x_i),$$

where *α* is a learnable parameter vector.

Through experimental optimization, the best choice of the other hyperparameters is as follows: the number of neurons in the LSTM unit is 32, the encoder and decoder each have a depth of one layer, and the number of training rounds is 20.

##### 3.4. AEL-CF Model

The AEL-CF model combines the IDM and AED-CF models using an unequal weight combination prediction method. During the experiments, we found that the AED-CF model performs worse than the IDM model on the comfort evaluation index of the subject vehicle, but its safety and simulation accuracy are higher than those of the other models, and it can automatically capture the reaction delay. To exploit the advantages of each, we combine the IDM and AED-CF models. The combination weights are calculated by equations (5) and (6), and the specific calculation results are shown in the next section.

##### 3.5. Experimental Results

In this section, we optimized the evaluation indexes of the model and calculated the parameters of the AEL-CF model according to the experimental results. The final experimental results are shown for the effectiveness evaluation of the model.

###### 3.5.1. Evaluation Index Design

To quantify the simulation performance of the model, Ying proposed comfort and safety metrics [18]. Based on these metrics, improved indexes are used in this work to reflect the model’s performance on the whole verification dataset, and the mean square error of speed is also employed as an evaluation index. The comfort index represents the oscillation of the leading vehicle’s acceleration during driving; the safety index represents the minimum time headway between the front and rear vehicles; and the accuracy index represents the velocity error between the simulated trajectory and the field data. As defined in equations (13)–(15), a larger safety index and smaller comfort and accuracy indexes indicate a better model.
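Two of these indexes can be sketched directly from their verbal definitions. The code below is an assumption-laden illustration rather than a transcription of equations (13)–(15): it takes time headway as space headway divided by follower speed, skips near-standstill samples, and leaves out the comfort index because its exact form is not stated in the text.

```python
def min_time_headway(space_headway, speed, eps=1e-6):
    """Smallest time headway (s) over a trajectory.

    Assumption: time headway = space headway / follower speed; samples with
    speed <= eps are skipped. Raises ValueError if no sample qualifies.
    """
    return min(h / v for h, v in zip(space_headway, speed) if v > eps)

def speed_mse(v_obs, v_sim):
    """Mean square error between observed and simulated speeds."""
    if len(v_obs) != len(v_sim):
        raise ValueError("series must have the same length")
    return sum((a - b) ** 2 for a, b in zip(v_obs, v_sim)) / len(v_obs)
```

A larger minimum time headway leaves more time to react, while a smaller speed error means the simulated trajectory stays closer to the field data.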

###### 3.5.2. Performance Comparison

In this section, the models are compared with the model evaluation metrics. Firstly, the AED-CF model, IDM, and LSTM model are compared, respectively. Then, the parameters of the AEL-CF model are calculated based on the experimental results of the IDM model and the AED-CF model by equations (5) and (6). Finally, the AEL-CF, AED-CF, and IDM models are evaluated with performance comparison.

It has been shown that LSTM networks can achieve better prediction results when the number of layers is increased to eight [12]. Therefore, the LSTM model used for comparison in this work is configured as follows: the number of LSTM layers is eight, the number of neurons is 32, and the number of historical time steps is 50. The optimization algorithm, loss function, and other configurations are consistent with those of the AED-CF model.

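IDM is a widely used theory-driven model proposed by Treiber et al., which integrates the effects of the expected gap distance between the leading and subject vehicles and of the expected speed [28]. The model is as follows:

$$a_n(t) = a_{\max}\left[1 - \left(\frac{v_n(t)}{v_0}\right)^{\delta} - \left(\frac{s^*\big(v_n(t), \Delta v_n(t)\big)}{s_n(t)}\right)^{2}\right],$$

$$s^*\big(v_n(t), \Delta v_n(t)\big) = s_0 + v_n(t)\,T + \frac{v_n(t)\,\Delta v_n(t)}{2\sqrt{a_{\max}\, b}},$$

where $s^*$ denotes the expected gap distance, which is calculated from the velocity $v_n$ and the relative velocity $\Delta v_n = v_n - v_{n-1}$; $s_n$ is the actual gap distance; and $\delta$ is the acceleration exponent. The goal of calibration is to determine the expected velocity $v_0$, maximum acceleration $a_{\max}$, maximum deceleration $b$, expected time interval $T$, and minimum space interval $s_0$.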

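Notably, the output of IDM is an acceleration value, but the evaluation function used in this work requires a speed value. Therefore, the following equation is employed to convert the acceleration into a speed:

$$v_n(t + \Delta t) = v_n(t) + a_n(t)\,\Delta t,$$

where $v_n(t)$ and $a_n(t)$ are the speed and IDM acceleration of the subject vehicle at time $t$, and $\Delta t$ is the updating time step.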
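As a rough illustration of how an IDM acceleration becomes a speed, the sketch below implements the commonly cited form of IDM followed by a forward-Euler speed update. The default exponent `delta=4.0`, the sign convention `dv = v_follower - v_leader`, and the clamping of speed at zero are assumptions of this example, and any parameter values passed in are placeholders rather than the calibrated values of Table 2.

```python
import math

def idm_acceleration(v, gap, dv, v0, a_max, b, T, s0, delta=4.0):
    """IDM acceleration of the follower (m/s^2).

    v: follower speed; gap: gap distance to the leader (> 0);
    dv: approach rate, follower speed minus leader speed (assumed sign);
    v0, a_max, b, T, s0: expected speed, maximum acceleration, comfortable
    deceleration, expected time interval, minimum space interval.
    """
    if gap <= 0.0:
        raise ValueError("gap must be positive")
    # Expected (desired) gap given the current speed and approach rate.
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

def next_speed(v, accel, dt=0.1):
    """Forward-Euler speed update, clamped at zero (clamping is an assumption)."""
    return max(0.0, v + accel * dt)
```

On an empty road the gap term vanishes, so a stationary vehicle accelerates at `a_max` and a vehicle at `v0` holds its speed; a short gap drives the acceleration negative.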

Then, we used the simulated annealing algorithm (SAA) to obtain the optimal parameters for IDM. The optimal parameters for Data100 are shown in Table 2.

IDM, LSTM, and the proposed AED-CF model are compared on Data100. Each model is evaluated by the comfort, safety, and accuracy indexes. The evaluation results in Table 3 show that the AED-CF model outperforms the baseline methods in system accuracy and safety. However, IDM is still better in comfort.

To further compare the models, we select Vehicle 39 and plot the observed and simulated trajectory profiles in Figure 4. The results show that all the models follow the trend of the observed trajectory well. Among them, the simulated trajectory of the AED-CF model is the most consistent with the field data.

Figure 4: Observed and simulated trajectory profiles of Vehicle 39 (panels (a) and (b)).

According to the evaluation results of the IDM and AED-CF models, the parameters of the AEL-CF model are calculated by equations (5) and (6). The weight of IDM is 0.4725, while that of the AED-CF model is 0.5275. We also evaluated the AEL-CF model on the Data100. The results are shown in Table 4. We could see that the AEL-CF model well integrates the advantages of the AED-CF and IDM models. It not only ensures the accuracy of simulation speed but also improves the model’s comfort level.
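The unequal weight combination can be illustrated with the reported weights. The sketch below assumes that the weights are applied to the per-step speed predictions of the two models; the function name `combine_speeds` is hypothetical, and the weights themselves would come from equations (5) and (6).

```python
W_IDM = 0.4725  # weight of IDM reported in the text
W_AED = 0.5275  # weight of the AED-CF model reported in the text

def combine_speeds(v_idm, v_aed, w_idm=W_IDM, w_aed=W_AED):
    """Weighted combination of two speed predictions, step by step.

    Assumption: the ensemble output is w_idm * v_idm + w_aed * v_aed.
    """
    if abs(w_idm + w_aed - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if len(v_idm) != len(v_aed):
        raise ValueError("predictions must have the same length")
    return [w_idm * a + w_aed * b for a, b in zip(v_idm, v_aed)]
```

Because the weights sum to one, the combined speed always lies between the two individual predictions, which is how the smoother IDM output can damp abrupt changes in the data-driven output.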

Although the simulation accuracy of AEL-CF is slightly lower than that of the AED-CF model proposed in this work, it is still significantly better than that of IDM. Moreover, compared with AED-CF and IDM, both the comfort and safety indexes are improved. These experiments demonstrate that the AEL-CF model can combine the advantages of the IDM and AED-CF models.

To show the system accuracy performance, we also selected Vehicle 39 to observe the simulation effect of the AEL-CF model. It is compared with IDM and AED-CF models. The results are shown in Figure 5.

Figure 5: Simulation results of the AEL-CF, AED-CF, and IDM models for Vehicle 39 (panels (a) and (b)).

#### 4. Discussion

The model captures the reaction delay automatically from Data100, which is selected randomly. Figure 6 visualizes the attention weights in the model and shows that the reaction delays captured from different vehicle data have a similar distribution. The input history length of the model is 30 time steps (3.0 s), and the attention mechanism focuses most on the 16th and 29th time steps. The weight allocated to the 16th time step indicates that, before simulating the next vehicle speed, the AEL-CF model focuses on the time step that lies 14 steps back from the end of the input sequence.

Figure 6: Visualization of the attention weights of the model (panels (a)–(d)).

In other words, the AEL-CF model sets the reaction delay at 1.4 s. For trajectory data, the state at the next step is closely related to the most recent moment, so the attention mechanism also assigns a large weight to the 29th time step. It is hard for the model to distinguish further between these two time steps, which is a shortcoming of the model. In general, however, the model shows the ability to capture the reaction delay.
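The arithmetic behind the 1.4 s figure can be sketched as follows. The helper converts the position of the attention peak into a delay by counting steps back from the end of the input window. Excluding the last two positions from the search, so that the peak tied to the most recent state is ignored, is an assumption of this example, and `reaction_delay` is a hypothetical name.

```python
def reaction_delay(weights, dt=0.1, skip_last=2):
    """Return (peak_position, delay_seconds) from attention weights.

    weights: one weight per input time step, oldest first.
    peak_position is 1-based. The last `skip_last` positions are excluded
    (assumption) so the weight on the most recent state is not picked.
    """
    n = len(weights)
    candidates = range(1, n - skip_last + 1)
    peak = max(candidates, key=lambda pos: weights[pos - 1])
    # Steps counted back from the end of the window, times the step size.
    return peak, (n - peak) * dt
```

For a 30-step window with its dominant weight at the 16th step, the peak lies 14 steps back, which at 0.1 s per step gives a delay of about 1.4 s.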

As a reference for comparison, Ma and Qu estimated the reaction delay by sampling the time lag between the relative velocity and acceleration sequences and found a reaction delay of 1.33 seconds [8]. The reaction delay computed automatically by the AEL-CF model is close to this manual sampling result on the same dataset. These results show that the AEL-CF model can capture the reaction delay automatically.
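One generic way to estimate a time lag between a stimulus series and a response series is to pick the shift that maximizes their correlation. The sketch below shows that idea only; it is not a reproduction of the sampling procedure in [8], and the function names and the choice of Pearson correlation are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def estimate_lag(stimulus, response, max_lag, dt=0.1):
    """Return (lag_steps, lag_seconds) maximizing corr(stimulus[t], response[t + lag])."""
    if len(stimulus) != len(response):
        raise ValueError("series must have the same length")
    if max_lag >= len(stimulus) - 1:
        raise ValueError("max_lag is too large for the series length")
    n = len(stimulus)
    best = max(range(max_lag + 1),
               key=lambda lag: pearson(stimulus[:n - lag], response[lag:]))
    return best, best * dt
```

Applied to relative velocity as the stimulus and follower acceleration as the response, the sign of the relative velocity would need to be chosen so that the two series are positively correlated.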

The ability to capture the reaction delay is also reflected in the simulation results. To verify this ability further, the car-following pair consisting of the leading vehicle with ID 2581 and the subject vehicle with ID 2593 was selected for simulation. As shown in Figure 7, the speed peaks of the following vehicle appear about 1.4 s after those of the leading vehicle, which also supports the validity of the reaction delay estimate. Notably, the reaction delay can differ between drivers, and the value estimated in this work is the average of all response latencies in the NGSIM dataset.

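Figure 7: Simulation results for the leading vehicle (ID 2581) and the subject vehicle (ID 2593) (panels (a) and (b)).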

#### 5. Conclusion

Using the data available in NGSIM, this work addresses car-following (CF) behavior modeling. To automatically capture the reaction delay, a characteristic of driving behavior, we introduced the encoder-decoder framework and the attention mechanism into car-following behavior modeling. This paper makes the following contributions:

(1) For the car-following behavior modeling problem, an attention-based encoder-decoder car-following (AED-CF) model is proposed to automatically capture the driver’s reaction delay.

(2) Based on the evaluation results of each model, an attention-based ensemble learning CF (AEL-CF) model is proposed, which combines the IDM and AED-CF models with an unequal weight combination. Experimental results show that the AEL-CF model can improve both the safety and comfort indexes.

In summary, the AEL-CF model combines the rules of IDM with the AED-CF model. Experimental results show that the proposed AEL-CF model can capture the reaction delay automatically and improve both the comfort and safety indexes. Because the reaction delay varies with the driving environment, the automatic reaction delay detection of AEL-CF could be applied in different situations.

#### Data Availability

The NGSIM data used to support the findings of this study are freely available for download at the US DOT Intelligent Transportation System (ITS) Public Data Hub (https://its.dot.gov/data/).

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

Portions of this research were funded through the projects of the National Science Foundation of China (nos. 41971340 and 41471333), projects of the Fujian Provincial Department of Science and Technology (Nos: 2021Y4019, 2020D002, 2020L3014, and 2019I0019), and the support of the Foundation of Fujian Key Laboratory of Automotive Electronics and Electric Drive (Fujian University of Technology) (No: KF-X18002).