Abstract
Destination prediction is an active area of research, especially in the context of intelligent transportation systems. Intelligent applications, such as battery management in electric vehicles and congestion avoidance, rely on the accurate prediction of the future destinations of a vehicle. Destination prediction methods can utilise mobility patterns and can harness the latent information within vehicle trajectories. Existing approaches make use of the spatial information contained within trajectories, but this can be insufficient to achieve an accurate prediction at the start of an unfolding trajectory, since several destinations may share a common start to their trajectories. To reduce the prediction error in the early stages of a journey, we propose the Destination Prediction by Trajectory Subclustering (DPTS) method for iteratively clustering similar trajectories into groups using additional information contained within trajectories, such as temporal data. We show in our evaluation that DPTS is able to reduce the mean distance error in the first 40–60% of journeys. The implication of reducing the distance error early in a journey is that location-aware applications could provide more accurate functionality earlier in a journey. In this article, we (i) propose DPTS, which extends an existing destination prediction method with an iterative clustering stage that decomposes groups of similar trajectories into smaller groups, and (ii) evaluate DPTS against the baseline performance of the existing method.
1. Introduction
Intelligent transportation systems can assist drivers and can benefit from having an accurate prediction of the destination in advance. This is a key motivation for research into methods for robust destination prediction, which utilises patterns learnt from daily life. Destination prediction is also linked with traffic assessment, where intelligent vehicles or roadside units report the amount of congestion [1, 2], and with traffic flow efficiency, where cooperative routing occurs to minimise the congestion encountered en route [3–5]. Techniques in these areas can have mutual benefit when used together, such as to facilitate real-time rerouting [6, 7]. Besse et al. proposed a method for destination prediction [8], which we refer to as BDP (denoting Besse et al.’s destination prediction method), that uses trajectory similarity classification. This technique tries to predict which group of trajectories an unfolding trajectory is most likely to match. To calculate the trajectory groupings, Besse et al. use hierarchical agglomerative clustering with the symmetrized segment path distance (SSPD), an instance-based trajectory distance metric [9], and they model each group with a Gaussian mixture model (GMM). Their method uses either a simple unweighted score based on the GMM likelihood, or a weighted score that uses auxiliary variables (such as the hour-of-day and the day-of-week) and weighting functions to modify the score of each cluster. In this article, we opt for using the unweighted BDP method as a baseline, so that the GMM likelihood score can be used directly without needing to define a weighting function for each auxiliary variable. The unweighted BDP method suffers from poor performance at the start of a journey, where limited spatial information is available.
When only a small proportion of a journey has been completed, spatial information is available only for the completed section; since multiple journeys may originate at a single location and share an initial route, this makes it difficult to distinguish the destination early on. Other destination prediction methods exist in the literature, but they either use external information from outside the vehicle, such as ground cover data or road type information, to improve predictive performance [10, 11], require knowledge of the identity of the driver [12, 13], or use a complex representation of the road network [12–14]. In this article, we assume that such information is not available and that the identity of the driver is unknown.
In this article, we (i) propose the Destination Prediction by Trajectory Subclustering (DPTS) method, which extends BDP [8] by using additional data and an iterative subclustering approach to decompose trajectory clusters into more specific groupings, and (ii) evaluate DPTS against the baseline performance of BDP (with the unweighted score). A key difference between DPTS and BDP is the use of iterative subclustering, which can take multiple metrics, each with its own parameter, and perform several iterations of clustering. In contrast, BDP uses a single iteration of clustering with the SSPD metric alone. DPTS can be easily extended by adding further metrics and iterations to the clustering process, and by varying the order of iterations.
This article is organised as follows. Section 2 reviews the related work, Section 3 presents the DPTS method, and Section 4 introduces our experimental methodology and the datasets used for evaluation. Section 5 presents the results of applying DPTS to vehicle trajectories and compares the performance to that achieved by the baseline unweighted BDP. Finally, Section 6 outlines future work and concludes the article.
2. Related Work
Human mobility is a broadening field of research [15–18], which provides useful analysis that can influence areas such as urban transportation planning. Understanding human mobility patterns can have a significant impact on numerous applications, including destination prediction. Schneider et al. note that the average person only visits a small number of locations on a daily basis and that 90% of the population visit fewer than 7 distinct locations per day, according to the surveys analysed in their work [18]. Patterns in human mobility can be modelled as human mobility motifs, which are abstractions of activity patterns, such as a home-to-work-based tour [18–22]. By analysing mobile phone data, Schneider et al. identified 17 daily motifs, which account for 90% of the recorded trips in their data [18]. Büscher et al. investigated the stability of the most common motifs over time [21], and Li et al. investigated infrequent motif detection [23]. Travel diaries, and the trends or behaviours learnt from analysing them, have also been widely researched [24–27]. Multiple studies claim that there are common days for various activities or tasks [24, 26, 28], with an increased stability of people’s travel behaviours on work days [25]. Seasonal, geographical, economic, and cultural factors all have an impact on people’s activity patterns [26–28], in addition to access to and availability of public transport and personal vehicles [29, 30].
Destination prediction has been the subject of much research, with recent work using historical GPS trajectories in order to predict an individual’s next location or final destination. Markov models are widely used for destination prediction [31–33], with some methods considering multiple transport modes [13, 34–36], while others focus on vehicle trajectories [12, 37]. Other approaches include Bayesian inference [10, 11, 14, 38] to predict the intended destination of an individual, Gaussian mixture models [8, 39], decision tree learning [35, 40, 41], and support vector machines [6]. Existing research into destination prediction has shown that consideration of temporal aspects, such as the day-of-week or the hour-of-day, can improve predictive performance [10, 12]. Research into predicting an individual’s next location is similar to destination prediction, but focusses on predicting the next intermediate location, rather than the final destination. Ziebart et al. provide a good example of next-location prediction, where they predict individual turns along a trajectory [11].
Several destination prediction methods use a grid-based approach [10, 11, 37] or have a graph representation of the road network [12, 14, 42]. However, this is more computationally expensive than solely using the raw GPS instances and often requires external data, such as GIS data, to enable preprocessing such as map matching [42]. Clustering is frequently used as an initial step in destination prediction, translating stay points, which are instances of low mobility, into places [8, 13, 43, 44].
There has been some research that attempts to address the data sparsity problem. For example, Xue et al. propose a method called Subtrajectory Synthesis (SubSyn) that decomposes trajectories into smaller segments and connects these to adjoining segments, creating synthetic trajectories [37]. This greatly increases the number of possible trajectories that can be modelled from an input dataset, since it is rare to have an exhaustive set of input trajectories available. The available data can also be increased by considering multiple individuals and how individuals complement each other, with similar trajectories increasing the value of the training data [45].
Krumm and Horvitz [10] and Xue et al. [37] both use a 1 km grid-based approach, and Ziebart et al. evaluate their PROCAB algorithm on multiple grid sizes [11]. A coarse grid improves the matching performance, but also increases the destination prediction error, since the grid squares span a larger area. The opposite is seen with a fine grid, and therefore an appropriate grid size should be selected to achieve an acceptable trade-off between trajectory matching performance and destination prediction error. A poor choice of grid size may cause separate destinations to be grouped together. The grid representation is extended in work by Chen et al., in which adjacent grid cells with similar routes are merged [28, 46]. The WhereNext algorithm uses a similar approach, in which Monreale et al. propose T-patterns, which are sequences of regions [41].
Alternatives to grid-based approaches include map matching or generating local graph representations of the road network [12, 14, 42]. Simmons et al. use a mapping database to provide a road graph, in which link-goal pairs can be formed [12]. Their model can predict the next link and subsequently infer the final destination [12]. Karimi and Liu construct a tree-based structure using additional road-network data, alongside amenity information [42]. Similarly, Patterson et al. propose a graph-based representation, which is constructed from a street map provided by the US Census Bureau [14].
Clustering techniques are used in many approaches as a means of extracting locations, with k-means [13, 32], hierarchical [8], and density-based [43] clustering being used. Choi and Hebert adopt the k-means approach to cluster trajectory segments [32], similar to Ashbrook and Starner, who map significant locations into clusters [13]. Ashbrook and Starner use a graph comparing the number of clusters to the number of locations, locating the knee point in order to select a suitable number of clusters [13, 47]. This has proven to be a popular method, and similarities are seen in several related approaches [10, 12, 14, 34, 38, 48]. Cho extracts intermediate instances by using a more computationally expensive Gaussian-means approach [35]. Conversely, Gambs et al. use a density-based approach to generate the corresponding locations from their input data [36].
To train methods for destination prediction, the first step is to separate the training data into distinct trips or trajectories, such that the first instance of a trajectory is the start location and the final instance is the destination. Time thresholding is often used for this task [10, 13, 34], where a new trip is started whenever the time between two consecutive recorded instances reaches the threshold duration (instances are not recorded while an individual stays in the same place). Chen et al. [40] use a threshold of 2 minutes, Krumm and Horvitz [10] and Alvarez-Garcia et al. [34] use a threshold of 5 minutes, while Ashbrook and Starner opt for a 10-minute threshold [13]. The threshold value used varies, implying that it is non-trivial to find a suitable value. In the dataset collected and used in this article, we avoid the need to use time thresholding, since trips are naturally segmented when the vehicle from which trajectories are collected is powered down.
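Time thresholding, as used in the cited works (but not needed by DPTS on our own dataset), can be sketched as below. This is a minimal illustration under stated assumptions: each instance is a hypothetical `(latitude, longitude, unix_timestamp_seconds)` tuple, and a gap greater than or equal to the threshold starts a new trip; the function name `split_by_time_gap` is ours.

```python
from typing import List, Sequence, Tuple

# Hypothetical instance layout: (latitude, longitude, unix_timestamp_seconds).
Instance = Tuple[float, float, float]


def split_by_time_gap(instances: Sequence[Instance], threshold_s: float = 300.0) -> List[List[Instance]]:
    """Split a stream of instances into trips wherever the time gap reaches threshold_s.

    The default of 300 s mirrors the 5-minute threshold reported for some cited works.
    """
    trips: List[List[Instance]] = []
    current: List[Instance] = []
    prev_t = None
    for inst in sorted(instances, key=lambda i: i[2]):  # order by timestamp
        if prev_t is not None and inst[2] - prev_t >= threshold_s:
            trips.append(current)  # gap is long enough: close the current trip
            current = []
        current.append(inst)
        prev_t = inst[2]
    if current:
        trips.append(current)
    return trips
```

Changing `threshold_s` to 120 or 600 reproduces the 2-minute and 10-minute variants; the sensitivity of the resulting trips to this value is exactly why choosing it is non-trivial.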
Approaches to destination prediction also have varying input data, with some only using spatial information from within trajectories [8], while others use multiple external sources [10, 42]. For example, Krumm and Horvitz use ground cover data [10], vehicle speed is used by Fukano et al. [48], Karimi and Liu use data on local amenities [42], and others use temporal data [6, 10, 12]. Temporal data, such as the day-of-week or the hour-of-day, can act as an indicator of the next location and have been shown to improve predictive performance [12]. In this article, we aim to avoid dependency on external data and, since temporal data are implicitly available within a trajectory record, our approach focuses on the use of data that are naturally contained within trajectory data.
Besse et al. propose a destination prediction method (which we refer to as BDP) that uses distribution-based models to match similar trajectories [8]. The training trajectories are grouped using hierarchical agglomerative clustering, with the distance between trajectories computed by the symmetrized segment path distance (SSPD) [9]. Using the clustered trajectories, Besse et al. train 2D Gaussian mixture models over each cluster, using the latitude and longitude to fit a distribution to a sample of training coordinates. Once these models are trained, a likelihood can be assigned to each cluster for an unfolding trajectory, and its destination is predicted using the centroid of the most likely cluster. Besse et al. also propose using a weighted score, which uses auxiliary variables, such as the hour-of-day, with each variable associated with a weighting function to modify the GMM likelihood. Our proposed method avoids the need for defining such weighting functions and is easily extensible in terms of adding further variables. Our method also results in smaller clusters that naturally take the auxiliary variables into account, which can be beneficial for interpreting predictions. Since our focus is on identifying suitable clusters from which to make predictions, we evaluate DPTS against BDP with the unweighted simple score. BDP has several benefits over other methods: it does not rely on external data [10, 42], it does not require a mapping of the road network, which is computationally expensive to process [12, 14], and it does not discretise the space into a grid representation [10, 11, 37]. While Besse et al. propose the use of auxiliary variables, these variables do not segregate the trajectories into more specific clusters, unlike the proposed DPTS method. There is also a gap in the literature for methods that cluster trajectories into more specific groupings using criteria such as spatiotemporal attributes.
Although extensibility is not the main aim of this article, DPTS is extensible by design and allows multiple attributes to be used to narrow down specific trajectory groupings.
3. Destination Prediction by Trajectory Subclustering (DPTS)
The motivation behind Destination Prediction by Trajectory Subclustering (DPTS) is to reduce the distance error in destination prediction using vehicle data, specifically when making predictions in the early stages of a journey. We define the distance error as the Haversine (or spherical) distance between the actual and predicted destination. Reducing the distance error improves confidence in the correctness of the predicted destination, which can in turn improve location-aware applications, such as recommendations for which routes to avoid [3, 4] and locations of electric vehicle charging points [49, 50]. The notation used in this article and the functions used when defining DPTS are stated in Tables 1 and 2, respectively. In this article, we define a trajectory as a strictly ordered sequence of instances, in which an instance is a latitude, longitude, and timestamp. We hypothesize that the distance error in prediction can be reduced by using the additional data that are contained within the trajectories, such as temporal data or vehicle sensor data, including the vehicle speed and door status. Using these additional data, we can group trajectories into more specific clusters than those of BDP, enabling us to (i) lower the average distance between the trajectories within a cluster (since destination prediction uses cluster centroids, a lower average distance has the potential to reduce the average error) and (ii) improve the prediction of which cluster an unfolding trajectory belongs to. In this article, we focus on using temporal data from within the trajectories. However, our method is data-agnostic and can be used with different input data depending on the application and the data available. For example, the number of passengers in a vehicle (obtained from seatbelt status data) could be used as an input to help separate and predict trajectory clusters.
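The distance error defined above can be computed with the standard Haversine formula. The sketch below is illustrative: the mean Earth radius of 6371 km is our assumption (the radius used in the evaluation is not stated here), and coordinates are `(latitude, longitude)` pairs in degrees.

```python
import math

# Assumed mean Earth radius; the article does not state which radius is used.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (Haversine) distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a slightly exceeding 1 through floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_error_km(actual: tuple, predicted: tuple) -> float:
    """Distance error between the actual and predicted destinations, each (lat, lon)."""
    return haversine_km(actual[0], actual[1], predicted[0], predicted[1])
```

For example, one degree of latitude along a meridian gives a distance error of about 111.19 km, and identical points give 0.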
We evaluate DPTS using the temporal properties of trajectories to decompose the clusters, in addition to the spatial information, using data that are implicitly available in the time signal associated with each instance in a trajectory. The evaluation in this article assumes that the raw time signal in a trajectory can be translated into a suitable format, for example, from a Unix timestamp to a date and time.
3.1. Overview and Definitions
DPTS begins by performing an initial clustering of the trajectories, akin to that in BDP. The trajectories are clustered using hierarchical agglomerative clustering, using pairwise dissimilarity matrices. For spatial dissimilarity, we adopt the approach taken by BDP, which uses the symmetrized segment path distance (SSPD) to generate the dissimilarity matrices [9]. SSPD builds on the segment path distance from one trajectory to another, which is calculated as the mean of the distances from each point composing the first trajectory to the second trajectory [9]. SSPD is then the mean of the segment path distance from the first trajectory to the second and the segment path distance from the second trajectory to the first, which makes it symmetric. For temporal dissimilarity, we focus on two properties: the day-of-week and the hour-of-day. These temporal properties are only considered for the first instance in a trajectory, unlike the spatial dissimilarity, which considers each instance within a trajectory. This is done to minimise the required computation, since the start instance is a key temporal indicator. We define two functions that, given an input trajectory, convert the time of its first instance into encoded values for the day-of-week and hour-of-day, respectively. Since our approach uses hierarchical agglomerative clustering, dissimilarity matrices are required for both the day-of-week and hour-of-day. To create these dissimilarity matrices, we use the following definitions of how the differences in the day-of-week and hour-of-day are calculated.
Definition 1. The difference in day-of-week between two trajectories is defined as
Definition 2. The difference in hour-of-day between two trajectories is defined as
Figure 1 gives a high-level overview of the proposed DPTS methodology, showing how clusters are generated and highlighting the differences between DPTS and BDP (using the unweighted score). In particular, BDP clusters the input trajectories on spatial distance using SSPD, whereas DPTS uses an iterative process in which clustering occurs according to the rows of a parameter matrix.
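The spatial and temporal dissimilarities described above can be sketched as follows. This is an illustrative sketch, not the article's implementation. SSPD follows the description above (mean point-to-trajectory distance in each direction, then the average of the two), but it treats coordinates as planar, which is our simplification. Because the exact forms of Equations (1) and (2) are not reproduced in this section, the temporal differences here assume a cyclic (wrap-around) difference over 7 days and 24 hours; all function names are hypothetical.

```python
import numpy as np


def point_to_segment(p: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance from point p to the segment [a, b] (planar assumption)."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:  # degenerate segment made of a repeated point
        return float(np.linalg.norm(p - a))
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)  # projection clamped to the segment
    return float(np.linalg.norm(p - (a + t * ab)))


def point_to_trajectory(p: np.ndarray, traj: np.ndarray) -> float:
    """Distance from p to the closest segment of a trajectory."""
    if len(traj) == 1:
        return float(np.linalg.norm(p - traj[0]))
    return min(point_to_segment(p, traj[i], traj[i + 1]) for i in range(len(traj) - 1))


def spd(t1, t2) -> float:
    """Segment path distance from t1 to t2: mean distance of t1's points to t2."""
    t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
    return float(np.mean([point_to_trajectory(p, t2) for p in t1]))


def sspd(t1, t2) -> float:
    """Symmetrized segment path distance: mean of the two directed SPDs."""
    return (spd(t1, t2) + spd(t2, t1)) / 2.0


def cyclic_difference(x: int, y: int, period: int) -> int:
    """Assumed wrap-around difference, e.g. Sunday (6) and Monday (0) differ by 1."""
    d = abs(x - y) % period
    return min(d, period - d)


def dow_difference(dow_a: int, dow_b: int) -> int:
    return cyclic_difference(dow_a, dow_b, 7)


def hod_difference(hod_a: int, hod_b: int) -> int:
    return cyclic_difference(hod_a, hod_b, 24)


def dissimilarity_matrix(items, dist) -> np.ndarray:
    """Symmetric pairwise dissimilarity matrix for hierarchical clustering."""
    n = len(items)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = dist(items[i], items[j])
    return d
```

For instance, `sspd([(0, 0), (1, 0)], [(0, 1), (1, 1)])` returns 1.0 for two parallel segments one unit apart, and `hod_difference(23, 1)` returns 2 under the wrap-around assumption.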
Definition 3. A DPTS parameter matrix is a sparse matrix in which each row corresponds to a single clustering iteration, each column corresponds to a clustering attribute, and each entry corresponds to the parameter used.
We define a DPTS parameter matrix as a sparse matrix in which the column headings correspond to the available attributes on which to cluster, and the rows indicate which attribute is used for clustering in a given iteration and the parameter value to be used (see Definition 3). An attribute comprises two parts, namely, the signal that is used, such as SSPD, and the measure to be used, such as the maximum cluster criterion. Each row corresponds to a single iteration of the hierarchical clustering process (ordered from the first row to the last), such that a row contains at most a single non-null entry, denoting the parameter value to be used in that iteration of hierarchical clustering with the attribute corresponding to the entry's column. The number of columns corresponds to the number of clustering attributes considered, and the number of rows corresponds to the number of iterations plus one null row. The final row contains only null entries and is interpreted as the termination criterion for clustering. We define two functions to access entries in a parameter matrix. Both take a single row of the matrix as input and identify its non-null column: the first returns the distance function corresponding to this column, and the second returns the entry in that column. Both functions are undefined for a row containing only null entries.
In our evaluation, discussed later in Section 5, we consider three different signals for clustering, namely, SSPD, the difference in day-of-week (using Equation (1)), and the difference in hour-of-day (using Equation (2)). We use the maximum cluster criterion as the measure for the SSPD signal (adopted from BDP [8]) and the distance criterion as the measure for the differences in day-of-week and hour-of-day. These attributes are denoted msspd, ddow, and dhod, respectively. In this article, our evaluation uses each attribute at most once, meaning that at most 3 clustering iterations are performed. An example parameter matrix is shown in Example 1, which causes DPTS to perform 3 iterations of clustering. The first iteration uses SSPD with a clustering parameter of 25, followed by the hour-of-day with a parameter value of 6, and the final iteration uses the day-of-week with a parameter value of 2. An illustration of how the clustering performed in BDP can be represented using a DPTS parameter matrix is shown in Example 2. Since BDP only uses SSPD for clustering with a single clustering iteration, the parameter matrix has only a single non-null entry, in the top-left cell.
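Example 1. An example parameter matrix for DPTS, with columns msspd, ddow, and dhod (a dash denotes a null entry):

$$
\begin{array}{ccc}
\textit{msspd} & \textit{ddow} & \textit{dhod} \\
\hline
25 & - & - \\
- & - & 6 \\
- & 2 & - \\
- & - & -
\end{array}
$$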
Example 2. A representation of example BDP algorithm parameters in the form of a DPTS parameter matrix:
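A parameter matrix such as the one in Example 1 can be encoded as a list of rows, with the two accessor functions reading the single non-null entry of a row. This is a sketch under stated assumptions: the column order (msspd, ddow, dhod), the use of `None` for null entries, the mapping of each attribute to a signal and a SciPy-style `fcluster` criterion name, and the names `row_attribute`, `row_parameter`, `is_terminal`, and `iterations` are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical encoding: attribute -> (signal, clustering criterion).
ATTRIBUTES = {
    "msspd": ("sspd", "maxclust"),      # SSPD with the maximum cluster criterion
    "ddow": ("day_of_week", "distance"),  # day-of-week difference, distance criterion
    "dhod": ("hour_of_day", "distance"),  # hour-of-day difference, distance criterion
}
COLUMNS = list(ATTRIBUTES)

Row = List[Optional[float]]

# Example 1: SSPD (25), then hour-of-day (6), then day-of-week (2), then a null row.
EXAMPLE_1: List[Row] = [
    [25, None, None],
    [None, None, 6],
    [None, 2, None],
    [None, None, None],  # terminating null row
]


def is_terminal(row: Row) -> bool:
    """A row of only null entries terminates clustering."""
    return all(v is None for v in row)


def _non_null_column(row: Row) -> int:
    cols = [j for j, v in enumerate(row) if v is not None]
    if len(cols) != 1:  # accessors are undefined for null rows
        raise ValueError("row must contain exactly one non-null entry")
    return cols[0]


def row_attribute(row: Row) -> str:
    """Attribute (and hence distance function) of the row's non-null column."""
    return COLUMNS[_non_null_column(row)]


def row_parameter(row: Row) -> float:
    """Clustering parameter stored in the row's non-null column."""
    return row[_non_null_column(row)]


def iterations(matrix: List[Row]) -> Iterator[Tuple[str, float]]:
    """Yield (attribute, parameter) pairs until the terminating null row."""
    for row in matrix:
        if is_terminal(row):
            break
        yield row_attribute(row), row_parameter(row)
```

Iterating over `EXAMPLE_1` yields `("msspd", 25)`, `("dhod", 6)`, and `("ddow", 2)` in that order, matching the three clustering iterations described for Example 1.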

3.2. The Training Stage of DPTS
Algorithm 1 details the approach used to generate the clusters. Given a set of input trajectories and a parameter matrix, the algorithm starts by selecting the distance function and hierarchical clustering parameter from the first row of the parameter matrix. The distance function is then used to compute a dissimilarity matrix over the trajectories. Hierarchical agglomerative clustering is then performed using the dissimilarity matrix and the clustering parameter to generate a set of clusters. For example, using the parameter matrix from Example 1, the initial dissimilarity matrix would be computed using the SSPD distance function, and the subsequent hierarchical agglomerative clustering would generate up to 25 clusters. For each further iteration of clustering, represented by the subsequent rows of the parameter matrix, a dissimilarity matrix is computed over the trajectories in each cluster of the current set of clusters. Each of these dissimilarity matrices is used to split its cluster into a set of subclusters, and each new set of subclusters is appended to the set of clusters used in the following iteration. This process is repeated for each of the clustering iterations specified in the parameter matrix, replacing the current set of clusters at the end of each iteration with the newly calculated clusters. Once the current row of the parameter matrix contains only null entries, the algorithm terminates and returns the clusters resulting from the final iteration of clustering.
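The iterative subclustering described above can be sketched with SciPy's hierarchical clustering. This is a sketch, not the article's code: the linkage method is not stated here, so `average` linkage is assumed; the maximum cluster and distance criteria are mapped to SciPy's `maxclust` and `distance` criteria; and clusters with fewer than two trajectories are passed through unchanged, which is our own choice. The names `hac_labels` and `iterative_subclustering` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def hac_labels(dissim: np.ndarray, param: float, criterion: str, method: str = "average") -> np.ndarray:
    """Flat cluster labels from a square dissimilarity matrix (linkage method assumed)."""
    condensed = squareform(dissim, checks=False)  # SciPy's linkage expects condensed form
    z = linkage(condensed, method=method)
    return fcluster(z, t=param, criterion=criterion)


def iterative_subclustering(trajectories, iterations, distance_functions):
    """Sketch of Algorithm 1.

    iterations: sequence of (attribute, parameter, criterion), one per non-null row.
    distance_functions: attribute -> callable(traj_a, traj_b) -> float.
    """
    clusters = [list(trajectories)]  # the first iteration clusters all trajectories
    for attribute, param, criterion in iterations:
        dist = distance_functions[attribute]
        next_clusters = []
        for cluster in clusters:
            if len(cluster) < 2:  # nothing to split (assumed behaviour)
                next_clusters.append(cluster)
                continue
            n = len(cluster)
            d = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    d[i, j] = d[j, i] = dist(cluster[i], cluster[j])
            labels = hac_labels(d, param, criterion)
            # Append each subcluster of this cluster for use in the next iteration.
            for label in np.unique(labels):
                next_clusters.append([t for t, l in zip(cluster, labels) if l == label])
        clusters = next_clusters  # replace the current clusters with the new ones
    return clusters
```

As a toy usage, with scalar "trajectories" `[0.0, 0.1, 5.0, 5.1, 10.0]`, absolute difference as the distance, a first iteration with `maxclust` 2, and a second with `distance` 1.0, the result is the three groups `[0.0, 0.1]`, `[5.0, 5.1]`, and `[10.0]`.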

A set of GMMs is trained on the resulting clusters, as specified in Algorithm 2. In DPTS, a feature vector is used to define the features for training the GMMs. In this article, we consider the latitude, longitude, encoded day-of-week, and encoded hour-of-day. In BDP, the latitude and longitude of an instance are the only features used in the GMM. When using a weighted score, Besse et al. use additional variables, such as the encoded day-of-week and encoded hour-of-day, together with weighting functions to modify the likelihood score, but these variables are not used to subdivide clusters of trajectories. Our approach can also be extended to include additional data, for example, the vehicle signals that are included in each instance. For each cluster, all instances from the trajectories within the cluster are extracted, containing the features in the provided feature vector. A sample of these instances is selected uniformly at random and without replacement according to a parameter that controls the maximum number of instances to select; if the cluster contains fewer instances than this maximum, then all instances are selected. GMMs are built starting with a single component, up to the minimum of the maximum number of components (a second parameter) and the number of instances, in increments of 1. Each GMM, trained on the selected instances with an increasing number of components, is evaluated using the Bayesian information criterion (BIC) [51]; if it has a lower BIC than the best BIC observed so far, then the best GMM and its BIC are updated with the current values. This is repeated for every cluster in the set of clusters output from the clustering stage, and the trained GMMs are returned as a set.
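The per-cluster GMM training with BIC-based model selection can be sketched with scikit-learn's `GaussianMixture`. This is an illustrative sketch under assumptions: scikit-learn's default full covariance and initialisation are used, the random seed is ours, and the defaults of 20 components and 10,000 instances follow our reading of the values given in Section 4. The names `train_cluster_gmm` and `train_gmms` are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_cluster_gmm(instances, max_components: int = 20, max_instances: int = 10_000, seed: int = 0):
    """Fit GMMs with 1..min(max_components, n) components and keep the lowest-BIC model.

    instances: array-like of shape (n_instances, n_features), e.g. latitude and longitude.
    """
    X = np.asarray(instances, dtype=float)
    if len(X) < 2:
        raise ValueError("scikit-learn needs at least 2 instances to fit a GMM")
    rng = np.random.default_rng(seed)
    if len(X) > max_instances:
        # Uniform sample without replacement, capped at max_instances.
        X = X[rng.choice(len(X), size=max_instances, replace=False)]
    best_gmm, best_bic = None, np.inf
    for k in range(1, min(max_components, len(X)) + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:  # keep the model with the lowest BIC seen so far
            best_gmm, best_bic = gmm, bic
    return best_gmm


def train_gmms(clusters_of_instances, **kwargs):
    """One GMM per cluster; each element holds the instances of one cluster."""
    return [train_cluster_gmm(inst, **kwargs) for inst in clusters_of_instances]
```

On two well-separated blobs of points, for example, the BIC selects at least two components, since a single Gaussian fits such data poorly.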

The overall training stage of DPTS is detailed in Algorithm 3, in which the GMMs are trained. This method takes six parameters: (i) a set of training trajectories, (ii) a parameter matrix for DPTS, (iii) a parameter matrix for BDP, (iv) the maximum number of components to consider for each GMM, (v) the maximum number of instances to select when training a GMM, and (vi) the set of features to use to train the GMMs. The training stage returns two sets of GMMs, containing the trained GMMs for each cluster produced with the BDP parameter matrix and with the DPTS parameter matrix, respectively.
The training stage first clusters all training trajectories using the parameters defined in the BDP parameter matrix and trains a GMM for each resulting cluster using only the latitude and longitude as the feature vector (see Algorithm 2). This is equivalent to performing BDP (with the unweighted score) on the input trajectories. We perform this step to allow DPTS to revert to the prediction made by BDP should its expected performance be better. DPTS then generates a set of clusters for all training trajectories using the DPTS parameter matrix (see Algorithm 1), and the GMMs for these clusters are trained with Algorithm 2 using the feature vector input to the algorithm. Once the GMMs have been trained, the training stage of DPTS is complete and returns two sets of GMMs, namely, one trained using the parameters in the BDP parameter matrix and one trained using the parameters in the DPTS parameter matrix.
3.3. Trajectory Prediction
Algorithm 4 defines the process of predicting the cluster to which an unfolding trajectory belongs. The log-likelihood under each GMM is calculated for each instance within the trajectory and is used to score the GMMs. This algorithm can be run with either the BDP or the DPTS set of GMMs to obtain the respective predictions. For each instance, the log-likelihoods are translated into probabilities over the clusters using the softmax function. The prediction algorithm iterates through each instance in the trajectory, maintaining, for each GMM, a sum of the log-likelihood and of the probability over all instances. The cluster is predicted once the final instance in the trajectory has been processed, at which point the summed probability is averaged over the instances. As the algorithm iterates through each GMM, the total log-likelihood is compared to the best seen so far, and the predicted cluster and its probability are updated if the total exceeds the previous best. The method returns the predicted cluster and its probability.
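The cluster prediction step can be sketched in vectorised form. This sketch assumes each trained GMM exposes a scikit-learn-style `score_samples` method returning per-instance log-likelihoods; the softmax is taken across clusters for each instance and computed in log space for numerical stability, which is our implementation choice. The function name `predict_cluster` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def predict_cluster(trajectory, gmms):
    """Return (index of predicted cluster, its softmax probability averaged over instances).

    trajectory: array-like of shape (n_instances, n_features) for the unfolding trajectory.
    gmms: sequence of fitted models with a score_samples(X) method (scikit-learn style).
    """
    X = np.asarray(trajectory, dtype=float)
    # Log-likelihood of every instance under every cluster's GMM: (n_instances, n_clusters).
    loglik = np.column_stack([g.score_samples(X) for g in gmms])
    # Softmax across clusters for each instance, computed stably in log space.
    probs = np.exp(loglik - logsumexp(loglik, axis=1, keepdims=True))
    total_loglik = loglik.sum(axis=0)  # summed over all instances so far
    mean_prob = probs.mean(axis=0)     # probability averaged over instances
    best = int(np.argmax(total_loglik))  # cluster with the highest total log-likelihood
    return best, float(mean_prob[best])
```

The same function would be called once with the BDP models and once with the DPTS models, giving the two predictions that the decision threshold then compares.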
In DPTS, we introduce the notion of a decision threshold, which is the value that the probability of the DPTS prediction must exceed for the DPTS prediction to be used. Failing to exceed the decision threshold causes DPTS to revert to the prediction made by BDP. We consider two modes of decision threshold, namely, a static and a dynamic mode, controlled by a Boolean flag. In the static mode, the prediction probability of DPTS is compared to a fixed, predefined decision threshold. In the dynamic mode, the DPTS prediction probability is compared to the prediction probability of BDP multiplied by the decision threshold parameter value. The decision threshold parameter value is therefore used to scale the BDP prediction probability, to increase or decrease the likelihood of the DPTS prediction exceeding the decision threshold. Such scaling is needed since the BDP prediction probability may be consistently higher than that of DPTS, because the BDP clusters are less specific. Algorithm 5 defines the method to check whether the decision threshold is exceeded. The algorithm returns true if the DPTS prediction has exceeded the decision threshold and will therefore be used for prediction.
The deployment stage of DPTS is illustrated in Figure 2 and detailed in Algorithm 6. This method takes five parameters: (i) the unfolding trajectory to predict, (ii) a set of GMMs trained using BDP, (iii) a set of GMMs trained using DPTS, (iv) a Boolean flag that indicates whether to use the dynamic or static mode for the decision threshold, and (v) the value to use within the decision threshold calculation. The algorithm begins with an input trajectory, for which cluster predictions and their corresponding probabilities are computed for both BDP and DPTS. Based on these probabilities, the decision threshold is evaluated; if it is exceeded, the DPTS prediction is used, and otherwise the algorithm reverts to the prediction made by BDP. The predicted destination itself is obtained by taking the centroid of the predicted cluster.
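The decision threshold check and the final choice between the BDP and DPTS predictions can be sketched as below. This is a sketch under assumptions: each prediction is a hypothetical `(cluster_index, probability)` pair from the cluster prediction step, cluster centroids are precomputed `(lat, lon)` pairs indexed by cluster, and "exceeds" is taken as a strict inequality. The names `exceeds_decision_threshold` and `predict_destination` are ours.

```python
from typing import Sequence, Tuple

LatLon = Tuple[float, float]
Prediction = Tuple[int, float]  # (cluster index, prediction probability)


def exceeds_decision_threshold(p_dpts: float, p_bdp: float, threshold: float, dynamic: bool) -> bool:
    """Static mode: p_dpts > threshold. Dynamic mode: p_dpts > threshold * p_bdp."""
    if dynamic:
        return p_dpts > threshold * p_bdp
    return p_dpts > threshold


def predict_destination(
    bdp_prediction: Prediction,
    dpts_prediction: Prediction,
    bdp_centroids: Sequence[LatLon],
    dpts_centroids: Sequence[LatLon],
    dynamic: bool,
    threshold: float,
) -> LatLon:
    """Use the DPTS cluster centroid if the threshold is exceeded, else revert to BDP."""
    bdp_cluster, p_bdp = bdp_prediction
    dpts_cluster, p_dpts = dpts_prediction
    if exceeds_decision_threshold(p_dpts, p_bdp, threshold, dynamic):
        return dpts_centroids[dpts_cluster]
    return bdp_centroids[bdp_cluster]
```

With a static threshold of 0, any positive DPTS probability passes, so the DPTS prediction is always used; in the dynamic mode with a threshold of 0.5, a DPTS probability of 0.3 against a BDP probability of 0.9 fails (0.3 is not greater than 0.45), and the BDP centroid is returned.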
4. Data and Experimental Methodology
In this article, we use three separate datasets to evaluate DPTS, two of which are those used by Besse et al. to evaluate BDP [8], on which DPTS is based. The first of these, the Caltrain dataset, contains 4,127 taxi trajectories originating from Caltrain Station, San Francisco [52]. The second, the Porto dataset, contains 19,423 taxi trajectories commencing from São Bento station, located in the centre of Porto [53]. The third dataset, named POL, is a pattern of life dataset, collected over a number of non-consecutive weeks for a single participant. Unlike the Caltrain and Porto datasets, the POL dataset does not have a single starting location for all trajectories, and so it allows us to evaluate the performance of DPTS when trajectories do not have a common starting location.

For all stages of the evaluation, unless explicitly stated otherwise, we explore in detail the effect of the parameters on the Caltrain dataset and report the best results for the Porto dataset. Due to the different nature of the POL dataset, we evaluate DPTS on the POL dataset separately in Section 5.5. Unless explicitly stated otherwise, our comparison against the baseline BDP method uses the unweighted score, rather than relying on auxiliary variables and weighting functions to modify the score since, as noted in Section 2, our focus is on identifying suitable clusters from which to make predictions. We comment on the effectiveness of our method on these datasets, noting the differences. In this article, we use a value of 10,000 for the maximum number of instances selected when training a GMM and 20 for the maximum number of GMM components, since these parameters are not the focus of our investigation and these values were used in the original evaluation of BDP, allowing for a direct comparison [8].
The first stage of our evaluation of DPTS investigates the order of clustering and the parameters for the day-of-week and hour-of-day clustering, to find the best-performing values for each. We evaluate all combinations of clustering with SSPD, day-of-week, and hour-of-day using two iterations. Within this parameter search, we use a decision threshold of 0 in the static mode, meaning that the DPTS prediction will always be used. Table 3 shows the set of parameters used in this evaluation. For the SSPD clustering, we use the parameter values from the work of Besse et al., which are 25 and 45 for the Caltrain and Porto datasets, respectively. We train the GMMs with four different feature vectors, resulting in 392 sets of results for each dataset. Evaluating the mean distance error for each parameter combination against the baseline performance of BDP, we discard from further evaluation those that are significantly outperformed by the baseline.


Once the parameter search is complete, the second stage evaluates the effect of our proposed decision thresholds on performance. We analyse the decision threshold in both static and dynamic modes and compare the results with both the baseline performance of BDP and the performance of DPTS with the decision threshold set to 0. For the static and dynamic modes of the decision threshold, we explore the parameter value in the range [0, 1] in increments of 0.05 and 0.1, respectively.
The third stage of our evaluation explores the impact of the SSPD clustering parameter. Our initial analysis uses the best-performing parameter for each dataset, as reported by Besse et al. [8], so in this stage we also investigate a range of values for the SSPD clustering parameter in increments of 5. Our stopping criterion is reached when the supplied parameter value causes an error because the number of clusters is too large, leaving too few trajectories in some clusters to train the GMMs properly.
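The sweep and its stopping rule can be sketched in Python as follows. This sketch assumes, as the stopping criterion suggests, that the SSPD parameter sets the number of clusters; `cluster_fn` is a hypothetical stand-in for the SSPD-based hierarchical clustering, and `min_size` is an assumed proxy for the minimum number of trajectories needed to train a GMM.

```python
from collections import Counter
from typing import Callable, List, Sequence


def sweep_cluster_counts(
    cluster_fn: Callable[[int], Sequence[int]],  # hypothetical: k -> one cluster label per trajectory
    start: int,
    step: int = 5,
    min_size: int = 2,  # assumed minimum number of trajectories to train a GMM
    max_k: int = 1000,
) -> List[int]:
    """Return the cluster counts start, start + step, ... that remain trainable.

    The sweep stops at the first k for which some cluster holds fewer than
    `min_size` trajectories (an assumed version of the stopping criterion).
    """
    usable = []
    k = start
    while k <= max_k:
        sizes = Counter(cluster_fn(k))
        if min(sizes.values()) < min_size:
            break  # too many clusters: at least one is too small to train on
        usable.append(k)
        k += step
    return usable


def toy_cluster_fn(k: int) -> List[int]:
    """Toy stand-in: 100 trajectories assigned round-robin to k clusters."""
    return [i % k for i in range(100)]


print(sweep_cluster_counts(toy_cluster_fn, start=5))
# [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
```

In the toy example, k = 55 is the first value that leaves some clusters with a single trajectory, so the sweep stops at 50.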
In the fourth stage, we add a third clustering iteration to DPTS so that SSPD, day-of-week, and hour-of-day are all used in a single configuration. We evaluate the ordering of the clustering iterations and compare the performance of three iterations with that of two iterations using the mean distance error.
In the final stage of our evaluation, we consider destination clustering, specifically on the POL dataset. This evaluation is notable because, unlike the other two datasets, the POL dataset has no single starting location for all journeys. To explore this aspect, we propose a fourth clustering approach, which groups trajectories by their destinations, using the Haversine distance between each pair of destinations to generate the dissimilarity matrix, .
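A minimal Python sketch of the destination dissimilarity matrix follows, assuming each destination is a (latitude, longitude) pair in degrees and using a mean Earth radius of 6,371 km; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def destination_dissimilarity(destinations):
    """Symmetric matrix of pairwise Haversine distances between trajectory end points."""
    n = len(destinations)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_m(destinations[i], destinations[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Only the upper triangle is computed and mirrored, since the Haversine distance is symmetric.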
5. Results
In this section, we discuss the results of applying DPTS to the Caltrain [52], Porto [53], and POL datasets. We evaluate the effect of the clustering parameters and analyse the impact of introducing a decision threshold, using the evaluation approach outlined in the previous section. Unless stated otherwise, the results presented in this section are based on the Caltrain dataset [52]. Due to its distinct nature, the POL dataset is evaluated separately in Section 5.5. Note that, for simplicity, figures with trajectory completion on the x-axis have an origin of 0%; however, the data points start from the first instance of the trajectory.
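One plausible way to obtain the observed part of a trajectory at a given completion level is sketched below. It assumes that completion is measured as a percentage of the recorded points (time or distance would be alternatives), and it always keeps at least the first point, consistent with the data points starting from the first instance of the trajectory. The function name is hypothetical.

```python
def prefix_at_completion(trajectory, percent):
    """Return the observed prefix of `trajectory` at `percent` (0-100) completion.

    Completion is assumed to be measured in recorded points. At least the first
    point is always kept, so 0% still corresponds to the first instance.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie in [0, 100]")
    # Integer floor division avoids float rounding (e.g. 0.3 * 10 != 3 exactly).
    n_points = max(1, int(percent * len(trajectory) // 100))
    return trajectory[:n_points]


# Prefixes at 5% completion steps for a toy 20-point trajectory.
toy = list(range(20))
prefixes = {p: prefix_at_completion(toy, p) for p in range(0, 101, 5)}
```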
5.1. Clustering Parameter Search
This section evaluates our novel iterative clustering approach and the impact of altering the parameters within the parameter matrix, . In this analysis, we discuss in detail the effect of altering the parameters on the Caltrain dataset and report only the best-performing parameters on the Porto dataset.
We first give an overview of the classification performance for each of the 6 parameter combinations outlined in Table 3. Note that there is a strong correlation between the features used in the GMM and the clustering criteria. For example, if the hour-of-day is used to cluster the trajectories but is not present in the feature vector provided to the GMM, the performance is severely degraded. The exception is that the features are always needed in the feature vector to achieve reasonable performance, even when SSPD is not used in the clustering stage. Figure 3 shows the classification performance of the top-performing parameters for each combination, together with the baseline performance.
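Figure 3: Classification performance of the top-performing parameters for each clustering combination, compared with the BDP baseline. (a) SSPD → day-of-week; (b) SSPD → hour-of-day; (c) day-of-week → SSPD; (d) hour-of-day → SSPD.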
Clustering with SSPD followed by the day-of-week achieves a peak performance of 85.90% at 95% trajectory completion. This was obtained by setting the clustering parameter for the day-of-week to , and using as our feature vector. If the day-of-week is omitted from the feature vector, the performance falls to a maximum of 14.73%. Interestingly, if the hour-of-day is also included, that is, , the performance drops notably, to a maximum of 42.52% at 85% trajectory completion. These results are shown in Figure 3(a).
Conversely, if we cluster using the day-of-week followed by SSPD (see Figure 3(c)), the clustering parameter, , makes no difference to the performance. Performance is slightly lower in the first 10% of trajectory completion, but after this point it exceeds that of SSPD followed by the day-of-week. The peak performance is 89.02%, achieved at 90% trajectory completion. As with SSPD followed by the day-of-week, omitting the day-of-week from the feature vector causes a noticeable drop in performance, as does adding the hour-of-day.
When clustering by the hour-of-day followed by SSPD, we observe a peak performance of 79.21% at 85% trajectory completion, as illustrated in Figure 3(d). As in the previous results, performance degrades noticeably when the hour-of-day is not used in the feature vector. If we reverse the order of clustering to SSPD followed by the hour-of-day, a peak performance of 75.58% is achieved at 90% trajectory completion (see Figure 3(b)). From these results, we can see that higher performance is achieved when the temporal component (day-of-week or hour-of-day) is clustered before the spatial component, SSPD.
If we cluster on both temporal components, the day-of-week and the hour-of-day, without SSPD, the classification performance is misleading. Both values are taken from the start of the trajectory and therefore remain constant as the trajectory unfolds, so the classifier receives no new information as the journey progresses. These combinations are therefore unsuitable, because of the little information they provide.
We take the best-performing parameters from each of the 6 clustering combinations, based on the classification percentage at 100% trajectory completion. The parameters and the feature vector used for each of the top combinations are shown in Table 4. The temporal-only combinations are included for reference, but their classification accuracy is misleading, as noted above. Figure 4 compares each of the top combinations from Table 4 against the baseline, BDP; because of their misleading performance, the temporal-only combinations are omitted from Figure 4. Most of the performance gain is seen in the first 40% of the unfolding trajectories, after which BDP starts to outperform the DPTS combinations.
If we take the predicted clusters and calculate the distance error from the prediction to the ground truth, we obtain the results shown in Figure 5. The first point to note is the pair of straight lines showing the prediction error of the two temporal-only combinations. This is expected, since the temporal values provided to the GMM do not change as the trajectory progresses, but it may not be immediately apparent why such high classification performance translates into a large prediction error. Table 4 shows that the temporal combinations have large average cluster distances. This explains the high distance error: even though the classification performance is good, the clusters are noticeably larger, so the centroid used for prediction is, on average, further from the actual destination. When clustering with SSPD and then the day-of-week, we see no improvement over the baseline. The other combinations, day-of-week to SSPD, hour-of-day to SSPD, and SSPD to hour-of-day, all reduce the distance error relative to the baseline between 20% and 60% of trajectory completion. After 70% of trajectory completion, the baseline performance is unbeaten. Given that clustering from SSPD to the day-of-week brings no improvement and that the temporal-only combinations have such large cluster distances, we omit these combinations from further evaluation.
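The link between cluster size and distance error can be illustrated with a toy Python example. It assumes planar (x, y) coordinates in metres (real trajectories would need a geodesic distance such as Haversine) and uses made-up points, not data from any of the three datasets. Both clusters contain the true destination, yet the spread-out cluster gives a much larger error because its centroid, which is used as the prediction, lies further from that destination.

```python
import math


def centroid(points):
    """Mean of a list of planar (x, y) points in metres."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def prediction_error(cluster_destinations, true_destination):
    """Distance (m) from the predicted cluster's destination centroid to the true destination."""
    cx, cy = centroid(cluster_destinations)
    tx, ty = true_destination
    return math.hypot(cx - tx, cy - ty)


truth = (0.0, 0.0)
tight = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]     # small cluster
spread = [(0.0, 0.0), (3000.0, 0.0), (0.0, 3000.0)]  # large cluster
print(round(prediction_error(tight, truth), 1))   # 47.1
print(round(prediction_error(spread, truth), 1))  # 1414.2
```

Correct classification into the spread-out cluster therefore still yields an error of over a kilometre, which mirrors the behaviour of the temporal-only combinations.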
When applying DPTS to the Porto dataset, the hour-of-day → SSPD combination gives the highest performance. Although there is a slight improvement in the middle of the trajectories, the overall performance is lower than that of BDP because of degraded performance at the start and end of the trajectories, following the trend seen with the Caltrain dataset. Overall, BDP outperforms DPTS on the Porto dataset by an average of 7 metres.
5.2. Evaluation of the Decision Threshold
Considering the results discussed in Section 5.1, the baseline outperforms DPTS in the final portion of the journey. To address this, we propose a decision threshold that combines our novel method, DPTS, and the existing method, BDP, within a single wrapper. The decision threshold selects which prediction to use at each stage of an unfolding trajectory, according to the prediction probability of DPTS, , and that of BDP, . As described in Section 3, we consider two modes for the decision threshold, namely, a static mode and a dynamic mode .
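The two modes can be sketched as simple selection rules. This is a minimal sketch under stated assumptions: the names `p_dpts`, `p_bdp`, `tau`, and `alpha` are hypothetical; the static rule keeps the DPTS prediction only when its probability strictly exceeds the threshold (consistent with a threshold of 0 always selecting DPTS and a threshold of 1 always reverting to BDP); and the dynamic rule compares the DPTS probability directly with the BDP probability scaled by the threshold parameter.

```python
from typing import TypeVar

P = TypeVar("P")  # type of a prediction, e.g. a destination centroid


def select_static(p_dpts: float, pred_dpts: P, pred_bdp: P, tau: float) -> P:
    """Static mode: keep the DPTS prediction only if its probability exceeds tau."""
    return pred_dpts if p_dpts > tau else pred_bdp


def select_dynamic(p_dpts: float, p_bdp: float, pred_dpts: P, pred_bdp: P, alpha: float) -> P:
    """Dynamic mode: keep DPTS only if its probability exceeds alpha times the BDP probability."""
    return pred_dpts if p_dpts > alpha * p_bdp else pred_bdp
```

With `tau = 0`, any positive DPTS probability passes; with `tau = 1`, the static rule always falls back to BDP, because a probability cannot exceed 1.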
First, we evaluate the effect of a decision threshold in the static mode, considering values in the range [0, 1] in increments of 0.05. The effect of the decision threshold, , on SSPD → hour-of-day is shown in Figure 6(a). A decision threshold of 0 in the static mode effectively removes BDP from consideration: any DPTS probability greater than 0 passes the threshold, so the result is identical to our original results. Conversely, a decision threshold of 1 always reverts to the baseline results of BDP. Setting a threshold of 0.05 improves the performance beyond 50% trajectory completion, with no apparent loss of performance before 50% completion. Increasing the decision threshold to 0.1 causes a loss of performance (compared with a decision threshold of 0) from 15% to 50% of trajectory completion, after which the performance improves. Further increasing the decision threshold to 0.15 leads to a more significant degradation in performance from 10% to 65% of trajectory completion, followed by a small improvement for the remainder of the journey. At this decision threshold, we also see a slight improvement in performance in the first 5% of trajectory completion compared with our original results. Any further increase in the decision threshold improves the first part of the journey (0–15% trajectory completion), degrades the middle of the journey (15–65% trajectory completion), and improves the final part of the journey (65–100% trajectory completion). Overall, in static mode, a decision threshold of 0.05 gives the best trade-off, resulting in the highest average performance for SSPD → hour-of-day.
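Choosing the threshold with the best trade-off amounts to a grid search scored by the mean distance error across completion levels. The Python sketch below assumes a hypothetical `errors_for` callable that returns the distance error at each completion level for a given threshold; it is an illustration, not part of DPTS or BDP, and the toy error values are made up.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def threshold_grid(step: float, upper: float = 1.0) -> List[float]:
    """Thresholds 0, step, 2*step, ..., upper, rounded to avoid float drift."""
    n_steps = round(upper / step)
    return [round(i * step, 10) for i in range(n_steps + 1)]


def best_threshold(
    errors_for: Callable[[float], Sequence[float]],  # hypothetical: tau -> error (m) per completion level
    grid: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, mean error) with the lowest mean error over all completion levels."""
    scored = [(tau, mean(errors_for(tau))) for tau in grid]
    return min(scored, key=lambda pair: pair[1])


def toy_errors(tau: float) -> List[float]:
    """Made-up errors at two completion levels, lowest when tau == 0.05."""
    return [100.0 + 1000.0 * abs(tau - 0.05), 200.0]


print(best_threshold(toy_errors, threshold_grid(0.05)))  # (0.05, 150.0)
```

Scoring by the mean error over all completion levels rewards a threshold that balances gains early in the journey against losses later on, rather than one that is best at a single completion level.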
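Figure 6: Effect of the decision threshold in static mode. (a) SSPD → hour-of-day; (b) day-of-week → SSPD; (c) hour-of-day → SSPD.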
Figure 6(b) illustrates the effect of the decision threshold in static mode on day-of-week → SSPD. We observe a similar trend to SSPD → hour-of-day, but note that the original result (with a decision threshold of 0) is closer to the baseline result in the final stage of the trajectories (65–100% trajectory completion). Adding a decision threshold therefore appears to have a smaller positive impact on this combination. Decision thresholds of 0.05 and 0.1 provide a good trade-off between performance in the middle and final parts of the journey. A decision threshold of 0.15 gives a greater loss of performance in the middle of the journey, similar to that observed for SSPD → hour-of-day. In the static mode, a decision threshold of 0.05 also gives the best trade-off for day-of-week → SSPD.
The hour-of-day → SSPD combination, shown in Figure 6(c), appears to give the best results of the three alternatives. Most notably, its performance in the early part of the journey (0–10% trajectory completion) is nearer to the baseline than that of the other two combinations. As with the other results, a decision threshold of 0.05 gives the best performance trade-off, with more apparent losses for decision threshold values of 0.15 and above. Across all three combinations, therefore, a decision threshold of 0.05 provides the best overall performance in static mode.
We now consider the decision threshold in dynamic mode , to investigate whether it outperforms the static mode . Figure 7 shows the performance when a decision threshold is used in dynamic mode. As explained in Section 3, in dynamic mode the probability of the DPTS prediction, , is compared directly with the probability of the baseline BDP prediction, . The decision threshold parameter value, , is used to scale the probability of the BDP prediction, . For our evaluation, we explored parameter values in the range [0, 1] in increments of 0.1. Overall, a decision threshold value of 0.7 for SSPD → hour-of-day and 0.4 for day-of-week → SSPD and hour-of-day → SSPD gave the best prediction performance. On average, using the decision threshold in dynamic mode gives a slight improvement in performance compared with the static mode. However, this gain is small in metres and appears to have little practical effect, although it may depend on the properties of the input dataset.
Figure 7: Performance with the decision threshold in dynamic mode, panels (a) to (c).
When evaluating the decision threshold on the Porto dataset, we see a slight improvement over the performance of BDP. A decision threshold in the dynamic mode with a parameter value of was used on the hour-of-day → SSPD combination, giving an average distance error 10 metres lower than that of BDP. Figure 8 compares the performance of BDP, DPTS , and DPTS .
5.3. Altering the SSPD Parameter Values
We investigated changing the clustering parameter, , for the highest-performing combinations. In the results discussed above, this was fixed at the values used by Besse et al. in their investigation [8]. Intuitively, lowering the parameter value in BDP should increase the trajectory classification accuracy but also increase the destination prediction error, since the destinations in these larger clusters will be more spread out. However, since DPTS performs iterative clustering, there may be benefits to lowering the value for SSPD.
Figure 9 compares BDP, DPTS (hour-of-day, SSPD, ), DPTS (SSPD, hour-of-day, ) with , and DPTS (SSPD, hour-of-day, ) with . It is immediately apparent that reducing provides a significant reduction in destination error in the first portion of the journey. After 60% of the trajectory is complete, however, the performance falls below that of BDP. When a decision threshold greater than 0 is introduced, the performance gains at the start of the journey are comparable, and the performance degradation past 65% trajectory completion is slightly reduced. Overall, the variant with and a decision threshold of provides the best average performance over the entire journey, with significant gains in the first 30–40% of the trajectory.
If we compare DPTS (SSPD, hour-of-day, ) with against the weighted version of BDP, we observe similar performance at the start of the journey. As the trajectory unfolds, the performance gap between DPTS and the weighted version of BDP widens, with the weighted BDP achieving up to 366 metres lower prediction error at some points.
When this approach was applied to the Porto dataset, no performance gains were observed, and the original clustering parameter for SSPD, , produced the highest performance.
5.4. Adding a Third Clustering Iteration
We now evaluate the performance of DPTS using three iterations of clustering and compare it with the performance of two iterations. The motivation for an additional iteration is to decompose the clusters further while trying to maintain high trajectory classification accuracy. The drawback of a third iteration is that it can generate a large number of clusters, each containing only a few trajectories. If the number of clusters increases too much, some clusters may contain only a single trajectory; such a cluster has too little training data to be useful for prediction.
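The decomposition, and the way an extra iteration can leave clusters too small to train on, can be sketched as repeatedly splitting each existing group. In this Python sketch the per-iteration hierarchical clustering is replaced by hypothetical key functions (for example, a coarse hour-of-day bucket), which is a simplification, and `min_size` is an assumed minimum number of trajectories for a trainable cluster.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple


def iterative_subcluster(
    items: Sequence,
    stages: Sequence[Callable[[object], Hashable]],  # hypothetical stand-ins for each clustering iteration
    min_size: int = 2,  # assumed minimum trajectories per trainable cluster
) -> Tuple[List[list], List[list]]:
    """Split every group by each stage in turn; return (trainable, too_small) clusters."""
    groups = [list(items)]
    for key_fn in stages:
        next_groups = []
        for group in groups:
            buckets = defaultdict(list)
            for item in group:
                buckets[key_fn(item)].append(item)
            next_groups.extend(buckets.values())  # each group is split, never merged
        groups = next_groups
    kept = [g for g in groups if len(g) >= min_size]
    too_small = [g for g in groups if len(g) < min_size]
    return kept, too_small


# Toy trajectories described only by (time bucket, day); values are illustrative.
toy = [("am", "mon"), ("am", "mon"), ("am", "tue"), ("pm", "mon"), ("pm", "mon"), ("pm", "mon")]
kept, too_small = iterative_subcluster(toy, [lambda t: t[0], lambda t: t[1]])
```

In the toy run, the second split isolates a single ("am", "tue") trajectory, which is exactly the kind of cluster that cannot support GMM training.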
When using three iterations of clustering, we find the best combination to be SSPD → hour-of-day → day-of-week. However, Figure 10 shows that the performance of this combination is not as high as that of two iterations with a reduced for SSPD (as discussed above). When we add a decision threshold in dynamic mode, the performance is degraded in the initial 40% of the trajectories but improves from 60% completion onwards, approaching that of BDP. Taking into account the average distance error throughout the trajectory, the extra computation required for the additional layer, and the increased number of GMMs required, we consider the earlier two-iteration combination to be the better variant.
5.5. Evaluating DPTS on the POL Dataset
Applying DPTS to the POL dataset provides insight into a more general application of the algorithm since, unlike the other datasets, the POL dataset contains trajectories with multiple starting locations. When applying BDP to the POL dataset, we notice an increase in distance error at around 50–80% trajectory completion. This is because, unlike in the taxi datasets, there is no single starting location, so we cannot assume a fixed direction of travel away from the source. To address this issue, we add another iteration of clustering, in which we generate a dissimilarity matrix of trajectories based on their destination locations, to be used as input to the hierarchical agglomerative clustering. For our evaluation, we use 2500 metres as the clustering parameter, , for this iteration of clustering. Further exploration of this value is outside the scope of this article and could be investigated in future work.
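Cutting the hierarchy at the 2500 metre parameter can be sketched as agglomerative clustering on a precomputed destination dissimilarity matrix. The linkage criterion is an assumption here: complete linkage is used, so every pair of destinations in a resulting cluster is at most the threshold apart. The naive search below is only suitable for small examples, and the function name is hypothetical.

```python
from typing import List, Sequence


def agglomerative_threshold(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Complete-linkage agglomerative clustering on a precomputed distance matrix.

    Repeatedly merges the two clusters with the smallest complete-linkage distance
    (the largest pairwise distance between their members) until that distance
    exceeds `threshold`. Returns clusters as sorted lists of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        link, a, b = best
        if link > threshold:
            break  # the closest pair of clusters is already too far apart
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Toy destinations on a line at 0, 1000, 2000, and 10000 metres.
positions = [0.0, 1000.0, 2000.0, 10000.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
print(agglomerative_threshold(toy_dist, 2500.0))  # [[0, 1, 2], [3]]
```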
Figure 11 shows the results of applying DPTS to the POL dataset, compared with the performance of BDP. When we apply DPTS using four iterations of clustering (hour-of-day, day-of-week, destination, SSPD, ), we see a significant improvement over BDP. This combination outperforms BDP over the entire trajectory, with an average reduction in error of over 1.2 km. The error decreases slightly as the trajectory unfolds, in contrast to the sudden rise in error seen with BDP.
6. Conclusion
In this article, we have proposed DPTS, an extension to the existing BDP method. DPTS uses an iterative clustering stage and a decision threshold to improve destination prediction performance on vehicle trajectories. Rather than using temporal properties in combination with weighting functions to modify the likelihood, DPTS harnesses these additional properties of the trajectories to decompose them further into more meaningful clusters. For our evaluation, we use the temporal properties of day-of-week and hour-of-day.
When applying DPTS to the Caltrain dataset, we see an improvement in overall performance, where our decision threshold allows the prediction to revert to that of BDP towards the end of the trajectories, as the performance of BDP improves. If two iterations of clustering are used with smaller parameter values for SSPD, we see a reduction in error over the first half of the journey compared with BDP; however, this comes at the cost of lower performance in the final 40% of trajectories. DPTS is considerably less effective on the Porto dataset, barely matching the performance of BDP, which suggests that the effectiveness of our method depends to some extent on the data. However, when applied to the POL dataset, which has multiple starting locations, DPTS shows promising results. BDP struggles to predict destinations accurately on this dataset, with an increase in error in the middle of the trajectories, whereas DPTS with multiple clustering iterations achieves notable gains in prediction performance over BDP that are consistent throughout the unfolding trajectories. This implies that selecting the best approach in practice depends strongly on the application setting and the nature of the available data. In practice, when using DPTS for applications, we recommend adding clustering iterations for attributes that help differentiate clusters. We also recommend considering both static and dynamic thresholds, adjusting their parameter values to maximise performance and to balance performance in the early stages of a journey against that in the later stages.
Reducing the prediction error can benefit location-aware applications, such as en-route traffic updates, intelligent parking suggestions, and amenity recommendations at the destination. Without an accurate destination prediction, these applications suffer from reduced effectiveness and, ultimately, poor user trust. A more accurate prediction earlier in the trajectory enables these applications to provide their location-based functionality in a more timely manner.
Future work will consider the decision threshold process and investigate whether its parameter values can be removed, in order to make DPTS more generic across datasets. Additionally, further information can be extracted from the time signal to analyse the effect of seasonality and trends in user mobility and to see whether this aids performance. Finally, we will investigate incorporating the differences in selected vehicle signals into further clustering iterations of DPTS.
Data Availability
The CRAWDAD dataset epfl/mobility (v. 2009-02-24) by Michal Piorkowski, Natasa Sarafijanovic-Djukic, and Matthias Grossglauser (Feb. 2009) can be downloaded from https://crawdad.org/epfl/mobility/20090224 (https://doi.org/10.15783/C7J010). The Kaggle dataset ECML/PKDD 15: Taxi Trajectory Prediction (1) (Apr. 2015) can be downloaded from https://www.kaggle.com/c/pkdd15predicttaxiservicetrajectoryi/data.
Conflicts of Interest
The authors declare that there are no potential conflicts of interest regarding the publication of this article.
Acknowledgments
This work was supported by Jaguar Land Rover and the UK EPSRC grant EP/N012380/1 as part of the jointly funded Towards Autonomy: Smart and Connected Control (TASCC) Programme.