Abstract

Trajectory data mining is of growing importance in location-based applications, and trajectory partition is its primary procedure. The number of movement trajectories of nodes is typically very large, and the trajectory shapes are extremely diverse, which makes trajectory partition critical to the quality of the mining results. In this work, the movement behaviors of nodes are analyzed in terms of moving speeds, stop points, and moving directions, and a novel Trajectory Partition Method based on combined movement Features (TPMF) is proposed to partition the trajectories. In TPMF, we first extract the change points where the moving speeds of nodes vary significantly; then, we extract the stop points by detecting the speed variations of nodes; finally, the Douglas-Peucker algorithm is applied to partition the subtrajectories according to the extracted feature points (change points and stop points). Simulations are carried out on the Geolife trajectory dataset, and the results indicate that TPMF achieves a preferable trade-off between the simplification rate and the trajectory partition error, while also shortening the running time.

1. Introduction

Recently, mobile communication devices with GPS modules have become very popular owing to the development of location-aware technology. For example, people (nodes) carrying communication devices travel along urban roads, and their geographical locations are recorded by the GPS modules at regular intervals. Thus, each movement trajectory consists of GPS points arranged in chronological order. Much valuable information can be obtained by mining these trajectories, such as the movement behaviors of nodes, which can be exploited in applications such as location recommendation [1], destination prediction [2], and personal navigation [3].

In general, personal trajectories are very complicated, since nodes often move casually. Trajectory data mining therefore faces a vital issue: trajectory data is generated quickly and occupies the storage space of nodes, which imposes heavy computation and storage burdens. Conversely, reducing the amount of trajectory data may discard key information, making the outcome of trajectory prediction inaccurate.

Trajectory partition serves to remove redundant GPS points and to detect node behaviors. In particular, trajectories must be simplified to avoid unbearable computation burdens in the trajectory data mining phase, while the valuable information regarding the trajectory contours is preserved as much as possible. In this paper, we partition the trajectories into successive line segments according to the combined movement features (moving speeds, stop points, and moving directions). The main aim of our work is to achieve a preferable trade-off between the simplification rate and the trajectory partition error while shortening the running time.

To reduce the computation burdens in the trajectory mining phase, existing trajectory partition methods attempt to find and remove the redundant trajectory points and retain the valuable ones. However, their partition results are not satisfactory because the movement features are not comprehensively considered. To this end, we take into account typical movement features, such as spatial locations, directions, time, movements, and stops, and we summarize and combine them as moving speeds, stop points, and moving directions. The innovations of this paper are as follows:
(1) The novel idea of partitioning the trajectories based on moving speeds, stop points, and moving directions is proposed to remove the redundant trajectory points while ensuring the trajectory accuracy.
(2) A stop point extraction algorithm based on moving speeds is put forward to extract the stop points more accurately. In particular, we improve the stop point extraction by applying our proposed forward search and backward search mechanisms based on moving speeds. Detecting the speeds avoids unnecessary computations, and the forward and backward searches yield more accurate results.
(3) A Douglas-Peucker algorithm based on perpendicular Euclidean distance and directions is utilized to maintain the overall shapes of trajectories as much as possible by retaining the points where the directions change abruptly; i.e., we combine the indices of perpendicular Euclidean distance and moving direction to identify such points while the overall contours of trajectories are basically maintained.

The rest of the paper is organized as follows: Section 2 gives some related works. Section 3 introduces the Trajectory Partition Method based on combined movement Features (TPMF). Section 4 reports the simulation results for performance evaluation of TPMF. Finally, Section 5 concludes this paper.

2. Related Works

The technique of trajectory partition is derived from that of trajectory compression. In early work, two representative strategies of trajectory compression were proposed in [4]: offline compression and online compression. Given a fully generated trajectory, offline compression generates an approximate trajectory by removing redundant points from the original trajectory. The best-known offline trajectory compression method is the Douglas-Peucker (DP) algorithm [5], whose idea is to replace a subtrajectory with an approximate line segment, as shown in Figure 1(a). DP recursively partitions each trajectory into two subtrajectories by selecting the point contributing the largest error as the splitting point, until the specified perpendicular Euclidean distance is satisfied. Since the perpendicular Euclidean distance is not associated with the time stamps, an improved Top-Down algorithm [6] was proposed to compress the trajectories, in which the temporal-distance information of each sample point is defined and considered. In contrast to offline compression, online compression does not require complete trajectories, and each trajectory is compressed while it is being transmitted. SQUISH [7] works in a streaming environment with a fixed-size buffer. For each point, it maintains a priority that serves as an upper bound of the Synchronized Euclidean Distance (SED) for the neighboring points, because when a point is deleted from the buffer, its priority score is accumulated to its two neighboring points. Since SQUISH is not error-bounded, its successor SQUISH-E [8] is designed to adapt to different objectives by introducing two parameters, $\lambda$ (compression ratio bound) and $\mu$ (compression error bound). BQS [9] picks at most eight significant points, forming a convex hull that encloses all the points in the buffer. Then, an upper bound and a lower bound are derived, so that it can be quickly decided whether a point is removed or retained. Recently, Liu et al. proposed a one-pass error-bounded trajectory simplification algorithm named OPERB [10]. Based on a local distance checking method, OPERB maintains a directed line segment to approximate the buffered points and guarantees that the distance from the current point to the line segment is bounded.
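The recursive splitting performed by DP can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the points have already been projected to planar coordinates (x, y) in meters; the function names `perpendicular_distance` and `douglas_peucker` are illustrative and are not taken from [5].

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b (planar x, y)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    base = math.hypot(dx, dy)
    if base == 0.0:
        # Degenerate chord: fall back to the distance to the single endpoint.
        return math.hypot(px - ax, py - ay)
    # |cross product| / base length gives the height of the triangle (a, b, p).
    return abs(dx * (py - ay) - dy * (px - ax)) / base

def douglas_peucker(points, epsilon):
    """Return the points kept by DP for the distance tolerance epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point with the largest perpendicular distance.
    split, d_max = 0, -1.0
    for k in range(1, len(points) - 1):
        d = perpendicular_distance(points[k], first, last)
        if d > d_max:
            split, d_max = k, d
    if d_max <= epsilon:
        # Every interior point is close enough: keep only the chord.
        return [first, last]
    left = douglas_peucker(points[:split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right  # the split point appears once
```

A larger `epsilon` removes more points at the cost of a coarser approximation, which is the trade-off between simplification rate and error discussed throughout this paper.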

In addition, other studies aim to reduce the amount of trajectory data while maintaining its semantic meaning, such as the types of places where the nodes stay for a long period. References [11, 12] attempt to preserve the semantic meaning of the original trajectories after the trajectory compression process.

As a basis for trajectory clustering and trajectory classification, the original trajectories need to be partitioned into several successive line segments for further processing. Lee et al. [13] present a trajectory partition method based on Minimum Description Length (MDL), as shown in Figure 1(b), where L(H) represents the sum of the lengths of all trajectory partitions, and L(D|H) represents the sum of the differences between the trajectory and its trajectory partitions. MDL finds a list of characteristic points that minimizes the value of L(H) + L(D|H). Reference [14] provides a trajectory partition method based on position perturbation, as shown in Figure 1(c), which calculates the perpendicular distance from a point to the previous moving direction. If the perpendicular distance is larger than a preset threshold, then the previous point is retained. These steps are repeated until all points have been traversed. Fu et al. [15] first find the stop points and then apply the DP algorithm to simplify the trajectory segments between adjacent stop points. Finally, a method based on corner detection is adopted to divide trajectories in [16].

Most of the aforementioned works divide trajectories purely spatially, whereas critical movement features such as moving speeds, stop points, and moving directions should also be taken into account. Moreover, a preferable trade-off between the simplification rate and the trajectory partition error is expected to be achieved.

3. Trajectory Partition Method

In real-world scenes, we note three significant characteristics of node movements: (i) the moving speeds of nodes change greatly due to switches between travel modes or some special events (e.g., sudden braking or abrupt deceleration in driving) [17]; (ii) the nodes are prone to visit some locations intentionally or unintentionally, and the locations visited frequently or stayed at for a long period can be marked as stop points; (iii) the moving directions of nodes vary frequently when some travel modes (such as the walking mode) are adopted [17].

To this end, we propose a trajectory partition method based on combined movement features (moving speeds, stop points and moving directions). Several definitions are first given as follows.

Definition 1 (a GPS point). A GPS point is expressed as a tuple $p_i = (lat_i, lng_i, t_i)$, which is composed of the latitude, longitude, and timestamp of the $i$-th point, where $1 \le i \le n$ and $n$ denotes the number of GPS points in a trajectory.

Definition 2 (a trajectory). A trajectory is composed of the GPS points arranged in chronological order and is denoted by $TR = \{p_1, p_2, \ldots, p_n\}$.

Definition 3 (a stop point). A stop point $sp$ denotes a location where a node arrives at the time $t_{arr}$ and departs at the time $t_{dep}$.

3.1. Moving Speeds

Usually, the moving speed of a node changes markedly when its travel mode is switched [17]. The common travel modes include walking, riding bicycles, taking buses, driving cars, and taking the subway. As shown in Figure 2, suppose the node travels from location A to location B; the change point is the position where the node switches its travel mode from walking to taking the bus.

To detect the change point caused by the switch among different travel modes, we define the moving speed difference as $\Delta v_i = |v_i - v_{i-1}|$, where $v_i$ represents the average speed on the line segment $\overline{p_i p_{i+1}}$. If $\Delta v_i$ is larger than a given threshold, then $p_i$ will be retained as a change point.

Considering that speed changes are usually caused by switches among different travel modes (Figure 3), we take the standard deviation of the moving speeds as the speed threshold, since the standard deviation reflects the speed fluctuations. The speed threshold is expressed as

$$v_{std} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n-1}\left(v_i - \bar{v}\right)^2},$$

where $\bar{v} = \frac{1}{n-1}\sum_{i=1}^{n-1} v_i$; i.e., $\bar{v}$ denotes the average speed in a trajectory.
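The change point test can be sketched in a few lines of Python. The sketch assumes that each point is a tuple (x, y, t) with planar coordinates in meters and a timestamp in seconds, and that the threshold is the population standard deviation of the segment speeds; the helper names `segment_speeds` and `change_points` are illustrative only.

```python
import math
import statistics

def segment_speeds(traj):
    """Average speed on each segment between consecutive (x, y, t) points."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:]):
        dt = t1 - t0
        # Guard against duplicated timestamps (assumption: treat as zero speed).
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    return speeds

def change_points(traj):
    """Indices of points where the speed jump exceeds the speed threshold."""
    speeds = segment_speeds(traj)
    if len(speeds) < 2:
        return []
    threshold = statistics.pstdev(speeds)  # standard deviation of segment speeds
    # Point i joins segment i-1 (arriving) and segment i (leaving), 0-based.
    return [i for i in range(1, len(speeds))
            if abs(speeds[i] - speeds[i - 1]) > threshold]
```

For a trajectory that moves at 1 m/s for five segments and then at 10 m/s for five segments, only the junction point between the two regimes is reported.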

3.2. Stop Points

Another fact is that nodes may stay around some locations for a long period, e.g., waiting for a bus, having dinner at a restaurant, resting in a leisure area, or shopping at a mall, and these locations can be taken as stop points. There are two types of stop points [18–20]: (a) position-invariant stop points, such as SP1 in Figure 4, which are determined according to the preset staying period; (b) position-offset stop points, where the node moves around a position for a preset period, such as SP2 in Figure 4.

In previous works, stop points (also called stay points) are detected according to a distance threshold and a time threshold: starting from an anchor point, the successive points are checked until one lies farther from the anchor than the distance threshold, and then the time span between the anchor point and its last successor within range is measured. If the time span is larger than the time threshold, a stay point is determined.
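For reference, this conventional detection scheme can be sketched as follows. The sketch is one common variant rather than the exact procedure of any cited work, and it assumes points of the form (x, y, t) in planar meters and seconds; the name `stay_points` and the returned tuple layout are illustrative.

```python
import math

def stay_points(traj, dist_th, time_th):
    """Conventional stay point detection with a distance and a time threshold.

    Returns a list of (mean_x, mean_y, arrive_time, leave_time) tuples.
    """
    stays = []
    i, n = 0, len(traj)
    while i < n:
        # Extend j while successors stay within dist_th of the anchor traj[i].
        j = i + 1
        while j < n and math.hypot(traj[j][0] - traj[i][0],
                                   traj[j][1] - traj[i][1]) <= dist_th:
            j += 1
        # traj[i:j] is the anchor plus its in-range successors.
        if traj[j - 1][2] - traj[i][2] > time_th:
            cluster = traj[i:j]
            cx = sum(p[0] for p in cluster) / len(cluster)
            cy = sum(p[1] for p in cluster) / len(cluster)
            stays.append((cx, cy, traj[i][2], traj[j - 1][2]))
            i = j          # continue after the detected stay
        else:
            i += 1         # no stay here: move the anchor forward
    return stays
```

Because the result depends entirely on where the anchor happens to fall and on the two thresholds, the sketch also makes the weaknesses discussed next easy to reproduce.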

Notice that the stop point cannot be extracted when there is a round-trip path. As illustrated in Figure 5, a customer (node) visits a convenience store from and then walks along the direction . Assume the distance between and is larger than the distance threshold, while the time duration of the node traveling from to is also larger than the time threshold, and hence the average coordinate of all points from to can be denoted by the stop point, which actually leads to the extraction of a false stop point.

In addition, this process may fail to accurately detect the stop points due to the false position of the start point; as shown in Figure 6, if is taken as a start point mistakenly, then a false stop point is extracted.

Moreover, the setting of the distance threshold affects the detection accuracy of stop points. If a small distance threshold is set (e.g., 10 m), some points located within the range of a stay point are not detected, as shown in Figure 7(a). If a large distance threshold is set (e.g., 100 m), points that do not fall within the range of the stay point are also detected, as shown in Figure 7(b).

Furthermore, the time threshold determines the number of stay points: a smaller time threshold results in more stop points, whereas fewer stop points are detected if a larger time threshold is set.

To address the issues mentioned above, we observe that the speed of a node always becomes slow when it visits a location, and thus we propose a Stop Points Extraction Method based on the moving Speeds of nodes (SPEMS). In SPEMS, the stop points can be extracted through the following steps.

Step 1. Each pair of adjacent GPS points $p_i$, $p_{i+1}$ is traversed, and the moving speed $v_i$ is calculated.

Step 2. If $v_i$ is equal to 0 or $v_i < v_{th}$ ($v_{th}$ denotes a speed threshold), then a backward search with the center $p_i$ is performed to extract a point $p_b$, such that the inequalities $dist(p_b, p_i) \le R$ and $dist(p_{b-1}, p_i) > R$ are satisfied, where $R$ denotes a search radius threshold and $b \le i$.

Step 3. A forward search is performed to extract a point $p_f$, such that the inequalities $dist(p_f, p_i) \le R$ and $dist(p_{f+1}, p_i) > R$ are satisfied, where $f \ge i$.

Step 4. If the time interval $t_f - t_b > T_{th}$ ($T_{th}$ denotes the time threshold), then we take the intermediate point $p_m$ as a stop point. We then remove the points from $p_b$ to $p_f$ except $p_m$, which prevents these points from being detected repeatedly.

Figure 8 gives an example of forward search and backward search, where is found by a backward search with the center and is found by a forward search.

An example is given in Figure 9, where we assume and hence we can find the points and with the center through the backward search and the forward search, respectively. If , we set as the stop point; i.e., is a stop point.
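The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: points are (x, y, t) tuples in planar meters and seconds, the intermediate point is taken as the point at the middle index between the backward and forward search results, and the names `spems`, `v_th`, `radius`, and `t_th` are hypothetical.

```python
import math

def dist(p, q):
    """Planar distance between two (x, y, t) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def spems(traj, v_th, radius, t_th):
    """Speed-triggered stop point extraction with backward/forward search.

    Returns (stops, removed): the stop points and the indices of the points
    discarded around them.
    """
    stops, removed = [], set()
    i, lo, n = 0, 0, len(traj)   # lo: first index not consumed by a stop yet
    while i < n - 1:
        p, q = traj[i], traj[i + 1]
        dt = q[2] - p[2]
        v = dist(p, q) / dt if dt > 0 else 0.0
        if v == 0 or v < v_th:
            # Backward search: earliest point still within radius of the center.
            b = i
            while b > lo and dist(traj[b - 1], p) <= radius:
                b -= 1
            # Forward search: latest point still within radius of the center.
            f = i
            while f < n - 1 and dist(traj[f + 1], p) <= radius:
                f += 1
            if traj[f][2] - traj[b][2] > t_th:
                m = (b + f) // 2             # assumed "intermediate" point
                stops.append(traj[m])
                removed.update(k for k in range(b, f + 1) if k != m)
                i = lo = f + 1               # skip the consumed points
                continue
        i += 1
    return stops, removed
```

The distance computations of the search are triggered only when the speed test fires, which is where the saving over the purely distance-based scheme comes from.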

The pseudocode of SPEMS is depicted in Algorithm 1.

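Input: $TR = \{p_1, p_2, \ldots, p_n\}$, $v_{th}$, $R$, $T_{th}$.
Output: SP.
  $i = 1$.
  while $i < n$ do
   $j = i + 1$.
   Calculate $v_i$ from $p_i$ and $p_j$.
   if $v_i = 0$ or $v_i < v_{th}$ then
    $p_b$ = BackwardSearch($p_i$, $R$).
    $p_f$ = ForwardSearch($p_i$, $R$).
    $\Delta t = t_f - t_b$.
    if $\Delta t > T_{th}$ then
     $p_m$ = the intermediate point between $p_b$ and $p_f$.
     Add $p_m$ to SP.
     Remove points from $p_b$ to $p_f$ except $p_m$.
     $i = f + 1$.
    else
     $i = i + 1$.
    end if
   else
    $i = i + 1$.
   end if
  end while

3.3. Moving Directions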

The changes of moving direction depend on the current travel mode; e.g., the moving direction changes more frequently when the nodes walk than when they take vehicles.

Moreover, the traditional DP algorithm cannot maintain the shapes of the original trajectories well in some special cases, such as the case shown in Figure 10, where the point at the street corner is not taken as a feature point. The reason is that the DP algorithm divides the trajectories on the basis of perpendicular Euclidean distance alone, and thus the points with large direction changes are easily ignored.

Therefore, we propose a DP algorithm based on Direction changes and Perpendicular Euclidean Distance (DPDPED). In DPDPED, the error measure is defined aswhere and . represents the angle between and , represents the angle between and , where and are two adjacent feature points. is a point on the line segment , and is the projection on . Essentially, DPDPED combines the maximum angle and the perpendicular Euclidean distance into the index of error measure, and the points ignored by DP can be found through detecting the direction changes of nodes. In this mechanism, since the value of is very sensitive to the direction changes, and hence the points with large direction changes can be found easily. As illustrated in Figure 11, a split point divides the trajectory into two subtrajectories.
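To make the idea of a direction-aware DP variant concrete, the following Python sketch splits a subtrajectory using both the perpendicular Euclidean distance and the angles at the two endpoints. It is only an illustration: the way the two quantities are combined here (a score equal to distance times the larger angle, with a split when either a distance threshold or an angle threshold is exceeded) is an assumption made for this sketch and is not the error measure defined above. Points are assumed to be planar (x, y) pairs, and all names are hypothetical.

```python
import math

def _angle(u, v):
    """Unsigned angle in radians between 2-D vectors u and v (0 if degenerate)."""
    nu, nv = math.hypot(u[0], u[1]), math.hypot(v[0], v[1])
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))

def _ped_and_angle(p, s, e):
    """Perpendicular distance of p to chord s-e and the larger endpoint angle."""
    se = (e[0] - s[0], e[1] - s[1])
    sp = (p[0] - s[0], p[1] - s[1])
    ep = (p[0] - e[0], p[1] - e[1])
    chord = math.hypot(se[0], se[1])
    if chord == 0.0:
        ped = math.hypot(sp[0], sp[1])
    else:
        ped = abs(se[0] * sp[1] - se[1] * sp[0]) / chord
    # Angle at s between s->p and s->e, and angle at e between e->p and e->s.
    theta = max(_angle(sp, se), _angle(ep, (-se[0], -se[1])))
    return ped, theta

def dpdped_sketch(points, d_th, theta_th):
    """Return sorted indices of retained points (assumed combination rule)."""
    if len(points) < 2:
        return list(range(len(points)))
    keep = {0, len(points) - 1}

    def split(s, e):
        if e - s < 2:
            return
        best = None
        for k in range(s + 1, e):
            ped, theta = _ped_and_angle(points[k], points[s], points[e])
            score = ped * theta              # hypothetical combined index
            if best is None or score > best[0]:
                best = (score, k, ped, theta)
        _, k, ped, theta = best
        if ped > d_th or theta > theta_th:
            keep.add(k)
            split(s, k)
            split(k, e)

    split(0, len(points) - 1)
    return sorted(keep)
```

On an L-shaped path the sketch retains the corner and discards the collinear points on both legs.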

Algorithm 2 gives the pseudocode of DPDPED.

Input: .
Output: FP.
 1:  =0.
 2:  .
 3:  if then
 4:   +1.
 5:   .
 6:   .
 7:   while do
 8:    .
 9:    if then
 10:     .
 11:     .
 12:    end if
 13:    +1.
 14:   end while
 15:   if then
 16:    .
 17:    .
 18:    .
 19:   end if
 20:  end if

3.4. Trajectory Partition Algorithm

In TPMF, the change points are first extracted based on the speed changes, and then the SPEMS algorithm is applied to extract the stop points. The extracted change points and stop points are treated as feature points. Finally, DPDPED is executed to simplify the subtrajectories between adjacent feature points. Figure 12 illustrates our trajectory partition process: two change points are found by detecting speed changes in Figure 12(a), two stop points are then extracted by SPEMS in Figure 12(b), and finally a point with a large direction change is found by DPDPED in Figure 12(c). The pseudocode of TPMF is depicted in Algorithm 3.

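Input: $TR = \{p_1, p_2, \ldots, p_n\}$, $v_{th}$, $R$, $T_{th}$, the maximum angle threshold $\theta_{th}$, the distance threshold $d_{th}$.
Output: Feature Point Set FP.
  FP $= \emptyset$.
  $i = 2$, $n = |TR|$.
  Calculate standard deviation $v_{std}$ of speed in TR.
  while $i < n$ do
   if $\Delta v_i > v_{std}$ then
    Add $p_i$ to FP.
   end if
   $i = i + 1$.
  end while
  SP = SPEMS($TR$, $v_{th}$, $R$, $T_{th}$).
  FP = FP $\cup$ SP.
  Sort the points of FP in a chronological order.
  FP1 = FP.
  $idx$ = find the indexes of feature points in FP1 in TR.
  $k = 1$.
  while $k < |idx|$ do
   FP = FP $\cup$ DPDPED($TR$, $idx_k$, $idx_{k+1}$, $\theta_{th}$, $d_{th}$).
   $k = k + 1$.
  end while
  Remove duplicate points from FP.
  Sort points in FP in a chronological order.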

According to the steps of TPMF, the time complexity is calculated as follows:where is the number of removed points within the ranges of stop points and the expression of is given bywhere denotes the -th stop point in and denotes the number of points around the stay point . denotes the average number of points between adjacent feature points. Therefore, the time complexity of TPMF is written as . When is large enough, we have

3.5. Trajectory Partition Error

Two metrics have been applied to measure the performance of trajectory partition methods: (a) the simplification rate, i.e., the ratio of removed points to original points, and (b) the running time, i.e., the execution time of the trajectory partition method, which reflects its time complexity. In addition, the error between the simplified trajectories and the original trajectories should be measured, so we define the metric of trajectory partition error. As shown in Figure 13, for two adjacent feature points, the average perpendicular Euclidean distance and the average angle are used to evaluate the error of the trajectory partition results.

The expression of average perpendicular Euclidean distance is given bywhere denotes the subscript set of feature points, represents the perpendicular Euclidean distance between and , and s denotes the number of points in .

The expression of average angle is written aswhere represents the set of angles between and and represents the set of angles between and .

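Furthermore, Min-Max standardization is used to standardize the average perpendicular Euclidean distance $\bar{d}$ and the average angle $\bar{\theta}$, respectively. Finally, we define the trajectory partition error $E = w_1 \bar{d}' + w_2 \bar{\theta}'$, where $w_1$ and $w_2$ are the weights reflecting the impacts of the average perpendicular Euclidean distance and the average angle on the trajectory partition. The standardized distance $\bar{d}'$ is written as

$$\bar{d}' = \frac{\bar{d} - d_{\min}}{d_{\max} - d_{\min}},$$

where $d_{\min}$ denotes the minimum value of all the perpendicular Euclidean distances from the points between adjacent feature points to the simplified line segment. Likewise, $d_{\max}$ represents the maximum value.

The standardized angle $\bar{\theta}'$ is written as

$$\bar{\theta}' = \frac{\bar{\theta} - \theta_{\min}}{\theta_{\max} - \theta_{\min}},$$

where $\theta_{\min}$ denotes the minimum value of all average angles between adjacent feature points and $\theta_{\max}$ denotes the maximum value.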
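The evaluation metrics of this subsection can be illustrated with a short Python sketch. It assumes that the per-point perpendicular distances and the per-segment angles have already been computed and are passed in as plain lists, that the averages are simple arithmetic means, and that a degenerate range (maximum equal to minimum) is mapped to 0; the function names are illustrative.

```python
def simplification_rate(n_original, n_kept):
    """Ratio of removed points to original points."""
    return (n_original - n_kept) / n_original

def min_max(value, lo, hi):
    """Min-Max standardization of value into [0, 1] for lo <= value <= hi."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def partition_error(peds, angles, w1=0.5, w2=0.5):
    """Weighted sum of the standardized average distance and average angle.

    peds:   perpendicular Euclidean distances to the simplified segments
    angles: angle deviations (radians) used for the average angle
    """
    avg_d = sum(peds) / len(peds)
    avg_a = sum(angles) / len(angles)
    d_norm = min_max(avg_d, min(peds), max(peds))
    a_norm = min_max(avg_a, min(angles), max(angles))
    return w1 * d_norm + w2 * a_norm
```

With equal weights of 0.5, distance and direction deviations contribute equally to the error; shifting weight toward `w2` penalizes methods that cut corners.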

4. Simulations

In this section, we present an extensive simulation study of TPMF. We conduct several simulations to evaluate TPMF on the Geolife dataset. All simulations are run on a PC with Windows 7, a 3.20 GHz CPU, and 4 GB of memory. The trajectory partition methods are implemented in Python.

4.1. Dataset

The GPS trajectory dataset is taken from the Geolife project [21] of Microsoft Research Asia, which collected the trajectories of 182 users (nodes) over five years (from April 2007 to August 2012). These trajectories were recorded by different GPS loggers and GPS phones and have a variety of sampling rates. 91.5 percent of the trajectories are logged in a dense representation, e.g., every 1–5 seconds or every 5–10 meters per point.

4.2. Parameter Settings

The trajectory data used in our simulations includes 171 trajectories, each of which contains tracks of one node every day. Firstly, we select six trajectories from the 171 trajectories to determine the parameter values. The details of each trajectory are shown in Table 1.

To observe the impacts of the search radius and the time threshold on the number of stop points, four trajectories are selected for this simulation. Figure 14 shows the number of stop points under different search radii and time thresholds.

From Figures 14(a)–14(d), we can observe that the curves decrease rapidly when the time threshold increases from 60 s to 120 s, because TPMF treats more points as stop points when a smaller time threshold is set. When the time threshold is beyond 120 s, the curves descend slowly; in particular, within 180 s–300 s the curves remain almost stable, which indicates that the number of stop points fluctuates only slightly. The reason is that the range of 180 s–300 s is close to the duration for which the node stays at the stop points, and thus the time threshold should be selected from the interval [180, 300] s. In addition, when the search radius is 10 m or 15 m, the number of extracted stop points is not large enough compared with the number of original points, and thus we set the search radius to 20 m. In Figures 14(a) and 14(d), the number of stop points does not change noticeably when the time threshold lies within 180 s–240 s. Similarly, Figures 14(b) and 14(c) show that the number of stop points does not change when the time threshold lies within 240 s–300 s, and thus we set the time threshold to 240 s.

As shown in Figure 15, the number of stop points remains unchanged when the speed threshold lies within 0.6 m/s–0.7 m/s, which is attributed to the fact that a speed within this interval is very close to the moving speed of the node, and thus the speed threshold is chosen from this interval. Since the trajectory partition error increases (or decreases) as the maximum angle threshold or the distance threshold increases (or decreases), no single optimal value can be identified for them. Thus, we set both thresholds to small values in our simulations. Without loss of generality, we assume that the impacts of the average perpendicular Euclidean distance and the average angle are equivalent, and each of the two weights is set to 0.5. The weights are adjustable and can easily be tuned for different applications or different trajectory datasets. The main simulation parameters are provided in Table 2.

4.3. Comparisons of Trajectory Partition Methods

Seven algorithms are compared in this simulation. Our proposed algorithm TPMF is compared with six other algorithms (DP, MDL, SQUISH, SQUISH-E, BQS, and OPERB) in terms of simplification rate, running time, and trajectory partition error. The parameter settings of the other algorithms are given in Table 3, where PED represents the Perpendicular Euclidean Distance and SED represents the Synchronized Euclidean Distance. The table entry “yes/no” indicates whether the algorithm uses PED or SED. We first partition all the trajectories of one node, and the simulation results are reported in Figure 16.

As shown in Figure 16(a), with regard to the simplification rate, OPERB outperforms the other algorithms, and the curves of TPMF and BQS are higher than those of DP, MDL, SQUISH, and SQUISH-E. In particular, MDL always achieves the lowest simplification rates. As illustrated in Figure 16(b), the running time of MDL is much longer than that of the other algorithms. In Figure 16(c), MDL outperforms the other algorithms in terms of trajectory partition error.

To observe the impacts of the number of trajectories, we select 10 nodes from the dataset, and the trajectory details are shown in Table 4. The simulation results are given in Figure 17.

In Figure 17(a), with regard to the average simplification rate, OPERB is generally higher than the other algorithms, with a simplification rate of about 90%. The plots of TPMF and BQS are close to each other, and their simplification rates fall into the interval from 85% to 90%. The plot of DP is slightly lower than those of TPMF and BQS. SQUISH and SQUISH-E remain stable due to the constraint of a target simplification rate of 80%, and MDL obtains the lowest simplification rates among all algorithms.

Figure 17(b) illustrates that OPERB executes faster than the other algorithms due to its one-pass scanning mechanism, while the running time of MDL is much longer than that of the others. Besides, TPMF consumes less running time than DP, MDL, SQUISH, SQUISH-E, and BQS. This is attributed to the fact that TPMF partitions the trajectories according to the extracted feature points (change points and stop points), and hence the number of computations is significantly reduced.

In Figure 17(c), the average error of TPMF is always smaller than those of DP, BQS, SQUISH-E, and OPERB and is close to that of SQUISH. MDL achieves the lowest average error because it partitions the trajectories approximately: it retains the most points and preserves the trajectory shapes as much as possible, which yields a smaller trajectory partition error at the cost of a lower simplification rate. Notice that OPERB has the largest average error together with the highest simplification rate, as depicted in Figure 17(a).

Therefore, TPMF achieves a preferable trade-off between the simplification rate and the trajectory partition error. TPMF retains the most valuable feature points, which allows the redundant points to be removed.

5. Conclusions

Trajectory partition is a primary procedure of trajectory data mining. The number of movement trajectories is typically large, and the shapes of trajectories vary widely across applications. In this paper, a trajectory partition method based on combined movement features (moving speeds, stop points, and moving directions) is proposed. The main advantage of the proposed method is that it comprehensively takes into account the movement behaviors of nodes, so that the shape skeletons and movement features are maintained while massive numbers of redundant points are removed.

Future research will focus on a self-adaptive solution for setting the method parameters in TPMF, e.g., its threshold values. In addition, a trajectory mining algorithm based on TPMF will be investigated to obtain the movement modes of nodes.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research is supported by National Natural Science Foundation of China (Grant Nos. 61872191, 41571389, and 61801215); Postdoctoral Science Foundation of China (Grant Nos. 2014M560379, 2015T80484); Natural Science Foundation of Jiangsu Province (Grant No. BK20160812).