Abstract

Compression technology is an efficient way to preserve useful and valuable data while removing redundant and inessential data from datasets. With the development of RFID and GPS devices, more and more moving objects can be traced and their trajectories recorded. However, the exponential increase in the amount of such trajectory data has caused a series of problems in the storage, processing, and analysis of data. Moving object trajectory compression has therefore become one of the hotspots in moving object data mining. To provide an overview, we survey and summarize the development and trends of moving object compression and analyze typical moving object compression algorithms presented in recent years. In this paper, we first summarize the strategies and implementation processes of classical moving object compression algorithms. Second, the related definitions about moving objects and their trajectories are discussed. Third, the validation criteria for evaluating the performance and efficiency of compression algorithms are introduced. Finally, some application scenarios are summarized to point out potential future applications. It is hoped that this research will serve as a stepping stone for those interested in advancing moving object mining.

1. Introduction

In recent years, with the rapid development and extensive use of GPS devices, RFID sensors, satellites, and wireless communication technologies, it has become possible to track various kinds of moving objects all over the world and to collect a myriad of trajectory data on the mobility of such objects (such as people, vehicles, and animals), which contain a great deal of knowledge. These data urgently need effective analysis. A moving object spatial-temporal trajectory is a sequence of position, attribute, and time [1], which are the three basic characteristics of geographic phenomena and the three basic data types of a GIS database [2]. Moving objects move continuously while their locations can only be updated at discrete times, because of the limits of acquisition, storage, and processing technologies, which leaves the location of a moving object between two updates uncertain [3]. The simplest description of a trajectory is a finite sequence of geolocations with timestamps.

As time goes on, the sharply increasing size and complexity of trajectory data lead to a series of difficulties in storing, transmitting, and analyzing it. First of all, the sheer volume of data can quickly overwhelm available data storage. For instance, if data is collected at 2-second intervals, 1 GB of storage capacity is required to store just over 800 objects for a single day. Storing the data therefore incurs an enormous cost. The second major problem highlighting the need for compression is the cost of transmitting a large amount of trajectory data, which may be expensive and problematic. According to [4], the cost of sending a volume of data over remote networks can be prohibitively expensive, typically ranging from $5 to $7 per Mb. Thus, tracking a fleet of 800 vehicles for a single day would incur a cost of $5,000 to $7,000, or approximately $1,825,000 to $2,555,000 annually. Finally, as the data scale increases, it becomes difficult to extract valuable and useful patterns.
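The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that "1 GB" means 10^9 bytes and that "Mb" in the quoted price means megabyte (so 1 GB is 1000 such units); neither convention is stated in the text, and the per-point byte budget is derived here rather than taken from [4].

```python
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_S = 2          # one fix every 2 seconds, as in the text
OBJECTS = 800                  # fleet size used in the text

# Number of sampled points produced per day.
points_per_object = SECONDS_PER_DAY // SAMPLE_INTERVAL_S
points_per_day = points_per_object * OBJECTS

# Assumption: 1 GB = 10**9 bytes. This gives the storage budget per point
# that is implied by "1 GB holds about 800 objects for one day".
bytes_per_point = 10**9 / points_per_day

# Assumption: 1 GB = 1000 MB, priced at $5 to $7 per MB.
mb_per_day = 1000
daily_cost = (5 * mb_per_day, 7 * mb_per_day)
annual_cost = tuple(365 * c for c in daily_cost)

print(points_per_day, round(bytes_per_point, 1), daily_cost, annual_cost)
```

Under these assumptions the daily and annual costs reproduce the ranges quoted above, and the implied budget is a little under 30 bytes per sampled point.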

To address these issues, two categories of trajectory compression strategies have been proposed, aiming to reduce the size of a trajectory while not compromising much precision in its new data representation [5]. One is offline compression, which reduces the size of a trajectory after it has been fully generated. The other is online compression, which compresses a trajectory instantly as the object travels. Compression reduces the memory footprint, which makes the data easier to store, and cuts down the data size, which makes it easier to transmit. Moreover, it preserves the useful information in trajectories and removes redundant data, which has the potential to make thorough analysis of trajectory data easier. Data compression is a method that either reduces the size of data without losing information, to save memory space and improve the efficiency of transmission, storage, and processing, or reorganizes data according to certain algorithms to reduce redundancy and memory space. Data compression can be classified into two categories, namely, lossless and lossy compression. Moving object trajectory compression aims to reduce the size and memory space of a trajectory on the premise that the information contained in the trajectory data is preserved; that is, it removes redundant location points while ensuring the accuracy of the trajectory [6]. Figure 1 is a schematic diagram of trajectory compression, where the original trajectory is represented by black lines and the compressed trajectory consists of red lines. There are 9 points in the original trajectory, but only 5 points are retained to approximately represent it after compression, so the compression ratio is close to 50%.
Thus, trajectory compression plays an important role in the storage and analysis of data. However, compressing trajectories tends to cause a certain loss of information. Therefore, the various trajectory compression algorithms in the literature balance the tradeoff between accuracy and storage size.

Trajectory compression technology derives from topographic cartography and computer graphics. The simplest compression method is the uniform sampling algorithm, which simply keeps every $i$th point in the trajectory [7]. In 1961, Bellman put forward an algorithm, known as the Bellman algorithm, which solves linear generalization problems by dynamic programming [8]. This method guarantees that the segments connecting the specified number of points selected from the curve are closest to the original curve, but its time overhead is very large, up to . One of the most classical trajectory compression algorithms, the Douglas-Peucker algorithm, was presented in 1973 by Douglas and Peucker [9]. In 2001, Keogh et al. put forward the Opening Window algorithm to compress trajectory data online. However, traditional error metrics (such as perpendicular distance) are not suitable for moving object trajectories, whose spatial and temporal characteristics need to be considered simultaneously. In 2004, Meratnia and de By put forward a top-down speed-based algorithm and a top-down time-ratio algorithm [3]. The former improves the existing compression techniques by exploiting the spatiotemporal information hidden in the time series, while the latter is a variant of the Douglas-Peucker (DP) algorithm that takes full account of spatiotemporal characteristics by replacing the spatial error with the Synchronized Euclidean Distance (SED). In 2006, Potamias et al. put forward the STTrace algorithm, which estimates the safe zone of the successor point from location, velocity, and direction and is suitable for small-memory devices [10]. Gudmundsson et al. developed an implementation of the Douglas-Peucker path-simplification algorithm in 2009, which works efficiently even when the polygonal path given as input is allowed to self-intersect [11]. In 2009, Schmid et al. 
proposed that trajectories stored in the form of trajectory points can be replaced by semantic information of road networks [12]. Since then, many researchers have studied the semantic information of road networks [13-18]. As the volume of data grows, traditional compression algorithms are quite limited for online trajectory data. Therefore, online trajectory compression has become one of the hot topics [19-21]. For example, the Opening Window Time-Ratio algorithm was put forward by Meratnia as an extension to Opening Window that uses SED instead of spatial error to take temporal features into account [2]. Trajcevski et al. put forward another online algorithm, called the Dead Reckoning algorithm, which estimates the successor point from the current point and its velocity [22]. Beyond traditional position-preserving trajectory compression algorithms, many scholars have focused on different perspectives. For example, Birnbaum et al. proposed a trajectory compression algorithm based on subtrajectories and their similarity [23]. Long et al. proposed a polynomial-time algorithm for optimal direction-preserving simplification, which supports a broader application range than position-preserving simplification [24]. Nibali and He proposed an effective compression system for trajectory data called Trajic, which achieves both a good compression ratio and a small error margin [25]. Similar to STTrace, Muckell et al. put forward the Spatial QUalIty Simplification Heuristic (SQUISH) method [26]. In 2012, Chen et al. proposed a Multiresolution Polygonal Approximation algorithm, which compresses trajectories by a joint optimization on both the LSSD and the ISSD criteria [27]. In 2014, Muckell et al. proposed a new algorithm, SQUISH-E, which compresses trajectories with provable guarantees on errors [28]. This algorithm has the flexibility of tuning compression with respect to compression ratio and error.
The algorithms covered in this paper are summarized in Table 1 in terms of their complexity, application scope, and error metric.

In [29], the traditional trajectory compression algorithms were classified into the following 4 categories, which can no longer cover all of the compression algorithms.

(1) Top-Down. The data series is recursively partitioned until some halting condition is met. The popular top-down compression methods include Douglas-Peucker algorithm, top-down speed-based algorithm, and top-down time-ratio algorithm.

(2) Bottom-Up. Starting from the finest possible representation, successive data points are merged until some halting condition is met. The algorithm may not visit all data points in sequence.

(3) Sliding Window. Starting from one end of the data series, a window of fixed size is moved over the data points and compression takes place only on the data points inside the window. The Spatial QUalIty Simplification Heuristic method and the SQUISH-E algorithm are popular sliding window methods.

(4) Opening Window. Starting from one end of the data series, compression takes place on the data points inside a window whose size is determined by the number of points to be processed. The process does not end until some halting condition is met, and the window size is not constant while compressing. Well-known Opening Window methods are the Opening Window algorithm and the Opening Window Time-Ratio algorithm.
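As an illustration of the top-down strategy in (1), the following is a minimal sketch of Douglas-Peucker-style recursive partitioning. The function names, the 2D tuple representation of points, and the use of the clamped point-to-segment distance as the error measure are choices made for this example and are not taken from [9].

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from 2D point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                      # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Top-down simplification: split at the farthest point until every
    dropped point lies within epsilon of its approximating segment."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:                          # halting condition
        return [first, last]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right                     # drop duplicated split point

track = [(0, 0), (1, 0), (2, 0), (3, 3), (4, 0), (5, 0)]
print(douglas_peucker(track, 0.5))
```

The halting condition here is a distance threshold; a bound on the number of retained points would be an equally valid choice for a top-down method.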

The organization of this paper is as follows: the basic idea of compression and typical compression algorithms are introduced and discussed in Section 1. The related definitions about moving objects and their trajectories are summarized in Section 2. The survey of moving object compression algorithms is given in Section 3. Some validation criteria of compression performance are discussed in Section 4 to reveal their benefit for moving object compression. In Section 5, some public trajectory datasets are described. Some typical application scenarios are listed in Section 6 to show the application of moving object compression. In Section 7, some disadvantages and future work are summarized.

2.1. Trajectory Data

A spatial-temporal trajectory of a moving object is defined as a sequence of position, attribute, and time in [1]. A formal description of a trajectory and its related attributes is needed to describe the methods in this paper, and the formal definition given in [30] is also suitable here. Let TD (Trajectory Database) denote a set of trajectories, $TD = \{TR_1, TR_2, \ldots, TR_n\}$. A trajectory (TR) is a chronological sequence consisting of multidimensional locations, which is denoted by $TR_i = \{P_1, P_2, \ldots, P_m\}$ $(1 \le i \le n)$. $P_j$ $(1 \le j \le m)$, a sampling point in $TR_i$, is represented as $(Location_j, T_j)$, which means that the position of the moving object is $Location_j$ at time $T_j$. $Location_j$ is a multidimensional location point. A trajectory $\{P_{s_1}, P_{s_2}, \ldots, P_{s_k}\}$ $(1 \le s_1 < s_2 < \cdots < s_k \le m)$ represents a trajectory segment or subtrajectory of a trajectory $TR_i$, denoted as TS (Trajectory Segment), $TS = \{P_{s_1}, P_{s_2}, \ldots, P_{s_k}\}$.
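The definitions above map naturally onto a small data model. The sketch below is only one possible encoding: the names `SamplePoint`, `Trajectory`, `sub_trajectory`, and `is_chronological` are hypothetical, locations are assumed to be two-dimensional, and a trajectory segment is assumed to be a contiguous run of sampling points.

```python
from typing import List, NamedTuple

class SamplePoint(NamedTuple):
    """One sampling point: a 2D location plus its time stamp."""
    x: float
    y: float
    t: float

# A trajectory is a chronologically ordered list of sampling points.
Trajectory = List[SamplePoint]

def is_chronological(tr: Trajectory) -> bool:
    """True if time stamps strictly increase along the trajectory."""
    return all(a.t < b.t for a, b in zip(tr, tr[1:]))

def sub_trajectory(tr: Trajectory, start: int, end: int) -> Trajectory:
    """Trajectory segment made of points start..end (inclusive indices)."""
    if not 0 <= start <= end < len(tr):
        raise ValueError("segment bounds outside the trajectory")
    return tr[start:end + 1]

tr = [SamplePoint(0.0, 0.0, 0.0), SamplePoint(1.0, 1.0, 1.0), SamplePoint(2.0, 1.0, 2.0)]
print(is_chronological(tr), sub_trajectory(tr, 1, 2))
```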

In this section, we classify the derivation of trajectories into 4 major categories, briefly introducing a few application scenarios in each category [31].

(1) Mobility of People. Real-world movements of people are recorded in the form of spatial-temporal trajectories passively and actively. Such records can be translated into a great amount of spatial-temporal trajectories that can be used in human behavior analysis and inferring social ties.

Active Recording. Travelers actively log their travel routes for the purpose of memorizing a journey and sharing experience with friends. In Flickr, a series of geotagged photos can formulate a spatial-temporal trajectory as each photo has a location tag and a time stamp corresponding to where and when the photo was taken. Likewise, the “check-ins” of a user in a location-based social network can be also regarded as a spatial-temporal trajectory, when sorted chronologically.

Passive Recording. A user carrying a mobile phone unintentionally generates many spatial-temporal trajectories represented by a sequence of cell tower IDs with corresponding transition times. In addition, transaction records of a credit card also indicate the spatial-temporal trajectory of the cardholder, as each transaction contains a time stamp and a merchant ID denoting the location where the transaction occurred.

(2) Mobility of Transportation Vehicles. A great number of vehicles (such as taxis, buses, vessels, and aircraft) have been equipped with GPS devices. For instance, many taxis carry a GPS sensor, which enables them to report a time-stamped location at a certain frequency. Such reports form a large number of spatial-temporal trajectories that can be used for resource allocation, traffic analysis, and improving transportation networks.

(3) Mobility of Animals. Biologists are collecting the moving trajectories of animals like tigers and birds, for the purpose of studying animals’ migratory traces, behavior, and living situations.

(4) Mobility of Natural Phenomena. Meteorologists, environmentalists, climatologists, and oceanographers are busy collecting the trajectories of natural phenomena, such as hurricanes, tornados, and ocean currents. These trajectories capture the change of the environment and climate, helping scientists deal with natural disasters and protect the natural environment we live in.

2.2. Road Network

A road network is defined as a directed graph $G = (V, E)$, where $V$ is a finite vertex set in which every vertex denotes a location point, and $E$ is a finite edge set in which every edge denotes a segment connecting 2 vertexes. A road network can also be regarded as a constrained 2-dimensional space, often referred to as 1.5-dimensional space. The road network in Figure 2 contains 29 vertexes and 36 edges.

In 2-dimensional space, a point is a two-tuple of the form $(x, y)$ and a polyline is a set of points. The distance of a point in a polyline is the distance along the polyline from its starting point to that point. The definitions of road network point, distance, polyline, segment, and measurement in road network space are described as follows.

(1) Road Network Point. The point in road network space can be denoted in the form of , where is the point in a road and is the measurement of along the road.

(2) Road Network Distance. The distance () between 2 random points ( and ) in road network space is the shortest path length along the road from to .

(3) Road Network Polyline. The road network polyline is denoted as , where . The length of polyline is the summation of the distance between 2 adjacent vertexes.

(4) Road Network Segment. The road network segment in road network space is a road network polyline that has exactly 2 vertexes.

(5) Edge of Road Network. The edge () in road network is the path between 2 adjacent intersections.

(6) A Trajectory Model Based on Road Network. The trajectory model based on road network is a new representation of moving object trajectories, which matches GPS points with road network to more accurately describe the spatial motion information of moving objects by their motion laws in road network. Meanwhile, the model introduces a nonlinear interpolation function among sampling points to preferably describe variable motions. The trajectory model based on road network separates the locations from time stamps. In other words, a trajectory is represented by a spatial path and a temporal sequence. The spatial path of a trajectory in a road network is a sequence of consecutive edges. As shown in Figure 3, a trajectory sequentially passes edges ,  ,  ,  , and . Consequently, it can be represented by a spatial path in the format of . Note that a trajectory can start from or end at any point of an edge, not necessarily an endpoint. The temporal information of a trajectory is captured by a two-tuple , where represents the road network distance the object has traveled at the time stamp from the start of the trajectory and represents the time the object has traveled at the location from the start of the trajectory.
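The separation of a trajectory into a spatial path and a temporal sequence in (6) can be sketched as follows. The edge identifiers, the sample values, and the function `distance_at` are hypothetical, and linear interpolation between temporal samples is used purely for simplicity, whereas the model described above introduces a nonlinear interpolation function.

```python
from bisect import bisect_right

# Spatial path: consecutive edges of the road network (hypothetical ids).
spatial_path = ["e1", "e2", "e3"]

# Temporal sequence: (distance travelled from the start, elapsed time).
temporal_sequence = [(0.0, 0.0), (120.0, 10.0), (300.0, 25.0)]

def distance_at(temporal, t):
    """Network distance travelled at time t.

    Assumption: linear interpolation between neighbouring (d, t) tuples.
    """
    times = [ti for _, ti in temporal]
    if t <= times[0]:
        return temporal[0][0]
    if t >= times[-1]:
        return temporal[-1][0]
    k = bisect_right(times, t)          # first sample with time > t
    (d0, t0), (d1, t1) = temporal[k - 1], temporal[k]
    return d0 + (d1 - d0) * (t - t0) / (t1 - t0)

print(distance_at(temporal_sequence, 5.0), distance_at(temporal_sequence, 17.5))
```

Given the travelled distance, the object's position is found by walking that far along the edges of the spatial path, which is why the two components can be compressed independently.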

2.3. Perpendicular Distance and Synchronized Euclidean Distance

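The perpendicular distance of point $P_i$ is the shortest distance between the current point and the segment connecting the first and last points of the trajectory, while the Synchronized Euclidean Distance of point $P_i$ is the distance between the actual point and the synchronized point acquired by interpolating between the precursor point and the successor point of the current point. In Figure 4, the perpendicular distance of point $P_i$ is denoted as PD and the Synchronized Euclidean Distance of point $P_i$ is denoted as SED.

As shown in Figure 4, $P_i = (x_i, y_i, t_i)$ is the current sampling point, $P_sP_e$ is the segment connecting the first point $P_s = (x_s, y_s, t_s)$ and the last point $P_e = (x_e, y_e, t_e)$ of the trajectory, $P_i' = (x_i', y_i')$ is the synchronized point of $P_i$ in segment $P_sP_e$, and the coordinate of $P_i'$ is calculated by

$$x_i' = x_s + \frac{t_i - t_s}{t_e - t_s}(x_e - x_s), \qquad y_i' = y_s + \frac{t_i - t_s}{t_e - t_s}(y_e - y_s). \tag{1}$$

The perpendicular distance and Synchronized Euclidean Distance between $P_i$ and $P_sP_e$ are calculated by formula (2), according to formula (1):

$$PD = \frac{\lvert (y_e - y_s)x_i - (x_e - x_s)y_i + x_e y_s - y_e x_s \rvert}{\sqrt{(x_e - x_s)^2 + (y_e - y_s)^2}}, \qquad SED = \sqrt{(x_i - x_i')^2 + (y_i - y_i')^2}. \tag{2}$$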
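The two error measures of this section can be sketched in a few lines. Points are assumed to be `(x, y, t)` tuples, the function names are chosen for this example, and the perpendicular distance is computed to the infinite line through the two end points, which coincides with the distance to the segment whenever the projection falls inside it.

```python
import math

def synchronized_point(p, start, end):
    """Position on start-end at the time stamp of p (linear in time)."""
    (xs, ys, ts), (xe, ye, te) = start, end
    if te == ts:                               # no time span to interpolate
        return (xs, ys)
    r = (p[2] - ts) / (te - ts)
    return (xs + r * (xe - xs), ys + r * (ye - ys))

def sed(p, start, end):
    """Synchronized Euclidean Distance of p with respect to start-end."""
    sx, sy = synchronized_point(p, start, end)
    return math.hypot(p[0] - sx, p[1] - sy)

def perpendicular_distance(p, start, end):
    """Distance from p to the line through start and end (time ignored)."""
    (xs, ys, _), (xe, ye, _) = start, end
    dx, dy = xe - xs, ye - ys
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(p[0] - xs, p[1] - ys)
    return abs(dy * (p[0] - xs) - dx * (p[1] - ys)) / norm

start, end, p = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (2.0, 3.0, 8.0)
print(perpendicular_distance(p, start, end), sed(p, start, end))
```

In this example the point is spatially close to the segment but is recorded late, so its SED is more than twice its perpendicular distance; this is the temporal information that a purely spatial metric discards.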

2.4. Trajectory Similarity

The trajectory similarity is calculated by measuring the similarity between two trajectories or subtrajectories utilizing Euclidean distance, PCA Plus Euclidean distance, Hausdorff distance, Fréchet distance, and so on. In this section, we introduce 4 classical trajectory similarity measurements.

2.4.1. Euclidean Distance

Let and be -dimensional trajectory segments with length of . Their Euclidean distance denoted as is given in
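A point-by-point Euclidean comparison of two equal-length segments can be sketched as below. Averaging the pointwise distances over the segment length is an assumption made for this example (a plain sum is another common convention), and `euclidean_distance` is a hypothetical name.

```python
import math

def euclidean_distance(seg_a, seg_b):
    """Mean pointwise Euclidean distance between equal-length segments.

    Each segment is a sequence of coordinate tuples of the same dimension.
    """
    if len(seg_a) != len(seg_b) or not seg_a:
        raise ValueError("segments must be non-empty and of equal length")
    return sum(math.dist(a, b) for a, b in zip(seg_a, seg_b)) / len(seg_a)

print(euclidean_distance([(0, 0), (1, 0)], [(0, 3), (1, 4)]))
```

The requirement of equal length is the main practical limitation of this measure and motivates the alternatives discussed in the following subsections.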

2.4.2. PCA Plus Euclidean Distance

When computing the PCA (Principal Component Analysis) Plus Euclidean distance, the trajectory is first represented as a 1D signal by concatenating the $x$ and the $y$ projections. Then, the location signal is converted into the first few PCA coefficients. The trajectory similarity is the Euclidean distance computed with the PCA coefficients, as shown in

Here, and are, respectively, the th PCA coefficient in two-dimensional space trajectory segments and , whose length is , and .

2.4.3. Hausdorff Distance

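Given 2 trajectory segments $L_i$ and $L_j$, their Hausdorff distance, denoted as $D_H(L_i, L_j)$, is given in

$$D_H(L_i, L_j) = \max\bigl(h(L_i, L_j),\, h(L_j, L_i)\bigr), \qquad h(L_i, L_j) = \max_{a \in L_i} \min_{b \in L_j} dist(a, b). \tag{5}$$

In the formula, $h(L_i, L_j)$ is the direct Hausdorff distance of $L_i$ and $L_j$, and $dist(a, b)$ is the Euclidean distance between sampling points $a$ and $b$ in $L_i$ and $L_j$, respectively.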
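The Hausdorff distance is the larger of the two directed distances, each of which takes, for every point of one segment, the distance to its nearest point on the other segment. A brute-force sketch, with function names chosen for this example:

```python
import math

def directed_hausdorff(seg_a, seg_b):
    """Largest distance from a point of seg_a to its nearest point of seg_b."""
    return max(min(math.dist(a, b) for b in seg_b) for a in seg_a)

def hausdorff(seg_a, seg_b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(seg_a, seg_b),
               directed_hausdorff(seg_b, seg_a))

seg_a = [(0, 0), (1, 0)]
seg_b = [(0, 1), (1, 1), (5, 1)]
print(directed_hausdorff(seg_a, seg_b), hausdorff(seg_a, seg_b))
```

The example shows why both directions are needed: every point of `seg_a` has a close neighbour in `seg_b`, but the outlying point `(5, 1)` of `seg_b` has none in `seg_a`. The measure ignores the order of the points, which the Fréchet distance below takes into account.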

2.4.4. Discrete Fréchet Distance

Discrete Fréchet distance fully considers the location and sequential relationship of the point in trajectories while measuring their similarity. It scans the points on two trajectories and calculates its Euclidean distance point by point. The maximum Euclidean distance is the Discrete Fréchet distance between two trajectories. The calculating formula is shown as

Here, and are the trajectory segments whose lengths are and , respectively. Consider . and are the th points on trajectory segments and , respectively. is the Euclidean distance between and .
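For reference, the sketch below computes the discrete Fréchet distance in its standard coupling-based form (the dynamic program of Eiter and Mannila), which is not necessarily the exact formula intended above: it minimizes, over all order-preserving alignments of the two point sequences, the largest distance between aligned points. It can therefore differ from a plain index-by-index scan when the segments have different lengths or sampling rates. The function name is chosen for this example.

```python
import math

def discrete_frechet(seg_a, seg_b):
    """Discrete Fréchet distance via dynamic programming.

    ca[i][j] holds the smallest achievable maximum pairwise distance over
    all order-preserving couplings of seg_a[:i+1] and seg_b[:j+1].
    """
    m, n = len(seg_a), len(seg_b)
    if m == 0 or n == 0:
        raise ValueError("segments must be non-empty")
    ca = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(seg_a[i], seg_b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[m - 1][n - 1]

print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (2, 1)]))
```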

2.4.5. Others

In addition to the 4 trajectory similarity measures discussed above, Vlachos et al. put forward longest common subsequence which is different from distance calculation and is used to obtain the longest common subsequence existing in two trajectory sequences [32]. Chen et al. proposed Dynamic Time Warping method which is a well-known technique to find an optimal alignment between two given (time-dependent) sequences under certain restrictions [33]. Lee et al. put forward a comprehensive distance function which is composed of three components: the angle distance, the parallel distance, and the perpendicular distance [1]. The method overcomes the limitations of the trajectory similarity measure by the length of trajectory segments. It can more comprehensively measure the similarity between trajectory segments. Yuan et al. extract trajectory structure and propose a structure similarity measurement for comparing trajectories in microlevel [34].

2.5. Compressive Sensing

Compressive sensing (CS) is an efficient signal processing technique for acquiring and reconstructing a signal by finding solutions to underdetermined linear systems. It is also known as compressed sensing, compressive sampling, or sparse sampling. CS rests on the principle that, through optimization, the sparsity of a signal can be exploited to recover it from far fewer samples than required by the Shannon-Nyquist sampling theorem. There are two conditions under which recovery is possible. The first is sparsity, which requires the signal to be sparse in some domain. The second is incoherence, which is applied through the restricted isometry property and is sufficient for sparse signals [35].

In this section, we briefly discuss CS as given in [16]. Given a vector $x \in \mathbb{R}^n$, its representation $\theta$ can be computed on a basis $\Psi$ by solving the linear equation $x = \Psi\theta$. The representation $\theta$ is said to be compressible if it has a large number of elements with small magnitude. If there is a basis $\Psi$ on which a given vector $x$ has a compressible representation $\theta$, then $x$ is also compressible. Compressive sensing considers the problem of recovering an unknown compressible vector $x$ from its projections. Let $\Phi$ be an $m \times n$ projection matrix with $m < n$. Consider the equation

$$y = \Phi x + e, \tag{7}$$

where $e$ is a noise vector whose norm is bounded by $\epsilon$. Compressive sensing aims to reconstruct $x$ from $y$ and $\Phi$ given the knowledge that $x$ is compressible on the basis $\Psi$. Compressive sensing shows that under certain conditions it is possible to recover $x$ by solving the following optimization problem:

$$\hat{\theta} = \arg\min_{\theta} \lVert \theta \rVert_1 \quad \text{subject to} \quad \lVert y - \Phi\Psi\theta \rVert_2 \le \epsilon. \tag{8}$$

Given $\hat{\theta}$, $x$ can be estimated from $\hat{x} = \Psi\hat{\theta}$.

In the context of trajectory compression, $x$ is the trajectory measured by a Mobile Sensor Networks (MSN) node. The dimension $n$ of $x$ is large. The MSN node computes $y = \Phi x$ and transmits $y$ to the server. The server can compute an estimated trajectory $\hat{x}$ by using $y$, $\Phi$, and $\Psi$ to solve the aforementioned optimization problem as shown in (8). Note that the compression is lossy, with $m/n$ representing both space savings and reduction in wireless transmission requirement.

3. Trajectory Compression Algorithms

In this section, we comprehensively analyze moving object trajectory compression algorithms, which have been one of the research hotspots in the moving object data mining field. Existing trajectory compression algorithms fall into 2 categories: single trajectory compression (STC) and multiple trajectory compression (MTC). The former compresses each trajectory individually, ignoring the commonalities among trajectories, and the latter compresses several trajectories or subtrajectories at the same time by exploiting the commonalities among them (such as similarity). There are several existing classification strategies for compression algorithms, but with the rapid development of compression technology they are unable to cover all of the compression algorithms. Therefore, in this paper, we present a new classification strategy that divides trajectory compression algorithms into 5 categories on the basis of compression theories.

3.1. Distance Based Trajectory Compression

Distance information (such as perpendicular distance and Synchronized Euclidean Distance) is one of the most classic and common compression metrics in trajectory compression algorithms. Since 1973, many researchers have worked on compressing trajectories by deciding, on the basis of distance, whether a sampling point is retained. The earliest distance based trajectory compression algorithm is the Douglas-Peucker algorithm proposed by Douglas and Peucker [9], which recursively selects the point whose perpendicular distance is greater than a given threshold until all retained points meet the condition. Keogh et al. put forward the Opening Window algorithm, which compresses trajectory data online based on perpendicular distance. A variant of the Douglas-Peucker algorithm called the top-down time-ratio algorithm, which takes full account of spatial-temporal characteristics by replacing perpendicular distance with SED, was proposed by Meratnia and de By [3]. Then, an extension to Opening Window called the Opening Window Time-Ratio algorithm, which uses SED instead of perpendicular distance to take temporal features into account, was proposed by Wu and Cao [2]. Gudmundsson et al. developed an implementation of the Douglas-Peucker algorithm which works efficiently even when the polygonal path given as input is allowed to self-intersect [11].

Perpendicular distance based trajectory compression is simple and efficient, but it considers only the spatial features and ignores the temporal features of trajectories. Synchronized Euclidean Distance based trajectory compression is equally simple and efficient and has a better compression effect, because it takes the spatial-temporal features of trajectories into account. Distance based trajectory compression provides an effective way to compress trajectory data with satisfactory results and has been applied in many fields, such as animal migration, hurricane prediction, and aerospace. However, because it focuses on keeping the holistic geometrical characteristics of trajectories, it has obvious shortcomings in processing limited trajectories, such as trajectories of human activities in urban areas and taxi tracks, and in keeping the internal features of trajectories.
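To make the online, distance based family concrete, the following is a minimal sketch in the spirit of an opening window with SED as the error measure. It is not the published algorithm: points are assumed to be `(x, y, t)` tuples, the function names are chosen for this example, and when the threshold is violated the point just before the end of the window is kept as the new anchor, which is only one of the possible cut rules.

```python
import math

def sed(p, a, b):
    """Synchronized Euclidean Distance of p with respect to segment a-b."""
    (xa, ya, ta), (xb, yb, tb) = a, b
    r = 0.0 if tb == ta else (p[2] - ta) / (tb - ta)
    return math.hypot(p[0] - (xa + r * (xb - xa)), p[1] - (ya + r * (yb - ya)))

def opening_window_sed(points, epsilon):
    """Grow a window from the anchor until some inner point violates epsilon."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    anchor, end = 0, 2
    while end < len(points):
        violated = any(sed(points[i], points[anchor], points[end]) > epsilon
                       for i in range(anchor + 1, end))
        if violated:
            anchor = end - 1            # keep the last point that still fit
            kept.append(points[anchor])
            end = anchor + 2            # reopen the window from the new anchor
        else:
            end += 1                    # the window keeps opening
    kept.append(points[-1])
    return kept

turn = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (2, 1, 3), (2, 2, 4)]
print(opening_window_sed(turn, 0.1))
```

Because the window never revisits points before the current anchor, the method can run on a stream, at the price of a result that may retain more points than an offline top-down pass.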

3.2. Velocity Based Trajectory Compression

Velocity is one of the most basic features of moving objects and reflects both their motion features and the internal features of their trajectories. Research on compressing trajectory data based on velocity is still immature. A well-known velocity based method is the top-down speed-based algorithm proposed by Meratnia and de By [3], which improves the existing compression techniques by exploiting the spatiotemporal information hidden in the time series, obtained by analyzing the speeds derived from subsequent segments of the trajectory. A large difference between the travel speeds of two subsequent segments is a criterion that can be applied to retain the data point in the middle. An online algorithm called the Dead Reckoning algorithm, proposed by Trajcevski et al. [22], compresses a trajectory by estimating the successor point from the current point and its velocity. A polynomial-time algorithm for optimal direction-preserving trajectory simplification, which supports a broader application range than position-preserving simplification, proposed by Long et al. [24], can also be regarded as velocity based trajectory compression. This method uses, as its error measure, the maximum angular difference between the direction of movement during each time period in the original trajectory and the direction of movement during the same time period in its simplification.

Velocity based trajectory compression is simple and efficient and can keep the internal features of trajectories; however, it is not popular, because the existing methods take only speed into account, which may lead to greater errors and break the holistic geometrical characteristics of trajectories. In future studies, we hope that researchers will pay attention to compressing trajectory data using other features of velocity (such as direction and acceleration) in addition to its magnitude.
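The dead-reckoning idea described in this section can be sketched as follows. The example assumes that every sample already carries a velocity vector, as a tuple `(x, y, t, vx, vy)`, which is a simplification made here; the function name and the threshold rule are likewise illustrative and not taken from [22].

```python
import math

def dead_reckoning(samples, epsilon):
    """Keep a sample only when the position predicted from the last kept
    sample and its velocity drifts more than epsilon from the real one."""
    if not samples:
        return []
    kept = [samples[0]]
    ax, ay, at, avx, avy = samples[0]           # current anchor
    for s in samples[1:]:
        x, y, t, _, _ = s
        dt = t - at
        px, py = ax + avx * dt, ay + avy * dt   # predicted position
        if math.hypot(x - px, y - py) > epsilon:
            kept.append(s)                      # prediction failed: re-anchor
            ax, ay, at, avx, avy = s
    return kept

samples = [(0, 0, 0, 1, 0), (1, 0, 1, 1, 0), (2, 0, 2, 1, 0),
           (2, 1, 3, 0, 1), (2, 2, 4, 0, 1)]
print(dead_reckoning(samples, 0.5))
```

Only the first sample and the first sample after the turn are kept, since every other position is predicted exactly from an anchor and its velocity. Note that the turning point itself is dropped, which illustrates how a purely velocity based criterion can break the geometric shape of the trajectory.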

3.3. Semantic Trajectory Compression

Semantic information in road networks has more practical significance in representing moving object trajectories that are collected from limited moving objects. Semantic trajectory compression stores trajectories in the form of semantic information in the road network instead of trajectory points, compresses the spatial information in trajectory data by spatial compression methods, and compresses the temporal information by temporal compression techniques, until some halting condition is met. This novel representation, which replaces trajectory data with semantic information in the road network, was proposed by Schmid et al. [12] in 2009. Many researchers have paid attention to semantic trajectory compression since then. Richter et al. [13] applied semantic trajectory compression to a human motion dataset in an urban area; their method identifies the relevant reference points along the trajectory, determines all possible descriptions of how movement continues from each of them, and exploits the motion feature descriptions of the reference points to compress trajectory data. Song et al. [14] proposed a new framework, namely, Paralleled Road-Network-Based Trajectory Compression (PRESS), to effectively compress trajectory data under road network constraints. Different from existing works, PRESS uses a novel representation that separates the spatial representation of a trajectory from its temporal representation and proposes a Hybrid Spatial Compression (HSC) algorithm and an error Bounded Temporal Compression (BTC) algorithm to compress the spatial and temporal information of trajectories, respectively.

Semantic trajectory compression is only suitable for limited moving objects, such as objects moving in road networks, in urban areas, or along orbits, for which it yields more realistically meaningful compression results.

3.4. Similarity Based Trajectory Compression

Similarity based trajectory compression splits original trajectories into subtrajectories and then clusters subtrajectories with high similarity into the same group and those with low similarity into different groups. It then unifies the spatial information of the trajectory data in each group by a certain strategy, keeping one set of spatial information and all temporal information per group, until some halting condition is met. A well-known example is the similarity based compression of GPS trajectory data proposed by Birnbaum et al. [23], which splits trajectories into subtrajectories according to the similarities among them. For each collection of similar subtrajectories, this technique stores only one subtrajectory's spatial data. Each subtrajectory is then expressed as a mapping between itself and a previous subtrajectory.

Similarity based trajectory compression has great advantages in retaining the commonalities among trajectories. It is suitable for trajectory sets but may not be suitable for a single trajectory, for which the error may be large.

3.5. Priority Queue Based Trajectory Compression

Priority queue based trajectory compression selects the best subset of trajectory points and permanently removes redundant and inessential trajectory points from original trajectory by utilizing local optimization strategies, until some halting condition is met. The Spatial QUalIty Simplification Heuristic (SQUISH) method based on the priority queue data structure proposed by Muckell et al. [26] prioritizes the most important points in a trajectory stream. It uses local optimization to select the best subset of points and permanently removes redundant or insignificant points from the original GPS trajectory. Three years later, Muckell et al. [28] presented a new version of SQUISH, called SQUISH-E (Spatial QUalIty Simplification Heuristic-Extended) which has the flexibility of tuning compression with respect to compression ratio and error.

Priority queue based trajectory compression is an online algorithm that also requires the size of the memory buffer to be preset. Hence, it is well suited to real-time applications and small memory devices. It is applicable to all kinds of trajectories, but its compression and matching quality may be slightly worse than those of the other compression methods.
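The buffer-based idea described above can be illustrated with a short sketch. It is a simplified illustration in the spirit of SQUISH and not the published algorithm of Muckell et al.: the names `sed` and `squish_sketch`, the `(x, y, t)` tuple layout, the use of planar coordinates, and the linear scan that stands in for a real priority queue are all assumptions made here for brevity.

```python
import math

def sed(a, p, b):
    """SED of p against segment a->b; every point is an (x, y, t) tuple."""
    if b[2] == a[2]:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    r = (p[2] - a[2]) / (b[2] - a[2])          # time ratio of p between a and b
    ex, ey = a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1])
    return math.hypot(p[0] - ex, p[1] - ey)

def squish_sketch(points, capacity):
    """Stream `points` through a buffer holding at most `capacity` points."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    buf = []  # entries: [point, priority, carried_error]

    def refresh(i):
        # Interior points: carried error plus SED against current neighbours.
        # The first and last buffered points are never removed.
        if 0 < i < len(buf) - 1:
            buf[i][1] = buf[i][2] + sed(buf[i - 1][0], buf[i][0], buf[i + 1][0])
        else:
            buf[i][1] = math.inf

    for p in points:
        buf.append([p, math.inf, 0.0])
        if len(buf) >= 3:
            refresh(len(buf) - 2)              # previous last point is now interior
        if len(buf) > capacity:
            # Drop the interior point whose removal is estimated to cost least.
            k = min(range(1, len(buf) - 1), key=lambda i: buf[i][1])
            dropped = buf.pop(k)[1]
            for i in (k - 1, k):               # former neighbours of the dropped point
                buf[i][2] += dropped           # remember the error already introduced
                refresh(i)
    return [entry[0] for entry in buf]
```

For example, `squish_sketch([(0, 0, 0), (1, 0, 1), (2, 0, 2), (2, 1, 3), (2, 2, 4)], 3)` keeps the two endpoints and the corner point `(2, 0, 2)`. A production implementation would replace the linear scan with a heap so that each update costs logarithmic time.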

3.6. Others

If the movement patterns and internal features of trajectories are neglected, applications such as trajectory clustering, outlier detection, and activity discovery may not be as accurate as expected. Therefore, we expect that structure features based trajectory compression, which compresses trajectories based on their movement patterns and structural features (such as the moving direction of objects, the internal fluctuation of trajectories, and trajectory velocity or acceleration), will attract more attention from researchers. One example is the polynomial-time algorithm for optimal direction-preserving simplification proposed by Long et al., which supports a broader range of applications than position-preserving simplification [24].

At present, most of the portable equipment used for data collection is inexpensive, power saving, and of low computational capability, while data processing is often performed on supercomputers with much higher computational capability. In order to effectively reduce the transport cost, we expect that compressive sensing based trajectory compression, which reduces the data scale during data acquisition by combining compressive sensing with trajectory features, will also attract more attention from researchers. For instance, in 2011 Rana et al. presented an adaptive algorithm for compressive approximation of trajectories, which performs trajectory compression so as to maximize the information about the trajectory subject to limited bandwidth [36]. Four years later, Rana et al. proposed another method, an adaptive (lossy) trajectory compression algorithm based on compressive sensing, which has two innovative elements [16]. First, they propose a method to compute a deterministic projection matrix from a learnt dictionary. Second, they propose a method for the mobile nodes to adaptively predict the number of projections needed based on the speed of the mobile nodes.

4. Validation Criteria of Compression Performance

Validating compression results is essential because it measures the level of success and correctness reached by an algorithm. There are several ways to validate results, mainly Analysis, Experience, Evaluation, and Example. Analysis relies on rigorous derivation and proof or on carefully designed experiments with statistically significant results. Experience applies the approach in real-world scenarios or projects, and evidence of its correctness (usefulness or effectiveness) is obtained from the process of execution. Evaluation uses a set of examples to illustrate the proposed approach, with a nonsystematic analysis of the information gathered from executing the examples. Example uses only one or several small-scale examples to illustrate the proposed approach, without any evaluation or comparison of the execution results. In this section, we mainly discuss two kinds of validation metrics. The first is performance metrics, which are used to compare the efficiency and performance of trajectory compression algorithms. The second is accuracy metrics, which are used to compare the accuracy and information loss of trajectory compression algorithms. To facilitate the discussion, this section denotes the original trajectory as OT, whose length is $n$, and the compressed trajectory as RT, whose length is $m$.

4.1. Performance Metrics
4.1.1. Compression Ratio

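Compression ratio (CR) is an important index for measuring trajectory compression performance. With $n$ and $m$ denoting the numbers of points in the original trajectory OT and the compressed trajectory RT, it is defined as

$$CR = \left(1 - \frac{m}{n}\right) \times 100\%.$$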

Compression ratio is the most common compression index and directly reflects the change in the size of the trajectory data. However, it is influenced by factors such as the sampling rate and quantization accuracy of the original signal, so it is difficult to use as a fully objective measurement. For instance, a compression ratio of 70% indicates that 30% of the original points remain in the compressed representation of the trajectory; namely, if there are 100 points in the original trajectory, only 30 points are retained after compression.
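The arithmetic of this example can be checked with a few lines of code. The sketch below assumes that the compression ratio is the percentage of points removed, which is the reading implied by the 70%/30% example; the function name `compression_ratio` is chosen here for illustration.

```python
def compression_ratio(n_original, n_compressed):
    """Percentage of points removed, assuming CR = (1 - m / n) * 100."""
    if n_original <= 0:
        raise ValueError("the original trajectory must contain at least one point")
    return (1.0 - n_compressed / n_original) * 100.0

# 100 original points reduced to 30 retained points.
ratio = compression_ratio(100, 30)
```

With 100 original points and 30 retained points, `ratio` evaluates to 70 (up to floating-point rounding), matching the example in the text.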

4.1.2. Compression Time

Compression time is an important index for measuring the efficiency of trajectory compression; it reflects the total time required by the compression. For example, a compression time of 24 indicates that compressing the original trajectory takes 24 ms in total.
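One possible way to measure compression time in milliseconds is sketched below. The helper name `timed_compress` and the convention that a compressor is a function mapping a list of points to a list of points are assumptions of this sketch; `time.perf_counter` is the monotonic high-resolution clock of the Python standard library.

```python
import time

def timed_compress(compress, trajectory):
    """Run `compress` on `trajectory` and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = compress(trajectory)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Toy compressor used only for illustration: keep every second point.
result, elapsed_ms = timed_compress(lambda t: t[::2], list(range(10)))
```

In practice, the measurement is usually repeated several times and averaged, since a single run is sensitive to system load.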

4.2. Accuracy Metrics
4.2.1. Spatial Error

Given an original trajectory OT and its compressed representation RT, the spatial error (SplE) of RT with respect to a point $P_i$ in OT is defined as the distance between $P_i$ and its estimation $P_i'$. If RT contains $P_i$, then $P_i'$ is $P_i$ itself (as for the retained points in Figure 5). Otherwise, $P_i'$ is defined as the closest point to $P_i$ along the line between the precursor point and the successor point of $P_i$ in trajectory RT, that is, the nearest retained points before and after $P_i$. Therefore, the spatial error of RT with respect to $P_i$ is the perpendicular distance from $P_i$ to that line.
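A minimal sketch of this computation is given below. It assumes planar (projected) coordinates rather than raw latitude and longitude, and it identifies the compressed trajectory by the sorted indices of the retained points, including the first and last points; the names `perpendicular_distance` and `spatial_errors` are illustrative.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b; points are (x, y)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:                      # a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def spatial_errors(ot, kept):
    """Spatial error of every point of `ot`; `kept` holds retained indices."""
    errors = [0.0] * len(ot)               # retained points have zero error
    for s, e in zip(kept, kept[1:]):       # consecutive retained points
        for i in range(s + 1, e):          # discarded points between them
            errors[i] = perpendicular_distance(ot[i], ot[s], ot[e])
    return errors
```

For the trajectory `[(0, 0), (1, 1), (2, 0)]` with retained indices `[0, 2]`, the discarded middle point lies one unit away from the line joining its precursor and successor, so its spatial error is 1.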

4.2.2. SED Error

Temporal characteristics of trajectory data are not considered in the spatial error, so the Synchronized Euclidean Distance (SED) is introduced to overcome this limitation. SED is also the distance between $P_i$ and its estimation $P_i'$, but here $P_i'$ has the same time coordinate as $P_i$ and is obtained by linear interpolation. If RT contains $P_i$, then $P_i'$ is $P_i$ itself (as for the retained points in Figure 6). Otherwise, $P_i'$ is defined as the location in trajectory RT that has the same time coordinate as $P_i$. Therefore, the SED error of RT with respect to $P_i$ is the distance between $P_i$ and $P_i'$.
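The difference from the spatial error can be seen in a short sketch. It assumes points of the form `(x, y, t)` in planar coordinates and takes the precursor `a` and successor `b` of the point in the compressed trajectory as inputs; the name `sed_error` is illustrative.

```python
import math

def sed_error(p, a, b):
    """SED of p with respect to the retained points a (before) and b (after)."""
    if b[2] == a[2]:                       # degenerate time interval
        return math.hypot(p[0] - a[0], p[1] - a[1])
    r = (p[2] - a[2]) / (b[2] - a[2])      # fraction of the time interval elapsed
    ex = a[0] + r * (b[0] - a[0])          # time-synchronized estimate of p
    ey = a[1] + r * (b[1] - a[1])
    return math.hypot(p[0] - ex, p[1] - ey)

# The object was at (1, 1) at t = 1, halfway in time between a and b.
error = sed_error((1, 1, 1), (0, 0, 0), (4, 0, 2))
```

Here the time-synchronized estimate is `(2, 0)`, so the SED error is the square root of 2, whereas the perpendicular distance to the same segment is only 1. The SED error is never smaller than the perpendicular distance, because the estimate is constrained to the position implied by the timestamp.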

4.2.3. Heading Error

Heading error (HE) is the angular deflection between the moving direction from the actual location point $P_i$ to $P_{i+1}$ along the original trajectory and the moving direction from the estimated location point $P_i'$ to $P_{i+1}'$ along the compressed trajectory. The estimation $P_i'$, which has the same time coordinate as $P_i$, is obtained by linear interpolation. As shown in Figure 7, to facilitate the calculation, the clockwise direction is taken as positive and the anticlockwise direction as negative.
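A possible computation of the heading error is sketched below. The sketch assumes planar coordinates with the y axis pointing north, measures headings clockwise from north, and reports the signed angle from the actual heading to the estimated heading, wrapped to the interval [-180, 180) degrees. The sign convention relative to Figure 7 and the function names are assumptions.

```python
import math

def bearing_deg(a, b):
    """Heading from a to b in degrees, measured clockwise from the +y axis."""
    return math.degrees(math.atan2(b[0] - a[0], b[1] - a[1])) % 360.0

def heading_error_deg(p_i, p_next, e_i, e_next):
    """Signed deflection from the actual heading (p_i -> p_next) to the
    estimated heading (e_i -> e_next); clockwise is positive."""
    diff = bearing_deg(e_i, e_next) - bearing_deg(p_i, p_next)
    return (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)

# Actual movement heads north, estimated movement heads east.
error = heading_error_deg((0, 0), (0, 1), (0, 0), (1, 0))
```

In this example the estimated heading is rotated a quarter turn clockwise from the actual heading, so the error is 90 degrees; swapping the two movements gives -90 degrees.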

4.2.4. Speed Error

Speed error (SpdE) is an important metric for various kinds of transit applications. For instance, velocity measurement systems identify overspeed hotspots from velocity information [37], and acceleration and deceleration data help to identify irregular driving behaviors, which helps police to detect illegal vehicle activities [38]. Speed error is computed in a similar way to heading error, except that it takes the difference between the actual velocity and the estimated velocity instead of the angular deflection.
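By analogy with the heading error, the speed error can be sketched as follows. Points are assumed to be `(x, y, t)` tuples in planar coordinates with strictly increasing timestamps, and the error is taken as the actual speed minus the estimated speed; the function names are illustrative.

```python
import math

def speed(a, b):
    """Average speed between two (x, y, t) points."""
    dt = b[2] - a[2]
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return math.hypot(b[0] - a[0], b[1] - a[1]) / dt

def speed_error(p_i, p_next, e_i, e_next):
    """Actual speed (p_i -> p_next) minus estimated speed (e_i -> e_next)."""
    return speed(p_i, p_next) - speed(e_i, e_next)

# Actual movement covers 5 units in 1 time unit, the estimate only 3 units.
error = speed_error((0, 0, 0), (3, 4, 1), (0, 0, 0), (3, 0, 1))
```

The actual speed is 5 and the estimated speed is 3, so the speed error is 2 distance units per time unit.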

4.2.5. Information Loss Degree

Information Loss Degree (ILD) is a comprehensive index of the accuracy and error of trajectory compression results and thus of the effectiveness of trajectory compression. It can be calculated from the SED distance, the Dynamic Time Warping (DTW) distance, and the Speed Corner between the original trajectory and the compressed trajectory.

Information Loss Degree based on SED (ILDSED) is the mean value of the maximum SED distance error $SED_{\max}$, the average SED distance error $SED_{avg}$, and the minimum SED distance error $SED_{\min}$ between the original trajectory OT and the compressed trajectory RT, which can be calculated as

$$ILD_{SED} = \frac{SED_{\max} + SED_{avg} + SED_{\min}}{3}.$$
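As a small worked example, the sketch below computes this mean from a list of per-point SED errors. The function name `ild_sed` and the assumption that the SED errors have already been computed for every point of the original trajectory are choices made here for illustration.

```python
def ild_sed(sed_errors):
    """Mean of the maximum, average, and minimum SED error."""
    if not sed_errors:
        raise ValueError("at least one SED error is required")
    sed_max = max(sed_errors)
    sed_min = min(sed_errors)
    sed_avg = sum(sed_errors) / len(sed_errors)
    return (sed_max + sed_avg + sed_min) / 3.0

# Three points with SED errors 0, 1, and 2.
value = ild_sed([0.0, 1.0, 2.0])
```

For the errors 0, 1, and 2, the maximum is 2, the average is 1, and the minimum is 0, so the resulting value is 1.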

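Information Loss Degree based on DTW (ILDdtw) is measured by the time warping distance between the original trajectory OT and the compressed trajectory RT, which can be calculated recursively as

$$ILD_{dtw}(OT, RT) = \begin{cases} 0, & n = m = 0, \\ \infty, & \text{exactly one of } n, m \text{ is } 0, \\ SED(P_1, Q_1) + \min\{ILD_{dtw}(Rest(OT), Rest(RT)),\ ILD_{dtw}(Rest(OT), RT),\ ILD_{dtw}(OT, Rest(RT))\}, & \text{otherwise}. \end{cases}$$

Here, $n$ and $m$ are the lengths of OT and RT, $SED(P_1, Q_1)$ is the SED error between points $P_1$ and $Q_1$, which are the first points of OT and RT, respectively, and $Rest(OT)$ and $Rest(RT)$ are the trajectories that remain after removing the first sampling point. ILDdtw thus calculates the Information Loss Degree from the DTW error.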
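The recursive definition of a time warping distance is usually evaluated with dynamic programming rather than by direct recursion. The sketch below shows the standard DTW table computation; it assumes planar `(x, y)` points and, as a simplification, uses the Euclidean distance between the two matched points as the point-to-point cost. The name `dtw_distance` and the optional `cost` parameter are illustrative.

```python
import math

def dtw_distance(ot, rt, cost=None):
    """Dynamic Time Warping distance between two point sequences."""
    if cost is None:
        cost = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    n, m = len(ot), len(rt)
    # d[i][j] is the DTW distance between the first i points of ot
    # and the first j points of rt.
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(ot[i - 1], rt[j - 1]) + min(
                d[i - 1][j - 1],   # advance in both trajectories
                d[i - 1][j],       # advance in the original only
                d[i][j - 1],       # advance in the compressed only
            )
    return d[n][m]

original = [(0, 0), (1, 1), (2, 0)]
compressed = [(0, 0), (2, 0)]
distance = dtw_distance(original, compressed)
```

In this example the discarded point `(1, 1)` is matched to one of the retained endpoints at a cost equal to the square root of 2, which is the resulting distance. The table needs time and memory proportional to the product of the two trajectory lengths.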

Information Loss Degree based on Speed Corner (ILDcorner) is measured by the original and compressed Speed Corner of moving objects which can be calculated as

5. Public Trajectory Datasets

There are quite a few real trajectory datasets that are publicly available. In this section, several of them are described in terms of their sources, characteristics, sampling rates, and so on.

5.1. GeoLife Trajectory Dataset

A GPS trajectory dataset from the Microsoft Research GeoLife project was collected by 182 users over a period of more than 5 years, from April 2007 to August 2012. Each GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of which contains latitude, longitude, and altitude. The dataset, whose size is 1.55 GB, contains 17,621 trajectories with a total distance of 1,292,951 kilometers and a total duration of 50,176 hours. These trajectories were recorded by different GPS loggers and GPS-phones and have a variety of sampling rates. 91.5 percent of the trajectories are logged in a dense representation, for example, one point every 1~5 seconds or every 5~10 meters.

5.2. T-Drive Taxi Trajectories

A sample of trajectories from the Microsoft Research T-Drive project was generated by over 30,000 taxicabs over a period of 6 months, from March 2009 to August 2009. The total distance traveled by the taxis is more than 800 million kilometers and the total number of GPS points is nearly 1.5 billion. The size of the dataset is 756 MB, and the average sampling interval and the average distance between two consecutive points are around 3.1 minutes and 300 meters, respectively.

5.3. GPS Trajectory with Transportation Labels

This is a portion of the GPS trajectory dataset collected in the GeoLife project of Microsoft Research Asia. Each trajectory has a set of transportation mode labels, such as driving, taking a bus, riding a bike, and walking. A label file is associated with each folder that stores the trajectories of a user. Each GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of which contains latitude, longitude, height, speed, heading direction, and so forth. These trajectories were recorded by different GPS loggers or GPS-phones and have a variety of sampling rates. 95 percent of the trajectories are logged in a dense representation, for example, one point every 2~5 seconds or every 5~10 meters, while a few of them have a lower density because of device constraints. The size of the dataset is 560 MB.

5.4. Check-in Data from Location-Based Social Networks

The dataset from an LBSN in China, whose size is 10.68 MB, consists of 2,756,710 check-ins generated by 10,049 users, excluding the timestamp and relationships between users. Each check-in includes the information of ID, latitude, longitude, and timestamp.

5.5. Hurricane Trajectories

This dataset is provided by the National Hurricane Center (NHC) and contains 1,740 trajectories of Atlantic hurricanes from 1851 to 2012. The NHC also provides annotations of typical hurricane tracks for each month of the annual hurricane season, which spans from June to November. The data were collected every 6 hours.

5.6. Movebank Animal Tracking Data

Movebank is a free online database of animal tracking data that helps animal tracking researchers to manage, share, protect, analyze, and archive their data. Movebank is an international project with over 11,000 users, including people from research and conservation groups around the world. Many datasets are collected in this database, such as Continental black-tailed godwits (data from Senner et al., 2015), whose size is 5.161 MB [39], and Navigation experiments in lesser black-backed gulls (data from Wikelski et al., 2015), whose size is 29.09 MB [40].

6. Application Scenarios of Trajectory Compression

(1) The unrestricted movement of moving objects, such as a bird flying in the sky, a fish swimming in the sea, or a horse running on the grassland, is a typical application scenario of trajectory compression. Distance based and velocity based trajectory compression are highly efficient in this scenario and have good prospects in many fields, such as the study of animals’ migratory traces, behavior, and living situations and the prediction of hurricanes, tornados, and ocean currents. For instance, animal tracking data help biologists understand how individuals and populations move within local areas, migrate across oceans and continents, and evolve through millennia. This information is being used to address environmental challenges such as climate and land use change, biodiversity loss, invasive species, and the spread of infectious diseases. However, such data are usually so large that it is difficult to analyze them and to find the useful information they contain, so it is necessary to compress the data by removing the redundant data and keeping only the valuable data.

(2) The restricted movement of moving objects, such as the motion of taxis in an urban area, is another very important application scenario of trajectory compression. Semantic trajectory compression can effectively and efficiently compress the trajectory data in this scenario for transport analysis, smart city planning, and smart transportation management. For instance, vast amounts of trajectory data can be collected by vehicle positioning equipment and other devices and used to help police detect dangerous driving, predict the flow of people at important places of a city during major festivals, and trace the escape routes of criminals. However, the large scale of the data makes these tasks difficult, so the data are in urgent need of compression that removes the redundant data and retains only the valuable data in the dataset.

(3) Priority queue based trajectory compression is widely applied to small memory devices and is highly efficient in this scenario. For instance, most portable mobile devices have a small memory. If the data have to be analyzed on such devices, they can easily run out of memory and stop working. Therefore, portable mobile devices need a compression application that removes the redundant data before the analysis.

7. Conclusion and Future Work

Trajectory compression is an efficient way to reduce the size of trajectory data while preserving the useful and valuable information in large-scale datasets, and it is one of the important components of data mining technology. In this paper, the research status and recent developments of moving object trajectory compression algorithms have been surveyed and summarized. Firstly, the representative compression algorithms proposed in recent years are analyzed and summarized in terms of their algorithmic ideas, key technologies, and advantages and disadvantages. Secondly, the existing algorithms are classified into several categories according to compression theories. Thirdly, some typical validation criteria for compression results are summarized. Lastly, some application scenarios are pointed out and discussed.

Having surveyed and summarized moving object trajectory compression and its theories, methods, and techniques, we also summarize the problems and challenges that remain, which mainly include the following aspects: (1) Most current trajectory compression algorithms pay more attention to the overall geometric outline of a trajectory and ignore the movement patterns and internal features of trajectories. (2) Most current trajectory compression algorithms cannot fully combine the time dimension with the space dimensions; they simply regard the time dimension as an additional dimension of the spatial trajectory object. (3) The general applicability of trajectory compression algorithms is low. (4) Few researchers have paid attention to compressing trajectories by compressive sensing, which reduces the data scale in the process of acquiring data.

Competing Interests

No potential conflict of interests was reported by the authors.

Authors’ Contributions

Penghui Sun and Shixiong Xia contributed equally to this work.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities, China (with Grant 2015XKMS085).