Abstract

The increasing availability of location-acquisition technologies has enabled the collection of large-scale spatiotemporal trajectories, from which we can derive semantic information in urban environments, including location, time, direction, speed, and points of interest. Such semantic information gives a semantic interpretation of the movement behaviors of moving objects. However, existing semantic enrichment process approaches, which produce semantic trajectories, are generally time-consuming. In this paper, we propose an efficient semantic enrichment process framework that annotates spatiotemporal trajectories by using geographic and application domain knowledge. The framework consists of three phases: preannotated semantic trajectory storage, spatiotemporal similarity measurement, and semantic information matching. Having observed that trajectories in the same geospatial object scenes are often shared, we propose a semantic information matching algorithm that matches the semantic information in preannotated semantic trajectories to new spatiotemporal trajectories. To improve the efficiency of this approach, we build a spatial index over the preannotated semantic trajectories. Finally, experimental results on a real dataset demonstrate the effectiveness and efficiency of the proposed approach.

1. Introduction

Spatiotemporal trajectories record the spatiotemporal position sequences of moving objects. The increasing access to positioning devices, such as smartphones, GPS-enabled cameras, and sensors, results in vast volumes of collected spatiotemporal trajectories. Analyzing and mining spatiotemporal trajectories supports in-depth study in various fields such as traffic coordination and management (e.g., road flow monitoring), tourist route recommendation, and natural disaster early warning (e.g., typhoon prediction). However, many applications in the mobility domain require a semantic interpretation of movement information. This semantic interpretation is usually obtained by mining semantic trajectories, which are the fusion of spatiotemporal trajectories and semantic information. Location-based social networks (LBSN), such as Twitter and Weibo, produce multifaceted semantic information, which contains the moving state of moving objects (e.g., speed and direction) and environment information (e.g., air temperature and spatial topological relationship) [1]. Combining semantic information, such as a user’s personalized characteristics, landmark names, interests, and occupation, with the user’s spatiotemporal trajectories contributes to the recommendation of nearby hot spots of interest [2, 3]. Mining semantic trajectories [4] can therefore better meet the needs of decision analysis applications.

Different from spatiotemporal trajectories obtained by position-aware devices, semantic trajectories must be generated through semantic trajectory modeling. Semantic trajectory modeling includes trajectory data preprocessing, trajectory segmentation, and semantic enrichment. Among them, the semantic enrichment process is the key stage, which annotates appropriate semantic information (e.g., behavior attributes, environment information, and domain knowledge) in spatiotemporal trajectories. With different sources, complex types, and diverse forms of semantic information, there are different semantic enrichment process approaches.

The existing semantic enrichment process approaches can be divided into three categories: (1) Early approaches directly annotate velocity and direction in spatiotemporal trajectories. Because they lack rich semantic information, mining the semantic trajectories they produce yields results with a low level of semantic interpretation. (2) Some approaches annotate domain knowledge in spatiotemporal trajectories through ontology. However, ontology-based approaches transform a semantic trajectory into an RDF graph description [5], which makes querying and reasoning over semantic trajectories time-consuming. (3) Typical approaches annotate geographical object information, including regions of interest (ROIs), lines of interest (LOIs), and points of interest (POIs), through the spatial join algorithm [6] and the map matching algorithm [7]. The execution time of these algorithms [6, 7] is linearly correlated with the number of geospatial objects, which results in high time consumption. In short, the existing semantic enrichment process approaches share the disadvantage of high time consumption.

On the other hand, because movement is constrained by the topology of urban road networks, common movement trajectories arise in the same geospatial object scenes. For example, commuters departing from the Tsinghua Park residential area usually take Metro Line 4 to the Beijing Zhongguancun SOHO Building. Due to traffic restrictions, it is easy to collect a large number of identical commuting trajectories. The information of historical commuting trajectories can then be directly attached to new commuting trajectories. Similarly, it is possible to directly annotate the semantic information in a preannotated semantic trajectory to new spatiotemporal trajectories. Using preannotated semantic trajectories for enrichment does not need a complicated computation and annotation process, which may avoid an inefficient semantic enrichment process.

In this paper, we propose a new semantic enrichment process approach named Efficient Semantic Enrichment Process for Spatiotemporal Trajectories based on Semantic Information Matching (SEPSIM), which is the first to use semantic information in preannotated semantic trajectories for annotating spatiotemporal trajectories. We first store preannotated semantic trajectories in the form of episodes. In this phase, we segment semantic trajectories into stop or move episodes. Then, we measure the spatiotemporal similarity between subtrajectories and episodes; the similarity of stop subtrajectories and that of move subtrajectories are measured separately. Finally, we propose a new algorithm named Semantic Information Matching Algorithm based on Similar Episodes (SESIM), which matches the semantic information of episodes to a new trajectory. To reduce the search cost of measurement and matching, we build a spatial index to store the episodes of preannotated semantic trajectories.

In summary, this article makes the following contributions:

(i) We propose an efficient semantic enrichment process framework (SEPSIM) for spatiotemporal trajectories based on semantic information matching. It includes three phases: preannotated semantic trajectory storage, spatiotemporal similarity measurement, and semantic information matching. To improve the efficiency of the SEPSIM approach, we establish a spatial index.

(ii) We propose a new standard to measure the effectiveness of semantic enrichment process approaches. We also compare different semantic enrichment process approaches in terms of efficiency.

(iii) To verify the effectiveness and efficiency of the SEPSIM approach, experiments were performed on a real trajectory dataset. The results demonstrate the high effectiveness and efficiency of the SEPSIM approach.

2. Related Work

Different semantic enrichment process approaches exist for the different sources, complex types, and various forms of semantic information. Early semantic enrichment process approaches directly annotate velocity and direction in spatiotemporal trajectories, generating semantic trajectories as sequences of stop and move subtrajectories. Ashbrook and Starner [8] calculated the moving speed (whether the speed is zero) to identify stop subtrajectories. Due to poor speed measurement and other reasons, the resulting stop segments do not always match the actual situation. Krumm and Horvitz [9] calculated the speed and direction to identify stop subtrajectories; Palma et al. [10] treated subtrajectories below the average speed as stop subtrajectories, generating semantic trajectories consisting of stop and move subtrajectories. In addition to calculating the moving speed, Zheng et al. [11] also calculated the acceleration and speed change rate to discover move subtrajectories with different modes of transportation (e.g., bicycles, buses, and self-driving) to enrich the semantic trajectory. Although early semantic enrichment process approaches were fast in annotation, the semantic information was not rich enough.

Some semantic enrichment process approaches annotate domain knowledge as semantic information through ontology. Spaccapietra et al. [12] first proposed an ontological method for semantic trajectory modeling. Based on the concepts of “stop” and “move,” the ontology was used to define semantic trajectories, and the semantic information of trajectories was further enhanced using the reasoning ability of ontology. Baglioni et al. [13] extended the definition of Spaccapietra’s ontology and proposed the concept of core ontology, which formally describes the concepts of stop, move, time, place, and mode in human mobile behavior, further enriching the definition of semantic trajectories. In 2014, Vandecasteele et al. [14] combined semantic trajectories with semantic events. Nogueira et al. [15] proposed the QualiTraj ontology to describe the various motion characteristics of original trajectories, especially derived characteristics such as speed, acceleration, and direction. Building on the QualiTraj ontology, Nogueira and Martin [16] proposed a new ontology with stronger descriptive ability, namely, the Semantic Trajectory Episodes (STEP) ontology. It can describe not only basic motion characteristics but also the environmental characteristics of moving trajectories at a higher semantic level. In 2018, Nogueira et al. [17] proposed FrameSTEP, a semantic trajectory labeling framework based on the STEP ontology. This method can calculate various physical movement and spatial geometric features of trajectory segments and use reliable external resources (such as the OSM and LinkedGeoData geographic knowledge bases) to label the environmental features of trajectories. However, ontology-based approaches need to represent semantic trajectories as RDF graph descriptions, which is time-consuming.

The main source of information for semantic enrichment is geospatial objects with geometric features, including regions of interest (ROI), lines of interest (LOI), and points of interest (POI) [18]. At present, the typical semantic enrichment processing method uses the spatial join algorithm [6] to find the regions of interest (ROI) that have a topological relationship with spatiotemporal trajectories and to label the trajectories with the associated regions of interest and the corresponding topological relationship. This algorithm needs to combine external environment information (e.g., OSM map and Baidu map) to select the regions of interest associated with spatiotemporal trajectories. The execution time of the algorithm is linearly related to the number of geospatial objects, resulting in high time complexity and low semantic enrichment performance in the spatial join process. For points of interest (POI), Sun et al. used a hidden Markov model [19] to label the POI categories for staying segments of spatiotemporal trajectories, but in regions with dense POIs, a staying segment may be related to multiple points of interest. Coupled with the low GPS sampling rate, it is difficult to identify the effective POIs. On the other hand, the LOI labeling method often uses a global map matching algorithm [7] to determine the location of spatiotemporal trajectories. Parent et al. proposed a “point-segment distance” measurement method [7] to replace the original distance function in the global map matching algorithm, which is suitable for labeling lines of interest in geographical scenarios such as dense road networks, parallel roads, and intersections. The global matching algorithm needs to perform metric matching on the trajectory segments where spatiotemporal trajectories are located, which easily results in high time complexity and low semantic enrichment performance.

3. Preliminaries

In this section, we will present definitions of all necessary concepts used in this paper and formally state the problem.

3.1. Basic Concepts

The SEPSIM approach proposed in this paper is aimed at annotating semantic information of preannotated semantic trajectories in spatiotemporal trajectories. The input of this problem is a trajectory, short for a spatiotemporal trajectory. Thus, we provide the definition of “trajectory” at first.

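Definition 1 (trajectory). A trajectory $T$ is a sequence of sampling points of the form $T = \langle p_1, p_2, \ldots, p_n \rangle$, $p_i = (oid, l_i, t_i)$, where $oid$ is an object identifier and $l_i$ and $t_i$ are spatial coordinates and a time stamp, respectively. $n$ records the number of sampling points in trajectory $T$.

Definition 2 (subtrajectory). A subtrajectory is a substring of a trajectory, i.e., $T_s = \langle p_i, p_{i+1}, \ldots, p_j \rangle$, where $1 \le i \le j \le n$.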

Definition 3 (stop subtrajectory and move subtrajectory). Given the distance threshold $\varepsilon$ and the point-count threshold $MinPts$, DBSCAN clustering [20] is applied to the trajectory $T$. Each resulting cluster is a stop subtrajectory of the trajectory. That is, a subtrajectory $T_s$ in which no point $p_i$ is an outlier is a stop subtrajectory ($stop$). If point $p_i$ is the end of one stop subtrajectory and point $p_j$, $i < j$, is the beginning of the next stop subtrajectory, then $\langle p_i, \ldots, p_j \rangle$ is a move subtrajectory ($move$).

Then, we define “semantic trajectory” as the output of this problem. The main source of information for semantic enrichment is geospatial objects in the geographical environment. For this reason, semantic information matching in this paper refers to geospatial object information matching. First, we give the basic definitions related to semantic information.

Definition 4 (geospatial object). According to geometric shapes, geographical objects are divided into three categories: region of interest (ROI), line of interest (LOI), and point of interest (POI). In this paper, we refer to ROIs, LOIs, and POIs collectively as geospatial objects. A geospatial object is defined as a uniquely identified specific space site (e.g., a park, a road, or a cinema). A geospatial object is a quadruple consisting of its identifier, its category (ROI, LOI, or POI), its location attribute in terms of longitude and latitude coordinates, and its name.

Definition 5 (topological relation). For different types of geospatial objects, the topological relationship between a subtrajectory and a geospatial object is one of the following seven types: pass by (LOI), pass by (POI), pass by (ROI), across (ROI), enter (ROI), leave (ROI), and stop inside (ROI), where the category in parentheses is that of the geospatial object.

Definition 6 (episode). An episode [21] is a subtrajectory covering a semantically homogeneous section of a trajectory, such as a stop or a move. We define an episode as a multilayered semantic sequence aligned with the time of a subtrajectory, i.e., a tuple consisting of the subtrajectory corresponding to the trajectory segment, the average speed of the episode, the direction of the episode, and the corresponding geospatial information. The form of a specific episode is shown in Figure 1.

Definition 7 (semantic trajectory). A semantic trajectory is a sequence of episodes in the spatiotemporal order of a moving object.
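
Definitions 1 to 7 can be sketched as plain data structures. The following is a minimal illustration only: every class and field name (`SamplePoint`, `GeoObject`, `Episode`, `geo_info`, and so on) is hypothetical and not taken from the paper, and the ordering check is an assumption about what "spatiotemporal order" requires.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    ROI = "region of interest"
    LOI = "line of interest"
    POI = "point of interest"


class Topology(Enum):
    # The seven topological relations listed in Definition 5.
    PASS_BY_LOI = "pass by (LOI)"
    PASS_BY_POI = "pass by (POI)"
    PASS_BY_ROI = "pass by (ROI)"
    ACROSS_ROI = "across (ROI)"
    ENTER_ROI = "enter (ROI)"
    LEAVE_ROI = "leave (ROI)"
    STOP_INSIDE_ROI = "stop inside (ROI)"


@dataclass(frozen=True)
class SamplePoint:
    object_id: str      # identifier of the moving object
    lon: float
    lat: float
    timestamp: float


@dataclass(frozen=True)
class GeoObject:
    object_id: str
    category: Category
    location: tuple[float, float]   # (longitude, latitude)
    name: str


@dataclass
class Episode:
    points: list[SamplePoint]       # the underlying subtrajectory
    kind: str                       # "stop" or "move"
    avg_speed: float
    direction: float
    geo_info: list[tuple[GeoObject, Topology]] = field(default_factory=list)


@dataclass
class SemanticTrajectory:
    episodes: list[Episode]

    def is_time_ordered(self) -> bool:
        # Assumed reading of "spatiotemporal order": start times never decrease.
        starts = [e.points[0].timestamp for e in self.episodes if e.points]
        return all(a <= b for a, b in zip(starts, starts[1:]))
```

Keeping the geospatial information as a list of (object, relation) pairs makes the "number of geospatial information" of an episode simply `len(episode.geo_info)`.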

The list of major symbols and notations in this paper is summarized in Table 1.

3.2. Problem Statement

Given a trajectory $T$, a preannotated semantic trajectory dataset OST, two clustering thresholds, four radii, and a similarity threshold, our goal is to annotate the semantic information of preannotated semantic trajectories onto trajectory $T$, which transforms $T$ into a semantic trajectory.

4. Framework

In this section, we will present the SEPSIM framework including preannotated semantic trajectory storage phase, spatiotemporal similarity measurement phase, and semantic information matching phase. Figure 2 outlines this framework.

Preannotated Semantic Trajectory Storage. Given the preannotated semantic trajectory dataset OST, the first step is to store it. To avoid reducing the accuracy of semantic information matching, preannotated semantic trajectories are stored in the form of episodes, which are representative and diverse. Semantic trajectories are segmented into episodes by the moving state (stop/move) of the moving object. The output of this phase is a set of stop episodes and move episodes, which can represent and describe a certain region.

Spatiotemporal Similarity Measurement. Given a trajectory $T$, the spatiotemporal similarity is measured between $T$ and the episodes obtained in the first phase. We first segment $T$ into stop/move subtrajectories by DBSCAN clustering. Then, two subproblems need to be solved: how to measure the similarity between a stop subtrajectory and a stop episode and how to measure the similarity between a move subtrajectory and a move episode. To solve them, we propose algorithms based on the Hausdorff distance [22] and on the Longest Common Subsequence (LCS) [23], respectively. The output of this phase is the stop and move episodes that satisfy the specified similarity condition.

Semantic Information Matching. In this phase, the semantic information of similar stop/move episodes is matched to trajectory $T$ through the proposed semantic information matching algorithm (SESIM). The algorithm consists of two subphases: candidate episode sorting and semantic information mapping. We aim to generate the semantic trajectory that contains the most semantic information. For the subtrajectories that have no matching information, we complete the semantic enrichment process of $T$ by using the typical approach.

4.1. Preannotated Semantic Trajectory Storage

After we get the preannotated semantic trajectory dataset OST, the first task is to store it for the matching phase. Storing complete preannotated semantic trajectories reduces the workload of storage and search, but matching between complete trajectories is poor in both effectiveness and efficiency. Storing complete preannotated semantic trajectories together with their corresponding episodes causes data redundancy. To keep the semantic information complete while avoiding data redundancy, we store preannotated semantic trajectories in the form of episodes. Episodes, however, can only be obtained through trajectory segmentation. There are two kinds of trajectory segmentation methods: segmentation according to geospatial objects and segmentation according to the moving state of the moving object. Because geospatial objects are numerous and their distribution is complex and irregular, segmentation according to geospatial objects tends to fragment trajectories and is time-consuming. Segmentation according to the moving state of moving objects, in contrast, has the advantages of high segmentation efficiency and clear segmentation rules. So, we choose to segment spatiotemporal trajectories by the moving state of moving objects. Because a stop of the moving object produces a gathering of trajectory points, we segment preannotated semantic trajectories into stop/move episodes by DBSCAN clustering.
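
The stop/move segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `dbscan` is a plain quadratic-time DBSCAN over planar coordinates, `segment` and its parameter names are hypothetical, and each maximal run of consecutive points sharing a cluster label is treated as one segment (clustered runs as stops, noise runs as moves). Unlike Definition 3, the move segments here do not include the boundary points of the neighbouring stops.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def segment(points, eps, min_pts):
    """Split a trajectory into ("stop" | "move", first_index, last_index)."""
    labels = dbscan(points, eps, min_pts)
    segments = []
    start = 0
    for i in range(1, len(points) + 1):
        if i == len(points) or labels[i] != labels[start]:
            kind = "move" if labels[start] == -1 else "stop"
            segments.append((kind, start, i - 1))
            start = i
    return segments
```

For example, a trajectory that dwells near one place, travels, and dwells near another is split into a stop, a move, and a second stop. Real GPS data would need a geodesic distance instead of `math.dist` on raw coordinates.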

Given a preannotated semantic trajectory dataset OST and a newly arriving preannotated episode, there are three possible outcomes when comparing it with the episodes in OST. In the first case, the new episode does not repeat any episode in OST; in the second case, it partially but not completely repeats episodes in OST; and in the third case, it completely repeats an episode in OST. If all preannotated episodes were stored, queries would return many repeated episodes as the dataset grows, which reduces the efficiency of the similarity measurement and matching phases. This raises a challenge: which preannotated episodes should be stored to avoid redundancy while ensuring the effectiveness and efficiency of matching?

To solve the challenge above, we choose to store representative and diverse preannotated episodes to build the dataset OST. The semantic information of semantic episodes (spatial information and geospatial environment information) represents the geospatial environment characteristics of a certain region. Therefore, the representative semantic episode of a certain region is defined as the episode with the same or partial spatial information and incomplete semantic information compared with the preannotated episodes in the dataset OST. The diversity of episodes is reflected in the diversity of geospatial environment information, which can enrich the characteristics of a certain region. So, we define a diverse episode as an episode with new geospatial environment information compared with the preannotated episodes in the dataset OST. In this paper, the representative and diverse episodes are obtained through the trajectory classification shown in Figure 3. For a given preannotated episode dataset, we first classify it according to spatial information and then according to geospatial information and topological relationship; finally, the leaf nodes store fine-grained representative and diverse preannotated episodes for matching. The output of this phase is a set of representative and diverse stop/move episodes of the dataset OST, which represent a certain region.

4.2. Spatiotemporal Similarity Measurement

For an incoming trajectory $T$, we compare it with episodes to find similar ones. Once we find the similar episodes, we can match their semantic information to $T$. Given the constraints imposed by the topology of urban road networks, there are many similar or identical trajectory segments. So, we first segment $T$ into stop and move subtrajectories by DBSCAN clustering. Then, we solve two problems: measuring the similarity between a stop subtrajectory and a stop episode (stop trajectories) and measuring the similarity between a move subtrajectory and a move episode (move trajectories). Next, we discuss the algorithms for these two problems in turn.

The Algorithm to Determine the Similarity between Stop Trajectories. To our knowledge, there is no basic method for measuring the similarity of stop trajectories in the Euclidean space. In this paper, a stop and a stop episode are both clusters of trajectory points obtained by DBSCAN clustering, so measuring the similarity between a stop and a stop episode can be regarded as measuring the similarity between point sets. Therefore, we view each stop trajectory, which is either a stop or a stop episode, as a point set. The algorithm proposed in this paper consists of two steps: (1) similar region determination and (2) similarity measurement based on the Hausdorff distance. Given that the closer two trajectories are in space, the more similar they are, we first narrow the measurement range down and retain the stop episodes with a greater likelihood of similarity. Then, we calculate the Hausdorff distance between each stop of $T$ and the retained stop episodes of OST in turn. Finally, the stop episodes that meet the similarity condition are retained.

In the first step, we reduce the number of stop episodes and retain those with a high probability of being similar to each stop of $T$. First, we convert each stop to a point set $P$ by assigning the latitude and longitude coordinates of the stop to the coordinates of the point set (lines 1-5). Taking the minimum circumscribed point of $P$ as the center and the given radius, we draw a circular area as the similar region of the stop (line 6). All the stop episodes that intersect with or are inside the similar region are extracted for similarity measurement. If there are no stop episodes in the similar region, there is no stop episode similar to the stop. Otherwise, we convert the stop episodes extracted from the similar region to point sets in the second step (lines 7-10). Figure 4 shows the similar region determination of each stop.

Then, we calculate the Hausdorff distance between $P$ and each point set in the similar region (lines 11-13). Finally, the point set with the minimum Hausdorff distance to $P$ is returned. The stop episode corresponding to this point set is the most similar episode to the stop (lines 14-16).

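Input: trajectory $T$, stop episodes of OST, radius of the similar region
Output: the most similar stop episode for each stop of $T$
Lines 1-5: convert each stop of $T$ to a point set $P$ from its latitude and longitude coordinates
Line 6: draw the circular similar region of the stop around the minimum circumscribed point of $P$
Lines 7-10: extract the stop episodes that intersect with or are inside the similar region and convert them to point sets
Lines 11-13: calculate the Hausdorff distance between $P$ and each extracted point set
Lines 14-16: return the stop episode whose point set has the minimum Hausdorff distance to $P$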
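
The two steps above (similar region filtering, then Hausdorff comparison) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: the centroid of the stop is used as a stand-in for the "minimum circumscribed point", an episode counts as inside the region when any of its points lies within the radius, distances are planar, and all function and parameter names are hypothetical.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def most_similar_stop_episode(stop, stop_episodes, radius):
    """stop: list of (x, y); stop_episodes: dict episode_id -> list of (x, y).

    Returns the id of the episode with the minimum Hausdorff distance to the
    stop among those touching the similar region, or None if there is none.
    """
    center = centroid(stop)  # assumption: stand-in for the region center
    candidates = {
        eid: pts for eid, pts in stop_episodes.items()
        if any(math.dist(center, p) <= radius for p in pts)
    }
    if not candidates:
        return None
    return min(candidates, key=lambda eid: hausdorff(stop, candidates[eid]))
```

Filtering by region first keeps the quadratic Hausdorff computation restricted to the few episodes that are spatially close to the stop.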

The Algorithm to Determine the Similarity between Move Trajectories. Generally, a move episode is similar to only part of a subtrajectory rather than to the entire subtrajectory. In the literature, this kind of similarity measurement is called local matching of trajectories. Existing local matching methods include the Fréchet distance [24], Longest Common Subsequence (LCS) [23], and K Best Connected Trajectories [25]. The Fréchet distance method is sensitive to noisy trajectory points; the K Best Connected Trajectories method can only query a few elements and is mainly used for recommending tourist routes. The Longest Common Subsequence (LCS) method differs from these similarity measurement methods, which focus on calculating the distance between point pairs of trajectories. The LCS method takes into account that the movement of vehicles is restricted by the road network. If vehicles travel on the same road segment, the trajectories passing through the road segment may completely overlap, which is consistent with the idea of the SEPSIM approach. Therefore, the degree of overlap between trajectories can be used as a criterion for similarity.

The LCS method is only suitable for trajectory data generated on the road network, and its time complexity is quadratic in the number of trajectory points. However, the LCS method has the advantage of not depending on the departure time and driving speed of trajectories and is robust to noise, which is consistent with the experimental data in this paper. Therefore, we propose the algorithm to determine the similarity between move trajectories based on the LCS. The details of the LCS method can be found in [23].

This algorithm consists of three steps: (1) similar region determination, (2) measurement range determination, and (3) similarity measurement based on LCS. First, we select the move episodes that are likely to be similar within the similar region of each move [26]. Then, for each move episode, the part of $T$ against which it is measured is determined. Finally, we calculate the similarity between each move episode and the corresponding part of $T$ based on the LCS method. The length of the longest common subsequence gives the similarity, and the move episodes that meet the similarity threshold are retained. The same operation is performed on each move.

We draw the similar region of each move of $T$ in the same way. In the first step, we draw a circular area with a given radius around the center of each move as its similar region (lines 1 and 2). Each move episode that intersects with or is inside the circle is extracted into the candidate move episode set for measurement range determination (lines 3-5). For each move episode in the similar region, we draw two circular areas with two given radii around the beginning and end points of the move episode (lines 6-9). Given that the trajectories are only partially similar, we then determine the measurement range of $T$ over which each move episode is measured: the part of $T$ that is tangent to the two circles is the measurement range corresponding to that move episode. Figure 5 shows the similar region determination and measurement range of each move of $T$.

In the third step, we calculate the similarity simSeq based on the Longest Common Subsequence (LCS) method (lines 9 and 10). If simSeq is greater than or equal to the given similarity threshold, the move episode is similar to the corresponding part of $T$. We retain the move episodes that meet the similarity threshold (lines 11-13).

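Input: trajectory $T$, move episodes of OST, the radius of the similar region, the two radii of the measurement range, the similarity threshold
Output: the move episodes similar to each move of $T$
Lines 1 and 2: draw the circular similar region around the center of each move of $T$
Lines 3-5: extract the move episodes that intersect with or are inside the similar region as the candidate move episode set
Lines 6-9: for each candidate move episode, draw the two circles around its beginning and end points and determine the measurement range of $T$
Lines 9 and 10: calculate simSeq between the move episode and the measurement range based on LCS
Lines 11-13: retain and return the move episodes whose simSeq meets the similarity threshold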
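
The LCS-based measurement in the third step can be sketched as follows. This is a minimal illustration, not the formulation of [23]: two sampling points are assumed to match when they lie within a tolerance `eps` of each other, and `sim_seq` normalizes the LCS length by the shorter sequence. Both choices, and all names, are assumptions made for the example.

```python
import math


def lcs_length(a, b, eps):
    """Length of the longest common subsequence of two point sequences,
    where two points match if they are within eps of each other."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if math.dist(a[i - 1], b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def sim_seq(a, b, eps):
    """Similarity in [0, 1]: LCS length over the shorter sequence length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b, eps) / min(len(a), len(b))


def similar_move_episodes(measure_range, move_episodes, eps, threshold):
    """Keep the ids of move episodes whose similarity to the measurement
    range of the trajectory reaches the threshold."""
    return [eid for eid, pts in move_episodes.items()
            if sim_seq(measure_range, pts, eps) >= threshold]
```

The dynamic program visits every pair of points once, which is where the quadratic cost of the LCS method comes from; restricting it to the measurement range rather than the whole trajectory keeps that cost small.
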
4.3. Semantic Information Matching

In this phase, we aim to match the semantic information of the episodes retained in the spatiotemporal similarity measurement phase to the trajectory $T$. The retained episodes are the most similar ones for the corresponding parts of $T$; all their similarities are greater than or equal to 95%, and they are identical to the corresponding parts of $T$ in spatial information. Given a trajectory $T$ and a set of similar episodes, the episodes corresponding to $T$ can be matched in the three ways shown in Figure 6. This raises a problem: how to determine whether the selected episodes are the best combination in the similar episode set for matching to $T$, i.e., the combination that carries the most semantic information.

To solve the problem, we propose a Semantic Information Matching Algorithm based on Similar Episodes (SESIM). This algorithm consists of two steps: (1) similar episode sorting and (2) semantic information matching. Following the measurement range determination in the second phase, we first sort the similar episodes that meet the similarity condition by the spatial coordinate sequence of the trajectory $T$. Then, we model the problem as a knapsack problem to match semantic information.

Similar Episode Sorting. Given a trajectory $T$ and a set of similar episodes, we first determine the range of $T$ corresponding to each similar episode with the same procedure as in the measurement range determination step. In this step, we convert the set of similar episodes to a candidate set in which each candidate records the beginning and end trajectory points of the subtrajectory of $T$ corresponding to the episode, the number of sampling points in that subtrajectory, and the number of geospatial information items in the episode (lines 1-5). Then, we sort the candidate set by the position of each candidate's subtrajectory in trajectory $T$ (lines 6-10).

Semantic Information Matching. In this step, we aim to select the best combination of episodes in the candidate set, so that the matched semantic trajectory carries the most semantic information. We extend a knapsack algorithm, taking the number of sampling points of the trajectory as the capacity of the knapsack and the number of geospatial information items in an episode as the value of the episode. Starting the matching from the end sampling point of the trajectory $T$, we aim to maximize the total value of the entire knapsack. Given the candidate set, the value of the trajectory is computed by the knapsack recurrence over the candidates (lines 11-18).

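Input: trajectory $T$, the set of similar episodes
Output: the semantic trajectory of $T$
Lines 1-5: convert each similar episode to a candidate recording the beginning and end trajectory points of its subtrajectory of $T$, the number of sampling points, and the number of geospatial information items
Lines 6-10: sort the candidate set by the position of each subtrajectory in $T$
Lines 11-18: select the combination of candidates with the maximum total value by the knapsack recurrence, match their semantic information to $T$, and return the semantic trajectory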
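
The selection step can be sketched as a dynamic program over sampling-point positions. This is an illustration under stated assumptions, not the SESIM listing: candidates are assumed to be tuples `(begin, end, value, episode_id)` with inclusive point indices, selected candidates are not allowed to overlap, and the value of a candidate is its number of geospatial information items. The function name and tuple layout are hypothetical.

```python
def select_episodes(n_points, candidates):
    """Pick non-overlapping candidates that maximize the total value.

    best[i] is the maximum value achievable using only the first i sampling
    points of the trajectory. Returns (total_value, selected_candidates),
    with the selection ordered by position along the trajectory.
    """
    ending_at = {}
    for cand in candidates:
        ending_at.setdefault(cand[1], []).append(cand)

    best = [0] * (n_points + 1)
    choice = [None] * (n_points + 1)
    for i in range(1, n_points + 1):
        best[i] = best[i - 1]                 # leave point i-1 unmatched
        for cand in ending_at.get(i - 1, []):
            begin, _end, value, _eid = cand
            if best[begin] + value > best[i]:
                best[i] = best[begin] + value
                choice[i] = cand

    # Walk back from the end sampling point to recover the selection.
    selected = []
    i = n_points
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            selected.append(choice[i])
            i = choice[i][0]
    selected.reverse()
    return best[n_points], selected
```

Subtrajectories left uncovered by the returned selection are the ones that would fall back to the typical approach for annotation.
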
4.4. Space Index Establishment

To quickly obtain the preannotated semantic episodes similar to trajectory $T$, we use the spatial attribute of trajectory data to establish a space index for saving and querying episodes, which improves the efficiency of the SEPSIM approach.

The establishment of the space index is related to the query target. The index in this section is used to query episodes similar to the trajectory $T$. The elements stored in the space index are episodes, which are essentially trajectory edge data. Common space indexes include the R-tree index [27], the quad-tree index [28], and the grid index [29]. The quad-tree index is only suited to querying trajectory points. The large number of unevenly distributed geospatial objects makes the grid index inefficient. Meanwhile, the R-tree index remains efficient on the unevenly distributed dataset in this paper because it keeps the tree balanced. Therefore, we create and maintain an R-tree index for preannotated episodes. With this index, we can compare an incoming subtrajectory with the preannotated episodes in the index that are inside or intersect with the subtrajectory.
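
How such an index narrows the set of episodes to compare can be sketched with a small static, bulk-loaded tree of bounding boxes. This is a simplified R-tree-style structure written for illustration only: it packs entries after sorting by x-center, does not support insertion or node splitting, and is not the index of [27] or the one used in the experiments. All names are hypothetical.

```python
def bbox_union(boxes):
    """Smallest box (min_x, min_y, max_x, max_y) covering all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def chunks(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_index(entries, fanout=4):
    """entries: list of (box, episode_id). Returns the root node or None.

    A node is a tuple (kind, box, children) with kind "leaf" or "inner".
    """
    if not entries:
        return None
    ordered = sorted(entries, key=lambda e: (e[0][0] + e[0][2]) / 2)
    nodes = [("leaf", bbox_union([b for b, _ in group]), group)
             for group in chunks(ordered, fanout)]
    while len(nodes) > 1:
        nodes = [("inner", bbox_union([n[1] for n in group]), group)
                 for group in chunks(nodes, fanout)]
    return nodes[0]


def query(node, box):
    """Ids of all indexed episodes whose bounding box intersects box."""
    if node is None or not intersects(node[1], box):
        return []
    if node[0] == "leaf":
        return [eid for b, eid in node[2] if intersects(b, box)]
    found = []
    for child in node[2]:
        found.extend(query(child, box))
    return found
```

Querying with the bounding box of an incoming subtrajectory returns only the episodes whose boxes overlap it, so the Hausdorff and LCS comparisons run on a short candidate list instead of the whole dataset.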

5. Experiments

In this section, we conduct extensive experiments on real trajectory data to compare the effectiveness and efficiency of the proposed SEPSIM approach with those of the baseline, i.e., the typical approach based on the spatial join algorithm and the map matching algorithm.

5.1. Experimental Settings

We evaluate our approach on the GeoLife dataset. This trajectory dataset was collected in the GeoLife project (Microsoft Research Asia) by 182 users over a period of more than five years (from April 2007 to August 2012). It contains 17,621 trajectories with a total distance of 1,292,951 kilometers and a total duration of 50,176 hours. These trajectories were recorded by different GPS loggers and GPS phones and have a variety of sampling rates. The majority of the data was created in Beijing, China, and the data size is 1.87 GB. In this paper, all the preannotated semantic trajectories are generated by the typical approach. Both approaches are implemented in Java and run on computers with an Intel(R) Xeon(R) CPU E5-2620 (2.10 GHz) and 32 GB of memory.
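
As a rough illustration of how such trajectories can be read, the sketch below parses one GeoLife `.plt` file into sampling points and computes the gaps between consecutive points. It is written in Python for brevity and is not the Java implementation used in the experiments. It assumes the publicly documented PLT layout (six header lines, then comma-separated records whose first two fields are latitude and longitude and whose last two are date and time); the function names `parse_plt` and `sampling_intervals` are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

SamplePoint = Tuple[float, float, datetime]  # (latitude, longitude, timestamp)


def parse_plt(lines: Iterable[str]) -> List[SamplePoint]:
    """Parse the lines of one PLT file into sampling points."""
    points = []
    for i, line in enumerate(lines):
        if i < 6:  # assumed: the first six lines are header lines
            continue
        fields = line.strip().split(",")
        if len(fields) < 7:  # skip blank or malformed records
            continue
        lat, lon = float(fields[0]), float(fields[1])
        ts = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M:%S")
        points.append((lat, lon, ts))
    return points


def sampling_intervals(points: List[SamplePoint]) -> List[float]:
    """Seconds elapsed between consecutive sampling points."""
    return [(b[2] - a[2]).total_seconds() for a, b in zip(points, points[1:])]
```

Inspecting the intervals of a few files is a quick way to see the variety of sampling rates mentioned above.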

5.2. Effectiveness

There is no clear and unified definition of the effectiveness of the semantic enrichment process. In this paper, we propose a new standard to measure the effectiveness of the proposed algorithm. For a given trajectory, we take the semantic trajectory generated by the typical approach as the standard one and measure how the semantic trajectory generated by the SEPSIM approach differs from it. First, we segment both semantic trajectories by move state. Then, we compare the accuracy of each pair of corresponding subtrajectories between the two. The effectiveness of the semantic trajectory generated by the SEPSIM approach is defined as the average accuracy of the matched semantic information.

In this definition, the semantic trajectory is the one generated by the SEPSIM approach for a given trajectory. The accuracy of a subtrajectory is the accuracy of its correctly matched semantic information compared with the corresponding subtrajectory of the standard semantic trajectory, that is, the ratio of the quantity of correctly matched semantic information in the subtrajectory to the quantity of standard semantic information in its counterpart. The remaining two terms denote the numbers of sampling points contained in the subtrajectory and in the semantic trajectory, respectively. The higher the average accuracy of the matched subtrajectories, the more effective the proposed algorithm.
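
One plausible reading of this measure is an accuracy averaged over subtrajectories and weighted by their numbers of sampling points. The sketch below follows that reading; the weighting, the treatment of an empty standard set, and the names `accuracy` and `effectiveness` are assumptions made for illustration rather than the definition used in the experiments.

```python
from typing import Iterable, List, Set, Tuple

# (matched items, standard items, number of sampling points) per subtrajectory
Pair = Tuple[Set[str], Set[str], int]


def accuracy(matched: Set[str], standard: Set[str]) -> float:
    """Share of the standard semantic information that was matched correctly."""
    if not standard:
        # Assumption: nothing to match counts as fully accurate.
        return 1.0
    return len(matched & standard) / len(standard)


def effectiveness(pairs: Iterable[Pair]) -> float:
    """Accuracy averaged over subtrajectories, weighted by sampling points."""
    pairs_list: List[Pair] = list(pairs)
    total_points = sum(n for _, _, n in pairs_list)
    if total_points == 0:
        return 0.0
    weighted = sum(accuracy(m, s) * n for m, s, n in pairs_list)
    return weighted / total_points
```

For example, a subtrajectory of 6 points with accuracy 1.0 and one of 4 points with accuracy 0.5 give an effectiveness of 0.8 under this reading.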

Figure 7(a) shows how the effectiveness changes as the number of preannotated trajectories increases. As more preannotated trajectories are processed, the effectiveness on the trajectories to be enriched gradually increases. When the number of preannotated trajectories reaches 4000, the effectiveness exceeds 90% and keeps increasing steadily. Figure 7(b) shows how the effectiveness changes as the number of test trajectories increases. It can be seen that the effectiveness on the test trajectories stays above 90%.

In addition, to evaluate the effectiveness of the SEPSIM algorithm, we visually compare the semantic trajectories generated by the baseline approach and by the SEPSIM approach. Figure 8(a) shows the geographical object information, represented by red boxes, and the corresponding topological relationships of a given trajectory on the OSM map. Figure 8(b) shows the geographical object information and corresponding topological relationships enriched in the given trajectory by the baseline approach, which annotates all relevant and reasonable geographical object information. Figure 8(c) shows that the trajectory, matched with different episodes represented by different colors, is annotated with the same geographical object information. It can be seen that the algorithm proposed in this paper can annotate reasonable semantic information for spatiotemporal trajectories in a geospatial environment.

5.3. Efficiency

In this section, we study the efficiency of our proposed algorithm. We compare it with the baseline approach and the LCS approach, which can annotate semantic information on similar trajectories. For each trajectory in the GeoLife dataset, we generate the semantic trajectory by the SEPSIM approach, the baseline approach, and the LCS approach, respectively, and record the running time. The results of the comparison are shown in Figure 9(a). We can see that the baseline approach and the LCS approach take more time than the SEPSIM approach to annotate the same number of test trajectories. As the number of test trajectories increases, the time spent by all three approaches gradually grows.

Figure 9(b) shows the efficiency of the SEPSIM approach with different spatial indexes. The time spent by SEPSIM with the R-tree index is much less than with the other two spatial indexes, which means that the R-tree index is appropriate for the dataset in this paper. Meanwhile, the SEPSIM approach with any of the three indexes is faster than the typical approach and the LCS approach, which demonstrates the high efficiency of our proposed SEPSIM approach.

6. Conclusion

In this paper, we study the problem of the semantic enrichment process for spatiotemporal trajectories in geospatial environments. We first use the semantic information in preannotated semantic trajectories directly to annotate spatiotemporal trajectories with the SEPSIM approach, which includes three phases: preannotated semantic trajectory storage, spatiotemporal similarity measurement, and semantic information matching. We propose an algorithm named Semantic Information Matching Algorithm based on Similar Episodes (SIM) for matching semantic information. To improve the efficiency of enrichment processing, we establish an R-tree index to query preannotated semantic trajectories. Finally, we conduct extensive experiments over a real dataset. The experimental results verify the superiority of our proposed approach in terms of effectiveness and efficiency.

Data Availability

The trajectory dataset used to support the findings of this study is available at https://www.microsoft.com/en-us/download/details.aspx?id=52367.

Disclosure

This paper expands on the short paper “Efficient Semantic Enrichment Process for Spatiotemporal Trajectories,” which was published in the 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Funding

This study was supported by NSFC41971343 and NSFC61702271.

Acknowledgments

This study was supported by the NSF of Jiangsu Province (BK20200725) and the Postgraduate Research Innovation Program of Jiangsu Province (KYCX201258).