Abstract

Network theory has provided a new analytical tool for the study of human trajectories and has developed rapidly within the complex network field. Conventional and complex network models ignore some details and cannot capture the most distinctive features of a GPS-based personal trajectory, so a new personal trajectory model is needed. To study the characteristics of one person's trajectory over a long period, we collected a GPS-based personal lifelog dataset, named Liu Lifelog, over the past 9 years. This paper analyzes Liu Lifelog and proposes a ring structure personal trajectory (RSPT) model built on the basic complex network model. We discuss the definition, source, characteristics, and attributes of the RSPT model, test it on the dataset provided by the Geolife project, and verify that it describes the characteristics of a person's trajectory well. The results show that the model is feasible and can predict human behavior characteristics more accurately and effectively.

1. Introduction

In recent years, more and more researchers have become aware of the complexity and behavioral features of complex networks. To solve problems in a given field with network theory, researchers first need to know how to build a network from different types of data. In mathematics, networks are represented using graph theory, which originated from the famous problem of the Seven Bridges of Königsberg. Following the development of graph theory from that problem, the ER random graph model, small-world networks, scale-free networks, and complex networks have come into wide use and have had a tremendous impact on human life.

The development of complex network theory has gone through three stages: regular networks, random networks [1, 2], and complex networks [3–5]. In the regular network stage, researchers believed that, in theory, real-world systems could be represented by regular network structures, such as star networks, globally coupled networks, and nearest-neighbor coupled networks. In the random network stage, Erdős and Rényi established random network theory [1, 2, 6] and provided a new method for creating networks. However, random network theory has well-known flaws when applied to real networks; for example, the random network model cannot represent the high clustering observed in social networks.

Since the end of the 20th century, scientists have discovered that many real-world networks are neither regular networks nor random networks. In their overall characteristics, they lie between regular networks and random networks, with small-world effects and scale-free characteristics. The theory of complex networks was proposed and has become a hotspot in scientific research. In 1998, Watts and his advisor Strogatz proposed the WS small-world model [3]. The WS small-world model reveals the small-world phenomenon, in which networks have a short average path length and a high clustering coefficient. In 1999, Barabási and his student Albert proposed the scale-free network model, now called the BA model [4]. The scale-free network is powerful in representing the real world because of its growth and preferential attachment characteristics. With further research, the statistical characteristics of many real-life complex networks have been discussed, such as economic networks, world-wide airport networks, and scientific collaboration networks, and more and more properties of complex networks have been discovered. One of the most important studies is by Girvan and Newman, who studied the community structure in social and biological networks in 2002 and proposed a method to discover communities in complex networks [5]. After that, a great deal of research on communities has been carried out.

Building on the above theory, researchers continue to propose new complex network models and new properties. Park and Liu studied community structure and modular structure in networks [7, 8]. Reference [9] introduced how to measure the importance of communities in different networks. In 2012, Yang studied a quantification method for a single node that controls a directed weighted network [10]. Another valuable result is by Gao, who studied universal resilience patterns in complex networks in 2016 [11], and Dimitrios studied network stiffness in 2018 [12]. In the same year, Ferraz introduced the fundamentals of spreading processes in single and multilayer complex networks. Recent work in the complex networks field also includes [13–15]. In 2019, some researchers focused on the control of complex networks [16–18]. In 2020, some new results were obtained: Zhang discussed two strategies for the core decomposition of complex networks [19]; Dmytro introduced the relaxation time property in complex networks [20]; Keller proposed a new method for embedding network or distance-based data into hyperbolic space [21]; Valdez studied cascading failures in complex networks [22]; and Stegehuis studied closure coefficients in scale-free complex networks [23]. Reference [24] proposed a method of trajectory tracking on uncertain complex networks.

In the field of trajectory data mining, Yu [25–27] and Jean [28] published a series of works in 2015 and 2016 that reviewed the datasets, methods, and applications. In particular, Yu provided a trajectory dataset collected in the Microsoft Research Asia Geolife project by 182 users over a period of more than five years (from April 2007 to August 2012). However, most of the personal data in the project is incomplete, and most users collected data for no more than 3 years.

Under a long-term plan named Liu Lifelog, we collected a GPS-based dataset that is discontinuous and contains richer lifelog content, including about 7000 records for one person over the past 9 years. The dataset records the most important places and events for the author and is published on the website (http://www.liuguoqi.com). Each record includes GPS information, time information, the address translated from the GPS coordinates, a description of what happened at that time, and a behavior type assigned by the author. These carefully collected data provide good experimental material for the analysis of personal trajectory and behavior. When we analyzed the personal trajectory data as a network, we found some new characteristics; we discuss these characteristics and propose a ring structure personal trajectory model that describes the personal trajectory well. In this paper, we focus on analyzing the personal trajectory and its characteristics with this new model, named RSPT.

The paper is organized in four sections. After this Introduction, Section 2 introduces the ring structures in trajectory data, and Section 3 proposes a ring structure personal trajectory model. Finally, Conclusions gives a brief summary and critique of the findings.

2. The Ring Structure in Personal Trajectory

2.1. Network Model

Researchers in movement analysis have made important contributions by developing methods and tools to solve specific application problems. Different types of moving objects can be tracked, such as people, vehicles, and animals. Tracking these objects generates a set of trajectories and raises questions whose solutions can be applied in several fields [24–26, 28–37]; 7 of these works focus on trajectory mining [26, 28, 30–34], and 5 focus on the structure of trajectory networks.

Researchers usually use graph theory or network theory to analyze GPS-based data, but most of them ignore time and the ring structures in the network [38]. In fact, time plays a very important role in personal trajectory. We first propose a GPS-based trajectory model with points, domains, and time information, which is shown in Figure 1.

As shown in Figure 1(a), some GPS points are in a locality; here we named the locality as a domain, such as domain 1 () which represents a locality named Jianzhu University (e.g., as depicted in Figure 1(b)), and are the GPS points in the same locality. With such GPS points and domains, we can turn a trajectory from a series of time-stamped spatial points into a sequence of meaningful places ; a trajectory can be represented as

. With a series of , we can get a network structure as shown in Figure 1(c).

For a series of personal trajectory data with GPS information, we put all GPS data into different domains. Each domain is a node of the graph. If a location shift happens between two domains, we draw an edge from the first domain to the second. With this method, we do not need to find the stay points [39]; we only need a method that puts the GPS data into different domains.
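The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_domain_network`, the choice to collapse consecutive records in the same domain, and the sample domain IDs are all assumptions made for the example.

```python
from collections import Counter

def build_domain_network(domain_sequence):
    """Turn a time-ordered list of domain IDs into a directed network."""
    # Collapse consecutive repeats: staying inside a domain is not a shift.
    visits = [d for i, d in enumerate(domain_sequence)
              if i == 0 or d != domain_sequence[i - 1]]
    # Every consecutive pair of distinct domains is one location shift.
    shifts = list(zip(visits, visits[1:]))
    # Distinct (source, target) pairs are the edges; the count is a weight.
    edge_weights = Counter(shifts)
    nodes = set(visits)
    return nodes, edge_weights, len(shifts)

# Hypothetical domain IDs, for illustration only.
sequence = [1, 1, 2, 3, 1, 7, 1, 2, 3, 1]
nodes, edge_weights, n_shifts = build_domain_network(sequence)
print(sorted(nodes))                 # domains that appear as nodes
print(n_shifts, len(edge_weights))   # location shifts vs. distinct edges
```

Because repeated shifts between the same pair of domains collapse into a single edge, the number of edges can be smaller than the number of location shifts.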

2.2. Putting GPS Data into Domains

Li and Zheng et al. [39] first proposed the stay point detection algorithm. This algorithm considers the distance between an anchor point and its successors in a trajectory and then measures the time span between the anchor point and the last successor that is within the distance threshold. Yuan and Zheng et al. [40, 41] improved this stay point detection algorithm based on the idea of density clustering. When we analyzed real GPS data, we found it difficult to identify a stay point: even if a person stays in one place, commonly used GPS collection devices may report different GPS points in practical use. So the first problem is how to recognize that two different GPS data points represent one place. Here we need a clustering algorithm such as DBSCAN or K-means to put the GPS data into different domains.
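As an illustration of the clustering step, the following is a small, dependency-free DBSCAN sketch that groups GPS points into domains by great-circle distance. It is not the source code published by the authors; the names `haversine_m` and `dbscan_domains`, the thresholds `eps_m=200.0` and `min_pts=2`, and the sample coordinates are assumptions chosen only for the example.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def dbscan_domains(points, eps_m=200.0, min_pts=2):
    """Label each point with a domain ID (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps_m of point i (including i itself).
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    domain_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        labels[i] = domain_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = domain_id    # border point joins the domain
            if labels[j] is not UNVISITED:
                continue
            labels[j] = domain_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        domain_id += 1
    return labels

# Hypothetical coordinates: two small clusters and one isolated point.
points = [(41.7400, 123.4300), (41.7401, 123.4301), (41.7399, 123.4299),
          (41.8000, 123.5000), (41.8001, 123.5001),
          (42.5000, 124.5000)]
labels = dbscan_domains(points)
print(labels)
```

The neighbor search here is quadratic, which is acceptable for a few thousand records; a spatial index would be the natural choice for larger datasets.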

In the real world, if a person stays in one place for a long time, the place is important or will be a stay point [33]. We consider that stay points and domains have different meanings. We do not deal with stay points in this paper; we only consider how to identify the domains. A domain is a larger region than a stay point and may include several stay points. For example, a teacher gives lectures in the school classroom in the morning, has lunch at noon in the school canteen, does some research in the school library in the afternoon, and finally goes home for dinner. We draw a network with two nodes (the school and home) and one edge (from the school domain to the home domain). In this example, how long the person stays in one place has no effect on the network structure, whereas the time at which the person moves from one domain to another is important. Also, in a personal trajectory, different domains have different meanings; for example, my home is important to me but not to other people. So, a domain is different from a stay point.

2.3. The Characteristics of Personal Trajectory

The trajectories of different types of moving objects have different characteristics. We analyzed the GPS-based data we collected over the past 9 years and the GPS-based trajectory data provided by Yu in [26].

We first introduce the data we collected. We developed a mobile app and a web app and collected personal GPS-based data through them. During this period, about 10 people participated in the data collection and over 11660 records were collected, but only one of the authors has kept uploading data throughout the past 9 years. At present, that author's personal data amounts to about 6807 records. The dataset records the things that were important to the author over the past 9 years. We first used a clustering algorithm to put the GPS data into domains; the results show that the DBSCAN algorithm performs better than the other algorithms. We do not describe the details of the algorithm here; we published the DBSCAN source code on the website. Using DBSCAN we obtained 918 domains, and with the model shown in Figure 1 we obtained 3166 location shifts among these domains. Constructing a network structure from these data gives the result shown in Figure 2.

In Figure 2, each node represents a domain and is numbered from 1 to 918. A few nodes have a very large degree, such as the first domain (domain number 1), whose in-degree and out-degree are 260 and 271, respectively. The average degree of the network is 2.16, which suggests that most nodes have a degree of about 2. The node degree of the network follows a power-law distribution and has scale-free properties. The clustering coefficient of the network is 0.254, and the average path length is 5.53. As shown in Figure 2, the model has significant community structures. When we use complex network theory to analyze personal trajectories, some details are neglected; for example, there are 3166 location shifts in the dataset but only 1983 edges in the network. Also, the clustering coefficient of the network does not completely agree with complex network features. Personal trajectories also differ from other network models; for example, in a social networking service (SNS) network, a person's friends are likely to be friends with each other, but in personal trajectories, the places a person has visited seem to be unrelated to each other. Finally, the average path length seems to follow complex network features; in fact, it is low because a few important nodes have very many edges, and these nodes reduce the average path length. In terms of average path length, the personal trajectory network therefore resembles an SNS network, albeit for different reasons.

When we analyzed the trajectory dataset we collected, we found small trajectory sets such as {1, 2, 3, 1}, {1, 7, 1}, {1, 14, 1}, and {1, 7, 1}, where each number is a domain number. {1, 2, 3, 1} means that the person who submitted the GPS data moved from domain 1 to domain 2, then from domain 2 to domain 3, and finally back to domain 1; we call this a ring trajectory. We created a Java project to find all the ring trajectories in the dataset and obtained 2248 ring trajectories, 658 of which start from domain 1. That means that, over the past 9 years, the author often set off from domain 1, passed through some other domains, and finally went back to domain 1. We also found some very long ring trajectories, such as one starting from domain 4 that contains 2877 domains. Based on the 2248 ring trajectories, we propose a personal trajectory structure model with ring structures, which is shown in Figure 3.
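The text does not spell out how the Java project delimits a ring, so the sketch below (in Python rather than Java) adopts one plausible rule as an assumption: a ring starts at a visit to a domain and ends at the next visit to that same domain. The function name `find_ring_trajectories` and the sample sequence are hypothetical, and the input is assumed to have consecutive repeats of the same domain already removed.

```python
def find_ring_trajectories(visits):
    """Return every ring: a slice that leaves a domain and next returns to it."""
    next_seen = {}   # domain -> index of its nearest later occurrence
    rings = []
    # Scan backwards so next_seen[d] is always the nearest later visit to d.
    for i in range(len(visits) - 1, -1, -1):
        d = visits[i]
        if d in next_seen:
            rings.append(tuple(visits[i:next_seen[d] + 1]))
        next_seen[d] = i
    rings.reverse()  # restore chronological order of ring starts
    return rings

# Hypothetical visit sequence of domain IDs, for illustration only.
visits = [1, 2, 3, 1, 7, 1, 4, 1, 14, 1, 4]
rings = find_ring_trajectories(visits)
for ring in rings:
    print(ring)

# Rings that start from domain 1.
from_domain_1 = [r for r in rings if r[0] == 1]
print(len(rings), len(from_domain_1))
```

Under this rule rings may nest: the ring that starts and ends at domain 4 contains shorter rings through domain 1, which is one way a single ring could span thousands of visits. A sequence of n visits over d distinct domains always yields n − d rings under this rule, since every visit except the last visit to each domain opens exactly one ring.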

In Figure 3, domain 1 and domain 27 are central nodes of the network; we call each of them the central node of a community structure. We also found that there are many ring structures within the community structures. As shown in community b, there are two ring trajectories:

and .

They share the same edge , which explains why there are 3166 location shifts in the dataset but only 1983 edges in the network model of Figure 2. So, if researchers use complex network theory to create a personal trajectory model, the model will miss some important information. Our conclusion is that, firstly, a long-term trajectory dataset contains ring structures and, secondly, the common complex network model does not always give the best reflection of a real personal trajectory. We also analyzed the Geolife project dataset and found that if the trajectory data was collected over a short time, few ring structures can be found, but if the data was collected over a long time, such as over 2 years, the ring structure characteristics are obvious.

It is consistent with reason and common sense that many personal trajectories have ring structures. Over a very long period of time, most of us live in a permanent place, go out to work in the daytime, and come back home at night; sometimes we go to other cities for work or travel. To test this, we analyzed the Geolife project dataset. First, from the 182 users we selected the 14 users who collected data for over two years. Second, by clustering the dataset with the DBSCAN algorithm, we obtained the nodes and edges of the network. Third, we extracted the ring structures in the network. We found ring structures in all 14 personal trajectories. Five representative results are shown in Figure 4.

In Figure 4, obvious central structures and ring structures can be observed in all of the network structures.

To sum up the characteristics of a personal trajectory network: the network should be a directed network; it usually follows a power-law degree distribution and has scale-free properties; it does not fully agree with complex network features because of its low clustering coefficient; it usually has several central nodes; and it usually has ring structures. In the next section we propose a new network model that can describe the personal trajectory.

3. The Model of Ring Structure Personal Trajectory

A ring structure personal trajectory (RSPT) model can be defined as a directed graph , where represents the set of nodes, represents the set of central nodes, and represents the set of other nodes. is a set of edges which is represented as , is the start of the edge, and is the end of the edge. represents the ring structure, represents the number of nodes in a ring structure, represents the central node of the ring structure, and represents the set of nodes in the ring structure. A typical ring structure model is shown in Figure 5.

In Figure 5, , , , .

For a personal trajectory network model, it is important to analyze the way a person acts and illustrate human behavior characteristics of a person. With the model, we can study human behavior, such as predicting the behavior of a person. For this purpose, we will give some fundamental definitions and briefly introduce the ring structure model.

Ring coefficient (RC) is defined as

$$\mathrm{RC} = \frac{N_r}{N_e},$$

where $N_r$ is the number of ring structures in the network and $N_e$ is the number of edges.

RC describes the percentage of nodes in ring structures in the network; the larger the value, the more ring structures the network contains. A high value means that a person's behavior has a certain degree of regularity. Usually, it is easier to predict a person's behavior from the trajectory network if the network has a high RC.
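To make the interpretation concrete, the short sketch below computes RC for two people, assuming RC is the ratio of the number of ring structures to the number of edges. The function name `ring_coefficient` and both sets of counts are hypothetical and are not taken from Table 1.

```python
def ring_coefficient(num_rings: int, num_edges: int) -> float:
    """RC as the ratio of ring structures to edges (assumed form)."""
    if num_edges <= 0:
        raise ValueError("the network must have at least one edge")
    return num_rings / num_edges

# Hypothetical counts, for illustration only.
commuter = ring_coefficient(num_rings=30, num_edges=40)  # many repeated loops
traveler = ring_coefficient(num_rings=3, num_edges=60)   # mostly one-way moves
print(commuter, traveler)
```

Under this reading, the commuter's higher RC would correspond to the more regular, and hence more predictable, behavior.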

Central node (CN) is defined as the set of nodes whose value $v_i$ is more than a constant $k$. Here we define $v_i$ as

$$v_i = \frac{r_i}{N_r},$$

where $r_i$ is the number of ring structures that start from the node $i$ and $N_r$ is the number of ring structures in the network.

The value of $v_i$ also represents the importance of a central node. With the same constant $k$, the number of central nodes also represents a feature of a network. We analyzed the dataset and found that, for one person, the central nodes differ between time periods.
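The selection of central nodes can be sketched as follows, assuming the value of a node is the number of ring structures starting from it divided by the total number of ring structures in the network. The per-node counts and the total of 2248 are the Liu Lifelog figures reported in this paper; the function name `central_nodes` and the extra node 500 with 10 rings are hypothetical and only show a node being filtered out.

```python
def central_nodes(rings_per_node, total_rings, k):
    """Return {node: value} for nodes whose value r_i / N_r exceeds k."""
    values = {node: count / total_rings
              for node, count in rings_per_node.items()}
    return {node: v for node, v in values.items() if v > k}

# Ring counts per start node; node 500 is a hypothetical non-central node.
rings_per_node = {1: 658, 7: 119, 9: 66, 27: 175, 28: 92,
                  84: 148, 191: 62, 199: 103, 500: 10}
cn = central_nodes(rings_per_node, total_rings=2248, k=0.02)

# Print central nodes from most to least important.
for node, value in sorted(cn.items(), key=lambda item: -item[1]):
    print(node, round(value, 3))
```

Sorting by value gives a simple ranking of central nodes, with domain 1 far ahead of the others.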

Length of a ring structure (LRS) is the number of nodes in a ring structure.

Average length of a ring structure (ALRS), if a network has $n$ ring structures, is defined as

$$\mathrm{ALRS} = \frac{1}{n}\sum_{i=1}^{n} L_i,$$

where $L_i$ is the length of ring structure $i$ in the network.

ALRS reflects a person's behavior style: a small value means that the person's behavior is simple and the person has a well-regulated lifestyle; otherwise, the person's behavior is complex and the person usually takes long journeys.
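A short sketch of LRS and ALRS follows. The text does not say whether the repeated closing node counts toward the length, so this example assumes it does not (the ring {1, 2, 3, 1} has length 3); the function names are hypothetical, and the rings are the small examples quoted in Section 2.3.

```python
def ring_length(ring):
    """LRS: visits in the ring, not counting the repeated closing node (assumed)."""
    return len(ring) - 1

def average_ring_length(rings):
    """ALRS: arithmetic mean of the ring lengths."""
    if not rings:
        raise ValueError("ALRS is undefined for a network without rings")
    return sum(ring_length(r) for r in rings) / len(rings)

# Example rings quoted in Section 2.3.
example_rings = [(1, 2, 3, 1), (1, 7, 1), (1, 14, 1), (1, 7, 1)]
lengths = [ring_length(r) for r in example_rings]
alrs = average_ring_length(example_rings)
print(lengths, alrs)
```

Because ALRS is a plain mean, a few very long rings can raise it substantially even when most rings are short.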

Following the definitions above, we discuss the characteristics of the networks in Figures 2 and 4. The network in Figure 2 is constructed from the data we collected over the past 9 years; we call this dataset Liu Lifelog. Figure 4 contains 5 network models built from the GPS-based datasets of 5 persons collected by the Geolife project, and we use the project name and the user number to identify them. The results are shown in Table 1.

In Table 1, the Liu Lifelog dataset has a high RC value, which means the person has a regular lifestyle. We set k = 0.02, which means that a node becomes a central node if more than 50 ring structures start from it. The ALRS is 80, which indicates that the person has a regular lifestyle but sometimes takes long journeys. These results match the actual situation of the author.

For the Geolife no. 010 dataset, the RC value shows that the person does not have a regular lifestyle and often takes long journeys. With the low number of ring structures and k = 0.02, even a node with only 1 ring structure is regarded as a central node, which also shows that the trajectory has few ring structures. When we examined the GPS data of Geolife no. 010 in the database, we found that the person often moves from one city to another and that a lot of the data was collected on the road. From the real GPS data, we surmise that the person works in a sales administration department.

The importance of different CNs in one personal trajectory is not the same, just as the importance of different nodes in a complex network is not the same. In [42], Hui Liu and Guanrong Chen discussed the importance of nodes in a double-star network structure using the grounded Laplacian matrix. For the Liu Lifelog dataset with k = 0.02, a node with over 50 ring structures is a CN. We list the number of ring structures for each CN as follows:
CN : 1, Ring structure : 658
CN : 7, Ring structure : 119
CN : 9, Ring structure : 66
CN : 27, Ring structure : 175
CN : 28, Ring structure : 92
CN : 84, Ring structure : 148
CN : 191, Ring structure : 62
CN : 199, Ring structure : 103

CN : 1, Ring structure : 658 means that central node 1 has 658 ring structures, so this domain is very important for the author. Two issues need to be addressed. First, a ring structure may include several other ring structures. Second, some ring structures are duplicated; for example, the trajectory {1, 27, 1} has 39 duplicates. To solve this problem, we can add a weight to each ring structure. The ring structure personal trajectory model captures the characteristics of a person's behavior well and can help us analyze or predict a person's behavior from GPS-based data, which covers shortcomings of complex network theory [43–47].
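One simple way to realize the weighting idea is to count identical ring structures and store the count as the weight. The sketch below is only an illustration of that idea; the function name `weight_rings` and the sample rings with their repetition counts are hypothetical.

```python
from collections import Counter

def weight_rings(rings):
    """Merge duplicate rings; the weight is how often each ring occurs."""
    return Counter(tuple(ring) for ring in rings)

# Hypothetical list of extracted rings with repeats, for illustration only.
rings = [(1, 27, 1)] * 3 + [(1, 7, 1)] * 2 + [(1, 2, 3, 1)]
weights = weight_rings(rings)

# Print distinct rings from heaviest to lightest.
for ring, weight in weights.most_common():
    print(ring, weight)
```

With weights in place, each distinct ring is stored once, and its weight indicates how habitual the corresponding round trip is.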

4. Conclusions

The analysis of a person's behavior characteristics has long been a topic of interest. With new devices and new GPS datasets, researchers have achieved a lot in a short space of time. After analyzing the real data we collected over the past 9 years, we found that time-based ring structures are very important. Over a period of time, for most of us, personal behavior follows certain rules, so the behavior can be predicted. Some problems remain unsolved in this paper, such as the importance of CNs and how to predict a person's behavior with the ring structure model. We hope more researchers will provide more trajectory data to the public, and we hope more researchers can join us in collecting GPS-based data. Finally, we have published all the GPS data, the experimental results, and the source code for this paper on our website (http://www.liuguoqi.com).

Data Availability

The datasets used in this paper are available at https://www.liuguoqi.com.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the Liaoning Province Key Research and Development Program (2019JH2/10300054).