Special Issue: Crowdsensing and Vehicle-Based Sensing
Research Article | Open Access
Fernando Terroso-Sáenz, Mercedes Valdes-Vela, Aurora González-Vidal, Antonio F. Skarmeta, "Human Mobility Modelling Based on Dense Transit Areas Detection with Opportunistic Sensing", Mobile Information Systems, vol. 2016, Article ID 9178539, 15 pages, 2016. https://doi.org/10.1155/2016/9178539
Human Mobility Modelling Based on Dense Transit Areas Detection with Opportunistic Sensing
With the advent of smartphones, opportunistic mobile crowdsensing has become an instrumental approach to perceive large-scale urban dynamics. In this context, this work presents a novel approach based on this sensing paradigm to automatically identify and monitor the areas of a city that comprise most of the human transit. Unlike previous approaches, the system performs this detection in real time, while the opportunistic sensing is being carried out. Furthermore, a novel multilayered grid partitioning is introduced to represent such areas. Finally, the proposal is evaluated by means of a real-world dataset.
1. Introduction
In recent years, smartphones have been at the center of most technological advances due to their growing popularity. As a result of these improvements, they are now equipped with several sensors such as GPS, accelerometers, and microphones.
This palette of sensors allows capturing a large amount of contextual information related to the phones' holders and their surrounding environment. This has eased the development of the mobile crowdsensing (MCS), or human/people sensing, paradigm, which makes it possible to perceive large-scale phenomena that cannot be detected at an individual level.
One of the most useful phenomena to perceive is human dynamics. Due to the steady improvement of the positioning sensors installed in mobile devices and vehicles, and the fact that location is the most critical element in reflecting users' movement, mobility mining is one of the domains where MCS has been most widely applied to uncover different aspects of human mobility. In turn, this eases the deployment of innovative location-based services such as predictive queries for moving object databases or pervasive navigation systems.
Although several studies already exist, mobility mining solutions based on MCS still face the following challenges.
(i) The intensive use of the positioning and communication capabilities of mobile devices required by this type of solution drains the battery considerably. This is an important barrier for users to contribute to MCS-based solutions for mobility detection. Nonetheless, when it comes to composing detailed mobility models, most works assume that a large set of users is always available to report their location traces.
(ii) Existing mechanisms usually follow an offline mining process over a previously gathered mobility dataset. As a result, the extracted knowledge is fixed and cannot smoothly adapt to changes in human dynamics.
(iii) Last but not least, many people consider location data highly sensitive in terms of privacy. Therefore, mobility models should be designed so that they cannot be used to uncover meaningful places or routes of particular users.
In this context, the present work proposes an innovative framework for human mobility modelling with opportunistic MCS that addresses the aforementioned open challenges. The key goal of the proposal is to detect regions with a high density of human transit that capture most of the mobility of a city.
In large city deployments, the number of these dense transit areas is usually very high. Hence, to ease their storage and management, a novel region abstraction for mobility mining is introduced. As Figure 1 shows, the idea is to represent such regions of interest with basic geometrical shapes and to define the incoming and outgoing flows of people in each region with respect to its sides (left, right, top, and bottom).
To do so, an online aggregation of the spatiotemporal traces of a set of contributing users is proposed. Unlike previous offline proposals, the introduced mechanism discovers the target regions while the trajectories are being received. Once a stable set of transit regions has been discovered, the regions are continuously monitored by a dynamically selected subset of suitable contributors.
The goal of such monitoring is to detect sudden or long-standing meaningful changes of human movement within the detected regions like people driving slower than usual or walking in unusual directions. These mobility shifts can be signs of events of interest, like unplanned demonstrations or serious traffic problems, whose early perception is of great help for many public and private stakeholders.
All in all, bearing in mind the open challenges of MCS-based mobility mining listed before, the salient contributions of the present work are the following.
(i) A novel mechanism uses only a subset of contributors to monitor the state of the composed transit areas. Since the rest of the users are deactivated, it reduces the extra cost of taking part in this type of solution.
(ii) A new solution explicitly detects changes in the movement of people within the target areas.
(iii) As far as privacy is concerned, the model of the detected areas only exposes general mobility information without disclosing any personal details of the contributors. This allows the detected areas to be shared with third-party services without serious privacy leaks.
The remainder of the paper is structured as follows. Section 2 describes in detail the concept of a dense transit area and the logical structure of the proposed system. Section 3 puts forward the procedure to discover such transit areas. Next, Section 4 explains how these areas are monitored. Then, Section 5 discusses the main results of the experiments. An overview of related work on mobility mining is given in Section 6. Finally, the main conclusions and future work are summed up in Section 7.
2. Dense Transit Areas Detection System
This section explains the goal of the proposal along with its architecture. For the sake of clarity, the Abbreviations section summarizes the key acronyms and symbols used in the following sections.
2.1. Dense Transit Area Definition
The main goal of the system is to detect the spatial areas within a city where a high density of human transit exists. In our setting, such human transit is defined as the routes that people follow to move from one place to another (e.g., home, work, and school).
Definition 1. A city’s region of influence is the spatial region comprising all the frequent origins and destinations of its citizens.
Definition 2. A route of a person is the continuous movement of that person within a city's region of influence from an origin to a destination.
Definition 3. The lifetime of a route is the time interval between the instant at which the person departed from the origin and the arrival time at the destination.
Definition 4. A subroute of a route is the part of the route's continuous movement that takes place during a subinterval of its lifetime.
Bearing in mind the aforementioned concepts of human movement, we can then come up with a transit area definition.
Definition 5. A dense transit area (DTA) is a spatial region that has been visited by a large set of subroutes belonging to many different people.
Consequently, a DTA represents a spatial region of a city that is visited by a large number of citizens' routes (e.g., ring roads, central avenues, or parks). For example, Figure 1 shows two DTAs, each comprising 3 different subroutes. Note that these routes may have any purpose, such as commuting, going to school, or shopping. As a result, the set of DTAs of a city comprises the areas that capture most of the movement of its population. The following sections describe how such DTAs can be perceived.
2.2. System Architecture
As previously stated, the system follows an opportunistic MCS approach to detect the DTAs. Therefore, it relies on a set of participants or contributors who voluntarily agree to undertake sensing tasks.
From an architectural point of view, as Figure 2 depicts, the system comprises two elements: a thin client running on the users' personal or vehicle-mounted devices, and a back-end server.
The mobile client is in charge of continuously detecting the routes of the device's holder and sending them to the central server. Due to the adopted opportunistic sensing, this task is carried out in unconscious mode; namely, the client runs in the background and opportunistically collects and delivers the routes without the active involvement of the user.
Next, the server, on the basis of the collected routes, composes and manages the DTAs. It is important to note that such DTA generation is undertaken in an incremental manner at the same time users cover their routes, so the system does not rely on any type of previously gathered data.
2.3. System Operation Modes
The present system supports two modes of execution, DTA discovery and DTA monitoring. Depending on the active mode, the system focuses on a particular task, and it switches from one mode to the other when certain conditions arise.
(i) DTA discovery is the initial mode of the system. In this state, the system focuses on generating a set of DTAs that is as complete and detailed as possible. To do so, it collects all the routes from all the participants and incrementally composes the set from these routes.
(ii) Once the system has generated a suitable set of DTAs, it transitions to the DTA monitoring mode. In this second mode, the system tracks the evolution of certain mobility features of the DTAs in the set in order to detect potential mobility shifts early. To do so, it processes the routes of a subset of participants that allow these features to be perceived reliably. Finally, if the system actually detects a potential shift, it returns to the DTA discovery mode in order to fully capture the mobility change in the set of DTAs.
By means of these two modes, the system is able not only to detect the DTAs but also to perceive the movement inside them. Besides, for the monitoring task the system only uses part of the contributors, so it does not require all the participants to report their locations all the time. For the sake of clarity, Figure 3 shows the state machine of the system. The following sections describe the inner functionality of the system in both modes.
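The mode switching of Figure 3 can be sketched as a two-state machine. This is a minimal Python illustration, not the authors' implementation; the flags `dta_set_stable` and `shift_detected` are hypothetical placeholders for the decisions taken by the DTA global manager (Section 3.3) and the DTA change detector (Section 4.3).

```python
from enum import Enum, auto


class Mode(Enum):
    """Operation modes of the system (see Figure 3)."""
    DISCOVERY = auto()
    MONITORING = auto()


def next_mode(mode: Mode, dta_set_stable: bool, shift_detected: bool) -> Mode:
    """Return the mode for the next step.

    dta_set_stable and shift_detected are hypothetical flags standing in for
    the decisions of the DTA global manager and the DTA change detector.
    """
    if mode is Mode.DISCOVERY and dta_set_stable:
        return Mode.MONITORING   # a stable set of DTAs has been composed
    if mode is Mode.MONITORING and shift_detected:
        return Mode.DISCOVERY    # re-capture the detected mobility change
    return mode
```

Discovery ignores the shift flag and monitoring ignores the stability flag, so each mode only reacts to the condition that concerns it.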
3. DTA Discovery
During the DTA discovery mode, the components of both the thin client and the back-end server are intensively executed in order to generate an initial or refined set of DTAs. The steps followed by the system are described next.
3.1. Users’ Routes Generation
In order to generate the routes covered by the user, both the route composer and the route deliverer components work in a collaborative manner.
3.1.1. Route Composer
As Figure 2 shows, this element is part of the mobile client running in each user’s mobile device. Its key goal is to detect the current route that the device’s holder is covering at each moment.
For this goal, this module relies only on the device's GPS sensor to extract the routes' raw locations. In particular, the sensor periodically provides the module with a new timestamped location, a tuple comprising the position in latitude-longitude coordinates and the instant at which it was recorded.
On the basis of the collected locations, the module incrementally composes the sequence of timestamped locations of the ongoing route. This is done by a two-step procedure.
Firstly, the algorithm removes erroneous or irrelevant locations that the GPS sensor may return. To do so, it applies to each new location a previously proposed distance-based filter that performs this cleaning in real time.
Secondly, if the new location is not discarded by the filter, it is added to the route following a spatiotemporal gap identification approach. To do so, a maximum distance and a maximum time interval between two consecutive locations are defined. If the spatial or temporal distance between the new location and the last location of the ongoing route exceeds the corresponding maximum, the algorithm marks that last location as the ending point of the ongoing route and the new location as the starting point of a new route. Otherwise, the new location is appended to the ongoing route as its new last location.
Finally, the current sequence of the ongoing route (and that of the just-completed route, if any) is sent to the route deliverer component.
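The gap-based segmentation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: fixes are assumed to have already passed the distance-based noise filter, the haversine great-circle distance is used as the spatial distance, and the names `RouteComposer`, `max_dist_m`, and `max_gap_s` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fix:
    lat: float
    lon: float
    t: float  # timestamp in seconds


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance in metres between two fixes."""
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


@dataclass
class RouteComposer:
    max_dist_m: float  # maximum spatial gap between consecutive fixes
    max_gap_s: float   # maximum temporal gap between consecutive fixes
    ongoing: List[Fix] = field(default_factory=list)

    def add(self, fix: Fix) -> Optional[List[Fix]]:
        """Append an already filtered fix; return the just-completed route, if any."""
        if self.ongoing:
            last = self.ongoing[-1]
            if (haversine_m(last, fix) > self.max_dist_m
                    or fix.t - last.t > self.max_gap_s):
                completed = self.ongoing   # `last` is the ending point
                self.ongoing = [fix]       # the new fix starts a new route
                return completed
        self.ongoing.append(fix)
        return None
```

When a gap is detected, the previous last location closes the completed route and the new fix opens the next one, mirroring the rule in the text.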
3.1.2. Route Deliverer
This element is in charge of controlling the route sequences that the mobile client delivers to the central server.
When the system runs in the DTA discovery mode, this component only processes each completed route sequence. In particular, it carries out two tasks: it delivers the route to the central server, and it stores the route in the local personal routes repository within the mobile client (see Figure 2). This repository keeps the most recent routes covered by the user. As we will see later, it is instrumental when the system runs in the DTA monitoring mode.
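A bounded personal routes repository maps naturally onto a fixed-capacity queue. The sketch below assumes that the repository keeps a fixed number of recent routes (`capacity` is a hypothetical parameter) and that `send` is a hypothetical callback standing in for the network transport to the server.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Route = List[Tuple[float, float, float]]  # (lat, lon, timestamp)


class RouteDeliverer:
    """Discovery-mode behaviour: send each completed route and keep a local copy."""

    def __init__(self, send: Callable[[Route], None], capacity: int) -> None:
        self.send = send  # hypothetical transport to the back-end server
        # Bounded repository: once full, appending drops the oldest route.
        self.personal_routes: Deque[Route] = deque(maxlen=capacity)

    def on_completed_route(self, route: Route) -> None:
        self.send(route)
        self.personal_routes.append(route)
```

Using `deque(maxlen=...)` keeps the eviction of old routes implicit, so the repository never grows beyond the chosen size.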
3.2. DTA Generation
Once the server receives the users' routes, it builds the DTAs by means of two of its modules: the DTA composer and the DTA aggregator.
3.2.1. DTA Composer
This module of the back-end server is responsible for actually detecting the new DTAs of the city. Hence, it receives all the completed routes from all the users when the system runs in DTA discovery.
Bearing in mind Definition 5, we can regard a DTA as a spatial region exhibiting a high density of routes from many different users. Consequently, this module adopts an approach based on computing certain features of the incoming routes with respect to a predefined spatial partition so as to calculate the density of routes in each part of the city and, thus, uncover its DTAs.
To do so, the DTA composer first divides the whole spatial region of the city under study into square cells of the same size. In turn, each cell is further divided into four subcells, each covering a different spatial region inside the cell. As Figure 4 shows, these subcells split the cell according to the different ways in which a route can traverse it.
On the basis of this multilevel spatial partition, the module calculates the route density in each cell and subcell by the procedure shown in Algorithm 1. This algorithm is launched whenever a new route from any user is received.
First of all, the algorithm gets the timestamp at which the incoming route started (line 2 of Algorithm 1). Next, it maps the sequence of timestamped locations of the incoming route onto the multilayered spatial partition described above. As a result, the route is translated into a sequence of cells (line 3). As Table 1 shows, each cell in this sequence comprises five movement attributes related to the route.
These five attributes can be computed using simple mathematics and computational geometry. Figure 5 shows an example of how a route comprising six timestamped locations is mapped to a sequence of four different cells, along with some attributes of each cell. Note that not all the cells include the subcell attribute (see Table 1). This is because, in many cases, the subroute within a cell covers more than one subcell. For example, in Figure 5 the subroute in one cell is fully covered by a single subcell, whereas in another cell it is partially covered by all four subcells.
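The mapping from a route to a cell sequence, together with two of the movement attributes (entry and exit sides) and a mean speed, can be sketched as below. This is an illustrative Python sketch, not the paper's procedure: it assumes locations already projected to a local planar frame in metres with y growing northwards, it ignores the subcell layout of Figure 4, and it assumes consecutive fixes fall in neighbouring cells (a fuller implementation would interpolate between sparse fixes). All names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x metres, y metres, timestamp s), local planar frame


@dataclass
class CellVisit:
    cell: Tuple[int, int]                 # (column, row) in the grid
    points: List[Point] = field(default_factory=list)
    in_side: Optional[str] = None         # side through which the route entered
    out_side: Optional[str] = None        # side through which the route left


def side_towards(src: Tuple[int, int], dst: Tuple[int, int]) -> str:
    """Side of src facing dst (only meaningful for neighbouring cells)."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "top" if dy > 0 else "bottom"


def map_to_cells(route: List[Point], side_m: float) -> List[CellVisit]:
    """Translate a route into the sequence of grid cells it traverses."""
    visits: List[CellVisit] = []
    for x, y, t in route:
        cell = (math.floor(x / side_m), math.floor(y / side_m))
        if not visits or visits[-1].cell != cell:
            entry = None
            if visits:
                prev = visits[-1]
                prev.out_side = side_towards(prev.cell, cell)
                entry = side_towards(cell, prev.cell)
            visits.append(CellVisit(cell, in_side=entry))
        visits[-1].points.append((x, y, t))
    return visits


def mean_speed(v: CellVisit) -> float:
    """Mean speed (m/s) of the subroute inside a cell; 0 if it has no duration."""
    pts = v.points
    dist = sum(math.dist(a[:2], b[:2]) for a, b in zip(pts, pts[1:]))
    dt = pts[-1][2] - pts[0][2]
    return dist / dt if dt > 0 else 0.0
```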
Once the mapping is completed, the algorithm uses the resulting sequence to update the space partitioning stats repository (see Figure 2).
This repository stores aggregated transit data of the space partition. In particular, these data are organized in two entities, one for walking routes and one for nonwalking routes. The former stores information about routes with a low speed, which are therefore likely to have been covered on foot, whereas the latter stores information about nonwalking routes, whose higher speed indicates vehicle-based movement. For each cell, both entities store the transit properties shown in Table 2.
The rationale for having two separate entities is that urban dynamics might differ depending on whether we consider movement on foot or in vehicles. For example, vehicle-based routes are constrained by the road network of the city, whereas walking routes usually do not show such constrained displacements.
Consequently, for each cell in the sequence, the system updates either the walking or the nonwalking entity, depending on whether the cell's speed attribute is below or above a domain-dependent threshold (lines 5–8 of Algorithm 1). This update is carried out by the update_cell function (line 9), which takes a cell from the sequence and its associated element in the repository. The function returns a Boolean value indicating whether the historical transit data of that element allows classifying it as a DTA. If so, the element gives rise to a new DTA (line 10) that is added to the set of new DTAs (line 11). During this generation, the system removes the users attribute to keep DTAs anonymized. Otherwise, the system repeats the same process with the subcell attribute of the cell (see Table 1; lines 12–16 of Algorithm 1).
This way, the algorithm follows a top-down approach to generate DTAs: it first tries to generate cell-based DTAs and, if that is not possible, focuses on detecting smaller DTAs at subcell granularity. Finally, the algorithm returns the set of new DTAs generated from the incoming route (line 18).
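The top-down logic of Algorithm 1 (whole cell first, subcell as a fallback) can be sketched as follows. This is a hypothetical Python rendering: the key layout `(column, row, subcell)`, the use of `-1` for a subroute spanning several subcells, and the injected `update_cell` callback are assumptions of this sketch. Returned DTAs carry only their type and location, in line with the removal of the users attribute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int, int]  # hypothetical (column, row, subcell); subcell -1 = whole cell


@dataclass
class CellAttrs:
    col: int
    row: int
    subcell: int   # subcell index, or -1 if the subroute spans several subcells
    speed: float   # mean speed in the cell, m/s
    user: str


def process_route(cells: List[CellAttrs],
                  walking: Dict[Key, dict], nonwalking: Dict[Key, dict],
                  update_cell: Callable[[CellAttrs, dict], bool],
                  speed_threshold: float) -> List[Tuple[str, Key]]:
    """Top-down DTA detection: try the whole cell first, then its subcell."""
    new_dtas: List[Tuple[str, Key]] = []
    for c in cells:
        # Choose the repository entity according to the speed attribute.
        kind, store = (("walking", walking) if c.speed < speed_threshold
                       else ("nonwalking", nonwalking))
        cell_key = (c.col, c.row, -1)
        if update_cell(c, store.setdefault(cell_key, {})):
            new_dtas.append((kind, cell_key))          # cell-level DTA (no users kept)
        elif c.subcell >= 0:
            sub_key = (c.col, c.row, c.subcell)
            if update_cell(c, store.setdefault(sub_key, {})):
                new_dtas.append((kind, sub_key))       # subcell-level DTA
    return new_dtas
```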
Concerning the inner functionality of the update_cell function (lines 18–27), it first updates the repository element with the attributes of its associated cell by simple or incremental addition (lines 19–23). The speed attribute and the incoming- and outgoing-side attributes are disaggregated according to a temporal criterion. This makes it possible to know the speed and the incoming and outgoing sides within particular time intervals of a predefined granularity. For instance, if an hourly granularity is chosen, the temporal criterion is defined as an array of hourly time slots.
Once the element has been updated, update_cell also determines whether it can actually give rise to a DTA (lines 25-26). To that end, the function checks two features of the element: its route density and the number of users that have visited the cell at least once.
The route density is computed from the total length of the historical subroutes within the cell and the cell's geometric area. Since the cells are square, this area is simply the square of the cell's side length (line 24). Finally, if both the density and the number of users exceed their respective domain-dependent thresholds, the function concludes that the element actually represents a DTA and returns true. Otherwise, it returns false.
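A possible shape for update_cell is shown below. It is a sketch under assumptions: the density is taken as accumulated subroute length divided by the cell area (side length squared), the temporal criterion is an hourly array of 24 slots, only the speed attribute is disaggregated (the side attributes would follow the same pattern), and the field and parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CellStats:
    """Aggregated transit data of one cell (hypothetical subset of Table 2)."""
    total_length_m: float = 0.0
    users: Set[str] = field(default_factory=set)
    # Hourly temporal criterion: one slot per hour of the day (assumed).
    speed_sum: List[float] = field(default_factory=lambda: [0.0] * 24)
    speed_count: List[int] = field(default_factory=lambda: [0] * 24)


def update_cell(stats: CellStats, length_m: float, speed: float, hour: int,
                user: str, side_m: float,
                min_density: float, min_users: int) -> bool:
    """Update the stats with one subroute and report whether the cell is a DTA."""
    # Incremental update of the aggregated attributes.
    stats.total_length_m += length_m
    stats.users.add(user)
    stats.speed_sum[hour] += speed
    stats.speed_count[hour] += 1
    # Route density: accumulated subroute length per unit of cell area.
    density = stats.total_length_m / (side_m ** 2)
    return density > min_density and len(stats.users) > min_users
```

Keeping sums and counts per slot, rather than averages, lets the hourly mean speed be recovered at any time as `speed_sum[h] / speed_count[h]` without storing individual subroutes.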
3.2.2. DTA Aggregator
The set of new DTAs resulting from the previous algorithm is delivered directly to this module (see Figure 2). This set comprises DTAs at cell or subcell granularity. However, real DTAs can cover spatial areas larger than the predefined cell size. For that reason, the DTA aggregator applies a fusion procedure to the resulting DTAs in order to merge such areas.
The key idea of this procedure is that two nearby DTAs can be merged into a larger DTA if the transit information they represent is strongly related. In our setting, we infer that two DTAs represent the same transit flow if people move at a similar speed and in a similar direction in both areas. More specifically, we distinguish between two types of similarity among DTAs:
(i) Parallel-transit similarity: this similarity arises when the subroutes of the DTAs have a very similar direction and speed.
(ii) Common-transit similarity: this similarity occurs when the DTAs share a certain number of common subroutes.
For example, the two DTAs in Figure 6(a) comprise quite different transit flows in terms of both speed (one of them is mostly covered by vehicle-based routes, whereas the other has more walking routes) and direction (the routes in each DTA run opposite to those in the other). In contrast, Figure 6(b) shows a parallel-transit similarity between the two DTAs, whereas Figure 6(c) depicts a common-transit similarity. Therefore, the DTAs in the latter two situations could be merged to make up a new, larger DTA.
Figure 6: (a) Nonrelated DTAs; (b) DTAs with high parallel-transit similarity; (c) DTAs with high common-transit similarity.
To compute each similarity, we use the number of incoming and outgoing subroutes that each DTA comprises (see Table 2). Thus, given two DTAs, their parallel similarity is calculated as follows:
Equations (1) and (2) calculate the dissimilarity between the two rates of incoming and outgoing subroutes for each of the four sides of a DTA. Finally, (3) aggregates both rate differences to generate the final parallel similarity.
Furthermore, the common-transit similarity relies on two sides: the side of the first DTA that it shares with the second, from the first DTA's perspective, and the adjacent side from the second DTA's point of view. Figure 6(a) illustrates these sides for each DTA; for instance, if one DTA lies immediately to the left of the other, the common side is its right side and the adjacent side of the other DTA is its left side. In essence, this similarity measures the subroutes that actually move between the two DTAs under consideration.
All in all, Algorithm 2 shows the mechanism applied by the DTA aggregator to fuse DTAs. The algorithm takes a set of DTAs to be merged. Then, each pair of adjacent DTAs is compared to measure its parallel (lines 4–7) and common (lines 9–12) similarity. Only DTAs with the same type attribute (see Table 2) are compared, which ensures that both contain similar routes in terms of speed. Both similarities are also calculated with respect to the temporal criterion. Thus, if either similarity exceeds its associated threshold, both DTAs exhibit similar or common human dynamics across different time periods, and they can be merged.
Finally, the DTA aggregator executes Algorithm 2 in two different ways. In the first one, the algorithm is launched automatically whenever the DTA composer delivers a new set of DTAs. Then, the resulting set of fused DTAs is appended to the global set of DTAs in the DTA global repository (see Figure 2). Additionally, since this first type of execution only fuses the DTAs generated from a single route, the fusion mechanism is also launched periodically over the whole global set of DTAs so as to detect correlated DTAs across the entire city under study.
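The fusion loop of Algorithm 2 can be sketched with a union-find structure over adjacent DTAs of the same type. Both similarity functions below are hypothetical stand-ins, not the paper's equations (1)–(3) or its common-transit formula: `parallel_sim` compares the per-side rates of incoming and outgoing subroutes, and `common_sim` estimates the share of subroutes crossing the shared side. The temporal criterion is ignored for brevity, and each DTA is assumed to occupy a single grid cell.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SIDES = ("left", "right", "top", "bottom")
OPPOSITE = {"left": "right", "right": "left", "top": "bottom", "bottom": "top"}


@dataclass
class DTA:
    cell: Tuple[int, int]   # (column, row)
    kind: str               # type attribute: "walking" or "nonwalking"
    inc: Dict[str, int]     # incoming subroutes per side
    out: Dict[str, int]     # outgoing subroutes per side


def _rates(counts: Dict[str, int]) -> List[float]:
    total = sum(counts.get(s, 0) for s in SIDES)
    return [counts.get(s, 0) / total if total else 0.0 for s in SIDES]


def parallel_sim(a: DTA, b: DTA) -> float:
    """Hypothetical: 1 minus the mean total-variation distance of side rates."""
    d_in = 0.5 * sum(abs(x - y) for x, y in zip(_rates(a.inc), _rates(b.inc)))
    d_out = 0.5 * sum(abs(x - y) for x, y in zip(_rates(a.out), _rates(b.out)))
    return 1.0 - (d_in + d_out) / 2


def common_side(a: DTA, b: DTA) -> str:
    """Side of a shared with b (a and b are 4-neighbours)."""
    dx, dy = b.cell[0] - a.cell[0], b.cell[1] - a.cell[1]
    return {(1, 0): "right", (-1, 0): "left", (0, 1): "top", (0, -1): "bottom"}[(dx, dy)]


def common_sim(a: DTA, b: DTA) -> float:
    """Hypothetical: share of subroutes crossing the common side."""
    sa = common_side(a, b)
    sb = OPPOSITE[sa]
    crossing = min(a.out.get(sa, 0), b.inc.get(sb, 0)) + min(b.out.get(sb, 0), a.inc.get(sa, 0))
    smaller = min(sum(a.inc.values()), sum(b.inc.values()))
    return crossing / smaller if smaller else 0.0


def fuse(dtas: List[DTA], th_par: float, th_com: float) -> List[List[DTA]]:
    """Group DTAs so that linked adjacent pairs end up in the same merged DTA."""
    parent = list(range(len(dtas)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(dtas):
        for j in range(i + 1, len(dtas)):
            b = dtas[j]
            adjacent = abs(a.cell[0] - b.cell[0]) + abs(a.cell[1] - b.cell[1]) == 1
            if not adjacent or a.kind != b.kind:
                continue  # only adjacent DTAs of the same type are compared
            if parallel_sim(a, b) > th_par or common_sim(a, b) > th_com:
                parent[find(i)] = find(j)

    groups: Dict[int, List[DTA]] = {}
    for i, d in enumerate(dtas):
        groups.setdefault(find(i), []).append(d)
    return list(groups.values())
```

Union-find makes merging transitive: if A fuses with B and B with C, all three end up in one group, which is one way to grow larger DTAs from chains of pairwise decisions.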
3.3. DTA Discovery-DTA Monitoring Transition
While the server composes the DTAs, it also monitors their state to decide whether the system should remain in the DTA discovery mode or move to the DTA monitoring phase.
The system should transition from the discovery to the monitoring stage once a stable set of DTAs has been composed. The DTA global manager module implements this decision process.
3.3.1. DTA Global Manager
This module defines a sampling time period and, for each period, calculates the number of new DTAs and the number of routes received from all the users. If the ratio between these two numbers is below a decision threshold, the system has not composed many new DTAs relative to the incoming routes. Consequently, the module infers that the system has reached a consistent set of DTAs and transitions to the DTA monitoring mode, as depicted in Figure 3.
During this transition, the DTA global manager distributes the set of DTAs uncovered by the server among all the contributors. On the client side, the DTA Local Manager receives this set and stores it in the DTA local repository (see Figure 2). As we will see in the next section, storing these data locally is of great help in the DTA monitoring mode.
On the whole, during the DTA discovery mode the system composes a reliable set of DTAs by combining a multilayered partition of the space with the subsequent fusion of the resulting areas. Next, during the DTA monitoring stage, the system focuses on verifying that the current set of DTAs still represents the dynamics of the city under study.
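The discovery-to-monitoring decision reduces to a ratio test per sampling period. A minimal sketch, assuming that a period with no received routes gives no evidence of stability; the function name is hypothetical.

```python
def should_switch_to_monitoring(new_dtas: int, received_routes: int,
                                threshold: float) -> bool:
    """Discovery-to-monitoring decision for one sampling period.

    The set of DTAs is deemed stable when few new DTAs appear per received route.
    """
    if received_routes == 0:
        return False  # no evidence in this period (an assumption of this sketch)
    return new_dtas / received_routes < threshold
```

For example, 2 new DTAs out of 1000 routes gives a ratio of 0.002, which passes a threshold of 0.01, whereas 50 new DTAs out of 1000 routes does not.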
4. DTA Monitoring
In the DTA monitoring mode, the system selects a subset of users and uses their reported routes to compare the current state of the human transit in the city with the state represented by the DTAs, in order to see whether any discrepancy exists. This allows certain users to be deactivated, avoiding their continuous contribution to the system and thus saving resources.
As a result, the behaviour of the system changes with respect to the one described in the previous section so as to adapt to this new goal. In particular, the steps of the system in this mode are described next.
4.1. User Subset Selection
The first task the system does when it starts to operate in monitoring mode is to select the target set of monitoring users. This selective process is undertaken by the DTA global manager component.
4.1.1. DTA Global Manager
For the aforementioned goal, this module uses the sampling time period and selects, for each period, the subset of users providing the best coverage of the detected DTAs. This selection process comprises the following steps.
(1) For each DTA in the set, the module asks the mobile clients to report their availability to visit this DTA within the current sampling period.
(2) Among all the users reporting an availability score above a domain-dependent threshold, it chooses the top users according to this value.
(3) These top users are committed to report their ongoing routes to the server during the current sampling interval.
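Steps (2) and (3) amount to a thresholded top-k selection per DTA. The sketch assumes the scores have already been reported by the clients; `k` and `min_score` are hypothetical names for the number of selected users and the domain-dependent threshold.

```python
from typing import Dict, List


def select_monitoring_users(availability: Dict[str, Dict[str, float]],
                            min_score: float, k: int) -> Dict[str, List[str]]:
    """For each DTA, pick up to k users whose availability exceeds min_score.

    availability maps DTA id -> {user id -> availability score}.
    """
    selected: Dict[str, List[str]] = {}
    for dta, scores in availability.items():
        eligible = [(s, u) for u, s in scores.items() if s > min_score]
        eligible.sort(key=lambda su: (-su[0], su[1]))  # best score first, ties by id
        selected[dta] = [u for _, u in eligible[:k]]
    return selected
```

A DTA with no sufficiently available user simply gets an empty list, leaving it unmonitored during that period.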
4.1.2. DTA Local Manager
This module in the mobile client is in charge of calculating the availability score used by the aforementioned selection process.
To do so, each time the mobile client switches to the DTA monitoring mode, this module reads the user's historical routes stored in the personal routes repository. Then, it counts the number of visits to each DTA stored in its DTA local repository for each time period of the temporal criterion. Afterwards, it removes all these historical routes from the repository. This way, only the routes covered by the user during the previous DTA discovery cycle are used to generate the visit statistics, which avoids the use of outdated routes.
With this information, the module can easily calculate the probability of visiting a DTA, dta, during a particular time interval as the number of visits to dta during that interval divided by the number of routes that started during it. On the basis of this probability, the module calculates the availability to visit a DTA during a particular time interval, which includes a corrective factor that prevents a user from contributing to the monitoring stage during too many consecutive time intervals. To compute this factor, the module counts the number of times the user has been selected by the server during the most recent sampling instants.
Finally, the availability score for each DTA is stored locally and sent to the central server in each selection process.
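The visit probability follows directly from the counts described above, as sketched below. The corrective factor and the way it is combined with the probability are hypothetical here (a linear penalty on recent selections, multiplied by the probability), chosen only to illustrate a score that decreases as a user is selected more often. Routes are summarised by their starting hour, which assumes an hourly temporal criterion.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Each historical route is summarised as (start_hour, ids of the DTAs it visited).
RouteSummary = Tuple[int, List[str]]


def visit_probabilities(routes: Iterable[RouteSummary]) -> Dict[Tuple[str, int], float]:
    """P(visit dta | interval) = visits to dta in interval / routes started in interval."""
    started: Counter = Counter()
    visits: Counter = Counter()
    for hour, dtas in routes:
        started[hour] += 1
        for dta in set(dtas):  # count a DTA at most once per route
            visits[(dta, hour)] += 1
    return {(dta, hour): n / started[hour] for (dta, hour), n in visits.items()}


def availability(prob: float, times_selected: int, window: int) -> float:
    """Hypothetical score: probability scaled by a factor that shrinks with the
    number of selections in the last `window` sampling periods."""
    corrective = 1.0 - min(times_selected, window) / window
    return prob * corrective
```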
4.2. User Subset Routes Generation
As in the previous mode, this task is performed by the same two modules in the client side, route composer and route deliverer (see Figure 2).
4.2.1. Route Composer
This component of the client behaves identically in both operational modes. Hence, it simply composes the sequences of timestamped locations of the user's routes and sends them to the route deliverer, as described in Section 3.1.
4.2.2. Route Deliverer
The behaviour of this element changes slightly under the monitoring stage. In particular, it now forwards to the central server the ongoing route’s sequence instead of the completed routes. Therefore, each time the route composer appends a new location to such sequence, this module immediately delivers the new version to the server. This way, the central server is informed of how the users move in real time under this operation mode.
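The mode-dependent delivery policy can be sketched as below. This is a hypothetical Python illustration: `send` stands in for the transport to the server, `monitoring` marks the client as a selected monitoring user, and local storage of completed routes is omitted.

```python
from typing import Callable, List, Tuple

Fix = Tuple[float, float, float]  # (lat, lon, timestamp)


class ModeAwareDeliverer:
    """Forward whole routes in discovery mode, live snapshots in monitoring mode."""

    def __init__(self, send: Callable[[List[Fix]], None]) -> None:
        self.send = send         # hypothetical transport to the server
        self.monitoring = False  # True while this client is a selected monitoring user

    def on_location_appended(self, ongoing: List[Fix]) -> None:
        if self.monitoring:
            self.send(list(ongoing))  # send a snapshot of the current version

    def on_route_completed(self, route: List[Fix]) -> None:
        if not self.monitoring:
            self.send(list(route))    # discovery mode: only completed routes
```

Copying the list before sending prevents later appends by the route composer from altering a snapshot that has already been handed to the transport.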
4.3. DTA Change Detection
The key task in this operation mode is to perceive potential changes in the urban dynamics of the target city. This process involves the DTA change detector component.
4.3.1. DTA Change Detector
This module processes all the ongoing routes reported by the subset of users so as to extract the current movement features inside each DTA in terms of direction and speed. Then, it periodically compares such current values with the historical ones stored in the DTA global repository. Algorithm 3 shows this process in more detail.
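The following is only an illustrative sketch of the comparison step, not Algorithm 3 itself. It assumes that a shift is flagged when the current mean speed in a DTA and time slot deviates from the historical mean by more than a relative threshold; direction features are omitted, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def detect_shifts(current: Dict[Tuple[str, int], List[float]],
                  historical: Dict[Tuple[str, int], float],
                  max_rel_dev: float) -> List[Tuple[str, int]]:
    """Flag (dta, hour) pairs whose current mean speed deviates from the
    historical mean by more than max_rel_dev (relative deviation)."""
    flagged: List[Tuple[str, int]] = []
    for key, speeds in current.items():
        hist = historical.get(key)
        if not speeds or not hist:
            continue  # nothing to compare for this DTA and slot
        cur = sum(speeds) / len(speeds)
        if abs(cur - hist) / hist > max_rel_dev:
            flagged.append(key)
    return flagged
```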