Applied Computational Intelligence and Soft Computing
Volume 2015 (2015), Article ID 578601, 12 pages
Research Article

Towards Scalable Distributed Framework for Urban Congestion Traffic Patterns Warehousing

1FSTM, Department of Computer Sciences, LIM/IDS Lab, Faculty of Sciences and Technologies of Mohammedia, BP 146, Mohammedia, Morocco
2ENSAK, Boulevard Béni Amir, BP 77, Khouribga, Morocco
3ENCG Casablanca, Beau Site, BP 2725, Ain Sebaâ, Casablanca, Morocco
4EMSI, 217 Boulevard Bir Anzarane, Casablanca, Morocco

Received 15 August 2014; Accepted 9 December 2014

Academic Editor: Yongqing Yang

Copyright © 2015 A. Boulmakoul et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


We put forward the architecture of a framework for integrating data from moving objects in an urban transportation network. Most of this research relies on GPS outdoor geolocation technology and uses a distributed cloud infrastructure with a big data NoSQL database. A network of intelligent mobile sensors, distributed over the urban network, produces congestion traffic patterns. Congestion predictions are based on an extended simulation model. This model provides traffic indicator calculations, which are fused with the GPS data to estimate traffic states across the whole network. The discovery process of congestion patterns uses the semantic trajectories metamodel given in our previous works. The challenge of the proposed solution is to store traffic patterns in order to support surveillance and intelligent real-time control of the network, reducing congestion and avoiding its consequences. Fusing real-time data from GPS-enabled smartphones with data provided by existing traffic systems improves knowledge of traffic congestion, generates new information for soft operational control, and provides intelligent added value for the deployment of transportation systems.

1. Introduction

Knowledge discovery in data and big data are concepts that are revolutionizing modern information technology. Big data refers to the large volume of data recorded in various “digital” activities and directly involved in the data mining process. Decision-makers of public transport systems are aware of the role that knowledge discovery and big data play in delivering the promised benefits [1].

Urban transport, as part of “Urban Computing,” is now considered one of the most striking applications of big data and knowledge discovery. These applications range from urban traffic management activities, with computerized processing of massive amounts of traffic and geographic data, to complex signaling and traffic assignment or control systems, and to communications, vehicle tracking, and traveler information operations using increasingly common technologies such as GPS [2], Wi-Fi, and cellular phone systems. In the field of urban transportation, the road traffic state can be represented at any time through the analysis of information collected on all vehicles, which contributes to the development of descriptive models, in the form of formulas and rules, that reproduce the complex dynamics of traffic. There are several techniques for measuring these data, such as pneumatic tubes, radars, cameras, electromagnetic loops, GPS, RFID, and Bluetooth. Local detectors are practical for relevant measures. The extracted data correspond directly to the number of vehicles detected and are used to calculate macroscopic variables such as traffic flow, density, and average speed. Data from the various sensors should be stored in large databases. The history of all data will be exploited by data mining techniques for efficient traffic management. Traffic congestion has been a major concern of most cities. Congestion phenomena dramatically affect the mobility of people and generate significant stress. Traffic congestion wastes time and energy and causes pollution. For general planning and traffic surveillance, several studies collect velocity data in the field using a GPS device. The data relate to the speed and the number of vehicles in circulation. These data can be used to assemble velocity profiles and indicators of travel time per period. They can also be used to identify congested areas.
In order to quantify the severity of congestion, Global Positioning System (GPS) applications have been used to collect travel time per period and delay data for many transportation networks. Data provided by GPS technology proved to be at least as accurate as the data provided by other sensors.

The main contributions of this work can be summarized as follows:
(i) the modeling of congestion, which allows a measurement process based on GPS technology to be developed,
(ii) a macroscopic traffic simulation that extends the Daganzo model in order to predict the spread of congestion on a network,
(iii) a distributed infrastructure encompassing various components that provide warehousing services and discovery of the patterns inherent in the phenomenon of traffic in a transportation network. Big data and mobile computing technologies are the foundation of this infrastructure.

This paper is organized as follows. Section 2 reviews the state of the art on macroscopic traffic models and then introduces a network traffic simulation model based on the Cell Transmission Model. Congestion modelling and management are addressed in Section 3. Section 4 introduces the conceptual modeling of congestion using the trajectories’ metamodel.

Section 5 outlines the global system architecture. Finally, Section 6 states the main conclusions of our work.

2. Traffic Modelling and Simulation

One of the main societal and economic problems related to transportation in many countries is the occurrence of congestion in traffic flow. In this context, understanding the various traffic flow operations is important for managing congestion and traffic on road networks. Several questions therefore arise, such as the following: How can we define congestion? What causes congestion? How can we measure congestion? How is congestion propagated through networks? What determines traffic breakdown in terms of location and time? To answer some of these questions, many traffic flow theories and models have been developed. These models can be deductive, applying physical laws and theories to predict and explain traffic operations. They can be inductive, analyzing available real data to fit generic mathematical structures, or they can be intermediate, developing basic mathematical model structures of which one is fitted using real data (see Figure 1). Various criteria are used to classify the developed models, such as the scale of application, the scale of the independent variables, operationalization, the representation of processes, and especially the level of detail with which vehicular traffic flow is described. The scale of the independent variables distinguishes between two time scales, and the representation of processes can be either stochastic or deterministic. Developed models can be used either as analytical solutions of sets of equations or as simulation models in a specific area of application.

Figure 1: Categorization of traffic flow models.

The level-of-detail criterion considers which traffic entities (vehicles and drivers) are distinguished and at what level they are described in the respective flow models. Microscopic models describe the latter by considering both the time-space behavior of individual drivers and vehicles and their interactions at a high level of detail. Some works differentiate between microscopic models and submicroscopic ones, both of which provide a high level of detail. Submicroscopic models describe the functioning of vehicles’ subunits and the interaction with their surroundings, while microscopic models distinguish and trace individual entities.

Mesoscopic models neither distinguish nor trace individual entities (vehicles or drivers); instead, they specify the behavior of small groups of entities whose activities and interactions are described at a low level of detail. Macroscopic models describe the collective flow by using the analogy between vehicles in traffic flow and particles in a fluid. Several researchers have developed the macroscopic modeling approach on the assumption that it provides a correct description of traffic, compared to the other two categories. In the following subsection, we present the traffic flow modeling approaches according to the fundamental diagram.

2.1. Fundamental Diagrams of Traffic Flow

Fundamental diagrams of traffic flow are curves representing relations between flow and density, density and speed, and speed and flow (Figure 2). These diagrams are essential tools, which enable analysis of fundamental relationships [37].

Figure 2: Fundamental diagram.

The flow and density vary with time and space. When the density is zero, the flow is also zero, since there are no vehicles on the road. As the number of vehicles gradually increases, both the density and the flow increase. Traffic reaches its jam state when vehicles cannot move because their density has reached its maximum. At jam density, the flow is zero because the vehicles are not moving. Between zero density and jam density, the flow first rises in the free state to a maximum and then falls as congestion sets in. Note that the same flow can therefore occur at two different densities; the corresponding speeds, however, are different.
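
The observation that one flow value can correspond to two densities can be checked numerically. The following is a minimal sketch, assuming a linear (Greenshields-type) speed-density law; the function names and the parameter values (60 km/h free speed, 120 veh/km jam density) are illustrative assumptions, not values taken from this work.

```python
def greenshields_speed(k, v_free, k_jam):
    """Speed (km/h) at density k (veh/km), assuming a linear speed-density law."""
    return v_free * (1.0 - k / k_jam)


def flow(k, v_free, k_jam):
    """Flow (veh/h) from the fundamental relation q = k * v."""
    return k * greenshields_speed(k, v_free, k_jam)


# Illustrative parameters (assumptions, not calibrated values).
V_FREE = 60.0   # km/h
K_JAM = 120.0   # veh/km

# One density on the free branch and one on the congested branch.
k_free_branch, k_congested_branch = 30.0, 90.0
q_free = flow(k_free_branch, V_FREE, K_JAM)
q_congested = flow(k_congested_branch, V_FREE, K_JAM)
v_free_branch = greenshields_speed(k_free_branch, V_FREE, K_JAM)
v_congested_branch = greenshields_speed(k_congested_branch, V_FREE, K_JAM)
```

Under these assumptions the two densities carry the same flow, but the vehicles travel three times faster on the free branch than on the congested one.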

2.2. Traffic Simulation Model

Various approaches have been proposed to capture the mechanism by which traffic congestion propagates [8–11]. In this work, we use the cell transmission model with some extensions and apply it to simulate the formation and dissipation of congestion at the semimacroscopic level.

The cell transmission model (CTM) is a discrete approximation to the LWR [7] model proposed by Daganzo [35]. CTM (Figure 4) is based on the assumption that the road is divided into similar cells whose lengths are equal to the distance traveled by free-flowing traffic in a given interval (Figure 3). A discrete representation of the system state is governed by the number of vehicles $n_i(t)$ in each cell $i$. Another parameter of a cell is the maximum number of vehicles $Q_i(t)$ that can flow into cell $i$ between time steps $t$ and $t+1$. $N_i(t)$, the maximum number of vehicles that cell $i$ can hold, is defined to be the product of the cell’s length and its jam density, and $Q_i(t)$ is the product of the clock interval and the cell’s capacity. For simplification reasons, time variations of the parameters $N_i$ and $Q_i$ are ignored.

Figure 3: Link homogeneous discretization.
Figure 4: The trapezoidal fundamental diagram.

The key idea in CTM is that the length of a cell is the product of the free-flow speed and the discrete time step. Therefore, $l = v_f\,\Delta t$.

If cells are numbered consecutively and if $y_i(t)$ designates the number of vehicles moving from cell $i-1$ to cell $i$ during the time interval $[t, t+1]$, then the recursive relationship of the CTM can be expressed as
$$n_i(t+1) = n_i(t) + y_i(t) - y_{i+1}(t),$$
where $n_i(t)$ denotes the number of vehicles contained in cell $i$ at time $t$. $y_i(t)$ denotes the inflow to cell $i$ in the time interval $[t, t+1]$ and can be expressed by the following formula:
$$y_i(t) = \min\{n_{i-1}(t),\; Q_i(t),\; \delta\,[N_i(t) - n_i(t)]\} \quad \text{and} \quad \delta = w / v_f.$$

If the flow and density are uniform between two cells during a time interval, then $y_i(t) = q\,\Delta t$, $n_i(t) = k\,l$, $Q_i = q_{\max}\,\Delta t$, and $N_i = k_j\,l$. Unlike most traffic models, the CTM adopts a trapezoidal shape of fundamental diagram (Figure 4), which is defined by four properties: the free-flow speed $v_f$, the capacity $q_{\max}$, the jam density $k_j$, and the speed $w$ with which disturbances propagate backward when traffic is congested.

This trapezoidal fundamental diagram has almost the same form as the inflow to a given cell. Therefore
$$q = \min\{v_f\,k,\; q_{\max},\; w\,(k_j - k)\}, \quad 0 \le k \le k_j.$$

The uncongested case corresponds to $y_i(t) = n_{i-1}(t)$, leading the wave to propagate downstream. If $y_i(t) = Q_i(t)$, then vehicles leave the cell at capacity. When $y_i(t) = \delta\,[N_i(t) - n_i(t)]$, the number of vehicles which can enter the cell is restricted by the number of vehicles which fit at jam density.
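
The cell-by-cell bookkeeping of the basic CTM can be sketched in a few lines. The block below is a minimal illustration, assuming Daganzo's standard recursion (the inflow to a cell is the minimum of the upstream occupancy, the cell's maximum inflow, and the remaining space scaled by the ratio of backward wave speed to free-flow speed). The function name `ctm_step`, its argument names, and the free-discharge boundary at the last cell are assumptions made for this example.

```python
def ctm_step(n, N, Q, delta, inflow):
    """Advance cell occupancies by one clock tick (basic CTM sketch).

    n[i]   : vehicles currently in cell i
    N[i]   : maximum number of vehicles cell i can hold
    Q[i]   : maximum number of vehicles that can enter cell i per tick
    delta  : ratio of backward wave speed to free-flow speed
    inflow : vehicles waiting to enter the first cell (boundary demand)

    Assumption of this sketch: the last cell discharges freely,
    limited only by its own capacity Q[-1].
    """
    cells = len(n)
    y = [0.0] * (cells + 1)  # y[i] = flow into cell i; y[cells] = exit flow
    y[0] = min(inflow, Q[0], delta * (N[0] - n[0]))
    for i in range(1, cells):
        y[i] = min(n[i - 1], Q[i], delta * (N[i] - n[i]))
    y[cells] = min(n[-1], Q[-1])
    # Conservation of vehicles: occupancy + inflow - outflow.
    return [n[i] + y[i] - y[i + 1] for i in range(cells)]


# Free-flow example: a platoon of 4 vehicles moves downstream at capacity 3.
free_case = ctm_step([4, 0, 0], [10, 10, 10], [3, 3, 3], 1.0, inflow=0)

# Congested example: a full downstream cell blocks the upstream one.
blocked_case = ctm_step([5, 10], [10, 10], [6, 6], 1.0, inflow=0)
```

In the second call the upstream cell keeps all of its vehicles because the downstream cell has no free space, which is the queue spillback behaviour the model is meant to capture.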

2.3. Proposed Model

In this section, we present an extended cell transmission model. Consider a road link divided into homogeneous, consecutively numbered cells of equal length, as shown in Figure 3.

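The following notation has been adopted to represent the model: $q$ = the “flow rate” of traffic along a segment lane, in vehicles per hour; $v$ = the (average) speed of the traffic, in km per hour; $k$ = the “density” of traffic, in vehicles per km. The above quantities satisfy the fundamental relation $q = k\,v$ (see Figure 2). The specific relationship between speed and density, $v = V(k)$, is described by the fundamental diagram (see Figure 2).

If TFD denotes the trapezoidal fundamental diagram of CTM, the density of cell $i$ at the next time step, expressed in (5), is given by
$$k_i(t+1) = k_i(t) + \frac{\Delta t}{l_i}\,[q_i(t) - q_{i+1}(t)], \tag{5}$$
where $l_i$ is the length of cell $i$.

$q_i(t)$ and $q_{i+1}(t)$ denote the inflow to cells $i$ and $i+1$, respectively.

Moreover, the inflow to the $i$th cell, expressed in (6), is given by
$$q_i(t) = \mathrm{TFD}(k_{i-1}(t), k_i(t)), \tag{6}$$
where
$$\mathrm{TFD}(k_{i-1}, k_i) = \min\{v_f\,k_{i-1},\; q_{\max},\; w\,(k_j - k_i)\}.$$
The trapezoidal fundamental diagram of CTM can be represented as shown in Figure 4.

The basic cell transmission model states that if the density in cell $i$ at time step $t$ is greater than the critical density $k_c$, then the inflow to cell $i$ is given by the difference between the jam density and the current cell density, weighted by the constant $w$. If the traffic system is in its free state, then the inflow to cell $i$ is expressed by the fundamental relation of traffic flow. Otherwise, the inflow to cell $i$ is equal to the capacity. Therefore, CTM can be formulated as follows:
$$q_i(t) = \begin{cases} w\,(k_j - k_i(t)) & \text{if } k_i(t) > k_c, \\ v_f\,k_{i-1}(t) & \text{if traffic is in its free state}, \\ q_{\max} & \text{otherwise.} \end{cases}$$
If we consider the variation of speed over time, then the inflow to the $i$th cell can be expressed by means of the current vehicle density and speed instead of the constants $v_f$ and $w$. Therefore, (6) becomes
$$q_i(t) = \min\{v_{i-1}(t)\,k_{i-1}(t),\; q_{\max},\; v_i(t)\,(k_j - k_i(t))\},$$
where the fundamental relation between free speed and jam density gives the speed:
$$v_i(t) = v_f\left(1 - \frac{k_i(t)}{k_j}\right).$$

Hence, if the density in cell $i$ at time step $t$ is greater than half the jam density, then the inflow to cell $i$ is given by the difference between the jam density and the current cell density, weighted by the speed of the vehicles in the same cell at the same time step. Otherwise, if the traffic system is in its free state and the density of the $(i-1)$th cell is less than half the jam density, the inflow to cell $i$ is expressed by the fundamental relation of traffic flow. This can be formulated as follows:
$$q_i(t) = \begin{cases} v_i(t)\,(k_j - k_i(t)) & \text{if } k_i(t) > k_j/2, \\ v_{i-1}(t)\,k_{i-1}(t) & \text{if } k_{i-1}(t) < k_j/2, \end{cases}$$
where $k_i(t)$ and $v_i(t)$ are the density and speed in cell $i$ at time step $t$, $k_j$ is the jam density, and $v_f$ is the free-flow speed.

One can show that this discretization method is “stable” if the Courant-Friedrichs-Lewy condition is satisfied:
$$v_f\,\frac{\Delta t}{l_i} \le 1.$$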
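
To make the density-based update and the stability condition concrete, the following is a minimal sketch in the spirit of the extended model: speed follows a linear law in density, the congested inflow is weighted by the current speed of the receiving cell, and the time step is checked against the Courant-Friedrichs-Lewy bound. Several choices here are assumptions of the sketch rather than part of the model as stated: the function names, the capacity taken as one quarter of free speed times jam density, the use of the minimum of what the upstream cell can send and what the downstream cell can receive (a guard that keeps densities between zero and jam density), and the free-discharge boundary at the last cell.

```python
def speed(k, v_free, k_jam):
    """Linear speed-density law (assumed): v = v_free * (1 - k / k_jam)."""
    return v_free * (1.0 - k / k_jam)


def sending(k, v_free, k_jam):
    """Flow (veh/h) the upstream cell can send: k * v below half jam density, else capacity."""
    q_max = v_free * k_jam / 4.0
    return k * speed(k, v_free, k_jam) if k < k_jam / 2.0 else q_max


def receiving(k, v_free, k_jam):
    """Flow (veh/h) the cell can accept: capacity up to half jam density,
    then (k_jam - k) weighted by the current speed in the cell."""
    q_max = v_free * k_jam / 4.0
    return q_max if k <= k_jam / 2.0 else speed(k, v_free, k_jam) * (k_jam - k)


def step(k, dt, length, v_free, k_jam, q_in=0.0):
    """One update of cell densities k (veh/km); dt in hours, length in km."""
    if v_free * dt / length > 1.0:
        raise ValueError("CFL condition violated: reduce dt or lengthen the cells")
    cells = len(k)
    q = [0.0] * (cells + 1)  # q[i] = inflow to cell i; q[cells] = exit flow
    q[0] = min(q_in, receiving(k[0], v_free, k_jam))
    for i in range(1, cells):
        q[i] = min(sending(k[i - 1], v_free, k_jam),
                   receiving(k[i], v_free, k_jam))
    q[cells] = sending(k[-1], v_free, k_jam)  # free discharge (assumption)
    return [k[i] + dt / length * (q[i] - q[i + 1]) for i in range(cells)]


# Illustrative run: 500 m cells, 20 s time step, a dense cell in the middle.
densities = [30.0, 100.0, 0.0]
updated = step(densities, dt=1.0 / 180.0, length=0.5, v_free=60.0, k_jam=120.0)
```

With closed boundaries for this single step (no demand upstream and an empty last cell), the total density summed over the equal-length cells is unchanged, which is a quick sanity check on the conservation form of the update.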

The extended CTM model given above allows an arbitrary network to be constructed (Figure 5). The generalized model for an urban network is stated in terms of $\Gamma^{+}(i)$, which denotes the successor cells of cell $i$, and $p_{ij}$, which describes the probability that drivers choose cell $j$ from cell $i$.

Figure 5: Generalized extended CTM on network.

With the following flow function

The traffic simulation model formulated in this section is of great help for predicting congestion on the transport network. The next section deals with congestion and provides definitions and measurement tools.

3. Congestion Modelling and Management

One of the major issues that most countries are facing is traffic congestion because, in both perception and reality, this phenomenon affects both people and society. To handle this problem, many researchers have developed models to evaluate or predict the traffic congestion status along road networks. There are three standard models of traffic congestion. The first model states that trip cost increases with traffic flow, approaching infinity as capacity is reached, while the second describes congestion as a deterministic queue behind a bottleneck with a given flow capacity [12]. The third model is based on macroscopic characteristic variables represented by the so-called “fundamental diagram” and has been subject to extensive debate in the literature. Traffic congestion can be studied either at a microscopic level, by using, for instance, queuing theory [13], at a macroscopic level where vehicles are treated as a fluid-like continuum [14], or at an intermediate level [15].

3.1. Congestion Definition

There is a lack of consensus on the definition of congestion, because it is considered a complex physical phenomenon in which the behavior of vehicle drivers hinders the progress of other vehicles as demand for limited road space approaches full capacity [16].

In common-sense terms, congestion is the condition in which there is too much traffic on the road.

Some other productive approaches exist and consider how the phenomenon of congestion influences the transportation system and interacts with socioeconomic objectives and geogovernance.

The U.S. Federal Highway Administration [17, 18] notes that the phenomenon of congestion is essentially a complex phenomenon that is related to nonsynchronization between the performance of the road transportation system and the expectations of users of the network.

3.2. Congestion Measures Index

Traffic congestion can be understood as a factor of the level of traffic service. Three factors are therefore used to characterize congestion: the perception of congestion by roadway users, the streams of the road network, and time, because of the temporal nature of congestion phenomena (Figure 6).

Figure 6: Congestion characterizing factors.

Recurrent congestion is usually precipitated by events that regularly affect the transportation system, while nonrecurring congestion is unpredictable. In order to deliver better congestion outcomes, a necessary step is measuring congestion. At the local level, managers of the urban transport network must have congestion measures that enable them to meet the operational concerns of incident management and regulation. For this purpose, road managers and engineers rely on indicators collected from roadway sensors. However, free-flow speeds should not be used as a direct point of reference. These sensors capture the extent, the relative scale, and the evolution of congestion. Some indicators are strongly relevant for road users, such as predictability of travel times and system reliability, while others are relevant to road system operators, namely, speed and flow on the network links.

Nevertheless, these measures are difficult to aggregate and do not directly address the concerns of the managers and users of the urban transport network. System managers need to understand how well the entire network works; they are concerned with how the volume of vehicles on the network impacts travel time. Roadway users are more often concerned with trip-based measures, such as how much time they need to reach their destinations, which highlights travel time reliability and the variability of travel conditions.

There is no simple measure of congestion that is useful for all purposes and situations; knowing how much time one must plan to get from one place to another will not necessarily help an engineer better time traffic signals in the central business district.

Road indicators are grouped as follows.
(i) Speed based indicators do not adequately capture congestion effects and can serve as a benchmark for reliability measures.
(ii) Delay based indicators depend on a baseline value for calculating the start of “delayed” travel. This concept becomes misleading at peak hours.
(iii) Temporal based indicators are based on both travel time index and rate. They also depend on the identification of a baseline value for signaling the start of congested conditions.
(iv) Spatial indicators also depend on threshold values in terms of median/average speeds achieved or on free-flow speeds.
(v) Service level/capacity indicators typically reference the design capacity of roadway links and are typically implicitly used to maximize their throughput; these indicators have had the favor of roadway managers.
(vi) Reliability based indicators try to capture how road users typically make trip decisions on congested networks.
(vii) Economic cost/efficiency based indicators measure the cost caused by congestion.
(viii) Other indicators may capture either a population exposure to congested road conditions or fuel consumption.

3.3. Measures of Congestion

A variety of congestion measures is used in the traffic engineering literature [16–24]. In the following, we limit ourselves to the essential measures considered in our work.

Link: Volume/Capacity Ratio. The volume/capacity ratio, $V/C$, varies from a low of 0 (free flow) to values sometimes greater than 1.0 (severely or heavily congested). Freeways are considered severely congested when the volume/capacity ($V/C$) ratio is larger than 1.0; for quite short periods of time, roads can handle more traffic than their rated capacities. In the Highway Capacity Manual, the “level of service” (LOS) delivered by the facility refers to both the amount of traffic and the quality of traffic flow. Table 1 summarizes the descriptions of level of service, which range from “A” (free-flow uncongested travel) to “F” (severely or heavily congested flow).

Table 1: Levels of service (LOS) [25, 26].

Intersections: Delay. For signalized intersections, the Highway Capacity Manual measures congestion in terms of average delay per vehicle, and “levels of service” are defined based on the average amount of delay. The scale is
LOS A ≤ 10 seconds,
LOS B 11–20 seconds,
LOS C 21–35 seconds,
LOS D 36–55 seconds,
LOS E 56–80 seconds,
LOS F > 80 seconds.
Intersections are considered congested when the average delay exceeds 80 seconds per vehicle.
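
The delay scale above maps directly onto a small lookup. The sketch below is one possible encoding, assuming that each upper bound is inclusive (so a delay of exactly 20 seconds is LOS B, and fractional delays between the listed integer ranges fall into the next level); the function names are chosen for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds (seconds of average delay) for LOS A..E; above 80 s is F.
_UPPER_BOUNDS_S = [10, 20, 35, 55, 80]
_LEVELS = "ABCDEF"


def intersection_los(avg_delay_s):
    """Return the level of service letter for an average delay per vehicle."""
    if avg_delay_s < 0:
        raise ValueError("average delay cannot be negative")
    return _LEVELS[bisect_left(_UPPER_BOUNDS_S, avg_delay_s)]


def is_congested(avg_delay_s):
    """An intersection is considered congested when the delay exceeds 80 s."""
    return intersection_los(avg_delay_s) == "F"
```

Using `bisect_left` keeps the thresholds in one table, so a revised scale only requires editing `_UPPER_BOUNDS_S`.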

Travel Time Index (TTI). This index conveniently relates congestion to peak travel times, which people see every day and can understand:
$$\mathrm{TTI} = \frac{\text{peak-period travel time}}{\text{free-flow travel time}}.$$

Link Performance (LP). The Bureau of Public Roads (BPR) [25, 26] developed a link congestion (link performance) function, which we will write as
$$t_a(v_a) = t_a^{0}\left[1 + \alpha\left(\frac{v_a}{c_a}\right)^{\beta}\right],$$
where $t_a^{0}$ is the free-flow travel time on link $a$ per unit of time; $v_a$ is the flow attempting to use link $a$ per unit of time; $c_a$ is the capacity of link $a$ per unit of time; and $t_a(v_a)$ is the average travel time for a vehicle on link $a$. The BPR function is commonly used for computing an optimum traffic assignment. Values for $\alpha$ and $\beta$ are empirically measured from data. They may differ for different types of roads; typical values for $\alpha$ and $\beta$ are 0.15 and 4, respectively, based on empirical data on highways [25, 26].
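
As a worked illustration of the BPR link performance function with the typical parameter values 0.15 and 4 quoted above, the following sketch computes the travel time on a link and the resulting ratio to free-flow travel time. The function names and the example figures (a 10-minute link with a capacity of 1800 veh/h) are assumptions chosen for illustration.

```python
def bpr_travel_time(free_flow_time, flow, capacity, alpha=0.15, beta=4.0):
    """Average link travel time from the BPR function.

    free_flow_time : travel time on the empty link (any time unit)
    flow, capacity : in the same unit (e.g. vehicles per hour)
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def travel_time_ratio(free_flow_time, flow, capacity):
    """Ratio of loaded travel time to free-flow travel time (a TTI-like index)."""
    return bpr_travel_time(free_flow_time, flow, capacity) / free_flow_time


# Illustrative link: 10 minutes at free flow, capacity 1800 veh/h.
t_empty = bpr_travel_time(10.0, 0.0, 1800.0)
t_at_capacity = bpr_travel_time(10.0, 1800.0, 1800.0)
t_overloaded = bpr_travel_time(10.0, 2700.0, 1800.0)
```

At capacity the travel time is 15% above free flow; because of the fourth power, a flow of 1.5 times the capacity already adds roughly 76%.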

Travel Time and Speed Definitions. Travel time is commonly defined as “the time required for traversing a route between two points of interest.” Travel time can be measured directly across the road(s) that connect two or more points of interest. Details of the calculation of these variables are given in [19, 22–24].
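
For GPS-based measurement, travel time and mean speed between two points of interest can be derived from a sequence of timestamped fixes. The sketch below is a simplified illustration, not the calculation detailed in the cited references: it assumes fixes given as (time in seconds, latitude, longitude) tuples sorted by time, and it uses the haversine great-circle distance with a mean Earth radius of 6371.0088 km. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_time_and_speed(fixes):
    """Return (travel_time_s, mean_speed_kmh) for fixes = [(t_s, lat, lon), ...]."""
    if len(fixes) < 2:
        raise ValueError("at least two fixes are required")
    travel_time_s = fixes[-1][0] - fixes[0][0]
    if travel_time_s <= 0:
        raise ValueError("fixes must span a positive time interval")
    distance_km = sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
    )
    return travel_time_s, distance_km / (travel_time_s / 3600.0)


# Three fixes along the equator, 50 s apart (illustrative coordinates).
sample = [(0, 0.0, 0.0), (50, 0.0, 0.005), (100, 0.0, 0.01)]
elapsed_s, mean_kmh = travel_time_and_speed(sample)
```

Summing the distance over consecutive fixes, rather than taking the straight line between the first and last fix, keeps the estimate meaningful on curved road sections.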

4. Congestion Trajectory Meta Model

Congestion is a space-time event, and therefore the evolution of congestion is a trajectory. Figure 10 illustrates the concepts directly related to congestion. A means of transport operates in space-time; it has a trajectory that materializes spatiotemporal events relating to the positions it occupies, as provided by sensors (GPS, for instance). Traffic variables such as speed and travel time are computed by applications embedded in smartphones. Virtual sensors in the form of lines or polygons are discreetly positioned on the sections and junctions of the urban network [14, 15]. These dynamic virtual sensors are stored in a spatial database server. Our previous work on trajectories is recalled in the following [27].

Trajectories Modeling Overview. Mobile phone networks, GPS-equipped devices, and other indoor and outdoor localization technologies generate a huge amount of spatiotemporal data coming from many heterogeneous fields. Providing location-based services (LBS) raises multiple challenges, such as scalability, performance, query processing, high-precision positioning, and privacy preservation. Therefore, as LBS grow, they need a unified model to handle and explore captured data in order to meet the expectations of several application areas [2, 28–31]. In the following, we present the different existing representations of trajectories.
(i) Raw trajectory concerns the recording of the positions of a moving object in a specific space-time domain (GeoStream data) over a given time period; it is presented as a sequence of spatial positions in a 2D reference system, representing the movement as a sequence of positions $(x_i, y_i)$ at times $t_i$.
(ii) Structured trajectory [30] is defined as a raw trajectory divided into segments corresponding to significant steps in the trajectory trace (e.g., travel).
(iii) Semantic trajectory [30] provides a semantic view of a trajectory, which enables applications to associate whatever semantics they want with trajectories. However, this approach is only applicable to a transactional schema. Indeed, no work has been published using trajectories as semantic objects with activities in multidimensional data modeling.
(iv) Trajectory based on Region of Interest: other recent approaches describe trajectories in composed spatial and temporal contexts based on Region of Interest [32, 33] by defining spatial neighborhood and temporal acceptance.
(v) Space Time Path: the “aquarium” [34] of the relevant time-space unit describes anything having spatial and temporal extent as paths (for instance, people, plants, and animals).

The Unified Moving Object Trajectories’ Metamodel [27] describes a general metamodel that could be used by different application domains; it can also use an object approach and integrates previous trajectory models described in the literature [27–30, 32–36]. Using the space-time event ontology, the metamodel models space according to the OGC Spatial Data Model [31, 34, 37–39]; the observation domain of the trajectory according to the OGC Sensor Meta Model and OGC Feature Type; physical and virtual activities between the beginning and the end of the Space Time Path [27]; the sensors used for collecting the moving object’s traces; and movement patterns using composite Regions of Interest. The metamodel as proposed in the class diagram (see Figure 10) expresses congestion as a spatiotemporal event. Congestion is measured by a sensor network based on GPS technology and in accordance with the sensor metamodel proposed by OGC [38]. Spatiotemporal markers are introduced to control the collection of measurements of speed and travel time. The marking technique that we propose generalizes the one suggested by the works given in [20, 21].
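
The step from a raw trajectory to a structured one can be illustrated with a small data structure. The sketch below is a hypothetical simplification and not the metamodel of [27]: it represents a raw trajectory as a time-ordered list of fixes and splits it into segments wherever the recording is interrupted for longer than a chosen gap. The class name `Fix`, the function `split_on_gaps`, and the 60-second threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fix:
    """One raw trajectory sample: timestamp in seconds and a 2D position."""
    t: float
    lat: float
    lon: float


def split_on_gaps(raw: List[Fix], max_gap_s: float) -> List[List[Fix]]:
    """Split a time-ordered raw trajectory into segments at long recording gaps."""
    segments: List[List[Fix]] = []
    current: List[Fix] = []
    for fix in raw:
        # A gap longer than max_gap_s closes the current segment.
        if current and fix.t - current[-1].t > max_gap_s:
            segments.append(current)
            current = []
        current.append(fix)
    if current:
        segments.append(current)
    return segments


# Illustrative trace: three samples, a long pause, then two more samples.
trace = [
    Fix(0, 33.590, -7.620),
    Fix(10, 33.591, -7.619),
    Fix(20, 33.592, -7.618),
    Fix(500, 33.600, -7.610),
    Fix(510, 33.601, -7.609),
]
trips = split_on_gaps(trace, max_gap_s=60)
```

Other segmentation criteria, such as speed thresholds or entry into a region of interest, could replace the time-gap rule without changing the structure of the result.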

Sensor-enabled mobiles and smartphones are a good choice because they enable many useful location applications. For instance, the Android sensor framework gives access to many types of sensors. The Android operating system provides a service to collect different sensor data (accelerometer, GSM, Wi-Fi, network and GPS, light, temperature, etc.) on a mobile Android device. The data is stored in a local database and can be transmitted to a remote host periodically (serialized to XML and optionally packed into an RSA-encrypted archive). The service can also broadcast the collected sensor data in order to provide it to other applications. It is possible to locate moving objects by using GSM cell tower location in combination with GPS. A real-time traffic monitoring system based on smartphones with integrated GPS takes advantage of the diversity of the network coverage delivered by telecom operators, of the accuracy of the position and velocity measurements provided by GPS devices, and of the existing telecommunication network infrastructure (Figure 7).

Figure 7: Cellular coverage maps, Casablanca city.

Figure 7 provides visualizations based on the data collected, including cellular coverage maps that show exactly how strong signal is in any particular area for Casablanca city.

5. Global System Architecture

The architecture is given in Figure 8. Its foundations are based on the specifications delivered by the Open Geospatial Consortium (OGC). Specifically, the OGC addresses this requirement through the Sensor Web Enablement (SWE) [37] specification series [39]. Figure 9 describes the GeoMobility server, which is integrated with other elements of the architecture of location-based services (LBS) [31–34]. The GeoMobility server delivers content such as maps, directions, points of interest, and traffic. It can also access other databases of local content on the Internet. The system involves vehicles equipped with GPS-enabled smartphones, a near-real-time big data collection infrastructure, a traffic patterns’ engine, and an information visualization system.

Figure 8: Mobile system architecture overview.
Figure 9: GeoMobility server.
Figure 10: Congestion Space-Time-Path model.

The software components architecture distinguishes four abstraction levels. The first considers the collection of data and patterns of traffic from mobile sensors (GPS-enabled smartphones). The second involves the development of measures and travel time and strengthens the congestions simulator. Finally, a monitoring component of urban traffic network is served by congestion index and specific patterns measures (see Figure 11).

Figure 11: Software components architecture.

5.1. Warehousing Architecture Components

Satisfying the expectations of users of location-based services in terms of response time, performance, and scalability is the key to success. Usual relational database storage systems were designed for a static application data model in which data volumes were small and the database lived on a single server in one data center. These traditional databases are not suitable for handling the volume, velocity, and variety of the dynamically collected spatiotemporal datasets required to support such services, particularly when performance is favored over guaranteed writes. In the following, we present the technologies used in the proposed architecture to provide a powerful and scalable framework for collecting and visualizing moving objects’ trajectory data.

NoSQL Databases. The acronym NoSQL signifies “not only SQL” [40]. NoSQL is designed for storing data in a much simpler, flatter, and nonrelational manner that allows data repositories to be scaled up. In a NoSQL database there is no fixed schema, so we can store in the same entity heterogeneous spatiotemporal data and activities generated by different kinds of location sensors. In addition, NoSQL databases are often open source, nonrelational, and distributed and often do not guarantee the ACID properties of relational databases (atomicity, consistency, isolation, and durability). A relational database scales up by getting faster hardware and adding memory, whereas NoSQL can take advantage of scaling out by spreading the load over many commodity systems. Consequently, NoSQL is an inexpensive option for scaling trajectory data. Companies like Google, Facebook, Twitter, Amazon, Adobe, and Viadeo have left the relational world and all use NoSQL in one way or another because they have seen their needs in terms of load and data volume grow exponentially. Existing NoSQL solutions can be grouped into 4 main families: Key-Value Stores, Column Family Stores, Document Databases, and Graph Databases.

Our choice of MongoDB as a NoSQL database is motivated by the need for a document-oriented store for visualizing trajectories on the map using JSON documents.

MongoDB Database. MongoDB is a scalable, high-performance, open source NoSQL document-oriented database developed by 10gen in 2009 [41]. It is implemented in C++ and offers document-oriented storage, full index support, rich document-based queries, and flexible aggregation and data processing. A MongoDB server may contain several databases. Using JavaScript for its query language, MongoDB supports both simple and complex queries. Because it stores JSON documents, the basic document format of many modern geospatial applications, it is easy to build on top of MongoDB. MongoDB supports ascending, descending, unique, and geospatial indexes. To enhance performance, MongoDB stores JSON in BSON format [42]. To scale its performance on a cluster of servers, MongoDB uses a technique called sharding, which is the process of splitting the data evenly across the cluster to parallelize access. This is implemented by breaking the MongoDB server into a set of front-end routing servers (mongos) that route operations to a set of back-end data servers (mongod).
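
To illustrate how heterogeneous sensor records can share one schema-less collection, the sketch below builds two JSON documents with a common core and different optional fields. All field names (`device_id`, `ts`, `loc`, `speed_kmh`, `cell_tower`) and values are hypothetical. The position is stored as a GeoJSON point with coordinates in longitude, latitude order, which is the form MongoDB's `2dsphere` geospatial index expects; no database connection is needed to run the example.

```python
import json


def make_observation(device_id, timestamp_iso, lon, lat, **extra):
    """Build one observation document; extra fields vary by sensor type."""
    doc = {
        "device_id": device_id,
        "ts": timestamp_iso,
        # GeoJSON point: coordinates are [longitude, latitude].
        "loc": {"type": "Point", "coordinates": [lon, lat]},
    }
    doc.update(extra)
    return doc


# Two documents with different shapes that could live in the same collection.
gps_doc = make_observation(
    "phone-1", "2014-08-15T08:00:00Z", -7.62, 33.59, speed_kmh=32.5
)
cell_doc = make_observation(
    "phone-2", "2014-08-15T08:00:05Z", -7.60, 33.58,
    cell_tower={"id": "tower-17", "signal_dbm": -85},
)

# Serialized form, as it would be sent to a client or a driver.
payload = json.dumps([gps_doc, cell_doc])

# With a driver such as pymongo, one would typically create the geospatial
# index once, e.g. collection.create_index([("loc", "2dsphere")]), and then
# insert the documents; that step is omitted because it needs a server.
```

Keeping the position in a single, uniformly named field is what allows one geospatial index to serve every document shape.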

MongoDB queries examine one record at a time, which means that queries across multiple records must be implemented on the client or use MongoDB’s built-in MapReduce (MR). Though MongoDB’s MR can be executed in parallel at each shard, there are two major drawbacks [43]: (i) the language for MR scripts is JavaScript, which is slow and has poor analytics libraries and (ii) the SpiderMonkey JavaScript implementation used by MongoDB is not thread-safe, so only one MapReduce program can run at a time.

Hadoop Distributed Framework. Hadoop is a scalable, fault-tolerant, distributed system for big data storage and processing [44]. The two main components of the Hadoop ecosystem are (i) HDFS, a distributed file system that provides efficient access to application data, and (ii) Hadoop MapReduce, a software framework designed to process more than terabytes of data in a scalable way.

Hadoop has been designed to run on multiple servers simultaneously. In practice, the data are spread across different servers, and Hadoop manages a replication system to ensure high availability of data even when one or more servers fail. The strength of Hadoop is that it benefits from the combined computational power of a cluster of commodity servers. MapReduce manages the parallelized processing: its role is to distribute the processing across the servers and, conversely, to aggregate the elementary results into an overall result.

MapReduce plays a major role in the processing of large quantities of data. The distribution of data across many data nodes enables parallelized processing of multiple tasks, each involving pieces of files. The Map function performs a specific operation on each element. The Reduce operation combines the elements according to a particular algorithm and outputs the result. The principle of delegation may be recursive: the nodes assigned to tasks can also delegate operations to other nodes.
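
The Map and Reduce roles can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not Hadoop code: the function names, the fields link_id and speed_kmh, and the sample observations are assumptions made for the example, and the shuffle step that a framework performs between the two phases is simulated with a dictionary.

```python
from collections import defaultdict


def map_point(point):
    """Map: emit one (link_id, speed) pair per GPS observation."""
    yield point["link_id"], point["speed_kmh"]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_speeds(link_id, speeds):
    """Reduce: combine all speeds of one link into their mean."""
    return link_id, sum(speeds) / len(speeds)


def run_job(points):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for p in points for pair in map_point(p))
    return dict(reduce_speeds(k, v) for k, v in shuffle(pairs).items())


# Hypothetical observations on two network links.
observations = [
    {"link_id": "A", "speed_kmh": 30.0},
    {"link_id": "B", "speed_kmh": 50.0},
    {"link_id": "A", "speed_kmh": 10.0},
    {"link_id": "B", "speed_kmh": 40.0},
]
mean_speed = run_job(observations)
print(mean_speed)  # {'A': 20.0, 'B': 45.0}
```

In a real cluster, the map calls would run in parallel on the nodes holding the data, and each reduce call would receive the values of one key gathered from all of them.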

In traditional applications, the built-in aggregation functionality provided by MongoDB is sufficient for analyzing data [40]. However, storing and analyzing the collected spatiotemporal trajectory data require more complex data aggregation. This is why we use Hadoop as a powerful framework for complex analytics queries in our system architecture. Thus, we come up with the architecture in Figure 12.

Figure 12: MongoDB-Hadoop warehousing.

The first stage is to collect spatiotemporal trajectory data, as GPX, OV2, or CSV files from different GPS-enabled devices, using asynchronous .NET sockets. The collected data are then processed using data reducer, error measure, reverse geocoding, and activity recognition services. After that, the data can be stored in a MongoDB database and processed within Hadoop via one or more MapReduce jobs. Output from these MapReduce jobs can then be written back to MongoDB for later querying and ad hoc analysis. Finally, we export the result data as JSON documents so that they can be visualized rapidly using the Google Maps API.
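
To make the ingestion and export stages more concrete, the sketch below converts a small CSV extract into JSON documents carrying a GeoJSON point. It is a minimal example under stated assumptions: the CSV column layout (object_id, timestamp, lat, lon), the document field names, and the sample rows are hypothetical, and the socket collection, cleaning services, MongoDB storage, and Hadoop jobs are deliberately left out.

```python
import csv
import io
import json

# Hypothetical CSV layout; real device exports will differ.
SAMPLE_CSV = """object_id,timestamp,lat,lon
bus-12,2014-08-15T08:00:00,33.6861,-7.3829
bus-12,2014-08-15T08:00:30,33.6870,-7.3810
taxi-3,2014-08-15T08:00:10,33.5731,-7.5898
"""


def csv_to_documents(text):
    """Turn CSV rows into JSON-ready documents with a GeoJSON Point."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        docs.append({
            "object_id": row["object_id"],
            "timestamp": row["timestamp"],
            # GeoJSON orders coordinates as [longitude, latitude].
            "location": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
        })
    return docs


documents = csv_to_documents(SAMPLE_CSV)
exported = json.dumps(documents, indent=2)
print(exported)
```

Keeping the position as a GeoJSON point means the same document shape can be stored, indexed geospatially, and handed to a map client without further conversion.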

Data Warehousing. In the urban traffic engineering context, data warehousing services ensure that the huge volumes of real-time traffic data from a variety of data sources and locations are recorded and maintained for traffic management, traffic information, traffic analysis, and decision support. The real-time and historical data are integrated to monitor network status, manage traffic to reduce congestion, improve air quality, and manage noise impacts. The historical data in the data warehouse are the trapezoidal profile of the fundamental diagram of each link, the travel time of each link, and the average speed observed by a moving object. Other patterns, which result from combining data from several sensors, are also considered (see Figure 12).
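
One of the stored indicators, the average speed observed by a moving object, can be derived from timestamped GPS fixes as in the sketch below. The function names and the tuple layout (ISO timestamp, latitude, longitude) are assumptions made for illustration. Distances use the haversine formula on a spherical Earth, which is an approximation, and no map matching or error filtering is applied.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_kmh(points):
    """points: list of (iso_timestamp, lat, lon) tuples sorted by time."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Sum the lengths of the segments between consecutive fixes.
    distance = sum(
        haversine_m(p[1], p[2], q[1], q[2])
        for p, q in zip(points, points[1:])
    )
    elapsed = (datetime.fromisoformat(points[-1][0])
               - datetime.fromisoformat(points[0][0])).total_seconds()
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return distance / elapsed * 3.6  # m/s to km/h


# Hypothetical track: about 1 km due north covered in 60 s.
track = [
    ("2014-08-15T08:00:00", 0.0000, 0.0),
    ("2014-08-15T08:00:30", 0.0045, 0.0),
    ("2014-08-15T08:01:00", 0.0090, 0.0),
]
speed = average_speed_kmh(track)
print(round(speed, 1))  # roughly 60 km/h
```

The same pairwise pass over consecutive fixes could also yield a link travel time, by taking the difference between the first and last timestamps recorded on that link.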

6. Conclusion

In this paper, we have presented a global traffic congestion management architecture. The main part of our contribution lies in the proposal of a novel model of congestion, which is aligned with our previous trajectory metamodel. Furthermore, we developed a CTM-like traffic simulation model for congestion prediction. Other challenges have been introduced concerning the warehousing of traffic patterns in a NoSQL database within a distributed infrastructure. The next step is to develop the corresponding software solution and carry out major field tests. This paper also provides an overview of the integration and management of data provided by networks of GPS sensors. These data will be used to develop significant information for real-time intelligent transport systems [45].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References


  1. H. Lyndon, “Public Transportation Moves with Analytics,” All Analytics, July 2012.
  2. E. D. Kaplan, Understanding GPS Principles and Applications, Artech House, 1996.
  3. C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B, vol. 28, no. 4, pp. 269–287, 1994.
  4. C. F. Daganzo, “The cell transmission model. Part II. Network traffic,” Transportation Research Part B, vol. 29, no. 2, pp. 79–93, 1995.
  5. C. F. Daganzo and J. A. Laval, “Moving bottlenecks: a numerical method that converges in flows,” Transportation Research Part B: Methodological, vol. 39, no. 9, pp. 855–863, 2005.
  6. Federal Highway Administration, Traffic Congestion and Reliability: Linking Solutions to Problems, U.S. Department of Transportation, 2004.
  7. M. J. Lighthill and G. B. Whitham, “On kinematic waves. II. A theory of traffic flow on long crowded roads,” Proceedings of the Royal Society London Series A, vol. 229, pp. 317–345, 1955.
  8. S. Yin, Z. Li, Y. Zhang, D. Yao, Y. Su, and L. Li, “Headway distribution modeling with regard to traffic status,” in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 1057–1062, June 2009.
  9. B. Ramakrishnan, R. S. Rajesh, and R. S. Shaji, “CBVANET: a cluster based vehicular Adhoc network model for simple highway communication,” International Journal of Advanced Networking and Applications, vol. 2, no. 4, pp. 755–761, 2011.
  10. N. Saunier and T. Sayed, “Clustering vehicle trajectories with hidden Markov models application to automated traffic safety analysis,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '06), pp. 4132–4138, IEEE, July 2006.
  11. D. Ding, “Modeling and simulation of highway traffic using a cellular automaton approach,” U.U.D.M. Project Report, Uppsala University, Uppsala, Sweden, 2011.
  12. R. Arnott, “A bathtub model of downtown traffic congestion,” Journal of Urban Economics, vol. 76, no. 1, pp. 110–121, 2013.
  13. T. Raheja, “Modelling traffic congestion using queuing networks,” Sadhana, vol. 35, no. 4, pp. 427–431, 2010.
  14. S. Javed, N. Ehsan, S. G. Javed, S. Z. Sarwar, and E. Mirza, “Computational simulation of a macroscopic traffic model for highways in Pakistan,” in Proceedings of the 5th IEEE International Conference on Management of Innovation and Technology (ICMIT '10), pp. 1182–1187, Singapore, June 2010.
  15. M. E. Ben-Akiva, S. Gao, Z. Wei, and Y. Wen, “A dynamic traffic assignment model for highly congested urban networks,” Transportation Research Part C: Emerging Technologies, vol. 24, pp. 62–82, 2012.
  16. OECD, Managing Urban Traffic Congestion, OECD/ECMT, 2007.
  17. L. Zhang, D. Morallos, K. Jeannotte, and J. Strasser, “Traffic analysis toolbox volume XII: work zone traffic analysis—applications and decision framework,” Tech. Rep. FHWA-HOP-12-009, Federal Highway Administration, 2012.
  18. Highway Capacity Manual 2010 (HCM 2010), Volume 4: Applications Guide, Transportation Research Board, National Research Council, Washington, DC, USA, 2010.
  19. M. P. Miska, T. H. J. Muller, and H. J. V. Zuylen, “Online travel time prediction with real-time microscopic simulation,” in Proceedings of the 84th TRB Annual Meeting, January 2005.
  20. B. Hoh, M. Gruteser, X. Hui, and A. Alrabady, “Enhancing security and privacy in traffic-monitoring systems,” IEEE Pervasive Computing, vol. 5, no. 4, pp. 38–46, 2006.
  21. B. Hoh, M. Gruteser, H. Xiong, and A. Alrabady, “Preserving privacy in GPS traces via uncertainty-aware path cloaking,” in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), pp. 161–171, New York, NY, USA, 2007.
  22. C. Nanthawichit, T. Nakatsuji, and H. Suzuki, “Application of probe-vehicle data for real-time traffic-state estimation and short-term travel-time prediction on a freeway,” Transportation Research Record, no. 1855, pp. 49–59, 2003.
  23. J. Herrera and A. M. Bayen, “Traffic flow reconstruction using mobile sensors and loop detector data,” in Proceedings of the 87th TRB Annual Meeting, Transportation Research Board, Washington, DC, USA, January 2008.
  24. L. Chu, S. Oh, and W. Recker, “Adaptive Kalman filter based freeway travel time estimation,” in Proceedings of the 84th TRB Annual Meeting, Transportation Research Board, Washington, DC, USA, January 2005.
  25. M. Zhou and V. P. Sisiopiku, “Relationship between volume-to-capacity ratios and accident rates,” Transportation Research Record, no. 1581, pp. 47–52, 1997.
  26. TRB, Highway Capacity Manual, Transportation Research Board, 1985.
  27. A. Boulmakoul, L. Karim, and A. Lbath, “Moving object trajectories meta-model and spatial-temporal queries,” International Journal of Database Management Systems, vol. 4, no. 2, pp. 35–54, 2012.
  28. L. Chen, L. Mingqi, and G. Chen, “A system for destination and future route prediction based on trajectory mining,” in Pervasive and Mobile Computing, pp. 657–676, Elsevier Science, 2010.
  29. H. Ma, T.-F. Tsai, and C.-C. Liu, “Real-time monitoring of water quality using temporal trajectory of live fish,” Expert Systems with Applications, vol. 37, no. 7, pp. 5158–5171, 2010.
  30. S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, and C. Vangenot, “A conceptual view on trajectories,” Data and Knowledge Engineering, vol. 65, no. 1, pp. 126–146, 2008.
  31. OGC, “OpenGIS Implementation Specification for Geographic information—Simple feature access: Common architecture,” 2008.
  32. X. Meng and Z. Ding, “DSTTMOD: a future trajectory based moving objects database,” in Database and Expert Systems Applications, vol. 2736 of Lecture Notes in Computer Science, pp. 444–453, Springer, Berlin, Germany, 2003.
  33. O. Wolfson, B. Xu, S. Chamberlain, and L. Jiang, “Moving objects databases: Issues and solutions,” in Proceedings of the 10th International Conference on Scientific and Statistical Database Management (SSDBM '98), pp. 111–122, IEEE, July 1998.
  34. OGC, OpenGIS—Location Services (OpenLS): Core Services Open Geospatial Consortium Inc., September 2008, Reference number of this OpenGIS Project Document: OGC 07-074 Version: 1.2 Category: OpenGIS Interface Standard, 2008.
  35. F. Giannotti, M. Nanni, D. Pedreschi, and F. Pinelli, “Trajectory pattern mining,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 330–339, 2007.
  36. S. Shaw, A Space-Time GIS for Analyzing Human Activities and Interactions in Physical and Virtual Spaces, Center for Intelligent Systems and Machine Learning, 2011.
  37. OGC, Sensor Web Enablement WG Homepage, 2008.
  38. OGC, Sensor Observation Service.
  39. M. Botts, OGC White Paper—OGC Sensor Web Enablement: Overview and High Level Architecture, Version 3, (rev.: 28.12.2007), OGC 07-165, 2007.
  40. L. Mike, “Planning for big data,” in The NoSQL Movement, chapter 8, O’Reilly Media, 2012.
  41. MongoDB 10gen, 2013.
  42. BSON, Binary JSON, Version 1.0, 2013.
  43. E. Dede, M. Govindaraju, D. Gunter, R. S. Canon, and L. Ramakrishnan, “Performance evaluation of a MongoDB and Hadoop platform for scientific data analysis,” in Proceedings of the 4th ACM Workshop on Scientific Cloud Computing (ScienceCloud '13), pp. 13–20, ACM, New York, NY, USA, June 2013.
  44. B. Lublinsky, K. T. Smith, and A. Yakubovich, Professional Hadoop Solutions, John Wiley & Sons, Indianapolis, Ind, USA, 2013.
  45. R. S. Ghaman, H. C. Lieu, and G. Scemama, “Implementation of CLAIRE, traffic estimation and prediction system, and adaptive traffic control in the US,” in Proceedings of the 11th International Conference on Road Transport Information and Control, (Conference Publication No. 486), pp. 95–99, London, UK, March 2002.