Journal of Advanced Transportation

Volume 2018, Article ID 3853012, 8 pages

https://doi.org/10.1155/2018/3853012

## Clustering Algorithm for Urban Taxi Carpooling Vehicle Based on Data Field Energy

^{1}School of Economics and Management, Lanzhou Jiao Tong University, Lanzhou 730070, China

^{2}School of Traffic and Transportation, Lanzhou Jiao Tong University, Lanzhou 730070, China

Correspondence should be addressed to Xiao Qiang; moc.621@qx_tjzl

Received 4 March 2018; Revised 19 May 2018; Accepted 30 July 2018; Published 9 August 2018

Academic Editor: Giuseppe Musolino

Copyright © 2018 Xiao Qiang and Yao Shuang-Shuang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A clustering algorithm for urban taxi carpooling based on data field energy and point spacing is proposed to solve the clustering problem of taxi carpooling on urban roads. The data field energy function is used to calculate the field energy of each data point in the dataset of passenger drop-off points. To cluster the taxis, the central point, outliers, and member data points of each cluster subset are identified according to a threshold value determined by the product of each data point's field value and point spacing. The classical algorithms and the proposed algorithm are compared using the compactness, separation, and Dunn validity indices. The clustering results of the proposed algorithm are better than those of the classical clustering algorithms. For cluster numbers of 25, 249, 409, and 599, the algorithm clusters well a taxi trajectory dataset that is fairly regular in its spatial distribution but irregular in its temporal distribution. The algorithm is suitable for clustering vehicles on urban roads and can provide new ideas and methods for the cluster study of urban traffic vehicles.

#### 1. Introduction

The acceleration of urbanization and the rapid increase in the number of vehicles have made traffic congestion on urban roads a major problem that must be solved in urban development. Accordingly, several scholars have suggested the idea of “carpooling,” which has rapidly attracted academic interest. Empirical data show that carpooling is an effective policy for alleviating urban traffic congestion and easing the difficulties of vehicle operation. Researchers from the Massachusetts Institute of Technology (MIT) analyzed one week of New York City taxi trips in Manhattan from March 2013. Out of the 13,600 taxis in New York, approximately 10,000 were used during rush hours, yet Manhattan would require only 3,000 shared taxis to satisfy 98% of its ride demand [1]. This study showed that an effective ride-sharing system can not only alleviate traffic congestion in cities but also increase the passenger-carrying rate of operating vehicles and the operating income of drivers. In addition, it can reduce energy consumption and environmental pollution [2]. Therefore, implementing carpooling is an effective means of improving the quality of urban traffic [3, 4].

The study of the Vehicle Routing Problem (VRP) has given rise to major developments in the fields of exact algorithms and heuristics. In particular, highly sophisticated exact mathematical programming decomposition algorithms and powerful metaheuristics for the VRP have been put forward in recent years [5]. In 2014, Lin [6] provided a classification of the Green Vehicle Routing Problem (GVRP) that categorizes it into Green-VRP, the Pollution Routing Problem, and VRP in Reverse Logistics, and identified research gaps between the state of the art and richer models that describe the complexity of real-world cases. In 2018, Musolino [7] presented a procedure for solving the VRP based on reliable link travel times. The equation for reliable link travel times is composed of a congestion term, expressing the traditional congested link travel times (or generalized costs), and a reliability term, which depends on the fundamental diagram of the link and the Network Fundamental Diagram (NFD) of the homogeneous cluster of adjacent links. NFD data are used in the proposed link travel time function to calculate reliable travel times, which are then used in solving the VRP to obtain optimal routes for freight vehicles.

Taxi carpooling has been adopted and implemented in some cities as a way to alleviate traffic congestion. As a hot topic of urban transportation research, carpooling has therefore attracted the attention of many scholars. In research on the carpool service problem (CSP), Agatz [8] studied the matching of drivers and passengers in a dynamic environment in 2011 and proposed an optimization method to minimise vehicle mileage and individual travel cost. Shinde [9] proposed a genetic algorithm for multiobjective optimization based on carpooling path matching. The algorithm effectively reduces the computational complexity and processing time and improves the carpooling effect. In 2015, Pelzer [10] proposed a dynamic decision algorithm based on network partitioning; the algorithm divides and numbers the road network and uses a spatial routing search algorithm to match passengers and vehicles. In 2015, Jiau [11] used a genetic algorithm to perform carpooling path matching in a short time, achieving a matching scheme with low complexity and low memory use. In 2015, Huang [12] proposed a fuzzy-controlled genetic-based carpool algorithm that combines a genetic algorithm with a fuzzy control system to optimize the routes and match assignments of providers and requesters in an intelligent carpool system. In 2016, Chou [13] developed a particle swarm carpool algorithm based on stochastic set-based particle swarm optimization (PSO); the set-based PSO (S-PSO) can be realized by local exploration. The method yielded the best result in a statistical test and successfully obtained numerical results that met the optimization objectives of the CSP.

In research on the taxi carpool problem (TCP), Cheng [14] took the benefit of travellers and drivers as the optimization objective in 2013 and established a multidynamic taxi carpooling model, which was solved using a genetic algorithm. In 2013, Ma [15] proposed a large-scale taxi ride-sharing service that efficiently serves real-time requests sent by taxi users and generates ride-sharing schedules that significantly reduce the total travel distance. In 2014, Xiao et al. [16] created a membership function from three factors (driving route, driving time, and number of passengers) to realize fuzzy clustering and recognition of carpooling passengers and taxis. In 2015, Ma [17] devised a taxi-sharing system based on a mobile-cloud architecture, in which taxi riders and taxi drivers use the taxi-sharing service via a smartphone app. In 2017, Zhang [18] presented the first systematic work to design a unified recommendation system for both regular and carpooling services, called CallCab, based on a data-driven approach; the system helps passengers find a taxicab ride successfully, including through carpooling.

The existing research on the carpool service problem and the taxi carpool problem shows that the carpool problem is mainly solved using multiobjective programming algorithms or intelligent algorithms. In practical applications, because of the large amount of carpool data, the calculation time is very long when such an algorithm is used to determine the carpool scheme. Therefore, two-stage algorithms have been proposed to solve the carpool problem in a big data environment and reduce the computational time. In 2011, Häme [19] proposed an adaptive insertion algorithm that uses the idea of clustering to sort the order in which passengers board and alight, which greatly reduces the complexity of the problem and the difficulty of the solution and is helpful in addressing the problem of taxi carpooling. In 2012, Manzini [20] proposed a phased clustering model. Factors such as route, distance, user information, and carpool were clustered separately. Then, a decision support system was used to evaluate the factors and provide decision support for carpooling. In 2013, Shao [21–23] proposed a two-stage clustering heuristic matching strategy to solve the problem of multivehicle carpooling. In 2016, Yang [24] proposed a carpooling model for a distributed parallel environment and used a two-stage distributed estimation algorithm to solve for the carpooling scheme.

The above research shows that, in a two-stage algorithm, the carpool factors (such as route, distance, and time) are clustered in the first stage. In the second stage, matching algorithms, multiobjective programming, or intelligent algorithms are applied to the clustering results to solve the combinatorial problem. This paper studies the first-stage clustering problem.

In applying a clustering algorithm, the cluster centres and cluster ranges are key to accurate results, but the classical clustering algorithms perform poorly because of the distribution of the city road network. To cluster taxis on city roads, a clustering algorithm for urban taxi carpooling based on data field energy and point spacing is proposed. The data field energy function is used to calculate the field energy of each data point in the dataset of passenger drop-off points. The central point, outliers, and member data points of each cluster subset are identified according to a threshold value determined by the product of each data point's field value and point spacing. The resulting taxi clustering provides a basis for theoretical research on city taxis.

The rest of the paper is organized as follows. Section 2 introduces the data field theory. Section 3 proposes the carpool taxi clustering algorithm based on data field energy and point spacing. Section 4 presents a case study of Nanjing taxi data that illustrates the application of the proposed approach. Section 5 provides the conclusions.

#### 2. Data Field Energy

##### 2.1. Definition of Data Field Energy

In field theory, a field describes the interaction between substances, as in the gravitational, electric, and magnetic fields. Each field is known to decay or increase with distance, and the distribution of field energy can be described by either a scalar or a vector function [25]. The data field method treats each object in space as a particle with a certain mass. A spherically symmetric virtual gravitational field surrounds each particle, and any object in the field is subjected to the joint action of the other objects, so a data field is determined over the whole space. By analogy with the vector field-strength function and scalar potential function used to describe physical fields, the method defines a potential function and a field-strength function for the data field and realizes the self-organizing hierarchical aggregation of data objects by simulating their interaction and motion in the data field. The method does not depend on the careful selection of user parameters and can identify nonspherical clusters of arbitrary size and density. It is insensitive to noisy data and has approximately linear time complexity. According to this principle, each data point in a large dataset is regarded as a data particle in space surrounded by a virtual field. The data particles in the field are subjected to field forces, so the data points interact to form a data field. When the data points are close together, the interaction forces between them are strong and the field energy is large, and vice versa.

Let $D = \{x_1, x_2, \ldots, x_n\}$ be a set of data, where $x_i$ refers to the $i$th data point, $i = 1, \ldots, n$. A new index set $\{1, 2, \ldots, N\}$ is introduced, where *N* refers to the number of data points. The potential energy of a data point is defined as follows [26]:

$$\varphi(x_i) = \sum_{j=1}^{n} m_j \exp\left(-\left(\frac{\lVert x_i - x_j \rVert}{\sigma}\right)^{k}\right),$$

where $\lVert x_i - x_j \rVert$ is the distance between data points $x_i$ and $x_j$, $\sigma$ is the interaction factor of the data points, $m_j$ is the mass of data object $x_j$ (in this study, the value of $m_j$ is 1), and $k$ is the distance index. The spatial distribution of the data field depends mainly on the interaction range (radius of influence) of the objects and is largely independent of the specific form of the potential function or the choice of distance index, so the distance index has minimal influence on the description of structural characteristics [25, 26]. Let $k = 2$, which gives the Gaussian kernel field; with $m_j = 1$, the potential energy of a data point is then defined as follows:

$$\varphi(x_i) = \sum_{j=1}^{n} \exp\left(-\left(\frac{\lVert x_i - x_j \rVert}{\sigma}\right)^{2}\right).$$

The Gaussian field potential function is shown in Figure 1.
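As an illustration of the Gaussian potential, the following minimal Python sketch evaluates $\varphi(x_i) = \sum_{j} m_j \exp(-(\lVert x_i - x_j \rVert / \sigma)^2)$ for a small set of points. It is a sketch under stated assumptions, not the authors' implementation: the function name `gaussian_potential`, the Euclidean distance, the sample coordinates, and the value `sigma=1.0` are all hypothetical choices made for this example, and the sum is assumed to include the term $j = i$.

```python
import math


def gaussian_potential(points, sigma=1.0, masses=None):
    """Return the Gaussian data-field potential of every point.

    Assumed form: phi_i = sum_j m_j * exp(-(||x_i - x_j|| / sigma) ** 2),
    with Euclidean distance and the self term (j == i) included.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if masses is None:
        # Unit mass for every data object, as stated in the text.
        masses = [1.0] * len(points)
    potentials = []
    for xi in points:
        total = 0.0
        for xj, mj in zip(points, masses):
            d = math.dist(xi, xj)  # Euclidean distance (Python 3.8+)
            total += mj * math.exp(-((d / sigma) ** 2))
        potentials.append(total)
    return potentials


# Hypothetical drop-off points: three close together, one isolated.
sample_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
phi = gaussian_potential(sample_points, sigma=1.0)

# Points in the dense group receive a larger potential than the isolated one.
for point, value in zip(sample_points, phi):
    print(point, round(value, 4))
```

In this toy example the three neighbouring points reinforce one another, whereas the isolated point receives almost nothing beyond its own contribution, which matches the statement that closely spaced data points produce a large field energy. The double loop is quadratic in the number of points and is meant only to make the formula concrete.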