Abstract

Although online ride-hailing supplies the nearest taxi matching for riders, the potential leakage of riders’ hobbies and physical condition raises privacy concerns. Since most of the privacy-preserving schemes do limited matching of riders and drivers in the same zone, in this paper, we propose a novel privacy-preserving cross-zone ride-matching scheme, namely, Cride, which extends one zone into multiple neighboring zones. Based on the zone division of a city, CRide allows distance computation between rider and driver across adjacent zones in the encrypted domain. Furthermore, towards efficiency improvement, a ciphertext packing technique is introduced. Theoretical analysis and experimental results suggest that CRide achieves a high ride-matching accuracy and acceptable efficiency without leaking privacy.

1. Introduction

Today, online ride-hailing (ORH) services like DiDi and Uber have been welcomed by more and more people who use it to resolve the problem of taking a taxi at rush hour [1, 2]. Compared with traditional taxi services, ORH provides people with convenient services. Riders can request a driver just using his mobile phone in a short time, instead of standing by the street and waiting for a taxi.

Despite the advantage of ORH, the way of traveling brings challenges of privacy leakage [3, 4]. To offer ride-matching, ORH server (RS) needs to collect riders’ and drivers’ sensitive information, such as identities, locations, and traveling time, which is used to implement matching between drivers and riders. The ride-hailing data may be leaked and used to monitor the traces of riders, infer the hobbies and physical condition of riders, etc. Riders will become the target of junk mail, robbery, or blackmail attacks [5, 6]. Hence, it is essential to protect the privacy of riders and drivers.

In ORH service, to protect the sensitive data about drivers and riders, some works [7, 8] studied the privacy-preserving ORH service. PrivateRide [7] was proposed to protect the privacy of ride-hailing. Later, ORide [8] was designed to find the nearest driver for the rider without leaking identities and locations of drivers and rides. However, the matched drivers may not be the global nearest one because of local matching in ORide. Then, Xie et al. [9] utilized the RNE techniques and difference evaluation scheme based on the property-preserving hash to achieve privacy-preserving ride-matching. But, Vivek [10] demonstrated that the scheme faces a passive attack, which can recover locations of riders and matched drivers. Therefore, how to achieve secure ride-hailing becomes a challenge.

To resolve the above problem, a privacy-preserving cross-zone ride-matching scheme for ORH (CRide) is proposed. Based on the coordinate ciphertext of riders and drivers, CRide calculates the distance ciphertext of the rider from all drivers without learning their locations and then sends it to the rider. The distance ciphertext is decrypted and the nearest driver is found by the rider. We conduct an experiment on real dataset to evaluate CRide, and results show that it achieves a high accuracy and acceptable computing and communication efficiency without the risk of privacy disclosure. The primary contributions of this thesis are introduced as follows:(1)We propose a privacy-preserving cross-zone ride-matching scheme for ORH service (CRide), which can find the nearest driver for a rider. During matching, the location information of drivers and riders is protected. Using the ciphertext packing technology, the proposed scheme simultaneously calculates the distance ciphertext between the rider and all drivers in a single ciphertext, efficiently reducing the bandwidth overhead between the rider and the server.(2)A ride-matching approach crossing adjacent zones is presented in CRide. It can securely find the nearest driver for riders in the whole zone rather than in the local zone. Thus, the accuracy of ride-matching can be increased.(3)In order to evaluate the performance and accuracy of CRide, we design and implement the scheme by C++ on real dataset. Experimental results confirm that CRide achieves comparable accuracy with acceptable computation and network cost for ORH.

This paper is organized as follows. The problem statement is given in Section 2. The necessary preliminaries are introduced in Section 3. The privacy-preserving ride-hailing scheme crossing zones is proposed in Section 4. The security of scheme is analyzed in Section 5. CRide is implemented and evaluated in terms of performance and accuracy in Section 6. At last, related works and conclusions are, respectively, given.

2. Problem Statements

2.1. System Model

In ORH service, when a rider requests a ride query, the nearest driver is matched. In the period of matching, riders’ and drivers’ sensitive information like location and identification is protected. Figure 1 shows the system model which includes three entities, i.e., RS, drivers, and riders.(1)Riders. Riders send ride request to RS, and ride request includes rider’s public key pk, coordinate ciphertexts, and located zone. After receiving the distance ciphertexts from RS, riders decrypt the ciphertexts and find the nearest driver.(2)RS. RS computes the distance ciphertext based on the encrypted coordinates of riders and drivers. Meanwhile, RS sends pk to drivers who are in the rider’s zone zi and the adjacent zones zi,j, where .(3)Drivers. Drivers register on the RS and update their zones to the RS when their zones are changed. Drivers in the rider’s zone and some adjacent zones send their encrypted location information to RS when they receive pk from RS.

2.2. Threat Model

(1)RS is honest but curious, which is curious about drivers’ and riders’ sensitive information although following the protocol. So, except for the ride-matching results, locations and identification of riders and drivers cannot be leaked to RS.(2)Riders and drivers are also honest but curious. Riders upload ride request to get the nearest driver, and drivers send their encrypted locations to RS, but they are curious about the location information of each other. Meanwhile, drivers may be ill-natured who could destroy other drivers’ location information.(3)There may be adversaries from Internet, which may launch eavesdropping to learn sensitive locations. Therefore, the information of drivers and riders ought to be protected, about which nontrusted entity should not learn anything.(4)Drivers do not collude with RS. The assumption is reasonable because drivers are not the employees of ORH service.

2.3. Design Goals

Based on the above scenarios and the threat model analysis, the scheme will perform ride-hailing matching efficiently while protecting the privacy of drivers and riders. The specific goals are described here.(1)Security. During the matching process, CRide ought to protect the privacy of drivers and riders. RS only knows the riding-matching result and does not learn anything else. Meanwhile, riders and drivers also learn nothing about the locations of each other except the matched driver.(2)Accuracy. The scheme should be able to match the nearest driver accurately according to the distance between riders and drivers in the whole zone.(3)Efficiency. CRide should achieve a high ride-matching accuracy rate with acceptable costs of communication and computation in practice.

3. Preliminaries

3.1. Somewhat Homomorphic Encryption

Homomorphic encryption allows people to perform any operations on the encrypted data and get encrypted results whose plaintext is the same as the operations conducted in plain domain [11, 12]. Throughout the whole operation process, ciphertext need not be decrypted to protect data privacy. It is crucial for RS to compute the match value based on encrypted location coordinates of drivers and riders while not leaking plaintext. In our scheme, we use the somewhat homomorphic encryption (SHE) which supports some types of operations on ciphertexts with limited times.

As an efficient SHE, the Fan–Vercauteren (FV) scheme [13] is an additively and multiplicatively homomorphic encryption. In particular, plaintext m and ciphertext c are polynomials over a ring, plaintext elements , ciphertext elements , t and q are positive integers which, respectively, define the maximum of the plaintext and ciphertext coefficients, and .(1)FV.KeyGen (): suppose is a short noise random distribution in , is a secret key , is a random element in , and is a noise term; then, output pubic key .(2)FV.Encpk(m): to encrypt a message , let , sample and return .(3)FV.Decsk(c): decrypt ciphertext c with secret key sk, .

3.2. Number-Theoretic Transform

Number-theoretic transform (NTT) is a Fourier transform for finite fields which can speed up the polynomial multiplication in encrypted domain and transform convolution products into coefficient-wise products [14]. For a vector , its NTT is represented as

Inverse NTT:where is the modulo inverse of n in and is a principal nth root of unity in .

4. Privacy-Preserving Ride-Matching Scheme Crossing Zones

4.1. CRide Overview

Generally, both riders and drivers can use smartphones with third-party navigation apps and map projection systems to get their locations and convert the pair of (latitude, longitude) to planar coordinates (x, y). To lighten the calculation and communication burden, we can divide a city into a number of zones. When a rider needs ride-hailing, he sends his encrypted planar coordinates to RS. RS packs coordinate ciphertexts into a ciphertext polynomial after receiving coordinate ciphertexts of drivers in the local zone. Then, the distance ciphertext of the rider from all drivers is simultaneously calculated in a single ciphertext.

When the rider receives the distance ciphertext, he decrypts it and gets the minimum distance value dist. Based on the dist, the rider finds the adjacent zones. For each adjacent zone, the same operation is performed iteratively. Finally, the nearest driver in the whole zone is matched.

Note that it is impossible that a driver currently being matched is matched to a new rider again, so we use variable matched[k] to represent the status of the kth driver. If matched[k] = 1, the kth driver is busy; otherwise, he is free. Drivers with busy status will not be considered for a new ride requesting.

4.2. Framework of CRide

Figure 2 depicts the framework of CRide which can be described as follows:(1)Initialization. A city is divided into a number of zones of a certain size vertically and horizontally, and some parameters are set. Drivers registered on ORS service submit their zone locations to RS and constantly update when their zones are changed. Meanwhile, matched[k] is set to 0.(2)When a rider requests ride-hailing, he firstly generates a key pair and then packs his location () into two polynomials , . Next, he applies inverse NTT on polynomial and and encrypts them to and . Finally, the rider sends pk, his zone location (e.g., z), , and to RS.(3)RS sends the upper right corner coordinates and the lower left corner coordinates of zone z to the rider.(4)RS assigns each driver in zone z an index i () except for those with matched[k] = 1. Then, all the indexes and pk are sent to the corresponding drivers.(5)For the ith driver, his coordinates can be encoded in the ith coefficient and . Then, he applies inverse NTT on them and encrypts them to get the ciphertext of coordinates and . Finally, the driver sends and to RS. Similar works are requisite for all the drivers.(6)RS packs ciphertext of locations from n drivers into a single ciphertext ( and ) after receiving them. The base and is large enough to separate each coordinate value at the bit level. Then, RS computes the distances among the rider and n drivers in parallel by in encrypted domain and sends to the rider.(7)The distance ciphertext is decrypted and NTT is applied to get the distance polynomial, which consists of distances between rider and each driver. Then, the rider gets the shortest distance in local zone. Taking , and as inputs, rider runs algorithm 1 to find adjacent zones array with at most 8 zones [15]. Generally, the nearest driver exists in one of them.(8)For each adjacent zone, 4–7 are run iteratively until the nearest driver is found and its matched[k] is set to 1.

(1) Input: the upper right corner coordinates and lower left corner of zone z
(2)  The distance between the nearest driver and the rider in zone z
(3)   The coordinate of the rider
(4)  Output: zones array that nearest driver may exist
(5) is initialized to empty
(6) if , then add into
(7) if , then add into
(8) if , then add into
(9) if , then add into
(10) if , then add into
(7) if , then add into
(11) if , then add into
(12) if , then add into
(13) return

5. Security Analysis

5.1. Security Analysis from RS

RS is honest but curious who follows the protocol but is curious about locations of riders and drivers. Towards privacy-preserving ride-matching, riders and drivers separately encrypt their location coordinates and with public key pk and then send individually location ciphertexts and to RS. So, the procedure of ride-matching is conducted in encrypted domain that prevents attackers and RS from learning anything because of the semantic security of FV. Meanwhile, none of RS and attackers has the rider’s private key sk. So, they cannot decrypt the ciphertext. In the matching process, CRide ought to protect the privacy of drivers and riders. RS only knows the riding-matching result and does not learn anything else. Thus, RS cannot launch inference attack on riders.

5.2. Security Analysis from Driver

In the proposed scheme, drivers may be ill-natured who could destroy other drivers’ location information by encrypting nonzero values for other slots. To prevent the malicious attack, RS sets a different index i for each driver, and drivers encode their coordinates in the ith coefficient and , by which, RS can figure out the malicious behavior.

5.3. Security Analysis from Rider

During ride-matching, drivers’ locations are encrypted before being sent to RS. RS computes the ciphertext of distance and sends it to the rider. After receiving the ciphertext of distance, the rider can decrypt it and find the nearest driver. Meanwhile, he learns nothing about the location of the matched driver except for his ID, although the rider knows sk.

6. Evaluation

To evaluate communication and computation overload and ride-matching accuracy, some experiments are conducted on real-world dataset. In the experiments, we set the polynomial dimension of a 20-bit plaintext and the coefficient size to 4096 and 124 bits, respectively. Thus, CRide can reach a 112-bit security and support ciphertext calculation of 4096 bits. We implement the scheme with the C++ NFLlib library [16]. RS is deployed on a machine with Intel i7-10700 which has 2.9 GHz CPU and 16 GB RAM. Riders and drivers are located on the same machine with Intel i7-3537U which has 2.5 GHz CPU and 8 GB RAM.

6.1. Dataset

CRide is evaluated based on real dataset that comes from [17]. In the experiment, we divide a zone z from a dataset for October 2014 with over 14 million riders. Zone z has about 4096 drivers and is used to evaluate the performance. Towards cross-zone ride-matching, zone z is divided into some zones vertically and horizontally. For example, the zone is divided into nine zones of the same size using 3 3 divisions.

In the dataset, we assume a record is a ride-matching request, and the pick-up location of a rider is his location. Since 99% of the time interval between the drop-off of a driver and his next pick-up is about 30 seconds, a driver is available for a ride request if there is once drop-off in the last 30 seconds.

6.2. Performance

This part evaluates the performance of CRide, including communication and computation overhead compared with PrivateRide [7] and ORide [8]. Based on the above settings, the polynomial size is 62 KB. The maximum number of available drivers for a request is 4096. Experimental results are collected after 100 trials. When a rider requests ride-matching, communication overhead and computation cost are, respectively, shown in Tables 1 and 2.

6.2.1. The Performance of Riders

Assuming that there are n drivers and m adjacent zones, the nearest driver may exists in one of zones. The rider sends RS a public key and two encrypted coordinates with six polynomials and a 372 KB payload for a ride request. The download overhead is linear to the number of available drivers in PrivateRide. With the ciphertext packing technique, ORide only has a distance ciphertext, and the number of distance ciphertext is reduced to in CRide. The size of an encrypted distance is 186 KB. For a ride request, 4096 encrypted distances are sent to the rider in RrivateRide. In CRide, the download overhead is linear to the number of chosen adjacent zones. The maximum value of m is 8. Experiments show that m is up to 3 in most cases. One hundred experiments show that on average, RS sends the rider the ciphertext of distance with size 353.4 KB for 3 3 division. Table 1 shows that the communication overhead of CRide is slightly increasing compared with ORide, which is significantly reducing compared with PrivateRide.

As shown in Table 2, the three schemes have the same computation cost for key generation and encryption. The decryption cost of CRide increases slightly compared with ORide. However, it is also significantly reduced compared with PrivateRide.

6.2.2. The Performance of Drivers

Tables 1 and 2 show that the three schemes have the same communication and computational overhead for each ride request. For the driver, he will receive the public key pk and upload his coordinate encryption for ride requests from his zone and some adjacent zones. The driver’s bandwidth should meet the ride-matching while not being too large. The bandwidth is calculated by multiplying the size of the required bandwidth per request with the number of receiving requests per second. Results of the experiment show that the required bandwidths for CRide (3 3), CRide (6 6), and CRide (9 9) are separately less than 0.68 Mb/s, 0.3 Mb/s, and 0.23 Mb/s. Therefore, RS should balance the bandwidth requirement for drivers and zone division.

6.2.3. The Performance of RS

As shown in Table 1, using ciphertext packing and cross-zone ride-matching, the communication overhead of CRide is significantly reduced compared with PrivateRide and ORide.

The computation cost of CRide is linear to the number m of adjacent zones, and that of PrivateRide is linear to the number of available drivers. Table 2 shows that the computation cost of CRide is significantly reduced compared to PrivateRide, and it is slightly increasing compared with ORide.

6.3. Accuracy

For ride-hailing service, the key is to match the nearest drivers for riders. In experiments, 100 ride requests are generated randomly and implemented. We use ride-matching accuracy rate to demonstrate the matching degree of PrivateRide, ORide, and CRide with the ground in plaintext. As shown in Figure 3, CRide gets the nearest drivers for about 92% of riders, which exceeds ORide by nearly 30% under all division granularities, which implements ride-matching without cross zone. Meanwhile, CRide reaches the same accuracy as PrivateRide, which implements ride-matching without division because of cross-zone matching. But Table 1 shows that the communication overhead and computation cost of PrivateRide are considerably large compared with CRide.

With the popularity of ride-hailing services, privacy protection receives more and more attention. Some works have been done to design privacy-preserving ride-hailing schemes. Duan et al. and Khazbak et al. [18, 19] used cloaking technologies to match the nearest drivers for a ride request while protecting the sensitive information of drivers and riders. Duan et al. [18] proposed cloaking region-based passenger privacy protection in ride-hailing, maximizing social welfare under riders’ privacy requirements. Khazbak et al. [19] proposed a ride-hailing scheme with privacy preserving, which considers the privacy preference of riders using novel obfuscation techniques. However, these schemes use location range to find the nearest drivers for a rider, which results in low accuracy.

To promote the accuracy of ride-matching, schemes [15, 20] were designed using the road network embedding technology (RNE) to match. Luo et al. [15] proposed pRide which allows ORH to efficiently match riders and drivers based on RNE and garbled circuits. Yu et al. [20] proposed RMatch, which computes the shortest distance using RNE and PHE. Meanwhile, Yu et al. [20] also proposed EPRide, in which the Hamming distance replaces road distance to implement ride-matching using somewhat homomorphic encryption and load network hypercube embedding technology. But the three schemes assume that there is a trusted party, which cannot fully guarantee the privacy of user in practice.

There are new works on privacy-preserving ride-hailing. Shivers et al. [21] proposed a framework for developing a decentralized ride-hailing architecture for autonomous vehicles implemented on the Hyperledger Fabric blockchain platform. Huang et al. [22] proposed a privacy-preserving ride-matching scheme with prediction, which utilizes a deep learning model to predict the emergence of ride requests in various regions and find the best driver in a global perspective instead of the nearest driver in the local region, leveraging prediction results.

8. Conclusions

In this thesis, a privacy-preserving cross-zone ride-matching scheme is proposed in ride-hailing services. The city is divided into some zones, and ride-matching is implemented in cross adjacent zones using somewhat homomorphic encryption and ciphertext packing. By using CRide, RS can find the nearest driver for riders in the whole zone. During matching, the privacy of drivers and riders is protected. Theoretical analysis and experimental results over real-world datasets prove that CRide achieves acceptable efficiency and comparable matching accuracy with the ground in plaintext. In the next research, the reputation of drivers and the interest of riders are worthy of considering for ORH service.

Data Availability

The data that support the findings of this study are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Key Technologies R&D Program of He’nan Province under grant nos. 192102210295 and 212102210084.