Abstract
Predicting and managing the movement of people in a region during an epidemic is an important step in containing its spread. The protection of user privacy during outbreaks has become a matter of public concern in recent years, yet deep learning models trained on datasets collected from mobile devices may pose privacy and security risks. Developing accurate crowd flow prediction while preserving privacy is therefore a significant open problem, and there is a tradeoff between these two objectives. In this paper, we propose a privacy-preserving mobility prediction framework via federated learning (CFPF) that addresses this problem without significantly sacrificing prediction performance. Within this framework, we design a deep and embedding learning approach called “Multi-Factors CNN-LSTM” (MFCL) that explicitly learns from human trajectory data and related factors (weather, holidays, temperature, and POI) during epidemics. Furthermore, we improve the existing federated learning (FL) framework by introducing a clustering algorithm that groups clients with similar spatio-temporal characteristics into the same cluster and selects the server at the center of each cluster as an edge central server, which integrates the optimal model for its cluster and improves the prediction accuracy. To address the privacy concerns, we introduce local differential privacy into the FL framework, so that collaborative learning relies on gradients uploaded by users rather than on users’ raw data. Finally, we conduct extensive experiments on a realistic crowd flow dataset to evaluate the performance of CFPF and compare it with existing models. The experimental results demonstrate that our solution achieves accurate crowd flow prediction while providing a strong privacy guarantee.
1. Introduction
The gathering and movement of people is an important factor in infectious disease outbreaks. Predicting crowd flow while protecting user privacy therefore helps in the prevention and control of outbreaks. Recently, the coronavirus disease 2019 (COVID-19) outbreak spread quickly around the globe and had a significant impact on public health and the economy [1, 2]. Preliminary evidence suggests that the virus can cluster with outdoor particulate matter (PM) under certain circumstances [3, 4], and many cases of human-to-human transmission have been reported. An effective way to reduce transmission and control the outbreak is to impose lockdowns and isolate individuals [5]. A better understanding of human mobility makes it easier to control the spread of contagious diseases by limiting contact among individuals [6], since the movement of infected people from one place to another is an important route by which susceptible people become infected, both within a small area [7, 8] and worldwide [9–12]. Therefore, during an epidemic, predicting the trajectory of the population allows for more efficient surveillance and management of risk areas, reducing the spread of the epidemic.
The prevalence of the mobile Internet has provided massive amounts of continuous data for crowd trajectory prediction during epidemics. In this paper, we focus on the areas of the city to which people travel at different times of the day during the epidemic. Several state-of-the-art deep learning models have been proposed to predict the crowd flow of urban regions [13–17]. Nevertheless, existing approaches are still not well suited to instant flow prediction, especially in scenarios where flow varies rapidly, usually because of social emergencies or accidents. Most previous user trajectory prediction approaches have the following limitations: (1) They ignore the influence of other factors, such as weather conditions and the functional attributes of the area, resulting in a single data source and compromised prediction accuracy. Weather has proven to be a useful feature for crowd flow prediction tasks such as bike-sharing demand prediction [18] and metro passenger flow forecasting [19]. In many application scenarios, we tend to focus on functional regions such as business areas, residential areas, administrative areas, and scenic spots, which are more closely connected with human behavior. Dividing regions by functional attributes is better suited to expressing human dynamics and population movement patterns accurately. (2) They lack spatio-temporal adaptive models for fine-grained feature processing of crowd flow. Human trajectories during an epidemic are affected by the epidemic prevention policy, time, weather, and other factors, and therefore show a certain regularity. Some recent work has used deep neural networks to solve this problem [14]. However, the accuracy of a deep neural network approach relies heavily on its architecture and hyper-parameters [20]. In addition, these models [21–26] are designed for a specific type of data and are difficult to generalize to other types of spatio-temporal data.
To tackle the aforementioned challenges, we propose a deep learning approach with fusion of multiple feature sources (MFCL) to predict crowd flows. In particular, our deep learning approach combines a convolutional neural network (CNN) component and a long short-term memory (LSTM) component. Since human mobility is closely related to the functional attributes of regions, this method divides the city into M grids and determines the functional attributes of regions according to the POI attributes of each region. Firstly, we extract the feature factors that affect the human mobility prediction, such as weather and time of day, as the feature vectors for model training. Then we use CNN for local feature extraction and LSTM for model training and prediction.
In the field of crowd mobility prediction, current prediction models fall mainly into two categories: (1) personal models with self-information; and (2) joint models with population information. Personal models include the Markov model, the hidden Markov model, and the decision tree. These works model the mobility behavior of individuals with various methods using only each individual's own mobility records. A single personal machine learning model performs poorly in real-world scenarios with sparse and limited data records. We therefore consider a joint training model, which uses multi-party mobile data to train a federated model, to address the sparsity of data records. However, the dataset used in a joint training model may contain a large amount of users’ private data, and there are substantial user privacy and security issues in the process of data upload and centralized model training. How to develop accurate crowd flow prediction while preserving privacy is a major problem that needs to be addressed. Madan and Goswami [27] introduced a privacy control mechanism based on “k-anonymous diffusion,” which can complete taxi-order scheduling without leaking user privacy. Le Ny et al. [28] proposed a differentially private real-time traffic state estimator system to predict traffic flow. In order to improve the prediction accuracy of crowd movement and to balance data availability against privacy protection, we introduce a privacy-preserving crowd flow prediction framework for epidemics based on Federated Learning (FL) [29, 30]. In our framework, what is uploaded to the server is not the data but only the intermediate weight gradients of the prediction model. First, we introduce a differential privacy protection mechanism for local data based on the risk level of different regions during the epidemic. Second, we draw on the idea of clustering to design a practical three-layer Federated Learning framework.
Finally, we propose an efficient aggregation strategy that aggregates model parameters from different regional devices to achieve robust convergence. In FL, private data is stored and analyzed on the local devices, so our approach protects the security of users’ private data while ensuring the accuracy of the prediction. The major contributions of this paper are summarized as follows:
(1) To ensure the accuracy of crowd movement prediction in different areas of the city during the epidemic, we propose a novel deep prediction model called “Multi-Factors CNN-LSTM” (MFCL) that incorporates multiple factors related to human trajectory data (weather, holidays, temperature, and POI) during an epidemic. The MFCL consists of an embedding component, an urban regional classification model (URCM), a convolutional neural network (CNN) component, and a long short-term memory (LSTM) component. The embedding component captures the categorical feature information and identifies correlated features. The URCM identifies and classifies areas of the city according to their functional attributes. In theory, the CNN component extracts features of local trajectories, while the LSTM component maintains a long-term memory of historical data.
(2) To balance the utility and the privacy of user location data, we propose CFPF, a novel Privacy-preserving Crowd Flow Prediction framework via federated learning, which protects user privacy while producing accurate crowd flow forecasts. The framework balances data availability and security in a real environment by introducing a differential privacy protection mechanism for local data based on the risk level of different regions during the epidemic. The privacy-preserving methods used in CFPF satisfy differential privacy.
(3) We simulate the crowd flow environment during the COVID-19 epidemic on real desensitized mobile 4G data and validate the effectiveness of our crowd flow prediction model and privacy-preserving framework through extensive experiments. The experimental results show that CFPF achieves a better balance between prediction accuracy and data privacy protection.
The rest of this paper is organized as follows. We first review related work in Section 2. Then, we introduce the crowd flow model in epidemics, differential privacy, and geo-indistinguishability in Section 3. In Section 4, we propose a privacy-preserving crowd flow prediction framework based on federated learning; specifically, we select key features, introduce the crowd flow prediction model, and build a federated learning framework. After describing the framework, we apply CFPF to real-world mobility datasets in a multi-user environment and conduct extensive analysis of the prediction performance and the effectiveness of the privacy-preserving mechanism in Section 5. After systematically reviewing the related works in Section 6, we conclude our paper.
2. Related Work
2.1. Crowd Flow Prediction
Crowd flow prediction is the task of forecasting the incoming and outgoing flows of people in a geographic region, which has an impact on public safety, the definition of on-demand services [31, 32], and traffic optimization [33–38]. Predicting urban crowd flow helps urban management, especially in the alert phase of infectious disease outbreaks: knowing in advance which areas of the city will become gathering areas is essential for city managers to take timely measures. Modeling citywide crowd flows has been studied in several recent works. The prediction of crowd flow is mainly the prediction of spatio-temporal series, and approaches can generally be divided into two categories: parametric model-based approaches and nonparametric model-based approaches. Traditional parametric time series models include ARIMA variants such as Kohonen-ARIMA (KARIMA) [39], subset ARIMA [40], and seasonal ARIMA [39, 41], as well as regression models with spatio-temporal regularization [42]. Williams [43] used an ARIMA model to predict short-term traffic flow. The advantage of parametric models is that they are highly transparent and easy to understand, and they usually take less time than nonparametric models. However, because of their limited learning ability, they struggle to capture the complex temporal and spatial correlations in crowd flow data.
The nonparametric model-based approaches assume that the data distribution cannot be defined in terms of a finite set of parameters. Hofleitner [44] proposed a dynamic Bayesian network to predict the traffic flow of an entire arterial road network with hundreds of road links based on sparse probe data. Davis and Nihan [45] proposed a k-NN model for short-term traffic flow prediction. With the development of data storage and computing power, neural network-based models have become popular in crowd flow prediction. Reference [46] proposed a Spatio-Temporal Dynamic Network (STDN) model, which combines a CNN model and an RNN model to capture both spatial and temporal correlations for traffic prediction based on the road network. Reference [47] proposed a deep learning framework for traffic flow forecasting named the Diffusion Convolutional Recurrent Neural Network (DCRNN), which models the traffic flow as a diffusion process on a directed road graph. The authors of [48] introduced Long Short-Term Recurrent Neural Networks (LSTMs), which are capable of learning long- and short-term dependencies for traffic flow prediction. However, LSTM is not good at mining spatial correlations. Some research therefore combines CNN and LSTM to predict flows with both spatial and temporal dependencies. Zhang et al. [49, 50] proposed a deep learning model, ST-ResNet, to collectively forecast the inflow and outflow of crowds in each region of a city based on DeepST. ST-ResNet contains four major components: closeness, period, trend, and external unit. It provides a fusion mechanism that assigns different weights to the outputs of the closeness, period, and trend units. This method also takes into account external factors such as weather, holiday events, and metadata. However, ST-ResNet can only predict one step ahead and cannot be used for multi-step-ahead prediction based on the complex spatio-temporal history.
Reference [14] proposed a multi-task deep learning framework to simultaneously predict the node flow and edge flow in a constructed urban spatio-temporal network. Reference [51] proposed the DeepSTN+ architecture, which improves prediction accuracy and model stability compared with ST-ResNet. However, most of these models are supervised and require a large amount of training data. Because deep learning models have a large number of parameters, scarcity of training data can lead to overfitting. Crowd flow prediction can also be treated as a spatio-temporal graph (STG) prediction problem, where each node represents a region with time-varying flow. For example, Sun et al. [52] used spatial graph convolution to construct a multi-view graph convolutional network (MVGCN) for crowd flow prediction. Each graph convolution network predicts the inflow and outflow of people in a region, integrating geospatial locations through spatial graph convolution.
2.2. Privacy Issues for Crowd Flow Prediction Systems
Crowd flow prediction systems must be trained on a large amount of user data, which may result in the leakage of user privacy [53]. Brain et al. [54] proposed a data-sharing algorithm based on the information-theoretic k-anonymity principle. However, this algorithm may leak private information during data-sharing operations. Zhou [55] proposed a privacy-preserving transportation traffic measurement scheme for cyber-physical road systems that uses maximum-likelihood estimation (MLE) to obtain the prediction result. Reference [56] proposes a scheme for protecting the location of vehicles using encryption methods. The privacy-preserving methods mentioned above sacrifice the accuracy of the prediction results and cannot handle large amounts of data in a short period of time.
FL is a new machine learning paradigm that encourages participants to collaboratively train a globally shared model released by the central server. FL reduces privacy threats to individuals by performing training tasks locally. However, traditional FL relies heavily on the central server, which manages local model updates, and thus cannot avoid the security vulnerability of a single point of failure [57]. Xu [58] proposes VerifyNet, described as the first privacy-preserving federated learning framework. References [59, 60] applied edge computing to a federated learning framework to obtain faster training and better privacy preservation. In order to improve the convergence efficiency of FL, reference [61] used a momentum gradient descent method in the local update process of FL. The FL framework alone faces the following security threats: (1) privacy leakage after a single-point attack on the central server; (2) malicious reporting of false data by a local server, which affects the overall prediction model. Li [62] combines blockchain with FL and introduces a crowdsourcing framework to improve the overall security of the framework. Reference [63] proposes PMF, a novel privacy-preserving mobility prediction framework that introduces differential privacy into the optimization of the local models to obtain a controlled, privacy-preserving embedding table for secure model sharing via federated learning. There are two main schemes for adding DP noise to FL: (1) the global DP scheme, which adds noise to the aggregated gradient information on the server side before it is distributed to clients [64, 65]; (2) the local DP scheme, which adds noise to the gradient information on the user side before it is uploaded to the server. Additionally, DP can be applied to various FL algorithms such as FedSGD [66] and FedAvg [67].
Reference [68] proposes a hybrid approach that integrates federated learning with local differential privacy techniques, applying LDP mechanisms to the gradients in the FedSGD algorithm. Hu et al. [69] propose a privacy-preserving FL approach for learning effective personalized models. They use the Gaussian mechanism, a centralized DP mechanism, to protect the privacy of the model.
2.3. Limitations of Previous Studies
In summary, previous studies fail to address the following issues. First, existing privacy-preserving approaches based on federated learning do not achieve a good tradeoff between accuracy and privacy, leading to a significant performance reduction in the mobility prediction task. Second, although some previous studies consider the impact of weather conditions and other features on crowd flow prediction, most of them lack a spatio-temporal adaptive model for fine-grained feature processing of crowd flow. In contrast to the methods mentioned above, we focus on the task of mobility prediction for populations at different risk levels during epidemics. We introduce differential privacy into the optimization of local models and propose a privacy-preserving framework that incorporates a spatio-temporal adaptive model during epidemics, balancing data utility against privacy to overcome the above limitations.
3. Preliminaries
3.1. Crowd Flow Model in Epidemics
Definition 1. Cell region: The city is divided into a grid map based on longitude and latitude. Each grid is an equal-sized cell region. We denote all the cell regions as $G = \{g_{i,j}\}$, where $g_{i,j}$ is the cell region in the $i$-th row and $j$-th column of the grid map.
Definition 2. Region: Based on a priori knowledge and the POIs of the city, the city is divided into different nonintersecting parts with basic functional attributes (such as business area, residential area, and administrative area) and road characteristics, which are called regions. Each region has a specific ID and contains a number of cell regions, as shown in Figure 1.

Definition 3. Crowd Flow: The crowd flow of a region $r$ is the number of individuals (e.g., people or vehicles) moving into or out of the region during a time interval $t$, as shown in Figure 1. $X_t$ denotes the inflows and outflows of all the cell regions during $t$. In a real geographical context, a crowd flow consists of multiple ($n$) travelers whose trajectories are similar. Let $g$ denote a cell region, let $l_t$ be the position of an individual at time $t$, and let $P$ be the set of all individual trajectories. The inflow to a cell region $g$ is the number of people who are in $g$ at time $t$ but not in $g$ at time $t-1$:
$$x_t^{\mathrm{in}}(g) = \left|\{Tr \in P : l_{t-1} \notin g \wedge l_t \in g\}\right|.$$
The outflow from a cell region $g$ is the number of people who are in $g$ at time $t-1$ but no longer in $g$ at time $t$:
$$x_t^{\mathrm{out}}(g) = \left|\{Tr \in P : l_{t-1} \in g \wedge l_t \notin g\}\right|.$$
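As a minimal sketch of how the inflow and outflow in Definition 3 could be counted, the snippet below assumes each user's positions are stored as a mapping from time slot to cell ID. The function name `crowd_flows`, the data layout, and the convention that both flows are measured across the transition from slot t-1 to slot t are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def crowd_flows(positions, t):
    """Count per-cell inflow and outflow between time slots t-1 and t.

    positions: dict mapping user id -> dict {time slot: cell id} (assumed layout).
    Returns two dicts keyed by cell id: (inflow, outflow).
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for track in positions.values():
        prev_cell = track.get(t - 1)
        curr_cell = track.get(t)
        if prev_cell is None or curr_cell is None:
            continue  # user was not observed in both slots
        if prev_cell != curr_cell:
            outflow[prev_cell] += 1  # was in prev_cell at t-1, gone at t
            inflow[curr_cell] += 1   # in curr_cell at t, absent at t-1
    return dict(inflow), dict(outflow)

# Hypothetical toy data: three users on a small grid, cells given as (row, col).
positions = {
    "u1": {0: (0, 0), 1: (0, 1)},
    "u2": {0: (0, 1), 1: (0, 1)},
    "u3": {0: (1, 1), 1: (0, 1)},
}
inflow, outflow = crowd_flows(positions, 1)
print(inflow, outflow)
```

Users who stay in the same cell contribute to neither count, which matches the verbal definition of inflow and outflow.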
Definition 4. Trajectory: A location record can be denoted as a tuple of three elements $(u, l, t)$: user $u$, location identification $l$, and timestamp $t$. A trajectory $Tr_u$ is defined as a time-ordered set of location records of user $u$, i.e., $Tr_u = \{(u, l_1, t_1), (u, l_2, t_2), \ldots, (u, l_n, t_n)\}$. Generally, $l$ is the latitude and longitude of the user or the ID of the region. Considering the mobility characteristics of humans, we choose 5 minutes as the default trajectory sampling interval, based on the interval at which 4G signals are reported to the base station. The sampling interval can easily be adapted as required.
Definition 5. Person Trajectory Similarity: The person trajectory similarity $Sim(i, j)$ is a measure of the overlap between the trajectories of users $i$ and $j$. The higher $Sim(i, j)$ is, the more likely the two users are to share the same trajectory. In this paper, we define $Sim(i, j)$ in terms of the cell regions that the trajectories of users $i$ and $j$ both pass through, in time order, during the same time period.
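One simple way to turn Definition 5 into a number is to count the time slots in which both users occupy the same cell region and divide by the number of slots in which both were observed. The normalization and the function name `trajectory_similarity` are assumptions made for this sketch; the paper's exact formula is not recoverable from the text.

```python
def trajectory_similarity(track_i, track_j):
    """Fraction of commonly observed time slots in which two users share a cell.

    track_i, track_j: dict {time slot: cell id} (assumed layout).
    Returns a value in [0, 1]; 0.0 if the users were never observed together.
    """
    common_slots = set(track_i) & set(track_j)
    if not common_slots:
        return 0.0
    shared = sum(1 for t in common_slots if track_i[t] == track_j[t])
    return shared / len(common_slots)

# Hypothetical example: the users coincide in two of three time slots.
track_a = {0: "g11", 1: "g12", 2: "g13"}
track_b = {0: "g11", 1: "g12", 2: "g23"}
sim = trajectory_similarity(track_a, track_b)
```

Comparing cells slot by slot respects the "in time order during the same time period" condition, because two users only count as overlapping when they are in the same cell at the same time.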
Definition 6. Crowd flow prediction problem: By analyzing the users’ trajectory database and the spatial and temporal similarity between users’ trajectory data, the next location of a crowd flow can be predicted. Given the users’ trajectory database $D$, the crowd flow prediction task is to obtain the probability of the location of the crowd flow at the next time step $t+1$.
Definition 7. User Risk Classification: Take the COVID-19 pandemic as an example. China’s national government service platform launched the “Epidemic Prevention and Health Information Code” and set up a personal health code for dynamic management in three colors: “green” represents low risk, “yellow” represents medium risk, and “red” represents high risk. Based on the health code, we create a risk classification for users.
3.2. Differential Privacy and Geo-Indistinguishability
3.2.1. Differential Privacy
Differential privacy is a recent privacy model which provides a strong privacy guarantee. The main idea of differential privacy is that after adding or deleting a record to the database, there is almost no difference in the output of the same algorithm applied to the database. Formally, differential privacy is defined as follows:
Definition 8. $\epsilon$-differential privacy: A randomized algorithm $\mathcal{A}$ gives $\epsilon$-differential privacy if for any neighboring databases $D$ and $D'$, and for any set of possible outputs $S \subseteq Range(\mathcal{A})$,
$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{A}(D') \in S],$$
where $\Pr[\cdot]$ represents the probability, and $Range(\mathcal{A})$ represents the value range of the output of algorithm $\mathcal{A}$.
Definition 9. Sensitivity: For any neighboring databases $D$ and $D'$, the sensitivity of a function $f$ is
$$\Delta f = \max_{D, D'} \|f(D) - f(D')\|_1.$$
Currently, differential privacy protection has two main mechanisms: the Laplace mechanism and the exponential mechanism.
Definition 10. $\epsilon$-local differential privacy: Traditional DP requires a central trusted party, which is often not realistic. To remove that limitation, local differential privacy (LDP) has been proposed. A randomized mechanism $M$ satisfies $\epsilon$-LDP if, for any pair of inputs $x$ and $x'$ in $D$ and any output $y$ of $M$,
$$\Pr[M(x) = y] \le e^{\epsilon} \Pr[M(x') = y].$$
The privacy guarantee of mechanism $M$ is controlled by the privacy budget $\epsilon$. To combat inference attacks against shared data values, companies including Google, Apple, and Microsoft employ LDP, the state of the art in privacy-preserving data collection.
Theorem 1. Laplace mechanism: The Laplace mechanism adds noise drawn from a Laplace distribution to the output of the algorithm, so that the algorithm satisfies differential privacy. For any function $f: D \to \mathbb{R}^d$ with sensitivity $\Delta f$, the algorithm
$$\mathcal{A}(D) = f(D) + Lap\left(\frac{\Delta f}{\epsilon}\right)$$
satisfies $\epsilon$-differential privacy.
The Laplace distribution with magnitude $b$, i.e., $Lap(b)$, has the probability density function $p(x) = \frac{1}{2b} e^{-|x|/b}$, where $b = \Delta f / \epsilon$ is determined by the sensitivity $\Delta f$ and the privacy budget $\epsilon$, for any pair of inputs.
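The Laplace mechanism can be sketched with the Python standard library alone. The example below assumes a scalar query; the function names are illustrative. It samples Laplace noise as an exponential magnitude with a random sign, which is an equivalent way to draw from $Lap(b)$ with $b = \Delta f / \epsilon$.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    magnitude = rng.expovariate(1.0 / scale)   # exponential with mean `scale`
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude

def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Return value + Lap(sensitivity / epsilon), an epsilon-DP release of a scalar."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical usage: release a count of 120 people with sensitivity 1.
rng = random.Random(42)
noisy_count = laplace_mechanism(120.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller privacy budget gives a larger scale and therefore noisier outputs, which is the accuracy versus privacy tradeoff discussed throughout the paper.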
Theorem 2. Sequential composition: For the same data set, if the whole privacy protection process is divided into different privacy protection algorithms $\mathcal{A}_1, \ldots, \mathcal{A}_n$ whose privacy protection levels are $\epsilon_1, \ldots, \epsilon_n$, then the whole process satisfies $\left(\sum_{i=1}^{n} \epsilon_i\right)$-differential privacy.
Theorem 3. Parallel composition: For disjoint data sets, if the whole privacy protection process is divided into different privacy protection algorithms $\mathcal{A}_1, \ldots, \mathcal{A}_n$ whose privacy protection levels are $\epsilon_1, \ldots, \epsilon_n$, then the whole process satisfies $\left(\max_{i} \epsilon_i\right)$-differential privacy.
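The two composition theorems amount to simple budget accounting. The helper names below are illustrative only; the sketch just contrasts the total budget consumed when several mechanisms run on the same data with the budget consumed when they run on disjoint partitions.

```python
def sequential_budget(epsilons):
    """Total privacy budget when all mechanisms are applied to the same data set."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Total privacy budget when each mechanism is applied to a disjoint data set."""
    return max(epsilons)

# Hypothetical budgets for three mechanisms.
budgets = [0.5, 0.25, 0.25]
print(sequential_budget(budgets))  # budgets add up on shared data
print(parallel_budget(budgets))    # only the largest budget counts on disjoint data
```

This is why partitioning users or regions into disjoint groups can be attractive: the overall guarantee is governed by the weakest single mechanism rather than by the sum.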
3.2.2. Geo-Indistinguishability
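By generalizing the definition of differential privacy, [70, 71] proposed a generalization of differential privacy under a metric $d$. Let $\mathcal{X}$ be a set of locations and let $\mathcal{Z}$ be a set of query outputs, while $K$ is an obfuscation mechanism which ensures that $K(x)$ and $K(x')$ are similar to a certain degree for any two locations $x, x' \in \mathcal{X}$, thus making it impossible to distinguish the true position. The distance $d(x, x')$ expresses the distinguishability level between $x$ and $x'$. A small value denotes that the locations should remain indistinguishable, while a large value means that we allow the adversary to distinguish them. Let $\mathcal{P}(\mathcal{Z})$ denote the set of probability measures over $\mathcal{Z}$. The multiplicative distance $d_{\mathcal{P}}$ on $\mathcal{P}(\mathcal{Z})$ is defined as $d_{\mathcal{P}}(\sigma_1, \sigma_2) = \sup_{S \subseteq \mathcal{Z}} \left| \ln \frac{\sigma_1(S)}{\sigma_2(S)} \right|$, with $\left| \ln \frac{\sigma_1(S)}{\sigma_2(S)} \right| = 0$ if both $\sigma_1(S)$ and $\sigma_2(S)$ are zero and $\infty$ if only one of them is zero. In this paper, we use the Euclidean distance between $x$ and $x'$ to represent $d(x, x')$. $\epsilon$-geo-indistinguishability is defined as follows:
Definition 11. $\epsilon$-geo-indistinguishability. A mechanism $K: \mathcal{X} \to \mathcal{P}(\mathcal{Z})$ satisfies $\epsilon$-geo-indistinguishability if, for all $x, x' \in \mathcal{X}$,
$$d_{\mathcal{P}}(K(x), K(x')) \le \epsilon\, d(x, x').$$
A technique that achieves $\epsilon$-geo-indistinguishability, named the Planar Laplacian mechanism (PLM), has been proposed. It generalizes the Laplace mechanism to two dimensions as follows:
$$D_{\epsilon}(x_0)(x) = \frac{\epsilon^2}{2\pi} e^{-\epsilon\, d(x_0, x)},$$
where $x_0$ is the true location and $x$ is the reported location.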
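Sampling from the planar Laplacian is straightforward in polar coordinates: the angle is uniform, and the radius has density $\epsilon^2 r e^{-\epsilon r}$, which is a Gamma distribution with shape 2 and scale $1/\epsilon$. The sketch below relies on that observation; it assumes locations are already expressed in planar coordinates (for example, meters), and the function name is illustrative.

```python
import math
import random

def planar_laplace(x, y, epsilon, rng=random):
    """Perturb a planar location (x, y) with planar Laplacian noise.

    epsilon is the privacy parameter per unit of distance (assumed same unit as x, y).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    theta = rng.uniform(0.0, 2.0 * math.pi)         # direction, uniform on the circle
    radius = rng.gammavariate(2.0, 1.0 / epsilon)   # radial density eps^2 * r * exp(-eps * r)
    return x + radius * math.cos(theta), y + radius * math.sin(theta)

# Hypothetical usage: epsilon = 0.01 per meter gives an expected displacement of 2 / 0.01 = 200 m.
rng = random.Random(7)
reported = planar_laplace(1000.0, 2000.0, epsilon=0.01, rng=rng)
```

The original mechanism is usually described with an inverse cumulative distribution involving the Lambert W function; the Gamma draw used here targets the same radial distribution while staying within the standard library.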
4. Methodology
In this section, we propose a privacy-preserving crowd flow prediction framework based on federated learning, as shown in Figure 2. In the proposed scheme, we first select the spatio-temporal features of the prediction model to quantify the factors that influence the trajectories of people during epidemics. We select the weather and temperature feature, the temporal feature, and the regional feature as the main features of the crowd flow prediction model. In particular, we construct the urban regional classification model to classify regions according to their functional attributes and to quantify those attributes. Then, we combine CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) algorithms to design the crowd flow prediction model. CNN is suitable for local feature extraction, while LSTM is suitable for time series processing, so the two are complementary in function. Finally, we introduce CFPF, a federated learning framework with local differential privacy. It consists of the following three steps, as described by Algorithm 1.
(i) Step 1. The servers in the region are divided into K subregions by the K-Means clustering algorithm, and the server located at the center of each subregion is set as the edge central server.
(ii) Step 2. The cloud initializes the weights and pretrains the global model. Then the edge central servers send the weights and model to each client.
(iii) Step 3. For each client that has private data, a local differential privacy mechanism is introduced for training. In each communication round, the selected local clients update their local models with the weights from the edge central server. Finally, the cloud aggregates the parameters from all edge central servers.
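The three steps above can be sketched as one aggregation round in plain Python. This is a minimal illustration under several assumptions that are not in the paper: clients are identified by 2-D coordinates, each client's update is a short gradient vector, local differential privacy is obtained by clipping each coordinate and adding Laplace noise, and both the edge and cloud levels use simple (weighted) averaging. Step 2 (model initialization and broadcast) is omitted, and all function names are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20):
    """Plain K-Means on 2-D points; returns (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # deterministic init, for the sketch only
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

def perturb(gradient, clip, epsilon, rng):
    """Clip each coordinate to [-clip, clip], then add Laplace noise (epsilon-LDP)."""
    clipped = [max(-clip, min(clip, g)) for g in gradient]
    scale = 2.0 * clip * len(clipped) / epsilon  # L1 sensitivity divided by epsilon
    return [g + rng.choice((-1, 1)) * rng.expovariate(1.0 / scale) for g in clipped]

def average(vectors, weights=None):
    """Weighted coordinate-wise mean of equally long vectors."""
    weights = weights or [1.0] * len(vectors)
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(len(vectors[0]))]

def federated_round(locations, gradients, k, clip, epsilon, rng):
    # Step 1: cluster clients; the client nearest each centroid acts as edge central server.
    # (Assumes no cluster ends up empty, which holds for the toy data below.)
    centroids, labels = kmeans(locations, k)
    edge_servers = [min((i for i, lab in enumerate(labels) if lab == c),
                        key=lambda i: math.dist(locations[i], centroids[c]))
                    for c in range(k)]
    # Step 3: clients upload noisy gradients; each edge server averages its cluster.
    noisy = [perturb(g, clip, epsilon, rng) for g in gradients]
    edge_models, sizes = [], []
    for c in range(k):
        members = [noisy[i] for i, lab in enumerate(labels) if lab == c]
        edge_models.append(average(members))
        sizes.append(len(members))
    # The cloud aggregates the edge models, weighted by cluster size.
    return edge_servers, average(edge_models, sizes)

# Hypothetical toy setup: six clients in two well-separated groups.
locations = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
gradients = [[0.5, -0.5] for _ in locations]
servers, global_update = federated_round(locations, gradients, k=2, clip=1.0,
                                         epsilon=1.0, rng=random.Random(0))
```

Raw data never leaves a client in this sketch: only the clipped, noised gradient is passed upward, first to the edge central server of its cluster and then to the cloud.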

4.1. Selection of Spatio-Temporal Feature
Weather, temperature, holidays, and other factors can significantly affect users’ travel patterns and trajectories. At the same time, the spatial and temporal patterns of crowd gathering are strongly related to the functional attributes of the gathering area. Therefore, in this section, we select the feature vectors related to crowd flow prediction and design an urban regional classification model to assign functional attribute features to regions.
The weather and temperature feature. Weather and temperature affect users’ travel patterns and therefore the predicted crowd flow. For instance, on a rainy holiday, people tend to stay at home or in a shopping mall. The weather condition number is added to the feature vector as a 1-dimensional feature, and the temperature data are converted into discrete values according to temperature intervals. The weather at user $u$’s location at time $t$ is denoted as $w_u^t$, and the temperature at user $u$’s location at time $t$ is denoted as $temp_u^t$.
Temporal feature. We set up two kinds of temporal features: an hourly feature and a weekday feature. Crowd flow patterns differ from day to day within a week. During the COVID-19 outbreak, people mainly stayed at home on weekends, but some still went to work on weekdays, so the trajectories of crowd flow showed clear changes in their characteristics. We analyzed desensitized 4G base station data from an area of Nanjing. During the epidemic period, 7:00 am to 9:00 am was the peak time for people travelling to work, and 18:00 was the peak time for people leaving work. The flow of people was relatively low on Saturday and Sunday. The hour feature ranges from 1 to 12, and the weekday feature ranges from Monday to Sunday. The temporal feature can be transformed into a one-hot encoding.
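The one-hot encoding of the temporal feature can be sketched as follows. The sketch assumes that the 12 hour values correspond to 12 equal two-hour intervals of the day and that weekdays are numbered from 0 (Monday) to 6 (Sunday); both conventions, like the function names, are assumptions for illustration.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vec = [0] * size
    vec[index] = 1
    return vec

def temporal_feature(hour, weekday):
    """Encode a clock hour (0-23) and a weekday (0 = Monday ... 6 = Sunday).

    The day is split into 12 two-hour slots, so the result has 12 + 7 = 19 entries.
    """
    slot = hour // 2  # slots are 0-based here; the text numbers them 1 to 12
    return one_hot(slot, 12) + one_hot(weekday, 7)

# Hypothetical usage: 8:00 on a Monday falls into the fifth two-hour slot.
feature = temporal_feature(hour=8, weekday=0)
```

Concatenating the two one-hot vectors keeps the hour and weekday signals separate, so a downstream model can learn weekday and time-of-day effects independently.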
Regional feature (urban regional classification model). Considering that the spread of the epidemic is mostly concentrated in functional areas with high population density, such as airports and shopping malls, the model divides the city into functional areas and adds a feature label to each area that represents its functional attributes. A user’s trajectory through different functional areas reflects changes in that user’s behavior patterns. For example, a user who leaves a residential area is more likely to choose an office or a shopping mall as the destination. Introducing the regional functional model into crowd flow trajectory prediction improves the accuracy of the prediction results.
Definition 12. Urban Zoning. Streets and road networks in cities divide urban areas into different geographical areas, as shown in Figure 3.

Definition 13. Composite arrival matrix: Divide the 24 hours of the day into 12 intervals starting at 0 o'clock. $N$ is the number of partitions in the region, $w$ is the weather value of the time period in the region, and $temp$ is the temperature range of the region in the time period. The composite arrival matrix $A$ is an $R \times N$ matrix. Each row of $A$ represents the number of users arriving in each region at different time intervals, temperature $temp$, and weather $w$.
Definition 14. Composite departure matrix. Similar to the arrival matrix, the composite departure matrix $D$ is an $R \times N$ matrix. Each row of $D$ represents the number of users leaving each region at different time intervals, temperature $temp$, and weather $w$.
Definition 15. POI matrix. A POI is a point of interest on a map. POIs may be shopping malls, schools, hospitals, and office buildings, which represent the most intuitive functional attributes of a region. A location may have multiple POI attributes, as shown in Figure 3, where there are both shopping malls and office buildings. The POI matrix is an $n \times m$ matrix, where $m$ is the number of categories of points of interest. The value of $p_i$ is the proportion of each POI category among the businesses in the area. For example, in Figure 3, area $i$ has 4 POI attributes (shopping mall, fitness, restaurant, and office), so the value of $p_i$ is as shown in Table 1.
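A row of the POI matrix can be computed by normalizing the POI counts of an area. The counts in the example below are hypothetical and are not the values of Table 1; the function name and the fixed category order are likewise assumptions for the sketch.

```python
def poi_row(counts, categories):
    """Return the proportion of each POI category in one area.

    counts: dict {category: number of POIs}; categories: fixed column order.
    """
    total = sum(counts.get(c, 0) for c in categories)
    if total == 0:
        return [0.0] * len(categories)  # area without any POI
    return [counts.get(c, 0) / total for c in categories]

# Hypothetical counts for one area with four POI categories.
categories = ["shopping mall", "fitness", "restaurant", "office"]
area_counts = {"shopping mall": 2, "fitness": 1, "restaurant": 4, "office": 1}
p_i = poi_row(area_counts, categories)
```

Stacking one such row per area yields the POI matrix, with each row summing to 1 for any area that contains at least one POI.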
Chen et al. [73] used the NMF model to divide urban areas into three categories: “work,” “residential,” and “other,” and the user’s trajectory can correspond to the movement between these three regions. However, the original NMF model can only use one type of information for decomposition, resulting in biased results. In order to integrate weather, temperature, arrival data, departure data, and POI matrix, we use a modified NMF to find more accurate mobility patterns named Joint Nonnegative Matrix Factorization [74, 75]. The objective function of the region functional model is as follows:
, is the flow change matrix of the users arriving in and leaving a certain area, which represents the integration of the Composite Arrival Matrix A and the Composite Departure Matrix D, , denotes the POI matrix. W is the functional weight matrix of the region; each row of W denotes the intensity of each function. , each row of matrix can be denoted as a basic mobility pattern which indicates the moving habits for a certain function. is the distribution pattern matrix of interest points. controls the importance of matrix and matrix in the decomposition process. When optimizing the objective function , the algorithm updates one set of parameters at a time and fixes the others. From the above formula, the update formula for , , and is
In order to obtain the region functional attribute feature values, we propose the JNMF-based regional function discovery model. The algorithm calculates the composite arrival matrix, the composite departure matrix, and the POI matrix of the region and, combined with the JNMF algorithm, finally obtains the regional feature values of the region. The specific algorithm is shown in Algorithm 1.
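As a rough illustration of the alternating update scheme described above, the following minimal sketch jointly factorizes a flow matrix and a POI matrix that share one weight matrix W. It assumes an objective of the form ||X − WH||² + λ||P − WG||² and standard multiplicative update rules; the objective form, the names X, P, H, G, lam, and the function jnmf are assumptions made for illustration and are not taken from Algorithm 1.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b, scale=1.0):
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def frob_sq(a, b):
    """Squared Frobenius norm of a - b."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def objective(X, P, W, H, G, lam):
    return frob_sq(X, matmul(W, H)) + lam * frob_sq(P, matmul(W, G))

def scale_by_ratio(M, num, den, eps=1e-9):
    """Multiplicative update M * num / den; entries stay nonnegative."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def random_matrix(rows, cols, rng):
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

def jnmf(X, P, k, lam=0.5, iters=200, seed=0):
    """Jointly factorize X ~ W H and P ~ W G with a shared nonnegative W."""
    rng = random.Random(seed)
    W = random_matrix(len(X), k, rng)
    H = random_matrix(k, len(X[0]), rng)
    G = random_matrix(k, len(P[0]), rng)
    history = [objective(X, P, W, H, G, lam)]
    for _ in range(iters):
        # Update W while H and G are held fixed.
        Ht, Gt = transpose(H), transpose(G)
        num = add(matmul(X, Ht), matmul(P, Gt), lam)
        den = add(matmul(W, matmul(H, Ht)), matmul(W, matmul(G, Gt)), lam)
        W = scale_by_ratio(W, num, den)
        # Update H, then G, each with the other factors held fixed.
        Wt = transpose(W)
        WtW = matmul(Wt, W)
        H = scale_by_ratio(H, matmul(Wt, X), matmul(WtW, H))
        G = scale_by_ratio(G, matmul(Wt, P), matmul(WtW, G))
        history.append(objective(X, P, W, H, G, lam))
    return W, H, G, history

# Toy data (made-up numbers): 4 regions, 6 flow features, 3 POI categories.
X = [[5, 9, 1, 0, 2, 7], [4, 8, 2, 1, 1, 6], [0, 1, 9, 8, 7, 0], [1, 0, 8, 9, 6, 1]]
P = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
W, H, G, history = jnmf(X, P, k=2)
```

In this sketch each row of W plays the role of a region's functional intensities, and the weight lam trades off fitting the flow matrix against fitting the POI matrix.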
4.2. Crowd Flow Prediction Model
The purpose of flow prediction is to predict the flow of people in different regions in the next stage based on the historical trajectory data of people combined with weather, temperature, and other factors. CNN (Convolutional Neural Networks) is suitable for local feature extraction, while LSTM (Long Short-Term Memory) is suitable for time series processing, so the two are complementary in function. In order to improve the prediction accuracy of the prediction model, a prediction model based on CNN and LSTM is proposed in this paper. The model takes the crowd flow trajectory data as the input of CNN model, combines embedding components for feature extraction, and uses LSTM for training and prediction. The framework of our basic mobility prediction model is presented in Figure 4. As Figure 4 shows, the crowd flow prediction model consists of four parts: input module with multi-feature, embedding component, CNN component, and prediction component. Details of these modules are introduced in the following sections:

Input module with multi-feature: The time information of trajectory data exists in the form of timestamps and cannot be fed directly into the prediction model as feature values. We set a time interval and divide one day into several time periods to obtain discrete time feature sequences, which are added to the feature sequences. Formula (10) maps the time of each trajectory point to a time segment, where is the time when the user appears in the region (0:00–24:00), and the time segment is defined as
In order to unify the time length of trajectories and obtain better prediction results, we divide each trajectory into fixed-length trajectories based on a sliding window [76–79]. Each trajectory sequence is scanned by a sliding window of fixed length. The trajectory data are combined with the weather and temperature features, each of which is added to the feature vector as a 1-dimensional feature. Finally, we introduce the weekday feature. The feature vectors of trajectories are shown in formula (11).
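The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming 12 equal segments per day (the partition used in Definition 13), a window step of one point, and a feature layout of (area ID, time segment, weather, temperature, weekday); the function names, the window length, and the sample values are hypothetical and do not reproduce formulas (10) and (11).

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 3600

def time_segment(ts, segments_per_day=12):
    """Map a timestamp to a segment index in [0, segments_per_day)."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return seconds * segments_per_day // SECONDS_PER_DAY

def sliding_windows(sequence, length, step=1):
    """Cut a sequence into fixed-length, possibly overlapping windows."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]

def feature_vector(area_id, ts, weather, temperature):
    """One trajectory point as (area, time segment, weather, temperature, weekday)."""
    return (area_id, time_segment(ts), weather, temperature, ts.weekday())

# Hypothetical trajectory of one user: (grid number, timestamp).
trajectory = [
    (17, datetime(2020, 3, 2, 7, 45)),
    (17, datetime(2020, 3, 2, 9, 10)),
    (23, datetime(2020, 3, 2, 12, 30)),
    (23, datetime(2020, 3, 2, 18, 5)),
    (17, datetime(2020, 3, 2, 21, 40)),
]
points = [feature_vector(area, ts, weather=1, temperature=12) for area, ts in trajectory]
windows = sliding_windows(points, length=3)
```

Trajectories shorter than the window length yield no windows in this sketch; padding them instead would be an equally reasonable choice.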
Embedding Component. During the epidemic period, attention should be paid to the regions and times where people gather. Using latitude and longitude as location data results in higher data dimensions. Therefore, we use the area ID to represent the area embedding and the time segment ID to represent the time embedding. The embedding vectors can represent the correlation between different categories. The two embedding tables reduce the feature dimension and yield better performance and meaningful semantics. The embedding component is trainable and is optimized together with the whole network during training. In this way, the high-dimensional location and time inputs are converted into low-dimensional dense representations with meaningful semantics. Embedding maps the data from the source space to the target space while preserving structure.
CNN Component. Convolutional Neural Networks (CNN) are analogous to traditional artificial neural networks (ANN) in that they are comprised of neurons that self-optimise through learning. CNNs are primarily used for pattern recognition in images and can extract data features quickly. In this paper, using a CNN to extract features from trajectory data accelerates network training. The CNN component is composed of three types of layers: 4 convolution layers (convolution kernel size: 16, 32, 64, and 128), a pooling layer, and a fully connected layer. The fully connected layer is placed at the tail of the CNN to reduce the loss of feature information.
Prediction Component. As the most successful variant of the recurrent neural network, long short-term memory (LSTM) has been widely used in different sequential modeling tasks, including mobility modeling. Crowd flow is highly related to historical information. The basic unit of an LSTM network is the memory block, which contains one or more memory cells and three adaptive, multiplicative gating units shared by all cells in the block. It works by utilizing the input, forget, and output gates, which can be trained to learn, respectively, what information to store in the memory, how long to store it, and when to read it out. Suppose is the input data of the current period and is the output data of the previous period, then the states of each unit can be denoted as follows, starting with the forget gate.
Through the output of the previous period and the input of the current period, a prestate is firstly calculated for the model, and then the useful information in the prestate is finally added into the hidden layer state by the input gate.
The output gate determines the output information in the current period according to and and the current hidden layer state.
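For reference, the gate computations described above correspond to the standard LSTM formulation below. This is the textbook form without peephole connections, written with assumed notation (x_t for the current input, h_{t-1} for the previous output, c_t for the cell state); the paper's own symbols and exact variant are not recoverable from the text and may differ.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate state, the ``prestate'')}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(output of the current period)}
\end{aligned}
```

Here σ is the logistic sigmoid and ⊙ denotes element-wise multiplication.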
In this paper, the prediction ability is improved by increasing the model depth. Three LSTM layers are stacked, and the hidden layer size is 128. The prediction algorithm is shown in Algorithm 2.
Computational Complexity. The computational complexity of the crowd flow prediction model can be estimated by summing the computational costs of the embedding, CNN, and LSTM components, respectively.
(1) Computational Cost of Embedding Component: The computational cost of the embedding component mainly depends on the dimension of the input categorical vector and the embedding dimension. In particular, we denote the computational cost of the embedding component by , which can be calculated by , where is the number of time and area categorical features and is the embedding dimension. The calculation is mainly based on the concatenation of categorical feature vectors according to the embedding dimension.
(2) Computational Cost of CNN Component: The standard convolution computational cost per time step in the CNN component is , where is the total number of parameters in a CNN. can be calculated by , where is the number of input channels, is the number of output channels, is the kernel size, and is the feature map size.
(3) Computational Cost of LSTM Prediction Model Component: It is shown in [80] that the learning computational complexity per time step in the LSTM prediction model is , where is the total number of parameters in a standard LSTM network. In particular, can be calculated by , where is the number of input units, is the number of memory cells, and is the number of output units.
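The three cost terms can be made concrete with a small calculator. This is only a sketch under stated assumptions: the embedding and convolution costs are taken as plain products of the quantities named above, and the LSTM count uses the commonly cited parameter formula for a standard LSTM layer with peephole connections and no bias terms, expressed in input units, memory cells, and output units. None of the three expressions is confirmed by the source, and all function names and example sizes are hypothetical.

```python
def embedding_cost(num_categorical_features, embedding_dim):
    """Assumed cost: one embedding_dim-sized lookup per categorical feature."""
    return num_categorical_features * embedding_dim

def conv_layer_cost(in_channels, out_channels, kernel_size, feature_map_size):
    """Assumed multiply-accumulate count of one 1-D convolution layer."""
    return in_channels * out_channels * kernel_size * feature_map_size

def lstm_param_count(n_inputs, n_cells, n_outputs):
    """Parameters of a standard LSTM layer (peepholes, no biases)."""
    return (4 * n_cells * n_cells      # recurrent weights: 3 gates + cell input
            + 4 * n_inputs * n_cells   # input weights: 3 gates + cell input
            + n_cells * n_outputs      # output layer weights
            + 3 * n_cells)             # peephole weights

def total_cost(embedding, conv_layers, lstm_layers):
    """Sum the per-time-step cost estimates of the three components."""
    return embedding + sum(conv_layers) + sum(lstm_layers)

# Hypothetical sizes, chosen only to exercise the functions.
emb = embedding_cost(num_categorical_features=2, embedding_dim=64)
convs = [conv_layer_cost(1, 16, 3, 10), conv_layer_cost(16, 32, 3, 10)]
lstms = [lstm_param_count(n_inputs=32, n_cells=128, n_outputs=1)]
estimate = total_cost(emb, convs, lstms)
```

Because the LSTM term grows quadratically with the number of memory cells, it tends to dominate the estimate for hidden sizes such as 128.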
4.3. Federated Learning Framework
In this section, we propose a novel federated learning-based privacy-preserving crowd flow prediction framework. As presented in Figure 5, the whole process of the proposed framework can be divided into five steps:
(i) The central server (cloud server) divides the region into different client groups based on the K-Means algorithm and geographical proximity. According to the definition of the constrained K-Means clustering algorithm, our goal is to determine the cluster centers that minimize the Sum of Squared Errors (SSE). The SSE is defined as follows: As the location is defined by longitude and latitude, the distance between locations can be measured by the squared Euclidean distance. where denotes the mth cluster center in the t-th round of iterations, is a solution for cluster assignment and update. If and only if the SSE is minimum and , we can obtain the optimal clustering center.
(ii) The cloud server selects the edge central server in each client group; the edge central server is the server closest to the center of the client group. The design of the edge central server reduces the frequency of communications between the cloud server and clients compared with the original FL architecture. Because the edge central server is at the center of the client group, the prediction model trained in the client group can better reflect the characteristics of the region where it is located.
(iii) The cloud server distributes a copy of the global model to all clients via the edge central servers, and each client group trains its copy on local data. Because users in areas with different risk levels during the epidemic call for different levels of privacy protection, unlike in traditional data privacy protection, our local privacy-preserving method trains different parts of the whole model with different data. We divide the clients with user data into different groups by their privacy leakage risk level. Lines 20–30 in Algorithm 3 illustrate the local privacy protection method. If there is a high-risk user in a client group, we should pay more attention to the accuracy of the model prediction, and thus we directly train the clients in the risky client group with normal data. If there is no high-risk user, we should pay more attention to user privacy protection, so we train the clients in the normal group with protected data (e.g., noisy data protected with the differential privacy mechanism). The noisy data satisfying the differential privacy requirement are generated based on the planar Laplace mechanism introduced in [81]. We map each true location l to a randomly drawn location point p in the infinite continuous space P according to the probability density function, which is formulated as follows: where is the original real location, is the noisy location, denotes the privacy budget of differential privacy. Based on this formulation, we can generate obfuscated location data (noisy data) from the normal data to train the risky client group.
(iv) Each client uploads the trained crowd flow prediction model to the edge central server. The edge central server generates a new local model by aggregating the local models received from the clients in its client group. Every edge central server then uploads its local model to the cloud server. As shown in Algorithm 3, the clients upload the trained local model weights, and the edge central server aggregates them to obtain a locally optimal model for each client group. We choose FedAvg [82], a simple global optimization method designed for federated learning settings, as the default optimization method of the deep learning-based crowd flow prediction model in our problem. If client group has clients participating in the local training, the edge central server will obtain updated local models. Based on the local prediction model, the local optimal parameters in each client group are updated by , where denotes the learning rate, denotes the scale factor which is the sample size with the normalized loss of each local model during the training.
(v) The cloud server generates a global model by aggregating the local models received from all edge central servers. The cloud server then distributes the updated global model, and steps (iii) and (iv) are repeated until the stopping criterion is met.
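The grouping and aggregation steps above can be sketched in a few lines. The example below is a minimal illustration that assumes plain (unconstrained) K-Means on planar coordinates, picks the client nearest to each cluster center as the edge central server, and aggregates weights with sample-size-weighted FedAvg; it does not reproduce the constrained clustering, the loss-normalized scale factor, or Algorithm 3, and all function names and data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign_clusters(points, centers):
    return [min(range(len(centers)), key=lambda m: sq_dist(p, centers[m]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Plain K-Means; returns (centers, cluster index of each point)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        assign = assign_clusters(points, centers)
        new_centers = []
        for m in range(k):
            members = [p for p, a in zip(points, assign) if a == m]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(mean_point(members) if members else centers[m])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign_clusters(points, centers)

def pick_edge_servers(points, centers, assign):
    """For each cluster, choose the client closest to the cluster center."""
    edge = {}
    for m, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == m]
        if members:
            edge[m] = min(members, key=lambda i: sq_dist(points[i], center))
    return edge

def fedavg(weight_vectors, sample_counts):
    """Sample-size-weighted average of flat weight vectors."""
    total = sum(sample_counts)
    dim = len(weight_vectors[0])
    return [sum(w[j] * n for w, n in zip(weight_vectors, sample_counts)) / total
            for j in range(dim)]

# Hypothetical client positions forming two well-separated groups.
clients = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assign = kmeans(clients, k=2)
edge = pick_edge_servers(clients, centers, assign)

# Edge-level aggregation of two made-up local models with 1 and 3 samples.
group_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The same fedavg function can be reused at the cloud level by treating each edge central server's model as one input and the group's total sample count as its weight.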

5. Performance Analysis
5.1. Datasets
The experiments are conducted on a big data platform, an IBM X3650 M4 server with a 16-core Intel E5 CPU (2.2 GHz), 256 GB RAM, and 3 TB ROM. All the proposed algorithms are coded in Python 3.6 on CentOS 7.5. All experiments are conducted using TensorFlow and PyTorch. We selected desensitized 4G base station data from an anonymous user group in Nanjing from March 1 to March 22 as the traffic flow data. The data fields are as follows: anonymized user ID, timestamp, longitude, latitude, city, and grid number. We also introduce some external features, including weather, temperature, and POIs. The data description is shown in Table 2. We choose these data because they exhibit a variety of crowd flow dynamics, which helps to evaluate all methods comprehensively. We select the data from March 1 to March 15 as the training dataset and the data from March 15 to March 22 as the testing dataset. To simulate the flow of people with different risk levels during the COVID-19 pandemic and to test the accuracy of predictions under different levels of privacy protection in our federated learning framework, we randomly selected 5 people in the data as high-risk users (red health code) and 10 people as medium-risk users (yellow health code) out of 6000 anonymous users.
5.2. Baselines and Evaluation Metrics
We compared the proposed model, labeled as “CFPF,” with the following 6 baseline methods.
(1) ARIMA: Auto Regressive Integrated Moving Average is a classic statistics-based method for time series prediction. It is a common model in the field of time series prediction and was widely used in early research.
(2) LSTM: Long Short-Term Memory network is an improved version of RNN. It has the advantage of capturing long-term temporal features of the input time series. Recently, LSTM has been applied to traffic flow forecasting. We used the implementation in TensorFlow.
(3) ConvLSTM: ConvLSTM is a variant of LSTM which contains a convolution operation inside the LSTM cell. ConvLSTM considers both the spatial and temporal dependencies of spatial-temporal data and is widely used in many spatial-temporal prediction tasks.
(4) ST-ResNet: Spatio-Temporal Residual Network (ST-ResNet) is a state-of-the-art neural network-based single-task learning model for urban crowd flow prediction. It stacks convolutional layers and residual units to capture the spatial and short/long-term temporal dependencies.
(5) DeepST: It is a deep spatio-temporal neural network designed for urban crowd flow prediction. It uses temporally dependent instances to capture temporal closeness, period, and seasonal trend, and a convolutional neural network to capture near and far spatial dependencies.
(6) GEML: Grid Embedding-based Multi-Task Learning (GEML) is a multi-task learning framework that predicts the flow OD matrix and crowd flows simultaneously. It uses grid embedding and multi-task LSTM to capture the spatial-temporal representations of the crowd flow data.
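We compare the prediction performance of different methods in terms of Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE), which are defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ is the observed crowd flow, $\hat{y}_i$ is the predicted crowd flow, and $n$ is the number of predictions.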
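The three metrics can be computed as in the following minimal sketch, which uses their standard definitions. One detail is an assumption made here rather than something stated in the paper: observations equal to zero are skipped in MAPE to avoid division by zero. The function names and sample values are illustrative only.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)

def mape(observed, predicted):
    """Mean absolute percentage error in percent; zero observations are skipped."""
    terms = [abs((y - p) / y) for y, p in zip(observed, predicted) if y != 0]
    return 100.0 * sum(terms) / len(terms)

def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

# Made-up crowd counts for four time segments.
observed = [120, 80, 200, 150]
predicted = [110, 90, 190, 165]
scores = {"MAE": mae(observed, predicted),
          "MAPE": mape(observed, predicted),
          "RMSE": rmse(observed, predicted)}
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the size of the observed flow, which is why the three are usually reported together.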
5.3. Overall Performance
In this section, we first introduce the model parameters of the prediction framework and then analyze the accuracy of the prediction algorithm. We also present the comparison with baseline models under privacy-preserving constraints, the efficiency of the algorithm, the privacy analysis, and the effects of key parameters in the training procedure of the proposed system.
In the federated learning architecture of this paper, the model size is about 50 MB, a local training epoch takes about 100 seconds, the transfer rate is 12 MB/s, and the maximum communication overhead is about 4% of the overall time. In the extranet environment, the transfer rate drops to 4 MB/s, and the communication overhead accounts for 12.5% of the overall duration. The main parameters of our privacy-preserving crowd flow prediction are set as follows: (1) parameters of the crowd flow prediction model: size of area embedding = 64, size of time embedding = 24; (2) hyper-parameters for the local training of the prediction model: learning rate = 0.02, mini-batch size = 128, dropout rate = 0.5; the 107 base station locations (clients = 107) in the experimental area were statistically analyzed, and 10 central servers were obtained using the K-means clustering algorithm; (3) hyper-parameters for the global aggregation: local epoch = 1.
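The reported overhead figures can be checked with simple arithmetic. The sketch below assumes a model size of 50 MB, one model transfer per round, and overhead measured against the roughly 100 s local training time; under those assumptions it gives about 4.2% at 12 MB/s and 12.5% at 4 MB/s, in line with the stated values. Measured against training plus transfer time instead, the figures would be about 4.0% and 11.1%. The function names are hypothetical.

```python
def transfer_seconds(model_size_mb, rate_mb_per_s):
    """Time needed to send one copy of the model."""
    return model_size_mb / rate_mb_per_s

def overhead_vs_training(model_size_mb, rate_mb_per_s, train_seconds):
    """Transfer time as a fraction of the local training time."""
    return transfer_seconds(model_size_mb, rate_mb_per_s) / train_seconds

def overhead_vs_round(model_size_mb, rate_mb_per_s, train_seconds):
    """Transfer time as a fraction of a full round (training plus transfer)."""
    t = transfer_seconds(model_size_mb, rate_mb_per_s)
    return t / (train_seconds + t)

# Intranet (12 MB/s) and extranet (4 MB/s) settings from the text.
intranet = overhead_vs_training(50, 12, 100)
extranet = overhead_vs_training(50, 4, 100)
extranet_full_round = overhead_vs_round(50, 4, 100)
```

Under either reading, communication remains a small share of each round because a local training epoch is much longer than a single model transfer.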
5.3.1. Prediction Accuracy (RE)
During the COVID-19 outbreak, we focus on the movement of crowds between functional areas, for example, the movement of crowds from an office building to a shopping mall. Therefore, the number of people in office buildings and shopping malls in a specific area was selected as the observation target in the experiment. Among the baseline algorithms, ConvLSTM and ST-ResNet have higher prediction accuracy, so we compare the CFPF algorithm with ConvLSTM and ST-ResNet in the prediction accuracy comparison. We compared the performance of our crowd flow prediction framework with the baseline methods over two days, on weekdays and on weekends, as shown in Figures 6 and 7. The experimental results show that the CFPF and ST-ResNet models achieve much better accuracy than ConvLSTM. The prediction performance of all baselines is given in Table 3. From the results, it can be observed that the MAE of ST-ResNet is lower than those of ARIMA, LSTM, ConvLSTM, DeepST, and GEML. Figure 8 illustrates the loss of the ST-ResNet model and the CFPF model. The loss of the CFPF model is not significantly different from that of the ST-ResNet model, which indicates that the CFPF has good convergence and stability. Compared with ST-ResNet, the CFPF model predicts with higher accuracy. Our model captures the spatio-temporal characteristics and the temporal sequence well. The results also show that the designed crowd flow prediction framework is effective for complex crowded environments with large-area data.



5.4. Performance Comparison and Effects of System Parameters
In this section, we present the comparison results with baseline models under the constraint of privacy-preserving settings and analyze the effects of system parameters in the training procedure of the proposed system.
From Table 3, we can draw three key conclusions. First, the federated learning framework performs better than the individual machine learning models. The traditional statistics-based method ARIMA performs poorly because it uses only time series data and ignores the geographical correlation between regions. The model performance is improved by introducing the ConvLSTM and LSTM learning models with spatio-temporal data features. This indicates that combining spatio-temporal feature values is beneficial to model accuracy. Second, the deep learning model ST-ResNet and the multi-task learning model DeepST reflect the superiority of deep learning models in predicting crowd mobility data, and the introduction of multi-task learning and the federated learning framework greatly improves the performance of the deep model. Third, the prediction accuracy of our CFPF model is affected by the privacy-preserving settings, but the method still achieves competitive performance after adjustment and optimization.
In this set of experiments, we explore the impact of the number of edge central servers (K), the number of clients, and the CNN depth on the overall performance of the algorithm. In this paper, K represents the number of central servers in the geographic space, obtained using the clustering algorithm. In the experimental area, we selected different numbers of edge central servers (i.e., different values of K) to test the performance of the CFPF, as shown in Table 4. In this experiment, the prediction accuracy and performance of the algorithm are best for K = 8. This indicates that, in our scheme, the overall prediction efficiency is improved by clustering regions and populations with similar spatio-temporal features into the same class. In Figure 9, the number of clients is equal to the number of location data servers (e.g., base stations) involved in training in the experimental area. Figure 10 shows the impact of the number of clients selected to participate in the training on the final performance. As the number of clients increases from 100 to 250, the final performance keeps improving. However, the overall accuracy and performance of the algorithm no longer improve significantly when the number of clients increases from 250 to 350. This indicates that more clients are not always better: too many clients affect the performance of the system and increase its running time. To analyze the effect of the CNN depth on the algorithm, we choose RMSE as the measure. As shown in Figure 11, when the depth of the CNN increases from 1 to 4, the RMSE decreases rapidly from 17.67 to 9.89; when the depth increases from 4 to 10, the RMSE decreases slowly or even rebounds. This indicates that increasing the CNN depth improves the algorithm only within a certain range.
Therefore, we believe that choosing an appropriate CNN depth (a depth of 4 in this paper) for the training dataset is beneficial to the overall prediction accuracy of the system. Based on the aforementioned results, choosing a proper number of clients, number of edge central servers, and CNN depth is important for both the final performance and the accuracy of the system.



5.5. Privacy Analysis
We discuss the privacy-preserving capability of the CFPF proposed in this paper from the following aspects:
(1) System Architecture: The CFPF is developed based on the FL framework. In this system, we avoid uploading any private data directly to the cloud server. Only the trained model weights and parameters (e.g., gradients) are uploaded, and the private data are only stored and accessed on the local device. In this way, private data are protected by design.
(2) Data Protection: For data with a high privacy protection level, we propose a privacy-preserving local optimization method based on LDP. In this paper, clients are divided into different groups according to their privacy protection levels. Clients without privacy issues are trained with the normal data, and clients with privacy risk are trained with the protected data. The noisy data satisfying the differential privacy requirement are generated based on the planar Laplace mechanism introduced in [72]. The obfuscated location data (noisy data) generated from the normal data are used to train the risky client group to make it robust to potential attacks.
To analyze the impact of the privacy budget on performance, we use the accuracy of the algorithm as the measure. In Figure 12, we assign a privacy budget from 0 to 10 to the CFPF algorithm. Experiments show that the algorithm maintains high accuracy once the privacy budget is increased to 3. This is because a larger privacy budget is required for complex prediction tasks. Since the CFPF model is based on a distributed privacy framework, the prediction model is trained by encrypting and reaggregating the data from the distributed clients instead of directly accessing the original data. Experiments show that the CFPF model achieves a balance between privacy protection and data availability.
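To make the role of the privacy budget concrete, the following minimal sketch draws noise from a planar Laplace distribution, whose density around the true location is proportional to exp(−ε·d), with d the distance to the reported location. It relies on the fact that, in polar coordinates, the angle is uniform and the radius follows a Gamma distribution with shape 2 and scale 1/ε. The sketch assumes locations are given in planar coordinates (for example, projected meters rather than raw longitude and latitude); the function names and sample values are hypothetical and are not the paper's implementation.

```python
import math
import random

def planar_laplace_noise(epsilon, rng):
    """Draw a 2-D offset whose density is proportional to exp(-epsilon * distance)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)      # direction: uniform angle
    r = rng.gammavariate(2.0, 1.0 / epsilon)     # radius: Gamma(shape=2, scale=1/epsilon)
    return r * math.cos(theta), r * math.sin(theta)

def obfuscate(location, epsilon, rng=None):
    """Return a noisy version of a planar (x, y) location."""
    if rng is None:
        rng = random.Random()
    dx, dy = planar_laplace_noise(epsilon, rng)
    return location[0] + dx, location[1] + dy

def mean_displacement(epsilon, samples=20000, seed=42):
    """Empirical average distance between true and noisy locations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x, y = obfuscate((0.0, 0.0), epsilon, rng)
        total += math.hypot(x, y)
    return total / samples

# A smaller budget means stronger privacy and larger expected displacement (2 / epsilon).
strong_privacy = mean_displacement(epsilon=0.5)
weak_privacy = mean_displacement(epsilon=3.0)
noisy = obfuscate((1200.0, 850.0), epsilon=0.05, rng=random.Random(7))
```

Since the expected displacement is 2/ε, raising the budget shrinks the perturbation quickly, which is consistent with accuracy recovering as the budget grows.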

6. Conclusions and the Future Work
In this paper, we study a crowd flow prediction model with privacy protection. Based on the concept of federated learning, we propose a practical crowd flow prediction framework and introduce the “Multi-Factors CNN-LSTM” algorithm to analyze the key factors affecting crowd flow during epidemics in order to achieve the desired prediction performance. In terms of privacy protection, the private data are protected by a differential privacy mechanism, the data are only used on the local server, and no sensitive data are transmitted in the system. We compare our method with the existing ARIMA, LSTM, ConvLSTM, ST-ResNet, DeepST, and GEML methods on a 4G base station dataset. The experimental results show that the method achieves a better balance of prediction accuracy and data privacy protection than competing methods. Future work can be divided into two directions: (1) optimizing the existing prediction models to better capture the spatio-temporal correlations among traffic flow data in order to further improve the prediction accuracy; (2) since only simple crowd flow scenarios during epidemics are considered in this paper, optimizing the whole framework to adapt to more complex and realistic scenarios.
Data Availability
The experiment data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61872197, 61972209), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX180891), the Natural Science Foundation of Jiangsu Province (BK20161516, BK20160916), and the Postdoctoral Science Foundation Project of China (2016M601859), and in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province KYCX210789.