The unprecedented COVID-19 pandemic has negatively affected the security and development of human society. Comparing and analyzing intercity highway travel patterns before and during the COVID-19 pandemic can bring vital insights for the prevention and control of the pandemic. Empirical studies are conducted using cellular network-based datasets associated with two groups of city pairs in China heavily affected by COVID-19. Spatial matching, full-sample extrapolation, and trajectory feature analysis are adopted to obtain the travel volumes of intercity highways during four different periods. The reliability of origin-destination (OD) matrices calculated from the cellular network-based dataset is demonstrated by comparing them with the fluctuation trend of traffic count data. The empirical studies show that the OD flows associated with passenger cars on intercity highways in China decreased significantly during COVID-19. With the effective implementation of the pandemic prevention and control policy and the orderly promotion of the return to work and production, the volumes of intercity highway OD flows returned to the pre-pandemic level in mid-April 2020. In addition, the peak of passenger car trips decreased and the time span of truck trips lengthened owing to the control measures implemented in response to COVID-19. The method can be applied to calculate OD flows between most adjacent cities and to analyze changes in intercity highway travel patterns, which provides insightful implications for making intercity travel safety prevention and control policies under epidemic conditions.

1. Introduction

The COVID-19 pandemic has spread rapidly since December 2019, causing a severe adverse impact on economic development and social security worldwide [1-5]. By June 1, 2021, COVID-19 had cumulatively infected more than 170 million people worldwide and killed more than 3.5 million people. Strict travel restrictions have been imposed in all countries to control the spread of the pandemic.

The transmission characteristics of the COVID-19 pandemic in China can be roughly summarized as the pattern of “case clustering and mobile diffusion” [6, 7]. To prevent the spread of the pandemic, in January 2020, all 31 provinces in mainland China activated the Level 1 response to public health emergencies and strictly controlled the movement of people and goods within and outside the country.

In the context of the COVID-19 pandemic, understanding the characteristics of population movements on intercity highways plays a vital role in analyzing transmission patterns and developing solutions for preventing and controlling the pandemic [8-11]. However, the main challenge faced by many researchers is the complexity of pandemic conditions and the difficulty of collecting actual data. Mobile telephone systems are considered a promising technology for traffic data collection, as well as a valuable source for providing near-human movement information and approximating population-level movement patterns [12]. Unlike traditional road-based point surveys and community-wide small-sample censuses, cellular network-based data are a full sample with a higher penetration rate and temporal resolution. A considerable number of studies have empirically demonstrated the capability of cellular network-based data for capturing human movement patterns, identifying activity locations, monitoring transport patterns, inferring travel purposes, deriving OD matrices, and so on [13-15].

During the COVID-19 pandemic, many investigators have examined the positive correlation between human activity and pandemic transmission by conducting empirical studies using mobile telephone systems [16-20]. The data generated by these studies help the public understand mobility trends and policy effects in a timely manner and provide time-sensitive decision support to further contain the spread of the virus. However, some areas still deserve analytical attention [21, 22] due to the lack of in-depth understanding of COVID-19 and the need to further implement highway travel safety policies. Firstly, previous studies based on cell phone signaling data under the epidemic only analyzed cell phone locations and determined the travel OD matrix; none differentiated travel modes. Secondly, most previous studies assumed that cellular network-based data were completely accurate and overlooked the fact that multiple operators exist in a region. Consequently, the cell phone signaling data from a single operator are not representative enough. Highway traffic count data provide full-sample cross-sectional counts of road trips, which can compensate for the limited representativeness of cellular network-based data when the two are combined for highway analysis. Based on cellular network-based data and highway traffic count data, this study retrospectively analyzes the changes in intercity highway travel characteristics before and after the COVID-19 pandemic and provides targeted suggestions on some public safety issues that existed during the pandemic.

2. Dataset Characteristics

2.1. Cellular Network-Based Data

At present, there are already many types of cellular data based on mobile phone collection [15]. According to different switching methods of the core network, cellular network-based data can be divided into two categories: circuit-switched (CS) domain and packet-switched (PS) domain.

The CS domain mainly provides telephony services. During a single call, both the mobile terminals always occupy a dedicated channel, which cannot be used by other mobile terminals. The operator records a set of information throughout the call, including interaction ways of the mobile terminal with the signaling network within the area, other communication devices, or equipment. Typical of the data generated by circuit-switched networks is the call detail records (CDRs). The data are characterized by small data volumes, passive generation, wide coverage, low costs, short analysis cycles, etc.

The PS domain mainly provides Internet data services. When users use the cellular network for data transmission, the packet-switched network divides the user data into messages of a certain length for transmission and exchange. Each message of a certain length is called a packet. The operator records the interaction of the mobile terminal with the regional base station during packet switching, including the timestamp and the fields necessary for user identification. Typical data generated by packet-switched networks are the visitor location register (VLR) data. The data generated by the packet-switched network are characterized by large volume, high real-time performance, costly processing, long analysis cycles, etc.

The CDR and VLR data are usually collected through active and passive events:
(i) Active events include making or receiving telephone calls, sending or receiving text messages, mobile phone Internet logs, and switching the phone on and off
(ii) Passive events include periodic location updates, movement of users into a new set of cellular station areas, and cellular signal switching between different communication eras

The dataset collected for this study covers about 18 million users and contains about 23 billion pieces of information on Internet records daily (Table 1). The dataset contains both the information about coverage areas’ shapes and the geographical location of each base station. Combining the interaction data between the user and the base station with the geographical coverage area of the base station enables us to analyze and construct the user’s trajectory from a spatial and temporal perspective [23]. As shown in Figure 1, these trajectories are temporally and spatially sparse.

2.2. Traffic Count Data

The highway traffic survey is an indispensable part of the statistical work of the Ministry of Transport. By siting continuous count stations (CCS) or deploying portable traffic recorders (PTR) on the highway, the highway traffic survey is conducted to understand the operational characteristics of highway systems through statistics, analysis, and forecasting [24].

CCS transmits data to the traffic investigation system every five minutes, containing the traffic count data and speed of each vehicle type during those five minutes. Traffic count data are normally collected as part of a continuous count program. The primary objective of the program is to develop hour of day (HOD), day of week (DOW), month of year (MOY), and yearly factors to expand short-duration counts to annual average daily traffic (AADT). The CCS can be used to develop adjustment factors, track traffic volume trends on important highway segments, and provide inputs to traffic management and traveler information systems.

3. Data Preprocessing

3.1. Trajectory Data Arrangement and Processing

The CDR and VLR data obtained directly from the operator are full samples and contain only anonymous user identifiers, cell numbers, and timestamps for each activity. To simplify the description, the first step is to integrate the trajectories of each anonymous user in the dataset [25-27]. Each trajectory consists of a series of activity points. For example, the trajectory of user $u$ is shown as follows:

$$T_u = \{p_1, p_2, \ldots, p_n\}. \quad (1)$$

In formula (1), let $p_i = (u, c_i, t_i)$ denote the $i$-th point of activity of the trajectory, which means that user $u$ appears on cell $c_i$ at time $t_i$. The data volume of VLR is enormous, and it is very difficult to analyze the dataset directly. Therefore, the size of the dataset needs to be reduced appropriately [28, 29].

Firstly, for trajectories with only a few cell records, the CDR and VLR data may be unable to represent the travel characteristics of users. Therefore, trajectories with fewer than eight cell phone locations are excluded [30].
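The trajectory integration and the sparse-trajectory filter described above can be illustrated with a minimal Python sketch. The record layout `(user, cell, timestamp)` and the function names are assumptions made for illustration, and the sketch counts records rather than distinct cells, which is one possible reading of "cell phone locations".

```python
from collections import defaultdict

MIN_RECORDS = 8  # threshold stated in the text


def build_trajectories(records):
    """Group (user, cell, timestamp) records into time-ordered trajectories."""
    trajectories = defaultdict(list)
    for user, cell, ts in records:
        trajectories[user].append((cell, ts))
    for points in trajectories.values():
        points.sort(key=lambda p: p[1])  # order activity points by time
    return dict(trajectories)


def drop_sparse(trajectories, min_records=MIN_RECORDS):
    """Keep only trajectories with at least `min_records` activity points."""
    return {u: pts for u, pts in trajectories.items() if len(pts) >= min_records}


# Hypothetical records: u1 has ten activity points, u2 only two.
records = [("u1", f"c{i}", i * 600) for i in range(10)]
records += [("u2", "c1", 0), ("u2", "c2", 900)]
kept = drop_sparse(build_trajectories(records))
```

In this example only the trajectory of `u1` survives the filter.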

Secondly, VLR data are packetized during transmission, and the device prefers the nearest base station for transmission. When the load of the nearest base station reaches a threshold value, that base station no longer accepts new data, and the mobile terminal automatically selects another relatively close base station nearby. The location coordinates of each base station $i$ are denoted by $(x_i, y_i)$, and the number of interactions between the mobile terminal and each nearby base station during the statistical time period is denoted by $n_i$. According to the weighted center-of-mass point model, the position $(x_P, y_P)$ of point $P$ during time period $T$ is calculated as follows:

$$x_P = \frac{\sum_i n_i x_i}{\sum_i n_i}, \qquad y_P = \frac{\sum_i n_i y_i}{\sum_i n_i}. \quad (2)$$

Since the geographical coverage area of the base station is a grid of 250 × 250 m, by slicing the map into grid cells of 250 × 250 m, the user location can be transformed from the base station cell range to a grid-based index. The accuracy of the calculated position is doubled compared with that of the original base station location, and the size of the dataset can be significantly reduced. This study defines the time period for the weighted center-of-mass model calculation [31]. Since the target of the analysis is intercity travel, drift deviations of a few hundred meters in location can be ignored.
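As a hedged illustration of the weighted center-of-mass calculation and the subsequent grid indexing, the following Python sketch assumes that base station coordinates are already projected to meters and that each station is given as a tuple `(x, y, n)`, where `n` is the number of interactions in the time period. The function names and the sample values are hypothetical.

```python
import math

GRID_SIZE_M = 250.0  # grid cell edge length stated in the text


def weighted_centroid(stations):
    """Interaction-weighted mean position of the stations used in one period.

    stations: list of (x, y, n) with projected coordinates in meters and
    n the number of interactions with that base station.
    """
    total = sum(n for _, _, n in stations)
    if total == 0:
        raise ValueError("no interactions in this time period")
    x = sum(px * n for px, _, n in stations) / total
    y = sum(py * n for _, py, n in stations) / total
    return x, y


def grid_index(x, y, size=GRID_SIZE_M):
    """Map a projected position to the (column, row) index of its grid cell."""
    return math.floor(x / size), math.floor(y / size)


# Hypothetical period: three interactions with one station, one with another.
stations = [(1000.0, 2000.0, 3), (1500.0, 2000.0, 1)]
cx, cy = weighted_centroid(stations)
cell = grid_index(cx, cy)
```

Replacing the per-station records of a period with a single grid index is what reduces the dataset size in this step.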

3.2. Identification of Origin and Destination Matrices

A stay point is defined by consecutive grid records restricted by both temporal and spatial constraints. The spatial constraint refers to the deviation distance for a user to stay at a location, and the size of the deviation distance is usually related to the signal accuracy [32-34]. In this study, since each point has been assigned to its own 250 × 250 m grid in the previous step, the constraint on the spatial location is already satisfied. The temporal constraint refers to the minimum time a user stays at a grid, usually calculated as the time difference between the first record and the last record at that grid. In this study, we adapt the stay-point algorithm initially described by Zheng and Xie [33]. As a result, only trajectory points that stay in a grid continuously for more than 2.5 hours are recognized as stay points, which are the arrival points of the previous trajectory and the starting points of the subsequent trajectory for users. After the users' origins and destinations are identified, in accordance with Calabrese's research [31], the trajectories of all users are organized according to the following steps: (1) match each grid's location to the city in which it is geographically located; (2) integrate the origin city, destination city, and departure time for each trajectory of each user into a tuple $(u, o, d, t)$, where $u$ denotes the user id, $o$ the origin city, $d$ the destination city, and $t$ the departure time; and (3) aggregate all user trips that have the same origin city and destination city by hourly and daily time windows to $(o, d, D, h)$, where $D$ is used to denote the departure date and the departure hour is denoted by $h$. The aggregated results are summarized into a three-dimensional matrix $M$. Elements $M_{o,d,\tau}$ represent the number of trips from the departure city $o$ to the arrival city $d$ in the time period $\tau$.
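The stay-point rule and the three aggregation steps can be sketched as follows. This is a simplified illustration rather than the stay-point algorithm of Zheng and Xie itself: it assumes time-ordered `(grid, datetime)` points per user, a hypothetical `grid_to_city` lookup, that the departure time is the last record at the origin stay point, and that only trips between different cities are kept.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_STAY = timedelta(hours=2.5)  # minimum dwell time stated in the text


def stay_points(points, min_stay=MIN_STAY):
    """Return (grid, arrive, leave) for runs of records in one grid cell
    whose first-to-last time difference exceeds `min_stay`."""
    stays = []
    i = 0
    while i < len(points):
        j = i
        while j + 1 < len(points) and points[j + 1][0] == points[i][0]:
            j += 1
        arrive, leave = points[i][1], points[j][1]
        if leave - arrive > min_stay:
            stays.append((points[i][0], arrive, leave))
        i = j + 1
    return stays


def intercity_trips(user, stays, grid_to_city):
    """Turn consecutive stay points into (user, origin, dest, departure) tuples."""
    trips = []
    for (g_from, _, leave), (g_to, _, _) in zip(stays, stays[1:]):
        origin, dest = grid_to_city[g_from], grid_to_city[g_to]
        if origin != dest:  # assumption: keep intercity trips only
            trips.append((user, origin, dest, leave))
    return trips


def aggregate(trips):
    """Count trips per (origin, destination, departure date, departure hour)."""
    return Counter((o, d, t.date(), t.hour) for _, o, d, t in trips)


# Hypothetical trajectory: 3 h in one grid, a pass-through point, 3 h in another.
day = datetime(2020, 1, 6)
points = [
    ((10, 20), day + timedelta(hours=0)),
    ((10, 20), day + timedelta(hours=3)),
    ((11, 25), day + timedelta(hours=4)),
    ((40, 90), day + timedelta(hours=5)),
    ((40, 90), day + timedelta(hours=8)),
]
grid_to_city = {(10, 20): "Changsha", (11, 25): "Changsha", (40, 90): "Yueyang"}
od = aggregate(intercity_trips("u1", stay_points(points), grid_to_city))
```

The resulting counter plays the role of the three-dimensional OD matrix, indexed by origin city, destination city, and time window.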

3.3. Data Augmentation

The CDR and VLR data are obtained from a single operator in this study, and the operators occupy different market shares in different cities. Therefore, if cellular network-based data of a single operator are directly used for intercity highway travel analysis, the analysis results will be biased owing to the different market shares. Consequently, to obtain the accurate number of trips for the whole society, it is necessary to expand the existing dataset for each city to the full-sample data.

With the CDR data, it is possible to obtain the call records between that operator's subscribers and the subscribers of all operators in the whole society. Since calls between users are random and independent of the operator's brand, the operator's market share in a region can be measured by the call records of users in that region. For city $c$, if the number of calls dialed by subscribers of operator brand $b$ to all users in the city is $N_c$ and the number of those calls in which the called party is also with operator $b$ is $N_c^b$, then the following formula is given:

$$r_c^b = \frac{N_c^b}{N_c}, \quad (3)$$

where $r_c^b$ denotes the proportion of mobile phone users with operator brand $b$ to all mobile phone users in the city $c$, i.e., the market share. The full sample can be extrapolated based on the market share of the operator in each city.
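A minimal sketch of the market-share estimate and the full-sample extrapolation is given below. It assumes that the share is the fraction of the operator's outgoing calls whose called party uses the same operator, and that observed trips are scaled by the inverse of that share; the function names and numbers are illustrative only.

```python
def market_share(calls_total, calls_to_same_operator):
    """Estimate an operator's share in a city from its subscribers' calls."""
    if calls_total <= 0:
        raise ValueError("calls_total must be positive")
    return calls_to_same_operator / calls_total


def extrapolate(observed_trips, share):
    """Scale single-operator trip counts up to the whole population."""
    if not 0 < share <= 1:
        raise ValueError("share must lie in (0, 1]")
    return observed_trips / share


# Hypothetical city: 200,000 outgoing calls, 50,000 of them to the same operator.
share = market_share(200_000, 50_000)
full_sample_trips = extrapolate(1_200, share)
```

Because the share differs between cities, the scaling is applied city by city rather than with one global factor.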

3.4. Highway Trip Calculation

For intercity travel during the pandemic, the main travel modes include highway ($H$), waterway ($W$), railway ($R$), and airlift ($A$) [29]. Let the total number of intercity trips on day $d$ be $N_d$, which is obtained as follows:

$$N_d = N_d^H + N_d^W + N_d^R + N_d^A. \quad (4)$$

Due to the complexity and large volume of the highway transport network, it is difficult to directly determine the number of highway trips, so we can first determine the number of trips by waterway, railway, and airlift, and subtract them from the total amount to calculate the number of highway trips.

For the trips of travel by waterway and railway, as there are fixed entry/exit points and travel routes for these two modes, the quantity can be determined by means of trajectory matching. For traveling tracks by air, as the ground base station signal is not covered during the air travel of the aircraft, there exist both long signal interruptions and long-distance location movements after the signal is connected. Therefore, by matching the features of the critical nodes of the travel trajectory, the trips by waterway, railway, and airlift can be determined separately, leading to the calculation of the highway trips.
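The residual calculation of highway trips, together with one possible way of flagging an air leg from a long signal gap followed by a long-distance jump, can be sketched in Python. The thresholds of one hour and 300 km are hypothetical values chosen for illustration and do not come from the study; points are assumed to be `(latitude, longitude, unix_seconds)` tuples.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def looks_like_air_leg(p1, p2, min_gap_s=3600, min_dist_km=300.0):
    """Flag consecutive points separated by a long signal gap AND a long jump.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    (lat1, lon1, t1), (lat2, lon2, t2) = p1, p2
    long_gap = (t2 - t1) >= min_gap_s
    long_jump = haversine_km(lat1, lon1, lat2, lon2) >= min_dist_km
    return long_gap and long_jump


def highway_trips(total, waterway, railway, air):
    """Highway trips as the residual after removing the other three modes."""
    highway = total - waterway - railway - air
    if highway < 0:
        raise ValueError("mode counts exceed the total number of trips")
    return highway


# Hypothetical points: approximate Changsha and Beijing coordinates, 2.5 h apart.
changsha = (28.23, 112.94, 0)
beijing = (39.90, 116.41, 9000)
is_air = looks_like_air_leg(changsha, beijing)
n_highway = highway_trips(10_000, 200, 2_500, 300)
```

Waterway and railway trips would instead be matched against fixed entry/exit points and routes, which is not shown here.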

4. Example Analysis

After data calculation, expansion, and screening, this study selected two representative pairs of cities around Hubei Province for travel analysis: Changsha-Yueyang (two cities in Hunan Province) and Hefei-Lu’an (two cities in Anhui Province). Both Hunan and Anhui provinces are adjacent to Hubei Province and were relatively seriously affected in the early stages of the pandemic outbreak, with Changsha-Yueyang and Hefei-Lu’an being the closest routes to Hubei from these two provinces. Changsha and Yueyang are mainly connected via the national expressway G4 and the ordinary national highway G107 (Figure 2), with closer intercity passenger connections. Hefei and Lu’an are mainly connected via the national expressway G40 and the ordinary national highway G312 (Figure 3), with closer intercity freight connections.

Therefore, studying the OD flows between these two city pairs around the COVID-19 outbreak is essential for analyzing the spread and control of the COVID-19 pandemic. This study selects four periods of OD flows in 2020 for analysis: before the outbreak (January 6 to January 12), during the outbreak of the COVID-19 pandemic (February 3 to February 9), resumption of work and production (March 2 to March 8), and after the unsealing of Wuhan (April 8 to April 14). Since each of the four periods is representative of a distinct stage before or during the pandemic, it is important to analyze the changes in intercity OD flows across these periods. Based on the above steps, the intercity OD flows were calculated as shown in Figures 4 and 5.

In the comparison of the OD flows between the two groups of cities, Changsha-Yueyang (Figure 4) and Hefei-Lu’an (Figure 5), both showed large fluctuations around the time of the pandemic, and the fluctuation trends were similar in general. The Chinese New Year was celebrated on January 25, 2020, and 15 days before the Chinese New Year, China started its annual large population migration. The number of OD trips between the two groups of cities increased day by day during the period before January 12, 2020, when the Spring Festival travel season had just begun and the outbreak had not yet occurred. In the post-outbreak period (February 3 to February 9, 2020), intercity OD travel decreased to one-fourth of the pre-outbreak level. By February 6, 2020, the traffic volume reached its lowest value. In March 2020, China started to promote the return to work and production following the zoning of the pandemic, and the highway travel volume increased significantly compared with February. On April 8, 2020, Wuhan began to lift its 77-day coronavirus lockdown, and highway traffic volume gradually returned to the pre-outbreak level.

5. Data Validation

To verify the accuracy and representativeness of the OD flows calculated from cellular network-based data, it is necessary to compare them with actually collected highway traffic data. As described in Section 2.2, the highway traffic survey is an important part of China’s statistics and an essential means of monitoring the operational status of the highway network. By installing fixed sensors at certain distance intervals on the highway and setting up CCS, the system can obtain traffic count data and vehicle share for that highway section with approximately 90% accuracy in real time. Since the data collected by the traffic survey stations only concern vehicles, whereas the OD flows are specific to individuals, this study only analyzes the correlation between the cellular network-based data and the fluctuation trend of traffic count data to determine the accuracy of cell phone signaling.

The study selected traffic count data collected on major intercity highways during the same time period and calculated the Pearson correlation between the traffic count data and the OD matrices calculated from cellular network-based data. The correlation coefficients of Changsha-Yueyang and Hefei-Lu’an are 0.803 and 0.706, respectively, indicating a strong correlation between cell phone signaling data and traffic count data. The results provide strong evidence that cellular network-based data can support intercity travel OD analysis. Matching the proportion of vehicle types from the traffic count data to the OD flows enables us to analyze the changes in passenger and truck traffic on the intercity highways before and after the epidemic.
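For reference, the correlation check can be reproduced with a few lines of Python, assuming the coefficient reported above is the standard Pearson product-moment coefficient computed over paired series (for example, daily OD flows against daily traffic counts). The two series below are made-up numbers for illustration, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Illustrative series only: OD flows (persons) and traffic counts (vehicles).
od_flows = [5200, 5600, 6100, 1500, 1300, 3900, 5400]
traffic_counts = [3100, 3300, 3700, 1200, 1000, 2600, 3200]
r = pearson(od_flows, traffic_counts)
```

Because the coefficient is invariant to scale, it compares the fluctuation trends even though one series counts persons and the other counts vehicles.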

The two sets of intercity OD flow data were split according to the vehicle share of the traffic count data, and the average hourly OD flows by vehicle type were calculated for each of the four time periods.
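The split by vehicle share can be sketched as a simple proportional allocation. The sketch assumes that the share of each vehicle type observed at the count stations in a given hour is applied directly to the OD flow of that hour; the vehicle-type labels and numbers are hypothetical.

```python
def split_by_vehicle_share(od_flow, counts):
    """Allocate an OD flow to vehicle types in proportion to traffic counts.

    counts: mapping of vehicle type to the count observed in the same hour.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no vehicles counted in this hour")
    return {vehicle: od_flow * n / total for vehicle, n in counts.items()}


# Hypothetical hour: 1,200 trips, with passenger cars at two-thirds of the count.
split = split_by_vehicle_share(1_200, {"passenger_car": 600, "truck": 300})
```

Averaging such hourly splits over the days of each period gives the hourly profiles by vehicle type.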

The highway OD flows between Changsha and Yueyang were dominated by passenger cars before the outbreak of COVID-19, accounting for about two-thirds of the total trips (Figure 6). The trip volume of passenger cars was high and showed obvious morning and afternoon peaks. After the outbreak of COVID-19, OD flows dropped rapidly, but the share of passenger cars increased significantly, accounting for about 80% of total trips. The OD flows by passenger car in the morning peak dropped rapidly to one-fourth of the pre-outbreak level, and the OD flows by truck fell even further, to 11%. At the same time, the morning peak of OD flows by passenger car became insignificant, with a new insignificant peak in the evening. With the pandemic under effective control and the orderly return to work and production, the OD flows increased together with a rapid increase in the proportion of truck traffic, with early morning truck travel exceeding that of passenger cars. Intercity travel times became more dispersed, with the number of morning and evening trips gradually increasing, while the afternoon peak, where travel was originally concentrated, did not increase significantly. After the lockdown in Wuhan was lifted, the overall OD volume returned to pre-pandemic levels. However, to speed up the resumption of work and production with as little bunching of trips as possible, travel timing during this period remained relatively dispersed, with the morning and afternoon peaks being lower than before the epidemic. At the same time, morning truck trips continued to be higher than passenger car trips during this period.

In contrast to the vehicle shares of Changsha-Yueyang intercity travel, the intercity travel pattern for Hefei and Lu’an was dominated by trucks before the pandemic, and passenger car trips accounted for only one-third of total trips (Figure 7). Meanwhile, the OD flows between Hefei and Lu’an were small, and the morning and evening peaks were relatively insignificant. After the outbreak of COVID-19, the intercity OD flows also dropped sharply, the travel peaks were not prominent, and OD flows by passenger car slightly exceeded those by truck. During the resumption of work and production, intercity OD flows recovered to half of the pre-epidemic level. The OD flows of the two vehicle types were the same, and the trips were more evenly distributed. After Wuhan began to lift its coronavirus lockdown, highway OD flows were nearly restored to pre-pandemic levels. Remarkably, the departure times of OD flows by truck were more widely distributed during this period because more and more truck drivers chose to travel earlier or later in the day. Hourly traffic growth before the morning peak was twice as fast as before the outbreak.

6. Conclusion

This study presents a data analysis approach for obtaining OD flows on intercity highways by vehicle type based on cellular network-based data and traffic count data. This method can be applied to the calculation of OD flows between most adjacent cities. The findings can help governments of various countries monitor intercity highway traffic patterns and improve the effectiveness of policies in areas with different social populations.

For four time periods with representative characteristics before and after the outbreak of COVID-19, the study selects two groups of highly affected city pairs around Hubei Province for analysis. After screening and processing the CDR and VLR data within the coverage area, OD flows on intercity highways during the four time periods are obtained. A correlation analysis between the derived OD flows and the actual traffic count data is performed to demonstrate the reliability of the cellular network-based data, and the vehicle share of the traffic count data is matched to the calculated OD flows. Finally, the OD flows by different vehicle types on intercity highways before and after the pandemic are analyzed. The conclusions of this study are as follows:
(1) After the outbreak of the COVID-19 pandemic, intercity highway OD flows were greatly affected and dropped significantly. With the effective implementation of the pandemic safety prevention and control strategies and the orderly promotion of the return to work and production, the intercity highway OD flows recovered to the pre-pandemic level in mid-April 2020.
(2) During the pandemic, the percentage of highway travel by passenger car increased, but the morning peak was not significant. Although the OD flows gradually increased afterward, the peak of OD flows by passenger car remained insignificant, and a new evening peak appeared, consistent with measures to control the spread of the pandemic and ensure the safety of travel.
(3) OD flows by truck declined more sharply after the outbreak than OD flows by passenger car. With the orderly promotion of the return to work and production, the OD flows by truck rebounded more quickly. The time span of truck trips during this period was greater, and more and more trucks chose to travel earlier or later in the day.

Because the data collected by the traffic survey stations are in units of vehicles whereas the OD flows are in units of people, this method cannot accurately compare the two in terms of absolute values, which is a possible shortcoming of the method.

In the next step, we will analyze the changes in highway freight transportation before and after the outbreak of COVID-19 based on the current empirical data combined with vehicle weighing data. The analysis will also take fuller account of the epidemic prevention policies of individual cities.

Data Availability

The mobile phone signaling data were purchased from a Chinese mobile phone operator. The data on the traffic volume of passenger cars and trucks are collected by the China National Traffic Survey Data Collection System, so they are not freely available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.