The COVID-19 pandemic has spread catastrophically across the world since the spring of 2020. In this paper, a heterogeneous branching process with immigration is established to quantify the human-to-human transmission of COVID-19 in local communities, based on the temporal and structural transmission patterns extracted from public case disclosures by four provincial Health Commissions in China. With proper parameter settings, our branching model matches the actual transmission chains satisfactorily and therefore sheds light on the underlying spreading mechanism of COVID-19. Moreover, based on our branching model, the efficacy of home quarantine and social distancing is explored, providing a reference for the effective prevention of COVID-19 worldwide.

1. Introduction

COVID-19 spread alarmingly fast in Wuhan in late January 2020, before the city's lockdown began on Jan. 23. According to public reports on the number of confirmed cases, the prevalence of COVID-19 outside Hubei Province was brought down to a controllable size by late February. The Wuhan lockdown ended on Apr. 8, 76 days after it began, once COVID-19 was confirmed to be under control in China.

Extensive research has been conducted to understand the spreading features of COVID-19, mainly in two directions. The first concerns clinical features of the virus itself [1, 2], such as estimates of the basic reproduction number [3–5] and the effective reproduction number [6], and basic statistics obtained from confirmed cases, such as the incubation period, the serial time, and the secondary attack rate [7–9]. The second concerns spreading dynamics, studied mainly through mathematical models such as SEIR models [10–14] and branching models [15, 16]. The effects of the Wuhan lockdown [17, 18] and of different levels of isolation have also been studied with generalized or specialized SEIR models [15].

In this paper, public line-list reports of confirmed cases in Anhui, Henan, Jiangsu, and Zhejiang provinces from Jan. 21 to Feb. 19, 2020, covering 30 days, were collected and analyzed. Owing to effective isolation policies, such as advising people to stay at home, wearing masks, washing hands, and tracing close contacts, the epidemic was brought under control within about one month in these four provinces, which corresponds to approximately two or three generations according to the serial time. Such short transmission chains are poorly suited to SEIR models, which usually simulate transmission over many iterations. Therefore, instead of an SEIR model, we propose a branching process to model the spread of COVID-19 in well-prevented regions of China. Based on statistics extracted from our data, two factors influencing the propagation of COVID-19 are considered: importation of cases from outside a particular community and the efficacy of containment within local communities.

In fact, SEIR models and branching models are both strong candidates for modeling classical epidemic spreading. Under certain conditions, for example when the total population is large enough, the two models are equivalent in describing general epidemic dynamics; see [19] for theoretical support. In SEIR models, transmission may continue over many iterations and produce longer transmission chains than observed in reality. In contrast, a branching process can represent the efficacy of containment directly, by cutting off the transmission chain in the model assumptions. The branching model is also flexible in modeling confirmed cases of COVID-19 with different sources of contact, that is, imported or local, which are captured by the immigration and branching parts, respectively; in SEIR models, it is difficult to distinguish the sources of contact. This comparison is summarized in Table 1.

To sum up, instead of the well-known SEIR model, we establish a heterogeneous branching process with immigration to explore the spread of COVID-19 in well-prevented local communities in China. In our branching model, heterogeneity is caused by the distribution of the serial time; immigration consists of the confirmed cases entering a given local community from outside; and the secondary cases infected by the imported infectors are modeled as their offspring under a specific branching mechanism. Further transmissions are modeled as further offspring under similar rules. All parameters in the model are extracted or approximated from real data. The feasibility of this approach is verified by back-fitting the parameters that represent isolation strength and social distance, and the fitted model matches the real data well. The efficacy of containment measures is also simulated with our branching model. Our findings reveal the spreading mechanism of COVID-19, from the individual to the population level, in well-prevented local communities, and the effectiveness of isolation measures obtained in our work can shed light on preventing the global spread of COVID-19.

An outline of this paper is as follows. The data are described in Section 2. The branching model is built in Section 3, with parameters obtained by statistical analysis of the real data. The validation of our branching model and the impact of its isolation parameters are explored in Section 4. Conclusions and discussion are given in Section 5.

2. Data Description

The data in this paper are extracted from reports of confirmed cases in Anhui (887 cases in total), Henan (1279 cases), Jiangsu (577 cases), and Zhejiang (1137 cases) from Jan. 21 to Feb. 19, 2020. The locations of these four provinces, together with Hubei, are shown in Figure 1, where the color indicates the number of confirmed cases collected in each region up to Feb. 19, 2020. A typical report reads as follows.

“Patient ID: Huainan-25. The patient Huainan-25 is a 59-year-old woman who is the wife of the Huainan-26 patient. On Feb. 12, she developed fever, muscle soreness, and other symptoms. On Feb. 14, she went to the hospital for treatment and stayed at the hospital for observation. On Feb. 15, her nucleic acid test was positive, and doctors diagnosed her as a suspected patient. Two days later, she was confirmed. Doctors have traced three close contacts, all of whom have been quarantined for medical observation. During the Chinese New Year’s holiday, she had close contact with her daughter, son-in-law, and granddaughter. Her son-in-law, an asymptomatic patient with a history of suspicious exposure in Hefei, stayed at a designated hospital for observation. Doctors have traced his 46 close contacts, all of whom have been quarantined for medical observation.”

For the patient with ID “Huainan-25,” Huainan is the city where the patient lives, and 25 means that she is the 25th confirmed case in Huainan. We selected the cases that report some or all of the following information: (1) date of confirmation, (2) whether the case is imported (that is, infected outside the local community), (3) date of the infector's confirmation, and (4) relationship between the primary case (infector) and the secondary case (infectee). After extraction, the sample sizes for (1) and (2) are 831 for Anhui, 967 for Henan, 299 for Jiangsu, and 1,051 for Zhejiang. For (3) and (4), 411 cases are obtained (234 from Anhui and 177 from Henan).

Based on the actual data, statistics on the key features of the spread are presented in Section 3, including (1) the evolution of imported and local new cases over time, (2) the main relationships between infector and infectee, and (3) the serial interval, that is, the interval between the confirmation times of each infector-infectee pair.
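The four extracted fields and the three derived statistics can be sketched as a small record type. This is a minimal illustration with hypothetical field and function names (`CaseRecord`, `daily_series`, `serial_intervals`, `relationship_counts`) that do not come from the original study. Following the definition above, the serial interval is taken as the difference between confirmation dates.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CaseRecord:
    """One line-list entry; fields other than ID and confirmation date may be missing."""
    case_id: str
    confirmed: date                             # (1) date of confirmation
    imported: Optional[bool] = None             # (2) infected outside the local community?
    infector_confirmed: Optional[date] = None   # (3) infector's confirmation date
    relationship: Optional[str] = None          # (4) infector -> infectee relationship


def daily_series(records: List[CaseRecord], start: date, days: int) -> Tuple[List[int], List[int]]:
    """Count imported and local confirmations per day (series like those in Figure 2)."""
    imported, local = [0] * days, [0] * days
    for r in records:
        if r.imported is None:
            continue  # source unknown, so the case cannot be classified
        i = (r.confirmed - start).days
        if 0 <= i < days:
            (imported if r.imported else local)[i] += 1
    return imported, local


def serial_intervals(records: List[CaseRecord]) -> List[int]:
    """Serial interval in days: infectee confirmation minus infector confirmation."""
    return [(r.confirmed - r.infector_confirmed).days
            for r in records if r.infector_confirmed is not None]


def relationship_counts(records: List[CaseRecord]) -> Counter:
    """Tally infector-infectee relationships (cf. Table 2)."""
    return Counter(r.relationship for r in records if r.relationship)
```

Records whose source (item 2) is missing are skipped when building the daily series. Only records with an infector date contribute a serial interval, which matches the smaller sample available for items (3) and (4).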

3. Model Description: Heterogeneous Branching Process with Immigration

In the absence of pharmaceutical interventions, a strict isolation policy is urgently needed to contain a highly infectious disease effectively. The detailed reports of confirmed cases provide the information needed to understand the mechanism of COVID-19 transmission. Tracing and isolating close contacts efficiently cuts off the transmission chain, so that cases imported into a specific region can transmit the virus for only a few more generations. The serial interval and the incubation period are two key factors for prevention policymaking, and they underlie the commonly recommended isolation period of at least 14 days.

A heterogeneous branching process with immigration is established by combining three modeling ingredients: (1) the temporal pattern of the serial time, (2) the structural pattern of transmission under containment measures, and (3) the importation of confirmed cases, which seeds the prevalence of COVID-19 in local communities. The framework of our branching model is given in Section 3.1. The parameter values and the distributions of the random variables in our branching model are extracted from the real data in Sections 3.2, 3.3, and 3.4. The validation of our model and simulation results for different isolation levels and social distances are given in Section 4.

3.1. The Framework of the Branching Model

Heterogeneous branching processes with immigration are well suited to describing the temporal evolution of populations in which individuals appear randomly over time through two distinct mechanisms. One mechanism, called immigration, is the influx of new individuals into a population to which they are not native. The other, called branching, is the way individuals in the population generate new offspring. In this paper, we consider a heterogeneous branching process with immigration in which

(i) the branching mechanism models the spread of the virus in local communities, with heterogeneity caused by the serial time;
(ii) the immigration is a time-dependent Poisson process, modeling the imported cases arriving from outside a given region.

In the following, immigration, offspring distribution, and serial time are discussed in detail with values or distributions obtained from statistical analysis of the real data described in Section 2.

3.2. Immigration

Imported cases, whose contact history lies outside a local region, are described as immigration. In our branching model, the immigration process is modeled by a time-dependent Poisson process with a varying rate λ_t. That is, the number of immigrants arriving on the t-th day, denoted by I_t for t = 1, 2, …, has the following distribution law:

$$ P(I_t = k) = \frac{\lambda_t^{k}\, e^{-\lambda_t}}{k!}, \qquad k = 0, 1, 2, \ldots $$

From the line-list reports, we obtain the numbers of imported and local cases over time. Figure 2 shows these series for the four provinces, with red curves for imported cases and black curves for local cases. The immigration process in our model is extracted from the imported series of the four provinces, that is, the red curves in Figure 2.
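A minimal sketch of the immigration model, assuming NumPy is available. `poisson_pmf` evaluates the distribution law of I_t, and `sample_immigration` draws one realization of the daily imports from a vector of rates λ_t. Both function names are hypothetical. When fitting a province, one plausible choice is to use its observed imported series (a red curve in Figure 2) as the rates.

```python
import math

import numpy as np


def poisson_pmf(k: int, lam: float) -> float:
    """P(I_t = k) = lam^k e^(-lam) / k! for a daily rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def sample_immigration(rates, rng: np.random.Generator) -> np.ndarray:
    """Draw one realization I_1..I_T, with I_t ~ Poisson(lambda_t) independently per day."""
    rates = np.asarray(rates, dtype=float)
    if np.any(rates < 0):
        raise ValueError("rates must be nonnegative")
    return rng.poisson(rates)  # elementwise Poisson draw, one per day
```

Drawing each day independently reflects the assumption that arrivals on different days do not influence one another. All time dependence is carried by the rate vector.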

Clearly, the initial spread in local communities is driven by confirmed cases imported from outside the community. At first, imported cases outnumber local ones; several days later, local cases begin to increase. Provided that the import path is completely cut off at an early stage, whether an outbreak occurs depends on the prevention policy of the local community.

3.3. Offspring

The number of potential secondary cases produced by each infective individual is called its offspring in our branching model. The offspring comprises two parts, according to where the infection takes place.

The first part arises within the family and is drawn from a binomial distribution Bin(K − 1, ρ), where K is a random variable representing the number of family members and ρ is the probability that a family member is infected by the first infected member. Note that for K = 1 there is no other family member to be infected. Moreover, the high transmissibility of COVID-19 results in a large value of ρ. In our model, transmission within a family occurs with probability ρ = 0.9, based on the statistical result that about 90% of infections occurred between family members.

We counted the relationship in each infector-infectee pair. Because of home quarantine and the high transmissibility of COVID-19, family members of imported infectious cases are at very high risk of infection. The three most common infector-infectee relationships are between spouses, from parent to child, and from child to parent. The number and proportion of cases for these three relationships are shown in Table 2.

For the number of family members K, the reference distribution is taken from the China Statistical Yearbook 2020 and is shown in Table 3.

The second part of the offspring arises outside the home. Assume that the probability of leaving home is α, and that the number of potential infectees outside the home follows a Poisson distribution Poi(β), β > 0. In other words, α represents the strength of home quarantine, and β measures the effect of social distance. Smaller α and/or β corresponds to stricter containment of COVID-19 in a given region.
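The offspring law of an imported case can be checked numerically. Under this model, the expected number of first-generation infectees is ρE[K − 1] + αβ, because the household part has conditional mean ρ(K − 1) and the social part equals Poi(β) with probability α and 0 otherwise. The sketch below assumes NumPy and uses a hypothetical household-size distribution in place of Table 3, whose values are not reproduced here. It compares a Monte Carlo estimate with this formula. The function names are illustrative only.

```python
import numpy as np

# Household sizes 1..6 with HYPOTHETICAL placeholder probabilities,
# NOT the yearbook figures in Table 3.
SIZES = np.array([1, 2, 3, 4, 5, 6])
PROBS = np.array([0.18, 0.30, 0.21, 0.16, 0.09, 0.06])


def first_generation(rng, rho, alpha, beta, n):
    """Offspring counts (family, social) for n imported cases."""
    k = rng.choice(SIZES, size=n, p=PROBS)        # household size K
    family = rng.binomial(k - 1, rho)              # Bin(K - 1, rho); always 0 when K = 1
    goes_out = rng.random(n) < alpha               # leaves home with probability alpha
    social = np.where(goes_out, rng.poisson(beta, size=n), 0)  # Poi(beta) only if out
    return family, social


def mean_offspring(rho, alpha, beta):
    """Analytical mean number of first-generation infectees: rho * E[K - 1] + alpha * beta."""
    return rho * float(np.dot(SIZES - 1, PROBS)) + alpha * beta
```

With the placeholder distribution, E[K − 1] = 1.86. For ρ = 0.9, α = 0.1, and β = 0.4, the formula therefore gives 0.9 × 1.86 + 0.04 = 1.714, and a large simulated sample should average close to that value.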

The final assumptions come from the isolation and contact-tracing policy, which adjusts the behavior of second-generation infectors. First, family infectees do not transmit the virus to other family members, since all of them are treated as offspring of the first infected family member, who imported the virus. Second, family infectees may have out-of-home infectees of their own, but their probability of leaving home changes from α to α². The probability of going out is assumed to decay exponentially across generations because awareness of isolation accumulates; hence the probability of leaving home for the second generation is set to α² rather than following a linear or other relation between generations. Third, social infectees of the imported cases can transmit the virus to their family members and may also have out-of-home infectees of their own; their probability of leaving home likewise changes from α to α². Finally, no transmission occurs after two generations, owing to the strict contact tracing measures taken in local communities.
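The rules above can be combined in a small simulation of the two-generation chain started by each imported case. This is a sketch under stated assumptions, not the authors' code. The household-size distribution is a hypothetical placeholder for Table 3. Serial intervals are rounded draws from the Gamma distribution fitted in Section 3.4, ignoring zero and negative intervals. Each infectee is dated by adding a serial interval to its infector's date, and imported cases are dated on their arrival day. Output is split into the Home, Social, and Secondary categories used in Section 4. All function names are hypothetical.

```python
import numpy as np

# HYPOTHETICAL household-size distribution (sizes 1..6), a placeholder for Table 3.
HOUSEHOLD_SIZES = np.array([1, 2, 3, 4, 5, 6])
HOUSEHOLD_PROBS = np.array([0.18, 0.30, 0.21, 0.16, 0.09, 0.06])
RHO = 0.9                                 # within-family infection probability (Section 3.3)
GAMMA_SHAPE, GAMMA_SCALE = 1.918, 2.309   # Gamma with mean 4.43, variance 10.23 (Section 3.4)


def serial_interval(rng):
    """Whole-day serial interval; simplification: rounded Gamma draws, never negative."""
    return int(np.rint(rng.gamma(GAMMA_SHAPE, GAMMA_SCALE)))


def family_infectees(rng):
    """Bin(K - 1, RHO) household infectees, with K drawn from the household distribution."""
    k = rng.choice(HOUSEHOLD_SIZES, p=HOUSEHOLD_PROBS)
    return int(rng.binomial(k - 1, RHO))


def social_infectees(rng, p_out, beta):
    """Poi(beta) out-of-home infectees if the case leaves home (probability p_out), else 0."""
    return int(rng.poisson(beta)) if rng.random() < p_out else 0


def simulate(imports, alpha, beta, rng):
    """Daily local confirmations split into Home, Social and Secondary (two generations)."""
    horizon = len(imports)
    counts = {kind: np.zeros(horizon, dtype=int) for kind in ("Home", "Social", "Secondary")}

    def record(kind, day):
        if 0 <= day < horizon:            # cases dated after the window are dropped
            counts[kind][day] += 1

    for t0, n_imported in enumerate(imports):
        for _ in range(int(n_imported)):
            # Generation 1, household: infected at home by the imported case.
            for _ in range(family_infectees(rng)):
                t1 = t0 + serial_interval(rng)
                record("Home", t1)
                # No further family spread; leaves home with probability alpha**2.
                for _ in range(social_infectees(rng, alpha**2, beta)):
                    record("Secondary", t1 + serial_interval(rng))
            # Generation 1, social: infected outside the home by the imported case.
            for _ in range(social_infectees(rng, alpha, beta)):
                t1 = t0 + serial_interval(rng)
                record("Social", t1)
                # Infects own household and, with probability alpha**2, people outside.
                n2 = family_infectees(rng) + social_infectees(rng, alpha**2, beta)
                for _ in range(n2):
                    record("Secondary", t1 + serial_interval(rng))
            # Generation 2 ("Secondary") does not transmit further (contact tracing).
    return counts
```

Summing the three arrays gives one simulated local series. Averaging many runs over a grid of (α, β) values would mimic the scenario study, although the exact grid and data of the original study are not reproduced here.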

3.4. Heterogeneity

The heterogeneity in our branching model comes from the serial time, denoted by τ, which is the time interval between the onset times of a new infectee and its infector. The serial interval distribution extracted from the data is shown in Figure 3.

The empirical distribution of τ in Figure 3 is obtained from the positive serial intervals, which make up 80.05% of the sample. Negative and zero serial intervals also occur, accounting for 5.35% and 14.60% of our sample, respectively, as shown in the bar plot in Figure 3. Two parametric distributions, a Gamma distribution with mean 4.43 and variance 10.23 and a Weibull distribution with mean 4.45 and variance 10.31, are fitted to the empirical distribution of the positive serial intervals and drawn in Figure 3 for reference. A shifted (translated) Weibull distribution is used for the serial time in Ref. [6] with a different dataset, which is consistent with our result. The numerical values of our empirical distribution, which are used in our simulations, are listed in Table 4.
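The fitted distributions can be reproduced from the reported moments. The sketch below assumes NumPy and the standard library. It recovers the Gamma parameters by the method of moments and the Weibull parameters by bisection on the coefficient of variation. It then samples a mixture with the reported 14.60% zero and 5.35% negative intervals. The value assigned to negative intervals (−1 day) is a hypothetical placeholder; the empirical values in Table 4 would be used in practice. Function names are illustrative.

```python
import math

import numpy as np


def gamma_from_moments(mean, var):
    """Method of moments for a Gamma law: shape = mean^2 / var, scale = var / mean."""
    return mean**2 / var, var / mean


def weibull_from_moments(mean, var, lo=0.1, hi=20.0, tol=1e-10):
    """Weibull (shape, scale) matching mean and variance; bisection on the shape."""
    target = var / mean**2  # squared coefficient of variation

    def cv2(k):
        g1, g2 = math.gamma(1 + 1 / k), math.gamma(1 + 2 / k)
        return g2 / g1**2 - 1

    while hi - lo > tol:      # cv2 is decreasing in k, so the root stays bracketed
        mid = 0.5 * (lo + hi)
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, mean / math.gamma(1 + 1 / k)


def sample_serial_interval(rng, n, p_zero=0.1460, p_neg=0.0535):
    """Mixture of zero, negative, and positive (Gamma) serial intervals in days."""
    shape, scale = gamma_from_moments(4.43, 10.23)
    u = rng.random(n)
    tau = rng.gamma(shape, scale, size=n)
    tau[u < p_zero] = 0.0
    # HYPOTHETICAL placeholder: negative intervals set to -1 day (Table 4 holds the real values).
    tau[(u >= p_zero) & (u < p_zero + p_neg)] = -1.0
    return tau
```

For the Gamma fit, this gives a shape of about 1.92 and a scale of about 2.31 days. The Weibull shape comes out near 1.4, which reflects the relatively large coefficient of variation of the observed intervals.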

To sum up, the parameters and variables of our branching model, together with their descriptions and values or distributions used in further investigation and simulation, are listed in Table 5. The tested parameters are α and β, representing the isolation level and social distance, respectively; all other parameters and variables in the model are kept fixed during the simulation.

4. Simulation Results

First, Section 4.1 shows that our model matches the real data well. Then, Section 4.2 examines the efficacy of staying at home (parameter α) and keeping social distance (parameter β). In the following simulations, the distributions of family size and serial interval are those listed in Table 5, and the transmission probability between family members is fixed at ρ = 0.9.

4.1. Fitting Real Spreading Processes

To match the real data, the observed imported series of each province (the red curves in Figure 2) is used as the immigration of the branching process. To validate our branching structure, the simulated local series should then match the black curves in Figure 2. The tested parameters, namely the probability α of going out of home and the mean β of secondary cases due to social activities, must therefore be chosen carefully. The best-fit values of α and β are obtained by minimizing the mean absolute error (MAE) between the real local series and the simulated one, where each simulated series is the average of 50 independent trials.
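A minimal sketch of the calibration step. It assumes the minimization is done by exhaustive search over a grid of candidate (α, β) values; the text does not state the search method. `simulate` is a hypothetical callable that returns one simulated daily local-case series for given α, β, and random generator. Each candidate is scored by the MAE between the real local series and the average of `n_trials` runs.

```python
import itertools

import numpy as np


def mae(real, simulated):
    """Mean absolute error between two equal-length daily series."""
    real = np.asarray(real, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.mean(np.abs(real - simulated)))


def fit_by_mae(simulate, real_local, alphas, betas, n_trials=50, seed=0):
    """Return (best_mae, alpha, beta) over the grid alphas x betas."""
    rng = np.random.default_rng(seed)
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        # Average n_trials stochastic runs before comparing with the data.
        runs = [simulate(alpha, beta, rng) for _ in range(n_trials)]
        score = mae(real_local, np.mean(runs, axis=0))
        if best is None or score < best[0]:
            best = (score, alpha, beta)
    return best
```

Any simulator with this signature can be plugged in. A finer grid or a numerical optimizer would be natural alternatives if the grid search is too coarse.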

Figure 4 shows the simulation results. The simulated and real series of local confirmed cases match well for all four provinces. This supports the validity of our model and indicates that the branching structure built in this paper is adequate for modeling the spread of COVID-19 in well-prevented local communities.

4.2. Simulation Results

In this section, experiments are conducted to investigate the combined effect of staying at home and keeping social distance. For this purpose, the immigration rate is set to the average of the imported series of the four provinces, smoothed by a moving average of order 5; it is shown as the red curve with circles in Figure 5(b).
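The reference immigration rate can be computed as follows. This sketch assumes a centered five-day window that shrinks at the ends of the series. The text does not say whether the moving average is centered or trailing, nor how the edges are handled. The function names are hypothetical.

```python
import numpy as np


def centered_moving_average(x, order=5):
    """Centered moving average; the window shrinks near the ends instead of zero-padding."""
    x = np.asarray(x, dtype=float)
    half = order // 2
    out = np.empty_like(x)
    for i in range(x.size):
        lo, hi = max(0, i - half), min(x.size, i + half + 1)
        out[i] = x[lo:hi].mean()
    return out


def reference_immigration(series_by_province, order=5):
    """Average the provinces' imported series day by day, then smooth."""
    mean_series = np.mean(np.asarray(series_by_province, dtype=float), axis=0)
    return centered_moving_average(mean_series, order)
```

Shrinking the window avoids the artificial drop at both ends that zero-padding (for example `np.convolve` with `mode="same"`) would introduce.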

First of all, the spatial stratified heterogeneity (SSH) among provinces is measured. We compute the so-called q-statistic to test the significance of differences among provinces. The value of q ranges from 0 to 1, where 0 means no association between the number of cases and the province, and 1 means perfect association. The q-statistic is calculated as [20]

$$ q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2}, $$

where N and σ² are the number of units and the variance in the study area, which is composed of L strata, and N_h and σ_h² are the number of units and the variance in stratum h. A larger q indicates stronger spatial heterogeneity in the study area. The q value can be transformed so that it follows a noncentral F-distribution:

$$ F = \frac{N - L}{L - 1} \cdot \frac{q}{1 - q} \sim F(L - 1,\, N - L;\, \delta), $$

with

$$ \delta = \frac{1}{\sigma^2}\left[\sum_{h=1}^{L} \bar{Y}_h^2 - \frac{1}{N}\left(\sum_{h=1}^{L} \sqrt{N_h}\,\bar{Y}_h\right)^2\right], $$

where δ is the noncentrality parameter and Ȳ_h is the mean value in stratum h. The q-statistics and the corresponding p values in Table 6 are then used to test whether the case series differ significantly in variance across strata.

As the p values in Table 6 and the curves in Figure 5 show, the SSH is significant, at a level slightly above 0.05, for all case series except the new imported cases, which form the only nonsignificant series. Figure 5 suggests that the extreme fluctuation in Zhejiang Province is the main cause of this nonsignificance. Overall, the SSH of the time series considered is significant. Despite this heterogeneity, our branching model fits each province well with province-specific parameters. In the following simulations of the isolation parameter α and the social distance parameter β, the immigration rate is fixed at the reference series, namely the order-5 moving average of the mean imported series of the four provinces (the red curve in Figure 5(b)).
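A compact implementation of the q-statistic and its noncentral F test, written directly from the formulas above. It assumes that σ² and σ_h² are population rather than sample variances, with days as units and provinces as strata. It uses `scipy.stats.ncf` for the survival function of the noncentral F-distribution. The function name `q_statistic` is hypothetical.

```python
import numpy as np
from scipy import stats


def q_statistic(values, strata):
    """q-statistic, its F transform, and the noncentral-F p value."""
    y = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    labels = np.unique(strata)
    n, L = y.size, labels.size
    total_var = y.var()                                   # sigma^2 (population variance)
    groups = [y[strata == h] for h in labels]
    sizes = np.array([g.size for g in groups])            # N_h
    means = np.array([g.mean() for g in groups])          # Ybar_h
    within = sum(g.size * g.var() for g in groups)        # sum_h N_h sigma_h^2
    q = 1.0 - within / (n * total_var)
    # Noncentrality parameter delta; clipped at 0 to absorb rounding error.
    delta = (np.sum(means**2) - np.sum(np.sqrt(sizes) * means) ** 2 / n) / total_var
    delta = max(float(delta), 0.0)
    F = (n - L) / (L - 1) * q / (1.0 - q)                 # undefined when q == 1
    p = stats.ncf.sf(F, L - 1, n - L, delta)
    return float(q), float(F), float(p)
```

For two strata {1, 2, 3} and {7, 8, 9}, the total sum of squares is 58 and the within-strata sum is 4. This gives q = 54/58 ≈ 0.93 and F = 4 × 13.5 = 54.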

We now simulate the model over a grid of parameter values. Three values are used for each of the isolation parameter α and the social distance parameter β; as the discussion below indicates, β takes the values 0.4, 1.4, and 4.8, and the largest value of α is 0.7. The results are shown in Figures 6 and 7, each with nine panels, one for each combination of α and β. Figure 6 shows the evolution of local cases (black curves) over time, with the same immigration curve (red) as the reference. Figure 7 breaks the local infectees down into Home, Social, and Secondary. Home and Social refer to infectees of the imported cases who were infected at home and outside the home, respectively; Secondary refers to the infectees of the Home and Social cases. The red, blue, and black curves in Figure 7 show the Home, Social, and Secondary infectees, respectively. Under our assumptions, the branching process evolves for only two generations because of the contact tracing policy. Detailed results for the isolation parameter α and the social distance parameter β are given below.

First, either strict isolation or strict social distancing is effective in preventing spread. The effect of strict isolation can be seen in Figures 6(a)–6(c), where α takes its smallest value. The confirmed local cases grow as β increases but remain at a controllable size. In Figures 7(a)–7(c), the number of Home infectees stays stable, while the Social and Secondary infectees increase slightly as the gathering parameter β increases. Therefore, the most effective measure for preventing spread is staying at home for about two weeks.

Second, when leaving home is necessary, keeping social distance is the second line of defense. Since it is difficult to stay at home for weeks without going out, good personal protection is crucial to avoid infection. The effect of social distance can be seen in Figures 6(a), 6(d), and 6(g), where β takes its smallest value, 0.4. The local cases increase as α increases but remain at a controllable size. In Figures 7(a), 7(d), and 7(g), the numbers of Social and Secondary infectees increase slightly as the probability α of leaving home increases. Hence, as long as social distance is kept large enough, isolation can be relaxed.

Finally, if isolation fails, for example because people have a strong need to leave home, keeping social distance becomes crucial; otherwise, gatherings can have disastrous consequences. As shown in Figures 6(g)–6(i), where α = 0.7 (a 70% probability of going out), the local infectees grow very fast as β increases from 0.4 to 1.4 and then to 4.8. In Figures 7(g)–7(i), the Home infectees remain stable, while the Social and Secondary infectees increase markedly as β increases. Note that our contact tracing assumption completely cuts off the third and later generations. In reality, however, with α = 0.7 and a large β, the number of infectious individuals would be so large that isolating them would be difficult, let alone tracing and isolating their close contacts, given limited medical resources. An outbreak in local communities would then be highly likely.

To sum up, in the face of the COVID-19 pandemic, the least costly and most effective measure is staying at home. This requires the effort of everyone, not just a few individuals, and, more importantly, it must be carried out simultaneously. However, given the trade-off between COVID-19 prevention and economic activity, keeping a proper social distance is more important when people do leave home.

5. Conclusion

Temporal and structural patterns are extracted from the confirmed cases reported outside the epicenter in China. Based on these patterns, a heterogeneous (age-dependent) branching process with immigration is built to mimic the transmission mechanism of COVID-19 in particular local communities. Our model matches the actual data well, supporting its validity. The efficacy of isolation and social distancing is also tested with the branching model. We find that the spreading chain can be cut efficiently under strict isolation, which may be the main reason for the success of COVID-19 prevention in China. However, because of the trade-off between economic considerations and pandemic prevention, keeping a proper social distance is more important when leaving home for social activities. Our findings reveal the effectiveness of isolation in China outside Hubei Province and may shed light on preventing the global spread of COVID-19.

As shown in Figure 4, the branching structure is well suited to modeling the spread of COVID-19. Although we considered only well-prevented local communities, the basic features, such as the serial interval, the composition of infectees, and the immigration structure, can be applied to more general situations to investigate other containment measures.

Data Availability

The data used to support the findings of the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was jointly supported by the National Natural Science Foundation of China (Grant nos. 11775034, 11971074, 11905042, and 61976025) and Fundamental Research Funds for the Central Universities (no. 2019XD-A11).