Abstract
Influenza pandemics are a wide-ranging threat to people’s health and property all over the world. Developing effective strategies for predicting influenza outbreaks, which may prevent or at least help prepare for a new influenza pandemic, is now a top global public health priority. However, because influenza outbreaks involve spatial and temporal characteristics of both biological and social systems, real-time monitoring of influenza outbreaks is a challenging task. In this study, by exploring the rich dynamical information of the city network during influenza outbreaks, we developed a computational method, the minimum-spanning-tree-based dynamical network marker (MST-DNM), to identify the tipping point or critical stage prior to the influenza outbreak. With historical records of influenza outpatients between 2009 and 2018, the MST-DNM strategy has been validated by accurate predictions of the influenza outbreaks in three Japanese cities/regions, i.e., Tokyo, Osaka, and Hokkaido. In these applications, the early-warning signal was detected on average 4 weeks ahead of each influenza outbreak. The results show that our method has considerable potential in the practice of public health surveillance.
1. Introduction
Influenza, a seasonal, contagious, and widespread respiratory illness, has always been a major threat to people’s health. According to the World Health Organization, up to 650,000 deaths annually are associated with respiratory diseases caused by seasonal influenza. In the United States, influenza leads to an average of 610,660 life-years lost and 3.1 million hospitalized days per year [1]. It is estimated that the total economic burden caused by influenza reaches 81.7 billion US dollars each year [2]. Therefore, from both a public health and an economic perspective, it is crucial to detect the early-warning signal of an imminent influenza outbreak so that timely preventive measures can be carried out to prevent a new influenza pandemic or at least reduce the magnitude of influenza outbreaks [3, 4]. However, predicting an influenza outbreak is usually challenging because of the complexity of its temporal and spatial characteristics. First, the records of worldwide influenza pandemics show that each outbreak differed from the others with respect to etiologic agents, epidemiology, and disease severity [5]. Second, most developing countries face a major obstacle in deploying influenza forecasts: a national surveillance system for infectious diseases can be either too costly or inaccurate [6]. Therefore, it is of great interest to develop a cost-effective computational method that predicts influenza outbreaks based only on the available data.
In this study, by exploring the rich dynamical information provided by high-dimensional records of clinic hospitalization data, we developed a practical computational method, i.e., the minimum-spanning-tree-based dynamical network marker (MST-DNM), to quantitatively measure the dynamical change of a city network and thus detect the early-warning signal of an influenza outbreak. The theoretical basis of MST-DNM is our recently proposed concept, the so-called dynamical network marker (DNM) [7], which is a dominant group of variables satisfying three generic properties of impending critical transitions: (1) the correlation between any pair of members in the DNM group rapidly increases; (2) the correlation between one member of the DNM group and any non-DNM member rapidly decreases; (3) the standard deviation or coefficient of variation of any member in the DNM group drastically increases. Unlike traditional biomarkers, the DNM method aims to detect the early-warning signal of the critical state before the occurrence of a catastrophic event by mining the critical information from high-dimensional time series data [7, 8]. The DNM method has been applied to real-world datasets and has successfully identified the critical states of a number of biological processes, such as the critical state of cell differentiation [9], the tipping point during the cell fate decision process [10], the critical transition in the immune checkpoint blockade-responsive tumor [11], the multistage deteriorations of T2D [12], acute lung injury [13], HCV-induced liver cancer [14], cancer metastasis [15], and other complex diseases [15–19]. However, to accurately predict the influenza outbreak, a new computational method is required to explore and measure the criticality from a network perspective by considering the geographic information of a city.
The MST-DNM is a novel network-based computational method that uses the minimum spanning tree to accurately detect the early-warning signal of an influenza outbreak. The spread of infectious diseases in a region is described as the dynamical evolution of a nonlinear system, while the influenza outbreak is regarded as a qualitative state transition of that system. Without loss of generality, an influenza outbreak has three states (Figure 1): a normal state with high stability and robustness to disturbances, standing for the period with few clinic visits; a pre-outbreak state (critical state) with low resilience and high susceptibility to state transition, representing the critical stage just before the emergence of massive clinic visits; and an outbreak state with high stability and robustness, which is an irreversible state or severe flu pandemic with massive clinic visits. Clearly, identifying the pre-outbreak state is crucial in influenza control, since timely management may greatly reduce the magnitude and duration of the influenza outbreak. Specifically, by combining the geographic adjacency, transportation, population, and number of clinics of each city district, we constructed a city network whose edge weights were assigned as the correlation between the clinic visit numbers of two adjacent districts. By analyzing the dynamical transmission of influenza in the city network, the proposed MST-DNM can accurately identify the pre-outbreak state and thus give early warning of influenza outbreaks or potential pandemics. In this work, the MST-DNM method was employed to probe useful dynamical information in a city network, modeled on geographic location and traffic conditions, from high-dimensional clinic-visiting data of influenza collected from 175 clinics distributed in 23 wards of Tokyo, Japan, 139 clinics distributed in 30 cities of Hokkaido, Japan, and 197 clinics distributed in 11 wards of Osaka, Japan.
Clearly, such real-time data could be much more readily available for a large-scale surveillance system. The results indicate that the MST-DNM method is capable of monitoring the infection process of the flu in real time and of identifying the warning signal in time, before the outbreak of influenza. Moreover, by analyzing the dynamic changes of the minimum spanning tree in a city network, it provides a new approach to studying the epidemic spread in a city. Therefore, this method has great potential for setting up a real-time surveillance system, which could greatly benefit preventive care and the implementation of interventions against an epidemic.
2. Materials and Methods
2.1. Theoretical Background
The spread and outbreak of influenza is a complex dynamic process of a nonlinear system. According to the DNM theory, when a complex system approaches a tipping point or critical transition point, there is a dominant group, i.e., the DNM, which satisfies the following three essential properties [7]:
(i) The correlation between each pair of members in the DNM group dramatically increases
(ii) The correlation between a member of the DNM group and a non-DNM member rapidly decreases
(iii) The standard deviation of each member in the DNM group drastically increases
Roughly speaking, these properties mean that the emergence of a DNM group with strong fluctuations and high internal correlation signals an upcoming critical transition. Thus, these properties can be used as three criteria to identify the critical state of a complex biological system.
Based on the DNM theory, we developed the MST-DNM method, which combines the DNM with the minimum spanning tree of a city network, in order to accurately detect the early-warning signal of an influenza outbreak. According to our method, the evolution of a flu outbreak can be modeled as three distinct stages or states (Figure 1): (i) the normal stage, which is a stable state with high resilience; (ii) the pre-outbreak stage, which is an unstable critical state with low resilience; this critical state is the limit of the normal state and is at the edge of the transition into an epidemic outbreak of influenza; and (iii) the outbreak stage, which is a steady and irreversible stage with a large number of clinic visits caused by influenza. Once the system is in the outbreak stage, it brings heavy economic burdens to people and society and strongly impacts the existing social health security system. Consequently, it is crucial to identify the warning signal of the pre-outbreak state so that effective measures can be taken to protect people and society from a catastrophic flu outbreak.
2.2. Algorithm
A sketch of the MST-DNM method is presented in Figure 2. The MST-DNM method is applied to a city network to monitor the spread and outbreak of influenza in that city. Therefore, the first step of our method is to model a city network by combining the geographic adjacency, transportation, population, and number of clinics of each city district. Then, a weight is assigned to each edge of the city network, namely, the correlation between the numbers of clinic visits of the two adjacent districts. Our method is implemented on this weighted city network. Specifically, in order to detect the critical state of an influenza outbreak, the procedure of the MST-DNM method consists of the following detailed steps. Its pseudocode is given in Algorithm 1.

2.2.1. Modeling a City Network Structure
A city network is modeled based on the geographic locations of its administrative divisions and their adjacency. As demonstrated in Figure 2, for example, there are 23 districts in Tokyo, so 23 nodes are added to the Tokyo city network. The edges between nodes in the network are then established based on the adjacency relations of the corresponding districts.
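The construction above can be sketched with a plain adjacency dictionary. This is only an illustration: the five districts "A" to "E" and their neighbour lists are made up, and `build_edges` is a hypothetical helper name, not part of the original method description.

```python
# Hypothetical adjacency of a toy five-district city (not real Tokyo data).
ADJACENT = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}


def build_edges(adjacent):
    """Return the sorted undirected edges (u, v), each listed once with u < v."""
    edges = set()
    for u, neighbours in adjacent.items():
        for v in neighbours:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


nodes = sorted(ADJACENT)       # one node per district
edges = build_edges(ADJACENT)  # one edge per pair of adjacent districts
print(len(nodes), "nodes;", len(edges), "edges")
```

Storing each undirected edge once keeps the later per-edge weight computation free of duplicates.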
2.2.2. Data Preprocessing
Because the number of visits differs enormously between clinics, the weekly raw data of each district are averaged over the total number of clinics within that district. The processed data are then mapped to the city network.
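A minimal sketch of this averaging step is shown below. The input layout (a dictionary of weekly district totals and a dictionary of clinic counts) and the name `average_per_clinic` are assumptions made for illustration; the numbers are invented.

```python
def average_per_clinic(weekly_visits, clinic_counts):
    """Divide each district's weekly visit totals by its number of clinics.

    weekly_visits: {district: [total visits in week 1, week 2, ...]}
    clinic_counts: {district: number of reporting clinics}
    """
    averaged = {}
    for district, series in weekly_visits.items():
        n_clinics = clinic_counts[district]
        if n_clinics <= 0:
            raise ValueError(f"district {district!r} has no clinics")
        averaged[district] = [visits / n_clinics for visits in series]
    return averaged


# Invented totals for two districts over three weeks.
raw = {"A": [10, 20, 40], "B": [3, 6, 9]}
clinics = {"A": 10, "B": 3}
per_clinic = average_per_clinic(raw, clinics)
print(per_clinic)
```

After this step the two districts are on a comparable scale even though district "A" reports from more clinics.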
2.2.3. Implementation
The city network can be represented as a graph $G = (V, E)$, where $V$ is the set of vertices in this network and $E$ is the set of edges. The procedure is as follows.
First, we take the weekly number of clinic visits in a district as a sample $s$, which yields a time series for each district. In other words, when the city network is at week $t$, each vertex $v_i$ has a sequence of clinic-visiting data $\{s_i^1, s_i^2, \ldots, s_i^t\}$.
Second, for each edge $e$ of the city network at week $t$, we calculate the correlation between the two vertices of this edge and assign the edge a weight $W_t(e)$:
$$W_t(e) = \delta_t \left| PCC_t(e) - PCC_{t-1}(e) \right|,$$
where $PCC_t(e)$ represents the Pearson correlation coefficient (PCC) between the two vertices at week $t$ and $PCC_{t-1}(e)$ represents the Pearson correlation coefficient between the two vertices at week $t-1$, and the parameter $\delta_t$ is of the following form:
$$\delta_t = \left| SD_t(e) - SD_{t-1}(e) \right|,$$
where $SD_t(e)$ represents the standard deviation (SD) of all sample data of the two vertices of this edge at week $t$ and $SD_{t-1}(e)$ represents the standard deviation of all sample data of the two vertices of this edge at week $t-1$. After this step, we obtain a set of weighted differential networks $\{N_1, N_2, \ldots, N_t\}$.
Third, when the city network is at week $t$, its minimum spanning tree is computed in order to better describe how the network evolves as the number of visits changes. In this study, Kruskal’s algorithm is applied to the time-specific weighted differential network $N_t$ (i.e., the network generated for that particular time point) to obtain its minimum spanning tree $MST_t$. The detailed flow of Kruskal’s algorithm is presented in Algorithm 2. Then, we calculate the weight sum of this minimum spanning tree as the MST-DNM score:
$$I_t = \sum_{i=1}^{n} W_t(e_i),$$
where $W_t(e_i)$ represents the weight of edge $e_i$ in $MST_t$ and $n$ represents the total number of edges of $MST_t$.
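Before the spanning tree can be built, every edge needs its differential weight. The sketch below is one possible reading of the second step, under several assumptions that are not stated in the text: the weight is taken as the product of the absolute week-to-week change in PCC and the absolute week-to-week change in SD, both statistics are computed over a sliding window of `window` weeks, the SD is the population SD of the pooled samples of both endpoints, and the PCC of a constant window is set to 0. The names `pearson`, `edge_weight`, and `window` are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumed convention for an undefined correlation
    return sxy / math.sqrt(sxx * syy)


def edge_weight(series_u, series_v, t, window=5):
    """Differential weight of edge (u, v) at week index t (0-based, t >= window)."""
    if t < window:
        raise ValueError("need at least window + 1 weeks of data")
    cur_u, cur_v = series_u[t - window + 1:t + 1], series_v[t - window + 1:t + 1]
    prev_u, prev_v = series_u[t - window:t], series_v[t - window:t]
    d_pcc = abs(pearson(cur_u, cur_v) - pearson(prev_u, prev_v))
    d_sd = abs(statistics.pstdev(cur_u + cur_v) - statistics.pstdev(prev_u + prev_v))
    return d_sd * d_pcc


# Invented per-clinic visit series for two adjacent districts.
u = [1, 1, 1, 1, 1, 1, 2, 4, 8]
v = [1, 1, 1, 1, 1, 1, 2, 5, 9]
print(edge_weight(u, v, 5), edge_weight(u, v, 6))
```

In this toy series the weight is zero while both districts are flat and becomes positive in the first week in which both start to rise together.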

In the ideal case, when the network system approaches a tipping point, one of the following two situations holds for the relationship between nodes in the network:
(i) The nodes in the city network are all DNM members. The standard deviation of these members and the Pearson correlation coefficient between these members both dramatically increase
(ii) There are DNM and non-DNM members in the city network. The standard deviation of the DNM members dramatically increases, but the Pearson correlation coefficient between DNM members and non-DNM members decreases significantly, i.e., its absolute change is large
The MST-DNM score of the city network is based on the standard deviations of these DNM members and their Pearson correlation coefficients. It can therefore be employed as an index for quantitatively analyzing significant changes of the city network and thus for detecting the warning signal of the critical point.
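The third step, Kruskal’s algorithm followed by summing the edge weights of the resulting tree, can be sketched as follows. The five-node graph and its weights are invented for illustration, and `kruskal_mst` and `mst_score` are hypothetical names.

```python
def kruskal_mst(nodes, weighted_edges):
    """Return the minimum-spanning-tree edges of a connected undirected graph.

    weighted_edges: iterable of (weight, u, v) tuples.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the component root, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(weighted_edges):  # lightest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # edge joins two components
            parent[root_u] = root_v
            tree.append((w, u, v))
    return tree


def mst_score(nodes, weighted_edges):
    """Weight sum of the minimum spanning tree (the score for one week)."""
    return sum(w for w, _, _ in kruskal_mst(nodes, weighted_edges))


# Invented weights for one week on a toy five-district network.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [
    (0.4, "A", "B"), (0.1, "A", "C"), (0.3, "B", "C"),
    (0.2, "B", "D"), (0.5, "C", "E"), (0.6, "D", "E"),
]
tree = kruskal_mst(toy_nodes, toy_edges)
score = mst_score(toy_nodes, toy_edges)
print(tree, score)
```

Running this once per week on the weighted differential network of that week gives the score series that is later passed to the classifier.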
2.2.4. Identifying the Critical State
After the above procedure, it is possible to quantitatively analyze and monitor the dynamical process of influenza spreading based on the indicator $I_t$. Nevertheless, confirming the tipping point remains difficult. In some previous studies, fold-change thresholds were used to detect the warning signal [20, 21]. However, such an empirical or tunable threshold does not transfer across different data or network structures. In this study, logistic regression, which is widely employed in the biological field [22], is applied to determine the appearance of the tipping point, because its threshold is determined by the data itself. Given the sufficient training data (several years of clinic-visiting records), a learning-based approach is a natural choice.
Logistic regression, which is essentially a linear model passed through the sigmoid function, is used for datasets with binary outcomes, i.e., for solving two-class (0 or 1) problems. Assume a dataset with $m$ samples and $n$ features, where each sample has a binary label. Then, we have a sample matrix $X = \{x_1, x_2, \ldots, x_m\}$, where $x_i$ is a column vector with $n$ features, and the corresponding labels $Y = \{y_1, y_2, \ldots, y_m\}$, where $y_i$ represents a binary label (0 or 1). Usually, an extra item $x_{i0} = 1$ is added to $x_i$ as a bias; therefore, each $x_i$ is represented by $(1, x_{i1}, \ldots, x_{in})^T$. Then, the sigmoid function is applied to calculate the probability of $x_i$ belonging to class 1:
$$P(y_i = 1 \mid x_i; \theta) = \frac{1}{1 + e^{-\theta^T x_i}}.$$
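According to the above form, the key to the logistic regression model is to train a suitable parameter $\theta$ based on the given samples $X$ and labels $Y$. Therefore, the following loss function based on the negative log-likelihood is minimized to obtain a suitable $\theta$:
$$J(\theta) = -\sum_{i=1}^{m} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right],$$
where $h_\theta(x_i) = 1 / (1 + e^{-\theta^T x_i})$ is the sigmoid probability of $x_i$ belonging to class 1.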
In order to prevent our model from overfitting, a norm penalty on $\theta$ was added to the loss function. Since this loss function has no closed-form solution, we used coordinate descent to minimize it with respect to $\theta$.
In this study, we used the MST-DNM score of each week as $x_i$ and the corresponding state as label $y_i$, where 1 represents the critical state and 0 represents all other states. For a given year, the logistic regression model is trained on the datasets of the other years, and we then test whether week $t$ of that year is the tipping point. If the probability of $x_t$ belonging to class 1, i.e., $P(y_t = 1 \mid x_t; \theta)$, is greater than 0.5, this week is considered to be in the critical state. Otherwise, this week is classified as the normal state. Then, week $t+1$ is selected as the new test point, and the procedure continues.
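The leave-one-year-out decision rule described above can be sketched in a few lines. Note the substitutions: for brevity this sketch fits the model by plain batch gradient descent with an L2 penalty on the slope, not by the coordinate descent used in the study, and the yearly scores and labels are invented. The names `fit_logistic` and `leave_one_year_out` are hypothetical; only the 0.5 probability cut-off comes from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000, l2=0.01):
    """Fit p(y=1|x) = sigmoid(b + w*x) by batch gradient descent (one feature)."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b + w * x) - y   # derivative of the log-loss
            grad_w += err * x
            grad_b += err
        w -= lr * (grad_w / m + l2 * w)    # penalised slope
        b -= lr * (grad_b / m)             # unpenalised bias
    return w, b


def leave_one_year_out(scores_by_year, labels_by_year, test_year):
    """Train on all other years, then flag each week of test_year as critical or not."""
    train_x, train_y = [], []
    for year, scores in scores_by_year.items():
        if year != test_year:
            train_x += scores
            train_y += labels_by_year[year]
    w, b = fit_logistic(train_x, train_y)
    return [sigmoid(b + w * x) > 0.5 for x in scores_by_year[test_year]]


# Invented weekly scores and labels (1 = critical state, 0 = other).
scores_by_year = {
    2010: [0.1, 0.2, 0.3, 2.0, 0.2],
    2011: [0.2, 0.1, 1.8, 2.2, 0.3],
    2012: [0.1, 0.3, 0.2, 1.9, 0.1],
}
labels_by_year = {
    2010: [0, 0, 0, 1, 0],
    2011: [0, 0, 1, 1, 0],
    2012: [0, 0, 0, 1, 0],
}
flags_2012 = leave_one_year_out(scores_by_year, labels_by_year, 2012)
print(flags_2012)
```

Because the cut-off is learned from the other years, no fold-change threshold has to be tuned by hand for each city.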
3. Results and Discussion
3.1. Predicting the Outbreak of Seasonal Influenza in Tokyo
It is usually too complicated to express the influenza transmission kinetics before a sudden outbreak mathematically, because the spread of influenza involves a very large number of parameters from both biological and social systems. According to dynamical systems theory, there exists a so-called bifurcation point at which a network undergoes dramatic fluctuations or a qualitative transformation away from its normal status [19, 23]. This means that, while approaching the bifurcation point, the state transition of a dynamical system is gradually restricted to a one- or two-dimensional space, so that the system can be simply expressed and understood [7]. According to this theory, it is feasible to develop a general method that detects the tipping point of an influenza outbreak based only on the observed data.
As shown in Figure 1, we collected the historical clinic-visiting data caused by influenza from clinics in 23 districts of Tokyo, Japan, from January 1, 2009, to May 31, 2019. The week in which the number of total clinic visits reaches its peak in each year is regarded as the outbreak point of flu. According to the proposed MST-DNM method, the following procedure was carried out to identify the critical state of flu outbreak in Tokyo. First, we modeled a 23-node network according to the geographic location of the 23 wards and their adjacency. Second, we mapped the clinic-visiting numbers onto the corresponding nodes, assigned weights to the edges (i.e., the correlations between two adjacent nodes; the detailed calculation is given in Materials and Methods), and calculated the weight sum of the minimum spanning tree of this network for each week. Finally, a data matrix of MST-DNM scores was obtained, which was used to train a logistic regression model through leave-one-out cross-validation and then to detect the tipping point of influenza for each year.
As presented in Figure 3, the early-warning signals of the seasonal influenza outbreak were detected by our MST-DNM method. The flu outbreak of each year is quite regular except for 2009. The worldwide large-scale outbreak of influenza A (H1N1) in 2009, which was first reported in Mexico, led to a massive long-term outbreak of influenza in Tokyo. The peak of the MST-DNM score $I_t$ clearly appears earlier than the peak of the clinic-visiting counts, by 4 weeks on average. Therefore, before the outbreak of influenza, our MST-DNM score is quite sensitive and increases drastically, which indicates the appearance of the critical state of the influenza outbreak.
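The lead time quoted above is the gap between two peaks, which can be computed directly. The sketch below uses invented weekly series for a single season, and `lead_weeks` is a hypothetical helper; it simply subtracts the index of the score peak from the index of the visit peak.

```python
def lead_weeks(scores, visits):
    """Weeks by which the score peak precedes the clinic-visit peak."""
    score_peak = max(range(len(scores)), key=scores.__getitem__)
    visit_peak = max(range(len(visits)), key=visits.__getitem__)
    return visit_peak - score_peak


# Invented season: the score spikes in week 2, visits peak in week 6.
season_scores = [0.1, 0.2, 1.5, 0.4, 0.3, 0.2, 0.1, 0.1]
season_visits = [1, 2, 4, 9, 20, 35, 50, 30]
print(lead_weeks(season_scores, season_visits))
```

Averaging this quantity over the seasons of a city gives a single lead-time figure; a negative value would mean that the score peaked after the visits did.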
In order to better demonstrate the dynamical process of the influenza spread at the network level, the evolution of the minimum spanning tree of the city network can also be presented. As shown in Figure 4, at the beginning there are almost no influenza cases at any node/ward, and the correlations between adjacent nodes/wards are relatively low. When the correlations between adjacent nodes/wards in the city network drastically increase, which is a necessary condition of the DNM features, the influenza spread in this city is close to its outbreak point. Furthermore, the edges of the minimum spanning tree become thicker before the nodes turn red in week 54, which means that the early-warning signals of our method appear before the flu outbreak point. The dynamical evolution of the minimum spanning tree of the city network illustrates that a system based on the MST-DNM method is able to monitor the whole process of an influenza outbreak in real time and to issue an early-warning signal in time.
3.2. Application of MST-DNM in Osaka and Hokkaido
In order to illustrate the generality of our MST-DNM method, we also applied it to detect the early-warning signals of flu outbreak in Hokkaido and Osaka. Following the same processing flow as for Tokyo, a 30-node city network was modeled for the Hokkaido region and an 11-node city network for Osaka city. Then, we mapped the clinic-visiting data onto the corresponding network and calculated the minimum spanning tree. Finally, a logistic regression model trained on the MST-DNM scores was applied to detect the tipping point of influenza for each year.
As shown in Figures S1 and S2 of Supplementary Information (see Supplemental File), the critical state of the influenza outbreak was successfully detected by our MST-DNM method in Hokkaido between 2011 and 2015 and in Osaka between 2012 and 2017, respectively. In other words, the MST-DNM method is quite general and robust regardless of the scale of the city network. The dynamic evolutions of the minimum spanning tree of the Hokkaido city network and the Osaka city network are shown in Figures S3 and S4, respectively.
3.3. The Key Role of the Minimum Spanning Tree
In order to demonstrate the key role of the minimum spanning tree in our approach, we compared the results of the method with and without the minimum spanning tree for 2010, as presented in Figure 5(a). The early-warning signal detected by a DNM method without the minimum spanning tree is far from the influenza outbreak point, whereas the signal obtained with the minimum spanning tree appears at an appropriate time point.
The minimum spanning tree of an undirected, edge-weighted network is the tree that connects all nodes of the original network while minimizing the sum of its edge weights. It reflects the overall changes of the network structure and can avoid the impact of local abnormal correlations, such as those around node 7 in week 45, which indicates that the minimum spanning tree plays a key role in the prediction of outbreak points.
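The damping effect described above can be illustrated on a toy graph. In the sketch below, two of the three edges at node 4 carry abnormally large weights; the sum over all edges is dominated by them, while the spanning tree reaches node 4 through its one ordinary edge. The graph, its weights, and the name `mst_weight` are invented for illustration, and the plain sum over all edges is only a stand-in for a score computed without the tree.

```python
def mst_weight(nodes, weighted_edges):
    """Weight sum of the minimum spanning tree (Kruskal with union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    total = 0.0
    for w, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:       # keep only edges that join two components
            parent[root_u] = root_v
            total += w
    return total


# Toy network: edges (1, 4) and (2, 4) carry a local anomaly.
anomaly_nodes = [1, 2, 3, 4]
anomaly_edges = [(0.1, 1, 2), (0.1, 2, 3), (0.1, 3, 4), (5.0, 1, 4), (4.0, 2, 4)]
full_sum = sum(w for w, _, _ in anomaly_edges)
tree_sum = mst_weight(anomaly_nodes, anomaly_edges)
print(full_sum, tree_sum)
```

The tree can skip a heavy edge only when a lighter path to the same node exists, so an anomaly that affects every edge of a node would still show up in the score.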
3.4. Performance Comparison with Other Methods
In previous work, we developed a network-based approach for predicting influenza outbreaks, the so-called landscape dynamic network marker, which used an empirical fold-change threshold to recognize significant changes in the DNM score and obtain the early-warning signal. We compared the performance of the proposed MST-DNM method under two strategies for determining the tipping point, namely, a threshold determined by logistic regression and an empirical threshold; the comparison is presented in Figure 6. The MST-DNM method based on logistic regression clearly performs better than the one based on the fold-change threshold. Logistic regression has a natural advantage over traditional methods of determining the early-warning signal: given appropriate training, it is more general and more robust.
4. Conclusions
Japan suffered a serious influenza outbreak at the beginning of 2019. According to the reports of about 5000 designated medical institutions across Japan, there was an average of 57.09 influenza patients per institution in the week from January 21 to 27, the highest level since statistics were first collected in 1999. An influenza epidemic causes school suspensions and the absence of a large number of workers, which further results in a decline in social productivity and affects economic development. It is estimated that the direct economic losses caused by the 2009 influenza pandemic amounted to about 0.5% to 1.5% of countries’ gross domestic product (GDP) [24]. However, the actual losses may be higher, because indirect economic losses caused by infection prevention and control measures, such as the decline of tourism, are underestimated. Therefore, in order to better prevent influenza outbreaks, it is essential to establish a real-time monitoring system based only on available and robust data, such as the number of clinic visits issued by the relevant health department.
Based on the DNM theory, which in our previous works was applied to genomic data to detect the tipping point or analyze the critical transition of complex diseases, and combined with the minimum spanning tree and logistic regression, a novel computational method called MST-DNM was developed to identify the early-warning signal of influenza outbreak in Tokyo, Osaka, and Hokkaido, Japan. In our MST-DNM method, we first extract the crucial characteristics of the pre-outbreak state of influenza from high-dimensional and longitudinal clinic-visiting counts using the DNM and the minimum spanning tree. Then, logistic regression trained by leave-one-out cross-validation is applied to identify the pre-outbreak state and issue an early-warning signal based on these characteristics. As shown in Figures 3 and 4, the MST-DNM method can detect the early-warning signal of an influenza outbreak in time, which makes it feasible to construct a real-time and effective influenza surveillance system. Nevertheless, there are still ways to improve the performance of our algorithm, such as calculating the Pearson correlation coefficient and standard deviation from other data that are robust but hard to obtain, for example, population movement between wards and flu epidemic reports. This is one of our future topics.
Data Availability
The historical raw data is available from Tokyo Metropolitan Infectious Disease Surveillance Center (link: https://survey.tokyoeiken.go.jp/epidinfo/weeklyhc.do), Hokkaido Infectious Disease (link: http://www.iph.pref.hokkaido.jp/kansen/501/data.html), and Osaka Infectious Disease (link: http://www.iph.pref.osaka.jp/infection/2old.html), respectively.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
The authors are grateful to Professor Yongjun Li for the valuable discussion. The work was supported by the National Natural Science Foundation of China (Nos. 11771152, 11901203, and 11971176), the Guangdong Basic and Applied Basic Research Foundation (2019B151502062), the China Postdoctoral Science Foundation funded project (Nos. 2019M662895 and 2020T130212), and the Fundamental Research Funds for the Central Universities (2019MS111).
Supplementary Materials
Figure S1: the predictions of annual influenza outbreak in Hokkaido city between 2011 and 2015. Figure S2: the predictions of annual influenza outbreak in Osaka city between 2012 and 2017. Figure S3: the dynamic evolution of the minimum spanning tree of the city network in Hokkaido during years 2014-2015. Figure S4: the dynamic evolution of the minimum spanning tree of the city network in Osaka during years 2017-2018.