Abstract

Objective. This study aimed to investigate the epidemiological characteristics and epidemic situation of brucellosis in Jinzhou City, China, and to establish a suitable prediction model that could serve as a decision-support tool for allocating health interventions and health services. Methods. Monthly morbidity data from 2004 to 2013 were used to construct an autoregressive integrated moving average (ARIMA) model with SPSS 13.0 software. Stationarity analysis and sequence stabilization, model identification, parameter testing, and model diagnosis were carried out. Finally, the fitting and prediction accuracy of the ARIMA model were evaluated using the monthly morbidity data in 2014. Results. A total of 3078 brucellosis cases were reported from January 1998 to December 2015 in Jinzhou City, and the incidence showed a fluctuating upward trend. The ARIMA(1,1,1)(0,1,1)12 model was selected from several plausible ARIMA models on the basis of the parameter test, correlation analysis, and Box–Ljung test. The incidence from 2005 to 2014 predicted by this model fitted the actual incidence data well. The actual morbidity in 2014 fell within the 95% confidence limits of the values predicted by the ARIMA(1,1,1)(0,1,1)12 model, with the absolute error between the predicted and actual values in 2014 ranging from 0.02 to 0.74, and the mean absolute percentage error (MAPE) was 19.83%. Conclusion. The ARIMA(1,1,1)(0,1,1)12 model is suitable for predicting the incidence of brucellosis in Jinzhou City, China.

1. Introduction

Brucellosis is an infectious and allergic zoonosis caused by Brucella. According to the Law of the People’s Republic of China on the Prevention and Treatment of Infectious Diseases, brucellosis is a natural-focus infectious disease listed as a class B infectious disease. It is also listed as a class B animal epidemic disease by the International Office of Epizootics (OIE). The main symptoms of patients infected with Brucella include fever, headache, fatigue, excessive sweating, neuralgia, and bone, joint, and muscle pain [1]. The incubation period varies, ranging from 1 to 3 weeks. Generally, human beings are not a source of infection, but they are susceptible to most species of the genus Brucella [2,3]. In China, sheep are the main source of Brucella infection.

Brucellosis occurs all over the world, especially in developing regions such as Asia, Central and South America, and the Mediterranean region [3–5], but its regulation started late both in China and abroad. To keep abreast of the epidemic situation of brucellosis, China has set up monitoring stations in 14 provinces (districts) since 1990. Liaoning Province ranks fifth in China in terms of the brucellosis epidemic situation, with 14 cities affected to varying degrees. Moreover, most cases are caused by Brucella melitensis [1].

In recent years, the incidence of brucellosis in Jinzhou City has been rising; notably, it ranked first in Liaoning Province from 2005 to 2008. One purpose of surveillance is prediction, yet existing studies have only monitored the disease without forecasting it. Research in this field is therefore almost absent in Jinzhou, and it is difficult for doctors to grasp the incidence trend. Analyzing the brucellosis surveillance data with mathematical models and predicting the epidemic dynamics is thus of great significance for understanding the epidemic trend of brucellosis in Jinzhou City. In addition, an actual value that clearly exceeds the 95% confidence limit of the predicted values warns of a possible outbreak of infectious disease [6]. Targeted measures for disease prevention and control can then be taken on the basis of the predicted data, so that the epidemic can be controlled efficiently.

ARIMA is often adopted for the short-term prediction of infectious diseases because it accommodates complicated factors, has a simple model structure, and is easy to operate, economical, and practical. ARIMA is applicable to both stationary and nonstationary time series and has therefore been widely applied to predict seasonal and periodic infections. For instance, Lee predicted the morbidity of human and bovine brucellosis in South Korea using time-series analysis [7]. In China, Bai [8] and Yang [9] predicted the epidemic situation of human brucellosis in Shanxi Province and Shandong Province, respectively. Nevertheless, few reports are available so far regarding human brucellosis in other districts of China.

In the current study, the incidence characteristics of brucellosis over the past two decades were described. For the first time, an ARIMA model was established to predict the incidence trend of brucellosis in Jinzhou. Hopefully, the current study can describe the recent epidemic characteristics of brucellosis in Jinzhou City of Northeast China, establish a scientific basis for prevention and treatment strategies, and offer clues for the management and early warning of the disease.

2. Methods

2.1. Data Sources

All data of human brucellosis were collected from the China Information System for Disease Control and Prevention. All cases had been laboratory confirmed.

2.2. Description of Epidemic Characteristics

The incidence data of brucellosis from 1998 to 2015 were classified into annual and monthly cases for subsequent analyses. Later, the monthly incidence data, which served as the basic data for the time-series model, were calculated according to the total population in the region. Moreover, statistical charts were used to describe the incidence trend and epidemic characteristics.

2.3. ARIMA Model

The monthly morbidity data from 2004 to 2013 were used to establish the ARIMA model, which was then used to predict brucellosis morbidity from 2005 to 2014 in order to assess its stability and applicability.

The autoregressive integrated moving average (ARIMA) model is a time-series model in which the orders of the autoregressive and moving average parts, “p” and “q,” are denoted by AR(p) and MA(q), respectively. The time series in the underlying ARMA model should be a stationary stochastic sequence with zero mean. A nonstationary sequence is therefore converted into a stationary series by differencing, which turns the ARMA model into the ARIMA model. Specifically, if “d” indicates the difference order, the model is written as ARIMA(p, d, q) when there is no seasonal component, ARIMA(sp, sd, sq) when there are only seasonal components, and ARIMA(p, d, q)(sp, sd, sq) for the combined model. The combined model, which suits a general sequence, is the most advantageous of these. Consequently, the orders p, d, q, sp, sd, and sq must be determined to construct the ARIMA model. In general, the model is built in successive steps: stationarity checking, identification, estimation, diagnosis, and forecasting.
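For reference, the combined seasonal model described above can be written compactly with the backshift operator. The notation below is the common Box–Jenkins form and is an assumption here, since the original text does not state the equation: B is the backshift operator (B y_t = y_{t-1}), s is the seasonal period (12 for monthly data), y_t is the (possibly transformed) series, and ε_t is white noise. Sign conventions for the moving average terms differ between software packages.

```latex
\phi_{p}(B)\,\Phi_{sp}(B^{s})\,(1-B)^{d}\,(1-B^{s})^{sd}\,y_t
  = \theta_{q}(B)\,\Theta_{sq}(B^{s})\,\varepsilon_t ,
\qquad
\begin{aligned}
\phi_{p}(B)      &= 1-\phi_1 B-\dots-\phi_p B^{p}, &
\theta_{q}(B)    &= 1-\theta_1 B-\dots-\theta_q B^{q},\\
\Phi_{sp}(B^{s}) &= 1-\Phi_1 B^{s}-\dots-\Phi_{sp} B^{s\cdot sp}, &
\Theta_{sq}(B^{s}) &= 1-\Theta_1 B^{s}-\dots-\Theta_{sq} B^{s\cdot sq}.
\end{aligned}
```

Under this convention, an ARIMA(1,1,1)(0,1,1)12 model would read (1 − φ₁B)(1 − B)(1 − B¹²) y_t = (1 − θ₁B)(1 − Θ₁B¹²) ε_t.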

In this study, the monthly incidence data of brucellosis from 2004 to 2013 were selected for modeling because of their completeness and relatively stable trend. Subsequently, the morbidity of brucellosis from 2005 to 2014 was predicted, and the predictive accuracy of the ARIMA model was finally estimated using the monthly morbidity data in 2014.

2.3.1. Sequence Stationarity

The time sequence (monthly incidence data of brucellosis from 2004 to 2013) was found to be nonstationary. Therefore, a square-root transformation, one regular (nonseasonal) difference, and one seasonal difference were applied successively to transform this nonstationary sequence into a stationary one. The original and transformed sequence charts were then used to evaluate stationarity and trend. Moreover, stationarity was tested with the augmented Dickey–Fuller (ADF) test using EViews 6.0 software.
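The three transformations named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the SPSS procedure used in the study; the function names and the toy series are hypothetical, and the ADF test itself is not reproduced here.

```python
import math

def sqrt_transform(series):
    """Square-root transform, used to even out variance that grows with the level."""
    return [math.sqrt(x) for x in series]

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def make_stationary(series, season=12):
    """Square-root transform, then one regular and one seasonal difference."""
    transformed = sqrt_transform(series)
    regular = difference(transformed, lag=1)
    return difference(regular, lag=season)

# Hypothetical monthly incidence values (NOT the study data):
# a repeating 12-month pattern on top of a slow upward trend.
pattern = [0.2, 0.3, 0.6, 0.9, 1.2, 1.0, 0.8, 0.6, 0.4, 0.3, 0.2, 0.2]
toy_series = [pattern[m % 12] + 0.01 * m for m in range(120)]

stationary = make_stationary(toy_series)
print(len(toy_series), len(stationary))
```

Each difference shortens the series by its lag, so 120 monthly values leave 107 values after one regular and one seasonal (lag 12) difference.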

2.3.2. Identification

First, the randomness, stationarity, and seasonal characteristics of the time sequence were assessed by inspecting the autocorrelation function (ACF) and partial autocorrelation function (PACF). The orders of the model, which rarely exceed 2, were then searched over the range 0 to 2 in accordance with the Akaike information criterion (AIC) and Bayesian information criterion (BIC). Several candidate models were obtained from different combinations of 0, 1, and 2, and the model with the minimum AIC and BIC was finally selected as optimal.
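The selection rule can be illustrated with the textbook definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The sketch below is an assumption-laden illustration: the model labels, log-likelihoods, and parameter counts are invented placeholders, not results from the study, and statistical packages may report these criteria with different constants.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted models: label -> (log-likelihood, number of parameters).
# These numbers are placeholders for illustration only.
candidates = {
    "model_A": (-20.0, 3),
    "model_B": (-24.0, 3),
    "model_C": (-19.5, 5),
}
n_obs = 107  # length of a 120-month series after one regular and one seasonal difference

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Both criteria penalize extra parameters, BIC more strongly, so a model with a slightly higher likelihood but more parameters (model_C here) can still lose to a simpler one.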

2.3.3. Estimation and Diagnosis

The appropriateness of the candidate model was diagnosed using the error sequence “et,” where “et” denotes the residual error, that is, the difference between the actual and predicted morbidity. For an appropriate model, the residual error should be white noise, which was tested using the Box–Ljung test. In other words, the residuals must be random, with no statistically significant residual autocorrelation. According to the principle of uncorrelated residuals [10], the model was suitable for forecasting if its residual series was white noise; otherwise, the model should be improved and identified again [11,12].
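As an illustration of the white-noise check, the Box–Ljung statistic is Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1..h, where r_k is the lag-k autocorrelation of the residuals. The sketch below is a plain-Python illustration rather than the SPSS output used in the study; the function names and the example residuals are hypothetical. The value 3.841 is the 95% chi-square critical value for 1 degree of freedom; for residuals of a fitted model the degrees of freedom are usually reduced by the number of estimated ARMA parameters.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(residuals, max_lag):
    """Box-Ljung Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

CHI2_95_DF1 = 3.841  # 95% critical value of the chi-square distribution, 1 d.f.

# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated
# and therefore clearly NOT white noise.
alternating = [1, -1] * 10
q = ljung_box_q(alternating, max_lag=1)
print(round(q, 3), q > CHI2_95_DF1)
```

A Q value above the critical value rejects the white-noise hypothesis, which in the workflow described above would send the model back to the identification step.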

2.3.4. Forecasting and Assessment

The optimal ARIMA model was used to predict the monthly morbidity from 2005 to 2014, and its fit and accuracy were then assessed by two methods. First, the fit between the actual and predicted values was judged by whether the actual values fell within the 95% confidence limits of the predicted values. Second, the mean absolute percentage error (MAPE) was calculated to evaluate the accuracy of the ARIMA model.
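Both assessment criteria are simple to compute. The following is a minimal sketch under the usual definition MAPE = (100/n) Σ |actual − predicted| / |actual|; the function names and the example numbers are hypothetical and are not the values reported in Table 5.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def within_limits(actual, lower, upper):
    """For each month, whether the actual value lies inside the prediction limits."""
    return [lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)]

# Hypothetical monthly values (NOT the study data).
actual = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.4]
lower = [0.6, 1.2, 3.0]
upper = [1.6, 2.4, 5.8]

print(round(mape(actual, predicted), 2))
print(within_limits(actual, lower, upper))
```

A month flagged False by `within_limits` corresponds to an actual value outside the 95% limits, which is the situation the text treats as a possible outbreak signal.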

2.4. Statistical Analysis

The epidemic characteristics of brucellosis were described using Excel software (17.0).

Meanwhile, SPSS (13.0) software was used to analyze the time series, define the time variable, and estimate the stationarity. Moreover, the sequence and correlation were plotted, and the Box–Ljung test was conducted. ARIMA model fitting tests were also carried out, including standard error, log-likelihood, AIC and BIC, and residual variance analysis. Besides, model diagnosis was performed, including parameter t-test, correlation test, and Box–Ljung test for “et.” Finally, the predicting effect of the ARIMA model was determined using the confidence limit along with MAPE.

In addition, the stationarity of the time series was inspected by the ADF test using EViews 6.0 software.

3. Results

3.1. Epidemiological Characteristics of Brucellosis in Jinzhou from 1998 to 2015
3.1.1. Overall Distribution

First, the overall distribution of brucellosis from 1998 to 2015 was analyzed. As shown in Figure 1, a total of 3078 brucellosis cases were reported from 1998 to 2015. The incidence showed a wave-like increasing tendency year by year, with a nearly linear rise after 2011. In 2014 and 2015, 513 cases were reported in each year, corresponding to a morbidity of 16.53 per 100,000, the peak incidence since 1998.

3.1.2. Time Distribution

As is shown in Figures 2(a) and 2(b), a total of 554 cases were reported in May from 1998 to 2015, which represented the peak period of brucellosis during the whole year. Moreover, the incidence trend and characteristics basically coincided in most years, except for 2015 when 115 patients were reported in July.

3.2. ARIMA Model Forecasting Analysis
3.2.1. Sequence Characteristic Analysis and Transformation

First, the monthly sequence from 2004 to 2013 was calculated and plotted, as shown in Figure 3(a). The original sequence showed upward and downward trends with a seasonal cycle; it was not stationary and had uneven variance. Therefore, a square-root transformation, one regular difference, and one seasonal difference were applied successively to the original sequence. After that, the time sequence appeared random and stationary (Figure 3(b); ADF test: t = −8.66).

3.2.2. Identification

The orders of the model were determined from the ACF (Figure 4(a)) and PACF (Figure 4(b)) after one regular difference and one seasonal difference. The ACF gives the correlation coefficient between the sequence and its lagged values. The autocorrelation coefficient declines exponentially or in a damped sinusoidal pattern and approaches zero when lag > q. As shown in the ACF chart (Figures 4(a) and 4(b)), the autocorrelation coefficient exceeded the confidence limits at lag = 1, indicating a strong correlation at order 1. Therefore, q = 1 and sq = 0 or 1 were preliminarily identified.
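The reading of an ACF chart can be reproduced numerically: a lag "breaks through" the confidence limits when the absolute sample autocorrelation exceeds roughly 1.96/√n. The sketch below is an illustration with hypothetical function names and a synthetic series, not the study data or the SPSS charts; the 1.96/√n limit is the usual large-sample approximation under a white-noise assumption.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients r_1 ... r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def significant_lags(x, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% limit 1.96 / sqrt(n)."""
    limit = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > limit]

# Hypothetical series with a pure 12-month cycle (NOT the study data).
toy = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = sample_acf(toy, 12)
print(round(acf[11], 3))          # autocorrelation at the seasonal lag 12
print(significant_lags(toy, 12))  # lags outside the approximate 95% limits
```

In this synthetic example the seasonal lag 12 stands far outside the limits while lag 3 does not, which mirrors how a seasonal term is suggested by a spike at the seasonal lag.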

The AR order p is the number of previous values needed to forecast the current value. The PACF chart after one regular difference (Figure 4(a)) showed that the partial autocorrelation coefficient clearly exceeded the confidence limit at lag = 1 but only marginally at lag = 2; consequently, an order of at most 2 was sufficient for modeling. In addition, the PACF chart after one seasonal difference (Figure 4(b)) showed that the partial autocorrelation coefficient markedly exceeded the confidence limit at lag = 1. As a result, p = 1 or 2 and sp = 0 or 1 were preliminarily identified.

After the orders were identified, we obtained eight candidate models: ARIMA(1,1,1)(0,1,0)12, ARIMA(1,1,1)(0,1,1)12, ARIMA(1,1,1)(1,1,0)12, ARIMA(1,1,1)(1,1,1)12, ARIMA(2,1,1)(0,1,0)12, ARIMA(2,1,1)(0,1,1)12, ARIMA(2,1,1)(1,1,0)12, and ARIMA(2,1,1)(1,1,1)12.
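The eight models follow mechanically from the identified order ranges (p = 1 or 2, q = 1, sp = 0 or 1, sq = 0 or 1, with d = sd = 1 and a period of 12). A short sketch, with a hypothetical helper name, makes the enumeration explicit:

```python
from itertools import product

def candidate_models(p_values, q_values, sp_values, sq_values, d=1, sd=1, season=12):
    """List every ARIMA(p,d,q)(sp,sd,sq)season label for the given order ranges."""
    return [
        f"ARIMA({p},{d},{q})({sp},{sd},{sq}){season}"
        for p, q, sp, sq in product(p_values, q_values, sp_values, sq_values)
    ]

models = candidate_models(
    p_values=[1, 2], q_values=[1], sp_values=[0, 1], sq_values=[0, 1]
)
for label in models:
    print(label)
```

Two choices for p, one for q, and two each for sp and sq give 2 × 1 × 2 × 2 = 8 combinations.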

3.2.3. Estimation and Diagnosis of Model

The parameter estimates and correlation analysis of the eight candidate models are presented in Table 1.

According to the results of correlation analysis (Table 2), no correlation was observed among parameters of three candidate models. As a result, these three models were all accepted.

Furthermore, as shown in Table 3, the test results for the ARIMA(1,1,1)(0,1,1)12 and ARIMA(1,1,1)(1,1,0)12 models were not statistically significant. In other words, no obvious residual correlation was observed, and the residual series was white noise.

Subsequently, the goodness of fit of the three models (Table 4) was analyzed. Both the AIC and BIC values of the ARIMA(1,1,1)(0,1,1)12 model were the lowest, which met the selection criterion. Moreover, the residual series of the ARIMA(1,1,1)(0,1,1)12 model was also white noise (Figure 4(c)). Therefore, the ARIMA(1,1,1)(0,1,1)12 model was the optimal model for prediction.

3.2.4. Forecasting Using ARIMA Model

As shown in Figure 5, the monthly morbidity from 2005 to 2014 was predicted using the ARIMA(1,1,1)(0,1,1)12 model based on the morbidity of brucellosis from 2004 to 2013, and the predicted values fitted the actual values well. Notably, the actual values in 2014 fell within the 95% confidence limits of the ARIMA(1,1,1)(0,1,1)12 model, and the MAPE was 19.83% (Table 5).

4. Discussion

In Jinzhou City, no case of brucellosis had been recorded until 1983, when a domestic outbreak among humans was first discovered. Since then, brucellosis cases have been reported every year, with an upward trend. The rapid development of animal husbandry and the sharp increase in the number of herdsmen and butchers have contributed substantially to this trend. Moreover, convenient transportation, which facilitates regional trade, together with the extensive circulation of livestock, has promoted the spread of Brucella from infected to healthy animals [13, 14]. Last but not least, Brucella can be introduced and spread locally more easily because animal immunization is neglected and vaccines are lacking in many villages and towns.

Brucellosis displays an obvious seasonality, with spring being the epidemic season [15]. In this study, high incidence occurred from spring to autumn, the busiest period for farmers, while the lowest incidence was observed in winter, when farmers had spare time. In terms of time distribution, the incidence in July 2015 was the highest ever recorded for that month, because three outbreaks occurred in July 2015.

Time-series analysis is an extrapolative forecasting method in which a mathematical model is established according to the regularity and trend of historical observations over time. In this study, data on livestock prevalence could hardly be obtained, the pattern of animal brucellosis was unstable, and environmental factors were complex [16], all of which limit the prediction of brucellosis. Time-series analysis can overcome these obstacles: time stands in for other influencing factors, such as trend, season, and human factors [17], regardless of the causality between variables. Time-series analysis, including the ARIMA model, has been widely used to predict infectious diseases in recent years. For instance, some scholars have used this model to predict the morbidity of tuberculosis (TB), mumps, measles, encephalitis B, and hand-foot-and-mouth disease [18–22], which resemble brucellosis in time distribution. The ARIMA model takes the trend, seasonality, and interference of the time series into consideration and quantifies them through the model parameters [23, 24]. The incidence sequence of brucellosis in Jinzhou met the requirements of time-series analysis; therefore, historical data from the past 10 years were selected in this study. Such a time series is long enough and relatively steady, which makes it suitable and practicable for forecasting the incidence of human brucellosis with the ARIMA model.

However, the predicted and actual values may differ because of the complicated factors related to brucellosis. The parameters should therefore be identified and adjusted repeatedly to select the model with the best fit. In this study, ARIMA(1,1,1)(0,1,1)12 was confirmed to be the optimal model. Almost all the actual morbidity data in 2014 fell within the 95% confidence interval, and the MAPE was 19.83%, indicating a good forecasting effect. Actual morbidity fluctuating within the 95% confidence limits indicates normal incidence, whereas actual values outside this range suggest that the epidemic situation differs sharply from the previous pattern. In that case, we must be alert to the possibility of an infectious disease outbreak [1], and relevant personnel should take specific measures ahead of time to control and treat a subsequent widespread epidemic of brucellosis.

Nevertheless, the ARIMA model is not a fixed pattern, and strategies should be adapted to the specific situation. Existing studies have indicated that the ARIMA model is more applicable to short-term (less than one year) prediction, which may be attributed to the various factors influencing infection and the complicated interactions among them. However, ARIMA is also potentially suitable for long-term prediction provided that the incidence trend and external factors are relatively stable, just as our findings suggest that ARIMA(1,1,1)(0,1,1)12 is a good prediction model for brucellosis in a given district. Furthermore, the incidence of brucellosis may differ markedly between areas because of geographical conditions. In this study, the ARIMA(1,1,1)(0,1,1)12 model is suitable for predicting the incidence in Jinzhou, but it should be adjusted when applied to other districts.

As in other studies, the ARIMA model has certain limitations. For instance, it assumes a linear relation within the time-series data and considers only the time factor, not pathogen, host, socioeconomic, or natural environment factors [7]. Accordingly, the other influencing factors should be further clarified in future monitoring and taken into account when preventive measures are put into practice.

Such a model provides an important theoretical basis and clues for the early warning of and intervention in infectious diseases. On the basis of this study, we can carry out further brucellosis research together with relevant departments and comprehensively analyze its etiology, incidence trend and distribution, influencing factors, and so on. Brucellosis prevention is a long-term and difficult task that requires cooperation among various departments, such as the department of health and the department of animal husbandry. Governments at all levels need to strengthen the management of livestock markets and implement strict quarantine. Moreover, surveillance of the epidemic situation in humans and animals, health education, dissemination of knowledge about brucellosis prevention, early detection, and timely treatment should be enhanced, alongside strict management of livestock quarantine and trade and the elimination of infected animals. In this way, the epidemic of brucellosis in Jinzhou and even other areas can be effectively prevented and controlled.

5. Conclusion

The incidence of brucellosis in Jinzhou showed a fluctuating upward trend from 1998 to 2015 and peaked in 2014 and 2015. ARIMA models were successfully established, among which ARIMA(1,1,1)(0,1,1)12 is suitable for predicting the incidence of brucellosis in Jinzhou.

Abbreviations

ACF: Autocorrelation function
ADF test: Augmented Dickey–Fuller test
AIC: Akaike’s information criterion
ARIMA: Autoregressive integrated moving average
AR(p): Autoregressive(p)
BIC: Bayesian information criterion
MA(q): Moving average(q)
MAPE: Mean absolute percentage error
PACF: Partial autocorrelation function
SE: Standard error
UHT: Ultraheat-treated.

Data Availability

All data of human brucellosis were collected from the China Information System for Disease Control and Prevention. All cases had been laboratory confirmed.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

LW, CJ, SW, XL, JY, and YC contributed to the proposal and design of the study. LW, WW, and CL contributed to the collection, analysis, and interpretation of the data. All authors took part in writing the manuscript and approved the final version of the manuscript for submission.

Acknowledgments

The authors would like to thank Xiuli Gao, Yue Zuo, Kai Fan, and Xiaoyan Mo, of the Jinzhou Centre for Disease Control and Prevention, for their contribution to the detailed design of the study, as well as the collection, analysis, and interpretation of the data. This work was supported, in whole or in part, by the National Natural Science Foundation of China (No. 81673223) and the Science Project of Education Department in Liaoning Province (No. LK201615).