Abstract

With the establishment of China’s national air quality monitoring network, large amounts of monitoring data are available to different kinds of users. Processing and using these big data is a difficult problem: most users have limited computing power, and new data are collected continuously. Cloud computing may be an efficient and low-cost way to solve this problem. This paper investigates a complex-system problem: the impact of PM2.5 on hospitalization for respiratory diseases. A change-point detection method based on grey relation analysis was applied to daily air pollution monitoring data and patient data. Our results showed that (1) PM2.5 pollution was positively correlated with hospital admission for respiratory disease; (2) most patients went to hospital 2 days after PM2.5 pollution events; and (3) males, children, and older people were significantly affected by PM2.5 pollution. Our study can help the government formulate suitable policies to reduce the damage caused by PM2.5 pollution and help hospitals allocate medical resources efficiently.

1. Introduction

Generally speaking, the development of the global economy, especially in the Third World countries, is closely related to environmental problems. At present, the rapid development of the Chinese economy and the acceleration of urbanization make the conflict between economic growth and the environment more and more prominent. Consequently, China is suffering from serious air pollution. With a population of over 1.4 billion, China’s air pollution situation is extraordinary [1]. In recent years, the annual death toll from air pollution in China has been over 1 million [2], at a cost of about 2.0% of China’s GDP (gross domestic product) [3].

In 2012, the newly revised Ambient Air Quality Standard went into effect [4], and China began to build a national air quality monitoring network. The real-time hourly concentrations of six monitoring indicators, sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), ozone (O3), particulate matter with a diameter smaller than or equal to 10 μm (PM10), and fine particulate matter with a diameter smaller than or equal to 2.5 μm (PM2.5), are shown to the public. By the end of 2019, more than 1,400 national urban air quality monitoring stations had been built, and the number of monitoring stations is scheduled to increase to nearly 1,800 by the end of 2025.

Because of its huge volume, the air quality monitoring data set is too large and complex to be handled by traditional data-processing methods and requires a lot of computing power to process, especially when used along with other big data sets such as hospital electronic medical records in complex systems [5, 6]. The limited computing power of ordinary users greatly reduces the use of air pollution monitoring data [7]. Cloud computing may be an efficient and low-cost way to solve this problem [8].

Cloud computing is a kind of distributed computing [9]. It decomposes a huge data-processing program into many small programs through the network, processes and analyzes these small programs on a system composed of multiple servers, and then returns the results to users. The advantages of cloud computing include high flexibility, scalability, and high cost performance [10]. Users no longer need expensive supercomputers with large storage space. They can combine relatively inexpensive PCs into a cloud, which reduces costs while delivering computing performance not inferior to that of supercomputers.

In this paper, a change-point detection method based on grey relation analysis (GRA-CP) is introduced and investigated to reveal the correlation between PM2.5 pollution and hospitalization for respiratory diseases, using the collected air pollution big data and records of hospital admission. Our aim is to predict the potential impact of PM2.5 pollution on patients with respiratory diseases. Our contributions are as follows:
(1) Investigate whether the grey correlation method is suitable for solving public health problems
(2) Identify which populations are more susceptible to air pollution and suffer from respiratory diseases
(3) Study residents’ medical consultation after haze pollution

Many scholars have pointed out that PM2.5, a critical component of air pollution, poses serious threats to humans [11]. Because PM2.5 particles are very small, they can be inhaled into the respiratory tract, pass through the lungs, enter the circulatory system through the alveoli, and damage other organs [12–14]. After entering the vascular system, PM2.5 can cause thrombus, hypertension, and coronary heart disease [15, 16]. PM2.5 may also increase the suicide rate and cause mental illnesses [17, 18].

Driven by government-oriented policies, China has taken significant measures to reduce the personal health risks and property losses caused by severe air pollution [19]; however, the PM2.5 concentration is still much higher than the WHO (World Health Organization) standard of 10 μg/m3 [20]. Although China has made some achievements in controlling air pollution, it still suffers a heavy burden of disease caused by PM2.5: each year, PM2.5 causes about 1.3 million deaths and reduces life expectancy by about 3 years [21].

The influence of air pollution on the respiratory system has drawn many scholars’ attention. Saldiva et al. [22] found that São Paulo’s air pollution was severe enough to cause adverse health effects in the exposed population. Zhao et al. [23] reported that the immune system can be weakened by severe air pollution. Farhat et al. [24] found that air pollution could greatly increase the probability of children suffering from respiratory diseases. Outdoor air pollution was classified by the WHO (World Health Organization) as a cancer-causing agent in 2013 [25]; their research shows that outdoor air pollution significantly increases the incidence of lung cancer and the risk of bladder cancer.

As PM2.5 is the most lethal element of air pollution, it has drawn many researchers’ attention. Ostro et al. [26] found that some components of PM2.5 were related to various respiratory diseases among children; the results from Vinikoor-Imler et al. [27] showed that PM2.5 was closely related to lung cancer morbidity and mortality; Song et al. [28] suggested that, in 2015, PM2.5 was associated with 40.3% of stroke deaths and 23.9% of lung cancer deaths. Xing et al. [29] studied the harm of PM2.5 to the respiratory system and suggested that residents should try their best to avoid exposure to air pollution. He et al. and Zheng et al. separately investigated the components and sources of PM2.5 pollutants in Beijing [30, 31].

Nanjing is the capital of Jiangsu Province. The city has 11 districts and an administrative area of 6,600 km2, with a total population of 8,436,200 as of 2018. Air quality in Nanjing has shown some improvement in recent years, but the annual mean concentration of PM2.5 in 2018 was about 41 μg/m3, which is still above the WHO standard. The main sources of PM2.5 pollution in Nanjing are industrial emissions, vehicle exhaust emissions, construction sites, and road dust. Although service industries dominate, accounting for about 60% of the GDP of the city, there are still many heavily polluting industries in Nanjing, such as Yangzi Petrochemical, Jinling Petrochemical, Nanjing Chemical, and Nanjing Iron and Steel Company. By the end of 2018, there were about 2.6 million vehicles in Nanjing. Because the city is situated in the Yangtze River Delta region, with a humid subtropical climate influenced by the East Asian monsoon, the air contains a high level of atmospheric moisture, which can act as a binder and increase PM2.5 pollution. With different kinds of fine particulates coming from different sources, PM2.5 pollution in Nanjing is a complex system. Therefore, our study is of great significance in helping the government formulate correct policies to reduce the damage caused by PM2.5 pollution.

3. Method and Data

3.1. Statistical Method
3.1.1. Grey Correlation Theory

Grey relation analysis (GRA) is a comparative method to study the trend in a system [32, 33]. This method has many advantages: it requires neither a large sample size nor a typical distribution law, its computational cost is relatively low, and its results are generally consistent with those of qualitative analysis.

GRA is widely applied in environmental protection and air-quality evaluation. Lu et al. [34] pointed out that PM2.5 pollution in eastern China is mainly caused by human activities, and for northwest China, dust is also a component of PM2.5 pollution; Han et al. [35] found that males in eastern China had a higher chance of developing lung cancer than those in western China; and Ouyang et al. [36] demonstrated that the spatial distributions of PM2.5 and cancer incidence are similar, indicating a close link between PM2.5 and cancer incidence.

GRA was described in our previous study, in which it was used to identify which population segment was more susceptible to lung cancer caused by air pollution in Nanchang, China, and which air pollutant was the main cause of lung cancer [37]. That study showed that PM10 was the main cause of lung cancer.

GRA is applied as follows:
(1) Construct the reference sequence and comparison sequences. Let $X_0 = \{x_0(t) \mid t = 1, 2, \ldots, n\}$ be the reference sequence (the data sequence that reflects the characteristics of the system), and let $X_i = \{x_i(t) \mid t = 1, 2, \ldots, n\}$, $i = 1, 2, \ldots, m$, be the comparison sequences (the data sequences of the factors that affect the behavior of the system).
(2) Dimensionless processing of the reference sequence and comparison sequences. Because the factors in a system usually have different dimensions, they cannot be compared directly, and it is difficult to get a correct result. Therefore, dimensionless processing of the data (for example, dividing each sequence by its initial value or by its mean) is generally needed before grey correlation analysis.
(3) Find the grey correlation coefficient $\xi_i(t)$. The degree of correlation refers to the geometric difference between the curves of the reference and comparison sequences. For one reference sequence $X_0$, there are many comparison sequences $X_1, X_2, \ldots, X_m$, and the correlation coefficient at each time $t$ can be calculated by the following:
$$\xi_i(t) = \frac{\min_i \min_t |x_0(t) - x_i(t)| + \rho \max_i \max_t |x_0(t) - x_i(t)|}{|x_0(t) - x_i(t)| + \rho \max_i \max_t |x_0(t) - x_i(t)|},$$
where ρ > 0 is the resolution coefficient, and the value of ρ is usually taken as 0.5.
(4) Correlation degree calculation: $\xi_i(t)$ is the correlation coefficient at each time, so it has more than one value, and the information is too fragmentary for an overall comparison. It is therefore necessary to gather the correlation coefficients at all times into one value, namely, their mean, as the degree of correlation:
$$r_i = \frac{1}{n} \sum_{t=1}^{n} \xi_i(t).$$
(5) Correlation degree ranking: The correlation degrees of the subsequences to the parent sequence are sorted by size to form the correlation sequence, which reflects the “superior or inferior” relationship of each subsequence to the parent sequence. If $r_i > r_j$, $X_i$ is said to be better than $X_j$ for the same parent sequence $X_0$, and it is recorded as $X_i \succ X_j$.
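The GRA steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the dimensionless step is assumed to be division by the sequence mean (one common choice), and the sample numbers are made up for demonstration only.

```python
def mean_normalize(seq):
    """Dimensionless processing (assumed choice): divide by the sequence mean.
    Sequences are assumed to have a nonzero mean, as concentrations and counts do."""
    mean = sum(seq) / len(seq)
    return [value / mean for value in seq]


def grey_relational_grades(reference, comparisons, rho=0.5):
    """Return the correlation degree of each comparison sequence to the reference."""
    x0 = mean_normalize(reference)
    xs = [mean_normalize(seq) for seq in comparisons]

    # Absolute differences between the reference and every comparison sequence.
    deltas = [[abs(a - b) for a, b in zip(x0, x)] for x in xs]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every comparison sequence coincides with the reference.
        return [1.0] * len(xs)

    grades = []
    for row in deltas:
        # Grey correlation coefficient at each time point.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Correlation degree: mean of the coefficients over time.
        grades.append(sum(coeffs) / len(coeffs))
    return grades


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from this study.
    admissions = [30, 42, 55, 48, 36]
    pollutants = {
        "PM2.5": [60, 85, 110, 95, 70],
        "O3": [90, 70, 50, 65, 88],
    }
    grades = grey_relational_grades(admissions, list(pollutants.values()))
    ranking = sorted(zip(pollutants, grades), key=lambda item: item[1], reverse=True)
    for name, grade in ranking:
        print(name, round(grade, 3))
```

Sorting the returned degrees in descending order corresponds to the ranking in step (5).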

3.1.2. GRA-CP

A change point is a sudden change in a time series data set [38]. Change-point search identifies when a change in the time series happens [39]. A change point reflects a qualitative change of things or processes. To accurately reflect the changes of a process and deal with them correctly, the change-point problem cannot be ignored. The change-point problem arises in many fields of production and life, such as computer science, signal processing, meteorology, finance, and medicine.

Based on GRA, some scholars have developed a new method to solve the change-point problem: GRA-CP. This method keeps the advantages of GRA: the amount of calculation is relatively small, and there is no strict requirement for the amount of data. Wong et al. [40] proposed a grey correlation test method for searching change points, taking the Shunde river network area as an example. Zhang and Gong [41] used a time series of the agricultural disaster area in eastern China to verify the practicability and strength of the grey correlation algorithm; Chen and Gong [42] identified the change points and cycles of CO2 emission trends from China’s energy consumption; and Wang et al. [43] calculated the change points of cumulative CO2 emissions from 1995 to 2004 in three eastern China jurisdictions and performed cycle division.

GRA-CP was applied as follows:
(1) Construct the reference sequence: From the first half (or the second half) of the time series $X = \{x(1), x(2), \ldots, x(n)\}$, select $X_0 = \{x(1), x(2), \ldots, x(k)\}$ as the reference sequence, where $k$ and $n$ are integers. In this study, n is the number of days (n = 1095), and both daily air pollution monitoring data and patient data were processed with the same number of data points in each sequence.
(2) Comparison sequence construction: Based on the reference sequence, the comparison sequences are as follows:

$$X_i = \{x(i+1), x(i+2), \ldots, x(i+k)\}, \quad i = 1, 2, \ldots, n-k. \quad (5)$$

Formula (5) is a comparison sequence set of order $k$.
(3) Overall relevance degree calculation: Compute the correlation degree of $X_0$ and each $X_i$ separately, and compute the arithmetic average of these correlation degrees; we call this average the overall correlation degree.
(4) Change-point determination: From the overall correlation degree, compute the relative overall correlation degree and find its maximum value; the time point at which this maximum is attained is the change point of the time series $X$.

Note the following:
(1) For the time series $X$, if a change point occurs in the second part, the reference sequence has to be selected from the second half of the series instead.
(2) Theoretically, $k$ can be taken as 1, but when $k$ takes a very small value, the method in this paper will be meaningless. Therefore, in numerical applications, $k$ should be chosen reasonably; for example, $k$ should be greater than or equal to 5.
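Steps (1) to (3) of GRA-CP can be sketched in Python as shown below. This is a hedged illustration rather than the code used in this study. The function names are hypothetical. The window indexing is an assumption: the reference sequence is taken as the first k values, and each comparison sequence is a length-k window shifted one step at a time. The correlation degree is computed pairwise between the reference and each window. Step (4), the relative overall correlation degree, is deliberately left out of the sketch.

```python
def grey_grade(x0, xi, rho=0.5):
    """Correlation degree between one reference window and one comparison window."""
    deltas = [abs(a - b) for a, b in zip(x0, xi)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        # The two windows are identical.
        return 1.0
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)


def sliding_windows(series, k):
    """Comparison sequences of order k (assumed indexing): windows starting at
    the second value and shifted by one position each time."""
    n = len(series)
    return [series[i:i + k] for i in range(1, n - k + 1)]


def overall_correlation_degree(series, k, rho=0.5):
    """Arithmetic average of the correlation degrees between the reference
    sequence (first k values) and every comparison window of order k."""
    if not 1 <= k < len(series):
        raise ValueError("k must satisfy 1 <= k < len(series)")
    reference = series[:k]
    grades = [grey_grade(reference, window, rho) for window in sliding_windows(series, k)]
    return sum(grades) / len(grades)


if __name__ == "__main__":
    # Made-up series with an abrupt level shift, for demonstration only.
    series = [1, 1, 1, 1, 1, 9, 9, 9, 9, 9, 9, 9]
    for k in (5, 6):
        print(k, round(overall_correlation_degree(series, k), 3))
```

Because all windows come from the same series and share one unit, no dimensionless processing is applied in this sketch.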

3.2. Data Collection
3.2.1. Air Pollution Monitoring Data

Daily air pollution monitoring data of Nanjing were acquired from Department of Ecology and Environment of Jiangsu Province from January 1, 2013, to December 31, 2015. The data include 6 kinds of pollutants: PM2.5, PM10, SO2, NO2, CO, and O3. Datasets can be downloaded at https://pan.baidu.com/s/1Y1A1vKDKc7jFYRKvgCMTGw.

3.2.2. Hospital Admission Data

Daily hospital admission data were gathered from a local major hospital from January 1, 2013, to December 31, 2015. These records include case number, gender, age, time of diagnosis, and ICD (International Classification of Diseases). Respiratory disease (ICD-10/J00-J99) records were screened out from all records. Then, those records were further categorized into groups by gender (female and male) and age (0–14, 15–64, and 65+).
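The screening and grouping described above can be sketched as follows. This is an illustrative sketch only: the record layout (dictionaries with `icd`, `gender`, and `age` keys) and the function names are assumptions, since the structure of the hospital records is not specified beyond the listed fields, and the sample records are invented.

```python
import re
from collections import Counter

# ICD-10 chapter for respiratory diseases: codes J00 to J99.
RESPIRATORY_CODE = re.compile(r"^J\d{2}")


def is_respiratory(icd_code):
    """True if the ICD-10 code falls in the range J00-J99."""
    return bool(RESPIRATORY_CODE.match(icd_code.strip().upper()))


def age_group(age):
    """Map an age in years to the groups used in this study."""
    if age <= 14:
        return "0-14"
    if age <= 64:
        return "15-64"
    return "65+"


def count_by_group(records):
    """Count respiratory admissions by gender and by age group."""
    counts = Counter()
    for record in records:
        if not is_respiratory(record["icd"]):
            continue  # screen out non-respiratory records
        counts[("gender", record["gender"])] += 1
        counts[("age", age_group(record["age"]))] += 1
    return counts


if __name__ == "__main__":
    # Invented records for demonstration only.
    sample = [
        {"icd": "J18.9", "gender": "male", "age": 6},
        {"icd": "I10", "gender": "female", "age": 70},
        {"icd": "J45", "gender": "female", "age": 66},
    ]
    print(count_by_group(sample))
```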

4. Result

4.1. Data Description

Table 1 shows the concentration of PM2.5 from 2013 to 2015 in Nanjing. The annual average and the maximum and minimum concentration of PM2.5 in Nanjing showed a downward trend, meaning that PM2.5 pollution in Nanjing is falling, but the concentration of PM2.5 is still much higher than the WHO safety standard (10 μg/m3).

Figure 1 shows that, in Nanjing from 2013 to 2015, the daily concentration of PM2.5 was much higher in winter and lower in spring and summer. This is related to the increased use of fossil fuels for heating in winter and to the relatively stable atmospheric circulation in winter, which makes PM2.5 pollution difficult to disperse.

Table 2 shows the percentage of respiratory diseases in different groups and the population proportion in Nanjing from 2013 to 2015. Among all hospital admission records, respiratory system diseases accounted for 42.34%, the largest share among all diseases. Of the patients, 46.26% were children and 14.72% were older people; the disease percentages for ages 0–14 and 65+ were much higher than the corresponding population proportions, which means that children and older people were more likely to suffer from respiratory diseases. About 54.33% of patients were male, and the disease percentage for males was higher than the population proportion for males, which suggests that men are more susceptible to respiratory diseases.

4.2. Change Point

Table 3 shows the change points in different groups. In all groups, the change point appeared on the second day, which means that most patients went to the hospital 2 days after a PM2.5 pollution event. The 65+ group also had a change point on the third day, which means that some of the older people went to the hospital on the third day after the PM2.5 pollution event.

5. Discussion

In this paper, we studied the impact of PM2.5 pollution on hospital admission for respiratory disease in Nanjing, China, from 2013 to 2015. We found that PM2.5 pollution was closely related to hospital admission for respiratory disease. The lag between PM2.5 pollution and hospital admission varied slightly among different age groups: most patients went to the hospital 2 days after PM2.5 pollution events, while some people over 65 years old waited one more day. These findings may help mobilize medical resources more efficiently and reasonably.

Table 4 shows that, among the 7 days of the week, most people see a doctor on Sunday, followed by Monday; this trend appeared in all groups except older people and was particularly pronounced among children. This might be because children and working-age people need to go to class or work on working days, while older people are laid off or retired and can therefore arrange their time more flexibly than younger people.

The hospital admission data in this article have some drawbacks: they were collected from only one local hospital, which cannot cover the entire population of Nanjing, and they are not up to date, covering only 2013 to 2015. The reason is that the Chinese government has strict restrictions on data access. We failed to get the hospital admission data of the entire city of Nanjing from the Jiangsu Provincial Center for Disease Control and Prevention. After considerable effort, we managed to get only the current data from one local hospital from 2013 to 2015.

Moreover, due to privacy reasons, the hospital admission data do not include the address of the patients, and we are unable to filter out the data of non-Nanjing patients.

Based on our findings, we make the following recommendations to the government:
(1) Take a more flexible approach to coordinating medical resources and scheduling the work and rest time of medical staff, and increase the number of staff at the peak of medical treatment on Sunday and Monday
(2) Establish an early warning system to prepare for a surge in the number of patients after severe haze pollution appears
(3) Take appropriate measures to reduce air pollutant emissions, such as increasing the proportion of new energy vehicles, installing energy-saving and emission-reduction equipment, and installing dust suppression equipment at construction sites
(4) Make non-sensitive data available to the public

6. Conclusions

In this paper, we used GRA-CP to study the correlation between PM2.5 pollution and hospitalization for respiratory diseases based on daily air pollution datasets and daily records of hospital admission. We found the following:
(1) PM2.5 pollution was closely related to respiratory disease
(2) Children and older people are more likely to suffer from respiratory diseases due to PM2.5 pollution, and women are less susceptible to respiratory diseases caused by PM2.5 pollution than men
(3) Most patients went to hospital 2 days after PM2.5 pollution events, while some of the older people waited one more day

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the Major Project of the National Social Science Foundation (grant no. 16ZDA047), the National Natural Science Foundation of China (grant no. 71673145), the Academic division of the Chinese Academy of Sciences (grant no. 2018ZWH002A-004), and the China Scholarship Council.