A Mobile Sensing System for Urban PM2.5 Monitoring with Adaptive Resolution

Guo, Hongjie; Dai, Guojun; Fan, Jin; Wu, Yifan; Shen, Fangyao; Hu, Yidan

doi:https://doi.org/10.1155/2016/7901245

Journal of Sensors

Research Article | Open Access

Volume 2016 | Article ID 7901245 | https://doi.org/10.1155/2016/7901245

Hongjie Guo,¹ Guojun Dai,¹ Jin Fan,¹ Yifan Wu,¹ Fangyao Shen,¹ and Yidan Hu¹

Academic Editor: Yasuko Y. Maruo

Received 30 Dec 2015

Revised 23 Mar 2016

Accepted 30 Mar 2016

Published 15 May 2016

Abstract

This paper develops a mobile sensing system, the first system used in adaptive resolution urban air quality monitoring. In this system, we employ several taxis as sensor carriers to collect the original PM2.5 data, and we also collect a variety of datasets, including meteorological data, traffic status data, and geographical data in the city. This paper also presents a novel method, AG-PCEM (Adaptive Grid-Probabilistic Concentration Estimation Method), to infer the PM2.5 concentration of undetected grids using dynamic adaptive grids. We collected measurements over a year using a prototype system in the Xiasha District of Hangzhou City, China. Experimental data verify that the proposed system achieves good performance in terms of computational cost and accuracy. The computational cost of AG-PCEM is about 40.2% lower than that of the static grid method PCEM at comparable accuracy, and the accuracy of AG-PCEM exceeds that of the widely used artificial neural network (ANN) and Gaussian process (GP) methods by 38.8% and 14.6%, respectively. The system can be extended to wide-range air quality monitoring by adjusting the initial grid resolution, and our findings can inform citizens of the actual air quality and help officials find pollution sources.

1. Introduction

Fine particulate matter (PM2.5) has been identified as the particle type most damaging to public health [1]. In particular, urban areas of developing countries, such as Beijing and New Delhi, suffer seriously from this threat. Due to the intricate structures and diverse functional areas of cities, the traditional method of monitoring urban particulate matter with sparsely deployed fixed stations (e.g., the testing region of our work, covering a 64 km² area, had only one official air quality monitoring station [2]) falls far short of telling citizens the actual quality of the air they breathe.

Recently, inferring fine-grained urban air quality has gained much attention. In AirCloud, proposed by Cheng et al. [3], ANN and GP are used to infer PM2.5 concentration from a number of sensors installed at several points of interest (POIs). U-air [4] proposes a semisupervised learning approach that infers air quality information throughout a city from air quality data reported by a few official monitoring stations together with meteorological data, POI data, road networks, and taxi trajectories. The inference accuracy and the resolution of the concentration distribution of these two systems rely heavily on the number and selection of POIs. To improve data coverage and distribution resolution, several systems resort to collecting urban pollutant data with mobile, low-cost sensors. In [5], the authors collected measurements using mobile sensor nodes installed on top of transport vehicles and developed land-use regression (LUR) models to infer the pollution concentration distribution with a high resolution of 100 m × 100 m. Similarly, Hu et al. [6] collected data samples with mobile sensors and proposed a Probabilistic Concentration Estimation Method (PCEM) to infer the regional concentration distribution with 200 m × 200 m resolution. The scalability of these methods is easily challenged in a large monitoring area, since too high a resolution produces a huge computational cost.

Overall, inferring pollutant concentration distribution with adaptive resolution is of great importance. A large grid size can lead to unacceptable errors for many pollutants formed via nonlinear chemical reactions, while too high a resolution leads to high computational complexity. Adaptive grid methods for air quality modeling have been explored for many years. Tomlin et al. [7] used adaptive unstructured triangular grids to model air pollution transport. They solved the discretized atmospheric diffusion equation using a finite-volume approach and applied the concentration gradient as the refinement criterion. They kept the original grid nodes fixed and refined the grid by splitting each triangle into 4 smaller, similar triangles when the concentration gradient exceeded a threshold. The method developed by Tomlin et al. was applied to model nuclear contamination dispersion [8] and air pollution formation [9] and was used in the vertical domain [10, 11] as well. Srivastava et al. [12] proposed a new adaptive grid algorithm (mesh movement) for simulating reactive atmospheric pollutants. They employed a constant number of grid nodes to keep the computational time manageable and calculated the grid weight (spatial error) as a linear combination of the curvatures of different chemical species. The grid nodes are clustered in areas of high weight to improve simulation accuracy. Garcia-Menendez et al. [13] developed an adaptive grid version of the Community Multiscale Air Quality Modeling System (AG-CMAQ) using the adaptive grid algorithm proposed by Srivastava et al. Lagzi et al. [9] reported an assessment approach for AG-CMAQ, showing that AG-CMAQ gives better simulation results than fixed grid CMAQ [14]. Constantinescu et al. [15] developed an adaptive resolution system (mesh enrichment) for modeling regional air pollution based on the Sulfur Transport and dEposition Model (STEM) [16]. Refinement is achieved by dividing a block into smaller blocks based on the curvature in concentration fields. The authors claim that the adaptive grid algorithm required only a quarter of the time of a static fine grid at comparable accuracy.

The adaptive grid methods above are all model-based approaches used in regional-to-global air pollution modeling. They fail to provide a fine-grained and precise air pollution concentration distribution because the data are collected by sparsely deployed monitoring sites. Furthermore, their refinement criteria, based on concentration gradient or concentration curvature, require inferring the air pollution concentration of each grid before refining the grids, which brings modeling delay and poor efficiency. So far, there is no effective adaptive resolution method for fine-grained air quality inference.

To this end, we develop a mobile sensing system for fine-grained urban PM2.5 monitoring with adaptive resolution. We divide the testing area into initial grids of 500 m × 500 m, let mobile sensors randomly collect PM2.5 concentration in the testing area, and collect meteorological data, traffic status data, and geographical data in the city. We also propose AG-PCEM, an adaptive grid method to infer the fine-grained urban PM2.5 distribution. We develop novel refinement criteria that refine grids before inferring PM2.5 concentration, using historical and real-time PM2.5 concentration data collected by several mobile sensors and a variety of datasets observed in the city; we also define several refinement levels and grid resolutions to adaptively adjust the grid resolution. The evaluation results show that the computational cost of AG-PCEM is about 40.2% lower than that of the static grid method PCEM at comparable accuracy, and the accuracy of AG-PCEM exceeds that of the widely used artificial neural network (ANN) and Gaussian process (GP) methods by 38.8% and 14.6%, respectively.

Our contributions can be summarized as follows:

(i) The proposed AG-PCEM provides a fine-grained PM2.5 concentration distribution and can be extended to wide-range air quality monitoring by adjusting the initial grid resolution.

(ii) Novel refinement criteria and a refining method are used to refine grids before inferring the PM2.5 concentration distribution, which ensures inference efficiency.

(iii) Inferring the PM2.5 concentration distribution with dynamic adaptive grids clearly reduces computational cost while preserving accuracy, and the finer-grained concentration distribution tells citizens the actual air quality and helps officials find pollution sources.

(iv) We develop a system using mobile, low-cost sensors to collect original data over one year and validate that AG-PCEM performs well in terms of accuracy and computational cost.

The rest of this paper is organized as follows. Section 2 introduces several definitions used in this work and the framework of our system. In Sections 3–6, the proposed AG-PCEM algorithm is presented in detail. A prototype experimental system is developed in Section 7 to validate AG-PCEM. The performance of the proposed algorithm is evaluated in Section 8. Finally, the conclusions of this paper are discussed in Section 9.

2. System Overview

2.1. Preliminaries

Definition 1 (air quality index and individual air quality index). The air quality index (AQI) is a number that describes the status of air quality. People are more likely to experience health risks as the AQI increases. The AQI is calculated from the concentrations of 6 main pollutants: fine particulate matter (PM2.5), particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO).

The individual air quality index (IAQI) is calculated for each pollutant, and its value varies among pollutants. Of the above 6 pollutants, PM2.5 has the biggest impact on humans and the environment, so the PM2.5 IAQI value is the most important index for assessing air quality. IAQI values are divided into ranges, and the standards differ between countries. Considering the real testing environment, in this paper we use the standard issued by the Chinese Environmental Protection Administration [17], as shown in Table 1.

Definition 2 (POI). A point of interest (POI) is a place in the physical world (such as a school or a factory) that we are interested in.

Definition 3 (grid and mobile sensing). The testing region is divided into grids with 500 m × 500 m initial resolution. As shown in Figure 1, a testing car with a built-in particle sensor collects data along the route marked by the blue arrow. Grids that are not on this route are areas where the PM2.5 concentration is not detected directly. A grid with a resolution of hundreds of meters is regarded as a point in geographic terms, and the PM2.5 concentration within a grid is assumed to be uniform. Each grid is identified by its position in the $i$th row and $j$th column of the grid map of the testing area, and its concentration is the value associated with the $i$th row and $j$th column of the grid map.

Definition 4 (refining and derefining). Refining is a process to divide a grid into 4 or 16 small and same-sized grids, while derefining means merging 4 adjacent grids into one grid, and the process of refining and derefining the grid map of a region is termed as regriding.

Definition 5 (grid resolution and refinement level). Grid resolution denotes grid size. In this work, the initial grid resolution is 500 m × 500 m, with four levels of refinement (i.e., −1, 0, 1, and 2) and four matching grid resolutions (i.e., 1000 m × 1000 m, 500 m × 500 m, 250 m × 250 m, and 125 m × 125 m). As shown in Figure 2, grids with a green, red, or blue border denote grids at levels −1, 1, and 2, respectively, and the remaining grids are at level 0. A grid $(i, j)$ at level −1 means that the grid $(i, j)$ and its right grid $(i, j+1)$, lower grid $(i+1, j)$, and lower-right grid $(i+1, j+1)$ should be merged into one coarser grid with 1000 m × 1000 m resolution; a grid at level 0 keeps its initial 500 m × 500 m resolution; a grid at level 1 should be split into four grids with 250 m × 250 m resolution; and a grid at level 2 should be split into sixteen grids with 125 m × 125 m resolution.
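
The mapping between refinement level and grid resolution in Definition 5 halves the cell edge for each level above 0 and doubles it for level −1. A minimal sketch of that arithmetic follows; the function and constant names are illustrative and not taken from the paper.

```python
# Refinement levels and cell sizes from Definition 5 (initial cell: 500 m).
INITIAL_SIZE_M = 500.0
VALID_LEVELS = (-1, 0, 1, 2)


def cell_size_m(level: int) -> float:
    """Edge length in meters of a cell at the given refinement level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported refinement level: {level}")
    return INITIAL_SIZE_M / (2 ** level)


def cells_per_initial_grid(level: int) -> float:
    """Number of cells one initial grid maps to.

    A value of 0.25 at level -1 means four initial grids merge into one cell.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported refinement level: {level}")
    return 4.0 ** level


if __name__ == "__main__":
    for level in VALID_LEVELS:
        print(level, cell_size_m(level), cells_per_initial_grid(level))
```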

Definition 6 (random walk and transition probabilities). The term random walk was first introduced by Pearson in 1905 [18] and is a mathematical formalization of a path that consists of a sequence of successive random steps. In our work, we simulate particle (PM2.5) diffusion using the concept of a random walk. As shown in Figure 3, particles in the center grid move to adjoining neighbors with different transition probabilities, which are influenced by numerous variables such as wind direction, geographical conditions, and other factors. Each transition probability represents the probability of moving from a grid location to one of its nearest neighbors.

Definition 7 (false positive rate and false negative rate). In this work, the false positive rate (FPR) and false negative rate (FNR) are used to describe the error of the ANN training results. FPR denotes the rate of grids that should be at level −1 but are instead assigned level 0, 1, or 2, which increases computational cost. FNR denotes the rate of grids that should be at level 1 or 2 but are instead assigned level 0 or −1, which reduces inference accuracy.
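
As an illustration of Definition 7, the sketch below computes both rates from paired lists of expected and predicted refinement levels. The definition does not state the denominators, so this sketch assumes that FPR is taken over grids whose expected level is −1 and FNR over grids whose expected level is 1 or 2; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def fpr_fnr(expected: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (FPR, FNR) for refinement-level predictions.

    Assumption: FPR is normalized by the grids expected at level -1,
    FNR by the grids expected at level 1 or 2.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    coarse_total = coarse_missed = 0   # grids that should be at level -1
    fine_total = fine_missed = 0       # grids that should be at level 1 or 2
    for e, p in zip(expected, predicted):
        if e == -1:
            coarse_total += 1
            if p in (0, 1, 2):         # kept finer than needed: extra cost
                coarse_missed += 1
        elif e in (1, 2):
            fine_total += 1
            if p in (0, -1):           # kept coarser than needed: lost accuracy
                fine_missed += 1

    fpr = coarse_missed / coarse_total if coarse_total else 0.0
    fnr = fine_missed / fine_total if fine_total else 0.0
    return fpr, fnr
```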

2.2. Framework

As shown in Figure 4, the framework of our system consists of two parts, offline learning and online inference, which generate three kinds of data flows: preprocessing, inference, and learning data flows.

Preprocessing Data Flow. In this data flow (denoted by broken black arrows), we employ several mobile sensors to collect the original PM2.5 concentration data in the testing region and store them on a server for processing. The processed historical concentration data are used for offline learning, and the processed real-time concentration data are used for online inference and for evaluating the inference results.

Learning Data Flow. In this data flow (represented by broken blue arrows), we extract features for each grid from a variety of data sources as the input of the ANN training model and calculate the refinement level from POI data and processed mobile sensing data as the output. Among the features, temperature, humidity, weather, and wind power are extracted from meteorological data; traffic and location are extracted from traffic status data and geographic data. To improve the training accuracy, we also set up some POIs to extract accurate features and refinement levels that supplement the training dataset, as detailed in Section 7.3. POI data are also used to evaluate the inference results. This process is performed offline.

Inference Data Flow. In this data flow (denoted by red solid arrows), we first calculate the real-time features of each grid from meteorological data, traffic status data, and geographic data and feed the features into the ANN model to obtain the refinement level of each grid. The grid map of the testing area is then regridded according to each grid's refinement level. After regriding the map, the PM2.5 concentration of each grid in the testing region is inferred, as detailed in Section 6.

3. Refinement Criteria

Increments of PM2.5 concentration of 5 μg/m³ and 10 μg/m³ are considered vital boundaries. The European Study of Cohorts for Air Pollution Effects (ESCAPE) used data from 17 cohort studies based in nine European countries. Its prospective analysis shows that long-term exposure to PM2.5, even at low concentration, significantly increases lung cancer risk: for every 5 μg/m³ increase in PM2.5 concentration, the risks of lung cancer and lung adenocarcinoma increase by 18% and 55%, respectively [19]. Research carried out by the American Cancer Society (ACS) shows that the relative risk of lung cancer mortality is correlated with each 10 μg/m³ change in PM2.5. The risk increased by 8% using concentration data from 1979 to 1983 and by 13% using data collected from 1999 to 2000 [20, 21].

According to the PM2.5 IAQI and the change in PM2.5 concentration, the refinement criteria are as follows.

Level −1. The PM2.5 concentrations of a grid and its adjacent grids are all within 75 μg/m³, that is, within the “good” air quality range. Those grids can be merged.

Level 1. The PM2.5 concentration of a grid is more than 115 μg/m³ (air quality range is “moderate pollution”), and the mean difference value (MDV) between the grid and its surrounding grids is more than 5 μg/m³ but not more than 10 μg/m³. The concentration in this grid is unhealthy and there may be a pollution source in the grid, so the grid should be refined to the finer level (level 1).

Level 2. The PM2.5 concentration of a grid is more than 115 μg/m³ (air quality range is “moderate pollution”), and the mean difference value between the grid and its surrounding grids is more than 10 μg/m³. The concentration in this grid is very unhealthy and there is a high possibility that a pollution source is in the grid, so the grid should be refined to the finest level (level 2).

Level 0. Other grids.

The grid refinement level is calculated from these criteria, using the PM2.5 concentration of the grid, the concentrations of its adjacent grids, and their mean difference value.
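
The four level criteria above can be sketched as a small decision function. This is an illustration under stated assumptions rather than the paper's formula: the mean difference value is assumed here to be the mean absolute difference between the grid and its surrounding grids, "within 75 μg/m³" is read as at most 75, and all names are hypothetical.

```python
from typing import Sequence

GOOD_LIMIT = 75.0        # ug/m^3, upper bound of the "good" range
POLLUTED_LIMIT = 115.0   # ug/m^3, "moderate pollution" threshold
MDV_FINER = 5.0          # ug/m^3, MDV above this gives level 1
MDV_FINEST = 10.0        # ug/m^3, MDV above this gives level 2


def mean_difference_value(center: float, neighbors: Sequence[float]) -> float:
    """Assumed MDV: mean absolute difference between a grid and its neighbors."""
    if not neighbors:
        raise ValueError("at least one neighbor concentration is required")
    return sum(abs(center - c) for c in neighbors) / len(neighbors)


def refinement_level(center: float, neighbors: Sequence[float]) -> int:
    """Return the refinement level (-1, 0, 1, or 2) of one grid."""
    if not neighbors:
        raise ValueError("at least one neighbor concentration is required")
    # Level -1: the grid and all adjacent grids are in the "good" range.
    if center <= GOOD_LIMIT and all(c <= GOOD_LIMIT for c in neighbors):
        return -1
    # Levels 1 and 2: polluted grid that differs clearly from its surroundings.
    if center > POLLUTED_LIMIT:
        mdv = mean_difference_value(center, neighbors)
        if mdv > MDV_FINEST:
            return 2
        if mdv > MDV_FINER:
            return 1
    # Level 0: everything else keeps the initial resolution.
    return 0
```

For example, a grid at 130 μg/m³ with neighbors at 122, 124, 121, and 125 μg/m³ has a mean difference of 7 μg/m³ under this assumption and is assigned level 1.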

4. Features Extraction

In this work, we determine a grid's refinement level from its PM2.5 concentration, as described in Section 3. Many previous studies have shown that the concentration of air pollutants is influenced by features such as temperature, humidity, and traffic flow [22]; the PM2.5 concentration data and feature data we collected over one year also confirm this, as detailed in Section 7.2. That is, grid refinement levels are influenced by those features. Accordingly, we identify six grid features as follows.

(1) Temperature feature: the temperature of a grid at the initial resolution, acquired from public data and POI data.

(2) Humidity feature: the humidity of a grid at the initial resolution, acquired from public data and POI data.

(3) Weather feature: the weather condition of a grid at the initial resolution, encoded as sunny (1), cloudy (2), light rain (3), heavy rain (4), or snowy (5), acquired from public data.

(4) Wind power feature: the wind power of a grid at the initial resolution, acquired from public data and POI data.

(5) Traffic feature: the road traffic status in a grid at the initial resolution, encoded as smooth (1), slow (2), crowded (4), or heavily crowded (8), acquired from public data.

(6) Location feature: the geographic location of a grid at the initial resolution in the grid map of the testing area, acquired from geographic data.

Among the features, the location feature is spatially related and is extracted offline from geographical data, since it does not change with time. The other features (temperature, humidity, weather, wind power, and traffic) are temporally related; they are extracted from POI data and public data (traffic status data and meteorological data) and updated every hour, as detailed in Section 7.3.
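
The six features and their numeric encodings can be collected into one record per grid, as in the sketch below. The class and field names, the units, and the encoding of the location feature as a single row-major grid index are assumptions made for illustration; the weather and traffic codes follow the list above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Weather(IntEnum):
    SUNNY = 1
    CLOUDY = 2
    LIGHT_RAIN = 3
    HEAVY_RAIN = 4
    SNOWY = 5


class Traffic(IntEnum):
    SMOOTH = 1
    SLOW = 2
    CROWDED = 4
    HEAVILY_CROWDED = 8


@dataclass(frozen=True)
class GridFeatures:
    temperature: float   # assumed unit: degrees Celsius
    humidity: float      # assumed unit: percent relative humidity
    weather: Weather
    wind_power: float    # assumed: wind force grade
    traffic: Traffic
    location: int        # assumed: row-major index of the grid in the grid map

    def as_vector(self) -> List[float]:
        """Six-element input vector, in the order the features are listed."""
        return [
            float(self.temperature),
            float(self.humidity),
            float(self.weather),
            float(self.wind_power),
            float(self.traffic),
            float(self.location),
        ]


def grid_index(row: int, col: int, n_cols: int) -> int:
    """Hypothetical location encoding: row-major index of grid (row, col)."""
    return row * n_cols + col
```

Only the location field is fixed offline; the other five fields would be refreshed hourly from the meteorological and traffic feeds.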

5. Offline Learning

We propose a grid refinement level inference model based on ANN, as Figure 5 depicts. The model consists of input, hidden, and output layers. The input layer has six nodes, which take the temperature, humidity, weather, wind power, traffic, and location features of a grid, and the output layer has a single node, which gives the refinement level of that grid. The weights of the hidden layer are modified during the training procedure to minimize the error [23].
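
To make the six-input, one-output structure concrete, the following sketch implements the forward pass of such a network in plain Python. The hidden layer size, the tanh activation, the random initial weights, and the rounding of the output to the nearest valid level are all assumptions, since the text does not specify them; training is omitted.

```python
import math
import random
from typing import List, Sequence

N_INPUTS = 6  # temperature, humidity, weather, wind power, traffic, location


class LevelNet:
    """Minimal 6-H-1 feedforward network (untrained, illustrative only)."""

    def __init__(self, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight row per hidden unit, one weight per input feature.
        self.w1: List[List[float]] = [
            [rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
            for _ in range(n_hidden)
        ]
        self.b1: List[float] = [0.0] * n_hidden
        self.w2: List[float] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2: float = 0.0

    def forward(self, features: Sequence[float]) -> float:
        """Raw network output for one grid's feature vector."""
        if len(features) != N_INPUTS:
            raise ValueError(f"expected {N_INPUTS} features, got {len(features)}")
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def predict_level(self, features: Sequence[float]) -> int:
        """Round the output and clip it to the valid levels -1..2."""
        return max(-1, min(2, round(self.forward(features))))
```

In practice the inputs would be normalized before training, since the raw features have very different ranges.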

6. Online Inference

In this work, we dynamically adapt the grid resolution and infer the fine-grained PM2.5 concentration distribution hourly. We first regrid the grid map according to the real-time grid features extracted from meteorological data, traffic status data, and geographical data and then infer the urban PM2.5 concentration distribution based on the data collected by mobile sensors within an hour.

6.1. Regriding

As Figure 6 shows, in the regriding process, the real-time features of each grid at the initial resolution are input to the ANN generated in Section 5, and the grid map of the testing area is then regridded according to the output refinement levels.

After the grid map is regridded, the original data collected by several vehicles are processed and merged into each grid based on their geographic information. Some grids have no data because no vehicle has passed through those areas; the method to infer the PM2.5 concentration in undetected areas is introduced as follows.

6.2. Calculating Transition Probability Matrix

We assume that the PM2.5 concentration of each grid remains the same within a certain interval. The PM2.5 concentration of any grid can be identified from the particles transported from its surrounding neighbors (the affecting region) [4]. In (2), we define the transition probability matrix, whose size is given by the number of all grids and whose entries denote the particle transport probability between a grid and its nearest neighbors.

As a result, addressing the problem of particle transport is equivalent to resolving the transition probabilities for each grid. We also assume that the geographic features are similar and the hourly weather conditions are equivalent across the testing region, which means that different grids in the same testing area follow the same transition probabilities. For a given grid, however, different directions have different transition probabilities due to the influence of numerous meteorological factors; in this work, we assume that the particle diffuses in four directions. Therefore, the problem of modeling particle transport can be simplified to estimating the parameters of the transition probability matrix. We calculate the transition probability matrix using initial-resolution grids, as (3) shows, and then infer the PM2.5 concentration of uncollected areas based on it.

6.3. Inferring

After the regriding process, there are four kinds of grids (coarser, initial, finer, and finest) in the grid map. In this section we introduce the method for estimating the PM2.5 concentration of undetected grids.

The PM2.5 concentration of an undetected initial grid is estimated from the concentrations of its neighbors and the transition probabilities.

A given grid's neighbor may fall into one of the following four scenarios, as Figure 7 shows.

(1) A center grid's neighbor was split into 4 grids in the regriding process, for example, grid 1 in the figure. In this case, we calculate the neighbor's concentration from the concentrations of the four small grids and the number of nonzero values among them, that is, the number of detected grids among the four.

(2) A center grid's neighbor was split into 16 grids in the regriding process, for example, grid 2 in the figure. Similarly, we calculate the neighbor's concentration from the concentrations of the sixteen small grids and the number of nonzero values among them.

(3) A center grid's neighbor keeps its initial resolution after the regriding process, for example, grid 3 in the figure. In this case, the neighbor's concentration keeps its value.

(4) A center grid's neighbor was derefined to a coarser grid in the regriding process, for example, grid 4 in the figure. In this case, the neighbor's concentration is set to the coarser grid's concentration.

The concentration of undetected coarser, finer, and finest grids is estimated by the same method used for an undetected initial grid.
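
The neighbor handling in the four scenarios can be sketched as follows. The first function follows the description directly under one reading: a split neighbor's concentration is taken as the sum of its sub-grid concentrations divided by the number of nonzero (detected) ones. The second function is an assumption, since the estimation formula is not reproduced here: it combines the four neighbors as a probability-weighted sum. All names and direction labels are hypothetical.

```python
from typing import Dict, Sequence


def neighbor_concentration(sub_grids: Sequence[float]) -> float:
    """Concentration of a neighbor from its 1, 4, or 16 sub-grid values.

    Undetected sub-grids are represented by 0 and excluded from the count.
    Returns 0.0 (undetected) when no sub-grid has a reading.
    """
    detected = [c for c in sub_grids if c != 0]
    return sum(detected) / len(detected) if detected else 0.0


def estimate_undetected(
    neighbors: Dict[str, float], probabilities: Dict[str, float]
) -> float:
    """Assumed estimate: transition-probability-weighted sum over neighbors."""
    if set(neighbors) != set(probabilities):
        raise ValueError("neighbors and probabilities must share the same keys")
    return sum(probabilities[d] * neighbors[d] for d in probabilities)


if __name__ == "__main__":
    neighbors = {
        "north": neighbor_concentration([80.0, 0.0, 100.0, 0.0]),  # split into 4
        "south": neighbor_concentration([90.0]),                   # initial grid
        "east": neighbor_concentration([100.0]),                   # coarser grid
        "west": neighbor_concentration([110.0] * 16),              # split into 16
    }
    probabilities = {"north": 0.25, "south": 0.25, "east": 0.25, "west": 0.25}
    print(estimate_undetected(neighbors, probabilities))
```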

7. Experiments

7.1. Mobile Data Collection

7.1.1. System Prototype

We select a low-cost PM2.5 sensor, the DN7C3CA006 [24] by SHARP, as the built-in sensor; it continually samples the air every 10 ms and provides relatively consistent readings. For sensor calibration and system evaluation, we choose the Lighthouse 3016IAQ [25] as the reference. The Lighthouse 3016IAQ is an advanced portable sensor with an estimated error of 0.1 μg/m³. We employ urban taxis as sensor carriers to collect the mobile data in real time; it has been verified that mobile sensing can overcome the coverage and granularity problem thanks to its larger coverage and fast movement [26]. To cope with the complex measuring conditions caused by changing vehicle speed, sensor nodes are installed on top of the taxis, where they avoid physical damage and keep working even in the worst environmental conditions, as shown in Figure 8(a). An urban area of nearly 64 km² is covered in an hour by 5 taxis with built-in sensor nodes.

(a) A taxi with built-in sensor

(b) Inner view of sensor node in taxis

The inner view of a sensor node is shown in Figure 8(b). Each sensor node is equipped with a low-cost DN7C3CA006 sensor, a GPS (Global Positioning System) module, control and transmission modules, and a power interface. The sensor node can be charged directly by the vehicle igniter.

7.1.2. Testing Area

As shown in Figure 9(a), we choose Xiasha, a local region in Hangzhou City, China, as the testing region. The region suffered from air pollution on more than 70% of the days in a 15-month period (from January 1, 2014, to March 25, 2015); the PM2.5 level of this region was identified as a threat to sensitive groups according to the Chinese standard for PM2.5 levels. Apart from the serious hazy days, the region also contains a comprehensive mix of elements, including universities, residential areas, an industrial zone, an expressway junction, and a bank of the Qiantang River. Universities and several parks are located in the northwest. Near the riverbank, there are several residential areas. The industrial zone is in the southeast, and the expressway crossing with heavy traffic is in the northeast. PM2.5 concentrations at different locations are collected while taxis drive randomly in the monitoring area, as shown in Figure 9(b), and the prototype system was kept working for over a year to monitor the local PM2.5 concentration.

(a) The testing region of system

(b) A route driving by taxis randomly

7.1.3. Sensor Calibration

Mobile sensing systems normally require high sensor consistency. Therefore, low-cost sensor calibration should not only be carried out in the laboratory under different environmental conditions but also be verified in the real world. In our system, the sensor calibration consists of two parts. On the one hand, we explore the relationship between standard values and low-cost sensor readings in the laboratory to eliminate the initial hardware variations. On the other hand, we identify different system compensations through practical experiments. The system compensations mainly consider possible vibration caused by intermittent movement, high wind, temperature, and the complex testing environment. The characteristics of gross errors are also studied based on outdoor samples. Figure 10 shows the testing environment in detail, both in the laboratory and in the real world.

(a) Experiment environment in lab

(b) Experiments in testing area

To improve the sensitivity of the DN7C3CA006 sensors to PM2.5 concentration changes and their stability under the same environmental conditions, we develop embedded software and design a process for sensor calibration, as shown in Figure 11.

After testing all sensors under different environmental conditions, we find that the accidental error follows a Gaussian distribution and is independent from sensor to sensor. The systematic error is predictable. It is an inherent bias in the system [27], which could be caused by testing conditions or the organization of hardware modules. These two errors can be minimized by calibration. The relationship between the reference value and the detected PM2.5 concentration is described in (7), which relates the observed value of PM2.5 concentration to the reference value obtained by the Lighthouse 3016IAQ. For a given sensor, accidental deviations are unpredictable and have no expected value [27]; they are described by a set of random errors.

In the laboratory, we focus on optimizing data redundancy to obtain an acceptable range of random errors. Hardware parameters are also estimated to minimize the systematic errors. To minimize accidental deviations, we change the detection period of the DN7C3CA006 from 10 ms to 10 s to achieve intensive sampling. This strategy can still be adopted at a certain movement speed. Based on these data, we find that the calibrated sample lies within a bounded deviation of the actual value at an 85% confidence level, according to (8), which derives from Chebyshev's inequality.
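
For reference, the standard form of Chebyshev's inequality shows how an 85% confidence level translates into a deviation bound. This is the generic statement, not a reconstruction of (8); the symbols $X$ (calibrated sample), $\mu$ (actual value), $\sigma$ (standard deviation of the random error), and $k$ are assumptions introduced for illustration.

```latex
\[
  P\bigl(|X - \mu| \ge k\sigma\bigr) \le \frac{1}{k^{2}}
  \quad\Longrightarrow\quad
  P\bigl(|X - \mu| < k\sigma\bigr) \ge 1 - \frac{1}{k^{2}} .
\]
\[
  1 - \frac{1}{k^{2}} = 0.85
  \quad\Longrightarrow\quad
  k = \frac{1}{\sqrt{0.15}} \approx 2.58 .
\]
```

Under these assumptions, a sample lies within about $2.58\sigma$ of the actual value with probability at least 85%, whatever the error distribution.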

In the mobile sensing scenario, gross error detection is performed under real-time experimental conditions. For mobile data, gross errors are eliminated once samples are sniffed by a sliding window technique that has been verified in dynamic systems [28, 29]. We normalize the time variable into 10 s intervals in each window, and the number of windows is bounded by a maximum. For a given mobile sensor, if the system sniffs a possible error at some interval, the size of the current window is extended and the fault tolerance is recalculated according to (9). As a result, samples corrupted by gross error are eliminated; otherwise, a sample is uploaded to the database once it leaves the window.
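
The control flow of such a sliding-window check can be sketched as below. The tolerance rule used here, the window mean plus or minus a multiple of the window standard deviation, is a stand-in chosen for illustration and is not the fault tolerance defined in (9); the window sizes, the multiplier, and the minimum spread are likewise assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def _is_suspect(sample: float, window: Sequence[float], k: float, min_sigma: float) -> bool:
    """True if the sample is farther than k standard deviations from the window mean."""
    sigma = max(pstdev(window), min_sigma)  # floor avoids a zero tolerance
    return abs(sample - mean(window)) > k * sigma


def filter_gross_errors(
    samples: Sequence[float],
    base_window: int = 6,
    max_window: int = 12,
    k: float = 3.0,
    min_sigma: float = 1.0,
) -> List[float]:
    """Drop samples that look like gross errors; keep the rest in order."""
    accepted: List[float] = []
    for sample in samples:
        window = accepted[-base_window:]
        if len(window) >= 3 and _is_suspect(sample, window, k, min_sigma):
            # Possible error: extend the window and recompute the tolerance.
            extended = accepted[-max_window:]
            if _is_suspect(sample, extended, k, min_sigma):
                continue  # still outside the tolerance: treat as gross error
        accepted.append(sample)
    return accepted


if __name__ == "__main__":
    readings = [50.0, 51.0, 49.0, 50.0, 52.0, 500.0, 51.0, 50.0]
    print(filter_gross_errors(readings))
```

With 10 s samples, a base window of 6 covers one minute of readings; the spike of 500 in the example is dropped while the surrounding readings are kept.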

Systematic error is either a constant bias or related to the actual value. To explore the relationship between the reference concentration collected by the 3016IAQ and the corrupted value detected by the DN7C3CA006, we refer to the linear dependence provided by [24]. The singular value decomposition of the data is then computed and partitioned into blocks.

According to the total least squares method [30], the parameter matrix can then be estimated from the partitioned decomposition.

In particular, for moving samples, we also consider the possible influences of external conditions, such as temperature, wind speed, wind direction, humidity, and traffic volume. Factor analysis is then applied to each possible influence factor and to the difference between the 3016IAQ and DN7C3CA006 readings to estimate the effect of each factor and minimize systematic error.
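
For a single-variable linear relation with an intercept, total least squares has a closed form (orthogonal regression) that gives the same line as the smallest right singular vector of the centered data matrix. The sketch below uses that closed form so it runs without a linear algebra library; treating the DN7C3CA006 reading as `x` and the 3016IAQ reference as `y` is an assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def tls_line_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by total least squares.

    Minimizes the sum of squared orthogonal distances to the line, so errors
    in both x and y are taken into account.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is not determined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept


if __name__ == "__main__":
    raw = [0.0, 1.0, 2.0, 3.0]        # hypothetical low-cost sensor readings
    reference = [1.0, 3.0, 5.0, 7.0]  # hypothetical reference readings
    print(tls_line_fit(raw, reference))
```

Unlike ordinary least squares, this fit does not assume the reference readings are error-free, which suits a comparison between two instruments.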

After the calibration process, the performance under different testing conditions is shown in Figure 12. Figure 12(a) compares the calibration result of the DN7C3CA006 sensors with the PM2.5 concentration detected by the 3016IAQ over 1058 samples in the laboratory. It shows that the two datasets have a similar trend and reflects a good Pearson correlation between the calibrated DN7C3CA006 and the 3016IAQ. Figure 12(b) shows that samples collected under practical experimental conditions have a lower correlation, but it is still considered acceptable.

(a) Calibration in laboratory

(b) Calibration for mobile samples

7.1.4. Suitable Resolution

The real testing region contains various functional areas and complex geographic structures, and it suffers from heavy air pollution on most days of the year. As a result, the concentration differs greatly between locations (in the real test, the maximum value is nearly twice the minimum value). To find the most suitable initial resolution for our algorithm, we analyze the performance of AG-PCEM with different initial resolutions. As Table 2 shows, a higher initial resolution generally improves the accuracy of AG-PCEM but also increases the computational cost. Taking both computational cost and accuracy into account, we adopt an initial resolution of 500 m × 500 m, a coarser resolution of 1000 m × 1000 m, and two finer resolutions of 250 m × 250 m and 125 m × 125 m to adapt the grid size dynamically. The initial resolution can easily be adjusted to fit different application environments.
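Because each of the four resolutions halves the cell edge of the previous one, the levels nest like a quadtree. The following sketch shows one way to move between them; the level indices (-1 for the coarser grid, 0 for the initial grid, 1 and 2 for the finer grids) and the function names are assumptions made for illustration, not the data structure used by AG-PCEM.

```python
# Cell edge lengths in metres, taken from the text.
# The integer level indices are an illustrative assumption.
CELL_SIZES_M = {-1: 1000.0, 0: 500.0, 1: 250.0, 2: 125.0}


def cell_of(x_m, y_m, level):
    """Return (level, row, col) of the cell containing a point given in metres."""
    size = CELL_SIZES_M[level]
    return (level, int(y_m // size), int(x_m // size))


def children(level, row, col):
    """Split one cell into its four sub-cells at the next finer level."""
    if level + 1 not in CELL_SIZES_M:
        raise ValueError("already at the finest resolution")
    return [(level + 1, 2 * row + dr, 2 * col + dc)
            for dr in (0, 1) for dc in (0, 1)]


def parent(level, row, col):
    """Return the enclosing cell at the next coarser level."""
    if level - 1 not in CELL_SIZES_M:
        raise ValueError("already at the coarsest resolution")
    return (level - 1, row // 2, col // 2)
```

For example, a point 600 m east and 130 m north of the grid origin falls in cell `(0, 0, 1)` on the initial 500 m grid and in cell `(2, 1, 4)` on the 125 m grid.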

7.1.5. Detection Strategy

Current official monitoring systems use an hourly detection strategy: concentration, temperature, wind power, and other meteorological factors are refreshed once an hour, and all meteorological analysis is built on this basis. Therefore, in this paper, we also assume that the influencing features remain stable within an hour. To verify this assumption, we measure the hourly variation of concentration at a fixed location with an advanced 3016IAQ sensor. As Figure 13(a) shows, the variation within an hour is only around 10%, so the hourly detection strategy can be adopted in this method. We also analyze the impact of the number of vehicles and find that 5 testing taxis cover 98.8% of all main streets and yield 86.3% accuracy, while adding more testing cars improves coverage and accuracy very little, as depicted in Figure 13(b).

(a) concentration variation at hourly temporal resolution

(b) The number of taxis and their sampled coverage

As a result, the hourly samples collected by 5 mobile carriers can be regarded as data acquired from a large-scale distributed sensor network deployed in the area. In this way, we dramatically increase the number of concentration samples in the monitored area at an extremely low cost; sensing the air every 10 ms provides a large amount of original data while keeping the sensors stable.

7.2. Concentration Influenced by Meteorological Features

We use a dataset of meteorological conditions and concentration data spanning a year (from December 10, 2014, to December 30, 2015) to determine the correlation between them; the results are shown in Table 3.

The experimental results show that meteorological features and concentration are significantly correlated; in particular, relative humidity has a strong positive impact on concentration. Furthermore, the partial correlation coefficient between each meteorological feature (except temperature) and concentration increases markedly when the influence of the other variables is controlled. This demonstrates that meteorological features act on concentration jointly and cannot be analyzed one at a time.
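The difference between a plain and a partial correlation can be made concrete with a small sketch. It computes the standard first-order partial correlation of two variables while controlling for a single third variable; the paper controls for several variables at once, so this is a simplified illustration, and the function names are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0 or vb == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(va * vb)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy = pearson_r(x, y)
    rxz = pearson_r(x, z)
    ryz = pearson_r(y, z)
    denom = math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly explained by z")
    return (rxy - rxz * ryz) / denom
```

With the toy data `x = [2, 1, 2, 5]`, `y = [0, 3, 4, 3]`, and `z = [1, 2, 3, 4]`, the plain correlation of `x` and `y` is weak (about 0.11), yet the partial correlation controlling for `z` is -1, which shows how a shared driver can mask the relationship between two variables.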

7.3. Dataset

7.3.1. POIs Data

We collect the concentration, temperature, humidity, and wind power data of POIs to supplement the training datasets, and we use the measured concentration to evaluate the inference accuracy offline. To ensure the diversity of samples, we deliberately select representative places in the testing region as POIs, including universities (1), residential areas (2), a commercial district (3), an industrial zone (4), and an expressway junction (5), as shown in Figure 14(a). We use four advanced 3016IAQ sensors with temperature/relative humidity probes and four wind power sensors [31], shown in Figures 14(b) and 14(c), and deploy them simultaneously at four places located in four neighboring grids of the grid map (denoted by red dots in Figure 14(a)).

(a) Selection of POIs

(b) 3016IAQ

(c) Wind power sensor

7.3.2. Traffic Status Data

We collect traffic status data with a web crawler from a public traffic status website.

7.3.3. Meteorological Data

We collect meteorological data, consisting of temperature, humidity, weather, and wind power, every hour from a public website that reports the readings of the nearest official station [32].

7.3.4. Geographical Data

Geographical data is mainly used online to map the original concentration data collected by mobile sensors to grids and offline to calculate the location feature. We collect geographical data with a GPS module that has the same sampling frequency as the sensor. The controller module combines the geographical data and the concentration data into a single record for transmission.
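A minimal sketch of the online mapping step is given below, assuming that each transmitted record carries a Unix timestamp, a latitude, a longitude, and a concentration value. The record layout, the grid origin `(lat0, lon0)`, the equirectangular projection, and the function names are all illustrative assumptions rather than details taken from the system.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, adequate for city-scale distances


def to_local_xy(lat, lon, lat0, lon0):
    """Equirectangular approximation: metres east and north of (lat0, lon0)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def grid_hourly_means(samples, lat0, lon0, cell_m=500.0):
    """Average concentration per (hour index, grid row, grid column).

    samples: iterable of (unix_time_s, lat, lon, concentration).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for t, lat, lon, conc in samples:
        x, y = to_local_xy(lat, lon, lat0, lon0)
        key = (int(t // 3600), math.floor(y / cell_m), math.floor(x / cell_m))
        acc = sums[key]
        acc[0] += conc
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Grouping by hour index matches the hourly detection strategy assumed in Section 7.1.5; grids that receive no samples simply do not appear in the result and would be left for the inference step.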

8. Experimental Results

In this section, we evaluate AG-PCEM in terms of its offline learning accuracy and its online inference accuracy and computational cost.

8.1. Performance of Offline Learning

We randomly choose 512 grid samples from a large amount of historical data; each sample is described by a set of attributes , , , , , , . We use , , , , as the input of the ANN and compare the output to .

Figure 15 shows that the ANN performs well in learning the grid refinement level. The accuracy of the learned result is 93%, and the false positive rate (FPR) and false negative rate (FNR) are 11.1% and 19.5%, respectively.
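For reference, the three metrics can be computed from predicted and true labels as in the sketch below. It assumes a binary labeling (for example, 1 for "refine this grid" and 0 for "do not refine"); how the paper reduces the refinement level to positives and negatives is not stated, so this labeling and the function name are assumptions.

```python
def binary_rates(y_true, y_pred):
    """Return accuracy, false positive rate, and false negative rate for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equally long, non-empty label sequences")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),  # share of true negatives predicted positive
        "fnr": fn / (fn + tp),  # share of true positives predicted negative
    }
```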

8.2. Performance of Online Inference

To validate the performance of our algorithm, we use the original concentration data detected by the sensors on August 26, 2015, as the evaluation sample and randomly choose 100 monitoring locations as testing points. The algorithm was tested on a 64-bit server with a Core™ 3.30 GHz CPU and 4 GB of RAM. We run two parallel experiments: one compares the performance of AG-PCEM with that of PCEM at different fixed grid resolutions, and the other compares the accuracy of AG-PCEM with that of other widely used methods.

AG-PCEM and PCEM. Table 4 shows the performance of AG-PCEM and of PCEM with different fixed grid resolutions. The results demonstrate that AG-PCEM performs well in terms of both accuracy and computational cost. Compared with PCEM at the same initial resolution (500 m × 500 m), AG-PCEM achieves much better inference accuracy at an acceptable computational cost. Compared with PCEM at 250 m × 250 m resolution, AG-PCEM reduces the computational cost by about 40.2% with very similar accuracy. Although PCEM at a high resolution of 200 m × 200 m improves inference accuracy, the gain is too small to justify the multiplied computational cost; we also find that an excessively high resolution of 100 m × 100 m actually reduces the accuracy of PCEM because of poor data coverage (only 8.71% of grids contain original data in this case).
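The data coverage effect mentioned above can be illustrated with a short sketch that counts the fraction of grid cells containing at least one sample for a given cell size. The rectangular region, the local metric coordinates, and the function name are assumptions for illustration; the toy numbers are unrelated to the 8.71% reported in the experiment.

```python
import math


def coverage(points_xy, width_m, height_m, cell_m):
    """Fraction of grid cells that contain at least one sample point.

    points_xy: iterable of (x, y) positions in metres from the region origin.
    """
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(height_m / cell_m)
    occupied = set()
    for x, y in points_xy:
        # Ignore points that fall outside the monitored region.
        if 0 <= x < width_m and 0 <= y < height_m:
            occupied.add((int(y // cell_m), int(x // cell_m)))
    return len(occupied) / (rows * cols)
```

With three sample points at (10, 10), (260, 10), and (900, 900) in a 1000 m × 1000 m region, coverage is 0.5 at 500 m cells but only 0.1875 at 250 m cells: the same samples leave a larger share of grids empty as the grid becomes finer.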

Accuracy of AG-PCEM. We also compare our system with several widely used methods: classical multivariable linear regression (MLR), artificial neural network (ANN), and Gaussian process (GP). Figure 16 shows the inference accuracy of each method; the average estimation error of our system is about 42.9%, 38.8%, and 14.6% lower than that of MLR, ANN, and GP, respectively.

Visualization. Figure 17 shows the heat map of concentration in the testing area. It demonstrates that, at the same moment, concentration differs considerably between locations in the testing region, which is valuable for helping the authorities locate pollution sources. Citizens can also conveniently obtain information about their immediate environment through the applications, as shown in Figure 18.

(a) Immediate information on website

(b) Mobile terminal

9. Conclusions

In this paper, we have proposed a mobile sensing system to collect data in the city and presented AG-PCEM, which infers the concentration for undetected grids using dynamic adaptive grids. Our system can provide a precise concentration distribution for citizens and help the authorities find pollution sources.

A prototype system has been built and deployed in the real world for over a year, and it was tested with 5 taxis from October 11, 2014, to November 25, 2015. The results show that the proposed system achieves low computational cost and high accuracy.

As the first system providing urban air quality monitoring with adaptive resolution, our system can provide a deeper understanding of concentration, and it can easily be extended to wider-range air quality monitoring.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by National Natural Science Foundation of China under Grant nos. 61190113, 61401135, and 61471150.

References

[1] J. LePeule, F. Laden, D. Dockery, and J. Schwartz, “Chronic exposure to fine particles and mortality: an extended follow-up of the Harvard six cities study from 1974 to 2009,” Environmental Health Perspectives, vol. 120, no. 7, pp. 965–970, 2012.
[2] Hangzhou, http://dwz.cn/ft4lx.
[3] Y. Cheng, X. Li, Z. Li et al., “Aircloud: a cloud-based air-quality monitoring system for everyone,” in Proceedings of the 12th ACM Conference on Embedded Networked Sensor Systems, Memphis, Tenn, USA, November 2014.
[4] Y. Zheng, F. Liu, and H.-P. Hsieh, “U-air: when urban air quality inference meets big data,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Ill, USA, August 2013.
[5] D. Hasenfratz, O. Saukh, C. Walser et al., “Deriving high-resolution urban air pollution maps using mobile sensor nodes,” Pervasive and Mobile Computing, vol. 16, pp. 268–285, 2015.
[6] Y. Hu, J. Fan, H. Zhang, X. Chen, and G. Dai, “An estimated method of urban PM2.5 concentration distribution for a mobile sensing system,” Pervasive and Mobile Computing, pp. 88–103, 2015.
[7] A. Tomlin, M. Berzins, J. Ware, J. Smith, and M. J. Pilling, “On the use of adaptive gridding methods for modelling chemical transport from multi-scale sources,” Atmospheric Environment, vol. 31, no. 18, pp. 2945–2959, 1997.
[8] I. Lagzi, D. Kármán, T. Turányi, A. S. Tomlin, and L. Haszpra, “Simulation of the dispersion of nuclear contamination using an adaptive Eulerian grid model,” Journal of Environmental Radioactivity, vol. 75, no. 1, pp. 59–82, 2004.
[9] I. Lagzi, T. Turányi, A. S. Tomlin, and L. Haszpra, “Modelling photochemical air pollutant formation in Hungary using an adaptive grid technique,” International Journal of Environment and Pollution, vol. 36, no. 1–3, pp. 44–58, 2009.
[10] A. S. Tomlin, S. Ghorai, G. Hart, and M. Berzins, “3-D multi-scale air pollution modelling using adaptive unstructured meshes,” Environmental Modelling and Software, vol. 15, no. 6-7, pp. 681–692, 2000.
[11] S. Ghorai, A. S. Tomlin, and M. Berzins, “Resolution of pollutant concentrations in the boundary layer using a fully 3D adaptive gridding technique,” Atmospheric Environment, vol. 34, no. 18, pp. 2851–2863, 2000.
[12] R. K. Srivastava, D. S. McRae, and M. T. Odman, “An adaptive grid algorithm for air-quality modeling,” Journal of Computational Physics, vol. 165, no. 2, pp. 437–472, 2000.
[13] F. Garcia-Menendez, A. Yano, Y. Hu, and M. Talat Odman, “An adaptive grid version of CMAQ for improving the resolution of plumes,” Atmospheric Pollution Research, vol. 1, no. 4, pp. 239–249, 2010.
[14] United States Environmental Protection Agency (USEPA), https://www.cmascenter.org/cmaq.
[15] E. M. Constantinescu, A. Sandu, and G. R. Carmichael, “Modeling atmospheric chemistry and transport with dynamic adaptive resolution,” Computational Geosciences, vol. 12, no. 2, pp. 133–151, 2008.
[16] G. R. Carmichael, L. K. Peters, and R. D. Saylor, “The STEM-II regional scale acid deposition and photochemical oxidant model - I. An overview of model development and applications,” Atmospheric Environment Part A: General Topics, vol. 25, no. 10, pp. 2077–2090, 1991.
[17] Ministry of Environmental Protection of the People's Republic of China, “Technical regulation on ambient air quality index,” HJ 633-2012, 2012.
[18] K. Pearson, “The problem of random walk,” Nature, vol. 72, p. 294, 1905.
[19] O. Raaschou-Nielsen, Z. J. Andersen, B. Beelen et al., “Air pollution and lung cancer incidence in 17 European cohorts: prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE),” The Lancet Oncology, vol. 14, no. 9, pp. 799–805, 2013.
[20] C. A. Pope III, R. T. Burnett, M. J. Thun et al., “Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution,” The Journal of the American Medical Association, vol. 287, no. 9, pp. 1132–1141, 2002.
[21] R. M. Harrison, D. J. T. Smith, and A. J. Kibble, “What is responsible for the carcinogenicity of PM2.5?” Occupational and Environmental Medicine, vol. 61, no. 10, pp. 799–805, 2004.
[22] S. Vardoulakis, B. E. A. Fisher, K. Pericleous, and N. Gonzalez-Flesca, “Modelling air quality in street canyons: a review,” Atmospheric Environment, vol. 37, no. 2, pp. 155–182, 2003.
[23] M. W. Gardner and S. R. Dorling, “Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences,” Atmospheric Environment, vol. 32, no. 14-15, pp. 2627–2636, 1998.
[24] Sharp, Device Specification for PM2.5 Sensor Module, Electronic Components and Devices Division, Sharp Corporation, 2014.
[25] Lighthouse, http://www.golighthouse.nl/en/indoor-air-quality.
[26] X. Xu, P. Zhang, and L. Zhang, “Gotcha: a mobile urban sensing system,” in Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems (SenSys '14), pp. 316–317, Memphis, Tenn, USA, November 2014.
[27] K. Schäfer, “Introduction to the theory of error, von Yardley Beers. Addison-Wesley Publish. Comp. INC., Cambridge 42 Mass. 1953. 1. Aufl. VI, 65 S., brosch. $ 1.25,” Angewandte Chemie, vol. 67, no. 16, pp. 432–432, 1955.
[28] P. Vachhani, R. Rengaswamy, and V. Venkatasubramanian, “A framework for integrating diagnostic knowledge with nonlinear optimization for data reconciliation and parameter estimation in dynamic systems,” Chemical Engineering Science, vol. 56, no. 6, pp. 2133–2148, 2001.
[29] D. M. Prata, M. Schwaab, E. L. Lima, and J. C. Pinto, “Simultaneous robust data reconciliation and gross error detection through particle swarm optimization for an industrial polypropylene reactor,” Chemical Engineering Science, vol. 65, no. 17, pp. 4943–4954, 2010.
[30] I. Markovsky and S. Van Huffel, “Overview of total least-squares methods,” Signal Processing, vol. 87, no. 10, pp. 2283–2302, 2007.
[31] Wind Power Sensor, http://www.smartsensor.cn.
[32] China Weather Reuters, http://www.weather.com.cn.

Copyright

Copyright © 2016 Hongjie Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
