Abstract

Air pollution is a severe problem confronted by many developing countries. Because there are very few air quality monitoring stations in cities, it is difficult for people to know the exact air quality level anytime and anywhere. Fortunately, a large number of surveillance cameras have been deployed and can capture images densely and conveniently. This makes it possible to use surveillance cameras as sensors to obtain data and predict the air quality level. To this end, we present a novel air quality level inference approach based on outdoor images. First, we explore several features extracted from images as a robust representation for air quality prediction. Then, to effectively fuse these heterogeneous and complementary features, we adopt multikernel learning to learn an adaptive classifier for air quality level inference. In addition, to facilitate the research, we construct the Outdoor Air Quality Image Set (OAQIS), a dataset that contains high quality registered and calibrated images with rich labels, that is, concentration of particulate matter (PM), weather, temperature, humidity, and wind. Extensive experiments on the OAQIS dataset demonstrate the effectiveness of the proposed approach.

1. Introduction

Over the last few decades, many developing countries have undergone dramatic urbanization and industrialization at an unprecedented scale. In China, more than 120 cities now have populations exceeding one million. This rapid growth in such a short period of time has caused serious and complicated air pollution in China [1]. Due to the high cost of construction and maintenance, air quality monitoring stations cannot be placed on every block of a city. Figure 1(a) shows the distribution of air quality monitoring stations in Beijing (6336 square miles). In addition, there are only 10 air quality monitoring stations in Shanghai, which is the most populous city in China and has the most populous urban area in the world. Moreover, air quality varies nonlinearly in space, so the effective range of an air quality monitoring station is limited. With such sparse monitoring stations, we can hardly know the exact air quality on each block of a metropolis, so obtaining air quality quickly and conveniently has attracted much attention.

Many existing methods are based on satellite remote sensing technologies [2–6]. However, these methods can only reflect the air quality of the upper atmosphere, which differs considerably from air quality at ground level. Recently, some works have focused on air quality inference from massive sensing data [7–11]. These works achieve good results at the cost of running time-consuming, complex algorithms. Moreover, the massive sensing data used in these works are difficult to obtain.

With the development of the Internet of Things (IoT) [12, 13], various sensors such as smart phones and cameras play important roles in urban sensing [14]. Unlike the limited number of monitoring stations, there are large numbers of surveillance and traffic cameras in many cities, especially in urban areas. For example, Figure 1(b) illustrates 3254 cameras in Beijing that are available in Sougo Map [15]. Furthermore, there are more than 400 thousand public cameras in Beijing [16], that is, nearly 20000 times more than monitoring stations, let alone the cameras in buildings. Therefore, we present a convenient and efficient air quality level inference approach based on multiple features and multiple kernel learning from single images captured by surveillance cameras. We first extract several features such as dark channel, medium transmission, sky color, power spectrum slope, contrast, and saturation from single images. In our previous work [17], we proposed an approach for air quality inference from images based on an air quality index decision tree and SVM. To effectively fuse these heterogeneous and complementary features, in this paper, we utilize multikernel learning to learn an adaptive classifier based on multiple kernels. In addition, we build the Outdoor Air Quality Image Set (OAQIS), a dataset that contains high quality registered and calibrated images captured by surveillance cameras. The dataset covers a wide range of daylight illumination and air pollution conditions. Each image in OAQIS has various data labels, including the concentration of particulate matter (PM2.5), weather, temperature, humidity, and wind, which are related to the air quality level.

To summarize, we develop an approach for air quality level inference from a single image with the following contributions:
(i) We propose a novel air quality level inference approach from a single image based on multiple features and multiple kernel learning.
(ii) We improve some existing algorithms for obtaining the features and adopt multiple kernel learning for air quality level inference.
(iii) We construct a dataset of high quality registered and calibrated images, which covers a wide range of daylight illumination and air quality conditions. It has potential value for image processing and atmospheric sciences and can be used as a test bed for many algorithms.

The rest of this paper is organized as follows. Section 2 briefly reviews previous approaches. Section 3 presents the corresponding image features and their processing. In Section 4, multiple kernel learning is used to train an adaptive classifier that effectively fuses these heterogeneous and complementary features. Next, Section 5 describes the dataset we use in this work and compares our approach with traditional classification methods. Finally, a brief conclusion and future work are given in Section 6.

2. Related Work

In this section, we discuss existing works relevant to this research topic. Satellite remote sensing technologies have been widely used for estimating atmospheric air quality. Van Donkelaar et al. [2] estimated ground-level PM2.5 from January 2001 to October 2002 using space-based measurements from the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Multiangle Imaging Spectroradiometer (MISR) satellite instruments, together with additional information from a global chemical transport model (GEOS-CHEM). Liu et al. [3] used a generalized linear regression model to examine the relationship between ground-level PM2.5 measurements and aerosol optical thickness from MISR measurements in the eastern United States. Lamsal et al. [4] presented an approach to infer ground-level nitrogen dioxide (NO2) concentrations by applying local scaling factors from GEOS-CHEM to tropospheric NO2 columns retrieved from the Ozone Monitoring Instrument (OMI) on board the Aura satellite. Li et al. [6] proposed a haze optical thickness retrieval model based on the assumption that surface reflectance varies slowly over a relatively short period; the model could monitor the haze distribution and intensity for the Beijing Olympic Games and help the Beijing municipal government take further measures to improve air quality. Moumtzidou et al. [18] proposed a configurable semiautomatic framework for processing air quality and pollen forecast heatmaps. They integrated several existing environmental quality forecast data extraction tools with text processing and OCR (Optical Character Recognition) techniques tailored for heatmap analysis.

Recently, some works have focused on air quality inference from massive sensing data. Zheng et al. [7, 8] inferred air quality information based on the historical and real-time air quality data reported by existing monitoring stations and a variety of data sources such as meteorology, traffic flow, human mobility, road network structure, and points of interest. Chen et al. [9] proposed a big spatiotemporal data framework for the analysis of severe smog in China. They collected about 35,000,000 detailed historical and real-time air quality records (containing the concentrations of PM2.5 and air pollutants including SO2, CO, NO2, O3, and PM10) and 30,000,000 meteorological records in 77 major cities of China through air quality and weather stations, and conducted scalable correlation analysis to find possible short-term and long-term factors affecting PM2.5. Li et al. [19] estimated 9 haze levels and 3 PM2.5 levels based on images. For each image, they computed the transmission matrix and the depth map via established methods and then analyzed the correlation between the haze levels and the official PM2.5 records. However, the correlation between haze levels and PM2.5 is not stable due to the influence of weather and illumination. Hasenfratz et al. [10] developed land-use regression models to create pollution maps based on a publicly available, spatially resolved ultrafine particle dataset containing over 25 million measurements. The measurements were collected over more than one year using mobile sensor nodes installed on top of public transport vehicles in the city of Zurich. Devarakonda et al. [11] proposed a method for air quality estimation from social media posts, such as Weibo text content, based on a series of progressively more sophisticated machine learning models.

There are also some vehicle-based works on collecting or measuring air quality. For example, Mei et al. [20] presented a vehicular-based mobile approach for measuring fine-grained air quality in real time, and Al-Ali et al. [21] designed a wireless distributed mobile air pollution monitoring system which utilized city buses to collect pollutant gases.

However, existing methods are usually computationally time-consuming, or the data they rely on are difficult and expensive to collect. Aiming at convenient and efficient air quality level inference, we extract multiple features from a single image and utilize multiple kernel learning to learn an adaptive classifier.

3. Feature Extraction

To infer air quality from images, we first extract several features. In traditional image classification tasks, the commonly used mid-level features such as SIFT, HOG, and LBP cannot describe the subtle differences between images captured in the same scene. Even worse, some low-level features, such as color, also fail to distinguish such subtle differences: different images may have the same pixel intensity in color spaces [22]. Unlike traditional methods, we adopt five discriminative features, namely, dark channel and medium transmission, sky color, power spectrum slope, contrast, and saturation, which are derived by analyzing visual and spectral cues in the images. These features are more suitable for the task of air quality inference from images.

3.1. Dark Channel and Medium Transmission

Particulate matter is one of the main sources of air pollution. During transmission, light intensity is attenuated by particulate matter scattering. Figure 2 demonstrates the process of atmospheric light scattering and attenuation. In computer vision and computer graphics, the model widely used [23, 24] to describe the formation of a haze image is

$I(x) = J(x)\,t(x) + A\,(1 - t(x))$,

where $I(x)$ denotes the observed intensity, $J(x)$ denotes the scene radiance, $A$ denotes the atmospheric light, and $t(x)$ denotes the medium transmission describing the portion of the light that is not scattered and reaches the camera.

Haze is an atmospheric phenomenon in which dust, smoke, and other particulate matter obscure the clarity of the sky. The concentration of haze can reflect the air quality level. He et al. [25] found that most local patches in haze-free outdoor images contain some pixels which have very low intensities in at least one color channel. That is to say, the minimum intensity in such a patch should have a very low value. Accordingly, we compute the dark channel as

$J^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \left( \min_{c \in \{r,g,b\}} J^{c}(y) \right)$,

where $J^{c}$ is a color channel of the input image $J$ and $\Omega(x)$ is a local patch centered at $x$. Then, we can estimate the medium transmission by

$\tilde{t}(x) = 1 - \omega \min_{y \in \Omega(x)} \left( \min_{c} \frac{I^{c}(y)}{A^{c}} \right)$,

where $\omega$ is a constant parameter ($0 < \omega \leq 1$) that keeps a small amount of haze for distant objects, as in [25].

In our experiment, we resize the input images to 450 × 450 pixels, and the patch size is set to 45 × 45, which yields a 10 × 10 grid of 100 patches per image. We use a 101-dimensional feature vector to indicate the haze level: 100 dimensions for the median dark channel intensity of each patch and 1 dimension for the medium transmission of the image.
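The Python sketch below illustrates this feature (the paper's own implementation was in Matlab; see Section 5.4). The non-overlapping 10 × 10 patch layout, the atmospheric light estimate, the value of ω, and summarizing the image transmission by its mean are implementation assumptions based on the text above and on He et al. [25].

```python
# Minimal sketch (not the authors' code) of the 101-d dark channel /
# medium transmission feature.
import numpy as np
import cv2

def dark_channel_feature(img_bgr, size=450, patch=45, omega=0.95):
    img = cv2.resize(img_bgr, (size, size)).astype(np.float64) / 255.0
    # Dark channel: per-pixel minimum over color channels, then a
    # minimum filter over the local patch Omega(x).
    min_rgb = img.min(axis=2)
    kernel = np.ones((patch, patch), np.uint8)
    dark = cv2.erode(min_rgb, kernel)
    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels.
    n_top = max(1, int(0.001 * size * size))
    idx = np.argsort(dark.ravel())[-n_top:]
    A = img.reshape(-1, 3)[idx].mean(axis=0)
    # Medium transmission estimate following He et al. [25].
    t = 1.0 - omega * cv2.erode((img / A).min(axis=2), kernel)
    # 100-d: median dark channel value of each 45 x 45 patch; +1-d: mean transmission.
    blocks = dark.reshape(size // patch, patch, size // patch, patch)
    medians = np.median(blocks, axis=(1, 3)).ravel()
    return np.concatenate([medians, [t.mean()]])   # 101-d feature vector
```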

3.2. Sky

The sky might be the most obvious cue for air pollution in images. As shown in Figure 3, sunny images usually have a clear blue sky, while the sky appears gray in a hazy image. We first detect the sky region in an image with the method suggested in [22, 26]: we collect 20000 sky and non-sky patches, each of size 15 × 15, and for each patch we extract a 131-dimensional feature containing the SIFT descriptor and the mean HSV color. A random forest classifier is then learned on the two classes of patches. For an input image, we uniformly sample 15 × 15 patches and predict their labels (sky or non-sky). The sky region can then be segmented by applying graph cuts to those patches. Finally, we extract the A and B channels of the sky region in the LAB color space to form a 200-dimensional feature vector.
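A hedged sketch of the final step (the 200-dimensional sky color descriptor) is given below. The per-channel 100-bin histogram is one plausible reading of how the A and B channels are summarized, not a detail stated in the text, and sky_mask stands in for the output of the patch classifier and graph cut segmentation, which are not reproduced here.

```python
# Sketch of the sky color feature: histograms of the A and B channels
# over the detected sky region.
import numpy as np
import cv2

def sky_color_feature(img_bgr, sky_mask, bins=100):
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    feat = []
    for chan in (lab[:, :, 1], lab[:, :, 2]):          # A and B channels
        vals = chan[sky_mask > 0]
        hist, _ = np.histogram(vals, bins=bins, range=(0, 255))
        feat.append(hist / max(vals.size, 1))          # normalized histogram
    return np.concatenate(feat)                        # 200-d: 100 bins for A + 100 for B
```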

3.3. Power Spectrum Slope

As air quality decreases, the captured image becomes blurred, as if captured at a lower resolution. Due to the low-pass-filtering characteristic of a blurred region, some high frequency components are lost, so the amplitude spectrum of a blurred region tends to fall off more steeply than that of an unblurred region. To analyze this effect, we extract the power spectrum slope suggested in Liu et al.'s work [27]. First, we compute the power spectrum of an image of size $M \times N$ by taking the squared magnitude of its Discrete Fourier Transform (DFT):

$S(u, v) = \frac{1}{MN} \left| F(u, v) \right|^{2}$,

where $F(u, v)$ denotes the DFT of the image. We represent the two-dimensional frequency in polar coordinates, $u = f \cos\theta$ and $v = f \sin\theta$, where $f$ denotes the radius of the power spectrum image and $\theta$ denotes the angle of the polar coordinates, and we construct $S(f, \theta)$. By summing the power spectra over all directions in polar coordinates, $S(f)$ can be approximated by

$S(f) \approx \sum_{\theta} S(f, \theta)$.

Burton and Moorhead [28] found that $S(f)$ approximately follows a power-law function of $f$, so the power spectrum slope $\alpha$ can be obtained from

$S(f) \approx \frac{A}{f^{\alpha}}$,

where $A$ denotes a constant; in practice, $\alpha$ is estimated by a linear fit of $\log S(f)$ against $\log f$.
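A minimal sketch of this feature under the formulation above follows; the log-log line fit used to estimate α is a common implementation choice rather than a detail stated in the paper.

```python
# Power spectrum slope: radially sum the power spectrum and fit a line
# in log-log space; the negated slope is alpha.
import numpy as np

def power_spectrum_slope(gray):
    gray = gray.astype(np.float64)                  # 2-D grayscale image
    M, N = gray.shape
    # S(u, v) = |F(u, v)|^2 / (M * N), centered at zero frequency.
    F = np.fft.fftshift(np.fft.fft2(gray))
    S = (np.abs(F) ** 2) / (M * N)
    # Integer radial frequency f of every (u, v) bin.
    v, u = np.meshgrid(np.arange(N) - N // 2, np.arange(M) - M // 2)
    f = np.sqrt(u ** 2 + v ** 2).astype(int)
    # S(f): sum of the power spectrum over all directions for each radius.
    fmax = min(M, N) // 2
    Sf = np.bincount(f.ravel(), weights=S.ravel())[1:fmax]
    freqs = np.arange(1, fmax)
    # Fit log S(f) = log A - alpha * log f; the fitted slope gives -alpha.
    alpha = -np.polyfit(np.log(freqs), np.log(Sf + 1e-12), 1)[0]
    return alpha
```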

3.4. Contrast

Various kinds of particulate matter are present in the atmosphere, and light intensity is attenuated during transmission because of them. Therefore, the same scene under different air pollution conditions exhibits different contrast. The Michelson contrast is commonly used for patterns in which bright and dark features are equivalent and take up similar fractions of the area.

However, the Michelson contrast does not account for errors caused by noisy pixels. We therefore compute the contrast as the Root Mean Square (RMS) contrast [29]:

$C_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( L_{i} - \bar{L} \right)^{2}}$,

where $L_{i}$ denotes the luminance of the $i$th pixel of the image, $\bar{L}$ denotes the mean luminance, and $N$ denotes the number of pixels in the image. The contrast of an image thus forms a one-dimensional feature.
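A short sketch of this computation; taking the luminance from the grayscale image normalized to [0, 1] is an assumption.

```python
# RMS contrast of an image, i.e., the standard deviation of its luminance.
import numpy as np
import cv2

def rms_contrast(img_bgr):
    lum = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64) / 255.0
    return float(np.sqrt(np.mean((lum - lum.mean()) ** 2)))   # 1-d feature
```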

3.5. Normalized Saturation

We also consider the color information of images for air pollution inference. Since saturation is largely independent of illumination, it can characterize images captured under various illumination conditions. For an image $I$, we calculate the normalized saturation of each pixel by

$\hat{S}(x) = \frac{S(x) - S_{\min}}{S_{\max} - S_{\min}}$,

where $S_{\max}$ is the maximum saturation value and $S_{\min}$ is the minimum saturation value of image $I$. For convenience in the following steps, we compute a 10-bin histogram of the normalized saturation of an image to form a 10-dimensional feature vector.
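A sketch of this 10-dimensional feature, assuming the HSV saturation channel and uniform bins over [0, 1]:

```python
# 10-bin histogram of min-max normalized saturation.
import numpy as np
import cv2

def saturation_histogram(img_bgr, bins=10):
    s = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)[:, :, 1].astype(np.float64)
    s_min, s_max = s.min(), s.max()
    s_norm = (s - s_min) / max(s_max - s_min, 1e-12)   # normalized saturation in [0, 1]
    hist, _ = np.histogram(s_norm, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()                           # 10-d feature vector
```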

4. Multiple Kernel Learning

Several challenges arise during classification, for example, how to combine the features, how to choose suitable kernels, and how to set the kernel parameters. Simple classifiers cannot handle these challenges well, so a proper feature selection and fusion method is required to adapt the model to this specific problem. To effectively fuse these heterogeneous and complementary features, which capture different notions, we utilize multiple kernel learning [30] to learn an adaptive classifier over multiple kernels. Instead of creating a new kernel, multiple kernel learning combines kernels already established for each individual feature. Moreover, it can select an optimal kernel and its parameters from a larger set of kernels, reducing the bias due to kernel selection.

Let $D = \{(x_i, y_i)\}_{i=1}^{n}$ be the training image dataset, where $x_i$ denotes the $i$th sample, $y_i$ denotes the corresponding class label, and $n$ is the number of training images. We aim to train a multikernel based classifier with a decision function $f(x)$ to predict the air quality level of an unlabeled image $x$. In this paper, we linearly combine $M$ base kernel functions to determine an optimal kernel function:

$K(x_i, x_j) = \sum_{m=1}^{M} \beta_m K_m(x_i, x_j)$,

where $\beta_m$ is one of the linear combination coefficients, $\beta_m \geq 0$, and $\sum_{m=1}^{M} \beta_m = 1$. Given the input feature $x$ of the image, the decision function is defined as

$f(x) = \sum_{i=1}^{n} \alpha_i y_i \sum_{m=1}^{M} \beta_m K_m(x_i, x) + b$,

where $\alpha_i$ and $b$ are the parameters of the standard SVM.

In this paper, we adopt simple multiple kernel learning (SimpleMKL) [31], so the objective function can be formulated as

$\min_{\beta} J(\beta) \quad \text{s.t.} \quad \sum_{m=1}^{M} \beta_m = 1, \ \beta_m \geq 0$,

with

$J(\beta) = \max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \sum_{m=1}^{M} \beta_m K_m(x_i, x_j) \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C, \ \sum_{i=1}^{n} \alpha_i y_i = 0$,

in which $C$ is the regularization parameter of the SVM and $L$ is the number of air quality levels. We adopt a gradient descent algorithm to solve this optimization problem. In each iteration, $\alpha$ and $b$ are obtained for the given weights $\beta$; then $\beta$ is updated using $\alpha$ and $b$. We adopt the one-against-all strategy to transform the multiclass classification into two-class classification problems. If there are $L$ classes, the objective above is solved separately for each class $l$ to obtain decision functions $f_l(x)$, $l = 1, \ldots, L$, where $f_l$ is a two-class classifier whose positive samples are the samples with class label $l$ and whose negative samples are the samples with the other labels. We can then obtain the class label by

$\hat{y} = \arg\max_{l \in \{1, \ldots, L\}} f_l(x)$.
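The sketch below illustrates the alternating optimization in a simplified form: train an SVM on the weighted sum of precomputed base kernels, take a gradient step on the kernel weights, and project back onto the simplex. It is not the authors' implementation; it uses scikit-learn and a plain clip-and-renormalize projection instead of SimpleMKL's reduced-gradient update, and the learning rate and iteration count are illustrative.

```python
# Simplified binary MKL trainer over precomputed base kernels.
import numpy as np
from sklearn.svm import SVC

def train_mkl_binary(kernels, y, C=1.0, lr=0.1, iters=50):
    """kernels: list of (n, n) base kernel matrices; y: labels in {-1, +1}."""
    M = len(kernels)
    beta = np.full(M, 1.0 / M)
    for _ in range(iters):
        K = sum(b * Km for b, Km in zip(beta, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        sv = svm.support_
        coef = svm.dual_coef_.ravel()              # alpha_i * y_i for support vectors
        # dJ/dbeta_m = -0.5 * sum_{i,j} alpha_i alpha_j y_i y_j K_m(x_i, x_j)
        grad = np.array([-0.5 * coef @ Km[np.ix_(sv, sv)] @ coef for Km in kernels])
        beta = np.clip(beta - lr * grad, 0.0, None)
        # Crude projection onto {beta >= 0, sum(beta) = 1}.
        beta = beta / beta.sum() if beta.sum() > 0 else np.full(M, 1.0 / M)
    K = sum(b * Km for b, Km in zip(beta, kernels))
    return beta, SVC(C=C, kernel="precomputed").fit(K, y)
```

At test time, the decision value is obtained by calling decision_function on the weighted combination of the test-to-training base kernels; a one-against-all wrapper over L such binary models then picks the class with the largest decision value.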

Compared with traditional classifiers, multiple kernel learning can improve accuracy by learning an optimal kernel combination. By comparing the weights of the different kernels, we can keep the effective features and abandon the less effective ones, thereby reducing the time spent on feature computation.

5. Dataset and Experiments

5.1. Dataset

We construct a dataset of high quality registered and calibrated images, named the Outdoor Air Quality Image Set (OAQIS), to evaluate our approach. All of the images are captured in Beijing, which suffers from serious air pollution. The dataset contains two scenes. Scene A is captured automatically from 8:00 am to 6:00 pm by a camera installed in No. 3 Teaching Building of BUPT, as shown in Figure 4(a). Scene B is captured at random times from 9:00 am to 3:00 pm by a traffic camera fixed on the first pedestrian bridge on the east side of the Jimen flyover, as shown in Figure 4(b). The dataset covers a wide range of daylight illumination and air pollution conditions. The spatial resolution of each image is 1280 × 720 pixels.

5.2. Ground Truth

We have tagged our images with a variety of ground truth information. The most important categories of ground truth we collected are the concentration of particulate matter (PM2.5) in the air and the weather information.

We collect the concentration of particulate matter in the air using two air quality monitors made by Suzhou Beiang Technology Co., Ltd., and Zhongzhiwuxian Trade Co., Ltd., as shown in Figure 5. We also collect illumination intensity with an optical sensor for potential future use. We gather PM2.5 readings every five minutes within each hour and compute their average value.

We then set this average as the ground truth for all images captured in that hour. We automatically collect standard weather information from the Weather China website [32], including the weather condition (sunny, cloudy, overcast, misty, foggy, hazy, rainy, snowy, etc.), temperature, humidity, and wind, as shown in Figure 6.
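For illustration, the hourly labeling step could be written as follows; the file and column names are hypothetical.

```python
# Average 5-minute PM2.5 readings per hour and use the hourly mean as the
# label of every image captured in that hour.
import pandas as pd

readings = pd.read_csv("pm25_readings.csv", parse_dates=["timestamp"])   # hypothetical file
hourly = readings.set_index("timestamp")["pm25"].resample("1H").mean()

def pm25_label(image_time):
    """Hourly mean PM2.5 used as the label of an image captured at image_time."""
    return hourly.loc[pd.Timestamp(image_time).floor("1H")]
```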

Air quality index (AQI) is affected by several atmospheric pollutants. PM2.5 is an important measure of AQI, so we use PM2.5 to indicate AQI level. In many countries, AQI is divided into six levels indicating increasing health concern as shown in Figure 7. An AQI value over 300 means hazardous air quality, whereas if it is below 50 the air quality is good.
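For reference, the level banding can be expressed as a simple lookup. The thresholds below follow the standard six-level AQI scale, which is consistent with the levels and colors referred to in Figures 7 and 8, but Figure 7 in the paper remains the authoritative table.

```python
# Map an AQI value to one of the six levels of increasing health concern.
def aqi_level(aqi):
    bands = [(50, "Good"), (100, "Moderate"), (150, "Unhealthy for Sensitive Groups"),
             (200, "Unhealthy"), (300, "Very Unhealthy")]
    for upper, name in bands:
        if aqi <= upper:
            return name
    return "Hazardous"   # AQI over 300
```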

5.3. Experimental Results

We evaluate our approach on OAQIS, and the performance on the dataset demonstrates the effectiveness of the proposed approach. In our experiment, we fuse the features to form a 313-dimensional vector for each image. In MKL, we use 5 linear base kernels, one constructed for each of the five features. First, we evaluate our approach on scene A. We divide the images into 10 parts, choose 9 parts as the training set, and leave one part for testing; we then run 10-fold cross-validation (testing 10 times) and report the average result as the final result.
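The evaluation setup can be sketched as follows. The block boundaries reflect the feature dimensionalities of Section 3 (101 + 200 + 1 + 1 + 10 = 313), but their ordering and the train/predict callables (for example, a one-against-all wrapper around the MKL sketch in Section 4) are assumptions.

```python
# Build one linear base kernel per feature block and run 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import KFold

BLOCKS = [(0, 101), (101, 301), (301, 302), (302, 303), (303, 313)]   # assumed ordering

def base_kernels(Xa, Xb):
    """One linear kernel per feature block between row sets Xa and Xb."""
    return [Xa[:, s:e] @ Xb[:, s:e].T for s, e in BLOCKS]

def cross_validate(X, y, train_fn, predict_fn, folds=10):
    accs = []
    for tr, te in KFold(n_splits=folds, shuffle=True, random_state=0).split(X):
        model = train_fn(base_kernels(X[tr], X[tr]), y[tr])          # e.g., one-vs-all MKL
        preds = predict_fn(model, base_kernels(X[te], X[tr]))        # test-to-train kernels
        accs.append(np.mean(preds == y[te]))
    return float(np.mean(accs))
```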

We evaluate the performance of the individual features described in Section 3, namely, dark channel and medium transmission, sky color, power spectrum slope, contrast, and saturation (see Table 1 for their notations). As shown in Table 1, using all of the features improves accuracy considerably, and the sky color feature performs best among the five features. In addition, the dark channel and medium transmission, power spectrum slope, and saturation features also achieve good results. Accordingly, the learned kernel weights differ in accordance with the performance of these features.

Table 2 shows the inference results on six image categories with different air quality levels. The results of air quality inference are reasonable. Due to the lack of sample images, the category with very unhealthy air quality achieves a lower accuracy of 64 percent, while the results of the other categories are all above 70 percent. In particular, the categories with hazardous and unhealthy air quality achieve impressive performance with 90 percent accuracy.

The confusion table of the inference results is depicted in Figure 8. Green, yellow, orange, red, purple, and maroon correspond to the six air quality levels mentioned before. For some air quality levels, such as moderate and unhealthy, our approach achieves excellent performance.

Table 3 presents the comparison between our approach and the baseline methods. To show the best performance of each method, every method was run with a group of reasonable parameter settings and produced multiple results. The first baseline applies SVM directly to the 313-dimensional feature. The second baseline is the traditional Adaboost, which combines several weak classifiers to build a stronger classifier. The experimental results show that our approach outperforms the baseline methods.

To further evaluate our approach, we randomly select five days in OAQIS. Figure 9 shows the measured PM2.5 values and the PM2.5 inference results of the proposed approach at 9:00–11:00 am and 2:00–4:00 pm on May 26–30, 2014. The measured values were collected every 5 minutes during the experiment period by an air quality monitor made by Zhongzhiwuxian Trade Co., Ltd. We calculate the mean value of each hour and set it as the PM2.5 label of the images captured in the same period. Some measured PM2.5 values on May 29 change considerably within a short time, but the mean value over that period does not change much, so the inference result remains consistent with the correct air quality level. We also evaluate our approach on scene B, where the inference accuracy is 88.26%. These results show the effectiveness of fusing multiple features for air quality level inference from images.

5.4. Efficiency

The experiments were run on a 64-bit PC with a dual-core CPU @ 3.20 GHz and 4 GB of memory. We used Matlab R2013b to extract features from images and Visual Studio 2013 for the inference part.

We first measure the time required to extract the different features from an image and the time of the inference procedure. As shown in Table 4, on average, the time costs of feature extraction are 3.0136 seconds, 0.3661 seconds, 8.9181 seconds, and 0.9068 seconds, the inference procedure requires 0.029 seconds, and the total time for the whole procedure is 13.2336 seconds. Compared with existing methods, our approach can estimate air quality from a single image at reasonable cost.

6. Conclusion

We have presented a convenient and efficient approach for air quality inference from a single image. Our approach is based on multiple features and multiple kernel learning. We first extracted several features, such as dark channel, medium transmission, sky color, power spectrum slope, contrast, and saturation, from images. To effectively fuse these heterogeneous and complementary features, we utilized multikernel learning to learn an adaptive classifier based on multiple kernels. We also collected a dataset of high quality registered and calibrated images named OAQIS. The dataset covers a wide range of daylight illumination and air pollution conditions, has potential value for image processing and atmospheric sciences, and can be used as a test bed for many algorithms. We evaluated our approach on the dataset, and the results show its effectiveness. In the future, we will extend our dataset and evaluate our approach on more scenes. We will also explore fine-grained air quality inference from unrestricted images and consider the influence of wind, humidity, and weather on air quality inference.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The research reported in this paper is supported by the National Natural Science Foundation of China under Grant no. 61332005 and no. 61190114, The Funds for Creative Research Groups of China under Grant no. 61421061, The Cosponsored Project of Beijing Committee of Education, The Beijing Training Project for the Leading Talents in S&T (ljrc 201502), and Beijing University of Posts and Telecommunications Foundation of Youth Science and Technology Innovation under no. 500401132.