Abstract

We present a novel correction method for air-pressure data collected by microelectromechanical pressure sensors embedded in Android-based smartphones, in order to render them usable as meteorological data. The first step of the proposed correction method involves removing the mechanically derived outliers existing beyond the physical limits and those existing outside 3σ, as well as a reduction to the mean sea level pressure using the altitude data from digital elevation models. The second correction step involves classifying data by location and linear-regression analysis utilizing the temperature and humidity sensed by the smartphone to reduce correction errors by performing the analysis according to personalized settings. Air-pressure data obtained from smartphones is subject to several influential factors, depending on the users’ external environment. However, once corrected for spatial location, temperature, and humidity and for individual users after a comprehensive quality control, the corrected air-pressure data was highly reliable as an auxiliary resource for automatic weather stations.

1. Introduction

In densely populated urban areas, environmental and meteorological data inform decision-making on important socioeconomic issues, such as demographic changes, healthcare, food supply, security, conflict, and natural disasters. Disastrous weather events, such as localized torrential rains, gusts of wind, extreme temperatures, and rising sea levels, remind us of the importance of high resolution data on the ambient environment, which manifests as fine-scale climatic features [1–3]. However, an automatic weather station (AWS) operated by a public institution is limited insofar as it can only support very short-term (within one hour) weather forecasting [4–6], owing to the high social costs associated with installation and maintenance, which require large-scale investments in both time and money. Studies on high resolution meteorological observation and forecasting based on portable meteorological equipment have been conducted in the United States, Japan, and South Korea, although installing such equipment over a large area poses significant regional and economic challenges [7–9].

Smartphones have recently become available with embedded microelectromechanical systems (MEMS) sensors for air pressure, temperature, and humidity, which serve to correct global positioning system (GPS) altitude data. Several Android-based smartphones, in particular, are designed to collect meteorological data (Appendix A). As such smartphones become ubiquitous, the meteorological data available in urban areas grows denser with the population. This study aims to develop a correction method to minimize the errors in meteorological data collected by smartphones vis-à-vis reference data held by the Korea Meteorological Administration (KMA). To do so, we developed an app (named Yeowoobi, which means sun shower in Korean) [10] that is capable of collecting and storing data observed by various weather sensors embedded in Android-based smartphones, and we asked participants to use the app to collect meteorological data in June 2013. To our knowledge, this is the first study on correcting air-pressure data collected by smartphones, although guidelines and studies are available for correcting errors from public weather stations. Earlier studies [11–13] attempted ambient temperature analysis using battery temperatures monitored by smartphones [14]. Unlike static weather stations, smartphones continually move from one place to another, and they are exposed to heating and cooling devices, the user’s body temperature, and a changing external environment, including spaces within automobiles or trains. Because such factors influence the smartphone’s sensors, the data cannot be used directly for weather forecasting. However, the data collected with the existing infrastructure in smartphones can be used as a low-cost auxiliary resource to provide information about the atmospheric environment in terms of fine-scale meteorological phenomena. To make this data more useful, it must be adequately corrected.
The suggested permissible error range is ±0.5 hPa, as specified in KMA announcement number 2010-5 on the standard for automatic weather observation systems (AWOS), which was provided by the KMA in 2011 [15]. Such correction is done by performing a targeted quality control (QC) using preprocessing, statistical analysis, context awareness, and machine learning [16]. This study demonstrates the feasibility of using meteorological data collected by smartphones as an auxiliary resource for weather stations by analyzing and comparing the accuracy of the data with stored data from public weather stations.

The remainder of this paper is organized as follows. Section 2 describes the data used in this study—namely, smartphone data, data from public weather stations, and data from digital elevation models (DEMs). Moreover, the preprocessing step for the proposed method is explained, involving the removal of mechanically derived outliers, a reduction to the mean sea level pressure (MSLP), and the removal of outliers lying outside 3σ. Section 3 explains the linear-regression analysis and statistical values. Section 4 describes the classification of locations by weather station, the inclusion or exclusion of temperature and humidity, enabling and disabling personalization, and the linear-regression analysis results depending on the user’s mobility. In Section 5, the results of the study are summarized and discussed. Finally, Section 6 presents the direction of future research.

2. Meteorological Data

2.1. Data Collected by Smartphones

The following data is collected from the general public by a mobile app called Yeowoobi: the time of data acquisition, the user ID (encrypted), temporal information (year, month, day, hour, and minute), latitude (degree), longitude (degree), altitude (m), air pressure (hPa), temperature (°C), humidity (%), accuracy, speed (m/s), and mobile terminal information (Appendix B). Yeowoobi reads and transmits the air pressure, temperature, and humidity sensor values at a default interval of 10 minutes. Users can change this setting to suit their needs, taking into account battery consumption, data-transfer costs, and notifications of abrupt changes in the air pressure, by choosing one of the nine observation intervals, which range between one minute and three hours. Some screenshots of the Yeowoobi app are given in Appendix C.

2.2. Data Collected by Public Weather Stations

In this study, public meteorological (PM) data refers to the data from the AWS and the Automated Surface Observing System (ASOS) run by the KMA. The PM data comes from 692 public weather stations active across the country as of November 2014. Only 256 of these stations (37%) collect air-pressure data, because only an AWS installed after the year 2007 includes an air-pressure sensor in addition to the conventional sensor arrangement of temperature, precipitation, rainfall occurrence, wind direction, and wind speed. Humidity has also been available since 2010. The nominal observation interval for each reading is one minute. The spacing between public weather stations is approximately 36 km for the ASOS and approximately 13.5 km for the AWS [15]. Figure 1 shows a sample plot of air-pressure data from two smartphones and an AWS observed over one month (August 2014) at a station.

2.3. DEM Data

The following data was used in this study: (i) smartphone data, that is, the meteorological data collected with smartphones via Yeowoobi between January 1, 2014, and August 31, 2014 (240 days); (ii) PM data, that is, the meteorological data stored in the KMA (AWS and ASOS data) from the same period; and (iii) DEM data, that is, altitude data at 30 m × 30 m resolution [17]. When app users are inside a building or underground, the GPS in a smartphone does not collect elevation information correctly. As a result, elevation information was available for only about 10 percent of the total collected data. We therefore used the elevation information from the DEM data to obtain the MSLP values of the air-pressure data observed by smartphones. The smartphone and PM data collected within the spatial range of public weather stations whose data were flagged as having abnormal values after QC were excluded from the analysis.

2.4. Data Scale

In Figure 1, the locations of the data-collecting smartphones and public weather stations are plotted on a map. The total number of public weather stations was 692, as of October 2014, of which 217 stations are found in the area covered by the smartphone data. When the latitude and longitude (unit: degree) were recalculated to three decimal places, 162,387 of the 787,200 smartphone data-collection locations were found to be spatially distributed as shown in Figure 2(a). In the plotted distribution of smartphones shown in Figure 2, the number of smartphone data items collected across the country was 2,654,548 (larger than the number of locations, because different data can be collected from the same location at different times), of which over 50% (1,470,818) came from Gyeonggi-do (including Seoul). Locations with high data density other than Gyeonggi-do were big cities with high population densities, such as Busan, Daejeon, Daegu, and Gwangju. By contrast, mountainous regions in Gangwon-do (in the northeast) and the Namhae plain region (in the southwest) yielded a smaller volume of smartphone data.
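One way to read the three-decimal recalculation above is as a grouping of raw coordinates into cells. The sketch below illustrates that reading only. The function name `count_locations` and the sample coordinates are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def count_locations(points, decimals=3):
    """Count records per location after rounding (lat, lon) to `decimals` places."""
    return Counter((round(lat, decimals), round(lon, decimals)) for lat, lon in points)

# Hypothetical (latitude, longitude) pairs in degrees
sample = [(37.57112, 126.96581), (37.57149, 126.96612), (37.49100, 126.91800)]
cells = count_locations(sample)

for cell, n_records in cells.items():
    print(cell, n_records)
```

Several records can fall into the same rounded cell, which is consistent with the number of data items exceeding the number of locations.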

As for the temporal distribution, the number of smartphones and the volume of collected data were extremely low between January and May 2014. The volume of data from smartphones increased sharply between mid-June and late July, and the number of smartphones and the volume of collected data steadied at a level of 600/day and 47,000/day, respectively, from August onwards (Figure 3).

Among the approximately 20 models of Android-based smartphones with embedded meteorological sensors for air pressure, temperature, and humidity, 17 models were used to collect the meteorological data (Appendix B). Whereas all of them could measure the air pressure, only a few had embedded sensors for temperature and humidity (namely, Galaxy 4, Galaxy , and Galaxy Round), with a data-acquisition rate as low as 35%. Speed and elevation data accounted for only 10% of the air-pressure data (Table 1). Smartphones acquire location information through the internet, GPS, or both. Internet-based positioning uses both the Wi-Fi transfer mode and the locations of peripheral communication stations as a reference standard. Speed and altitude are available only through GPS and cannot always be calculated; in fact, they could be calculated via GPS-mediated communication for only 10% of the meteorological data.

During the data collection period, a total of 2,934,718 data items were collected from 3,096 smartphone users. After removal of the data found to have abnormal values when checked against the equivalent data from the nearest public weather station, 2,654,548 (90.5%) of the smartphone data items were used in the final analysis.

2.5. Preprocessing: Quality Control (QC)
2.5.1. Physical Limit Test

According to the general meteorological standards from the World Meteorological Organization (WMO) [18], air-pressure values lower than 500 hPa or higher than 1,080 hPa are specified as abnormal. We removed these abnormal values from the smartphone data in accordance with this standard.
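The physical limit test can be sketched as a simple range filter. The 500 hPa and 1,080 hPa bounds come from the text above. The function name and the sample readings are hypothetical, and treating the bounds themselves as valid is an assumption.

```python
P_MIN_HPA = 500.0    # lower physical limit stated in the text
P_MAX_HPA = 1080.0   # upper physical limit stated in the text

def passes_physical_limit(pressure_hpa):
    """Return True if the reading lies within the physical limits (bounds included)."""
    return P_MIN_HPA <= pressure_hpa <= P_MAX_HPA

# Hypothetical readings in hPa, including obviously faulty values
readings = [1013.2, 0.0, 998.7, 1200.5, 499.9]
kept = [p for p in readings if passes_physical_limit(p)]
print(kept)
```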

2.5.2. Reduction to Mean Sea Level Pressure (MSLP)

The equation used for the reduction to the MSLP in this study is given in [19] as (1), where $P_0$ is the sea level pressure (hPa), $P$ is the measured pressure (hPa), $h$ is the altitude obtained from the 30 m × 30 m DEM (m), and $T$ is the temperature (°C).

Based on the smartphone’s location information (i.e., the latitude and longitude), the DEM (30 m × 30 m) altitude data ($h$), and the air temperature ($T$) at the nearest public weather station, the data that remained after purging mechanical errors was reduced to the MSLP.
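As a rough illustration of this reduction step, the sketch below applies one widely used sea-level reduction formula with a fixed lapse rate of 0.0065 K/m. The formula and its constants are an assumption made for illustration and are not confirmed to match the equation taken from [19]. The function name `reduce_to_mslp` and the sample inputs are hypothetical.

```python
def reduce_to_mslp(pressure_hpa, altitude_m, temperature_c):
    """Reduce a measured pressure to mean sea level pressure (hPa).

    Assumed formula (illustrative, not confirmed to be the one in [19]):
        P0 = P * (1 - 0.0065*h / (T + 0.0065*h + 273.15)) ** -5.257
    with P in hPa, h in metres (from the DEM), and T in degrees Celsius.
    """
    lapse = 0.0065 * altitude_m  # temperature change over the altitude h
    return pressure_hpa * (1.0 - lapse / (temperature_c + lapse + 273.15)) ** -5.257

# Hypothetical reading: 1000 hPa measured at 100 m altitude and 15 °C
p0 = reduce_to_mslp(1000.0, 100.0, 15.0)
print(round(p0, 2))
```

At zero altitude the correction vanishes, so a reading taken at sea level is returned unchanged.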

2.5.3. Removal of Outliers Existing outside 3σ

Smartphones are exposed to a number of factors that cause artificial air-pressure changes as users move from one place to another using various means of transport (e.g., by driving along the highway, on high-speed trains, and in elevators). Figure 4 shows the distribution of air-pressure data of a representative user and the AWS at two representative stations (108 and 410) during one month (August 2014). The distribution was quite similar to a normal distribution. Thus, all data whose values deviated from the mean by more than three times the standard deviation (SD; σ) of the total smartphone air-pressure data (σ = 5.647) were considered abnormal and consequently removed. The data that remained after the 3σ test consisted of 2,636,328 data items, or 99.31% of the 2,654,548 items collected; that is, 18,220 items (0.69%) were removed.
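A minimal sketch of a 3σ test follows, assuming that the deviation is measured from the mean of the whole sample and that the population standard deviation is used. Both choices are assumptions, as are the function name and the synthetic readings.

```python
import statistics

def three_sigma_filter(values):
    """Keep values within three standard deviations of the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    return [v for v in values if abs(v - mean) <= 3 * sd]

# Synthetic MSLP readings (hPa): 19 plausible values and one spurious jump
values = [1010.0 + 0.1 * (i % 5) for i in range(19)] + [1100.0]
kept = three_sigma_filter(values)
print(len(values), len(kept))
```

With very small samples a single outlier inflates the SD so much that it cannot exceed 3σ, which is one reason to compute σ over a large dataset as in the text.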

3. Linear-Regression Analysis

We compared the mean absolute error (MAE), given in (4), and the root mean square error (RMSE), given in (5), using linear-regression analysis in the WEKA (Waikato Environment for Knowledge Analysis) suite [20]. For the linear-regression analysis, we used the MSLP from the temporally and spatially nearest public weather station as the true value. As preprocessing, we used attribute selection by the M5 method (stepping through the attributes and removing the one with the smallest standardized coefficient until no improvement is observed in the estimate of the error given by the Akaike information criterion) and a greedy selection using the Akaike information metric [21]. We tested our linear-regression method through 10-fold cross-validation.

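After calculating the weight from the training data ($n$ = number of training data) and estimating each training data item, as per (2),

$$\hat{y}_i = w_0 + \sum_{j=1}^{k} w_j x_{ij}, \tag{2}$$

we obtain the linear-regression equation by selecting the weight ($w$) that minimizes the training data’s error, as per (3):

$$w = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2. \tag{3}$$

The resultant values from the test data ($m$ = number of test data) are obtained by calculating (4) and (5):

$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|, \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}. \tag{5}$$

Here $x_{ij}$ is the $j$th attribute of data item $i$, $y_i$ is its true value, and $\hat{y}_i$ is its estimate.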
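The procedure can be sketched in plain Python as an ordinary least-squares fit followed by the MAE and RMSE under k-fold cross-validation. This is a simplified stand-in and not WEKA's implementation: it omits the M5 attribute selection, and the interleaved fold assignment, the function names, and the synthetic data are assumptions made for illustration.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(xs, ys):
    """Least-squares weights [w0, w1, ..., wk] from the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)

def predict(w, x):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cross_validate(xs, ys, k=10):
    """Return (MAE, RMSE) over held-out predictions from k interleaved folds."""
    preds = [0.0] * len(ys)
    for fold in range(k):
        test_idx = set(range(fold, len(ys), k))
        train = [i for i in range(len(ys)) if i not in test_idx]
        w = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            preds[i] = predict(w, xs[i])
    return mae(ys, preds), rmse(ys, preds)

# Synthetic, noise-free data: two attributes per item and a linear target
xs = [(5.0 + 0.5 * i, 20.0 + (i * 7) % 11) for i in range(30)]
ys = [1.5 + 0.9 * x1 - 0.2 * x2 for x1, x2 in xs]

w = fit(xs, ys)
cv_mae, cv_rmse = cross_validate(xs, ys, k=10)
print(w, cv_mae, cv_rmse)
```

Because the synthetic target is exactly linear, the fitted weights reproduce the generating coefficients and the cross-validated errors are close to zero. Real smartphone data would of course leave residual errors.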

4. Analysis of Results

4.1. Removal of Outliers Existing outside 3σ

Table 2 shows the results from the linear-regression analysis throughout South Korea, comparing the result before outlier removal with that after outlier removal. As shown in Table 2, the mean absolute error (MAE) decreased by 0.14, from 1.69 to 1.55, and the root mean square error (RMSE) decreased by 0.36, from 2.44 to 2.08.

Tables 2 and 3 show similarities between the patterns of meteorological data from Gyeonggi-do’s sample area and those from the entire country. We generated a dataset for linear-regression analysis by matching the smartphone data (i.e., the observation time, latitude, longitude, air pressure, and DEM data) with the PM data (i.e., the observation time, latitude, longitude, MSLP, altitude, and the SD of the distance from each smartphone) in the Gyeonggi-do area (latitude: 36.394–38.283, longitude: 126.379–127.858).
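The spatial part of this matching can be sketched as a nearest-station lookup using the haversine great-circle distance. The two station coordinates are those of Stations 108 and 410 given in Section 4.4. The use of the haversine formula, the function names, and the omission of temporal matching are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Station ID -> (latitude, longitude), from the coordinates listed in Section 4.4
STATIONS = {108: (37.571, 126.966), 410: (37.491, 126.918)}

def nearest_station(lat, lon, stations=STATIONS):
    """Return the ID of the station closest to the given smartphone position."""
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))

print(nearest_station(37.570, 126.970))
print(round(haversine_km(*STATIONS[108], *STATIONS[410]), 1))
```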

4.2. Classification by Public Weather Stations

We performed a linear-regression analysis on the PM data’s MSLP from the dataset described in Section 2.5.3. Table 4 presents the mean of the results of the linear-regression analysis in the Gyeonggi-do area and the results of the linear-regression analysis from the data obtained from all public weather stations.

The linear-regression analysis for all of the data in the Gyeonggi-do area yielded MAE and RMSE values of 1.58 and 2.05, respectively. These values decreased by 0.51 and 0.57, respectively (to 1.07 and 1.48), as a result of the linear-regression analysis after grouping the same data according to public weather stations.
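The effect of grouping can be illustrated with a single-attribute regression fitted once on pooled data and once per station. The synthetic records below are constructed so that the two stations have different offsets. They are hypothetical and do not reproduce the values in Table 4.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mean_abs_error(records, model_for):
    """MAE over (station_id, x, y) records, using the model chosen per station."""
    errors = []
    for sid, x, y in records:
        intercept, slope = model_for(sid)
        errors.append(abs(y - (intercept + slope * x)))
    return sum(errors) / len(errors)

# Synthetic (station_id, smartphone MSLP, station MSLP) triples in hPa
records = [(108, 1008.0 + i, 1009.0 + i) for i in range(5)]
records += [(410, 1008.0 + i, 1006.5 + i) for i in range(5)]

pooled = fit_line([r[1] for r in records], [r[2] for r in records])

groups = defaultdict(list)
for rec in records:
    groups[rec[0]].append(rec)
per_station = {sid: fit_line([r[1] for r in rs], [r[2] for r in rs])
               for sid, rs in groups.items()}

pooled_mae = mean_abs_error(records, lambda sid: pooled)
grouped_mae = mean_abs_error(records, lambda sid: per_station[sid])
print(pooled_mae, grouped_mae)
```

A pooled model has to average over station-specific offsets, whereas per-station models can absorb them, which is the intuition behind the lower errors after classification.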

4.3. Classification Reflecting Temperature and Humidity

In the Gyeonggi-do area, only approximately 35% of the data included measurements of air pressure, temperature, and humidity together. We therefore performed a separate linear-regression analysis for data containing air-pressure information exclusively and compared it with the analysis for data that also contained temperature and humidity information (Table 5).

There were 930,883 data items measuring air pressure exclusively (63.29% of the total data) and 532,946 data items that included pressure, temperature, and humidity (36.23% of the total data). Although the data with all three items amounted to only 57% of the volume of the pressure-only data, their linear-regression analysis resulted in a decrease in the MAE and RMSE of 0.23 and 0.26, respectively. Moreover, the linear-regression analysis for the data containing all three items, after being classified by public weather stations, resulted in a considerable decrease in both the MAE and RMSE.

4.4. Classification by Public Weather Stations and Personalization

We performed linear-regression analysis on the air-pressure data after classifying them by public weather station and by user. Table 6 presents the linear-regression analysis results as the mean over the cases exceeding 1,000 data items and for two public weather stations: Station A (Seoul, Jongno-gu, Songwol-dong; latitude: 37.571, longitude: 126.966, #108) and Station B (Seoul, Dongjak-gu, Sindaebang-dong; latitude: 37.491, longitude: 126.918, #410).

The results of the linear-regression analysis after classifying data by public weather stations and by users in the Gyeonggi-do area showed only slight differences in the mean MAE between Station A (108) and Station B (410). The total number of data items (TN) reveals that Stations A and B are representative areas, accounting for approximately 20% and 10% of the entire Gyeonggi-do area, respectively, and that the MAE values did not differ much in the area around Gyeonggi-do.

The correlation analysis between the errors and the distance of smartphone users from the public weather stations yielded correlation coefficients of 0.18 for the MAE and 0.17 for the RMSE, whereas the coefficients with respect to the SD in pressure were 0.32 for the MAE and 0.33 for the RMSE (see values in Figure 5). In other words, the MAE and RMSE correlated more strongly with pressure differences than with distance differences.
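A correlation coefficient of this kind can be computed as sketched below, assuming the Pearson coefficient, since the text does not name the type of correlation used. The per-user values are hypothetical and are not the data behind Figure 5.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-user values: distance to station (km), SD of pressure (hPa), MAE (hPa)
distance_km = [0.5, 2.0, 4.5, 7.0, 9.5]
pressure_sd = [0.4, 1.1, 0.7, 1.9, 1.5]
user_mae = [0.3, 0.9, 0.5, 1.4, 1.2]

r_distance = pearson(distance_km, user_mae)
r_pressure = pearson(pressure_sd, user_mae)
print(r_distance, r_pressure)
```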

4.5. Comparison of Errors according to User Mobility

Figure 6 illustrates the visualized mobility patterns over time for users with high mobility compared with those with low mobility at public weather Stations A and B. The blue dots represent users with high mobility (Users 1 and 3), and the red dots represent those with lower mobility (Users 2 and 4). Table 7 presents the results from the linear-regression analysis for each user, wherein the patterns are apparent for the MAE and RMSE with respect to the air pressure at the public weather stations.

The high-mobility users yielded MAE (RMSE) values of 1.00 (1.42) at Station A and 0.47 (0.89) at Station B. The MAE (RMSE) values for low-mobility users were 0.22 (0.37) and 0.09 (0.11) at Stations A and B, respectively. This implies that, at the same location, errors in air-pressure measurements increased with user mobility.

5. Discussion

Most Android-based smartphones are able to measure air pressure. Unlike temperature and humidity, air pressure is less influenced by the man-made environment (indoor/outdoor or air-conditioned/heated). The results of this study revealed that errors in the meteorological data from smartphones tend to decrease under the following conditions: with concurrent observations of temperature and humidity, and when comparing users with similar moving patterns (presumably because of similar means of transport). By correcting the errors resulting from these factors, we could verify the feasibility of using the air-pressure data collected by smartphones as an auxiliary resource for public weather stations. As such, the proposed method contributes to enhancing forecasting accuracy by providing high resolution meteorological data in countries or regions with a sparse distribution of public weather stations, where such data is otherwise difficult to obtain because of high installation and maintenance costs. The results show that smartphone-based meteorological data with a minimal correction algorithm has the potential to improve high resolution weather forecasting where precise short-term meteorological observations are required, such as at sporting events.

6. Future Research Directions

In this study, we presented a new method to correct errors in air-pressure data collected with smartphones, by comparing the errors relative to air-pressure data from nearby public weather stations after classification according to relevant factors, such as the presence or absence of temperature and humidity data, personalization, and the mobility of individual users. In future research, we plan to focus on the following: (i) comparing the day-time and night-time mobility of users, (ii) comparing errors at different speeds using the location information in smartphones to verify mobility, (iii) comparing data with and without altitude data acquired from smartphones in addition to DEM data, (iv) comparing various machine-learning techniques in addition to linear-regression analysis, (v) comparing the smartphone pressure directly with a WMO-approved pressure sensor as a reference, and (vi) comparing various preprocessing steps, such as the time-consistency test (step test) and the persistence test, as well as the physical limit test and 3σ test used in this study. There are several problems in directly applying the step test and persistence test used in AWS to smartphone data. One of the main problems is that the time interval in collecting smartphone data is not regular (it varies from one minute to three hours). When the time interval is too large or changes over time, it is not easy to apply the two QC tests to smartphone data. However, their successful application to smartphone data is valuable and necessary to improve the correction quality.

We also intend to explore other comparative methodologies and to validate them. Using these methods, we aim to improve the correction ability of the proposed method with regard to the meteorological data from smartphones and to verify the possibilities of using such data as additional meteorological data for high resolution short-term scale weather forecasting.

Appendices

A. List of the Smartphone Models Used in the Study for Data Collection and Their Respective Meteorological Sensors

See Table 8.

B. Data Collected by Smartphones and Pertinent Details

See Table 9.

C. Some Screenshots of the Developed Yeowoobi App

See Figure 7.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the “Small and Medium Sized Enterprise Technology Development Support Program,” through the Small and Medium Business Administration (SMBA) of Korea, in 2013 (S2139031). The present research has been conducted by the Research Grant of Kwangwoon University in 2014. The authors thank Mr. Sangjin Sim for his valuable suggestions in improving this paper.