Abstract
Ensuring effective operation of the existing environmental monitoring information system requires mathematical modeling based on a data assimilation algorithm. In this paper, such an algorithm is designed and implemented, and it is tested on atmospheric monitoring data from an industrial district of Karaganda city. This district was chosen as the research object because of the high level of atmospheric air pollution in industrial cities of the Republic of Kazakhstan. Testing showed the effectiveness of the data assimilation algorithm for monitoring the atmosphere of the selected city. The practical value of the work lies in the fact that the presented results can be used to assess the state of atmospheric air in real time, to model the state of atmospheric air at each point of the city, and to determine zones of increased environmental risk in an industrial city.
1. Introduction
Environmental monitoring problems have received considerable attention in recent years due to the high level of atmospheric air pollution in industrial cities of many countries [1–6]. For the effective operation of the existing information system for monitoring heavy-metal pollution of the atmosphere, it has become essential to use mathematical modeling based on a data assimilation algorithm.
Data assimilation technology is used to improve forecasts of air quality in atmospheric chemistry, as well as to perform a reanalysis of three-dimensional chemical (including aerosol) concentrations and determine the values of input variables (parameters) of the inverse simulation model (for example, emissions). The concept of “data assimilation” combines a sequence of operations starting with observations of the system and ending with the assessment of its state based on additional statistical and dynamic information. Currently, data assimilation technology is widely used in the fields of modeling the atmosphere, climate, ocean, and environment under any conditions, particularly if it is necessary to assess the state of a large dynamic system based on limited information. The purpose of data assimilation for atmospheric modeling is to obtain a better understanding of the atmosphere in terms of its meteorological and chemical parameters.
Several decades ago, Y. Sasaki developed the variational method of data assimilation, and his approach is now widely used for analysis and prediction in meteorology [7]. R.E. Kalman proposed an optimal method for linear filtering, and the resulting filter is named after him. The data assimilation model based on the Kalman filter has allowed the generalization of assimilation systems, such as the forecast-analysis cycle [8, 9]. The main problems of using the Kalman filter are the high order of the forecast error covariance matrix and the nonlinearity of the system equations describing meteorological processes. To solve these problems, a method was adopted based on the Lagrange variational principle using conjugate (adjoint) equations for estimating and predicting natural processes. V.V. Penenko extended this method to variational data assimilation using the methods of sensitivity theory and conjugate problems [10, 11]. In dynamic meteorology, data assimilation technology has been applied for many decades to improve weather forecasting and reanalysis results. Research in this field is still actively conducted by many scientists [12–14].
Chemical analysis has been utilized to predict air quality since the mid-1990s, beginning with primitive pollution databases, such as an annual air pollution index for five pollutants, without analytical processing or forecasts. Although Zhang et al. [15–17] showed that it is preferable to base air quality forecasts on statistical approaches, data assimilation techniques have been used in air quality modeling since the 1990s to characterize air pollutants, for example, in concentration maps [18]. Furthermore, inverse modeling has been used to improve (or detect errors in) emission rates [19–23], boundary conditions [24], and model parameters [25–27]. S. Rakhmetullina et al. used variational data assimilation algorithms to detect atmospheric pollution sources [28]. The 3D-Var algorithm was first implemented in 1992 by the National Centers for Environmental Prediction (NCEP) [29]. Later, in 1996, it was implemented operationally at the European Centre for Medium-Range Weather Forecasts (ECMWF); then, in 1997, the 4D-Var algorithm was first applied in the ECMWF forecasting system [30]. Various models are also used to simulate atmospheric ventilation processes and, accordingly, the temporal and spatial dispersion of pollutants in the atmosphere of industrial cities, such as MLDP0 (Modèle Lagrangien de Dispersion de Particules d’ordre 0), HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory Model), NAME (Numerical Atmospheric-dispersion Modelling Environment), RATM (Regional Atmospheric Transport Model), FLEXPART (Lagrangian Particle Dispersion Model), a Local Scale Atmospheric Circulation Complex-Field Model (LACCM), and others. Methods and algorithms for modeling, processing, and assimilating atmospheric monitoring data of industrial cities were considered in [31–40].
Building on these previous studies, in this work an algorithm for the assimilation of atmospheric monitoring data was designed and tested for the highly air-polluted city of Karaganda, Republic of Kazakhstan (RK). A “data assimilation” module was developed for the information system for monitoring atmospheric air pollution.
2. Materials and Methods
The government of the RK approved the state program “Digital Kazakhstan,” which envisages the creation of a “unified state system for monitoring the environment and natural resources” [41]. Under the current environmental code of the RK, this is a comprehensive system for observing the state of the environment and natural resources and for controlling, forecasting, and evaluating pollution [42]. Currently, 146 posts and 14 mobile laboratories located in the largest cities and national industrial centers of Kazakhstan analyze the state of atmospheric air pollution. According to the reports of national environmental authorities, the highest levels of air pollution are observed over industrial centers. National environmental authorities set a maximum permissible concentration (MPC) for pollutants, including heavy metals (HM). For example, the following exceedances were registered for the period of March 10–16, 2020. In Karaganda city, in the district of observation post 6, 141 cases exceeding the MPC for suspended particles PM2.5 were found. In Nur-Sultan city, 430 exceedances in the range of 1.0–3.8 MPC for sulfur dioxide were found, along with 997 exceedances in the range of 1.1–3.0 MPC for hydrogen sulfide. In Ust-Kamenogorsk city, 371 exceedances in the range of 1.0–1.9 MPC for hydrogen sulfide were found [43].
According to the bulletins of the Republican State Enterprise “Kazhydromet,” Karaganda is among the most air-polluted cities in the RK [43]. Therefore, the object of our research was the atmospheric air pollution of the industrial city of Karaganda, which has a sharply continental and arid climate due to its great distance from the seas, its exposure in summer to warm, dry winds from the deserts of Central Asia, and the inflow of cold, moisture-poor arctic air in the cold season. In this city, monitoring is carried out by eight posts: four automatic and four manual sampling posts. The northern industrial zone of Karaganda, where the third regional thermal power plant (TPP-3) is located, was selected for solving the data assimilation problem. A location map of the thermal power plant in Karaganda is shown in Figure 1.

The research consists of two stages (Figure 2). The first stage comprises forming an observation plan, selecting areas for air sampling, analyzing meteorological data, and determining the heavy metal content of air samples from Karaganda city.

Technical details and explanations of the air sampling process to assess the content of heavy metals in this selected area are shown in Table 1.
Figure 3 illustrates the main characteristics of atmospheric air pollution in Karaganda city; phenols (1.8 MPC) and formaldehyde (1.5 MPC) show the greatest exceedances.

[Figure 3, panels (a)–(c): main characteristics of atmospheric air pollution in Karaganda city]
The HM content was determined as follows. First, 18 m³ of air was passed through an “ABX” filter, so that the HM contained in the atmospheric air was collected on the filter. Then, the filter was digested by the wet ashing method in 4 mL of 5 M HNO3. The resulting mixture of HNO3 and filter material was evaporated on a water bath under a fume hood to a moist residue. Then, 0.3 mL of concentrated H2O2 was added, and the mixture was left to stand for 0.5 h. The mixture was then evaporated to dryness; 0.2 mL of HNO3 was added to the dry residue, and the solution was brought to a volume of 25 mL in a cylinder with distilled water. In the obtained sample, the HM content was determined using a Shimadzu AA-6650 atomic absorption spectrophotometer with electrothermal atomization [44, 45]. This analysis yielded the HM content of the air samples from Karaganda city.
To implement the data assimilation algorithms, 4380 measurements of the heavy metal content in the air of Karaganda were used.
The obtained data regarding atmospheric pollution with HM were verified using correlation-regression analysis. According to the calculations, the value of the correlation coefficient r = 0.9 shows a strong relationship between the content of Cu and Pb in the air of Karaganda city, which is reflected in the regression equation y = 0.7866x + 0.0134 and shown in Figure 4.
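The correlation-regression check described above can be sketched as follows. This is a minimal illustration using the standard formulas for the Pearson coefficient and the least-squares line; the function name and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_and_regression(x, y):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    slope = sxy / sxx                     # least-squares slope
    intercept = mean_y - slope * mean_x   # least-squares intercept
    return r, slope, intercept


# Hypothetical paired concentrations (illustrative only, not measurements)
x_values = [0.01, 0.02, 0.03, 0.04, 0.05]
y_values = [0.020, 0.031, 0.035, 0.047, 0.052]
r, slope, intercept = pearson_and_regression(x_values, y_values)
```

A value of r close to 1 indicates a strong linear relationship, which is the sense in which r = 0.9 is interpreted in the text.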

The HM air pollution data for Karaganda were validated for the period from 1 March 2020 to 31 March 2020. Table 2 shows the values of the mean absolute deviation (MAD) and the errors (mean square error (MSE) and root mean square error (RMSE)) for this operation. Since the actual values of HM concentration are close to zero, it would be incorrect to use the mean absolute percentage error (MAPE), because MAPE divides each error by the actual value.
The impurity content of the atmosphere is also affected by the wind direction. Moreover, atmospheric pollution varies not only daily but also with the seasons of the year and with meteorological conditions, which may influence the volume of pollution. To obtain a comprehensive monitoring solution, information on wind, air temperature, and humidity in Karaganda was analyzed for the period of 1–31 March 2020. Weather data were collected from the weather station in Karaganda (latitude 49.80, longitude 73.15, altitude 553 m).
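The error measures named above follow standard definitions and can be sketched as below. The function names and the sample concentrations are hypothetical; the second function is included only to show why a percentage error becomes unreliable when the actual values are close to zero.

```python
import math


def error_metrics(actual, predicted):
    """Return MAD, MSE and RMSE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n   # mean absolute deviation
    mse = sum(e * e for e in errors) / n    # mean square error
    return {"MAD": mad, "MSE": mse, "RMSE": math.sqrt(mse)}


def mape(actual, predicted):
    """MAPE in percent; each error is divided by the actual value."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical near-zero concentrations (illustrative only, not measurements)
actual = [0.002, 0.001, 0.004, 0.003]
predicted = [0.003, 0.002, 0.003, 0.004]
metrics = error_metrics(actual, predicted)
percentage_error = mape(actual, predicted)
```

In this example every absolute error is only 0.001, yet the percentage error exceeds 50% because the denominators are tiny, which illustrates why MAD, MSE, and RMSE are the more informative measures here.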
The second stage of this research was applying the data assimilation algorithm to predict the spread of air pollution. Variational algorithms play an important role in modeling the distribution of pollutants in real time, especially when solving environmental pollution problems in the development of ecosystems. Data assimilation is one of the most widely used applications of variational algorithms. The term data assimilation covers the entire sequence of operations that begins with observations of the system and, using additional statistical and dynamic information, ends with an assessment of its state. Data assimilation is standard practice in numerical weather forecasting, and its use is spreading to any setting in which the state of a large dynamic system must be assessed from limited information. In data assimilation problems, the value of the model state function must be predicted in accordance with the available observational data. The approach is therefore used to restore the “real” state of the system as accurately as possible, using a mathematical model, a priori information, and measurement data. For this study, a problem statement based on the nonstationary transfer equation with diffusion was considered [46, 47]. After multiplying the original equation by a sufficiently smooth conjugate function, an integral identity was obtained and used to construct discrete approximations. To evaluate and predict natural processes, the Lagrange variational principle with conjugate equations was chosen. Variational data assimilation was developed by V.V. Penenko based on the methods of sensitivity theory and conjugate problems [10, 11]. Observational data were assimilated sequentially in real time by the variational method, under the assumption that the values of the concentration field can be measured at a finite set of points in space and time.
The grid function describing the measurement system is equal to 1 at points of the space-time grid where measurement data are available, and 0 otherwise. The modeling approach uses functionals that include the observational data and express the degree of proximity between the measured values and their images calculated from the models of processes and measurements.
3. Results and Discussion
The existing information system for monitoring atmospheric pollution in the RK has a number of disadvantages, since its main function is limited to collecting and storing data. For the atmospheric monitoring information system to operate effectively, it has become necessary to use mathematical modeling based on a data assimilation algorithm and to develop an appropriate module for this [48–51].
3.1. Mathematical Support of the Environmental Monitoring System Based on the Data Assimilation Algorithm
As a model of impurity transfer for the nonstationary transfer equation with diffusion, we consider the following problem statement [11, 46]:
We set boundary conditions of the third kind, which involve given coefficients and given functions.
We take a given initial concentration distribution as the initial condition. In this problem, the unknown is the impurity concentration function, µ > 0 is the turbulent exchange coefficient, u is the impurity transfer rate, c is the decay rate, f(x,t) is the source function, x ∈ (0, N) is the spatial interval, and t ∈ (0, M) is the time interval.
We assume that the concentration function and its flux are continuous in space.
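For orientation, a generic one-dimensional problem consistent with the quantities listed in this subsection can be written as follows. This is a sketch rather than the exact statement of [11, 46]: the symbols φ, φ⁰, α, β, and g, as well as the sign conventions, are assumptions introduced here.

```latex
\begin{aligned}
&\frac{\partial \varphi}{\partial t} + u\,\frac{\partial \varphi}{\partial x}
  - \mu\,\frac{\partial^{2} \varphi}{\partial x^{2}} + c\,\varphi = f(x,t),
  && x \in (0, N),\ t \in (0, M), \\
&\alpha_{0}\,\varphi + \beta_{0}\,\frac{\partial \varphi}{\partial x} = g_{0}(t)
  && \text{at } x = 0, \\
&\alpha_{N}\,\varphi + \beta_{N}\,\frac{\partial \varphi}{\partial x} = g_{N}(t)
  && \text{at } x = N, \\
&\varphi(x, 0) = \varphi^{0}(x).
\end{aligned}
```

Here the first line is the nonstationary transfer equation with diffusion and decay, the second and third lines are boundary conditions of the third kind, and the last line is the initial condition.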
Let us introduce a grid region: uniform grids in time and space with constant steps, whose numbers of partition nodes are M and N.
For the numerical solution of the problem under consideration, we use the discrete method described in [46, 52, 53], in which a two-layer second-order approximation is used for the time derivative.
As an approximation of the original differential equation, we use the discrete-analytical method proposed in [46, 52, 53]; the resulting scheme is written in terms of the time step and the step number j.
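For illustration, one common two-layer approximation with second-order accuracy in time is the Crank–Nicolson form shown below. It is a sketch under assumed notation (φ^j for the grid function at time step j, τ for the time step, L for the spatial operator of the transfer equation), and the scheme of [46, 52, 53] may differ in detail.

```latex
\frac{\varphi^{j+1} - \varphi^{j}}{\tau}
  + L\,\frac{\varphi^{j+1} + \varphi^{j}}{2}
  = f^{\,j+1/2},
\qquad
L\varphi \equiv u\,\frac{\partial \varphi}{\partial x}
  - \mu\,\frac{\partial^{2} \varphi}{\partial x^{2}} + c\,\varphi .
```

Only two time layers, j and j+1, enter the formula, and averaging the spatial operator over both layers is what gives second-order accuracy in τ.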
Next, we multiply all the terms of the equation under consideration by a smooth function, which we call the conjugate (adjoint) function, and write down the integral identity. In the standard way, i.e., after two integrations by parts, we obtain the discrete-analytical scheme.
On the intervals (xᵢ₋₁, xᵢ) and (xᵢ, xᵢ₊₁), we impose boundary conditions.
Using the method of summation identities [53], we build a three-point scheme that links the values at three neighboring grid nodes.
The resulting three-point scheme with the found coefficients is solved using the sweep method (the tridiagonal matrix algorithm).
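The sweep method for a three-point scheme is the standard tridiagonal (Thomas) algorithm, which can be sketched as follows. The function name and argument layout are illustrative assumptions; the coefficients of the actual discrete-analytical scheme are not reproduced here, so the example uses a simple hypothetical system.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be diagonally dominant.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Backward sweep: recover the unknowns from last to first
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Hypothetical 4x4 system with 2 on the diagonal and -1 off the diagonal
lower = [0.0, -1.0, -1.0, -1.0]
diag = [2.0, 2.0, 2.0, 2.0]
upper = [-1.0, -1.0, -1.0, 0.0]
rhs = [0.0, 0.0, 0.0, 5.0]
solution = solve_tridiagonal(lower, diag, upper, rhs)
```

The cost grows linearly with the number of grid nodes, which is why this method suits schemes that must be solved at every time layer.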
3.2. Sequential Variational Assimilation of Observational Data in Real Time
Let the concentration field values be measured at a finite set of points in space and time. We introduce one grid function for the result of the measurement at the j-th moment of time at the grid point with index i, and a second grid function for the mask of the measurement system. To apply the method of summation identities, we assume that the mask is equal to 1 at the points of the space-time grid where measurement data are available and 0 otherwise. The algorithm used to solve the problem of the sequential variational assimilation of data is presented in matrix notation [46, 54] as follows:
For ,
For ,
For ,
The quantity to be minimized was taken to be a quadratic functional.
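As a simplified illustration of minimizing a quadratic functional with a measurement mask, consider the sketch below. It is a stand-in, not the functional of [46, 54]: the functional J, the weight alpha, and all names are assumptions. The forecast plays the role of the model state at the current time layer, and the mask is 1 where a measurement exists and 0 otherwise.

```python
def assimilate(forecast, observations, mask, alpha=1.0):
    """Minimize J(phi) = sum_i mask_i*(phi_i - obs_i)^2
                       + alpha * sum_i (phi_i - forecast_i)^2.

    J is separable, so setting dJ/dphi_i = 0 gives a closed form per grid
    point: phi_i = (alpha*forecast_i + mask_i*obs_i) / (alpha + mask_i).
    """
    if not (len(forecast) == len(observations) == len(mask)):
        raise ValueError("all inputs must have the same length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [
        (alpha * f + m * y) / (alpha + m)
        for f, y, m in zip(forecast, observations, mask)
    ]


# Hypothetical values: a single measurement at the middle grid point
forecast = [1.0, 1.0, 1.0]
observations = [0.0, 3.0, 0.0]   # entries with mask 0 are ignored
mask = [0, 1, 0]
analysis = assimilate(forecast, observations, mask, alpha=1.0)
```

Where the mask is 0 the result equals the forecast, and where it is 1 the result is a weighted average of forecast and measurement; a smaller alpha pulls the result closer to the measurement.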
Figure 5 shows the relative errors of the numerical solution in comparison with the exact analytical solution for different values of the time step and of the step in space.

Based on the mathematical model described above, a software module was created for the information monitoring system. Figure 6 shows the class diagram of the developed software module for data assimilation.

With the help of the data assimilation software module, a dat-file is created in which the input and output data of the observation data assimilation algorithm are recorded. These data are required to interact with the data visualization module. The dat-file has a structure that is partially shown in Figure 7.

3.3. Testing the Implemented Algorithm
In order to test the generated algorithms, 3D graphical functions from Wolfram Mathematica 10.4 were applied. Figure 8 shows the model of the industrial area which was used to test the data assimilation algorithm. A two-dimensional version of the data assimilation was selected.

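[Figure 8, panels (a)–(c): (a) model of the industrial area used to test the data assimilation algorithm; (b), (c) solution of the data assimilation problem at different points in time and the “real” state of the system]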
For each time layer, nX and mY one-dimensional data assimilation problems were solved in parallel, where nX = 100 and mY = 100 are the numbers of grid points along the X and Y axes, respectively. Data were selected from the industrial city of Karaganda. The turbulent exchange coefficient µ (nCoefficient) was 0.1 m²/s, and the transfer speed (nSpeed) was 0.1 m/s.
Figures 8(b) and 8(c) show the solution to the data assimilation problem at different points in time and the “real” state of the system. In contrast to direct problems of modeling the propagation of pollutants, in data assimilation problems the process models are complemented by observation models that describe the observed quantities in terms of state functions and the parameters of the process models. This makes the application of the data assimilation algorithm mathematically correct and increases the information content of the observations. An effective algorithm for predicting the propagation of impurities in the atmosphere that also parallelizes the computations reduces the time required for numerical calculations, which supports prompt decision-making in real time when monitoring atmospheric air pollution.
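The idea of reducing a two-dimensional time layer to independent one-dimensional problems can be sketched as follows. This is an assumption-laden illustration, not the implemented module: it uses a simple explicit upwind/central step instead of the discrete-analytical scheme, omits the assimilation of measurements, applies the same speed along both axes, and assumes a grid step of 1 m and a time step of 1 s. Only the values of µ and the transfer speed are taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor

MU = 0.1     # turbulent exchange coefficient, m^2/s (value from the text)
SPEED = 0.1  # transfer speed, m/s (value from the text)
H = 1.0      # grid step, m (assumed)
TAU = 1.0    # time step, s (assumed)


def step_1d(line):
    """One explicit step of 1D transfer with diffusion; end values are held fixed."""
    new = list(line)
    for i in range(1, len(line) - 1):
        advection = SPEED * (line[i] - line[i - 1]) / H
        diffusion = MU * (line[i + 1] - 2.0 * line[i] + line[i - 1]) / H ** 2
        new[i] = line[i] + TAU * (diffusion - advection)
    return new


def split_step(field, pool):
    """Advance a 2D field (list of rows) by one time layer using splitting."""
    # Sweeps along X: one independent 1D problem per row
    rows = list(pool.map(step_1d, field))
    # Sweeps along Y: one independent 1D problem per column
    columns = list(pool.map(step_1d, zip(*rows)))
    # Transpose back to row-major layout
    return [list(row) for row in zip(*columns)]


# Hypothetical 5x5 field with a unit point source in the center
field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0
with ThreadPoolExecutor(max_workers=4) as pool:
    result = split_step(field, pool)
```

Because each row or column is processed independently, the one-dimensional problems of a sweep can be distributed across workers; in CPython, threads mainly illustrate this independence, and a process pool or compiled kernel would be needed for an actual speedup.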
4. Conclusions and Further Work
A data assimilation algorithm for monitoring the atmosphere of an industrial area was investigated. To study the algorithm, the industrial district of Karaganda city was selected as a research object. As a result of the research, an algorithm was implemented that combines two-layer discrete-analytical numerical schemes for convection-diffusion equations and algorithms for sequential data assimilation in real time. A two-dimensional version of a two-layer time-based numerical scheme based on splitting was implemented. An additional module for data assimilation was developed in order to expand the functions of the environmental monitoring information system.
In the future, we plan to adapt the model to the conditions of Karaganda city, taking into account the terrain relief and seasonal trends.
Abbreviations
HM: Heavy metals
MPC: Maximum permissible concentration
ECMWF: European Centre for Medium-Range Weather Forecasts
RK: Republic of Kazakhstan
MAD: Mean absolute deviation
MSE: Mean square error
RMSE: Root mean square error
MAPE: Mean absolute percentage error
Data Availability
Data on atmospheric air pollution are taken from information bulletins on the environmental situation of the Republic of Kazakhstan, which are publicly available (https://kazhydromet.kz/en/ecology/ezhemesyachnyy-informacionnyy-byulleten-o-sostoyanii-okruzhayuschey-sredy/2020).
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This research was funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant no. AP0513599) and a grant titled “The Best Teacher.”