Research Article  Open Access
Livio Fenga, Carlo Del Castello, "COVID-19: Metaheuristic Optimization-Based Forecast Method on Time-Dependent Bootstrapped Data", Journal of Probability and Statistics, vol. 2021, Article ID 1235973, 7 pages, 2021. https://doi.org/10.1155/2021/1235973
COVID-19: Metaheuristic Optimization-Based Forecast Method on Time-Dependent Bootstrapped Data
Abstract
A compound method, combining the search capabilities of an operations research algorithm with the power of bootstrap techniques, is presented. The resulting algorithm has been successfully tested in predicting the turning point of the COVID-19 epidemic curve in Italy. Future lines of research, which include the generalization of the method to a broad set of distributions, are outlined at the end.
1. Introduction
In general, predicting the time of a peak conditional on a set of time-dependent data is a nontrivial task. Often carried out in a multitasking fashion and requiring time and resources, the correct estimation of future turning points can be important in many instances, but it becomes crucial in the case of epidemic events. These are the typical circumstances in which the forecasting exercise is conducted online and on a time series with a small sample size. Under these conditions, the problem can become particularly complicated, since the statistical methods usually employed for these purposes, for example hidden Markov models (e.g., Hamilton [1] and Koskinen and Öller [2]) or nonparametric models (e.g., Delgado and Hidalgo [3]), are not only very demanding in terms of building and tuning procedures but also typically require a “long” stretch of data. In addition, time series related to epidemics usually show highly nonlinear dynamics which, if not preprocessed, make them unsuitable for standard linear models. On the other hand, fitting nonlinear models, e.g., of the self-exciting threshold autoregressive type (for example, [4]) or artificial neural networks [1, 5], is not a viable option because of the above-mentioned sample size issues. In any case, when an ill-tuned model is fitted to a time series, reliable outcomes cannot reasonably be expected. Therefore, an approach able to perform under the conditions outlined above is proposed. In essence, the problem is solved by building a unified framework in which two powerful techniques, belonging to two different branches of computational statistics, are employed sequentially to lower the amount of uncertainty embedded in the observed data and to find a (possibly global) optimum through which the “best” statistical distribution for the dataset at hand is found.
2. The Unified Framework
As stated above, the approach proposed in this study is rooted in a unified framework in which two powerful paradigms are exploited. The first one, which belongs to the so-called computer-intensive statistical methods, is the bootstrap, detailed in Section 4. With this technique, a large number B of “bona fide” replications of the original data are generated. In essence, each bootstrap series “mimics” the recorded observations, so that the number of observed series (which in real life is typically equal to one) becomes (very) high. Repeating a mathematical operation (e.g., the computation of an estimator) B times makes possible (i) the assessment of the degree of uncertainty associated with the estimates obtained and (ii) less-biased estimators. The latter goal is achievable by the design of the bootstrap method since, across its replications, central tendency functions such as the mean or the median can be applied. The second tool employed is an optimization method for the selection of the “best” parametrization of a class of statistical distributions commonly used in the literature. In practice, this step is performed in the so-called bootstrap world, meaning that it is repeated sequentially for each bootstrap sample. By doing so, the degree of uncertainty associated with the selected distribution is lower than the one obtainable by processing just one set of data (the real observations).
3. Data and Contagion Indicator
This study employs the data related to COVID-19, collected and regularly updated by the Italian National Institute of Health (an agency of the Italian Ministry of Health) and by the Italian Civil Protection Department. The whole dataset is freely and publicly available in a comprehensive database, accessible at the web address https://github.com/pcmdpc/COVID19/tree/master/datiregioni. The dataset includes 38 daily data points collected at the national level from February 19 to March 27. The indicator used, referred to below as the variable of interest, is obtained by subtracting, for each day, both the number of deaths and the number of recovered from the total number of people who tested positive for the coronavirus.
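As a minimal sketch of how the variable of interest is built, the snippet below subtracts deaths and recoveries from the cumulative positives, day by day. The function name `active_cases` and the three-day figures are made up for illustration and are not taken from the dataset.

```python
def active_cases(total_positive, deaths, recovered):
    """Daily indicator: cumulative positives minus deaths and recoveries."""
    if not (len(total_positive) == len(deaths) == len(recovered)):
        raise ValueError("all series must have the same length")
    return [p - d - r for p, d, r in zip(total_positive, deaths, recovered)]


# Made-up numbers for three consecutive days (not real data).
example = active_cases([100, 150, 230], [2, 5, 9], [1, 4, 10])
print(example)  # [97, 141, 211]
```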
4. The Resampling Method
The choice of the most appropriate resampling method is far from being an easy task, especially when the independent and identically distributed (i.i.d.) assumption (underlying Efron’s initial bootstrap method) is violated. When dependence structures are embedded in the data, simple sampling with replacement has been proved (for example, Carlstein [6]) to yield suboptimal results. As a matter of fact, i.i.d.-based bootstrap schemes are not designed to capture, and therefore replicate, dependence structures. This is especially true under the present conditions (small sample sizes and strong nonlinearity). In such cases, selecting the “right” resampling scheme becomes particularly challenging, as many resampling schemes are not designed to capture the dynamics typically found in epidemiology. As an example, the well-known sieve bootstrap, introduced by Bühlmann [7], cannot be employed because of the quadratic shape almost always found in this type of time series.
In more detail, while in the classic bootstrap an ensemble Ω represents the reference population from which the observed time series is drawn, in the maximum entropy bootstrap (MEB) a large number of ensembles (subsets), say {ω_{1}, ..., ω_{N}}, become the elements belonging to Ω, each of them containing a large number of replicates {x_{1}, ..., x_{J}}. Perhaps the most important characteristic of the MEB algorithm is that its design guarantees that the inference process satisfies the ergodic theorem. Formally, denoting by the symbol |·| the cardinality function (counting function) of a given ensemble of time series {x_{t} ∈ ω_{i}; i = 1, ..., N}, the MEB procedure generates a set of disjoint subsets Ω_{N} ≡ ω_{1} ∩ ω_{2} ∩ ··· ∩ ω_{N} s.t. E(Ω_{N}) ≈ µ(x_{t}), with µ(·) being the sample mean. Furthermore, the basic shape and the probabilistic structure (dependency) are guaranteed to be retained ∀x^{∗}_{t, j} ⊂ ω_{i} ⊂ Ω.
The MEB resampling scheme has non-negligible advantages over many of the available bootstrap methods: it does not require complicated tuning procedures (unavoidable, for example, in resampling methods of the block bootstrap type), and it is effective under nonstationarity. The MEB method relies on entropy theory and the related concept of (un)informativeness of a system. In particular, the maximum entropy density f(x) is chosen so that the expectation of the Shannon information H = E(−log f(x)) is maximized, i.e.,

$$H = E\left(-\log f(x)\right) = -\int f(x)\,\log f(x)\,dx.$$
Under mass- and mean-preserving constraints, this resampling scheme generates an ensemble of time series from a density function satisfying the maximum entropy condition above. Technically, the MEB algorithm can be broken down into the 8 steps detailed as follows:
(1) A sorting matrix of dimension T × 2, say S_{1}, accommodates the time series of interest x_{t} in its first column and an index set, i.e., I_{ind} = {1, 2, ..., T}, in the second one.
(2) S_{1} is sorted according to the numbers placed in the first column. As a result, the order statistics x_{(t)} and the vector I_{ord} of sorted I_{ind} are generated and placed in the first and second columns, respectively.
(3) “Intermediate points,” obtained by averaging successive order statistics, are computed, i.e., c_{t} = (x_{(t)} + x_{(t+1)})/2 for t = 1, ..., T − 1, and intervals I_{t} constructed on c_{t} and r_{t} are derived, using ad hoc weights obtained by solving a set of three equations.
(4) From a uniform distribution on [0, 1], T pseudorandom numbers p_{j} are generated, and the interval R_{t} = (t/T, (t + 1)/T], for t = 0, 1, ..., T − 1, in which each p_{j} falls is identified.
(5) A matching between R_{t} and I_{t} is created, so that a set of T values {x_{j,t}} is obtained as the j^{th} resample. Here, θ is the mean of the standard exponential distribution.
(6) A new T × 2 sorting matrix S_{2} is defined, and the T members of the set {x_{j,t}} for the j^{th} resample obtained in Step 5 are reordered in increasing order of magnitude and placed in column 1. The sorted I_{ord} values (Step 2) are placed in column 2 of S_{2}.
(7) Matrix S_{2} is sorted according to the second column, so that the order {1, 2, ..., T} is restored. The jointly sorted elements of column 1 are denoted by {x_{S,j,t}}, where S recalls the sorting step.
(8) Steps 1–7 are repeated a large number of times.
In order to give a clearer picture of the MEB algorithm, its flowchart is reported in Figure 1. As can be seen, four different functions characterize this resampling method. The sorting function plays a key role, as it operates at two different points of the algorithm, i.e., to order the values belonging to the original time series (S_{1}) and to restore the given time sequence for each of the bootstrapped series (S_{2}). Besides the pseudorandom number generator, whose use is straightforward, the two remaining functions, i.e., the average and the matching, are used, respectively, to compute the mean of the maximum entropy density and to create a matched sequence of the intervals R_{t}.
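The resampling steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `meb_replicate` is hypothetical, the density is taken as piecewise uniform between the intermediate points (so each interval carries mass 1/T), the tails are closed with uniform pieces whose width is the plain mean absolute first difference, and the mean-preserving weights and exponential tails of the full algorithm are omitted.

```python
import numpy as np


def meb_replicate(x, rng):
    """One simplified maximum-entropy-style replicate of the series x."""
    x = np.asarray(x, dtype=float)
    T = x.size
    # Steps 1-2: sort the series and remember where each order statistic came from.
    rank_order = np.argsort(x, kind="stable")
    xs = x[rank_order]
    # Step 3: intermediate points between successive order statistics.
    mid = 0.5 * (xs[:-1] + xs[1:])
    # Assumption: tail limits from the mean absolute first difference.
    spread = np.mean(np.abs(np.diff(x)))
    knots = np.concatenate(([xs[0] - spread], mid, [xs[-1] + spread]))
    probs = np.linspace(0.0, 1.0, T + 1)  # each interval has mass 1/T
    # Steps 4-5: uniform draws mapped to quantiles of the piecewise-uniform density.
    u = np.sort(rng.uniform(size=T))
    draws = np.interp(u, probs, knots)  # already in increasing order
    # Steps 6-7: put the sorted draws back in the original time order.
    replicate = np.empty(T)
    replicate[rank_order] = draws
    return replicate


rng = np.random.default_rng(0)
series = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0])  # made-up data
# Step 8: repeat to build an ensemble of replicates.
ensemble = np.array([meb_replicate(series, rng) for _ in range(500)])
print(ensemble.shape)  # (500, 7)
```

Because the sorted draws are reassigned through the original rank order, every replicate keeps the ups and downs of the observed series, which is the property the sorting function is there to guarantee.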
5. Bootstrap-Driven Forecast Optimization
This section aims to define an alternative method to forecast our variable of interest by means of an optimization approach designed to fit a set of distribution functions applied to each bootstrap replication.
The variable of interest is assumed to approximately follow a logistic function, scaled by a normalizing parameter h (representing the asymptotic number of cases), as shown in Figure 2, so that its derivative is a Gaussian function rescaled accordingly, i.e.,

$$f(t; h, \mu, \sigma) = \frac{h}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^{2}}{2\sigma^{2}}\right),$$

where t represents the number of days since the pandemic started in Italy, h represents the magnitude scale, µ represents the day on which daily cases peak (location), and σ represents the standard deviation (shape).
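To make the roles of h, µ, and σ concrete, the sketch below evaluates a Gaussian curve rescaled by h together with its cumulative counterpart, whose plateau is h. The function names and the parameter values are illustrative assumptions, and the cumulative Gaussian is used here as a stand-in for the S-shaped curve of total cases.

```python
import math


def daily_curve(t, h, mu, sigma):
    """Gaussian density rescaled by h: the bell-shaped curve of daily cases."""
    z = (t - mu) / sigma
    return h * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def cumulative_curve(t, h, mu, sigma):
    """Cumulative Gaussian rescaled by h: S-shaped curve with asymptote h."""
    return 0.5 * h * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


# Illustrative values only (not the fitted estimates).
h, mu, sigma = 100000.0, 35.0, 10.0
print(cumulative_curve(mu, h, mu, sigma))  # 50000.0: half of the plateau is reached at the peak day
print(daily_curve(mu, h, mu, sigma) > daily_curve(mu + 5, h, mu, sigma))  # True
```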
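Now, given
(i) The parameter vector θ = (h, µ, σ), where θ ∈ Θ
(ii) The total active cases x_{t} since the infection spread
(iii) A generic bootstrap replicate x^{∗}_{t,i} ∈ ω ⊂ Ω, where i ∈ {1, ..., N} is the i^{th} bootstrap within N replicates
(iv) x̂_{t}(θ), where x̂_{t} is the theoretical value
the objective function is defined as follows:

$$\min_{\theta \in \Theta} \mathrm{MSE}_{i}\left(x^{*}_{t,i}\right) = \min_{\theta \in \Theta} \frac{1}{T}\sum_{t=1}^{T}\left(x^{*}_{t,i} - \hat{x}_{t}(\theta)\right)^{2}.$$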
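A mean squared error objective of this kind can be sketched as follows. The names `theoretical_values` and `mse_objective` are hypothetical, and taking the theoretical value to be a cumulative Gaussian scaled by h, evaluated on days 1, ..., T, is an assumption made for illustration.

```python
import math

import numpy as np


def theoretical_values(theta, T):
    """Assumed theoretical curve: cumulative Gaussian scaled by h on days 1..T."""
    h, mu, sigma = theta
    t = np.arange(1, T + 1)
    z = (t - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * h * (1.0 + np.array([math.erf(v) for v in z]))


def mse_objective(theta, x_star):
    """Mean squared error between one bootstrap replicate and the theoretical curve."""
    x_star = np.asarray(x_star, dtype=float)
    fitted = theoretical_values(theta, x_star.size)
    return float(np.mean((x_star - fitted) ** 2))


# Synthetic replicate generated from known parameters (illustration only).
true_theta = (1000.0, 20.0, 5.0)
x_star = theoretical_values(true_theta, 38)
print(mse_objective(true_theta, x_star))             # 0.0 at the generating parameters
print(mse_objective((1200.0, 22.0, 5.0), x_star) > 0)  # True
```

In the procedure described in the text, a function like `mse_objective` would be minimized over θ once per bootstrap replicate.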
This is a nonlinear unconstrained optimization problem, which cannot be addressed with standard global optimization methods (e.g., of the simplex, branch-and-bound, or branch-and-cut type), since these are designed for linear programming (LP) [8] and mixed-integer linear programming (MILP) [9], within the field of discrete combinatorial problems [10].
On the other hand, simulated annealing, a local search metaheuristic designed to approximate global optimization, can be used to solve unconstrained nonlinear problems in a large search space.
6. Simulated Annealing Optimization
Simulated annealing (SA), following Van Laarhoven and Aarts [11], is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic used to approximate global optimization in a large search space for an optimization problem.
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. Both are attributes of the material that depend on its thermodynamic free energy. Heating and cooling the material affects both the temperature and the thermodynamic free energy. Simulated annealing can be used to approximate the global minimum for a function with many variables. This approach was used by Kirkpatrick et al. [12] to solve the traveling salesman problem. They also proposed its current name, simulated annealing. The notion of slow cooling implemented in the simulated annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored. Accepting worse solutions is a fundamental property of metaheuristics because it allows for a more extensive search for the global optimal solution.
In general, simulated annealing algorithms work as follows. The temperature progressively decreases from an initial positive value towards zero. At each time step, the algorithm randomly selects a neighbor state s^{∗} of the current state s, measures its energy (in this case, the MSE_{i} (x^{∗}_{t, i}) on the bootstrap replicate), and decides between moving the system to the state s^{∗} or staying in state s according to the temperature-dependent probability of selecting better or worse solutions. During the search, this probability can, respectively, remain at 1 (or at any positive number smaller than 1) or decrease towards zero.
6.1. Simulated Annealing on Bootstrap Pseudocode
The following pseudocode presents the simulated annealing heuristic applied to bootstrap replicates. For each bootstrap series, it starts from a state s_{0} and continues searching for solutions until the temperature has decayed to a low value. In the process, the call Neighbor(s, φ) generates a randomly chosen neighbor of a given state s; the call RandomU(0, 1) picks and returns a value in the range [0, 1], uniformly at random. The annealing schedule is defined by the temperature decay based on the fixed cooling rate ρ.
(i) Set the initial temperature t_{0}
(ii) Set the cooling rate ρ
(iii) For each bootstrap series i in {1, ..., N}
    (i) Let the current solution s = s_{0} and the current temperature T = t_{0}
    (ii) Loop while temperature T > 1
        (i) Pick a random neighbor, s_{new} ← Neighbor(s, φ), where φ is the radius around s
        (ii) If Prob(E(s), E(s_{new}), T, k_{B}) ≥ RandomU(0, 1): s ← s_{new}
        (iii) T ← T ∗ (1 − ρ)
    (iii) Output the final state s on the i^{th} bootstrap
Here, the function Prob(E(s), E(s_{new}), T, k_{B}) is the acceptance probability at each iteration, given the temperature T and the Boltzmann constant k_{B} (Aarts and Korst [13]).
The criterion employed for accepting worse solutions has been preferred to the Boltzmann criterion (as suggested by the referee), since it exhibited much higher flexibility.
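The annealing loop described above might be sketched in Python as follows. Two points are assumptions rather than the authors' choices: the acceptance rule used here is the standard Metropolis rule with the temperature scaled by k_B, swapped in because the exact acceptance formula is not reproduced in the text, and the function returns the best state visited rather than the final one. The name `simulated_annealing`, the box bounds, and the toy energy are hypothetical.

```python
import math

import numpy as np


def simulated_annealing(energy, s0, lower, upper, t0=10000.0, rho=0.0006,
                        k_b=100.0, radius_frac=0.3, t_min=1.0, rng=None):
    """Minimize `energy` over a box; returns (best_state, best_energy)."""
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    phi = radius_frac * (upper - lower)  # neighborhood radius per coordinate
    s = np.clip(np.asarray(s0, dtype=float), lower, upper)
    e = energy(s)
    best_s, best_e = s.copy(), e
    T = t0
    while T > t_min:
        # Random neighbor within radius phi, kept inside the bounds.
        s_new = np.clip(s + rng.uniform(-phi, phi), lower, upper)
        e_new = energy(s_new)
        # Assumption: Metropolis acceptance rule, not the paper's own criterion.
        accept = 1.0 if e_new <= e else math.exp(-(e_new - e) / (k_b * T))
        if accept >= rng.uniform():
            s, e = s_new, e_new
            if e < best_e:
                best_s, best_e = s.copy(), e
        T *= 1.0 - rho  # geometric cooling
    return best_s, best_e


def toy_energy(s):
    """Toy quadratic with its minimum at s = 3 (illustration only)."""
    return float(np.sum((s - 3.0) ** 2))


best, value = simulated_annealing(toy_energy, s0=[8.0], lower=[0.0], upper=[10.0],
                                  t0=100.0, rho=0.005, k_b=1.0, t_min=1e-3,
                                  rng=np.random.default_rng(1))
print(best, value)
```

To follow the bootstrap scheme, the same call would be repeated once per replicate, with `energy` set to the mean squared error of that replicate as a function of θ = (h, µ, σ).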
7. Empirical Evidence
In order to improve local search speed, the parameter space Θ can be bounded to Θ^{0} ⊂ Θ, so that uninformative tails are removed. No information is lost under this reduction of the parameter space.
The SA parameters have been tuned iteratively on a trial-and-error basis to maximize the overall efficiency of the procedure. To this end, we set the initial temperature t_{0} = 10000, the cooling rate ρ = 0.0006, the Boltzmann constant k_{B} = 100, and the radius φ = 0.3 (θ_{max} − θ_{min}). The optimization procedure has been applied to 500 bootstrap replications, derived from the positive cases in Italy between February 19 and March 27. The related results have been used to build the frequency distribution of the (magnitude scale, peak) pairs, as shown in Table 1 and Figure 3.
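As a rough check on the computational cost implied by these settings, the number of annealing steps per bootstrap series follows from the geometric cooling rule: the loop stops at the smallest n for which t_0 (1 − ρ)^n ≤ 1. The helper name `annealing_iterations` is hypothetical, and the stopping threshold of 1 is taken from the pseudocode.

```python
import math


def annealing_iterations(t0, rho, t_min=1.0):
    """Smallest n such that t0 * (1 - rho)**n <= t_min."""
    return math.ceil(math.log(t0 / t_min) / -math.log(1.0 - rho))


n_iter = annealing_iterations(10000, 0.0006)
print(n_iter)        # on the order of 15,000 steps per bootstrap series
print(500 * n_iter)  # total energy evaluations over 500 replications
```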

The average values, computed for each parameter, are as follows: Avg (h) = 122178, Avg (µ) = 36.7, and Avg (σ) = 10.8.
Approximate confidence bounds are derived from the normal distribution (at the 0.01 significance level) for each of the parameters.
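One way to obtain normal-approximation bounds from a set of bootstrap estimates is sketched below. Whether the spread should be the standard deviation of the estimates (as here) or a standard error is not stated in the text, so this choice is an assumption; the function name `normal_bounds` and the sample values are made up.

```python
from statistics import NormalDist, mean, stdev


def normal_bounds(estimates, alpha=0.01):
    """Two-sided normal bounds around the mean of the bootstrap estimates."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 2.576 for alpha = 0.01
    centre = mean(estimates)
    half_width = z * stdev(estimates)  # assumption: spread of the estimates themselves
    return centre - half_width, centre + half_width


# Made-up peak-day estimates from a handful of replicates.
low, high = normal_bounds([36.1, 37.0, 36.5, 36.9, 37.2])
print(low, high)
```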
By evaluating the confidence interval for µ, a peak of daily cases between March 25 and 26 is inferred, while the magnitude parameter h indicates an asymptote of the curve of total positive cases between 120000 and 125000. The curve of new cases has an asymptotic behavior, so, cutting the tail beyond the 0.1 cutoff for new infections, the pandemic time window is hypothetically over after May 16.
This behavior is clearly described in Figures 4 and 5, which have been built considering the Gaussian (Figure 4) and the cumulated Gaussian (Figure 5) curves around the 99% confidence lower and upper bounds for each parameter.
8. Further Developments
The SA optimization for fitting bootstrap replicates derived from real data is applicable to any kind of distribution known in the literature, as well as to empirical distributions. This research points to great potential if the aforementioned procedure is enhanced with an automatic routine for the “optimal” choice of either known (ξ_{r}) or empirical (ξ_{e}) distributions, where (ξ_{r}, ξ_{e}) belong to a predefined distribution space Ξ. In more detail, the algorithm could include a light preprocessing SA optimization (with a higher cooling rate ρ to cut down the number of SA iterations) designed to reduce the distribution space Ξ as well as the parameter space Θ_{ξ} for each distribution ξ ∈ Ξ, and thus speed up the optimization search.
Data Availability
The data used to support the findings of this study are publicly available, free of charge, at the web address https://github.com/pcmdpc/COVID19/tree/master/datiregioni.
Disclosure
The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Italian National Institute of Statistics or any other entities.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] J. D. Hamilton, “A new approach to the economic analysis of nonstationary time series and the business cycle,” Econometrica: Journal of the Econometric Society, vol. 57, pp. 357–384, 1989.
[2] L. Koskinen and L.E. Öller, “A classifying procedure for signalling turning points,” Journal of Forecasting, vol. 23, no. 3, pp. 197–214, 2004.
[3] M. A. Delgado and J. Hidalgo, “Nonparametric inference on structural breaks,” Journal of Econometrics, vol. 96, no. 1, pp. 113–144, 2000.
[4] M. P. Clements, P. H. Franses, J. Smith, and D. Van Dijk, “On SETAR nonlinearity and forecasting,” Journal of Forecasting, vol. 22, no. 5, pp. 359–375, 2003.
[5] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, USA, 1995.
[6] E. Carlstein, “The use of subseries values for estimating the variance of a general statistic from a stationary sequence,” The Annals of Statistics, vol. 14, no. 3, pp. 1171–1179, 1986.
[7] P. Bühlmann, “Sieve bootstrap for time series,” Bernoulli, vol. 3, no. 2, pp. 123–148, 1997.
[8] K. G. Buhlmann, Linear Programming, Springer, Berlin, Germany, 1983.
[9] M. Bènichou, J. M. Gauthier, P. Girodet, G. Hentges, G. Ribière, and O. Vincent, “Experiments in mixed-integer linear programming,” Mathematical Programming, vol. 1, no. 1, pp. 76–94, 1971.
[10] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Courier Corporation, Honolulu, HI, USA, 1998.
[11] P. J. Van Laarhoven and E. H. Aarts, Simulated Annealing: Theory and Applications, Springer, Berlin, Germany, 1987.
[12] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
[13] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, Wiley, Hoboken, NJ, USA, 1988.
Copyright
Copyright © 2021 Livio Fenga and Carlo Del Castello. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.