Abstract
Ebola is an infectious virus, first identified in 1976, that causes Ebola hemorrhagic fever in humans and other primates. The Ebola virus outbreak in West Africa in 2014 was the largest ever recorded. Many researchers use mathematical models to analyze the characteristics of infectious diseases. However, many parameters in such models cannot be fully estimated. To ease this difficulty, we propose an approach to parameter estimation based on a genetic algorithm (GA). The GA uses natural selection, the survival of the fittest, to search for the optimal solution of the model. The residual sum of squares is used as the fitness function to measure the performance of the GA in parameter estimation. Moreover, we used a dynamical model and real Ebola data from Sierra Leone to verify the validity of the GA. The experimental results indicate that the GA is strongly competitive with the classical methods and is a feasible method for estimating the parameters of infectious disease models.
1. Introduction
Ebola virus belongs to the family Filoviridae and is considered a prototype pathogen of viral hemorrhagic fever [1]. The virus was first detected in the Ebola river basin in southern Sudan and Congo in 1976 [2–9]. Of the Ebola virus species discovered since then, only four cause human disease, namely, Zaire ebolavirus, Tai Forest ebolavirus, Sudan ebolavirus, and Bundibugyo ebolavirus [10]. The Reston virus causes disease only in animals, not in humans. The source of the Ebola virus remains unknown. Researchers found evidence of asymptomatic Ebola virus infection in three species of fruit bats, which suggests that bats are the most likely source of the deadly virus [11]. The bats could carry Ebola virus to other animals and even humans [12–14].
Ebola virus is transmitted through saliva, urine, and other body fluids [15, 16]. People can become infected through direct contact with body fluids that carry the virus, which enters the body through the nose, the mouth, the eyes, or damaged skin [17]. Humans become infected after contact with infected blood, body fluids, or fruit bats, as well as through sexual contact [18].
Since there were no effective treatments or approved vaccines at the time, the management of Ebola virus was limited to barrier methods and palliative care to suppress transmission [19]. A large-scale Ebola outbreak occurred in West Africa in 2014, mainly in Guinea, Liberia, and Sierra Leone. The number of confirmed cases was far greater than in any previous outbreak [10]. The lack of effective preventive measures at the time resulted in more people being infected with the Ebola virus [20]. In [21], the authors investigated the effectiveness of small interfering RNA treatments for Ebola-infected patients. RNA interference can suppress the expression of viral genes and is thus effective in suppressing Ebola virus replication. In [22], the authors developed monoclonal antibodies against the Ebola glycoprotein for the treatment of people infected with Ebola virus. In addition, some researchers used Sierra Leone’s disease data to study mathematical models of Ebola virus, predict the progress of the epidemic, and propose preventive control measures and recommendations [5, 23–25].
Dynamic models have become one of the important tools for research on infectious diseases [26–32]. The propagation coefficients of the disease in the model directly affect the prediction results, so it is important to estimate them correctly. Classical parameter estimation methods include the Markov Chain Monte Carlo (MCMC) method and the least-squares method. The basic principle of the MCMC method is to construct a Markov chain from the joint posterior probability distribution of the model propagation coefficients, start the simulation from an arbitrary initial value, and run it until it converges to a stationary distribution, which determines the propagation coefficients [33–35]. There are many improved MCMC methods, such as those that use sequential Monte Carlo (SMC) filter techniques to estimate the propagation coefficients in the model [36]. However, these traditional methods have two drawbacks. First, they are limited by the computational cost of high-dimensional nonlinear models, which may require a lot of computation time; high-precision results are usually not easy to obtain, and not all the propagation coefficients can be obtained at once [37, 38]. Second, the numerical estimation of the marginal probability distribution is difficult to achieve in high-dimensional inversion models [39]. The least-squares method minimizes the sum of squared differences between the simulated data and the real data [40]. Although it is general and has a low computational cost, it does not consider the uncertainty of the inverse problem solution, and its efficiency depends on the initial value of the propagation coefficient. If the initial value is set close to the optimal propagation coefficient, the result is obtained quickly; if it is set far from the optimum, the running time of the algorithm increases [41].
In this paper, we present a GA-based method for solving inverse problems of differential equations. The GA is widely used in parameter estimation and other fields, and it has been proven to be a reliable method for estimating the parameters of nonlinear functions [42, 43]. It is a powerful adaptive search technique that simulates the evolutionary process through natural selection, the survival of the fittest, and can thereby solve optimization problems effectively [44]. When searching in high-dimensional models, GAs are superior to other traditional search techniques because of their simplicity, effectiveness, versatility, and robustness [45, 46]. We used a GA with an adaptive mutation operator to estimate the parameters of differential equations. The advantage of this method is that all the parameters in a high-dimensional model can be estimated from a small amount of data, and complete parameter combinations can be obtained quickly within a limited number of generations. In addition, the GA can provide several effective combinations of propagation coefficients for reference in studying the propagation dynamics.
The remainder of this paper is organized as follows. In Section 2, we introduce the transmission dynamical model of Ebola in Sierra Leone and the theory and process of the GA. In Section 3, we estimate the values of the parameters in the dynamical model using the GA and validate the accuracy of the experimental results. Finally, we give the discussion and conclusions in Section 4.
2. Main Method
2.1. A Dynamical Model of Ebola Virus Transmission in Sierra Leone
The time series of confirmed Ebola case reports were collected from the World Health Organization (WHO) and the Ministry of Health of Sierra Leone. The data cover the Ebola outbreak in 14 regions of Sierra Leone and include the suspected cases, the probable cases, and the hospital-confirmed cases, which are thought to represent the best available data on the Ebola epidemic. Because hospitalization results from real infections, whereas suspected and probable cases may not all be converted into hospitalized cases, it is more accurate to use hospitalized cases to indicate the actual number of Ebola infections. We collected the newly infected cases for 34 weeks from May 19th, 2014, to January 11th, 2015. More detailed data can be found in [47].
We used a GA based on an adaptive mutation operator. The propagation coefficients of the Ebola virus model are treated as the genetic target. They are binary-encoded, and then the genetic operators of random selection with elitism, multipoint crossover, and gene-site mutation are used to simulate evolution and find the optimal solution. The fitness function, which serves as the evaluation index for the genetic process, is the sum of squared differences between the fitted data and the real data. We estimated the parameters based on the dynamical model established in [6]. The optimal parameter set of each generation is recorded and compared with the optimal parameter set of the next generation, and the better set is always saved; when the evolution has completed, all the parameters in the model are obtained.
Following the analysis in the literature, this article divides the population in the Ebola virus transmission model into seven categories, namely, susceptible individuals, exposed individuals, suspected individuals who may be misdiagnosed, probable individuals, hospital-confirmed cases, individuals who may infect others at a funeral, and removed individuals [6]. Figure 1 depicts the transmission mechanism of the Ebola virus.
Consequently, we used the system of equations (1) to simulate the transmission dynamics of the Ebola virus in Sierra Leone; the biological meanings of the parameters are given in Table 1. We quantified the uncertainty of the parameter estimates and give the 95% confidence intervals in Table 1.
2.2. Genetic Algorithm
GA is a method for finding solutions based on the process of biological evolution [42]. The process applies random selection, crossover, and mutation to search for the individual with the best combination of genes. The GA begins by binary-encoding the propagation coefficients in model (1). The encoding length is determined by the parameter range and accuracy. We use par to represent the parameter set. Assuming that all propagation coefficients lie within [0, 1] and that the accuracy is 4 digits after the decimal point, the coding length can be determined by the following formula:
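A standard way to write this bound for binary encoding is given here as a worked example. The symbols m (number of bits), a and b (lower and upper limits of the parameter), and d (number of decimal digits) are introduced for illustration, and the exact form is an assumption that is consistent with the range [0, 1] and the 4-digit accuracy stated above:

```latex
2^{m-1} < (b - a)\times 10^{d} \le 2^{m} - 1
\quad\Longrightarrow\quad
2^{13} = 8192 \;<\; (1 - 0)\times 10^{4} = 10000 \;\le\; 2^{14} - 1 = 16383,
\qquad\text{so } m = 14 .
```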
Therefore, a parameter can be represented by 14 binary bits, and par, which consists of all the parameters, needs 14 bits for each parameter it contains. We collected the disease status of Ebola for 34 weeks, as detailed above, so we set the initial population to 34 parameter sets. Next, we need to determine the fitness function: as fitness function (3), we take the residual sum of squares between the number of infected cases given by the model solution and the actual number of infected cases at each observation time, which is to be minimized. We use the fitness function to evaluate the initial population and obtain the initial fitness values of the parameter sets.
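The residual sum of squares described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `residual_sum_of_squares` and the sample numbers are hypothetical and are not the Sierra Leone data.

```python
from typing import Sequence


def residual_sum_of_squares(model_cases: Sequence[float],
                            observed_cases: Sequence[float]) -> float:
    """Sum of squared differences between model output and observations."""
    if len(model_cases) != len(observed_cases):
        raise ValueError("both series must cover the same time points")
    return sum((m - o) ** 2 for m, o in zip(model_cases, observed_cases))


# Hypothetical weekly counts, for illustration only.
observed = [3.0, 5.0, 9.0, 14.0]
simulated = [2.0, 6.0, 9.0, 12.0]
error = residual_sum_of_squares(simulated, observed)
```

A smaller value means a better fit, so in this setting the parameter set with the highest fitness is the one with the smallest error.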
GA mainly includes three genetic operators: selection, crossover, and mutation. Selection applies the selection operator to the population. Its purpose is to pass the optimized parameter sets directly to the next generation or to generate new parameter sets for the next generation through pairing and crossover. Selection operations are based on the fitness evaluation of the parameter sets in the population. Here we adopt random selection combined with elitism: the parameter sets with the highest fitness are copied once to replace those with the lowest fitness, and the best parameter set of each generation is retained. This is the elite strategy. We then randomly select parameter sets for crossover and mutation.
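The elite strategy and the random choice of parents might look roughly like the following Python sketch. The helper names `replace_worst_with_best` and `pick_parents`, the default of two elites, and the sample population are assumptions made for illustration; fitness is represented here by an error value where lower is better.

```python
import random
from typing import List, Tuple


def replace_worst_with_best(population: List[str], errors: List[float],
                            n_elite: int = 2) -> List[str]:
    """Copy the n_elite lowest-error members over the n_elite highest-error ones."""
    order = sorted(range(len(population)), key=lambda i: errors[i])
    best, worst = order[:n_elite], order[-n_elite:]
    new_population = list(population)
    for b, w in zip(best, worst):
        new_population[w] = population[b]
    return new_population


def pick_parents(population: List[str], rng: random.Random) -> Tuple[str, str]:
    """Randomly pick two distinct population slots as parents."""
    first, second = rng.sample(population, 2)
    return first, second


# Hypothetical 4-bit chromosomes and errors, for illustration only.
population = ["0000", "0101", "1111", "1010"]
errors = [4.0, 1.0, 9.0, 2.0]
survivors = replace_worst_with_best(population, errors)
parents = pick_parents(survivors, random.Random(0))
```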
The crossover operator plays a key role in GA. Crossover is mainly divided into single-point, two-point, and multiple-point crossover. The most commonly used is single-point crossover: a crossover point is randomly set in the parameter string, and, when crossover is performed, the partial structures of the two parameter sets before or after this point are exchanged, generating two new parameter sets. An example of single-point crossover is shown in Figure 2. Because our binary parameter string is long, single-point crossover cannot meet our needs, so we chose multiple-point crossover to increase the diversity of the parameter sets.
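As a rough illustration of multiple-point crossover on binary strings, the sketch below cuts both parents at the same randomly chosen positions and swaps every second segment. The function name `multipoint_crossover`, the choice of three cut points, and the 28-bit example strings are assumptions; the text does not state how many crossover points were used.

```python
import random
from typing import Tuple


def multipoint_crossover(parent_a: str, parent_b: str, n_points: int,
                         rng: random.Random) -> Tuple[str, str]:
    """Exchange alternating segments of two equal-length bit strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 1 <= n_points < len(parent_a):
        raise ValueError("n_points must be between 1 and len(parent) - 1")
    # Distinct interior cut positions, sorted, plus the end of the string.
    cuts = sorted(rng.sample(range(1, len(parent_a)), n_points)) + [len(parent_a)]
    child_a, child_b = [], []
    start, swap = 0, False
    for cut in cuts:
        seg_a, seg_b = parent_a[start:cut], parent_b[start:cut]
        child_a.append(seg_b if swap else seg_a)
        child_b.append(seg_a if swap else seg_b)
        start, swap = cut, not swap
    return "".join(child_a), "".join(child_b)


# Two 28-bit parents (two 14-bit parameters each), for illustration only.
child_1, child_2 = multipoint_crossover("0" * 28, "1" * 28, 3, random.Random(1))
```

With all-zero and all-one parents, each child shows the swapped segments directly, which makes the cut positions easy to inspect.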
The mutation operator changes certain gene positions of the parameter strings in the population; for example, 0 becomes 1, and 1 becomes 0. An example of mutation is shown in Figure 3. We adopted gene-locus mutation with an adaptive mutation operator; that is, each gene position is mutated with a certain mutation probability, and this probability is adjusted adaptively according to the fitness of the parameter set. When the difference between the fitness of a parameter set and the average fitness of the population is small, the parameter sets are close to each other, which is not conducive to the next crossover; the mutation probability is therefore increased. When the difference is large, the mutation probability is reduced. The mutation probability is usually between 0.001 and 0.1. Because it is small, mutation is unlikely to destroy the genes of the dominant parameter sets, yet it allows the algorithm to jump out of a local optimum. Mutation lets the search randomly explore the entire space in which the solution may exist and preserves the diversity of the population, so the global optimal solution can be obtained to a certain extent.
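One possible way to realize this adaptive behavior is sketched below. The text does not give the exact update rule, so the linear interpolation between 0.001 and 0.1 used here is an assumption, as are the names `adaptive_mutation_rate` and `mutate` and the `max_deviation` argument (the largest gap between any member's error and the population mean).

```python
import random


def adaptive_mutation_rate(error: float, mean_error: float, max_deviation: float,
                           p_min: float = 0.001, p_max: float = 0.1) -> float:
    """Higher rate when a member is close to the population mean, lower when far."""
    if max_deviation <= 0.0:
        return p_max  # every member is identical, so push for diversity
    closeness = 1.0 - min(1.0, abs(error - mean_error) / max_deviation)
    return p_min + (p_max - p_min) * closeness


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b
                   for b in bits)


rng = random.Random(0)
near_mean = adaptive_mutation_rate(5.0, 5.0, 2.0)   # member equal to the mean
far_from_mean = adaptive_mutation_rate(7.0, 5.0, 2.0)  # member at the largest gap
flipped = mutate("0101", 1.0, rng)
```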
Next, we solve model (1): we decode the parameters into decimal values and substitute them into the model to obtain the estimated number of confirmed infection cases, that is, hospitalized cases. Fitness function (3) is used to measure the fit to the real infected cases. Then we find the parameter set with the highest fitness in this generation, that is, the one with the least error, and carry it over to the next generation. Afterwards, we cyclically execute selection, crossover, mutation, and evaluation of the new parameter sets until the maximum number of iterations is reached.
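The decoding step can be illustrated with a short Python sketch. It assumes the usual linear mapping from a 14-bit integer onto the interval [0, 1]; the function name `decode` and the example chromosome are hypothetical.

```python
from typing import List

BITS_PER_PARAM = 14  # from the coding-length calculation in the text


def decode(chromosome: str, lower: float = 0.0, upper: float = 1.0,
           bits: int = BITS_PER_PARAM) -> List[float]:
    """Split a bit string into fixed-width genes and map each onto [lower, upper]."""
    if len(chromosome) % bits != 0:
        raise ValueError("chromosome length must be a multiple of the gene width")
    largest = 2 ** bits - 1
    values = []
    for start in range(0, len(chromosome), bits):
        as_int = int(chromosome[start:start + bits], 2)
        values.append(lower + (upper - lower) * as_int / largest)
    return values


# Two genes: the smallest and the largest representable value.
decoded = decode("0" * 14 + "1" * 14)
```

With 14 bits the step between neighboring values is 1/16383, which is finer than the 4-decimal accuracy required above.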
The general steps of using the GA to estimate the parameters of the Ebola model are as follows:
Step 1: treat the parameters to be estimated as a chromosome, encode them in binary, and initialize the population.
Step 2: assign a fitness value to each parameter set using equation (3). From the second generation onward, the parameter sets are ranked according to their fitness values, and the two parameter sets with the greatest fitness are each duplicated once to replace the two parameter sets with the smallest fitness.
Step 3 (selection): add randomly initialized parameter sets to the population to increase its diversity, and then randomly select two parameter sets as the parents.
Step 4 (crossover): generate two new offspring parameter sets by crossing the two parents at multiple points.
Step 5 (mutation): apply the gene-locus mutation described above in combination with the adaptive mutation operator.
Step 6: solve equation (1) using the ode function.
Step 7: convert the parameter sets from binary to decimal, substitute them into the solution of equation (1) to obtain the predicted number of cases, and use equation (3) to evaluate the fitness of the new parameter sets and obtain an optimal parameter set.
Step 8: when the fitness of the new offspring is higher than that of the parents, the offspring replace the parents and are inserted into the parent population for the next round of genetic operations. If the optimal individual remains unchanged for 30 consecutive generations, multiple-point crossover and mutation are carried out.
Step 9: save the best parameter set of this generation.
Step 10: if the maximum number of iterations has not been reached, return to Step 2.
The above steps are executed iteratively until the termination condition is reached. Parameter estimation is finished when the genetic operations are complete. The parameter values used in the model are the optimal solution of the last generation, and the specific values are shown in Table 1. With these values, we can study the propagation dynamics of Ebola and preventive measures against it.
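To make the overall loop concrete, the following self-contained Python sketch runs a simplified GA on a toy problem. It is not the authors' code and departs from the procedure above in several labeled ways: the model is a two-parameter toy SIR model stepped with forward Euler rather than model (1), the data are synthetic, the mutation rate is fixed rather than adaptive, and the random immigrants of Step 3 and the 30-generation stall rule of Step 8 are omitted. All function names are hypothetical.

```python
import random
from typing import List, Tuple

BITS, N_PARAMS = 14, 2  # 14 bits per parameter; toy model has (beta, gamma)


def decode(chrom: str) -> List[float]:
    top = 2 ** BITS - 1
    return [int(chrom[i:i + BITS], 2) / top for i in range(0, len(chrom), BITS)]


def simulate(beta: float, gamma: float, weeks: int = 20) -> List[float]:
    """Toy SIR model (assumption, not model (1)); returns weekly infected counts."""
    s, i, n = 990.0, 10.0, 1000.0
    out = []
    for _ in range(weeks):
        new_inf, new_rem = beta * s * i / n, gamma * i
        s, i = s - new_inf, i + new_inf - new_rem
        out.append(i)
    return out


def error(chrom: str, observed: List[float]) -> float:
    beta, gamma = decode(chrom)
    return sum((m - o) ** 2 for m, o in zip(simulate(beta, gamma), observed))


def crossover(a: str, b: str, rng: random.Random, n_points: int = 3) -> Tuple[str, str]:
    cuts = sorted(rng.sample(range(1, len(a)), n_points)) + [len(a)]
    c1, c2, start, swap = [], [], 0, False
    for cut in cuts:
        x, y = a[start:cut], b[start:cut]
        c1.append(y if swap else x)
        c2.append(x if swap else y)
        start, swap = cut, not swap
    return "".join(c1), "".join(c2)


def mutate(chrom: str, rate: float, rng: random.Random) -> str:
    return "".join(("1" if g == "0" else "0") if rng.random() < rate else g
                   for g in chrom)


def run_ga(observed: List[float], pop_size: int = 34, generations: int = 200,
           p_cross: float = 0.8, p_mut: float = 0.01, seed: int = 0):
    rng = random.Random(seed)
    length = BITS * N_PARAMS
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    best = min(pop, key=lambda c: error(c, observed))
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: error(c, observed))
        pop = ranked[:-2] + ranked[:2]  # two best overwrite the two worst
        children: List[str] = []
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            if rng.random() < p_cross:
                a, b = crossover(a, b, rng)
            children += [mutate(a, p_mut, rng), mutate(b, p_mut, rng)]
        pop = children[:pop_size]
        candidate = min(pop, key=lambda c: error(c, observed))
        if error(candidate, observed) < error(best, observed):
            best = candidate
        pop[0] = best  # keep the best set found so far in the population
        history.append(error(best, observed))
    return decode(best), history


observed = simulate(0.6, 0.2)  # synthetic data from known toy parameters
params, history = run_ga(observed)
```

Because the best set found so far is always retained, the recorded error can never increase from one generation to the next, which mirrors the role of Step 9. The population size of 34, crossover probability of 0.8, and initial mutation probability of 0.01 follow the values reported in the text; the 200 generations are chosen only to keep the example fast.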
3. Main Results
There are many methods for parameter estimation, including the Markov Chain Monte Carlo (MCMC) method and the least-squares method. It is not easy to analyze and estimate the parameters of an infectious disease model because the model contains many parameters that cannot be fully estimated. Consequently, we propose to estimate the parameters in model (1) using the GA, as described in Section 2. Algorithm 1 shows the scheme of the GA used for parameter estimation of the Ebola model. In this algorithm, par denotes a set of parameters, and a and b represent two parameter sets randomly selected from par, which are crossed according to the crossover probability to produce two offspring. In mutation, the two offspring are changed by gene-locus mutations according to the mutation probability. IC denotes the initial value of each variable in equation (1), and t indicates the time of virus transmission. “count” records the number of successive generations in which the optimal value has not changed.

In this study, we used different genetic operators and fitness function (3) to fit the real hospital-diagnosed cases. We conducted 70 experiments and selected a set of parameters that performed well. The results are listed in Table 1. After experimental verification, we chose a crossover probability of 0.8 and an initial mutation probability of 0.01. The number of generations is 3000. The fitting result for the cumulative number of cases is shown in Figure 4.
Figure 5 shows the evolution of the optimal value of each generation in the GA. As the number of generations increases, the error between the model solution and the real data gradually decreases, which means that the fitness is increasing. When the maximum number of generations is reached, a set of near-optimal parameters is obtained. Since some error always remains, we regard this near-optimal solution as the optimal solution. The inset in Figure 5 is an enlarged view of the early generations. This set of parameters performs well and converges quickly, at around 200 generations as the figure shows; however, because of the instability of the GA, it sometimes takes a long time to converge to the optimal value. Therefore, we fixed the number of generations at 3000 throughout the experiments. The figure shows that the GA reduces the error quickly in the early stage and converges fast, which demonstrates the effectiveness of our algorithm.
Some of the parameters are given in [5], some in [23], and some in [6]. Some of our propagation coefficients differ considerably from those in other papers; because the deterministic models are different and our model has more compartments, we cannot compare them directly. We are uncertain about patients in the exposed period, so the propagation coefficient associated with it is uncertain, and we can only use the GA to solve for each parameter value. Thus, we compare only some important parameters. In this paper, the estimates of these parameters are close to those in the above literature to some extent. We obtain the basic reproduction number of Ebola virus in Sierra Leone, calculated in the same way as in Xia et al.’s work [6]; it is basically consistent with the values given in [5] and [23]. The results show that the GA can accurately estimate all the parameters in the model and that the data are fitted well. Another advantage of the GA is that the precision of the parameters can be set freely, although the resulting length of the parameter string must be considered; this is very useful for obtaining high-precision parameters. It can be seen that the GA is a feasible method for parameter estimation.
Because we are using a given mathematical model, we only need to use the actual case data as test data and apply them to model (1) to obtain a set of near-optimal parameters. We set the range of each parameter to [0, 1], randomly generate the initial parameters as the input of the model, and use the GA to obtain a set of parameters that fits the model better. The GA can provide a variety of parameter combinations for researchers’ reference. However, some parameters may suffer from overfitting, so we conducted 70 experiments to select a set of parameters that is more realistic. We calculated the 95% confidence intervals for the best set of parameters in the 70 experiments, and almost all parameters are within the confidence intervals, which also shows the validity of the GA for parameter estimation. The confidence intervals for all the parameters are shown in Table 1.
4. Discussion and Conclusion
This work proposed a GA parameter estimation method based on an adaptive mutation operator, which can be applied to biomathematical models and to differential equations in other fields. Through the GA’s adaptive search, the parameters of various models can be found effectively, and multiple parameter combinations can be provided, which reduces the manual tuning of parameters by researchers and provides an effective reference for scientific research. In addition, in the study of infectious diseases, the complexity of the model often makes it difficult to obtain all of the parameters, yet the basic reproduction number, which is used to evaluate whether an infectious disease will break out, must be calculated from them; the parameters also indicate the transmission dynamics and the scale of transmission of the disease. Parameter estimation is therefore of central importance, and the GA can be used to find all the parameters in the model effectively, which makes up for shortcomings of the traditional methods such as long calculation time and slow convergence. The proposed method was evaluated experimentally, and its performance reached the desired level. Finally, the GA can be applied not only to infectious disease models but also to other mathematical and physical models, and it offers a new idea for parameter estimation.
Since the initial population of the GA is assigned randomly, the output is unstable; we cannot quantify the uncertainty because of the discrete distribution of the output parameters, which is a common problem of GAs and needs to be improved. In addition, the fitness function is an important factor in determining the quality of genetic evolution, so it is very important to select a suitable one. Finally, a very important step in the GA is to find the solution of the model, so the method may not be suitable for equations that cannot be solved, although it is applicable to most models. There are many improved genetic algorithms, and the existing algorithms can be improved further to expand their scope of application.
The GA can solve specific problems with only a small amount of data, using a corresponding fitness function to guide the search. It is therefore a general-purpose algorithm used in many fields. In function optimization, the GA can estimate not only the parameters of biomathematical models but also the kinetic parameters of microorganisms. It can also determine the performance parameters of nonlinear physical problems. In addition, the GA performs well in path planning, cloud computing task scheduling, communication network design, image feature extraction, and other fields. Furthermore, some studies combine the GA with machine learning methods such as neural networks.
In this paper, we introduced the basic process of the adaptive mutation genetic algorithm, showed how to use the GA to estimate the parameters of the Ebola virus model in Sierra Leone, and gave the fitted curve for the actual infections and the genetic iteration process. In addition, we offer a new idea for parameter estimation in other research fields, such as dynamical models of disease transmission [48–50] or predator-prey interactions [51–53] with spatial effects [32, 54–56] in the form of reaction-diffusion equations.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Program for the Outstanding Innovative Teams (OIT) of Higher Learning Institutions of Shanxi, the Natural Science Foundation of Shanxi Province (Grant no. 201801D221003), and the China Postdoctoral Science Foundation (Grant no. 2017M621110).