Analysis and Models in Interdisciplinary Mathematics 2016View this Special Issue
Capturing the Data Uncertainty Change in the Cocaine Consumption in Spain Using an Epidemiologically Based Model
A probabilistic model is proposed to study the transmission dynamics of the cocaine consumption in Spain during the period of 1995–2011. Using the so-called probabilistic fitting technique, we study if the model is able to capture the data uncertainty coming from surveys. The proposed model is formulated through a nonlinear system of difference equations whose coefficients are treated as stochastic processes. A discussion regarding the usefulness and limitations of probabilistic fitting technique in order to capture the data uncertainty of the proposed model is presented.
In , the authors presented an epidemiologically based model to study the transmission dynamics of the cocaine consumption in Spain. This model worked well in the presented scenario, 1995–2005, where the population of cocaine consumers had an increasing trend. However, in 2008, the economic crisis began and the Spanish Health Ministry started to apply the National Drug Plan. As a consequence, the nonconsumer population increased and the consumer populations started to decrease. This sudden change could not be captured by the model as it was shown in .
In this work, we propose to use a variation of the model stated in  and assume that the model parameters are variable over the time in order to see if the variable model parameters are able to adapt and help the model to capture the observed changes in the cocaine consumption.
With this aim, we use the probabilistic fitting technique recently developed in [3, 4] by some of the authors. With this technique, we expect to capture the data uncertainty (data coming from surveys), despite the change of trend, and provide an estimation to the probability distribution of the model parameters. The latter will help us study if the observed data variation also produces variations in some or all model parameters as well as their quantification.
These types of mathematical models also have been used in the study of epidemics and other drug addictions, such as alcohol, tobacco, ecstasy, or heroin addiction [5–9], and in the approach to other sociological topics that are spread by social contact as obesity or extreme ideological behaviors [10, 11].
The work is organized as follows. In Section 2, we present the data we are going to use about cocaine consumers in Spain. In Section 3, we build a type-epidemiological model describing the cocaine consumption dynamics in Spain. Also, the model is scaled in order to match data and model magnitudes. In Section 4, we recall and adapt the probabilistic fitting technique introduced in . This technique is applied to the model in order to obtain both data estimation with uncertainty and an estimation of the probability distribution of the model parameters. In Section 5, we present and discuss the results. In Section 6, conclusions are drawn.
In this study, we use data from the Survey on Alcohol and Drugs in Spain (EDADES), part of the Spanish National Drug Plan . This survey is published every two years. Specifically, we have focused on the following key question quoted in this survey: how often have you consumed cocaine? The available responses were as follows: “never consumed (nonconsumers),” “at least once in your life (occasional consumers),” “at least once in the past year (regular consumers),” and “at least once in the past 30 days (habitual consumers).” This survey was launched in 1995 and we have data until 2011. The data are collected in Table 1.
In order to simplify the notation, we are going to define the set of time instants where survey data are available:
3. Model Building
From the available survey data, we then introduce an epidemiological model to describe the dynamics of the four subpopulations over the time. While a traditional epidemiological study considers the transmission of a biological contagion, for our study, we will apply a social contagion approach. The underlying assumption is the following: we can consider cocaine consumption to be a contagion insofar as the probability that one individual becomes a cocaine consumer depends on the interaction with people who already exhibit this behavior, that is, fellow cocaine consumers [1, 13]. Let us assume homogeneous population mixing; that is, each individual can contact with any other individual [14, 15]. With this assumption, we can now build a system of difference equations based on the model introduced in  to further describe the transmission dynamics.
As with the survey results, we divide the Spanish 15–65-year-old population (the age group considered in the surveys) into four subpopulations:(1), the amount of people who have never consumed cocaine at year (2), the amount of people who occasionally consume cocaine (at least once in their lives) at year (3), the amount of people who regularly consume cocaine (at least once in the past year) at year (4), the amount of people who habitually consume cocaine (at least once in the past month) at year
Furthermore, we have the total populationwhich is not going to be a constant value over the time.
The transitions between the different subpopulations are determined as follows:(i)Let us consider that the newly recruited 15-year-old individuals become members of subpopulation at rate ; that is, we consider that they have never consumed cocaine before.(ii)An individual in the nonconsumer population can become an occasional consumer through contact with occasional, regular, or habitual consumers. Then, this social contagion term can be modeled with .(iii)An occasional consumer can increase their consumption and become a regular consumer. This transition term can be modeled by .(iv)A regular consumer can increase their consumption and become a habitual consumer. We assume that this can be modeled by the term .(v)A habitual consumer can decide to go to drug-treatment therapy. If he/she stays in therapy for 6 months, according to experts, he/she can be considered a nonconsumer . This term is modeled by .
Using the above assumptions, a dynamic cocaine consumption model for the 15–65-year-old Spanish population is given by the following nonlinear system of difference equations:where(i) is the birth rate (the mortality between 0 years old and 14 years old is so small that it can be discarded),(ii) is the death rate for nonconsumers,(iii) is the death rate for occasional consumers,(iv) is the death rate for regular consumers,(v) is the death rate for habitual consumers,(vi) is the time-dependent transmission rate for cocaine consumption,(vii) is the time-dependent transition rate between occasional and regular cocaine consumers,(viii) is the time-dependent transition rate between regular and habitual cocaine consumers,(ix) is the time-dependent rate at which habitual consumers enter and complete drug-treatment therapy.
Figure 1 shows the flow diagram of the dynamic cocaine consumption model. The boxes represent the subpopulations and the arrows represent the transitions between the subpopulations. Arrows are labeled by their corresponding parameters of the model.
Following the idea proposed in , we are going to determine bounds for the model parameters. The birth rate ; both ends of this interval represent the minimum and maximum values for the birth rate during the period 1995–2011 . Analogously, the death rate . Taking into account the fact that the death rate of the consumers may be up to higher than  and the fact that the more an individual consumes drugs the higher probability to die because of drug abuse is, we have that .
Also, we have the time-dependent parameters , , , and for for which we consider that all of them lie between and the double of the values given in  (maximum likelihood). Then, , , , and , for .
The time step for simulations will be of two years according to the time instants in . Then, the total number of model parameters is 1 birth rate, death rates, , , , and , that is, model parameters.
3.1. Model Scaling
Despite the fact that (3)–(6) describe the cocaine consumption dynamics in Spain, there is a mismatch between the input for these difference equations, which is formulated for total population counts, and the survey data available, which is stated in percentages. We therefore need to scale the equations. To do this, we add together (3)–(6). Then, taking into account (2), the left-hand side yieldswhile the right-hand side yields
Applying the same method to the other equations, the following expressions are obtained:And the scaled equations can be written in the following form:
4. Probabilistic Fitting
This technique consists of using information of the data survey to assign probability distributions to the data. Then, we sample data values from these probability distributions and fit the model to the sampled data. Thus, we find model parameters that fit not only the data but also the uncertainty contained into the intrinsic survey error. Thus, these model parameters will allow the model to capture the data uncertainty (with confidence intervals).
4.1. Data 95% Confidence Intervals (95% CI)
Data in Table 1 correspond to the mean percentage obtained from the EDADES survey conducted between 1995 and 2011 every two years. In the technical specifications of each survey, we can see sample sizes of 15000, 15000, 15000, 15000, 15000, 27934, 23715, 20109, and interviews, respectively.
Taking into account the fact that the sample is not the same for each survey, let us assume that the survey outputs are independent. For each one of the 9 available surveys, let us denote by , , , and , a random vector whose entries are nonconsumers, occasional consumers, regular consumers, and habitual consumers and , , , , , , , , and are the sample sizes of surveys. These components represent exclusive selections (events) with probabilitieswhere , , , and are the percentages collected in Table 1 for each survey : . Thus, each random vector follows a multinomial probability distribution. Therefore, the probability that occurs times, occurs times, occurs times, and occurs times is given bywhere . The resulting multinomials for each EDADES survey can be seen in Table 2.
4.2. Probabilistic Estimation
Let be a short representation of the scaled model (13), where denotes the list of model parameters; that is,
This procedure is a classic optimization problem that can be carried out through genetic algorithms, PSO, Nelder-Mead, and so forth [19, 20] but having the following key difference: now we will fit the data sampled from probability distributions instead of the raw data. We shall perform the fitting times ( being a large number), storing both the parameter values and the calculated errors , ordered from smallest to largest errors. The result of this procedure is a list of model parameters fitted to a sample of the data with their corresponding errors represented in Table 4.
The value should be a large number in order to capture as much data uncertainty as possible during the sampling process and this uncertainty could be fitted by the model. In our case, we take .
Now, we take and in Table 4 and calculate the outputs for times in , the time instants where data are available. For each time instant, we shall calculate percentiles and , one for each one of the 4 subpopulations. Hence, we will name the sum of the following:(i)The Frobenius norm of the difference between the percentiles from the model output and from the data percentiles 2.5 in Table 3(ii)The Frobenius norm of the difference between the percentiles 97.5 from the model output and from the data percentiles 97.5 in Table 3
We will repeat the above process with the outputs from , , and , obtaining , the measure between the confidence bands from the outputs and the data. The same shall be done for with , , , and and so on, until , obtaining as the measure between the confidence bands from the outputs and the data.
Taking , we ensure that the confidence band of the outputs from is the closest to the confidence band from the data, with which our model will capture the maximum uncertainty of the data from the output of the models .
The probabilistic fitting procedure returns, as the best fitting, the one giving for with an error of . A graphical representation of this result can be seen In Figure 2.
The model CI band captures almost all data uncertainty except 1999 and 2011 in nonconsumers, 2009 in occasional consumers, and 2001 and 2011 in habitual consumers (there is no intersection between the model output band and these data CI).
The decrease in the nonconsumer subpopulation during the period of 1997–2001 is not properly captured by the model. With regard to the economic crisis and the National Drug Plan proposed in 2008 by the Spanish government, their effect does not appear until 2011, where an increase in nonconsumers and a decrease in occasional consumers arise. As can be seen in Figure 2, for these two subpopulations, the model is unable to capture the data uncertainty. Also, in the habitual consumers’ subpopulation, the model still does not realize the trend change.
All these comments lead us to say that the proposed model was fairly good to explain what happened when the number of consumers was increasing, mainly in occasional and regular consumers; nevertheless when the trend changed, it could not adapt and explain entirely what was happening.
With respect to the model parameters, in Figure 3, we can see the estimation of the probability density function (PDF) of , , , and for every .
Figure 3 has been built taking sets of model parameters selected by the probabilistic fitting procedure, in particular corresponding to , , , and for , and generate a kernel PDF in each time instant in using the Mathematica  command KernelMixtureDistribution .
In Figures 3 and 4, we can see the variability of the parameters over the time, trying to adapt to the cocaine consumption behavior changes. Note that has a mean and 95% confidence interval fairly stable, which means that the flow of people going to therapy is almost invariable over the time with independence of the changes in the consumption behavior. Therefore, we could make for all and transform this random process into a random variable.
The use of variable parameters may allow us to identify changes in the studied phenomenon which have not been considered in the hypotheses or which could arise unexpectedly, as it seems to happen in this regard to the economic crisis and the National Drug Plan, in this case. Accordingly, we can see in Figure 4 how parameters and , and with lower changes parameter , try to adapt in the rightmost part of the graphs to data changes, but, unfortunately, they do not change enough to capture the data uncertainty when the consumption trend varies. This leads us to think that the model may not be appropriate to explain the decreasing trends in cocaine consumption.
In this work, we present a model formulated through a system of nonlinear difference equations, where the model parameters are variable over the time. This model intends to describe the cocaine consumption dynamics in Spain in the period of 1995–2011. The model follows fairly well the dynamics until 2009, where the effect of two nonconsidered facts, the effect of the economic crisis and the implantation of the National Drug Plan, introduced a change in the behavior of the individuals, especially in the young people who are the group that is increasing the nonconsumer subpopulation.
We expected that the introduction of variable parameters over the time would allow us to capture the changes in the cocaine consumption dynamics; however, it could not. This indicates that the proposed model may not be appropriate when a decreasing trend in cocaine consumption emerges.
With regard to the parameters, we must say that it was not necessary to consider the parameter as variable, because it is fairly stable over the time. The remaining parameters, , , and , are trying to adapt to the data trend changes, but, in nonconsumer and habitual consumer subpopulations, they fail to do it accurately.
Finally, the three sides that this study, a model with variable parameters using the probabilistic fitting technique, provides should be pointed out:(1)Consider the model parameters by default as random processes (a family of random variables indexed by time), which is more versatile than considering them as random variables, and then we can detect if they really vary over the time.(2)Study the estimation of the parameter random processes and obtain estimations for the means, the confidence intervals, the PDFs, and hence other higher statistical moments.(3)Determine the usefulness of the model in several realistic situations or when data changes because unexpected scenarios emerge.
Although it was not mentioned before, a great computational effort has been done to obtain the presented results and this is one of the major drawbacks we want to improve in future works.
The authors declare that there are no competing interests regarding the publication of this article.
This work has been partially supported by the Ministerio de Economía y Competitividad (Grant MTM2013-41765-P).
M. Rubio, Uncertainty quantification in dynamical models. An application to cocaine consumption in Spain [Ph.D. dissertaion], Universitat Politècnica de València, Valencia, Spain, 2015, https://riunet.upv.es/handle/10251/55844.
J.-C. Cortés, F.-J. Santonja, A.-C. Tarazona, R.-J. Villanueva, and J. Villanueva-Oller, “A probabilistic estimation and prediction technique for dynamic continuous social science models: the evolution of the attitude of the Basque Country population towards ETA as a case study,” Applied Mathematics and Computation, vol. 264, pp. 13–20, 2015.View at: Publisher Site | Google Scholar | MathSciNet
J.-C. Cortés, F. Sánchez, F.-J. Santonja, and R.-J. Villanueva, “A probabilistic analysis to quantify the effect of March 11, 2004, attacks in Madrid on the March 14 elections in Spain: a dynamic modelling approach,” Abstract and Applied Analysis, vol. 2015, Article ID 387839, 8 pages, 2015.View at: Publisher Site | Google Scholar
Ministerio de Sanidad, Servicios Sociales e Igualidad (Ministry of Health, Social Services, and Equality), Observatorio Espanñol Sobre la Droga y las Toxicomanas, Informe 2015 (Observatory in Spain Concerning Drugs and Addiction, 2015 Report), http://www.pnsd.msssi.gob.es/profesionales/sistemasInformacion/informesEstadisticas/pdf/INFORME\_2015.pdf.
N. A. Christakis and J. H. Fowler, Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Brown and Company, Hachette Book Group, 2009.
F. Brauer and C. Castillo-Chavez, Mathematical Models in Population Biology and Epidemiology, Springer Science & Business Media, 2001.
J. D. Murray, Mathematical Biology, Springer, New York, NY, USA, 2002.View at: MathSciNet
INE, Spanish Statistic Institute, http://www.ine.es.
G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 4th edition, 2012.
C. Jacob and N. Khemka, “Particle Swarm Optimization in Mathematica an exploration kit for evolutionary optimization,” in Proceedings of the 6th International Mathematica Symposium (IMS '04), Banff, Canada, 2004.View at: Google Scholar