Research Article  Open Access
Christopher Anaya, Clara Burgos, JuanCarlos Cortés, RafaelJ. Villanueva, "Capturing the Data Uncertainty Change in the Cocaine Consumption in Spain Using an Epidemiologically Based Model", Abstract and Applied Analysis, vol. 2016, Article ID 1758459, 9 pages, 2016. https://doi.org/10.1155/2016/1758459
Capturing the Data Uncertainty Change in the Cocaine Consumption in Spain Using an Epidemiologically Based Model
Abstract
A probabilistic model is proposed to study the transmission dynamics of the cocaine consumption in Spain during the period of 1995–2011. Using the socalled probabilistic fitting technique, we study if the model is able to capture the data uncertainty coming from surveys. The proposed model is formulated through a nonlinear system of difference equations whose coefficients are treated as stochastic processes. A discussion regarding the usefulness and limitations of probabilistic fitting technique in order to capture the data uncertainty of the proposed model is presented.
1. Introduction
In [1], the authors presented an epidemiologically based model to study the transmission dynamics of the cocaine consumption in Spain. This model worked well in the presented scenario, 1995–2005, where the population of cocaine consumers had an increasing trend. However, in 2008, the economic crisis began and the Spanish Health Ministry started to apply the National Drug Plan. As a consequence, the nonconsumer population increased and the consumer populations started to decrease. This sudden change could not be captured by the model as it was shown in [2].
In this work, we propose to use a variation of the model stated in [1] and assume that the model parameters are variable over the time in order to see if the variable model parameters are able to adapt and help the model to capture the observed changes in the cocaine consumption.
With this aim, we use the probabilistic fitting technique recently developed in [3, 4] by some of the authors. With this technique, we expect to capture the data uncertainty (data coming from surveys), despite the change of trend, and provide an estimation to the probability distribution of the model parameters. The latter will help us study if the observed data variation also produces variations in some or all model parameters as well as their quantification.
These types of mathematical models also have been used in the study of epidemics and other drug addictions, such as alcohol, tobacco, ecstasy, or heroin addiction [5–9], and in the approach to other sociological topics that are spread by social contact as obesity or extreme ideological behaviors [10, 11].
The work is organized as follows. In Section 2, we present the data we are going to use about cocaine consumers in Spain. In Section 3, we build a typeepidemiological model describing the cocaine consumption dynamics in Spain. Also, the model is scaled in order to match data and model magnitudes. In Section 4, we recall and adapt the probabilistic fitting technique introduced in [3]. This technique is applied to the model in order to obtain both data estimation with uncertainty and an estimation of the probability distribution of the model parameters. In Section 5, we present and discuss the results. In Section 6, conclusions are drawn.
2. Data
In this study, we use data from the Survey on Alcohol and Drugs in Spain (EDADES), part of the Spanish National Drug Plan [12]. This survey is published every two years. Specifically, we have focused on the following key question quoted in this survey: how often have you consumed cocaine? The available responses were as follows: “never consumed (nonconsumers),” “at least once in your life (occasional consumers),” “at least once in the past year (regular consumers),” and “at least once in the past 30 days (habitual consumers).” This survey was launched in 1995 and we have data until 2011. The data are collected in Table 1.

In order to simplify the notation, we are going to define the set of time instants where survey data are available:
3. Model Building
From the available survey data, we then introduce an epidemiological model to describe the dynamics of the four subpopulations over the time. While a traditional epidemiological study considers the transmission of a biological contagion, for our study, we will apply a social contagion approach. The underlying assumption is the following: we can consider cocaine consumption to be a contagion insofar as the probability that one individual becomes a cocaine consumer depends on the interaction with people who already exhibit this behavior, that is, fellow cocaine consumers [1, 13]. Let us assume homogeneous population mixing; that is, each individual can contact with any other individual [14, 15]. With this assumption, we can now build a system of difference equations based on the model introduced in [1] to further describe the transmission dynamics.
As with the survey results, we divide the Spanish 15–65yearold population (the age group considered in the surveys) into four subpopulations:(1), the amount of people who have never consumed cocaine at year (2), the amount of people who occasionally consume cocaine (at least once in their lives) at year (3), the amount of people who regularly consume cocaine (at least once in the past year) at year (4), the amount of people who habitually consume cocaine (at least once in the past month) at year
Furthermore, we have the total populationwhich is not going to be a constant value over the time.
The transitions between the different subpopulations are determined as follows:(i)Let us consider that the newly recruited 15yearold individuals become members of subpopulation at rate ; that is, we consider that they have never consumed cocaine before.(ii)An individual in the nonconsumer population can become an occasional consumer through contact with occasional, regular, or habitual consumers. Then, this social contagion term can be modeled with .(iii)An occasional consumer can increase their consumption and become a regular consumer. This transition term can be modeled by .(iv)A regular consumer can increase their consumption and become a habitual consumer. We assume that this can be modeled by the term .(v)A habitual consumer can decide to go to drugtreatment therapy. If he/she stays in therapy for 6 months, according to experts, he/she can be considered a nonconsumer [16]. This term is modeled by .
Using the above assumptions, a dynamic cocaine consumption model for the 15–65yearold Spanish population is given by the following nonlinear system of difference equations:where(i) is the birth rate (the mortality between 0 years old and 14 years old is so small that it can be discarded),(ii) is the death rate for nonconsumers,(iii) is the death rate for occasional consumers,(iv) is the death rate for regular consumers,(v) is the death rate for habitual consumers,(vi) is the timedependent transmission rate for cocaine consumption,(vii) is the timedependent transition rate between occasional and regular cocaine consumers,(viii) is the timedependent transition rate between regular and habitual cocaine consumers,(ix) is the timedependent rate at which habitual consumers enter and complete drugtreatment therapy.
Figure 1 shows the flow diagram of the dynamic cocaine consumption model. The boxes represent the subpopulations and the arrows represent the transitions between the subpopulations. Arrows are labeled by their corresponding parameters of the model.
Following the idea proposed in [1], we are going to determine bounds for the model parameters. The birth rate ; both ends of this interval represent the minimum and maximum values for the birth rate during the period 1995–2011 [17]. Analogously, the death rate . Taking into account the fact that the death rate of the consumers may be up to higher than [12] and the fact that the more an individual consumes drugs the higher probability to die because of drug abuse is, we have that .
Also, we have the timedependent parameters , , , and for for which we consider that all of them lie between and the double of the values given in [1] (maximum likelihood). Then, , , , and , for .
The time step for simulations will be of two years according to the time instants in . Then, the total number of model parameters is 1 birth rate, death rates, , , , and , that is, model parameters.
3.1. Model Scaling
Despite the fact that (3)–(6) describe the cocaine consumption dynamics in Spain, there is a mismatch between the input for these difference equations, which is formulated for total population counts, and the survey data available, which is stated in percentages. We therefore need to scale the equations. To do this, we add together (3)–(6). Then, taking into account (2), the lefthand side yieldswhile the righthand side yields
Now, we shall definefrom which we get . From here, we will scale the equations, starting with (3). Here, using (8)(9), one obtainsand by dividing the numerator and denominator by , one gets
Applying the same method to the other equations, the following expressions are obtained:And the scaled equations can be written in the following form:
4. Probabilistic Fitting
This technique consists of using information of the data survey to assign probability distributions to the data. Then, we sample data values from these probability distributions and fit the model to the sampled data. Thus, we find model parameters that fit not only the data but also the uncertainty contained into the intrinsic survey error. Thus, these model parameters will allow the model to capture the data uncertainty (with confidence intervals).
4.1. Data 95% Confidence Intervals (95% CI)
Data in Table 1 correspond to the mean percentage obtained from the EDADES survey conducted between 1995 and 2011 every two years. In the technical specifications of each survey, we can see sample sizes of 15000, 15000, 15000, 15000, 15000, 27934, 23715, 20109, and interviews, respectively.
Taking into account the fact that the sample is not the same for each survey, let us assume that the survey outputs are independent. For each one of the 9 available surveys, let us denote by , , , and , a random vector whose entries are nonconsumers, occasional consumers, regular consumers, and habitual consumers and , , , , , , , , and are the sample sizes of surveys. These components represent exclusive selections (events) with probabilitieswhere , , , and are the percentages collected in Table 1 for each survey : . Thus, each random vector follows a multinomial probability distribution. Therefore, the probability that occurs times, occurs times, occurs times, and occurs times is given bywhere . The resulting multinomials for each EDADES survey can be seen in Table 2.

Now, we compute the quantiles 2.5 and 97.5 (95% CI) of each one of the joint multinomial distributions in Table 2: . The obtained 95% CI are collected in Table 3.

4.2. Probabilistic Estimation
Let be a short representation of the scaled model (13), where denotes the list of model parameters; that is,
Also, we have the data confidence intervals ( CI) of Table 3 obtained by sampling the joint probability distributions of Table 2 and calculating the percentiles and .
Given the probability distributions , , in Table 2, we take a sample , and , and we look for the values to the parameters so thatis as minimum as possible, with being the Frobenius norm [18]:
This procedure is a classic optimization problem that can be carried out through genetic algorithms, PSO, NelderMead, and so forth [19, 20] but having the following key difference: now we will fit the data sampled from probability distributions instead of the raw data. We shall perform the fitting times ( being a large number), storing both the parameter values and the calculated errors , ordered from smallest to largest errors. The result of this procedure is a list of model parameters fitted to a sample of the data with their corresponding errors represented in Table 4.

The value should be a large number in order to capture as much data uncertainty as possible during the sampling process and this uncertainty could be fitted by the model. In our case, we take .
Now, we take and in Table 4 and calculate the outputs for times in , the time instants where data are available. For each time instant, we shall calculate percentiles and , one for each one of the 4 subpopulations. Hence, we will name the sum of the following:(i)The Frobenius norm of the difference between the percentiles from the model output and from the data percentiles 2.5 in Table 3(ii)The Frobenius norm of the difference between the percentiles 97.5 from the model output and from the data percentiles 97.5 in Table 3
We will repeat the above process with the outputs from , , and , obtaining , the measure between the confidence bands from the outputs and the data. The same shall be done for with , , , and and so on, until , obtaining as the measure between the confidence bands from the outputs and the data.
Taking