Journal of Immunology Research

Volume 2015 (2015), Article ID 738030, 21 pages

http://dx.doi.org/10.1155/2015/738030

## Current Mathematical Models for Analyzing Anti-Malarial Antibody Data with an Eye to Malaria Elimination and Eradication

^{1}London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK

^{2}Centro de Estatística da Universidade de Lisboa, Faculdade de Ciências, Universidade de Lisboa, Bloco C6, Piso 4, Campo Grande, 1749-016 Lisboa, Portugal

^{3}MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, Medical School Building, Norfolk Place, London W2 1PG, UK

^{4}Division of Population Health and Immunity, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, VIC 3052, Australia

^{5}Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia

Received 28 August 2015; Accepted 19 October 2015

Academic Editor: Francesco Pappalardo

Copyright © 2015 Nuno Sepúlveda et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The last decade has witnessed a steady reduction of the malaria burden worldwide. With various countries targeting disease elimination in the near future, the popular parasite infection and entomological inoculation rates are becoming less and less informative of the underlying malaria burden because few infected individuals or mosquitoes are found at the time of sampling. To overcome this problem, alternative measures based on antibodies against specific malaria antigens have recently gained interest in malaria epidemiology, since they make it possible to estimate past disease exposure in the absence of infected individuals. This paper reviews current mathematical models and the corresponding statistical approaches used in antibody data analysis. The application of these models is illustrated with three data sets from Equatorial Guinea, the Brazilian Amazonia region, and the western Kenyan highlands. The future challenges of using these models in the context of malaria elimination are also briefly discussed.

#### 1. Introduction

Malaria is a global health problem with more than 1 billion people estimated to be at risk. This infectious disease is caused by *Plasmodium* parasites transmitted to humans through the bites of infected *Anopheles* mosquitoes. Geographically, *Plasmodium falciparum* (*P. falciparum*) parasites predominate in sub-Saharan Africa, while *Plasmodium vivax* (*P. vivax*) is the major infectious agent in South America and Southeast Asia. According to the latest World Malaria Report [1], disease mortality and risk have been decreasing steadily over the last decade, to the point that many countries are already targeting malaria elimination and eradication [2–5]. This decreasing trend in malaria transmission intensity, although highly beneficial to the affected populations, brings additional challenges to disease surveillance and elimination (reviewed in [6]). One of these challenges relates to the use of current metrics of malaria risk in populations where disease transmission intensity is low and potentially affected by seasonal effects. The popular parasite rate is the proportion of infected individuals at the time of the survey. However, in low transmission settings, this measure is critically affected by the varying performance of current diagnostic tools in detecting infection when screening asymptomatic individuals. Another difficulty with this measure is the high chance of finding only a few infected individuals in the sample, which limits the power to discriminate disease hotspots from less-affected sites, as demonstrated in studies from Brazil [7] and Somalia [8]. The entomological inoculation rate is another popular measure of malaria risk. It is defined as the frequency at which people are bitten by infectious mosquitoes and is therefore informative about the direct interaction between the human and mosquito populations. 
The gold standard for estimating this measure is the human-landing catch, in which mosquitoes are caught as they attempt to land on the exposed limbs of field workers [9, 10]. Although alternative methods exist in the literature, estimating the entomological inoculation rate is in general laborious and time-consuming in low transmission settings owing to the low number of infected mosquitoes [11]. The estimate is also affected by seasonal effects, mosquito population dynamics, and the degree of mosquito attraction to the human hosts or to the chemicals used in the study [11].

To tackle the limitations of the above malaria risk measures, alternative indicators based on antibodies against different malaria antigens have been proposed [12] and tested in different epidemiological contexts [7, 8, 13–16]. The rationale for using antibody data is that antibody concentrations in the serum are a direct correlate of parasite exposure and thus provide information on current and recent infections. The temporal stability of antibody concentrations is an important advantage because it reduces the influence of seasonal variation in malaria transmission. In seroepidemiological studies, the most popular antibodies are those against the blood-stage apical membrane antigen-1 (AMA1) and merozoite surface protein-1 (MSP1) [7, 8, 13–16], owing to their broad immunogenicity and putative role in malaria vaccine development [17, 18]. Recent research has identified other parasite targets [19, 20], but these remain to be tested in different epidemiological settings. Experimentally, antibody quantification is usually done by traditional enzyme-linked immunosorbent assays (ELISA) [21]. Optical densities or titres in arbitrary units are then used for the subsequent data analysis. The most popular approach is to first define the serological status, seropositive or seronegative, of each individual. One then calculates the so-called seroprevalence, defined as the proportion of seropositive individuals in the sample. Several studies have shown that seroprevalence discriminates sites with different endemicity levels with higher resolution than the parasite rate [7, 8]. Further analysis is then carried out to estimate current malaria transmission intensity. Since seroprevalence tends to increase with age as a result of accumulating immunity against malaria parasites, different stochastic models can be constructed for the data using age as a proxy for time. 
The assumption common to all these models is that individuals move between the seronegative and seropositive states upon malaria exposure or in its absence. In this scenario, one typically estimates the rate at which seronegative individuals become seropositive, the so-called seroconversion rate (SCR). SCR was found to correlate well with the parasite rate [13] and the entomological inoculation rate [12], thus capturing the underlying malaria transmission intensity. Moreover, SCR also correlates strongly with the annual parasite index, defined as (number of confirmed cases during 1 year / population under surveillance) × 1000 and usually calculated by official health authorities [7].
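The two-state picture described above can be made concrete with a small numerical sketch. It assumes the simplest version of such a model, often called a reversible catalytic model: everyone is seronegative at birth, and both the seroconversion rate and a seroreversion rate (seropositive back to seronegative) are constant. The function names and the rate values are hypothetical and chosen for illustration only; the annual parasite index helper simply encodes the definition quoted in the text.

```python
import math

def seroprevalence_at_age(age, scr, srr):
    """Expected seroprevalence at a given age under a reversible two-state model.

    Assumption (not from the text): constant seroconversion rate `scr` and
    constant seroreversion rate `srr`, with everyone seronegative at age 0.
    Solves dP/da = scr * (1 - P) - srr * P with P(0) = 0.
    """
    total = scr + srr
    return scr / total * (1.0 - math.exp(-total * age))

def annual_parasite_index(confirmed_cases, population):
    """(confirmed cases during 1 year / population under surveillance) x 1000."""
    return confirmed_cases / population * 1000.0

# Hypothetical rates per year, for illustration only.
scr, srr = 0.10, 0.02
curve = [(age, seroprevalence_at_age(age, scr, srr)) for age in (1, 5, 10, 20, 40)]
plateau = scr / (scr + srr)  # long-run seroprevalence under constant rates

# Hypothetical surveillance figures, for illustration only.
api = annual_parasite_index(confirmed_cases=50, population=10_000)

for age, prevalence in curve:
    print(f"age {age:>2}: expected seroprevalence {prevalence:.3f}")
print(f"plateau: {plateau:.3f}, annual parasite index: {api:.1f}")
```

Under these assumptions the curve rises with age and levels off at the plateau, which is why age-stratified seroprevalence carries information about the seroconversion rate.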

This paper aims to review the mathematical and statistical aspects underlying the analysis of antibody data for inferring malaria transmission intensity. Special attention will be given to current methods for defining seropositivity and to the subsequent mathematical models for estimating SCR under different epidemiological settings: stable malaria transmission intensity, abrupt reduction in SCR due to a malaria control intervention, change in SCR due to a putative age-dependent behavior, detection of migration effects, and detection of individual-level heterogeneity through a set of covariates. Models for antibody acquisition that use the antibody titres themselves will also be described. Three different data sets, from Bioko Island in Equatorial Guinea [15], Jacarecanga in the Brazilian Amazonia region [7], and the western highlands of Kenya [22], are used to illustrate the application of these models to real-world problems. Finally, future analytical challenges will be discussed in the context of malaria elimination and eradication.

#### 2. Mathematical Approaches to Analyzing Serology Data

##### 2.1. Defining Seropositivity

In practice, there are two popular approaches to determining the serological status of an individual. The first approach uses an additional sample of nonexposed individuals to estimate the distribution of antibody levels in the underlying seronegative population. Statistically, the antibody levels of this sample are usually log transformed so that the data approximately follow a Gaussian distribution. Each individual in the study sample is then classified by the 3*σ* rule for Gaussian distributions described in any introductory statistics textbook. In more detail, under the Gaussian assumption, a seronegative individual has a probability of approximately 0.999 of showing an antibody level below the mean plus 3 standard deviations. Individuals are therefore classified as seropositive if their antibody levels exceed the mean plus 3 times the standard deviation of the seronegative population; otherwise they are considered seronegative. This simple approach ensures that nonexposed individuals are rarely misclassified as seropositive, but it has the disadvantage of underestimating seroprevalence, since exposed individuals with modest antibody levels fall below the cut-off.
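A minimal sketch of the first approach is given below. It assumes strictly positive antibody levels (so that the log transform is defined) and uses made-up values for both the nonexposed reference sample and the survey sample; the function names are hypothetical.

```python
import math
import statistics

def three_sigma_cutoff(nonexposed_levels):
    """Cut-off on the log scale: mean + 3 * SD of the nonexposed sample."""
    logs = [math.log(x) for x in nonexposed_levels]
    return statistics.mean(logs) + 3.0 * statistics.stdev(logs)

def classify(levels, cutoff_log):
    """Label each individual by comparing its log antibody level with the cut-off."""
    return ["seropositive" if math.log(x) > cutoff_log else "seronegative"
            for x in levels]

# Hypothetical antibody levels in arbitrary units, for illustration only.
nonexposed = [0.8, 1.0, 1.2, 0.9, 1.1, 1.3, 0.7, 1.0]
survey = [0.9, 1.5, 6.0, 12.0]

cutoff = three_sigma_cutoff(nonexposed)
statuses = classify(survey, cutoff)
seroprevalence = statuses.count("seropositive") / len(statuses)

# Probability that a Gaussian seronegative value falls below mean + 3 SD.
coverage = statistics.NormalDist().cdf(3.0)

print(f"cut-off (original scale): {math.exp(cutoff):.2f}")
print(f"seroprevalence: {seroprevalence:.2f}, coverage below cut-off: {coverage:.4f}")
```

The `coverage` value illustrates where the probability of roughly 0.999 comes from; the choice of 3 standard deviations trades a low false-positive rate for missed exposed individuals with low antibody levels.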

The second approach focuses on the data under analysis only. The basic assumption is that the sample is a mixture of latent seronegative and seropositive populations. The data are then analyzed by the so-called two-component Gaussian mixture model, which invokes a Gaussian distribution with mean $\mu_{S^-}$ and standard deviation $\sigma_{S^-}$ for the seronegative population and another one with mean $\mu_{S^+}$ and standard deviation $\sigma_{S^+}$ for the seropositive population. For an independent and identically distributed random sample of $n$ individuals, the corresponding sampling distribution is described by the following equation:

$$f(x_1, \ldots, x_n) = \prod_{i=1}^{n} \left[ (1 - \pi)\, f_{S^-}(x_i) + \pi\, f_{S^+}(x_i) \right], \tag{1}$$

where $x_i$ is the antibody level of the $i$th individual in the sample, $f_{S^-}$ and $f_{S^+}$ are the probability density functions of the Gaussian distributions associated with the seronegative and seropositive populations, respectively, and $\pi$ is the probability of sampling a seropositive individual from the population. Maximum likelihood estimation is facilitated by the expectation-maximization (EM) algorithm, as implemented in the mixtools package for the R software [23]. The next stage of the analysis is to assign each individual to the corresponding serological population. Again, one can use the 3*σ* rule as described above [14]. An alternative way to perform this classification is to jointly use the probabilities of classifying an individual with antibody level $x$ as either seropositive or seronegative and then specify appropriate cut-off values to determine the serological status of each individual. The probabilities of classifying an individual with antibody level $x$ as seropositive and seronegative are, respectively, given by

$$p_{S^+ \mid x} = \frac{\pi\, f_{S^+}(x)}{(1 - \pi)\, f_{S^-}(x) + \pi\, f_{S^+}(x)}, \qquad p_{S^- \mid x} = 1 - p_{S^+ \mid x}. \tag{2}$$

The classification rule of the $i$th individual in the sample is then described as follows:

$$\text{individual } i \text{ is } \begin{cases} \text{seronegative}, & \text{if } x_i \le c_{-}, \\ \text{indeterminate}, & \text{if } c_{-} < x_i < c_{+}, \\ \text{seropositive}, & \text{if } x_i \ge c_{+}, \end{cases} \tag{3}$$

where $c_{-}$ and $c_{+}$ are the cut-off values in the antibody distribution that ensure a given classification probability, for instance, 90%. Note that individuals with antibody levels between $c_{-}$ and $c_{+}$ are deemed indeterminate due to the uncertainty in the corresponding serological classification. Besides checking whether the model assumptions hold for the data under analysis, an additional assessment of the quality of the classification rule is to report the size of this indeterminate region and the proportion of indeterminate individuals in the sample.
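The mixture-based classification described above can be sketched in a few lines of plain Python. This is an illustrative EM implementation under stated assumptions, not the mixtools routine: the data are simulated on an arbitrary (log-like) scale with made-up parameters, the starting values are a crude split of the sorted data, and the rule thresholds the classification probabilities directly at 90%, which corresponds to a pair of cut-off values whenever the seropositive probability increases monotonically with the antibody level. All function and variable names are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def prob_seropositive(x, pi, mu_neg, sd_neg, mu_pos, sd_pos):
    """Probability that antibody level x belongs to the seropositive component."""
    pos = pi * normal_pdf(x, mu_pos, sd_pos)
    neg = (1.0 - pi) * normal_pdf(x, mu_neg, sd_neg)
    return pos / (pos + neg)

def fit_mixture_em(data, n_iter=100):
    """Fit a two-component Gaussian mixture by EM; returns (pi, mu-, sd-, mu+, sd+)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    # Crude starting values: lower half seronegative, upper half seropositive.
    mu_neg = sum(ordered[:half]) / half
    mu_pos = sum(ordered[half:]) / (len(ordered) - half)
    sd_neg = sd_pos = (ordered[-1] - ordered[0]) / 4.0
    pi = 0.5
    for _ in range(n_iter):
        # E-step: seropositive membership probability of each observation.
        w = [prob_seropositive(x, pi, mu_neg, sd_neg, mu_pos, sd_pos) for x in data]
        # M-step: weighted updates of the mixing proportion, means, and SDs.
        n_pos = sum(w)
        n_neg = len(data) - n_pos
        pi = n_pos / len(data)
        mu_pos = sum(wi * x for wi, x in zip(w, data)) / n_pos
        mu_neg = sum((1.0 - wi) * x for wi, x in zip(w, data)) / n_neg
        sd_pos = math.sqrt(sum(wi * (x - mu_pos) ** 2 for wi, x in zip(w, data)) / n_pos)
        sd_neg = math.sqrt(sum((1.0 - wi) * (x - mu_neg) ** 2 for wi, x in zip(w, data)) / n_neg)
    return pi, mu_neg, sd_neg, mu_pos, sd_pos

def classify(x, params, prob=0.90):
    """Three-way rule: seropositive, seronegative, or indeterminate."""
    p_pos = prob_seropositive(x, *params)
    if p_pos >= prob:
        return "seropositive"
    if 1.0 - p_pos >= prob:
        return "seronegative"
    return "indeterminate"

# Simulated antibody levels with made-up parameters, for illustration only.
random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(600)]
        + [random.gauss(5.0, 1.0) for _ in range(400)])

params = fit_mixture_em(data)
statuses = [classify(x, params) for x in data]
indeterminate_share = statuses.count("indeterminate") / len(data)
print("estimated (pi, mu-, sd-, mu+, sd+):", [round(p, 2) for p in params])
print(f"proportion indeterminate: {indeterminate_share:.3f}")
```

Reporting `indeterminate_share` alongside the fitted parameters mirrors the quality check suggested in the text. With strongly unequal standard deviations the seropositive probability need not be monotone in the far tails, in which case explicit cut-off values should be derived and inspected rather than assumed.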

*Example I* (Bioko Island). In 2004, the health authority of Equatorial Guinea launched integrated treatment and mosquito control programs on Bioko Island. Four years after their initiation, a large cross-sectional survey was conducted at 18 sentinel sites on the island to assess the impact of these programs on malaria transmission [15]. IgG antibody levels against *P. falciparum* AMA1 were measured by ELISA in 6400 individuals. The antibody levels, measured as titres in arbitrary units, ranged from −116.3 to 2618.9, suggesting a wide breadth of immune responses to this malaria antigen (Figure 1(a)). The average antibody level was 390.8 and the standard deviation was estimated at 457.4. As expected for data from a malaria endemic region, the corresponding quantile-quantile plot showed a strong departure from the Gaussian distribution due to the presence of recently or currently exposed individuals with high antibody levels (Figure 1(b)). After fitting the above two-component Gaussian mixture model to the data, the serological status of each individual was determined by (3) with the corresponding lower and upper cut-off values (Figure 1(c)). These cut-off values suggested that 31.2% and 56.1% of the sample consisted of seronegative and seropositive individuals, respectively. The remaining 12.7% of the sample had an unclear serological classification (Table 1).