Advances in Epidemiology

Volume 2015 (2015), Article ID 721592, 11 pages

http://dx.doi.org/10.1155/2015/721592

## Forecasting Age-Specific Brain Cancer Mortality Rates Using Functional Data Analysis Models

¹Department of Mathematics and Computer Systems, Mercyhurst University, 501 East 38th Street, Erie, PA 16546, USA

²Department of Mathematics and Statistics, University of South Florida, 4202 E Fowler Avenue, Tampa, FL 33620, USA

Received 30 July 2014; Revised 2 January 2015; Accepted 20 January 2015

Academic Editor: Peter N. Lee

Copyright © 2015 Keshav P. Pokhrel and Chris P. Tsokos. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Incidence and mortality rates are considered guidelines for planning public health strategies and allocating resources. We apply functional data analysis techniques to model age-specific brain cancer mortality trends and forecast entire age-specific functions using exponential smoothing state-space models. The age-specific mortality curves are decomposed using principal component analysis, and a functional time series model is fitted with basis functions. Nonparametric smoothing methods are used to mitigate the randomness in the observed data. We apply the functional time series model to age-specific brain cancer mortality rates and forecast mortality curves with prediction intervals using an exponential smoothing state-space model. We also present the disparity in brain cancer mortality rates among age groups, together with the rate of change of mortality rates. The data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program of the United States. The brain cancer mortality rates, classified under International Classification of Diseases for Oncology code ICD-O-3, were extracted using the SEERStat software.

#### 1. Introduction

Functional data analysis is concerned with the analysis of information on curves or functions [1]. Functional data are multivariate data with an ordering on the dimensions (Muller, 2006). In the present study we are interested in the distribution of functions: their means, their covariances, and their relationships to certain responses and to other functions. One of the major advantages of functional data analysis is the ability to use rates of change, or derivatives, of curves. In this study we combine methodologies from functional data analysis, nonparametric statistics, and time series forecasting, an area that has seen intense research during the last decade [2]. In addition, future mortality rates are of great interest to strategic planners and the insurance industry, and accurate forecasts can be a strong guide for budget allocation, planning, and policy making. Here, we develop forecasting models for crude mortality rates of malignant primary brain and central nervous system cancer in the United States for the study period 1969–2008.

Age-specific mortality rates can play a significant role in allocating resources for brain tumor control and evaluation. The primary purpose of this study is to apply functional data analysis techniques to model age-specific brain cancer mortality time trends and to forecast the entire age-specific mortality function using a state-space approach [3]. Our aims in the present study are (i) to capture the subtle pattern of variation in mortality, (ii) to find prediction intervals for the forecasts, and (iii) to find a flexible model that could incorporate covariates such as screening and treatment effects into the analysis.

However, the functional time series forecasting model has some limitations. It incorporates only year of death (calendar period) and does not consider year of birth (cohort) effects. Calendar-period incidence and mortality trends can reveal the effects of new medical interventions but fail to reflect changes in risk factors, such as screening and radiation therapy.

Currently, most forecasting models are based on age-period-cohort methods. These methods have been used to estimate the mortality rates of breast cancer [4], prostate cancer [5], and cervical cancer [6, 7]. They are essentially regression models with mortality or incidence rates as outcome variables, using a Poisson error distribution with a log link function. The most common problems with these models are nonidentifiability of parameters, very strong parametric assumptions, sensitivity of the projections, and failure to include the most recent changes in cohort effects. Most importantly, none of these studies forecasts mortality rates with age-related changes.

Linear extrapolation and nonlinear Poisson models are discussed in [8, 9], whereas [4] studies a Bayesian age-period-cohort model with autoregressive smoothing of each of the age, period, and cohort components, so that the resulting projections are estimated from current and past trends in the data. The Lee-Carter (LC) method [10] is one of the most influential methods in demographic forecasting, and the paper introducing it is perhaps the most cited by demographic researchers. LC proposed a long-term forecasting method to extrapolate mortality rates and applied it to forecast US mortality rates for the year 2065.

There have been numerous extensions of the LC method; some of the extensions and modifications can be found in [11] and in Renshaw and Haberman (2003) [12], and applications of the LC method to fertility forecasting can be found in Lee [13]. The method proposed by Lee and Carter in 1992 has become the leading statistical model of mortality forecasting in the demographic literature [14] (Deaton and Paxson, 2004 [15]). It was used as a benchmark for recent Census Bureau population forecasts [16], and two US Social Security technical advisory panels recommended its use, or the use of a method consistent with it [11].

A comprehensive discussion of the patterns of mortality rates for the G7 countries using the LC method is presented by Tuljapurkar et al. [14]. The LC model predicted life expectancies 1 to 4 years higher than the official projections in these industrial nations, with larger differences for Japan.

There are numerous uncertainties that affect mortality rates; however, a probabilistically sound forecasting method such as the LC method is particularly useful for addressing the long-term funding problems of public pensions and insurance for the increasingly ageing populations of the industrial world.

#### 2. Functional Data Analysis (FDA) Model

##### 2.1. An Overview

The Lee-Carter model for age-specific mortality rates is given by

$$\log m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}, \tag{1}$$

where $m_{x,t}$ is the mortality rate at age $x$ in year $t$, $a_x$ is the general age shape of the age-specific mortality rates, $b_x$ represents the tendency of mortality at age $x$ to change when $k_t$ changes, $k_t$ is the time-varying index, and $\varepsilon_{x,t}$ is the residual error. Equation (1) is a linear model of an unobserved period-specific intensity index, with parameters depending on age (LC 1992). The LC model uses the singular value decomposition (SVD) to obtain an exact least squares fit; however, a simple linear regression method can also approximate the parameters. LC models the time series $k_t$ as a random walk with drift,

$$k_t = k_{t-1} + d + e_t, \tag{2}$$

where $d$ is the drift term: $k_t$ is forecast to decline linearly with increments of $d$, and the deviations $e_t$ from this path are permanently incorporated in the trajectory [11]. The standard error of $e_t$ can be used as a detailed measure of the uncertainty in forecasting $k_t$.
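The regression approximation and the random walk with drift described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `fit_lee_carter` and `forecast_k` are hypothetical, the fit uses the common constraints $\sum_x b_x = 1$ and $\sum_t k_t = 0$ with row means and least squares in place of an SVD, and the $h$-step standard deviation ignores uncertainty in the estimated drift $d$.

```python
import math


def fit_lee_carter(log_m):
    """Approximate Lee-Carter fit of log m[x][t] = a_x + b_x * k_t.

    log_m[x][t] is the log mortality rate for age group x in year t.
    Uses row means and least squares instead of the SVD, under the
    constraints sum_x b_x = 1 and sum_t k_t = 0 (an assumption of this sketch).
    """
    n_ages, n_years = len(log_m), len(log_m[0])
    # a_x: average log mortality over the years for each age group
    a = [sum(row) / n_years for row in log_m]
    # Centred log rates: log m[x][t] - a_x
    z = [[log_m[x][t] - a[x] for t in range(n_years)] for x in range(n_ages)]
    # k_t: sum of centred log rates over ages (exact when sum_x b_x = 1)
    k = [sum(z[x][t] for x in range(n_ages)) for t in range(n_years)]
    kk = sum(kt * kt for kt in k)
    if kk == 0:
        raise ValueError("no time variation in the data")
    # b_x: no-intercept least-squares slope of z[x][t] on k_t
    b = [sum(z[x][t] * k[t] for t in range(n_years)) / kk for x in range(n_ages)]
    return a, b, k


def forecast_k(k, h):
    """Forecast k_t with a random walk with drift, k_t = k_{t-1} + d + e_t."""
    n = len(k)
    if n < 3:
        raise ValueError("need at least three years")
    d = (k[-1] - k[0]) / (n - 1)  # drift: mean of the first differences
    resid = [k[t] - k[t - 1] - d for t in range(1, n)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2))  # std. error of e_t
    point = [k[-1] + j * d for j in range(1, h + 1)]
    # Error sd of the j-step forecast grows like sqrt(j) (drift treated as known)
    sd = [se * math.sqrt(j) for j in range(1, h + 1)]
    return d, se, point, sd
```

On synthetic data built exactly as $a_x + b_x k_t$ with $\sum_x b_x = 1$ and $\sum_t k_t = 0$, this fit recovers $a_x$, $b_x$, and $k_t$ up to rounding; on real data it only approximates the SVD solution.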

*Generalization*. The Hyndman-Ullah [3] approach is a generalization of the Lee-Carter (LC 1992) method.

Our primary goal is to find a functional forecasting model for mortality rates of brain and central nervous system tumors in the United States. The proposed forecasting model is developed in the framework of functional data [1] for modeling log mortality rates. To construct the functional data, we use nonparametric smoothing methods to mitigate the randomness in the observed data. In addition, problems related to age groups and outlying years are reasonably addressed by using functional principal components [1, 3]. The observed data are smoothed first, and principal component analysis is then applied to the smoothed curves.

The Hyndman-Ullah forecasting methodology is a generalization of the Lee-Carter method. This approach uses functional data analysis techniques and treats the age-specific mortality curves, rather than the discrete observations, as the units of analysis [3]. In practice, functional data are usually observed and recorded discretely as pairs $(t_j, y_j)$, where $y_j$ is a snapshot of the function at time $t_j$, possibly blurred by measurement error [1]. The generalized Lee-Carter method models the mortality rates as a continuous function of age and captures the subtle variation between years. In addition, smoothing the data reduces the observational error, and the method forecasts the entire function with prediction intervals [3].

In the following section we discuss the use of the more flexible method of [3] to model brain cancer mortality rates; it uses multiple functions to capture the changes in the rates.

##### 2.2. FDA Model for Mortality Data

Let $m_t(x_i)$ denote the mortality rate for the midpoint $x_i$ of age group $i$ in year $t$, $t = 1, \dots, n$. We model the log mortality,

$$y_t(x_i) = \log m_t(x_i), \tag{3}$$

and assume that there are underlying smooth functions $f_t(x)$ that we observe with error [17]. The log mortality rates, as a smooth function of age, can be expressed as

$$y_t(x_i) = f_t(x_i) + \sigma_t(x_i)\,\varepsilon_{t,i}, \tag{4}$$

where $x_i$ is the center of age group $i$, $\varepsilon_{t,i}$ is an independent and identically distributed standard normal random variable, and $\sigma_t(x_i)$ allows the amount of noise to vary with age $x$. After constructing the functions $f_t(x)$ from the observed mortality rates, we fit the model

$$f_t(x) = \mu(x) + \sum_{k=1}^{K}\beta_{t,k}\,\phi_k(x) + e_t(x), \tag{5}$$

where $\mu(x)$ is the mean log mortality rate across years, $\{\phi_k(x)\}$ is a set of orthogonal basis functions, and $e_t(x)$ is the model error, which is assumed to be serially uncorrelated [17]. The mean log mortality rates are estimated using penalized regression splines (Wood, 2000) [18]. The pairs $\{\phi_k(x), \beta_{t,k}\}$ are estimated by decomposing the data into principal components, and $e_t(x)$ is the difference between the smoothed spline curve and the fitted curve from the model. We wish to estimate the optimal set of orthogonal basis functions, which is obtained via principal components (see Ramsay and Silverman, 2005, pages 151-152). Specifically, for a given $K$, we want to find the basis functions $\phi_k(x)$ that minimize the mean integrated squared error

$$\text{MISE} = \frac{1}{n}\sum_{t=1}^{n}\int e_t^2(x)\,dx. \tag{6}$$

This is achieved by applying the functional principal components (FPC) decomposition [19] to the centered curves $f_t(x) - \hat\mu(x)$; it gives the best approximation for a given number of basis functions and yields coefficients $\beta_{t,k}$ that are uncorrelated with each other.
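The basis estimation described above can be illustrated with a discretized functional principal component analysis in pure Python. This is a minimal sketch under stated assumptions, not the procedure of [3] or [19]: the curves $f_t(x)$ are assumed to be already smoothed and evaluated on an equally spaced age grid with spacing `dx`, integrals are replaced by Riemann sums, and the eigenvectors are found by power iteration with deflation. The names `functional_pca` and `mise` are hypothetical.

```python
import math
import random


def functional_pca(curves, n_components, dx=1.0, iters=500, seed=0):
    """Discretised functional PCA of smoothed curves f_t(x_i).

    curves[t][i] is f_t evaluated at age x_i on an equally spaced grid.
    Returns (mu, phis, betas): the mean curve, up to n_components basis
    functions scaled so that sum_i phi_k(x_i)**2 * dx = 1, and scores
    betas[t][k] = sum_i (f_t(x_i) - mu(x_i)) * phi_k(x_i) * dx.
    """
    n, p = len(curves), len(curves[0])
    mu = [sum(c[i] for c in curves) / n for i in range(p)]
    centred = [[c[i] - mu[i] for i in range(p)] for c in curves]
    # Sample covariance of the centred curves across years
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    phis = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        found = True
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                found = False
                break
            v = [x / norm for x in w]
        if not found:
            break
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        # Deflate so that the next pass finds the next principal component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)]
               for i in range(p)]
        phis.append([x / math.sqrt(dx) for x in v])
    betas = [[sum(r[i] * phi[i] for i in range(p)) * dx for phi in phis]
             for r in centred]
    return mu, phis, betas


def mise(curves, mu, phis, betas, dx=1.0):
    """Mean integrated squared error of the K-term approximation."""
    total = 0.0
    for c, b in zip(curves, betas):
        fitted = [mu[i] + sum(bk * phi[i] for bk, phi in zip(b, phis))
                  for i in range(len(mu))]
        total += sum((ci - fi) ** 2 for ci, fi in zip(c, fitted)) * dx
    return total / len(curves)
```

Power iteration keeps the sketch dependency-free; in practice a linear algebra library's symmetric eigensolver would be the more robust choice.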

##### 2.3. Forecasting

Equations (4) and (5) together yield

$$y_t(x_i) = \mu(x_i) + \sum_{k=1}^{K}\beta_{t,k}\,\phi_k(x_i) + e_t(x_i) + \sigma_t(x_i)\,\varepsilon_{t,i}. \tag{7}$$

Let $\hat y_{n,h}(x)$ denote the $h$-step-ahead forecast of $y_{n+h}(x)$, and let $\hat\beta_{n,k,h}$ denote the $h$-step-ahead forecast of $\beta_{n+h,k}$. Then

$$\hat y_{n,h}(x) = \hat\mu(x) + \sum_{k=1}^{K}\hat\beta_{n,k,h}\,\hat\phi_k(x). \tag{8}$$

To forecast the coefficients $\beta_{t,k}$ in (8), we use state-space models for exponential smoothing, which provide a statistical framework for automatic forecasting [19]. The coefficient forecasts are then multiplied by the estimated basis functions to obtain the forecast of the entire function. Exponential smoothing techniques also provide prediction intervals for the forecasts by incorporating the variance of the error terms [20]. Exponential smoothing forecasts are computed recursively, giving recent observations more weight than historical data, and the method accommodates additive and multiplicative trends with automatic model selection for each time series [17].
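The forecasting step $\hat y_{n,h}(x) = \hat\mu(x) + \sum_k \hat\beta_{n,k,h}\hat\phi_k(x)$ can be sketched as follows. The study uses state-space exponential smoothing models with automatic model selection; as a simpler stand-in, this hypothetical sketch forecasts each coefficient series $\beta_{1,k},\dots,\beta_{n,k}$ with Holt's linear (additive trend) exponential smoothing using fixed, arbitrarily chosen smoothing weights, and then recombines the coefficient forecasts with $\hat\mu(x)$ and $\hat\phi_k(x)$. It neither estimates the smoothing parameters nor produces prediction intervals.

```python
def holt_forecast(series, h, alpha=0.5, beta_star=0.2):
    """Holt's linear exponential smoothing with fixed weights.

    Returns forecasts for 1..h steps beyond the end of `series`.
    alpha smooths the level and beta_star smooths the trend
    (both values are illustrative assumptions, not estimates).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Recent observations get weight alpha; older ones decay geometrically
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta_star * (level - prev_level) + (1 - beta_star) * trend
    return [level + j * trend for j in range(1, h + 1)]


def forecast_curve(mu, phis, betas, h, **smoothing):
    """h-step forecast: mu(x_i) + sum_k beta_hat_{n,k,h} * phi_k(x_i).

    betas[t][k] holds the score of curve t on basis function k.
    """
    n_basis = len(phis)
    beta_hat = [holt_forecast([b[k] for b in betas], h, **smoothing)[-1]
                for k in range(n_basis)]
    return [mu[i] + sum(beta_hat[k] * phis[k][i] for k in range(n_basis))
            for i in range(len(mu))]
```

Because each coefficient series is forecast separately, the uncorrelatedness of the principal component scores is what makes this univariate treatment reasonable.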

State-space models provide a convenient and powerful framework for analyzing sequential data (see Harvey, 1989) [21]. Many mortality data sets require extrapolation, because the data have a time dimension. Provided the model can be written in state-space form, it can be used to compute smoothed features or signals and their associated standard errors.

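Because the terms in (7) are assumed to be independent, the forecast variance is the sum of the variances of the individual terms [3]:

$$\operatorname{Var}\big[y_{n+h}(x)\big] \approx \hat\sigma_\mu^2(x) + \sum_{k=1}^{K} u_{n+h,k}\,\hat\phi_k^2(x) + v(x) + \sigma_{n+h}^2(x), \tag{9}$$

where $\hat\sigma_\mu^2(x)$ is the variance of the smooth estimate $\hat\mu(x)$, obtained using the smoothing method; $u_{n+h,k}$ is the forecast variance of $\hat\beta_{n,k,h}$, obtained from the residuals of the fitted time series model; $v(x)$ is the variance of the model error, estimated as the mean of $\hat e_t^2(x)$ for each $x$; and the observational variance $\sigma_{n+h}^2(x)$ is estimated by assuming a binomial distribution for $m_t(x)$.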
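A minimal sketch of the variance sum $\hat\sigma_\mu^2(x) + \sum_k u_{n+h,k}\hat\phi_k^2(x) + v(x) + \sigma_{n+h}^2(x)$ at a single age $x$, together with a normal-approximation prediction interval, is given below. The function names are hypothetical and all inputs are assumed to have been estimated already. `binomial_log_rate_variance` is one possible reading of "assuming a binomial distribution": if the deaths at age $x$ are Binomial$(N, m)$ with exposure $N$, the delta method gives $\operatorname{Var}(\log \hat m) \approx (1-m)/(N m)$.

```python
import math


def binomial_log_rate_variance(m, exposure):
    """Delta-method variance of log(m_hat) when deaths ~ Binomial(exposure, m)."""
    return (1.0 - m) / (exposure * m)


def forecast_variance(var_mu, u, phi_at_x, v, obs_var):
    """Sum of independent variance components at one age x.

    var_mu   : variance of the smoothed mean mu_hat(x)
    u        : forecast variances u_{n+h,k} of the coefficients
    phi_at_x : basis function values phi_k(x)
    v        : model error variance v(x)
    obs_var  : observational variance sigma^2_{n+h}(x)
    """
    return var_mu + sum(uk * pk ** 2 for uk, pk in zip(u, phi_at_x)) + v + obs_var


def prediction_interval(y_hat, variance, z=1.96):
    """Normal-approximation interval for the log mortality forecast (z=1.96: ~95%)."""
    half_width = z * math.sqrt(variance)
    return y_hat - half_width, y_hat + half_width
```

The interval is on the log scale; exponentiating its endpoints gives an interval for the mortality rate itself.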

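We evaluate the accuracy of the mortality forecasts by computing the mean integrated squared forecast error (MISFE),

$$\text{MISFE}(h) = \frac{1}{n-m-h+1}\sum_{j=m}^{n-h}\int\big[y_{j+h}(x) - \hat y_{j,h}(x)\big]^2\,dx, \tag{10}$$

where $m$ is the minimum number of observations used in fitting the model and $\hat y_{j,h}(x)$ is the $h$-step-ahead forecast from the model fitted to the data up to year $j$.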
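The rolling-origin evaluation behind the MISFE can be sketched as follows, assuming the curves are observed on a common grid of ages and approximating the integral over age with the trapezoidal rule. `misfe`, `trapezoid`, and the `forecaster` callback are hypothetical names: any function that maps the first $j$ curves and a horizon $h$ to a forecast curve can be plugged in, such as the naive benchmark shown, which simply repeats the last observed curve.

```python
def trapezoid(xs, ys):
    """Trapezoidal approximation of the integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))


def misfe(curves, ages, h, m, forecaster):
    """Mean integrated squared forecast error at horizon h.

    curves[t - 1] holds y_t(x) on the age grid `ages`, for t = 1..n.
    For each origin j = m..n-h the forecaster sees y_1..y_j only, and its
    h-step forecast is compared with y_{j+h}.
    """
    n = len(curves)
    if n - h < m:
        raise ValueError("not enough years for this m and h")
    errors = []
    for j in range(m, n - h + 1):
        forecast = forecaster(curves[:j], h)
        actual = curves[j + h - 1]
        squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
        errors.append(trapezoid(ages, squared))
    return sum(errors) / len(errors)


def naive_forecaster(past, h):
    """Hypothetical benchmark: repeat the last observed curve."""
    return list(past[-1])
```

The number of forecast origins is $n - m - h + 1$, which matches the denominator of the MISFE.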

#### 3. Data

An estimated 69,720 new cases of primary nonmalignant and malignant brain and central nervous system tumors (a 10% increase from 2010) were expected to be diagnosed in the United States in 2013 [22]. Primary malignant brain and central nervous system tumors caused 13,700 deaths in the United States in 2012 (a 5.5% increase from 2009). It is estimated that 24,620 men and women (13,630 men and 10,990 women) will be diagnosed with, and 13,140 men and women (a 4.26% increase from 2010) will die of, brain and other nervous system cancer in 2013. Males and females have lifetime risks of 0.7% and 0.6%, respectively, of being diagnosed with a primary malignant brain/central nervous system tumor. These projections are of major public health interest. However, their interpretation may be complex because of the effects of screening, risk factors, and the accessibility of effective treatments.

Crude mortality rates per 100,000 persons, based on the 2000 standard US population, were extracted using the SEERStat 7.0.5 software of the Surveillance, Epidemiology, and End Results program of the National Cancer Institute. The data comprise 416,480 malignant brain cancer patients (229,467 males and 187,013 females) for 1969–2008, of whom 381,238 are white, 24,336 are African American, and 4,891 are of other races (American Indian/Alaska Native and Asian/Pacific Islander).
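For reference, within an age group a crude rate is the number of deaths divided by the population at risk, scaled to 100,000 persons, while the functional model works with the logarithm of the per-person rate, $y_t(x) = \log m_t(x)$. The two helpers below are a hypothetical illustration of that conversion, not part of the SEERStat software.

```python
import math


def crude_rate_per_100k(deaths, population):
    """Crude mortality rate per 100,000 persons."""
    return deaths / population * 100_000


def log_mortality(rate_per_100k):
    """Log of the per-person rate, log m_t(x), as used in the functional model."""
    return math.log(rate_per_100k / 100_000)
```

Working on the log scale turns the multiplicative differences between age groups into additive ones, which is what the mean-plus-basis-functions model assumes.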

The mortality rates were at their highest from 1985 to 1995. After 2000, the overall rates are leveling off or declining. From 2003 to 2008, the median age at death for cancer of the brain and other nervous system was 64 years. Approximately 4.2% died under the age of 20; 3.8% between 20 and 34; 7.1% between 35 and 44; 14.9% between 45 and 54; 21.8% between 55 and 64; 22.2% between 65 and 74; 19.6% between 75 and 84; and 6.3% at 85 years of age or older. For the first part of our study, we use annual crude mortality rates in the United States from 1969 to 2008 in 5-year age groups (01–04, 05–09, …, 80–84, 85+).
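The age-at-death distribution above can be checked against the reported median age of 64 years by accumulating the shares until half of the deaths are reached. This small consistency check uses only the percentages quoted in the text; because they sum to 99.9% after rounding, half of their total is used as the threshold. The function name `median_band` is hypothetical.

```python
AGE_BANDS = ["<20", "20-34", "35-44", "45-54", "55-64", "65-74", "75-84", "85+"]
SHARES = [4.2, 3.8, 7.1, 14.9, 21.8, 22.2, 19.6, 6.3]  # percent of deaths, 2003-2008


def median_band(bands, shares):
    """Return the band containing the median and the cumulative share up to it."""
    half = sum(shares) / 2
    cumulative = 0.0
    for band, share in zip(bands, shares):
        cumulative += share
        if cumulative >= half:
            return band, cumulative
    raise ValueError("shares are empty")
```

The cumulative share is 30.0% after the 45–54 band and 51.8% after the 55–64 band, so the median falls in the 55–64 band, consistent with a median age at death of 64.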

#### 4. Results

For this study we obtained the data from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute in the United States [23]. Specifically, mortality data were obtained from the National Center for Health Statistics (NCHS) through the SEERStat database. Annual age-specific brain cancer mortality data are designated by ICD 8 & 9 (1979–1998) code 174 and ICD 10 (1999+) codes C70 and C71. Data are available for different racial subgroups since 1969 in the following age groups: 01–04, 05–09, 10–14, 15–19, 20–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, and 85+.

Figure 1 displays brain cancer mortality rates for US males by age group for the period 1969–2007. Mortality rates for ages below 40 years show no obvious trend, whereas over the same period rates for ages above 40 vary considerably, particularly among the elderly. The graph shows that brain cancer mortality rates among males between 45 and 65 years declined slowly throughout the study period. For the elderly (aged 65 and above), mortality rates clearly increased from 1969 to 2000 and started declining after 2000. The mortality rates for the age groups 80–84 and 85+ are very unstable and difficult to interpret because of the limited availability of data.