We studied the age, disability, and comorbidity patterns of the incidence rates of ten cancer and chronic noncancer diseases in the US elderly population (aged 65+), using National Long-Term Care Survey (NLTCS) data linked to Medicare records for 1991–2005. The diseases were heart failure, diabetes mellitus, asthma, Parkinson's disease, Alzheimer's disease, skin melanoma, and cancers of the breast, prostate, lung, and colon. Incidence rates of heart failure and Alzheimer's disease increased with age, whereas those of breast cancer and asthma decreased. Incidence rates of heart failure, diabetes, asthma, Parkinson's disease, and Alzheimer's disease were higher among individuals with severe disabilities and/or more comorbidities. In contrast, rates of breast and prostate cancers were higher among those with minor disabilities or fewer comorbidities. Our results agreed with those obtained from other epidemiological datasets, which suggests that Medicare administrative records can provide nationally representative incidence rates. A detailed sensitivity analysis examined the effects of alternative onset definitions, latent censoring, study design, and other procedural uncertainties, and it showed that the reconstructed incidence rates are stable. This Medicare-linked dataset can be used to study the highly debated effects of new medical technologies on the burden of aging-related diseases and on future Medicare costs.

1. Introduction

The proportion of older adults in the USA is growing steadily. This growth makes it a public health priority to determine national trends in the health and vital status of this population. To address these health demands and reduce the economic burden on society, we need to understand the key factors that drive the onset and progression of aging-related chronic diseases. Unfortunately, identifying disease age patterns with sufficient precision requires large population-based databases, which are costly to collect. For this reason, few studies have examined disease age patterns, or the factors affecting them, in the US elderly population. Among aging-associated diseases, cancer incidence is better studied at the national level, mainly because of the Surveillance, Epidemiology, and End Results (SEER) registry data [1]. Age patterns of the incidence of cardiovascular, cerebrovascular, and neurodegenerative diseases in the general population and in the oldest adults were recently studied using Cardiovascular Health Study (CHS) data [2, 3]. However, this database is not nationally representative. Other common chronic diseases that affect the well-being of older adults, such as Alzheimer's disease and other dementias [4] and diabetes mellitus [5], have been analyzed in detail for the general population. Their patterns among the US elderly and oldest-old populations, however, remain unclear. Several studies of disease patterns in the elderly were conducted on local populations, such as the Rochester, Minnesota, and Cache County studies of the epidemiology of dementia and Alzheimer's disease [6, 7]. These datasets also cannot represent the range of characteristics of the national population (e.g., age, race, and male/female ratio).
Furthermore, the age patterns of disease incidence are largely unknown, and studies of how diseases coexisting in the same individual affect one another are sparse. Older adults are characterized by multimorbidity, which makes comorbidity one of the main factors limiting their well-being and survival. Patients with a single chronic disease or, more often, with several comorbid diseases develop disabilities of varying severity. This has become an important issue for policymakers and for health care agencies such as Medicare and Medicaid. In addition, individuals with different disabilities may face different (e.g., increased) risks of developing certain diseases, and these interrelations are poorly studied. This study is therefore motivated by three gaps. First, comprehensive and representative national-level analyses of the age patterns of chronic disease incidence in the oldest adults are lacking. Second, the interrelations between comorbid diseases are unclear. Third, little is known about how disability affects these diseases among US older adults.

Incidence rates should be calculated from national registries or from surveys that represent the US population. Survey results require special procedures to generalize them to the national level. One such procedure assigns each individual in the sample a weight, chosen so that weighted sums (or means) over the sample give the corresponding quantities at the national level. The 1982–2005 National Long-Term Care Survey (NLTCS), which focuses on the US elderly (65 and older) population, can be used for this kind of analysis. The NLTCS is linked with Medicare records and Vital Statistics files. It therefore offers a unique opportunity to analyze continuous records of health services provided, age at death, and detailed reassessments of health status, which the survey performed every five years (except for the first two waves, in 1982 and 1984). The NLTCS participants were drawn from Medicare enrollment lists. In 1982–1999, the NLTCS covered about 400,000 person-years of exposure over age 65, including over 100,000 person-years of exposure over age 85. The US Census Bureau provides sample weights for each year of follow-up. Thus, the NLTCS design is well suited to studying the incidence patterns of aging-associated diseases in the US elderly population. Earlier analyses of NLTCS data gave consistent results on functional disability, active life expectancy, and chronic disease prevalence in the US elderly. For example, a 15% decline in chronic disability (1.1% per annum) was reported for 1982–1999 [8–11].

The Medicare service use files linked to the NLTCS (NLTCS-M) contain information about disease diagnoses. A computational algorithm can use this information to identify the dates of disease onset. The primary goal of this study is to apply such an algorithm to the NLTCS-M data and estimate age-adjusted and age-specific (as well as disability- and comorbidity-specific) disease incidence in the US elderly population. We pay particular attention to the stochastic and systematic uncertainties in these estimates. For example, we compare our rates with those obtained from the Medicare files using alternative approaches to evaluating incidence rates. We also compare our estimates with the results of other studies. The incidence estimates for advanced ages presented here should therefore be valuable in two ways. They support basic understanding of how the incidence of chronic aging-related diseases relates to senescence. They also have practical uses, such as analyzing national health trends and forecasting future Medicare expenditures.

2. Data and Methods

2.1. National Long-Term Care Survey (NLTCS), Medicare Files of Service Use, and Medical Cost

The primary data analyzed are the six waves of the NLTCS [12], spanning 1982 to 2004/5, together with the linked Medicare data. Two of the six waves, the 1994 and 1999 cohorts, are used in the analysis. We chose these waves for two reasons. High-quality Medicare follow-up data are available only from 1991 onward, and these are the only waves after 1991 with a complete 5-year follow-up after the NLTCS interview. The NLTCS sample is drawn from the national Medicare enrollment files. It provides reported data on hundreds of variables, including age, sex, and activities/instrumental activities of daily living, which allow disability to be measured. The US Census Bureau collected the data in all waves. As a result, training methods and materials, survey administration and management procedures, field operations, computer processing, and editing procedures are consistent across the surveys. Together with the high (95%) response rates in all NLTCS waves, this consistency minimizes bias in trend estimates. The 1982–2004 NLTCS files include information on 49,258 individuals. In total, 34,077 individuals were followed up between 1994 and 1999. We used the so-called screener weights, released by the US Census Bureau with the NLTCS, to produce national population estimates.

All individuals in the NLTCS are continuously tracked for Medicare Part A and Part B service use. We therefore have continuous records of Medicare service use for every person from 1991 (or from the date the person turned 65, if after 1990) until death. These records are available for each institutional claim type (inpatient, outpatient, skilled nursing facility, hospice, and home health agency) and each noninstitutional claim type (carrier-physician-supplier and durable medical equipment providers).

We used self-reported information about serious impairments (activities of daily living, ADLs) and less serious impairments (instrumental activities of daily living, IADLs) to construct a disability index [10]. The index has six categories: nondisabled, IADL only, 1–2 ADLs, 3–4 ADLs, 5–6 ADLs, and institutionalized. It is measured at the date of interview, that is, at the beginning of the follow-up. Comorbidity is represented by the Charlson comorbidity index [13, 14]. This index was also evaluated at the date of interview, using Medicare records from the year before the interview.
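The six-category disability index can be illustrated as a mapping from interview-date impairment counts to a category. The sketch below is a minimal illustration under stated assumptions, not the published construction from [10]. It assumes that institutional residence overrides the ADL/IADL counts and that `n_adl` counts impaired ADLs out of six. The names `DisabilityLevel` and `disability_index` are hypothetical.

```python
from enum import Enum


class DisabilityLevel(Enum):
    """The six disability categories listed in the text."""
    NONDISABLED = "nondisabled"
    IADL_ONLY = "IADL only"
    ADL_1_2 = "1-2 ADLs"
    ADL_3_4 = "3-4 ADLs"
    ADL_5_6 = "5-6 ADLs"
    INSTITUTIONALIZED = "institutionalized"


def disability_index(n_adl: int, n_iadl: int, institutionalized: bool) -> DisabilityLevel:
    """Map self-reported impairment counts at the interview date to a category.

    Assumptions (not stated in the source): institutional residence takes
    precedence over the counts, and n_adl is the number of impaired ADLs (0-6).
    """
    if not 0 <= n_adl <= 6:
        raise ValueError("n_adl must be between 0 and 6")
    if n_iadl < 0:
        raise ValueError("n_iadl must be non-negative")
    if institutionalized:
        return DisabilityLevel.INSTITUTIONALIZED
    if n_adl >= 5:
        return DisabilityLevel.ADL_5_6
    if n_adl >= 3:
        return DisabilityLevel.ADL_3_4
    if n_adl >= 1:
        return DisabilityLevel.ADL_1_2
    if n_iadl >= 1:
        return DisabilityLevel.IADL_ONLY  # IADL impairments only, no ADLs
    return DisabilityLevel.NONDISABLED
```

For example, a community resident reporting two impaired IADLs and no impaired ADLs falls in the "IADL only" category.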

2.2. Date of Onset Definitions

The date of onset of a chronic disease cannot be defined as precisely as the date of death, and any definition of onset involves some arbitrariness. Computational approaches of varying complexity have been used to reconstruct the onsets of cancer and noncancer diseases [15–17]. In this study, we argue that the algorithm developed in our prior studies [18–21] can be applied to a spectrum of chronic diseases. This approach may therefore be viewed as a unified definition of the date of onset, suitable for population studies that use administrative data. Ten diseases were selected for analysis (the International Classification of Diseases, Ninth Revision (ICD-9) codes used for each are given in parentheses): cancers of the lung (162.xx), colon (153.xx), breast (174.xx), and prostate (185.xx), skin melanoma (172.xx), heart failure (428.xx), Alzheimer's disease (331.0), Parkinson's disease (332.xx), diabetes mellitus (250.xx), and asthma (493.xx).

The computational algorithm is applied to each disease separately. First, each individual's medical history of the disease was reconstructed from Medicare files by combining all records carrying that disease's ICD-9 codes. Patients with a history of the disease before the date of the 1994 or 1999 interview were excluded from the study of onset of that disease. We considered four variants of the algorithm (Algorithms A, B, C, and D) for identifying disease onset from the disease-specific medical histories. In Algorithm A, the date of a Medicare record (called "this record" in the rest of this subsection) is taken as the date of onset of the disease when both of the following conditions are met:
(i) This record is the earliest record with the respective ICD-9 code as the primary diagnosis in one of four Medicare sources: inpatient care, outpatient care, physician services, and skilled nursing facilities. This choice follows general practice for reconstructing the date of onset from Medicare data [15, 16].
(ii) There is another record with the respective ICD-9 code as the primary diagnosis from one of the four Medicare sources listed in (i). This second record is dated on a different day from this record and not later than Δt years after it. A death occurring during this period (i.e., 0.3 years after this record) also counts as the second record.
Algorithm B does not require confirmation by a second record; only condition (i) applies. Algorithm C accepts all codes, not only primary diagnoses, and also does not require confirmation. Algorithm D is identical to Algorithm A except that death does not count as the second record.
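The four onset rules can be expressed as a single function over a person's disease-specific claim history. This is a minimal sketch, not the authors' implementation. The `Record` layout, the source labels, and decimal-year dates are hypothetical. The confirmation window is passed in as `window_years` (the text mentions 0.3 years for the death window). The sketch also assumes that Algorithm C remains restricted to the four sources. Exclusion of prevalent cases is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# The four Medicare sources named in condition (i); labels are hypothetical.
FOUR_SOURCES = {"inpatient", "outpatient", "physician", "snf"}


@dataclass(frozen=True)
class Record:
    """One Medicare claim carrying the disease's ICD-9 code (hypothetical layout)."""
    date: float    # claim date in decimal years
    source: str    # e.g. "inpatient", "physician", "hospice"
    primary: bool  # True if the code is the primary diagnosis


def onset_date(records: Sequence[Record], death: Optional[float],
               algorithm: str, window_years: float) -> Optional[float]:
    """Return the onset date under Algorithm A, B, C, or D, or None if no onset."""
    if algorithm not in {"A", "B", "C", "D"}:
        raise ValueError(f"unknown algorithm {algorithm!r}")
    # Algorithm C accepts any diagnosis position; A, B, and D need a primary one.
    # Restricting C to the four sources is an assumption of this sketch.
    valid = sorted(
        (r for r in records
         if r.source in FOUR_SOURCES and (r.primary or algorithm == "C")),
        key=lambda r: r.date,
    )
    if not valid:
        return None
    first = valid[0].date
    if algorithm in {"B", "C"}:
        return first  # no confirmation required
    # A and D: a second valid record on a later date, within the window.
    confirmed = any(first < r.date <= first + window_years for r in valid[1:])
    # Only Algorithm A also accepts a death within the window as confirmation.
    if algorithm == "A" and death is not None and first <= death <= first + window_years:
        confirmed = True
    return first if confirmed else None
```

For example, a single primary inpatient claim followed by death a month later yields an onset under Algorithm A but not under Algorithm D. This difference is what separates the two cancer estimates compared with SEER.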

2.3. Estimates of Age-Specific and Age-Adjusted Rates for the US Elderly Population

Age patterns of incidence rates are assessed by stratifying the sample into age categories of one year or several years. Empirical age-specific rates $\hat{\mu}_x$ are calculated as the ratio of the weighted number of cases to the weighted person-years at risk: $\hat{\mu}_x = D_x / P_x$, where $D_x = \sum_i w_i$ and $P_x = \sum_j w_j \tau_j$. Here $w$ is the individual weight and $\tau_j$ is the person-time at risk contributed by individual $j$. The index $i$ runs over all disease onsets detected in age group $x$, and $j$ runs over all individuals at risk in that age group. The individual weights, calculated by the US Census Bureau and released with the NLTCS data, make the estimates representative of the entire US elderly population; that is, they account for the study design. The study design also affects the standard errors (SE) and confidence intervals of the rate estimates. The SE is calculated as $\sigma_x = \sqrt{\hat{\mu}_x (1 - \hat{\mu}_x) / N_x}$, where $N_x$ is the number of person-years estimated with unit weights. The SE is thus based on the number of individuals actually measured. When $\hat{\mu}_x$ or $N_x$ is small, we use a generalization of this formula based on Wilson's approach [22]. Comorbidity- and disability-specific patterns are calculated with the same equations for the correspondingly stratified population.
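The weighted age-specific rate and its SE can be computed from per-person weights and person-years. This is a minimal sketch, not the authors' code. It takes three inputs: the weight of each detected onset, and the weight and person-years of each individual at risk. The binomial-type SE, $\sqrt{\mu(1-\mu)/N}$ with $N$ the unit-weight person-years, is our reading of the garbled formula and should be treated as an assumption. Applying the standard Wilson score interval with $N$ as the number of trials is likewise an assumed form of the "generalization" the text cites.

```python
import math
from typing import Sequence, Tuple


def age_specific_rate(case_weights: Sequence[float],
                      risk_weights: Sequence[float],
                      risk_years: Sequence[float]) -> Tuple[float, float]:
    """Weighted rate D/P and an SE based on unweighted person-years N.

    case_weights: weight of each onset detected in the age group.
    risk_weights, risk_years: weight and person-years at risk of each
    individual in the age group (same order).
    """
    if len(risk_weights) != len(risk_years):
        raise ValueError("risk_weights and risk_years must align")
    d = sum(case_weights)                                      # weighted cases
    p = sum(w * t for w, t in zip(risk_weights, risk_years))   # weighted person-years
    n = sum(risk_years)                                        # unit-weight person-years
    mu = d / p
    se = math.sqrt(mu * (1.0 - mu) / n)  # assumed binomial-type form
    return mu, se


def wilson_interval(mu: float, n: float, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion mu observed on n units."""
    denom = 1.0 + z * z / n
    center = (mu + z * z / (2.0 * n)) / denom
    half = z / denom * math.sqrt(mu * (1.0 - mu) / n + z * z / (4.0 * n * n))
    return center - half, center + half
```

The point of the design is that weights enter only the point estimate, while the SE reflects the number of person-years actually observed. Unlike the plain SE, the Wilson interval stays informative when an age group has no cases (mu = 0).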

Age-adjusted (directly standardized) incidence rates are averages of the age-specific rates. For the population aged 66+, they are calculated as $\bar{\mu} = \frac{1}{n}\sum_{x} \hat{\mu}_x$, where the sum runs over the $n$ age groups from age 66 upward. There are many ways to estimate the SE of an age-adjusted rate [23–25]. We use the simplest approach, the approximation suggested by Keyfitz [26], to avoid the uncertainties of SE estimation for low and even zero age-specific rates. Under this approximation, the SE is $\sigma_{\bar{\mu}} = \bar{\mu} / \sqrt{D}$, where $D$ is the unweighted number of cases.
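The age-adjusted rate and its Keyfitz-style SE reduce to a few lines. This is a minimal sketch assuming the simple (equal-weight) average of age-specific rates described above. The optional `standard` argument is a hypothetical hook for standardizing to an external population, such as the 2000 standard population used in one sensitivity variant. The Keyfitz approximation divides the adjusted rate by the square root of the unweighted case count.

```python
import math
from typing import Optional, Sequence


def age_adjusted_rate(rates: Sequence[float],
                      standard: Optional[Sequence[float]] = None) -> float:
    """Average of age-specific rates: equal weights by default, or weights
    proportional to a standard population if one is supplied."""
    if not rates:
        raise ValueError("need at least one age group")
    if standard is None:
        return sum(rates) / len(rates)
    if len(standard) != len(rates):
        raise ValueError("standard must have one entry per age group")
    return sum(s * r for s, r in zip(standard, rates)) / sum(standard)


def keyfitz_se(adjusted_rate: float, unweighted_cases: int) -> float:
    """Keyfitz approximation: SE of a standardized rate is about rate / sqrt(D)."""
    if unweighted_cases <= 0:
        raise ValueError("need at least one observed case")
    return adjusted_rate / math.sqrt(unweighted_cases)
```

Because the Keyfitz SE depends only on the total case count, zero-case age groups do not break it.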

3. Results

Table 1 shows the number of individuals in the pooled cohort without prevalent cases of each disease. Table 2 presents age-adjusted incidence rates for each cohort (1994, 1999, and both pooled) and by sex (males, females, and the total population). Among males, the highest incidence rates were observed for heart failure and prostate cancer; among females, for heart failure and diabetes. Because sample weights were used, both the age-adjusted rates and their standard errors are valid for the US elderly population.

Table 3 presents gender disparities and time trends in the age-adjusted incidence rates. We evaluated the 5-year time trend (from 1994 to 1999) as the difference between the 1994 and 1999 age-adjusted rates divided by the square root of the sum of their squared standard errors. Table 3 therefore reports approximate z-statistics, which can be compared with the standard normal distribution to obtain p-values for the respective hypotheses. The 5-year decline was significant for heart failure and prostate cancer. Tendencies toward a decline were also observed for lung cancer in males and colon cancer in females. The 5-year increase was significant for diabetes and Alzheimer's disease. Gender disparities were observed (i) for heart failure, skin melanoma, lung cancer, colon cancer, and Parkinson's disease, all of which had higher incidence rates in males, and (ii) for asthma, which had higher incidence rates in females.
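The trend statistic described above is a two-sample z-test on independent estimates. The sketch below computes it together with a two-sided normal p-value. The function names and the example numbers are hypothetical. The p-value uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).

```python
import math


def trend_z(rate_1994: float, se_1994: float,
            rate_1999: float, se_1999: float) -> float:
    """Difference of two age-adjusted rates over the root of their summed squared SEs."""
    return (rate_1999 - rate_1994) / math.sqrt(se_1994 ** 2 + se_1999 ** 2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For instance, with hypothetical rates of 2,000 (SE 30) in 1994 and 1,900 (SE 40) in 1999 per 100,000, `trend_z` gives z = -2.0, and the two-sided p-value is about 0.046.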

We analyzed the age, disability, and comorbidity patterns for all diseases considered. Figures 1 and 2 present the results for cancer and noncancer diseases, respectively. These patterns can differ substantially between the sexes and between the two NLTCS cohorts. As Table 3 shows, gender disparities were more pronounced for lung cancer, colon cancer, melanoma, Parkinson's disease, and asthma, so Figures 1 and 2 present sex-specific age patterns for these diseases. Time trends were more pronounced for prostate cancer, heart failure, Alzheimer's disease, and diabetes, so the figures present cohort-specific rates for the total population for these diseases. Incidence rates increased with age for heart failure and Alzheimer's disease, and less markedly for Parkinson's disease. Rates of breast cancer and asthma decreased with age. Rates of the other diseases tended to be more stable across ages.

For several diseases (e.g., heart failure, diabetes, asthma, and Parkinson's disease), incidence rates were higher among individuals with severe disabilities. For breast and prostate cancers, rates were higher among people with minor disabilities. The steepest increase in incidence with disability was observed for heart failure. Interestingly, institutionalized individuals had lower rates for many diseases. For several diseases (melanoma, lung cancer in males, colon cancer, and asthma), they had the lowest rates of all disability groups, including nondisabled individuals. For neurodegenerative diseases (Parkinson's disease in females and Alzheimer's disease), however, institutionalized individuals had the highest rates. Among individuals with high comorbidity (Charlson) indices, rates were higher for heart failure, melanoma, and Alzheimer's disease. Incidence rates of breast and prostate cancers and of diabetes decreased with increasing comorbidity index.

As discussed in Section 2.2, large administrative databases support several definitions of the date of onset. Table 4 presents age-adjusted rates calculated with alternative approaches, and these results show that the calculated rates are relatively stable. Column V1 uses age standardization to the 2000 standard population, and column V2 omits the NLTCS sample weights. In the alternative censoring scheme (V3), the last day of observation is the latest of (i) the end of Part B coverage, (ii) the last Medicare record in Part A or Part B, and (iii) the response to the interview in the next NLTCS wave. In the basic calculation, by contrast, the final date of observation is the earliest of the date of disease onset, the date of death, and the last date of cohort observation. Strategies V1–V3 changed incidence rates only slightly. Calculations V4 and V5 show the effect of removing individuals with different levels of additional HMO coverage from the cohort (specifically, different thresholds on the fraction of months covered by an HMO). Because individuals covered by HMOs are thought to be healthier than the general elderly population, the decline in incidence rates under V4 and V5 is expected. The remaining calculations use less conservative (V6–V11) or more conservative (V12) definitions of the date of onset. Each replaces one component of the basic definition with an alternative. The largest variations come from calculations V10 and V12. Note that the choice of Medicare sources also affects which cases are identified as prevalent. Using more Medicare sources therefore does not always produce higher incidence rates. Overall, Table 4 shows that the choice of calculation method affected incidence rates of noncancer diseases more than those of cancers.
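The two censoring schemes differ only in how the end of observation is computed, which a short sketch makes explicit. Dates are in decimal years, and all names are hypothetical. The basic scheme follows the text directly. For V3, the text gives only the "latest of" rule. Truncating the V3 end date at onset or death is our assumption, not something the source states.

```python
from typing import Optional


def end_basic(cohort_end: float, onset: Optional[float] = None,
              death: Optional[float] = None) -> float:
    """Basic scheme: earliest of disease onset, death, and end of cohort observation."""
    return min(d for d in (onset, death, cohort_end) if d is not None)


def end_v3(part_b_end: float, last_record: float,
           next_interview: Optional[float] = None,
           onset: Optional[float] = None,
           death: Optional[float] = None) -> float:
    """V3: latest of Part B coverage end, last Part A/B record, and next-wave
    interview; assumed (not stated in the source) to be truncated by onset or death."""
    candidates = [part_b_end, last_record]
    if next_interview is not None:
        candidates.append(next_interview)
    observed = max(candidates)
    return min(d for d in (observed, onset, death) if d is not None)
```

Under V3, person-time extends to the latest evidence that the person was still observable. The basic scheme instead stops at the fixed cohort end.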

The observed variations show that several definitions of incidence can be applied to administrative data and that they can yield significantly different incidence rates. However, male/female differences, time-trend estimates, and the ratios of the rates of different diseases remain stable across these definitions.

3.1. Comparison to Other Studies

We compared the age patterns, pooled over sex and cohort, calculated from two Medicare-based datasets: NLTCS-M, as in this study, and SEER-Medicare. The two agreed for the chronic diseases considered here [20]. This agreement was expected, because the two datasets share the same design for data collection and for computing disease incidence. Further comparisons are needed with results obtained by other researchers using different study designs and computational approaches. In these comparisons, described in detail below, we focus on each group of diseases and emphasize (1) the age-adjusted incidence rates, (2) the shape and level of the age pattern of incidence rates, and (3) sex differences. Most datasets used to analyze disease in the elderly cover a single disease rather than several. They are also not oriented specifically toward the elderly. Instead, they span a wide range of age groups, and people aged 65+ (especially 85+) make up only a fraction of the dataset.

3.1.1. Cancer

In the USA, about 60% of cancers are diagnosed at age 65 and older [27]. The most detailed US data on cancer incidence come from the SEER registry [1], so our NLTCS-M age-specific incidence rates need to be compared with SEER estimates. The basic algorithm (Algorithm A) identified two types of cancer cases: (i) cases confirmed by a second record and (ii) cases with only one record, where death occurred shortly after the first record. Cases of the second type were probably not confirmed because the patient died soon afterward. The relative contributions of confirmed and unconfirmed cases differ between NLTCS-M and the SEER registry. Unconfirmed cancer cases are rare in SEER. About 95% of cancers diagnosed in the SEER registry are histologically confirmed, and fewer than 2% are identified from the death certificate or at autopsy because the patient died soon after diagnosis [28–30]. We therefore also use an onset identification scheme that excludes the second type of case, namely, Algorithm D. This scheme corresponds exactly to the onset definition of model V12 in the sensitivity analysis (see Table 4). Figure 3 presents the age patterns of incidence rates calculated with Algorithms A and D. As expected, the SEER rates lie mostly between the estimates of these two algorithms. All cancers except melanoma showed good agreement in age-specific incidence rates, and the melanoma rates are within two standard errors of the SEER estimates. In summary, the cancer incidence patterns obtained in our study agree well with those from the SEER registry.

3.1.2. Heart Failure

Figure 4 compares our heart failure (HF) incidence rates with results from cohort studies summarized in [31]: the Atherosclerosis Risk in Communities (ARIC) study, the Cardiovascular Health Study (CHS), and the Framingham Heart Study (FHS). Our age-specific incidence rates of congestive heart failure (CHF) agree well with the ARIC and FHS estimates and are significantly lower than the CHS estimates. This difference may stem from how each study defined incident cases, that is, which criteria were used to select and register cases. These criteria can substantially influence the resulting incidence rates. In FHS, an incident HF event is defined by a combination of major and minor criteria based on clinical HF symptoms. In ARIC, incident HF cases were selected using hospital discharge ICD-9 codes 428 or 518.4. In CHS, an incident HF event required a physician's diagnosis of HF in a patient receiving specific medical treatment, such as a diuretic plus either digitalis, a vasodilator, or an angiotensin-converting enzyme inhibitor [31]. These definitions may explain, at least in part, why HF incidence rates in CHS were higher than those from ARIC, FHS, and our study (see Figure 4).

3.1.3. Diabetes

Figure 5 shows that our results agree with several studies of other cohorts: the Canadian Study of Health and Aging [32], the Zwolle Outpatient Diabetes project Integrating Available Care (ZODIAC-1, the Netherlands) [33], and the UK Poole Diabetes Study [34]. In general, diabetes incidence did not change significantly with age in males or females in these studies or in ours, apart from the first and last points of our age pattern. In ZODIAC-1, the incidence of type 2 diabetes in 1998–2000 was slightly higher than in our study, whereas in the Poole Diabetes Study it was slightly lower. Overall, the trends and absolute incidence rates in all of these studies correspond to our results. A population-based retrospective study in Rochester, Minnesota, calculated age-specific predicted incidence rates of adult-onset diabetes from community-based medical records [35]. In 1985, the incidence rate per 100,000 person-years was about 600 at ages 70–74 and about 500 at ages 80–84. These results agree with our estimates for these age groups.

McBean et al. [5] recently published another study of diabetes. They examined diabetes prevalence, incidence, and mortality from 1993 to 2001 among fee-for-service Medicare beneficiaries aged 67+, using a 5% random sample of enrollees. They estimated an incidence rate of about 3,000 per 100,000, considerably higher than our estimate. This disagreement may reflect the different scheme they used to identify diabetes onset in Medicare data [36]. In that scheme, a second record with a diabetes ICD-9 code is required only if the first record is an ambulatory claim (i.e., a physician/supplier or hospital outpatient claim). This rule is one reason for the higher incidence rate reported by McBean et al. [5]. When we recalculated diabetes incidence rates using the approach of [5, 36], we found an almost 5-fold increase over the Algorithm A estimates. One of the alternative onset identification models, model V11 in Table 4, is exactly the approach used by McBean et al. [5]. This approach gave age-adjusted rates of about 2,700 for males and 2,300 for females, which, as expected, are close to those obtained by McBean et al. [5].
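The key difference of the scheme in [36] can be sketched as follows: confirmation is required only when the first diabetes claim is ambulatory. This is a minimal illustration of that single rule, not a reimplementation of [36]. The text does not state the confirmation window, whether codes must be primary, or how later claims are handled when the first ambulatory claim is unconfirmed. Accordingly, `window_years` is a required hypothetical parameter, all diagnosis positions are accepted, and only the earliest claim is examined.

```python
from typing import Optional, Sequence, Tuple

# Ambulatory claim types named in the text (labels are hypothetical).
AMBULATORY = {"physician", "outpatient"}


def onset_ambulatory_rule(claims: Sequence[Tuple[float, str]],
                          window_years: float) -> Optional[float]:
    """claims: (date in decimal years, claim source) for claims with a diabetes code.

    A non-ambulatory first claim (e.g. inpatient) is accepted on its own; an
    ambulatory first claim needs a second diabetes claim on a later date
    within window_years (a hypothetical parameter).
    """
    ordered = sorted(claims)
    if not ordered:
        return None
    first_date, first_source = ordered[0]
    if first_source not in AMBULATORY:
        return first_date
    if any(first_date < d <= first_date + window_years for d, _ in ordered[1:]):
        return first_date
    return None
```

A single inpatient claim is enough under this rule but not under Algorithm A, which needs a second primary-diagnosis record. This difference helps explain why such schemes produce much higher diabetes incidence.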

3.1.4. Asthma

Asthma may occur in the elderly more often than is usually recognized and may therefore be underdiagnosed [37]. Few studies have described asthma onset at ages 65+. The available studies are often limited by small numbers of patients and tend to group all patients older than 55 or 60 into a single category [38–41]. A population-based study in Rochester, Minnesota, analyzed age- and sex-specific incidence rates of definite and probable asthma per 100,000 person-years for 1964–1983 [42]. Incidence decreased with age as in our study, but the absolute rates were lower than ours. The rates were about 140 for males and 80 for females at ages 65–74, 110 for males and 70 for females at ages 75–84, and about 60 for males and 50 for females at ages 85+. The higher rates in our study may reflect the increase in asthma incidence over the decades. They may also reflect the Rochester study's design: its data were obtained retrospectively from medical records and involved mostly Caucasians from a small midwestern city. The slight difference in male/female ratios between the two studies may also play a role. ARIC also reported asthma incidence rates for 1987–2001 (summarized by the NIH/NHLBI in [31]). These were 225 at ages 65–74 and 398 (an estimate considered unreliable) at ages 75–84, in better agreement with our results. The female-to-male ratio of age-adjusted rates was similar in both studies: 1.49 (ARIC) and 1.27 (NLTCS-M). A study of the Moscow (Russia) population covered ages from birth to 85. It found that adult asthma risk declined steadily from age 55 in females and from age 65 in males and became very small among the oldest old [43, 44]. These declining trends agree with our results.

3.1.5. Neurodegenerative Diseases (NDD)

The most common disorders in the NDD group are Alzheimer's disease (with other dementias) and Parkinson's disease. Several studies and meta-analyses have estimated their incidence rates and age patterns in elderly populations. We compare our results with a meta-analysis [4] and with an analysis of CHS [3] (Figure 6), calculating our results with two algorithms (A and C). The results of both studies agree with our Algorithm C results and are much higher than our results from the base algorithm. The Bronx Aging Study showed that dementia incidence continues to increase beyond age 85, but that the rate of increase appears to slow relative to ages 65–85. This suggests that dementia in the oldest old may be related not to the aging process itself but to age-associated risk factors [45, 46]. A study based on inpatient claims in the NLTCS-M data for 1984–2001 observed a similar pattern for Alzheimer's disease, with risk declining at ages 90+ [47]. One analysis pooled epidemiological studies of Alzheimer's disease incidence from Europe, North America, Asia, Africa, Australia, and South America. It estimated average annual age-specific incidence rates per 1,000 person-years for ages 65–95 [45]. Its rates increase with age, as ours do, and its absolute rates generally agree with our Algorithm C estimates.

Parkinson's disease incidence is difficult to study because it is hard to identify enough affected individuals in a well-defined or enumerated population. The low frequency of the disease, the difficulty of establishing the diagnosis, and the absence of population-based disease registries have left its basic epidemiologic characteristics poorly described [48]. Data on Parkinson's disease incidence remain scarce, especially for the oldest old. It also remains controversial whether incidence rises progressively in late life or declines [49, 50]. One study analyzed Parkinson's disease cases among members of the Kaiser Permanente Medical Care Program of Northern California in 1994–1995 [48]. Incidence rates per 100,000 were 38.8 at ages 60–69, 107.2 at ages 70–79, and 119.0 at ages 80–89, and were more than twice as high in males as in females at ages 70+. Our rates above age 80 are higher by a factor of 1.5–2, and our male-to-female ratio is 1.4. Because the statistical errors are large, however, we conclude that these results generally agree with ours.

4. Discussion

4.1. NLTCS-M as a Source of Reference Values of Incidence Rates for the US Elderly Population

In this study, we used the NLTCS-Medicare-linked data to estimate how the incidence rates of cancer and noncancer chronic diseases vary with (i) age among older US adults (males and females), (ii) disability, and (iii) the prevalence and severity of comorbidities. We analyzed incidence for ten conditions representing the major groups of chronic disease in the elderly. These were (i) circulatory: heart failure; (ii) cancer: breast, prostate, lung, and colon cancers and melanoma; (iii) neurodegenerative: Parkinson's disease and Alzheimer's disease; (iv) diabetes; and (v) asthma. We have shown that the NLTCS-M dataset can answer a range of questions about the health of the US elderly from both medical and economic perspectives. The dataset also provides information that other datasets lack, for example, on comorbidity and disability. These data are population-based, which minimizes selection bias with respect to geographic region, urban versus rural location, and socioeconomic characteristics. It also minimizes bias with respect to race and ethnicity, since the data cover the whole spectrum of race- and ethnic-specific populations. Each of these factors is an important predictor of disease risk, progression, treatment availability, and response. Such information is limited in databases drawn from more restricted populations [15].

4.2. Identified Properties of Incidence Rates

Generally, the results were in accordance with our expectations. Comparison of the age patterns, as well as their sex differences and time trends, with other studies showed similarities to the patterns obtained in other population studies in the USA and other countries. The patterns of most diseases were well described by the base algorithm, whose most important features were (i) the occurrence of a primary diagnosis in one of four Medicare sources (inpatient care, outpatient care, physician services, and skilled nursing facilities) and (ii) confirmation of the diagnosis by another record. The only exception was Alzheimer's disease, whose patterns required a modification of the base algorithm to be described adequately: a single record is sufficient, and the diagnosis need not be primary. Note that Alzheimer's disease is still diagnosed without postmortem confirmation, and such diagnoses are largely subjective or include a subjective component in distinguishing Alzheimer's disease from other dementias; therefore, our finding that an algorithm without confirmation better fits the data may indicate that the incidence of Alzheimer's disease is overestimated among the oldest elderly patients [51, 52]. Disability was measured using self-reported information, whereas comorbidity was estimated from Medicare records for the year preceding the interview date and the start of the 5-year follow-up period. Because special NLTCS weights were used, the results are valid at the national level. The properties of these patterns were briefly discussed in Section 3.
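The base onset rule and its Alzheimer's variant can be sketched as a filter over claim records. This is a minimal illustration under stated assumptions, not the study's code: the `Claim` record, its field names, and the source labels are hypothetical; confirming records are assumed to also be primary diagnoses in a basic source; and no time window for confirmation is applied because none is specified in this section.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Hypothetical labels for the four basic Medicare sources named in the text.
BASIC_SOURCES = {"inpatient", "outpatient", "physician", "snf"}


@dataclass(frozen=True)
class Claim:
    service_date: date
    source: str       # file type; only BASIC_SOURCES are used for onset
    icd9: str         # diagnosis code on the claim
    is_primary: bool  # True if the code is the primary diagnosis


def onset_date(claims: Iterable[Claim], codes: set[str],
               require_primary: bool = True,
               min_records: int = 2) -> Optional[date]:
    """Return the earliest qualifying claim date, or None if onset is not identified.

    Base rule (defaults): the code must be the primary diagnosis in a basic
    source, confirmed by at least one further qualifying record.
    Alzheimer's variant: require_primary=False, min_records=1.
    """
    qualifying = sorted(
        c.service_date for c in claims
        if c.source in BASIC_SOURCES
        and c.icd9 in codes
        and (c.is_primary or not require_primary)
    )
    if len(qualifying) < min_records:
        return None
    return qualifying[0]
```

Expressing the variant through two parameters keeps both definitions in one function, which makes the kind of sensitivity comparison described in the text straightforward to run.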

Four types of age patterns were observed in our study. The first type was flat (a plateau); prostate cancer, melanoma, and diabetes showed this shape. Note that in analyzing the shape of age patterns, the first and last points can be disregarded: the first point may still include a mixture of prevalent cases, and the last point typically has larger statistical uncertainty. The second type was a monotonic increase with age, observed for heart failure and Alzheimer's disease. The third type had a maximum (an inverted U-shape); the corresponding diseases were lung cancer, colon cancer, and Parkinson's disease, and for all three the age at the maximal rate was in the 80–90-year range. The fourth type was a monotonic decline in rates, observed for breast cancer and asthma.
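The four shapes, together with the rule of ignoring the first and last age groups, can be made operational with a simple classifier. This is a hypothetical sketch: the paper does not state a numeric criterion, so the flatness threshold `flat_tol` and the rule of labeling non-flat patterns by where the maximum falls are assumptions chosen for illustration.

```python
from typing import Sequence


def classify_age_pattern(rates: Sequence[float], flat_tol: float = 0.2) -> str:
    """Label an age pattern as 'flat', 'increasing', 'decreasing', or 'peaked'.

    The first and last age groups are dropped (the first may contain prevalent
    cases, the last is noisy). A pattern is 'flat' if (max - min) / mean of the
    remaining rates is at most flat_tol (a hypothetical threshold). Otherwise it
    is labeled by the position of its maximum: oldest retained group ->
    'increasing', youngest -> 'decreasing', in between -> 'peaked'.
    """
    core = list(rates[1:-1])
    if len(core) < 3:
        raise ValueError("need at least five age groups")
    mean = sum(core) / len(core)
    if mean <= 0:
        raise ValueError("rates must be positive")
    if (max(core) - min(core)) / mean <= flat_tol:
        return "flat"
    peak = core.index(max(core))
    if peak == len(core) - 1:
        return "increasing"
    if peak == 0:
        return "decreasing"
    return "peaked"
```

Locating the maximum is a coarse criterion; a fuller analysis would also account for the statistical uncertainty of each age-specific rate.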

The occurrence of shapes with a maximum and, especially, with a monotonic decline contradicts the hypothesis that the risk of geriatric diseases correlates with the accumulation of adverse health events (genetic mutations, deterioration of the vascular system, immunosenescence, etc.). Three basic explanations of this phenomenon could be considered. First, such effects can be attributed to selection [53]: frail individuals do not survive to advanced ages. Selection-based approaches are popular in cancer modeling and have been successfully applied to SEER data [54–56]. Second, diagnoses may be underregistered at advanced ages; this, however, cannot be verified with the available data [57, 58]. Third, other biologically motivated explanations are discussed in [59]. Its authors proposed mechanisms by which individual aging may contribute to the shape of the mortality curve, subdividing individual age-associated changes into three components with different influences on morbidity and mortality (basal, ontogenetic, and time dependent), and discussed how these three components may interact in the human organism and influence population patterns of morbidity and mortality.
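The selection explanation can be illustrated with a standard gamma-frailty model. This is one common formalization, assumed here for illustration and not necessarily the specification used in [53]. If an individual's hazard is $Z \mu(t)$ with $\mu(t) = a t^k$ and frailty $Z \sim$ Gamma with mean 1 and variance $\sigma^2$, the hazard among those still disease-free is $\mu(t) / (1 + \sigma^2 H(t))$ with $H(t) = a t^{k+1}/(k+1)$. Setting its derivative to zero gives a peak at $t^{*} = \left(k(k+1)/(\sigma^2 a)\right)^{1/(k+1)}$, even though every individual's hazard keeps rising. The parameter values below are hypothetical.

```python
def individual_hazard(age: float, a: float, k: float) -> float:
    """Power-law hazard a * age**k for frailty Z = 1; increases with age."""
    return a * age ** k


def population_hazard(age: float, a: float, k: float, sigma2: float) -> float:
    """Hazard among survivors under gamma frailty (mean 1, variance sigma2):
    mu(t) / (1 + sigma2 * H(t)), with H(t) = a * t**(k+1) / (k+1)."""
    cumulative = a * age ** (k + 1) / (k + 1)
    return individual_hazard(age, a, k) / (1 + sigma2 * cumulative)


def peak_age(a: float, k: float, sigma2: float) -> float:
    """Age maximizing population_hazard: t**(k+1) = k * (k+1) / (sigma2 * a)."""
    return (k * (k + 1) / (sigma2 * a)) ** (1 / (k + 1))


# Illustrative parameters only; they are not fitted to any data in this paper.
A, K, SIGMA2 = 2e-11, 5.0, 4.0
print(f"analytic peak: {peak_age(A, K, SIGMA2):.1f} years")
for age in range(65, 101, 5):
    print(age, f"{individual_hazard(age, A, K):.4f}",
          f"{population_hazard(age, A, K, SIGMA2):.4f}")
```

With these values the population rate peaks at about 85 years and then declines, while the individual hazard rises monotonically, which is the qualitative pattern described for lung cancer, colon cancer, and Parkinson's disease.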

The shape of an age pattern can also depend on how broadly or narrowly a disease group is defined. Akushevich et al. [60] used broader disease definitions. For example, they considered total cancer, including all malignant neoplasms (ICD-9 codes 140–208), and neurodegenerative disorders (NDD), which included psychoses (290–299), nonpsychotic mental disorders (300–316), and hereditary and degenerative diseases of the central nervous system (330–337). Under such definitions, NDD incidence rates increase monotonically with age. This is because the NDD group covers a wide range of specific diseases that are prevalent in the elderly population. Relatively few individuals at advanced ages have no diagnosis corresponding to a disease from this group, which substantially decreases the number of persons still free of these diseases. By focusing on specific diseases from these groups, we observe less pronounced increases, or even declines, of incidence rates at advanced ages that result in a peak in the age pattern.
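The broad groupings quoted above amount to range checks on the three-digit ICD-9 category. The sketch below is hypothetical: the helper names are illustrative, numeric codes are assumed to be zero-padded with the category in the first three characters (dotted or undotted), and V and E codes are left unassigned. In the test, codes such as 332.0 (Parkinson's disease) and 162.9 (lung cancer) are used only to show narrow diagnoses falling inside broad groups; the exact code lists used in this study are not given in this section.

```python
from typing import Optional

# Broad groups built from the ICD-9 category ranges quoted in the text.
BROAD_GROUPS = {
    "total cancer": [(140, 208)],
    "NDD": [(290, 299), (300, 316), (330, 337)],
}


def icd9_category(code: str) -> Optional[int]:
    """Three-digit numeric ICD-9 category ('332.0' or '3320' -> 332).

    Assumes zero-padded numeric codes; V and E codes return None.
    """
    head = code.strip()[:3]
    return int(head) if head.isdigit() else None


def broad_group(code: str) -> Optional[str]:
    """Name of the broad group containing the code, or None if it is in none."""
    category = icd9_category(code)
    if category is None:
        return None
    for group, ranges in BROAD_GROUPS.items():
        if any(lo <= category <= hi for lo, hi in ranges):
            return group
    return None
```

Because each broad group pools many specific diagnoses, far more people in an older cohort match it than match any single narrow code, which is why the broad NDD rates rise monotonically.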

4.3. Addressing the Limitations of Medicare Data

Medicare claims data have certain limitations related to how diagnoses are determined. Sensitivity analysis is one way to deal with such uncertainties. In this paper we analyzed several possible sources of uncertainty, such as different definitions of disease onset and different censoring schemes. The rates obtained under different schemes of onset identification can differ significantly, but this simply reflects different definitions of incidence rates, many of which are used in epidemiology (e.g., fatal and nonfatal incidence rates). The study focused on the following specific sources of potential uncertainty: (i) approaches to identifying incident cases (all Medicare sources, only primary diagnoses, different definitions of onset), (ii) censoring strategies (dependence between recovery and death), (iii) disenrollment from Medicare and coverage by an HMO, and (iv) study design effects (e.g., the use of NLTCS weights). The impact of observed heterogeneity factors (e.g., age, disability, and comorbidity) on the incidence rates was also investigated.
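Several of these choices (censoring at death, at disenrollment or HMO enrollment, and survey weighting) can be seen together in a weighted person-years estimator. This is a minimal sketch, not the study's procedure: the `Person` record and its fields are hypothetical, censoring is treated as independent of disease risk (the assumption that sensitivity analyses of censoring strategies probe), and cases already present at entry are excluded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Person:
    entry: float                      # age at start of follow-up (years)
    end_of_follow_up: float           # age at administrative end of the study
    onset: Optional[float] = None     # age at disease onset, if observed
    death: Optional[float] = None     # age at death, if observed
    left_ffs: Optional[float] = None  # age at disenrollment or HMO enrollment
    weight: float = 1.0               # survey weight (e.g., an NLTCS weight)


def weighted_incidence(people: Sequence[Person]) -> float:
    """Weighted incidence per person-year: sum(w * events) / sum(w * time at risk).

    Follow-up ends at the earliest of onset, death, leaving fee-for-service
    coverage, or the end of the study; only onset counts as an event.
    """
    events = exposure = 0.0
    for p in people:
        if p.onset is not None and p.onset <= p.entry:
            continue  # prevalent case at entry: not at risk, so excluded
        exit_age = min(t for t in (p.onset, p.death, p.left_ffs, p.end_of_follow_up)
                       if t is not None)
        exposure += p.weight * max(0.0, exit_age - p.entry)
        if p.onset is not None and p.onset == exit_age:
            events += p.weight
    if exposure == 0:
        raise ValueError("no person-time at risk")
    return events / exposure
```

Swapping the censoring rule (for example, not censoring at HMO enrollment) or setting all weights to 1 changes only one line each, which is the kind of comparison a sensitivity analysis makes.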

4.4. Incidence Rates and Forecasting Models

Modern models for forecasting health state and the associated medical costs include three essential components, or submodels: (i) a model of medical cost projections conditional on health state, (ii) health state projections, and (iii) a description of the initial health state of the cohort to be projected [61–64]. In this paper we investigated the risks of chronic diseases and their age patterns, as well as associations of these risks with potential covariates such as comorbidity and disability indices. We presented a detailed investigation of the model of medical cost projections conditional on health state in [19]. Given models of health state and medical cost projections (as well as a quantified description of the initial state of a cohort), a projection for a population can be constructed by simulating individual trajectories, and total expenditures are calculated as the sum of the medical costs obtained for each individual. If the health state has to be described by a number of chronic diseases, then detailed stratification over the respective categorical variables, or the use of multivariate regression models, allows for a better description of health states; however, it can result in an abundance of model parameters to be estimated. One way to overcome these difficulties is to use aggregated, demographically based characteristics as model components that mimic the effects of specific states. In this case, information about comorbidity- and disability-specific patterns of incidence rates has to be incorporated into the model. A model developed using this concept (i.e., using a comorbidity index rather than a set of multiple correlated categorical variables representing the health state) substantially reduces the number of degrees of freedom of the problem. Note that time trends and covariates are naturally incorporated into such an approach.
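The three submodels and the comorbidity-index simplification can be put together in a small microsimulation. Everything numeric here is a hypothetical placeholder: the cost function, the age- and comorbidity-dependent onset and death probabilities, and the cohort. The health state is only the number of chronic conditions, standing in for the comorbidity index; this is not the model of [19].

```python
import random
from typing import Sequence


def annual_cost(n_diseases: int) -> float:
    """(i) Cost conditional on health state (hypothetical linear form)."""
    return 5_000.0 + 4_000.0 * n_diseases


def onset_prob(age: int, n_diseases: int) -> float:
    """(ii) Comorbidity-specific annual incidence (hypothetical form)."""
    return min(1.0, 0.03 * 1.03 ** (age - 65) * (1 + 0.2 * n_diseases))


def death_prob(age: int, n_diseases: int) -> float:
    """(ii) Annual mortality by age and comorbidity (hypothetical form)."""
    return min(1.0, 0.02 * 1.08 ** (age - 65) * (1 + 0.1 * n_diseases))


def simulate_total_cost(cohort: Sequence[tuple[int, int]], years: int,
                        seed: int = 0, max_diseases: int = 10) -> float:
    """Sum of simulated individual costs over `years`.

    cohort holds (iii) initial states as (age, number of chronic diseases).
    Each simulated year: pay the cost for the current state, then possibly
    acquire one new disease, then possibly die (ending the trajectory).
    """
    rng = random.Random(seed)
    total = 0.0
    for start_age, start_n in cohort:
        n = start_n
        for year in range(years):
            age = start_age + year
            total += annual_cost(n)
            if n < max_diseases and rng.random() < onset_prob(age, n):
                n += 1
            if rng.random() < death_prob(age, n):
                break
    return total


print(simulate_total_cost([(70, 0), (75, 1), (85, 4)], years=5, seed=1))
```

The reduction in degrees of freedom is easy to quantify: ten binary disease indicators define 2^10 = 1,024 possible health states, whereas a count of those same ten diseases takes only 11 values (0 through 10).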

5. Conclusion

The age-, disability-, and comorbidity-specific incidence rates of ten highly prevalent aging-related chronic diseases were analyzed using the NLTCS-M data. The most appropriate approach to identifying disease onset required confirmation by a subsequent claim, so that repeated claims contained the chosen ICD codes as the primary diagnosis in the basic Medicare sources. Comparison of the age patterns obtained with this computational approach with those available in the literature showed good agreement for most diseases. Thus, national incidence rates can be adequately evaluated from the Medicare service use files. This is important because few data sources allow incidence patterns at advanced ages to be evaluated in the national population. These timely results can inform current scientific and policy debates about the effects of biomedical research and therapeutic innovations on disease incidence at increasingly advanced ages.


The research reported in this paper was supported by the National Institute on Aging Grants R01AG027019, R01AG032319, and R01AG028259. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health.