About 80% of all cancers are diagnosed in the elderly, and up to 75% of cancers are associated with behavioral factors. We present an approach for estimating the contribution of various measurable factors, including behavior and lifestyle, to cancer risk in the US elderly population. The nationally representative National Long-Term Care Survey (NLTCS) data were used to measure functional status and behavioral factors in the US elderly population (65+), and Medicare Claims files linked to each NLTCS participant were used to estimate cancer incidence. The associations (i.e., relative risks) of selected factors with risks of breast, prostate, lung, and colon cancers were evaluated and discussed. Behavioral risk factors significantly affected cancer risks in the US elderly. The most influential of the potentially preventable risk factors can be detected with this approach using the NLTCS-Medicare linked dataset and then examined in deeper analyses of other datasets that describe risk factors in more detail.

1. Introduction

About 80% of all cancers are diagnosed at ages above 65 years, and up to 75% of cancers are thought to be associated with behavioral factors; modifying these factors could significantly reduce the cancer burden [1]. Based on analyses of the impact of modifiable factors on cancer risk, it has been speculated that about 50% of cancers are potentially preventable [2]. Although there are many specific results clarifying the effects of lifestyle factors on the risk of lung, breast, prostate, colorectal, and other cancers, both the roles of individual lifestyle factors and the combined effects of multiple factors are still not clear. The availability of large datasets with more detailed information provides a new perspective for studying the role of behavioral factors in cancer risk, both for each factor alone and with risk factor interactions taken into account.

The sources of evidence on associations between behavioral factors and cancer risk include in vitro studies, animal experiments, ecological studies, and case-control studies. However, each has limitations in establishing correlations between exposure to a factor and cancer risk [3, 4]. The most influential are study design biases (e.g., selection bias): for example, information on behavioral factors is usually collected by interviewing patients already diagnosed with cancer, which biases the estimates. Prospective cohort studies can avoid most methodological biases; however, they are typically expensive, especially when detailed questionnaires are required.

In this paper, we present an analysis of multiple associations between behavioral factors and cancer risk using the National Long-Term Care Survey linked to Medicare files of service use. The approach is free of many limitations that usually accompany similar studies. First, it is based on a cohort study in which the measurements were performed before the beginning of the cohort follow-up for cancer incidence. Therefore, the selection and recall biases typical of case-control studies do not arise in this study design. Second, in earlier studies, associations of the same type, and especially relative risks obtained for different lifestyle factors, could hardly be compared with each other because of differences in study design, time of measurement, and so forth. In contrast, the data used in our study included multiple risk factors measured simultaneously, which makes it possible to compare the associations evaluated for different risk factors. The dataset is useful both for gaining additional knowledge about the roles of already recognized cancer risk factors and for identifying new candidate behavioral factors that could influence cancer risk in the US elderly, which provides background hypotheses to be tested in further analyses. Third, this study is based on a population that is representative of the whole US elderly population, which helps to overcome such limitations of meta-analysis as heterogeneity bias, publication bias, and several others (reviewed by Manton, Akushevich, and Kravchenko [5], Sections 3.1 and 3.2).

2. Data and Methods

Two sources of data are used in the analysis: the nationally representative NLTCS, for measuring functional status and behavioral factors in the elderly, and Medicare Claims linked to the NLTCS, for estimating cancer incidence in the US population. Breast, prostate, lung, and colon cancers were selected for analysis because of their high incidence rates in the elderly and because their incidence rates can be reconstructed relatively well from Medicare data. The SEER data were used as a “gold standard”: age patterns of the selected cancers from SEER were compared with the age patterns predicted from the NLTCS-Medicare data.

2.1. Medicare Claims Data

Medicare is the primary health insurer of 97% of the U.S. population aged 65+ years. All Medicare beneficiaries receive Part A benefits, which cover inpatient care in short- and long-stay hospitals, skilled nursing facilities, home health, and hospice care. About 95% of beneficiaries are also subscribed to Medicare Part B to obtain the benefits covering physician service, outpatient care, durable medical equipment, and home health (in certain cases). The Medicare claims records contain the information on dates and costs of each service, types of providers, ICD-9-CM diagnoses, auxiliary diagnostic codes, and procedure codes.

2.2. The National Long-Term Care Survey

The NLTCS (1982, 1984, 1989, 1994, 1999, and 2004/5) contains longitudinal and cross-sectional data on a nationally representative sample of about 49,000 US individuals aged 65+ years, with 17,000–20,000 age-eligible survivors in each of six rounds. The 1994 and 1999 NLTCS waves were analyzed: more than 200 variables were selected in each wave and grouped as follows (a complete list of all variables used in the analysis is presented in Table 1 in the Electronic Supplementary Material available online at doi: 10.5402/2011/415790):
(a) demographic characteristics (4 variables: sex, race, marital status, and urban versus rural living);
(b) self-reported comorbidity (27 major medical conditions and recent medical problems);
(c) daily living activities (22 variables: 6 activities of daily living (ADLs) with two severity levels and 10 instrumental activities of daily living (IADLs));
(d) range of motion (16 variables reflecting the ability to perform daily activities such as walking, using fingers to grasp and handle small objects, and climbing stairs);
(e) physical activity list (29 variables; 25 of them, reflecting specific physical activities (e.g., golf, tennis), were measured in 1994 only);
(f) nutrition and social activities (30 variables; 24 of them, representing a nutrition survey, were measured in 1999 only);
(g) alcohol consumption and smoking (4 variables reflecting two severity levels);
(h) other functioning (28 variables reflecting self-estimates of health, mood, habits, keeping in touch with friends and relatives, and satisfaction with life);
(i) housing and neighborhood characteristics (23 variables describing the area, housing, and amenities where the respondent lives, whether the respondent lives with other household members, and neighborhood characteristics);
(j) health insurance (6 variables containing information on coverage by Medicare, HMO, Medicaid, etc.);
(k) medical providers and prescription medicine (44 variables providing information on the use of health care services and on public and private expenditures for health care services);
(l) cognitive functioning (18 variables on the cognitive status of individuals; 10 of them were measured in 1994 and 11 in 1999);
(m) income and assets (4 variables correlated with the socioeconomic status of individuals);
(n) body mass index (5 variables representing body mass index and eating style).

The variables to be studied were selected as follows. First, we collected all substantive variables measured in certain NLTCS surveys that were independent of responses to other questions. Most of the variables were binary, and the variables with multiple outcomes were dichotomized by aggregating outcomes with similar meanings. Then, among these variables, only those with low frequencies of missing data were kept: the frequency of missing data was less than 0.02 for 65% of variables, 0.02–0.05 for 16% of variables, 0.05–0.15 for 8% of variables, 0.15–0.25 for 8% of variables, and 0.25–0.45 for 2% of variables (the last group contains the questions about an individual’s cognitive status).

2.3. Methods for Association Studies

For each variable (232 for the 1994 survey and 229 for the 1999 survey), the associations with the incidence of the four cancers were analyzed using the 1994 and 1999 surveys, for a total of 3,688 associations. Empirical analysis and methods of univariate, two-factor, and multivariate statistical estimation with Cox’s proportional hazards model were used, with individual weights (the so-called CDS Detailed Cross-Sectional NLTCS weights) applied to obtain results relevant to the US elderly population. The standard errors for all estimates were calculated from the actual numbers of individuals, that is, for nonweighted populations. For small numbers of individuals in a stratum, the standard calculation of standard errors was corrected according to [6].

For lung, colon, breast, and prostate cancers, the age-adjusted incidence rates conditional on a specific outcome (i.e., a specific answer to a specific question) were estimated, and the relative risks were estimated as the ratios of the rates for alternative outcomes. Note that age-adjusted risks were calculated for subpopulations with different responses to a given question/variable (e.g., current smokers and nonsmokers) using the same population weights for both outcomes. Therefore, the rates conditional on a specific outcome of each variable were adjusted to the total population, thus taking into account a possible age dependence of the prevalence of that outcome. For example, lung cancer rates in smokers and nonsmokers were adjusted to the age structure of the total population, which includes smokers, nonsmokers, and individuals with missing information on smoking status.
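The age adjustment described above can be illustrated with a minimal sketch of direct standardization, in which both subpopulations are weighted by the same age structure of the total population. The function names, the age grouping, and all numbers below are hypothetical and serve only as an illustration; the survey weights and the small-stratum corrections used in the paper are not reproduced here.

```python
from typing import Dict


def age_adjusted_rate(cases: Dict[str, float],
                      person_years: Dict[str, float],
                      standard_weights: Dict[str, float]) -> float:
    """Directly standardized rate: sum over age groups of w_a * cases_a / py_a.

    standard_weights is the age structure of the TOTAL population, so the
    same weights are applied to every subpopulation being compared.
    """
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for age_group, weight in standard_weights.items():
        py = person_years.get(age_group, 0.0)
        if py > 0:  # simplification: age groups without follow-up add nothing
            rate += (weight / total_weight) * cases.get(age_group, 0.0) / py
    return rate


def relative_risk(cases_a: Dict[str, float], py_a: Dict[str, float],
                  cases_b: Dict[str, float], py_b: Dict[str, float],
                  standard_weights: Dict[str, float]) -> float:
    """Ratio of age-adjusted rates for two alternative outcomes of a variable."""
    return (age_adjusted_rate(cases_a, py_a, standard_weights)
            / age_adjusted_rate(cases_b, py_b, standard_weights))


# Hypothetical numbers for illustration only (not NLTCS or Medicare data).
STANDARD = {"65-74": 0.50, "75-84": 0.35, "85+": 0.15}
SMOKER_CASES = {"65-74": 10, "75-84": 8, "85+": 2}
SMOKER_PY = {"65-74": 1000, "75-84": 500, "85+": 100}
NONSMOKER_CASES = {"65-74": 4, "75-84": 8, "85+": 3}
NONSMOKER_PY = {"65-74": 2000, "75-84": 2000, "85+": 300}

if __name__ == "__main__":
    rr = relative_risk(SMOKER_CASES, SMOKER_PY,
                       NONSMOKER_CASES, NONSMOKER_PY, STANDARD)
    print(f"age-adjusted RR = {rr:.2f}")
```

Because both rates use the same weights, a difference in the age composition of the two subpopulations does not by itself produce a relative risk different from 1.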

Relative risks of specific outcomes for all cancers were also calculated in the univariate proportional hazards model. The SAS procedure PHREG was used for parameter estimation. Two basic methods of individual follow-up were used and compared: (1) time-period-based follow-up starting from the date of the individual interview, for which stratification by age and sex was used in the maximization of the partial likelihood, and (2) age-based follow-up starting from the age at interview.
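For reference, the textbook form of the univariate proportional hazards model and its stratified partial likelihood is sketched below. The notation is an assumption introduced here for illustration: x is a dichotomized variable, h_0 is the baseline hazard, s indexes strata (e.g., by age and sex), D_s is the set of individuals in stratum s with an observed onset, and R_s(t_i) is the set of individuals in stratum s still at risk at time t_i. Survey weights and tie handling are omitted, so this is not necessarily the exact form maximized in the analysis.

```latex
% Univariate proportional hazards model for a binary variable x \in \{0, 1\}
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
\mathrm{RR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}

% Stratified partial likelihood (unweighted, no ties)
L(\beta) = \prod_{s} \prod_{i \in D_s}
  \frac{e^{\beta x_i}}{\sum_{j \in R_s(t_i)} e^{\beta x_j}}
```

In the time-period-based follow-up, t is the time since the interview; in the age-based follow-up, t is age, with each individual entering the risk set at the age at interview.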

2.4. Procedure for Onset Identification

The age at onset was defined for each studied cancer. First, individual medical histories were reconstructed from all Medicare files by combining all records with the respective ICD-9 codes: breast cancer (174.xx), prostate cancer (185.xx), lung cancer (162.xx), and colon cancer (153.xx). Second, individuals with a history of the considered cancer before the date of interview were excluded. Then, the date of a Medicare record (referred to as “this record” below in this subsection) was taken as the date of cancer onset if both of the following conditions were satisfied:
(i) this record was the earliest record with the respective ICD code as a primary diagnosis in one of four Medicare sources (inpatient care, outpatient care, physician services, and skilled nursing facilities);
(ii) there was another record with the respective ICD code as a primary diagnosis in these four Medicare sources that appeared in another claim, on a date other than the date of this record, and no later than 0.3 of a year after this record.
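The two conditions above can be expressed as a short sketch. The `Claim` record, its field names, the source labels, and the conversion of 0.3 of a year into days are assumptions made for illustration; actual Medicare files have a different layout. As a simplification, the prevalent-case exclusion here looks only at primary diagnoses in the four sources, whereas the paper reconstructs histories from all records.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# "0.3 of a year", converted to days (assumed conversion).
CONFIRM_WINDOW_DAYS = 0.3 * 365.25

# Hypothetical labels for the four Medicare sources named in the text.
ELIGIBLE_SOURCES = {"inpatient", "outpatient", "physician", "snf"}


@dataclass(frozen=True)
class Claim:
    claim_id: str
    service_date: date
    primary_icd9: str  # e.g. "174.9" or "1749"
    source: str


def onset_date(claims: List[Claim], icd_prefix: str,
               interview_date: date) -> Optional[date]:
    """Return the onset date for one person and one cancer, or None."""
    relevant = sorted(
        (c for c in claims
         if c.source in ELIGIBLE_SOURCES
         and c.primary_icd9.startswith(icd_prefix)),
        key=lambda c: c.service_date,
    )
    if not relevant:
        return None
    first = relevant[0]  # condition (i): earliest qualifying record
    if first.service_date < interview_date:
        return None  # history before the interview: prevalent case
    for other in relevant[1:]:
        gap_days = (other.service_date - first.service_date).days
        # condition (ii): another claim, a different date, within the window
        if (other.claim_id != first.claim_id
                and 0 < gap_days <= CONFIRM_WINDOW_DAYS):
            return first.service_date
    return None
```

Requiring a second, separate claim within the window is what filters out isolated records, for example rule-out diagnoses that are never confirmed.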

Since we analyzed cases starting from 1994 and Medicare histories were available from 1991, we had a sufficient time period (>36 months) to reject prevalent cases. In this analysis we also excluded individuals with additional coverage by an HMO, as well as individuals enrolled in Medicare for less than half a year before the interview in 1994 or 1999. Table 1 presents the age-adjusted rates of cancer incidence in 1994–1998 and 1999–2004 compared with those calculated for these periods using SEER data.

3. Results and Discussion

Age-adjusted estimates of associations between behavioral factors and the risk of the four most common cancers (lung, prostate, breast, and colon) were calculated by three methods: (i) calculating age-adjusted rates, with each subpopulation (i.e., with positive and negative outcomes for a specific question) adjusted to the total population; (ii) the proportional hazards model with follow-up based on age; and (iii) the proportional hazards model with follow-up based on time. All three methods took into account the fact that age is the main and well-documented cancer risk factor: they were designed to analyze the associations for same-age individuals.

3.1. Associations with Cancer Risk: Results and Discussion

Using the three approaches discussed above, we calculated the age-adjusted associations between behavioral factors and the risk of the four most common cancers (lung, prostate, breast, and colon) and selected the most significant lifestyle variables associated with increased cancer risk. The specific list of selected associations depends on the selection criteria. The strictest criterion used in the analysis was based on the Bonferroni correction and an additional requirement that the associations be detected in both the 1994 and 1999 cohorts. Only two associations satisfied this criterion: heavy cigarette smoking and lung cancer (RR = 7), and a history of cancer (site not specified) and breast cancer risk (RR = 6) (in part, the high relative risk could be due to an admixture of prevalent cases, which cannot be separated from incident cases). The Bonferroni correction is too conservative, and it is intended for testing independent hypotheses, which is not the case in this study because the explanatory variables are correlated, especially within specific groups. Therefore, two other criteria, based on P-value thresholds of .05 and .002, were used.
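To give a sense of how strict the Bonferroni-based criterion is, the sketch below computes the corrected threshold and checks the requirement that an association hold in both cohorts. The number of tests is an assumption: the text does not state the divisor, so the total of 3,688 associations from Section 2.3 is used here for illustration, as is the family-wise level of .05. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def passes_strict_criterion(p_1994: float, p_1999: float,
                            alpha: float = 0.05,
                            n_tests: int = 3688) -> bool:
    """True if the association is significant after correction in BOTH cohorts.

    alpha and n_tests are illustrative assumptions, not values from the paper.
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    return p_1994 < threshold and p_1999 < threshold


if __name__ == "__main__":
    print(f"threshold = {bonferroni_threshold(0.05, 3688):.2e}")
```

Under these assumptions the per-test threshold is roughly 1.4e-5, more than two orders of magnitude below the .002 level used for the less strict criterion.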

Keeping the associations with P < .05 detected by at least two of the three methods resulted in a list containing 40 variables for breast cancer, 25 for prostate cancer, 43 for lung cancer, and 23 for colon cancer (see Table 2 in the Electronic Supplementary Material available online at doi: 10.5402/2011/415790). Tables 2(a)–2(d) show the sublist of variables for which at least one of the six relative risks estimated by three methods for two years has a P-value lower than .002 (marked in bold). The majority of these variables came from groups such as comorbidity and health status, housing and neighborhood characteristics, nutrition, social activities, and other functioning. Specific variables in the list from the other groups are body mass index and the type of insurance coverage for breast and lung cancers, alcohol consumption and physical activity for prostate and colon cancers, and cigarette smoking for lung cancer.

Comorbidity is an important risk factor for cancer mortality; however, its role in cancer risk is less clear and varies by cancer site. Our results demonstrated that the comorbidity effect was larger for breast and colon cancers and less pronounced (but still significant) for lung cancer. Specifically, circulatory disease and certain neurological disorders increased breast cancer risk, and pulmonary diseases were associated with increased risks of lung and colon cancers. For example, having pneumonia during the last year was associated with an increased risk of colon cancer (RR = 2.9), and emphysema was associated with an increased lung cancer risk (RR = 3). It has been shown in other studies that a prior history of respiratory diseases such as emphysema, asthma, and pneumonia was associated with increased lung cancer risk, for example, OR = 2.87 for emphysema [7]. The causal nature of the association between respiratory diseases and lung cancer remains speculative because both emphysema and chronic bronchitis are strongly influenced by smoking. There is evidence that inflammation may also play a role in colon carcinogenesis (through C-reactive protein and, probably, interleukin-6); however, epidemiological studies are sparse [8]. Pneumonia could also be associated with smoking, which in turn may increase the risk of colon neoplasia [9]. At present, however, there is no proven hypothesis about the role of respiratory diseases in lung and colon carcinogenesis, and the empirical evidence is not entirely consistent and is largely derived from observational epidemiologic studies [10].

For prostate cancer, there were indications of an inverse association with comorbidity in our study; for example, prostate cancer risk was lower for persons with arthritis (both osteoarthritis and rheumatoid arthritis) (RR = 0.48). The inverse associations for other comorbidities were not significant, although they showed the same tendency. Our study also demonstrated a reduced prostate cancer risk (RR = 0.45) in patients with self-reported diabetes. These results are in agreement with recently published data from the Prostate Cancer Prevention Trial, in which diabetes was associated with a reduced risk of prostate cancer: OR = 0.53 and OR = 0.72 were found for the risk of low-grade and high-grade tumors, respectively. A particularly significant inverse association with prostate cancer risk (OR = 0.27) was found for early-onset diabetes (diagnosed before age 30) [11, 12]. However, the mechanisms underlying these associations are still not completely clear.

We found that physical activity decreased the risks of all four studied cancers, with a more significant decrease in individuals who reported moderate activities (RR = 3.5). The effect of vigorous activities was also beneficial (i.e., reducing cancer risk); however, the estimates of RRs varied depending on the type of physical activity. Note that in analyses of the effects of physical activity, bias could arise from difficulties in measuring this factor, its overreporting, and confounding factors. However, inverse associations with physical activity (i.e., reduced cancer risk) have been described in other studies for most human cancers, including colorectal, breast, prostate, and lung [13].

Our results demonstrated that maintaining normal body weight was associated with decreased risks of cancers of the breast (RR = 0.55), prostate (RR = 0.6), and colon (RR = 0.4). A “tradeoff” between the effects of BMI (measured, not self-reported) on breast and lung cancer risks was detected: while normal BMI (18–25 kg/m²) halved breast cancer risk, it increased lung cancer risk more than threefold; conversely, BMI above 25 kg/m² doubled breast cancer risk while diminishing the risk of lung cancer. The inverse association between BMI and lung cancer could be due to confounding by smoking (i.e., smokers may maintain a lower BMI more easily). Data from multiple case-control and cohort studies suggest this possibility: after adjustment for smoking, the inverse association became insignificant [14]. Other studies showed that excessive calorie intake was strongly related to colon and postmenopausal breast cancer risk [15]; however, there was not enough evidence for an association between body mass index and prostate cancer risk [14].

Two variables were used in our study to characterize alcohol consumption: (i) “drinking alcoholic beverages such as beer, wine, or liquor no more than 1–3 times a month or not drinking at all”, and (ii) consuming alcohol “at least 1 or 2 times a week”. No significant associations were found for the first variable, while the second variable (i.e., heavier alcohol consumption) was associated with increased prostate cancer risk (RR = 1.7). This is in agreement with recent results obtained from the Prostate Cancer Prevention Trial [16]. Note also that we did not consider cancers for which alcohol consumption is an evident risk factor.

The effects of dietary patterns on cancer risk in our study were not statistically significant for any of the studied cancers, and associations for these cancers were also not established in other studies [17–20]. Specifically, no clear association was found between fruit and vegetable consumption and reduced colon cancer risk. This is in agreement with the weak or nonsignificant associations obtained in other studies. These results do not support a protective role of dietary fiber against colon cancer [21, 22]. Also, no association was found in our study between consumption of beef, pork, and lamb (without specification of well-done or other cooking regimens) and increased colon cancer risk. Recent meta-analyses showed that high intake of processed meat, but not fresh meat, could increase the risk of colon cancer [23, 24], while well-done meat could increase colon cancer risk in susceptible individuals with rapid-rapid phenotypes of NAT2 and CYP1A2 [25].

No associations were found in our study between breast, colon, and lung cancer risk and different levels of disability. This is in accordance with the results of other studies in which no significant protective effects of disability were found [26]. However, a markedly decreased risk of breast cancer was observed among disabled older women compared with physically capable but inactive women [27]. In our study, an association between disability and decreased risk was detected for prostate cancer.

Several stable associations that cannot be straightforwardly interpreted were detected. These associations are of noncausal character and can be further investigated in two-factor analysis using other measured variables as confounders or mediators. They could also be due to unobserved heterogeneity in cancer risk and could potentially be clarified in future studies. An example is the relationship between breast cancer risk and nine variables from the housing and neighborhood characteristics (HNC) group (e.g., positive responses to questions like “Which of these things would make things easier or more comfortable for you: extra wide doors or hallways, push bars on the door, extra handrails, etc?”), which are strongly associated with breast cancer risk, with RRs from 4 to 7 and P-values less than .001. No associations of these variables with the risks of other cancers were found.

Because hypothesis testing can produce false-positive results, and because associations can be noncausal owing to observed and unobserved confounding, the second step of the analysis was a two-factor analysis that included the effects of interactions between risk factors. This made it possible to reveal the effects of confounding and to take into account the mutual correlations within groups of similar questions.

3.2. Two-Factor Analysis

Analysis of the simultaneous effects of two variables allowed us to check whether associations found in the unidimensional analyses were confounded by other measured variables. The simultaneous effects of all possible pairs of variables were evaluated using the Cox proportional hazards model, focusing on detecting a significant change in the estimated relative risk (or a loss of significance of the estimate) after the second variable was added. Several types of second-variable effects were identified: confounders, candidate mediators, independent predictors, and overlapping predictors (the terminology is discussed in [28, 29]).

As expected, smoking was the strongest and most frequent confounder of other risk factors for lung cancer: it substantially changed the effects of sex (by 2.2σ_E in 1994 and by 0.6σ_E in 1999, where σ_E is the standard error of the RR estimate in the unidimensional analysis), urban living (0.8σ_E), easily losing one’s temper (1.2σ_E), emphysema (0.7σ_E), and BMI (0.7σ_E) on lung cancer risk. BMI changed the effects of diet (about 1.0σ_E) and lost appetite (0.6σ_E). Smoking, physical activity, social activity, satisfaction with life, overeating, type of medical insurance, and access to medical services (except for Veterans Administration insurance) had effects on lung cancer risk that were independent of mediation by other factors. Certain variables that changed the initial effect of an independent variable (i.e., its effect in the unidimensional analysis) may be causally linked to the effect of that variable; therefore, they can be considered mediators. Detailed analysis of such causal pathways requires further investigation using the theory of statistical mediation (MacKinnon [29]) and will be performed elsewhere.

For other cancers, the confounding/mediation effects were less noticeable. BMI influenced the effects of certain comorbidities (such as circulatory diseases, about 1.0σ_E) and overeating (0.6σ_E) on breast cancer risk, while the effects of vitamins, social activities/hobbies, and contacts with relatives were independent of the effects of other variables. For colon cancer, BMI, being almost an independent factor, mediated the effects of sex (1.4σ_E), alcohol consumption (0.8σ_E), and overeating (0.7σ_E).
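The changes quoted above are expressed in units of the standard error of the RR estimate from the unidimensional analysis. A minimal sketch of that measure follows; the function name and the use of the absolute difference are assumptions made for illustration, and the numbers in the usage note are hypothetical.

```python
def shift_in_sigma(rr_univariate: float, rr_adjusted: float,
                   se_univariate: float) -> float:
    """Change in an RR estimate after adding a second variable.

    The change is measured in units of the standard error of the
    unidimensional RR estimate. Taking the absolute difference is an
    assumption; the text reports magnitudes only.
    """
    if se_univariate <= 0:
        raise ValueError("standard error must be positive")
    return abs(rr_adjusted - rr_univariate) / se_univariate
```

For instance, with a hypothetical unidimensional RR of 2.0 with standard error 0.2, an estimate of 1.56 after adding the second variable corresponds to a change of 2.2 standard errors.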

The two-factor analysis also revealed situations with overlapping effects among variables that are correlated, codominant, and have no temporal precedence, for example, the effects of variables from the HNC group on breast cancer risk. This is due to their mutual correlation and the resulting statistical collinearity, in which the estimated effect of a single predictor cannot be interpreted on its own. In summary, the effects of confounding evaluated in the two-factor analysis do not change the conclusions of the univariate approach but further specify the evaluated associations. This analysis also showed that further progress can be achieved by investigating the combined effect of correlated variables (e.g., those from the same NLTCS group) by constructing an aggregated index.

4. Outlook and Conclusion

In this study, we analyzed how lifestyle factors represented by a number of variables were associated with the incidence of the four most prevalent cancers: lung, prostate, breast, and colon. An overall view of the results of the association analyses allowed us to describe population groups at higher and lower risk of these cancers. Being a smoker was the main characteristic of the elderly population group at higher risk of lung cancer, with comorbidity (e.g., emphysema), lower BMI, and poor functional status each also playing a role. The population at higher risk of colon cancer was characterized by a higher BMI and comorbidity. Elderly women at higher breast cancer risk reported higher occasional activity and intentions to improve their day-to-day surroundings (these relationships could be indirect and require further investigation). The group at higher risk of prostate cancer had lower comorbidity, disability, and functional status (in part, this could be due to underdiagnosis in individuals in poor health).

In this study, many well-recognized associations were confirmed; however, certain fundamental questions about lifestyle effects on cancer incidence remain unclear. Specifically, directions for further investigation could include analyses (i) of comorbidity variables for which inverse associations with prostate cancer were found (e.g., arthritis or diabetes) and (ii) of variables only indirectly related to cancer risk (e.g., housing, neighborhood, or income characteristics), whose associations could potentially be explained in terms of confounding factors. From a biomedical perspective, a potential extension of this study could include NLTCS-Medicare data analyses clarifying (i) how the cancer site-specific associations found here could be affected by racial disparities and (ii) whether the effects of factors and their mediation differ between cancers of reproductive and nonreproductive systems, as well as a more detailed analysis of sex-specific associations and of the potentially different roles of certain factors in mortality and cancer risk in males and females.

Further analysis could address interactions between two or more variables using multifactor analysis, applying the theory of statistical mediation [29], searching for so-called instrumental variables, and constructing quasirandomization using the propensity score approach (reviewed by Faries et al. [30]). For example, the propensity score can be evaluated for each association by treating the respective independent variable as the “exposure” or “treatment” and using all other measured variables as predictors of the “exposure” in a logistic regression. Another line of investigation could focus on searching for latent variables capable of describing the heterogeneity in cancer risk: for example, for lung cancer such a variable could be self-care, psychological condition, or happiness (the latter is also a candidate variable for breast cancer). One formal method capable of identifying latent variables associated with a certain risk is linear latent structure analysis [31, 32], in which a score is identified by statistical methods and analyzed in a basis each component of which is associated with a group of higher or lower risk. An important feature of the method is that it takes into account the mutual correlation between predictors.

The most influential potentially controllable risk factors (i.e., those showing the strongest associations with cancer risk) can be detected using the approach developed in this paper and then examined in deeper analyses, including analyses of other datasets with detailed descriptions of risk factors, for example, analyses of the duration of exposure and the intensity of a risk factor. These approaches could provide steps toward individualized forecasting of cancer risk, potentially resulting in preventive strategies oriented to population groups with specific characteristics, such as those obtained from the indices and the association/confounding findings of this study.


The research reported in this paper was supported by the National Institute on Aging Grants R01AG027019, R01AG032319, and R01AG028259. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health.

Supplementary Materials

Table 1 of Electronic Supplementary Material. List of all variables used and their outcomes in the 1994 and 1999 surveys. Relative risks are given in Table 2 of Electronic Supplementary Material and in Table 2 in the text.

Table 2 of Electronic Supplementary Material. Relative risks (only significant estimates are shown) of incident cancer for specific variables, four cancer sites, and two surveys (1994 and 1999). The values of relative risks correspond to the outcomes in the column ‘outcome’. The three methods used are marked as AA (age-adjusted), CT (time follow-up in Cox model), and CA (age follow-up in Cox model).
