Engagement in cognitively stimulating activities has been considered to maintain or strengthen cognitive skills, thereby minimizing age-related cognitive decline. While the idea that there may be a modifiable behavior that could lower risk for cognitive decline is appealing and potentially empowering for older adults, research findings have not consistently supported the beneficial effects of engaging in cognitively stimulating tasks. Using observational studies of naturalistic cognitive activities, we report a series of mixed effects models that include baseline and change in cognitive activity predicting cognitive outcomes over up to 21 years in four longitudinal studies of aging. Consistent evidence was found for cross-sectional relationships between level of cognitive activity and cognitive test performance. Baseline activity at an earlier age did not, however, predict rate of decline later in life, thus not supporting the concept that engaging in cognitive activity at an earlier point in time increases one's ability to mitigate future age-related cognitive decline. In contrast, change in activity was associated with relative change in cognitive performance. Results therefore suggest that change in cognitive activity from one's previous level has at least a transitory association with cognitive performance measured at the same point in time.

1. Introduction

With the rising proportion of older adults and increases in life expectancy [1], there has been increased interest in maintaining and promoting cognitive health in later life. Although declines in some domains of cognition are part of the natural course of aging [2, 3], sufficient evidence from prospective and observational studies indicates that the trajectories and outcomes of cognitive decline may be mitigated by participating in cognitively stimulating activities [4, 5]. Recent reviews of cognitive interventions suggest some potential benefits that may improve functioning in healthy older adults or slow decline in individuals with mild cognitive impairment (MCI) and those already affected with dementia [6–9]. Results from a meta-analysis of randomized controlled trials in healthy aging revealed a strong positive effect on cognition at immediate, medium-, and long-term follow-up after cognitive training [10]. Compelling results from large longitudinal studies have also shown that engagement in everyday cognitive activities predicts preserved cognition [11, 12] and decreases in incident Alzheimer’s disease [13]. It is therefore not surprising that the market for “brain fitness” technologies, currently valued at $300 million, is projected to swell exponentially within this decade [14].

A facet of this research that is relatively understudied involves examining the degree to which discrete types of everyday cognitive activity relate to change in specific cognitive domains over time. Most of the aforementioned trials incorporated training in multiple cognitive abilities and accordingly found support for cognitive training in general, but some reviews report less promising results for domain-specific training. Memory is frequently the targeted cognitive domain in many interventions, with training involving efforts to improve recall of newly learned information, including skills training, imagery, and mnemonic strategy use; however, meta-analyses reveal minimal efficacy for memory-focused techniques [15]. Investigations of the effectiveness of training in other cognitive abilities such as executive functions [16] and working memory [17] suggest that these have farther-reaching effects on cognitive function, but results are regarded as preliminary. Formal interventions employing cognitive skills training are rarely conducted outside of clinical trials, leaving observational studies as a valuable resource for evaluating potential benefits of everyday cognitive activities. Observational studies typically include self-report inventories of activities commonly regarded as cognitively stimulating, such as solving puzzles, listening to the radio, or reading books. While such studies are not experimental by design, they contribute a significant addition to the literature by assessing changes in accessible, everyday cognitive activities as these relate to change in cognitive abilities in a naturalistic setting.

Despite the promising body of work that has accumulated in recent years, definitive conclusions regarding the benefits of cognitive activity are precluded by several methodological concerns [11, 18]. Limitations include the inadequacy of activity assessments, psychometric variability of cognitive outcome measures, conceptual differences in the expected relationships between activities and specific cognitive domains, and insufficient assessment of moderating variables such as level of education and sex. While a study in which all these limitations are fully addressed has yet to be conducted, existing data from several longitudinal studies can be leveraged to disentangle some of these effects. In this paper, we intend to demonstrate that a coordinated analysis of four large longitudinal studies of aging can elucidate the benefits of changes in cognitive activity over time to performance trajectories in specific cognitive domains.

Thus, the purpose of this study was to examine the effects of self-reported everyday cognitive activities and changes in these activities on changes in four domains of cognition (reasoning, fluency, memory, and semantic knowledge) in longitudinal models that incorporate data from the Origins of Variance in the Oldest-Old: Octogenarian Twins Study (Octo-Twin), the Long Beach Longitudinal Study (LBLS), the Seattle Longitudinal Study (SLS), and the Victoria Longitudinal Study (VLS). This investigation was part of a larger coordinated effort to examine the effects of lifestyle activities on cognitive function across multiple large-scale longitudinal studies of aging that formed the basis of a meeting of the Advanced Psychometrics Methods in Cognitive Aging Workshop. The aim of this workshop was to use a common analytic protocol across studies from the Integrative Analysis of Longitudinal Studies on Aging (IALSA [19]) network.

These studies were specifically selected based on their collection of cognitive, physical, and social activity data along with a range of cognitive functioning measures over multiple occasions. While the cognitive activity and cognitive function variables are not always identical, the subsets of variables in each study were chosen based on the rationale that they tapped similar domains at the construct level; specifically, we chose measures thought to tap fluid reasoning (Gf, i.e., verbal reasoning, block design, and verbal fluency measures), short-term memory (Gsm, i.e., immediate recall of a verbally presented story or word list), and crystallized knowledge (Gc, i.e., measures of vocabulary and acquired knowledge [20]). In some cases the measures are the same, but more often they differ, precluding statistical combination of indicators and outcomes between studies. However, we argue that concurrently analyzing the data with the same method provides opportunities for both strict and conceptual replication within the same study, an approach that to our knowledge has not been attempted previously. Thus, our goal was to build a common model and apply it to each dataset to enable comparisons across all outcomes for the four longitudinal studies. The two primary hypotheses we tested were whether (1) cognitive activity at baseline would predict the trajectory of cognitive function over time and (2) change in cognitive activity would predict change in cognitive function over time.

2. Method

2.1. Multistudy Analysis Overview

We report a series of mixed effects models that included baseline and change in cognitive activity predicting cognitive function over up to 21 years of time in four large-scale longitudinal studies of older adults. Three of the four studies included in this analysis (LBLS, SLS, and VLS) specifically aimed to study healthy aging and only recruited community-dwelling older adults who were presumed to be cognitively normal at baseline. The fourth study, Octo-Twin, also included a largely cognitively normal sample of older adults, but those who did have dementia diagnoses at baseline (n = 98) were excluded from the present analysis. In order to model roughly equivalent cognitive outcomes across the four study samples in this coordinated effort, analyses included selected measures of reasoning, fluency, episodic memory, and semantic knowledge from the larger battery of tests included within each longitudinal study sample. In the Octo-Twin study, there was no fluency measure available, and we thus present only three of the four cognitive outcomes. These cognitive tasks were selected to represent a range of cognitive abilities from basic to more complex functions. Study sample characteristics and demographic, cognitive function, and cognitive activity measures are described below (Tables 1–4).

2.2. Origins of Variance in the Oldest-Old (Octo-Twin) Sample (Sweden)

The Octo-Twin study is based on the oldest cohort of the Swedish Twin Registry and includes 702 participants aged 80 years and older at the time of the first examination. All individuals with a dementia diagnosis at baseline were excluded from the analyses (n = 98). The total sample included 604 participants, of whom 572 had cognitive measures. The longitudinal design for survivors included a maximum of five measurement times at two-year intervals beginning in 1991–1993. The average rate of attrition from one test interval to the next was 20% (10% per year), primarily due to death. Table 1 provides a summary of participant characteristics for the Octo-Twin participants included in this study.

2.3. Octo-Twin Materials and Procedure
2.3.1. Octo-Twin Cognitive Ability Measures

Reasoning was assessed using block design [21], in which participants are presented with red and white blocks and instructed to assemble the blocks to reproduce a design portrayed on a card within a predetermined time limit. As previously mentioned, the Octo-Twin study did not have a measure of fluency so this cognitive domain was not analyzed and compared with fluency results from the three other longitudinal studies. Memory was assessed using the Prose Recall test in which participants were asked for immediate free recall of a brief (100 word) story that had a humorous point [22]. Responses were coded for the amount of information recalled in a manner similar to the scoring of story units in the Wechsler Memory Scale Logical Memory test [23]. Semantic knowledge was assessed using the Swedish version of the WAIS Information Task [24], which requires participants to provide answers to questions assessing acquired knowledge of facts [25].

2.3.2. Octo-Twin Cognitive Activity Measure

The cognitive activity measure was based on self-report of engagement in six cognitively stimulating activities including playing games (e.g., chess and bridge), completing crossword puzzles, reading literature, writing, conducting genealogical research or any other documentation, and studies or other mentally demanding activity (e.g., handicraft), each rated dichotomously as “no” (0) or “yes” (1). Participants were also asked if they “train their memory or keep their mind active” rated as “no” (0), “yes, to a certain degree” (1), or “yes, definitely” (2). A composite score for cognitive activity was created by summing responses across items (range = 0–8). Change in cognitive activity was computed by subtracting the cognitive activity score at baseline from all subsequent activity scores.
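The scoring rule described above can be illustrated with a minimal Python sketch. It is not the authors' scoring code; the item names are hypothetical placeholders, and the only facts taken from the text are the six dichotomous items, the single 0–2 item, the 0–8 range, and the subtraction of the baseline score from later scores.

```python
from typing import Dict, List

# Hypothetical item names (assumed for illustration, not from the study codebook).
YES_NO_ITEMS = [
    "games", "crosswords", "reading_literature",
    "writing", "genealogy_or_documentation", "studies_or_other",
]
GRADED_ITEM = "keeps_mind_active"  # 0 = no, 1 = to a certain degree, 2 = definitely


def composite_score(responses: Dict[str, int]) -> int:
    """Sum six yes/no items (0/1) and one graded item (0-2); range 0-8."""
    total = 0
    for item in YES_NO_ITEMS:
        if responses[item] not in (0, 1):
            raise ValueError(f"{item} must be 0 or 1")
        total += responses[item]
    if responses[GRADED_ITEM] not in (0, 1, 2):
        raise ValueError(f"{GRADED_ITEM} must be 0, 1, or 2")
    return total + responses[GRADED_ITEM]


def change_from_baseline(scores: List[int]) -> List[int]:
    """Subtract the baseline (first) score from every subsequent score."""
    baseline = scores[0]
    return [score - baseline for score in scores[1:]]
```

For a participant scoring 5, 4, and 6 at three successive visits, change_from_baseline returns -1 and 1 for the two follow-up visits.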

2.4. Long Beach Longitudinal Study Sample (California, USA)

The LBLS was started in 1978 in Long Beach, California, with participants recruited from the Family Health Plan Health Maintenance Organization (HMO) who were primarily from Long Beach and Orange Counties. Panel 1 included 583 individuals aged 28–36 or 55–87. The ethnic composition of the older group (98% Caucasian) was similar to the 65+ population for the area based on the 1970 census. Panel 2, initiated in 1992, included 633 individuals contacted from the same HMO (64 were excluded due to frank dementia, serious sensory, or neurological problems). In order to include the same measures as those in the Seattle Longitudinal Study, LBLS Panel 1 (n = 106) and Panel 2 (n = 631) data from 1994 to 2003 were used in the current analysis. During this period, data were collected at 3-year intervals. Only visits occurring at age 55 or older were included in this study (baseline n = 561). Dementia incidence is not known.

Demographic information and descriptive statistics for the sample are presented in Table 2. The table displays the number of participants at each test occasion that completed the four cognitive measures and the retention rates for those measures from one testing to the next (top line). Collapsed across measures and testing, the average retention rate from one testing to the next was 55.8% or 17% per year. Age and education increased from the first to the fourth test occasions, suggesting that the sample became more selective over time. Similar patterns of selection were observed for the cognitive measures of reasoning, memory, fluency, and semantic knowledge.

2.5. LBLS Materials and Procedure
2.5.1. LBLS Cognitive Ability Measures

Reasoning was indexed as a composite score of the Schaie-Thurstone Adult Mental Abilities Test (STAMAT [26]) Letter and Word Series tests. In Letter Series, participants viewed a series of letters (e.g., a b c c b a d e f f) and were asked to discover the rule that governs the series by identifying the letter from an array of four possible responses that should come next in the series. Participants were to complete as many of the 30 items as possible within six minutes. Word Series was a parallel test to Letter Series but the letters were replaced with months (e.g., January) and days of the week (e.g., Monday). Fluency was measured using Word Fluency, in which participants were instructed to write down as many words as possible in five minutes that begin with a specified letter “s.” Participants were instructed that they could not use proper nouns or create words by changing endings of other listed words (e.g., if the letter was “w” and you already said “want,” you should not also say “wants,” “wanting,” or “wanted”). Memory was measured using immediate written recall of a list of 20 concrete high-frequency nouns studied for 3.5 minutes. Semantic knowledge was assessed using the STAMAT Recognition Vocabulary test. Participants were given a word and asked to circle a synonym of that word from four possible alternatives. The test included 50 items completed within a 5 minute time limit.

2.5.2. LBLS Cognitive Activity Measure

The cognitive activity measure was derived from a modified version of the Life Complexity Scale (LCS), originally developed for the Seattle Longitudinal Study [27]. The modified scale consisted of six items from the LCS: educational activities, leisure reading, playing musical instruments, writing letters, playing games, and cultural activities. Participants were asked to record the number of “hours per week on average” they spent doing each activity. Due to extreme variability in reported hours observed within and between items, responses were dichotomized for the present analysis with those who reported no time spent on a given activity coded as 0 and those who reported one or more hours of activity coded as 1. Items were summed to create a composite measure of cognitive activity (range = 0–6). Change in cognitive activity was computed by subtracting the cognitive activity score at baseline from all subsequent activity scores.
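As a hedged illustration of the dichotomize-then-sum rule just described, the following Python sketch codes each of the six items as 0 or 1 and adds them. The item names are hypothetical. The text does not say how reports of less than one hour per week were handled, so the sketch rejects such values rather than guessing.

```python
from typing import Dict

# Hypothetical item names for the six modified LCS items.
LCS_ITEMS = (
    "educational_activities", "leisure_reading", "musical_instruments",
    "writing_letters", "playing_games", "cultural_activities",
)


def dichotomize_hours(hours: float) -> int:
    """Code 0 for no reported time and 1 for one or more hours per week."""
    if hours == 0:
        return 0
    if hours >= 1:
        return 1
    # Assumption: fractional (or negative) reports are not covered by the
    # description in the text, so they are flagged instead of being coded.
    raise ValueError("coding of values below one hour is not specified")


def lbls_composite(hours_per_week: Dict[str, float]) -> int:
    """Sum the six dichotomized items; range 0-6."""
    return sum(dichotomize_hours(hours_per_week[item]) for item in LCS_ITEMS)
```

Dichotomizing discards the amount of time reported, which is the trade-off the authors accepted in exchange for removing the extreme variability in reported hours.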

2.6. Seattle Longitudinal Study Sample (Washington, USA)

The SLS was initiated in 1956 in Seattle, Washington, and includes eight samples recruited from a local HMO at seven-year intervals and followed longitudinally every seven years (total 𝑛 across all study samples = 4,854). The current analysis includes data from participants in the study from 1984 to 2005 (total 𝑛 across 1984–2005 study samples = 2,040) and includes longitudinal data for up to four testing occasions. This subset of the larger study was selected due to changes in measures used over the course of the entire study in order to have equivalent measures of cognition and activity at each time point and with the LBLS. Only visits occurring at age 55 or older were included in our analyses, yielding a total of 1,649 participants at baseline. Baseline was defined as each participant’s first study visit, and time was measured in all analyses as years in study (coded as 0, 7, 14, and 21). Attrition during these 7-year intervals was approximately 50%, or about 7% per year. Dementia prevalence and incidence are not known. See Table 3 for SLS participant characteristics over the four waves of data analyzed here.

2.7. SLS Materials and Procedure
2.7.1. SLS Cognitive Measures

Reasoning was assessed with the Word Series test from the Schaie-Thurstone Adult Mental Abilities Test (STAMAT [26]), in which participants were provided with a printed word series and instructed to choose the next word in the series in multiple-choice format by identifying the rule that governed a series. The test consisted of 30 items, and total score was based on number of correct responses completed in 6 minutes. Fluency was assessed with the Word Fluency test from the Primary Mental Abilities test [28], in which participants were asked to write down words beginning with the letter “s” following a rule set (do not use proper nouns and do not use different conjugations of the same word). Total score was based on number of correct responses generated in 5 minutes. Memory was assessed with a task in which participants were asked to study a list of 20 printed words for 3.5 minutes and provide immediate written recall of the items. Semantic knowledge was assessed with the Educational Testing Service (ETS) test of Advanced Vocabulary, in which participants were asked to identify synonyms for printed words from 5 choices [29]. Total score was based on number of correctly identified synonyms out of 36 test items completed within 4 minutes.

2.7.2. SLS Cognitive Activity Measure

The cognitive activity measure was derived by summing dichotomized test responses to five cognitive activity items (reading, educational activities, music, writing, and cultural activities) from a modified version of the Life Complexity Scale [27]. Cognitive activity change was computed by subtracting baseline activity from each follow-up activity measure.

2.8. Victoria Longitudinal Study Sample (British Columbia, Canada)

The VLS began in 1986 in Victoria, British Columbia, and consists of three cohorts started in 1986, 1992, and 2001, respectively, followed longitudinally at 3-year intervals. Longitudinal data used in this study were from Samples 1 (baseline n = 484) and 2 (baseline n = 530). For this investigation, data from seven waves of Sample 1 and five waves of Sample 2 were included in analyses. Approximately 25% of the sample was lost to follow-up at each wave, or 8% per year. Dementia prevalence and incidence are not known. Relevant demographic information regarding the study sample is provided in Table 4.

2.9. VLS Materials and Procedure
2.9.1. VLS Cognitive Ability Measures

Reasoning was indexed by Letter Series [30] in which participants were presented with a series of letters and asked to identify the next letter in the sequence that was consistent with the sequence rule. Fluency was measured by performance on a similarities task [30]. In this timed task, participants were presented with target words and asked to write as many words as possible with the same or nearly the same meaning within 6 minutes. Memory was indexed using a 30-item noun list learning task comprised of five semantic categories. Participants studied the word list for 2 minutes followed by a 5-minute free recall task [30]. Semantic knowledge was assessed using a 54-item recognition vocabulary test. This task was adapted from the ETS Kit of Factor Referenced Tests [29].

2.9.2. VLS Cognitive Activity Measure

The cognitive activity measure included a subset of items from the VLS Activity Lifestyle Questionnaire [3]. The 27 items comprising the Novel Information Processing scale were selected due to the cognitively stimulating nature of the activities. For each item, participants indicated the frequency of engagement in that activity over the past two years on a scale from 0 to 9 (i.e., never, less than once a year, about once a year, 2 or 3 times a year, about once a month, 2 or 3 times a month, about once a week, 2 or 3 times a week, and daily). Individual item distributions were reviewed, and 11 of the 27 original items with little to no variability were eliminated. The remaining items were a priori hypothesized to fall into three general types of activities: those involving what we termed “Communication,” “Computations,” or “Conundrums.” Confirmatory factor analysis using Mplus version 6.0 [31] was conducted to test a three-factor model including six items indexing Communication (enrolling in college courses, giving a talk, attending lectures, studying a second language, writing, and writing letters specifically), five items indexing Computation (balancing a check book, performing mathematical calculations, working on taxes, engaging in business activity, and using a calculator), and five items indexing Conundrums (engaging in crosswords, chess/checkers, knowledge games, word games/scrabble, and jigsaw puzzles). Fit criteria were the comparative fit index (CFI) and the root mean squared error of approximation (RMSEA), where criteria for excellent fit include CFI > 0.95 and RMSEA < 0.05 [32]. Allowing for within-factor residual correlations, the model demonstrated acceptable fit (CFI = 0.95, RMSEA = 0.04). The factor scores generated by this analysis were then used as the primary predictor variables in three separate mixed effects models.

2.10. General Analytic Approach

The current analysis was conducted as part of a larger effort to examine the effects of lifestyle activities on cognitive function using the same analytic approach across studies from the Integrative Analysis of Longitudinal Studies on Aging (IALSA) network [19], and models were selected in part to maintain consistency across lifestyle activities. Across all four studies, we examined common demographic covariates including age (in years), years of formal education, and sex (coded as 0 = male, 1 = female). Age and education were mean centered to their respective study’s baseline mean value. In order to maximize use of all available data, we defined baseline as the first study visit for each participant with available cognitive activity data. We analyzed the data with mixed effects modeling using Stata software, version 12 (StataCorp, 2011) and restricted maximum likelihood (REML) estimation, random slopes and intercepts, and an unstructured covariance matrix. In the Octo-Twin study, participants were nested within their twin pair. In the VLS, we controlled for enrolment cohort. Model assumptions were verified by examining residuals computed using predicted values that included the random effects. Separate models were fit for each of the four cognitive measures. We defined the criterion for significance as P < 0.05. While we recognize that this criterion may be viewed as liberal, given the large number of comparisons across all statistical models in our analysis, we assert that this approach is warranted in this study as we are presenting results from four independent longitudinal studies following similar statistical procedures for each. Thus, the emphasis in this paper is replication of the pattern of results across the studies. In this way, the “strictness” of the evaluation of the effects comes from noting whether a particular effect is replicated across the different samples. The “familywise” alpha rate is that which occurs within each study, not across all of them together.
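The covariate preparation described in this section (centering age and education at the study's baseline mean and measuring time as years since each participant's first visit) can be sketched in Python as follows. This is an illustrative sketch only; the analyses themselves were run in Stata, and the function names here are hypothetical.

```python
from statistics import mean
from typing import List


def center_at_baseline_mean(baseline_values: List[float]) -> List[float]:
    """Subtract the study's baseline mean from each participant's baseline value."""
    study_mean = mean(baseline_values)
    return [value - study_mean for value in baseline_values]


def years_since_baseline(visit_years: List[int]) -> List[int]:
    """Express one participant's visits as time elapsed since the first visit."""
    first_visit = visit_years[0]
    return [year - first_visit for year in visit_years]
```

After centering, the model intercept refers to a participant of average baseline age and education in that study, and a time value of 0 marks each participant's own baseline visit.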

3. Results

An initial 19-term model included the following terms: (1) baseline age, (2) sex, (3) education, (4) baseline activity, (5) baseline activity × age, (6) baseline activity × sex, (7) baseline activity × education, (8) individually defined time since baseline, (9) time × baseline age, (10) time × sex, (11) time × education, (12) time × baseline activity, (13) time × baseline activity × baseline age, (14) time × baseline activity × sex, (15) time × baseline activity × education, (16) change in activity from baseline (activity change), (17) activity change × baseline age, (18) activity change × sex, and (19) activity change × education. This full model was evaluated in each study data set independently, and terms that were not significant in any of the four studies were dropped in order to present a parsimonious set of results that retained the fullest set of parameters found in any study. This process eliminated 7 of the 19 terms, including all 3-way interactions, and four of the 2-way interactions, including the interactions between activity change and age, sex, or education, as well as the interaction between baseline activity level and sex. This resulted in a final model that included 12 terms summarized in Table 5. As a proper meta-analytic summary would require identical measures across a larger number of studies, we rely on straightforward comparison of the conclusions derived from each study.
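For reference, the final 12-term model can be written out as a single equation. The following is a reconstruction from the term list above and the random-effects structure described in Section 2.10, not an equation given by the authors; all symbols are assumed notation. Here y_{ti} is the cognitive outcome for person i at occasion t, Act0_i is baseline activity, ΔAct_{ti} is activity change from baseline, and u_{0i} and u_{1i} are the random intercept and random slope with an unstructured covariance matrix.

```latex
\begin{aligned}
y_{ti} ={} & \beta_0
  + \beta_1\,\mathrm{Age}_i
  + \beta_2\,\mathrm{Sex}_i
  + \beta_3\,\mathrm{Edu}_i
  + \beta_4\,\mathrm{Act0}_i \\
& + \beta_5\,(\mathrm{Act0}_i \times \mathrm{Age}_i)
  + \beta_6\,(\mathrm{Act0}_i \times \mathrm{Edu}_i) \\
& + \beta_7\,\mathrm{Time}_{ti}
  + \beta_8\,(\mathrm{Time}_{ti} \times \mathrm{Age}_i)
  + \beta_9\,(\mathrm{Time}_{ti} \times \mathrm{Sex}_i)
  + \beta_{10}\,(\mathrm{Time}_{ti} \times \mathrm{Edu}_i) \\
& + \beta_{11}\,(\mathrm{Time}_{ti} \times \mathrm{Act0}_i)
  + \beta_{12}\,\Delta\mathrm{Act}_{ti}
  + u_{0i} + u_{1i}\,\mathrm{Time}_{ti} + e_{ti}
\end{aligned}
```

In this notation, the first hypothesis concerns β_11 (baseline activity predicting the rate of change) and the second concerns β_12 (activity change tracking with cognitive performance at the same occasion). Study-specific additions, such as nesting within twin pair in Octo-Twin and the enrolment cohort covariate in the VLS, are omitted from this sketch.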

3.1. Baseline Covariates and Cross-Sectional Relationships

There was a significant relationship between self-reported cognitive activity at baseline and baseline performance on tests of cognitive abilities across all measures and studies but the LBLS, which did not find this relationship in the reasoning and memory models. Overall, these findings suggest that participants who were more cognitively active at baseline tended to have better cognitive performance. One of the studies (VLS) included three distinct measures of cognitive activity, those involving Communication (e.g., writing), Computations (e.g., managing finances), and Conundrums (e.g., completing crossword puzzles), enabling us to determine if specific cognitive activities were differentially related to the cognitive outcomes. While all three types of cognitive activities showed significant cross-sectional relationships with cognitive outcomes (all P < 0.001), the strongest relationships with cognitive function were found for Conundrums, followed by Computation and Communication.

Older age was associated with lower baseline performance across all studies on measures of reasoning, fluency, and memory. In contrast, the relationship between age and baseline performance on semantic knowledge measures was inconsistent, with LBLS and Octo-Twin results suggesting lower performance in older age, SLS showing no age differences, and VLS suggesting that older age was associated with better performance. Baseline associations between sex and cognitive performance showed a consistent relationship across studies for all memory outcomes, with women consistently performing higher than similar aged men. SLS and LBLS women additionally performed higher than men on reasoning and fluency measures. Across other cognitive outcomes, baseline associations between performance and sex were less consistent, with VLS women performing better on fluency in the Computations model and Octo-Twin women performing lower than men on semantic knowledge. Higher education was consistently associated with higher baseline cognitive performance across all studies and cognitive outcomes.

Two baseline covariate interaction terms were retained in the final model, and both showed inconsistent relationships across studies and outcome measures: the age by baseline cognitive activity interaction term was significant in the VLS memory models and the Conundrums/semantic knowledge model. There was a similarly significant interaction between baseline age and activity level in the LBLS reasoning model (P < 0.05). The education by baseline cognitive activity interaction term in the VLS Communication models for reasoning, fluency, and semantic knowledge was significant, suggesting that those with lower education had a higher association between baseline activity and cognitive test performance. The VLS Computation and Octo-Twin models for semantic knowledge also showed this relationship.

3.2. Longitudinal Relationships

Across all studies and cognitive outcomes, there was, with one exception (VLS Computation with reasoning), no evidence for baseline level of cognitive activity predicting change in cognitive outcomes over time. There was, however, a consistent positive relationship between change in cognitive activity from baseline and within-person variability in cognitive outcomes across nearly all cognitive outcomes in all four studies. Specifically, after accounting for the expected linear within-person trajectories, variation in cognitive activity was significantly related to variation in performance on all measures in all studies except reasoning and fluency in LBLS and, for Conundrums only, reasoning and memory in VLS.

Within-person declines were seen over time across all studies and all cognitive outcomes except LBLS fluency. Older participants declined faster than younger participants on all VLS, SLS, and LBLS cognitive outcome measures except LBLS memory. Evidence for differential decline in older participants was not seen in Octo-Twin, which has a much narrower age range. Women declined less than men on fluency measures in the SLS and VLS Computations models and on semantic knowledge measures in the SLS and Octo-Twin study. Level of education was not a significant predictor of rate of cognitive decline, with the exception of one outcome measure in one study (LBLS reasoning, coefficient = −0.06, P < 0.01).

4. Discussion

Our results provide compelling evidence across four longitudinal studies that changes in everyday cognitive activity level track with variation in multiple aspects of cognitive function. In three of the four studies (Octo-Twin, SLS, and VLS), participants reported engaging in fewer cognitive activities over time. In the fourth study (LBLS), participants endorsed a slight increase in average number of cognitive activities over time, which was likely due to differential retention of higher functioning individuals. While the overall trend was for participants to report slightly less cognitive activity at each follow-up visit in all but the LBLS sample, there was considerable variability in activity change scores, with some participants in each study reporting increased cognitive activity at follow-up visits relative to their baseline levels.

These results suggest that there is an increased risk of cognitive decline for individuals whose engagement in cognitive activities decreases over time relative to their baseline levels, and, conversely, the results suggest that increases in cognitive activity from baseline are associated with better than expected cognitive performance. Cognitive activity change appeared to most consistently track with variation in semantic knowledge, as the activity change term was significant in all six models. Strong evidence of activity change tracking with fluctuations in memory and fluency was also indicated, as the activity change term was significant in five of the six memory models and in four of the five fluency models. Activity change was significantly related to variation in reasoning in four of the six models, making the models with reasoning outcomes the least consistent relative to models with the other cognitive outcomes. That two of the four inconsistent findings occurred in LBLS, which had the most similarity with SLS, in terms of both measures and sampling, suggests that some other factor, such as attrition, may be responsible for these differences. It is interesting to note, however, that the standard deviation of reported activity level did not differ from that of SLS. The lack of association with reasoning and memory for one of the three VLS activity variables (Conundrums) could be due to chance, although this finding may also suggest that changes in level of engagement on tasks involving problem solving are less related to changes in reasoning and memory function than they are to changes in fluency and semantic knowledge.

Across studies, with the exception of VLS Computation with reasoning, there were no significant relationships between baseline cognitive activity and change in cognition over time, suggesting that level of cognitive activity at an earlier point in time is not related to subsequent cognitive decline. Thus, these results do not demonstrate that level of engagement in cognitively stimulating activities earlier in older adulthood can somehow increase one’s cognitive reserve or ability to maintain cognitive function in spite of age-related brain changes [33]. Nonetheless, our results do have important clinical implications in that they suggest that individuals who exhibit changes from a previous level of cognitive activity can be expected to have associated fluctuations in cognitive performance, or vice versa.

In terms of cross-sectional relationships, all studies provide evidence for activity/cognition relationships. The VLS results allow us to conclude that level of engagement in cognitive activities involving what we termed “Conundrums” (e.g., playing chess, completing crossword puzzles) is most strongly and consistently related to concurrent function across cognitive domains, but evidence for relationships with engagement in activities involving Computations (e.g., balancing a checkbook) and Communication (e.g., writing letters) was also demonstrated. Thus, while the data do not provide particularly compelling evidence that engagement in one type of cognitively stimulating activity is preferable, activities involving novel information processing appear to be most related to concurrent cognitive function, a finding that is consistent with the extant literature [34].

The lack of evidence for cognitive activity level at baseline predicting cognitive decline over time in some respects may be interpreted as discouraging, as it implies that older adults who more frequently engage in cognitive activities may not be influencing the trajectory of their cognitive function in the coming years. However, across all studies, change in level of cognitive activity from baseline generally followed a normal distribution, with considerable portions of each sample reporting an increase in level of cognitive activity from baseline levels. The positive association between cognitive activity change and the cognitive outcomes across studies thus suggests that individuals who increase their cognitive activities may be effectively reducing age-related cognitive decline.

Our results demonstrate that older age is associated with faster decline, which supports the overall validity of our approach and suggests that we are detecting relevant change. The finding that education was not predictive of rate of cognitive decline, with one exception (the LBLS reasoning model), suggests that education is neither protective against nor predictive of faster decline in normal aging. These multistudy results build upon findings from a recent paper using data from one of the studies (VLS) included in the current paper [35], in which the authors conclude that the relationship between education and cognitive performance is merely a cross-sectional relationship between level of education and cognitive function, and that longitudinal models that covary for baseline cognitive function are in effect creating a statistical artifact that is seen as an effect of education on rate of decline [35]. However, it is important to note that all studies included in the current analysis were designed to characterize normal cognitive aging, and results are not directly comparable to studies examining the effect of education level or cognitive reserve on the incidence and rate of decline in Alzheimer’s disease [33].

The current study has many strengths, including the large sample sizes and multinational representation in our study samples, which improve the generalizability of the findings. In addition, the inclusion of four separate studies with unique sample characteristics, different recruitment methodologies, different methods for measuring cognitive activity and cognitive function, and differing frequency and length of follow-up serves to minimize the likelihood that these findings are spurious. When results across such a coordinated analysis are inconsistent, however, any one of these differences between studies could be responsible for the discrepancies, which is a limitation of the design. For example, the inconsistencies in the relationships between baseline covariates and their interactions (e.g., sex and age with baseline activity level) highlight a weakness of our study design. Inconsistencies could also be attributable to the heterogeneity in the activity measures used across the four longitudinal studies, as the scales included different items with different response ratings, yielding restricted ranges of responses on some measures. It is also possible that the inconsistencies are due to differences in the cognitive outcomes used in the different studies, or to any number of other methodological differences across studies. It is important to note, however, that when the model results demonstrate consistent patterns across studies despite variations in methodology, the heterogeneity of measures and sampling methods becomes a major strength of the multistudy approach, because conclusions drawn from the results are more reliable than those from the typical single-study design.

Perhaps the most obvious limitation inherent in the observational design of all studies included in this investigation is that causality cannot be inferred from these results. Specifically, while an increase in cognitive activity from baseline was associated with better than expected cognitive performance, and, conversely, activity decrease was associated with worse than expected performance, it is not possible to conclude that change in activity level was the cause of change in rate of cognitive decline. An alternative explanation is that the decreases in level of cognitive activity from baseline observed in this study result from deteriorating cognitive function rather than cause it. Put simply, this study design does not answer whether completing crossword puzzles reduces one’s risk of cognitive decline or whether cognitive decline reduces the likelihood that one will complete crossword puzzles. In addition, this study does not address the protective effects of cognitive activity for incident dementia or Alzheimer’s disease. While there is a large body of literature examining the beneficial effects of cognitive activity in reducing dementia risk (e.g., [36]), the studies included in the current investigation were based on normal cognitive aging, and individuals with dementia diagnoses were excluded from the present analysis.

What these results impart, however, is that regardless of the causal mechanisms underlying these changes, the associations between cognitive activity and cognitive outcomes in this study are in directions that are intuitively and scientifically consistent with prior literature. This fact, coupled with the large-scale naturalistic, observational design of this study, lends credence to the burgeoning literature that directly examines the causal effect of cognitive activity on cognitive outcomes. Extension of this work in populations at great risk for dementia, or with individuals already diagnosed with neurodegenerative diseases, remains a worthwhile goal.


The research was supported in part by the Integrative Analysis of Longitudinal Studies of Aging (IALSA) research network (NIA AG026453, S. M. Hofer and A. M. Piccinin, PIs) and the Conference on Advanced Psychometric Methods in Cognitive Aging Research (NIA R13AG030995, D. M. Mungas, PI). Dr. Mitchell is supported by a VA Advanced Fellowship in Geriatrics through the New England Geriatric Research Education and Clinical Center (GRECC). Dr. L. E. Gibbons was supported by a grant from the NIH (AG05136, Murray Raskind, PI). Drs. A. Atri, S. D. Shirk, and M. B. Mitchell were supported by NIA Grant AG027171 (A. Atri, PI). Dr. M. Lindwall was supported by the Swedish National Centre for Research in Sports (CIF). The LBLS was funded by NIA Grants AG10569 and AG00037 (E. M. Zelinski, PI). The Octo-Twin study was funded by NIA AG08861 (B. Johansson, PI). SLS was funded by the National Institute of Child Health and Human Development (HD00367, 1963–1965; HD04476, 1970–1973) and the National Institute on Aging (AG00480, 1973–1979; AG03544, 1982–1986; AG04470, 1984–1989; AG08055, 1980–2006; AG027759, 2006–2008; currently AG024102, 2005–2015; S. L. Willis, PI). The VLS is currently funded by NIA Grant AG008235 (R. A. Dixon, PI). Tina L. Huang and Nadar Fallah participated in early work on the VLS portion of the paper. Finally, and most importantly, the authors express their deep gratitude for the commitment of the study participants across all four studies, without whose generous contribution and dedication this research would not be possible. The contents of this study do not represent the views of the Department of Veterans Affairs or the United States Government. The authors report no conflicts of interest with the present study. M. B. Mitchell, C. R. Cimino, and A. Benitez shared first authorship.