Abstract

Background. Health literacy, the set of skills for locating, understanding, and using health-related information, is associated with various health outcomes through health behaviors and health care service use. While health literacy has great potential for addressing health disparities stemming from the differing educational attainment in diverse populations, knowledge about subpopulations that share the same risk factors is useful. Objective. This study employed a logistic regression tree algorithm to identify subpopulations at risk of limited health literacy in Canadian adults. Design. The nationally representative data were derived from the International Adult Literacy and Skills Survey (n = 20,059). The logistic regression tree algorithm splits the samples into subgroups and fits logistic regressions. Results. Results showed that the subpopulation comprised of individuals 56 years and older, with household income less than $50,000, no participation in adult education programs, and lack of reading activities (i.e., newspaper, books) was at the greatest risk (82%) of limited health literacy. Other identified subgroups were displayed in an easily interpreted tree diagram. Conclusions. Identified subpopulations organized in tree diagrams according to the risk of limited health literacy inform not only intervention programs targeting unique subpopulations but also future health literacy research.

1. Identifying At-Risk Subpopulations of Canadians with Limited Health Literacy

Health literacy (HL) describes the set of abilities to locate, understand, and use health-related information to make better health decisions throughout one’s life [1]. Along with rapidly advancing medical/health sciences and changing health care systems, being able to obtain timely health-related information is critical for health maintenance/promotion [2]. Limited HL is prevalent in many adult populations and is directly and indirectly associated with lack of knowledge about health/diseases, miscommunication in healthcare settings, and adverse health behaviors (e.g., infrequent preventive healthcare service use and sedentary lifestyle) and, in turn, poorer health outcomes [2–4]. Nearly half of the adult populations in developed nations like the US and Canada have insufficient HL skills [5, 6]. Improving HL is one strategy suggested for addressing health disparities [7].

The Rootman and Ronson HL conceptual model [8] depicts predictors of HL and theoretical pathways between HL and health. In this model, community-/societal-level (e.g., community development and public policy) and individual-level (e.g., aging and primary language) HL determinants are called “Actions” and “Determinants,” respectively. HL may directly influence one’s health, for instance, due to misunderstanding/misuse of medications [9]. At the same time, HL may influence health through various well-known health determinants including healthcare service use, health and disease management behaviors (e.g., smoking), economic resource access, and living environments [10–14]. Additionally, previous studies found that social factors, such as access to quality education early in life and continuing opportunities for literacy improvement (e.g., work-related literacy tasks), and health literacy proficiency are interrelated [15, 16]. In short, the Rootman and Ronson HL conceptual model suggests complex associations of demographic, socioeconomic, and geographic factors with health and, in turn, emphasizes the diversity of limited HL risk factors as well as theoretical pathways to health.

The health disparities research model suggests that identification of populations at risk of adverse health should be the first step in public health research [17]. Generally, public health intervention programs can be categorized as either a homogeneous intervention program for a large target population (i.e., population approach) or as a tailored program for each individual at risk (i.e., individual-at-high-risk approach) [18, 19]. However, a review of recent public health interventions suggests that neither approach has yet made a significant impact on health disparities [20, 21]. Arguably, the lack of attention to intrinsic variability within populations in the population approach and the resource and labor intensiveness of the individual-at-high-risk approach partially explain some unsuccessful interventions. A reasonable compromise between the population and individual intervention targeting options would be to target at-risk subpopulations identified by a series of risk factors.

Guided by the Rootman and Ronson HL conceptual model (2005) and the health disparities research framework [17], this study identifies subpopulations at risk of limited HL in Canadian adults using the International Adult Literacy and Skills Survey (IALSS) data (described later). Specifically, the following research question is addressed: what characteristics define subpopulations at high risk of limited HL in the Canadian adult population? While a combination of bivariate analyses may identify some reasonable subpopulations, more insight is likely to be provided by simultaneously considering a large collection of potentially relevant predictors of limited HL.

2. Methods

2.1. Data

The data come from the International Adult Literacy and Skills Survey (IALSS). IALSS data include a nationally representative sample of Canadian adults (n = 20,059) and provide measures of literacy skills; the data also include information regarding demographic, socioeconomic, lifestyle, and health conditions [22]. The definition of each literacy domain is described elsewhere [23]. The HL measure is derived from a subset of health-related prose literacy, document literacy, and numeracy test items. The score ranges from zero to 500. The respondents were categorized according to their HL scores into 5 levels (level 1 (very poor skills) = 0–225; level 2 (poor skills) = 226–275; level 3 (adequate skills) = 276–325; level 4 (strong skills) = 326–375; level 5 (strongest skills) = 376–500) [6, 24]. The literacy measurement tool was jointly developed by the Educational Testing Service (ETS) in the US and Statistics Canada and is comparable to other major national-level adult literacy assessments such as the National Assessment of Adult Literacy (NAAL) and the International Adult Literacy Survey (IALS), assessment tools also developed by the same organization (ETS) [25].

2.2. Outcome Variable

The outcome variable is a dichotomous variable coded as (1) if individuals have the lowest category of HL, limited HL (level 1) as defined in the IALSS, and (0) if individuals have basic, intermediate, or proficient HL skills (levels 2–5). We believe this emphasis is appropriate, assuming that limited HL translates into higher risk of adverse health outcomes. In our preliminary analyses (results not shown), which compared the respondents with limited HL (level 1) to those at each of the other levels separately, those with limited HL had significantly lower scores on all widely used physical health indicators and most mental health indicators (i.e., SF-12; 26) than those with higher-level HL skills; the exceptions were two emotional problems indicators (i.e., feeling depressed or anxious; did not do work or other activity as carefully as usual). In this regard, focusing on level 1 as the highest risk group is empirically supported by a variety of health indicators.
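The coding of the outcome can be illustrated with a minimal sketch that maps an HL score to the five levels given in Section 2.1 and then to the dichotomous outcome. The function names are hypothetical, and assigning non-integer scores between two published ranges (e.g., 225.5) to the higher level is an assumption made here, not a documented IALSS rule.

```python
# Upper bound of each HL level, taken from the cut points in Section 2.1.
LEVEL_UPPER_BOUNDS = [(1, 225), (2, 275), (3, 325), (4, 375), (5, 500)]


def hl_level(score):
    """Return the HL level (1-5) for a score on the 0-500 scale.

    Assumption: a score is placed in the first level whose upper bound
    it does not exceed, so 225.5 would fall in level 2.
    """
    if not 0 <= score <= 500:
        raise ValueError("HL score must lie between 0 and 500")
    for level, upper in LEVEL_UPPER_BOUNDS:
        if score <= upper:
            return level


def limited_hl(score):
    """Dichotomous outcome: 1 for level 1 (limited HL), 0 for levels 2-5."""
    return 1 if hl_level(score) == 1 else 0


# Illustrative scores only; these are not IALSS records.
example_scores = [180, 225, 226, 300, 410]
example_outcomes = [limited_hl(s) for s in example_scores]
```

Keeping the level assignment separate from the dichotomization makes it easy to rerun an analysis with a different cut point, for example levels 1 and 2 combined.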

2.3. Predictor Variables

The model includes predictor variables that reflect “determinants” factors from the Rootman and Ronson model (2005). We focus on the “determinants” factors due to unavailability of “actions” or community/society-level factors in the IALSS. In this model, the determinants include “Education,” “Early child development,” “Aging,” “Personal capacity,” “Living/working condition,” “Gender,” and “Culture.” Although the variables from the IALSS may not always reflect each component, this study preserves the terminologies in the Rootman and Ronson model.

Education. College education is an indicator variable recording whether the respondents completed a college degree. Adult education records whether the respondents attended any postsecondary education or training including any educational/training program, course, private lessons, workshops, and so forth in the last 12 months. The following three variables are used to reflect the respondents’ everyday literacy-related activities. Reading Newspapers records whether a respondent reads newspapers at least once a week. Watching TV records the hours per day a respondent watches TV, video, or DVD: categories (1) 1 hour or less, (2) over 1 hour to 2 hours, (3) more than 2 but less than 5 hours, and (4) 5 or more hours. Reading Books records whether a respondent reads books at least once a week.

Early Child Development. English as first language learned records whether the first language a respondent learned was English. English as a primary language records whether the primary language spoken at home was English when a respondent grew up.

Aging. Age group records the six age groups (age 16–25; 26–35; 36–45; 46–55; 56–65; 66 and older) according to the respondents’ age at the time of the survey in 2003. Age in years was not available for all IALSS respondents.

Personal Capacity. Self-rated health records whether a respondent reported his/her health is (5) excellent, (4) very good, (3) good, (2) fair, or (1) poor. While health status is not a common indicator of personal capacity including literacy skills, several studies found that health status indicators can serve as a surrogate measure of chronic conditions (e.g., diabetes and disease-caused visual impairment) that are negatively associated with one’s HL [26, 27]. Emotional problem records whether the respondent reported any emotional problems (e.g., depressed feeling and feeling anxious) during the past 4 weeks. Life satisfaction records whether a respondent reported his/her life satisfaction as (5) extremely satisfied, (4) satisfied, (3) neither satisfied nor unsatisfied, (2) unsatisfied, or (1) extremely unsatisfied.

Living/Working Condition. Household Income records the respondent’s annual household income (converted to US dollars) as (1) $25,000 or less, (2) $25,001 to $50,000, (3) $50,001 to $90,000, and (4) more than $90,000. Employment records whether a respondent was working at least part-time at the time of the study. Urban area indicates whether a respondent lived in an urban area. Region records one of the five regions where the respondents reside and includes Atlantic, Quebec, Ontario, West, and North. Health insurance was not taken into account due to the universal health care coverage in Canada. Also, marital status information was not available in IALSS. Student records whether a respondent was a student (including work programs).

Gender. Female records if a respondent is female or not.

Culture. Born in Canada records whether a respondent reported that they were born in Canada. Immigrants records whether a respondent was born outside of Canada and moved to Canada at some point in their life. A value of 1 for Immigrants also identifies individuals who became Canadian citizens by naturalization.

2.4. Statistical Methods

This study employs a logistic regression tree approach, which has several advantages over commonly used traditional methods such as bivariate testing and logistic regression in epidemiologic research [28]. Compared to standard logistic regression, the logistic regression tree approach allows a more flexible model structure, including the incorporation of complex interactions, for describing how predictor variables impact the odds of limited HL. This approach implements a method that identifies subpopulations with similar HL and then allows for the fitting of a logistic regression model with potentially different predictor variables for each subpopulation. In addition, this method does not require imputation or case-wise deletion to deal with missing values.
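The model structure described above can be written down as follows, as a sketch rather than the formal LOTUS specification. The symbols are notation assumed here for illustration: the tree partitions the sample into $T$ subpopulations $S_1, \dots, S_T$, $Y = 1$ denotes limited HL, and $x_{j(t)}$ is the single predictor used in the simple logistic regression fitted within subpopulation $t$.

```latex
\[
\operatorname{logit} P(Y = 1 \mid \mathbf{x})
  = \sum_{t=1}^{T} \mathbf{1}\{\mathbf{x} \in S_t\}
    \left( \beta_{0t} + \beta_{1t}\, x_{j(t)} \right),
\qquad
\operatorname{logit}(p) = \log\frac{p}{1 - p}
\]
```

In this notation a standard logistic regression with one predictor corresponds to $T = 1$. Interactions enter because the intercept, the slope, and even the choice of predictor $j(t)$ may differ from one subpopulation to the next.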

2.5. Logistic Tree with Unbiased Selection—LOTUS

This study employs the logistic regression tree algorithm named Logistic Tree with Unbiased Selection (LOTUS) [29]. LOTUS is designed specifically to deal with binary responses (limited HL versus not limited HL in this study) and has advantages (e.g., unbiased split variable selection) over other classification and regression tree algorithms such as CART [30] and CHAID [31]. LOTUS can be considered a combination of classification trees and logistic regression. LOTUS merges the desirable capabilities of these two methods in such a way that logistic regressions are fit in recursively split subdata (i.e., subpopulations).
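The idea of fitting logistic regressions within recursively split subdata can be illustrated with a toy sketch. This is not the LOTUS algorithm: the split is fixed by hand rather than chosen by an unbiased selection procedure, the Newton-Raphson fitter is a bare-bones stand-in for a statistical package, and the records are made-up counts, not IALSS data. All function and field names are hypothetical.

```python
import math


def fit_simple_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_leaf_models(records, assign_leaf, predictor, outcome):
    """Group records into leaves, then fit one simple model per leaf."""
    leaves = {}
    for record in records:
        leaves.setdefault(assign_leaf(record), []).append(record)
    return {
        leaf: fit_simple_logistic([r[predictor] for r in rows],
                                  [r[outcome] for r in rows])
        for leaf, rows in leaves.items()
    }


def make_rows(older, college, n, n_limited):
    """Build n toy records, the first n_limited of which have limited HL."""
    return [{"older": older, "college": college,
             "limited_hl": 1 if i < n_limited else 0} for i in range(n)]


# Made-up counts, chosen only so that the two leaves show different slopes.
toy = (make_rows(True, 0, 50, 40) + make_rows(True, 1, 50, 20)
       + make_rows(False, 0, 50, 15) + make_rows(False, 1, 50, 10))

models = fit_leaf_models(
    toy, lambda r: "older" if r["older"] else "younger",
    "college", "limited_hl")
odds_ratios = {leaf: math.exp(b1) for leaf, (b0, b1) in models.items()}
```

In the toy data the odds ratio for college education differs between the two leaves, which is the kind of within-subpopulation difference a logistic regression tree is meant to expose.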

To determine the final size (i.e., number of subpopulations) of the logistic regression tree, LOTUS employs an overfit-and-prune approach. The recursive process of splitting nodes continues until no further splits are possible due to either homogeneity (e.g., a subgroup has 100% limited HL) or a lack of sufficient numbers of cases. Such a complete, unpruned tree provides an excellent fit to the provided data but may not generalize well to new observations. To prune back the overgrown tree in this study, 10-fold cross-validation using the 1-SE rule is used as described by Breiman et al. [30]. More technically detailed descriptions of LOTUS are provided in Chan and Loh [29] and Loh [32]. The LOTUS algorithm version 2.3 was applied to the IALSS respondents to identify subgroups at high risk of limited HL.
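The 1-SE rule mentioned above can be sketched in a few lines: among the candidate pruned trees, pick the smallest one whose cross-validated error is within one standard error of the best. The function name and the candidate values below are hypothetical; the "error" stands for whatever cross-validated cost the software reports and is not taken from the LOTUS output of this study.

```python
def one_se_rule(candidates):
    """Select a tree size by the 1-SE rule.

    candidates: list of (n_terminal_nodes, cv_error, cv_standard_error).
    Returns the candidate with the fewest terminal nodes whose CV error
    does not exceed the minimum CV error plus that minimum's SE.
    """
    best = min(candidates, key=lambda c: c[1])
    threshold = best[1] + best[2]
    eligible = [c for c in candidates if c[1] <= threshold]
    return min(eligible, key=lambda c: c[0])


# Made-up cross-validation summary for a sequence of pruned trees.
pruning_sequence = [
    (2, 0.300, 0.010),
    (5, 0.270, 0.009),
    (15, 0.258, 0.008),
    (40, 0.254, 0.008),
    (80, 0.259, 0.008),
]
chosen = one_se_rule(pruning_sequence)
```

With these illustrative numbers the lowest error belongs to the 40-node tree, but the 15-node tree is within one standard error of it and is therefore preferred as the simpler model.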

3. Results

Table 1 includes descriptive summaries of candidate predictor variables for the individuals with and without limited HL. Approximately 29% of the respondents had limited HL, the lowest (level 1) category of HL. As expected, a series of bivariate comparisons showed that higher education, learning/reading activities (e.g., adult education, reading newspapers), English as the first language, younger age, higher income, employment, nonimmigrant status, and being born in Canada are more common in individuals with greater HL skills than in individuals with limited HL.

The results of the LOTUS analysis are summarized in a tabular form listing all identified subpopulations (Table 2) ordered from highest percentage to lowest percentage of limited HL. In addition, a graphical display of these subpopulations is presented in tree diagrams (Figure 1(a) (left half of the diagram) and Figure 1(b) (right half of the diagram)). The pathway to the highest risk groups is in bold in Figures 1(a) and 1(b). The respondents aged 56 and older with household income less than $50,000 who read books less than once a week, read newspapers less than once a week, and report no adult education program participation in the last 12 months were at the highest risk (81.8%) of limited HL (see Figure 1(b)).

Within each subpopulation (i.e., terminal nodes or rectangular boxes in the tree diagram) identified according to the risk of limited HL, further within-subpopulation differences were detected by simple logistic regression models; 5 out of 15 subpopulations had household income as the predictor variable in the simple logistic regression models. Also, a noteworthy interaction effect between reading newspapers and having a college education on the odds of limited HL was detected in the subgroup of people aged 56 and older with household income less than $50,000 who read books less than once a week and report no participation in adult education programs in the last 12 months. In this particular subpopulation, among respondents who read a newspaper at least once a week, those with a college education had 0.18 times the odds of limited HL compared to those without. Among respondents who read a newspaper less than once a week, those with a college education had 0.29 times the odds of limited HL compared to those without. In other words, college education was associated with a lower chance of limited HL in both groups, although the size of the association differed between individuals who read a newspaper at least once a week and those who did not.
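The two odds ratios reported above can also be read as a single interaction model. The sketch below assumes that the two newspaper-reading groups are combined into one logistic regression with a product term; the coefficient symbols and the indicators College and News (equal to 1 for the more frequent newspaper readers) are illustrative notation, not LOTUS output.

```latex
\[
\operatorname{logit} P(\text{limited HL})
  = \beta_0 + \beta_1\,\text{College} + \beta_2\,\text{News}
    + \beta_3\,(\text{College} \times \text{News})
\]
\[
\mathrm{OR}_{\text{College} \mid \text{News}=0} = e^{\beta_1} = 0.29,
\qquad
\mathrm{OR}_{\text{College} \mid \text{News}=1} = e^{\beta_1 + \beta_3} = 0.18,
\qquad
e^{\beta_3} = \frac{0.18}{0.29} \approx 0.62
\]
```

Under this parameterization the interaction is the ratio of the two odds ratios: a value of $e^{\beta_3}$ different from 1 means the association between college education and limited HL is not the same in the two reading groups.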

4. Discussion

Building on the HL conceptual model [8] and the public health disparities research model [17], this study employs a logistic regression tree analysis and identifies the subpopulations at risk of having limited HL in the Canadian adult population aged 16 years and older. The proportions of limited HL in the identified subpopulations range from 12% to 82% (see Figures 1(a) and 1(b)). Approximately 82% of the subpopulation of Canadian adults aged 56 years and older with household income less than $50,000 who read books and newspapers less than once a week and report no participation in adult education in the last 12 months exhibited limited HL, the highest percentage among all 15 subpopulations identified.

Combinations of demographic and other relevant factors clearly identified subpopulations at risk of limited HL. The identified risk factors can be roughly classified into two groups: (1) the demographic/socioeconomic factors of age and household income; (2) everyday learning activities: reading books, participation in adult education programs, and reading newspapers. These risk factors are also reflected in the “Determinants” domain of Rootman and Ronson’s model. Although the observational data in this study do not allow causal inference, a lack of regular learning activities may further increase the risk of limited HL in particular subgroups (e.g., older age and low income) whose members are already at greater risk [12]. At the same time, it is also possible that some aging-related physical or cognitive declines and/or a lack of resources (e.g., time for reading and money to buy new books) might limit opportunities for learning activities and, in turn, result in limited HL [33, 34].

In addition, earlier educational attainment (lack of college education was the best predictor of limited HL in the highest risk subpopulation) is likely to determine one’s socioeconomic status (e.g., income and occupation) and to put those with lower educational attainment at risk of limited HL [35–38]. Arguably, from a long-term perspective, lack of literacy skills (e.g., health literacy) could impact social status determinants like educational attainment and employment, and vice versa. Therefore, individuals with poor initial literacy skills may experience cumulative disadvantages over the life course [15, 39]. As suggested in Rootman and Ronson’s model, formal education contributes to one’s general literacy as well as other types of literacy skills (e.g., scientific literacy) and, in turn, HL skills. In this regard, life-long (i.e., mostly occupation-related formal programs) and life-wide (i.e., informal education programs) learning activities such as participation in adult education programs and reading newspapers/books might have acted as mediators of the association between limited HL and possible risk factors, particularly after participation in formal schooling ends [15]. Indeed, those who reported frequent reading/learning activities were less likely to have limited HL.

It should be noted that governments and national organizations have made national-level efforts to address limited HL over the last few decades [40, 41]. In most cases, the health systems (e.g., healthcare providers and health organizations) provide customized materials and additional services to individuals with limited HL. Such a “downstream” approach (i.e., assisting existing health illiterates) is a critical part of national health goals because even individuals with mid-level HL skills often face difficulties in healthcare settings [42]. In particular, unfamiliar health-related literacy tasks may be challenging. At the same time, efforts to prevent limited health literacy (i.e., an “upstream” approach) are necessary for sustainable long-term HL interventions, as more than half of Canadian adults may have difficulties with health-related literacy tasks [6]. Results of this study could inform not only interventions addressing the existing prevalence of limited HL but also interventions for preventing future health illiterates. HL and/or literacy education interventions earlier in life are key for successful prevention strategies considering the identified risk factors like older age and lack of learning activities. As one’s health needs and necessary health knowledge change over the life course, the risk factors identified in this study may help target potential health illiterates. Finally, a more health literate population may require less case management and direct HL education, which could translate into health care cost savings.

4.1. Implications

The findings from this study have several implications for future research and practice to improve HL. First, this analysis identifies subpopulations with higher risk of limited HL; these findings can guide resource allocations for interventions from public health practitioners and health policy makers. Targeting the highest risk subgroups for interventions is an efficient and, hopefully, effective alternative. Second, the information about multiple risk factors and their relationships with each other in the tree diagrams provides useful insights for designing intervention programs. For instance, for individuals who are older and in lower income households, HL intervention programs need to be sensitive to their needs including familiar communication styles and logistics (e.g., transportation) [43–45]. Likewise, given the identified reading-related risk factors, multiple interventions such as offering a series of educational opportunities and/or making reading activities more accessible to the high risk groups may improve one’s literacy and ultimately health literacy [46]. Additionally, the risk factor of older age suggests that intervention programs/materials should be accessible (e.g., larger font and familiar examples) to mid-aged to older adults with limited health literacy. On a related note, careful examination of tree diagrams generates possible hypotheses for future inquiries. Arguably, each subpopulation has different processes that lead individuals to limited HL. Therefore, more detailed subpopulation analyses could improve understanding about etiological pathways to limited HL.

Third, logistic regression tree algorithms like LOTUS can be applied to various other populations such as community members, hospital patients, and individuals with diseases (e.g., diabetes and arthritis) and to data routinely collected by political boundaries [28]. Finally, the identified subpopulations in order of the risk of limited HL (see Table 2) need to be systematically compared in future research. Such a comparative study may further advance practically relevant HL research. Also, the subpopulation strategy may be further tailored to reflect local demographic characteristics (e.g., race and ethnicity), culture, politics, and education systems. In short, while the findings in this study are not conclusive, they can guide current practice and research related to the prevalence of limited HL, which is a serious public health concern, by proposing a novel approach for identifying subpopulations.

4.2. Limitations

Possible omitted variable bias might exist due to the limited information available in the IALSS data. Other types of education measurement, cognitive skills, more detailed community-level information (i.e., “Actions” domain in Rootman and Ronson’s model) besides regions, health care utilization, and other health conditions are important to include in future data collection. Also, although the IALSS is arguably one of only a few nationally representative datasets that provide methodologically sound estimates of health literacy skill levels, the evaluation of specific levels has not been established, and therefore, the results may need to be verified in future research. Moreover, although the statistical method (LOTUS) employed in this study has some advantages over traditional methods, it could still benefit from several additional features. These might include more flexible options for split point selection of a continuous variable (instead of specific percentile points), the capability to incorporate survey sampling weights, the provision of additional model fit indices (e.g., c-statistic), and features to analyze longitudinal data. Finally, the logistic regression tree approach does not replace traditional methods but adds new analytic capabilities.

5. Conclusion

This study identified specific subpopulations that share the same characteristics and are at risk of limited HL. Specifically, older Canadian adults (56 years and older) earning less than $50,000 household income who read infrequently and who do not participate in adult education programs are at the highest risk of having limited HL. As demonstrated in this study, the logistic regression tree algorithm (LOTUS) generates an easily interpreted tree diagram and provides insights such as splitting points and/or potential interaction effects of key risk factors. In addition, possible hypotheses about distinct associations and etiological relationships between limited HL and risk factors across age groups should be on the future research agenda. The findings in this study, particularly the idea of a subpopulation approach, provide promising leads for targeting future intervention programs to address the issue of highly prevalent limited HL in adult populations.