In mean-based approaches to dietary data analysis, potentially important associations at the tails of the intake distribution, where inadequacy or excess is greatest, can be obscured by unobserved heterogeneity. Participants in the upper or lower tails of dietary intake data will potentially have the greatest change in their behavior when presented with a health behavior intervention; thus, alternative statistical methods for modeling these relationships are needed to fully describe the impact of the intervention. Using data from the Tu Salud ¡Si Cuenta! (Your Health Matters!) at Home Intervention, we aimed to compare traditional mean-based regression to quantile regression for describing the impact of a health behavior intervention on healthy and unhealthy eating indices. The mean-based regression model identified no differences in dietary intake between intervention and standard care groups. In contrast, the quantile regression indicated a nonconstant relationship between the unhealthy eating index and study groups at the upper tail of the unhealthy eating index distribution. The traditional mean-based linear regression was unable to fully describe the intervention effect on healthy and unhealthy eating, resulting in a limited understanding of the association.

1. Introduction

Many health behavior interventions focus on positive lifestyle changes in the areas of increasing physical activity and healthy diets. Adopting these behavior changes can prevent or reduce the negative health consequences of obesity in minority US populations. Mexican Americans are particularly prone to physical inactivity and poor diets because of lack of fruit and vegetable consumption compared to Non-Hispanic Whites [1, 2]. Despite research showing poorer dietary intake than other ethnic groups, within the Mexican American population there is heterogeneity in healthy and unhealthy food intake [3].

Dietary intake data are typically measured using self-report tools, and individual food intake is aggregated into compositional data or patterns to describe overall diets. When the dietary data are analyzed using mean-based approaches, such as ordinary least squares (OLS) regression, potentially important relationships with disease risk at the lower and upper levels of the distribution could be obscured due to unobserved heterogeneity. Participants in the upper or lower tails of dietary intake data, where inadequacy or excess is greatest, will theoretically have the greatest change in their behavior when presented with a health behavior intervention; thus, alternative statistical methods for modeling these relationships are needed to fully describe the impact of the intervention. This is particularly notable in certain populations, such as Mexican Americans, where variation in factors such as acculturation and language influences food choices and adherence to traditional and western diet patterns [3-6].

As an alternative to mean-based regression, quantile regression (QR) was developed by Koenker and Bassett and has primarily been used in the fields of risk management and business [7]. Quantile regression has been extended to longitudinal data through different approaches that account for serial correlations within a subject, and it has been used as an important alternative to mean-based regression approaches because of its flexibility for modeling nonnormal data or heterogeneous conditional distributions [8]. QR can model the whole conditional distribution of the response, not only the conditional mean, giving the researcher critical insights when valuable information lies in the tails. Although QR is computationally intensive and not well suited to small data sets, it is more robust to outliers than mean-based regression, where estimates of the conditional mean can be strongly influenced by outliers.
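The robustness claim can be illustrated with a minimal sketch, written here in Python rather than with the software used in this study. It assumes the standard check (pinball) loss from the QR literature and a small set of hypothetical scores; the function names are illustrative only. The median is obtained as the value that minimizes the total check loss at a quantile level of 0.5, and it is compared with the mean before and after one observation is replaced by an outlier.

```python
import statistics


def check_loss(residual, tau):
    """Check (pinball) loss for a single residual at quantile level tau."""
    return residual * (tau - (1.0 if residual <= 0 else 0.0))


def total_loss(candidate, values, tau):
    """Total check loss when `candidate` is used as the fitted quantile."""
    return sum(check_loss(v - candidate, tau) for v in values)


def quantile_by_loss(values, tau):
    """Return the observed value that minimizes the total check loss."""
    return min(sorted(values), key=lambda c: total_loss(c, values, tau))


# Hypothetical scores, not data from the trial.
scores = [2, 3, 4, 5, 6, 7, 8]
with_outlier = scores[:-1] + [40]  # largest score replaced by an outlier

median_clean = quantile_by_loss(scores, 0.5)
median_outlier = quantile_by_loss(with_outlier, 0.5)
mean_clean = statistics.mean(scores)
mean_outlier = statistics.mean(with_outlier)

print(median_clean, median_outlier)  # the median is unchanged by the outlier
print(mean_clean, mean_outlier)      # the mean is pulled upward
```

In this toy example the loss-minimizing median stays the same while the mean shifts markedly, which is the behavior the paragraph above describes.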

Application of QR to health and behavioral sciences is increasing and could be a valuable statistical tool for health researchers. QR has been used to evaluate the effects of physical activity or dietary intake on varying quantile levels of certain variables, such as BMI [9-12], waist circumference [13], socioeconomic status [14], and risk factors of disease outcomes including health-related scores and biomarker data [15-19]. A limited number of studies have introduced a QR-based approach specifically applied to behavioral data [20-22]. Yet, there is limited research focusing on how to use and apply QR results to improve behavioral interventions and maintenance of behavior change over time by possibly addressing the upper and lower tails of the population distribution differently.

The goal of this review was to compare traditional mean-based linear regression with QR by applying both to real data from a behavioral intervention study aimed at improving healthy eating, and to demonstrate the usefulness of QR in fully describing the relationships.

2. Linear Quantile Mixed Effect Regression

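Let $y_{ij}$ be the measurement for the $i$-th subject ($i = 1, \ldots, N$) at time $j$; then we define a linear mixed effect regression model as

$$y_{ij} = x_{ij}^{\top}\beta + z_{ij}^{\top}u_i + \epsilon_{ij}, \qquad (1)$$

where $x_{ij}$ is a vector of covariates at time $j$, $\beta$ is an unknown vector of regression parameters, and the correlation among the observations within the $i$-th subject is induced by the subject-level residuals, i.e., a vector $u_i$ and an associated vector $z_{ij}$ for random effect variables. The error term can be defined as $z_{ij}^{\top}u_i + \epsilon_{ij}$, where the random errors for individual records, $\epsilon_{ij}$, are independent of each other.

We assume that linear quantile mixed models are determined based on the asymmetric Laplace distribution (ALD) [23], which performs well on data generated from many error distributions and has a relationship with the L1-norm objective function [7]. Let a response variable $y$ follow an ALD, denoted $y \sim \mathrm{ALD}(\mu, \sigma, \tau)$; then we can define a probability density function,

$$f(y \mid \mu, \sigma, \tau) = \frac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\frac{y-\mu}{\sigma}\right)\right\}, \qquad (2)$$

where $\tau$ ($0 < \tau < 1$) is the skewness parameter, $\mu$ is the location parameter, $\sigma > 0$ is the scale parameter, and the loss function $\rho_{\tau}(v) = v\{\tau - I(v \le 0)\}$ represents the contribution by the residuals $v$.

Assuming the location parameter is $\mu_{ij} = x_{ij}^{\top}\beta_{\tau} + z_{ij}^{\top}u_i$, a quantile regression model related to the $\tau$-th quantile of a response variable $y_{ij}$, conditional on $x_{ij}$ and $u_i$, has the form

$$Q_{y_{ij}}(\tau \mid x_{ij}, u_i) = x_{ij}^{\top}\beta_{\tau} + z_{ij}^{\top}u_i, \qquad (3)$$

where $\beta_{\tau}$ is a vector of quantile-specific regression parameters corresponding to the coefficient $\beta$ in the linear regression model (1) and $\epsilon_{ij} \sim \mathrm{ALD}(0, \sigma, \tau)$, which are also dependent on $\tau$. The objective function for $\beta_{\tau}$ for fixed $\tau$ is expressed as

$$\sum_{i}\sum_{j}\rho_{\tau}\left(y_{ij} - x_{ij}^{\top}\beta_{\tau} - z_{ij}^{\top}u_i\right). \qquad (4)$$

We can estimate the quantile-specific regression parameters $\beta_{\tau}$ that minimize the objective function above. As we assume $\epsilon_{ij} \sim \mathrm{ALD}(0, \sigma, \tau)$, the ALD is determined as a scale mixture of normal distributions based on the Laplace distribution, with the skewness parameter $\tau$ treated here as a quantile level. Then a likelihood for $(\beta_{\tau}, \sigma)$ at the $\tau$-th quantile can be expressed as

$$L(\beta_{\tau}, \sigma) = \prod_{i}\prod_{j}\frac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\frac{y_{ij}-\mu_{ij}}{\sigma}\right)\right\}. \qquad (5)$$

If $\sigma$ is considered a nuisance parameter, then the maximization of the likelihood above is equivalent to the minimization of the objective function of quantile regression (4) defined above. More details regarding the estimation process are available elsewhere [8].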
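The link between the asymmetric Laplace density and the quantile loss can be checked numerically. The sketch below is an illustration only and is not the estimation routine of [8]: it ignores random effects, fixes the scale parameter, and uses hypothetical data and function names. It assumes the standard ALD density, i.e., the constant τ(1-τ)/σ times the exponential of the negative check loss of the standardized residual. Two properties are verified: the probability mass at or below the location parameter is approximately the quantile level, and the location that maximizes the ALD likelihood is the one that minimizes the total check loss.

```python
import math


def check_loss(v, tau):
    """Check loss: v * (tau - I(v <= 0))."""
    return v * (tau - (1.0 if v <= 0 else 0.0))


def ald_pdf(y, mu, sigma, tau):
    """Asymmetric Laplace density with location mu, scale sigma, skewness tau."""
    return tau * (1.0 - tau) / sigma * math.exp(-check_loss((y - mu) / sigma, tau))


def mass_below_location(mu, sigma, tau, steps=100_000):
    """Midpoint-rule approximation of P(Y <= mu) under the ALD."""
    lower = mu - 40.0 * sigma / (1.0 - tau)  # left tail beyond this is negligible
    h = (mu - lower) / steps
    return h * sum(ald_pdf(lower + (k + 0.5) * h, mu, sigma, tau) for k in range(steps))


def log_likelihood(values, mu, sigma, tau):
    """ALD log-likelihood of the data for a common location mu."""
    return sum(math.log(ald_pdf(y, mu, sigma, tau)) for y in values)


values = [1, 2, 3, 4, 10]  # hypothetical responses, not trial data
tau, sigma = 0.5, 1.0

best_by_likelihood = max(values, key=lambda m: log_likelihood(values, m, sigma, tau))
best_by_loss = min(values, key=lambda m: sum(check_loss(y - m, tau) for y in values))
mass = mass_below_location(0.0, 1.0, 0.25)

print(best_by_likelihood, best_by_loss)  # the two criteria select the same location
print(round(mass, 4))                    # close to the quantile level 0.25
```

With the scale held fixed, the log-likelihood is a constant minus the total check loss divided by the scale, which is why both criteria pick the same location.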

3. Example

3.1. Tu Salud ¡Si Cuenta! (Your Health Matters!) at Home Intervention

The behavioral data used in the current study were from the Tu Salud ¡Si Cuenta! (Your Health Matters!) at Home Intervention. One of the main objectives of this randomized controlled trial was to increase participant intake of healthy foods and decrease unhealthy food intake through exposure to community health workers delivering a behavioral modification intervention. The study was conducted in the Texas Rio Grande Valley area and included participants who were Mexican American adults, aged 18-75 years, and enrolled in the ongoing Cameron County Hispanic Cohort [1, 24]. Participants were randomly selected and randomized into either the intervention or standard care group from June 2010 to April 2013. The intervention group received up to six monthly community health worker home visits in the first 6 months of the intervention, which included lifestyle change education, motivation, and support. No other intervention elements, other than those equivalent to the standard care group, were offered during the last 6 months of the trial. The standard care group participants were potentially exposed to a community-wide physical activity and healthy diet campaign across the 12 months. Data were collected at baseline and at the 6- and 12-month follow-ups.

Participants completed a dietary intake questionnaire that asked whether they had eaten each of 20 common and culturally appropriate foods yesterday and how many times, with the following responses available: no, once, twice, three times, four times, and five or more times [25, 26]. Responses were summed into Healthy and Unhealthy Eating Indices (HEI and UNHEI, respectively). The HEI score comprised responses to the 10 healthy food items (baked or grilled fish, turkey or chicken; eggs; beans; fruit; fruit juice; orange vegetables; other vegetables; salad; whole grain breads; and whole grain cereals) with a possible range from 0 to 50. The UNHEI comprised responses to the 9 unhealthy food items (baked goods; french fries or chips; fried meat; frozen desserts; red and processed meats; nonchocolate candy; regular sodas; sweetened or sports drinks; and white bread) with a possible range from 0 to 45 [27]. Both HEI and UNHEI scores appeared to be well-approximated by a normal distribution.
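The scoring described above can be sketched as follows. The sketch assumes that each response is coded from 0 ("no") to 5 ("five or more times"), which is consistent with the stated ranges of 0 to 50 for 10 items and 0 to 45 for 9 items; the dictionary and function names are hypothetical and are not taken from the study's scoring scripts.

```python
# Assumed coding of the six response options (0 to 5 points per item).
RESPONSE_POINTS = {
    "no": 0,
    "once": 1,
    "twice": 2,
    "three times": 3,
    "four times": 4,
    "five or more times": 5,
}


def index_score(responses, n_items):
    """Sum the coded responses for one participant and one index."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    return sum(RESPONSE_POINTS[r] for r in responses)


# Hypothetical participant: 10 healthy-item and 9 unhealthy-item responses.
healthy = ["once", "no", "twice", "once", "no", "no", "once", "once", "no", "no"]
unhealthy = ["no", "once", "no", "no", "once", "no", "twice", "no", "once"]

hei = index_score(healthy, 10)      # possible range 0 to 50
unhei = index_score(unhealthy, 9)   # possible range 0 to 45
print(hei, unhei)
```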

3.2. Quantile Regression and Mean-Based Regression

To assess the intervention effect on healthy and unhealthy eating, multivariable longitudinal QR and mean-based models were fitted based on the linear mixed effect model equation below:

$$y_{ij} = \beta_0 + \beta_1 G_i + \beta_2 V_{2j} + \beta_3 V_{3j} + \beta_4 (G_i \times V_{2j}) + \beta_5 (G_i \times V_{3j}) + X_{ij}^{\top}\gamma + u_i + \epsilon_{ij},$$

where the index score, $y_{ij}$, can be either the HEI or UNHEI measurement for the $i$-th participant ($i = 1, \ldots, N$) at visit $j$ ($j = 1, 2, 3$), $G_i$ is a binary variable for study group (=1 if intervention), and $V_{2j}$ and $V_{3j}$ are dummy variables for the two follow-up visits, i.e., month 6 (visit 2) and month 12 (visit 3), respectively. Interaction terms between study group and follow-up visits were included in the model to obtain estimates of the intervention effect at each time point. $X_{ij}$ is a vector of potential confounders that were adjusted for in the model (i.e., gender, age, diabetes, marital status, years in school, employment status, type of insurance, generation, and preferred language) and $\gamma$ is the associated parameter vector. We also considered a random intercept by including an error term $u_i$ for the $i$-th subject. We used the lqmm R package [8] for QR models and SAS PROC MIXED for mean-based models.
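To make the role of the interaction terms concrete, the following sketch builds the fixed-effect design row for the study group, the two follow-up visit dummies, and their interactions, and then computes the intervention-versus-standard-care contrast at each visit. It is an illustration under stated assumptions, not the fitted model: the coefficient values are hypothetical, the function names are invented for this example, and the confounders and random intercept are omitted because they cancel when two participants with identical covariates are compared.

```python
def design_row(group, visit):
    """Return [intercept, group, month6, month12, group*month6, group*month12]."""
    month6 = 1 if visit == 2 else 0
    month12 = 1 if visit == 3 else 0
    return [1, group, month6, month12, group * month6, group * month12]


def group_contrast(beta, visit):
    """Fitted difference (intervention minus standard care) at a given visit."""
    treated = design_row(1, visit)
    control = design_row(0, visit)
    return sum(b * (t - c) for b, t, c in zip(beta, treated, control))


# Hypothetical coefficients in the order returned by design_row.
beta = [6.0, 0.5, -0.25, 0.25, -1.0, 1.5]

contrasts = {visit: group_contrast(beta, visit) for visit in (1, 2, 3)}
print(contrasts)  # baseline: group term; follow-ups: group term plus interaction
```

Under this dummy coding, the group difference at baseline is the group coefficient alone, and at each follow-up it is the group coefficient plus the corresponding interaction coefficient. In a QR fit, the same contrast is obtained separately at each quantile level.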

3.3. Results

There were 500 participants randomized to either the standard care or intervention group (n=250 in each group). At baseline, the mean HEI score was 6.6 (standard deviation (SD)=3.3) for the standard care group and 6.9 (SD=3.5) for the intervention group. The mean UNHEI score for the standard care group was 5.4 (SD=3.4) and for the intervention group was 5.6 (SD=3.6).

Results from QR and mean-based regression are presented in Figure 1. The red line indicates the beta coefficients estimated from the mean-based model for the effect of the study group at each time point, showing slight differences (i.e., beta coefficient <0.4) in mean HEI and mean UNHEI between intervention and standard care groups at baseline and follow-ups.

With regard to HEI, the results for QR and mean-based regression do not substantially differ. In contrast, the QR results indicate a nonconstant relationship between unhealthy eating and study groups at the upper tail of the distribution of the UNHEI. At baseline, the association between the distribution of UNHEI scores and study groups is not constant, as the intervention group is more likely to be in the upper tail of the UNHEI distribution at the start of the study. At month 6, the effect of the intervention is inconsistent across the UNHEI distribution. For example, at the upper tail of the UNHEI distribution the intervention group had higher UNHEI scores, yet around the quantile levels τ=0.05 and 0.75 the intervention group reported lower UNHEI scores than the control group. The strength of the association in the upper tail of the distribution is attenuated at 6 months compared to baseline. More strikingly, at the 12-month follow-up QR suggests that there is an increase in unhealthy food intake in the intervention group compared to control for the participants in the upper tail of the UNHEI data distribution.
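A small synthetic example shows how a group effect can be nonconstant across quantile levels even when the mean difference is zero. The scores below are invented for illustration and do not reproduce the trial results or Figure 1. The sketch relies on a simplification: with a single binary covariate and no other terms or random effects, the QR coefficient for study group at a given quantile level reduces to the difference between the two groups' sample quantiles. The quantile definition used here (inverse empirical distribution function) is one of several common conventions.

```python
import math
import statistics


def sample_quantile(values, tau):
    """Inverse empirical CDF: smallest value with at least a share tau at or below it."""
    ordered = sorted(values)
    index = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]


# Synthetic scores with equal means but different shapes (not trial data).
standard_care = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
intervention = [2, 3, 4, 4, 5, 5, 5, 6, 6, 15]

mean_difference = statistics.mean(intervention) - statistics.mean(standard_care)
quantile_differences = {
    tau: sample_quantile(intervention, tau) - sample_quantile(standard_care, tau)
    for tau in (0.25, 0.5, 0.75, 0.95)
}

print(mean_difference)        # no difference in means
print(quantile_differences)   # sign and size vary across quantile levels
```

Here a mean-based comparison reports no group difference, while the quantile-specific differences change sign across the distribution and are largest in the upper tail.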

4. Discussion

Mean-based regression results showed minimal differences between intervention and standard care groups at any visit, for both the healthy and the unhealthy eating index. These results would lead a researcher to incorrectly assume that the intervention failed to increase intake of healthy foods or decrease unhealthy food intake, or possibly to conclude that the lack of change was due not to the intervention itself but to information bias or environmental changes in the community-based intervention.

In contrast, the results of the QR highlight a different relationship between the study groups and outcomes. The estimated coefficients were not constant across the distribution of the UNHEI outcome at baseline and follow-ups. These results may indicate a baseline imbalance in the UNHEI outcome, which would not have been identified under mean-based regression, and approaches to adjust for the imbalance should be considered. Likewise, at the 6-month follow-up, the protective effect of the intervention would also have been missed using mean-based methods. The QR results for the unhealthy index at the 12-month follow-up identified an inconsistent relationship between study group and UNHEI. At the lower tail of UNHEI, the intervention was protective; this relationship then reversed at the upper tail of UNHEI. Overall, there was little difference in the UNHEI between intervention and standard care groups, except at the upper tail of the UNHEI distribution. This indicates that a purely mean-based approach may not be appropriate for evaluating the effect of the intervention on dietary uptake behaviors in populations with unobserved heterogeneity.

5. Conclusions

The traditional mean-based linear regression was unable to fully describe the relationship between healthy and unhealthy eating and the intervention, resulting in a limited understanding of the intervention effect. Use of quantile regression identified a different relationship by modeling the coefficients across the distribution of the outcome resulting in a more complete picture of the association. These findings from the quantile regression results could be applied towards developing more effective behavioral intervention trials in heterogeneous populations.


The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH CTSA or NIMHD or UTCO.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

The authors would like to acknowledge and thank Dr. Belinda Reininger for her guidance and support on demonstrating the statistical approaches through the behavioral data observed from the intervention program, the Tu Salud ¡Si Cuenta! (Your Health Matters!) at Home Intervention, which was supported by the UT Health Clinical and Translational Science Award (UL1 TR000371), NIH/National Institute on Minority Health and Health Disparities (MD000170 P20), and the Texas Department of State Health Services funding for University of Texas Community Outreach (UTCO). The authors would like to recognize the support provided by the Biostatistics/Epidemiology/Research Design (BERD) component of the Center for Clinical and Translational Sciences (CCTS) at the UT Health Science Center at Houston, which is mainly funded by the NIH Centers for Translational Science Award (NIH CTSA), grant UL1 RR024148.