Computational and Mathematical Methods in Medicine

Volume 2017 (2017), Article ID 7518035, 16 pages

https://doi.org/10.1155/2017/7518035

## Estimating and Interpreting Effects from Nonlinear Exposure-Response Curves in Occupational Cohorts Using Truncated Power Basis Expansions and Penalized Splines

^{1}Department of Mathematics and Statistics, American University, Washington, DC, USA^{2}Occupational Science & Technology, University of Wisconsin-Milwaukee, Milwaukee, WI, USA

Correspondence should be addressed to Elizabeth J. Malloy; ude.nacirema@yollam

Received 31 January 2017; Revised 25 April 2017; Accepted 16 May 2017; Published 20 September 2017

Academic Editor: Ruisheng Wang

Copyright © 2017 Elizabeth J. Malloy et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Truncated power basis expansions and penalized spline methods are demonstrated for estimating nonlinear exposure-response relationships in the Cox proportional hazards model. R code is provided for fitting models to get point and interval estimates. The method is illustrated using a simulated data set under a known exposure-response relationship and in a data application examining risk of carpal tunnel syndrome in an occupational cohort.

#### 1. Introduction

The Cox proportional hazards (PH) model is frequently used to model survival data or time-to-event data, particularly in the presence of censored survival times [1]. The hazard, or instantaneous risk, of an event of interest, typically mortality or morbidity, is modeled in terms of one or more explanatory variables relative to an unspecified baseline hazard rate. This hazard ratio (HR) for the outcome—often interpreted as a type of relative risk—is the effect of interest and may be used in epidemiological studies for risk assessment. In occupational settings, it is common to have an occupational exposure as one of the explanatory variables in the model and the association between the outcome and this exposure is of interest. In this case, the HR, or its logarithm, may be referred to as the exposure-response relationship. The focus is thus on estimation of and inferences for this exposure-response relationship. Nonlinear exposure-response relationships do arise in the analysis of occupational cohorts [2–7]. An attenuation of the HR at the highest exposures has been well documented [8] and interpretation of nonlinear exposure-response relationships is useful in epidemiological risk assessment [9]. Methods for modeling nonlinearities are needed in those situations when a linear exposure-response is not expected or when one desires to formally assess a nonlinear association.

Consider an occupational cohort with individuals on which the time until a given health event of interest, , is measured. These times may be right censored if the individual did not have the event of interest during the study time. This is denoted by an indicator variable,* c*_{i}, which takes the value of 1 if the individual had the event and 0 if the time is censored. The general form of the Cox PH model for a single covariate iswhere is the hazard function, is the corresponding quantitative exposure variable, is the baseline hazard function, and is the regression coefficient. In this form, the logarithm of the hazard ratio (HR) is linear, , and the exposure-response relationship is described as linear (on the log-scale). The HR for a given exposure* x *is , where is interpreted as a multiplicative effect when comparing the hazard (or risk) at exposures one unit apart.

A nonlinear exposure-response relationship can be modeled by including a transformation of* x*_{i} in the model:where is a known function. Many user-specified choices exist for this functional form, such as exposure categories and algebraic functions [10]. These methods generally require user input for exposure category cut-points or the algebraic expression, such as a logarithmic transformation of the exposure variable, . An alternative to this type of specification is to use methods which do not impose a priori shape or categorical constraint on the exposure-response relationship. Examples of such “smoothing” methods are polynomial regression splines [11] and penalized splines [12]. One criticism of smoothing methods is their lack of interpretable parameters [13], such as the regression coefficient. Nevertheless, interpretable estimates (i.e., HR) with corresponding confidence intervals can be found directly from the fitted model, even when using smoothing methods. We illustrate this interpretation and the use of these methods (regression and penalized splines) and compare them to exposure categories and standard algebraic forms in the context of occupational physical exposure analyses.

This manuscript provides a detailed introduction to modeling and interpreting nonlinear exposure-response curves using these spline functions. We assume familiarity with the Cox PH model and survival data. The remainder of the paper is structured in three sections. Section 2 gives the theoretical Cox proportional hazards model for spline-based estimates of nonlinear exposure-response associations. These methods are simultaneously explained and illustrated using a simulated data set under a known nonlinear exposure-response relationship. The section ends with an examination of the interpretation of the estimated HR using point estimates and pointwise confidence intervals. Section 3 gives an application in which we examine the nature of the association between job physical demands and incidence of carpal tunnel syndrome (CTS) in an occupational cohort of 569 individuals previously analysed by Garg et al. [14]. Final discussion and comments are in Section 4. An Appendix contains additional theoretical details for estimation and inferences. The R software [15] code is available from the corresponding author.

#### 2. The Cox Proportional Hazards Model for a Nonlinear Exposure-Response Relationship

##### 2.1. Splines and the Cox Proportional Hazards Model

In the Cox PH model in (2), we use a basis expansion representation of the exposure-response function based on a linear combination of known basis functions, , There is vast literature on using basis functions in linear models and there are many options for selecting basis functions to use. The text by Ruppert et al. [16] provides many nice examples. A simple basis for a linear exposure-response relationship would consist of the single function. For a quadratic association, the basis functions are and . This can be extended to a polynomial of degree* p* by using the* p* basis functions . Note that we omit the unit basis function, which corresponds to the intercept term in the model, because in the Cox PH model setting the intercept is subsumed by the unspecified baseline hazard function. Estimates in the Cox PH model are relative to the unspecified baseline hazard.

To provide flexibility in capturing local features in the exposure-response curve, polynomial spline terms may also be used as basis functions. A spline function is a function, typically a polynomial, defined on a subinterval of the range of exposures. Splines allow for estimation of the exposure-response relationship using a piecewise-defined curve. They are generally considered to provide more flexibility in estimating nonlinear relationships than polynomials or other algebraic functions. To define a piecewise linear curve over four regions in which the slope changes from region to region, we would use a set of basis functions consisting of the functions , where are exposure values at which the slope changes and are called “knots.” These are user-specified values, similar in spirit to categorical cut-points where changes in the response occur. The “+” subscript notation indicates the function is equal to the expression given in parentheses when that expression is positive. That is, . In this way, a nonlinear association can be estimated by fitting the model in (2) with . The standard maximum partial likelihood method yields estimates of the coefficients, giving an estimated ln(HR) of . Higher order (degree) polynomials can also be used by expanding the set of basis functions to include all polynomial terms up to degree* p* and then* K *degree* p* spline functions, defined using* K* knots: . This set is called the truncated power basis of degree* p* [16] and allows for smoother exposure-response estimates as functions formed from linear combinations of these basis functions have continuous derivatives. With small to moderate numbers of knots, a standard Cox PH model can be fit to estimate the nonlinear exposure-response curve.

As an illustration, we simulated a data set of individuals whose exposure-response relationship shows an attenuation at the highest exposures; see Figure 1. Specifically, on the log-scale, the true is a quadratic function with a maximum at an exposure of units. These data were generated using the method described in Bender et al. [17] and Malloy et al. [18]. The exposure variable was set so that approximately 13% of individuals were unexposed. With this exposure distribution (displayed in Figure 1) and the corresponding true exposure-response relationship, approximately 16% of individuals are cases. Survival times were left skewed with the median case survival time approximately 17 time-units and the median for noncases about 20 time-units. To give a sense of how survival varies with exposure in this simulated data set, prior to fitting the Cox PH models, we created five equally spaced exposure categories and found the estimated survival functions using the Kaplan-Meier estimate using the survival package [19] in R. The five exposure categories were a baseline group with no exposure (approximately 13% of observations), those with exposures between 0 and 5 (approximately 46% of observations), between 5 and 10 (32%), between 10 and 15 (8%), and above 15 (1%). Figure 1(c) shows the estimated survival functions for these five exposure categories. The baseline/no exposure group has the highest survival rates while the highest exposed group has the lowest survival rates, up until a survival time of about 15 time-units, at which point the highest exposed group overlaps with the 10- to 15-exposure group. This is consistent with the generating model, in which there is a drop in the logarithm of the hazard ratio for these highest exposed individuals (Figure 1(a)).