Abstract

Under cohort sampling designs, additional covariate data are collected on cases of a specific type and a randomly selected subset of noncases, primarily for the purpose of studying associations with a time-to-event response of interest. With such data available, an interest may arise to reuse them for studying associations between the additional covariate data and a secondary non-time-to-event response variable, usually collected for the whole study cohort at the outset of the study. Following earlier literature, we refer to such a situation as secondary analysis. We outline a general conditional likelihood approach for secondary analysis under cohort sampling designs and discuss the specific situations of case-cohort and nested case-control designs. We also review alternative methods based on full likelihood and inverse probability weighting. We compare the alternative methods for secondary analysis in two simulated settings and apply them in a real-data example.

1. Introduction

Cohort sampling designs are two-phase epidemiological study designs where information on time-to-event outcomes of interest over a followup period and some basic covariate data are collected on the whole first-phase study group, referred to as a cohort, and in the second phase, more expensive or difficult-to-obtain additional covariate data are collected only on a subset of the study cohort. This usually comprises the cases, that is, individuals with a disease event of interest during the followup, and a randomly selected subset of noncases. Examples are the case-cohort [1โ€“3] and nested case-control [4, 5] designs. Primarily, such designs are applied for the purpose of studying associations between the time-to-event outcomes and the covariates collected in the second phase. However, with such data having been collected, an interest frequently arises to reuse it for studying associations between the second-phase covariates and the other available covariate data. For instance, the covariates collected in the second phase could be genotypes, while the other covariates may be various phenotype measurements carried out at the outset of the followup period for the whole cohort. The interest would then be to explain a phenotypic response with the genetic covariates. Following Jiang et al. [6] and Lin and Zeng [7], we refer to such a situation as secondary analysis. Here, we concentrate specifically on non-time-to-event secondary outcomes. Analysis of secondary time-to-event outcomes under the nested case-control design has been considered previously by Saarela et al. [8] and Salim et al. [9].

As our motivating example, we consider here a single cohort which was used in a larger meta-analysis of association between the European lactase persistence genotype and body mass index (BMI) [10], the latter being a secondary outcome in the cohort study in question. The cohort consists of 5073 men aged 55โ€“77 years from southern and western Finland, who originally formed the placebo group of the ATBC cancer prevention study [11]. Whole blood samples of the participants were taken between 1992 and 1993, which is here considered as the baseline of the cohort, with followup for cardiovascular disease events and all-cause mortality available until the end of year 1999. There is no loss to followup, so the only censoring present is of type I due to end of the followup period. This cohort is a part of MORGAM project, an international pooling of cardiovascular cohorts [12]. Genotype data (including the lactase persistence SNP rs4988235) under this project have been collected under a case-cohort design described in detail by [13] and herein in Section 4.3.1. Given such data, our aim is to estimate the association between the lactase persistence genotype and BMI making use of genotype data collected on both the random subcohort and cases of all-cause mortality.

Secondary analysis of case-control data has been studied previously, using profile likelihood [14], inverse selection probability weighting methods [15โ€“17], or retrospective likelihood [6, 7]. However, to the best of our knowledge, a systematic discussion on secondary analysis under cohort sampling designs has been lacking, which we will aim to rectify here by discussing alternative approaches for such an analysis under a generic two-phase study design. We will briefly review the full likelihood approach which utilizes all observed data (Section 2), as well as pseudolikelihoods based on inverse selection probability weighting (Section 3). For these approaches, we propose a conditional likelihood-based alternative (Section 4), restricted to the fully observed second-phase study group. Conditional likelihood inference under cohort sampling designs has been studied previously for the analysis of the primary time-to-event outcome by Langholz and Goldstein [18] and Saarela and Kulathinal [19]; here, we extend these methods to the secondary analysis setting. The main interest is in continuous secondary outcomes, though the approach would also be valid for categorical responses. As special cases of the general setting, we consider case-cohort and nested case-control designs. As extensions to the basic setting, we consider treatment of missing second-phase covariate data and adjustment for left truncation in the case of incident time-to-event outcomes (Section 5). In Section 6, we present two simulation studies, first comparing the efficiencies of the alternative approaches and then demonstrating the potential adverse effects of small sampling fraction in full likelihood inference. We also carry out the analysis in the real-data example using all three alternative methods. As the model for the continuous secondary response variable, in addition to the customary normal distribution, we consider more flexible model specifications, thus aiming to incorporate residual-based model fit diagnostics into the model itself. While other generalizations for the normal distribution have been proposed, we adopt here the four-parameter normal-polynomial quantile mixture [20], which includes the normal distribution as a special case.

2. Notation, Assumptions, and Full Likelihood

To cover our motivating example and also the general case, we introduce first some notation. Let the set ๐’žโ‰ก{1,โ€ฆ,๐‘} represent the individuals in the cohort. Primary outcome in the study is a time-to-event outcome characterized by random variables (๐‘‡๐‘–,๐ธ๐‘–), ๐‘–โˆˆ๐’ž, where ๐‘‡๐‘– corresponds to event time and ๐ธ๐‘– to the event type of individual ๐‘–, with ๐ธ๐‘–=0 indicating a censoring event and ๐ธ๐‘–=1 a death due to any cause. Extension to incident nonfatal outcomes and multiple outcome types is considered separately in Section 5.2. A secondary outcome of interest ๐‘Œ๐‘–, here BMI, is observed on all study participants at the outset of the study. In addition, there may be other covariates ๐‘‹๐‘–, available on all ๐‘–โˆˆ๐’ž, relevant to be included in the analysis. In the present example, these comprise only the age at the start of the followup. Additional covariate data (here the lactase persistence genotype) ๐‘๐‘– are collected only on the second-phase study group ๐’ชโ‰ก{๐‘–โˆถ๐‘…๐‘–=1}โŠ†๐’ž, specified by the inclusion indicators ๐‘…๐‘–โˆˆ{0,1}, analogously to the survey response/nonresponse setting of Rubin [21]. We will henceforth use vector notations of the type ๐‘๐’ชโ‰ก{๐‘๐‘–โˆถ๐‘–โˆˆ๐’ช} to represent data obtained on different subsets of individuals. We will not make a distinction between random variables and their realized values in the notation when this is clear from the context. The observed data as a whole are then represented as (๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž,๐‘๐’ช). We are interested in the model ๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ) for the secondary outcome, more precisely, the parameter describing the association between ๐‘Œ๐‘– and ๐‘๐‘–, which in our case correspond to BMI and the genotypic covariate.

We assume that the first-phase sampling mechanism has been unconfounded in the sense of Rubin [21, page 36], so that we may ignore the first-phase sampling mechanism in all subsequent analyses. This means that the cohort recruitment (possibly through survey sampling) and possible nonresponse depend only on the ๐‘‹๐‘–-covariates; then if all subsequent analyses are conditional on ๐‘‹๐‘–, the selection mechanism may be ignored. In contrast, the second phase sampling may be outcome dependent; the second-phase sampling mechanism is specified by the joint probability distribution for the set of indicator variables ๐‘…๐’ž. This is assumed to be unconfounded with ๐‘๐’ž, that is, ๐‘…๐’žโŸ‚(๐‘๐’ž,๐œƒ)โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž. What follows could be further generalized by assuming the sampling mechanism to be ignorable so that ๐‘…๐’žโŸ‚(๐‘๐’žโงต๐’ช,๐œƒ)โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž,๐‘๐’ช, but since most common cohort sampling mechanisms go under the former assumption, and it will simplify the exposition, we will proceed with that. In addition, we assume the random vectors (๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–,๐‘๐‘–) either to be (infinitely) exchangeable over unit indices ๐‘– (cf. [21, page 40]), or equivalently, conditionally independent given the collection of all relevant parameters ๐œƒ. It should be noted that the exchangeability assumption needs not to be extended to the inclusion indicators ๐‘…๐‘–. Now, following, for example, Saarela et al. [8], we can write a full (observed data) likelihood expression๐‘ƒ๎€ท๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ,๐œƒ(๐›ผ,๐›ฝ,๐›พ)โˆ๎‘๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–โˆฃ๐‘‹๐‘–๎€ธร—๎‘,๐›พ๐‘–โˆˆ๐’žโงต๐’ช๎€œ๐‘ง๐‘–๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ฝ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ.,๐›พ(2.1) In the secondary analysis situation, only the parameters ๐›ฝ are of interest, while ๐›ผ and ๐›พ remain present as nuisance parameters. There are potential drawbacks related to the above likelihood expression; it requires integration over the unobserved covariate data in the set ๐’žโงต๐’ช (which may be large compared to ๐’ช), as well as modeling of the population distribution of ๐‘๐‘–. If possible, we would like to avoid this due to the required estimation of the nuisance parameters ๐›พ and the risk of model misspecification. Furthermore, observed data likelihoods may become sensitive to misspecification of the model for the response variable; the missing data can act similarly to extra parameters, and the actual model parameters may lose their intended interpretation. This is a real problem especially in cohort sampling designs with a rare event of interest, since the proportion of uncollected covariate data in the study cohort may then be very high. We demonstrate such a situation with simplified simulation example in Section 6.2.

3. Methods Based on Inverse Probability Weighting

Valid estimates for the parameters of the secondary outcome model could alternatively be obtained by using inverses of the first-order inclusion probabilities ๐‘ƒ(๐‘…๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž) (assumed here to be known) as weights (e.g., [22, 23]). The weighted log-likelihood function, approximating the corresponding complete data log-likelihood, to be used for the secondary analysis is๎“๐‘–โˆˆ๐’ช๎€ท๐‘Œlog๐‘ƒ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐›ฝ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž๎€ธ=๎“๐‘–โˆˆ๐’ž๐Ÿ{๐‘…๐‘–=1}๎€ท๐‘Œlog๐‘ƒ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐›ฝ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž๎€ธ.(3.1) In a typical cohort sampling design, all cases would receive unit weights, while noncases selected to the set ๐’ช would receive weights greater than one, inverse to their probability to be included in the set of controls/subcohort. While this kind of weighting results in unbiased estimation of the parameters of interest, it will potentially result in reduced efficiency compared to the full and conditional likelihood approaches, since in (3.1) cases receive smaller weights compared to noncases irrespective of whether the case status is actually associated to the secondary outcome or the covariate of interest, and thus the information in these observations may not be fully utilized. We will demonstrate this in a simulation study in Section 6.1. Theoretical justification for estimating function (3.1), as well as variance estimation is discussed in Appendix A. A variation of the above Horvitz-Thompson [24] type of weighting would be poststratification-based estimation (e.g., [15, 25]), with the stratification carried out over the relevant first-phase variables.

4. Conditional Likelihood Inference under Cohort Sampling Designs

4.1. Definition

A very general definition for conditional likelihood is given by Cox and Hinkley [26, pages 16-17] and Cox [27, page 269] as follows. Let ๐‘ˆ be a random vector corresponding to all observed data, and let this be partitioned into relevant subsets ๐‘ˆ=(๐‘‰,๐‘Š), the transformation not depending on unknown parameters ๐œƒ. The conditional likelihood given the realization ๐‘‰=๐‘ฃ is then the conditional probability distribution ๐‘ƒ(๐‘Šโˆˆ๐‘‘๐‘คโˆฃ๐‘‰=๐‘ฃ,๐œƒ) with respect to ๐œƒ. Few generally accepted rules for choosing the partitioning can be found from the literature, one of these being conditioning on an ancillary statistic or something close to that in order to eliminate nuisance parameters (e.g., [28, 29]). In contrast, although working under the same general definition, here we condition on a sampling mechanism, in order to restrict the analysis into a subgroup for which a useful likelihood expression can be written. Generally such a conditioning may lose information on ๐œƒ, but will nevertheless produce valid estimates. The rules we set out for choosing the partitioning to construct a conditional likelihood under the two-phase study setting are as follows. (1) Condition on the sampling mechanism, that is, the set of inclusion indicators ๐‘…๐’ž, which produced the second-phase study group ๐’ช onto which the analysis is to be restricted. Do not condition on any further information on the sampling mechanism, such as inclusion in the subcohort or the set of controls, since this can easily lead to overconditioning which will lose a lot of information. In our notation, such additional information on the sampling mechanism is implicitly included in ๐‘Š and, given the assumptions stated in Section 2, will cancel out of the resulting likelihood expression. (2)Other observed variables may be placed into ๐‘‰ or ๐‘Š at will, depending on the parameters of interest. For instance, if the parameters ๐›พ are not of interest, we may condition on ๐‘๐’ช by placing it in ๐‘‰. If the parameters ๐›พ need to be estimated, ๐‘๐’ช may be placed in ๐‘Š.(3)We must have ๐‘ƒ(๐‘Šโˆˆ๐‘‘๐‘คโˆฃ๐‘‰=๐‘ฃ,๐œƒ)๐‘ƒ(๐‘‰โˆˆ๐‘‘๐‘ฃโˆฃ๐œƒ)=๐‘ƒ(๐‘ˆโˆˆ๐‘‘๐‘ขโˆฃ๐œƒ), that is, all the relevant observed variables must either be modeled or conditioned upon.

Applying these conditioning rules will reproduce the conditional likelihood expressions obtained previously in special cases of the current framework by Langholz and Goldstein [18] and Saarela and Kulathinal [19]. The proposed conditional likelihood framework also includes the familiar retrospective likelihood, often suggested for analysis under case-control designs (e.g., [30, 31]), as a special case (see Appendix B). Whereas Langholz and Goldstein [18] considered only the special case of logistic likelihoods, here we aim to first derive the likelihood expressions in the general case before substituting in any specific models.

4.2. Conditional Likelihood Expression: The General Case

Following the stated rules, and making the same general assumptions as in Section 2, we proceed by partitioning the observed data into relevant subsets ๐‘Šโ‰ก(๐‘‡๐’ช,๐ธ๐’ช,๐‘Œ๐’ช) and ๐‘‰โ‰ก(๐‘…๐’ž,๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช) and, using a shorthand notation of ๐‘„๐‘–โ‰ก(๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–) for all outcome variables, work with a conditional likelihood ๐‘ƒ๎€ท๐‘„๐’ชโˆฃ๐‘…๐’ž,๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๐‘ƒ๎€ท๐‘…,๐œƒ๐’žโˆฃ๐‘„๐’ž,๐‘‹๐’ž,๐‘๐’ช๎€ธ๐‘ƒ๎€ท๐‘„,๐œƒ๐’ž,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ๐‘ƒ๎€ท๐‘„,๐œƒ๐’žโงต๐’ช,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ,๐œƒ๐œƒโˆ๐‘ƒ๎€ท๐‘„๐’ž,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ๐‘ƒ๎€ท๐‘„,๐œƒ๐’žโงต๐’ช,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ,,๐œƒ(4.1) where the proportionality follows from the assumption on unconfounded second-phase sampling mechanism. In Appendix C, we show that such a conditional likelihood indeed has the properties of a likelihood, that is, a score function with zero mean and variance equal to the Fisher information. Asymptotic normality of the maximum likelihood estimators based on the conditional likelihood follows obviously from these results in the special case of Bernoulli sampling (Section 4.3.1), although in the general case, this may require further assumptions on the sampling mechanism, such as asymptotic independence of the inclusion indicators ๐‘…๐‘–.

The ratio of the numerator and the second term in the denominator can be further written as ๐‘ƒ๎€ท๐‘„๐’ž,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘„๐’žโงต๐’ช,๐‘๐’ชโˆฃ๐‘‹๐’ž๎€ธ=โˆ,๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘„๐‘–,๐‘๐‘–โˆฃ๐‘‹๐‘–๎€ธโˆ,๐œƒ๐‘–โˆˆ๐’žโงต๐’ชโˆซ๐‘ง๐‘–๐‘ƒ๎€ท๐‘„๐‘–,๐‘๐‘–โˆˆ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ,๐œƒโˆ๐‘–โˆˆ๐’ชโˆซ๐‘ž๐‘–๐‘ƒ๎€ท๐‘„๐‘–โˆˆ๐‘‘๐‘ž๐‘–,๐‘๐‘–โˆฃ๐‘‹๐‘–๎€ธโˆ,๐œƒ๐‘–โˆˆ๐’žโงต๐’ชโˆซ๐‘ง๐‘–๐‘ƒ๎€ท๐‘„๐‘–,๐‘๐‘–โˆˆ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ=๎‘,๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘„๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ.,๐œƒ(4.2) Here, the product forms follow from the exchangeability assumption for the random vectors (๐‘„๐‘–,๐‘‹๐‘–,๐‘๐‘–). Thus, we have obtained ๐‘ƒ๎€ท๐‘„๐’ชโˆฃ๐‘…๐’ž,๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=โˆ,๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘„๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ,๐œƒ,(4.3) where the numerator factors into the parametric models ๐‘ƒ(๐‘„๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐œƒ)=๐‘ƒ(๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–,๐›ผ)๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ). The denominator is the conditional likelihood correction term (sometimes called ascertainment correction, e.g., Ma et al. [32]). Its specific form depends on the second-phase sampling mechanism, and in the general case, it does not reduce into a product form. It depends on the model parameters, since it is not conditioned on all of (๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’ž). Generally, the challenge in conditional likelihood correction terms is in representing them in terms of parameters estimable from the data. Here, the term can be written as ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๎€œ,๐œƒ๐‘ž๐’ช๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’ž,๐‘‹๐’ž๎€ธ๐‘ƒ๎€ท๐‘„๐’ชโˆˆ๐‘‘๐‘ž๐’ชโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ,๐œƒ,(4.4) where ๐‘ƒ(๐‘„๐’ชโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช,๐œƒ) is given by (4.2), and we have ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๎€œ,๐œƒ๐‘ฆ๐‘–โˆถ๐‘–โˆˆ๐’ช๎€œ๐‘ก๐‘–โˆถ๐‘–โˆˆ๐’ช๎“๐‘’๐‘–โˆถ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’ž๎€ธร—๎‘๐‘–โˆˆ๐’ช๎€บ๐‘ƒ๎€ท๐‘‡๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–.,๐›ฝ๎€ธ๎€ป(4.5) Here, the first term inside the integral specifies the sampling mechanism and is assumed to be known. The obtained likelihood expression utilizes all observed data on covariates ๐‘๐‘–, and it can be easily seen that if there is no association between (๐‘‡๐‘–,๐ธ๐‘–) and ๐‘๐‘– and between ๐‘๐‘– and ๐‘‹๐‘–, we lose no information relevant to learning about the association between ๐‘Œ๐‘– and ๐‘๐‘–, when the conditional likelihood (4.3) is used instead of the corresponding full likelihood (2.1). This was also demonstrated in the simulation study of Saarela and Kulathinal [19]. On the other hand, if the other observed data do give significant information on the unobserved covariate values ๐‘๐’žโงต๐’ช, efficiency could potentially be improved by using the full likelihood which utilizes all observed data, with the cost of having to specify a model for ๐‘๐‘–. In Section 6.1, the efficiencies of expressions (2.1) and (4.3) are compared in a simulated setting.

4.3. Special Cases: Case-Cohort and Nested Case-Control Designs
4.3.1. Case-Cohort/Bernoulli Sampling

Here, we are mainly interested in a variation of the โ€œefficient case-cohort designโ€ suggested by Kim and De Gruttola [33, pages 155-156] as an alternative to sampling within strata. To improve efficiency in sampling of the subcohort, here the distribution of some key covariates (in the present notation either ๐‘‹๐’ž or both (๐‘‹๐’ž,๐‘Œ๐’ž)) in the subcohort is approximately matched to that of the cases by first fitting a logistic regression model, say, logit{๐‘ƒ(๐ธ๐‘–=1โˆฃ๐‘‹๐‘–,๐œ‡)}=๐œ‡โ€ฒ๐‘‹๐‘–, and then selecting the subcohort using Bernoulli sampling with the probabilities ๐œ‹๐‘–โ‰ก1/(1+exp{โˆ’๎๐œ‡โ€ฒ๐‘‹๐‘–}), independently of the case status. We then have ๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’ž,๎๐œ‡)=๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–,๎๐œ‡), where in practice, we make the approximation ๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–,๎๐œ‡)โ‰ˆ๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–,๐œ‡) and will subsequently suppress the sampling mechanism parameters from the notation. The selection probabilities may also be rescaled to give a desired expected subcohort size ๐‘š as ๐œ‹โˆ—๐‘–โ‰ก๐‘š๐œ‹๐‘–/โˆ‘๐‘–โˆˆ๐’ž๐œ‹๐‘–โ‰ˆ๐‘š๐œ‹๐‘–/(๐‘๐‘ƒ(๐ธ๐‘–=1)). More generally, the subcohort selection probability ๐œ‹๐‘– may be any known function of (๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–), that is, ๐œ‹๐‘–โ‰ก๐œ‹(๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–), with the sampling design specified by ๐‘ƒ(๐‘…๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’žโˆ)=๐‘–โˆˆ๐’ž๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–โˆ)=๐‘–โˆˆ๐’ž๐œ‹(๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–)๐‘…๐‘–(1โˆ’๐œ‹(๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–))1โˆ’๐‘…๐‘–. The special case of Bernoulli sampling is of interest, because here the product of the first-order inclusion probabilities specifies the joint distribution of the inclusion indicators (see, e.g., [34, pages 62-63]), which considerably simplifies the analysis using conditional likelihood. In the case of stratified without replacement sampling, the following could only be interpreted as a first-order approximation when the sampling fractions are small. In the Bernoulli sampling case, the conditional likelihood correction term reduces into the product form ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๎‘,๐œƒ๐‘–โˆˆ๐’žโงต๐’ช๐‘ƒ๎€ท๐‘…๐‘–=0โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–๎€ธร—๎€œ๐‘ฆ๐‘–โˆถ๐‘–โˆˆ๐’ช๎€œ๐‘ก๐‘–โˆถ๐‘–โˆˆ๐’ช๎“๐‘’๐‘–โˆถ๐‘–โˆˆ๐’ช๎‘๐‘–โˆˆ๐’ช๎€บ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘ก๐‘–,๐‘’๐‘–,๐‘ฆ๐‘–,๐‘‹๐‘–๎€ธ๎€ท๐‘‡ร—๐‘ƒ๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ๎€ธ๎€ป(๐›ผ,๐›ฝ)โˆ๎‘๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎€œ๐‘ก๐‘–๎“๐‘’๐‘–๎€บ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘ก๐‘–,๐‘’๐‘–,๐‘ฆ๐‘–,๐‘‹๐‘–๎€ธ๎€ท๐‘‡ร—๐‘ƒ๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–.,๐›ฝ๎€ธ๎€ป(4.6) Let โ„ฐโ‰ก{๐‘–โˆถ๐ธ๐‘–=1} denote the set of individuals who died during the followup, and let ๐‘‘โ‰ก|โ„ฐ|. In the case-cohort design discussed in Kulathinal and Arjas [35], Kulathinal et al. [13], and Saarela and Kulathinal [19], all cases are selected to the case-cohort set, so that ๐‘ƒ(๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–=1,๐‘Œ๐‘–,๐‘‹๐‘–)=1, ๐‘–โˆˆโ„ฐ, while to increase the efficiency of the design, the subcohort is selected with probabilities which depend on the age ๐‘๐‘– at the start of the followup (included in the ๐‘‹๐‘– covariates) through a logistic model as discussed above, giving the inclusion probability for non-cases as ๐‘ƒ(๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–=0,๐‘Œ๐‘–,๐‘‹๐‘–)=๐œ‹(๐‘๐‘–), ๐‘–โˆˆ๐’žโงตโ„ฐ. Under Bernoulli sampling, the realized sample size ๐‘›โ‰ก|๐’ช| is random, with the expected sample size in the present example given by ๐ธ(๐‘›โˆฃ๐ธ๐’ž,๐‘‹๐’žโˆ‘)=๐‘‘+๐‘–โˆˆ๐’žโงตโ„ฐ๐œ‹(๐‘๐‘–). In the case of a mortality outcome and type I censoring at predetermined times ๐‘๐‘–, with the observed time given by ๐‘‡๐‘–๎‚๐‘‡โ‰กmin(๐‘–,๐‘๐‘–), where ๎‚๐‘‡๐‘– is the latent time of death, (4.6) further simplifies into (see also [19, pages 12-13])๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ,๐œƒ(๐›ผ,๐›ฝ)โˆ๎‘๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎€œ๐‘ก๐‘–โˆˆ[0,๐‘๐‘–]๎€บ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘ก๐‘–,๐ธ๐‘–=1,๐‘ฆ๐‘–,๐‘‹๐‘–๎€ธ๐‘ƒ๎€ท๐‘‘๐‘ก๐‘–,๐ธ๐‘–=1โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๎€ท๐‘…,๐›ผ+๐‘ƒ๐‘–=1โˆฃ๐‘ก๐‘–,๐ธ๐‘–=0,๐‘ฆ๐‘–,๐‘‹๐‘–๎€ธ๐‘ƒ๎€ท๐‘‘๐‘ก๐‘–,๐ธ๐‘–=0โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ท,๐›ผ๎€ธ๎€ปร—๐‘ƒ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ=๎‘,๐›ฝ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎€บ๐‘ƒ๎€ท0โ‰ค๐‘‡๐‘–<๐‘๐‘–,๐ธ๐‘–=1โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎€ท๐‘‡๐‘–โˆˆ๐‘‘๐‘๐‘–,๐ธ๐‘–=0โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๐‘ƒ๎€ท,๐›ผ๎€ธ๎€ป๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ=๎‘,๐›ฝ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎‚ƒ๐‘ƒ๎‚€๎‚๐‘‡0โ‰ค๐‘–<๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎‚๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎‚€๎‚๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๐‘ƒ๎€ท,๐›ผ๎‚๎‚„๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ=๎‘,๐›ฝ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎‚ƒ๎€ท๎€ท๐‘1โˆ’1โˆ’๐œ‹๐‘–๐‘ƒ๎‚€๎‚๐‘‡๎€ธ๎€ธ๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๐‘ƒ๎€ท,๐›ผ๎‚๎‚„๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ.,๐›ฝ(4.7) Hence, (4.6) could be represented in terms of the survival function. Suppose now that in addition to the type I censoring due to the end of the followup period, there may be random censoring during the followup; for instance, this may be the case if the outcome in the case-cohort design (๐ธ๐‘–=1) is not all-cause mortality, but, say, mortality due to cardiovascular diseases. Deaths due to other causes, denoted as ๐ธ๐‘–=2, then appear as censoring. As before, let ๎‚๐‘‡๐‘– denote the latent time of death, with ๐ธ๐‘– indicating the type of death or end of followup period. Similarly as above, we get ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๎‘,๐œƒ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎€บ๐‘ƒ๎€ท0โ‰ค๐‘‡๐‘–<๐‘๐‘–,๐ธ๐‘–=1โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎€ท0โ‰ค๐‘‡๐‘–<๐‘๐‘–,๐ธ๐‘–=2โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎€ท๐‘‡๐‘–โˆˆ๐‘‘๐‘๐‘–,๐ธ๐‘–=0โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๐‘ƒ๎€ท,๐›ผ๎€ธ๎€ป๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ=๎‘,๐›ฝ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎‚ƒ๐‘ƒ๎‚€๎‚๐‘‡0โ‰ค๐‘–<๐‘๐‘–,๐ธ๐‘–=1โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎‚๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎‚€๎‚๐‘‡0โ‰ค๐‘–<๐‘๐‘–,๐ธ๐‘–=2โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎‚๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎‚€๎‚๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๐‘ƒ๎€ท,๐›ผ๎‚๎‚„๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ.,๐›ฝ(4.8) Here, the probabilities ๎‚๐‘‡๐‘ƒ(0โ‰ค๐‘–<๐‘๐‘–,๐ธ๐‘–=๐‘˜โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–,๐›ผ), ๐‘˜โˆˆ{1,2}, are given by the cumulative incidence functions and ๎‚๐‘‡๐‘ƒ(๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–,๐›ผ) by the joint survival function of a competing risks survival model for the two types of death (e.g., [36, pages 251-252]). These are in principle identifiable from the observed data for the set ๐’ช, since the subcohort has been selected independently of the case status and thus will include a number of other deaths. On the other hand, if this number is very small, the middle term in the above sum contributes little to the correction term, and thus unstable estimation of the corresponding parameters does not necessarily hinder the estimation of the parameters of interest.

4.3.2. Risk Set Sampling

Consider now a nested case-control sampling mechanism in which all cases are selected to the case-control set with probability one, and ๐‘š controls per case ๐‘— are selected using without replacement sampling from the risk set โ„›๐‘—โ‰ก{๐‘–โˆถ๐ด๐‘–(๐‘‡๐‘—)=1}โงต{๐‘—} (of size ๐‘Ÿ๐‘—โ‰ก|โ„›๐‘—โˆ‘|=๐‘–โˆˆ๐’ž๐ด๐‘–(๐‘‡๐‘—)โˆ’1), where ๐ด๐‘–(๐‘‡๐‘—) is the at-risk indicator for subject ๐‘– at event time ๐‘‡๐‘—. This is carried out independently for all cases ๐‘—โˆˆโ„ฐ. Let the sampled set of time-matched controls for case ๐‘— be denoted as ๐’ฎ๐‘—โŠ†โ„›๐‘—, some of which may also be future cases since the sampling is carried out without regard to the future case status of the individuals in the risk set. Let ๐’ฎโ„ฐโ‰ก{๐’ฎ๐‘—โˆถ๐‘—โˆˆโ„ฐ} denote the collection of the sampled risk sets and โ‹ƒ๐’ฎโ‰ก๐‘—โˆˆโ„ฐ๐’ฎ๐‘— the pooled set of sampled controls. Noting that knowing all of (๐‘‡๐’ž,๐ธ๐’ž) specifies the risk sets โ„›โ„ฐโ‰ก{โ„›๐‘—โˆถ๐‘—โˆˆโ„ฐ} as well as the order of the events, the risk set sampling mechanism is specified by the joint probability distribution ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’ž๎€ธ๎€ท๐‘…=๐‘ƒ๐’žโˆฃโ„›โ„ฐ๎€ธ=๎“,โ„ฐ๐’ฎโ„ฐโˆถ||๐’ฎ๐‘—||=๐‘š,๐‘—โˆˆโ„ฐ๐‘ƒ๎€ท๐‘…๐’žโˆฃโ„›โ„ฐ,๐’ฎโ„ฐ๎€ธ๐‘ƒ๎€ท๐’ฎ,โ„ฐโ„ฐโˆฃโ„›โ„ฐ๎€ธ=๎“,โ„ฐ๐’ฎโ„ฐโˆถ||๐’ฎ๐‘—||=๐‘š,๐‘—โˆˆโ„ฐ๐Ÿ{๐’ฎโˆชโ„ฐ=๐’ช}๎‘๐‘—โˆˆโ„ฐ๐‘ƒ๎€ท๐’ฎ๐‘—โˆฃโ„›๐‘—๎€ธ=๎‘๐‘—โˆˆโ„ฐ1๎€ท๐‘Ÿ๐‘—๐‘š๎€ธ๎“๐’ฎโ„ฐโˆถ||๐’ฎ๐‘—||=๐‘š,๐‘—โˆˆโ„ฐ๐Ÿ{๐’ฎโˆชโ„ฐ=๐’ช}.(4.9) If the sampling has been carried out within strata specified by the covariates ๐‘Œ๐’ž and/or ๐‘‹๐’ž through some function ๐‘”(๐‘‹๐‘–,๐‘Œ๐‘–), the risk sets would be defined within strata as โ„›๐‘—={๐‘–โˆถ๐ด๐‘–(๐‘‡๐‘—)=1,๐‘”(๐‘‹๐‘–,๐‘Œ๐‘–)=๐‘”(๐‘‹๐‘—,๐‘Œ๐‘—)}โงต{๐‘—}, and the above reasoning would apply similarly with the redefined risk sets. The conditional likelihood correction term now becomes ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๎€œ,๐œƒ๐‘ฆ๐‘–โˆถ๐‘–โˆˆ๐’ช๎€œ๐‘ก๐‘–โˆถ๐‘–โˆˆ๐’ช๎“๐‘’๐‘–โˆถ๐‘–โˆˆ๐’ช๎‘๐‘—โˆˆโ„ฐ1๎€ท๐‘Ÿ๐‘—๐‘š๎€ธ๎“๐’ฎโ„ฐโˆถ||๐’ฎ๐‘—||=๐‘š,๐‘—โˆˆโ„ฐ๐Ÿ{๐’ฎโˆชโ„ฐ=๐’ช}ร—๎‘๐‘–โˆˆ๐’ช๎€บ๐‘ƒ๎€ท๐‘‡๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,,๐›ฝ๎€ธ๎€ป(4.10) which does not reduce into a product form. Exact numerical evaluation of this term would require enumeration of all possible combinations of sampled risk sets of size ๐‘š which would result in the observed case-control set ๐’ช. This is unlikely to be feasible in practice with realistic sample sizes, which is why consideration of approximations may be necessary. Samuelsen [23] showed the inclusion indicators under risk set sampling to be asymptotically pairwise uncorrelated, with the first-order inclusion probabilities for the noncases ๐‘–โˆˆ๐’žโงตโ„ฐ given by ๐‘ƒ(๐‘…๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’žโงต{๐‘–},๐ธ๐‘–โˆ=0)=1โˆ’๐‘—โˆˆโ„ฐ[1โˆ’๐‘š๐ด๐‘–(๐‘‡๐‘—)/๐‘Ÿ๐‘—]. Thus, it might be tempting to replace (4.9) with โˆ๐‘–โˆˆ๐’ž๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’ž). However, the properties of such an approximation will require further research.

5. Extensions

5.1. Missing Second-Phase Covariate Data

Further issue to be considered in practical applications is possible missing covariate data within the set ๐’ช. For instance, if the covariates to be collected under the cohort sampling design are genotypes, after selection of subjects to be genotyped, it may turn out that the DNA amount or concentration is too low or the genotyping is otherwise unsuccessful. Let ๐‘€๐’ชโ‰ก{๐‘€๐‘–โˆถ๐‘–โˆˆ๐’ช} be a set of indicator variables indicating whether the measurement of ๐‘๐‘– turned out to be unsuccessful after selection of subject ๐‘– into the set ๐’ช, and let โ„ณโ‰ก{๐‘–โˆถ๐‘€๐‘–=1}โŠ†๐’ช. Now ๐‘๐‘– is observed only on the set ๐’ชโงตโ„ณ. We consider the implications of this separately for all of the three approaches discussed above.

5.1.1. Full Likelihood

If the missingness can be assumed missing at random, that is, ๐‘€๐’ชโŸ‚(๐‘โ„ณ,๐œƒ)โˆฃ๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž,๐‘๐’ชโงตโ„ณ, the missingness indicators can again be included in the proportionality constant, and the full likelihood expression becomes ๐‘ƒ๎€ท๐‘…๐’ž,๐‘€๐’ช,๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘๐’ชโงตโ„ณโˆฃ๐‘‹๐’ž๎€ธ,๐œƒ(๐›ผ,๐›ฝ,๐›พ)โˆ๎‘๐‘–โˆˆ๐’ชโงตโ„ณ๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–โˆฃ๐‘‹๐‘–๎€ธร—๎‘,๐›พ๐‘–โˆˆ๐’žโงต๐’ชโˆชโ„ณ๎€œ๐‘ง๐‘–๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ฝ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ.,๐›พ(5.1)

5.1.2. Inverse Probability Weighting

If the missingness mechanism would be known (and missing at random), a weighted pseudolikelihood expression for the set ๐’ชโงตโ„ณ could be obtained by multiplying the original selection probabilities by the probabilities of observing the value of ๐‘๐‘– after selection ๎“๐‘–โˆˆ๐’ชโงตโ„ณ๎€ท๐‘Œlog๐‘ƒ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐›ฝ๐‘ƒ๎€ท๐‘€๐‘–=0โˆฃ๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž,๐‘๐’ชโงตโ„ณ๎€ธ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž๎€ธ.(5.2) In practice, however, the missingness mechanism would have to be modeled to obtain estimates for these probabilities.

5.1.3. Conditional Likelihood

By partitioning the observed data into ๐‘Š=(๐‘‡๐’ชโงตโ„ณ,๐ธ๐’ชโงตโ„ณ,๐‘Œ๐’ชโงตโ„ณ) and ๐‘‰=(๐‘…๐’ž,๐‘€๐’ช,๐‘‡๐’žโงต๐’ชโˆชโ„ณ,๐ธ๐’žโงต๐’ชโˆชโ„ณ,๐‘Œ๐’žโงต๐’ชโˆชโ„ณ,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ), we obtain a conditional likelihood ๐‘ƒ๎€ท๐‘„๐’ชโงตโ„ณโˆฃ๐‘…๐’ž,๐‘€๐’ช,๐‘„๐’žโงต๐’ชโˆชโ„ณ,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ๎€ธ,๐œƒ๐œƒโˆ๐‘ƒ๎€ท๐‘€๐’ชโˆฃ๐‘…๐’ž,๐‘„๐’ž,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘€๐’ชโˆฃ๐‘…๐’ž,๐‘„๐’žโงต๐’ชโˆชโ„ณ,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ๎€ธโˆ,๐œƒ๐‘–โˆˆ๐’ชโงตโ„ณ๐‘ƒ๎€ท๐‘„๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ชโˆชโ„ณ,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ๎€ธ,,๐œƒ(5.3) where ๐‘ƒ(๐‘…๐’žโˆฃ๐‘„๐’žโงต๐’ชโˆชโ„ณ,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ,๐œƒ) is obtained from (4.5) by replacing the set ๐’ช with ๐’ชโงตโ„ณ. Unlike the second-phase sampling mechanism, the missingness mechanism is generally unknown. From (5.3), it can be seen that if we are willing to assume that ๐‘โ„ณ are either missing completely at random or that ๐‘€๐’ช depends only on ๐‘‹๐’ž, the terms involving the missingness indicators cancel out of the likelihood. On the other hand, if the missingness may depend on the response variables ๐‘„๐’ž, the missingness mechanism would have to be modeled. However, in such a case, it may be easier to work with the partitioning ๐‘Š=(๐‘€๐’ช,๐‘‡๐’ช,๐ธ๐’ช,๐‘Œ๐’ช,๐‘๐’ชโงตโ„ณ) and ๐‘‰=(๐‘…๐’ž,๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž) and model the population distribution of ๐‘๐‘– rather than trying to estimate parameters describing the missingness mechanism. This in fact corresponds to what was done by Saarela and Kulathinal [19] and was required there because haplotypes are only partially identifiable from unphased genotype data.

5.2. Incident Outcomes and Left Truncation

Previous discussion was specific to a primary mortality outcome using time on study as the main time scale. In this section, we discuss separately how the different methods can accommodate cohort sampling for incident nonfatal primary outcomes. In the analysis of secondary, non-time-to-event outcomes, the presence of left truncation due to exclusion of cases of prevalent disease presents an additional complication. If the parameters of the secondary outcome model correspond to the background population alive at the cohort baseline (rather than to the disease-free population), this additional selection factor requires further adjustment. If the primary outcome is a mortality endpoint, this is not an issue, since then there is no further selection due to prevalent conditions. In likelihood-based adjustment for left truncation, the main time scale of the analysis has to be chosen as age instead of time on study. In survival modeling, it is well known that conditioning on event-free survival until the age at study baseline corresponds to exclusion of the followup time before that (e.g., [37, page 580], and the references therein). However, this is no longer true in the case of missing covariate data (e.g., [8, pages 5997-5998]), or indeed in the analysis of secondary outcomes, as will be demonstrated below. The set notations ๐’ž and ๐’ช are here taken to refer to the disease free cohort and second-phase study group.

It should also be noted that under case-cohort designs it is common to collect second-phase covariate data for more than a single outcome, since the case-cohort design naturally enables the analysis of multiple outcomes using a single subcohort selection. This is also the case in our example cohort discussed in Sections 1 and 6.3, where, in addition to cases of all-cause mortality, genotype data has been collected also on cases of nonfatal incident cardiovascular disease events. To keep the example simple, in the data analysis, we consider only the case-cohort set for all-cause mortality. However, it is in principle straightforward to accommodate multiple outcome types in the likelihood expressions discussed below by using competing risks type notation where ๐‘‡๐‘– denotes the observed time of the first incident disease event, death, or censoring, with ๐ธ๐‘–=0 indicating censoring and ๐ธ๐‘–โˆˆ{1,2,โ€ฆ,๐พ} the different types of outcome events for which the second-phase covariate data is collected. Since utilizing the likelihood expressions does not necessitate estimation of ๐พ different hazard functions, the endpoint definitions may be pooled as seen suitable.

5.2.1. Full Likelihood

The full likelihood expression (2.1) is now conditioned on the selection rule ๐‘‡๐’žโ‰ฅ๐‘๐’žโ‰ก๐‘‡๐‘–โ‰ฅ๐‘๐‘– for all ๐‘–โˆˆ๐’ž, that is, event-free survival until the age at the cohort baseline. The likelihood expression becomes ๐‘ƒ๎€ท๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘๐’ชโˆฃ๐‘‡๐’žโ‰ฅ๐‘๐’ž,๐‘‹๐’ž๎€ธ,๐œƒ(๐›ผ,๐›ฝ,๐›พ)โˆ๎‘๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–โˆฃ๐‘‹๐‘–๎€ธ,๐›พโˆซ๐‘ฆ๐‘–โˆซ๐‘ง๐‘–๐‘ƒ๎€ท๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ผd๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ฝ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธร—๎‘,๐›พ๐‘–โˆˆ๐’žโงต๐’ชโˆซ๐‘ง๐‘–๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ฝ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ,๐›พโˆซ๐‘ฆ๐‘–โˆซ๐‘ง๐‘–๐‘ƒ๎€ท๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ผ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท,๐›ฝ๐‘‘๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ.,๐›พ(5.4)

5.2.2. Inverse Probability Weighting

The weighted pseudolikelihood approach does not readily take into account additional selection which occurs in studies of incident outcomes. This was also pointed out by Reilly et al. [15], who in their discussion note that their reweighting approach for case-control studies is not valid for incident or nested case-control studies, since the incident cases and healthy controls available cannot be reweighted to represent the general population. Although this is true for the basic estimating function (3.1), similarly as in the missing data case, a double weighting mechanism can be devised to account for the additional selection step as ๎“๐‘–โˆˆ๐’ช๎€ท๐‘Œlog๐‘ƒ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐›ฝ๐‘ƒ๎€ท๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘…,๎๐›ผ๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž๎€ธ.(5.5) Since this weighting requires estimation of the time-to-event model parameters ๐›ผ, resampling-based variance estimation would be required to obtain valid standard errors for the estimates of ๐›ฝ to account for the uncertainty in estimation of the weights.

5.2.3. Conditional Likelihood

As in Section 4.2, we get ๐‘ƒ๎€ท๐‘„๐’ชโˆฃ๐‘…๐’ž,๐‘‡๐’žโ‰ฅ๐‘๐’ž,๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=โˆ,๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘„๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘…๐’ž,๐‘‡๐’žโ‰ฅ๐‘๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ,๐œƒ,(5.6) where the correction term becomes ๐‘ƒ๎€ท๐‘…๐’ž,๐‘‡๐’žโ‰ฅ๐‘๐’žโˆฃ๐‘„๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ=๎€œ,๐œƒ๐‘ฆ๐‘–โˆถ๐‘–โˆˆ๐’ช๎€œ๐‘ก๐‘–โˆˆ[๐‘๐‘–,โˆž)โˆถ๐‘–โˆˆ๐’ช๎“๐‘’๐‘–โˆถ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’ž๎€ธร—๎‘๐‘–โˆˆ๐’ช๐‘‡๎€บ๎€ท๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–.,๐›ฝ๎€ธ๎€ป(5.7) In the case-cohort design discussed in Section 4.3.1, with type I censoring, this further simplifies into ๐‘ƒ๎€ท๐‘…๐’ž,๐‘‡๐’žโ‰ฅ๐‘๐’žโˆฃ๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž,๐‘๐’ช๎€ธ,๐œƒ(๐›ผ,๐›ฝ)โˆ๎‘๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎€œ๐‘ก๐‘–โˆˆ[๐‘๐‘–,๐‘๐‘–]๎“๐‘’๐‘–๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘ก๐‘–,๐‘’๐‘–,๐‘ฆ๐‘–,๐‘‹๐‘–๎€ธ๎€ท๐‘‡ร—๐‘ƒ๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ=๎‘,๐›ฝ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎‚ƒ๐‘ƒ๎‚€๐‘๐‘–โ‰ค๎‚๐‘‡๐‘–<๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎‚๎€ท๐‘,๐›ผ+๐œ‹๐‘–๎€ธ๐‘ƒ๎‚€๎‚๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ท๐‘Œ,๐›ผ๎‚๎‚„ร—๐‘ƒ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ=๎‘,๐›ฝ๐‘–โˆˆ๐’ช๎€œ๐‘ฆ๐‘–๎‚ƒ๐‘ƒ๎‚€๎‚๐‘‡๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎‚โˆ’๎€ท๎€ท๐‘,๐›ผ1โˆ’๐œ‹๐‘–๐‘ƒ๎‚€๎‚๐‘‡๎€ธ๎€ธ๐‘–โ‰ฅ๐‘๐‘–โˆฃ๐‘‹๐‘–,๐‘ฆ๐‘–,๐‘๐‘–๎€ท๐‘Œ,๐›ผ๎‚๎‚„ร—๐‘ƒ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ.,๐›ฝ(5.8)

6. Illustrations

6.1. Simulation Study

In order to compare the efficiency of the alternative estimation methods discussed above, namely, full likelihood, conditional likelihood, and weighted pseudolikelihood, we supplemented the cohort data described in Section 1 with a simulated covariate, following the real-data-based simulation approach described by Saarela et al. in [8, pages 5998โ€“6000], in order to have the simulation setting resemble as closely as possible the real data analysis setting of Section 6.3. This Bayesian procedure corresponds to multiple imputation [21], where given the observed data (๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž) and predetermined associations with the primary and secondary outcomes, an additional โ€œmissingโ€ covariate ๐‘๐’ž is simulated from the posterior predictive distribution ๐‘ƒ(๐‘๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž). Case-cohort sampling was then simulated in the complete datasets (๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž,๐‘๐’ž) thus obtained. The objective of this approach, as compared to using completely simulated data, was to obtain simulation results directly relevant to the real study of interest. We included a set of ๐‘=5039 individuals with a BMI measurement (๐‘Œ๐‘–) available and eligible for the case-cohort study. The primary (time-to-event) outcome (๐‘‡๐‘–,๐ธ๐‘–) was taken to be all-cause mortality, with ๐‘‘=996. Given the primary and secondary outcome data and age at baseline (๐‘‹๐‘–โ‰ก๐‘๐‘–) observed for the cohort, and the current parameter values, additional binary covariate values were drawn from the conditional distributions ๐‘ƒ๎€ท๐‘๐‘–=๐‘ง๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–๎€ธ=๐‘ƒ๎€ท๐‘‡,๐›ผ,๐›ฝ,๐›พ๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–=๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ,๐›พโˆ‘๐‘ง๐‘–โˆˆ{0,1}๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘‹๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–=๐‘ง๐‘–โˆฃ๐‘‹๐‘–๎€ธ,,๐›พ(6.1) where the model for the survival outcome, ๐‘ƒ(๐‘‡๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–,๐›ผ)๐›ผโˆ[๐œ†๐‘–(๐‘ก๐‘–)]๐Ÿ๐‘–{๐‘’=1}๐‘†๐‘–(๐‘ก๐‘–), was specified as a proportional hazards model ๐œ†๐‘–(๐‘ข)โ‰ก๐œ†0(๐‘ข)exp{๐›ผ1๐‘๐‘–+๐›ผ2๐‘Œ๐‘–+๐›ผ3๐‘๐‘–}, with time on study as the time scale and age at baseline as a covariate. Here, the the baseline hazard function ๐œ†0 was specified in terms of a piecewise constant function using 15 time bins of equal length over the followup period of seven years. Flat improper priors were used for all parameters (those not fixed to selected values), on log scale for the nonnegative parameters.

Normal model ๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝโˆผ๐‘(๐›ฝ0+๐›ฝ1๐‘๐‘–+๐›ฝ2๐‘๐‘–,๐›ฝ23) was used for the secondary outcome, while the population distribution of the additional covariate was specified as ๐‘ƒ(๐‘๐‘–=๐‘ง๐‘–โˆฃ๐‘‹๐‘–,๐›พ)=๐›พ๐‘ง๐‘–(1โˆ’๐›พ)1โˆ’๐‘ง๐‘–. Here, parameters ๐›ผ3, ๐›ฝ2, and ๐›พ were fixed at selected values while the other parameters were allowed to be determined by the data and integrated out by drawing from their respective full conditional posterior distributions using Markov chain Monte Carlo sampling. Variables ๐‘Œ๐‘– and ๐‘๐‘– were centered but otherwise untransformed. 1000 values for each ๐‘๐‘–, ๐‘–โˆˆ๐’ž, were obtained by running the MCMC sampler for 25000 rounds after a 10000-round burn-in with each set of fixed values of (๐›ผ3,๐›ฝ2,๐›พ) given in Table 1 and saving every 25th state of the chain. In each of the complete datasets obtained by combining the observed data and the simulated covariate values ๐‘๐‘–, case-cohort selection was carried out by matching the age distribution of the subcohort to that of the cases by first fitting a logistic regression model with age as a covariate to the observed survival status, as detailed in Section 4.3.1. The predictive probabilities from the logistic model were scaled to give an expected subcohort size of 1000, and the subcohort was selected using the obtained probabilities in Bernoulli sampling. This procedure gave an expected case-cohort set size of ๐ธ(๐‘›โˆฃ๐ธ๐’ž,๐‘‹๐’ž)=1761.

In fitting models to the datasets so obtained, we used the same model specifications as above, with the exception of the proportional hazards model, where we fitted a โ€œmisspecifiedโ€ Weibull model ๐œ†๐‘–(๐‘ข)โ‰ก(๐›ผ4/๐›ผ0)(๐‘ข/๐›ผ0)๐›ผ4โˆ’1exp{๐›ผ1๐‘๐‘–+๐›ผ2๐‘Œ๐‘–+๐›ผ3๐‘๐‘–}. Even though this is a different model than the one specified for the simulation step, we did not simulate the time-to-event outcome data, which was taken from the example cohort; a comparison between fitted piecewise constant and Weibull hazards (results not shown) indicated that the Weibull model is adequate for modeling these data, accounting for the increasing baseline mortality rate due to ageing of the cohort. Full likelihood function (2.1), conditional likelihood function (4.3), and weighted pseudolikelihood function (3.1) were maximized with respect to the parameters (๐›ผ,๐›ฝ,๐›พ), (๐›ผ,๐›ฝ), and ๐›ฝ, respectively, substituting in the above parametric models. The numerical optimization was carried out using the optim function of the ๐‘… statistical software, applying the BFGS optimization algorithm [38]. The integrals over ๐‘ฆ๐‘– in (4.7) were evaluated using numerical (quadrature) integration, using the gsl_integration_qagi function of GNU scientific library [39]. The standard errors of the maximum likelihood estimators were evaluated by inverting the numerically differentiated Hessian matrix at the maximum likelihood point. The standard errors of the inverse-probability-weighted estimates were evaluated using the robust variance formula of Appendix A with the observed information (A.4) and observed score (A.5). The results from the 1000 replications are presented in Table 1. Here, the parameters (๐›ผ3,๐›ฝ2,๐›พ), corresponding to the simulated covariate, are of main interest; the other parameters reflect the observed data, and thus the Monte Carlo variances for these are not relevant. The standard errors for the corresponding parameters are given in the real-data example of Section 6.3. The results indicate that conditional likelihood estimation gives similar efficiency compared to the corresponding full likelihood estimates irrespective of whether the covariate of interest was associated with the survival outcome. In contrast, both likelihood-based approaches gave better efficiency compared to inverse probability weighting. This is expected, as the weighted pseudolikelihood gives smaller weights to cases, whereas the conditional likelihood weights cases and noncases differentially only to the extent they actually differ. Both the inverse observed information and robust variance estimates agreed well with the Monte Carlo variances.

6.2. Multimodality under Full Likelihood When the Sampling Fraction Is Small

Let now the observed data be only (๐‘…๐’ž,๐‘Œ๐’ž,๐‘๐’ช), and let the sampling mechanism be simple random sampling without replacement; ๐‘ƒ(๐‘…๐’ž๎€ท)=1/๐‘๐‘›๎€ธ, where the sample size ๐‘› is fixed. The specific aim of the following example is to demonstrate the vulnerability of observed data likelihoods, integrated over the missing covariate data on the set ๐’žโงต๐’ช, to misspecification of the response model. The full likelihood expression now becomes ๐‘ƒ(๐‘…๐’ž,๐‘Œ๐’ž,๐‘๐’ชโˆฃ๐œƒ)(๐›ฝ,๐›พ)โˆโˆ๐‘–โˆˆ๐’ช[๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐›ฝ)๐‘ƒ(๐‘๐‘–โˆโˆฃ๐›พ)]๐‘–โˆˆ๐’žโงต๐’ชโˆซ๐‘ง๐‘–๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘ง๐‘–,๐›ฝ)๐‘ƒ(๐‘๐‘–โˆˆ๐‘‘๐‘ง๐‘–โˆฃ๐›พ). Due to the simple random sampling, the corresponding conditional likelihoods simplify into ๐‘ƒ(๐‘Œ๐’ชโˆฃ๐‘…๐’ž,๐‘Œ๐’žโงต๐’ช,๐‘๐’ชโˆ,๐œƒ)=๐‘–โˆˆ๐’ช๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐›ฝ) and ๐‘ƒ(๐‘Œ๐’ช,๐‘๐’ชโˆฃ๐‘…๐’ž,๐‘Œ๐’žโงต๐’ชโˆ,๐œƒ)=๐‘–โˆˆ๐’ช[๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐›ฝ)๐‘ƒ(๐‘๐‘–โˆฃ๐›พ)].

With ๐‘=1000, we simulated covariate values from ๐‘๐‘–โˆผBernoulli(0.2) and two different sets of response values (both independently of ๐‘๐‘–) from ๐‘Œ๐‘–โˆผ๐‘(4,4) and ๐‘Œ๐‘–โˆผGamma(4,1) (same mean and variance, but the latter distribution is skewed to the right). We fit two alternative models to these data, in both models ๐‘๐‘–โˆฃ๐›พโˆผBernoulli(๐›พ), with the models for the response specified as ๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐›ฝโˆผ๐‘(๐›ฝ0+๐›ฝ1๐‘๐‘–,๐›ฝ22) and ๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐›ฝโˆผNPQM(๐›ฝ0+๐›ฝ1๐‘๐‘–,๐›ฝ2,๐›ฝ3,๐›ฝ4), where ๐›ฝ are the collections of all model parameters. The latter model is the normal-polynomial quantile mixture distribution proposed by Karvanen ([20, pages 950โ€“953]; see also ๐‘… package Lmoments), which we apply here for regression modeling. The parametrization here can be expressed in terms of the first four L-moments as ๐œ†1=๐ธ(๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐›ฝ)=๐›ฝ0+๐›ฝ1๐‘๐‘–, ๐œ†2=๐›ฝ2/โˆš๐œ‹ (L-scale), ๐œ†3=๐›ฝ3 (L-skewness), and ๐œ†4=๐›ฝ4 (L-kurtosis). This distribution is suitable for regression modeling as the first L-moment is the mean of the distribution, while the two additional shape parameters allow for more flexible modeling of the residual distribution. The NPQM distribution includes normal distribution as a special case when ๐œ†3=0 and ๐œ†4/๐œ†2=0.1226. The full and conditional log-likelihood expressions under these model specifications were maximized with respect to ๐›ฝ and ๐›พ as described in the previous section. Initial values for the algorithm were set at the sample moments calculated from the marginal distribution of the response variable, with the regression coefficient set to ๐›ฝ1=0 and ๐›พ=0.5.

Table 2 shows the mean maximum likelihood estimates over 1000 replications for all 8 combinations of data generating model, fitted model, and estimation method. With the response simulated from normal distribution, both models estimated using either full or conditional likelihood give the expected results (Figures 1 and 2). However, with the response simulated from gamma distribution and the sampling fraction ๐‘›/๐‘ small, the misspecified normal model estimated using full likelihood indicates a spurious association between the response and the covariate. This is because the missing data act as extra parameters, allowing the optimization procedure to obtain a better fit to the skewed data. The corresponding sampling distributions for ๐›ฝ1 shown in Figure 3 have become bimodal. It should be noted that the initial values given to the optimization algorithm corresponded always to the โ€œcorrect solutionโ€ of no covariate effect, thus enabling the algorithm to find the correct mode. However, bimodality in the sampling distribution does not necessarily indicate that the likelihood functions given a single realized dataset would be bimodal.

Using the NPQM model which allows a skewed residual distribution does not correct the situation either when combined with full likelihood estimation (Figure 4). This is not unexpected since adding more parameters to an already overparameterized situation is not necessarily helpful in solving identification problems. In contrast, conditional likelihood continues to give reasonable results in all cases, with the skewness of the data correctly reflected by the parameter ๐›ฝ3 of the NPQM model. Some degree of misspecification of the response model is required for the multimodality to appear since when we simulated the response variable from NPQM distribution, the full likelihood fit of the NPQM model indicated no problems, while the full likelihood fit of the normal model showed the same problems as with gamma data (Figures 5 and 6). We also repeated all the simulation alternatives with ๐›ฝ1=1 (๐‘Œ๐‘–โˆผ๐‘๐‘–+๐‘(4,4) and ๐‘Œ๐‘–โˆผ๐‘๐‘–+Gamma(4,1)), with the conclusions essentially unchanged (Table 3 and Figures 7, 8, 9, and 10).

6.3. An Example with Real Data

The case-cohort set for all-cause mortality in the example cohort (๐‘=5039) is of size ๐‘›=1816, the union of a subcohort of size 1068 and 996 deaths due to any cause (since the subcohort has been selected independently of the case status, it includes a number of cases). The case-cohort design applied in this cohort was described in Section 4.3.1. Due to sample unavailability or unsuccessful genotyping, the genotype for the lactase persistence SNP rs4988235 was unavailable for 156 individuals, giving |๐’ชโงตโ„ณ|=1660. The model for the data described in Section 1 was specified as follows: Hardy-Weinberg model ๐‘๐‘–โˆฃ๐›พโˆผBinom(2,๐›พ) was used for the number of lactase persistence alleles. The model for BMI, which is the response variable of interest, given age at baseline ๐‘‹๐‘–โ‰ก๐‘๐‘– and the genotype ๐‘๐‘–, was taken to be ๐‘Œ๐‘–โˆฃ๐‘๐‘–,๐‘๐‘–,๐›ฝโˆผNPQM(๐›ฝ0+๐›ฝ1๐‘๐‘–+๐›ฝ2๐Ÿ{๐‘๐‘–=0},๐›ฝ3,๐›ฝ4,๐›ฝ5). The model for the survival outcome was taken to be ๐‘ƒ(๐‘‡๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘Œ๐‘–,๐‘๐‘–,๐›ผ)๐›ผโˆ[๐œ†๐‘–(๐‘ก๐‘–)]๐Ÿ๐‘–{๐‘’=1}๐‘†๐‘–(๐‘ก๐‘–), where ๐‘†๐‘–(๐‘ก๐‘–โˆซ)โ‰กexp{๐‘ก๐‘–0๐œ†๐‘–(๐‘ข)๐‘‘๐‘ข}, and using again the Weibull form for the hazard function ๐œ†๐‘–(๐‘ข)โ‰ก(๐›ผ4/๐›ผ0)(๐‘ข/๐›ผ0)๐›ผ4โˆ’1exp{๐›ผ1๐‘๐‘–+๐›ผ2๐‘Œ๐‘–+๐›ผ3๐Ÿ{๐‘๐‘–=0}}. In both regression models, the lactase persistence allele noncarriers are compared to the hetero- and homozygote carriers of the allele. Under the case-cohort design where the subcohort sampling probabilities depend on ๐‘๐‘–, the conditional likelihood expression takes the form ๐‘ƒ๎€ท๐‘‡๐’ชโงตโ„ณ,๐ธ๐’ชโงตโ„ณ,๐‘Œ๐’ชโงตโ„ณโˆฃ๐‘…๐’ž,๐‘€๐’ช,๐‘‡๐’žโงต๐’ชโˆชโ„ณ,๐ธ๐’žโงต๐’ชโˆชโ„ณ,๐‘Œ๐’žโงต๐’ชโˆชโ„ณ,๐‘‹๐’ž,๐‘๐’ชโงตโ„ณ๎€ธ,๐œƒ(๐›ผ,๐›ฝ)โˆ๎‘๐‘–โˆˆ๐’ชโงตโ„ณ๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘๐‘–,๐‘๐‘–๎€ธ,๐›ฝโˆซ๐‘ฆ๐‘–๎€บ๎€ท๎€ท๐‘1โˆ’1โˆ’๐œ‹๐‘–๐‘†๎€ธ๎€ธ๐‘–๎€ท๐‘๐‘–๐‘ƒ๎€ท๎€ธ๎€ป๐‘‘๐‘ฆ๐‘–โˆฃ๐‘๐‘–,๐‘๐‘–๎€ธ,,๐›ฝ(6.2) or ๐‘ƒ๎€ท๐‘€๐’ช,๐‘‡๐’ช,๐ธ๐’ช,๐‘Œ๐’ช,๐‘๐’ชโงตโ„ณโˆฃ๐‘…๐’ž,๐‘‡๐’žโงต๐’ช,๐ธ๐’žโงต๐’ช,๐‘Œ๐’žโงต๐’ช,๐‘‹๐’ž๎€ธ,๐œƒ(๐›ผ,๐›ฝ,๐›พ)โˆ๎‘๐‘–โˆˆ๐’ชโงตโ„ณ๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–=๐‘ง๐‘–๎€ธโˆฃ๐›พโˆ‘๐‘ง๐‘–โˆซ๐‘ฆ๐‘–๎€บ๎€ท๎€ท๐‘1โˆ’1โˆ’๐œ‹๐‘–๐‘†๎€ธ๎€ธ๐‘–๎€ท๐‘๐‘–๐‘ƒ๎€ท๎€ธ๎€ป๐‘‘๐‘ฆ๐‘–โˆฃ๐‘๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–=๐‘ง๐‘–๎€ธร—๎‘โˆฃ๐›พ๐‘–โˆˆโ„ณโˆ‘๐‘ง๐‘–๐‘ƒ๎€ท๐‘‡๐‘–,๐ธ๐‘–โˆฃ๐‘Œ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–=๐‘ง๐‘–๎€ธโˆฃ๐›พโˆ‘๐‘ง๐‘–โˆซ๐‘ฆ๐‘–๎€บ๎€ท๎€ท๐‘1โˆ’1โˆ’๐œ‹๐‘–๐‘†๎€ธ๎€ธ๐‘–๎€ท๐‘๐‘–๐‘ƒ๎€ท๎€ธ๎€ป๐‘‘๐‘ฆ๐‘–โˆฃ๐‘๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–=๐‘ง๐‘–๎€ธ,โˆฃ๐›พ(6.3) depending on whether we condition upon the observed genotype data ๐‘๐’ชโงตโ„ณ, or whether this is modeled as part of the likelihood (see Section 5.1.3). The difference between these two approaches is that the former requires more assumptions on the missingness mechanism. Therefore, we compare here the two approaches to see whether there is any observable difference between the results. In addition, expression (6.3) enables estimation of the population allele frequency ๐›พ. However, it should be noted that while the summation over the single SNP genotype variable ๐‘๐‘– is not a computational problem in the present example, this is not necessarily the case generally, with multiple continuous covariates in the model. Therefore, the potential advantage of expression such as (6.2) is that it avoids specification of the covariate distribution and the integration over the missing covariate values.

A remaining issue to be considered is the numerical evaluation of the integrals over ๐‘ฆ๐‘– in the correction terms of (6.2) and (6.3). It should be noted that these have to be evaluated at each parameter value tried at the hill-climbing numerical optimization. Although numerical (quadrature) integration would be feasible in one dimension, here we propose evaluation of such integrals using Monte Carlo integration by repeated sampling from the distribution ๐‘ƒ(๐‘Œ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘๐‘–,๐‘๐‘–,๐›ฝ). This has an added advantage that while the numerical evaluation of the density of the NPQM model is computationally expensive, random variates can be easily simulated from this distribution since it is defined through the quantile function. This proceeds by drawing random variates from [0,1]-uniform distribution and applying formula (6.2) of Karvanen [20]. With random variates ๐‘ฆ๐‘–(๐‘—)โˆผNPQM(๐›ฝ0+๐›ฝ1๐‘๐‘–+๐›ฝ2๐Ÿ{๐‘๐‘–=0},๐›ฝ3,๐›ฝ4,๐›ฝ5), ๐‘—=1,โ€ฆ,๐‘˜, drawn for each ๐‘–โˆˆ๐’ชโงตโ„ณ given the current parameter values, the integrals in (6.2) are then approximated by the means โˆ‘(1/๐‘˜)๐‘˜๐‘—=1[1โˆ’(1โˆ’๐œ‹(๐‘๐‘–))๐‘ƒ(๐‘‡๐‘–>๐‘๐‘–โˆฃ๐‘ฆ๐‘–(๐‘—),๐‘๐‘–,๐›ผ)].

Maximum likelihood estimates obtained by numerical maximization of the expressions (6.2) and (6.3) with respect to ๐›ผ, ๐›ฝ, and in the latter case also ๐›พ, are presented in Table 4 (๎๐›ผ and ฬ‚๐›พ) and Table 5 (ฬ‚๐›ฝ). For comparison, we also maximised a full likelihood of the form (5.1) with respect to (๐›ผ,๐›ฝ,๐›พ) and a weighted pseudolikelihood of the form (5.2) with respect to ๐›ฝ. In the latter case, we accounted for the missingness within the case-cohort set by fitting a logistic regression model logit{๐‘ƒ(๐‘€๐‘–=0โˆฃ๐ธ๐‘–,๐‘๐‘–,๐‘Œ๐‘–,๐œ‚)}=๐œ‚0+๐œ‚1๐ธ๐‘–+๐œ‚2๐‘๐‘–+๐œ‚3๐‘Œ๐‘– in the set ๐’ช and calculated the adjusted weights as inverses of ๐‘ƒ(๐‘€๐‘–=0โˆฃ๐ธ๐‘–,๐‘๐‘–,๐‘Œ๐‘–,ฬ‚๐œ‚)๐‘ƒ(๐‘…๐‘–=1โˆฃ๐ธ๐‘–,๐‘๐‘–). Due to the extra estimation step, we used bootstrap with 5000 replications to obtain standard errors for the weighted pseudolikelihood estimates. Both conditional likelihood expressions gave very similar results, suggesting that the missing data within the case-cohort set is not a major issue in the present case. The comparison between different number of Monte Carlo replicates in numerical evaluation of the conditional likelihood correction term suggests that the estimates do no longer appreciably change when ๐‘˜ is increased from 1000. However, the change in the estimates when increasing ๐‘˜ from 100 to 1000 is already well within the standard errors.

Full likelihood and weighted pseudolikelihood estimates agreed well with the conditional likelihood ones, although the latter had higher standard errors, as was the case also in the simulations. As noted by Kettunen et al. [10], the absence of the lactase persistence allele shows association with lower body mass index. The residuals from the model for BMI are significantly skewed to the right, as indicated by the estimates of ๐›ฝ4.

7. Discussion

Although conditional logistic likelihood is well known in the context of risk set sampling designs (e.g., [5, 18, 40]), its connection to the general concept of conditional likelihood has rarely been emphasized by authors. Furthermore, use of the conditional likelihood approach is not limited to the binary event outcome response situation. In this paper we have attempted to highlight these issues in the context of modeling a continuous secondary response variable under cohort sampling designs. Naturally, the conditional likelihood approach is also valid for the primary time-to-event outcome. Although here we considered only parametric specifications for the time-to-event outcome, semiparametric maximum likelihood techniques could also be utilized here. A potential disadvantage of the likelihood-based methods for secondary analysis is that they require specification of a model for the primary time-to-event outcome even though it is not of main interest in the secondary analysis setting, whereas specification of this model is avoided in the weighted pseudolikelihood approach.

Compared to the full likelihood approach which would be applicable under any cohort sampling design, the advantage of the conditional likelihood is that modeling of the population distribution of the covariates collected in the second phase can be avoided, if this is not of primary interest. In addition, as we demonstrated in a simulation example, full likelihood expressions with most of the covariate data unobserved may no longer be well behaved, a problem which the conditional likelihood approach does not have. The disadvantage of the conditional likelihood approach is that the second-phase sampling mechanism, that is, the joint distribution of the inclusion indicators, needs to be specified. In the case-cohort/Bernoulli sampling situation, this is straightforward, but in mechanisms such as risk set sampling, where the sampling probabilities are specified only implicitly, resolving the joint sampling probability can be computationally nontrivial, and approximations may be needed in practice. This is avoided in the full likelihood approach, since it does not require specification of the sampling mechanism. The inverse-probability-weighted estimating function only requires specification of the first-order selection probabilities, but the possible dependencies induced by the sampling mechanism need to be addressed in the variance estimation step. The full and conditional likelihood approaches gave equivalent efficiencies in our simulated setting, although the result might be different if the first-phase data involves covariates which are highly predictive of the second-phase covariate of interest. In any case, both likelihood-based methods gave a clear improvement in efficiency in the secondary analysis setting compared to the inverse-probability-weighted method.

Appendices

A. On Inverse-Probability-Weighted Pseudolikelihood Estimators

The use of the expression (3.1) in approximating the corresponding complete data log-likelihood โˆ‘๐‘–โˆˆ๐’žlog๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ), where we denote log๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ)โ‰ก๐‘™๐‘–(๐›ฝ), is justified by considering the expectation of the pseudoscore function with respect to the joint distribution ๐‘ƒ(๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘๐’ž,๐‘‹๐’ž)=๐‘ƒ(๐‘…๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž)๐‘ƒ(๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘๐’ž,๐‘‹๐’ž), which is assumed to be the correct data generating mechanism. This becomes ๐ธ๐‘…๐’ž,๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž,๐‘๐’ž๎ƒฌ๎“๐‘–โˆˆ๐’ž๐Ÿ{๐‘…๐‘–=1}๎€ท๐‘Œ๐œ•log๐‘ƒ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐›ฝ/๐œ•๐›ฝ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘‹๐’ž,๐‘Œ๐’ž๎€ธ๎ƒญ=๎€œ๐‘ก๐’ž๎€œ๐‘ฅ๐’ž๎€œ๐‘ฆ๐’ž๎€œ๐‘ง๐’ž๎“๐‘’๐’ž๎“๐‘–โˆˆ๐’ž๎“๐‘Ÿ๐‘–๐Ÿ{๐‘Ÿ๐‘–=1}๐‘™๎…ž๐‘–โˆ‘(๐›ฝ)๐‘Ÿ๐’žโงต๐‘–๐‘ƒ๎€ท๐‘…๐’ž=๐‘Ÿ๐’žโˆฃ๐‘ก๐’ž,๐‘’๐’ž,๐‘ฅ๐’ž,๐‘ฆ๐’ž๎€ธ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘ก๐’ž,๐‘’๐’ž,๐‘ฅ๐’ž,๐‘ฆ๐’ž๎€ธ๎€ท๐‘‡ร—๐‘ƒ๐’žโˆˆ๐‘‘๐‘ก๐’ž,๐ธ๐’ž=๐‘’๐’ž,๐‘‹๐’žโˆˆ๐‘‘๐‘ฅ๐’ž,๐‘Œ๐’žโˆˆ๐‘‘๐‘ฆ๐’ž,๐‘๐’žโˆˆ๐‘‘๐‘ง๐’ž๎€ธ=๎“๐‘–โˆˆ๐’ž๎€œ๐‘ฅ๐‘–๎€œ๐‘ฆ๐‘–๎€œ๐‘ง๐‘–๐‘™๎…ž๐‘–(๎€œ๐›ฝ)๐‘ก๐‘–๎“๐‘’๐‘–๐‘ƒ๎€ท๐‘‡๐‘–โˆˆ๐‘‘๐‘ก๐‘–,๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘ฅ๐‘–,๐‘ฆ๐‘–,๐‘ง๐‘–๎€ธ๎€ท๐‘‹ร—๐‘ƒ๐‘–โˆˆ๐‘‘๐‘ฅ๐‘–,๐‘Œ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–,๐‘๐‘–โˆˆ๐‘‘๐‘ง๐‘–๎€ธ=๎“๐‘–โˆˆ๐’ž๐ธ๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–๎€บ๐‘™๎…ž๐‘–(๎€ป=๎“๐›ฝ)๐‘–โˆˆ๐’ž๐ธ๐‘‹๐‘–,๐‘๐‘–๎ƒฏ๐ธ๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎ƒฌ๎€ท๐‘Œ๐œ•log๐‘ƒ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€ธ,๐›ฝ,๐œ•๐›ฝ๎ƒญ๎ƒฐ(A.1) the last form being the expectation of the complete data score function for parameters ๐›ฝ, which becomes zero through the inner expectation if the parametric model ๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ) is correct, that is, corresponds to the data generating mechanism. Thus, the expression (3.1) gives a valid estimating equation for parameters ๐›ฝ. However, since the variance of the pseudoscore function does not equal the Fisher information, a robust/sandwich type variance estimator must be used instead of the inverse of the observed information. As an example, below we derive this in the special case of a Bernoulli sampling mechanism ๐‘ƒ(๐‘…๐’žโˆฃ๐‘‡๐’ž,๐ธ๐’ž,๐‘Œ๐’ž,๐‘‹๐’žโˆ)=๐‘–โˆˆ๐’ž๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–), where this is straightforward. For consideration of asymptotic variances of inverse-probability-weighted estimators under stratified without replacement sampling or risk set sampling, we refer to Breslow and Wellner [41] and Cai and Zheng [42].

Denoting โˆ‘๐‘ž(๐›ฝ)โ‰ก๐‘–โˆˆ๐’ชlog๐‘ƒ(๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–,๐›ฝ)/๐‘ƒ(๐‘…๐‘–โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘Œ๐‘–,๐‘‹๐‘–) and ฬ‚๐›ฝโ‰กargmax๐›ฝ๐‘ž(๐›ฝ), the latter is given as a solution to the estimating equation ๐‘žโ€ฒ(๐›ฝ)=0, or approximately, by the first-order Taylor expansion at the true value ๐›ฝ0, as a solution to ๐‘žโ€ฒ(๐›ฝ0)+๐‘ž๎…ž๎…ž(๐›ฝ0)(๐›ฝโˆ’๐›ฝ0)=0, which in turn, by substituting expected to observed pseudoinformation, gives the approximate relationship ฬ‚๐›ฝโˆ’๐›ฝ0โ‰ˆ๐ธ[โˆ’๐‘ž๎…ž๎…ž(๐›ฝ0)]โˆ’1๐‘žโ€ฒ(๐›ฝ0). Taking covariance of both sides then gives ฬ‚cov๐›ฝโ‰ˆ๐ธ[โˆ’๐‘ž๎…ž๎…ž(๐›ฝ0)]โˆ’1cov[๐‘žโ€ฒ(๐›ฝ0)]๐ธ[โˆ’๐‘ž๎…ž๎…ž(๐›ฝ0)]โˆ’1, where the pseudo-Fisher information is given by ๐ธ๎€บโˆ’๐‘ž๎…ž๎…ž๎€ท๐›ฝ0๎ƒฌโˆ’๎“๎€ธ๎€ป=๐ธ๐‘–โˆˆ๐’ž๐Ÿ{๐‘…๐‘–=1}๐‘™๐‘–๎…ž๎…ž๎€ท๐›ฝ0๎€ธ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–๎€ธ๎ƒญ=๎“๐‘–โˆˆ๐’ž๐ธ๐‘‹๐‘–,๐‘๐‘–๎€ฝ๐ธ๐‘Œ๐‘–โˆฃ๐‘‹๐‘–,๐‘๐‘–๎€บโˆ’๐‘™๐‘–๎…ž๎…ž๎€ท๐›ฝ0.๎€ธ๎€ป๎€พ(A.2) Since the pseudoscore function has zero mean and now ๐‘…๐‘–โŸ‚๐‘…๐‘—, ๐‘–โ‰ ๐‘—, its covariance is given by๎€บ๐‘žcov๎…ž๎€ท๐›ฝ0๎ƒฌ๎“๎€ธ๎€ป=๐ธ๐‘–โˆˆ๐’ž๎“๐‘—โˆˆ๐’ž๐Ÿ{๐‘…๐‘–=1}๐Ÿ{๐‘…๐‘—=1}๐‘™๎…ž๐‘–๎€ท๐›ฝ0๎€ธ๐‘™๎…ž๐‘—(๐›ฝ0)๐‘‡๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–๎€ธ๐‘ƒ๎€ท๐‘…๐‘—=1โˆฃ๐‘‡๐‘—,๐ธ๐‘—,๐‘‹๐‘—,๐‘Œ๐‘—๎€ธ๎ƒญ=๎“๐‘–โˆˆ๐’ž๐ธ๎ƒฌ๐Ÿ{๐‘…๐‘–=1}๐‘™๎…ž๐‘–๎€ท๐›ฝ0๎€ธ๐‘™๎…ž๐‘–(๐›ฝ0)๐‘‡๐‘ƒ(๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–)2๎ƒญ=๎“๐‘–โˆˆ๐’ž๐ธ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–,๐‘๐‘–๎ƒฌ๐‘™๎…ž๐‘–๎€ท๐›ฝ0๎€ธ๐‘™๎…ž๐‘–(๐›ฝ0)๐‘‡๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–๎€ธ๎ƒญ.(A.3) In practice, these quantities are evaluated at the maximum likelihood point, replacing the expected quantities by their observed counterparts as ๐ธ๎€บโˆ’๐‘ž๎…ž๎…ž๎€ท๐›ฝ0๎“๎€ธ๎€ปโ‰ˆโˆ’๐‘–โˆˆ๐’ช๐‘™๐‘–๎…ž๎…ž๎€ทฬ‚๐›ฝ๎€ธ๐‘ƒ๎€ท๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–๎€ธ,(A.4)๎€บ๐‘žcov๎…ž๎€ท๐›ฝ0โ‰ˆ๎“๎€ธ๎€ป๐‘–โˆˆ๐’ช๐‘™๎…ž๐‘–๎€ทฬ‚๐›ฝ๎€ธ๐‘™๎…ž๐‘–๎€ทฬ‚๐›ฝ๎€ธ๐‘‡๐‘ƒ(๐‘…๐‘–=1โˆฃ๐‘‡๐‘–,๐ธ๐‘–,๐‘‹๐‘–,๐‘Œ๐‘–)2.(A.5)

Alternatively, bootstrap variance estimation might be utilized, but it should be noted that standard bootstrap would be valid only under Bernoulli type sampling designs, which can be interpreted as sampling from infinite population, whereas dependencies induced by sampling without replacement would require application of finite population bootstrap methods (e.g., [43, 44]).

B. Relationship to Retrospective Likelihood

Consider a special case where the observed data are only (๐‘…๐’ž,๐ธ๐’ž,๐‘Œ๐’ช,๐‘๐’ช), where ๐ธ๐‘–, ๐‘–โˆˆ๐’ž, are binary indicators for the case status. This corresponds to the case-control situation considered by Jiang et al. [6] and Lin and Zeng [7]. The set ๐’ž now represents a study base from which the cases and controls have been selected. Now choosing a partitioning ๐‘Š=(๐‘Œ๐’ช,๐‘๐’ช) and ๐‘‰=(๐‘…๐’ž,๐ธ๐’ž), and supposing only the case status information has been used in the selection of cases and controls, we obtain a conditional likelihood ๐‘ƒ๎€ท๐‘Œ๐’ช,๐‘๐’ชโˆฃ๐‘…๐’ž,๐ธ๐’ž๎€ธ=๐‘ƒ๎€ท๐‘…,๐œƒ๐’žโˆฃ๐ธ๐’ž๎€ธ๐‘ƒ๎€ท๐ธ๐’ž,๐‘Œ๐’ช,๐‘๐’ช๎€ธโˆฃ๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐ธ๐’ž๎€ธ๐‘ƒ๎€ท๐ธ๐’ž๎€ธ=๎‘โˆฃ๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐ธ๐‘–โˆฃ๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–๎€ธโˆฃ๐›พโˆซ๐‘ฆ๐‘–โˆซ๐‘ง๐‘–๐‘ƒ๎€ท๐ธ๐‘–โˆฃ๐‘ฆ๐‘–,๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘ง๐‘–๎€ธ๐‘ƒ๎€ท๐‘,๐›ฝ๐‘–โˆˆ๐‘‘๐‘ง๐‘–๎€ธ.โˆฃ๐›พ(B.1) The last form is the retrospective form suggested by Lin and Zeng [7, page 257] for secondary analysis under case-control designs. Thus, retrospective likelihood is a special case of conditional likelihood. The advantage of the above expression is that the terms specifying the sampling mechanism cancel out. The drawbacks are that the baseline risk level (part of ๐›ผ) is not identifiable from this expression (see, e.g., [30, page 1076]) and that the parameters ๐›พ have to be estimated or otherwise made to disappear (Lin and Zeng [7] applied profile likelihood). However, nothing prevents us from choosing a more useful partitioning of the observed data. For instance, with ๐‘Š=๐‘Œ๐’ช and ๐‘‰=(๐‘…๐’ž,๐ธ๐’ž,๐‘๐’ช), we obtain ๐‘ƒ๎€ท๐‘Œ๐’ชโˆฃ๐‘…๐’ž,๐ธ๐’ž,๐‘๐’ช๎€ธ=๐‘ƒ๎€ท๐‘…,๐œƒ๐’žโˆฃ๐ธ๐’ž๎€ธ๐‘ƒ๎€ท๐ธ๐’ž,๐‘Œ๐’ช,๐‘๐’ช๎€ธโˆฃ๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐ธ๐’ž๎€ธ๐‘ƒ๎€ท๐ธ๐’ž,๐‘๐’ช๎€ธ=๎‘โˆฃ๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐ธ๐‘–โˆฃ๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘๐‘–๎€ธ,๐›ฝโˆซ๐‘ฆ๐‘–๐‘ƒ๎€ท๐ธ๐‘–โˆฃ๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘๐‘–๎€ธ,,๐›ฝ(B.2) or, with ๐‘Š=(๐ธ๐’ช,๐‘Œ๐’ช) and ๐‘‰=(๐‘…๐’ž,๐ธ๐’žโงต๐’ช,๐‘๐’ช), analogously to Section 4.2, ๐‘ƒ๎€ท๐ธ๐’ช,๐‘Œ๐’ชโˆฃ๐‘…๐’ž,๐ธ๐’žโงต๐’ช,๐‘๐’ช๎€ธ,๐œƒ๐œƒโˆ๐‘ƒ๎€ท๐ธ๐’ž,๐‘Œ๐’ช,๐‘๐’ช๎€ธโˆฃ๐œƒ๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐ธ๐’žโงต๐’ช๎€ธ๐‘ƒ๎€ท๐ธ,๐œƒ๐’žโงต๐’ช,๐‘๐’ช๎€ธ=โˆโˆฃ๐œƒ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐ธ๐‘–โˆฃ๐‘Œ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆฃ๐‘๐‘–๎€ธ,๐›ฝโˆซ๐‘ฆ๐‘–โˆถ๐‘–โˆˆ๐’ชโˆ‘๐‘’๐‘–โˆถ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐‘…๐’žโˆฃ๐ธ๐’ž๎€ธโˆ๐‘–โˆˆ๐’ช๐‘ƒ๎€ท๐ธ๐‘–=๐‘’๐‘–โˆฃ๐‘ฆ๐‘–,๐‘๐‘–๎€ธ๐‘ƒ๎€ท๐‘Œ,๐›ผ๐‘–โˆˆ๐‘‘๐‘ฆ๐‘–โˆฃ๐‘๐‘–๎€ธ.,๐›ฝ(B.3) The baseline risk level may be identifiable from this last expression, depending on the sampling mechanism used (cf. [18]). The misconception that case-control design necessitates the use of retrospective likelihood has been recently criticized by Langholz [45] and results from equating likelihood function to a data generating mechanism. However, the order of the data collection does not need to determine how the likelihood is factorized, as long as the sampling mechanism is properly taken into account.

C. Mean and Variance of the Conditional Likelihood Score Function

In the following, we suppress the covariates ๐‘‹๐’ž from the notation since these are always conditioned upon and do not affect the derivations. Also, for notational simplicity, the parameter ๐œƒ is taken to be a scalar, and the notation is compressed so that, for example, ๐‘ƒ(๐‘‘๐‘ง๐’ชโˆฃ๐œƒ)โ‰ก๐‘ƒ(๐‘๐’ชโˆˆ๐‘‘๐‘ง๐’ชโˆฃ๐œƒ)โ‰ก๐‘ƒ(๐‘๐‘–โˆˆ๐‘‘๐‘ง๐‘–forall๐‘–โˆˆ๐’ชโˆฃ๐œƒ). The probability measures indexed with respect to ๐œƒ are assumed to possess the regularity properties allowing interchanging the order of integration with respect to the random variables and differentiation with respect to ๐œƒ. The conditional likelihood function of interest is now ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ=๐‘ƒ๎€ท๐‘Ÿ,๐œƒ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ž,๐‘‘๐‘ง๐’ช๎€ธโˆฃ๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธโˆฃ๐œƒ๐œƒโˆ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ช,๐‘‘๐‘ž๐’žโงต๐’ชโˆฃ๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ง๐’ช๎€ธโˆฃ๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’žโงต๐’ชโˆฃ๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ง๐’ช๎€ธ=๐‘ƒ๎€ทโˆฃ๐œƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,,๐œƒ(C.1) and the corresponding score function becomes ๎€ท๐œ•log๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ=๎€ท๐œ•๐œƒ๐œ•log๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒโˆ’๎€ท๐‘Ÿ๐œ•๐œƒ๐œ•log๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ=๎€ท๐œ•๐œƒ๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธโˆ’๎€ท๐‘Ÿ,๐œƒ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ.,๐œƒ(C.2) The expectation of the score function is considered with respect to the full model ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ž,๐œƒ)๐‘ƒ(๐‘‘๐‘ž๐’ž,๐‘‘๐‘ง๐’žโˆฃ๐œƒ). The expectation of the first term in (C.2) becomes ๐ธ๎ƒฌ๎€ท๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ=๎“,๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’ž๎€œ๐‘ง๐’ž๎€ท๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท๐‘Ÿ,๐œƒ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ž๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ž,๐‘‘๐‘ง๐’ž๎€ธ=๎“โˆฃ๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’ž๎€œ๐‘ง๐’ช๎€ท๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธร—๎€œ,๐œƒ๐‘ง๐’žโงต๐’ช๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ž,๐‘‘๐‘ง๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ=๎“โˆฃ๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’ž๎€œ๐‘ง๐’ช๎€ท๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎€ท๐‘Ÿ,๐œƒร—๐‘ƒ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ.โˆฃ๐œƒ(C.3) In the above expressions, it should be noted that the index set ๐’ช is fixed by ๐‘Ÿ๐’ž. Due to the assumption on unconfounded sampling mechanism, the term ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ช,๐œƒ) does not depend on ๐œƒ, and we get ๐ธ๎ƒฌ๎€ท๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ=๎“,๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’ž๎€œ๐‘ง๐’ช๐œ•๎€บ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ๎€ธ๎€ป๐‘ƒ๎€ท๐œ•๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ=๎“โˆฃ๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’žโงต๐’ช๎€œ๐‘ง๐’ช๐œ•๎‚ƒโˆซ๐‘ž๐’ช๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’ช,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎‚„,๐œƒ๐‘ƒ๎€ท๐œ•๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ=๎“โˆฃ๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’žโงต๐’ช๎€œ๐‘ง๐’ช๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐‘ƒ๎€ท๐œ•๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ.โˆฃ๐œƒ(C.4) Similarly, the expectation of the second term in (C.2) becomes๐ธ๎ƒฌ๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ=๎“,๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’ž๎€œ๐‘ง๐’ž๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท๐‘Ÿ,๐œƒ๐’žโˆฃ๐‘ž๐’ž,๐‘ง๐’ž๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’ž,๐‘‘๐‘ง๐’ž๎€ธ=๎“โˆฃ๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’ž๎€œ๐‘ง๐’ช๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎€ท,๐œƒร—๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท๐‘Ÿ,๐œƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๐‘ƒ๎€ท,๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ=๎“โˆฃ๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’žโงต๐’ช๎€œ๐‘ง๐’ช๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐‘ƒ๎€ท๐œ•๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ๎€œโˆฃ๐œƒ๐‘ž๐’ช๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ=๎“,๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’žโงต๐’ช๎€œ๐‘ง๐’ช๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐‘ƒ๎€ท๐œ•๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ.โˆฃ๐œƒ(C.5) Thus, we have ๐ธ๎ƒฌ๎€ท๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ๎ƒฌ๎€ท๐‘Ÿ,๐œƒ=๐ธ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ,๐œƒ,(C.6) and the conditional likelihood score function has zero expectation. The Fisher information is given by ๐ธ๎ƒฌโˆ’๐œ•2๎€ทlog๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐œ•๐œƒ2๎ƒญ๎ƒฌ๐œ•=๐ธ2๎€ท๐‘Ÿlog๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐œ•๐œƒ2๎ƒญ๎ƒฌ๐œ•โˆ’๐ธ2๎€ทlog๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐œ•๐œƒ2๎ƒญ๎ƒฌ๐œ•=๐ธ2๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ2๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ๎‚ธ,๐œƒโˆ’๐ธ๐œ•๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎‚ธ+๐ธ๐œ•๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎ƒฌ๐œ•โˆ’๐ธ2๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ2๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ.,๐œƒ(C.7) By writing open the expectations as above, it is easy to see that ๐ธ๎ƒฌ๐œ•2๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ2๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ๎ƒฌ๐œ•,๐œƒ=๐ธ2๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ2๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ,๐ธ๎ƒฌ๎€ท๐‘Ÿ,๐œƒ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎€ท,๐œƒ๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ=๎“,๐œƒ๐‘Ÿ๐’ž๎€œ๐‘ž๐’žโงต๐’ช๎€œ๐‘ง๐’ช๎€ท๐‘Ÿ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎€ท๐‘Ÿ,๐œƒ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐‘ƒ๎€ท๐œ•๐œƒ๐‘‘๐‘ž๐’žโงต๐’ช,๐‘‘๐‘ง๐’ช๎€ธ๎‚ธโˆฃ๐œƒ=๐ธ๐œ•๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2.(C.8) Thus, the Fisher information becomes ๐ธ๎ƒฌโˆ’๐œ•2๎€ทlog๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๐œ•๐œƒ2๎ƒญ๎‚ธ=๐ธ๐œ•๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎‚ธโˆ’๐ธ๐œ•๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎‚ธ=๐ธ๐œ•๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎‚ธ+๐ธ๐œ•๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎ƒฌ๎€ท๐‘Ÿโˆ’2๐ธ๐œ•๐‘ƒ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎€ท,๐œƒ๐œ•๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ/๐œ•๐œƒ๐‘ƒ๎€ท๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ๎ƒญ๎‚ธ,๐œƒ=๐ธ๐œ•๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘‘๐‘ž๐’ชโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ชโˆ’,๐œƒ)๐œ•๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช,๐œƒ)/๐œ•๐œƒ๐‘ƒ(๐‘Ÿ๐’žโˆฃ๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎‚น,๐œƒ)2๎ƒฌ๎€ท=๐‘‰๐œ•log๐‘ƒ๐‘‘๐‘ž๐’ชโˆฃ๐‘Ÿ๐’ž,๐‘ž๐’žโงต๐’ช,๐‘ง๐’ช๎€ธ,๐œƒ๎ƒญ,๐œ•๐œƒ(C.9) that is, equal to the variance of the conditional likelihood score function.

Acknowledgments

The majority of this work was carried out when the first author was working at the Department of Chronic Disease Prevention of the National Institute for Health and Welfare, Helsinki, Finland. The work was partly supported by the European Commission through the Seventh Framework Programme CHANCES Project [HEALTH-F3-2010-242244]. The authors would like to thank Professor Jarmo Virtamo and Professor Markus Perola of the National Institute for Health and Welfare for permission to use the ATBC data in our illustration.