Computational and Mathematical Methods in Medicine

Volume 2019, Article ID 1526290, 11 pages

https://doi.org/10.1155/2019/1526290

## Likelihood-Ratio-Test Methods for Drug Safety Signal Detection from Multiple Clinical Datasets

^{1}Mathematical Statistician in the Division of Biostatistics, Office of Clinical Evidence and Analysis, CDRH, FDA, Silver Spring, MD 20993, USA

^{2}Mathematical Statistician in the Division of Biometrics I, Office of Biostatistics, OTS, CDER, FDA, Silver Spring, MD 20993, USA

^{3}Division Director in the Division of Biostatistics, Office of Clinical Evidence and Analysis, CDRH, FDA, Silver Spring, MD 20993, USA

Correspondence should be addressed to Lan Huang; lan.huang@fda.hhs.gov

Received 28 September 2018; Revised 7 December 2018; Accepted 21 January 2019; Published 19 February 2019

Academic Editor: Anna Tsantili-Kakoulidou

Copyright © 2019 Lan Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Pre- and postmarket drug safety evaluations usually include an integrated summary of results obtained using data from multiple studies related to a drug of interest. This paper proposes three approaches based on the likelihood ratio test (LRT), called the LRT methods, for drug safety signal detection from large observational databases with multiple studies. The focus is on identifying signals of adverse events (AEs) among many AEs associated with a particular drug or, inversely, signals of drugs associated with a particular AE. The methods discussed include the simple pooled LRT method and its variations, such as the weighted LRT that incorporates the total drug exposure information by study. The power and type-I error of the LRT methods are evaluated in a simulation study with varying heterogeneity across studies. For illustration, these methods are applied to Proton Pump Inhibitors (PPIs) data with 6 studies, to examine the effect of concomitant use of PPIs in treating patients with osteoporosis, and to Lipiodol (a contrast agent) data with 13 studies, to evaluate that drug’s safety profile.

#### 1. Introduction

Meta-analysis approaches for multiple independent studies have become very popular in medical research. In many observational and/or clinical trial studies, meta-analysis can be performed using study-level summary measures or patient-level information. For example, the studies can be integrated by choosing a common statistical measure, such as the study-level mean or effect size, and computing a weighted average of this measure using a fixed-effect model or a random-effects model [1]. The weights are usually related to the study-level sample sizes or within-study variation but may depend on other factors. This type of approach is referred to as the traditional meta-analysis and is used extensively (as supportive evidence) in the pre- and postapproval evaluation of the efficacy and safety of drug products. The traditional meta-analysis of many large and small clinical trials, published studies, registries, and large clinical and/or observational databases has become common practice in modern pre- and postmarket clinical/observational studies [1, 2]. It is used for thorough evaluation of clinical efficacy endpoints, such as the mean change in weight loss or blood pressure and the hazard ratio in survival comparisons, and of clinical safety endpoints, such as the odds ratio, risk ratio, and absolute risk difference. For example, a number of meta-analyses of rosiglitazone trials for patients with type-2 diabetes have been conducted to evaluate the risk for myocardial infarction (MI) and cardiovascular mortality [3], whereas in a meta-analysis of 15 clinical trials submitted to FDA during 1987–2012, Borges et al. [4] reviewed randomized withdrawal maintenance trials for major depressive disorder.

Using the traditional meta-analysis for safety evaluation, researchers can evaluate the point estimates and 95% confidence intervals for the odds ratio or risk ratio of the drug-AE pair of interest from each study, combine the estimates through a fixed-effect model or a random-effects model, produce an overall estimate of the parameter of interest and its associated 95% confidence interval, and then display the results using a forest plot. Here, we intend to extend the use of traditional meta-analysis to safety signal detection. In signal detection, relative risks (RRs), usually called risk ratios, are commonly used when drug exposure information is available. Relative event rates or proportional reporting rates are used when drug exposure information is lacking, which is usually the case in passive surveillance of medical products. It is important to explore safety signals in each study; however, when studying safety signals, researchers usually collect information from many trials (or studies), since a single clinical study focused on efficacy cannot provide enough information on safety events. The clinical studies included in a large safety dataset or database are usually independent studies with different protocols. A signal detected in one study may not be detected in other studies because of variation across studies (in terms of sample sizes, study sites, personnel, patients enrolled, study time, and others).
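The fixed-effect combination described above can be sketched in a few lines. This is only an illustration of inverse-variance pooling of study-level log odds ratios; the function name `fixed_effect_or`, the 2×2 layout, and the counts are hypothetical and are not taken from the studies analyzed in this paper.

```python
import math

def fixed_effect_or(tables, z=1.96):
    """Inverse-variance fixed-effect pooling of odds ratios.

    tables: list of (a, b, c, d) with
      a = events on drug,    b = non-events on drug,
      c = events on control, d = non-events on control.
    Returns (pooled OR, lower 95% limit, upper 95% limit).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log OR
        weight = 1 / var                      # inverse-variance weight
        weighted_sum += weight * log_or
        weight_total += weight
    pooled = weighted_sum / weight_total
    se = math.sqrt(1 / weight_total)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

# Hypothetical counts for one drug-AE pair in three studies.
studies = [(12, 188, 6, 194), (20, 480, 11, 489), (5, 95, 4, 96)]
pooled_or, lower, upper = fixed_effect_or(studies)
```

A random-effects model would add a between-study variance component to each study's variance before the weights are computed; that step is omitted here.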

Several methods have been developed for data mining or safety signal detection across multiple drugs and AEs (for example, proportional reporting ratios [5], reporting odds ratios [6], likelihood ratio tests [7–9], and Bayesian methods [10–13]). However, these signal detection methods usually work on large pooled passive surveillance data and are not designed to incorporate the heterogeneity across multiple studies. Here, we propose new methods for drug safety signal detection (with the intent to control the type-I error and false discovery rate) for data with multiple studies, obtained from large observational databases such as the FDA Adverse Event Reporting System (FAERS; https://open.fda.gov/data/faers/) or from clinical trial databases. The new methods utilize the regular likelihood ratio test (LRT) for signal detection [7] and consist of a two-step approach for exploring safety signals from multiple studies/sources. In the first step, the regular LRT is applied to the safety data by study. In the second step, the regular LRT test statistics from the different studies are combined into an overall test statistic for a global test at a prespecified level of significance. If the global null is rejected in favor of the global alternative, the data provide evidence of a signal overall.

The paper is organized as follows. In Section 2, we give a brief review of the basic LRT method for signal detection (regular LRT) and introduce several methods, based on the regular LRT, for signal detection from multiple studies. In Section 3, the proposed LRT methods are applied to two datasets for illustration: first, to a dataset on concomitant use of PPI drugs in patients taking drugs for osteoporosis, with interest in comparing two drug groups (PPI + placebo vs. placebo only) from 6 selected studies; and second, to a selected set of 13 published studies on Lipiodol (a contrast agent) with maximum dose of 15 mg. In Section 4, a simulation study is conducted to evaluate the performance of the LRT analysis methods for multiple studies. We conclude with a discussion in Section 5.

#### 2. Methods

##### 2.1. A Summary of Regular LRT

The likelihood-ratio-test-based method for signal detection, called here the regular LRT method, was originally developed for passive surveillance of large safety databases and is publicly available in openFDA (https://open.fda.gov/tools/). It is a frequentist method based on multiple 2 × 2 tables [7]. For a particular AE *j* of interest, there are *I* such tables if there are a total of *I* drugs in the study. Here, the drugs are considered as different rows, and the *j*th AE is considered as a column (see Section 3.1). If, for a particular drug, one wants to compare many AEs, the drug should be considered as the column variable and the AEs as the rows (see Section 3.2).

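Define $n_{ij}$ as the cell count for the $i$th row (e.g., drug) and $j$th column (e.g., AE), and assume that $n_{ij} \sim \text{Poisson}(n_{i.} \times p_{ij})$, where $p_{ij}$ is the reporting rate of the $i$th drug for the $j$th AE, and that $(n_{.j} - n_{ij}) \sim \text{Poisson}((n_{..} - n_{i.}) \times q_{ij})$, where $q_{ij}$ is the reporting rate of all other drugs excluding the $i$th drug for the $j$th AE. Here, $n_{i.} = \sum_{j} n_{ij}$, $n_{.j} = \sum_{i} n_{ij}$, and $n_{..} = \sum_{i} \sum_{j} n_{ij}$. Dropping the suffix $j$ in $p_{ij}$ and $q_{ij}$, and assuming that $n_{.j}$ is fixed, the interest is to test the null hypothesis $H_0: p_i = q_i = p_0$ for all $i$ against the alternative hypothesis $H_a: p_i > q_i$ for at least one $i$. The likelihood ratio statistic for $H_0$ and $H_a$, as derived in [7], is

$$LR_{ij} = \left(\frac{n_{ij}}{E_{ij}}\right)^{n_{ij}} \left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right)^{n_{.j} - n_{ij}},$$

where $E_{ij} = n_{i.}\, n_{.j} / n_{..}$ is the expected count under $H_0$ and $i = 1, \ldots, I$.

The maximum likelihood ratio (MLR) test statistic, for the one-sided alternative, is

$$\text{MLR} = \max_{i} LR_{ij},$$

where the maximum is taken over $i = 1, \ldots, I$. Since the logarithm is a monotonically increasing function of $LR_{ij}$, it is convenient to work with $\text{MLLR} = \log(\text{MLR}) = \max_{i} \log LR_{ij}$.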
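As a rough illustration of the statistic for a single AE column, the sketch below computes $\log LR$ for each row and takes the maximum. The function name `log_lr`, the counts, and the way the one-sided alternative is handled (rows with no more events than expected are set to zero) are assumptions made for this example and are not specified in the text above.

```python
import numpy as np

def log_lr(n_ij, n_i_dot, n_dot_dot):
    """logLR per row for one AE column, with row totals as the exposure proxy."""
    n_ij = np.asarray(n_ij, dtype=float)
    n_i_dot = np.asarray(n_i_dot, dtype=float)
    n_dot_j = n_ij.sum()                        # column total for this AE
    e_ij = n_i_dot * n_dot_j / n_dot_dot        # expected count under H0
    rest = n_dot_j - n_ij                       # events in all other rows
    e_rest = n_dot_j - e_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        term1 = np.where(n_ij > 0, n_ij * np.log(n_ij / e_ij), 0.0)
        term2 = np.where(rest > 0, rest * np.log(rest / e_rest), 0.0)
    # Assumption: one-sided test, so only rows with an excess of events count.
    return np.where(n_ij > e_ij, term1 + term2, 0.0)

# Hypothetical data: three drugs, one AE of interest.
counts = [30, 10, 10]            # n_ij
row_totals = [200, 400, 400]     # n_i. (all reports for each drug)
llr = log_lr(counts, row_totals, n_dot_dot=1000)
mllr = llr.max()                 # MLLR
top_row = int(llr.argmax())      # row attaining the maximum
```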

The above formulation was constructed assuming there is no drug exposure information in the large postmarket safety database from a passive surveillance system. In this case, “no drug exposure” usually refers to the fact that we may know how many adverse events are reported for a certain drug in a passive surveillance system, but we may not know the number of patients who actually took the drug or the drug exposure for each person. Therefore, $n_{i.}$ was used as an approximation of total drug use, and relative reporting rates were compared for such an analysis using data from the FDA Adverse Event Reporting System (FAERS; https://open.fda.gov/data/faers/).

When the drug exposure for the $i$th drug ($P_i$) is available, all $n_{i.}$ can be replaced by $P_i$, and the relative risks can then be compared using the available drug exposure information (see some definitions in Huang et al. [8]). Drug exposure information may be available in a legacy database that includes data from completed clinical trials or data from ongoing clinical trials (for safety monitoring purposes). In clinical trial data, the drug exposure for a patient is usually well defined and prespecified as the total dose taken by the patient during the study, or as the exposure time for a certain amount of drug. In some cases, well-defined drug exposure information may not be available from completed clinical trials. For example, the precise drug exposure for the concomitant use of PPI is not collected in the studies included in Section 3.1, where we may have to impute the exposure under some reasonable assumptions.

Note that in order to detect signals using information from multiple studies, the drug exposure definition should be consistent and comparable across different studies considered in a single meta-analysis. More details will be discussed in the applications.

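The log likelihood ratio statistic is then written as

$$\log LR_{ij} = n_{ij} \log\left(\frac{n_{ij}}{E_{ij}}\right) + (n_{.j} - n_{ij}) \log\left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right),$$

where $E_{ij} = n_{.j}\, P_i / P_{.}$, and $P_{.} = \sum_{i} P_i$.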

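Since the distribution of the MLLR test statistic under the null hypothesis is not tractable, a Monte Carlo (MC) procedure is used to obtain the empirical distribution of MLLR. The empirical distribution of MLLR under the null hypothesis can be obtained by generating a large number of Monte Carlo samples of the cell-report counts $(n_{1j}, \ldots, n_{Ij})$ for the AE, using a multinomial distribution with known $n_{.j}$ as the total number of events. If the drug exposure is available, the distribution is then

$$(n_{1j}, \ldots, n_{Ij}) \mid n_{.j} \sim \text{Multinomial}\left(n_{.j}, \left(\frac{P_1}{P_{.}}, \ldots, \frac{P_I}{P_{.}}\right)\right).$$

If the MLLR based on the observed data, $\text{MLLR}_{\text{obs}}$, is greater than the threshold value $\text{MLLR}_{0.05}$ (the upper 5th percentile point of the empirical distribution), the null hypothesis is rejected with alpha = 0.05. The $p$ value of MLLR can be calculated as

$$p = \frac{\text{number of simulated MLLR values} \ge \text{MLLR}_{\text{obs}}}{\text{total number of Monte Carlo samples}}.$$

The drug associated with $\text{MLLR}_{\text{obs}}$ is then the most significant signal detected.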
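The Monte Carlo procedure can be sketched as follows for one AE column with known drug exposure. The helper names (`mllr`, `mc_test`), the exposure values, the counts, and the one-sided handling are illustrative assumptions; the $p$ value is taken as the plain proportion of simulated statistics at least as large as the observed one.

```python
import numpy as np

def mllr(n, share):
    """Maximum logLR over rows; n may be a single vector or a batch of vectors."""
    n = np.asarray(n, dtype=float)
    total = n.sum(axis=-1, keepdims=True)        # n_.j
    e = total * share                            # E_ij = n_.j * P_i / P_.
    rest, e_rest = total - n, total - e
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(n > 0, n * np.log(n / e), 0.0)
        t2 = np.where(rest > 0, rest * np.log(rest / e_rest), 0.0)
    llr = np.where(n > e, t1 + t2, 0.0)          # assumed one-sided handling
    return llr.max(axis=-1)

def mc_test(n_obs, exposure, n_sim=10_000, alpha=0.05, seed=0):
    """Return (observed MLLR, empirical threshold, Monte Carlo p value)."""
    rng = np.random.default_rng(seed)
    n_obs = np.asarray(n_obs)
    exposure = np.asarray(exposure, dtype=float)
    share = exposure / exposure.sum()            # P_i / P_.
    # Null data: multinomial with the observed column total and exposure shares.
    null = rng.multinomial(int(n_obs.sum()), share, size=n_sim)
    sim = mllr(null, share)
    obs = float(mllr(n_obs, share))
    threshold = float(np.quantile(sim, 1 - alpha))
    p_value = float(np.mean(sim >= obs))
    return obs, threshold, p_value

# Hypothetical events and exposure (e.g., person-years) for three drugs.
obs, threshold, p_value = mc_test([30, 10, 10], [100, 300, 600])
```

A fixed seed is used only to make the sketch reproducible; in practice the number of simulations limits the smallest $p$ value that can be reported.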

##### 2.2. LRT Analysis Approaches for Signal Detection from Multiple Studies

Here, we propose several LRT approaches based on the regular likelihood ratio test (LRT) for safety signal detection with multiple studies. Note that in the following, $LR_{is}$ or $\log LR_{is}$ for the $i$th row in the $s$th study can be calculated by the formula described in Section 2.1, applied by study.

###### 2.2.1. Analysis of Pooled Data from Several Studies Using Regular LRT

Suppose there are a total of $S$ studies or datasets. Let $n_{ij}$ denote the total event/report count for the $i$th drug and $j$th AE, summed over all $S$ studies (note that the subscript $i$ is used for drug here and that one can define the row as drug or AE depending on the interest). Using this definition of the “pooled” $n_{ij}$, we can apply the regular LRT to detect the drug signals. However, the regular LRT applied to the pooled data may not control the type-I error, because the Monte Carlo simulation for obtaining the empirical distribution of the test statistic is carried out on the pooled data rather than the study-level data. We observed this issue in the simulation study.

Another issue with this analysis of pooled studies is that it does not address the study-to-study variation, that is, the heterogeneity of studies. Study heterogeneity may come from different sources, including study designs (prospective versus retrospective), different endpoints, different distributions of effect modifiers, and different sources of data. Therefore, the analysis of the pooled studies without considering the heterogeneity may lead to biased results. This method is also vulnerable to Simpson’s paradox [14, 15] and should be used with caution. For example, in a medical study evaluating kidney stone treatment [16, 17], the paradoxical conclusion is that treatment A is more effective when used on patients with small stones and also when used on patients with large stones, yet treatment B appears more effective when all patients are combined.
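The reversal can be checked with a few lines of arithmetic. The counts below are the figures commonly quoted for the kidney stone example (successes out of patients treated); they are not reported in this paper and are included only to illustrate how pooling can reverse stratum-level comparisons.

```python
# (successes, patients) by stone size and treatment; commonly quoted figures,
# not taken from this paper.
data = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, patients):
    return successes / patients

# Success rates within each stratum.
by_stratum = {
    size: {arm: rate(*counts) for arm, counts in arms.items()}
    for size, arms in data.items()
}

# Success rates after pooling the two strata for each treatment.
pooled = {}
for arm in ("A", "B"):
    successes = sum(data[size][arm][0] for size in data)
    patients = sum(data[size][arm][1] for size in data)
    pooled[arm] = rate(successes, patients)

a_wins_each_stratum = all(r["A"] > r["B"] for r in by_stratum.values())
b_wins_pooled = pooled["B"] > pooled["A"]
```

Treatment A is used far more often on large stones, which have lower success rates under either treatment, so the pooled comparison mixes a treatment effect with a difference in case mix.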

In the following subsections, two LRT approaches for incorporating study-level heterogeneity are presented.

###### 2.2.2. Maximum of MLLR Statistics from Multiple Studies (MMLR)

Assume there are a total of $S$ studies (with similar patients and objectives, and relevant for the purpose of the current active/passive surveillance safety study). We define the MLLR statistic for a fixed AE ($j$) of interest and the $s$th study as $\text{MLLR}_s = \max_{i} \log LR_{is}$, dropping the suffix $j$. Then, the test statistic for testing the global null hypothesis $H_0$ versus the global alternative hypothesis $H_a$ is the maximum of $\text{MLLR}_s$ over all studies, defined by

$$\text{MMLLR} = \max_{s} \text{MLLR}_s, \quad s = 1, \ldots, S.$$

The empirical distribution of MMLLR can be obtained by Monte Carlo simulation, generating the null data with $n_{.s}$ and $P_{is}$ from the observed data and with the same relative risk for all rows within each study, and then calculating MMLLR for each simulated dataset. Like the regular LRT, MMLR controls the type-I error.

A drug with MMLLR from the observed data (for a particular study) is a signal if the related $p$ value (the rank of the MMLLR from the observed data among the MMLLR values obtained from the empirical data, divided by the total number of empirical datasets) is less than a prespecified significance level (such as 0.05). Furthermore, if interested, we can identify secondary drug-study combinations as signals, with logLR values that are the second largest, third largest, and fourth largest among all values for the drug-study combinations.
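A minimal sketch of the MMLLR procedure is given below, assuming that every study reports the same set of drugs in the same row order. The function names, the counts, and the exposures are hypothetical, and the one-sided handling of logLR is an assumption of this example.

```python
import numpy as np

def log_lr(n, share):
    """logLR per row; n may be one vector or a batch of simulated vectors."""
    n = np.asarray(n, dtype=float)
    total = n.sum(axis=-1, keepdims=True)
    e = total * share
    rest, e_rest = total - n, total - e
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(n > 0, n * np.log(n / e), 0.0)
        t2 = np.where(rest > 0, rest * np.log(rest / e_rest), 0.0)
    return np.where(n > e, t1 + t2, 0.0)         # assumed one-sided handling

def mmllr_test(counts, exposures, n_sim=10_000, seed=0):
    """Return (MMLLR, (study index, drug index) of the maximum, p value)."""
    rng = np.random.default_rng(seed)
    obs_rows, sim_mllr = [], []
    for n_s, p_s in zip(counts, exposures):
        n_s = np.asarray(n_s)
        p_s = np.asarray(p_s, dtype=float)
        share = p_s / p_s.sum()                  # P_is / P_.s within study s
        obs_rows.append(log_lr(n_s, share))
        # Null data by study: same column total and exposure shares as observed.
        null = rng.multinomial(int(n_s.sum()), share, size=n_sim)
        sim_mllr.append(log_lr(null, share).max(axis=-1))   # MLLR_s per replicate
    obs = np.vstack(obs_rows)                    # shape (S, I)
    s_star, i_star = np.unravel_index(obs.argmax(), obs.shape)
    mmllr_obs = float(obs.max())
    mmllr_sim = np.max(sim_mllr, axis=0)         # maximum over studies per replicate
    p_value = float(np.mean(mmllr_sim >= mmllr_obs))
    return mmllr_obs, (int(s_star), int(i_star)), p_value

# Hypothetical data: three studies, three drugs each.
counts = [[12, 10, 20], [30, 10, 10], [6, 14, 30]]
exposures = [[100, 300, 600], [100, 300, 600], [100, 300, 600]]
mmllr_obs, location, p_value = mmllr_test(counts, exposures)
```

Sorting the observed study-by-drug logLR matrix in decreasing order gives the candidates for secondary signals.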

###### 2.2.3. Weighted LRT Using Total Drug Exposure as Weight (wLRn)

In this subsection, we fix the $j$th column and drop the suffix $j$ in the following derivations.

Let $P_{is}$ be the total drug exposure for the $i$th drug in the $s$th study. Then, the weighted LRT statistic, based on the total drug exposure, is defined as

$$\text{wLRn}_i = \sum_{s=1}^{S_i} w_{is} \log LR_{is}, \quad w_{is} = \frac{P_{is}}{\sum_{s=1}^{S_i} P_{is}},$$

where $S_i$ denotes the number of studies for the $i$th drug; note that $S_i$ could be different for different rows. $\text{wLRn}_i$ can be interpreted as the weighted average of logLR from different studies for the $i$th row with weight $w_{is}$.

The test statistic for testing the global null hypothesis versus the global alternative hypothesis is then defined as $\max_{i} \text{wLRn}_i$, where the maximum is obtained over all drugs, $i = 1, \ldots, I$.

For statistical inference with the wLRn method, the simulated null datasets are generated from a multinomial distribution with $n_{.s}$ and $P_{is}$ from the observed data and with the same relative risk for all rows by study. The empirical distribution of $\max_{i} \text{wLRn}_i$ is formed by the 10,000 values obtained from the simulated null data. The $p$ value of the observed statistic is obtained by comparing it with the 10,000 values from the Monte Carlo process:

$$p = \frac{\text{number of simulated } \max_{i} \text{wLRn}_i \text{ values} \ge \text{observed value}}{10{,}000}.$$

If the $\text{wLRn}_i$ for the $i$th drug (row) has $p$ value $< 0.05$, then the $i$th drug is a signal. After detecting the global signal, we can move to the 2nd largest, 3rd largest logLR or weighted logLR values, and so on for secondary signals.
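The weighted statistic can be sketched along the same lines. The example assumes that every drug appears in every study (so the number of studies is the same for each row), weights each study's logLR by that study's share of the drug's total exposure, and compares each drug's weighted statistic with the simulated null distribution of the maximum. All names and numbers are hypothetical.

```python
import numpy as np

def log_lr(n, share):
    """logLR per row; works on (I,), (S, I), or (n_sim, I) count arrays."""
    n = np.asarray(n, dtype=float)
    total = n.sum(axis=-1, keepdims=True)
    e = total * share
    rest, e_rest = total - n, total - e
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(n > 0, n * np.log(n / e), 0.0)
        t2 = np.where(rest > 0, rest * np.log(rest / e_rest), 0.0)
    return np.where(n > e, t1 + t2, 0.0)         # assumed one-sided handling

def wlrn_test(counts, exposures, n_sim=10_000, seed=0):
    """Return (weighted logLR per drug, Monte Carlo p value per drug)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)                              # shape (S, I)
    exposures = np.asarray(exposures, dtype=float)           # shape (S, I)
    shares = exposures / exposures.sum(axis=1, keepdims=True)   # P_is / P_.s
    weights = exposures / exposures.sum(axis=0, keepdims=True)  # sums to 1 over studies
    observed = (weights * log_lr(counts, shares)).sum(axis=0)   # one value per drug
    simulated = np.zeros((n_sim, counts.shape[1]))
    for s in range(counts.shape[0]):
        null = rng.multinomial(int(counts[s].sum()), shares[s], size=n_sim)
        simulated += weights[s] * log_lr(null, shares[s])
    null_max = simulated.max(axis=1)             # maximum over drugs per replicate
    p_values = (null_max[:, None] >= observed[None, :]).mean(axis=0)
    return observed, p_values

# Hypothetical data: three studies with different total exposure, three drugs.
counts = [[12, 10, 20], [30, 10, 10], [6, 14, 30]]
exposures = [[100, 300, 600], [200, 600, 1200], [50, 150, 300]]
observed, p_values = wlrn_test(counts, exposures)
```

Comparing every drug with the null distribution of the maximum, rather than with its own marginal null distribution, is what keeps the procedure a test of the global null hypothesis.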

In summary, the statistics discussed in Section 2.2 are presented in Table 1.