Abstract

Substance dependence is a complex environmental and genetic disorder with significant social and medical concerns. Understanding the etiology of substance dependence is imperative to the development of effective treatment and prevention strategies. To this end, substantial effort has been made to identify genes underlying substance dependence, and in recent years, genome-wide association studies (GWASs) have led to discoveries of numerous genetic variants for complex diseases including substance dependence. Most of the GWAS discoveries were only based on single nucleotide polymorphisms (SNPs) and a single dichotomized outcome. By employing both SNP- and gene-based methods of analysis, we identified a strong (odds ratio = 13.87) and significant (P value = ) association of an SNP in the NCK2 gene on chromosome 2 with opiates addiction in African-origin men. Codependence analysis also identified a genome-wide significant association between NCK2 and comorbidity of substance dependence (P value = ) in African-origin men. Furthermore, we observed that the association between the NCK2 gene (P value = ) and opiates addiction reached the gene-based genome-wide significant level. In summary, our findings provided the first evidence for the involvement of NCK2 in the susceptibility to opiates addiction and further revealed the racial and gender specificities of its impact.

1. Introduction

Substance dependence is believed to result from a combination of genetic and environmental factors. Because substance dependence is a chronic brain disease with high relapse rates, it has serious social, economic, and medical consequences [1–3]. The World Health Organization (WHO) and the United Nations Office on Drugs and Crime (UNODC) reported that opiates dependence is associated with a high risk of HIV infection when opiates are injected using contaminated injection equipment [4]. Paulozzi et al. reported in 2006 that the number of deaths involving prescription opioid analgesics increased from 2,900 in 1999 to at least 7,500 in 2004, an increase of 160% in just 5 years [5]. The available evidence indicated that the increasing number of deaths is significantly correlated with the increasing use of prescription drugs, especially opioid painkillers, among people in their working years. While exposure to drugs is the prerequisite for addiction, the most important question is who will become addicted after the exposure. Genes are believed to be a major factor, although it is most likely that multiple genes as well as gene-environment interactions are involved. For this reason, understanding the genetic mechanisms behind vulnerability to drug addiction is critical to improving overall health and quality of life.

Linkage and genome-wide association studies (GWASs) have implicated many regions and genes in dependence on alcohol, tobacco, and opiates. GABRA2, CHRM2, ADH4, PKNOX2, GABRG3, TAS2R16, SNCA, OPRK1, and PDYN have all been associated with alcohol dependence with various degrees of replication [6–21]. Associations of other candidate alcohol dependence genes, such as KIAA0040, ALDH1A1, and MANBA [18, 20, 22–25], remain to be confirmed. Several groups reported CHRNA5, CHRNA3, CHRNB4, and CSMD1 to be associated with nicotine dependence [26–34]. Meanwhile, recent studies also reported a group of genes, such as OPRM1 [35–37], OPRD1, OPRK1 [21, 38, 39], HTR1B [40], SLC6A4 [41], GABRG2 [42], and BDNF [43], to be associated or in linkage with opiates addiction.

Complex diseases may involve heterogeneous genetic effects in different ethnic and gender groups [7, 44–47]. Luo et al. [44] reported that African-origin smokers become dependent at a lower threshold (number of cigarettes per day) than European-origin smokers. Hartel et al. [46] found that men are more vulnerable to addiction when compared to women. In addition, Chen et al. [7] revealed that PKNOX2 is associated with drug addiction in European-origin women. These examples underscore the necessity to consider demographic or even other covariates in genetic association studies.

Many of the reported genetic variants have been identified through single SNP association tests. Despite these successes, a single SNP tends to have a small effect, and single SNP-based association tests require a very stringent significance level, which is likely a key factor in the so-called “missing heritability” problem [48, 49]. To overcome some of these limitations, gene-based analysis [50–52] has emerged to jointly analyze the SNPs within genes. Gene-based methods are less affected by the heterogeneity of a single locus; hence the results may be more robust across populations [53], which increases the likelihood of replication. We therefore performed both single SNP-based and gene-based association analyses of the data from the Study of Addiction: Genetics and Environment (SAGE) [6], which includes well-characterized phenotypic data on substance dependence including addiction to nicotine, alcohol, marijuana, cocaine, opiates, and other drugs. In our analysis, we found a genome-wide significant association of the NCK2 gene on chromosome 2 with opiates dependence in African-origin men at both the SNP and gene levels. NCK2 is a member of the NCK family of adaptor proteins, which is associated with tyrosine-phosphorylated growth factor receptors of their cellular substrates [54]. However, to the best of our knowledge, NCK2 has not been reported to be associated with any drug addiction outcomes in humans.

2. Materials and Methods

Phenotypes for multisubstance dependency and genome-wide SNP data from SAGE [6] were downloaded from dbGaP (http://www.ncbi.nlm.nih.gov/gap). SAGE is a large case-control association study that investigates the genetic variants for drug addiction. The samples were collected from three large-scale genome-wide association studies: the Collaborative Study on the Genetics of Alcoholism (COGA), the Family Study of Cocaine Dependence (FSCD), and the Collaborative Genetic Study of Nicotine Dependence (COGEND) [16, 44, 55, 56]. The original data set contains 4,121 subjects with six categories of substance dependence data: addiction to alcohol, cocaine, marijuana, nicotine, opiates, and other drugs. Lifetime dependence on these six substances was diagnosed according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Genotyping was performed on the Illumina Human 1M platform. In this study, we followed a quality control/quality assurance process similar to previous analyses [7, 57]. Individuals with call rates <90% and SNPs with minor allele frequency (MAF) <1% were excluded from the analysis. SNPs were retained only if their Hardy-Weinberg equilibrium P value exceeded 0.0001. These steps reduced the level of noise in genotypes and increased the efficiency of analysis. There were 60 duplicate genotype samples and 9 individuals with ethnic backgrounds other than African origin or European origin. All of those individuals were removed from the subject list. In total, 3,627 unrelated samples with 859,185 autosomal SNPs remained for our final analysis. To alleviate confounding by population substructure, we stratified the sample by race and sex. This yielded four subsamples: 1,393 European-origin women, 1,131 European-origin men, 568 African-origin women, and 535 African-origin men.
The distribution of subjects diagnosed with lifetime dependence on substances in each of the six categories (nicotine, alcohol, marijuana, cocaine, opiates, or other drugs) is presented in Table 1.
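The call-rate and MAF filters described above can be illustrated with a minimal Python sketch. It assumes genotypes are coded as minor-allele counts (0, 1, 2) with None for a missing call; the function names and the toy matrix are hypothetical, and the Hardy-Weinberg filter is deliberately left out. It is not the pipeline used for SAGE, which relied on dedicated software.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # minor-allele count 0/1/2, or None if the call is missing


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls for one individual."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(snp_calls: List[Genotype]) -> float:
    """MAF of one SNP, computed over the non-missing calls only."""
    observed = [g for g in snp_calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)


def qc_filter(matrix: List[List[Genotype]],
              min_call_rate: float = 0.90,
              min_maf: float = 0.01) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is the genotype of individual i at SNP j.

    Returns the indices of individuals and SNPs that pass the filters.
    Individuals are filtered first, then MAF is computed on the survivors.
    """
    kept_ind = [i for i, row in enumerate(matrix)
                if call_rate(row) >= min_call_rate]
    n_snps = len(matrix[0])
    kept_snps = [j for j in range(n_snps)
                 if minor_allele_frequency([matrix[i][j] for i in kept_ind]) >= min_maf]
    return kept_ind, kept_snps


# Toy data: 4 individuals x 3 SNPs (purely illustrative).
toy = [[0, 1, 0],
       [1, 0, 0],
       [None, None, 0],
       [2, 1, 0]]
individuals, snps = qc_filter(toy)
print(individuals, snps)
```

Filtering individuals before SNPs is a design choice made here for simplicity; the order used in the original analysis is not stated in the text.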

3. Methods

Figure 1 displays the flow chart of our analytic strategy, and the details of the association analysis methods are described later.

3.1. Statistical Analysis for Single Trait

SNP-based association was tested with the standard allelic test and logistic regression to obtain P values for individual SNPs, using PLINK (version 1.07) [58]. Meanwhile, a list of SNP pairs in linkage disequilibrium (LD) () was calculated for the gene-based association test.
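As a rough illustration of what an allelic test computes, the sketch below runs a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts (cases versus controls, minor versus major allele). The function name and the counts are hypothetical; this is not PLINK's code, and it assumes all row and column totals are nonzero.

```python
import math
from typing import Tuple


def allelic_test(case_minor: int, case_major: int,
                 ctrl_minor: int, ctrl_major: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 allele-count table.

    Returns (chi-square statistic, P value). Assumes nonzero margins.
    """
    observed = ((case_minor, case_major), (ctrl_minor, ctrl_major))
    row = (case_minor + case_major, ctrl_minor + ctrl_major)
    col = (case_minor + ctrl_minor, case_major + ctrl_major)
    n = row[0] + row[1]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / n
            chi2 += (observed[r][c] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts, for illustration only.
chi2, p = allelic_test(20, 80, 10, 90)
print(f"chi2 = {chi2:.3f}, P = {p:.4f}")
```

With small expected counts, as can occur for rare alleles in small strata, an exact test may be preferable to the chi-square approximation.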

For the gene-based analysis, we used the open-source tool Knowledge-Based Mining System for Genome-Wide Genetic Studies (KGG, version 2.0), which operates on the SNP association test results and LD files produced by PLINK. The procedure was as follows. We first calculated the effective number of independent P values, $m_e$, among the SNPs within a gene. Then, we sorted the SNPs by P value and calculated the effective number of independent P values, $m_{e(j)}$, among the top $j$ most significant SNPs. Finally, the modified Simes test [51] was employed to obtain a gene-based P value as follows:

$$P_{G} = \min_{j} \left\{ \frac{m_e \, p_{(j)}}{m_{e(j)}} \right\},$$

where $p_{(j)}$ is the $j$th most significant P value among the SNPs within a gene. We refer the interested readers to [51] for details.
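The modified Simes procedure can be sketched in a few lines of Python. As we read the description above, the gene-based P value is the minimum over j of m_e * p_(j) / m_e(j), where p_(j) is the jth smallest SNP P value, m_e is the effective number of independent P values in the gene, and m_e(j) is the same quantity for the top j SNPs. The sketch below is an assumption-laden illustration, not KGG's implementation: the function name is hypothetical, and the estimator of the effective number (which in practice depends on the LD between SNPs) is passed in as a caller-supplied function. With the default, which treats every SNP as independent, the procedure reduces to the ordinary Simes test.

```python
from typing import Callable, Sequence


def modified_simes(p_values: Sequence[float],
                   effective_number: Callable[[Sequence[int]], float] = len) -> float:
    """Gene-based P value from SNP-level P values.

    effective_number(indices) must return the effective number of independent
    tests among the SNPs with the given indices (hypothetical interface).
    The default, len, assumes the SNPs are mutually independent.
    """
    # SNP indices ordered from most to least significant.
    order = sorted(range(len(p_values)), key=lambda k: p_values[k])
    m_e = effective_number(order)
    candidates = []
    for j in range(1, len(order) + 1):
        top_j = order[:j]
        p_j = p_values[order[j - 1]]
        candidates.append(m_e * p_j / effective_number(top_j))
    return min(min(candidates), 1.0)


snp_p = [0.01, 0.04, 0.03]
# Independent SNPs: ordinary Simes test.
print(modified_simes(snp_p))
# SNPs in perfect LD: one effective test, so the smallest P value is returned.
print(modified_simes(snp_p, effective_number=lambda idx: 1.0))
```

Passing the estimator as a function keeps the sketch independent of any particular LD-based method for counting effective tests.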

In the gene-based method, SNPs within 20 kilo bases (kb) 5′ upstream and 10 kilo bases (kb) 3′ downstream of a gene’s coding regions [59] were assigned to the gene. In addition, we included other SNPs if they are in strong LD with the initially mapped SNPs within the gene [60].

Since there are about 20,000 protein-coding genes in the human genome, we used as the genome-wide significance threshold for the gene-based association test. In contrast, we used as the genome-wide significance threshold for the SNP-based association test [61].
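One common way to derive a gene-level threshold from the number of genes is a Bonferroni correction. The short sketch below shows that arithmetic under the assumption of a family-wise error rate of 0.05; both the correction method and the error rate are assumptions made here for illustration, and the function name is hypothetical.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# About 20,000 protein-coding genes, as stated in the text.
gene_level = bonferroni_threshold(20_000)
print(f"gene-level threshold under these assumptions: {gene_level:.1e}")
```

The same function applies to SNP-level testing if the number of independent SNP tests is supplied instead of the number of genes.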

3.2. Codependence Association Analysis

Although logistic regression is commonly used to study a binary outcome, it is not suitable to evaluate comorbidity involving multiple outcomes. We use a nonparametric association test based on Kendall’s tau [62] to study the comorbidity. The Kendall’s tau-based association test proceeds as follows.

Suppose that we observe a $p$-dimensional vector of traits $T_i$, a genotype $G_i$, and an $l$-dimensional vector of covariates $Z_i$ for the $i$th subject in a population-based study with $n$ subjects, and that the observations $(T_i, G_i, Z_i)$, $i = 1, \ldots, n$, are independent samples. For subjects $i$ and $j$, $T_i$ and $T_j$ are their vectors of traits, $G_i$ and $G_j$ their genotypes, and $Z_i$ and $Z_j$ their covariates. Generalized from Kendall’s tau, a statistic $U$ is defined over pairs of subjects to measure the association between $T$ and $G$. Without considering the covariates and conditioning on all phenotypes, $U$ follows an asymptotically normal distribution in the absence of association [63]. To accommodate covariates, a weighted statistic has been developed [64, 65]. We refer to Jiang and Zhang [64] for a detailed description of the method. For the purpose of comparison, we present the results with and without considering age as the covariate. Recall that our analysis is stratified by ethnicity and gender.
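To give a feel for a Kendall's tau-type statistic on multiple traits, the sketch below computes, for each trait, the average over subject pairs of sign(T_i - T_j) times (G_i - G_j). This kernel is one simple choice made here for illustration and is not necessarily the one used in [63, 64]. The sketch also swaps the asymptotic normal test for a permutation test and ignores covariates, so it should be read as a toy version of the idea. All function names and data are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def kendall_u(traits: Sequence[Sequence[float]],
              genotypes: Sequence[float]) -> List[float]:
    """One component per trait: mean over pairs of sign(T_i - T_j) * (G_i - G_j)."""
    n = len(genotypes)
    p = len(traits[0])
    totals = [0.0] * p
    for i, j in combinations(range(n), 2):
        dg = genotypes[i] - genotypes[j]
        for k in range(p):
            totals[k] += sign(traits[i][k] - traits[j][k]) * dg
    n_pairs = n * (n - 1) / 2
    return [t / n_pairs for t in totals]


def permutation_p_value(traits: Sequence[Sequence[float]],
                        genotypes: Sequence[float],
                        n_perm: int = 999, seed: int = 0) -> float:
    """Permutation P value for the sum of squared components of U.

    Genotypes are shuffled across subjects; this replaces the asymptotic
    test described in the text and does not adjust for covariates.
    """
    rng = random.Random(seed)

    def stat(g: Sequence[float]) -> float:
        return sum(u * u for u in kendall_u(traits, g))

    observed = stat(genotypes)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 4 subjects, 2 binary dependence traits, genotype = allele count.
toy_traits = [(1, 1), (1, 0), (0, 0), (0, 0)]
toy_genotypes = [2, 1, 0, 0]
print(kendall_u(toy_traits, toy_genotypes))
print(permutation_p_value(toy_traits, toy_genotypes, n_perm=199))
```

Treating the traits jointly in this way lets a single test reflect comorbidity rather than running one test per substance.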

4. Results

4.1. Association Analysis at SNP Level

Table 2 summarizes the top four significant SNPs (with ) in the NCK2 gene on chromosome 2 (2q12) for opiates dependence in African-origin men. We identified a genome-wide significant SNP (rs2377339 with ) for opiates dependence in African-origin men by the allelic test. Logistic regression also yielded strong evidence for the association between the SNP rs2377339 () and opiates dependence, although the P value did not reach the genome-wide significance threshold. In addition, Table 2 presents the association results for the other five addictions with the four candidate SNPs. None of the four SNPs appeared significantly associated with the other five substance addictions.

4.2. Association Analysis at Gene Level

The gene-based association results are displayed in the last two rows of Table 2. Specifically, we included 39 SNPs in NCK2. The P values from the gene-based tests of NCK2 that were obtained through the standard allelic test and logistic regression are and , respectively. The gene-based P value from the standard allelic test reached genome-wide significance at the gene level. The gene-based P value from logistic regression is very close to the gene-based genome-wide significance level. Therefore, both methods provided evidence that supports the association between the NCK2 gene and opiates dependence in African-origin men. Among the other five substances in African-origin men, nicotine dependence had the most significant association with the NCK2 gene .

4.3. Haplotypes Analysis

We also examined the association of haplotypes with opiate addiction in the NCK2 region. Figure 2 displays the linkage disequilibrium (LD) heat map of 14 SNPs in a 28 kb region [66]. Haplotype “AGTTCAGATCTCGT” with probability 0.016 yielded a P value of . The genome-wide significant association between this haplotype and opiate addiction reduces the chance of a false discovery at the peak of a single SNP.

4.4. Contingency Table Analysis

We further examined the relationship between SNP rs2377339 and the opiates dependence in African-origin men. Table 3 depicts the allele frequencies of SNP rs2377339. The proportion of individuals having minor allele G is 21.43% in the case group and 1.63% in the control group. The odds ratio of SNP rs2377339 is 13.87, indicating that those who have the risk allele (G) for rs2377339 are at a significantly increased risk of being diagnosed with opiates dependence.
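For readers less familiar with contingency-table analysis, the sketch below shows how an odds ratio and an approximate 95% confidence interval are obtained from a 2x2 table. The counts are hypothetical and are not taken from Table 3; the function names are ours, and the Wald interval is one common choice that can be unreliable when a cell count is small.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a = cases with the risk allele,    b = cases without it,
    c = controls with the risk allele, d = controls without it.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


def wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval on the log-odds scale (Wald method)."""
    if min(a, b, c, d) == 0:
        raise ValueError("Wald interval requires all cells to be nonzero")
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only.
print(odds_ratio(10, 40, 5, 95))
print(wald_ci(10, 40, 5, 95))
```

A wide interval from such a table is a direct reflection of small cell counts, which is relevant whenever the case group is small.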

4.5. Stratification Analysis

Furthermore, in Table 4, we investigated the racial specificity and sex difference in the association between SNP rs2377339 and opiates dependence. This scrutiny required us to include all racial and gender groups. We observed that the MAF and P values vary between races and genders. The association between rs2377339 and opiates dependence became less significant in the overall cohort after we adjusted for race and gender in logistic regression.

4.6. Codependence Association Analysis

In Table 5, we also present the association results for NCK2 and comorbidity of substance dependence. The most significant signal in NCK2 was observed for SNP rs2377339 in men of African origin with in the adjusted association test and in the unadjusted association test. P values of SNPs in NCK2 for the other ethnicity-by-gender groups were far from the genome-wide significance level and, hence, are omitted here.

5. Discussion

We found a genome-wide significant association between SNP rs2377339 and opiates dependence in African-origin men. The NCK2 gene, which contains SNP rs2377339, also achieved genome-wide significance for opiates dependence at the gene level. Among the other five substances, nicotine dependence had the most significant association, but it was not significant at the genome-wide level.

NCK2, a member of the NCK family of adaptor proteins, is reported to be associated with tyrosine-phosphorylated growth factor receptors of their cellular substrate [54]. The association between NCK2 and nicotine dependence has been suggested in humans [67, 68]. Our finding coupled with those human studies enhances the plausibility of a causal relationship between NCK2 and drug addiction.

Importantly, about one-fifth of the African-origin men with opiates addiction carried the minor allele G of SNP rs2377339, which is more than 10 times the frequency in the group without opiates dependence. This suggests that the minor allele G of SNP rs2377339 potentially elevates the risk of opiates dependence in African-origin men. We acknowledge that our analysis included only 44 African-origin men with opiates dependence. Therefore, it is important and necessary to validate our finding through independent and larger cohort studies. Specifically, there are two possible strategies to validate our finding. The direct approach is to replicate the association between SNP rs2377339 and opiates dependence in a larger cohort. An indirect approach is to evaluate whether SNP rs2377339 is associated with any substance dependence (opiates, alcohol, marijuana, etc.) as presented in Table 2.

A distinctive feature of our analysis is that it considers multiple substance addictions simultaneously rather than a single substance. This approach, which is a realistic depiction of substance dependence, confirmed that a novel susceptibility gene, NCK2, is significantly associated with substance dependence in African-origin men.

This study has several limitations. First, we stratified by ethnicity and sex, which reduced sample sizes and affected the power of our analysis. Nonetheless, the significant associations revealed in African-origin men are consistent with the notion that men may be socially more prone to environmental influences that promote substance use and thus more vulnerable to addiction [46]. Second, for SNP rs2377339, we observed heterogeneous genetic effects, suggesting interactions between race, sex, and the gene, because the association is much weaker after adjusting for race and gender. Such interactions have been suggested in other addiction research [44, 45, 47]. Our result further supports the importance of examining interactions among genes, race, and sex in addiction.

Conflict of Interests

The authors declare that they have no conflict of interests.

Authors’ Contribution

Zhifa Liu and Xiaobo Guo contributed equally.

Acknowledgments

This work was supported by Grant R01 DA016750-09 from the National Institute on Drug Abuse. Funding support for the Study of Addiction: Genetics and Environment (SAGE) was provided through the NIH Genes, Environment and Health Initiative (GEI) (U01 HG004422). SAGE is one of the genome-wide association studies funded as part of the Gene Environment Association Studies (GENEVA) under GEI. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for collection of datasets and samples was provided by the Collaborative Study on the Genetics of Alcoholism (COGA; U10 AA008401), the Collaborative Genetic Study of Nicotine Dependence (COGEND; P01 CA089392), and the Family Study of Cocaine Dependence (FSCD; R01 DA013423). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH Contract “High Throughput Genotyping for Studying the Genetic Contributions to Human Disease” (HHSN268200782096C). The datasets used for the analyses described in this paper were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000092.v1.p1 through dbGaP Accession no. phs000092.v1.p.