Genome-wide association studies (GWAS) have identified variants in the FGFR2 locus as risk factors for breast cancer. Validation studies, however, have shown inconsistent results across ethnic groups and pathological characteristics. To further explore this inconsistency and investigate the associations of FGFR2 variants with breast cancer according to intrinsic subtype (Luminal-A, Luminal-B, ER−&PR−&HER2+, and triple negative) among Southern Han Chinese women, we genotyped the rs1078806, rs1219648, rs2420946, rs2981579, and rs2981582 polymorphisms in 609 patients and 882 controls. Significant associations with breast cancer risk were observed for rs2420946, rs2981579, and rs2981582, with OR (95% CI) per risk allele of 1.19 (1.03–1.39), 1.24 (1.07–1.43), and 1.17 (1.01–1.36), respectively. In subtype-specific analysis, the same three SNPs were significantly associated with increased Luminal-A risk in a dose-dependent manner; however, only rs2981579 was associated with Luminal-B, and none were linked to ER−&PR− subtypes (ER−&PR−&HER2+ and triple negative). Haplotype analyses also identified common haplotypes significantly associated with luminal-like subtypes (Luminal-A and Luminal-B), but not with ER−&PR− subtypes. Our results suggest that the associations of FGFR2 SNPs with breast cancer are heterogeneous according to intrinsic subtype. Future studies stratifying patients by intrinsic subtype will provide new insights into the complex genetic mechanisms underlying breast cancer.

1. Introduction

Recently, large genome-wide association studies (GWAS) have identified nearly 70 genetic susceptibility loci associated with breast cancer risk [1–7]. Among these, a locus within an intron of the FGFR2 gene is consistently the most strongly associated [1, 3, 8]. Because most GWAS were performed in Caucasian populations, replication studies have often failed to extend the association to other ethnic groups, such as Asians [9] and African-Americans [10]. According to these studies, the inconsistencies stem mainly from differences in linkage disequilibrium (LD) patterns and in the minor allele frequencies of SNPs between ethnicities.

In addition, breast cancers vary greatly in clinical behavior, morphological appearance, and molecular alterations. Genomic studies have established that breast cancer can be divided into four major intrinsic subtypes (Luminal-A, Luminal-B, HER2-enriched, and triple negative) that differ significantly in incidence, survival, and response to therapy [11, 12]. Determining whether genetic risk factor associations for breast cancer differ by tumor subtype is therefore a critical etiologic question. Evidence that genetic variants in FGFR2 may influence tumor subtype comes from the observation that susceptibility loci in FGFR2 show stronger associations with estrogen receptor positive (ER+) disease than with ER− disease [13].

However, the first wave of GWAS was conducted with a predominance of ER-positive disease and could not determine whether tumor subtypes modify the association between breast cancer risk and the recently identified susceptibility loci. Additionally, recent interest has focused on ER expression status [4, 14–16], and few of these studies have provided data for subtypes defined jointly by ER, progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status or additional biomarkers [17]. Determining whether breast cancer genetic risk factors are linked to tumors of specific intrinsic subtypes may provide a gateway to tailored prevention and early detection strategies.

We therefore used data from the Southern China Breast Cancer Genetics Study (SCBCGS) to evaluate the hypothesis that tumor intrinsic subtypes, in particular those defined jointly by the expression of ER, PR, HER2, and Ki-67, modify the association between breast cancer risk and the recently identified common FGFR2 intron-2 polymorphisms. This paper expands and refines our previous reports, which analyzed associations by ER status only, by including up to three additional tumor markers. Moreover, the detailed analyses allow us to examine whether defining the intrinsic subtypes of breast cancer can help explain the heterogeneity between studies and thus support more definite conclusions than previous reports.

2. Materials and Methods

2.1. Study Population

Individuals included in the current analysis were Han Chinese women who participated in the SCBCGS, a multicenter, hospital-based study of breast cancer conducted among Han Chinese women from three areas of Southern China: Canton, Chongqing, and Nanchang. Previous reports have described the study population and enrollment process in detail [18–21]. Briefly, consecutive patients with histologically confirmed primary breast cancer were recruited from defined hospitals. Control individuals were selected using the Resident Registry of each city and were frequency matched to the cases on ethnicity, age (±5 years), and community of residence. Detailed information on menstrual and reproductive history, hormone therapy (HT), weight, height, and family history of breast cancer was collected from each participant during in-person interviews.

Late age at first full-term pregnancy (>30 years) and early menarche (<13 years) are established risk factors for breast cancer [22, 23]. Thus, we included age at menarche and age at first full-term pregnancy in our multivariable models. In addition, to determine the heterogeneity of the relationship between FGFR2 variants and sporadic breast cancer by intrinsic subtype, only patients without a family history of breast cancer were eligible for the present study.

The study was approved by the institutional review boards at all participating institutes (IRB numbers 2009-SCBCGS-GZ-01, 2009-SCBCGS-CQ-01, and 2009-SCBCGS-NC-01), and all participants provided written, informed consent before participating in the study.

2.2. DNA Extraction and Genotyping

Laboratory protocols for the DNA extraction and genotyping methods used by the SCBCGS have been previously described in detail [18, 19]. Briefly, genomic DNA was extracted from whole blood using TIANamp Genomic DNA Purification Kit according to the manufacturer’s protocol and stored at −80°C until used for further analysis.

Recent GWAS have shown that all FGFR2 SNPs significantly associated with breast cancer risk fall within a 25 kb linkage disequilibrium (LD) block located entirely within intron-2. To replicate the GWAS findings in Chinese women, we first identified 7 variants showing association with breast cancer in one or more GWAS [1, 3, 8]. Given the sample size and statistical power of the present study, two SNPs (rs7895676 and rs11200014) were excluded because of a low minor allele frequency (MAF) of less than 25% in the HapMap Han Chinese in Beijing (CHB) sample. Thus, 5 SNPs (rs1078806, rs1219648, rs2420946, rs2981579, and rs2981582) were selected for analysis in this study. SNPs were genotyped using the SEQUENOM MassARRAY matrix-assisted laser desorption ionization time-of-flight mass spectrometry platform [18, 19].

2.3. Classification of Biologic Subtype

Database review identified 609 eligible patients with details on ER, PR, HER2, and Ki67 expression. Four subtypes were constructed: (i) triple negative (ER−, PR−, and HER2−), (ii) ER−HER2+ (ER−, PR−, and HER2+), (iii) Luminal-B (ER+ and/or PR+, and either HER2+ and/or high Ki67), and (iv) Luminal-A (ER+ and/or PR+, and neither HER2+ nor high Ki67). Figure 1 shows the classification scheme based on combinations of the biomarkers. ER and PR were considered positive if immunohistochemistry (IHC) staining was ≥10%; HER2-positive status was defined as an IHC score of 3+ or HER2 amplification by fluorescence in situ hybridization (FISH) [24]. Tumors with Ki67 ≥10% were designated “high proliferation” [25].
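The classification rules above can be sketched as a small function. This is an illustrative reading of the stated criteria, not the authors' own code: the function name, argument names, and subtype labels are hypothetical, and the sketch assumes that "high proliferation" (Ki67 ≥10%) is the Ki67 condition separating Luminal-B from Luminal-A.

```python
def classify_subtype(er_pct, pr_pct, her2_ihc, her2_fish_amplified, ki67_pct):
    """Assign an intrinsic subtype from four markers (hypothetical helper).

    er_pct, pr_pct, ki67_pct: percentage of stained cells by IHC.
    her2_ihc: HER2 IHC score (0, 1, 2, or 3).
    her2_fish_amplified: True if FISH shows HER2 amplification.
    """
    # ER and PR are positive when IHC staining is >= 10%.
    hormone_receptor_positive = er_pct >= 10 or pr_pct >= 10
    # HER2 is positive with an IHC score of 3+ or FISH amplification.
    her2_positive = her2_ihc == 3 or her2_fish_amplified
    # Assumption: Ki67 >= 10% ("high proliferation") marks Luminal-B.
    ki67_high = ki67_pct >= 10

    if hormone_receptor_positive:
        if her2_positive or ki67_high:
            return "Luminal-B"
        return "Luminal-A"
    if her2_positive:
        return "ER-HER2+"
    return "Triple negative"


if __name__ == "__main__":
    # Made-up marker values for illustration only.
    print(classify_subtype(80, 60, 0, False, 5))
    print(classify_subtype(0, 0, 3, False, 30))
```

Because the hormone receptor check comes first, a tumor that is ER− but PR+ is still treated as luminal, matching the "ER+ and/or PR+" wording.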

2.4. Statistical Analysis

For each SNP, deviation of genotype frequencies in controls from Hardy–Weinberg equilibrium (HWE) was assessed by a goodness-of-fit test. Differences in the frequencies of SNP alleles and genotypes between cases and controls were evaluated using the chi-square test or Fisher’s exact test, as appropriate. Breast cancer risk was estimated as odds ratios (ORs) with 95% confidence intervals (CIs), based on unconditional logistic regression adjusted for potential confounders. Analyses were carried out assuming dominant, codominant, and additive allelic effects for each polymorphism. The Cochran–Armitage trend test was used to test the additive genetic model.
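As an illustration of the HWE goodness-of-fit test mentioned above, the following minimal sketch computes the one-degree-of-freedom chi-square statistic from genotype counts in controls. The function name and the example counts are hypothetical; the study's own software and exact procedure may differ.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts (two homozygotes and the
    heterozygote). Returns (chi2, p_value) with 1 degree of freedom.
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up control genotype counts, for illustration only.
    chi2, p_value = hwe_chi_square(300, 420, 162)
    print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A small P value indicates that the observed genotype counts in controls depart from the proportions expected under HWE, which is often taken as a sign of genotyping error or population stratification.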

Linkage disequilibrium (LD) patterns and population haplotype frequencies for the SNPs were estimated with the online SNPStats tool, which uses an expectation-maximization algorithm [26]. Using the most frequent haplotype as the reference group, haplotype counts were entered under an additive model, and an unconditional logistic regression model was applied to calculate ORs (95% CIs), adjusting for potential confounders.
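For readers unfamiliar with pairwise LD measures, the sketch below computes the standard D′ and r² statistics for two biallelic loci. It assumes the four two-locus haplotype frequencies are already known; in practice they must be estimated from unphased genotypes (for example by an expectation-maximization algorithm), which this sketch does not implement. The function name and example frequencies are hypothetical.

```python
def pairwise_ld(f_AB, f_Ab, f_aB, f_ab):
    """Return (|D'|, r^2) for two biallelic loci.

    f_AB, f_Ab, f_aB, f_ab: haplotype frequencies summing to 1.
    Assumes both loci are polymorphic (no allele frequency of 0 or 1).
    """
    p_A = f_AB + f_Ab          # frequency of allele A at locus 1
    p_B = f_AB + f_aB          # frequency of allele B at locus 2
    d = f_AB - p_A * p_B       # raw disequilibrium coefficient D
    # D is scaled by its maximum attainable magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2


if __name__ == "__main__":
    # Made-up haplotype frequencies, for illustration only.
    d_prime, r2 = pairwise_ld(0.4, 0.1, 0.1, 0.4)
    print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

For any pair of loci, |D′| is at least as large as r², so a pair can show near-complete D′ while r² remains moderate when allele frequencies differ.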

Stratified analysis according to intrinsic subtype was additionally conducted. All statistical tests were two-sided, and P < 0.05 was considered significant. To correct for multiple testing, we estimated the adjusted significance by applying the Bonferroni correction for all the SNPs tested in the analysis. Statistical analysis was performed using SPSS version 19.0 (IBM SPSS Statistics for Windows, IBM Corporation, Somers, NY) unless otherwise specified.
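The Bonferroni correction described above amounts to multiplying each P value by the number of tests and capping the result at 1. The sketch below shows this under the assumption that the number of tests equals the number of SNPs analyzed; the function name and the P values are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Return Bonferroni-adjusted P values.

    p_values: list of unadjusted P values.
    n_tests: number of tests to correct for; defaults to len(p_values).
    """
    m = len(p_values) if n_tests is None else n_tests
    # Multiply by the number of tests and cap at 1.
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Made-up P values for four SNPs, for illustration only.
    raw = [0.004, 0.02, 0.3, 0.6]
    for p, p_adj in zip(raw, bonferroni_adjust(raw)):
        print(f"raw P = {p:.3f}, adjusted P = {p_adj:.3f}")
```

Equivalently, one can leave the P values unchanged and compare them with a threshold of 0.05 divided by the number of tests.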

3. Results

3.1. Characteristics of Controls and Cases

Table 1 shows the characteristics of the controls and cases by intrinsic subtype. Compared with controls, cases were older and more likely to be parous with a first full-term pregnancy at ≥30 years and to be postmenopausal HT non-users. Notably, no significant differences in basic characteristics were seen between subtypes. Thus, age, HT use, menopausal status, and age at first full-term pregnancy were selected as potential confounders in the primary analyses.

3.2. Hardy–Weinberg Equilibrium Testing

The minor allele frequencies of all tested SNPs were roughly similar to the corresponding frequencies in the HapMap CHB (Han Chinese) and JPT (Japanese) populations. All observed genotype frequencies in controls were in agreement with HWE except for rs1219648, which deviated from HWE () and was thus excluded from the subsequent analyses (Table 2).

3.3. Associations with Breast Cancer Risk Overall and by Subtype Separately

Table 3 shows the allele and genotype distributions of the remaining four polymorphisms in the combined sample and in the subgroups. The chi-square test showed significant associations of rs2420946 and rs2981579 with overall breast cancer risk ( and , respectively). After adjustment for the abovementioned potential confounders, logistic regression analysis confirmed these associations, which remained significant in the dominant model for rs2420946 (C/T + T/T: ), in both the codominant (T/T: ) and dominant (T/C + T/T: ) models for rs2981579, and in the per-allele model for rs2981579/T even after Bonferroni correction ().

In the Luminal-A subgroup, the association between rs2420946 and breast cancer risk was the strongest (adjusted OR = 1.69, 95% CI: 1.13–2.53 for the T/T genotype and adjusted OR = 1.55, 95% CI: 1.12–2.15 for the C/T genotype, compared with the common homozygote C/C) and followed a dose-dependent pattern (). Significant associations were also observed between Luminal-A breast cancer risk and the homozygous minor allele genotype (T/T) for rs2981579 (adjusted OR = 1.68, 95% CI: 1.15–2.45, ) and rs2981582 (OR = 2.01, 95% CI: 1.35–3.01, ). However, after Bonferroni correction, only rs2420946 () and rs2981582 () remained associated with Luminal-A breast cancer risk under the dominant model.
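The relationship between a reported odds ratio, its 95% CI, and the corresponding Wald test can be sketched as follows. This is a generic back-calculation that assumes the interval is a symmetric Wald interval on the log-odds scale; the resulting values are approximations and are not the P values reported by the authors. The function name is hypothetical.

```python
import math


def wald_p_from_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Approximate the Wald z statistic and two-sided P value from an OR
    and its confidence interval.

    Assumes the CI was built as exp(log(OR) +/- z_crit * SE), so the
    standard error can be recovered from the width of the log-scale interval.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = log_or / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # OR and CI for the rs2420946 T/T genotype in Luminal-A, as given in the text.
    z, p_value = wald_p_from_ci(1.69, 1.13, 2.53)
    print(f"z = {z:.2f}, approximate two-sided P = {p_value:.4f}")
```

Because published ORs and CIs are rounded, the recovered P value is only accurate to about one significant figure.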

Under the dominant model, rs2981579 was associated with increased Luminal-B breast cancer risk (; Table 3) with Bonferroni-adjusted . In addition, rs2981582 showed a marginal association () with ER−HER2+ breast cancer risk under the dominant model. However, after correction for multiple testing, this association was not significant (Bonferroni-adjusted ). No significant associations between the selected SNPs and triple negative breast cancer risk were detected under any genetic model (Table 3).

3.4. Linkage Disequilibrium and Haplotype Analysis

LD analyses showed that the four variants were in moderate LD with each other (pairwise r² values ranging from 0.472 to 0.774 and D′ values ranging from 0.588 to 0.997) (Figure 2). Estimated haplotype (rs1078806-rs2420946-rs2981579-rs2981582) frequencies are presented in Table 4. Compared with the most common haplotype T-C-C-C, the C-T-T-T haplotype, with a frequency of 22.9% and containing three risk alleles (rs2420946T, rs2981579T, and rs2981582T), was associated with an increased breast cancer risk in the whole sample (adjusted OR = 1.30, 95% CI: 1.07–1.57, ) (Table 4) and in subtypes Luminal-A (adjusted OR = 1.52, 95% CI: 1.17–1.97, ) and Luminal-B (OR = 1.52, 95% CI: 1.07–2.15, ) (Figure 3). Breast cancer risk, particularly Luminal-A breast cancer risk, was also significantly increased for carriers of the T-T-T-C and T-C-T-C haplotypes (Table 4 and Figure 3).

4. Discussion

FGFR2 belongs to the FGFR family of receptor tyrosine kinases, which are involved in signaling pathways that contribute to tumorigenesis through cell growth, apoptosis, and differentiation [27]. Functional analyses support the relevance of these variants to breast cancer risk: FGFR2 polymorphisms located in intron-2 alter the binding of two transcription factors, Oct-1/Runx2 and C/EBPβ, resulting in increased FGFR2 gene expression both in cell lines and in breast tissue [28]. A number of case-control studies have investigated the association between FGFR2 intron-2 polymorphisms and breast cancer susceptibility in Chinese populations [29–32]. However, these studies have yielded inconsistent results.

To investigate this inconsistency, one important step is to study whether these common variants are associated differently with known breast cancer intrinsic subtypes. Thus, the present study investigated whether 5 common FGFR2 SNPs were associated with specific tumor subtypes defined by four markers. To our knowledge, this is the first study in a Chinese population to provide evidence for heterogeneity in the strength of the association of the FGFR2 susceptibility locus with the risk of specific subtypes. Furthermore, stratification of tumors provided further insights into etiological heterogeneity.

First, this study confirmed that three SNPs in the second intron of FGFR2 (rs2420946, rs2981579, and rs2981582) were significantly associated with increased risk of breast cancer, which validates earlier GWAS results [3]. This result is in accordance with Raskin et al. [33], who found statistically significant differences between breast cancer cases and healthy controls for the rs2420946 and rs2981579 polymorphisms in Ashkenazi and Sephardi Jewish populations. Furthermore, Fu et al. [34] reported that the rs2420946 polymorphism in the second intron of the FGFR2 gene is significantly associated with increased breast cancer risk in nonfamilial breast cancer but not in familial breast cancer in a Chinese Han population. Similarly, to explore the genetic risk factors for sporadic breast cancer, all subjects in our study lacked a family history of cancer and were free of other malignant diseases. Therefore, it is likely that some polymorphisms in intron-2 of FGFR2 play a role in tumorigenesis in this subgroup of Chinese women.

Further subtype-stratified analyses showed that rs2981579 was associated with increased risk of both Luminal-A and Luminal-B tumors under the dominant or codominant genetic models. Consistent with previous reports, this study confirmed that rs2420946 [35] and rs2981582 [36] are most strongly associated with Luminal-A, with no evidence for an association with the risk of triple negative or ER−HER2+ tumors. These strong associations of FGFR2, a receptor tyrosine kinase, with luminal-like tumors are also consistent with the involvement of FGFR2 in estrogen-related breast carcinogenesis [37].

In haplotype analysis, the risk haplotype of FGFR2 (rs1078806C-rs2420946T-rs2981579T-rs2981582T) was associated with a significantly increased luminal-like breast cancer risk compared with the rs1078806T-rs2420946C-rs2981579C-rs2981582C haplotype, with no association observed for ER− and PR− tumors. Our haplotype findings are broadly similar to those of previous studies [29, 34, 38], while extending them through further stratification by additional tumor markers not included in those publications. Furthermore, the LD pattern among the four FGFR2 variants in our Southern Han Chinese population was moderate, in contrast to that in Caucasians from the HapMap CEU samples ( range from 0.97 to 1.0) but resembling that in other Asian populations [38]. This suggests a fairly independent risk effect of each locus in Asian populations, although the results warrant confirmation in larger sample sets.

Therefore, the different patterns of association with specific tumor subtypes observed in our study strengthen the evidence for the hypothesis that genetic risk factors differ by intrinsic subtype [39]. A study including cases unselected for intrinsic subtype could therefore produce contradictory results and inconsistent conclusions. Conversely, future studies that stratify patients by intrinsic subtype or include more homogeneous tumor types will give more power to classic case-control studies.

One strength of the present study is that ER, PR, HER2, and Ki67 status were all assessed using the same processing protocols and pathology review criteria for all cases. However, several limitations must be considered. First, although the current study has sufficient power (>90%) to detect a log-additive OR of 1.30 for allele frequencies >27% at a two-sided significance level of 0.05, associations for SNPs with ORs < 1.3 may have been missed because of insufficient power. Furthermore, the estimated power for the three SNPs (rs2420946, rs2981579, and rs2981582) with Luminal-A and for rs2981579 with Luminal-B was 79.64%, 72.46%, 92.15%, and 66.13%, respectively. Consequently, we could not confirm that the other SNPs lacked an association with specific breast cancer subtypes, because the limited sample size left the study underpowered to detect a true association. Larger sample sizes would improve the power and help establish whether these SNPs are associated with specific breast cancer subtypes. Indeed, more participants are being recruited as part of the Southern China Breast Cancer Genetics Study, and we expect that the findings of the present study will be replicated.

Second, in the analyses by receptor subtype, 28.9% of cases were excluded because of unavailable information, a proportion similar to that reported by previous studies conducted either within epidemiology registries [40] or using receptor status measured at a single laboratory [41]. Furthermore, apart from the reporting hospital, comparison of demographic and clinical characteristics showed no significant difference between breast cancer patients included in and excluded from the present study. It is therefore unlikely that the association between FGFR2 SNPs and breast cancer subtypes differed according to whether the corresponding information was available. Another weakness is possible misclassification of subtypes; however, misclassification probabilities are likely to be independent of susceptibility loci and would thus tend to lead to underestimated association strengths rather than spurious associations [36]. For example, a recent study showed high discordance between HER2 expression based on IHC and mRNA: 60% of the tumors that were HER2+ by IHC were not classified as HER2+ by mRNA [42]. To address these limitations, we are currently conducting a study aimed at evaluating the value of additional classifications to expand our understanding of the etiology of this heterogeneous tumor.

5. Conclusions

In conclusion, our study revealed a significant association of FGFR2 intron-2 SNPs with breast cancer risk in Southern Han Chinese and provided strong evidence for differential susceptibility according to intrinsic subtype. Further epidemiological and experimental studies of larger datasets along with intrinsic subtype categorization are warranted to explore and confirm the role of these variants in increasing breast cancer risk, which will provide biological insights on the mechanisms of carcinogenesis and ultimately lead to improvement in prevention and treatment.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors’ Contribution

Huiying Liang, Xuexi Yang, and Lujia Chen contributed equally to this work. Ming Li directed the study and was responsible for the study design. Huiying Liang and Xuexi Yang performed data management, statistical analyses, and interpretation of results and drafted the initial paper. Lujia Chen and Hong Li performed statistical analyses and interpreted the data. Anna Zhu and Minying Sun collected data and provided administrative, technical, or material support. Lujia Chen and Haitao Wang administered the genotyping analysis. Lujia Chen, Haitao Wang, and Minying Sun were responsible for biospecimen and data collection. All authors read, critically revised, and approved the final paper.


Acknowledgments

The authors thank all of the patients and healthy control subjects who participated in our study. The authors are also grateful to Dr. Gorka Ruiz de Garibay for his writing assistance. This work was supported by the National High Technology Research and Development Program of China (Grant no. 2012AA020205), the National Natural Science Foundation of China (Grant no. 81302327 and Grant no. 81401755), and China study abroad Scholarship (Grant no. 201308440152).