Abstract

Copy-number variations (CNVs) may contribute to genetic variation in humans. However, reports on the existence and characteristics of CNVs in large cohorts of apparently healthy Japanese individuals are quite limited. We report data from a screening of 213 unrelated Japanese individuals by comparative genomic hybridization using a bacterial artificial chromosome microarray (BAC aCGH). In a previous paper, we summarized these data by focusing on highly polymorphic CNVs (found in ≥5.0% of the individuals). Rare variants, however, have recently received attention from scientists who espouse a hypothesis called “common disease and rare variants.” Here, we report CNVs identified in fewer than 10 individuals in our study population. We found a total of 126 CNVs at 52 different BAC regions in the genome. The CNVs observed at 27 of the 52 BAC regions were found in only one unrelated individual. The majority of the CNVs found in this study were not identified in the Japanese individuals examined in other studies. Family studies were conducted, and the results demonstrated that the CNVs examined were inherited from one parent.

1. Introduction

Genomes vary from one another in multifarious ways, and the totality of this genetic variation underpins the heritability of human traits. Various recent studies have described the landscape of genetic variation and allowed estimation of the relative contributions of sequence variation (base substitutions) and structural variation (indels [insertions or deletions], copy-number variations [CNVs], and inversions). Among DNA sequence variations in the human genome, CNVs can directly contribute to changes in gene expression through gene dosage effects, and CNVs are now considered to be more widely distributed than previously thought [1–6]. Genome resequencing studies have shown that CNVs, which involve the gain or loss of several hundred bases to several hundred kilobases (kb) of the genome, can be an important source of genetic variation both among human populations of different ethnic groups and among individuals. Following the development of methodologies and the introduction of new research platforms [7–12], information on the nature and pattern of CNVs in representative populations has accumulated. The functional impact of CNVs has been demonstrated across the full range of biology [13], from cellular phenotypes, such as gene expression [14], to all classes of human disease with an underlying genetic basis: sporadic, Mendelian, complex, and infectious (reviewed in [15]).

Clinical geneticists need to discriminate pathogenic from benign CNVs in their patients and have made extensive use of data from CNV surveys of apparently healthy individuals [16]. The mere presence or absence of a variant in such control data sets is only partially informative, because determining the pathogenicity of inherited CNVs is at present limited by a lack of information on CNV frequencies and combinations in apparently healthy individuals. With these considerations in mind, relatively large numbers of individuals from various specific ethnic groups have recently been examined using different array platforms, such as BAC arrays [6, 17, 18], oligonucleotide arrays [19–21], and others [22]. The results are not always consistent, and it is likely that different human populations carry different inherited CNVs. The number of Japanese individuals examined to date is smaller than in studies of other ethnicities [4]. Polymorphic CNVs have received considerable attention because they might play an important role in the etiology of common diseases. Therefore, more CNV data should be accumulated from Japanese populations. In a previous paper [23], we summarized the available data by focusing on highly polymorphic CNVs (found in ≥5.0% of the individuals). However, rare variants have recently received attention from scientists who espouse a hypothesis called “common disease and rare variants,” in contrast to the “common disease-common variants” hypothesis that underpins most genome-wide association studies. In this paper, we focus on CNVs observed at relatively low frequency (<5.0% of the individuals) in a population residing in Hiroshima and Nagasaki, Japan, using CGH with BAC clones as targets.

2. Materials

The study was conducted in two stages. In Stage 1, 80 unrelated Japanese individuals were examined by BAC-aCGH using an array of 2,241 BAC clones; in Stage 2, 133 unrelated Japanese individuals were examined by BAC-aCGH using an array of 2,622 BAC clones.

2.1. Target BAC Clones

Most of the clones used in Stage 1 of this study were selected from the set of cytogenetically mapped P1 artificial chromosome (PAC) and bacterial artificial chromosome (BAC) clones reported by the BAC Resource Consortium [26] and were obtained from either the Children’s Hospital Oakland Research Institute (Oakland, CA, USA) or Invitrogen (Carlsbad, CA, USA). In Stage 2, an additional 381 BAC clones were used together with the Stage 1 clones; most of these were obtained through a collaboration with Dr. Matsumoto of Yokohama City University. In total, 2,241 clones covering chromosomes 1 to 22 were used in Stage 1 and 2,622 clones (2,241 + 381) in Stage 2. The clones were spaced at average intervals of about 1.2 Mb across all human autosomes in Stage 1 and about 1.1 Mb in Stage 2. In addition to the autosomal clones, four X-chromosomal clones were used as internal references. For Stage 1, three sets of arrays were constructed as described in a previous paper [27]. For Stage 2, all of the clones were printed onto a single glass slide.

2.2. Genomic DNA Samples

The genomic DNA samples used in this study have been described previously [23, 27]. In brief, the DNA used for the population and family studies was extracted from lymphoblastoid cell lines obtained from the offspring of atomic-bomb survivors and their family members. The cell lines were derived from a cryopreserved archive of families (father, mother, and offspring) from Hiroshima and Nagasaki for whom permanent cell lines had been established by Epstein-Barr (EB) virus transformation. Genomic DNA was isolated by conventional methods as described in detail elsewhere [28]. A total of 305 offspring were initially screened. Because some families included two or more siblings, we selected one representative offspring per family to create a group of “unrelated individuals,” thereby avoiding duplicate counting of CNVs shared within families, in accordance with the rules described in a previous report [23]. In this study, 80 individuals (40 from each city) were examined in Stage 1 and 133 (84 from Hiroshima, 49 from Nagasaki) in Stage 2. All individuals provided informed consent before the study, which was approved by the Institutional Review Board of our foundation.

3. Methods

3.1. Array Preparation

The arrays were prepared as described in a previous paper [23]. In brief, for Stage 1, cloned DNA was digested with NotI. For Stage 2, cloned DNA was digested with MseI, and the fragmented DNA was amplified by ligation-mediated PCR as described by Snijders et al. [29]. In both stages, the target DNA (0.5 μg/μL) was dissolved in 50% dimethyl sulfoxide and printed in triplicate onto glass slides (Matsunami Glass Co. Ltd.) using the Affymetrix 417 Arrayer (Affymetrix).

3.2. Screening by the Array CGH Method

The screenings in both stages were conducted following previously described procedures [23]. In brief, test and reference genomic DNA (1.25 μg each) was digested with BamHI and labeled by random priming with Cyanine-5- and Cyanine-3-labeled dUTP (Cy5- and Cy3-dUTP; PerkinElmer Life Sciences, Wellesley, MA, USA). Human Cot-1 DNA was used to block repetitive sequences in the labeled probes.

Prehybridization was conducted to block repetitive sequences in the target DNA on the arrays and to prevent nonspecific binding of probe DNA to the targets. After this initial incubation, the prehybridization solution was removed, and hybridization solution containing the Cy-labeled DNA (prepared as described above) was added. Prehybridization and hybridization were both conducted with continuous mixing overnight at 37°C. After hybridization, the arrays were washed as reported previously [27]. All of these procedures were conducted using the GeneTAC Hybridization Station (Genomic Solutions Inc., Ann Arbor, MI, USA).

After washing, fluorescent images of the hybridized arrays were obtained using a ScanArray 5000 confocal laser scanner (PerkinElmer Life Sciences). The fluorescence of each spot on the array images was quantified using ArraySuite (Scanalytics Inc., Fairfax, VA, USA) in Stage 1 and GenePix (Axon Instruments, Sunnyvale, CA, USA) in Stage 2. We then processed the data using software developed specifically to identify CNVs [27].
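Spot-level quantification of this kind can be illustrated with a minimal Python sketch. It is not the software cited in [27]; it assumes that background-corrected intensities are already available for each of the three replicate spots of a clone (the clones were printed in triplicate, Section 3.1), takes the median of the per-spot log2(test/reference) ratios, and then median-centers all clones so that a typical two-copy clone sits near 0. The function names and the median-based choices are illustrative assumptions.

```python
import math
from statistics import median


def clone_log2_ratio(test_spots, ref_spots):
    """Median log2(test/reference) over the replicate spots of one BAC clone.

    Assumes intensities are already background-corrected (hypothetical
    preprocessing); spot pairs with a non-positive value are skipped.
    """
    if len(test_spots) != len(ref_spots):
        raise ValueError("test and reference spots must be paired")
    ratios = [math.log2(t / r) for t, r in zip(test_spots, ref_spots) if t > 0 and r > 0]
    if not ratios:
        raise ValueError("no usable spots for this clone")
    return median(ratios)


def median_center(clone_ratios):
    """Shift all clone ratios so the genome-wide median is 0 (assumed normalization)."""
    center = median(clone_ratios.values())
    return {clone: r - center for clone, r in clone_ratios.items()}


# Hypothetical triplicate spots for a clone present in one copy in the test DNA.
print(clone_log2_ratio([1200, 1150, 1250], [2400, 2300, 2500]))  # -1.0
```

Taking the median rather than the mean keeps one aberrant spot in a triplicate from dragging the clone's ratio.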

3.3. Procedures of Population Studies

The population studies were conducted according to accepted procedures, as described in a previous paper [27]. In brief, two complementary (dye-swap) hybridizations were performed for each individual: (a) Cy5-labeled reference DNA mixed with Cy3-labeled test DNA, and (b) the inverse labeling, with Cy3-labeled reference DNA and Cy5-labeled test DNA. Variations in an individual’s sample that were identified consistently in both hybridizations (a) and (b) were regarded as putative CNVs. Putative CNVs detected in two or more individuals were regarded as true CNVs without further analysis. In contrast, all putative CNVs detected in only one individual were confirmed by repeated array examinations.
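The dye-swap consistency requirement and the recurrence rule can be written down as a small Python sketch. It assumes that each hybridization yields a normalized log2(Cy5/Cy3) value per clone; because the dyes are reversed between (a) and (b), a genuine copy-number change flips sign between the two readings, so reading (a) is negated to put both on a log2(test/reference) scale. The ±0.3 cutoff and all function names are hypothetical; the source does not state the calling threshold.

```python
from collections import defaultdict

CUTOFF = 0.3  # hypothetical |log2 ratio| threshold, not taken from the source


def putative_call(log2_cy5_cy3_a, log2_cy5_cy3_b, cutoff=CUTOFF):
    """Call one clone in one individual from the two dye-swapped hybridizations.

    (a) Cy5 = reference, Cy3 = test  -> log2(test/ref) = -log2(Cy5/Cy3)
    (b) Cy5 = test, Cy3 = reference  -> log2(test/ref) = +log2(Cy5/Cy3)
    A call is made only when both readings agree in direction beyond the cutoff.
    """
    a = -log2_cy5_cy3_a
    b = log2_cy5_cy3_b
    if a >= cutoff and b >= cutoff:
        return "gain"
    if a <= -cutoff and b <= -cutoff:
        return "loss"
    return None


def triage_by_recurrence(calls):
    """calls: iterable of (individual, clone, call) tuples.

    Clones with putative CNVs in >= 2 individuals are accepted; clones seen in
    only one individual are flagged for repeat array examination.
    """
    carriers = defaultdict(set)
    for individual, clone, call in calls:
        if call is not None:
            carriers[clone].add(individual)
    return {clone: "accepted" if len(ids) >= 2 else "repeat_array"
            for clone, ids in carriers.items()}
```

Requiring agreement across the dye swap suppresses dye-specific artifacts: a clone that is simply brighter in Cy5 gives a positive log2(Cy5/Cy3) in both hybridizations, which becomes opposite signs after the flip and therefore produces no call.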

3.4. The qPCR Procedure for CNVs Identified in Offspring

qPCR was performed using SYBR Premix Ex Taq (Takara-Bio, Ohtsu, Japan) and the LightCycler System (Roche Diagnostics Japan, Tokyo, Japan) according to the procedures described in a previous paper [27]. The results were analyzed with LightCycler Data Analysis software using the second derivative maximum method. The relative DNA content of each amplicon was calculated using a “normal” individual without CNVs as a control. The relative copy number at each site was normalized to the quantity of an amplified segment on a different chromosome (chromosome 16: 78888541–78888690; this position empirically showed the smallest deviation among many samples in our experiments), which served as a standard (relative copy number = 2), and is given as the mean ± SD ($n = 3$).
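The source does not give the formula used to convert crossing points into copy numbers, and the authors may have used standard curves instead. One simple approach, shown here only as a hedged Python sketch, is the comparative (ΔΔCp) method: each target is normalized to the chromosome 16 reference amplicon, compared with the CNV-free control individual, and scaled so that the reference corresponds to two copies. The sketch assumes about 100% PCR efficiency (product doubles each cycle); function names are hypothetical.

```python
from statistics import mean, stdev


def relative_copy_number(cp_target, cp_ref, cp_target_ctrl, cp_ref_ctrl):
    """Relative copy number of a target amplicon by the comparative Cp method.

    cp_*      : crossing points (cycles) for the test individual
    cp_*_ctrl : crossing points for a CNV-free control individual
    The reference amplicon is taken to have 2 copies; ~100% efficiency is
    assumed, so a one-cycle delay means half as much starting template.
    """
    delta_test = cp_target - cp_ref
    delta_ctrl = cp_target_ctrl - cp_ref_ctrl
    return 2.0 * 2.0 ** -(delta_test - delta_ctrl)


def mean_sd(replicates):
    """Mean and sample SD of relative copy number over replicate reactions."""
    values = [relative_copy_number(*rep) for rep in replicates]
    return mean(values), stdev(values)


# Hypothetical heterozygous deletion: the target lags the reference by one extra cycle.
print(relative_copy_number(26.0, 25.0, 24.0, 24.0))  # 1.0
```

With three replicate reactions per site, `mean_sd` returns the mean ± SD pair in the form reported in the text.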

3.5. Procedure of Family Studies for Confirmation of Inheritance of CNVs Identified Once in This Study

For CNVs identified in only one offspring, further examinations were carried out to confirm inheritance. Family studies of these CNVs were performed by qPCR using DNA from the mother, the father, and the siblings of the offspring, when available. The qPCR procedures were the same as described above.
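Once relative copy numbers are available for the offspring and the parents, the parent of origin can be identified by asking which parent shares the offspring's non-two-copy state. The Python sketch below rounds qPCR values to the nearest integer copy number, which is a simplifying assumption. An empty result would be consistent with a de novo event or a measurement problem and would need follow-up. All names are hypothetical.

```python
def integer_copy_number(value):
    """Round a qPCR relative copy number to the nearest integer (exact .5 ties go to even)."""
    return round(value)


def transmitting_parents(child, mother, father, baseline=2):
    """Parents whose rounded copy number matches the child's variant state.

    Returns [] if the child carries no CNV at this site, or if neither parent
    shares it (for example, a de novo event).
    """
    state = integer_copy_number(child)
    if state == baseline:
        return []
    return [name for name, value in (("mother", mother), ("father", father))
            if integer_copy_number(value) == state]


# Hypothetical deletion carried by the mother.
print(transmitting_parents(1.04, 0.96, 2.08))  # ['mother']
```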

4. Results

4.1. Population Studies

We examined 213 unrelated individuals. The main purpose of this paper is to report data on CNVs found in <5.0% of the individuals (i.e., fewer than 10 CNVs in each BAC region). The results of the population studies are shown in Tables 1 and 2: Table 1 summarizes the CNVs identified in two or more individuals, and Table 2 summarizes those identified in only one individual. A total of 126 CNVs were identified at 52 different BAC regions in the genome. The CNVs observed at 27 of the 52 BAC regions were found in only one unrelated individual, and all of them were confirmed by repeated analysis using the same array system as in the population studies. Family studies by qPCR were conducted for 23 of these 27 CNVs, and the results revealed that all 23 were inherited from one of the parents. For two of the remaining offspring, inheritance could not be confirmed because parental DNA was not available. The CNVs on the other two BAC clones (RP11-88B18 and RP11-86B6) were identified in only one unrelated individual but were also observed in individuals who were not selected as “unrelated” individuals; the numbers of individuals carrying these CNVs are given in parentheses in Table 2. For that reason, we did not conduct family studies for those CNVs.
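The counts in this paragraph fit together as the following worked breakdown of the 52 BAC regions; the 25 regions with CNVs in two or more individuals are those listed in Table 1, as stated in Section 4.4:

$$
\begin{aligned}
52 &= \underbrace{25}_{\geq 2\ \text{individuals (Table 1)}} + \underbrace{27}_{1\ \text{individual (Table 2)}},\\
27 &= \underbrace{23}_{\text{family-tested by qPCR}} + \underbrace{2}_{\text{parental DNA unavailable}} + \underbrace{2}_{\text{RP11-88B18, RP11-86B6}}.
\end{aligned}
$$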

4.2. The Typical Patterns of Three Family Studies of CNVs on BAC Clones by qPCR Analysis

As typical examples of the family studies, the qPCR results for CNVs observed in three BAC-clone regions are shown in Figure 1 (RP11-81H5), Figure 2 (RP11-90M13), and Figure 3 (RP11-90O18 and RP11-90C3). In the first case, a single 180,264-bp deletion was observed in an offspring, and the family study demonstrated that the CNV was inherited from the mother. No genes are located in this BAC-clone region. In the second case, the BAC clone RP11-90M13 contains two genes, protein phosphatase 2, regulatory subunit B, alpha (PPP2R2A) and early B-cell factor 2 (EBF2). The deletion-type CNV was confirmed by qPCR and was found to be inherited from the father. The deletion covered part of intron 7 of PPP2R2A and the promoter region of EBF2. In the last case (Figure 3), qPCR revealed that the CNV spanned two BAC clones (RP11-90O18 and RP11-90C3), and the CNV was found to be inherited from the father. This CNV contained a series of zinc finger protein genes (ZNF107, ZNF138, ZNF273, ZNF117, and ZNF92).

4.3. Comparing Our Data with Published Data

Many segmental duplications have already been catalogued in public databases, such as the Database of Genomic Variants (DGV Database; http://projects.tcag.ca/variation/), the UCSC Human Genome Browser (UCSC Database; http://genome.ucsc.edu/index.html), and NCBI Map Viewer (NCBI Database; http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi). Known CNVs are also reported in the same databases. However, when CNVs reported in the databases or in papers were too small to have been detected by our BAC-aCGH method, we assumed that they were different from our CNVs. The CNV data obtained in our studies, including whether each CNV had already been reported in these databases, are summarized in Tables 1 and 2.
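The matching rule described above, in which a database CNV counts as the same event only if it overlaps the BAC clone and is large enough to have been detectable by BAC-aCGH, can be sketched in Python as follows. Coordinates are treated as 1-based and inclusive, and the minimum detectable size is a hypothetical parameter; the source does not state the value used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # inclusive

    @property
    def length(self):
        return self.end - self.start + 1

    def overlaps(self, other):
        return (self.chrom == other.chrom
                and self.start <= other.end
                and other.start <= self.end)


def matching_known_cnvs(bac, known_cnvs, min_detectable_bp):
    """Known CNVs that overlap the BAC clone and reach the detection limit.

    Overlapping CNVs shorter than min_detectable_bp (a hypothetical value)
    are treated as different from the BAC-aCGH call.
    """
    return [cnv for cnv in known_cnvs
            if cnv.overlaps(bac) and cnv.length >= min_detectable_bp]


# Hypothetical clone and database entries.
clone = Interval("chr1", 1_000_000, 1_180_000)
known = [Interval("chr1", 1_100_000, 1_250_000),  # large and overlapping: kept
         Interval("chr1", 1_050_000, 1_055_000),  # overlapping but too small: ignored
         Interval("chr2", 1_000_000, 1_200_000)]  # different chromosome
print(matching_known_cnvs(clone, known, min_detectable_bp=40_000))
```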

4.4. Comparing Our Data with the Data of Japanese Published by Other Groups

Compared with Caucasian populations, CNVs in Japanese populations have been screened systematically in only a few studies. We compared our data with the data on Japanese individuals from the HapMap project reported by Redon et al. [4], in which CNVs were examined using a tiling BAC array. In addition to the BAC array data, three groups, including Redon et al. [4], reported data on Japanese populations obtained with oligonucleotide arrays, although each group used a different array platform [24, 25]. These results are also summarized in Tables 1 and 2. Among the BAC regions with CNVs identified in two or more individuals in our study (Table 1), CNVs in 16 of 25 regions had been reported by the other groups. Among the BAC regions with CNVs identified in only one individual (Table 2), CNVs in nine of 27 regions had been reported by the others, although we emphasize again that the CNVs identified in our BAC regions are not necessarily identical to those reported by the others. Thus, the majority of the CNVs identified in our study were not identified in the other reports and can be considered novel CNVs in the general Japanese population.
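Combining the two table-wise comparisons gives the overall number of BAC regions whose CNVs were not reported in the other Japanese data sets:

$$
\underbrace{52}_{\text{all regions}} - \underbrace{16}_{\text{reported, Table 1}} - \underbrace{9}_{\text{reported, Table 2}} = 27, \qquad \frac{27}{52} \approx 0.52 .
$$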

4.5. Genes and Disease-Related Genes

We summarized some of the genes, including disease-related genes, that overlap the BAC-clone regions of our CNVs (Table 3). Many mRNAs in these regions have also been reported in the databases, but they are too numerous to list here. All BAC-clone regions contained at least one mRNA, although the functions of most of these mRNAs are not yet known (data not shown).

5. Discussion

We examined 213 unrelated Japanese individuals using BAC-aCGH and found a total of 126 CNVs at 52 different BAC regions in the genome. The CNVs observed at 27 of the 52 BAC regions were found in only one unrelated individual. Family studies were conducted for 23 of these CNVs, and the results demonstrated that all 23 were inherited from one of the parents. About 27% of the CNVs observed in our study (34 of 126; Tables 1 and 2) involved regions that had not been reported previously in other studies listed in the databases; thus, they are considered novel CNVs. In contrast, among the highly polymorphic CNVs [23], only 8% (55 of 680) had not been reported. Thus, a significantly larger proportion of novel CNVs was found among the less frequent CNVs ($\chi^2 = 36.64$, $P = 5.10 \times 10^{-10}$). This finding suggests that the structural rearrangements that created the less frequent CNVs may be evolutionarily recent. We therefore examined differences between the CNVs identified in Hiroshima and those identified in Nagasaki. Of the 25 BAC regions with CNVs observed in multiple individuals, CNVs in 12 were found in only one of the two cities. These CNVs may be so-called “private polymorphisms” that arose more recently than the others. On the other hand, only a small fraction (21 of 126; about 17%) of the CNVs in this report were found on BAC clones that overlap segmental duplications (Tables 1 and 2), whereas a majority (about 63%) of the highly polymorphic CNVs were observed on BAC clones overlapping segmental duplications (see Table 1 in [23]). This result suggests that segmental duplication might play a significant role in the creation of highly polymorphic CNVs. These observations are supported by previous data from Sharp et al. [17], who reported that CNVs are shared among several populations, implying that the specific genomic imbalances either predated the migration of modern humans from Africa or arose independently in different populations.
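The comparison of novelty rates is a 2 × 2 test: novel vs. previously reported CNVs, in the less frequent set of this study (34 novel of 126) vs. the highly polymorphic set of [23] (55 novel of 680). A minimal Python sketch of Pearson's chi-square test with one degree of freedom is shown below, using only these counts; the choice of test is an assumption, since the source does not name it. Recomputed this way, the uncorrected statistic is about 38.64 with P ≈ 5.1 × 10⁻¹⁰, and the Yates-corrected statistic is about 36.74; the reported P value corresponds to the uncorrected statistic.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]] (1 df).

    Returns (statistic, upper-tail P value). With yates=True, a continuity
    correction of 0.5 is subtracted from each |observed - expected|.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff * diff / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: less frequent CNVs (34 novel of 126) vs. highly polymorphic CNVs (55 novel of 680).
stat, p = chi_square_2x2(34, 126 - 34, 55, 680 - 55)
print(f"chi2 = {stat:.2f}, P = {p:.2e}")
```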

Data on Japanese populations have also been reported by three other groups, and we compared our CNV data with those reports [4, 24, 25]. The CNVs found in 27 of the 52 BAC regions (about 52%) were not identified by the other groups. One possible reason is that these CNVs are private polymorphisms, as mentioned above, and therefore were not present in the Japanese populations examined in the other studies.

In humans, it is estimated that around 23% of currently known CNVs reside in known or putative genes. Many CNVs that lie outside genic regions may still have a significant influence on gene expression by affecting gene regulatory elements [30]. Another important area of exploration is linkage disequilibrium (LD; nonrandom association between alleles at different loci) between CNVs and other genomic variants, especially SNPs, which may provide important insight into the genomic evolution of CNVs (e.g., their recombination patterns) [31, 32]. However, studies of LD between CNVs and SNPs are hindered by the fact that CNVs often reside in genomic regions associated with segmental duplications and/or repeat-rich regions, which are difficult to sequence and in which SNPs are difficult to detect.

Recent association studies have attempted to link CNVs with expression profiles, diseases, and known phenotypic differences. However, the distribution of CNVs differs depending on genetic background, and population-specific CNVs might account for the divergence of some physiological traits and of disease prevalence among populations [31]. Like SNPs, CNVs vary among populations; therefore, comprehensive CNV information is needed when CNVs are used as genetic markers in disease or trait studies.

For these reasons, we planned to accumulate CNV data from Japanese individuals and to provide information on the relationship between these CNVs and phenotypic differences associated with reported diseases. In our previous paper [23], we summarized such data by focusing on highly polymorphic CNVs found in ≥5.0% of the individuals.

More recent studies using advanced molecular technologies, such as genome-wide association studies [33, 34] and next-generation sequencing [35, 36], have reported that many genes appear to play important roles in the etiology of common diseases. With these newly developed technologies, rare variants have recently received attention from scientists who espouse a hypothesis called “common disease and rare variants.” In this paper, we summarized the CNVs that were identified in fewer than 10 individuals in our previous population study. We focused on genes reported in the DGV, UCSC, and NCBI databases (Table 3), although many mRNAs are also listed in them. Because these genes are candidate markers for examining the etiology of common diseases and phenotypic heterogeneity among individuals, our CNVs may become useful markers in future studies.

6. Conclusion

In this study, 34 CNVs were new, indicating that CNV coverage of the human genome is still incomplete and that CNVs differ between Japanese and other ethnic populations. Moreover, the CNVs found in about half (27 of 52) of the BAC regions examined in our study were not identified in the studies in which three other groups examined Japanese populations using different array platforms. The newly identified CNVs extend the CNV coverage of the human genome, and of the general Japanese population in particular. Some of these CNVs contain genes that might be related to phenotypic heterogeneity among individuals. We expect that these CNVs can now be taken into account when genetic studies, such as CNV association studies, are conducted.

Authors’ Contribution

Y. Satoh and N. Takahashi contributed equally to this work as first authors.

Acknowledgments

The authors thank H. Omine, J. Kaneko, A. Miura, M. Imanaka, and E. Nishikori for their technical assistance. They are grateful to E. B. Douple for critical reading of the paper. This publication was supported by Research Protocols RP 1-01 and RP 2-07 of the Radiation Effects Research Foundation (RERF) and in part by Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology and the Japan Science and Technology Agency (Core Research for Evolutional Science and Technology). RERF, Hiroshima and Nagasaki, Japan, is a private nonprofit foundation funded by the Japanese Ministry of Health, Labour and Welfare and the U.S. Department of Energy, the latter in part through the National Academy of Sciences.