Abstract

The Death Valley at Žerjav in northern Slovenia exhibits a gradient of heavy metal pollution in the soil with severe consequences for species richness and composition along this gradient. Recently, a progressive loss of large-genome species in parallel with increasing concentrations of heavy metals has been shown. Here, we have measured the genome size of a near-complete sample of these species with flow cytometry and analysed the correlation of heavy metal pollution with the C- and Cx-values assigned to the test plots. Probabilities were calculated directly from the hypergeometric distribution. We confirm, on a different methodological basis than previously, that along the pollution gradient, species with high C- and Cx-values are increasingly underrepresented. This lends support to the “large genome constraint hypothesis”, predicting that plants with large genomes are at a disadvantage under all aspects of evolution, ecology, and phenotype, because junk DNA imposes a load on the organism.

1. Introduction

The molecular mechanisms that lead to the more than 2000-fold variation in the amount of DNA in higher plants' genomes (Zonneveld [1]), are no longer as mysterious as they were when the term “C-value paradox” was first used (Thomas [2]). The biological significance of this variation, however, is still debated. Several hypotheses are available, of which the Nucleotype Hypothesis has been the most fruitful since its formulation by Bennett [3, 4]. It emphasizes the importance of the physicomechanical properties of the cell nucleus (the nucleotype), in addition to the genotype and the environment, for niche occupation, adaptation, and competitiveness of an organism, in both plants and animals (Gregory [5]). Up to the present, the only well-studied nucleotypic parameters are the C-value and the Cx-value. C-value or holoploid genome size (Greilhuber et al. [6]), that is, the DNA amount contained in the chromosome complement of an organism, is directly and positively correlated with cell size (Bennett [4], Knight and Beaulieu [7]). Cx-value or monoploid genome size, the DNA amount contained in the single basic chromosome set with chromosome number x, has a strong positive influence on cell cycle duration (Bennett [3, 4], Francis et al. [8]). This is clearly a simplification inasmuch as cell and nuclear size vary according to cell type and physiological status (Bennett and Rees [9]). Cell cycle duration is influenced by genetic controls (Francis [10]) and environment, notably temperature (Verma and Lin [11], Verma [12]). Moreover, the distinction between C- and Cx-value is blurred in the case of (palaeo)polyploids with diploidized genomes. Nevertheless, the correlation C-value/cell size/nuclear size holds closely within certain cell types, for example, meristematic and embryonal cells (Greilhuber [13]) and stomata (Knight and Beaulieu [7]), while the correlation Cx-value/cell cycle time has been proven repeatedly (Bennett [3], Verma and Lin [11]). 
It is noteworthy that (neo)polyploids do not show longer cell cycle duration compared to diploids (Verma and Lin [11]), and sometimes it is even slightly shorter (Bennett [3]). Since the Cx-value of an organism together with the ploidy level determines the C-value, cell cycle duration is also correlated with C-values in global analyses (Francis et al. [8]). Several lines of evidence lend support to the opinion that large genomes are a burden to organisms and restrict their adaptability. This concept was recently forwarded as the “Large Genome Constraint Hypothesis” (Knight et al. [14]). It would thus be predicted that under stressful environments species with too large a genome increasingly go extinct, either locally or totally. However, the species and genome size spectrum along stress gradients in natural environments has rarely been investigated.

Here, we return to a recent study, in which a number of species and their genome sizes were studied with regard to a gradient of heavy metal pollution along which they grew (Vidic et al. [15]). The locality studied is the “Death Valley”, Dolina Smrti, at Žerjav in northern Slovenia. The heavy metal pollution of the soil stems from a former lead smelter and reaches 33.3 g/kg soil at a plot on which plants are still growing. Heavy metals and plant community composition had been analyzed in five test plots (A–E) at distances of 330 m (E), 420 m (D), 520 m (C), 670 m (B), and 2000 m (A) from the smelter chimney, the last site being the control plot. Diploid herbaceous species had been recorded and collected, and their genome size measured with Feulgen DNA image densitometry. The number of species occurring at the test plots was shown to decrease with increasing pollution, as was the number of species with large genomes (arbitrarily defined as those of the upper quartile of the C-values in the total sample of 70 species). Using a simulation method, it was shown that the probability of finding such low frequencies of large genomes by chance decreases with increasing lead pollution. The results were interpreted as being supportive of the Large Genome Constraint Hypothesis (Knight et al. [14]).

The intention behind undertaking a new study on this system is to corroborate the conclusions from the Žerjav study using a technical approach now in much wider use than Feulgen densitometry, that is, flow cytometry using propidium iodide as the DNA stain. Flow cytometry is not only faster, but also more precise thanks to the high number of nuclei measured and other characteristics of the technique (Greilhuber et al. [16]). There was, furthermore, one genome size, that of Anemone nemorosa, that was unsupported by literature data [17] and that had to be clarified and possibly corrected. Most of the species measured were not collected in the Death Valley, but this is unproblematic for the present purpose, because genome sizes are fairly stable within a narrow taxon (species at a ploidy level).

To calculate the chance probability of finding the given frequency of large-genome species at the test plots, we use the hypergeometric distribution instead of a lottery simulation. In the Žerjav study by Vidic et al. [15], the probability of finding the observed frequency of large genomes solely by statistical fluctuation was determined by a lottery simulation. This probability can, however, be calculated from a formula, and the distribution of these probabilities is the hypergeometric distribution. Direct calculation is preferable to a lottery simulation because the result is exact in principle, whereas a simulation only approximates the calculated value and a repeat will most likely not give the same result. To avoid numerical problems, established statistics software should be used to calculate the hypergeometric distribution.
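The point about repeatability can be illustrated with a minimal sketch of a lottery simulation. This is not the procedure of Vidic et al. [15]; the function name and the numbers (70 species in total, 17 of them large, a plot of 20 species with at most 1 large genome) are assumptions chosen only for illustration.

```python
import random

def simulate_cumulative(N, L, K, i, draws, seed):
    """Estimate P(at most i large-genome species among K drawn from N) by random draws."""
    rng = random.Random(seed)
    # True marks a large-genome species; L of the N species are large.
    pool = [True] * L + [False] * (N - L)
    hits = 0
    for _ in range(draws):
        plot = rng.sample(pool, K)   # draw K species without replacement
        if sum(plot) <= i:           # number of large genomes in this draw
            hits += 1
    return hits / draws

# Hypothetical numbers, not taken from the study.
N, L, K, i = 70, 17, 20, 1
est_a = simulate_cumulative(N, L, K, i, draws=2000, seed=1)
est_b = simulate_cumulative(N, L, K, i, draws=2000, seed=2)
print(est_a, est_b)  # two runs of the same simulation, two estimates
```

Each run returns an estimate whose scatter shrinks only slowly as the number of draws grows, which is the practical argument for calculating the probability directly.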

2. Materials and Methods

2.1. Plant Material

Living plant species as listed in Vidic et al. [15] were collected in Austria, Croatia, and Slovenia (Table 1). Whole plants were wrapped with wet tissue and stored in a plastic bag at the collection sites, and usually kept in the refrigerator for up to one week until investigation with flow cytometry. For Euphorbia amygdaloides whole dry fruits were collected. Samples exhibiting potentially problematic compounds (e.g., mucous polysaccharides) were incubated for up to one month for a starving period in the refrigerator, after which time young etiolated leaves or shoots were used for flow cytometry with better success. Herbarium specimens are deposited in WU. Identification of the taxa is based on Fischer et al. [18]. These accessions cover 60 species from 51 genera and 22 families. The C-value of Orobanche alba is cited from Weiss-Schneeweiss et al. [19] and those of Knautia drymeia and Scabiosa columbaria from Temsch and Greilhuber [20] (Table 2). Altogether, 63 species from 54 genera and 24 families are included in this paper.

2.2. Flow Cytometry (FCM)

Following the chopping method of Galbraith et al. [21], about 25 mg fresh leaves (dry fruits in case of Euphorbia amygdaloides) from each plant sample were cochopped in Otto's buffer I (Otto et al. [22]) together with Pisum sativum ( pg DNA; Greilhuber and Ebert [23]), Zea mays (1C = 2.73 pg DNA; Doležel et al. [24]), Secale cereale (1C = 7.79 pg DNA; Doležel et al. [24]), or Solanum pseudocapsicum (1C = 1.29 pg DNA; Temsch et al. [25]) as the internal standard organisms. The resulting isolated nuclei were filtered through a 30 µm mesh and subsequently incubated with RNase A for 30 minutes at for digestion of double-stranded RNA. The nuclei were then stained in Otto's buffer II (Otto et al. [22]) containing the fluorochrome propidium iodide (PI, 50 µg/ml) for 1 hour or overnight in the refrigerator. For measurement, a CyFlow ML flow cytometer or a PAII (both Partec, Muenster, Germany) was used. The light source was a green laser (532 nm, Cobolt Samba, Cobolt AB, Stockholm, Sweden) for the CyFlow and a mercury lamp for the PAII. One preparation was made per individual and from this at least three measurement runs were performed, with 5000 measured particles per run. Usually, the coefficient of variation (CV) was less than 3%, but whenever higher CVs occurred, one or two more runs were added. For each run, the 1C-value was calculated according to the formula: (mean fluorescence intensity of the sample organism’s G1 nuclei population / mean fluorescence intensity of the standard’s G1 nuclei population) × 1C-value of the standard organism. The resulting sample values are shown in Table 2.
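The per-run formula can be written out as a short sketch. The fluorescence values below are invented for illustration, the function name is hypothetical, and averaging the runs of one preparation is an assumption, since the text does not state how runs were combined. The standard value of 2.73 pg is the one given above for Zea mays.

```python
from statistics import mean

def one_c_value(sample_g1_mean, standard_g1_mean, standard_1c_pg):
    """1C-value of the sample (pg) from the G1 peak ratio and the standard's 1C-value."""
    return sample_g1_mean / standard_g1_mean * standard_1c_pg

ZEA_MAYS_1C_PG = 2.73  # internal standard value quoted in the text

# Hypothetical G1 peak means (sample, standard) for three runs of one preparation.
runs = [(412.0, 205.0), (409.5, 204.1), (415.2, 206.3)]
per_run = [one_c_value(s, st, ZEA_MAYS_1C_PG) for s, st in runs]
print([round(v, 3) for v in per_run], round(mean(per_run), 3))
```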

2.3. Chromosome Counts

Whenever the ploidy level of a sample had to be clarified, which was the case in Campanula rotundifolia, Fragaria vesca, Hieracium murorum, Knautia drymeia, Lathyrus pratensis, Origanum vulgare, Petrorhagia saxifraga, Plantago major, Plantago media, Veronica officinalis, and Viola hirta, chromosome counts were done on slides made according to protocols for Feulgen densitometric analysis (Greilhuber and Temsch [26]). Proliferating parts of these species were fixed in methanol/acetic acid (3 : 1) overnight at , or in formaldehyde (FA, 4% in Sörensen buffer) for two hours followed by 3 : 1 fixation, and transferred to 96% ethanol for further storage. Feulgen stained mitotic configurations were analyzed for chromosome number (Table 2).

2.4. Data Analysis

Vidic et al. [15] described a decrease in species having large nuclear characters with decreasing distance of the sample plots from a source of heavy metal pollution. Large nuclear characters are those in the upper quartile of C-values, Cx-values, chromosome number, and mean chromosomal DNA content. In Vidic et al. [15] a randomization test was used to determine, for each plot, the probability of finding a number of large nuclear characters equal to or less than that observed. The probability can, however, be calculated directly. The number of distinct samples of size $K$ out of a lot of size $N$ is given by the binomial coefficient $\binom{N}{K}$. (A number lottery is a common example: there are $\binom{45}{6} = 8\,145\,060$ distinct patterns of 6 numbers out of 45.) If we only consider large nuclear characters, there are $\binom{L}{i}$ distinct patterns of size $i$ for $L$ large characters in the entire lot. The number of samples of size $K$ with exactly $i$ large characters is given by $\binom{L}{i}\binom{N-L}{K-i}$, since each of the large character patterns can be combined with all possibilities to select $K-i$ small characters out of the $N-L$ in the lot. The probability of finding exactly $i$ large genomes in a sample of size $K$ is $P(i) = \binom{L}{i}\binom{N-L}{K-i} / \binom{N}{K}$. This is the hypergeometric probability distribution. The probability of finding $i$ or fewer large genomes (cumulative probability), as described in Vidic et al. [15], is $P(\le i) = \sum_{j=0}^{i} P(j)$. Obviously $P(\le L) = 1$, which simply means that the number of large genomes in the sample is less than or equal to $L$, the number of large genomes in the lot.
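The calculation described above can be sketched in a few lines using exact integer binomial coefficients. The function names are illustrative and the example values of N, L, and K are assumptions; the sketch only shows the counting argument, not the SAS procedure used for the results.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(i, N, L, K):
    """P(exactly i large genomes) in a sample of K species from a lot of N with L large."""
    return Fraction(comb(L, i) * comb(N - L, K - i), comb(N, K))

def hypergeom_cdf(i, N, L, K):
    """P(i or fewer large genomes), the cumulative probability."""
    return sum(hypergeom_pmf(j, N, L, K) for j in range(i + 1))

print(comb(45, 6))  # number of distinct patterns of 6 numbers out of 45

# Hypothetical plot: 20 species out of 70, of which 17 count as large.
N, L, K = 70, 17, 20
print(float(hypergeom_cdf(1, N, L, K)))  # chance of finding at most 1 large genome
```

Because `Fraction` keeps the result as an exact ratio of integers, rounding occurs only at the final conversion to a floating-point number.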

Binomial coefficients are composed of the factorials of the occurring numbers. These can be very large numbers, which are likely to exceed the range of representable numbers in a computer system, or at least a substantial loss of precision has to be expected. It is therefore strongly recommended to use established statistics software to calculate such probability distributions. Here we used SAS 9.1 (SAS [27]). As an alternative to commercial software, we made trial runs based on prime factor decomposition of the factorials to circumvent the problems with large numbers. These gave the same results as SAS. The agreement of our data with those of Vidic et al. [15] was visualized by a Bland-Altman plot (Figure 1).
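How a prime factor decomposition avoids large intermediate factorials can be sketched as follows. This is an illustration of the general idea using Legendre's formula for the exponent of a prime in n!, not the authors' actual program; all function names and the values of N, L, and K are assumptions.

```python
from math import comb

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def factorial_exponent(n, p):
    """Exponent of the prime p in n! (Legendre's formula)."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e

def binomial_exponents(n, k):
    """Prime exponents of C(n, k) = n! / (k! (n-k)!), assuming 0 <= k <= n."""
    return {p: factorial_exponent(n, p) - factorial_exponent(k, p)
               - factorial_exponent(n - k, p) for p in primes_up_to(n)}

def binomial_from_primes(n, k):
    """Rebuild C(n, k) from its prime exponents."""
    result = 1
    for p, e in binomial_exponents(n, k).items():
        result *= p ** e
    return result

def hypergeom_pmf_primes(i, N, L, K):
    """P(exactly i large genomes): factors common to numerator and denominator cancel."""
    if i < 0 or i > L or K - i < 0 or K - i > N - L:
        return 0.0
    exps = {}
    for sign, n, k in ((1, L, i), (1, N - L, K - i), (-1, N, K)):
        for p, e in binomial_exponents(n, k).items():
            exps[p] = exps.get(p, 0) + sign * e
    num, den = 1, 1
    for p, e in exps.items():
        if e > 0:
            num *= p ** e
        elif e < 0:
            den *= p ** (-e)
    return num / den

# Cross-check against Python's exact integer binomials (hypothetical N, L, K).
N, L, K = 70, 17, 20
reference = [comb(L, i) * comb(N - L, K - i) / comb(N, K) for i in range(L + 1)]
computed = [hypergeom_pmf_primes(i, N, L, K) for i in range(L + 1)]
print(max(abs(a - b) for a, b in zip(reference, computed)))
```

Only the exponents of small primes are added and subtracted, so no factorial is ever evaluated in full. In a language with fixed-width numbers the remaining product would still need care, which Python's arbitrary-precision integers hide here.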

3. Results

3.1. Data

Table 2 contains data from 63 species, of which 60 are original and have been measured with flow cytometry and 3 further were taken from published sources (Temsch and Greilhuber [20], Weiss-Schneeweiss et al. [19]). These are 90% of the species investigated by Vidic et al. [15]. Chromosomes were counted in nine species in which more than one ploidy level occurs and a deviation between our data and those of Vidic et al. [15] was recognized. In three taxa, Lathyrus pratensis, Petrorhagia saxifraga, and Viola hirta, the ploidy level recognized by chromosome counts differs from that given in Vidic et al. [15] (Table 2). In these cases, we corrected our C-values to the ploidy level given for the Žerjav plants in the further calculations, assuming that the actual C-value of these plants closely approaches this estimate.

A comparison of the data of Vidic et al. [15] and those of the present study shows, besides a general agreement in many points, some striking deviations from the expectation. There is in the first instance Anemone nemorosa, of which the low 1C-value of 5.5 pg given in Vidic et al. [15] could not be confirmed with material from Žerjav and can only be a mismeasurement, because it is well below the 1C-value of diploids, which are expected to have about 10 pg (Zonneveld in Bennett and Leitch [17]). We also measured material from the Botanical Garden in Vienna (HBV) and from one accession in Lower Austria with congruent results (19.48 pg), very similar to the C-value of Zonneveld et al. [28] (  pg). Anemone nemorosa was therefore excluded from some statistical analyses. Another case of deviation is Knautia drymeia. Vidic et al. [15] give  pg at (which is possibly an error and should read ), while we measure  pg at (Temsch and Greilhuber [20]).

The variation within the 1C-values ranged 114.6-fold from 0.17 pg in Aruncus dioicus to 19.48 pg in the tetraploid Anemone nemorosa. 1Cx-values varied 98.3-fold from 0.17 pg in Aruncus dioicus to 16.71 pg in Hepatica nobilis.

3.2. Comparison of the Data Sets

Among the 62 species for which agreement was established in regard to ploidy level, 38% agree within +/−0 to 5%, 24% within +/−5 to 10%, 19% within +/−10 to 20%, and 19% are more strongly deviating. The average ratio (present study/Vidic et al. [15]) over these species is 0.97. 14 C-values in Vidic et al. [15] are lower than the present ones. The correlation (C-values corrected for ploidy and without A. nemorosa) yielded a correlation coefficient .
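The agreement classes and the average ratio can be computed along the following lines. The sketch assumes that the deviation is expressed relative to the earlier value and that class boundaries are inclusive at the upper end; neither is stated in the text, and the 1C-values are invented for illustration.

```python
def agreement_summary(present, earlier):
    """Return (counts per deviation class, mean ratio present/earlier)."""
    counts = {"0-5%": 0, "5-10%": 0, "10-20%": 0, ">20%": 0}
    ratios = []
    for new, old in zip(present, earlier):
        ratio = new / old
        ratios.append(ratio)
        deviation = abs(ratio - 1.0) * 100.0   # percent deviation from the earlier value
        if deviation <= 5.0:
            counts["0-5%"] += 1
        elif deviation <= 10.0:
            counts["5-10%"] += 1
        elif deviation <= 20.0:
            counts["10-20%"] += 1
        else:
            counts[">20%"] += 1
    return counts, sum(ratios) / len(ratios)

# Hypothetical 1C-values (pg) for four species in the two studies.
present = [1.02, 3.24, 0.85, 9.60]
earlier = [1.00, 3.00, 1.00, 6.00]
counts, mean_ratio = agreement_summary(present, earlier)
print(counts, round(mean_ratio, 4))
```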

3.3. Probability Analysis

Vidic et al. [15] addressed the question of whether the degree of heavy metal pollution along a gradient has an influence on the C-value and the Cx-value spectra of the species living on five test plots. Since the number of species decreases with increasing pollution, the crucial point is whether the observed shift of the C-value composition towards lower C-values is stronger than expected under the condition of randomness. Vidic et al. [15] found a significant decrease of species with large genomes along increasing pollution using a lottery simulation technique. The present approach relies on the direct calculation of the probabilities of the occurrence of large nuclear characters.

3.3.1. Our Data Set Analysed Using Hypergeometric Distribution

There are minor differences between the values plotted in Vidic et al. [15, Figure (c)] and the results that we recalculated applying our method to the data of Vidic et al. [15]. We calculated the probability from the composition of the entire set (hypergeometric distribution), whereas Vidic et al. [15] estimated the probability from randomly drawn samples. Their method is, strictly speaking, an estimate, which would explain the deviation. It is unlikely that Vidic et al. [15] would obtain exactly the same results if they repeated their method. In our calculation the probabilities for having found a random genome size spectrum decrease with increasing lead concentration at the plots. At the highest lead concentration, the probability level of 4.1% is reached with the C- and the Cx-value spectrum.

Using the limits for large genomes (upper quartile) from Vidic et al. [15] the result appears slightly more significant. This probability should not be misunderstood as significance of a statistical test. Rather it is a characterization of the detrimental effect of pollution on large genomes: a low probability makes it unlikely that pollution has no effect on the phytocoenosis at this plot. The species and data set of Vidic et al. [15] analysed with hypergeometric distribution result in similar Figures (Figures 2(a) and 2(b)) as in [15, Figure ], reaching 11.3% probability for the species composition at plot E [15] for 1Cx-values and 1.3% for 1C-values.

Regarding the frequency of large genomes, there are slight differences between the limits set by us and by Vidic et al. [15] because large genomes are defined as the upper quartile of the genome size distribution of the respective data set. In our data set, large genomes are those with  pg and  pg. The progressive elimination of large genomes (C- and Cx-values) with increasing heavy metal pollution and decreasing distance to the lead smelter is clear with both data sets (Figures 3(a)–3(c), Table 2), but much more convincing with the present 1Cx data. The deficit of large 1Cx-values at plot A in the data of Vidic et al. [15] seems to result from wrong ploidy levels in some species. The frequency of polyploid genomes also decreases but with less regularity (Figures 4(a) and 4(b)).
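Why the limit depends on the data set can be seen from a small sketch of the upper-quartile definition. The genome sizes are invented, the quartile convention ("inclusive" interpolation) and the use of "at or above the limit" are assumptions, since the text specifies neither, and the function names are hypothetical.

```python
from statistics import quantiles

def large_genome_cutoff(values):
    """Third quartile of the genome sizes; values at or above it count as large here."""
    return quantiles(values, n=4, method="inclusive")[2]

def percent_large(plot_values, cutoff):
    """Percentage of species at a plot whose genome size reaches the cutoff."""
    if not plot_values:
        return 0.0
    return 100.0 * sum(v >= cutoff for v in plot_values) / len(plot_values)

# Hypothetical 1C-values (pg) for the whole species set and for one polluted plot.
all_species = [0.5, 0.8, 1.1, 1.6, 2.4, 3.9, 6.2, 9.5, 16.0]
polluted_plot = [0.5, 0.8, 1.1, 2.4]
cutoff = large_genome_cutoff(all_species)
print(cutoff, percent_large(all_species, cutoff), percent_large(polluted_plot, cutoff))
```

Replacing a single value in `all_species` can shift the cutoff, which is why two sets of measurements of the same species yield slightly different limits.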

The correlations between heavy metal pollution at test plots on the one hand, and % species with large genomes ( , ) and the probabilities of hypergeometric distribution ( , ), on the other hand, are negative and statistically significant.

4. Discussion

Vidic et al. [15] demonstrated a strong negative correlation between heavy metal content of the soil and the proportion of plant species with large genomes along a gradient of pollution caused by the emissions of a lead smelter. The conclusion was that genome size is associated with differential survival of species in this extreme environment and that species with large genomes are at a disadvantage.

The present paper is in some respects a control investigation for the Žerjav study by Vidic et al. [15] using a different methodology for genome size measurement, that is, more precise flow cytometry instead of the error-susceptible Feulgen densitometry, and for calculation of probabilities for large genome frequencies, that is, a hypergeometric distribution method instead of a simulation method. Most of our material did not originate from Žerjav. However, the results are representative for the Žerjav locality because genome sizes are generally very stable within a species.

Our results confirm the findings of Vidic et al. [15] and exhibit in some regard even clearer trends. The species characterized by large genomes (i.e., upper quartile of the genome sizes) out of the complete species spectrum are increasingly removed in parallel with heavy metal pollution until none of these species is left in the most polluted site. This clearer trend, compared to Vidic et al. [15], may be the consequence of removing a few possible errors from the Žerjav data set (mainly correction of a mismeasurement in Anemone nemorosa and possible misidentification of ploidy level in Knautia drymeia), and a general improvement of the genome size data using a more precise measurement method. Notably, Anemone nemorosa ranks highest in genome size and occurs only at the control site (plot A), while it is only fifth-highest in the data set of Vidic et al. [15].

Vidic et al. [15] considered only C-values, while we also consider Cx-values. With these, we find the same trends as with C-values inasmuch as the frequency of large monoploid genomes and the hypergeometric probability decrease continuously from plot A to plot E. The frequency of polyploid genomes also decreases overall, but it is highest at plot C, so that this trend is not significant. The observed trends, therefore, cannot be attributed either to the C-value or to the Cx-value alone.

The positive correlation between radiosensitivity and genome size in plants described by Sparrow and Miksche [29] seems to be most relevant for the present results, because direct genotoxic effects are involved in both studies. A high frequency of chromosome aberrations has been observed by Druškovič [30] in plants from the Death Valley, Dolina Smrti, at Žerjav. It seems plausible to us that DNA damage with its various consequences for cell division and cell function is the primary cause for the exclusion of large genomes at the more heavily polluted plots. Once a nucleus is hit, the damage is realized irrespective of whether the nucleus is big or small, but large nuclei, and especially mitotically and meiotically active ones, are a larger target for damage and receive more hits.

Apart from chemical stress, natural stress such as extreme temperatures and low precipitation also seems to favour plants with small genomes rather than large ones in certain floras, whereas the species with the largest genomes occur preferably in mesic areas (Knight and Ackerly [31]). This and other evidence led Knight et al. [14] to propose the Large Genome Constraint Hypothesis, which predicts that “plants with large genomes are at a disadvantage under all aspects of evolution, ecology and phenotype because large genomes are inflated with unnecessary junk DNA whose replication and maintenance imposes a load to the organism” (Knight et al. [14]).

In line with this hypothesis is the observation of Vinogradov [32] that red-list species more often have large genomes than non-endangered species. Gruner et al. [33] report slower root growth in species with large genomes, which may be one risk factor for endangered plant species. The analyses by Bennett et al. [34] in weeds and nonweeds showed that weeds, and especially the most aggressive weeds, have smaller genomes than nonweeds, which indicates that competitive stress puts plants with larger genomes at a disadvantage.

Šmarda et al. [35] investigated differential seedling survival in a tetraploid Festuca pallens population with up to 1.189-fold genome size variation at constant chromosome number of 2n = 28. In this case, during development of the plants those with the lowest and the highest C-values were eliminated under conditions of intraspecific competition for resources. This case of stabilizing selection could be explained by a disadvantage for karyotypes with too many deletions or duplications and is of a different character than the selection phenomena observed under chemical or ecological stress.

Acknowledgments

The authors firstly thank Marina Dermastia and Tatjana Vidic for kindly supporting this investigation with plant material (Anemone nemorosa) and for valuable discussions and Jasna Dolenc Koce (Ljubljana) and Markus Koch (Heidelberg) for seeds of Thlaspi praecox. The authors also thank Friedrich Ehrendorfer, Irmgard Greilhuber, Josef Greimler, Walter Gutermann, Elvira Hörandl, Christiane König, Harald Niklfeld, and Hermann Voglmayr for providing material and for helping with the identification, and Martina Mittlböck and Harald Heinzel for the valuable discussion on data analysis.