International Journal of Agronomy
Volume 2011 (2011), Article ID 697879, 7 pages
Research Article

Principal Component and Cluster Analysis as a Tool in the Assessment of Tomato Hybrids and Cultivars

1National Agricultural Research Foundation, 570 01 Thermi, Thessaloniki, Greece
2Genetics and Plant Breeding Department of AUTH, 541 24 Thessaloniki, Greece

Received 11 March 2011; Revised 7 June 2011; Accepted 16 June 2011

Academic Editor: Ravindra N. Chibbar

Copyright © 2011 G. Evgenidis et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Determination of germplasm diversity and genetic relationships among breeding materials is an invaluable aid in crop improvement strategies. This study assessed the breeding value of tomato source material. Two commercial hybrids, an experimental hybrid, and four cultivars were assessed with cluster and principal component analyses based on morphophysiological data, yield and quality, stability of performance, heterosis, and combining abilities. The assessment of the commercial hybrids revealed a related origin and consequently does not support the identification of promising offspring from their crossing. The assessment of the cultivars discriminated them according to origin and to evolutionary and selection effects. On Principal Component 1, the largest group with positive loadings included yield components, heterosis, and general and specific combining ability, whereas the largest negative loadings were obtained by qualitative and descriptive traits. Principal Component 2 revealed two smaller groups, a positive one with phenotypic traits and a negative one with tolerance to inbreeding. Stability of performance loaded positively for total yield and negatively for early yield. In conclusion, combining ability, yield components, and heterosis provided a mechanism for ensuring continued improvement in plant selection programs.

1. Introduction

Knowledge about levels and patterns of genetic diversity can be an invaluable aid in crop breeding for diverse applications [1], including analysis of genetic variability in cultivars [2, 3], identification of diverse parental combinations to create segregating progenies with maximum genetic variability for further selection [4], and introgression of desirable genes from diverse germplasm into the available genetic base [5]. An understanding of the genetic relationships among lines can be particularly useful in planning crosses, in assigning lines to specific heterotic groups, and in precise identification for plant varietal protection [6]. The study of genetic diversity is the process by which variation among individuals, groups of individuals, or populations is analyzed. The data often involve numerical measurements and, in many cases, combinations of different types of variables. Phylogenetic relationships based on morphophysiological data provide a relatively rapid assessment of the diversity present, so that a greater number of related operational taxonomic units (OTUs) [7] can subsequently be tested.

It is well known that maintenance or preservation of germplasm involves two principal considerations: (1) avoiding loss of genetic diversity and (2) avoiding costs [8]. Active collections are geared to meet the needs of germplasm users. Therefore, grow-outs of cv.s for seed increase are relatively frequent and can be used for evaluation. Evaluation of germplasm collections has the highest priority among germplasm functions. Germplasm enhancement embraces the activities required to aggregate useful genes and gene combinations into usable phenotypes; these aggregates can be considered the feedstocks for varietal development programs. In this context, the present paper proposes an approach to discriminate the breeding value of tomato source material, that is, commercial single-cross hybrids or open-pollinated cultivars (cv.s), during assessment. The approach is based on passport data, that is, morphophysiological data, yield potential, stability of performance, heterosis, and combining ability (general, GCA, and specific, SCA), analysed by cluster and principal component analyses as a means of identifying sources of yield-enhancing genes [9].

2. Materials and Methods

2.1. Source Material

To assess tomato source material, the phylogenetic relationships within two different gene pool resources were examined. The first source consists of single-cross hybrids, which have become the major segment of the modern tomato seed industry. The second consists of open-pollinated, well-adapted cv.s, which are mainly grown in the open field under lower-input systems. For hybrids, the phylogenetic relationship was based on agronomic data, that is, yield and quality components; on morphological data, that is, secondary phenotypic traits; and on inbreeding depression. For cv.s, it was based on agronomic data, morphological data, and the diallel-cross products of the cv.s, that is, heterosis, heterobeltiosis, and GCA and SCA constants. The hybrids are represented by the commercial hybrids Iron and Sahara (Geoponiko Spiti, Greek Seed Company) and by the experimental hybrid Theodora (National Agricultural Research Foundation, NAGREF, Greece), and the open-pollinated cv.s by Artemida, Makedonia, Areti, and Olympia (NAGREF).

The hybrids Iron and Sahara were introduced for cultivation in the 1990s, and their cultivation area reached almost 20% of the area cultivated with tomato in Greece (Geoponiko Spiti, personal communication). Theodora is a new hybrid, developed by crossing cv.s Artemida and Makedonia at the Agricultural Research Center of Northern Greece [10]. Makedonia is an old cv., developed by pure line selection from a local population of the late 1950s. Cv.s Areti, Artemida, and Olympia are new cv.s; cv. Areti was developed in 1998, and cv.s Artemida and Olympia were developed by Christakis and Fasoulas [11, 12]. All the above materials are of the indeterminate type.

2.2. Assessment Procedure

The experiments were conducted at the farm of the Agricultural Research Center of Northern Greece, near Thessaloniki, during 2003–2005. Randomized complete block designs (RCBDs) were used, with three replications, each consisting of 10 plants. In 2003, the hybrids Iron, Sahara, and Theodora were evaluated in comparison to their F2 generations. In 2005, the cv.s Artemida, Makedonia, Areti, and Olympia were evaluated in comparison to their simple diallel hybrids with reciprocals, which were obtained in 2004.
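The experiments above follow a randomized complete block design (entries × replications), and each trait is later analysed by analysis of variance. A minimal NumPy sketch of that per-trait RCBD partition is given below, assuming one value per entry per block (for example, the plot mean of its plants); the helper name `rcbd_anova` and the input layout are hypothetical illustrations, not the authors' software, and p-values are omitted to keep the sketch dependent on NumPy only.

```python
import numpy as np

def rcbd_anova(y):
    """Sums of squares, degrees of freedom and F ratios for an RCBD.

    Hypothetical helper: y is a 2-D array with rows = entries (treatments)
    and columns = blocks (replications), one value per plot.
    """
    y = np.asarray(y, dtype=float)
    t, b = y.shape
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_entry = b * ((y.mean(axis=1) - grand) ** 2).sum()  # between entries
    ss_block = t * ((y.mean(axis=0) - grand) ** 2).sum()  # between blocks
    ss_error = ss_total - ss_entry - ss_block             # entry x block residual
    df_entry, df_block = t - 1, b - 1
    df_error = df_entry * df_block
    ms_error = ss_error / df_error
    return {
        "ss": (ss_entry, ss_block, ss_error),
        "df": (df_entry, df_block, df_error),
        "F_entry": (ss_entry / df_entry) / ms_error,
        "F_block": (ss_block / df_block) / ms_error,
    }

# Illustrative numbers only: 3 entries x 2 blocks.
res = rcbd_anova([[4, 6], [8, 10], [6, 5]])
print(res["df"], round(res["F_entry"], 3))  # (2, 1, 2) 6.333
```

The F ratios would then be compared with the F distribution at the chosen significance level (for example with `scipy.stats.f.sf`, if SciPy is available).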

For each entry, yield potential, fruit quality, physiological disorders, and plant descriptors were recorded on each plant individually. Harvested fruits were counted, graded into classes according to quality standards and sensitivity to physiological disorders, and weighed. Fruit quality traits, namely resistance to pressure, total solids (TS), total soluble solids (TSS), and pH, were averaged over a sample of two fruits per plant. Plant and fruit descriptors were recorded according to the guidelines of the International Union for the Protection of New Varieties of Plants (UPOV). Hybrid assessment included the inbreeding depression of each F2, expressed as the relative difference from the corresponding hybrid [13], and the determination of undesirable traits, such as lack of stability of performance. Stability of performance was defined by the standardized mean ($\bar{x}/s$, i.e., mean/standard deviation) of single plants [14, 15]. The cv. combining the largest mean yield ($\bar{x}$) with the largest $\bar{x}/s$ is the most productive and stable across environments [14]; for this reason, $\bar{x}/s$ is also a way of estimating genetic yield improvement [16]. Variety assessment included the estimation of heterosis and heterobeltiosis of the diallel hybrids, the determination of undesirable traits such as lack of stability of performance, and the estimation of GCA effects and SCA constants from the cv. diallel crosses. Heterosis and heterobeltiosis were calculated as the proportional performance of the F1 relative to the mean of the two parents and to the better parent, respectively. GCA and SCA were determined according to Griffing's [17] diallel analysis, Method 1, which includes the parents and the reciprocal crosses. Crosses were considered random effects.
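The indices defined above can be written out as small functions. This is a sketch under stated assumptions: inbreeding depression, heterosis, and heterobeltiosis are expressed as percentage deviations, (F1 − reference)/reference × 100; the better parent is taken to be the one with the higher value (appropriate for yield); and the standardized mean (mean divided by standard deviation) uses the sample standard deviation. All function names are hypothetical.

```python
import statistics

def standardized_mean(plant_values):
    """Stability index: mean / sample SD over the single-plant values of one entry."""
    return statistics.mean(plant_values) / statistics.stdev(plant_values)

def inbreeding_depression(f1, f2):
    """Percent decline of the F2 relative to its F1 hybrid (assumed form)."""
    return 100.0 * (f1 - f2) / f1

def heterosis(f1, p1, p2):
    """Mid-parent heterosis: F1 deviation from the parental mean, in percent."""
    mid_parent = (p1 + p2) / 2.0
    return 100.0 * (f1 - mid_parent) / mid_parent

def heterobeltiosis(f1, p1, p2):
    """Better-parent heterosis, in percent; 'better' = higher value (as for yield)."""
    better_parent = max(p1, p2)
    return 100.0 * (f1 - better_parent) / better_parent

# Hypothetical yields in the same units for all entries:
print(round(heterosis(120, 100, 80), 1))   # 33.3
print(heterobeltiosis(120, 100, 80))       # 20.0
print(standardized_mean([2, 4, 6]))        # 2.0
```

For traits where a lower value is desirable, the choice of "better parent" in `heterobeltiosis` would have to be reversed.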

2.3. Statistical Analyses

All RCBD experiments were analysed by analyses of variance and tests of significance at for each trait. For the determination of inbreeding depression, heterosis, heterobeltiosis, stability of performance, and combining abilities, the variables total and early yield were used.
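Griffing's Method 1 (parents, F1s, and reciprocals in a full p × p table) estimates GCA, SCA, and reciprocal effects from row and column totals of the diallel table. The NumPy sketch below computes only these effect estimates for one variable (e.g., total yield), assuming a table of entry means; it does not reproduce the random-model variance components or significance tests used in the study, and the function name is hypothetical.

```python
import numpy as np

def griffing_method1_effects(x):
    """GCA, SCA and reciprocal effects for Griffing's Method 1.

    x[i][j]: mean of the cross with parent i as female and j as male;
    the diagonal holds the parents themselves.
    """
    x = np.asarray(x, dtype=float)
    p = x.shape[0]
    rc = x.sum(axis=1) + x.sum(axis=0)   # X_i. + X_.i for each parent
    grand = x.sum()                      # X..
    gca = rc / (2 * p) - grand / p**2
    sca = ((x + x.T) / 2
           - (rc[:, None] + rc[None, :]) / (2 * p)
           + grand / p**2)
    reciprocal = (x - x.T) / 2           # r_ij = -r_ji
    return gca, sca, reciprocal

# Purely additive toy table x_ij = a_i + a_j: SCA and reciprocal effects vanish.
a = np.array([1.0, 2.0, 6.0])
gca, sca, rec = griffing_method1_effects(a[:, None] + a[None, :])
print(gca.tolist())  # [-2.0, -1.0, 3.0]
```

The additive toy table is a quick sanity check: GCA effects recover the parental deviations from their mean and sum to zero, while SCA is zero because there is no non-additive component.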

The phylogenetic relationships were studied by UPGMA (unweighted pair group method with arithmetic mean) clustering and by PCA (principal component analysis). Each hybrid or cv. was considered one OTU [7]. Twenty-two traits for each hybrid (Table 5) and 35 traits for each cv. (Table 6) were transformed to standardized units. A dissimilarity matrix (DIST coefficient), based on all traits, was created for each group from the transformed data using average taxonomic distance [7]. The product moment correlation (CORR coefficient) for each group was also calculated. The DIST and CORR coefficients were calculated for all possible pairs to obtain the respective matrices and create the dendrograms. The cophenetic correlation [18] for each dendrogram was computed as a measure of goodness of fit (Mantel t-test) for the clustering method used. Data transformations, matrices, and dendrograms were calculated and visualized using the NTSYS-pc software program [18]. In addition, PCA [19] was applied to the data. Two and three principal components were extracted for hybrids and cv.s, respectively, and the standardized data were projected onto the principal components. Two- and three-dimensional plots of the projections were produced for hybrids and cv.s, respectively.
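The clustering steps above (per-trait standardization, average taxonomic distance, UPGMA, and cophenetic correlation) can be sketched in NumPy as follows. This is an illustrative re-implementation, not NTSYS-pc: it assumes standardization with the sample standard deviation, and it clusters the CORR matrix through 1 − r, which yields the same UPGMA merges as averaging the correlations themselves because 1 − r is a linear transform. Function names and the toy data are hypothetical.

```python
import numpy as np

def standardize(data):
    """Per-trait (column) standardization: (x - mean) / sd across OTUs."""
    data = np.asarray(data, dtype=float)
    sd = data.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                      # constant traits stay at zero
    return (data - data.mean(axis=0)) / sd

def avg_taxonomic_distance(z):
    """DIST: root mean squared difference between OTU rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).mean(axis=2))

def upgma_cophenetic(d):
    """UPGMA on distance matrix d; returns the cophenetic distance matrix."""
    d = np.asarray(d, dtype=float)
    clusters = {i: [i] for i in range(d.shape[0])}
    coph = np.zeros(d.shape)
    next_id = d.shape[0]
    while len(clusters) > 1:
        keys = sorted(clusters)
        # arithmetic average of original distances between member pairs
        pairs = [(d[np.ix_(clusters[p], clusters[q])].mean(), p, q)
                 for k, p in enumerate(keys) for q in keys[k + 1:]]
        height, a, b = min(pairs)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i, j] = coph[j, i] = height   # members join at this height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph

def cophenetic_correlation(d, coph):
    """Pearson correlation between original and cophenetic distances."""
    iu = np.triu_indices_from(d, k=1)
    return np.corrcoef(d[iu], coph[iu])[0, 1]

# Toy data: 3 OTUs x 4 traits (illustrative values only).
data = [[1.0, 10.0, 3.0, 0.5], [1.2, 10.5, 3.1, 0.6], [5.0, 30.0, 1.0, 2.0]]
z = standardize(data)
dist = avg_taxonomic_distance(z)           # DIST matrix
print(round(cophenetic_correlation(dist, upgma_cophenetic(dist)), 3))
corr = np.corrcoef(z)                      # CORR: correlation between OTU rows
coph_corr = upgma_cophenetic(1.0 - corr)
```

Recomputing each cluster-to-cluster average from the original matrix is slower than the usual update formula but keeps the UPGMA definition explicit; for a handful of OTUs the cost is negligible.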

3. Results

3.1. Cluster Analysis

DIST and CORR matrices for hybrids and cv.s are presented in Tables 1 and 2, respectively. Dendrograms built from the DIST and CORR matrices grouped both the hybrids and the cv.s similarly. The cophenetic correlation for both the DIST and CORR matrices of hybrids was equal to , while that of cv.s was equal to for DIST and for the CORR matrix. These values indicate a very good fit of the data to the clustering method used [18]. Thus, only two dendrograms are presented, one for hybrids (Figure 1) and one for cv.s (Figure 2), both based on DIST matrices. The hybrids' dendrogram indicated a close relationship between hybrids Iron and Sahara, while hybrid Theodora was grouped separately. The cv.s' dendrogram showed a close relationship among cv.s Olympia, Areti, and Makedonia, with cv. Areti and cv. Makedonia being even closer. Cv. Artemida was grouped separately.

Table 1: Matrices of average taxonomic distance (above diagonal) and product moment correlation (below diagonal) for hybrids.
Table 2: Matrices of average taxonomic distance (above diagonal) and product moment correlation (below diagonal) for cultivars.
Figure 1: Dendrogram produced by cluster analysis based on DIST (average taxonomic distance) matrix for hybrids.
Figure 2: Dendrogram produced by cluster analysis based on DIST (average taxonomic distance) matrix for cultivars.
3.2. Principal Component Analysis (PCA)

Tables 3 and 4 present the correlation of each hybrid and cv. with the two and three PCs, respectively. For hybrids, PC1 had the highest correlations with the entries and accounted for 62.93% of the total variance, whereas PC2 accounted for the remainder. PC1 separated hybrid Theodora from hybrids Iron and Sahara, because the latter two hybrids are negatively correlated with PC1 and their projections (Figure 3) lie almost on the PC1 axis. PC2 in turn separated hybrids Iron and Sahara: Iron is the only hybrid with a negative correlation to PC2, and its projection lies almost on the PC2 axis. For the cv.s (Table 4), PC1 accounted for 49.15% of the total variance, whereas PC2 and PC3 accounted for 29.63% and 21.23%, respectively. PC1 separated cv. Artemida from cv.s Makedonia, Areti, and Olympia, because the latter cv.s are negatively correlated with PC1 and their projections (Figure 4) lie almost on the PC1 axis. PC2 in turn separated cv. Olympia from cv.s Areti and Makedonia, which were grouped in the same subgroup (Figure 2). Cv. Olympia was the only cv. with a negative correlation to PC2 (Table 4), and its projection lies almost on the PC2 axis (Figure 4). Finally, PC3 distinguished between cv.s Makedonia and Areti within the subgroup of the DIST dendrogram (Figure 2), since cv. Areti had a negative correlation to PC3 (Table 4).
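One way to obtain quantities like those in Tables 3 and 4 is a PCA that treats the OTUs (hybrids or cv.s) as variables and the standardized traits as observations; the sketch below assumes this arrangement, since the text does not state the exact NTSYS-pc settings. It returns the percentage of variance per component and the correlation of each OTU with each component. Under this assumption, each trait's standardized values sum to zero across the k OTUs, so the OTU variables are linearly dependent and at most k − 1 components carry variance: two for three hybrids and three for four cv.s, which is consistent with PC2 accounting for the remainder of the hybrid variance. The function name and the random stand-in data are hypothetical.

```python
import numpy as np

def otu_pca(data):
    """PCA with OTUs as variables and standardized traits as observations.

    data: OTUs x traits array. Returns (percent variance per component,
    OTU x component correlations). Component signs are arbitrary.
    """
    data = np.asarray(data, dtype=float)
    sd = data.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    z = (data - data.mean(axis=0)) / sd          # per-trait standardization
    r = np.corrcoef(z)                           # OTU x OTU correlation over traits
    eigvals, eigvecs = np.linalg.eigh(r)         # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None) # drop round-off negatives
    eigvecs = eigvecs[:, order]
    percent = 100.0 * eigvals / eigvals.sum()
    otu_corr = eigvecs * np.sqrt(eigvals)        # correlation of each OTU with each PC
    return percent, otu_corr

# Random stand-in data (not the study's): 3 OTUs x 22 traits.
rng = np.random.default_rng(0)
percent, otu_corr = otu_pca(rng.normal(size=(3, 22)))
print(np.round(percent, 2))   # the last component is numerically zero
```

Because each row of `otu_corr` reproduces the unit diagonal of the correlation matrix, the squared correlations of an OTU with all components sum to 1, which is a convenient check when reading such tables.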

Table 3: Correlation of each hybrid with the two principal components.
Table 4: Correlation of each cultivar with the three principal components.
Table 5: Loadings of the traits onto two principal components for hybrids.
Table 6: Loadings of the traits onto three principal components for cultivars.
Figure 3: Two-dimensional plot based on correlation of each hybrid with the two principal components.
Figure 4: Three-dimensional plot based on correlation of each cultivar with the three principal components.

In summary, the PCA confirmed in detail the grouping of the dendrograms based on either DIST or CORR matrices. Furthermore, PCA was more informative, because it identified the traits most important for the grouping. Since PC1 and PC2 explained all of the variability in hybrids and 78.77% of the total variance in cv.s, the most important traits for the separation are those with the largest loadings on PC1 and PC2. Tables 5 and 6 present the traits that contributed to separating hybrids and cv.s, respectively. Bold and italic fonts were used to group traits with the highest positive and negative loadings, respectively. The largest group with positive loadings on PC1 included yield and yield components (0.9169 to 0.9932), heterosis and heterobeltiosis (0.7970 to 0.9165), and GCA and SCA constants (0.9352 to 0.9779), whereas the largest negative loadings were obtained mainly by qualitative and descriptive traits (−0.7018 to −0.9997). PC2 revealed two smaller groups of traits: one with some phenotypic traits loading positively (0.7965 to 0.9993), and a second with tolerance to inbreeding loading negatively (−0.8991 to −0.9926). Stability of performance loaded positively for total yield (0.9700 to 0.9810) but negatively for early yield (−0.8560 to −0.8772), for both hybrids and cv.s.
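Grouping traits by the sign and size of their loadings, as reported above, amounts to a simple filter on each component. The sketch below assumes a cut-off of |loading| ≥ 0.7 (roughly the smallest magnitude among the reported groups) and uses made-up trait names and loadings for illustration only; it is not how Tables 5 and 6 were produced, and the function name is hypothetical.

```python
def group_traits(loadings, threshold=0.7):
    """Split traits into positive and negative groups on one principal component.

    loadings: dict mapping trait name -> loading on that component.
    threshold: assumed minimum |loading| for a trait to join a group.
    Each group is ordered from the strongest loading to the weakest.
    """
    positive = sorted((t for t, v in loadings.items() if v >= threshold),
                      key=lambda t: loadings[t], reverse=True)
    negative = sorted((t for t, v in loadings.items() if v <= -threshold),
                      key=lambda t: loadings[t])
    return positive, negative

# Hypothetical PC1 loadings, for illustration only.
pc1 = {"total yield": 0.95, "heterosis": 0.82, "pH": -0.88, "locule number": 0.31}
print(group_traits(pc1))  # (['total yield', 'heterosis'], ['pH'])
```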

4. Discussion

The management of genetic resources deals with the effects of single alleles on various attributes of adaptation, yield, or product quality, which are difficult to measure [20]. Selection recognizes the attributes that contribute to survivability and causes the alleles that govern them to increase in frequency over generations. On this basis, popular genetic materials could form the breeders' initial material for developing cultivars. Breeding schemes developed for allogamous species have been applied to autogamous species such as the tomato [21, page 52]. The passport data measured and described in the present study reflect the overall genetic constitution of the hybrids or cultivars under consideration. Differences among them arise from the original genes and from past selection, which created assemblages of genes at the higher frequencies desired in modern hybrids or cultivars.

The phylogenetic relationship of the hybrids Iron and Sahara (Figures 1 and 3) revealed a related origin and consequently does not support the identification of promising offspring from their crossing. Hybrid Theodora was grouped separately, showing a lack of relationship with the two randomly chosen commercial hybrids. Because the two commercial hybrids cultivated in the Mediterranean region showed a close relationship, this may indicate a narrowing of the genetic base of commercial tomato materials.

In the case of cv.s (Figures 2 and 4), the dendrogram and the three-dimensional plot separated them according to origin, evolutionary process, and selection effects during breeding. Thus, cv. Areti, which was extracted from the old hybrid Carmello (Sluis en Groot, Enkhuizen, Holland) in the environment of Makedonia in the 1980s, was clustered together with the long-established cv. Makedonia in the same subgroup, indicating possible common ancestors and similar evolutionary processes. This may explain why heterosis and heterobeltiosis for total and early yield between them, which reflect differences in gene frequencies [22], were the lowest [21], since heterosis is positively related to genetic divergence. Cv. Olympia, which was obtained in the same decade by applying honeycomb selection [23] in the F2 generation of the old hybrid Dombo (Bruinsma Seeds) in Southern Greece [12], was separated from the above although it was included in the same main group, indicating different selection processes and divergent gene pool resources. Finally, cv. Artemida, which was extracted by applying honeycomb selection in the F2 generation of the newer hybrid Vision (Enza Zaden Seed Company) in Southern Greece [11, 12] in the 1990s, was completely separate. The phenotypic and genetic distance between Artemida and the remaining cv.s, based on additive effects, leads to the assumption that the choice of this cv. as germplasm may be correct [21].

PC1, which accounted for 62.93% of the total variance of hybrids and 49.15% of cultivars (Tables 3 and 4), is strongly associated with yield-related traits, such as yield components and yield stability. Heterosis and heterobeltiosis for total and early yield, as well as GCA and SCA, made a highly positive contribution to the separation of cv.s (Table 6). This is in accordance with Hunter [24], who reported that heterosis and combining ability are reliable scientific methods of proof of genetic distance or conformity, and with Xiao et al. [25], who reported that heterosis indicates the genetic relatedness of crossed materials. Inbreeding depression showed a highly negative loading in separating hybrids (Table 5), thus contributing to the selection of hybrids with a desirable load of genes. Morphophysiological and qualitative traits also contributed to the clustering of hybrids and cv.s. For hybrids, leaf dimensions, internode length, and fruit traits, such as equatorial and polar diameter, ribbing at the peduncle end, size of blossom scar, green shoulder before maturity, intensity of green color of the shoulder, firmness, locule number, pericarp thickness, puffiness, soluble and total solids, and pH appeared to be the primary traits (Table 5). Similar traits were loaded onto the principal components for cv.s (Table 6). All diallel crosses of cv. Artemida produced highly heterotic products [21]. Heterosis probably also arises because different allelic combinations at particular loci in each parent complement each other when brought together in a hybrid combination, resulting in the expression of heterosis [26]. These loci may not relate directly to observable traits but could affect the physiology of the plant. In autogamous species such as the tomato, the genetic variance is expected to derive mainly from additive effects (Matzinger, cited in [9]).
Heterosis may not be of direct interest, but heterotic crosses could produce desirable transgressive segregants. Usually, experimental evidence is needed, especially from the analysis of F2 and subsequent generations [9]. In this study, discrimination among tomato genotypes based on geographical origin and on evolutionary and selection processes succeeded in clustering the long-established cv. Makedonia into the same group as the derivatives of the old hybrids Carmello and Dombo, that is, cv.s Areti and Olympia, respectively. These data supply sufficient information to determine whether heterosis is correlated with the geographical origin of the parents.

Comparing the two source groups (although each had few representatives), it is evident that (a) the two commercial hybrids, selected randomly with growers' preference in the Mediterranean environment as the sole criterion, showed a close relationship compared with an unrelated hybrid, and (b) the cv.s, selected on the criterion of having been developed at national research stations, that is, in a narrow environment, showed broader differentiation, especially cv. Artemida, whose GCA constitutes a valuable aggregate for public or private use. These remarks bring forward the contentious issue of where selection should be carried out. All views would agree that testing must include the target environment [27]. Perhaps using superior germplasm in breeding strategies to increase yields, combined with improved cultural practices, would offer a potential solution to this problem [27].

In conclusion, the flux of parental material in any breeding programme (private or public) is based on a working strategy known as the assessment of the continual turnover of cv.s. As older parents retire, new ones enter from locally adapted cv.s and from recombinant lines derived from the F2 of elite hybrids. The phylogenetic study of relationships between hybrids or cv.s showed that, in an autogamous species such as the tomato, combining ability, yield components, and heterosis were sufficient to provide information about the genetic relationships among hybrids or cv.s and to identify the materials that may provide a route for developing elite breeding products.


  1. S. A. Mohammadi and B. M. Prasanna, “Analysis of genetic diversity in crop plants—salient statistical tools and considerations,” Crop Science, vol. 43, no. 4, pp. 1235–1248, 2003.
  2. J. S. C. Smith, “Diversity of United States hybrid maize germplasm. Isozymic and chromatographic evidence,” Crop Science, vol. 28, pp. 63–69, 1988.
  3. T. S. Cox, J. P. Murphy, and D. M. Rodgers, “Changes in genetic diversity in the red winter wheat regions of the United States,” Proceedings of the National Academy of Sciences of the United States of America, vol. 83, pp. 5583–5586, 1986.
  4. B. A. Barrett and K. K. Kidwell, “AFLP-based genetic diversity assessment among wheat cultivars from the Pacific Northwest,” Crop Science, vol. 38, no. 5, pp. 1261–1271, 1998.
  5. J. A. Thompson and R. L. Nelson, “Core set of primers to evaluate genetic diversity in soybean,” Crop Science, vol. 38, no. 5, pp. 1356–1362, 1998.
  6. A. R. Hallauer and J. B. Miranda Filho, Quantitative Genetics in Maize Breeding, Iowa State University Press, Ames, Iowa, USA, 2nd edition, 1995.
  7. P. H. A. Sneath and R. R. Sokal, Numerical Taxonomy: The Principles and Practice of Numerical Classification, W. H. Freeman and Co., San Francisco, Calif, USA, 1973.
  8. Q. Jones, “A national plant germplasm system,” in Conservation of Crop Germplasm—An International Perspective, pp. 27–33, Crop Science Society of America (CSSA), Madison, Wis, USA, 1984.
  9. P. L. Spagnoletti Zeuli and C. O. Qualset, “The durum wheat core collection and the plant breeder,” in Core Collections of Plant Genetic Resources, T. Hodgkin, A. H. D. Brown, T. H. J. L. van Hintum, and E. A. V. Morales, Eds., pp. 213–228, IPGRI, A Wiley-Sayce Publication, Chichester, UK, 1995.
  10. E. Traka-Mavrona and M. Koutsika-Sotiriou, “‘Theodora’: a new Greek tomato hybrid,” in Proceedings of the 10th Panhellenic Congress of the Greek Society of Genetics and Plant Breeding, pp. 277–281, Greek Society of Genetics and Plant Breeding, Agricultural University of Athens, Athens, Greece, November 2004.
  11. P. A. Christakis and A. C. Fasoulas, “The recovery of recombinant inbreds outyielding the hybrid in tomato,” Journal of Agricultural Science, vol. 137, no. 2, pp. 179–183, 2001.
  12. P. A. Christakis and A. C. Fasoulas, “The effects of the genotype by environmental interaction on the fixation of heterosis in tomato,” Journal of Agricultural Science, vol. 139, no. 1, pp. 55–60, 2002.
  13. M. R. Meghji, J. W. Dudley, R. Z. Lambert, and G. F. Sprague, “Inbreeding depression, inbred and hybrid grain yields and other traits of maize genotypes representing three eras,” Crop Science, vol. 24, pp. 545–549, 1984.
  14. V. A. Fasoula and D. A. Fasoula, “Honeycomb breeding: principles and application,” in Plant Breeding Reviews, vol. 18, John Wiley & Sons, New York, NY, USA, 2000.
  15. V. A. Fasoula and D. A. Fasoula, “Partitioning crop yield into genetic components,” in Handbook of Formulas and Software for Plant Geneticists and Breeders, M. S. Kang, Ed., pp. 321–327, The Haworth Press, New York, NY, USA, 2003.
  16. M. Tollenaar and J. Wu, “Yield improvement in temperate maize is attributable to greater stress tolerance,” Crop Science, vol. 39, no. 6, pp. 1597–1604, 1999.
  17. B. Griffing, “Concept of general and specific combining ability in relation to diallel crossing system,” Australian Journal of Biological Sciences, vol. 9, pp. 463–493, 1956.
  18. F. J. Rohlf, NTSYS-pc. Numerical Taxonomy and Multivariate Analysis System. Version 2.1, Exeter Software, New York, NY, USA, 2000.
  19. T. K. Broschat, “Principal component analysis in horticulture research,” HortScience, vol. 14, pp. 114–117, 1979.
  20. R. W. Allard, “Reproductive systems and dynamic management of genetic resources,” in Reproductive Biology and Plant Breeding, Y. Dattee, C. Dumas, and A. Gallais, Eds., pp. 325–334, Springer, Berlin, Germany, 1992.
  21. M. Koutsika-Sotiriou, E. Traka-Mavrona, and G. Evgenidis, “Assessment of tomato source breeding material through mating designs,” Journal of Agricultural Science, vol. 146, no. 3, pp. 301–310, 2008.
  22. D. S. Falconer, Introduction to Quantitative Genetics, John Wiley & Sons, New York, NY, USA, 1989.
  23. A. C. Fasoulas, The Honeycomb Methodology of Plant Breeding, A. C. Fasoulas, Thessaloniki, Greece, 1988.
  24. B. R. Hunter, “Science based identification plant genetic material,” in Intellectual Property Rights: Protection of Plant Materials, P. S. Baenziger, R. A. Kleese, and R. F. Barnes, Eds., CSSA Special Publication no. 21, pp. 93–99, CSSA, Madison, Wis, USA, 1993.
  25. J. Xiao, J. Li, L. Yuan, and S. D. Tanksley, “Dominance is the major genetic basis of heterosis in rice as revealed by QTL analysis using molecular markers,” Genetics, vol. 140, no. 2, pp. 745–754, 1995.
  26. E. T. Bingham, “Role of chromosome blocks in heterosis and estimates of dominance and overdominance,” in Concepts and Breeding of Heterosis in Crop Plants, K. R. Lamkey and J. E. Staub, Eds., CSSA Special Publication no. 25, pp. 71–88, CSSA, Madison, Wis, USA, 1998.
  27. J. Janick, “Exploitation of heterosis: uniformity and stability,” in The Genetics and Exploitation of Heterosis in Crops, J. G. Coors and S. Pandey, Eds., pp. 319–333, ASA-CSSA-SSSA, Madison, Wis, USA, 1999.