Abstract

Progress in the genetic improvement of cotton through conventional breeding has been slow, chiefly because knowledge of fiber productivity and quality is incomplete and these traits cannot yet be manipulated precisely. Naturally available cotton continues to be a resource for future breeding programs, and this paper outlines contemporary technologies for exploiting the available natural variation to improve fiber further. Particular emphasis is given to the applications, obstacles, and perspectives of marker-assisted breeding, since it appears to be the more promising route to manipulating novel genes available in the cotton germplasm. Deployment of system quantitative genetics in marker-assisted breeding programs will be essential to realize its role in cotton. At the same time, the roles of genetic engineering and in vitro mutagenesis in the genetic improvement of cotton cannot be ruled out.

1. Rationale for Genetic Improvement of Cotton Fiber

Plant trichomes perform a number of biologically important roles, including protection against biotic and abiotic factors, water absorption, secretion, attraction mechanisms, and, more importantly, seed dispersal [1]. While the majority of plant trichomes are multicellular, cotton (Gossypium spp.) produces unicellular seed trichomes commonly called “fibers,” which are of considerable economic importance and make cotton the leading cash crop in the world. Cotton remains the most miraculous fiber under the sun, even 8000 years after its first use, and no other fiber comes close to duplicating all of the desirable characteristics combined in cotton. Cotton plays an important role in everyone’s life, from the time we dry our faces on a soft cotton towel in the morning until we slide between fresh cotton sheets at night. It provides thousands of useful products and supports millions of jobs as it moves from field to fabric. Thus, cotton has a vital role in the economic, political, and social affairs of the world. World consumption of cotton fiber is approximately 27 million metric tons per year (http://www.cotton.org/). India ranks second in global cotton production and accounts for approximately 25% of the world’s total cotton area and 16% of global cotton production. The cotton industry in India has 1,543 spinning units, more than 281 composite mills, 1.72 million registered looms, and an installed capacity of 36.37 million spindles [2]. The textile industry employs 30 million people directly, is the second largest employer after agriculture, and contributes 29.9% of the Indian agricultural gross domestic product [3]. Though India has the largest area under cotton globally, its yields are low, averaging 445 kg/ha compared to the world average of 755 kg/ha. Hence, it is imperative to improve fiber productivity by increasing the number of fibers per seed and reducing yield loss due to biotic and abiotic stresses [4]. 
Further, recent advances in spinning technology and textile processing demand improved fiber length, strength, and uniformity. The cotton fiber is composed of nearly pure cellulose, the largest component of plant biomass; hence it is an outstanding model for the study of cellulose biosynthesis as well as of the plant cell wall and cell elongation [5]. In addition, compared to lignin, cellulose is easily convertible to biofuels. Thus, genetic improvement of cotton fiber and cellulose may also lead to the improvement of diverse biomass crops [1]. Therefore, it is necessary to genetically improve cotton productivity and quality, since cotton has several economic and social applications.

2. Conventional Cotton Breeding: Contributions and Concerns

Modern crop varieties have been developed from plant populations that have shown genetic variability. Selection, as a systematic process, consists of the identification and use of superior individuals coexisting within the population. In other words, selection changes the genetic structure of populations by preserving superior alleles and discarding undesirable ones [6]. Thus, crop improvement by selection primarily depends on (i) discovery and generation of genetic variability in agronomic traits and (ii) precise selection of genotypes with favourable characteristics, as a product of recombination among superior alleles at different loci. In cases where selection is based only upon phenotypic values, breeders commonly confront the problem of genotype × environment interaction [7], which they usually minimize by stratifying the selection plot or using progeny tests [6].

The genus Gossypium L. has long been a focus of genetic, systematic, and breeding research. Cotton was among the first species to which the Mendelian principles were applied [8] and has a long history of improvement through breeding with sustained long-term yield gains. Gossypium consists of at least 45 diploid and 5 allotetraploid species. The allotetraploid cotton species, which include the two commercially important cultivated species, G. hirsutum L. and G. barbadense L., arose from the combination of A- and D-genomes. The best living models of the ancestral A- and D-genome parents are G. herbaceum and G. raimondii, respectively [9]. The A-, D-, and AD-genome groups have received special attention, as four different species (G. herbaceum (A1), G. arboreum (A2), G. hirsutum (AD1), and G. barbadense (AD2)) have been domesticated for their abundant seed trichomes and provide the foundation for the textile industry worldwide. Relationships among genome groups have been quantified in several studies, and the closest living relatives of the diploid genome donors to allopolyploid cotton have been identified [10]. The diploid donor of the allopolyploid AT genome (where the T subscript indicates the A-genome in the tetraploid (AD) nucleus) was a species much like modern G. arboreum or G. herbaceum, whereas the allopolyploid DT genome is derived from a progenitor similar to modern G. raimondii. These well-established relationships provide a phylogenetic framework for investigating evolution in terms of both domesticated fiber production and polyploidy [1]. Interestingly, the A-genome species produce spinnable fiber and are cultivated on a limited scale, whereas the D-genome species do not [11]. Cytological observation indicated that synteny and colinearity are largely conserved among the five tetraploid species. 
Conventional breeding efforts aimed at interspecific introgression have transferred specific genes and useful traits, including stronger, longer, and finer fibers [12].

Despite these remarkable advances in cotton improvement during the twentieth century using traditional plant breeding approaches, yield potential has reportedly plateaued over the last three decades and has been hindered by complex antagonistic genetic relationships [13]. G. barbadense L. is the only 52-chromosome cultivated relative of upland cotton (G. hirsutum, 2n = 52). It is valued for its fiber length and quality, whereas upland cotton is more valued for its high yield. While these species hybridize easily, attempts to incorporate genes from G. barbadense into upland cotton have generally not achieved stable introgression of the G. barbadense fiber properties and have resulted in poor agronomic qualities of the progeny, distorted segregation, sterility, mote formation, and limited recombination due to incompatibility between the genomes [9]. As a consequence, the recent decline in cotton yield and fiber quality is believed to be due mainly to erosion of genetic diversity, which in turn results from repeated use of the same breeding materials: G. hirsutum-based intraspecific cross combinations [13]. As revealed by isozymes and a range of DNA molecular markers including RAPDs, AFLPs, RFLPs, and microsatellites, the genetic diversity of widely cultivated cotton (G. hirsutum L.) cultivars and some of its related species has been found to be low [14]. A narrow genetic base, especially among agriculturally elite types, restrains their utilization and is a bottleneck for cotton breeding and cultivar improvement [15]. Further, cotton fiber productivity and quality are genetically and physiologically complex. Breeding for such traits is time consuming and difficult because of the polyploid nature of cotton. The paucity of information about genes that control important traits and the need for more extensive use of diverse germplasm clearly point to the importance of new and innovative approaches. 
The quickly expanding knowledge on gene function and the availability of novel molecular tools are expected to offer new perspectives to solve these complex problems.

3. Next-Generation Breeding Strategies

Three contemporary approaches are available to incorporate and/or manipulate novel gene(s) and thereby increase genetic diversity, namely, trait introgression via marker-assisted selection (MAS), genetic engineering, and in vitro mutagenesis; each has advantages and disadvantages. Transgenics for improved cotton fiber quality and other value-added traits have gained much attention in the private sector. A number of fiber-specific promoters and foreign genes that govern several novel properties, such as colour, toughness, and thermal properties, have been identified and incorporated into the cotton fiber [16, 17]. However, a practical realization of this approach in routine breeding programs is yet to be demonstrated. There is also renewed interest in chemical mutagenesis and transposon-mediated gene knock-outs as means to obtain single-gene mutants affecting phenotypes of interest, since the prospects of gene identification are high and every gene affecting a trait is potentially a target [18]. Nevertheless, very few studies have employed in vitro mutagenesis to test its effectiveness in improving fiber quality in cotton [19].

The limitations encountered in transgenics and mutagenesis research highlight the many challenges still faced by plant biologists in creating the next generation of designer plants. Fundamental research is still needed to elucidate biochemical and signalling pathways and to acquire a better understanding of the underlying mechanisms that regulate gene expression. Gene discovery, strategies to produce the desired pattern of expression and phenotype, and improved transformation methodology are yet to be fully developed. Further, at the time of this writing, the transgenic approach is feasible only for traits that are controlled by one or a few major genes; quantitative traits such as yield, which are governed by polygenes, are not easily amenable to transformation.

It is increasingly believed that naturally available cotton will continue to be a resource for further improvement of fiber for at least a decade, or until a suitable tool to manipulate fiber growth and development is described [72]. In this context, interspecific trait introgression is particularly attractive since it utilizes a broad germplasm base, can be targeted to one or more specific traits, can be modulated to include thousands of genes or even entire genomes, and is readily coupled to high-throughput MAS [73]. However, as discussed, the biological and technical challenges of introgression increase as the phyletic distance between the donor and recipient genomes increases. Although introgression of genes across species boundaries is a complex task, it is highly desirable because the gene pools of cultivated species do not contain all of the desired alleles [73]. Hence, contemporary tools for identifying and introgressing the best alleles from the available cotton germplasm for genetic improvement of cotton fiber are discussed in more detail.

4. Introgression of Novel Alleles from Cotton Germplasm: Marker-Assisted Breeding

Novel molecular tools such as molecular markers have started to demonstrate their usefulness in practical plant breeding by facilitating the identification, characterization, and manipulation of genetic variation in important agronomic traits via quantitative trait loci (QTL) mapping [74]. These molecular markers have been applied to many biological questions in gene mapping, population genetics, phylogenetic reconstruction, paternity testing, forensic applications, comparative and evolutionary genomics, and map-based gene cloning. Over the years, advances in molecular biology have led to the introduction of several new types of molecular markers and have permitted the establishment of high-resolution linkage maps in crop plants. These linkage maps allow breeders to identify the markers that are linked to the trait of interest and use them in marker-assisted breeding programs. The main reasons supporting the utilization of molecular markers in cotton breeding programs are the 100% heritability of the markers and their lower cost [75]. Hence, molecular markers are being employed extensively in the selection of traits with low heritability and in the identification and introgression of complex fiber productivity and quality traits from native or exotic germplasm into elite cultivars via MAS [76]. In plant breeding, MAS is a relatively new concept; nevertheless, the original selection concept per se has not changed: the purpose of selection is still to search for and preserve the best genotypes, but now using molecular markers. At the same time, it is necessary to consider the effectiveness and cost of MAS (which is greatly influenced by the marker system used) besides polymorphism, technical feasibility, and so forth. The value, ease, and cost of measurement and the nature of genetic control of agronomic traits will determine the way in which molecular markers may be effectively used in a breeding program [75]. A highly saturated marker linkage map is necessary for effective MAS. 
Both intraspecific meiotic configuration analysis and interspecific linkage analysis indicated that the cotton genetic map is ~5000 cM or larger, which is considerably longer than the maps of bread wheat (3791 cM), soybean (3159 cM), corn (1807 cM), rice (1530 cM), tomato (1472 cM), and barley (1279 cM; http://www.ncbi.nlm.nih.gov/mapview/). A highly saturated genetic map of cotton with a 5,000-cM long genome will require about 5,000 DNA markers to achieve an average density of one marker per cM [77]. The following sections describe achievements in QTL mapping of fiber quality traits, obstacles encountered, and suitable alternatives to realize the relevance of MAS in genetic improvement of cotton for fiber productivity and quality.
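The link between map length and marker number is simple arithmetic and can be sketched as follows. This is a minimal illustration that assumes idealized, evenly spaced markers, so the result is a lower bound (real markers cluster and leave gaps); the function name markers_needed is hypothetical and the map lengths are those quoted in the text.

```python
import math

def markers_needed(map_length_cm: float, spacing_cm: float) -> int:
    """Lower bound on the number of evenly spaced markers for a target spacing."""
    if map_length_cm <= 0 or spacing_cm <= 0:
        raise ValueError("map length and spacing must be positive")
    return math.ceil(map_length_cm / spacing_cm)

# Genetic map lengths (cM) as quoted in the text.
MAP_LENGTHS_CM = {"cotton": 5000, "bread wheat": 3791, "rice": 1530}

for crop, length_cm in MAP_LENGTHS_CM.items():
    print(f"{crop}: at least {markers_needed(length_cm, 1.0)} markers at 1 cM spacing")
```

Relaxing the target spacing reduces the requirement proportionally; for example, a 5 cM framework map of a 5000 cM genome would need at least 1000 markers under the same assumption.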

5. Achievements

Recent years have seen significant efforts to understand the structural and functional aspects of the cotton genome. To date, hundreds of QTLs related to various aspects of fiber quality have been mapped (Table 1). Published genetic mapping of fiber quality traits clearly indicates that improved productivity is not always negatively associated with quality, and therefore both can be improved concurrently [78]. Further, many QTLs for fiber-related traits have been identified in the D-subgenome of tetraploid cotton (e.g., [20, 25, 28, 44, 45, 79]) despite the fact that D-genome species produce very short fibers and hence are not cultivated. This suggests that the D-genome contains important genes or regulators of fiber morphogenesis and fiber properties.

5.1. Interspecific Introgression of Novel Alleles

The strategy of crossing two superior genotypes of the respective species to better exploit their genetic potential stems mainly from the finding that each of the two locally adapted species contains different alleles conferring fiber productivity and quality. This contradicts the prevailing strategy for QTL mapping, which is to choose parental lines with maximal phenotypic divergence [80]. Linkage maps in tetraploid cotton have been most densely populated by analysis of interspecific G. hirsutum × G. barbadense crosses because of the low levels of DNA polymorphism within cotton species. Mapping populations have also been developed for G. hirsutum × G. tomentosum, G. hirsutum × G. anomalum, and G. trilobum × G. raimondii (Table 1). Genomic exploration of other accessions of these species or of other wild tetraploid cottons may yield additional valuable alleles. There also exists considerable variability among “race stocks” (local land races) within G. hirsutum, which is well worth further investigation [81]. Although elite × elite crosses are typical of traditional plant breeding, interspecific crosses are rarely used in cotton breeding because of numerous barriers to gene flow, as discussed above [23]. Marker-assisted selection mitigates many of the problems associated with interspecific crosses [23]. The finding that the G. hirsutum allele is favourable at some loci and the G. barbadense allele at other loci shows that recombination of favourable alleles from each of these species may form novel genotypes superior to either of the parental species.
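The idea that recombining favourable alleles from both species can outperform either parent can be illustrated with a toy additive model. The sketch below is only an illustration: the four loci, their effect sizes, and the assignment of favourable alleles to each species are invented for the example, and epistasis and linkage drag are ignored.

```python
# Hypothetical additive effects (arbitrary units) of the favourable allele at four loci.
EFFECTS = {"qtl1": 1.0, "qtl2": 0.5, "qtl3": 0.75, "qtl4": 0.25}

# Which parental species carries the favourable allele at each locus (illustrative only).
FAVOURABLE_SOURCE = {
    "qtl1": "hirsutum",
    "qtl2": "hirsutum",
    "qtl3": "barbadense",
    "qtl4": "barbadense",
}

def genotype_value(alleles: dict) -> float:
    """Sum the effects of the loci at which a line carries the favourable allele."""
    return sum(
        EFFECTS[locus]
        for locus, source in alleles.items()
        if FAVOURABLE_SOURCE[locus] == source
    )

hirsutum_parent = {locus: "hirsutum" for locus in EFFECTS}
barbadense_parent = {locus: "barbadense" for locus in EFFECTS}
recombinant = dict(FAVOURABLE_SOURCE)  # favourable allele stacked at every locus

for name, line in [("G. hirsutum parent", hirsutum_parent),
                   ("G. barbadense parent", barbadense_parent),
                   ("recombinant", recombinant)]:
    print(name, genotype_value(line))
```

Under these assumed effects the recombinant line exceeds both parents, which is the kind of transgressive outcome that marker-assisted stacking of favourable alleles aims for.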

5.2. Development of High-Throughput and Efficient Marker Types

The most detailed linkage map of tetraploid cotton has been constructed by Rong et al. [36] (Table 1). This map was composed of 2,584 loci in 26 linkage groups (LGs). However, the mapping population was composed of only 57 F2 plants, and most of the loci were RFLPs, which are laborious and time-consuming to assay and require the use of radioactive isotopes. The main breakthrough in DNA-based molecular markers was driven by the invention of PCR. The first widespread markers to take full advantage of PCR technology were microsatellites or simple sequence repeats (SSRs). SSRs are often the markers of choice for map construction in cotton because (i) SSRs are simple, PCR-based, codominant, informative, and rapid DNA markers that are amenable to automation and multiplex or high-throughput analysis [65]; (ii) on average, there is one microsatellite in every 170 kb of DNA in the cotton genome [82]; (iii) SSR markers could provide a tool for identifying loci that are specific to subgenome A or D; (iv) SSRs are present in expressed sequence tags (ESTs) at a frequency of 1.7% [83], and dense consensus maps including a number of EST-based functional markers become fundamental tools for comparing LGs and QTLs derived from different pedigrees. A high-density molecular map, especially one that includes functional SSR markers associated with fiber genes, will be very important in allowing direct tagging of target genes linked to fiber quality [53]. Further, the huge EST databases generated by sequencing initiatives allow the “in silico” identification of genetic variation such as single-nucleotide polymorphisms (SNPs). Recent advances in high-throughput SNP genotyping technologies can provide a high-density EST/SNP map, which is essential for increasing the efficiency of QTL mapping.
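In silico mining of EST sequences for SSRs can be sketched with a regular expression. The example below is a minimal illustration and not the pipeline used in any of the cited studies; the function name find_ssrs, the motif length range (2 to 6 bases), and the default threshold of four repeats are assumptions, and only perfect (uninterrupted) repeats are detected.

```python
import re

def find_ssrs(sequence: str, min_repeats: int = 4):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in a DNA string.

    A motif of 2-6 bases is matched lazily (so the shortest repeating unit is
    preferred) and must be followed by at least min_repeats - 1 further copies.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits

# Made-up sequence containing a dinucleotide repeat, (AC)6, between short flanks.
example_est = "GGTT" + "AC" * 6 + "TTGG"
print(find_ssrs(example_est))
```

Primers would then be designed in the flanking sequence on either side of each hit; that step is outside the scope of this sketch.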

5.3. Public Databases

Several genome databases have been developed exclusively to serve the international cotton research community. For example, the International Cotton Genome Initiative (ICGI; http://icgi.tamu.edu/), the Cotton Genome Database (CottonDB; http://www.cottondb.org/), the Cotton Marker Database (CMD; http://www.cottonmarker.org/), Comparative Evolutionary Genomics of Cotton (http://cottonevolution.info/), the TropGENE Database (http://tropgenedb.cirad.fr/en/cotton.html), the National Center for Biotechnology Information (http://ncbi.nlm.nih.gov/) for EST resources, BACMan resources at the Plant Genome Mapping Laboratory (http://www.plantgenome.uga.edu/), the Cotton Diversity Database (http://cotton.agtec.uga.edu/), and so forth provide genomic, genetic, and taxonomic information including germplasm, markers, genetic and physical maps, associated traits, sequences, and bibliographic citations.

5.4. Functional Genomics

As discussed, the cotton fiber is a complex biological system that is the net result of the intricate interplay of elaborate, developmentally regulated pathways consisting of thousands of genes. As such, it has been a difficult subject to tackle using conventional approaches, especially as cotton fiber does not lend itself easily to molecular analysis. Only in the last two decades have researchers begun to focus on the underlying developmental and cellular mechanisms that control fiber properties. Since the first report by John and Crow, who cloned the E6 gene through differential screening of a fiber cDNA library in 1992, several reports have shown that many genes are expressed preferentially in cotton fibers [84, 85]. Many fiber-specific genes involved in fiber cell initiation, fiber elongation, or cell wall biogenesis have been identified from comparisons of normal (wild type) lines with fiber mutants of G. hirsutum. A few reports have also investigated the mechanisms and genes underlying the important developmental differences between G. hirsutum and G. barbadense [86, 87]. It is currently estimated that the cotton fiber transcriptome consists of ~18000 genes in the cultivated diploid species. The cotton fiber transcriptome in the allotetraploid species is similarly estimated at ~36000 genes, including homeologous loci from both the AT and DT genomes. The genetically complex fiber transcriptome in both diploid and tetraploid species accounts for 45–50% of all the genes in the cotton genome [84]. The relationship between gene-island distribution and functional expression profiling suggests the existence of functionally coupled gene clusters in the cotton genome. Xu et al. [88] identified three fiber gene-rich islands associated with fiber initiation on chromosome 5, three islands for the early to middle elongation stage on chromosome 10, three islands for the middle to late elongation stage on chromosome 14, and an island on chromosome 15 for secondary cell wall deposition. The clustering in the cotton genome of functionally related genes that display similar transcriptional regulation indicates an organizational hierarchy, with implications for the genetic enhancement of fiber quality traits. Besides, genes that are preferentially expressed in cotton fiber have been characterized, and their promoters are being used in transgenic research [17].

6. Obstacles

Despite the considerable achievements described above, genetic improvement of cotton faces some specific challenges because of its polyploid genome structure, large genome size, and so forth. These are described below.

6.1. Challenges with Mapping Populations

Detection of QTLs is often limited by several factors such as the genetic properties of QTLs, environmental effects, population size, and experimental error. Hence, it is desirable to independently confirm QTL mapping studies [89]. Such confirmation studies may involve independent populations constructed from the same parental genotypes or from genotypes closely related to those used in the primary QTL mapping study. Sometimes, larger population sizes may also be used. Furthermore, some recent studies have proposed that QTL positions and effects should be evaluated in independent populations, because QTL mapping based on typical population sizes results in low power of QTL detection and a large bias in estimated QTL effects [89]. Unfortunately, due to constraints such as lack of research funding and time, and possibly a lack of understanding of the need to confirm results, QTL mapping studies are rarely confirmed. Validation of “conserved” fiber quality QTLs across populations has not been conclusive, because the majority of these QTL studies were based on small and mortal (F2 or backcross (BC)) populations (Table 1). Compared with F2 or BC populations, homozygous immortalized recombinant inbred lines (RILs) constitute the preferred material for QTL mapping in many crops [65]. RILs have not been widely utilized in cotton except in some cases (Table 1), mainly because of long development timelines and difficulties in producing sufficient seed. Though there is no clear rule for the precise population size required for QTL analysis, it is increasingly believed that sampling limited numbers of progeny in mapping studies tends to skew the distribution of QTL effects and leads to the identification of only a limited number of QTLs, even if many genes with equal and small effects actually control the trait [89]. 
Further, in several published reports, the number of LGs exceeds the gametic chromosome number (n = 26), and numerous LGs are yet to be associated with specific chromosomes (Table 1), mainly because of a lack of informative markers and the use of small sample sizes. Moreover, common identities and a common nomenclature have yet to be established among many LGs in the laboratory-specific maps [53]. Physical coverage of the cotton genome by these linkage maps also remains unknown. In most of the published maps, the markers were not uniformly spaced over many LGs. He et al. [77] suggest that these regions may be heterochromatic or gene rich. Clusters of markers with very limited recombination were frequently present, which may be indicative of QTL-rich (or gene-rich) regions of cotton.

6.2. QTL × Environment Analysis

Relatively large numbers of QTLs were detected for fiber quality traits, yet most of the detected QTLs together explained less than half of the total genetic variation (Table 1). What causes the remaining genetic variation that is unexplained by QTLs in large samples? One possibility is that there are many QTLs with very small effects, as assumed in classical models of quantitative genetics, and that these remain undetected even with very large sample sizes. Another possibility is higher-order epistatic interactions, which are refractory to QTL mapping [89]. Further, a recurring complication in the use of QTL data is that different parental combinations and/or experiments conducted in different environments often result in the identification of partly or wholly nonoverlapping sets of QTLs. The majority of such differences in the QTL landscape are presumed to be due to the environmental sensitivity of genes [76]. Hence, careful inclusion of QTL × environment interaction analysis, which has been limited in the published literature, will advance the progress of QTL mapping towards MAS.

6.3. Incongruence among QTL Studies

The use of stringent statistical thresholds to infer QTLs while controlling experiment-wise error rates is another reason why only a small fraction of these nonoverlapping QTLs is identified [90]. Small QTLs with opposite phenotypic effects might occasionally be closely linked in coupling in early-generation populations and separated only in advanced-generation populations after additional recombination. Comparison of multiple QTL mapping experiments by alignment to a common reference map offers a more complete picture of the genetic control of a trait than can be obtained in any one study [76]. However, the lack of a common set of anchored markers in the published reports limits the comparison of QTLs across genetic backgrounds.

6.4. Complexities in Integration of Functional Genomics with QTLs

Fiber gene function is highly conserved in the genomes of wild and cultivated species, as well as diploid and tetraploid species, despite millions of years of evolutionary history [91]. The phenotypic variation in fiber properties is therefore more likely to reflect quantitative differences in gene expression than differences in genotype at the DNA level [84]. Hence, further studies are required to understand the copy number of the genes, their regulation, and their specific functions in fiber development. Though systematic transcriptomic approaches can be combined with QTL analyses (discussed below), these studies do not address the occurrence of alternative splicing or the posttranslational modification of proteins. In addition, proteins can move in and out of other macromolecular complexes, which modifies their functionality. This level of complexity cannot be tackled using transcriptomics alone, and hence it is vital to include proteomics [92]. On the other hand, the biochemical functions of only a small proportion of the identified proteins have been demonstrated and/or inferred on the assumption that proteins sharing conserved domains have the same activity. Hence, the remaining proteins (those with domains of unknown function) remain a challenge for the elucidation of biological function. In addition, quantitative analysis of the proteome and metabolome is still in its infancy, and protein-protein interactions and the interactions of proteins with other macromolecules remain to be revealed. Therefore, complete knowledge of fiber growth and development at the molecular level, and its integration with QTL mapping, is essential for designing next-generation breeding strategies.

7. Alternatives and Future Perspectives

The importance and successful application of MAS in crop breeding programs have been shown in tomato, soybean, maize, pearl millet, rice, and so forth [93, 94]. However, the value of MAS in routine cotton breeding programs for fiber productivity and quality has been demonstrated in only a few reports [17]. This highlights the need for new insights and for improvements in the current methodologies and tools, and the following strategies are proposed for successful MAS in cotton.

7.1. Meta-Analysis of QTLs: Synergy through Networks

Though QTLs for several common traits have been mapped, direct comparisons cannot be made since no common markers exist among these studies. Detected QTLs are confined to the family in which they were mapped, the size of the QTL effects that can be detected is limited, and inferences are restricted to a single population and set of conditions [95]. Thus, one direction for QTL analysis is to combine information from several or many studies by meta-analysis. Integration of QTLs from different populations into a common map facilitates exploration of their allelic and homeologous relationships, though the level of resolution is limited by comparative marker densities, variation in recombination rates in different crosses, variation in gene densities across the genome, and other factors. Using a high-density reference genetic map consisting of 3475 loci in total, Rong et al. [76] reported the alignment of 432 QTLs mapped in one diploid and 10 tetraploid interspecific cotton populations and depicted them in a CMap resource. Lacape et al. [66] conducted a meta-analysis of more than a thousand QTLs obtained from RIL and BC populations derived from the same parents and reported consistent meta-clusters for fiber colour, fineness, and length. According to their discussion, although their results on cotton fiber can hardly support the optimistic assumption that QTLs are accurate, they show that the reliability of QTL calls and of the estimated trait impact can be improved by integrating more replicates in the analysis. Hence, it is imperative to verify the regions of convergence with new maps that share common markers with the consensus maps produced by Rong et al. [76] and Lacape et al. [66]. A collaborative network project was recently initiated at this university, involving four institutes that are actively engaged in cotton genetic improvement. This project provides impetus for the application of molecular markers and strengthens the capacity of partner institutions in meta-analysis of identified QTLs. 
In contrast, the level of public-private collaborations in such analysis is almost negligible at present.
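One simple way to combine position estimates for the same QTL from several studies, once they have been projected onto a common reference map, is fixed-effect inverse-variance pooling. The sketch below illustrates that generic statistical technique only; it is not the procedure used by Rong et al. [76] or Lacape et al. [66], and the function name pool_qtl_position and the example numbers are hypothetical.

```python
import math

def pool_qtl_position(estimates):
    """Fixed-effect (inverse-variance) pooling of QTL position estimates.

    estimates: list of (position_cM, standard_error_cM) tuples, all expressed
    on one common reference map.
    Returns (pooled_position_cM, pooled_standard_error_cM).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(se <= 0 for _, se in estimates):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * pos for w, (pos, _) in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)

# Two made-up studies placing a fiber-length QTL at 40 cM and 44 cM, each with SE 2 cM.
position, standard_error = pool_qtl_position([(40.0, 2.0), (44.0, 2.0)])
print(f"pooled position: {position:.1f} cM (SE {standard_error:.2f} cM)")
```

The pooled standard error is smaller than that of either study, which is the sense in which integrating more replicates can sharpen QTL calls; the approach presupposes shared markers so that positions are comparable in the first place.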

7.2. Map-Based Cloning

As QTL mapping results accumulate over the coming years, attention will turn to cloning QTLs and then to using them. This requires higher-resolution QTL mapping combined with a dense marker map [96]. A centimorgan (cM), corresponding to a recombination frequency of 1%, can span 10–1000 kbp and can vary across species or even within a chromosome of a given species [94]. Such a region may contain both desirable and undesirable genes; hence, to avoid linkage drag of undesirable traits, it is important to establish the causal relationship between the QTL and the phenotype using positional or map-based cloning. The physical size of a cM in cotton is not prohibitive for map-based cloning, but the lengthy genetic map will require a large number of markers in order to be sufficiently close to most genes for “chromosome walking” [9]. SNPs, a newer high-throughput marker type, are gaining importance in this context, but the large initial investment required to generate them creates a need for simple, innovative, and economical marker techniques. It is also important to note that, instead of anonymous DNA markers, the development and use of gene-specific functional markers such as SRAP, TRAP, and PAAP may increase the efficiency of map-based cloning. Further, map-based cloning in polyploids such as cotton introduces a technical challenge not encountered in diploid (or highly diploidized) organisms, namely, that virtually all “single-copy” DNA probes occur at two or more unlinked loci. This makes it difficult to assign megabase DNA clones to their site of origin. One possible approach to this problem is the use of diploids in physical mapping and map-based cloning.
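The consequence of the 10–1000 kbp per cM range for map-based cloning can be made concrete with a small calculation. This is a rough sketch: it simply scales a genetic interval by the two bounds quoted in the text, the function name physical_span_kbp is hypothetical, and the true ratio for any particular cotton region is not known from this range alone.

```python
def physical_span_kbp(interval_cm: float,
                      kbp_per_cm_low: float = 10.0,
                      kbp_per_cm_high: float = 1000.0):
    """Return the (minimum, maximum) physical size in kbp of a genetic interval.

    The default bounds are the 10-1000 kbp per cM range quoted in the text.
    """
    if interval_cm < 0:
        raise ValueError("interval must be non-negative")
    if kbp_per_cm_low > kbp_per_cm_high:
        raise ValueError("low bound must not exceed high bound")
    return interval_cm * kbp_per_cm_low, interval_cm * kbp_per_cm_high

# A QTL confined to a 2 cM interval between flanking markers.
low_kbp, high_kbp = physical_span_kbp(2.0)
print(f"2 cM corresponds to roughly {low_kbp:.0f} to {high_kbp:.0f} kbp")
```

Even a 2 cM interval may therefore span anywhere from tens of kilobases to a couple of megabases, which is why fine mapping with dense markers normally precedes chromosome walking.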

7.3. Cotton Genome Sequencing

Decoding cotton genomes will lay a foundation for understanding the functional and agronomic significance of polyploidy and genome size variation within the genus Gossypium [1]. The whole-genome shotgun sequence of the smallest Gossypium genome, that of G. raimondii, provided fundamental information about gene content and organization [97]. This sequence will be used to query homologous and orthologous genomes and to investigate the gene and allele basis of phenotypic and evolutionary diversity for cotton improvement. A good parallel approach may be to search for candidate genes in species that have naturally superior fiber qualities. Sequencing of the G. raimondii genome established the critical initial template for characterizing the spectrum of diversity among the eight Gossypium genome types and three polyploid clades, and it provided a reference for sequencing the genomes of many other Gossypium species, which is essential for further improvement of cotton.

7.4. Advances in Functional Genomics

Several studies comparing the structure of genomes have shown that the differences lie in expression patterns rather than in the presence or absence of particular genes. The comparison of gene expression profiles between genotypes contrasting in fiber quality can be extended to transcription profiling at the QTL level, and the genes identified at such QTLs may be better candidates for superior fiber quality. In addition to cDNA and oligonucleotide microarrays, tiling path arrays can also be used to study gene expression in plants [98]. The advantage of tiling path arrays over conventional microarrays is that they are not tied to known gene structures and hence provide unbiased and more accurate information about the transcriptome. In addition, they provide knowledge on transcriptional control at the chromosomal level. The use of tiling path arrays could help to provide a better understanding of the fiber transcriptome at the genome-wide level, but it is yet to be tried in cotton. This will result in a paradigm shift from MAS to genomics-assisted selection.

7.5. System Quantitative Genetics—Bridging Subdisciplines

The ultimate objective of QTL mapping is to identify the causal genes or even the causal sequence changes, the quantitative trait nucleotides (QTNs). While this remains a major challenge, it has been achieved in a few instances [99]. Identification of candidate genes and enrichment of functional markers within small targeted genomic regions are driven by the increasing availability of sequence resources and genomic databases and by technological developments. If functional candidate genes for a trait are not known, colocation of candidate gene polymorphisms with map positions, linkage to QTL, association of alleles with specific traits, or the identification of syntenic regions among genomes can help to select positional candidate genes for the trait [100]. In another approach, called genetical genomics, gene expression profiles are quantitatively assessed within a segregating population, and expression quantitative trait loci (eQTL) can be mapped like classical QTLs [66, 101]. Though global eQTL mapping studies using whole-genome microarrays have been published in yeast, Arabidopsis, maize, and eucalyptus, such work is at a preliminary stage in cotton [66]. In addition, a comparative picture of transcript versus protein abundance indicates that functionally important changes in the levels of the former are not necessarily reflected in changes in the levels of the latter. The same holds for metabolomes. Hence, genes, proteins, metabolites, and phenotypes should be considered simultaneously to unravel the complex molecular circuitry that operates within the cell. A complete elucidation of the genotype-phenotype map does not seem feasible unless all possible causal variables can be included in the network-inference methodology. One has to take a global perspective on life processes instead of focusing on individual components of the system. The network approach connecting all these subdisciplines indicates the emergence of a system quantitative genetics [94].
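
The idea that eQTLs "can be mapped like classical QTLs" can be illustrated with a minimal sketch of single-marker regression, in which the expression level of one gene is treated as the trait and regressed on the genotype at each marker in turn. This is a deliberately simplified stand-in for interval mapping. The marker names, genotype coding (0 and 1 for the two parental alleles in a RIL-like population), and expression values are all hypothetical, and monomorphic markers are assumed to have been removed beforehand.

```python
def marker_regression(genotypes, expression):
    """Regress expression on genotype at one marker.

    genotypes: list of 0/1 allele codes, one per line (hypothetical coding).
    expression: list of expression values for the same lines.
    Returns (slope, r_squared).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or trait shows no variation")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def scan_markers(marker_genotypes, expression):
    """Run the regression at every marker and report the best-fitting one."""
    results = {
        name: marker_regression(genos, expression)
        for name, genos in marker_genotypes.items()
    }
    best = max(results, key=lambda name: results[name][1])
    return best, results

# Hypothetical data: four lines, two markers, one transcript.
markers = {"m1": [0, 0, 1, 1], "m2": [0, 1, 0, 1]}
transcript = [1.0, 1.0, 3.0, 3.0]
best_marker, table = scan_markers(markers, transcript)
print(best_marker, table[best_marker])
```

In a real genetical genomics experiment this scan would be repeated for every transcript on the array, with significance thresholds adjusted for the very large number of tests; the sketch omits that step.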

7.6. Association Mapping and Alternatives

Association mapping provides another route to identifying QTLs that have effects across a broader spectrum of germplasm, provided that false positives caused by population structure can be minimized [95]. In addition, QTL mapping in biparental populations reveals only a slice of the genetic architecture of a trait because only alleles that differ between the two parental lines will segregate. Therefore, more comprehensive analyses of genetic architecture require consideration of multiple populations that represent a larger sample of the standing genetic variation in the species [95]. An important genetic resource developed in recent years is the nested association mapping (NAM) population. NAM is a novel approach for mapping genes underlying complex traits, in which the statistical power of QTL mapping is combined with the high (potentially gene-level) chromosomal resolution of association mapping, and it has been adopted in maize [99]. Although sufficient diversity must be present in each association mapping panel, too much phenotypic diversity (or poor adaptation to a specific growing environment) may make a panel difficult to phenotype in an association study. Thus, more region-specific association mapping panels, containing germplasm better suited to specific growing regions, may need to be created. One such panel is being created at this university.
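
How population structure produces false positives can be shown with a small, simplified sketch. It compares the naive marker-trait correlation across a whole panel with the correlation obtained after centering both genotype and phenotype within each subpopulation. This within-group centering is only an illustrative stand-in for the structure and kinship corrections used in practice, and the subpopulation labels and data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def center_within_groups(values, groups):
    """Subtract each subpopulation's mean from its members' values."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical panel: subpopulation B has both a higher allele
# frequency and a higher trait mean, but the allele has no effect
# on the trait within either subpopulation.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
genotype = [0, 0, 0, 1, 1, 1, 1, 0]
phenotype = [1.0, 2.0, 3.0, 2.0, 5.0, 6.0, 7.0, 6.0]

naive_r = correlation(genotype, phenotype)
adjusted_r = correlation(
    center_within_groups(genotype, groups),
    center_within_groups(phenotype, groups),
)
print(naive_r, adjusted_r)
```

In this constructed example the naive correlation is clearly positive, while the within-subpopulation correlation is zero, so the apparent association arises entirely from the difference between subpopulations.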

7.7. Improved Databases

There is a great need to expand bioinformatic infrastructure for managing, curating, and annotating the cotton genomic sequences that will be generated in the near future. The cotton genome sequence and functional genomics database of the future should be able to host and manage cotton information resources using community-accepted genome annotation, nomenclature, and gene ontology. Some existing databases may be upgraded to effectively handle a large amount of data flow and community requests, but additional resources will be sought to support key bioinformatic needs.

8. Conclusion

Significant strides have been made, particularly in understanding the phenotypic and molecular diversity of cotton germplasm and in identifying QTLs linked to fiber productivity and quality. Yet the application of molecular marker-assisted breeding tools to accelerate gains in cotton productivity has barely begun, and there is vast potential and need to expand the scope and impact of such innovative breeding programs. Progress in this direction will be further enhanced by bringing in the information generated through “omics” studies. Further, as discussed above, innovative strategies, resource pooling, and capacity building to deploy marker-assisted breeding in cotton will eventually lead to the development of cotton cultivars with improved productivity and quality.

Acknowledgments

This work is supported by the Department of Biotechnology, Ministry of Science and Technology, Government of India under Program Support for Research and Development in Agricultural Biotechnology at TNAU. The authors sincerely apologize for not citing many enlightening papers due to space limitations.