Research Article

Leveraging Comparative Genomics to Identify and Functionally Characterize Genes Associated with Sperm Phenotypes in Python bivittatus (Burmese Python)

Figure 1

Overview of the comparative genomics approach used to identify and characterize python genes associated with sperm phenotypes. The set of mouse protein-coding genes was used to select the subset of mouse genes for which phenotype annotation was available. From all mouse genes with phenotype annotation, we identified the protein-coding genes associated exclusively with sperm phenotypes. The protein sequences of these mouse genes were then used to identify the corresponding protein-coding sequences in Python bivittatus (i.e., orthologous genes). For each protein-coding sequence shared between mouse and python, a pairwise protein sequence alignment was generated, and sequence identity and alignment significance were calculated. Gene Ontology (GO) annotation provides gene-level information about the biological processes, cellular locations, and molecular functions of gene products. The existing GO annotation for each mouse gene was transferred to its python ortholog. The set of python genes was then tested for statistically significant enrichment of GO terms across the three GO categories (biological process, cellular component, and molecular function). The resulting set of annotated python genes provides additional biological, physiological, cellular, and molecular information about the roles of these genes in sperm production and function. The annotation also serves as an independent line of evidence that these python genes are truly associated with sperm biology. Finally, cellular pathways associated with the sperm-associated gene set were identified, along with human disorders caused by defects in the human orthologs of these genes.
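The annotation-transfer and enrichment steps described in the caption can be illustrated with a minimal sketch. The caption does not name the statistical test, so this sketch assumes a one-sided hypergeometric test, a common choice for GO enrichment. It also assumes a precomputed one-to-one mouse-to-python ortholog map and precomputed background term counts. All function names, gene identifiers, and term labels below are hypothetical placeholders, not the authors' code or data.

```python
from math import comb


def transfer_go(orthologs, mouse_go):
    """Copy each mouse gene's GO terms onto its python ortholog.

    Assumes `orthologs` is a one-to-one mapping {mouse_gene: python_gene}
    (hypothetical). A python gene whose mouse ortholog has no GO terms
    receives an empty set.
    """
    python_go = {}
    for mouse_gene, python_gene in orthologs.items():
        python_go[python_gene] = set(mouse_go.get(mouse_gene, ()))
    return python_go


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of drawing k or more annotated genes.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def go_enrichment(study_go, background_counts, background_size):
    """Return {term: uncorrected p-value} for every GO term seen in the study set.

    `background_counts[term]` is the (assumed) number of background genes
    annotated with `term`; `background_size` is the total background size.
    """
    n = len(study_go)
    term_hits = {}
    for terms in study_go.values():
        for term in terms:
            term_hits[term] = term_hits.get(term, 0) + 1
    return {
        term: hypergeom_sf(k, background_size, background_counts[term], n)
        for term, k in term_hits.items()
    }


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only.
    orthologs = {"mouse_gene_1": "python_gene_1", "mouse_gene_2": "python_gene_2"}
    mouse_go = {"mouse_gene_1": {"term_A"}, "mouse_gene_2": {"term_A", "term_B"}}
    python_go = transfer_go(orthologs, mouse_go)
    pvals = go_enrichment(python_go, {"term_A": 5, "term_B": 40}, 1000)
    for term, p in sorted(pvals.items()):
        print(term, p)
```

The one-sided hypergeometric tail computed here is equivalent to a one-sided Fisher's exact test on a 2×2 table of annotated versus non-annotated genes in the study set and the background. The sketch reports uncorrected p-values. Because every term in the study set is tested, a real analysis would typically also apply a multiple-testing correction (for example, a false discovery rate adjustment) before calling a term significant.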