International Journal of Genomics The latest articles from Hindawi Publishing Corporation © 2016, Hindawi Publishing Corporation. All rights reserved. Integration of HIV in the Human Genome: Which Sites Are Preferential? A Genetic and Statistical Assessment Thu, 12 May 2016 14:21:30 +0000 Chromosomal fragile sites (FSs) are loci where gaps and breaks may occur and are preferential integration targets for some viruses, for example, Hepatitis B, Epstein-Barr virus, HPV16, HPV18, and MLV vectors. However, the integration of the human immunodeficiency virus (HIV) in Giemsa bands and in FSs is not yet completely clear. This study aimed to assess the integration preferences of HIV in FSs and in Giemsa bands using an in silico study. HIV integration positions from Jurkat cells were used and two nonparametric tests were applied to compare HIV integration in dark versus light bands and in FS versus non-FS (NFSs). The results show that light bands are preferential targets for integration of HIV-1 in Jurkat cells and also that it integrates with equal intensity in FSs and in NFSs. The data indicates that HIV displays different preferences for FSs compared to other viruses. The aim was to develop and apply an approach to predict the conditions and constraints of HIV insertion in the human genome which seems to adequately complement empirical data. Juliana Gonçalves, Elsa Moreira, Inês J. Sequeira, António S. Rodrigues, José Rueff, and Aldina Brás Copyright © 2016 Juliana Gonçalves et al. All rights reserved. A Quantitative Genomic Approach for Analysis of Fitness and Stress Related Traits in a Drosophila melanogaster Model Population Tue, 19 Apr 2016 13:03:34 +0000 The ability of natural populations to withstand environmental stresses relies partly on their adaptive ability. 
In this study, we used a subset of the Drosophila Genetic Reference Panel, a population of inbred, genome-sequenced lines derived from a natural population of Drosophila melanogaster, to investigate whether this population harbors genetic variation for a set of stress resistance and life history traits. Using a genomic approach, we found substantial genetic variation for metabolic rate, heat stress resistance, expression of a major heat shock protein, and egg-to-adult viability investigated at a benign and a higher stressful temperature. This suggests that these traits will be able to evolve. In addition, we outline an approach to conduct pathway associations based on genomic linear models, which has potential to identify adaptive genes and pathways, and therefore can be a valuable tool in conservation genomics. Palle Duun Rohde, Kristian Krag, Volker Loeschcke, Johannes Overgaard, Peter Sørensen, and Torsten Nygaard Kristensen Copyright © 2016 Palle Duun Rohde et al. All rights reserved. The Microbiome of Animals: Implications for Conservation Biology Mon, 18 Apr 2016 13:40:33 +0000 In recent years the human microbiome has become a growing area of research and it is becoming clear that the microbiome of humans plays an important role for human health. Extensive research is now going into cataloging and annotating the functional role of the human microbiome. The ability to explore and describe the microbiome of any species has become possible due to new methods for sequencing. These techniques allow comprehensive surveys of the composition of the microbiome of nonmodel organisms of which relatively little is known. Some attention has been paid to the microbiome of insect species including important vectors of pathogens of human and veterinary importance, agricultural pests, and model species. Together these studies suggest that the microbiome of insects is highly dependent on the environment, species, and populations and affects the fitness of species. 
These fitness effects can have important implications for the conservation and management of species and populations. Further, these results are important for our understanding of invasion of nonnative species, responses to pathogens, and responses to chemicals and global climate change in the present and future. Simon Bahrndorff, Tibebu Alemu, Temesgen Alemneh, and Jeppe Lund Nielsen Copyright © 2016 Simon Bahrndorff et al. All rights reserved. Transcriptomic Analysis of Resistant and Susceptible Bombyx mori Strains Following BmNPV Infection Provides Insights into the Antiviral Mechanisms Mon, 18 Apr 2016 07:33:11 +0000 Purpose. To decipher transcriptomic changes and related genes with potential functions against Bombyx mori nucleopolyhedrovirus infection and to increase the understanding of the enhanced virus resistance of silkworm on the transcriptomic level. Methods. We assembled and annotated transcriptomes of the Qiufeng (susceptible to infection) and QiufengN (resistant to infection) strains and performed comparative analysis in order to decipher transcriptomic changes and related genes with potential functions against BmNPV infection. Results. A total of 78,408 SNPs were identified in the Qiufeng strain of silkworm and 56,786 SNPs were identified in QiufengN strain. Besides, novel AS events were found in these 2 strains. In addition, 1,728 DEGs were identified in the QiufengN strain compared with Qiufeng strain. These DEGs were involved in GO terms related to membrane, metabolism, binding and catalytic activity, cellular processes, and organismal systems. The highest levels of gene representation were found in oxidative phosphorylation, phagosome, TCA cycle, arginine and proline metabolism, and pyruvate metabolism. Additionally, COG analysis indicated that DEGs were involved in “amino acid transport and metabolism” and “carbohydrate transport and metabolism.” Conclusion. 
We identified a series of major pathological changes in silkworm following infection and several functions were related to the antiviral mechanisms of silkworm. Gang Li, Heying Qian, Xufang Luo, Pingzhen Xu, Jianhua Yang, Mingzhu Liu, and Anying Xu Copyright © 2016 Gang Li et al. All rights reserved. The Whole Genome Assembly and Comparative Genomic Research of Thellungiella parvula (Extremophile Crucifer) Mitochondrion Mon, 11 Apr 2016 06:49:53 +0000 The complete nucleotide sequence of the mitochondrial (mt) genome of an extremophile species Thellungiella parvula (T. parvula) has been determined, with a length of 255,773 bp. The T. parvula mt genome is a circular sequence and contains 32 protein-coding genes, 19 tRNA genes, and three ribosomal RNA genes with an 11.5% coding sequence. The base composition of 27.5% A, 27.5% T, 22.7% C, and 22.3% G in descending order shows a slight bias of 55% AT. Fifty-three repeats were identified in the mitochondrial genome of T. parvula, including 24 direct repeats, 28 tandem repeats (TRs), and one palindromic repeat. Furthermore, a total of 199 perfect microsatellites have been mined with a high A/T content (83.1%) through simple sequence repeat (SSR) analysis and they were distributed unevenly within this mitochondrial genome. We also analyzed other plant mitochondrial genomes’ evolution in general, providing clues for the understanding of the evolution of organelle genomes in plants. Compared with other Brassicaceae species, T. parvula is related to Arabidopsis thaliana whose characters of low temperature resistance have been well documented. This study will provide important genetic tools for other Brassicaceae species research and improve yields of economically important plants. Xuelin Wang, Changwei Bi, Yiqing Xu, Suyun Wei, Xiaogang Dai, Tongming Yin, and Ning Ye Copyright © 2016 Xuelin Wang et al. All rights reserved. 
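The base composition figures quoted in the T. parvula abstract above (27.5% A, 27.5% T, 22.7% C, 22.3% G, hence 55% AT) follow from simple counting. The sketch below is only an illustration of that arithmetic, not the authors' pipeline; the function names `base_composition` and `at_content` and the toy sequences are hypothetical, and only the four percentages come from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G, and T in seq (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition):
    """Sum the A and T percentages of a composition dictionary."""
    return composition["A"] + composition["T"]


# Percentages as reported in the abstract (not recomputed from the genome).
reported = {"A": 27.5, "T": 27.5, "C": 22.7, "G": 22.3}
reported_at = at_content(reported)      # A + T share implied by the reported values
reported_sum = sum(reported.values())   # should account for the whole genome
```

With the reported values, the A and T shares add up to the stated 55% AT, and the four percentages together account for the full sequence. Applying `base_composition` to the actual mitochondrial sequence would be needed to reproduce the figures from data.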
Differential Methylation of Genomic Regions Associated with Heteroblasty Detected by M&M Algorithm in the Nonmodel Species Eucalyptus globulus Labill. Wed, 30 Mar 2016 16:17:26 +0000 Epigenetic regulation plays important biological roles in plants, including timing of flowering and endosperm development. Little is known about the mechanisms controlling heterochrony (the change in the timing or rate of developmental events during ontogeny) in Eucalyptus globulus. DNA methylation has been proposed as a potential heterochrony regulatory mechanism in model species, but its role during the vegetative phase in E. globulus has not been explored. In order to investigate the molecular mechanisms governing heterochrony in E. globulus, we have developed a workflow aimed at generating high-resolution hypermethylome and hypomethylome maps that have been tested in two stages of the vegetative growth phase: juvenile (6-month leaves) and adult (30-month leaves). We used the M&M algorithm, a computational approach that integrates MeDIP-seq and MRE-seq data, to identify differentially methylated regions (DMRs). Thousands of DMRs between juvenile and adult leaves of E. globulus were found. Although further investigations are required to define the loci associated with heterochrony/heteroblasty that are regulated by DNA methylation, these results suggest that locus-specific methylation could be a major regulator of vegetative phase change. This information can support future conservation programs, for example, selecting the best methylomes for a determinate environment in a restoration project. Rodrigo Hasbún, Carolina Iturra, Soraya Bravo, Boris Rebolledo-Jaramillo, and Luis Valledor Copyright © 2016 Rodrigo Hasbún et al. All rights reserved. 
Transcriptome Analysis of Bovine Ovarian Follicles at Predeviation and Onset of Deviation Stages of a Follicular Wave Mon, 21 Mar 2016 09:20:12 +0000 For two libraries (PDF1 and ODF1) using Illumina sequencing 44,082,301 and 43,708,132 clean reads were obtained, respectively. After being mapped to the bovine RefSeq database, 15,533 genes were identified to be expressed in both types of follicles (cut-off RPKM > 0.5), of which 719 were highly expressed in bovine follicles (cut-off RPKM > 100). Furthermore, 83 genes were identified as being differentially expressed in ODF1 versus PDF1, where 42 genes were upregulated and 41 genes were downregulated. KEGG pathway analysis revealed two upregulated genes in ODF1 versus PDF1, CYP11A1, and CYP19A1, which are important genes in the steroid hormone biosynthesis pathway. This study represents the first investigation of transcriptome of bovine follicles at predeviation and onset of deviation stages and provides a foundation for future investigation of the regulatory mechanisms involved in follicular development in cattle. Pengfei Li, Jinzhu Meng, Wenzhong Liu, George W. Smith, Jianbo Yao, and Lihua Lyu Copyright © 2016 Pengfei Li et al. All rights reserved. The Use of Genomics in Conservation Management of the Endangered Visayan Warty Pig (Sus cebifrons) Wed, 16 Mar 2016 14:00:03 +0000 The list of threatened and endangered species is growing rapidly, due to various anthropogenic causes. Many endangered species are present in captivity and actively managed in breeding programs in which often little is known about the founder individuals. Recent developments in genetic research techniques have made it possible to sequence and study whole genomes. In this study we used the critically endangered Visayan warty pig (Sus cebifrons) as a case study to test the use of genomic information as a tool in conservation management. Two captive populations of S. cebifrons exist, which originated from two different Philippine islands. 
We found some evidence for a recent split between the two island populations; however, all individuals that were sequenced show a similar demographic history. Evidence for both past and recent inbreeding indicated that the founders were at least to some extent related. Together with this, the low level of nucleotide diversity compared to other Sus species potentially poses a threat to the viability of the captive populations. In conclusion, genomic techniques answered some important questions about this critically endangered mammal and can be a valuable toolset to inform future conservation management in other species as well. Rascha J. M. Nuijten, Mirte Bosse, Richard P. M. A. Crooijmans, Ole Madsen, Willem Schaftenaar, Oliver A. Ryder, Martien A. M. Groenen, and Hendrik-Jan Megens Copyright © 2016 Rascha J. M. Nuijten et al. All rights reserved. Use of Posttranscription Gene Silencing in Squash to Induce Resistance against the Egyptian Isolate of the Squash Leaf Curl Virus Sun, 13 Mar 2016 13:42:56 +0000 Squash leaf curl virus (SqLCV) is a bipartite begomovirus affecting squash plants. It is transmitted by the whitefly Bemisia tabaci biotype B, causing severe leaf curling, vein banding, and mottling, ending in stunting. In this study, a full-length genomic clone of the Egyptian SqLCV was isolated, and posttranscriptional gene silencing (PTGS) was induced to develop virus resistance. The Noubaria SqLCV has more than 95% homology with the Jordan, Israel, Lebanon, Palestine, and Cairo isolates. Two gene fragments from SqLCV were introduced in sense and antisense orientations using the pFGC5049 vector to be expressed as hairpin RNA. The first fragment was 348 bp from the replication associated protein gene (Rep). The second fragment was 879 bp representing the full sequence of the movement protein gene (BC1). 
Real-time PCR showed 97% silencing for the Rep/TrAP construct, which prevented the appearance of viral symptoms in most tested plants up to two months after infection. The construct containing the BC1 gene produced a 4.6-fold reduction in the accumulation of viral genome expression in the real-time PCR results, corresponding to a silencing of 79%, which had a positive effect on symptom development in most tested plants. Omnia Taha, Inas Farouk, Abdelhadi Abdallah, and Naglaa A. Abdallah Copyright © 2016 Omnia Taha et al. All rights reserved. Transcriptome Profile of the Asian Giant Hornet (Vespa mandarinia) Using Illumina HiSeq 4000 Sequencing: De Novo Assembly, Functional Annotation, and Discovery of SSR Markers Sun, 10 Jan 2016 12:07:16 +0000 Vespa mandarinia, found in the forests of East Asia, including Korea, occupies the highest rank in the arthropod food web within its geographical range. It serves as a source of nutrition in the form of Vespa amino acid mixture and is listed as a threatened species, although no conservation measures have been implemented. Here, we performed de novo assembly of the V. mandarinia transcriptome by Illumina HiSeq 4000 sequencing. Over 60 million raw reads and 59,184,811 clean reads were obtained. After assembly, a total of 66,837 unigenes were clustered, 40,887, 44,455, and 22,390 of which showed homologous matches against the PANM, Unigene, and KOG databases, respectively. A total of 15,675 unigenes were assigned to Gene Ontology terms, and 5,132 unigenes were mapped to 115 KEGG pathways. The zinc finger domain (C2H2-like), serine/threonine/dual specificity protein kinase domain, and RNA recognition motif domain were among the top InterProScan domains predicted for V. mandarinia sequences. Among the unigenes, we identified 534,922 cDNA simple sequence repeats as potential markers. This is the first transcriptomic analysis of the wasp V. mandarinia using Illumina HiSeq 4000. 
The obtained datasets should promote the search for new genes to understand the physiological attributes of this wasp. Bharat Bhusan Patnaik, So Young Park, Se Won Kang, Hee-Ju Hwang, Tae Hun Wang, Eun Bi Park, Jong Min Chung, Dae Kwon Song, Changmu Kim, Soonok Kim, Jae Bong Lee, Heon Cheon Jeong, Hong Seog Park, Yeon Soo Han, and Yong Seok Lee Copyright © 2016 Bharat Bhusan Patnaik et al. All rights reserved. Plant Comparative and Functional Genomics Thu, 31 Dec 2015 09:07:35 +0000 Xiaohan Yang, Jim Leebens-Mack, Feng Chen, and Yanbin Yin Copyright © 2015 Xiaohan Yang et al. All rights reserved. Quantification and Gene Expression Analysis of Histone Deacetylases in Common Bean during Rust Fungal Inoculation Mon, 28 Dec 2015 09:33:22 +0000 Histone deacetylases (HDACs) play an important role in plant growth, development, and defense processes and are one of the primary causes of epigenetic modifications in a genome. There was only one study reported on epigenetic modifications of the important legume crop, common bean, and its interaction with the fungal rust pathogen Uromyces appendiculatus prior to this project. We measured the total active HDACs levels in leaf tissues and observed expression patterns for the selected HDAC genes at 0, 12, and 84 hours after inoculation in mock inoculated and inoculated plants. Colorimetric analysis showed that the total amount of HDACs present in the leaf tissue decreased at 12 hours in inoculated plants compared to mock inoculated control plants. Gene expression analyses indicated that the expression pattern of gene PvSRT1 is similar to the trend of total active HDACs in this time course experiment. Gene PvHDA6 showed increased expression in the inoculated plants during the time points measured. This is one of the first attempts to study expression levels of HDACs in economically important legumes in the context of plant pathogen interactions. 
Findings from our study will be helpful to understand trends of total active HDACs and expression patterns of these genes under study during biotic stress. Kalpalatha Melmaiee, Venu (Kal) Kalavacharla, Adrianne Brown, Antonette Todd, Yaqoob Thurston, and Sathya Elavarthi Copyright © 2015 Kalpalatha Melmaiee et al. All rights reserved. Practical Calling Approach for Exome Array-Based Genome-Wide Association Studies in Korean Population Sun, 27 Dec 2015 13:10:07 +0000 Exome-based genotyping arrays are cost-effective and have recently been used as alternative platforms to whole-exome sequencing. However, the automated clustering algorithm in an exome array has a genotype calling problem in accuracy for identifying rare and low-frequency variants. To address these shortcomings, we present a practical approach for accurate genotype calling using the Illumina Infinium HumanExome BeadChip. We present comparison results and a statistical summary of our genotype data sets. Our data set comprises 14,647 Korean samples. To solve the limitation of automated clustering, we performed manual genotype clustering for the targeted identification of 46,076 variants that were identified using GenomeStudio software. To evaluate the effects of applying custom cluster files, we tested cluster files using 804 independent Korean samples and the same platform. Our study firstly suggests practical guidelines for exome chip quality control in Asian populations and provides valuable insight into an association study using exome chip. Tae-Joon Park, Lyong Heo, Sanghoon Moon, Young Jin Kim, Ji Hee Oh, Sohee Han, and Bong-Jo Kim Copyright © 2015 Tae-Joon Park et al. All rights reserved. 
Genome-Wide Identification and Characterization of the LRR-RLK Gene Family in Two Vernicia Species Sun, 13 Dec 2015 09:21:36 +0000 Leucine-rich repeat receptor-like kinases (LRR-RLKs) make up the largest group of RLKs in plants and play important roles in many key biological processes such as pathogen response and signal transduction. To date, most studies on LRR-RLKs have been conducted on model plants. Here, we identified 236 and 230 LRR-RLKs in two industrial oil-producing trees: Vernicia fordii and Vernicia montana, respectively. Sequence alignment analyses showed that the homology of the RLK domain (23.81%) was greater than that of the LRR domain (9.51%) among the Vf/VmLRR-RLKs. The conserved motif of the LRR domain in Vf/VmLRR-RLKs matched well the known plant LRR consensus sequence but differed at the third last amino acid (W or L). Phylogenetic analysis revealed that Vf/VmLRR-RLKs were grouped into 16 subclades. We characterized the expression profiles of Vf/VmLRR-RLKs in various tissue types including root, leaf, petal, and kernel. Further investigation revealed that Vf/VmLRR-RLK orthologous genes mainly showed similar expression patterns in response to tree wilt disease, except 4 pairs of Vf/VmLRR-RLKs that showed opposite expression trends. These results represent an extensive evaluation of LRR-RLKs in two industrial oil trees and will be useful for further functional studies on these proteins. Huiping Zhu, Yangdong Wang, Hengfu Yin, Ming Gao, Qiyan Zhang, and Yicun Chen Copyright © 2015 Huiping Zhu et al. All rights reserved. Divergence of the bZIP Gene Family in Strawberry, Peach, and Apple Suggests Multiple Modes of Gene Evolution after Duplication Mon, 07 Dec 2015 14:09:01 +0000 The basic leucine zipper (bZIP) transcription factors are the most diverse members of dimerizing transcription factors. 
In the present study, 50, 116, and 47 bZIP genes were identified in Malus domestica (apple), Prunus persica (peach), and Fragaria vesca (strawberry), respectively. Species-specific duplication was the main contributor to the large number of bZIPs observed in apple. After WGD in the apple genome, orthologous bZIP genes corresponding to strawberry on duplicated regions in the apple genome were retained. However, in the peach ancestor, these syntenic regions were quickly lost or deleted. Positive selection may have contributed to the expansion of clade S as an adaptation to developmental and environmental stresses. In addition, purifying selection was mainly responsible for bZIP sequence-specific DNA binding. The analysis of orthologous pairs between chromosomes indicates that these orthologs derived from one gene duplication located on one of the nine ancient chromosomes in the Rosaceae. The comparative analysis of bZIP genes in three species provides information on the evolutionary fate of bZIP genes in apple and peach after they diverged from strawberry. Xiao-Long Wang, Yan Zhong, Zong-Ming Cheng, and Jin-Song Xiong Copyright © 2015 Xiao-Long Wang et al. All rights reserved. De Novo Assembly of the Pea (Pisum sativum L.) Nodule Transcriptome Tue, 24 Nov 2015 14:22:42 +0000 The large size and complexity of the garden pea (Pisum sativum L.) genome hamper its sequencing and the discovery of pea gene resources. Although transcriptome sequencing provides extensive information about expressed genes, some tissue-specific transcripts can only be identified from particular organs under appropriate conditions. In this study, we performed RNA sequencing of polyadenylated transcripts from young pea nodules and root tips on an Illumina GAIIx system, followed by de novo transcriptome assembly using the Trinity program. We obtained more than 58,000 and 37,000 contigs from “Nodules” and “Root Tips” assemblies, respectively. 
The quality of the assemblies was assessed by comparison with pea expressed sequence tags and transcriptome sequencing project data available from the NCBI website. The “Nodules” assembly was compared with the “Root Tips” assembly and with pea transcriptome sequencing data from projects indicating tissue specificity. As a result, approximately 13,000 nodule-specific contigs were found and annotated by alignment to known plant protein-coding sequences and by Gene Ontology searching. Of these, 581 sequences were found to possess full CDSs and could thus be considered as novel nodule-specific transcripts of pea. The information about pea nodule-specific gene sequences can be applied for gene-based marker creation, polymorphism studies, and real-time PCR. Vladimir A. Zhukov, Alexander I. Zhernakov, Olga A. Kulaeva, Nikita I. Ershov, Alexey Y. Borisov, and Igor A. Tikhonovich Copyright © 2015 Vladimir A. Zhukov et al. All rights reserved. Expressed Sequence Tags Analysis and Design of Simple Sequence Repeats Markers from a Full-Length cDNA Library in Perilla frutescens (L.) Thu, 19 Nov 2015 13:06:04 +0000 Perilla frutescens is valuable as a medicinal plant as well as a natural medicine and functional food. However, comparative genomics analyses of P. frutescens are limited due to a lack of gene annotations and characterization. A full-length cDNA library from P. frutescens leaves was constructed to identify functional gene clusters and probable EST-SSR markers via analysis of 1,056 expressed sequence tags. Unigene assembly was performed using basic local alignment search tool (BLAST) homology searches and annotated Gene Ontology (GO). A total of 18 simple sequence repeats (SSRs) were designed as primer pairs. This study is the first to report comparative genomics and EST-SSR markers from P. frutescens, which will help gene discovery and provide an important source for functional genomics and molecular genetic research in this interesting medicinal plant. 
Eun Soo Seong, Ji Hye Yoo, Jae Hoo Choi, Chang Heum Kim, Mi Ran Jeon, Byeong Ju Kang, Jae Geun Lee, Seon Kang Choi, Bimal Kumar Ghimire, and Chang Yeon Yu Copyright © 2015 Eun Soo Seong et al. All rights reserved. De Novo Transcriptome Sequencing of the Orange-Fleshed Sweet Potato and Analysis of Differentially Expressed Genes Related to Carotenoid Biosynthesis Sun, 15 Nov 2015 13:07:50 +0000 Sweet potato, Ipomoea batatas (L.) Lam., is an important food crop worldwide. The orange-fleshed sweet potato is considered to be an important source of beta-carotene. In this study, the transcriptome profiles of an orange-fleshed sweet potato cultivar “Weiduoli” and its mutant “HVB-3” with high carotenoid content were determined by using the high-throughput sequencing technology. A total of 13,767,387 and 9,837,090 high-quality reads were produced from Weiduoli and HVB-3, respectively. These reads were de novo assembled into 58,277 transcripts and 35,909 unigenes with an average length of 596 bp and 533 bp, respectively. In all, 874 differentially expressed genes (DEGs) were obtained between Weiduoli and HVB-3, 401 of which were upregulated and 473 were downregulated in HVB-3 compared to Weiduoli. Of the 697 DEGs annotated, 316 DEGs had GO terms and 62 DEGs were mapped onto 50 pathways. The 22 DEGs and 31 transcription factors involved in carotenoid biosynthesis were identified between Weiduoli and HVB-3. In addition, 1,725 SSR markers were detected. This study provides the genomic resources for discovering the genes involved in carotenoid biosynthesis of sweet potato and other plants. Ruijie Li, Hong Zhai, Chen Kang, Degao Liu, Shaozhen He, and Qingchang Liu Copyright © 2015 Ruijie Li et al. All rights reserved. 
Quantitative Shotgun Proteomics Analysis of Rice Anther Proteins after Exposure to High Temperature Thu, 05 Nov 2015 12:45:31 +0000 In rice, the stage of development most sensitive to high temperature stress is flowering, and exposure at this stage can result in spikelet sterility, thereby leading to significant yield losses. In this study, protein expression patterns of rice anthers from Dianxi4, a high temperature tolerant Japonica rice variety, were compared between samples exposed to high temperature and those grown in natural field conditions in Korea. Shotgun proteomics analysis of three replicate control and high-temperature-treated samples identified 3,266 nonredundant rice anther proteins (false discovery rate < 0.01). We found that high levels of ATP synthase, cupin domain-containing proteins, and pollen allergen proteins were present in rice anthers. Comparative analyses of 1,944 reproducibly expressed proteins identified 139 differentially expressed proteins, with 95 increased and 44 decreased in response to high temperature conditions. Heat shock, DnaK family, and chaperone proteins showed highly increased expression, suggesting that the high temperature tolerance of Dianxi4 is achieved by stabilization of proteins in pollen cells. Trehalose synthase was also highly increased after heat treatment, suggesting a possible role for trehalose in preventing protein denaturation through desiccation. Mijeong Kim, Hijin Kim, Wondo Lee, Yoonjung Lee, Soon-Wook Kwon, and Joohyun Lee Copyright © 2015 Mijeong Kim et al. All rights reserved. Identification of Immune Related LRR-Containing Genes in Maize (Zea mays L.) by Genome-Wide Sequence Analysis Thu, 22 Oct 2015 07:52:58 +0000 A large number of immune receptors consist of nucleotide binding site-leucine rich repeat (NBS-LRR) proteins and leucine rich repeat-receptor-like kinases (LRR-RLK) that play a crucial role in plant disease resistance. 
Although many NBS-LRR genes have been previously identified in Zea mays, there are no reports on identifying NBS-LRR genes encoded in the N-terminal Toll/interleukin-1 receptor (TIR) motif and identifying genome-wide LRR-RLK genes. In the present study, 151 NBS-LRR genes and 226 LRR-RLK genes were identified after performing bioinformatics analysis of the entire maize genome. Of these identified genes, 64 NBS-LRR genes and four TIR-NBS-LRR genes were identified for the first time. The NBS-LRR genes are unevenly distributed on each chromosome with gene clusters located at the distal end of each chromosome, while LRR-RLK genes have a random chromosomal distribution with more paired genes. Additionally, six LRR-RLK/RLPs including FLS2, PSY1R, PSKR1, BIR1, SERK3, and Cf5 were characterized in Zea mays for the first time. Their predicted amino acid sequences have protein structures similar to those of their respective homologues in other plants, indicating that these maize LRR-RLK/RLPs have the same functions as their homologues and act as immune receptors. The identified gene sequences would assist in the study of their functions in maize. Wei Song, Baoqiang Wang, Xinghua Li, Jianfen Wei, Ling Chen, Dongmin Zhang, Wenying Zhang, and Ronggai Li Copyright © 2015 Wei Song et al. All rights reserved. Novel Computational Technologies for Next-Generation Sequencing Data Analysis and Their Applications Tue, 20 Oct 2015 13:51:29 +0000 Chuan Yi Tang, Che-Lun Hung, Huiru Zheng, Chun-Yuan Lin, and Hai Jiang Copyright © 2015 Chuan Yi Tang et al. All rights reserved. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency Mon, 19 Oct 2015 11:52:56 +0000 Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. 
We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. Rodrigo Aniceto, Rene Xavier, Valeria Guimarães, Fernanda Hondo, Maristela Holanda, Maria Emilia Walter, and Sérgio Lifschitz Copyright © 2015 Rodrigo Aniceto et al. All rights reserved. RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes Mon, 19 Oct 2015 09:10:16 +0000 Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate new, modified reference sequence that is closer to the actual sequenced genome and has a full coverage. 
In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data. We have made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software. Krisztian Buza, Bartek Wilczynski, and Norbert Dojer Copyright © 2015 Krisztian Buza et al. All rights reserved. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System Mon, 19 Oct 2015 07:49:36 +0000 The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU only to perform the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search using the intratask parallelization technique based on a CPU-GPU collaborative system. Before the SW computations are performed on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively. Yu Liu, Yang Hong, Chun-Yuan Lin, and Che-Lun Hung Copyright © 2015 Yu Liu et al. All rights reserved. A New Binning Method for Metagenomics by One-Dimensional Cellular Automata Sun, 18 Oct 2015 12:20:46 +0000 Increasingly advanced and inexpensive next-generation sequencing (NGS) technologies allow us to extract vast sequence data from a sample containing multiple species. 
Characterizing the taxonomic diversity of planet-size data plays an important role in metagenomic studies, and a crucial step in such studies is the binning process, which groups sequence reads from similar species or taxonomic classes. Metagenomic binning remains a challenging task because of not only the various kinds of read noise but also the tremendous data volume. In this work, we propose an unsupervised binning method for NGS reads based on the one-dimensional cellular automaton (1D-CA). Our binning method helps reduce memory usage because 1D-CA requires only linear space. Experiments on a synthetic dataset show that our method helps identify species of lower abundance compared to the proposed tool. Ying-Chih Lin Copyright © 2015 Ying-Chih Lin. All rights reserved. SimpLiFiCPM: A Simple and Lightweight Filter-Based Algorithm for Circular Pattern Matching Sun, 18 Oct 2015 12:14:25 +0000 This paper deals with the circular pattern matching (CPM) problem, which appears as an interesting problem in many biological contexts. CPM consists of finding all occurrences of the rotations of a pattern in a text. In this paper, we present SimpLiFiCPM (pronounced “Simplify CPM”), a simple and lightweight filter-based algorithm to solve the problem. We compare our algorithm with the state-of-the-art algorithms and the results are found to be excellent. Much of the speed of our algorithm comes from the fact that our filters are effective but extremely simple and lightweight. Md. Aashikur Rahman Azim, Costas S. Iliopoulos, M. Sohel Rahman, and M. Samiruzzaman Copyright © 2015 Md. Aashikur Rahman Azim et al. All rights reserved. Genome-Wide Identification of Genes Probably Relevant to the Uniqueness of Tea Plant (Camellia sinensis) and Its Cultivars Mon, 12 Oct 2015 14:16:05 +0000 Tea (Camellia sinensis) is a popular beverage all over the world and a number of studies have focused on the genetic uniqueness of tea and its cultivars. 
However, the molecular mechanisms underlying these phenomena are largely undefined. In this report, based on expression data available from public databases, we performed a series of analyses to identify genes probably relevant to the uniqueness of C. sinensis and two of its cultivars (LJ43 and ZH2). Evolutionary analyses showed that the evolutionary rates of genes involved in the pathways were not significantly different among C. sinensis, C. oleifera, and C. azalea. Interestingly, a number of gene families, including genes involved in the pathways synthesizing iconic secondary metabolites of the tea plant, were significantly upregulated in C. sinensis (LJ43) when compared to C. azalea, and this may partially explain its higher content of flavonoids, theanine, and caffeine. Further investigation showed that nonsynonymous mutations may partially contribute to the differences between the two cultivars of C. sinensis, such as the chlorina and higher contents of amino acids in ZH2. The genes identified as probably relevant to the uniqueness of C. sinensis and its cultivars should be good candidates for subsequent functional analyses and marker-assisted breeding. Yan Wei, Wang Jing, Zhou Youxiang, Zhao Mingming, Gong Yan, Ding Hua, Peng Lijun, and Hu Dingjin Copyright © 2015 Yan Wei et al. All rights reserved. Analysis of Polygala tenuifolia Transcriptome and Description of Secondary Metabolite Biosynthetic Pathways by Illumina Sequencing Mon, 12 Oct 2015 08:43:35 +0000 Radix polygalae, the dried roots of Polygala tenuifolia and P. sibirica, is one of the most well-known traditional Chinese medicinal plants. Radix polygalae contains various saponins, xanthones, and oligosaccharide esters and these compounds are responsible for several pharmacological properties. To provide basic breeding information, enhance molecular biological analysis, and determine secondary metabolite biosynthetic pathways of P. 
tenuifolia, we applied Illumina sequencing technology and de novo assembly. We also applied this technique to gain an overview of the P. tenuifolia transcriptome from samples of different ages. Using Illumina sequencing, approximately 67.2% of unique sequences were annotated by basic local alignment search tool (BLAST) similarity searches against public sequence databases. We classified the annotated unigenes by using Nr, Nt, GO, COG, and KEGG databases compared with NCBI. We also obtained many candidate CYP450s and UGTs by analyzing genes in the secondary metabolite biosynthetic pathways, including the putative terpenoid backbone and phenylpropanoid biosynthesis pathways. This transcriptome sequencing may improve future genetic and genomic studies of the molecular mechanisms associated with the chemical composition of P. tenuifolia. Genes involved in the enrichment of secondary metabolite biosynthesis-related pathways could enhance the potential applications of P. tenuifolia in pharmaceutical industries. Hongling Tian, Xiaoshuang Xu, Fusheng Zhang, Yaoqin Wang, Shuhong Guo, Xuemei Qin, and Guanhua Du Copyright © 2015 Hongling Tian et al. All rights reserved. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction Sun, 11 Oct 2015 13:23:54 +0000 Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), which combines output from two PPI prediction tools, GO2PPI and Phyloprof, using the Random Forests algorithm. 
The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross-species PPCM could achieve competitive and even better prediction accuracy compared to the single-species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using the Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species. Jianzhuang Yao, Hong Guo, and Xiaohan Yang Copyright © 2015 Jianzhuang Yao et al. All rights reserved. Significant Microsynteny with New Evolutionary Highlights Is Detected through Comparative Genomic Sequence Analysis of Maize CCCH IX Gene Subfamily Sun, 11 Oct 2015 10:28:29 +0000 CCCH zinc finger proteins, which are characterized by the presence of three cysteine residues and one histidine residue, play important roles in RNA processing in plants. Subfamily IX CCCH proteins were recently shown to function in stress tolerance. In this study, we analyzed CCCH IX genes in Zea mays, Oryza sativa, and Sorghum bicolor. These genes, which are almost intronless, were divided into four groups based on phylogenetic analysis. Microsynteny analysis revealed microsynteny in regions of some gene pairs, indicating that segmental duplication has played an important role in the expansion of this gene family. In addition, we calculated the dates of duplication by Ks analysis, finding that all microsynteny blocks were formed after the monocot-eudicot divergence. We found that deletions, multiplications, and inversions occurred over the course of evolution. 
Moreover, the Ka/Ks ratios indicated that the genes in these three grass species are under strong purifying selection. Finally, we investigated the evolutionary patterns of some gene pairs conferring tolerance to abiotic stress, laying the foundation for future functional studies of these transcription factors. Wei-Jun Chen, Yang Zhao, Xiao-Jian Peng, Qing Dong, Jing Jin, Wei Zhou, Bei-Jiu Cheng, and Qing Ma Copyright © 2015 Wei-Jun Chen et al. All rights reserved.
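The Ks dating and Ka/Ks reasoning in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the tolerance band around 1, and the substitution rate passed by the caller are all assumptions made for illustration. It relies only on the standard conventions that Ka/Ks below 1 suggests purifying selection, a ratio near 1 suggests neutrality, and a ratio above 1 suggests positive selection, and that a duplication date is commonly approximated as T = Ks / (2 * rate).

```python
def classify_selection(ka: float, ks: float, tol: float = 0.05) -> str:
    """Interpret a Ka/Ks ratio for one gene pair.

    ka, ks: nonsynonymous and synonymous substitution rates (hypothetical inputs).
    tol: assumed width of the band around 1 treated as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


def duplication_time(ks: float, rate: float) -> float:
    """Approximate years since duplication as T = Ks / (2 * rate).

    rate: synonymous substitutions per site per year; the caller must
    supply a value appropriate to the lineage (an assumption here).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical gene pair, not taken from the study.
    print(classify_selection(ka=0.05, ks=0.5))
```

A ratio well below 1, as in the hypothetical pair above, is the pattern the abstract describes as strong purifying selection.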