BioMed Research International: Bioinformatics The latest articles from Hindawi Publishing Corporation © 2015, Hindawi Publishing Corporation. All rights reserved. A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction Sun, 04 Oct 2015 11:09:01 +0000 The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among the properties relevant to BBB penetration, lipophilicity could enhance BBB penetration, while all the others are negatively correlated with it. Daqing Zhang, Jianfeng Xiao, Nannan Zhou, Mingyue Zheng, Xiaomin Luo, Hualiang Jiang, and Kaixian Chen Copyright © 2015 Daqing Zhang et al. All rights reserved. 
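The joint optimization described in the abstract above can be illustrated with a minimal sketch. It assumes a binary chromosome whose first bits form the feature mask and whose remaining bits encode the SVM parameters C and gamma on a log2 scale. The function names `decode` and `evolve`, the bit width, the parameter ranges, and the GA settings are all hypothetical choices made for illustration; the abstract does not specify the authors' encoding or operators.

```python
import random


def decode(chrom, n_features, bits=8, log2_c=(-5.0, 15.0), log2_gamma=(-15.0, 3.0)):
    """Split a bit-list chromosome into (feature mask, C, gamma).

    The ranges for log2(C) and log2(gamma) are illustrative assumptions.
    """
    def scale(bit_slice, lo, hi):
        # Map the unsigned integer encoded by the bits linearly onto [lo, hi].
        value = int("".join(str(b) for b in bit_slice), 2)
        return lo + (hi - lo) * value / (2 ** len(bit_slice) - 1)

    mask = chrom[:n_features]
    c = 2.0 ** scale(chrom[n_features:n_features + bits], *log2_c)
    gamma = 2.0 ** scale(chrom[n_features + bits:n_features + 2 * bits], *log2_gamma)
    return mask, c, gamma


def evolve(fitness, n_features, bits=8, pop_size=20, generations=30,
           mutation_rate=0.02, seed=0):
    """Simple GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    length = n_features + 2 * bits
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best chromosome over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

In practice the `fitness` callable would decode each chromosome, train an SVM regressor on the selected features with the decoded C and gamma, and return a cross-validated score. Because the mask and the kernel parameters live in one chromosome, every candidate is evaluated as a combination, which is the point the abstract makes about treating the two problems together.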
How to Use SNP_TATA_Comparator to Find a Significant Change in Gene Expression Caused by the Regulatory SNP of This Gene’s Promoter via a Change in Affinity of the TATA-Binding Protein for This Promoter Sun, 04 Oct 2015 07:28:06 +0000 The use of biomedical SNP markers of diseases can improve the effectiveness of treatment. Genotyping patients and then searching for SNPs that are more frequent in them than in healthy individuals is the only commonly accepted method for identifying SNP markers within the framework of translational research. Bioinformatics applications aimed at the millions of unannotated SNPs of the “1000 Genomes” project can make this search for SNP markers more focused and less expensive. We used our Web service, which applies Fisher’s Z-score to candidate SNP markers, to find a significant change in a gene’s expression. Here we analyzed the change caused by SNPs in the gene’s promoter via a change in the affinity of the TATA-binding protein for this promoter. We provide examples and discuss how to use this bioinformatics application in the practical analysis of unannotated SNPs from the “1000 Genomes” project. Using known biomedical SNP markers, we identified 17 novel candidate SNP markers nearby: rs549858786 (rheumatoid arthritis); rs72661131 (cardiovascular events in rheumatoid arthritis); rs562962093 (stroke); rs563558831 (cyclophosphamide bioactivation); rs55878706 (malaria resistance, leukopenia); rs572527200 (asthma, systemic sclerosis, and psoriasis); rs371045754 (hemophilia B); rs587745372 (cardiovascular events); rs372329931, rs200209906, rs367732974, and rs549591993 (all four: cancer); rs17231520 and rs569033466 (both: atherosclerosis); rs63750953, rs281864525, and rs34166473 (all three: malaria resistance, thalassemia). Mikhail Ponomarenko, Dmitry Rasskazov, Olga Arkova, Petr Ponomarenko, Valentin Suslov, Ludmila Savinkova, and Nikolay Kolchanov Copyright © 2015 Mikhail Ponomarenko et al. All rights reserved. 
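To make the idea of a score-based comparison between two promoter alleles concrete, here is a minimal sketch. It assumes each allele has an affinity estimate with a standard error and compares them with a normal-approximation Z statistic. The abstract does not give the exact statistic computed by SNP_TATA_Comparator, so the function names, inputs, and formula below are illustrative assumptions, not the service's implementation.

```python
import math


def z_score(est_ref, se_ref, est_alt, se_alt):
    """Z statistic for the difference between two independent estimates."""
    return (est_alt - est_ref) / math.sqrt(se_ref ** 2 + se_alt ** 2)


def two_sided_p(z):
    """Two-sided p value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical affinity estimates for the reference and the minor allele.
z = z_score(est_ref=19.0, se_ref=0.2, est_alt=18.2, se_alt=0.2)
p = two_sided_p(z)
```

A negative `z` here would indicate lower estimated affinity for the minor allele; whether that corresponds to reduced expression depends on the model linking affinity to expression, which this sketch does not attempt to reproduce.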
Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Zebrafish Embryo Thu, 01 Oct 2015 13:15:34 +0000 Recent progress in microscopy technologies, biological markers, and automated processing methods is making possible the development of gene expression atlases at cellular-level resolution over whole embryos. Raw data on gene expression is usually very noisy. This noise comes from both experimental (technical/methodological) and true biological sources (from stochastic biochemical processes). In addition, the cells or nuclei being imaged are irregularly arranged in 3D space. This makes the processing, extraction, and study of expression signals and intrinsic biological noise a serious challenge for 3D data, requiring new computational approaches. Here, we present a new approach for studying gene expression in nuclei located in a thick layer around a spherical surface. The method includes depth equalization on the sphere, flattening, interpolation to a regular grid, pattern extraction by Shaped 3D singular spectrum analysis (SSA), and interpolation back to original nuclear positions. The approach is demonstrated on several examples of gene expression in the zebrafish egg (a model system in vertebrate development). The method is tested on several different data geometries (e.g., nuclear positions) and different forms of gene expression patterns. Fully 3D datasets for developmental gene expression are becoming increasingly available; we discuss the prospects of applying 3D-SSA to data processing and analysis in this growing field. Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved. Analysis of Chemical Properties of Edible and Medicinal Ginger by Metabolomics Approach Thu, 01 Oct 2015 13:06:22 +0000 In traditional herbal medicine, comprehensive understanding of bioactive constituent is important in order to analyze its true medicinal function. 
We investigated the chemical properties of medicinal and edible ginger cultivars using a liquid-chromatography mass spectrometry (LC-MS) approach. Our PCA results indicate the importance of acetylated derivatives of gingerol, not gingerol or shogaol, as the medicinal indicator. A newly developed ginger cultivar, Z. officinale cv. Ogawa Umare or “Ogawa Umare” (OG), contains more active ingredients, showing properties as a new resource for the production of herbal medicines derived from ginger in terms of its chemical constituents and rhizome yield. Ken Tanaka, Masanori Arita, Hiroaki Sakurai, Naoaki Ono, and Yasuhiro Tezuka Copyright © 2015 Ken Tanaka et al. All rights reserved. EMRlog Method for Computer Security for Electronic Medical Records with Logic and Data Mining Thu, 01 Oct 2015 13:04:50 +0000 The proper functioning of a hospital computer system is an arduous work for managers and staff. However, inconsistent policies are frequent and can produce enormous problems, such as stolen information, frequent failures, and loss of the entire or part of the hospital data. This paper presents a new method named EMRlog for computer security systems in hospitals. EMRlog is focused on two kinds of security policies: directive and implemented policies. Security policies are applied to computer systems that handle huge amounts of information such as databases, applications, and medical records. Firstly, a syntactic verification step is applied by using predicate logic. Then data mining techniques are used to detect which security policies have really been implemented by the computer systems staff. Subsequently, consistency is verified in both kinds of policies; in addition these subsets are contrasted and validated. This is performed by an automatic theorem prover. Thus, many kinds of vulnerabilities can be removed for achieving a safer computer system. 
Sergio Mauricio Martínez Monterrubio, Juan Frausto Solis, and Raúl Monroy Borja Copyright © 2015 Sergio Mauricio Martínez Monterrubio et al. All rights reserved. Cellular Metabolic Network Analysis: Discovering Important Reactions in Treponema pallidum Thu, 01 Oct 2015 11:46:40 +0000 T. pallidum, the syphilis-causing pathogen, differs markedly in its metabolism from other bacterial pathogens. The need for a safe and effective syphilis vaccine calls for the identification of important steps in T. pallidum’s metabolism. Here, we apply Flux Balance Analysis to represent the reactions quantitatively, which makes it possible to cluster all reactions in T. pallidum. By calculating minimal cut sets and analyzing the topological structure of the metabolic network of T. pallidum, critical reactions are identified. As a comparison, we also apply the analytical approaches to the metabolic network of H. pylori to find coregulated drug targets and unique drug targets for different microorganisms. Based on the clustering results, all reactions are further classified into various roles. A general picture of the metabolic network is thus obtained, and two types of reactions, both of which are involved in nucleic acid metabolism, are found to be essential for T. pallidum. It is also found that both hubs of reactions and the isolated reactions in purine and pyrimidine metabolism play important roles in T. pallidum. These reactions could be potential drug targets for treating syphilis. Xueying Chen, Min Zhao, and Hong Qu Copyright © 2015 Xueying Chen et al. All rights reserved. Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data Thu, 01 Oct 2015 07:26:59 +0000 With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. 
We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition and applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data. Akihito Kikuchi, Toshimichi Ikemura, and Takashi Abe Copyright © 2015 Akihito Kikuchi et al. All rights reserved. Discovering Distinct Functional Modules of Specific Cancer Types Using Protein-Protein Interaction Networks Thu, 01 Oct 2015 07:05:17 +0000 Background. The molecular profiles exhibited in different cancer types are very different; hence, discovering distinct functional modules associated with specific cancer types is very important to understand the distinct functions associated with them. 
Protein-protein interaction networks carry vital information about molecular interactions in cellular systems, and identification of functional modules (subgraphs) in these networks is one of the most important applications of biological network analysis. Results. In this study, we developed a new graph theory based method to identify distinct functional modules from nine different cancer protein-protein interaction networks. The method is composed of three major steps: (i) extracting modules from protein-protein interaction networks using network clustering algorithms; (ii) identifying distinct subgraphs from the derived modules; and (iii) identifying distinct subgraph patterns from distinct subgraphs. The subgraph patterns were evaluated using experimentally determined cancer-specific protein-protein interaction data from the Ingenuity knowledgebase, to identify distinct functional modules that are specific to each cancer type. Conclusion. We identified cancer-type specific subgraph patterns that may represent the functional modules involved in the molecular pathogenesis of different cancer types. Our method can serve as an effective tool to discover cancer-type specific functional modules from large protein-protein interaction networks. Ru Shen, Xiaosheng Wang, and Chittibabu Guda Copyright © 2015 Ru Shen et al. All rights reserved. Development and Mining of a Volatile Organic Compound Database Thu, 01 Oct 2015 06:59:32 +0000 Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field as they are presently used as a biomarker to detect various human diseases. 
Information on VOCs has so far been scattered across the literature, and there is still no database describing VOCs and their biological activities. To fill this gap, we have developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology Database is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online. Azian Azamimi Abdullah, Md. Altaf-Ul-Amin, Naoaki Ono, Tetsuo Sato, Tadao Sugiura, Aki Hirai Morita, Tetsuo Katsuragi, Ai Muto, Takaaki Nishioka, and Shigehiko Kanaya Copyright © 2015 Azian Azamimi Abdullah et al. All rights reserved. Systematic Analysis of the Associations between Adverse Drug Reactions and Pathways Thu, 01 Oct 2015 06:52:17 +0000 Adverse drug reactions (ADRs) are responsible for drug candidate failure during clinical trials. It is crucial to investigate the biological pathways contributing to ADRs. Here, we applied a large-scale analysis to identify overrepresented ADR-pathway combinations by merging clinical phenotypic data, biological pathway data, and drug-target relations. Evaluation was performed by reviewing the scientific literature and by defining a pathway-based ADR-ADR similarity measure. The results showed that our method is efficient for finding the associations between ADRs and pathways. To understand the mechanisms of ADRs more systematically, we constructed an ADR-pathway network and an ADR-ADR network. Network analysis from the biological and pharmacological perspectives showed that frequent ADRs were associated with more pathways than infrequent and rare ADRs. Moreover, environmental information processing pathways contributed most to the observed ADRs. Integrating the system organ class of ADRs, we found that most classes tended to interact with other classes instead of themselves. 
ADR classes were distributed promiscuously in all the ADR cliques. These results indicate that drug perturbation of a given pathway can cause changes in multiple organs, rather than in one specific organ. Our work not only provides a global view of the associations between ADRs and pathways, but also helps in understanding the mechanisms of ADRs. Xiaowen Chen, Yanqiu Wang, Pingping Wang, Baofeng Lian, Chunquan Li, Jing Wang, Xia Li, and Wei Jiang Copyright © 2015 Xiaowen Chen et al. All rights reserved. METSP: A Maximum-Entropy Classifier Based Text Mining Tool for Transporter-Substrate Identification with Semistructured Text Thu, 01 Oct 2015 06:50:59 +0000 The substrates of a transporter are not only useful for inferring the function of the transporter, but also important for discovering compound-compound interactions and reconstructing metabolic pathways. Though plenty of data has accumulated with the development of new technologies such as in vitro transporter assays, the search for substrates of transporters is far from complete. In this article, we introduce METSP, a maximum-entropy classifier designed to retrieve transporter-substrate pairs (TSPs) from semistructured text. Based on the high-quality annotation from UniProt, METSP achieves high precision and recall in cross-validation experiments. When METSP is applied to 182,829 human transporter annotation sentences in UniProt, it identifies 3942 sentences with transporter and compound information. Finally, 1547 high-confidence human TSPs are identified for further manual curation, of which 58.37% involve novel substrates not annotated in public transporter databases. METSP is the first efficient tool to extract TSPs from semistructured annotation text in UniProt. This tool can help to determine the precise substrates and drugs of transporters, thus facilitating drug-target prediction, metabolic network reconstruction, and literature classification. 
Min Zhao, Yanming Chen, Dacheng Qu, and Hong Qu Copyright © 2015 Min Zhao et al. All rights reserved. A Glimpse to Background and Characteristics of Major Molecular Biological Networks Wed, 30 Sep 2015 13:30:47 +0000 Recently, biology has become a data intensive science because of huge data sets produced by high throughput molecular biological experiments in diverse areas including the fields of genomics, transcriptomics, proteomics, and metabolomics. These huge datasets have paved the way for system-level analysis of the processes and subprocesses of the cell. For system-level understanding, initially the elements of a system are connected based on their mutual relations and a network is formed. Among omics researchers, construction and analysis of biological networks have become highly popular. In this review, we briefly discuss both the biological background and topological properties of major types of omics networks to facilitate a comprehensive understanding and to conceptualize the foundation of network biology. Md. Altaf-Ul-Amin, Tetsuo Katsuragi, Tetsuo Sato, and Shigehiko Kanaya Copyright © 2015 Md. Altaf-Ul-Amin et al. All rights reserved. Bioinformatics Methods and Biological Interpretation for Next-Generation Sequencing Data Mon, 07 Sep 2015 06:56:22 +0000 Guohua Wang, Yunlong Liu, Dongxiao Zhu, Gunnar W. Klau, and Weixing Feng Copyright © 2015 Guohua Wang et al. All rights reserved. MicroRNA Promoter Identification in Arabidopsis Using Multiple Histone Markers Thu, 03 Sep 2015 13:14:36 +0000 A microRNA is a small noncoding RNA molecule, which functions in RNA silencing and posttranscriptional regulation of gene expression. To understand the mechanism of the activation of microRNA genes, the location of promoter regions driving their expression is required to be annotated precisely. Only a fraction of microRNA genes have confirmed transcription start sites (TSSs), which hinders our understanding of the transcription factor binding events. 
With the development of the next generation sequencing technology, the chromatin states can be inferred precisely by virtue of a combination of specific histone modifications. Using the genome-wide profiles of nine histone markers including H3K4me2, H3K4me3, H3K9Ac, H3K9me2, H3K18Ac, H3K27me1, H3K27me3, H3K36me2, and H3K36me3, we developed a computational strategy to identify the promoter regions of most microRNA genes in Arabidopsis, based upon the assumption that the distribution of histone markers around the TSSs of microRNA genes is similar to the TSSs of protein coding genes. Among 298 miRNA genes, our model identified 42 independent miRNA TSSs and 132 miRNA TSSs, which are located in the promoters of upstream genes. The identification of promoters will provide better understanding of microRNA regulation and can play an important role in the study of diseases at genetic level. Yuming Zhao, Fang Wang, and Liran Juan Copyright © 2015 Yuming Zhao et al. All rights reserved. Constructing a Genome-Wide LD Map of Wild A. gambiae Using Next-Generation Sequencing Thu, 03 Sep 2015 13:11:49 +0000 Anopheles gambiae is the major malaria vector in Africa. Examining the molecular basis of A. gambiae traits requires knowledge of both genetic variation and genome-wide linkage disequilibrium (LD) map of wild A. gambiae populations from malaria-endemic areas. We sequenced the genomes of nine wild A. gambiae mosquitoes individually using next-generation sequencing technologies and detected 2,219,815 common single nucleotide polymorphisms (SNPs), 88% of which are novel. SNPs are not evenly distributed across A. gambiae chromosomes. The low SNP-frequency regions overlay heterochromatin and chromosome inversion domains, consistent with the lower recombinant rates at these regions. Nearly one million SNPs that were genotyped correctly in all individual mosquitoes with 99.6% confidence were extracted from these high-throughput sequencing data. 
Based on these SNP genotypes, we constructed a genome-wide LD map for wild A. gambiae from malaria-endemic areas in Kenya and made it available through a public Website. The average size of LD blocks is less than 40 bp, and several large LD blocks were also discovered clustered around the para gene, which is consistent with the effect of insecticide selective sweeps. The SNPs and the LD map will be valuable resources for scientific communities to dissect the A. gambiae genome. Xiaohong Wang, Yaw A. Afrane, Guiyun Yan, and Jun Li Copyright © 2015 Xiaohong Wang et al. All rights reserved. Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico Thu, 03 Sep 2015 11:55:27 +0000 Next-generation sequencing techniques have been rapidly emerging. However, the massive sequencing reads hide a great deal of unknown important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms represents a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS. Feng Min, Sumei Wang, and Li Zhang Copyright © 2015 Feng Min et al. All rights reserved. Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites Thu, 03 Sep 2015 09:24:16 +0000 Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. 
The transcription factor binding sites are short DNA sequences (5–20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and the prediction of their function continue to be challenging problems in computational biology. In this study, the transcription factor binding sites in gene regulatory regions are identified by integrating the DNase I hypersensitive sites with known position weight matrices in the TRANSFAC database. Based on the global gene expression patterns in cervical cancer HeLaS3 cells and HeLaS3-ifnα4h cells (HeLaS3 cells treated with interferon for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression. Significantly, 6 out of 10 predicted functional factors, including IRF, IRF-2, IRF-9, IRF-1, IRF-3, and ICSBP, belong to the interferon regulatory factor family and upregulate gene expression levels in response to the interferon treatment. Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. The predictions of our model remain consistent across different criteria for selecting transcription factor binding sites. Our model demonstrates the potential to computationally identify the functional transcription factors in gene regulation. Guohua Wang, Fang Wang, Qian Huang, Yu Li, Yunlong Liu, and Yadong Wang Copyright © 2015 Guohua Wang et al. All rights reserved. Active Microbial Communities Inhabit Sulphate-Methane Interphase in Deep Bedrock Fracture Fluids in Olkiluoto, Finland Thu, 03 Sep 2015 09:23:57 +0000 Active microbial communities of deep crystalline bedrock fracture water were investigated from seven different boreholes in Olkiluoto (Western Finland) using bacterial and archaeal 16S rRNA, dsrB, and mcrA gene transcript targeted 454 pyrosequencing. 
Over a depth range of 296–798 m below ground surface the microbial communities changed according to depth, salinity gradient, and sulphate and methane concentrations. The highest bacterial diversity was observed in the sulphate-methane mixing zone (SMMZ) at 250–350 m depth, whereas archaeal diversity was highest in the lowest boundaries of the SMMZ. Sulphide-oxidizing ε-proteobacteria (Sulfurimonas sp.) dominated in the SMMZ and γ-proteobacteria (Pseudomonas spp.) below the SMMZ. The active archaeal communities consisted mostly of ANME-2D and Thermoplasmatales groups, although Methermicoccaceae, Methanobacteriaceae, and Thermoplasmatales (SAGMEG, TMG) were more common at 415–559 m depth. Typical indicator microorganisms for sulphate-methane transition zones in marine sediments, such as ANME-1 archaea, α-, β- and δ-proteobacteria, JS1, Actinomycetes, Planctomycetes, Chloroflexi, and MBGB Crenarchaeota were detected at specific depths. DsrB genes were most numerous and most actively transcribed in the SMMZ while the mcrA gene concentration was highest in the deep methane rich groundwater. Our results demonstrate that active and highly diverse but sparse and stratified microbial communities inhabit the Fennoscandian deep bedrock ecosystems. Malin Bomberg, Mari Nyyssönen, Petteri Pitkänen, Anne Lehtinen, and Merja Itävaara Copyright © 2015 Malin Bomberg et al. All rights reserved. 454-Pyrosequencing Analysis of Bacterial Communities from Autotrophic Nitrogen Removal Bioreactors Utilizing Universal Primers: Effect of Annealing Temperature Thu, 03 Sep 2015 09:22:02 +0000 Identification of anaerobic ammonium oxidizing (anammox) bacteria by molecular tools aimed at the evaluation of bacterial diversity in autotrophic nitrogen removal systems is limited by the difficulty to design universal primers for the Bacteria domain able to amplify the anammox 16S rRNA genes. 
A metagenomic analysis (pyrosequencing) of total bacterial diversity including anammox population in five autotrophic nitrogen removal technologies, two bench-scale models (MBR and Low Temperature CANON) and three full-scale bioreactors (anammox, CANON, and DEMON), was successfully carried out by optimization of primer selection and PCR conditions (annealing temperature). The universal primer 530F was identified as the best candidate for total bacteria and anammox bacteria diversity coverage. Salt-adjusted optimum annealing temperature of primer 530F was calculated (47°C) and hence a range of annealing temperatures of 44–49°C was tested. Pyrosequencing data showed that annealing temperature of 45°C yielded the best results in terms of species richness and diversity for all bioreactors analyzed. Alejandro Gonzalez-Martinez, Alejandro Rodriguez-Sanchez, Belén Rodelas, Ben A. Abbas, Maria Victoria Martinez-Toledo, Mark C. M. van Loosdrecht, F. Osorio, and Jesus Gonzalez-Lopez Copyright © 2015 Alejandro Gonzalez-Martinez et al. All rights reserved. mmnet: An R Package for Metagenomics Systems Biology Analysis Thu, 03 Sep 2015 09:22:02 +0000 The human microbiome plays important roles in human health and disease. Previous microbiome studies focused mainly on single pure species function and overlooked the interactions in the complex communities on system-level. A metagenomic approach introduced recently integrates metagenomic data with community-level metabolic network modeling, but no comprehensive tool was available for such kind of approaches. To facilitate these kinds of studies, we developed an R package, mmnet, to implement community-level metabolic network reconstruction. The package also implements a set of functions for automatic analysis pipeline construction including functional annotation of metagenomic reads, abundance estimation of enzymatic genes, community-level metabolic network reconstruction, and integrated network analysis. 
The result can be represented in an intuitive way and sent to Cytoscape for further exploration. The package has substantial potential in metagenomic studies that focus on identifying system-level variations of the human microbiome associated with disease. Yang Cao, Xiaofei Zheng, Fei Li, and Xiaochen Bo Copyright © 2015 Yang Cao et al. All rights reserved. Genetic Interactions Explain Variance in Cingulate Amyloid Burden: An AV-45 PET Genome-Wide Association and Interaction Study in the ADNI Cohort Thu, 03 Sep 2015 09:20:58 +0000 Alzheimer’s disease (AD) is the most common neurodegenerative disorder. Using discrete disease status as the phenotype and computing statistics at the single marker level may not be able to address the underlying biological interactions that contribute to disease mechanism and may contribute to the issue of “missing heritability.” We performed a genome-wide association study (GWAS) and a genome-wide interaction study (GWIS) of an amyloid imaging phenotype, using the data from the Alzheimer’s Disease Neuroimaging Initiative. We investigated the genetic main effects and interaction effects on cingulate amyloid-beta (Aβ) load in an effort to better understand the genetic etiology of Aβ deposition, which is a widely studied AD biomarker. PLINK was used in the single marker GWAS, and INTERSNP was used to perform the two-marker GWIS, focusing only on SNPs with for the GWAS analysis. Age, sex, and diagnosis were used as covariates in both analyses. Corrected p values using the Bonferroni method were reported. The GWAS analysis revealed significant hits within or proximal to the APOE, APOC1, and TOMM40 genes, which were previously implicated in AD. The GWIS analysis yielded 8 novel SNP-SNP interaction findings that warrant replication and further investigation. Jin Li, Qiushi Zhang, Feng Chen, Jingwen Yan, Sungeun Kim, Lei Wang, Weixing Feng, Andrew J. Saykin, Hong Liang, and Li Shen Copyright © 2015 Jin Li et al. All rights reserved. 
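The Bonferroni correction mentioned in the abstract above multiplies each raw p value by the number of tests and caps the result at 1. A minimal sketch follows; the function name and the example p values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected p values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Four hypothetical raw p values from four tests.
corrected = bonferroni([0.125, 0.5, 0.0625, 0.25])
```

In a two-marker interaction scan the number of tests grows roughly with the square of the number of SNPs considered, which is one reason such studies often restrict the SNP set before testing pairs.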
How to Isolate a Plant’s Hypomethylome in One Shot Thu, 03 Sep 2015 09:14:51 +0000 Genome assembly remains a challenge for large and/or complex plant genomes due to their abundant repetitive regions resulting in studies focusing on gene space instead of the whole genome. Thus, DNA enrichment strategies facilitate the assembly by increasing the coverage and simultaneously reducing the complexity of the whole genome. In this paper we provide an easy, fast, and cost-effective variant of MRE-seq to obtain a plant’s hypomethylome by an optimized methyl filtration protocol followed by next generation sequencing. The method is demonstrated on three plant species with knowingly large and/or complex (polyploid) genomes: Oryza sativa, Picea abies, and Crocus sativus. The identified hypomethylomes show clear enrichment for genes and their flanking regions and clear reduction of transposable elements. Additionally, genomic sequences around genes are captured including regulatory elements in introns and up- and downstream flanks. High similarity of the results obtained by a de novo assembly approach with a reference based mapping in rice supports the applicability for studying and understanding the genomes of nonmodel organisms. Hence we show the high potential of MRE-seq in a wide range of scenarios for the direct analysis of methylation differences, for example, between ecotypes, individuals, within or across species harbouring large, and complex genomes. Elisabeth Wischnitzki, Eva Maria Sehr, Karin Hansel-Hohl, Maria Berenyi, Kornel Burg, and Silvia Fluch Copyright © 2015 Elisabeth Wischnitzki et al. All rights reserved. Data Acquisition and Processing in Biology and Medicine Wed, 26 Aug 2015 10:13:46 +0000 Cheng-Hong Yang, Yu-Jie Huang, An Liu, Yi Rong, and Tsair-Fwu Lee Copyright © 2015 Cheng-Hong Yang et al. All rights reserved. 
The Combinational Polymorphisms of ORAI1 Gene Are Associated with Preventive Models of Breast Cancer in the Taiwanese Tue, 25 Aug 2015 14:02:28 +0000 The ORAI calcium release-activated calcium modulator 1 (ORAI1) has been proven to be an important gene for breast cancer progression and metastasis. However, the protective association model between the single nucleotide polymorphisms (SNPs) of ORAI1 gene was not investigated. Based on a published data set of 345 female breast cancer patients and 290 female controls, we used a particle swarm optimization (PSO) algorithm to identify the possible protective models of breast cancer association in terms of the SNPs of ORAI1 gene. Results showed that the PSO-generated models of 2-SNP (rs12320939-TT/rs12313273-CC), 3-SNP (rs12320939-TT/rs12313273-CC/rs712853-(TT/TC)), 4-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs712853-(TT/TC)), and 5-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs6486795-CC/rs712853-(TT/TC)) displayed low values of odds ratios (0.409–0.425) for breast cancer association. Taken together, these results suggested that our proposed PSO strategy is powerful to identify the combinational SNPs of rs12320939, rs12313273, rs7135617, rs6486795, and rs712853 of ORAI1 gene with a strongly protective association in breast cancer. Fu Ou-Yang, Yu-Da Lin, Li-Yeh Chuang, Hsueh-Wei Chang, Cheng-Hong Yang, and Ming-Feng Hou Copyright © 2015 Fu Ou-Yang et al. All rights reserved. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information Tue, 25 Aug 2015 08:22:17 +0000 Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. 
The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely, motor imagery and emotion recognition. Chi Zhang, Li Tong, Ying Zeng, Jingfang Jiang, Haibing Bu, Bin Yan, and Jianxin Li Copyright © 2015 Chi Zhang et al. All rights reserved. Tennis Elbow Diagnosis Using Equivalent Uniform Voltage to Fit the Logistic and the Probit Diseased Probability Models Tue, 25 Aug 2015 07:46:17 +0000 This study aimed to develop logistic and probit models to analyse the electromyographic (EMG) equivalent uniform voltage- (EUV-) response for the tenderness of tennis elbow. In total, 78 hands from 39 subjects were enrolled. In this study, the surface EMG (sEMG) signal was obtained by an innovative device with electrodes over the forearm region. The analytical endpoint was defined as Visual Analog Score (VAS) 3+ tenderness of tennis elbow. The logistic and the probit diseased probability (DP) models were established for the VAS score and EMG absolute voltage-time histograms (AVTH). TV50 is the threshold equivalent uniform voltage predicting a 50% risk of disease. Twenty-one out of 78 samples (27%) developed VAS 3+ tenderness of tennis elbow reported by the subject and confirmed by the physician. The fitted DP parameters were TV50 = 153.0 mV (CI: 136.3–169.7 mV), γ50 = 0.84 (CI: 0.78–0.90) and TV50 = 155.6 mV (CI: 138.9–172.4 mV), m = 0.54 (CI: 0.49–0.59) for logistic and probit models, respectively. When the EUV is ≥ 153 mV, the DP of the patient is greater than 50%, and vice versa. 
The logistic and the probit models are valuable tools to predict the DP of VAS 3+ tenderness of tennis elbow. Tsair-Fwu Lee, Wei-Chun Lin, Hung-Yu Wang, Shu-Yuan Lin, Li-Fu Wu, Shih-Sian Guo, Hsiang-Jui Huang, Hui-Min Ting, and Pei-Ju Chao Copyright © 2015 Tsair-Fwu Lee et al. All rights reserved. A Data Hiding Technique to Synchronously Embed Physiological Signals in H.264/AVC Encoded Video for Medicine Healthcare Tue, 25 Aug 2015 07:45:41 +0000 The recognition of clinical manifestations in both video images and physiological-signal waveforms is an important aid to improve the safety and effectiveness in medical care. Physicians can rely on video-waveform (VW) observations to recognize difficult-to-spot signs and symptoms. The VW observations can also reduce the number of false positive incidents and expand the recognition coverage to abnormal health conditions. The synchronization between the video images and the physiological-signal waveforms is fundamental for the successful recognition of the clinical manifestations. The use of conventional equipment to synchronously acquire and display the video-waveform information involves complex tasks such as the video capture/compression, the acquisition/compression of each physiological signal, and the video-waveform synchronization based on timestamps. This paper introduces a data hiding technique capable of both enabling embedding channels and synchronously hiding samples of physiological signals into encoded video sequences. Our data hiding technique offers large data capacity and simplifies the complexity of the video-waveform acquisition and reproduction. The experimental results revealed successful embedding and full restoration of signal’s samples. Our results also demonstrated a small distortion in the video objective quality, a small increment in bit-rate, and embedded cost savings of −2.6196% for high and medium motion video sequences. 
Raul Peña, Alfonso Ávila, David Muñoz, and Juan Lavariega Copyright © 2015 Raul Peña et al. All rights reserved. Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation Mon, 24 Aug 2015 14:18:37 +0000 This paper presents a data-driven multiscale entropy measure to reveal the scale dependent information quantity of electroencephalogram (EEG) recordings. This work is motivated by previous observations on the nonlinear and nonstationary nature of EEG over multiple time scales. Here, a new framework of entropy measures considering changing dynamics over multiple oscillatory scales is presented. First, to deal with nonstationarity over multiple scales, the EEG recording is decomposed by applying the empirical mode decomposition (EMD), which is known to be effective for extracting the constituent narrowband components without a predetermined basis. Subsequent calculation of the Renyi entropy of the probability distributions of the intrinsic mode functions extracted by EMD leads to a data-driven multiscale Renyi entropy. To validate the performance of the proposed entropy measure, actual EEG recordings from rats experiencing 7 min cardiac arrest followed by resuscitation were analyzed. Simulation and experimental results demonstrate that the use of the multiscale Renyi entropy leads to better discriminative capability of the injury levels and improved correlations with the neurological deficit evaluation 72 hours after cardiac arrest, thus suggesting an effective diagnostic and prognostic tool. Young-Seok Choi Copyright © 2015 Young-Seok Choi. All rights reserved. Gene Network Analysis of Glucose Linked Signaling Pathways and Their Role in Human Hepatocellular Carcinoma Cell Growth and Survival in HuH7 and HepG2 Cell Lines Mon, 24 Aug 2015 11:19:10 +0000 Cancer progression may be affected by metabolism. 
In this study, we aimed to analyze the effect of glucose on the proliferation and/or survival of human hepatocellular carcinoma (HCC) cells. Human gene datasets regulated by glucose were compared to gene datasets either dysregulated in HCC or regulated by other signaling pathways. Significant numbers of common genes suggested putative involvement in transcriptional regulations by glucose. Real-time proliferation assays using high (4.5 g/L) versus low (1 g/L) glucose on two human HCC cell lines and specific inhibitors of selected pathways were used for experimental validations. High glucose promoted HuH7 cell proliferation but not that of HepG2 cell line. Gene network analyses suggest that gene transcription by glucose could be mediated at 92% through ChREBP in HepG2 cells, compared to 40% in either other human cells or rodent healthy liver, with alteration of LKB1 (serine/threonine kinase 11) and NOX (NADPH oxidases) signaling pathways and loss of transcriptional regulation of PPARGC1A (peroxisome-proliferator activated receptors gamma coactivator 1) target genes by high glucose. Both PPARA and PPARGC1A regulate transcription of genes commonly regulated by glycolysis, by the antidiabetic agent metformin and by NOX, suggesting their major interplay in the control of HCC progression. Emmanuelle Berger, Nathalie Vega, Michèle Weiss-Gayet, and Alain Géloën Copyright © 2015 Emmanuelle Berger et al. All rights reserved. Applying NGS Data to Find Evolutionary Network Biomarkers from the Early and Late Stages of Hepatocellular Carcinoma Thu, 20 Aug 2015 07:07:01 +0000 Hepatocellular carcinoma (HCC) is a major liver tumor (~80%), besides hepatoblastomas, angiosarcomas, and cholangiocarcinomas. In this study, we used a systems biology approach to construct protein-protein interaction networks (PPINs) for early-stage and late-stage liver cancer. 
By comparing the networks of these two stages, we found that the two networks showed some common mechanisms and some significantly different mechanisms. To obtain differential network structures between cancer and noncancer PPINs, we constructed cancer PPIN and noncancer PPIN network structures for the two stages of liver cancer by systems biology method using NGS data from cancer cells and adjacent noncancer cells. Using carcinogenesis relevance values (CRVs), we identified 43 and 80 significant proteins and their PPINs (network markers) for early-stage and late-stage liver cancer. To investigate the evolution of network biomarkers in the carcinogenesis process, a primary pathway analysis showed that common pathways of the early and late stages were those related to ordinary cancer mechanisms. A pathway specific to the early stage was the mismatch repair pathway, while pathways specific to the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This study provides a new direction for cancer-targeted therapies at different stages. Yung-Hao Wong, Chia-Chou Wu, Chih-Lung Lin, Ting-Shou Chen, Tzu-Hao Chang, and Bor-Sen Chen Copyright © 2015 Yung-Hao Wong et al. All rights reserved. The ABCC6 Transporter as a Paradigm for Networking from an Orphan Disease to Complex Disorders Tue, 18 Aug 2015 09:35:55 +0000 The knowledge on the genetic etiology of complex disorders largely results from the study of rare monogenic disorders. Often these common and rare diseases show phenotypic overlap, though monogenic diseases generally have a more extreme symptomatology. ABCC6, the gene responsible for pseudoxanthoma elasticum, an autosomal recessive ectopic mineralization disorder, can be considered a paradigm gene with relevance that reaches far beyond this enigmatic orphan disease. Indeed, common traits such as chronic kidney disease or cardiovascular disorders have been linked to the ABCC6 gene. 
While during the last decade the awareness of the wide ramifications of ABCC6 has increased significantly, the gene itself and the transmembrane transporter it encodes have not unveiled all of the mysteries that surround them. To gain more insights, multiple approaches are being used including next-generation sequencing, computational methods, and various “omics” technologies. Much effort is made to place the vast amount of data that is gathered in an integrated system-biological network; the involvement of ABCC6 in common disorders provides a good view on the wide implications and potential of such a network. In this review, we summarize the network approaches used to study ABCC6 and the role of this gene in several complex diseases. Eva Y. G. De Vilder, Mohammad Jakir Hosen, and Olivier M. Vanakker Copyright © 2015 Eva Y. G. De Vilder et al. All rights reserved. An Affinity Propagation-Based DNA Motif Discovery Algorithm Mon, 10 Aug 2015 09:57:56 +0000 The planted motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy. Chunxiao Sun, Hongwei Huo, Qiang Yu, Haitao Guo, and Zhigang Sun Copyright © 2015 Chunxiao Sun et al. All rights reserved. 
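The planted motif search setting named in the APMotif abstract above can be made concrete with a small sketch. It assumes the common (l, d) formulation, in which a motif of length l must occur in every input sequence with at most d mismatches; this is only a validity check for one candidate, not the APMotif algorithm itself, and all function names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, sequence, d):
    """True if some window of the sequence is within d mismatches of the motif."""
    l = len(motif)
    return any(
        hamming(motif, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def is_planted_motif(motif, sequences, d):
    """True if the motif occurs, with at most d mismatches, in every sequence."""
    return all(occurs_within(motif, s, d) for s in sequences)


# Hypothetical toy input: "ACGT" appears exactly or with one mismatch in each sequence.
sequences = ["TTACGTAA", "GGACCTCC", "ACGAGGGG"]
print(is_planted_motif("ACGT", sequences, d=1))
print(is_planted_motif("ACGT", sequences, d=0))
```

A check of this kind scores a single candidate; the difficulty the abstract refers to lies in generating good candidates and refining them, which is where clustering and EM refinement come in.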
Statistical Analysis of High-Dimensional Genetic Data in Complex Traits Tue, 04 Aug 2015 14:52:57 +0000 Taesung Park, Kristel Van Steen, Xiang-Yang Lou, and Momiao Xiong Copyright © 2015 Taesung Park et al. All rights reserved. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data Tue, 04 Aug 2015 11:27:26 +0000 Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these successful findings have substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which has been a big hurdle to building disease prediction models. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called “large P and small N” problem. Penalized regressions including least absolute shrinkage and selection operator (LASSO) and ridge regression limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration. Sungho Won, Hosik Choi, Suyeon Park, Juyoung Lee, Changyi Park, and Sunghoon Kwon Copyright © 2015 Sungho Won et al. All rights reserved. Detection of Epistatic and Gene-Environment Interactions Underlying Three Quality Traits in Rice Using High-Throughput Genome-Wide Data Tue, 04 Aug 2015 11:23:29 +0000 With the development of sequencing technology, dense single nucleotide polymorphisms (SNPs) have become available, making it possible to uncover the genetic architecture of complex traits by genome-wide association study (GWAS). 
However, the current GWAS strategy usually ignores epistatic and gene-environment interactions due to the absence of appropriate methodology and the heavy computational burden. This study proposed a new GWAS strategy by combining the graphics processing unit- (GPU-) based generalized multifactor dimensionality reduction (GMDR) algorithm with the mixed linear model approach. The reliability and efficiency of the analytical methods were verified through Monte Carlo simulations, suggesting that a population size of nearly 150 recombinant inbred lines (RILs) had a reasonable resolution for the scenarios considered. Further, a GWAS was conducted with the above two-step strategy to investigate the additive, epistatic, and gene-environment associations between 701,867 SNPs and three important quality traits, gelatinization temperature, amylose content, and gel consistency, in a RIL population with 138 individuals derived from super-hybrid rice Xieyou9308 in two environments. Four significant SNPs were identified with additive, epistatic, and gene-environment interaction effects. Our study showed that the mixed linear model approach combined with the GPU-based GMDR algorithm is a feasible strategy for implementing GWAS to uncover the genetic architecture of complex crop traits. Haiming Xu, Beibei Jiang, Yujie Cao, Yingxin Zhang, Xiaodeng Zhan, Xihong Shen, Shihua Cheng, Xiangyang Lou, and Liyong Cao Copyright © 2015 Haiming Xu et al. All rights reserved. Systems Biology Approaches to Mining High Throughput Biological Data Tue, 04 Aug 2015 11:22:20 +0000 Fang-Xiang Wu, Min Li, Jishou Ruan, and Feng Luo Copyright © 2015 Fang-Xiang Wu et al. All rights reserved. Dynamic Model for RNA-seq Data Analysis Tue, 04 Aug 2015 11:20:34 +0000 By measuring messenger RNA levels for all genes in a sample, RNA-seq provides an attractive option to characterize the global changes in transcription. RNA-seq is becoming a widely used platform for gene expression profiling. 
However, real transcription signals in the RNA-seq data are confounded with measurement and sequencing errors and other random biological/technical variation. To extract biologically useful transcription process from the RNA-seq data, we propose to use the second ODE for modeling the RNA-seq data. We use differential principal analysis to develop statistical methods for estimation of location-varying coefficients of the ODE. We validate the accuracy of the ODE model to fit the RNA-seq data by prediction analysis and 5-fold cross validation. To further evaluate the performance of the ODE model for RNA-seq data analysis, we used the location-varying coefficients of the second ODE as features to classify the normal and tumor cells. We demonstrate that even using the ODE model for single gene we can achieve high classification accuracy. We also conduct response analysis to investigate how the transcription process responds to the perturbation of the external signals and identify dozens of genes that are related to cancer. Lerong Li and Momiao Xiong Copyright © 2015 Lerong Li and Momiao Xiong. All rights reserved. Robust Association Tests for the Replication of Genome-Wide Association Studies Tue, 04 Aug 2015 11:15:36 +0000 In genome-wide association study (GWAS), robust genetic association tests such as maximum of three CATTs (MAX3), each corresponding to recessive, additive, and dominant genetic models, the minimum p value of Pearson’s Chi-square test with 2 degrees of freedom, and CATT based on additive genetic model (MIN2), genetic model selection (GMS), and genetic model exclusion (GME) methods have been shown to provide better power performance under wide range of underlying genetic models. In this paper, we demonstrate how these robust tests can be applied to the replication study of GWAS and how the overall statistical significance can be evaluated using the combined test formed by p values of the discovery and replication studies. 
Jungnam Joo, Ju-Hyun Park, Bora Lee, Boram Park, Sohee Kim, Kyong-Ah Yoon, Jin Soo Lee, and Nancy L. Geller Copyright © 2015 Jungnam Joo et al. All rights reserved. Clique-Based Clustering of Correlated SNPs in a Gene Can Improve Performance of Gene-Based Multi-Bin Linear Combination Test Tue, 04 Aug 2015 10:59:47 +0000 Gene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single SNP analysis. The multi-bin linear combination test (MLC) proposed in previous studies utilizes the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines linear combination statistics constructed for each cluster. The test has degrees of freedom equal to the number of clusters and can be more powerful than a fully quadratic or fully linear test statistic. In this study, we develop a new SNP clustering algorithm designed to find cliques, which are complete subnetworks of SNPs with all pairwise correlations above a threshold. We evaluate the performance of the MLC test using the clique-based CLQ algorithm versus using the tag-SNP-based LDSelect algorithm. In our numerical power calculations we observed that the two clustering algorithms produce identical clusters about 40~60% of the time, yielding similar power on average. However, because the CLQ algorithm tends to produce smaller clusters with stronger positive correlation, the MLC test is less likely to be affected by the occurrence of opposing signs in the individual SNP effect coefficients. Yun Joo Yoo, Sun Ah Kim, and Shelley B. Bull Copyright © 2015 Yun Joo Yoo et al. All rights reserved. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population Mon, 03 Aug 2015 13:21:57 +0000 Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. 
Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability, the basis of cluster generation, is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. Woojoo Lee, Andrey Alexeyenko, Maria Pernemalm, Justine Guegan, Philippe Dessen, Vladimir Lazar, Janne Lehtiö, and Yudi Pawitan Copyright © 2015 Woojoo Lee et al. All rights reserved. Detecting Genetic Interactions for Quantitative Traits Using m-Spacing Entropy Measure Mon, 03 Aug 2015 13:10:36 +0000 A number of statistical methods for detecting gene-gene interactions have been developed in genetic association studies with binary traits. However, many phenotype measures are intrinsically quantitative and categorizing continuous traits may not always be straightforward and meaningful. 
Association of gene-gene interactions with an observed distribution of such phenotypes needs to be investigated directly without categorization. Information gain based on an entropy measure has previously been successful in identifying genetic associations with binary traits. We extend the usefulness of this information gain by proposing a nonparametric evaluation method of the conditional entropy of a quantitative phenotype associated with a given genotype. Hence, the information gain can be obtained for any phenotype distribution. Because no functional form, such as Gaussian, is assumed for the entire distribution of a trait or a given genotype, this method is expected to be robust enough to be applied to any phenotypic association data. Here, we show its use to successfully identify the main effect, as well as the genetic interactions, associated with a quantitative trait. Jaeyong Yee, Min-Seok Kwon, Seohoon Jin, Taesung Park, and Mira Park Copyright © 2015 Jaeyong Yee et al. All rights reserved. A Comparative Study on Multifactor Dimensionality Reduction Methods for Detecting Gene-Gene Interactions with the Survival Phenotype Mon, 03 Aug 2015 13:06:31 +0000 Genome-wide association studies (GWAS) have extensively analyzed single SNP effects on a wide variety of common and complex diseases and found many genetic variants associated with diseases. However, there is still a large portion of the genetic variants left unexplained. This missing heritability problem might be due to the analytical strategy that limits analyses to only single SNPs. One possible approach to the missing heritability problem is to consider identifying multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions based on constructive induction, classifying high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk, for the case-control study. 
Many modifications of MDR have been proposed and also extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare the proposed extensions with earlier MDR through comprehensive simulation studies. Seungyeoun Lee, Yongkang Kim, Min-Seok Kwon, and Taesung Park Copyright © 2015 Seungyeoun Lee et al. All rights reserved. On the Estimation of Heritability with Family-Based and Population-Based Samples Mon, 03 Aug 2015 13:00:35 +0000 For a family-based sample, the phenotypic variance-covariance matrix can be parameterized to include the variance of a polygenic effect that has then been estimated using a variance component analysis. However, with the advent of large-scale genomic data, the genetic relationship matrix (GRM) can be estimated and can be utilized to parameterize the variance of a polygenic effect for population-based samples. Therefore narrow sense heritability, which is both population and trait specific, can be estimated with both population- and family-based samples. In this study we estimate heritability from both family-based and population-based samples, collected in Korea, and the heritability estimates from the pooled samples were, for height, 0.60; body mass index (BMI), 0.32; log-transformed triglycerides (log TG), 0.24; total cholesterol (TCHL), 0.30; high-density lipoprotein (HDL), 0.38; low-density lipoprotein (LDL), 0.29; systolic blood pressure (SBP), 0.23; and diastolic blood pressure (DBP), 0.24. Furthermore, we found differences in how heritability is estimated—in particular the amount of variance attributable to common environment in twins can be substantial—which indicates heritability estimates should be interpreted with caution. Youngdoe Kim, Young Lee, Sungyoung Lee, Nam Hee Kim, Jeongmin Lim, Young Jin Kim, Ji Hee Oh, Haesook Min, Meehee Lee, Hyeon-Jeong Seo, So-Hyun Lee, Joohon Sung, Nam H. Cho, Bong-Jo Kim, Bok-Ghee Han, Robert C. 
Elston, Sungho Won, and Juyoung Lee Copyright © 2015 Youngdoe Kim et al. All rights reserved. Differential Expression Analysis in RNA-Seq by a Naive Bayes Classifier with Local Normalization Mon, 03 Aug 2015 11:48:07 +0000 To improve the applicability of RNA-seq technology, a large number of RNA-seq data analysis methods and correction algorithms have been developed. Although these new methods and algorithms have steadily improved transcriptome analysis, greater prediction accuracy is needed to better guide experimental designs with computational results. In this study, a new tool for the identification of differentially expressed genes with RNA-seq data, named GExposer, was developed. This tool introduces a local normalization algorithm to reduce the bias of nonrandomly positioned read depth. The naive Bayes classifier is employed to integrate fold change, transcript length, and GC content to identify differentially expressed genes. Results on several independent tests show that GExposer has better performance than other methods. The combination of the local normalization algorithm and naive Bayes classifier with three attributes can achieve better results; both false positive rates and false negative rates are reduced. However, only a small portion of genes is affected by the local normalization and GC content correction. Yongchao Dou, Xiaomei Guo, Lingling Yuan, David R. Holding, and Chi Zhang Copyright © 2015 Yongchao Dou et al. All rights reserved. K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data Mon, 03 Aug 2015 11:22:13 +0000 With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A very first step towards elucidating hidden patterns and understanding the massive data is the application of clustering techniques. 
Nonlinear relations, which have mostly gone unutilized in contrast to linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure, Distance Based on Conditional Ordered List (DCOL), that we introduced before, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm, but also presented significantly better performance than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results. Kai Wang, Qing Zhao, Jianwei Lu, and Tianwei Yu Copyright © 2015 Kai Wang et al. All rights reserved. ProSim: A Method for Prioritizing Disease Genes Based on Protein Proximity and Disease Similarity Mon, 03 Aug 2015 10:49:13 +0000 Predicting disease genes for a particular genetic disease is very challenging in bioinformatics. Based on current research studies, this challenge can be tackled via network-based approaches. Furthermore, it has been highlighted that it is necessary to consider disease similarity along with the protein’s proximity to disease genes in a protein-protein interaction (PPI) network in order to improve the accuracy of disease gene prioritization. In this study we propose a new algorithm called proximity disease similarity algorithm (ProSim), which takes both of the aforementioned properties into consideration, to prioritize disease genes. 
To illustrate the proposed algorithm, we have conducted six case studies, namely, prostate cancer, Alzheimer’s disease, diabetes mellitus type 2, breast cancer, colorectal cancer, and lung cancer. We employed leave-one-out cross validation, mean enrichment, tenfold cross validation, and ROC curves to evaluate our proposed method and other existing methods. The results show that our proposed method outperforms existing methods such as PRINCE, RWR, and DADA. Gamage Upeksha Ganegoda, Yu Sheng, and Jianxin Wang Copyright © 2015 Gamage Upeksha Ganegoda et al. All rights reserved. Screening Ingredients from Herbs against Pregnane X Receptor in the Study of Inductive Herb-Drug Interactions: Combining Pharmacophore and Docking-Based Rank Aggregation Mon, 03 Aug 2015 09:57:17 +0000 The issue of herb-drug interactions has been widely reported. Herbal ingredients can activate nuclear receptors and further induce the gene expression alteration of drug-metabolizing enzyme and/or transporter. Therefore, the herb-drug interaction will happen when the herbs and drugs are coadministered. This kind of interaction is called inductive herb-drug interactions. Pregnane X Receptor (PXR) and drug-metabolizing target genes are involved in most of inductive herb-drug interactions. To predict this kind of herb-drug interaction, the protocol could be simplified to only screen agonists of PXR from herbs because the relations of drugs with their metabolizing enzymes are well studied. Here, a combinational in silico strategy of pharmacophore modelling and docking-based rank aggregation (DRA) was employed to identify PXR’s agonists. Firstly, 305 ingredients were screened out from 820 ingredients as candidate agonists of PXR with our pharmacophore model. Secondly, DRA was used to rerank the result of pharmacophore filtering. To validate our prediction, a curated herb-drug interaction database was built, which recorded 380 herb-drug interactions. 
Finally, among the top 10 herb ingredients from the ranking list, 6 ingredients were reported to be involved in herb-drug interactions. The accuracy of our method is higher than that of other traditional methods. The strategy could be extended to studies on other inductive herb-drug interactions. Zhijie Cui, Hong Kang, Kailin Tang, Qi Liu, Zhiwei Cao, and Ruixin Zhu Copyright © 2015 Zhijie Cui et al. All rights reserved. Gene Signature of Human Oral Mucosa Fibroblasts: Comparison with Dermal Fibroblasts and Induced Pluripotent Stem Cells Mon, 03 Aug 2015 09:43:12 +0000 Oral mucosa is a useful material for regeneration therapy with the advantages of its accessibility and versatility regardless of age and gender. However, little is known about the molecular characteristics of oral mucosa. Here we report the first comparative profiles of the gene signatures of human oral mucosa fibroblasts (hOFs), human dermal fibroblasts (hDFs), and hOF-derived induced pluripotent stem cells (hOF-iPSCs), linking these with biological roles by functional annotation and pathway analyses. As a common feature of fibroblasts, both hOFs and hDFs expressed glycolipid metabolism-related genes at higher levels compared with hOF-iPSCs. Distinct characteristics of hOFs compared with hDFs included a high expression of glycoprotein genes, involved in signaling, extracellular matrix, membrane, and receptor proteins, along with a low expression of HOX genes, which are hDF markers. The results of the pathway analyses indicated that tissue-reconstructive, proliferative, and signaling pathways are active, whereas senescence-related genes in the p53 pathway are inactive in hOFs. Furthermore, more than half of the hOF-specific genes were expressed similarly to hOF-iPSC genes and might be controlled by WNT signaling. Our findings demonstrated that hOFs have unique cellular characteristics in specificity and plasticity. These data may provide useful insight into the application of oral fibroblasts for direct reprogramming. 
Keiko Miyoshi, Taigo Horiguchi, Ayako Tanimura, Hiroko Hagita, and Takafumi Noma Copyright © 2015 Keiko Miyoshi et al. All rights reserved. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs Mon, 03 Aug 2015 09:33:16 +0000 Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs when block and thread numbers are properly allocated. Liang-Tsung Huang, Chao-Chin Wu, Lien-Fu Lai, and Yun-Ju Li Copyright © 2015 Liang-Tsung Huang et al. All rights reserved. Similarities in Gene Expression Profiles during In Vitro Aging of Primary Human Embryonic Lung and Foreskin Fibroblasts Mon, 03 Aug 2015 08:07:15 +0000 Replicative senescence is of fundamental importance for the process of cellular aging, since it is a property of most of our somatic cells. Here, we elucidated this process by comparing gene expression changes, measured by RNA-seq, in fibroblasts originating from two different tissues, embryonic lung (MRC-5) and foreskin (HFF), at five different time points during their transition into senescence. 
Although the expression patterns of both fibroblast cell lines can be clearly distinguished, the similar differential expression of an ensemble of genes was found to correlate well with their transition into senescence, with only a minority of genes being cell line specific. Clustering-based approaches further revealed common signatures between the cell lines. Investigation of the mRNA expression levels at various time points during the lifespan of either of the fibroblasts resulted in a number of monotonically up- and downregulated genes which clearly showed a novel strong link to aging and senescence related processes which might be functional. In terms of expression profiles of differentially expressed genes with age, common genes identified here have the potential to rule the transition into senescence of embryonic lung and foreskin fibroblasts irrespective of their different cellular origin. Shiva Marthandan, Steffen Priebe, Mario Baumgart, Marco Groth, Alessandro Cellerino, Reinhard Guthke, Peter Hemmerich, and Stephan Diekmann Copyright © 2015 Shiva Marthandan et al. All rights reserved. AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model Mon, 03 Aug 2015 07:44:11 +0000 Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possibilities of protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related with each other, it is challenging to exploit this dependency for the prediction. Method. We present a method AcconPred for predicting solvent accessibility and contact number simultaneously, which is based on a shared weight multitask learning framework under the CNF (conditional neural fields) model. 
The multitask learning framework on a collection of related tasks provides more accurate prediction than the framework trained only on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels. Results. Trained on 5729 monomeric soluble globular protein datasets, AcconPred could reach 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on the 105 CASP11 domain datasets for solvent accessibility, AcconPred could reach 0.64 accuracy, which outperforms existing methods. Jianzhu Ma and Sheng Wang Copyright © 2015 Jianzhu Ma and Sheng Wang. All rights reserved. Module Based Differential Coexpression Analysis Method for Type 2 Diabetes Mon, 03 Aug 2015 07:40:09 +0000 More and more studies have shown that many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional biological pathway or network and are highly correlated. Differential coexpression analysis, as a more comprehensive technique to the differential expression analysis, was raised to research gene regulatory networks and biological pathways of phenotypic changes through measuring gene correlation changes between disease and normal conditions. In this paper, we propose a gene differential coexpression analysis algorithm in the level of gene sets and apply the algorithm to a publicly available type 2 diabetes (T2D) expression dataset. Firstly, we calculate coexpression biweight midcorrelation coefficients between all gene pairs. Then, we select informative correlation pairs using the “differential coexpression threshold” strategy. Finally, we identify the differential coexpression gene modules using maximum clique concept and k-clique algorithm. We apply the proposed differential coexpression analysis method on simulated data and T2D data. 
Two differential coexpression gene modules about T2D were detected, which should be useful for exploring the biological function of the related genes. Lin Yuan, Chun-Hou Zheng, Jun-Feng Xia, and De-Shuang Huang Copyright © 2015 Lin Yuan et al. All rights reserved. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection Mon, 03 Aug 2015 07:26:06 +0000 Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification. Yifei Chen, Yuxing Sun, and Bing-Qing Han Copyright © 2015 Yifei Chen et al. All rights reserved. 
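The frequency-based baseline that the feature selection abstract above compares against can be illustrated with a minimal sketch. It ranks terms by document frequency and keeps the top k. The function names, the toy documents, and the tie-breaking rule are assumptions made for illustration; this is not the authors' context similarity-based method.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document.
        df.update(set(tokens))
    return df


def select_top_k(docs, k):
    """Return the k terms with the highest document frequency.

    Ties are broken alphabetically so the result is deterministic
    (an illustrative choice, not something stated in the abstract).
    """
    df = document_frequency(docs)
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized documents.
docs = [
    ["protein", "binds", "kinase"],
    ["protein", "interacts", "kinase"],
    ["gene", "expression", "protein"],
]
selected = select_top_k(docs, 2)
```

Each term is scored independently of its neighbours here, which is the drawback the abstract points out for frequency-based methods.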
Spatially Enhanced Differential RNA Methylation Analysis from Affinity-Based Sequencing Data with Hidden Markov Model Sun, 02 Aug 2015 14:09:30 +0000 With the development of new sequencing technology, the entire N6-methyl-adenosine (m6A) RNA methylome can now be profiled in an unbiased manner with the methylated RNA immune-precipitation sequencing technique (MeRIP-Seq), making it possible to detect differential methylation states of RNA between two conditions, for example, between normal and cancerous tissue. However, as an affinity-based method, MeRIP-Seq has not yet provided base-pair resolution; that is, a single methylation site determined from MeRIP-Seq data can in practice contain multiple RNA methylation residues, some of which can be regulated by different enzymes and thus differentially methylated between two conditions. Since existing peak-based methods could not effectively differentiate multiple methylation residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, the detected RNA methylation site is further divided into multiple adjacent small bins and then scanned with higher resolution using a hidden Markov model to model the dependency between spatially adjacent bins for improved accuracy. We tested the proposed algorithm on both simulated data and real data. The results suggest that the proposed algorithm clearly outperforms the existing peak-based approach on simulated systems and detects differential methylation regions with higher statistical significance on a real dataset. Yu-Chen Zhang, Shao-Wu Zhang, Lian Liu, Hui Liu, Lin Zhang, Xiaodong Cui, Yufei Huang, and Jia Meng Copyright © 2015 Yu-Chen Zhang et al. All rights reserved. Advanced Computational Approaches for Medical Genetics and Genomics Thu, 30 Jul 2015 06:34:50 +0000 Zhi Wei, Xiao Chang, and Junwen Wang Copyright © 2015 Zhi Wei et al. All rights reserved. 
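The idea in the MeRIP-Seq abstract above, modelling the dependency between spatially adjacent bins with a hidden Markov model, can be sketched with standard Viterbi decoding. The two states ("same", "diff"), the discretized per-bin evidence ("low", "high"), and every probability below are hypothetical values chosen for illustration; the abstract does not specify the authors' actual states, emissions, or parameters.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path (log-space Viterbi)."""
    # Log-score of the best path ending in each state at the first bin.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back = []  # back[t][s] = best predecessor of state s at bin t + 1
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            current[s] = (prev[best_prev] + math.log(trans_p[best_prev][s])
                          + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores.append(current)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical model: bins tend to keep the state of their neighbour.
states = ("same", "diff")
start_p = {"same": 0.8, "diff": 0.2}
trans_p = {"same": {"same": 0.9, "diff": 0.1},
           "diff": {"same": 0.1, "diff": 0.9}}
emit_p = {"same": {"low": 0.8, "high": 0.2},
          "diff": {"low": 0.2, "high": 0.8}}

run_of_high = viterbi(["low", "low", "high", "high", "high"],
                      states, start_p, trans_p, emit_p)
isolated_high = viterbi(["low", "high", "low"],
                        states, start_p, trans_p, emit_p)
```

With these toy parameters a run of three "high" bins is decoded as a differential region, while a single "high" bin flanked by "low" bins is smoothed away. That is the kind of spatial dependency a bin-by-bin test would ignore.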
Identification of Gene Biomarkers for Distinguishing Small-Cell Lung Cancer from Non-Small-Cell Lung Cancer Using a Network-Based Approach Tue, 28 Jul 2015 08:11:43 +0000 Lung cancer consists of two main subtypes: small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC) that are classified according to their physiological phenotypes. In this study, we have developed a network-based approach to identify molecular biomarkers that can distinguish SCLC from NSCLC. By identifying positive and negative coexpression gene pairs in normal lung tissues, SCLC, or NSCLC samples and using functional association information from the STRING network, we first construct a lung cancer-specific gene association network. From the network, we obtain gene modules in which genes are highly functionally associated with each other and are either positively or negatively coexpressed in the three conditions. Then, we identify gene modules that not only are differentially expressed between cancer and normal samples, but also show distinctive expression patterns between SCLC and NSCLC. Finally, we select genes inside those modules with discriminating coexpression patterns between the two lung cancer subtypes and predict them as candidate biomarkers that are of diagnostic use. Fei Long, Jia-Hang Su, Bin Liang, Li-Li Su, and Shu-Juan Jiang Copyright © 2015 Fei Long et al. All rights reserved. Network-Based Association Study of Obesity and Type 2 Diabetes with Gene Expression Profiles Mon, 27 Jul 2015 07:52:11 +0000 The increased prevalence of obesity and type 2 diabetes (T2D) has become an important factor affecting the health of the human. Obesity is commonly considered as a major risk factor for the development of T2D. However, the molecular mechanisms of the disease relations are not well discovered yet. 
In this study, the combination of multiple differential expression profiles and a comprehensive biological network of obesity and T2D allowed us to identify and compare the disease-responsive active modules and subclusters. The results demonstrated that the connection between obesity and T2D mainly relied on several pathways involved in the digestive metabolism, immunization, and signal transduction, such as adipocytokine, chemokine signaling pathway, T cell receptor signaling pathway, and MAPK signaling pathways. The relationships of almost all of these pathways with obesity and T2D have been verified by the previous reports individually. We also found that the different parts in the same pathway are activated in obesity and T2D. The association of cancer, obesity, and T2D was identified too here. As a conclusion, our network-based method not only gives better support for the close connection between obesity and T2D, but also provides a systemic view in understanding the molecular functions underneath the links. It should be helpful in the development of new therapies for obesity, T2D, and the associated diseases. Siyi Zhang, Bo Wang, Jingsong Shi, and Jing Li Copyright © 2015 Siyi Zhang et al. All rights reserved. Gene Coexpression and Evolutionary Conservation Analysis of the Human Preimplantation Embryos Mon, 27 Jul 2015 07:05:30 +0000 Evolutionary developmental biology (EVO-DEVO) tries to decode evolutionary constraints on the stages of embryonic development. Two models—the “funnel-like” model and the “hourglass” model—have been proposed by investigators to illustrate the fluctuation of selective pressure on these stages. However, selective indices of stages corresponding to mammalian preimplantation embryonic development (PED) were undetected in previous studies. Based on single cell RNA sequencing of stages during human PED, we used coexpression method to identify gene modules activated in each of these stages. 
Through measuring the evolutionary indices of gene modules belonging to each stage, we observed, for the first time, the pattern of change in selective constraints on PED. The selective pressure decreases from the zygote stage to the 4-cell stage and increases at the 8-cell stage and then decreases again from the 8-cell stage to the late blastocyst stages. Previous EVO-DEVO studies concerning the whole embryo development neglected the fluctuation of selective pressure in these earlier stages, and the fluctuation was potentially correlated with events of earlier stages, such as zygote genome activation (ZGA). Such oscillation in an earlier stage would further affect models of the evolutionary constraints on whole embryo development. Therefore, these earlier stages should be measured intensively in future EVO-DEVO studies. Tiancheng Liu, Lin Yu, Guohui Ding, Zhen Wang, Lei Liu, Hong Li, and Yixue Li Copyright © 2015 Tiancheng Liu et al. All rights reserved. Statistical Genomic Approach Identifies Association between FSHR Polymorphisms and Polycystic Ovary Morphology in Women with Polycystic Ovary Syndrome Sun, 26 Jul 2015 12:40:17 +0000 Background. Single-nucleotide polymorphisms (SNPs) in the follicle stimulating hormone receptor (FSHR) gene are associated with PCOS. However, their relationship to the polycystic ovary (PCO) morphology remains unknown. This study aimed to investigate whether PCOS related SNPs in the FSHR gene are associated with PCO in women with PCOS. Methods. Patients were grouped into PCO and non-PCO groups. Genomic genotypes were profiled using Affymetrix human genome SNP chip 6. Two polymorphisms (rs2268361 and rs2349415) of FSHR were analyzed using a statistical approach. Results. 
Significant differences were found in the allele distributions of the GG genotype of rs2268361 between the PCO and non-PCO groups (27.6% GG, 53.4% GA, and 19.0% AA versus 33.3% GG, 36.5% GA, and 30.2% AA), while no significant differences were found in the allele distributions of the GG genotype of rs2349415. When rs2268361 was considered, there were statistically significant differences of serum follicle stimulating hormone, estradiol, and sex hormone binding globulin between genotypes in the PCO group. In case of the rs2349415 SNP, only serum sex hormone binding globulin was statistically different between genotypes in the PCO group. Conclusions. Functional variants in FSHR gene may contribute to PCO susceptibility in women with PCOS. Tao Du, Yu Duan, Kaiwen Li, Xiaomiao Zhao, Renmin Ni, Yu Li, and Dongzi Yang Copyright © 2015 Tao Du et al. All rights reserved. Deciphering the Correlation between Breast Tumor Samples and Cell Lines by Integrating Copy Number Changes and Gene Expression Profiles Sun, 26 Jul 2015 11:35:30 +0000 Breast cancer is one of the most common cancers with high incident rate and high mortality rate worldwide. Although different breast cancer cell lines were widely used in laboratory investigations, accumulated evidences have indicated that genomic differences exist between cancer cell lines and tissue samples in the past decades. The abundant molecular profiles of cancer cell lines and tumor samples deposited in the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas now allow a systematical comparison of the breast cancer cell lines with breast tumors. We depicted the genomic characteristics of breast primary tumors based on the copy number variation and gene expression profiles and the breast cancer cell lines were compared to different subgroups of breast tumors. 
We found that some of the breast cancer cell lines show high correlation with the tumor group that agrees with previous knowledge, while a large share of them do not, including the most used MCF7, MDA-MB-231, and T-47D. We presented a computational framework to identify cell lines that most closely resemble a certain tumor group for the breast tumor study. Our investigation presents a useful guide to bridge the gap between cell lines and tumors and helps to select the most suitable cell line models for personalized cancer studies. Yi Sun and Qi Liu Copyright © 2015 Yi Sun and Qi Liu. All rights reserved. Network Comparison of Inflammation in Colorectal Cancer and Alzheimer’s Disease Sun, 26 Jul 2015 09:45:50 +0000 Recently, a large clinical study revealed an inverse correlation of individual risk of cancer versus Alzheimer’s disease (AD). No explanation exists for this anticorrelation at the molecular level; however, inflammation is crucial to the pathogenesis of both diseases, making it necessary to understand the differing signaling usage during the inflammatory responses distinct to each disease. Using a subpathway analysis approach, we identified numerous well-known and previously unknown pathways enriched in datasets from both diseases. Here, we present the quantitative importance of the inflammatory response in the two disease pathologies and summarize signal transduction pathways common to both diseases that are affected by inflammation. Sungjin Park, Seok Jong Yu, Yongseong Cho, Curt Balch, Jinhyuk Lee, Yon Hui Kim, and Seungyoon Nam Copyright © 2015 Sungjin Park et al. All rights reserved. The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics Sun, 26 Jul 2015 09:44:05 +0000 In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. 
In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) the computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., ,000) the predictive accuracy of ridge regression is slightly higher than the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that for large-scale initiatives, such as biobanks, sample sizes can be achieved where ridge regression compared to the classical approach improves predictive accuracy substantially. Ronald de Vlaming and Patrick J. F. Groenen Copyright © 2015 Ronald de Vlaming and Patrick J. F. Groenen. All rights reserved. FARMS: A New Algorithm for Variable Selection Sun, 26 Jul 2015 07:39:02 +0000 Large datasets including an extensive number of covariates are generated these days in many different situations, for instance, in detailed genetic studies of outbreed human populations or in complex analyses of immune responses to different infections. Aiming at informing clinical interventions or vaccine design, methods for variable selection identifying those variables with the optimal prediction performance for a specific outcome are crucial. However, testing for all potential subsets of variables is not feasible and alternatives to existing methods are needed. 
Here, we describe a new method to handle such complex datasets, referred to as FARMS, that combines forward and all subsets regression for model selection. We apply FARMS to a host genetic and immunological dataset of over 800 individuals from Lima (Peru) and Durban (South Africa) who were HIV infected and tested for antiviral immune responses. This dataset includes more than 500 explanatory variables: around 400 variables with information on HIV immune reactivity and around 100 individual genetic characteristics. We have implemented FARMS in R statistical language and we showed that FARMS is fast and outcompetes other comparable commonly used approaches, thus providing a new tool for the thorough analysis of complex datasets without the need for massive computational infrastructure. Susana Perez-Alvarez, Guadalupe Gómez, and Christian Brander Copyright © 2015 Susana Perez-Alvarez et al. All rights reserved. Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods Sun, 26 Jul 2015 07:38:47 +0000 MicroRNAs constitute an important class of noncoding, single-stranded, ~22 nucleotide long RNA molecules encoded by endogenous genes. They play an important role in regulating gene transcription and the regulation of normal development. MicroRNAs can be associated with disease; however, only a few microRNA-disease associations have been confirmed by traditional experimental approaches. We introduce two methods to predict microRNA-disease association. The first method, KATZ, focuses on integrating the social network analysis method with machine learning and is based on networks derived from known microRNA-disease associations, disease-disease associations, and microRNA-microRNA associations. The other method, CATAPULT, is a supervised machine learning method. We applied the two methods to 242 known microRNA-disease associations and evaluated their performance using leave-one-out cross-validation and 3-fold cross-validation. 
Experiments showed that our methods outperformed the state-of-the-art methods. Quan Zou, Jinjin Li, Qingqi Hong, Ziyu Lin, Yun Wu, Hua Shi, and Ying Ju Copyright © 2015 Quan Zou et al. All rights reserved. Low-Rank and Sparse Matrix Decomposition for Genetic Interaction Data Sun, 26 Jul 2015 07:34:19 +0000 Background. Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. One approach to analyze EMAP data is to identify gene modules with densely interacting genes. In addition, the genetic interaction score (S score) reflects the degree of synergizing or mitigating effect of two mutants, which is also informative. Statistical approaches that exploit both modularity and the pairwise interactions may provide more insight into the underlying biology. However, the high missing rate in EMAP data hinders the development of such approaches. To address the above problem, we adopted the matrix decomposition methodology “low-rank and sparse decomposition” (LRSDec) to decompose the EMAP data matrix into a low-rank part and a sparse part. Results. LRSDec has been demonstrated as an effective technique for analyzing EMAP data. We applied it to a synthetic dataset and to an EMAP dataset studying RNA-related processes in Saccharomyces cerevisiae. Global views of the genetic cross talk between different RNA-related protein complexes and processes have been structured, and novel functions of genes have been predicted. Yishu Wang, Dejie Yang, and Minghua Deng Copyright © 2015 Yishu Wang et al. All rights reserved. Biometrics and Biosecurity 2014 Tue, 21 Jul 2015 11:00:24 +0000 Tai-hoon Kim, Sabah Mohammed, Wai-Chi Fang, and Carlos Ramos Copyright © 2015 Tai-hoon Kim et al. All rights reserved. 
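The network-based scoring mentioned in the microRNA-disease abstract above can be illustrated with the Katz index, a standard social network analysis measure that sums walks between two nodes and down-weights longer walks by a factor beta per step. The sketch below truncates the sum at a maximum walk length. The function names, the toy network, and the values of `beta` and `max_length` are assumptions; the abstract does not state the exact form the authors use.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def katz_scores(adjacency, beta=0.5, max_length=3):
    """Truncated Katz index: sum over l = 1..max_length of beta**l * A**l."""
    n = len(adjacency)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # holds A**l, starting at l = 1
    for length in range(1, max_length + 1):
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adjacency)
    return scores


# Hypothetical network: node 0 = microRNA m1, node 1 = microRNA m2,
# node 2 = disease d1. m1 is similar to m2, and m2 is associated with d1.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_scores(adjacency, beta=0.5, max_length=2)
```

In this toy network m1 has no direct link to d1, yet it receives a nonzero score through the length-2 walk via m2, which is how such a measure can propose unconfirmed associations.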
A Multilayer Secure Biomedical Data Management System for Remotely Managing a Very Large Number of Diverse Personal Healthcare Devices Mon, 13 Jul 2015 08:25:22 +0000 In this paper, a multilayer secure biomedical data management system for managing a very large number of diverse personal health devices is proposed. The system has the following characteristics: the system supports international standard communication protocols to achieve interoperability. The system is integrated in the sense that both a PHD communication system and a remote PHD management system work together as a single system. Finally, the system proposed in this paper provides user/message authentication processes to securely transmit biomedical data measured by PHDs based on the concept of a biomedical signature. Some experiments, including the stress test, have been conducted to show that the system proposed/constructed in this study performs very well even when a very large number of PHDs are used. For a stress test, up to 1,200 threads are made to represent the same number of PHD agents. The loss ratio of the ISO/IEEE 11073 messages in the normal system is as high as 14% when 1,200 PHD agents are connected. On the other hand, no message loss occurs in the multilayered system proposed in this study, which demonstrates the superiority of the multilayered system to the normal system with regard to heavy traffic. KeeHyun Park and SeungHyeon Lim Copyright © 2015 KeeHyun Park and SeungHyeon Lim. All rights reserved. Towards a Food Safety Knowledge Base Applicable in Crisis Situations and Beyond Mon, 13 Jul 2015 08:23:27 +0000 In case of contamination in the food chain, fast action is required in order to reduce the numbers of affected people. In such situations, being able to predict the fate of agents in foods would help risk assessors and decision makers in assessing the potential effects of a specific contamination event and thus enable them to deduce the appropriate mitigation measures. 
One efficient strategy supporting this is using model based simulations. However, application in crisis situations requires ready-to-use and easy-to-adapt models to be available from the so-called food safety knowledge bases. Here, we illustrate this concept and its benefits by applying the modular open source software tools PMM-Lab and FoodProcess-Lab. As a fictitious sample scenario, an intentional ricin contamination at a beef salami production facility was modelled. Predictive models describing the inactivation of ricin were reviewed, relevant models were implemented with PMM-Lab, and simulations on residual toxin amounts in the final product were performed with FoodProcess-Lab. Due to the generic and modular modelling concept implemented in these tools, they can be applied to simulate virtually any food safety contamination scenario. Apart from the application in crisis situations, the food safety knowledge base concept will also be useful in food quality and safety investigations. Alexander Falenski, Armin A. Weiser, Christian Thöns, Bernd Appel, Annemarie Käsbohrer, and Matthias Filter Copyright © 2015 Alexander Falenski et al. All rights reserved. A Multimodal User Authentication System Using Faces and Gestures Mon, 13 Jul 2015 07:58:22 +0000 As a novel approach to perform user authentication, we propose a multimodal biometric system that uses faces and gestures obtained from a single vision sensor. Unlike typical multimodal biometric systems using physical information, the proposed system utilizes gesture video signals combined with facial images. Whereas physical information such as face, fingerprints, and iris is fixed and not changeable, behavioral information such as gestures and signatures can be freely changed by the user, similar to a password. Therefore, it can be a countermeasure when the physical information is exposed. 
We aim to investigate the potential possibility of using gestures as a signal for biometric system and the robustness of the proposed multimodal user authentication system. Through computational experiments on a public database, we confirm that gesture information can help to improve the authentication performance. Hyunsoek Choi and Hyeyoung Park Copyright © 2015 Hyunsoek Choi and Hyeyoung Park. All rights reserved. Quantification of Hepatorenal Index for Computer-Aided Fatty Liver Classification with Self-Organizing Map and Fuzzy Stretching from Ultrasonography Mon, 13 Jul 2015 07:57:28 +0000 Accurate measures of liver fat content are essential for investigating hepatic steatosis. For a noninvasive inexpensive ultrasonographic analysis, it is necessary to validate the quantitative assessment of liver fat content so that fully automated reliable computer-aided software can assist medical practitioners without any operator subjectivity. In this study, we attempt to quantify the hepatorenal index difference between the liver and the kidney with respect to the multiple severity status of hepatic steatosis. In order to do this, a series of carefully designed image processing techniques, including fuzzy stretching and edge tracking, are applied to extract regions of interest. Then, an unsupervised neural learning algorithm, the self-organizing map, is designed to establish characteristic clusters from the image, and the distribution of the hepatorenal index values with respect to the different levels of the fatty liver status is experimentally verified to estimate the differences in the distribution of the hepatorenal index. Such findings will be useful in building reliable computer-aided diagnostic software if combined with a good set of other characteristic feature sets and powerful machine learning classifiers in the future. Kwang Baek Kim and Chang Won Kim Copyright © 2015 Kwang Baek Kim and Chang Won Kim. All rights reserved. 
Biometrics Analysis and Evaluation on Korean Makgeolli Using Brainwaves and Taste Biological Sensor System Mon, 13 Jul 2015 07:53:08 +0000 Several methods are available for measuring food taste. Sensory evaluation, for instance, is a typical method in which panels taste and smell food and rate the degree of its taste characteristics, intensity, and pleasure. The traditional sensory evaluation method entails many issues, such as the difficulty of forming a panel and the cost of evaluation; moreover, it is localized to particular areas. Accordingly, this paper aimed to select a food from one particular area, to compare and review sensory evaluations against measurements from a taste biological sensor, to present an analysis of brainwaves using EEG, and finally to propose a new method for sensory evaluation. The researchers purchased eight types of rice wine and conducted a sensory evaluation scored with a maximum of nine points. These eight types of Makgeolli were generalized by generating multidimensional data with the TS-5000z, then learning mapping points and scaling them. The contribution of this paper, therefore, is to overcome the disadvantages of sensory evaluation by using the suggested taste biological sensor system. Yong-Sung Kim and Yong-Suk Kim Copyright © 2015 Yong-Sung Kim and Yong-Suk Kim. All rights reserved. Bilateral Image Subtraction and Multivariate Models for the Automated Triaging of Screening Mammograms Thu, 09 Jul 2015 11:35:14 +0000 Mammography is the most common and effective breast cancer screening test. However, the rate of positive findings is very low, making the radiologic interpretation monotonous and prone to errors. This work presents a computer-aided diagnosis (CADx) method that aims to automatically triage mammogram sets.
The method coregisters the left and right mammograms, extracts image features, and classifies the subjects as at risk of having malignant calcifications (CS), at risk of having malignant masses (MS), or healthy (HS). In this study, 449 subjects (197 CS, 207 MS, and 45 HS) from a public database were used to train and evaluate the CADx. Percentile-rank (p-rank) and z-normalizations were used. For the p-rank, the CS versus HS model achieved a cross-validation accuracy of 0.797 with an area under the receiver operating characteristic curve (AUC) of 0.882; the MS versus HS model obtained an accuracy of 0.772 and an AUC of 0.842. For the z-normalization, the CS versus HS model achieved an accuracy of 0.825 with an AUC of 0.882 and the MS versus HS model obtained an accuracy of 0.698 and an AUC of 0.807. The proposed method has the potential to rank cases with a high probability of malignant findings, aiding in the prioritization of the radiologists' work list. José Celaya-Padilla, Antonio Martinez-Torteya, Juan Rodriguez-Rojas, Jorge Galvan-Tejada, Victor Treviño, and José Tamez-Peña Copyright © 2015 José Celaya-Padilla et al. All rights reserved. Functional and Structural Consequences of Damaging Single Nucleotide Polymorphisms in Human Prostate Cancer Predisposition Gene RNASEL Wed, 08 Jul 2015 09:10:00 +0000 Prostate cancer (PrCa), a commonly diagnosed cancer, is regulated by the gene RNASEL, previously known as PRCA1, which codes for ribonuclease L, an integral part of the interferon-regulated system that mediates the antiviral and antiproliferative roles of the interferons. Both somatic and germline mutations have been implicated in causing prostate cancer.
With an array of Single Nucleotide Polymorphism data available on dbSNP, this study was designed to sort out functional SNPs in RNASEL using several established computational tools, namely SIFT, PolyPhen, SNPs&GO, Fathmm, ConSurf, UTRScan, PDBsum, Tm-Align, I-Mutant, and Project HOPE, for functional and structural assessment, solvent accessibility, molecular dynamics, and energy minimization study. Among 794 RNASEL SNP entries, 124 SNPs were found to be nonsynonymous, of which SIFT predicted 13 nsSNPs as nontolerable whereas PolyPhen-2 predicted 28. SNPs found in the 3′ and 5′ UTRs were also assessed. By combining the results of six tools with different perspectives, an aggregate result was produced in which nine nsSNPs were found most likely to exert a deleterious effect. 3D models of the mutated proteins were generated to determine the functional and structural effects of the mutations on ribonuclease L. The initial findings were reinforced by the results from I-Mutant and Project HOPE, as these tools predicted significant structural and functional instability of the mutated proteins. The Expasy-ProSit tool located the mutations in the functional domains of the protein. Building on the preceding analyses, this study identified the three most damaging nsSNPs in the available SNP data: rs151296858 (G59S), rs145415894 (A276V), and rs35896902 (R592H). As no such studies involving polymorphisms of RNASEL could be found, the results of the current study should be helpful for future work on prostate cancer in males. Amit Datta, Md. Habibul Hasan Mazumder, Afrin Sultana Chowdhury, and Md. Anayet Hasan Copyright © 2015 Amit Datta et al. All rights reserved. 
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets Sun, 05 Jul 2015 07:40:37 +0000 The high-throughput technique ChIP-seq, which couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which are typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify motifs in large scale ChIP-seq data sets. Inspired by the emerging substrings mining strategy, it finds the enriched substrings and then searches the neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif is not bound by the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. Detection of the real binding motifs and processing of the full size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is well suited to motif finding in ChIP-seq data and that it performs better than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME. Yipu Zhang and Ping Wang Copyright © 2015 Yipu Zhang and Ping Wang. All rights reserved. Big Data Analytics in Healthcare Thu, 02 Jul 2015 12:06:04 +0000 The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. 
Big data analytics has recently been applied towards aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space are still hindered by some fundamental problems inherent within the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research which targets utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field which have the ability to provide meaningful impact on healthcare delivery are also examined. Ashwin Belle, Raghuram Thiagarajan, S. M. Reza Soroushmehr, Fatemeh Navidi, Daniel A. Beard, and Kayvan Najarian Copyright © 2015 Ashwin Belle et al. All rights reserved. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature Thu, 25 Jun 2015 06:28:00 +0000 The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges in extracting its biological meaning. When searching for coexpressed genes, the data mining process is largely affected by the selection of algorithms. Thus, it is highly desirable to provide multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, namely the mean, the regression, the delegate, and the ensemble models, to identify coexpressed genes, and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. 
Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows users to choose among multiple algorithms for analyzing gene expression signatures. Ning Ye, Hengfu Yin, Jingjing Liu, Xiaogang Dai, and Tongming Yin Copyright © 2015 Ning Ye et al. All rights reserved. Combined Analysis of SNP Array Data Identifies Novel CNV Candidates and Pathways in Ependymoma and Mesothelioma Mon, 22 Jun 2015 06:06:41 +0000 Copy number variation is a class of structural genomic modifications that includes the gain and loss of a specific genomic region, which may include an entire gene. Many studies have used low-resolution techniques to identify regions that are frequently lost or amplified in cancer. Usually, researchers choose to use proprietary or non-open-source software to detect these regions because the graphical interface tends to be easier to use. In this study, we combined two different open-source packages into an innovative strategy to identify novel copy number variations and pathways associated with cancer. We used published mesothelioma and ependymoma datasets to assess our tool. We detected previously described and novel copy number variations that are associated with cancer chemotherapy resistance. We also identified altered pathways associated with these diseases, such as cell adhesion in patients with mesothelioma and negative regulation of glutamatergic synaptic transmission in ependymoma patients. In conclusion, we present a novel strategy using open-source software to identify copy number variations and altered pathways associated with cancer. Gabriel Wajnberg, Benilton S. Carvalho, Carlos G. Ferreira, and Fabio Passetti Copyright © 2015 Gabriel Wajnberg et al. All rights reserved. 
Network-Based Logistic Classification with an Enhanced L1/2 Solver Reveals Biomarker and Subnetwork Signatures for Diagnosing Lung Cancer Tue, 16 Jun 2015 08:08:23 +0000 Identifying biomarkers and signaling pathways is a critical step in genomic studies, in which the regularization method is a widely used feature extraction approach. However, most regularizers are based on the L1-norm; their results are not good enough in terms of sparsity and interpretation and are asymptotically biased, especially in genomic research. Recently, a large amount of molecular interaction information about disease-related biological processes has been gathered in various databases, which focus on many aspects of biological systems. In this paper, we use an enhanced L1/2 penalized solver to penalize a network-constrained logistic regression model, called an enhanced L1/2 net, where the predictors are based on gene-expression data with biological network knowledge. Extensive simulation studies showed that our proposed approach outperforms L1 regularization, the old L1/2 penalized solver, and the Elastic net approaches in terms of classification accuracy and stability. Furthermore, we applied our method to lung cancer data analysis and found that it achieves higher predictive accuracy than L1 regularization, the old L1/2 penalized solver, and the Elastic net approaches, while selecting fewer but informative biomarkers and pathways. Hai-Hui Huang, Yong Liang, and Xiao-Ying Liu Copyright © 2015 Hai-Hui Huang et al. All rights reserved. The Impact of Normalization Methods on RNA-Seq Data Analysis Mon, 15 Jun 2015 12:14:24 +0000 High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. 
The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. J. Zyprych-Walczak, A. Szabelska, L. Handschuh, K. Górczak, K. Klamecka, M. Figlerowicz, and I. Siatkowski Copyright © 2015 J. Zyprych-Walczak et al. All rights reserved. A Robust Supervised Variable Selection for Noisy High-Dimensional Data Tue, 02 Jun 2015 06:53:56 +0000 The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. 
We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we also perform the computations for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers. Jan Kalina and Anna Schlenker Copyright © 2015 Jan Kalina and Anna Schlenker. All rights reserved. Toward a Literature-Driven Definition of Big Data in Healthcare Tue, 02 Jun 2015 06:08:12 +0000 Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals and the number of variables for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with Log(n × p) ≥ 7, where n is the number of statistical individuals and p is the number of variables. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. 
Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Conversely, data can be reused without necessarily being big, for example, secondary use of Electronic Medical Records (EMR) data. Emilie Baro, Samuel Degoul, Régis Beuscart, and Emmanuel Chazard Copyright © 2015 Emilie Baro et al. All rights reserved. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies Mon, 01 Jun 2015 14:35:07 +0000 Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to a loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries. Alexandre G. de Brevern, Jean-Philippe Meyniel, Cécile Fairhead, Cécile Neuvéglise, and Alain Malpertuy Copyright © 2015 Alexandre G. de Brevern et al. All rights reserved. 
Improved Diagnostic Multimodal Biomarkers for Alzheimer’s Disease and Mild Cognitive Impairment Thu, 28 May 2015 06:17:43 +0000 The early diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) is very important for treatment research and patient care purposes. Few biomarkers are currently considered in clinical settings, and their use is still optional. The objective of this work was to determine whether multimodal and nonpreviously AD associated features could improve the classification accuracy between AD, MCI, and healthy controls, which may impact future AD biomarkers. For this, Alzheimer’s Disease Neuroimaging Initiative database was mined for case-control candidates. At least 652 baseline features extracted from MRI and PET analyses, biological samples, and clinical data up to February 2014 were used. A feature selection methodology that includes a genetic algorithm search coupled to a logistic regression classifier and forward and backward selection strategies was used to explore combinations of features. This generated diagnostic models with sizes ranging from 3 to 8, including well documented AD biomarkers, as well as unexplored image, biochemical, and clinical features. Accuracies of 0.85, 0.79, and 0.80 were achieved for HC-AD, HC-MCI, and MCI-AD classifications, respectively, when evaluated using a blind test set. In conclusion, a set of features provided additional and independent information to well-established AD biomarkers, aiding in the classification of MCI and AD. Antonio Martínez-Torteya, Víctor Treviño, and José G. Tamez-Peña Copyright © 2015 Antonio Martínez-Torteya et al. All rights reserved. Quantitative Assessment of the Association between Genetic Variants in MicroRNAs and Colorectal Cancer Risk Wed, 20 May 2015 09:31:47 +0000 Background. The associations between polymorphisms in microRNAs and the susceptibility of colorectal cancer (CRC) were inconsistent in previous studies. 
This study aims to quantify the strength of the association between four common polymorphisms in microRNAs (hsa-mir-146a rs2910164, hsa-mir-149 rs2292832, hsa-mir-196a2 rs11614913, and hsa-mir-499 rs3746444) and CRC risk. Methods. We searched PubMed, Web of Knowledge, and CNKI to find relevant studies. The combined odds ratio (OR) with 95% confidence interval (95% CI) was used to estimate the strength of the association in a fixed or random effect model. Results. Fifteen studies involving 5,486 CRC patients and 7,184 controls were included. Meta-analyses showed that rs3746444 was associated with CRC risk in Caucasians (OR = 0.57, 95% CI = 0.34–0.95). In the subgroup analysis, we found a significant association between rs2910164 and CRC in hospital based studies (OR = 1.24, 95% CI = 1.03–1.49). rs2292832 may be a high risk factor for CRC in population based studies (OR = 1.18, 95% CI = 1.08–1.38). Conclusion. This meta-analysis showed that rs2910164 and rs2292832 may increase the risk of CRC. However, the rs11614913 polymorphism may reduce the risk of CRC. rs3746444 may decrease the risk of CRC in Caucasians. Xiao-Xu Liu, Meng Wang, Dan Xu, Jian-Hai Yang, Hua-Feng Kang, Xi-Jing Wang, Shuai Lin, Peng-Tao Yang, Xing-Han Liu, and Zhi-Jun Dai Copyright © 2015 Xiao-Xu Liu et al. All rights reserved. Multiblock Discriminant Analysis for Integrative Genomic Study Sun, 17 May 2015 11:45:02 +0000 Human diseases are abnormal medical conditions in which multiple biological components are involved in complex ways. Nevertheless, most research contributions have relied on a single type of genetic data such as Single Nucleotide Polymorphism (SNP) or Copy Number Variation (CNV). Furthermore, epigenetic modifications and transcriptional regulation, as well as genomic variants, have to be considered to fully understand complex human diseases. 
We call the collection of the multiple heterogeneous data “multiblock data.” In this paper, we propose a novel Multiblock Discriminant Analysis (MultiDA) method that provides a new integrative genomic model for the multiblock analysis and an efficient algorithm for discriminant analysis. The integrative genomic model is built by exploiting the representative genomic data including SNP, CNV, DNA methylation, and gene expression. The efficient algorithm for the discriminant analysis identifies discriminative factors of the multiblock data. The discriminant analysis is essential to discover biomarkers in computational biology. The performance of the proposed MultiDA was assessed by intensive simulation experiments, where the outstanding performance comparing the related methods was reported. As a target application, we applied MultiDA to human brain data of psychiatric disorders. The findings and gene regulatory network derived from the experiment are discussed. Mingon Kang, Dong-Chul Kim, Chunyu Liu, and Jean Gao Copyright © 2015 Mingon Kang et al. All rights reserved. Intelligent Informatics in Translational Medicine Wed, 06 May 2015 08:30:36 +0000 Hao-Teng Chang, Tatsuya Akutsu, Sorin Draghici, Oliver Ray, and Tun-Wen Pai Copyright © 2015 Hao-Teng Chang et al. All rights reserved. Implication of Caspase-3 as a Common Therapeutic Target for Multineurodegenerative Disorders and Its Inhibition Using Nonpeptidyl Natural Compounds Mon, 04 May 2015 13:43:02 +0000 Caspase-3 has been identified as a key mediator of neuronal apoptosis. The present study identifies caspase-3 as a common player involved in the regulation of multineurodegenerative disorders, namely, Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HD), and amyotrophic lateral sclerosis (ALS). 
The protein interaction network prepared using the STRING database provides strong evidence of caspase-3 interactions with the metabolic cascade of these disorders, thus characterizing it as a potential therapeutic target for multiple neurodegenerative disorders. In silico molecular docking of selected nonpeptidyl natural compounds against caspase-3 revealed potent leads against this common therapeutic target. Rosmarinic acid and curcumin proved to be the most promising ligands (leads) mimicking the inhibitory action of peptidyl inhibitors, with the highest Gold fitness scores of 57.38 and 53.51, respectively. These results were in close agreement with the fitness score predicted using X-score, a consensus based scoring function to calculate the binding affinity. The nonpeptidyl inhibitors of caspase-3 identified in the present study mimic the inhibitory action of the previously identified peptidyl inhibitors. Since nonpeptidyl inhibitors are preferred drug candidates, the discovery of natural compounds as nonpeptidyl inhibitors is a significant step towards feasible drug development for neurodegenerative disorders. Saif Khan, Khurshid Ahmad, Eyad M. A. Alshammari, Mohd Adnan, Mohd Hassan Baig, Mohtashim Lohani, Pallavi Somvanshi, and Shafiul Haque Copyright © 2015 Saif Khan et al. All rights reserved. The TF-miRNA Coregulation Network in Oral Lichen Planus Sun, 03 May 2015 12:33:38 +0000 Oral lichen planus (OLP) is a chronic inflammatory disease that affects the oral mucosa, and some cases may eventually develop into oral squamous cell carcinoma. Therefore, pinpointing the molecular mechanisms underlying the pathogenesis of OLP is important for developing efficient treatments. Recently, the accumulation of large amounts of omics data, especially transcriptome data, has provided opportunities to investigate OLP from a systematic perspective. 
In this paper, assuming that the OLP associated genes have functional relationships, we present a new approach to identify OLP related gene modules from gene regulatory networks. In particular, we find that the gene modules regulated by both transcription factors (TFs) and microRNAs (miRNAs) play important roles in the pathogenesis of OLP and many genes in the modules have been reported to be related to OLP in the literature. Yu-Ling Zuo, Di-Ping Gong, Bi-Ze Li, Juan Zhao, Ling-Yue Zhou, Fang-Yang Shao, Zhao Jin, and Yuan He Copyright © 2015 Yu-Ling Zuo et al. All rights reserved. Prediction of Metabolic Gene Biomarkers for Neurodegenerative Disease by an Integrated Network-Based Approach Sun, 03 May 2015 11:35:44 +0000 Neurodegenerative diseases (NDs), such as Parkinson’s disease (PD) and Huntington’s disease (HD), have become more and more common among aged people worldwide. One hallmark of NDs is the presence of intracellular accumulation of specific pathogenic proteins that may result from abnormal function of metabolic processes. Previously, we have developed a computational method named Met-express that predicted key enzyme-coding genes in cancer development by integrating cancer gene coexpression network with the metabolic network. Here, we applied Met-express to predict key enzyme-coding genes in both PD and HD. Functional enrichment analysis and literature review of predicted genes suggested that there might be some common pathogenic metabolic pathways for PD and HD. We further found that the predicted genes had significant functional association with known disease genes, with some of them already documented as biomarkers or therapeutic targets for NDs. As such, the predicted metabolic genes may be of use as novel biomarkers not only for ND diagnosis but also for potential therapeutic treatments. Qi Ni, Xianming Su, Jingqi Chen, and Weidong Tian Copyright © 2015 Qi Ni et al. All rights reserved. 
Identification of Gene and MicroRNA Signatures for Oral Cancer Developed from Oral Leukoplakia Sun, 03 May 2015 11:12:40 +0000 In clinic, oral leukoplakia (OLK) may develop into oral cancer. However, the mechanism underlying this transformation is still unclear. In this work, we present a new pipeline to identify oral cancer related genes and microRNAs (miRNAs) by integrating both gene and miRNA expression profiles. In particular, we find some network modules as well as their miRNA regulators that play important roles in the development of OLK to oral cancer. Among these network modules, 91.67% of genes and 37.5% of miRNAs have been previously reported to be related to oral cancer in literature. The promising results demonstrate the effectiveness and efficiency of our proposed approach. Guanghui Zhu, Yuan He, Shaofang Yang, Beimin Chen, Min Zhou, and Xin-Jian Xu Copyright © 2015 Guanghui Zhu et al. All rights reserved. A Heparan Sulfate-Binding Cell Penetrating Peptide for Tumor Targeting and Migration Inhibition Sun, 03 May 2015 09:21:33 +0000 As heparan sulfate proteoglycans (HSPGs) are known as co-receptors to interact with numerous growth factors and then modulate downstream biological activities, overexpression of HS/HSPG on cell surface acts as an increasingly reliable prognostic factor in tumor progression. Cell penetrating peptides (CPPs) are short-chain peptides developed as functionalized vectors for delivery approaches of impermeable agents. On cell surface negatively charged HS provides the initial attachment of basic CPPs by electrostatic interaction, leading to multiple cellular effects. Here a functional peptide (CPPecp) has been identified from critical HS binding region in hRNase3, a unique RNase family member with in vitro antitumor activity. In this study we analyze a set of HS-binding CPPs derived from natural proteins including CPPecp. 
In addition to cellular binding and internalization, CPPecp demonstrated multiple functions including strong binding activity to tumor cell surface with higher HS expression, significant inhibitory effects on cancer cell migration, and suppression of angiogenesis in vitro and in vivo. Moreover, different from conventional highly basic CPPs, CPPecp facilitated magnetic nanoparticle to selectively target tumor site in vivo. Therefore, CPPecp could engage its capacity to be developed as biomaterials for diagnostic imaging agent, therapeutic supplement, or functionalized vector for drug delivery. Chien-Jung Chen, Kang-Chiao Tsai, Ping-Hsueh Kuo, Pei-Lin Chang, Wen-Ching Wang, Yung-Jen Chuang, and Margaret Dah-Tsyr Chang Copyright © 2015 Chien-Jung Chen et al. All rights reserved. A Survey on the Computational Approaches to Identify Drug Targets in the Postgenomic Era Tue, 28 Apr 2015 07:02:35 +0000 Identifying drug targets plays essential roles in designing new drugs and combating diseases. Unfortunately, our current knowledge about drug targets is far from comprehensive. Screening drug targets in the lab is an expensive and time-consuming procedure. In the past decade, the accumulation of various types of omics data makes it possible to develop computational approaches to predict drug targets. In this paper, we make a survey on the recent progress being made on computational methodologies that have been developed to predict drug targets based on different kinds of omics data and drug property data. Yan-Fen Dai and Xing-Ming Zhao Copyright © 2015 Yan-Fen Dai and Xing-Ming Zhao. All rights reserved. A Large-Scale Structural Classification of Antimicrobial Peptides Mon, 27 Apr 2015 12:43:57 +0000 Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses. 
AMPs have abundant sequences and structures, two fundamental resources for bioinformatics research, but analyses of how they associate with each other are either nonexistent or limited to partial classification and data. We thus present A Database of Anti-Microbial peptides (ADAM), which contains 7,007 unique sequences and 759 structures, to systematically establish comprehensive associations between AMP sequences and structures through structural folds and to provide easy access to view their relationships. Thirty distinct AMP structural fold clusters with more than one structure are detected, and about a thousand AMPs are associated with at least one structural fold cluster. According to ADAM, AMP structural folds are limited: AMPs cover only about 3% of the overall protein fold space. Hao-Ting Lee, Chen-Che Lee, Je-Ruei Yang, Jim Z. C. Lai, and Kuan Y. Chang Copyright © 2015 Hao-Ting Lee et al. All rights reserved. Predicting Flavin and Nicotinamide Adenine Dinucleotide-Binding Sites in Proteins Using the Fragment Transformation Method Mon, 27 Apr 2015 11:48:34 +0000 We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank structures of proteins that bind to at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential for each residue was calculated. Setting the false positive rate at 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. 
Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins, and the most important is that the requirement of conservation of residue types and local structures in the FAD- and NAD-binding sites can be verified. Chih-Hao Lu, Chin-Sheng Yu, Yu-Feng Lin, and Jin-Yi Chen Copyright © 2015 Chih-Hao Lu et al. All rights reserved. Functional Genomics, Genetics, and Bioinformatics Wed, 22 Apr 2015 06:20:06 +0000 Youping Deng, Hongwei Wang, Ryuji Hamamoto, David Schaffer, and Shiwei Duan Copyright © 2015 Youping Deng et al. All rights reserved. Integrated Analysis of Multiscale Large-Scale Biological Data for Investigating Human Disease Mon, 20 Apr 2015 09:13:13 +0000 Tao Huang, Lei Chen, Mingyue Zheng, and Jiangning Song Copyright © 2015 Tao Huang et al. All rights reserved. Application of Systems Biology and Bioinformatics Methods in Biochemistry and Biomedicine 2014 Sun, 19 Apr 2015 11:37:29 +0000 Yudong Cai, Tao Huang, Lei Chen, and Bing Niu Copyright © 2015 Yudong Cai et al. All rights reserved. A Practical and Scalable Tool to Find Overlaps between Sequences Sun, 19 Apr 2015 10:48:30 +0000 The evolution of the next generation sequencing technology increases the demand for efficient solutions, in terms of space and time, for several bioinformatics problems. This paper presents a practical and easy-to-implement solution for one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree. The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure. The paper presents techniques for parallel implementations of the proposed solution. Experimental evaluation indicates superior results in terms of space and time over existing solutions. Results also show that the proposed technique is highly scalable in a parallel execution environment. Maan Haj Rachid and Qutaibah Malluhi Copyright © 2015 Maan Haj Rachid and Qutaibah Malluhi. All rights reserved. 
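The all-pairs suffix-prefix problem named in the abstract above can be illustrated with a naive baseline. The sketch below is not the compact prefix tree of the paper; it is a brute-force version, assuming plain Python strings as sequences, and the function names `longest_overlap` and `all_pairs_suffix_prefix` are hypothetical. It only shows what output an efficient solution is expected to produce.

```python
def longest_overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    # Try the longest candidate first and stop at the first match.
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def all_pairs_suffix_prefix(seqs):
    """Map every ordered pair (i, j), i != j, to its suffix-prefix overlap."""
    return {
        (i, j): longest_overlap(a, b)
        for i, a in enumerate(seqs)
        for j, b in enumerate(seqs)
        if i != j
    }


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGAACG"]  # toy reads, not real data
    for pair, length in sorted(all_pairs_suffix_prefix(reads).items()):
        print(pair, length)
```

This baseline compares every ordered pair directly, so its cost grows quadratically with the number of sequences; that growth is the motivation for index-based structures such as the tree described in the abstract.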
High Order Gene-Gene Interactions in Eight Single Nucleotide Polymorphisms of Renin-Angiotensin System Genes for Hypertension Association Study Sun, 19 Apr 2015 09:58:13 +0000 Several single nucleotide polymorphisms (SNPs) of renin-angiotensin system (RAS) genes are associated with hypertension (HT), but most studies of them focus on single-locus effects. Here, we introduce an unbalanced function based on multifactor dimensionality reduction (MDR) for multilocus genotypes to detect high order gene-gene (SNP-SNP) interactions in unbalanced case and control HT data. Eight SNPs of three RAS genes (angiotensinogen, AGT; angiotensin-converting enzyme, ACE; angiotensin II type 1 receptor, AT1R) that showed no significant genotype differences between HT and non-HT subjects were included. In 2- to 6-locus models of the SNP-SNP interaction, the SNPs of the AGT and ACE genes were associated with hypertension (bootstrapping odds ratio [Boot-OR] = 1.972~3.785; 95% confidence interval (CI) 1.26~6.21; ). In the 7- and 8-locus models, SNP A1166C of the AT1R gene joins the interaction, improving the maximum Boot-OR values to 4.050 to 4.483; CI = 2.49 to 7.29; . In conclusion, epistasis networks are identified by eight SNP-SNP interaction models. The AGT, ACE, and AT1R genes jointly affect susceptibility to hypertension; the SNPs of ACE have the main hypertension-associated effect and interact with the SNPs of the AGT and AT1R genes. Cheng-Hong Yang, Yu-Da Lin, Shyh-Jong Wu, Li-Yeh Chuang, and Hsueh-Wei Chang Copyright © 2015 Cheng-Hong Yang et al. All rights reserved. Effect of Electrode Shape on Impedance of Single HeLa Cell: A COMSOL Simulation Thu, 16 Apr 2015 08:39:21 +0000 In disease prophylaxis, single cell inspection provides more detailed data compared to conventional examinations. At the individual cell level, the electrical properties of the cell are helpful for understanding the effects of cellular behavior. 
The electric field distribution affects the results of single cell impedance measurements, whereas the electrode geometry affects the electric field distribution. Therefore, this study obtained numerical solutions by using the COMSOL Multiphysics package to perform FEM simulations of the effects of electrode geometry on microfluidic devices. An equivalent circuit model incorporating the PBS solution, a pair of electrodes, and a cell is used to obtain the impedance of a single HeLa cell. Simulations indicated that the circle and parallel electrodes provide higher electric field strength compared to cross and standard electrodes at the same operating voltage. Additionally, increasing the operating voltage reduces the impedance magnitude of a single HeLa cell for all electrode shapes. Decreasing the impedance magnitude of the single HeLa cell increases measurement sensitivity, but a higher operating voltage will damage the cell. Min-Haw Wang and Wen-Hao Chang Copyright © 2015 Min-Haw Wang and Wen-Hao Chang. All rights reserved. Evolutionary Pattern and Regulation Analysis to Support Why Diversity Functions Existed within PPAR Gene Family Members Wed, 15 Apr 2015 14:21:56 +0000 Peroxisome proliferator-activated receptor (PPAR) gene family members exhibit distinct patterns of distribution in tissues and differ in function. The purpose of this study is to investigate the evolutionary impacts on the functional diversity of PPAR members and the regulatory differences in their gene expression patterns. Sixty-three homologous sequences of PPAR genes from 31 species were collected and analyzed. The results showed that the three distinct types of the PPAR gene family may have emerged from two gene duplication events. 
The conserved HOLI (ligand binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variant LCR domains may be responsible for their divergence in function. The positive selection sites in the HOLI domain help PPARs evolve towards diverse functions. The evolutionary variants in the promoter regions and 3′ UTR regions of PPARs result in different transcription factors and miRNAs being involved in regulating PPAR members, which may eventually affect their expression and tissue distribution. These results indicate that gene duplication events, selection pressure on the HOLI domain, and variants in the promoter and 3′ UTR are essential for PPAR evolution and the acquisition of diverse functions. Tianyu Zhou, Xiping Yan, Guosong Wang, Hehe Liu, Xiang Gan, Tao Zhang, Jiwen Wang, and Liang Li Copyright © 2015 Tianyu Zhou et al. All rights reserved. Relationship between Hyperuricemia and Haar-Like Features on Tongue Images Wed, 15 Apr 2015 13:36:54 +0000 Objective. To investigate differences in tongue images of subjects with and without hyperuricemia. Materials and Methods. This population-based case-control study was performed in 2012-2013. We collected data from 46 case subjects with hyperuricemia and 46 control subjects, including results of biochemical examinations and tongue images. Symmetrical Haar-like features based on integral images were extracted from tongue images. T-tests were performed to determine the ability of extracted features to distinguish between the case and control groups. We first selected features using the common criterion , then conducted further examination of feature characteristics and feature selection using means and standard deviations of distributions in the case and control groups. Results. A total of 115,683 features were selected using the criterion . 
The maximum area under the receiver operating characteristic curve (AUC) of these features was 0.877. The sensitivity of the feature with the maximum AUC value was 0.800 and specificity was 0.826 when the Youden index was maximized. Features that performed well were concentrated in the tongue root region. Conclusions. Symmetrical Haar-like features enabled discrimination of subjects with and without hyperuricemia in our sample. The locations of these discriminative features were in agreement with the interpretation of tongue appearance in traditional Chinese and Western medicine. Yan Cui, Shizhong Liao, Hongwu Wang, Hongyu Liu, Wenhua Wang, and Liqun Yin Copyright © 2015 Yan Cui et al. All rights reserved. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling Wed, 15 Apr 2015 12:38:00 +0000 An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). 
The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. Hala Alshamlan, Ghada Badr, and Yousef Alohali Copyright © 2015 Hala Alshamlan et al. All rights reserved. Identification of Novel Breast Cancer Subtype-Specific Biomarkers by Integrating Genomics Analysis of DNA Copy Number Aberrations and miRNA-mRNA Dual Expression Profiling Wed, 15 Apr 2015 11:49:05 +0000 Breast cancer is a heterogeneous disease with well-defined molecular subtypes. Currently, comparative genomic hybridization arrays (aCGH) techniques have been developed rapidly, and recent evidences in studies of breast cancer suggest that tumors within gene expression subtypes share similar DNA copy number aberrations (CNA) which can be used to further subdivide subtypes. Moreover, subtype-specific miRNA expression profiles are also proposed as novel signatures for breast cancer classification. The identification of mRNA or miRNA expression-based breast cancer subtypes is considered an instructive means of prognosis. Here, we conducted an integrated analysis based on copy number aberrations data and miRNA-mRNA dual expression profiling data to identify breast cancer subtype-specific biomarkers. Interestingly, we found a group of genes residing in subtype-specific CNA regions that also display the corresponding changes in mRNAs levels and their target miRNAs’ expression. Among them, the predicted direct correlation of BRCA1-miR-143-miR-145 pairs was selected for experimental validation. The study results indicated that BRCA1 positively regulates miR-143-miR-145 expression and miR-143-miR-145 can serve as promising novel biomarkers for breast cancer subtyping. 
Our integrated genomics analysis and experimental validation provide a new framework for predicting candidate biomarkers of breast cancer subtypes and help in understanding the potential etiology of the breast cancer subtypes. Dongguo Li, Hong Xia, Zhen-ya Li, Lin Hua, and Lin Li Copyright © 2015 Dongguo Li et al. All rights reserved. Improving the Understanding of Pathogenesis of Human Papillomavirus 16 via Mapping Protein-Protein Interaction Network Wed, 15 Apr 2015 11:48:17 +0000 Human papillomavirus 16 (HPV16) carries a high risk of causing various cancers and afflictions, especially cervical cancer. Therefore, investigating the pathogenesis of HPV16 is very important for public health. The protein-protein interaction (PPI) network between HPV16 and human was used as a means to improve our understanding of its pathogenesis. By adopting sequence and topological features, a support vector machine (SVM) model was built to predict new interactions between HPV16 and human proteins. All interactions were comprehensively investigated and analyzed. The analysis indicated that HPV16 enlarged its scope of influence by interacting with as many human proteins as possible. These interactions broadly alter cell cycle progression. Furthermore, not only was HPV16 highly prone to interact with hub proteins and bottleneck proteins, but it could also effectively affect a wide range of signaling pathways. In addition, we found that HPV16 evolved high carcinogenicity only once its own reproduction had been ensured. This work may also provide potential new targets for antiviral therapeutics and help future experimental research. Yongcheng Dong, Qifan Kuang, Xu Dai, Rong Li, Yiming Wu, Weijia Leng, Yizhou Li, and Menglong Li Copyright © 2015 Yongcheng Dong et al. All rights reserved. 
Coexpression Pattern Analysis of NPM1-Associated Genes in Chronic Myelogenous Leukemia Wed, 15 Apr 2015 09:10:06 +0000 Background. Nucleophosmin 1 (NPM1) plays an important role in ribosomal synthesis and malignancies, but NPM1 mutations occur rarely in the blast-crisis and chronic-phase chronic myelogenous leukemia (CML) patients. The NPM1-associated gene set (GCM_NPM1), in total 116 genes including NPM1, was chosen as the candidate gene set for the coexpression analysis. We wonder if NPM1-associated genes can affect the ribosomal synthesis and translation process in CML. Results. We presented a distribution-based approach for gene pair classification by identifying a disease-specific cutoff point that classified the coexpressed gene pairs into strong and weak coexpression structures. The differences in the coexpression patterns between the normal and the CML groups were reflected from the overall structure by performing two-sample Kolmogorov-Smirnov test. Our developed method effectively identified the coexpression pattern differences from the overall structure: for the maximum deviation . Moreover, we found that genes involved in the ribosomal synthesis and translation process tended to be coexpressed in the CML group. Conclusion. Our developed method can identify the coexpression difference between two different groups. Dysregulation of ribosomal synthesis and translation process may be related to the CML disease. Our significant findings may provide useful information for the novel CML mechanism exploration and cancer treatment. Fengfeng Wang, Lawrence W. C. Chan, Nancy B. Y. Tsui, S. C. Cesar Wong, Parco M. Siu, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2015 Fengfeng Wang et al. All rights reserved. Prediction of Antifungal Activity of Gemini Imidazolium Compounds Wed, 15 Apr 2015 08:44:15 +0000 The progress of antimicrobial therapy contributes to the development of strains of fungi resistant to antimicrobial drugs. 
Since cationic surfactants have been described as good antifungals, we present a SAR study of a novel homologous series of 140 bis-quaternary imidazolium chlorides and analyze them with respect to their biological activity against Candida albicans as one of the major opportunistic pathogens causing a wide spectrum of diseases in human beings. We characterize a set of features of these compounds, concerning their structure, molecular descriptors, and surface active properties. SAR study was conducted with the help of the Dominance-Based Rough Set Approach (DRSA), which involves identification of relevant features and relevant combinations of features being in strong relationship with a high antifungal activity of the compounds. The SAR study shows, moreover, that the antifungal activity is dependent on the type of substituents and their position at the chloride moiety, as well as on the surface active properties of the compounds. We also show that molecular descriptors MlogP, HOMO-LUMO gap, total structure connectivity index, and Wiener index may be useful in prediction of antifungal activity of new chemical compounds. Łukasz Pałkowski, Jerzy Błaszczyński, Andrzej Skrzypczak, Jan Błaszczak, Alicja Nowaczyk, Joanna Wróblewska, Sylwia Kożuszko, Eugenia Gospodarek, Roman Słowiński, and Jerzy Krysiński Copyright © 2015 Łukasz Pałkowski et al. All rights reserved. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction Wed, 15 Apr 2015 08:27:29 +0000 It is crucial to understand the specificity of HIV-1 protease for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the important positions in an octapeptide that determined its cleavability. 
Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. Results of feature selection prove that , , , and are the most important positions. Two feature fusion methods are used in this paper, combination fusion and decision fusion, aiming to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, which shows that feature selection combined with decision fusion is an effective method for HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance and help in designing HIV-1 protease inhibitors in the future. Hui Liu, Xiaomiao Shi, Dongmei Guo, Zuowei Zhao, and Yimin Copyright © 2015 Hui Liu et al. All rights reserved. Identification of Novel Potential Vaccine Candidates against Tuberculosis Based on Reverse Vaccinology Wed, 15 Apr 2015 08:06:14 +0000 Tuberculosis (TB), a chronic infectious disease caused by Mycobacterium tuberculosis, is considered the second leading cause of death worldwide. The limited efficacy of the bacillus Calmette-Guérin (BCG) vaccine against pulmonary TB and the emergence of multidrug-resistant TB call for more efficacious vaccines. Reverse vaccinology uses the entire proteome of a pathogen to select the best vaccine antigens by in silico approaches. The M. tuberculosis H37Rv proteome was analyzed with NERVE (New Enhanced Reverse Vaccinology Environment) prediction software to identify potential vaccine targets; these 331 proteins were further analyzed with VaxiJen to determine their antigenicity value. 
Only candidates with an antigenicity value ≥0.5 and an adhesin probability of 50%, and without homology to human proteins or transmembrane regions, were selected, resulting in 73 antigens. These proteins were grouped by family into seven groups and analyzed by amino acid sequence alignments, selecting 16 representative proteins. For each candidate, a search of the literature and protein analysis with different bioinformatics tools, as well as a simulation of the immune response, was conducted. Finally, we selected six novel vaccine candidates, EsxL, PE26, PPE65, PE_PGRS49, PBP1, and Erp, from M. tuberculosis that can be used to improve or design new TB vaccines. Gloria P. Monterrubio-López, Jorge A. González-Y-Merchand, and Rosa María Ribas-Aparicio Copyright © 2015 Gloria P. Monterrubio-López et al. All rights reserved. Transcriptomic Analysis of mRNAs in Human Monocytic Cells Expressing the HIV-1 Nef Protein and Their Exosomes Wed, 15 Apr 2015 07:57:37 +0000 The Nef protein of human immunodeficiency virus (HIV) promotes viral replication and progression to AIDS. Besides its well-studied effects on intracellular signaling, Nef also functions through its secretion in exosomes, which are nanovesicles containing proteins, microRNAs, and mRNAs and are important for intercellular communication. Nef expression enhances exosome secretion, and these exosomes can enter uninfected CD4 T cells, leading to apoptotic death. We have recently reported the first miRNome analysis of exosomes secreted from Nef-expressing U937 monocytic cells. Here we show genome-wide transcriptome analysis of Nef-expressing U937 cells and their exosomes. We identified four key mRNAs preferentially retained in Nef-expressing cells; these code for MECP2, HMOX1, AARSD1, and ATF2 and are important for chromatin modification and gene expression. Interestingly, their target miRNAs are exported in exosomes. 
We also identified three key mRNAs selectively secreted in exosomes from Nef-expressing U937 cells, with their corresponding miRNAs being preferentially retained in cells. These are AATK, SLC27A1, and CDKAL and are important in apoptosis and fatty acid transport. Thus, our study identifies selectively expressed mRNAs in Nef-expressing U937 cells and their exosomes and supports a new mode of intercellular regulation by the HIV-1 Nef protein. Madeeha Aqil, Saurav Mallik, Sanghamitra Bandyopadhyay, Ujjwal Maulik, and Shahid Jameel Copyright © 2015 Madeeha Aqil et al. All rights reserved. Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods Wed, 15 Apr 2015 06:14:19 +0000 Glycation is a nonenzymatic process in which proteins react with reducing sugar molecules. The identification of glycation sites in proteins may provide guidelines for understanding the biological function of protein glycation. In this study, we developed a computational method to predict protein glycation sites by using the support vector machine classifier. The experimental results showed that the prediction accuracy was 85.51% and the overall MCC was 0.70. Feature analysis indicated that the composition of k-spaced amino acid pairs feature contributed the most to glycation site prediction. Yan Liu, Wenxiang Gu, Wenyi Zhang, and Jianan Wang Copyright © 2015 Yan Liu et al. All rights reserved. A Gas Chromatography-Mass Spectrometry Based Study on Urine Metabolomics in Rats Chronically Poisoned with Hydrogen Sulfide Tue, 14 Apr 2015 17:02:10 +0000 Gas chromatography-mass spectrometry (GC-MS) in combination with multivariate statistical analysis was applied to explore the metabolic variability in urine of chronically hydrogen sulfide- (H2S-) poisoned rats relative to control ones. The changes in endogenous metabolites were studied by partial least squares-discriminant analysis (PLS-DA) and independent-samples t-test. 
The metabolic patterns of H2S-poisoned group are separated from the control, suggesting that the metabolic profiles of H2S-poisoned rats were markedly different from the controls. Moreover, compared to the control group, the level of alanine, d-ribose, tetradecanoic acid, L-aspartic acid, pentanedioic acid, cholesterol, acetate, and oleic acid in rat urine of the poisoning group decreased, while the level of glycine, d-mannose, arabinofuranose, and propanoic acid increased. These metabolites are related to amino acid metabolism as well as energy and lipid metabolism in vivo. Studying metabolomics using GC-MS allows for a comprehensive overview of the metabolism of the living body. This technique can be employed to decipher the mechanism of chronic H2S poisoning, thus promoting the use of metabolomics in clinical toxicology. Mingjie Deng, Meiling Zhang, Fa Sun, Jianshe Ma, Lufeng Hu, Xuezhi Yang, Guanyang Lin, and Xianqin Wang Copyright © 2015 Mingjie Deng et al. All rights reserved. An Integrated Modeling and Experimental Approach to Study the Influence of Environmental Nutrients on Biofilm Formation of Pseudomonas aeruginosa Tue, 14 Apr 2015 16:59:46 +0000 The availability of nutrient components in the environment was identified as a critical regulator of virulence and biofilm formation in Pseudomonas aeruginosa. This work proposes the first systems-biology approach to quantify microbial biofilm formation upon the change of nutrient availability in the environment. Specifically, the change of fluxes of metabolic reactions that were positively associated with P. aeruginosa biofilm formation was used to monitor the trend for P. aeruginosa to form a biofilm. The uptake rates of nutrient components were changed according to the change of the nutrient availability. We found that adding each of the eleven amino acids (Arg, Tyr, Phe, His, Iso, Orn, Pro, Glu, Leu, Val, and Asp) to minimal medium promoted P. aeruginosa biofilm formation. 
Both modeling and experimental approaches were further developed to quantify P. aeruginosa biofilm formation for four different availability levels for each of the three ions that include ferrous ions, sulfate, and phosphate. The developed modeling approach correctly predicted the amount of biofilm formation. By comparing reaction flux change upon the change of nutrient concentrations, metabolic reactions used by P. aeruginosa to regulate its biofilm formation are mainly involved in arginine metabolism, glutamate production, magnesium transport, acetate metabolism, and the TCA cycle. Zhaobin Xu, Sabina Islam, Thomas K. Wood, and Zuyi Huang Copyright © 2015 Zhaobin Xu et al. All rights reserved. Computer-Simulated Biopsy Marking System for Endoscopic Surveillance of Gastric Lesions: A Pilot Study Tue, 14 Apr 2015 16:57:39 +0000 Endoscopic tattoo with India ink injection for surveillance of premalignant gastric lesions is technically cumbersome and may not be durable. The aim of the study is to evaluate the accuracy of a novel, computer-simulated biopsy marking system (CSBMS) developed for the endoscopic marking of gastric lesions. Twenty-five patients with history of gastric intestinal metaplasia received both CSBMS-guided marking and India ink injection in five points in the stomach at index endoscopy. A second endoscopy was performed at three months. Primary outcome was accuracy of CSBMS (distance between CSBMS probe-guided site and tattoo site measured by CSBMS). The mean accuracy of CSBMS at angularis was  mm, antral lesser curvature  mm, antral greater curvature  mm, antral anterior wall  mm, and antral posterior wall  mm. CSBMS ( versus seconds; ) required less procedure time compared to endoscopic tattooing. No adverse events were encountered. CSBMS accurately identified previously marked gastric sites by endoscopic tattooing within 1 cm on follow-up endoscopy. Weiling Hu, Bin Wang, Leimin Sun, Shujie Chen, Liangjing Wang, Kan Wang, Jiaguo Wu, John J. 
Kim, Jiquan Liu, Ning Dai, Huilong Duan, and Jianmin Si Copyright © 2015 Weiling Hu et al. All rights reserved. Intuitive Web-Based Experimental Design for High-Throughput Biomedical Data Tue, 14 Apr 2015 11:18:12 +0000 Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data is accompanied by accurate metadata annotation. Particularly in high-throughput experiments intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be interesting for failure analysis or further experiments in the future. In addition to the management of this information, means for an integrated design and interfaces for structured data annotation are urgently needed by researchers. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information we provide a spreadsheet-based, humanly readable format. Subsequently, sample sheets with identifiers and metainformation for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model. Andreas Friedrich, Erhan Kenar, Oliver Kohlbacher, and Sven Nahnsen Copyright © 2015 Andreas Friedrich et al. All rights reserved. A Method for Generating New Datasets Based on Copy Number for Cancer Analysis Wed, 08 Apr 2015 12:22:40 +0000 New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used for current methods of analysis. 
Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular those derived from cancers. Here we present a software package capable of preprocessing standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer. Shinuk Kim, Mark Kon, and Hyunsik Kang Copyright © 2015 Shinuk Kim et al. All rights reserved. Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process Mon, 06 Apr 2015 12:30:15 +0000 Recent technological advances have created challenges for geneticists and a need to adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g., mutation databases and software). The wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, with a considerable amount of time spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that although many possess “just enough” knowledge to analyse their data, they do not make full use of the tools and databases that are available and also do not fully understand how their data was created. 
The primary aim of this review is to document some of the key approaches and provide an analysis schema to make the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review will also compare the methods used to identify highly penetrant variants when data is obtained from consanguineous individuals as opposed to nonconsanguineous; and when Mendelian disorders are analysed as opposed to common-complex disorders. A. Mesut Erzurumluoglu, Santiago Rodriguez, Hashem A. Shihab, Denis Baird, Tom G. Richardson, Ian N. M. Day, and Tom R. Gaunt Copyright © 2015 A. Mesut Erzurumluoglu et al. All rights reserved. Genetic Polymorphism in Extracellular Regulators of Wnt Signaling Pathway Sun, 05 Apr 2015 10:46:55 +0000 The Wnt signaling pathway is mediated by a family of secreted glycoproteins through canonical and noncanonical mechanisms. The signaling pathways are regulated by various modulators, which are classified into two classes on the basis of their interaction with either Wnt or its receptors. Secreted frizzled-related proteins (sFRPs) are members of the class that binds to Wnt proteins and antagonizes the Wnt signaling pathway. The other class consists of the Dickkopf (DKK) protein family, which binds to the Wnt receptor complex. The present review discusses the disease-related associations of various polymorphisms in Wnt signaling modulators. Furthermore, this review also highlights that some of the sFRPs and DKKs are unable to act as antagonists of the Wnt signaling pathway and thus their function needs to be explored more extensively. Garima Sharma, Ashish Ranjan Sharma, Eun-Min Seo, and Ju-Suk Nam Copyright © 2015 Garima Sharma et al. All rights reserved. 
Integrative Analysis of CRISPR/Cas9 Target Sites in the Human HBB Gene Tue, 31 Mar 2015 09:39:46 +0000 Recently, the clustered regularly interspaced short palindromic repeats (CRISPR) system has emerged as a powerful customizable artificial nuclease to facilitate precise genetic correction for tissue regeneration and isogenic disease modeling. However, previous studies reported substantial off-target activities of CRISPR system in human cells, and the enormous putative off-target sites are labor-intensive to be validated experimentally, thus motivating bioinformatics methods for rational design of CRISPR system and prediction of its potential off-target effects. Here, we describe an integrative analytical process to identify specific CRISPR target sites in the human β-globin gene (HBB) and predict their off-target effects. Our method includes off-target analysis in both coding and noncoding regions, which was neglected by previous studies. It was found that the CRISPR target sites in the introns have fewer off-target sites in the coding regions than those in the exons. Remarkably, target sites containing certain transcriptional factor motif have enriched binding sites of relevant transcriptional factor in their off-target sets. We also found that the intron sites have fewer SNPs, which leads to less variation of CRISPR efficiency in different individuals during clinical applications. Our studies provide a standard analytical procedure to select specific CRISPR targets for genetic correction. Yumei Luo, Detu Zhu, Zhizhuo Zhang, Yaoyong Chen, and Xiaofang Sun Copyright © 2015 Yumei Luo et al. All rights reserved. In Silico Search of Energy Metabolism Inhibitors for Alternative Leishmaniasis Treatments Mon, 30 Mar 2015 13:56:04 +0000 Leishmaniasis is a complex disease that affects mammals and is caused by approximately 20 distinct protozoa from the genus Leishmania. 
Leishmaniasis is an endemic disease that exerts a large socioeconomic impact on poor and developing countries. The current treatment for leishmaniasis is complex, expensive, and poorly efficacious. Thus, there is an urgent need to develop more selective, less expensive new drugs. The energy metabolism pathways of Leishmania include several interesting targets for specific inhibitors. In the present study, we sought to establish which energy metabolism enzymes in Leishmania could be targets for inhibitors that have already been approved for the treatment of other diseases. We were able to identify 94 genes and 93 Leishmania energy metabolism targets. Using each gene’s designation as a search criterion in the TriTrypDB database, we located the predicted peptide sequences, which in turn were used to interrogate the DrugBank, Therapeutic Target Database (TTD), and PubChem databases. We identified 44 putative targets, 11 of which are predicted to be amenable to inhibition by drugs that have already been approved for use in humans. We propose that these drugs should be experimentally tested and potentially used in the treatment of leishmaniasis. Lourival A. Silva, Marina C. Vinaud, Ana Maria Castro, Pedro Vítor L. Cravo, and José Clecildo B. Bezerra Copyright © 2015 Lourival A. Silva et al. All rights reserved. Evaluation and Application of the Strand-Specific Protocol for Next-Generation Sequencing Sun, 29 Mar 2015 07:06:19 +0000 Next-generation sequencing (NGS) has become a powerful sequencing tool, applied in a wide range of biological studies. However, the traditional sample preparation protocol for NGS is non-strand-specific (NSS), leading to biased estimates of expression for transcripts overlapped at the antisense strand. Strand-specific (SS) protocols have recently been developed. In this study, we prepared the same RNA sample by using the SS and NSS protocols, followed by sequencing with Illumina HiSeq platform. 
Using real-time quantitative PCR as a standard, we first proved that the SS protocol more precisely estimates gene expressions compared with the NSS protocol, particularly for those overlapped at the antisense strand. In addition, we also showed that the sequence reads from the SS protocol are comparable with those from conventional NSS protocols in many aspects. Finally, we also mapped a fraction of sequence reads back to the antisense strand of the known genes, originally without annotated genes located. Using sequence assembly and PCR validation, we succeeded in identifying and characterizing the novel antisense genes. Our results show that the SS protocol performs more accurately than the traditional NSS protocol and can be applied in future studies. Kuo-Wang Tsai, Bill Chang, Cheng-Tsung Pan, Wei-Chen Lin, Ting-Wen Chen, and Sung-Chou Li Copyright © 2015 Kuo-Wang Tsai et al. All rights reserved. Distributed Artificial Intelligence Models for Knowledge Discovery in Bioinformatics Wed, 25 Mar 2015 13:17:46 +0000 Juan M. Corchado, Isabelle Bichindaritz, and Juan F. De Paz Copyright © 2015 Juan M. Corchado et al. All rights reserved. A Linear-RBF Multikernel SVM to Classify Big Text Corpora Mon, 23 Mar 2015 08:13:54 +0000 Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. 
The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers. R. Romero, E. L. Iglesias, and L. Borrajo Copyright © 2015 R. Romero et al. All rights reserved. A Network Flow Approach to Predict Protein Targets and Flavonoid Backbones to Treat Respiratory Syncytial Virus Infection Mon, 23 Mar 2015 06:08:56 +0000 Background. Respiratory syncytial virus (RSV) infection is the major cause of respiratory disease in lower respiratory tract in infants and young children. Attempts to develop effective vaccines or pharmacological treatments to inhibit RSV infection without undesired effects on human health have been unsuccessful. However, RSV infection has been reported to be affected by flavonoids. The mechanisms underlying viral inhibition induced by these compounds are largely unknown, making the development of new drugs difficult. Methods. To understand the mechanisms induced by flavonoids to inhibit RSV infection, a systems pharmacology-based study was performed using microarray data from primary culture of human bronchial cells infected by RSV, together with compound-proteomic interaction data available for Homo sapiens. Results. After an initial evaluation of 26 flavonoids, 5 compounds (resveratrol, quercetin, myricetin, apigenin, and tricetin) were identified through topological analysis of a major chemical-protein (CP) and protein-protein interacting (PPI) network. In a nonclustered form, these flavonoids regulate directly the activity of two protein bottlenecks involved in inflammation and apoptosis. Conclusions. 
Our findings may help uncover mechanisms of action in early RSV infection and provide chemical backbones and their protein targets in the difficult quest to develop new effective drugs. José Eduardo Vargas, Renato Puga, Joice de Faria Poloni, Luis Fernando Saraiva Macedo Timmers, Barbara Nery Porto, Osmar Norberto de Souza, Diego Bonatto, Paulo Márcio Condessa Pitrez, and Renato Tetelbom Stein Copyright © 2015 José Eduardo Vargas et al. All rights reserved. Identification of Novel Thyroid Cancer-Related Genes and Chemicals Using Shortest Path Algorithm Sun, 22 Mar 2015 11:26:51 +0000 Thyroid cancer is a typical endocrine malignancy. In the past three decades, the continued growth of its incidence has made it urgent to design effective treatments for this disease. To this end, it is necessary to uncover the mechanism underlying this disease. Identification of thyroid cancer-related genes and chemicals is helpful to understand the mechanism of thyroid cancer. In this study, we generalized some previous methods to discover both disease genes and chemicals. The method was based on a shortest path algorithm and applied to discover novel thyroid cancer-related genes and chemicals. The analysis of the genes and chemicals finally obtained suggests that some of them are crucial to the formation and development of thyroid cancer. These results indicate that the proposed method is effective for the discovery of novel disease genes and chemicals. Yang Jiang, Peiwei Zhang, Li-Peng Li, Yi-Chun He, Ru-jian Gao, and Yu-Fei Gao Copyright © 2015 Yang Jiang et al. All rights reserved. A Meta-Analysis Strategy for Gene Prioritization Using Gene Expression, SNP Genotype, and eQTL Data Sun, 22 Mar 2015 10:56:57 +0000 In order to understand disease pathogenesis, improve medical diagnosis, or discover effective drug targets, it is important to identify significant genes deeply involved in human disease. 
For this purpose, many earlier approaches attempted to prioritize candidate genes using gene expression profiles or SNP genotype data, but they often suffer from producing many false-positive results. To address this issue, in this paper, we propose a meta-analysis strategy for gene prioritization that employs three different genetic resources—gene expression data, single nucleotide polymorphism (SNP) genotype data, and expression quantitative trait loci (eQTL) data—in an integrative manner. For integration, we utilized an improved technique for the order of preference by similarity to ideal solution (TOPSIS) to combine scores from distinct resources. This method was evaluated on two publicly available datasets regarding prostate cancer and lung cancer to identify disease-related genes. Consequently, our proposed strategy for gene prioritization showed its superiority to conventional methods in discovering significant disease-related genes with several types of genetic resources, while making good use of potential complementarities among available resources. Jingmin Che and Miyoung Shin Copyright © 2015 Jingmin Che and Miyoung Shin. All rights reserved. Analysis of Environmental Stress Factors Using an Artificial Growth System and Plant Fitness Optimization Sun, 22 Mar 2015 10:35:00 +0000 The environment promotes evolution. Evolutionary processes represent environmental adaptations over long time scales; evolution of crop genomes is not inducible within the relatively short time span of a human generation. Extreme environmental conditions can accelerate evolution, but such conditions are often stress inducing and disruptive. Artificial growth systems can be used to induce and select genomic variation by changing external environmental conditions, thus, accelerating evolution. By using cloud computing and big-data analysis, we analyzed environmental stress factors for Pleurotus ostreatus by assessing, evaluating, and predicting information of the growth environment. 
Through the indexing of environmental stress, the growth environment can be precisely controlled and developed into a technology for improving crop quality and production. Meonghun Lee and Hyun Yoe Copyright © 2015 Meonghun Lee and Hyun Yoe. All rights reserved. Agent-Based Spatiotemporal Simulation of Biomolecular Systems within the Open Source MASON Framework Sun, 22 Mar 2015 10:04:24 +0000 Agent-based modelling is being used to represent biological systems with increasing frequency and success. This paper presents the implementation of a new tool for biomolecular reaction modelling in the open source Multiagent Simulator of Neighborhoods framework. The rationale behind this new tool is the necessity to describe interactions at the molecular level to be able to grasp emergent and meaningful biological behaviour. We are particularly interested in characterising and quantifying the various effects that facilitate biocatalysis. Enzymes may display high specificity for their substrates and this information is crucial to the engineering and optimisation of bioprocesses. Simulation results demonstrate that molecule distributions, reaction rate parameters, and structural parameters can be adjusted separately in the simulation allowing a comprehensive study of individual effects in the context of realistic cell environments. While higher percentage of collisions with occurrence of reaction increases the affinity of the enzyme to the substrate, a faster reaction (i.e., turnover number) leads to a smaller number of time steps. Slower diffusion rates and molecular crowding (physical hurdles) decrease the collision rate of reactants, hence reducing the reaction rate, as expected. Also, the random distribution of molecules affects the results significantly. Gael Pérez-Rodríguez, Martín Pérez-Pérez, Daniel Glez-Peña, Florentino Fdez-Riverola, Nuno F. Azevedo, and Anália Lourenço Copyright © 2015 Gael Pérez-Rodríguez et al. All rights reserved. 
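The qualitative effects reported in the agent-based abstract above (slower diffusion lowers the collision rate, and a higher fraction of reactive collisions raises the yield) can be illustrated with a toy lattice model. The sketch below is not the MASON-based tool described by the authors; the function name `simulate_reactions`, the periodic square grid, and all parameter values are assumptions chosen only for illustration.

```python
import random


def simulate_reactions(n_substrate=50, n_enzyme=5, size=20,
                       p_react=0.5, p_move=1.0, steps=200, seed=0):
    """Toy agent-based model (hypothetical, for illustration only).

    Enzymes and substrates random-walk on a periodic size x size grid.
    A substrate that shares a cell with an enzyme is converted to
    product with probability p_react. p_move < 1 mimics slower diffusion.
    Returns the number of product molecules formed.
    """
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def place(n):
        # Random initial positions; several agents may share a cell.
        return [(rng.randrange(size), rng.randrange(size)) for _ in range(n)]

    def step(pos):
        # With probability 1 - p_move the agent stays where it is.
        if rng.random() >= p_move:
            return pos
        dx, dy = rng.choice(moves)
        return ((pos[0] + dx) % size, (pos[1] + dy) % size)

    enzymes = place(n_enzyme)
    substrates = place(n_substrate)
    products = 0
    for _ in range(steps):
        enzymes = [step(e) for e in enzymes]
        substrates = [step(s) for s in substrates]
        occupied = set(enzymes)
        remaining = []
        for s in substrates:
            # A collision is a substrate sharing a cell with an enzyme.
            if s in occupied and rng.random() < p_react:
                products += 1
            else:
                remaining.append(s)
        substrates = remaining
    return products


if __name__ == "__main__":
    for p_move in (1.0, 0.5, 0.1):
        print(p_move, simulate_reactions(p_move=p_move))
```

Running the script for several values of `p_move` or `p_react` gives a rough feel for how the two parameters can be varied independently, which is the point the abstract makes about separating individual effects; the numbers themselves carry no biological meaning.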
Using the eServices Platform for Detecting Behavior Patterns Deviation in the Elderly Assisted Living: A Case Study Sun, 22 Mar 2015 09:33:56 +0000 The world’s aging population is growing, and the elderly are increasingly isolated socially and geographically. As a consequence, in many situations they need assistance that does not arrive in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect deviations in the elderly’s behavior patterns that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the reliability of the alert system and minimize false positive alerts. These variables comprise information provided by a body area network (BAN), by environment sensors, and by the elderly’s interaction with a service provider platform called eServices (Elderly Support Service Platform). eServices is a scalable platform aggregating a service ecosystem developed specially for elderly people. Recognition of these patterns will in turn trigger the appropriate response. As the system evolves, it will learn to predict potentially dangerous situations for a specific user, acting preventively and ensuring the elderly’s safety and well-being. As the eServices platform is still in development, synthetic data, based on a real data sample and empirical knowledge, is being used to populate the initial dataset. The presented work is a proof of concept of knowledge extraction using the eServices platform information. Although it does not use real data, this work proves to be an asset, achieving good performance in preventing alert situations. Isabel Marcelino, David Lopes, Michael Reis, Fernando Silva, Rosalía Laza, and António Pereira Copyright © 2015 Isabel Marcelino et al. All rights reserved. 
A Distributed Multiagent System Architecture for Body Area Networks Applied to Healthcare Monitoring Sun, 22 Mar 2015 09:23:02 +0000 In the last years the area of health monitoring has grown significantly, attracting the attention of both academia and commercial sectors. At the same time, the availability of new biomedical sensors and suitable network protocols has led to the appearance of a new generation of wireless sensor networks, the so-called wireless body area networks. Nowadays, these networks are routinely used for continuous monitoring of vital parameters, movement, and the surrounding environment of people, but the large volume of data generated in different locations represents a major obstacle for the appropriate design, development, and deployment of more elaborated intelligent systems. In this context, we present an open and distributed architecture based on a multiagent system for recognizing human movements, identifying human postures, and detecting harmful activities. The proposed system evolved from a single node for fall detection to a multisensor hardware solution capable of identifying unhampered falls and analyzing the users’ movement. The experiments carried out contemplate two different scenarios and demonstrate the accuracy of our proposal as a real distributed movement monitoring and accident detection system. Moreover, we also characterize its performance, enabling future analyses and comparisons with similar approaches. Filipe Felisberto, Rosalía Laza, Florentino Fdez-Riverola, and António Pereira Copyright © 2015 Filipe Felisberto et al. All rights reserved. RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases Sun, 22 Mar 2015 09:18:34 +0000 High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. 
Several computational methods applied to physical interaction of protein networks have been successfully used in identification of the best disease candidates for each expression profile. An open problem for these methods is the ability to combine and take advantage of the wealth of biomedical data publicly available. We propose an enhanced method to improve selection of the best disease targets for a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The obtained results outline the superiority of the proposed approach, RecRWR, in identifying disease candidates, especially with high levels of biological noise and benefiting from all data available. Joel Perdiz Arrais and José Luís Oliveira Copyright © 2015 Joel Perdiz Arrais and José Luís Oliveira. All rights reserved. Probabilistic Inference of Biological Networks via Data Integration Sun, 22 Mar 2015 09:02:27 +0000 There is significant interest in inferring the structure of subcellular networks of interaction. Here we consider supervised interactive network inference in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. 
By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning that can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%. Mark F. Rogers, Colin Campbell, and Yiming Ying Copyright © 2015 Mark F. Rogers et al. All rights reserved. aCGH-MAS: Analysis of aCGH by means of Multiagent System Sun, 22 Mar 2015 08:55:39 +0000 There are currently different techniques, such as CGH arrays, to study genetic variations in patients. CGH arrays analyze gains and losses in different regions in the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information corresponding to mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases and it would be of interest to incorporate information of different sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations and visualization techniques for the interpretation of the final results and to extract relevant information from different sources of information by applying a CBR system. Juan F. 
De Paz, Rocío Benito, Javier Bajo, Ana Eugenia Rodríguez, and María Abáigar Copyright © 2015 Juan F. De Paz et al. All rights reserved. Gene Knockout Identification Using an Extension of Bees Hill Flux Balance Analysis Sun, 22 Mar 2015 08:47:09 +0000 Microbial strain optimisation for the overproduction of a desired phenotype has been a popular topic in recent years. Gene knockout is a genetic engineering technique that can modify the metabolism of microbial cells to obtain desirable phenotypes. Optimisation algorithms have been developed to identify the effects of gene knockout. However, the complexities of metabolic networks have made the process of identifying the effects of genetic modification on desirable phenotypes challenging. Furthermore, a vast number of reactions in cellular metabolism often lead to a combinatorial problem in obtaining optimal gene knockout. The computational time increases exponentially as the size of the problem increases. This work reports an extension of Bees Hill Flux Balance Analysis (BHFBA) to identify optimal gene knockouts to maximise the production yield of desired phenotypes while sustaining the growth rate. This proposed method functions by integrating OptKnock into BHFBA for validating the results automatically. The results show that the extension of BHFBA is suitable, reliable, and applicable in predicting gene knockout. Through several experiments conducted on Escherichia coli, Bacillus subtilis, and Clostridium thermocellum as model organisms, extension of BHFBA has shown better performance in terms of computational time, stability, growth rate, and production yield of desired phenotypes. Yee Wen Choon, Mohd Saberi Mohamad, Safaai Deris, Chuii Khim Chong, Sigeru Omatu, and Juan Manuel Corchado Copyright © 2015 Yee Wen Choon et al. All rights reserved. Modelling the Longevity of Dental Restorations by means of a CBR System Thu, 19 Mar 2015 14:23:43 +0000 The lifespan of dental restorations is limited. 
Longevity depends on the material used and the different characteristics of the dental piece. However, it is not always the case that the best and longest lasting material is used since patients may prefer different treatments according to how noticeable the material is. Over the last 100 years, the most commonly used material has been silver amalgam, which, while very durable, is somewhat aesthetically displeasing. Our study is based on the collection of data from the charts, notes, and radiographic information of restorative treatments performed by Dr. Vera in 1993, the analysis of the information by computer artificial intelligence to determine the most appropriate restoration, and the monitoring of the evolution of the dental restoration. The data will be treated confidentially according to the Organic Law 15/1999 on 13 December on the Protection of Personal Data. This paper also presents a clustering technique capable of identifying the most significant cases with which to instantiate the case-base. In order to classify the cases, a mixture of experts is used which incorporates a Bayesian network and a multilayer perceptron; the combination of both classifiers is performed with a neural network. Ignacio J. Aliaga, Vicente Vera, Juan F. De Paz, Alvaro E. García, and Mohd Saberi Mohamad Copyright © 2015 Ignacio J. Aliaga et al. All rights reserved. Bladder Carcinoma Data with Clinical Risk Factors and Molecular Markers: A Cluster Analysis Thu, 19 Mar 2015 13:41:50 +0000 Bladder cancer occurs in the epithelial lining of the urinary bladder and is amongst the most common types of cancer in humans, killing thousands of people a year. This paper is based on the hypothesis that the use of clinical and histopathological data together with information about the concentration of various molecular markers in patients is useful for the prediction of outcomes and the design of treatments of nonmuscle invasive bladder carcinoma (NMIBC). 
A population of 45 patients with a new diagnosis of NMIBC was selected. Patients with benign prostatic hyperplasia (BPH), muscle invasive bladder carcinoma (MIBC), carcinoma in situ (CIS), and NMIBC recurrent tumors were not included due to their different clinical behavior. Clinical history was obtained by means of anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients. Then, patients underwent conventional transurethral resection (TURBT) and some proteomic analyses quantified the biomarkers (p53, neu, and EGFR). A postoperative follow-up was performed to detect relapse and progression. Clusterings were performed to find groups with clinical, molecular markers, histopathological prognostic factors, and statistics about recurrence, progression, and overall survival of patients with NMIBC. Four groups were found according to tumor sizes, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characters and biological behavior. Enrique Redondo-Gonzalez, Leandro Nunes de Castro, Jesús Moreno-Sierra, María Luisa Maestro de las Casas, Vicente Vera-Gonzalez, Daniel Gomes Ferrari, and Juan Manuel Corchado Copyright © 2015 Enrique Redondo-Gonzalez et al. All rights reserved. The Plant Growth-Promoting Bacteria Azospirillum amazonense: Genomic Versatility and Phytohormone Pathway Thu, 19 Mar 2015 12:07:19 +0000 The rhizosphere bacterium Azospirillum amazonense associates with plant roots to promote plant growth. Variation in replicon numbers and rearrangements is common among Azospirillum strains, and characterization of these naturally occurring differences can improve our understanding of genome evolution. We performed an in silico comparative genomic analysis to understand the genomic plasticity of A. amazonense. The number of A. 
amazonense-specific coding sequences was similar in comparisons with each of the six closely related bacteria, whether or not they belong to the Azospirillum genus. Our results suggest that the versatile gene repertoire found in the A. amazonense genome could have been acquired from distantly related bacteria through horizontal transfer. Furthermore, the identification of coding sequences related to phytohormone production, such as flavin-monooxygenase and aldehyde oxidase, is likely to represent the tryptophan-dependent TAM pathway for auxin production in this bacterium. Moreover, the presence of the coding sequence for nitrilase indicates the presence of the alternative route that uses IAN as an intermediate for auxin synthesis, but it remains to be established whether the IAN pathway is the Trp-independent route. Future investigations are necessary to support the hypothesis that its genomic structure has evolved to meet the requirement for adaptation to the rhizosphere and interaction with host plants. Ricardo Cecagno, Tiago Ebert Fritsch, and Irene Silveira Schrank Copyright © 2015 Ricardo Cecagno et al. All rights reserved. Identification of Subtype Specific miRNA-mRNA Functional Regulatory Modules in Matched miRNA-mRNA Expression Data: Multiple Myeloma as a Case Thu, 19 Mar 2015 11:44:23 +0000 Identification of miRNA-mRNA modules is an important step to elucidate their combinatorial effect on the pathogenesis and mechanisms underlying complex diseases. Current identification methods primarily are based upon miRNA-target information and matched miRNA and mRNA expression profiles. However, for heterogeneous diseases, the miRNA-mRNA regulatory mechanisms may differ between subtypes, leading to differences in clinical behavior. In order to explore the pathogenesis of each subtype, it is important to identify subtype specific miRNA-mRNA modules. 
In this study, we integrated the Ping-Pong algorithm and multiobjective genetic algorithm to identify subtype specific miRNA-mRNA functional regulatory modules (MFRMs) through integrative analysis of three biological data sets: GO biological processes, miRNA target information, and matched miRNA and mRNA expression data. We applied our method on a heterogeneous disease, multiple myeloma (MM), to identify MM subtype specific MFRMs. The constructed miRNA-mRNA regulatory networks provide modular outlook at subtype specific miRNA-mRNA interactions. Furthermore, clustering analysis demonstrated that heterogeneous MFRMs were able to separate corresponding MM subtypes. These subtype specific MFRMs may aid in the further elucidation of the pathogenesis of each subtype and may serve to guide MM subtype diagnosis and treatment. Yunpeng Zhang, Wei Liu, Yanjun Xu, Chunquan Li, Yingying Wang, Haixiu Yang, Chunlong Zhang, Fei Su, Yixue Li, and Xia Li Copyright © 2015 Yunpeng Zhang et al. All rights reserved. Shaped Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Drosophila Embryo Thu, 19 Mar 2015 10:25:53 +0000 In recent years, with the development of automated microscopy technologies, the volume and complexity of image data on gene expression have increased tremendously. The only way to analyze quantitatively and comprehensively such biological data is by developing and applying new sophisticated mathematical approaches. Here, we present extensions of 2D singular spectrum analysis (2D-SSA) for application to 2D and 3D datasets of embryo images. These extensions, circular and shaped 2D-SSA, are applied to gene expression in the nuclear layer just under the surface of the Drosophila (fruit fly) embryo. We consider the commonly used cylindrical projection of the ellipsoidal Drosophila embryo. 
We demonstrate how circular and shaped versions of 2D-SSA help to decompose expression data into identifiable components (such as trend and noise), as well as separating signals from different genes. Detection and improvement of under- and overcorrection in multichannel imaging is addressed, as well as the extraction and analysis of 3D features in 3D gene expression patterns. Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved. Effect of Celastrol on Growth Inhibition of Prostate Cancer Cells through the Regulation of hERG Channel In Vitro Thu, 19 Mar 2015 10:23:31 +0000 Objective. To explore the antiprostate cancer effects of Celastrol on prostate cancer cells’ proliferation, apoptosis, and cell cycle distribution, as well as the correlation to the regulation of hERG. Methods. DU145 cells were treated with various concentrations of Celastrol (0.25–16.0 μmol/L) for 0–72 hours. MTT assay was used to evaluate the inhibition effect of Celastrol on the growth of DU145 cells. Cell apoptosis was detected through both Annexin-V FITC/PI double-labeled cytometry and Hoechst 33258. Cell cycle regulation was examined by a propidium iodide method. Western blot and RT-PCR technologies were applied to assess the expression level of hERG in DU145 cells. Results. Celastrol presented striking growth inhibition and apoptosis induction potency on DU145 cells in vitro in a time- and dose-dependent manner. The IC50 value of Celastrol for 24 hours was 2.349 ± 0.213 μmol/L. Moreover, Celastrol induced DU145 cell apoptosis in a cell cycle-dependent manner, which means Celastrol could arrest DU145 cells in G0/G1 phase; accordingly, cells in S phase decreased gradually and no obvious changes were found in G2/M phase cells. Through transmission electron microscope, apoptotic bodies containing nuclear fragments were found in Celastrol-treated DU145 cells. 
Overexpression of the hERG channel was found in DU145 cells, while Celastrol could downregulate it at both the protein and mRNA levels in a dose-dependent manner. Conclusions. Celastrol exhibits its antiprostate cancer effects partially through the downregulation of the expression level of the hERG channel in DU145 cells, suggesting that Celastrol may be a potential agent against prostate cancer with a mechanism of blocking the hERG channel. Nan Ji, Jinjun Li, Zexiong Wei, Fanhu Kong, Hongyan Jin, Xiaoya Chen, Yan Li, and Youping Deng Copyright © 2015 Nan Ji et al. All rights reserved. The Construction of Common and Specific Significance Subnetworks of Alzheimer’s Disease from Multiple Brain Regions Thu, 19 Mar 2015 09:58:56 +0000 Alzheimer’s disease (AD) is a progressive and fatal neurodegenerative disorder that leads to irreversible cognitive and memory damage in different brain regions. The identification and analysis of the dysregulated pathways and subnetworks among affected brain regions will provide deep insights into the pathogenetic mechanism of AD. In this paper, commonly and specifically significant subnetworks were identified from six AD brain regions. Protein-protein interaction (PPI) data were integrated to add molecular biological information to construct the functional modules of six AD brain regions by the Heinz algorithm. Then, a simulated annealing algorithm based on edge weight was applied to predict and optimize the maximal scoring networks for common and specific genes, respectively, which can remove weak interactions and add predicted strong interactions to increase the accuracy of the networks. The identified common subnetworks showed that inflammation of the brain nerves is one of the critical factors of AD and calcium imbalance may be a link among several causative factors in AD pathogenesis. 
In addition, the extracted specific subnetworks for each brain region revealed many biologically functional mechanisms to understand AD pathogenesis. Wei Kong, Xiaoyang Mou, Na Zhang, Weiming Zeng, Shasha Li, and Yang Yang Copyright © 2015 Wei Kong et al. All rights reserved. The Expression and Distributions of ANP32A in the Developing Brain Thu, 19 Mar 2015 09:34:47 +0000 Acidic (leucine-rich) nuclear phosphoprotein 32 family, member A (ANP32A), has multiple functions involved in neuritogenesis, transcriptional regulation, and apoptosis. However, whether ANP32A has an effect on the mammalian developing brain is still in question. In this study, it was shown that brain was the organ that expressed the most abundant ANP32A by human multiple tissue expression (MTE) array. The distribution of ANP32A in the different adult brain areas was diverse dramatically, with high expression in cerebellum, temporal lobe, and cerebral cortex and with low expression in pons, medulla oblongata, and spinal cord. The expression of ANP32A was higher in the adult brain than in the fetal brain of not only humans but also mice in a time-dependent manner. ANP32A signals were dispersed accordantly in embryonic mouse brain. However, ANP32A was abundant in the granular layer of the cerebellum and the cerebral cortex when the mice were growing up, as well as in the Purkinje cells of the cerebellum. The variation of expression levels and distribution of ANP32A in the developing brain would imply that ANP32A may play an important role in mammalian brain development, especially in the differentiation and function of neurons in the cerebellum and the cerebral cortex. Shanshan Wang, Yunliang Wang, Qingshan Lu, Xinshan Liu, Fuyu Wang, Xiaodong Ma, Chunping Cui, Chenghe Shi, Jinfeng Li, and Dajin Zhang Copyright © 2015 Shanshan Wang et al. All rights reserved. 
Protecting Intestinal Epithelial Cell Number 6 against Fission Neutron Irradiation through NF-κB Signaling Pathway Thu, 19 Mar 2015 09:13:32 +0000 The purpose of this paper is to explore the change of the NF-κB signaling pathway in intestinal epithelial cells induced by fission neutron irradiation and the influence of the PI3K/Akt pathway inhibitor LY294002. IEC-6 cells were divided into three groups: a control group, a 4 Gy neutron irradiation group, and a 4 Gy neutron irradiation with LY294002 treatment group. All groups except the control group were irradiated with 4 Gy of neutrons. LY294002 was given 24 hours before neutron irradiation. At 6 h and 24 h after neutron irradiation, the morphologic changes, proliferation ability, apoptosis, and necrosis rates of the IEC-6 cells were assayed and the changes of the NF-κB and PI3K/Akt pathways were detected. At 6 h and 24 h after neutron irradiation of 4 Gy, the proliferation ability of the IEC-6 cells decreased and many apoptotic and necrotic cells were found. The injuries in the group given LY294002 treatment plus neutron irradiation were more serious than those in the control group and the neutron irradiation group. The results suggest that neutron irradiation of 4 Gy obviously damaged IEC-6 cells and induced serious apoptosis and necrosis; the NF-κB signaling pathway in IEC-6 was activated by neutron irradiation, which could protect IEC-6 against injury by neutron irradiation; LY294002 could inhibit the activity of IEC-6 cells. Gong-Min Chang, Ya-Bing Gao, Shui-Ming Wang, Xin-Ping Xu, Li Zhao, Jing Zhang, Jin-Feng Li, Yun-Liang Wang, and Rui-Yun Peng Copyright © 2015 Gong-Min Chang et al. All rights reserved. Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms Tue, 17 Mar 2015 13:03:24 +0000 Many proteins are known to be associated with cancer diseases. 
It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues’s method by employing the protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I) using individual algorithm, (II) combining algorithms, and (III) combining the same classification types of algorithms. When compared with Aragues’s method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. It is anticipated that the current research could help understand disease mechanisms and diagnosis. Chien-Hung Huang, Huai-Shun Peng, and Ka-Lok Ng Copyright © 2015 Chien-Hung Huang et al. All rights reserved. Prediction of Drug Indications Based on Chemical Interactions and Chemical Similarities Mon, 02 Mar 2015 09:49:30 +0000 Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches could be categorized into disease-centric and drug-centric based on the starting point of the issues or small-scaled application and large-scale application according to the diversity of the datasets. 
Here, a classifier has been constructed on a large drug indication dataset to predict the indications of a drug, based on the assumption that interactive/associated drugs or drugs with similar structures are more likely to target the same diseases. To examine the classifier, it was run five times on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database and evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded an accuracy rate of 50.00% for the 1st order prediction in an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets were successfully identified by our method. These results suggest that our method may become a useful tool to associate novel molecules with new indications or alternative indications with existing drugs. Guohua Huang, Yin Lu, Changhong Lu, Mingyue Zheng, and Yu-Dong Cai Copyright © 2015 Guohua Huang et al. All rights reserved. Predicting the Functions of Long Noncoding RNAs Using RNA-Seq Based on Bayesian Network Sat, 28 Feb 2015 07:50:45 +0000 Long noncoding RNAs (lncRNAs) have been shown to play key roles in various biological processes. However, functions of most lncRNAs are poorly characterized. Here, we present a framework to predict functions of lncRNAs through construction of a regulatory network between lncRNAs and protein-coding genes. Using RNA-seq data, the transcript profiles of lncRNAs and protein-coding genes are constructed. Using the Bayesian network method, a regulatory network, which implies dependency relations between lncRNAs and protein-coding genes, was built. In combination with a protein interaction network, highly connected coding genes linked by a given lncRNA were subsequently used to predict functions of the lncRNA through functional enrichment. 
Application of our method to prostate RNA-seq data showed that 762 lncRNAs in the constructed regulatory network were assigned functions. We found that lncRNAs are involved in diverse biological processes, such as tissue development or embryo development (e.g., nervous system development and mesoderm development). By comparison with functions inferred using the neighboring gene-based method and functions determined using lncRNA knockdown experiments, our method can provide comparable predicted functions of lncRNAs. Overall, our method can be applied to emerging RNA-seq data, which will help researchers identify complex relations between lncRNAs and coding genes and reveal important functions of lncRNAs. Yun Xiao, Yanling Lv, Hongying Zhao, Yonghui Gong, Jing Hu, Feng Li, Jinyuan Xu, Jing Bai, Fulong Yu, and Xia Li Copyright © 2015 Yun Xiao et al. All rights reserved. ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function Wed, 25 Feb 2015 13:26:55 +0000 Repetitive element sequences are adjacent, repeating patterns, also called motifs, and can be of different lengths; repetitions can involve their exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for the extraction of repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in the genome or proteome, displaying a user-friendly web interface, and performing exhaustive searches. ProGeRF is a web site for extracting repetitive regions from genome and proteome sequences. It was designed to be an efficient, fast, accurate, and, above all, user-friendly web tool that offers many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available as a stand-alone program, from which the users can download the source code, and as a web tool. 
It was developed using the hash table approach to extract perfect and imperfect repetitive regions in a (multi)FASTA file, while allowing a linear time complexity. Robson da Silva Lopes, Walas Jhony Lopes Moraes, Thiago de Souza Rodrigues, and Daniella Castanheira Bartholomeu Copyright © 2015 Robson da Silva Lopes et al. All rights reserved. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity Mon, 23 Feb 2015 07:09:56 +0000 This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Recently, researchers have become interested in designing alternative drugs based on AMPs because a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is introduced to predict AMPs by integrating sequence alignment and a support vector machine- (SVM-) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test, while the sensitivity obtained for the jackknife test and the independent test is 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity. Xin Yi Ng, Bakhtiar Affendi Rosdi, and Shahriza Shahrudin Copyright © 2015 Xin Yi Ng et al. All rights reserved. 
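For readers unfamiliar with the LZ complexity mentioned in the abstract above, the following is a minimal sketch of the idea and not the authors' implementation. It counts the phrases in a Lempel-Ziv (1976) exhaustive-history parsing of a sequence. The function names `lz_complexity` and `lz_distance`, the example peptides, and the particular normalized distance are assumptions chosen for illustration; the abstract does not say which LZ-based measure feeds the SVM-pairwise step.

```python
def lz_complexity(seq: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) parsing of seq.

    Each phrase is the shortest substring starting at the current
    position that has not appeared earlier in the text (overlap with
    the phrase itself is allowed, except for its last character).
    """
    i, count, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(s: str, q: str) -> float:
    """Illustrative normalized distance built on LZ complexity (assumed form).

    Measures how many extra phrases one sequence needs when it is
    appended to the other, scaled by the larger individual complexity.
    """
    cs, cq = lz_complexity(s), lz_complexity(q)
    if max(cs, cq) == 0:
        return 0.0  # both sequences are empty
    extra = max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq)
    return extra / max(cs, cq)


if __name__ == "__main__":
    # Hypothetical peptide strings, used only to show the call pattern.
    a, b = "GIGKFLHSAKKFGKAFVGEIMNS", "GIGAVLKVLTTGLPALISWIKRKRQQ"
    print(lz_complexity(a), lz_complexity(b), lz_distance(a, b))
```

In an SVM-pairwise setting, each sequence would typically be represented by its vector of such pairwise scores against the training sequences; that step is omitted here because the abstract does not describe it.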
Novel Candidate Key Drivers in the Integrative Network of Genes, MicroRNAs, Methylations, and Copy Number Variations in Squamous Cell Lung Carcinoma Mon, 23 Feb 2015 07:03:50 +0000 The mechanisms of lung cancer are highly complex. Not only mRNA gene expression but also microRNAs, DNA methylation, and copy number variation (CNV) play roles in tumorigenesis. It is difficult to incorporate so much information into a single model that can comprehensively reflect all these lung cancer mechanisms. In this study, we analyzed the 129 TCGA (The Cancer Genome Atlas) squamous cell lung carcinoma samples with gene expression, microRNA expression, DNA methylation, and CNV data. First, we used variance inflation factor (VIF) regression to build the whole genome integrative network. Then, we isolated the lung cancer subnetwork by identifying the known lung cancer genes and their direct regulators. This subnetwork was refined by the Bayesian method, and the directed regulations among mRNA genes, microRNAs, methylations, and CNVs were obtained. The novel candidate key drivers in this refined subnetwork, such as the methylation of ARHGDIB and HOXD3, microRNA let-7a and miR-31, and the CNV of AGAP2, were identified and analyzed. On three large public available lung cancer datasets, the key drivers ARHGDIB and HOXD3 demonstrated significant associations with the overall survival of lung cancer patients. Our results provide new insights into lung cancer mechanisms. Tao Huang, Jing Yang, and Yu-dong Cai Copyright © 2015 Tao Huang et al. All rights reserved. A miRNA-Driven Inference Model to Construct Potential Drug-Disease Associations for Drug Repositioning Thu, 19 Feb 2015 10:16:58 +0000 Increasing evidence discovered that the inappropriate expression of microRNAs (miRNAs) will lead to many kinds of complex diseases and drugs can regulate the expression level of miRNAs. 
Therefore, human diseases may be treated by targeting some specific miRNAs with drugs, which provides a new perspective for drug repositioning. However, few studies have attempted to computationally predict associations between drugs and diseases via miRNAs for drug repositioning. In this paper, we developed an inference model to achieve this aim by combining experimentally supported drug-miRNA associations and miRNA-disease associations with the assumption that drugs will form associations with diseases when they share some significant miRNA partners. Experimental results showed excellent performance of our model. Case studies demonstrated that some of the strongly predicted drug-disease associations can be confirmed by the publicly accessible database CTD, which indicated the usefulness of our inference model. Moreover, candidate miRNAs as molecular hypotheses underpinning the associations were listed to guide future experiments. The predicted results were released for further studies. We expect that this study will provide help in our understanding of drug-disease association prediction and in the roles of miRNAs in drug repositioning. Hailin Chen and Zuping Zhang Copyright © 2015 Hailin Chen and Zuping Zhang. All rights reserved. Prediction of Protein-Protein Interactions Related to Protein Complexes Based on Protein Interaction Networks Tue, 03 Feb 2015 13:32:51 +0000 A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts based on the adaptive k-cores method to predict protein-protein interactions associated with the complexes. The proposed method is adaptive to protein complexes with different structure, number, and size of nodes in a protein-protein interaction network. 
Based on different complex sets detected by various algorithms, we can obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is proved by using estimations with statistical tests and direct confirmation of the biological data. In comparison with the approaches which predict the interactions based on the cliques, the overlap of the predictions is small. Similarly, the overlaps among the predicted sets of interactions derived from various complex sets are also small. Thus, every predicted set of interactions may complement and improve the quality of the original network data. Meanwhile, the predictions from the proposed method replenish protein-protein interactions associated with protein complexes using only the network topology. Peng Liu, Lei Yang, Daming Shi, and Xianglong Tang Copyright © 2015 Peng Liu et al. All rights reserved. Novel Numerical Characterization of Protein Sequences Based on Individual Amino Acid and Its Application Mon, 02 Feb 2015 13:44:51 +0000 The hydrophobicity and hydrophilicity of amino acids play a very important role in protein folding and its interaction with the environment and other molecules, as well as its catalytic mechanism. Based on the two physicochemical indexes, a 2D graphical representation of protein sequences is introduced; meanwhile, a new numerical characteristic has been proposed to compute the distance of different sequences for analysis of sequence similarity/dissimilarity on the basis of this graphical representation. Furthermore, we apply the new distance in the similarities/dissimilarities of ND5 proteins of nine species and predict the four major classes based on the dataset containing 639 domains. The results show that the method is simple and effective. Yan-ping Zhang, Ya-jun Sheng, Wei Zheng, Ping-an He, and Ji-shuo Ruan Copyright © 2015 Yan-ping Zhang et al. All rights reserved. 
The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm Mon, 02 Feb 2015 06:51:40 +0000 The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis. Jianning Wu and Bin Wu Copyright © 2015 Jianning Wu and Bin Wu. All rights reserved. Detecting Key Genes Regulated by miRNAs in Dysfunctional Crosstalk Pathway of Myasthenia Gravis Sun, 01 Feb 2015 10:23:29 +0000 Myasthenia gravis (MG) is a neuromuscular autoimmune disorder resulting from autoantibodies attacking components of the neuromuscular junction. 
Recent studies have implicated the aberrant expression of microRNAs (miRNAs) in the pathogenesis of MG; however, the underlying mechanisms remain largely unknown. This study aimed to identify key genes regulated by miRNAs in MG. Six dysregulated pathways were identified through differentially expressed miRNAs and mRNAs in MG, and significant crosstalk was detected between five of these. Notably, crosstalk between the “synaptic long-term potentiation” pathway and four others was mediated by five genes involved in the MAPK signaling pathway. Furthermore, 14 key genes regulated by miRNAs were detected, of which six (MAPK1, RAF1, PGF, PDGFRA, EP300, and PPP1CC) mediated interactions between the dysregulated pathways. MAPK1 and RAF1 were responsible for most of this crosstalk (80%), likely reflecting their central roles in MG pathogenesis. In addition, most key genes were enriched in immune-related local areas that were strongly disordered in MG. These results provide new insight into the pathogenesis of MG and offer new potential targets for therapeutic intervention. Yuze Cao, Jianjian Wang, Huixue Zhang, Qinghua Tian, Lixia Chen, Shangwei Ning, Peifang Liu, Xuesong Sun, Xiaoyu Lu, Chang Song, Shuai Zhang, Bo Xiao, and Lihua Wang Copyright © 2015 Yuze Cao et al. All rights reserved. Conformational B-Cell Epitope Prediction Method Based on Antigen Preprocessing and Mimotopes Analysis Thu, 29 Jan 2015 06:48:20 +0000 Identification of epitopes that invoke strong humoral responses is an essential issue in the field of immunology. Various computational methods developed in recent years on the basis of antigen structures and mimotopes narrow the search for experimental validation. These methods can be divided into two categories: antigen structure-based methods and mimotope-based methods. Though new methods of both kinds have been proposed in recent years, they cannot maintain a high degree of satisfaction in various circumstances. 
In this paper, we proposed a new conformational B-cell epitope prediction method based on antigen preprocessing and mimotopes analysis. The method classifies the antigen surface residues into “epitopes” and “nonepitopes” by six epitope propensity scales, removing the “nonepitopes” and using the preprocessed antigen for epitope prediction based on mimotope sequences. The proposed method gives out the mean F score of 0.42 on the testing dataset. When compared with other publicly available servers by using the testing dataset, the new method yields better performance. The results demonstrate the proposed method is competent for the conformational B-cell epitope prediction. Pingping Sun, Haixu Ju, Baowen Zhang, Yu Gu, Bo Liu, Yanxin Huang, Huijie Zhang, and Yuxin Li Copyright © 2015 Pingping Sun et al. All rights reserved. Helicase and Its Interacting Factors: Regulation Mechanism, Characterization, Structure, and Application for Drug Design Wed, 28 Jan 2015 14:39:55 +0000 Cheng-Yang Huang, Yoshito Abe, Huangen Ding, and I-Fang Chung Copyright © 2015 Cheng-Yang Huang et al. All rights reserved. Automated Training for Algorithms That Learn from Genomic Data Wed, 28 Jan 2015 07:04:42 +0000 Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not incorporated into published machine learning algorithms which thereby can become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. 
The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable. Gokcen Cilingir and Shira L. Broschat Copyright © 2015 Gokcen Cilingir and Shira L. Broschat. All rights reserved. Protein Complex Discovery by Interaction Filtering from Protein Interaction Networks Using Mutual Rank Coexpression and Sequence Similarity Tue, 27 Jan 2015 14:15:39 +0000 The evaluation of biological networks is considered essential to understanding complex biological systems. Meanwhile, graph clustering algorithms are mostly used in protein-protein interaction (PPI) network analysis. The complexes introduced by the clustering algorithms include noise proteins. The error rate of the noise proteins in PPI network research is about 40–90%. However, only 30–40% of the existing interactions in the PPI databases depend on the specific biological function. It is essential to eliminate the noise proteins and the noise interactions from the complexes created via clustering methods. We have introduced new methods for weighting interactions in protein clusters and for removing noise interactions and proteins based on their weights. The coexpression and the sequence similarity of each pair of proteins are considered the edge weight of the proteins in the network. The results showed that the edge filtering based on the amount of coexpression acts similarly to the node filtering via graph-based characteristics. Regarding the removal of the noise edges, the edge filtering has a significant advantage over the graph-based method. 
The edge filtering based on the amount of sequence similarity has the ability to remove the noise proteins and the noise interactions. Ali Kazemi-Pour, Bahram Goliaei, and Hamid Pezeshk Copyright © 2015 Ali Kazemi-Pour et al. All rights reserved. Regulation of DEAH/RHA Helicases by G-Patch Proteins Tue, 27 Jan 2015 11:17:53 +0000 RNA helicases from the DEAH/RHA family are present in all the processes of RNA metabolism. The function of two helicases from this family, Prp2 and Prp43, is regulated by protein partners containing a G-patch domain. The G-patch is a glycine-rich domain discovered by sequence alignment, involved in protein-protein and protein-nucleic acid interaction. Although it has been shown to stimulate the helicase’s enzymatic activities, the precise role of the G-patch domain remains unclear. The role of G-patch proteins in the regulation of Prp43 activity has been studied in the two biological processes in which it is involved: splicing and ribosome biogenesis. Depending on the pathway, the activity of Prp43 is modulated by different G-patch proteins. A particular feature of the structure of DEAH/RHA helicases revealed by the Prp43 structure is the OB-fold domain in C-terminal part. The OB-fold has been shown to be a platform responsible for the interaction with G-patch proteins and RNA. Though there is still no structural data on the G-patch domain, in the current model, the interaction between the helicase, the G-patch protein, and RNA leads to a cooperative binding of RNA and conformational changes of the helicase. Julien Robert-Paganin, Stéphane Réty, and Nicolas Leulliot Copyright © 2015 Julien Robert-Paganin et al. All rights reserved. Virtual Screening of Acetylcholinesterase Inhibitors Using the Lipinski’s Rule of Five and ZINC Databank Thu, 22 Jan 2015 06:24:23 +0000 Alzheimer’s disease (AD) is a progressive and neurodegenerative pathology that can affect people over 65 years of age. 
It causes several complications, such as behavioral changes, language deficits, depression, and memory impairments. One of the methods used to treat AD is the increase of acetylcholine (ACh) in the brain by using acetylcholinesterase inhibitors (AChEIs). In this study, we used the ZINC databank and Lipinski’s rule of five to perform a virtual screening and a molecular docking (using AutoDock Vina 1.1.1), aiming to select possible compounds that have a quaternary ammonium atom and are able to inhibit acetylcholinesterase (AChE) activity. The molecules were obtained by screening, and further in vitro assays were performed to analyze the most potent inhibitors through the IC50 value and also to describe the interaction models between inhibitors and enzyme by molecular docking. The results showed that compound D inhibited AChE activity from different vertebrate sources and butyrylcholinesterase (BChE) from Equus ferus (EfBChE), with IC50 ranging from 1.69 ± 0.46 to 5.64 ± 2.47 µM. Compound D interacted with the peripheral anionic subsite in both enzymes, blocking substrate entrance to the active site. In contrast, compound C had higher specificity as an inhibitor of EfBChE. In conclusion, the screening was effective in finding inhibitors of AChE and BChE from different organisms. Pablo Andrei Nogara, Rogério de Aquino Saraiva, Diones Caeran Bueno, Lílian Juliana Lissner, Cristiane Lenz Dalla Corte, Marcos M. Braga, Denis Broock Rosemberg, and João Batista Teixeira Rocha Copyright © 2015 Pablo Andrei Nogara et al. All rights reserved. Mammalian Cell Culture Process for Monoclonal Antibody Production: Nonlinear Modelling and Parameter Estimation Mon, 19 Jan 2015 08:06:35 +0000 Monoclonal antibodies (mAbs) are at present one of the fastest growing products of the pharmaceutical industry, with widespread applications in biochemistry, biology, and medicine. 
The operation of mAb production processes is predominantly based on empirical knowledge, with improvements achieved through trial-and-error experiments and precedent practices. The nonlinearity of these processes and the absence of suitable instrumentation require an enhanced modelling effort and modern kinetic parameter estimation strategies. The present work is dedicated to nonlinear dynamic modelling and parameter estimation for a mammalian cell culture process used for mAb production. By using a dynamical model of this kind of process, an optimization-based technique for estimation of kinetic parameters in the model of the mammalian cell culture process is developed. The estimation is achieved by minimizing an error function with a particle swarm optimization (PSO) algorithm. The proposed estimation approach is analyzed in this work by using a particular model of mammalian cell culture, as a case study, but is generic for this class of bioprocesses. The presented case study shows that the proposed parameter estimation technique provides a more accurate simulation of the experimentally observed process behaviour than reported in previous studies. Dan Selişteanu, Dorin Șendrescu, Vlad Georgeanu, and Monica Roman Copyright © 2015 Dan Selişteanu et al. All rights reserved. Simultaneous Parameters Identifiability and Estimation of an E. coli Metabolic Network Model Tue, 06 Jan 2015 08:05:04 +0000 This work proposes a procedure for simultaneous parameter identifiability and estimation in metabolic networks in order to overcome difficulties associated with the lack of experimental data and the large number of parameters, a common scenario in the modeling of such systems. As a case study, the complex real problem of parameter identifiability of the Escherichia coli K-12 W3110 dynamic model was investigated; the model is composed of 18 ordinary differential equations and 35 kinetic rates and contains 125 parameters. 
With the procedure, model fit was improved for most of the measured metabolites, with 58 parameters estimated, including 5 unknown initial conditions. The results indicate that the simultaneous parameter identifiability and estimation approach in metabolic networks is appealing, since model fit to most of the measured metabolites was possible even when important measurements of intracellular metabolites and good initial estimates of parameters are not available. Kese Pontes Freitas Alberton, André Luís Alberton, Jimena Andrea Di Maggio, Vanina Gisela Estrada, María Soledad Díaz, and Argimiro Resende Secchi Copyright © 2015 Kese Pontes Freitas Alberton et al. All rights reserved. DNASynth: A Computer Program for Assembly of Artificial Gene Parts in Decreasing Temperature Tue, 06 Jan 2015 05:58:05 +0000 Artificial gene synthesis requires consideration of nucleotide sequence development as well as long DNA molecule assembly protocols. The nucleotide sequence of the molecule must meet many conditions including particular preferences of the host organism for certain codons, avoidance of specific regulatory subsequences, and a lack of secondary structures that inhibit expression. The chemical synthesis of DNA molecules has limitations in terms of strand length; thus, the creation of artificial genes requires the assembly of long DNA molecules from shorter fragments. In the approach presented, the algorithm and the computer program address both tasks: developing the optimal nucleotide sequence to encode a given peptide for a given host organism and determining the long DNA assembly protocol. These tasks are closely connected; a change in codon usage may lead to changes in the optimal assembly protocol, and the lack of a simple assembly protocol may be addressed by changing the nucleotide sequence. The computer program presented in this study was tested with real data from an experiment in a wet biological laboratory to synthesize a peptide. 
The benefit of the presented algorithm and its application is the shorter time, compared to polymerase cycling assembly, needed to produce a ready synthetic gene. Robert M. Nowak, Anna Wojtowicz-Krawiec, and Andrzej Plucienniczak Copyright © 2015 Robert M. Nowak et al. All rights reserved. Novel Computing Technologies for Bioinformatics and Cheminformatics Sun, 28 Dec 2014 07:06:35 +0000 Chuan Yi Tang, Che-Lun Hung, Ching-Hsien Hsu, Huiru Zheng, and Chun-Yuan Lin Copyright © 2014 Chuan Yi Tang et al. All rights reserved. Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data Sun, 28 Dec 2014 06:47:37 +0000 Julia Tzu-Ya Weng, Li-Ching Wu, Wen-Chi Chang, Tzu-Hao Chang, Tatsuya Akutsu, and Tzong-Yi Lee Copyright © 2014 Julia Tzu-Ya Weng et al. All rights reserved. Phenomics Research on Coronary Heart Disease Based on Human Phenotype Ontology Mon, 15 Dec 2014 06:53:45 +0000 The characteristics of holism, dynamics, complexity, and spatial and temporal features enable “Omics” and theories of TCM to interlink with each other. HPO, namely, “characterization,” can be understood as a sorting and generalization of the manifestations shown by people with diseases on the basis of the phenomics. Syndrome is the overall “manifestation” of human body pathological and physiological changes expressed by four diagnostic methods’ information. The four diagnostic methods’ data could be the most objective and direct manifestations of human body under morbid conditions. In this aspect, it is consistent with the connotation of “characterization.” Meanwhile, the four diagnostic methods’ data also equip us with features of characterization in HPO. 
In our study, we compared 107 pieces of four diagnostic methods’ information with the “characterization database” to further analyze data of four diagnostic methods’ characterization in accordance with the common characteristics of four diagnostic methods’ information and characterization and integrated 107 pieces of four diagnostic methods’ data to relevant items in HPO and finished the expansion of characterization information in HPO. Qi Shi, Kuo Gao, Huihui Zhao, Juan Wang, Xing Zhai, Peng Lu, Jianxin Chen, and Wei Wang Copyright © 2014 Qi Shi et al. All rights reserved. Erratum to “A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats” Mon, 24 Nov 2014 00:00:00 +0000 Shuaibin Lian, Qingyan Li, Zhiming Dai, Qian Xiang, and Xianhua Dai Copyright © 2014 Shuaibin Lian et al. All rights reserved. A Least Square Method Based Model for Identifying Protein Complexes in Protein-Protein Interaction Network Thu, 23 Oct 2014 12:45:40 +0000 Protein complex formed by a group of physical interacting proteins plays a crucial role in cell activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) network. However, the accuracy of the prediction is still far from being satisfactory, because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework to detect complexes from PPI network, named PLSMC. The method is on the basis of the fact that if two proteins are in a common complex, they are likely to be interacting. PLSMC employs this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks, and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms other methods. In particular, complexes predicted by PLSMC can match known complexes with a higher accuracy than other methods. 
Furthermore, the predicted complexes have high functional homogeneity. Qiguo Dai, Maozu Guo, Yingjie Guo, Xiaoyan Liu, Yang Liu, and Zhixia Teng Copyright © 2014 Qiguo Dai et al. All rights reserved. Evolution of Network Biomarkers from Early to Late Stage Bladder Cancer Samples Thu, 18 Sep 2014 06:53:32 +0000 We use a systems biology approach to construct protein-protein interaction networks (PPINs) for early and late stage bladder cancer. By comparing the networks of these two stages, we find that the two networks show significantly different mechanisms. To obtain the differential network structures between cancer and noncancer PPINs, we constructed cancer PPIN and noncancer PPIN network structures for the two bladder cancer stages using microarray data from cancer cells and their adjacent noncancer cells, respectively. With their carcinogenesis relevance values (CRVs), we identified 152 and 50 significant proteins and their PPI networks (network markers) for early and late stage bladder cancer by statistical assessment. To investigate the evolution of network biomarkers in the carcinogenesis process, primary pathway analysis showed that the significant pathways of early stage bladder cancer are related to ordinary cancer mechanisms, while the ribosome pathway and spliceosome pathway are most important for late stage bladder cancer. Their only intersection is the ubiquitin mediated proteolysis pathway in the whole stage of bladder cancer. The evolution of network biomarkers from early to late stage can reveal the carcinogenesis of bladder cancer. The findings of this study provide new clues and a direction for targeted cancer therapy, and they should be validated in vivo or in vitro in the future. Yung-Hao Wong, Cheng-Wei Li, and Bor-Sen Chen Copyright © 2014 Yung-Hao Wong et al. All rights reserved. 
MicroRNA Expression Profiling Altered by Variant Dosage of Radiation Exposure Tue, 16 Sep 2014 08:57:42 +0000 Various biological effects are associated with radiation exposure. Irradiated cells may elevate the risk for genetic instability, mutation, and cancer under low levels of radiation exposure, in addition to being able to extend the postradiation side effects in normal tissues. Radiation-induced bystander effect (RIBE) is the focus of rigorous research as it may promote the development of cancer even at low radiation doses. Alterations in the DNA sequence could not explain these biological effects of radiation and it is thought that epigenetics factors may be involved. Indeed, some microRNAs (or miRNAs) have been found to correlate radiation-induced damages and may be potential biomarkers for the various biological effects caused by different levels of radiation exposure. However, the regulatory role that miRNA plays in this aspect remains elusive. In this study, we profiled the expression changes in miRNA under fractionated radiation exposure in human peripheral blood mononuclear cells. By utilizing publicly available microRNA knowledge bases and performing cross validations with our previous gene expression profiling under the same radiation condition, we identified various miRNA-gene interactions specific to different doses of radiation treatment, providing new insights for the molecular underpinnings of radiation injury. Kuei-Fang Lee, Yi-Cheng Chen, Paul Wei-Che Hsu, Ingrid Y. Liu, and Lawrence Shih-Hsin Wu Copyright © 2014 Kuei-Fang Lee et al. All rights reserved. WISCOD: A Statistical Web-Enabled Tool for the Identification of Significant Protein Coding Regions Mon, 15 Sep 2014 05:37:19 +0000 Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. 
Although nowadays it is possible to improve their performance with additional information from related species or/and cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD, a web-enabled tool for the identification of significant protein coding regions, a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD has the capacity to detect real exons from large lists of potential exons, and it provides an easy way to use global value called expected probability of being a false exon (EPFE) that is useful for ranking potential exons in a probabilistic framework, without additional computational costs. The advantage of our approach is that it significantly increases the specificity and sensitivity (both between 80% and 90%) in comparison to other ab initio methods (where they are in the range of 70–75%). WISCOD is written in JAVA and R and is available to download and to run in a local mode on Linux and Windows platforms. Mireia Vilardell, Genis Parra, and Sergi Civit Copyright © 2014 Mireia Vilardell et al. All rights reserved. EXIA2: Web Server of Accurate and Rapid Protein Catalytic Residue Prediction Thu, 11 Sep 2014 10:40:30 +0000 We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. The special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. The method is shown to be the most accurate catalytic residue prediction method currently when combined with PSI-Blast sequence conservation. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. 
The areas under the ROC curve (AUC) on these benchmark datasets are in the range from 0.934 to 0.968. Chih-Hao Lu, Chin-Sheng Yu, Yu-Tung Chien, and Shao-Wei Huang Copyright © 2014 Chih-Hao Lu et al. All rights reserved. Computational Biophysical, Biochemical, and Evolutionary Signature of Human R-Spondin Family Proteins, the Member of Canonical Wnt/β-Catenin Signaling Pathway Mon, 08 Sep 2014 08:19:35 +0000 In humans, the Wnt/β-catenin signaling pathway plays a significant role in cell growth, cell development, and disease pathogenesis. Four human (Rspo)s are known to activate the canonical Wnt/β-catenin signaling pathway. Presently, (Rspo)s serve as therapeutic targets for several human diseases. Hence, a basic understanding of the molecular properties of (Rspo)s is essential. We approached this issue by interpreting the biochemical and biophysical properties along with the molecular evolution of (Rspo)s through computational methods. Our analysis shows that signal peptide length is roughly similar across the (Rspo)s family, along with similarity in aa distribution pattern. In Rspo3, four N-glycosylation sites were noted. All members are hydrophilic in nature and showed approximately similar GRAVY values. Rspo3 contains the most positively charged residues while Rspo4 contains the fewest. Four highly aligned blocks were recorded through Gblocks. Phylogenetic analysis shows that Rspo4 is rooted with Rspo2 and, similarly, Rspo3 and Rspo1 have a common point of origin. Through a phylogenomics study, we developed a phylogenetic tree of sixty proteins () with the ortholog and paralog seed sequences. A protein-protein network was also illustrated. The results demonstrated in our study may help future researchers to unfold significant physiological and therapeutic properties of (Rspo)s in various disease models. Ashish Ranjan Sharma, Chiranjib Chakraborty, Sang-Soo Lee, Garima Sharma, Jeong Kyo Yoon, C. 
George Priya Doss, Dong-Keun Song, and Ju-Suk Nam Copyright © 2014 Ashish Ranjan Sharma et al. All rights reserved. Gene Expression Profiling of Biological Pathway Alterations by Radiation Exposure Mon, 08 Sep 2014 00:00:00 +0000 Though damage caused by radiation has been the focus of rigorous research, the mechanisms through which radiation exerts harmful effects on cells are complex and not well-understood. In particular, the influence of low dose radiation exposure on the regulation of genes and pathways remains unclear. In an attempt to investigate the molecular alterations induced by varying doses of radiation, a genome-wide expression analysis was conducted. Peripheral blood mononuclear cells were collected from five participants and each sample was subjected to 0.5 Gy, 1 Gy, 2.5 Gy, and 5 Gy of cobalt 60 radiation, followed by array-based expression profiling. Gene set enrichment analysis indicated that the immune system and cancer development pathways appeared to be the major affected targets by radiation exposure. Therefore, 1 Gy radioactive exposure seemed to be a critical threshold dosage. In fact, after 1 Gy radiation exposure, expression levels of several genes including FADD, TNFRSF10B, TNFRSF8, TNFRSF10A, TNFSF10, TNFSF8, CASP1, and CASP4 that are associated with carcinogenesis and metabolic disorders showed significant alterations. Our results suggest that exposure to low-dose radiation may elicit changes in metabolic and immune pathways, potentially increasing the risk of immune dysfunctions and metabolic disorders. Kuei-Fang Lee, Julia Tzu-Ya Weng, Paul Wei-Che Hsu, Yu-Hsiang Chi, Ching-Kai Chen, Ingrid Y. Liu, Yi-Cheng Chen, and Lawrence Shih-Hsin Wu Copyright © 2014 Kuei-Fang Lee et al. All rights reserved. 
Systematic Expression Profiling Analysis Identifies Specific MicroRNA-Gene Interactions that May Differentiate between Active and Latent Tuberculosis Infection Thu, 04 Sep 2014 00:00:00 +0000 Tuberculosis (TB) is the second most common cause of death from infectious diseases. About 90% of those infected are asymptomatic—the so-called latent TB infections (LTBI), with a 10% lifetime chance of progressing to active TB. To further understand the molecular pathogenesis of TB, several molecular studies have attempted to compare the expression profiles between healthy controls and active TB or LTBI patients. However, the results vary due to diverse genetic backgrounds and study designs and the inherent complexity of the disease process. Thus, developing a sensitive and efficient method for the detection of LTBI is both crucial and challenging. For the present study, we performed a systematic analysis of the gene and microRNA profiles of healthy individuals versus those affected with TB or LTBI. Combined with a series of in silico analysis utilizing publicly available microRNA knowledge bases and published literature data, we have uncovered several microRNA-gene interactions that specifically target both the blood and lungs. Some of these molecular interactions are novel and may serve as potential biomarkers of TB and LTBI, facilitating the development for a more sensitive, efficient, and cost-effective diagnostic assay for TB and LTBI for the Taiwanese population. Lawrence Shih-Hsin Wu, Shih-Wei Lee, Kai-Yao Huang, Tzong-Yi Lee, Paul Wei-Che Hsu, and Julia Tzu-Ya Weng Copyright © 2014 Lawrence Shih-Hsin Wu et al. All rights reserved. 
Human Umbilical Cord Mesenchymal Stem Cells Infected with Adenovirus Expressing HGF Promote Regeneration of Damaged Neuron Cells in a Parkinson’s Disease Model Wed, 03 Sep 2014 08:15:20 +0000 Parkinson’s disease (PD) is a neurodegenerative movement disorder that is characterized by the progressive degeneration of the dopaminergic (DA) pathway. Mesenchymal stem cells derived from human umbilical cord (hUC-MSCs) have great potential for development as a therapeutic agent. HGF is a multifunctional mediator originally identified in hepatocytes and has recently been reported to possess various neuroprotective properties. This study was designed to investigate the protective effect of hUC-MSCs infected by an adenovirus carrying the HGF gene on the PD cell model induced by MPP+ on human bone marrow neuroblastoma cells. Our results provide evidence that the culture supernatant from hUC-MSCs expressing HGF could promote regeneration of damaged PD cells at higher efficacy than the supernatant from hUC-MSCs alone. In addition, intracellular free Ca2+ decreased markedly after treatment with culture supernatant from hUC-MSCs expressing HGF, while the expression of CaBP-D28k, an intracellular calcium binding protein, increased. Therefore, our study demonstrated that culture supernatant of MSCs overexpressing HGF was capable of eliciting regeneration of damaged PD model cells. This effect was probably achieved through the regulation of intracellular Ca2+ levels by modulating CaBP-D28k expression. Xin-Shan Liu, Jin-Feng Li, Shan-Shan Wang, Yu-Tong Wang, Yu-Zhen Zhang, Hong-Lei Yin, Shuang Geng, Hui-Cui Gong, Bing Han, and Yun-Liang Wang Copyright © 2014 Xin-Shan Liu et al. All rights reserved. 
Structural Comparison, Substrate Specificity, and Inhibitor Binding of AGPase Small Subunit from Monocot and Dicot: Present Insight and Future Potential Tue, 02 Sep 2014 11:29:57 +0000 ADP-glucose pyrophosphorylase (AGPase) is the first rate limiting enzyme of starch biosynthesis pathway and has been exploited as the target for greater starch yield in several plants. The structure-function analysis and substrate binding specificity of AGPase have provided enormous potential for understanding the role of specific amino acid or motifs responsible for allosteric regulation and catalytic mechanisms, which facilitate the engineering of AGPases. We report the three-dimensional structure, substrate, and inhibitor binding specificity of AGPase small subunit from different monocot and dicot crop plants. Both monocot and dicot subunits were found to exploit similar interactions with the substrate and inhibitor molecule as in the case of their closest homologue potato tuber AGPase small subunit. Comparative sequence and structural analysis followed by molecular docking and electrostatic surface potential analysis reveal that rearrangements of secondary structure elements, substrate, and inhibitor binding residues are strongly conserved and follow common folding pattern and orientation within monocot and dicot displaying a similar mode of allosteric regulation and catalytic mechanism. The results from this study along with site-directed mutagenesis complemented by molecular dynamics simulation will shed more light on increasing the starch content of crop plants to ensure the food security worldwide. Kishore Sarma, Priyabrata Sen, Madhumita Barooah, Manabendra D. Choudhury, Shubhadeep Roychoudhury, and Mahendra K. Modi Copyright © 2014 Kishore Sarma et al. All rights reserved. 
A Review of Feature Extraction Software for Microarray Gene Expression Data Sun, 31 Aug 2014 07:10:08 +0000 When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. Ching Siang Tan, Wai Soon Ting, Mohd Saberi Mohamad, Weng Howe Chan, Safaai Deris, and Zuraini Ali Shah Copyright © 2014 Ching Siang Tan et al. All rights reserved. The Mcm2-7 Replicative Helicase: A Promising Chemotherapeutic Target Thu, 28 Aug 2014 15:15:54 +0000 Numerous eukaryotic replication factors have served as chemotherapeutic targets. One replication factor that has largely escaped drug development is the Mcm2-7 replicative helicase. This heterohexameric complex forms the licensing system that assembles the replication machinery at origins during initiation, as well as the catalytic core of the CMG (Cdc45-Mcm2-7-GINS) helicase that unwinds DNA during elongation. Emerging evidence suggests that Mcm2-7 is also part of the replication checkpoint, a quality control system that monitors and responds to DNA damage. As the only replication factor required for both licensing and DNA unwinding, Mcm2-7 is a major cellular regulatory target with likely cancer relevance. 
Mutations in at least one of the six MCM genes are particularly prevalent in squamous cell carcinomas of the lung, head and neck, and prostate, and MCM mutations have been shown to cause cancer in mouse models. Moreover, various cellular regulatory proteins, including the Rb tumor suppressor family members, bind Mcm2-7 and inhibit its activity. As a preliminary step toward drug development, several small molecule inhibitors that target Mcm2-7 have been recently discovered. Both its structural complexity and essential role at the interface between DNA replication and its regulation make Mcm2-7 a potential chemotherapeutic target. Nicholas E. Simon and Anthony Schwacha Copyright © 2014 Nicholas E. Simon and Anthony Schwacha. All rights reserved. Crystal Structure of a Conserved Hypothetical Protein MJ0927 from Methanocaldococcus jannaschii Reveals a Novel Quaternary Assembly in the Nif3 Family Thu, 28 Aug 2014 15:06:43 +0000 A Nif3 family protein of Methanocaldococcus jannaschii, MJ0927, is highly conserved from bacteria to humans. Although several structures of bacterial Nif3 proteins are known, no structure representing archaeal Nif3 has yet been reported. The crystal structure of Methanocaldococcus jannaschii MJ0927 was determined at 2.47 Å resolution to understand the structural differences between the bacterial and archaeal Nif3 proteins. Intriguingly, MJ0927 is found to adopt an unusual assembly comprising a trimer of dimers that forms a cage-like architecture. Electrophoretic mobility-shift assays indicate that MJ0927 binds to both single-stranded and double-stranded DNA. Structural analysis of MJ0927 reveals a positively charged region that can potentially explain its DNA-binding capability. Taken together, these data suggest that MJ0927 adopts a novel quaternary architecture that could play various DNA-binding roles in Methanocaldococcus jannaschii. 
Sheng-Chia Chen, Chi-Hung Huang, Chia Shin Yang, Shu-Min Kuan, Ching-Ting Lin, Shan-Ho Chou, and Yeh Chen Copyright © 2014 Sheng-Chia Chen et al. All rights reserved. Relationship between CCR and NT-proBNP in Chinese HF Patients, and Their Correlations with Severity of HF Thu, 28 Aug 2014 09:42:10 +0000 Aim. To evaluate the relationship between creatinine clearance rate (CCR) and the level of N-terminal pro-B-type natriuretic peptide (NT-proBNP) in heart failure (HF) patients and their correlations with HF severity. Methods and Results. Two hundred and one Chinese patients were grouped according to the New York Heart Association (NYHA) classification as NYHA 1-2 and 3-4 groups, and 135 cases without heart failure served as the control group. The following variables were compared among these three groups: age, sex, body mass index (BMI), smoking status, hypertension, diabetes, NT-proBNP, creatinine (Cr), uric acid (UA), left ventricular end-diastolic diameter (LVEDD), and CCR. The biomarkers of NT-proBNP, Cr, UA, LVEDD, and CCR varied significantly in the three groups, and these variables were positively correlated with the NYHA classification. The levels of NT-proBNP and CCR were closely related to the occurrence of HF and were independent risk factors for HF. At the same time, there was a significant negative correlation between the levels of NT-proBNP and CCR. The area under the receiver operating characteristic curve suggested that NT-proBNP and CCR have high accuracy for diagnosis of HF and have clinical diagnostic value. Conclusion. NT-proBNP and CCR may be important biomarkers in evaluating the severity of HF. Zhigang Lu, Bo Wang, Yunliang Wang, Xueqing Qian, Wei Zheng, and Meng Wei Copyright © 2014 Zhigang Lu et al. All rights reserved. 
Establishing Standards for Studying Renal Function in Mice through Measurements of Body Size-Adjusted Creatinine and Urea Levels Wed, 27 Aug 2014 12:35:10 +0000 Strategies for obtaining reliable results are increasingly implemented in order to reduce errors in the analysis of human and veterinary samples; however, further data are required for murine samples. Here, we determined an average factor from the murine body surface area for the calculation of biochemical renal parameters, assessed the effects of storage and freeze-thawing of C57BL/6 mouse samples on plasmatic and urinary urea, and evaluated the effects of using two different urea-measurement techniques. After obtaining 24 h urine samples, blood was collected, and body weight and length were established. The samples were evaluated after collection or stored at −20°C and −70°C. At different time points (0, 4, and 90 days), these samples were thawed, the creatinine and/or urea concentrations were analyzed, and samples were restored at these temperatures for further measurements. We show that creatinine clearance measurements should be adjusted according to the body surface area, which was calculated based on the weight and length of the animal. Repeated freeze-thawing cycles negatively affected the urea concentration; the urea concentration was more reproducible when using the modified Berthelot reaction rather than the ultraviolet method. Our findings will facilitate standardization and optimization of methodology as well as understanding of renal and other biochemical data obtained from mice. Wellington Francisco Rodrigues, Camila Botelho Miguel, Marcelo Henrique Napimoga, Carlo Jose Freire Oliveira, and Javier Emilio Lazo-Chica Copyright © 2014 Wellington Francisco Rodrigues et al. All rights reserved. 
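The body size adjustment described in the preceding abstract can be illustrated with a minimal sketch. It assumes the classical clearance formula (urine concentration times urine volume, divided by plasma concentration times collection time) and a simple proportional scaling to a reference body surface area. The function names, the reference area, and the sample values are hypothetical, and the abstract does not give the authors' own factor or their formula for deriving body surface area from weight and length, so the area is taken here as an input.

```python
import math


def creatinine_clearance(urine_cr, plasma_cr, urine_volume_ml, collection_min):
    """Classical clearance in mL/min: (U x V) / (P x t).

    urine_cr and plasma_cr must be in the same concentration units.
    """
    if plasma_cr <= 0 or collection_min <= 0:
        raise ValueError("plasma creatinine and collection time must be positive")
    return (urine_cr * urine_volume_ml) / (plasma_cr * collection_min)


def adjust_for_bsa(clearance_ml_min, animal_bsa_cm2, reference_bsa_cm2):
    """Scale a clearance to a reference body surface area.

    Both areas are assumed inputs; reference_bsa_cm2 is a hypothetical
    normalization constant, not a value taken from the study.
    """
    if animal_bsa_cm2 <= 0 or reference_bsa_cm2 <= 0:
        raise ValueError("body surface areas must be positive")
    return clearance_ml_min * reference_bsa_cm2 / animal_bsa_cm2


if __name__ == "__main__":
    # Illustrative numbers only: 24 h (1440 min) urine collection.
    raw = creatinine_clearance(urine_cr=40.0, plasma_cr=0.2,
                               urine_volume_ml=1.44, collection_min=1440.0)
    adjusted = adjust_for_bsa(raw, animal_bsa_cm2=80.0, reference_bsa_cm2=100.0)
    print(f"raw clearance: {raw:.3f} mL/min")
    print(f"BSA-adjusted clearance: {adjusted:.3f} mL/min per reference area")
```

Keeping the surface area as an explicit argument means any published estimate based on weight and length can be plugged in without changing the clearance calculation.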
Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection Wed, 27 Aug 2014 12:02:00 +0000 Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of cancer genome that leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches are proposed which use supervised machine learning techniques for prediction with features obtained by some databases. However, often we do not know which features are important for driver mutations prediction. In this study, we propose a novel feature selection method (called DX) from 126 candidate features’ set. In order to obtain the best performance, rotation forest algorithm was adopted to perform the experiment. On the train dataset which was collected from COSMIC and Swiss-Prot databases, we are able to obtain high prediction performance with 88.03% accuracy, 93.9% precision, and 81.35% recall when the 11 top-ranked features were used. Comparison with other various techniques in the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method. Xiuquan Du and Jiaxing Cheng Copyright © 2014 Xiuquan Du and Jiaxing Cheng. All rights reserved. Crystal Structure of Deinococcus radiodurans RecQ Helicase Catalytic Core Domain: The Interdomain Flexibility Wed, 27 Aug 2014 08:21:26 +0000 RecQ DNA helicases are key enzymes in the maintenance of genome integrity, and they have functions in DNA replication, recombination, and repair. In contrast to most RecQs, RecQ from Deinococcus radiodurans (DrRecQ) possesses an unusual domain architecture that is crucial for its remarkable ability to repair DNA. Here, we determined the crystal structures of the DrRecQ helicase catalytic core and its ADP-bound form, revealing interdomain flexibility in its first RecA-like and winged-helix (WH) domains. Additionally, the WH domain of DrRecQ is positioned in a different orientation from that of the E. coli RecQ (EcRecQ). 
These results suggest that the orientation of the protein during DNA-binding is significantly different when comparing DrRecQ and EcRecQ. Sheng-Chia Chen, Chi-Hung Huang, Chia Shin Yang, Tzong-Der Way, Ming-Chung Chang, and Yeh Chen Copyright © 2014 Sheng-Chia Chen et al. All rights reserved. Characterization of Putative cis-Regulatory Elements in Genes Preferentially Expressed in Arabidopsis Male Meiocytes Wed, 27 Aug 2014 08:05:05 +0000 Meiosis is essential for plant reproduction because it is the process during which homologous chromosome pairing, synapsis, and meiotic recombination occur. The meiotic transcriptome is difficult to investigate because of the size of meiocytes and the confines of anther lobes. The recent development of isolation techniques has enabled the characterization of transcriptional profiles in male meiocytes of Arabidopsis. Gene expression in male meiocytes shows unique features. The direct interaction of transcription factors (TFs) with DNA regulatory sequences forms the basis for the specificity of transcriptional regulation. Here, we identified putative cis-regulatory elements (CREs) associated with male meiocyte-expressed genes using in silico tools. The upstream regions (1 kb) of the top 50 genes preferentially expressed in Arabidopsis meiocytes possessed conserved motifs. These motifs are putative binding sites of TFs, some of which share common functions, such as roles in cell division. In combination with cell-type-specific analysis, our findings could be a substantial aid for the identification and experimental verification of the protein-DNA interactions for the specific TFs that drive gene expression in meiocytes. Junhua Li, Jinhong Yuan, and Mingjun Li Copyright © 2014 Junhua Li et al. All rights reserved. 
Function Formula Oriented Construction of Bayesian Inference Nets for Diagnosis of Cardiovascular Disease Wed, 27 Aug 2014 06:47:48 +0000 An intelligent cardiovascular disease (CVD) diagnosis system using hemodynamic parameters (HDPs) derived from sphygmogram (SPG) signal is presented to support the emerging patient-centric healthcare models. To replicate clinical approach of diagnosis through a staged decision process, the Bayesian inference nets (BIN) are adapted. New approaches to construct a hierarchical multistage BIN using defined function formulas and a method employing fuzzy logic (FL) technology to quantify inference nodes with dynamic values of statistical parameters are proposed. The suggested methodology is validated by constructing hierarchical Bayesian fuzzy inference nets (HBFIN) to diagnose various heart pathologies from the deduced HDPs. The preliminary diagnostic results show that the proposed methodology has salient validity and effectiveness in the diagnosis of cardiovascular disease. Booma Devi Sekar and Mingchui Dong Copyright © 2014 Booma Devi Sekar and Mingchui Dong. All rights reserved. High-Throughput Functional Screening of Steroid Substrates with Wild-Type and Chimeric P450 Enzymes Tue, 26 Aug 2014 10:40:59 +0000 The promiscuity of a collection of enzymes consisting of 31 wild-type and synthetic variants of CYP1A enzymes was evaluated using a series of 14 steroids and 2 steroid-like chemicals, namely, nootkatone, a terpenoid, and mifepristone, a drug. For each enzyme-substrate couple, the initial steady-state velocity of metabolite formation was determined at a substrate saturating concentration. For that, a high-throughput approach was designed involving automatized incubations in 96-well microplate with sixteen 6-point kinetics per microplate and data acquisition using LC/MS system accepting 96-well microplate for injections. 
The resulting dataset was used for multivariate statistics aimed at sorting out the correlations between the tested enzyme variants and their ability to metabolize steroid substrates. Functional classifications of both CYP1A enzyme variants and steroid substrate structures were obtained, allowing the delineation of global structural features for both substrate recognition and regioselectivity of oxidation. Philippe Urban, Gilles Truan, and Denis Pompon Copyright © 2014 Philippe Urban et al. All rights reserved. Large-Scale Protein-Protein Interactions Detection by Integrating Big Biosensing Data with Computational Model Mon, 18 Aug 2014 10:52:22 +0000 Protein-protein interactions (PPIs) are the basis of biological functions, and studying these interactions on a molecular level is of crucial importance for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, the high-throughput experimental methods for identifying PPIs are both time-consuming and expensive. Moreover, high-throughput PPI data are often associated with high false-positive and high false-negative rates. To address these problems, we propose a method for PPI detection by integrating biosensor-based PPI data with a novel computational model. This method was developed based on the extreme learning machine algorithm combined with a novel representation of protein sequence descriptor. When applied to the large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy with 84.08% sensitivity at the specificity of 85.53%. We conducted more extensive experiments to compare the proposed method with a state-of-the-art technique, the support vector machine. The achieved results demonstrate that our approach is very promising for detecting new PPIs, and it can be a helpful supplement for biosensor-based PPI data detection. 
Zhu-Hong You, Shuai Li, Xin Gao, Xin Luo, and Zhen Ji Copyright © 2014 Zhu-Hong You et al. All rights reserved. Drug Repositioning Discovery for Early- and Late-Stage Non-Small-Cell Lung Cancer Mon, 18 Aug 2014 07:02:32 +0000 Drug repositioning is a popular approach in the pharmaceutical industry for identifying potential new uses for existing drugs and accelerating the development time. Non-small-cell lung cancer (NSCLC) is one of the leading causes of death worldwide. To reduce the biological heterogeneity effects among different individuals, both normal and cancer tissues were taken from the same patient, hence allowing pairwise testing. By comparing early- and late-stage cancer patients, we can identify stage-specific NSCLC genes. Differentially expressed genes are clustered separately to form up- and downregulated communities that are used as queries to perform enrichment analysis. The results suggest that pathways for early- and late-stage cancers are different. Sets of up- and downregulated genes were submitted to the cMap web resource to identify potential drugs. To achieve high confidence drug prediction, multiple microarray experimental results were merged by performing meta-analysis. The results of a few drug findings are supported by MTT assay or clonogenic assay data. In conclusion, we have been able to assess the potential existing drugs to identify novel anticancer drugs, which may be helpful in drug repositioning discovery for NSCLC. Chien-Hung Huang, Peter Mu-Hsin Chang, Yong-Jie Lin, Cheng-Hsu Wang, Chi-Ying F. Huang, and Ka-Lok Ng Copyright © 2014 Chien-Hung Huang et al. All rights reserved. Systematic Analysis of the Association between Gut Flora and Obesity through High-Throughput Sequencing and Bioinformatics Approaches Thu, 14 Aug 2014 12:10:54 +0000 Eighty-one stool samples from Taiwanese were collected for analysis of the association between the gut flora and obesity. 
The supervised analysis showed that the most abundant genera of bacteria in normal samples (from people with a body mass index (BMI) 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant genera of bacteria in case samples (with a BMI 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples were clustered more compactly than case samples. An unsupervised analysis demonstrated that bacterial communities in the gut were clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher’s ). The results showed that bacterial communities in the gut were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity. Chih-Min Chiu, Wei-Chih Huang, Shun-Long Weng, Han-Chi Tseng, Chao Liang, Wei-Chi Wang, Ting Yang, Tzu-Ling Yang, Chen-Tsung Weng, Tzu-Hao Chang, and Hsien-Da Huang Copyright © 2014 Chih-Min Chiu et al. All rights reserved. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products Tue, 12 Aug 2014 10:16:15 +0000 Background. During the analysis of genomics data, it is often required to quantify the functional similarity of genes and their products based on the annotation information from gene ontology (GO) with hierarchical structure. A flexible and user-friendly way to estimate the functional similarity of genes utilizing GO annotation is therefore highly desired. Results. 
We proposed a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products based on multiple ontologies of hierarchical GO annotations. The performance of our algorithm was evaluated and found to be superior to the other tested methods. We implemented the proposed algorithm in a software package, FSim, based on statistical and computing environment. It can be used to discover functionally related genes for a given gene, group of genes, or set of function terms. Conclusions. FSim is a flexible tool to analyze functional gene groups based on the GO annotation databases. Qiang Hu, ZhiGang Wang, and ZhengGuo Zhang Copyright © 2014 Qiang Hu et al. All rights reserved. Prediction of S-Nitrosylation Modification Sites Based on Kernel Sparse Representation Classification and mRMR Algorithm Tue, 12 Aug 2014 00:00:00 +0000 Protein S-nitrosylation plays a very important role in a wide variety of cellular biological activities. To date, accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we present a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and the minimum Redundancy Maximum Relevance (mRMR) algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for numerical representation of proteins. A total of 529 protein sequences collected from open-access databases and the published literature are used to train and test our predictor. Computational results show that our predictor achieves Matthews’ correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor algorithm, the random forest algorithm, and the sparse representation classification algorithm. 
The experimental results also indicate that 134 optimal features can better represent the peptides of protein S-nitrosylation than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. The experimental results showed that our predictor also yielded good performance on the independent testing set, with a Matthews’ correlation coefficient of 0.2239. Guohua Huang, Lin Lu, Kaiyan Feng, Jun Zhao, Yuchao Zhang, Yaochen Xu, Ning Zhang, Bi-Qing Li, Weiping Huang, and Yu-Dong Cai Copyright © 2014 Guohua Huang et al. All rights reserved. Novel Approach for Coexpression Analysis of E2F1–3 and MYC Target Genes in Chronic Myelogenous Leukemia Sun, 10 Aug 2014 08:29:13 +0000 Background. Chronic myelogenous leukemia (CML) is characterized by a large number of immature myeloid cells in the blood circulation. E2F1–3 and MYC are important transcription factors that form positive feedback loops by reciprocal regulation in their own transcription processes. Since genes regulated by E2F1–3 or MYC are related to cell proliferation and apoptosis, we asked whether the coexpression patterns of genes regulated concurrently by E2F1–3 and MYC differ between the normal and the CML states. Results. We proposed a method to explore the difference in the coexpression patterns of those candidate target genes between the normal and the CML groups. A disease-specific cutoff point for coexpression levels that classified the coexpressed gene pairs into strong and weak coexpression classes was identified. Our method effectively identified the coexpression pattern differences from the overall structure. Moreover, we found that genes related to the cell adhesion and angiogenesis properties were more likely to be coexpressed in the normal group when compared to the CML group. Conclusion. 
Our findings may be helpful in exploring the underlying mechanisms of CML and provide useful information in cancer treatment. Fengfeng Wang, Lawrence W. C. Chan, William C. S. Cho, Petrus Tang, Jun Yu, Chi-Ren Shyu, Nancy B. Y. Tsui, S. C. Cesar Wong, Parco M. Siu, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2014 Fengfeng Wang et al. All rights reserved. A Genome-Wide Identification of Genes Undergoing Recombination and Positive Selection in Neisseria Sun, 10 Aug 2014 08:23:34 +0000 Currently, there is particular interest in the molecular mechanisms of adaptive evolution in bacteria. Neisseria is a genus of Gram-negative bacteria, and there has recently been considerable focus on its two human pathogenic species, N. meningitidis and N. gonorrhoeae. Until now, no genome-wide studies have attempted to scan for the genes related to adaptive evolution. For this reason, we selected 18 Neisseria genomes (14 N. meningitidis, 3 N. gonorrhoeae, and 1 commensal N. lactamica) to conduct a comparative genome analysis to obtain a comprehensive understanding of the roles of natural selection and homologous recombination throughout the history of adaptive evolution. Among the 1012 core orthologous genes, we identified 635 genes with recombination signals and 10 genes that showed significant evidence of positive selection. Further functional analyses revealed no functional bias in the recombined genes. Positively selected genes are prone to DNA processing and iron uptake, which are essential for the fundamental life cycle. Overall, the results indicate that both recombination and positive selection play crucial roles in the adaptive evolution of Neisseria genomes. The positively selected genes and the corresponding amino acid sites provide us with valuable targets for further research into the detailed mechanisms of adaptive evolution in Neisseria. Dong Yu, Yuan Jin, Zhiqiu Yin, Hongguang Ren, Wei Zhou, Long Liang, and Junjie Yue Copyright © 2014 Dong Yu et al. 
All rights reserved. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration Wed, 06 Aug 2014 08:37:56 +0000 Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes. Jian Zhang, ZhiHao Xing, Mingming Ma, Ning Wang, Yu-Dong Cai, Lei Chen, and Xun Xu Copyright © 2014 Jian Zhang et al. All rights reserved. C-Terminal Domain Swapping of SSB Changes the Size of the ssDNA Binding Site Mon, 04 Aug 2014 06:33:19 +0000 Single-stranded DNA-binding protein (SSB) plays an important role in DNA metabolism, including DNA replication, repair, and recombination, and is therefore essential for cell survival. Bacterial SSB consists of an N-terminal ssDNA-binding/oligomerization domain and a flexible C-terminal protein-protein interaction domain. We characterized the ssDNA-binding properties of Klebsiella pneumoniae SSB (KpSSB), Salmonella enterica Serovar Typhimurium LT2 SSB (StSSB), Pseudomonas aeruginosa PAO1 SSB (PaSSB), and two chimeric KpSSB proteins, namely, KpSSBnStSSBc and KpSSBnPaSSBc. 
The C-terminal domain of StSSB or PaSSB was exchanged with that of KpSSB through protein chimeragenesis. By using the electrophoretic mobility shift assay, we characterized the stoichiometry of KpSSB, StSSB, PaSSB, KpSSBnStSSBc, and KpSSBnPaSSBc, complexed with a series of ssDNA homopolymers. The binding site sizes were determined to be , , , , and nucleotides (nt), respectively. Comparison of the binding site sizes of KpSSB, KpSSBnStSSBc, and KpSSBnPaSSBc showed that the C-terminal domain swapping of SSB changes the size of the binding site. Our observations suggest that not only the conserved N-terminal domain but also the C-terminal domain of SSB is an important determinant for ssDNA binding. Yen-Hua Huang and Cheng-Yang Huang Copyright © 2014 Yen-Hua Huang and Cheng-Yang Huang. All rights reserved. The Effects of the Context-Dependent Codon Usage Bias on the Structure of the nsp1α of Porcine Reproductive and Respiratory Syndrome Virus Sun, 03 Aug 2014 07:47:26 +0000 The information about the crystal structure of porcine reproductive and respiratory syndrome virus (PRRSV) leader protease nsp1α is available to analyze the roles of tRNA abundance of pigs and codon usage of the nsp1α gene in the formation of this protease. The effects of tRNA abundance of the pigs and the synonymous codon usage and the context-dependent codon bias (CDCB) of the nsp1α on shaping the specific folding units (α-helix, β-strand, and the coil) in the nsp1α were analyzed based on the structural information about this protease from protein data bank (PDB: 3IFU) and the nsp1α of the 191 PRRSV strains. By mapping the overall tRNA abundance along the nsp1α, we found that there is no link between the fluctuation of the overall tRNA abundance and the specific folding units in the nsp1α, and the low translation speed of ribosome caused by the tRNA abundance exists in the nsp1α. 
The strong correlation between some synonymous codon usage and the specific folding units in the nsp1α was found, and the phenomenon of CDCB exists in the specific folding units of the nsp1α. These findings provide an insight into the roles of the synonymous codon usage and CDCB in the formation of PRRSV nsp1α structure. Yao-zhong Ding, Ya-nan You, Dong-jie Sun, Hao-tai Chen, Yong-lu Wang, Hui-yun Chang, Li Pan, Yu-zhen Fang, Zhong-wang Zhang, Peng Zhou, Jian-liang Lv, Xin-sheng Liu, Jun-jun Shao, Fu-rong Zhao, Tong Lin, Laszlo Stipkovits, Zygmunt Pejsak, Yong-guang Zhang, and Jie Zhang Copyright © 2014 Yao-zhong Ding et al. All rights reserved. Detecting Epistatic Interactions in Metagenome-Wide Association Studies by metaBOOST Thu, 24 Jul 2014 18:41:12 +0000 Material and Methods. We recall the definition of epistasis and extend it for metagenomic biomarkers and then we describe the overview of our method metaBOOST and provide detailed information about each step of metaBOOST. Results. We describe the data sources for both simulation studies and real metagenomic datasets. Then, we describe the procedure of simulation studies and provide results for it. After that, we conduct real datasets studies and report the results. Conclusions and Discussion. Finally, we conclude our method and discuss some possible improvements for the future. Mengmeng Wu and Rui Jiang Copyright © 2014 Mengmeng Wu and Rui Jiang. All rights reserved. The N-Terminal Domain of Human DNA Helicase Rtel1 Contains a Redox Active Iron-Sulfur Cluster Thu, 24 Jul 2014 09:20:31 +0000 Human telomere length regulator Rtel1 is a superfamily II DNA helicase and is essential for maintaining proper length of telomeres in chromosomes. Here we report that the N-terminal domain of human Rtel1 (RtelN) expressed in Escherichia coli cells produces a protein that contains a redox active iron-sulfur cluster with the redox midpoint potential of −248 ± 10 mV (pH 8.0). 
The iron-sulfur cluster in RtelN is sensitive to hydrogen peroxide and nitric oxide, indicating that reactive oxygen/nitrogen species may modulate the DNA helicase activity of Rtel1 via modification of its iron-sulfur cluster. Purified RtelN retains a weak binding affinity for the single-stranded (ss) and double-stranded (ds) DNA in vitro. However, modification of the iron-sulfur cluster by hydrogen peroxide or nitric oxide does not significantly affect the DNA binding activity of RtelN, suggesting that the iron-sulfur cluster is not directly involved in the DNA interaction in the N-terminal domain of Rtel1. Aaron P. Landry and Huangen Ding Copyright © 2014 Aaron P. Landry and Huangen Ding. All rights reserved. Security Mechanism Based on Hospital Authentication Server for Secure Application of Implantable Medical Devices Thu, 24 Jul 2014 07:55:14 +0000 Since two recent security attacks against implantable medical devices (IMDs) were reported, the privacy and security risks of IMDs have been widely recognized in the medical device market and research community, because the malfunctioning of IMDs might endanger the patient’s life. During the last few years, much research has been carried out to address the security-related issues of IMDs, including privacy, safety, and accessibility issues. A physician accesses an IMD through an external device called a programmer, for diagnosis and treatment. Hence, cryptographic key management between the IMD and the programmer is important to enforce a strict access control. In this paper, a new security architecture for IMDs is proposed, based on a 3-Tier security model, where the programmer interacts with a Hospital Authentication Server to get permissions to access IMDs. The proposed security architecture greatly simplifies the key management between IMDs and programmers. 
Also proposed is a security mechanism to guarantee the authenticity of the patient data collected from the IMD and the nonrepudiation of the physician’s treatment based on it. The proposed architecture and mechanism are analyzed and compared with several previous works in terms of security and performance. Chang-Seop Park Copyright © 2014 Chang-Seop Park. All rights reserved.