BioMed Research International: Bioinformatics The latest articles from Hindawi © 2017, Hindawi Limited. All rights reserved. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features Tue, 14 Nov 2017 00:00:00 +0000 DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to predict DNA-binding proteins. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature. Rianon Zaman, Shahana Yasmin Chowdhury, Mahmood A. Rashid, Alok Sharma, Abdollah Dehzangi, and Swakkhar Shatabda Copyright © 2017 Rianon Zaman et al. All rights reserved. Reversible Data Hiding in FTIR Microspectroscopy Images with Tamper Indication and Payload Error Correction Mon, 13 Nov 2017 00:00:00 +0000 Fourier transform infrared (FTIR) microspectroscopy images contain information from the whole infrared spectrum used for microspectroscopic analyses. In combination with the FTIR image, visible light images are used to depict the area from which the FTIR spectral image was sampled. These two images are traditionally acquired as separate files. This paper proposes a histogram shifting-based data hiding technique to embed visible light images in FTIR spectral images, producing single entities. The primary objective is to improve data management efficiency. Secondary objectives are confidentiality, availability, and reliability. 
Since the integrity of biomedical data is vital, the proposed method applies reversible data hiding. After extraction of the embedded data, the FTIR image is reversed to its original state. Furthermore, the proposed method applies authentication tags generated with keyed Hash-Based Message Authentication Codes (HMAC) to detect tampered or corrupted areas of FTIR images. The experimental results show that the FTIR spectral images carrying the payload maintain good perceptual fidelity and the payload can be reliably recovered even after bit flipping or cropping attacks. It has been also shown that extraction successfully removes all modifications caused by the payload. Finally, authentication tags successfully indicated tampered FTIR image areas. Angelos Fylakis, Anja Keskinarkaus, Juha Partala, Simo Saarakkala, and Tapio Seppänen Copyright © 2017 Angelos Fylakis et al. All rights reserved. Identifying Human Phenotype Terms by Combining Machine Learning and Validation Rules Thu, 09 Nov 2017 00:00:00 +0000 Named-Entity Recognition is commonly used to identify biological entities such as proteins, genes, and chemical compounds found in scientific articles. The Human Phenotype Ontology (HPO) is an ontology that provides a standardized vocabulary for phenotypic abnormalities found in human diseases. This article presents the Identifying Human Phenotypes (IHP) system, tuned to recognize HPO entities in unstructured text. IHP uses Stanford CoreNLP for text processing and applies Conditional Random Fields trained with a rich feature set, which includes linguistic, orthographic, morphologic, lexical, and context features created for the machine learning-based classifier. However, the main novelty of IHP is its validation step based on a set of carefully crafted manual rules, such as the negative connotation analysis, that combined with a dictionary can filter incorrectly identified entities, find missed entities, and combine adjacent entities. 
The performance of IHP was evaluated using the recently published HPO Gold Standardized Corpora (GSC), where the system Bio-LarK CR obtained the best F-measure of 0.56. IHP achieved an F-measure of 0.65 on the GSC. Due to inconsistencies found in the GSC, an extended version of the GSC was created, adding 881 entities and modifying 4 entities. IHP achieved an F-measure of 0.863 on the new GSC. Manuel Lobo, Andre Lamurias, and Francisco M. Couto Copyright © 2017 Manuel Lobo et al. All rights reserved. Corrigendum to “Topography Prediction of Helical Transmembrane Proteins by a New Modification of the Sliding Window Method” Sun, 05 Nov 2017 00:00:00 +0000 Maria N. Simakova and Nikolai N. Simakov Copyright © 2017 Maria N. Simakova and Nikolai N. Simakov. All rights reserved. Erratum to “Automatic Segmentation of Ultrasound Tomography Image” Sun, 05 Nov 2017 00:00:00 +0000 Shibin Wu, Shaode Yu, Ling Zhuang, Xinhua Wei, Mark Sak, Neb Duric, Jiani Hu, and Yaoqin Xie Copyright © 2017 Shibin Wu et al. All rights reserved. Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer Sun, 29 Oct 2017 00:00:00 +0000 Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. 
This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across dispersed rare disease patient registries. The interconnection between those registries using Semantic Web technologies benefits our final solution in a way that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries. Pedro Sernadela, Lorena González-Castro, Claudio Carta, Eelke van der Horst, Pedro Lopes, Rajaram Kaliyaperumal, Mark Thompson, Rachel Thompson, Núria Queralt-Rosinach, Estrella Lopez, Libby Wood, Agata Robertson, Claudia Lamanna, Mette Gilling, Michael Orth, Roxana Merino-Martinez, Manuel Posada, Domenica Taruscio, Hanns Lochmüller, Peter Robinson, Marco Roos, and José Luís Oliveira Copyright © 2017 Pedro Sernadela et al. All rights reserved. Prediction and Validation of Hub Genes Associated with Colorectal Cancer by Integrating PPI Network and Gene Expression Data Wed, 25 Oct 2017 00:00:00 +0000 Although hundreds of colorectal cancer- (CRC-) related genes have been screened, the significant hub genes still need to be further identified. The aim of this study was to identify the hub genes based on a protein-protein interaction (PPI) network and uncover their clinical value. Firstly, data from 645 CRC patients were downloaded from The Cancer Genome Atlas (TCGA) and analyzed to screen the differentially expressed genes (DEGs). Then, the Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis was performed, and a PPI network of the DEGs was constructed with Cytoscape software. 
Finally, four hub genes (CXCL3, ELF5, TIMP1, and PHLPP2) were obtained from four subnets and further validated in our clinical setting and TCGA dataset. The results showed that mRNA expression of CXCL3, ELF5, and TIMP1 was increased in CRC tissues, whereas PHLPP2 mRNA expression was decreased. More importantly, high expression of CXCL3, ELF5, and TIMP1 was significantly associated with lymphatic invasion, distant metastasis, and advanced tumor stage. In addition, a shorter overall survival was observed in patients with increased CXCL3, TIMP1, and ELF5 expression and decreased PHLPP2 expression. In conclusion, the four hub genes screened by our strategy could serve as novel biomarkers for prognosis prediction of CRC patients. Yongfu Xiong, Wenxian You, Rong Wang, Linglong Peng, and Zhongxue Fu Copyright © 2017 Yongfu Xiong et al. All rights reserved. Sulfonanilide Derivatives in Identifying Novel Aromatase Inhibitors by Applying Docking, Virtual Screening, and MD Simulations Studies Tue, 17 Oct 2017 00:00:00 +0000 Breast cancer is one of the leading causes of death in women across the world. Of late, the most successful treatments have relied on aromatase inhibitors (AIs). In the current study, a two-way approach for the identification of novel leads has been adopted. Eighty-one chemical compounds were assessed for their potential against aromatase, along with four known drugs. Docking was performed employing the CDOCKER protocol available in Discovery Studio (DS v4.5). Exemestane displayed the highest dock score among the known drugs and was taken as the reference. Out of the 81 ligands, 14 exhibited higher dock scores than the reference. In the second approach, these 14 compounds were utilized for the generation of the pharmacophore. 
The validated four-featured pharmacophore was then allowed to screen Chembridge database and the potential Hits were obtained after subjecting them to Lipinski’s rule of five and the ADMET properties. Subsequently, the acquired 3,050 Hits were escalated to molecular docking utilizing GOLD v5.0. Finally, the obtained Hits were consequently represented to be ideal lead candidates that were escalated to the MD simulations and binding free energy calculations. Additionally, the gene-disease association was performed to delineate the associated disease caused by CYP19A1. Shailima Rampogu, Minky Son, Chanin Park, Hyong-Ha Kim, Jung-Keun Suh, and Keun Woo Lee Copyright © 2017 Shailima Rampogu et al. All rights reserved. Dissection of Factors Affecting the Variability of the Peptide Bond Geometry and Planarity Sun, 15 Oct 2017 07:05:52 +0000 Proteins frequently assume complex three-dimensional structures characterized by marginal thermodynamic stabilities. In this scenario, deciphering the folding code of these molecular giants with clay feet is a cumbersome task. Studies performed in last years have shown that the interplay between backbone geometry and local conformation has an important impact on protein structures. Although the variability of several geometrical parameters of protein backbone has been established, the role of the structural context in determining these effects has been hitherto limited to the valence bond angle τ (NCαC). We here investigated the impact of different factors on the observed variability of backbone geometry and peptide bond planarity. These analyses corroborate the notion that the local conformation expressed in terms of dihedrals plays a predominant role in dictating the variability of these parameters. The impact of secondary structure is limited to bond angles which involve atoms that are usually engaged in H-bonds and, therefore, more susceptible to the structural context. 
Present data also show that the nature of the side chain has a significant impact on angles such as NCαCβ and CβCαC. In conclusion, our analyses strongly support the use of variability of protein backbone geometry in structure refinement, validation, and prediction. Nicole Balasco, Luciana Esposito, Amarinder Singh Thind, Mario Rosario Guarracino, and Luigi Vitagliano Copyright © 2017 Nicole Balasco et al. All rights reserved. A Novel Phosphorylation Site-Kinase Network-Based Method for the Accurate Prediction of Kinase-Substrate Relationships Thu, 12 Oct 2017 00:00:00 +0000 Protein phosphorylation is catalyzed by kinases which regulate many aspects that control death, movement, and cell growth. Identification of the phosphorylation site-specific kinase-substrate relationships (ssKSRs) is important for understanding cellular dynamics and provides a fundamental basis for further disease-related research and drug design. Although several computational methods have been developed, most of these methods mainly use local sequence of phosphorylation sites and protein-protein interactions (PPIs) to construct the prediction model. While phosphorylation presents very complicated processes and is usually involved in various biological mechanisms, the aforementioned information is not sufficient for accurate prediction. In this study, we propose a new and powerful computational approach named KSRPred for ssKSRs prediction, by introducing a novel phosphorylation site-kinase network (pSKN) profiles that can efficiently incorporate the relationships between various protein kinases and phosphorylation sites. The experimental results show that the pSKN profiles can efficiently improve the prediction performance in collaboration with local sequence and PPI information. Furthermore, we compare our method with the existing ssKSRs prediction tools and the results demonstrate that KSRPred can significantly improve the prediction performance compared with existing tools. 
Minghui Wang, Tao Wang, Binghua Wang, Yu Liu, and Ao Li Copyright © 2017 Minghui Wang et al. All rights reserved. Automatic Segmentation of Ultrasound Tomography Image Sun, 10 Sep 2017 00:00:00 +0000 Ultrasound tomography (UST) image segmentation is fundamental in breast density estimation, medicine response analysis, and anatomical change quantification. Existing methods are time consuming and require massive manual interaction. To address these issues, an automatic algorithm based on GrabCut (AUGC) is proposed in this paper. The presented method designs automated GrabCut initialization for incomplete labeling and is sped up with multicore parallel programming. To verify performance, AUGC is applied to segment thirty-two in vivo UST volumetric images. The performance of AUGC is validated with breast overlapping metrics (Dice coefficient (), Jaccard (), and False positive (FP)) and time cost (TC). Furthermore, AUGC is compared to other methods, including Confidence Connected Region Growing (CCRG), watershed, and Active Contour based Curve Delineation (ACCD). Experimental results indicate that AUGC achieves the highest accuracy ( and and ) and takes on average about 4 seconds to process a volumetric image. It was said that AUGC benefits large-scale studies by using UST images for breast cancer screening and pathological quantification. Shibin Wu, Shaode Yu, Ling Zhuang, Xinhua Wei, Mark Sak, Neb Duric, Jiani Hu, and Yaoqin Xie Copyright © 2017 Shibin Wu et al. All rights reserved. Application of the Subtractive Genomics and Molecular Docking Analysis for the Identification of Novel Putative Drug Targets against Salmonella enterica subsp. enterica serovar Poona Thu, 17 Aug 2017 00:00:00 +0000 The emergence of novel pathogenic strains with increased antibacterial resistance patterns poses a significant threat to the management of infectious diseases. 
In this study, we aimed at utilizing the subtractive genomic approach to identify novel drug targets against Salmonella enterica subsp. enterica serovar Poona strain ATCC BAA-1673. We employed in silico bioinformatics tools to subtract the strain-specific paralogous and host-specific homologous sequences from the bacterial proteome. The sorted proteome was further refined to identify the essential genes in the pathogenic bacterium using the database of essential genes (DEG). We carried out metabolic pathway and subcellular location analysis of the essential proteins of the pathogen to elucidate the involvement of these proteins in important cellular processes. We found 52 unique essential proteins in the target proteome that could be utilized as novel targets to design newer drugs. Further, we investigated these proteins in the DrugBank databases and 11 of the unique essential proteins showed druggability according to the FDA approved drug bank databases with diverse broad-spectrum property. Molecular docking analyses of the novel druggable targets with the drugs were carried out by AutoDock Vina option based on scoring functions. The results showed promising candidates for novel drugs against Salmonella infections. Tanvir Hossain, Mohammad Kamruzzaman, Talita Zahin Choudhury, Hamida Nooreen Mahmood, A. H. M. Nurun Nabi, and Md. Ismail Hosen Copyright © 2017 Tanvir Hossain et al. All rights reserved. Corrigendum to “Detecting Key Genes Regulated by miRNAs in Dysfunctional Crosstalk Pathway of Myasthenia Gravis” Mon, 14 Aug 2017 07:01:30 +0000 Yuze Cao, Jianjian Wang, Huixue Zhang, Qinghua Tian, Lixia Chen, Shangwei Ning, Peifang Liu, Xuesong Sun, Xiaoyu Lu, Chang Song, Shuai Zhang, Bo Xiao, and Lihua Wang Copyright © 2017 Yuze Cao et al. All rights reserved. 
Response to: Comment on “Detecting Key Genes Regulated by miRNAs in Dysfunctional Crosstalk Pathway of Myasthenia Gravis” Mon, 14 Aug 2017 00:00:00 +0000 Yuze Cao, Jianjian Wang, Huixue Zhang, Qinghua Tian, Lixia Chen, Shangwei Ning, Peifang Liu, Xuesong Sun, Xiaoyu Lu, Chang Song, Shuai Zhang, Bo Xiao, and Lihua Wang Copyright © 2017 Yuze Cao et al. All rights reserved. Robustification of Naïve Bayes Classifier and Its Application for Microarray Gene Expression Data Analysis Mon, 07 Aug 2017 00:00:00 +0000 The naïve Bayes classifier (NBC) is one of the most popular classifiers for class prediction or pattern recognition from microarray gene expression data (MGED). However, it is highly sensitive to outliers when the classical estimates of the location and scale parameters are used. This is one of the most important drawbacks of the classical NBC for gene expression data analysis. The gene expression dataset is often contaminated by outliers due to several steps involved in the data generating process from hybridization of DNA samples to image analysis. Therefore, in this paper, an attempt is made to robustify the Gaussian NBC by the minimum β-divergence method. The role of the minimum β-divergence method in this article is to produce robust estimators for the location and scale parameters based on the training dataset and to detect and modify outliers in the test dataset. The performance of the proposed method depends on the tuning parameter β. It reduces to the traditional naïve Bayes classifier when β → 0. We investigated the performance of the proposed beta naïve Bayes classifier (β-NBC) in a comparison with some popular existing classifiers (NBC, KNN, SVM, and AdaBoost) using both simulated and real gene expression datasets. We observed that the proposed method improved the performance over the others in the presence of outliers. Otherwise, it keeps almost equal performance. Md. Shakil Ahmed, Md. Shahjaman, Md. Masud Rana, and Md. Nurul Haque Mollah Copyright © 2017 Md. 
Shakil Ahmed et al. All rights reserved. Retina Image Vessel Segmentation Using a Hybrid CGLI Level Set Method Thu, 03 Aug 2017 08:31:55 +0000 As a nonintrusive method, the retina imaging provides us with a better way for the diagnosis of ophthalmologic diseases. Extracting the vessel profile automatically from the retina image is an important step in analyzing retina images. A novel hybrid active contour model is proposed to segment the fundus image automatically in this paper. It combines the signed pressure force function introduced by the Selective Binary and Gaussian Filtering Regularized Level Set (SBGFRLS) model with the local intensity property introduced by the Local Binary fitting (LBF) model to overcome the difficulty of the low contrast in segmentation process. It is more robust to the initial condition than the traditional methods and is easily implemented compared to the supervised vessel extraction methods. Proposed segmentation method was evaluated on two public datasets, DRIVE (Digital Retinal Images for Vessel Extraction) and STARE (Structured Analysis of the Retina) (the average accuracy of 0.9390 with 0.7358 sensitivity and 0.9680 specificity on DRIVE datasets and average accuracy of 0.9409 with 0.7449 sensitivity and 0.9690 specificity on STARE datasets). The experimental results show that our method is effective and our method is also robust to some kinds of pathology images compared with the traditional level set methods. Guannan Chen, Meizhu Chen, Jichun Li, and Encai Zhang Copyright © 2017 Guannan Chen et al. All rights reserved. Robust Significance Analysis of Microarrays by Minimum β-Divergence Method Thu, 27 Jul 2017 00:00:00 +0000 Identification of differentially expressed (DE) genes with two or more conditions is an important task for discovery of few biomarker genes. Significance Analysis of Microarrays (SAM) is a popular statistical approach for identification of DE genes for both small- and large-sample cases. 
However, it is sensitive to outlying gene expressions and produces low power in the presence of outliers. Therefore, in this paper, an attempt is made to robustify the SAM approach using the minimum β-divergence estimators instead of the maximum likelihood estimators of the parameters. We demonstrated the performance of the proposed method in comparison with some other popular statistical methods such as ANOVA, SAM, LIMMA, KW, EBarrays, GaGa, and BRIDGE using both simulated and real gene expression datasets. We observe that all methods show good and almost equal performance in the absence of outliers for the large-sample cases, while in the small-sample cases only three methods (SAM, LIMMA, and proposed) show almost equal and better performance than others with two or more conditions. However, in the presence of outliers, on average, only the proposed method performs better than others for both small- and large-sample cases with each condition. Md. Shahjaman, Nishith Kumar, Md. Manir Hossain Mollah, Md. Shakil Ahmed, Anjuman Ara Begum, S. M. Shahinul Islam, and Md. Nurul Haque Mollah Copyright © 2017 Md. Shahjaman et al. All rights reserved. Syntool: A Novel Region-Based Intolerance Score to Single Nucleotide Substitution for Synonymous Mutations Predictions Based on 123,136 Individuals Mon, 24 Jul 2017 00:00:00 +0000 Background. A synonymous mutation is a single nucleotide change that does not cause an amino acid change but can affect the rate and efficiency of translation. Recent work has revealed a substantial contribution of synonymous mutations to human disease risk and other complex traits. Nevertheless, prediction methods for synonymous mutations remain rare. Methods. Nonsynonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. Here we defined synonymous and missense variation as single nucleotide substitution variation. 
We then evaluated the intolerance of genic transcripts to single nucleotide substitution variation based on 123,136 individuals from gnomAD. After regressing all variations on common variations, we defined the residuals of the regression model as the intolerance score of each genomic region. Results. We constructed a total of 24,799 nonoverlapping region-based intolerance scores by their intolerance to single nucleotide substitution variation (Syntool). The results show that the Syntool score can discriminate synonymous disease causing mutations in the Human Gene Mutation Database (HGMD Professional) and the ClinVar database much better than others. Taken together, this study provides a novel prediction system for synonymous mutations, called Syntool, which could be helpful in identifying candidate synonymous disease causing mutations. Tongda Zhang, Yiran Wu, Zhangzhang Lan, Quan Shi, Ying Yang, and Jian Guo Copyright © 2017 Tongda Zhang et al. All rights reserved. Corrigendum to “Comparative Study of Exome Copy Number Variation Estimation Tools Using Array Comparative Genomic Hybridization as Control” Tue, 18 Jul 2017 07:11:01 +0000 Yan Guo, Quanhu Sheng, David C. Samuels, Brian Lehmann, Joshua A. Bauer, Jennifer Pietenpol, and Yu Shyr Copyright © 2017 Yan Guo et al. All rights reserved. Significantly Reduced Blood Pressure Measurement Variability for Both Normotensive and Hypertensive Subjects: Effect of Polynomial Curve Fitting of Oscillometric Pulses Wed, 12 Jul 2017 07:28:12 +0000 This study aimed to compare within-subject blood pressure (BP) variabilities from different measurement techniques. Cuff pressures from three repeated BP measurements were obtained from 30 normotensive and 30 hypertensive subjects. Automatic BPs were determined from the pulses with normalised peak amplitude larger than a threshold (0.5 for SBP, 0.7 for DBP, and 1.0 for MAP). 
They were also determined from cuff pressures associated with the above thresholds on a fitted polynomial curve of the oscillometric pulse peaks. Finally, the standard deviation (SD) of three repeats and its coefficient of variability (CV) were compared between the two automatic techniques. For the normotensive group, polynomial curve fitting significantly reduced SD of repeats from 3.6 to 2.5 mmHg for SBP and from 3.7 to 2.1 mmHg for MAP and reduced CV from 3.0% to 2.2% for SBP and from 4.3% to 2.4% for MAP (all ). For the hypertensive group, SD of repeats decreased from 6.5 to 5.5 mmHg for SBP and from 6.7 to 4.2 mmHg for MAP, and CV decreased from 4.2% to 3.6% for SBP and from 5.8% to 3.8% for MAP (all ). In conclusion, polynomial curve fitting of oscillometric pulses had the ability to reduce automatic BP measurement variability. Fangwei Yang, Fei Chen, Mingping Zhu, Aiqing Chen, and Dingchang Zheng Copyright © 2017 Fangwei Yang et al. All rights reserved. Integrating Genome-Wide Association and eQTLs Studies Identifies the Genes and Gene Sets Associated with Diabetes Wed, 28 Jun 2017 09:42:23 +0000 Aim. To identify novel candidate genes and gene sets for diabetes. Methods. We performed an integrative analysis of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTLs) data for diabetes. Summary data were derived from a large-scale GWAS of diabetes, involving a total of 58,070 individuals. The eQTL dataset included 923,021 cis-eQTLs for 14,329 genes and 4,732 trans-eQTLs for 2,612 genes. Integrative analysis of GWAS and eQTLs data was conducted by summary data-based Mendelian randomization (SMR). To identify the gene sets associated with diabetes, the SMR single gene analysis results were further subjected to gene set enrichment analysis (GSEA). A total of 13,311 annotated gene sets were analyzed in this study. Results. 
SMR analysis identified 6 genes significantly associated with fasting glucose, such as C11ORF10 (P value = 6.04 × 10⁻⁸), MRPL33 (P value = 1.24 × 10⁻⁷), and FADS1 (P value = 2.39 × 10⁻⁷). Gene set analysis identified HUANG_FOXA2_TARGETS_UP (false discovery rate = 0.047) associated with fasting glucose. Conclusion. Our study provides novel clues for clarifying the genetic mechanism of diabetes. This study also illustrated the good performance of the SMR approach and extended it to gene set association analysis for complex diseases. Xiao Liang, Awen He, Wenyu Wang, Li Liu, Yanan Du, Qianrui Fan, Ping Li, Yan Wen, Jingcan Hao, Xiong Guo, and Feng Zhang Copyright © 2017 Xiao Liang et al. All rights reserved. Protein Function Prediction Using Deep Restricted Boltzmann Machines Wed, 28 Jun 2017 09:07:42 +0000 Accurately annotating biological functions of proteins is one of the key tasks in the postgenome era. Many machine learning based methods have been applied to predict functional annotations of proteins, but this task has rarely been addressed with deep learning techniques. Deep learning techniques have recently been applied successfully to a wide range of problems, such as video, image, and natural language processing. Inspired by these successful applications, we investigate deep restricted Boltzmann machines (DRBM), a representative deep learning technique, to predict the missing functional annotations of partially annotated proteins. Experimental results on Homo sapiens, Saccharomyces cerevisiae, Mus musculus, and Drosophila show that DRBM achieves better performance than other related methods across different evaluation metrics, and it also runs faster than the compared methods. Xianchun Zou, Guijun Wang, and Guoxian Yu Copyright © 2017 Xianchun Zou et al. All rights reserved. 
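Illustrative note on the preceding entry: the abstract describes using restricted Boltzmann machines to score missing function labels for partially annotated proteins. The following is only a minimal sketch of that general idea, not the authors' DRBM. It assumes a single (not deep) binary RBM trained with one-step contrastive divergence (CD-1) on a toy annotation matrix; the class name `TinyRBM`, the learning rate, and the data are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function, with the input clamped to avoid overflow in math.exp."""
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Hypothetical single-layer binary RBM; every name here is illustrative."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # w[i][j] couples visible unit i (a function label) to hidden unit j.
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hidden[j]
                        + sum(v[i] * self.w[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        return [sigmoid(self.b_visible[i]
                        + sum(h[j] * self.w[i][j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) step on a single binary vector."""
        h0_p = self.hidden_probs(v0)        # positive phase
        h0 = self.sample(h0_p)
        v1_p = self.visible_probs(h0)       # reconstruction
        h1_p = self.hidden_probs(v1_p)      # negative phase
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.w[i][j] += lr * (v0[i] * h0_p[j] - v1_p[i] * h1_p[j])
            self.b_visible[i] += lr * (v0[i] - v1_p[i])
        for j in range(self.n_hidden):
            self.b_hidden[j] += lr * (h0_p[j] - h1_p[j])

    def reconstruct(self, v):
        """Deterministic pass that returns one score in [0, 1] per label."""
        return self.visible_probs(self.hidden_probs(v))


# Toy annotation matrix: rows are proteins, columns are function labels.
annotations = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

rbm = TinyRBM(n_visible=4, n_hidden=2, seed=0)
for _ in range(500):
    for row in annotations:
        rbm.cd1_update(row)

# A protein with label 0 known and the other labels unobserved.
scores = rbm.reconstruct([1.0, 0.0, 0.0, 0.0])
```

In this sketch, `scores` holds one value per function label, and unobserved labels with high scores would be candidate annotations. A deep variant would stack several such layers; how the paper does this is not stated in the abstract.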
The Serum Analysis of Dampness Syndrome in Patients with Coronary Heart Disease and Chronic Renal Failure Based on the Theory of “Same Syndromes in Different Diseases” Wed, 21 Jun 2017 00:00:00 +0000 Aim. To analyze the serum metabolites in patients with coronary heart disease (CHD) showing dampness syndrome and patients with chronic renal failure (CRF) showing dampness syndrome and to seek the substance that serves as the underlying basis of dampness syndrome in “same syndromes in different diseases.” Methods. Metabolic spectrum by GC-MS was performed using serum samples from 29 patients with CHD showing dampness syndrome and 32 patients with CRF showing dampness syndrome. The principal component analysis and statistical analysis of partial least squares were performed to detect the metabolites with different levels of expression in patients with CHD and CRF. Furthermore, by comparing the VIP value and data mining in METLIN and HMDB, we identified the common metabolites in both patient groups. Results. (1) Ten differential metabolites were found in patients with CHD showing dampness syndrome when compared to healthy subjects. Meanwhile, nine differential metabolites were found in patients with CRF showing dampness syndrome when compared to healthy subjects. (2) There were 9 differential metabolites identified when the serum metabolites of the CHD patients with dampness syndrome were compared to those of CRF patients with dampness syndrome. There were 4 common metabolites found in the serums of both patient groups. Yiming Hao, Xue Yuan, Peng Qian, Guanfeng Bai, and Yiqin Wang Copyright © 2017 Yiming Hao et al. All rights reserved. ExCNVSS: A Noise-Robust Method for Copy Number Variation Detection in Whole Exome Sequencing Data Sun, 18 Jun 2017 09:15:22 +0000 Copy number variations (CNVs) are structural variants associated with human diseases. 
Recent studies verified that disease-related genes are based on the extraction of rare de novo and transmitted CNVs from exome sequencing data. The need for more efficient and accurate methods has increased, which still remains a challenging problem due to coverage biases, as well as the sparse, small-sized, and noncontinuous nature of exome sequencing. In this study, we developed a new CNV detection method, ExCNVSS, based on read coverage depth evaluation and scale-space filtering to resolve these problems. We also developed the method ExCNVSSnoRatio, which is a version of ExCNVSS, for applying to cases with an input of test data only without the need to consider the availability of a matched control. To evaluate the performance of our method, we tested it with 11 different simulated data sets and 10 real HapMap samples’ data. The results demonstrated that ExCNVSS outperformed three other state-of-the-art methods and that our method corrected for coverage biases and detected all-sized CNVs even without matched control data. Jinhwa Kong, Jaemoon Shin, Jungim Won, Keonbae Lee, Unjoo Lee, and Jeehee Yoon Copyright © 2017 Jinhwa Kong et al. All rights reserved. Drug Target Protein-Protein Interaction Networks: A Systematic Perspective Sun, 11 Jun 2017 00:00:00 +0000 The identification and validation of drug targets are crucial in biomedical research and many studies have been conducted on analyzing drug target features for getting a better understanding on principles of their mechanisms. But most of them are based on either strong biological hypotheses or the chemical and physical properties of those targets separately. In this paper, we investigated three main ways to understand the functional biomolecules based on the topological features of drug targets. There are no significant differences between targets and common proteins in the protein-protein interactions network, indicating the drug targets are neither hub proteins which are dominant nor the bridge proteins. 
According to some special topological structures of the drug targets, there are significant differences between known targets and other proteins. Furthermore, the drug targets mainly belong to three typical communities based on their modularity. These topological features are helpful to understand how the drug targets work in the PPI network. Particularly, it is an alternative way to predict potential targets or extract nontargets to test a new drug target efficiently and economically. By this way, a drug target’s homologue set containing 102 potential target proteins is predicted in the paper. Yanghe Feng, Qi Wang, and Tengjiao Wang Copyright © 2017 Yanghe Feng et al. All rights reserved. Detecting the Candidate Gender Determinants by Bioinformatic Prediction of miRNAs and Their Targets from Transcriptome Sequences of the Male and Female Flowers in Salix suchowensis Tue, 30 May 2017 00:00:00 +0000 MicroRNAs (miRNAs) belong to a class of small, noncoding, and endogenous single-stranded RNAs that negatively regulate gene expression at the posttranscriptional level. Potential miRNAs can be identified based on sequence homology since miRNAs are highly conserved in plants. In this study, we aligned the expressed sequence tags derived from flower buds of male and female S. suchowensis to miRNAs in the miRBase, which enable us to identify 34 potential miRNAs from flower buds of the alternate sexes. Among them, 11 were from the female and 23 were from the male. Analyzing sequence complementarity led to identification of 124 and 55 miRNA targets in the male and female flower buds, respectively. By mapping the target genes of the predicted miRNAs to the sequence assemblies of S. suchowensis, a miR156 mediated gene was detected at the gender locus of willow, which was a transcription factor involved in flower development. 
It is noteworthy that this target is not expressed in male flower, while it is expressed fairly highly in female flower based on the transcriptome data derived from the alternate sexes of willows. This study provides new bioinformatic clue for further exploring the genetic mechanism underlying gender determination in willows. Suyun Wei, Ning Ye, and Tongming Yin Copyright © 2017 Suyun Wei et al. All rights reserved. A Novel ECG Eigenvalue Detection Algorithm Based on Wavelet Transform Wed, 17 May 2017 00:00:00 +0000 This study investigated an electrocardiogram (ECG) eigenvalue automatic analysis and detection method; ECG eigenvalues were used to reverse the myocardial action potential in order to achieve automatic detection and diagnosis of heart disease. Firstly, the frequency component of the feature signal was extracted based on the wavelet transform, which could be used to locate the signal feature after the energy integral processing. Secondly, this study established a simultaneous equations model of action potentials of the myocardial membrane, using ECG eigenvalues for regression fitting, in order to accurately obtain the eigenvalue vector of myocardial membrane potential. The experimental results show that the accuracy of ECG eigenvalue recognition is more than 99.27%, and the accuracy rate of detection of heart disease such as myocardial ischemia and heart failure is more than 86.7%. Ziran Peng and Guojun Wang Copyright © 2017 Ziran Peng and Guojun Wang. All rights reserved. A Review on Recent Computational Methods for Predicting Noncoding RNAs Wed, 03 May 2017 00:00:00 +0000 Noncoding RNAs (ncRNAs) play important roles in various cellular activities and diseases. 
In this paper, we presented a comprehensive review on computational methods for ncRNA prediction, which are generally grouped into four categories: (1) homology-based methods, that is, comparative methods involving evolutionarily conserved RNA sequences and structures, (2) de novo methods using RNA sequence and structure features, (3) transcriptional sequencing and assembling based methods, that is, methods designed for single and paired-end reads generated from next-generation RNA sequencing, and (4) RNA family specific methods, for example, methods specific for microRNAs and long noncoding RNAs. In the end, we summarized the advantages and limitations of these methods and pointed out a few possible future directions for ncRNA prediction. In conclusion, many computational methods have been demonstrated to be effective in predicting ncRNAs for further experimental validation. They are critical in reducing the huge number of potential ncRNAs and pointing the community to high confidence candidates. In the future, highly efficient mapping technology and more intrinsic sequence features (e.g., motif and k-mer frequencies) and structure features (e.g., minimum free energy, conserved stem-loop, or graph structures) are suggested to be combined with the next- and third-generation sequencing platforms to improve ncRNA prediction. Yi Zhang, Haiyun Huang, Dahan Zhang, Jing Qiu, Jiasheng Yang, Kejing Wang, Lijuan Zhu, Jingjing Fan, and Jialiang Yang Copyright © 2017 Yi Zhang et al. All rights reserved. Identification of “BRAF-Positive” Cases Based on Whole-Slide Image Analysis Mon, 24 Apr 2017 00:00:00 +0000 A key requirement for precision medicine is the accurate identification of patients that would respond to a specific treatment or those that represent a high-risk group, and a plethora of molecular biomarkers have been proposed for this purpose during the last decade.
Their application in clinical settings, however, is not always straightforward due to relatively high costs of some tests, limited availability of the biological material and time, and procedural constraints. Hence, there is an increasing interest in constructing tissue-based surrogate biomarkers that could be applied with minimal overhead directly to histopathology images and which could be used for guiding the selection of eventual further molecular tests. In the context of colorectal cancer, we present a method for constructing a surrogate biomarker that is able to predict with high accuracy whether a sample belongs to the “BRAF-positive” group, a high-risk group comprising V600E BRAF mutants and BRAF-mutant-like tumors. Our model is trained to mimic the predictions of a 64-gene signature, the current definition of BRAF-positive group, thus effectively identifying histopathology image features that can be linked to a molecular score. Since the only required input is the routine histopathology image, the model can easily be integrated in the diagnostic workflow. Vlad Popovici, Aleš Křenek, and Eva Budinská Copyright © 2017 Vlad Popovici et al. All rights reserved. Mining of Microbial Genomes for the Novel Sources of Nitrilases Wed, 12 Apr 2017 00:00:00 +0000 Next-generation DNA sequencing (NGS) has made it feasible to sequence large number of microbial genomes and advancements in computational biology have opened enormous opportunities to mine genome sequence data for novel genes and enzymes or their sources. In the present communication in silico mining of microbial genomes has been carried out to find novel sources of nitrilases. The sequences selected were analyzed for homology and considered for designing motifs. The manually designed motifs based on amino acid sequences of nitrilases were used to screen 2000 microbial genomes (translated to proteomes). 
This resulted in identification of one hundred thirty-eight putative/hypothetical sequences which could potentially code for nitrilase activity. In vitro validation of nine predicted sources of nitrilases was done for nitrile/cyanide hydrolyzing activity. Out of nine predicted nitrilases, Gluconacetobacter diazotrophicus, Sphingopyxis alaskensis, Saccharomonospora viridis, and Shimwellia blattae were specific for aliphatic nitriles, whereas nitrilases from Geodermatophilus obscurus, Nocardiopsis dassonvillei, Runella slithyformis, and Streptomyces albus possessed activity for aromatic nitriles. Flavobacterium indicum was specific towards potassium cyanide (KCN) which revealed the presence of nitrilase homolog, that is, cyanide dihydratase with no activity for either aliphatic, aromatic, or aryl nitriles. The present study reports the novel sources of nitrilases and cyanide dihydratase which were not reported hitherto by in silico or in vitro studies. Nikhil Sharma, Neerja Thakur, Tilak Raj, Savitri, and Tek Chand Bhalla Copyright © 2017 Nikhil Sharma et al. All rights reserved. Screening of Tumor Suppressor Genes in Metastatic Colorectal Cancer Tue, 04 Apr 2017 00:00:00 +0000 Most tumor suppressor genes are commonly inactivated in the development of colorectal cancer (CRC). The activation of tumor suppressor genes may be beneficial to suppress the development and metastasis of CRC. This study analyzed genes expression and methylation levels in different stages of CRC. Genes with downregulated mRNA expression and upregulated methylation level in advanced CRC were screened as the potential tumor suppressor genes. After comparing the methylation level of screened genes, we found that MBD1 gene had downregulated mRNA expression and upregulated methylation levels in advanced CRC and continuously upregulated methylation level in the progression of CRC. 
Enrichment analysis revealed that genes whose expression was in accordance with the elevated expression of MBD1 were mainly located on chromosomes 17p13 and 17p12, and that 8 tumor suppressor genes were located on chromosome 17p13. Further enrichment analysis of transcription factor binding sites identified that the SP1 binding site had higher enrichment and could bind with MBD1. In conclusion, MBD1 may be a tumor suppressor gene in advanced CRC and may affect the development and metastasis of CRC by regulating 8 tumor suppressor genes through binding with SP1. Lu Qi and Yanqing Ding Copyright © 2017 Lu Qi and Yanqing Ding. All rights reserved. Screening for Key Pathways Associated with the Development of Osteoporosis by Bioinformatics Analysis Thu, 30 Mar 2017 07:37:22 +0000 Objectives. We aimed to find the key pathways associated with the development of osteoporosis. Methods. We downloaded expression profile data of GSE35959 and analyzed the differentially expressed genes (DEGs) in 3 comparison groups (old_op versus middle, old_op versus old, and old_op versus senescent). KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analyses were carried out. In addition, Venn diagram analysis and gene functional interaction (FI) network analysis were performed. Results. In total, 520 DEGs, 966 DEGs, and 709 DEGs were obtained in the old_op versus middle, old_op versus old, and old_op versus senescent groups, respectively. The lysosome pathway was significantly enriched among the intersection genes. The pathways enriched by subnetwork modules suggested that mitotic metaphase and anaphase and signaling by Rho GTPases in module 1 had more proteins from the module. Conclusions. The lysosome pathway, mitotic metaphase and anaphase, and signaling by Rho GTPases may be involved in the development of osteoporosis. Furthermore, Rho GTPases may regulate the balance of bone resorption and bone formation by controlling osteoclasts and osteoblasts.
These 3 pathways may be regarded as the treatment targets for osteoporosis. Yanqing Liu, Yueqiu Wang, Yanxia Zhang, Zhiyong Liu, Hongfei Xiang, Xianbo Peng, Bohua Chen, and Guyou Jia Copyright © 2017 Yanqing Liu et al. All rights reserved. Empirical Driven Automatic Detection of Lobulation Imaging Signs in Lung CT Wed, 29 Mar 2017 00:00:00 +0000 Computer-aided detection (CAD) of lobulation can help radiologists to diagnose/detect lung diseases easily and accurately. Compared to CAD of nodule and other lung lesions, CAD of lobulation remained an unexplored problem due to very complex and varying nature of lobulation. Thus, many state-of-the-art methods could not detect successfully. Hence, we revisited classical methods with the capability of extracting undulated characteristics and designed a sliding window based framework for lobulation detection in this paper. Under the designed framework, we investigated three categories of lobulation classification algorithms: template matching, feature based classifier, and bending energy. The resultant detection algorithms were evaluated through experiments on LISS database. The experimental results show that the algorithm based on combination of global context feature and BOF encoding has best overall performance, resulting in score of 0.1009. Furthermore, bending energy method is shown to be appropriate for reducing false positives. We performed bending energy method following the LIOP-LBP mixture feature, the average positive detection per image was reduced from 30 to 22, and score increased to 0.0643 from 0.0599. To the best of our knowledge this is the first kind of work for direct lobulation detection and first application of bending energy to any kind of lobulation work. Guanghui Han, Xiabi Liu, Nouman Q. Soomro, Jia Sun, Yanfeng Zhao, Xinming Zhao, and Chunwu Zhou Copyright © 2017 Guanghui Han et al. All rights reserved. 
Joint Covariate Detection on Expression Profiles for Selecting Prognostic miRNAs in Glioblastoma Mon, 20 Mar 2017 00:00:00 +0000 An important application of expression profiles is to stratify patients into high-risk and low-risk groups using limited but key covariates associated with survival outcomes. Prior to that, variables considered to be associated with survival outcomes are selected. A combination of single variables, each of which is significantly related to survival outcomes, is always regarded to be candidates for posterior patient stratification. Instead of individually significant variables, a combination that contains not only significant but also insignificant variables is supposed to be concentrated on. By means of bottom-up enumeration on each pair of variables, we propose a joint covariate detection strategy to select candidates that not only correspond to close association with survival outcomes but also help to make a clear stratification of patients. Experimental results on a publicly available dataset of glioblastoma multiforme indicate that the selected pair composed of an individually significant and an insignificant miRNA keeps a better performance than the combination of significant single variables. The selected miRNA pair is ultimately regarded to be associated with the prognosis of glioblastoma multiforme by further pathway analysis. Chengqi Sun and Xudong Zhao Copyright © 2017 Chengqi Sun and Xudong Zhao. All rights reserved. Bioinformatic Approaches for Fungal Omics Wed, 15 Mar 2017 07:44:21 +0000 Guohua Xiao, Xinyu Zhang, and Qiang Gao Copyright © 2017 Guohua Xiao et al. All rights reserved. Racial Differences in Esophageal Squamous Cell Carcinoma: Incidence and Molecular Features Tue, 14 Mar 2017 10:13:02 +0000 The incidence and histological type of esophageal cancer are highly variable depending on geographic location and race/ethnicity. 
Here we sought to determine whether racial differences exist in the molecular features of esophageal cancer. We first confirmed that the incidence rate of esophageal adenocarcinoma (EA) was higher in Whites than in Asians and Blacks, while the incidence of esophageal squamous cell carcinoma (ESCC) was highest in Asians. Then we compared the genome-wide somatic mutations, methylation, and gene expression to identify differential genes by race. The mutation frequencies of some genes in the same pathway showed opposite differences between Asian and White patients, but their functional effects on the pathway may be consistent. The global patterns of methylation and expression were similar, which reflected the common characteristics of ESCC tumors from different populations. A small number of genes had significant differences between Asians and Whites. More interestingly, the racial differences of COL11A1 were consistent across multiple molecular levels, with higher mutation frequency, higher methylation, and lower expression in White patients. This indicated that COL11A1 might play important roles in ESCC, especially in the White population. Additional studies are needed to further explore their functions in esophageal cancer. Shirui Chen, Kai Zhou, Liguang Yang, Guohui Ding, and Hong Li Copyright © 2017 Shirui Chen et al. All rights reserved. Analysis of Differentially Expressed Genes in Gastrocnemius Muscle between DGAT1 Transgenic Mice and Wild-Type Mice Mon, 13 Mar 2017 00:00:00 +0000 Adipose tissue is the major energy deposition site in mammals; it provides energy for the body and relieves external pressure on the internal organs. In animal production, fat deposition in muscle can affect the meat quality, especially the intramuscular fat (IMF) content. Diacylglycerol acyltransferase-1 (DGAT1) is the key enzyme that controls the synthesis of triacylglycerol in adipose tissue.
To better understand the regulatory mechanism of DGAT1 in intramuscular fat deposition, global gene expression profiling was performed by microarray in gastrocnemius muscle of DGAT1 transgenic mice and wild-type mice. A total of 281 differentially expressed transcripts were identified with at least a 1.5-fold change and a P value < 0.05; 169 transcripts were upregulated and 112 transcripts were downregulated. Ten genes (SREBF1, DUSP1, PLAGL1, FKBP5, ZBTB16, PPP1R3C, CDC14A, GLUL, PDK4, and UCP3) were selected to validate the reliability of the chip’s results by real-time PCR. The RT-PCR findings were consistent with the gene chip. Seventeen signal pathways were analyzed using the KEGG pathway database, and the pathways concentrated mainly on the G-protein coupled receptor protein signaling pathway, signal transduction, oxidation-reduction reaction, olfactory receptor activity, protein binding, and zinc ion binding. This study implied a functional role of DGAT1 in the synthesis of TAG, insulin resistance, and IMF deposition. Fei Ying, Hao Gu, Yuanzhu Xiong, and Bo Zuo Copyright © 2017 Fei Ying et al. All rights reserved. Electrostatic Switch Function in the Mechanism of Protein Kinase A I Activation: Results of the Molecular Dynamics Simulation Tue, 07 Mar 2017 00:00:00 +0000 We used molecular dynamics to find the average path of the A-domain conformational transition in protein kinase A Iα. We obtained thirteen productive trajectories and processed them sequentially using factor and cross-correlation analyses. The conformational transition is presented as a partly deterministic sequence of six events. Event B represents the transition of the phosphate binding cassette. The main participants of this event form the electrostatic switch cAMP(O6)–A202(N-H)–G199(C=O). Through this switch, cAMP transmits information about its binding to the hydrophobic switch L203–Y229 and thus triggers the conformational transition of the A-domain.
Events C and D consist in N3A-motif displacement towards phosphate binding cassette and B/C-helix rotation. Event E involves an increase in interaction energy between Y229 and β-subdomain. Taken together, events B, E, and D correspond to the hinge movement towards β-barrel. Transition of B/C-helix turn (a.a. 229–234) from α-form to π-form accounts for event F. Event G implies that π-helical turn is replaced by kink. Emerging in the resulting conformation, electrostatic interaction R241–E200 facilitates kink formation. The obtained data on the mechanism of cAMP-dependent activation of PKA Iα may contribute to new approaches to designing pharmaceuticals based on cAMP analogs. Olga N. Rogacheva, Boris F. Shchegolev, Elena A. Vershinina, Alexander A. Tokmakov, and Vasiliy E. Stefanov Copyright © 2017 Olga N. Rogacheva et al. All rights reserved. Computational Analysis of Specific MicroRNA Biomarkers for Noninvasive Early Cancer Detection Sun, 05 Mar 2017 00:00:00 +0000 Cancer is a complex disease residing in various tissues of human body, accompanied with many abnormalities and mutations in genomes, transcriptome, and epigenome. Early detection plays a crucial role in extending survival time of all major cancer types. Recent advances in microarray and sequencing techniques have given more support to identifying effective biomarkers for early detection of cancer. MicroRNAs (miRNAs) are more and more frequently used as candidates for biomarkers in cancer related studies due to their regulation of target gene expression. In this paper, the comparative analysis is used to discover miRNA expression patterns in cancer versus normal samples on early stage of eight prevalent cancer types. Our work focuses on the specific miRNAs biomarkers identification and function analysis. 
Several miRNA biomarkers identified in this paper match well with those reported in existing studies, and most of them could serve as potential candidate indicators for clinical early diagnosis applications. Tianci Song, Yanchun Liang, Zhongbo Cao, Wei Du, and Ying Li Copyright © 2017 Tianci Song et al. All rights reserved. Intelligent Informatics in Translational Medicine 2016 Tue, 28 Feb 2017 14:07:12 +0000 Hao-Teng Chang, Tatsuya Akutsu, Oliver Ray, Sorin Draghici, and Tun-Wen Pai Copyright © 2017 Hao-Teng Chang et al. All rights reserved. Analysis of the Bacterial Communities in Two Liquors of Soy Sauce Aroma as Revealed by High-Throughput Sequencing of the 16S rRNA V4 Hypervariable Region Tue, 28 Feb 2017 11:20:18 +0000 Chinese liquor is one of the world’s oldest distilled alcoholic beverages and an important commercial fermented product in China. The Chinese liquor fermentation process has three stages: making Daqu (the starter), stacking fermentation on the ground, and liquor fermentation in pits. We investigated the bacterial diversity of Maotai and Guotai Daqu and liquor fermentation using high-throughput sequencing of the V4 hypervariable region of the 16S rRNA gene. A total of 70,297 sequences were obtained from the Daqu samples and clustered into 17 phyla. The composition of the bacterial communities in the Daqu from these two soy sauce aroma-style Chinese liquors was the same, although some bacterial species changed in abundance. Between the Daqu and liquor fermentation samples, 12 bacterial phyla increased. The abundance of Lactobacillus and Pseudomonas increased in the liquor fermentation. This study used high-throughput sequencing to provide new insights into the bacterial composition of Chinese liquor Daqu and fermentation. Similarities in the distribution of bacteria in the Daqu of the soy sauce aroma-style Chinese liquors suggest that the observed bacterial abundance might be generally relevant to other liquors.
Jing Tang, Xiaoxin Tang, Ming Tang, Ximin Zhang, Xiaorong Xu, and Yin Yi Copyright © 2017 Jing Tang et al. All rights reserved. A Combined Random Forests and Active Contour Model Approach for Fully Automatic Segmentation of the Left Atrium in Volumetric MRI Sun, 19 Feb 2017 08:40:58 +0000 Segmentation of the left atrium (LA) from cardiac magnetic resonance imaging (MRI) datasets is of great importance for image guided atrial fibrillation ablation, LA fibrosis quantification, and cardiac biophysical modelling. However, automated LA segmentation from cardiac MRI is challenging due to limited image resolution, considerable variability in anatomical structures across subjects, and dynamic motion of the heart. In this work, we propose a combined random forests (RFs) and active contour model (ACM) approach for fully automatic segmentation of the LA from cardiac volumetric MRI. Specifically, we employ the RFs within an autocontext scheme to effectively integrate contextual and appearance information from multisource images together for LA shape inferring. The inferred shape is then incorporated into a volume-scalable ACM for further improving the segmentation accuracy. We validated the proposed method on the cardiac volumetric MRI datasets from the STACOM 2013 and HVSMR 2016 databases and showed that it outperforms other latest automated LA segmentation methods. Validation metrics, average Dice coefficient (DC) and average surface-to-surface distance (S2S), were computed as and  mm, versus those of 0.6222–0.878 and 1.34–8.72 mm, obtained by other methods, respectively. Chao Ma, Gongning Luo, and Kuanquan Wang Copyright © 2017 Chao Ma et al. All rights reserved. Integrating miRNA and mRNA Expression Profiling Uncovers miRNAs Underlying Fat Deposition in Sheep Wed, 15 Feb 2017 00:00:00 +0000 MicroRNAs (miRNAs) are endogenous, noncoding RNAs that regulate various biological processes including adipogenesis and fat metabolism. 
Here, we adopted a deep sequencing approach to determine the identity and abundance of miRNAs involved in fat deposition in adipose tissues from fat-tailed (Kazakhstan sheep, KS) and thin-tailed (Tibetan sheep, TS) sheep breeds. By comparing HiSeq data of these two breeds, 539 miRNAs were shared in both breeds, whereas 179 and 97 miRNAs were uniquely expressed in KS and TS, respectively. We also identified 35 miRNAs that are considered to be putative novel miRNAs. The integration of miRNA-mRNA analysis revealed that miRNA-associated targets were mainly involved in the gene ontology (GO) biological processes concerning cellular process and metabolic process, and miRNAs play critical roles in fat deposition through their ability to regulate fundamental pathways. These pathways included the MAPK signaling pathway, FoxO and Wnt signaling pathway, and focal adhesion. Taken together, our results define miRNA expression signatures that may contribute to fat deposition and lipid metabolism in sheep. Guangxian Zhou, Xiaolong Wang, Chao Yuan, Danju Kang, Xiaochun Xu, Jiping Zhou, Rongqing Geng, Yuxin Yang, Zhaoxia Yang, and Yulin Chen Copyright © 2017 Guangxian Zhou et al. All rights reserved. Two Efficient Techniques to Find Approximate Overlaps between Sequences Wed, 15 Feb 2017 00:00:00 +0000 The next-generation sequencing (NGS) technology outputs a huge number of sequences (reads) that require further processing. After applying prefiltering techniques in order to eliminate redundancy and to correct erroneous reads, an overlap-based assembler typically finds the longest exact suffix-prefix match between each ordered pair of the input reads. However, another trend has been evolving for the purpose of solving an approximate version of the overlap problem. The main benefit of this direction is the ability to skip time-consuming error-detecting techniques which are applied in the prefiltering stage. 
In this work, we present and compare two techniques to solve the approximate overlap problem. The first adapts a compact prefix tree to efficiently solve the approximate all-pairs suffix-prefix problem, while the other utilizes a well-known principle, namely, the pigeonhole principle, to identify a potential overlap match in order to ultimately solve the same problem. Our results show that our solution using the pigeonhole principle has lower space and time consumption than an FM-based solution, while our solution based on the prefix tree has the lowest space consumption among all three solutions. The number of mismatches (Hamming distance) is used to define the approximate matching between strings in our work. Maan Haj Rachid Copyright © 2017 Maan Haj Rachid. All rights reserved. Developing an App by Exploiting Web-Based Mobile Technology to Inspect Controlled Substances in Patient Care Units Tue, 14 Feb 2017 00:00:00 +0000 In this study, we selected iOS as the App operating system, Objective-C as the programming language, and Oracle as the database to develop an App to inspect controlled substances in patient care units. Using a web-enabled smartphone, pharmacist inspection can be performed on site and the inspection result can be directly recorded into HIS through the Internet, so human error in data transcription can be minimized and work efficiency and data processing can be improved. This system is not only fast and convenient compared to conventional paperwork but also provides data security and accuracy. In addition, there are several features to increase inspection quality: (1) accuracy of drug appearance, (2) a foolproof mechanism to avoid input errors or omissions, (3) automatic data conversion without human judgment, (4) online alarm of expiry date, and (5) instant inspection results that show unmet items. This study has successfully turned paper-based medication inspection into inspection using a web-based mobile device.
Ying-Hao Lu, Li-Yao Lee, Ying-Lan Chen, Hsing-I Cheng, Wen-Tsung Tsai, Chen-Chun Kuo, Chung-Yu Chen, and Yaw-Bin Huang Copyright © 2017 Ying-Hao Lu et al. All rights reserved. Comparison and Validation of Putative Pathogenicity-Related Genes Identified by T-DNA Insertional Mutagenesis and Microarray Expression Profiling in Magnaporthe oryzae Tue, 14 Feb 2017 00:00:00 +0000 High-throughput technologies of functional genomics such as T-DNA insertional mutagenesis and microarray expression profiling have been employed to identify genes related to pathogenicity in Magnaporthe oryzae. However, validation of the functions of individual genes identified by these high-throughput approaches is laborious. In this study, we compared two published lists of genes putatively related to pathogenicity in M. oryzae identified by T-DNA insertional mutagenesis (comprising 1024 genes) and microarray expression profiling (comprising 236 genes), respectively, and then validated the functions of some overlapped genes between the two lists by knocking them out using the method of target gene replacement. Surprisingly, only 13 genes were overlapped between the two lists, and none of the four genes selected from the overlapped genes exhibited visible phenotypic changes on vegetative growth, asexual reproduction, and infection ability in their knockout mutants. Our results suggest that both of the lists might contain large proportions of unrelated genes to pathogenicity and therefore comparing the two gene lists is hardly helpful for the identification of genes that are more likely to be involved in pathogenicity as we initially expected. Ying Wang, Ying Wáng, Qi Tan, Ying Nv Gao, Yan Li, and Da Peng Bao Copyright © 2017 Ying Wang et al. All rights reserved. 
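The list comparison described in the abstract above amounts to intersecting two sets of gene identifiers. The following is a minimal sketch of that step, not the authors' pipeline: the function name `overlap`, the placeholder identifiers, and the Jaccard index used to summarize agreement are all assumptions introduced here for illustration.

```python
def overlap(list_a, list_b):
    """Return the sorted identifiers present in both gene lists."""
    return sorted(set(list_a) & set(list_b))


# Hypothetical placeholder identifiers, not genes from the study.
tdna_hits = ["GENE_001", "GENE_002", "GENE_003", "GENE_004"]
microarray_hits = ["GENE_003", "GENE_004", "GENE_005"]

shared = overlap(tdna_hits, microarray_hits)

# Jaccard index (an assumed summary statistic): shared genes divided by
# all distinct genes appearing in either list.
jaccard = len(shared) / len(set(tdna_hits) | set(microarray_hits))

print(shared, round(jaccard, 2))
```

With the list sizes reported in the abstract (1024 and 236 genes, 13 shared), the same calculation would give a very small index, which is in line with the authors' conclusion that the two lists agree poorly.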
Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services Sun, 12 Feb 2017 00:00:00 +0000 With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on straightforward connections establishment and pay less attention to make computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with a high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective. Longxiang Shi, Shijian Li, Xiaoran Yang, Jiaheng Qi, Gang Pan, and Binbin Zhou Copyright © 2017 Longxiang Shi et al. All rights reserved. NFP: An R Package for Characterizing and Comparing of Annotated Biological Networks Thu, 09 Feb 2017 00:00:00 +0000 Large amounts of various biological networks exist for representing different types of interaction data, such as genetic, metabolic, gene regulatory, and protein-protein relationships. Recent approaches on biological network study are based on different mathematical concepts. It is necessary to construct a uniform framework to judge the functionality of biological networks. We recently introduced a knowledge-based computational framework that reliably characterized biological networks in system level. 
The method worked by making systematic comparisons to a set of well-studied “basic networks,” measuring both the functional and topological similarities. A biological network could be characterized as a spectrum-like vector consisting of similarities to basic networks. Here, to facilitate the application, development, and adoption of this framework, we present an R package called NFP. This package extends our previous pipeline, offering a powerful set of functions for Network Fingerprint analysis. The software shows great potential in biological network study. The open source NFP R package is freely available under the GNU General Public License v2.0 at CRAN along with the vignette. Yang Cao, Wenjian Xu, Chao Niu, Xiaochen Bo, and Fei Li Copyright © 2017 Yang Cao et al. All rights reserved. The Correlation-Base-Selection Algorithm for Diagnostic Schizophrenia Based on Blood-Based Gene Expression Signatures Thu, 09 Feb 2017 00:00:00 +0000 Microarray analysis of gene expression is often used to diagnose different types of disease. Many studies report remarkable achievements in nervous system disease. Clinical diagnosis of schizophrenia (SCZ) still depends on doctors’ experience, which is unreliable and needs to be more objective and quantified. To solve this problem, we collected whole blood gene expression data from four studies, including 152 individuals with schizophrenia (SCZ) and 138 normal controls in different regions. The correlation-based feature selection (CFS, one of the machine learning methods) algorithm was applied in this study, and 103 significantly differentially expressed genes between patients and controls, called “feature genes,” were selected; then, a model for SCZ diagnosis was built. The samples were subdivided into 10 groups, and cross-validation showed that the model we constructed achieved nearly 100% classification accuracy. Mathematical evaluation of the datasets before and after data processing proved the effectiveness of our algorithm. 
Feature genes were enriched in Parkinson’s disease, oxidative phosphorylation, and TGF-beta signaling pathways, which were previously reported to be associated with SCZ. These results suggest that the analysis of gene expression in whole blood by our model could be a useful tool for diagnosing SCZ. Hang Zhang, Ziyang Xie, Yuwen Yang, Yizhen Zhao, Bao Zhang, and Jing Fang Copyright © 2017 Hang Zhang et al. All rights reserved. Bioinformatics Study of Structural Patterns in Plant MicroRNA Precursors Thu, 09 Feb 2017 00:00:00 +0000 According to the RNA world theory, RNAs which stored genetic information and catalyzed chemical reactions had their contribution in the formation of current living organisms. In recent years, researchers studied this molecule diversity, i.a. focusing on small non-coding regulatory RNAs. Among them, of particular interest is evolutionarily ancient, 19–24 nt molecule of microRNA (miRNA). It has been already recognized as a regulator of gene expression in eukaryotes. In plants, miRNA plays a key role in the response to stress conditions and it participates in the process of growth and development. MicroRNAs originate from primary transcripts (pri-miRNA) encoded in the nuclear genome. They are processed from single-stranded stem-loop RNA precursors containing hairpin structures. While the mechanism of mature miRNA production in animals is better understood, its biogenesis in plants remains less clear. Herein, we present the results of bioinformatics analysis aimed at discovering how plant microRNAs are recognized within their precursors (pre-miRNAs). The study has been focused on sequential and structural motif identification in the neighbourhood of microRNA. J. Miskiewicz, K. Tomczyk, A. Mickiewicz, J. Sarzynska, and M. Szachniuk Copyright © 2017 J. Miskiewicz et al. All rights reserved. 
Visual Display of 5p-arm and 3p-arm miRNA Expression with a Mobile Application Wed, 08 Feb 2017 00:00:00 +0000 MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we have demonstrated that both 5p-arm and 3p-arm of mature miRNAs could be expressed from the same precursor and we further interrogated the 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize the differential 5p-arm and 3p-arm miRNA expression patterns, we utilized a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We have collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and further processed them with the 5p-arm and 3p-arm annotation analysis pipeline. In order to be displayed with the RNA-Seq Viewer App, annotated 5p-arm and 3p-arm miRNA expression information and miRNA gene loci information were converted into SQLite tables. In this distinct application, for any given miRNA gene, 5p-arm miRNA is illustrated on the top of the chromosome ideogram and 3p-arm miRNA is illustrated on the bottom of the chromosome ideogram. Users can then easily interrogate the differentially expressed 5p-arm/3p-arm miRNAs with their mobile devices. This study demonstrates the feasibility and utility of RNA-Seq Viewer App in addition to mRNA-Seq data visualization. Chao-Yu Pan, Wei-Ting Kuo, Chien-Yuan Chiu, and Wen-chang Lin Copyright © 2017 Chao-Yu Pan et al. All rights reserved. Big Data and Network Biology 2016 Thu, 26 Jan 2017 08:52:04 +0000 Shigehiko Kanaya, Md. Altaf-Ul-Amin, Samuel K. Kiboi, and Farit Mochamad Afendi Copyright © 2017 Shigehiko Kanaya et al. All rights reserved. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach Sun, 22 Jan 2017 09:11:26 +0000 Gene regulation is a series of processes that control gene expression and its extent.
The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool. Yasser Abduallah, Turki Turki, Kevin Byron, Zongxuan Du, Miguel Cervantes-Cervantes, and Jason T. L. Wang Copyright © 2017 Yasser Abduallah et al. All rights reserved. Pharmacoinformatics, Adaptive Evolution, and Elucidation of Six Novel Compounds for Schizophrenia Treatment by Targeting DAOA (G72) Isoforms Thu, 19 Jan 2017 12:09:16 +0000 Studies on Schizophrenia so far reveal a complex picture of neurological malfunctioning reported to be strongly associated with DAOA. Detailed sequence analyses proved DAOA as a primate specific gene having conserved gene desert region on both upstream and downstream region. 
The analyses of the 10 MB chromosomal region surrounding DAOA in primates, birds, rodents, and reptiles showed that the region is conserved in primates and in the rest of the species, while DAOA itself is present only in primates. DAOA has four isoforms and one interaction partner, DAO. Protein-protein analyses of the four DAOA isoforms with DAO were performed individually to find potential interacting residues computationally. Molecular docking of FDA-approved drugs gave efficient results, but no single drug bound effectively to all DAOA isoforms. A library of compounds was constructed by virtual screening using a 2D similarity search against recommended SZ drugs in conjunction with their physiochemical properties. Molecular docking resulted in six novel compounds exhibiting maximum binding affinity with the four selected DAOA isoforms. Not the entire schizophrenic population responds to a single drug; interestingly, the six novel compounds observed in this study gave promising results against all four DAOA isoforms and share the binding site that DAOA may use to interact with DAO. Sheikh Arslan Sehgal Copyright © 2017 Sheikh Arslan Sehgal. All rights reserved. Molecular Cloning, Bioinformatic Analysis, and Expression of Bombyx mori Lebocin 5 Gene Related to Beauveria bassiana Infection Tue, 17 Jan 2017 06:24:05 +0000 A full-length cDNA of lebocin 5 (BmLeb5) was first cloned from silkworm, Bombyx mori, by rapid amplification of cDNA ends. The BmLeb5 gene is 808 bp in length and the open reading frame encodes a 179-amino acid hydroxyproline-rich peptide. Bioinformatic analysis results showed that BmLeb5 has an O-glycosylation site and four RXXR motifs, like other lebocins. Sequence similarity and phylogenic analysis results indicated that lebocins form a multiple gene family in silkworm, as cecropins do. Quantitative real-time PCR analysis revealed that BmLeb5 expression was highest in the fat body.
In the silkworm larvae infected by Beauveria bassiana, the expression level of BmLeb5 was upregulated in the fat body and hemolymph which are the most important immune tissues in silkworm. The recombinant protein of BmLeb5 was for the first time successfully expressed with prokaryotic expression system and purified. There are no reports so far that the expression of lebocins could be induced by entomopathogenic fungus. Our study suggested that BmLeb5 might play an important role in the immune response of silkworm to defend B. bassiana infection. The results also provided helpful information for further studying the lebocin family functioned in antifungal immune response in the silkworm. Dingding Lü, Chengxiang Hou, Guangxing Qin, Kun Gao, Tian Chen, and Xijie Guo Copyright © 2017 Dingding Lü et al. All rights reserved. Construction of Multilevel Structure for Avian Influenza Virus System Based on Granular Computing Mon, 16 Jan 2017 14:14:36 +0000 Exploring the genetic structure of influenza viruses attracts the attention in the field of molecular ecology and medical genetics, whose epidemics cause morbidity and mortality worldwide. The rapid variations in RNA strand and changes of protein structure of the virus result in low-accuracy subtyping identification and make it difficult to develop effective drugs and vaccine. This paper constructs the evolutionary structure of avian influenza virus system considering both hemagglutinin and neuraminidase protein fragments. An optimization model was established to determine the rational granularity of the virus system for exploring the intrinsic relationship among the subtypes based on the fuzzy hierarchical evaluation index. Thus, an algorithm was presented to extract the rational structure. Furthermore, to reduce the systematic and computational complexity, the granular signatures of virus system were identified based on the coarse-grained idea and then its performance was evaluated through a designed classifier. 
The results showed that the obtained virus signatures could approximate and reflect the whole avian influenza virus system, indicating that the proposed method could identify the effective virus signatures. Once a new molecular virus is detected, it is efficient to identify the homologous virus hierarchically. Yang Li, Qi-Hao Liang, Meng-Meng Sun, Xu-Qing Tang, and Ping Zhu Copyright © 2017 Yang Li et al. All rights reserved. Epidermal Growth Factor Signaling towards Proliferation: Modeling and Logic Inference Using Forward and Backward Search Mon, 16 Jan 2017 00:00:00 +0000 In biological systems, pathways define complex interaction networks where multiple molecular elements are involved in a series of controlled reactions producing responses to specific biomolecular signals. These biosystems are dynamic and there is a need for mathematical and computational methods able to analyze the symbolic elements and the interactions between them and produce adequate readouts of such systems. In this work, we use rewriting logic to analyze the cellular signaling of epidermal growth factor (EGF) and its cell surface receptor (EGFR) in order to induce cellular proliferation. Signaling is initiated by binding the ligand protein EGF to the membrane-bound receptor EGFR so as to trigger a reactions path which have several linked elements through the cell from the membrane till the nucleus. We present two different types of search for analyzing the EGF/proliferation system with the help of Pathway Logic tool, which provides a knowledge-based development environment to carry out the modeling of the signaling. The first one is a standard (forward) search. The second one is a novel approach based on narrowing, which allows us to trace backwards the causes of a given final state. The analysis allows the identification of critical elements that have to be activated to provoke proliferation. 
Adrián Riesco, Beatriz Santos-Buitrago, Javier De Las Rivas, Merrill Knapp, Gustavo Santos-García, and Carolyn Talcott Copyright © 2017 Adrián Riesco et al. All rights reserved. Database of Periodic DNA Regions in Major Genomes Sun, 15 Jan 2017 12:40:05 +0000 Summary. We searched several prokaryotic and eukaryotic genomes for periodic sequences using a new mathematical method. The method uses random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, which adds novelty to the results we obtained. The periodicity length, one of the key periodicity features, varied from 2 to 50 nt. In total, over 60,000 periodic sequences were found in 15 genomes including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes. Felix E. Frenkel, Maria A. Korotkova, and Eugene V. Korotkov Copyright © 2017 Felix E. Frenkel et al. All rights reserved. Novel Approach to Classify Plants Based on Metabolite-Content Similarity Mon, 09 Jan 2017 06:42:15 +0000 Secondary metabolites are bioactive substances with diverse chemical structures. Depending on the ecological environment within which they are living, higher plants use different combinations of secondary metabolites for adaptation (e.g., defense against attacks by herbivores or pathogenic microbes). This suggests that the similarity in metabolite content can be used to assess phylogenic similarity of higher plants. However, such a chemical taxonomic approach is limited by incomplete metabolomics data. We propose an approach for successfully classifying 216 plants based on their known incomplete metabolite content. Structurally similar metabolites have been clustered using the network clustering algorithm DPClus.
Plants have been represented as binary vectors, implying relations with structurally similar metabolite groups, and classified using Ward’s method of hierarchical clustering. Despite incomplete data, the resulting plant clusters are consistent with the known evolutional relations of plants. This finding reveals the significance of metabolite content as a taxonomic marker. We also discuss the predictive power of metabolite content in exploring nutritional and medicinal properties in plants. As a byproduct of our analysis, we could predict some currently unknown species-metabolite relations. Kang Liu, Azian Azamimi Abdullah, Ming Huang, Takaaki Nishioka, Md. Altaf-Ul-Amin, and Shigehiko Kanaya Copyright © 2017 Kang Liu et al. All rights reserved. The Double Layer Methodology and the Validation of Eigenbehavior Techniques Applied to Lifestyle Modeling Wed, 04 Jan 2017 00:00:00 +0000 A novel methodology, the double layer methodology (DLM), for modeling an individual’s lifestyle and its relationships with health indicators is presented. The DLM is applied to model behavioral routines emerging from self-reports of daily diet and activities, annotated by 21 healthy subjects over 2 weeks. Unsupervised clustering on the first layer of the DLM separated our population into two groups. Using eigendecomposition techniques on the second layer of the DLM, we could find activity and diet routines, predict behaviors in a portion of the day (with an accuracy of 88% for diet and 66% for activity), determine between day and between individual similarities, and detect individual’s belonging to a group based on behavior (with an accuracy up to 64%). We found that clustering based on health indicators was mapped back into activity behaviors, but not into diet behaviors. In addition, we showed the limitations of eigendecomposition for lifestyle applications, in particular when applied to noisy and sparse behavioral data such as dietary information. 
Finally, we proposed the use of the DLM for supporting adaptive and personalized recommender systems for stimulating behavior change. Giuseppina Schiavone, Bishal Lamichhane, and Chris Van Hoof Copyright © 2017 Giuseppina Schiavone et al. All rights reserved. RNA Sequencing Reveals Xyr1 as a Transcription Factor Regulating Gene Expression beyond Carbohydrate Metabolism Tue, 27 Dec 2016 12:46:04 +0000 Xyr1 has been demonstrated to be the main transcription activator of (hemi)cellulases in the well-known cellulase producer Trichoderma reesei. This study comprehensively investigates the genes regulated by Xyr1 through RNA sequencing to produce the transcription profiles of T. reesei Rut-C30 and its xyr1 deletion mutant (Δxyr1), cultured on lignocellulose or glucose. xyr1 deletion resulted in 467 differentially expressed genes on inducing medium. Almost all functional genes involved in (hemi)cellulose degradation and many transporters belonging to the sugar porter family in the major facilitator superfamily (MFS) were downregulated in Δxyr1. By contrast, all differentially expressed protease, lipase, chitinase, some ATP-binding cassette transporters, and heat shock protein-encoding genes were upregulated in Δxyr1. When cultured on glucose, a total of 281 genes were expressed differentially in Δxyr1, most of which were involved in energy, solute transport, lipid, amino acid, and monosaccharide as well as secondary metabolism. Electrophoretic mobility shift assays confirmed that the intracellular β-glucosidase bgl2, the putative nonenzymatic cellulose-attacking gene cip1, the MFS lactose transporter lp, the nmrA-like gene, endo T, the acid protease pepA, and the small heat shock protein hsp23 were probable Xyr1-targets. These results might help elucidate the regulation system for synthesis and secretion of (hemi)cellulases in T. reesei Rut-C30. Liang Ma, Ling Chen, Lei Zhang, Gen Zou, Rui Liu, Yanping Jiang, and Zhihua Zhou Copyright © 2016 Liang Ma et al. 
All rights reserved. META2: Intercellular DNA Methylation Pairwise Annotation and Integrative Analysis Tue, 27 Dec 2016 06:57:50 +0000 Genome-wide deciphering intercellular differential DNA methylation as well as its roles in transcriptional regulation remains elusive in cancer epigenetics. Here we developed a toolkit META2 for DNA methylation annotation and analysis, which aims to perform integrative analysis on differentially methylated loci and regions through deep mining and statistical comparison methods. META2 contains multiple versatile functions for investigating and annotating DNA methylation profiles. Benchmarked with T-47D cell, we interrogated the association within differentially methylated CpG (DMC) and region (DMR) candidate count and region length and identified major transition zones as clues for inferring statistically significant DMRs; together we validated those DMRs with the functional annotation. Thus META2 can provide a comprehensive analysis approach for epigenetic research and clinical study. Binhua Tang Copyright © 2016 Binhua Tang. All rights reserved. A Systematic Framework for Drug Repositioning from Integrated Omics and Drug Phenotype Profiles Using Pathway-Drug Network Mon, 26 Dec 2016 13:15:10 +0000 Drug repositioning offers new clinical indications for old drugs. Recently, many computational approaches have been developed to repurpose marketed drugs in human diseases by mining various of biological data including disease expression profiles, pathways, drug phenotype expression profiles, and chemical structure data. However, despite encouraging results, a comprehensive and efficient computational drug repositioning approach is needed that includes the high-level integration of available resources. In this study, we propose a systematic framework employing experimental genomic knowledge and pharmaceutical knowledge to reposition drugs for a specific disease. 
Specifically, we first obtain experimental genomic knowledge from disease gene expression profiles and pharmaceutical knowledge from drug phenotype expression profiles and construct a pathway-drug network representing a priori known associations between drugs and pathways. To discover promising candidates for drug repositioning, we initialize node labels for the pathway-drug network using identified disease pathways and known drugs associated with the phenotype of interest and perform network propagation in a semisupervised manner. To evaluate our method, we conducted experiments to reposition 1309 drugs based on four different breast cancer datasets and verified the results of promising candidate drugs for breast cancer by a two-step validation procedure. Our experimental results showed that the proposed framework is a useful approach for discovering promising candidates for breast cancer treatment. Erkhembayar Jadamba and Miyoung Shin Copyright © 2016 Erkhembayar Jadamba and Miyoung Shin. All rights reserved. Identification of Potential Key lncRNAs and Genes Associated with Aging Based on Microarray Data of Adipocytes from Mice Wed, 21 Dec 2016 07:16:03 +0000 Objective. This study aimed to screen potential crucial lncRNAs and genes involved in aging. Methods. The data of 9 peripheral white adipocyte samples taken from male C57BL/6J mice (6 months, 14 months, and 18 months of age) in GSE25905 were used in this study. Time series differentially expressed lncRNA genes (DE-lncRNAs) and mRNA genes (DEGs) were identified. After cluster analysis of lncRNA expression patterns, target genes of DE-lncRNAs were predicted from the DEGs, and functional analysis for target genes was conducted. Results. A total of 8301 time series-related DEGs and 43 time series-related DE-lncRNAs were identified. Among them, 41 DE-lncRNAs targeted 1880 DEGs.
The DEGs positively regulated by DE-lncRNAs were mainly related to the development of blood vessel and the pathways of cholesterol biosynthesis and elastic fibre formation. Furthermore, the DEGs negatively regulated by DE-lncRNAs were correlated with protein metabolism. Conclusion. These DE-lncRNAs and DEGs are potentially involved in the process of aging. Yi Yang, Zongyan Teng, Songyan Meng, and Weigang Yu Copyright © 2016 Yi Yang et al. All rights reserved. Optimal Control Model of Tumor Treatment with Oncolytic Virus and MEK Inhibitor Wed, 21 Dec 2016 07:10:59 +0000 Tumors are a serious threat to human health. The oncolytic virus is a kind of tumor killer virus which can infect and lyse cancer cells and spread through the tumor, while leaving normal cells largely unharmed. Mathematical models can help us to understand the tumor-virus dynamics and find better treatment strategies. This paper gives a new mathematical model of tumor therapy with oncolytic virus and MEK inhibitor. Stable analysis was given. Because mitogen-activated protein kinase (MEK) can not only lead to greater oncolytic virus infection into cancer cells, but also limit the replication of the virus, in order to provide the best dosage of MEK inhibitors and balance the positive and negative effect of the inhibitors, we put forward an optimal control problem of the inhibitor. The optimal strategies are given by theory and simulation. Yongmei Su, Chen Jia, and Ying Chen Copyright © 2016 Yongmei Su et al. All rights reserved. Horizontally Transferred Genetic Elements in the Tsetse Fly Genome: An Alignment-Free Clustering Approach Using Batch Learning Self-Organising Map (BLSOM) Thu, 15 Dec 2016 16:12:14 +0000 Tsetse flies (Glossina spp.) are the primary vectors of trypanosomes, which can cause human and animal African trypanosomiasis in Sub-Saharan African countries. 
The objective of this study was to explore the genome of Glossina morsitans morsitans for evidence of horizontal gene transfer (HGT) from microorganisms. We employed an alignment-free clustering method, that is, batch learning self-organising map (BLSOM), in which sequence fragments are clustered based on the similarity of oligonucleotide frequencies independently of sequence homology. After an initial scan of HGT events using BLSOM, we identified 3.8% of the tsetse fly genome as HGT candidates. The predicted donors of these HGT candidates included known symbionts, such as Wolbachia, as well as bacteria that have not previously been associated with the tsetse fly. We detected HGT candidates from diverse bacteria such as Bacillus and Flavobacteria, suggesting a past association between these taxa. Functional annotation revealed that the HGT candidates encoded loci in various functional pathways, such as metabolic and antibiotic biosynthesis pathways. These findings provide a basis for understanding the coevolutionary history of the tsetse fly and its microbes and establish the effectiveness of BLSOM for the detection of HGT events. Ryo Nakao, Takashi Abe, Shunsuke Funayama, and Chihiro Sugimoto Copyright © 2016 Ryo Nakao et al. All rights reserved. Multichannel Convolutional Neural Network for Biological Relation Extraction Wed, 07 Dec 2016 13:58:14 +0000 The plethora of biomedical relations which are embedded in medical logs (records) demands researchers’ attention. Previous theoretical and practical focuses were restricted on traditional machine learning techniques. However, these methods are susceptible to the issues of “vocabulary gap” and data sparseness and the unattainable automation process in feature extraction. To address aforementioned issues, in this work, we propose a multichannel convolutional neural network (MCCNN) for automated biomedical relation extraction. 
The proposed model has the following two contributions: it enables the fusion of multiple (e.g., five) versions of word embeddings; the need for manual feature engineering can be obviated by automated feature learning with convolutional neural network (CNN). We evaluated our model on two biomedical relation extraction tasks: drug-drug interaction (DDI) extraction and protein-protein interaction (PPI) extraction. For the DDI task, our system achieved an overall F-score of 70.2% compared to the standard linear SVM based system (e.g., 67.0%) on the DDIExtraction 2013 challenge dataset. For the PPI task, we evaluated our system on the AIMed and BioInfer PPI corpora; our system exceeded the state-of-the-art ensemble SVM system by 2.7% and 5.6% on F-scores. Chanqin Quan, Lei Hua, Xiao Sun, and Wenjun Bai Copyright © 2016 Chanqin Quan et al. All rights reserved. Workflow for Genome-Wide Determination of Pre-mRNA Splicing Efficiency from Yeast RNA-seq Data Tue, 06 Dec 2016 06:48:06 +0000 Pre-mRNA splicing represents an important regulatory layer of eukaryotic gene expression. In the simple budding yeast Saccharomyces cerevisiae, about one-third of all mRNA molecules undergo splicing, and splicing efficiency is tightly regulated, for example, during meiotic differentiation. S. cerevisiae features a streamlined, evolutionarily highly conserved splicing machinery and serves as a favourite model for studies of various aspects of splicing. RNA-seq represents a robust, versatile, and affordable technique for transcriptome interrogation, which can also be used to study splicing efficiency. However, convenient bioinformatics tools for the analysis of splicing efficiency from yeast RNA-seq data are lacking. We present a complete workflow for the calculation of genome-wide splicing efficiency in S. cerevisiae using strand-specific RNA-seq data. Our pipeline takes sequencing reads in the FASTQ format and provides splicing efficiency values for the 5′ and 3′ splice junctions of each intron.
The pipeline is based on up-to-date open-source software tools and requires very limited input from the user. We provide all relevant scripts in a ready-to-use form. We demonstrate the functionality of the workflow using RNA-seq datasets from three spliceosome mutants. The workflow should prove useful for studies of yeast splicing mutants or of regulated splicing, for example, under specific growth conditions. Martin Převorovský, Martina Hálová, Kateřina Abrhámová, Jiří Libus, and Petr Folk Copyright © 2016 Martin Převorovský et al. All rights reserved. Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network Wed, 30 Nov 2016 11:55:04 +0000 Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods always focus on identifying differentially expressed gene between case samples and a control group. These traditional methods may not fully consider the changes of interactions between genes at different cell states and the dynamic processes of gene expression levels during the disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We firstly constructed phase-specific gene coexpression network using time-series gene expression data and defined the conception of differential coexpression of genes in coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change of local topological structures between different phase-specific networks. 
Finally, we conducted meta-analysis of gene differential coexpression based on the rank-product method. Experimental results demonstrated the feasibility and effectiveness of DCGN and the superior performance of DCGN over other popular disease-related gene selection methods through real-world gene expression data sets. Xue Jiang, Han Zhang, and Xiongwen Quan Copyright © 2016 Xue Jiang et al. All rights reserved. Functional Genomics, Genetics, and Bioinformatics 2016 Tue, 22 Nov 2016 06:41:07 +0000 Youping Deng, Hongwei Wang, Ryuji Hamamoto, Shiwei Duan, Mehdi Pirooznia, and Yongsheng Bai Copyright © 2016 Youping Deng et al. All rights reserved. A Multiagent System for Dynamic Data Aggregation in Medical Research Wed, 16 Nov 2016 09:04:20 +0000 The collection of medical data for research purposes is a challenging and long-lasting process. In an effort to accelerate and facilitate this process we propose a new framework for dynamic aggregation of medical data from distributed sources. We use agent-based coordination between medical and research institutions. Our system employs principles of peer-to-peer network organization and coordination models to search over already constructed distributed databases and to identify the potential contributors when a new database has to be built. Our framework takes into account both the requirements of a research study and current data availability. This leads to better definition of database characteristics such as schema, content, and privacy parameters. We show that this approach enables a more efficient way to collect data for medical research. Alevtina Dubovitskaya, Visara Urovi, Imanol Barba, Karl Aberer, and Michael Ignaz Schumacher Copyright © 2016 Alevtina Dubovitskaya et al. All rights reserved. Identification of Five Novel Salmonella Typhi-Specific Genes as Markers for Diagnosis of Typhoid Fever Using Single-Gene Target PCR Assays Tue, 15 Nov 2016 06:17:29 +0000 Salmonella Typhi (S. 
Typhi) causes typhoid fever, a disease characterised by high mortality and morbidity worldwide. In order to curtail the transmission of this highly infectious disease, identification of new markers that can detect the pathogen is needed for development of sensitive and specific diagnostic tests. In this study, genomic comparison of S. Typhi with other enteric pathogens was performed, and 6 S. Typhi genes, that is, STY0201, STY0307, STY0322, STY0326, STY2020, and STY2021, were found to be specific in silico. Six PCR assays, each targeting a unique gene, were developed to test the specificity of these genes in vitro. The diagnostic sensitivity and specificity of each assay were determined using 39 S. Typhi, 62 non-Typhi Salmonella, and 10 non-Salmonella clinical isolates. The results showed that 5 of these genes, that is, STY0307, STY0322, STY0326, STY2020, and STY2021, demonstrated 100% sensitivity (39/39) and 100% specificity (0/72). The detection limit of the 5 PCR assays was 32 pg for STY0322, 6.4 pg for STY0326, STY2020, and STY2021, and 1.28 pg for STY0307. In conclusion, 5 PCR assays using STY0307, STY0322, STY0326, STY2020, and STY2021 were developed and found to be highly specific at single-gene target resolution for diagnosis of typhoid fever. Yuan Xin Goay, Kai Ling Chin, Clarissa Ling Ling Tan, Chiann Ying Yeoh, Ja’afar Nuhu Ja’afar, Abdul Rahman Zaidah, Suresh Venkata Chinni, and Kia Kien Phua Copyright © 2016 Yuan Xin Goay et al. All rights reserved. An Entropy-Based Position Projection Algorithm for Motif Discovery Wed, 02 Nov 2016 07:05:46 +0000 The motif discovery problem is crucial for understanding the structure and function of gene expression. Over the past decades, many attempts at motif finding using consensus and probability training models have been successful. However, most existing motif discovery algorithms are still time-consuming or easily trapped in a local optimum. 
To overcome these shortcomings, in this paper, we propose an entropy-based position projection algorithm, called EPP, which designs a projection process to divide the dataset and explores the best local optimal solution. The experimental results on real DNA sequences, Tompa data, and ChIP-seq data show that EPP is advantageous in dealing with the motif discovery problem and outperforms current widely used algorithms. Yipu Zhang, Ping Wang, and Maode Yan Copyright © 2016 Yipu Zhang et al. All rights reserved. Comparative Proteomic Analysis of Light-Induced Mycelial Brown Film Formation in Lentinula edodes Thu, 27 Oct 2016 06:12:57 +0000 Light-induced brown film (BF) formation by the vegetative mycelium of Lentinula edodes is important for ensuring the quantity and quality of this edible mushroom. Nevertheless, the molecular mechanism underlying this phenotype is still unclear. In this study, a comparative proteomic analysis of mycelial BF formation in L. edodes was performed. Seventy-three protein spots with at least a twofold difference in abundance on two-dimensional electrophoresis (2DE) maps were observed, and 52 of them were successfully identified by matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometry (MALDI-TOF/TOF/MS). These proteins were classified into the following functional categories: small molecule metabolic processes (39%), response to oxidative stress (5%), and organic substance catabolic processes (5%), followed by oxidation-reduction processes (3%), single-organism catabolic processes (3%), positive regulation of protein complex assembly (3%), and protein metabolic processes (3%). Interestingly, four of the proteins that were upregulated in response to light exposure were nucleoside diphosphate kinases. To our knowledge, this is the first proteomic analysis of the mechanism of BF formation in L. edodes. Our data will provide a foundation for future detailed investigations of the proteins linked to BF formation. 
Li Hua Tang, Qi Tan, Da Peng Bao, Xue Hong Zhang, Hua Hua Jian, Yan Li, Rui heng Yang, and Ying Wang Copyright © 2016 Li Hua Tang et al. All rights reserved. PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets Mon, 24 Oct 2016 13:30:14 +0000 Identifying conserved patterns in DNA sequences, namely, motif discovery, is an important and challenging computational task. Containing hundreds of sequences or more, high-throughput sequencing data sets help improve the identification accuracy of motif discovery but demand even higher computing performance. To efficiently identify motifs in large DNA data sets, a new algorithm called PairMotifChIP is proposed, which extracts and combines pairs of l-mers in the input that have a relatively small Hamming distance. In particular, a method for rapidly extracting pairs of l-mers is designed, which can be used not only for PairMotifChIP, but also for other DNA data mining tasks with the same demand. Experimental results on the simulated data show that the proposed algorithm can find motifs successfully and runs faster than the state-of-the-art motif discovery algorithms. Furthermore, the validity of the proposed algorithm has been verified on real data. Qiang Yu, Hongwei Huo, and Dazheng Feng Copyright © 2016 Qiang Yu et al. All rights reserved. Computational Analysis of Damaging Single-Nucleotide Polymorphisms and Their Structural and Functional Impact on the Insulin Receptor Thu, 20 Oct 2016 16:06:18 +0000 Single-nucleotide polymorphisms (SNPs) associated with complex disorders can create, destroy, or modify protein coding sites. Single amino acid substitutions in the insulin receptor (INSR) are the most common forms of genetic variations that account for various diseases like Donohue syndrome or Leprechaunism, Rabson-Mendenhall syndrome, and type A insulin resistance. We analyzed the deleterious nonsynonymous SNPs (nsSNPs) in the INSR gene based on different computational methods. 
Analysis of INSR was initiated with PROVEAN followed by PolyPhen and I-Mutant servers to investigate the effects of 57 nsSNPs retrieved from the database of SNP (dbSNP). A total of 18 mutations that were found to exert damaging effects on the INSR protein structure and function were chosen for further analysis. Among these mutations, our computational analysis suggested that 13 nsSNPs decreased protein stability and might have resulted in loss of function. Therefore, the probability of their involvement in disease predisposition increases. In the absence of adequate prior reports on the possible deleterious effects of nsSNPs, we have systematically analyzed and characterized the functional variants in the coding region that can alter the expression and function of the INSR gene. In silico characterization of nsSNPs affecting INSR gene function can aid in better understanding of genetic differences in disease susceptibility. Zabed Mahmud, Syeda Umme Fahmida Malik, Jahed Ahmed, and Abul Kalam Azad Copyright © 2016 Zabed Mahmud et al. All rights reserved. Correlation-Based Network Generation, Visualization, and Analysis as a Powerful Tool in Biological Studies: A Case Study in Cancer Cell Metabolism Wed, 19 Oct 2016 13:03:02 +0000 Over the last decade, vast data sets have been generated in biological and medical studies. The challenge lies in their summary, complexity reduction, and interpretation. Correlation-based networks and the graph-theory-based properties of such networks can be successfully used during this process. However, the procedure has its pitfalls and requires specific knowledge that often lies beyond classical biology and includes many computational tools and software. Here we introduce one of a series of methods for correlation-based network generation and analysis using freely available software. The pipeline allows the user to control each step of the network generation and provides flexibility in selection of correlation methods and thresholds. 
The pipeline was implemented on published metabolomics data of a population of human breast carcinoma cell lines MDA-MB-231 under two conditions: normal and hypoxia. The analysis revealed significant differences between the metabolic networks in response to the tested conditions. The network under hypoxia had 1.7 times more significant correlations between metabolites, compared to normal conditions. Unique metabolic interactions were identified which could lead to the identification of improved markers or aid in elucidating the mechanism of regulation between distantly related metabolites induced by the cancer growth. Albert Batushansky, David Toubiana, and Aaron Fait Copyright © 2016 Albert Batushansky et al. All rights reserved. Semisupervised Learning Based Disease-Symptom and Symptom-Therapeutic Substance Relation Extraction from Biomedical Literature Sun, 16 Oct 2016 08:26:37 +0000 With the rapid growth of biomedical literature, a large amount of knowledge about diseases, symptoms, and therapeutic substances hidden in the literature can be used for drug discovery and disease therapy. In this paper, we present a method of constructing two models for extracting the relations between the disease and symptom and symptom and therapeutic substance from biomedical texts, respectively. The former judges whether a disease causes a certain physiological phenomenon while the latter determines whether a substance relieves or eliminates a certain physiological phenomenon. These two kinds of relations can be further utilized to extract the relations between disease and therapeutic substance. In our method, first two training sets for extracting the relations between the disease-symptom and symptom-therapeutic substance are manually annotated and then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, are applied to utilize the unlabeled data to boost the relation extraction performance. 
Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance effectively. Qinlin Feng, Yingyi Gui, Zhihao Yang, Lei Wang, and Yuxia Li Copyright © 2016 Qinlin Feng et al. All rights reserved. Single-Trial Sparse Representation-Based Approach for VEP Extraction Tue, 11 Oct 2016 07:06:50 +0000 Sparse representation is a powerful tool in signal denoising, and visual evoked potentials (VEPs) have been proven to have strong sparsity over an appropriate dictionary. Inspired by this idea, we present in this paper a novel sparse representation-based approach to solving the VEP extraction problem. The extraction process is performed in three stages. First, instead of using the mixed signals containing the electroencephalogram (EEG) and VEPs, we utilise an EEG from a previous trial, which did not contain VEPs, to identify the parameters of the EEG autoregressive (AR) model. Second, instead of the moving average (MA) model, sparse representation is used to model the VEPs in the autoregressive-moving average (ARMA) model. Finally, we calculate the sparse coefficients and derive VEPs by using the AR model. Next, we tested the performance of the proposed algorithm with synthetic and real data, after which we compared the results with that of an AR model with exogenous input modelling and a mixed overcomplete dictionary-based sparse component decomposition method. Utilising the synthetic data, the algorithms are then employed to estimate the latencies of P100 of the VEPs corrupted by added simulated EEG at different signal-to-noise ratio (SNR) values. The validations demonstrate that our method can well preserve the details of the VEPs for latency estimation, even in low SNR environments. Nannan Yu, Funian Hu, Dexuan Zou, Qisheng Ding, and Hanbing Lu Copyright © 2016 Nannan Yu et al. All rights reserved. 
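The first stage of the VEP extraction approach described above, identifying the parameters of an EEG autoregressive (AR) model from a VEP-free trial, can be illustrated with a small sketch. The abstract does not say how the AR parameters are estimated, so ordinary least squares is an assumption here; the function names `fit_ar` and `solve_linear`, the model order, and the synthetic noise-free signal are all hypothetical and chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar(signal, order):
    """Least-squares estimate of a_1..a_p in x[t] ~ sum_k a_k * x[t-k]."""
    xtx = [[0.0] * order for _ in range(order)]
    xty = [0.0] * order
    for t in range(order, len(signal)):
        lags = [signal[t - k] for k in range(1, order + 1)]
        for i in range(order):
            xty[i] += lags[i] * signal[t]
            for j in range(order):
                xtx[i][j] += lags[i] * lags[j]
    # Normal equations: (X^T X) a = X^T y
    return solve_linear(xtx, xty)


# Synthetic stand-in for a VEP-free EEG trial (assumption): a noise-free
# AR(2) process with known coefficients 1.2 and -0.5.
eeg = [1.0, 0.5]
for _ in range(48):
    eeg.append(1.2 * eeg[-1] - 0.5 * eeg[-2])

coeffs = fit_ar(eeg, order=2)
print(coeffs)
```

On real EEG the signal is noisy and the model order would have to be selected, for example by an information criterion; the sketch only shows the shape of the estimation step.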
Prediction of Early Recurrence of Liver Cancer by a Novel Discrete Bayes Decision Rule for Personalized Medicine Sun, 09 Oct 2016 13:19:21 +0000 We discuss a novel diagnostic method for predicting the early recurrence of liver cancer with high accuracy for personalized medicine. The difficulty with cancer treatment is that even if the types of cancer are the same, the cancers vary depending on the patient. Thus, remarkable attention has been paid to personalized medicine. Unfortunately, although the Tokyo Score, the Modified JIS, and the TNM classification have been proposed as liver scoring systems, none of these scoring systems have met the needs of clinical practice. In this paper, we convert continuous and discrete data to categorical data and keep the natively categorical data as is. Then, we propose a discrete Bayes decision rule that can deal with the categorical data. This may lead to its use with various types of laboratory data. Experimental results show that the proposed method produced a sensitivity of 0.86 and a specificity of 0.49 for the test samples. This suggests that our method may be superior to the well-known Tokyo Score, the Modified JIS, and the TNM classification in terms of sensitivity. Additional comparative study shows that if the numbers of test samples in two classes are the same, this method works well in terms of the measure compared to the existing scoring methods. Hiroyuki Ogihara, Norio Iizuka, and Yoshihiko Hamamoto Copyright © 2016 Hiroyuki Ogihara et al. All rights reserved. Pattern Recognition in Bioinformatics Sun, 09 Oct 2016 08:52:05 +0000 Sher Afzal Khan, Daojing He, and Jose C. Valverde Copyright © 2016 Sher Afzal Khan et al. All rights reserved. Brain-Computer Interface for Control of Wheelchair Using Fuzzy Neural Networks Thu, 29 Sep 2016 12:07:33 +0000 The design of brain-computer interface for the wheelchair for physically disabled people is presented. 
The design of the proposed system is based on receiving, processing, and classifying the electroencephalographic (EEG) signals and then performing the control of the wheelchair. A number of experimental measurements of brain activity were made using human control commands of the wheelchair. Based on the mental activity of the user and the control commands of the wheelchair, the design of a classification system based on fuzzy neural networks (FNN) is considered. The FNN-based algorithm is used for brain-actuated control. The training data are used to design the system, and test data are then applied to measure the performance of the control system. The control of the wheelchair is performed under real conditions using direction and speed control commands of the wheelchair. The approach used in the paper reduces the probability of misclassification and improves the control accuracy of the wheelchair. Rahib H. Abiyev, Nurullah Akkaya, Ersin Aytac, Irfan Günsel, and Ahmet Çağman Copyright © 2016 Rahib H. Abiyev et al. All rights reserved. Computer Based Melanocytic and Nevus Image Enhancement and Segmentation Wed, 28 Sep 2016 07:51:29 +0000 Digital dermoscopy aids dermatologists in monitoring potentially cancerous skin lesions. Melanoma is the 5th most common form of skin cancer; it is rare but the most dangerous. Melanoma is curable if it is detected at an early stage. Automated segmentation of a cancerous lesion from normal skin is the most critical yet tricky part of computerized lesion detection and classification. The effectiveness and accuracy of lesion classification are critically dependent on the quality of lesion segmentation. In this paper, we propose a novel approach that can automatically preprocess the image and then segment the lesion. The system filters unwanted artifacts including hairs, gel, bubbles, and specular reflection. 
A novel approach is presented using the concept of wavelets for detection and inpainting the hairs present in the cancer images. The contrast of lesion with the skin is enhanced using adaptive sigmoidal function that takes care of the localized intensity distribution within a given lesion’s images. We then present a segmentation approach to precisely segment the lesion from the background. The proposed approach is tested on the European database of dermoscopic images. Results are compared with the competitors to demonstrate the superiority of the suggested approach. Uzma Jamil, M. Usman Akram, Shehzad Khalid, Sarmad Abbas, and Kashif Saleem Copyright © 2016 Uzma Jamil et al. All rights reserved. Integrated Analysis of Multiscale Large-Scale Biological Data for Investigating Human Disease 2016 Thu, 15 Sep 2016 07:42:14 +0000 Tao Huang, Lei Chen, Jiangning Song, Mingyue Zheng, Jialiang Yang, and Zhenguo Zhang Copyright © 2016 Tao Huang et al. All rights reserved. Corrigendum to “VGSC: A Web-Based Vector Graph Toolkit of Genome Synteny and Collinearity” Wed, 14 Sep 2016 10:32:44 +0000 Yiqing Xu, Changwei Bi, Guoxin Wu, Suyun Wei, Xiaogang Dai, Tongming Yin, and Ning Ye Copyright © 2016 Yiqing Xu et al. All rights reserved. Functions of thga1 Gene in Trichoderma harzianum Based on Transcriptome Analysis Thu, 08 Sep 2016 13:41:21 +0000 Trichoderma spp. are important biocontrol filamentous fungi, which are widely used for their adaptability, broad antimicrobial spectrum, and various antagonistic mechanisms. In our previous studies, we cloned thga1 gene encoding GαI protein from Trichoderma harzianum Th-33. Its knockout mutant showed that the growth rate, conidial yield, cAMP level, antagonistic action, and hydrophobicity decreased. Therefore, Illumina RNA-seq technology (RNA-seq) was used to determine transcriptomic differences between the wild-type strain and thga1 mutant. 
A total of 888 genes were identified as differentially expressed genes (DEGs), including 427 upregulated and 461 downregulated genes. All DEGs were assigned to KEGG pathway databases, and 318 genes were annotated in 184 individual pathways. KEGG analysis revealed that these unigenes were significantly enriched in metabolism and degradation pathways. GO analysis suggested that the majority of DEGs were associated with catalytic activities and metabolism processes that encode carbohydrate-active enzymes, secondary metabolites, secreted proteins, or transcription factors. According to the functional annotation of these DEGs by KOG, the most abundant group was “secondary metabolite biosynthesis, transport, and catabolism.” Further studies for functional characterization of candidate genes and pathways reported in this paper are necessary to further define the G protein signaling system in T. harzianum. Qing Sun, Xiliang Jiang, Li Pang, Lirong Wang, and Mei Li Copyright © 2016 Qing Sun et al. All rights reserved. Natural Language Processing Based Instrument for Classification of Free Text Medical Records Wed, 07 Sep 2016 12:28:12 +0000 According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system has to be introduced in the near future. In this context, the problem arises of structuring and classifying documents containing the full history of medical services provided. The present work introduces an instrument for classification of medical records based on the Georgian language. It is the first attempt at such classification of Georgian-language medical records. In total, 24,855 examination records were studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and k-Nearest Neighbor (KNN). 
The results obtained demonstrated that both machine learning methods performed successfully, with SVM performing slightly better. In the process of classification a “shrink” method, based on feature selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, at the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful. Manana Khachidze, Magda Tsintsadze, and Maia Archuadze Copyright © 2016 Manana Khachidze et al. All rights reserved. New Trends of Digital Data Storage in DNA Mon, 05 Sep 2016 13:10:42 +0000 With the exponential growth in the volume of information generated and the emerging need for data to be stored for prolonged periods of time, there is a need for a storage medium with high capacity, high storage density, and the ability to withstand extreme environmental conditions. DNA emerges as a prospective medium for data storage with its striking features. Diverse encoding models for reading and writing data onto DNA, codes for encrypting data which address issues of error generation, and approaches for developing codons and storage styles have been developed over the recent past. DNA has been identified as a potential medium for secret writing, which opens the way towards DNA cryptography and steganography. DNA utilized as an organic memory device along with big data storage and analytics in DNA has paved the way towards DNA computing for solving computational problems. This paper critically analyzes the various methods used for encoding and encrypting data onto DNA while identifying the advantages and capability of every scheme to overcome the drawbacks identified previously. 
Cryptography and steganography techniques have been critically analyzed, and the limitations of each method identified. This paper also identifies the advantages and limitations of DNA as a memory device and memory applications. Pavani Yashodha De Silva and Gamage Upeksha Ganegoda Copyright © 2016 Pavani Yashodha De Silva and Gamage Upeksha Ganegoda. All rights reserved. Analyzing the miRNA-Gene Networks to Mine the Important miRNAs under Skin of Human and Mouse Mon, 05 Sep 2016 06:44:47 +0000 Genetic networks provide new mechanistic insights into the diversity of species morphology. In this study, we have integrated the MGI, GEO, and miRNA databases to analyze the genetic regulatory networks underlying the morphological differences between the integument of humans and mice. We found that the gene expression network in the skin is highly divergent between human and mouse. The GO term of secretion was highly enriched, and this category was specific to human compared to mouse. These secretion genes might be involved in eccrine system evolution in human. In addition, a total of 62,637 miRNA binding target sites were predicted in human integument genes (IGs), while 26,280 miRNA binding target sites were predicted in mouse IGs. The interactions between miRNAs and IGs in human are more complex than those in mouse. Furthermore, hsa-miR-548, mmu-miR-466, and mmu-miR-467 have an enormous number of targets on IGs and have a role in inhibiting the host immune response. The chromosomal distribution patterns of these three miRNA families are very different. The interaction of miRNAs and IGs adds a new dimension to traditional gene regulation networks of skin. Our results provide new insights into the gene network basis of skin differences between human and mouse. Jianghong Wu, Husile Gong, Yongsheng Bai, and Wenguang Zhang Copyright © 2016 Jianghong Wu et al. All rights reserved. 
Aberrant LncRNA Expression Profile in a Contusion Spinal Cord Injury Mouse Model Sun, 04 Sep 2016 11:37:29 +0000 Long noncoding RNAs (LncRNAs) play a crucial role in cell growth, development, and various diseases related to the central nervous system. However, LncRNA differential expression profiles in spinal cord injury are yet to be reported. In this study, we profiled the expression pattern of LncRNAs using a microarray method in a contusion spinal cord injury (SCI) mouse model. Compared with a spinal cord without injury, few changes in LncRNA expression levels were noted 1 day after injury. The differential changes in LncRNA expression peaked 1 week after SCI and subsequently declined until 3 weeks after injury. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to validate the reliability of the microarray, demonstrating that the results were reliable. Gene ontology (GO) analysis indicated that differentially expressed mRNAs were involved in transport, cell adhesion, ion transport, and metabolic processes, among others. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis showed that the neuroactive ligand-receptor interaction, the PI3K-Akt signaling pathway, and focal adhesions were potentially implicated in SCI pathology. We constructed a dynamic LncRNA-mRNA network containing 264 LncRNAs and 949 mRNAs to elucidate the interactions between the LncRNAs and mRNAs. Overall, the results from this study indicate for the first time that LncRNAs are differentially expressed in a contusion SCI mouse model. Ya Ding, Zhiwen Song, and Jinbo Liu Copyright © 2016 Ya Ding et al. All rights reserved. Transcriptome Analysis of HepG2 Cells Expressing ORF3 from Swine Hepatitis E Virus to Determine the Effects of ORF3 on Host Cells Sun, 28 Aug 2016 16:45:03 +0000 Hepatitis E virus- (HEV-) mediated hepatitis has become a global public health problem. An important regulatory protein of HEV, ORF3, influences multiple signal pathways in host cells. 
In this study, to investigate the function of ORF3 from the swine form of HEV (SHEV), high-throughput RNA-Seq-based screening was performed to identify the differentially expressed genes in ORF3-expressing HepG2 cells. The results were validated with quantitative real-time PCR and gene ontology was employed to assign differentially expressed genes to functional categories. The results indicated that, in the established ORF3-expressing HepG2 cells, the mRNA levels of CLDN6, YLPM1, APOC3, NLRP1, SCARA3, FGA, FGG, FGB, and FREM1 were upregulated, whereas the mRNA levels of SLC2A3, DKK1, BPIFB2, and PTGR1 were downregulated. The deregulated expression of CLDN6 and FREM1 might contribute to changes in integral membrane protein and basement membrane protein expression, expression changes for NLRP1 might affect the apoptosis of HepG2 cells, and the altered expression of APOC3, SCARA3, and DKK1 may affect lipid metabolism in HepG2 cells. In conclusion, ORF3 plays a functional role in virus-cell interactions by affecting the expression of integral membrane protein and basement membrane proteins and by altering the process of apoptosis and lipid metabolism in host cells. These findings provide important insight into the pathogenic mechanism of HEV. Kailian Xu, Shiyu Guo, Tianjing Zhao, Huapei Zhu, Hanwei Jiao, Qiaoyun Shi, Feng Pang, Yaying Li, Guohua Li, Dongmei Peng, Xin Nie, Ying Cheng, Kebang Wu, Li Du, Ke Cui, Wenguang Zhang, and Fengyang Wang Copyright © 2016 Kailian Xu et al. All rights reserved. Candidate SNP Markers of Chronopathologies Are Predicted by a Significant Change in the Affinity of TATA-Binding Protein for Human Gene Promoters Mon, 22 Aug 2016 09:45:06 +0000 Variations in human genome (e.g., single nucleotide polymorphisms, SNPs) may be associated with hereditary diseases, their complications, comorbidities, and drug responses. 
Using Web service SNP_TATA_Comparator presented in our previous paper, here we analyzed immediate surroundings of known SNP markers of diseases and identified several candidate SNP markers that can significantly change the affinity of TATA-binding protein for human gene promoters, with circadian consequences. For example, rs572527200 may be related to asthma, where symptoms are circadian (worse at night), and rs367732974 may be associated with heart attacks that are characterized by a circadian preference (early morning). By the same method, we analyzed the 90 bp proximal promoter region of each protein-coding transcript of each human gene of the circadian clock core. This analysis yielded 53 candidate SNP markers, such as rs181985043 (susceptibility to acute Q fever in male patients), rs192518038 (higher risk of a heart attack in patients with diabetes), and rs374778785 (emphysema and lung cancer in smokers). If they are properly validated according to clinical standards, these candidate SNP markers may turn out to be useful for physicians (to select optimal treatment for each patient) and for the general population (to choose a lifestyle preventing possible circadian complications of diseases). Petr Ponomarenko, Dmitry Rasskazov, Valentin Suslov, Ekaterina Sharypova, Ludmila Savinkova, Olga Podkolodnaya, Nikolay L. Podkolodny, Natalya N. Tverdokhleb, Irina Chadaeva, Mikhail Ponomarenko, and Nikolay Kolchanov Copyright © 2016 Petr Ponomarenko et al. All rights reserved. Integrated Analysis of DNA Methylation and mRNA Expression Profiles Data to Identify Key Genes in Lung Adenocarcinoma Wed, 17 Aug 2016 16:48:22 +0000 Introduction. Lung adenocarcinoma (LAC) is the most frequent type of lung cancer and has a high metastatic rate at an early stage. This study is aimed at identifying LAC-associated genes. Materials and Methods. 
GSE62950, downloaded from Gene Expression Omnibus, included a DNA methylation dataset and an mRNA expression profile dataset, both of which included 28 LAC tissue samples and 28 adjacent normal tissue samples. The differentially expressed genes (DEGs) were screened with the Limma package in R, and their functions were predicted by enrichment analysis using the TargetMine online tool. Then, a protein-protein interaction (PPI) network was constructed using STRING and Cytoscape. Finally, LAC-associated methylation sites were identified with the CpGassoc package in R and mapped to the DEGs to obtain LAC-associated DEGs. Results. A total of 913 DEGs were identified in LAC tissues. In the PPI networks, MAD2L1, AURKB, CCNB2, CDC20, and WNT3A had higher degrees, and the first four genes might be involved in LAC through interaction. A total of 8856 LAC-associated methylation sites were identified and mapped to the DEGs. Of these, 29 LAC-associated methylation sites were located in 27 DEGs (e.g., SH3GL2, BAI3, CDH13, JAM2, MT1A, LHX6, and IGFBP3). Conclusions. These key genes might play a role in the pathogenesis of LAC. Xiang Jin, Xingang Liu, Xiaodan Li, and Yinghui Guan Copyright © 2016 Xiang Jin et al. All rights reserved. SABinder: A Web Service for Predicting Streptavidin-Binding Peptides Wed, 17 Aug 2016 13:18:24 +0000 Streptavidin is sometimes used as the intended target to screen phage-displayed combinatorial peptide libraries for streptavidin-binding peptides (SBPs). More often in the biopanning system, however, streptavidin is just a commonly used anchoring molecule that can efficiently capture the biotinylated target. In this case, SBPs creeping into the biopanning results are not desired binders but target-unrelated peptides (TUP). Taking them as intended binders may mislead subsequent studies. Therefore, it is important to determine whether a peptide is likely to be an SBP when streptavidin is either the intended target or just the anchoring molecule. 
In this paper, we describe an SVM-based ensemble predictor called SABinder. It is the first predictor for SBP. The model was built with the feature of optimized dipeptide composition. It was observed that 89.20% (MCC = 0.78; AUC = 0.93; permutation test) of peptides were correctly classified. As a web server, SABinder is freely accessible. The tool provides a highly efficient way to exclude potential SBP when they are TUP or to facilitate identification of possibly new SBP when they are the desired binders. In either case, it will be helpful and can benefit the related scientific community. Bifang He, Juanjuan Kang, Beibei Ru, Hui Ding, Peng Zhou, and Jian Huang Copyright © 2016 Bifang He et al. All rights reserved. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition Wed, 17 Aug 2016 12:03:43 +0000 Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly states prediction system which combines blocks with functional domain composition, called QuaBingo, is constructed from three layers of classifiers that can categorize quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second layer SVM was utilized to process the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. 
Across 11 kinds of functional protein families, QuaBingo achieves a Matthews Correlation Coefficient (MCC) 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better ability for predicting the quaternary structural attributes of proteins. Chi-Hua Tung, Chi-Wei Chen, Ren-Chao Guo, Hui-Fuang Ng, and Yen-Wei Chu Copyright © 2016 Chi-Hua Tung et al. All rights reserved. Differential Regulatory Analysis Based on Coexpression Network in Cancer Research Thu, 11 Aug 2016 06:03:26 +0000 With the rapid development of high-throughput techniques and the accumulation of big transcriptomic data, many computational methods and algorithms such as differential analysis and network analysis have been proposed to explore genome-wide gene expression characteristics. These efforts aim to transform underlying genomic information into valuable knowledge in biological and medical research fields. Recently, many integrative research methods have been dedicated to interpreting the development and progression of neoplastic diseases, and differential regulatory analysis (DRA) based on gene coexpression network (GCN) increasingly serves as a robust complement to regular differential expression analysis in revealing regulatory functions of cancer related genes such as evading growth suppressors and resisting cell death. Differential regulatory analysis based on GCN is promising and plays an essential role in discovering the system properties of carcinogenesis features. Here we briefly review the paradigm of differential regulatory analysis based on GCN. We also focus on the applications of differential regulatory analysis based on GCN in cancer research and point out that DRA is necessary and particularly valuable for revealing the underlying molecular mechanisms in large-scale carcinogenesis studies. Junyi Li, Yi-Xue Li, and Yuan-Yuan Li Copyright © 2016 Junyi Li et al.
All rights reserved. Reconstruction of the Fatty Acid Biosynthetic Pathway of Exiguobacterium antarcticum B7 Based on Genomic and Bibliomic Data Tue, 09 Aug 2016 06:23:53 +0000 Exiguobacterium antarcticum B7 is an extremophilic Gram-positive bacterium able to survive in cold environments. A key factor in understanding cold adaptation processes is the modification of the fatty acids composing the cell membranes of psychrotrophic bacteria. In our study we show the in silico reconstruction of the fatty acid biosynthesis pathway of E. antarcticum B7. To build the stoichiometric model, a semiautomatic procedure was applied, which integrates genome information using KEGG and RAST/SEED. Constraint-based methods, namely, Flux Balance Analysis (FBA) and elementary modes (EM), were applied. FBA was formulated to maximize hexadecenoic acid production. To evaluate the influence of gene expression on the fluxome analysis, FBA was also calculated using the values obtained in the transcriptome analysis at 0°C and 37°C. The fatty acid biosynthesis pathway showed a total of 13 elementary flux modes, four of which showed routes for the production of hexadecenoic acid. The reconstructed pathway demonstrated the capacity of E. antarcticum B7 to produce fatty acid molecules de novo. Under the influence of the transcriptome, the fluxome was altered, promoting the production of short-chain fatty acids. The calculated models contribute to a better understanding of bacterial adaptation to cold environments. Regiane Kawasaki, Rafael A. Baraúna, Artur Silva, Marta S. P. Carepo, Rui Oliveira, Rodolfo Marques, Rommel T. J. Ramos, and Maria P. C. Schneider Copyright © 2016 Regiane Kawasaki et al. All rights reserved.
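The Flux Balance Analysis step named in the E. antarcticum B7 abstract above can be illustrated with a minimal sketch. This is not the authors' model: the three-reaction network, the flux bounds, and the function name `toy_fba` are hypothetical, chosen only to show how maximizing one product flux under steady-state mass balance (S v = 0) becomes a linear program. It assumes SciPy is installed.

```python
from scipy.optimize import linprog


def toy_fba():
    """Maximize the export flux of a hypothetical 3-reaction chain.

    Reactions (columns): uptake (-> A), conversion (A -> B), export (B ->).
    Metabolites (rows): A, B. All names and bounds are illustrative only.
    """
    # Stoichiometric matrix S: steady state requires S @ v == 0.
    S = [
        [1, -1, 0],   # metabolite A: produced by uptake, consumed by conversion
        [0, 1, -1],   # metabolite B: produced by conversion, consumed by export
    ]
    # Flux bounds: uptake is capped at 10 units, the rest are effectively free.
    bounds = [(0, 10), (0, 1000), (0, 1000)]
    # linprog minimizes, so the export flux is negated to maximize it.
    c = [0, 0, -1]
    result = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return list(result.x)


fluxes = toy_fba()
print(fluxes)
```

In this toy chain the export flux is limited by the uptake bound, so every reaction carries the same flux at the optimum. A genome-scale reconstruction would replace `S` and `bounds` with the curated stoichiometry and, where available, expression-derived constraints.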
Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data Thu, 04 Aug 2016 12:47:25 +0000 Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes. Shuaiqun Wang, Aorigele, Wei Kong, Weiming Zeng, and Xiaomin Hong Copyright © 2016 Shuaiqun Wang et al. All rights reserved. Social Determinants of Chronic Prostatitis/Chronic Pelvic Pain Syndrome Related Lifestyle and Behaviors among Urban Men in China: A Case-Control Study Thu, 04 Aug 2016 06:02:22 +0000 Purpose. In order to find key risk factors of chronic prostatitis/chronic pelvic pain syndrome (CP/CPPS) among urban men in China, an age-matched case-control study was performed from September 2012 to May 2013 in Yichang, Hubei Province, China. Methodology. 
A total of 279 patients and 558 controls were recruited in this study. Data were collected by a self-administered questionnaire, including demographics, diet and lifestyle, psychological status, and a physical exam. Conditional logistic regression model was used to analyze collected data. Results. Chemical factors exposure, night shift, severity of mood, and poor self-health cognition were entered into the regression model, and result displayed that these four factors had odds ratios of 1.929 (95% CI, 1.321–2.819), 1.456 (95% CI, 1.087–1.949), 1.619 (95% CI, 1.280–2.046), and 1.304 (95% CI, 1.094–1.555), respectively, which suggested that these four factors could significantly affect CP/CPPS. Conclusion. These results suggest that many factors affect CP/CPPS, including biological, social, and psychological factors. Yan Wang, Chen Chen, Changcai Zhu, Liang Chen, Qingrong Han, and Huarong Ye Copyright © 2016 Yan Wang et al. All rights reserved. Predicting Diagnostic Gene Biomarkers for Non-Small-Cell Lung Cancer Sun, 31 Jul 2016 08:05:35 +0000 Lung cancer is the primary reason for death due to cancer worldwide, and non-small-cell lung cancer (NSCLC) is the most common subtype of lung cancer. Most patients die from complications of NSCLC due to poor diagnosis. In this paper, we aimed to predict gene biomarkers that may be of use for diagnosis of NSCLC by integrating differential gene expression analysis with functional association network analysis. We first constructed an NSCLC-specific functional association network by combining gene expression correlation with functional association. Then, we applied a network partition algorithm to divide the network into gene modules and identify the most NSCLC-specific gene modules based on their differential expression pattern in between normal and NSCLC samples. 
Finally, from these modules, we identified genes that exhibited the most impact on the expression of their functionally associated genes between normal and NSCLC samples and predicted them as NSCLC biomarkers. Literature review of the top predicted gene biomarkers suggested that most of them were already considered critical for development of NSCLC. Bin Liang, Yang Shao, Fei Long, and Shu-Juan Jiang Copyright © 2016 Bin Liang et al. All rights reserved. A Shortest Dependency Path Based Convolutional Neural Network for Protein-Protein Relation Extraction Thu, 14 Jul 2016 11:45:10 +0000 The state-of-the-art methods for protein-protein interaction (PPI) extraction are primarily based on kernel methods, and their performance strongly depends on handcrafted features. In this paper, we tackle PPI extraction by using convolutional neural networks (CNN) and propose a shortest dependency path based CNN (sdpCNN) model. The proposed method takes only the sdp and word embedding as input and could avoid bias from feature selection by using CNN. We performed experiments on the standard AIMed and BioInfer datasets, and the experimental results demonstrated that our approach outperformed state-of-the-art kernel based methods. In particular, by tracking the sdpCNN model, we find that sdpCNN could extract key features automatically, and it is verified that pretrained word embedding is crucial in the PPI task. Lei Hua and Chanqin Quan Copyright © 2016 Lei Hua and Chanqin Quan. All rights reserved. A Five-Gene Expression Signature Predicts Clinical Outcome of Ovarian Serous Cystadenocarcinoma Tue, 05 Jul 2016 14:27:05 +0000 Ovarian serous cystadenocarcinoma is a common malignant tumor of female genital organs. Treatment is generally less effective as patients are usually diagnosed in the late stage. Therefore, a well-designed prognostic marker provides valuable data for optimizing therapy.
In this study, we analyzed 303 samples of ovarian serous cystadenocarcinoma and the corresponding RNA-seq data. We observed the correlation between gene expression and patients’ survival and eventually established a risk assessment model of five factors using Cox proportional hazards regression analysis. We found that the survival time in high-risk patients was significantly shorter than in low-risk patients in both training and testing sets after Kaplan-Meier analysis. The AUROC value was 0.67 when predicting the survival time in testing set, which indicates a relatively high specificity and sensitivity. The results suggest diagnostic and therapeutic applications of our five-gene model for ovarian serous cystadenocarcinoma. Li-Wei Liu, Qiuhao Zhang, Wenna Guo, Kun Qian, and Qiang Wang Copyright © 2016 Li-Wei Liu et al. All rights reserved. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm Thu, 30 Jun 2016 11:58:15 +0000 Within non-small cell lung cancer (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset.
When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. SAMGSR can therefore indeed serve as a feature selection algorithm. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between the two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. Lei Zhang, Linlin Wang, Bochuan Du, Tianjiao Wang, Pu Tian, and Suyan Tian Copyright © 2016 Lei Zhang et al. All rights reserved. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA Wed, 29 Jun 2016 13:37:34 +0000 Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, due to its inherent high accuracy and low risk. Typically, MPS data is parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods: the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method, together with one subchromosome abnormality identification method (SCAZ). Conclusions.
With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping. Baohong Liu, Xiaoyan Tang, Feng Qiu, Chunmei Tao, Junhui Gao, Mengmeng Ma, Tingyan Zhong, JianPing Cai, Yixue Li, and Guohui Ding Copyright © 2016 Baohong Liu et al. All rights reserved. A Comparative Study of Land Cover Classification by Using Multispectral and Texture Data Wed, 08 Jun 2016 07:50:20 +0000 The main objective of this study is to find out the importance of machine vision approach for the classification of five types of land cover data such as bare land, desert rangeland, green pasture, fertile cultivated land, and Sutlej river land. A novel spectra-statistical framework is designed to classify the subjective land cover data types accurately. Multispectral data of these land covers were acquired by using a handheld device named multispectral radiometer in the form of five spectral bands (blue, green, red, near infrared, and shortwave infrared) while texture data were acquired with a digital camera by the transformation of acquired images into 229 texture features for each image. The most discriminant 30 features of each image were obtained by integrating the three statistical features selection techniques such as Fisher, Probability of Error plus Average Correlation, and Mutual Information (F + PA + MI). Selected texture data clustering was verified by nonlinear discriminant analysis while linear discriminant analysis approach was applied for multispectral data. For classification, the texture and multispectral data were deployed to artificial neural network (ANN: n-class). 
By implementing a cross validation method (80-20), we received an accuracy of 91.332% for texture data and 96.40% for multispectral data, respectively. Salman Qadri, Dost Muhammad Khan, Farooq Ahmad, Syed Furqan Qadri, Masroor Ellahi Babar, Muhammad Shahid, Muzammil Ul-Rehman, Abdul Razzaq, Syed Shah Muhammad, Muhammad Fahad, Sarfraz Ahmad, Muhammad Tariq Pervez, Nasir Naveed, Naeem Aslam, Mutiullah Jamil, Ejaz Ahmad Rehmani, Nazir Ahmad, and Naeem Akhtar Khan Copyright © 2016 Salman Qadri et al. All rights reserved. Finding Clocks in Genes: A Bayesian Approach to Estimate Periodicity Thu, 02 Jun 2016 13:49:37 +0000 Identification of rhythmic gene expression from metabolic cycles to circadian rhythms is crucial for understanding the gene regulatory networks and functions of these biological processes. Recently, two algorithms, JTK_CYCLE and ARSER, have been developed to estimate periodicity of rhythmic gene expression. JTK_CYCLE performs well for long or less noisy time series, while ARSER performs well for detecting a single rhythmic category. However, observing gene expression at high temporal resolution is not always feasible, and many scientists are interested in exploring both ultradian and circadian rhythmic categories simultaneously. In this paper, a new algorithm, named autoregressive Bayesian spectral regression (ABSR), is proposed. It estimates the period of time-course experimental data and classifies gene expression profiles into multiple rhythmic categories simultaneously. Through the simulation studies, it is shown that ABSR substantially improves the accuracy of periodicity estimation and clustering of rhythmic categories as compared to JTK_CYCLE and ARSER for the data with low temporal resolution. Moreover, ABSR is insensitive to rhythmic patterns. This new scheme is applied to existing time-course mouse liver data to estimate period of rhythms and classify the genes into ultradian, circadian, and arrhythmic categories. 
It is observed that 49.2% of the circadian profiles detected by JTK_CYCLE with 1-hour resolution are also detected by ABSR with only 4-hour resolution. Yan Ren, Christian I. Hong, Sookkyung Lim, and Seongho Song Copyright © 2016 Yan Ren et al. All rights reserved. Therapeutic Effects of CUR-Activated Human Umbilical Cord Mesenchymal Stem Cells on 1-Methyl-4-phenylpyridine-Induced Parkinson’s Disease Cell Model Tue, 31 May 2016 07:13:19 +0000 The purpose of this study is to evaluate the therapeutic effects of human umbilical cord-derived mesenchymal stem cells (hUC-MSC) activated by curcumin (CUR) on PC12 cells induced by 1-methyl-4-phenylpyridinium ion (MPP+), a cell model of Parkinson’s disease (PD). The supernatants of hUC-MSC and of hUC-MSC activated by 5 µmol/L CUR (hUC-MSC-CUR) were collected at the same concentration. Cell proliferation, differentiation potential to dopaminergic neuronal cells, and antioxidation were observed in PC12 cells after treatment with the above two supernatants and 5 µmol/L CUR. The results showed that the hUC-MSC-CUR more markedly promoted proliferation and the expression of tyrosine hydroxylase (TH) and microtubule associated protein-2 (MAP2) and significantly decreased the expression of nitric oxide (NO) and inducible nitric oxide synthase (iNOS) in PC12 cells. Furthermore, cytokine detection indicated that the expression of IL-6, IL-10, and NGF was significantly higher in the group treated with the hUC-MSC-CUR than in the other two groups. Therefore, the hUC-MSC-CUR may be a potential strategy to promote the proliferation and differentiation of the PD cell model, thereby providing new insights into a novel therapeutic approach in PD. Li Jinfeng, Wang Yunliang, Liu Xinshan, Wang Yutong, Wang Shanshan, Xue Peng, Yang Xiaopeng, Xu Zhixiu, Lu Qingshan, Yin Honglei, Cao Xia, Wang Hongwei, and Cao Bingzhen Copyright © 2016 Li Jinfeng et al. All rights reserved.
The Effects of Real-Time Interactive Multimedia Teleradiology System Tue, 17 May 2016 14:18:35 +0000 This study describes the design of a real-time interactive multimedia teleradiology system and assesses how the system is used by referring physicians in point-of-care situations and supports or hinders aspects of physician-radiologist interaction. We developed a real-time multimedia teleradiology management system that automates the transfer of images and radiologists’ reports and surveyed physicians to triangulate the findings and to verify the realism and results of the experiment. The web-based survey was delivered to 150 physicians from a range of specialties. The survey was completed by 72% of physicians. Data showed a correlation between rich interactivity, satisfaction, and effectiveness. The results of our experiments suggest that real-time multimedia teleradiology systems are valued by referring physicians and may have the potential for enhancing their practice and improving patient care and highlight the critical role of multimedia technologies to provide real-time multimode interactivity in current medical care. Lilac Al-Safadi Copyright © 2016 Lilac Al-Safadi. All rights reserved. Impacts of Nonsynonymous Single Nucleotide Polymorphisms of Adiponectin Receptor 1 Gene on Corresponding Protein Stability: A Computational Approach Sun, 15 May 2016 10:00:10 +0000 Despite the reported association of adiponectin receptor 1 (ADIPOR1) gene mutations with vulnerability to several human metabolic diseases, there is lack of computational analysis on the functional and structural impacts of single nucleotide polymorphisms (SNPs) of the human ADIPOR1 at protein level. Therefore, sequence- and structure-based computational tools were employed in this study to functionally and structurally characterize the coding nsSNPs of ADIPOR1 gene listed in the dbSNP database. 
Our in silico analysis by SIFT, nsSNPAnalyzer, PolyPhen-2, Fathmm, I-Mutant 2.0, SNPs&GO, PhD-SNP, PANTHER, and SNPeffect tools identified the nsSNPs with distorting functional impacts, namely, rs765425383 (A348G), rs752071352 (H341Y), rs759555652 (R324L), rs200326086 (L224F), and rs766267373 (L143P), from 74 nsSNPs of the ADIPOR1 gene. Finally, the aforementioned five deleterious nsSNPs were introduced using the Swiss-PDB Viewer package into the X-ray crystal structure of the ADIPOR1 protein, and changes in free energy for these mutations were computed. Although increased free energy was observed for all the mutants, the nsSNP H341Y caused the highest energy increase of all. RMSD and TM scores predicted that the mutants were structurally similar to the wild type protein. Our analyses suggested that the aforementioned variants, especially H341Y, could directly or indirectly destabilize the amino acid interactions and hydrogen bonding networks of ADIPOR1. Md. Abu Saleh, Md. Solayman, Sudip Paul, Moumoni Saha, Md. Ibrahim Khalil, and Siew Hua Gan Copyright © 2016 Md. Abu Saleh et al. All rights reserved. Detecting Susceptibility to Breast Cancer with SNP-SNP Interaction Using BPSOHS and Emotional Neural Networks Wed, 11 May 2016 08:24:34 +0000 Studies of the association between diseases and informative single nucleotide polymorphisms (SNPs) have received great attention. However, most of them just use the whole set of useful SNPs and fail to consider the SNP-SNP interactions, even though these interactions have already been demonstrated in biological experiments. In this paper, we use a binary particle swarm optimization with hierarchical structure (BPSOHS) algorithm to improve the effectiveness of PSO for the identification of the SNP-SNP interactions. Furthermore, in order to use these SNP interactions in the susceptibility analysis, we propose an emotional neural network (ENN) to treat SNP interactions as emotional tendency.
Unlike the normal architecture, and like the emotional brain, this architecture provides a specific path to treat the emotional value, by which the SNP interactions can be considered more quickly and directly. The ENN helps us use the prior knowledge about the SNP interactions and other influence factors together. Finally, the experimental results show that the proposed BPSOHS_ENN algorithm can detect the informative SNP-SNP interactions and predict the breast cancer risk with a much higher accuracy than existing methods. Xiao Wang, Qinke Peng, and Yue Fan Copyright © 2016 Xiao Wang et al. All rights reserved. The Use of Protein-Protein Interactions for the Analysis of the Associations between PM2.5 and Some Diseases Sun, 08 May 2016 11:57:09 +0000 Nowadays, pollution levels are rapidly increasing all over the world. One of the most important pollutants is PM2.5. It is known that a polluted environment may cause several problems, such as the greenhouse effect and acid rain. Among them, the most important problem is that pollutants can induce a number of serious diseases. Some studies have reported that PM2.5 is an important etiologic factor for lung cancer. In this study, we extensively investigate the associations between PM2.5 and 22 disease classes recommended by Goh et al., such as respiratory diseases, cardiovascular diseases, and gastrointestinal diseases. The protein-protein interactions were used to measure the linkage between disease genes and genes that have been reported to be modulated by PM2.5. The results suggest that some diseases, such as diseases related to ear, nose, and throat and gastrointestinal, nutritional, renal, and cardiovascular diseases, are influenced by PM2.5, and some evidence is provided to confirm our results. For example, a total of 18 genes related to cardiovascular diseases are identified to be closely related to PM2.5, and the cardiovascular disease relevant gene DSP is significantly related to the PM2.5 gene JUP.
Qing Zhang, Pei-Wei Zhang, and Yu-Dong Cai Copyright © 2016 Qing Zhang et al. All rights reserved. Bioinformatics Applications in Life Sciences and Technologies Wed, 04 May 2016 12:59:07 +0000 Sílvia A. Sousa, Jorge H. Leitão, Raul C. Martins, João M. Sanches, Jasjit S. Suri, and Alejandro Giorgetti Copyright © 2016 Sílvia A. Sousa et al. All rights reserved. Differential Proteomics Analysis of Colonic Tissues in Patients of Slow Transit Constipation Sat, 30 Apr 2016 14:08:05 +0000 Objective. To investigate and screen the different expression of proteins in STC and normal group with a comparative proteomic approach. Methods. Two-dimensional electrophoresis was applied to separate the proteins in specimens from both 5 STC patients and 5 normal controls. The proteins with statistically significant differential expression between two groups were identified by computer aided image analysis and matrix assisted laser desorption ionization tandem time of flight mass spectrometry (MALDI-TOF-MS). Results. A total of 239 protein spots were identified in the average gel of the normal control and 215 in patients with STC. A total of 197 protein spots were matched and the mean matching rate was 82%. There were 14 protein spots which were expressed with statistically significant differences from others. Of those 14 protein spots, the expression of 12 spots increased markedly, while that of 2 spots decreased significantly. Conclusion. The proteomics expression in colonic specimens of STC patients is statistically significantly different from that of normal control, which may be associated with the pathogenesis of STC. Songlin Wan, Weicheng Liu, Cuiping Tian, Xianghai Ren, Zhao Ding, Qun Qian, Congqing Jiang, and Yunhua Wu Copyright © 2016 Songlin Wan et al. All rights reserved. 
Discovery of Azurin-Like Anticancer Bacteriocins from Human Gut Microbiome through Homology Modeling and Molecular Docking against the Tumor Suppressor p53 Sat, 30 Apr 2016 13:22:47 +0000 Azurin from Pseudomonas aeruginosa is a known anticancer bacteriocin, which can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long term residence in the human body can produce azurin-like bacteriocins as a weapon against the invasion of cancers. In our previous work, putative bacteriocins were screened from the complete genomes of 66 dominant bacterial species in the human gut microbiota and subsequently characterized by subjecting them to functional annotation algorithms with azurin as control. We qualitatively predicted 14 putative bacteriocins that possessed functional properties very similar to those of azurin. In this work, we perform a number of quantitative and structure-based analyses including hydrophobic percentage calculation, structural modeling, and molecular docking study of bacteriocins of interest against protein p53, a cancer target. Finally, we have identified 8 putative bacteriocins that bind p53 in the same manner as p28-azurin and azurin, of which 3 peptides (p1seq16, p2seq20, and p3seq24) are shared with our previous study and 5 (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) are reported for the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic cell lines. Chuong Nguyen and Van Duy Nguyen Copyright © 2016 Chuong Nguyen and Van Duy Nguyen. All rights reserved.
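The hydrophobic percentage calculation mentioned in the bacteriocin abstract above can be sketched in a few lines. The abstract does not state which residues were counted as hydrophobic, so the residue set below (A, V, L, I, M, F, W, C) is an assumption, as are the function name and the example peptide.

```python
# Assumed hydrophobic residue set; the abstract does not specify one.
HYDROPHOBIC = frozenset("AVLIMFWC")


def hydrophobic_percentage(sequence):
    """Return the percentage of residues in `sequence` that are hydrophobic."""
    residues = sequence.strip().upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    count = sum(1 for residue in residues if residue in HYDROPHOBIC)
    return 100.0 * count / len(residues)


# Hypothetical peptide used only for illustration.
print(hydrophobic_percentage("AVLK"))
```

Other hydrophobicity definitions (for example, scale-based averages such as Kyte-Doolittle) would give different values, so the residue set should be matched to whatever the study being reproduced actually used.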
Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier Sun, 24 Apr 2016 07:09:15 +0000 Apoptosis proteins play a key role in maintaining the stability of organism; the functions of apoptosis proteins are related to their subcellular locations which are used to understand the mechanism of programmed cell death. In this paper, we utilize GO annotation information of apoptosis proteins and their homologous proteins retrieved from GOA database to formulate feature vectors and then combine the distance weighted KNN classification algorithm with them to solve the data imbalance problem existing in CL317 data set to predict subcellular locations of apoptosis proteins. It is found that the number of homologous proteins can affect the overall prediction accuracy. Under the optimal number of homologous proteins, the overall prediction accuracy of our method on CL317 data set reaches 96.8% by Jackknife test. Compared with other existing methods, it shows that our proposed method is very effective and better than others for predicting subcellular localization of apoptosis proteins. Xiao Wang, Hui Li, Qiuwen Zhang, and Rong Wang Copyright © 2016 Xiao Wang et al. All rights reserved. Predicted 3D Model of the Rabies Virus Glycoprotein Trimer Sun, 24 Apr 2016 06:03:45 +0000 The RABVG ectodomain is a homotrimer, and trimers are often called spikes. They are responsible for the attachment of the virus through the interaction with nicotinic acetylcholine receptors, neural cell adhesion molecule (NCAM), and the p75 neurotrophin receptor (p75NTR). This makes them relevant in viral pathogenesis. The antigenic structure differs significantly between the trimers and monomers. 
Surfaces rich in hydrophobic amino acids are important for trimer stabilization, in which the C-terminal of the ectodomain plays an important role; to understand these interactions between the G proteins, a mechanistic study of their functions was performed with a molecular model of G protein in its trimeric form. This verified its 3D conformation. The molecular modeling of G protein was performed with the I-TASSER server and evaluated with a Ramachandran plot and the ERRAT program, which gave 84.64% of the residues in the favorable regions and an overall quality factor of 89.9%, respectively. The molecular dynamics simulations were carried out on the RABVG trimer at 310 K. From these theoretical studies, we retrieved the RMSD values from Cα atoms to assess stability. A preliminary model of the rabies virus G protein, stable at 12 ns in molecular dynamics, was obtained. Bastida-González Fernando, Celaya-Trejo Yersin, Correa-Basurto José, and Zárate-Segura Paola Copyright © 2016 Bastida-González Fernando et al. All rights reserved. A Comprehensive Curation Shows the Dynamic Evolutionary Patterns of Prokaryotic CRISPRs Mon, 18 Apr 2016 13:31:39 +0000 Motivation. Clustered regularly interspaced short palindromic repeat (CRISPR) is a genetic element with active regulation roles for foreign invasive genes in the prokaryotic genomes and has been engineered to work with the CRISPR-associated sequence (Cas) gene Cas9 as one of the modern genome editing technologies. Due to inconsistent definitions, the existing CRISPR detection programs seem to have missed some weak CRISPR signals. Results. This study manually curates all the currently annotated CRISPR elements in the prokaryotic genomes and proposes 95 updates to the annotations. A new definition is proposed to cover all the CRISPRs. The comprehensive comparison of CRISPR numbers at the taxonomic levels of both domain and genus shows high variation for closely related species even in the same genus.
The detailed investigation of how CRISPRs are evolutionarily manipulated in the 8 completely sequenced species in the genus Thermoanaerobacter demonstrates that transposons act as a frequent tool for splitting long CRISPRs into shorter ones along a long evolutionary history. Guoqin Mai, Ruiquan Ge, Guoquan Sun, Qinghan Meng, and Fengfeng Zhou Copyright © 2016 Guoqin Mai et al. All rights reserved. The Occurrence of Genetic Alterations during the Progression of Breast Carcinoma Thu, 14 Apr 2016 09:28:52 +0000 The interrelationship among genetic variations between the developing process of carcinoma and the order of occurrence has not been completely understood. Interpreting the mechanisms of copy number variation (CNV) is absolutely necessary for understanding the etiology of genetic disorders. Oncogenetic tree is a special phylogenetic tree inferential pictorial representation of oncogenesis. In our present study, we constructed oncogenetic tree to imitate the occurrence of genetic and cytogenetic alterations in human breast cancer. The oncogenetic tree model was built on CNV of ErbB2, AKT2, KRAS, PIK3CA, PTEN, and CCND1 genes in 963 cases of tumors with sequencing and CNA data of human breast cancer from TCGA. Results from the oncogenetic tree model indicate that ErbB2 copy number variation is the frequent early event of human breast cancer. The oncogenetic tree model based on the phylogenetic tree is a type of mathematical model that may eventually provide a better way to understand the process of oncogenesis. Xiao-Chen Li, Chenglin Liu, Tao Huang, and Yang Zhong Copyright © 2016 Xiao-Chen Li et al. All rights reserved. Methylation Status of SP1 Sites within miR-23a-27a-24-2 Promoter Region Influences Laryngeal Cancer Cell Proliferation and Apoptosis Wed, 23 Mar 2016 12:17:47 +0000 DNA methylation plays critical roles in regulation of microRNA expression and function. 
miR-23a-27a-24-2 cluster has various functions and aberrant expression of the cluster is a common event in many cancers. However, whether DNA methylation influences the cluster expression and function is not reported. Here we found a CG-rich region spanning two SP1 sites in the cluster promoter region. The SP1 sites in the cluster were demethylated and methylated in Hep2 cells and HEK293 cells, respectively. Meanwhile, the cluster was significantly upregulated and downregulated in Hep2 cells and HEK293 cells, respectively. The SP1 sites were remethylated and the cluster was significantly downregulated in Hep2 cells into which methyl donor, S-adenosyl-L-methionine, was introduced. Moreover, S-adenosyl-L-methionine significantly increased Hep2 cell viability and repressed Hep2 cell early apoptosis. We also found that construct with two SP1 sites had highest luciferase activity and SP1 specifically bound the gene cluster promoter in vitro. We conclude that demethylated SP1 sites in miR-23a-27a-24-2 cluster upregulate the cluster expression, leading to proliferation promotion and early apoptosis inhibition in laryngeal cancer cells. Ye Wang, Zhao-Xiong Zhang, Sheng Chen, Guang-Bin Qiu, Zhen-Ming Xu, and Wei-Neng Fu Copyright © 2016 Ye Wang et al. All rights reserved. -Index for Differentiating Complex Dynamic Traits Tue, 15 Mar 2016 16:43:19 +0000 While it is a daunting challenge in current biology to understand how the underlying network of genes regulates complex dynamic traits, functional mapping, a tool for mapping quantitative trait loci (QTLs) and single nucleotide polymorphisms (SNPs), has been applied in a variety of cases to tackle this challenge. Though useful and powerful, functional mapping performs well only when one or more model parameters are clearly responsible for the developmental trajectory, typically being a logistic curve. Moreover, it does not work when the curves are more complex than that, especially when they are not monotonic. 
To overcome this inadaptability, we therefore propose a mathematical-biological concept and measurement, -index (earliness-index), which cumulatively measures the earliness degree to which a variable (or a dynamic trait) increases or decreases its value. Theoretical proofs and simulation studies show that -index is more general than functional mapping and can be applied to any complex dynamic traits, including those with logistic curves and those with nonmonotonic curves. Meanwhile, -index vector is proposed as well to capture more subtle differences of developmental patterns. Jiandong Qi, Jianfeng Sun, and Jianxin Wang Copyright © 2016 Jiandong Qi et al. All rights reserved. Using Small RNA Deep Sequencing Data to Detect Human Viruses Tue, 15 Mar 2016 13:33:25 +0000 Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without the necessity to have any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, it is still controversial to use sRNA-seq in the detection of mammalian or human viruses. In this study, we used 931 sRNA-seq runs of data from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly from some clinical samples. Six viruses including HPV-18, HBV, HCV, HIV-1, SMRV, and EBV were detected from 36 runs of data. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without the HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest the sRNA-seq can be used to detect viruses in mammals and humans. Fang Wang, Yu Sun, Jishou Ruan, Rui Chen, Xin Chen, Chengjie Chen, Jan F. Kreuze, ZhangJun Fei, Xiao Zhu, and Shan Gao Copyright © 2016 Fang Wang et al. All rights reserved. 
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest Tue, 15 Mar 2016 12:13:52 +0000 Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite. Hamid D. Ismail, Ahoi Jones, Jung H. Kim, Robert H. Newman, and Dukka B. KC Copyright © 2016 Hamid D. Ismail et al. All rights reserved. SNP Mining in Functional Genes from Nonmodel Species by Next-Generation Sequencing: A Case of Flowering, Pre-Harvest Sprouting, and Dehydration Resistant Genes in Wheat Mon, 14 Mar 2016 10:48:25 +0000 As many nonmodel plants lack genomic sequences, the combination of molecular technologies and the next generation sequencing (NGS) platform has led to a new approach to study the genetic variations of these plants. Software such as GATK, SOAPsnp, and samtools is often used to process NGS data. In this study, BLAST was applied to call SNPs from the mixed sequence data of 16 functional genes of polyploid wheat. In total, 1.2 million reads were obtained, with an average of 7500 reads per gene. To get accurate information, 390,992 read pairs were successfully assembled before aligning to those functional genes. Standalone BLAST tools were used to map the assembled sequences to their respective functional genes.
Polynomial fitting was applied to find a suitable minor allele frequency (MAF) threshold, 6%, for the assembled reads of each functional gene. SNP accuracy from assembled reads, pretrimmed reads, and original reads was compared, which showed that SNPs mined from the assembled reads were more reliable than the others. It was also demonstrated that sequencing mixed samples by NGS and then analyzing the reads with BLAST was an effective, low-cost, and accurate way to mine SNPs for nonmodel species. Assembled reads and a polynomial-fitting threshold are recommended for more accurate SNP calling. Zhong-Xu Chen, Mei Deng, and Ji-Rui Wang Copyright © 2016 Zhong-Xu Chen et al. All rights reserved. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues Thu, 10 Mar 2016 08:27:20 +0000 Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, where some sulfate groups are added to the tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one of the prerequisites is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method was presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicated that our newly proposed method could be effective in identifying the important protein posttranslational modifications and that the feature selection scheme would be powerful in research on predicting protein functional residues.
Song Guo, Chunhua Liu, Peng Zhou, and Yanling Li Copyright © 2016 Song Guo et al. All rights reserved. Treating Diabetes Mellitus: Pharmacophore Based Designing of Potential Drugs from Gymnema sylvestre against Insulin Receptor Protein Sun, 28 Feb 2016 14:16:49 +0000 Diabetes mellitus (DM) is one of the most prevalent metabolic disorders which can affect the quality of life severely. Injectable insulin is currently being used to treat DM which is mainly associated with patient inconvenience. Small molecules that can act as insulin receptor (IR) agonist would be better alternatives to insulin injection. Herein, ten bioactive small compounds derived from Gymnema sylvestre (G. sylvestre) were chosen to determine their IR binding affinity and ADMET properties using a combined approach of molecular docking study and computational pharmacokinetic elucidation. Designing structural analogues were also performed for the compounds associated with toxicity and less IR affinity. Among the ten parent compounds, six were found to have significant pharmacokinetic properties with considerable binding affinity towards IR while four compounds were associated with toxicity and less IR affinity. Among the forty structural analogues, four compounds demonstrated considerably increased binding affinity towards IR and less toxicity compared with parent compounds. Finally, molecular interaction analysis revealed that six parent compounds and four analogues interact with the active site amino acids of IR. So this study would be a way to identify new therapeutics and alternatives to insulin for diabetic patients. Mohammad Uzzal Hossain, Md. Arif Khan, S. M. Rakib-Uz-Zaman, Mohammad Tuhin Ali, Md. Saidul Islam, Chaman Ara Keya, and Md. Salimullah Copyright © 2016 Mohammad Uzzal Hossain et al. All rights reserved. 
Identification of Deleterious Mutations in Myostatin Gene of Rohu Carp (Labeo rohita) Using Modeling and Molecular Dynamic Simulation Approaches Thu, 25 Feb 2016 15:28:07 +0000 Myostatin (MSTN) is a known negative growth regulator of skeletal muscle. Mutated myostatin produces a double-muscled phenotype, which has positive significance for farmed animals. However, adequate information is not available for the teleosts, including farmed rohu carp, Labeo rohita. In the absence of experimental evidence, computational algorithms were utilized to predict the impact of point mutations in rohu myostatin, especially on its structural and functional relationships. Four mutations were generated at different positions (p.D76A, p.Q204P, p.C312Y, and p.D313A) of the MSTN protein of rohu. The impact of each mutation was analyzed using SIFT, I-Mutant 2.0, PANTHER, and PROVEAN, wherein two substitutions (p.D76A and p.Q204P) were predicted as deleterious. The comparative structural analysis of each mutant protein with the native was explored using 3D modeling as well as molecular-dynamic simulation techniques. The simulation showed altered dynamic behaviors concerning RMSD and RMSF, for either p.D76A or p.Q204P substitution, when compared with the native counterpart. Interestingly, the two incorporated mutations imposed a significant negative impact on protein structure and stability. The present study provides first-hand information for identifying possible amino acids where mutations could be incorporated into the MSTN gene of rohu carp and other carps for undertaking further in vivo studies. Kiran Dashrath Rasal, Vemulawada Chakrapani, Swagat Kumar Patra, Shibani D. Mohapatra, Swapnarani Nayak, Sasmita Jena, Jitendra Kumar Sundaray, Pallipuram Jayasankar, and Hirak Kumar Barman Copyright © 2016 Kiran Dashrath Rasal et al. All rights reserved. VGSC: A Web-Based Vector Graph Toolkit of Genome Synteny and Collinearity Wed, 24 Feb 2016 10:02:32 +0000 Background.
In order to understand the colocalization of genetic loci amongst species, synteny and collinearity analysis is a frequent task in comparative genomics research. However, many analysis software packages are not effective in visualizing results. Problems include lack of graphic visualization, simple representation, or inextensible format of outputs. Moreover, higher throughput sequencing technology requires higher resolution image output. Implementation. To fill this gap, this paper publishes VGSC, the Vector Graph toolkit of genome Synteny and Collinearity, and its online service, to visualize synteny and collinearity in common graphical formats, including both raster (JPEG, Bitmap, and PNG) and vector graphics (SVG, EPS, and PDF). Result. Users can upload sequence alignments from BLAST and collinearity relationships from synteny analysis tools. The website can generate the vector or raster graphical results automatically. We also provide a Java-based bytecode binary to enable command-line execution. Yiqing Xu, Changwei Bi, Guoxin Wu, Suyun Wei, Xiaogang Dai, Tongming Yin, and Ning Ye Copyright © 2016 Yiqing Xu et al. All rights reserved. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification Sun, 14 Feb 2016 14:02:26 +0000 Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets.
Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data. Yin Wang, Rudong Li, Yuhua Zhou, Zongxin Ling, Xiaokui Guo, Lu Xie, and Lei Liu Copyright © 2016 Yin Wang et al. All rights reserved. Advancements in RNASeqGUI towards a Reproducible Analysis of RNA-Seq Experiments Wed, 10 Feb 2016 13:50:15 +0000 We present the advancements and novelties recently introduced in RNASeqGUI, a graphical user interface that helps biologists to handle and analyse large data collected in RNA-Seq experiments. This work focuses on the concept of reproducible research and shows how it has been incorporated in RNASeqGUI to provide reproducible (computational) results. The novel version of RNASeqGUI combines graphical interfaces with tools for reproducible research, such as literate statistical programming, human readable report, parallel executions, caching, and interactive and web-explorable tables of results. These features allow the user to analyse big datasets in a fast, efficient, and reproducible way. Moreover, this paper represents a proof of concept, showing a simple way to develop computational tools for Life Science in the spirit of reproducible research. Francesco Russo, Dario Righelli, and Claudia Angelini Copyright © 2016 Francesco Russo et al. All rights reserved. A Prediction Model for Membrane Proteins Using Moments Based Features Sun, 07 Feb 2016 15:42:21 +0000 The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. 
The proteins that are associated with this lipid based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to the scientists in pharmaceutical organizations, these membrane proteins perform key task in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane protein without the experimental use of mass spectrometry. Statistical moments were used to extract features and furthermore a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. Ahmad Hassan Butt, Sher Afzal Khan, Hamza Jamil, Nouman Rasool, and Yaser Daanial Khan Copyright © 2016 Ahmad Hassan Butt et al. All rights reserved. PIPINO: A Software Package to Facilitate the Identification of Protein-Protein Interactions from Affinity Purification Mass Spectrometry Data Sun, 07 Feb 2016 14:17:44 +0000 The functionality of most proteins is regulated by protein-protein interactions. Hence, the comprehensive characterization of the interactome is the next milestone on the path to understand the biochemistry of the cell. A powerful method to detect protein-protein interactions is a combination of coimmunoprecipitation or affinity purification with quantitative mass spectrometry. Nevertheless, both methods tend to precipitate a high number of background proteins due to nonspecific interactions. To address this challenge the software Protein-Protein-Interaction-Optimizer (PIPINO) was developed to perform an automated data analysis, to facilitate the selection of bona fide binding partners, and to compare the dynamic of interaction networks. 
In this study we investigated the STAT1 interaction network and its activation dependent dynamics. Stable isotope labeling by amino acids in cell culture (SILAC) was applied to analyze the STAT1 interactome after streptavidin pull-down of biotagged STAT1 from human embryonic kidney 293T cells with and without activation. Starting from more than 2,000 captured proteins 30 potential STAT1 interaction partners were extracted. Interestingly, more than 50% of these were already reported or predicted to bind STAT1. Furthermore, 16 proteins were found to affect the binding behavior depending on STAT1 phosphorylation such as STAT3 or the importin subunits alpha 1 and alpha 6. Stefan Kalkhof, Stefan Schildbach, Conny Blumert, Friedemann Horn, Martin von Bergen, and Dirk Labudde Copyright © 2016 Stefan Kalkhof et al. All rights reserved. Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm Wed, 03 Feb 2016 06:40:35 +0000 The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, aptamer-compound interaction attracts increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment. Thus, there is an urgent need to design effective computational methods for searching effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. 
Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. Simultaneously, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon request. ShaoPeng Wang, Yu-Hang Zhang, Jing Lu, Weiren Cui, Jerry Hu, and Yu-Dong Cai Copyright © 2016 ShaoPeng Wang et al. All rights reserved. ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition Thu, 28 Jan 2016 06:46:23 +0000 Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of classification performance and the number of incorrectly segmented entities.
Abbas Akkasi, Ekrem Varoğlu, and Nazife Dimililer Copyright © 2016 Abbas Akkasi et al. All rights reserved. Inhibition of DNA Topoisomerase Type IIα (TOP2A) by Mitoxantrone and Its Halogenated Derivatives: A Combined Density Functional and Molecular Docking Study Wed, 27 Jan 2016 09:17:56 +0000 In this study, mitoxantrone and its halogenated derivatives have been designed by density functional theory (DFT) to explore their structural and thermodynamical properties. The performance of these drugs was also evaluated to inhibit DNA topoisomerase type IIα (TOP2A) by molecular docking calculation. Noncovalent interactions play significant role in improving the performance of halogenated drugs. The combined quantum and molecular mechanics calculations revealed that CF3 containing drug shows better preference in inhibiting the TOP2A compared to other modified drugs. Md. Abu Saleh, Md. Solayman, Mohammad Mazharol Hoque, Mohammad A. K. Khan, Mohammed G. Sarwar, and Mohammad A. Halim Copyright © 2016 Md. Abu Saleh et al. All rights reserved. Segmenting Brain Tissues from Chinese Visible Human Dataset by Deep-Learned Features with Stacked Autoencoder Tue, 26 Jan 2016 13:58:34 +0000 Cryosection brain images in Chinese Visible Human (CVH) dataset contain rich anatomical structure information of tissues because of its high resolution (e.g., 0.167 mm per pixel). Fast and accurate segmentation of these images into white matter, gray matter, and cerebrospinal fluid plays a critical role in analyzing and measuring the anatomical structures of human brain. However, most existing automated segmentation methods are designed for computed tomography or magnetic resonance imaging data, and they may not be applicable for cryosection images due to the imaging difference. In this paper, we propose a supervised learning-based CVH brain tissues segmentation method that uses stacked autoencoder (SAE) to automatically learn the deep feature representations. 
Specifically, our model includes two successive parts where two three-layer SAEs take image patches as input to learn the complex anatomical feature representation, and then these features are sent to Softmax classifier for inferring the labels. Experimental results validated the effectiveness of our method and showed that it outperformed four other classical brain tissue detection strategies. Furthermore, we reconstructed three-dimensional surfaces of these tissues, which show their potential in exploring the high-resolution anatomical structures of human brain. Guangjun Zhao, Xuchu Wang, Yanmin Niu, Liwen Tan, and Shao-Xiang Zhang Copyright © 2016 Guangjun Zhao et al. All rights reserved. Automated Cell Selection Using Support Vector Machine for Application to Spectral Nanocytology Tue, 19 Jan 2016 16:21:10 +0000 Partial wave spectroscopy (PWS) enables quantification of the statistical properties of cell structures at the nanoscale, which has been used to identify patients harboring premalignant tumors by interrogating easily accessible sites distant from location of the lesion. Due to its high sensitivity, cells that are well preserved need to be selected from the smear images for further analysis. To date, such cell selection has been done manually. This is time-consuming, is labor-intensive, is vulnerable to bias, and has considerable inter- and intraoperator variability. In this study, we developed a classification scheme to identify and remove the corrupted cells or debris that are of no diagnostic value from raw smear images. The slide of smear sample is digitized by acquiring and stitching low-magnification transmission. Objects are then extracted from these images through segmentation algorithms. A training-set is created by manually classifying objects as suitable or unsuitable. A feature-set is created by quantifying a large number of features for each object. 
The training-set and feature-set are used to train a selection algorithm using Support Vector Machine (SVM) classifiers. We show that the selection algorithm achieves an error rate of 93% with a sensitivity of 95%. Qin Miao, Justin Derbas, Aya Eid, Hariharan Subramanian, and Vadim Backman Copyright © 2016 Qin Miao et al. All rights reserved. Identification of Novel RD1 Antigens and Their Combinations for Diagnosis of Sputum Smear−/Culture+ TB Patients Mon, 18 Jan 2016 13:39:48 +0000 Rapid and accurate diagnosis of pulmonary tuberculosis (PTB) is an unresolved problem worldwide, especially for sputum smear− (S−) cases. In this study, five antigen genes including Rv3871, Rv3874, Rv3875, Rv3876, and Rv3879 were cloned from Mycobacterium tuberculosis (Mtb) RD1 and overexpressed to generate antigen fragments. These antigens and their combinations were investigated for PTB serodiagnosis. 298 serum samples were collected from active PTB patients, including 117 sputum smear+ (S+) and sputum culture+ (C+) cases, 101 S−/C+ cases, and 80 S−/C− cases. The serum IgG levels of the five antigens were measured by ELISA. Based on IgG levels, the sensitivity/specificity of Rv3871, Rv3874, Rv3875, Rv3876, and Rv3879 for PTB detection was 81.21%/74.74%, 63.09%/94.78%, 32.21%/87.37%, 62.42%/85.26%, and 83.56%/83.16%, respectively. Furthermore, the optimal result for PTB diagnosis was achieved by combining antigens Rv3871, Rv3876, and Rv3879. In addition, the IgG levels of Rv3871, Rv3876, and Rv3879 were found to be higher in S−/C+ PTB patients than in other PTB populations. More importantly, combination of the three antigens demonstrated superior diagnostic performance for both S−/C+ and S−/C− PTB. In conclusion, the combination of Rv3871, Rv3876, and Rv3879 induced higher IgG response in sputum S−/C+ PTB patients and represents a promising biomarker combination for diagnosing of PTB. 
Zhiqiang Liu, Shuang Qie, Lili Li, Bingshui Xiu, Xiqin Yang, Zhenhua Dai, Xuhui Zhang, Cuimi Duan, Haiping Que, Ping Zhao, Heather Johnson, Heqiu Zhang, and Xiaoyan Feng Copyright © 2016 Zhiqiang Liu et al. All rights reserved. The Subcellular Localization and Functional Analysis of Fibrillarin2, a Nucleolar Protein in Nicotiana benthamiana Sun, 17 Jan 2016 10:20:02 +0000 Nucleolar proteins play important roles in plant cytology, growth, and development. Fibrillarin2 is a nucleolar protein of Nicotiana benthamiana (N. benthamiana). Its cDNA was amplified by RT-PCR and inserted into expression vector pEarley101 labeled with yellow fluorescent protein (YFP). The fusion protein was localized in the nucleolus and Cajal body of leaf epidermal cells of N. benthamiana. The N. benthamiana fibrillarin2 (NbFib2) protein has three functional domains (i.e., glycine and arginine rich domain, RNA-binding domain, and α-helical domain) and a nuclear localization signal (NLS) at the C-terminus. The protein 3D structure analysis predicted that NbFib2 is an α/β protein. In addition, the virus induced gene silencing (VIGS) approach was used to determine the function of NbFib2. Our results showed that symptoms including growth retardation, organ deformation, chlorosis, and necrosis appeared in NbFib2-silenced N. benthamiana. Luping Zheng, Jinai Yao, Fangluan Gao, Lin Chen, Chao Zhang, Lingli Lian, Liyan Xie, Zujian Wu, and Lianhui Xie Copyright © 2016 Luping Zheng et al. All rights reserved. Modification of the Sweetness and Stability of Sweet-Tasting Protein Monellin by Gene Mutation and Protein Engineering Sun, 10 Jan 2016 10:31:15 +0000 The natural sweet protein monellin has high sweetness and low calorie content, suggesting its potential in food applications. However, due to its low heat and acid resistance, the application of monellin is limited.
In this study, we show that the thermostability of monellin can be improved with no sweetness decrease by means of sequence, structure analysis, and site-directed mutagenesis. We analyzed residues located in the α-helix as well as an ionizable residue C41. Of the mutants investigated, the effects of E23A and C41A mutants were most remarkable. The former displayed significantly improved thermal stability, while its sweetness was not changed. The mutated protein was stable after 30 min incubation at 85°C. The latter showed increased sweetness and slight improvement of thermostability. Furthermore, we found that most mutants enhancing the thermostability of the protein were distributed at the two ends of α-helix. Molecular biophysics analysis revealed that the state of buried ionizable residues may account for the modulated properties of mutated proteins. Our results prove that the properties of sweet protein monellin can be modified by means of bioinformatics analysis, gene manipulation, and protein modification, highlighting the possibility of designing novel effective sweet proteins based on structure-function relationships. Qiulei Liu, Lei Li, Liu Yang, Tianming Liu, Chenggu Cai, and Bo Liu Copyright © 2016 Qiulei Liu et al. All rights reserved. Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model Tue, 29 Dec 2015 14:36:27 +0000 Recent study shows that long noncoding RNAs (lncRNAs) are participating in diverse biological processes and complex diseases. However, at present the functions of lncRNAs are still rarely known. In this study, we propose a network-based computational method, which is called lncRNA-protein interaction prediction based on Heterogeneous Network Model (LPIHN), to predict the potential lncRNA-protein interactions. First, we construct a heterogeneous network by integrating the lncRNA-lncRNA similarity network, lncRNA-protein interaction network, and protein-protein interaction (PPI) network. 
Then, a random walk with restart is implemented on the heterogeneous network to infer novel lncRNA-protein interactions. The leave-one-out cross validation test shows that our approach can achieve an AUC value of 96.0%. Some lncRNA-protein interactions predicted by our method have been confirmed in recent research or databases, indicating the efficiency of LPIHN in predicting novel lncRNA-protein interactions. Ao Li, Mengqu Ge, Yao Zhang, Chen Peng, and Minghui Wang Copyright © 2015 Ao Li et al. All rights reserved. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords Thu, 10 Dec 2015 06:31:19 +0000 For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword that is directly related to interaction, such as “bind” or “interact”, plays an important role in training classifiers. We call such a keyword, which affects the capability of the classifier, a dominant keyword. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is initially assumed to be a dominant keyword. Then the classifiers are separately trained on the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers. The assumption is updated according to the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword.
Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction. Shun Koyabu, Thi Thanh Thuy Phan, and Takenao Ohkawa Copyright © 2015 Shun Koyabu et al. All rights reserved. Cofunctional Subpathways Were Regulated by Transcription Factor with Common Motif, Common Family, or Common Tissue Tue, 24 Nov 2015 14:10:30 +0000 Dissecting the characteristics of the transcription factor (TF) regulatory subpathway is helpful for understanding the TF underlying regulatory function in complex biological systems. To gain insight into the influence of TFs on their regulatory subpathways, we constructed a global TF-subpathways network (TSN) to analyze systematically the regulatory effect of common-motif, common-family, or common-tissue TFs on subpathways. We performed cluster analysis to show that the common-motif, common-family, or common-tissue TFs that regulated the same pathway classes tended to cluster together and contribute to the same biological function that led to disease initiation and progression. We analyzed the Jaccard coefficient to show that the functional consistency of subpathways regulated by the TF pairs with common motif, common family, or common tissue was significantly greater than the random TF pairs at the subpathway level, pathway level, and pathway class level. For example, HNF4A (hepatocyte nuclear factor 4, alpha) and NR1I3 (nuclear receptor subfamily 1, group I, member 3) were a pair of TFs with common motif, common family, and common tissue. They were involved in drug metabolism pathways and were liver-specific factors required for physiological transcription. In short, we inferred that the cofunctional subpathways were regulated by common-motif, common-family, or common-tissue TFs. Fei Su, Desi Shang, Yanjun Xu, Li Feng, Haixiu Yang, Baoquan Liu, Shengyang Su, Lina Chen, and Xia Li Copyright © 2015 Fei Su et al. All rights reserved. 
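The Jaccard coefficient mentioned in the abstract above measures how much two sets overlap: the size of their intersection divided by the size of their union. The following is a minimal sketch of that calculation, assuming each TF is represented by the set of subpathway identifiers it regulates. The function name and the identifiers are hypothetical placeholders, not data or code from the study.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical subpathway IDs regulated by two TFs (illustrative only).
tf_a = {"sp1", "sp2", "sp3", "sp4"}
tf_b = {"sp3", "sp4", "sp5"}

# Two shared subpathways out of five distinct ones.
score = jaccard(tf_a, tf_b)
```

A score near 1 would indicate that the two TFs regulate almost the same subpathways; a score near 0 would indicate little functional overlap.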
MatPred: Computational Identification of Mature MicroRNAs within Novel Pre-MicroRNAs Mon, 23 Nov 2015 09:23:36 +0000 Background. MicroRNAs (miRNAs) are short noncoding RNAs integral for regulating gene expression at the posttranscriptional level. However, experimental methods often fall short in finding miRNAs expressed at low levels or in specific tissues. While several computational methods have been developed for predicting the localization of mature miRNAs within the precursor transcript, the prediction accuracy requires significant improvement. Methodology/Principal Findings. Here, we present MatPred, which predicts mature miRNA candidates within novel pre-miRNA transcripts. In addition to the relative locus of the mature miRNA within the pre-miRNA hairpin loop and minimum free energy, we innovatively integrated features that describe the nucleotide-specific RNA secondary structure characteristics. In total, 94 features were extracted from the mature miRNA loci and flanking regions. The model was trained based on a radial basis function kernel/support vector machine (RBF/SVM). Our method can predict precise locations of mature miRNAs, as affirmed by experimentally verified human pre-miRNAs or pre-miRNAs candidates, thus achieving a significant advantage over existing methods. Conclusions. MatPred is a highly effective method for identifying mature miRNAs within novel pre-miRNA transcripts. Our model significantly outperformed three other widely used existing methods. Such processing prediction methods may provide important insight into miRNA biogenesis. Jin Li, Ying Wang, Lei Wang, Weixing Feng, Kuan Luan, Xuefeng Dai, Chengzhen Xu, Xianglian Meng, Qiushi Zhang, and Hong Liang Copyright © 2015 Jin Li et al. All rights reserved. Corrigendum to “Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation” Thu, 12 Nov 2015 09:43:55 +0000 Young-Seok Choi and Xiaofeng Jia Copyright © 2015 Young-Seok Choi and Xiaofeng Jia. 
All rights reserved. Improved Pre-miRNA Classification by Reducing the Effect of Class Imbalance Tue, 10 Nov 2015 13:09:26 +0000 MicroRNAs (miRNAs) play important roles in the diverse biological processes of animals and plants. Although the prediction methods based on machine learning can identify nonhomologous and species-specific miRNAs, they suffered from severe class imbalance on real and pseudo pre-miRNAs. We propose a pre-miRNA classification method based on cost-sensitive ensemble learning and refer to it as MiRNAClassify. Through a series of iterations, the information of all the positive and negative samples is completely exploited. In each iteration, a new classification instance is trained by the equal number of positive and negative samples. In this way, the negative effect of class imbalance is efficiently relieved. The new instance primarily focuses on those samples that are easy to be misclassified. In addition, the positive samples are assigned higher cost weight than the negative samples. MiRNAClassify is compared with several state-of-the-art methods and some well-known classification models by testing the datasets about human, animal, and plant. The result of cross validation indicates that MiRNAClassify significantly outperforms other methods and models. In addition, the newly added pre-miRNAs are used to further evaluate the ability of these methods to discover novel pre-miRNAs. MiRNAClassify still achieves consistently superior performance and can discover more pre-miRNAs. Yingli Zhong, Ping Xuan, Ke Han, Weiping Zhang, and Jianzhong Li Copyright © 2015 Yingli Zhong et al. All rights reserved. Comparative Genome and Network Centrality Analysis to Identify Drug Targets of Mycobacterium tuberculosis H37Rv Thu, 05 Nov 2015 13:16:24 +0000 Potential drug targets of Mycobacterium tuberculosis H37Rv were identified through systematically integrated comparative genome and network centrality analysis. 
The comparative analysis of the complete genome of Mycobacterium tuberculosis H37Rv against the Database of Essential Genes (DEG) yields a list of proteins which are essential for the growth and survival of the pathogen. Those proteins which are nonhomologous to human proteins were selected. The resulting proteins were then prioritized by using the four network centrality measures: degree, closeness, betweenness, and eigenvector. Proteins whose centrality value is close to the centre of gravity of the interactome network were proposed as a final list of potential drug targets for the pathogen. The use of an integrated approach is believed to increase the success of the drug target identification process. For the purpose of validation, selective comparisons have been made between the proposed targets and drug targets previously identified by various other methods. About half of these proteins have already been reported as potential drug targets. We believe that the identified proteins will be an important input to experimental study, which in turn could save a considerable amount of the time and cost of drug target discovery. Tilahun Melak and Sunita Gakkhar Copyright © 2015 Tilahun Melak and Sunita Gakkhar. All rights reserved. The Roles of miR-26, miR-29, and miR-203 in the Silencing of the Epigenetic Machinery during Melanocyte Transformation Wed, 04 Nov 2015 08:28:53 +0000 The epigenetic marks located throughout the genome exhibit great variation between normal and transformed cancer cells. While normal cells contain hypomethylated CpG islands near gene promoters and hypermethylated repetitive DNA, the opposite pattern is observed in cancer cells. Recently, it has been reported that alteration in the microenvironment of melanocyte cells, such as substrate adhesion blockade, results in the selection of anoikis-resistant cells, which have tumorigenic characteristics.
Melanoma cells obtained through this model show an altered epigenetic pattern, which represents one of the first events during the melanocytes malignant transformation. Because microRNAs are involved in controlling components of the epigenetic machinery, the aim of this work was to evaluate the potential association between the expression of miR-203, miR-26, and miR-29 family members and the genes Dnmt3a, Dnmt3b, Mecp2, and Ezh2 during cells transformation. Our results show that microRNAs and their validated or predicted targets are inversely expressed, indicating that these molecules are involved in epigenetic reprogramming. We also show that miR-203 downregulates Dnmt3b in mouse melanocyte cells. In addition, treatment with 5-aza-CdR promotes the expression of miR-26 and miR-29 in a nonmetastatic melanoma cell line. Considering the occurrence of CpG islands near the miR-26 and miR-29 promoters, these data suggest that they might be epigenetically regulated in cancer. Cláudia Regina Gasque Schoof, Alberto Izzotti, Miriam Galvonas Jasiulionis, and Luciana dos Reis Vasques Copyright © 2015 Cláudia Regina Gasque Schoof et al. All rights reserved. Mining for Candidate Genes Related to Pancreatic Cancer Using Protein-Protein Interactions and a Shortest Path Approach Tue, 03 Nov 2015 09:37:59 +0000 Pancreatic cancer (PC) is a highly malignant tumor derived from pancreas tissue and is one of the leading causes of death from cancer. Its molecular mechanism has been partially revealed by validating its oncogenes and tumor suppressor genes; however, the available data remain insufficient for medical workers to design effective treatments. Large-scale identification of PC-related genes can promote studies on PC. In this study, we propose a computational method for mining new candidate PC-related genes. 
A large network was constructed using protein-protein interaction information, and a shortest path approach was applied to mine new candidate genes based on validated PC-related genes. In addition, a permutation test was adopted to further select key candidate genes. Finally, for all discovered candidate genes, the likelihood that the genes are novel PC-related genes is discussed based on their currently known functions. Fei Yuan, Yu-Hang Zhang, Sibao Wan, ShaoPeng Wang, and Xiang-Yin Kong Copyright © 2015 Fei Yuan et al. All rights reserved. Big Data and Network Biology 2015 Sun, 01 Nov 2015 07:06:50 +0000 Shigehiko Kanaya, Md. Altaf-Ul-Amin, Samuel K. Kiboi, and Farit Mochamad Afendi Copyright © 2015 Shigehiko Kanaya et al. All rights reserved. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species Thu, 29 Oct 2015 13:34:56 +0000 Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. 
Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. Deborah Galpert, Sara del Río, Francisco Herrera, Evys Ancede-Gallardo, Agostinho Antunes, and Guillermin Agüero-Chapin Copyright © 2015 Deborah Galpert et al. All rights reserved. Frontiers in Integrative Genomics and Translational Bioinformatics Wed, 28 Oct 2015 13:36:26 +0000 Zhongming Zhao, Victor X. Jin, Yufei Huang, Chittibabu Guda, and Jianhua Ruan Copyright © 2015 Zhongming Zhao et al. All rights reserved. Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence Wed, 28 Oct 2015 07:26:10 +0000 Increasing demand for the knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction network. Although high-throughput technologies have generated considerable PPIs data for various organisms, it has inevitable drawbacks such as high cost, time consumption, and inherently high false positive rate. For this reason, computational methods are drawing more and more attention for predicting PPIs. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation by using discrete cosine transform (DCT) on substitution matrix representation (SMR) and from using weighted sparse representation based classifier (WSRC). When performing on the PPIs dataset of Yeast, Human, and H. 
pylori, we obtained excellent results with average accuracies as high as 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. The promising results obtained show that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which we used Yeast PPI samples as the training set to predict PPIs in five other species datasets. Yu-An Huang, Zhu-Hong You, Xin Gao, Leon Wong, and Lirong Wang Copyright © 2015 Yu-An Huang et al. All rights reserved. Systematic Analysis and Prediction of In Situ Cross Talk of O-GlcNAcylation and Phosphorylation Tue, 27 Oct 2015 06:51:33 +0000 Reversible posttranslational modification (PTM) plays a very important role in biological processes by changing the properties of proteins. As many proteins are multiply modified by PTMs, cross talk of PTMs is becoming an intriguing topic and is drawing much attention. Currently, considerable evidence suggests that PTMs work together to accomplish a specific biological function. However, both the general principles and the underlying mechanisms of PTM cross talk remain elusive. In this study, by using large-scale datasets we performed evolutionary conservation analysis, gene ontology enrichment, and motif extraction for proteins with cross talk of O-GlcNAcylation and phosphorylation cooccurring on the same residue. We found that proteins with in situ O-GlcNAc/Phos cross talk were significantly enriched in some specific gene ontology terms and no obvious evolutionary pressure was observed. Moreover, 3 functional motifs associated with O-GlcNAc/Phos sites were extracted. We further used sequence features and GO features to predict O-GlcNAc/Phos cross talk sites based on phosphorylated sites and O-GlcNAcylated sites separately, using an SVM model.
The AUC of classifier based on phosphorylated sites is 0.896 and the other classifier based on GlcNAcylated sites is 0.843. Both classifiers achieved a relatively better performance compared with other existing methods. Heming Yao, Ao Li, and Minghui Wang Copyright © 2015 Heming Yao et al. All rights reserved. JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method Mon, 26 Oct 2015 06:16:47 +0000 Different types of J-proteins perform distinct functions in chaperone processes and diseases development. Accurate identification of types of J-proteins will provide significant clues to reveal the mechanism of J-proteins and contribute to developing drugs for diseases. In this study, an ensemble predictor called JPPRED for J-protein prediction is proposed with hybrid features, including split amino acid composition (SAAC), pseudo amino acid composition (PseAAC), and position specific scoring matrix (PSSM). To deal with the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and undersampling technique are applied. The average sensitivity of JPPRED based on above-mentioned individual feature spaces lies in the range of 0.744–0.851, indicating the discriminative power of these features. In addition, JPPRED yields the highest average sensitivity of 0.875 using the hybrid feature spaces of SAAC, PseAAC, and PSSM. Compared to individual base classifiers, JPPRED obtains more balanced and better performance for each type of J-proteins. To evaluate the prediction performance objectively, JPPRED is compared with previous study. Encouragingly, JPPRED obtains balanced performance for each type of J-proteins, which is significantly superior to that of the existing method. It is anticipated that JPPRED can be a potential candidate for J-protein prediction. Lina Zhang, Chengjin Zhang, Rui Gao, and Runtao Yang Copyright © 2015 Lina Zhang et al. All rights reserved. 
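The JPPRED abstract above names the synthetic minority oversampling technique (SMOTE) as one way of handling the imbalanced benchmark dataset. The core idea of SMOTE is to create a synthetic minority sample by interpolating between a real minority sample and one of its k nearest minority-class neighbours. The following is a simplified sketch of that idea only, not the JPPRED implementation: the function name, parameters, and the toy two-dimensional points are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


# Toy minority-class feature vectors (illustrative only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=4)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing samples exactly.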
Identification of Gene Expression Pattern Related to Breast Cancer Survival Using Integrated TCGA Datasets and Genomic Tools Tue, 20 Oct 2015 14:15:40 +0000 Several large-scale human cancer genomics projects such as TCGA offer huge amounts of genomic and clinical data from which researchers can identify meaningful genomic alterations that intervene in the development and metastasis of tumors. A web-based TCGA data analysis platform called TCGA4U was developed in this study. TCGA4U provides a visualization solution for this study to illustrate the relationship of these genomic alterations with clinical data. A whole genome screening of the survival related gene expression patterns in breast cancer was performed. The list of genes that affect breast cancer patient survival was divided into two patterns. The gene list for each of these patterns was separately analyzed on DAVID. The result showed that mitochondrial ribosomes play a more crucial role in cancer development. We also report that breast cancer patients with a low HSPA2 expression level had shorter overall survival time. This differs markedly from findings on the HSPA2 expression pattern in other cancer types. TCGA4U provided a new perspective on the TCGA datasets. We believe it can inspire more biomedical researchers to study and explain the genomic alterations in cancer development and discover more targeted therapies to help more cancer patients. Zhenzhen Huang, Huilong Duan, and Haomin Li Copyright © 2015 Zhenzhen Huang et al. All rights reserved. Proteomic Study to Survey the CIGB-552 Antitumor Effect Tue, 20 Oct 2015 11:43:43 +0000 CIGB-552 is a cell-penetrating peptide that exerts an antitumor effect on cancer cells in vitro and in vivo. In the present work, the mechanism involved in such anticancer activity was studied using chemical proteomics and expression-based proteomics in cultured cancer cell lines. CIGB-552 interacts with at least 55 proteins, as determined by chemical proteomics.
A temporal differential proteomics analysis based on the iTRAQ quantification method was performed to identify CIGB-552 modulated proteins. The proteomic profile includes 72 differentially expressed proteins in response to CIGB-552 treatment. Proteins related to cell proliferation and apoptosis were identified by both approaches. In line with previous findings, proteomic data revealed that CIGB-552 triggers the inhibition of the NF-κB signaling pathway. Furthermore, proteins related to cell invasion were differentially modulated by CIGB-552 treatment, suggesting new potentialities of CIGB-552 as an anticancer agent. Overall, the current study contributes to a better understanding of the antitumor action mechanism of CIGB-552. Arielis Rodríguez-Ulloa, Jeovanis Gil, Yassel Ramos, Lilian Hernández-Álvarez, Lisandra Flores, Brizaida Oliva, Dayana García, Aniel Sánchez-Puente, Alexis Musacchio-Lasa, Jorge Fernández-de-Cossio, Gabriel Padrón, Luis J. González López, Vladimir Besada, and Maribel Guerra-Vallespí Copyright © 2015 Arielis Rodríguez-Ulloa et al. All rights reserved. Personal Verification/Identification via Analysis of the Peripheral ECG Leads: Influence of the Personal Health Status on the Accuracy Mon, 19 Oct 2015 14:11:32 +0000 Traditional means for identity validation (PIN codes, passwords), and physiological and behavioral biometric characteristics (fingerprint, iris, and speech) are susceptible to hacker attacks and/or falsification. This paper presents a method for person verification/identification based on correlation of present-to-previous limb ECG leads I and II, the first principal ECG component calculated from them, and linear and nonlinear combinations of these three signals. For the verification task, the one-to-one scenario is applied and threshold values for the three signals and their combinations are derived.
The identification task supposes one-to-many scenario and the tested subject is identified according to the maximal correlation with a previously recorded ECG in a database. The population based ECG-ILSA database of 540 patients (147 healthy subjects, 175 patients with cardiac diseases, and 218 with hypertension) has been considered. In addition a common reference PTB dataset (14 healthy individuals) with short time interval between the two acquisitions has been taken into account. The results on ECG-ILSA database were satisfactory with healthy people, and there was not a significant decrease in nonhealthy patients, demonstrating the robustness of the proposed method. With PTB database, the method provides an identification accuracy of 92.9% and a verification sensitivity and specificity of 100% and 89.9%. Irena Jekova and Giovanni Bortolan Copyright © 2015 Irena Jekova and Giovanni Bortolan. All rights reserved. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms Tue, 13 Oct 2015 13:54:53 +0000 The integration of ontologies builds knowledge structures which brings new understanding on existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, available works on ontology integration are largely heuristic without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. 
The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between gene ontology and National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms. Yang Xiang and Sarath Chandra Janga Copyright © 2015 Yang Xiang and Sarath Chandra Janga. All rights reserved. Predicting Drug-Target Interactions via Within-Score and Between-Score Mon, 12 Oct 2015 13:48:13 +0000 Network inference and local classification models have been shown to be useful in predicting new potential drug-target interactions (DTIs) for assisting in drug discovery or drug repositioning. The idea is to represent drugs, targets, and their interactions as a bipartite network or an adjacency matrix. However, existing methods have not yet adequately addressed several issues, such as the powerless inference in the case of isolated subnetworks, the biased classifiers derived from insufficient positive samples, the need to train a number of local classifiers, and the unavailable relationship between known DTIs and unapproved drug-target pairs (DTPs). Designing more effective approaches to address those issues is always desirable. In this paper, after presenting better drug similarities and target similarities, we characterize each DTP as a feature vector of within-scores and between-scores, which offers the following advantages: (1) a uniform vector for all types of DTPs, (2) only one global classifier with less bias benefiting from adequate positive samples, and (3) more importantly, the visualized relationship between known DTIs and unapproved DTPs. The effectiveness of our approach is finally demonstrated by comparison with other popular methods under cross validation and by predicting potential interactions for DTPs, validated against existing databases.
Jian-Yu Shi, Zun Liu, Hui Yu, and Yong-Jun Li Copyright © 2015 Jian-Yu Shi et al. All rights reserved. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection Mon, 12 Oct 2015 11:18:29 +0000 The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information. Xin Ma, Jing Guo, and Xiao Sun Copyright © 2015 Xin Ma et al. All rights reserved. RNAseq by Total RNA Library Identifies Additional RNAs Compared to Poly(A) RNA Library Mon, 12 Oct 2015 09:19:06 +0000 The most popular RNA library used for RNA sequencing is the poly(A) captured RNA library. This library captures RNA based on the presence of poly(A) tails at the 3′ end. Another type of RNA library for RNA sequencing is the total RNA library, which differs from the poly(A) library in capture method and price. The total RNA library costs more and its capture of RNA is not dependent on the presence of poly(A) tails.
In practice, only ribosomal RNAs and small RNAs are washed out in the total RNA library preparation. To evaluate the RNA detection ability of both RNA libraries, we designed a study using RNA sequencing data of the same two breast cancer cell lines from both RNA libraries. We found that the RNA expression values captured by both RNA libraries were highly correlated. However, the number of RNAs captured was significantly higher for the total RNA library. Furthermore, we identified several subsets of protein coding RNAs that were not captured efficiently by the poly(A) library. One of the most noticeable is the histone-encoding genes, which lack the poly(A) tail. Yan Guo, Shilin Zhao, Quanhu Sheng, Mingsheng Guo, Brian Lehmann, Jennifer Pietenpol, David C. Samuels, and Yu Shyr Copyright © 2015 Yan Guo et al. All rights reserved. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA Mon, 12 Oct 2015 08:57:17 +0000 A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and the fruit fly optimization algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of the support vector machine (SVM) and the classifier is established by the optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen for comparison. The experimental results show that the proposed method can improve the classifier performance and takes less time. Huiyan Jiang, Di Zhao, Ruiping Zheng, and Xiaoqi Ma Copyright © 2015 Huiyan Jiang et al. All rights reserved. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes Mon, 12 Oct 2015 08:53:14 +0000 Background.
In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons—codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics. Kashish Chetal and Sarath Chandra Janga Copyright © 2015 Kashish Chetal and Sarath Chandra Janga. All rights reserved. 
Coexpression Network Analysis of miRNA-142 Overexpression in Neuronal Cells Sun, 11 Oct 2015 14:01:19 +0000 MicroRNAs are small noncoding RNA molecules, which are differentially expressed in diverse biological processes and are also involved in the regulation of multiple genes. A number of sites in the 3′ untranslated regions (UTRs) of different mRNAs allow complementary binding of a microRNA, leading to their posttranscriptional regulation. The miRNA-142 is one of the microRNAs overexpressed in neurons and has been found to regulate the SIRT1 and MAOA genes. Differential analysis of gene expression data, which is focused on identifying up- or downregulated genes, ignores many relationships between genes affected by miRNA-142 overexpression in a cell. Thus, we applied a correlation network model to identify the coexpressed genes and to study the impact of miRNA-142 overexpression on this network. Combining multiple sources of knowledge is useful to infer meaningful relationships in systems biology. We applied the coexpression model to data obtained from wild type and miR-142-overexpressing neuronal cells and integrated miRNA seed sequence mapping information to identify genes greatly affected by this overexpression. Larger differences in the enriched networks revealed that nervous system development related genes such as TEAD2, PLEKHA6, and POGLUT1 were greatly impacted by miRNA-142 overexpression. Ishwor Thapa, Howard S. Fox, and Dhundy Bastola Copyright © 2015 Ishwor Thapa et al. All rights reserved. Assessing Computational Steps for CLIP-Seq Data Analysis Sun, 11 Oct 2015 13:53:05 +0000 RNA-binding proteins (RBPs) are key players in regulating gene expression at the posttranscriptional level. CLIP-Seq, with the ability to provide a genome-wide map of protein-RNA interactions, has been increasingly used to decipher RBP-mediated posttranscriptional regulation.
Generating highly reliable binding sites from CLIP-Seq requires not only stringent library preparation but also considerable computational efforts. Here we presented a first systematic evaluation of major computational steps for identifying RBP binding sites from CLIP-Seq data, including preprocessing, the choice of control samples, peak normalization, and motif discovery. We found that avoiding PCR amplification artifacts, normalizing to input RNA or mRNAseq, and defining the background model from control samples can reduce the bias introduced by RNA abundance and improve the quality of detected binding sites. Our findings can serve as a general guideline for CLIP experiments design and the comprehensive analysis of CLIP-Seq data. Qi Liu, Xue Zhong, Blair B. Madison, Anil K. Rustgi, and Yu Shyr Copyright © 2015 Qi Liu et al. All rights reserved. Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations Sun, 11 Oct 2015 13:45:41 +0000 An accurate classification of human cancer, including its primary site, is important for better understanding of cancer and effective therapeutic strategies development. The available big data of somatic mutations provides us a great opportunity to investigate cancer classification using machine learning. Here, we explored the patterns of 1,760,846 somatic mutations identified from 230,255 cancer patients along with gene function information using support vector machine. Specifically, we performed a multiclass classification experiment over the 17 tumor sites using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors for 6,751 subjects. The performance of the baseline using only gene features is 0.57 in accuracy. It was improved to 0.62 when adding the information of mutation and chromosome. 
Among the predictable primary tumor sites, the prediction of five primary sites (large intestine, liver, skin, pancreas, and lung) could achieve the performance with more than 0.70 in F-measure. The model of the large intestine ranked the first with 0.87 in F-measure. The results demonstrate that the somatic mutation information is useful for prediction of primary tumor sites with machine learning modeling. To our knowledge, this study is the first investigation of the primary sites classification using machine learning and somatic mutation data. Yukun Chen, Jingchun Sun, Liang-Chin Huang, Hua Xu, and Zhongming Zhao Copyright © 2015 Yukun Chen et al. All rights reserved. A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference Sun, 11 Oct 2015 13:43:57 +0000 High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves. Adam Cornish and Chittibabu Guda Copyright © 2015 Adam Cornish and Chittibabu Guda. All rights reserved. 
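The sensitivity and Positive Predictive Value (PPV) figures quoted in the variant calling comparison above can be illustrated with a minimal sketch. It assumes that a pipeline's calls and the reference call set are both available as collections of (chromosome, position, ref, alt) tuples; the function name `score_calls`, the placeholder aligner and caller names, and the toy variants are hypothetical and are not taken from the study.

```python
from itertools import product


def score_calls(called, truth):
    """Compare a pipeline's variant calls against a reference call set.

    Both arguments are iterables of hashable variant keys, here assumed
    to be (chromosome, position, ref, alt) tuples.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the reference
    fp = len(called - truth)   # called but absent from the reference
    fn = len(truth - called)   # in the reference but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv}


# Six aligners combined with five callers give the 30 pipelines.
# The names below are placeholders, not the tools used in the study.
aligners = [f"aligner_{i}" for i in range(1, 7)]
callers = [f"caller_{i}" for i in range(1, 6)]
pipelines = list(product(aligners, callers))

# Toy data, purely illustrative.
truth_set = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")]
call_set = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"),
            ("chr3", 10, "A", "T")]
result = score_calls(call_set, truth_set)
```

Scoring SNVs and indels separately, as the abstract does, would only require filtering the two sets by variant type before calling the function.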
How to Choose In Vitro Systems to Predict In Vivo Drug Clearance: A System Pharmacology Perspective Sun, 11 Oct 2015 13:35:32 +0000 The use of in vitro metabolism data to predict human clearance has become more significant in the current prediction of large scale drug clearance for all the drugs. The relevant information (in vitro metabolism data and in vivo human clearance values) of thirty-five drugs that satisfied the entry criteria of probe drugs was collated from the literature. Then the performance of different in vitro systems including Escherichia coli system, yeast system, lymphoblastoid system and baculovirus system is compared after in vitro-in vivo extrapolation. Baculovirus system, which can provide most of the data, has almost equal accuracy as the other systems in predicting clearance. And in most cases, baculovirus system has the smaller CV in scaling factors. Therefore, the baculovirus system can be recognized as the suitable system for the large scale drug clearance prediction. Lei Wang, ChienWei Chiang, Hong Liang, Hengyi Wu, Weixing Feng, Sara K. Quinney, Jin Li, and Lang Li Copyright © 2015 Lei Wang et al. All rights reserved. A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction Sun, 04 Oct 2015 11:09:01 +0000 Blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR study. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to the BBB penetration prediction. 
The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, simultaneously optimizing both the SVM parameters and the feature subset with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among the properties relevant to BBB penetration, lipophilicity could enhance BBB penetration while all the others are negatively correlated with it. Daqing Zhang, Jianfeng Xiao, Nannan Zhou, Mingyue Zheng, Xiaomin Luo, Hualiang Jiang, and Kaixian Chen Copyright © 2015 Daqing Zhang et al. All rights reserved. How to Use SNP_TATA_Comparator to Find a Significant Change in Gene Expression Caused by the Regulatory SNP of This Gene’s Promoter via a Change in Affinity of the TATA-Binding Protein for This Promoter Sun, 04 Oct 2015 07:28:06 +0000 The use of biomedical SNP markers of diseases can improve the effectiveness of treatment. Genotyping of patients, followed by a search for SNPs that are more frequent in them than in healthy individuals, is the only commonly accepted method for identification of SNP markers within the framework of translational research. Bioinformatics applications aimed at the millions of unannotated SNPs of the “1000 Genomes” project can make this search for SNP markers more focused and less expensive. We used our Web service involving Fisher’s Z-score for candidate SNP markers to find a significant change in a gene’s expression. Here we analyzed the change caused by SNPs in the gene’s promoter via a change in affinity of the TATA-binding protein for this promoter. We provide examples and discuss how to use this bioinformatics application in the course of practical analysis of unannotated SNPs from the “1000 Genomes” project. 
Using known biomedical SNP markers, we identified 17 novel candidate SNP markers nearby: rs549858786 (rheumatoid arthritis); rs72661131 (cardiovascular events in rheumatoid arthritis); rs562962093 (stroke); rs563558831 (cyclophosphamide bioactivation); rs55878706 (malaria resistance, leukopenia), rs572527200 (asthma, systemic sclerosis, and psoriasis), rs371045754 (hemophilia B), rs587745372 (cardiovascular events); rs372329931, rs200209906, rs367732974, and rs549591993 (all four: cancer); rs17231520 and rs569033466 (both: atherosclerosis); rs63750953, rs281864525, and rs34166473 (all three: malaria resistance, thalassemia). Mikhail Ponomarenko, Dmitry Rasskazov, Olga Arkova, Petr Ponomarenko, Valentin Suslov, Ludmila Savinkova, and Nikolay Kolchanov Copyright © 2015 Mikhail Ponomarenko et al. All rights reserved. Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Zebrafish Embryo Thu, 01 Oct 2015 13:15:34 +0000 Recent progress in microscopy technologies, biological markers, and automated processing methods is making possible the development of gene expression atlases at cellular-level resolution over whole embryos. Raw data on gene expression is usually very noisy. This noise comes from both experimental (technical/methodological) and true biological sources (from stochastic biochemical processes). In addition, the cells or nuclei being imaged are irregularly arranged in 3D space. This makes the processing, extraction, and study of expression signals and intrinsic biological noise a serious challenge for 3D data, requiring new computational approaches. Here, we present a new approach for studying gene expression in nuclei located in a thick layer around a spherical surface. The method includes depth equalization on the sphere, flattening, interpolation to a regular grid, pattern extraction by Shaped 3D singular spectrum analysis (SSA), and interpolation back to original nuclear positions. 
The approach is demonstrated on several examples of gene expression in the zebrafish egg (a model system in vertebrate development). The method is tested on several different data geometries (e.g., nuclear positions) and different forms of gene expression patterns. Fully 3D datasets for developmental gene expression are becoming increasingly available; we discuss the prospects of applying 3D-SSA to data processing and analysis in this growing field. Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved. Analysis of Chemical Properties of Edible and Medicinal Ginger by Metabolomics Approach Thu, 01 Oct 2015 13:06:22 +0000 In traditional herbal medicine, comprehensive understanding of bioactive constituent is important in order to analyze its true medicinal function. We investigated the chemical properties of medicinal and edible ginger cultivars using a liquid-chromatography mass spectrometry (LC-MS) approach. Our PCA results indicate the importance of acetylated derivatives of gingerol, not gingerol or shogaol, as the medicinal indicator. A newly developed ginger cultivar, Z. officinale cv. Ogawa Umare or “Ogawa Umare” (OG), contains more active ingredients, showing properties as a new resource for the production of herbal medicines derived from ginger in terms of its chemical constituents and rhizome yield. Ken Tanaka, Masanori Arita, Hiroaki Sakurai, Naoaki Ono, and Yasuhiro Tezuka Copyright © 2015 Ken Tanaka et al. All rights reserved. EMRlog Method for Computer Security for Electronic Medical Records with Logic and Data Mining Thu, 01 Oct 2015 13:04:50 +0000 The proper functioning of a hospital computer system is an arduous work for managers and staff. However, inconsistent policies are frequent and can produce enormous problems, such as stolen information, frequent failures, and loss of the entire or part of the hospital data. 
This paper presents a new method named EMRlog for computer security systems in hospitals. EMRlog is focused on two kinds of security policies: directive and implemented policies. Security policies are applied to computer systems that handle huge amounts of information such as databases, applications, and medical records. Firstly, a syntactic verification step is applied by using predicate logic. Then data mining techniques are used to detect which security policies have really been implemented by the computer systems staff. Subsequently, consistency is verified in both kinds of policies; in addition these subsets are contrasted and validated. This is performed by an automatic theorem prover. Thus, many kinds of vulnerabilities can be removed to achieve a safer computer system. Sergio Mauricio Martínez Monterrubio, Juan Frausto Solis, and Raúl Monroy Borja Copyright © 2015 Sergio Mauricio Martínez Monterrubio et al. All rights reserved. Cellular Metabolic Network Analysis: Discovering Important Reactions in Treponema pallidum Thu, 01 Oct 2015 11:46:40 +0000 T. pallidum, the syphilis-causing pathogen, differs markedly in its metabolism from other bacterial pathogens. The need for a safe and effective syphilis vaccine calls for the identification of important steps in T. pallidum’s metabolism. Here, we apply Flux Balance Analysis to represent the reactions quantitatively. Thus, it is possible to cluster all reactions in T. pallidum. By calculating minimal cut sets and analyzing the topological structure of the metabolic network of T. pallidum, critical reactions are identified. As a comparison, we also apply the analytical approaches to the metabolic network of H. pylori to find coregulated drug targets and unique drug targets for different microorganisms. Based on the clustering results, all reactions are further classified into various roles. 
Therefore, the general picture of their metabolic network is obtained and two types of reactions, both of which are involved in nucleic acid metabolism, are found to be essential for T. pallidum. It is also discovered that both hubs of reactions and the isolated reactions in purine and pyrimidine metabolisms play important roles in T. pallidum. These reactions could be potential drug targets for treating syphilis. Xueying Chen, Min Zhao, and Hong Qu Copyright © 2015 Xueying Chen et al. All rights reserved. Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data Thu, 01 Oct 2015 07:26:59 +0000 With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition and applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. 
SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data. Akihito Kikuchi, Toshimichi Ikemura, and Takashi Abe Copyright © 2015 Akihito Kikuchi et al. All rights reserved. Discovering Distinct Functional Modules of Specific Cancer Types Using Protein-Protein Interaction Networks Thu, 01 Oct 2015 07:05:17 +0000 Background. The molecular profiles exhibited in different cancer types are very different; hence, discovering distinct functional modules associated with specific cancer types is very important to understand the distinct functions associated with them. Protein-protein interaction networks carry vital information about molecular interactions in cellular systems, and identification of functional modules (subgraphs) in these networks is one of the most important applications of biological network analysis. Results. In this study, we developed a new graph theory based method to identify distinct functional modules from nine different cancer protein-protein interaction networks. The method is composed of three major steps: (i) extracting modules from protein-protein interaction networks using network clustering algorithms; (ii) identifying distinct subgraphs from the derived modules; and (iii) identifying distinct subgraph patterns from distinct subgraphs. The subgraph patterns were evaluated using experimentally determined cancer-specific protein-protein interaction data from the Ingenuity knowledgebase, to identify distinct functional modules that are specific to each cancer type. Conclusion. We identified cancer-type specific subgraph patterns that may represent the functional modules involved in the molecular pathogenesis of different cancer types. Our method can serve as an effective tool to discover cancer-type specific functional modules from large protein-protein interaction networks. 
Ru Shen, Xiaosheng Wang, and Chittibabu Guda Copyright © 2015 Ru Shen et al. All rights reserved. Development and Mining of a Volatile Organic Compound Database Thu, 01 Oct 2015 06:59:32 +0000 Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field as they are presently used as a biomarker to detect various human diseases. Information on VOCs is scattered in the literature until now; however, there is still no available database describing VOCs and their biological activities. To attain this purpose, we have developed KNApSAcK Metabolite Ecology Database, which contains the information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online. Azian Azamimi Abdullah, Md. Altaf-Ul-Amin, Naoaki Ono, Tetsuo Sato, Tadao Sugiura, Aki Hirai Morita, Tetsuo Katsuragi, Ai Muto, Takaaki Nishioka, and Shigehiko Kanaya Copyright © 2015 Azian Azamimi Abdullah et al. All rights reserved. Systematic Analysis of the Associations between Adverse Drug Reactions and Pathways Thu, 01 Oct 2015 06:52:17 +0000 Adverse drug reactions (ADRs) are responsible for drug candidate failure during clinical trials. It is crucial to investigate biological pathways contributing to ADRs. Here, we applied a large-scale analysis to identify overrepresented ADR-pathway combinations through merging clinical phenotypic data, biological pathway data, and drug-target relations. 
Evaluation was performed by scientific literature review and defining a pathway-based ADR-ADR similarity measure. The results showed that our method is efficient for finding the associations between ADRs and pathways. To more systematically understand the mechanisms of ADRs, we constructed an ADR-pathway network and an ADR-ADR network. Through network analysis on biology and pharmacology, it was found that frequent ADRs were associated with more pathways than infrequent and rare ADRs. Moreover, environmental information processing pathways contributed most to the observed ADRs. Integrating the system organ class of ADRs, we found that most classes tended to interact with other classes instead of themselves. ADR classes were distributed promiscuously in all the ADR cliques. These results reflected that drug perturbation to a certain pathway can cause changes in multiple organs, rather than in one specific organ. Our work not only provides a global view of the associations between ADRs and pathways, but also is helpful to understand the mechanisms of ADRs. Xiaowen Chen, Yanqiu Wang, Pingping Wang, Baofeng Lian, Chunquan Li, Jing Wang, Xia Li, and Wei Jiang Copyright © 2015 Xiaowen Chen et al. All rights reserved. METSP: A Maximum-Entropy Classifier Based Text Mining Tool for Transporter-Substrate Identification with Semistructured Text Thu, 01 Oct 2015 06:50:59 +0000 The substrates of a transporter are not only useful for inferring function of the transporter, but also important to discover compound-compound interaction and to reconstruct metabolic pathway. Though plenty of data has been accumulated with the developing of new technologies such as in vitro transporter assays, the search for substrates of transporters is far from complete. In this article, we introduce METSP, a maximum-entropy classifier devoted to retrieve transporter-substrate pairs (TSPs) from semistructured text. 
Based on the high-quality annotation from UniProt, METSP achieves high precision and recall in cross-validation experiments. When METSP is applied to 182,829 human transporter annotation sentences in UniProt, it identifies 3942 sentences with transporter and compound information. Finally, 1547 confident human TSPs are identified for further manual curation, among which 58.37% are pairs with novel substrates not annotated in public transporter databases. METSP is the first efficient tool to extract TSPs from semistructured annotation text in UniProt. This tool can help to determine the precise substrates and drugs of transporters, thus facilitating drug-target prediction, metabolic network reconstruction, and literature classification. Min Zhao, Yanming Chen, Dacheng Qu, and Hong Qu Copyright © 2015 Min Zhao et al. All rights reserved. A Glimpse to Background and Characteristics of Major Molecular Biological Networks Wed, 30 Sep 2015 13:30:47 +0000 Recently, biology has become a data-intensive science because of the huge datasets produced by high-throughput molecular biological experiments in diverse areas including genomics, transcriptomics, proteomics, and metabolomics. These huge datasets have paved the way for system-level analysis of the processes and subprocesses of the cell. For system-level understanding, the elements of a system are first connected based on their mutual relations to form a network. Among omics researchers, construction and analysis of biological networks have become highly popular. In this review, we briefly discuss both the biological background and topological properties of major types of omics networks to facilitate a comprehensive understanding and to conceptualize the foundation of network biology. Md. Altaf-Ul-Amin, Tetsuo Katsuragi, Tetsuo Sato, and Shigehiko Kanaya Copyright © 2015 Md. Altaf-Ul-Amin et al. All rights reserved. 
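The network construction step described in the review above (connecting elements by their mutual relations, then examining topology) can be sketched in a few lines. This is only an illustration under assumptions: relations are taken to be undirected pairs of element names, the function names `build_network`, `degrees`, and `density` are hypothetical, and the toy relations are invented for the example.

```python
from collections import defaultdict


def build_network(relations):
    """Build an undirected adjacency map from (element, element) pairs."""
    adjacency = defaultdict(set)
    for a, b in relations:
        if a == b:
            continue  # ignore self-relations in this simple sketch
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degrees(adjacency):
    """Number of neighbours per node, a basic topological property."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def density(adjacency):
    """Fraction of possible undirected edges that are present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


# Toy pairwise relations, purely illustrative.
toy_relations = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
network = build_network(toy_relations)
node_degrees = degrees(network)
network_density = density(network)
```

Storing neighbours in sets means a relation reported twice is counted once, which suits data merged from several sources.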
Bioinformatics Methods and Biological Interpretation for Next-Generation Sequencing Data Mon, 07 Sep 2015 06:56:22 +0000 Guohua Wang, Yunlong Liu, Dongxiao Zhu, Gunnar W. Klau, and Weixing Feng Copyright © 2015 Guohua Wang et al. All rights reserved. MicroRNA Promoter Identification in Arabidopsis Using Multiple Histone Markers Thu, 03 Sep 2015 13:14:36 +0000 A microRNA is a small noncoding RNA molecule, which functions in RNA silencing and posttranscriptional regulation of gene expression. To understand the mechanism of the activation of microRNA genes, the location of promoter regions driving their expression is required to be annotated precisely. Only a fraction of microRNA genes have confirmed transcription start sites (TSSs), which hinders our understanding of the transcription factor binding events. With the development of the next generation sequencing technology, the chromatin states can be inferred precisely by virtue of a combination of specific histone modifications. Using the genome-wide profiles of nine histone markers including H3K4me2, H3K4me3, H3K9Ac, H3K9me2, H3K18Ac, H3K27me1, H3K27me3, H3K36me2, and H3K36me3, we developed a computational strategy to identify the promoter regions of most microRNA genes in Arabidopsis, based upon the assumption that the distribution of histone markers around the TSSs of microRNA genes is similar to the TSSs of protein coding genes. Among 298 miRNA genes, our model identified 42 independent miRNA TSSs and 132 miRNA TSSs, which are located in the promoters of upstream genes. The identification of promoters will provide better understanding of microRNA regulation and can play an important role in the study of diseases at genetic level. Yuming Zhao, Fang Wang, and Liran Juan Copyright © 2015 Yuming Zhao et al. All rights reserved. Constructing a Genome-Wide LD Map of Wild A. gambiae Using Next-Generation Sequencing Thu, 03 Sep 2015 13:11:49 +0000 Anopheles gambiae is the major malaria vector in Africa. 
Examining the molecular basis of A. gambiae traits requires knowledge of both genetic variation and a genome-wide linkage disequilibrium (LD) map of wild A. gambiae populations from malaria-endemic areas. We sequenced the genomes of nine wild A. gambiae mosquitoes individually using next-generation sequencing technologies and detected 2,219,815 common single nucleotide polymorphisms (SNPs), 88% of which are novel. SNPs are not evenly distributed across A. gambiae chromosomes. The low SNP-frequency regions overlay heterochromatin and chromosome inversion domains, consistent with the lower recombination rates in these regions. Nearly one million SNPs that were genotyped correctly in all individual mosquitoes with 99.6% confidence were extracted from these high-throughput sequencing data. Based on these SNP genotypes, we constructed a genome-wide LD map for wild A. gambiae from malaria-endemic areas in Kenya and made it available through a public Website. The average size of LD blocks is less than 40 bp, and several large LD blocks were also discovered clustered around the para gene, which is consistent with the effect of insecticide selective sweeps. The SNPs and the LD map will be valuable resources for scientific communities to dissect the A. gambiae genome. Xiaohong Wang, Yaw A. Afrane, Guiyun Yan, and Jun Li Copyright © 2015 Xiaohong Wang et al. All rights reserved. Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico Thu, 03 Sep 2015 11:55:27 +0000 Next-generation sequencing techniques have been rapidly emerging. However, the massive sequencing reads hide a great deal of unknown important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. 
Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms represents a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS. Feng Min, Sumei Wang, and Li Zhang Copyright © 2015 Feng Min et al. All rights reserved. Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites Thu, 03 Sep 2015 09:24:16 +0000 Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. The transcription factor binding sites are short DNA sequences (5–20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and prediction of their function continue to be challenging problems in computational biology. In this study, by integrating the DNase I hypersensitive sites with known position weight matrices in the TRANSFAC database, the transcription factor binding sites in gene regulatory regions are identified. Based on the global gene expression patterns in cervical cancer HeLaS3 cells and HelaS3-ifnα4h cells (interferon treatment of HeLaS3 cells for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression. Significantly, 6 out of 10 predicted functional factors, namely IRF, IRF-2, IRF-9, IRF-1, IRF-3, and ICSBP, belong to the interferon regulatory factor family and upregulate gene expression levels in response to the interferon treatment. 
Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. The prediction results of our model are consistent across different criteria for selecting transcription factor binding sites. Our model demonstrated the potential to computationally identify the functional transcription factors in gene regulation. Guohua Wang, Fang Wang, Qian Huang, Yu Li, Yunlong Liu, and Yadong Wang Copyright © 2015 Guohua Wang et al. All rights reserved. Active Microbial Communities Inhabit Sulphate-Methane Interphase in Deep Bedrock Fracture Fluids in Olkiluoto, Finland Thu, 03 Sep 2015 09:23:57 +0000 Active microbial communities of deep crystalline bedrock fracture water were investigated from seven different boreholes in Olkiluoto (Western Finland) using bacterial and archaeal 16S rRNA, dsrB, and mcrA gene transcript targeted 454 pyrosequencing. Over a depth range of 296–798 m below ground surface the microbial communities changed according to depth, salinity gradient, and sulphate and methane concentrations. The highest bacterial diversity was observed in the sulphate-methane mixing zone (SMMZ) at 250–350 m depth, whereas archaeal diversity was highest in the lowest boundaries of the SMMZ. Sulphide-oxidizing ε-proteobacteria (Sulfurimonas sp.) dominated in the SMMZ and γ-proteobacteria (Pseudomonas spp.) below the SMMZ. The active archaeal communities consisted mostly of ANME-2D and Thermoplasmatales groups, although Methermicoccaceae, Methanobacteriaceae, and Thermoplasmatales (SAGMEG, TMG) were more common at 415–559 m depth. Typical indicator microorganisms for sulphate-methane transition zones in marine sediments, such as ANME-1 archaea, α-, β- and δ-proteobacteria, JS1, Actinomycetes, Planctomycetes, Chloroflexi, and MBGB Crenarchaeota were detected at specific depths. DsrB genes were most numerous and most actively transcribed in the SMMZ while the mcrA gene concentration was highest in the deep methane rich groundwater. 
Our results demonstrate that active and highly diverse but sparse and stratified microbial communities inhabit the Fennoscandian deep bedrock ecosystems. Malin Bomberg, Mari Nyyssönen, Petteri Pitkänen, Anne Lehtinen, and Merja Itävaara Copyright © 2015 Malin Bomberg et al. All rights reserved. 454-Pyrosequencing Analysis of Bacterial Communities from Autotrophic Nitrogen Removal Bioreactors Utilizing Universal Primers: Effect of Annealing Temperature Thu, 03 Sep 2015 09:22:02 +0000 Identification of anaerobic ammonium oxidizing (anammox) bacteria by molecular tools aimed at the evaluation of bacterial diversity in autotrophic nitrogen removal systems is limited by the difficulty to design universal primers for the Bacteria domain able to amplify the anammox 16S rRNA genes. A metagenomic analysis (pyrosequencing) of total bacterial diversity including anammox population in five autotrophic nitrogen removal technologies, two bench-scale models (MBR and Low Temperature CANON) and three full-scale bioreactors (anammox, CANON, and DEMON), was successfully carried out by optimization of primer selection and PCR conditions (annealing temperature). The universal primer 530F was identified as the best candidate for total bacteria and anammox bacteria diversity coverage. Salt-adjusted optimum annealing temperature of primer 530F was calculated (47°C) and hence a range of annealing temperatures of 44–49°C was tested. Pyrosequencing data showed that annealing temperature of 45°C yielded the best results in terms of species richness and diversity for all bioreactors analyzed. Alejandro Gonzalez-Martinez, Alejandro Rodriguez-Sanchez, Belén Rodelas, Ben A. Abbas, Maria Victoria Martinez-Toledo, Mark C. M. van Loosdrecht, F. Osorio, and Jesus Gonzalez-Lopez Copyright © 2015 Alejandro Gonzalez-Martinez et al. All rights reserved. 
mmnet: An R Package for Metagenomics Systems Biology Analysis Thu, 03 Sep 2015 09:22:02 +0000 The human microbiome plays important roles in human health and disease. Previous microbiome studies focused mainly on the function of single pure species and overlooked system-level interactions in complex communities. A recently introduced metagenomic approach integrates metagenomic data with community-level metabolic network modeling, but no comprehensive tool was available for this kind of approach. To facilitate such studies, we developed an R package, mmnet, to implement community-level metabolic network reconstruction. The package also implements a set of functions for automatic analysis pipeline construction including functional annotation of metagenomic reads, abundance estimation of enzymatic genes, community-level metabolic network reconstruction, and integrated network analysis. The result can be represented in an intuitive way and sent to Cytoscape for further exploration. The package has substantial potential in metagenomic studies that focus on identifying system-level variations of the human microbiome associated with disease. Yang Cao, Xiaofei Zheng, Fei Li, and Xiaochen Bo Copyright © 2015 Yang Cao et al. All rights reserved. Genetic Interactions Explain Variance in Cingulate Amyloid Burden: An AV-45 PET Genome-Wide Association and Interaction Study in the ADNI Cohort Thu, 03 Sep 2015 09:20:58 +0000 Alzheimer’s disease (AD) is the most common neurodegenerative disorder. Using discrete disease status as the phenotype and computing statistics at the single marker level may not be able to address the underlying biological interactions that contribute to disease mechanism and may contribute to the issue of “missing heritability.” We performed a genome-wide association study (GWAS) and a genome-wide interaction study (GWIS) of an amyloid imaging phenotype, using the data from the Alzheimer’s Disease Neuroimaging Initiative. 
We investigated the genetic main effects and interaction effects on cingulate amyloid-beta (Aβ) load in an effort to better understand the genetic etiology of Aβ deposition, a widely studied AD biomarker. PLINK was used in the single marker GWAS, and INTERSNP was used to perform the two-marker GWIS, focusing only on SNPs with for the GWAS analysis. Age, sex, and diagnosis were used as covariates in both analyses. Corrected p values using the Bonferroni method were reported. The GWAS analysis revealed significant hits within or proximal to APOE, APOC1, and TOMM40 genes, which were previously implicated in AD. The GWIS analysis yielded 8 novel SNP-SNP interaction findings that warrant replication and further investigation. Jin Li, Qiushi Zhang, Feng Chen, Jingwen Yan, Sungeun Kim, Lei Wang, Weixing Feng, Andrew J. Saykin, Hong Liang, and Li Shen Copyright © 2015 Jin Li et al. All rights reserved. How to Isolate a Plant’s Hypomethylome in One Shot Thu, 03 Sep 2015 09:14:51 +0000 Genome assembly remains a challenge for large and/or complex plant genomes because of their abundant repetitive regions, which has led studies to focus on gene space instead of the whole genome. DNA enrichment strategies therefore facilitate the assembly by increasing the coverage and simultaneously reducing the complexity of the whole genome. In this paper we provide an easy, fast, and cost-effective variant of MRE-seq to obtain a plant’s hypomethylome by an optimized methyl filtration protocol followed by next generation sequencing. The method is demonstrated on three plant species known to have large and/or complex (polyploid) genomes: Oryza sativa, Picea abies, and Crocus sativus. The identified hypomethylomes show clear enrichment for genes and their flanking regions and clear reduction of transposable elements. Additionally, genomic sequences around genes are captured including regulatory elements in introns and up- and downstream flanks. 
High similarity of the results obtained by a de novo assembly approach with a reference based mapping in rice supports the applicability of the method for studying and understanding the genomes of nonmodel organisms. Hence we show the high potential of MRE-seq in a wide range of scenarios for the direct analysis of methylation differences, for example, between ecotypes or individuals, or within or across species harbouring large and complex genomes. Elisabeth Wischnitzki, Eva Maria Sehr, Karin Hansel-Hohl, Maria Berenyi, Kornel Burg, and Silvia Fluch Copyright © 2015 Elisabeth Wischnitzki et al. All rights reserved. Data Acquisition and Processing in Biology and Medicine Wed, 26 Aug 2015 10:13:46 +0000 Cheng-Hong Yang, Yu-Jie Huang, An Liu, Yi Rong, and Tsair-Fwu Lee Copyright © 2015 Cheng-Hong Yang et al. All rights reserved. The Combinational Polymorphisms of ORAI1 Gene Are Associated with Preventive Models of Breast Cancer in the Taiwanese Tue, 25 Aug 2015 14:02:28 +0000 The ORAI calcium release-activated calcium modulator 1 (ORAI1) has been proven to be an important gene for breast cancer progression and metastasis. However, protective association models among the single nucleotide polymorphisms (SNPs) of the ORAI1 gene have not been investigated. Based on a published data set of 345 female breast cancer patients and 290 female controls, we used a particle swarm optimization (PSO) algorithm to identify possible protective models of breast cancer association in terms of the SNPs of the ORAI1 gene. Results showed that the PSO-generated models of 2-SNP (rs12320939-TT/rs12313273-CC), 3-SNP (rs12320939-TT/rs12313273-CC/rs712853-(TT/TC)), 4-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs712853-(TT/TC)), and 5-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs6486795-CC/rs712853-(TT/TC)) displayed low values of odds ratios (0.409–0.425) for breast cancer association. 
Taken together, these results suggested that our proposed PSO strategy is a powerful way to identify the combinational SNPs of rs12320939, rs12313273, rs7135617, rs6486795, and rs712853 of the ORAI1 gene with a strongly protective association in breast cancer. Fu Ou-Yang, Yu-Da Lin, Li-Yeh Chuang, Hsueh-Wei Chang, Cheng-Hong Yang, and Ming-Feng Hou Copyright © 2015 Fu Ou-Yang et al. All rights reserved. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information Tue, 25 Aug 2015 08:22:17 +0000 Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without artifact components was performed to obtain artifact-free signals. The results showed that this automatic online artifact removal method yielded statistically significant improvements in classification accuracy in both experiments, namely, motor imagery and emotion recognition. Chi Zhang, Li Tong, Ying Zeng, Jingfang Jiang, Haibing Bu, Bin Yan, and Jianxin Li Copyright © 2015 Chi Zhang et al. All rights reserved. 
Tennis Elbow Diagnosis Using Equivalent Uniform Voltage to Fit the Logistic and the Probit Diseased Probability Models Tue, 25 Aug 2015 07:46:17 +0000 This study develops logistic and probit models to analyse the electromyographic (EMG) equivalent uniform voltage- (EUV-) response for the tenderness of tennis elbow. 
In total, 78 hands from 39 subjects were enrolled. In this study, surface EMG (sEMG) signal is obtained by an innovative device with electrodes over forearm region. The analytical endpoint was defined as Visual Analog Score (VAS) 3+ tenderness of tennis elbow. The logistic and the probit diseased probability (DP) models were established for the VAS score and EMG absolute voltage-time histograms (AVTH). TV50 is the threshold equivalent uniform voltage predicting a 50% risk of disease. Twenty-one out of 78 samples (27%) developed VAS 3+ tenderness of tennis elbow reported by the subject and confirmed by the physician. The fitted DP parameters were TV50 = 153.0 mV (CI: 136.3–169.7 mV), γ50 = 0.84 (CI: 0.78–0.90) and TV50 = 155.6 mV (CI: 138.9–172.4 mV), m = 0.54 (CI: 0.49–0.59) for logistic and probit models, respectively. When the EUV ≥ 153 mV, the DP of the patient is greater than 50% and vice versa. The logistic and the probit models are valuable tools to predict the DP of VAS 3+ tenderness of tennis elbow. Tsair-Fwu Lee, Wei-Chun Lin, Hung-Yu Wang, Shu-Yuan Lin, Li-Fu Wu, Shih-Sian Guo, Hsiang-Jui Huang, Hui-Min Ting, and Pei-Ju Chao Copyright © 2015 Tsair-Fwu Lee et al. All rights reserved. A Data Hiding Technique to Synchronously Embed Physiological Signals in H.264/AVC Encoded Video for Medicine Healthcare Tue, 25 Aug 2015 07:45:41 +0000 The recognition of clinical manifestations in both video images and physiological-signal waveforms is an important aid to improve the safety and effectiveness in medical care. Physicians can rely on video-waveform (VW) observations to recognize difficult-to-spot signs and symptoms. The VW observations can also reduce the number of false positive incidents and expand the recognition coverage to abnormal health conditions. The synchronization between the video images and the physiological-signal waveforms is fundamental for the successful recognition of the clinical manifestations. 
The use of conventional equipment to synchronously acquire and display the video-waveform information involves complex tasks such as the video capture/compression, the acquisition/compression of each physiological signal, and the video-waveform synchronization based on timestamps. This paper introduces a data hiding technique capable of both enabling embedding channels and synchronously hiding samples of physiological signals into encoded video sequences. Our data hiding technique offers large data capacity and reduces the complexity of the video-waveform acquisition and reproduction. The experimental results revealed successful embedding and full restoration of the signal samples. Our results also demonstrated a small distortion in the video objective quality, a small increment in bit-rate, and embedded cost savings of −2.6196% for high and medium motion video sequences. Raul Peña, Alfonso Ávila, David Muñoz, and Juan Lavariega Copyright © 2015 Raul Peña et al. All rights reserved. Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation Mon, 24 Aug 2015 14:18:37 +0000 This paper presents a data-driven multiscale entropy measure to reveal the scale dependent information quantity of electroencephalogram (EEG) recordings. This work is motivated by previous observations on the nonlinear and nonstationary nature of EEG over multiple time scales. Here, a new framework of entropy measures considering changing dynamics over multiple oscillatory scales is presented. First, to deal with nonstationarity over multiple scales, the EEG recording is decomposed by applying the empirical mode decomposition (EMD), which is known to be effective for extracting the constituent narrowband components without a predetermined basis. The Renyi entropy of the probability distributions of the intrinsic mode functions extracted by EMD is then calculated, which leads to a data-driven multiscale Renyi entropy. 
To validate the performance of the proposed entropy measure, actual EEG recordings from rats experiencing 7 min cardiac arrest followed by resuscitation were analyzed. Simulation and experimental results demonstrate that the use of the multiscale Renyi entropy leads to better discriminative capability of the injury levels and improved correlations with the neurological deficit evaluation 72 hours after cardiac arrest, thus suggesting an effective diagnostic and prognostic tool. Young-Seok Choi Copyright © 2015 Young-Seok Choi. All rights reserved. Gene Network Analysis of Glucose Linked Signaling Pathways and Their Role in Human Hepatocellular Carcinoma Cell Growth and Survival in HuH7 and HepG2 Cell Lines Mon, 24 Aug 2015 11:19:10 +0000 Cancer progression may be affected by metabolism. In this study, we aimed to analyze the effect of glucose on the proliferation and/or survival of human hepatocellular carcinoma (HCC) cells. Human gene datasets regulated by glucose were compared to gene datasets either dysregulated in HCC or regulated by other signaling pathways. Significant numbers of common genes suggested putative involvement in transcriptional regulation by glucose. Real-time proliferation assays using high (4.5 g/L) versus low (1 g/L) glucose on two human HCC cell lines and specific inhibitors of selected pathways were used for experimental validations. High glucose promoted HuH7 cell proliferation but not that of the HepG2 cell line. Gene network analyses suggest that gene transcription by glucose could be mediated at 92% through ChREBP in HepG2 cells, compared to 40% in either other human cells or rodent healthy liver, with alteration of LKB1 (serine/threonine kinase 11) and NOX (NADPH oxidases) signaling pathways and loss of transcriptional regulation of PPARGC1A (peroxisome-proliferator activated receptors gamma coactivator 1) target genes by high glucose. 
Both PPARA and PPARGC1A regulate transcription of genes commonly regulated by glycolysis, by the antidiabetic agent metformin and by NOX, suggesting their major interplay in the control of HCC progression. Emmanuelle Berger, Nathalie Vega, Michèle Weiss-Gayet, and Alain Géloën Copyright © 2015 Emmanuelle Berger et al. All rights reserved. Applying NGS Data to Find Evolutionary Network Biomarkers from the Early and Late Stages of Hepatocellular Carcinoma Thu, 20 Aug 2015 07:07:01 +0000 Hepatocellular carcinoma (HCC) is a major liver tumor (~80%), besides hepatoblastomas, angiosarcomas, and cholangiocarcinomas. In this study, we used a systems biology approach to construct protein-protein interaction networks (PPINs) for early-stage and late-stage liver cancer. By comparing the networks of these two stages, we found that the two networks showed some common mechanisms and some significantly different mechanisms. To obtain differential network structures between cancer and noncancer PPINs, we constructed cancer PPIN and noncancer PPIN network structures for the two stages of liver cancer by systems biology method using NGS data from cancer cells and adjacent noncancer cells. Using carcinogenesis relevance values (CRVs), we identified 43 and 80 significant proteins and their PPINs (network markers) for early-stage and late-stage liver cancer. To investigate the evolution of network biomarkers in the carcinogenesis process, a primary pathway analysis showed that common pathways of the early and late stages were those related to ordinary cancer mechanisms. A pathway specific to the early stage was the mismatch repair pathway, while pathways specific to the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This study provides a new direction for cancer-targeted therapies at different stages. 
Yung-Hao Wong, Chia-Chou Wu, Chih-Lung Lin, Ting-Shou Chen, Tzu-Hao Chang, and Bor-Sen Chen Copyright © 2015 Yung-Hao Wong et al. All rights reserved. The ABCC6 Transporter as a Paradigm for Networking from an Orphan Disease to Complex Disorders Tue, 18 Aug 2015 09:35:55 +0000 The knowledge on the genetic etiology of complex disorders largely results from the study of rare monogenic disorders. Often these common and rare diseases show phenotypic overlap, though monogenic diseases generally have a more extreme symptomatology. ABCC6, the gene responsible for pseudoxanthoma elasticum, an autosomal recessive ectopic mineralization disorder, can be considered a paradigm gene with relevance that reaches far beyond this enigmatic orphan disease. Indeed, common traits such as chronic kidney disease or cardiovascular disorders have been linked to the ABCC6 gene. While during the last decade the awareness of the wide ramifications of ABCC6 has increased significantly, the gene itself and the transmembrane transporter it encodes have not unveiled all of the mysteries that surround them. To gain more insights, multiple approaches are being used including next-generation sequencing, computational methods, and various “omics” technologies. Much effort is made to place the vast amount of data that is gathered in an integrated system-biological network; the involvement of ABCC6 in common disorders provides a good view on the wide implications and potential of such a network. In this review, we summarize the network approaches used to study ABCC6 and the role of this gene in several complex diseases. Eva Y. G. De Vilder, Mohammad Jakir Hosen, and Olivier M. Vanakker Copyright © 2015 Eva Y. G. De Vilder et al. All rights reserved. 
An Affinity Propagation-Based DNA Motif Discovery Algorithm Mon, 10 Aug 2015 09:57:56 +0000 The planted motif search (PMS) is one of the fundamental problems in bioinformatics and plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima are still important but challenging tasks for motif discovery. To address these tasks, we propose a new algorithm, APMotif, which first applies Affinity Propagation (AP) clustering to DNA sequences to produce informative candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results on both simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of prediction accuracy. Chunxiao Sun, Hongwei Huo, Qiang Yu, Haitao Guo, and Zhigang Sun Copyright © 2015 Chunxiao Sun et al. All rights reserved.
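The EM refinement stage mentioned in the APMotif abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-occurrence-per-sequence model with a uniform background, it starts from a single hand-picked candidate l-mer instead of one produced by Affinity Propagation clustering, and every name and parameter in it (`seed_pwm`, `em_refine`, `match_prob`, `pseudocount`, the toy sequences) is hypothetical.

```python
BASES = "ACGT"

def seed_pwm(seed, match_prob=0.7):
    """Initial position weight matrix: the seed base gets match_prob, the rest share the remainder."""
    other = (1.0 - match_prob) / 3.0
    return [{b: (match_prob if b == s else other) for b in BASES} for s in seed]

def window_score(pwm, seq, start):
    """Probability of the window seq[start:start+w] under the motif model."""
    score = 1.0
    for k, column in enumerate(pwm):
        score *= column[seq[start + k]]
    return score

def em_refine(seqs, seed, iterations=20, pseudocount=0.01):
    """Refine a candidate motif by EM, assuming exactly one occurrence per sequence."""
    w = len(seed)
    pwm = seed_pwm(seed)
    for _ in range(iterations):
        counts = [{b: pseudocount for b in BASES} for _ in range(w)]
        for seq in seqs:
            # E-step: posterior of each start position (uniform background cancels out).
            scores = [window_score(pwm, seq, j) for j in range(len(seq) - w + 1)]
            total = sum(scores)
            for j, s in enumerate(scores):
                z = s / total
                for k in range(w):
                    counts[k][seq[j + k]] += z
        # M-step: re-estimate each column from the expected base counts.
        pwm = [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]
    starts = [
        max(range(len(seq) - w + 1), key=lambda j: window_score(pwm, seq, j))
        for seq in seqs
    ]
    consensus = "".join(max(BASES, key=lambda b: col[b]) for col in pwm)
    return pwm, starts, consensus

# Toy input (hypothetical): the motif GATTACAG planted once in each sequence.
sequences = [
    "CCCCCGATTACAGTTTTTT",
    "TTCTGATTACAGCCCTCC",
    "GATTACAGCCTTCCTT",
    "CTCTCTCTCGATTACAGTC",
]
# The seed differs from the planted motif in its last base.
pwm, starts, consensus = em_refine(sequences, "GATTACAC")
print(consensus, starts)
```

In this toy input the seed is wrong in one position, and the M-step corrects it because every sequence carries G at that column. EM only climbs to a local optimum near its starting point, which is why the quality of the candidate motifs matters.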