BioMed Research International: Bioinformatics
The latest articles from Hindawi Publishing Corporation
© 2015, Hindawi Publishing Corporation. All rights reserved.

GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature
Thu, 25 Jun 2015 06:28:00 +0000
The huge amount of gene expression data generated by microarray and next-generation sequencing technologies makes it challenging to exploit the data's biological meaning. When searching for coexpressed genes, the data mining process is strongly affected by the choice of algorithm. It is therefore highly desirable to offer multiple algorithms in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit written in MATLAB that supports a variety of gene expression data file formats. The toolkit provides four models (the mean, regression, delegate, and ensemble models) to identify coexpressed genes. It also enables users to filter data and to select gene expression patterns either by browsing the display window or by importing knowledge-based genes. The utility of the toolkit is then demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that lets users choose among multiple algorithms for analyzing gene expression signatures.
Ning Ye, Hengfu Yin, Jingjing Liu, Xiaogang Dai, and Tongming Yin
Copyright © 2015 Ning Ye et al. All rights reserved.

Combined Analysis of SNP Array Data Identifies Novel CNV Candidates and Pathways in Ependymoma and Mesothelioma
Mon, 22 Jun 2015 06:06:41 +0000
Copy number variation is a class of structural genomic modifications that includes the gain or loss of a specific genomic region, which may include an entire gene.
Many studies have used low-resolution techniques to identify regions that are frequently lost or amplified in cancer. Researchers usually choose proprietary or closed-source software to detect these regions because its graphical interface tends to be easier to use. In this study, we combined two different open-source packages into an innovative strategy to identify novel copy number variations and pathways associated with cancer. We used published mesothelioma and ependymoma datasets to assess our tool. We detected both previously described and novel copy number variations that are associated with resistance to cancer chemotherapy. We also identified altered pathways associated with these diseases, such as cell adhesion in patients with mesothelioma and negative regulation of glutamatergic synaptic transmission in patients with ependymoma. In conclusion, we present a novel strategy that uses open-source software to identify copy number variations and altered pathways associated with cancer.
Gabriel Wajnberg, Benilton S. Carvalho, Carlos G. Ferreira, and Fabio Passetti
Copyright © 2015 Gabriel Wajnberg et al. All rights reserved.

Network-Based Logistic Classification with an Enhanced Solver Reveals Biomarker and Subnetwork Signatures for Diagnosing Lung Cancer
Tue, 16 Jun 2015 08:08:23 +0000
Identifying biomarkers and signaling pathways is a critical step in genomic studies, in which regularization is a widely used feature extraction approach. However, most regularizers are based on -norm, and their results are asymptotically biased and not sparse or interpretable enough, especially in genomic research. Recently, a large amount of information about molecular interactions in disease-related biological processes has been gathered in various databases that focus on many aspects of biological systems.
In this paper, we use an enhanced penalized solver to penalize a network-constrained logistic regression model, called an enhanced net, in which the predictors are based on gene expression data combined with biological network knowledge. Extensive simulation studies showed that our approach outperforms regularization, the old penalized solver, and the Elastic net approaches in terms of classification accuracy and stability. Furthermore, we applied our method to lung cancer data and found that it achieves higher predictive accuracy than regularization, the old penalized solver, and the Elastic net approaches, while selecting fewer but informative biomarkers and pathways.
Hai-Hui Huang, Yong Liang, and Xiao-Ying Liu
Copyright © 2015 Hai-Hui Huang et al. All rights reserved.

The Impact of Normalization Methods on RNA-Seq Data Analysis
Mon, 15 Jun 2015 12:14:24 +0000
High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by sequencers create a need for statistical and computational methods that can handle their analysis and management. Data normalization is one of the most crucial steps of data processing and must be carefully considered, as it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and of their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set.
The workflow includes calculating bias and variance values for control genes, the sensitivity and specificity of the methods, and classification errors, as well as generating diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
J. Zyprych-Walczak, A. Szabelska, L. Handschuh, K. Górczak, K. Klamecka, M. Figlerowicz, and I. Siatkowski
Copyright © 2015 J. Zyprych-Walczak et al. All rights reserved.

A Robust Supervised Variable Selection for Noisy High-Dimensional Data
Tue, 02 Jun 2015 06:53:56 +0000
The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection is a successful dimensionality reduction methodology suitable for high-dimensional data observed in two or more groups. Various versions of the MRMR approach have been designed to search for the variables most relevant to a classification task while controlling the redundancy of the selected set. However, the usual relevance and redundancy criteria are too sensitive to outlying measurements, inefficient, or both. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets.
To investigate the influence of noise and outliers, we also perform the computations on data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
Jan Kalina and Anna Schlenker
Copyright © 2015 Jan Kalina and Anna Schlenker. All rights reserved.

Toward a Literature-Driven Definition of Big Data in Healthcare
Tue, 02 Jun 2015 06:08:12 +0000
Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of the PubMed literature published until May 9, 2014, was conducted. For every paper describing a dataset, we noted the number of statistical individuals and the number of variables. These papers were classified into fields of study. Characteristics attributed to big data by the authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with . The properties of big data are its great variety and high velocity. Big data raises challenges regarding veracity, all aspects of the workflow, the extraction of meaningful information, and the sharing of information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Conversely, data can be reused without necessarily being big, for example, in the secondary use of Electronic Medical Records (EMR) data.
Emilie Baro, Samuel Degoul, Régis Beuscart, and Emmanuel Chazard
Copyright © 2015 Emilie Baro et al. All rights reserved.
Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies
Mon, 01 Jun 2015 14:35:07 +0000
Sequencing the human genome began in 1994, and 10 years of work were necessary to provide a nearly complete sequence. Nowadays, NGS technologies allow a whole human genome to be sequenced in a few days. This deluge of data challenges scientists in many ways, as they face data management issues as well as analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way data are managed and analysed. We show how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from the web and business intelligence. We highlight the value of NoSQL databases, which are much more efficient than relational databases. Because the high processing time of Big Data leads to a loss of interactivity with the data during analysis, we describe Business Intelligence solutions that make it possible to regain interactivity whatever the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations in JavaScript graphics libraries.
Alexandre G. de Brevern, Jean-Philippe Meyniel, Cécile Fairhead, Cécile Neuvéglise, and Alain Malpertuy
Copyright © 2015 Alexandre G. de Brevern et al. All rights reserved.

Improved Diagnostic Multimodal Biomarkers for Alzheimer’s Disease and Mild Cognitive Impairment
Thu, 28 May 2015 06:17:43 +0000
The early diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) is very important for treatment research and patient care. Few biomarkers are currently considered in clinical settings, and their use is still optional.
The objective of this work was to determine whether multimodal features, including features not previously associated with AD, could improve the accuracy of classification among AD, MCI, and healthy controls (HC), which may inform future AD biomarkers. To this end, the Alzheimer’s Disease Neuroimaging Initiative database was mined for case-control candidates. At least 652 baseline features extracted from MRI and PET analyses, biological samples, and clinical data up to February 2014 were used. A feature selection methodology combining a genetic algorithm search coupled to a logistic regression classifier with forward and backward selection strategies was used to explore combinations of features. This generated diagnostic models containing 3 to 8 features, including well-documented AD biomarkers as well as unexplored imaging, biochemical, and clinical features. Accuracies of 0.85, 0.79, and 0.80 were achieved for the HC-AD, HC-MCI, and MCI-AD classifications, respectively, when evaluated on a blind test set. In conclusion, a set of features provided information additional to, and independent of, well-established AD biomarkers, aiding in the classification of MCI and AD.
Antonio Martínez-Torteya, Víctor Treviño, and José G. Tamez-Peña
Copyright © 2015 Antonio Martínez-Torteya et al. All rights reserved.

Quantitative Assessment of the Association between Genetic Variants in MicroRNAs and Colorectal Cancer Risk
Wed, 20 May 2015 09:31:47 +0000
Background. Previous studies of the associations between microRNA polymorphisms and susceptibility to colorectal cancer (CRC) have been inconsistent. This study aims to quantify the strength of the correlation between four common microRNA polymorphisms (hsa-mir-146a rs2910164, hsa-mir-149 rs2292832, hsa-mir-196a2 rs11614913, and hsa-mir-499 rs3746444) and CRC risk. Methods. We searched PubMed, Web of Knowledge, and CNKI to find relevant studies.
The pooled odds ratio (OR) with its 95% confidence interval (95% CI) was used to estimate the strength of the association in a fixed- or random-effects model. Results. Fifteen studies involving 5,486 CRC patients and 7,184 controls were included. Meta-analyses showed that rs3746444 was associated with CRC risk in Caucasians (OR = 0.57, 95% CI = 0.34–0.95). In the subgroup analysis, we found significant associations between rs2910164 and CRC in hospital-based studies (OR = 1.24, 95% CI = 1.03–1.49). rs2292832 may be a high-risk factor for CRC in population-based studies (OR = 1.18, 95% CI = 1.08–1.38). Conclusion. This meta-analysis showed that rs2910164 and rs2292832 may increase the risk of CRC. In contrast, the rs11614913 polymorphism may reduce the risk of CRC, and rs3746444 may be associated with a decreased risk of CRC in Caucasians.
Xiao-Xu Liu, Meng Wang, Dan Xu, Jian-Hai Yang, Hua-Feng Kang, Xi-Jing Wang, Shuai Lin, Peng-Tao Yang, Xing-Han Liu, and Zhi-Jun Dai
Copyright © 2015 Xiao-Xu Liu et al. All rights reserved.

Multiblock Discriminant Analysis for Integrative Genomic Study
Sun, 17 May 2015 11:45:02 +0000
Human diseases are abnormal medical conditions in which multiple biological components are involved in complex ways. Nevertheless, most research has been conducted with a single type of genetic data, such as Single Nucleotide Polymorphism (SNP) or Copy Number Variation (CNV) data. Epigenetic modifications and transcriptional regulation must also be considered, along with genomic variants, to fully understand complex human diseases. We call such a collection of multiple heterogeneous data types “multiblock data.” In this paper, we propose a novel Multiblock Discriminant Analysis (MultiDA) method that provides a new integrative genomic model for multiblock analysis and an efficient algorithm for discriminant analysis.
The integrative genomic model is built from representative genomic data types, including SNP, CNV, DNA methylation, and gene expression data. The discriminant analysis algorithm identifies the discriminative factors of the multiblock data; such analysis is essential for discovering biomarkers in computational biology. The performance of MultiDA was assessed in extensive simulation experiments, in which it outperformed related methods. As a target application, we applied MultiDA to human brain data on psychiatric disorders. The findings and the gene regulatory network derived from the experiment are discussed.
Mingon Kang, Dong-Chul Kim, Chunyu Liu, and Jean Gao
Copyright © 2015 Mingon Kang et al. All rights reserved.

Intelligent Informatics in Translational Medicine
Wed, 06 May 2015 08:30:36 +0000
Hao-Teng Chang, Tatsuya Akutsu, Sorin Draghici, Oliver Ray, and Tun-Wen Pai
Copyright © 2015 Hao-Teng Chang et al. All rights reserved.

Implication of Caspase-3 as a Common Therapeutic Target for Multineurodegenerative Disorders and Its Inhibition Using Nonpeptidyl Natural Compounds
Mon, 04 May 2015 13:43:02 +0000
Caspase-3 has been identified as a key mediator of neuronal apoptosis. The present study identifies caspase-3 as a common player in the regulation of multiple neurodegenerative disorders, namely, Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HD), and amyotrophic lateral sclerosis (ALS). The protein interaction network built with the STRING database provides strong evidence of caspase-3 interactions with the metabolic cascades of these disorders, thus characterizing it as a potential therapeutic target for multiple neurodegenerative disorders. In silico molecular docking of selected nonpeptidyl natural compounds against caspase-3 revealed potent leads against this common therapeutic target.
Rosmarinic acid and curcumin proved to be the most promising ligands (leads), mimicking the inhibitory action of peptidyl inhibitors, with the highest GOLD fitness scores of 57.38 and 53.51, respectively. These results were in close agreement with the scores predicted by X-Score, a consensus-based scoring function for calculating binding affinity. The nonpeptidyl inhibitors of caspase-3 identified in the present study effectively mimic the inhibitory action of previously identified peptidyl inhibitors. Because nonpeptidyl inhibitors are preferred drug candidates, the discovery of natural compounds as nonpeptidyl inhibitors is a significant step toward feasible drug development for neurodegenerative disorders.
Saif Khan, Khurshid Ahmad, Eyad M. A. Alshammari, Mohd Adnan, Mohd Hassan Baig, Mohtashim Lohani, Pallavi Somvanshi, and Shafiul Haque
Copyright © 2015 Saif Khan et al. All rights reserved.

The TF-miRNA Coregulation Network in Oral Lichen Planus
Sun, 03 May 2015 12:33:38 +0000
Oral lichen planus (OLP) is a chronic inflammatory disease of the oral mucosa, some cases of which may eventually develop into oral squamous cell carcinoma. Pinpointing the molecular mechanisms underlying the pathogenesis of OLP is therefore important for developing effective treatments. Recently, the accumulation of large amounts of omics data, especially transcriptome data, has provided opportunities to investigate OLP from a systematic perspective. In this paper, assuming that OLP-associated genes are functionally related, we present a new approach to identify OLP-related gene modules from gene regulatory networks. In particular, we find that the gene modules regulated by both transcription factors (TFs) and microRNAs (miRNAs) play important roles in the pathogenesis of OLP, and many genes in these modules have been reported in the literature to be related to OLP.
Yu-Ling Zuo, Di-Ping Gong, Bi-Ze Li, Juan Zhao, Ling-Yue Zhou, Fang-Yang Shao, Zhao Jin, and Yuan He
Copyright © 2015 Yu-Ling Zuo et al. All rights reserved.

Prediction of Metabolic Gene Biomarkers for Neurodegenerative Disease by an Integrated Network-Based Approach
Sun, 03 May 2015 11:35:44 +0000
Neurodegenerative diseases (NDs), such as Parkinson’s disease (PD) and Huntington’s disease (HD), have become increasingly common among elderly people worldwide. One hallmark of NDs is the intracellular accumulation of specific pathogenic proteins, which may result from abnormal metabolic processes. We previously developed a computational method named Met-express that predicts key enzyme-coding genes in cancer development by integrating a cancer gene coexpression network with the metabolic network. Here, we applied Met-express to predict key enzyme-coding genes in both PD and HD. Functional enrichment analysis and a literature review of the predicted genes suggested that PD and HD may share some pathogenic metabolic pathways. We further found that the predicted genes had significant functional associations with known disease genes, and some of them are already documented as biomarkers or therapeutic targets for NDs. As such, the predicted metabolic genes may be useful as novel biomarkers not only for ND diagnosis but also for potential therapeutic treatments.
Qi Ni, Xianming Su, Jingqi Chen, and Weidong Tian
Copyright © 2015 Qi Ni et al. All rights reserved.

Identification of Gene and MicroRNA Signatures for Oral Cancer Developed from Oral Leukoplakia
Sun, 03 May 2015 11:12:40 +0000
Clinically, oral leukoplakia (OLK) may develop into oral cancer. However, the mechanism underlying this transformation is still unclear. In this work, we present a new pipeline to identify oral cancer-related genes and microRNAs (miRNAs) by integrating gene and miRNA expression profiles.
In particular, we identify network modules, together with their miRNA regulators, that play important roles in the progression from OLK to oral cancer. Among these network modules, 91.67% of the genes and 37.5% of the miRNAs have previously been reported in the literature to be related to oral cancer. These promising results demonstrate the effectiveness and efficiency of our approach.
Guanghui Zhu, Yuan He, Shaofang Yang, Beimin Chen, Min Zhou, and Xin-Jian Xu
Copyright © 2015 Guanghui Zhu et al. All rights reserved.

A Heparan Sulfate-Binding Cell Penetrating Peptide for Tumor Targeting and Migration Inhibition
Sun, 03 May 2015 09:21:33 +0000
Because heparan sulfate proteoglycans (HSPGs) act as co-receptors that interact with numerous growth factors and thereby modulate downstream biological activities, overexpression of HS/HSPG on the cell surface is an increasingly reliable prognostic factor in tumor progression. Cell-penetrating peptides (CPPs) are short peptides developed as functionalized vectors for delivering impermeable agents. On the cell surface, negatively charged HS mediates the initial attachment of basic CPPs through electrostatic interactions, leading to multiple cellular effects. Here, a functional peptide (CPPecp) was identified from the critical HS-binding region of hRNase3, a unique RNase family member with in vitro antitumor activity. In this study, we analyze a set of HS-binding CPPs derived from natural proteins, including CPPecp. In addition to cellular binding and internalization, CPPecp demonstrated multiple functions, including strong binding to the surface of tumor cells with higher HS expression, significant inhibition of cancer cell migration, and suppression of angiogenesis in vitro and in vivo. Moreover, unlike conventional highly basic CPPs, CPPecp enabled a magnetic nanoparticle to selectively target the tumor site in vivo.
Therefore, CPPecp could be developed into biomaterials for use as a diagnostic imaging agent, a therapeutic supplement, or a functionalized vector for drug delivery.
Chien-Jung Chen, Kang-Chiao Tsai, Ping-Hsueh Kuo, Pei-Lin Chang, Wen-Ching Wang, Yung-Jen Chuang, and Margaret Dah-Tsyr Chang
Copyright © 2015 Chien-Jung Chen et al. All rights reserved.

A Survey on the Computational Approaches to Identify Drug Targets in the Postgenomic Era
Tue, 28 Apr 2015 07:02:35 +0000
Identifying drug targets plays an essential role in designing new drugs and combating diseases. Unfortunately, our current knowledge about drug targets is far from comprehensive, and screening drug targets in the lab is expensive and time-consuming. In the past decade, the accumulation of various types of omics data has made it possible to develop computational approaches to predict drug targets. In this paper, we survey recent progress in computational methods developed to predict drug targets based on different kinds of omics data and drug property data.
Yan-Fen Dai and Xing-Ming Zhao
Copyright © 2015 Yan-Fen Dai and Xing-Ming Zhao. All rights reserved.

A Large-Scale Structural Classification of Antimicrobial Peptides
Mon, 27 Apr 2015 12:43:57 +0000
Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses. Abundant AMP sequences and structures are available, two fundamental resources for bioinformatics research, but analyses of how sequences and structures relate to each other are either nonexistent or limited to partial classifications and data. We therefore present A Database of Anti-Microbial peptides (ADAM), which contains 7,007 unique sequences and 759 structures, to systematically establish comprehensive associations between AMP sequences and structures through structural folds and to provide easy access for viewing their relationships.
Thirty distinct AMP structural fold clusters containing more than one structure were detected, and about a thousand AMPs are associated with at least one structural fold cluster. According to ADAM, AMP structural folds are limited: AMPs cover only about 3% of the overall protein fold space.
Hao-Ting Lee, Chen-Che Lee, Je-Ruei Yang, Jim Z. C. Lai, and Kuan Y. Chang
Copyright © 2015 Hao-Ting Lee et al. All rights reserved.

Predicting Flavin and Nicotinamide Adenine Dinucleotide-Binding Sites in Proteins Using the Fragment Transformation Method
Mon, 27 Apr 2015 11:48:34 +0000
We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank the structures of proteins that bind at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential was calculated for each residue. With the false positive rate set at 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins and, most importantly, it can verify that conservation of residue types and local structures is required in FAD- and NAD-binding sites.
Chih-Hao Lu, Chin-Sheng Yu, Yu-Feng Lin, and Jin-Yi Chen
Copyright © 2015 Chih-Hao Lu et al. All rights reserved.

Functional Genomics, Genetics, and Bioinformatics
Wed, 22 Apr 2015 06:20:06 +0000
Youping Deng, Hongwei Wang, Ryuji Hamamoto, David Schaffer, and Shiwei Duan
Copyright © 2015 Youping Deng et al. All rights reserved.
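The NAD/FAD-binding site entry above reports true positive rates at a false positive rate fixed at 5%. As a minimal sketch, assuming each residue carries a real-valued ligand-binding score and a binary label (1 for a known binding residue), such an operating point can be read off by lowering a score threshold until the false positive rate would exceed the target. The function name `tpr_at_fpr` and the toy scores below are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float = 0.05) -> Tuple[float, float, float]:
    """Return (threshold, tpr, fpr) for the lowest score threshold whose
    false positive rate does not exceed max_fpr.

    A residue is predicted as binding when its score is >= threshold;
    labels are 1 for binding residues and 0 for non-binding residues.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both binding and non-binding residues")

    best = (float("inf"), 0.0, 0.0)  # predict nothing: TPR = FPR = 0
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fpr = fp / n_neg
        if fpr > max_fpr:
            break  # lower thresholds can only add false positives
        best = (t, tp / n_pos, fpr)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(tpr_at_fpr(scores, labels))                # (0.8, 0.5, 0.0)
print(tpr_at_fpr(scores, labels, max_fpr=0.25))  # (0.5, 1.0, 0.25)
```

Because the false positive count can only grow as the threshold drops, the loop stops at the first threshold that breaks the limit. The per-threshold recount keeps the sketch readable; sorting the residues once and sweeping would bring it down to O(n log n).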
Integrated Analysis of Multiscale Large-Scale Biological Data for Investigating Human Disease
Mon, 20 Apr 2015 09:13:13 +0000
Tao Huang, Lei Chen, Mingyue Zheng, and Jiangning Song
Copyright © 2015 Tao Huang et al. All rights reserved.

Application of Systems Biology and Bioinformatics Methods in Biochemistry and Biomedicine 2014
Sun, 19 Apr 2015 11:37:29 +0000
Yudong Cai, Tao Huang, Lei Chen, and Bing Niu
Copyright © 2015 Yudong Cai et al. All rights reserved.

A Practical and Scalable Tool to Find Overlaps between Sequences
Sun, 19 Apr 2015 10:48:30 +0000
The evolution of next-generation sequencing technology has increased the demand for solutions to several bioinformatics problems that are efficient in both space and time. This paper presents a practical and easy-to-implement solution to one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree. The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure and presents techniques for parallel implementations of the proposed solution. Experimental evaluation shows better space and time performance than existing solutions. The results also show that the proposed technique is highly scalable in a parallel execution environment.
Maan Haj Rachid and Qutaibah Malluhi
Copyright © 2015 Maan Haj Rachid and Qutaibah Malluhi. All rights reserved.

High Order Gene-Gene Interactions in Eight Single Nucleotide Polymorphisms of Renin-Angiotensin System Genes for Hypertension Association Study
Sun, 19 Apr 2015 09:58:13 +0000
Several single nucleotide polymorphisms (SNPs) of renin-angiotensin system (RAS) genes are associated with hypertension (HT), but most studies have focused on single-locus effects. Here, we introduce an unbalanced function based on multifactor dimensionality reduction (MDR) for multilocus genotypes to detect high-order gene-gene (SNP-SNP) interactions in HT data with unbalanced numbers of cases and controls.
Eight SNPs of three RAS genes (angiotensinogen, AGT; angiotensin-converting enzyme, ACE; angiotensin II type 1 receptor, AT1R), none of which showed significant genotype differences between HT and non-HT subjects, were included. In 2- to 6-locus models of SNP-SNP interaction, the SNPs of the AGT and ACE genes were associated with hypertension (bootstrapping odds ratio [Boot-OR] = 1.972–3.785; 95% confidence interval (CI) = 1.26–6.21; ). In the 7- and 8-locus models, SNP A1166C of the AT1R gene joins the interaction, raising the maximum Boot-OR values to 4.050–4.483 (CI = 2.49–7.29; ). In conclusion, the epistasis networks are identified by eight SNP-SNP interaction models. The AGT, ACE, and AT1R genes have overall effects on susceptibility to hypertension; the ACE SNPs have the main hypertension-associated effect and interact with SNPs of the AGT and AT1R genes.
Cheng-Hong Yang, Yu-Da Lin, Shyh-Jong Wu, Li-Yeh Chuang, and Hsueh-Wei Chang
Copyright © 2015 Cheng-Hong Yang et al. All rights reserved.

Effect of Electrode Shape on Impedance of Single HeLa Cell: A COMSOL Simulation
Thu, 16 Apr 2015 08:39:21 +0000
In disease prophylaxis, single-cell inspection provides more detailed data than conventional examinations. At the individual cell level, the electrical properties of a cell help in understanding the effects of cellular behavior. The electric field distribution affects the results of single-cell impedance measurements, and the electrode geometry in turn affects the electric field distribution. Therefore, this study used the COMSOL Multiphysics package to perform finite element method (FEM) simulations of the effects of electrode geometry in microfluidic devices. An equivalent circuit model incorporating the PBS solution, a pair of electrodes, and a cell is used to obtain the impedance of a single HeLa cell.
The simulations indicated that circle and parallel electrodes provide higher electric field strength than cross and standard electrodes at the same operating voltage. Additionally, increasing the operating voltage reduces the impedance magnitude of a single HeLa cell for all electrode shapes. A lower impedance magnitude of the single HeLa cell increases measurement sensitivity, but a higher operating voltage will damage the cell.
Min-Haw Wang and Wen-Hao Chang
Copyright © 2015 Min-Haw Wang and Wen-Hao Chang. All rights reserved.

Evolutionary Pattern and Regulation Analysis to Support Why Diversity Functions Existed within PPAR Gene Family Members
Wed, 15 Apr 2015 14:21:56 +0000
Peroxisome proliferator-activated receptor (PPAR) gene family members exhibit distinct tissue distribution patterns and differ in function. The purpose of this study is to investigate evolutionary impacts on the functional diversity of PPAR members and regulatory differences in their gene expression patterns. Sixty-three homologous sequences of PPAR genes from 31 species were collected and analyzed. The results showed that the three distinct types of the PPAR gene family may have emerged from two gene duplication events. The conserved HOLI (ligand-binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variant LCR domains may be responsible for their functional divergence. The positively selected sites in the HOLI domain favor the evolution of PPARs toward diverse functions. Evolutionary variants in the promoter and 3′ UTR regions of PPARs result in different transcription factors and miRNAs regulating PPAR members, which may eventually affect their expression and tissue distribution.
These results indicate that gene duplication, selection pressure on the HOLI domain, and variants in the promoter and 3′ UTR are essential for the evolution of PPARs and their acquisition of diverse functions. Tianyu Zhou, Xiping Yan, Guosong Wang, Hehe Liu, Xiang Gan, Tao Zhang, Jiwen Wang, and Liang Li Copyright © 2015 Tianyu Zhou et al. All rights reserved. Relationship between Hyperuricemia and Haar-Like Features on Tongue Images Wed, 15 Apr 2015 13:36:54 +0000 Objective. To investigate differences in tongue images of subjects with and without hyperuricemia. Materials and Methods. This population-based case-control study was performed in 2012-2013. We collected data, including results of biochemical examinations and tongue images, from 46 case subjects with hyperuricemia and 46 control subjects. Symmetrical Haar-like features based on integral images were extracted from the tongue images. We performed t-tests to determine the ability of the extracted features to distinguish between the case and control groups. We first selected features using the common criterion , and then further examined feature characteristics and selected features using the means and standard deviations of their distributions in the case and control groups. Results. A total of 115,683 features were selected using the criterion . The maximum area under the receiver operating characteristic curve (AUC) of these features was 0.877. For the feature with the maximum AUC value, sensitivity was 0.800 and specificity was 0.826 when the Youden index was maximized. Features that performed well were concentrated in the tongue root region. Conclusions. Symmetrical Haar-like features enabled discrimination of subjects with and without hyperuricemia in our sample. The locations of these discriminative features were in agreement with the interpretation of tongue appearance in traditional Chinese and Western medicine.
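The hyperuricemia entry extracts Haar-like features from integral images. As a minimal sketch (not the authors' code), an integral image lets any rectangle sum be read with four lookups, and a basic two-rectangle Haar-like feature is the difference between two adjacent rectangle sums. The function names and the toy patch are hypothetical, and the study's symmetrical feature construction is not reproduced here.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """Summed-area table with an extra leading zero row and column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1].
    """
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above the current row plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, width: int, height: int) -> float:
    """Sum of the width x height rectangle whose top-left pixel is (x, y)."""
    return (ii[y + height][x + width] - ii[y][x + width]
            - ii[y + height][x] + ii[y][x])


def haar_two_rect(ii: Image, x: int, y: int, width: int, height: int) -> float:
    """Two-rectangle Haar-like feature: left half minus right half."""
    if width % 2:
        raise ValueError("width must be even")
    half = width // 2
    return rect_sum(ii, x, y, half, height) - rect_sum(ii, x + half, y, half, height)


if __name__ == "__main__":
    toy = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical 2 x 4 grayscale patch
    ii = integral_image(toy)
    print(rect_sum(ii, 0, 0, 4, 2))       # 36.0
    print(haar_two_rect(ii, 0, 0, 4, 2))  # -8.0
```

Because every rectangle sum costs the same four lookups regardless of its size, very large feature pools (such as the 115,683 features mentioned above) remain cheap to evaluate once the integral image is built.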
Yan Cui, Shizhong Liao, Hongwu Wang, Hongyu Liu, Wenhua Wang, and Liqun Yin Copyright © 2015 Yan Cui et al. All rights reserved. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling Wed, 15 Apr 2015 12:38:00 +0000 The artificial bee colony (ABC) algorithm is a relatively recent swarm intelligence optimization approach. In this paper, we present the first application of the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we combine the minimum redundancy maximum relevance (mRMR) feature selection algorithm with the ABC algorithm in a new hybrid, mRMR-ABC, to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) classifier to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm through extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare mRMR-ABC with previously known techniques. For a fair comparison, we reimplemented two of these techniques using the same parameters: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that, compared with previously suggested methods, mRMR-ABC achieves accurate classification using a small number of predictive genes on both the binary and the multiclass datasets. This indicates that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. Hala Alshamlan, Ghada Badr, and Yousef Alohali Copyright © 2015 Hala Alshamlan et al. All rights reserved.
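The mRMR stage of mRMR-ABC can be illustrated with a greedy filter. This is a sketch under stated assumptions: gene expression values are already discretized (for example into low/medium/high levels), and the common difference form of mRMR is used, scoring each candidate by its relevance I(f; c) to the class label minus its mean redundancy I(f; s) with already selected genes. The subsequent ABC search and SVM evaluation are not shown, and all names are hypothetical.

```python
from collections import Counter
from math import log
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_select(features: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedily pick up to k feature indices by relevance minus mean redundancy."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = set(range(len(features)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            if not selected:
                return relevance[i]  # first pick: most relevant feature
            redundancy = sum(
                mutual_information(features[i], features[j]) for j in selected
            ) / len(selected)
            return relevance[i] - redundancy
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    gene_a = [0, 0, 1, 1]
    gene_b = [0, 1, 0, 1]
    label = [0, 1, 1, 1]  # toy label = gene_a OR gene_b
    # A duplicate of gene_a is redundant, so mRMR prefers gene_b as the second pick.
    print(sorted(mrmr_select([gene_a, list(gene_a), gene_b], label, 2)))  # [0, 2]
```

The redundancy term is what distinguishes mRMR from ranking genes by relevance alone: two perfectly correlated genes add little information, so the second copy is penalized even though it is individually just as relevant.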
Identification of Novel Breast Cancer Subtype-Specific Biomarkers by Integrating Genomics Analysis of DNA Copy Number Aberrations and miRNA-mRNA Dual Expression Profiling Wed, 15 Apr 2015 11:49:05 +0000 Breast cancer is a heterogeneous disease with well-defined molecular subtypes. Array comparative genomic hybridization (aCGH) techniques have developed rapidly, and recent evidence from breast cancer studies suggests that tumors within the same gene expression subtype share similar DNA copy number aberrations (CNA), which can be used to further subdivide subtypes. Moreover, subtype-specific miRNA expression profiles have also been proposed as novel signatures for breast cancer classification. Identifying mRNA- or miRNA-expression-based breast cancer subtypes is considered informative for prognosis. Here, we conducted an integrated analysis of copy number aberration data and miRNA-mRNA dual expression profiling data to identify breast cancer subtype-specific biomarkers. Interestingly, we found a group of genes residing in subtype-specific CNA regions that also showed corresponding changes in their mRNA levels and in the expression of their target miRNAs. Among them, the predicted direct correlation between BRCA1 and miR-143/miR-145 was selected for experimental validation. The results indicated that BRCA1 positively regulates miR-143/miR-145 expression and that miR-143/miR-145 can serve as promising novel biomarkers for breast cancer subtyping. Our integrated genomics analysis and experimental validation provide a new framework for predicting candidate biomarkers of breast cancer subtypes and may help in understanding the potential etiology of these subtypes. Dongguo Li, Hong Xia, Zhen-ya Li, Lin Hua, and Lin Li Copyright © 2015 Dongguo Li et al. All rights reserved.
Improving the Understanding of Pathogenesis of Human Papillomavirus 16 via Mapping Protein-Protein Interaction Network Wed, 15 Apr 2015 11:48:17 +0000 Human papillomavirus 16 (HPV16) is a high-risk type that can lead to various cancers and other diseases, especially cervical cancer. Therefore, investigating the pathogenesis of HPV16 is very important for public health. The protein-protein interaction (PPI) network between HPV16 and human proteins was used to improve our understanding of its pathogenesis. Using sequence and topological features, we built a support vector machine (SVM) model to predict new interactions between HPV16 and human proteins. All interactions were comprehensively investigated and analyzed. The analysis indicated that HPV16 enlarges its scope of influence by interacting with as many human proteins as possible. These interactions alter many aspects of cell cycle progression. Furthermore, HPV16 was not only highly prone to interact with hub proteins and bottleneck proteins but could also affect a broad range of signaling pathways. In addition, we found that HPV16 evolved high carcinogenicity on the condition that its own reproduction had been ensured. This work will also help provide potential new targets for antiviral therapeutics and support future experimental research. Yongcheng Dong, Qifan Kuang, Xu Dai, Rong Li, Yiming Wu, Weijia Leng, Yizhou Li, and Menglong Li Copyright © 2015 Yongcheng Dong et al. All rights reserved. Coexpression Pattern Analysis of NPM1-Associated Genes in Chronic Myelogenous Leukemia Wed, 15 Apr 2015 09:10:06 +0000 Background. Nucleophosmin 1 (NPM1) plays an important role in ribosomal synthesis and malignancies, but NPM1 mutations rarely occur in blast-crisis and chronic-phase chronic myelogenous leukemia (CML) patients. The NPM1-associated gene set (GCM_NPM1), comprising 116 genes including NPM1, was chosen as the candidate gene set for the coexpression analysis.
We asked whether NPM1-associated genes can affect the ribosomal synthesis and translation process in CML. Results. We presented a distribution-based approach to gene pair classification that identifies a disease-specific cutoff point classifying coexpressed gene pairs into strong and weak coexpression structures. Differences in coexpression patterns between the normal and CML groups were assessed over the overall structure using the two-sample Kolmogorov-Smirnov test. Our method effectively identified coexpression pattern differences in the overall structure: for the maximum deviation . Moreover, we found that genes involved in the ribosomal synthesis and translation process tended to be coexpressed in the CML group. Conclusion. Our method can identify coexpression differences between two groups. Dysregulation of the ribosomal synthesis and translation process may be related to CML. These findings may provide useful information for exploring novel CML mechanisms and for cancer treatment. Fengfeng Wang, Lawrence W. C. Chan, Nancy B. Y. Tsui, S. C. Cesar Wong, Parco M. Siu, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2015 Fengfeng Wang et al. All rights reserved. Prediction of Antifungal Activity of Gemini Imidazolium Compounds Wed, 15 Apr 2015 08:44:15 +0000 Progress in antimicrobial therapy has contributed to the emergence of fungal strains resistant to antimicrobial drugs. Since cationic surfactants have been described as good antifungals, we present a structure-activity relationship (SAR) study of a novel homologous series of 140 bis-quaternary imidazolium chlorides and analyze their biological activity against Candida albicans, one of the major opportunistic pathogens causing a wide spectrum of diseases in humans. We characterize a set of features of these compounds concerning their structure, molecular descriptors, and surface active properties.
The SAR study was conducted with the Dominance-Based Rough Set Approach (DRSA), which identifies relevant features, and relevant combinations of features, that are strongly related to high antifungal activity of the compounds. The SAR study also shows that the antifungal activity depends on the type of substituents and their position at the chloride moiety, as well as on the surface active properties of the compounds. We also show that the molecular descriptors MlogP, HOMO-LUMO gap, total structure connectivity index, and Wiener index may be useful in predicting the antifungal activity of new chemical compounds. Łukasz Pałkowski, Jerzy Błaszczyński, Andrzej Skrzypczak, Jan Błaszczak, Alicja Nowaczyk, Joanna Wróblewska, Sylwia Kożuszko, Eugenia Gospodarek, Roman Słowiński, and Jerzy Krysiński Copyright © 2015 Łukasz Pałkowski et al. All rights reserved. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction Wed, 15 Apr 2015 08:27:29 +0000 Understanding the specificity of HIV-1 protease is crucial for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and to find the important positions in an octapeptide that determine its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used, taking both physicochemical and sequence information into consideration. The results of feature selection show that , , , and are the most important positions. Two feature fusion methods, combination fusion and decision fusion, are used to obtain a comprehensive feature representation and improve prediction performance.
Decision fusion of the feature subsets obtained after feature selection achieves excellent prediction performance, which shows that feature selection combined with decision fusion is an effective and useful method for HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance for designing HIV-1 protease inhibitors in the future. Hui Liu, Xiaomiao Shi, Dongmei Guo, Zuowei Zhao, and Yimin Copyright © 2015 Hui Liu et al. All rights reserved. Identification of Novel Potential Vaccine Candidates against Tuberculosis Based on Reverse Vaccinology Wed, 15 Apr 2015 08:06:14 +0000 Tuberculosis (TB), caused by Mycobacterium tuberculosis, is a chronic infectious disease considered the second leading cause of death from infectious disease worldwide. The limited efficacy of the bacillus Calmette-Guérin (BCG) vaccine against pulmonary TB and the emergence of multidrug-resistant TB warrant the need for more efficacious vaccines. Reverse vaccinology uses the entire proteome of a pathogen to select the best vaccine antigens by in silico approaches. The M. tuberculosis H37Rv proteome was analyzed with the NERVE (New Enhanced Reverse Vaccinology Environment) prediction software to identify potential vaccine targets; the resulting 331 proteins were further analyzed with VaxiJen to determine their antigenicity values. Only candidates with an antigenicity value ≥0.5, an adhesin probability of at least 50%, no homology to human proteins, and no transmembrane regions were selected, resulting in 73 antigens. These proteins were grouped by family into seven groups and analyzed by amino acid sequence alignment, and 16 representative proteins were selected. For each candidate, a literature search, protein analysis with different bioinformatics tools, and a simulation of the immune response were conducted. Finally, we selected six novel vaccine candidates, EsxL, PE26, PPE65, PE_PGRS49, PBP1, and Erp, from M.
tuberculosis that can be used to improve or design new TB vaccines. Gloria P. Monterrubio-López, Jorge A. González-Y-Merchand, and Rosa María Ribas-Aparicio Copyright © 2015 Gloria P. Monterrubio-López et al. All rights reserved. Transcriptomic Analysis of mRNAs in Human Monocytic Cells Expressing the HIV-1 Nef Protein and Their Exosomes Wed, 15 Apr 2015 07:57:37 +0000 The Nef protein of human immunodeficiency virus (HIV) promotes viral replication and progression to AIDS. Besides its well-studied effects on intracellular signaling, Nef also functions through its secretion in exosomes, which are nanovesicles that contain proteins, microRNAs, and mRNAs and are important for intercellular communication. Nef expression enhances exosome secretion, and these exosomes can enter uninfected CD4 T cells, leading to apoptotic death. We recently reported the first miRNome analysis of exosomes secreted from Nef-expressing U937 monocytic cells. Here we present a genome-wide transcriptome analysis of Nef-expressing U937 cells and their exosomes. We identified four key mRNAs preferentially retained in Nef-expressing cells; these code for MECP2, HMOX1, AARSD1, and ATF2, which are important for chromatin modification and gene expression. Interestingly, their target miRNAs are exported in exosomes. We also identified three key mRNAs that are selectively secreted in exosomes from Nef-expressing U937 cells, while their corresponding miRNAs are preferentially retained in the cells. These are AATK, SLC27A1, and CDKAL, which are important in apoptosis and fatty acid transport. Thus, our study identifies selectively expressed mRNAs in Nef-expressing U937 cells and their exosomes and supports a new mode of intercellular regulation by the HIV-1 Nef protein. Madeeha Aqil, Saurav Mallik, Sanghamitra Bandyopadhyay, Ujjwal Maulik, and Shahid Jameel Copyright © 2015 Madeeha Aqil et al. All rights reserved.
Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods Wed, 15 Apr 2015 06:14:19 +0000 Glycation is a nonenzymatic process in which proteins react with reducing sugar molecules. Identifying glycation sites in proteins may help in understanding the biological function of protein glycation. In this study, we developed a computational method to predict protein glycation sites using a support vector machine classifier. The experimental results showed a prediction accuracy of 85.51% and an overall MCC of 0.70. Feature analysis indicated that the composition of k-spaced amino acid pairs feature contributed the most to glycation site prediction. Yan Liu, Wenxiang Gu, Wenyi Zhang, and Jianan Wang Copyright © 2015 Yan Liu et al. All rights reserved. A Gas Chromatography-Mass Spectrometry Based Study on Urine Metabolomics in Rats Chronically Poisoned with Hydrogen Sulfide Tue, 14 Apr 2015 17:02:10 +0000 Gas chromatography-mass spectrometry (GC-MS) combined with multivariate statistical analysis was applied to explore the metabolic variability in the urine of rats chronically poisoned with hydrogen sulfide (H2S) relative to controls. Changes in endogenous metabolites were studied by partial least squares-discriminant analysis (PLS-DA) and independent-samples t-test. The metabolic patterns of the H2S-poisoned group were separated from those of the control group, suggesting that the metabolic profiles of H2S-poisoned rats were markedly different from those of the controls. Moreover, compared with the control group, the urinary levels of alanine, d-ribose, tetradecanoic acid, L-aspartic acid, pentanedioic acid, cholesterol, acetate, and oleic acid in the poisoned group decreased, while the levels of glycine, d-mannose, arabinofuranose, and propanoic acid increased. These metabolites are related to amino acid metabolism as well as energy and lipid metabolism in vivo.
Studying metabolomics using GC-MS allows for a comprehensive overview of the metabolism of the living body. This technique can be employed to decipher the mechanism of chronic H2S poisoning, thus promoting the use of metabolomics in clinical toxicology. Mingjie Deng, Meiling Zhang, Fa Sun, Jianshe Ma, Lufeng Hu, Xuezhi Yang, Guanyang Lin, and Xianqin Wang Copyright © 2015 Mingjie Deng et al. All rights reserved. An Integrated Modeling and Experimental Approach to Study the Influence of Environmental Nutrients on Biofilm Formation of Pseudomonas aeruginosa Tue, 14 Apr 2015 16:59:46 +0000 The availability of nutrient components in the environment has been identified as a critical regulator of virulence and biofilm formation in Pseudomonas aeruginosa. This work proposes the first systems-biology approach to quantify microbial biofilm formation in response to changes in environmental nutrient availability. Specifically, changes in the fluxes of metabolic reactions positively associated with P. aeruginosa biofilm formation were used to monitor the tendency of P. aeruginosa to form a biofilm. The uptake rates of nutrient components were changed according to the change in nutrient availability. We found that adding each of eleven amino acids (Arg, Tyr, Phe, His, Iso, Orn, Pro, Glu, Leu, Val, and Asp) to minimal medium promoted P. aeruginosa biofilm formation. Both modeling and experimental approaches were further developed to quantify P. aeruginosa biofilm formation at four availability levels for each of three ions: ferrous ions, sulfate, and phosphate. The modeling approach correctly predicted the amount of biofilm formation. Comparing reaction flux changes across nutrient concentrations showed that the metabolic reactions P. aeruginosa uses to regulate its biofilm formation mainly involve arginine metabolism, glutamate production, magnesium transport, acetate metabolism, and the TCA cycle.
Zhaobin Xu, Sabina Islam, Thomas K. Wood, and Zuyi Huang Copyright © 2015 Zhaobin Xu et al. All rights reserved. Computer-Simulated Biopsy Marking System for Endoscopic Surveillance of Gastric Lesions: A Pilot Study Tue, 14 Apr 2015 16:57:39 +0000 Endoscopic tattooing by India ink injection for surveillance of premalignant gastric lesions is technically cumbersome and may not be durable. The aim of the study was to evaluate the accuracy of a novel computer-simulated biopsy marking system (CSBMS) developed for the endoscopic marking of gastric lesions. Twenty-five patients with a history of gastric intestinal metaplasia received both CSBMS-guided marking and India ink injection at five points in the stomach at index endoscopy. A second endoscopy was performed at three months. The primary outcome was the accuracy of CSBMS (the distance between the CSBMS probe-guided site and the tattoo site, measured by CSBMS). The mean accuracy of CSBMS was  mm at the angularis,  mm at the antral lesser curvature,  mm at the antral greater curvature,  mm at the antral anterior wall, and  mm at the antral posterior wall. CSBMS required less procedure time than endoscopic tattooing ( versus seconds; ). No adverse events were encountered. On follow-up endoscopy, CSBMS accurately identified gastric sites previously marked by endoscopic tattooing to within 1 cm. Weiling Hu, Bin Wang, Leimin Sun, Shujie Chen, Liangjing Wang, Kan Wang, Jiaguo Wu, John J. Kim, Jiquan Liu, Ning Dai, Huilong Duan, and Jianmin Si Copyright © 2015 Weiling Hu et al. All rights reserved. Intuitive Web-Based Experimental Design for High-Throughput Biomedical Data Tue, 14 Apr 2015 11:18:12 +0000 Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data are accompanied by accurate metadata annotation.
Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions studied as well as information that might be of interest for failure analysis or future experiments. Beyond managing this information, researchers urgently need means for integrated design and interfaces for structured data annotation. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface that allows the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and metainformation for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model. Andreas Friedrich, Erhan Kenar, Oliver Kohlbacher, and Sven Nahnsen Copyright © 2015 Andreas Friedrich et al. All rights reserved. A Method for Generating New Datasets Based on Copy Number for Cancer Analysis Wed, 08 Apr 2015 12:22:40 +0000 New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used by current methods of analysis. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data.
To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular data derived from cancers. Here we present a software package that preprocesses standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer. Shinuk Kim, Mark Kon, and Hyunsik Kang Copyright © 2015 Shinuk Kim et al. All rights reserved. Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process Mon, 06 Apr 2015 12:30:15 +0000 Recent technological advances have created challenges for geneticists, who need to adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g., mutation databases and software). This wide range of methods and the diversity of file formats used in sequence analysis are a significant issue: a considerable amount of time is spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that although many researchers possess “just enough” knowledge to analyse their data, they do not make full use of the available tools and databases and do not fully understand how their data were created. The primary aim of this review is to document some of the key approaches and provide an analysis schema that makes the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes.
This review will also compare the methods used to identify highly penetrant variants when data are obtained from consanguineous as opposed to nonconsanguineous individuals, and when Mendelian disorders are analysed as opposed to common complex disorders. A. Mesut Erzurumluoglu, Santiago Rodriguez, Hashem A. Shihab, Denis Baird, Tom G. Richardson, Ian N. M. Day, and Tom R. Gaunt Copyright © 2015 A. Mesut Erzurumluoglu et al. All rights reserved. Genetic Polymorphism in Extracellular Regulators of Wnt Signaling Pathway Sun, 05 Apr 2015 10:46:55 +0000 The Wnt signaling pathway is mediated by a family of secreted glycoproteins through canonical and noncanonical mechanisms. The signaling pathways are regulated by various modulators, which are classified into two classes on the basis of their interaction with either Wnt or its receptors. Secreted frizzled-related proteins (sFRPs) belong to the class that binds Wnt proteins and antagonizes the Wnt signaling pathway. The other class consists of the Dickkopf (DKK) protein family, which binds the Wnt receptor complex. The present review discusses the disease associations of various polymorphisms in Wnt signaling modulators. Furthermore, this review highlights that some sFRPs and DKKs are unable to act as antagonists of the Wnt signaling pathway, so their function needs to be explored more extensively. Garima Sharma, Ashish Ranjan Sharma, Eun-Min Seo, and Ju-Suk Nam Copyright © 2015 Garima Sharma et al. All rights reserved. Integrative Analysis of CRISPR/Cas9 Target Sites in the Human HBB Gene Tue, 31 Mar 2015 09:39:46 +0000 Recently, the clustered regularly interspaced short palindromic repeats (CRISPR) system has emerged as a powerful customizable artificial nuclease to facilitate precise genetic correction for tissue regeneration and isogenic disease modeling.
However, previous studies reported substantial off-target activity of the CRISPR system in human cells, and the enormous number of putative off-target sites is labor-intensive to validate experimentally, motivating bioinformatics methods for the rational design of CRISPR systems and the prediction of their potential off-target effects. Here, we describe an integrative analytical process to identify specific CRISPR target sites in the human β-globin gene (HBB) and predict their off-target effects. Our method includes off-target analysis in both coding and noncoding regions, which previous studies neglected. We found that CRISPR target sites in the introns have fewer off-target sites in coding regions than those in the exons. Remarkably, target sites containing certain transcription factor motifs have off-target sets enriched in binding sites of the corresponding transcription factors. We also found that the intron sites have fewer SNPs, which leads to less variation in CRISPR efficiency among individuals during clinical applications. Our studies provide a standard analytical procedure for selecting specific CRISPR targets for genetic correction. Yumei Luo, Detu Zhu, Zhizhuo Zhang, Yaoyong Chen, and Xiaofang Sun Copyright © 2015 Yumei Luo et al. All rights reserved. In Silico Search of Energy Metabolism Inhibitors for Alternative Leishmaniasis Treatments Mon, 30 Mar 2015 13:56:04 +0000 Leishmaniasis is a complex disease that affects mammals and is caused by approximately 20 distinct protozoan species of the genus Leishmania. Leishmaniasis is an endemic disease that exerts a large socioeconomic impact on poor and developing countries. The current treatment for leishmaniasis is complex, expensive, and poorly efficacious. Thus, there is an urgent need to develop new, more selective, and less expensive drugs. The energy metabolism pathways of Leishmania include several interesting targets for specific inhibitors.
In the present study, we sought to establish which energy metabolism enzymes in Leishmania could be targets for inhibitors already approved for the treatment of other diseases. We were able to identify 94 genes and 93 Leishmania energy metabolism targets. Using each gene’s designation as a search criterion in the TriTrypDB database, we located the predicted peptide sequences, which in turn were used to query the DrugBank, Therapeutic Target Database (TTD), and PubChem databases. We identified 44 putative targets, 11 of which are predicted to be amenable to inhibition by drugs already approved for use in humans. We propose that these drugs should be experimentally tested and potentially used in the treatment of leishmaniasis. Lourival A. Silva, Marina C. Vinaud, Ana Maria Castro, Pedro Vítor L. Cravo, and José Clecildo B. Bezerra Copyright © 2015 Lourival A. Silva et al. All rights reserved. Evaluation and Application of the Strand-Specific Protocol for Next-Generation Sequencing Sun, 29 Mar 2015 07:06:19 +0000 Next-generation sequencing (NGS) has become a powerful sequencing tool, applied in a wide range of biological studies. However, the traditional sample preparation protocol for NGS is non-strand-specific (NSS), leading to biased expression estimates for transcripts that overlap on the antisense strand. Strand-specific (SS) protocols have recently been developed. In this study, we prepared the same RNA sample using the SS and NSS protocols, followed by sequencing on the Illumina HiSeq platform. Using real-time quantitative PCR as a standard, we first showed that the SS protocol estimates gene expression more precisely than the NSS protocol, particularly for genes that overlap on the antisense strand. In addition, we showed that the sequence reads from the SS protocol are comparable to those from conventional NSS protocols in many aspects.
Finally, we mapped a fraction of the sequence reads to the antisense strand of known genes, where no genes had originally been annotated. Using sequence assembly and PCR validation, we identified and characterized novel antisense genes. Our results show that the SS protocol performs more accurately than the traditional NSS protocol and can be applied in future studies. Kuo-Wang Tsai, Bill Chang, Cheng-Tsung Pan, Wei-Chen Lin, Ting-Wen Chen, and Sung-Chou Li Copyright © 2015 Kuo-Wang Tsai et al. All rights reserved. Distributed Artificial Intelligence Models for Knowledge Discovery in Bioinformatics Wed, 25 Mar 2015 13:17:46 +0000 Juan M. Corchado, Isabelle Bichindaritz, and Juan F. De Paz Copyright © 2015 Juan M. Corchado et al. All rights reserved. A Linear-RBF Multikernel SVM to Classify Big Text Corpora Mon, 23 Mar 2015 08:13:54 +0000 Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classifying large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on SVMs and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM for high-dimensional data that provides automatic parameterization at low computational cost and improves results over SVMs parameterized by brute-force search. The model consists of splitting the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while its training is significantly faster than that of several other SVM classifiers. R. Romero, E. L. Iglesias, and L. Borrajo Copyright © 2015 R. Romero et al. All rights reserved.
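As an illustration of combining a linear and an RBF kernel, the sketch below forms a convex combination of the two and builds a Gram matrix from it. It assumes a single fixed weight and gamma chosen by hand; the paper's construction from cohesive term slices (clusters) and its automatic parameterization are not reproduced, and the function names are hypothetical.

```python
from math import exp
from typing import List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_rbf_kernel(x: Vector, y: Vector, weight: float = 0.5, gamma: float = 1.0) -> float:
    """Convex combination weight * linear + (1 - weight) * RBF (hypothetical fixed weights)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * linear_kernel(x, y) + (1.0 - weight) * rbf_kernel(x, y, gamma)


def gram_matrix(rows: List[Vector], weight: float = 0.5, gamma: float = 1.0) -> List[List[float]]:
    """Pairwise combined-kernel values for a small set of (e.g., term-frequency) vectors."""
    return [[linear_rbf_kernel(a, b, weight, gamma) for b in rows] for a in rows]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0]]  # two toy documents over a two-term vocabulary
    for row in gram_matrix(docs):
        print(row)
```

A matrix like this can be passed to an SVM solver that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, the combined function remains a valid kernel.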
A Network Flow Approach to Predict Protein Targets and Flavonoid Backbones to Treat Respiratory Syncytial Virus Infection Mon, 23 Mar 2015 06:08:56 +0000 Background. Respiratory syncytial virus (RSV) infection is the major cause of lower respiratory tract disease in infants and young children. Attempts to develop effective vaccines or pharmacological treatments that inhibit RSV infection without undesired effects on human health have been unsuccessful. However, flavonoids have been reported to affect RSV infection. The mechanisms underlying viral inhibition by these compounds are largely unknown, which makes the development of new drugs difficult. Methods. To understand the mechanisms by which flavonoids inhibit RSV infection, a systems pharmacology-based study was performed using microarray data from primary cultures of human bronchial cells infected with RSV, together with compound-protein interaction data available for Homo sapiens. Results. After an initial evaluation of 26 flavonoids, 5 compounds (resveratrol, quercetin, myricetin, apigenin, and tricetin) were identified through topological analysis of a major chemical-protein (CP) and protein-protein interaction (PPI) network. In a nonclustered form, these flavonoids directly regulate the activity of two protein bottlenecks involved in inflammation and apoptosis. Conclusions. Our findings may help uncover the mechanisms of action in early RSV infection and provide chemical backbones and their protein targets in the difficult quest to develop new effective drugs. José Eduardo Vargas, Renato Puga, Joice de Faria Poloni, Luis Fernando Saraiva Macedo Timmers, Barbara Nery Porto, Osmar Norberto de Souza, Diego Bonatto, Paulo Márcio Condessa Pitrez, and Renato Tetelbom Stein Copyright © 2015 José Eduardo Vargas et al. All rights reserved.
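The RSV abstract refers to "protein bottlenecks" found by topological analysis but does not name the measure used. Bottlenecks in PPI networks are often defined as nodes with high betweenness centrality; assuming that definition, the following is a minimal pure-Python sketch of Brandes' betweenness algorithm for an unweighted, undirected network. The toy network and node names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def betweenness(graph: Graph) -> Dict[Hashable, float]:
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    `graph` is an adjacency list in which every edge appears in both directions.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search counts shortest paths (sigma) from `source`.
        order: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


if __name__ == "__main__":
    # Hypothetical network: two triangles joined through a single "HUB" protein.
    ppi = {
        "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "HUB"],
        "HUB": ["C", "D"],
        "D": ["HUB", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
    }
    print(sorted(betweenness(ppi).items(), key=lambda kv: kv[1], reverse=True))
```

In this toy network every shortest path between the two triangles passes through `HUB`, so it receives the highest score, which is the intuition behind calling such nodes bottlenecks.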
Identification of Novel Thyroid Cancer-Related Genes and Chemicals Using Shortest Path Algorithm Sun, 22 Mar 2015 11:26:51 +0000 Thyroid cancer is a typical endocrine malignancy. Over the past three decades, the continued growth of its incidence has made the design of effective treatments urgent. To this end, it is necessary to uncover the mechanism underlying this disease, and identifying thyroid cancer-related genes and chemicals helps in understanding that mechanism. In this study, we generalized some previous methods to discover both disease genes and chemicals. The method was based on a shortest path algorithm and was applied to discover novel thyroid cancer-related genes and chemicals. Analysis of the resulting genes and chemicals suggests that some of them are crucial to the formation and development of thyroid cancer, indicating that the proposed method is effective for discovering novel disease genes and chemicals. Yang Jiang, Peiwei Zhang, Li-Peng Li, Yi-Chun He, Ru-jian Gao, and Yu-Fei Gao Copyright © 2015 Yang Jiang et al. All rights reserved. A Meta-Analysis Strategy for Gene Prioritization Using Gene Expression, SNP Genotype, and eQTL Data Sun, 22 Mar 2015 10:56:57 +0000 To understand disease pathogenesis, improve medical diagnosis, or discover effective drug targets, it is important to identify the significant genes deeply involved in human disease. For this purpose, many earlier approaches attempted to prioritize candidate genes using gene expression profiles or SNP genotype data, but they often produce many false-positive results. To address this issue, we propose a meta-analysis strategy for gene prioritization that integrates three genetic resources: gene expression data, single nucleotide polymorphism (SNP) genotype data, and expression quantitative trait loci (eQTL) data.
For integration, we used an improved version of the technique for order of preference by similarity to ideal solution (TOPSIS) to combine scores from the distinct resources. This method was evaluated on two publicly available datasets on prostate cancer and lung cancer to identify disease-related genes. Our proposed strategy outperformed conventional methods in discovering significant disease-related genes across several types of genetic resources, while making good use of potential complementarities among the available resources. Jingmin Che and Miyoung Shin Copyright © 2015 Jingmin Che and Miyoung Shin. All rights reserved. Analysis of Environmental Stress Factors Using an Artificial Growth System and Plant Fitness Optimization Sun, 22 Mar 2015 10:35:00 +0000 The environment promotes evolution. Evolutionary processes represent environmental adaptations over long time scales; the evolution of crop genomes cannot be induced within the relatively short span of a human generation. Extreme environmental conditions can accelerate evolution, but such conditions are often stressful and disruptive. Artificial growth systems can be used to induce and select genomic variation by changing external environmental conditions, thus accelerating evolution. Using cloud computing and big-data analysis, we analyzed environmental stress factors for Pleurotus ostreatus by assessing, evaluating, and predicting information about the growth environment. By indexing environmental stress, the growth environment can be precisely controlled and developed into a technology for improving crop quality and production. Meonghun Lee and Hyun Yoe Copyright © 2015 Meonghun Lee and Hyun Yoe. All rights reserved.
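For reference, the classical TOPSIS procedure that the gene prioritization abstract builds on ranks alternatives by their relative closeness to an ideal solution. The sketch below implements the standard variant, not the authors' "improved" one. It assumes each gene has one score per resource (expression, SNP, eQTL), that higher is better for every score, and equal weights; the scores themselves are hypothetical.

```python
import math
from typing import List, Sequence


def topsis(scores: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Classical TOPSIS relative closeness (1.0 = ideal, 0.0 = anti-ideal).

    scores[i][j] is alternative i's value on criterion j; benefit[j] is True
    when larger values are better for criterion j.
    """
    n_crit = len(weights)
    # 1. Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] if norms[j] else 0.0
                 for j in range(n_crit)] for row in scores]
    # 2. Positive-ideal and negative-ideal points, per criterion direction.
    columns = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    anti_ideal = [min(c) if b else max(c) for c, b in zip(columns, benefit)]
    # 3. Relative closeness C_i = d_minus / (d_plus + d_minus).
    closeness = []
    for row in weighted:
        d_plus = math.dist(row, ideal)
        d_minus = math.dist(row, anti_ideal)
        total = d_plus + d_minus
        closeness.append(d_minus / total if total else 0.0)
    return closeness


if __name__ == "__main__":
    # Hypothetical per-gene scores: [expression, SNP, eQTL].
    genes = ["G1", "G2", "G3"]
    scores = [[0.9, 0.8, 0.7], [0.5, 0.6, 0.4], [0.1, 0.2, 0.1]]
    c = topsis(scores, weights=[1 / 3] * 3, benefit=[True] * 3)
    print(sorted(zip(genes, c), key=lambda gc: gc[1], reverse=True))
```

A gene that is best on every resource sits exactly at the ideal point and gets closeness 1.0; one that is worst on every resource gets 0.0. Mixed evidence falls in between, which is how TOPSIS merges heterogeneous scores into a single ranking.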
Agent-Based Spatiotemporal Simulation of Biomolecular Systems within the Open Source MASON Framework Sun, 22 Mar 2015 10:04:24 +0000 Agent-based modelling is being used to represent biological systems with increasing frequency and success. This paper presents the implementation of a new tool for biomolecular reaction modelling in the open source Multiagent Simulator of Neighborhoods (MASON) framework. The rationale behind this new tool is the need to describe interactions at the molecular level in order to capture emergent and meaningful biological behaviour. We are particularly interested in characterising and quantifying the various effects that facilitate biocatalysis. Enzymes may display high specificity for their substrates, and this information is crucial to the engineering and optimisation of bioprocesses. Simulation results demonstrate that molecule distributions, reaction rate parameters, and structural parameters can be adjusted separately in the simulation, allowing a comprehensive study of individual effects in the context of realistic cell environments. While a higher percentage of collisions resulting in reaction increases the affinity of the enzyme for the substrate, a faster reaction (i.e., a higher turnover number) leads to a smaller number of time steps. Slower diffusion rates and molecular crowding (physical hurdles) decrease the collision rate of reactants, hence reducing the reaction rate, as expected. The random distribution of molecules also affects the results significantly. Gael Pérez-Rodríguez, Martín Pérez-Pérez, Daniel Glez-Peña, Florentino Fdez-Riverola, Nuno F. Azevedo, and Anália Lourenço Copyright © 2015 Gael Pérez-Rodríguez et al. All rights reserved. Using the eServices Platform for Detecting Behavior Patterns Deviation in the Elderly Assisted Living: A Case Study Sun, 22 Mar 2015 09:33:56 +0000 The world’s aging population is rising, and the elderly are increasingly isolated socially and geographically.
As a consequence, in many situations they need assistance that is not provided in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect deviations in elderly people’s behavior patterns that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the reliability of the alert system and to minimize false-positive alerts. These variables comprise information provided by a body area network (BAN), by environment sensors, and by the elderly person’s interaction with a service provider platform called eServices (Elderly Support Service Platform). eServices is a scalable platform aggregating a service ecosystem developed specifically for elderly people. This pattern recognition will then trigger the appropriate response. As the system evolves, it will learn to predict potential danger situations for a specific user, acting preventively and ensuring the elderly person’s safety and well-being. As the eServices platform is still in development, synthetic data, based on a real data sample and empirical knowledge, is being used to populate the initial dataset. The presented work is a proof of concept of knowledge extraction using the eServices platform information. Despite not using real data, this work proves to be an asset, achieving good performance in preventing alert situations. Isabel Marcelino, David Lopes, Michael Reis, Fernando Silva, Rosalía Laza, and António Pereira Copyright © 2015 Isabel Marcelino et al. All rights reserved. A Distributed Multiagent System Architecture for Body Area Networks Applied to Healthcare Monitoring Sun, 22 Mar 2015 09:23:02 +0000 In recent years, the area of health monitoring has grown significantly, attracting the attention of both academic and commercial sectors.
At the same time, the availability of new biomedical sensors and suitable network protocols has led to a new generation of wireless sensor networks, the so-called wireless body area networks. Nowadays, these networks are routinely used for continuous monitoring of people’s vital parameters, movement, and surrounding environment, but the large volume of data generated in different locations is a major obstacle to the appropriate design, development, and deployment of more elaborate intelligent systems. In this context, we present an open and distributed architecture based on a multiagent system for recognizing human movements, identifying human postures, and detecting harmful activities. The proposed system evolved from a single node for fall detection to a multisensor hardware solution capable of identifying unhampered falls and analyzing the users’ movement. The experiments carried out cover two different scenarios and demonstrate the accuracy of our proposal as a real distributed movement monitoring and accident detection system. Moreover, we also characterize its performance, enabling future analyses and comparisons with similar approaches. Filipe Felisberto, Rosalía Laza, Florentino Fdez-Riverola, and António Pereira Copyright © 2015 Filipe Felisberto et al. All rights reserved. RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases Sun, 22 Mar 2015 09:18:34 +0000 High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. Several computational methods applied to physical protein interaction networks have been used successfully to identify the best disease candidates for each expression profile. An open problem for these methods is how to combine and take advantage of the wealth of publicly available biomedical data.
We propose an enhanced method to improve the selection of the best disease targets in a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The results show the superiority of RecRWR in identifying disease candidates, especially at high levels of biological noise, and show that it benefits from all available data. Joel Perdiz Arrais and José Luís Oliveira Copyright © 2015 Joel Perdiz Arrais and José Luís Oliveira. All rights reserved. Probabilistic Inference of Biological Networks via Data Integration Sun, 22 Mar 2015 09:02:27 +0000 There is significant interest in inferring the structure of subcellular interaction networks. Here we consider supervised interactive network inference, in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, which motivates the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning, which can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S.
cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%. Mark F. Rogers, Colin Campbell, and Yiming Ying Copyright © 2015 Mark F. Rogers et al. All rights reserved. aCGH-MAS: Analysis of aCGH by means of Multiagent System Sun, 22 Mar 2015 08:55:39 +0000 There are currently different techniques, such as CGH arrays, to study genetic variations in patients. CGH arrays analyze gains and losses in different regions of the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information on mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases, and it would be useful to combine information from these sources to extract relevant knowledge. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations, visualization techniques to interpret the final results, and a case-based reasoning (CBR) system to extract relevant information from different sources. Juan F. De Paz, Rocío Benito, Javier Bajo, Ana Eugenia Rodríguez, and María Abáigar Copyright © 2015 Juan F. De Paz et al. All rights reserved. Gene Knockout Identification Using an Extension of Bees Hill Flux Balance Analysis Sun, 22 Mar 2015 08:47:09 +0000 Microbial strain optimisation for the overproduction of a desired phenotype has been a popular topic in recent years. Gene knockout is a genetic engineering technique that can modify the metabolism of microbial cells to obtain desirable phenotypes.
Optimisation algorithms have been developed to identify the effects of gene knockout. However, the complexity of metabolic networks makes it challenging to identify the effects of genetic modification on desirable phenotypes. Furthermore, the vast number of reactions in cellular metabolism often turns the search for optimal gene knockouts into a combinatorial problem, and the computational time increases exponentially with problem size. This work reports an extension of Bees Hill Flux Balance Analysis (BHFBA) to identify optimal gene knockouts that maximise the production yield of desired phenotypes while sustaining the growth rate. The proposed method integrates OptKnock into BHFBA to validate the results automatically. The results show that the extension of BHFBA is suitable, reliable, and applicable for predicting gene knockouts. In several experiments on Escherichia coli, Bacillus subtilis, and Clostridium thermocellum as model organisms, the extension of BHFBA showed better performance in terms of computational time, stability, growth rate, and production yield of desired phenotypes. Yee Wen Choon, Mohd Saberi Mohamad, Safaai Deris, Chuii Khim Chong, Sigeru Omatu, and Juan Manuel Corchado Copyright © 2015 Yee Wen Choon et al. All rights reserved. Modelling the Longevity of Dental Restorations by means of a CBR System Thu, 19 Mar 2015 14:23:43 +0000 The lifespan of dental restorations is limited. Longevity depends on the material used and the characteristics of the tooth. However, the best and longest-lasting material is not always used, since patients may prefer different treatments depending on how noticeable the material is. Over the last 100 years, the most commonly used material has been silver amalgam, which, while very durable, is somewhat aesthetically displeasing.
Our study is based on the collection of data from the charts, notes, and radiographic information of restorative treatments performed by Dr. Vera in 1993, the analysis of this information using artificial intelligence to determine the most appropriate restoration, and the monitoring of the evolution of the dental restoration. The data will be treated confidentially in accordance with Organic Law 15/1999 of 13 December on the Protection of Personal Data. This paper also presents a clustering technique capable of identifying the most significant cases with which to instantiate the case base. To classify the cases, a mixture of experts is used that incorporates a Bayesian network and a multilayer perceptron; the two classifiers are combined by a neural network. Ignacio J. Aliaga, Vicente Vera, Juan F. De Paz, Alvaro E. García, and Mohd Saberi Mohamad Copyright © 2015 Ignacio J. Aliaga et al. All rights reserved. Bladder Carcinoma Data with Clinical Risk Factors and Molecular Markers: A Cluster Analysis Thu, 19 Mar 2015 13:41:50 +0000 Bladder cancer occurs in the epithelial lining of the urinary bladder and is among the most common types of cancer in humans, killing thousands of people a year. This paper is based on the hypothesis that combining clinical and histopathological data with the concentrations of various molecular markers in patients is useful for predicting outcomes and designing treatments for nonmuscle invasive bladder carcinoma (NMIBC). A population of 45 patients with a new diagnosis of NMIBC was selected. Patients with benign prostatic hyperplasia (BPH), muscle invasive bladder carcinoma (MIBC), carcinoma in situ (CIS), and recurrent NMIBC tumors were excluded because of their different clinical behavior. Clinical history was obtained through anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients.
Patients then underwent conventional transurethral resection of bladder tumor (TURBT), and proteomic analyses quantified the biomarkers (p53, neu, and EGFR). A postoperative follow-up was performed to detect relapse and progression. Clustering was performed to find groups based on clinical, molecular marker, and histopathological prognostic factors, together with statistics on recurrence, progression, and overall survival of patients with NMIBC. Four groups were found according to tumor size, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characteristics and biological behavior. Enrique Redondo-Gonzalez, Leandro Nunes de Castro, Jesús Moreno-Sierra, María Luisa Maestro de las Casas, Vicente Vera-Gonzalez, Daniel Gomes Ferrari, and Juan Manuel Corchado Copyright © 2015 Enrique Redondo-Gonzalez et al. All rights reserved. The Plant Growth-Promoting Bacteria Azospirillum amazonense: Genomic Versatility and Phytohormone Pathway Thu, 19 Mar 2015 12:07:19 +0000 The rhizosphere bacterium Azospirillum amazonense associates with plant roots to promote plant growth. Variation in replicon number and rearrangements is common among Azospirillum strains, and characterizing these naturally occurring differences can improve our understanding of genome evolution. We performed an in silico comparative genomic analysis to understand the genomic plasticity of A. amazonense. The number of A. amazonense-specific coding sequences was similar in comparisons with six closely related bacteria, regardless of whether they belong to the Azospirillum genus. Our results suggest that the versatile gene repertoire found in the A. amazonense genome could have been acquired from distantly related bacteria through horizontal transfer.
Furthermore, the identification of coding sequences related to phytohormone production, such as flavin-monooxygenase and aldehyde oxidase, likely reflects the tryptophan-dependent TAM pathway for auxin production in this bacterium. Moreover, the presence of a coding sequence for nitrilase indicates an alternative route that uses IAN as an intermediate for auxin synthesis, but it remains to be established whether the IAN pathway is the Trp-independent route. Further investigations are needed to support the hypothesis that its genomic structure has evolved to meet the requirements of adaptation to the rhizosphere and interaction with host plants. Ricardo Cecagno, Tiago Ebert Fritsch, and Irene Silveira Schrank Copyright © 2015 Ricardo Cecagno et al. All rights reserved. Identification of Subtype Specific miRNA-mRNA Functional Regulatory Modules in Matched miRNA-mRNA Expression Data: Multiple Myeloma as a Case Thu, 19 Mar 2015 11:44:23 +0000 Identification of miRNA-mRNA modules is an important step in elucidating their combinatorial effect on the pathogenesis and mechanisms underlying complex diseases. Current identification methods are primarily based on miRNA-target information and matched miRNA and mRNA expression profiles. However, in heterogeneous diseases, miRNA-mRNA regulatory mechanisms may differ between subtypes, leading to differences in clinical behavior. To explore the pathogenesis of each subtype, it is important to identify subtype specific miRNA-mRNA modules. In this study, we integrated the Ping-Pong algorithm and a multiobjective genetic algorithm to identify subtype specific miRNA-mRNA functional regulatory modules (MFRMs) through integrative analysis of three biological data sets: GO biological processes, miRNA target information, and matched miRNA and mRNA expression data. We applied our method to a heterogeneous disease, multiple myeloma (MM), to identify MM subtype specific MFRMs.
The constructed miRNA-mRNA regulatory networks provide a modular view of subtype specific miRNA-mRNA interactions. Furthermore, clustering analysis demonstrated that the heterogeneous MFRMs were able to separate the corresponding MM subtypes. These subtype specific MFRMs may help further elucidate the pathogenesis of each subtype and may guide MM subtype diagnosis and treatment. Yunpeng Zhang, Wei Liu, Yanjun Xu, Chunquan Li, Yingying Wang, Haixiu Yang, Chunlong Zhang, Fei Su, Yixue Li, and Xia Li Copyright © 2015 Yunpeng Zhang et al. All rights reserved. Shaped Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Drosophila Embryo Thu, 19 Mar 2015 10:25:53 +0000 In recent years, with the development of automated microscopy technologies, the volume and complexity of image data on gene expression have increased tremendously. The only way to analyze such biological data quantitatively and comprehensively is to develop and apply new, sophisticated mathematical approaches. Here, we present extensions of 2D singular spectrum analysis (2D-SSA) for application to 2D and 3D datasets of embryo images. These extensions, circular and shaped 2D-SSA, are applied to gene expression in the nuclear layer just under the surface of the Drosophila (fruit fly) embryo. We consider the commonly used cylindrical projection of the ellipsoidal Drosophila embryo. We demonstrate how circular and shaped versions of 2D-SSA help to decompose expression data into identifiable components (such as trend and noise) and to separate signals from different genes. The detection and correction of under- and overcorrection in multichannel imaging is also addressed, as well as the extraction and analysis of 3D features in 3D gene expression patterns. Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved.
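As a minimal sketch of the basic idea behind SSA (not the 2D, circular, or shaped extensions described in the abstract above), the code below works on a 1D series. It embeds the series in a trajectory (Hankel) matrix, extracts the leading eigentriple by power iteration rather than a full SVD, and maps the rank-1 approximation back to a series by diagonal averaging. The window length and iteration count are illustrative assumptions, and power iteration assumes the dominant eigenvalue is unique and that the all-ones starting vector is not orthogonal to its eigenvector.

```python
import math
from typing import List, Sequence


def ssa_leading_component(series: Sequence[float], window: int,
                          iters: int = 500) -> List[float]:
    """Reconstruct the series from its leading SSA eigentriple (basic 1D SSA)."""
    n = len(series)
    if not 1 < window < n:
        raise ValueError("window must satisfy 1 < window < len(series)")
    k = n - window + 1
    # 1. Embedding: window x k trajectory (Hankel) matrix, traj[i][j] = series[i + j].
    traj = [[series[i + j] for j in range(k)] for i in range(window)]
    # 2. Leading left singular vector of traj = top eigenvector of traj @ traj.T,
    #    found here by power iteration instead of a full SVD.
    gram = [[sum(traj[a][j] * traj[b][j] for j in range(k)) for b in range(window)]
            for a in range(window)]
    u = [1.0] * window
    for _ in range(iters):
        v = [sum(gram[a][b] * u[b] for b in range(window)) for a in range(window)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # all-zero series: nothing to extract
            return [0.0] * n
        u = [x / norm for x in v]
    # 3. Rank-1 approximation u (u^T traj); proj holds u^T traj.
    proj = [sum(u[i] * traj[i][j] for i in range(window)) for j in range(k)]
    # 4. Diagonal averaging (Hankelisation): average the entries with i + j == t.
    total = [0.0] * n
    count = [0] * n
    for i in range(window):
        for j in range(k):
            total[i + j] += u[i] * proj[j]
            count[i + j] += 1
    return [s / c for s, c in zip(total, count)]


if __name__ == "__main__":
    # Hypothetical 1D expression profile: slow rise plus alternating noise.
    profile = [0.1 * t + (0.05 if t % 2 else -0.05) for t in range(20)]
    print([round(x, 3) for x in ssa_leading_component(profile, window=5)])
```

Grouping more eigentriples (for example, the first two for a linear trend) and reconstructing each group separately is what yields the trend/noise separation the abstract describes.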
Effect of Celastrol on Growth Inhibition of Prostate Cancer Cells through the Regulation of hERG Channel In Vitro Thu, 19 Mar 2015 10:23:31 +0000 Objective. To explore the anti-prostate cancer effects of Celastrol on prostate cancer cell proliferation, apoptosis, and cell cycle distribution, as well as their correlation with the regulation of hERG. Methods. DU145 cells were treated with various concentrations of Celastrol (0.25–16.0 μmol/L) for 0–72 hours. An MTT assay was used to evaluate the inhibitory effect of Celastrol on the growth of DU145 cells. Cell apoptosis was detected by both Annexin-V FITC/PI double-labeled cytometry and Hoechst 33258 staining. Cell cycle regulation was examined by a propidium iodide method. Western blot and RT-PCR were used to assess the expression level of hERG in DU145 cells. Results. Celastrol showed striking growth inhibition and apoptosis induction in DU145 cells in vitro in a time- and dose-dependent manner. The IC50 value of Celastrol at 24 hours was 2.349 ± 0.213 μmol/L. Moreover, Celastrol induced DU145 cell apoptosis in a cell cycle-dependent manner, arresting DU145 cells in G0/G1 phase; accordingly, cells in S phase decreased gradually, and no obvious changes were found in G2/M phase cells. Transmission electron microscopy revealed apoptotic bodies containing nuclear fragments in Celastrol-treated DU145 cells. The hERG channel was overexpressed in DU145 cells, and Celastrol downregulated it at both the protein and mRNA levels in a dose-dependent manner. Conclusions. Celastrol exerts its anti-prostate cancer effects partly by downregulating hERG channel expression in DU145 cells, suggesting that Celastrol may be a potential agent against prostate cancer that acts by blocking the hERG channel. Nan Ji, Jinjun Li, Zexiong Wei, Fanhu Kong, Hongyan Jin, Xiaoya Chen, Yan Li, and Youping Deng Copyright © 2015 Nan Ji et al.
All rights reserved. The Construction of Common and Specific Significance Subnetworks of Alzheimer’s Disease from Multiple Brain Regions Thu, 19 Mar 2015 09:58:56 +0000 Alzheimer’s disease (AD) is a progressive and fatal neurodegenerative disorder that leads to irreversible cognitive and memory damage in different brain regions. Identifying and analyzing the dysregulated pathways and subnetworks among affected brain regions will provide deep insights into the pathogenetic mechanism of AD. In this paper, common and specific significant subnetworks were identified from six AD brain regions. Protein-protein interaction (PPI) data were integrated to add molecular biological information, and the functional modules of the six AD brain regions were constructed with the Heinz algorithm. Then, a simulated annealing algorithm based on edge weight was applied to predict and optimize the maximal scoring networks for common and specific genes, respectively; this removes weak interactions and adds predicted strong interactions to increase the accuracy of the networks. The identified common subnetworks showed that inflammation of the brain nerves is one of the critical factors in AD and that calcium imbalance may be a link among several causative factors in AD pathogenesis. In addition, the extracted specific subnetworks for each brain region revealed many biological functional mechanisms that help explain AD pathogenesis. Wei Kong, Xiaoyang Mou, Na Zhang, Weiming Zeng, Shasha Li, and Yang Yang Copyright © 2015 Wei Kong et al. All rights reserved. The Expression and Distributions of ANP32A in the Developing Brain Thu, 19 Mar 2015 09:34:47 +0000 Acidic (leucine-rich) nuclear phosphoprotein 32 family, member A (ANP32A), has multiple functions in neuritogenesis, transcriptional regulation, and apoptosis. However, whether ANP32A has an effect on the developing mammalian brain is still in question.
In this study, a human multiple tissue expression (MTE) array showed that the brain was the organ with the most abundant ANP32A expression. The distribution of ANP32A varied dramatically across adult brain areas, with high expression in the cerebellum, temporal lobe, and cerebral cortex and low expression in the pons, medulla oblongata, and spinal cord. ANP32A expression was higher in the adult brain than in the fetal brain, in both humans and mice, in a time-dependent manner. ANP32A signals were evenly dispersed in the embryonic mouse brain. However, as the mice grew, ANP32A was abundant in the granular layer of the cerebellum and in the cerebral cortex, as well as in the Purkinje cells of the cerebellum. The variation in ANP32A expression levels and distribution in the developing brain implies that ANP32A may play an important role in mammalian brain development, especially in the differentiation and function of neurons in the cerebellum and the cerebral cortex. Shanshan Wang, Yunliang Wang, Qingshan Lu, Xinshan Liu, Fuyu Wang, Xiaodong Ma, Chunping Cui, Chenghe Shi, Jinfeng Li, and Dajin Zhang Copyright © 2015 Shanshan Wang et al. All rights reserved. Protecting Intestinal Epithelial Cell Number 6 against Fission Neutron Irradiation through NF-κB Signaling Pathway Thu, 19 Mar 2015 09:13:32 +0000 The purpose of this paper is to explore changes in the NF-κB signaling pathway in intestinal epithelial cells induced by fission neutron irradiation and the influence of the PI3K/Akt pathway inhibitor LY294002. IEC-6 cells were divided into three groups: a control group, a 4 Gy neutron irradiation group, and a 4 Gy neutron irradiation with LY294002 treatment group. All groups except the control group were irradiated with 4 Gy of neutrons. LY294002 was given 24 hours before neutron irradiation.
At 6 h and 24 h after neutron irradiation, the morphologic changes, proliferation ability, and apoptosis and necrosis rates of the IEC-6 cells were assayed, and changes in the NF-κB and PI3K/Akt pathways were detected. At 6 h and 24 h after 4 Gy neutron irradiation, the proliferation ability of the IEC-6 cells decreased, and many apoptotic and necrotic cells were found. Injuries in the LY294002 treatment plus neutron irradiation group were more severe than those in the control and neutron irradiation groups. The results suggest that 4 Gy neutron irradiation clearly damaged IEC-6 cells and induced severe apoptosis and necrosis; that the NF-κB signaling pathway in IEC-6 cells was activated by neutron irradiation, which could protect the cells against neutron irradiation injury; and that LY294002 could inhibit the activity of IEC-6 cells. Gong-Min Chang, Ya-Bing Gao, Shui-Ming Wang, Xin-Ping Xu, Li Zhao, Jing Zhang, Jin-Feng Li, Yun-Liang Wang, and Rui-Yun Peng Copyright © 2015 Gong-Min Chang et al. All rights reserved. Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms Tue, 17 Mar 2015 13:03:24 +0000 Many proteins are known to be associated with cancer. Often, their precise functional role in disease pathogenesis remains unclear. One strategy for gaining a better understanding of the function of these proteins is to combine different types of proteomics data. In this study, we extended Aragues’s method by employing protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, the weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performance was benchmarked in three kinds of experiments: (I) using individual algorithms, (II) combining algorithms, and (III) combining algorithms of the same classification type.
Compared with Aragues’s method, our proposed methods, that is, machine learning algorithms and majority voting, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm achieves hit ratios of 89.4% and 72.8% for the lung cancer dataset and the lung cancer microarray study, respectively. We anticipate that the current research could help in understanding disease mechanisms and in diagnosis. Chien-Hung Huang, Huai-Shun Peng, and Ka-Lok Ng Copyright © 2015 Chien-Hung Huang et al. All rights reserved. Prediction of Drug Indications Based on Chemical Interactions and Chemical Similarities Mon, 02 Mar 2015 09:49:30 +0000 Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric according to their starting point, or as small-scale or large-scale applications according to the diversity of their datasets. Here, using a large drug indication dataset, we constructed a classifier that predicts the indications of a drug based on the assumption that interacting/associated drugs or drugs with similar structures are more likely to target the same diseases. The classifier was evaluated on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database by running 5-fold cross-validation five times, yielding five first-order prediction accuracies that were all approximately 51.48%. In an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed, the model yielded a first-order prediction accuracy of 50.00%. Interestingly, our method successfully identified some clinically repurposed drug indications that were not included in the datasets.
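The drug indication classifier rests on the assumption that interacting/associated or structurally similar drugs tend to share indications. A minimal sketch of that idea is shown below, assuming a similarity-weighted neighbor vote; this is not necessarily the authors' classifier, and the `rank_indications` helper, drug names, and weights are hypothetical. The top-ranked indication plays the role of a "first-order" prediction.

```python
from collections import defaultdict


def rank_indications(query, neighbors, indications):
    """Rank candidate indications for `query` by a weighted vote of its neighbors.

    neighbors:   {drug: {other_drug: weight}}, weight being a hypothetical
                 interaction/association or structural-similarity score
    indications: {drug: set of known indications}
    The first element of the returned list is the first-order prediction.
    """
    scores = defaultdict(float)
    for other, weight in neighbors.get(query, {}).items():
        if other == query:
            continue  # a drug must not vote for its own indications
        for disease in indications.get(other, ()):
            scores[disease] += weight
    # highest score first; ties broken by name so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


neighbors = {"drugX": {"drugA": 0.9, "drugB": 0.4, "drugC": 0.3}}
indications = {"drugA": {"hypertension"}, "drugB": {"diabetes"}, "drugC": {"diabetes"}}
print(rank_indications("drugX", neighbors, indications))
```

Cross-validation as in the abstract would hide each test drug's own indications before ranking and count how often the first-ranked indication is correct.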
These results suggest that our method may become a useful tool for associating novel molecules with new indications, or existing drugs with alternative indications. Guohua Huang, Yin Lu, Changhong Lu, Mingyue Zheng, and Yu-Dong Cai Copyright © 2015 Guohua Huang et al. All rights reserved. Predicting the Functions of Long Noncoding RNAs Using RNA-Seq Based on Bayesian Network Sat, 28 Feb 2015 07:50:45 +0000 Long noncoding RNAs (lncRNAs) have been shown to play key roles in various biological processes. However, the functions of most lncRNAs are poorly characterized. Here, we present a framework that predicts the functions of lncRNAs by constructing a regulatory network between lncRNAs and protein-coding genes. Using RNA-seq data, the transcript profiles of lncRNAs and protein-coding genes were constructed. Using the Bayesian network method, a regulatory network implying dependency relations between lncRNAs and protein-coding genes was built. After combining this network with a protein interaction network, the highly connected coding genes linked to a given lncRNA were used to predict the functions of that lncRNA through functional enrichment. Applying our method to prostate RNA-seq data assigned functions to 762 lncRNAs in the constructed regulatory network. We found that lncRNAs are involved in diverse biological processes, such as tissue development or embryo development (e.g., nervous system development and mesoderm development). Compared with functions inferred by the neighboring gene-based method and functions determined by lncRNA knockdown experiments, the functions predicted by our method are comparable. Overall, our method can be applied to emerging RNA-seq data, which will help researchers identify complex relations between lncRNAs and coding genes and reveal important functions of lncRNAs.
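Functional enrichment of the coding genes linked to an lncRNA is commonly done with a one-sided hypergeometric test. The sketch below assumes that test (the abstract does not name the statistic), uses hypothetical gene and term names, and applies no multiple-testing correction; a real analysis would adjust the p-values.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn)."""
    top = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def enriched_terms(module, annotations, background, alpha=0.01):
    """Terms over-represented among the coding genes linked to one lncRNA.

    module:      coding genes linked to the lncRNA (hypothetical input)
    annotations: {term: genes annotated with that term}
    background:  set of all genes considered
    """
    module = set(module) & background
    N, n = len(background), len(module)
    hits = {}
    for term, genes in annotations.items():
        genes = set(genes) & background
        k = len(module & genes)
        if k == 0:
            continue  # no overlap, nothing to test
        p = hypergeom_sf(k, n, len(genes), N)
        if p < alpha:
            hits[term] = p
    return hits


background = {f"g{i}" for i in range(100)}
annotations = {"T": [f"g{i}" for i in range(10)], "U": [f"g{i}" for i in range(50, 60)]}
print(enriched_terms([f"g{i}" for i in range(5)], annotations, background))
```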
Yun Xiao, Yanling Lv, Hongying Zhao, Yonghui Gong, Jing Hu, Feng Li, Jinyuan Xu, Jing Bai, Fulong Yu, and Xia Li Copyright © 2015 Yun Xiao et al. All rights reserved. ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function Wed, 25 Feb 2015 13:26:55 +0000 Repetitive element sequences are adjacent, repeating patterns, also called motifs, which can be of different lengths; repetitions can involve exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed to extract repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in the genome or proteome, displaying a user-friendly web interface, and performing exhaustive searches. ProGeRF is a web site for extracting repetitive regions from genome and proteome sequences. It was designed to be an efficient, fast, accurate, and, above all, user-friendly web tool that offers many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available both as a web tool and as a stand-alone program whose source code users can download. It was developed using a hash table approach to extract perfect and imperfect repetitive regions from a (multi)FASTA file with linear time complexity. Robson da Silva Lopes, Walas Jhony Lopes Moraes, Thiago de Souza Rodrigues, and Daniella Castanheira Bartholomeu Copyright © 2015 Robson da Silva Lopes et al. All rights reserved. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity Mon, 23 Feb 2015 07:09:56 +0000 This study attempts to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system.
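The ProGeRF abstract describes finding repeats with a hash table in linear time. As an illustration only (not ProGeRF's code), the sketch below stores every k-mer's start positions in a dictionary and reports perfect tandem repeats of period k; imperfect repeats are not handled, and the `tandem_repeats` function and its `min_copies` parameter are hypothetical.

```python
from collections import defaultdict


def tandem_repeats(seq, k, min_copies=3):
    """Perfect tandem repeats of period k, returned as (start, motif, copies)."""
    index = defaultdict(set)               # hash table: k-mer -> start positions
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].add(i)
    found = []
    for motif, starts in index.items():
        for p in starts:
            if p - k in starts:
                continue                   # p lies inside a run that starts earlier
            copies = 1
            while p + copies * k in starts:  # next copy begins where this one ends
                copies += 1
            if copies >= min_copies:
                found.append((p, motif, copies))
    return sorted(found)


print(tandem_repeats("GATTACACACAGG", 2))  # DNA example
print(tandem_repeats("MSQQQQQL", 1))       # protein example (poly-Q run)
```

Each position belongs to exactly one run per motif, so the scan is linear in the sequence length for a fixed period k. Rotations of the same repeat (for example `AC` and `CA` in `ACACAC`) are reported separately here; a complete tool would merge them.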
Researchers have recently become interested in designing alternative drugs based on AMPs because a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. A computational tool for AMP prediction is therefore needed to resolve this problem. In this study, we introduce an integrated algorithm that predicts AMPs by combining sequence alignment with a support vector machine- (SVM-) LZ complexity pairwise algorithm. When all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test; when only sequences with less than 70% similarity are used, the sensitivity is 88.74% and 78.70%, respectively. The proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity. Xin Yi Ng, Bakhtiar Affendi Rosdi, and Shahriza Shahrudin Copyright © 2015 Xin Yi Ng et al. All rights reserved. Novel Candidate Key Drivers in the Integrative Network of Genes, MicroRNAs, Methylations, and Copy Number Variations in Squamous Cell Lung Carcinoma Mon, 23 Feb 2015 07:03:50 +0000 The mechanisms of lung cancer are highly complex. Not only mRNA gene expression but also microRNAs, DNA methylation, and copy number variation (CNV) play roles in tumorigenesis. It is difficult to incorporate so much information into a single model that comprehensively reflects all these lung cancer mechanisms. In this study, we analyzed 129 TCGA (The Cancer Genome Atlas) squamous cell lung carcinoma samples with gene expression, microRNA expression, DNA methylation, and CNV data.
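The AMP predictor builds on pairwise Lempel-Ziv (LZ) complexity. A minimal sketch of LZ76 complexity (the number of components in a sequence's exhaustive history) and of one normalized LZ-based distance is given below. The abstract does not state the normalization used inside the SVM-LZ pairwise algorithm, so the `lz_distance` formula is an assumption.

```python
def lz76_complexity(s):
    """Lempel-Ziv (1976) complexity: number of components in the exhaustive history of s."""
    n, p, c = len(s), 0, 0
    while p < n:
        length = 1
        # extend the component while it can still be copied from earlier text
        while p + length <= n and s[p:p + length] in s[:p + length - 1]:
            length += 1
        c += 1
        p += length
    return c


def lz_distance(a, b):
    """Assumed normalized LZ distance between two non-empty sequences.

    Measures how many new components one sequence adds when appended to the
    other; values near 0 mean similar sequences, larger values dissimilar ones.
    """
    ca, cb = lz76_complexity(a), lz76_complexity(b)
    cab, cba = lz76_complexity(a + b), lz76_complexity(b + a)
    return max(cab - ca, cba - cb) / max(ca, cb)


print(lz76_complexity("0001101001000101"))  # classic example: 0.001.10.100.1000.101
print(lz_distance("ACDE", "ACDE"), lz_distance("ACDE", "WYKL"))
```

In a pairwise SVM scheme, a matrix of such distances between a query peptide and the training peptides would serve as the feature vector.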
First, we used variance inflation factor (VIF) regression to build the whole-genome integrative network. Then, we isolated the lung cancer subnetwork by identifying the known lung cancer genes and their direct regulators. This subnetwork was refined by the Bayesian method, and the directed regulations among mRNA genes, microRNAs, methylations, and CNVs were obtained. Novel candidate key drivers in this refined subnetwork, such as the methylation of ARHGDIB and HOXD3, microRNA let-7a and miR-31, and the CNV of AGAP2, were identified and analyzed. On three large publicly available lung cancer datasets, the key drivers ARHGDIB and HOXD3 showed significant associations with the overall survival of lung cancer patients. Our results provide new insights into lung cancer mechanisms. Tao Huang, Jing Yang, and Yu-dong Cai Copyright © 2015 Tao Huang et al. All rights reserved. A miRNA-Driven Inference Model to Construct Potential Drug-Disease Associations for Drug Repositioning Thu, 19 Feb 2015 10:16:58 +0000 Increasing evidence shows that inappropriate expression of microRNAs (miRNAs) can lead to many kinds of complex diseases and that drugs can regulate miRNA expression levels. Human diseases may therefore be treated by targeting specific miRNAs with drugs, which provides a new perspective for drug repositioning. However, few studies have attempted to computationally predict associations between drugs and diseases via miRNAs for drug repositioning. In this paper, we developed an inference model to achieve this aim by combining experimentally supported drug-miRNA associations and miRNA-disease associations, under the assumption that a drug is associated with a disease when the two share some significant miRNA partners. Experimental results showed excellent performance of our model.
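The miRNA-driven inference model links a drug to a disease when the two share miRNA partners. The sketch below assumes a simple count threshold on shared miRNAs; the abstract speaks of significant partners, so a statistical test could replace the threshold. All drug, miRNA, and disease names are hypothetical toy data, not real associations.

```python
from collections import defaultdict


def infer_drug_disease(drug_mirna, mirna_disease, min_shared=2):
    """Map (drug, disease) -> shared miRNAs, keeping pairs with >= min_shared partners.

    drug_mirna:    {drug: set of miRNAs the drug is associated with}
    mirna_disease: {miRNA: set of diseases the miRNA is associated with}
    """
    shared = defaultdict(set)
    for drug, mirnas in drug_mirna.items():
        for mirna in mirnas:
            for disease in mirna_disease.get(mirna, ()):
                shared[(drug, disease)].add(mirna)  # miRNA bridges drug and disease
    return {pair: ms for pair, ms in shared.items() if len(ms) >= min_shared}


drug_mirna = {"drug1": {"mir1", "mir2", "mir3"}, "drug2": {"mir1"}}
mirna_disease = {"mir1": {"diseaseA"}, "mir2": {"diseaseA", "diseaseB"}, "mir3": {"diseaseB"}}
print(infer_drug_disease(drug_mirna, mirna_disease))
```

Keeping the shared miRNAs for each pair, rather than only a score, mirrors the abstract's idea of listing candidate miRNAs as molecular hypotheses for each prediction.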
Case studies demonstrated that some of the strongly predicted drug-disease associations can be confirmed by the publicly accessible database CTD, which indicates the usefulness of our inference model. Moreover, candidate miRNAs serving as molecular hypotheses underpinning the associations were listed to guide future experiments. The predicted results were released for further studies. We expect that this study will help improve our understanding of drug-disease association prediction and of the roles of miRNAs in drug repositioning. Hailin Chen and Zuping Zhang Copyright © 2015 Hailin Chen and Zuping Zhang. All rights reserved. Prediction of Protein-Protein Interactions Related to Protein Complexes Based on Protein Interaction Networks Tue, 03 Feb 2015 13:32:51 +0000 A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts using the adaptive k-cores method to predict protein-protein interactions associated with the complexes. The proposed method adapts to protein complexes with different structures, numbers of nodes, and sizes in a protein-protein interaction network. Based on different complex sets detected by various algorithms, we can obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is verified by estimations with statistical tests and by direct confirmation with biological data. The overlap between our predictions and those of approaches that predict interactions based on cliques is small. Similarly, the overlaps among the predicted sets of interactions derived from various complex sets are also small. Thus, every predicted set of interactions may complement and improve the quality of the original network data.
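The PPI prediction method prunes complexes with an adaptive k-cores procedure. The adaptive part is not described in the abstract, so the sketch below only shows the underlying notion: each protein's core number is computed by repeatedly peeling the node of minimum remaining degree, and the k-core keeps nodes whose core number is at least k. The graph is a hypothetical toy example.

```python
def core_numbers(adj):
    """Core number of every node of an undirected graph {node: set(neighbours)}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        # peel the node with the smallest degree among the nodes still present
        v = min(remaining, key=lambda u: (degree[u], str(u)))
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def k_core(adj, k):
    """Nodes of the k-core: the maximal subgraph in which every node has degree >= k."""
    return {v for v, c in core_numbers(adj).items() if c >= k}


# toy complex: triangle A-B-C with a pendant protein D attached to A
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(core_numbers(adj), k_core(adj, 2))
```

The adjacency sets must be symmetric (if B is in A's set, A must be in B's). This simple version is quadratic in the number of nodes; bucket-based peeling runs in linear time on large networks.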
Meanwhile, the proposed method replenishes protein-protein interactions associated with protein complexes using only the network topology. Peng Liu, Lei Yang, Daming Shi, and Xianglong Tang Copyright © 2015 Peng Liu et al. All rights reserved. Novel Numerical Characterization of Protein Sequences Based on Individual Amino Acid and Its Application Mon, 02 Feb 2015 13:44:51 +0000 The hydrophobicity and hydrophilicity of amino acids play a very important role in protein folding, in a protein's interactions with its environment and other molecules, and in its catalytic mechanism. Based on these two physicochemical indexes, a 2D graphical representation of protein sequences is introduced; on the basis of this representation, a new numerical characteristic is proposed to compute the distance between sequences for the analysis of sequence similarity/dissimilarity. Furthermore, we apply the new distance to analyze the similarities/dissimilarities of the ND5 proteins of nine species and to predict the four major classes on a dataset containing 639 domains. The results show that the method is simple and effective. Yan-ping Zhang, Ya-jun Sheng, Wei Zheng, Ping-an He, and Ji-shuo Ruan Copyright © 2015 Yan-ping Zhang et al. All rights reserved. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm Mon, 02 Feb 2015 06:51:40 +0000 Accurate identification of gait asymmetry is very beneficial for assessing at-risk gait in clinical applications.
This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, based on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with differences in the statistical distributions of gait variables from the left and right lower limbs; that is, discriminating small differences in similarity between the lower limbs is treated as recognizing their different probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is based on an advanced statistical learning algorithm, the support vector machine for binary classification, and is used to quantitatively evaluate gait symmetry. The experimental results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize right-left gait patterns with superior generalization performance. Moreover, compared with the traditional symmetry index method for gait, our proposed technique could identify small but significant differences between the lower limbs. The proposed algorithm could become an effective tool for the early identification of gait asymmetry in the elderly in clinical diagnosis. Jianning Wu and Bin Wu Copyright © 2015 Jianning Wu and Bin Wu. All rights reserved. Detecting Key Genes Regulated by miRNAs in Dysfunctional Crosstalk Pathway of Myasthenia Gravis Sun, 01 Feb 2015 10:23:29 +0000 Myasthenia gravis (MG) is a neuromuscular autoimmune disorder resulting from autoantibodies attacking components of the neuromuscular junction. Recent studies have implicated aberrant expression of microRNAs (miRNAs) in the pathogenesis of MG; however, the underlying mechanisms remain largely unknown. This study aimed to identify key genes regulated by miRNAs in MG.
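The gait study compares its SVM-based method with the traditional symmetry index. The abstract does not give the formula; the sketch below assumes the commonly used form SI = (X_R - X_L) / (0.5 (X_R + X_L)) x 100%, where 0 indicates perfect symmetry and the sign shows which side is larger. The example values are hypothetical.

```python
def symmetry_index(x_right, x_left):
    """Symmetry index in percent for one gait variable (0 = perfect symmetry)."""
    mean = 0.5 * (x_right + x_left)
    if mean == 0:
        raise ValueError("the mean of the left and right values must be non-zero")
    return (x_right - x_left) / mean * 100.0


# hypothetical peak vertical forces (N) for the right and left limbs
print(symmetry_index(110.0, 90.0))
```

Because this index compresses each variable into one number, it can miss the distributional differences that the study's classifier is designed to detect.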
Six dysregulated pathways were identified through differentially expressed miRNAs and mRNAs in MG, and significant crosstalk was detected between five of these. Notably, crosstalk between the “synaptic long-term potentiation” pathway and four others was mediated by five genes involved in the MAPK signaling pathway. Furthermore, 14 key genes regulated by miRNAs were detected, of which six (MAPK1, RAF1, PGF, PDGFRA, EP300, and PPP1CC) mediated interactions between the dysregulated pathways. MAPK1 and RAF1 were responsible for most of this crosstalk (80%), likely reflecting their central roles in MG pathogenesis. In addition, most key genes were enriched in immune-related local areas that were strongly disordered in MG. These results provide new insight into the pathogenesis of MG and offer new potential targets for therapeutic intervention. Yuze Cao, Jianjian Wang, Huixue Zhang, Qinghua Tian, Lixia Chen, Shangwei Ning, Peifang Liu, Xuesong Sun, Xiaoyu Lu, Chang Song, Shuai Zhang, Bo Xiao, and Lihua Wang Copyright © 2015 Yuze Cao et al. All rights reserved. Conformational B-Cell Epitope Prediction Method Based on Antigen Preprocessing and Mimotopes Analysis Thu, 29 Jan 2015 06:48:20 +0000 Identifying epitopes that invoke strong humoral responses is an essential issue in immunology. Various computational methods developed in recent years based on antigen structures and mimotopes narrow the search for experimental validation. These methods can be divided into two categories: antigen structure-based methods and mimotope-based methods. Although new methods of both kinds have been proposed in recent years, they do not perform satisfactorily in all circumstances. In this paper, we propose a new conformational B-cell epitope prediction method based on antigen preprocessing and mimotope analysis.
The method classifies the antigen surface residues into “epitopes” and “nonepitopes” using six epitope propensity scales, removes the “nonepitopes,” and uses the preprocessed antigen for epitope prediction based on mimotope sequences. The proposed method achieves a mean F score of 0.42 on the testing dataset. Compared with other publicly available servers on the same testing dataset, the new method performs better. The results demonstrate that the proposed method is competent for conformational B-cell epitope prediction. Pingping Sun, Haixu Ju, Baowen Zhang, Yu Gu, Bo Liu, Yanxin Huang, Huijie Zhang, and Yuxin Li Copyright © 2015 Pingping Sun et al. All rights reserved. Helicase and Its Interacting Factors: Regulation Mechanism, Characterization, Structure, and Application for Drug Design Wed, 28 Jan 2015 14:39:55 +0000 Cheng-Yang Huang, Yoshito Abe, Huangen Ding, and I-Fang Chung Copyright © 2015 Cheng-Yang Huang et al. All rights reserved. Automated Training for Algorithms That Learn from Genomic Data Wed, 28 Jan 2015 07:04:42 +0000 Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. Although these data resources are continuously updated, the updates are generally not incorporated into published machine learning algorithms, which can therefore become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources.
The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP, in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products, and to develop new machine learning algorithms with similar capabilities. Gokcen Cilingir and Shira L. Broschat Copyright © 2015 Gokcen Cilingir and Shira L. Broschat. All rights reserved. Protein Complex Discovery by Interaction Filtering from Protein Interaction Networks Using Mutual Rank Coexpression and Sequence Similarity Tue, 27 Jan 2015 14:15:39 +0000 The evaluation of biological networks is considered essential to understanding complex biological systems. Graph clustering algorithms are the most widely used methods in protein-protein interaction (PPI) network analysis. However, the complexes produced by clustering algorithms include noise proteins. The error rate due to noise proteins in PPI network research is about 40–90%. Moreover, only 30–40% of the existing interactions in PPI databases depend on a specific biological function. It is therefore essential to eliminate noise proteins and interactions from the complexes created by clustering methods. We introduce new methods for weighting the interactions in protein clusters and for removing noise interactions and proteins based on these weights. The coexpression and the sequence similarity of each pair of proteins are used as the edge weights in the network. The results showed that edge filtering based on coexpression performs similarly to node filtering based on graph characteristics. For removing noise edges, edge filtering has a significant advantage over the graph-based method.
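The complex discovery study weights PPI edges by mutual rank (MR) coexpression. The sketch below assumes the usual definition, the geometric mean of the two reciprocal correlation ranks, with Pearson correlation and rank 1 for the most correlated partner. The expression profiles, the `filter_edges` helper, and the `max_mr` threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mutual_rank(profiles):
    """MR(a, b) = sqrt(rank of b among a's partners * rank of a among b's partners)."""
    genes = sorted(profiles)
    corr = {(a, b): pearson(profiles[a], profiles[b])
            for a in genes for b in genes if a != b}
    rank = {}
    for a in genes:
        # rank 1 = most positively correlated partner; ties keep alphabetical order
        partners = sorted((b for b in genes if b != a), key=lambda b: -corr[(a, b)])
        for r, b in enumerate(partners, start=1):
            rank[(a, b)] = r
    return {(a, b): sqrt(rank[(a, b)] * rank[(b, a)])
            for a in genes for b in genes if a < b}


def filter_edges(edges, mr, max_mr):
    """Keep PPI edges whose mutual rank is at most max_mr (smaller = stronger)."""
    return [(a, b) for a, b in edges if mr[tuple(sorted((a, b)))] <= max_mr]


profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
mr = mutual_rank(profiles)
print(mr, filter_edges([("g2", "g1"), ("g3", "g2")], mr, 1.5))
```

Mutual rank is often preferred over raw correlation because it is less sensitive to genes whose correlations are uniformly high or low across the whole dataset.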
Edge filtering based on sequence similarity can remove both noise proteins and noise interactions. Ali Kazemi-Pour, Bahram Goliaei, and Hamid Pezeshk Copyright © 2015 Ali Kazemi-Pour et al. All rights reserved. Regulation of DEAH/RHA Helicases by G-Patch Proteins Tue, 27 Jan 2015 11:17:53 +0000 RNA helicases of the DEAH/RHA family are involved in all processes of RNA metabolism. The function of two helicases from this family, Prp2 and Prp43, is regulated by protein partners containing a G-patch domain. The G-patch is a glycine-rich domain, discovered by sequence alignment, that is involved in protein-protein and protein-nucleic acid interactions. Although it has been shown to stimulate the helicases’ enzymatic activities, the precise role of the G-patch domain remains unclear. The role of G-patch proteins in regulating Prp43 activity has been studied in the two biological processes in which Prp43 is involved: splicing and ribosome biogenesis. Depending on the pathway, Prp43 activity is modulated by different G-patch proteins. A particular structural feature of DEAH/RHA helicases revealed by the Prp43 structure is the OB-fold domain in the C-terminal part. The OB-fold has been shown to be a platform responsible for the interaction with G-patch proteins and RNA. Although there are still no structural data on the G-patch domain, in the current model, the interaction between the helicase, the G-patch protein, and RNA leads to cooperative RNA binding and conformational changes in the helicase. Julien Robert-Paganin, Stéphane Réty, and Nicolas Leulliot Copyright © 2015 Julien Robert-Paganin et al. All rights reserved. Virtual Screening of Acetylcholinesterase Inhibitors Using the Lipinski’s Rule of Five and ZINC Databank Thu, 22 Jan 2015 06:24:23 +0000 Alzheimer’s disease (AD) is a progressive neurodegenerative pathology that can affect people over 65 years of age.
It causes several complications, such as behavioral changes, language deficits, depression, and memory impairment. One method used to treat AD is to increase acetylcholine (ACh) levels in the brain by using acetylcholinesterase inhibitors (AChEIs). In this study, we used the ZINC databank and Lipinski’s rule of five to perform virtual screening and molecular docking (using AutoDock Vina 1.1.1), aiming to select compounds with a quaternary ammonium atom that might inhibit acetylcholinesterase (AChE) activity. The molecules were obtained by screening, and further in vitro assays were performed to identify the most potent inhibitors through their IC50 values; molecular docking was also used to describe the interaction models between the inhibitors and the enzyme. The results showed that compound D inhibited AChE from different vertebrate sources as well as butyrylcholinesterase (BChE) from Equus ferus (EfBChE), with IC50 values ranging from 1.69 ± 0.46 to 5.64 ± 2.47 µM. Compound D interacted with the peripheral anionic subsite in both enzymes, blocking substrate entrance to the active site. In contrast, compound C was a more specific inhibitor of EfBChE. In conclusion, the screening was effective in finding inhibitors of AChE and BChE from different organisms. Pablo Andrei Nogara, Rogério de Aquino Saraiva, Diones Caeran Bueno, Lílian Juliana Lissner, Cristiane Lenz Dalla Corte, Marcos M. Braga, Denis Broock Rosemberg, and João Batista Teixeira Rocha Copyright © 2015 Pablo Andrei Nogara et al. All rights reserved. Mammalian Cell Culture Process for Monoclonal Antibody Production: Nonlinear Modelling and Parameter Estimation Mon, 19 Jan 2015 08:06:35 +0000 Monoclonal antibodies (mAbs) are currently among the fastest growing products of the pharmaceutical industry, with widespread applications in biochemistry, biology, and medicine.
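The AChE virtual screening study filtered ZINC compounds with Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors). The sketch below works on precomputed descriptors, which in practice would come from the database or a cheminformatics toolkit. Whether one violation is tolerated is not specified in the abstract, so `max_violations=1` is an assumption, and the compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    name: str
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(d):
    """Count how many of the four rule-of-five criteria a compound breaks."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def passes_rule_of_five(d, max_violations=1):
    """Drug-likeness filter; tolerating one violation is an assumed convention."""
    return lipinski_violations(d) <= max_violations


library = [
    Descriptors("cmpd1", 350.0, 2.1, 1, 4),
    Descriptors("cmpd2", 650.0, 6.2, 3, 8),
]
print([d.name for d in library if passes_rule_of_five(d)])
```

The filter only narrows the library; candidates that pass would still go on to docking and in vitro assays, as in the study.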
The operation of mAb production processes is predominantly based on empirical knowledge, with improvements achieved through trial-and-error experiments and precedent practices. The nonlinearity of these processes and the absence of suitable instrumentation call for enhanced modelling efforts and modern kinetic parameter estimation strategies. The present work is dedicated to nonlinear dynamic modelling and parameter estimation for a mammalian cell culture process used for mAb production. Using a dynamical model of this class of processes, we develop an optimization-based technique for estimating the kinetic parameters of the mammalian cell culture process model. The estimation is achieved by minimizing an error function with a particle swarm optimization (PSO) algorithm. The proposed estimation approach is analyzed using a particular model of mammalian cell culture as a case study, but it is generic for this class of bioprocesses. The case study shows that the proposed parameter estimation technique provides a more accurate simulation of the experimentally observed process behaviour than previously reported studies. Dan Selişteanu, Dorin Șendrescu, Vlad Georgeanu, and Monica Roman Copyright © 2015 Dan Selişteanu et al. All rights reserved. Simultaneous Parameters Identifiability and Estimation of an E. coli Metabolic Network Model Tue, 06 Jan 2015 08:05:04 +0000 This work proposes a procedure for simultaneous parameter identifiability analysis and estimation in metabolic networks, in order to overcome difficulties associated with scarce experimental data and large numbers of parameters, a common scenario in modeling such systems. As a case study, the complex real problem of parameter identifiability in the Escherichia coli K-12 W3110 dynamic model was investigated; the model is composed of 18 ordinary differential equations and 35 kinetic rates containing 125 parameters.
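The mAb study estimates kinetic parameters by minimizing an error function with particle swarm optimization (PSO). The sketch below shows a basic global-best PSO fitting a hypothetical two-parameter exponential growth model to synthetic, noise-free data. The model, bounds, and PSO coefficients are assumptions for illustration; they are not the paper's cell culture model or settings.

```python
import math
import random


def growth_model(t, params):
    """Hypothetical stand-in model: X(t) = X0 * exp(mu * t)."""
    x0, mu = params
    return x0 * math.exp(mu * t)


def sse(params, times, observed):
    """Error function to minimize: sum of squared residuals."""
    return sum((growth_model(t, params) - y) ** 2 for t, y in zip(times, observed))


def pso(cost, bounds, n_particles=30, n_iter=300, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                # personal bests
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # swarm (global) best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost


times = [0, 1, 2, 3, 4, 5]
observed = [growth_model(t, (1.0, 0.5)) for t in times]  # synthetic, noise-free
params, err = pso(lambda p: sse(p, times, observed), bounds=[(0.1, 5.0), (0.0, 2.0)])
print(params, err)
```

PSO needs only cost evaluations, not gradients, which is one reason it suits nonlinear bioprocess models whose error surface is awkward to differentiate.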
With this procedure, model fit was improved for most of the measured metabolites, and 58 parameters were estimated, including 5 unknown initial conditions. The results indicate that the simultaneous parameter identifiability and estimation approach is appealing for metabolic networks, since the model could be fitted to most of the measured metabolites even when important measurements of intracellular metabolites and good initial parameter estimates are not available. Kese Pontes Freitas Alberton, André Luís Alberton, Jimena Andrea Di Maggio, Vanina Gisela Estrada, María Soledad Díaz, and Argimiro Resende Secchi Copyright © 2015 Kese Pontes Freitas Alberton et al. All rights reserved. DNASynth: A Computer Program for Assembly of Artificial Gene Parts in Decreasing Temperature Tue, 06 Jan 2015 05:58:05 +0000 Artificial gene synthesis requires both the design of the nucleotide sequence and protocols for assembling long DNA molecules. The nucleotide sequence must meet many conditions, including the host organism's particular codon preferences, the avoidance of specific regulatory subsequences, and the absence of secondary structures that inhibit expression. Chemical synthesis of DNA is limited in strand length; thus, creating artificial genes requires assembling long DNA molecules from shorter fragments. In the approach presented, the algorithm and the computer program address both tasks: developing the optimal nucleotide sequence to encode a given peptide for a given host organism, and determining the protocol for assembling the long DNA molecule. These tasks are closely connected; a change in codon usage may lead to changes in the optimal assembly protocol, and the lack of a simple assembly protocol may be addressed by changing the nucleotide sequence. The computer program presented in this study was tested with real data from a wet-laboratory experiment to synthesize a peptide.
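DNASynth assembles a gene from shorter fragments in decreasing temperature, but its algorithm is not described in this abstract. The sketch below only illustrates the general idea under stated assumptions: fragments overlap by a fixed length, each overlap's melting temperature is estimated with the rough Wallace rule (2 °C per A/T and 4 °C per G/C, meant for short oligonucleotides), and junctions are ordered from highest to lowest Tm. The function names and the toy sequence are hypothetical.

```python
def wallace_tm(seq):
    """Rough melting temperature (°C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def split_with_overlaps(gene, fragment_len, overlap):
    """Cut `gene` into fragments whose neighbours share `overlap` bases."""
    if not 0 < overlap < fragment_len:
        raise ValueError("need 0 < overlap < fragment_len")
    step = fragment_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(gene[start:start + fragment_len])
        if start + fragment_len >= len(gene):
            break
        start += step
    return fragments


def junctions_by_decreasing_tm(fragments, overlap):
    """(Tm, i) for each junction between fragment i and i+1, hottest first."""
    junctions = [(wallace_tm(fragments[i][-overlap:]), i) for i in range(len(fragments) - 1)]
    return sorted(junctions, key=lambda j: (-j[0], j[1]))


frags = split_with_overlaps("ATGCGCGCATATATGGCC", fragment_len=8, overlap=4)
print(frags, junctions_by_decreasing_tm(frags, 4))
```

Annealing the most stable junctions first, at the highest temperature, reduces the chance that weaker overlaps mispair; a production tool would use a nearest-neighbor Tm model and also vary codons to balance the junctions, as the abstract describes.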
The benefit of the presented algorithm and its application is the shorter time needed to produce a ready synthetic gene, compared with polymerase cycling assembly. Robert M. Nowak, Anna Wojtowicz-Krawiec, and Andrzej Plucienniczak Copyright © 2015 Robert M. Nowak et al. All rights reserved. Novel Computing Technologies for Bioinformatics and Cheminformatics Sun, 28 Dec 2014 07:06:35 +0000 Chuan Yi Tang, Che-Lun Hung, Ching-Hsien Hsu, Huiru Zheng, and Chun-Yuan Lin Copyright © 2014 Chuan Yi Tang et al. All rights reserved. Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data Sun, 28 Dec 2014 06:47:37 +0000 Julia Tzu-Ya Weng, Li-Ching Wu, Wen-Chi Chang, Tzu-Hao Chang, Tatsuya Akutsu, and Tzong-Yi Lee Copyright © 2014 Julia Tzu-Ya Weng et al. All rights reserved. Phenomics Research on Coronary Heart Disease Based on Human Phenotype Ontology Mon, 15 Dec 2014 06:53:45 +0000 Their shared holistic, dynamic, complex, and spatiotemporal characteristics allow “omics” and the theories of TCM to interlink. HPO, namely, “characterization,” can be understood as a sorting and generalization of the manifestations shown by people with diseases on the basis of phenomics. A syndrome is the overall “manifestation” of pathological and physiological changes in the human body, expressed through the information gathered by the four diagnostic methods. Data from the four diagnostic methods may be the most objective and direct manifestations of the human body under morbid conditions. In this respect, they are consistent with the connotation of “characterization.” Meanwhile, these data also provide us with the features of characterization in HPO.
In our study, we compared 107 items of four-diagnostic-method information with the “characterization database,” analyzed the characterization of these data according to the characteristics shared by four-diagnostic-method information and characterization, mapped the 107 items to the relevant terms in HPO, and thereby expanded the characterization information in HPO. Qi Shi, Kuo Gao, Huihui Zhao, Juan Wang, Xing Zhai, Peng Lu, Jianxin Chen, and Wei Wang Copyright © 2014 Qi Shi et al. All rights reserved. Erratum to “A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats” Mon, 24 Nov 2014 00:00:00 +0000 Shuaibin Lian, Qingyan Li, Zhiming Dai, Qian Xiang, and Xianhua Dai Copyright © 2014 Shuaibin Lian et al. All rights reserved. A Least Square Method Based Model for Identifying Protein Complexes in Protein-Protein Interaction Network Thu, 23 Oct 2014 12:45:40 +0000 A protein complex, formed by a group of physically interacting proteins, plays a crucial role in cellular activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) networks. However, prediction accuracy is still far from satisfactory because the topological structures of protein complexes in PPI networks are very complicated. This paper proposes a novel optimization framework, named PLSMC, to detect complexes from PPI networks. The method is based on the fact that if two proteins are in a common complex, they are likely to interact. PLSMC uses this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms the other methods. In particular, complexes predicted by PLSMC match known complexes with higher accuracy than those predicted by other methods.
Furthermore, the predicted complexes have high functional homogeneity. Qiguo Dai, Maozu Guo, Yingjie Guo, Xiaoyan Liu, Yang Liu, and Zhixia Teng Copyright © 2014 Qiguo Dai et al. All rights reserved. Evolution of Network Biomarkers from Early to Late Stage Bladder Cancer Samples Thu, 18 Sep 2014 06:53:32 +0000 We use a systems biology approach to construct protein-protein interaction networks (PPINs) for early and late stage bladder cancer. By comparing the networks of these two stages, we find that they reflect markedly different mechanisms. To obtain the differential network structures between cancer and noncancer PPINs, we constructed cancer and noncancer PPIN structures for each of the two bladder cancer stages using microarray data from cancer cells and their adjacent noncancer cells, respectively. Using their carcinogenesis relevance values (CRVs), we identified 152 and 50 significant proteins and their PPI networks (network markers) for early and late stage bladder cancer, respectively, by statistical assessment. To investigate the evolution of network biomarkers in the carcinogenesis process, we performed primary pathway analysis, which showed that the significant pathways of early stage bladder cancer are related to ordinary cancer mechanisms, while the ribosome and spliceosome pathways are most important for late stage bladder cancer. The only pathway shared by both stages is ubiquitin-mediated proteolysis, which is involved throughout bladder cancer progression. The evolution of network biomarkers from early to late stage can reveal the carcinogenesis of bladder cancer. The findings of this study are new clues specific to this study that give us a direction for targeted cancer therapy, and they should be validated in vivo or in vitro in the future. Yung-Hao Wong, Cheng-Wei Li, and Bor-Sen Chen Copyright © 2014 Yung-Hao Wong et al. All rights reserved.
MicroRNA Expression Profiling Altered by Variant Dosage of Radiation Exposure Tue, 16 Sep 2014 08:57:42 +0000 Various biological effects are associated with radiation exposure. Even under low levels of radiation exposure, irradiated cells may carry an elevated risk of genetic instability, mutation, and cancer, and postradiation side effects may extend to normal tissues. The radiation-induced bystander effect (RIBE) is the focus of rigorous research because it may promote the development of cancer even at low radiation doses. Alterations in the DNA sequence cannot explain these biological effects of radiation, and epigenetic factors are thought to be involved. Indeed, some microRNAs (miRNAs) have been found to correlate with radiation-induced damage and may be potential biomarkers for the various biological effects caused by different levels of radiation exposure. However, the regulatory role that miRNAs play in this process remains elusive. In this study, we profiled the expression changes in miRNA under fractionated radiation exposure in human peripheral blood mononuclear cells. By utilizing publicly available microRNA knowledge bases and performing cross validations with our previous gene expression profiling under the same radiation condition, we identified various miRNA-gene interactions specific to different doses of radiation treatment, providing new insights into the molecular underpinnings of radiation injury. Kuei-Fang Lee, Yi-Cheng Chen, Paul Wei-Che Hsu, Ingrid Y. Liu, and Lawrence Shih-Hsin Wu Copyright © 2014 Kuei-Fang Lee et al. All rights reserved. WISCOD: A Statistical Web-Enabled Tool for the Identification of Significant Protein Coding Regions Mon, 15 Sep 2014 05:37:19 +0000 Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure.
Although it is now possible to improve their performance with additional information from related species and/or cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD (web-enabled tool for the identification of significant protein coding regions), a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD can detect real exons from large lists of potential exons, and it provides an easy-to-use global value, the expected probability of being a false exon (EPFE), that is useful for ranking potential exons in a probabilistic framework without additional computational cost. The advantage of our approach is that it significantly increases specificity and sensitivity (both between 80% and 90%) compared with other ab initio methods (70–75%). WISCOD is written in Java and R and is available for download and for running in local mode on Linux and Windows platforms. Mireia Vilardell, Genis Parra, and Sergi Civit Copyright © 2014 Mireia Vilardell et al. All rights reserved. EXIA2: Web Server of Accurate and Rapid Protein Catalytic Residue Prediction Thu, 11 Sep 2014 10:40:30 +0000 We propose a method (EXIA2) for catalytic residue prediction based on protein structure that does not require homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chains of catalytic residues usually point toward the center of the catalytic site. This special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientations. When combined with PSI-BLAST sequence conservation, the method is shown to be the most accurate catalytic residue prediction method currently available. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures.
The areas under the ROC curve (AUC) on these benchmark datasets range from 0.934 to 0.968. Chih-Hao Lu, Chin-Sheng Yu, Yu-Tung Chien, and Shao-Wei Huang Copyright © 2014 Chih-Hao Lu et al. All rights reserved. Computational Biophysical, Biochemical, and Evolutionary Signature of Human R-Spondin Family Proteins, the Member of Canonical Wnt/β-Catenin Signaling Pathway Mon, 08 Sep 2014 08:19:35 +0000 In humans, the Wnt/β-catenin signaling pathway plays a significant role in cell growth, cell development, and disease pathogenesis. Four human R-spondins (Rspos) are known to activate the canonical Wnt/β-catenin signaling pathway. Rspos currently serve as therapeutic targets for several human diseases. Hence, a basic understanding of the molecular properties of Rspos is essential. We approached this issue by interpreting the biochemical and biophysical properties, along with the molecular evolution, of Rspos through computational methods. Our analysis shows that signal peptide length is roughly similar across the Rspo family, as is the amino acid distribution pattern. In Rspo3, four N-glycosylation sites were noted. All members are hydrophilic and showed approximately similar GRAVY values. However, Rspo3 contains the most positively charged residues, while Rspo4 contains the fewest. Four highly aligned blocks were recorded through Gblocks. Phylogenetic analysis shows that Rspo4 is rooted with Rspo2 and that Rspo3 and Rspo1 share a common point of origin. Through a phylogenomic study, we developed a phylogenetic tree of sixty proteins with ortholog and paralog seed sequences. A protein-protein network was also illustrated. The results of our study may help future researchers to unfold significant physiological and therapeutic properties of Rspos in various disease models. Ashish Ranjan Sharma, Chiranjib Chakraborty, Sang-Soo Lee, Garima Sharma, Jeong Kyo Yoon, C.
George Priya Doss, Dong-Keun Song, and Ju-Suk Nam Copyright © 2014 Ashish Ranjan Sharma et al. All rights reserved. Gene Expression Profiling of Biological Pathway Alterations by Radiation Exposure Mon, 08 Sep 2014 00:00:00 +0000 Though damage caused by radiation has been the focus of rigorous research, the mechanisms through which radiation exerts harmful effects on cells are complex and not well understood. In particular, the influence of low dose radiation exposure on the regulation of genes and pathways remains unclear. To investigate the molecular alterations induced by varying doses of radiation, we conducted a genome-wide expression analysis. Peripheral blood mononuclear cells were collected from five participants, and each sample was subjected to 0.5 Gy, 1 Gy, 2.5 Gy, and 5 Gy of cobalt-60 radiation, followed by array-based expression profiling. Gene set enrichment analysis indicated that immune system and cancer development pathways appeared to be the major targets affected by radiation exposure. A dose of 1 Gy appeared to be a critical threshold. Indeed, after 1 Gy of radiation exposure, the expression levels of several genes associated with carcinogenesis and metabolic disorders, including FADD, TNFRSF10B, TNFRSF8, TNFRSF10A, TNFSF10, TNFSF8, CASP1, and CASP4, were significantly altered. Our results suggest that exposure to low-dose radiation may elicit changes in metabolic and immune pathways, potentially increasing the risk of immune dysfunctions and metabolic disorders. Kuei-Fang Lee, Julia Tzu-Ya Weng, Paul Wei-Che Hsu, Yu-Hsiang Chi, Ching-Kai Chen, Ingrid Y. Liu, Yi-Cheng Chen, and Lawrence Shih-Hsin Wu Copyright © 2014 Kuei-Fang Lee et al. All rights reserved.
Systematic Expression Profiling Analysis Identifies Specific MicroRNA-Gene Interactions that May Differentiate between Active and Latent Tuberculosis Infection Thu, 04 Sep 2014 00:00:00 +0000 Tuberculosis (TB) is the second most common cause of death from infectious diseases. About 90% of those infected are asymptomatic, the so-called latent TB infections (LTBI), with a 10% lifetime chance of progressing to active TB. To further understand the molecular pathogenesis of TB, several molecular studies have compared the expression profiles of healthy controls with those of active TB or LTBI patients. However, the results vary owing to diverse genetic backgrounds and study designs and the inherent complexity of the disease process. Thus, developing a sensitive and efficient method for the detection of LTBI is both crucial and challenging. In the present study, we performed a systematic analysis of the gene and microRNA profiles of healthy individuals versus those affected with TB or LTBI. Combined with a series of in silico analyses utilizing publicly available microRNA knowledge bases and published literature data, this analysis uncovered several microRNA-gene interactions that specifically target both the blood and lungs. Some of these molecular interactions are novel and may serve as potential biomarkers of TB and LTBI, facilitating the development of a more sensitive, efficient, and cost-effective diagnostic assay for TB and LTBI in the Taiwanese population. Lawrence Shih-Hsin Wu, Shih-Wei Lee, Kai-Yao Huang, Tzong-Yi Lee, Paul Wei-Che Hsu, and Julia Tzu-Ya Weng Copyright © 2014 Lawrence Shih-Hsin Wu et al. All rights reserved.
Human Umbilical Cord Mesenchymal Stem Cells Infected with Adenovirus Expressing HGF Promote Regeneration of Damaged Neuron Cells in a Parkinson’s Disease Model Wed, 03 Sep 2014 08:15:20 +0000 Parkinson’s disease (PD) is a neurodegenerative movement disorder characterized by the progressive degeneration of the dopaminergic (DA) pathway. Mesenchymal stem cells derived from human umbilical cord (hUC-MSCs) have great potential as a therapeutic agent for PD. Hepatocyte growth factor (HGF) is a multifunctional mediator originally identified in hepatocytes and has recently been reported to possess various neuroprotective properties. This study was designed to investigate the protective effect of hUC-MSCs infected with an adenovirus carrying the HGF gene on a PD cell model induced by MPP+ in human bone marrow neuroblastoma cells. Our results provide evidence that the culture supernatant from hUC-MSCs expressing HGF could promote regeneration of damaged PD model cells with higher efficacy than the supernatant from hUC-MSCs alone. In addition, intracellular free Ca2+ decreased markedly after treatment with culture supernatant from hUC-MSCs expressing HGF, while the expression of CaBP-D28k, an intracellular calcium binding protein, increased. Therefore, our study demonstrated that culture supernatant of MSCs overexpressing HGF was capable of eliciting regeneration of damaged PD model cells. This effect was probably achieved through the regulation of intracellular Ca2+ levels by modulating CaBP-D28k expression. Xin-Shan Liu, Jin-Feng Li, Shan-Shan Wang, Yu-Tong Wang, Yu-Zhen Zhang, Hong-Lei Yin, Shuang Geng, Hui-Cui Gong, Bing Han, and Yun-Liang Wang Copyright © 2014 Xin-Shan Liu et al. All rights reserved.
Structural Comparison, Substrate Specificity, and Inhibitor Binding of AGPase Small Subunit from Monocot and Dicot: Present Insight and Future Potential Tue, 02 Sep 2014 11:29:57 +0000 ADP-glucose pyrophosphorylase (AGPase) is the first rate-limiting enzyme of the starch biosynthesis pathway and has been exploited as a target for greater starch yield in several plants. Structure-function analysis and the substrate binding specificity of AGPase have provided enormous potential for understanding the roles of specific amino acids or motifs responsible for allosteric regulation and catalytic mechanisms, which facilitates the engineering of AGPases. We report the three-dimensional structure and the substrate and inhibitor binding specificity of the AGPase small subunit from different monocot and dicot crop plants. Both monocot and dicot subunits were found to exploit interactions with the substrate and inhibitor molecules similar to those of their closest homologue, the potato tuber AGPase small subunit. Comparative sequence and structural analysis followed by molecular docking and electrostatic surface potential analysis reveal that the rearrangements of secondary structure elements and of substrate and inhibitor binding residues are strongly conserved and follow a common folding pattern and orientation in monocots and dicots, displaying a similar mode of allosteric regulation and catalytic mechanism. The results from this study, along with site-directed mutagenesis complemented by molecular dynamics simulation, will shed more light on increasing the starch content of crop plants to help ensure food security worldwide. Kishore Sarma, Priyabrata Sen, Madhumita Barooah, Manabendra D. Choudhury, Shubhadeep Roychoudhury, and Mahendra K. Modi Copyright © 2014 Kishore Sarma et al. All rights reserved.
A Review of Feature Extraction Software for Microarray Gene Expression Data Sun, 31 Aug 2014 07:10:08 +0000 When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the extracted genes are carefully chosen, this gene set can capture the relevant information from the large-scale gene expression data, allowing further analysis to use this reduced representation instead of the full-size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Locally Linear Embedding (LLE). A summary and sources of the software for each feature extraction method are provided in the last section. Ching Siang Tan, Wai Soon Ting, Mohd Saberi Mohamad, Weng Howe Chan, Safaai Deris, and Zuraini Ali Shah Copyright © 2014 Ching Siang Tan et al. All rights reserved. The Mcm2-7 Replicative Helicase: A Promising Chemotherapeutic Target Thu, 28 Aug 2014 15:15:54 +0000 Numerous eukaryotic replication factors have served as chemotherapeutic targets. One replication factor that has largely escaped drug development is the Mcm2-7 replicative helicase. This heterohexameric complex forms the licensing system that assembles the replication machinery at origins during initiation, as well as the catalytic core of the CMG (Cdc45-Mcm2-7-GINS) helicase that unwinds DNA during elongation. Emerging evidence suggests that Mcm2-7 is also part of the replication checkpoint, a quality control system that monitors and responds to DNA damage. As the only replication factor required for both licensing and DNA unwinding, Mcm2-7 is a major cellular regulatory target with likely cancer relevance.
Mutations in at least one of the six MCM genes are particularly prevalent in squamous cell carcinomas of the lung, head and neck, and prostate, and MCM mutations have been shown to cause cancer in mouse models. Moreover, various cellular regulatory proteins, including the Rb tumor suppressor family members, bind Mcm2-7 and inhibit its activity. As a preliminary step toward drug development, several small molecule inhibitors that target Mcm2-7 have recently been discovered. Both its structural complexity and its essential role at the interface between DNA replication and its regulation make Mcm2-7 a potential chemotherapeutic target. Nicholas E. Simon and Anthony Schwacha Copyright © 2014 Nicholas E. Simon and Anthony Schwacha. All rights reserved. Crystal Structure of a Conserved Hypothetical Protein MJ0927 from Methanocaldococcus jannaschii Reveals a Novel Quaternary Assembly in the Nif3 Family Thu, 28 Aug 2014 15:06:43 +0000 MJ0927, a Nif3 family protein of Methanocaldococcus jannaschii, is highly conserved from bacteria to humans. Although several structures of bacterial Nif3 proteins are known, no structure representing an archaeal Nif3 has yet been reported. The crystal structure of Methanocaldococcus jannaschii MJ0927 was determined at 2.47 Å resolution to understand the structural differences between bacterial and archaeal Nif3 proteins. Intriguingly, MJ0927 was found to adopt an unusual assembly comprising a trimer of dimers that forms a cage-like architecture. Electrophoretic mobility-shift assays indicate that MJ0927 binds to both single-stranded and double-stranded DNA. Structural analysis of MJ0927 reveals a positively charged region that could explain its DNA-binding capability. Taken together, these data suggest that MJ0927 adopts a novel quaternary architecture that could play various DNA-binding roles in Methanocaldococcus jannaschii.
Sheng-Chia Chen, Chi-Hung Huang, Chia Shin Yang, Shu-Min Kuan, Ching-Ting Lin, Shan-Ho Chou, and Yeh Chen Copyright © 2014 Sheng-Chia Chen et al. All rights reserved. Relationship between CCR and NT-proBNP in Chinese HF Patients, and Their Correlations with Severity of HF Thu, 28 Aug 2014 09:42:10 +0000 Aim. To evaluate the relationship between creatinine clearance rate (CCR) and the level of N-terminal pro-B-type natriuretic peptide (NT-proBNP) in heart failure (HF) patients and their correlations with HF severity. Methods and Results. Two hundred and one Chinese HF patients were divided into NYHA 1-2 and NYHA 3-4 groups according to the New York Heart Association (NYHA) classification, and 135 patients without heart failure served as the control group. The following variables were compared among these three groups: age, sex, body mass index (BMI), smoking status, hypertension, diabetes, NT-proBNP, creatinine (Cr), uric acid (UA), left ventricular end-diastolic diameter (LVEDD), and CCR. NT-proBNP, Cr, UA, LVEDD, and CCR varied significantly among the three groups, and these variables were positively correlated with the NYHA classification. The levels of NT-proBNP and CCR were closely related to the occurrence of HF and were independent risk factors for HF. At the same time, there was a significant negative correlation between the levels of NT-proBNP and CCR. The area under the receiver operating characteristic curve suggested that NT-proBNP and CCR have high diagnostic accuracy and clinical diagnostic value for HF. Conclusion. NT-proBNP and CCR may be important biomarkers for evaluating the severity of HF. Zhigang Lu, Bo Wang, Yunliang Wang, Xueqing Qian, Wei Zheng, and Meng Wei Copyright © 2014 Zhigang Lu et al. All rights reserved.
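The diagnostic-accuracy statement above rests on the area under the ROC curve. As a hedged illustration only (not the authors' analysis; the helper name roc_auc and all values are invented toy numbers, not study data), the AUC can be computed as the Mann-Whitney probability that a randomly chosen HF patient has a higher marker value than a randomly chosen control, with ties counted as one half:

```python
from itertools import product


def roc_auc(case_scores, control_scores):
    """ROC AUC as the Mann-Whitney probability P(case > control); ties count 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control (O(n*m) pairs).
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy NT-proBNP values (pg/mL), not data from the study.
hf_patients = [1200.0, 850.0, 3000.0, 400.0]
controls = [90.0, 150.0, 400.0, 60.0]
print(roc_auc(hf_patients, controls))  # 15.5 of 16 pairs favour the HF group -> 0.96875
```

For a marker that falls as HF worsens, such as CCR, the values would be negated (or the groups swapped) before calling this hypothetical helper. The pairwise loop is simple to check by hand; a rank-based formula scales better for large cohorts.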
Establishing Standards for Studying Renal Function in Mice through Measurements of Body Size-Adjusted Creatinine and Urea Levels Wed, 27 Aug 2014 12:35:10 +0000 Strategies for obtaining reliable results are increasingly implemented to reduce errors in the analysis of human and veterinary samples; however, further data are required for murine samples. Here, we determined an average factor from the murine body surface area for the calculation of biochemical renal parameters, assessed the effects of storage and freeze-thawing of C57BL/6 mouse samples on plasma and urinary urea, and evaluated the effects of using two different urea-measurement techniques. After 24 h urine samples were obtained, blood was collected, and body weight and length were measured. The samples were evaluated immediately after collection or stored at −20°C or −70°C. At different time points (0, 4, and 90 days), these samples were thawed, the creatinine and/or urea concentrations were analyzed, and the samples were stored again at these temperatures for further measurements. We show that creatinine clearance measurements should be adjusted according to body surface area, calculated from the weight and length of the animal. Repeated freeze-thaw cycles negatively affected the urea concentration, and urea measurements were more reproducible with the modified Berthelot reaction than with the ultraviolet method. Our findings will facilitate the standardization and optimization of methodology, as well as the understanding of renal and other biochemical data obtained from mice. Wellington Francisco Rodrigues, Camila Botelho Miguel, Marcelo Henrique Napimoga, Carlo Jose Freire Oliveira, and Javier Emilio Lazo-Chica Copyright © 2014 Wellington Francisco Rodrigues et al. All rights reserved.
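The clearance adjustment described above can be sketched with the standard timed-urine formula, CCR = (U_Cr × V) / (P_Cr × t), followed by division by body surface area. This is a minimal sketch under stated assumptions: the helper names and toy values are hypothetical, and the BSA is passed in as a number because the weight-and-length formula the authors used is not given in the abstract.

```python
def creatinine_clearance(urine_cr, plasma_cr, urine_volume_ml, collection_min):
    """Timed-urine clearance (U x V) / (P x t), in mL/min.

    urine_cr and plasma_cr must be in the same concentration units.
    """
    if plasma_cr <= 0 or collection_min <= 0:
        raise ValueError("plasma creatinine and collection time must be positive")
    return (urine_cr * urine_volume_ml) / (plasma_cr * collection_min)


def bsa_adjusted(clearance_ml_min, bsa_cm2):
    """Express clearance per unit body surface area (mL/min per cm^2).

    Assumption: the BSA value is computed elsewhere (e.g., from weight and length);
    no specific BSA formula is implemented here.
    """
    if bsa_cm2 <= 0:
        raise ValueError("body surface area must be positive")
    return clearance_ml_min / bsa_cm2


# Hypothetical toy values: creatinine in mg/dL, a 24 h collection (1440 min).
ccr = creatinine_clearance(urine_cr=40.0, plasma_cr=0.2, urine_volume_ml=1.44, collection_min=1440.0)
print(ccr, bsa_adjusted(ccr, bsa_cm2=80.0))
```

Dividing by BSA rather than body weight reflects the abstract's recommendation; scaling to a reference BSA instead would only multiply the result by a constant.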
Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection Wed, 27 Aug 2014 12:02:00 +0000 Identifying cancer-associated mutations (driver mutations) is critical for understanding how changes in the cancer genome lead to the activation of oncogenes or the inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for prediction, with features obtained from various databases. However, it is often unclear which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. To obtain the best performance, the rotation forest algorithm was adopted for the experiments. On the training dataset, collected from the COSMIC and Swiss-Prot databases, we obtained high prediction performance (88.03% accuracy, 93.9% precision, and 81.35% recall) when the 11 top-ranked features were used. Comparisons with various other techniques on the TP53, EGFR, and Cosmic2plus datasets show the generality of our method. Xiuquan Du and Jiaxing Cheng Copyright © 2014 Xiuquan Du and Jiaxing Cheng. All rights reserved. Crystal Structure of Deinococcus radiodurans RecQ Helicase Catalytic Core Domain: The Interdomain Flexibility Wed, 27 Aug 2014 08:21:26 +0000 RecQ DNA helicases are key enzymes in the maintenance of genome integrity, with functions in DNA replication, recombination, and repair. In contrast to most RecQs, RecQ from Deinococcus radiodurans (DrRecQ) possesses an unusual domain architecture that is crucial for its remarkable ability to repair DNA. Here, we determined the crystal structures of the DrRecQ helicase catalytic core and its ADP-bound form, revealing interdomain flexibility in its first RecA-like and winged-helix (WH) domains. Additionally, the WH domain of DrRecQ is positioned in a different orientation from that of E. coli RecQ (EcRecQ).
These results suggest that the orientation of the protein during DNA binding differs significantly between DrRecQ and EcRecQ. Sheng-Chia Chen, Chi-Hung Huang, Chia Shin Yang, Tzong-Der Way, Ming-Chung Chang, and Yeh Chen Copyright © 2014 Sheng-Chia Chen et al. All rights reserved. Characterization of Putative cis-Regulatory Elements in Genes Preferentially Expressed in Arabidopsis Male Meiocytes Wed, 27 Aug 2014 08:05:05 +0000 Meiosis is essential for plant reproduction because it is the process during which homologous chromosome pairing, synapsis, and meiotic recombination occur. The meiotic transcriptome is difficult to investigate because of the size of meiocytes and the confines of anther lobes. The recent development of isolation techniques has enabled the characterization of transcriptional profiles in male meiocytes of Arabidopsis. Gene expression in male meiocytes shows unique features. The direct interaction of transcription factors (TFs) with DNA regulatory sequences forms the basis for the specificity of transcriptional regulation. Here, we identified putative cis-regulatory elements (CREs) associated with male meiocyte-expressed genes using in silico tools. The 1 kb upstream regions of the top 50 genes preferentially expressed in Arabidopsis meiocytes possessed conserved motifs. These motifs are putative binding sites of TFs, some of which share common functions, such as roles in cell division. In combination with cell-type-specific analysis, our findings could substantially aid the identification and experimental verification of protein-DNA interactions for the specific TFs that drive gene expression in meiocytes. Junhua Li, Jinhong Yuan, and Mingjun Li Copyright © 2014 Junhua Li et al. All rights reserved.
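The promoter-scanning step described above (taking the 1 kb upstream region of each gene and looking for motifs in it) can be illustrated with a short sketch. It does not reproduce the authors' in silico motif-discovery tools; coordinates are assumed to be 0-based, the helper names are hypothetical, and the W-box core TTGAC(C/T) is used purely as an example pattern, not as one of the motifs reported in the study.

```python
import re

# Complement table for plain DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, tss: int, strand: str, length: int = 1000) -> str:
    """Return up to `length` bp upstream of a 0-based TSS, read 5'->3' on the gene's strand.

    For '+' genes the region is chrom_seq[tss - length:tss]; for '-' genes it lies
    at higher coordinates and is reverse-complemented.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss].upper()
    if strand == "-":
        return reverse_complement(chrom_seq[tss + 1:tss + 1 + length])
    raise ValueError("strand must be '+' or '-'")


def count_motif(seq: str, pattern: str) -> int:
    """Count overlapping regex matches on both strands of `seq`.

    A palindromic site is counted once per strand.
    """
    finder = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    return len(finder.findall(seq.upper())) + len(finder.findall(reverse_complement(seq)))


# Hypothetical toy example: a '+' gene whose TSS is at index 12.
chrom = "AAGTTGACCAAAATGCCC"
promoter = upstream_region(chrom, tss=12, strand="+", length=10)
print(promoter, count_motif(promoter, "TTGAC[CT]"))
```

Scanning both strands matters because TF binding sites in promoters can occur in either orientation; a real analysis would compare counts against a background set of promoters before calling a motif enriched.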
Function Formula Oriented Construction of Bayesian Inference Nets for Diagnosis of Cardiovascular Disease Wed, 27 Aug 2014 06:47:48 +0000 An intelligent cardiovascular disease (CVD) diagnosis system using hemodynamic parameters (HDPs) derived from the sphygmogram (SPG) signal is presented to support emerging patient-centric healthcare models. To replicate the staged decision process of clinical diagnosis, Bayesian inference nets (BIN) are adapted. We propose new approaches for constructing a hierarchical multistage BIN using defined function formulas, together with a method employing fuzzy logic (FL) to quantify inference nodes with dynamic values of statistical parameters. The suggested methodology is validated by constructing hierarchical Bayesian fuzzy inference nets (HBFIN) to diagnose various heart pathologies from the deduced HDPs. Preliminary diagnostic results show that the proposed methodology is valid and effective for diagnosing cardiovascular disease. Booma Devi Sekar and Mingchui Dong Copyright © 2014 Booma Devi Sekar and Mingchui Dong. All rights reserved. High-Throughput Functional Screening of Steroid Substrates with Wild-Type and Chimeric P450 Enzymes Tue, 26 Aug 2014 10:40:59 +0000 The promiscuity of a collection of 31 wild-type and synthetic variants of CYP1A enzymes was evaluated using a series of 14 steroids and 2 steroid-like chemicals, namely, nootkatone, a terpenoid, and mifepristone, a drug. For each enzyme-substrate pair, the initial steady-state velocity of metabolite formation was determined at a saturating substrate concentration. To this end, we designed a high-throughput approach involving automated incubations in 96-well microplates, with sixteen 6-point kinetics per microplate, and data acquisition using an LC/MS system that accepts 96-well microplates for injection.
The resulting dataset was analyzed with multivariate statistics to identify correlations between the tested enzyme variants and their ability to metabolize steroid substrates. Functional classifications of both CYP1A enzyme variants and steroid substrate structures were obtained, allowing the delineation of global structural features for both substrate recognition and regioselectivity of oxidation. Philippe Urban, Gilles Truan, and Denis Pompon Copyright © 2014 Philippe Urban et al. All rights reserved. Large-Scale Protein-Protein Interactions Detection by Integrating Big Biosensing Data with Computational Model Mon, 18 Aug 2014 10:52:22 +0000 Protein-protein interactions (PPIs) are the basis of biological functions, and studying these interactions at the molecular level is crucial for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, high-throughput experimental methods for identifying PPIs are both time-consuming and expensive. In addition, high-throughput PPI data are often associated with high false-positive and false-negative rates. To address these problems, we propose a method for PPI detection that integrates biosensor-based PPI data with a novel computational model. The method combines the extreme learning machine algorithm with a novel protein sequence descriptor. When applied to a large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy, with 84.08% sensitivity at a specificity of 85.53%. We conducted more extensive experiments comparing the proposed method with a state-of-the-art technique, the support vector machine. The results demonstrate that our approach is very promising for detecting new PPIs and can be a helpful supplement to biosensor-based PPI detection.
Zhu-Hong You, Shuai Li, Xin Gao, Xin Luo, and Zhen Ji Copyright © 2014 Zhu-Hong You et al. All rights reserved. Drug Repositioning Discovery for Early- and Late-Stage Non-Small-Cell Lung Cancer Mon, 18 Aug 2014 07:02:32 +0000 Drug repositioning is a popular approach in the pharmaceutical industry for identifying potential new uses for existing drugs and shortening development time. Non-small-cell lung cancer (NSCLC) is one of the leading causes of death worldwide. To reduce the effects of biological heterogeneity among individuals, both normal and cancer tissues were taken from the same patient, allowing pairwise testing. By comparing early- and late-stage cancer patients, we can identify stage-specific NSCLC genes. Differentially expressed genes are clustered separately to form up- and downregulated communities, which are used as queries to perform enrichment analysis. The results suggest that the pathways for early- and late-stage cancers are different. Sets of up- and downregulated genes were submitted to the cMap web resource to identify potential drugs. To achieve high-confidence drug predictions, multiple microarray experimental results were merged by meta-analysis. Several of the drug findings are supported by MTT or clonogenic assay data. In conclusion, we assessed existing drugs as potential novel anticancer drugs, which may be helpful for drug repositioning discovery in NSCLC. Chien-Hung Huang, Peter Mu-Hsin Chang, Yong-Jie Lin, Cheng-Hsu Wang, Chi-Ying F. Huang, and Ka-Lok Ng Copyright © 2014 Chien-Hung Huang et al. All rights reserved. Systematic Analysis of the Association between Gut Flora and Obesity through High-Throughput Sequencing and Bioinformatics Approaches Thu, 14 Aug 2014 12:10:54 +0000 Eighty-one stool samples from Taiwanese individuals were collected to analyze the association between gut flora and obesity.
The supervised analysis showed that the most abundant bacterial genera in normal samples (from people with a body mass index (BMI) 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant bacterial genera in case samples (with a BMI 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples clustered more compactly than case samples. An unsupervised analysis demonstrated that gut bacterial communities clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) clustered in the N-like group, and most case samples (81%) clustered in the OB-like group (Fisher’s exact test). The results showed that gut bacterial communities were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity. Chih-Min Chiu, Wei-Chih Huang, Shun-Long Weng, Han-Chi Tseng, Chao Liang, Wei-Chi Wang, Ting Yang, Tzu-Ling Yang, Chen-Tsung Weng, Tzu-Hao Chang, and Hsien-Da Huang Copyright © 2014 Chih-Min Chiu et al. All rights reserved. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products Tue, 12 Aug 2014 10:16:15 +0000 Background. During the analysis of genomics data, it is often necessary to quantify the functional similarity of genes and their products based on annotation information from the hierarchically structured gene ontology (GO). A flexible and user-friendly way to estimate the functional similarity of genes using GO annotation is therefore highly desirable. Results.
We proposed a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products based on multiple ontologies of hierarchical GO annotations. The performance of our algorithm was evaluated and found to be superior to the other tested methods. We implemented the proposed algorithm in a software package, FSim, based on statistical and computing environment. It can be used to discover functionally related genes for a given gene, group of genes, or set of function terms. Conclusions. FSim is a flexible tool to analyze functional gene groups based on the GO annotation databases. Qiang Hu, ZhiGang Wang, and ZhengGuo Zhang Copyright © 2014 Qiang Hu et al. All rights reserved. Prediction of S-Nitrosylation Modification Sites Based on Kernel Sparse Representation Classification and mRMR Algorithm Tue, 12 Aug 2014 00:00:00 +0000 Protein S-nitrosylation plays a very important role in a wide variety of cellular biological activities. To date, accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we present a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and the minimum Redundancy Maximum Relevance (mRMR) algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for the numerical representation of proteins. A total of 529 protein sequences collected from open-access databases and the published literature are used to train and test our predictor. Computational results show that our predictor achieves Matthews correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor, random forest, and sparse representation classification algorithms.
The experimental results also indicate that 134 optimal features can better represent the peptides of protein S-nitrosylation than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. The experimental results showed that our predictor also performed well on the independent testing set, with a Matthews’ correlation coefficient of 0.2239. Guohua Huang, Lin Lu, Kaiyan Feng, Jun Zhao, Yuchao Zhang, Yaochen Xu, Ning Zhang, Bi-Qing Li, Weiping Huang, and Yu-Dong Cai Copyright © 2014 Guohua Huang et al. All rights reserved. Novel Approach for Coexpression Analysis of E2F1–3 and MYC Target Genes in Chronic Myelogenous Leukemia Sun, 10 Aug 2014 08:29:13 +0000 Background. Chronic myelogenous leukemia (CML) is characterized by a tremendous number of immature myeloid cells in the blood circulation. E2F1–3 and MYC are important transcription factors that form positive feedback loops by reciprocal regulation in their own transcription processes. Since genes regulated by E2F1–3 or MYC are related to cell proliferation and apoptosis, we asked whether the coexpression patterns of genes regulated concurrently by E2F1–3 and MYC differ between the normal and the CML states. Results. We proposed a method to explore the difference in the coexpression patterns of those candidate target genes between the normal and the CML groups. A disease-specific cutoff point for coexpression levels that classified the coexpressed gene pairs into strong and weak coexpression classes was identified. Our method effectively identified the coexpression pattern differences from the overall structure. Moreover, we found that genes related to the cell adhesion and angiogenesis properties were more likely to be coexpressed in the normal group than in the CML group. Conclusion.
Our findings may be helpful in exploring the underlying mechanisms of CML and provide useful information for cancer treatment. Fengfeng Wang, Lawrence W. C. Chan, William C. S. Cho, Petrus Tang, Jun Yu, Chi-Ren Shyu, Nancy B. Y. Tsui, S. C. Cesar Wong, Parco M. Siu, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2014 Fengfeng Wang et al. All rights reserved. A Genome-Wide Identification of Genes Undergoing Recombination and Positive Selection in Neisseria Sun, 10 Aug 2014 08:23:34 +0000 Currently, there is particular interest in the molecular mechanisms of adaptive evolution in bacteria. Neisseria is a genus of Gram-negative bacteria, and there has recently been considerable focus on its two human pathogenic species N. meningitidis and N. gonorrhoeae. Until now, no genome-wide studies have attempted to scan for the genes related to adaptive evolution. For this reason, we selected 18 Neisseria genomes (14 N. meningitidis, 3 N. gonorrhoeae, and 1 commensal N. lactamica) to conduct a comparative genome analysis to obtain a comprehensive understanding of the roles of natural selection and homologous recombination throughout the history of adaptive evolution. Among the 1012 core orthologous genes, we identified 635 genes with recombination signals and 10 genes that showed significant evidence of positive selection. Further functional analyses revealed no functional bias among the recombined genes. The positively selected genes are mainly involved in DNA processing and iron uptake, which are essential for the fundamental life cycle. Overall, the results indicate that both recombination and positive selection play crucial roles in the adaptive evolution of Neisseria genomes. The positively selected genes and the corresponding amino acid sites provide us with valuable targets for further research into the detailed mechanisms of adaptive evolution in Neisseria. Dong Yu, Yuan Jin, Zhiqiu Yin, Hongguang Ren, Wei Zhou, Long Liang, and Junjie Yue Copyright © 2014 Dong Yu et al.
All rights reserved. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration Wed, 06 Aug 2014 08:37:56 +0000 Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retinal damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector of enrichment scores between its gene set (the gene itself and its direct neighbors in STRING) and gene ontology terms or KEGG pathways. Then, feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes. Jian Zhang, ZhiHao Xing, Mingming Ma, Ning Wang, Yu-Dong Cai, Lei Chen, and Xun Xu Copyright © 2014 Jian Zhang et al. All rights reserved. C-Terminal Domain Swapping of SSB Changes the Size of the ssDNA Binding Site Mon, 04 Aug 2014 06:33:19 +0000 Single-stranded DNA-binding protein (SSB) plays an important role in DNA metabolism, including DNA replication, repair, and recombination, and is therefore essential for cell survival. Bacterial SSB consists of an N-terminal ssDNA-binding/oligomerization domain and a flexible C-terminal protein-protein interaction domain. We characterized the ssDNA-binding properties of Klebsiella pneumoniae SSB (KpSSB), Salmonella enterica Serovar Typhimurium LT2 SSB (StSSB), Pseudomonas aeruginosa PAO1 SSB (PaSSB), and two chimeric KpSSB proteins, namely, KpSSBnStSSBc and KpSSBnPaSSBc.
The C-terminal domain of StSSB or PaSSB was exchanged with that of KpSSB through protein chimeragenesis. By using the electrophoretic mobility shift assay, we characterized the stoichiometry of KpSSB, StSSB, PaSSB, KpSSBnStSSBc, and KpSSBnPaSSBc, complexed with a series of ssDNA homopolymers. The binding site sizes were determined to be , , , , and nucleotides (nt), respectively. Comparison of the binding site sizes of KpSSB, KpSSBnStSSBc, and KpSSBnPaSSBc showed that the C-terminal domain swapping of SSB changes the size of the binding site. Our observations suggest that not only the conserved N-terminal domain but also the C-terminal domain of SSB is an important determinant for ssDNA binding. Yen-Hua Huang and Cheng-Yang Huang Copyright © 2014 Yen-Hua Huang and Cheng-Yang Huang. All rights reserved. The Effects of the Context-Dependent Codon Usage Bias on the Structure of the nsp1α of Porcine Reproductive and Respiratory Syndrome Virus Sun, 03 Aug 2014 07:47:26 +0000 The availability of the crystal structure of porcine reproductive and respiratory syndrome virus (PRRSV) leader protease nsp1α makes it possible to analyze the roles of tRNA abundance of pigs and codon usage of the nsp1α gene in the formation of this protease. The effects of tRNA abundance of the pigs and the synonymous codon usage and the context-dependent codon bias (CDCB) of the nsp1α on shaping the specific folding units (α-helix, β-strand, and the coil) in the nsp1α were analyzed based on the structural information about this protease from the Protein Data Bank (PDB: 3IFU) and the nsp1α of the 191 PRRSV strains. By mapping the overall tRNA abundance along the nsp1α, we found that there is no link between the fluctuation of the overall tRNA abundance and the specific folding units in the nsp1α, and that regions of low ribosomal translation speed caused by tRNA abundance exist in the nsp1α.
A strong correlation was found between the usage of some synonymous codons and the specific folding units in the nsp1α, and CDCB was observed in these folding units. These findings provide insight into the roles of the synonymous codon usage and CDCB in the formation of PRRSV nsp1α structure. Yao-zhong Ding, Ya-nan You, Dong-jie Sun, Hao-tai Chen, Yong-lu Wang, Hui-yun Chang, Li Pan, Yu-zhen Fang, Zhong-wang Zhang, Peng Zhou, Jian-liang Lv, Xin-sheng Liu, Jun-jun Shao, Fu-rong Zhao, Tong Lin, Laszlo Stipkovits, Zygmunt Pejsak, Yong-guang Zhang, and Jie Zhang Copyright © 2014 Yao-zhong Ding et al. All rights reserved. Detecting Epistatic Interactions in Metagenome-Wide Association Studies by metaBOOST Thu, 24 Jul 2014 18:41:12 +0000 Material and Methods. We recall the definition of epistasis and extend it to metagenomic biomarkers, and then we give an overview of our method metaBOOST and provide detailed information about each step of metaBOOST. Results. We describe the data sources for both simulation studies and real metagenomic datasets. Then, we describe the procedure of the simulation studies and provide their results. After that, we conduct studies on real datasets and report the results. Conclusions and Discussion. Finally, we summarize our method and discuss some possible improvements for the future. Mengmeng Wu and Rui Jiang Copyright © 2014 Mengmeng Wu and Rui Jiang. All rights reserved. The N-Terminal Domain of Human DNA Helicase Rtel1 Contains a Redox Active Iron-Sulfur Cluster Thu, 24 Jul 2014 09:20:31 +0000 Human telomere length regulator Rtel1 is a superfamily II DNA helicase and is essential for maintaining the proper length of telomeres in chromosomes. Here we report that the N-terminal domain of human Rtel1 (RtelN) expressed in Escherichia coli cells produces a protein that contains a redox active iron-sulfur cluster with a redox midpoint potential of −248 ± 10 mV (pH 8.0).
The iron-sulfur cluster in RtelN is sensitive to hydrogen peroxide and nitric oxide, indicating that reactive oxygen/nitrogen species may modulate the DNA helicase activity of Rtel1 via modification of its iron-sulfur cluster. Purified RtelN retains a weak binding affinity for single-stranded (ss) and double-stranded (ds) DNA in vitro. However, modification of the iron-sulfur cluster by hydrogen peroxide or nitric oxide does not significantly affect the DNA binding activity of RtelN, suggesting that the iron-sulfur cluster is not directly involved in the DNA interaction in the N-terminal domain of Rtel1. Aaron P. Landry and Huangen Ding Copyright © 2014 Aaron P. Landry and Huangen Ding. All rights reserved. Security Mechanism Based on Hospital Authentication Server for Secure Application of Implantable Medical Devices Thu, 24 Jul 2014 07:55:14 +0000 After two recent security attacks against implantable medical devices (IMDs) were reported, the privacy and security risks of IMDs have been widely recognized in the medical device market and research community, since the malfunctioning of IMDs might endanger the patient’s life. During the last few years, much research has been carried out to address the security-related issues of IMDs, including privacy, safety, and accessibility issues. A physician accesses an IMD through an external device, called a programmer, for diagnosis and treatment. Hence, cryptographic key management between the IMD and the programmer is important for enforcing strict access control. In this paper, a new security architecture for IMDs is proposed, based on a 3-tier security model in which the programmer interacts with a Hospital Authentication Server to obtain permission to access IMDs. The proposed security architecture greatly simplifies the key management between IMDs and programmers.
Also proposed is a security mechanism to guarantee the authenticity of the patient data collected from the IMD and the nonrepudiation of the physician’s treatment based on it. The proposed architecture and mechanism are analyzed and compared with several previous works in terms of security and performance. Chang-Seop Park Copyright © 2014 Chang-Seop Park. All rights reserved. An Intelligent System for Identifying Acetylated Lysine on Histones and Nonhistone Proteins Thu, 24 Jul 2014 00:00:00 +0000 Lysine acetylation is an important and ubiquitous posttranslational modification conserved in prokaryotes and eukaryotes. This process, which is dynamically and temporally regulated by histone acetyltransferases and deacetylases, is crucial for numerous essential biological processes such as transcriptional regulation, cellular signaling, and stress response. Since the experimental identification of lysine acetylation sites within proteins is time-consuming and labor-intensive, several computational approaches have been developed to identify candidates for experimental validation. In this work, acetylated protein data collected from UniProtKB were categorized into histone and nonhistone proteins. Support vector machines (SVMs) were applied to build predictive models by using amino acid pair composition (AAPC) as a feature in a histone model. We combined BLOSUM62 and AAPC features in a nonhistone model. Furthermore, maximal dependence decomposition (MDD) clustering enhanced the performance of the model, yielding a sensitivity of 0.863, specificity of 0.885, accuracy of 0.880, and MCC of 0.706 in fivefold cross-validation. Additionally, the proposed method was evaluated on independent test sets, yielding a predictive accuracy of 74%. This indicates that the performance of our method is comparable with that of other acetylation prediction methods. Cheng-Tsung Lu, Tzong-Yi Lee, Yu-Ju Chen, and Yi-Ju Chen Copyright © 2014 Cheng-Tsung Lu et al.
All rights reserved. Studying the Complex Expression Dependences between Sets of Coexpressed Genes Thu, 24 Jul 2014 00:00:00 +0000 Organisms simplify the orchestration of gene expression by coregulating genes whose products function together in the cell. The use of clustering methods to obtain sets of coexpressed genes from expression arrays is very common; nevertheless, there are no appropriate tools to study the expression networks among these sets of coexpressed genes. The aim of the developed tools is to allow the study of the complex expression dependences that exist between sets of coexpressed genes. For this purpose, we start by detecting the nonlinear expression relationships between pairs of genes, plus the coexpressed genes. Next, we form networks among sets of coexpressed genes that maintain nonlinear expression dependences between all of them. The expression relationship between the sets of coexpressed genes is defined by the expression relationship between the skeletons of these sets, where the skeleton of a set represents the coexpressed genes with a well-defined nonlinear expression relationship with the skeleton of the other sets. As a result, we can study the nonlinear expression relationships between a target gene and other sets of coexpressed genes, or start the study from the skeleton of the sets, to study the complex relationships of activation and deactivation between the sets of coexpressed genes that carry out the different cellular processes present in the expression experiments. Mario Huerta, Oriol Casanova, Roberto Barchino, Jose Flores, Enrique Querol, and Juan Cedano Copyright © 2014 Mario Huerta et al. All rights reserved. An Efficient Parallel Algorithm for Multiple Sequence Similarities Calculation Using a Low Complexity Method Tue, 22 Jul 2014 09:07:46 +0000 With the advance of genomic research, the number of sequences involved in comparative methods has grown immensely.
Among these are methods for similarity calculation, which are used by many bioinformatics applications. Due to the huge amount of data, combining low-complexity methods with parallel computing is becoming desirable. k-mer counting is a very efficient method with good biological results. In this work, a parallel algorithm for multiple sequence similarity calculation using k-mer counting is proposed. Tests show that the algorithm has very good scalability and a nearly linear speedup. With 14 nodes, a 12x speedup was obtained. This algorithm can be used in the parallelization of some multiple sequence alignment tools, such as MAFFT and MUSCLE. Evandro A. Marucci, Geraldo F. D. Zafalon, Julio C. Momente, Leandro A. Neves, Carlo R. Valêncio, Alex R. Pinto, Adriano M. Cansian, Rogeria C. G. de Souza, Yang Shiyou, and José M. Machado Copyright © 2014 Evandro A. Marucci et al. All rights reserved. Cell Type-Dependent RNA Recombination Frequency in the Japanese Encephalitis Virus Tue, 22 Jul 2014 00:00:00 +0000 Japanese encephalitis virus (JEV) is one of approximately 70 flaviviruses, frequently causing symptoms involving the central nervous system. Mutations of its genomic RNA frequently occur during viral replication, which is believed to be a force contributing to viral evolution. Nevertheless, accumulating evidence shows that some JEV strains may have actually arisen from RNA recombination between genetically different populations of the virus. We have demonstrated that RNA recombination in JEV occurs unequally in different cell types. In the present study, viral RNA fragments transfected into mosquito cells, as well as viral RNAs synthesized in them, were shown to be unstable, especially in the early phase of infection, possibly via cleavage by an exoribonuclease.
Such cleaved small RNA fragments may be further degraded through an RNA interference pathway triggered by viral double-stranded RNA during replication in mosquito cells, resulting in a lower frequency of RNA recombination in mosquito cells compared to that which occurs in mammalian cells. In fact, adjustment of viral RNA to an appropriately lower level in mosquito cells prevents overgrowth of the virus and is beneficial for cells to survive the infection. Our findings may also account for the slower evolution of arboviruses as reported previously. Wei-Wei Chiang, Ching-Kai Chuang, Mei Chao, and Wei-June Chen Copyright © 2014 Wei-Wei Chiang et al. All rights reserved. Structural Insight into the DNA-Binding Mode of the Primosomal Proteins PriA, PriB, and DnaT Mon, 21 Jul 2014 08:30:20 +0000 Replication restart primosome is a complex dynamic system that is essential for bacterial survival. This system uses various proteins to reinitiate chromosomal DNA replication to maintain genetic integrity after DNA damage. The replication restart primosome in Escherichia coli is composed of PriA helicase, PriB, PriC, DnaT, DnaC, DnaB helicase, and DnaG primase. The assembly of the protein complexes within the forked DNA responsible for reloading the replicative DnaB helicase anywhere on the chromosome for genome duplication requires the coordination of transient biomolecular interactions. Over the last decade, investigations on the structure and mechanism of these nucleoproteins have provided considerable insight into primosome assembly. In this review, we summarize and discuss our current knowledge and recent advances on the DNA-binding mode of the primosomal proteins PriA, PriB, and DnaT. Yen-Hua Huang and Cheng-Yang Huang Copyright © 2014 Yen-Hua Huang and Cheng-Yang Huang. All rights reserved. 
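The k-mer counting approach in the parallel similarity-calculation abstract above (Marucci et al.) can be illustrated with a short sketch. That abstract does not give the exact similarity formula, so this example assumes the fractional common k-mer count used for MUSCLE-style distance estimation: the sum, over shared k-mers, of the smaller of the two counts, divided by the number of k-mers in the shorter sequence. The function names and toy sequences are hypothetical, and the all-pairs loop runs serially here; it is not the authors' parallel implementation.

```python
from collections import Counter
from itertools import combinations_with_replacement


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Fractional common k-mer count between two sequences (assumed formula).

    Sum of min(count_a, count_b) over shared k-mers, divided by the
    number of k-mers in the shorter sequence; the result lies in [0, 1].
    """
    shorter = min(len(a), len(b))
    if shorter < k:
        return 0.0
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(min(n, cb[kmer]) for kmer, n in ca.items() if kmer in cb)
    return shared / (shorter - k + 1)


def similarity_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """All-pairs similarity matrix. Every (i, j) pair is computed
    independently, so the pairs could be split across workers or nodes."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations_with_replacement(range(n), 2):
        matrix[i][j] = matrix[j][i] = kmer_similarity(seqs[i], seqs[j], k)
    return matrix


if __name__ == "__main__":
    seqs = ["ACGTACGTGA", "ACGTACGAGA", "TTTTGGGGCC"]  # hypothetical toy sequences
    for row in similarity_matrix(seqs, k=3):
        print([round(x, 3) for x in row])
```

Because no pair depends on any other, this all-pairs computation is embarrassingly parallel, which is consistent with the near-linear speedup the abstract reports.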
Mass Spectrometry Based Proteomic Analysis of Salivary Glands of Urban Malaria Vector Anopheles stephensi Mon, 14 Jul 2014 11:31:37 +0000 Salivary gland proteins of Anopheles mosquitoes offer attractive targets to understand interactions with sporozoites, blood feeding behavior, homeostasis, and immunological evaluation of malaria vectors and parasite interactions. To date, only limited studies have been carried out to elucidate the proteins of An. stephensi salivary glands. The aim of the present study was to provide detailed analytical attributes of the functional salivary gland proteins of the urban malaria vector An. stephensi. A proteomic approach combining one-dimensional electrophoresis (1DE), ion trap liquid chromatography mass spectrometry (LC/MS/MS), and computational bioinformatic analysis was adopted to provide the first direct insight into identification and functional characterization of known salivary proteins and novel salivary proteins of An. stephensi. Computational analysis with the MASCOT and OMSSA search algorithms on online servers identified a total of 36 known salivary proteins and 123 novel proteins from the LC/MS/MS data. This first report describes a baseline proteomic catalogue of 159 salivary proteins belonging to various categories of signal transduction, regulation of blood coagulation cascade, and various immune and energy pathways of An. stephensi sialotranscriptome by mass spectrometry. Our results may serve as a basis for assigning putative functional roles to proteins in the context of blood feeding, biting behavior, and other aspects of vector-parasite host interactions for parasite development in anopheline mosquitoes. Sonam Vijay, Manmeet Rawat, and Arun Sharma Copyright © 2014 Sonam Vijay et al. All rights reserved.
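The S-nitrosylation and acetylated-lysine abstracts above report sensitivity, specificity, accuracy, and Matthews’ correlation coefficient (MCC). A minimal sketch of how these standard metrics follow from the four counts of a binary confusion matrix is given below; the function name and the counts in the usage line are hypothetical and are not taken from either study. MCC ranges from −1 to +1, with 0 corresponding to chance-level prediction.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=80, tn=90, fp=10, fn=20))
```

Because MCC uses all four cells of the confusion matrix, it remains informative when positive sites are much rarer than negative ones, which is often the case in modification-site prediction.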
PPI Network Analysis of mRNA Expression Profile of Ezrin Knockdown in Esophageal Squamous Cell Carcinoma Mon, 14 Jul 2014 08:56:44 +0000 Ezrin, which encodes the protein EZR that cross-links actin filaments, is overexpressed and involved in invasion, metastasis, and poor prognosis in various cancers including esophageal squamous cell carcinoma (ESCC). In our previous study, Ezrin was knocked down and an mRNA expression profile was obtained, but these data have not been fully mined. In this study, we applied protein-protein interaction (PPI) network knowledge and methods to extend our understanding of these differentially expressed genes (DEGs). PPI subnetworks showed that hundreds of DEGs interact with thousands of other proteins. Subcellular localization analyses found that the DEGs and their directly or indirectly interacting proteins are distributed in multiple layers, a structure that was used to analyze the shortest paths between EZR and other DEGs. Gene ontology annotation generated a functional annotation map and found hundreds of significant terms, especially those associated with cytoskeleton organization of Ezrin protein, such as “cytoskeleton organization,” “regulation of actin filament-based process,” and “regulation of actin cytoskeleton organization.” The Random Walk with Restart algorithm was applied to prioritize the DEGs and identified several cancer-related DEGs ranked closest to EZR. These PPI network-based analyses have greatly expanded our understanding of the mRNA expression profile of Ezrin knockdown for future examination of the roles and mechanisms of Ezrin. Bingli Wu, Jianjun Xie, Zepeng Du, Jianyi Wu, Pixian Zhang, Liyan Xu, and Enmin Li Copyright © 2014 Bingli Wu et al. All rights reserved.
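The Ezrin study above prioritizes DEGs with the Random Walk with Restart (RWR) algorithm. As a minimal sketch of the generic RWR formulation (not the authors' implementation), the probability vector is iterated as p ← (1 − r)·W·p + r·p₀, where W is the column-normalized adjacency matrix of the PPI network, p₀ spreads probability uniformly over the seed nodes (for example, EZR), and r is the restart probability. The restart probability of 0.7, the convergence tolerance, and the four-node toy network are assumptions for illustration only.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state RWR probabilities for every node of an undirected graph.

    adj: square adjacency matrix (list of lists or ndarray)
    seeds: indices of the seed nodes the walker restarts from
    restart: probability of jumping back to the seeds at each step (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # isolated nodes keep all-zero columns
    w = adj / col_sums                   # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # uniform restart distribution over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    # Hypothetical path graph 0-1-2-3 with node 0 as the seed.
    adj = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
    scores = random_walk_with_restart(adj, seeds=[0])
    ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
    print(ranking)  # nodes ordered by network proximity to the seed
```

Since every non-isolated column of W sums to 1 and the update is a contraction with factor 1 − r, the iteration converges to a unique probability vector; genes are then ranked by their steady-state score relative to the seed.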
Identifying the Gene Signatures from Gene-Pathway Bipartite Network Guarantees the Robust Model Performance on Predicting the Cancer Prognosis Mon, 14 Jul 2014 08:20:49 +0000 To improve the prediction of cancer prognosis in clinical research, various algorithms have been developed to construct predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs) generated by the statistical methods or the machine learning algorithms often includes a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions and subsequently affect the reliability of the predictive models. In this study, we proposed a strategy, which combined the statistical algorithm with the gene-pathway bipartite networks, to generate reliable lists of cancer-related DEGs, and constructed models using support vector machines for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloid leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes. Li He, Yuelong Wang, Yongning Yang, Liqiu Huang, and Zhining Wen Copyright © 2014 Li He et al. All rights reserved. The Definition of a Prolonged Intensive Care Unit Stay for Spontaneous Intracerebral Hemorrhage Patients: An Application with National Health Insurance Research Database Mon, 14 Jul 2014 08:11:53 +0000 Introduction.
Length of stay (LOS) in the intensive care unit (ICU) of spontaneous intracerebral hemorrhage (sICH) patients is an important issue. Disease severity, psychosocial factors, and institutional factors influence the length of ICU stay. This study used the Taiwan National Health Insurance Research Database (NHIRD) to define the threshold of a prolonged ICU stay in sICH patients. Methods. This research collected the demographic data of sICH patients in the NHIRD from 2005 to 2009. The threshold of prolonged ICU stay was calculated using change point analysis. Results. A total of 1599 sICH patients were included. A prolonged ICU stay was defined as a stay of 10 days or longer. There were 436 prolonged ICU stay cases and 1163 nonprolonged cases. Conclusion. This study showed that the threshold of a prolonged ICU stay is a good indicator of hospital utilization in ICH patients. Different hospitals have different care strategies that can be identified with a prolonged ICU stay. This indicator can be improved using quality control methods such as complication prevention and efficient ICU bed management. Patients’ stays in ICUs and in hospitals will be shorter if integrated care systems are established. Chien-Lung Chan, Hsien-Wei Ting, and Hsin-Tsung Huang Copyright © 2014 Chien-Lung Chan et al. All rights reserved. Incorporating Amino Acids Composition and Functional Domains for Identifying Bacterial Toxin Proteins Mon, 07 Jul 2014 08:55:16 +0000 Aside from pathogenesis, bacterial toxins have also been used for medical purposes, such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has a great impact on cell biology research and therapy development. However, experimental methods for identifying bacterial toxins are time-consuming and labor-intensive, implying an urgent need for computational prediction.
Thus, we were motivated to develop a method for computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, including 77 exotoxins and 90 endotoxins, is used to train the predictive model with support vector machines (SVMs). Cross-validation shows that the SVM models trained with amino acid and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid and dipeptide composition achieved accuracies of 95.71% and 92.86%, respectively. After incorporating functional domain information, the predictive performance is further improved. On an independent dataset, the proposed method identified and classified bacterial toxins more effectively than models based on the other two features, which may aid bacterial biomedical development. Min-Gang Su, Chien-Hsun Huang, Tzong-Yi Lee, Yu-Ju Chen, and Hsin-Yi Wu Copyright © 2014 Min-Gang Su et al. All rights reserved. Risk Factors for Mortality in Patients with Septic Acute Kidney Injury in Intensive Care Units in Beijing, China: A Multicenter Prospective Observational Study Mon, 07 Jul 2014 06:34:39 +0000 Objective. To identify risk factors for mortality in patients with septic AKI in the ICU via a multicenter study. Background. Septic AKI is a serious threat to ICU patients, but few clinical studies have focused on it. Methods. This was a prospective, observational, and multicenter study conducted in 30 ICUs of 28 major hospitals in Beijing. A total of 3,107 patients were admitted consecutively, of whom 361 had septic AKI. Patient clinical data were recorded daily for 10 days after admission. Kidney Disease: Improving Global Outcomes (KDIGO) criteria were used to define and stage AKI. Of the involved patients, 201 survived and 160 died. Results.
The rate of septic AKI was 11.6%. Twenty-one risk factors were found, and six independent risk factors were identified: age, APACHE II score, duration of mechanical ventilation, duration of MAP <65 mmHg, time until RRT started, and progressive KDIGO stage. Admission KDIGO stages were not associated with mortality, while worst KDIGO stages were. Of the KDIGO measures, only progressive KDIGO stage was an independent risk factor. Conclusions. Six independent risk factors for mortality in septic AKI were identified. Progressive KDIGO stage is a better predictor of mortality than admission or worst KDIGO stage. This trial is registered with ChiCTR-ONC-11001875. Xin Wang, Li Jiang, Ying Wen, Mei-Ping Wang, Wei Li, Zhi-Qiang Li, and Xiu-Ming Xi Copyright © 2014 Xin Wang et al. All rights reserved. Gonadal Transcriptome Analysis of Male and Female Olive Flounder (Paralichthys olivaceus) Sun, 06 Jul 2014 10:07:38 +0000 Olive flounder (Paralichthys olivaceus) is an important commercially cultured marine flatfish in China, Korea, and Japan, in which females grow faster than males. In order to explore the molecular mechanism of flounder sex determination and development, we used RNA-seq technology to investigate transcriptomes of flounder gonads. This produced 22,253,217 and 19,777,841 qualified reads from the ovary and testes, which were jointly assembled into 97,233 contigs. Among them, 23,223 contigs were mapped to known genes, of which 2,193 were predicted to be differentially expressed in the ovary and 887 in the testes. According to annotation information, several sex-related biological pathways, including ovarian steroidogenesis and estrogen signaling pathways, were identified in flounder for the first time. The dimorphic expression of overall sex-related genes provides further insights into sex determination and gonadal development. Our study also provides an archive for further studies of the molecular mechanisms of fish sex determination.
Zhaofei Fan, Feng You, Lijuan Wang, Shenda Weng, Zhihao Wu, Jinwei Hu, Yuxia Zou, Xungang Tan, and Peijun Zhang Copyright © 2014 Zhaofei Fan et al. All rights reserved. Characteristics and Prediction of RNA Structure Sun, 06 Jul 2014 09:18:47 +0000 RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. For kinetic reasons, real RNA secondary structures are often locally rather than globally optimal. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic, based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND, which are validated by NMR or X-ray crystallography. The domain lengths in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points correspond to the golden-section points of the sequence (0.618 ≈ 1/φ and 0.382 ≈ 1 − 1/φ, where φ is the golden ratio). With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding. Hengwu Li, Daming Zhu, Caiming Zhang, Huijian Han, and Keith A. Crandall Copyright © 2014 Hengwu Li et al. All rights reserved. Microarray-Based RNA Profiling of Breast Cancer: Batch Effect Removal Improves Cross-Platform Consistency Thu, 03 Jul 2014 00:00:00 +0000 Microarray analysis is a powerful technique used extensively for gene expression analysis.
Different technologies are available, but lack of standardization makes it challenging to compare and integrate data. Furthermore, batch-related biases within datasets are common but often not tackled. We have analyzed the same 234 breast cancers on two different microarray platforms. One dataset contained known batch effects associated with the fabrication procedure used. The aim was to assess the significance of correcting for systematic batch effects when integrating data from different platforms. We here demonstrate the importance of detecting batch effects and how tools, such as ComBat, can be used to successfully overcome such systematic variations in order to unmask essential biological signals. Batch adjustment was found to be particularly valuable in the detection of more subtle differences in gene expression. Furthermore, our results show that proper adjustment is essential for integration of gene expression data obtained from multiple sources. We show that high-variance genes are highly reproducibly expressed across platforms, making them particularly well suited as biomarkers and for building gene signatures, as exemplified by prediction of estrogen-receptor status and molecular subtypes. In conclusion, the study emphasizes the importance of utilizing proper batch adjustment methods when integrating data across different batches and platforms. Martin J. Larsen, Mads Thomassen, Qihua Tan, Kristina P. Sørensen, and Torben A. Kruse Copyright © 2014 Martin J. Larsen et al. All rights reserved. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis Thu, 03 Jul 2014 00:00:00 +0000 Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously.
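ComBat models each gene's batch effect as a shift in location and scale and removes it, additionally shrinking the per-batch estimates with an empirical Bayes step and optionally protecting biological covariates. As a rough illustration of the location/scale idea only, the Python sketch below (function name hypothetical) rescales one gene so that every batch ends up with the overall mean and the pooled within-batch standard deviation. It omits the empirical Bayes shrinkage and covariate handling, and it assumes batches are not confounded with biology, so it is not a substitute for ComBat itself.

```python
import math


def location_scale_adjust(values, batches):
    """Remove per-batch location and scale differences for one gene.

    Each value is standardized within its batch and then mapped back onto the
    overall mean and the pooled within-batch standard deviation. Unlike ComBat,
    no empirical Bayes shrinkage or covariate protection is applied.
    """
    n = len(values)
    grand_mean = sum(values) / n
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in groups.items()}
    batch_sd = {
        b: math.sqrt(sum((v - batch_mean[b]) ** 2 for v in vs) / len(vs))
        for b, vs in groups.items()
    }
    # Pooled SD of the residuals left after subtracting each batch's own mean.
    pooled_sd = math.sqrt(
        sum((v - batch_mean[b]) ** 2 for v, b in zip(values, batches)) / n
    )
    adjusted = []
    for v, b in zip(values, batches):
        sd = batch_sd[b]
        z = (v - batch_mean[b]) / sd if sd > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted
```

Applied gene by gene, this removes a constant offset between batches (for example, a fabrication-lot effect) while keeping within-batch ordering intact.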
This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online. Chih-Yi Chien, Ching-Wei Chang, Chen-An Tsai, and James J. Chen Copyright © 2014 Chih-Yi Chien et al. All rights reserved. Combined Analysis with Copy Number Variation Identifies Risk Loci in Lung Cancer Tue, 01 Jul 2014 11:54:11 +0000 Background. Lung cancer is the most important cause of cancer mortality worldwide, but the underlying mechanisms of this disease are not fully understood. Copy number variations (CNVs) are promising genetic variations to study because of their potential effects on cancer. Methodology/Principal Findings. Here we conducted a pilot study in which we systematically analyzed the association of CNVs in two lung cancer datasets: the Environment And Genetics in Lung cancer Etiology (EAGLE) and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial datasets. We used a preestablished association method to test the datasets separately and conducted a combined analysis to test the consistency of associations between the two datasets. Finally, we identified 167 risk SNP loci and 22 CNVs associated with lung cancer and linked them with recombination hotspots.
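The MAVTgsa description does not say which FDR procedure the package uses. As a hedged illustration of how per-gene-set P values are commonly turned into FDR-adjusted values, the Python sketch below implements the Benjamini-Hochberg step-up adjustment; the function name is hypothetical, and Storey-style q-values, which also estimate the proportion of true null hypotheses, would generally come out somewhat smaller.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value with ascending rank k among m tests, the adjusted value is
    the minimum over j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw P values of 0.01, 0.04, 0.03, and 0.20 for four gene sets become roughly 0.04, 0.053, 0.053, and 0.20.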
Functional annotation and biological relevance analyses implied that some of our predicted risk loci were supported by other studies and might be potential candidate loci for lung cancer studies. Conclusions/Significance. Our results further emphasize the importance of copy number variations in cancer and might be a valuable complement to current genome-wide association studies on cancer. Xinlei Li, Xianfeng Chen, Guohong Hu, Yang Liu, Zhenguo Zhang, Ping Wang, You Zhou, Xianfu Yi, Jie Zhang, Yufei Zhu, Zejun Wei, Fei Yuan, Guoping Zhao, Jun Zhu, Landian Hu, and Xiangyin Kong Copyright © 2014 Xinlei Li et al. All rights reserved. Target Capture and Massive Sequencing of Genes Transcribed in Mytilus galloprovincialis Mon, 30 Jun 2014 11:33:59 +0000 Next generation sequencing (NGS) allows fast and massive production of both genome and transcriptome sequence datasets. As the genome of the Mediterranean mussel Mytilus galloprovincialis is not available at present, we have explored the possibility of reducing whole genome sequencing efforts by using capture probes coupled with PCR amplification and high-throughput 454-sequencing to enrich selected genomic regions. The enrichment of DNA target sequences was validated by real-time PCR, whereas the efficacy of the applied strategy was evaluated by mapping the 454-output reads against reference transcript data already available for M. galloprovincialis and by measuring coverage, SNPs, number of de novo sequenced introns, and complete gene sequences. Focusing on a target size of nearly 1.5 Mbp, we obtained a target coverage which allowed the identification of more than 250 complete introns, 10,741 SNPs, and also complete gene sequences. This study confirms the transcriptome-based enrichment of gDNA regions as a good strategy to expand knowledge on specific subsets of genes, even in nonmodel organisms. Umberto Rosani, Stefania Domeneghetti, Alberto Pallavicini, and Paola Venier Copyright © 2014 Umberto Rosani et al.
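The Mytilus study evaluates its capture strategy partly by measuring coverage of a target of nearly 1.5 Mbp. To illustrate what such a measurement involves, the Python sketch below (function name hypothetical) computes mean depth and breadth of coverage for one contiguous target from half-open read alignment intervals. It uses a difference array so the cost is linear in target length plus number of reads; real pipelines would usually derive depths from alignment files with dedicated tools instead.

```python
def coverage_stats(target_length, alignments, min_depth=1):
    """Mean depth and breadth of coverage over a single target.

    alignments: half-open [start, end) intervals in target coordinates;
    intervals are clipped to the target. Breadth is the fraction of target
    bases with depth >= min_depth.
    """
    diff = [0] * (target_length + 1)
    for start, end in alignments:
        start, end = max(start, 0), min(end, target_length)
        if start < end:
            diff[start] += 1  # a read begins covering here
            diff[end] -= 1    # and stops covering here
    depth = total = covered = 0
    for pos in range(target_length):
        depth += diff[pos]
        total += depth
        if depth >= min_depth:
            covered += 1
    return total / target_length, covered / target_length
```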
All rights reserved. Identifying Hierarchical and Overlapping Protein Complexes Based on Essential Protein-Protein Interactions and “Seed-Expanding” Method Mon, 30 Jun 2014 09:43:33 +0000 Much evidence has demonstrated that protein complexes are overlapping and hierarchically organized in PPI networks. Meanwhile, the large size of PPI networks requires complex detection methods with low time complexity. Up to now, few methods can identify overlapping and hierarchical protein complexes in a PPI network quickly. In this paper, a novel method, called MCSE, is proposed based on λ-module and “seed-expanding.” First, it chooses seeds as essential PPIs or edges with high edge clustering values. Then, it identifies protein complexes by expanding each seed to a λ-module. MCSE is suitable for large PPI networks because of its low time complexity. MCSE can identify overlapping protein complexes naturally because a protein can be visited by different seeds. MCSE uses the parameter λ_th to control the range of seed expanding and can detect a hierarchical organization of protein complexes by tuning the value of λ_th. Experimental results on S. cerevisiae show that this hierarchical organization is similar to that of known complexes in the MIPS database. The experimental results also show that MCSE outperforms other competing algorithms, such as CPM, CMC, Core-Attachment, Dpclus, HC-PIN, MCL, and NFC, in terms of functional enrichment and matching with known protein complexes. Jun Ren, Wei Zhou, and Jianxin Wang Copyright © 2014 Jun Ren et al. All rights reserved. Integrating In Silico Prediction Methods, Molecular Docking, and Molecular Dynamics Simulation to Predict the Impact of ALK Missense Mutations in Structural Perspective Thu, 26 Jun 2014 12:00:41 +0000 Over the past decade, advances in next generation sequencing technology have placed personalized genomic medicine on the horizon.
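The MCSE abstract selects seeds among edges with high edge clustering values without giving the formula. One widely used definition, from Radicchi and colleagues, is ECC(u, v) = (z_uv + 1) / min(k_u - 1, k_v - 1), where z_uv is the number of triangles containing the edge and k is node degree. The Python sketch below assumes that definition (MCSE's own may differ) and, as a further assumption, scores edges with a degree-1 endpoint as 0 so that pendant edges are never chosen as seeds. All function names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """(triangles through edge u-v + 1) / min(deg(u) - 1, deg(v) - 1).

    The formula is undefined when an endpoint has degree 1; we return 0.0
    in that case (an assumption, not taken from the MCSE paper).
    """
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom == 0:
        return 0.0
    triangles = len(adj[u] & adj[v])  # common neighbours close a triangle
    return (triangles + 1) / denom


def seed_edges(edges, threshold):
    """Edges whose clustering coefficient reaches the threshold, best first."""
    adj = build_adjacency(edges)
    scored = [(edge_clustering_coefficient(adj, u, v), (u, v)) for u, v in edges]
    ranked = sorted(scored, key=lambda item: -item[0])  # stable for ties
    return [edge for score, edge in ranked if score >= threshold]
```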
Determining whether disease-causing mutations in complex diseases are pathogenic or neutral remains a major task, and in the structural context it can even be impossible because the experiments are time-consuming and expensive. Among the various disease-causing mutations, single nucleotide polymorphisms (SNPs) play a vital role in defining an individual’s susceptibility to disease and drug response. Understanding the genotype-phenotype relationship through SNPs is the first and most important step in drug research and development. Detailed understanding of the effect of SNPs on patient drug response is a key factor in the establishment of personalized medicine. In this paper, we present a computational pipeline for SNP-centred study of anaplastic lymphoma kinase (ALK) that applies in silico prediction methods, molecular docking, and molecular dynamics simulation approaches. Combining these computational methods provides a way of understanding how deleterious mutations alter protein drug targets and eventually lead to variable drug responses among patients. We hope this rapid and cost-effective pipeline will also serve as a bridge connecting clinicians and in silico resources in tailoring treatments to patients’ specific genotypes. C. George Priya Doss, Chiranjib Chakraborty, Luonan Chen, and Hailong Zhu Copyright © 2014 C. George Priya Doss et al. All rights reserved. SSFinder: High Throughput CRISPR-Cas Target Sites Prediction Tool Thu, 26 Jun 2014 00:00:00 +0000 The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite high demand for this system, finding a reliable tool for the determination of specific target sites in large genomic data has remained challenging. Here, we report SSFinder, a Python script to perform high throughput detection of specific target sites in large nucleotide datasets.
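SSFinder's exact search rules are not spelled out in this abstract. For SpCas9, a common definition of a candidate target site is a 20-nt protospacer immediately followed by an NGG PAM on either strand, and a simple notion of specificity is that the protospacer occurs only once in the searched data. The Python sketch below implements exactly that simplified version (function names hypothetical); it ignores near-matches with mismatches, which practical off-target screening usually considers.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sites(seq, protospacer_len=20):
    """Yield protospacers immediately followed by an NGG PAM, on both strands.

    The zero-width lookahead lets overlapping candidates be reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % protospacer_len)
    for strand in (seq, reverse_complement(seq)):
        for match in pattern.finditer(strand):
            yield match.group(1)


def specific_sites(sequences, protospacer_len=20):
    """Candidate protospacers that occur exactly once across all input sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(candidate_sites(seq, protospacer_len))
    return sorted(site for site, n in counts.items() if n == 1)
```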
SSFinder is a user-friendly tool, compatible with Windows, Mac OS, and Linux operating systems, and freely available online. Santosh Kumar Upadhyay and Shailesh Sharma Copyright © 2014 Santosh Kumar Upadhyay and Shailesh Sharma. All rights reserved. Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering Wed, 25 Jun 2014 07:19:47 +0000 Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence-difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data. Daniel C. Ilut, Marie L. Nydam, and Matthew P. Hare Copyright © 2014 Daniel C. Ilut et al.
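The clustering-threshold trade-off described for Ciona savignyi can be seen on a toy example. The Python sketch below is not the authors' protocol; it greedily clusters equal-length reads by the fraction of mismatched sites (function names and reads are hypothetical). With a threshold below the allelic divergence, two alleles of one locus are split into separate clusters (false homozygosity), while a threshold above the paralog divergence merges a paralog into the same cluster (false heterozygosity).

```python
def mismatch_fraction(a, b):
    """Fraction of aligned sites at which two equal-length reads differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, max_diff):
    """Put each read into the first cluster whose seed read differs from it by
    at most max_diff; otherwise the read seeds a new cluster."""
    seeds, clusters = [], []
    for read in reads:
        for i, seed in enumerate(seeds):
            if mismatch_fraction(read, seed) <= max_diff:
                clusters[i].append(read)
                break
        else:
            seeds.append(read)
            clusters.append([read])
    return clusters


# Hypothetical toy reads.
allele_1 = "ACGTACGTAC"
allele_2 = "ACGTACGTAA"  # same locus: 1 of 10 sites differs from allele_1
paralog = "TTGTACGAAC"   # paralogous copy: 3 of 10 sites differ from allele_1
reads = [allele_1, allele_2, paralog]
for threshold in (0.05, 0.15, 0.35):
    # 3 clusters (alleles split), 2 (correct), 1 (paralog merged)
    print(threshold, len(greedy_cluster(reads, threshold)))
```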
All rights reserved. The Human Plasma Membrane Peripherome: Visualization and Analysis of Interactions Wed, 25 Jun 2014 07:02:03 +0000 A major part of membrane function is conducted by proteins, both integral and peripheral. Peripheral membrane proteins temporarily adhere to biological membranes, either to the lipid bilayer or to integral membrane proteins, through noncovalent interactions. The aim of this study was to construct and analyze the interactions of the human plasma membrane peripheral proteins (the peripherome hereinafter). For this purpose, we collected a dataset of peripheral proteins of the human plasma membrane. We also collected a dataset of experimentally verified interactions for these proteins. The interaction network created from this dataset was visualized using Cytoscape. We grouped the proteins based on their subcellular location and clustered them using the MCL algorithm in order to detect functional modules. Moreover, functional and graph-theory-based analyses were performed to assess biological features of the network. Interaction data with drug molecules show that ~10% of peripheral membrane proteins are targets of approved drugs, suggesting their potential implications in disease. In conclusion, we reveal novel features and properties of the protein-protein interaction network formed by peripheral proteins of the human plasma membrane. Katerina C. Nastou, Georgios N. Tsaousis, Kimon E. Kremizas, Zoi I. Litou, and Stavros J. Hamodrakas Copyright © 2014 Katerina C. Nastou et al. All rights reserved. MPINet: Metabolite Pathway Identification via Coupling of Global Metabolite Network Structure and Metabolomic Profile Wed, 25 Jun 2014 06:50:21 +0000 High-throughput metabolomics technology, such as gas chromatography mass spectrometry, allows the analysis of hundreds of metabolites. Understanding, from a biological pathway perspective, how these metabolites shape the condition under study remains a significant challenge.
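The peripherome study clusters its interaction network with the MCL (Markov Cluster) algorithm. The pure-Python sketch below shows the core MCL loop on a small adjacency matrix: add self-loops, column-normalize into a random-walk matrix, then alternate expansion (matrix squaring) and inflation (element-wise power followed by renormalization) until the matrix stops changing; clusters are read off the rows of attractor nodes. It is an illustrative dense implementation with assumed parameter defaults, not the optimized sparse MCL software used in practice.

```python
def _normalize_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Markov Cluster algorithm on a dense symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then column-normalise into a column-stochastic matrix.
    m = _normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random walk
        # Inflation: strengthen strong flows, weaken weak ones, renormalise.
        inflated = _normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractor rows keep non-negligible mass; their non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > 1e-8)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

On two triangles joined by a single bridging edge, the loop separates the triangles into two clusters.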
Pathway identification is an invaluable aid in addressing this issue and is thus urgently needed. In this study, we developed a network-based metabolite pathway identification method, MPINet, which considers the global importance of metabolites and the unique character of the metabolomic profile. By integrating the global metabolite functional network structure with the character of the metabolomic profile, MPINet provides a more accurate metabolomic pathway analysis. This integrative strategy simultaneously captures the global nonequivalence of metabolites in a pathway and the bias from metabolomic experimental technology. We then applied MPINet to four different types of metabolite datasets. In the analysis of a metastatic prostate cancer dataset, we demonstrated the effectiveness of MPINet. With the analysis of the two type 2 diabetes datasets, we show that MPINet has the potential to identify novel disease-related pathways and is reliable for analyzing metabolomic data. Finally, we applied MPINet extensively to identify pathways related to drug sensitivity. These results suggest MPINet’s effectiveness and reliability for analyzing metabolomic data across multiple application fields. Feng Li, Yanjun Xu, Desi Shang, Haixiu Yang, Wei Liu, Junwei Han, Zeguo Sun, Qianlan Yao, Chunlong Zhang, Jiquan Ma, Fei Su, Li Feng, Xinrui Shi, Yunpeng Zhang, Jing Li, Qi Gu, Xia Li, and Chunquan Li Copyright © 2014 Feng Li et al. All rights reserved. Biomolecular Networks and Human Diseases Tue, 24 Jun 2014 08:06:49 +0000 FangXiang Wu, Luonan Chen, Jianxin Wang, and Reda Alhajj Copyright © 2014 FangXiang Wu et al. All rights reserved. miRSeq: A User-Friendly Standalone Toolkit for Sequencing Quality Evaluation and miRNA Profiling Tue, 24 Jun 2014 06:46:17 +0000 MicroRNAs (miRNAs) exert diverse regulatory functions in a wide range of biological activities.
Studies on miRNA functions generally depend on determining miRNA expression profiles between libraries by using a next-generation sequencing (NGS) platform. Currently, several online web services have been developed to provide small RNA NGS data analysis. However, the need to submit large amounts of NGS data, data format conversion, and the limited range of supported species cause problems. In this study, we developed miRSeq to provide an alternative. To test its performance, we analyzed small RNA NGS data from four species (human, rat, fly, and nematode) with miRSeq. The alignment results indicate that miRSeq can precisely evaluate the sequencing quality of samples with regard to the percentage of self-ligation reads, read length distribution, and read category. miRSeq is a user-friendly standalone toolkit featuring a graphical user interface (GUI). After a simple installation, users can easily operate miRSeq on a PC or laptop by using a mouse. Within minutes, miRSeq yields useful miRNA data, including miRNA expression profiles, 3′ end modification patterns, and isomiR forms. Moreover, miRSeq supports the analysis of up to 105 animal species, providing higher flexibility. Cheng-Tsung Pan, Kuo-Wang Tsai, Tzu-Min Hung, Wei-Chen Lin, Chao-Yu Pan, Hong-Ren Yu, and Sung-Chou Li Copyright © 2014 Cheng-Tsung Pan et al. All rights reserved. A Graphic Method for Identification of Novel Glioma Related Genes Mon, 23 Jun 2014 07:15:21 +0000 Glioma, the most common and lethal intracranial tumor, is a serious disease that causes many deaths every year. A good understanding of the mechanism underlying this disease is very helpful for designing effective treatments. However, knowledge of this disease is still limited. Uncovering its related genes is an important step toward understanding the mechanism underlying this disease. In this study, a graphic method was proposed to identify novel glioma related genes based on known glioma related genes.
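The miRSeq description lists read length distribution among its quality measures. A minimal Python sketch of that measure (function names hypothetical) tallies read lengths and reports the fraction of reads in a miRNA-sized window. The default 18-26 nt window is an assumption for illustration rather than a miRSeq parameter, and reads are assumed to be adapter-trimmed already.

```python
from collections import Counter


def length_distribution(reads):
    """Map read length -> number of reads with that length."""
    return Counter(len(r) for r in reads)


def fraction_in_window(reads, min_len=18, max_len=26):
    """Fraction of reads whose length lies in [min_len, max_len].

    The default window is an assumed miRNA size range, not a miRSeq setting.
    """
    if not reads:
        return 0.0
    dist = length_distribution(reads)
    in_window = sum(n for length, n in dist.items() if min_len <= length <= max_len)
    return in_window / len(reads)
```

A library dominated by reads far outside the window (for example, very short fragments) would stand out immediately in this summary.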
A weighted graph was constructed from the protein-protein interaction information retrieved from STRING, and the well-known shortest path algorithm was employed to discover novel genes. Subsequent analysis suggests that some of them are related to the biological process of glioma, indicating that our method is effective in identifying novel glioma related genes. We hope that the proposed method can be applied to study other diseases and provide useful information to medical workers, thereby helping to design effective treatments for different diseases. Yu-Fei Gao, Yang Shu, Lei Yang, Yi-Chun He, Li-Peng Li, GuaHua Huang, Hai-Peng Li, and Yang Jiang Copyright © 2014 Yu-Fei Gao et al. All rights reserved. A Novel Dynamic Update Framework for Epileptic Seizure Prediction Sun, 22 Jun 2014 00:00:00 +0000 Epileptic seizure prediction is a difficult problem in clinical applications, and it has the potential to significantly improve the daily lives of patients whose seizures cannot be controlled by either drugs or surgery. However, most current studies of epileptic seizure prediction focus only on high sensitivity and a low false-positive rate and lack the flexibility to handle a variety of epileptic seizures and patients’ physical conditions. Therefore, a novel dynamic update framework for epileptic seizure prediction is proposed in this paper. In this framework, two basic sample pools are constructed and updated dynamically. Furthermore, the prediction model can be updated to the most appropriate one for predicting the arrival of seizures. Mahalanobis distance is introduced in this part to solve the problem of side information, measuring the distance between two data sets. In addition, a multichannel feature extraction method based on the Hilbert-Huang transform and extreme learning machine is utilized to extract the features of a patient’s preseizure state against the normal state. Finally, a dynamic update epileptic seizure prediction system is built.
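The glioma study searches a weighted STRING network with a shortest path algorithm. A minimal Python sketch of that step with Dijkstra's algorithm follows. How interaction confidence becomes an edge weight is not stated in the abstract; converting STRING's 0-1000 combined score into a cost of 1000 minus the score, so that confident interactions are short, is an assumption made here, as are the gene names. Genes lying on the shortest paths between known glioma genes would then be the candidates.

```python
import heapq
import math


def add_string_edge(graph, a, b, score):
    """Add an undirected edge whose cost is 1000 - STRING score (assumed scheme)."""
    cost = 1000 - score
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total cost, list of nodes) or (inf, [])."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

With hypothetical edges KNOWN1-CAND (score 900), CAND-KNOWN2 (800), and KNOWN1-KNOWN2 (100), the cheapest route between the two known genes passes through CAND, which would be reported as a candidate.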
Simulations on the Freiburg database show that the proposed system performs better than one without updates. This research is of significant value for clinical applications, especially for the development of online portable devices. Min Han, Sunan Ge, Minghui Wang, Xiaojun Hong, and Jie Han Copyright © 2014 Min Han et al. All rights reserved. An Integrated Analysis of miRNA, lncRNA, and mRNA Expression Profiles Wed, 18 Jun 2014 06:38:20 +0000 Increasing evidence indicates that noncoding RNAs (ncRNAs) have important roles in various biological processes. Here, miRNA, lncRNA, and mRNA expression profiles were analyzed in human HepG2 and L02 cells using high-throughput technologies. An integrative method was developed to identify possible functional relationships between different RNA molecules. The dominant deregulated miRNAs tended to be downregulated in tumor cells, whereas the most abnormal mRNAs and lncRNAs were always upregulated. However, the genome-wide analysis of differentially expressed RNA species did not show significant bias between up- and downregulated populations. miRNA-mRNA interactions were analyzed based on their regulatory relationships, and miRNA-lncRNA and mRNA-lncRNA interactions were thoroughly surveyed and identified based on their locational distributions and sequence correlations. Aberrantly expressed miRNAs were further analyzed based on their multiple isomiRs. IsomiR repertoires and expression patterns varied across miRNA loci. Several specific miRNA loci showed differences between tumor and normal cells, especially with respect to abnormally expressed miRNA species. These findings suggest that isomiR repertoires and expression patterns might contribute to tumorigenesis through different biological roles. Systematic and integrative analysis of different RNA molecules with potential cross-talk may contribute greatly to unveiling the complex mechanisms underlying tumorigenesis.
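The seizure prediction framework above uses Mahalanobis distance to compare data sets. For reference, the Mahalanobis distance of a feature vector x from a sample pool with mean μ and covariance S is sqrt((x - μ)^T S^-1 (x - μ)); unlike Euclidean distance, it accounts for correlated and differently scaled features. The dependency-free Python sketch below computes it by solving S y = x - μ with Gaussian elimination rather than inverting S. Function names are hypothetical, S is assumed nonsingular, and how the paper combines this distance with its sample-pool updates is not reproduced.

```python
import math


def mean_vector(samples):
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance(samples):
    """Sample covariance matrix (n - 1 denominator)."""
    n, mu = len(samples), mean_vector(samples)
    d = len(mu)
    return [
        [sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(x, samples):
    """Distance of feature vector x from the distribution of the reference samples."""
    mu = mean_vector(samples)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance(samples), diff)  # y = S^-1 (x - mu)
    return math.sqrt(max(0.0, sum(di * yi for di, yi in zip(diff, y))))
```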
Li Guo, Yang Zhao, Sheng Yang, Hui Zhang, and Feng Chen Copyright © 2014 Li Guo et al. All rights reserved. Ultrasonographic Fetal Growth Charts: An Informatic Approach by Quantitative Analysis of the Impact of Ethnicity on Diagnoses Based on a Preliminary Report on Salentinian Population Wed, 18 Jun 2014 00:00:00 +0000 Clear guidance on fetal growth assessment is important because of the strong links between growth restriction or macrosomia and adverse perinatal outcome, and it is needed to reduce the associated morbidity and mortality. Fetal growth curves are extensively adopted to track fetal size from the early phases of pregnancy up to delivery. A large variety of reference charts are reported in the literature, but most are up to five decades old. Furthermore, they do not address several variables and factors (e.g., ethnicity, diet, lifestyle, smoking, and physiological and pathological variables), which are very important for a correct evaluation of fetal well-being. Therefore, currently adopted fetal growth charts are inadequate for the melting pot of ethnic groups and lifestyles in our society. Customized fetal growth charts are needed to provide an accurate fetal assessment and to avoid unnecessary obstetric interventions at the time of delivery. Starting from the development of a growth chart purposely built for a specific population, the authors quantify and analyse the impact of adopting the wrong growth charts on fetal diagnoses. These results come from a preliminary evaluation of a new open service developed to produce personalized growth charts for specific ethnicity, lifestyle, and other parameters. Andrea Tinelli, Mario Alessandro Bochicchio, Lucia Vaira, and Antonio Malvasi Copyright © 2014 Andrea Tinelli et al. All rights reserved.
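The effect the fetal growth chart authors quantify, a fetus being classified differently depending on which reference chart is used, can be illustrated with a normal-distribution model of a biometric measurement at a given gestational age. In the Python sketch below, the chart means and standard deviations are made-up numbers, normality is an assumption, and the 10th/90th centile cut-offs are the conventional small/large-for-gestational-age limits rather than values from the paper.

```python
from statistics import NormalDist


def centile(measurement, ref_mean, ref_sd):
    """Centile (0-100) of a measurement under a normal reference chart."""
    return 100 * NormalDist(ref_mean, ref_sd).cdf(measurement)


def classify(measurement, ref_mean, ref_sd, low=10, high=90):
    """Label a measurement as small, appropriate, or large for gestational age."""
    c = centile(measurement, ref_mean, ref_sd)
    if c < low:
        return "small"
    if c > high:
        return "large"
    return "appropriate"


# Hypothetical charts for the same gestational week (values in mm, invented).
chart_a = {"ref_mean": 320.0, "ref_sd": 15.0}
chart_b = {"ref_mean": 310.0, "ref_sd": 15.0}
print(classify(300.0, **chart_a), classify(300.0, **chart_b))  # small appropriate
```

The same 300 mm measurement falls near the 9th centile on one chart and near the 25th on the other, which is the kind of diagnostic disagreement a population-specific chart is meant to avoid.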
Conformational B-Cell Epitopes Prediction from Sequences Using Cost-Sensitive Ensemble Classifiers and Spatial Clustering Tue, 17 Jun 2014 07:10:22 +0000 B-cell epitopes are regions of the antigen surface that can be recognized by certain antibodies and elicit the immune response. Identification of epitopes for a given antigen chain has vital applications in vaccine and drug research. Experimental identification of B-cell epitopes is time-consuming and resource intensive, so it may benefit from computational approaches. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting the antigenic determinant residues, and a spatial clustering algorithm is then adopted to identify the potential epitopes. First, we explore various discriminative features from primary sequences. Second, a cost-sensitive ensemble scheme is introduced to deal with the imbalanced learning problem. Third, we adopt a spatial clustering algorithm to determine which residues may potentially form the epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitopes prediction), is proposed in this study. CBEP achieves good prediction performance, with mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using leave-one-out cross-validation (LOOCV). Compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP, which implements the proposed method, is available for academic use. Jian Zhang, Xiaowei Zhao, Pingping Sun, Bo Gao, and Zhiqiang Ma Copyright © 2014 Jian Zhang et al. All rights reserved.
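CBEP is evaluated by AUC. The AUC equals the probability that a randomly chosen positive (epitope) residue receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below computes that quantity directly by comparing all positive-negative pairs, which takes quadratic time but keeps the definition explicit; the function name is hypothetical, and the LOOCV loop used in the paper is not shown.

```python
def auc(scores, labels):
    """Area under the ROC curve from scores and 0/1 labels (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on how positives rank against negatives, it is insensitive to class imbalance in a way that plain accuracy is not, which matters for epitope residues that are a small minority.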
On Macroscopic Quantum Phenomena in Biomolecules and Cells: From Levinthal to Hopfield Mon, 16 Jun 2014 06:43:51 +0000 In the context of macroscopic quantum phenomena of the second kind, we seek a solution in principle to the long-standing problem of polymer folding, which Levinthal considered (semi)classically intractable. To illuminate it, we applied quantum-chemical and quantum decoherence approaches to conformational transitions. Our analyses imply the existence of novel macroscopic quantum biomolecular phenomena, with biomolecular chain folding in an open environment considered as a subtle interplay between energy and conformation eigenstates of this biomolecule, governed by quantum-chemical and quantum decoherence laws. On the other hand, within an open biological cell, a system of all identical (noninteracting and dynamically noncoupled) biomolecular proteins might be considered as a corresponding spatial quantum ensemble of these identical biomolecular processors, providing a spatially distributed quantum solution to a single corresponding biomolecular chain folding, whose density of conformational states might also be represented as a Hopfield-like quantum-holographic associative neural network (providing an equivalent global quantum-informational alternative to the standard molecular-biology local biochemical approach in biomolecules and cells, as well as at higher hierarchical levels of the organism). Dejan Raković, Miroljub Dugić, Jasmina Jeknić-Dugić, Milenko Plavšić, Stevo Jaćimovski, and Jovan Šetrajčić Copyright © 2014 Dejan Raković et al. All rights reserved. Big Data and Network Biology Sun, 15 Jun 2014 12:51:54 +0000 Shigehiko Kanaya, Md. Altaf-Ul-Amin, Samuel Kuria Kiboi, and Farit Mochamad Afendi Copyright © 2014 Shigehiko Kanaya et al. All rights reserved. Integrative Genomics and Computational Systems Medicine Sun, 15 Jun 2014 05:47:08 +0000 Jason E.
McDermott, Yufei Huang, Bing Zhang, Hua Xu, and Zhongming Zhao Copyright © 2014 Jason E. McDermott et al. All rights reserved. Development of Dual Inhibitors against Alzheimer’s Disease Using Fragment-Based QSAR and Molecular Docking Thu, 12 Jun 2014 10:57:36 +0000 Alzheimer’s disease (AD) is the leading cause of dementia among elderly people. Considering the complex heterogeneous etiology of AD, there is an urgent need to develop multitargeted drugs for its suppression. β-Amyloid cleavage enzyme (BACE-1) and acetylcholinesterase (AChE), being important for AD progression, have been considered promising drug targets. In this study, a robust and highly predictive group-based QSAR (GQSAR) model was developed based on descriptors calculated for the fragments of 20 1,4-dihydropyridine (DHP) derivatives. A large combinatorial library of DHP analogues was created, the activity of each compound was predicted, and the top compounds were analyzed using refined molecular docking. A detailed interaction analysis was carried out for the top two compounds (EDC and FDC), which showed significant binding affinity for BACE-1 and AChE. This study paves the way for consideration of these lead molecules as prospective drugs for the effective dual inhibition of BACE-1 and AChE. The GQSAR model provides site-specific clues about the molecules where certain modifications can result in increased biological activity. This information could be of high value for the design and development of multifunctional drugs for combating AD. Manisha Goyal, Jaspreet Kaur Dhanjal, Sukriti Goyal, Chetna Tyagi, Rabia Hamid, and Abhinav Grover Copyright © 2014 Manisha Goyal et al. All rights reserved. Large-Scale Investigation of Human TF-miRNA Relations Based on Coexpression Profiles Mon, 09 Jun 2014 00:00:00 +0000 Noncoding, endogenous microRNAs (miRNAs) are well known for regulating gene expression rather than encoding proteins.
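A group-based QSAR model relates biological activity to descriptors computed for molecular fragments. The abstract does not specify the regression technique, so the Python sketch below uses ordinary least squares (solving the normal equations) purely as an illustration; the function names are hypothetical, and the descriptor values and activities in any usage are invented. The fitted coefficients are then used to score new analogues, which mirrors the library-screening step only in spirit.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X, y):
    """Fit activity = b0 + b1*d1 + ... + bk*dk by least squares (normal equations)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict(coefs, descriptors):
    """Predicted activity for one compound's fragment descriptors."""
    return coefs[0] + sum(b * d for b, d in zip(coefs[1:], descriptors))
```

Coefficient signs indicate which fragment descriptors raise or lower predicted activity, which is the kind of site-specific clue the abstract describes.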
Dysregulation of miRNA genes, either upregulation or downregulation, may lead to severe diseases or oncogenesis, especially when the miRNA disorder involves significant bioreactions or pathways. Thus, in recent years, how miRNA genes are transcriptionally regulated has received attention alongside miRNA target recognition. In this study, a large-scale investigation of novel cis- and trans-elements was undertaken to further determine TF-miRNA regulatory relations, which are necessary to unravel the transcriptional regulation of miRNA genes. Based on miRNA and annotated gene expression profiles, the term “coTFBS” was introduced to detect common transcription factors and the corresponding binding sites within the promoter regions of each miRNA and its coexpressed annotated genes. The computational pipeline was successfully established to filter out redundancy caused by short sequence motifs in the TFBS pattern search. Eventually, we identified more convincing TF-miRNA regulatory relations for 225 human miRNAs. This valuable information is helpful in understanding miRNA functions and provides knowledge for evaluating therapeutic potential in clinical research. Once expression profiles are available for most miRNAs in the latest database, TF candidates of more miRNAs can be explored by this filtering approach in the future. Chia-Hung Chien, Yi-Fan Chiang-Hsieh, Ann-Ping Tsou, Shun-Long Weng, Wen-Chi Chang, and Hsien-Da Huang Copyright © 2014 Chia-Hung Chien et al. All rights reserved. Computational Evidence of NAGNAG Alternative Splicing in Human Large Intergenic Noncoding RNA Thu, 05 Jun 2014 12:22:48 +0000 NAGNAG alternative splicing plays an essential role in biological processes and represents a highly adaptable system for posttranslational regulation of gene function. NAGNAG alternative splicing impacts a myriad of biological processes. Previous studies of NAGNAG largely focused on messenger RNA.
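The coTFBS idea, transcription factors whose binding sites are shared by a miRNA promoter and the promoters of its coexpressed genes, can be sketched in Python under simplifying assumptions. Coexpression is taken here as Pearson correlation above a cutoff, each promoter is reduced to the set of TFs with predicted binding sites, and a TF is kept if it appears in the miRNA promoter and in at least a given fraction of the coexpressed genes' promoters. The cutoffs, function names, and TF and gene names are all hypothetical; this is not the authors' pipeline, which also filters redundant short motifs.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def co_tfbs(mirna_profile, gene_profiles, mirna_tfs, promoter_tfs,
            r_min=0.8, min_fraction=0.5):
    """TFs predicted in the miRNA promoter and shared by enough coexpressed genes.

    gene_profiles: gene -> expression profile (same samples as the miRNA).
    promoter_tfs:  gene -> set of TFs with predicted sites in its promoter.
    r_min and min_fraction are illustrative cutoffs, not published values.
    """
    coexpressed = [g for g, prof in gene_profiles.items()
                   if pearson(mirna_profile, prof) >= r_min]
    if not coexpressed:
        return set()
    counts = Counter(tf for g in coexpressed
                     for tf in promoter_tfs.get(g, set()) & mirna_tfs)
    return {tf for tf, c in counts.items() if c / len(coexpressed) >= min_fraction}
```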
To the best of our knowledge, this is the first study testing the hypothesis that NAGNAG alternative splicing also operates in large intergenic noncoding RNA (lincRNA). RNA-seq data sets from recent deep sequencing studies were queried to test our hypothesis. NAGNAG alternative splicing of human lincRNA was identified in two independent RNA-seq data sets. Within these datasets, 31 NAGNAG alternative splicing sites were identified in lincRNA. Notably, most exons of lincRNA containing NAGNAG acceptors were longer than those from protein-coding genes. Furthermore, the presence of CAG coding appeared to participate in splice site selection. Finally, expression of the isoforms of NAGNAG lincRNA exhibited tissue specificity. Together, this study improves our understanding of NAGNAG alternative splicing in lincRNA. Xiaoyong Sun, Simon M. Lin, and Xiaoyan Yan Copyright © 2014 Xiaoyong Sun et al. All rights reserved. The Domain Landscape of Virus-Host Interactomes Wed, 04 Jun 2014 12:18:42 +0000 Viral infections result in millions of deaths in the world today. A thorough analysis of virus-host interactomes may reveal insights into viral infection and pathogenic strategies. In this study, we present a landscape of virus-host interactomes based on protein domain interactions. Compared with analysis at the protein level, this domain-domain interactome provides a unique abstraction of the protein-protein interactome. Through comparisons among DNA, RNA, and retrotranscribing viruses, we identified a core set of human domains that viruses use to hijack the cellular machinery and evade the immune system, which might be promising antiviral drug targets. We showed that viruses preferentially interact with host hub and bottleneck domains, and that degree and betweenness centrality differ significantly among the three categories of viruses.
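A NAGNAG acceptor is a tandem 3′ splice site in which two AG dinucleotides sit 3 nt apart at the intron-exon boundary, so either can be used. Assuming a genomic sequence and a 0-based index of the first base of an annotated exon, the Python sketch below (function name hypothetical) checks both configurations, with the annotated AG as the downstream or the upstream copy, and returns the alternative exon start. It is a motif check only; the study itself relied on RNA-seq evidence for both isoforms.

```python
import re

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def nagnag_alternatives(seq, exon_start):
    """Return alternative exon start positions if the acceptor at exon_start
    (0-based index of the first exon base) is part of a NAGNAG motif."""
    seq = seq.upper()
    alternatives = []
    # Annotated AG is the downstream copy: motif spans exon_start-6 .. exon_start-1.
    if exon_start >= 6 and NAGNAG.fullmatch(seq[exon_start - 6:exon_start]):
        alternatives.append(exon_start - 3)
    # Annotated AG is the upstream copy: motif spans exon_start-3 .. exon_start+2.
    if exon_start >= 3 and NAGNAG.fullmatch(seq[exon_start - 3:exon_start + 3]):
        alternatives.append(exon_start + 3)
    return alternatives
```

For the toy sequence TTTCAGCAGGTAAA, an exon annotated to start at index 9 has an alternative start at index 6, and vice versa.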
Further analysis at the functional level showed that different viruses perturb the host cellular molecular network through both common and unique strategies. Most importantly, we proposed a viral disease network linking viral domains, human domains, and the corresponding diseases, which uncovered several previously unknown virus-disease relationships that require further verification. Overall, these findings are expected to deepen the understanding of viral infection and contribute to the development of antiviral therapy. Lu-Lu Zheng, Chunyan Li, Jie Ping, Yanhong Zhou, Yixue Li, and Pei Hao Copyright © 2014 Lu-Lu Zheng et al. All rights reserved. biomvRhsmm: Genomic Segmentation with Hidden Semi-Markov Model Tue, 03 Jun 2014 12:17:37 +0000 High-throughput technologies like tiling arrays and next-generation sequencing (NGS) generate continuous homogeneous segments or signal peaks in the genome that represent transcripts and transcript variants (transcript mapping and quantification), regions of deletion and amplification (copy number variation), or regions characterized by particular common features like chromatin state or DNA methylation ratio (epigenetic modifications). However, the volume of data produced by these technologies presents challenges in analysis. Here, a hidden semi-Markov model (HSMM) is implemented and tailored to handle multiple genomic profiles, to facilitate genome annotation by assisting in the detection of transcripts, regulatory regions, and copy number variation from holistic microarray or NGS data. Rather than being limited to one specific application, the proposed HSMM supports various data distributions and offers modeling options that accommodate different types of genomic data, so that it can serve as a general segmentation engine.
Incorporating genomic positions into the HSMM sojourn distribution, with optional prior learning from annotations or previous studies, makes the modeling output more biologically sensible. The proposed model was compared with several other state-of-the-art segmentation models through simulation benchmarking, which shows that our efficient implementation achieves comparable or better sensitivity and specificity in genomic segmentation. Yang Du, Eduard Murani, Siriluck Ponsuksili, and Klaus Wimmers Copyright © 2014 Yang Du et al. All rights reserved. ABC and IFC: Modules Detection Method for PPI Network Mon, 02 Jun 2014 06:16:30 +0000 Many clustering algorithms cannot effectively cluster protein-protein interaction (PPI) networks. This paper proposes a novel clustering model that combines the optimization mechanism of artificial bee colony (ABC) with a fuzzy membership matrix. The proposed ABC-IFC clustering model has two parts: searching for the optimal cluster centers using the ABC mechanism and forming clusters using the intuitionistic fuzzy clustering (IFC) method. First, the cluster centers are set randomly, and initial clustering results are obtained from the fuzzy membership matrix. The cluster centers are then updated through the different roles of the bees in the ABC algorithm, and the clustering result is obtained through the IFC method based on the newly optimized cluster centers. To illustrate its performance, the ABC-IFC method is compared with traditional fuzzy C-means clustering and the IFC method. Experimental results on the MIPS dataset show that the proposed ABC-IFC method not only improves on several commonly used evaluation criteria, such as precision, recall, and P value, but also produces better clustering results. Xiujuan Lei, Fang-Xiang Wu, Jianfang Tian, and Jie Zhao Copyright © 2014 Xiujuan Lei et al. All rights reserved.
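The ABC-IFC abstract forms initial clusters from a fuzzy membership matrix and benchmarks against traditional fuzzy C-means (FCM). As a minimal sketch, assuming Euclidean distance and the common fuzzifier m = 2, the standard FCM membership matrix for a given set of cluster centers can be computed as follows. This is textbook FCM, not the authors' ABC or IFC code, and the function name `fcm_membership` is hypothetical.

```python
import numpy as np


def fcm_membership(points: np.ndarray, centers: np.ndarray,
                   m: float = 2.0, eps: float = 1e-12) -> np.ndarray:
    """Standard fuzzy C-means membership: u[i, j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).

    points: (n, d) array of feature vectors; centers: (c, d) array of cluster centers.
    Returns an (n, c) matrix whose rows sum to 1.
    """
    # Euclidean distance from every point to every center, floored to avoid division by zero
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, eps)
    exponent = 2.0 / (m - 1.0)
    # ratio[i, j, k] = d_ij / d_ik; summing over k gives the FCM denominator
    ratio = dist[:, :, None] / dist[:, None, :]
    return 1.0 / np.sum(ratio ** exponent, axis=2)
```

Taking the argmax of each row gives a hard cluster assignment. In ABC-IFC, the bee-colony search supplies the candidate centers, and the intuitionistic fuzzy step (not shown) replaces this plain FCM membership.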
iCTX-Type: A Sequence-Based Predictor for Identifying the Types of Conotoxins in Targeting Ion Channels Sun, 01 Jun 2014 06:50:38 +0000 Conotoxins are small disulfide-rich neurotoxic peptides that can bind to ion channels with very high specificity and modulate their activities. Over the last few decades, conotoxins have been drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases. According to their functions and targets, conotoxins are generally categorized into three types: potassium-channel type, sodium-channel type, and calcium-channel type. With the avalanche of peptide sequences generated in the postgenomic age, it is urgent and challenging to develop an automated method for rapidly and accurately identifying the types of conotoxins from their sequence information alone. To address this challenge, a new predictor, called iCTX-Type, was developed by incorporating the dipeptide occurrence frequencies of a conotoxin sequence into a 400-D (dimensional) general pseudo amino acid composition, followed by a feature optimization procedure that reduces the sample representation from a 400-D to a 50-D vector. The overall success rate achieved by iCTX-Type in rigorous cross-validation was over 91%, outperforming its counterpart (RBF network). In addition, iCTX-Type is so far the only predictor in this area with an available web server, and hence is particularly useful for experimental scientists who want to obtain their desired results without following the complicated mathematics involved. Hui Ding, En-Ze Deng, Lu-Feng Yuan, Li Liu, Hao Lin, Wei Chen, and Kuo-Chen Chou Copyright © 2014 Hui Ding et al. All rights reserved.
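The iCTX-Type abstract builds a 400-D vector from dipeptide occurrence frequencies (20 × 20 ordered amino acid pairs). A minimal sketch of that dipeptide composition follows, assuming overlapping pairs normalized by the total number of valid pairs. The normalization, the skipping of nonstandard residues, and the name `dipeptide_composition` are assumptions, and the 400-D to 50-D feature optimization step is not shown.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides (AA, AC, ..., YY) fix the feature order
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq: str) -> list[float]:
    """Return the 400-D vector of normalized dipeptide occurrence frequencies.

    Overlapping dipeptides are counted; pairs containing nonstandard residues are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # No valid dipeptide (e.g., a single residue): return an all-zero vector
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]
```

For example, `"CCAC"` contains the overlapping dipeptides CC, CA, and AC, so each of those three positions receives 1/3 and the other 397 are zero.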
Systems Biology in the Context of Big Data and Networks Tue, 27 May 2014 12:27:40 +0000 Science is going through two rapid changes: one is the increasing capability of computers and software tools, from terabytes to petabytes and beyond, and the other is the advancement of high-throughput molecular biology, which produces vast amounts of data related to genomes, transcriptomes, proteomes, metabolomes, interactomes, and so on. Biology has become a data-intensive science, and as a consequence, biology and computer science have become complementary to each other, bridged by other branches of science such as statistics, mathematics, physics, and chemistry. This combination of diverse knowledge has led to the advent of big-data biology, network biology, and other new branches of biology. Network biology, for instance, facilitates the system-level understanding of the cell or cellular components and subprocesses, and it is often also referred to as systems biology. The purpose of this field is to understand organisms or cells as a whole at various levels of functions and mechanisms. Systems biology now faces the challenges of analyzing big molecular biological data and huge biological networks. This review gives an overview of progress in big-data biology and data handling, and also introduces some applications of networks and multivariate analysis in systems biology. Md. Altaf-Ul-Amin, Farit Mochamad Afendi, Samuel Kuria Kiboi, and Shigehiko Kanaya Copyright © 2014 Md. Altaf-Ul-Amin et al. All rights reserved. MultiRankSeq: Multiperspective Approach for RNAseq Differential Expression Analysis and Quality Control Tue, 27 May 2014 12:25:42 +0000 Background. After a decade of microarray technology dominating the field of high-throughput gene expression profiling, the introduction of RNAseq has revolutionized gene expression research. While RNAseq provides more abundant information than microarray, its analysis has proved considerably more complicated.
To date, no consensus has been reached on the best approach for RNAseq-based differential expression analysis. Not surprisingly, different studies have drawn different conclusions about the best approach to identify differentially expressed genes, based on their own criteria and the scenarios they considered. Furthermore, the lack of effective quality control may lead to misleading interpretation of results and erroneous conclusions. To address these problems, we propose MultiRankSeq, a simple yet safe and practical rank-sum approach for RNAseq-based differential gene expression analysis. MultiRankSeq first performs quality control assessment. For data meeting the quality control criteria, MultiRankSeq compares the study groups using several of the most commonly applied analytical methods and combines their results to generate a new rank-sum interpretation. MultiRankSeq provides a unique approach to RNAseq differential expression analysis. MultiRankSeq is written in R and is easy to apply. Detailed graphical and tabular analysis reports can be generated with a single command line. Yan Guo, Shilin Zhao, Fei Ye, Quanhu Sheng, and Yu Shyr Copyright © 2014 Yan Guo et al. All rights reserved. Gleditsia sinensis: Transcriptome Sequencing, Construction, and Application of Its Protein-Protein Interaction Network Tue, 27 May 2014 09:02:45 +0000 Gleditsia sinensis is a species of deciduous tree in the subfamily Caesalpinioideae, native to China, and is of great economic importance. However, despite its economic value, gene sequence information for this species is severely lacking. In the present study, transcriptome sequencing of G. sinensis yielded approximately 75.5 million clean reads, which were assembled into 142155 unique transcripts, generating 58583 unigenes. The average length of the unigenes was 900 bp, with an N50 of 549 bp.
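The Gleditsia abstract reports a mean unigene length of 900 bp alongside an N50 of 549 bp. Using the standard definition of N50 (the length L such that sequences of length L or longer cover at least half of the total assembled length), a minimal sketch follows; the function name and toy lengths are hypothetical and do not come from the study.

```python
def n50(lengths: list[int]) -> int:
    """N50: the largest length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths; keeps the return type explicit
```

With the toy input `[100] + [2] * 60`, the single long sequence covers less than half of the 220 total bases, so N50 is 2 even though the mean length is about 3.6. A skewed length distribution can likewise place N50 below the mean length.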
The obtained unigene sequences were then compared against four protein databases: NCBI nonredundant protein (NRDB), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Cluster of Orthologous Groups (COG). Using BLAST, 31385 unigenes (53.6%) were functionally annotated. Additionally, sequence homology between the identified unigenes and genes of known species in a protein-protein interaction (PPI) network enabled construction of a G. sinensis PPI network. Based on this network, new stress-resistance genes (including cold, drought, and high-salinity resistance) were predicted. The present study is the first investigation of genome-wide gene expression in G. sinensis, and its results provide a basis for future functional genomic studies of this species. Liucun Zhu, Ying Zhang, Wenna Guo, and Qiang Wang Copyright © 2014 Liucun Zhu et al. All rights reserved. An Association Study between Genetic Polymorphism in the Interleukin-6 Receptor Gene and Coronary Heart Disease Mon, 26 May 2014 11:25:43 +0000 The goal of our study was to test the association of the IL6R rs7529229 polymorphism with coronary heart disease (CHD) through a case-control study in a Han Chinese population and a meta-analysis. Our results showed a lack of association between the IL6R rs7529229 polymorphism and CHD at both the genotype and allele levels in Han Chinese (). However, a meta-analysis of 11678 cases and 12861 controls showed that the rs7529229-C allele was significantly associated with a decreased risk of CHD, especially in Europeans (, odds ratio = 0.93, 95% confidence interval = 0.89–0.96). Since there are significant differences among populations, further studies are warranted to test the contribution of rs7529229 to CHD in other ethnic populations. Jiangqing Zhou, Xiaoliang Chen, Huadan Ye, Ping Peng, Yanna Ba, Xi Yang, Xiaoyan Huang, Yae Lu, Xin Jiang, Jiangfang Lian, and Shiwei Duan Copyright © 2014 Jiangqing Zhou et al. All rights reserved.
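The IL6R abstract reports a pooled odds ratio of 0.93 with a 95% confidence interval of 0.89–0.96. As a sketch only, assuming Woolf's standard error for the log odds ratio and an inverse-variance fixed-effect model (the abstract does not state which pooling model was used, and random-effects models are also common), per-study allele tables can be combined as below. The function names and counts are hypothetical, and zero cells would need a continuity correction that is not handled here.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio and its Woolf standard error for one 2x2 allele table.

    a = test-allele count in cases,    b = other-allele count in cases,
    c = test-allele count in controls, d = other-allele count in controls.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def pooled_fixed_effect(tables: list[tuple[int, int, int, int]]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect pooled OR with its 95% CI as (OR, lower, upper)."""
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        log_or, se = log_odds_ratio(a, b, c, d)
        w = 1 / se ** 2  # weight = inverse variance of the study's log OR
        num += w * log_or
        den += w
    pooled = num / den
    se_pooled = math.sqrt(1 / den)
    z = 1.959963984540054  # two-sided 95% standard normal quantile
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))
```

Adding studies increases the total weight, which shrinks the pooled standard error and narrows the interval; this is why a large meta-analysis can detect a modest effect such as an OR near 0.93.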
enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning Mon, 26 May 2014 11:09:26 +0000 DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. To date, many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with performance improvements of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in ACC and 0.02–0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public. Ruifeng Xu, Jiyun Zhou, Bin Liu, Lin Yao, Yulan He, Quan Zou, and Xiaolong Wang Copyright © 2014 Ruifeng Xu et al. All rights reserved. A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats Sun, 25 May 2014 08:46:24 +0000 Background. Next-generation sequencing platforms generate shorter reads, with deeper coverage and higher throughput, than Sanger sequencing. These short reads may be assembled de novo before specific genome analyses.
To date, current assemblers perform very poorly at assembling repeats. Results. To address this problem, we propose a new genome assembly algorithm, named SWA, which has four properties: (1) it assembles both repeats and nonrepeats; (2) it adopts a new overlapping extension strategy to extend each seed; (3) it uses a sliding window to filter out sequencing bias; and (4) it provides a compensation mechanism for low-coverage datasets. SWA was evaluated and validated on both simulated and real sequencing datasets. Its accuracy in assembling repeats and estimating their copy numbers reaches 99% and 100%, respectively. Finally, extensive comparisons with eight other leading assemblers show that SWA outperforms them in the completeness and correctness of assembling repeats and nonrepeats. Conclusions. This paper proposes a new de novo genome assembly method for resolving complex repeats. SWA can not only detect where repeats and nonrepeats are located but also assemble them completely from NGS data; this ability, especially for repeats, is its advantage over other assemblers. Shuaibin Lian, Qingyan Li, Zhiming Dai, Qian Xiang, and Xianhua Dai Copyright © 2014 Shuaibin Lian et al. All rights reserved. iMethyl-PseAAC: Identification of Protein Methylation Sites via a Pseudo Amino Acid Composition Approach Thu, 22 May 2014 11:45:29 +0000 During biosynthesis, before becoming native proteins, the polypeptide chains created by ribosomal translation of mRNA undergo a series of “product-forming” steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is Arg- or Lys-methylation, which occurs on arginine or lysine, respectively. Given a protein, which of its Arg (or Lys) sites can be methylated, and which cannot?
Answering this question is the first important step toward an in-depth understanding of the methylation mechanism and toward drug development. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we propose a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample is represented by a 346-dimensional vector formed by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. Rigorous jackknife and independent dataset tests showed that iMethyl-PseAAC was superior to all existing predictors in this area. Wang-Ren Qiu, Xuan Xiao, Wei-Zhong Lin, and Kuo-Chen Chou Copyright © 2014 Wang-Ren Qiu et al. All rights reserved. Modelling Arterial Pressure Waveforms Using Gaussian Functions and Two-Stage Particle Swarm Optimizer Tue, 20 May 2014 11:09:31 +0000 Changes in arterial pressure waveform characteristics have been accepted as risk indicators of cardiovascular diseases. Waveform modelling using Gaussian functions has been used to decompose arterial pressure pulses into different numbers of subwaves and hence quantify waveform characteristics. However, the fitting accuracy and computational efficiency of current modelling approaches need to be improved. This study aimed to develop a novel two-stage particle swarm optimizer (TSPSO) to determine the optimal parameters of Gaussian functions. The evaluation was performed on carotid and radial artery pressure waveforms (CAPW and RAPW), which were recorded simultaneously from twenty normal volunteers. The fitting accuracy and calculation efficiency of our TSPSO were compared with those of three published optimization methods: the Nelder-Mead method, the modified PSO (MPSO), and the dynamic multiswarm particle swarm optimizer (DMS-PSO).
The results showed that TSPSO achieved the best fitting accuracy, with a mean absolute error (MAE) of 1.1% for CAPW and 1.0% for RAPW, compared with 4.2% and 4.1% for Nelder-Mead, 2.0% and 1.9% for MPSO, and 1.2% and 1.1% for DMS-PSO. In addition, to achieve a target MAE of 2.0%, TSPSO required only 1.5 s of computation time, which was only 20% and 30% of that required by MPSO and DMS-PSO, respectively. Chengyu Liu, Tao Zhuang, Lina Zhao, Faliang Chang, Changchun Liu, Shoushui Wei, Qiqiang Li, and Dingchang Zheng Copyright © 2014 Chengyu Liu et al. All rights reserved. AmalgamScope: Merging Annotations Data across the Human Genome Tue, 20 May 2014 09:25:06 +0000 Recent years have seen enormous advances in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and differing data scope make it challenging to jointly visualize and integrate diverse data types. We present AmalgamScope, a new interactive software tool focused on assisting scientists with the annotation of the human genome, particularly the integration of annotation files from multiple data types using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to the integration of data based on annotation information and the visualization of merged files within chromosomal regions or across the whole genome. Additionally, users can define custom transcriptome library files for any species and use the tool's remote-server file-exchange options. Georgia Tsiliki, Konstantinos Tsaramirsis, and Sophia Kossida Copyright © 2014 Georgia Tsiliki et al. All rights reserved.
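The TSPSO abstract fits arterial pulses with a sum of Gaussian subwaves and reports fitting error as a percentage MAE. Below is a minimal sketch of the model such an optimizer would evaluate, assuming the common parameterization a·exp(−((t − b)/c)²) per subwave and assuming MAE is normalized by the measured pulse's peak-to-peak amplitude (the abstract does not state the normalization). It does not implement TSPSO or any other optimizer, and the function names are hypothetical.

```python
import numpy as np


def gaussian_sum(t: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Sum of Gaussian subwaves: sum_i a_i * exp(-((t - b_i) / c_i) ** 2).

    params is a flat array [a_1, b_1, c_1, a_2, b_2, c_2, ...]
    (amplitude, center, width for each subwave).
    """
    a, b, c = params.reshape(-1, 3).T
    # Evaluate every subwave at every time point, then add the subwaves together
    waves = a[:, None] * np.exp(-((t[None, :] - b[:, None]) / c[:, None]) ** 2)
    return np.sum(waves, axis=0)


def mae_percent(measured: np.ndarray, fitted: np.ndarray) -> float:
    """MAE as a percentage of the measured pulse's peak-to-peak amplitude (assumed normalization)."""
    amplitude = measured.max() - measured.min()
    return 100.0 * float(np.mean(np.abs(measured - fitted))) / amplitude
```

An optimizer such as a PSO variant would search the flat parameter vector to minimize `mae_percent(measured, gaussian_sum(t, params))`; the number of subwaves is set by the length of `params`.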
Bioinformatic Prediction of WSSV-Host Protein-Protein Interaction Mon, 19 May 2014 13:08:27 +0000 White spot syndrome virus (WSSV) is one of the most dangerous pathogens in shrimp aquaculture. However, the molecular mechanism by which WSSV interacts with shrimp is still unclear. In the present study, bioinformatic approaches were used to predict interactions between proteins from WSSV and shrimp. The genome data of WSSV (NC_003225.1) and the constructed transcriptome data of F. chinensis were used to screen for potentially interacting proteins by searching protein interaction databases, including STRING, Reactome, and DIP. Forty-four pairs of proteins were predicted to interact between WSSV and the shrimp. Gene ontology analysis revealed that 6 pairs of these interacting proteins were classified into the “extracellular region” or “receptor complex” GO terms. KEGG pathway analysis showed that they were involved in the “ECM-receptor interaction pathway.” Among these 6 pairs, an envelope protein called “collagen-like protein” (WSSV-CLP), encoded by the early virus gene “wsv001” in WSSV, interacted with 6 deduced proteins from the shrimp: three integrin alpha (ITGA), two integrin beta (ITGB), and one syndecan (SDC). Sequence analysis of WSSV-CLP, ITGA, ITGB, and SDC revealed that they possess sequence features for protein-protein interactions. This study might provide new insights into the mechanisms of interaction between WSSV and shrimp. Zheng Sun, Shihao Li, Fuhua Li, and Jianhai Xiang Copyright © 2014 Zheng Sun et al. All rights reserved. A Priori Knowledge and Probability Density Based Segmentation Method for Medical CT Image Sequences Mon, 19 May 2014 06:00:45 +0000 This paper briefly introduces a novel segmentation strategy for CT image sequences. As the first step of our strategy, we extract a priori intensity statistics from the object region manually segmented by radiologists.
Then we define a search scope for the object and calculate the probability density of each pixel in the scope using a voting mechanism. Next, we generate an optimal initial level set contour based on the a priori shape of the object in the previous slice. Finally, a modified distance regularized level set method uses boundary features and probability density to determine the final object. The main contributions of this paper are as follows: a priori knowledge is effectively used to guide the determination of objects, and a modified distance regularized level set method can accurately extract the actual contour of the object in a short time. The proposed method is compared with seven other state-of-the-art medical image segmentation methods on abdominal CT image sequence datasets. The evaluation results demonstrate that our method performs better and has potential for segmentation of CT image sequences. Huiyan Jiang, Hanqing Tan, and Benqiang Yang Copyright © 2014 Huiyan Jiang et al. All rights reserved. High-Dimensional Additive Hazards Regression for Oral Squamous Cell Carcinoma Using Microarray Data: A Comparative Study Mon, 19 May 2014 05:42:13 +0000 Microarray technology results in high-dimensional, low-sample-size data sets. Therefore, fitting sparse models is essential, because only a small number of influential genes can be reliably identified. A number of variable selection approaches have been proposed for censored high-dimensional time-to-event data based on the Cox proportional hazards model. The present study applied three sparse variable selection techniques, Lasso, smoothly clipped absolute deviation, and smooth integration of counting and absolute deviation, to gene expression survival time data using the additive risk model, which is adopted when the absolute effects of multiple predictors on the hazard function are of interest. The performance of these techniques was evaluated by time-dependent ROC curves and bootstrap .632+ prediction error curves.
The genes selected by all methods were highly significant. The Lasso showed the maximum median area under the ROC curve over time (0.95), and smoothly clipped absolute deviation showed the lowest prediction error (0.105). The genes selected by all methods improved the prediction of a purely clinical model, indicating the valuable information contained in the microarray features. We therefore concluded that these approaches can satisfactorily predict survival based on selected gene expression measurements. Omid Hamidi, Lily Tapak, Aarefeh Jafarzadeh Kohneloo, and Majid Sadeghifar Copyright © 2014 Omid Hamidi et al. All rights reserved. Identification of Influenza A/H7N9 Virus Infection-Related Human Genes Based on Shortest Paths in a Virus-Human Protein Interaction Network Sun, 18 May 2014 13:12:13 +0000 The recently emerged Influenza A/H7N9 virus has been reported to infect humans and cause mortality. However, viral and host factors associated with the infection are poorly understood. The “guilt by association” rule suggests that interacting proteins share the same or similar functions and hence may be involved in the same pathway. In this study, we developed a computational method based on this rule to identify Influenza A/H7N9 virus infection-related human genes from the shortest paths in a virus-human protein interaction network. We identified the 20 most significant human genes, which could be potential infection-related genes, providing guidelines for further experimental validation. Analysis of the 20 genes showed that they were enriched in protein binding, saccharide or polysaccharide metabolism related pathways, and oxidative phosphorylation pathways. We also compared the results with those for human rhinovirus (HRV) and respiratory syncytial virus (RSV) obtained by the same method. The comparison indicated that saccharide or polysaccharide metabolism related pathways might be especially associated with H7N9 infection.
These results could shed light on the mechanism of virus infection, providing a basis for future experimental biology studies and for the development of effective clinical therapies for H7N9. Ning Zhang, Min Jiang, Tao Huang, and Yu-Dong Cai Copyright © 2014 Ning Zhang et al. All rights reserved. Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks Sun, 18 May 2014 06:33:00 +0000 Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in the postgenomic era. Many computational methods have been proposed for identifying protein complexes. To date, the existing computational methods have mostly been applied to static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and more challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to the core-attachment assumption, proteins that are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always-active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on the topological character of “closeness” and on dynamic information. The protein complexes produced by DPC therefore contain two parts: a static core expressed throughout the molecular cycle and short-lived dynamic attachments. DPC was applied to Saccharomyces cerevisiae data, and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment in terms of matching with known complexes and hF-measures. Min Li, Weijie Chen, Jianxin Wang, Fang-Xiang Wu, and Yi Pan Copyright © 2014 Min Li et al. All rights reserved.
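The DPC abstract selects core proteins that are always active across the molecular cycle and then looks for dense subgraphs among them. Below is a minimal sketch of those two ingredients, assuming a single fixed expression threshold for activity (a hypothetical stand-in, since the abstract does not give DPC's activity criterion) and the usual edge density 2|E|/(n(n − 1)). The dense-subgraph search and the closeness-based attachment step are not shown, and the function names are hypothetical.

```python
from itertools import combinations


def always_active(expression: dict[str, list[float]], threshold: float) -> set[str]:
    """Proteins whose expression meets the threshold at every time point.

    The fixed global threshold is a hypothetical stand-in for DPC's activity criterion.
    """
    return {p for p, profile in expression.items() if all(x >= threshold for x in profile)}


def induced_density(nodes: set[str], edges: set[frozenset[str]]) -> float:
    """Density 2|E| / (n(n - 1)) of the subgraph induced by `nodes` in an undirected PPI network."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count PPI edges whose two endpoints both lie inside the node set
    inside = sum(1 for u, v in combinations(sorted(nodes), 2) if frozenset((u, v)) in edges)
    return 2.0 * inside / (n * (n - 1))
```

For instance, if A, B, and D stay above the threshold at every time point and the network contains the edges A–B and B–D among them, their induced density is 2/3; a dense-subgraph search would favor such high-density sets of always-active proteins as candidate cores.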
A Network Biology Approach to Discover the Molecular Biomarker Associated with Hepatocellular Carcinoma Wed, 14 May 2014 09:12:48 +0000 In recent years, high-throughput technologies such as microarray platforms have provided a new avenue for hepatocellular carcinoma (HCC) investigation. Traditionally, gene set enrichment analysis of survival-related genes has been used to reveal the underlying functional mechanisms. However, this approach usually produces too many candidate genes and cannot discover detailed signal transduction cascades, which greatly limits its clinical applications, such as biomarker development. In this study, we propose a network biology approach to discover novel biomarkers from multidimensional omics data. This approach effectively combines clinical survival data with topological characteristics of human protein interaction networks and patient expression profiling data. It can produce novel network-based biomarkers together with a biological understanding of the molecular mechanism. We analyzed eighty HCC expression profiling arrays and identified extracellular matrix and programmed cell death as the main themes related to HCC progression. Compared with traditional enrichment analysis, this approach can provide concrete and testable hypotheses on functional mechanisms. Furthermore, the identified subnetworks can potentially be used as suitable targets for therapeutic intervention in HCC. Liwei Zhuang, Yun Wu, Jiwu Han, Xiaohua Ling, Liguo Wang, Chengyan Zhu, and Yili Fu Copyright © 2014 Liwei Zhuang et al. All rights reserved. Breast Cancer Prognosis Risk Estimation Using Integrated Gene Expression and Clinical Data Wed, 14 May 2014 00:00:00 +0000 Background. Novel prognostic markers are needed so that newly diagnosed breast cancer patients do not undergo unnecessary therapy.
Various studies based on microarray gene expression datasets have generated gene signatures to predict prognosis outcomes, while ignoring the large amount of information contained in established clinical markers. Moreover, small sample sizes in individual microarray datasets remain a bottleneck, yielding gene signatures with limited predictive power. The aim of this study is to achieve high classification accuracy for the good prognosis group and then to achieve high classification accuracy for the poor prognosis group. Methods. We propose a novel algorithm called the IPRE (integrated prognosis risk estimation) algorithm. We used integrated microarray datasets from multiple studies to increase the sample size (∼2,700 samples). The IPRE algorithm consists of a virtual chromosome for extracting a 79-gene prognostic signature and a multivariate logistic regression model that incorporates clinical data along with expression data to generate a risk score formula that accurately categorizes breast cancer patients into two prognosis groups. Results. The evaluation on two testing datasets showed that the IPRE algorithm achieved high classification accuracies of 82% and 87%, far higher than those of any existing algorithm. Ashish Saini, Jingyu Hou, and Wanlei Zhou Copyright © 2014 Ashish Saini et al. All rights reserved. Local Alignment Tool Based on Hadoop Framework and GPU Architecture Wed, 14 May 2014 00:00:00 +0000 With the rapid growth of next-generation sequencing technologies, such as Slex, more and more data have been generated and published. Computational performance is an important issue in analyzing such huge data. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP, an important tool for biologists to compare protein sequences, has been implemented on GPU architectures. However, it is hard to rely on a single GPU to deal with big biological data.
Therefore, we implement a distributed BLASTP by combining Hadoop and multiple GPUs. The experimental results show that the proposed method can improve on the performance of BLASTP on a single GPU and can also achieve high availability and fault tolerance. Che-Lun Hung and Guan-Jie Hua Copyright © 2014 Che-Lun Hung and Guan-Jie Hua. All rights reserved. Meta-Analysis of Low Density Lipoprotein Receptor (LDLR) rs2228671 Polymorphism and Coronary Heart Disease Mon, 12 May 2014 14:08:07 +0000 Low density lipoprotein receptor (LDLR) can regulate cholesterol metabolism by removing excess low density lipoprotein cholesterol (LDL-C) from the blood. Since cholesterol metabolism is often disrupted in coronary heart disease (CHD), LDLR has been intensively studied as a candidate gene for CHD. The goal of our study was to evaluate the overall contribution of the LDLR rs2228671 polymorphism to the risk of CHD by combining genotyping data from multiple case-control studies. Our meta-analysis included 8 case-control studies with 7588 cases and 9711 controls to test the association between the LDLR rs2228671 polymorphism and CHD. In addition, we performed a case-control study of the LDLR rs2228671 polymorphism and CHD risk in a Chinese population. Our meta-analysis showed that the rs2228671-T allele was significantly associated with a reduced risk of CHD (, odds ratio (OR) = 0.83, and 95% confidence interval (95% CI) = 0.75–0.92). However, the rs2228671-T allele was rare (frequency of 1%) and was not associated with CHD in Han Chinese (), suggesting an ethnic difference for the LDLR rs2228671 polymorphism. The meta-analysis has established rs2228671 as a protective factor against CHD in Europeans. The lack of association in Chinese reflects an ethnic difference in this genetic variant between Chinese and European populations. Huadan Ye, Qianlei Zhao, Yi Huang, Lingyan Wang, Haibo Liu, Chunming Wang, Dongjun Dai, Leiting Xu, Meng Ye, and Shiwei Duan Copyright © 2014 Huadan Ye et al. All rights reserved.
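The LDLR abstract reports the rs2228671-T allele frequency (1%) and allele-level associations. For a biallelic SNP in diploids, allele counts and the allelic odds ratio follow directly from genotype counts, as in this minimal sketch. The genotype counts in the test are hypothetical, not study data, and the function names are illustrative.

```python
def allele_counts(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[int, int]:
    """Return (alt_count, ref_count) for a biallelic SNP in diploid samples."""
    alt = 2 * n_hom_alt + n_het  # each heterozygote carries one alt allele
    ref = 2 * n_hom_ref + n_het
    return alt, ref


def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternate-allele frequency: alt alleles over all 2N alleles."""
    alt, ref = allele_counts(n_hom_ref, n_het, n_hom_alt)
    return alt / (alt + ref)


def allelic_odds_ratio(cases: tuple[int, int, int], controls: tuple[int, int, int]) -> float:
    """Allelic OR (alt vs. ref, cases vs. controls); genotype tuples are (hom_ref, het, hom_alt)."""
    a, b = allele_counts(*cases)
    c, d = allele_counts(*controls)
    return (a * d) / (b * c)
```

With an allele frequency near 1%, the alternate-allele cells of the allelic table are small, which inflates the odds ratio's standard error and makes a single modest-sized case-control study less likely to detect an association.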
Integration of Residue Attributes for Sequence Diversity Characterization of Terpenoid Enzymes Sun, 11 May 2014 13:35:58 +0000 Progress in the “omics” fields such as genomics, transcriptomics, proteomics, and metabolomics has engendered a need for innovative analytical techniques to derive meaningful information from ever-increasing molecular data. KNApSAcK motorcycle DB is a popular database of enzymes related to secondary metabolic pathways in plants. One of the challenges in analyzing protein sequence data in such repositories is the standard notation of sequences as strings of alphabetical characters, which lacks a natural underlying metric and is therefore poorly amenable to computation. To address this, we applied a novel integration of selected biochemical and physical attributes of amino acids, derived from the amino acid index and quantified on a numerical scale, to examine the diversity of peptide sequences of terpenoid synthases accumulated in KNApSAcK motorcycle DB. We first generated a reduced amino acid index table, that is, a set of biochemical and physical properties obtained by random forest feature selection of important indices from the amino acid index. Principal component analysis was then applied to characterize the enzymes involved in terpenoid synthesis. Incorporating residue attributes into the analyses increased the variance explained. Nelson Kibinge, Shun Ikeda, Naoaki Ono, Md. Altaf-Ul-Amin, and Shigehiko Kanaya Copyright © 2014 Nelson Kibinge et al. All rights reserved.

Topography Prediction of Helical Transmembrane Proteins by a New Modification of the Sliding Window Method Sun, 11 May 2014 00:00:00 +0000 The function of a protein is specified by its three-dimensional structure, which is usually obtained by X-ray crystallography. Because membrane proteins are difficult to handle experimentally, structures have to date been determined for only a very limited fraction of membrane proteins (<4%).
Nevertheless, investigating the structure and functions of membrane proteins is important for medicine and pharmacology and is therefore of significant interest. Computer modeling methods based on the primary protein structure, that is, the symbolic amino acid sequence, have become a viable alternative to X-ray crystallography for investigating the structure of membrane proteins. Here we present the results of a study of 35 transmembrane proteins, mainly GPCRs, using a novel method of cascade averaging of the hydrophobicity function within a sliding window. The proposed method revealed 139 of the 140 transmembrane domains (99.3%) identified by other methods. In addition, 236 of the 280 transmembrane domain boundary positions (84%) were predicted correctly, that is, with a deviation from the predictions of other methods that does not exceed the detection error of this method. Maria N. Simakova and Nikolai N. Simakov Copyright © 2014 Maria N. Simakova and Nikolai N. Simakov. All rights reserved.

Network of microRNAs-mRNAs Interactions in Pancreatic Cancer Wed, 07 May 2014 13:18:55 +0000 Background. MicroRNAs are small RNA molecules that regulate the expression of specific genes through interaction with mRNA targets and are extensively involved in human cancer. This study was conducted to construct the network of miRNA-mRNA interactions in pancreatic cancer, the fourth leading cause of cancer death. Methods. Fifty-six miRNAs that were exclusively expressed and 1176 genes that were downregulated or silenced in pancreatic cancer were extracted from previous studies. The miRNA-mRNA interaction data and the related networks were analyzed using the MAGIA tool and Cytoscape 3. Functional annotations of the candidate genes in pancreatic cancer were identified with the DAVID annotation tool. Results.
The resulting network consists of 217 mRNA nodes, 15 miRNA nodes, and 241 edges representing 241 regulatory interactions between the 15 miRNAs and 217 target genes. miR-24 was the most powerful miRNA, regulating a series of important genes. ACVR2B, GFRA1, and MTHFR were significant downregulated target genes. Conclusion. Although the previously collected data are a treasure trove, no study had simultaneously analyzed miRNA-mRNA interactions. The network of miRNA-mRNA interactions will help corroborate experimental observations and could be used to refine miRNA target predictions for developing new therapeutic approaches. Elnaz Naderi, Mehdi Mostafaei, Akram Pourshams, and Ashraf Mohamadkhani Copyright © 2014 Elnaz Naderi et al. All rights reserved.

Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway Wed, 07 May 2014 12:20:42 +0000 Background. A microRNA (miRNA) is a short, endogenous RNA molecule that regulates gene expression posttranscriptionally. MiRNAs are important factors in the tumorigenesis of colorectal cancer (CRC) and potential biomarkers for the diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in the CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the miRNAs significantly associated with a set of candidate target genes frequently involved in the colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue.
At the whole-model level, the BCL2 and SMAD4 models were not significant, whereas the MSH6 model was significant. The significant associations differed between the normal and the CRC tissues. Conclusion. Our results lay a solid foundation for exploring novel CRC mechanisms and for identifying the roles of miRNAs as oncomirs or tumor suppressor miRs in CRC. Fengfeng Wang, S. C. Cesar Wong, Lawrence W. C. Chan, William C. S. Cho, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2014 Fengfeng Wang et al. All rights reserved.

Double-Bottom Chaotic Map Particle Swarm Optimization Based on Chi-Square Test to Determine Gene-Gene Interactions Wed, 07 May 2014 11:02:38 +0000 Gene-gene interaction studies investigate the association between the single nucleotide polymorphisms (SNPs) of genes and disease susceptibility. Statistical methods are widely used to search for good models of gene-gene interaction in disease analysis, and previously determined models have successfully explained the effects of SNPs on diseases. However, the huge number of potential combinations of SNP genotypes limits the use of statistical methods for analysing high-order interactions, and finding a suitable high-order model of gene-gene interaction remains a challenge. In this study, an improved particle swarm optimization with double-bottom chaotic maps (DBM-PSO) was applied to assist statistical methods in analysing variations associated with disease susceptibility. A large data set was simulated using the published genotype frequencies of 26 SNPs among eight breast cancer-related genes. Results showed that the proposed DBM-PSO successfully determined two- to six-order models of gene-gene interaction associated with breast cancer risk (odds ratio > 1.0; value ). The analysis results showed that DBM-PSO can identify good models and provides higher chi-square values than conventional PSO.
This study indicates that DBM-PSO is a robust and precise algorithm for determining gene-gene interaction models for breast cancer. Cheng-Hong Yang, Yu-Da Lin, Li-Yeh Chuang, and Hsueh-Wei Chang Copyright © 2014 Cheng-Hong Yang et al. All rights reserved.

Pathway-Driven Discovery of Rare Mutational Impact on Cancer Sun, 04 May 2014 12:48:09 +0000 Identifying driver mutations is important for understanding disease mechanisms and for future custom-tailored therapeutic decisions. Functional analysis of mutational impact usually focuses on the expression level of the mutated gene itself. However, complex regulatory networks may cause differential gene expression among the functional neighbors of the mutated gene. We propose a new approach for discovering rare mutations that have a real impact in the context of a pathway: our method iteratively combines rare mutations until no more mutations can be added, under the condition that the combined mutational event can statistically discriminate pathway-level mRNA expression between groups with and without the mutational events. Breast cancer patients with both somatic mutation and mRNA expression data were analyzed with our approach. The approach sensitively captured mutations that change pathway-level mRNA expression, concurrently discovering important mutations previously reported in breast cancer, such as those in TP53, PIK3CA, and RB1. In addition, out of 15,819 genes considered in breast cancer, our approach identified mutational events in 32 genes showing pathway-level mRNA expression differences. TaeJin Ahn and Taesung Park Copyright © 2014 TaeJin Ahn and Taesung Park. All rights reserved.

Mining Seasonal Marine Microbial Pattern with Greedy Heuristic Clustering and Symmetrical Nonnegative Matrix Factorization Sun, 27 Apr 2014 09:56:31 +0000 With the development of high-throughput, low-cost sequencing technology, a large number of marine microbial sequences have been generated.
The association patterns between marine microbial species and environmental factors are hidden in this large number of sequences. Mining these association patterns is beneficial for exploiting marine resources. However, very few marine microbial association patterns have been well investigated. The present study reports the development of a novel method called HC-sNMF to detect marine microbial association patterns. The results show that the four seasonal marine microbial association networks have the characteristics of complex networks, that the same environmental factor influences different species in the four seasons, and that, in the communities detected in all four seasons, the correlations among OTUs (taxa) are stronger than those between OTUs and environmental factors. Fei Liu, Shao-Wu Zhang, Ze-Gang Wei, Wei Chen, and Chen Zhou Copyright © 2014 Fei Liu et al. All rights reserved.

OWL Reasoning Framework over Big Biological Knowledge Network Sun, 27 Apr 2014 00:00:00 +0000 Recently, huge amounts of data have been generated in the domain of biology. Because they embed domain knowledge from different disciplines, the isolated biological resources are implicitly connected and together form a big network of versatile biological knowledge. Faced with such massive, disparate, and interlinked biological data, providing an efficient way to model, integrate, and analyze the big biological network has become a challenge. In this paper, we present a general OWL (Web Ontology Language) reasoning framework to study the implicit relationships among biological entities. A comprehensive biological ontology spanning traditional Chinese medicine (TCM) and western medicine (WM) is used to create a conceptual model for the biological network. Then the corresponding biological data are integrated into a biological knowledge network as the data model.
Based on the conceptual model and the data model, a scalable OWL reasoning method is used to infer potential associations between biological entities from the biological network. Our experiments focus on association discovery between TCM and WM. The derived associations are useful for biologists promoting the development of novel drugs and TCM modernization. The experimental results show that the system achieves high efficiency, accuracy, scalability, and effectiveness. Huajun Chen, Xi Chen, Peiqin Gu, Zhaohui Wu, and Tong Yu Copyright © 2014 Huajun Chen et al. All rights reserved.

Novel Design Strategy for Checkpoint Kinase 2 Inhibitors Using Pharmacophore Modeling, Combinatorial Fusion, and Virtual Screening Wed, 23 Apr 2014 09:23:00 +0000 Checkpoint kinase 2 (Chk2) plays a major role in the DNA damage response, in particular in the response to DNA double-strand breaks and related lesions. In this study, we concentrate on Chk2 with the aim of finding potential inhibitors by means of pharmacophore hypotheses (PhModels), combinatorial fusion, and virtual screening techniques. Applying combinatorial fusion to PhModels and virtual screening is a novel strategy for drug design. We used combinatorial fusion to analyze the prediction results and obtained the best correlation coefficient on the testing set () of 0.816 by combining the and prediction results. The potential inhibitors were selected from the NCI database by screening according to the + prediction results and by molecular docking with the CDOCKER docking program. The selected compounds show high ligand-receptor interaction energies. Through these approaches, 23 potential inhibitors of Chk2 were retrieved for further study. Chun-Yuan Lin and Yen-Ling Wang Copyright © 2014 Chun-Yuan Lin and Yen-Ling Wang. All rights reserved.
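Combining the predictions of two models and checking the correlation with measured activity, as described in the Chk2 abstract above, can be sketched as follows. This is a minimal illustration assuming the simplest score combination used in combinatorial fusion analysis, an unweighted average of the models' predicted activities (assumed to be on the same scale), evaluated by the Pearson correlation coefficient. The abstract does not specify the authors' exact combination rule, rank combination is not shown, and all names and numbers here are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def score_combination(*predictions):
    """Average the per-compound scores of several models (score combination)."""
    return [sum(scores) / len(scores) for scores in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical test-set activities and two models' predictions.
    measured = [5.0, 6.0, 7.0, 8.0]
    model_a = [5.5, 5.8, 7.4, 7.9]
    model_b = [4.5, 6.2, 6.6, 8.1]
    combined = score_combination(model_a, model_b)
    for name, pred in [("A", model_a), ("B", model_b), ("A+B", combined)]:
        print(name, round(pearson_r(pred, measured), 3))
```

In this toy case the two models' errors cancel, so the combined score correlates better with the measured values than either model alone. Combinatorial fusion generally evaluates many such combinations of scoring systems and keeps the best-performing one.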
Syn-Lethality: An Integrative Knowledge Base of Synthetic Lethality towards Discovery of Selective Anticancer Therapies Tue, 22 Apr 2014 00:00:00 +0000 Synthetic lethality (SL), whereby mutations in two genes together kill a cell but mutation of either gene alone does not, is the basis of a novel strategy for anticancer therapies. Therefore, a cancer-specific mutation combined with a drug-induced mutation will selectively kill cancer cells if the two genes have an SL interaction. While numerous SL interactions have been identified in yeast, only a few are known in humans. There is a pressing need to systematically discover and understand SL interactions specific to human cancer. In this paper, we present Syn-Lethality, the first integrative knowledge base of SL dedicated to human cancer. It integrates experimentally discovered and verified human SL gene pairs into a network, annotated with gene function, pathway, and molecular mechanisms. It also includes yeast SL genes from high-throughput screens, mapped to their orthologous human genes. Such an integrative knowledge base, organized as a relational database with a user interface for searching and network visualization, will greatly expedite the discovery of novel anticancer drug targets based on synthetic lethality interactions. The database can be downloaded as a stand-alone Java application. Xue-juan Li, Shital K. Mishra, Min Wu, Fan Zhang, and Jie Zheng Copyright © 2014 Xue-juan Li et al. All rights reserved.

Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem Wed, 16 Apr 2014 15:52:01 +0000 The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in de novo genome assembly, one of the major bioinformatics problems. Due to the large size of the input data, fast and space-efficient solutions are crucial.
In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm that provides additional speedup on shared-memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on Sadakane's compressed index consumes significantly less space than solutions based on noncompressed data structures such as the suffix tree and the enhanced suffix array. The results also show that our parallel algorithm is efficient and scales well with an increasing number of processors. Maan Haj Rachid, Qutaibah Malluhi, and Mohamed Abouelhoda Copyright © 2014 Maan Haj Rachid et al. All rights reserved.

A Knowledge-Driven Approach to Extract Disease-Related Biomarkers from the Literature Wed, 16 Apr 2014 15:51:54 +0000 The biomedical literature represents a rich source of biomarker information. However, both the size of literature databases and their lack of standardization hamper the automatic exploitation of the information contained in these resources. Text mining approaches have proven useful for exploiting the information contained in scientific publications. Here, we show that a knowledge-driven text mining approach can exploit a large literature database to extract a dataset of biomarkers related to diseases covering all therapeutic areas. Our methodology takes advantage of the MeSH-term annotation of MEDLINE publications pertaining to biomarkers, narrowing the search to specific publications and thereby minimizing the false positive rate. It is based on a dictionary-based named entity recognition system and a relation extraction module.
The application of this methodology identified 131,012 disease-biomarker associations between 2,803 genes and 2,751 diseases, which represent a valuable knowledge base for those interested in disease-related biomarkers. Additionally, we present a bibliometric analysis of the journals reporting biomarker-related information during the last 40 years. À. Bravo, M. Cases, N. Queralt-Rosinach, F. Sanz, and L. I. Furlong Copyright © 2014 À. Bravo et al. All rights reserved.

Integrated Analysis of Gene Network in Childhood Leukemia from Microarray and Pathway Databases Tue, 15 Apr 2014 14:07:22 +0000 Glucocorticoids (GCs) have been used as therapeutic agents for children with acute lymphoblastic leukaemia (ALL) for over 50 years. However, much remains to be understood about the molecular mechanisms of GC action in ALL subtypes. In this study, we delineate the differential responses of two ALL subtypes, B- and T-ALL, to GC treatment at the systems level, using a selection of available bioinformatics methods and tools to identify differences in the biological processes, molecular pathways, and interaction networks that emerge from the action of GCs. We provide biological insight into GC-regulated genes, their related functions, and their networks specific to the ALL subtypes. We show that differentially expressed GC-regulated genes participate in distinct underlying biological processes affected by GCs in B-ALL and T-ALL, with little to no overlap. These findings provide an opportunity to identify new therapeutic targets. Amphun Chaiboonchoe, Sandhya Samarasinghe, Don Kulasiri, and Kourosh Salehi-Ashtiani Copyright © 2014 Amphun Chaiboonchoe et al. All rights reserved.

A Novel Algorithm for Detecting Protein Complexes with the Breadth First Search Thu, 10 Apr 2014 11:03:26 +0000 Most biological processes are carried out by protein complexes.
A substantial number of false positives in protein-protein interaction (PPI) data can compromise the utility of these datasets for reconstructing complexes. To reduce the impact of such discrepancies, a number of data integration and affinity scoring schemes have been devised. These methods encode the reliability (confidence) of physical interactions between pairs of proteins. The challenge now is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, we propose ClusterBFS (Cluster with Breadth-First Search), a novel protein complex mining algorithm. Guided by weighted density, ClusterBFS detects protein complexes in the weighted network by a breadth-first search that starts from a given seed protein. The experimental results show that ClusterBFS performs significantly better than other computational approaches in identifying protein complexes. Xiwei Tang, Jianxin Wang, Min Li, Yiming He, and Yi Pan Copyright © 2014 Xiwei Tang et al. All rights reserved.

Gene Expression Correlation for Cancer Diagnosis: A Pilot Study Wed, 09 Apr 2014 14:12:08 +0000 Poor prognosis for late-stage, high-grade, and recurrent cancers has motivated cancer researchers to search for more efficient biomarkers to identify the onset of cancer. Recent advances in constructing and dynamically analyzing biomolecular networks for different types of cancer have provided a promising novel strategy to detect tumorigenesis and metastasis. The observation of different biomolecular networks associated with normal and cancerous states led us to hypothesize that correlations between gene expression levels could serve as valid indicators of early cancer development.
In this pilot study, we tested our hypothesis by examining whether the mRNA expression levels of three randomly selected cancer-related genes, PIK3C3, PIM3, and PTEN, were correlated during cancer progression and whether the correlation coefficients could be used for cancer diagnosis. Strong correlations were observed between PIK3C3 and PIM3 in breast cancer, between PIK3C3 and PTEN in breast and ovarian cancers, and between PIM3 and PTEN in breast, kidney, liver, and thyroid cancers during disease progression, implying that correlations between the expression levels of cancer network genes could supplement current clinical biomarkers, such as cancer antigens, for early cancer diagnosis. Binbing Ling, Lifeng Chen, Qiang Liu, and Jian Yang Copyright © 2014 Binbing Ling et al. All rights reserved.

Computational Systems Biology Methods in Molecular Biology, Chemistry Biology, Molecular Biomedicine, and Biopharmacy Wed, 09 Apr 2014 13:17:43 +0000 Yudong Cai, Julio Vera González, Zengrong Liu, and Tao Huang Copyright © 2014 Yudong Cai et al. All rights reserved.

Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data Wed, 09 Apr 2014 12:35:01 +0000 A metabolome, the collection of comprehensive quantitative data on metabolites in an organism, has been increasingly used for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available free of charge to academic users. KOMICS includes tools and databases for the preprocessing, mining, visualization, and publication of metabolomics data.
Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. Nozomu Sakurai, Takeshi Ara, Mitsuo Enomoto, Takeshi Motegi, Yoshihiko Morishita, Atsushi Kurabayashi, Yoko Iijima, Yoshiyuki Ogata, Daisuke Nakajima, Hideyuki Suzuki, and Daisuke Shibata Copyright © 2014 Nozomu Sakurai et al. All rights reserved.