BioMed Research International: Bioinformatics The latest articles from Hindawi Publishing Corporation © 2015, Hindawi Publishing Corporation. All rights reserved. Distributed Artificial Intelligence Models for Knowledge Discovery in Bioinformatics Wed, 25 Mar 2015 13:17:46 +0000 Juan M. Corchado, Isabelle Bichindaritz, and Juan F. De Paz Copyright © 2015 Juan M. Corchado et al. All rights reserved. A Linear-RBF Multikernel SVM to Classify Big Text Corpora Mon, 23 Mar 2015 08:13:54 +0000 The support vector machine (SVM) is a powerful technique for classification. However, the SVM is not well suited to classifying large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM for high-dimensional data that provides automatic parameterization at low computational cost and improves on the results of SVMs parameterized by brute-force search. The model consists of splitting the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers. R. Romero, E. L. Iglesias, and L. Borrajo Copyright © 2015 R. Romero et al. All rights reserved. A Network Flow Approach to Predict Protein Targets and Flavonoid Backbones to Treat Respiratory Syncytial Virus Infection Mon, 23 Mar 2015 06:08:56 +0000 Background. Respiratory syncytial virus (RSV) infection is the major cause of disease of the lower respiratory tract in infants and young children. 
Attempts to develop effective vaccines or pharmacological treatments to inhibit RSV infection without undesired effects on human health have been unsuccessful. However, RSV infection has been reported to be affected by flavonoids. The mechanisms underlying viral inhibition induced by these compounds are largely unknown, making the development of new drugs difficult. Methods. To understand the mechanisms induced by flavonoids to inhibit RSV infection, a systems pharmacology-based study was performed using microarray data from primary culture of human bronchial cells infected by RSV, together with compound-proteomic interaction data available for Homo sapiens. Results. After an initial evaluation of 26 flavonoids, 5 compounds (resveratrol, quercetin, myricetin, apigenin, and tricetin) were identified through topological analysis of a major chemical-protein (CP) and protein-protein interacting (PPI) network. In a nonclustered form, these flavonoids directly regulate the activity of two protein bottlenecks involved in inflammation and apoptosis. Conclusions. Our findings may help uncover mechanisms of action of early RSV infection and provide chemical backbones and their protein targets in the difficult quest to develop new effective drugs. José Eduardo Vargas, Renato Puga, Joice de Faria Poloni, Luis Fernando Saraiva Macedo Timmers, Barbara Nery Porto, Osmar Norberto de Souza, Diego Bonatto, Paulo Márcio Condessa Pitrez, and Renato Tetelbom Stein Copyright © 2015 José Eduardo Vargas et al. All rights reserved. Identification of Novel Thyroid Cancer-Related Genes and Chemicals Using Shortest Path Algorithm Sun, 22 Mar 2015 11:26:51 +0000 Thyroid cancer is a typical endocrine malignancy. In the past three decades, the continued growth of its incidence has made it urgent to design effective treatments for this disease. To this end, it is necessary to uncover the mechanism underlying this disease. 
Identification of thyroid cancer-related genes and chemicals is helpful to understand the mechanism of thyroid cancer. In this study, we generalized some previous methods to discover both disease genes and chemicals. The method was based on shortest path algorithm and applied to discover novel thyroid cancer-related genes and chemicals. The analysis of the final obtained genes and chemicals suggests that some of them are crucial to the formation and development of thyroid cancer. It is indicated that the proposed method is effective for the discovery of novel disease genes and chemicals. Yang Jiang, Peiwei Zhang, Li-Peng Li, Yi-Chun He, Ru-jian Gao, and Yu-Fei Gao Copyright © 2015 Yang Jiang et al. All rights reserved. A Meta-Analysis Strategy for Gene Prioritization Using Gene Expression, SNP Genotype, and eQTL Data Sun, 22 Mar 2015 10:56:57 +0000 In order to understand disease pathogenesis, improve medical diagnosis, or discover effective drug targets, it is important to identify significant genes deeply involved in human disease. For this purpose, many earlier approaches attempted to prioritize candidate genes using gene expression profiles or SNP genotype data, but they often suffer from producing many false-positive results. To address this issue, in this paper, we propose a meta-analysis strategy for gene prioritization that employs three different genetic resources—gene expression data, single nucleotide polymorphism (SNP) genotype data, and expression quantitative trait loci (eQTL) data—in an integrative manner. For integration, we utilized an improved technique for the order of preference by similarity to ideal solution (TOPSIS) to combine scores from distinct resources. This method was evaluated on two publicly available datasets regarding prostate cancer and lung cancer to identify disease-related genes. 
Consequently, our proposed strategy for gene prioritization showed its superiority to conventional methods in discovering significant disease-related genes with several types of genetic resources, while making good use of potential complementarities among available resources. Jingmin Che and Miyoung Shin Copyright © 2015 Jingmin Che and Miyoung Shin. All rights reserved. Analysis of Environmental Stress Factors Using an Artificial Growth System and Plant Fitness Optimization Sun, 22 Mar 2015 10:35:00 +0000 The environment promotes evolution. Evolutionary processes represent environmental adaptations over long time scales; evolution of crop genomes is not inducible within the relatively short time span of a human generation. Extreme environmental conditions can accelerate evolution, but such conditions are often stress inducing and disruptive. Artificial growth systems can be used to induce and select genomic variation by changing external environmental conditions, thus, accelerating evolution. By using cloud computing and big-data analysis, we analyzed environmental stress factors for Pleurotus ostreatus by assessing, evaluating, and predicting information of the growth environment. Through the indexing of environmental stress, the growth environment can be precisely controlled and developed into a technology for improving crop quality and production. Meonghun Lee and Hyun Yoe Copyright © 2015 Meonghun Lee and Hyun Yoe. All rights reserved. Agent-Based Spatiotemporal Simulation of Biomolecular Systems within the Open Source MASON Framework Sun, 22 Mar 2015 10:04:24 +0000 Agent-based modelling is being used to represent biological systems with increasing frequency and success. This paper presents the implementation of a new tool for biomolecular reaction modelling in the open source Multiagent Simulator of Neighborhoods framework. 
The rationale behind this new tool is the necessity to describe interactions at the molecular level to be able to grasp emergent and meaningful biological behaviour. We are particularly interested in characterising and quantifying the various effects that facilitate biocatalysis. Enzymes may display high specificity for their substrates and this information is crucial to the engineering and optimisation of bioprocesses. Simulation results demonstrate that molecule distributions, reaction rate parameters, and structural parameters can be adjusted separately in the simulation, allowing a comprehensive study of individual effects in the context of realistic cell environments. While a higher percentage of collisions resulting in a reaction increases the affinity of the enzyme for the substrate, a faster reaction (i.e., turnover number) leads to a smaller number of time steps. Slower diffusion rates and molecular crowding (physical hurdles) decrease the collision rate of reactants, hence reducing the reaction rate, as expected. Also, the random distribution of molecules affects the results significantly. Gael Pérez-Rodríguez, Martín Pérez-Pérez, Daniel Glez-Peña, Florentino Fdez-Riverola, Nuno F. Azevedo, and Anália Lourenço Copyright © 2015 Gael Pérez-Rodríguez et al. All rights reserved. Using the eServices Platform for Detecting Behavior Patterns Deviation in the Elderly Assisted Living: A Case Study Sun, 22 Mar 2015 09:33:56 +0000 The world’s aging population is growing and the elderly are increasingly isolated socially and geographically. As a consequence, in many situations, they need assistance that is not provided in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect the elderly’s behavior pattern deviations that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the reliability of the alert system and minimize possible false positive alerts. 
These variables comprise information provided by a body area network (BAN), by environment sensors, and also by the elderly’s interaction with a service provider platform called eServices (Elderly Support Service Platform). eServices is a scalable platform aggregating a service ecosystem developed specially for elderly people. This pattern recognition will then trigger the appropriate response. As the system evolves, it will learn to predict potential danger situations for a specified user, acting preventively and ensuring the elderly’s safety and well-being. As the eServices platform is still in development, synthetic data, based on a real data sample and empirical knowledge, are being used to populate the initial dataset. The presented work is a proof of concept of knowledge extraction using the eServices platform information. Despite not using real data, this work proves valuable, achieving good performance in preventing alert situations. Isabel Marcelino, David Lopes, Michael Reis, Fernando Silva, Rosalía Laza, and António Pereira Copyright © 2015 Isabel Marcelino et al. All rights reserved. A Distributed Multiagent System Architecture for Body Area Networks Applied to Healthcare Monitoring Sun, 22 Mar 2015 09:23:02 +0000 In recent years the area of health monitoring has grown significantly, attracting the attention of both academia and commercial sectors. At the same time, the availability of new biomedical sensors and suitable network protocols has led to the appearance of a new generation of wireless sensor networks, the so-called wireless body area networks. Nowadays, these networks are routinely used for continuous monitoring of vital parameters, movement, and the surrounding environment of people, but the large volume of data generated in different locations represents a major obstacle for the appropriate design, development, and deployment of more elaborate intelligent systems. 
In this context, we present an open and distributed architecture based on a multiagent system for recognizing human movements, identifying human postures, and detecting harmful activities. The proposed system evolved from a single node for fall detection to a multisensor hardware solution capable of identifying unhampered falls and analyzing the users’ movement. The experiments carried out contemplate two different scenarios and demonstrate the accuracy of our proposal as a real distributed movement monitoring and accident detection system. Moreover, we also characterize its performance, enabling future analyses and comparisons with similar approaches. Filipe Felisberto, Rosalía Laza, Florentino Fdez-Riverola, and António Pereira Copyright © 2015 Filipe Felisberto et al. All rights reserved. RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases Sun, 22 Mar 2015 09:18:34 +0000 High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. Several computational methods applied to physical interaction of protein networks have been successfully used in identification of the best disease candidates for each expression profile. An open problem for these methods is the ability to combine and take advantage of the wealth of biomedical data publicly available. We propose an enhanced method to improve selection of the best disease targets for a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The obtained results outline the superiority of the proposed approach, RecRWR, in identifying disease candidates, especially with high levels of biological noise and benefiting from all data available. 
Joel Perdiz Arrais and José Luís Oliveira Copyright © 2015 Joel Perdiz Arrais and José Luís Oliveira. All rights reserved. Probabilistic Inference of Biological Networks via Data Integration Sun, 22 Mar 2015 09:02:27 +0000 There is significant interest in inferring the structure of subcellular networks of interaction. Here we consider supervised interactive network inference in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning that can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%. Mark F. Rogers, Colin Campbell, and Yiming Ying Copyright © 2015 Mark F. Rogers et al. All rights reserved. aCGH-MAS: Analysis of aCGH by means of Multiagent System Sun, 22 Mar 2015 08:55:39 +0000 There are currently different techniques, such as CGH arrays, to study genetic variations in patients. 
CGH arrays analyze gains and losses in different regions in the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information corresponding to mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases and it would be of interest to incorporate information of different sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations and visualization techniques for the interpretation of the final results and to extract relevant information from different sources of information by applying a CBR system. Juan F. De Paz, Rocío Benito, Javier Bajo, Ana Eugenia Rodríguez, and María Abáigar Copyright © 2015 Juan F. De Paz et al. All rights reserved. Gene Knockout Identification Using an Extension of Bees Hill Flux Balance Analysis Sun, 22 Mar 2015 08:47:09 +0000 Microbial strain optimisation for the overproduction of a desired phenotype has been a popular topic in recent years. Gene knockout is a genetic engineering technique that can modify the metabolism of microbial cells to obtain desirable phenotypes. Optimisation algorithms have been developed to identify the effects of gene knockout. However, the complexities of metabolic networks have made the process of identifying the effects of genetic modification on desirable phenotypes challenging. Furthermore, a vast number of reactions in cellular metabolism often lead to a combinatorial problem in obtaining optimal gene knockout. The computational time increases exponentially as the size of the problem increases. 
This work reports an extension of Bees Hill Flux Balance Analysis (BHFBA) to identify optimal gene knockouts to maximise the production yield of desired phenotypes while sustaining the growth rate. This proposed method functions by integrating OptKnock into BHFBA for validating the results automatically. The results show that the extension of BHFBA is suitable, reliable, and applicable in predicting gene knockout. Through several experiments conducted on Escherichia coli, Bacillus subtilis, and Clostridium thermocellum as model organisms, the extension of BHFBA has shown better performance in terms of computational time, stability, growth rate, and production yield of desired phenotypes. Yee Wen Choon, Mohd Saberi Mohamad, Safaai Deris, Chuii Khim Chong, Sigeru Omatu, and Juan Manuel Corchado Copyright © 2015 Yee Wen Choon et al. All rights reserved. Modelling the Longevity of Dental Restorations by means of a CBR System Thu, 19 Mar 2015 14:23:43 +0000 The lifespan of dental restorations is limited. Longevity depends on the material used and the different characteristics of the dental piece. However, the best and longest lasting material is not always used, since patients may prefer different treatments according to how noticeable the material is. Over the last 100 years, the most commonly used material has been silver amalgam, which, while very durable, is somewhat aesthetically displeasing. Our study is based on the collection of data from the charts, notes, and radiographic information of restorative treatments performed by Dr. Vera in 1993, the analysis of the information using artificial intelligence to determine the most appropriate restoration, and the monitoring of the evolution of the dental restoration. The data will be treated confidentially according to the Organic Law 15/1999 of 13 December on the Protection of Personal Data. 
This paper also presents a clustering technique capable of identifying the most significant cases with which to instantiate the case-base. In order to classify the cases, a mixture of experts is used which incorporates a Bayesian network and a multilayer perceptron; the combination of both classifiers is performed with a neural network. Ignacio J. Aliaga, Vicente Vera, Juan F. De Paz, Alvaro E. García, and Mohd Saberi Mohamad Copyright © 2015 Ignacio J. Aliaga et al. All rights reserved. Bladder Carcinoma Data with Clinical Risk Factors and Molecular Markers: A Cluster Analysis Thu, 19 Mar 2015 13:41:50 +0000 Bladder cancer occurs in the epithelial lining of the urinary bladder and is amongst the most common types of cancer in humans, killing thousands of people a year. This paper is based on the hypothesis that the use of clinical and histopathological data together with information about the concentration of various molecular markers in patients is useful for the prediction of outcomes and the design of treatments of nonmuscle invasive bladder carcinoma (NMIBC). A population of 45 patients with a new diagnosis of NMIBC was selected. Patients with benign prostatic hyperplasia (BPH), muscle invasive bladder carcinoma (MIBC), carcinoma in situ (CIS), and NMIBC recurrent tumors were not included due to their different clinical behavior. Clinical history was obtained by means of anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients. Then, patients underwent conventional transurethral resection (TURBT) and some proteomic analyses quantified the biomarkers (p53, neu, and EGFR). A postoperative follow-up was performed to detect relapse and progression. Clusterings were performed to find groups with clinical, molecular markers, histopathological prognostic factors, and statistics about recurrence, progression, and overall survival of patients with NMIBC. 
Four groups were found according to tumor sizes, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characteristics and biological behavior. Enrique Redondo-Gonzalez, Leandro Nunes de Castro, Jesús Moreno-Sierra, María Luisa Maestro de las Casas, Vicente Vera-Gonzalez, Daniel Gomes Ferrari, and Juan Manuel Corchado Copyright © 2015 Enrique Redondo-Gonzalez et al. All rights reserved. The Plant Growth-Promoting Bacteria Azospirillum amazonense: Genomic Versatility and Phytohormone Pathway Thu, 19 Mar 2015 12:07:19 +0000 The rhizosphere bacterium Azospirillum amazonense associates with plant roots to promote plant growth. Variation in replicon numbers and rearrangements is common among Azospirillum strains, and characterization of these naturally occurring differences can improve our understanding of genome evolution. We performed an in silico comparative genomic analysis to understand the genomic plasticity of A. amazonense. The number of A. amazonense-specific coding sequences was similar when compared with the six closely related bacteria, regardless of whether they belong to the Azospirillum genus. Our results suggest that the versatile gene repertoire found in the A. amazonense genome could have been acquired from distantly related bacteria through horizontal transfer. Furthermore, the identification of coding sequences related to phytohormone production, such as flavin-monooxygenase and aldehyde oxidase, is likely to represent the tryptophan-dependent TAM pathway for auxin production in this bacterium. Moreover, the presence of the coding sequence for nitrilase indicates the presence of the alternative route that uses IAN as an intermediate for auxin synthesis, but it remains to be established whether the IAN pathway is the Trp-independent route. 
Future investigations are necessary to support the hypothesis that its genomic structure has evolved to meet the requirement for adaptation to the rhizosphere and interaction with host plants. Ricardo Cecagno, Tiago Ebert Fritsch, and Irene Silveira Schrank Copyright © 2015 Ricardo Cecagno et al. All rights reserved. Identification of Subtype Specific miRNA-mRNA Functional Regulatory Modules in Matched miRNA-mRNA Expression Data: Multiple Myeloma as a Case Thu, 19 Mar 2015 11:44:23 +0000 Identification of miRNA-mRNA modules is an important step to elucidate their combinatorial effect on the pathogenesis and mechanisms underlying complex diseases. Current identification methods are primarily based upon miRNA-target information and matched miRNA and mRNA expression profiles. However, for heterogeneous diseases, the miRNA-mRNA regulatory mechanisms may differ between subtypes, leading to differences in clinical behavior. In order to explore the pathogenesis of each subtype, it is important to identify subtype specific miRNA-mRNA modules. In this study, we integrated the Ping-Pong algorithm and a multiobjective genetic algorithm to identify subtype specific miRNA-mRNA functional regulatory modules (MFRMs) through integrative analysis of three biological data sets: GO biological processes, miRNA target information, and matched miRNA and mRNA expression data. We applied our method to a heterogeneous disease, multiple myeloma (MM), to identify MM subtype specific MFRMs. The constructed miRNA-mRNA regulatory networks provide a modular view of subtype specific miRNA-mRNA interactions. Furthermore, clustering analysis demonstrated that heterogeneous MFRMs were able to separate corresponding MM subtypes. These subtype specific MFRMs may aid in the further elucidation of the pathogenesis of each subtype and may serve to guide MM subtype diagnosis and treatment. 
Yunpeng Zhang, Wei Liu, Yanjun Xu, Chunquan Li, Yingying Wang, Haixiu Yang, Chunlong Zhang, Fei Su, Yixue Li, and Xia Li Copyright © 2015 Yunpeng Zhang et al. All rights reserved. Shaped Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Drosophila Embryo Thu, 19 Mar 2015 10:25:53 +0000 In recent years, with the development of automated microscopy technologies, the volume and complexity of image data on gene expression have increased tremendously. The only way to analyze quantitatively and comprehensively such biological data is by developing and applying new sophisticated mathematical approaches. Here, we present extensions of 2D singular spectrum analysis (2D-SSA) for application to 2D and 3D datasets of embryo images. These extensions, circular and shaped 2D-SSA, are applied to gene expression in the nuclear layer just under the surface of the Drosophila (fruit fly) embryo. We consider the commonly used cylindrical projection of the ellipsoidal Drosophila embryo. We demonstrate how circular and shaped versions of 2D-SSA help to decompose expression data into identifiable components (such as trend and noise), as well as separating signals from different genes. Detection and improvement of under- and overcorrection in multichannel imaging is addressed, as well as the extraction and analysis of 3D features in 3D gene expression patterns. Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved. Effect of Celastrol on Growth Inhibition of Prostate Cancer Cells through the Regulation of hERG Channel In Vitro Thu, 19 Mar 2015 10:23:31 +0000 Objective. To explore the antiprostate cancer effects of Celastrol on prostate cancer cells’ proliferation, apoptosis, and cell cycle distribution, as well as the correlation to the regulation of hERG. Methods. DU145 cells were treated with various concentrations of Celastrol (0.25–16.0 μmol/L) for 0–72 hours. 
MTT assay was used to evaluate the inhibition effect of Celastrol on the growth of DU145 cells. Cell apoptosis was detected through both Annexin-V FITC/PI double-labeled cytometry and Hoechst 33258. Cell cycle regulation was examined by a propidium iodide method. Western blot and RT-PCR technologies were applied to assess the expression level of hERG in DU145 cells. Results. Celastrol presented striking growth inhibition and apoptosis induction potency on DU145 cells in vitro in a time- and dose-dependent manner. The IC50 value of Celastrol for 24 hours was 2.349 ± 0.213 μmol/L. Moreover, Celastrol induced DU145 cell apoptosis in a cell cycle-dependent manner, which means Celastrol could arrest DU145 cells in G0/G1 phase; accordingly, cells in S phase decreased gradually and no obvious changes were found in G2/M phase cells. Under transmission electron microscopy, apoptotic bodies containing nuclear fragments were found in Celastrol-treated DU145 cells. Overexpression of hERG channel was found in DU145 cells, while Celastrol could downregulate it at both protein and mRNA level in a dose-dependent manner. Conclusions. Celastrol exhibits its antiprostate cancer effects partially through the downregulation of the expression level of hERG channel in DU145 cells, suggesting that Celastrol may be a potential agent against prostate cancer with a mechanism of blocking the hERG channel. Nan Ji, Jinjun Li, Zexiong Wei, Fanhu Kong, Hongyan Jin, Xiaoya Chen, Yan Li, and Youping Deng Copyright © 2015 Nan Ji et al. All rights reserved. The Construction of Common and Specific Significance Subnetworks of Alzheimer’s Disease from Multiple Brain Regions Thu, 19 Mar 2015 09:58:56 +0000 Alzheimer’s disease (AD) is a progressive and fatal neurodegenerative disorder that leads to irreversible cognitive and memory damage in different brain regions. 
The identification and analysis of the dysregulated pathways and subnetworks among affected brain regions will provide deep insights into the pathogenetic mechanism of AD. In this paper, commonly and specifically significant subnetworks were identified from six AD brain regions. Protein-protein interaction (PPI) data were integrated to add molecular biological information to construct the functional modules of six AD brain regions by the Heinz algorithm. Then, the simulated annealing algorithm based on edge weight was applied to predict and optimize the maximal scoring networks for common and specific genes, respectively, which can remove the weak interactions and add the prediction of strong interactions to increase the accuracy of the networks. The identified common subnetworks showed that inflammation of the brain nerves is one of the critical factors of AD and calcium imbalance may be a link among several causative factors in AD pathogenesis. In addition, the extracted specific subnetworks for each brain region revealed many biologically functional mechanisms to understand AD pathogenesis. Wei Kong, Xiaoyang Mou, Na Zhang, Weiming Zeng, Shasha Li, and Yang Yang Copyright © 2015 Wei Kong et al. All rights reserved. The Expression and Distributions of ANP32A in the Developing Brain Thu, 19 Mar 2015 09:34:47 +0000 Acidic (leucine-rich) nuclear phosphoprotein 32 family, member A (ANP32A), has multiple functions involved in neuritogenesis, transcriptional regulation, and apoptosis. However, whether ANP32A has an effect on the mammalian developing brain is still in question. In this study, a human multiple tissue expression (MTE) array showed that the brain was the organ with the most abundant ANP32A expression. The distribution of ANP32A across the different adult brain areas varied dramatically, with high expression in cerebellum, temporal lobe, and cerebral cortex and with low expression in pons, medulla oblongata, and spinal cord. 
The expression of ANP32A was higher in the adult brain than in the fetal brain of not only humans but also mice in a time-dependent manner. ANP32A signals were uniformly dispersed in embryonic mouse brain. However, ANP32A was abundant in the granular layer of the cerebellum and the cerebral cortex as the mice matured, as well as in the Purkinje cells of the cerebellum. The variation of expression levels and distribution of ANP32A in the developing brain implies that ANP32A may play an important role in mammalian brain development, especially in the differentiation and function of neurons in the cerebellum and the cerebral cortex. Shanshan Wang, Yunliang Wang, Qingshan Lu, Xinshan Liu, Fuyu Wang, Xiaodong Ma, Chunping Cui, Chenghe Shi, Jinfeng Li, and Dajin Zhang Copyright © 2015 Shanshan Wang et al. All rights reserved. Protecting Intestinal Epithelial Cell Number 6 against Fission Neutron Irradiation through NF-κB Signaling Pathway Thu, 19 Mar 2015 09:13:32 +0000 The purpose of this paper is to explore the change of the NF-κB signaling pathway in intestinal epithelial cells induced by fission neutron irradiation and the influence of the PI3K/Akt pathway inhibitor LY294002. IEC-6 cells were divided into three groups: control group, neutron irradiation of 4 Gy group, and neutron irradiation of 4 Gy with LY294002 treatment group. Except for the control group, the groups were irradiated with 4 Gy of neutrons. LY294002 was given 24 hours before neutron irradiation. At 6 h and 24 h after neutron irradiation, the morphologic changes, proliferation ability, apoptosis, and necrosis rates of the IEC-6 cells were assayed and the changes of the NF-κB and PI3K/Akt pathways were detected. At 6 h and 24 h after neutron irradiation of 4 Gy, the proliferation ability of the IEC-6 cells decreased and many apoptotic and necrotic cells were found. 
The injuries in LY294002 treatment and neutron irradiation group were more serious than those in control and neutron irradiation groups. The results suggest that IEC-6 cells were obviously damaged and induced serious apoptosis and necrosis by neutron irradiation of 4Gy; the NF-κB signaling pathway in IEC-6 was activated by neutron irradiation which could protect IEC-6 against injury by neutron irradiation; LY294002 could inhibit the activity of IEC-6 cells. Gong-Min Chang, Ya-Bing Gao, Shui-Ming Wang, Xin-Ping Xu, Li Zhao, Jing Zhang, Jin-Feng Li, Yun-Liang Wang, and Rui-Yun Peng Copyright © 2015 Gong-Min Chang et al. All rights reserved. Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms Tue, 17 Mar 2015 13:03:24 +0000 Many proteins are known to be associated with cancer diseases. It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues’s method by employing the protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I) using individual algorithm, (II) combining algorithms, and (III) combining the same classification types of algorithms. When compared with Aragues’s method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. 
It is anticipated that the current research could help understand disease mechanisms and diagnosis. Chien-Hung Huang, Huai-Shun Peng, and Ka-Lok Ng Copyright © 2015 Chien-Hung Huang et al. All rights reserved. Prediction of Drug Indications Based on Chemical Interactions and Chemical Similarities Mon, 02 Mar 2015 09:49:30 +0000 Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches could be categorized into disease-centric and drug-centric based on the starting point of the issues or small-scaled application and large-scale application according to the diversity of the datasets. Here, a classifier has been constructed to predict the indications of a drug based on the assumption that interactive/associated drugs or drugs with similar structures are more likely to target the same diseases using a large drug indication dataset. To examine the classifier, it was conducted on a dataset with 1,573 drugs retrieved from Comprehensive Medicinal Chemistry database for five times, evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded an accuracy rate of 50.00% for the 1st order prediction by independent test on a dataset with 32 other drugs in which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets are successfully identified by our method. These results suggest that our method may become a useful tool to associate novel molecules with new indications or alternative indications with existing drugs. Guohua Huang, Yin Lu, Changhong Lu, Mingyue Zheng, and Yu-Dong Cai Copyright © 2015 Guohua Huang et al. All rights reserved. 
Predicting the Functions of Long Noncoding RNAs Using RNA-Seq Based on Bayesian Network Sat, 28 Feb 2015 07:50:45 +0000 Long noncoding RNAs (lncRNAs) have been shown to play key roles in various biological processes. However, functions of most lncRNAs are poorly characterized. Here, we present a framework to predict functions of lncRNAs through construction of a regulatory network between lncRNAs and protein-coding genes. Using RNA-seq data, the transcript profiles of lncRNAs and protein-coding genes are constructed. Using the Bayesian network method, a regulatory network, which implies dependency relations between lncRNAs and protein-coding genes, was built. In combination with a protein interaction network, highly connected coding genes linked by a given lncRNA were subsequently used to predict functions of the lncRNA through functional enrichment. Application of our method to prostate RNA-seq data showed that 762 lncRNAs in the constructed regulatory network were assigned functions. We found that lncRNAs are involved in diverse biological processes, such as tissue development or embryo development (e.g., nervous system development and mesoderm development). By comparison with functions inferred using the neighboring gene-based method and functions determined using lncRNA knockdown experiments, our method can provide comparable predicted functions of lncRNAs. Overall, our method can be applied to emerging RNA-seq data, which will help researchers identify complex relations between lncRNAs and coding genes and reveal important functions of lncRNAs. Yun Xiao, Yanling Lv, Hongying Zhao, Yonghui Gong, Jing Hu, Feng Li, Jinyuan Xu, Jing Bai, Fulong Yu, and Xia Li Copyright © 2015 Yun Xiao et al. All rights reserved. 
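The functional enrichment step mentioned in the lncRNA abstract above is often carried out with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the following is only a minimal sketch under that assumption; the function name and its parameters are illustrative and do not come from the paper.

```python
from math import comb  # requires Python 3.8+


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: number of background genes (assumed universe)
    annotated:  background genes carrying the functional term
    drawn:      coding genes linked to the lncRNA of interest
    hits:       linked genes that carry the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    print(hypergeom_enrichment_p(population=10, annotated=5, drawn=5, hits=5))
```

A small p-value suggests the term is over-represented among the genes linked to the lncRNA; in practice the p-values for all tested terms would also need a multiple-testing correction.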
ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function Wed, 25 Feb 2015 13:26:55 +0000 Repetitive element sequences are adjacent, repeating patterns, also called motifs, and can be of different lengths; repetitions can involve their exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for the extraction of repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in the genome or proteome, displaying a user-friendly web interface, and performing exhaustive searches. ProGeRF is a web site for extracting repetitive regions from genome and proteome sequences. It was designed to be an efficient, fast, accurate, and above all user-friendly web tool that offers many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available as a stand-alone program, from which the users can download the source code, and as a web tool. It was developed using the hash table approach to extract perfect and imperfect repetitive regions in a (multi)FASTA file, while allowing a linear time complexity. Robson da Silva Lopes, Walas Jhony Lopes Moraes, Thiago de Souza Rodrigues, and Daniella Castanheira Bartholomeu Copyright © 2015 Robson da Silva Lopes et al. All rights reserved. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity Mon, 23 Feb 2015 07:09:56 +0000 This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs) which are important to the immune system. Recently, researchers have become interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. 
However, researchers have encountered obstacles in the AMP design process as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and support vector machine- (SVM-) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in jackknife test and 87.59% in independent test, while the sensitivity obtained for jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity. Xin Yi Ng, Bakhtiar Affendi Rosdi, and Shahriza Shahrudin Copyright © 2015 Xin Yi Ng et al. All rights reserved. Novel Candidate Key Drivers in the Integrative Network of Genes, MicroRNAs, Methylations, and Copy Number Variations in Squamous Cell Lung Carcinoma Mon, 23 Feb 2015 07:03:50 +0000 The mechanisms of lung cancer are highly complex. Not only mRNA gene expression but also microRNAs, DNA methylation, and copy number variation (CNV) play roles in tumorigenesis. It is difficult to incorporate so much information into a single model that can comprehensively reflect all these lung cancer mechanisms. In this study, we analyzed the 129 TCGA (The Cancer Genome Atlas) squamous cell lung carcinoma samples with gene expression, microRNA expression, DNA methylation, and CNV data. First, we used variance inflation factor (VIF) regression to build the whole genome integrative network. Then, we isolated the lung cancer subnetwork by identifying the known lung cancer genes and their direct regulators. 
This subnetwork was refined by the Bayesian method, and the directed regulations among mRNA genes, microRNAs, methylations, and CNVs were obtained. The novel candidate key drivers in this refined subnetwork, such as the methylation of ARHGDIB and HOXD3, microRNA let-7a and miR-31, and the CNV of AGAP2, were identified and analyzed. On three large publicly available lung cancer datasets, the key drivers ARHGDIB and HOXD3 demonstrated significant associations with the overall survival of lung cancer patients. Our results provide new insights into lung cancer mechanisms. Tao Huang, Jing Yang, and Yu-dong Cai Copyright © 2015 Tao Huang et al. All rights reserved. A miRNA-Driven Inference Model to Construct Potential Drug-Disease Associations for Drug Repositioning Thu, 19 Feb 2015 10:16:58 +0000 Increasing evidence indicates that the inappropriate expression of microRNAs (miRNAs) leads to many kinds of complex diseases and that drugs can regulate the expression level of miRNAs. Therefore human diseases may be treated by targeting some specific miRNAs with drugs, which provides a new perspective for drug repositioning. However, few studies have attempted to computationally predict associations between drugs and diseases via miRNAs for drug repositioning. In this paper, we developed an inference model to achieve this aim by combining experimentally supported drug-miRNA associations and miRNA-disease associations with the assumption that drugs will form associations with diseases when they share some significant miRNA partners. Experimental results showed excellent performance of our model. Case studies demonstrated that some of the strongly predicted drug-disease associations can be confirmed by the publicly accessible database CTD, which indicated the usefulness of our inference model. Moreover, candidate miRNAs as molecular hypotheses underpinning the associations were listed to guide future experiments. The predicted results were released for further studies. 
We expect that this study will provide help in our understanding of drug-disease association prediction and in the roles of miRNAs in drug repositioning. Hailin Chen and Zuping Zhang Copyright © 2015 Hailin Chen and Zuping Zhang. All rights reserved. Prediction of Protein-Protein Interactions Related to Protein Complexes Based on Protein Interaction Networks Tue, 03 Feb 2015 13:32:51 +0000 A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts based on the adaptive k-cores method to predict protein-protein interactions associated with the complexes. The proposed method is adaptive to protein complexes with different structure, number, and size of nodes in a protein-protein interaction network. Based on different complex sets detected by various algorithms, we can obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is proved by using estimations with statistical tests and direct confirmation of the biological data. In comparison with the approaches which predict the interactions based on the cliques, the overlap of the predictions is small. Similarly, the overlaps among the predicted sets of interactions derived from various complex sets are also small. Thus, every predicted set of interactions may complement and improve the quality of the original network data. Meanwhile, the predictions from the proposed method replenish protein-protein interactions associated with protein complexes using only the network topology. Peng Liu, Lei Yang, Daming Shi, and Xianglong Tang Copyright © 2015 Peng Liu et al. All rights reserved. 
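The k-cores idea referred to in the protein complex abstract above can be illustrated with the standard peeling procedure: repeatedly remove every node whose degree is below k until none remain. This is a minimal sketch of plain k-core extraction only, not the authors' adaptive variant, whose details the abstract does not give; the function name and the toy protein labels are hypothetical.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes of the k-core of an undirected graph.

    The k-core is the largest subgraph in which every node has at
    least k neighbours inside the subgraph.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    degree = {node: len(adj[node]) for node in adj}
    stack = [node for node in alive if degree[node] < k]

    # Peel low-degree nodes; each removal may push neighbours below k.
    while stack:
        node = stack.pop()
        if node not in alive:
            continue
        alive.discard(node)
        for neighbour in adj[node]:
            if neighbour in alive:
                degree[neighbour] -= 1
                if degree[neighbour] < k:
                    stack.append(neighbour)
    return alive


if __name__ == "__main__":
    # Toy complex: a triangle A-B-C with one loosely attached protein D.
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
    print(sorted(k_core(toy_edges, 2)))
```

In this toy graph the pendant node D is pruned at k = 2 while the densely connected triangle survives, which is the sense in which k-cores isolate the tightly connected part of a detected complex.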
Novel Numerical Characterization of Protein Sequences Based on Individual Amino Acid and Its Application Mon, 02 Feb 2015 13:44:51 +0000 The hydrophobicity and hydrophilicity of amino acids play a very important role in protein folding and its interaction with the environment and other molecules, as well as its catalytic mechanism. Based on the two physicochemical indexes, a 2D graphical representation of protein sequences is introduced; meanwhile, a new numerical characteristic has been proposed to compute the distance of different sequences for analysis of sequence similarity/dissimilarity on the basis of this graphical representation. Furthermore, we apply the new distance in the similarities/dissimilarities of ND5 proteins of nine species and predict the four major classes based on the dataset containing 639 domains. The results show that the method is simple and effective. Yan-ping Zhang, Ya-jun Sheng, Wei Zheng, Ping-an He, and Ji-shuo Ruan Copyright © 2015 Yan-ping Zhang et al. All rights reserved. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm Mon, 02 Feb 2015 06:51:40 +0000 The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. 
The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis. Jianning Wu and Bin Wu Copyright © 2015 Jianning Wu and Bin Wu. All rights reserved. Detecting Key Genes Regulated by miRNAs in Dysfunctional Crosstalk Pathway of Myasthenia Gravis Sun, 01 Feb 2015 10:23:29 +0000 Myasthenia gravis (MG) is a neuromuscular autoimmune disorder resulting from autoantibodies attacking components of the neuromuscular junction. Recent studies have implicated the aberrant expression of microRNAs (miRNAs) in the pathogenesis of MG; however, the underlying mechanisms remain largely unknown. This study aimed to identify key genes regulated by miRNAs in MG. Six dysregulated pathways were identified through differentially expressed miRNAs and mRNAs in MG, and significant crosstalk was detected between five of these. Notably, crosstalk between the “synaptic long-term potentiation” pathway and four others was mediated by five genes involved in the MAPK signaling pathway. Furthermore, 14 key genes regulated by miRNAs were detected, of which six—MAPK1, RAF1, PGF, PDGFRA, EP300, and PPP1CC—mediated interactions between the dysregulated pathways. MAPK1 and RAF1 were responsible for most of this crosstalk (80%), likely reflecting their central roles in MG pathogenesis. 
In addition, most key genes were enriched in immune-related local areas that were strongly disordered in MG. These results provide new insight into the pathogenesis of MG and offer new potential targets for therapeutic intervention. Yuze Cao, Jianjian Wang, Huixue Zhang, Qinghua Tian, Lixia Chen, Shangwei Ning, Peifang Liu, Xuesong Sun, Xiaoyu Lu, Chang Song, Shuai Zhang, Bo Xiao, and Lihua Wang Copyright © 2015 Yuze Cao et al. All rights reserved. Conformational B-Cell Epitope Prediction Method Based on Antigen Preprocessing and Mimotopes Analysis Thu, 29 Jan 2015 06:48:20 +0000 Identification of epitopes that invoke strong humoral responses is an essential issue in the field of immunology. Various computational methods developed in recent years based on antigen structures and mimotopes narrow the search for experimental validation. These methods can be divided into two categories: antigen structure-based methods and mimotope-based methods. Though new methods of the two kinds have been proposed in recent years, they cannot maintain a high degree of satisfaction in various circumstances. In this paper, we proposed a new conformational B-cell epitope prediction method based on antigen preprocessing and mimotopes analysis. The method classifies the antigen surface residues into “epitopes” and “nonepitopes” by six epitope propensity scales, removing the “nonepitopes” and using the preprocessed antigen for epitope prediction based on mimotope sequences. The proposed method achieves a mean F score of 0.42 on the testing dataset. When compared with other publicly available servers by using the testing dataset, the new method yields better performance. The results demonstrate the proposed method is competent for the conformational B-cell epitope prediction. Pingping Sun, Haixu Ju, Baowen Zhang, Yu Gu, Bo Liu, Yanxin Huang, Huijie Zhang, and Yuxin Li Copyright © 2015 Pingping Sun et al. All rights reserved. 
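For readers unfamiliar with the metric quoted in the epitope prediction abstract above, an F score combines precision and recall into a single number. The abstract does not say which variant was used or how the mean was taken, so this sketch assumes the balanced F1 score computed per antigen from residue-level counts and then averaged; the function names and the example counts are illustrative only.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Balanced F score: harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0  # no predicted and no true epitope residues
    return 2 * true_pos / denominator


def mean_f1(per_antigen_counts):
    """Average F1 over (true_pos, false_pos, false_neg) tuples."""
    scores = [f1_score(*counts) for counts in per_antigen_counts]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up counts for two antigens, for illustration only.
    print(mean_f1([(3, 2, 4), (5, 5, 5)]))
```

Because the F score ignores true negatives, it suits epitope prediction, where nonepitope residues greatly outnumber epitope residues.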
Helicase and Its Interacting Factors: Regulation Mechanism, Characterization, Structure, and Application for Drug Design Wed, 28 Jan 2015 14:39:55 +0000 Cheng-Yang Huang, Yoshito Abe, Huangen Ding, and I-Fang Chung Copyright © 2015 Cheng-Yang Huang et al. All rights reserved. Automated Training for Algorithms That Learn from Genomic Data Wed, 28 Jan 2015 07:04:42 +0000 Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not incorporated into published machine learning algorithms which thereby can become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable. Gokcen Cilingir and Shira L. Broschat Copyright © 2015 Gokcen Cilingir and Shira L. Broschat. All rights reserved. 
Protein Complex Discovery by Interaction Filtering from Protein Interaction Networks Using Mutual Rank Coexpression and Sequence Similarity Tue, 27 Jan 2015 14:15:39 +0000 The evaluation of the biological networks is considered the essential key to understanding the complex biological systems. Meanwhile, the graph clustering algorithms are mostly used in the protein-protein interaction (PPI) network analysis. The complexes introduced by the clustering algorithms include noise proteins. The error rate of the noise proteins in the PPI network researches is about 40–90%. However, only 30–40% of the existing interactions in the PPI databases depend on the specific biological function. It is essential to eliminate the noise proteins and the interactions from the complexes created via clustering methods. We have introduced new methods of weighting interactions in protein clusters and the splicing of noise interactions and proteins-based interactions on their weights. The coexpression and the sequence similarity of each pair of proteins are considered the edge weight of the proteins in the network. The results showed that the edge filtering based on the amount of coexpression acts similar to the node filtering via graph-based characteristics. Regarding the removal of the noise edges, the edge filtering has a significant advantage over the graph-based method. The edge filtering based on the amount of sequence similarity has the ability to remove the noise proteins and the noise interactions. Ali Kazemi-Pour, Bahram Goliaei, and Hamid Pezeshk Copyright © 2015 Ali Kazemi-Pour et al. All rights reserved. Regulation of DEAH/RHA Helicases by G-Patch Proteins Tue, 27 Jan 2015 11:17:53 +0000 RNA helicases from the DEAH/RHA family are present in all the processes of RNA metabolism. The function of two helicases from this family, Prp2 and Prp43, is regulated by protein partners containing a G-patch domain. 
The G-patch is a glycine-rich domain discovered by sequence alignment, involved in protein-protein and protein-nucleic acid interaction. Although it has been shown to stimulate the helicase’s enzymatic activities, the precise role of the G-patch domain remains unclear. The role of G-patch proteins in the regulation of Prp43 activity has been studied in the two biological processes in which it is involved: splicing and ribosome biogenesis. Depending on the pathway, the activity of Prp43 is modulated by different G-patch proteins. A particular feature of the structure of DEAH/RHA helicases revealed by the Prp43 structure is the OB-fold domain in the C-terminal part. The OB-fold has been shown to be a platform responsible for the interaction with G-patch proteins and RNA. Though there is still no structural data on the G-patch domain, in the current model, the interaction between the helicase, the G-patch protein, and RNA leads to a cooperative binding of RNA and conformational changes of the helicase. Julien Robert-Paganin, Stéphane Réty, and Nicolas Leulliot Copyright © 2015 Julien Robert-Paganin et al. All rights reserved. Virtual Screening of Acetylcholinesterase Inhibitors Using the Lipinski’s Rule of Five and ZINC Databank Thu, 22 Jan 2015 06:24:23 +0000 Alzheimer’s disease (AD) is a progressive and neurodegenerative pathology that can affect people over 65 years of age. It causes several complications, such as behavioral changes, language deficits, depression, and memory impairments. One of the methods used to treat AD is the increase of acetylcholine (ACh) in the brain by using acetylcholinesterase inhibitors (AChEIs). In this study, we used the ZINC databank and Lipinski’s rule of five to perform a virtual screening and a molecular docking (using AutoDock Vina 1.1.1) aiming to select possible compounds that have a quaternary ammonium atom and are able to inhibit acetylcholinesterase (AChE) activity. 
The molecules were obtained by screening and further in vitro assays were performed to analyze the most potent inhibitors through the IC50 value and also to describe the interaction models between inhibitors and enzyme by molecular docking. The results showed that compound D inhibited AChE activity from different vertebrate sources and butyrylcholinesterase (BChE) from Equus ferus (EfBChE), with IC50 ranging from 1.69 ± 0.46 to 5.64 ± 2.47 µM. Compound D interacted with the peripheral anionic subsite in both enzymes, blocking substrate entrance to the active site. In contrast, compound C had higher specificity as inhibitor of EfBChE. In conclusion, the screening was effective in finding inhibitors of AChE and BChE from different organisms. Pablo Andrei Nogara, Rogério de Aquino Saraiva, Diones Caeran Bueno, Lílian Juliana Lissner, Cristiane Lenz Dalla Corte, Marcos M. Braga, Denis Broock Rosemberg, and João Batista Teixeira Rocha Copyright © 2015 Pablo Andrei Nogara et al. All rights reserved. Mammalian Cell Culture Process for Monoclonal Antibody Production: Nonlinear Modelling and Parameter Estimation Mon, 19 Jan 2015 08:06:35 +0000 Monoclonal antibodies (mAbs) are at present one of the fastest growing products of the pharmaceutical industry, with widespread applications in biochemistry, biology, and medicine. The operation of mAbs production processes is predominantly based on empirical knowledge, the improvements being achieved by using trial-and-error experiments and precedent practices. The nonlinearity of these processes and the absence of suitable instrumentation require an enhanced modelling effort and modern kinetic parameter estimation strategies. The present work is dedicated to nonlinear dynamic modelling and parameter estimation for a mammalian cell culture process used for mAb production. 
By using a dynamical model of this kind of process, an optimization-based technique for estimation of kinetic parameters in the model of mammalian cell culture process is developed. The estimation is achieved as a result of minimizing an error function by a particle swarm optimization (PSO) algorithm. The proposed estimation approach is analyzed in this work by using a particular model of mammalian cell culture, as a case study, but is generic for this class of bioprocesses. The presented case study shows that the proposed parameter estimation technique provides a more accurate simulation of the experimentally observed process behaviour than reported in previous studies. Dan Selişteanu, Dorin Șendrescu, Vlad Georgeanu, and Monica Roman Copyright © 2015 Dan Selişteanu et al. All rights reserved. Simultaneous Parameters Identifiability and Estimation of an E. coli Metabolic Network Model Tue, 06 Jan 2015 08:05:04 +0000 This work proposes a procedure for simultaneous parameters identifiability and estimation in metabolic networks in order to overcome difficulties associated with lack of experimental data and large number of parameters, a common scenario in the modeling of such systems. As a case study, the complex real problem of parameters identifiability of the Escherichia coli K-12 W3110 dynamic model was investigated, composed of 18 ordinary differential equations and 35 kinetic rates, containing 125 parameters. With the procedure, model fit was improved for most of the measured metabolites, achieving 58 parameters estimated, including 5 unknown initial conditions. The results indicate that the simultaneous parameters identifiability and estimation approach in metabolic networks is appealing, since model fit to most of the measured metabolites was possible even when important measures of intracellular metabolites and good initial estimates of parameters are not available. 
Kese Pontes Freitas Alberton, André Luís Alberton, Jimena Andrea Di Maggio, Vanina Gisela Estrada, María Soledad Díaz, and Argimiro Resende Secchi Copyright © 2015 Kese Pontes Freitas Alberton et al. All rights reserved. DNASynth: A Computer Program for Assembly of Artificial Gene Parts in Decreasing Temperature Tue, 06 Jan 2015 05:58:05 +0000 Artificial gene synthesis requires consideration of nucleotide sequence development as well as long DNA molecule assembly protocols. The nucleotide sequence of the molecule must meet many conditions including particular preferences of the host organism for certain codons, avoidance of specific regulatory subsequences, and a lack of secondary structures that inhibit expression. The chemical synthesis of DNA molecule has limitations in terms of strand length; thus, the creation of artificial genes requires the assembly of long DNA molecules from shorter fragments. In the approach presented, the algorithm and the computer program address both tasks: developing the optimal nucleotide sequence to encode a given peptide for a given host organism and determining the long DNA assembly protocol. These tasks are closely connected; a change in codon usage may lead to changes in the optimal assembly protocol, and the lack of a simple assembly protocol may be addressed by changing the nucleotide sequence. The computer program presented in this study was tested with real data from an experiment in a wet biological laboratory to synthesize a peptide. The benefit of the presented algorithm and its application is the shorter time, compared to polymerase cycling assembly, needed to produce a ready synthetic gene. Robert M. Nowak, Anna Wojtowicz-Krawiec, and Andrzej Plucienniczak Copyright © 2015 Robert M. Nowak et al. All rights reserved. 
Novel Computing Technologies for Bioinformatics and Cheminformatics Sun, 28 Dec 2014 07:06:35 +0000 Chuan Yi Tang, Che-Lun Hung, Ching-Hsien Hsu, Huiru Zheng, and Chun-Yuan Lin Copyright © 2014 Chuan Yi Tang et al. All rights reserved. Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data Sun, 28 Dec 2014 06:47:37 +0000 Julia Tzu-Ya Weng, Li-Ching Wu, Wen-Chi Chang, Tzu-Hao Chang, Tatsuya Akutsu, and Tzong-Yi Lee Copyright © 2014 Julia Tzu-Ya Weng et al. All rights reserved. Phenomics Research on Coronary Heart Disease Based on Human Phenotype Ontology Mon, 15 Dec 2014 06:53:45 +0000 The characteristics of holism, dynamics, complexity, and spatial and temporal features enable “Omics” and theories of TCM to interlink with each other. HPO, namely, “characterization,” can be understood as a sorting and generalization of the manifestations shown by people with diseases on the basis of the phenomics. Syndrome is the overall “manifestation” of human body pathological and physiological changes expressed by four diagnostic methods’ information. The four diagnostic methods’ data could be the most objective and direct manifestations of human body under morbid conditions. In this aspect, it is consistent with the connotation of “characterization.” Meanwhile, the four diagnostic methods’ data also equip us with features of characterization in HPO. In our study, we compared 107 pieces of four diagnostic methods’ information with the “characterization database” to further analyze data of four diagnostic methods’ characterization in accordance with the common characteristics of four diagnostic methods’ information and characterization and integrated 107 pieces of four diagnostic methods’ data to relevant items in HPO and finished the expansion of characterization information in HPO. Qi Shi, Kuo Gao, Huihui Zhao, Juan Wang, Xing Zhai, Peng Lu, Jianxin Chen, and Wei Wang Copyright © 2014 Qi Shi et al. All rights reserved. 
Erratum to “A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats” Mon, 24 Nov 2014 00:00:00 +0000 Shuaibin Lian, Qingyan Li, Zhiming Dai, Qian Xiang, and Xianhua Dai Copyright © 2014 Shuaibin Lian et al. All rights reserved. A Least Square Method Based Model for Identifying Protein Complexes in Protein-Protein Interaction Network Thu, 23 Oct 2014 12:45:40 +0000 Protein complex formed by a group of physical interacting proteins plays a crucial role in cell activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) network. However, the accuracy of the prediction is still far from being satisfactory, because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework to detect complexes from PPI network, named PLSMC. The method is on the basis of the fact that if two proteins are in a common complex, they are likely to be interacting. PLSMC employs this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks, and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms other methods. In particular, complexes predicted by PLSMC can match known complexes with a higher accuracy than other methods. Furthermore, the predicted complexes have high functional homogeneity. Qiguo Dai, Maozu Guo, Yingjie Guo, Xiaoyan Liu, Yang Liu, and Zhixia Teng Copyright © 2014 Qiguo Dai et al. All rights reserved. Evolution of Network Biomarkers from Early to Late Stage Bladder Cancer Samples Thu, 18 Sep 2014 06:53:32 +0000 We use a systems biology approach to construct protein-protein interaction networks (PPINs) for early and late stage bladder cancer. By comparing the networks of these two stages, we find that both networks showed very significantly different mechanisms. 
To obtain the differential network structures between cancer and noncancer PPINs, we constructed cancer PPIN and noncancer PPIN network structures for the two bladder cancer stages using microarray data from cancer cells and their adjacent noncancer cells, respectively. With their carcinogenesis relevance values (CRVs), we identified 152 and 50 significant proteins and their PPI networks (network markers) for early and late stage bladder cancer by statistical assessment. To investigate the evolution of network biomarkers in the carcinogenesis process, primary pathway analysis showed that the significant pathways of early stage bladder cancer are related to ordinary cancer mechanisms, while the ribosome pathway and spliceosome pathway are most important for late stage bladder cancer. Their only intersection is the ubiquitin mediated proteolysis pathway in the whole stage of bladder cancer. The evolution of network biomarkers from early to late stage can reveal the carcinogenesis of bladder cancer. The findings in this study are new clues specific to this study and give us a direction for targeted cancer therapy, and it should be validated in vivo or in vitro in the future. Yung-Hao Wong, Cheng-Wei Li, and Bor-Sen Chen Copyright © 2014 Yung-Hao Wong et al. All rights reserved. MicroRNA Expression Profiling Altered by Variant Dosage of Radiation Exposure Tue, 16 Sep 2014 08:57:42 +0000 Various biological effects are associated with radiation exposure. Irradiated cells may elevate the risk for genetic instability, mutation, and cancer under low levels of radiation exposure, in addition to being able to extend the postradiation side effects in normal tissues. Radiation-induced bystander effect (RIBE) is the focus of rigorous research as it may promote the development of cancer even at low radiation doses. Alterations in the DNA sequence could not explain these biological effects of radiation and it is thought that epigenetics factors may be involved. 
Indeed, some microRNAs (or miRNAs) have been found to correlate radiation-induced damages and may be potential biomarkers for the various biological effects caused by different levels of radiation exposure. However, the regulatory role that miRNA plays in this aspect remains elusive. In this study, we profiled the expression changes in miRNA under fractionated radiation exposure in human peripheral blood mononuclear cells. By utilizing publicly available microRNA knowledge bases and performing cross validations with our previous gene expression profiling under the same radiation condition, we identified various miRNA-gene interactions specific to different doses of radiation treatment, providing new insights for the molecular underpinnings of radiation injury. Kuei-Fang Lee, Yi-Cheng Chen, Paul Wei-Che Hsu, Ingrid Y. Liu, and Lawrence Shih-Hsin Wu Copyright © 2014 Kuei-Fang Lee et al. All rights reserved. WISCOD: A Statistical Web-Enabled Tool for the Identification of Significant Protein Coding Regions Mon, 15 Sep 2014 05:37:19 +0000 Classically, gene prediction programs are based on detecting signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. Although nowadays it is possible to improve their performance with additional information from related species or/and cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD, a web-enabled tool for the identification of significant protein coding regions, a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD has the capacity to detect real exons from large lists of potential exons, and it provides an easy way to use global value called expected probability of being a false exon (EPFE) that is useful for ranking potential exons in a probabilistic framework, without additional computational costs. 
The advantage of our approach is that it significantly increases the specificity and sensitivity (both between 80% and 90%) in comparison to other ab initio methods (where they are in the range of 70–75%). WISCOD is written in JAVA and R and is available to download and to run in a local mode on Linux and Windows platforms. Mireia Vilardell, Genis Parra, and Sergi Civit Copyright © 2014 Mireia Vilardell et al. All rights reserved. EXIA2: Web Server of Accurate and Rapid Protein Catalytic Residue Prediction Thu, 11 Sep 2014 10:40:30 +0000 We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. The special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. The method is shown to be the most accurate catalytic residue prediction method currently when combined with PSI-Blast sequence conservation. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets are in the range from 0.934 to 0.968. Chih-Hao Lu, Chin-Sheng Yu, Yu-Tung Chien, and Shao-Wei Huang Copyright © 2014 Chih-Hao Lu et al. All rights reserved. Computational Biophysical, Biochemical, and Evolutionary Signature of Human R-Spondin Family Proteins, the Member of Canonical Wnt/β-Catenin Signaling Pathway Mon, 08 Sep 2014 08:19:35 +0000 In human, Wnt/β-catenin signaling pathway plays a significant role in cell growth, cell development, and disease pathogenesis. Four human (Rspo)s are known to activate canonical Wnt/β-catenin signaling pathway. Presently, (Rspo)s serve as therapeutic target for several human diseases. 
Hence, a basic understanding of the molecular properties of (Rspo)s is essential. We approached this issue by interpreting the biochemical and biophysical properties along with molecular evolution of (Rspo)s through computational algorithm methods. Our analysis shows that signal peptide length is roughly similar in (Rspo)s family along with similarity in aa distribution pattern. In Rspo3, four N-glycosylation sites were noted. All members are hydrophilic in nature and showed approximately similar GRAVY values. Conversely, Rspo3 contains the maximum positively charged residues while Rspo4 includes the lowest. Four highly aligned blocks were recorded through Gblocks. Phylogenetic analysis shows that Rspo4 is rooted with Rspo2 and, similarly, Rspo3 and Rspo1 have a common point of origin. Through phylogenomics study, we developed a phylogenetic tree of sixty proteins () with the orthologs and paralogs seed sequences. Protein-protein network was also illustrated. Results demonstrated in our study may help future researchers to unfold significant physiological and therapeutic properties of (Rspo)s in various disease models. Ashish Ranjan Sharma, Chiranjib Chakraborty, Sang-Soo Lee, Garima Sharma, Jeong Kyo Yoon, C. George Priya Doss, Dong-Keun Song, and Ju-Suk Nam Copyright © 2014 Ashish Ranjan Sharma et al. All rights reserved. Gene Expression Profiling of Biological Pathway Alterations by Radiation Exposure Mon, 08 Sep 2014 00:00:00 +0000 Though damage caused by radiation has been the focus of rigorous research, the mechanisms through which radiation exerts harmful effects on cells are complex and not well-understood. In particular, the influence of low dose radiation exposure on the regulation of genes and pathways remains unclear. In an attempt to investigate the molecular alterations induced by varying doses of radiation, a genome-wide expression analysis was conducted.
Peripheral blood mononuclear cells were collected from five participants and each sample was subjected to 0.5 Gy, 1 Gy, 2.5 Gy, and 5 Gy of cobalt 60 radiation, followed by array-based expression profiling. Gene set enrichment analysis indicated that the immune system and cancer development pathways appeared to be the major affected targets by radiation exposure. Therefore, 1 Gy radioactive exposure seemed to be a critical threshold dosage. In fact, after 1 Gy radiation exposure, expression levels of several genes including FADD, TNFRSF10B, TNFRSF8, TNFRSF10A, TNFSF10, TNFSF8, CASP1, and CASP4 that are associated with carcinogenesis and metabolic disorders showed significant alterations. Our results suggest that exposure to low-dose radiation may elicit changes in metabolic and immune pathways, potentially increasing the risk of immune dysfunctions and metabolic disorders. Kuei-Fang Lee, Julia Tzu-Ya Weng, Paul Wei-Che Hsu, Yu-Hsiang Chi, Ching-Kai Chen, Ingrid Y. Liu, Yi-Cheng Chen, and Lawrence Shih-Hsin Wu Copyright © 2014 Kuei-Fang Lee et al. All rights reserved. Systematic Expression Profiling Analysis Identifies Specific MicroRNA-Gene Interactions that May Differentiate between Active and Latent Tuberculosis Infection Thu, 04 Sep 2014 00:00:00 +0000 Tuberculosis (TB) is the second most common cause of death from infectious diseases. About 90% of those infected are asymptomatic—the so-called latent TB infections (LTBI), with a 10% lifetime chance of progressing to active TB. To further understand the molecular pathogenesis of TB, several molecular studies have attempted to compare the expression profiles between healthy controls and active TB or LTBI patients. However, the results vary due to diverse genetic backgrounds and study designs and the inherent complexity of the disease process. Thus, developing a sensitive and efficient method for the detection of LTBI is both crucial and challenging. 
For the present study, we performed a systematic analysis of the gene and microRNA profiles of healthy individuals versus those affected with TB or LTBI. Combined with a series of in silico analysis utilizing publicly available microRNA knowledge bases and published literature data, we have uncovered several microRNA-gene interactions that specifically target both the blood and lungs. Some of these molecular interactions are novel and may serve as potential biomarkers of TB and LTBI, facilitating the development for a more sensitive, efficient, and cost-effective diagnostic assay for TB and LTBI for the Taiwanese population. Lawrence Shih-Hsin Wu, Shih-Wei Lee, Kai-Yao Huang, Tzong-Yi Lee, Paul Wei-Che Hsu, and Julia Tzu-Ya Weng Copyright © 2014 Lawrence Shih-Hsin Wu et al. All rights reserved. Human Umbilical Cord Mesenchymal Stem Cells Infected with Adenovirus Expressing HGF Promote Regeneration of Damaged Neuron Cells in a Parkinson’s Disease Model Wed, 03 Sep 2014 08:15:20 +0000 Parkinson’s disease (PD) is a neurodegenerative movement disorder that is characterized by the progressive degeneration of the dopaminergic (DA) pathway. Mesenchymal stem cells derived from human umbilical cord (hUC-MSCs) have great potential for developing a therapeutic agent as such. HGF is a multifunctional mediator originally identified in hepatocytes and has recently been reported to possess various neuroprotective properties. This study was designed to investigate the protective effect of hUC-MSCs infected by an adenovirus carrying the HGF gene on the PD cell model induced by MPP+ on human bone marrow neuroblastoma cells. Our results provide evidence that the cultural supernatant from hUC-MSCs expressing HGF could promote regeneration of damaged PD cells at higher efficacy than the supernatant from hUC-MSCs alone. 
In addition, intracellular free Ca2+ decreased markedly after treatment with cultural supernatant from hUC-MSCs expressing HGF, while the expression of CaBP-D28k, an intracellular calcium binding protein, increased. Therefore, our study clearly demonstrated that cultural supernatant of MSC overexpressing HGF was capable of eliciting regeneration of damaged PD model cells. This effect was probably achieved through the regulation of intracellular Ca2+ levels by modulating CaBP-D28k expression. Xin-Shan Liu, Jin-Feng Li, Shan-Shan Wang, Yu-Tong Wang, Yu-Zhen Zhang, Hong-Lei Yin, Shuang Geng, Hui-Cui Gong, Bing Han, and Yun-Liang Wang Copyright © 2014 Xin-Shan Liu et al. All rights reserved. Structural Comparison, Substrate Specificity, and Inhibitor Binding of AGPase Small Subunit from Monocot and Dicot: Present Insight and Future Potential Tue, 02 Sep 2014 11:29:57 +0000 ADP-glucose pyrophosphorylase (AGPase) is the first rate limiting enzyme of starch biosynthesis pathway and has been exploited as the target for greater starch yield in several plants. The structure-function analysis and substrate binding specificity of AGPase have provided enormous potential for understanding the role of specific amino acid or motifs responsible for allosteric regulation and catalytic mechanisms, which facilitate the engineering of AGPases. We report the three-dimensional structure, substrate, and inhibitor binding specificity of AGPase small subunit from different monocot and dicot crop plants. Both monocot and dicot subunits were found to exploit similar interactions with the substrate and inhibitor molecule as in the case of their closest homologue potato tuber AGPase small subunit.
Comparative sequence and structural analysis followed by molecular docking and electrostatic surface potential analysis reveal that rearrangements of secondary structure elements, substrate, and inhibitor binding residues are strongly conserved and follow common folding pattern and orientation within monocot and dicot displaying a similar mode of allosteric regulation and catalytic mechanism. The results from this study along with site-directed mutagenesis complemented by molecular dynamics simulation will shed more light on increasing the starch content of crop plants to ensure the food security worldwide. Kishore Sarma, Priyabrata Sen, Madhumita Barooah, Manabendra D. Choudhury, Shubhadeep Roychoudhury, and Mahendra K. Modi Copyright © 2014 Kishore Sarma et al. All rights reserved. A Review of Feature Extraction Software for Microarray Gene Expression Data Sun, 31 Aug 2014 07:10:08 +0000 When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. Ching Siang Tan, Wai Soon Ting, Mohd Saberi Mohamad, Weng Howe Chan, Safaai Deris, and Zuraini Ali Shah Copyright © 2014 Ching Siang Tan et al. All rights reserved. 
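As an illustration of the feature extraction idea summarized in the review above, the following is a minimal sketch of PCA restricted to the first principal component, computed by power iteration. It is not taken from any of the reviewed software packages; the function name `first_principal_component`, the iteration count, and the toy expression matrix are assumptions made only for this example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of expression values (one per gene).
    Hypothetical helper for illustration; real analyses would use a
    dedicated PCA implementation.
    """
    n, p = len(rows), len(rows[0])
    # Center each gene (column) at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Power iteration on X^T X, applied as X^T (X v) so the
    # p x p matrix is never formed explicitly.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(x[j] * v[j] for j in range(p)) for x in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance along the current direction
            break
        v = [c / norm for c in w]

    # Project every sample onto the extracted direction.
    scores = [sum(x[j] * v[j] for j in range(p)) for x in centered]
    return v, scores


# Toy data (made up): 4 samples x 3 genes; gene 2 is constant.
toy = [[1.0, 2.0, 0.5], [2.0, 4.0, 0.5], [3.0, 6.0, 0.5], [4.0, 8.0, 0.5]]
direction, scores = first_principal_component(toy)
```

Each sample is reduced from three values to a single score, which is the "reduced representation" the abstract refers to. A uniform starting vector is used for simplicity; it would fail in the rare case where it is exactly orthogonal to the leading component.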
The Mcm2-7 Replicative Helicase: A Promising Chemotherapeutic Target Thu, 28 Aug 2014 15:15:54 +0000 Numerous eukaryotic replication factors have served as chemotherapeutic targets. One replication factor that has largely escaped drug development is the Mcm2-7 replicative helicase. This heterohexameric complex forms the licensing system that assembles the replication machinery at origins during initiation, as well as the catalytic core of the CMG (Cdc45-Mcm2-7-GINS) helicase that unwinds DNA during elongation. Emerging evidence suggests that Mcm2-7 is also part of the replication checkpoint, a quality control system that monitors and responds to DNA damage. As the only replication factor required for both licensing and DNA unwinding, Mcm2-7 is a major cellular regulatory target with likely cancer relevance. Mutations in at least one of the six MCM genes are particularly prevalent in squamous cell carcinomas of the lung, head and neck, and prostate, and MCM mutations have been shown to cause cancer in mouse models. Moreover, various cellular regulatory proteins, including the Rb tumor suppressor family members, bind Mcm2-7 and inhibit its activity. As a preliminary step toward drug development, several small molecule inhibitors that target Mcm2-7 have been recently discovered. Both its structural complexity and essential role at the interface between DNA replication and its regulation make Mcm2-7 a potential chemotherapeutic target. Nicholas E. Simon and Anthony Schwacha Copyright © 2014 Nicholas E. Simon and Anthony Schwacha. All rights reserved. Crystal Structure of a Conserved Hypothetical Protein MJ0927 from Methanocaldococcus jannaschii Reveals a Novel Quaternary Assembly in the Nif3 Family Thu, 28 Aug 2014 15:06:43 +0000 A Nif3 family protein of Methanocaldococcus jannaschii, MJ0927, is highly conserved from bacteria to humans. Although several structures of bacterial Nif3 proteins are known, no structure representing archaeal Nif3 has yet been reported.
The crystal structure of Methanocaldococcus jannaschii MJ0927 was determined at 2.47 Å resolution to understand the structural differences between the bacterial and archaeal Nif3 proteins. Intriguingly, MJ0927 is found to adopt an unusual assembly comprising a trimer of dimers that forms a cage-like architecture. Electrophoretic mobility-shift assays indicate that MJ0927 binds to both single-stranded and double-stranded DNA. Structural analysis of MJ0927 reveals a positively charged region that can potentially explain its DNA-binding capability. Taken together, these data suggest that MJ0927 adopts a novel quaternary architecture that could play various DNA-binding roles in Methanocaldococcus jannaschii. Sheng-Chia Chen, Chi-Hung Huang, Chia Shin Yang, Shu-Min Kuan, Ching-Ting Lin, Shan-Ho Chou, and Yeh Chen Copyright © 2014 Sheng-Chia Chen et al. All rights reserved. Relationship between CCR and NT-proBNP in Chinese HF Patients, and Their Correlations with Severity of HF Thu, 28 Aug 2014 09:42:10 +0000 Aim. To evaluate the relationship between creatinine clearance rate (CCR) and the level of N-terminal pro-B-type natriuretic peptide (NT-proBNP) in heart failure (HF) patients and their correlations with HF severity. Methods and Results. Two hundred and one Chinese patients were grouped according to the New York Heart Association (NYHA) classification into NYHA 1-2 and 3-4 groups, with 135 patients without heart failure as the control group. The following variables were compared among these three groups: age, sex, body mass index (BMI), smoking status, hypertension, diabetes, NT-proBNP, creatinine (Cr), uric acid (UA), left ventricular end-diastolic diameter (LVEDD), and CCR. The biomarkers of NT-proBNP, Cr, UA, LVEDD, and CCR varied significantly in the three groups, and these variables were positively correlated with the NYHA classification. The levels of NT-proBNP and CCR were closely related to the occurrence of HF and were independent risk factors for HF.
At the same time, there was a significant negative correlation between the levels of NT-proBNP and CCR. The area under the receiver operating characteristic curve suggested that the NT-proBNP and CCR have high accuracy for diagnosis of HF and have clinical diagnostic value. Conclusion. NT-proBNP and CCR may be important biomarkers in evaluating the severity of HF. Zhigang Lu, Bo Wang, Yunliang Wang, Xueqing Qian, Wei Zheng, and Meng Wei Copyright © 2014 Zhigang Lu et al. All rights reserved. Establishing Standards for Studying Renal Function in Mice through Measurements of Body Size-Adjusted Creatinine and Urea Levels Wed, 27 Aug 2014 12:35:10 +0000 Strategies for obtaining reliable results are increasingly implemented in order to reduce errors in the analysis of human and veterinary samples; however, further data are required for murine samples. Here, we determined an average factor from the murine body surface area for the calculation of biochemical renal parameters, assessed the effects of storage and freeze-thawing of C57BL/6 mouse samples on plasmatic and urinary urea, and evaluated the effects of using two different urea-measurement techniques. After obtaining 24 h urine samples, blood was collected, and body weight and length were established. The samples were evaluated after collection or stored at −20°C and −70°C. At different time points (0, 4, and 90 days), these samples were thawed, the creatinine and/or urea concentrations were analyzed, and samples were restored at these temperatures for further measurements. We show that creatinine clearance measurements should be adjusted according to the body surface area, which was calculated based on the weight and length of the animal. Repeated freeze-thawing cycles negatively affected the urea concentration; the urea concentration was more reproducible when using the modified Berthelot reaction rather than the ultraviolet method. 
Our findings will facilitate standardization and optimization of methodology as well as understanding of renal and other biochemical data obtained from mice. Wellington Francisco Rodrigues, Camila Botelho Miguel, Marcelo Henrique Napimoga, Carlo Jose Freire Oliveira, and Javier Emilio Lazo-Chica Copyright © 2014 Wellington Francisco Rodrigues et al. All rights reserved. Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection Wed, 27 Aug 2014 12:02:00 +0000 Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of cancer genome that leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches are proposed which use supervised machine learning techniques for prediction with features obtained by some databases. However, often we do not know which features are important for driver mutations prediction. In this study, we propose a novel feature selection method (called DX) from 126 candidate features’ set. In order to obtain the best performance, rotation forest algorithm was adopted to perform the experiment. On the train dataset which was collected from COSMIC and Swiss-Prot databases, we are able to obtain high prediction performance with 88.03% accuracy, 93.9% precision, and 81.35% recall when the 11 top-ranked features were used. Comparison with other various techniques in the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method. Xiuquan Du and Jiaxing Cheng Copyright © 2014 Xiuquan Du and Jiaxing Cheng. All rights reserved. Crystal Structure of Deinococcus radiodurans RecQ Helicase Catalytic Core Domain: The Interdomain Flexibility Wed, 27 Aug 2014 08:21:26 +0000 RecQ DNA helicases are key enzymes in the maintenance of genome integrity, and they have functions in DNA replication, recombination, and repair. 
In contrast to most RecQs, RecQ from Deinococcus radiodurans (DrRecQ) possesses an unusual domain architecture that is crucial for its remarkable ability to repair DNA. Here, we determined the crystal structures of the DrRecQ helicase catalytic core and its ADP-bound form, revealing interdomain flexibility in its first RecA-like and winged-helix (WH) domains. Additionally, the WH domain of DrRecQ is positioned in a different orientation from that of the E. coli RecQ (EcRecQ). These results suggest that the orientation of the protein during DNA-binding is significantly different when comparing DrRecQ and EcRecQ. Sheng-Chia Chen, Chi-Hung Huang, Chia Shin Yang, Tzong-Der Way, Ming-Chung Chang, and Yeh Chen Copyright © 2014 Sheng-Chia Chen et al. All rights reserved. Characterization of Putative cis-Regulatory Elements in Genes Preferentially Expressed in Arabidopsis Male Meiocytes Wed, 27 Aug 2014 08:05:05 +0000 Meiosis is essential for plant reproduction because it is the process during which homologous chromosome pairing, synapsis, and meiotic recombination occur. The meiotic transcriptome is difficult to investigate because of the size of meiocytes and the confines of anther lobes. The recent development of isolation techniques has enabled the characterization of transcriptional profiles in male meiocytes of Arabidopsis. Gene expression in male meiocytes shows unique features. The direct interaction of transcription factors (TFs) with DNA regulatory sequences forms the basis for the specificity of transcriptional regulation. Here, we identified putative cis-regulatory elements (CREs) associated with male meiocyte-expressed genes using in silico tools. The upstream regions (1 kb) of the top 50 genes preferentially expressed in Arabidopsis meiocytes possessed conserved motifs. These motifs are putative binding sites of TFs, some of which share common functions, such as roles in cell division. 
In combination with cell-type-specific analysis, our findings could be a substantial aid for the identification and experimental verification of the protein-DNA interactions for the specific TFs that drive gene expression in meiocytes. Junhua Li, Jinhong Yuan, and Mingjun Li Copyright © 2014 Junhua Li et al. All rights reserved. Function Formula Oriented Construction of Bayesian Inference Nets for Diagnosis of Cardiovascular Disease Wed, 27 Aug 2014 06:47:48 +0000 An intelligent cardiovascular disease (CVD) diagnosis system using hemodynamic parameters (HDPs) derived from sphygmogram (SPG) signal is presented to support the emerging patient-centric healthcare models. To replicate clinical approach of diagnosis through a staged decision process, the Bayesian inference nets (BIN) are adapted. New approaches to construct a hierarchical multistage BIN using defined function formulas and a method employing fuzzy logic (FL) technology to quantify inference nodes with dynamic values of statistical parameters are proposed. The suggested methodology is validated by constructing hierarchical Bayesian fuzzy inference nets (HBFIN) to diagnose various heart pathologies from the deduced HDPs. The preliminary diagnostic results show that the proposed methodology has salient validity and effectiveness in the diagnosis of cardiovascular disease. Booma Devi Sekar and Mingchui Dong Copyright © 2014 Booma Devi Sekar and Mingchui Dong. All rights reserved. High-Throughput Functional Screening of Steroid Substrates with Wild-Type and Chimeric P450 Enzymes Tue, 26 Aug 2014 10:40:59 +0000 The promiscuity of a collection of enzymes consisting of 31 wild-type and synthetic variants of CYP1A enzymes was evaluated using a series of 14 steroids and 2 steroid-like chemicals, namely, nootkatone, a terpenoid, and mifepristone, a drug. For each enzyme-substrate couple, the initial steady-state velocity of metabolite formation was determined at a substrate saturating concentration. 
To this end, a high-throughput approach was designed involving automated incubations in 96-well microplates with sixteen 6-point kinetics per microplate and data acquisition using an LC/MS system accepting 96-well microplates for injections. The resulting dataset was used for multivariate statistics aimed at sorting out the correlations existing between tested enzyme variants and ability to metabolize steroid substrates. Functional classifications of both CYP1A enzyme variants and steroid substrate structures were obtained allowing the delineation of global structural features for both substrate recognition and regioselectivity of oxidation. Philippe Urban, Gilles Truan, and Denis Pompon Copyright © 2014 Philippe Urban et al. All rights reserved. Large-Scale Protein-Protein Interactions Detection by Integrating Big Biosensing Data with Computational Model Mon, 18 Aug 2014 10:52:22 +0000 Protein-protein interactions are the basis of biological functions, and studying these interactions on a molecular level is of crucial importance for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, the high-throughput experimental methods for identifying PPIs are both time-consuming and expensive. Moreover, high-throughput PPI data are often associated with high false-positive and high false-negative rates. To address these problems, we propose a method for PPI detection by integrating biosensor-based PPI data with a novel computational model. This method was developed based on the algorithm of extreme learning machine combined with a novel representation of protein sequence descriptor. When applied to the large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy with 84.08% sensitivity at the specificity of 85.53%.
We conducted more extensive experiments to compare the proposed method with a state-of-the-art technique, the support vector machine. The achieved results demonstrate that our approach is very promising for detecting new PPIs, and it can be a helpful supplement for biosensor-based PPI data detection. Zhu-Hong You, Shuai Li, Xin Gao, Xin Luo, and Zhen Ji Copyright © 2014 Zhu-Hong You et al. All rights reserved. Drug Repositioning Discovery for Early- and Late-Stage Non-Small-Cell Lung Cancer Mon, 18 Aug 2014 07:02:32 +0000 Drug repositioning is a popular approach in the pharmaceutical industry for identifying potential new uses for existing drugs and shortening the development time. Non-small-cell lung cancer (NSCLC) is one of the leading causes of death worldwide. To reduce the biological heterogeneity effects among different individuals, both normal and cancer tissues were taken from the same patient, hence allowing pairwise testing. By comparing early- and late-stage cancer patients, we can identify stage-specific NSCLC genes. Differentially expressed genes are clustered separately to form up- and downregulated communities that are used as queries to perform enrichment analysis. The results suggest that pathways for early- and late-stage cancers are different. Sets of up- and downregulated genes were submitted to the cMap web resource to identify potential drugs. To achieve high confidence drug prediction, multiple microarray experimental results were merged by performing meta-analysis. The results of a few drug findings are supported by MTT assay or clonogenic assay data. In conclusion, we have been able to assess the potential of existing drugs as novel anticancer drugs, which may be helpful in drug repositioning discovery for NSCLC. Chien-Hung Huang, Peter Mu-Hsin Chang, Yong-Jie Lin, Cheng-Hsu Wang, Chi-Ying F. Huang, and Ka-Lok Ng Copyright © 2014 Chien-Hung Huang et al. All rights reserved.
Systematic Analysis of the Association between Gut Flora and Obesity through High-Throughput Sequencing and Bioinformatics Approaches Thu, 14 Aug 2014 12:10:54 +0000 Eighty-one stool samples from Taiwanese were collected for analysis of the association between the gut flora and obesity. The supervised analysis showed that the most abundant genera of bacteria in normal samples (from people with a body mass index (BMI) 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant genera of bacteria in case samples (with a BMI 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples were clustered more compactly than case samples. An unsupervised analysis demonstrated that bacterial communities in the gut were clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher’s ). The results showed that bacterial communities in the gut were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity. Chih-Min Chiu, Wei-Chih Huang, Shun-Long Weng, Han-Chi Tseng, Chao Liang, Wei-Chi Wang, Ting Yang, Tzu-Ling Yang, Chen-Tsung Weng, Tzu-Hao Chang, and Hsien-Da Huang Copyright © 2014 Chih-Min Chiu et al. All rights reserved. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products Tue, 12 Aug 2014 10:16:15 +0000 Background. 
During the analysis of genomics data, it is often required to quantify the functional similarity of genes and their products based on the annotation information from gene ontology (GO) with hierarchical structure. A flexible and user-friendly way to estimate the functional similarity of genes utilizing GO annotation is therefore highly desired. Results. We proposed a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products based on multiple ontologies of hierarchical GO annotations. The performance of our algorithm was evaluated and found to be superior to the other tested methods. We implemented the proposed algorithm in a software package, FSim, based on statistical and computing environment. It can be used to discover functionally related genes for a given gene, group of genes, or set of function terms. Conclusions. FSim is a flexible tool to analyze functional gene groups based on the GO annotation databases. Qiang Hu, ZhiGang Wang, and ZhengGuo Zhang Copyright © 2014 Qiang Hu et al. All rights reserved. Prediction of S-Nitrosylation Modification Sites Based on Kernel Sparse Representation Classification and mRMR Algorithm Tue, 12 Aug 2014 00:00:00 +0000 Protein S-nitrosylation plays a very important role in a wide variety of cellular biological activities. To date, accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we presented a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and minimum Redundancy Maximum Relevance algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for numerical representation of proteins. A total of 529 protein sequences collected from the open-access databases and published literature are used to train and test our predictor. 
Computational results show that our predictor achieves Matthews’ correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of k-nearest neighbor algorithm, random forest algorithm, and sparse representation classification algorithm. The experimental results also indicate that 134 optimal features can better represent the peptides of protein S-nitrosylation than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. Experimental results showed that our predictor also yielded good performance on the independent testing set, with a Matthews’ correlation coefficient of 0.2239. Guohua Huang, Lin Lu, Kaiyan Feng, Jun Zhao, Yuchao Zhang, Yaochen Xu, Ning Zhang, Bi-Qing Li, Weiping Huang, and Yu-Dong Cai Copyright © 2014 Guohua Huang et al. All rights reserved. Novel Approach for Coexpression Analysis of E2F1–3 and MYC Target Genes in Chronic Myelogenous Leukemia Sun, 10 Aug 2014 08:29:13 +0000 Background. Chronic myelogenous leukemia (CML) is characterized by a tremendous amount of immature myeloid cells in the blood circulation. E2F1–3 and MYC are important transcription factors that form positive feedback loops by reciprocal regulation in their own transcription processes. Since genes regulated by E2F1–3 or MYC are related to cell proliferation and apoptosis, we asked whether there is a difference in the coexpression patterns of genes regulated concurrently by E2F1–3 and MYC between the normal and the CML states. Results. We proposed a method to explore the difference in the coexpression patterns of those candidate target genes between the normal and the CML groups. A disease-specific cutoff point for coexpression levels that classified the coexpressed gene pairs into strong and weak coexpression classes was identified. 
Our developed method effectively identified the coexpression pattern differences from the overall structure. Moreover, we found that genes related to the cell adhesion and angiogenesis properties were more likely to be coexpressed in the normal group when compared to the CML group. Conclusion. Our findings may be helpful in exploring the underlying mechanisms of CML and provide useful information in cancer treatment. Fengfeng Wang, Lawrence W. C. Chan, William C. S. Cho, Petrus Tang, Jun Yu, Chi-Ren Shyu, Nancy B. Y. Tsui, S. C. Cesar Wong, Parco M. Siu, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2014 Fengfeng Wang et al. All rights reserved. A Genome-Wide Identification of Genes Undergoing Recombination and Positive Selection in Neisseria Sun, 10 Aug 2014 08:23:34 +0000 Currently, there is particular interest in the molecular mechanisms of adaptive evolution in bacteria. Neisseria is a genus of Gram-negative bacteria, and there has recently been considerable focus on its two human pathogenic species N. meningitidis and N. gonorrhoeae. Until now, no genome-wide studies have attempted to scan for the genes related to adaptive evolution. For this reason, we selected 18 Neisseria genomes (14 N. meningitidis, 3 N. gonorrhoeae, and 1 commensal N. lactamica) to conduct a comparative genome analysis to obtain a comprehensive understanding of the roles of natural selection and homologous recombination throughout the history of adaptive evolution. Among the 1012 core orthologous genes, we identified 635 genes with recombination signals and 10 genes that showed significant evidence of positive selection. Further functional analyses revealed that no functional bias was found in the recombined genes. Positively selected genes are prone to DNA processing and iron uptake, which are essential for the fundamental life cycle. Overall, the results indicate that both recombination and positive selection play crucial roles in the adaptive evolution of Neisseria genomes. 
The positively selected genes and the corresponding amino acid sites provide us with valuable targets for further research into the detailed mechanisms of adaptive evolution in Neisseria. Dong Yu, Yuan Jin, Zhiqiu Yin, Hongguang Ren, Wei Zhou, Long Liang, and Junjie Yue Copyright © 2014 Dong Yu et al. All rights reserved. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration Wed, 06 Aug 2014 08:37:56 +0000 Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes. Jian Zhang, ZhiHao Xing, Mingming Ma, Ning Wang, Yu-Dong Cai, Lei Chen, and Xun Xu Copyright © 2014 Jian Zhang et al. All rights reserved. C-Terminal Domain Swapping of SSB Changes the Size of the ssDNA Binding Site Mon, 04 Aug 2014 06:33:19 +0000 Single-stranded DNA-binding protein (SSB) plays an important role in DNA metabolism, including DNA replication, repair, and recombination, and is therefore essential for cell survival. 
Bacterial SSB consists of an N-terminal ssDNA-binding/oligomerization domain and a flexible C-terminal protein-protein interaction domain. We characterized the ssDNA-binding properties of Klebsiella pneumoniae SSB (KpSSB), Salmonella enterica Serovar Typhimurium LT2 SSB (StSSB), Pseudomonas aeruginosa PAO1 SSB (PaSSB), and two chimeric KpSSB proteins, namely, KpSSBnStSSBc and KpSSBnPaSSBc. The C-terminal domain of StSSB or PaSSB was exchanged with that of KpSSB through protein chimeragenesis. By using the electrophoretic mobility shift assay, we characterized the stoichiometry of KpSSB, StSSB, PaSSB, KpSSBnStSSBc, and KpSSBnPaSSBc, complexed with a series of ssDNA homopolymers. The binding site sizes were determined to be , , , , and nucleotides (nt), respectively. Comparison of the binding site sizes of KpSSB, KpSSBnStSSBc, and KpSSBnPaSSBc showed that the C-terminal domain swapping of SSB changes the size of the binding site. Our observations suggest that not only the conserved N-terminal domain but also the C-terminal domain of SSB is an important determinant for ssDNA binding. Yen-Hua Huang and Cheng-Yang Huang Copyright © 2014 Yen-Hua Huang and Cheng-Yang Huang. All rights reserved. The Effects of the Context-Dependent Codon Usage Bias on the Structure of the nsp1α of Porcine Reproductive and Respiratory Syndrome Virus Sun, 03 Aug 2014 07:47:26 +0000 The information about the crystal structure of porcine reproductive and respiratory syndrome virus (PRRSV) leader protease nsp1α is available to analyze the roles of tRNA abundance of pigs and codon usage of the nsp1α gene in the formation of this protease. The effects of tRNA abundance of the pigs and the synonymous codon usage and the context-dependent codon bias (CDCB) of the nsp1α on shaping the specific folding units (α-helix, β-strand, and the coil) in the nsp1α were analyzed based on the structural information about this protease from protein data bank (PDB: 3IFU) and the nsp1α of the 191 PRRSV strains. 
By mapping the overall tRNA abundance along the nsp1α, we found that there is no link between the fluctuation of the overall tRNA abundance and the specific folding units in the nsp1α, and the low translation speed of ribosome caused by the tRNA abundance exists in the nsp1α. The strong correlation between some synonymous codon usage and the specific folding units in the nsp1α was found, and the phenomenon of CDCB exists in the specific folding units of the nsp1α. These findings provide an insight into the roles of the synonymous codon usage and CDCB in the formation of PRRSV nsp1α structure. Yao-zhong Ding, Ya-nan You, Dong-jie Sun, Hao-tai Chen, Yong-lu Wang, Hui-yun Chang, Li Pan, Yu-zhen Fang, Zhong-wang Zhang, Peng Zhou, Jian-liang Lv, Xin-sheng Liu, Jun-jun Shao, Fu-rong Zhao, Tong Lin, Laszlo Stipkovits, Zygmunt Pejsak, Yong-guang Zhang, and Jie Zhang Copyright © 2014 Yao-zhong Ding et al. All rights reserved. Detecting Epistatic Interactions in Metagenome-Wide Association Studies by metaBOOST Thu, 24 Jul 2014 18:41:12 +0000 Material and Methods. We recall the definition of epistasis and extend it for metagenomic biomarkers and then we describe the overview of our method metaBOOST and provide detailed information about each step of metaBOOST. Results. We describe the data sources for both simulation studies and real metagenomic datasets. Then, we describe the procedure of simulation studies and provide results for it. After that, we conduct real datasets studies and report the results. Conclusions and Discussion. Finally, we conclude our method and discuss some possible improvements for the future. Mengmeng Wu and Rui Jiang Copyright © 2014 Mengmeng Wu and Rui Jiang. All rights reserved. The N-Terminal Domain of Human DNA Helicase Rtel1 Contains a Redox Active Iron-Sulfur Cluster Thu, 24 Jul 2014 09:20:31 +0000 Human telomere length regulator Rtel1 is a superfamily II DNA helicase and is essential for maintaining proper length of telomeres in chromosomes. 
Here we report that the N-terminal domain of human Rtel1 (RtelN) expressed in Escherichia coli cells produces a protein that contains a redox active iron-sulfur cluster with the redox midpoint potential of −248 ± 10 mV (pH 8.0). The iron-sulfur cluster in RtelN is sensitive to hydrogen peroxide and nitric oxide, indicating that reactive oxygen/nitrogen species may modulate the DNA helicase activity of Rtel1 via modification of its iron-sulfur cluster. Purified RtelN retains a weak binding affinity for the single-stranded (ss) and double-stranded (ds) DNA in vitro. However, modification of the iron-sulfur cluster by hydrogen peroxide or nitric oxide does not significantly affect the DNA binding activity of RtelN, suggesting that the iron-sulfur cluster is not directly involved in the DNA interaction in the N-terminal domain of Rtel1. Aaron P. Landry and Huangen Ding Copyright © 2014 Aaron P. Landry and Huangen Ding. All rights reserved. Security Mechanism Based on Hospital Authentication Server for Secure Application of Implantable Medical Devices Thu, 24 Jul 2014 07:55:14 +0000 After two recent security attacks against implantable medical devices (IMDs) were reported, the privacy and security risks of IMDs have been widely recognized in the medical device market and research community, since the malfunctioning of IMDs might endanger the patient’s life. During the last few years, much research has been carried out to address the security-related issues of IMDs, including privacy, safety, and accessibility issues. A physician accesses an IMD through an external device called a programmer, for diagnosis and treatment. Hence, cryptographic key management between the IMD and the programmer is important to enforce a strict access control. In this paper, a new security architecture for the security of IMDs is proposed, based on a 3-Tier security model, where the programmer interacts with a Hospital Authentication Server, to get permissions to access IMDs. 
The proposed security architecture greatly simplifies the key management between IMDs and programmers. Also proposed is a security mechanism to guarantee the authenticity of the patient data collected from IMD and the nonrepudiation of the physician’s treatment based on it. The proposed architecture and mechanism are analyzed and compared with several previous works, in terms of security and performance. Chang-Seop Park Copyright © 2014 Chang-Seop Park. All rights reserved. An Intelligent System for Identifying Acetylated Lysine on Histones and Nonhistone Proteins Thu, 24 Jul 2014 00:00:00 +0000 Lysine acetylation is an important and ubiquitous posttranslational modification conserved in prokaryotes and eukaryotes. This process, which is dynamically and temporally regulated by histone acetyltransferases and deacetylases, is crucial for numerous essential biological processes such as transcriptional regulation, cellular signaling, and stress response. Since the experimental identification of lysine acetylation sites within proteins is time-consuming and laboratory-intensive, several computational approaches have been developed to identify candidates for experimental validation. In this work, acetylated protein data collected from UniProtKB were categorized into histone or nonhistone proteins. Support vector machines (SVMs) were applied to build predictive models by using amino acid pair composition (AAPC) as a feature in a histone model. We combined BLOSUM62 and AAPC features in a nonhistone model. Furthermore, using maximal dependence decomposition (MDD) clustering can enhance the performance of the model on a fivefold cross-validation evaluation to yield a sensitivity of 0.863, specificity of 0.885, accuracy of 0.880, and MCC of 0.706. Additionally, the proposed method is evaluated using independent test sets resulting in a predictive accuracy of 74%. This indicates that the performance of our method is comparable with that of other acetylation prediction methods. 
Cheng-Tsung Lu, Tzong-Yi Lee, Yu-Ju Chen, and Yi-Ju Chen Copyright © 2014 Cheng-Tsung Lu et al. All rights reserved. Studying the Complex Expression Dependences between Sets of Coexpressed Genes Thu, 24 Jul 2014 00:00:00 +0000 Organisms simplify the orchestration of gene expression by coregulating genes whose products function together in the cell. The use of clustering methods to obtain sets of coexpressed genes from expression arrays is very common; nevertheless there are no appropriate tools to study the expression networks among these sets of coexpressed genes. The aim of the developed tools is to allow studying the complex expression dependences that exist between sets of coexpressed genes. For this purpose, we start detecting the nonlinear expression relationships between pairs of genes, plus the coexpressed genes. Next, we form networks among sets of coexpressed genes that maintain nonlinear expression dependences between all of them. The expression relationship between the sets of coexpressed genes is defined by the expression relationship between the skeletons of these sets, where this skeleton represents the coexpressed genes with a well-defined nonlinear expression relationship with the skeleton of the other sets. As a result, we can study the nonlinear expression relationships between a target gene and other sets of coexpressed genes, or start the study from the skeleton of the sets, to study the complex relationships of activation and deactivation between the sets of coexpressed genes that carry out the different cellular processes present in the expression experiments. Mario Huerta, Oriol Casanova, Roberto Barchino, Jose Flores, Enrique Querol, and Juan Cedano Copyright © 2014 Mario Huerta et al. All rights reserved. 
An Efficient Parallel Algorithm for Multiple Sequence Similarities Calculation Using a Low Complexity Method Tue, 22 Jul 2014 09:07:46 +0000 With the advance of genomic research, the number of sequences involved in comparative methods has grown immensely. Among them, there are methods for similarities calculation, which are used by many bioinformatics applications. Due to the huge amount of data, the union of low complexity methods with the use of parallel computing is becoming desirable. The k-mers counting is a very efficient method with good biological results. In this work, the development of a parallel algorithm for multiple sequence similarities calculation using the k-mers counting method is proposed. Tests show that the algorithm presents a very good scalability and a nearly linear speedup. With 14 nodes, a 12x speedup was obtained. This algorithm can be used in the parallelization of some multiple sequence alignment tools, such as MAFFT and MUSCLE. Evandro A. Marucci, Geraldo F. D. Zafalon, Julio C. Momente, Leandro A. Neves, Carlo R. Valêncio, Alex R. Pinto, Adriano M. Cansian, Rogeria C. G. de Souza, Yang Shiyou, and José M. Machado Copyright © 2014 Evandro A. Marucci et al. All rights reserved. Cell Type-Dependent RNA Recombination Frequency in the Japanese Encephalitis Virus Tue, 22 Jul 2014 00:00:00 +0000 Japanese encephalitis virus (JEV) is one of approximately 70 flaviviruses, frequently causing symptoms involving the central nervous system. Mutations of its genomic RNA frequently occur during viral replication, which is believed to be a force contributing to viral evolution. Nevertheless, accumulating evidence shows that some JEV strains may have actually arisen from RNA recombination between genetically different populations of the virus. We have demonstrated that RNA recombination in JEV occurs unequally in different cell types. 
In the present study, viral RNA fragments transfected into as well as viral RNAs synthesized in mosquito cells were shown not to be stable, especially in the early phase of infection possibly via cleavage by exoribonuclease. Such cleaved small RNA fragments may be further degraded through an RNA interference pathway triggered by viral double-stranded RNA during replication in mosquito cells, resulting in a lower frequency of RNA recombination in mosquito cells compared to that which occurs in mammalian cells. In fact, adjustment of viral RNA to an appropriately lower level in mosquito cells prevents overgrowth of the virus and is beneficial for cells to survive the infection. Our findings may also account for the slower evolution of arboviruses as reported previously. Wei-Wei Chiang, Ching-Kai Chuang, Mei Chao, and Wei-June Chen Copyright © 2014 Wei-Wei Chiang et al. All rights reserved. Structural Insight into the DNA-Binding Mode of the Primosomal Proteins PriA, PriB, and DnaT Mon, 21 Jul 2014 08:30:20 +0000 Replication restart primosome is a complex dynamic system that is essential for bacterial survival. This system uses various proteins to reinitiate chromosomal DNA replication to maintain genetic integrity after DNA damage. The replication restart primosome in Escherichia coli is composed of PriA helicase, PriB, PriC, DnaT, DnaC, DnaB helicase, and DnaG primase. The assembly of the protein complexes within the forked DNA responsible for reloading the replicative DnaB helicase anywhere on the chromosome for genome duplication requires the coordination of transient biomolecular interactions. Over the last decade, investigations on the structure and mechanism of these nucleoproteins have provided considerable insight into primosome assembly. In this review, we summarize and discuss our current knowledge and recent advances on the DNA-binding mode of the primosomal proteins PriA, PriB, and DnaT. 
Yen-Hua Huang and Cheng-Yang Huang Copyright © 2014 Yen-Hua Huang and Cheng-Yang Huang. All rights reserved. Mass Spectrometry Based Proteomic Analysis of Salivary Glands of Urban Malaria Vector Anopheles stephensi Mon, 14 Jul 2014 11:31:37 +0000 Salivary gland proteins of Anopheles mosquitoes offer attractive targets to understand interactions with sporozoites, blood feeding behavior, homeostasis, and immunological evaluation of malaria vectors and parasite interactions. To date limited studies have been carried out to elucidate salivary proteins of An. stephensi salivary glands. The aim of the present study was to provide detailed analytical attributives of functional salivary gland proteins of urban malaria vector An. stephensi. A proteomic approach combining one-dimensional electrophoresis (1DE), ion trap liquid chromatography mass spectrometry (LC/MS/MS), and computational bioinformatic analysis was adopted to provide the first direct insight into identification and functional characterization of known salivary proteins and novel salivary proteins of An. stephensi. Computational studies by online servers, namely, MASCOT and OMSSA algorithms, identified a total of 36 known salivary proteins and 123 novel proteins analysed by LC/MS/MS. This first report describes a baseline proteomic catalogue of 159 salivary proteins belonging to various categories of signal transduction, regulation of blood coagulation cascade, and various immune and energy pathways of An. stephensi sialotranscriptome by mass spectrometry. Our results may serve as basis to provide a putative functional role of proteins in concept of blood feeding, biting behavior, and other aspects of vector-parasite host interactions for parasite development in anopheline mosquitoes. Sonam Vijay, Manmeet Rawat, and Arun Sharma Copyright © 2014 Sonam Vijay et al. All rights reserved. 
PPI Network Analysis of mRNA Expression Profile of Ezrin Knockdown in Esophageal Squamous Cell Carcinoma Mon, 14 Jul 2014 08:56:44 +0000 Ezrin, which encodes the protein EZR that cross-links actin filaments, is overexpressed and is involved in invasion, metastasis, and poor prognosis in various cancers including esophageal squamous cell carcinoma (ESCC). In our previous study, Ezrin was knocked down and analyzed by mRNA expression profiling, but the resulting profile has not been fully mined. In this study, we applied protein-protein interactions (PPI) network knowledge and methods to expand our understanding of these differentially expressed genes (DEGs). PPI subnetworks showed that hundreds of DEGs interact with thousands of other proteins. Subcellular localization analyses found that the DEGs and their directly or indirectly interacting proteins distribute in multiple layers, which was applied to analyze the shortest paths between EZR and other DEGs. Gene ontology annotation generated a functional annotation map and found hundreds of significant terms, especially those associated with cytoskeleton organization of Ezrin protein, such as “cytoskeleton organization,” “regulation of actin filament-based process,” and “regulation of actin cytoskeleton organization.” The algorithm of Random Walk with Restart was applied to prioritize the DEGs and identified several cancer related DEGs ranked closest to EZR. These analyses based on PPI network have greatly expanded our comprehension of the mRNA expression profile of Ezrin knockdown for future examination of the roles and mechanisms of Ezrin. Bingli Wu, Jianjun Xie, Zepeng Du, Jianyi Wu, Pixian Zhang, Liyan Xu, and Enmin Li Copyright © 2014 Bingli Wu et al. All rights reserved. 
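Illustrative note on the preceding entry: the abstract mentions Random Walk with Restart (RWR) for ranking DEGs by proximity to EZR but gives no implementation details. The following is only a minimal sketch of the generic RWR iteration on an unweighted, undirected network; it is not the authors' code. The toy network, the gene labels G1 to G3, the restart probability of 0.7, and the function name are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> list of neighbour nodes (hypothetical toy input).
    seeds: set of nodes the walker jumps back to with probability `restart`.
    """
    nodes = sorted(adjacency)
    # Restart vector: probability mass spread evenly over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart step: a fraction `restart` of the mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk step: the remaining mass moves uniformly to neighbours.
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: its walking mass stays where it is.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: EZR is the seed, G1..G3 stand in for DEGs.
toy_network = {
    "EZR": ["G1", "G2"],
    "G1": ["EZR", "G3"],
    "G2": ["EZR"],
    "G3": ["G1"],
}
scores = random_walk_with_restart(toy_network, seeds={"EZR"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, nodes with a higher steady-state probability are treated as closer to the seed, which is the sense in which an RWR ranking orders genes relative to EZR. A real analysis would use the full PPI network and possibly edge weights, neither of which is specified in the abstract.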
Identifying the Gene Signatures from Gene-Pathway Bipartite Network Guarantees the Robust Model Performance on Predicting the Cancer Prognosis Mon, 14 Jul 2014 08:20:49 +0000 For the purpose of improving the prediction of cancer prognosis in clinical research, various algorithms have been developed to construct the predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs) generated by the statistical methods or the machine learning algorithms often involves a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions, and subsequently impacts the reliability of the predictive models. In this study, we proposed a strategy, which combined the statistical algorithm with the gene-pathway bipartite networks, to generate the reliable lists of cancer-related DEGs and constructed the models by using support vector machine for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloma leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate the reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes. Li He, Yuelong Wang, Yongning Yang, Liqiu Huang, and Zhining Wen Copyright © 2014 Li He et al. All rights reserved. The Definition of a Prolonged Intensive Care Unit Stay for Spontaneous Intracerebral Hemorrhage Patients: An Application with National Health Insurance Research Database Mon, 14 Jul 2014 08:11:53 +0000 Introduction. 
Length of stay (LOS) in the intensive care unit (ICU) of spontaneous intracerebral hemorrhage (sICH) patients is one of the most important issues. The disease severity, psychosocial factors, and institutional factors will influence the length of ICU stay. This study used the Taiwan National Health Insurance Research Database (NHIRD) to define the threshold of a prolonged ICU stay in sICH patients. Methods. This research collected the demographic data of sICH patients in the NHIRD from 2005 to 2009. The threshold of prolonged ICU stay was calculated using change point analysis. Results. There were 1599 sICH patients included. A prolonged ICU stay was defined as being equal to or longer than 10 days. There were 436 prolonged ICU stay cases and 1163 nonprolonged cases. Conclusion. This study showed that the threshold of a prolonged ICU stay is a good indicator of hospital utilization in ICH patients. Different hospitals have their own different care strategies that can be identified with a prolonged ICU stay. This indicator can be improved using quality control methods such as complications prevention and efficiency of ICU bed management. Patients’ stay in ICUs and in hospitals will be shorter if integrated care systems are established. Chien-Lung Chan, Hsien-Wei Ting, and Hsin-Tsung Huang Copyright © 2014 Chien-Lung Chan et al. All rights reserved. Incorporating Amino Acids Composition and Functional Domains for Identifying Bacterial Toxin Proteins Mon, 07 Jul 2014 08:55:16 +0000 Aside from pathogenesis, bacterial toxins also have been used for medical purposes such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has great impact on the cell biology study and therapy development. However, experimental methods for bacterial toxins identification are time-consuming and labor-intensive, implying an urgent need for computational prediction. 
Thus, we are motivated to develop a method for computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins including 77 exotoxins and 90 endotoxins is adopted to learn the predictive model by using support vector machines (SVMs). The cross-validation evaluation shows that the SVM models trained with amino acids and dipeptides composition could yield an accuracy of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acids and dipeptides composition have achieved an accuracy of 95.71% and 92.86%, respectively. After incorporating functional domain information, the predictive performance is further improved. The proposed method has been demonstrated to be able to more effectively identify and classify bacterial toxins than the other two features on an independent dataset, which may aid in bacterial biomedical development. Min-Gang Su, Chien-Hsun Huang, Tzong-Yi Lee, Yu-Ju Chen, and Hsin-Yi Wu Copyright © 2014 Min-Gang Su et al. All rights reserved. Risk Factors for Mortality in Patients with Septic Acute Kidney Injury in Intensive Care Units in Beijing, China: A Multicenter Prospective Observational Study Mon, 07 Jul 2014 06:34:39 +0000 Objective. To discover risk factors for mortality of patients with septic AKI in ICU via a multicenter study. Background. Septic AKI is a serious threat to patients in ICU, but there are few clinical studies focusing on this. Methods. This was a prospective, observational, and multicenter study conducted in 30 ICUs of 28 major hospitals in Beijing. 3,107 patients were admitted consecutively, of whom 361 had septic AKI. Patient clinical data were recorded daily for 10 days after admission. Kidney Disease: Improving Global Outcomes (KDIGO) criteria were used to define and stage AKI. Of the involved patients, 201 survived and 160 died. Results. 
The rate of septic AKI was 11.6%. Twenty-one risk factors were found, and six independent risk factors were identified: age, APACHE II score, duration of mechanical ventilation, duration of MAP <65 mmHg, time until RRT started, and progressive KDIGO stage. Admission KDIGO stages were not associated with mortality, while worst KDIGO stages were. Only progressive KDIGO stage was an independent risk factor. Conclusions. Six independent risk factors for mortality for septic AKI were identified. Progressive KDIGO stage is better than admission or the worst KDIGO stage for prediction of mortality. This trial is registered with ChiCTR-ONC-11001875. Xin Wang, Li Jiang, Ying Wen, Mei-Ping Wang, Wei Li, Zhi-Qiang Li, and Xiu-Ming Xi Copyright © 2014 Xin Wang et al. All rights reserved. Gonadal Transcriptome Analysis of Male and Female Olive Flounder (Paralichthys olivaceus) Sun, 06 Jul 2014 10:07:38 +0000 Olive flounder (Paralichthys olivaceus) is an important commercially cultured marine flatfish in China, Korea, and Japan, in which females grow faster than males. In order to explore the molecular mechanism of flounder sex determination and development, we used RNA-seq technology to investigate transcriptomes of flounder gonads. This produced 22,253,217 and 19,777,841 qualified reads from ovary and testes, which were jointly assembled into 97,233 contigs. Among them, 23,223 contigs were mapped to known genes, of which 2,193 were predicted to be differentially expressed in ovary and 887 in testes. According to annotation information, several sex-related biological pathways including ovarian steroidogenesis and estrogen signaling pathways were found in flounder for the first time. The dimorphic expression of overall sex-related genes provides further insights into sex determination and gonadal development. Our study also provides an archive for further studies of the molecular mechanism of fish sex determination. 
Zhaofei Fan, Feng You, Lijuan Wang, Shenda Weng, Zhihao Wu, Jinwei Hu, Yuxia Zou, Xungang Tan, and Peijun Zhang Copyright © 2014 Zhaofei Fan et al. All rights reserved. Characteristics and Prediction of RNA Structure Sun, 06 Jul 2014 09:18:47 +0000 RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND, which are validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points are just the important golden sections of sequence. With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding. Hengwu Li, Daming Zhu, Caiming Zhang, Huijian Han, and Keith A. Crandall Copyright © 2014 Hengwu Li et al. All rights reserved. Microarray-Based RNA Profiling of Breast Cancer: Batch Effect Removal Improves Cross-Platform Consistency Thu, 03 Jul 2014 00:00:00 +0000 Microarray is a powerful technique used extensively for gene expression analysis. 
Different technologies are available, but lack of standardization makes it challenging to compare and integrate data. Furthermore, batch-related biases within datasets are common but often not tackled. We have analyzed the same 234 breast cancers on two different microarray platforms. One dataset contained known batch-effects associated with the fabrication procedure used. The aim was to assess the significance of correcting for systematic batch-effects when integrating data from different platforms. We here demonstrate the importance of detecting batch-effects and how tools, such as ComBat, can be used to successfully overcome such systematic variations in order to unmask essential biological signals. Batch adjustment was found to be particularly valuable in the detection of more delicate differences in gene expression. Furthermore, our results show that proper adjustment is essential for integration of gene expression data obtained from multiple sources. We show that high-variance genes are highly reproducibly expressed across platforms, making them particularly well suited as biomarkers and for building gene signatures, exemplified by prediction of estrogen-receptor status and molecular subtypes. In conclusion, the study emphasizes the importance of utilizing proper batch adjustment methods when integrating data across different batches and platforms. Martin J. Larsen, Mads Thomassen, Qihua Tan, Kristina P. Sørensen, and Torben A. Kruse Copyright © 2014 Martin J. Larsen et al. All rights reserved. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis Thu, 03 Jul 2014 00:00:00 +0000 Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. 
This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online. Chih-Yi Chien, Ching-Wei Chang, Chen-An Tsai, and James J. Chen Copyright © 2014 Chih-Yi Chien et al. All rights reserved. Combined Analysis with Copy Number Variation Identifies Risk Loci in Lung Cancer Tue, 01 Jul 2014 11:54:11 +0000 Background. Lung cancer is the most important cause of cancer mortality worldwide, but the underlying mechanisms of this disease are not fully understood. Copy number variations (CNVs) are promising genetic variations to study because of their potential effects on cancer. Methodology/Principal Findings. Here we conducted a pilot study in which we systematically analyzed the association of CNVs in two lung cancer datasets: the Environment And Genetics in Lung cancer Etiology (EAGLE) and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial datasets. We used a preestablished association method to test the datasets separately and conducted a combined analysis to test the association accordance between the two datasets. Finally, we identified 167 risk SNP loci and 22 CNVs associated with lung cancer and linked them with recombination hotspots. 
Functional annotation and biological relevance analyses implied that some of our predicted risk loci were supported by other studies and might be potential candidate loci for lung cancer studies. Conclusions/Significance. Our results further emphasized the importance of copy number variations in cancer and might be a valuable complement to current genome-wide association studies on cancer. Xinlei Li, Xianfeng Chen, Guohong Hu, Yang Liu, Zhenguo Zhang, Ping Wang, You Zhou, Xianfu Yi, Jie Zhang, Yufei Zhu, Zejun Wei, Fei Yuan, Guoping Zhao, Jun Zhu, Landian Hu, and Xiangyin Kong Copyright © 2014 Xinlei Li et al. All rights reserved. Target Capture and Massive Sequencing of Genes Transcribed in Mytilus galloprovincialis Mon, 30 Jun 2014 11:33:59 +0000 Next generation sequencing (NGS) allows fast and massive production of both genome and transcriptome sequence datasets. As the genome of the Mediterranean mussel Mytilus galloprovincialis is not available at present, we have explored the possibility of reducing the whole genome sequencing efforts by using capture probes coupled with PCR amplification and high-throughput 454-sequencing to enrich selected genomic regions. The enrichment of DNA target sequences was validated by real-time PCR, whereas the efficacy of the applied strategy was evaluated by mapping the 454-output reads against reference transcript data already available for M. galloprovincialis and by measuring coverage, SNPs, number of de novo sequenced introns, and complete gene sequences. Focusing on a target size of nearly 1.5 Mbp, we obtained a target coverage which allowed the identification of more than 250 complete introns, 10,741 SNPs, and also complete gene sequences. This study confirms the transcriptome-based enrichment of gDNA regions as a good strategy to expand knowledge on specific subsets of genes also in nonmodel organisms. Umberto Rosani, Stefania Domeneghetti, Alberto Pallavicini, and Paola Venier Copyright © 2014 Umberto Rosani et al. 
All rights reserved. Identifying Hierarchical and Overlapping Protein Complexes Based on Essential Protein-Protein Interactions and “Seed-Expanding” Method Mon, 30 Jun 2014 09:43:33 +0000 Much evidence has demonstrated that protein complexes are overlapping and hierarchically organized in PPI networks. Meanwhile, the large size of PPI networks requires complex detection methods to have low time complexity. Up to now, few methods can identify overlapping and hierarchical protein complexes in a PPI network quickly. In this paper, a novel method, called MCSE, is proposed based on -module and “seed-expanding.” First, it chooses seeds as essential PPIs or edges with high edge clustering values. Then, it identifies protein complexes by expanding each seed to a -module. MCSE is suitable for large PPI networks because of its low time complexity. MCSE can identify overlapping protein complexes naturally because a protein can be visited by different seeds. MCSE uses the parameter _th to control the range of seed expanding and can detect a hierarchical organization of protein complexes by tuning the value of _th. Experimental results on S. cerevisiae show that this hierarchical organization is similar to that of known complexes in the MIPS database. The experimental results also show that MCSE outperforms previous competing algorithms, such as CPM, CMC, Core-Attachment, Dpclus, HC-PIN, MCL, and NFC, in terms of functional enrichment and matching with known protein complexes. Jun Ren, Wei Zhou, and Jianxin Wang Copyright © 2014 Jun Ren et al. All rights reserved. Integrating In Silico Prediction Methods, Molecular Docking, and Molecular Dynamics Simulation to Predict the Impact of ALK Missense Mutations in Structural Perspective Thu, 26 Jun 2014 12:00:41 +0000 Over the past decade, advancements in next generation sequencing technology have placed personalized genomic medicine on the horizon. 
Understanding the likelihood that disease-causing mutations in complex diseases are pathogenic or neutral remains a major task and is even impossible in the structural context because the experiments are time-consuming and expensive. Among the various disease-causing mutations, single nucleotide polymorphisms (SNPs) play a vital role in defining an individual’s susceptibility to disease and drug response. Understanding the genotype-phenotype relationship through SNPs is the first and most important step in drug research and development. Detailed understanding of the effect of SNPs on patient drug response is a key factor in the establishment of personalized medicine. In this paper, we present a computational pipeline for SNP-centred study of anaplastic lymphoma kinase (ALK) that applies in silico prediction methods, molecular docking, and molecular dynamics simulation approaches. Combining computational methods provides a way to understand the impact of deleterious mutations in altering protein drug targets and eventually leading to variable patient drug response. We hope this rapid and cost-effective pipeline will also serve as a bridge to connect clinicians and in silico resources in tailoring treatments to the patients’ specific genotype. C. George Priya Doss, Chiranjib Chakraborty, Luonan Chen, and Hailong Zhu Copyright © 2014 C. George Priya Doss et al. All rights reserved. SSFinder: High Throughput CRISPR-Cas Target Sites Prediction Tool Thu, 26 Jun 2014 00:00:00 +0000 The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite high demand for this system, finding a reliable tool for the determination of specific target sites in large genomic data has remained challenging. Here, we report SSFinder, a Python script to perform high throughput detection of specific target sites in large nucleotide datasets. 
The SSFinder is a user-friendly tool, compatible with Windows, Mac OS, and Linux operating systems, and freely available online. Santosh Kumar Upadhyay and Shailesh Sharma Copyright © 2014 Santosh Kumar Upadhyay and Shailesh Sharma. All rights reserved. Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering Wed, 25 Jun 2014 07:19:47 +0000 Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de-novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data. Daniel C. Ilut, Marie L. Nydam, and Matthew P. Hare Copyright © 2014 Daniel C. Ilut et al. 
All rights reserved. The Human Plasma Membrane Peripherome: Visualization and Analysis of Interactions Wed, 25 Jun 2014 07:02:03 +0000 A major part of membrane function is conducted by proteins, both integral and peripheral. Peripheral membrane proteins temporarily adhere to biological membranes, either to the lipid bilayer or to integral membrane proteins with noncovalent interactions. The aim of this study was to construct and analyze the interactions of the human plasma membrane peripheral proteins (peripherome hereinafter). For this purpose, we collected a dataset of peripheral proteins of the human plasma membrane. We also collected a dataset of experimentally verified interactions for these proteins. The interaction network created from this dataset has been visualized using Cytoscape. We grouped the proteins based on their subcellular location and clustered them using the MCL algorithm in order to detect functional modules. Moreover, functional and graph theory based analyses have been performed to assess biological features of the network. Interaction data with drug molecules show that ~10% of peripheral membrane proteins are targets for approved drugs, suggesting their potential implications in disease. In conclusion, we reveal novel features and properties regarding the protein-protein interaction network created by peripheral proteins of the human plasma membrane. Katerina C. Nastou, Georgios N. Tsaousis, Kimon E. Kremizas, Zoi I. Litou, and Stavros J. Hamodrakas Copyright © 2014 Katerina C. Nastou et al. All rights reserved. MPINet: Metabolite Pathway Identification via Coupling of Global Metabolite Network Structure and Metabolomic Profile Wed, 25 Jun 2014 06:50:21 +0000 High-throughput metabolomics technology, such as gas chromatography mass spectrometry, allows the analysis of hundreds of metabolites. Understanding that these metabolites dominate the study condition from biological pathway perspective is still a significant challenge. 
Pathway identification is an invaluable aid to address this issue and, thus, is urgently needed. In this study, we developed a network-based metabolite pathway identification method, MPINet, which considers the global importance of metabolites and the unique character of metabolomic profile. Through integrating the global metabolite functional network structure and the character of metabolomic profile, MPINet provides a more accurate metabolomic pathway analysis. This integrative strategy simultaneously captures the global nonequivalence of metabolites in a pathway and the bias from metabolomic experimental technology. We then applied MPINet to four different types of metabolite datasets. In the analysis of metastatic prostate cancer dataset, we demonstrated the effectiveness of MPINet. With the analysis of the two type 2 diabetes datasets, we show that MPINet has the potentiality for identifying novel pathways related with disease and is reliable for analyzing metabolomic data. Finally, we extensively applied MPINet to identify drug sensitivity related pathways. These results suggest MPINet’s effectiveness and reliability for analyzing metabolomic data across multiple different application fields. Feng Li, Yanjun Xu, Desi Shang, Haixiu Yang, Wei Liu, Junwei Han, Zeguo Sun, Qianlan Yao, Chunlong Zhang, Jiquan Ma, Fei Su, Li Feng, Xinrui Shi, Yunpeng Zhang, Jing Li, Qi Gu, Xia Li, and Chunquan Li Copyright © 2014 Feng Li et al. All rights reserved. Biomolecular Networks and Human Diseases Tue, 24 Jun 2014 08:06:49 +0000 FangXiang Wu, Luonan Chen, Jianxin Wang, and Reda Alhajj Copyright © 2014 FangXiang Wu et al. All rights reserved. miRSeq: A User-Friendly Standalone Toolkit for Sequencing Quality Evaluation and miRNA Profiling Tue, 24 Jun 2014 06:46:17 +0000 MicroRNAs (miRNAs) present diverse regulatory functions in a wide range of biological activities. 
Studies on miRNA functions generally depend on determining miRNA expression profiles between libraries by using a next-generation sequencing (NGS) platform. Currently, several online web services are developed to provide small RNA NGS data analysis. However, the submission of large amounts of NGS data, conversion of data format, and limited availability of species bring problems. In this study, we developed miRSeq to provide alternatives. To test the performance, we had small RNA NGS data from four species, including human, rat, fly, and nematode, analyzed with miRSeq. The alignments results indicate that miRSeq can precisely evaluate the sequencing quality of samples regarding percentage of self-ligation read, read length distribution, and read category. miRSeq is a user-friendly standalone toolkit featuring a graphical user interface (GUI). After a simple installation, users can easily operate miRSeq on a PC or laptop by using a mouse. Within minutes, miRSeq yields useful miRNA data, including miRNA expression profiles, 3′ end modification patterns, and isomiR forms. Moreover, miRSeq supports the analysis of up to 105 animal species, providing higher flexibility. Cheng-Tsung Pan, Kuo-Wang Tsai, Tzu-Min Hung, Wei-Chen Lin, Chao-Yu Pan, Hong-Ren Yu, and Sung-Chou Li Copyright © 2014 Cheng-Tsung Pan et al. All rights reserved. A Graphic Method for Identification of Novel Glioma Related Genes Mon, 23 Jun 2014 07:15:21 +0000 Glioma, as the most common and lethal intracranial tumor, is a serious disease that causes many deaths every year. Good comprehension of the mechanism underlying this disease is very helpful to design effective treatments. However, up to now, the knowledge of this disease is still limited. It is an important step to understand the mechanism underlying this disease by uncovering its related genes. In this study, a graphic method was proposed to identify novel glioma related genes based on known glioma related genes. 
A weighted graph was constructed according to the protein-protein interaction information retrieved from STRING and the well-known shortest path algorithm was employed to discover novel genes. The following analysis suggests that some of them are related to the biological process of glioma, proving that our method was effective in identifying novel glioma related genes. We hope that the proposed method would be applied to study other diseases and provide useful information to medical workers, thereby designing effective treatments of different diseases. Yu-Fei Gao, Yang Shu, Lei Yang, Yi-Chun He, Li-Peng Li, GuaHua Huang, Hai-Peng Li, and Yang Jiang Copyright © 2014 Yu-Fei Gao et al. All rights reserved. A Novel Dynamic Update Framework for Epileptic Seizure Prediction Sun, 22 Jun 2014 00:00:00 +0000 Epileptic seizure prediction is a difficult problem in clinical applications, and it has the potential to significantly improve the patients’ daily lives whose seizures cannot be controlled by either drugs or surgery. However, most current studies of epileptic seizure prediction focus on high sensitivity and low false-positive rate only and lack the flexibility for a variety of epileptic seizures and patients’ physical conditions. Therefore, a novel dynamic update framework for epileptic seizure prediction is proposed in this paper. In this framework, two basic sample pools are constructed and updated dynamically. Furthermore, the prediction model can be updated to be the most appropriate one for the prediction of seizures’ arrival. Mahalanobis distance is introduced in this part to solve the problem of side information, measuring the distance between two data sets. In addition, a multichannel feature extraction method based on Hilbert-Huang transform and extreme learning machine is utilized to extract the features of a patient’s preseizure state against the normal state. At last, a dynamic update epileptic seizure prediction system is built up. 
Simulations on Freiburg database show that the proposed system has a better performance than the one without update. The research of this paper is significantly helpful for clinical applications, especially for the exploitation of online portable devices. Min Han, Sunan Ge, Minghui Wang, Xiaojun Hong, and Jie Han Copyright © 2014 Min Han et al. All rights reserved. An Integrated Analysis of miRNA, lncRNA, and mRNA Expression Profiles Wed, 18 Jun 2014 06:38:20 +0000 Increasing amounts of evidence indicate that noncoding RNAs (ncRNAs) have important roles in various biological processes. Here, miRNA, lncRNA, and mRNA expression profiles were analyzed in human HepG2 and L02 cells using high-throughput technologies. An integrative method was developed to identify possible functional relationships between different RNA molecules. The dominant deregulated miRNAs were prone to be downregulated in tumor cells, and the most abnormal mRNAs and lncRNAs were always upregulated. However, the genome-wide analysis of differentially expressed RNA species did not show significant bias between up- and downregulated populations. miRNA-mRNA interaction was performed based on their regulatory relationships, and miRNA-lncRNA and mRNA-lncRNA interactions were thoroughly surveyed and identified based on their locational distributions and sequence correlations. Aberrantly expressed miRNAs were further analyzed based on their multiple isomiRs. IsomiR repertoires and expression patterns were varied across miRNA loci. Several specific miRNA loci showed differences between tumor and normal cells, especially with respect to abnormally expressed miRNA species. These findings suggest that isomiR repertoires and expression patterns might contribute to tumorigenesis through different biological roles. Systematic and integrative analysis of different RNA molecules with potential cross-talk may make great contributions to the unveiling of the complex mechanisms underlying tumorigenesis. 
Li Guo, Yang Zhao, Sheng Yang, Hui Zhang, and Feng Chen Copyright © 2014 Li Guo et al. All rights reserved. Ultrasonographic Fetal Growth Charts: An Informatic Approach by Quantitative Analysis of the Impact of Ethnicity on Diagnoses Based on a Preliminary Report on Salentinian Population Wed, 18 Jun 2014 00:00:00 +0000 Clear guidance on fetal growth assessment is important because of the strong links between growth restriction or macrosomia and adverse perinatal outcome in order to reduce associated morbidity and mortality. Fetal growth curves are extensively adopted to track fetal sizes from the early phases of pregnancy up to delivery. In the literature, a large variety of reference charts are reported but they are mostly up to five decades old. Furthermore, they do not address several variables and factors (e.g., ethnicity, foods, lifestyle, smoke, and physiological and pathological variables), which are very important for a correct evaluation of the fetal well-being. Therefore, currently adopted fetal growth charts are inadequate to support the melting pot of ethnic groups and lifestyles of our society. Customized fetal growth charts are needed to provide an accurate fetal assessment and to avoid unnecessary obstetric interventions at the time of delivery. Starting from the development of a growth chart purposely built for a specific population, in the paper, authors quantify and analyse the impact of the adoption of wrong growth charts on fetal diagnoses. These results come from a preliminary evaluation of a new open service developed to produce personalized growth charts for specific ethnicity, lifestyle, and other parameters. Andrea Tinelli, Mario Alessandro Bochicchio, Lucia Vaira, and Antonio Malvasi Copyright © 2014 Andrea Tinelli et al. All rights reserved. 
Conformational B-Cell Epitopes Prediction from Sequences Using Cost-Sensitive Ensemble Classifiers and Spatial Clustering Tue, 17 Jun 2014 07:10:22 +0000 B-cell epitopes are regions of the antigen surface which can be recognized by certain antibodies and elicit the immune response. Identification of epitopes for a given antigen chain finds vital applications in vaccine and drug research. Experimental prediction of B-cell epitopes is time-consuming and resource intensive, which may benefit from the computational approaches to identify B-cell epitopes. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting the antigenic determinant residues and then a spatial clustering algorithm is adopted to identify the potential epitopes. Firstly, we explore various discriminative features from primary sequences. Secondly, cost-sensitive ensemble scheme is introduced to deal with imbalanced learning problem. Thirdly, we adopt spatial algorithm to tell which residues may potentially form the epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitopes prediction), is proposed in this study. CBEP achieves good prediction performance with the mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using the leave-one-out cross-validation (LOOCV). When compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP which implements the proposed method is available for academic use. Jian Zhang, Xiaowei Zhao, Pingping Sun, Bo Gao, and Zhiqiang Ma Copyright © 2014 Jian Zhang et al. All rights reserved. 
On Macroscopic Quantum Phenomena in Biomolecules and Cells: From Levinthal to Hopfield Mon, 16 Jun 2014 06:43:51 +0000 In the context of the macroscopic quantum phenomena of the second kind, we hereby seek for a solution-in-principle of the long standing problem of the polymer folding, which was considered by Levinthal as (semi)classically intractable. To illuminate it, we applied quantum-chemical and quantum decoherence approaches to conformational transitions. Our analyses imply the existence of novel macroscopic quantum biomolecular phenomena, with biomolecular chain folding in an open environment considered as a subtle interplay between energy and conformation eigenstates of this biomolecule, governed by quantum-chemical and quantum decoherence laws. On the other hand, within an open biological cell, a system of all identical (noninteracting and dynamically noncoupled) biomolecular proteins might be considered as corresponding spatial quantum ensemble of these identical biomolecular processors, providing spatially distributed quantum solution to a single corresponding biomolecular chain folding, whose density of conformational states might be represented as Hopfield-like quantum-holographic associative neural network too (providing an equivalent global quantum-informational alternative to standard molecular-biology local biochemical approach in biomolecules and cells and higher hierarchical levels of organism, as well). Dejan Raković, Miroljub Dugić, Jasmina Jeknić-Dugić, Milenko Plavšić, Stevo Jaćimovski, and Jovan Šetrajčić Copyright © 2014 Dejan Raković et al. All rights reserved. Big Data and Network Biology Sun, 15 Jun 2014 12:51:54 +0000 Shigehiko Kanaya, Md. Altaf-Ul-Amin, Samuel Kuria Kiboi, and Farit Mochamad Afendi Copyright © 2014 Shigehiko Kanaya et al. All rights reserved. Integrative Genomics and Computational Systems Medicine Sun, 15 Jun 2014 05:47:08 +0000 Jason E. 
McDermott, Yufei Huang, Bing Zhang, Hua Xu, and Zhongming Zhao Copyright © 2014 Jason E. McDermott et al. All rights reserved. Development of Dual Inhibitors against Alzheimer’s Disease Using Fragment-Based QSAR and Molecular Docking Thu, 12 Jun 2014 10:57:36 +0000 Alzheimer’s disease (AD) is the leading cause of dementia among elderly people. Considering the complex heterogeneous etiology of AD, there is an urgent need to develop multitargeted drugs for its suppression. β-amyloid cleavage enzyme (BACE-1) and acetylcholinesterase (AChE), being important for AD progression, have been considered as promising drug targets. In this study, a robust and highly predictive group-based QSAR (GQSAR) model has been developed based on the descriptors calculated for the fragments of 20 1,4-dihydropyridine (DHP) derivatives. A large combinatorial library of DHP analogues was created, the activity of each compound was predicted, and the top compounds were analyzed using refined molecular docking. A detailed interaction analysis was carried out for the top two compounds (EDC and FDC), which showed significant binding affinity for BACE-1 and AChE. This study paves the way for consideration of these lead molecules as prospective drugs for the effective dual inhibition of BACE-1 and AChE. The GQSAR model provides site-specific clues about where in the molecules certain modifications can result in increased biological activity. This information could be of high value for the design and development of multifunctional drugs for combating AD. Manisha Goyal, Jaspreet Kaur Dhanjal, Sukriti Goyal, Chetna Tyagi, Rabia Hamid, and Abhinav Grover Copyright © 2014 Manisha Goyal et al. All rights reserved. Large-Scale Investigation of Human TF-miRNA Relations Based on Coexpression Profiles Mon, 09 Jun 2014 00:00:00 +0000 Noncoding, endogenous microRNAs (miRNAs) are fairly well known for regulating gene expression rather than protein coding. 
Dysregulation of miRNA genes, whether upregulation or downregulation, may lead to severe diseases or oncogenesis, especially when the miRNA disorder involves significant bioreactions or pathways. Thus, in recent years the transcriptional regulation of miRNA genes has received as much attention as target recognition. In this study, a large-scale investigation of novel cis- and trans-elements was undertaken to further determine TF-miRNA regulatory relations, which are necessary to unravel the transcriptional regulation of miRNA genes. Based on miRNA and annotated gene expression profiles, the term “coTFBS” was introduced to detect common transcription factors and the corresponding binding sites within the promoter regions of each miRNA and its coexpressed annotated genes. The computational pipeline was successfully established to filter redundancy due to short sequence motifs in the TFBS pattern search. Eventually, we identified more convincing TF-miRNA regulatory relations for 225 human miRNAs. This valuable information is helpful in understanding miRNA functions and provides knowledge to evaluate the therapeutic potential in clinical research. Once most expression profiles of miRNAs in the latest database are completed, TF candidates of more miRNAs can be explored by this filtering approach in the future. Chia-Hung Chien, Yi-Fan Chiang-Hsieh, Ann-Ping Tsou, Shun-Long Weng, Wen-Chi Chang, and Hsien-Da Huang Copyright © 2014 Chia-Hung Chien et al. All rights reserved. Computational Evidence of NAGNAG Alternative Splicing in Human Large Intergenic Noncoding RNA Thu, 05 Jun 2014 12:22:48 +0000 NAGNAG alternative splicing plays an essential role in biological processes and represents a highly adaptable system for posttranslational regulation of gene function. NAGNAG alternative splicing impacts a myriad of biological processes. Previous studies of NAGNAG largely focused on messenger RNA. 
To the best of our knowledge, this is the first study testing the hypothesis that NAGNAG alternative splicing is also operative in large intergenic noncoding RNA (lincRNA). The RNA-seq data sets from recent deep sequencing studies were queried to test our hypothesis. NAGNAG alternative splicing of human lincRNA was identified while querying two independent RNA-seq data sets. Within these datasets, 31 NAGNAG alternative splicing sites were identified in lincRNA. Notably, most exons of lincRNA containing NAGNAG acceptors were longer than those from protein-coding genes. Furthermore, presence of CAG coding appeared to participate in the splice site selection. Finally, expression of the isoforms of NAGNAG lincRNA exhibited tissue specificity. Together, this study improves our understanding of the NAGNAG alternative splicing in lincRNA. Xiaoyong Sun, Simon M. Lin, and Xiaoyan Yan Copyright © 2014 Xiaoyong Sun et al. All rights reserved. The Domain Landscape of Virus-Host Interactomes Wed, 04 Jun 2014 12:18:42 +0000 Viral infections result in millions of deaths in the world today. A thorough analysis of virus-host interactomes may reveal insights into viral infection and pathogenic strategies. In this study, we presented a landscape of virus-host interactomes based on protein domain interaction. Compared to the analysis at protein level, this domain-domain interactome provided a unique abstraction of protein-protein interactome. Through comparisons among DNA, RNA, and retrotranscribing viruses, we identified a core of human domains, that viruses used to hijack the cellular machinery and evade the immune system, which might be promising antiviral drug targets. We showed that viruses preferentially interacted with host hub and bottleneck domains, and the degree and betweenness centrality among three categories of viruses are significantly different. 
Further analysis at functional level highlighted that different viruses perturbed the host cellular molecular network by common and unique strategies. Most importantly, we creatively proposed a viral disease network among viral domains, human domains and the corresponding diseases, which uncovered several unknown virus-disease relationships that needed further verification. Overall, it is expected that the findings will help to deeply understand the viral infection and contribute to the development of antiviral therapy. Lu-Lu Zheng, Chunyan Li, Jie Ping, Yanhong Zhou, Yixue Li, and Pei Hao Copyright © 2014 Lu-Lu Zheng et al. All rights reserved. biomvRhsmm: Genomic Segmentation with Hidden Semi-Markov Model Tue, 03 Jun 2014 12:17:37 +0000 High-throughput technologies like tiling array and next-generation sequencing (NGS) generate continuous homogeneous segments or signal peaks in the genome that represent transcripts and transcript variants (transcript mapping and quantification), regions of deletion and amplification (copy number variation), or regions characterized by particular common features like chromatin state or DNA methylation ratio (epigenetic modifications). However, the volume and output of data produced by these technologies present challenges in analysis. Here, a hidden semi-Markov model (HSMM) is implemented and tailored to handle multiple genomic profile, to better facilitate genome annotation by assisting in the detection of transcripts, regulatory regions, and copy number variation by holistic microarray or NGS. With support for various data distributions, instead of limiting itself to one specific application, the proposed hidden semi-Markov model is designed to allow modeling options to accommodate different types of genomic data and to serve as a general segmentation engine. 
By incorporating genomic positions into the sojourn distribution of HSMM, with optional prior learning using annotation or previous studies, the modeling output is more biologically sensible. The proposed model has been compared with several other state-of-the-art segmentation models through simulation benchmarking, which shows that our efficient implementation achieves comparable or better sensitivity and specificity in genomic segmentation. Yang Du, Eduard Murani, Siriluck Ponsuksili, and Klaus Wimmers Copyright © 2014 Yang Du et al. All rights reserved. ABC and IFC: Modules Detection Method for PPI Network Mon, 02 Jun 2014 06:16:30 +0000 Many clustering algorithms are unable to solve the clustering problem of protein-protein interaction (PPI) networks effectively. A novel clustering model which combines the optimization mechanism of artificial bee colony (ABC) with the fuzzy membership matrix is proposed in this paper. The proposed ABC-IFC clustering model contains two parts: searching for the optimum cluster centers using ABC mechanism and forming clusters using intuitionistic fuzzy clustering (IFC) method. Firstly, the cluster centers are set randomly and the initial clustering results are obtained by using fuzzy membership matrix. Then the cluster centers are updated through different functions of bees in ABC algorithm; then the clustering result is obtained through IFC method based on the new optimized cluster center. To illustrate its performance, the ABC-IFC method is compared with the traditional fuzzy C-means clustering and IFC method. The experimental results on MIPS dataset show that the proposed ABC-IFC method not only gets improved in terms of several commonly used evaluation criteria such as precision, recall, and P value, but also obtains a better clustering result. Xiujuan Lei, Fang-Xiang Wu, Jianfang Tian, and Jie Zhao Copyright © 2014 Xiujuan Lei et al. All rights reserved. 
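The ABC-IFC abstract above relies on a fuzzy membership matrix to form clusters from a set of cluster centers. As a rough illustration only, the sketch below computes the standard fuzzy C-means membership matrix (the baseline the abstract compares against), not the intuitionistic variant and not the bee colony search. It assumes, hypothetically, that each protein has already been mapped to a numeric feature vector; the function name `fuzzy_memberships` and the fuzzifier default `m=2.0` are illustrative choices, not taken from the paper.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Return the fuzzy C-means membership matrix (one row per point).

    Entry [i][j] is the degree to which point i belongs to center j,
    using u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
    The fuzzifier m must be greater than 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # The point coincides with one or more centers:
            # share full membership among those centers.
            share = 1.0 / sum(zeros)
            row = [share if z else 0.0 for z in zeros]
        else:
            row = [
                1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                for d_j in dists
            ]
        memberships.append(row)
    return memberships


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
    ctrs = [(0.0, 0.0), (4.0, 0.0)]
    for row in fuzzy_memberships(pts, ctrs):
        print([round(u, 3) for u in row])
```

Each row sums to 1, so a point lying between two centers is split between them in proportion to its relative distances. In a scheme like the one described, an optimizer would propose new centers and this matrix would be recomputed for each proposal.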
iCTX-Type: A Sequence-Based Predictor for Identifying the Types of Conotoxins in Targeting Ion Channels Sun, 01 Jun 2014 06:50:38 +0000 Conotoxins are small disulfide-rich neurotoxic peptides, which can bind to ion channels with very high specificity and modulate their activities. Over the last few decades, conotoxins have been the drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases. According to their functions and targets, conotoxins are generally categorized into three types: potassium-channel type, sodium-channel type, and calcium-channel types. With the avalanche of peptide sequences generated in the postgenomic age, it is urgent and challenging to develop an automated method for rapidly and accurately identifying the types of conotoxins based on their sequence information alone. To address this challenge, a new predictor, called iCTX-Type, was developed by incorporating the dipeptide occurrence frequencies of a conotoxin sequence into a 400-D (dimensional) general pseudoamino acid composition, followed by the feature optimization procedure to reduce the sample representation from 400-D to 50-D vector. The overall success rate achieved by iCTX-Type via a rigorous cross-validation was over 91%, outperforming its counterpart (RBF network). Besides, iCTX-Type is so far the only predictor in this area with its web-server available, and hence is particularly useful for most experimental scientists to get their desired results without the need to follow the complicated mathematics involved. Hui Ding, En-Ze Deng, Lu-Feng Yuan, Li Liu, Hao Lin, Wei Chen, and Kuo-Chen Chou Copyright © 2014 Hui Ding et al. All rights reserved. 
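The iCTX-Type abstract above encodes each peptide by its dipeptide occurrence frequencies, giving a 400-dimensional vector (20 x 20 amino acid pairs). The following is a minimal sketch of that encoding step under simple assumptions: overlapping adjacent pairs are counted, pairs containing non-standard residues are skipped, and counts are normalized by the number of valid pairs. The function name and these conventions are illustrative; the abstract does not specify them, and the later feature optimization to 50-D is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(sequence):
    """Return the 400-D vector of dipeptide frequencies for a peptide."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_frequencies("ACAC")
    print(len(vec), vec[DIPEPTIDES.index("AC")], vec[DIPEPTIDES.index("CA")])
```

Because the layout is fixed by `DIPEPTIDES`, vectors from different peptides are directly comparable and can be passed to any downstream feature selection or classifier.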
Systems Biology in the Context of Big Data and Networks Tue, 27 May 2014 12:27:40 +0000 Science is going through two rapidly changing phenomena: one is the increasing capabilities of the computers and software tools from terabytes to petabytes and beyond, and the other is the advancement in high-throughput molecular biology producing piles of data related to genomes, transcriptomes, proteomes, metabolomes, interactomes, and so on. Biology has become a data intensive science and as a consequence biology and computer science have become complementary to each other bridged by other branches of science such as statistics, mathematics, physics, and chemistry. The combination of versatile knowledge has caused the advent of big-data biology, network biology, and other new branches of biology. Network biology for instance facilitates the system-level understanding of the cell or cellular components and subprocesses. It is often also referred to as systems biology. The purpose of this field is to understand organisms or cells as a whole at various levels of functions and mechanisms. Systems biology is now facing the challenges of analyzing big molecular biological data and huge biological networks. This review gives an overview of the progress in big-data biology, and data handling and also introduces some applications of networks and multivariate analysis in systems biology. Md. Altaf-Ul-Amin, Farit Mochamad Afendi, Samuel Kuria Kiboi, and Shigehiko Kanaya Copyright © 2014 Md. Altaf-Ul-Amin et al. All rights reserved. MultiRankSeq: Multiperspective Approach for RNAseq Differential Expression Analysis and Quality Control Tue, 27 May 2014 12:25:42 +0000 Background. After a decade of microarray technology dominating the field of high-throughput gene expression profiling, the introduction of RNAseq has revolutionized gene expression research. While RNAseq provides more abundant information than microarray, its analysis has proved considerably more complicated. 
To date, no consensus has been reached on the best approach for RNAseq-based differential expression analysis. Not surprisingly, different studies have drawn different conclusions as to the best approach to identify differentially expressed genes based upon their own criteria and scenarios considered. Furthermore, the lack of effective quality control may lead to misleading results interpretation and erroneous conclusions. To solve these aforementioned problems, we propose a simple yet safe and practical rank-sum approach for RNAseq-based differential gene expression analysis named MultiRankSeq. MultiRankSeq first performs quality control assessment. For data meeting the quality control criteria, MultiRankSeq compares the study groups using several of the most commonly applied analytical methods and combines their results to generate a new rank-sum interpretation. MultiRankSeq provides a unique analysis approach to RNAseq differential expression analysis. MultiRankSeq is written in R, and it is easily applicable. Detailed graphical and tabular analysis reports can be generated with a single command line. Yan Guo, Shilin Zhao, Fei Ye, Quanhu Sheng, and Yu Shyr Copyright © 2014 Yan Guo et al. All rights reserved. Gleditsia sinensis: Transcriptome Sequencing, Construction, and Application of Its Protein-Protein Interaction Network Tue, 27 May 2014 09:02:45 +0000 Gleditsia sinensis is a genus of deciduous tree in the family Caesalpinioideae, native to China, and is of great economic importance. However, despite its economic value, gene sequence information is strongly lacking. In the present study, transcriptome sequencing of G. sinensis was performed resulting in approximately 75.5 million clean reads assembled into 142155 unique transcripts generating 58583 unigenes. The average length of the unigenes was 900 bp, with an N50 of 549 bp. 
The obtained unigene sequences were then compared to four protein databases: NCBI nonredundant protein (NRDB), Swiss-prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Cluster of Orthologous Groups (COG). Using the BLAST procedure, 31385 unigenes (53.6%) were assigned functional annotations. Additionally, sequence homologies between identified unigenes and genes of known species in a protein-protein interaction (PPI) network facilitated G. sinensis PPI network construction. Based on this network construction, new stress resistance genes (including cold, drought, and high salinity) were predicted. The present study is the first investigation of genome-wide gene expression in G. sinensis with the results providing a basis for future functional genomic studies relating to this species. Liucun Zhu, Ying Zhang, Wenna Guo, and Qiang Wang Copyright © 2014 Liucun Zhu et al. All rights reserved. An Association Study between Genetic Polymorphism in the Interleukin-6 Receptor Gene and Coronary Heart Disease Mon, 26 May 2014 11:25:43 +0000 The goal of our study is to test the association of IL6R rs7529229 polymorphism with CHD through a case-control study in Han Chinese population and a meta-analysis. Our results showed a lack of association between IL6R rs7529229 polymorphism and CHD at both the genotype and allele levels in Han Chinese. However, a meta-analysis among 11678 cases and 12861 controls showed that rs7529229-C allele was significantly associated with a decreased risk of CHD, especially in Europeans (odds ratio = 0.93, 95% confidence interval = 0.89–0.96). Since there is significant difference among different populations, further studies are warranted to test the contribution of rs7529229 to CHD in other ethnic populations. Jiangqing Zhou, Xiaoliang Chen, Huadan Ye, Ping Peng, Yanna Ba, Xi Yang, Xiaoyan Huang, Yae Lu, Xin Jiang, Jiangfang Lian, and Shiwei Duan Copyright © 2014 Jiangqing Zhou et al. All rights reserved. 
enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning Mon, 26 May 2014 11:09:26 +0000 DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in terms of ACC and 0.02–0.16 in terms of MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of the vast majority of experimental scientists, we developed a user-friendly web-server for enDNA-Prot which is freely accessible to the public. Ruifeng Xu, Jiyun Zhou, Bin Liu, Lin Yao, Yulan He, Quan Zou, and Xiaolong Wang Copyright © 2014 Ruifeng Xu et al. All rights reserved. A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats Sun, 25 May 2014 08:46:24 +0000 Background. Next generation sequencing platforms can generate shorter reads, deeper coverage, and higher throughput than those of the Sanger sequencing. These short reads may be assembled de novo before some specific genome analyses. 
Up to now, current assemblers have performed very poorly at assembling repeats. Results. To address this problem, we proposed a new genome assembly algorithm, named SWA, which has four properties: (1) assembling repeats and nonrepeats; (2) adopting a new overlapping extension strategy to extend each seed; (3) adopting a sliding window to filter out the sequencing bias; and (4) proposing a compensational mechanism for low coverage datasets. SWA was evaluated and validated in both simulations and real sequencing datasets. The accuracy of assembling repeats and estimating the copy numbers is up to 99% and 100%, respectively. Finally, extensive comparisons with eight other leading assemblers show that SWA outperformed the others in terms of completeness and correctness of assembling repeats and nonrepeats. Conclusions. This paper proposed a new de novo genome assembly method for resolving complex repeats. SWA not only can detect where repeats or nonrepeats are but also can assemble them completely from NGS data, especially repeats. This is its advantage over other assemblers. Shuaibin Lian, Qingyan Li, Zhiming Dai, Qian Xiang, and Xianhua Dai Copyright © 2014 Shuaibin Lian et al. All rights reserved. iMethyl-PseAAC: Identification of Protein Methylation Sites via a Pseudo Amino Acid Composition Approach Thu, 22 May 2014 11:45:29 +0000 Before becoming native proteins during biosynthesis, the polypeptide chains created by the ribosome’s translation of mRNA undergo a series of “product-forming” steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is the Arg- or Lys-methylation that occurs on arginine or lysine, respectively. Given a protein, which site of its Arg (or Lys) can be methylated, and which site cannot? 
This is the first important problem for understanding the methylation mechanism and drug development in depth. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we proposed a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample was formulated by a 346-dimensional vector, formed by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. It was observed by the rigorous jackknife test and independent dataset test that iMethyl-PseAAC was superior to any of the existing predictors in this area. Wang-Ren Qiu, Xuan Xiao, Wei-Zhong Lin, and Kuo-Chen Chou Copyright © 2014 Wang-Ren Qiu et al. All rights reserved. Modelling Arterial Pressure Waveforms Using Gaussian Functions and Two-Stage Particle Swarm Optimizer Tue, 20 May 2014 11:09:31 +0000 Changes of arterial pressure waveform characteristics have been accepted as risk indicators of cardiovascular diseases. Waveform modelling using Gaussian functions has been used to decompose arterial pressure pulses into different numbers of subwaves and hence quantify waveform characteristics. However, the fitting accuracy and computation efficiency of current modelling approaches need to be improved. This study aimed to develop a novel two-stage particle swarm optimizer (TSPSO) to determine optimal parameters of Gaussian functions. The evaluation was performed on carotid and radial artery pressure waveforms (CAPW and RAPW) which were simultaneously recorded from twenty normal volunteers. The fitting accuracy and calculation efficiency of our TSPSO were compared with three published optimization methods: the Nelder-Mead, the modified PSO (MPSO), and the dynamic multiswarm particle swarm optimizer (DMS-PSO). 
The results showed that TSPSO achieved the best fitting accuracy with a mean absolute error (MAE) of 1.1% for CAPW and 1.0% for RAPW, in comparison with 4.2% and 4.1% for Nelder-Mead, 2.0% and 1.9% for MPSO, and 1.2% and 1.1% for DMS-PSO. In addition, to achieve target MAE of 2.0%, the computation time of TSPSO was only 1.5 s, which was only 20% and 30% of that for MPSO and DMS-PSO, respectively. Chengyu Liu, Tao Zhuang, Lina Zhao, Faliang Chang, Changchun Liu, Shoushui Wei, Qiqiang Li, and Dingchang Zheng Copyright © 2014 Chengyu Liu et al. All rights reserved. AmalgamScope: Merging Annotations Data across the Human Genome Tue, 20 May 2014 09:25:06 +0000 The past years have shown an enormous advancement in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and different data scope pose a challenge to jointly visualize and integrate diverse data types. We present AmalgamScope a new interactive software tool focusing on assisting scientists with the annotation of the human genome and particularly the integration of the annotation files from multiple data types, using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to integration of the data based on the annotational information and visualization of the merged files within chromosomal regions or the whole genome. Additionally, users can define custom transcriptome library files for any species and use the file exchanging distant server options of the tool. Georgia Tsiliki, Konstantinos Tsaramirsis, and Sophia Kossida Copyright © 2014 Georgia Tsiliki et al. All rights reserved. 
Bioinformatic Prediction of WSSV-Host Protein-Protein Interaction Mon, 19 May 2014 13:08:27 +0000 WSSV is one of the most dangerous pathogens in shrimp aquaculture. However, the molecular mechanism of how WSSV interacts with shrimp is still not very clear. In the present study, bioinformatic approaches were used to predict interactions between proteins from WSSV and shrimp. The genome data of WSSV (NC_003225.1) and the constructed transcriptome data of F. chinensis were used to screen potentially interacting proteins by searching in protein interaction databases, including STRING, Reactome, and DIP. Forty-four pairs of proteins were suggested to have interactions between WSSV and the shrimp. Gene ontology analysis revealed that 6 pairs of these interacting proteins were classified into “extracellular region” or “receptor complex” GO-terms. KEGG pathway analysis showed that they were involved in the “ECM-receptor interaction pathway.” In the 6 pairs of interacting proteins, an envelope protein called “collagen-like protein” (WSSV-CLP) encoded by an early virus gene “wsv001” in WSSV interacted with 6 deduced proteins from the shrimp, including three integrin alpha (ITGA), two integrin beta (ITGB), and one syndecan (SDC). Sequence analysis on WSSV-CLP, ITGA, ITGB, and SDC revealed that they possessed the sequence features for protein-protein interactions. This study might provide new insights into the interaction mechanisms between WSSV and shrimp. Zheng Sun, Shihao Li, Fuhua Li, and Jianhai Xiang Copyright © 2014 Zheng Sun et al. All rights reserved. A Priori Knowledge and Probability Density Based Segmentation Method for Medical CT Image Sequences Mon, 19 May 2014 06:00:45 +0000 This paper briefly introduces a novel segmentation strategy for CT images sequences. As first step of our strategy, we extract a priori intensity statistical information from object region which is manually segmented by radiologists. 
Then we define a search scope for object and calculate probability density for each pixel in the scope using a voting mechanism. Moreover, we generate an optimal initial level set contour based on a priori shape of object of previous slice. Finally the modified distance regularity level set method utilizes boundaries feature and probability density to conform final object. The main contributions of this paper are as follows: a priori knowledge is effectively used to guide the determination of objects and a modified distance regularization level set method can accurately extract actual contour of object in a short time. The proposed method is compared to other seven state-of-the-art medical image segmentation methods on abdominal CT image sequences datasets. The evaluated results demonstrate our method performs better and has the potential for segmentation in CT image sequences. Huiyan Jiang, Hanqing Tan, and Benqiang Yang Copyright © 2014 Huiyan Jiang et al. All rights reserved. High-Dimensional Additive Hazards Regression for Oral Squamous Cell Carcinoma Using Microarray Data: A Comparative Study Mon, 19 May 2014 05:42:13 +0000 Microarray technology results in high-dimensional and low-sample size data sets. Therefore, fitting sparse models is substantial because only a small number of influential genes can reliably be identified. A number of variable selection approaches have been proposed for high-dimensional time-to-event data based on Cox proportional hazards where censoring is present. The present study applied three sparse variable selection techniques of Lasso, smoothly clipped absolute deviation and the smooth integration of counting, and absolute deviation for gene expression survival time data using the additive risk model which is adopted when the absolute effects of multiple predictors on the hazard function are of interest. The performances of used techniques were evaluated by time dependent ROC curve and bootstrap .632+ prediction error curves. 
The genes selected by all methods were highly significant. The Lasso showed the maximum median area under the ROC curve over time (0.95) and smoothly clipped absolute deviation showed the lowest prediction error (0.105). It was observed that the genes selected by all methods improved the prediction of the purely clinical model, indicating the valuable information contained in the microarray features. So it was concluded that the approaches used can satisfactorily predict survival based on selected gene expression measurements. Omid Hamidi, Lily Tapak, Aarefeh Jafarzadeh Kohneloo, and Majid Sadeghifar Copyright © 2014 Omid Hamidi et al. All rights reserved. Identification of Influenza A/H7N9 Virus Infection-Related Human Genes Based on Shortest Paths in a Virus-Human Protein Interaction Network Sun, 18 May 2014 13:12:13 +0000 The recently emerging Influenza A/H7N9 virus is reported to be able to infect humans and cause mortality. However, viral and host factors associated with the infection are poorly understood. It is suggested by the “guilt by association” rule that interacting proteins share the same or similar functions and hence may be involved in the same pathway. In this study, we developed a computational method to identify Influenza A/H7N9 virus infection-related human genes based on this rule from the shortest paths in a virus-human protein interaction network. Finally, we screened out the 20 most significant human genes, which could be the potential infection-related genes, providing guidelines for further experimental validation. Analysis of the 20 genes showed that they were enriched in protein binding, saccharide or polysaccharide metabolism related pathways, and oxidative phosphorylation pathways. We also compared the results with those from human rhinovirus (HRV) and respiratory syncytial virus (RSV) by the same method. The comparison indicated that saccharide or polysaccharide metabolism related pathways might be especially associated with the H7N9 infection. 
These results could shed some light on the understanding of the virus infection mechanism, providing basis for future experimental biology studies and for the development of effective strategies for H7N9 clinical therapies. Ning Zhang, Min Jiang, Tao Huang, and Yu-Dong Cai Copyright © 2014 Ning Zhang et al. All rights reserved. Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks Sun, 18 May 2014 06:33:00 +0000 Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods are mostly applied on static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to Core-Attachment assumption, these proteins which are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: static core expressed in all the molecular cycle and dynamic attachments short-lived. The proposed algorithm DPC was applied on the data of Saccharomyces cerevisiae and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures. Min Li, Weijie Chen, Jianxin Wang, Fang-Xiang Wu, and Yi Pan Copyright © 2014 Min Li et al. All rights reserved. 
A Network Biology Approach to Discover the Molecular Biomarker Associated with Hepatocellular Carcinoma Wed, 14 May 2014 09:12:48 +0000 In recent years, high throughput technologies such as microarray platform have provided a new avenue for hepatocellular carcinoma (HCC) investigation. Traditionally, gene sets enrichment analysis of survival related genes is commonly used to reveal the underlying functional mechanisms. However, this approach usually produces too many candidate genes and cannot discover detailed signaling transduction cascades, which greatly limits their clinical application such as biomarker development. In this study, we have proposed a network biology approach to discover novel biomarkers from multidimensional omics data. This approach effectively combines clinical survival data with topological characteristics of human protein interaction networks and patients expression profiling data. It can produce novel network based biomarkers together with biological understanding of molecular mechanism. We have analyzed eighty HCC expression profiling arrays and identified that extracellular matrix and programmed cell death are the main themes related to HCC progression. Compared with traditional enrichment analysis, this approach can provide concrete and testable hypothesis on functional mechanism. Furthermore, the identified subnetworks can potentially be used as suitable targets for therapeutic intervention in HCC. Liwei Zhuang, Yun Wu, Jiwu Han, Xiaohua Ling, Liguo Wang, Chengyan Zhu, and Yili Fu Copyright © 2014 Liwei Zhuang et al. All rights reserved. Breast Cancer Prognosis Risk Estimation Using Integrated Gene Expression and Clinical Data Wed, 14 May 2014 00:00:00 +0000 Background. Novel prognostic markers are needed so newly diagnosed breast cancer patients do not undergo any unnecessary therapy. 
Various microarray gene expression datasets based studies have generated gene signatures to predict the prognosis outcomes, while ignoring the large amount of information contained in established clinical markers. Nevertheless, small sample sizes in individual microarray datasets remain a bottleneck in generating robust gene signatures that show limited predictive power. The aim of this study is to achieve high classification accuracy for the good prognosis group and then achieve high classification accuracy for the poor prognosis group. Methods. We propose a novel algorithm called the IPRE (integrated prognosis risk estimation) algorithm. We used integrated microarray datasets from multiple studies to increase the sample sizes (∼2,700 samples). The IPRE algorithm consists of a virtual chromosome for the extraction of the prognostic gene signature that has 79 genes, and a multivariate logistic regression model that incorporates clinical data along with expression data to generate the risk score formula that accurately categorizes breast cancer patients into two prognosis groups. Results. The evaluation on two testing datasets showed that the IPRE algorithm achieved high classification accuracies of 82% and 87%, which was far greater than any existing algorithms. Ashish Saini, Jingyu Hou, and Wanlei Zhou Copyright © 2014 Ashish Saini et al. All rights reserved. Local Alignment Tool Based on Hadoop Framework and GPU Architecture Wed, 14 May 2014 00:00:00 +0000 With the rapid growth of next generation sequencing technologies, such as Slex, more and more data have been discovered and published. To analyze such huge data the computational performance is an important issue. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP is an important tool, implemented on GPU architectures, for biologists to compare protein sequences. To deal with the big biology data, it is hard to rely on single GPU. 
Therefore, we implement a distributed BLASTP by combining Hadoop and multi-GPUs. The experimental results show that the proposed method improves on the performance of BLASTP on a single GPU and also achieves high availability and fault tolerance. Che-Lun Hung and Guan-Jie Hua Copyright © 2014 Che-Lun Hung and Guan-Jie Hua. All rights reserved. Meta-Analysis of Low Density Lipoprotein Receptor (LDLR) rs2228671 Polymorphism and Coronary Heart Disease Mon, 12 May 2014 14:08:07 +0000 Low density lipoprotein receptor (LDLR) can regulate cholesterol metabolism by removing the excess low density lipoprotein cholesterol (LDL-C) in blood. Since cholesterol metabolism is often disrupted in coronary heart disease (CHD), LDLR as a candidate gene of CHD has been intensively studied. The goal of our study is to evaluate the overall contribution of the LDLR rs2228671 polymorphism to the risk of CHD by combining the genotyping data from multiple case-control studies. Our meta-analysis included 8 case-control studies with 7588 cases and 9711 controls to test the association between the LDLR rs2228671 polymorphism and CHD. In addition, we performed a case-control study of the LDLR rs2228671 polymorphism and the risk of CHD in a Chinese population. Our meta-analysis showed that the rs2228671-T allele was significantly associated with a reduced risk of CHD (, odds ratio (OR) = 0.83, and 95% confidence interval (95% CI) = 0.75–0.92). However, the rs2228671-T allele was rare (frequency 1%) and was not associated with CHD in Han Chinese (), suggesting an ethnic difference of the LDLR rs2228671 polymorphism. The meta-analysis has established rs2228671 as a protective factor of CHD in Europeans. The lack of association in Chinese reflects an ethnic difference of this genetic variant between Chinese and European populations. Huadan Ye, Qianlei Zhao, Yi Huang, Lingyan Wang, Haibo Liu, Chunming Wang, Dongjun Dai, Leiting Xu, Meng Ye, and Shiwei Duan Copyright © 2014 Huadan Ye et al. All rights reserved.
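For readers unfamiliar with the statistics quoted in the LDLR abstract above, the following Python sketch shows how an odds ratio and an approximate 95% confidence interval are commonly computed from a single 2x2 case-control table, using the standard log-odds (Woolf) approximation. It does not reproduce the pooled estimate from the meta-analysis, which combines several studies; the function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=30, d=70)
```

An odds ratio below 1 whose interval also lies entirely below 1, as in the abstract's reported 0.75–0.92, is what is usually read as a protective association.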
Integration of Residue Attributes for Sequence Diversity Characterization of Terpenoid Enzymes Sun, 11 May 2014 13:35:58 +0000 Progress in the “omics” fields such as genomics, transcriptomics, proteomics, and metabolomics has engendered a need for innovative analytical techniques to derive meaningful information from the ever increasing molecular data. KNApSAcK motorcycle DB is a popular database for enzymes related to secondary metabolic pathways in plants. One of the challenges in analyses of protein sequence data in such repositories is the standard notation of sequences as strings of alphabetical characters. This has created lack of a natural underlying metric that eases amenability to computation. In view of this requirement, we applied novel integration of selected biochemical and physical attributes of amino acids derived from the amino acid index and quantified in numerical scale, to examine diversity of peptide sequences of terpenoid synthases accumulated in KNApSAcK motorcycle DB. We initially generated a reduced amino acid index table. This is a set of biochemical and physical properties obtained by random forest feature selection of important indices from the amino acid index. Principal component analysis was then applied for characterization of enzymes involved in synthesis of terpenoids. The variance explained was increased by incorporation of residue attributes for analyses. Nelson Kibinge, Shun Ikeda, Naoaki Ono, Md. Altaf-Ul-Amin, and Shigehiko Kanaya Copyright © 2014 Nelson Kibinge et al. All rights reserved. Topography Prediction of Helical Transmembrane Proteins by a New Modification of the Sliding Window Method Sun, 11 May 2014 00:00:00 +0000 Protein functions are specified by its three-dimensional structure, which is usually obtained by X-ray crystallography. Due to difficulty of handling membrane proteins experimentally to date the structure has only been determined for a very limited part of membrane proteins (<4%). 
Nevertheless, investigation of structure and functions of membrane proteins is important for medicine and pharmacology and, therefore, is of significant interest. Methods of computer modeling based on the data on the primary protein structure or the symbolic amino acid sequence have become an actual alternative to the experimental method of X-ray crystallography for investigating the structure of membrane proteins. Here we presented the results of the study of 35 transmembrane proteins, mainly GPCRs, using the novel method of cascade averaging of hydrophobicity function within the limits of a sliding window. The proposed method allowed revealing 139 transmembrane domains out of 140 (or 99.3%) identified by other methods. Also 236 transmembrane domain boundary positions out of 280 (or 84%) were predicted correctly by the proposed method with deviation from the predictions made by other methods that does not exceed the detection error of this method. Maria N. Simakova and Nikolai N. Simakov Copyright © 2014 Maria N. Simakova and Nikolai N. Simakov. All rights reserved. Network of microRNAs-mRNAs Interactions in Pancreatic Cancer Wed, 07 May 2014 13:18:55 +0000 Background. MicroRNAs are small RNA molecules that regulate the expression of certain genes through interaction with mRNA targets and are mainly involved in human cancer. This study was conducted to make the network of miRNAs-mRNAs interactions in pancreatic cancer as the fourth leading cause of cancer death. Methods. 56 miRNAs that were exclusively expressed and 1176 genes that were downregulated or silenced in pancreas cancer were extracted from beforehand investigations. MiRNA–mRNA interactions data analysis and related networks were explored using MAGIA tool and Cytoscape 3 software. Functional annotations of candidate genes in pancreatic cancer were identified by DAVID annotation tool. Results. 
The network consists of 217 mRNA nodes, 15 miRNA nodes, and 241 edges representing 241 regulatory interactions between the 15 miRNAs and 217 target genes. miR-24 was the most significant miRNA, regulating a series of important genes. ACVR2B, GFRA1, and MTHFR were significant target genes that were downregulated. Conclusion. Although the previously collected data seem to be a treasure trove, no study had analyzed miRNA and mRNA interactions simultaneously. The network of miRNA-mRNA interactions will help to corroborate experimental observations and could be used to refine miRNA target predictions for developing new therapeutic approaches. Elnaz Naderi, Mehdi Mostafaei, Akram Pourshams, and Ashraf Mohamadkhani Copyright © 2014 Elnaz Naderi et al. All rights reserved. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway Wed, 07 May 2014 12:20:42 +0000 Background. MicroRNA (miRNA) is a short, endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor in the tumorigenesis of colorectal cancer (CRC) and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue.
As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC. Fengfeng Wang, S. C. Cesar Wong, Lawrence W. C. Chan, William C. S. Cho, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2014 Fengfeng Wang et al. All rights reserved. Double-Bottom Chaotic Map Particle Swarm Optimization Based on Chi-Square Test to Determine Gene-Gene Interactions Wed, 07 May 2014 11:02:38 +0000 Gene-gene interaction studies focus on the investigation of the association between the single nucleotide polymorphisms (SNPs) of genes for disease susceptibility. Statistical methods are widely used to search for a good model of gene-gene interaction for disease analysis, and the previously determined models have successfully explained the effects between SNPs and diseases. However, the huge numbers of potential combinations of SNP genotypes limit the use of statistical methods for analysing high-order interaction, and finding an available high-order model of gene-gene interaction remains a challenge. In this study, an improved particle swarm optimization with double-bottom chaotic maps (DBM-PSO) was applied to assist statistical methods in the analysis of associated variations to disease susceptibility. A big data set was simulated using the published genotype frequencies of 26 SNPs amongst eight genes for breast cancer. Results showed that the proposed DBM-PSO successfully determined two- to six-order models of gene-gene interaction for the risk association with breast cancer (odds ratio > 1.0; value ). Analysis results supported that the proposed DBM-PSO can identify good models and provide higher chi-square values than conventional PSO. 
This study indicates that DBM-PSO is a robust and precise algorithm for determination of gene-gene interaction models for breast cancer. Cheng-Hong Yang, Yu-Da Lin, Li-Yeh Chuang, and Hsueh-Wei Chang Copyright © 2014 Cheng-Hong Yang et al. All rights reserved. Pathway-Driven Discovery of Rare Mutational Impact on Cancer Sun, 04 May 2014 12:48:09 +0000 Identifying driver mutation is important in understanding disease mechanism and future application of custom tailored therapeutic decision. Functional analysis of mutational impact usually focuses on the gene expression level of the mutated gene itself. However, complex regulatory network may cause differential gene expression among functional neighbors of the mutated gene. We suggest a new approach for discovering rare mutations that have real impact in the context of pathway; the philosophy of our method is iteratively combining rare mutations until no more mutations can be added under the condition that the combined mutational event can statistically discriminate pathway level mRNA expression between groups with and without mutational events. Breast cancer patients with somatic mutation and mRNA expression were analyzed by our approach. Our approach is shown to sensitively capture mutations that change pathway level mRNA expression, concurrently discovering important mutations previously reported in breast cancer such as TP53, PIK3CA, and RB1. In addition, out of 15,819 genes considered in breast cancer, our approach identified mutational events of 32 genes showing pathway level mRNA expression differences. TaeJin Ahn and Taesung Park Copyright © 2014 TaeJin Ahn and Taesung Park. All rights reserved. Mining Seasonal Marine Microbial Pattern with Greedy Heuristic Clustering and Symmetrical Nonnegative Matrix Factorization Sun, 27 Apr 2014 09:56:31 +0000 With the development of high-throughput and low-cost sequencing technology, a large number of marine microbial sequences were generated. 
The association patterns between marine microbial species and environmental factors are hidden in this large volume of sequences. Mining these association patterns is beneficial for exploiting marine resources. However, very few marine microbial association patterns have been well investigated in this field. The present study reports the development of a novel method called HC-sNMF to detect marine microbial association patterns. The results show that the four seasonal marine microbial association networks have the characteristics of complex networks, that the same environmental factor influences different species in the four seasons, and that the correlative relationships are stronger between OTUs (taxa) than with environmental factors in the communities detected across the four seasons. Fei Liu, Shao-Wu Zhang, Ze-Gang Wei, Wei Chen, and Chen Zhou Copyright © 2014 Fei Liu et al. All rights reserved. OWL Reasoning Framework over Big Biological Knowledge Network Sun, 27 Apr 2014 00:00:00 +0000 Recently, huge amounts of data have been generated in the domain of biology. Embedded with domain knowledge from different disciplines, the isolated biological resources are implicitly connected, forming a big network of versatile biological knowledge. Faced with such massive, disparate, and interlinked biological data, providing an efficient way to model, integrate, and analyze the big biological network becomes a challenge. In this paper, we present a general OWL (web ontology language) reasoning framework to study the implicit relationships among biological entities. A comprehensive biological ontology across traditional Chinese medicine (TCM) and western medicine (WM) is used to create a conceptual model for the biological network. Then the corresponding biological data are integrated into a biological knowledge network as the data model.
Based on the conceptual model and data model, a scalable OWL reasoning method is utilized to infer the potential associations between biological entities from the biological network. In our experiment, we focus on the association discovery between TCM and WM. The derived associations are quite useful for biologists to promote the development of novel drugs and TCM modernization. The experimental results show that the system achieves high efficiency, accuracy, scalability, and effectivity. Huajun Chen, Xi Chen, Peiqin Gu, Zhaohui Wu, and Tong Yu Copyright © 2014 Huajun Chen et al. All rights reserved. Novel Design Strategy for Checkpoint Kinase 2 Inhibitors Using Pharmacophore Modeling, Combinatorial Fusion, and Virtual Screening Wed, 23 Apr 2014 09:23:00 +0000 Checkpoint kinase 2 (Chk2) has a great effect on DNA-damage and plays an important role in response to DNA double-strand breaks and related lesions. In this study, we will concentrate on Chk2 and the purpose is to find the potential inhibitors by the pharmacophore hypotheses (PhModels), combinatorial fusion, and virtual screening techniques. Applying combinatorial fusion into PhModels and virtual screening techniques is a novel design strategy for drug design. We used combinatorial fusion to analyze the prediction results and then obtained the best correlation coefficient of the testing set () with the value 0.816 by combining the and prediction results. The potential inhibitors were selected from NCI database by screening according to + prediction results and molecular docking with CDOCKER docking program. Finally, the selected compounds have high interaction energy between a ligand and a receptor. Through these approaches, 23 potential inhibitors for Chk2 are retrieved for further study. Chun-Yuan Lin and Yen-Ling Wang Copyright © 2014 Chun-Yuan Lin and Yen-Ling Wang. All rights reserved. 
Syn-Lethality: An Integrative Knowledge Base of Synthetic Lethality towards Discovery of Selective Anticancer Therapies Tue, 22 Apr 2014 00:00:00 +0000 Synthetic lethality (SL) is a novel strategy for anticancer therapies, whereby mutations of two genes will kill a cell but mutation of a single gene will not. Therefore, a cancer-specific mutation combined with a drug-induced mutation, if they have SL interactions, will selectively kill cancer cells. While numerous SL interactions have been identified in yeast, only a few have been known in human. There is a pressing need to systematically discover and understand SL interactions specific to human cancer. In this paper, we present Syn-Lethality, the first integrative knowledge base of SL that is dedicated to human cancer. It integrates experimentally discovered and verified human SL gene pairs into a network, associated with annotations of gene function, pathway, and molecular mechanisms. It also includes yeast SL genes from high-throughput screenings which are mapped to orthologous human genes. Such an integrative knowledge base, organized as a relational database with user interface for searching and network visualization, will greatly expedite the discovery of novel anticancer drug targets based on synthetic lethality interactions. The database can be downloaded as a stand-alone Java application. Xue-juan Li, Shital K. Mishra, Min Wu, Fan Zhang, and Jie Zheng Copyright © 2014 Xue-juan Li et al. All rights reserved. Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem Wed, 16 Apr 2014 15:52:01 +0000 The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space efficient solutions. 
In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm to provide more speed for shared memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on the Sadakane’s compressed index consumes significantly less space than the ones based on noncompressed data structures like the suffix tree and the enhanced suffix array. Our experimental results show that our parallel algorithm is efficient and scales well with increasing number of processors. Maan Haj Rachid, Qutaibah Malluhi, and Mohamed Abouelhoda Copyright © 2014 Maan Haj Rachid et al. All rights reserved. A Knowledge-Driven Approach to Extract Disease-Related Biomarkers from the Literature Wed, 16 Apr 2014 15:51:54 +0000 The biomedical literature represents a rich source of biomarker information. However, both the size of literature databases and their lack of standardization hamper the automatic exploitation of the information contained in these resources. Text mining approaches have proven to be useful for the exploitation of information contained in the scientific publications. Here, we show that a knowledge-driven text mining approach can exploit a large literature database to extract a dataset of biomarkers related to diseases covering all therapeutic areas. Our methodology takes advantage of the annotation of MEDLINE publications pertaining to biomarkers with MeSH terms, narrowing the search to specific publications and, therefore, minimizing the false positive ratio. It is based on a dictionary-based named entity recognition system and a relation extraction module. 
The application of this methodology resulted in the identification of 131,012 disease-biomarker associations between 2,803 genes and 2,751 diseases, and represents a valuable knowledge base for those interested in disease-related biomarkers. Additionally, we present a bibliometric analysis of the journals reporting biomarker related information during the last 40 years. À. Bravo, M. Cases, N. Queralt-Rosinach, F. Sanz, and L. I. Furlong Copyright © 2014 À. Bravo et al. All rights reserved. Integrated Analysis of Gene Network in Childhood Leukemia from Microarray and Pathway Databases Tue, 15 Apr 2014 14:07:22 +0000 Glucocorticoids (GCs) have been used as therapeutic agents for children with acute lymphoblastic leukaemia (ALL) for over 50 years. However, much remains to be understood about the molecular mechanism of GCs actions in ALL subtypes. In this study, we delineate differential responses of ALL subtypes, B- and T-ALL, to GCs treatment at systems level by identifying the differences among biological processes, molecular pathways, and interaction networks that emerge from the action of GCs through the use of a selected number of available bioinformatics methods and tools. We provide biological insight into GC-regulated genes, their related functions, and their networks specific to the ALL subtypes. We show that differentially expressed GC-regulated genes participate in distinct underlying biological processes affected by GCs in B-ALL and T-ALL with little to no overlap. These findings provide the opportunity towards identifying new therapeutic targets. Amphun Chaiboonchoe, Sandhya Samarasinghe, Don Kulasiri, and Kourosh Salehi-Ashtiani Copyright © 2014 Amphun Chaiboonchoe et al. All rights reserved. A Novel Algorithm for Detecting Protein Complexes with the Breadth First Search Thu, 10 Apr 2014 11:03:26 +0000 Most biological processes are carried out by protein complexes. 
A substantial number of false positives of the protein-protein interaction (PPI) data can compromise the utility of the datasets for complexes reconstruction. In order to reduce the impact of such discrepancies, a number of data integration and affinity scoring schemes have been devised. The methods encode the reliabilities (confidence) of physical interactions between pairs of proteins. The challenge now is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, a novel protein complex mining algorithm ClusterBFS (Cluster with Breadth-First Search) is proposed. Based on the weighted density, ClusterBFS detects protein complexes of the weighted network by the breadth first search algorithm, which originates from a given seed protein used as starting-point. The experimental results show that ClusterBFS performs significantly better than the other computational approaches in terms of the identification of protein complexes. Xiwei Tang, Jianxin Wang, Min Li, Yiming He, and Yi Pan Copyright © 2014 Xiwei Tang et al. All rights reserved. Gene Expression Correlation for Cancer Diagnosis: A Pilot Study Wed, 09 Apr 2014 14:12:08 +0000 Poor prognosis for late-stage, high-grade, and recurrent cancers has been motivating cancer researchers to search for more efficient biomarkers to identify the onset of cancer. Recent advances in constructing and dynamically analyzing biomolecular networks for different types of cancer have provided a promising novel strategy to detect tumorigenesis and metastasis. The observation of different biomolecular networks associated with normal and cancerous states led us to hypothesize that correlations for gene expressions could serve as valid indicators of early cancer development. 
In this pilot study, we tested our hypothesis by examining whether the mRNA expressions of three randomly selected cancer-related genes PIK3C3, PIM3, and PTEN were correlated during cancer progression and the correlation coefficients could be used for cancer diagnosis. Strong correlations were observed between PIK3C3 and PIM3 in breast cancer, between PIK3C3 and PTEN in breast and ovary cancers, and between PIM3 and PTEN in breast, kidney, liver, and thyroid cancers during disease progression, implicating that the correlations for cancer network gene expressions could serve as a supplement to current clinical biomarkers, such as cancer antigens, for early cancer diagnosis. Binbing Ling, Lifeng Chen, Qiang Liu, and Jian Yang Copyright © 2014 Binbing Ling et al. All rights reserved. Computational Systems Biology Methods in Molecular Biology, Chemistry Biology, Molecular Biomedicine, and Biopharmacy Wed, 09 Apr 2014 13:17:43 +0000 Yudong Cai, Julio Vera González, Zengrong Liu, and Tao Huang Copyright © 2014 Yudong Cai et al. All rights reserved. Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data Wed, 09 Apr 2014 12:35:01 +0000 A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. 
Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. Nozomu Sakurai, Takeshi Ara, Mitsuo Enomoto, Takeshi Motegi, Yoshihiko Morishita, Atsushi Kurabayashi, Yoko Iijima, Yoshiyuki Ogata, Daisuke Nakajima, Hideyuki Suzuki, and Daisuke Shibata Copyright © 2014 Nozomu Sakurai et al. All rights reserved. An Infrastructure to Mine Molecular Descriptors for Ligand Selection on Virtual Screening Wed, 09 Apr 2014 11:34:08 +0000 The receptor-ligand interaction evaluation is one important step in rational drug design. The databases that provide the structures of the ligands are growing on a daily basis. This makes it impossible to test all the ligands for a target receptor. Hence, a ligand selection before testing the ligands is needed. One possible approach is to evaluate a set of molecular descriptors. With the aim of describing the characteristics of promising compounds for a specific receptor we introduce a data warehouse-based infrastructure to mine molecular descriptors for virtual screening (VS). We performed experiments that consider as target the receptor HIV-1 protease and different compounds for this protein. A set of 9 molecular descriptors are taken as the predictive attributes and the free energy of binding is taken as a target attribute. 
By applying the J48 algorithm over the data we obtain decision tree models that achieved up to 84% of accuracy. The models indicate which molecular descriptors and their respective values are relevant to influence good FEB results. Using their rules we performed ligand selection on ZINC database. Our results show important reduction in ligands selection to be applied in VS experiments; for instance, the best selection model picked only 0.21% of the total amount of drug-like ligands. Vinicius Rosa Seus, Giovanni Xavier Perazzo, Ana T. Winck, Adriano V. Werhli, and Karina S. Machado Copyright © 2014 Vinicius Rosa Seus et al. All rights reserved. An Intelligent Clinical Decision Support System for Patient-Specific Predictions to Improve Cervical Intraepithelial Neoplasia Detection Wed, 09 Apr 2014 08:12:50 +0000 Nowadays, there are molecular biology techniques providing information related to cervical cancer and its cause: the human Papillomavirus (HPV), including DNA microarrays identifying HPV subtypes, mRNA techniques such as nucleic acid based amplification or flow cytometry identifying E6/E7 oncogenes, and immunocytochemistry techniques such as overexpression of p16. Each one of these techniques has its own performance, limitations and advantages, thus a combinatorial approach via computational intelligence methods could exploit the benefits of each method and produce more accurate results. In this article we propose a clinical decision support system (CDSS), composed by artificial neural networks, intelligently combining the results of classic and ancillary techniques for diagnostic accuracy improvement. We evaluated this method on 740 cases with complete series of cytological assessment, molecular tests, and colposcopy examination. The CDSS demonstrated high sensitivity (89.4%), high specificity (97.1%), high positive predictive value (89.4%), and high negative predictive value (97.1%), for detecting cervical intraepithelial neoplasia grade 2 or worse (CIN2+). 
In comparison to the tests involved in this study and their combinations, the CDSS produced the most balanced results in terms of sensitivity, specificity, PPV, and NPV. The proposed system may reduce the referral rate for colposcopy and guide personalised management and therapeutic interventions. Panagiotis Bountris, Maria Haritou, Abraham Pouliakis, Niki Margari, Maria Kyrgiou, Aris Spathis, Asimakis Pappas, Ioannis Panayiotides, Evangelos A. Paraskevaidis, Petros Karakitsos, and Dimitrios-Dionyssios Koutsouris Copyright © 2014 Panagiotis Bountris et al. All rights reserved. Supervised Clustering Based on DPClusO: Prediction of Plant-Disease Relations Using Jamu Formulas of KNApSAcK Database Mon, 07 Apr 2014 14:04:55 +0000 Indonesia has the largest medicinal plant species in the world and these plants are used as Jamu medicines. Jamu medicines are popular traditional medicines from Indonesia and we need to systemize the formulation of Jamu and develop basic scientific principles of Jamu to meet the requirement of Indonesian Healthcare System. We propose a new approach to predict the relation between plant and disease using network analysis and supervised clustering. At the preliminary step, we assigned 3138 Jamu formulas to 116 diseases of International Classification of Diseases (ver. 10) which belong to 18 classes of disease from National Center for Biotechnology Information. The correlation measures between Jamu pairs were determined based on their ingredient similarity. Networks are constructed and analyzed by selecting highly correlated Jamu pairs. Clusters were then generated by using the network clustering algorithm DPClusO. By using matching score of a cluster, the dominant disease and high frequency plant associated to the cluster are determined. The plant to disease relations predicted by our method were evaluated in the context of previously published results and were found to produce around 90% successful predictions. 
Sony Hartono Wijaya, Husnawati Husnawati, Farit Mochamad Afendi, Irmanida Batubara, Latifah K. Darusman, Md. Altaf-Ul-Amin, Tetsuo Sato, Naoaki Ono, Tadao Sugiura, and Shigehiko Kanaya Copyright © 2014 Sony Hartono Wijaya et al. All rights reserved. Combining Haar Wavelet and Karhunen Loeve Transforms for Medical Images Watermarking Mon, 07 Apr 2014 08:14:41 +0000 This paper presents a novel watermarking method, applied to the medical imaging domain, used to embed the patient’s data into the corresponding image or set of images used for the diagnosis. The main objective behind the proposed technique is to perform the watermarking of the medical images in such a way that the three main attributes of the hidden information (i.e., imperceptibility, robustness, and integration rate) can be jointly ameliorated as much as possible. These attributes determine the effectiveness of the watermark, resistance to external attacks, and increase the integration rate. In order to improve the robustness, a combination of the characteristics of Discrete Wavelet and Karhunen Loeve Transforms is proposed. The Karhunen Loeve Transform is applied on the subblocks (sized ) of the different wavelet coefficients (in the HL2, LH2, and HH2 subbands). In this manner, the watermark will be adapted according to the energy values of each of the Karhunen Loeve components, with the aim of ensuring a better watermark extraction under various types of attacks. For the correct identification of inserted data, the use of an Errors Correcting Code (ECC) mechanism is required for the check and, if possible, the correction of errors introduced into the inserted data. Concerning the enhancement of the imperceptibility factor, the main goal is to determine the optimal value of the visibility factor, which depends on several parameters of the DWT and the KLT transforms. As a first step, a Fuzzy Inference System (FIS) has been set up and then applied to determine an initial visibility factor value. 
Several features extracted from the cooccurrence matrix are used as an input to the FIS and used to determine an initial visibility factor for each block; these values are subsequently reweighted as a function of the eigenvalues extracted from each subblock. Regarding the integration rate, the previous works insert one bit per coefficient. In our proposal, the integration of the data to be hidden is 3 bits per coefficient, which increases the integration rate by a factor of 3. Mohamed Ali Hajjaji, El-Bay Bourennane, Abdessalem Ben Abdelali, and Abdellatif Mtibaa Copyright © 2014 Mohamed Ali Hajjaji et al. All rights reserved. A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System Sun, 06 Apr 2014 07:51:12 +0000 Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES) is the best performing tool in the GENIA BioNLP 2009/2011 shared tasks, which relies heavily on high-dimensional features. This paper describes research which, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying the greedy search strategy, analyses the contribution of every single feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With an updated feature set, a new system is acquired with enhanced performance which achieves an increased F-score of 53.27% up from 51.21% for Task 1 under strict evaluation criteria and 57.24% according to the approximate span and recursive criterion. Jingbo Xia, Alex Chengyu Fang, and Xing Zhang Copyright © 2014 Jingbo Xia et al. All rights reserved. A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data Thu, 03 Apr 2014 13:31:48 +0000 With the remarkable increase in genomic sequence data of a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data.
Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between the closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a “genome signature,” and the specific regions specifically enriched in transcription-factor-binding sequences. Because the classification and visualization power is very high, BLSOM is an efficient powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data). Yu Bai, Yuki Iwasaki, Shigehiko Kanaya, Yue Zhao, and Toshimichi Ikemura Copyright © 2014 Yu Bai et al. All rights reserved. msiDBN: A Method of Identifying Critical Proteins in Dynamic PPI Networks Wed, 02 Apr 2014 12:56:21 +0000 Dynamics of protein-protein interactions (PPIs) reveals the recondite principles of biological processes inside a cell. Shown in a wealth of study, just a small group of proteins, rather than the majority, play more essential roles at crucial points of biological processes. This present work focuses on identifying these critical proteins exhibiting dramatic structural changes in dynamic PPI networks. 
First, a comprehensive way of modeling the dynamic PPIs is presented which simultaneously analyzes the activity of proteins and assembles the dynamic coregulation correlation between proteins at each time point. Second, a novel method is proposed, named msiDBN, which models a common representation of multiple PPI networks using a deep belief network framework and analyzes the reconstruction errors and the variabilities across the time courses in the biological process. Experiments were implemented on data of yeast cell cycles. We evaluated our network construction method by comparing the functional representations of the derived networks with two other traditional construction methods. The ranking results of critical proteins in msiDBN were compared with the results from the baseline methods. The results of comparison showed that msiDBN had better reconstruction rate and identified more proteins of critical value to yeast cell cycle process. Yuan Zhang, Nan Du, Kang Li, Jinchao Feng, Kebin Jia, and Aidong Zhang Copyright © 2014 Yuan Zhang et al. All rights reserved. Applied Graph-Mining Algorithms to Study Biomolecular Interaction Networks Wed, 02 Apr 2014 11:57:36 +0000 Protein-protein interaction (PPI) networks carry vital information on the organization of molecular interactions in cellular systems. The identification of functionally relevant modules in PPI networks is one of the most important applications of biological network analysis. Computational analysis is becoming an indispensable tool to understand large-scale biomolecular interaction networks. Several types of computational methods have been developed and employed for the analysis of PPI networks. Of these computational methods, graph comparison and module detection are the two most commonly used strategies. 
This review summarizes current literature on graph kernel and graph alignment methods for graph comparison strategies, as well as module detection approaches including seed-and-extend, hierarchical clustering, optimization-based, probabilistic, and frequent subgraph methods. Herein, we provide a comprehensive review of the major algorithms employed under each theme, including our recently published frequent subgraph method, for detecting functional modules commonly shared across multiple cancer PPI networks. Ru Shen and Chittibabu Guda Copyright © 2014 Ru Shen and Chittibabu Guda. All rights reserved. An Unsupervised Approach to Predict Functional Relations between Genes Based on Expression Data Mon, 31 Mar 2014 07:16:16 +0000 This work presents a novel approach to predict functional relations between genes using gene expression data. Genes may have various types of relations between them, for example, regulatory relations, or they may be concerned with the same protein complex or metabolic/signaling pathways and obviously gene expression data should contain some clues to such relations. The present approach first digitizes the log-ratio type gene expression data of S. cerevisiae to a matrix consisting of 1, 0, and −1 indicating highly expressed, no major change, and highly suppressed conditions for genes, respectively. For each gene pair, a probability density mass function table is constructed indicating nine joint probabilities. Then gene pairs were selected based on linear and probabilistic relation between their profiles indicated by the sum of probability density masses in selected points. The selected gene pairs share many Gene Ontology terms. Furthermore a network is constructed by selecting a large number of gene pairs based on FDR analysis and the clustering of the network generates many modules rich with similar function genes. 
Also, the promoters of the gene sets in many modules are rich with binding sites of known transcription factors, indicating the effectiveness of the proposed approach in predicting regulatory relations. Md. Altaf-Ul-Amin, Tetsuo Katsuragi, Tetsuo Sato, Naoaki Ono, and Shigehiko Kanaya Copyright © 2014 Md. Altaf-Ul-Amin et al. All rights reserved. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms Sun, 30 Mar 2014 09:04:21 +0000 Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all the identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single hidden layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, the ensemble based SLFNs structure is constructed where multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. Jiuwen Cao and Lianglin Xiong Copyright © 2014 Jiuwen Cao and Lianglin Xiong. All rights reserved.
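The majority voting step mentioned in the protein sequence classification abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `majority_vote` and `ensemble_predict` are hypothetical, the ensemble members are stand-in callables rather than trained SLFNs, and breaking ties in favour of the smallest category index is an assumption made only for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the category index predicted by the most ensemble members.

    `predictions` holds one category index per ensemble member.
    Ties go to the smallest index (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def ensemble_predict(members, sequence_features):
    """Ask every member for a category index, then take the majority vote."""
    return majority_vote([member(sequence_features) for member in members])


# Three hypothetical members standing in for trained SLFNs.
members = [lambda x: 2, lambda x: 0, lambda x: 2]
print(ensemble_predict(members, [0.1, 0.4]))
```

In a real ensemble each member would be a separately trained network, and a different tie-breaking rule (for example, summed output scores) may be preferable.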
Association between ε2/ε3/ε4, Promoter Polymorphism (−491A/T, −427T/C, and −219T/G) at the Apolipoprotein E Gene, and Mental Retardation in Children from an Iodine Deficiency Area, China Tue, 25 Mar 2014 12:55:09 +0000 Background. Several common single-nucleotide polymorphisms (SNPs) at apolipoprotein E (ApoE) have been linked with late onset sporadic Alzheimer’s disease and declining normative cognitive ability in elderly people, but their relationship with cognition in children is unclear. Results. We studied −491A/T, −427T/C, and −219T/G promoter polymorphisms and ε2/ε3/ε4 at ApoE among children with mental retardation (MR), borderline MR, and controls from an iodine deficiency area in China. The allelic and genotypic distribution of individual locus did not significantly differ among the three groups according to the Mantel-Haenszel test. However, frequencies of one haplotype were distributed as MR > borderline MR > controls (uncorrected P = 0.004), indicating that the presence of this haplotype may increase the risk of disease. Conclusions. In this large population-based study in children, we did not find any significant association between any single locus of the four common ApoE polymorphisms (−491A/T, −427T/C, −219T/G, and ε2/ε3/ε4) and MR or borderline MR. However, we found that the presence of ATT haplotype was associated with an increased risk of MR and borderline MR. Our present work may help enlarge our knowledge of the cognitive role of ApoE across the lifespan and the mechanisms of human cognition. Jun Li, Fuchang Zhang, Yunliang Wang, Yan Wang, Wei Qin, Qinghe Xing, Xueqing Qian, Tingwei Guo, Xiaocai Gao, Lin He, and Jianjun Gao Copyright © 2014 Jun Li et al. All rights reserved. Survey of Network-Based Approaches to Research of Cardiovascular Diseases Thu, 20 Mar 2014 08:21:55 +0000 Cardiovascular diseases (CVDs) are the leading health problem worldwide. Investigating causes and mechanisms of CVDs calls for an integrative approach that would take into account their complex etiology.
Biological networks generated from available data on biomolecular interactions are an excellent platform for understanding interconnectedness of all processes within a living cell, including processes that underlie diseases. Consequently, topology of biological networks has successfully been used for identifying genes, pathways, and modules that govern molecular actions underlying various complex diseases. Here, we review approaches that explore and use relationships between topological properties of biological networks and mechanisms underlying CVDs. Anida Sarajlić and Nataša Pržulj Copyright © 2014 Anida Sarajlić and Nataša Pržulj. All rights reserved. New Strategies for Evaluation and Analysis of SELEX Experiments Wed, 19 Mar 2014 13:58:14 +0000 Aptamers are an interesting alternative to antibodies in pharmaceutics and biosensorics, because they are able to bind to a multitude of possible target molecules with high affinity. Therefore the process of finding such aptamers, which is commonly a SELEX screening process, becomes crucial. The standard SELEX procedure schedules the validation of certain found aptamers via binding experiments, which is not leading to any detailed specification of the aptamer enrichment during the screening. For the purpose of advanced analysis of the accrued enrichment within the SELEX library we used sequence information gathered by next generation sequencing techniques in addition to the standard SELEX procedure. As sequence motifs are one possibility of enrichment description, the need of finding those recurring sequence motifs corresponding to substructures within the aptamers, which are characteristically fitted to specific binding sites of the target, arises. In this paper a motif search algorithm is presented, which helps to describe the aptamers enrichment in more detail. 
The extensive characterization of target and binding aptamers may later reveal a functional connection between these molecules, which can be modeled and used to optimize future SELEX runs in case of the generation of target-specific starting libraries. Rico Beier, Elke Boschke, and Dirk Labudde Copyright © 2014 Rico Beier et al. All rights reserved. Essential Functional Modules for Pathogenic and Defensive Mechanisms in Candida albicans Infections Tue, 18 Mar 2014 12:20:46 +0000 The clinical and biological significance of the study of fungal pathogen Candida albicans (C. albicans) has markedly increased. However, the explicit pathogenic and invasive mechanisms of such host-pathogen interactions have not yet been fully elucidated. Therefore, the essential functional modules involved in C. albicans-zebrafish interactions were investigated in this study. Adopting a systems biology approach, the early-stage and late-stage protein-protein interaction (PPI) networks for both C. albicans and zebrafish were constructed. By comparing PPI networks at the early and late stages of the infection process, several critical functional modules were identified in both pathogenic and defensive mechanisms. Functional modules in C. albicans, like those involved in hyphal morphogenesis, ion and small molecule transport, protein secretion, and shifts in carbon utilization, were seen to play important roles in pathogen invasion and damage caused to host cells. Moreover, the functional modules in zebrafish, such as those involved in immune response, apoptosis mechanisms, ion transport, protein secretion, and hemostasis-related processes, were found to be significant as defensive mechanisms during C. albicans infection. The essential functional modules thus determined could provide insights into the molecular mechanisms of host-pathogen interactions during the infection process and thereby devise potential therapeutic strategies to treat C. albicans infection. 
Yu-Chao Wang, I-Chun Tsai, Che Lin, Wen-Ping Hsieh, Chung-Yu Lan, Yung-Jen Chuang, and Bor-Sen Chen Copyright © 2014 Yu-Chao Wang et al. All rights reserved. A Diverse Stochastic Search Algorithm for Combination Therapeutics Wed, 12 Mar 2014 00:00:00 +0000 Background. Design of drug combination cocktails to maximize sensitivity for individual patients presents a challenge in terms of minimizing the number of experiments to attain the desired objective. The enormous number of possible drug combinations constrains exhaustive experimentation approaches, and personal variations in genetic diseases restrict the use of prior knowledge in optimization. Results. We present a stochastic search algorithm that consisted of a parallel experimentation phase followed by a combination of focused and diversified sequential search. We evaluated our approach on seven synthetic examples; four of them were evaluated twice with different parameters, and two biological examples of bacterial and lung cancer cell inhibition response to combination drugs. The performance of our approach as compared to recently proposed adaptive reference update approach was superior for all the examples considered, achieving an average of 45% reduction in the number of experimental iterations. Conclusions. As the results illustrate, the proposed diverse stochastic search algorithm can produce optimized combinations in relatively smaller number of iterative steps. This approach can be combined with available knowledge on the genetic makeup of the patient to design optimal selection of drug cocktails. Mehmet Umut Caglar and Ranadip Pal Copyright © 2014 Mehmet Umut Caglar and Ranadip Pal. All rights reserved. 
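The two-phase idea in the combination therapeutics abstract above (a parallel experimentation phase followed by a mix of focused and diversified sequential search) can be sketched roughly as follows. This is a generic illustration under stated assumptions, not the published algorithm: the function `diverse_search`, its parameters (`n_parallel`, `n_steps`, `p_focus`, `step`), the normalised dose range [0, 1], and the synthetic `toy_response` are all hypothetical.

```python
import random


def diverse_search(response, n_drugs, n_parallel=8, n_steps=40,
                   p_focus=0.7, step=0.1, seed=0):
    """Maximise `response` over dose vectors in [0, 1]^n_drugs."""
    rng = random.Random(seed)

    def random_point():
        return [rng.random() for _ in range(n_drugs)]

    # Parallel phase: score a batch of independent random combinations.
    batch = [random_point() for _ in range(n_parallel)]
    scored = [(response(x), x) for x in batch]
    best_score, best = max(scored, key=lambda t: t[0])
    history = [best_score]

    # Sequential phase: mix focused and diversified proposals.
    for _ in range(n_steps):
        if rng.random() < p_focus:
            # Focused: small perturbation of the current best, clipped to [0, 1].
            cand = [min(1.0, max(0.0, v + rng.uniform(-step, step)))
                    for v in best]
        else:
            # Diversified: a fresh random combination.
            cand = random_point()
        score = response(cand)
        if score > best_score:
            best_score, best = score, cand
        history.append(best_score)
    return best, best_score, history


def toy_response(x):
    """Synthetic stand-in for a measured response; peaks at 0.6 per drug."""
    return -sum((v - 0.6) ** 2 for v in x)


best, score, history = diverse_search(toy_response, n_drugs=3)
print(len(history), score <= 0.0)
```

Each call to `response` stands for one experiment, so the length of `history` is a rough proxy for the number of experimental iterations spent after the parallel batch.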
Visualization of Genome Signatures of Eukaryote Genomes by Batch-Learning Self-Organizing Map with a Special Emphasis on Drosophila Genomes Tue, 11 Mar 2014 09:27:17 +0000 A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method “BLSOM” for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering. Takashi Abe, Yuta Hamano, and Toshimichi Ikemura Copyright © 2014 Takashi Abe et al. All rights reserved. Exact and Heuristic Methods for Network Completion for Time-Varying Genetic Networks Sun, 09 Mar 2014 11:48:52 +0000 Robustness in biological networks can be regarded as an important feature of living systems. A system maintains its functions against internal and external perturbations, leading to topological changes in the network with varying delays. 
To understand the flexibility of biological networks, we propose a novel approach to analyze time-dependent networks, based on the framework of network completion, which aims to make the minimum amount of modifications to a given network so that the resulting network is most consistent with the observed data. We have developed a novel network completion method for time-varying networks by extending our previous method for the completion of stationary networks. In particular, we introduce a double dynamic programming technique to identify change time points and required modifications. Although this extended method allows us to guarantee the optimality of the solution, this method has relatively low computational efficiency. In order to resolve this difficulty, we developed a heuristic method for speeding up the calculation of minimum least squares errors. We demonstrate the effectiveness of our proposed methods through computational experiments using synthetic data and real microarray gene expression data. The results indicate that our methods exhibit good performance in terms of completing and inferring gene association networks with time-varying structures. Natsu Nakajima and Tatsuya Akutsu Copyright © 2014 Natsu Nakajima and Tatsuya Akutsu. All rights reserved. Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks Thu, 06 Mar 2014 13:34:51 +0000 Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and showed good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER, including clustering-based representation, distributional representation, and word embeddings. 
We selected one algorithm from each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, when compared with the systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks. Buzhou Tang, Hongxin Cao, Xiaolong Wang, Qingcai Chen, and Hua Xu Copyright © 2014 Buzhou Tang et al. All rights reserved. Identifying Gastric Cancer Related Genes Using the Shortest Path Algorithm and Protein-Protein Interaction Network Wed, 05 Mar 2014 16:35:58 +0000 Gastric cancer, as one of the leading causes of cancer related deaths worldwide, causes about 800,000 deaths per year. Up to now, the mechanism underlying this disease has not been fully uncovered. Identification of related genes of this disease is an important step which can help to understand the mechanism underlying this disease, thereby designing effective treatments. In this study, some novel gastric cancer related genes were discovered based on the knowledge of known gastric cancer related ones. These genes were searched by applying the shortest path algorithm in a protein-protein interaction network. The analysis results suggest that some of them are indeed involved in the biological process of gastric cancer, which indicates that they are, with high probability, actual gastric cancer related genes. It is hoped that the findings in this study may help promote the study of this disease and that the methods can provide new insights to study various diseases.
Yang Jiang, Yang Shu, Ying Shi, Li-Peng Li, Fei Yuan, and Hui Ren Copyright © 2014 Yang Jiang et al. All rights reserved. TF2LncRNA: Identifying Common Transcription Factors for a List of lncRNA Genes from ChIP-Seq Data Tue, 04 Mar 2014 07:37:38 +0000 High-throughput genomic technologies like lncRNA microarray and RNA-Seq often generate a set of lncRNAs of interest, yet little is known about the transcriptional regulation of the set of lncRNA genes. Here, based on ChIP-Seq peak lists of transcription factors (TFs) from ENCODE and annotated human lncRNAs from GENCODE, we developed a web-based interface titled “TF2lncRNA,” where TF peaks from each ChIP-Seq experiment are crossed with the genomic coordinates of a set of input lncRNAs, to identify which TFs present a statistically significant number of binding sites (peaks) within the regulatory region of the input lncRNA genes. The input can be a set of coexpressed lncRNA genes or any other cluster of lncRNA genes. Users can thus infer which TFs are likely to be common transcription regulators of the set of lncRNAs. In addition, users can retrieve all lncRNAs potentially regulated by a specific TF in a specific cell line of interest or retrieve all TFs that have one or more binding sites in the regulatory region of a given lncRNA in the specific cell line. TF2LncRNA is an efficient and easy-to-use web-based tool. Qinghua Jiang, Jixuan Wang, Yadong Wang, Rui Ma, Xiaoliang Wu, and Yu Li Copyright © 2014 Qinghua Jiang et al. All rights reserved. Comparative Metagenomic Analysis of Human Gut Microbiome Composition Using Two Different Bioinformatic Pipelines Tue, 25 Feb 2014 09:12:11 +0000 Technological advances in next-generation sequencing-based approaches have greatly impacted the analysis of microbial community composition. In particular, 16S rRNA-based methods have been widely used to analyze the whole set of bacteria present in a target environment. 
As a consequence, several specific bioinformatic pipelines have been developed to manage these data. MetaGenome Rapid Annotation using Subsystem Technology (MG-RAST) and Quantitative Insights Into Microbial Ecology (QIIME) are two freely available tools for metagenomic analyses that have been used in a wide range of studies. Here, we report the comparative analysis of the same dataset with both QIIME and MG-RAST in order to evaluate their accuracy in taxonomic assignment and in diversity analysis. We found that taxonomic assignment was more accurate with QIIME which, at family level, assigned a significantly higher number of reads. Thus, QIIME generated a more accurate BIOM file, which in turn improved the diversity analysis output. Finally, although informatics skills are needed to install QIIME, it offers a wide range of metrics that are useful for downstream applications and, not less important, it is not dependent on server times. Valeria D’Argenio, Giorgio Casaburi, Vincenza Precone, and Francesco Salvatore Copyright © 2014 Valeria D’Argenio et al. All rights reserved. Approaches for Recognizing Disease Genes Based on Network Mon, 24 Feb 2014 07:40:35 +0000 Diseases are closely related to genes, thus indicating that genetic abnormalities may lead to certain diseases. The recognition of disease genes has long been a goal in biology, which may contribute to the improvement of health care and understanding gene functions, pathways, and interactions. However, few large-scale gene-gene association datasets, disease-disease association datasets, and gene-disease association datasets are available. A number of machine learning methods have been used to recognize disease genes based on networks. This paper states the relationship between disease and gene, summarizes the approaches used to recognize disease genes based on network, analyzes the core problems and challenges of the methods, and outlooks future research direction. 
Quan Zou, Jinjin Li, Chunyu Wang, and Xiangxiang Zeng Copyright © 2014 Quan Zou et al. All rights reserved. Predicting Glycerophosphoinositol Identities in Lipidomic Datasets Using VaLID (Visualization and Phospholipid Identification)—An Online Bioinformatic Search Engine Thu, 20 Feb 2014 09:59:05 +0000 The capacity to predict and visualize all theoretically possible glycerophospholipid molecular identities present in lipidomic datasets is currently limited. To address this issue, we expanded the search-engine and compositional databases of the online Visualization and Phospholipid Identification (VaLID) bioinformatic tool to include the glycerophosphoinositol superfamily. VaLID v1.0.0 originally allowed exact and average mass libraries of 736,584 individual species from eight phospholipid classes: glycerophosphates, glyceropyrophosphates, glycerophosphocholines, glycerophosphoethanolamines, glycerophosphoglycerols, glycerophosphoglycerophosphates, glycerophosphoserines, and cytidine 5′-diphosphate 1,2-diacyl-sn-glycerols to be searched for any mass to charge value (with adjustable tolerance levels) under a variety of mass spectrometry conditions. Here, we describe an update that now includes all possible glycerophosphoinositols, glycerophosphoinositol monophosphates, glycerophosphoinositol bisphosphates, and glycerophosphoinositol trisphosphates. This update expands the total number of lipid species represented in the VaLID v2.0.0 database to 1,473,168 phospholipids. Each phospholipid can be generated in skeletal representation. A subset of species curated by the Canadian Institutes of Health Research Training Program in Neurodegenerative Lipidomics (CTPNL) team is provided as an array of high-resolution structures. VaLID is freely available and responds to all users through the CTPNL resources web site. Graeme S. V. McDowell, Alexandre P. Blanchard, Graeme P. Taylor, Daniel Figeys, Stephen Fai, and Steffany A. L. Bennett Copyright © 2014 Graeme S. V. 
McDowell et al. All rights reserved. Integrative Analysis of miRNA-mRNA and miRNA-miRNA Interactions Wed, 12 Feb 2014 15:17:43 +0000 MicroRNAs (miRNAs) are small, noncoding regulatory molecules. They are involved in many essential biological processes and act by suppressing gene expression. The present work reports an integrative analysis of miRNA-mRNA and miRNA-miRNA interactions and their regulatory patterns using high-throughput miRNA and mRNA datasets. Aberrantly expressed miRNA and mRNA profiles were obtained based on fold change analysis, and qRT-PCR was used for further validation of deregulated miRNAs. miRNAs and target mRNAs were found to show various expression patterns. miRNA-miRNA interactions and clustered/homologous miRNAs were also found to contribute to the flexible and selective regulatory network. Interacting miRNAs (e.g., miRNA-103a and miR-103b) showed more pronounced differences in expression, which suggests the potential “restricted interaction” in the miRNA world. miRNAs from the same gene clusters (e.g., miR-23b gene cluster) or gene families (e.g., miR-10 gene family) always showed the same types of deregulation patterns, although they sometimes differed in expression levels. These clustered and homologous miRNAs may have close functional relationships, which may indicate collaborative interactions between miRNAs. The integrative analysis of miRNA-mRNA based on biological characteristics of miRNA will further enrich miRNA study. Li Guo, Yang Zhao, Sheng Yang, Hui Zhang, and Feng Chen Copyright © 2014 Li Guo et al. All rights reserved. Network-Assisted Prediction of Potential Drugs for Addiction Sun, 09 Feb 2014 12:25:55 +0000 Drug addiction is a chronic and complex brain disease, adding much burden on the community. Though numerous efforts have been made to identify the effective treatment, it is necessary to find more novel therapeutics for this complex disease. 
As network pharmacology has become a promising approach for drug repurposing, we proposed to apply the approach to drug addiction, which might provide new clues for the development of effective addiction treatment drugs. We first extracted 44 addictive drugs from the NIDA and their targets from DrugBank. Then, we constructed two networks: an addictive drug-target network and an expanded addictive drug-target network obtained by adding other drugs that share at least one target with these addictive drugs. By performing network analyses, we found that addictive drugs with similar actions tended to cluster together. Additionally, we predicted 94 nonaddictive drugs with potential pharmacological functions related to the addictive drugs. In the PubMed data, 51 of these drugs cooccurred with addictive keywords significantly more often than expected. Thus, the network analyses provide a list of candidate drugs for further investigation of their potential in addiction treatment or risk. Jingchun Sun, Liang-Chin Huang, Hua Xu, and Zhongming Zhao Copyright © 2014 Jingchun Sun et al. All rights reserved. Erratum to “New Optical Methods for Liveness Detection on Fingers” Sun, 02 Feb 2014 13:42:02 +0000 Martin Drahansky, Michal Dolezel, Jan Vana, Eva Brezinova, Jaegeol Yim, and Kyubark Shim Copyright © 2014 Martin Drahansky et al. All rights reserved. A Novel Approach for Discovering Condition-Specific Correlations of Gene Expressions within Biological Pathways by Using Cloud Computing Technology Wed, 22 Jan 2014 17:16:42 +0000 Microarrays are widely used to assess gene expressions. Most microarray studies focus primarily on identifying differential gene expressions between conditions (e.g., cancer versus normal cells) in order to discover the major factors that cause diseases. Because previous studies have not identified the correlations of differential gene expression between conditions, crucial but abnormal regulations that cause diseases might have been disregarded.
This paper proposes an approach for discovering the condition-specific correlations of gene expressions within biological pathways. Because analyzing gene expression correlations is time consuming, an Apache Hadoop cloud computing platform was implemented. Three microarray data sets of breast cancer were collected from the Gene Expression Omnibus, and pathway information from the Kyoto Encyclopedia of Genes and Genomes was applied for discovering meaningful biological correlations. The results showed that adopting the Hadoop platform considerably decreased the computation time. Several correlations of differential gene expressions were discovered between the relapse and nonrelapse breast cancer samples, and most of them were involved in cancer regulation and cancer-related pathways. The results showed that breast cancer recurrence might be highly associated with the abnormal regulations of these gene pairs, rather than with their individual expression levels. The proposed method was computationally efficient and reliable, and stable results were obtained when different data sets were used. The proposed method is effective in identifying meaningful biological regulation patterns between conditions. Tzu-Hao Chang, Shih-Lin Wu, Wei-Jen Wang, Jorng-Tzong Horng, and Cheng-Wei Chang Copyright © 2014 Tzu-Hao Chang et al. All rights reserved. Microsatellites in the Genome of the Edible Mushroom, Volvariella volvacea Sun, 19 Jan 2014 00:00:00 +0000 Using bioinformatics software and databases, we have characterized the microsatellite pattern in the V. volvacea genome and compared it with microsatellite patterns found in the genomes of four other edible fungi: Coprinopsis cinerea, Schizophyllum commune, Agaricus bisporus, and Pleurotus ostreatus. A total of 1346 microsatellites have been identified, with mononucleotides being the most frequent motif. The relative abundance of microsatellites was lower in coding regions, at 21 per Mb. However, the microsatellites in the V.
volvacea gene models showed a greater tendency to be located in the CDS regions. There was also a higher preponderance of trinucleotide repeats, especially in the kinase genes, which implied a possible role in phenotypic variation. Among the five fungal genomes, microsatellite abundance appeared to be unrelated to genome size. Furthermore, the short motifs (mono- to trinucleotides) outnumbered other categories, although these differed in proportion. Data analysis indicated a possible relationship between the most frequent microsatellite types and the genetic distance between the five fungal genomes. Ying Wang, Mingjie Chen, Hong Wang, Jing-Fang Wang, and Dapeng Bao Copyright © 2014 Ying Wang et al. All rights reserved. Integration of High-Volume Molecular and Imaging Data for Composite Biomarker Discovery in the Study of Melanoma Thu, 16 Jan 2014 16:36:04 +0000 In this work the effects of simple imputations are studied, regarding the integration of multimodal data originating from different patients. Two separate datasets of cutaneous melanoma are used: an image analysis (dermoscopy) dataset and a transcriptomic one, specifically DNA microarrays. Each modality is related to a different set of patients, and four imputation methods are employed to form a unified, integrative dataset. The application of backward selection together with ensemble classifiers (random forests), followed by principal components analysis and linear discriminant analysis, illustrates the implications of the imputations for feature selection and dimensionality reduction methods. The results suggest that the expansion of the feature space through data integration, achieved by the exploitation of imputation schemes in general, aids the classification task, imparting stability as regards the derivation of putative classifiers.
In particular, although the biased imputation methods significantly increase the predictive performance and the class discrimination of the datasets, they still contribute to the study of prominent features and their relations. The fusion of separate datasets, which provide a multimodal description of the same pathology, represents an innovative, promising avenue, enhancing robust composite biomarker derivation and promoting the interpretation of the biomedical problem studied. Konstantinos Moutselos, Ilias Maglogiannis, and Aristotelis Chatziioannou Copyright © 2014 Konstantinos Moutselos et al. All rights reserved. Network Analysis of Neurodegenerative Disease Highlights a Role of Toll-Like Receptor Signaling Thu, 16 Jan 2014 13:33:49 +0000 Despite significant advances in the study of the molecular mechanisms altered in the development and progression of neurodegenerative diseases (NDs), the etiology is still enigmatic and the distinctions between diseases are not always entirely clear. We present an efficient computational method based on a protein-protein interaction (PPI) network to model the functional network of NDs. The aim of this work is fourfold: (i) reconstruction of a PPI network relating to the NDs, (ii) construction of an association network between diseases based on proximity in the disease PPI network, (iii) quantification of disease associations, and (iv) inference of potential molecular mechanisms involved in the diseases. The functional links of diseases not only showed overlap with the traditional classification in clinical settings, but also offered new insight into connections between diseases with limited clinical overlap. To gain an expanded view of the molecular mechanisms involved in NDs, both direct and indirect connector proteins were investigated. The method uncovered molecular relationships that are common to apparently distinct diseases and provided important insight into the molecular networks implicated in disease pathogenesis.
In particular, the current analysis highlighted the Toll-like receptor signaling pathway as a potential candidate pathway to be targeted by therapy in neurodegeneration. Thanh-Phuong Nguyen, Laura Caberlotto, Melissa J. Morine, and Corrado Priami Copyright © 2014 Thanh-Phuong Nguyen et al. All rights reserved. Computational Analysis of Transcriptional Circuitries in Human Embryonic Stem Cells Reveals Multiple and Independent Networks Thu, 09 Jan 2014 14:26:11 +0000 It has been known that three core transcription factors (TFs), NANOG, OCT4, and SOX2, collaborate to form a transcriptional circuitry to regulate pluripotency and self-renewal of human embryonic stem (ES) cells. Similarly, MYC also plays an important role in regulating pluripotency and self-renewal of human ES cells. However, the precise mechanism by which the transcriptional regulatory networks control the activity of ES cells remains unclear. In this study, we reanalyzed an extended core network, which includes the set of genes that are cobound by the three core TFs and additional TFs that also bind to these cobound genes. Our results show that beyond the core transcriptional network, additional transcriptional networks are potentially important in the regulation of the fate of human ES cells. Several gene families that encode TFs play a key role in the transcriptional circuitry of ES cells. We also demonstrate that MYC acts independently of the core module in the regulation of the fate of human ES cells, consistent with the established argument. We find that TP53 is a key connecting molecule between the core-centered and MYC-centered modules. This study provides additional insights into the underlying regulatory mechanisms involved in the fate determination of human ES cells. Xiaosheng Wang and Chittibabu Guda Copyright © 2014 Xiaosheng Wang and Chittibabu Guda. All rights reserved. 
De Novo Assembly and Characterization of Sophora japonica Transcriptome Using RNA-seq Thu, 02 Jan 2014 11:42:06 +0000 Sophora japonica Linn (Chinese Scholar Tree) is a shrub species belonging to the subfamily Faboideae of the pea family Fabaceae. In this study, RNA sequencing of the S. japonica transcriptome was performed to produce large expression datasets for functional genomic analysis. Approximately 86.1 million high-quality clean reads were generated and assembled de novo into 143010 unique transcripts and 57614 unigenes. The average length of unigenes was 901 bp with an N50 of 545 bp. Four public databases, including the NCBI nonredundant protein (NR), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Cluster of Orthologous Groups (COG), were used to annotate unigenes through the NCBI BLAST procedure. A total of 27541 of 57614 unigenes (47.8%) were annotated for gene descriptions, conserved protein domains, or gene ontology. Moreover, an interaction network of unigenes in S. japonica was predicted based on known protein-protein interactions of putative orthologs in well-studied plant genomes. The transcriptome data of S. japonica reported here represent the first genome-scale investigation of gene expression in Faboideae plants. We expect that our study will provide a useful resource for further studies on gene expression, genomics, functional genomics, and protein-protein interaction in S. japonica. Liucun Zhu, Ying Zhang, Wenna Guo, Xin-Jian Xu, and Qiang Wang Copyright © 2014 Liucun Zhu et al. All rights reserved. Application of Systems Biology and Bioinformatics Methods in Biochemistry and Biomedicine Tue, 31 Dec 2013 11:31:30 +0000 Yudong Cai, Tao Huang, Lei Chen, and Bin Niu Copyright © 2013 Yudong Cai et al. All rights reserved.
HGF Accelerates Wound Healing by Promoting the Dedifferentiation of Epidermal Cells through -Integrin/ILK Pathway Mon, 30 Dec 2013 13:52:26 +0000 Skin wound healing is a critical and complex biological process after trauma. This process is activated by signaling pathways of both epithelial and nonepithelial cells, which release a myriad of different cytokines and growth factors. Hepatocyte growth factor (HGF) is a cytokine known to play multiple roles during the various stages of wound healing. This study evaluated the benefits of HGF on reepithelialization during wound healing and investigated its mechanisms of action. Gross and histological results showed that HGF significantly accelerated reepithelialization in diabetic (DB) rats. HGF increased the expression of the cell adhesion molecule -integrin and the cytoskeleton remodeling protein integrin-linked kinase (ILK) in epidermal cells in vivo and in vitro. Silencing of ILK gene expression by RNA interference reduced expression of -integrin, ILK, and c-met in epidermal cells, concomitantly decreasing the proliferation and migration ability of epidermal cells. -Integrin can be an important marker of poorly differentiated epidermal cells. Therefore, these data demonstrate that, under the influence of HGF after wounding, epidermal cells enter a poorly differentiated state and regain some characteristics of epidermal stem cells. Taken together, the results provide evidence that HGF can accelerate reepithelialization in skin wound healing by dedifferentiation of epidermal cells in a manner related to the -integrin/ILK pathway. Jin-Feng Li, Hai-Feng Duan, Chu-Tse Wu, Da-Jin Zhang, Youping Deng, Hong-Lei Yin, Bing Han, Hui-Cui Gong, Hong-Wei Wang, and Yun-Liang Wang Copyright © 2013 Jin-Feng Li et al. All rights reserved.
Prediction of Substrate-Enzyme-Product Interaction Based on Molecular Descriptors and Physicochemical Properties Sun, 22 Dec 2013 18:10:05 +0000 It is important to correctly and efficiently predict substrate-enzyme interactions and their products in metabolic pathways. In this work, a novel approach was introduced to encode substrate/product and enzyme molecules with molecular descriptors and physicochemical properties, respectively. Based on this encoding method, KNN was adopted to build the substrate-enzyme-product interaction network. After selecting the optimal features that represent the main factors of substrate-enzyme-product interaction in our prediction, a total of 160 out of 290 features were retained, which can be clustered into ten categories: elemental analysis, geometry, chemistry, amino acid composition, predicted secondary structure, hydrophobicity, polarizability, solvent accessibility, normalized van der Waals volume, and polarity. As a result, our prediction model achieved an MCC of 0.423 and an overall prediction accuracy of 89.1% in a 10-fold cross-validation test. Bing Niu, Guohua Huang, Linfeng Zheng, Xueyuan Wang, Fuxue Chen, Yuhui Zhang, and Tao Huang Copyright © 2013 Bing Niu et al. All rights reserved. Identification of Age-Related Macular Degeneration Related Genes by Applying Shortest Path Algorithm in Protein-Protein Interaction Network Wed, 18 Dec 2013 12:38:15 +0000 This study attempted to find novel age-related macular degeneration (AMD) related genes based on 36 known AMD genes. The well-known shortest path algorithm, Dijkstra’s algorithm, was applied to find the shortest path connecting each pair of known AMD related genes in a protein-protein interaction (PPI) network. The genes occurring in any shortest path were considered as candidate AMD related genes. As a result, 125 novel AMD genes were predicted.
Further analysis based on betweenness and a permutation test indicates that 10 genes are involved in the formation or development of AMD and may, with high probability, be actual AMD related genes. We hope that this contribution will promote the study of age-related macular degeneration and the discovery of novel effective treatments. Jian Zhang, Min Jiang, Fei Yuan, Kai-Yan Feng, Yu-Dong Cai, Xun Xu, and Lei Chen Copyright © 2013 Jian Zhang et al. All rights reserved. Biometrics and Biosecurity 2013 Tue, 10 Dec 2013 13:42:09 +0000 Tai-hoon Kim, Sabah Mohammed, and Wai-Chi Fang Copyright © 2013 Tai-hoon Kim et al. All rights reserved. iEzy-Drug: A Web Server for Identifying the Interaction between Enzymes and Drugs in Cellular Networking Tue, 26 Nov 2013 18:00:45 +0000 With their extremely high selectivity and efficiency in catalyzing almost all the chemical reactions in cells, enzymes play vitally important roles in the life of an organism and hence have become frequent targets for drug design. An essential step in developing drugs by targeting enzymes is to identify drug-enzyme interactions in cells. It is both time-consuming and costly to do this by experimental techniques alone. Although some computational methods have been developed for this purpose based on knowledge of the three-dimensional structure of the enzyme, their usage is quite limited because the three-dimensional structures of many enzymes are still unknown. Here, we report a sequence-based predictor, called “iEzy-Drug,” in which each drug compound is formulated as a molecular fingerprint with 258 feature components, each enzyme is formulated by Chou’s pseudo amino acid composition, generated by incorporating sequential evolution information and physicochemical features derived from its sequence, and the prediction engine is the fuzzy K-nearest neighbor algorithm. The overall success rate achieved by iEzy-Drug via rigorous cross-validations was about 91%.
Moreover, to maximize the convenience for the majority of experimental scientists, a user-friendly web server was established, by which users can easily obtain their desired results. Jian-Liang Min, Xuan Xiao, and Kuo-Chen Chou Copyright © 2013 Jian-Liang Min et al. All rights reserved. Multiple Biomarker Panels for Early Detection of Breast Cancer in Peripheral Blood Tue, 26 Nov 2013 14:26:09 +0000 Detecting breast cancer at early stages can be challenging. Traditional mammography and tissue microarray, which have been studied for early breast cancer detection and prediction, have many drawbacks. Therefore, there is a need for more reliable diagnostic tools for early detection of breast cancer. In this paper, we present a five-marker panel approach based on SVM for early detection of breast cancer in peripheral blood and show how to use SVM to model the classification and prediction problem of early detection of breast cancer in peripheral blood. We found that the five-marker panel can improve the prediction performance (area under curve) in the testing data set from 0.5826 to 0.7879. Further pathway analysis showed that the top four five-marker panels are associated with signaling, steroid hormones, metabolism, immune system, and hemostasis, which is consistent with previous findings. Our prediction model can serve as a general model for multibiomarker panel discovery in early detection of other cancers. Fan Zhang, Youping Deng, and Renee Drabier Copyright © 2013 Fan Zhang et al. All rights reserved. Gene Prioritization of Resistant Rice Gene against Xanthomas oryzae pv. oryzae by Using Text Mining Technologies Mon, 25 Nov 2013 16:01:48 +0000 To effectively assess the possibility that an unknown rice protein is resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach.
The text mining technique of term frequency-inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos game representation algorithm is used to sieve the best possible candidate gene. Our experimental results show that the combination of these two methods achieves enhanced gene prioritization. Jingbo Xia, Xing Zhang, Daojun Yuan, Lingling Chen, Jonathan Webster, and Alex Chengyu Fang Copyright © 2013 Jingbo Xia et al. All rights reserved. QSBR Study of Bitter Taste of Peptides: Application of GA-PLS in Combination with MLR, SVM, and ANN Approaches Mon, 25 Nov 2013 08:38:58 +0000 Detailed information about the relationships between structures and properties/activities of peptides as drugs and nutrients is useful in the development of drugs and functional foods containing peptides as active compounds. The bitterness of peptides is an undesirable property which should be reduced during drug/nutrient production, and quantitative structure bitter taste relationship (QSBR) studies can help researchers to design less bitter peptides with higher target efficiency. Calculated structural parameters were used to develop three different QSBR models (i.e., multiple linear regression, support vector machine, and artificial neural network) to predict the bitterness of 229 peptides (containing 2–12 amino acids, obtained from the literature). The developed models were validated using internal and external validation methods, and the prediction errors were checked using mean percentage deviation and absolute average error values. All developed models predicted the activities successfully (with prediction errors less than experimental error values), and the prediction errors for nonlinear methods were less than those for linear methods.
The selected structural descriptors successfully differentiated between bitter and nonbitter peptides. Somaieh Soltani, Hossein Haghaei, Ali Shayanfar, Javad Vallipour, Karim Asadpour Zeynali, and Abolghasem Jouyban Copyright © 2013 Somaieh Soltani et al. All rights reserved. Expression Sensitivity Analysis of Human Disease Related Genes Sun, 24 Nov 2013 11:16:16 +0000 Background. Genome-wide association studies (GWAS) have shown their power in identifying genetic loci that influence complex diseases. Thousands of replicated loci for common traits are helpful in disease risk assessment. However, it is still difficult to elucidate the variations in these loci that directly cause susceptibility to diseases by disrupting the expression or function of a protein. Results. We evaluate the expression features of disease related genes and find that genes related to different diseases show different expression perturbation sensitivities in various conditions. It is worth noting that the expression of some robust disease-genes does not change significantly in their corresponding diseases, so these genes might be easily overlooked in expression profile analysis. Conclusion. Gene ontology enrichment analysis indicates that robust disease-genes perform essential functions in comparison with sensitive disease-genes. The diseases associated with robust genes seem to be relatively lethal, like cancer and aging. On the other hand, the diseases associated with sensitive genes are apparently nonlethal, like psych and chemical dependency diseases. Liang-Xiao Ma, Ya-Jun Wang, Jing-Fang Wang, Xuan Li, and Pei Hao Copyright © 2013 Liang-Xiao Ma et al. All rights reserved. Translational Biomedical Informatics and Computational Systems Medicine Thu, 21 Nov 2013 14:39:08 +0000 Zhongming Zhao, Bairong Shen, Xinghua Lu, and Wanwipa Vongsangnak Copyright © 2013 Zhongming Zhao et al. All rights reserved.
An Improved Biometrics-Based Remote User Authentication Scheme with User Anonymity Thu, 21 Nov 2013 13:09:31 +0000 The authors review the biometrics-based user authentication scheme proposed by An in 2012. The authors show that there exist loopholes in the scheme which are detrimental to its security. Therefore, the authors propose an improved scheme that eliminates the flaws of An’s scheme. A detailed security analysis of the proposed scheme is then presented, followed by an efficiency comparison. The proposed scheme not only withstands the security problems found in An’s scheme but also provides some extra features at the cost of only two additional hash operations. The proposed scheme allows users to freely change their passwords and also provides user anonymity with untraceability. Muhammad Khurram Khan and Saru Kumari Copyright © 2013 Muhammad Khurram Khan and Saru Kumari. All rights reserved. Prediction of Drugs Target Groups Based on ChEBI Ontology Wed, 20 Nov 2013 17:06:28 +0000 Most drugs have beneficial as well as adverse effects and exert their biological functions by adjusting and altering the functions of their target proteins. Thus, knowledge of drugs target proteins is essential for the improvement of therapeutic effects and mitigation of undesirable side effects. In this study, we proposed a novel prediction method based on drug/compound ontology information extracted from ChEBI to identify drugs target groups, from which the kinds of functions of a drug may be deduced. By collecting data from KEGG, a benchmark dataset consisting of 876 drugs, categorized into four target groups, was constructed. To evaluate the method more thoroughly, the benchmark dataset was divided into a training dataset and an independent test dataset. The jackknife test showed that the overall prediction accuracy on the training dataset was 83.12%, while it was 87.50% on the test dataset, indicating that the predictor generalized well.
The good performance of the method indicates that the ontology information of the drugs contains rich information about their target groups, and the study may inspire solutions to problems of this sort and help bridge the gap between ChEBI ontology and drugs target groups. Yu-Fei Gao, Lei Chen, Guo-Hua Huang, Tao Zhang, Kai-Yan Feng, Hai-Peng Li, and Yang Jiang Copyright © 2013 Yu-Fei Gao et al. All rights reserved. Identifying Breast Cancer Subtype Related miRNAs from Two Constructed miRNAs Interaction Networks in Silico Method Wed, 20 Nov 2013 08:32:57 +0000 Background. It is known that microRNAs (miRNAs) regulate the expression of multiple proteins and therefore are likely to emerge as more effective targets of selective therapeutic modalities for breast cancer. Although recent lines of evidence have shown that miRNAs are associated with the most common molecular breast cancer subtypes, these associations have not been well characterized. Objectives. In this study, we propose an in silico method to identify breast cancer subtype related miRNAs based on two constructed miRNAs interaction networks using miRNA-mRNA dual expression profiling data arising from the same samples. Methods. Firstly, we used a new mutual information estimation method to construct two miRNAs interaction networks based on miRNA-mRNA dual expression profiling data. Secondly, we compared and analyzed the topological properties of these two networks. Finally, miRNAs showing outstanding topological properties in both networks were identified. Results. Further functional analysis and literature evidence confirm that the identified potential breast cancer subtype related miRNAs are essential to unraveling their biological function. Conclusions.
This study provides a new in silico method to predict candidate miRNAs of breast cancer subtypes at the systems biology level and can support functional studies of important breast cancer subtype related miRNAs. Lin Hua, Lin Li, and Ping Zhou Copyright © 2013 Lin Hua et al. All rights reserved.
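The network-construction step named in the abstract above (linking miRNAs whose expression profiles share high mutual information) can be illustrated with a minimal Python sketch. This is not the authors' "new mutual information estimation method", which the abstract does not specify; it assumes a plain plug-in (histogram) estimator over equal-width bins, and the function names, the bin count, the threshold, and the example profile names are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=3):
    """Equal-width binning of a numeric profile into integer labels (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in mutual information, in bits, of two equal-length label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def build_network(profiles, threshold, bins=3):
    """Return {(name_a, name_b): MI} for every pair whose MI reaches the threshold."""
    labels = {name: discretize(v, bins) for name, v in profiles.items()}
    edges = {}
    for a, b in combinations(sorted(labels), 2):
        mi = mutual_information(labels[a], labels[b])
        if mi >= threshold:
            edges[(a, b)] = mi
    return edges


if __name__ == "__main__":
    # Hypothetical toy profiles: three miRNAs measured across four samples.
    toy = {
        "miR-a": [1.0, 1.0, 5.0, 5.0],
        "miR-b": [2.0, 2.0, 9.0, 9.0],
        "miR-c": [1.0, 5.0, 1.0, 5.0],
    }
    print(build_network(toy, threshold=0.5, bins=2))
```

In this toy input, miR-a and miR-b rise and fall together, so they are the only pair linked; miR-c is statistically independent of both after binning. A plug-in estimator is biased upward on small samples, so a real analysis would need a bias correction or a permutation-based threshold.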